A galaxy is the environment in which stars are born and die, and distant galaxies are the luminous beacons that enable us to probe the distant universe. Our galaxy, the Milky Way, is one of billions of such systems in the observable universe. How galaxies formed represents a central theme in modern cosmology.

At first glance, the universe appears to contain two different types of galaxies: galaxies with disk-like morphologies and galaxies with spheroidal morphologies. This basic distinction breaks down, however, once individual disk and spheroidal galaxies are examined in detail, since most disk galaxies contain small spheroidal components at their centers and most spheroidal galaxies contain small disks at their centers.

Disks in and of themselves contain a wide variety of features. Most disk galaxies exhibit spiral arms with a large range of winding angles and contrast. Approximately half of all disk galaxies also contain a highly elongated bar structure near their center, the bars possessing a variety of axial ratios. Some disks also show moderate deviations from planarity toward their edges (warps), while other disks show significant lanes of dust across their observed profiles. Similarly, elliptical galaxies, while possessing relatively uniform profiles compared to disk galaxies, show a significant variety of substructure. At least half of ellipticals have detectable shell structure, and others have distinct cores. Finally, all galaxies, irrespective of type, show great variations in the amounts and spatial distribution of gas, dust, stars, and metal abundances as well as their basic surface brightnesses, luminosities, colours, and masses.

Despite their considerable diversity, galaxies show a remarkable degree of uniformity as well. The profiles of disks and ellipticals are remarkably homologous, the global structural parameters of disk and spheroidal galaxies define a tight two-dimensional plane, and the colours and apparent star formation histories of both spiral and elliptical galaxies show a striking correlation with luminosity. Of great significance is that most galaxies are very slowly evolving structures, both chemically and dynamically. Their properties were acquired long ago, at or soon after the epoch of galaxy formation.

Galaxies began as clouds of primordial gas, hydrogen and helium. Even before galaxies condensed into distinct clouds, infinitesimal density fluctuations were present in the expanding universe. These originated at very early epochs in an inflationary phase transition from a universe that initially was relatively uniform. Fluctuations grew in strength under the inexorable influence of self-gravity. Eventually, clouds developed that fragmented into stars. Much of the detailed physics in this schematic of galaxy evolution is now understood.

This review begins with a discussion of the cosmological world model in which galaxies form, discusses the processes by which the initial perturbations are established, presents the theory for the growth of these perturbations into collapsing and eventually virialized objects, illustrates the importance of gas cooling in the formation of galaxies, outlines the processes by which galaxies acquire angular momentum, and concludes by summarizing the basic observations and theory of disk and elliptical galaxy formation.

We begin by providing some background on the standard world model and the primordial fluctuations out of which galaxies are believed to have grown.

The apparent homogeneity and isotropy of the observable universe, both in terms of its large-scale structure and the cosmic microwave background radiation (the almost constant 2.73 K blackbody radiation background in which the universe is immersed), motivate the assumption that the universe is both homogeneous and isotropic. By homogeneous, we mean that every point in space statistically resembles every other point in space. By isotropic, we mean there is no point in space where any direction differs statistically from any other direction in space.

Assuming universal homogeneity and isotropy, Einstein’s theory of General Relativity can be used to show that the evolution of the scale of the universe follows the two independent equations:

collectively known as Friedmann’s equations, where a is a measure of the size of the universe, p is the pressure, k is the curvature, ρ is the density, and G is Newton’s constant. Combined with the equation of state, these equations completely determine a(t), ρ(t), and p(t). These equations can be recast into the form:

where $\Omega_0 = 8\pi G\rho_0/(3H_0^2)$, $\Omega_{R,0} = 1/(H_0 a_0 R)^2$, $\Omega_{\Lambda,0} = \Lambda/(3H_0^2)$, $\rho_0$ is the matter density of the universe at the present epoch, $\Lambda$ is the vacuum energy density or cosmological constant, and $R$ is a constant with units of length. Hubble’s constant, denoted by $H_0$, characterizes the rate at which the universe is expanding at the present epoch. It is the constant of proportionality relating an object’s distance $D$ to its rate of recession $v$: $v = H_0 D$.

Note that $\Omega_{R,0} = 1 - \Omega_0 - \Omega_{\Lambda,0}$.
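As a small numerical illustration of these definitions, the Python sketch below evaluates the closure relation for the curvature term, the Hubble law, and the expansion rate at redshift z. It assumes the recast Friedmann equation takes the standard form H(z)² = H0² [Ω0 (1+z)³ + ΩR,0 (1+z)² + ΩΛ,0]; the function names and the default H0 = 70 km/s/Mpc are illustrative choices, not values taken from the text.

```python
import math

def curvature_density(omega_m, omega_lambda):
    """Curvature term from the closure relation: Omega_R0 = 1 - Omega_0 - Omega_L0."""
    return 1.0 - omega_m - omega_lambda

def recession_velocity(distance_mpc, h0=70.0):
    """Hubble law v = H0 * D (km/s for D in Mpc and H0 in km/s/Mpc).

    The default H0 is an assumed illustrative value.
    """
    return h0 * distance_mpc

def expansion_rate(z, omega_m, omega_lambda, h0=70.0):
    """H(z) in km/s/Mpc, assuming the standard matter + curvature + Lambda form."""
    omega_r = curvature_density(omega_m, omega_lambda)
    e_squared = (omega_m * (1.0 + z) ** 3
                 + omega_r * (1.0 + z) ** 2
                 + omega_lambda)
    return h0 * math.sqrt(e_squared)
```

By construction the expansion rate reduces to H0 at z = 0 for any choice of the density parameters, which is a quick check that the three terms are normalized consistently.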

As the universe expands, the gravitational attraction of the mass inside it slows this expansion. This deceleration may or may not be enough to slow this expansion sufficiently so that the universe recollapses. The case where there is sufficient matter to cause such a recollapse corresponds to a universe where the universal geometry is closed, i.e., k = +1. The case where there is not sufficient matter to force such a recollapse corresponds to two separate geometries: one in which the universal geometry is flat (k = 0) and one in which the universal geometry is open (k = −1). A universe with a flat geometry is known as Einstein–de Sitter.

The time evolution of the universal scale length a is amenable to the following simple analytic solution in the case of an Einstein–de Sitter universe where ordinary matter dominates the energy density: $a(t) = a_0 (t/t_0)^{2/3}$,

where t0 and a0 are the current age and size of the universe, respectively.
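The matter-dominated Einstein–de Sitter solution is simple enough to evaluate directly. The sketch below assumes the standard power law a ∝ t^(2/3), from which H = (da/dt)/a = 2/(3t) and hence t0 = 2/(3 H0); the function names and the conversion constant are ours.

```python
# 1/H0 expressed in Gyr for H0 = 1 km/s/Mpc (1 Mpc = 3.0857e19 km, 1 Gyr = 3.15576e16 s)
HUBBLE_TIME_GYR_PER_UNIT_H0 = 3.0857e19 / 3.15576e16

def eds_scale_factor(t, t0, a0=1.0):
    """Scale factor a(t) = a0 (t/t0)^(2/3) for a flat, matter-dominated universe."""
    return a0 * (t / t0) ** (2.0 / 3.0)

def eds_age_gyr(h0):
    """Present age t0 = 2/(3 H0) in Gyr, for H0 in km/s/Mpc."""
    return (2.0 / 3.0) * HUBBLE_TIME_GYR_PER_UNIT_H0 / h0
```

For example, `eds_age_gyr(70.0)` gives an age of a little over 9 Gyr, and `eds_scale_factor(t0 / 8, t0)` shows that the universe was a quarter of its present size at one eighth of its present age.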

For an open universe, the solution is given in terms of the following parametric equations:

while for a closed universe, the parametric equations are

The turn-around time $t_m$ for this universe occurs when $\Theta = \pi$, so from the equation above, we find

for the turn-around time.
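The turn-around condition can be checked numerically. The sketch below assumes the standard parametric solution for a closed, matter-dominated universe with no cosmological constant, a/a0 = Ω0/[2(Ω0 − 1)] (1 − cos Θ) and H0 t = Ω0/[2(Ω0 − 1)^(3/2)] (Θ − sin Θ); the name `theta` for the parameter and the function names are illustrative.

```python
import math

def closed_universe(theta, omega0):
    """Return (a/a0, H0*t) at parameter theta for a closed universe (omega0 > 1)."""
    if omega0 <= 1.0:
        raise ValueError("the closed solution requires omega0 > 1")
    scale = omega0 / (2.0 * (omega0 - 1.0)) * (1.0 - math.cos(theta))
    h0_t = omega0 / (2.0 * (omega0 - 1.0) ** 1.5) * (theta - math.sin(theta))
    return scale, h0_t

def turnaround_time(omega0):
    """H0 * t_m at maximum expansion, i.e. at theta = pi."""
    return closed_universe(math.pi, omega0)[1]
```

Under these assumptions the maximum size is a_m/a0 = Ω0/(Ω0 − 1) and the turn-around time is t_m = π Ω0 / [2 H0 (Ω0 − 1)^(3/2)], so a universe with Ω0 = 2 would reach twice its present size at H0 t_m = π.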

Requiring that the curvature be identical everywhere in space-time, the most general way of expressing the concept of distance is the Friedmann–Robertson–Walker metric:

The two-dimensional analogue to the Friedmann–Robertson–Walker metric is

which is more amenable to our everyday intuition, especially in the case of a closed k =+1 universe where

Using the solutions to the Friedmann equations, one may derive both the age of the universe and the effective distances to objects which existed at earlier epochs. By integrating up the infinitesimal times

Similarly, one may readily derive expressions for the distance, though there is one subtlety. Two different measures of distance are standardly discussed in cosmology: the angular-size distance and the luminosity distance. Both distances are defined so that the standard expressions involving these quantities apply. The former, the angular-size distance, commonly denoted $D_A$, is defined in analogy with the expression $\theta = d/D_A$,

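where $d$ is an object’s intrinsic size and $\theta$ is the angular size of the object on the sky in radians. Similarly, the latter, the luminosity distance, commonly denoted $D_L$, is defined in analogy with the inverse-square law $f = L/(4\pi D_L^2)$, where $L$ is the object’s intrinsic luminosity and $f$ is its observed flux.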

We now provide a heuristic derivation of the above equations. Imagine that the light from some object with size $d$, redshift $z_{\rm obs}$, and tangential coordinate $\chi$ converges to $\chi = 0$ and $z = 0$ on paths where $d\theta = 0$. Along this path, the expression for the metric reduces to $dl^2 = a^2 R^2 d\chi^2$. The integrated coordinate distance $\chi$ is then

It is interesting to see how novae are distributed through the Galaxy. The distribution of longitudes is remarkable: out of 161 certain galactic novae, 74 occur between longitudes 345 degrees and 15 degrees, in other words, at most 15 degrees from the galactic center. Novae are thus particularly numerous in the direction of the galactic center. As regards latitude, novae situated in the direction of the center are never very far from the galactic plane. In other directions, several novae are known that are quite a long way from the galactic plane, but very few are more than 1500 parsecs from it.

All this leads to the conclusion that novae form an intermediate Population II. However, they do not seem to form a homogeneous population. Two novae are known in globular clusters, which we know form a typical Population II. In addition, novae have been found in three elliptical galaxies, which also form a typical Population II.

Many novae have been observed in nearby galaxies: 200 have been found in M 31 alone, which is as many as, if not more than, in our own Galaxy (they are on average of magnitude 16-17). Four novae are known in the Small Magellanic Cloud, six in the Large Magellanic Cloud, and a significant number in other galaxies. In all, over 500 galactic and extra-galactic novae are now known.

Finally, it should be pointed out that the novae observed in other galaxies are roughly of the same absolute magnitude (-6 to -9) and have the same type of light curves as those in our Galaxy. These novae are sometimes used to determine the distance of the galaxies in which they are observed. The results are uncertain, however, since the relationship between the speed of decline and the absolute magnitude is not a rigorous one.

Our subjective perception of a nova is confined to light in the visual and photographic ranges of wavelength, omitting the ultraviolet. Since the ultraviolet is of varying importance, probably being of greater importance before the outburst than during it, our subjective perception tends to exaggerate the contrast between the pre-nova stage and the maximum emission of light during the outburst. The observed contrast for the visible light is usually about 10,000 to 1, but if all wavelengths are included the contrast would probably be 100 to 1.

The emission of visible light in the pre-nova stars is of a similar order to the emission of our Sun, whereas the emission at maximum outburst is of an order similar to an F8 super-giant star. A typical nova rises to its maximum in a few days and thereafter declines in brightness by a factor of about 10 in 40 days, although cases of both slower and more rapid declines are known and studied.
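The brightness ratios quoted above translate into magnitudes through the usual relation Δm = 2.5 log10(ratio). The following Python sketch performs that conversion and, under the added assumption that the decline proceeds at a constant rate in magnitudes per day, estimates how long a typical nova takes to fade by a given number of magnitudes. The function names are illustrative.

```python
import math

def magnitude_difference(brightness_ratio):
    """Magnitude difference corresponding to a brightness ratio."""
    return 2.5 * math.log10(brightness_ratio)

def days_to_fade(delta_mag, factor=10.0, days=40.0):
    """Days needed to fade by delta_mag magnitudes.

    Assumes a constant decline rate in magnitudes per day, set by a drop
    of `factor` in brightness over `days` (defaults: the typical nova above).
    """
    rate_mag_per_day = magnitude_difference(factor) / days
    return delta_mag / rate_mag_per_day
```

A contrast of 10,000 to 1 corresponds to 10 magnitudes and a contrast of 100 to 1 to 5 magnitudes. A decline by a factor of 10 in 40 days is 2.5 magnitudes in 40 days, so such a nova would take about 48 days to fade by 3 magnitudes if the rate stayed constant.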

Clouds of gas are ejected at high speeds during outbursts, speeds typically of 930 miles/sec, which is more than sufficient for the expelled gases to become entirely lost into interstellar space, together with myriad fine dust particles that condense within the gases as they cool during their outward motion. The total amount of material thus lost is estimated to be about one part in ten thousand of the total mass of the parent white dwarf star, although the amount in especially violent cases is almost surely significantly larger than this.

Novae have varied amplitudes that range from 7 to more than 19 magnitudes, but the value cannot always be determined since the star is often very faint at its minimum. Nevertheless, the amplitudes are known for 76 stars. There are two peaks in the frequencies with which they occur, one for an amplitude of nine magnitudes and the other for one of 12. However, it is probable that large amplitudes are more common but are not known.

Recurrent novae have very small amplitudes, ranging from eight to ten magnitudes. Slow novae can have large amplitudes, but novae with amplitudes greater than 13 magnitudes are mostly fast novae. Note that this figure does not include all novae and omits the largest ever, Nova Cyg 1975, whose amplitude is greater than 19 magnitudes. This star is considered by many to be an exception, intermediate between a nova and a supernova.

Much work has been done towards the establishment of absolute magnitudes. Standard methods for determining distances cannot be used at the distances of novae. Two other methods are used instead: first, measuring the intensity of interstellar lines (the intensity increases as the distance of the object increases), and second, measuring the apparent velocity of expansion of the nebulosity, which enables the distance to be known if the radial velocity of the gases has been determined.
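The second method can be written down in a few lines. The sketch below assumes the shell expands spherically, so that the measured radial velocity of the gas equals its transverse velocity, and uses the standard conversion v = 4.74 μ d (v in km/s, μ in arcseconds per year, d in parsecs). The function name and the sample numbers are hypothetical.

```python
# km/s corresponding to 1 arcsec/yr of angular motion at a distance of 1 pc
KMS_PER_ARCSEC_YR_PC = 4.74047

def expansion_parallax_distance_pc(radial_velocity_kms, angular_rate_arcsec_yr):
    """Distance in parsecs from the expansion of the nebulosity.

    Assumes spherical expansion, so the radial velocity of the gas is also
    the velocity at which the shell grows across the line of sight.
    """
    return radial_velocity_kms / (KMS_PER_ARCSEC_YR_PC * angular_rate_arcsec_yr)
```

For instance, a shell expanding at 1000 km/s whose angular radius grows by 0.2 arcsec per year would lie at roughly 1050 pc under these assumptions.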

Two groups can also be detected by considering the maxima, one with absolute magnitudes around -6 and the other around -9, and these correspond to the two groups in the distribution of amplitudes. This shows that there is a correlation between the absolute magnitude at the maximum, the speed of decline T(3), and the amplitude; the very fast novae and those of large amplitude are also those with the greatest luminosity at maximum.

All these results are corroborated by the observation of novae in the Andromeda galaxy and in the Magellanic Clouds. They also have two peaks in their frequencies of occurrence around -6 and -9, and there is also a correlation observed between T(3) and the absolute magnitude. They are therefore no different from the galactic novae.

The pre-novae are generally not very well known, which is not surprising since it is not possible to predict which stars will become novae. However, some novae were already known as variables, so a pre-nova history exists for some stars.

These stars are obviously followed more closely in their post-nova phase. Some of them have fluctuations that are occasionally appreciable, with small secondary maxima of short duration that may exceed one magnitude.

High precision photometry has revealed another type of variation that we shall find in many eruptive variables: this is “flickering,” which consists of small rapidly varying flares following each other without interruption.

RS Oph shows a semi-regular variation (P = 70 days) with an amplitude of 0.6 magnitudes. This confirms that there is an M giant in the system linked to a blue star.

In 1954 it was shown that DQ Her (Nova 1934) is an eclipsing binary with a very short period of 4h 39m. Since then all the novae bright enough to be clearly observed have been shown to be double. Duplicity is therefore a general feature.

These binaries are formed from a red star that is large but not very massive and a blue star of high density, which resembles a white dwarf. This dissimilar pair is generally closely bound and has a very short orbital period, usually a few hours.

There are several exceptions to this structure: in some pairs the red component is a giant star and in others it is a sub-giant, the pairs containing a sub-giant having a much longer orbital period.

In some of these binaries, small changes in period, which arise from variations in the two stars, have been detected. The most interesting case is that of V1500 Cyg. The period changed from 0.1410 days at the beginning of September 1975 (the time of the explosion) to 0.1399 days at the end of October and to 0.1384 days in May-June 1976. The period of the binary system may thus have been changed by the violence of the explosion.

Galaxies are much less massive than the mass scales ($\sim 10^{14}\,M_\odot$) going nonlinear in the universe today, so clearly galaxies must be more than simply virialized structures. The key seems to be the process of cooling and the time scale for the settling of baryons into the centers of their dark matter halos. There are several cases to consider. Clearly, if the cooling time of a gas is larger than the Hubble time, the gas cannot have evolved much over the history of the universe. If the cooling time is smaller than the Hubble time but larger than the dynamical time, the gas will suffer slow quasistatic collapse into the center of the virialized halo. On the other hand, if the cooling time is smaller than the dynamical time, the ambient gas will undergo runaway cooling and collapse to the center of the virialized halo. It is this case, where the cooling time is much shorter than the dynamical time scales for accretion or merging (Binney 1977, Rees and Ostriker 1977, Silk 1977), that is relevant for the formation of galaxies.

There are four important processes by which gas in halos cools: Compton cooling, free–free emission (bremsstrahlung), recombination, and collision-induced de-excitation.

We begin with a consideration of the Compton cooling process. When low-energy photons pass through a gas of non-relativistic electrons, they scatter off the electrons with the Thomson cross-section $\sigma_T = (8\pi/3)\,(e^2/m_e c^2)^2$,

where $m_e$ is the mass of the electron and $e$ is the charge of the electron. Some photons are scattered up in energy and some are scattered down, but the net effect is to slow the electrons relative to the frame of the cosmic microwave background radiation. The mean shift in photon energy per collision is

where k is Boltzmann’s constant, h is Planck’s constant, ν is the frequency of a photon, and Te is the temperature of the electron gas. In a sea of photons with temperature Tγ , the mean rate of energy loss per electron is

where a is the Stefan–Boltzmann constant. The cooling time tcool is

High-temperature ($10^6$–$10^7$ K) primordial gases are almost entirely ionized. Under these circumstances, the dominant cooling mechanism is the acceleration of electrons by the bare H$^+$ and He$^{2+}$ nuclei. This results in a cooling rate per unit volume:

The cooling time tcool here is approximately equal to

On the other hand, low-temperature ($10^4$–$10^5$ K) primordial gases are only partially ionized. Here cooling is dominated by two processes: one where electrons recombine with ions, resulting in the release of a photon (recombination), and one where electrons collide with partially ionized atoms, thereby exciting them to a state which they escape by the release of a photon. The total cooling rate can be expressed as

The latter process is the dominant one, and for primordial abundances, the function $f(T)$ can be approximated as $2.5\,(T/10^6\,{\rm K})^{-1/2}$ erg cm$^3$ s$^{-1}$. The cooling time $t_{\rm cool}$ is then

We compare these cooling time scales with the dynamical time scale $t_{\rm dyn} \sim (G\rho)^{-1/2}$. We consider a uniform spherical cloud with mass $M$ in virial equilibrium with a fraction $f$ of its mass in dissipative baryonic matter and the rest in dark, dissipationless matter, so the gas mass $M_g$ is equal to $fM$. For this mass configuration,

Now, we compare this dynamical time scale with the cooling times derived for each of the cooling processes discussed above. At early times, z > 10, the Compton cooling process dominates, and since

the dynamical time scale goes as

and the relative time scale τ goes as

Thus, Compton cooling will be important at z > 7 where τ < 1. Notice that the relative time scale does not depend upon mass, temperature, or density. Therefore, if galaxies formed at early times, they would have no preferred scales.
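The lack of any mass, temperature, or density dependence can be seen from a direct evaluation of the Compton cooling time. The sketch below assumes the standard per-electron loss rate dE/dt = 4 σT a Tγ⁴ k Te / (me c) for Te much greater than Tγ and a thermal energy of (3/2) k Te per electron, which gives t_cool = 3 me c / (8 σT a Tγ⁴). Counting the thermal energy of the ions as well would roughly double this, so the normalization is an assumption and may differ from the expression intended in the text. Constants are in CGS units.

```python
SIGMA_T = 6.6524587e-25      # Thomson cross-section, cm^2
A_RAD = 7.565723e-15         # radiation constant a, erg cm^-3 K^-4
M_E = 9.1093837e-28          # electron mass, g
C_LIGHT = 2.99792458e10      # speed of light, cm/s
T_CMB0 = 2.73                # present background temperature, K
SECONDS_PER_YEAR = 3.15576e7

def compton_cooling_time_yr(z):
    """Compton cooling time in years at redshift z (per-electron estimate).

    The electron temperature cancels, so the result depends only on the
    photon temperature T_gamma = T_CMB0 * (1 + z).
    """
    t_gamma = T_CMB0 * (1.0 + z)
    t_seconds = 3.0 * M_E * C_LIGHT / (8.0 * SIGMA_T * A_RAD * t_gamma ** 4)
    return t_seconds / SECONDS_PER_YEAR
```

With these assumptions the cooling time today is of order 10^12 yr, far longer than the age of the universe, and it shortens as (1 + z)^-4, so that it is about 4000 times shorter at z = 7.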

The relevant temperatures for lower mass halos ($\lesssim 10^{12}\,M_\odot$) are less than $10^6$ K. In this case, line cooling dominates and the relative time scale $\tau$ goes as $t_{\rm cool}/t_{\rm dyn} \propto T^{3/2}/\rho^{1/2} \propto (M^{2/3}\rho^{1/3})^{3/2}/\rho^{1/2} \propto M$. Therefore, the $\tau = 1$ line runs parallel to lines of constant mass. To determine the mass limit more precisely, we look at

We can relate this to the mass of a spherical cloud model in virial equilibrium by using the following relation from the virial theorem:

where $\mu$, the mean molecular weight, is roughly equal to half the proton mass $m_p$ since the medium is ionized. From this, it follows that $T^3 \propto \rho M^2 \propto n f^{-1} M^2$ and then that


This sets the mass limit below which gas can effectively cool to form structures. For $f \sim 1$, the mass limit is much larger than the typical limiting galaxy mass ($\sim 10^{12}\,M_\odot$), but for smaller values consistent with constraints set by big-bang nucleosynthesis ($f \sim 0.05$–$0.1$), the mass limit is comparable to it.

On the other hand, for higher mass halos ($\gtrsim 10^{12}\,M_\odot$), the relevant temperatures are greater than $10^6$ K. Here the dominant cooling mechanism is bremsstrahlung, and the relative time scale $\tau$ goes as $t_{\rm cool}/t_{\rm dyn} \propto T^{1/2}/\rho^{1/2} \propto (M^{2/3}\rho^{1/3})^{1/2}/\rho^{1/2} \propto R$, so the $\tau = 1$ line runs parallel to lines of constant radius. To determine the limiting radius more precisely, we look at

Using the spherical cloud model again, we solve for R in terms of the other variables,

and so

Hence, for $f \sim 0.1$, massive gas clouds of radii smaller than 20 kpc can efficiently cool, since $\tau \propto R$ and efficient cooling requires $\tau < 1$. Since this length is smaller than the typical cluster size, cooling is not very efficient in clusters, and therefore the gas simply suffers slow quasistatic collapse.

In the previous section, we discussed two important different regimes for virialized masses, one in which the cooling time was longer than the dynamical time and one in which it was shorter. In the former regime, one obtains galaxy clusters, where most of the gas remains hot, and in the latter regime one obtains galaxies, where much of the halo gas has apparently cooled. In either case, one can use Press–Schechter theory to calculate the mass function, and with simple assumptions about the conversion of gas into stars or other luminous objects, one can convert this into a luminosity function of galaxies.
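To make the shape of such a mass function concrete, the sketch below evaluates the Press–Schechter form dn/d ln M = (2/π)^(1/2) (ρ̄/M) ν |d ln σ/d ln M| exp(−ν²/2), with ν = δc/σ(M). It assumes a pure power-law fluctuation spectrum, σ(M) = δc (M/M*)^(−α) with α = (n + 3)/6, so that ν = (M/M*)^α; the effective spectral index `n_eff`, the characteristic mass `m_star`, and the function name are illustrative assumptions rather than values from the text.

```python
import math

def press_schechter_dn_dlnm(mass, m_star, rho_mean, n_eff=-1.0):
    """Comoving number density of halos per unit ln(mass).

    Assumes sigma(M) = delta_c * (M / m_star)**(-alpha), alpha = (n_eff + 3) / 6,
    so that nu = delta_c / sigma = (M / m_star)**alpha.
    """
    alpha = (n_eff + 3.0) / 6.0
    nu = (mass / m_star) ** alpha
    return (math.sqrt(2.0 / math.pi) * (rho_mean / mass)
            * alpha * nu * math.exp(-0.5 * nu * nu))
```

Below M* the abundance follows a power law in mass, and above M* it is cut off exponentially, which is why objects much more massive than M* are rare. Integrating M dn/d ln M over all masses recovers the mean density ρ̄ in this form.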

Perhaps the most direct comparison with observations is via the mass function of galaxy clusters. The shape, expected to be exponential plus a power law tail, fits the prediction remarkably well, to the extent that cluster masses are well determined. Three techniques are used to estimate cluster masses: galaxy velocity dispersion and distribution, hot gas temperature and distribution, and gravitational lensing maps. The first two methods assume virial equilibrium. All three methods give consistent results, to within a factor of 2 in mass. One can compare the characteristic cluster mass, determined by the fitting function

with the predicted value of $M_{nl}$ taken from field galaxy counts and a bias factor that has to be empirically deduced. Indeed, $M_{nl}$ corresponds to a typical observed cluster mass. The normalization of the cluster mass function depends both on the mean density and $\sigma(M, z)$, with an exponential sensitivity to $\sigma(M, z)$. Only five percent of galaxies are in clusters, which can therefore account for perhaps one percent of the critical density. Clusters are therefore rare objects, typically 3σ fluctuations. The number density of clusters is controlled by both the mean density and $\sigma_8$, in the combination $\sigma_8\Omega^{0.6} \approx 0.7 \pm 0.2$. The scale $8\,h^{-1}$ Mpc, corresponding to unit amplitude of the optical counts and the mass $M_8$ of a typical cluster, is used

As structures grow and collapse in the early universe, they exert tidal torques on each other, and this provides each collapsing mass with some angular momentum. This angular momentum, in turn, is important in determining the final properties of the disk and elliptical galaxies which form inside these collapsed structures.

The angular momentum of a collapsing halo can be expressed as

where $\bar{x}$ is the center of mass for the volume. Using equation (47), we express $v$ as $-a\dot{b}\nabla\Phi_0$, where $b(\tau) = D/(4\pi G\bar{\rho}a^3)$. For convenience we expand $\nabla\Phi_0$ in a Taylor series around the point

where $T_{jl} = \partial^2\Phi_0/\partial x_j\partial x_l$. Rewriting this, we get

where $I_{lk}$ is the inertia tensor.

We now estimate how $J_i$ scales. Since $I_{lk}$ scales as $a^2$ until collapse while $T_{jl}$ continues to scale as $\nabla\Phi/a^2 \sim (D/a)/a^2 \sim 1/a^2$, each structure effectively acquires angular momentum from its neighbors until collapse. Since the collapse of a structure occurs when $\delta \sim 1$, $b \sim D(\tau)/(4\pi G\bar{\rho}a^3) \sim 1/(4\pi G\bar{\rho}a^2) \sim 1/\nabla^2\Phi$ from Poisson’s equation and the relation $D(\tau) \propto a$, so $T_{jl}$ scales as $\nabla^2\Phi_0 \sim 1/b$. The inertia tensor $I$ scales as $MR_0^2 \sim MR^2/a^2$. Hence,

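It is standard to construct a dimensionless quantity which characterizes the angular momentum that each collapsed mass has acquired via tidal torques. This quantity is called the dimensionless angular momentum $\lambda$, and it can be expressed as $\lambda = J|E|^{1/2}/(GM^{5/2})$, where $J$ is the angular momentum, $E$ the total energy, and $M$ the mass of the collapsed object.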

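Noting that $|E| \sim M^2/R \sim M^2(\rho/M)^{1/3} \sim M^{5/3}(\Omega H^2)^{1/3} \sim \Omega^{1/3}M^{5/3}t^{-2/3}$, we see that $\lambda \sim |J||E|^{1/2}M^{-5/2} \sim \Omega^{-0.07}t^{1/3}M^{5/3}\,\Omega^{1/6}t^{-1/3}M^{5/6}\,M^{-5/2} \sim \Omega^{0.1}$.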

Therefore, the distribution of dimensionless angular momenta λ is essentially independent of a halo’s mass, collapse time, or even the basic world model. N-body simulations (Warren et al 1992, Cole and Lacey 1996, Catelan and Theuns 1996) and analytical treatments (Steinmetz and Bartelmann 1995) find a distribution which is well fitted by the expression

where $\bar{\lambda} = 0.05$ and $\sigma_\lambda = 0.5$.
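A distribution with these two parameters can be explored numerically. The sketch below assumes the fitted expression is the log-normal form commonly used for halo spins, p(λ) dλ = [1/((2π)^(1/2) σλ)] exp[−ln²(λ/λ̄)/(2σλ²)] dλ/λ, with λ̄ acting as the median; that functional form and the function names are assumptions on our part.

```python
import math
import random

def spin_pdf(lam, lam_bar=0.05, sigma=0.5):
    """Assumed log-normal probability density of the spin parameter."""
    x = math.log(lam / lam_bar)
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma * lam)

def sample_spins(n, lam_bar=0.05, sigma=0.5, seed=0):
    """Draw n spin parameters from the assumed log-normal distribution."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(lam_bar), sigma) for _ in range(n)]
```

Under this form half of all halos have λ below 0.05, and only a small tail reaches the values of 0.4 to 0.5 characteristic of centrifugally supported disks.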

Disk galaxies make up the dominant component of the local galaxy census. Disk galaxies are known to have exponential profiles ($I(r) \propto \exp(-r/r_d)$), to have significant fractions of dust and stars, to still be undergoing some star formation, and to be rotationally supported. They are extremely flattened objects and can appear very elongated if viewed edge-on. They also frequently have long bars and spiral structures. It is because of this latter feature that these galaxies are often called spiral galaxies.
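For an exponential profile the light enclosed within a given radius has a closed form, which the short sketch below evaluates. It assumes a face-on, infinitely thin disk, for which the enclosed fraction is F(x) = 1 − (1 + x) e^(−x) with x = r/r_d, and it finds the half-light radius by bisection. The function names are illustrative.

```python
import math

def enclosed_light_fraction(x):
    """Fraction of total light within r = x * r_d for I(r) proportional to exp(-r/r_d)."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def half_light_radius_in_scale_lengths(tol=1e-10):
    """Solve enclosed_light_fraction(x) = 0.5 by bisection."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if enclosed_light_fraction(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The half-light radius comes out at about 1.68 scale lengths, and more than 90 percent of the light lies within four scale lengths.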

Most of the global disk properties, e.g., mass, luminosity, size, and metallicity, define a two-dimensional manifold with little scatter about that manifold. It is better known in terms of its two-dimensional projections, in particular the Tully–Fisher relation between luminosity and circular velocity. There are two main views on this tight relationship: one in which it is a consequence of self-regulating mechanisms for star formation in disks (e.g. Silk 1997) and one in which it is simply the consequence of the cosmological equivalence of mass and circular velocity (e.g. Mo et al 1998).

In the past, disk galaxies were thought to have surface brightnesses tightly distributed around 21.65 $b_J$ mag arcsec$^{-2}$ (Freeman 1970). Shortly after this claim was made, arguments were made that there was a strong selection bias against low surface brightness galaxies and that in reality the spread in surface brightness extended to much lower values (Disney 1976). Recently, there have been a large number of efforts to quantify the bivariate luminosity–surface brightness distribution (de Jong 1996, McGaugh 1996, Dalcanton et al 1997, Sprayberry et al 1997, de Jong and Lacey 1999). While the results differ somewhat in their details, they suggest that the surface brightness distribution of galaxies peaks around 22 $b_J$ mag arcsec$^{-2}$ with a spread of $\sim 1$–$1.5$ mag arcsec$^{-2}$. The luminosity function of spiral galaxies is also nicely described by the Schechter function.
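For reference, the Schechter function mentioned here has the standard form φ(L) dL = φ* (L/L*)^α exp(−L/L*) dL/L*, a power law at the faint end with an exponential cutoff above L*. A minimal Python sketch follows; the parameter names are illustrative and no particular fitted values are implied.

```python
import math

def schechter(lum, phi_star, lum_star, alpha):
    """Number density of galaxies per unit luminosity at luminosity `lum`.

    phi_star sets the normalization, lum_star the characteristic luminosity,
    and alpha the faint-end slope (typically negative).
    """
    x = lum / lum_star
    return (phi_star / lum_star) * x ** alpha * math.exp(-x)
```

A steeper (more negative) faint-end slope raises the relative number of faint galaxies without changing the exponential cutoff at the bright end.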

The typical values of the dimensionless angular momentum of collapsed halos ($\sim 0.05$) are considerably smaller than those of the largely flattened, centrifugally supported disk galaxies we observe in our universe today ($\sim 0.4$–$0.5$), so considerable dissipation must occur to produce these disks. Without the presence of dissipationless dark matter, the collapse would proceed in such a way that the total angular momentum $J$ and the total mass $M$ would be conserved, but the energy would scale as $1/R$ where $R$ is the collapse factor, so that $\lambda \propto J|E|^{1/2}M^{-5/2} \propto 1/\sqrt{R}$. The disk would then need to collapse by a factor of $(0.5/0.05)^2 \sim 100$ to obtain its observed dimensionless angular momentum, and this would take longer than the age of the universe for a 10-kpc disk! However, if the gas cloud collapses inside a dark matter halo, for which it represents only a fraction $f$ of the mass, then the angular momentum $J$ and mass $M$ would scale by a factor $f$ and the energy $E$ would scale by a factor $f^2$, so that $\lambda \propto J|E|^{1/2}M^{-5/2} \propto 1/(f^{1/2}R^{1/2})$. For a typical estimate of the baryon fraction, $f \sim 0.1$, the gas cloud would then only need to collapse by a factor of 10, easily accommodated in current theories. Despite the simplicity of this picture, difficulties remain: a significant portion of the available gas cools to form galaxies at high redshift, and detailed simulations which follow the evolution and merging of these galaxies into larger and larger systems produce disks whose sizes are much smaller than those observed (Steinmetz and Navarro 1999) because of substantial angular momentum transfer from the baryons to their dissipationless halos.
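The arithmetic behind these two collapse factors follows from the scaling λ ∝ 1/(f^(1/2) R^(1/2)): raising the spin parameter from its halo value to its disk value requires the radius to shrink by a factor f (λ_disk/λ_halo)². The sketch below simply evaluates that expression; the function name is illustrative, and f = 1 reproduces the case without a dark halo.

```python
def required_collapse_factor(lambda_halo, lambda_disk, baryon_fraction=1.0):
    """Factor by which the gas must shrink in radius to reach lambda_disk.

    Follows from lambda proportional to 1 / sqrt(f * R); baryon_fraction = 1
    corresponds to a cloud collapsing without a dark matter halo.
    """
    return baryon_fraction * (lambda_disk / lambda_halo) ** 2
```

With λ rising from 0.05 to 0.5 this gives a factor of 100 for f = 1 and a factor of 10 for f = 0.1, matching the estimates in the text.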

A non-negligible fraction of stars end their lives as supernovae, injecting much of their energy into the ambient gaseous medium. This energy serves to heat the gas, either expelling it from the star-forming environment or making it too hot to be conducive to star formation. Hence, the formation of stars serves to suppress further star formation and thereby regulates itself. This process is quite logically called feedback. The presence of feedback, particularly in disk galaxies, explains why the conversion of gas into stars frequently requires tens to hundreds of dynamical time scales ($\sim 10^{10}$ yr) instead of just several dynamical time scales ($\sim 10^8$ yr).

Feedback also provides the preferred explanation for the flattening of the luminosity function relative to the mass function at low masses (see the section above on the galaxy luminosity function). Dwarf galaxy potential wells are shallow, and interstellar gas is readily energized above the escape velocity and therefore blown out in a galactic wind. Evidence for galactic winds is commonly found for starburst galaxies, often of relatively low mass.

Ellipticals make up the other principal component of the local galaxy census. Ellipticals possess elliptical isophotes with projected ellipticities $\epsilon = 1 - b/a$ ($a$ being the major axis and $b$ the minor) ranging from 0 to 0.7, the former being denoted an E0 and the latter an E7. Low-redshift ellipticals possess an abundance of low-mass stars and are therefore very red. The lack of short-lived blue stars is generally taken as an indication that these galaxies are very old and have not formed stars for at least 5–10 Gyr. Like spirals, the luminosity function for ellipticals can also be described by a Schechter function, but with a much shallower faint-end slope (Bromley et al 1998, Folkes et al 1999). Unlike spirals, ellipticals are predominantly found in dense regions, i.e., galaxy clusters (Dressler 1980).
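The E0 to E7 labels follow directly from the axis ratio. The sketch below takes ellipticity in the standard sense, ε = 1 − b/a, which is the definition under which the range 0 to 0.7 maps onto E0 through E7, and assigns the class En with n equal to 10ε rounded to the nearest integer. The function names are illustrative.

```python
def ellipticity(major_axis, minor_axis):
    """Projected ellipticity, 1 - b/a, from the major and minor axes."""
    if minor_axis > major_axis or minor_axis <= 0.0:
        raise ValueError("expected 0 < minor_axis <= major_axis")
    return 1.0 - minor_axis / major_axis

def hubble_class(major_axis, minor_axis):
    """Hubble class label En, with n = 10 * ellipticity rounded to an integer."""
    return "E%d" % round(10.0 * ellipticity(major_axis, minor_axis))
```

A circular isophote is classified E0, while an isophote whose minor axis is three tenths of its major axis is classified E7.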

Ellipticals are known to have approximately de Vaucouleurs surface brightness profiles: $I(r) \propto \exp[-7.67\,(r/r_e)^{1/4}]$,

where $r$ is the radius and $r_e$ is the half-light radius. To higher order, the surface brightness profiles of ellipticals show an important dichotomy. Some ellipticals, known as disky ellipticals, appear to have power-law profiles all the way into their center, and other ellipticals, known as boxy ellipticals, exhibit a sharp break from this power law at some radius near the center.

Like spirals, the global structural properties of ellipticals are known to populate a two-dimensional manifold, commonly known as the fundamental plane. Its projections are known by various names: the Faber–Jackson (Faber and Jackson 1976) relationship ($L \propto \sigma^4$), the Kormendy luminosity–radius (Kormendy 1977) relationship, and the $D_n$-$\sigma$ (Dressler et al 1987) relationship. It has largely been agreed that the fundamental plane is essentially a consequence of the virial theorem and a relatively homologous formation scenario where the mass-to-light ratio varies as a small power of the mass ($M/L \propto M^{1/6}$).

There are two prevailing scenarios for the formation of elliptical galaxies: one in which ellipticals formed as the result of mergers from spiral galaxies and one in which ellipticals formed at high redshift from monolithic collapse. We begin by presenting the monolithic collapse scenario.

Monolithic collapse

One possible mechanism for the formation of elliptical galaxies is the early formation of stars from the gas collapsing onto the center of a dark halo. Early collapse and fragmentation into stars prior to the collapse of the halo can constitute the core of the elliptical, while stars formed from the secondary gas infall can constitute the shallower wings. An examination of the relationship between velocity dispersion and rotation rate demonstrates that ellipticals are essentially pressure-supported and that rotational flattening is not important in imparting ellipticity to these galaxies. In fact, detailed comparisons show that the dimensionless angular momenta of slow-rotating ellipticals seem to be no larger than 0.05. In order to obtain the typical mass $M$ ($\sim 10^{11}\,M_\odot$), radius $R$ ($\sim 10$ kpc), and angular momentum without recourse to dissipation, it would be necessary for the halo to collapse at redshifts beyond 10. On the other hand, with dissipation, one could easily obtain galaxies of the desired mass and radius, but the dimensionless angular momentum would be too large (unless the initial angular momentum for the halo just happened to be particularly small), and the galaxy would resemble a disk.

Merger-based origin

Another mechanism for the formation of ellipticals is the merging of spiral galaxies. This mechanism provides a natural way of resolving the angular momentum problem: since the spin angular momenta of the colliding disks are randomly oriented with respect to each other, the resultant spin angular momentum of the elliptical that forms can be considerably smaller than that of the colliding disks. There are a number of other attractive features to this scenario. First, there are numerous examples of disk galaxies merging to form objects with de Vaucouleurs profiles in the local universe (Schweizer 1982, 1986), and it is quite conceivable that mergers were more frequent in the past. Second, nearly half of elliptical galaxies (Malin and Carter 1983, Schweizer and Ford 1984) possess features, such as shells or other sharp features, indicative of mergers or an otherwise violent formation. Third, detailed N-body simulations of collisions between disk galaxies embedded in dark halos produce galaxies with de Vaucouleurs profiles similar to those found in nearby ellipticals. Fourth, the globular cluster populations around ellipticals have bimodal metallicity distributions, indicative of a multi-stage formation scenario (Ashman and Zepf 1992, Zepf and Ashman 1993). All these features point toward the conclusion that at least some ellipticals formed by merging.
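
The effect of random spin orientations can be illustrated with a small Monte Carlo experiment. The sketch below adds two spin vectors of equal (unit) magnitude with independent, isotropically drawn directions and compares the mean magnitude of the sum with the aligned value of 2. It deliberately ignores the orbital angular momentum of the encounter, which a real merger would also contribute, and all function names are illustrative.

```python
import math
import random


def random_unit_vector(rng: random.Random) -> tuple:
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)            # cos(polar angle) is uniform
    phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuth is uniform
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def mean_resultant_spin(n_trials: int = 100_000, seed: int = 0) -> float:
    """Mean |J1 + J2| for two unit spins with random relative orientation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        a = random_unit_vector(rng)
        b = random_unit_vector(rng)
        total += math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b)))
    return total / n_trials


mean_spin = mean_resultant_spin()
print(f"Mean resultant spin:          {mean_spin:.3f}")
print(f"Fraction of the aligned case: {mean_spin / 2.0:.3f}")
```

For isotropic orientations the expected magnitude is 4/3, about two thirds of the aligned value, with a tail of nearly antiparallel encounters in which the spins almost cancel.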

Before discussing the relative merits of the two formation scenarios, it is worth looking at several difficulties that arise only in the merging scenario because of the close relationship between ellipticals and their progenitors (spirals). First, the energy per particle and the phase space density are higher at the centers of ellipticals than in any observed spiral. If this scenario is correct, the merging process must therefore be accompanied by a great deal of gas dissipation and cooling, both to form a much deeper central potential and to reach the high phase space density observed there. In fact, nuclear starbursts are frequently observed to accompany such mergers (Schweizer 1990). Second, the number of globular clusters (compact star clusters of ∼ 10^4–10^5 M⊙) per unit luminosity is typically 4–10 times larger for ellipticals than for spirals (van den Bergh 1990), so disk–disk mergers must also form a large number of globular clusters. Finally, while ellipticals might be expected to show relative alpha-to-iron abundances typical of spirals, they in fact show significantly higher abundances of alpha elements relative to iron, so a substantial fraction of the stars present in ellipticals must have formed in the merger events themselves.

A comparative evaluation

The principal observational differences between the monolithic collapse and merger scenarios for elliptical formation concern their predictions for the formation history of ellipticals. Monolithic scenarios tend to form elliptical galaxies at very high redshifts (z > 3), while the elliptical population builds up more gradually in hierarchical scenarios.

Consequently, merger scenarios, with their more diverse and more recent formation histories, predict more scatter in both the colour–magnitude relationship and the fundamental plane than monolithic collapse scenarios. Observationally, ellipticals show a high degree of uniformity, both in the small scatter of their colour–magnitude relationship, σ(U − V) = 0.15 (Bower et al 1992), and in their tightness around the fundamental plane (Renzini and Ciotti 1993). This observed tightness about the fundamental plane extends to z ∼ 1 (Aragon-Salamanca et al 1993, Stanford et al 1998). The observed tightness supports a monolithic collapse scenario in which ellipticals form early and somewhat coevally. Of course, in hierarchical scenarios, most galaxies assemble quite early (z ∼ 2) in the rich clusters, where the most compelling examples of tight fundamental planes are observed, so the apparent difficulties with this scenario are not as strong as they might first seem (Kauffmann and Charlot 1998a).

Because of the different formation times for ellipticals, these scenarios also yield markedly different predictions for the evolution in the number density of early-type galaxies as a function of redshift. An increasing number of studies report a decline in the number and luminosity of ellipticals at high redshift relative to the local universe (Kauffmann and Charlot 1998b, Kauffmann et al 1996, Zepf 1997, Barger et al 1999, Menanteau et al 1999), as would be expected in a hierarchical scenario where their formation is more gradual, but these results remain somewhat controversial (Broadhurst and Bouwens 1999, de Propris et al 2000).

Another important difference between these scenarios is the star formation rates they predict at high redshift. In the hierarchical scenario, galaxies start out small and slowly build up to the massive entities we observe in the universe today, so large star formation rates are not expected at early times except possibly when two galaxies merge. In the monolithic scenario, on the other hand, ellipticals must sustain huge star formation rates (∼ 100 M⊙/yr) at high redshift to form the ∼ 10^11 M⊙ of stars observed in nearby giant elliptical galaxies, since only a period of ∼ 10^9 yr is available. In fact, very few galaxies with such star formation rates have been found in either emission-line searches or Lyman-dropout searches at moderate redshifts (1 < z < 5), pointing to either a high redshift of formation or dust enshrouding. Recently, however, SCUBA results and subsequent follow-up work have revealed a population of ultraluminous infrared galaxies at moderate redshifts with star formation rates high enough (∼ 100 M⊙/yr) to match those needed in a monolithic collapse scenario. Nevertheless, the exact nature of this population, its number density, and its relevance remain unclear.
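
The star formation rate quoted for the monolithic scenario follows from simple arithmetic, which the short sketch below spells out. The two input values are the ones given in the text; the function name is illustrative, and the calculation assumes a constant rate over the whole period.

```python
# Back-of-the-envelope star formation rate for monolithic collapse.
STELLAR_MASS_MSUN = 1e11   # stellar mass of a giant elliptical (from the text)
FORMATION_TIME_YR = 1e9    # time available at high redshift (from the text)


def mean_star_formation_rate(stellar_mass_msun: float,
                             duration_yr: float) -> float:
    """Average rate (Msun per year) needed to build the stellar mass
    within the given duration, assuming a constant rate."""
    if duration_yr <= 0:
        raise ValueError("duration_yr must be positive")
    return stellar_mass_msun / duration_yr


sfr = mean_star_formation_rate(STELLAR_MASS_MSUN, FORMATION_TIME_YR)
print(f"Required mean star formation rate: {sfr:.0f} Msun/yr")
```

A shorter formation period or a burst-like history would require correspondingly higher peak rates.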


Many spiral galaxies feature a bulge, or spheroid, at their centers. Spheroids resemble elliptical galaxies in many important respects, including their overall appearance and placement in the fundamental plane. This suggests that bulges are nothing but elliptical galaxies onto which gas has later accreted. Note, however, that unlike ellipticals, in particular ellipticals with boxy isophotes, many bulges show considerable rotational flattening (Davies et al 1983, Davies 1987). This is in agreement with what one might expect from dissipational collapse and, in particular, from the formation of bulges via disk instabilities (van den Bosch 1998).

While there are many things we do not understand about galaxy formation, many pieces of the picture now seem to be clear. Galaxies seem to form in a homogeneous, isotropic universe that is expanding according to Friedmann’s equations. Inflation, though not unique, appears to be a relatively successful way of producing the scale-free spectrum of density fluctuations out of which galaxies have formed. Growth of the fluctuations can be followed initially with linear growth theory and later using a spherical collapse model. Press–Schechter theory provides a relatively successful way of putting these ingredients together to predict the mass spectrum of collapsed objects. The relative magnitudes of the cooling and dynamical time scales are important for determining the mass range of galaxies: galaxies form when the cooling time is smaller than the dynamical time. Disk galaxies form from the cooling of gas onto the centers of collapsed halos, the gas settling into a disk supported by its angular momentum. Elliptical galaxies, on the other hand, seem to form by disk–disk mergers or by gas cooling within a halo of low intrinsic angular momentum (monolithic collapse).

Many important questions remain. For example, what is the relative importance of different mechanisms for the formation of both ellipticals and bulges? How do the sizes, luminosities, star formation rates, number densities, and metallicities of various galaxy types evolve over the history of the universe? What mechanisms are responsible for the tight correlations among the global properties of ellipticals and of spirals? While theoretical simulations are becoming increasingly sophisticated, the inherent nonlinearity of galaxy formation processes makes the role of new observations paramount. To give the reader a taste of the improvements we will see in the next ten years in probing galaxy formation in the most remote regions of the universe, in figure 2 we have included some simulations for a hierarchical merging model using two current generation instruments (WFPC2 and NICMOS) and two future generation instruments (ACS and NGST). The obvious increase in depth will clearly bring our already moderately mature understanding of galaxy formation further into focus.