Derivation of the partition function
From Academic Kids

The partition function provides a link between the microscopic properties of atoms and molecules (e.g. size, shape and characteristic energy levels) and the bulk thermodynamic properties of matter. In order to understand the partition function, how it can be derived, and why it works, it is important to recognize that these bulk thermodynamic properties reflect the average behavior of the atoms and molecules. For example, the pressure of a gas is really just the average force per unit area exerted by its particles as they collide with the container walls. It doesn't matter which particular particles strike the wall at any given time or even the force with which a given particle strikes the wall. In addition it is not necessary to consider the fluctuations in pressure as different numbers of particles hit the walls, since the magnitude of these fluctuations is likely to be extremely small. Only the average force produced by all the particles over time is important in determining the pressure. Similarly for other properties, it is the average behavior that is important. The partition function provides a way to determine the most likely average behavior of atoms and molecules given information about the microscopic properties of the material.
In order to derive the partition function, consider a system composed of N molecules. Although the system has a constant total energy of E, the energy may be distributed among the molecules in any number of ways. As molecules interact, the energy is continually redistributed. Energy is exchanged not only between molecules but also among the various modes of motion (e.g. rotation and vibration). Instead of attempting to determine the energy of each individual molecule at every instant in time, we focus on the population of each energetic state. In other words, we would like to determine how many molecules, n_{i}, are on average in a particular energetic state, E_{i}. Over time the population of each state remains almost constant, although the individual molecules in each state may change at every collision.
In order to proceed we assume the ergodic hypothesis. This means that we assume that all states corresponding to a given energy are equally probable. (If there are other conserved quantities, such as particle number or charge, the assumption becomes that all states corresponding to a given energy and a given value of each conserved quantity are equally probable, and a similar derivation would lead to chemical potentials, electric potentials and the like.) For example, vibrational states of a given energy are just as likely to be populated as rotational or electronic states of the same energy. We also assume that the molecules are independent in the sense that the total energy of the system is equal to the sum of the energies of each individual particle.
At any instant there will be n_{0} molecules in the state with energy E_{0}, n_{1} with E_{1}, and so on. The complete specification of populations n_{0}, n_{1},... for each energy state gives the instantaneous configuration of the system. For convenience we may write a particular configuration as {n_{0}, n_{1},...}. We'll also take E_{0} to correspond to the lowest energy level or the ground state.
A large number of configurations are possible. For instance, one possible configuration is {N,0,0,...}, with all of the molecules in the ground state, E_{0}. Another possible configuration is {N−1,1,0,...}, where one of the molecules is in the excited state, E_{1}. Of these two configurations, the second is much more likely, since any of the N molecules could be the one in the excited state, resulting in a total of N possible arrangements of molecules. On the other hand, there is only one possible way to get the first configuration, since all of the molecules must be in the ground state. If the system were free to fluctuate between these two configurations, we would expect to find it most frequently in the second, especially for large values of N. Since the system would most often be found in the second configuration, we would also expect the characteristics of the system to be dominated by the characteristics of that configuration.
The number of arrangements, W, corresponding to a given configuration {n_{0}, n_{1},...} is given by:
 <math>W = \frac{N!}{n_0!\,n_1!\,n_2!\ldots}\qquad\qquad\qquad(1)</math>
This expression comes from combinatorics (and is applied in probability theory) and corresponds to the number of distinguishable ways N objects can be sorted into bins with n_{i} objects in bin i.
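As a quick check of Equation (1), the following Python sketch counts the arrangements for the two configurations discussed above, plus one more evenly spread configuration. The function name `arrangements` and the choice N = 10 are illustrative assumptions, and the energy constraint is ignored here.

```python
from math import factorial

def arrangements(populations):
    """Number of arrangements W = N! / (n_0! n_1! ...) for one configuration."""
    total = sum(populations)          # N, the total number of molecules
    w = factorial(total)
    for n in populations:
        w //= factorial(n)            # each partial quotient is an exact integer
    return w

N = 10  # illustrative value, not from the article
print(arrangements([N, 0, 0]))        # all molecules in the ground state
print(arrangements([N - 1, 1, 0]))    # one molecule in the first excited state
print(arrangements([4, 3, 3]))        # a more evenly spread configuration
```

The first configuration can be formed in only one way and the second in N ways, in line with the argument above, while the more evenly spread configuration has far more arrangements than either.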
When working with large numbers it is often convenient to work with ln(W) instead of W itself. For this case:
 <math>\begin{matrix}\ln W
&=& \ln \frac{N!}{n_0!\,n_1!\,n_2!\ldots} \qquad\qquad\qquad \\ \\ &=& \ln N! - \ln(n_0!\,n_1!\,n_2!\ldots) \qquad\qquad \\ \\ &=& \ln N! - \sum_{i=0}^m \ln n_i! \qquad\qquad(2)\end{matrix}</math>
Applying Stirling's approximation,
 <math>\ln n! \approx n\ln n - n</math>
and the fact that
 <math>N = \sum n_i</math>
gives
 <math>\begin{matrix}\ln W
&=& N \ln N - N - \sum (n_i \ln n_i - n_i) \qquad\qquad \\ \\ &=& N \ln N - \sum (n_i \ln n_i) \qquad\qquad(3)\end{matrix}</math>
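To get a feel for how good the Stirling form is, the sketch below compares Equation (3) with the exact value of ln W from Equation (2), using `lgamma(n + 1)` for ln n!. The populations (in the ratio 5:3:2) and the function names are illustrative assumptions.

```python
from math import lgamma, log

def ln_w_exact(populations):
    """Exact ln W from Equation (2), using lgamma(n + 1) = ln(n!)."""
    n_total = sum(populations)
    return lgamma(n_total + 1) - sum(lgamma(n + 1) for n in populations)

def ln_w_stirling(populations):
    """Approximate ln W from Equation (3); empty states contribute nothing."""
    n_total = sum(populations)
    return n_total * log(n_total) - sum(n * log(n) for n in populations if n > 0)

def relative_error(scale):
    """Relative error of Equation (3) for populations in the ratio 5:3:2."""
    config = [5 * scale, 3 * scale, 2 * scale]   # hypothetical populations
    exact = ln_w_exact(config)
    return (ln_w_stirling(config) - exact) / exact

for scale in (10, 1000, 100000):
    print(scale, relative_error(scale))
```

The relative error shrinks as the populations grow, which is why the approximation is harmless for the astronomically large values of N met in practice.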
We showed previously that the configuration {N−1,1,0,...} dominates {N,0,0,...} because there are more ways to obtain it. We would expect there to be other configurations that dominate both of these. In fact we would expect the configuration with the largest value of W to dominate all other configurations. We can find this dominant configuration by finding the maximum of the function W with respect to n_{i}. We know that when W is a maximum then ln(W) is also a maximum, so for convenience we will instead try to find the maximum of ln(W).
One way to find the maximum of ln(W) is to solve the equation:
 <math> \frac{\partial \ln(W)}{\partial n_i} = 0\qquad\qquad\qquad(4)</math>
However, Equation (4) applies to the situation in which any arbitrary configuration {n_{0}, n_{1},...} is possible. In reality there are a few constraints on the system that must be accounted for. First, since the total number of molecules is fixed at N, not all values of n_{i} can be arbitrary. Instead only configurations in which:
 <math>N = \sum n_i\qquad\qquad\qquad(5)</math>
are possible. Also, the total energy of the system is fixed at E. Therefore, since the total energy is the sum of the energies of all the individual molecules:
 <math>E = \sum n_i E_i\qquad\qquad\qquad(6)</math>
We can find the maximum of ln(W) subject to the constraints on N and E expressed in equations (5) and (6) using the method of Lagrange multipliers as follows. First, we must rearrange the constraint equations as:
 <math>N - \sum n_i = 0 \quad\mbox{and}\quad E - \sum n_i E_i = 0\qquad\qquad\qquad(7)</math>
Next, we create a new function by multiplying the constraints by the arbitrary constants α' and β, and adding them to the original function, ln(W), to get:
 <math>\begin{matrix}f(n_i)
&=& \ln(W) - \alpha'\left(N - \sum n_i\right)
+ \beta\left(E - \sum n_i E_i\right)\qquad\qquad\qquad \\ \\
&=& N \ln N - \sum n_i\ln n_i
 - \alpha'\left(N - \sum n_i\right) + \beta\left(E - \sum n_i E_i\right)\qquad(8) \end{matrix}</math>
Taking the derivative of Equation (8) and setting the result to zero gives:
 <math>\frac{\partial f(n_i)}{\partial n_i} = -\left(1 + \ln(n_i)\right) + \alpha' - \beta E_i = 0\qquad\qquad\qquad(9)</math>
We define a new parameter α = α' − 1, giving:
 <math>-\ln\left(n_i\right) + \alpha - \beta E_i = 0</math>
Solving this for n_{i} gives the most probable population of state E_{i}:
 <math>n_i = \exp(\alpha - \beta E_i)\qquad\qquad\qquad(10)</math>
Finally, we must evaluate the constants α and β. Substituting Equation (10) back into Equation (5) and solving for exp(α) gives:
 <math> N = \sum n_i = \exp(\alpha)\sum\exp(-\beta E_i)</math>
 <math> \exp(\alpha)= \frac{N}{\sum\exp(-\beta E_i)}</math>
Changing the summation subscript to j and substituting this result back into Equation (10) gives the Maxwell-Boltzmann distribution:
 <math>n_i = \frac{N \exp(-\beta E_i)}{\sum_j\exp(-\beta E_j)}\qquad\qquad\qquad(11)</math>
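As a rough numerical check that Equation (11) maximizes ln W under the constraints of Equations (5) and (6), the sketch below starts from Boltzmann populations and shifts them in a way that conserves both N and E. The three evenly spaced energy levels, the value of β, the total population and the function names are all illustrative assumptions. ln W is evaluated in the Stirling form of Equation (3).

```python
from math import exp, log

def ln_w(populations):
    """ln W in the Stirling form of Equation (3): N ln N - sum n_i ln n_i."""
    n_total = sum(populations)
    return n_total * log(n_total) - sum(n * log(n) for n in populations if n > 0)

def boltzmann_populations(energies, beta, n_total):
    """Populations from Equation (11) for the given energy levels."""
    weights = [exp(-beta * e) for e in energies]
    z = sum(weights)
    return [n_total * w / z for w in weights]

energies = [0.0, 1.0, 2.0]   # hypothetical evenly spaced levels, arbitrary units
beta = 0.7                   # hypothetical value, in inverse energy units
base = boltzmann_populations(energies, beta, 30000.0)

def shifted(step):
    """Move molecules by (+1, -2, +1) * step; N and E are both unchanged."""
    return [base[0] + step, base[1] - 2 * step, base[2] + step]

for step in (-200.0, -50.0, 0.0, 50.0, 200.0):
    print(step, ln_w(shifted(step)))
```

The unshifted populations (step 0) give the largest ln W, and the value falls off as the configuration moves away from the Boltzmann distribution in either direction.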
The Boltzmann distribution gives the most probable energy distribution of molecules in a system. It can further be shown that β = 1/kT, where k is Boltzmann's constant and T is the absolute temperature (given in kelvins). The term in the denominator is called the partition function and is defined as follows:
 <math>Z = \sum_j \exp\left(-\frac{E_j}{kT}\right)</math>
The partition function provides a measure of the total number of energetic states that are accessible at a particular temperature and can be related to many different thermodynamic properties (see Statistical mechanics).
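The following Python sketch evaluates the partition function and the Boltzmann populations for a small set of energy levels. The four levels spaced 2e-21 J apart, the temperatures and the function names are illustrative assumptions, not values from the derivation.

```python
from math import exp

K_B = 1.380649e-23  # Boltzmann constant in J/K

def partition_function(energies, temperature):
    """Z = sum_j exp(-E_j / kT) for energies in joules and T in kelvins."""
    beta = 1.0 / (K_B * temperature)
    return sum(exp(-beta * e) for e in energies)

def populations(energies, temperature, n_molecules):
    """Most probable populations n_i = N exp(-E_i / kT) / Z."""
    beta = 1.0 / (K_B * temperature)
    z = partition_function(energies, temperature)
    return [n_molecules * exp(-beta * e) / z for e in energies]

# Hypothetical system: four levels spaced 2e-21 J apart, ground state at zero.
levels = [i * 2.0e-21 for i in range(4)]

for t in (100.0, 300.0, 3000.0):
    print(t, partition_function(levels, t), populations(levels, t, 1000.0))
```

At low temperature Z approaches 1, since only the ground state is accessible, and at high temperature it approaches the number of levels (4 here). This matches the reading of Z as a measure of how many states are accessible.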
Canonical ensemble derivation
The previous derivation is too restricted. Because it assumes that the molecules are independent, it strictly applies only to ideal gases. In the following derivation, we assume instead that the system is immersed in a "heat bath" environment, with no transfer of matter across the boundary. The environment is assumed to be so much larger than the system that a flow of heat (energy) across the boundary that is large on the scale of the system produces only a negligible change in the environment.
Let's call the system S and the environment V, with energies E_{S} and E_{V} respectively. Conservation of energy tells us that the total E = E_{S} + E_{V} is constant. Suppose the i^{th} state of the system has energy E_{i}. (The argument for a continuum of states is similar.) Because the environment is so huge, even if its energy spectrum is discrete, the spacing between its energy levels is so small that, for all intents and purposes, we can treat it as a continuum. So, let W(E_{V}) be the number of environmental states with energies between E_{V} and E_{V} + dE, divided by dE.
It is a general statistical observation that, at least for tiny changes in energy, W tends to vary exponentially with energy. Because the environment is so huge, even an energy transfer that is extremely large for the system is still tiny for the environment. So, to first order in E_{S},
 <math>\ln[W(E_V)] = \ln[W(E-E_S)]\approx \ln[W(E)]-E_S\left.{d \ln[W]\over dE'}\right|_{E'=E}</math>
as E_{S} varies over the entire range of energies the system can take without having astronomically tiny probabilities.
So, essentially, assuming ergodicity, the probability of the system being in state i is proportional to
 <math>W(E-E_i)</math>
which in turn is proportional to
 <math>e^{-E_i\left.{d \ln[W]\over dE'}\right|_{E'=E}}</math>.
Let's call <math>\left.{d \ln[W]\over dE'}\right|_{E'=E}</math> β. Note that this is purely a property of the environment. Then,
 <math>P_i\propto e^{-\beta E_i}</math>
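The first-order expansion of ln W can be checked numerically on a toy environment. The sketch below assumes, purely for illustration, that the environment is a collection of M identical oscillators sharing indivisible energy quanta, so that the number of environmental states with q quanta is the binomial coefficient C(q + M − 1, q). This model, the values of M and Q, and the function names are assumptions, not part of the derivation above.

```python
from math import lgamma

M = 1_000_000   # oscillators in the toy environment (assumed)
Q = 2_000_000   # total energy quanta E of system plus environment (assumed)

def ln_w(quanta, oscillators=M):
    """ln of the number of ways to share `quanta` among the oscillators,
    i.e. ln C(quanta + oscillators - 1, quanta)."""
    return lgamma(quanta + oscillators) - lgamma(quanta + 1) - lgamma(oscillators)

# beta = d ln W / dE' at E' = E, estimated by a one-quantum finite difference.
beta = ln_w(Q) - ln_w(Q - 1)

def compare(system_quanta):
    """Return (exact, first-order) values of ln W(E - E_S) - ln W(E)."""
    exact = ln_w(Q - system_quanta) - ln_w(Q)
    return exact, -beta * system_quanta

for q_s in (0, 1, 5, 20):
    print(q_s, *compare(q_s))
```

For system energies that are tiny compared with the total, the exact and first-order values agree closely, so the number of environmental states, and hence P_i, falls off exponentially in the system energy.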
Grand canonical ensemble derivation
Let's now assume both energy and matter can be exchanged with the once again very huge environment. Then, using the same arguments,
 <math>\ln[W(E_V,N_V)]\approx \ln[W(E,N)]-E_S\left.{\partial \ln W\over \partial E}\right|_{E, N}-\sum_a N_a\left.{\partial \ln[W]\over \partial N_a}\right|_{E,N}</math>
assuming particle number is conserved (otherwise, we would have ergodicity with respect to particle number). Particle numbers are always integral, but it still turns out this approximation is always valid for cases of interest.
As before, call <math>\left.{\partial \ln W\over \partial E}\right|_{E, N}</math> β and <math>\left.{\partial \ln[W]\over \partial N_a}\right|_{E,N}</math> −βμ_{a} (where μ_{a} is the chemical potential of particle species a), and note that these are solely properties of the environment. Then, writing N_{ia} for the number of particles of species a in state i of the system,
 <math>P_i\propto e^{-\beta\left(E_i-\sum_a \mu_a N_{ia}\right)}</math>.
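As a minimal illustration of the grand canonical weights, the sketch below normalizes them for a hypothetical system with a single particle species: one site that is either empty or holds one particle of energy 1.0 (arbitrary units). The states, the values of β and μ, and the function name are illustrative assumptions.

```python
from math import exp

def grand_canonical_probabilities(states, beta, mu):
    """Normalized weights exp(-beta * (E_i - mu * N_i)) for one species.

    `states` is a list of (energy, particle_number) pairs."""
    weights = [exp(-beta * (energy - mu * number)) for energy, number in states]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical system: a site that is empty, or occupied by one particle
# of energy 1.0 (arbitrary units).
states = [(0.0, 0), (1.0, 1)]
beta = 2.0   # hypothetical value, in inverse energy units

for mu in (0.0, 1.0, 2.0):
    p_empty, p_full = grand_canonical_probabilities(states, beta, mu)
    print(mu, p_empty, p_full)
```

Raising the chemical potential of the environment makes the occupied state more probable. When μ equals the energy of the occupied state, the two states are equally likely.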
Let's get back to the probability distribution of a single molecule. As long as it doesn't interact too strongly with the other molecules (i.e. it's in a dilute gas), we can treat the molecule as the system and the rest of the gas as the reservoir. This leaves us with the same Boltzmann distribution. For a nonideal gas, however, the total energy is not simply the sum of the energy of the particle and the energy of the rest of the gas, because there are interaction terms. This renders the Boltzmann distribution incorrect for the individual particle, even though it continues to hold for the system as a whole.
Actually, the assumption of a very huge reservoir is often overkill as long as the number of molecules in the system is astronomical. It turns out for these systems, the number of states of the system at a given energy varies exponentially with energy and the number of particles. So, the range of values over which P_{i} has appreciable value is often much smaller than E_{S} and N_{S} by many orders of magnitude. So, oftentimes, a reservoir much smaller than the system can work.