Dualities in two-dimensional Ising models

Suggested background: should be digestible for a second or third year physics major. Some exposure to statistical mechanics would help.

1. Introduction
2. Lattices
3. Ising model on a square lattice
     3.1. High temperature expansion
     3.2. Low temperature expansion
     3.3. Kramers-Wannier duality and the critical temperature
4. Ising model on triangular and hexagonal lattices
     4.1. High and low temperature expansions
     4.2. Star-triangle transformation and the critical temperature

1. Introduction

This fall, I took an excellent class on the statistical mechanics of phase transitions, taught by Professor Dam Son. On our final exam, we were asked a problem which requires a clever trick: compute (exactly!) the critical temperature of the Ising model on a 2-dimensional triangular lattice. Since I’m not assuming any stat mech knowledge, let me provide a sketch of the problem. Imagine we have a 2D crystalline lattice of electrons that looks something like this:

Each electron has a spin which can point either up or down, and each electron only interacts with its nearest neighbors on the lattice (for example, on the triangular lattice, each electron has six nearest neighbors). Let’s further assume that two neighboring electrons whose spins are oppositely aligned contribute an energy {J} to the overall energy, while two electrons whose spins point in the same direction contribute {-J}. Then the overall energy associated to a particular configuration of spins on the lattice is

\displaystyle H = -J\sum_{\langle ij\rangle}S_i S_j, \ \ \ J>0

where {S_i=\pm 1} is the spin of the electron at lattice site {i} and the sum is taken so that only electrons which are neighbors contribute. This is called the Ising model.
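If you want something concrete to play with, here's a minimal Python sketch of this energy function (the 4-by-4 lattice size and the value of {J} are arbitrary choices, and I've used periodic boundaries so that every site has four neighbors):

```python
# A minimal sketch of the Ising energy H = -J * sum over nearest-neighbor
# pairs <ij> of S_i S_j, on a small square lattice with periodic boundaries.
import numpy as np

def ising_energy(spins, J=1.0):
    """`spins` is a 2D array of +1/-1 values. Each link is counted once
    by pairing every site with its right and down neighbors only."""
    right = np.roll(spins, -1, axis=1)  # neighbor to the right
    down = np.roll(spins, -1, axis=0)   # neighbor below
    return -J * np.sum(spins * right + spins * down)

# All spins aligned: every one of the 2*16 = 32 links contributes -J.
aligned = np.ones((4, 4), dtype=int)
print(ising_energy(aligned))  # -32.0 for a 4x4 torus (32 links)
```

Flipping a single spin in the aligned configuration raises the energy by {4\cdot 2J}, one {2J} for each of its four links; we'll use exactly this counting later in the low temperature expansion.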

There are a few qualitative observations we can make at this stage. Configurations of electrons which `go with the flow’ cost less energy; it isn’t too hard to see that having all the spins pointing in the same direction minimizes the energy of the system. At low temperatures (say absolute zero) the electrons will organize themselves to do just this; this is called the ordered phase.

At higher temperatures there will be thermal fluctuations which will allow some electrons to incur the energy cost associated with going against the grain, i.e. there will be pairs of electrons whose spins are anti-aligned. If one chooses the temperature to be sufficiently high, these thermal fluctuations dominate, and the spins will point in essentially random directions. This is called the disordered phase, and can be represented by a `percolation diagram’ where regions with spin up electrons are colored e.g. purple and regions with spin down electrons are colored e.g. blue.

This whole discussion can be summarized in a famous graph which plots the average value of the spins against the temperature.

One naturally wonders about intermediate temperatures. Is there a temperature at which one transitions from the ordered phase to the disordered phase? The graph above suggests yes (at least for the Ising models we’re interested in), and the temperature at which the phase transition occurs is called the critical temperature of the model, {T_\mathrm{c}}. The physics of systems at criticality is incredibly rich, hence our motivation to figure out what exactly the critical temperature is. In general this is very hard to do analytically, but for certain Ising models there is a trick involving a duality between ordered and disordered systems that allows you to solve this problem exactly. So I propose the following plan:

0) We’ll set up a bit of machinery involving lattices.
1) We’ll work our way towards defining this duality for the Ising model on a square lattice and argue that the system at its critical temperature is self-dual; intuitively, since our duality transforms ordered systems to disordered systems and vice versa, the point at which the Ising model transitions between these two phases will be a system which is mapped to itself under this duality transformation. This will recover the critical temperature.
2) We’ll see that this duality on its own won’t be enough to compute the critical temperature for the Ising model on a triangular lattice. Enter the star-triangle transformation. Using this additional ingredient will get us what we want, as well as the solution to the hexagonal lattice for free.

Here we go.

2. Lattices

This is a physics post, and I want it to be as painless as possible for math-phobes, so I'm going to say as little about lattices as I can get away with (and only through pictures!) so that I can get to the thermodynamics as fast as possible. If any of the definitions are confusing, the pictures accompanying them should clear things up. Caution: I may make up my own terminology or repurpose terminology that already exists, so try not to take my language too seriously!

We can myopically think of a 2-dimensional lattice as a repeating set of points in the plane, which will usually be drawn so that points which are nearest neighbors are connected by links. We will really only need to deal with three lattices in this post: the square lattice, the triangular lattice, and the hexagonal lattice. In these cases, the links bound squares, triangles, and hexagons which tile the plane, and we call each tile a cell. (Note: I’ve drawn things so that the vertices of the polygons are the lattice sites and their edges are the links. These lattices also extend infinitely in all directions, despite the fact that we have not yet invented infinitely large computer screens.)

Every lattice has a dual lattice; it’s obtained simply by putting a point at the center of each cell in the original lattice. Connecting the points of the dual lattice with perpendicular bisectors to the links of the original lattice gives you the links of the dual lattice. For example, the duality between the triangular and hexagonal lattices is depicted below.

Usually something only deserves to be called a `duality' if the dual of the dual recovers the original object. And indeed, you can convince yourself by staring at the pictures above that the dual lattice of the dual lattice is just the original lattice you started with. Can you see why the square lattice is (conveniently) self-dual, i.e. is its own dual lattice?

By an Ising graph on a lattice, we mean a subset of the links of that lattice with the property that each point is touched by an even number of links. We can draw an Ising graph by highlighting the links that belong to it.

The length of an Ising graph is just the number of links it has. Importantly, the smallest Ising graphs on the square lattice (aside from the unique graph of length zero) have length 4, and correspond to square loops. Similarly, the smallest Ising graphs on the triangular and hexagonal lattices have length 3 and 6 respectively (draw them).
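Since the even-degree condition is the only thing that defines an Ising graph, it's easy to check by computer. Here's an illustrative Python helper (the integer labels for the lattice sites are arbitrary):

```python
# A small helper that checks the defining property of an Ising graph:
# every lattice site touched by the chosen links has even degree.
from collections import Counter

def is_ising_graph(links):
    """`links` is a collection of pairs (i, j) of lattice site labels.
    Returns True iff every site appears in an even number of links."""
    degree = Counter()
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    return all(d % 2 == 0 for d in degree.values())

# The smallest nonempty Ising graph on the square lattice: a square loop.
square_loop = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_ising_graph(square_loop))      # True
print(is_ising_graph(square_loop[:3]))  # False: two sites have odd degree
```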

Let’s say we’re given a configuration of spins. We can associate a domain wall drawing to this configuration by separating the regions which are spin up from the regions that are spin down. Here’s an example on the triangular lattice.

Staring at the last two pictures, one realizes that the Ising graph we drew on the hexagonal lattice is precisely the domain wall drawing we drew on the triangular lattice. This relationship is general: the domain wall drawings of a lattice are in one-to-one correspondence with the Ising graphs of its dual lattice.

One final remark: it will be convenient for us to `compactify’ our lattice. This simply amounts to picking some square region for the lattice to lie in and identifying the boundaries so that the lattice is really defined over a torus. If you haven’t seen ideas like this before, don’t worry; it’s not central to what follows. It just allows us to have a finite lattice without having to worry about boundary conditions (and without incurring serious physical consequences).

All the machinery is in place now. But why did we even bother? In the next sections, we will define the partition function of the Ising models and interpret its terms `diagrammatically’ in terms of Ising graphs and domain wall drawings.

3. Ising model on a square lattice

Let’s focus on the physics now. We will use units in which the Boltzmann constant is equal to one, {k_B = 1}. Recall that the partition function for a thermodynamic system is defined as

\displaystyle \mathcal{Z} = \sum_\alpha e^{- \beta E_\alpha}, \ \ \ \ \beta=1/T

where the sum is over all the possible states of the system, and {E_\alpha} is the energy of the {\alpha}th state. The partition function is a convenient way to encode all the thermodynamic information about the system that we care about. For example, the average energy of the system at temperature {T} can be computed from the partition function with the formula

\displaystyle \langle E \rangle = - \frac{\partial \log \mathcal{Z}}{\partial \beta}

and the free energy is defined through

\displaystyle F = - T \log \mathcal{Z}.
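For a tiny system, all of this can be computed by brute force. Here's a Python sketch for a ring of four spins (a toy choice, not one of our two-dimensional lattices), which sums over all {2^4} configurations and recovers {\langle E\rangle} from {\log \mathcal{Z}} by a finite difference:

```python
# Brute-force partition function for a 1D ring of n Ising spins,
# in units with k_B = 1; the system size is an illustrative choice.
import itertools, math

def partition_function(n, J, T):
    """Z = sum over all 2^n spin configurations of exp(-E/T),
    with E = -J * sum of S_i S_{i+1} around the ring."""
    Z = 0.0
    for spins in itertools.product((+1, -1), repeat=n):
        E = -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))
        Z += math.exp(-E / T)
    return Z

def average_energy(n, J, T, dbeta=1e-6):
    """<E> = -d(log Z)/d(beta), via a centered finite difference."""
    beta = 1.0 / T
    logZ = lambda b: math.log(partition_function(n, J, 1.0 / b))
    return -(logZ(beta + dbeta) - logZ(beta - dbeta)) / (2 * dbeta)

# At low temperature the two aligned states dominate, so <E> -> -nJ.
print(round(average_energy(4, 1.0, 0.2), 3))  # close to -4.0
```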

Let’s start by computing two different expansions of the partition function for the Ising model on the square lattice, one for low temperatures and one for high temperatures.

3.1. High temperature expansion

Let's begin by setting up notation. For each configuration of spins {\mathfrak{S}} on the square lattice, define the dimensionless combination {\mathcal{H}(\mathfrak{S}) = -\beta H},

\displaystyle \mathcal{H}(\mathfrak{S}) = K \sum_{\langle i j \rangle}S_iS_j, \ \ \ K = J/T.

Then by definition the partition function is

\displaystyle \mathcal{Z}_{\square}(K) = \sum_{\mathfrak{S}} \exp\big[\mathcal{H}(\mathfrak{S})\big]

where the sum is over all possible spin configurations on the square lattice. This is just a shorthand notation:

\displaystyle \sum_{\mathfrak{S}} = \sum_{S_1=\pm 1}\cdots\sum_{S_N=\pm 1}

where {N} is the total number of lattice sites. Writing this out a little more explicitly gives us

\displaystyle \mathcal{Z}_\square(K) = \sum_{\mathfrak{S}} \prod_{\langle ij \rangle}e^{KS_iS_j}.

Because the spins take on values {\pm 1}, we can use the identity {e^{KSS'} = \cosh K ( 1 + S S'\tanh K )} to further rewrite this as

\displaystyle \mathcal{Z}_\square(K) = (\cosh K)^{N_\ell}\sum_{\mathfrak{S}}\prod_{\langle ij \rangle}(1+S_iS_j\tanh K)

where {N_\ell} is the total number of nearest neighbor links on the square lattice. We would like to multiply out the product to obtain an expansion of the partition function in powers of {\tanh K}. To do this, let’s introduce a convenient shorthand: a link connecting two neighboring lattice sites {i} and {j} on the lattice will be denoted {\ell = \langle i j \rangle}. Then if we define {\pi(\ell) = S_i S_j}, I claim we can write the product

\displaystyle \prod_{\langle i j\rangle}(1+S_iS_j\tanh K) = \sum_n \sum_{\{\ell_1,\dots,\ell_n\}}^\ast \pi(\ell_1)\cdots\pi(\ell_n)(\tanh K)^n

where the sum {\sum_{\{\ell_1,\dots,\ell_n\}}^\ast} is over all subsets of links of size {n}. Implicit in the definition of subset is that no two links are equal, {\ell_i\neq \ell_j} and order does not matter, so that we do not count e.g. {\{\ell_1,\ell_2\}} and {\{\ell_2,\ell_1\}} twice in the sum! If it is hard to see where this formula comes from, try multiplying out e.g. {(1+x_1)\cdots(1+x_4)} and seeing that it agrees; this corresponds to the imaginary situation of having four total links in the lattice.
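Both algebraic facts we just used — the {\cosh} identity and the expansion of the product into a sum over subsets of links — are easy to check numerically. Here's a quick Python sketch (the value of {K} and the stand-in values for the four links are arbitrary):

```python
# Quick numerical checks of the two algebraic facts used above.
import math, itertools
from functools import reduce

# (1) Since S*S' is +1 or -1, exp(K*S*S') = cosh(K)*(1 + S*S'*tanh(K)).
K = 0.7
for S, Sp in itertools.product((+1, -1), repeat=2):
    assert abs(math.exp(K * S * Sp)
               - math.cosh(K) * (1 + S * Sp * math.tanh(K))) < 1e-12

# (2) A product of (1 + x_l) over links expands into a sum over subsets
# of links, each subset counted exactly once (four stand-in links here).
xs = [0.3, -1.2, 0.5, 2.0]
product = reduce(lambda a, b: a * (1 + b), xs, 1.0)
subset_sum = sum(
    reduce(lambda a, b: a * b, subset, 1.0)
    for n in range(len(xs) + 1)
    for subset in itertools.combinations(xs, n)
)
assert abs(product - subset_sum) < 1e-12
print("both identities check out")
```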

Consider a subset of links {\{\ell_1,\dots,\ell_n\}} such that some lattice site {i} is touched by an odd number of these links. The term {\pi(\ell_1)\cdots\pi(\ell_n) (\tanh K)^n} will not contribute to the partition function. To see this, notice that, since {S_i} appears an odd number of times in {\pi(\ell_1)\cdots \pi(\ell_n)}, we will have that

\displaystyle \pi(\ell_1)\cdots\pi(\ell_n)\big\vert_{S_i=1} = - \pi(\ell_1)\cdots\pi(\ell_n)\big\vert_{S_i=-1}

i.e. the coefficient evaluated with {S_i=1} will have the opposite sign compared to when it is evaluated with {S_i=-1}. Remember that in the partition function, the sum over spin configurations can be written as {\sum_{\mathfrak{S}} = \sum_{S_1=\pm 1} \cdots \sum_{S_{N} = \pm 1}} so that moving {\sum_{S_i=\pm 1}} all the way to the front forces the term to vanish.

The discussion above implies that a term involving {\{\ell_1,\dots,\ell_n\}} can only contribute if every lattice site is touched by an even number of the links {\ell_i} (it’s OK for a lattice site not to be touched by any of the links as well). Furthermore, for all such subsets, {\pi(\ell_1)\cdots\pi(\ell_n) = 1}. Now, we will see the full power of the machinery we developed in the previous section. By definition, the subsets that contribute are precisely the Ising graphs on the square lattice! We can combine this entire discussion to simplify things considerably. Let {G_\square^{(n)}(N)} be the number of distinct Ising graphs of length {n} on a square lattice with {N} sites. Evaluating the sum on spin configurations first gives us

\displaystyle  \begin{array}{rcl}  \mathcal{Z}_\square(K) &=& (\cosh K)^{N_\ell} \sum_n \sum_{\{\ell_1,\dots,\ell_n\}}^\ast\sum_{\mathfrak{S}} \pi(\ell_1)\cdots\pi(\ell_n)(\tanh K)^n = (\cosh K)^{N_\ell} \sum_n\sum_{ \substack{ \text{Ising}\\ \text{graphs}}} \sum_{\mathfrak{S}}(\tanh K)^n \\ &=&(\cosh K)^{N_\ell}2^N \sum_n \sum_{ \substack{ \text{Ising}\\ \text{graphs}}} (\tanh K)^n = (\cosh K)^{N_\ell}2^N \sum_n G_\square^{(n)}(N)(\tanh K)^n \\ &=& (\cosh K)^{N_\ell}2^N\left(1 + N(\tanh K)^4 + \cdots \right) \end{array}

where again {N_\ell} is the number of links and {N} is the number of lattice sites. So the partition function admits a diagrammatic interpretation in the same spirit as Feynman diagrams: for each Ising graph of length {n}, add {(\tanh K)^n}.

The first non-trivial term of the expansion has coefficient {N} because there are {N} squares on a square lattice with {N} sites.

3.2. Low temperature expansion

Let’s now develop an expansion that works for low temperatures. Recall we argued that when the temperature approaches zero, the only relevant configurations are the ones where every spin points in the same direction so as to minimize the energy. There are only two of these — every spin points up or every spin points down — and the energy is {-JN_\ell} in both cases, so the partition function limits to

\displaystyle \mathcal{Z}_\square(K)\sim 2 e^{KN_\ell} \text{ as } T\rightarrow 0.

As one increases the temperature, one can imagine configurations of spins in which a single electron is oppositely aligned to the rest becoming more relevant. There are {2N} of these: {N} of them corresponding to one electron spin up and the rest spin down, and the remaining {N} corresponding to one electron spin down and the rest spin up. The energy of such a configuration is larger by {4\cdot 2J}, since each of the four links attached to the flipped spin now contributes {+J} instead of {-J}; and so we can obtain a better approximation to the partition function as

\displaystyle \mathcal{Z}_\square(K) \sim 2e^{KN_\ell}(1 + N(e^{-2K})^4)

We can continue in this way, throwing in spin configurations with larger and larger numbers of electrons with oppositely aligned spins. Recall: up to a global flip of every spin, configurations are in one-to-one correspondence with domain wall drawings on the lattice (this two-to-one correspondence is the origin of the overall factor of 2 below). It is also true in general that the energy of a spin configuration is larger than that of the ground state by precisely {n\cdot 2J}, where {n} is the length of the domain wall drawing we associate to that spin configuration. We can therefore write the full expansion as

\displaystyle \mathcal{Z}_\square(K) = 2e^{KN_\ell}\sum_n D_\square^{(n)}(N)(e^{-2K})^n

where we have denoted the number of domain wall drawings of length {n} on a square lattice with {N} sites as {D^{(n)}_\square(N)}. But the domain wall drawings of a lattice are also in one-to-one correspondence with the Ising graphs of its dual lattice. Since the square lattice is self-dual, and its dual has the same number of sites as the original lattice, we get that

\displaystyle D^{(n)}_\square(N) = G^{(n)}_\square(N)

This yields the following spectacular result:

\displaystyle \mathcal{Z}_\square(K) = 2e^{KN_\ell} \sum_n G^{(n)}_\square(N)(e^{-2K})^n = 2e^{KN_\ell}(1 + N(e^{-2K})^4 + \cdots).

This admits the same diagrammatic interpretation in terms of counting Ising graphs on the square lattice.

3.3. Kramers-Wannier duality and the critical temperature

To really bring out the similarity between the two expansions we’ve developed so far, define the series

\displaystyle f(X) = \sum_n G^{(n)}_\square(N)X^n.

Then the high and low temperature expansions take the forms

\displaystyle  \begin{array}{rcl}  \text{High-}T:& & \ \mathcal{Z}_\square(K) = (\cosh K)^{N_\ell}2^N f(\tanh K) = (\cosh K)^{N_\ell}2^N\left(1 + N(\tanh K)^4 + \cdots \right) \\ \text{Low-}T: & &\ \mathcal{Z}_\square(\tilde{K}) = 2\exp\left(\tilde{K}N_\ell\right)f\big(\exp(-2\tilde{K})\big) = 2\exp(\tilde{K}N_\ell)\left(1 + N\big(\exp(-2\tilde{K})\big)^4+\cdots\right). \end{array}

Up to a multiplicative constant out front, the high and low temperature expansions are just obtained by passing different arguments to the series defined by {f}. This suggests defining a duality between high and low temperature Ising models with the equation {e^{-2\tilde{K}} = \tanh K}, or

\displaystyle K\sim \tilde{K} = -\frac{1}{2}\log\left(\tanh K\right).

This is called the Kramers-Wannier duality. Now here comes the magic. Phase transitions go hand in hand with mathematical singularities, and the only possible source of these is the series {f(X)}. If we assume (correctly) that the Ising model on the square lattice only has one point at which it undergoes a phase transition, then {f(X)} only has one singularity. We have two different ways of writing the singular point of this series in terms of the critical parameter, prescribed by the low and high temperature expansions. These give us the formula {e^{-2K_{\mathrm{c}}} = \tanh K_{\mathrm{c}}} which can be solved to get

\displaystyle K_{\mathrm{c}} = \frac{\log(1+\sqrt{2})}{2}\implies T_{\mathrm{c}} = \frac{2J}{\log(1+\sqrt{2})}.
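If you'd like to double-check this numerically, here's a Python sketch (the test coupling and the bracketing interval are arbitrary choices) verifying that the Kramers-Wannier map really is an involution and that the self-dual coupling agrees with the closed form:

```python
# The Kramers-Wannier map K -> K~ = -(1/2) log tanh K is an involution,
# and its fixed point (the self-dual coupling) matches the closed form.
import math

def kw_dual(K):
    """K -> K~ = -(1/2) * log(tanh(K))."""
    return -0.5 * math.log(math.tanh(K))

# Applying the map twice returns the original coupling (checked at K = 0.3).
assert abs(kw_dual(kw_dual(0.3)) - 0.3) < 1e-12

# Solve the self-duality condition exp(-2K) = tanh(K) by bisection.
def f(K):
    return math.exp(-2 * K) - math.tanh(K)

lo, hi = 0.1, 1.0  # f(lo) > 0 > f(hi), so the root is bracketed
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if f(lo) * f(mid) <= 0 else (mid, hi)

K_c = 0.5 * (lo + hi)
assert abs(K_c - 0.5 * math.log(1 + math.sqrt(2))) < 1e-12
print(K_c)  # about 0.4407
```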


4. Ising model on triangular and hexagonal lattices

It’s natural to feel invigorated by this result and think, “I can tackle any lattice in the world! Bring them on!” Unfortunately, the machinery we developed in the previous section won’t get us quite that far. In general, the duality transformation we defined will relate e.g. a high temperature expansion on a lattice to a low temperature expansion on its dual lattice. The key property we exploited in the previous section was that the square lattice is self-dual, which afforded us a relationship between two regimes of the same model. If we apply this same idea to the triangular lattice, we get a relationship between high temperatures on the triangular lattice and low temperatures on the hexagonal lattice (its dual lattice) and vice versa. But hope is not lost — we will define one more transformation called the star-triangle transformation, which takes us from the hexagon to the triangle. With this, we will generalize the square lattice solution

\displaystyle \text{High-}T \text{ on }\square\xrightarrow{\text{KW}} \text{Low-}T \text{ on } \square

to a solution for the triangular and hexagonal lattices,

\displaystyle \text{High-}T \text{ on }\triangle\xrightarrow{\text{KW}} \text{Low-}T\text{ on }{\mathrm{hex}}\xrightarrow{\text{ST}}\text{Low-}T\text{ on } \triangle

\displaystyle \text{High-}T \text{ on }{\mathrm{hex}}\xrightarrow{\text{ST}} \text{High-}T \text{ on }\triangle\xrightarrow{\text{KW}}\text{Low-}T \text{ on }{\mathrm{hex}}

where KW denotes the Kramers-Wannier duality transformation and ST denotes the star-triangle transformation. (Unfortunately I have to denote the hexagonal lattice with `hex’ since the hexagon symbol is resisting all attempts at being put into WordPress). What we obtain in the end is the desired relation between high temperatures and low temperatures on the same lattice.

4.1. High and low temperature expansions

I won’t rederive the expansions for the triangular and hexagonal lattices from scratch, since the situation is nearly identical to that of the square lattice and we won’t really need them. Instead, I’ll just write down the final result here, with slight notational changes. The interested reader may wish to test their understanding by obtaining these results themselves.

High temperature expansions:

\displaystyle \mathcal{Z}_\triangle(K,S_\triangle) = (\cosh K)^{L_\triangle}2^{S_\triangle}\sum_nG_{\triangle}^{(n)}(S_\triangle)(\tanh K)^n= (\cosh K)^{L_\triangle}2^{S_\triangle}\left(1+2S_\triangle(\tanh K)^3 +\cdots\right)

\displaystyle \mathcal{Z}_{{\mathrm{hex}}}(K,S_{{\mathrm{hex}}}) = (\cosh K)^{L_{{\mathrm{hex}}}}2^{S_{{\mathrm{hex}}}}\sum_n G_{{\mathrm{hex}}}^{(n)}(S_{{\mathrm{hex}}})(\tanh K)^n = (\cosh K)^{L_{{\mathrm{hex}}}}2^{S_{{\mathrm{hex}}}}\left(1 + \frac{S_{{\mathrm{hex}}}}{2}(\tanh K)^6+\cdots\right)

Low temperature expansions:

\displaystyle \mathcal{Z}_\triangle(K,S_\triangle) = 2\exp\left(L_\triangle K\right)\sum_nD_{\triangle}^{(n)}(S_\triangle)\left(\exp(-2K)\right)^n

\displaystyle \mathcal{Z}_{{\mathrm{hex}}}(K,S_{{\mathrm{hex}}}) = 2\exp\left(L_{{\mathrm{hex}}}K\right)\sum_nD_{{\mathrm{hex}}}^{(n)}(S_{{\mathrm{hex}}})\left(\exp(-2K)\right)^n

Here, the notation is as follows:

\displaystyle  \begin{array}{rcl}  S_{\triangle,{\mathrm{hex}}} &=& \#\text{ sites on }\triangle\text{ or }{\mathrm{hex}} \text{ lattice} \\ L_{\triangle,{\mathrm{hex}}} &=& \#\text{ links on }\triangle\text{ or }{\mathrm{hex}} \text{ lattice} \\ G_{\triangle,{\mathrm{hex}}}^{(n)}(S_{\triangle,{\mathrm{hex}}}) &=& \#\text{ Ising graphs of length }n \text{ on } \triangle\text{ or }{\mathrm{hex}} \text{ lattice with } S_{\triangle,{\mathrm{hex}}} \text{ sites} \\ D_{\triangle,{\mathrm{hex}}}^{(n)}(S_{\triangle,{\mathrm{hex}}}) &=& \#\text{ domain wall drawings of length }n \text{ on } \triangle\text{ or }{\mathrm{hex}} \text{ lattice with } S_{\triangle,{\mathrm{hex}}} \text{ sites} \end{array}

In explicitly writing out the high temperature expansions, we used the fact that, for example, {G^{(3)}_{\triangle}(S_\triangle)=2S_\triangle}. To see this, just count the number of triangles on a triangular lattice. There are two types of triangles: the ones facing up and the ones facing down. The ones facing up are in one-to-one correspondence with the sites of the lattice, and since these are exactly half of the total number of triangles the result follows. Can you see why {G^{(6)}_{{\mathrm{hex}}}(S_{{\mathrm{hex}}}) = S_{{\mathrm{hex}}}/2}?

Let’s use the duality between triangular and hexagonal lattices to rewrite the low temperature expansions. We stated earlier that Ising graphs are in one-to-one correspondence with domain wall drawings on the dual lattice. Caution: to obtain the hexagonal lattice from the triangular one, we place a lattice site at the center of each triangle. Therefore, the dual of a triangular lattice with {S_\triangle} sites is a hexagonal lattice with {2S_\triangle} sites so the duality relationship reads

\displaystyle D^{(n)}_\triangle(S_\triangle) = G^{(n)}_{{\mathrm{hex}}}(2S_\triangle).

By the same token,

\displaystyle D^{(n)}_{{\mathrm{hex}}}(S_{{\mathrm{hex}}}) = G^{(n)}_\triangle(S_{{\mathrm{hex}}}/2)

so that we can now write

\displaystyle \mathcal{Z}_\triangle(K,S_\triangle) = 2\exp\left(L_\triangle K\right)\sum_nG_{{\mathrm{hex}}}^{(n)}(2S_\triangle)\left(\exp(-2K)\right)^n = 2\exp\left(L_\triangle K\right)\left(1+S_\triangle\left(\exp(-2K)\right)^6+\cdots\right)

\displaystyle \mathcal{Z}_{{\mathrm{hex}}}(K,S_{{\mathrm{hex}}}) = 2\exp\left(L_{{\mathrm{hex}}}K\right)\sum_nG_{\triangle}^{(n)}(S_{{\mathrm{hex}}}/2)\left(\exp(-2K)\right)^n = 2\exp\left(L_{{\mathrm{hex}}}K\right)\left(1+S_{{\mathrm{hex}}}\left(\exp(-2K)\right)^3+\cdots\right).

4.2. Star-triangle transformation and the critical temperature

The only thing we have left to do is define the star-triangle transformation, which takes us from a model on the hexagonal lattice to a model on the triangular lattice. Notice that the hexagonal lattice is composed of two triangular sublattices, which we’ll denote by {A} and {B}. We’re going to `decimate’ the spins which lie on the {A} sublattice (depicted in purple below).

The partition function involves a sum over spin configurations which can be split up into sums over spin configurations on the {A} and {B} sublattices,

\displaystyle \sum_{\mathfrak{S}} = \sum_{\mathfrak{S}_B}\sum_{\mathfrak{S}_A}.

The decimation procedure mathematically corresponds to actually evaluating the sum over the {A} sublattice. What we’ll be left with is a sum over configurations on the {B} sublattice (which is triangular) and if we’ve done everything correctly, the resulting partition function will have the same form as the partition function for a triangular Ising model.

Before we do this, let’s cite a purely algebraic result. We want to find solutions for {C} and {K'} in the equation

\displaystyle C \exp\left(K'(S_1S_2+S_2S_3+S_3S_1)\right) = 2\cosh\big(K(S_1+S_2+S_3)\big)

in terms of {K} that hold for all {S_i = \pm 1}. Choosing {S_1=S_2=S_3=1} and {S_1=S_2=1}, {S_3=-1} gives two equations (these are the only independent ones) which one can solve to obtain

\displaystyle K' = \frac{1}{4}\log\left(\frac{\cosh(3K)}{\cosh(K)}\right), \ \ \ C = 2(\cosh(K))^{3/4}(\cosh(3K))^{1/4}.
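It's easy to verify this numerically: for an arbitrary test value of {K}, the following Python sketch checks the identity for all eight sign choices of the spins.

```python
# Verify the star-triangle identity
#   C * exp(K'*(S1*S2 + S2*S3 + S3*S1)) = 2*cosh(K*(S1 + S2 + S3))
# for all S_i = +-1, with C and K' as given (K = 0.45 is arbitrary).
import math, itertools

K = 0.45
Kp = 0.25 * math.log(math.cosh(3 * K) / math.cosh(K))
C = 2 * math.cosh(K) ** 0.75 * math.cosh(3 * K) ** 0.25

for S1, S2, S3 in itertools.product((+1, -1), repeat=3):
    lhs = C * math.exp(Kp * (S1 * S2 + S2 * S3 + S3 * S1))
    rhs = 2 * math.cosh(K * (S1 + S2 + S3))
    assert abs(lhs - rhs) < 1e-12
print("star-triangle identity holds for all eight spin choices")
```

Note that only two distinct equations ever appear: when all three spins agree, and when exactly one disagrees — which is why two choices of spins were enough to fix {C} and {K'}.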

Now, note that every link on the hexagonal lattice is attached to a unique site on the {A} sublattice, and moreover each site on the {A} sublattice has 3 such links attached to it. So define {N_a = \left\{S_1^{(a)},S_2^{(a)},S_3^{(a)}\right\}} to be the three spins (on the {B} sublattice) which neighbor the site {a} on {A}. Notice further that the sum over nearest neighbors on the hexagonal lattice can be rewritten by summing over the nearest neighbors of every site on the {A} sublattice

\displaystyle \sum_{\langle ij\rangle\in{\mathrm{hex}}} = \sum_{a\in A}\sum_{b\in N_a}.

Then, taking {C} and {K'} defined as above, we can rewrite the partition function of the hexagonal model after decimation as

\displaystyle  \begin{array}{rcl}  \mathcal{Z}_{{\mathrm{hex}}}(K,S_{{\mathrm{hex}}}) &=& \sum_{\mathfrak{S}}\exp\left[\mathcal{H}(\mathfrak{S})\right] = \sum_{\mathfrak{S}}\exp\left(\sum_{\langle ij\rangle}KS_iS_j\right) =\sum_{\mathfrak{S}}\prod_{\langle ij\rangle}\exp(KS_iS_j) \\ &=& \sum_{\mathfrak{S}_A}\sum_{\mathfrak{S}_B}\prod_{a\in A}\prod_{b\in N_a}\exp(KS_aS_b) = \sum_{\mathfrak{S}_B}\prod_{a\in A}\sum_{S_a=\pm 1}\exp\left[KS_a\left(S_1^{(a)}+S_2^{(a)}+S_3^{(a)}\right)\right] \\ &=& \sum_{\mathfrak{S}_B}\prod_{a\in A}C\exp\left[K'\left(S_1^{(a)}S_2^{(a)} + S_2^{(a)}S_3^{(a)} + S_3^{(a)}S_1^{(a)}\right)\right] = C^{S_{{\mathrm{hex}}}/2}\sum_{\mathfrak{S}_B}\exp\left(K'\sum_{\langle i j\rangle\in\triangle}S_iS_j\right) \\ &=& C^{S_{{\mathrm{hex}}}/2}\mathcal{Z}_\triangle(K',S_{{\mathrm{hex}}}/2) \end{array}

Here’s the conclusion: the star-triangle transformation relates a model on the hexagonal lattice with {S_{{\mathrm{hex}}}} sites and parameter {K} to a triangular model with {S_{{\mathrm{hex}}}/2} sites and parameter {K'},

\displaystyle K\sim K' = \frac{1}{4}\log\left(\frac{\cosh(3K)}{\cosh(K)}\right).

Before, we showed that composing KW then ST transformations (or vice versa) relates the high and low temperature regimes of the triangular and hexagonal models respectively. Therefore, we can combine the above with Kramers-Wannier duality,

\displaystyle K\sim \tilde{K} = -\frac{1}{2}\log\left(\tanh K\right)

to get the desired relationship,

\displaystyle K\sim \hat{K} = \frac{1}{4}\log\frac{\cosh\left(-\frac{3}{2}\log\tanh K\right)}{\cosh\left(-\frac{1}{2}\log\tanh K\right)}.

Using the same arguments as we used for the square lattice, we obtain the critical temperature of the triangular model as the fixed point of this combined duality transformation, which can be computed using Mathematica as

\displaystyle K^{(\triangle)}_{\mathrm{c}} = \log(3)/4 \implies T^{(\triangle)}_{\mathrm{c}} = 4J/\log(3).

Composing the other way around gives us the critical temperature of the hexagonal model,

\displaystyle K_{\mathrm{c}}^{({\mathrm{hex}})} = \frac{1}{2}\log(2+\sqrt{3}) \implies T_{\mathrm{c}}^{({\mathrm{hex}})}=\frac{2J}{\log(2+\sqrt{3})}.
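As a sanity check, here's a Python sketch verifying both fixed points numerically (the function names kw and st are just shorthand for the two transformations):

```python
# Check that the quoted critical couplings are fixed points of the
# composed duality maps: ST after KW for the triangle, and KW after ST
# for the hexagon.
import math

def kw(K):
    """Kramers-Wannier: K -> -(1/2) log tanh K."""
    return -0.5 * math.log(math.tanh(K))

def st(K):
    """Star-triangle: hexagonal coupling K -> triangular coupling K'."""
    return 0.25 * math.log(math.cosh(3 * K) / math.cosh(K))

K_tri = math.log(3) / 4
K_hex = 0.5 * math.log(2 + math.sqrt(3))

# Triangle: high-T triangle -> (KW) low-T hex -> (ST) low-T triangle.
assert abs(st(kw(K_tri)) - K_tri) < 1e-12
# Hexagon: high-T hex -> (ST) high-T triangle -> (KW) low-T hex.
assert abs(kw(st(K_hex)) - K_hex) < 1e-12
print("both critical couplings are fixed points")
```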

So concludes our solution.

The story of moonshine, part I: symmetry, number theory, and the Monster

Suggested background: none!

1. Introduction
2. Symmetry
     2.1. Wallpapers: an amuse-bouche
     2.2. Groups: the algebra of symmetries
     2.3. Classification: finding a periodic table of groups
     2.4. Representations: how groups act in different dimensions
3. Numbers
     3.1. Fermat: the math world’s biggest troll?
     3.2. Modular forms: the fifth elementary operation of arithmetic
4. Moonshine

1. Introduction

When I tell people I study moonshine, it usually invites puzzled looks. In fact, the field remains relatively obscure and unknown even to seasoned mathematicians and physicists. Of course, this is a travesty; the story of moonshine is beautiful, and it involves insight drawn from a century’s worth of ideas. And with just a little bit of work, anyone can appreciate how elegant and bizarre it is.

People usually joke that the study of moonshine was initiated with mathematician John McKay’s observation that

\displaystyle 196884= 196883 + 1.

Profound, no? To understand why this simple identity perplexed mathematicians, consider the dilemma our friend Hurley faced in the show Lost. Some time after winning the lottery with the numbers 4 8 15 16 23 42, he found that these exact numbers formed the passcode to a computer on a magical island which was responsible for staving off an apocalypse (or something like that). Indeed, a natural question is, “Why would these special numbers show up in two places that appear to have nothing to do with each other?”

Moonshine is analogous. The number on the left, 196884, is a very distinguished number which naturally appears when studying mathematical objects called modular forms. The numbers on the right, 196883 and 1, appear when investigating properties of the Monster group and the dimensions it acts in (we will define all these terms soon enough). What is more, infinitely many such relationships were discovered, and there was no reason to suspect that the different branches of mathematics involved had anything to do with each other, at least not in the way suggested by these equations.

Ultimately, I think the writers of Lost offered up some sort of painfully contrived explanation for the lottery numbers involving smoke monsters and purgatory (or not, there was really no resolution to that show). It turns out that the (partial) explanation for moonshine is only slightly less bizarre: the connection between the Monster group and modular forms can be seen via a particular string theory which describes particles whizzing around and interacting on a 24 dimensional doughnut-shaped space.

This all motivates giving moonshine a logo (the idea for which I am shamelessly stealing from Jeffrey Harvey):

In the diagram, each dot (we’ll call them vertices) is meant to represent a different ‘mathematical object’; they are the group, the modular form, the algebra, and the string theory. The lines themselves (let’s call them edges) are the relationships between these objects. Finally, the overall tetrahedron will represent the abstract, theoretical structure that arises when you collectively consider all of these connections together: moonshine!

I’ll try wherever possible to inject a bit of philosophy into the discussion, but this will be made difficult by 1) my own ignorance and 2) the infancy of the field — we still don’t have a satisfying framework explaining why moonshine works the way it does. In the least, I hope you’ll get a visceral sense for how strange and unexpected the universe can be, even the little universes that exist perhaps only in our imaginations.

To make this enjoyable for the lay person as well as the more technically inclined reader, the main narrative will be an intuitive, qualitative discussion, and the rigorous formulations of the concepts will be available on the side. At times, I will bend the truth, or completely break it — this is the price of maintaining some degree of simplicity, so I apologize to any mathematicians in the audience in advance! Here’s the plan, split across two blog posts.

Part I
1) I’ll start with a basic discussion of symmetry. To ground ourselves, we’ll study wallpapers and arrive at the notion of a ‘group’ in trying to capture our analysis mathematically.
2) We’ll leap over to the more technical, but (with a little bit of perspiration) just as pretty world of number theory and modular forms. I’m of the opinion that most mathematicians who work with these objects probably couldn’t explain to you in any intuitive sense why we should care about them or why they show up in math and physics. Because I am no better, I’ll perhaps only spend enough time here to convince you that modular forms are special, which will make their reappearance later all the more shocking.
3) We’ll finally be ready for our first glimpse at a bit of mathematical moonshine, but we’ll postpone the full discussion, which includes ideas from physics, until the next blog post.

Part II
4) I’ll give a brief overview of the physics relevant to moonshine (thermodynamics, quantum theories, etc.) and hopefully show you why some of the same ideas that we used to study wallpapers can be used to learn things about the way the universe works.
5) Crappy attempts at high level remarks.
6) I’ll explain the work I’ve been doing with Jeff Harvey on describing a new kind of moonshine and the analogies that exist between it and the old kinds.

Let’s get started!

2. Symmetry

A high level discussion of symmetry usually begins with the platonic solids, the Greeks, or some sort of appeal to beauty and elegance. It’s true of course that we could spend an eternity discussing the central position these concepts have occupied in the development of math and physics, but for our purposes, it’s best if we concretely establish the two very particular kinds of symmetry that we’ll be invoking throughout the remainder of this series.

The first kind is geometric symmetry. An example I like to have in the back of my mind is that human beings are approximately geometrically symmetric: if we ignore internal organs and draw a vertical line from someone’s head to her toes, her left-hand side is roughly a mirror image of her right-hand side.

A closely related kind is a symmetry I’ll call physical symmetry. This is perhaps a more murky concept, but loosely speaking, we will be referring to theories and equations in physics as possessing this kind of symmetry. It turns out that systems which possess physical symmetry also have conserved quantities (e.g. energy, momentum, etc.), an idea we’ll revisit in the next post.

2.1. Wallpapers: an amuse-bouche

In an attempt to capture the more artistically bent among us early, I’ll use wallpapers as a motivating example of geometric symmetry. We all have some visceral idea in the backs of our minds as to why exactly we should consider wallpapers as being symmetric, but let’s think about how we could communicate this precisely.

Imagine you and I had a roll of wallpaper, which repeats in the vertical direction every 1 foot:


We cover your walls with it (let’s pretend for a moment that you dropped a lot of money and invested in a wall which is infinite in extent), finish the job, and are quite pleased with ourselves. But little did you know — I’m an evil genius and have decided to play a prank on you! While you’re off to the bathroom, I slide the entire wallpaper up exactly 1 foot and then restick it onto the wall.

Alright, this is not much of a prank… actually when you return your world is completely unchanged; because I slid the wallpaper up by exactly the distance in which it repeats, you can’t even tell that I’ve done anything at all! (If you don’t quite see it, scroll down a bit and press the up arrow on the applet. Does the picture look different after the animation has finished?)

Because this is the case, we say that the wallpaper is symmetric under sliding, or more formally that it is symmetric with respect to translations. Because it repeats, I’m able to slide it along one direction when you’re not looking and there’s nothing you can do to be able to tell that I’ve changed it at all. This is the definition we’ll use for symmetry.

Definition: An object is said to be symmetric under a particular transformation (e.g. sliding) if the only way another person can tell that I have performed this transformation on the object is by catching me in the act.

Of course, translations aren’t the only type of symmetry this wallpaper has. I can also rotate it in a couple of ways, and reflect it along different lines. You can visualize a few of these below.

Note that the grid lines I placed are just to make it easier to see that the wallpaper really hasn’t changed at all after each of the transformations is performed on it; to see this, just look at the inside of one of the parallelograms, before and after.

Also, as an exercise, you might try to discover which symmetries I was too lazy to implement. They’re certainly not at all there!

Now, let’s say that you had watched me slide the wallpaper behind your back after all. There is no need for worry; you can always ‘undo’ my prank. In this case, you could just slide the wallpaper back down 1 foot, but if I had rotated it 60{^{\circ}} clockwise, then you could also just rotate it 60{^{\circ}} counterclockwise. This is a particularly nice feature of symmetry transformations: any symmetry can be undone by another symmetry. More formally, every symmetry has an inverse transformation, which is also a symmetry.

Definition: The inverse of a symmetry transformation is just the symmetry transformation which undoes it by bringing the object back to its initial configuration. Every symmetry has an inverse!

Another nice feature of symmetries is that you can always combine them. If I give you two different symmetry transformations, you can always obtain a third by doing one and then doing the other, one after another. For example, one can combine a 60{^{\circ}} rotation with itself to obtain a 120{^{\circ}} rotation. This is usually referred to as composition of symmetries, but I will sometimes also use the word multiplication synonymously for reasons that will become clear in a moment.

Finally, a trivial but important observation is that just leaving the wallpaper alone is a symmetry transformation as well. We’ll call this the identity symmetry.
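The composition, inverse, and identity properties above can be checked concretely with rotation matrices. This is a sketch of my own (the post works with wallpapers, but any rotations will do): composing two 60{^{\circ}} rotations gives a 120{^{\circ}} rotation, and a {-60^{\circ}} rotation undoes a 60{^{\circ}} one, leaving the identity behind.

```python
import math

def rotation(degrees):
    """2x2 matrix rotating the plane counterclockwise by the given angle."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def compose(A, B):
    """Matrix product: perform transformation B, then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-9):
    """Compare two matrices entry by entry, up to floating-point error."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

# Two 60-degree rotations combine into a single 120-degree rotation...
assert close(compose(rotation(60), rotation(60)), rotation(120))
# ...and rotating by -60 degrees undoes a 60-degree rotation,
# giving back the identity symmetry (rotation by 0 degrees).
assert close(compose(rotation(-60), rotation(60)), rotation(0))
```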

An aside

OK, maybe I was being a bit coy in suggesting the exercise above. There are actually infinitely many symmetries of this wallpaper. For example there are infinitely many points about which I can rotate the wallpaper and leave it unchanged. But hope is not lost; it turns out that every symmetry of the wallpaper can be obtained by taking combinations of some finite collection of symmetries. So the modified exercise might be: what finite collection of symmetries generates all the rest?

Summary: We’ve defined a symmetry to be a transformation of an object that brings the object back to itself, so that the only way someone could tell that a transformation had been performed at all is if they observed it happening. In the case of a wallpaper, we are free to slide, rotate, and reflect it along different axes, and because the pattern repeats, the wallpaper afterwards will be in the exact same configuration as before we transformed it.

2.2. Groups: the algebra of symmetries

This previous discussion suggests a more abstract, algebraic definition of symmetries. Let’s capture the main properties of symmetry that we explored above in a mathematical definition.

Definition: A group {(G,\star)} is a collection of objects (called elements), {G}, along with a way of multiplying them, {\star}, satisfying the following properties:

1) Closure: For any two objects {g} and {h} in the group, their product {g\star h} is also in the group.

2) Identity: There is an object in the group, {e}, which acts as the number {1} does in the sense that multiplying it with any other object, {g}, gives that same object back:

\displaystyle e\star g = g\star e = g

3) Invertibility: Every object {g} can be undone, or inverted, in the sense that there is always some other object, {g^{-1}}, in the group which cancels it:

\displaystyle g\star g^{-1} = g^{-1}\star g = e

4) Associativity: When multiplying three objects {g,h,} and {k} together, it does not matter whether you multiply the product of the first and second with the third, or multiply the first with the product of the second and third:

\displaystyle g\star (h\star k) = (g\star h) \star k

The idea here is that each element of a group corresponds to some kind of symmetry transformation. In this way, we are able to discuss the algebra of symmetries without lugging around a bunch of ideas having to do with geometry. Algebra is easy!

This is what math is. We found some kind of interesting phenomenon in the real world (symmetry) and we captured its essence in an abstract definition (groups). Now any statement we make about groups is really a statement about symmetry, and in this way, we’ve provided a framework that affords us the ability to rigorously investigate the properties of symmetry. So let’s explore this rich structure with some examples of groups.

Here’s an especially simple group to get us started: it has only the elements {1} and {-1} in it and the group operation is just ordinary multiplication. We can summarize this in a group table.

What kind of symmetry does this correspond to? Well, human symmetry for one! We can imagine {-1} as abstractly representing a reflection which swaps our right and left half, and {1} as not doing anything. As a sanity check, we know that if I perform two reflections on you, one after another, you will return to your normal unreflected self. In mathematical terms, this is the statement you learned in 5th grade,

\displaystyle (-1)\cdot (-1) = 1.

Said in a group theoretic spirit, reflecting twice is the same thing as doing nothing to you! This group, called {\mathbb{Z}_2}, shows up all over the place in physics — for example some theories are symmetric under the reversal of time (flip time twice and you end up with time flowing in the direction it started).
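If you like, you can check the four group axioms for {\mathbb{Z}_2 = \{1,-1\}} mechanically. Here is a minimal sketch of my own doing exactly that:

```python
from itertools import product

# The two-element group {1, -1} under ordinary multiplication.
G = [1, -1]
op = lambda g, h: g * h

# 1) Closure: every product lands back in G.
assert all(op(g, h) in G for g, h in product(G, G))
# 2) Identity: 1 leaves every element alone.
assert all(op(1, g) == g and op(g, 1) == g for g in G)
# 3) Invertibility: each element can be cancelled by some element
#    (here each element is its own inverse).
assert all(any(op(g, h) == 1 for h in G) for g in G)
# 4) Associativity: the grouping of a triple product doesn't matter.
assert all(op(g, op(h, k)) == op(op(g, h), k) for g, h, k in product(G, G, G))

# Reflecting twice is the same as doing nothing:
assert op(-1, -1) == 1
```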

There is another group with two elements that may be familiar. This time the group has elements {\overline{0}} and {\overline{1}} (I’m putting bars over these elements just to emphasize the fact that {\overline{1}} is different from the {1} from the previous group), and the multiplication is given by addition modulo 2.

Does this group give us anything new? On the surface it certainly looks different from the previous one we wrote down, but there is a sense in which they’re actually exactly the same. The key to understanding this is that {1,-1} and {\overline{0},\overline{1}} are just symbols — in fact, if we take this new group table defined with addition modulo 2, erase the {+} and write down a {\cdot} in its place, and erase all the instances of {\overline{0}} and {\overline{1}} and write {1} and {-1} in their place respectively, then we recover the exact same group table as before! In other words, the way {\overline{0}} and {\overline{1}} add together is the same way {1} and {-1} multiply.

Definition: Two groups {G_1} and {G_2} are said to be isomorphic if there is a one-to-one correspondence of symbols in {G_1} with symbols in {G_2} such that replacing the symbols which appear in {G_1} with their partners in {G_2} recovers precisely the multiplication table of {G_2}.

This is another common feature of mathematics. When we make definitions, there are often hidden redundancies, multiple objects that on the surface may look different, but are actually exactly the same up to relabeling. It turns out that there is only one group with two elements in it up to isomorphism, and we’ve written it above in two different ways. From now on, we’ll just take isomorphism to be our definition of equality — in other words, we’ll consider two groups to be different if and only if they’re not isomorphic to one another.
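The relabeling argument can be spelled out in a few lines. This sketch (my own) checks the defining property of an isomorphism: combining symbols in one group and then relabeling gives the same answer as relabeling first and then combining in the other group.

```python
from itertools import product

# Group 1: {1, -1} under ordinary multiplication.
mult = lambda g, h: g * h
# Group 2: {0, 1} under addition modulo 2.
add_mod2 = lambda g, h: (g + h) % 2

# The proposed one-to-one correspondence: 0 <-> 1 and 1 <-> -1.
relabel = {0: 1, 1: -1}

# Isomorphism check: adding mod 2 and then relabeling agrees with
# relabeling first and then multiplying, for every pair of elements.
assert all(relabel[add_mod2(g, h)] == mult(relabel[g], relabel[h])
           for g, h in product([0, 1], [0, 1]))
```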

Summary: We captured the essence of symmetry in an algebraic definition, the group, and saw that groups are completely specified by their multiplication tables. We also defined the notion of isomorphism, which is a more useful notion of equality between groups because it disregards artificial differences between groups, like what symbols we choose to write them down with.

2.3. Classification: finding a periodic table of groups

It’s a common principle of math and science that once a definition is made, or a phenomenon observed, we ask ourselves, “What are all the different kinds of objects that fit into this definition?” For example, chemists might be concerned, indirectly at least, with what all the different types of molecules are. Of course this question seems very open-ended and difficult to answer as stated, but maybe there is a simpler question whose answer gets us most of the way there.

We learned in high school chemistry that molecules have atoms as their building blocks, so if we figure out what all the possible elements are, then we’ve made a significant amount of progress in our classification problem. And of course, chemists have given us such a classification — the periodic table of elements!

Mathematicians took a similar approach to answering the question, “What are all the possible finite groups we can write down?” The analog of ‘atoms’ here are called simple groups, which I may also refer to as atomic groups to make the analogy between group theory and chemistry more explicit. It turns out that even by considering these simpler groups, the answer to this question occupied the efforts of mathematicians for the greater part of the last century, and the classification now is spread out over thousands of pages in the literature. Here’s the statement of the classification of finite simple groups:

Theorem: Every finite simple group either belongs to one of 3 families (each family consisting of infinitely many finite simple groups), or is one of the 26 outlier groups, called the sporadic groups.

Of these 26 sporadic groups, one stands out. The Monster group ({\mathbb{M}}) is the largest finite simple sporadic group, weighing in with a whopping {8\times 10^{53}} elements! Imagine writing down the group table for that… In fact, it’s so big and monstrous that it gobbled up all but 6 of the other 25 outlier groups. That is to say, the structure of most of the other sporadic groups can be, in some sense, found within the Monster. One group in particular that can be found inside the belly of the beast and will be relevant for our story later is the Thompson group.

In my head, the role of the Monster group is played by Monstro the Whale (from Pinocchio) and so it naturally follows that the role of the Thompson group be played by Geppetto, who unfortunately spent some time in Monstro’s gastro-intestinal tract. Feel free to discard this silly mnemonic.

Summary: Just like chemists classified matter with the periodic table of elements, mathematicians have classified all the finite groups by considering simple groups. In this classification, there are 26 distinguished sporadic groups, 2 of which will be relevant for our story: the Monster group, which is the largest of the sporadic groups, and the Thompson group, which lives inside the Monster.

2.4. Representations: how groups act in different dimensions

Many of you have probably taken a linear algebra class in college or in high school. This is because, as a branch of mathematics, linear algebra is ubiquitous — it’s enjoyed tremendous application in nearly every discipline you can think of. Why? Because almost everything is approximately linear (for example, zoom in really close on any curve and it will basically look like a line) and linear algebra is an extremely well-developed and well-understood theory.

On the other hand, group theory is in general very difficult. To this end, mathematicians have found it incredibly fruitful to reduce problems of group theory to problems of linear algebra. Though this so-called representation theory is one of my favorite mathematical subjects and is an incredibly useful tool for understanding theoretical physics, we’ll try not to veer too far off course and will just present the minimum amount needed to understand moonshine.

So how exactly do we study groups in terms of linear algebra? Remember that the object that is most central to linear algebra is the matrix:

\displaystyle M = \left(\begin{array}{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,m} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,m} \end{array}\right)

and that two matrices can be multiplied together to produce a third matrix. The exact way this happens is not so important for our purposes, but if you’re curious you can just punch ‘matrix multiplication’ into Google and get millions of results.

One way to think about a representation of a group is as a box which assigns a matrix {M(g)} to every group element g:

One shoves group elements into one side of this box and the box spits back a matrix which is meant to represent that group element. But this isn’t just any old box… the way this box represents elements preserves the basic structure of the group. The high level idea is that the matrices satisfy the same multiplication rules as the group elements themselves, and so we’ve reincarnated certain features of the group in a linear algebraic setting.

In order for this to happen, it shouldn’t matter whether we first combine the group elements and then look at the resulting matrix, or first shove group elements through the box and then combine their corresponding matrices; the result should be the same in either case! Pictorially,

To give a name to this, we say that representations satisfy the homomorphism property. If all this seems hopelessly complicated, I don’t blame you. The important thing to take away though is just that a representation is an assignment of a matrix to each element of the group in a way that captures features of the group that we’re interested in.

Mathematical definition of a representation
A representation of a group {G} is a homomorphism {\rho:G\rightarrow \mathrm{GL}(V)} from the group to the automorphism group of some vector space.

An example of a representation
Since {{\mathbb Z}_2 = \{1,-1\}} is our favorite group, we will give an example of one of its representations to make the above a bit easier to digest. All we need to do to define a representation of {{\mathbb Z}_2} is define two matrices associated with {1} and {-1}, {M(1)} and {M(-1)}, and make sure they have the homomorphism property. I claim that the assignment

{M(1) = \left(\begin{array}{rr} 1 & 0 \\   0 & 1 \end{array}\right),  \ \ \   M(-1) = \left(\begin{array}{rr} 0 & 1 \\   1 & 0 \end{array}\right)}

constitutes an honest-to-God representation!

We described above how {(-1)\cdot (-1) = 1}. So for example, the homomorphism property demands that {M(-1)\times M(-1) = M(1)}. If you know how to multiply matrices, then you can easily verify that this is true:

\displaystyle M(-1)\times M(-1) = \left(\begin{array}{rr} 0 & 1 \\ 1 & 0 \end{array}\right)\times\left(\begin{array}{rr} 0 & 1 \\ 1 & 0 \end{array}\right) = \left(\begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array}\right)

\displaystyle M(1) =\left(\begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array}\right)

Noticing that the right hand sides of both lines are the same is all it takes to prove that {M(-1)\times M(-1) = M(1)}. In principle, one would need to go through all combinations of group elements and verify that they satisfy the homomorphism property, but since {(-1)\cdot (-1) = 1} is the only interesting case we’ll just leave it at that.
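The same check can be done by computer. Here is a short sketch (my own) that verifies the homomorphism property for every pair of group elements, not just {(-1)\cdot(-1)=1}:

```python
import numpy as np

# The representation from the text: M(1) is the identity matrix,
# and M(-1) is the matrix that swaps the two coordinates.
M = {1: np.array([[1, 0], [0, 1]]),
     -1: np.array([[0, 1], [1, 0]])}

# Homomorphism property: multiplying the group elements first, or the
# matrices first, must give the same answer for every pair.
for g in (1, -1):
    for h in (1, -1):
        assert np.array_equal(M[g] @ M[h], M[g * h])
```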

If you’ve made it this far, pat yourself on the back. We have a little farther to go, but it won’t be long before we’ve reached the moonshine.

So once again, notice that we’ve defined another abstract structure: the representation. As with groups, we’re led to ask a natural question: “What are all the representations I can write down of a particular group?” In the example above, we wrote down one representation of {{\mathbb Z}_2}. What do all the other representations of {{\mathbb Z}_2} look like?

Remember that in asking the analogous question for molecules, chemists studied the atom. Here, we have the exact same thing. There is a notion of an irreducible representation; the irreducible representations of a group constitute those representations out of which all the others can be constructed, just as atoms constitute the building blocks out of which all other molecules can be constructed.

Now, the chemists invented the periodic table of elements to further assist in their study of matter. Can we come up with an analogous table for representations of a particular group? The answer is yes, and it will be extremely useful for us when we encounter moonshine later on.

It turns out that there is an incredible amount of information encoded in the trace of a matrix, which is defined as the sum of the elements along the diagonal. So for example,

\text{trace of } \left(\begin{array}{rrr} 2 & 3 & 4 \\ 5 & 6 & 0 \\ 1 & 8 & 8 \end{array}\right) = 2+6+8=16

There is enough information baked into the trace that we will use it to characterize representations entirely.

Here’s what we’ll do: let’s say we’re given some group and we want to create its representation table. Each row will correspond to an irreducible representation {M}, and each column will correspond to a group element {g}. Each entry in the table will be the trace of the matrix {M(g)} assigned to that group element under that representation. In terms of the box analogy, the columns correspond to group elements, and the rows to different boxes. Each entry in the table is then the trace of the matrix that you get out when you shove that group element into that box. Simple enough, no?

For example, here’s the first bit of the representation table of the Monster group.

The first irreducible representation (or first box) I denote with an {M_1}, the second irreducible representation is denoted with an {M_2}, etc. As a final piece of terminology, we’ll define the dimension of a representation to be the size of the matrices that it maps group elements to. The dimension of the representation is equivalently the trace of the matrix assigned to the identity element, 1A, of the group. So the dimension of the second irreducible representation is 196883. Remember this number!
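For the tiny group {{\mathbb Z}_2} the whole representation table can be worked out by hand. Here is a sketch of my own (not from the Monster’s table!) that computes the trace from the earlier example and illustrates how a reducible representation’s traces are sums of irreducible ones, column by column:

```python
import numpy as np

# The trace example from the text: sum of the diagonal entries.
assert np.trace(np.array([[2, 3, 4], [5, 6, 0], [1, 8, 8]])) == 16

# Traces of the 2-dimensional representation of Z_2 defined earlier.
M1 = np.array([[1, 0], [0, 1]])   # M(1)
Mm1 = np.array([[0, 1], [1, 0]])  # M(-1)

# The dimension of a representation is the trace of the identity's matrix.
assert np.trace(M1) == 2

# Z_2 has two irreducible representations, both 1-dimensional:
trivial = {1: 1, -1: 1}   # sends every element to 1
sign    = {1: 1, -1: -1}  # sends -1 to -1

# Our 2-dimensional representation is built from these irreducible pieces:
# its traces are the sums of the irreducible traces, column by column.
assert np.trace(M1) == trivial[1] + sign[1]     # 2 = 1 + 1
assert np.trace(Mm1) == trivial[-1] + sign[-1]  # 0 = 1 + (-1)
```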

Summary: To learn more about groups, we decided to cast their study into the framework of linear algebra. We defined a representation of a group simply as an association of a matrix to every element of the group in a special way that preserves the group structure. In asking what all the representations of a group are, we learned about its irreducible representations, which are the representations out of which all others can be built. We summarized this information in a representation table. Every entry of this table is the trace of some matrix — if we are in the column corresponding to group element {g} and representation {M} then the matrix is the one that {g} maps to under the representation {M}. We will see these important numbers come up in the study of moonshine.

3. Numbers

If I were asked what I think the most difficult branch of mathematics is, I would, without hesitation, say that it’s number theory. I always found it strange that the most elementary concepts (adding, counting, etc.) can often be the hardest to study. For example, one of the most notoriously challenging problems mathematicians faced for hundreds of years was the proof of Fermat’s Last Theorem. Yet the statement of the problem is so simple that one could explain it to a clever middle school student!

The main actors from number theory that will be useful for understanding moonshine are the modular forms. However, they’re abstract, absurdly so, and in order to get you motivated to study them, I want to convince you of their importance by discussing one or two of their innumerable applications. In the process, we’ll learn some cute bits of history, and you’ll have a story or two to tell at your next dinner party!

3.1. Fermat: the math world’s biggest troll?

To get us started, let’s review a bit of middle school math.


Recall that the lengths of the sides of a right triangle are related by a simple formula,

\displaystyle a^2 + b^2 = c^2,

where of course, {c} is the hypotenuse (the diagonal). One can ask, “Are there integer solutions to this equation?” The answer is of course yes! For example,

\displaystyle 3^2 + 4^2 = 5^2.

Another famous one is obtained by choosing {(a,b,c) = (5,12,13)}. These are called Pythagorean triples, and no doubt you’ve had to memorize one or two at some point in your life. Note when we say integer solutions we are excluding cases where {a, b, c} are fractions or have decimals. As suggested by the name, Pythagorean triples have been studied since the Greeks, and even since the Babylonians, and have enjoyed immense importance ever since.
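Pythagorean triples are easy to hunt for by brute force. A quick sketch of my own (the bound of 20 is arbitrary):

```python
# Search for Pythagorean triples a^2 + b^2 = c^2 with small entries.
triples = [(a, b, c)
           for c in range(1, 21)
           for b in range(1, c)
           for a in range(1, b + 1)
           if a * a + b * b == c * c]

# The two triples from the text both show up.
assert (3, 4, 5) in triples
assert (5, 12, 13) in triples
```

The same search with a larger bound keeps producing triples forever; in fact there are infinitely many of them.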

Now what if we asked a possibly harder question? To state the problem, let {n} be some number that is greater than 2 (it could be {3}, {4}, or {5,000,000} if you want). The problem is to find positive integer solutions {a,b,c} to the equation

\displaystyle a^n + b^n = c^n.

Fermat boldly claimed that you cannot! There simply do not exist any positive integers {a,b,c} which satisfy the equation when {n} is greater than 2. In 1637, Fermat wrote this ‘theorem’ in the margin of his copy of the Arithmetica, saying,

It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second, into two like powers. I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.

I put ‘theorem’ in quotations because in order for something to be considered a theorem, it needs to be proven! Fermat did no such thing. In fact, the conjecture eluded mathematicians until 1994 when Andrew Wiles, using totally modern methods and devoting some seven years of his life in total secrecy to solving the problem, finally managed to churn out a proof. This makes it a bit difficult to believe that Fermat actually produced a proof in two minutes sitting in an armchair. I think most historians chalk this ‘marvelous proof’ up to a simple brain fart.
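A computer search can never prove Fermat’s claim (Wiles’s proof covers all integers at once, which no finite search can do), but it’s easy to sanity-check it on small numbers. A sketch of my own for {n=3}:

```python
# Brute-force check of Fermat's claim for n = 3 over a small range.
# This proves nothing about ALL integers -- it's just a sanity check.
n = 3
solutions = [(a, b, c)
             for a in range(1, 51)
             for b in range(a, 51)
             for c in range(b, 101)
             if a**n + b**n == c**n]

# As Fermat claimed, the search comes up empty.
assert solutions == []
```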

Nonetheless, it will be insightful for our purposes to analyze Andrew Wiles’s successful proof of Fermat’s Last Conjecture. The main engine which drove the proof forward was what has now become known as the modularity theorem. In layman’s terms, the modularity theorem established a correspondence between two branches of mathematics which were previously thought to be unrelated to each other. The first branch is the study of elliptic curves, like the one below.


In general, we can just think of an elliptic curve as the graph of an equation of the form

\displaystyle y^2 = x^3+ax+b

which also satisfies some other properties which are irrelevant for our purposes.
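To make this concrete, here is a sketch of my own with a specific curve, {y^2 = x^3 - 2} (my choice — the post doesn’t pick one). One of the “other properties” alluded to is smoothness, which is commonly stated as the nonvanishing of the discriminant quantity {4a^3+27b^2}:

```python
# A concrete elliptic curve: y^2 = x^3 + a*x + b with (a, b) = (0, -2),
# i.e. y^2 = x^3 - 2. (My example, not from the post.)
a, b = 0, -2

def on_curve(x, y):
    """Check whether the point (x, y) satisfies the curve equation."""
    return y * y == x**3 + a * x + b

# (3, 5) is an integer point on this curve: 25 = 27 - 2.
assert on_curve(3, 5)

# Smoothness condition: the discriminant quantity 4a^3 + 27b^2 must be nonzero.
assert 4 * a**3 + 27 * b**2 != 0
```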

The other branch of mathematics is the study of modular forms, one of the main number theoretic actors in our drama (we’ll save the exact definition of what these are for the next section). Then this modularity theorem can be stated as follows:

Theorem: To each elliptic curve one can associate a unique modular object.

Sounds simple enough, but in fact the proof of this theorem was thought to be essentially impossible, or at least beyond the reach of mathematics (this was around the 1980s).

But what is the relevance of the modularity theorem to Fermat’s Last Conjecture? The answer is that mathematicians already knew a ‘proof by contradiction’ of Fermat’s Last Conjecture, but it relied on the modularity theorem being true. In other words, if one proved the modularity theorem, one would obtain Fermat’s Last Theorem as a corollary.

In a proof by contradiction, one assumes the opposite of what they want to prove, and demonstrates that it leads to an absurdity or a contradiction. For example, Sherlock Holmes may argue, “There is video footage of the suspect in his own home 20 minutes before the crime was committed. Assuming he murdered John Doe as you claim, investigator, then he would need to have driven his car from his home to the crime scene at an average of 250 miles per hour! This is an absurdity because even the fastest car in the world is not capable of traveling at these speeds, especially in New York traffic. Therefore, I claim, he is innocent!” Applying a similar argument structure to Fermat’s Last Conjecture, if one assumes that there are integer solutions to the equation {a^n + b^n = c^n} for some {n} greater than 2 (this of course being the opposite of Fermat’s Last Conjecture) then it can be shown that there exists an elliptic curve to which one cannot associate a unique modular object, which is absurd if you believe that the modularity theorem is true. In other words, if Fermat’s Last Conjecture is not true, then neither is the modularity theorem!

Thus, if one could prove the modularity theorem, then one would obtain also a proof of Fermat’s Last Conjecture. This is precisely what Andrew Wiles accomplished in 1994.

Summary: Modular forms are important!

3.2. Modular forms: the fifth elementary operation of arithmetic

Now, we have seen that modular forms enter into some pretty important mathematics (at least important to the mathematicians). In fact, modular forms show up all over the place, throughout the study of numbers. They’re so ubiquitous that the famous mathematician Martin Eichler is quoted as saying,

There are five elementary arithmetical operations: addition, subtraction, multiplication, division, and… modular forms.

So, if you will allow me, I can finally get around to explaining what they actually are.

Simply stated, a modular form is a function which enjoys a high degree of internal symmetry. Different modular functions enjoy different amounts of symmetry, so it is convenient to record this information in the form of a group whenever we talk about a modular function. Let’s make a (very simplified) definition.

Loose definition: We say that a function is a weightless modular function with symmetry group {\Gamma} if it is symmetric with respect to all the symmetry transformations inside the group {\Gamma}.

If you need a refresher on group theory, go back and read the previous section. Whereas before we were talking about geometric objects as having associated symmetry groups (humans had {\mathbb{Z}_2}-symmetry), now we are talking about mathematical objects, like functions, having symmetry groups. There are still ways to visualize this kind of symmetry, but for the sake of getting to the end of the story, we won’t pursue them here. However, I will say that it leads to pretty pictures like this one:


For those who are interested, we give a (slightly) more rigorous definition below.

Definition of a weightless modular function
Define the upper half plane {\mathfrak{h}} to be the set of complex numbers with positive imaginary part. A function {f:\mathfrak{h}\rightarrow {\mathbb C}} is called a weightless modular function with symmetry group {\Gamma} if it is meromorphic in the upper half plane, obeys some growth conditions, and, for every matrix {\left(\begin{array}{rr}a & b \\ c & d\end{array}\right)} in {\Gamma}, obeys the transformation equation

\displaystyle f\left(\frac{a\tau+b}{c\tau + d}\right) = f(\tau).

Now, let’s denote the collection of all weightless modular functions with symmetry group {\Gamma} as {\mathrm{Mod}(\Gamma)}. In other words, {\mathrm{Mod}(\Gamma)} is just a set — the elements of this set are functions which are symmetric at least under all the symmetry transformations in the group {\Gamma}.

As it turns out, if the group {\Gamma} is specially chosen, we can find one distinguished modular function inside of {\mathrm{Mod}(\Gamma)} from which all other modular functions in {\mathrm{Mod}(\Gamma)} can be generated. We will call such a special group a genus zero group and such a distinguished modular generating function a hauptmodul.

Rigorous definition of a hauptmodul for a genus zero group
A function {j\in \mathrm{Mod}(\Gamma)} is called a hauptmodul for {\Gamma} if any function {f\in \mathrm{Mod}(\Gamma)} can be written as a rational function in {j}, i.e.

\displaystyle f(\tau) = \frac{p(j(\tau))}{q(j(\tau))}

where {p(X)} and {q(X)} are both polynomials.

We’re almost to the juicy part. Many of you who studied calculus in high school or college may remember what a Taylor series is (or the closely related Fourier series). If not, don’t worry… the basic idea is easy enough to picture. The intuition behind Fourier series is that any wave or signal can be decomposed into its ‘tones’. For example, if I recorded an orchestra playing a chord and performed Fourier analysis on the resulting sound wave, it would tell me how loud the {D} is, how loud the {F\#} is, etc. If I wanted to reconstruct the original wave, I could just take a single {D} wave and an {F\#} wave and combine them in the right proportions, dictated by how loud each part is. Pictorially, Fourier analysis would decompose the red signal into the blue ones:


If I wanted to obtain the red one from the blue ones, I would just add them together. Mathematically, it might look something like this:

\displaystyle f(\tau) = a_0 + a_1q + a_2q^2 + a_3 q^3 + \cdots

Here, each term {a_nq^n} is a ‘single frequency wave’ (like the wave corresponding to the note {F\#}; for modular functions, {q = e^{2\pi i\tau}}, so {a_nq^n} really is an oscillating wave in {\tau}) and {f(\tau)} is the original signal, which we are writing down as a combination of a bunch of single frequency waves. Indeed, we can play this game for modular functions just as well as we can play it for sound waves, and we will do just this in the next section.
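Since we’ll lean on this ‘tones’ picture repeatedly, here is a minimal numerical sketch of it (the frequencies and loudnesses are made up for illustration): we build a signal out of two tones and use the FFT to recover how loud each one is.

```python
import numpy as np

# Build a signal out of two 'tones' (frequencies/amplitudes invented
# for illustration), then recover each tone's loudness with the FFT.
N = 1024
t = np.arange(N) / N                     # one period, sampled N times
signal = 3.0 * np.sin(2 * np.pi * 5 * t) + 1.5 * np.sin(2 * np.pi * 12 * t)

# rfft bin k holds the component oscillating k times per period;
# scaling by 2/N converts it back to the tone's amplitude
amps = np.abs(np.fft.rfft(signal)) * 2 / N

assert abs(amps[5] - 3.0) < 1e-9         # the k = 5 tone has loudness 3.0
assert abs(amps[12] - 1.5) < 1e-9        # the k = 12 tone has loudness 1.5
```

Adding the two recovered tones back together in these proportions reproduces the original signal, which is exactly the reconstruction described above.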

Summary: Modular forms are highly symmetric functions which appear all over the place. When considering the collection of all modular functions which are symmetric with respect to a particular symmetry group (we denoted this collection by {\mathrm{Mod}(\Gamma)}) we found that, if {\Gamma} is a genus zero symmetry group, every modular function in {\mathrm{Mod}(\Gamma)} could be written in terms of a special generating function, called a hauptmodul. We will see an example of a hauptmodul in the next section.

4. Moonshine

Alright, the time is finally here. We have most of the tools we need to finally understand moonshine. I was going to end on a cliffhanger and finish this post off next week, but I won’t do that to you… I’ll at least give you a little taste of the moonshine (mathematically), but we’ll have to postpone a discussion of what this has to do with physics until next week.

We ended the last section talking about hauptmoduls for genus zero groups. One very, very important modular function is the hauptmodul for a group called {\mathrm{SL}_2(\mathbb{Z})} — we’ll denote this function with {J(\tau)}. Its graph is very beautiful:

Let’s say we’re interested in learning more about this function. We can write out its ‘tonal’ expansion:

\displaystyle J(\tau) = q^{-1} + 196884q+ 21493760q^2 + 864299970q^3 + 20245856256q^4 + \cdots

There are an infinite number of tones in the expansion, but in the interest of saving cyber forests I’ve decided not to list all of them. The most attentive amongst you might recognize these numbers… if we take a look at the representation table we wrote out for the Monster group,

you’ll notice that we have some nice numerical identities.

\displaystyle  \begin{array}{rcl}  1&=& 1\\ 196884 &=& 196883 + 1 \\ 21493760 &=& 21296876 + 196883 + 1 \\ 864299970 &=& 842609326 + 21296876 + 2 \cdot 196883 + 2\cdot 1 \\ &\vdots& \end{array}

where the numbers on the left appear in the expansion of the {J}-function, and the numbers on the right appear in the table (in fact, they are the dimensions of the irreducible representations of the Monster group). Just the observation that 196884 = 196883 + 1 was bizarre enough that John Conway called the correspondence “moonshine”, which is apparently British slang meaning ‘crazy’ or ‘absurd’.
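These identities are pure arithmetic, so you can check them yourself; here is a quick script doing exactly that, with the coefficients of {J} and the irreducible representation dimensions of the Monster quoted from the text above.

```python
# Coefficients of the J-function versus sums of dimensions of irreducible
# representations of the Monster group (numbers quoted from the text above).
monster_dims = [1, 196883, 21296876, 842609326]
j_coeffs = [196884, 21493760, 864299970]

assert j_coeffs[0] == monster_dims[1] + monster_dims[0]
assert j_coeffs[1] == monster_dims[2] + monster_dims[1] + monster_dims[0]
assert j_coeffs[2] == (monster_dims[3] + monster_dims[2]
                       + 2 * monster_dims[1] + 2 * monster_dims[0])
```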

But the connection goes further. There is another genus zero group called {\Gamma_0(2)+} whose hauptmodul is

\displaystyle J_{2+}(\tau) = q^{-1} + 4372q + 96256q^2 + 1240002q^3 + \cdots

Can you guess how these numbers are related to the Monster group? Notice that if you take the numbers 1 and 196883 and look at the corresponding numbers one column to the right in the Monster representation table, you get 1 and 4371 respectively. In fact, this works in general: if you just scoot one column over to the right, you’ll get an analogous list of identities:

\displaystyle  \begin{array}{rcl}  1&=& 1\\ 4372 &=& 4371 + 1 \\ 96256 &=& 91884 + 4371 + 1 \\ 1240002 &=& 1139374 + 91884 + 2 \cdot 4371 + 2\cdot 1 \\ &\vdots& \end{array}
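Again, these are checkable by machine, using the numbers from the 2A column quoted above:

```python
# Coefficients of the hauptmodul of Gamma_0(2)+ versus the numbers in the
# 2A column of the Monster representation table (quoted from the text above).
dims_2A = [1, 4371, 91884, 1139374]
coeffs_2plus = [4372, 96256, 1240002]

assert coeffs_2plus[0] == dims_2A[1] + dims_2A[0]
assert coeffs_2plus[1] == dims_2A[2] + dims_2A[1] + dims_2A[0]
assert coeffs_2plus[2] == (dims_2A[3] + dims_2A[2]
                           + 2 * dims_2A[1] + 2 * dims_2A[0])
```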

We did not develop enough tools to present the full Monstrous moonshine conjectures, but here is what we have so far:

Conjecture: For every single group element {g} of the Monster group, we can associate some genus zero group {\Gamma_g} whose hauptmodul we will denote {J_g}. This association is special because the tones of the hauptmodul,

\displaystyle J_g = q^{-1} + a_1q + a_2q^2 + a_3q^3 + \cdots

are related to the numbers which appear in the column of the Monster representation table which is associated with the element {g}.

For example, if we choose {g = 1A}, the identity element, then the genus zero group is

\displaystyle \Gamma_{1A} = \mathrm{SL}_2({\mathbb Z})

as we discussed above; its hauptmodul is just the ordinary {J}-function,

\displaystyle J_{1A}(\tau) = J(\tau) = q^{-1} + 196884q + \cdots

and its tones are related to the numbers which appear in the first column of the Monster representation table, which is the column corresponding to the 1A group element.

It’s important to note that the exact Monstrous moonshine conjectures are not only more precisely stated, but they say much more about what is going on than the above does. The entire discussion so far was just a way of giving a taste of the mathematical objects which are involved. In a week or so, we’ll see that the bridge which connects the {J} function to the Monster group is string theory.

***Gauge theory, the Aharonov-Bohm effect, and Dirac monopoles

Suggested background: modern physics and quantum mechanics, basic electricity and magnetism
I thought I’d kick off my blog with some beautiful physics that I learned in a quantum mechanics course during my junior year: Dirac’s argument that, if magnetic monopoles happen to exist, they must have quantized charge. Dirac’s idea was one of many insightful arguments (some of which we’ll see in future posts) motivating this so-called quantization condition. I haven’t been able to find a resource which gives you Dirac’s story in its entirety, at least not in any accessible way. So this post will be a long, somewhat ridiculous journey during which we’ll learn some gauge theory and also the related Aharonov-Bohm effect, each a jewel in its own right. You’ll probably laugh, you’ll probably cry, and hopefully by the end you’ll have a newfound appreciation for how ludicrous Dirac was.

Here’s the plan:

1) I’ll tell you what magnetic monopoles are.

2) I’ll introduce the relevant electricity & magnetism and gauge theory, as well as how the two manifest in quantum mechanics.

3) I’ll show how this manifestation leads to the Aharonov-Bohm effect, a quantum phenomenon in which the vector potential acquires a physical status it did not enjoy in classical mechanics; we’ll see how the vector potential itself can physically impact a system through the wave function, even when the electric and magnetic fields are {0}. To get a quantitative feel for this effect, we’ll study how a solenoid, whose fields vanish outside of it, can affect the interference pattern observed in a traditional two-slit experiment.

4) I’ll justify how an infinitesimally thin solenoid with one end at infinity and the other at the origin can reproduce the field of a magnetic monopole. We’ll derive the quantization condition by asserting that, in order for such a solenoid to be indistinguishable from a magnetic monopole, there should be no experiment we can perform that will allow us to detect the solenoid. Bearing in mind the way in which solenoids can impact the two-slit experiment, we’ll show that the solenoid current, and thus the magnetic charge, must take on only discrete, quantized values if the solenoid is to escape detection via the Aharonov-Bohm effect.

If the details don’t make sense to you yet, don’t worry. Just reread the agenda after every section to reorient yourself to the larger task at hand and the logical flow should become apparent to you.

Magnetic monopoles

Before we chase down this bizarre argument, it’d be nice to know what magnetic monopoles are. In short, magnetic monopoles are hypothetical particles that carry magnetic charge in the same way electrons carry electric charge. The main reasons they remain ‘hypothetical’ are that they are 1) supposedly massive enough that their production requires more energy than available in current particle accelerators, and 2) also rare enough in the universe that the probability of their detection is vanishingly small. How convenient!

Still, most physicists would bet money that they’re real. Relatively recently, work in grand unified theories and quantum gravity has produced evidence in favor of their existence, but we will content ourselves with a more obvious argument by beauty.

Let’s look at Maxwell’s equations in the presence of sources, as they’re traditionally written down. In some units, they’re

\displaystyle \vec{\nabla}\cdot\vec{B} = 0, \ \vec{\nabla}\cdot\vec{E} = 4\pi\rho_e

\displaystyle \vec{\nabla}\times\vec{E} + \frac{1}{c}\frac{\partial\vec{B}}{\partial t} = 0, \ \vec{\nabla}\times\vec{B} - \frac{1}{c}\frac{\partial\vec{E}}{\partial t} = \frac{4\pi}{c}\vec{J}_e

One might notice that the equations on the left look exactly like those on the right, except for the fact that {0} on the left replaces {4\pi\rho_e} and {\frac{4\pi}{c}\vec{J}_e} on the right. This is precisely due to the fact that there is no magnetic charge. Just as the presence of electric charges gives us a non-zero divergence of the electric field, so too would the presence of magnetic charges lead to the non-zero divergence of the magnetic field. These equations are practically begging to be symmetrized, so let’s introduce some monopoles:

\displaystyle \vec{\nabla}\cdot\vec{B} = 4\pi\rho_m, \ \vec{\nabla}\cdot\vec{E} = 4\pi\rho_e

\displaystyle \vec{\nabla}\times\vec{E} + \frac{1}{c}\frac{\partial\vec{B}}{\partial t} = -\frac{4\pi}{c}\vec{J}_m, \ \vec{\nabla}\times\vec{B} - \frac{1}{c}\frac{\partial\vec{E}}{\partial t} = \frac{4\pi}{c}\vec{J}_e

(The relative minus sign on the magnetic current term is the conventional choice, and it makes the duality below come out cleanly.) One can easily check that making the field substitutions

\displaystyle \vec{E}\rightarrow \vec{B}, \ \vec{B}\rightarrow -\vec{E}

and also the source substitutions

\displaystyle \rho_e\rightarrow \rho_m, \ \vec{J}_e\rightarrow \vec{J}_m, \ \rho_m\rightarrow -\rho_e, \ \vec{J}_m\rightarrow -\vec{J}_e

(with minus signs mirroring those in the field substitutions)

leaves the modified Maxwell’s equations invariant; this is known as electric-magnetic duality. Since a symmetric theory is a more beautiful theory (at least to most), this is usually used as a simple amuse-bouche for why we should take magnetic monopoles seriously. But Dirac’s argument doesn’t have much to do with their existence–instead, it illustrates how assuming their existence leads to the quantization of charge, so let’s get into the machinery we need to understand this.
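Here is a sketch of that check in sympy. One subtlety on signs: for the substitution to close exactly, Faraday’s law needs the conventional relative minus sign on the magnetic current, and the sources pick up minus signs mirroring the fields; the code below adopts that convention.

```python
import sympy as sp

x, y, z, t, c = sp.symbols('x y z t c')

def vec(name):
    # a generic vector field: three undetermined functions of (x, y, z, t)
    return sp.Matrix([sp.Function(name + str(i))(x, y, z, t) for i in range(3)])

div = lambda V: sp.diff(V[0], x) + sp.diff(V[1], y) + sp.diff(V[2], z)
curl = lambda V: sp.Matrix([
    sp.diff(V[2], y) - sp.diff(V[1], z),
    sp.diff(V[0], z) - sp.diff(V[2], x),
    sp.diff(V[1], x) - sp.diff(V[0], y)])

def maxwell(E, B, rho_e, rho_m, Je, Jm):
    # the monopole-extended Maxwell equations, written as expressions that
    # vanish when the equations hold (magnetic current with the conventional
    # minus sign in Faraday's law)
    return ([div(B) - 4*sp.pi*rho_m, div(E) - 4*sp.pi*rho_e]
            + list(curl(E) + sp.diff(B, t)/c + 4*sp.pi*Jm/c)
            + list(curl(B) - sp.diff(E, t)/c - 4*sp.pi*Je/c))

E, B, Je, Jm = vec('E'), vec('B'), vec('Je'), vec('Jm')
rho_e = sp.Function('rho_e')(x, y, z, t)
rho_m = sp.Function('rho_m')(x, y, z, t)

orig = [sp.expand(eq) for eq in maxwell(E, B, rho_e, rho_m, Je, Jm)]
# duality: (E, rho_e, Je) -> (B, rho_m, Jm), (B, rho_m, Jm) -> -(E, rho_e, Je)
dual = maxwell(B, -E, rho_m, -rho_e, Jm, -Je)

# every dualized equation is (up to an overall sign) one of the originals
pool = orig + [sp.expand(-eq) for eq in orig]
assert all(sp.expand(eq) in pool for eq in dual)
```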

Electromagnetism preliminaries

First, some reminders: the magnetic field of an ideal, long solenoid with radius {a}, oriented along {\hat{z}}, is given by {\vec{B}(\vec{r}) = \mu_0 i n\,\hat{z}} when {|\vec{r}|\leq a}, and {\vec{0}} otherwise. Here, {\mu_0} is the vacuum permeability, {i} is the current, and {n} is the number of turns per unit length.

Once we know the fields, electricity and magnetism tells you how to calculate forces on particles from them via the force law:

\displaystyle \vec{F} = q(\vec{E} + \vec{v}\times\vec{B})

It is clear from this relation that there can be no electromagnetic influence in a region where {\vec{B}} and {\vec{E}} are identically 0 because the electromagnetic force vanishes then as well!

One often defines the electric scalar potential, {\varphi}, and the magnetic vector potential, {\vec{A}}, by the relations

\displaystyle \vec{B}=\vec{\nabla}\times\vec{A}, \ \vec{E} = -\vec{\nabla}\varphi - \frac{\partial \vec{A}}{\partial t}.

In classical electromagnetism, the potentials, strictly speaking, serve only as a useful stand-in for the fields. Potentials aren’t responsible for anything physical and can’t be measured; the fields do all the heavy lifting. In fact, the alternative description they afford us is ‘redundant’ in the sense that one has some freedom in choosing which potentials to work with. More concretely, if {\varphi} and {\vec{A}} describe the physics at hand, then so do {\varphi'} and {\vec{A}'} given by

\displaystyle \varphi' = \varphi - \frac{\partial\chi}{\partial t}, \ \vec{A}' = \vec{A} + \nabla \chi

where {\chi} is an arbitrary function of position and time. You can verify this as an exercise by showing that

\displaystyle \vec{B'} = \vec{\nabla}\times\vec{A'} = \vec{\nabla}\times\vec{A}=\vec{B}, \text{ and}

\displaystyle \vec{E}' = -\vec{\nabla}\varphi' - \frac{\partial\vec{A}'}{\partial t} =-\vec{\nabla}\varphi - \frac{\partial\vec{A}}{\partial t} = \vec{E}

i.e., the fields described by the two sets of potentials are identical, and thus Maxwell’s equations are invariant under these transformations (hint: use the fact that {\vec{\nabla}\times\vec{\nabla}\chi = 0}). The changes {\varphi\rightarrow\varphi'} and {\vec{A} \rightarrow \vec{A}'} are referred to together as a gauge transformation. The statement that electromagnetism is a gauge invariant theory is precisely the statement that, if you tell me {\varphi} and {\vec{A}} describe the physics of a system, then I can pick any function {\chi} I want and do all the computations using {\varphi'} and {\vec{A}'} prescribed above.
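You can also let a computer grind through this exercise. Below is a sketch with sympy, using made-up potentials and an arbitrary-looking gauge function {\chi}; the fields before and after the gauge transformation come out identical.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

grad = lambda f: sp.Matrix([sp.diff(f, v) for v in (x, y, z)])
curl = lambda V: sp.Matrix([
    sp.diff(V[2], y) - sp.diff(V[1], z),
    sp.diff(V[0], z) - sp.diff(V[2], x),
    sp.diff(V[1], x) - sp.diff(V[0], y)])

# made-up potentials and an arbitrary-looking gauge function chi
phi = x**2 * t
A = sp.Matrix([y * z, -x * t, x * y * z])
chi = sp.sin(x * y) * sp.exp(-t) + z**2 * t

# gauge-transformed potentials
phi_p = phi - sp.diff(chi, t)
A_p = A + grad(chi)

# fields before and after the transformation
B, B_p = curl(A), curl(A_p)
E = -grad(phi) - sp.diff(A, t)
E_p = -grad(phi_p) - sp.diff(A_p, t)

assert sp.simplify(B_p - B) == sp.zeros(3, 1)   # B is unchanged
assert sp.simplify(E_p - E) == sp.zeros(3, 1)   # E is unchanged
```

The cancellation works for exactly the reason in the hint: mixed partial derivatives of {\chi} commute.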

In quantum mechanics, the Hamiltonian of a system is impacted by the presence of electromagnetic potentials:

\displaystyle H(\vec{A},\varphi) = \frac{1}{2m}\left[\vec{p} - q\vec{A} \right]^2 + V+q\varphi.

If you want a gauge invariant quantum theory of electromagnetism, you need to do more than just transform the potentials; you need to subject the wave function to a transformation as well. You should check that, if {\psi} is a solution to the Schroedinger equation

\displaystyle H(\vec{A},\varphi)\psi = E\psi

then {\psi'} is a solution to the Schroedinger equation

\displaystyle H(\vec{A}',\varphi')\psi' = E\psi'

where the gauge transformation is now

\displaystyle \varphi' = \varphi - \frac{\partial\chi}{\partial t}, \ \vec{A}' = \vec{A} + \nabla \chi, \ \psi' = e^{iq\chi/\hbar}\psi.

Of course, {\psi'} differs from {\psi} only by a phase factor and so represents the same physical state, since all probabilities are preserved; we’ve thus discovered a truly gauge invariant formulation.

To summarize the above, all we did was describe a way of changing problems in electricity and magnetism while retaining the physics–we have some freedom in choosing the potentials (the wavefunction must come along for the ride if we’re working in quantum mechanics) and we can exploit this as a computational tool.

As one final remark, I ask that you recall a theorem of vector calculus: if {\vec{\nabla}\times \vec{A} = 0} in any simply connected region (i.e. regions where all loops can be continuously contracted to a point), then you can write {\vec{A}} in that simply connected region as {\vec{A} = \vec{\nabla}\chi} for some function {\chi}.

If you’re not familiar with what it means for a space to be simply-connected, it’s ok. We only really care about a couple examples. Your run of the mill Euclidean space is pretty obviously simply-connected because you can take any loop anywhere and shrink it until it’s a point. However, if you consider all of Euclidean space, but then remove the {\hat{z}}-axis, then a circle in the {\hat{x}}{\hat{y}} plane cannot be contracted to a point continuously, because you would need to pull it through the missing {\hat{z}}-axis.

This is actually all we need to start understanding Dirac’s argument.

Aharonov-Bohm effect

In 1959, Aharonov and Bohm demonstrated that, in contrast to classical electricity and magnetism, there actually can be quantum mechanical e&m effects on particles which never travel through a region where there are fields. Here, I’ll basically follow Griffiths’ approach (which you should take a look at), but in the interest of showing you something new, I’ll try to solve everything Griffiths solves in a different way. We’ll look at two systems: the quantum ring, for which I’ll flex gauge theory’s muscles a bit, and the modified two-slit experiment, for which I’ll use Feynman’s path integral formulation. For the sake of continuity, I recommend you look at these two methods only after you’ve read the rest of this post (or at least read through Griffiths’ approach).

To start with, consider an electron constrained to move on a ring of radius {R} in the plane (I’ll commonly refer to this system as the quantum ring). This is a pretty easy system to solve the Schrodinger equation for, so I recommend you try it yourself. Here, I’ll just tell you that the eigenstates and eigenenergies are

\displaystyle \psi_n(\theta) \propto e^{\pm in\theta}, \ E_n = \frac{\hbar^2 n^2}{2mR^2}, \ n\in\mathbb{Z}.

Now imagine sticking a long solenoid with radius {a<R} inside the ring (I’ll refer to this new system as the solenoidal quantum ring). If we followed classical intuition, we might argue that nothing should change, since the fields outside of the solenoid all vanish and the particle can never travel inside the solenoid where the fields are non-zero. However the eigenvalue problem does change:

\displaystyle E_n = \frac{\hbar^2}{2mR^2}\left(n-\frac{q\Phi}{2\pi\hbar}\right)^2, \ n\in \mathbb{Z}.

The free particle cares about the flux through the solenoid–the vector potential in this context can be interpreted as having some sort of physicality.

Gauge theoretic derivation of solenoidal quantum ring

The good news is that the magnetic field is zero outside of the solenoid, which is where our particle lives. This means that {\vec{B} = \vec{\nabla}\times \vec{A} = 0} there as well. However, we are not totally safe to write {\vec{A} = \vec{\nabla}\chi}, because the region in which {\vec{\nabla}\times\vec{A}=0} is not simply-connected. The issue is that loop-like paths that wind around the solenoid can’t be contracted to a point. We can eliminate these loopy paths from our region entirely by only considering the region with {\theta\in(-\pi+\epsilon,\pi-\epsilon)} and taking the limit as {\epsilon\rightarrow 0}. In other words, we are taking scissors to a point on the ring on which our particle lives and then bringing the ends closer and closer together until they’re basically connected.

So in the region outside the solenoid where {\theta\in(-\pi+\epsilon,\pi-\epsilon)}, we can write {\vec{A} = \vec{\nabla}\chi} for some function {\chi}. Here, the line integral between any two points {a} and {b} will be given by

\displaystyle \int_{a}^b\vec{A}\cdot d\vec{l} = \chi(b)-\chi(a).

Using Stokes’ theorem, we also know that

\displaystyle \oint\vec{A}\cdot d\vec{l} = \iint_{S}\vec{B}\cdot d\vec{A} = \Phi

where {\Phi} is the magnetic flux through the solenoid. Consider the line integral around our ring, and break it up into two integrals–one restricted to the region we defined above, and the other integrating over what remains. Then we have that

\displaystyle \Phi = \oint\vec{A}\cdot d\vec{l} = \int_{-\pi+\epsilon}^{\pi-\epsilon}RA_{\theta}d\theta + \int_{\pi-\epsilon}^{-\pi+\epsilon}RA_{\theta}d\theta.

If we take the limit as {\epsilon\rightarrow 0}, the second term will vanish because the limits of integration become closer and closer together. Rewriting the left hand term gives us

\displaystyle  \Phi = \lim_{\epsilon\rightarrow 0}\chi(\pi-\epsilon)-\chi(-\pi+\epsilon)

and so we can write some {\chi} which satisfies this constraint, the simplest choice being

\displaystyle \chi(\theta) = \frac{\theta}{2\pi}\Phi.

Keeping all that in mind, we can choose a gauge transformation which turns the solenoidal quantum ring into the traditional quantum ring. We’ll let

\displaystyle \vec{A}\rightarrow\vec{A}'= \vec{A}-\vec{\nabla}\chi = 0, \ \psi\rightarrow\psi' = e^{-iq\chi/\hbar}\psi.

Now {\psi'} will obey the free particle on a ring equation. The caveat is that since {\chi} is not single-valued at {\theta=\pi}, {\psi'} will not be either, so we have to come up with alternative boundary conditions to enforce. Let’s derive those boundary conditions quickly before we get into solving the eigenenergy problem.

Using the definition of {\psi'}, we get that

\displaystyle \psi'(\pi-\epsilon) = e^{-iq\Phi(\pi-\epsilon)/(2\pi\hbar)}\psi(\pi-\epsilon),\text{ and}

\displaystyle \psi'(\epsilon-\pi) = e^{+iq\Phi(\pi-\epsilon)/(2\pi\hbar)}\psi(\epsilon-\pi)

so taking the limit as {\epsilon\rightarrow 0} and requiring that the original wave function we had before the gauge transformation be single-valued (i.e. {\psi(\pi) = \psi(-\pi)}), we get

\displaystyle \psi'(\pi) = e^{-iq\Phi/\hbar}\psi'(-\pi)

Now we’re ready to solve the particle on a ring problem. Using the Wikipedia formula

\displaystyle \nabla^2f = \frac{1}{r}\frac{\partial}{\partial r}\left( r \frac{\partial f}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2 f}{\partial\theta^2} + \frac{\partial^2 f}{\partial z^2}

and recognizing that the wavefunction of our particle can be parametrized by only {\theta}, (i.e. {\frac{\partial \psi}{\partial r} = \frac{\partial \psi}{\partial z} = 0}) the Schrodinger equation reduces to

\displaystyle -\frac{\hbar^2}{2m}\frac{1}{R^2}\frac{\partial^2}{\partial \theta^2}\psi'(\theta) = E\psi'(\theta).

This is a pretty standard differential equation that most of you have probably seen in introductory quantum mechanics courses. Making the ansatz {\psi'(\theta) \propto e^{i\alpha\theta}} gives us

\displaystyle \frac{\hbar^2\alpha^2}{2mR^2}\psi' = E\psi'

and enforcing the boundary conditions we derived above gives us (after a little algebra) {\alpha = \frac{-q\Phi}{2\pi\hbar} + n} for {n\in \mathbb{Z}}. We can rearrange this to get the allowed energy values for this system:

\displaystyle E_n = \frac{\hbar^2}{2mR^2}\left(n - \frac{q\Phi}{2\pi\hbar}\right)^2, \ n\in\mathbb{Z}

which is the same result derived in Griffiths.
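As a cross-check, we can also diagonalize the ring Hamiltonian numerically with the twisted boundary condition and compare against the formula above. This is a sketch in units {\hbar = m = R = 1}, with a made-up value for the flux parameter.

```python
import numpy as np

# Numerical check of the solenoidal-ring spectrum (units hbar = m = R = 1,
# with beta = q*Phi/hbar a made-up flux parameter).  The exact energies
# are E_n = (1/2) * (n - beta/(2*pi))**2.
beta = 1.3
N = 800                     # grid points on the ring
h = 2 * np.pi / N

# Finite-difference Hamiltonian -(1/2) d^2/dtheta^2 with the twisted
# boundary condition psi(pi) = exp(-1j*beta) * psi(-pi).
H = np.zeros((N, N), dtype=complex)
for j in range(N):
    H[j, j] = 1.0 / h**2
    H[j, (j + 1) % N] = -0.5 / h**2
    H[j, (j - 1) % N] = -0.5 / h**2
H[N - 1, 0] *= np.exp(-1j * beta)   # hops across the cut pick up the twist
H[0, N - 1] *= np.exp(+1j * beta)

numeric = np.sort(np.linalg.eigvalsh(H))[:5]
exact = np.sort([0.5 * (n - beta / (2 * np.pi))**2
                 for n in range(-10, 11)])[:5]
assert np.allclose(numeric, exact, atol=1e-3)
```

Setting `beta = 0` recovers the ordinary quantum ring energies, as it should.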

Now that we have a feel for what is happening, we can start thinking about the more traditional Aharonov-Bohm effect, which is studied in the context of electron interference.

Recall how an interference pattern is created in the ordinary two slit experiment from accumulated phase due to electron propagation. The idea is that placing a solenoid near the two slits (and making sure that electrons never pass through it) provides a new way for the two separate beams to acquire a new relative phase, and thus changes the interference pattern. If we look at a point directly on the opposite side of the solenoid, by symmetry, the phase the beams from the two different slits accumulate via the paths they take is the same, and so can be ignored. The Aharonov-Bohm effect predicts that the solenoid will contribute a relative phase of

\displaystyle \Delta \phi = \frac{q\Phi}{\hbar}

between paths from the two slits so that the interference pattern that’s observed is different from that in the original two slit experiment. As in the case of the solenoidal quantum ring, placing a solenoid in between the two slits of a two slit experiment modifies the physics via the vector potential.

Path integral formulation of the modified two-slit experiment

To me, the most intuitive way to see how the Aharonov-Bohm effect arises is to use path-integrals. To appreciate the real solution, we’d need to use a bit of homotopy (more specifically, winding numbers to analyze all possible paths around the solenoid) but we’ll content ourselves with a bit of a simplification.

If you’re familiar with the Lagrangian formalism, you’ll remember that

\displaystyle \mathscr{L}'(\vec{r},\vec{v})=\mathscr{L}(\vec{r},\vec{v}) + q\vec{v}\cdot\vec{A}(\vec{r})

where {\mathscr{L}} is the Lagrangian of the free particle. (The sign of the coupling is fixed by requiring that the canonical momentum reproduce the Hamiltonian {H = \frac{1}{2m}[\vec{p}-q\vec{A}]^2+V+q\varphi} from before.) The action of a path {\vec{r}} for this Lagrangian is given by

\displaystyle S'[\vec{r}(t)] = \int\mathscr{L}'[\vec{r},\vec{v}; t]dt=S[\vec{r}(t)]+q\int\frac{d \vec{r}}{d t}\cdot\vec{A}(\vec{r})dt = S[\vec{r}(t)]+q\int \vec{A}(\vec{r})\cdot d\vec{r}

where {S} is the action without the solenoid present, and {S'} is the action with the solenoid present.

Now, we want to determine the propagation amplitude from the electron source to the point directly opposite the solenoid. Of course, in Feynman’s formalism, we determine the propagator by summing a phase factor related to the action over all paths, namely

\displaystyle K' = \int \mathscr{D}[\vec{r}(t)]\exp\left(\frac{i}{\hbar}S'[\vec{r}(t)] \right)=\int\mathscr{D}[\vec{r}(t)]\exp\left(\frac{i}{\hbar}S[\vec{r}(t)] \right)\exp\left(\frac{iq}{\hbar}\int\vec{A}(\vec{r})\cdot d\vec{r} \right).

We can define {K'_i} by the same integral as {K'}, except restricting the integral to be over paths which travel through slit {i}. Since each path travels through either slit 1 or slit 2, we have {K' = K'_1 + K'_2}.

Now, because {\vec{\nabla}\times\vec{A} = 0} outside the solenoid, we know that the line integral {\int \vec{A}(\vec{r})\cdot d\vec{r}} is the same for all paths which go through the same slit. In other words, if two different paths travel through the same slit {i}, then their line integrals will be identical. (If we were being very rigorous, we would need to justify this fact by considering only simply-connected regions.) We can use this fact to pull the second exponential factor in the definition of {K'_i} outside of the path integral,

\displaystyle K'_i =\exp\left( \frac{iq}{\hbar}\int_{C_i} \vec{A}(\vec{r})\cdot d\vec{r}\right) \int_{\text{slit i}} \mathscr{D}[\vec{r}(t)] \exp\left(\frac{i}{\hbar}S[\vec{r}(t)] \right) = \exp\left(i\phi_i\right)K_i

where {C_i} is any path which travels through slit {i}, {K_i} is the path integral for the system without the solenoid, and {\phi_i := \frac{q}{\hbar} \int_{C_i}\vec{A}(\vec{r})\cdot d\vec{r}}. Finally, we’re ready to derive the Aharonov-Bohm effect. The ordinary amplitude is given by

\displaystyle K \propto K_1 +K_2

whereas the solenoid modifies it like

\displaystyle  \begin{array}{rcl}  K' &\propto& K_1' + K_2' \\ &=& K_1e^{i\phi_1} + K_2e^{i\phi_2} \\ &=& e^{i\phi_1}\left(K_1 + e^{i(\phi_2-\phi_1)}K_2 \right). \end{array}

The exponential in front is an overall phase factor, and so doesn’t affect the physics. However, the exponential multiplying {K_2} induces a relative phase given by

\displaystyle \Delta\phi = \phi_2-\phi_1 = \frac{q}{\hbar}\int_{C_2}\vec{A}(\vec{r})\cdot d\vec{r} - \frac{q}{\hbar}\int_{C_1}\vec{A}(\vec{r})\cdot d\vec{r}=\frac{q}{\hbar}\oint\vec{A}(\vec{r})\cdot d\vec{r}

where of course we can apply Stokes’ theorem to the closed line integral to convert it into a surface integral over {\vec{\nabla}\times\vec{A} = \vec{B}}, i.e.

\displaystyle \Delta \phi = \frac{q}{\hbar} \iint_S\vec{B}\cdot d\vec{a} = \frac{q}{\hbar}\Phi

This of course is the same answer we presented above.
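We can check the key step numerically: outside an ideal solenoid, the line integral of {\vec{A}} depends only on which side of the solenoid the path passes, and the difference between the two sides is exactly the enclosed flux. Below is a sketch with a made-up flux value, using the standard outside-the-solenoid potential {\vec{A} = \frac{\Phi}{2\pi r}\hat{\theta}}.

```python
import numpy as np

# Outside an ideal solenoid along z at the origin, a vector potential
# carrying flux Phi is A = Phi/(2*pi) * (-y, x)/(x^2 + y^2).  We integrate
# A . dl along two paths from (-1, 0) to (1, 0): one passing above the
# solenoid, one below.  Their difference should be the enclosed flux Phi.
Phi = 0.7  # made-up flux value

def A(x, y):
    r2 = x**2 + y**2
    return np.array([-y, x]) * Phi / (2 * np.pi * r2)

def line_integral(path):
    # path: (N, 2) array of points; midpoint-rule line integral of A . dl
    total = 0.0
    for p, q in zip(path[:-1], path[1:]):
        mid = 0.5 * (p + q)
        total += A(*mid) @ (q - p)
    return total

t = np.linspace(0.0, 1.0, 4001)
upper = np.column_stack([np.cos(np.pi * (1 - t)), np.sin(np.pi * (1 - t))])
lower = np.column_stack([np.cos(np.pi * (1 + t)), np.sin(np.pi * (1 + t))])

delta = line_integral(lower) - line_integral(upper)
assert abs(delta - Phi) < 1e-6   # the relative phase sees exactly Phi
```

Deforming either path (while keeping it on its own side of the solenoid) leaves its integral unchanged, which is the path-independence fact used above to pull the phase out of the path integral.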

Thin solenoid as a model for magnetic monopoles

You may be wondering how the Aharonov-Bohm effect relates to magnetic monopoles. I’ll try to justify the notion that we can think of a magnetic monopole as a special kind of solenoid–once we’ve established this equivalence, we’ll be able to derive the quantization condition by studying monopoles within the context of the Aharonov-Bohm effect. I may address the equivalence in more detail in future blog posts; for our purposes, however, we can intuit and then assume the answer.

If you imagine an ordinary solenoid and stare at the field lines, you’ll notice that it forms a dipole, each end constituting one of the poles. That’s all well and good, but we need one less pole if we want a monopole, so what’s the solution? Just toss one of the poles out, of course!

To be less coy, what we’re proposing is that taking one end of the solenoid (one of the poles) and dragging it off to infinity makes the end that remains near us exhibit the field of a magnetic monopole,

\displaystyle \vec{B} = \frac{g \hat{r}}{r^2}

where {g} is the magnetic charge (in the Gaussian-style units of our Maxwell equations, so that the total flux emanating from the pole is {\Phi = 4\pi g}).

The quantization condition

Here’s where the miracle happens. To recap, so far what we’ve argued is that in the two slit experiment, if one includes a solenoid in the setup, electron beams from the two different slits arrive at the wall with a phase difference proportional to the magnetic flux through the solenoid–this causes a change in the interference pattern. We’ve also motivated how a solenoid might serve as a useful way of thinking about magnetic monopoles.

From here, Dirac asserted that, if solenoids were really a good model for magnetic monopoles, then there should be no experiment whatsoever that would allow us to determine that we have a solenoid, and not a monopole; the special Dirac solenoid and magnetic monopole should be indistinguishable. To undermine the equivalence, we’ve actually already shown how one might distinguish between a solenoid and a monopole–simply perform the two slit experiment along where you think the solenoid lies and check whether or not the interference pattern changes. Does this mean that we really can’t use Dirac solenoids and monopoles interchangeably?

Not quite. In fact, we may be able to use this to our advantage. Let’s look more closely–if the accumulated phase difference is a multiple of {2\pi}, then {e^{i\Delta\phi}=1} and the solenoid makes no detectable impact on the physics. Writing {e} for the charge of the probe electron, this occurs when (in units {\hbar=1})

\displaystyle e\Phi = 2\pi n

which we can rewrite by using the fact that the flux is related to the magnetic charge via {\Phi = 4\pi g},

\displaystyle e g = \frac{n}{2}.

This is the quantization condition. We used the Aharonov-Bohm effect to constrain the possible values of magnetic charge by demanding that the modified two-slit experiment should not be able to distinguish between a long solenoid and a monopole. Pretty clever, and sort of bizarre.
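As a tiny sanity check of the arithmetic (still in units {\hbar = 1}): whenever {eg = n/2}, the Aharonov-Bohm phase factor collapses to 1 and the solenoid is invisible.

```python
import cmath

# With hbar = 1, the phase difference from the Dirac solenoid is
# Delta_phi = e * Phi = 4 * pi * (e * g).  If e*g = n/2, the phase factor
# exp(i * Delta_phi) = exp(2*pi*i*n) = 1: no detectable effect.
for n in range(-5, 6):
    eg = n / 2
    phase = cmath.exp(1j * 4 * cmath.pi * eg)
    assert abs(phase - 1) < 1e-12
```

For any other value of {eg}, the phase factor differs from 1 and the interference pattern would betray the solenoid.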

Phew… it took us quite a bit of effort to get there, but this is a great starting point for learning more about this far-reaching topic. In future posts, I’ll try to talk more about why we can or can’t trust this result and then go through the ‘t Hooft-Polyakov monopole.