*By Nick Kokron*

Cosmology in the 2020s is in a particularly privileged position as far as scientific fields go. Several new observatories will come online soon and provide large astrophysical datasets that will give us a tremendous amount of information about the so-called "dark" components of the Universe: dark matter and dark energy.

But if we’re not careful, we run the risk of drawing inadequate conclusions from these rich datasets. Not because our data aren’t adequate—they’re amazing—but because our models aren’t good enough. That’s why much current work is focused on creating better predictions of the signals these observatories were designed to measure.

To understand the importance of accurate model building, imagine the following scenario: you are playing darts with your friends. Whoever throws their darts closest to the center will win. If I were the one throwing, I would probably miss the board a few times and leave darts spread out all over it. That is because I have neither *precision* nor *accuracy* at dart throwing. Now imagine that one of your friends, after bragging a lot about their skill at darts, manages to land all of their darts in a tightly packed circle. Except, the circle is far away from the center of the board! The dart thrower is quite *precise* in their throw, but not particularly *accurate*. Their model for how to throw a dart at the board has a calibration problem, and their bragging rights perhaps weren’t as well-earned as they thought.
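The distinction can be made concrete with a toy simulation. This is only an illustrative sketch: the offsets, scatters, and function names are made up for the example, with the bullseye placed at (0, 0).

```python
import numpy as np

rng = np.random.default_rng(0)

def throw_darts(n, offset, scatter, rng):
    """Return n dart positions (x, y) scattered around `offset`."""
    return rng.normal(loc=offset, scale=scatter, size=(n, 2))

def accuracy_error(darts):
    """Distance of the average landing point from the bullseye at (0, 0)."""
    return float(np.linalg.norm(darts.mean(axis=0)))

def precision_spread(darts):
    """RMS distance of the darts from their own average landing point."""
    centered = darts - darts.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

# Hypothetical throwers: one imprecise and off-center, one tight but off-center.
author = throw_darts(1000, offset=(3.0, -4.0), scatter=10.0, rng=rng)
braggart = throw_darts(1000, offset=(8.0, 5.0), scatter=0.5, rng=rng)

for name, darts in [("author", author), ("braggart", braggart)]:
    print(name, "accuracy error:", round(accuracy_error(darts), 2),
          "precision spread:", round(precision_spread(darts), 2))
```

A small precision spread says nothing about the accuracy error: the second thrower's darts cluster tightly, yet their average lands far from the center.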

Now imagine that the stakes are a bit more consequential than throwing darts at boards. For example, imagine you want to report to the world how much dark energy is out there and how it behaves. If your model is not up to snuff, you’ll be like the dart thrower that missed the center but was very sure they knew what they were doing. You risk telling the world something that makes sense on its own, but just isn’t actually true.

Two key measurements I'm interested in relate to the *positions* of galaxies, separated by typical distances of tens of millions of light-years, and to their *shapes*. These measurements contain statistical information that enables scientists to derive a number of quantities critical to cosmologists, such as:

- The amount of dark matter in the Universe
- The amount and properties of dark energy
- The current rate of expansion of the Universe
- Indirect information about the very early inflationary period of the Universe

The ability to derive these quantities is a perfect example of the need for good models, and thus, good predictions.

Techniques for making these predictions have typically fallen into two camps: supercomputer simulations of the evolution of the Universe from early ages to today, and pencil-and-paper calculations (though often with a computer assist for the analytic work) by researchers with backgrounds in theoretical particle physics. The second technique may seem surprising at first glance, but particle physicists have a long history of dealing with what are called "perturbative expansions," where the biggest contributions to the behavior of an observable are dealt with in the first calculations. Then, as experiments get better and more precise, more and more contributions are added to the theoretical calculations to model the latest data. This technique works quite well; for example, particle physicists have used perturbative expansions to calculate the magnetic moment of the electron to about 12 significant digits, a prediction that has since been confirmed by experiment. As we make more and more precise calculations in cosmology, we're finding similar approaches to be very useful in predicting quantities we can measure.
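To get a feel for how a perturbative expansion behaves, here is a toy example that has nothing to do with cosmology or particle physics specifically: the positive root of x² + εx − 1 = 0 for a small parameter ε. The equation and function names are chosen purely for illustration. Each added order in ε shrinks the gap between the approximation and the exact answer.

```python
import math

def exact_root(eps):
    """Positive root of x**2 + eps*x - 1 = 0, from the quadratic formula."""
    return (-eps + math.sqrt(eps ** 2 + 4.0)) / 2.0

def perturbative_root(eps, order):
    """Series for the same root, x = 1 - eps/2 + eps**2/8 + ..., truncated at `order`."""
    terms = [1.0, -eps / 2.0, eps ** 2 / 8.0]  # contributions at orders 0, 1, 2
    return sum(terms[: order + 1])

eps = 0.1
errors = [abs(perturbative_root(eps, n) - exact_root(eps)) for n in range(3)]
for n, err in enumerate(errors):
    print(f"order {n}: error = {err:.2e}")
```

The same logic underlies the cosmological calculations: the leading term captures most of the behavior, and higher-order terms are added as the data demand more precision.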

I will describe both techniques in more detail below.

**Good old pencil and paper**

The pencil-and-paper tools, which were developed by many scientists including former KIPAC professor Leonardo Senatore, treat the distribution of dark matter in the Universe as a fluid. The key development in Senatore's work was realizing that, although for decades this fluid has been treated as inviscid (in other words, freely flowing), the complicated process of gravitational collapse leads to equations that look very similar to those of a fluid with a viscosity close to that of chocolate syrup (yum)! His work, and that of collaborators, was outlined in this 2014 Quanta article. In 2020, Senatore and I, along with other collaborators, published an article in the *Journal of Cosmology and Astroparticle Physics* applying this technique to a public data set.

The improved predictions afforded by this model led to the most precise measurements at that time of the amount of dark matter in the Universe as well as its expansion rate, based on a large and very well-calibrated sample of galaxies from the Sloan Digital Sky Survey.

**Humans get a machine assist**

However, it is well-known that the pencil-and-paper tools I described above cannot be used to fully characterize the data sets mentioned so far. At some point their predictions fail (usually at very small scales which are difficult to model analytically), and instead we must solve these equations on computers numerically in simulations. Each computer simulation provides a very precise and accurate prediction for *one and only one* realization of the Universe. Running several simulations for several Universes, we can begin to statistically assess how similar our computational Universes are to the actual one we see in telescopes.
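The "one realization at a time" point can be illustrated with a deliberately simple stand-in for a simulation. In this sketch, each "universe" is just a one-dimensional white-noise field whose true power is 1 at every wavenumber; the setup and names are assumptions made for the example, not how a real N-body code works.

```python
import numpy as np

def measured_power(seed, n=256):
    """Power per Fourier mode measured from one random realization."""
    rng = np.random.default_rng(seed)
    field = rng.normal(size=n)      # one toy "universe"; true power is 1
    modes = np.fft.rfft(field)
    # Skip the k=0 and Nyquist modes; dividing by n normalizes the mean to 1.
    return np.abs(modes[1 : n // 2]) ** 2 / n

single = measured_power(seed=0)
ensemble = np.mean([measured_power(seed=s) for s in range(200)], axis=0)

scatter_single = np.std(single)      # mode-to-mode scatter in one realization
scatter_ensemble = np.std(ensemble)  # scatter after averaging 200 realizations
print(scatter_single, scatter_ensemble)
```

A single realization scatters widely around the truth, while the average over many realizations converges toward it. This is why comparing simulations to the one real Universe is a statistical exercise.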

KIPAC has a long and storied history with computational cosmology. Professors Tom Abel and Risa Wechsler are world-leading experts on the topic of cosmological simulations, which have previously been discussed in this forum in November 2017 and February 2019.

**Hybrid approaches: Combining the best of both techniques**

With collaborators, we have figured out a way to combine these pencil-and-paper theories with simulations of only the dark components of the Universe (in other words, not including normal matter like stars, galaxies, and dust, but only dark matter and dark energy) to predict the signals of the positions of *actual galaxies* (also known as the large-scale structure of the Universe). This is done through a technique called *bias modelling*, which was originally developed in the context of the aforementioned pencil-and-paper theories.

A technical discussion of bias modelling is beyond the scope of this article. In brief, the approach uses the fact that the symmetries of the gravitational force and of the equations of fluid dynamics allow only a limited number of ingredients to contribute to the statistical large-scale distribution of dark matter and, in turn, to the large-scale structure of the galaxy distribution. For example, we know that the Universe should look statistically the same no matter from what orientation we peer into it, due to a symmetry called rotational invariance, which is respected by the fluid dynamics equations describing how the properties of the large-scale density of dark matter change over time. This means that in the equations relating the statistical properties of galaxies to the statistical properties of dark matter, ingredients that disobey these rules are not allowed.

The ideas that "every ingredient allowed by the symmetries of your problem should be included" and that "more complicated ingredients should matter less" are general ideas from theoretical particle physics, where they provide some of the pillars of model building. This explains why several of the original proponents of the modern versions of bias models have particle physics backgrounds.

Cosmologists know how the Universe should look at very early times, when it was very smooth—for example, at the time of decoupling of the cosmic microwave background when the density contrast between different parts of the Universe was at most about one part in one hundred thousand—compared to now, when we have vast areas of virtually empty space in the Universe that contrast with (relatively) extremely dense massive galaxy clusters. However, it is much harder to describe structure during the high-contrast phases, when simple approximations won’t do the trick.

So particle physicists brought another tool to the table to tackle this issue: the aforementioned perturbation theories (or, equivalently, expansions), which cosmologists apply to structure formation. Given an initial picture of how the smooth Universe looked, perturbation theories of cosmic structure enable you to build an approximate solution to this collapse problem.

However, this process of non-linear gravitational collapse is extremely difficult to solve with a pencil and paper. In fact, both perturbation theory and bias modelling calculations, while based on very powerful and general principles, are difficult to use. The mathematical formalism is formidable, the equations difficult to work with, and by construction this approximation technique has to fail at some point. This is why simulations are so important to the field.

But simulations and pencil-and-paper tools are both trying to describe the same physical process. This then raised the question: can they be combined in some way? It turns out that the answer is *yes!* This was pointed out by a group of friends of ours across the Bay at UC Berkeley in a late 2019 publication.

The result: Significantly better precision *and* accuracy.

What our work did was realize this combination in a way that could be used for data analysis. We created the ingredients of this "bias model" explicitly in simulations. This circumvents the difficult calculations inherent to perturbation theories and we additionally gain the benefits of better small-scale modeling afforded by computer simulations.

We additionally circumvented two limitations of N-body simulations: they only provide predictions for a single Universe at a time, and they are computationally demanding to run. We did this by leveraging a suite of simulations developed in an effort called the *Aemulus* Project to measure these ingredients for several different universes. *Aemulus* was a KIPAC-led project to run a very large number of high-resolution supercomputer simulations of the Universe. It consisted of 75 total simulations, with 47 different configurations of universes, chosen to correspond to values randomly spread across the possible universes allowed by current cosmic surveys.

The idea behind this is that the statistics we measure from galaxy samples should change relatively smoothly as we step around in this space of allowed universes. If you have a representative sample of this space and use some fancy statistics, you can then build a model to fill in the blanks for universes which lie in between the simulations. This is precisely what we did, using a combination of several statistical techniques (primarily principal component analysis and polynomial chaos expansion).
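The "fill in the blanks" idea can be sketched in a few lines. This is a toy, not our actual pipeline: `toy_statistic` is a made-up smooth function standing in for a simulated measurement, there is only one parameter `theta` instead of a full set of cosmological parameters, and an ordinary polynomial least-squares fit stands in for the polynomial chaos expansion.

```python
import numpy as np

k = np.linspace(0.1, 1.0, 50)  # hypothetical wavenumber bins

def toy_statistic(theta):
    """Made-up stand-in for a statistic measured from a simulation at `theta`."""
    return (1.0 + theta) * np.exp(-k * (1.0 + 0.5 * theta))

# "Run simulations" at a sparse set of parameter values.
theta_train = np.linspace(0.0, 1.0, 15)
Y = np.array([toy_statistic(t) for t in theta_train])  # shape (15, 50)

# Principal component analysis via SVD of the mean-subtracted training set.
mean = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
n_pc = 4
basis = Vt[:n_pc]                # leading principal components, shape (4, 50)
weights = (Y - mean) @ basis.T   # PC weights per training point, shape (15, 4)

# Fit each PC weight as a smooth polynomial in theta.
degree = 5
coeffs = [np.polyfit(theta_train, weights[:, i], degree) for i in range(n_pc)]

def emulate(theta):
    """Predict the statistic at a parameter value that was never simulated."""
    w = np.array([np.polyval(c, theta) for c in coeffs])
    return mean + w @ basis

theta_test = 0.37  # lies between training points
max_rel_err = np.max(np.abs(emulate(theta_test) / toy_statistic(theta_test) - 1.0))
print("max relative error:", max_rel_err)
```

The approach works because the statistic varies smoothly with the parameter, so a handful of principal components and a low-order polynomial are enough to interpolate between the training points.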

The end result is a model for the statistical properties of galaxies and gravitational lensing that is highly accurate *and* precise (remember these terms?) in regimes previously not thought possible, while also being computationally efficient. To be more precise, while the pencil-and-paper models developed by Senatore could accurately describe the statistical correlations of the Universe at separations larger than 43 megaparsecs (Mpc), this hybrid technique seems to be able to model these same correlations down to scales of 15 Mpc! This is close to a factor of three improvement in modelling capacity.

The amount of cosmological information contained in a correlation measurement grows non-linearly as smaller scales are included, roughly as the 3/2 power of the improvement in the smallest scale studied. Therefore, applying the hybrid model to the same dataset has the potential to unlock a roughly five-fold increase (3^{1.5} ≈ 5.2) in statistical constraining power compared to current methods. As an example, Fig. 2, below, shows a comparison between the hybrid model and perturbation theory.
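As a quick check of the arithmetic, taking the 3/2-power scaling at face value and using the 43 Mpc and 15 Mpc figures quoted earlier (the variable names are just for illustration):

```python
r_pt = 43.0      # smallest scale reached by perturbation theory, in Mpc
r_hybrid = 15.0  # smallest scale reached by the hybrid model, in Mpc

scale_ratio = r_pt / r_hybrid    # about 2.9, "close to a factor of three"
info_gain = scale_ratio ** 1.5   # assumed 3/2-power scaling of information

print(f"scale ratio: {scale_ratio:.2f}, information gain: {info_gain:.2f}")
```

Using the unrounded ratio gives a gain a little under five; rounding the ratio up to three gives the 5.2 quoted in the text.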

On the left-hand side of Fig. 2, we're looking at the largest scales (or smallest "wavenumbers," *k*), while the right-hand side of each panel is telling you about the small scales. As we mentioned, perturbation theory does not describe the Universe well at small scales and what we observe in this figure is a disagreement between the dashed and solid lines as we go more and more to the right.

The leftmost part of each panel is telling you about the clumpiness of the Universe on scales of approximately 285 Mpc. The rightmost side of each panel is looking at clumpiness on the order of only 3 Mpc! This type of analysis is similar to breaking down a song into different frequencies to understand which are more prevalent. For example, electronic music with its thumping bassline will have more low-frequency ingredients than usual.
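The song analogy translates directly into code. Below is a minimal sketch with a made-up "song" consisting of a loud 50 Hz bassline and a quieter 400 Hz tone; the Fourier transform reveals which frequency carries the most power, in the same way that a power spectrum reveals which scales carry the most clumpiness.

```python
import numpy as np

sample_rate = 1000                         # samples per second
t = np.arange(sample_rate) / sample_rate   # one second of "music"

# Hypothetical song: a loud bassline plus a quieter high-pitched tone.
signal = 3.0 * np.sin(2 * np.pi * 50 * t) + 1.0 * np.sin(2 * np.pi * 400 * t)

power = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

dominant = freqs[np.argmax(power)]
print("dominant frequency (Hz):", dominant)
```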

Another advantage of this hybrid model is we can create very nice-looking visualizations of the bias ingredients, which can help inform our physics intuition as we attempt to understand what is happening cosmologically.

For example, the panels in Fig. 3, above, show three of the ingredients that can contribute to this bias model as slices through one of our simulations. The first panel shows the density of dark matter, 𝜹 (Greek lower-case delta), at very early times. We can also construct the density squared, 𝜹^{2}, shown in the middle panel. Notice that this squaring operation highlights both the over-densities (red blobs) *and* the under-densities (blue blobs). Finally, we can also measure the *tidal strength* (net gravitational force at any point in space) in this simulation, which is shown in the third panel. Instead of solving for the time evolution of each of these ingredients using perturbation theory, we can let our simulations do the job for us! We can then reconstruct these panels (as well as the unweighted distribution of dark matter) and observe that different aspects of the distribution of dark matter are highlighted.
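For readers who want to see how such ingredients can be built, here is a minimal sketch that computes 𝜹^{2} and a squared tidal field from a density grid using Fourier transforms. It assumes a periodic cubic grid and the common definition of the tidal tensor, s_ij = (k_i k_j / k² − δ_ij / 3) 𝜹 in Fourier space; the function name and the random input field are placeholders. It also skips the step where the simulation carries these early-time fields forward to late times.

```python
import numpy as np

def bias_ingredients(delta):
    """Return mean-subtracted delta^2 and tidal s^2 fields for a periodic cubic grid."""
    delta = delta - delta.mean()
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n)
    kvec = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kvec[0] ** 2 + kvec[1] ** 2 + kvec[2] ** 2
    k2[0, 0, 0] = 1.0  # avoid 0/0; the k=0 mode of delta is zero anyway

    delta_k = np.fft.fftn(delta)
    s2 = np.zeros_like(delta)
    for i in range(3):
        for j in range(3):
            kron = 1.0 if i == j else 0.0
            kernel = kvec[i] * kvec[j] / k2 - kron / 3.0
            s_ij = np.fft.ifftn(kernel * delta_k).real  # tidal tensor component
            s2 += s_ij ** 2                             # sum over all components

    delta2 = delta ** 2
    return delta2 - delta2.mean(), s2 - s2.mean()

# Placeholder "early-time" density field: small random fluctuations.
rng = np.random.default_rng(1)
delta = 1e-3 * rng.normal(size=(32, 32, 32))
delta2_field, s2_field = bias_ingredients(delta)
print(delta2_field.shape, s2_field.shape)
```

Subtracting the mean of each quadratic field keeps every ingredient a fluctuation around zero, like 𝜹 itself.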

The distribution of *any* set of galaxies in this simulation that obeys the symmetries of our bias model then has to necessarily be written as a linear combination of the four panels in Fig. 4, below. There are free parameters, and they’re complicated, so we'll leave them for some other time.
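Schematically, and with the caveat that the notation here is an assumption made for illustration rather than the notation of our paper, the linear combination can be written as:

```latex
\delta_g(\mathbf{x}) \;\approx\; F_{1}(\mathbf{x})
  + b_{1}\,F_{\delta}(\mathbf{x})
  + b_{2}\,F_{\delta^{2}}(\mathbf{x})
  + b_{s}\,F_{s^{2}}(\mathbf{x})
```

Here F_1 is the unweighted dark matter field, F_𝜹, F_𝜹² and F_s² are the dark matter fields weighted by the density, the density squared, and the tidal strength, and b_1, b_2 and b_s are the free parameters. Numerical prefactors vary between conventions and are absorbed into the coefficients in this sketch.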

To sum up: as next-generation cosmic surveys come online it's imperative that we have sufficiently good tools to get the most out of the data. There are several different approaches to describing the same problem, each with its own strengths and weaknesses. Our work took a step toward a new approach which combines two popular techniques in a way that seems to capture the benefits of each. These hybrid models are a novel way to build models for the data galaxy surveys will collect in the future. I'm actively working on better understanding them as well as shaping them into tools that will eventually integrate seamlessly with data analysis tools in use today, all in an effort to gain more insight into the detailed working of our vast and mysterious Universe...

**Read More**

The cosmology dependence of galaxy clustering and lensing from a hybrid N-body-perturbation theory model (Kokron, et al., 2021)