by Lori Ann White
One of the biggest challenges facing researchers who will use data from the Legacy Survey of Space and Time (LSST), provided by the Vera C. Rubin Observatory (formerly the Large Synoptic Survey Telescope), is that as the telescope scans the sky from its perch on a mountaintop in the Chilean Andes, it will see a crowded Universe. A significant number of objects in LSST images will appear to overlap to some extent—galaxies with galaxies, stars with stars, galaxies with stars—creating uncertainties in their measured shapes and redshifts (i.e., their distances from us).
This is called "blending," and it is especially challenging for the LSST because of the survey's unprecedented depth, coupled with the blurring of the images by the Earth’s atmosphere. In other words, the more than 50 petabytes of raw image data accumulated during the 10-year survey will show billions of objects reaching back billions of years in the lifetime of the Universe, all crammed into almost 20,000 square degrees of sky, their images further muddled by atmospheric turbulence.
This effect is especially significant for the subtle measurements necessary for studies of cosmic shear, which trace gravitational lensing that can help map the location of all matter, including dark matter, in time and space (as previously discussed in this Nov. 2017 KIPAC blogpost about Dark Energy Survey results).
"When images of objects overlap each other, the techniques we use to measure the properties of single objects, such as their shape or amount of flux, don’t work to the accuracy we need with a survey like LSST," says KIPAC professor Patricia Burchat, co-convener of the Blending Task Force for the LSST Dark Energy Science Collaboration.
Prior surveys would generally throw out blended objects, Burchat says, “but we can no longer afford to throw away every object that overlaps with another one, or simply treat them as a single object.” Approximately 60% of the galaxies seen by the Rubin Observatory will have their observed brightness altered by at least 2% due to the influence of a neighboring object. That’s more than enough to compromise observations.
Another challenge is unrecognized blends—objects that overlap so highly they are detected as a single object (or where one of the objects is so faint that it can’t be discerned directly, but it does affect the light yield from the other object). Tracing the distribution of matter and how it has evolved over the lifetime of the Universe due to the influence of dark energy depends on accurate detection of galaxies and accurate measurements of their properties.
One way KIPAC researchers are tackling this challenge is with machine learning that trains computers to recognize patterns—in this case, what overlapping galaxies look like.
The most recent versions of machine learning have opened up new possibilities in many fields, including image recognition, and KIPAC scientists are embracing it as a powerful tool.
One KIPAC researcher using machine learning is student Sowmya Kamath, who is preparing to graduate soon with her doctorate. Kamath joined Burchat to work on weak lensing systematics, and when she read a 2016 paper by several LSST scientists (“The Ellipticity Distribution of Ambiguously Blended Objects,” Dawson et al. 2016) discussing the issues surrounding blending, she embraced this new challenge.
The consequences of not meeting the blending challenge are sobering, Kamath says. “Our cosmological measurements will have biases and uncertainties. The shapes could be wrong, flux measurements could be wrong. Distances calculated using photometric redshifts could be wrong because we’ll be basing them on colors that are a blend of light from more than one object.” All of which could ultimately strongly influence and even bias cosmology results from the survey.
Kamath is tackling the problem of unrecognized blends by training sophisticated machine-learning algorithms to detect candidate galaxies that have been classified as a single galaxy by classical methods, but in fact correspond to two galaxies.
"It’s a hard challenge, but there’s a path through it that looks like it might work," Kamath says.
The trail Kamath is blazing takes advantage of existing LSST tools along with her own work. Using GalSim, an open-source toolkit developed with considerable DESC involvement, she has amassed a large and varied collection of galaxies, because, as Kamath explains, "more data is better" for training neural networks. Kamath uses simulated galaxies instead of existing data from previous surveys because that way, "I know the truth"—i.e. she knows when an image that looks like one galaxy actually is an image of one galaxy and when it’s an image of overlapping galaxies that can be mistaken for one galaxy. No matter how good previous surveys are, that certainty can’t be guaranteed.
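GalSim itself offers far richer galaxy models and instrument effects than can fit here, so the sketch below uses plain NumPy and simple circular Gaussian profiles (all names, fluxes, and sizes are illustrative, not GalSim's API) just to show the core idea: render isolated galaxies and close pairs, and keep the truth labels that only simulation can provide.

```python
import numpy as np

def gaussian_galaxy(size, flux, sigma, x0, y0):
    """Render a toy circular-Gaussian 'galaxy' on a square postage stamp."""
    y, x = np.mgrid[:size, :size]
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    img = np.exp(-r2 / (2 * sigma ** 2))
    return flux * img / img.sum()  # normalize so the stamp sums to `flux`

size = 33
center = size // 2

# An isolated galaxy: a single source at the stamp center.
single = gaussian_galaxy(size, flux=1000.0, sigma=3.0, x0=center, y0=center)

# An unrecognized blend: a second, fainter source only a few pixels away.
blend = (gaussian_galaxy(size, flux=1000.0, sigma=3.0, x0=center, y0=center)
         + gaussian_galaxy(size, flux=400.0, sigma=2.0, x0=center + 3, y0=center))

# The truth labels come for free, because we built the scene ourselves.
training_set = [(single, 0), (blend, 1)]  # 0 = one galaxy, 1 = hidden blend
```

Because the scenes are constructed rather than observed, every stamp arrives with a guaranteed-correct label, which is exactly the certainty Kamath says real survey data cannot provide.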
Kamath runs her tiny simulated universe through the LSST Science Pipeline (LSP), a suite of tools that performs the initial reduction of LSST data, such as creating catalogs of sources and other preliminary results. The LSP includes its own galaxy "deblending" (disambiguation) tool called Scarlet, which models a blended scene as the superposition of several overlapping objects. What the individual objects look like is modeled by Scarlet based upon known galaxy characteristics, such as how quickly a galaxy dims with distance from its center. Any differences between the two—the actual image and the model—constitute the "residuals." If the LSP detection step fails to identify an individual source among the blended objects in the first place, the residuals can hint at its presence.
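Scarlet's real source models are considerably more elaborate, but the logic of using residuals can be sketched in a few lines of NumPy. Everything below (function names, profiles, the 5-sigma threshold) is invented for illustration and is not Scarlet's or the LSP's API:

```python
import numpy as np

def residual_image(observed, model):
    """Residuals: the actual image minus the deblender's best-fit model."""
    return observed - model

def hints_at_missed_source(residual, noise_sigma, threshold=5.0):
    """Flag a stamp whose residuals rise far above the noise level,
    hinting that the detection step missed a source in the blend."""
    return bool(np.any(residual > threshold * noise_sigma))

# A toy scene: the model fits one galaxy, but the real image also
# contains a second, unmodeled neighbor slightly to the right.
yy, xx = np.mgrid[:33, :33]
model = 1000.0 * np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / 18.0)
companion = 80.0 * np.exp(-((xx - 20) ** 2 + (yy - 16) ** 2) / 8.0)
observed = model + companion

# The unmodeled neighbor survives in the residuals and trips the flag.
res = residual_image(observed, model)
```

The key point is that a source the pipeline never detected leaves nothing behind in the model, so all of its light ends up in the residuals, where it can still be noticed.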
Finally, Kamath analyzes the results with her own code, comparing what the LSP and Scarlet think the scene looks like with what she knows to be true, and using that information to train her customized neural network—a.k.a. “the Residual Detectron”—to tease out unrecognized blends missed by the other tools.
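Kamath's Residual Detectron is a genuine neural network; as a minimal stand-in, the toy nearest-centroid classifier below captures the same workflow: simulated residual stamps with known-true labels, a pattern learned from them, and a prediction for each stamp. All names and numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_stamp(has_hidden_neighbor):
    """Toy residual stamp: pure noise when the one-source model was right,
    noise plus a leftover bump when a second galaxy went undetected."""
    img = rng.normal(0.0, 1.0, (15, 15))
    if has_hidden_neighbor:
        yy, xx = np.mgrid[:15, :15]
        img += 6.0 * np.exp(-((xx - 9) ** 2 + (yy - 7) ** 2) / 4.0)
    return img.ravel()

# Labeled training data, possible only because the truth is known by construction.
X = np.array([residual_stamp(i % 2 == 1) for i in range(400)])
y = np.array([i % 2 for i in range(400)])

# "Training": learn the average residual pattern of each class.
centroid_blend = X[y == 1].mean(axis=0)
centroid_clean = X[y == 0].mean(axis=0)

def classify(stamp):
    """Assign the stamp to the nearer class centroid (1 = hidden blend)."""
    d_blend = np.linalg.norm(stamp - centroid_blend)
    d_clean = np.linalg.norm(stamp - centroid_clean)
    return int(d_blend < d_clean)

accuracy = np.mean([classify(s) == label for s, label in zip(X, y)])
```

A real convolutional network learns far subtler patterns than a class average, but the training loop rests on the same foundation: without simulated truth labels, there would be nothing to learn from.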
Machine-learning algorithms have an important advantage over humans, as well as over other object-recognition algorithms: they can handle many more color bands than the standard red-green-blue of ordinary cameras, making them better equipped to deal with images from the LSST camera, which will see light in six different bands.
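As a rough illustration of why extra bands help, consider two overlapping galaxies with different colors: each dominates a different band, so a multi-band cutout carries color information that a three-channel RGB image would blur together. The fluxes and sizes below are made up purely for illustration.

```python
import numpy as np

yy, xx = np.mgrid[:25, :25]

def bump(flux, x0):
    """Toy galaxy profile centered at column x0."""
    return flux * np.exp(-((xx - x0) ** 2 + (yy - 12) ** 2) / 8.0)

# Two overlapping galaxies with different colors: the blue one dominates
# the g band, the red one the i band, so each band peaks in a different place.
g_band = bump(100.0, 12) + bump(10.0, 15)   # blue galaxy bright, red faint
i_band = bump(20.0, 12) + bump(90.0, 15)    # red galaxy bright, blue faint

# Stacking bands as input channels (here two of LSST's six, u g r i z y)
# hands the network color information that can betray a hidden neighbor.
cutout = np.stack([g_band, i_band])
```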
Kamath says the work she’s doing now is a direct outgrowth of—and motivated by—her research into weak lensing systematics, but now she’s approaching it with a new set of machine learning tools. She appreciates the opportunity to investigate a whole new field of data science, as well as really improve her coding skills.
More broadly, Stanford University has recognized the potential of machine learning and is building up its expertise through the Stanford Data Science Initiative. KIPAC professor Tom Abel is a member of the initiative's working group, and KIPAC Director Risa Wechsler helped define Stanford's vision in this area as a member of the Data Science Design Team.
But Kamath isn’t working so hard just for the machine-learning experience per se. She enjoys the blending challenge for its astrophysical implications, as well as for its own sake. In fact, she says: "It’s fun!"