Center for Decoding the Universe (C4DU) Journal Club

Xiaosheng Zhao (JHU)
CoDa W401

Event Details:

Monday, April 6, 2026
1:00pm - 3:00pm PDT

Location

CoDa W401

This event is open to:

Members
Students

Title: Decoding the Chemical Fossil Record: Machine Learning and Foundation Models for Near-Field Cosmology

Abstract: Flagship spectroscopic surveys such as DESI, LAMOST, SDSS, and PFS are accumulating an unprecedented dataset of stars spanning the Milky Way and the broader Local Group. In near-field cosmology, these ancient stars act as time capsules: their chemical abundances encode the star formation history and hierarchical assembly of our Galaxy. Maximizing the scientific yield of these heterogeneous datasets is a major challenge, as traditional stellar parameter pipelines often struggle at low-to-medium spectral resolution and suffer from domain mismatch across survey instruments. This creates a cross-domain transfer and label-scarce learning problem at astronomical scale.

In this talk, I will present data-driven machine learning frameworks designed to bridge the cross-survey divide. First, I will demonstrate how simple, parameter-efficient neural networks (MLPs) pre-trained on low-resolution data (LAMOST) can rapidly adapt to medium-resolution surveys (DESI) using few-shot transfer learning. This approach recovers the chemical bimodality of the Galactic thin and thick disks from DESI spectra, a separation that is smeared out in the classical DESI stellar parameter pipeline, and it improves agreement with high-resolution reference labels.
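As a rough illustration of the few-shot transfer idea (not the speaker's actual pipeline), the sketch below pre-trains a tiny MLP on a large synthetic "source" set, then adapts it to a shifted "target" domain by refitting only the output layer on 30 labeled examples. Everything here is an assumption made for illustration: the synthetic data generator, the `gain` and `offset` domain shift, the network size, and the choice of a closed-form ridge refit of the last layer as the adaptation step.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_pix=20, gain=1.0, offset=0.0):
    """Hypothetical stand-in for spectra: a scalar label y shapes a 20-pixel signal."""
    y = rng.uniform(-1.0, 1.0, size=(n, 1))
    pattern = np.linspace(-1.0, 1.0, n_pix)[None, :]
    x = gain * np.tanh(y * pattern) + offset + 0.01 * rng.normal(size=(n, n_pix))
    return x, y

def hidden(x, W1, b1):
    """Hidden-layer features of the one-hidden-layer MLP."""
    return np.tanh(x @ W1 + b1)

def pretrain(x, y, n_hidden=16, lr=0.05, steps=2000):
    """Full-batch gradient descent on mean squared error."""
    W1 = rng.normal(scale=0.3, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.3, size=(n_hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        h = hidden(x, W1, b1)
        err = h @ W2 + b2 - y                 # residuals, shape (n, 1)
        dh = (err @ W2.T) * (1.0 - h ** 2)    # backprop through tanh
        W2 -= lr * (h.T @ err) / len(x)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (x.T @ dh) / len(x)
        b1 -= lr * dh.mean(axis=0)
    return W1, b1, W2, b2

def fit_head(h, y, ridge=1e-2):
    """Refit only the output layer (weights and bias) by ridge regression."""
    A = np.hstack([h, np.ones((len(h), 1))])
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y)
    return coef[:-1], coef[-1]

# Pre-train on a large source-domain set.
x_src, y_src = make_data(2000)
W1, b1, W2, b2 = pretrain(x_src, y_src)

# Target domain: same labels, different instrument response (assumed shift).
x_few, y_few = make_data(30, gain=0.5, offset=0.3)
x_test, y_test = make_data(500, gain=0.5, offset=0.3)

# Before adaptation: apply the source-trained model directly.
pred_before = hidden(x_test, W1, b1) @ W2 + b2
mse_before = float(np.mean((pred_before - y_test) ** 2))

# After adaptation: keep the hidden layer frozen, refit the head on 30 labels.
W2_new, b2_new = fit_head(hidden(x_few, W1, b1), y_few)
pred_after = hidden(x_test, W1, b1) @ W2_new + b2_new
mse_after = float(np.mean((pred_after - y_test) ** 2))

print(f"target MSE before adaptation: {mse_before:.4f}")
print(f"target MSE after adaptation:  {mse_after:.4f}")
```

Freezing the pre-trained hidden layer keeps the number of parameters fitted on the target domain small, which is what makes a few dozen labels sufficient in this toy setting. Real survey spectra would likely need a less restrictive adaptation step.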

Next, I will introduce SpecCLIP, a spectral foundation model that uses contrastive learning with auxiliary reconstruction and cross-modal prediction decoders to align different spectral modalities (LAMOST low-resolution and Gaia XP spectra) into a unified embedding space while preserving modality-specific information. SpecCLIP delivers competitive performance across a broad range of stellar parameters and enables cross-survey spectral retrieval and prediction. I will also discuss how embeddings from SpecCLIP generalize to DESI fine-tuning, examining the regimes where foundation model representations offer advantages over direct spectral inputs and where simple pre-trained MLPs remain competitive. Together, these approaches offer a practical path toward extracting the chemical fossil record at scale, providing observational constraints on early galaxy assembly and chemical enrichment history, with broader implications for multi-modal transfer learning in any domain where instruments are mismatched and labels are scarce.
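For readers unfamiliar with contrastive alignment, the following is a minimal sketch of a symmetric CLIP-style (InfoNCE) loss between two sets of embeddings. It is not SpecCLIP's implementation: the function names, the `temperature` value, and the random toy embeddings are assumptions, and the auxiliary reconstruction and cross-modal prediction decoders mentioned in the abstract are omitted.

```python
import numpy as np

def l2_normalize(z):
    """Scale each embedding to unit length so dot products are cosine similarities."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(z_a, z_b, temperature=0.07):
    """Symmetric contrastive loss; row i of z_a and row i of z_b are a matched pair."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature            # pairwise similarity matrix
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # modality A -> B
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # modality B -> A
    return 0.5 * (loss_ab + loss_ba)

# Toy embeddings: two "modalities" observing the same 8 objects (hypothetical data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 4))
emb_a = shared + 0.01 * rng.normal(size=shared.shape)
emb_b = shared + 0.01 * rng.normal(size=shared.shape)

loss_aligned = clip_style_loss(emb_a, emb_b)          # pairs correctly matched
loss_shuffled = clip_style_loss(emb_a, emb_b[::-1])   # pairs deliberately mismatched

print(f"aligned pairs:  {loss_aligned:.3f}")
print(f"shuffled pairs: {loss_shuffled:.3f}")
```

The loss is lower when matched objects sit close together in the shared space, so minimizing it pulls the two modalities' encoders toward a common embedding. The same similarity matrix also supports cross-modal retrieval by ranking each row.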