AstroM³: A self-supervised multimodal model for astronomy

Mar 17, 2025 - 11:00 am to 12:00 pm
Location

Campus, Varian 355

Speaker
Mariia Rizhko (UC Berkeley). In person and on Zoom: https://stanford.zoom.us/my/sihanyuan?pwd=QnpsUHZWWGJ2ekVYWmZVL3BmM0gzZz09

While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose AstroM³, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. We extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a supervised fine-tuning setting, CLIP pre-training improves classification accuracy, particularly when labeled data are limited, with increases of up to 14.29% in spectra classification, 2.27% in metadata, and 10.20% in photometry. Furthermore, we show that combining photometry, spectra, and metadata improves classification accuracy over single-modality models. In addition to fine-tuned classification, we can use the trained model in other downstream tasks that were not explicitly contemplated during the construction of the self-supervised model. In particular, we show the efficacy of using the learned embeddings to identify misclassifications, for similarity search, and for anomaly detection. One surprising highlight is the "rediscovery" of Mira subtypes and two rotational variable subclasses using manifold learning and dimensionality reduction algorithms. To our knowledge, this is the first construction of an n>2 mode model in astronomy. Extensions to n>3 modes are naturally anticipated with this approach.
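For readers unfamiliar with contrastive pre-training, one way a CLIP-style objective might be extended to three modalities is sketched below. This is an illustration only, not the speaker's implementation: it assumes the trimodal loss is the sum of symmetric pairwise CLIP losses over the three modality pairs, that each modality has already been encoded into a fixed-size embedding, and that a temperature of 0.07 is used. The function names (`clip_pair_loss`, `trimodal_clip_loss`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_pair_loss(a, b, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss between two batches of embeddings.

    Row i of `a` and row i of `b` are assumed to describe the same object,
    so the matching pairs sit on the diagonal of the similarity matrix.
    The temperature value is an assumption, not taken from the abstract.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)


def trimodal_clip_loss(photometry, spectra, metadata, temperature=0.07):
    """Assumed trimodal extension: sum the pairwise loss over all three pairs."""
    return (
        clip_pair_loss(photometry, spectra, temperature)
        + clip_pair_loss(spectra, metadata, temperature)
        + clip_pair_loss(metadata, photometry, temperature)
    )
```

Under these assumptions, the loss is low when the photometry, spectra, and metadata embeddings of the same source lie close together and high when they are mismatched, which is the property that would make the learned embeddings usable for similarity search and anomaly detection.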