AI Clustering of ALMA ISM & Star Formation Data

Keywords: ALMA, Radio Astronomy, Star Formation

Supervisors: 謝天晧 (Tien-Hao Hsieh) - TARA/ASIAA & 賴詩萍 (Shih-Ping Lai) - National Tsing Hua University (NTHU)

Number of Students: 1

Project Description

The Atacama Large Millimeter/submillimeter Array (ALMA) has produced a vast archive of image-spectral cubes over the past decade. However, only a fraction of these data have been systematically analyzed, leaving a large portion of the archive as an untapped resource for astronomy.

This project applies modern AI techniques, specifically self-supervised learning (SSL), to more than 400,000 ALMA data cubes in the Interstellar Medium (ISM) and Star Formation science category. Without manual labels, we pre-train a large-scale model to learn intrinsic representations of the data, then automatically cluster sources by their morphological and kinematic similarities.

Dimensionality-reduction methods such as UMAP reveal distinct clusters (“islands”) in the learned representation space. The central goal of this project is to interpret these clusters physically: do they correspond to protostellar outflows, Keplerian disks, filamentary structures, or other astrophysical phenomena? Once identified, we will investigate the statistical properties of each cluster as a population.
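As a rough illustration of the population step, the sketch below groups cubes by cluster label and summarizes one measured property per cluster. It is only a sketch under stated assumptions: the record fields (`cube_id`, `cluster`, `linewidth_kms`), the use of `-1` as a label for unclustered cubes, and all numeric values are hypothetical placeholders, not results or conventions from this project.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical per-cube records; the values are illustrative placeholders,
# not real ALMA measurements.
cubes = [
    {"cube_id": "cube_a", "cluster": 0, "linewidth_kms": 1.0},
    {"cube_id": "cube_b", "cluster": 0, "linewidth_kms": 2.0},
    {"cube_id": "cube_c", "cluster": 1, "linewidth_kms": 6.0},
    {"cube_id": "cube_d", "cluster": 1, "linewidth_kms": 8.0},
    {"cube_id": "cube_e", "cluster": 1, "linewidth_kms": 7.0},
    {"cube_id": "cube_f", "cluster": -1, "linewidth_kms": 3.0},  # assumed "noise" label
]


def summarize_clusters(records, key, noise_label=-1):
    """Return count, mean, and median of `key` for each cluster label."""
    groups = defaultdict(list)
    for rec in records:
        # Skip cubes that were not assigned to any cluster (assumed convention).
        if rec["cluster"] == noise_label:
            continue
        groups[rec["cluster"]].append(rec[key])
    return {
        label: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for label, vals in sorted(groups.items())
    }


summary = summarize_clusters(cubes, "linewidth_kms")
for label, stats in summary.items():
    print(label, stats)
```

In practice the cluster labels would come from the learned representation space, and the summarized quantity could be any property measured during cube inspection.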

The selected student will combine AI-driven clustering with detailed cube inspection using CARTA, a powerful astronomical data visualization and analysis tool. By labeling and characterizing these clusters, this project aims to establish a systematic, large-scale framework for understanding ALMA archival data.

Required Background