About me

I was born and raised in Nanjing, China. My journey has since taken me to Singapore and Cambridge, UK, where I had the privilege of living and studying before beginning my PhD in Baltimore, USA. During my undergraduate years studying mathematics, I developed a passion for two distinct directions: the rigorous study of abstract structures (abstract algebra, geometry, topology, and category theory among my favorites) and the practical modeling of real-world data through statistics and machine learning. The convergence of these interests led me to explore interdisciplinary areas, as I found it particularly exciting to discover hidden connections and shared insights across seemingly different domains. I was then intrigued by the young research area of Geometric Deep Learning, which sits exactly at the intersection of my interests, and I decided to pursue a PhD to explore it further. In my research, I enjoy the scientist's pleasure of gradually uncovering what is going on inside these black-box models; at the same time, I secretly use the research as an excuse to keep learning and exploring new mathematics.

Research interests #

I am broadly interested in mathematically unraveling the interplay between the structure of data (e.g., symmetries, distributions, and geometric properties) and the neural networks that process them. By understanding why and when these models excel, I aim to improve existing models and design new ones, guided by mathematical foundations. Below, I outline some research questions I am currently exploring and am passionate about. If you share an interest in these topics, I would be delighted to connect.

  • Any-dimensional Learning and Size Generalization: While architectures like transformers, graph neural networks (GNNs), and DeepSets are designed to handle inputs of varying sizes with fixed-size learnable weights, their ability to generalize across sizes—for instance, training on smaller instances and performing well on larger ones—remains poorly understood. I seek to answer two fundamental questions:

    1. For existing any-dimensional models (i.e., those capable of handling inputs of arbitrary sizes), what properties of the data and task enable effective size generalization?
      • This is a question about the model’s implicit inductive bias. Check out our recent paper [1] and my talk [3].
    2. For an any-dimensional learning task involving data of varying sizes, how can we design new architectures or modify existing ones to achieve effective size generalization?
      • I am particularly interested in the concept of “learning an algorithm” that works across all input dimensions. Check out our recent work in this direction [2].
  • Symmetries in Machine Learning: Learning problems that I am passionate about — such as “learning an algorithm” — usually involve symmetries. This has drawn me to the area of equivariant machine learning, where neural network designs explicitly incorporate symmetry constraints inherent to the data or task. Meanwhile, for a model to use fixed learnable weights to parametrize a function that can process inputs of arbitrary dimension, a symmetry constraint appears to be almost always required (see the short code sketch after this list). This is closely related to the concept of representation stability.

  • Training dynamics and weight space structure: More recently, I have been intrigued by the study of training dynamics, focusing on understanding how the function parametrized by a neural network evolves over the course of training under random initialization. Specifically, I am exploring how weight space structure interacts with the $\mu$P (maximal update parametrization) framework, which has proven to be a powerful tool for scaling neural networks effectively.
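
To make the “fixed weights, any input size” idea concrete, here is a minimal sketch of a DeepSets-style model. It is purely illustrative and not taken from the papers cited above; the layer widths, sum pooling, and module names are my own choices. The same learnable weights process sets of any size, and the symmetry constraint (invariance to permuting the set elements) is what makes this possible.

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """Permutation-invariant model: rho( sum_i phi(x_i) ). Illustrative sketch."""
    def __init__(self, in_dim=3, hidden=64, out_dim=1):
        super().__init__()
        # phi acts on each set element independently; rho acts on the pooled summary.
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):  # x: (batch, n, in_dim), with n arbitrary
        # Sum pooling over the set dimension makes the output permutation-invariant
        # and lets the same fixed-size weights handle any set size n.
        return self.rho(self.phi(x).sum(dim=1))

model = DeepSets()
small = torch.randn(8, 10, 3)    # sets of size 10
large = torch.randn(8, 100, 3)   # sets of size 100, processed by the same weights
print(model(small).shape, model(large).shape)  # both: torch.Size([8, 1])
```

Whether a model like this, trained only on small sets, continues to behave well on much larger ones is exactly the size-generalization question described above.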
