Author name cluster

Aarti Singh

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers

2 author rows

AAAI Conference 2026 Conference Paper

Scaling Up AI Alignment

Aarti Singh

From the expert AI systems of the 1970s to the self-supervised systems of the 2020s, the pendulum of AI development has swung over the last 50 years from heavy reliance on human feedback to minimal or no reliance. Self-supervised approaches have contributed significantly to the success and scalable development of AI. However, we are now at a tipping point: the future of AI, and whether society ends up benefiting from this technology in the long run, depends critically on subsequent AI development aligning with human goals and values. Recognizing this, efforts to align AI models with human expectations and values have ramped up. Human feedback, however, remains limited and difficult to elicit. A key question therefore lingers: how can we scale up the alignment of AI systems with individual expectations and societal norms? This talk and paper provide an overview of, and a perspective on, efforts to answer this question.


TMLR Journal 2025 Journal Article

Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts

Youngseog Chung
Dhruv Malik
Jeff Schneider
Yuanzhi Li
Aarti Singh

The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single _large_ expert, which is computationally expensive, we can train many _small_ experts. The hope is that if the total parameter count of the small experts equals that of the single large expert, then we retain the representation power of the large expert while gaining computational tractability and promoting expert specialization. The recently introduced Soft MoE replaces the Sparse MoE's discrete routing mechanism with a differentiable gating function that smoothly mixes tokens. While this smooth gating function successfully mitigates the various training instabilities associated with Sparse MoE, it is unclear whether it induces implicit biases that affect Soft MoE's representation power or potential for expert specialization. We prove that Soft MoE with a single arbitrarily powerful expert cannot represent simple convex functions. This shows that Soft MoE's success cannot be explained by the traditional viewpoint of many small experts collectively mimicking the representation power of a single large expert, and that multiple experts are actually _necessary_ to achieve good representation power (even for a fixed total parameter count). Continuing along this line of investigation, we introduce a notion of expert specialization for Soft MoE and, while varying the number of experts but fixing the total parameter count, we consider the following (computationally intractable) task: given any input, how can we discover the expert subset that is specialized to predict this input's label? We empirically show that when there are many small experts, the architecture is implicitly biased in a way that allows us to efficiently approximate the specialized expert subset. Our method can be easily implemented to potentially reduce computation during inference.
For example, using our method on ImageNet, one can perform inference using only 1/8 of the experts and still retain 99% of the test accuracy obtained using all experts.
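The abstract contrasts Soft MoE's differentiable gating with Sparse MoE's discrete routing. To make "smoothly mixes tokens" concrete, here is a minimal pure-Python sketch of a simplified Soft MoE forward pass, assuming the commonly used slot formulation: a parameter matrix `phi` produces token-slot logits, dispatch weights are a softmax over tokens (so each slot receives a weighted average of all tokens), and combine weights are a softmax over slots (so each output token is a weighted average of all slot outputs). The names `soft_moe`, `phi`, `slots_per_expert`, and `make_toy_expert` are hypothetical, the toy experts are random single-layer ReLU maps, and normalization details used in real implementations are omitted. This is not the paper's expert-subset discovery method, which the abstract does not specify.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]
Expert = Callable[[List[float]], List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of an (r x k) matrix and a (k x c) matrix."""
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(v: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    top = max(v)
    exps = [math.exp(t - top) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe(x: Matrix, phi: Matrix, experts: List[Expert],
             slots_per_expert: int) -> Matrix:
    """Simplified Soft MoE forward pass (hypothetical sketch).

    x:   m tokens x d features
    phi: d x (len(experts) * slots_per_expert) slot parameters
    """
    logits = matmul(x, phi)  # m x S, where S is the total number of slots
    # Dispatch weights: softmax over tokens, so each slot column sums to 1.
    dispatch = transpose([softmax(col) for col in transpose(logits)])
    # Combine weights: softmax over slots, so each token row sums to 1.
    combine = [softmax(row) for row in logits]
    # Each slot input is a convex combination of all tokens (no discrete routing).
    slots = matmul(transpose(dispatch), x)  # S x d
    # Slot s is processed by expert s // slots_per_expert.
    slot_out = [experts[s // slots_per_expert](slot) for s, slot in enumerate(slots)]
    # Each output token is a convex combination of all slot outputs.
    return matmul(combine, slot_out)  # m x d_out


def make_toy_expert(rng: random.Random, d: int) -> Expert:
    """Hypothetical toy expert: a single ReLU layer with random weights."""
    w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

    def expert(v: List[float]) -> List[float]:
        return [max(0.0, sum(w[i][k] * v[k] for k in range(d))) for i in range(d)]

    return expert


if __name__ == "__main__":
    rng = random.Random(0)
    m, d, n_experts, p = 4, 3, 2, 2  # tokens, features, experts, slots per expert
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n_experts * p)] for _ in range(d)]
    experts = [make_toy_expert(rng, d) for _ in range(n_experts)]
    y = soft_moe(x, phi, experts, p)
    print(len(y), len(y[0]))  # prints "4 3": one d-dimensional output per token
```

Because every token contributes to every slot and every slot contributes to every output token, the layer makes no hard routing decision, which is the smoothness the abstract credits with mitigating Sparse MoE's training instabilities. In this sketch, the number of experts and the slots per expert can be varied independently, which is the kind of knob the abstract varies while holding the total parameter count fixed.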