Arrow Research search

Author name cluster

He He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers (13)

NeurIPS Conference 2025 Conference Paper

Predicting Empirical AI Research Outcomes with Language Models

  • Jiaxin Wen
  • Chenglei Si
  • Yueh-Han Chen
  • He He
  • Shi Feng

Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even expert researchers can only acquire through substantial experience. We build the first benchmark for this task and compare LMs with human experts. Concretely, given two research ideas (e.g., two jailbreaking methods), we aim to predict which will perform better on a set of benchmarks. We scrape ideas and experimental results from conference papers, yielding 1,585 human-verified idea pairs published after our base model's cut-off date for testing, and 6,000 pairs for training. We then develop a system that combines a fine-tuned GPT-4.1 with a paper retrieval agent, and we recruit 25 human experts for comparison. In the NLP domain, our system beats human experts by a large margin (64.4% vs. 48.9%). On the full test set, our system achieves 77% accuracy, while off-the-shelf frontier LMs like o3 perform no better than random guessing, even with the same retrieval augmentation. We verify that our system does not exploit superficial features like idea complexity through extensive human-written and LM-designed robustness tests. Finally, we evaluate our system on unpublished novel ideas, including ideas generated by an AI ideation agent. Our system achieves 63.6% accuracy, demonstrating its potential as a reward model for improving idea generation models. Altogether, our results outline a promising new direction for LMs to accelerate empirical AI research.
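
The core task above is pairwise: given two ideas, predict the eventual benchmark winner. Below is a minimal sketch of that evaluation protocol; the `predict_winner` stub and the pair fields are hypothetical placeholders, not the paper's actual system or data format.

```python
def pairwise_accuracy(pairs, predict_winner):
    """pairs: iterable of (idea_a, idea_b, winner), winner in {'a', 'b'}."""
    correct = total = 0
    for idea_a, idea_b, winner in pairs:
        correct += predict_winner(idea_a, idea_b) == winner
        total += 1
    return correct / total

# Trivial baseline that always picks the first idea; on a balanced test set
# any such constant predictor sits near 50%.
pairs = [("method A", "method B", "b"), ("method C", "method D", "a")]
print(pairwise_accuracy(pairs, lambda a, b: "a"))  # 0.5
```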

TMLR Journal 2024 Journal Article

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

  • Usman Anwar
  • Abulhair Saparov
  • Javier Rando
  • Daniel Paleka
  • Miles Turpin
  • Peter Hase
  • Ekdeep Singh Lubana
  • Erik Jenner

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

NeurIPS Conference 2024 Conference Paper

Iterative Reasoning Preference Optimization

  • Richard Y. Pang
  • Weizhe Yuan
  • Kyunghyun Cho
  • He He
  • Sainbayar Sukhbaatar
  • Jason Weston

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks. In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps. We train using a modified DPO loss with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
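
The modified objective lends itself to a sketch. Below is a minimal PyTorch rendering of a DPO loss plus a weighted NLL term on the winning chain-of-thought; the weight `alpha`, the omission of per-token length normalization, and the toy inputs are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dpo_plus_nll(logp_w, logp_l, ref_logp_w, ref_logp_l,
                 beta: float = 0.1, alpha: float = 1.0):
    """All inputs are summed sequence log-probs, shape (batch,)."""
    # DPO preference term: push the policy's winner-minus-loser margin
    # above the reference model's margin.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    dpo = -F.logsigmoid(beta * margin)
    # Extra NLL term on the winning chain-of-thought (reported as crucial).
    nll = -logp_w
    return (dpo + alpha * nll).mean()

# Toy batch of one preference pair (log-probs are illustrative numbers).
loss = dpo_plus_nll(torch.tensor([-5.0]), torch.tensor([-9.0]),
                    torch.tensor([-6.0]), torch.tensor([-8.0]))
print(loss.item())
```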

TMLR Journal 2024 Journal Article

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

  • Aahlad Manas Puli
  • Nitish Joshi
  • Yoav Wald
  • He He
  • Rajesh Ranganath

In prediction tasks, there exist features that are related to the label in the same way across different settings for that task; these are semantic features or semantics. Features with varying relationships to the label are nuisances. For example, in detecting cows from natural images, the shape of the head is semantic, but because images of cows often have grass backgrounds but not always, the background is a nuisance. Models that exploit nuisance-label relationships face performance degradation when these relationships change. Building models robust to such changes requires additional knowledge beyond samples of the features and labels. For example, existing work uses annotations of nuisances or assumes ERM-trained models depend on nuisances. Approaches to integrate new kinds of additional knowledge enlarge the settings where robust models can be built. We develop an approach to use knowledge about the semantics via data augmentations. These data augmentations corrupt semantic information to produce models that identify and adjust for where nuisances drive predictions. We study semantic corruptions in powering different spurious-correlation-avoiding methods on multiple out-of-distribution (OOD) tasks like classifying waterbirds, natural language inference (NLI), and detecting cardiomegaly in chest X-rays.
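
As one concrete illustration of a semantic corruption, the sketch below shuffles image patches, destroying global shape (e.g., the cow's head) while roughly preserving local textures such as grass. The patch size and this particular corruption are illustrative assumptions, not necessarily among the paper's corruptions.

```python
import numpy as np

def patch_shuffle(img: np.ndarray, patch: int = 8, rng=None) -> np.ndarray:
    """img: (H, W, C) with H and W divisible by `patch`."""
    rng = rng or np.random.default_rng()
    h, w, c = img.shape
    # Cut the image into a grid of patches, permute them, and reassemble.
    grid = img.reshape(h // patch, patch, w // patch, patch, c)
    tiles = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    tiles = tiles[rng.permutation(len(tiles))]
    grid = tiles.reshape(h // patch, w // patch, patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

corrupted = patch_shuffle(np.random.rand(32, 32, 3), patch=8)
```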

NeurIPS Conference 2024 Conference Paper

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

  • Hannah R. Kirk
  • Alexander Whitefield
  • Paul Röttger
  • Andrew Bean
  • Katerina Margatina
  • Juan Ciro
  • Rafael Mosquera
  • Max Bartolo

Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. With PRISM, we contribute (i) wider geographic and demographic participation in feedback; (ii) census-representative samples for two countries (UK, US); and (iii) individualised ratings that link to detailed participant profiles, permitting personalisation and attribution of sample artefacts. We target subjective and multicultural perspectives on value-laden and controversial issues, where we expect interpersonal and cross-cultural disagreement. We use PRISM in three case studies to demonstrate the need for careful consideration of which humans provide alignment data.

NeurIPS Conference 2023 Conference Paper

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

  • Abulhair Saparov
  • Richard Yuanzhe Pang
  • Vishakh Padmakumar
  • Nitish Joshi
  • Mehran Kazemi
  • Najoung Kim
  • He He

Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs. However, they have difficulty generalizing to longer proofs, and they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction.
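
A minimal sketch of what a programmable deduction generator of this kind can look like: a modus ponens chain whose proof depth is an explicit parameter. The predicate names and phrasing are invented for illustration and are not drawn from the actual benchmark.

```python
def modus_ponens_chain(depth: int):
    """Premises forcing a proof of exactly `depth` modus ponens steps."""
    preds = [f"P{i}" for i in range(depth + 1)]
    rules = [f"If {a}(x) then {b}(x)." for a, b in zip(preds, preds[1:])]
    premises = rules + [f"{preds[0]}(alice)."]
    conclusion = f"{preds[-1]}(alice)."
    return premises, conclusion

premises, goal = modus_ponens_chain(depth=3)
# Proof: P0(alice) -> P1(alice) -> P2(alice) -> P3(alice)
```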

NeurIPS Conference 2022 Conference Paper

SeqPATE: Differentially Private Text Generation via Knowledge Distillation

  • Zhiliang Tian
  • Yingxiu Zhao
  • Ziyue Huang
  • Yu-Xiang Wang
  • Nevin L. Zhang
  • He He

Protecting the privacy of user data is crucial for text generation models, which can leak sensitive information during generation. Differentially private (DP) learning methods provide guarantees against identifying the existence of a training sample from model outputs. PATE is a recent DP learning algorithm that achieves high utility with strong privacy protection on training samples. However, text generation models output tokens sequentially in a large output space; the classic PATE algorithm is not customized for this setting. Furthermore, PATE works well to protect sample-level privacy, but is not designed to protect phrases in samples. In this paper, we propose SeqPATE, an extension of PATE to text generation that protects the privacy of individual training samples and sensitive phrases in training data. To adapt PATE to text generation, we generate pseudo-contexts and reduce the sequence generation problem to a next-word prediction problem. To handle the large output space, we propose a candidate filtering strategy to dynamically reduce the output space, and refine the teacher aggregation of PATE to avoid low agreement due to voting for a large number of candidates. To further reduce privacy losses, we use knowledge distillation to reduce the number of teacher queries. The experiments verify the effectiveness of SeqPATE in protecting both training samples and sensitive phrases.
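
To make the aggregation step concrete, here is a minimal PATE-style sketch for next-word prediction: teachers vote over a filtered candidate set and noise is added to the counts before the winner is chosen. The use of Laplace noise, the filtering interface, and the noise scale are assumptions; SeqPATE's actual mechanism may differ.

```python
import numpy as np

def noisy_next_word(teacher_votes, candidates, scale=1.0, rng=None):
    """teacher_votes: one predicted token id per teacher.
    candidates: token ids that survived the filtering step."""
    rng = rng or np.random.default_rng()
    counts = {tok: 0 for tok in candidates}
    for v in teacher_votes:
        if v in counts:  # votes outside the candidate set are dropped
            counts[v] += 1
    # Noisy argmax over the small candidate set keeps teacher agreement high.
    noisy = {tok: n + rng.laplace(0.0, scale) for tok, n in counts.items()}
    return max(noisy, key=noisy.get)

print(noisy_next_word([3, 3, 7, 3, 7], candidates=[3, 7, 9]))
```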

NeurIPS Conference 2021 Conference Paper

IRM—when it works and when it doesn't: A test case of natural language inference

  • Yana Dranker
  • He He
  • Yonatan Belinkov

Invariant Risk Minimization (IRM) is a recently proposed framework for out-of-distribution (o.o.d) generalization. Most of the studies on IRM so far have focused on theoretical results, toy problems, and simple models. In this work, we investigate the applicability of IRM to bias mitigation, a special case of o.o.d generalization, in increasingly naturalistic settings and deep models. Using natural language inference (NLI) as a test case, we start with a setting where both the dataset and the bias are synthetic, continue with a natural dataset and synthetic bias, and end with a fully realistic setting with natural datasets and bias. Our results show that in naturalistic settings, learning complex features in place of the bias proves to be difficult, leading to a rather small improvement over empirical risk minimization. Moreover, we find that in addition to being sensitive to random seeds, the performance of IRM also depends on several critical factors, notably dataset size, bias prevalence, and bias strength, thus limiting IRM's advantage in practical scenarios. Our results highlight key challenges in applying IRM to real-world scenarios, calling for a more naturalistic characterization of the problem setup for o.o.d generalization.
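
For reference, a minimal sketch of the IRMv1 penalty commonly used to instantiate IRM in practice: the squared gradient of each environment's risk with respect to a fixed scalar classifier weight. The toy binary task below is an illustrative assumption; the paper's NLI experiments are far more involved.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    # Risk gradient w.r.t. a fixed "dummy" classifier weight w = 1.0;
    # its square is the IRMv1 invariance penalty for one environment.
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * w, labels)
    (grad,) = torch.autograd.grad(risk, w, create_graph=True)
    return grad.pow(2)

# Toy environment: random logits and binary labels.
logits, labels = torch.randn(8), torch.randint(0, 2, (8,)).float()
print(irmv1_penalty(logits, labels).item())
# Training objective over environments: mean(risk_e) + lambda * mean(penalty_e).
```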

JMLR Journal 2020 Journal Article

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

  • Jian Guo
  • He He
  • Tong He
  • Leonard Lausen
  • Mu Li
  • Haibin Lin
  • Xingjian Shi
  • Chenguang Wang

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.
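
A brief usage sketch of the pre-trained model APIs the toolkits expose; the model and embedding names follow the toolkits' model zoos around the time of publication, and exact names or signatures may have changed in later versions.

```python
from gluoncv import model_zoo
import gluonnlp as nlp

# Pre-trained image classifier from the GluonCV model zoo.
net = model_zoo.get_model('resnet50_v1b', pretrained=True)

# Pre-trained word embeddings from GluonNLP.
glove = nlp.embedding.create('glove', source='glove.6B.50d')
```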

IJCAI Conference 2020 Conference Paper

Partial Adversarial Behavior Deception in Security Games

  • Thanh H. Nguyen
  • Arunesh Sinha
  • He He

Learning attacker behavior is an important research topic in security games as security agencies are often uncertain about attackers' decision making. Previous work has focused on developing various behavioral models of attackers based on historical attack data. However, a clever attacker can manipulate its attacks to fail such attack-driven learning, leading to ineffective defense strategies. We study attacker behavior deception with three main contributions. First, we propose a new model, named partial behavior deception model, in which there is a deceptive attacker (among multiple attackers) who controls a portion of attacks. Our model captures real-world security scenarios such as wildlife protection in which multiple poachers are present. Second, we introduce a new scalable algorithm, GAMBO, to compute an optimal deception strategy of the deceptive attacker. Our algorithm employs the projected gradient descent and uses the implicit function theorem for the computation of gradient. Third, we conduct a comprehensive set of experiments, showing a significant benefit for the attacker and loss for the defender due to attacker deception.
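
The optimization backbone here is projected gradient descent; a minimal generic sketch follows, with a box constraint standing in for the deceptive attacker's feasible set and a hypothetical objective, since in the paper the gradient itself is obtained via the implicit function theorem.

```python
import numpy as np

def projected_gradient_descent(grad_f, x0, lr=0.1, steps=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)    # gradient step on the (hypothetical) objective
        x = np.clip(x, 0.0, 1.0)  # projection back onto the feasible box
    return x

# Toy objective ||x - 1.5||^2 over [0, 1]^2; the projected optimum is (1, 1).
print(projected_gradient_descent(lambda x: 2 * (x - 1.5), [0.0, 0.0]))
```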

NeurIPS Conference 2016 Conference Paper

A Credit Assignment Compiler for Joint Prediction

  • Kai-Wei Chang
  • He He
  • Stephane Ross
  • Hal Daume III
  • John Langford

Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements in the compiler, this radically reduces both programming complexity and running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.
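
A minimal sketch of the "search space as an imperative program" idea: joint prediction written as an ordinary loop of predict() calls, with rollouts and credit assignment handled behind that interface. The toy tagger and its predict signature are illustrative assumptions, not the paper's actual API.

```python
def tag_sequence(words, predict):
    """Each predict() call is one decision in the induced search space."""
    tags, prev = [], "<s>"
    for word in words:
        tag = predict({"word": word, "prev_tag": prev})
        tags.append(tag)
        prev = tag
    return tags

# Any policy plugs in as `predict`; during training the framework rolls
# in/out around each call to assign credit to individual decisions.
print(tag_sequence(["the", "cat"], lambda feats: "NOUN"))
```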

NeurIPS Conference 2014 Conference Paper

Learning to Search in Branch and Bound Algorithms

  • He He
  • Hal Daume III
  • Jason Eisner

Branch-and-bound is a widely used method in combinatorial optimization, including mixed integer programming, structured prediction and MAP inference. While most work has been focused on developing problem-specific techniques, little is known about how to systematically design the node searching strategy on a branch-and-bound tree. We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound. Our strategies are learned by imitation learning. We apply our algorithm to linear programming based branch-and-bound for solving mixed integer programs (MIP). We compare our method with one of the fastest open-source solvers, SCIP; and a very efficient commercial solver, Gurobi. We demonstrate that our approach achieves better solutions faster on four MIP libraries.
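
A minimal sketch of where the learned component sits: branch-and-bound that explores nodes in order of a learned score rather than a fixed rule such as best-bound or depth-first. The callback interface (score, expand, bound, solution_value) is a hypothetical simplification of a MIP solver's internals.

```python
import heapq

def branch_and_bound(root, score, expand, bound, solution_value):
    """Maximization; `score` (the learned part) orders the frontier."""
    best, counter = float("-inf"), 0
    frontier = [(-score(root), counter, root)]
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if bound(node) <= best:
            continue                    # prune: cannot beat the incumbent
        val = solution_value(node)      # None for internal (non-leaf) nodes
        if val is not None:
            best = max(best, val)
            continue
        for child in expand(node):      # branching step
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    return best
```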

NeurIPS Conference 2012 Conference Paper

Imitation Learning by Coaching

  • He He
  • Jason Eisner
  • Hal Daume

Imitation learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle's ability and the learner's policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to a novel cost-sensitive dynamic feature selection problem, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods on dynamic feature selection, as well as two static feature selection methods.
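
A minimal sketch of the coaching idea: the imitated action maximizes the oracle's score plus a learner-weighted term whose coefficient is annealed toward zero, so training gradually approaches pure oracle imitation. The scoring interface and schedule are illustrative assumptions rather than the paper's exact formulation.

```python
def coach_action(actions, oracle_score, learner_score, lam):
    """lam is annealed toward 0, recovering pure oracle imitation."""
    return max(actions, key=lambda a: oracle_score(a) + lam * learner_score(a))

# Early in training (large lam) the coach favors actions the current learner
# already scores highly, i.e. easy-to-learn actions near the oracle.
```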