EAAI Journal 2026 Journal Article
A deep learning-based imaging classification framework for interstitial lung disease
- Hongyi Wang
- Anqi Liu
- Xiaoyan Yang
- Yifei Ni
- Jianping Wang
- Jie Du
- Yuhui Qiang
- Bingbing Xie
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
TMLR Journal 2026 Journal Article
In-context learning (ICL) -- the capacity of a model to infer and apply abstract patterns from examples provided within its input -- has been extensively studied in large language models trained for next-token prediction on human text. In fact, prior work often attributes this emergent behavior to distinctive statistical properties in *human* language. This raises a fundamental question: can ICL arise *organically* in other sequence domains purely through large-scale predictive training? To explore this, we turn to genomic sequences, an alternative symbolic domain rich in statistical structure. Specifically, we study the Evo2 genomic model, trained predominantly on next-nucleotide (A/T/C/G) prediction, at a scale comparable to mid-sized LLMs. We develop a controlled experimental framework comprising symbolic reasoning tasks instantiated in both linguistic and genomic forms, enabling direct comparison of ICL across genomic and linguistic models. Our results show that genomic models, like their linguistic counterparts, exhibit log-linear gains in pattern induction as the number of in-context demonstrations increases. To the best of our knowledge, this is the first evidence of organically emergent ICL in genomic sequences, supporting the hypothesis that ICL arises as a consequence of large-scale predictive modeling over rich data.
AAAI Conference 2026 Conference Paper
The opioid crisis represents a significant moment in public health that reveals systemic shortcomings across regulatory systems, healthcare practices, corporate governance, and public policy. Analyzing how these interconnected systems simultaneously failed to protect public health requires innovative analytic approaches for exploring the vast amounts of data and documents disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA). The complexity, multimodal nature, and specialized characteristics of these healthcare-related legal and corporate documents necessitate more advanced methods and models tailored to specific data types and detailed annotations, ensuring the precision and professionalism in the analysis. In this paper, we tackle this challenge by organizing the original dataset according to document attributes and constructing a benchmark with 400k training documents and 10k for testing. From each document, we extract rich multimodal information—including textual content, visual elements, and layout structures—to capture a comprehensive range of features. Using multiple AI models, we then generate a large-scale dataset comprising 360k training QA pairs and 10k testing QA pairs. Building on this foundation, we develop domain-specific multimodal Large Language Models (LLMs) and explore the impact of multimodal inputs on task performance. To further enhance response accuracy, we incorporate historical QA pairs as contextual grounding for answering current queries. Additionally, we incorporate page references within the answers and introduce an importance-based page classifier, further improving the precision and relevance of the information provided. Preliminary results indicate the improvements with our AI assistant in document information extraction and question-answering tasks.
TMLR Journal 2025 Journal Article
Supervised classification traditionally assumes that training and testing samples are drawn from the same underlying distribution. However, practical scenarios are often affected by distribution shifts, such as covariate and label shifts. Most existing techniques for correcting distribution shifts are based on a reweighted approach that weights training samples, assigning lower relevance to the samples that are unlikely at testing. However, these methods may achieve poor performance when the weights obtained take large values at certain training samples. In addition, in multi-source cases, existing methods do not exploit complementary information among sources, and equally combine sources for all instances. In this paper, we establish a unified learning framework for distribution shift adaptation. We present a double-weighting approach to deal with distribution shifts, considering weight functions associated with both training and testing samples. For the multi-source case, the presented methods assign source-dependent weights for training and testing samples, where weights are obtained jointly using information from all sources. We also present generalization bounds for the proposed methods that show a significant increase in the effective sample size compared with existing approaches. Empirically, the proposed methods achieve enhanced classification performance in both synthetic and empirical experiments.
NeurIPS Conference 2025 Conference Paper
Language model outputs are not always reliable, thus prompting research into how to adapt model responses based on uncertainty. Common approaches include: abstention, where models refrain from generating responses when uncertain; and linguistic calibration, where models hedge their statements using uncertainty quantifiers. However, abstention can withhold valuable information, while linguistically calibrated responses are often challenging to leverage in downstream tasks. We propose a unified view, Conformal Linguistic Calibration (CLC), which reinterprets linguistic calibration as answer set prediction. First we present a framework connecting abstention and linguistic calibration through the lens of linguistic pragmatics. We then describe an implementation of CLC that allows for controlling the level of imprecision in model responses. Results demonstrate our method produces calibrated outputs with conformal guarantees on factual accuracy. Further, our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.
UAI Conference 2025 Conference Paper
Reliable and accurate estimation of the error of an ML model in unseen test domains is an important problem for safe intelligent systems. Prior work uses \textit{disagreement discrepancy} (\disdis) to derive practical error bounds under distribution shifts. It optimizes for a maximally disagreeing classifier on the target domain to bound the error of a given source classifier. Although this approach offers a reliable and competitively accurate estimate of the target error, we identify a problem in this approach which causes the disagreement discrepancy objective to compete in the overlapping region between source and target domains. With an intuitive assumption that the target disagreement should be no more than the source disagreement in the overlapping region due to high enough support, we devise Overlap-aware Disagreement Discrepancy (\odd). Our \odd-based bound uses domain-classifiers to estimate domain-overlap and better predicts target performance than \disdis. We conduct experiments on a wide array of benchmarks to show that our method improves the overall performance-estimation error while remaining valid and reliable. Our code and results are available on \href{https: //github. com/aamixsh/odd}{GitHub}.
AIIM Journal 2024 Journal Article
ICML Conference 2024 Conference Paper
Sampling-based methods, e. g. , Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency at test time, which limits the scalability needed for low-resource devices and real-time applications. To resolve these computational issues, we propose Density-Softmax, a sampling-free deterministic framework via combining a density function built on a Lipschitz-constrained feature extractor with the softmax layer. Theoretically, we show that our model is the solution of minimax uncertainty risk and is distance-aware on feature space, thus reducing the over-confidence of the standard softmax under distribution shifts. Empirically, our method enjoys competitive results with state-of-the-art techniques in terms of uncertainty and robustness, while having a lower number of model parameters and a lower latency at test time.
TMLR Journal 2024 Journal Article
We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits under general covariate shifts. Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution between logging and target data. Central to our methodology is the application of robust regression — a distributionally robust technique tailored here to improve the estimation of conditional reward distribution from logging data. Utilizing the reward model obtained from robust regression, we develop a comprehensive suite of policy value estimators, by integrating our reward model into established evaluation frameworks, namely direct methods and doubly robust methods. Through theoretical analysis, we further establish that the proposed policy value estimators offer a finite sample upper bound for the bias, providing a clear advantage over traditional methods, especially when the shift is large. Finally, we designed an extensive range of policy evaluation scenarios, covering diverse magnitudes of shifts and a spectrum of logging and target policies. Our empirical results indicate that our approach significantly outperforms baseline methods, most notably in 90% of the cases under the policy shift-only settings and 72% of the scenarios under the general covariate shift settings.
NeurIPS Conference 2024 Conference Paper
Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards only ensure the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the learned policy is actually deployed to the target domain. In this work, we propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation and follows the general framework of generative adversarial imitation learning from observation (GAIfO) by applying a reward augmented estimator for the policy optimization step. Theoretically, we present an error bound for our method under a mild assumption regarding the dynamics shift to justify the motivation of our method. Empirically, our method outperforms the pure modified reward method without imitation learning and also outperforms other baselines in benchmark off-dynamics environments.
NeurIPS Conference 2024 Conference Paper
Despite advancements in robotic-assisted surgery, automating complex tasks like suturing remains challenging due to the need for adaptability and precision. Learning-based approaches, particularly reinforcement learning (RL) and imitation learning (IL), require realistic simulation environments for efficient data collection. However, current platforms often include only relatively simple, non-dexterous manipulations and lack the flexibility required for effective learning and generalization. We introduce SurgicAI, a novel platform for development and benchmarking that addresses these challenges by providing the flexibility to accommodate both modular subtasks and more importantly task decomposition in RL-based surgical robotics. Compatible with the da Vinci Surgical System, SurgicAI offers a standardized pipeline for collecting and utilizing expert demonstrations. It supports the deployment of multiple RL and IL approaches, and the training of both singular and compositional subtasks in suturing scenarios, featuring high dexterity and modularization. Meanwhile, SurgicAI sets clear metrics and benchmarks for the assessment of learned policies. We implemented and evaluated multiple RL and IL algorithms on SurgicAI. Our detailed benchmark analysis underscores SurgicAI's potential to advance policy learning in surgical robotics. Details: https: //github. com/surgical-robotics-ai/SurgicAI
TMLR Journal 2024 Journal Article
Learning models whose predictions are invariant under multiple environments is a promising approach for out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ where the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also supposed to generalize to shifts in the marginal distribution $p(X_{\text{inv}})$ of the extracted features $X_{\text{inv}}$, a type of shift we call an invariant covariate shift. However, we show that proposed methods for learning invariant models underperform under invariant covariate shift, either failing to learn invariant models---even for data generated from simple and well-studied linear-Gaussian models---or having poor finite-sample performance. To alleviate these problems, we propose weighted risk invariance (WRI). Our framework is based on imposing invariance of the loss across environments subject to appropriate reweightings of the training examples. We show that WRI provably learns invariant models, i.e. discards spurious correlations, in linear-Gaussian settings. We propose a practical algorithm to implement WRI by learning the density $p(X_{\text{inv}})$ and the model parameters simultaneously, and we demonstrate empirically that WRI outperforms previous invariant learning methods under invariant covariate shift.
ICML Conference 2023 Conference Paper
Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $p_\text{tr}(x)$ and $p_\text{te}(x)$ are different but the label conditionals coincide. Existing approaches address such covariate shift by either using the ratio $p_\text{te}(x)/p_\text{tr}(x)$ to weight training samples (reweighted methods) or using the ratio $p_\text{tr}(x)/p_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighted methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.
IJCAI Conference 2023 Conference Paper
We propose a framework for learning calibrated uncertainties under domain shifts, considering the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form that concerns the domain shift. In particular, the density ratio estimator yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our proposed method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show an agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.
NeurIPS Conference 2022 Conference Paper
Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data.
NeurIPS Conference 2022 Conference Paper
We propose \textbf{JAWS}, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on the core method \textbf{JAW}, the \textbf{JA}ckknife+ \textbf{W}eighted with data-dependent likelihood-ratio weights. JAWS also includes computationally efficient \textbf{A}pproximations of JAW using higher-order influence functions: \textbf{JAWA}. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of the sample size or the influence function order under common regularity assumptions. Moreover, we propose a general approach to repurposing predictive interval-generating methods and their guarantees to the reverse task: estimating the probability that a prediction is erroneous, based on user-specified error criteria such as a safe or acceptable tolerance threshold around the true label. We then propose \textbf{JAW-E} and \textbf{JAWA-E} as the repurposed proposed methods for this \textbf{E}rror assessment task. Practically, JAWS outperform state-of-the-art predictive inference baselines in a variety of biased real world data sets for interval-generation and error-assessment predictive uncertainty auditing tasks.
AAAI Conference 2021 Conference Paper
Making predictions that are fair with regard to protected attributes (race, gender, age, etc.) has become an important requirement for classification algorithms. Existing techniques derive a fair model from sampled labeled data relying on the assumption that training and testing data are identically and independently drawn (iid) from the same distribution. In practice, distribution shift can and does occur between training and testing datasets as the characteristics of individuals interacting with the machine learning system change. We investigate fairness under covariate shift, a relaxation of the iid assumption in which the inputs or covariates change while the conditional label distribution remains the same. We seek fair decisions under these assumptions on target data with unknown labels. We propose an approach that obtains the predictor that is robust to the worst-case testing performance while satisfying target fairness requirements and matching statistical properties of the source data. We demonstrate the benefits of our approach on benchmark prediction tasks.
NeurIPS Conference 2016 Conference Paper
Recently proposed adversarial classification methods have shown promising results for cost sensitive and multivariate losses. In contrast with empirical risk minimization (ERM) methods, which use convex surrogate losses to approximate the desired non-convex target loss function, adversarial methods minimize non-convex losses by treating the properties of the training data as being uncertain and worst case within a minimax game. Despite this difference in formulation, we recast adversarial classification under zero-one loss as an ERM method with a novel prescribed loss function. We demonstrate a number of theoretical and practical advantages over the very closely related hinge loss ERM methods. This establishes adversarial classification under the zero-one loss as a method that fills the long standing gap in multiclass hinge loss classification, simultaneously guaranteeing Fisher consistency and universal consistency, while also providing dual parameter sparsity and high accuracy predictions in practice.
AAAI Conference 2016 Conference Paper
AAAI Conference 2015 Conference Paper
To facilitate interaction with people, robots must not only recognize current actions, but also infer a person’s intentions and future behavior. Recent advances in depth camera technology have significantly improved human motion tracking. However, the inherent high dimensionality of interacting with the physical world makes efficiently forecasting human intention and future behavior a challenging task. Predictive methods that estimate uncertainty are therefore critical for supporting appropriate robotic responses to the many ambiguities posed within the human-robot interaction setting. We address these two challenges, high dimensionality and uncertainty, by employing predictive inverse optimal control methods to estimate a probabilistic model of human motion trajectories. Our inverse optimal control formulation estimates quadratic cost functions that best rationalize observed trajectories framed as solutions to linear-quadratic regularization problems. The formulation calibrates its uncertainty from observed motion trajectories, and is efficient in highdimensional state spaces with linear dynamics. We demonstrate its effectiveness on a task of anticipating the future trajectories, target locations and activity intentions of hand motions.
AAAI Conference 2015 Conference Paper
Existing approaches to active learning are generally optimistic about their certainty with respect to data shift between labeled and unlabeled data. They assume that unknown datapoint labels follow the inductive biases of the active learner. As a result, the most useful datapoint labels—ones that refute current inductive biases— are rarely solicited. We propose a shift-pessimistic approach to active learning that assumes the worst-case about the unknown conditional label distribution. This closely aligns model uncertainty with generalization error, enabling more useful label solicitation. We investigate the theoretical benefits of this approach and demonstrate its empirical advantages on probabilistic binary classification tasks.
NeurIPS Conference 2014 Conference Paper
In many important machine learning applications, the source distribution used to estimate a probabilistic classifier differs from the target distribution on which the classifier will be used to make predictions. Due to its asymptotic properties, sample-reweighted loss minimization is a commonly employed technique to deal with this difference. However, given finite amounts of labeled source data, this technique suffers from significant estimation errors in settings with large sample selection bias. We develop a framework for robustly learning a probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation. Our approach requires only accurate estimates of statistics under the source distribution and is otherwise as robust as possible to unknown properties of the conditional label distribution, except when explicit generalization assumptions are incorporated. We demonstrate the behavior and effectiveness of our approach on synthetic and UCI binary classification tasks.