Author name cluster

Juncheng Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

UAI Conference 2025 Conference Paper

CATE Estimation With Potential Outcome Imputation From Local Regression

Ahmed Aloui
Juncheng Dong
Cat Phuoc Le
Vahid Tarokh

One of the most significant challenges in Conditional Average Treatment Effect (CATE) estimation is the statistical discrepancy between distinct treatment groups. To address this issue, we propose a model-agnostic data augmentation method for CATE estimation. First, we derive regret bounds for general data augmentation methods suggesting that a small imputation error may be necessary for accurate CATE estimation. Inspired by this idea, we propose a contrastive learning approach that reliably imputes missing potential outcomes for a selected subset of individuals formed using a similarity measure. We augment the original dataset with these reliable imputations to reduce the discrepancy between different treatment groups while inducing minimal imputation error. The augmented dataset can subsequently be employed to train standard CATE estimation models. We provide both theoretical guarantees and extensive numerical studies demonstrating the effectiveness of our approach in improving the accuracy and robustness of numerous CATE estimation models.

Details

UAI Conference 2025 Conference Paper

Conditional Average Treatment Effect Estimation Under Hidden Confounders

Ahmed Aloui
Juncheng Dong
Ali Hasan
Vahid Tarokh

One of the major challenges in estimating conditional potential outcomes and conditional average treatment effects (CATE) is the presence of hidden confounders. Since testing for hidden confounders cannot be accomplished only with observational data, conditional unconfoundedness is commonly assumed in the literature of CATE estimation. Nevertheless, under this assumption, CATE estimation can be significantly biased due to the effects of unobserved confounders. In this work, we consider the case where in addition to a potentially large observational dataset, a small dataset from a randomized controlled trial (RCT) is available. Notably, we make no assumptions on the existence of any covariate information for the RCT dataset, we only require the outcomes to be observed. We propose a CATE estimation method based on a pseudo-confounder generator and a CATE model that aligns the learned potential outcomes from the observational data with those observed from the RCT. Our method is applicable to many practical scenarios of interest, particularly those where privacy is a concern (e. g. , medical applications). Extensive numerical experiments are provided demonstrating the effectiveness of our approach for both synthetic and real-world datasets.

Details

ICML Conference 2025 Conference Paper

In-Context Reinforcement Learning From Suboptimal Historical Data

Juncheng Dong
Moyang Guo
Ethan X. Fang
Zhuoran Yang
Vahid Tarokh

Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.

Details

TMLR Journal 2025 Journal Article

Understanding and Robustifying Sub-domain Alignment for Domain Adaptation

Yiling Liu
Juncheng Dong
Ziyang Jiang
Ahmed Aloui
Keyu Li
Michael Hunter Klein
Vahid Tarokh
David Carlson

In unsupervised domain adaptation (UDA), aligning source and target domains improves the predictive performance of learned models on the target domain. A common methodological improvement in alignment methods is to divide the domains and align sub-domains instead. These sub-domain-based algorithms have demonstrated great empirical success but lack theoretical support. In this work, we establish a rigorous theoretical understanding of the advantages of these methods that have the potential to enhance their overall impact on the field. Our theory uncovers that sub-domain-based methods optimize an error bound that is at least as strong as non-sub-domain-based error bounds and is empirically verified to be much stronger. Furthermore, our analysis indicates that when the marginal weights of sub-domains shift between source and target tasks, the performance of these methods may be compromised. We therefore implement an algorithm to robustify sub-domain alignment for domain adaptation under sub-domain shift, offering a valuable adaptation strategy for future sub-domain-based methods. Empirical experiments across various benchmarks validate our theoretical insights, prove the necessity for the proposed adaptation strategy, and demonstrate the algorithm's competitiveness in handling label shift.

PDF Details

ICRA Conference 2024 Conference Paper

REFORMA: Robust REinFORceMent Learning via Adaptive Adversary for Drones Flying under Disturbances

Hao-Lun Hsu
Haocheng Meng
Shaocheng Luo
Juncheng Dong
Vahid Tarokh
Miroslav Pajic

In this work, we introduce REFORMA, a novel robust reinforcement learning (RL) approach to design controllers for unmanned aerial vehicles (UAVs) robust to unknown disturbances during flights. These disturbances, typically due to wind turbulence, electromagnetic interference, temperature extremes and many other external physical interference, are highly dynamic and difficult to model. REFORMA can perform a real-time online adaptation to these disturbances and generate appropriate velocity actions as countermeasures to stabilize the drone. REFORMA consists of two components: a base policy trained completely in simulation using model-free RL and an adaptation module trained via supervised learning with on-policy datasets. By varying the disturbance strength in an adaptation module, i. e. , adopting adaptive adversary, the policy is then able to handle extreme cases when the velocity of the drone is immediately affected by disturbances. Finally, we demonstrate the effectiveness of our method through extensive simulated experiments. To the best of our knowledge, REFORMA is the first robust RL approach that uses adaptive adversaries to tackle uncertain disturbances in drone tasks.

Details

IROS Conference 2024 Conference Paper

Steering Decision Transformers via Temporal Difference Learning

Hao-Lun Hsu
Alper Kamil Bozkurt
Juncheng Dong
Qitong Gao
Vahid Tarokh
Miroslav Pajic

Decision Transformers (DTs) have been highly effective for offline reinforcement learning (RL) tasks, successfully modeling the sequences of actions in a given set of demonstrations. However, DTs may perform poorly in stochastic environments, which are prevalent in robotics scenarios. In this paper, we identify that the root cause of this performance degradation is the growing variance of returns-to-go, the signal used by DTs to predict actions, accumulated over the horizon. Building upon this insight, we propose an extension to DTs that allows them to be steered toward high-reward regions, where the expected returns are estimated using temporal difference learning. This way, we not only mitigate the growing variance problem but also eliminate the need for DTs to have access to returns-to-go during evaluation and deployment phases. We show that our method outperforms state-of-the-art offline RL methods in both simulated and real-world robotic arm environments.

Details

NeurIPS Conference 2023 Conference Paper

Off-Policy Evaluation for Human Feedback

Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and are only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, and a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

PDF Details

ICML Conference 2023 Conference Paper

PASTA: Pessimistic Assortment Optimization

Juncheng Dong
Weibin Mo
Zhengling Qi
Cong Shi 0001
Ethan X. Fang
Vahid Tarokh

We consider a fundamental class of assortment optimization problems in an offline data-driven setting. The firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimization, the problem of insufficient data coverage is likely to occur in the offline dataset. Therefore, designing a provably efficient offline learning algorithm becomes a significant challenge. To this end, based on the principle of pessimism, we propose a novel algorithm called Pessimistic ASsortment opTimizAtion (PASTA for short), which can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. In particular, we establish the first regret bound for the offline assortment optimization problem under the celebrated multinomial logit model (MNL). We also propose an efficient computational procedure to solve our pessimistic assortment optimization problem. Our numerical studies demonstrate the superiority of the proposed method over the existing baseline method.

Details

UAI Conference 2023 Conference Paper

Transfer learning for individual treatment effect estimation

Ahmed Aloui
Juncheng Dong
Cat Phuoc Le
Vahid Tarokh

This work considers the problem of transferring causal knowledge between tasks for Individual Treatment Effect (ITE) estimation. To this end, we theoretically assess the feasibility of transferring ITE knowledge and present a practical framework for efficient transfer. A lower bound is introduced on the ITE error of the target task to demonstrate that ITE knowledge transfer is challenging due to the absence of counterfactual information. Nevertheless, we establish generalization upper bounds on the counterfactual loss and ITE error of the target task, demonstrating the feasibility of ITE knowledge transfer. Subsequently, we introduce a framework with a new Causal Inference Task Affinity (CITA) measure for ITE knowledge transfer. Specifically, we use CITA to find the closest source task to the target task and utilize it for ITE knowledge transfer. Empirical studies are provided, demonstrating the efficacy of the proposed method. We observe that ITE knowledge transfer can significantly (up to 95%) reduce the amount of data required for ITE estimation.

Details

ICLR Conference 2022 Conference Paper

Blaschke Product Neural Networks (BPNN): A Physics-Infused Neural Network for Phase Retrieval of Meromorphic Functions

Juncheng Dong
Simiao Ren
Yang Deng
Omar Khatib
Jordan M. Malof
Mohammadreza Soltani
Willie Padilla
Vahid Tarokh

Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions in the complex domain. In many cases, only the magnitude of these functions are observed on various points on the purely imaginary $j\omega$-axis since coherent measurement of their phases is often expensive. However, it is desirable to retrieve the lost phases from the magnitudes when possible. To this end, we propose a physics-infused deep neural network based on the Blaschke products for phase retrieval. Inspired by the Helson and Sarason Theorem, we recover coefficients of a rational function of Blaschke products using a Blaschke Product Neural Network (BPNN), based upon the magnitude observations as input. The resulting rational function is then used for phase retrieval. We compare the BPNN to conventional deep neural networks (NNs) on several phase retrieval problems, comprising both synthetic and contemporary real-world problems (e.g., metamaterials for which data collection requires substantial expertise and is time consuming). On each phase retrieval problem, we compare against a population of conventional NNs of varying size and hyperparameter settings. Even without any hyper-parameter search, we find that BPNNs consistently outperform the population of optimized NNs in scarce data scenarios, and do so despite being much smaller models. The results can in turn be applied to calculate the refractive index of metamaterials, which is an important problem in emerging areas of material science.

Details

AAMAS Conference 2022 Conference Paper

Multi-Agent Adversarial Attacks for Multi-Channel Communications

Juncheng Dong
Suya Wu
Mohammadreza Soltani
Vahid Tarokh

Recently Reinforcement Learning (RL) has been applied as an antiadversarial remedy in wireless communication networks. However studying the RL-based approaches from the adversary’s perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider singlechannel communication (either channel selection or single channel power control), while multi-channel communication is more common in practice. In this paper, we propose a multi-agent adversary system (MAAS) for modeling and analyzing adversaries in a wireless communication scenario by careful design of the reward function under realistic communication scenarios. In particular, by modeling the adversaries as learning agents, we show that the proposed MAAS is able to successfully choose the transmitted channel(s) and their respective allocated power(s) without any prior knowledge of the sender strategy. Compared to the single-agent adversary (SAA), multi-agents in MAAS can achieve significant reduction in signal-to-noise ratio (SINR) under the same power constraints and partial observability, while providing improved stability and a more efficient learning process. Moreover, through empirical studies we show that the results in simulation are close to the ones in communication in reality, a conclusion that is pivotal to the validity of performance of agents evaluated in simulations.

PDF

ICLR Conference 2022 Conference Paper

Task Affinity with Maximum Bipartite Matching in Few-Shot Learning

Cat Phuoc Le
Juncheng Dong
Mohammadreza Soltani
Vahid Tarokh

We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.

Details

NeurIPS Conference 2021 Conference Paper

Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials

Yang Deng
Juncheng Dong
Simiao Ren
Omar Khatib
Mohammadreza Soltani
Vahid Tarokh
Willie Padilla
Jordan Malof

Artificial electromagnetic materials (AEMs), including metamaterials, derive their electromagnetic properties from geometry rather than chemistry. With the appropriate geometric design, AEMs have achieved exotic properties not realizable with conventional materials (e. g. , cloaking or negative refractive index). However, understanding the relationship between the AEM structure and its properties is often poorly understood. While computational electromagnetic simulation (CEMS) may help design new AEMs, its use is limited due to its long computational time. Recently, it has been shown that deep learning can be an alternative solution to infer the relationship between an AEM geometry and its properties using a (relatively) small pool of CEMS data. However, the limited publicly released datasets and models and no widely-used benchmark for comparison have made using deep learning approaches even more difficult. Furthermore, configuring CEMS for a specific problem requires substantial expertise and time, making reproducibility challenging. Here, we develop a collection of three classes of AEM problems: metamaterials, nanophotonics, and color filter designs. We also publicly release software, allowing other researchers to conduct additional simulations for each system easily. Finally, we conduct experiments on our benchmark datasets with three recent neural network architectures: the multilayer perceptron (MLP), MLP-mixer, and transformer. We identify the methods and models that generalize best over the three problems to establish the best practice and baseline results upon which future research can build.

PDF Details