Arrow Research search

Author name cluster

Min Chi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

IJCAI Conference 2025 Conference Paper

Human-Readable Neuro-Fuzzy Networks from Frequent Yet Discernible Patterns in Reward-Based Environments

  • John Wesley Hostetter
  • Adittya Soukarjya Saha
  • Md Mirajul Islam
  • Tiffany Barnes
  • Min Chi

We propose self-organizing and simplifying neuro-fuzzy networks (NFNs) to yield transparent, human-readable policies by exploiting fuzzy information granulation and graph theory. Drawing on social network analysis, we retain only the frequent-yet-discernible (FYD) patterns in NFNs and apply them to reward-based scenarios. The effectiveness of NFNs built from FYD patterns is demonstrated in classic control tasks and in a real-world classroom, using an intelligent tutoring system to teach students.

AAAI Conference 2025 Conference Paper

Iterative Counterfactual Data Augmentation

  • Mitchell Plyler
  • Min Chi

Counterfactual data augmentation (CDA) is a method for controlling information or biases in training datasets by generating a complementary dataset with typically opposing biases. Prior work typically relies on either hand-crafted rules or algorithmic CDA methods, both of which can leave unwanted information in the augmented dataset. In this work, we show that iterative CDA (ICDA) with initial, high-noise interventions can converge to a state with significantly lower noise. Our ICDA procedure produces a dataset in which one target signal maintains high mutual information with the corresponding label while the information carried by spurious signals is reduced. We show that training on the augmented datasets produces rationales on documents that better align with human annotation. Our experiments include six human-produced datasets and two large-language-model-generated datasets.
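ICDA's goal is stated in terms of mutual information between a signal and the label. As a minimal, paper-agnostic sketch of that quantity (the helper name is ours, not from the paper), empirical mutual information over discrete sequences can be computed as:

```python
from collections import Counter
import math

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A signal perfectly aligned with the label carries 1 bit;
# a signal independent of the label carries ~0 bits.
labels = [0, 0, 1, 1] * 25
target = labels[:]            # fully informative signal
spurious = [0, 1] * 50        # independent of the label
print(mutual_information(target, labels))    # 1.0
print(mutual_information(spurious, labels))  # 0.0
```

In ICDA's terms, augmentation should keep the first number high for the target signal while driving the second toward zero for spurious signals.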

AAAI Conference 2024 Conference Paper

Get a Head Start: On-Demand Pedagogical Policy Selection in Intelligent Tutoring

  • Ge Gao
  • Xi Yang
  • Min Chi

Reinforcement learning (RL) is broadly employed in human-involved systems to enhance human outcomes. Off-policy evaluation (OPE) has been pivotal for RL in those realms, since online policy learning and evaluation can be high-stakes. Intelligent tutoring is an especially challenging setting for OPE in human-involved systems: subgroups of students can favor different pedagogical policies, and the standard procedure is costly because policies must be induced fully offline and then deployed directly in the upcoming semester. In this work, we formulate on-demand pedagogical policy selection (ODPS) to tackle these challenges. We propose a pipeline, EduPlanner, as a concrete solution for ODPS. Our pipeline yields a theoretically unbiased estimator and enables efficient, customized policy selection by identifying subgroups over both historical data and on-arrival initial logs. We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows that EduPlanner significantly improves students' learning outcomes, especially for those in low-performing subgroups.

IJCAI Conference 2024 Conference Paper

Multi-TA: Multilevel Temporal Augmentation for Robust Septic Shock Early Prediction

  • Hyunwoo Sohn
  • Kyungjin Park
  • Baekkwan Park
  • Min Chi

Predicting the onset of a disease early is critical to timely and accurate clinical decision-making: a model determines whether a patient will develop the disease n hours later. While deep learning algorithms have demonstrated great success on multivariate irregular time-series data such as electronic health records (EHRs), they often lack temporal robustness due to data scarcity problems that become more prominent, at multiple levels, as n increases. At the event level, the decreasing number of available events per trajectory increases uncertainty in anticipating future disease behavior. At the trajectory level, the scarcity of patient trajectories limits diversity in the training population, hindering the model's generalization. This work introduces Multi-TA, a multilevel temporal augmentation framework that leverages BERT-based temporal EHR representation learning and a unified data augmentation approach, effectively addressing data scarcity at both the event and trajectory levels. Validated on two real-world EHR datasets for septic shock, Multi-TA outperforms mixup- and GAN-based state-of-the-art models across eight prediction windows, demonstrating improved temporal robustness. Further, we provide in-depth analyses of the data augmentation.
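The abstract compares against mixup-based baselines. As a hedged illustration of what that baseline family does (vanilla mixup on equal-length numeric sequences; a simplification of time-series mixup variants, with the function name ours):

```python
import random

def mixup(seq_a, seq_b, alpha=0.2):
    """Mixup for two equal-length numeric sequences: convex-combine the
    trajectories (and, in practice, their labels) with a weight drawn
    from Beta(alpha, alpha)."""
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1 - lam) * b for a, b in zip(seq_a, seq_b)]
    return mixed, lam

random.seed(0)  # for reproducibility of this illustration
mixed, lam = mixup([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(lam, mixed)
```

Multi-TA's contribution is a different, multilevel augmentation strategy; this snippet only shows the interpolation-style baseline it is measured against.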

NeurIPS Conference 2024 Conference Paper

Off-Policy Selection for Initiating Human-Centric Experimental Design

  • Ge Gao
  • Xi Yang
  • Qitong Gao
  • Song Ju
  • Miroslav Pajic
  • Min Chi

In human-centric applications like healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been utilized in those tasks, off-policy selection (OPS) is pivotal to closing the loop by evaluating and selecting policies offline, without online interactions; yet current OPS methods often overlook the heterogeneity among participants. Our work is centered on resolving a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joins the cohort, without access to any prior offline data collected from that participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and OPS criteria tailored to each sub-group. By grouping individuals with similar traits, FPS facilitates personalized policy selection aligned with the unique characteristics of each participant or group of participants. FPS is evaluated on two important but challenging applications: intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS delivers significant improvements in students' learning outcomes and in in-hospital care outcomes.

ICLR Conference 2024 Conference Paper

On Trajectory Augmentations for Off-Policy Evaluation

  • Ge Gao
  • Qitong Gao
  • Xi Yang 0019
  • Song Ju
  • Miroslav Pajic
  • Min Chi

In the realm of reinforcement learning (RL), off-policy evaluation (OPE) holds a pivotal position, especially in high-stakes human-involved scenarios such as e-learning and healthcare. Applying OPE to these domains is often challenging when offline training trajectories are scarce and underrepresentative. Data augmentation has been a successful technique for enriching training data. However, directly employing existing data augmentation methods for OPE may not be feasible, due to the Markovian nature of the offline trajectories and the desire for generalizability across diverse target policies. In this work, we propose an offline trajectory augmentation approach specifically to facilitate OPE in human-involved scenarios. We propose sub-trajectory mining to extract potentially valuable sub-trajectories from offline data, and we diversify the behaviors within those sub-trajectories by varying coverage of the state-action space. Our work was empirically evaluated in a wide array of environments, encompassing both simulated scenarios and real-world domains like robotic control, healthcare, and e-learning, where the training trajectories include varying levels of coverage of the state-action space. By enhancing the performance of a variety of OPE methods, our work offers a promising path forward for tackling OPE challenges in situations where data may be limited or underrepresentative.
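For context on what the augmented trajectories would feed, a toy sketch of the ordinary importance-sampling OPE estimator, one of the classic methods augmentation can help (function and argument names are illustrative, not from the paper):

```python
def importance_sampling_return(trajectory, pi_e, pi_b, gamma=0.99):
    """Ordinary importance-sampling estimate of a target policy's return
    from one behavior-policy trajectory of (state, action, reward) triples.
    pi_e(a, s) and pi_b(a, s) return the action probabilities of the
    target (evaluation) and behavior policies, respectively."""
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(trajectory):
        weight *= pi_e(a, s) / pi_b(a, s)   # cumulative likelihood ratio
        ret += (gamma ** t) * r             # discounted return
    return weight * ret

# Sanity check: identical policies give the unweighted discounted return.
uniform = lambda a, s: 0.5
traj = [(0, 0, 1.0), (1, 1, 1.0)]
print(importance_sampling_return(traj, uniform, uniform, gamma=1.0))  # 2.0
```

The estimator's variance explodes when offline trajectories cover the state-action space poorly, which is exactly the failure mode that sub-trajectory mining and diversification aim to relieve.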

AAMAS Conference 2023 Conference Paper

A Self-Organizing Neuro-Fuzzy Q-Network: Systematic Design with Offline Hybrid Learning

  • John Wesley Hostetter
  • Mark Abdelshiheed
  • Tiffany Barnes
  • Min Chi

In this paper, we propose a systematic design process for automatically generating self-organizing neuro-fuzzy Q-networks by leveraging unsupervised learning and an offline, model-free fuzzy reinforcement learning algorithm called Fuzzy Conservative Q-learning (FCQL). FCQL offers more effective and interpretable policies than deep neural networks, facilitating human-in-the-loop design and explainability. The effectiveness of FCQL is empirically demonstrated in Cart Pole and in an Intelligent Tutoring System that teaches probability principles to real humans.

IJCAI Conference 2023 Conference Paper

Hierarchical Apprenticeship Learning for Disease Progression Modeling

  • Xi Yang
  • Ge Gao
  • Min Chi

Disease progression modeling (DPM) plays an essential role in characterizing patients' historical pathways and predicting their future risks. Apprenticeship learning (AL) aims to induce decision-making policies by observing and imitating expert behaviors. In this paper, we investigate the incorporation of AL-derived patterns into DPM, utilizing a Time-aware Hierarchical EM Energy-based Subsequence (THEMES) AL approach. To the best of our knowledge, this is the first study incorporating AL-derived progressive and interventional patterns for DPM. We evaluate the efficacy of this approach in a challenging task of septic shock early prediction, and our results demonstrate that integrating the AL-derived patterns significantly enhances the performance of DPM.

AAMAS Conference 2023 Conference Paper

HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare

  • Ge Gao
  • Song Ju
  • Markel Sanz Ausin
  • Min Chi

Reinforcement learning (RL) has been extensively researched for enhancing human-environment interactions in various human-centric tasks, including e-learning and healthcare. Since deploying and evaluating policies online is high-stakes in such tasks, off-policy evaluation (OPE) is crucial for inducing effective policies. In human-centric environments, however, OPE is challenging because the underlying state is often unobservable, while only aggregate rewards can be observed (e.g., students' test scores, or whether a patient is eventually released from the hospital). In this work, we propose human-centric OPE (HOPE) to handle partial observability and aggregated rewards in such environments. Specifically, we reconstruct immediate rewards from the aggregated rewards, accounting for partial observability, to estimate expected total returns. We provide a theoretical bound for the proposed method, and we have conducted extensive experiments in real-world human-centric tasks, including sepsis treatment and an intelligent tutoring system. Our approach reliably predicts the returns of different policies and outperforms state-of-the-art benchmarks using both standard validation methods and human-centric significance tests.
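HOPE's central move is reconstructing immediate rewards from an aggregate reward. As a deliberately naive stand-in for that learned reconstruction (not the paper's method; it only illustrates the interface), one could spread the aggregate reward evenly over the trajectory:

```python
def uniform_reward_redistribution(trajectory_length, aggregate_reward):
    """Spread a single delayed (aggregate) reward evenly over every step
    of a trajectory. HOPE instead *learns* per-step credit under partial
    observability; this uniform baseline only shows the input/output shape
    such a reconstruction must have."""
    per_step = aggregate_reward / trajectory_length
    return [per_step] * trajectory_length

rewards = uniform_reward_redistribution(4, 1.0)
print(rewards)       # [0.25, 0.25, 0.25, 0.25]
print(sum(rewards))  # 1.0 -- reconstruction preserves the aggregate
```

Any principled reconstruction should, like this toy one, sum back to the observed aggregate reward; the hard part HOPE addresses is assigning non-uniform credit from partial observations.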

NeurIPS Conference 2023 Conference Paper

Off-Policy Evaluation for Human Feedback

  • Qitong Gao
  • Ge Gao
  • Juncheng Dong
  • Vahid Tarokh
  • Min Chi
  • Miroslav Pajic

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating the performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing in situations where online deployment is expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned on multiple underlying factors and is only sparsely available, as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined by parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimates challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods so that they accurately evaluate HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled into a latent space that captures the underlying dynamics of state transitions as well as the issuing of HF signals. Our approach has been tested in two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, and in a simulation environment (visual Q&A). Results show that our approach significantly improves performance in estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

ICLR Conference 2023 Conference Paper

Variational Latent Branching Model for Off-Policy Evaluation

  • Qitong Gao
  • Ge Gao
  • Min Chi
  • Miroslav Pajic

Model-based methods have recently shown great potential for off-policy evaluation (OPE): offline trajectories induced by behavioral policies are fitted to transitions of Markov decision processes (MDPs), which are then used to roll out simulated trajectories and estimate the performance of policies. Model-based OPE methods face two key challenges. First, as offline trajectories are usually fixed, they tend to cover a limited portion of the state and action space. Second, the performance of model-based methods can be sensitive to the initialization of their parameters. In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. Specifically, VLBM leverages and extends the variational inference framework with recurrent state alignment (RSA), which is designed to capture as much of the information underlying the limited training data as possible by smoothing out the information flow between the variational (encoding) and generative (decoding) parts of VLBM. Moreover, we introduce a branching architecture to improve the model's robustness against randomly initialized model weights. The effectiveness of VLBM is evaluated on the deep OPE (DOPE) benchmark, in which the training trajectories are designed to result in varied coverage of the state-action space. We show that VLBM outperforms existing state-of-the-art OPE methods in general.

IJCAI Conference 2022 Conference Paper

A Reinforcement Learning-Informed Pattern Mining Framework for Multivariate Time Series Classification

  • Ge Gao
  • Qitong Gao
  • Xi Yang
  • Miroslav Pajic
  • Min Chi

Multivariate time series (MTS) classification is a challenging and important task in various domains and real-world applications. Much of the prior work on MTS can be roughly divided into neural network (NN)-based and pattern-based methods. The former can achieve robust classification performance, but many of the generated patterns are challenging to interpret, while the latter often produce interpretable patterns that may not be helpful for the classification task. In this work, we propose a reinforcement learning (RL)-informed PAttern Mining framework (RLPAM) to identify interpretable yet important patterns for MTS classification. Our framework has been validated on 30 benchmark datasets as well as real-world, large-scale electronic health records (EHRs) for an extremely challenging task: septic shock early prediction. We show that RLPAM outperforms state-of-the-art NN-based methods on 14 out of 30 datasets as well as on the EHRs. Finally, we show how RL-informed patterns can be interpretable and can improve our understanding of septic shock progression.

AAAI Conference 2022 Conference Paper

Cross-Lingual Adversarial Domain Adaptation for Novice Programming

  • Ye Mao
  • Farzaneh Khoshnevisan
  • Thomas Price
  • Tiffany Barnes
  • Min Chi

Student modeling sits at the epicenter of adaptive learning technology. In contrast to the voluminous work on student modeling for well-defined domains such as algebra, there has been little research on student modeling in programming (SMP), due to data scarcity caused by the unbounded solution spaces of open-ended programming exercises. In this work, we focus on two essential SMP tasks, program classification and early prediction of student success, and propose a Cross-Lingual Adversarial Domain Adaptation (CrossLing) framework that can leverage a large programming dataset to learn features that improve SMP models built with a much smaller dataset in a different programming language. Our framework maintains one globally invariant latent representation across both datasets via an adversarial learning process, while allocating domain-specific models for each dataset to extract local latent representations that cannot and should not be united. By separating globally shared representations from domain-specific representations, our framework outperforms existing state-of-the-art methods on both SMP tasks.

NeurIPS Conference 2021 Conference Paper

Making a (Counterfactual) Difference One Rationale at a Time

  • Mitchell Plyler
  • Michael Green
  • Min Chi

Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules, a selector and a classifier, with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promise, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information-theoretic lens, we derive properties of the unaugmented dataset for which our CDA approach would succeed. The effectiveness of CDA is empirically evaluated by comparing against several baselines, including an improved MMI-based rationale schema, on two multi-aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.

IJCAI Conference 2021 Conference Paper

Multi-series Time-aware Sequence Partitioning for Disease Progression Modeling

  • Xi Yang
  • Yuan Zhang
  • Min Chi

Electronic healthcare records (EHRs) are comprehensive longitudinal collections of patient data that play a critical role in modeling disease progression to facilitate clinical decision-making. Based on EHRs, in this work we focus on sepsis, a broad syndrome that can develop from nearly all types of infections (e.g., influenza, pneumonia). The symptoms of sepsis, such as elevated heart rate, fever, and shortness of breath, are vague and common to other illnesses, making the modeling of its progression extremely challenging. Motivated by the recent success of a novel subsequence clustering approach, Toeplitz Inverse Covariance-based Clustering (TICC), we model sepsis progression as a subsequence partitioning problem and propose a Multi-series Time-aware TICC (MT-TICC), which incorporates the multi-series nature and irregular time intervals of EHRs. The effectiveness of MT-TICC is first validated via a case study using a real-world hand gesture dataset with ground-truth labels. We then apply it to sepsis progression modeling using EHRs. The results suggest that MT-TICC can significantly outperform competitive baseline models, including TICC. More importantly, it unveils interpretable patterns, shedding light on a better understanding of sepsis progression.

IJCAI Conference 2020 Conference Paper

Hierarchical Reinforcement Learning for Pedagogical Policy Induction (Extended Abstract)

  • Guojing Zhou
  • Hamoon Azizsoltani
  • Markel Sanz Ausin
  • Tiffany Barnes
  • Min Chi

In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there has been growing interest in applying data-driven techniques for adaptive decision-making that can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across the two levels of granularity. In this paper, we propose and apply an offline, Gaussian Process-based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both the problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep Q-Network (DQN)-induced policy and a random yet reasonable baseline policy.

IJCAI Conference 2019 Conference Paper

ATTAIN: Attention-based Time-Aware LSTM Networks for Disease Progression Modeling

  • Yuan Zhang
  • Xi Yang
  • Julie Ivy
  • Min Chi

Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assisting clinical decision-making. Long Short-Term Memory (LSTM) is an effective model for handling sequential data such as EHRs, but it encounters two major limitations when applied to EHRs: it is unable to interpret the prediction results, and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose an attention-based time-aware LSTM network (ATTAIN) to improve the interpretability of LSTM and to identify the critical previous events for the current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, using real-world EHRs. Our results demonstrate that the proposed framework outperforms state-of-the-art models such as RETAIN and T-LSTM. In addition, the generated interpretable time-aware attention weights shed some light on the progression behaviors of septic shock.
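The core idea behind time-aware attention is damping the contribution of stale events by the time elapsed since they occurred. A minimal illustrative sketch of that mechanism (our own simplification, not ATTAIN's actual architecture; the decay constant is arbitrary):

```python
import math

def time_decayed_attention(scores, elapsed, decay=0.1):
    """Softmax attention over past events in which each raw score is damped
    by the time elapsed since the event, so stale events contribute less."""
    damped = [s * math.exp(-decay * dt) for s, dt in zip(scores, elapsed)]
    z = sum(math.exp(d) for d in damped)
    return [math.exp(d) / z for d in damped]

# Two events with equal raw scores: the recent one (dt = 0 hours)
# receives more attention than the stale one (dt = 24 hours).
w = time_decayed_attention([1.0, 1.0], [0.0, 24.0])
print(w)
```

The weights still sum to one, so the result remains a valid attention distribution; only the allocation shifts toward recent, clinically relevant events.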

IJCAI Conference 2019 Conference Paper

Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts

  • Hamoon Azizsoltani
  • Yeo Jin Kim
  • Markel Sanz Ausin
  • Tiffany Barnes
  • Min Chi

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in four contexts where rewards can be delayed over long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out of 9 games, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.
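The underlying tool is GP regression, which can smooth noisy per-state targets into reward estimates. A hedged sketch of the generic machinery (zero-mean GP posterior mean with an RBF kernel; this is textbook GP regression, not the paper's exact algorithm, and all names are ours):

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel over 1-D inputs:
    the workhorse for regressing reward targets onto states and reading off
    smoothed per-state estimates."""
    def rbf(A, B):
        d2 = (A[:, None] - B[None, :]) ** 2
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))  # jitter for stability
    K_s = rbf(X_test, X_train)
    return K_s @ np.linalg.solve(K, y_train)

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
m = gp_posterior_mean(X, y, np.array([1.0]))
print(m)  # ~[1.0]: the posterior mean interpolates training points
```

With near-zero noise the posterior mean interpolates the observations; in the delayed-reward setting, the same machinery lets nearby states share credit smoothly instead of leaving intermediate steps with zero reward.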

IJCAI Conference 2018 Conference Paper

Temporal Belief Memory: Imputing Missing Data during RNN Training

  • Yeo Jin Kim
  • Min Chi

We propose a bio-inspired approach named Temporal Belief Memory (TBM) for handling missing data with recurrent neural networks (RNNs). When modeling irregularly observed temporal sequences, conventional RNNs generally ignore the real-time intervals between consecutive observations. TBM is a missing value imputation method that considers the time continuity and captures latent missing patterns based on irregular real time intervals of the inputs. We evaluate our TBM approach with real-world electronic health records (EHRs) consisting of 52, 919 visits and 4, 224, 567 events on a task of early prediction of septic shock. We compare TBM against multiple baselines including both domain experts' rules and the state-of-the-art missing data handling approach using both RNN and long-short term memory. The experimental results show that TBM outperforms all the competitive baseline approaches for the septic shock early prediction task.