Arrow Research search

Author name cluster

Ming Jin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers (26)

NeurIPS Conference 2025 Conference Paper

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs

  • Guiyao Tie
  • Zenghui Yuan
  • Zeli Zhao
  • Chaoran Hu
  • Tianhe Gu
  • Ruihang Zhang
  • Sizhe Zhang
  • Junran Wu

Self-correction of large language models (LLMs) has emerged as a critical component for enhancing their reasoning performance. Although various self-correction methods have been proposed, a comprehensive evaluation of these methods remains largely unexplored, and the question of whether LLMs can truly correct themselves is a matter of significant interest and concern. In this study, we introduce CorrectBench, a benchmark developed to evaluate the effectiveness of self-correction strategies, including intrinsic, external, and fine-tuned approaches, across three tasks: commonsense reasoning, mathematical reasoning, and code generation. Our findings reveal that: 1) self-correction methods can improve accuracy, especially for complex reasoning tasks; 2) mixing different self-correction strategies yields further improvements, though it reduces efficiency; 3) reasoning LLMs (e.g., DeepSeek-V3) show limited improvement under additional self-correction methods and incur high time costs. Interestingly, a comparatively simple chain-of-thought (CoT) baseline demonstrates competitive accuracy and efficiency. These results underscore the potential of self-correction to enhance LLMs' reasoning performance while highlighting the ongoing challenge of improving their efficiency. Consequently, we advocate for further research focused on optimizing the balance between reasoning capabilities and operational efficiency.
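
The intrinsic self-correction strategies evaluated here generally follow a draft-critique-revise loop. Below is a minimal, illustrative sketch of such a loop; the `generate` and `critique` callables are hypothetical stand-ins for LLM calls, not CorrectBench's actual interface.

```python
def self_correct(question, generate, critique, max_rounds=3):
    """Toy intrinsic self-correction loop: draft an answer, ask the model to
    critique it, and revise until the critique reports no issue or the round
    budget is exhausted. `generate` and `critique` stand in for LLM calls."""
    answer = generate(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback is None:          # critic found no problem
            return answer
        answer = generate(f"{question}\nPrevious answer: {answer}\nFix: {feedback}")
    return answer

# Stubbed "LLM" calls for illustration only.
gen = lambda prompt: "4" if "Fix" in prompt else "5"
crit = lambda q, a: None if a == "4" else "arithmetic error"
print(self_correct("What is 2 + 2?", gen, crit))  # "4"
```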

NeurIPS Conference 2025 Conference Paper

Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL

  • Junyu Guo
  • Zhi Zheng
  • Donghao Ying
  • Ming Jin
  • Shangding Gu
  • Costas J Spanos
  • Javad Lavaei

Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent learns from a fixed dataset, a common requirement in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it consistently meets cost limits and performs well with the same hyperparameters, indicating practical applicability in real-world scenarios. We open-source our implementation at https://github.com/JamesJunyuGuo/DRCORL.
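
Gradient manipulation for safety adaptation typically edits the reward gradient when it conflicts with the cost (safety) gradient. The sketch below shows one common projection rule as a toy illustration; DRCORL's exact rule and its diffusion-regularization term are not reproduced here.

```python
import numpy as np

def manipulate_gradient(g_reward, g_cost, cost_violated):
    """Toy gradient-manipulation step: follow the reward gradient, but if the
    safety constraint is violated and the two gradients conflict, remove the
    component of the reward gradient that points against the cost gradient."""
    if cost_violated and np.dot(g_reward, g_cost) < 0:
        g_cost_unit = g_cost / (np.linalg.norm(g_cost) + 1e-8)
        g_reward = g_reward - np.dot(g_reward, g_cost_unit) * g_cost_unit
    return g_reward

# Example: conflicting gradients while the cost limit is exceeded.
g = manipulate_gradient(np.array([1.0, 0.0]), np.array([-1.0, 1.0]), cost_violated=True)
print(g)  # reward direction with the conflicting component projected out
```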

NeurIPS Conference 2025 Conference Paper

DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs

  • Dongyuan Li
  • Shiyin Tan
  • Ying Zhang
  • Ming Jin
  • Shirui Pan
  • Manabu Okumura
  • Renhe Jiang

Dynamic graph modeling aims to uncover evolutionary patterns in real-world systems, enabling accurate social recommendation and early detection of cancer cells. Inspired by the success of recent state space models in efficiently capturing long-term dependencies, we propose DyG-Mamba by translating dynamic graph modeling into a long-term sequence modeling problem. Specifically, inspired by Ebbinghaus' forgetting curve, we treat the irregular timespans between events as control signals, allowing DyG-Mamba to dynamically adjust the forgetting of historical information. This mechanism ensures effective usage of irregular timespans, thereby improving both model effectiveness and inductive capability. In addition, inspired by Ebbinghaus' review cycle, we redefine core parameters to ensure that DyG-Mamba selectively reviews historical information and filters out noisy inputs, further enhancing the model's robustness. Through exhaustive experiments on 12 datasets covering dynamic link prediction and node classification tasks, we show that DyG-Mamba achieves state-of-the-art performance on most datasets, while demonstrating significantly improved computational and memory efficiency. Our code is available at https://github.com/Clearloveyuan/DyG-Mamba.
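
The core idea of using irregular timespans as control signals can be pictured as a timespan-dependent forgetting gate on a hidden state. The sketch below is a toy scalar-gate version under assumed names (`decay_update`, `lam`), not DyG-Mamba's actual selective state space update.

```python
import numpy as np

def decay_update(h, x, dt, w_in, lam):
    """Toy state update where the irregular timespan dt controls forgetting:
    longer gaps decay the historical state more (Ebbinghaus-style), then the
    new event embedding x is written in. Illustrative only."""
    forget = np.exp(-lam * dt)          # timespan-dependent forgetting gate
    return forget * h + w_in @ x

rng = np.random.default_rng(0)
h, w_in = np.zeros(4), rng.normal(size=(4, 3))
for dt, x in [(0.1, rng.normal(size=3)), (5.0, rng.normal(size=3))]:
    h = decay_update(h, x, dt, w_in, lam=0.5)
print(h)
```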

NeurIPS Conference 2025 Conference Paper

Multi-Scale Finetuning for Encoder-based Time Series Foundation Models

  • Zhongzheng Qiao
  • Chenghao Liu
  • Yiming Zhang
  • Ming Jin
  • Quang Pham
  • Qingsong Wen
  • Ponnuthurai Suganthan
  • Xudong Jiang

Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. Given the diverse temporal patterns across sampling scales and the inherent multi-scale forecasting capabilities of TSFMs, we adopt a causal perspective to analyze the finetuning process, through which we highlight the critical importance of explicitly modeling multiple scales and reveal the shortcomings of naive approaches. Focusing on encoder-based TSFMs, we propose multi-scale finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. Experimental results on three different backbones (Moirai, Moment and Units) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter-efficient finetuning methods but also surpass state-of-the-art deep learning methods. Code is available at https://github.com/zqiao11/MSFT.
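
Explicit multi-scale modeling starts from constructing coarser views of the input series. A minimal sketch of building such views by average pooling is shown below; how MSFT combines the scales inside the finetuning objective is not reproduced here.

```python
import numpy as np

def multi_scale_views(series, scales=(1, 2, 4)):
    """Toy multi-scale construction: average-pool a series at several sampling
    scales so a forecaster can be finetuned on all of them jointly."""
    views = []
    for s in scales:
        n = (len(series) // s) * s
        views.append(series[:n].reshape(-1, s).mean(axis=1))
    return views

series = np.sin(np.linspace(0, 8 * np.pi, 128))
for v in multi_scale_views(series):
    print(len(v))  # 128, 64, 32
```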

NeurIPS Conference 2025 Conference Paper

Probing Hidden Knowledge Holes in Unlearned LLMs

  • Myeongseob Ko
  • Hoang Anh Just
  • Charles Fleming
  • Ming Jin
  • Ruoxi Jia

Machine unlearning has emerged as a prevalent technical solution for selectively removing unwanted knowledge absorbed during pre-training, without requiring full retraining. While recent unlearning techniques can effectively remove undesirable content without severely compromising performance on standard benchmarks, we find that they may inadvertently create "knowledge holes": unintended losses of benign knowledge that standard benchmarks fail to capture. To probe where unlearned models reveal knowledge holes, we propose a test case generation framework that explores both immediate neighbors of unlearned content and broader areas of potential failures. Our evaluation demonstrates significant hidden costs of unlearning: up to 98.7% of the test cases yield irrelevant or nonsensical responses from unlearned models, despite being answerable by the pretrained model. These findings necessitate rethinking the conventional approach to evaluating knowledge preservation in unlearning, moving beyond standard, static benchmarks.

NeurIPS Conference 2025 Conference Paper

Reinforcement Learning with Backtracking Feedback

  • Bilgehan Sel
  • Vaishakh Keshava
  • Phillip Wallis
  • Lukas Rutishauser
  • Ming Jin
  • Dingcheng Li

Addressing the critical need for robust safety in Large Language Models (LLMs), particularly against adversarial attacks and in-distribution errors, we introduce Reinforcement Learning with Backtracking Feedback (RLBF). This framework advances upon prior methods, such as BSAFE, by primarily leveraging a Reinforcement Learning (RL) stage where models learn to dynamically correct their own generation errors. Through RL with critic feedback on the model's live outputs, LLMs are trained to identify and recover from their actual, emergent safety violations by emitting an efficient "backtrack by x tokens" signal, then continuing generation autoregressively. This RL process is crucial for instilling resilience against sophisticated adversarial strategies, including middle filling, Greedy Coordinate Gradient (GCG) attacks, and decoding parameter manipulations. To further support the acquisition of this backtracking capability, we also propose an enhanced Supervised Fine-Tuning (SFT) data generation strategy (BSAFE+). This method improves upon previous data creation techniques by injecting violations into coherent, originally safe text, providing more effective initial training for the backtracking mechanism. Comprehensive empirical evaluations demonstrate that RLBF significantly reduces attack success rates across diverse benchmarks and model scales, achieving superior safety outcomes while critically preserving foundational model utility.
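
The "backtrack by x tokens" signal can be thought of as a control action interpreted by the decoding loop. A toy illustration of applying such an action to a partial generation is below; the token strings and the `apply_backtrack` helper are hypothetical and not RLBF's actual implementation.

```python
def apply_backtrack(tokens, action):
    """Toy decoding helper: if the model emits a 'backtrack by x tokens'
    control action, drop the last x tokens and continue generating;
    otherwise append the generated token."""
    if isinstance(action, tuple) and action[0] == "backtrack":
        x = action[1]
        return tokens[:-x] if x <= len(tokens) else []
    return tokens + [action]

tokens = ["How", "to", "build", "a", "harmful"]
tokens = apply_backtrack(tokens, ("backtrack", 2))   # recover from the violation
tokens = apply_backtrack(tokens, "safe")
print(tokens)  # ['How', 'to', 'build', 'safe']
```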

NeurIPS Conference 2025 Conference Paper

ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models

  • Bosong Huang
  • Ming Jin
  • Yuxuan Liang
  • Johan Barthelemy
  • Debo Cheng
  • Qingsong Wen
  • Chenghao Liu
  • Shirui Pan

Explaining time series classification models is crucial, particularly in high-stakes applications such as healthcare and finance, where transparency and trust play a critical role. Although numerous time series classification methods have identified key subsequences, known as shapelets, as core features for achieving state-of-the-art performance and validating their pivotal role in classification outcomes, existing post-hoc time series explanation (PHTSE) methods primarily focus on timestep-level feature attribution. These explanation methods overlook the fundamental prior that classification outcomes are predominantly driven by key shapelets. To bridge this gap, we present ShapeX, an innovative framework that segments time series into meaningful shapelet-driven segments and employs Shapley values to assess their saliency. At the core of ShapeX lies the Shapelet Describe-and-Detect (SDD) framework, which effectively learns a diverse set of shapelets essential for classification. We further demonstrate that ShapeX produces explanations which reveal causal relationships instead of just correlations, owing to the atomicity properties of shapelets. Experimental results on both synthetic and real-world datasets demonstrate that ShapeX outperforms existing methods in identifying the most relevant subsequences, enhancing both the precision and causal fidelity of time series explanations.
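
Attributing saliency to shapelet-driven segments with Shapley values amounts to scoring the classifier under every subset of kept segments. The sketch below computes exact Shapley values for a handful of segments with a stubbed scorer; it is illustrative only, not ShapeX's estimator.

```python
from itertools import combinations
from math import factorial

def shapley_over_segments(n_segments, score):
    """Exact Shapley values over a small number of segments: score(S) is the
    classifier's confidence when only the segment indices in S are kept and
    the rest of the series is masked. Exponential cost, so toy sizes only."""
    values = [0.0] * n_segments
    for i in range(n_segments):
        others = [j for j in range(n_segments) if j != i]
        for k in range(n_segments):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n_segments - k - 1) / factorial(n_segments)
                values[i] += w * (score(set(S) | {i}) - score(set(S)))
    return values

# Stub scorer: segment 1 carries the class-discriminative shapelet.
print(shapley_over_segments(3, lambda S: 0.9 if 1 in S else 0.1))  # ~[0.0, 0.8, 0.0]
```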

IJCAI Conference 2025 Conference Paper

T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models

  • Yunfeng Ge
  • Jiawei Li
  • Yiji Zhao
  • Haomin Wen
  • Zhao Li
  • Meikang Qiu
  • Hongyan Li
  • Ming Jin

Text-to-Time Series generation holds significant potential to address challenges such as data sparsity, imbalance, and limited availability of multimodal time series data across domains. While diffusion models have achieved remarkable success in Text-to-X (e.g., vision and audio data) generation, their use in time series generation remains limited. Existing approaches face two critical limitations: (1) reliance on domain-specific captions that generalize poorly, and (2) inability to generate time series of arbitrary length, limiting real-world use. In this work, we first introduce a new multimodal dataset containing over 600,000 high-resolution text-time series pairs. Second, we propose Text-to-Series (T2S), a diffusion-based framework that bridges the gap between natural language and time series in a domain-agnostic manner. It employs a length-adaptive VAE to encode time series of varying lengths into consistent latent embeddings. On top of that, T2S effectively aligns textual representations with latent embeddings by utilizing Flow Matching and employing DiT as the denoiser. We train T2S in an interleaved paradigm across multiple lengths, allowing it to generate sequences of arbitrary lengths. Extensive evaluations demonstrate that T2S achieves state-of-the-art performance across 13 datasets spanning 12 domains.

ICLR Conference 2025 Conference Paper

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

  • Xiaoming Shi
  • Shiyu Wang
  • Yuqi Nie
  • Dianqi Li
  • Zhou Ye 0001
  • Qingsong Wen
  • Ming Jin

Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale dataset Time-300B, which spans over 9 domains and encompasses over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by a large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility. Code is available at https://github.com/Time-MoE/Time-MoE
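
A sparse mixture-of-experts layer activates only the top-k experts per token. The sketch below shows this routing pattern in NumPy as a toy illustration; Time-MoE's actual gating, load balancing, and transformer integration are not reproduced.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy sparse mixture-of-experts layer: route an input to its top-k experts
    by gate score and combine their outputs with softmax weights, so only k of
    the experts are evaluated per input."""
    scores = x @ gate_w                              # one score per expert
    top = np.argsort(scores)[-k:]
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(dim, dim)): v @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(dim, n_experts))
print(moe_forward(rng.normal(size=dim), gate_w, experts).shape)  # (8,)
```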

IS Journal 2025 Journal Article

Transforming Urban Dynamics: Harnessing Large Language Models for Smarter Mobility

  • Hao Xue
  • Ming Jin
  • Shirui Pan
  • Flora Salim

Artificial intelligence (AI) has the potential to analyze mobility data and make mobility systems smarter by leveraging diverse data sources such as geospatial data, transportation logs, and real-time sensor data to optimize traffic flow, enhance public transportation systems, and support the development of autonomous vehicles. With the newly emerged generative AI paradigm, exemplified by large language models (LLMs), there is great potential to transform the current AI applications in mobility, transportation, and urban domains. This article provides an overview of recent efforts and aims to shed light on the challenges and future opportunities to facilitate the adaptation of LLMs for smarter mobility systems.

NeurIPS Conference 2024 Conference Paper

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

  • Jiaxi Hu
  • Yuehong Hu
  • Wei Chen
  • Ming Jin
  • Shirui Pan
  • Qingsong Wen
  • Yuxuan Liang

In long-term time series forecasting (LTSF) tasks, an increasing number of works have acknowledged that discrete time series originate from continuous dynamic systems and have attempted to model their underlying dynamics. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as low-dimensional observations from unknown high-dimensional chaotic dynamical systems. Under the concept of attractor invariance, Attraos utilizes non-parametric Phase Space Reconstruction embedding along with a novel multi-resolution dynamic memory unit to memorize historical dynamical structures, and evolves by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
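
Non-parametric phase space reconstruction is usually realized as a delay embedding of the observed series. A minimal sketch (with assumed embedding dimension `dim` and delay `tau`) is below; Attraos's multi-resolution memory unit and evolution strategy are not shown.

```python
import numpy as np

def delay_embed(x, dim=3, tau=2):
    """Toy phase space reconstruction via delay embedding: lift a scalar series
    into dim-dimensional vectors [x_t, x_{t+tau}, ..., x_{t+(dim-1)tau}]."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)

x = np.sin(np.linspace(0, 4 * np.pi, 50))
print(delay_embed(x).shape)  # (46, 3)
```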

AAAI Conference 2024 Conference Paper

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

  • Shangding Gu
  • Bilgehan Sel
  • Yuhao Ding
  • Lu Wang
  • Qingwei Lin
  • Ming Jin
  • Alois Knoll

Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.

NeurIPS Conference 2024 Conference Paper

Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

  • Myeongseob Ko
  • Henry Li
  • Zhun Wang
  • Jonathan Patsenker
  • Jiachen T. Wang
  • Qinbin Li
  • Ming Jin
  • Dawn Song

Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. However, this often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. Driven by these concerns, machine unlearning has become crucial to effectively purge undesirable knowledge from models. While existing literature has studied various unlearning techniques, these often suffer from either poor unlearning quality or degradation in text-image alignment after unlearning, due to the competitive nature of these objectives. To address these challenges, we propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives. We further derive the characterization of such an update. In addition, we design procedures to strategically diversify the unlearning and remaining datasets to boost performance improvement. Our evaluation demonstrates that our method effectively removes target classes from recent diffusion-based generative models and concepts from stable diffusion models while maintaining close alignment with the models' original trained states, thus outperforming state-of-the-art baselines.

TMLR Journal 2024 Journal Article

Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion

  • Si Chen
  • Feiyang Kang
  • Nikhil Abhyankar
  • Ming Jin
  • Ruoxi Jia

Machine Learning models have shown susceptibility to various privacy attacks, with model inversion (MI) attacks posing a significant threat. Current defense techniques are mostly model-centric, involving modifying model training or inference. However, these approaches require model trainers' cooperation, are computationally expensive, and often result in a significant privacy-utility tradeoff. To address these limitations, we propose a novel data-centric approach to mitigate MI attacks. Compared to traditional model-centric techniques, our approach offers the unique advantage of enabling each individual user to control their data's privacy risk, aligning with findings from a Cisco survey that only a minority actively seek privacy protection. Specifically, we introduce several privacy-focused data augmentations that modify the private data uploaded to the model trainer. These augmentations shape the resulting model's loss landscape, making it challenging for attackers to generate private target samples. Additionally, we provide theoretical analysis to explain why such augmentations can reduce the risk of model inversion. We evaluate our approach against state-of-the-art MI attacks and demonstrate its effectiveness and robustness across various model architectures and datasets. Specifically, in standard face recognition benchmarks, we reduce face reconstruction success rates to 5% or below, while maintaining high utility with only a 2% classification accuracy drop, significantly surpassing state-of-the-art model-centric defenses. This is the first study to propose a data-centric approach for mitigating model inversion attacks, showing promising potential for decentralized privacy protection.

NeurIPS Conference 2024 Conference Paper

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

  • Shangding Gu
  • Laixi Shi
  • Yuhao Ding
  • Alois Knoll
  • Costas Spanos
  • Adam Wierman
  • Ming Jin

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25-29% fewer samples than baselines, and reduces training time by 21-38%.
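
Sample manipulation here hinges on switching among three optimization modes depending on constraint slack and gradient conflict. The sketch below is a toy mode selector under assumed thresholds; ESPO's actual sampling adjustment is more involved.

```python
import numpy as np

def choose_mode(g_reward, g_cost, cost, limit, eps=0.1):
    """Toy mode selector: minimize cost when the constraint is violated, balance
    the two objectives when their gradients conflict near the cost boundary,
    and otherwise purely maximize reward."""
    if cost > limit:
        return "minimize_cost"
    if cost > limit - eps and np.dot(g_reward, g_cost) < 0:
        return "balance"
    return "maximize_reward"

print(choose_mode(np.array([1.0, 0.0]), np.array([-1.0, 0.0]), cost=0.95, limit=1.0))
# -> "balance": near the cost limit with conflicting gradients
```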

NeurIPS Conference 2024 Conference Paper

Fairness-Aware Meta-Learning via Nash Bargaining

  • Yi Zeng
  • Xuelin Yang
  • Li Chen
  • Cristian C. Ferrer
  • Ming Jin
  • Michael I. Jordan
  • Ruoxi Jia

To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.

TMLR Journal 2024 Journal Article

TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via Reinforcement Learning

  • Shangding Gu
  • Alois Knoll
  • Ming Jin

The development of Large Language Models (LLMs) often confronts challenges stemming from the heavy reliance on human annotators in the reinforcement learning with human feedback (RLHF) framework, or the frequent and costly external queries tied to the self-instruct paradigm. In this work, we pivot to Reinforcement Learning (RL), but with a twist. Diverging from the typical RLHF, which refines LLMs following instruction data training, we use RL to directly generate the foundational instruction dataset that alone suffices for fine-tuning. Our method, TeaMs-RL, uses a suite of textual operations and rules, prioritizing the diversification of training datasets. It facilitates the generation of high-quality data without excessive reliance on external advanced models, paving the way for a single fine-tuning step and negating the need for subsequent RLHF stages. Our findings highlight key advantages of our approach: reduced need for human involvement and fewer model queries (only 5.73% of the strong baseline's total), along with enhanced capabilities of LLMs in crafting and comprehending complex instructions compared to strong baselines, and substantially improved model privacy protection. Code is available at the link: https://github.com/SafeRL-Lab/TeaMs-RL

AAAI Conference 2023 Conference Paper

Non-stationary Risk-Sensitive Reinforcement Learning: Near-Optimal Dynamic Regret, Adaptive Detection, and Separation Design

  • Yuhao Ding
  • Ming Jin
  • Javad Lavaei

We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time with a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect the non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that the risk control and the handling of the non-stationarity can be separately designed in the algorithm if the variation budget is known a priori, while the non-stationary detection mechanism in the adaptive algorithm depends on the risk parameter. This work offers the first non-asymptotic theoretical analyses for non-stationary risk-sensitive RL in the literature.

AAAI Conference 2023 Conference Paper

On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds

  • Ming Jin
  • Vanshaj Khattar
  • Harshal Kaushik
  • Bilgehan Sel
  • Ruoxi Jia

We study the expressibility and learnability of solution functions of convex optimization and their multi-layer architectural extension. The main results are: (1) the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the smooth model class or some restricted Sobolev space, and we characterize the rate-distortion, (2) the approximation power is investigated through a viewpoint of regression error, where information about the target function is provided in terms of data observations, (3) compositionality in the form of deep architecture with optimization as a layer is shown to reconstruct some basic functions used in numerical analysis without error, which implies that (4) a substantial reduction in rate-distortion can be achieved with a universal network architecture, and (5) we discuss the statistical bounds of empirical covering numbers for LP/QP, as well as a generic optimization problem (possibly nonconvex) by exploiting tame geometry. Our results provide the first rigorous analysis of the approximation and learning-theoretic properties of solution functions, with implications for algorithmic design and performance guarantees.
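
A concrete instance of a solution function reconstructing a basic function: the one-dimensional QP min_y (y - x)^2 subject to y >= 0 has the closed-form solution max(x, 0), i.e., the ReLU. The toy layer below illustrates this fact; it is not the paper's general LP/QP layer construction.

```python
import numpy as np

def qp_layer(x):
    """Solution function of the 1-D QP  min_y (y - x)^2  s.t. y >= 0.
    Its closed form is max(x, 0), so this optimization 'layer' reproduces the
    ReLU exactly: a toy instance of basic functions as QP solution functions."""
    return np.maximum(x, 0.0)

print(qp_layer(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```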

NeurIPS Conference 2023 Conference Paper

Tempo Adaptation in Non-stationary Reinforcement Learning

  • Hyunin Lee
  • Yuhao Ding
  • JongMin Lee
  • Ming Jin
  • Javad Lavaei
  • Somayeh Sojoudi

We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time signifies the actual elapsed time within the fixed duration $t \in [0, T]$. In existing works, at episode $k$, the agent rolls out a trajectory and trains a policy before transitioning to episode $k+1$. In the context of the time-desynchronized environment, however, the agent at time $t_k$ allocates $\Delta t$ for trajectory generation and training, and subsequently moves to the next episode at $t_{k+1} = t_k + \Delta t$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories influenced by the choice of interaction times ($t_1, t_2, \ldots, t_K$), significantly impacting the suboptimality gap of the policy. We propose a Proactively Synchronizing Tempo (ProST) framework that computes a suboptimal sequence $\{t_1, t_2, \ldots, t_K\}$ (denoted $\{t_{1:K}\}$) by minimizing an upper bound on its performance measure, i.e., the dynamic regret. Our main contribution is to show that a suboptimal $\{t_{1:K}\}$ trades off the policy training time (agent tempo) against how fast the environment changes (environment tempo). Theoretically, this work develops a suboptimal $\{t_{1:K}\}$ as a function of the degree of the environment's non-stationarity while also achieving a sublinear dynamic regret. Our experimental evaluation on various high-dimensional non-stationary environments shows that the ProST framework achieves a higher online return at the suboptimal $\{t_{1:K}\}$ than existing methods.
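
The tempo trade-off can be caricatured by minimizing a surrogate bound in which training error shrinks with a longer per-episode budget while environment drift grows with it. The sketch below uses an assumed bound of the form c_train/dt + c_drift*dt, which is only an illustration and not the paper's regret bound.

```python
import numpy as np

def best_tempo(c_train, c_drift, grid):
    """Toy tempo selection: pick the per-episode time budget dt minimizing an
    assumed surrogate bound c_train/dt + c_drift*dt, trading policy-training
    time against how far the environment drifts during that time."""
    costs = c_train / grid + c_drift * grid
    return grid[np.argmin(costs)]

grid = np.linspace(0.1, 5.0, 50)
print(best_tempo(c_train=1.0, c_drift=2.0, grid=grid))  # near sqrt(1/2) ~ 0.71
```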

AAAI Conference 2023 Conference Paper

Winning the CityLearn Challenge: Adaptive Optimization with Evolutionary Search under Trajectory-Based Guidance

  • Vanshaj Khattar
  • Ming Jin

Modern power systems will have to face difficult challenges in the years to come: frequent blackouts in urban areas caused by high peaks of electricity demand, grid instability exacerbated by the intermittency of renewable generation, and climate change on a global scale amplified by increasing carbon emissions. While current practices are increasingly inadequate, the pathway of artificial intelligence (AI)-based methods to widespread adoption is hindered by missing aspects of trustworthiness. The CityLearn Challenge is an exemplary opportunity for researchers from multi-disciplinary fields to investigate the potential of AI to tackle these pressing issues within the energy domain, collectively modeled as a reinforcement learning (RL) task. Multiple real-world challenges faced by contemporary RL techniques are embodied in the problem formulation. In this paper, we present a novel method using the solution function of optimization as policies to compute the actions for sequential decision-making, while notably adapting the parameters of the optimization model from online observations. Algorithmically, this is achieved by an evolutionary algorithm under a novel trajectory-based guidance scheme. Formally, the global convergence property is established. Our agent ranked first in the latest 2021 CityLearn Challenge, being able to achieve superior performance in almost all metrics while maintaining some key aspects of interpretability.
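
The adaptive part, searching over the parameters of an optimization-based policy, can be pictured as a simple (1+1) evolution strategy on rollout return. The sketch below uses a stubbed fitness function and omits the trajectory-based guidance scheme described above.

```python
import numpy as np

def evolve(fitness, theta, sigma=0.1, steps=200, seed=0):
    """Toy (1+1) evolutionary search: perturb the parameters of an
    optimization-based policy and keep the perturbation if the rollout return
    improves. A stand-in for the guided search described in the abstract."""
    rng = np.random.default_rng(seed)
    best = fitness(theta)
    for _ in range(steps):
        cand = theta + sigma * rng.normal(size=theta.shape)
        f = fitness(cand)
        if f > best:
            theta, best = cand, f
    return theta, best

# Stubbed rollout return, peaking at theta = [0.3, -0.7].
fitness = lambda th: -np.sum((th - np.array([0.3, -0.7])) ** 2)
print(evolve(fitness, np.zeros(2)))
```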

JBHI Journal 2022 Journal Article

Dynamic Domain Adaptation for Class-Aware Cross-Subject and Cross-Session EEG Emotion Recognition

  • Zhunan Li
  • Enwei Zhu
  • Ming Jin
  • Cunhang Fan
  • Huiguang He
  • Ting Cai
  • Jinpeng Li

It is vital to develop general models that can be shared across subjects and sessions in the real-world deployment of electroencephalogram (EEG) emotion recognition systems. Many prior studies have exploited domain adaptation algorithms to alleviate the inter-subject and inter-session discrepancies of EEG distributions. However, these methods only aligned the global domain divergence, but overlooked the local domain divergence with respect to each emotion category. This degenerates the emotion-discriminating ability of the domain invariant features. In this paper, we argue that aligning the EEG data within the same emotion categories is important for generalizable and discriminative features. Hence, we propose the dynamic domain adaptation (DDA) algorithm where the global and local divergences are disposed by minimizing the global domain discrepancy and local subdomain discrepancy, respectively. To tackle the absence of emotion labels in the target domain, we introduce a dynamic training strategy where the model focuses on optimizing the global domain discrepancy in the early training steps, and then gradually switches to the local subdomain discrepancy. The DDA algorithm is formally implemented as an unsupervised version and a semi-supervised version for different experimental settings. Based on the coarse-to-fine alignment, our model achieves average peak accuracies of 91.08% and 92.89% on SEED, and 81.58% and 80.82% on SEED-IV, in the cross-subject and cross-session scenarios, respectively.
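
The dynamic training strategy can be read as a schedule that shifts weight from a global domain discrepancy to per-class (local) subdomain discrepancies. The sketch below uses a linear-kernel MMD and a linear schedule as stand-ins; DDA's actual discrepancy measures and schedule may differ.

```python
import numpy as np

def mmd(x, y):
    """Linear-kernel MMD between two feature batches (toy discrepancy measure)."""
    return float(np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2))

def dda_loss(src, tgt, src_by_class, tgt_by_class, step, total_steps):
    """Toy dynamic domain adaptation objective: early in training the global
    source/target discrepancy dominates; the weight then shifts linearly toward
    the per-class (local) subdomain discrepancies."""
    w = step / total_steps
    global_term = mmd(src, tgt)
    local_term = np.mean([mmd(s, t) for s, t in zip(src_by_class, tgt_by_class)])
    return (1 - w) * global_term + w * local_term

rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(32, 8)), rng.normal(0.5, 1.0, size=(32, 8))
split = lambda x: [x[:16], x[16:]]        # stand-in for per-emotion-class splits
print(dda_loss(src, tgt, split(src), split(tgt), step=10, total_steps=100))
```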

NeurIPS Conference 2022 Conference Paper

Neural Temporal Walks: Motif-Aware Representation Learning on Continuous-Time Dynamic Graphs

  • Ming Jin
  • Yuan-Fang Li
  • Shirui Pan

Continuous-time dynamic graphs naturally abstract many real-world systems, such as social and transactional networks. While the research on continuous-time dynamic graph representation learning has made significant advances recently, neither graph topological properties nor temporal dependencies have been well-considered and explicitly modeled in capturing dynamic patterns. In this paper, we introduce a new approach, Neural Temporal Walks (NeurTWs), for representation learning on continuous-time dynamic graphs. By considering not only time constraints but also structural and tree traversal properties, our method conducts spatiotemporal-biased random walks to retrieve a set of representative motifs, enabling temporal nodes to be characterized effectively. With a component based on neural ordinary differential equations, the extracted motifs allow for irregularly-sampled temporal nodes to be embedded explicitly over multiple different interaction time intervals, enabling the effective capture of the underlying spatiotemporal dynamics. To enrich supervision signals, we further design a harder contrastive pretext task for model optimization. Our method demonstrates overwhelming superiority under both transductive and inductive settings on six real-world datasets.
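
A spatiotemporal-biased walk must at minimum respect temporal order: each step follows an edge whose timestamp is later than the arrival time at the current node. The sketch below shows that basic temporal constraint with uniform sampling; NeurTWs' structural and tree-traversal biases are omitted.

```python
import random

def temporal_walk(events, start, t_start, length, seed=0):
    """Toy temporal random walk: from the current node, step only along edges
    whose timestamps are later than the arrival time, so each walk respects
    temporal order. events: list of (u, v, t)."""
    random.seed(seed)
    walk, node, t = [start], start, t_start
    for _ in range(length):
        candidates = [(v, ts) for u, v, ts in events if u == node and ts > t]
        if not candidates:
            break
        node, t = random.choice(candidates)
        walk.append(node)
    return walk

events = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (2, 3, 3.0)]
print(temporal_walk(events, start=0, t_start=0.0, length=3))  # [0, 1, 2, 3]
```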

AAAI Conference 2022 Conference Paper

Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems

  • Fangda Gu
  • He Yin
  • Laurent El Ghaoui
  • Murat Arcak
  • Peter Seiler
  • Ming Jin

Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, while stabilization of partially observed systems, in many cases, requires controllers to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNN) as dynamic controllers for nonlinear uncertain partially-observed systems, and derive convex stability conditions based on integral quadratic constraints, S-lemma and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space taking advantage of mild additional information on system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.

IJCAI Conference 2021 Conference Paper

Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning

  • Ming Jin
  • Yizhen Zheng
  • Yuan-Fang Li
  • Chen Gong
  • Chuan Zhou
  • Shirui Pan

Graph representation learning plays a vital role in processing graph-structured data. However, prior work on graph representation learning heavily relies on labeling information. To overcome this problem, inspired by the recent success of graph contrastive learning and Siamese networks in visual representation learning, we propose a novel self-supervised approach in this paper to learn node representations by enhancing Siamese self-distillation with multi-scale contrastive learning. Specifically, we first generate two augmented views from the input graph based on local and global perspectives. Then, we employ two objectives called cross-view and cross-network contrastiveness to maximize the agreement between node representations across different views and networks. To demonstrate the effectiveness of our approach, we perform empirical experiments on five real-world datasets. Our method not only achieves new state-of-the-art results but also surpasses some semi-supervised counterparts by large margins. Code is made available at https://github.com/GRAND-Lab/MERIT

AAAI Conference 2021 Conference Paper

Power up! Robust Graph Convolutional Network via Graph Powering

  • Ming Jin
  • Heng Chang
  • Wenwu Zhu
  • Somayeh Sojoudi

Graph convolutional networks (GCNs) are powerful tools for graph-structured data. However, they have been recently shown to be vulnerable to topological attacks. To enhance adversarial robustness, we go beyond spectral graph theory to robust graph theory. By challenging the classical graph Laplacian, we propose a new convolution operator that is provably robust in the spectral domain and is incorporated in the GCN architecture to improve expressivity and interpretability. By extending the original graph to a sequence of graphs, we also propose a robust training paradigm that encourages transferability across graphs that span a range of spatial and spectral characteristics. The proposed approaches are demonstrated in extensive experiments to simultaneously improve performance in both benign and adversarial situations.