Arrow Research

Author name cluster

Haifeng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers
2 author rows

Possible papers

43

AAAI Conference 2026 Conference Paper

Brownian Bridge Augmented Surrogate Simulation and Injection Planning for Geological CO2 Storage

  • Haoyue Bai
  • Guodong Chen
  • Wangyang Ying
  • Xinyuan Wang
  • Nanxu Gong
  • Sixun Dong
  • Giulia Pedrielli
  • Haoyu Wang

Geological CO2 storage (GCS) involves injecting captured CO2 into deep subsurface formations to support climate goals. The effective management of GCS relies on adaptive injection planning to dynamically control injection rates and well pressures to balance both storage safety and efficiency. Prior literature, including numerical optimization methods and surrogate-optimization methods, falls short of real-world GCS requirements for smooth state transitions and goal-directed planning within limited time. To address these limitations, we propose a Brownian Bridge–augmented framework for surrogate simulation and injection planning in GCS and develop two insights: (i) the Brownian bridge as a smooth state regularizer for a better surrogate simulator; (ii) the Brownian bridge as goal-time-conditioned planning guidance for better injection planning. Our method has three stages: (i) learning deep Brownian bridge representations with contrastive and reconstructive losses from historical reservoir and utility trajectories, (ii) incorporating Brownian bridge-based next-state interpolation for simulator regularization, and (iii) guiding injection planning with Brownian utility-conditioned trajectories to generate high-quality injection plans. Experimental results across multiple datasets collected from diverse GCS settings demonstrate that our framework consistently improves simulation fidelity and planning effectiveness while maintaining low computational overhead.
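
As a rough illustration of the bridge construction this abstract leans on, the sketch below samples a Brownian bridge between two state vectors: a standard Brownian motion is pinned to zero at the final step, and a linear interpolation between the endpoints is added back. This is the generic textbook construction under assumed names (x0, xT, sigma), not the paper's actual regularizer or planner.

```python
import numpy as np

def brownian_bridge(x0, xT, n_steps, sigma=1.0, rng=None):
    """Sample a path pinned to x0 at step 0 and to xT at step n_steps."""
    rng = rng or np.random.default_rng()
    dim = x0.shape[-1]
    # Standard Brownian motion via cumulative Gaussian increments.
    W = np.cumsum(rng.standard_normal((n_steps, dim)) * sigma, axis=0)
    t = (np.arange(1, n_steps + 1) / n_steps)[:, None]
    bridge = W - t * W[-1]              # pin the motion to zero at t = 1
    return x0 + t * (xT - x0) + bridge  # add the linear endpoint interpolation

path = brownian_bridge(np.zeros(3), np.ones(3), n_steps=20)  # ends exactly at xT
```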

AAAI Conference 2026 Conference Paper

MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery

  • Dong Li
  • Zhengzhang Chen
  • Xujiang Zhao
  • Linlin Yu
  • Zhong Chen
  • Yi He
  • Haifeng Chen
  • Chen Zhao

Uncovering causal structures from observational data is crucial for understanding complex systems and making informed decisions. While reinforcement learning (RL) has shown promise in identifying these structures in the form of a directed acyclic graph (DAG), existing methods often lack efficiency, making them unsuitable for online applications. In this paper, we propose MARLIN, an efficient multi-agent RL-based approach for incremental DAG learning. MARLIN uses a DAG generation policy that maps a continuous real-valued space to the DAG space as an intra-batch strategy, then incorporates two RL agents—state-specific and state-invariant—to uncover causal relationships and integrates these agents into an incremental learning framework. Furthermore, the framework leverages a factored action space to enhance parallelization efficiency. Extensive experiments on synthetic and real datasets demonstrate that MARLIN outperforms state-of-the-art methods in terms of both efficiency and effectiveness.
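
One common way to realize the continuous-to-DAG mapping mentioned above is to order nodes by a priority vector and keep only forward edges, which guarantees acyclicity by construction. The sketch below shows that generic trick under assumed names (scores, priority, thresh); it is not MARLIN's actual generation policy.

```python
import numpy as np

def to_dag(scores, priority, thresh=0.5):
    """Threshold a real-valued score matrix into a DAG adjacency matrix."""
    order = np.argsort(priority)             # node ordering from priorities
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    forward = rank[:, None] < rank[None, :]  # allow only earlier-to-later edges
    return (scores > thresh).astype(int) * forward
```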

ICLR Conference 2025 Conference Paper

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

  • Xue Li
  • Yiyou Sun
  • Wei Cheng 0002
  • Yinglun Zhu
  • Haifeng Chen

Visual Language Models (VLMs) like GPT-4V have broadened the scope of LLM applications, yet they face significant challenges in accurately processing visual details, particularly in scientific diagrams. This paper explores the necessity of meticulous visual detail collection and region decomposition for enhancing the performance of VLMs in scientific diagram analysis. We propose a novel approach that combines traditional computer vision techniques with VLMs to systematically decompose diagrams into discernible visual elements and aggregate essential metadata. Our method employs techniques from the OpenCV library to identify and label regions, followed by a refinement process using shape detection and region merging algorithms, which are particularly suited to the structured nature of scientific diagrams. This strategy not only improves the granularity and accuracy of visual information processing but also extends the capabilities of VLMs beyond their current limitations. We validate our approach through a series of experiments that demonstrate enhanced performance in diagram analysis tasks, setting a new standard for integrating visual and language processing in a multimodal context.
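
A minimal sketch of the region-decomposition step, assuming a standard OpenCV 4 pipeline: binarize the diagram, find outer contours, and keep bounding-box crops above a size threshold. The threshold values and the shape-detection and region-merging refinements from the abstract are omitted.

```python
import cv2

def extract_regions(image_path, min_area=100):
    """Return candidate diagram regions as bounding boxes plus crops."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu binarization; diagrams are typically dark strokes on light ground.
    _, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:  # drop specks before handing crops to the VLM
            regions.append({"bbox": (x, y, w, h), "crop": img[y:y + h, x:x + w]})
    return regions
```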

NeurIPS Conference 2025 Conference Paper

DISC: Dynamic Decomposition Improves LLM Inference Scaling

  • Jonathan Li
  • Wei Cheng
  • Benjamin Riviere
  • Yue Wu
  • Masafumi Oyamada
  • Mengdi Wang
  • Yisong Yue
  • Santiago Paternain

Inference scaling methods for LLMs often rely on decomposing problems into steps (or groups of tokens), followed by sampling and selecting the best next steps. However, these steps and their sizes are often predetermined or manually designed based on domain knowledge. We propose dynamic decomposition, a method that adaptively and automatically partitions solution and reasoning traces into manageable steps during inference. By more effectively allocating compute, particularly through subdividing challenging steps and prioritizing their sampling, dynamic decomposition significantly improves inference efficiency. Experiments on benchmarks such as APPS, MATH, and LiveCodeBench demonstrate that dynamic decomposition outperforms static approaches, including token-level, sentence-level, and single-step decompositions, reducing the pass@10 error rate by 5.0%, 6.7%, and 10.5%, respectively. These findings highlight the potential of dynamic decomposition to improve a wide range of inference scaling techniques.
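
The subdivision idea can be pictured as a greedy loop that repeatedly splits whichever step currently looks hardest, as in the sketch below. The difficulty callback and the halving rule are illustrative stand-ins, not the paper's actual criterion.

```python
def dynamic_partition(trace, difficulty, max_steps=8, min_len=2):
    """Greedily split the hardest step in half until the step budget is hit."""
    steps = [trace]
    while len(steps) < max_steps:
        i = max(range(len(steps)), key=lambda j: difficulty(steps[j]))
        if len(steps[i]) < min_len:
            break  # the hardest step is already atomic
        mid = len(steps[i]) // 2
        steps[i:i + 1] = [steps[i][:mid], steps[i][mid:]]
    return steps

# e.g., with token lists and length as a crude difficulty proxy:
parts = dynamic_partition(list(range(16)), difficulty=len, max_steps=4)
```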

AAAI Conference 2025 Conference Paper

Evolutionary Large Language Model for Automated Feature Transformation

  • Nanxu Gong
  • Chandan K Reddy
  • Wangyang Ying
  • Haifeng Chen
  • Yanjie Fu

Feature transformation aims to reconstruct the feature space of raw features to enhance the performance of downstream models. However, the exponential growth in the combinations of features and operations poses a challenge, making it difficult for existing methods to efficiently explore a wide space. Additionally, their optimization is solely driven by the accuracy of downstream models in specific domains, neglecting the acquisition of general feature knowledge. To fill this research gap, we propose an evolutionary LLM framework for automated feature transformation. This framework consists of two parts: 1) constructing a multi-population database through an RL data collector while utilizing evolutionary algorithm strategies for database maintenance, and 2) leveraging the sequence-understanding ability of Large Language Models (LLMs) with few-shot prompts that guide the LLM to generate superior samples based on feature transformation sequence distinctions. Leveraging the multi-population database initially provides a wide search scope to discover excellent populations. Through culling and evolution, high-quality populations are given greater opportunities, thereby furthering the pursuit of optimal individuals. By integrating LLMs with evolutionary algorithms, we achieve efficient exploration within a vast space, while harnessing feature knowledge to propel optimization, thus realizing a more adaptable search paradigm. Finally, we empirically demonstrate the effectiveness and generality of our proposed method.

IJCAI Conference 2025 Conference Paper

Harnessing Vision Models for Time Series Analysis: A Survey

  • Jingchao Ni
  • Ziming Zhao
  • ChengAo Shen
  • Hanghang Tong
  • Dongjin Song
  • Wei Cheng
  • Dongsheng Luo
  • Haifeng Chen

Time series analysis has evolved from traditional autoregressive models to deep learning, Transformers, and Large Language Models (LLMs). While vision models have also been explored along the way, their contributions are less recognized due to the predominance of sequence modeling. However, challenges such as the mismatch between continuous time series and LLMs’ discrete token space, and the difficulty in capturing multivariate correlations, have led to growing interest in Large Vision Models (LVMs) and Vision-Language Models (VLMs). This survey highlights the advantages of vision models over LLMs in time series analysis, offering a comprehensive dual-view taxonomy that answers key research questions like how to encode time series as images and how to model imaged time series. Additionally, we address pre- and post-processing challenges in this framework and outline future directions for advancing the field.

NeurIPS Conference 2025 Conference Paper

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

  • Cong Zeng
  • Shengkun Tang
  • Yuanzhou Chen
  • Zhiqiang Shen
  • Wenchao Yu
  • Xujiang Zhao
  • Haifeng Chen
  • Wei Cheng

The rapid advancement of large language models (LLMs) such as ChatGPT, DeepSeek, and Claude has significantly increased the presence of AI-generated text in digital communication. This trend has heightened the need for reliable detection methods to distinguish between human-authored and machine-generated content. Existing approaches, both zero-shot methods and supervised classifiers, largely conceptualize this task as a binary classification problem, often leading to poor generalization across domains and models. In this paper, we argue that such a binary formulation fundamentally mischaracterizes the detection task by assuming a coherent representation of human-written texts. In reality, human texts do not constitute a unified distribution, and their diversity cannot be effectively captured through limited sampling. This causes previous classifiers to memorize observed OOD characteristics rather than learn the essence of 'non-ID' behavior, limiting generalization to unseen human-authored inputs. Based on this observation, we propose reframing the detection task as an out-of-distribution (OOD) detection problem, treating human-written texts as distributional outliers while machine-generated texts are in-distribution (ID) samples. To this end, we develop a detection framework using one-class learning methods, including DeepSVDD and HRN, and score-based learning techniques such as energy-based methods, enabling robust and generalizable performance. Extensive experiments across multiple datasets validate the effectiveness of our OOD-based approach. Specifically, the OOD-based method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Moreover, we test our detection framework on multilingual, attacked, and unseen-model and -domain text settings, demonstrating the robustness and generalizability of our framework. Code will be released openly and is also available in the supplementary materials.
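
For concreteness, the energy score mentioned in the abstract is conventionally computed from classifier logits as E(x) = -T * logsumexp(logits / T). The sketch below shows that generic scoring rule only; how the paper combines it with the one-class learners is not reproduced here.

```python
import torch

def energy_score(logits, temperature=1.0):
    # Lower energy => more in-distribution (machine-generated, in this
    # paper's framing); higher energy => more OOD (human-written).
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

scores = energy_score(torch.randn(4, 2))  # one score per input text
```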

ICLR Conference 2025 Conference Paper

Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

  • Tianchun Wang
  • Yuanzhou Chen
  • Zichuan Liu
  • Zhanwen Chen
  • Haifeng Chen
  • Xiang Zhang 0001
  • Wei Cheng 0002

The advent of large language models (LLMs) has revolutionized the field of text generation, producing outputs that closely mimic human-like writing. Although academic and industrial institutions have developed detectors to prevent the malicious usage of LLM-generated texts, other research has cast doubt on the robustness of these systems. To stress test these detectors, we introduce a humanized proxy-attack (HUMPA) strategy that effortlessly compromises LLMs, causing them to produce outputs that align with human-written text and mislead detection systems. Our method attacks the source model by leveraging a reinforcement learning (RL) fine-tuned humanized small language model (SLM) in the decoding phase. Through an in-depth analysis, we demonstrate that our attack strategy is capable of generating responses that are indistinguishable to detectors, preventing them from differentiating between machine-generated and human-written text. We conduct systematic evaluations on extensive datasets using proxy-attacked open-source models, including Llama2-13B, Llama3-70B, and Mixtral-8x7B in both white- and black-box settings. Our findings show that the proxy-attack strategy effectively deceives the leading detectors, resulting in an average AUROC drop of 70.4% across multiple datasets, with a maximum drop of 95.0% on a single dataset. Furthermore, in cross-discipline scenarios, our strategy also bypasses these detectors, leading to a significant relative decrease of up to 90.9%, while in the cross-language scenario, the drop reaches 91.3%. Despite our proxy-attack strategy successfully bypassing the detectors with such significant relative drops, we find that the generation quality of the attacked models remains preserved, even within a modest utility budget, when compared to the text produced by the original, unattacked source model.

AAAI Conference 2025 System Paper

Incident Diagnosing and Reporting System Based on Retrieval Augmented Large Language Model

  • Peng Yuan
  • LuAn Tang
  • Yanchi Liu
  • Kobayashi Yuji
  • Moto Sato
  • Haifeng Chen

The Internet of Things (IoT) is widely used in many applications such as smart cities, transportation, healthcare, and environmental monitoring. A key task of IoT maintenance is to analyze abnormal sensor records and generate incident reports. Traditionally, domain experts engage in such labor-intensive tasks. Recent advances in Large Language Models (LLMs) have sparked interest in developing AI-based systems to automate these labor-intensive processes. However, two critical problems hinder the effective application of LLMs in IoT settings: (1) an LLM lacks background knowledge of the deployed IoT systems; and (2) incidents are complex events involving many sensors and components, so the LLM needs to understand the sensor relationships for accurate diagnosis. In this study, we propose a Retrieval Augmented language model based Incident Diagnosing and Reporting system (RAIDR) for IoT applications. RAIDR retrieves related system documents based on the incident features and leverages an LLM to analyze anomalies, identify root causes, and automatically generate incident reports. The automated incident reporting process streamlines end users' decision making for system maintenance and troubleshooting.

NeurIPS Conference 2025 Conference Paper

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

  • ChengAo Shen
  • Wenchao Yu
  • Ziming Zhao
  • Dongjin Song
  • Wei Cheng
  • Haifeng Chen
  • Jingchao Ni

Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identified in this work, the state-of-the-art (SOTA) LVM-based forecaster exhibits an inductive bias towards "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 SOTA models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets. The code for this paper is available at: https://github.com/D2I-Group/dmmv.
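
As background for the decomposition the framework builds on, the sketch below performs a classical trend-seasonal split with a centered moving average. DMMV's backcast-residual based adaptive decomposition is a learned refinement of this and is not reproduced; period is an assumed hyperparameter.

```python
import numpy as np

def trend_seasonal_split(x, period):
    """Split a 1-D series into a moving-average trend and the residual."""
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")  # centered moving average
    seasonal = x - trend
    return trend, seasonal
```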

ICLR Conference 2025 Conference Paper

SFS: Smarter Code Space Search improves LLM Inference Scaling

  • Jonathan Light
  • Yue Wu
  • Yiyou Sun
  • Wenchao Yu
  • Yanchi Liu
  • Xujiang Zhao
  • Ziniu Hu
  • Haifeng Chen

We frame code generation as a black-box optimization problem within the code space and demonstrate how optimization-inspired techniques can enhance inference scaling over text. Based on this perspective, we propose Scattered Forest Search (SFS), a novel approach that improves solution diversity during evolutionary search, thereby avoiding local optima. Our theoretical analysis illustrates how these methods improve exploration and enhance efficiency. Extensive experiments on HumanEval, MBPP, APPS, CodeContests, and Leetcode reveal significant performance gains. For instance, our method achieves a pass@1 rate of 67.1% on HumanEval+ and 87.2% on HumanEval with GPT-3.5, marking improvements of 8.6% and 4.3% over the state-of-the-art, while also halving the iterations needed to find the correct solution. Furthermore, our approach scales more efficiently than existing search techniques, including tree search, line search, and repeated sampling (Best of N).

NeurIPS Conference 2025 Conference Paper

SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search

  • Dong Li
  • Xujiang Zhao
  • Linlin Yu
  • Yanchi Liu
  • Wei Cheng
  • Zhengzhang Chen
  • Zhong Chen
  • Feng Chen

Large Language Models (LLMs) offer promising capabilities for tackling complex reasoning tasks, including optimization problems. However, existing methods either rely on prompt engineering, which leads to poor generalization across problem types, or require costly supervised training. We introduce SolverLLM, a training-free framework that leverages test-time scaling to solve diverse optimization problems. Rather than solving directly, SolverLLM generates mathematical formulations and translates them into solver-ready code, guided by a novel Monte Carlo Tree Search (MCTS) strategy. To enhance the search process, we modify classical MCTS with (1) dynamic expansion for adaptive formulation generation, (2) prompt backpropagation to guide exploration via outcome-driven feedback, and (3) uncertainty backpropagation to incorporate reward reliability into decision-making. Experiments on six standard benchmark datasets demonstrate that SolverLLM outperforms both prompt-based and learning-based baselines, achieving strong generalization without additional training.
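
The MCTS modifications listed above sit on top of a standard selection rule. Below is a minimal sketch of vanilla UCT selection, the baseline that dynamic expansion and the two backpropagation variants would alter; the children dictionaries are an assumed representation.

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing mean value plus an exploration bonus."""
    total = sum(ch["visits"] for ch in children)
    def uct(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unvisited children first
        return ch["value"] / ch["visits"] + c * math.sqrt(math.log(total) / ch["visits"])
    return max(children, key=uct)
```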

AAAI Conference 2025 Conference Paper

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

  • Geon Lee
  • Wenchao Yu
  • Kijung Shin
  • Wei Cheng
  • Haifeng Chen

Time series data is essential in various applications, including climate modeling, healthcare monitoring, and financial analytics. Understanding the contextual information associated with real-world time series data is often essential for accurate and reliable event predictions. In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. In addition, TimeCAP employs a multi-modal encoder that synergizes with the LLM agents, enhancing predictive performance through mutual augmentation of inputs with in-context examples. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction, including those utilizing LLMs as predictors, achieving an average improvement of 28.75% in F1 score.

NeurIPS Conference 2025 Conference Paper

TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop

  • Yushan Jiang
  • Wenchao Yu
  • Geon Lee
  • Dongjin Song
  • Kijung Shin
  • Wei Cheng
  • Yanchi Liu
  • Haifeng Chen

Time series analysis provides essential insights for real-world system dynamics and informs downstream decision-making, yet most existing methods often overlook the rich contextual signals present in auxiliary modalities. To bridge this gap, we introduce TimeXL, a multi-modal prediction framework that integrates a prototype-based time series encoder with three collaborating Large Language Models (LLMs) to deliver more accurate predictions and interpretable explanations. First, a multi-modal prototype-based encoder processes both time series and textual inputs to generate preliminary forecasts alongside case-based rationales. These outputs then feed into a prediction LLM, which refines the forecasts by reasoning over the encoder's predictions and explanations. Next, a reflection LLM compares the predicted values against the ground truth, identifying textual inconsistencies or noise. Guided by this feedback, a refinement LLM iteratively enhances text quality and triggers encoder retraining. This closed-loop workflow of prediction, critique (reflection), and refinement continuously boosts the framework's performance and interpretability. Empirical evaluations on four real-world datasets demonstrate that TimeXL achieves up to 8.9% improvement in AUC and produces human-centric, multi-modal explanations, highlighting the power of LLM-driven reasoning for time series prediction.

IJCAI Conference 2025 Conference Paper

Unsupervised Feature Transformation via In-context Generation, Generator-critic LLM Agents, and Duet-play Teaming

  • Nanxu Gong
  • Xinyuan Wang
  • Wangyang Ying
  • Haoyue Bai
  • Sixun Dong
  • Haifeng Chen
  • Yanjie Fu

Feature transformation involves generating a new set of features from the original dataset to enhance the data's utility. In certain domains like material performance screening, dimensionality is large and collecting labels is expensive and lengthy, which necessitates transforming feature spaces efficiently and without supervision to enhance data readiness and AI utility. However, existing methods fall short in efficiently navigating the vast space of feature combinations, and are mostly designed for supervised settings. To fill this gap, our unique perspective is to leverage a generator-critic duet-play teaming framework using LLM agents and in-context learning to derive pseudo-supervision from unsupervised data. The framework consists of three interconnected steps: (1) a critic agent diagnoses data to generate actionable advice, (2) a generator agent produces tokenized feature transformations guided by the critic's advice, and (3) iterative refinement ensures continuous improvement through feedback between agents. The generator-critic framework can be generalized to human-agent collaborative generation by replacing the critic agent with human experts. Extensive experiments demonstrate that the proposed framework outperforms even supervised baselines in feature transformation efficiency, robustness, and practical applicability across diverse datasets. Our code is publicly available at https://github.com/NanxuGong/LPFG.

NeurIPS Conference 2024 Conference Paper

DALD: Improving Logits-based Detector without Logits from Black-box LLMs

  • Cong Zeng
  • Shengkun Tang
  • Xianjun Yang
  • Yuanzhou Chen
  • Yiyou Sun
  • Zhiqiang Xu
  • Yao Li
  • Haifeng Chen

The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively. Our approach achieves SOTA performance in black-box settings on different advanced closed-source and open-source models. The versatility of our method enriches widely adopted zero-shot detection frameworks (DetectGPT, DNA-GPT, Fast-DetectGPT) with a 'plug-and-play' enhancement feature. Extensive experiments validate that our methodology reliably secures high detection precision for LLM-generated text and effectively detects text from diverse model origins through a singular detector. Our method is also robust under the revised text attack and on non-English texts.

ICML Conference 2024 Conference Paper

DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton

  • Yiyou Sun
  • Junjie Hu
  • Wei Cheng 0002
  • Haifeng Chen

This paper introduces the retrieval-augmented large language model with Definite Finite Automaton (DFA-RAG), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). Traditional LLMs face challenges in generating regulated and compliant responses in special scenarios with predetermined response guidelines, like emotional support and customer service. Our framework addresses these challenges by embedding a Definite Finite Automaton (DFA), learned from training dialogues, within the LLM. This structured approach acts as a semantic router which enables the LLM to adhere to a deterministic response pathway. The routing is achieved by the retrieval-augmented generation (RAG) strategy, which carefully selects dialogue examples aligned with the current conversational context. The advantages of DFA-RAG include an interpretable structure through a human-readable DFA, context-aware retrieval for responses in conversations, and plug-and-play compatibility with existing LLMs. Extensive benchmarks validate DFA-RAG's effectiveness, indicating its potential as a valuable contribution to conversational agents.

ICLR Conference 2024 Conference Paper

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

  • Xianjun Yang
  • Wei Cheng 0002
  • Yue Wu
  • Linda Ruth Petzold
  • William Yang Wang
  • Haifeng Chen

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in the black-box setting or probability divergence in the white-box setting, we can clearly illustrate significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our methods provide reasonable explanations and evidence to support our claim, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally solve model sourcing.
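
A minimal sketch of the black-box variant: truncate a text, regenerate continuations of the prefix, and compare N-gram overlap between the original and regenerated endings. The generate callback stands in for any LLM completion API, and the plain overlap ratio simplifies the paper's weighted divergence statistic.

```python
def ngrams(text, n):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def dna_gpt_score(text, generate, k=10, n=4, split=0.5):
    """Average N-gram overlap between the true ending and k regenerations."""
    cut = int(len(text) * split)
    prefix, original_tail = text[:cut], text[cut:]
    overlaps = []
    for _ in range(k):
        regen_tail = generate(prefix)  # LLM regenerates the ending
        a, b = ngrams(original_tail, n), ngrams(regen_tail, n)
        overlaps.append(len(a & b) / max(1, len(a)))
    # Machine-generated endings tend to be regenerated with higher overlap.
    return sum(overlaps) / k
```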

ICLR Conference 2024 Conference Paper

Parametric Augmentation for Time Series Contrastive Learning

  • Xu Zheng 0003
  • Tianchun Wang
  • Wei Cheng 0002
  • Aitian Ma
  • Haifeng Chen
  • Mo Sha 0001
  • Dongsheng Luo

Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Due to patterns that are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. Thus, although prevalent, contrastive learning with data augmentation has been less studied in the time series domain. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a parametric augmentation method, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5% reduction in MSE and 4.7% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a 1.2% increase in average accuracy.

IJCAI Conference 2024 Conference Paper

Reconstructing Missing Variables for Multivariate Time Series Forecasting via Conditional Generative Flows

  • Xuanming Hu
  • Wei Fan
  • Haifeng Chen
  • Pengyang Wang
  • Yanjie Fu

The Variable Subset Forecasting (VSF) problem, where the majority of variables are unavailable in the inference stage of multivariate forecasting, has been an important but under-explored task with broad impacts in many real-world applications. Missing values, absent inter-correlation, and the impracticality of retraining largely hinder the ability of multivariate forecasting models to capture inherent relationships among variables, impacting their performance. However, existing approaches towards these issues either heavily rely on local temporal correlation or face limitations in fully recovering missing information from the unavailable subset, accompanied by notable computational expenses. To address these problems, we propose a novel density estimation solution that recovers the information of missing variables via a flow-based generative framework. In particular, a novel generative network for time series, namely Time-series Reconstruction Flows (TRF), is proposed to estimate and reconstruct the missing variable subset. In addition, a novel meta-training framework, Variable-Agnostic Meta Learning, has been developed to enhance the generalization ability of TRF, enabling it to adapt to diverse missing-variable situations. Finally, extensive experiments are conducted to demonstrate the superiority of our proposed method compared with baseline methods.

IJCAI Conference 2024 Conference Paper

Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments

  • Yujie Lin
  • Chen Zhao
  • Minglai Shao
  • Baoluo Meng
  • Xujiang Zhao
  • Haifeng Chen

Recognizing domain generalization as a commonplace challenge in machine learning, data distribution might progressively evolve across a continuum of sequential domains in practical scenarios. While current methodologies primarily concentrate on bolstering model effectiveness within these new domains, they tend to neglect issues of fairness throughout the learning process. In response, we propose an innovative framework known as Disentanglement for Counterfactual Fairness-aware Domain Generalization (DCFDG). This approach adeptly removes domain-specific information and sensitive information from the embedded representation of classification features. To scrutinize the intricate interplay between semantic information, domain-specific information, and sensitive attributes, we systematically partition the exogenous factors into four latent variables. By incorporating fairness regularization, we utilize semantic information exclusively for classification purposes. Empirical validation on synthetic and authentic datasets substantiates the efficacy of our approach, demonstrating elevated accuracy levels while ensuring the preservation of fairness amidst the evolving landscape of continuous domains.

ICLR Conference 2024 Conference Paper

Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks

  • Xu Zheng 0003
  • Farhad Shirani 0001
  • Tianchun Wang
  • Wei Cheng 0002
  • Zhuomin Chen
  • Haifeng Chen
  • Hua Wei 0001
  • Dongsheng Luo

Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes. GNNs have emerged as pivotal architectures in analyzing graph-structured data, and their expansive application in sensitive domains requires a comprehensive understanding of their decision-making processes, necessitating a framework for GNN explainability. An explanation function for GNNs takes a pre-trained GNN along with a graph as input, to produce a 'sufficient statistic' subgraph with respect to the graph label. A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions. This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics, including $Fid_+$, $Fid_-$, and $Fid_\Delta$. Specifically, a formal, information-theoretic definition of explainability is introduced, and it is shown that existing metrics often fail to align with this definition across various statistical scenarios. This failure stems from potential distribution shifts when subgraphs are removed in computing these fidelity measures. Subsequently, a robust class of fidelity measures is introduced, and it is shown analytically that they are resilient to distribution shift issues and are applicable in a wide range of scenarios. Extensive empirical analyses on both synthetic and real datasets are provided to illustrate that the proposed metrics are more coherent with gold standard metrics.

NeurIPS Conference 2023 Conference Paper

Hierarchical Gaussian Mixture based Task Generative Model for Robust Meta-Learning

  • Yizhou Zhang
  • Jingchao Ni
  • Wei Cheng
  • Zhengzhang Chen
  • Liang Tong
  • Haifeng Chen
  • Yan Liu

Meta-learning enables quick adaptation of machine learning models to new tasks with limited data. While tasks could come from varying distributions in reality, most of the existing meta-learning methods consider both training and testing tasks as from the same uni-component distribution, overlooking two critical needs of a practical solution: (1) the various sources of tasks may compose a multi-component mixture distribution, and (2) novel tasks may come from a distribution that is unseen during meta-training. In this paper, we demonstrate these two challenges can be solved jointly by modeling the density of task instances. We develop a meta-training framework underlain by a novel Hierarchical Gaussian Mixture based Task Generative Model (HTGM). HTGM extends the widely used empirical process of sampling tasks to a theoretical model, which learns task embeddings, fits the mixture distribution of tasks, and enables density-based scoring of novel tasks. The framework is agnostic to the encoder and scales well with large backbone networks. The model parameters are learned end-to-end by maximum likelihood estimation via an Expectation-Maximization (EM) algorithm. Extensive experiments on benchmark datasets indicate the effectiveness of our method for both sample classification and novel task detection.
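
To make the density-based scoring concrete, the sketch below scores task embeddings under a plain Gaussian mixture fitted with scikit-learn, flagging low-likelihood tasks as novel. This substitutes an off-the-shelf GMM for HTGM's hierarchical, end-to-end-learned model; all names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

train_emb = np.random.randn(500, 16)  # stand-in embeddings of training tasks
gmm = GaussianMixture(n_components=5, random_state=0).fit(train_emb)

def novelty_score(task_emb):
    # Low log-likelihood under the fitted mixture => likely a novel task.
    return -gmm.score_samples(task_emb)

scores = novelty_score(np.random.randn(10, 16))
```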

ICML Conference 2023 Conference Paper

Personalized Federated Learning under Mixture of Distributions

  • Yue Wu
  • Shuaicheng Zhang
  • Wenchao Yu
  • Yanchi Liu
  • Quanquan Gu
  • Dawei Zhou 0003
  • Haifeng Chen
  • Wei Cheng 0002

The recent trend towards Personalized Federated Learning (PFL) has garnered significant attention as it allows for the training of models that are tailored to each client while maintaining data privacy. However, current PFL techniques primarily focus on modeling the conditional distribution heterogeneity (i.e., concept shift), which can result in suboptimal performance when the distribution of input data across clients diverges (i.e., covariate shift). Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients. The model parameters are estimated by maximum likelihood estimation utilizing a federated Expectation-Maximization algorithm, which is solved in closed form and does not assume gradient similarity. Furthermore, FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.

AAAI Conference 2023 Conference Paper

Time Series Contrastive Learning with Information-Aware Augmentations

  • Dongsheng Luo
  • Wei Cheng
  • Yingheng Wang
  • Dongkuan Xu
  • Jingchao Ni
  • Wenchao Yu
  • Xuchao Zhang
  • Yanchi Liu

Various contrastive learning approaches have been proposed in recent years and achieve significant empirical success. While effective and prevalent, contrastive learning has been less explored for time series data. A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples, such that an encoder can be trained to learn robust and discriminative representations. Unlike image and language domains where "desired" augmented samples can be generated with the rule of thumb guided by prefabricated human priors, the ad-hoc manual selection of time series augmentations is hindered by their diverse and human-unrecognizable temporal structures. How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question. In this work, we address the problem by encouraging both high fidelity and variety based on information theory. A theoretical analysis leads to the criteria for selecting feasible data augmentations. On top of that, we propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning. Experiments on various datasets show highly competitive performance with up to a 12.0% reduction in MSE on forecasting tasks and up to 3.7% relative improvement in accuracy on classification tasks over the leading baselines.

ICLR Conference 2022 Conference Paper

Superclass-Conditional Gaussian Mixture Model For Learning Fine-Grained Embeddings

  • Jingchao Ni
  • Wei Cheng 0002
  • Zhengzhang Chen
  • Takayoshi Asakura
  • Tomoya Soma
  • Sho Kato
  • Haifeng Chen

Learning fine-grained embeddings is essential for extending the generalizability of models pre-trained on "coarse" labels (e.g., animals). It is crucial to fields for which fine-grained labeling (e.g., breeds of animals) is expensive, but fine-grained prediction is desirable, such as medicine. The dilemma necessitates adaptation of a "coarsely" pre-trained model to new tasks with a few "finer-grained" training labels. However, coarsely supervised pre-training tends to suppress intra-class variation, which is vital for cross-granularity adaptation. In this paper, we develop a training framework underlain by a novel superclass-conditional Gaussian mixture model (SCGM). SCGM imitates the generative process of samples from hierarchies of classes through latent variable modeling of the fine-grained subclasses. The framework is agnostic to the encoders and only adds a few distribution related parameters, thus is efficient, and flexible to different domains. The model parameters are learned end-to-end by maximum-likelihood estimation via a principled Expectation-Maximization algorithm. Extensive experiments on benchmark datasets and a real-life medical dataset indicate the effectiveness of our method.

AAAI Conference 2022 Conference Paper

Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-sentence Dependency Graph

  • Liyan Xu
  • Xuchao Zhang
  • Bo Zong
  • Yanchi Liu
  • Wei Cheng
  • Jingchao Ni
  • Haifeng Chen
  • Liang Zhao

We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD); the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, trained only on English, is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on cross-linguistically consistent syntactic paths. Our code is available at https://github.com/lxucs/multilingual-mrc-isdg.

IS Journal 2021 Journal Article

Anomalous Event Sequence Detection

  • Boxiang Dong
  • Zhengzhang Chen
  • Lu-An Tang
  • Haifeng Chen
  • Hui Wang
  • Kai Zhang
  • Ying Lin
  • Zhichun Li

Anomaly detection has been widely applied in modern data-driven security applications to detect abnormal events/entities that deviate from the majority. However, less work has been done in terms of detecting suspicious event sequences/paths, which are better discriminators than single events/entities for distinguishing normal and abnormal behaviors in complex systems such as cyber-physical systems. A key and challenging step in this endeavor is how to discover those abnormal event sequences from millions of system event records in an efficient and accurate way. To address this issue, we propose NINA, a network diffusion based algorithm for identifying anomalous event sequences. Experimental results on both static and streaming data show that NINA is efficient (processes about 2 million records per minute) and accurate.
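
One way to picture sequence scoring over an event graph, in the spirit of (though not identical to) NINA's diffusion-based approach: accumulate log transition probabilities along the path, so rare transitions drag the score down. The row-stochastic matrix P is an assumed input estimated from normal logs.

```python
import numpy as np

def sequence_score(P, events):
    """Log-likelihood of an event path under transition matrix P."""
    logp = 0.0
    for a, b in zip(events, events[1:]):
        logp += np.log(P[a, b] + 1e-12)  # epsilon guards unseen transitions
    return logp  # lower => more anomalous under normal behavior
```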

AAAI Conference 2021 Conference Paper

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series

  • Yinjun Wu
  • Jingchao Ni
  • Wei Cheng
  • Bo Zong
  • Dongjin Song
  • Zhengzhang Chen
  • Yanchi Liu
  • Xuchao Zhang

Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS's individually, and do not leverage the dynamic distributions underlying the MTS's, leading to sub-optimal results when the sparsity is high. To address this challenge, we propose a novel generative model, which tracks the transition of latent clusters, instead of isolated feature representations, to achieve robust modeling. It is characterized by a newly designed dynamic Gaussian mixture distribution, which captures the dynamics of clustering structures, and is used for emitting time series. The generative model is parameterized by neural networks. A structured inference network is also designed for enabling inductive analysis. A gating mechanism is further introduced to dynamically tune the Gaussian mixture distributions. Extensive experimental results on a variety of real-life datasets demonstrate the effectiveness of our method.

NeurIPS Conference 2021 Conference Paper

InfoGCL: Information-Aware Graph Contrastive Learning

  • Dongkuan Xu
  • Wei Cheng
  • Dongsheng Luo
  • Haifeng Chen
  • Xiang Zhang

Various graph contrastive learning models have been proposed to improve the performance of tasks on graph datasets in recent years. While effective and prevalent, these models are usually carefully customized. In particular, although all recent works create two contrastive views, they differ in a variety of view augmentations, architectures, and objectives. It remains an open question how to build a graph contrastive learning model from scratch for particular graph tasks and datasets. In this work, we aim to fill this gap by studying how graph information is transformed and transferred during the contrastive learning process, and proposing an information-aware graph contrastive learning framework called InfoGCL. The key to the success of the proposed framework is to follow the Information Bottleneck principle to reduce the mutual information between contrastive parts while keeping task-relevant information intact at both the levels of the individual module and the entire framework, so that the information loss during graph representation learning can be minimized. We show for the first time that all recent graph contrastive learning methods can be unified by our framework. Based on theoretical and empirical analysis on benchmark graph datasets, we show that InfoGCL achieves state-of-the-art performance in the settings of both graph classification and node classification tasks.

AAAI Conference 2021 Conference Paper

Multi-Task Recurrent Modular Networks

  • Dongkuan Xu
  • Wei Cheng
  • Xin Dong
  • Bo Zong
  • Wenchao Yu
  • Jingchao Ni
  • Dongjin Song
  • Xuchao Zhang

We consider the models of deep multi-task learning with recurrent architectures that exploit regularities across tasks to improve the performance of multiple sequence processing tasks jointly. Most existing architectures are painstakingly customized to learn task relationships for different problems, which is not flexible enough to model the dynamic task relationships and lacks generalization abilities to novel test-time scenarios. We propose multi-task recurrent modular networks (MT-RMN) that can be incorporated in any multi-task recurrent models to address the above drawbacks. MT-RMN consists of a shared encoder and multiple task-specific decoders, and recurrently operates over time. For better flexibility, it modularizes the encoder into multiple layers of sub-networks and dynamically controls the connection between these subnetworks and the decoders at different time steps, which provides the recurrent networks with varying degrees of parameter sharing for tasks with dynamic relatedness. For the generalization ability, MT-RMN aims to discover a set of generalizable sub-networks in the encoder that are assembled in different ways for different tasks. The policy networks augmented with the differentiable routers are utilized to make the binary connection decisions between the sub-networks. The experimental results on three multi-task sequence processing datasets consistently demonstrate the effectiveness of MT-RMN.

AAAI Conference 2021 Conference Paper

Transformer-Style Relational Reasoning with Dynamic Memory Updating for Temporal Network Modeling

  • Dongkuan Xu
  • Junjie Liang
  • Wei Cheng
  • Hua Wei
  • Haifeng Chen
  • Xiang Zhang

Network modeling aims to learn the latent representations of nodes such that the representations preserve both network structures and node attribute information. This problem is fundamental due to its prevalence in numerous domains. However, existing approaches either target the static networks or struggle to capture the complicated temporal dependency, while most real-world networks evolve over time and the success of network modeling hinges on the understanding of how entities are temporally connected. In this paper, we present TRRN, a transformer-style relational reasoning network with dynamic memory updating, to deal with the above challenges. TRRN employs multi-head self-attention to reason over a set of memories, which provides a multitude of shortcut paths for information to flow from past observations to the current latent representations. By utilizing the policy networks augmented with differentiable binary routers, TRRN estimates the possibility of each memory being activated and dynamically updates the memories at the time steps when they are most relevant. We evaluate TRRN with the tasks of node classification and link prediction on four real temporal network datasets. Experimental results demonstrate the consistent performance gains for TRRN over the leading competitors.

AAAI Conference 2020 Conference Paper

Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation

  • Xin Dong
  • Jingchao Ni
  • Wei Cheng
  • Zhengzhang Chen
  • Bo Zong
  • Dongjin Song
  • Yanchi Liu
  • Haifeng Chen

Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user (item) into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

AAAI Conference 2020 Conference Paper

Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval

  • Dixian Zhu
  • Dongjin Song
  • Yuncong Chen
  • Cristian Lumezanu
  • Wei Cheng
  • Bo Zong
  • Jingchao Ni
  • Takehiko Mizoguchi

Multivariate time series data are becoming increasingly ubiquitous in various real-world applications such as smart cities, power plant monitoring, wearable devices, etc. Given the current time series segment, how to retrieve similar segments within the historical data in an efficient and effective manner is becoming increasingly important, as it can facilitate underlying applications such as system status identification, anomaly detection, etc. Despite the fact that various binary coding techniques can be applied to this task, few of them are specially designed for multivariate time series data in an unsupervised setting. To this end, we present Deep Unsupervised Binary Coding Networks (DUBCNs) to perform multivariate time series retrieval. DUBCNs employ the Long Short-Term Memory (LSTM) encoder-decoder framework to capture the temporal dynamics within the input segment and consist of three key components, i.e., a temporal encoding mechanism to capture the temporal order of different segments within a mini-batch, a clustering loss on the hidden feature space to capture the hidden feature structure, and an adversarial loss based upon Generative Adversarial Networks (GANs) to enhance the generalization capability of the generated binary codes. Thorough empirical studies on three public datasets demonstrate that the proposed DUBCNs can outperform state-of-the-art unsupervised binary coding techniques.

ICLR Conference 2020 Conference Paper

Inductive and Unsupervised Representation Learning on Graph Structured Objects

  • Lichen Wang
  • Bo Zong
  • Qianqian Ma
  • Wei Cheng 0002
  • Jingchao Ni
  • Wenchao Yu
  • Yanchi Liu
  • Dongjin Song

Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is also challenging to make graph learning inductive and unsupervised at the same time, as learning processes guided by reconstruction error based loss functions inevitably demand graph similarity evaluation that is usually computationally intractable. In this paper, we propose a general framework SEED (Sampling, Encoding, and Embedding Distributions) for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, given an input graph, the SEED framework samples a number of subgraphs whose reconstruction errors could be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. By theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests the proposed SEED framework is able to achieve up to 10% improvement, compared with competitive baseline methods.

NeurIPS Conference 2020 Conference Paper

Parameterized Explainer for Graph Neural Network

  • Dongsheng Luo
  • Wei Cheng
  • Dongkuan Xu
  • Wenchao Yu
  • Bo Zong
  • Haifeng Chen
  • Xiang Zhang

Despite recent progress in Graph Neural Networks (GNNs), explaining predictions made by GNNs remains a challenging open problem. The leading method mainly addresses local explanations (i.e., important subgraph structures and node features) to interpret why a GNN model makes the prediction for a single instance, e.g., a node or a graph. As a result, the explanation generated is painstakingly customized for each instance. The unique explanation interpreting each instance independently is not sufficient to provide a global understanding of the learned GNN model, leading to a lack of generalizability and hindering it from being used in the inductive setting. Besides, as it is designed for explaining a single instance, it is challenging to explain a set of instances naturally (e.g., graphs of a given class). In this study, we address these key challenges and propose PGExplainer, a parameterized explainer for GNNs. PGExplainer adopts a deep neural network to parameterize the generation process of explanations, which gives PGExplainer a natural approach to multi-instance explanations. Compared to the existing work, PGExplainer has better generalization power and can be utilized in an inductive setting easily. Experiments on both synthetic and real-life datasets show highly competitive performance with up to 24.7% relative improvement in AUC on explaining graph classification over the leading baseline.
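
The parameterization idea can be sketched as a small MLP that maps edge features (here, concatenated endpoint embeddings) to shared edge-importance scores. This is a minimal PyTorch sketch under assumed shapes; PGExplainer's mutual-information objective and reparameterized mask sampling are omitted.

```python
import torch
import torch.nn as nn

class EdgeMaskNet(nn.Module):
    """Shared scorer: edge (u, v) -> importance in (0, 1)."""
    def __init__(self, node_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, node_emb, edge_index):
        # edge_index: (2, E) long tensor of source/target node ids.
        src, dst = node_emb[edge_index[0]], node_emb[edge_index[1]]
        logits = self.mlp(torch.cat([src, dst], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)
```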

ICML Conference 2020 Conference Paper

Robust Graph Representation Learning via Neural Sparsification

  • Cheng Zheng 0004
  • Bo Zong
  • Wei Cheng 0002
  • Dongjin Song
  • Jingchao Ni
  • Wenchao Yu
  • Haifeng Chen
  • Wei Wang 0010

Graph representation learning serves as the core of important prediction tasks, ranging from product recommendation to fraud detection. Real-life graphs usually have complex information in the local neighborhood, where each node is described by a rich set of features and connects to dozens or even hundreds of neighbors. Despite the success of neighborhood aggregation in graph neural networks, task-irrelevant information is mixed into nodes' neighborhoods, making learned models suffer from sub-optimal generalization performance. In this paper, we present NeuralSparse, a supervised graph sparsification technique that improves generalization power by learning to remove potentially task-irrelevant edges from input graphs. Our method takes both structural and non-structural information as input, utilizes deep neural networks to parameterize sparsification processes, and optimizes the parameters by feedback signals from downstream tasks. Under the NeuralSparse framework, supervised graph sparsification could seamlessly connect with existing graph neural networks for more robust performance. Experimental results on both benchmark and private datasets show that NeuralSparse can yield up to 7.2% improvement in testing accuracy when working with existing graph neural networks on node classification tasks.
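
A sketch of the sparsification step under simplifying assumptions: an edge scorer produces logits per candidate neighbor, and repeated hard Gumbel-softmax draws give a differentiable approximation of keeping roughly k edges per node, so the downstream-task loss can train the scorer end to end. The names and the approximate top-k trick are illustrative, not the paper's exact parameterization.

    # Sketch only: differentiable edge selection for learned sparsification.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EdgeScorer(nn.Module):
        def __init__(self, feat_dim: int):
            super().__init__()
            self.score = nn.Linear(2 * feat_dim, 1)

        def forward(self, x, neighbor_ids, k=3):
            # x: (n_nodes, feat_dim); neighbor_ids: (n_nodes, max_deg)
            center = x.unsqueeze(1).expand(-1, neighbor_ids.size(1), -1)
            logits = self.score(torch.cat([center, x[neighbor_ids]], -1)).squeeze(-1)
            # One hard Gumbel-softmax draw ~ selecting one neighbor; repeat k
            # times and take the elementwise max as an approximate top-k mask.
            draws = [F.gumbel_softmax(logits, tau=0.5, hard=True) for _ in range(k)]
            return torch.stack(draws).amax(0)  # (n_nodes, max_deg) edge mask

    x = torch.randn(6, 8)
    neighbor_ids = torch.randint(0, 6, (6, 4))
    mask = EdgeScorer(8)(x, neighbor_ids)  # multiply into adjacency before the GNN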

AAAI Conference 2020 Conference Paper

Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series

  • Dongkuan Xu
  • Wei Cheng
  • Bo Zong
  • Dongjin Song
  • Jingchao Ni
  • Wenchao Yu
  • Yanchi Liu
  • Haifeng Chen

The problem of learning and forecasting underlying trends in time series data arises in a variety of applications, such as traffic management, energy optimization, etc. In the literature, a trend in a time series is characterized by its slope and duration, and trend prediction is to forecast these two values for the subsequent trend given historical data of the time series. For this problem, existing approaches mainly deal with the case of univariate time series. However, in many real-world applications, there are multiple variables at play, and handling all of them at the same time is crucial for an accurate prediction. A natural way is to employ multi-task learning (MTL) techniques, in which the trend learning of each time series is treated as a task. The key point of MTL is to learn task relatedness to achieve better parameter sharing, which, however, is challenging in the trend prediction task. First, effectively modeling the complex temporal patterns in different tasks is hard as the temporal and spatial dimensions are entangled. Second, the relatedness among tasks may change over time. In this paper, we propose a neural network, DeepTrends, for multivariate time series trend prediction. The core module of DeepTrends is a tensorized LSTM with adaptive shared memory (TLASM). TLASM employs the tensorized LSTM to model the temporal patterns of long-term trend sequences in an MTL setting. With an adaptive shared memory, TLASM is able to learn the relatedness among tasks adaptively, based upon which it can dynamically vary the degree of parameter sharing among tasks. To further consider short-term patterns, DeepTrends utilizes a multi-task 1dCNN to learn local time series features, and employs a task-specific sub-network to learn a mixture of long-term and short-term patterns for trend prediction. Extensive experiments on real datasets demonstrate the effectiveness of the proposed model.
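
The sketch below illustrates only the task framing and the adaptive-sharing intuition, not the tensorized LSTM itself: each variable is a task whose recent (slope, duration) trend sequence is encoded by an LSTM, and attention over a learnable shared memory mixes task-specific and shared state. All module names and sizes are assumptions.

    # Sketch only: per-task trend encoding with an attention-read shared memory.
    import torch
    import torch.nn as nn

    class TrendPredictor(nn.Module):
        def __init__(self, hidden=32, mem_slots=8):
            super().__init__()
            self.lstm = nn.LSTM(2, hidden, batch_first=True)   # (slope, duration)
            self.memory = nn.Parameter(torch.randn(mem_slots, hidden))
            self.head = nn.Linear(2 * hidden, 2)               # next slope, duration

        def forward(self, trends):
            # trends: (n_tasks, seq_len, 2) -- one trend sequence per variable
            _, (h, _) = self.lstm(trends)
            h = h[-1]                                          # (n_tasks, hidden)
            attn = torch.softmax(h @ self.memory.T, dim=-1)    # task->memory weights
            shared = attn @ self.memory                        # adaptive shared read
            return self.head(torch.cat([h, shared], dim=-1))

    pred = TrendPredictor()(torch.randn(5, 12, 2))             # 5 variables/tasks
    print(pred.shape)  # (5, 2): predicted slope and duration per task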

AAAI Conference 2019 Conference Paper

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

  • Chuxu Zhang
  • Dongjin Song
  • Yuncong Chen
  • Xinyang Feng
  • Cristian Lumezanu
  • Wei Cheng
  • Jingchao Ni
  • Bo Zong

Nowadays, multivariate time series data are increasingly collected in various real-world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal statuses at certain time steps and pinpointing the root causes. Building such a system, however, is challenging, since it requires not only capturing the temporal dependency within each time series, but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system status at different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations, and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices, and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
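
The signature matrices described above can be made concrete in a few lines of NumPy: for each window length w, entry (i, j) is the inner product of series i and j over the last w steps, scaled by w, and the per-scale matrices are stacked as input channels for the convolutional encoder. The window lengths here are illustrative.

    # Sketch of MSCRED-style signature-matrix construction.
    import numpy as np

    def signature_matrices(x: np.ndarray, windows=(10, 30, 60)) -> np.ndarray:
        # x: (n_series, T) multivariate segment ending at the current time step
        mats = []
        for w in windows:
            seg = x[:, -w:]                  # last w observations
            mats.append(seg @ seg.T / w)     # (n_series, n_series) signature matrix
        return np.stack(mats)                # (n_scales, n_series, n_series)

    x = np.random.randn(8, 120)
    sig = signature_matrices(x)
    print(sig.shape)  # (3, 8, 8): multi-scale channels for the encoder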

IJCAI Conference 2019 Conference Paper

Heterogeneous Graph Matching Networks for Unknown Malware Detection

  • Shen Wang
  • Zhengzhang Chen
  • Xiao Yu
  • Ding Li
  • Jingchao Ni
  • Lu-An Tang
  • Jiaping Gui
  • Zhichun Li

Information systems have widely been the target of malware attacks. Traditional signature-based malicious program detection algorithms can only detect known malware and are prone to evasion techniques such as binary obfuscation, while behavior-based approaches rely heavily on malware training samples and incur prohibitively high training costs. To address the limitations of existing techniques, we propose MatchGNet, a heterogeneous Graph Matching Network model that learns the graph representation and similarity metric simultaneously based on invariant graph modeling of a program's execution behaviors. We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with fewer false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% fewer false positives while keeping zero false negatives.
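
A minimal sketch of the graph-matching intuition (not the MatchGNet architecture itself): a shared encoder embeds two behavior graphs, and a similarity score compares an unknown program's graph against known-benign ones. The one-step aggregation encoder, mean pooling, and sizes below are all assumptions.

    # Sketch only: Siamese graph encoding with a cosine similarity metric.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphEncoder(nn.Module):
        def __init__(self, feat_dim: int, hidden: int = 32):
            super().__init__()
            self.proj = nn.Linear(feat_dim, hidden)

        def forward(self, x, adj):
            # x: (n_nodes, feat_dim); adj: (n_nodes, n_nodes) normalized adjacency
            h = torch.relu(self.proj(adj @ x))   # one aggregation step
            return h.mean(dim=0)                 # graph-level embedding

    enc = GraphEncoder(feat_dim=16)              # shared (Siamese) weights
    g1 = (torch.randn(10, 16), torch.eye(10))
    g2 = (torch.randn(12, 16), torch.eye(12))
    sim = F.cosine_similarity(enc(*g1), enc(*g2), dim=0)
    print(sim.item())  # low similarity to all benign graphs -> flag as suspicious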

IJCAI Conference 2018 Conference Paper

Exploiting Graph Regularized Multi-dimensional Hawkes Processes for Modeling Events with Spatio-temporal Characteristics

  • Yanchi Liu
  • Tan Yan
  • Haifeng Chen

Multi-dimensional Hawkes processes (MHP) have been widely used for modeling temporal events. However, when MHP are used for modeling events with spatio-temporal characteristics, the spatial information is often ignored despite its importance. In this paper, we introduce a framework that exploits MHP for modeling spatio-temporal events by considering both temporal and spatial information. Specifically, we design a graph regularization method to effectively integrate the prior spatial structure into MHP for learning the influence matrix between different locations. Indeed, the prior spatial structure can first be represented as a connection graph. Then, a multi-view method is utilized for the alignment of the prior connection graph and the influence matrix while preserving the sparsity and low-rank properties of the kernel matrix. Moreover, we develop an optimization scheme using the alternating direction method of multipliers to solve the resulting optimization problem. Finally, the experimental results show that we are able to learn the interaction patterns between different geographical areas more effectively with the prior connection graph introduced for regularization.
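
The MHP intensity with an exponential kernel, plus one simple form a graph-based penalty could take, can be sketched directly. Note that the paper's actual regularizer uses multi-view alignment with sparsity and low-rank constraints solved by ADMM; the quadratic non-edge penalty below is only an illustrative stand-in.

    # Sketch only: MHP intensity and a simplified graph regularizer.
    import numpy as np

    def intensity(i, t, events, mu, A, beta=1.0):
        # events: list of (time, dim); lambda_i(t) = mu_i + sum_j sum_{tk<t}
        # A[i, j] * exp(-beta * (t - tk)) with an exponential decay kernel.
        lam = mu[i]
        for tk, j in events:
            if tk < t:
                lam += A[i, j] * np.exp(-beta * (t - tk))
        return lam

    def graph_regularizer(A, G, weight=1.0):
        # G: 0/1 prior connection graph; penalize influence across non-edges.
        return weight * np.sum((A * (1 - G)) ** 2)

    mu = np.array([0.1, 0.2])
    A = np.array([[0.3, 0.4], [0.1, 0.2]])   # influence matrix to be learned
    G = np.array([[1, 0], [1, 1]])           # prior spatial connections
    events = [(0.5, 0), (1.2, 1)]
    print(intensity(0, 2.0, events, mu, A), graph_regularizer(A, G))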

IJCAI Conference 2017 Conference Paper

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

  • Yao Qin
  • Dongjin Song
  • Haifeng Chen
  • Wei Cheng
  • Guofei Jiang
  • Garrison W. Cottrell

The nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
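
The first (input attention) stage can be sketched as follows: at each time step, a score per driving series is computed from the previous hidden and cell states, softmaxed into weights, and used to rescale the inputs before the LSTM cell consumes them. The scoring function here is simplified relative to the paper's (which attends over each series' full history), and all names and sizes are assumptions.

    # Sketch only: input attention over driving series, DA-RNN stage one.
    import torch
    import torch.nn as nn

    class InputAttentionEncoder(nn.Module):
        def __init__(self, n_series: int, hidden: int = 32):
            super().__init__()
            self.cell = nn.LSTMCell(n_series, hidden)
            self.attn = nn.Linear(2 * hidden + 1, 1)  # scores one series at a time

        def forward(self, x):
            # x: (batch, T, n_series)
            b, T, n = x.shape
            h = x.new_zeros(b, self.cell.hidden_size)
            c = x.new_zeros(b, self.cell.hidden_size)
            for t in range(T):
                # Score series k from [h; c; x_k at step t].
                hc = torch.cat([h, c], -1).unsqueeze(1).expand(b, n, -1)
                scores = self.attn(torch.cat([hc, x[:, t].unsqueeze(-1)], -1))
                alpha = torch.softmax(scores.squeeze(-1), dim=-1)  # (b, n)
                h, c = self.cell(alpha * x[:, t], (h, c))
            return h  # final encoder state; stage two adds temporal attention

    out = InputAttentionEncoder(n_series=6)(torch.randn(4, 20, 6))
    print(out.shape)  # (4, 32)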

IJCAI Conference 2017 Conference Paper

Link Prediction with Spatial and Temporal Consistency in Dynamic Networks

  • Wenchao Yu
  • Wei Cheng
  • Charu C. Aggarwal
  • Haifeng Chen
  • Wei Wang

Dynamic networks are ubiquitous. Link prediction in dynamic networks has attracted tremendous research interest. Many models have been developed to predict links that may emerge in the immediate future from the past evolution of the networks. There are two key factors: 1) a node is more likely to form a link in the near future with another node within its close proximity, rather than with a random node; 2) a dynamic network usually evolves smoothly. Existing approaches seldom unify these two factors to strive for spatial and temporal consistency in a dynamic network. To address this limitation, in this paper, we propose a link prediction model with spatial and temporal consistency (LIST) to predict links in a sequence of networks over time. LIST characterizes the network dynamics as a function of time, which integrates the spatial topology of the network at each timestamp and the temporal network evolution. Compared to existing approaches, LIST has two advantages: 1) LIST uses a generic model to express the network structure as a function of time, which makes it also suitable for a wide variety of temporal network analysis problems beyond the focus of this paper; 2) by retaining spatial and temporal consistency, LIST yields better prediction performance. Extensive experiments on four real datasets demonstrate the effectiveness of the LIST model.
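
As a loose illustration of combining spatial fit with temporal smoothness (a simplified factorization stand-in, not LIST's actual time-function model), the sketch below scores a sequence of node-factor matrices by how well each reconstructs its snapshot plus how smoothly the factors evolve between consecutive timestamps.

    # Sketch only: spatial reconstruction + temporal smoothness objective.
    import numpy as np

    def list_style_loss(adjs, U_seq, lam=0.1):
        # adjs: list of (n, n) snapshots; U_seq: list of (n, k) node factors
        spatial = sum(np.linalg.norm(A - U @ U.T) ** 2
                      for A, U in zip(adjs, U_seq))        # fit each snapshot
        temporal = sum(np.linalg.norm(U_seq[t] - U_seq[t - 1]) ** 2
                       for t in range(1, len(U_seq)))      # evolve smoothly
        return spatial + lam * temporal

    n, k, T = 20, 4, 5
    adjs = [(np.random.rand(n, n) > 0.8).astype(float) for _ in range(T)]
    U_seq = [np.random.rand(n, k) for _ in range(T)]
    print(list_style_loss(adjs, U_seq))
    # Scores for future links can be read off U_seq[-1] @ U_seq[-1].T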