Arrow Research search

Author name cluster

Yan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

90 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension

  • Lihong Huang
  • Sheng-hua Zhong
  • Zhi Zhang
  • Yan Liu

Recent advances in Referring Expression Comprehension (REC) have been largely driven by supervised learning on curated datasets, where each expression is assumed to refer to exactly one known object. However, such assumptions rarely hold in real-world scenarios, where expressions can refer to multiple objects, fail to refer to any, or involve novel categories and complex semantics. These challenges define the task of open-world REC, which demands robust generalization and structured reasoning beyond the scope of traditional REC methods. In this work, we introduce a novel, training-free framework that decouples visual perception from linguistic reasoning to address open-world REC. Our method first transforms the visual scene into a rich textual representation using an open-vocabulary multimodal perception module. It then employs a reasoning language model to interpret the referring expression and perform explicit logical inference over the perceived scene, enabling transparent decision-making and strong generalization in open-world scenarios. Experiments on three standard REC benchmarks as well as two more challenging ones, gRefCOCO and D³, demonstrate that our framework achieves highly competitive zero-shot performance, often surpassing supervised baselines.

AAAI Conference 2026 Conference Paper

Is Symbolic Music a Specific Language? Exploring Inspiration-to-Structure Machine Composition via LLMs

  • Zhejing Hu
  • Yan Liu
  • Zhi Zhang
  • Aiwei Zhang
  • Sheng-hua Zhong
  • Bruce X.B. Yu
  • Gong Chen

Large Language Models (LLMs) have demonstrated remarkable proficiency in diverse tasks. This success raises a fundamental question in machine composition: Can symbolic music be considered a special form of language that can be jointly modeled with natural language for composition tasks? Recent studies validate that symbolic music can be modeled as a human language, yet composing structured music from partial symbolic inputs through natural language interaction remains underexplored. Even LLMs struggle to generate structurally coherent compositions in such hybrid input-output scenarios, highlighting a fundamental gap that calls for a domain-specific learning paradigm. To this end, we propose Inspiration-to-Structure (IoS), a cognitively inspired framework that enables LLMs to generate structured musical sections from melodic ideas. IoS employs a three-phase process—semantic, structural, and collaborative cognition—and is supported by two key components: (1) a new dataset and construction protocol called Structured Triplet Data (STD), and (2) a training method, Dual-Instance Structural Contrastive Optimization (DiSCO), designed to enhance structural awareness. Experiments show that IoS improves structural coherence by 47.8% and artistic creativity by 21.8% compared to conventional language modeling paradigm, supervised fine-tuning, and even enables smaller LLMs to surpass larger LLMs. These results suggest that symbolic music, while language-like, demands specialized modeling beyond standard language modeling paradigms. IoS enables LLMs to transform music theory knowledge into structured composition, empowering users to compose music interactively via language and advancing toward general creative AI.

JBHI Journal 2026 Journal Article

Mutual Generation for Cross-Domain Challenge in Stroke Patients' Motor Imagery Classification and Functional Recovery Prediction

  • Rongrong Lu
  • Wenchang Deng
  • Tianhao Gao
  • Songhua Huang
  • Zhi Zhang
  • Yan Liu
  • Sheng-hua Zhong

The accumulating body of research indicates that Motor Imagery (MI)-BCIs have the potential to enhance the quality of life for individuals with disabilities and to advance our understanding of brain function and rehabilitation strategies. Among these diseases, stroke is the leading cause of long-term motor disability across the globe, thereby underscoring the need for innovative rehabilitation strategies, such as MI-BCI technologies. In contrast with these expectations, the majority of existing research is built upon data obtained from healthy subjects. The construction of effective classification models for Motor Imagery tasks in patients with brain diseases, particularly stroke, remains a significant challenge. The lateralization of the left and right hemispheres is more pronounced in patients who have suffered a stroke than in healthy individuals. Moreover, the specific locations of lesions and the regions of influence result in significant variations in the electroencephalogram (EEG) data of patients with different hemiplegic sides. This paper explores the potential of generative models in addressing the domain differences in EEG data from different hemiplegic sides. Furthermore, this paper circumvents the potential adverse effects of rigorous optimization of low-quality samples on model performance through the use of a label softening algorithm. Two MI-EEG datasets of stroke patients performing Motor Imagery tasks are used to validate our method. In comparison to both classical machine learning methods and state-of-the-art models for MI classification, the classification model in this paper achieves a noticeable performance improvement under different data partitioning strategies, including subject-dependent and subject-independent scenarios. Each sub-module, and each designed loss function, contributes to the final performance growth.
In addition, this paper also investigates the potential of the proposed framework for predicting a patient's level of functional recovery. Our findings indicate that the addition of a prediction layer to the proposed model enables the accurate prediction of functional recovery level in stroke patients.

AAAI Conference 2026 Conference Paper

RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

  • Qinfeng Li
  • Miao Pan
  • Ke Xiong
  • Ge Su
  • Zhiqiang Shen
  • Yan Liu
  • Sun Bing
  • Hao Peng

Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths—progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining contrastive reindexing for inter-class isolation and constrained cascade generation for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering the first comprehensive defense against knowledge base extraction attacks.

EAAI Journal 2026 Journal Article

Remaining useful life prediction of rotating machine via long short-term memory network with uncertainty quantification

  • Jialong He
  • Zhenbiao Ma
  • Yan Liu
  • Zhaojun Yang

Remaining useful life (RUL) prediction of rotating machinery is critical for intelligent maintenance and ensuring equipment reliability. However, existing methods often struggle to capture the long-term degradation trends and fail to adequately quantify the uncertainty in the predictions. To address these challenges, this paper proposes a novel RUL prediction method based on a long short-term memory-Wiener process-Bayesian optimization (LSTM-WP-Bo) degradation model. First, based on the Wiener process (WP), a long short-term memory (LSTM) network is used to model the drift function of the degradation process. Secondly, based on the concept of first hitting time (FHT), an approximate expression for the probability density function (PDF) of RUL is derived, while the uncertainty of the prediction is quantified. Lastly, the drift and diffusion coefficients are estimated using maximum likelihood estimation (MLE), and the LSTM network's hyperparameters are optimized through Bayesian optimization (Bo). The proposed method is analyzed comparatively on three datasets. For example, after validation on the servo turret power head system degradation dataset, the proposed method achieves a root mean square error (RMSE) of 15.55 and a mean absolute percentage error (MAPE) of 12.91%, demonstrating significant improvements in prediction accuracy and robustness when compared to existing methods.
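
The first-hitting-time idea in the abstract above can be illustrated with a minimal sketch, under a strong simplifying assumption: the drift is a constant `mu` rather than the LSTM-modelled drift function the paper describes. In that textbook case, the time for a Wiener process with diffusion `sigma` to first cross a threshold lying `distance` above its current level follows an inverse Gaussian law. The function names and parameter values below are illustrative assumptions, not taken from the paper, and the paper's approximate PDF is not reproduced here.

```python
import math

def fht_pdf(t, distance, mu, sigma):
    """Inverse Gaussian first-hitting-time density (constant-drift assumption)."""
    if t <= 0:
        return 0.0
    coeff = distance / math.sqrt(2.0 * math.pi * sigma**2 * t**3)
    return coeff * math.exp(-((distance - mu * t) ** 2) / (2.0 * sigma**2 * t))

def integrate(fn, lo, hi, steps):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi))
    for k in range(1, steps):
        total += fn(lo + k * h)
    return total * h

def rul_summary(distance, mu, sigma, horizon=200.0, steps=20000):
    """Return (probability mass, mean RUL) by numerical integration up to horizon."""
    mass = integrate(lambda t: fht_pdf(t, distance, mu, sigma), 0.0, horizon, steps)
    mean = integrate(lambda t: t * fht_pdf(t, distance, mu, sigma), 0.0, horizon, steps)
    return mass, mean

# Hypothetical values: threshold 10 units away, drift 1 unit per step, unit diffusion.
mass, mean = rul_summary(distance=10.0, mu=1.0, sigma=1.0)
```

With positive drift the density integrates to one and its mean equals `distance / mu`, which gives a quick sanity check on any fitted drift and diffusion coefficients.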

EAAI Journal 2026 Journal Article

Robot welding trajectory planning for branch-pipes “clustered” intersecting structure based on metaheuristic algorithms

  • Yan Liu
  • Qiu Tang

The multi-pipe intersecting structure is widely used in fire fighting, construction, and other industries. Due to the high complexity of the weld seam, its welding is mainly done manually. To improve welding efficiency, a new method of robot welding trajectory planning for the multi-pipe intersecting structure based on metaheuristic algorithms is proposed by establishing the ideal intersection model. First, this paper establishes the mathematical model of the branch-pipes “clustered” intersecting structure through investigation, and proposes a method for calculating the intersecting curve expression based on the improved proportional-integral-derivative based search algorithm. Second, the multi-pipe intersecting curve is spatially segmented and discontinuous. Taking into account factors such as path distance, attitude mutation, and trajectory smoothness, this paper proposes a path optimization framework for robot welding based on metaheuristic algorithms. Third, to guarantee the seamless transition of the robot trajectory and attitude among sub-paths, a model-driven robot welding trajectory planning approach is presented in this paper. Moreover, the arc transition and unit quaternion interpolation algorithms are integrated to prevent abrupt changes in the robot welding attitude. Finally, experiments are designed to verify the feasibility and accuracy of the aforementioned approach.
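
The abstract above mentions unit quaternion interpolation as a way to avoid abrupt attitude changes. A minimal sketch of the standard spherical linear interpolation (slerp) between two unit quaternions follows. This is the generic textbook formula, not necessarily the variant used in the paper, and the `(w, x, y, z)` component order and the function names are assumptions made for illustration.

```python
import math

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def slerp(q0, q1, t):
    """Spherical linear interpolation between two orientations, t in [0, 1]."""
    q0, q1 = normalize(q0), normalize(q1)
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: normalized linear interpolation is stable.
        return normalize(tuple(a + t * (b - a) for a, b in zip(q0, q1)))
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

# Halfway between no rotation and a 90-degree rotation about the z axis.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

The halfway result is a 45-degree rotation about the same axis, so the torch orientation would turn at a constant angular rate along the segment.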

JBHI Journal 2026 Journal Article

Structure-Constrained Regression Network for Efficient and Topology-Guaranteed Retinal Layer Segmentation in OCT Images

  • Yi-Peng Liu
  • Zhanqing Li
  • Junhao Qu
  • Yilong Zhang
  • Yan Liu

Retinal layer segmentation in optical coherence tomography (OCT) images enables the quantification of retinal morphology, which is critical for diagnosing and monitoring ophthalmic diseases. However, interference factors such as speckle noise, intensity variations, and pathological abnormalities hinder efficient and topology-guided layer segmentation. As topology-guaranteed layers can be efficiently derived from structured layer boundaries, this study presents a structure-constrained regression network (SCRNet): a lightweight, end-to-end deep network that leverages structural priors inherent in OCT retinal layers to segment these boundaries efficiently. As the retinal structure is robust to interference factors and critical for establishing structured layer boundaries, SCRNet introduces a lightweight, two-stream architecture to capture structural information, with one stream targeting the layer topology and the other targeting boundary continuity. Each stream incorporates a tailored structural feature module (SFM) and a structure-constrained loss (SCL) to extract structural information effectively from a global perspective. Then, a structure-constrained regression module (SCRM) integrates the complementary structural information from both streams to enable structured boundary regression, enhancing accuracy and robustness. Extensive experiments on two publicly available benchmark datasets demonstrate that SCRNet achieves state-of-the-art performance in segmenting structured layer boundaries and topology-guaranteed layers while maintaining high efficiency. The source code will be publicly available at https://github.com/cyan323/SCRNet.

TMLR Journal 2026 Journal Article

TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

  • Wen Ye
  • Wei Yang
  • Defu Cao
  • Yizhou Zhang
  • Lumingyuan Tang
  • Jie Cai
  • Yan Liu

Time series analysis is crucial in real-world applications, yet traditional methods focus on isolated tasks only, and recent studies on time series reasoning remain limited to either single-step inference or are constrained to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system’s capabilities along two axes: 1) fundamental time series understanding assessed by TimeSeriesExam and 2) complex, multi-step inference, evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding as well as the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.

EAAI Journal 2025 Journal Article

A dual-branch convolutional neural network with domain-informed attention for arrhythmia classification of 12-lead electrocardiograms

  • Rucheng Jiang
  • Bin Fu
  • Renfa Li
  • Rui Li
  • Danny Z. Chen
  • Yan Liu
  • Guoqi Xie
  • Keqin Li

The automatic classification of arrhythmia is an important task in the intelligent auxiliary diagnosis of an electrocardiogram. Its efficiency and accuracy are vital for practical deployment and applications in the medical field. For the 12-lead electrocardiogram, we know that the comprehensive utilization of lead characteristics is key to enhancing diagnostic accuracy. However, existing classification methods (1) neglect the similarities and differences between the limb lead group and the precordial lead group; (2) the commonly adopted attention mechanisms struggle to capture the domain characteristics in an electrocardiogram. To address these issues, we propose a new dual-branch convolutional neural network with domain-informed attention, which is novel in two ways. First, it adopts a dual-branch network to extract intra-group similarities and inter-group differences of limb and precordial leads. Second, it proposes a domain-informed attention mechanism to embed the critical domain knowledge of electrocardiogram, multiple RR (R wave to R wave) intervals, into coordinated attention to adaptively assign attention weights to key segments, thereby effectively capturing the characteristics of the electrocardiogram domain. Experimental results show that our method achieves an F1-score of 0.905 and a macro area under the curve of 0.936 on two widely used large-scale datasets, respectively. Compared to state-of-the-art methods, our method shows significant performance improvements with a drastic reduction in model parameters.

AAAI Conference 2025 Conference Paper

Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning

  • Shihan Dou
  • Yan Liu
  • Enyu Zhou
  • Songyang Gao
  • Tianlong Li
  • Limao Xiong
  • Xin Zhao
  • Haoxiang Jia

The capability of the reward model (RM) is crucial for the success of Reinforcement Learning from Human Feedback (RLHF) in aligning with human preferences. However, as training progresses, the output space distribution of the policy model shifts. The RM, initially trained on responses sampled from the output distribution of the early policy model, gradually loses its ability to distinguish between responses from the newly shifted distribution. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside of that distribution. These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a novel method leveraging meta-learning to adapt the RM to the shifted environment distribution. MetaRM optimizes the RM in an alternating way, by preserving both the preferences of the original preference pairs, as well as maximizing discrimination power over new examples of the shifted distribution. Extensive experiments demonstrate that MetaRM can iteratively enhance the performance of human preference alignment by improving the RM's capacity to identify subtle differences in samples of shifted distributions.

IJCAI Conference 2025 Conference Paper

CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation

  • Zhejing Hu
  • Yan Liu
  • Gong Chen
  • Bruce X. B. Yu

Generative artificial intelligence in music has made significant strides, yet it still falls short of the substantial achievements seen in natural language processing, primarily due to the limited availability of music data. Knowledge-informed approaches have been shown to enhance the performance of music generation models, even when only a few pieces of musical knowledge are integrated. This paper seeks to leverage comprehensive music theory in AI-driven music generation tasks, such as algorithmic composition and style transfer, which traditionally require significant manual effort with existing techniques. We introduce a novel automatic music lexicon construction model that generates a lexicon, named CompLex, comprising 37,432 items derived from just 9 manually input category keywords and 5 sentence prompt templates. A new multi-agent algorithm is proposed to automatically detect and mitigate hallucinations. CompLex demonstrates impressive performance improvements across three state-of-the-art text-to-music generation models, encompassing both symbolic and audio-based methods. Furthermore, we evaluate CompLex in terms of completeness, accuracy, non-redundancy, and executability, confirming that it possesses the key characteristics of an effective lexicon.

AAAI Conference 2025 Conference Paper

Compose with Me: Collaborative Music Inpainter for Symbolic Music Infilling

  • Zhejing Hu
  • Yan Liu
  • Gong Chen
  • Bruce X.B. Yu

The field of music generation has seen a surge of interest from both academia and industry, with innovative platforms such as Suno, Udio, and SkyMusic earning widespread recognition. However, the challenge of music infilling—modifying specific music segments without reconstructing the entire piece—remains a significant hurdle for both audio-based and symbolic-based models, limiting their adaptability and practicality. In this paper, we address symbolic music infilling by introducing the Collaborative Music Inpainter (CMI), an advanced human-in-the-loop (HITL) model for music infilling. The CMI features the Joint Embedding Predictive Autoregressive Generative Architecture (JEP-AGA), which learns the high-level predictive representations of the masked part that needs to be infilled during the autoregressive generative process, akin to how humans perceive and interpret music. The newly developed Dynamic Interaction Learner (DIL) achieves HITL by iteratively refining the infilled output based on user interactions alone, significantly reducing the interaction cost without requiring further input. Experimental results confirm CMI’s superior performance in music infilling, demonstrating its efficiency in producing high-quality music.

IJCAI Conference 2025 Conference Paper

EDyGS: Event Enhanced Dynamic 3D Radiance Fields from Blurry Monocular Video

  • Mengxu Lu
  • Zehao Chen
  • Yan Liu
  • De Ma
  • Huajin Tang
  • Qian Zheng
  • Gang Pan

The task of generating novel views in dynamic scenes plays a critical role in the 3D vision domain. Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) have shown great promise in this domain but struggle with motion blur, which often arises in real-world scenarios due to camera or object motion. Existing methods address camera motion blur but fall short in dynamic scenes, where the coupling of camera and object motion complicates multi-view consistency and temporal coherence. In this work, we propose EDyGS, a model designed to reconstruct sharp novel views from event streams and monocular videos of dynamic scenes with motion blur. Our approach introduces a motion-mask 3D Gaussian model that assigns each Gaussian an additional attribute to distinguish between static and dynamic regions. By leveraging this motion mask field, we separate and optimize the static and dynamic regions independently. A progressive learning strategy is adopted, where static regions are reconstructed by jointly optimizing camera poses and learnable 3D Gaussians, while dynamic regions are modeled using an implicit deformation field alongside learnable 3D Gaussians. We conduct both quantitative and qualitative experiments on synthetic and real-world data. Experimental results demonstrate that EDyGS effectively handles blurry inputs in dynamic scenes.

NeurIPS Conference 2025 Conference Paper

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

  • Shihan Dou
  • Ming Zhang
  • Chenhao Huang
  • Jiayi Chen
  • Feng Chen
  • Shichun Liu
  • Yan Liu
  • Chenxiao Liu

We introduce EvaLearn, a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks, a critical, yet underexplored aspect of model potential. EvaLearn contains 648 challenging problems across six task types, grouped into 182 sequences, each sequence dedicated to one task type. Diverging from most existing benchmarks that evaluate models in parallel, EvaLearn requires models to solve problems sequentially, allowing them to leverage the experience gained from previous solutions. EvaLearn provides five comprehensive automated metrics to evaluate models and quantify their learning capability and efficiency. We extensively benchmark nine frontier models and observe varied performance profiles: some models, such as Claude-3.7-sonnet, start with moderate initial performance but exhibit strong learning ability, while some models struggle to benefit from experience and may even show negative transfer. Moreover, we investigate model performance under two learning settings and find that instance-level rubrics and teacher-model feedback further facilitate model learning. Importantly, we observe that current LLMs with stronger static abilities do not show a clear advantage in learning capability across all tasks, highlighting that EvaLearn evaluates a new dimension of model performance. We hope EvaLearn provides a novel evaluation perspective for assessing LLM potential and understanding the gap between models and human capabilities, promoting the development of deeper and more dynamic evaluation approaches. All datasets, the automatic evaluation framework, and the results studied in this paper are available in the supplementary materials.

JBHI Journal 2025 Journal Article

Identification of Protein-Nucleotide Binding Residues With Deep Multi-Task and Multi-Scale Learning

  • Jiashun Wu
  • Fang Ge
  • Shanruo Xu
  • Yan Liu
  • Jiangning Song
  • Dong-Jun Yu

Accurate identification of protein-nucleotide binding residues is essential for protein functional annotation and drug discovery. Advancements in computational methods for predicting binding residues from protein sequences have significantly improved predictive accuracy. However, it remains a challenge for current methodologies to extract discriminative features and assimilate heterogeneous data from different nucleotide binding residues. To address this, we introduce NucMoMTL, a novel predictor specifically designed for identifying protein-nucleotide binding residues. Specifically, NucMoMTL leverages a pre-trained language model for robust sequence embedding and utilizes deep multi-task and multi-scale learning within parameter-based orthogonal constraints to extract shared representations, capitalizing on auxiliary information from diverse nucleotides binding residues. Evaluation of NucMoMTL on the benchmark datasets demonstrates that it outperforms state-of-the-art methods, achieving an average AUROC and AUPRC of 0.961 and 0.566, respectively. NucMoMTL can be explored as a reliable computational tool for identifying protein-nucleotide binding residues and facilitating drug discovery.

ICLR Conference 2025 Conference Paper

InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly

  • James Enouen
  • Yan Liu

In recent years, the Shapley value and SHAP explanations have emerged as one of the most dominant paradigms for providing post-hoc explanations of blackbox models. Despite their well-founded theoretical properties, many recent works have focused on the limitations in both their computational efficiency and their representation power. The underlying connection with additive models, however, is left critically under-emphasized in the current literature. In this work, we find that a variational perspective linking GAM models and SHAP explanations is able to provide deep insights into nearly all recent developments. In light of this connection, we borrow in the other direction to develop a new method to train interpretable GAM models which are automatically purified to compute the Shapley value in a single forward pass. Finally, we provide theoretical results showing the limited representation power of GAM models is the same Achilles’ heel existing in SHAP and discuss the implications for SHAP’s modern usage in CV and NLP.
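
The connection between additive models and Shapley values mentioned in the abstract above can be made concrete in the simplest case. For a purely additive model f(x) = f_1(x_1) + ... + f_d(x_d), and assuming the marginal (interventional) value function that averages the absent features over a background set, the Shapley value of feature i reduces to f_i(x_i) minus the background mean of f_i. The sketch below checks this against brute-force subset enumeration. It is a first-order illustration only, not the InstaSHAP method, and the component functions and data are made up for the example.

```python
from itertools import combinations
from math import factorial

# Hypothetical additive model: one shape function per feature.
components = [lambda v: 2.0 * v, lambda v: v * v, lambda v: -v]

def model(x):
    return sum(f(v) for f, v in zip(components, x))

def value(subset, x, background):
    """Marginal value function: features outside subset come from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_bruteforce(x, background):
    """Exact Shapley values by enumerating every subset (exponential cost)."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}, x, background) - value(s, x, background))
        phi.append(contrib)
    return phi

def shapley_additive(x, background):
    """Closed form for additive models: centered component outputs."""
    return [
        f(x[i]) - sum(f(row[i]) for row in background) / len(background)
        for i, f in enumerate(components)
    ]

x = [1.0, 2.0, -0.5]
background = [[0.0, 1.0, 2.0], [3.0, -1.0, 0.5], [-2.0, 0.0, 1.0]]
slow = shapley_bruteforce(x, background)
fast = shapley_additive(x, background)
```

The two results agree up to floating-point error, and the closed form costs one evaluation of each component. Once the model contains interaction terms this shortcut no longer applies.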

JBHI Journal 2025 Journal Article

Mesh Regression Based Shape Enhancement Operator Designed for Organ Segmentation

  • Yuanyuan Xu
  • Hui Yu
  • Jiliu Zhou
  • Yan Liu

Organ delineation is critical for diagnosis and treatment planning and has therefore attracted considerable attention. Recently, neural network based methods have yielded accurate segmentation metrics such as the Dice coefficient. However, they face the problem of indistinct boundaries, since segmentation is usually modeled as a pixel classification task that ignores anatomical priors. Inspired by the fact that anatomical information is an essential prior for doctors in organ segmentation, this paper proposes a mesh regression-based shape enhancement operator. This operator innovatively models the refinement of segmentation masks as a mesh vertex regression task, enabling the model to refine the segmentation contours from the perspective of segmentation targets rather than purely from a pixel perspective. The proposed operator starts from the coarse segmentation masks produced by any segmentation model. By representing the mesh with the fast point feature histogram of its vertices, the displacement of each vertex is predicted by a graph convolutional neural network. Once the coordinate displacements are obtained, the mesh is evolved by moving the vertices. The operator is plug-and-play and can cooperate with any backbone segmentation model. The constructed two-stage segmentation pipeline is capable of refining organ segmentation results based on geometrical characteristics of target appearance. Validation has been performed on two publicly accessible datasets to delineate the pancreas and liver. Results show that the proposed shape enhancement operator can significantly improve segmentation performance, demonstrating its effectiveness and application prospects.
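
For readers unfamiliar with the Dice coefficient named in the abstract above, a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on binary masks follows. The function name, the nested-list mask layout, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the paper's evaluation code is not shown in the source.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# One overlapping pixel, two predicted pixels, one ground-truth pixel.
score = dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Because the score only counts overlapping pixels, a mask can score well while still having a ragged contour, which is the gap a shape-based refinement step is meant to address.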

AAAI Conference 2025 Conference Paper

Mixture of Knowledge Minigraph Agents for Literature Review Generation

  • Zhi Zhang
  • Yan Liu
  • Sheng-hua Zhong
  • Gong Chen
  • Yu Yang
  • Jiannong Cao

Literature reviews play a crucial role in scientific research for understanding the current state of research, identifying gaps, and guiding future studies on specific topics. However, the process of conducting a comprehensive literature review remains time-consuming. This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. A novel prompt-based algorithm, the knowledge minigraph construction agent (KMCA), is designed to identify relations between concepts from academic literature and automatically construct knowledge minigraphs. By leveraging the capabilities of large language models on constructed knowledge minigraphs, the multiple path summarization agent (MPSA) efficiently organizes concepts and relations from different viewpoints to generate literature review paragraphs. We evaluate CKMAs on three benchmark datasets. Experimental results show the effectiveness of the proposed method, further revealing promising applications of LLMs in scientific research.

TMLR Journal 2025 Journal Article

Physics-Aware Spatiotemporal Causal Graph Network for Forecasting with Limited Data

  • Zijun Cui
  • Sam Griesemer
  • Sungyong Seo
  • Joshua Hikida
  • Yan Liu

Spatiotemporal models have drawn significant interest recently due to their widespread applicability across many domains. These models are often made more practically useful by incorporating beneficial inductive biases, such as laws or symmetries from domain-relevant physics equations. This "physics-awareness" provides an interpretable means of grounding otherwise purely data-driven models, improving robustness and boosting performance in settings with limited data. In this work, we view physical dynamics as domain knowledge that captures fundamental causal relationships across space and time, and can be effectively leveraged by our proposed physics-aware spatiotemporal causal graph network (P-STCGN). We firstly describe a means of deriving causal relationships from spatiotemporal data, serving as physics-aware labels to learn a causal structure via a dedicated neural module. We then formulate a forecasting module that can operate under this causal structure, producing predictions that are guided by physics-aware cause-effect relationships among modeled variables. Extensive experimentation demonstrates that our method is robust to noisy and limited data, outperforming existing models across a variety of challenging synthetic tasks and benchmark datasets. We further evaluate our method on real-world graph signals and observe superior forecasting performance, achieved by effectively utilizing causal signals from prior physics knowledge.

NeurIPS Conference 2025 Conference Paper

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

  • Lexiang Xiong
  • Liu Chengyu
  • Jingwen Ye
  • Yan Liu
  • Yuecong Xu

With the growing power of text-to-image diffusion models, their potential to generate harmful or biased content has become a pressing concern, motivating the development of concept erasure techniques. Existing approaches, whether relying on retraining or not, frequently compromise the generative capabilities of the target model in achieving concept erasure. Here, we introduce Semantic Surgery, a novel training-free framework for zero-shot concept erasure. Semantic Surgery directly operates on text embeddings before the diffusion process, aiming to neutralize undesired concepts at their semantic origin with dynamism to enhance both erasure completeness and the locality of generation. Specifically, Semantic Surgery dynamically estimates the presence of target concepts in an input prompt, based on which it performs a calibrated, scaled vector subtraction to neutralize their influence at the source. The overall framework consists of a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence, thereby reinforcing erasure throughout the subsequent denoising process. Our proposed Semantic Surgery requires no model retraining and adapts dynamically to the specific concepts and their intensity detected in each input prompt, ensuring precise and context-aware interventions. Extensive experiments are conducted on object, explicit content, artistic style, and multi-celebrity erasure tasks, demonstrating that our method significantly outperforms state-of-the-art approaches. That is, our proposed concept erasure framework achieves superior completeness and robustness while preserving locality and general image quality (e. g. , achieving a 93. 58 H-score in object erasure, reducing explicit content to just 1 instance with a 12. 2 FID, and attaining an 8. 09 H_a in style erasure with no MS-COCO FID/CLIP degradation). 
Crucially, this robustness enables our framework to function as a built-in threat detection system by monitoring concept presence scores, offering a highly effective and practical solution for safer text-to-image generation. Our code is publicly available at: https://github.com/Lexiang-Xiong/Semantic-Surgery
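
The "estimate presence, then subtract a scaled concept vector" idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic presence estimator, the `sharpness` and `threshold` parameters, and the function names are all assumptions made for illustration, and plain Python lists stand in for text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def presence_score(prompt_emb, concept_emb, sharpness=10.0, threshold=0.5):
    """Hypothetical presence estimate in (0, 1): a logistic squashing of
    cosine similarity. The abstract does not specify the real estimator."""
    sim = cosine(prompt_emb, concept_emb)
    return 1.0 / (1.0 + math.exp(-sharpness * (sim - threshold)))


def erase_concept(prompt_emb, concept_emb, **kwargs):
    """Subtract the concept vector, scaled by its estimated presence."""
    rho = presence_score(prompt_emb, concept_emb, **kwargs)
    edited = [p - rho * c for p, c in zip(prompt_emb, concept_emb)]
    return edited, rho
```

With this toy estimator, a prompt embedding aligned with the concept is almost fully cancelled, while an orthogonal prompt is left nearly unchanged, which is the locality property the abstract emphasizes.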

ICML Conference 2025 Conference Paper

SERENA: A Unified Stochastic Recursive Variance Reduced Gradient Framework for Riemannian Non-Convex Optimization

  • Yan Liu
  • Mingjie Chen
  • Chaojie Ji
  • Hao Zhang 0079
  • Ruxin Wang 0001

Recently, the expansion of Variance Reduction (VR) to Riemannian stochastic non-convex optimization has attracted increasing interest. Inspired by recursive momentum, we first introduce the Stochastic Recursive Variance Reduced Gradient (SRVRG) algorithm and further present the Stochastic Recursive Gradient Estimator (SRGE) in Euclidean spaces, which unifies the prevailing variance reduction estimators. We then extend SRGE to Riemannian spaces, resulting in a unified Stochastic rEcursive vaRiance reducEd gradieNt frAmework (SERENA) for Riemannian non-convex optimization. This framework includes the proposed R-SRVRG, R-SVRRM, and R-Hybrid-SGD methods, as well as other existing Riemannian VR methods. Furthermore, we establish a unified theoretical analysis for Riemannian non-convex optimization under retraction and vector transport. The IFO complexity of our proposed R-SRVRG and R-SVRRM to converge to an $\varepsilon$-accurate solution is $\mathcal{O}\left(\min \{n^{1/2}{\varepsilon^{-2}}, \varepsilon^{-3}\}\right)$ in the finite-sum setting and ${\mathcal{O}\left( \varepsilon^{-3}\right)}$ for the online case, both of which align with the lower IFO complexity bound. Experimental results indicate that the proposed algorithms surpass other existing Riemannian optimization methods.
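
As a reading aid for the finite-sum bound quoted above, the following short calculation shows which term of the minimum is active. It follows directly from the stated expression; the numeric values of $n$ and $\varepsilon$ are illustrative assumptions, not figures from the paper.

```latex
% Which branch of the minimum is active?
\[
  n^{1/2}\varepsilon^{-2} \le \varepsilon^{-3}
  \iff n^{1/2} \le \varepsilon^{-1}
  \iff n \le \varepsilon^{-2}.
\]
% Hence
\[
  \min\{n^{1/2}\varepsilon^{-2},\, \varepsilon^{-3}\} =
  \begin{cases}
    n^{1/2}\varepsilon^{-2}, & n \le \varepsilon^{-2},\\
    \varepsilon^{-3},        & n > \varepsilon^{-2}.
  \end{cases}
\]
% Illustrative numbers: n = 10^{4}, \varepsilon = 10^{-3}, so n \le \varepsilon^{-2} = 10^{6}.
\[
  n^{1/2}\varepsilon^{-2} = 10^{2}\cdot 10^{6} = 10^{8}
  \;<\; \varepsilon^{-3} = 10^{9}.
\]
```

In other words, the finite-sum rate improves on the online rate only while the dataset size stays below $\varepsilon^{-2}$.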

IJCAI Conference 2025 Conference Paper

Stackelberg vs. Nash in the Lottery Colonel Blotto Game

  • Yan Liu
  • Bonan Ni
  • Weiran Shen
  • Zihe Wang
  • Jie Zhang

Resource competition problems are often modeled using Colonel Blotto games, where players take simultaneous actions. However, many real-world scenarios involve sequential decision-making rather than simultaneous moves. To model these dynamics, we represent the Lottery Colonel Blotto game as a Stackelberg game, in which one player, the leader, commits to a strategy first, and the other player, the follower, responds. We derive the Stackelberg equilibrium for this game, formulating the leader's strategy as a bi-level optimization problem. To solve this, we develop a constructive method based on iterative game reductions, which allows us to efficiently compute the leader’s optimal commitment strategy in polynomial time. Additionally, we identify the conditions under which the Stackelberg equilibrium coincides with the Nash equilibrium. Specifically, this occurs when the budget ratio between the leader and the follower equals a certain threshold, which we can calculate in closed form. In some instances, we observe that when the leader’s budget exceeds this threshold, both players achieve higher utilities in the Stackelberg equilibrium compared to the Nash equilibrium. Lastly, we show that, in the best case, the leader can achieve an infinite utility improvement by making an optimal first move compared to the Nash equilibrium.
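
To make the game concrete, here is a minimal sketch of payoffs in a lottery-style Blotto game and of a follower's best response to a committed leader allocation. It assumes the standard ratio-form rule (a player who allocates x against y on a battlefield of value v receives v * x / (x + y) in expectation), which the abstract does not spell out. The tie convention, the two-battlefield restriction, and the grid search are illustrative assumptions and are not the paper's polynomial-time method.

```python
def lottery_payoffs(values, x, y):
    """Expected payoffs (leader, follower) under the ratio-form lottery rule."""
    leader = 0.0
    follower = 0.0
    for v, xi, yi in zip(values, x, y):
        total = xi + yi
        # Assumed convention: an uncontested battlefield is split equally.
        share = 0.5 if total == 0 else xi / total
        leader += v * share
        follower += v * (1.0 - share)
    return leader, follower


def follower_best_response(values, x, budget, steps=100):
    """Grid search over two-battlefield follower allocations that spend
    the whole budget (an assumption; coarse, for illustration only)."""
    best = None
    for k in range(steps + 1):
        y = (budget * k / steps, budget * (steps - k) / steps)
        _, payoff = lottery_payoffs(values, x, y)
        if best is None or payoff > best[1]:
            best = (y, payoff)
    return best
```

A leader's commitment could then be evaluated by calling `follower_best_response` for each candidate allocation and keeping the one with the highest resulting leader payoff, which is the bi-level structure the abstract describes.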

NeurIPS Conference 2024 Conference Paper

Active Sequential Posterior Estimation for Sample-Efficient Simulation-Based Inference

  • Sam Griesemer
  • Defu Cao
  • Zijun Cui
  • Carolina Osorio
  • Yan Liu

Computer simulations have long presented the exciting possibility of scientific insight into complex real-world processes. Despite the power of modern computing, however, it remains challenging to systematically perform inference under simulation models. This has led to the rise of simulation-based inference (SBI), a class of machine learning-enabled techniques for approaching inverse problems with stochastic simulators. Many such methods, however, require large numbers of simulation samples and face difficulty scaling to high-dimensional settings, often making inference prohibitive under resource-intensive simulators. To mitigate these drawbacks, we introduce active sequential neural posterior estimation (ASNPE). ASNPE brings an active learning scheme into the inference loop to estimate the utility of simulation parameter candidates to the underlying probabilistic model. The proposed acquisition scheme is easily integrated into existing posterior estimation pipelines, allowing for improved sample efficiency with low computational overhead. We further demonstrate the effectiveness of the proposed method in the travel demand calibration setting, a high-dimensional inverse problem commonly requiring computationally expensive traffic simulators. Our method outperforms well-tuned benchmarks and state-of-the-art posterior estimation methods on a large-scale real-world traffic network, as well as demonstrates a performance advantage over non-active counterparts on a suite of SBI benchmark environments.

EAAI Journal 2024 Journal Article

Adaptive type-2 fuzzy output feedback control using nonlinear observers for permanent magnet synchronous motor servo systems

  • Yongfu Wang
  • Yan Liu
  • Jinliang Ding
  • Dianhui Wang

This paper investigates the accurate servo position control of the permanent magnet synchronous motor (PMSM)-driven system under varying external disturbance and unknown mechanical friction. Firstly, to overcome the difficulty of state observer design caused by uncertainties, the interval type-2 fuzzy logic system (IT2 FLS) is introduced to approximate the nonlinear friction. Meanwhile, the nonlinear disturbance observer (NDO) is employed to estimate the lumped disturbance. Then aiming to solve the problem of explosion of complexity and filtering error, a finite-time (FT) adaptive IT2 fuzzy output feedback controller is proposed by combining the error compensation mechanism and FT control. In addition, to further improve the performance of the closed-loop system, the adaptive law is updated by the compensation error signal. By means of Lyapunov stability theory, all signals in the closed loop system can be guaranteed to be finite-time stable. Numerical simulations and real-time experiments are presented to demonstrate the effectiveness and superiority of the proposed controller.

AAAI Conference 2024 Conference Paper

Beyond Mimicking Under-Represented Emotions: Deep Data Augmentation with Emotional Subspace Constraints for EEG-Based Emotion Recognition

  • Zhi Zhang
  • Shenghua Zhong
  • Yan Liu

In recent years, using Electroencephalography (EEG) to recognize emotions has garnered considerable attention. Despite advancements, limited EEG data restricts its potential. Thus, Generative Adversarial Networks (GANs) are proposed to mimic the observed distributions and generate EEG data. However, for imbalanced datasets, GANs struggle to produce reliable augmentations for under-represented minority emotions by merely mimicking them. Thus, we introduce Emotional Subspace Constrained Generative Adversarial Networks (ESC-GAN) as an alternative to existing frameworks. We first propose the EEG editing paradigm, editing reference EEG signals from well-represented to under-represented emotional subspaces. Then, we introduce diversity-aware and boundary-aware losses to constrain the augmented subspace. Here, the diversity-aware loss encourages a diverse emotional subspace by enlarging the sample difference, while boundary-aware loss constrains the augmented subspace near the decision boundary where recognition models can be vulnerable. Experiments show ESC-GAN boosts emotion recognition performance on benchmark datasets, DEAP, AMIGOS, and SEED, while protecting against potential adversarial attacks. Finally, the proposed method opens new avenues for editing EEG signals under emotional subspace constraints, facilitating unbiased and secure EEG data augmentation.

YNIMG Journal 2024 Journal Article

Contrastive voxel clustering for multiscale modeling of brain network

  • Zhiyuan Ding
  • Yulang Huang
  • Xiangzhu Zeng
  • Shiyin Jiang
  • Shuyang Feng
  • Zhenduo Wang
  • Ling Wang
  • Zeng Wang

Resting-state functional magnetic resonance imaging (fMRI) provides an efficient way to analyze the functional connectivity between brain regions. A comprehensive understanding of brain functionality requires a unified description of multi-scale layers of neural structure. However, existing brain network modeling methods often simplify this property by averaging Blood oxygen level dependent (BOLD) signals at the brain region level for fMRI-based analysis with the assumption that BOLD signals are homogeneous within each brain region, which ignores the heterogeneity of voxels within each Region of Interest (ROI). This study introduces a novel multi-stage self-supervised learning framework for multiscale brain network analysis, which effectively delineates brain functionality from voxel to ROIs and up to sample level. A Contrastive Voxel Clustering (CVC) module is proposed to simultaneously learn the voxel-level features and clustering assignments, which ensures the retention of informative clustering features at the finest voxel-level and concurrently preserves functional connectivity characteristics. Additionally, based on the extracted features and clustering assignments at the voxel level by CVC, a Brain ROI-based Graph Neural Network (BR-GNN) is built to extract functional connectivity features at the brain ROI-level and used for sample-level prediction, which integrates the functional clustering maps with the pre-established structural ROI maps and creates a more comprehensive and effective analytical tool. Experiments are performed on two datasets, which illustrate the effectiveness and generalization ability of the proposed method by analyzing voxel-level clustering results and brain ROIs-level functional characteristics. The proposed method provides a multiscale modeling framework for brain functional connectivity analysis, which will be further used for other brain disease identification. Code is available at https://github.com/yanliugroup/fmri-cvc.

IJCAI Conference 2024 Conference Paper

D3ETR: Decoder Distillation for Detection Transformer

  • Xiaokang Chen
  • Jiahui Chen
  • Yan Liu
  • Jiaxiang Tang
  • Gang Zeng

Although various knowledge distillation (KD) methods for CNN-based detectors have been proven effective in improving small students, building baselines and recipes for DETR-based detectors remains a challenge. This paper concentrates on the transformer decoder of DETR-based detectors and explores KD methods suitable for them. However, the random order of the decoder outputs poses a challenge for knowledge distillation as it provides no direct correspondence between the predictions of the teacher and the student. To this end, we propose MixMatcher that aligns the decoder outputs of DETR-based teacher and student, by mixing two teacher-student matching strategies for combined advantages. The first strategy, Adaptive Matching, applies bipartite matching to adaptively match the outputs of the teacher and the student in each decoder layer. The second strategy, Fixed Matching, fixes the correspondence between the outputs of the teacher and the student with the same object queries as input, which alleviates instability of bipartite matching in Adaptive Matching. Using both strategies together produces better results than using either strategy alone. Based on MixMatcher, we devise Decoder Distillation for DEtection TRansformer (D3ETR), which distills knowledge in decoder predictions and attention maps from the teacher to student. D3ETR shows superior performance on various DETR-based detectors with different backbones. For instance, D3ETR improves Conditional DETR-R50-C5 by 8.3 mAP under 12 epochs training setting with Conditional DETR-R101-C5 serving as the teacher. The code will be released.

JBHI Journal 2024 Journal Article

Gradient-Guided Network With Fourier Enhancement for Glioma Segmentation in Multimodal 3D MRI

  • Zhongzhou Zhang
  • Hui Yu
  • Zhongxian Wang
  • Zhiwen Wang
  • Jingfeng Lu
  • Yan Liu
  • Yi Zhang

Glioma segmentation is a crucial task in computer-aided diagnosis, requiring precise discrimination between lesions and normal tissue at the pixel level. Popular methods neglect crucial edge information, leading to inaccurate contour delineation. Moreover, global information has been proven beneficial for segmentation. The feature representations extracted by convolution neural networks often struggle with local-related information owing to the limited receptive fields. To address these issues, we propose a novel edge-aware segmentation network that incorporates a dual-path gradient-guided training strategy with Fourier edge-enhancement for precise glioma segmentation, a.k.a. GFNet. First, we introduce a Dual-path Gradient-guided Training strategy (DGT) based on a Siamese network guiding the optimizing direction of one path by the gradient from the other path. DGT pays attention to the indistinguishable pixels with large weight-updating gradient, such as the pixels near the boundary, to guide the network training, addressing hard samples. Second, to further perceive the edge information, we derive a Fourier Edge-enhancement Module (FEM) to augment feature edges with high-frequency representations from the spectral domain, providing global information and edge details. Extensive experiments on public glioma segmentation datasets, BraTS2020 and Medical Segmentation Decathlon (MSD) glioma and prostate segmentation, demonstrate that GFNet achieves competitive performance compared to other state-of-the-art methods, both qualitatively and quantitatively.

JBHI Journal 2024 Journal Article

HCT: Chinese Medical Machine Reading Comprehension Question-Answering via Hierarchically Collaborative Transformer

  • Meiling Wang
  • Xiaohai He
  • Luping Liu
  • Qingmao Fang
  • Mei Zhang
  • Honggang Chen
  • Yan Liu

Chinese medical machine reading comprehension question-answering (cMed-MRCQA) is a critical component of the intelligence question-answering task, focusing on the Chinese medical domain question-answering task. Its purpose is to enable machines to analyze and understand the given text and question and then extract the accurate answer. To enhance cMed-MRCQA performance, it is essential to possess a profound comprehension and analysis of the context, deduce concealed information from the textual content and, subsequently, precisely determine the answer's span. The answer span has predominantly been defined by language items, with sentences employed in most instances. However, it is worth noting that sentences may not be properly split to varying degrees in various languages, making it challenging for the model to predict the answer zone. To alleviate this issue, this paper presents a novel architecture called HCT based on a Hierarchically Collaborative Transformer. Specifically, we presented a hierarchical collaborative method to locate the boundaries of sentence and answer spans separately. First, we designed a hierarchical encoding module to obtain the local semantic features of the corpus; second, we proposed a sentence-level self-attention module and a fused interaction-attention module to get the global information about the text. Finally, the model is trained by combining loss functions. Extensive experiments were conducted on the public dataset CMedMRC and the reconstruction dataset eMedicine to validate the effectiveness of the proposed method. Experimental results showed that the proposed method performed better than the state-of-the-art methods. Using the F1 metric, our model scored 90.4% on the CMedMRC and 73.2% on eMedicine.

EAAI Journal 2024 Journal Article

Intelligent control of district heating system based on RDPG

  • Mingju Gong
  • Yan Liu
  • Jiawang Sun
  • Wei Xu
  • Wenxiang Li
  • Changcheng Yan
  • Wencheng Fu

Given the continuous expansion of heating areas in recent years, the design of a precise and dependable district heating system (DHS) has become increasingly crucial. Traditional control decisions are made based on real-time environmental temperature feedback, often leading to uneven heating on the user side and affecting residents' comfort. This paper proposes an intelligent control strategy based on the deep reinforcement learning recurrent deterministic policy gradient (RDPG) algorithm for DHSs. To explore the control performance of the RDPG algorithm on DHS, we have meticulously modeled the pivotal components of the DHSs, namely plate heat exchangers, secondary heating pipe networks, and heat users. Moreover, taking into account the periodic factors in heating regulation, the traditional recurrent neural network (RNN) in the RDPG algorithm has been replaced with the long short-term memory (LSTM) network. The proposed algorithm was trained using actual data from a heat exchange station in Tianjin and compared with reinforcement learning algorithms such as TD3, DPPO, DDPG, and A3C in terms of training rewards, effectiveness, and training stability. The results of the models are evaluated and visualized. Experimental results show that the proposed control method based on the RDPG algorithm, compared to other control schemes, can achieve the highest training reward and the most stable control performance, with an indoor temperature fluctuation range of only 0.1 °C.

EAAI Journal 2024 Journal Article

Intelligent mining methodology of product field failure data by fusing deep learning and association rules for after-sales service text

  • Yan Liu
  • Shijie Hu
  • Haichun Zhang
  • Qiuxian Dong
  • Weidong Liu

The after-sales service text contains a wealth of field failure data, which can be used to estimate the field reliability of a product and its components. However, human-based data mining suffers from low efficiency and unstable quality. This study proposes an innovative intelligent mining methodology that combines the product failure dictionary (PFD), gated recurrent unit (GRU) and association rules (AR) to solve the problem. Firstly, the text is studied, and the segmented mining process is developed. Secondly, the PFD is edited according to the analysis results of a large product failure sample, and combined with GRU to construct the PFD-GRU efficient mining model. Thirdly, based on the failure determination criteria and the dependency syntax analysis results of the text, the AR-based mining method is formulated for re-mining the text that does not satisfy the determination threshold of the PFD-GRU model to establish the PFD-GRU-AR intelligent mining methodology. The case study results of the after-sales service text mining for four types of air-conditioner show that the segmented mining pattern can obtain better quality mining results than the one-stage mining pattern, and the P, R, and F1 of the PFD-GRU model have increased by at least 1.48%, 1.72% and 1.6% respectively compared to the corresponding indicator values of the GRU model. With the increase of the threshold value, the quality and robustness are improved and the mining results tend to be stable. When the threshold value is 0.95, the PFD-GRU-AR methodology obtains minimum values for P, R, and F1 of 98.54%, 99.09% and 98.81% respectively.

EAAI Journal 2024 Journal Article

Linguistic q-rung orthopair fuzzy Z-number and its application in multi-criteria decision-making

  • Yan Liu
  • Zhaojun Yang
  • Jialong He
  • Guofa Li
  • Yuan Zhong

This paper proposes a new linguistic model called Linguistic q-Rung Orthopair fuzzy Z-number (LqROFZN), which combines the advantages of linguistic variables, Z-number and q-Rung Orthopair Fuzzy numbers. It can be used as a powerful tool for uncertain decision-making, which can effectively improve the accuracy and reliability of the decision-making results, and has a notable application prospect for the fields of information decision-making, risk assessment, diagnosis and so on. In this paper, firstly, the definition of LqROFZN and its operational rules are given, a new distance measure and the concept of entropy are given under LqROFZN, and the entropy of LqROFZN can assess the credibility situation of LqROFZN. Next, two aggregation operators under LqROFZN are given, namely the Linguistic q-Rung Orthopair fuzzy Z-number weighted aggregation (LqROFZWA) operator and the Linguistic q-Rung Orthopair fuzzy Z-number weighted Geometric aggregation (LqROFZWGA) operator. Finally, the MCDM method under LqROFZN is given and the credibility of the evaluation results is assessed using the entropy of LqROFZN. In a set of actual airline aircraft selection cases, the feasibility and advantages of the proposed method are verified through comparative analysis with other methods.

IJCAI Conference 2024 Conference Paper

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

  • Zihao Wang
  • Shuyu Li
  • Tao Zhang
  • Qi Wang
  • Pengfei Yu
  • Jinyang Luo
  • Yan Liu
  • Ming Xi

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a large-scale, private dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of CaiMD for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music.

NeurIPS Conference 2024 Conference Paper

Physics-Constrained Comprehensive Optical Neural Networks

  • Yanbing Liu
  • Jianwei Qin
  • Yan Liu
  • Xi Yue
  • Xun Liu
  • Guoqing Wang
  • Tianyu Li
  • Fangwei Ye

With the advantages of low latency, low power consumption, and high parallelism, optical neural networks (ONN) offer a promising solution for time-sensitive and resource-limited artificial intelligence applications. However, the performance of the ONN model is often diminished by the gap between the ideal simulated system and the actual physical system. To bridge the gap, this work conducts extensive experiments to investigate systematic errors in the optical physical system within the context of image classification tasks. Our investigation finds that two quantifiable errors, light source instability and exposure time mismatches, significantly impact the prediction performance of ONN. To address these systematic errors, a physics-constrained ONN learning framework is constructed, including a well-designed loss function to mitigate the effect of light fluctuations, a CCD adjustment strategy to alleviate the effects of exposure time mismatches, and a 'physics-prior based' error compensation network to manage other systematic errors, ensuring consistent light intensity across experimental results and simulations. In our experiments, the proposed method achieved a test classification accuracy of 96.5% on the MNIST dataset, a substantial improvement over the 61.6% achieved with the original ONN. For the more challenging QuickDraw16 and Fashion MNIST datasets, experimental accuracy improved from 63.0% to 85.7% and from 56.2% to 77.5%, respectively. Moreover, the comparison results further demonstrate the effectiveness of the proposed physics-constrained ONN learning framework over state-of-the-art ONN approaches. This lays the groundwork for more robust and precise optical computing applications.

ECAI Conference 2024 Conference Paper

Reassessing Non-Autoregressive Neural Machine Translation with a Fine-Grained Error Taxonomy

  • Yan Liu
  • Longyue Wang
  • Zhaopeng Tu
  • Deyi Xiong

Non-autoregressive neural machine translation (NAT) has made remarkable progress since it was proposed. The performance of NAT in terms of BLEU has approached or even matched that of autoregressive neural machine translation (AT). However, other evaluation metrics show that NAT still lags behind. Unfortunately, these metrics only provide a numerical difference, and it is unclear how the translations produced by NAT differ from those produced by AT. In addition, the multimodality problem is always a significant issue in NAT. To assess whether NAT models are fully capable of solving the multimodality problem and achieving the performance of AT, we specifically design an error taxonomy to annotate errors in translations. The taxonomy is grounded on a systematic and hierarchical error analysis. We carry out an extensive annotation with professional annotators and analyze four NAT models and two AT models. Our analysis and experiments show that (1) the number of errors in NAT translations marked by annotators is 1.54 times that of AT translations, (2) the multimodality problem of NAT affects translations from lexical to syntactic levels, and even up to discourse, and (3) the four NAT models cannot fully eradicate the multimodality problem despite mitigation efforts.

AAAI Conference 2024 Conference Paper

Responding to the Call: Exploring Automatic Music Composition Using a Knowledge-Enhanced Model

  • Zhejing Hu
  • Yan Liu
  • Gong Chen
  • Xiao Ma
  • Shenghua Zhong
  • Qianwen Luo

Call-and-response is a musical technique that enriches the creativity of music, crafting coherent musical ideas that mirror the back-and-forth nature of human dialogue with distinct musical characteristics. Although this technique is integral to numerous musical compositions, it remains largely uncharted in automatic music composition. To enhance the creativity of machine-composed music, we first introduce the Call-Response Dataset (CRD) containing 19,155 annotated musical pairs and craft comprehensive objective evaluation metrics for musical assessment. Then, we design a knowledge-enhanced learning-based method to bridge the gap between human and machine creativity. Specifically, we train the composition module using the call-response pairs, supplementing it with musical knowledge in terms of rhythm, melody, and harmony. Our experimental results underscore that our proposed model adeptly produces a wide variety of creative responses for various musical calls.

AAAI Conference 2024 Conference Paper

Spiking NeRF: Representing the Real-World Geometry by a Discontinuous Representation

  • Zhanfeng Liao
  • Yan Liu
  • Qian Zheng
  • Gang Pan

A crucial reason for the success of existing NeRF-based methods is to build a neural density field for the geometry representation via multiple perceptron layers (MLPs). MLPs are continuous functions; however, the real geometry or density field is frequently discontinuous at the interface between the air and the surface. This contradiction leads to unfaithful geometry representation. To this end, this paper proposes spiking NeRF, which leverages spiking neurons and a hybrid Artificial Neural Network (ANN)-Spiking Neural Network (SNN) framework to build a discontinuous density field for faithful geometry representation. Specifically, we first demonstrate the reason why continuous density fields will bring inaccuracy. Then, we propose to use the spiking neurons to build a discontinuous density field. We conduct a comprehensive analysis of the problems of existing spiking neuron models and then provide the numerical relationship between the parameter of the spiking neuron and the theoretical accuracy of geometry. Based on this, we propose a bounded spiking neuron to build the discontinuous density field. Our method achieves SOTA performance. The source code and the supplementary material are available at https://github.com/liaozhanfeng/Spiking-NeRF.

YNIMG Journal 2024 Journal Article

The dorsomedial prefrontal cortex promotes self-control by inhibiting the egocentric perspective

  • Chen Jin
  • Ying Li
  • Yin Yin
  • Tenda Ma
  • Wei Hong
  • Yan Liu
  • Nan Li
  • Xinyue Zhang

The dorsomedial prefrontal cortex (dmPFC) plays a crucial role in social cognitive functions, including perspective-taking. Although perspective-taking has been linked to self-control, the mechanism by which the dmPFC might facilitate self-control remains unclear. Using the multimodal neuroimaging dataset from the Human Connectome Project (Study 1, N = 978 adults), we established a reliable association between the dmPFC and self-control, as measured by discounting rate, i.e., the tendency to prefer smaller, immediate rewards over larger, delayed ones. Experiments (Study 2, N = 36 adults) involving high-definition transcranial direct current stimulation showed that anodal stimulation of the dmPFC reduces the discounting of delayed rewards and decreases the congruency effect in egocentric but not allocentric perspective in the visual perspective-taking tasks. These findings suggest that the dmPFC promotes self-control by inhibiting the egocentric perspective, offering new insights into the neural underpinnings of self-control and perspective-taking, and opening new avenues for interventions targeting disorders characterized by impaired self-regulation.

EAAI Journal 2023 Journal Article

A novel seminar learning framework for weakly supervised salient object detection

  • Yan Liu
  • Yunzhou Zhang
  • Zhenyu Wang
  • Fei Yang
  • Feng Qiu
  • Sonya Coleman
  • Dermot Kerr

Weakly supervised salient object detection (SOD) is a challenging task that has drawn much attention from several research perspectives. While driving the rapid development of saliency detection, it has revealed two problems. (1) Large divergence in the characteristics of saliency regions in terms of location, shape and size makes them difficult to recognize. (2) Convolutional neural networks are by nature insensitive to various transformations, which makes it hard to balance the application of various disturbances. To tackle these limitations, this paper proposes a novel seminar learning framework with consistent transformation ensembling (SLF-CT) for scribble supervised SOD. The framework consists of the teacher–student model and the student–student model for segmenting the salient objects. Specifically, we first design a cross attention guided network (CAGNet) as a baseline model for saliency prediction. Then we assign CAGNet to the teacher–student model, where the teacher network is based on the exponential moving average and guides the training of the student network. Moreover, we adopt multiple pseudo labels to transfer the information among students from different conditions. To further enhance the regularization of the network, a consistency transformation mechanism is also incorporated, which encourages the saliency prediction and input image of the network to be consistent. The experimental results demonstrate that the proposed approach performs favorably compared with the state-of-the-art weakly supervised methods. As far as we know, the proposed approach is the first application of seminar learning in the SOD area.
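
The exponential-moving-average teacher mentioned in this abstract follows a widely used pattern, sketched below on flat lists of weights. The decay value and function name are illustrative assumptions; the paper's actual schedule and network (CAGNet) are not reproduced here.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    `teacher` and `student` are equal-length lists of floats standing in
    for network parameters; `decay` is an assumed hyperparameter.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Usage: the teacher drifts toward a (here fixed) student over many steps.
teacher_weights = [1.0, 0.0]
student_weights = [0.0, 1.0]
for _ in range(200):
    teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

Because the teacher averages over many past student states, its predictions change more smoothly than the student's, which is why it is commonly used as the source of guidance signals.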

EAAI Journal 2023 Journal Article

Decision system for copper flotation backbone process

  • Haipei Dong
  • Fuli Wang
  • Dakuo He
  • Yan Liu

This study proposed a decision system that can output the flotation backbone flowchart using the natural properties of copper ore. The proposed decision system includes three decision tasks: product scheme, flotation scheme and grinding scheme. Each decision task is a multi-label classification problem. To improve the classification effect of each sub-label, extreme gradient boosting (XGBoost) is used as a subclassifier, because of its ability to deal with small and high-dimensional samples. To selectively utilize the relations between the sub-labels in the same task, a modified classifier chain (MCC) was proposed. To specifically use the effect of a front-end task on a back-end task, the decision system connects the MCC-XGBoost corresponding to the three tasks in series. Accordingly, the outputs of a front-end task become the candidate features of a back-end task. To improve the recall rates of minority classes, the classification thresholds are customized using the Youden index. Finally, the high performance of the decision system was demonstrated by hypothesis testing.
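Threshold selection with the Youden index can be sketched as follows. The abstract does not give the procedure, so this is a generic illustration under stated assumptions: the function name `youden_threshold`, the toy scores, and the rule "predict positive when score >= threshold" are hypothetical. The index is J = sensitivity + specificity - 1, and the chosen threshold is the one that maximizes J.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= the threshold.
    Candidate thresholds are the distinct observed scores.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Toy, made-up classifier scores with a minority positive class.
scores = [0.1, 0.2, 0.35, 0.4, 0.8]
labels = [0, 0, 1, 0, 1]
threshold, j_value = youden_threshold(scores, labels)
```

On this toy data the selected threshold falls below the default 0.5, which shows how a customized threshold can raise recall for a minority class.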

AAAI Conference 2023 Conference Paper

Estimating Treatment Effects from Irregular Time Series Observations with Hidden Confounders

  • Defu Cao
  • James Enouen
  • Yujing Wang
  • Xiangchen Song
  • Chuizheng Meng
  • Hao Niu
  • Yan Liu

Causal analysis for time series data, in particular estimating individualized treatment effect (ITE), is a key task in many real world applications, such as finance, retail, healthcare, etc. Real world time series, i.e., large-scale irregular or sparse and intermittent time series, raise significant challenges to existing work attempting to estimate treatment effects. Specifically, the existence of hidden confounders can lead to biased treatment estimates and complicate the causal inference process. In particular, anomaly hidden confounders which exceed the typical range can lead to high variance estimates. Moreover, in continuous time settings with irregular samples, it is challenging to directly handle the dynamics of causality. In this paper, we leverage recent advances in Lipschitz regularization and neural controlled differential equations (CDE) to develop an effective and scalable solution, namely LipCDE, to address the above challenges. LipCDE can directly model the dynamic causal relationships between historical data and outcomes with irregular samples by considering the boundary of hidden confounders given by Lipschitz constrained neural networks. Furthermore, we conduct extensive experiments on both synthetic and real world datasets to demonstrate the effectiveness and scalability of LipCDE.

NeurIPS Conference 2023 Conference Paper

Hierarchical Gaussian Mixture based Task Generative Model for Robust Meta-Learning

  • Yizhou Zhang
  • Jingchao Ni
  • Wei Cheng
  • Zhengzhang Chen
  • Liang Tong
  • Haifeng Chen
  • Yan Liu

Meta-learning enables quick adaptation of machine learning models to new tasks with limited data. While tasks could come from varying distributions in reality, most of the existing meta-learning methods consider both training and testing tasks as from the same uni-component distribution, overlooking two critical needs of a practical solution: (1) the various sources of tasks may compose a multi-component mixture distribution, and (2) novel tasks may come from a distribution that is unseen during meta-training. In this paper, we demonstrate these two challenges can be solved jointly by modeling the density of task instances. We develop a meta-training framework underlain by a novel Hierarchical Gaussian Mixture based Task Generative Model (HTGM). HTGM extends the widely used empirical process of sampling tasks to a theoretical model, which learns task embeddings, fits the mixture distribution of tasks, and enables density-based scoring of novel tasks. The framework is agnostic to the encoder and scales well with large backbone networks. The model parameters are learned end-to-end by maximum likelihood estimation via an Expectation-Maximization (EM) algorithm. Extensive experiments on benchmark datasets indicate the effectiveness of our method for both sample classification and novel task detection.

EAAI Journal 2023 Journal Article

Operating performance assessment method for industrial process with slowness principle-based LSTM network

  • Fei Chu
  • Shuangshuang Liao
  • Lili Hao
  • Pei Wang
  • Yan Liu
  • Fuli Wang

Timely and accurate performance assessment and non-optimal regulation of industrial processes can effectively guarantee product quality. Most industrial processes are highly nonlinear and dynamic, so the long short-term memory (LSTM) network is suitable for industrial performance assessment. However, in the network learning, the typical LSTM network focuses on the representation learning of input variables, lacks the representation of comprehensive economic indexes (CEI), and cannot selectively learn essential features, which increases the computational burden and easily mixes redundant information. Thus, a supervised slow feature analysis (SFA)-based LSTM (SSFALSTM) network is proposed for industrial operating performance assessment. By utilizing CEI information and SFA constraints, the network is guided to simultaneously learn features related to CEI and slow-changing features that reflect the inherent dynamics of the process. Further, a performance recognition model is cascaded to construct the complete performance assessment framework. For the non-optimal performance, a reconstruction-based contribution plot method is proposed to identify the main cause variables and guide adjustment operations. Finally, the effectiveness of the proposed method is validated on the dense medium coal preparation process.

YNICL Journal 2023 Journal Article

Role of hippocampal subfields in neurodegenerative disease progression analyzed with a multi-scale attention-based network

  • Hongbo Xu
  • Yan Liu
  • Ling Wang
  • Xiangzhu Zeng
  • Yingying Xu
  • Zeng Wang

BACKGROUND AND OBJECTIVE: Both Alzheimer's disease (AD) and Parkinson's disease (PD) are progressive neurodegenerative diseases. Early identification is very important for the prevention and intervention of their progress. Hippocampus plays a crucial role in cognition, in which there are correlations between atrophy of Hippocampal subfields and cognitive impairment in neurodegenerative diseases. Exploring biomarkers in the prediction of early cognitive impairment in AD and PD is significant for understanding the progress of neurodegenerative diseases. METHODS: A multi-scale attention-based deep learning method is proposed to perform computer-aided diagnosis for neurodegenerative disease based on Hippocampal subfields. First, the two dimensional (2D) Hippocampal Mapping Image (HMI) is constructed and used as input of three branches of the following network. Second, the multi-scale module and attention module are integrated into the 2D residual network to improve the diversity of the extracted features and capture significance of various voxels for classification. Finally, the role of Hippocampal subfields in the progression of different neurodegenerative diseases is analyzed using the proposed method. RESULTS: Classification experiments between normal control (NC), mild cognitive impairment (MCI), AD, PD with normal cognition (PD-NC) and PD with mild cognitive impairment (PD-MCI) are carried out using the proposed method. Experimental results show that subfields subiculum, presubiculum, CA1, and molecular layer are strongly correlated with cognitive impairment in AD and MCI, subfields GC-DG and fimbria are sensitive in detecting early stage of cognitive impairment in MCI, subfields CA3, CA4, GC-DG, and CA1 show significant atrophy in PD. For exploring the role of Hippocampal subfields in PD cognitive impairment, we find that left parasubiculum, left HATA and left presubiculum could be important biomarkers for predicting conversion from PD-NC to PD-MCI. 
CONCLUSION: The proposed multi-scale attention-based network can effectively discover the correlation between subfields and neurodegenerative diseases. Experimental results are consistent with previous clinical studies, which will be useful for further exploring the role of Hippocampal subfields in neurodegenerative disease progression.

JBHI Journal 2023 Journal Article

SemiMAR: Semi-Supervised Learning for CT Metal Artifact Reduction

  • Tao Wang
  • Hui Yu
  • Zhiwen Wang
  • Hu Chen
  • Yan Liu
  • Jingfeng Lu
  • Yi Zhang

Metal artifacts lead to CT imaging quality degradation. With the success of deep learning (DL) in medical imaging, a number of DL-based supervised methods have been developed for metal artifact reduction (MAR). Nonetheless, fully-supervised MAR methods based on simulated data do not perform well on clinical data due to the domain gap. Although this problem can be avoided in an unsupervised way to a certain degree, severe artifacts cannot be well suppressed in clinical practice. Recently, semi-supervised metal artifact reduction (MAR) methods have gained wide attention due to their ability in narrowing the domain gap and improving MAR performance in clinical data. However, these methods typically require large model sizes, posing challenges for optimization. To address this issue, we propose a novel semi-supervised MAR framework. In our framework, only the artifact-free parts are learned, and the artifacts are inferred by subtracting these clean parts from the metal-corrupted CT images. Our approach leverages a single generator to execute all complex transformations, thereby reducing the model's scale and preventing overlap between clean part and artifacts. To recover more tissue details, we distill the knowledge from the advanced dual-domain MAR network into our model in both image domain and latent feature space. The latent space constraint is achieved via contrastive learning. We also evaluate the impact of different generator architectures by investigating several mainstream deep learning-based MAR backbones. Our experiments demonstrate that the proposed method competes favorably with several state-of-the-art semi-supervised MAR techniques in both qualitative and quantitative aspects.

NeurIPS Conference 2023 Conference Paper

Uncovering and Quantifying Social Biases in Code Generation

  • Yan Liu
  • Xiaokang Chen
  • Yan Gao
  • Zhe Su
  • Fengji Zhang
  • Daoguang Zan
  • Jian-Guang Lou
  • Pin-Yu Chen

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias.

NeurIPS Conference 2022 Conference Paper

Counterfactual Neural Temporal Point Process for Estimating Causal Influence of Misinformation on Social Media

  • Yizhou Zhang
  • Defu Cao
  • Yan Liu

Recent years have witnessed the rise of misinformation campaigns that spread specific narratives on social media to manipulate public opinions on different areas, such as politics and healthcare. Consequently, an effective and efficient automatic methodology to estimate the influence of the misinformation on user beliefs and activities is needed. However, existing works on misinformation impact estimation either rely on small-scale psychological experiments or can only discover the correlation between user behaviour and misinformation. To address these issues, in this paper, we build up a causal framework that models the causal effect of misinformation from the perspective of temporal point process. To adapt to large-scale data, we design an efficient yet precise way to estimate the Individual Treatment Effect (ITE) via neural temporal point process and Gaussian mixture models. Extensive experiments on a synthetic dataset verify the effectiveness and efficiency of our model. We further apply our model on a real-world dataset of social media posts and engagements about COVID-19 vaccines. The experimental results indicate that our model recognized identifiable causal effect of misinformation that hurts people's subjective emotions toward the vaccines.

IJCAI Conference 2022 Conference Paper

EGCN: An Ensemble-based Learning Framework for Exploring Effective Skeleton-based Rehabilitation Exercise Assessment

  • Bruce X. B. Yu
  • Yan Liu
  • Xiang Zhang
  • Gong Chen
  • Keith C. C. Chan

Recently, some skeleton-based physical therapy systems have attempted to automatically evaluate the correctness or quality of an exercise performed by rehabilitation subjects. However, in terms of algorithms and evaluation criteria, the task remains not fully explored regarding making full use of different skeleton features. To advance the prior work, we propose a learning framework called Ensemble-based Graph Convolutional Network (EGCN) for skeleton-based rehabilitation exercise assessment. As far as we know, this is the first attempt that utilizes two skeleton feature groups and investigates different ensemble strategies for the task. We also examine the properness of existing evaluation criteria and focus on evaluating the prediction ability of our proposed method. We then conduct extensive cross-validation experiments on the two latest public datasets: UI-PRMD and KIMORE. Results indicate that the model-level ensemble scheme of our EGCN achieves better performance than existing methods. Code is available at https://github.com/bruceyo/EGCN.

EAAI Journal 2022 Journal Article

Failure mode risk assessment methodology for controlling multi-uncertainties in the evaluation process

  • Yan Liu
  • Bingsong Chen
  • Qiuxian Dong
  • Weidong Liu
  • Wenbin Nie
  • Chao Yang

The failure mode risk evaluation results of FMEA are affected by multi-uncertainties. This paper proposes a risk evaluation methodology for controlling multi-uncertainties in the assessment process. First, the fuzzy confidence interval number (FCIN) evaluation model is provided to control the uncertainty in assessing the severity (S), occurrence (O), and detectability (D). Then, the FCINs are converted into generalized trapezoidal fuzzy numbers (GTrFNs), and the GTrFNs’ scalar characteristic distances modified by the non-membership are used as the evaluation results of S, O, D and their synthesizer or risk priority number (RPN) to control the risk evaluation model uncertainty. Furthermore, the evaluation parameter value criteria of S, O, D are formulated based on the sensitivity analysis results, more precise than the general value guidelines introduced by industrial FMEA standards. The case study results show that the proposed methodology can significantly improve the risk assessment results and the risk discrimination of failure modes.

AAAI Conference 2022 Conference Paper

I-SEA: Importance Sampling and Expected Alignment-Based Deep Distance Metric Learning for Time Series Analysis and Embedding

  • Sirisha Rambhatla
  • Zhengping Che
  • Yan Liu

Learning effective embeddings for potentially irregularly sampled time-series, evolving at different time scales, is fundamental for machine learning tasks such as classification and clustering. Task-dependent embeddings rely on similarities between data samples to learn effective geometries. However, many popular time-series similarity measures are not valid distance metrics, and as a result they do not reliably capture the intricate relationships between the multi-variate time-series data samples for learning effective embeddings. One of the primary ways to formulate an accurate distance metric is by forming distance estimates via Monte-Carlo-based expectation evaluations. However, the high-dimensionality of the underlying distribution, and the inability to sample from it, pose significant challenges. To this end, we develop an Importance Sampling based distance metric – I-SEA – which enjoys the properties of a metric while consistently achieving superior performance for machine learning tasks such as classification and representation learning. I-SEA leverages Importance Sampling and Non-parametric Density Estimation to adaptively estimate distances, enabling implicit estimation from the underlying high-dimensional distribution, resulting in improved accuracy and reduced variance. We theoretically establish the properties of I-SEA and demonstrate its capabilities via experimental evaluations on real-world healthcare datasets.

AIIM Journal 2022 Journal Article

Medical visual question answering based on question-type reasoning and semantic space constraint

  • Meiling Wang
  • Xiaohai He
  • Luping Liu
  • Linbo Qing
  • Honggang Chen
  • Yan Liu
  • Chao Ren

Medical visual question answering (Med-VQA) aims to accurately answer clinical questions about medical images. Despite its enormous potential for application in the medical domain, the current technology is still in its infancy. Compared with the general visual question answering task, the Med-VQA task involves more demanding challenges. First, clinical questions about medical images are usually diverse due to different clinicians and the complexity of diseases. Consequently, noise is inevitably introduced when extracting question features. Second, the Med-VQA task has always been regarded as a classification problem for predefined answers, ignoring the relationships between candidate responses. Thus, the Med-VQA model pays equal attention to all candidate answers when predicting answers. In this paper, a novel Med-VQA framework is proposed to alleviate the above-mentioned problems. Specifically, we applied a question-type reasoning module separately to closed-ended and open-ended questions, thereby extracting the important information contained in the questions through an attention mechanism and filtering the noise to extract more valuable question features. To take advantage of the relational information between answers, we designed a semantic constraint space to calculate the similarity between the answers and assign higher attention to answers with high correlation. To evaluate the effectiveness of the proposed method, extensive experiments were conducted on a public dataset, namely VQA-RAD. Experimental results showed that the proposed method achieved better performance compared to other state-of-the-art methods. The overall accuracy, closed-ended accuracy, and open-ended accuracy reached 74.1%, 82.7%, and 60.9%, respectively. It is worth noting that the absolute accuracy of the proposed method improved by 5.5% for closed-ended questions.

TIST Journal 2022 Journal Article

MetaDetector: Meta Event Knowledge Transfer for Fake News Detection

  • Yasan Ding
  • Bin Guo
  • Yan Liu
  • Yunji Liang
  • Haocheng Shen
  • Zhiwen Yu

The blooming of fake news on social networks has devastating impacts on society, the economy, and public security. Although numerous studies are conducted for the automatic detection of fake news, the majority tend to utilize deep neural networks to learn event-specific features for superior detection performance on specific datasets. However, the trained models heavily rely on the training datasets and are infeasible to apply to upcoming events due to the discrepancy between event distributions. Inspired by domain adaptation theories, we propose an end-to-end adversarial adaptation network, dubbed MetaDetector, to transfer meta knowledge (event-shared features) between different events. Specifically, MetaDetector pushes the feature extractor and event discriminator to eliminate event-specific features and preserve required meta knowledge by adversarial training. Furthermore, the pseudo-event discriminator is utilized to evaluate the importance of news records in historical events to obtain partial knowledge that is discriminative for detecting fake news. Under the coordinated optimization among all the submodules, MetaDetector accurately transfers the meta knowledge of historical events to the upcoming event for fact checking. We conduct extensive experiments on two real-world datasets collected from Sina Weibo and Twitter. The experimental results demonstrate that MetaDetector outperforms the state-of-the-art methods, especially when the distribution discrepancy between events is significant.

IJCAI Conference 2022 Conference Paper

Physics-Informed Long-Sequence Forecasting From Multi-Resolution Spatiotemporal Data

  • Chuizheng Meng
  • Hao Niu
  • Guillaume Habault
  • Roberto Legaspi
  • Shinya Wada
  • Chihiro Ono
  • Yan Liu

Spatiotemporal data aggregated over regions or time windows at various resolutions demonstrate heterogeneous patterns and dynamics in each resolution. Meanwhile, the multi-resolution characteristic provides rich contextual information, which is critical for effective long-sequence forecasting. The importance of such inter-resolution information is more significant in practical cases, where fine-grained data is usually collected via approaches with lower costs but also lower qualities compared to those for coarse-grained data. However, existing works focus on uni-resolution data and cannot be directly applied to fully utilize the aforementioned extra information in multi-resolution data. In this work, we propose Spatiotemporal Koopman Multi-Resolution Network (ST-KMRN), a physics-informed learning framework for long-sequence forecasting from multi-resolution spatiotemporal data. Our method jointly models data aggregated in multiple resolutions and captures the inter-resolution dynamics with the self-attention mechanism. We also propose downsampling and upsampling modules among resolutions to further strengthen the connections among data of multiple resolutions. Moreover, we enhance the modeling of intra-resolution dynamics with physics-informed modules based on Koopman theory. Experimental results demonstrate that our proposed approach achieves the best performance on the long-sequence forecasting tasks compared to baselines without a specific design for multi-resolution data.

NeurIPS Conference 2022 Conference Paper

Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection

  • James Enouen
  • Yan Liu

There is currently a large gap in performance between the statistically rigorous methods like linear regression or additive splines and the powerful deep methods using neural networks. Previous works attempting to close this gap have failed to fully consider the exponentially growing number of feature combinations which deep networks consider automatically during training. In this work, we develop a tractable selection algorithm to efficiently identify the necessary feature combinations by leveraging techniques in feature interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from these simple and interpretable models to a fully connected neural network. SIAN achieves competitive performance against state-of-the-art methods across multiple large-scale tabular datasets and consistently finds an optimal tradeoff between the modeling capacity of neural networks and the generalizability of simpler methods.

JBHI Journal 2022 Journal Article

Whole-Brain Dynamic Resting-State Functional Network Analysis in Benign Epilepsy With Centrotemporal Spikes

  • Siqi Zhang
  • Jihong Tang
  • Jing Huang
  • Guihai Suo
  • Zhiyong Zhou
  • Bo You
  • Yakang Dai
  • Yan Liu

Benign epilepsy with centrotemporal spikes (BECTS), the most common type of epilepsy among children, is considered a network disorder. Both fMRI and EEG source imaging (ESI) studies have indicated that BECTS is associated with static resting-state functional network (SFN) alterations (e.g., decreased global efficiency) in source space. However, we find that the abovementioned alterations are not significant when the SFN calculations are performed in the scalp space using only clinical routine low-density (e.g., 19 channels) EEG recordings (shown in our results). In the context of EEG microstates, it is clear that networks in the scalp space with resting-state EEG recordings dynamically reconfigure in a well-organized way based on different functional states. We are therefore inspired to propose a whole-brain dynamic resting-state functional network (DFN) computation method based on resting-state low-density EEG recordings with four classical microstates in scalp space. Notably, on the one hand, this approach is suitable for clinical conditions, and, on the other hand, the dynamic alterations calculated with a DFN may promote our understanding of how the networks change in BECTS. We analysed the changes in a DFN in six frequency bands (δ, θ, α low, α high, β, and γ) in patients with BECTS compared to those for healthy controls. Superior to traditional SFNs, the proposed DFN can reveal significant differences between individuals with BECTS and healthy controls (e.g., lower global efficiency), thus matching traditional fMRI and ESI methods in the source space. Our method directly performs DFN computations from low-density EEG recordings and avoids complex ESI computations, making it promising for clinical applications, especially in the outpatient diagnosis stage.
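The global efficiency measure named in this abstract can be illustrated with a short sketch. It assumes an unweighted, undirected graph given as a hypothetical adjacency dictionary; the study's functional networks are derived from EEG and may well be weighted, so this only shows the standard definition: the mean of inverse shortest-path lengths over all ordered node pairs, with unreachable pairs contributing zero.

```python
from collections import deque


def global_efficiency(adjacency):
    """Mean inverse shortest-path length over all ordered node pairs.

    `adjacency` maps each node to a list of its neighbours (unweighted graph).
    Unreachable pairs contribute 0 to the sum.
    """
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0

    total = 0.0
    for source in nodes:
        # Breadth-first search gives hop-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


# Two toy three-node networks: a chain and a fully connected triangle.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
triangle_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

path_efficiency = global_efficiency(path_graph)
triangle_efficiency = global_efficiency(triangle_graph)
```

The chain scores lower than the triangle because its end nodes are two hops apart, which is the sense in which "lower global efficiency" indicates less direct communication across a network.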

IJCAI Conference 2021 Conference Paper

An Examination of Fairness of AI Models for Deepfake Detection

  • Loc Trinh
  • Yan Liu

Recent studies have demonstrated that deep learning models can discriminate based on protected classes like race and gender. In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups. A closer look reveals that the widely used FaceForensics++ dataset is overwhelmingly composed of Caucasian subjects, with the majority being female Caucasians. Our investigation of the racial distribution of deepfakes reveals that the methods used to create deepfakes as positive training signals tend to produce "irregular" faces, in which a person’s face is swapped onto another person of a different race or gender. This causes detectors to learn spurious correlations between the foreground faces and fakeness. Moreover, when detectors are trained with the Blended Image (BI) dataset from Face X-Rays, we find that those detectors develop systematic discrimination towards certain racial subgroups, primarily female Asians.

AAAI Conference 2021 Conference Paper

Depth Privileged Object Detection in Indoor Scenes via Deformation Hallucination

  • Zhijie Zhang
  • Yan Liu
  • Junjie Chen
  • Li Niu
  • Liqing Zhang

RGB-D object detection has achieved significant advances, because depth provides complementary geometric information to RGB images. Considering that depth images are unavailable in some scenarios, we focus on depth privileged object detection in indoor scenes, where the depth images are only available in the training stage. Under this setting, one prevalent research line is modality hallucination, in which depth image and depth feature are common hallucination targets. In contrast, we choose to hallucinate depth deformation, which benefits a lot from rich geometric information in depth data. Specifically, we employ the deformable convolutional layer with augmented offsets to perform geometric deformation, because the offsets enable flexibly sampling over the object and transforming to a canonical shape for ease of object detection. In addition, we design a quality-based weighted transfer loss to avoid negative transfer of depth deformation. Experimental results on NYUDv2 and SUN RGB-D demonstrate the effectiveness of our method against the state-of-the-art methods for depth privileged object detection.

TIST Journal 2021 Journal Article

MetaStore: A Task-adaptative Meta-learning Model for Optimal Store Placement with Multi-city Knowledge Transfer

  • Yan Liu
  • Bin Guo
  • Daqing Zhang
  • Djamal Zeghlache
  • Jingmin Chen
  • Sizhe Zhang
  • Dan Zhou
  • Xinlei Shi

Optimal store placement aims to identify the optimal location for a new brick-and-mortar store that can maximize its sale by analyzing and mining users’ preferences from large-scale urban data. In recent years, the expansion of chain enterprises in new cities brings some challenges because of two aspects: (1) data scarcity in new cities, so most existing models tend to not work (i.e., overfitting), because the superior performance of these works is conditioned on large-scale training samples; (2) data distribution discrepancy among different cities, so knowledge learned from other cities cannot be utilized directly in new cities. In this article, we propose a task-adaptative model-agnostic meta-learning framework, namely, MetaStore, to tackle these two challenges and improve the prediction performance in new cities with insufficient data for optimal store placement, by transferring prior knowledge learned from multiple data-rich cities. Specifically, we develop a task-adaptative meta-learning algorithm to learn city-specific prior initializations from multiple cities, which is capable of handling the multimodal data distribution and accelerating the adaptation in new cities compared to other methods. In addition, we design an effective learning strategy for MetaStore to promote faster convergence and optimization by sampling high-quality data for each training batch in view of noisy data in practical applications. The extensive experimental results demonstrate that our proposed method leads to state-of-the-art performance compared with various baselines.

NeurIPS Conference 2021 Conference Paper

Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

  • Yan Liu
  • Zhijie Zhang
  • Li Niu
  • Junjie Chen
  • Liqing Zhang

Object detection has achieved promising success, but requires large-scale fully-annotated data, which is time-consuming and labor-intensive to collect. Therefore, we consider object detection with mixed supervision, which learns novel object categories using weak annotations with the help of full annotations of existing base object categories. Previous works using mixed supervision mainly learn the class-agnostic objectness from fully-annotated categories, which can be transferred to upgrade the weak annotations to pseudo full annotations for novel categories. In this paper, we further transfer mask prior and semantic similarity to bridge the gap between novel categories and base categories. Specifically, the ability of using mask prior to help detect objects is learned from base categories and transferred to novel categories. Moreover, the semantic similarity between objects learned from base categories is transferred to denoise the pseudo full annotations for novel categories. Experimental results on three benchmark datasets demonstrate the effectiveness of our method over existing methods. Codes are available at https://github.com/bcmi/TraMaS-Weak-Shot-Object-Detection.

AAAI Conference 2021 Conference Paper

Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition

  • Bruce X.B. Yu
  • Yan Liu
  • Keith C.C. Chan

Indoor action recognition plays an important role in modern society, such as intelligent healthcare in large mobile cabin hospitals. With the wide usage of depth sensors like Kinect, multimodal information including skeleton and RGB modalities brings a promising way to improve the performance. However, existing methods either focus on a single data modality or fail to take advantage of multiple data modalities. In this paper, we propose a Teacher-Student Multimodal Fusion (TSMF) model that fuses the skeleton and RGB modalities at the model level for indoor action recognition. In our TSMF, we utilize a teacher network to transfer the structural knowledge of the skeleton modality to a student network for the RGB modality. Extensive experiments on two benchmark datasets, NTU RGB+D and PKU-MMD, show that the proposed TSMF consistently performs better than state-of-the-art single modal and multimodal methods. They also indicate that our TSMF could not only improve the accuracy of the student network but also significantly improve the ensemble accuracy.

IJCAI Conference 2021 Conference Paper

Physics-aware Spatiotemporal Modules with Auxiliary Tasks for Meta-Learning

  • Sungyong Seo
  • Chuizheng Meng
  • Sirisha Rambhatla
  • Yan Liu

Modeling the dynamics of real-world physical systems is critical for spatiotemporal prediction tasks, but challenging when data is limited. The scarcity of real-world data and the difficulty in reproducing the data distribution hinder directly applying meta-learning techniques. Although the knowledge of governing partial differential equations (PDE) of the data can be helpful for the fast adaptation to few observations, it is mostly infeasible to exactly find the equation for observations in real-world physical systems. In this work, we propose a framework, physics-aware meta-learning with auxiliary tasks, whose spatial modules incorporate PDE-independent knowledge and temporal modules utilize the generalized features from the spatial modules to be adapted to the limited data, respectively. The framework is inspired by a local conservation law expressed mathematically as a continuity equation and does not require the exact form of governing equation to model the spatiotemporal observations. The proposed method mitigates the need for a large number of real-world tasks for meta-learning by leveraging spatial information in simulated data to meta-initialize the spatial modules. We apply the proposed framework to both synthetic and real-world spatiotemporal prediction tasks and demonstrate its superior performance with limited observations.

NeurIPS Conference 2021 Conference Paper

VigDet: Knowledge Informed Neural Temporal Point Process for Coordination Detection on Social Media

  • Yizhou Zhang
  • Karishma Sharma
  • Yan Liu

Recent years have witnessed an increasing use of coordinated accounts on social media, operated by misinformation campaigns to influence public opinion and manipulate social outcomes. Consequently, there is an urgent need to develop an effective methodology for coordinated group detection to combat misinformation on social media. However, existing works suffer from various drawbacks, such as limited performance due to extreme reliance on predefined signatures of coordination, or an inability to address the natural sparsity of account activities on social media with useful prior domain knowledge. Therefore, in this paper, we propose a coordination detection framework incorporating a neural temporal point process with prior knowledge such as temporal logic or pre-defined filtering functions. Specifically, when modeling the observed data from social media with a neural temporal point process, we jointly learn a Gibbs-like distribution of group assignment based on how consistent an assignment is with (1) the account embedding space and (2) the prior knowledge. To address the challenge that this distribution is hard to compute and sample from efficiently, we design a theoretically guaranteed variational inference approach to learn a mean-field approximation for it. Experimental results on a real-world dataset show the effectiveness of our proposed method compared to the SOTA model in both unsupervised and semi-supervised settings. We further apply our model to a COVID-19 Vaccine Tweets dataset. The detection result suggests the presence of suspicious coordinated efforts to spread misinformation about COVID-19 vaccines.

AAAI Conference 2020 Conference Paper

Generative Attention Networks for Multi-Agent Behavioral Modeling

  • Guangyu Li
  • Bo Jiang
  • Hao Zhu
  • Zhengping Che
  • Yan Liu

Understanding and modeling behavior of multi-agent systems is a central step for artificial intelligence. Here we present a deep generative model which captures behavior generating process of multi-agent systems, supports accurate predictions and inference, infers how agents interact in a complex system, as well as identifies agent groups and interaction types. Built upon advances in deep generative models and a novel attention mechanism, our model can learn interactions in highly heterogeneous systems with linear complexity in the number of agents. We apply this model to three multi-agent systems in different domains and evaluate performance on a diverse set of tasks including behavior prediction, interaction analysis and system identification. Experimental results demonstrate its ability to model multi-agent systems, yielding improved performance over competitive baselines. We also show the model can successfully identify agent groups and interaction types in these systems. Our model offers new opportunities to predict complex multi-agent behaviors and takes a step forward in understanding interactions in multi-agent systems.

NeurIPS Conference 2020 Conference Paper

How does This Interaction Affect Me? Interpretable Attribution for Feature Interactions

  • Michael Tsang
  • Sirisha Rambhatla
  • Yan Liu

Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are the contextual dependence between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.

NeurIPS Conference 2020 Conference Paper

Multi-agent Trajectory Prediction with Fuzzy Query Attention

  • Nitin Kamra
  • Hao Zhu
  • Dweep Kumarbhai Trivedi
  • Ming Zhang
  • Yan Liu

Trajectory prediction for scenes with multiple agents and entities is a challenging problem in numerous domains such as traffic prediction, pedestrian tracking and path planning. We present a general architecture to address this challenge which models the crucial inductive biases of motion, namely, inertia, relative motion, intents and interactions. Specifically, we propose a relational model to flexibly model interactions between agents in diverse environments. Since it is well-known that human decision making is fuzzy by nature, at the core of our model lies a novel attention mechanism which models interactions by making continuous-valued (fuzzy) decisions and learning the corresponding responses. Our architecture demonstrates significant performance gains over existing state-of-the-art predictive models in diverse domains such as human crowd trajectories, US freeway traffic, NBA sports data and physics datasets. We also present ablations and augmentations to understand the decision-making process and the source of gains in our model.

TIST Journal 2019 Journal Article

Combating Fake News

  • Karishma Sharma
  • Feng Qian
  • He Jiang
  • Natali Ruchansky
  • Ming Zhang
  • Yan Liu

The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users’ engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.

AAMAS Conference 2019 Conference Paper

Deep Fictitious Play for Games with Continuous Action Spaces

  • Nitin Kamra
  • Umang Gupta
  • Kai Wang
  • Fei Fang
  • Yan Liu
  • Milind Tambe

Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains.

JMLR Journal 2019 Journal Article

Scalable Interpretable Multi-Response Regression via SEED

  • Zemin Zheng
  • M. Taha Bahadori
  • Yan Liu
  • Jinchi Lv

Sparse reduced-rank regression is an important tool for uncovering meaningful dependence structure between large numbers of predictors and responses in many big data applications such as genome-wide association studies and social media analysis. Despite the recent theoretical and algorithmic advances, scalable estimation of sparse reduced-rank regression remains largely unexplored. In this paper, we suggest a scalable procedure called sequential estimation with eigen-decomposition (SEED) which needs only a single top-$r$ sparse singular value decomposition from a generalized eigenvalue problem to find the optimal low-rank and sparse matrix estimate. Our suggested method is not only scalable but also performs simultaneous dimensionality reduction and variable selection. Under some mild regularity conditions, we show that SEED enjoys nice sampling properties including consistency in estimation, rank selection, prediction, and model selection. Moreover, SEED employs only basic matrix operations that can be efficiently parallelized in high performance computing devices. Numerical studies on synthetic and real data sets show that SEED outperforms the state-of-the-art approaches for large-scale matrix estimation problem.

AAAI Conference 2019 Conference Paper

Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting

  • Xu Geng
  • Yaguang Li
  • Leye Wang
  • Lingyu Zhang
  • Qiang Yang
  • Jieping Ye
  • Yan Liu

Region-level demand forecasting is an essential task in ride-hailing services. Accurate ride-hailing demand forecasting can guide vehicle dispatching, improve vehicle utilization, reduce the wait-time, and mitigate traffic congestion. This task is challenging due to the complicated spatiotemporal dependencies among regions. Existing approaches mainly focus on modeling the Euclidean correlations among spatially adjacent regions while we observe that non-Euclidean pair-wise correlations among possibly distant regions are also critical for accurate forecasting. In this paper, we propose the spatiotemporal multi-graph convolution network (ST-MGCN), a novel deep learning model for ride-hailing demand forecasting. We first encode the non-Euclidean pair-wise correlations among regions into multiple graphs and then explicitly model these correlations using multi-graph convolution. To utilize the global contextual information in modeling the temporal correlation, we further propose a contextual gated recurrent neural network, which augments a recurrent neural network with a context-aware gating mechanism to re-weight different historical observations. We evaluate the proposed model on two real-world large-scale ride-hailing demand datasets and observe consistent improvement of more than 10% over state-of-the-art baselines.

NeurIPS Conference 2018 Conference Paper

Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

  • Michael Tsang
  • Hanpeng Liu
  • Sanjay Purushotham
  • Pavankumar Murali
  • Yan Liu

Neural networks are known to model statistical interactions, but they entangle the interactions at intermediate hidden layers for shared representation learning. We propose a framework, Neural Interaction Transparency (NIT), that disentangles the shared learning across different interactions to obtain their intrinsic lower-order and interpretable structure. This is done through a novel regularizer that directly penalizes interaction order. We show that disentangling interactions reduces a feedforward neural network to a generalized additive model with interactions, which can lead to transparent models that perform comparably to the state-of-the-art models. NIT is also flexible and efficient; it can learn generalized additive models with maximum $K$-order interactions by training only $O(1)$ models.

IJCAI Conference 2018 Conference Paper

Neural User Response Generator: Fake News Detection with Collective User Intelligence

  • Feng Qian
  • Chengyue Gong
  • Karishma Sharma
  • Yan Liu

Fake news on social media is a major challenge and studies have shown that fake news can propagate exponentially quickly in early stages. Therefore, we focus on early detection of fake news, and consider that only news article text is available at the time of detection, since additional information such as user responses and propagation patterns can be obtained only after the news spreads. However, we find historical user responses to previous articles are available and can be treated as soft semantic labels, that enrich the binary label of an article, by providing insights into why the article must be labeled as fake. We propose a novel Two-Level Convolutional Neural Network with User Response Generator (TCNN-URG) where TCNN captures semantic information from article text by representing it at the sentence and word level, and URG learns a generative model of user response to article text from historical user responses which it can use to generate responses to new articles in order to assist fake news detection. We conduct experiments on one available dataset and a larger dataset collected by ourselves. Experimental results show that TCNN-URG outperforms the baselines based on prior approaches that detect fake news from article text alone.

AAAI Conference 2018 Conference Paper

Policy Learning for Continuous Space Security Games Using Neural Networks

  • Nitin Kamra
  • Umang Gupta
  • Fei Fang
  • Yan Liu
  • Milind Tambe

A wealth of algorithms centered around (integer) linear programming have been proposed to compute equilibrium strategies in security games with discrete states and actions. However, in practice many domains possess continuous state and action spaces. In this paper, we consider a continuous space security game model with infinite-size action sets for players and present a novel deep learning based approach to extend the existing toolkit for solving security games. Specifically, we present (i) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parameterized continuous search space, and can also be used to learn policies over multiple game states simultaneously; (ii) OptGradFP-NN, a convolutional neural network based implementation of OptGradFP for continuous space security games. We demonstrate the potential to predict good defender strategies via experiments and analysis of OptGradFP and OptGradFP-NN on discrete and continuous game settings.

TIST Journal 2017 Journal Article

Implicit Visual Learning

  • Yan Liu
  • Yang Liu
  • Shenghua Zhong
  • Songtao Wu

According to the degree of consciousness involved, human learning can be roughly classified into explicit learning and implicit learning. In strong contrast to explicit learning with clear targets and rules, such as our school study of mathematics, learning is implicit when we acquire new information without intending to do so. Research from psychology indicates that implicit learning is ubiquitous in our daily life. Moreover, implicit learning plays an important role in human visual perception. But in the past 60 years, most of the well-known machine-learning models aimed to simulate explicit learning while the work of modeling implicit learning was relatively limited, especially for computer vision applications. This article proposes a novel unsupervised computational model for implicit visual learning by exploring dissipative systems, which provide a unifying macroscopic theory to connect biology with physics. We test the proposed Dissipative Implicit Learning Model (DILM) on various datasets. The experiments show that DILM not only provides a good match to human behavior but also clearly improves explicit machine-learning performance on image classification tasks.

NeurIPS Conference 2016 Conference Paper

Learning Influence Functions from Incomplete Observations

  • Xinran He
  • Ke Xu
  • David Kempe
  • Yan Liu

We study the problem of learning influence functions under incomplete observations of node activations. Incomplete observations are a major concern as most (online and real-world) social networks are not fully observable. We establish both proper and improper PAC learnability of influence functions under randomly missing observations. Proper PAC learnability under the Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade (DIC) models is established by reducing incomplete observations to complete observations in a modified graph. Our improper PAC learnability result applies for the DLT and DIC models as well as the Continuous-Time Independent Cascade (CIC) model. It is based on a parametrization in terms of reachability features, and also gives rise to an efficient and practical heuristic. Experiments on synthetic and real-world datasets demonstrate the ability of our method to compensate even for a fairly large fraction of missing observations.

NeurIPS Conference 2016 Conference Paper

SPALS: Fast Alternating Least Squares via Implicit Leverage Scores Sampling

  • Dehua Cheng
  • Richard Peng
  • Yan Liu
  • Ioakeim Perros

Tensor CANDECOMP/PARAFAC (CP) decomposition is a powerful but computationally challenging tool in modern data analytics. In this paper, we show ways of sampling intermediate steps of alternating minimization algorithms for computing low rank tensor CP decompositions, leading to the sparse alternating least squares (SPALS) method. Specifically, we sample the Khatri-Rao product, which arises as an intermediate object during the iterations of alternating least squares. This product captures the interactions between different tensor modes, and forms the main computational bottleneck for solving many tensor related tasks. By exploiting the spectral structures of the matrix Khatri-Rao product, we provide efficient access to its statistical leverage scores. When applied to the tensor CP decomposition, our method leads to the first algorithm that runs in sublinear time per-iteration and approximates the output of deterministic alternating least squares algorithms. Empirical evaluations of this approach show significant speedups over existing randomized and deterministic routines for performing CP decomposition. On a tensor of the size 2.4m by 6.6m by 92k with over 2 billion nonzeros formed by Amazon product reviews, our routine converges in two minutes to the same error as deterministic ALS.

IJCAI Conference 2016 Conference Paper

Timeline Summarization from Social Media with Life Cycle Models

  • Yi Chang
  • Jiliang Tang
  • Dawei Yin
  • Makoto Yamada
  • Yan Liu

The popularity of social media shatters the barrier for online users to create and share information at any place at any time. As a consequence, it has become increasingly difficult to locate relevant information about an entity. A timeline has been proven to provide effective and efficient access to understanding an entity by displaying a list of episodes about the entity in chronological order. However, summarizing the timeline about an entity with social media data faces new challenges. First, key timeline episodes about the entity are typically unavailable in existing social media services. Second, the short, noisy and informal nature of social media posts means that content-based summarization alone can be insufficient. In this paper, we investigate the problem of timeline summarization and propose a novel framework, Timeline-Sumy, which consists of episode detecting and summary ranking. In episode detecting, we explicitly model temporal information with life cycle models to detect timeline episodes since episodes usually exhibit sudden-rise-and-heavy-tail patterns on time-series. In summary ranking, we rank social media posts in each episode via a learning-to-rank approach. The experimental results on social media datasets demonstrate the effectiveness of the proposed framework.

NeurIPS Conference 2014 Conference Paper

Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning

  • Mohammad Taha Bahadori
  • Qi (Rose) Yu
  • Yan Liu

Accurate and efficient analysis of multivariate spatio-temporal data is critical in climatology, geology, and sociology applications. Existing models usually assume simple inter-dependence among variables, space, and time, and are computationally expensive. We propose a unified low rank tensor learning framework for multivariate spatio-temporal analysis, which can conveniently incorporate different properties in spatio-temporal data, such as spatial clustering and shared structure among variables. We demonstrate how the general framework can be applied to cokriging and forecasting tasks, and develop an efficient greedy algorithm to solve the resulting optimization problem with convergence guarantee. We conduct experiments on both synthetic datasets and real application datasets to demonstrate that our method is not only significantly faster than existing methods but also achieves lower estimation error.

TIST Journal 2013 Journal Article

Improving recency ranking using twitter data

  • Yi Chang
  • Anlei Dong
  • Pranam Kolari
  • Ruiqiang Zhang
  • Yoshiyuki Inagaki
  • Fernanodo Diaz
  • Hongyuan Zha
  • Yan Liu

In Web search and vertical search, recency ranking refers to retrieving and ranking documents by both relevance and freshness. As impoverished in-links and click information is the biggest challenge for recency ranking, we advocate the use of Twitter data to address the challenge in this article. We propose a method to utilize Twitter TinyURL to detect fresh and high-quality documents, and leverage Twitter data to generate novel and effective features for ranking. The empirical experiments demonstrate that the proposed approach effectively improves a commercial search engine for both Web search ranking and tweet vertical ranking.

AAAI Conference 2013 Conference Paper

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling

  • Sheng-hua Zhong
  • Yan Liu
  • Feifei Ren
  • Jinghuan Zhang
  • Tongwei Ren

The human vision system actively seeks salient regions and movements in video sequences to reduce the search effort. Modeling a computational visual saliency map provides important information for semantic understanding in many real-world applications. In this paper, we propose a novel video saliency detection model for detecting the attended regions that correspond to both interesting objects and dominant motions in video sequences. For the spatial saliency map, we inherit the classical bottom-up spatial saliency map. For the temporal saliency map, a novel optical flow model is proposed based on the dynamic consistency of motion. The spatial and the temporal saliency maps are constructed and further fused together to create a novel attention model. The proposed attention model is evaluated on three video datasets. Empirical validations demonstrate that the salient regions detected by our dynamic consistent saliency map highlight the interesting objects effectively and efficiently. More importantly, the video attended regions automatically detected by the proposed attention model are consistent with the ground truth saliency maps of eye movement data.

AAAI Conference 2012 Conference Paper

Query-Oriented Multi-Document Summarization via Unsupervised Deep Learning

  • Yan Liu
  • Sheng-hua Zhong
  • Wenjie Li

Extractive-style query-oriented multi-document summarization generates the summary by extracting a proper set of sentences from multiple documents based on the pre-given query. This paper proposes a novel multi-document summarization framework via a deep learning model. This uniform framework consists of three parts: concepts extraction, summary generation, and reconstruction validation, which work together to achieve the largest coverage of the documents' content. A new query-oriented extraction technique is proposed to concentrate distributed information to hidden units layer by layer. Then, the whole deep architecture is fine-tuned by minimizing the information loss of reconstruction validation. According to the concentrated information, dynamic programming is used to seek the most informative set of sentences as the summary. Experiments on three benchmark datasets demonstrate the effectiveness of the proposed framework and algorithms.

AAAI Conference 2011 Conference Paper

Detecting Multilingual and Multi-Regional Query Intent in Web Search

  • Yi Chang
  • Ruiqiang Zhang
  • Srihari Reddy
  • Yan Liu

With the rapid growth of commercial search engines, detecting multilingual and multi-regional intent underlying search queries becomes a critical challenge to serve international users with diverse language and region requirements. We introduce a query intent probabilistic model, whose input is the number of clicks on documents from different regions and in different languages, while the output of this model is a smoothed probabilistic distribution of multilingual and multi-regional query intent. Based on an editorial test to evaluate the accuracy of the intent classifier, our probabilistic model could improve the accuracy of multilingual intent detection by 15%, and improve multi-regional intent detection by 18%. To improve web search quality, we propose a set of new ranking features to combine multilingual and multi-regional query intent with document language/region attributes, and apply different approaches in integrating intent information to directly affect ranking. The experiments show that the novel features could provide 2.31% NDCG@1 improvement and 1.81% NDCG@5 improvement.

NeurIPS Conference 2011 Conference Paper

Multiple Instance Learning on Structured Data

  • Dan Zhang
  • Yan Liu
  • Luo Si
  • Jian Zhang
  • Richard Lawrence

Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.

AAAI Conference 2011 Conference Paper

Ordinal Regression via Manifold Learning

  • Yang Liu
  • Yan Liu
  • Keith Chan

Ordinal regression is an important research topic in machine learning. It aims to automatically determine the implied rating of a data item on a fixed, discrete rating scale. In this paper, we present a novel ordinal regression approach via manifold learning, which is capable of uncovering the embedded nonlinear structure of the data set according to the observations in the high-dimensional feature space. By optimizing the order information of the observations and preserving the intrinsic geometry of the data set simultaneously, the proposed algorithm provides faithful ordinal regression for newly arriving data points. To offer a more general solution for data with natural tensor structure, we further introduce the multilinear extension of the proposed algorithm, which can support the ordinal regression of high-order data like images. Experiments on various data sets validate the effectiveness of the proposed algorithm as well as its extension.

AAAI Conference 2011 Conference Paper

Transfer Latent Semantic Learning: Microblog Mining with Less Supervision

  • Dan Zhang
  • Yan Liu
  • Richard Lawrence
  • Vijil Chenthamarakshan

The increasing volume of information generated on microblogging sites such as Twitter raises several challenges to traditional text mining techniques. First, most texts from those sites are abbreviated due to the constraints of limited characters in one post; second, the input usually comes in large-volume streams. Therefore, it is of significant importance to develop effective and efficient representations of abbreviated texts for better filtering and mining. In this paper, we introduce a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of related tagged documents with rich information from other sources (source domain) to help build a robust latent semantic space for the abbreviated texts (target domain). This is achieved by simultaneously minimizing the document reconstruction error and the classification error of the labeled examples from the source domain by building a classifier with hinge loss in the latent semantic space. We demonstrate the effectiveness of our method by applying it to the task of classifying and tagging abbreviated texts. Experimental results on both synthetic datasets and real application datasets, including Reuters-21578 and Twitter data, suggest substantial improvements using our approach over existing ones.

AAAI Conference 2010 Conference Paper

Learning Spatial-Temporal Varying Graphs with Applications to Climate Data Analysis

  • Xi Chen
  • Yan Liu
  • Han Liu
  • Jaime Carbonell

An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed $\ell_1$ penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g., cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data.

AAAI Conference 2010 Conference Paper

Multilinear Maximum Distance Embedding Via L1-Norm Optimization

  • Yang Liu
  • Yan Liu
  • Keith Chan

Dimensionality reduction plays an important role in many machine learning and pattern recognition tasks. In this paper, we present a novel dimensionality reduction algorithm called multilinear maximum distance embedding (M²DE), which includes three key components. To preserve the local geometry and discriminant information in the embedded space, M²DE utilizes a new objective function, which aims to maximize the distances between some particular pairs of data points, such as the distances between nearby points and the distances between data points from different classes. To make the mapping of new data points straightforward, and more importantly, to keep the natural tensor structure of high-order data, M²DE integrates multilinear techniques to learn the transformation matrices sequentially. To provide reasonable and stable embedding results, M²DE employs the L1-norm, which is more robust to outliers, to measure the dissimilarity between data points. Experiments on various datasets demonstrate that M²DE achieves good embedding results of high-order data for classification tasks.
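
The abstract's remark that the L1-norm is more robust to outliers can be illustrated with a minimal sketch. This is not the paper's objective function or algorithm; it only compares how an L1 distance and a squared L2 distance (an assumed baseline for comparison) react when one coordinate is corrupted. All names and values are hypothetical.

```python
# Illustrative only: compares L1 and squared-L2 dissimilarity under a single
# corrupted coordinate. Not the M2DE objective; the data are made up.

def l1_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def squared_l2_distance(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

reference = [1.5, 2.5, 3.5]
clean = [1.0, 2.0, 3.0]
corrupted = [1.0, 2.0, 13.0]  # last coordinate hit by an outlier

# How much does each measure grow when the outlier is introduced?
l1_growth = l1_distance(corrupted, reference) / l1_distance(clean, reference)
l2_growth = (squared_l2_distance(corrupted, reference)
             / squared_l2_distance(clean, reference))

print(f"L1 distance grows by a factor of {l1_growth:.1f}")
print(f"Squared L2 distance grows by a factor of {l2_growth:.1f}")
```

Because the squared L2 distance squares each difference, the single corrupted coordinate dominates it far more than it dominates the L1 distance, which is the usual intuition behind preferring the L1-norm when outliers are expected.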

YNIMG Journal 2010 Journal Article

Structural asymmetries in motor and language networks in a population of healthy preterm neonates at term equivalent age: A diffusion tensor imaging and probabilistic tractography study

  • Yan Liu
  • Danielle Balériaux
  • Martin Kavec
  • Thierry Metens
  • Julie Absil
  • Vincent Denolin
  • Anne Pardou
  • Freddy Avni

In this MRI study, we aimed to provide new in vivo structural markers of asymmetry in motor and language networks in a population of healthy preterm neonates scanned at term equivalent age. Using diffusion tensor imaging and probabilistic tractography, we showed that, besides volume and microstructural asymmetries in the parieto-temporal part of the superior longitudinal fasciculus (SLF) and a trend towards microstructural asymmetry in the corticospinal tract (CST), volume asymmetry in the motor part of the superior thalamic radiations (STR) and a trend towards volume asymmetry in the CST are already present in the neonatal period. No asymmetry was found in the sensory part of the STR, the anterior thalamic radiations (ATR), or the posterior thalamic radiations (PTR), nor in the fronto-parietal part of the SLF. These results suggest that structural asymmetries in the motor and language networks are present in healthy preterm neonates at term equivalent age, well before the development of speech and hand preference.

IJCAI Conference 2007 Conference Paper

  • Yan Liu
  • Jaime Carbonell
  • Vanathi Gopalakrishnan
  • Peter Weigele

Protein fold recognition is a crucial step in inferring biological structure and function. This paper focuses on machine learning methods for predicting quaternary structural folds, which consist of multiple protein chains that form chemical bonds among side chains to reach a structurally stable domain. The complexity associated with modeling the quaternary fold poses major theoretical and computational challenges to current machine learning methods. We propose methods to address these challenges and show how (1) domain knowledge is encoded and utilized to characterize structural properties using segmentation conditional graphical models; and (2) model complexity is handled through efficient inference algorithms. Our model follows a discriminative approach so that any informative features, such as those representative of overlapping or long-range interactions, can be used conveniently. The model is applied to predict two important quaternary folds, the triple beta-spirals and double-barrel trimers. Cross-family validation shows that our method outperforms other state-of-the-art algorithms.

IJCAI Conference 2007 Conference Paper

  • Jingrui He
  • Jaime Carbonell
  • Yan Liu

This paper proposes and develops a new graph-based semi-supervised learning method. Different from previous graph-based methods that are based on discriminative models, our method is essentially a generative model in that the class conditional probabilities are estimated by graph propagation and the class priors are estimated by linear regression. Experimental results on various datasets show that the proposed method is superior to existing graph-based semi-supervised learning methods, especially when the labeled subset alone proves insufficient to estimate meaningful class priors.

IJCAI Conference 2007 Conference Paper

  • Katharina Probst
  • Rayid Ghani
  • Marko Krema
  • Andrew Fano
  • Yan Liu

We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product databases. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than treating it as an atomic entity. Examples of such applications include product recommendations, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with Naive Bayes. The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.