Arrow Research search

Author name cluster

Yang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

375 papers
2 author rows

Possible papers

375

EAAI Journal 2026 Journal Article

A comprehensive survey on table question answering: Datasets, methods and future directions

  • Weiqiang Xu
  • Yang Liu
  • Lingfeng Lu
  • Huakang Li
  • Guozi Sun

In the data-centric era of Industry 4.0, a vast amount of critical information — ranging from machining sensor logs and material synthesis recipes to financial statements — is stored in structured tabular formats. Table Question Answering (TableQA) aims to automatically interpret tabular data and provide precise answers to natural language (NL) queries, serving as a vital interface for intelligent engineering systems. This survey provides a comprehensive review of the table question answering landscape, bridging theoretical advances with practical engineering applications. Methodologically, we propose a unified taxonomy that categorizes datasets and approaches into table-only and non-table-only paradigms. We systematically trace the technical evolution from early rule-based semantic parsing and pre-trained language models (PLMs) to recent large language models (LLMs), highlighting innovations in numerical reasoning and cross-modal alignment. From an engineering perspective, we critically evaluate how these techniques are applied to solve domain-specific challenges, such as predictive maintenance in smart manufacturing, property extraction in material informatics, and decision support in business intelligence. Furthermore, going beyond academic benchmarks, we analyze pressing constraints for industrial deployment, including real-time inference latency, system reliability, and verification in safety-critical environments. Finally, we outline future research directions for building robust, verifiable, and computationally efficient table question answering systems across various industrial domains.

EAAI Journal 2026 Journal Article

A novel domain generalization framework for fault diagnosis of rotating machinery based on causal representation learning and causal feature identification

  • Yang Liu
  • Weidong Cheng
  • Weigang Wen
  • Qiqiang Fang
  • Hengshan Wu

Rotating machinery in engineering applications often operates under complex working conditions, leading to distribution shifts in monitoring data that degrade the performance of traditional intelligent diagnosis models. Domain generalization (DG) has emerged as a pivotal technology for deploying intelligent diagnosis methods in practical engineering. Recently, causal representation learning has attracted significant attention in DG research. However, the causality of the features obtained by existing methods has not been further tested, so diagnosis models fail to achieve optimal performance. To address this challenge, this study further tests the causality of features on the basis of causal representation learning and proposes a novel DG framework for fault diagnosis. Specifically, a causal analysis of the vibration signal generation and feature extraction processes is conducted, and a structural causal model (SCM) is established. Based on the SCM, a pretraining model is designed for causal representation learning. Furthermore, a causality test algorithm is proposed for causal feature identification. Finally, a three-stage DG framework is constructed by integrating active learning (pretraining model) with objective testing (causality test algorithm). The superiority of the proposed method is verified on five datasets covering bearings and gears. The proposed method demonstrates exceptional performance in DG for rotating machinery fault diagnosis while guiding model optimization and engineering deployment, indicating its broad application prospects in real-world engineering practice.

AIIM Journal 2026 Journal Article

A novel ECG QRS complex detection algorithm based on dynamic Bayesian network

  • Qince Li
  • Yang Liu
  • Na Zhao
  • Yongfeng Yuan
  • Runnan He

Accurate detection of the QRS complex, a crucial reference for heartbeat localization in electrocardiogram (ECG) signals, remains inadequate in wearable ECG devices due to complex noise interference. In this study, we propose a novel QRS complex detection method based on dynamic Bayesian network (DBN), integrating the probability distribution of RR intervals. Unlike methods focusing solely on ECG waveforms, our approach explicitly integrates ECG waveform and heart rhythm information into a unified probability model, enhancing noise robustness. Additionally, an unsupervised parameter optimization using expectation maximization (EM) adapts to individual differences of patients. Furthermore, several simplification strategies improve reasoning efficiency, and an online detection mode enables real-time applications. Our method outperforms other state-of-the-art QRS detection methods, including deep learning (DL) methods, on noisy datasets. In conclusion, the proposed DBN-based QRS detection algorithm demonstrates outstanding accuracy, noise robustness, generalization ability, real-time capability, and strong scalability, indicating its potential application in wearable ECG devices.

AAAI Conference 2026 Conference Paper

Addressing Polarization and Unfairness in Performative Prediction

  • Kun Jin
  • Tian Xie
  • Yang Liu
  • Xueru Zhang

In many real-world applications of machine learning—such as recommendations, hiring, and lending—deployed models influence the data they are trained on, leading to feedback loops between predictions and data distribution. The performative prediction (PP) framework captures this phenomenon by modeling the data distribution as a function of the deployed model. While prior work has focused on finding performative stable (PS) solutions for robustness, their societal impacts, particularly regarding fairness, remain underexplored. We show that PS solutions can lead to severe polarization and prediction performance disparities, and that conventional fairness interventions from previous works often fail under model-dependent distribution shifts because they do not satisfy the PS criteria. To address these challenges in PP, we introduce novel fairness mechanisms that provably ensure both stability and fairness, validated by theoretical analysis and empirical results.

EAAI Journal 2026 Journal Article

Automatic classification of circulating blood cell clusters based on multi-channel flow cytometry imaging

  • Suqiang Ma
  • Subhadeep Sengupta
  • Yao Lee
  • Beikang Gu
  • Xianyan Chen
  • Xianqiao Wang
  • Yang Liu
  • Mengjia Xu

Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to pathological conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with fluorescence staining, is commonly used to analyze these cell clusters, revealing cell morphology and protein profiles. While computational approaches based on machine learning have advanced the automatic analysis of single-cell flow cytometry images, little effort has been made to build tools that automatically analyze images containing CCCs. Unlike single cells, cell clusters exhibit irregular shapes and sizes. In addition, these cell clusters often consist of heterogeneous cell types, which require multi-channel staining to identify the specific cell types within the clusters. To address these challenges, we introduce a new computational framework for analyzing CCC images and identifying cell types within clusters. Our framework uses a two-step analysis strategy. First, it categorizes images into cell cluster and non-cluster groups by fine-tuning the You Only Look Once (YOLOv11) model, which outperforms traditional convolutional neural networks (CNNs) and Vision Transformers (ViTs). Then, it identifies cell types by overlaying cluster contours with regions from multi-channel fluorescence stains, thereby minimizing the impact of cell debris and staining artifacts. This approach achieved over 95% accuracy in both cluster classification and cell phenotype identification. In summary, our automated framework effectively analyzes CCC images from flow cytometry, leveraging both bright-field and fluorescence data. Initially tested on blood cells, it holds potential for broader applications, such as analyzing immune and tumor cell clusters, supporting cellular research across various diseases.

AAAI Conference 2026 Conference Paper

Controllable Financial Market Generation with Diffusion Guided Meta Agent

  • Yu-Hao Huang
  • Chang Xu
  • Yang Liu
  • Weiqing Liu
  • Wu-Jun Li
  • Jiang Bian

Generative modeling has transformed many fields, such as language and visual modeling, while its application in financial markets remains under-explored. As the minimal unit within a financial market is an order, order-flow modeling represents a fundamental generative financial task. However, current approaches often yield unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their practical applications. In this paper, we formulate the challenge of controllable financial market generation, and propose a Diffusion Guided Meta Agent (DigMA) model to address it. Specifically, we employ a conditional diffusion model to capture the dynamics of the market state represented by time-evolving distribution parameters of the mid-price return rate and the order arrival rate, and we define a meta agent with financial economic priors to generate orders from the corresponding distributions. Extensive experimental results show that DigMA achieves superior controllability and generation fidelity. Moreover, we validate its effectiveness as a generative environment for downstream high-frequency trading tasks and its computational efficiency.

AIIM Journal 2026 Journal Article

Development and validation of deep continual learning model to sequentially learn multiple clinical prediction tasks for ICU patients

  • Zhixuan Zeng
  • Yang Liu
  • Shuo Yao
  • Xu Cai
  • Wenbin Nan
  • Yiyang Xie
  • Xun Gong

Background: ICU patients often suffer from critical and complex conditions, and multiple potential risks should be monitored to provide them comprehensive care. However, no prior study has proposed a continual learning (CL) model that can effectively solve multiple clinical prediction tasks without catastrophic forgetting. This study proposes three deep CL models for ICU patients. Methods: Three public ICU databases were employed. The included patients from MIMIC-III and MIMIC-IV were divided into eight task sets, and the patients from eICU-CRD composed the test set. We propose three CL models (CL_1, CL_2, CL_3) to sequentially learn eight prediction tasks on the eight task sets, and then externally validate them on the test set. We compare our models to three representative baseline CL models and to single-task (ST) and multi-task (MT) models. We train all the CL models under different task orders, evaluating their prediction performance by multiple metrics and their memory ability by backward transfer (BWT). We also analyze the effect of previously learned tasks on learning new tasks. Results: Our three CL models had comparable or slightly weaker performance than the ST and MT models on the eight tasks. They effectively mitigated catastrophic forgetting, and their performance was robust to different training orders. CL_2 and CL_3 even improved performance on the current task after learning some previous tasks. Our three CL models outperformed the baseline CL models in most experiments. Conclusions: Our CL models are promising for sequentially learning multiple clinical prediction tasks for ICU patients. CL_2 and CL_3 show the ability to utilize information from previous tasks to improve learning of new tasks. More datasets and tasks are still needed to further verify the validity of the CL models.

EAAI Journal 2026 Journal Article

Exploiting implicit knowledge for streaming perception object detection

  • Qingsong Tang
  • Jinting Guo
  • Xuexiao Zhou
  • Yongkang Li
  • Mingzhi Yang
  • Yang Liu

Streaming perception is a more challenging task than offline perception. Existing methods perform streaming perception object detection by endowing real-time detectors with the ability to predict the future. The difficulty of such methods mainly lies in perceiving complex and changing video background environments, as well as varying object speeds. In this context, we propose a real-time object detection model that utilizes implicit knowledge to enhance features. First, we use a channel implicit knowledge module to perform early fine-tuning on Argoverse-High Definition (Argoverse-HD). This allows the model to perceive the background environment and obtain rich positional features. Then, we use a spatial implicit knowledge module to refine the movement speed features of objects. These refined features are integrated with position features for final fine-tuning. In the final fine-tuning stage, we further weight the original dynamic top-k label assignment strategy to measure the importance of positive samples. Through this weighting, we aim to obtain finer-grained object localization. Our model achieves 37.8% streaming Average Precision (sAP) on Argoverse-HD (+0.9% over baseline) with merely 0.01G additional Floating Point Operations (FLOPs) and a latency increase of less than 3 milliseconds (ms). Code is available at https://github.com/GjtZ/ISYOLO.git.

AAAI Conference 2026 Conference Paper

Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents

  • Haozhuo Zheng
  • Cheng Wang
  • Yang Liu

The de novo generation of molecules with desirable properties is a critical challenge, where diffusion models are computationally intensive and autoregressive models struggle with error propagation. In this work, we introduce the Graph VQ-Transformer (GVT), a two-stage generative framework that achieves both high accuracy and efficiency. The core of our approach is a novel Graph Vector Quantized Variational Autoencoder (VQ-VAE) that compresses molecular graphs into high-fidelity discrete latent sequences. By synergistically combining a Graph Transformer with canonical Reverse Cuthill-McKee (RCM) node ordering and Rotary Positional Embeddings (RoPE), our VQ-VAE achieves near-perfect reconstruction rates. An autoregressive Transformer is then trained on these discrete latents, effectively converting graph generation into a well-structured sequence modeling problem. Crucially, this mapping of complex graphs to high-fidelity discrete sequences bridges molecular design with the powerful paradigm of large-scale sequence modeling, unlocking potential synergies with Large Language Models (LLMs). Extensive experiments show that GVT achieves state-of-the-art or highly competitive performance across major benchmarks like ZINC250k, MOSES, and GuacaMol, and notably outperforms leading diffusion models on key distribution similarity metrics such as FCD and KL Divergence. With its superior performance, efficiency, and architectural novelty, GVT not only presents a compelling alternative to diffusion models but also establishes a strong new baseline for the field, paving the way for future research in discrete latent-space molecular generation.

AAAI Conference 2026 Conference Paper

Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment Through Latent Acoustic Pattern Triggers

  • Liang Lin
  • Miao Yu
  • Kaiwen Luo
  • Yibo Zhang
  • Lilan Peng
  • Dexian Wang
  • Xuehai Tang
  • Yuanhe Zhang

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored textual and vision safety, audio's distinct characteristics present significant challenges. This paper first investigates: Are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? In response to this question, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, assessing nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio features like environment noise and speech rate variations achieve over 90% average attack success rate, (II) ALLMs exhibit significant sensitivity differences across acoustic features, particularly showing minimal response to volume as a trigger, and (III) poisoned sample inclusion causes only marginal loss curve fluctuations, highlighting the attack's stealth.

AAAI Conference 2026 Conference Paper

I2E: Real-Time Image-to-Event Conversion for High-Performance Spiking Neural Networks

  • Ruichen Ma
  • Liwei Meng
  • Guanchao Qiao
  • Ning Ning
  • Yang Liu
  • Shaogang Hu

Spiking neural networks (SNNs) promise highly energy-efficient computing, but their adoption is hindered by a critical scarcity of event-stream data. This work introduces I2E, an algorithmic framework that resolves this bottleneck by converting static images into high-fidelity event streams. By simulating microsaccadic eye movements with a highly parallelized convolution, I2E achieves a conversion speed over 300x faster than prior methods, uniquely enabling on-the-fly data augmentation for SNN training. The framework's effectiveness is demonstrated on large-scale benchmarks. An SNN trained on the generated I2E-ImageNet dataset achieves a state-of-the-art accuracy of 60.50%. Critically, this work establishes a powerful sim-to-real paradigm where pre-training on synthetic I2E data and fine-tuning on the real-world CIFAR10-DVS dataset yields an unprecedented accuracy of 92.5%. This result validates that synthetic event data can serve as a high-fidelity proxy for real sensor data, bridging a long-standing gap in neuromorphic engineering. By providing a scalable solution to the data problem, I2E offers a foundational toolkit for developing high-performance neuromorphic systems. The open-source algorithm and all generated datasets are provided to accelerate research in the field.

AAAI Conference 2026 Conference Paper

ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation

  • Yihua Shao
  • Xiaofeng Lin
  • Xinwei Long
  • Siyu Chen
  • Minxi Yan
  • Yang Liu
  • Ziyang Yan
  • Ao Ma

Enabling multi-task adaptation in pre-trained Low-Rank Adaptation (LoRA) models is crucial for enhancing their generalization capabilities. Most existing pre-trained LoRA fusion methods decompose weight matrices, sharing similar parameters while fusing divergent ones. However, this paradigm inevitably induces inter-weight conflicts and leads to catastrophic domain forgetting. While incremental learning enables adaptation to multiple tasks, it struggles to achieve generalization in few-shot scenarios. Consequently, when the weight data follows a long-tailed distribution, it can lead to forgetting in the fused weights. To address this issue, we propose In-Context Meta LoRA Fusion (ICM-Fusion), a novel framework that synergizes meta-learning with in-context adaptation. The key innovation lies in our task vector arithmetic, which dynamically balances conflicting optimization directions across domains through learned manifold projections. ICM-Fusion obtains the optimal task vector orientation for the fused model in the latent space by adjusting the orientation of the task vectors. Subsequently, the fused LoRA is reconstructed by a self-designed Fusion VAE (F-VAE) to realize multi-task LoRA generation. We have conducted extensive experiments on visual and linguistic tasks, and the results demonstrate that ICM-Fusion can be adapted to a wide range of architectural models and applied to various tasks. Compared to current pre-trained LoRA fusion methods, ICM-Fusion-fused LoRA can significantly reduce the multi-tasking loss and can even achieve task enhancement in few-shot scenarios.

AAAI Conference 2026 Conference Paper

Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

  • Zhiqi Pang
  • Lingling Zhao
  • Yang Liu
  • Chunyu Wang
  • Gaurav Sharma

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM) -- a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and instance-level heterogeneous matching modules to obtain reliable heterogeneous positive pairs (e.g., a visible image and an infrared image of the same person) within each scenario. Next, we propose a dynamic text representation update strategy to maintain consistency between text and image supervision signals. Experimental results across multiple scenarios demonstrate the superiority and generalizability of ITKM; it not only outperforms existing scenario-specific methods but also enhances overall performance by integrating knowledge from multiple scenarios.

AAAI Conference 2026 Conference Paper

Improving Sustainability of Adversarial Examples in Class-Incremental Learning

  • Taifeng Liu
  • Xinjing Liu
  • Liangqiu Dong
  • Yang Liu
  • Yilong Yang
  • Zhuo Ma

Current adversarial examples (AEs) are typically designed for static models. However, with the wide application of Class-Incremental Learning (CIL), models are no longer static and need to be updated with new data distributed and labeled differently from the old ones. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to enhance the robustness of AE semantics against domain drift by making them more similar to the target class while distinguishing them from all other classes. Achieving this is challenging, as relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve the problem, we propose a Semantic Correction Module. This module encourages the AE semantics to be generalized, based on a generative model capable of producing universal semantics. Additionally, it incorporates the CIL model to correct the optimization direction of the AE semantics, guiding them closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples with target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when updated with a 9-fold increase in the number of classes.

EAAI Journal 2026 Journal Article

In-situ three-dimensional profilometry in high-temperature environment via laser structured light with conditional generative adversarial network-based adaptive speckle denoising

  • Hongsen Wang
  • Fujia Liu
  • Jianhong Yang
  • Yuxuan Jiang
  • Chaoyang Duan
  • Yang Liu

The laser structured light method has been proven effective for in-situ three-dimensional profilometry of high-temperature objects in various industrial and aerospace applications. However, during the measurement process, laser speckle noise inevitably occurs, which degrades the measurement accuracy. Owing to the specific requirements for restoring fringe structures and edges, the coupled effects of various types of noise in high-temperature environments, and the inherent difficulty of acquiring training data under such conditions, previous speckle suppression methods exhibit limited performance. Therefore, an adaptive speckle denoising conditional generative adversarial network is proposed in this paper to improve the measurement accuracy of the laser structured light method. Specifically, a multi-scale adaptive filtering module is introduced in the generator to reduce composite noise, and a scale-adaptive network is designed in the discriminator to provide supervision from pixel details to global structure. Moreover, a laser/light-emitting diode multiplexing optical path for the laser structured light measurement system is designed to improve the quality of the training dataset. Experiments on in-situ measurement of the contour of tungsten alloy sample blocks were carried out in an electron beam heating environment at 2015 °C, with an average error of 0.12 mm, representing a 60.1% improvement over existing methods and thereby demonstrating the advantages of the proposed method in high-temperature environments.

EAAI Journal 2026 Journal Article

Knowledge-data driven digital twin platform: intelligent prediction and control of tunnel face stability during large-diameter slurry shield construction

  • Xianguo Wu
  • Yu Lei
  • Feiming Su
  • Tiejun Li
  • Yang Liu

Safety in large-diameter shield construction (LDSS) is particularly important. To ensure safe excavation in LDSS projects, this paper proposes a digital twin (DT) platform integrated with a knowledge-data driven method to achieve prediction and control of tunnel face stability (TFS) during LDSS construction. The DT platform enables acquisition of physical data and expert knowledge, facilitating bidirectional synchronization between physical entities and virtual counterparts. The DT platform utilizes Bayesian Optimization (BO), Graph Convolutional Network (GCN), Bidirectional Long Short-Term Memory (BiLSTM) networks, and SHapley Additive exPlanations (SHAP), driven by physical data together with expert knowledge, to achieve knowledge-data driven prediction and control of TFS during LDSS construction. A case study of Wuhan Metro Line 12 construction demonstrates the platform's effectiveness, with key findings revealing: (1) The DT platform achieves accurate TFS prediction across six geological types, showing an average R2 value of 0.935 and root mean square error (RMSE) of 0.239. (2) Key construction parameters are identified through the knowledge-data driven method, including air chamber pressure, grouting pressure, advance rate, cutterhead rotation speed, and cutterhead torque. (3) The DT platform enables control of TFS during LDSS construction by maintaining the slurry pressure (SP) within an optimal range. The DT platform developed in this study fills TFS research gaps in LDSS construction and provides valuable references for similar complex projects.

AAAI Conference 2026 Conference Paper

MAGIC: Mastering Physical Adversarial Generation in Context Through Collaborative LLM Agents

  • Yun Xing
  • Nhat Chung
  • Jie Zhang
  • Yue Cao
  • Ivor Tsang
  • Yang Liu
  • Lei Ma
  • Qing Guo

Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains non-trivial due to diverse real-world environmental influences. Existing approaches either struggle to generalize to dynamic environments or fail to achieve consistent physical attack performance. To address these challenges, we propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents to automatically understand the scene context during testing time and generate adversarial patches through synergistic interaction of language and vision understanding. Specifically, MAGIC orchestrates three specialized LLM agents: the adv-patch generation agent masters the creation of deceptive patches via strategic prompt manipulation for text-to-image models; the adv-patch deployment agent ensures contextual coherence by determining optimal deployment strategies based on scene understanding; and the self-examination agent completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our approach with both digital and physical scenarios, i.e., nuImage and real-world scenes, where both statistical and visual results demonstrate that our MAGIC is powerful and effective for attacking widely applied object detection systems, such as YOLO and DETR series.

AAAI Conference 2026 Conference Paper

Modeling Trend Dynamics with Variational Neural ODEs for Information Popularity Prediction

  • Yuchen Wang
  • Dongpeng Hou
  • Weikai Jing
  • Chao Gao
  • Xianghua Li
  • Yang Liu

Predicting the future popularity of information in online social networks is a crucial yet challenging task, due to the complex spatiotemporal dynamics underlying information diffusion. Existing methods typically use structural or sequential patterns within the observation window as direct inputs for subsequent popularity prediction. However, most approaches lack the ability to explicitly model the overall trend of popularity up to the prediction time, which leads to limited predictive capability. To address these limitations, we propose VNOIP, a novel method based on variational neural Ordinary Differential Equations (ODEs) for information popularity prediction. Specifically, VNOIP introduces bidirectional jump ODEs with attention mechanisms to capture long-range dependencies and bidirectional context within cascade sequences. Furthermore, by jointly considering both cascade patterns and overall trend temporal patterns, VNOIP explicitly models the continuous-time dynamics of popularity trend trajectories with variational neural ODEs. Additionally, a knowledge distillation loss is employed to align the evolution of prior and posterior latent variables. Extensive experiments on real-world datasets demonstrate that VNOIP is highly competitive in both prediction accuracy and efficiency compared to state-of-the-art baselines.

AAAI Conference 2026 Conference Paper

On the Alignment of Large Language Models with Global Human Opinion

  • Yang Liu
  • Masahiro Kaneko
  • Chenhui Chu

Today's large language models (LLMs) are capable of supporting multilingual scenarios, allowing users to interact with LLMs in their native languages. When LLMs respond to subjective questions posed by users, they are expected to align with the views of specific demographic groups or historical periods, shaped by the language in which the user interacts with the model. Existing studies mainly focus on the opinions represented by LLMs among demographic groups in the United States or a few countries, lacking worldwide country samples and studies on human opinions in different historical periods, as well as discussion of using language to steer LLMs. Moreover, they also overlook the potential influence of prompt language on the alignment of LLMs' opinions. In this study, our goal is to fill these gaps. To this end, we create an evaluation framework based on the World Values Survey (WVS) to systematically assess the alignment of LLMs with human opinions across different countries, languages, and historical periods around the world. We find that LLMs align appropriately with, or over-align with, the opinions of only a few countries while under-aligning with those of most countries. Furthermore, changing the language of the prompt to match the language used in the questionnaire can steer LLMs to align with the opinions of the corresponding country more effectively than existing steering methods. At the same time, LLMs are more aligned with the opinions of the contemporary population. To our knowledge, our study is the first comprehensive investigation of opinion alignment in LLMs across global, language, and temporal dimensions.

AAAI Conference 2026 Conference Paper

PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning

  • Zhuoyao Liu
  • Yang Liu
  • Wentao Feng
  • Shudong Huang

Cross-modal retrieval aims to align different modalities via semantic similarity. However, existing methods often assume that image-text pairs are perfectly aligned, overlooking Noisy Correspondences in real data. These misaligned pairs misguide similarity learning and degrade retrieval performance. Previous methods often rely on coarse-grained categorizations that simply divide data into clean and noisy samples, overlooking the intrinsic diversity within noisy instances. Moreover, they typically apply uniform training strategies regardless of sample characteristics, resulting in suboptimal sample utilization for model optimization. To address the above challenges, we introduce a novel framework, called Pseudo-label Consistency-Guided Sample Refinement (PCSR), which enhances correspondence reliability by explicitly dividing samples based on pseudo-label consistency. Specifically, we first employ a confidence-based estimation to distinguish clean and noisy pairs, then refine the noisy pairs via pseudo-label consistency to uncover structurally distinct subsets. We further propose a Pseudo-label Consistency Score (PCS) to quantify prediction stability, enabling the separation of ambiguous and refinable samples within noisy pairs. Accordingly, we adopt Adaptive Pair Optimization (APO), where ambiguous samples are optimized with robust loss functions and refinable ones are enhanced via text replacement during training. Extensive experiments on CC152K, MS-COCO and Flickr30K validate the effectiveness of our method in improving retrieval robustness under noisy supervision.
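The pseudo-label consistency idea can be illustrated with a toy score measuring how stable a sample's predicted label is across epochs. This is an assumed simplification for illustration; the paper's PCS definition may differ:

```python
import numpy as np

def consistency_score(pred_history):
    """Fraction of epochs agreeing with the majority pseudo-label.

    pred_history: (epochs, classes) predicted probabilities for ONE sample.
    A score near 1.0 suggests a refinable sample; low scores flag ambiguity.
    """
    labels = pred_history.argmax(axis=1)
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / len(labels)

stable = np.array([[0.9, 0.1]] * 5)                          # same label every epoch
unstable = np.array([[0.6, 0.4], [0.4, 0.6]] * 2 + [[0.5, 0.5]])  # flip-flopping
print(consistency_score(stable), consistency_score(unstable))  # 1.0 0.6
```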

AAAI Conference 2026 Conference Paper

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

  • Qi Guo
  • Xiaojun Jia
  • Shanmin Pang
  • Simeng Qin
  • Lin Wang
  • Ju Jia
  • Yang Liu
  • Qing Guo

Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks—particularly adversarial patch attacks—which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models. Due to the more complex architectures and strong reasoning capabilities of MLLMs, these approaches perform poorly when transferred to MLLM-based systems. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms state-of-the-art (SOTA) methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.

AAAI Conference 2026 Conference Paper

ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders

  • Xiangyu Liu
  • Haodi Lei
  • Yi Liu
  • Yang Liu
  • Wei Hu

Sparse Autoencoder (SAE) has emerged as a powerful tool for mechanistic interpretability of large language models. Recent works apply SAE to protein language models (PLMs), aiming to extract and analyze biologically meaningful features from their latent spaces. However, SAE suffers from semantic entanglement, where individual neurons often mix multiple nonlinear concepts, making it difficult to reliably interpret or manipulate model behaviors. In this paper, we propose a semantically-guided SAE, called ProtSAE. Unlike existing SAEs, which require annotation datasets to filter and interpret activations, we guide semantic disentanglement during training using both annotation datasets and domain knowledge to mitigate the effects of entangled attributes. We design interpretability experiments showing that ProtSAE learns more biologically relevant and interpretable hidden features compared to previous methods. Performance analyses further demonstrate that ProtSAE maintains high reconstruction fidelity while achieving better results in interpretable probing. We also show the potential of ProtSAE in steering PLMs for downstream generation tasks.
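A vanilla sparse autoencoder of the kind ProtSAE builds on can be sketched in a few lines: a ReLU code, a linear decoder, and a reconstruction-plus-L1 objective. Weights and widths below are arbitrary placeholders, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 16, 64  # overcomplete code: d_hidden > d_model

W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_enc = np.zeros(d_hidden)

def sae(x, l1=1e-3):
    """SAE forward pass: sparse ReLU code, linear decode, recon + L1 loss."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)                 # non-negative code
    x_hat = h @ W_dec                                      # reconstruction
    loss = np.mean((x - x_hat) ** 2) + l1 * np.abs(h).mean()
    return x_hat, h, loss

x = rng.normal(size=(4, d_model))  # a batch of (here synthetic) activations
x_hat, h, loss = sae(x)
print(h.shape, float(loss) > 0)
```

The semantic guidance described in the abstract would enter as additional loss terms tying particular code dimensions to annotations; that part is not shown.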

AAAI Conference 2026 Conference Paper

RABot: Reinforcement-Guided Graph Augmentation for Imbalanced and Noisy Social Bot Detection

  • Longlong Zhang
  • Xi Wang
  • Haotong Du
  • Yangyi Xu
  • Zhuo Liu
  • Yang Liu

Social bot detection is pivotal for safeguarding the integrity of online information ecosystems. Although recent graph neural network (GNN) solutions achieve strong results, they remain hindered by two practical challenges: (i) severe class imbalance arising from the high cost of generating bots, and (ii) topological noise introduced by bots that skillfully mimic human behavior and forge deceptive links. We propose the Reinforcement-guided graph Augmentation social Bot detector (RABot), a multi-granularity graph-augmentation framework that addresses both issues in a unified manner. RABot employs a neighborhood-aware oversampling strategy that linearly interpolates minority-class embeddings within local subgraphs, thereby stabilizing the decision boundary under low-resource regimes. Concurrently, a reinforcement-learning-driven edge-filtering module combines similarity-based edge features with adaptive threshold optimization to excise spurious interactions during message passing, yielding a cleaner topology. Extensive experiments on three real-world benchmarks and four GNN backbones demonstrate that RABot consistently surpasses state-of-the-art baselines. In addition, since its augmentation and filtering modules are orthogonal to the underlying architecture, RABot can be seamlessly integrated into existing GNN pipelines to boost performance with minimal overhead.

AAAI Conference 2026 Conference Paper

ReACT: Reward-informed Autoregressive Decision CAD Transformer

  • Yijie Ding
  • Yang Liu
  • Haobo Jiang
  • Jianmin Zheng

Reconstructing precise CAD modeling sequences from point clouds remains a challenging task, especially for objects with complex geometry and topology. In this paper, by formulating the CAD sequence reconstruction as a Markov decision process, we introduce ReACT, a novel Reward-informed Autoregressive decision Cad Transformer architecture for robust CAD sequence prediction. Beyond previous imitation-only approaches, our key innovation is to frame the CAD Transformer under a reinforcement learning paradigm and thereby integrate reward-inspired heuristic learning into our architecture. This allows ReACT to effectively leverage shape-aware long-term reward feedback to guide the inference of (nearly) optimal CAD commands. Specifically, conditioned on past tokens, comprising the historical CAD states, sketch-extrude commands (i.e., actions) and associated geometric rewards, ReACT autoregressively outputs the most promising CAD commands in a causal manner. In particular, we develop a novel scaffold-aware CAD state representation that integrates global point-command features with an incrementally constructed surface point scaffold, enabling fine-grained geometric reasoning for subsequent reconstruction prediction. Moreover, an effective local barrel points-guided dense reward function is designed to jointly evaluate surface fidelity and command efficiency for reliable reward guidance. Extensive evaluations on the DeepCAD and Fusion360 benchmarks demonstrate that ReACT can achieve superior CAD reconstruction quality, even for objects with complex shapes.

AAAI Conference 2026 Conference Paper

Robust Learning from Noisily Labeled Long-Tailed Data via Fairness Regularizer

  • Jiaheng Wei
  • Zhaowei Zhu
  • Gang Niu
  • Tongliang Liu
  • Sijia Liu
  • Masashi Sugiyama
  • Yang Liu

Both long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning. Most prior works treat either problem in an isolated way and do not explicitly consider the coupling effects of the two. Our empirical observation reveals that such solutions fail to consistently improve the learning when the dataset is long-tailed with label noise. Moreover, with the presence of label noise, existing methods do not observe universal improvements across different sub-populations; in other words, some sub-populations enjoy improved accuracy at the cost of hurting others. Based on these observations, we introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations. We show that the introduced fairness regularizer improves the performance of tail sub-populations as well as the overall learning performance. Extensive experiments demonstrate the effectiveness of the proposed solution when complemented with certain existing popular robust or class-balanced methods.
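The core idea of penalizing performance gaps between sub-populations can be sketched as follows. This is a simplified reading: the squared-gap penalty and the `lam` coefficient are assumptions for illustration, not the paper's exact regularizer:

```python
import numpy as np

def fairness_regularized_loss(losses, groups, lam=1.0):
    """Mean loss plus a penalty on the largest gap between group mean losses.

    losses: per-sample loss values; groups: sub-population id per sample.
    """
    group_means = [losses[groups == g].mean() for g in np.unique(groups)]
    gap = max(group_means) - min(group_means)
    return losses.mean() + lam * gap ** 2

losses = np.array([0.2, 0.3, 0.9, 1.1])  # head samples easy, tail samples hard
groups = np.array([0, 0, 1, 1])
print(fairness_regularized_loss(losses, groups, lam=0.5))  # 0.90625
```

Minimizing the penalty pushes the optimizer to shrink the head/tail gap instead of improving one group at the other's expense.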

AAAI Conference 2026 Conference Paper

Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering

  • Zhongteng Cai
  • Yaxuan Wang
  • Yang Liu
  • Xueru Zhang

As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or *model collapse*. Common strategies to address the issue---such as accumulating historical training data or injecting fresh real data---either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose *Latent Space Filtering* (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent space degradation to empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.
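The filtering idea can be illustrated by scoring synthetic latents against a PCA subspace fitted to real-data latents and discarding the worst-explained ones. This PCA residual criterion is an assumed stand-in for LSF's actual scoring rule:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_subspace(real_latents, k=2):
    """Fit a rank-k PCA subspace to latents extracted from real data."""
    mu = real_latents.mean(axis=0)
    _, _, vt = np.linalg.svd(real_latents - mu, full_matrices=False)
    return mu, vt[:k]

def filter_indices(synth, mu, basis, keep_ratio=0.5):
    """Indices of synthetic latents best explained by the real-data subspace."""
    centered = synth - mu
    resid = centered - (centered @ basis.T) @ basis  # off-subspace component
    err = (resid ** 2).sum(axis=1)
    return np.sort(np.argsort(err)[: int(len(synth) * keep_ratio)])

B = rng.normal(size=(2, 8))                  # hidden low-dimensional structure
real = rng.normal(size=(200, 2)) @ B
good = rng.normal(size=(50, 2)) @ B          # synthetic, structure preserved
bad = rng.normal(size=(50, 8)) * 3.0         # synthetic, structure degraded
kept = filter_indices(np.vstack([good, bad]), *fit_subspace(real))
print(kept)  # retained indices; the structure-degraded half should be dropped
```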

JBHI Journal 2026 Journal Article

T2Net: Tongue Image-Based T2DM Detection via Simulated Clinical Diagnostic Reasoning

  • Yang Liu
  • Peiyu Liu
  • Yanyi Huang
  • Liyun Li
  • Xiaojie Feng
  • Miao Xie
  • Junhao Chen
  • Jiayu Ye

Clinical studies indicate that the progression of Type 2 Diabetes Mellitus (T2DM) is associated with characteristic alterations in tongue features, which may facilitate non-invasive early detection. However, current deep learning–based tongue imaging approaches for diabetes diagnosis remain constrained by limited datasets, subtle feature variations, dependence on clinical expertise, and the lack of quantitative evaluation. To address these issues, we developed an open-source dataset for T2DM tongue diagnosis (DMT) and benchmarked it using multiple baseline models. Building on DMT, we propose T2Net, a tongue image recognition model for T2DM that simulates the clinical diagnostic process. T2Net comprises four core components: local inspection, pathological clue integration, syndrome identification, and diagnostic confidence estimation. First, T2Net automatically extracts key ROIs by combining large-kernel decomposition with multi-scale learning. Then, a multi-order feature interaction module enables effective fusion of tongue image features across scales to capture pathological clues. Meanwhile, we design a context-aware dynamic aggregation convolution to model long-range dependencies, and propose a flexible focal loss to mimic the diagnostic reasoning process of clinicians, enabling brain-inspired inference. Finally, we propose a clustering-based confidence estimation approach to quantitatively evaluate the reliability of model predictions. Experimental results demonstrate that T2Net achieves highly competitive performance on the DMT dataset, outperforming the second-best baseline by 2.7% in accuracy and 2.0% in F1 score. Moreover, the quantitative evaluation scores are largely consistent with clinical assessments by physicians.

AAAI Conference 2026 Conference Paper

Temporal-Consistent Video Restoration with Pre-trained Diffusion Models

  • Hengkang Wang
  • Yang Liu
  • Huidong Liu
  • Chien-Chih Wang
  • Yanhui Guo
  • Hongdong Li
  • Bryan Wang
  • Ju Sun

Video restoration (VR) aims to recover high-quality videos from degraded ones. Although recent zero-shot VR methods using pre-trained diffusion models (DMs) show good promise, they suffer from approximation errors during reverse diffusion and insufficient temporal consistency. Moreover, since it must process 3D video data, VR is inherently computationally intensive. In this paper, we advocate viewing the reverse process in DMs as a function and present a novel Maximum a Posterior (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors. We also introduce strategies to promote bilevel temporal consistency: semantic consistency by leveraging clustering structures in the seed space, and pixel-level consistency by progressive warping with optical flow refinements. Extensive experiments on multiple video restoration tasks demonstrate superior visual quality and temporal consistency achieved by our method compared to the state-of-the-art.

AILAW Journal 2026 Journal Article

The trade-off between robustness and reliability in chinese legal large language models: an empirical study

  • Yang Liu
  • Xukai Liu
  • Haozhen Huang
  • Fanfei Yu
  • Tao Xiong
  • Xiaoyiqi Xia
  • Bohao You
  • Jinqi Wu

Legal large language models (LLMs) deployed in high-stakes judicial settings must exhibit robustness against non-substantive linguistic variations while preserving acute sensitivity to legally determinative facts and norms. This study investigates this robustness–reliability trade-off within the context of Chinese legal tasks. We curate a dataset of 5,000 Chinese judicial question–answer pairs and generate semantic-preserving adversarial rewrites, retaining only those validated by an embedding-based semantic consistency filter. Holding the total training budget and fine-tuning protocol constant, we fine-tune model variants that differ exclusively in their injection ratio of these verified rewrites, establishing seven distinct injection groups (G0–G6). We evaluate model reliability utilizing a composite protocol that incorporates objective accuracy on exam-style questions, expert evaluations of open-ended responses, and embedding-based semantic similarity. For trademark infringement reasoning tasks, we additionally assess verdict accuracy and rationale quality. Across varying model capacities (4B, 20B, and 32B backbones, including Qwen3-4B, GPT-OSS-20B, and Qwen3-VL-32B-Instruct) and both evaluated tasks, our findings reveal an inverted‑U relationship: moderate robustness data injection enhances reliability, whereas excessive injection degrades overall performance and induces characteristic failure modes, such as the attenuation of legally salient distinctions, the generation of boilerplate rationales, and overly cautious abstention. These findings substantiate “moderate robustness injection” as a practical heuristic and underscore a broader principle of differential sensitivity—achieving insensitivity to superficial variations without blunting the model’s sensitivity to legally decisive elements.

EAAI Journal 2026 Journal Article

Two-phase strategy framework for spatial prediction of landslide hazards in wide-area power linear engineering projects: the case of the China's Renewable Energy Transmission Corridors

  • Bijing Jin
  • Kunlong Yin
  • Taorui Zeng
  • Shuhao Liu
  • Yang Liu
  • Haoran Yang
  • Kai Wang
  • Lei Gui

A critical knowledge gap persists in the development of high-precision spatial prediction frameworks for landslide susceptibility assessment along wide-area linear power infrastructure. Therefore, this study develops a novel two-phase optimization framework to address this gap, focusing on China's Renewable Energy Transmission Corridors (RETCs). Phase Ⅰ employs natural breaks (optimal at 26-level grading) to address spatial heterogeneity in conditioning factors, while Phase Ⅱ optimizes the selection of non-landslide samples based on different geological environment zones and areas with lower susceptibility levels. Six base machine learning models were evaluated, with two ensemble models (Stacking and Blending) achieving superior performance, with Area Under the Curve (AUC) values exceeding 0.88. The Blending model demonstrated peak accuracy (AUC = 0.927), identifying 35% of transmission towers in high and very high susceptibility zones across nine provinces. The framework enables tower-specific susceptibility assessment, crucial for protecting China's 80,000 km transmission network. These findings advance RETC resilience by: (1) establishing a continuous conditioning-factor optimal grading strategy for linear infrastructure, (2) introducing a replicable non-landslide sample optimization protocol, and (3) demonstrating ensemble models' superiority in energy corridor landslide susceptibility mapping. This framework provides robust support for securing stable clean energy delivery, with potential applications in global renewable energy grid landslide hazard management.
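Generic blending of base-model predictions via held-out meta-weights can be sketched as follows. The non-negative least-squares meta-learner and the toy data are assumptions for illustration, far simpler than the paper's base learners and setup:

```python
import numpy as np

def blend(holdout_preds, holdout_y, test_preds):
    """Blending: fit meta-weights on a held-out split, apply them to test preds.

    holdout_preds / test_preds: (n_samples, n_models) base-model probabilities.
    """
    w, *_ = np.linalg.lstsq(holdout_preds, holdout_y, rcond=None)
    w = np.clip(w, 0.0, None)                       # keep weights non-negative
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return test_preds @ w

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200).astype(float)
good = np.clip(y + rng.normal(scale=0.2, size=200), 0.0, 1.0)  # informative model
noise = rng.random(200)                                        # pure-noise model
preds = np.column_stack([good, noise])
blended = blend(preds[:100], y[:100], preds[100:])
acc = ((blended > 0.5) == y[100:]).mean()
print(round(acc, 2))  # the blend should lean on the informative model
```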

AAAI Conference 2026 Conference Paper

Visual-Friendly Concept Protection via Selective Adversarial Perturbations

  • Xiaoyue Mi
  • Fan Tang
  • You Wu
  • Juan Cao
  • Peng Li
  • Yang Liu

Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They utilize global adversarial perturbations, which introduce noticeable alterations to original images and significantly degrade visual quality. In this work, we propose the Visual-Friendly Concept Protection (VCPro) framework, which prioritizes the protection of key concepts chosen by the image owner through adversarial perturbations with lower perceptibility. To ensure these perturbations are as inconspicuous as possible, we introduce a relaxed optimization objective to identify the least perceptible yet effective adversarial perturbations, solved using the Lagrangian multiplier method. Qualitative and quantitative experiments validate that VCPro achieves a better trade-off between the visibility of perturbations and protection effectiveness, effectively prioritizing the protection of target concepts in images with less perceptible perturbations.

JBHI Journal 2026 Journal Article

Whole-Process Evolutionary Heterogeneity Analysis for Glioblastoma Radiotherapy Response Prediction

  • Yao Zheng
  • Dong Huang
  • Jie Wei
  • Tianci Liu
  • Xiaoting Wu
  • Yuefei Feng
  • Chengwei Chen
  • Yang Liu

Glioblastoma (GBM) is a highly heterogeneous tumor, and its radiotherapy is a complex and dynamic process. Traditional methods for predicting treatment response often rely on one or a few fixed time points, but this static approach may fail to capture the detailed, individualized changes occurring throughout the treatment process. To address these limitations, we propose a novel approach called the Evolutionary Heterogeneity Analysis Framework (EvoHAF), which integrates tumor heterogeneity and the whole-process evolution of GBM radiotherapy. Our framework introduces an Image Heterogeneity Encoder, designed to capture the intricate spatial heterogeneity based on tumor subregions. Additionally, the Temporal Self-Attention Module (TSAM) integrates longitudinal imaging data throughout the course of radiotherapy, capturing the evolving nature of the tumor. We further introduce a Compensated Prediction Head (CPH) that dynamically refines predictions throughout the patient's radiotherapy. Experimental results on a cross-center cohort, including an internal dataset of 112 patients and an external validation dataset of 80 patients, demonstrate that EvoHAF achieves strong performance. For internal 5-fold validation, the AUC was 0.8519±0.0583, and for external validation, the AUC was 0.7675±0.0858. These results demonstrate the model's capability to provide accurate whole-process predictions. Moreover, the model's credibility is reinforced by visual explanations at both 2D and 3D subregional levels, establishing trust in its decisions and laying a strong foundation for clinical applications.

EAAI Journal 2025 Journal Article

A lightweight detection algorithm for Camellia oleifera buds, stamens, and flowers using you only look once with large selective Kernel Network

  • Fei Long
  • Lijun Li
  • Yang Liu
  • Haifei Chen
  • Yuyan Zhang
  • Haorui Wang

To address the issues of occlusion and missed detections in small targets during the recognition of Camellia oleifera buds, stamens, and flowers and to improve recognition accuracy and computation speed, this study proposes a lightweight detection model based on YOLOv8s (You Only Look Once version 8 small). Firstly, we enhanced detection effectiveness by replacing the original YOLOv8s backbone with the Large Selective Kernel Network (LSKNet) and introducing the Minimum Point Distance Intersection over Union (MPD-IoU) loss function to accelerate convergence and improve recognition of overlapping targets. Secondly, for model lightweighting, we incorporated the Partial Convolution (PCConv) module to reduce parameters and floating-point operations (FLOPs) while enhancing feature representation. An additional detection head was added to improve small bud target detection. Experimental results show our improved model increased precision (P), recall (R), and mean average precision (mAP) by 0.3%, 1.2%, and 1.2% respectively over baseline YOLOv8s. Compared to mainstream models – YOLOv3-tiny, ScaledYOLOv4, YOLOv5s, YOLOv7, YOLOv8s, and Faster Region-Based Convolutional Neural Network (Faster R-CNN) – our model achieved mAP improvements of 1.8%, 1.5%, 1.4%, 2.1%, 1.2%, and 7.7% respectively. The optimized model demonstrates faster, more accurate identification of Camellia oleifera buds, stamens, and flowers, making it suitable for mobile deployment.

EAAI Journal 2025 Journal Article

A safe multi-agent reinforcement learning algorithm using constraint update projection approach

  • Yang Liu
  • Xiang Feng
  • Huiqun Yu

Traditional reinforcement learning has a major limitation: it optimizes an agent's policy purely to maximize reward, completely ignoring safety considerations. However, in certain critical engineering fields, ensuring safety is of utmost importance; otherwise, incalculable losses can result. Therefore, this paper proposes a safe Multi-Agent Constrained Update Projection (MACUP) algorithm, which can safely control agents to complete tasks. We approach this problem from the perspective of constrained policy optimization. Firstly, we derive new bounds on the multi-agent policy performance difference based on a tighter general policy performance difference. These bounds contain generalized advantage estimates, and we utilize them as surrogate functions for the objective and constraints. Secondly, to address the coordination issue among multiple agents, we employ a multi-agent sequential policy update framework. Thirdly, we use a projection method to optimize policies, which has low computational complexity and does not require a convex approximation of the surrogate function, helping to reduce errors. Finally, we validate our algorithm in two different multi-agent safety environments, and the results show that it satisfies safety constraints while achieving higher rewards.
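A single constrained-projection policy step can be illustrated by linearizing the cost constraint and projecting the reward-ascent update back onto it when violated. This is a generic sketch of the projection idea, not MACUP itself; all names and values are placeholders:

```python
import numpy as np

def projected_update(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.1):
    """Reward-ascent step, projected onto the linearized cost constraint.

    Linearization: c(theta + d) ~ cost_value + cost_grad @ d <= cost_limit.
    If the plain step d violates this half-space, project d onto its boundary.
    """
    d = lr * reward_grad
    slack = cost_limit - cost_value
    if cost_grad @ d > slack:  # predicted constraint violation
        d = d - ((cost_grad @ d - slack) / (cost_grad @ cost_grad)) * cost_grad
    return theta + d

theta = np.zeros(3)
new_theta = projected_update(
    theta,
    reward_grad=np.array([1.0, 0.0, 0.0]),
    cost_grad=np.array([1.0, 0.0, 0.0]),  # reward and cost pull the same way
    cost_value=0.0,
    cost_limit=0.02,
)
print(new_theta)  # step is clipped so the predicted cost stays within the limit
```

Here the plain step of 0.1 along the first coordinate would push the linearized cost past the limit of 0.02, so the projection shrinks it to exactly 0.02.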

NeurIPS Conference 2025 Conference Paper

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

  • Xiaojun Jia
  • Sensen Gao
  • Simeng Qin
  • Tianyu Pang
  • Chao Du
  • Yihao Huang
  • Xinfeng Li
  • Yiming Li

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features—such as CLIP’s [CLS] token—between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs.
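The global-level alignment term described above can be illustrated as a cosine loss between adversarial and target features; the local optimal-transport term on patch tokens is omitted here, and the feature vectors are toy stand-ins:

```python
import numpy as np

def cosine_alignment_loss(adv_feat, tgt_feat):
    """Global feature loss: 1 - cosine similarity between the adversarial
    example's embedding and the target's (e.g. CLIP [CLS] features).
    Minimizing it pulls the adversarial features toward the target's."""
    cos = adv_feat @ tgt_feat / (np.linalg.norm(adv_feat) * np.linalg.norm(tgt_feat))
    return 1.0 - cos

a = np.array([1.0, 0.0])
print(cosine_alignment_loss(a, np.array([1.0, 0.0])),  # aligned    -> 0.0
      cosine_alignment_loss(a, np.array([0.0, 1.0])))  # orthogonal -> 1.0
```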

EAAI Journal 2025 Journal Article

An efficient anchor-free model for ore particle size detection

  • Kanghui Zhang
  • Qingkai Wang
  • Guobin Zou
  • Jiawei Yang
  • Tao Song
  • Yang Liu
  • Daoxi Liu

Accurate detection of ore size is crucial in mineral processing, directly impacting equipment efficiency and product quality. However, traditional anchor-based models often struggle with the irregular shapes and varying scales of ore particles, resulting in limited performance. To overcome these challenges, an anchor-free detection framework was proposed. It incorporates a cross-stage partial bottleneck and a spatial pyramid pooling cross-stage partial connections (SPPCSCP-DualConv), both enhanced with dual convolution, to improve feature extraction and multi-scale fusion. In the backbone, the dual convolution module combines group convolution with heterogeneous convolution to improve feature diversity. The SPPCSCP-DualConv module further enhances feature representation in complex backgrounds. Additionally, a simplified path aggregation network (simPANet) feature fusion module is employed in the neck to refine the integration of multi-scale features. The proposed model was trained using a combination of binary cross-entropy, complete intersection over union (IoU), and distribution focal loss to optimize detection accuracy. The proposed model achieved a mean average precision of 86. 80 % at an IoU threshold of. 5 and 78. 50 % across IoU thresholds from. 5 to. 95, surpassing existing methods while maintaining a lightweight architecture with only 10. 10 million parameters and 89. 45 giga floating point operations per second. Ablation studies confirmed the effectiveness of the simPANet and SPPCSPC-DualConv modules in enhancing feature representation. Generalization tests across mining sites with similar distributions demonstrated strong performance, although limitations remain for exceptionally large ore blocks due to dataset bias. The proposed model significantly improved the accuracy and efficiency of ore particle size detection, providing reliable real-time insights to improve grinding control and mineral processing operations.

IROS Conference 2025 Conference Paper

Annotation-Free Curb Detection Leveraging Altitude Difference Image

  • Fulong Ma
  • Peng Hou
  • Yuxuan Liu 0008
  • Yang Liu
  • Ming Liu 0001
  • Jun Ma 0008

Road curbs are considered one of the crucial and ubiquitous traffic features, essential for ensuring the safety of autonomous vehicles. Current methods for detecting curbs primarily rely on camera imagery or LiDAR point clouds. Image-based methods are vulnerable to fluctuations in lighting conditions and exhibit poor robustness, while methods based on point clouds circumvent the issues associated with lighting variations. However, significant processing delays are typically encountered due to the voluminous number of 3D points contained in each frame of point cloud data. Furthermore, the inherently unstructured characteristics of point clouds pose challenges for integrating the latest deep learning advancements into point cloud data applications. To address these issues, this work proposes an annotation-free curb detection method leveraging the Altitude Difference Image (ADI) (as shown in Fig. 1), which effectively mitigates the aforementioned challenges. Given that methods based on deep learning generally demand extensive, manually annotated datasets, which are both expensive and labor-intensive to create, we present an Automatic Curb Annotator (ACA) module. This module utilizes a deterministic curb detection algorithm to automatically generate a vast quantity of training data. Consequently, it facilitates the training of the curb detection model without necessitating any manual annotation of data. Finally, by incorporating a post-processing module, we achieve state-of-the-art results on the KITTI 3D curb dataset [1] with considerably reduced processing delays compared to existing methods, which underscores the effectiveness of our approach in curb detection tasks. Our code and data will be open-sourced at: https://sites.google.com/view/adi-curb-detection.
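An altitude difference image is straightforward to sketch: rasterize the point cloud into a grid and store max(z) - min(z) per cell, so curb cells show a small, sharp altitude jump. Cell size and grid extent below are placeholders, not the paper's configuration:

```python
import numpy as np

def altitude_difference_image(points, cell=0.5, shape=(10, 10)):
    """Rasterize (x, y, z) points into a grid of per-cell altitude ranges."""
    zmin = np.full(shape, np.inf)
    zmax = np.full(shape, -np.inf)
    ix = (points[:, 0] / cell).astype(int)
    iy = (points[:, 1] / cell).astype(int)
    ok = (ix >= 0) & (ix < shape[0]) & (iy >= 0) & (iy < shape[1])
    for x, y, z in zip(ix[ok], iy[ok], points[ok, 2]):
        zmin[x, y] = min(zmin[x, y], z)
        zmax[x, y] = max(zmax[x, y], z)
    adi = np.zeros(shape)                 # empty cells stay at 0
    mask = np.isfinite(zmin)
    adi[mask] = (zmax - zmin)[mask]
    return adi

road = np.array([[1.2, 1.2, 0.0], [1.3, 1.3, 0.0]])  # flat road surface
curb = np.array([[1.2, 1.2, 0.15]])                  # 15 cm step in the same cell
adi = altitude_difference_image(np.vstack([road, curb]))
print(adi.max())  # the curb cell carries the altitude jump
```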

IROS Conference 2025 Conference Paper

Articulation-Gen: 3D Part Segmentation and Articulated Object Generation

  • Zhuoqun Xu
  • Yang Liu

Recent advances in 3D content generation, particularly 3D Gaussian Splatting (3DGS) and diffusion models, have significantly improved the synthesis of static shapes and textures. However, the modeling of dynamic articulations remains a significant challenge. Existing datasets lack physics-aware joint annotations, segmentation methods overlook kinematic constraints, and procedural generation techniques often prioritize space coverage over physical plausibility and visual realism. Motivated by these challenges, we propose Articulation-Gen, a scalable and robust framework for generating physically compliant, multi-joint 3D objects. Our approach comprises three components: (1) a 3D semantic segmentation module that integrates 2D visual models (SAM2 and DINO) to achieve 91.4% part segmentation accuracy by resolving occlusions via multi-view fusion with semantic consistency; (2) a physics-guided joint optimizer that combines spatial sampling with heuristic search to reach 93.7% axis alignment accuracy, representing a 20.6% improvement; and (3) an LLM-augmented URDF synthesis mechanism that automatically produces physically plausible kinematic descriptions with language annotations, thereby improving generation accuracy by 87.5%. Leveraging existing 3D asset datasets and generation techniques, we further construct a large-scale articulation asset dataset comprising 10.6K articulated objects with 45.2K validated joints. This dataset enables faster articulated asset generation while ensuring URDF compliance. By proposing our pipeline and dataset, this work provides foundational tools for physics-based computer graphics and embodied AI, advancing the frontiers of 3D content creation and robotic simulation.

AAAI Conference 2025 Conference Paper

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment

  • Yang Liu
  • Mengyuan Liu
  • Shudong Huang
  • Jiancheng Lv

Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain textual information from multiple different views, which makes it difficult to compute the similarity between these two modalities accurately and efficiently. In this paper, we propose a novel framework called Asymmetric Visual Semantic Embedding (AVSE) to dynamically select features from various regions of images tailored to different textual inputs for similarity calculation. To capture information from different views in the image, we design a radial bias sampling module to sample image patches and obtain image features from various views. Furthermore, AVSE introduces a novel module for efficient computation of visual semantic similarity between asymmetric image and text embeddings. Central to this module is the presumption of foundational semantic units within the embeddings, denoted as ``meta-semantic embeddings." It segments all embeddings into meta-semantic embeddings of the same dimension and calculates visual semantic similarity by finding the optimal match between the meta-semantic embeddings of the two modalities. Our proposed AVSE model is extensively evaluated on the large-scale MS-COCO and Flickr30K datasets, demonstrating its superiority over recent state-of-the-art methods.
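The meta-semantic matching idea in this abstract can be sketched minimally: split each embedding into fixed-size units and score similarity by pairing each text unit with its best-matching image unit. The unit dimension and the max-then-mean aggregation here are assumptions for illustration; the paper's exact matching scheme may differ.

```python
import numpy as np

def meta_semantic_similarity(img_emb, txt_emb, unit_dim=64):
    """Split both embeddings into equal-size 'meta-semantic' units and score
    similarity by matching each text unit to its best image unit."""
    img_units = img_emb.reshape(-1, unit_dim)
    txt_units = txt_emb.reshape(-1, unit_dim)
    # L2-normalize each unit so the dot product is a cosine similarity
    img_units = img_units / np.linalg.norm(img_units, axis=1, keepdims=True)
    txt_units = txt_units / np.linalg.norm(txt_units, axis=1, keepdims=True)
    sims = txt_units @ img_units.T          # (n_txt_units, n_img_units)
    return float(sims.max(axis=1).mean())   # best match per text unit, averaged
```

Note the asymmetry: the image embedding may contain more units than the text embedding, and only the best-matching image regions contribute to the score.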

NeurIPS Conference 2025 Conference Paper

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

  • Peng Lai
  • Jianjie Zheng
  • Sijie Cheng
  • Yun Chen
  • Peng Li
  • Yang Liu
  • Guanhua Chen

The growing scale of evaluation tasks has led to the widespread adoption of automated evaluation using LLMs, a paradigm known as “LLM-as-a-judge”. However, improving its alignment with human preferences without complex prompts or fine-tuning remains challenging. Previous studies mainly optimize based on shallow outputs, overlooking rich cross-layer representations. In this work, motivated by preliminary findings that middle-to-upper layers encode semantic and task-relevant representations that are often more aligned with human judgments than the final layer, we propose LAGER, a post-hoc, plug-and-play framework for improving the alignment of LLM-as-a-Judge point-wise evaluations with human scores by leveraging internal representations. LAGER produces fine-grained judgment scores by aggregating cross-layer score-token logits and computing the expected score from a softmax-based distribution, while keeping the LLM backbone frozen and ensuring no impact on the inference process. LAGER fully leverages the complementary information across different layers, overcoming the limitations of relying solely on the final layer. We evaluate our method on the standard alignment benchmarks Flask, HelpSteer, and BIGGen using Spearman correlation, and find that LAGER achieves improvements of up to 7.5% over the best baseline across these benchmarks. Without reasoning steps, LAGER matches or outperforms reasoning-based methods. Experiments on downstream applications, such as data selection and emotional understanding, further show the generalization of LAGER.
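The cross-layer aggregation and softmax-expected-score computation described in this abstract can be sketched as follows; the uniform layer weighting and the 1-5 score range are illustrative assumptions, not details from the paper.

```python
import numpy as np

def lager_score(layer_logits, layer_weights=None):
    """Aggregate per-layer logits over the score tokens (e.g. '1'..'5') and
    return the expected score under the resulting softmax distribution.
    layer_logits: array of shape (n_layers, n_score_tokens)."""
    n_layers, n_scores = layer_logits.shape
    w = (np.ones(n_layers) / n_layers if layer_weights is None
         else np.asarray(layer_weights))
    agg = (w[:, None] * layer_logits).sum(axis=0)  # weighted cross-layer fusion
    p = np.exp(agg - agg.max())                    # numerically stable softmax
    p /= p.sum()
    scores = np.arange(1, n_scores + 1)
    return float(p @ scores)                       # fine-grained expected score
```

Because this only reads logits that the forward pass already produces at each layer, the backbone stays frozen and inference is unchanged, matching the post-hoc, plug-and-play framing above.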

AAAI Conference 2025 Conference Paper

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

  • Yang Liu
  • Xiaoping Wang
  • Kai Lu

As one of the key technologies leading to Artificial General Intelligence (AGI), Large Language Models (LLMs) have achieved remarkable accomplishments. Exploring the capabilities of LLMs is crucial for scientific research, and many studies propose new challenges from various aspects to explore the boundaries of capabilities in LLMs. This paper attempts to push the challenges of information understanding, synthesis, and reasoning to the extreme, in order to explore the boundaries of more advanced dimensions of cognitive capability in LLMs. We define the task of High-Level Cognition (HLC), which involves deriving high-level conclusions from low-level and fragmented foundational information. To evaluate HLC, we construct a dataset based on soccer matches. Experiments and analysis on this dataset show that current state-of-the-art LLMs lack the ability to effectively solve the HLC task, as their performance is no better than random. However, fine-tuning Llama3-8B-Instruct yields improvements of 14.4%, 48.1%, and 19.4% over random level in three types of evaluation tasks. This indicates that LLMs have great potential to solve the HLC task.

JBHI Journal 2025 Journal Article

Collaborative Learning Macroscopic Binding Trends and Microscopic Residue Interactions to Predict Peptide-Protein Interactions

  • Li Zeng
  • Yang Liu
  • Zu-guo Yu
  • Guosheng Han
  • Yuansheng Liu

Short peptides and their structural modifications have demonstrated significant potential in therapeutic drug development. During the research and development process, peptide-protein interaction plays a crucial role in screening highly effective peptides. Although traditional experimental methods can identify peptide-protein interactions, their time-consuming and resource-intensive nature has driven researchers to develop various computational alternatives. In addition, accurately predicting these interactions requires capturing both the macroscopic molecular binding affinity and the precise interaction patterns at the microscopic residue level. Existing computational methods face limitations, as they are typically confined to modeling at a single level, resulting in restricted prediction accuracy. To address this gap, we propose MMPepPro, a dual-level biofeature collaborative interaction learning framework that integrates macro-level binding trends with micro-level residue interaction features. Trained on 19,187 peptide-protein complexes, MMPepPro combines molecular-level and amino-acid-level features to achieve comprehensive modeling. Experimental validation demonstrates the model's superior performance across all evaluation metrics compared to other state-of-the-art methods in peptide-protein interaction prediction. More notably, its generalization performance across four other datasets validates the universality of this method, which will aid the development of peptide-protein drugs.

JAIR Journal 2025 Journal Article

Combinatorial Multi-Armed Bandits with Fairness Constraints: An Online Convex Optimization Perspective

  • Xiaosong Chen
  • Hanqin Zhuang
  • Yang Liu
  • Huanle Xu
  • Wing Cheong Lau

The problem of multi-armed bandit (MAB) with fairness constraints has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed number of pull rounds, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing various online selection solutions for MAB; however, when incorporating such fairness constraints, they fail to achieve a sublinear regret bound. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we design a new selection algorithm that solves MAB problems from an online convex optimization perspective. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound of O(√T ln T) with high-probability guarantees over T selection rounds. We also extend our framework to include more general knapsack constraints. Finally, we assess the performance of our algorithm through extensive simulations and real dataset applications, demonstrating its significant advantages over baseline schemes.

NeurIPS Conference 2025 Conference Paper

Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning

  • Haolin Pan
  • Hongyu Lin
  • Haoran Luo
  • Yang Liu
  • Kaichun Yao
  • Libo Zhang
  • Mingjie Xing
  • Yanjun Wu

Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating compiler tuning, two significant challenges remain: the absence of high-quality reasoning datasets for agent training, and limited effective interaction with the compilation environment. In this work, we introduce Compiler-R1, the first reinforcement learning (RL)-driven framework specifically augmenting LLM capabilities for compiler auto-tuning. Compiler-R1 features a curated, high-quality reasoning dataset and a novel two-stage end-to-end RL training pipeline, enabling efficient environment exploration and learning through an outcome-based reward. Extensive experiments across seven datasets demonstrate that Compiler-R1 achieves an average 8.46% IR instruction count reduction compared to opt -Oz, showcasing the strong potential of RL-trained LLMs for compiler optimization. Our code and datasets are publicly available at https://github.com/Panhaolin2001/Compiler-R1.

NeurIPS Conference 2025 Conference Paper

CPSea: Large-scale cyclic peptide-protein complex dataset for machine learning in cyclic peptide design

  • Ziyi Yang
  • Hanyuan Xie
  • Yinjun Jia
  • Xiangzhe Kong
  • Jiqing Zheng
  • Ziting Zhang
  • Yang Liu
  • Lei Liu

Cyclic peptides exhibit better binding affinity and proteolytic stability compared to their linear counterparts. However, the development of cyclic peptide design models is hindered by the scarcity of data. To address this, we introduce **CPSea** (**C**yclic **P**eptide **Sea**), a dataset of 2.71 million cyclic peptide-receptor complexes, curated through systematic mining of the AlphaFold Database (AFDB). Our pipeline extracts compact domains from AFDB, identifies cyclization sites using $\beta$-carbon (C$_\beta$) distance thresholds, and applies multi-stage filtering to ensure structural fidelity and binding compatibility. Compared with experimental data on cyclic peptides, CPSea shows similar distributions in metrics of structural fidelity and wet-lab compatibility. To our knowledge, CPSea is the largest cyclic peptide-receptor dataset to date, enabling end-to-end model training for the first time. The dataset also showcases the feasibility of simulating inter-chain interactions using intra-chain interactions, expanding available resources for machine-learning models of protein-protein interactions. The dataset and relevant scripts are accessible on GitHub ([https://github.com/YZY010418/CPSea](https://github.com/YZY010418/CPSea)).
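The C-beta distance-threshold step mentioned in this abstract amounts to flagging residue pairs whose side-chain anchor atoms are close enough to be bridged. A minimal sketch follows; the distance window and minimum sequence separation are hypothetical values for illustration, not the paper's thresholds.

```python
import numpy as np

def cyclization_candidates(cbeta_coords, min_dist=4.0, max_dist=7.0, min_sep=5):
    """Flag residue pairs whose C-beta atoms fall within a distance window,
    making them candidate cyclization sites.
    cbeta_coords: array of shape (n_residues, 3)."""
    n = len(cbeta_coords)
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):  # require sequence separation
            d = float(np.linalg.norm(cbeta_coords[i] - cbeta_coords[j]))
            if min_dist <= d <= max_dist:
                pairs.append((i, j, d))
    return pairs
```

In a mining pipeline like the one described, such candidate pairs would then pass through the multi-stage structural and binding-compatibility filters before a domain is accepted as a pseudo-cyclic complex.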

EAAI Journal 2025 Journal Article

Cross-domain fault diagnosis of marine diesel engines based on stepwise diffusion and iterative bidirectional optimization

  • Zhen Zhao
  • Ziru Jin
  • Xin Xin
  • Yutong Fu
  • Xiaotong Huang
  • Liang Li
  • Hongyan Qin
  • Chong Wei

Cross-domain fault diagnosis of marine diesel engines presents significant challenges due to variations in data distribution and the limited availability of labeled fault samples under different operating conditions. To address this, an unsupervised domain-adaptive diagnostic framework is proposed, integrating stepwise diffusion and iterative bidirectional optimization to enhance fault identification. First, the quadratic axial attention transformer introduces a fourth weight in the axial computation to effectively capture the long-range spatio-temporal correlations in the time–frequency representations and strengthen the cross-axis contextual dependence. Next, the domain stepwise diffusion bridge utilizes a Markov transform to gradually refine the significant distributional differences across domains into continuous sub-distributions, ensuring a smoother adaptation process. Finally, an iterative bidirectional optimization strategy is proposed to dynamically coordinate the interaction between stepwise diffusion and fault classification, where two complementary learning directions are alternately executed to preserve the semantic integrity of features. Experimental validation on a self-constructed dataset covering multiple operating conditions demonstrates the effectiveness of the proposed approach, achieving 93.80% average accuracy, 93.75% precision, and 93.45% recall. This approach not only breaks through the limitations of existing domain alignment methods, providing a new solution for cross-domain fault diagnosis, but also offers broad implications for future research and applications in this field. The code and model are available at: https://github.com/lazyJzr/UDAtask.

NeurIPS Conference 2025 Conference Paper

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

  • Yue Jiang
  • Jichu Li
  • Yang Liu
  • Dingkang Yang
  • Feng Zhou
  • Quyu Kong

We introduce DanmakuTPPBench, a comprehensive benchmark designed to advance multi-modal Temporal Point Process (TPP) modeling in the era of Large Language Models (LLMs). While TPPs have been widely studied for modeling temporal event sequences, existing datasets are predominantly unimodal, hindering progress in models that require joint reasoning over temporal, textual, and visual information. To address this gap, DanmakuTPPBench comprises two complementary components: (1) DanmakuTPP-Events, a novel dataset derived from the Bilibili video platform, where user-generated bullet comments (Danmaku) naturally form multi-modal events annotated with precise timestamps, rich textual content, and corresponding video frames; (2) DanmakuTPP-QA, a challenging question-answering dataset constructed via a novel multi-agent pipeline powered by state-of-the-art LLMs and multi-modal LLMs (MLLMs), targeting complex temporal-textual-visual reasoning. We conduct extensive evaluations using both classical TPP models and recent MLLMs, revealing significant performance gaps and limitations in current methods’ ability to model multi-modal event dynamics. Our benchmark establishes strong baselines and calls for further integration of TPP modeling into the multi-modal language modeling landscape. Project page: https://github.com/FRENKIE-CHIANG/DanmakuTPPBench.

NeurIPS Conference 2025 Conference Paper

DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization

  • Jiacai Liu
  • Chaojie Wang
  • Chris Liu
  • Liang Zeng
  • Rui Yan
  • Yiwen Sun
  • Yang Liu

The role of reinforcement learning (RL) in enhancing the reasoning of large language models (LLMs) is becoming increasingly significant. Despite the success of RL in many scenarios, many challenges remain in improving the reasoning of LLMs. One key challenge is the sparse reward, which introduces more training variance into policy optimization and makes it difficult to obtain a good estimate of the value function in Actor-Critic (AC) methods. To address these issues, we introduce Direct Advantage-Based Policy Optimization (DAPO), a novel step-level offline RL algorithm with theoretical guarantees for enhancing the reasoning abilities of LLMs. Unlike response-level methods (such as DPO and GRPO), in which the update directions of all reasoning steps are governed uniformly by the outcome reward, DAPO employs a critic function to provide step-level dense signals for policy optimization. Additionally, the actor and critic in DAPO are trained independently, ensuring that the critic is a good estimate of the true state value function and avoiding the co-training instability observed in standard AC methods. We train DAPO on mathematical and code problems and then evaluate its performance on multiple benchmarks. Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating the effectiveness of DAPO.
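The contrast drawn in this abstract, dense step-level signals from a critic versus one uniform outcome reward, can be illustrated with a standard one-step advantage estimate along a reasoning trajectory whose only reward arrives at the end. This is a generic textbook estimator used for illustration, not the paper's exact formulation.

```python
def step_advantages(values, final_reward, gamma=1.0):
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t) along a
    trajectory with a single outcome reward at the final step.
    values: critic estimates V(s_0), ..., V(s_{T-1})."""
    T = len(values)
    adv = []
    for t in range(T):
        # Intermediate steps bootstrap from the next state's value;
        # the terminal step sees the outcome reward and no successor.
        next_v = values[t + 1] if t + 1 < T else 0.0
        r_t = final_reward if t == T - 1 else 0.0
        adv.append(r_t + gamma * next_v - values[t])
    return adv
```

Each step thus gets its own signed update signal (a step that raises the critic's value estimate gets positive advantage), rather than every step inheriting the trajectory's single outcome reward.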

EAAI Journal 2025 Journal Article

Data-driven joint multiobjective prediction and optimization for tunnel-induced adjacent bridge pier displacement: A case study in China

  • Hongyu Chen
  • Jun Liu
  • Qiping Geoffrey Shen
  • Tiejun Li
  • Yang Liu

To reduce the impact of tunnel construction on adjacent bridge pile foundations and ensure safety during construction, a hybrid intelligent framework combining Bayesian optimization (BO), categorical boosting (CatBoost), and the nondominated sorting genetic algorithm-III (NSGA-III) is proposed in this paper. The nonlinear mapping between the nine input parameters and the vertical and horizontal displacements of the bridge pier is established via BO-CatBoost. The key optimization parameters are determined via the Shapley additive explanations (SHAP) method, which also provides interpretability analysis. NSGA-III is applied with the goal of minimizing pier displacement. The applicability and validity of the proposed method are tested in a case study of the Wuhan Metro. The key findings of this study include the following. (1) The prediction model obtained by the BO-CatBoost algorithm, trained on measured engineering data, is highly accurate. On the bridge pier horizontal and vertical displacement test sets, the R2 values are 0.823 and 0.826, the RMSE values are 0.452 and 0.539, and the MAEs are 0.293 and 0.360, respectively. (2) The optimization effect on the two objectives is significant, with an average improvement of 35.54%. When five shield construction parameters are adjusted simultaneously, the optimization effect on the two objectives is best, with an average improvement of 54.76%. (3) The optimization effect of the developed BO-CatBoost-NSGA-III intelligent algorithm is greater than that of single-objective optimization. Therefore, the intelligent optimization framework proposed in this paper can provide guidance for the optimal control of pier displacement in shield underpass construction engineering.

AAAI Conference 2025 Conference Paper

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

  • Yuhao Wang
  • Yang Liu
  • Aihua Zheng
  • Pingping Zhang

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by combining complementary information from multiple modalities. Existing multi-modal object ReID methods primarily focus on the fusion of heterogeneous features. However, they often overlook the dynamic quality changes in multi-modal imaging. In addition, the shared information between different modalities can weaken modality-specific information. To address these issues, we propose a novel feature learning framework called DeMo for multi-modal object ReID, which adaptively balances decoupled features using a mixture of experts. To be specific, we first deploy a Patch-Integrated Feature Extractor (PIFE) to extract multi-granularity and multi-modal features. Then, we introduce a Hierarchical Decoupling Module (HDM) to decouple multi-modal features into non-overlapping forms, preserving the modality uniqueness and increasing the feature diversity. Finally, we propose an Attention-Triggered Mixture of Experts (ATMoE), which replaces traditional gating with dynamic attention weights derived from decoupled features. With these modules, our DeMo can generate more robust multi-modal features. Extensive experiments on three object ReID benchmarks verify the effectiveness of our methods.

NeurIPS Conference 2025 Conference Paper

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

  • Yun Xing
  • Yue Cao
  • Nhat Chung
  • Jie Zhang
  • Ivor Tsang
  • Ming-Ming Cheng
  • Yang Liu
  • Lei Ma

Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, $\textit{i.e.}$, when deployed as patches, limiting their practical utility for stress-testing stereo depth estimation systems. In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances the patch attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence the adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and texture elements. Our generated adversarial patches can be inserted into any scene and successfully attack advanced stereo depth estimation methods of different paradigms, $\textit{i.e.}$, RAFT-Stereo and STTR. Most critically, our patch can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating their practical relevance for security assessment of stereo systems. The code is officially released at: https://github.com/WiWiN42/DepthVanish

EAAI Journal 2025 Journal Article

Development of data-driven predictive model and enhanced multiobjective optimization to improve the excavation performance of large-diameter slurry shields

  • Feiming Su
  • Xianguo Wu
  • Tiejun Li
  • Yang Liu

Safety, efficiency, and energy consumption are important aspects for evaluating the performance of large-diameter slurry shields, and improving shield performance is crucial for safe and efficient excavation. To this end, a data-driven hybrid method is developed to improve the excavation performance of large-diameter slurry shields by intelligently regulating shield parameters. This method combines Bayesian optimization with categorical boosting (BO-CatBoost) and an enhanced multiobjective evolutionary algorithm based on decomposition (EMOEA/D). The method uses surface settlement, penetration, and specific energy as output targets and employs expert knowledge to select the input parameters. Subsequently, the trained BO-CatBoost model is employed to fit the input-output relationship. On this basis, the multiobjective optimization process is performed using EMOEA/D, with the important parameters determined by Shapley Additive exPlanations as decision variables and the nonlinear relationship fitted by BO-CatBoost as the objective function. Finally, the technique for order preference by similarity to ideal solution is applied to obtain optimal operational parameters, thereby enhancing the excavation performance of large-diameter slurry shields. The proposed method is applied to a Wuhan rail transit line to verify its effectiveness, and the results show that: (1) Our method can accurately predict the three targets, with goodness of fit ranging from 0.938 to 0.988. (2) The proposed method can effectively improve the excavation performance of the large-diameter slurry shield, with improvements reaching 13.88%, 5.21%, and 10.88% for the three targets, respectively. (3) An adaptive decision-making system for setting operational parameters is constructed, which is valuable for formulating operational control strategies for large-diameter slurry shields.

AAAI Conference 2025 Conference Paper

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

  • Yang Liu
  • Feng Hou
  • Yunjie Peng
  • Gangjian Zhang
  • Yao Zhang
  • Dong Xie
  • Peng Wang
  • Yang Zhang

Recent advances in vision-language pre-training have significantly enhanced model capabilities on grounded object detection. However, these studies often pre-train with coarse-grained text prompts, such as plain category names and brief grounded phrases. This limitation curtails the model's capacity for fine-grained linguistic comprehension and leads to a significant decline in performance when faced with detailed descriptions or contextual information. To tackle these problems, we develop DoGA: Detect objects with Grouped Attributes, which employs commonly apparent attributes to bridge different granular semantics and uses specific attributes to identify object discrepancies. Our DoGA incorporates three principal components: 1) Generation of attribute-based prompts, consisting of linguistic definitions enriched with common-sense visible attributes and hard negative notations deriving from the image-specific attribute features; 2) Paralleled entity fusion and optimization, designed to manage long attribute-based descriptions and negative concepts efficiently; and 3) Prompt-wise grouped training to accommodate the model to perform many-to-many assignments, facilitating simultaneous training and inferring with multiple attribute-based synonyms. Extensive experiments demonstrate that training with synonymous attribute-based prompts allows DoGA to generalize to multi-granular prompts and surpass previous state-of-the-art approaches, yielding 50.2 on the COCO and 38.0 on the LVIS benchmarks under the zero-shot setting. We will make our code publicly available upon acceptance.

AAAI Conference 2025 Conference Paper

Dynamic Graph Learning with Static Relations for Credit Risk Assessment

  • Qi Yuan
  • Yang Liu
  • Yateng Tang
  • Xinhuan Chen
  • Xuehao Zheng
  • Qing He
  • Xiang Ao

Credit risk assessment has increasingly become a prominent research field due to the dramatic increase in financial default incidents. Traditional graph-based methods have been developed to detect defaulters within user-merchant commercial payment networks. However, these methods face challenges in detecting complex risks, primarily due to their neglect of user-to-user fund transfer interactions and the under-utilization of temporal information. In this paper, we propose a novel framework named Dynamic Graph Neural Network with Static Relations (DGNN-SR) for credit risk assessment, which can encode the dynamic transaction graph and the static fund transfer graph simultaneously. To fully harness the temporal information, DGNN-SR employs a multi-view time encoder to explore the semantics of both relative and absolute time. To enhance the dynamic representations with static relations, we devise an adaptive re-weighting strategy to incorporate the static relations into the dynamic representations of the time encoder, which extracts more discriminative features for risk assessment. Extensive experiments on two real-world business datasets demonstrate that our proposed method achieves a 0.85% - 2.5% improvement over existing SOTA methods.

TMLR Journal 2025 Journal Article

Enhancing Parameter Efficiency and Generalization in Large Models: A Regularized and Masked Low-Rank Adaptation Approach

  • Yuzhu Mao
  • Zihao Zhao
  • Siqi Ping
  • Yang Liu
  • Wenbo Ding

Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces the challenge of suboptimal performance. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.
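The two ingredients named in this abstract, a low-rank update and gradient masking, can be sketched minimally as follows. The masking rule below (update a random subset of entries each step) is a hypothetical simplification of RM-LoRA's mechanism, and all shapes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(A, B):
    """LoRA approximates a weight update as a low-rank product:
    Delta W = B @ A, with B in R^{d x r}, A in R^{r x k}, r << min(d, k)."""
    return B @ A

def rm_lora_step(A, grad_A, lr=1e-2, keep_prob=0.5, rng=rng):
    """Gradient masking: each step updates only a random subset of entries.
    The paper argues that masking (together with regularization) encourages a
    higher intrinsic dimension of the learned update; this sketch only shows
    the masked-update shape, not the authors' exact rule."""
    mask = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    return A - lr * grad_A * mask
```

The rank of `lora_delta(A, B)` can never exceed `r`, which is what keeps the trainable-parameter budget small regardless of how the entries are masked.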

NeurIPS Conference 2025 Conference Paper

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

  • Runyu Lu
  • Peng Zhang
  • Ruochuan Shi
  • Yuanheng Zhu
  • Dongbin Zhao
  • Yang Liu
  • Dong Wang
  • Cesare Alippi

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. These two generalizability properties, to our knowledge, are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the equilibrium policy for each single graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates pure-strategy Nash equilibrium with near-optimal time complexity. To guarantee scalability with respect to pursuer number, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance in various unseen real-world graphs. Besides, when trained under an equilibrium heuristic proposed for the graphs with exits, our generalized pursuer policy can even match the performance of the fine-tuned policies from the state-of-the-art PEG methods.

ICLR Conference 2025 Conference Paper

Erasing Concept Combination from Text-to-Image Diffusion Model

  • Hongyi Nie
  • Quanming Yao
  • Yang Liu
  • Zhen Wang 0004
  • Yatao An Bian

Advancements in the text-to-image diffusion model have raised security concerns due to their potential to generate images with inappropriate themes such as societal biases and copyright infringements. Current studies have made notable progress in preventing the model from generating images containing specific high-risk visual concepts. However, these methods neglect the issue that inappropriate themes may also arise from the combination of benign visual concepts. A crucial challenge arises because the same image theme can be represented through multiple distinct visual concept combinations, and the model's ability to generate individual concepts may become distorted when processing these combinations. Consequently, effectively erasing such visual concept combinations from the diffusion model remains a formidable challenge. To tackle this problem, we formalize the problem as the Concept Combination Erasing (CCE) problem and propose a Concept Graph-based high-level Feature Decoupling framework (CoGFD) to address CCE. CoGFD identifies and decomposes visual concept combinations with a consistent image theme from an LLM-induced concept logic graph, and erases these combinations through decoupling co-occurrent high-level features. These techniques enable CoGFD to eliminate undesirable visual concept combinations while minimizing adverse effects on the generative fidelity of related individual concepts, outperforming state-of-the-art baselines. Extensive experiments across diverse visual concept combination scenarios verify the effectiveness of CoGFD.

NeurIPS Conference 2025 Conference Paper

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

  • Chandler Smith
  • Marwa Abdulhai
  • Manfred Díaz
  • Marko Tesic
  • Rakshit Trivedi
  • Sasha Vezhnevets
  • Lewis Hammond
  • Jesse Clifton

Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.

NeurIPS Conference 2025 Conference Paper

Evaluating LLM-contaminated Crowdsourcing Data Without Ground Truth

  • Yichi Zhang
  • Jinlong Pang
  • Zhaowei Zhu
  • Yang Liu

The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for structured annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction --- a mechanism that evaluates the information within workers' responses --- to mitigate LLM-assisted cheating in crowdsourcing with a focus on annotation tasks. Our method quantifies the correlations between worker answers while conditioning on (a subset of) LLM-generated labels available to the requester. Building on prior research, we propose a training-free scoring mechanism with theoretical guarantees under a novel model that accounts for LLM collusion. We establish conditions under which our method is effective and empirically demonstrate its robustness in detecting low-effort cheating on real-world crowdsourcing datasets.
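The conditional-agreement idea behind peer prediction can be illustrated in a few lines: score a worker by how much their answers agree with a peer's beyond what independent answering would produce. This is a toy sketch of the general mechanism, not the paper's scoring rule; `peer_score` and its inputs are hypothetical:

```python
from collections import Counter

def peer_score(answers_a, answers_b):
    """Toy peer-prediction score: observed agreement between two workers
    minus the agreement expected if their answers were independent.
    A score near zero suggests uninformative (constant or random) labels."""
    n = len(answers_a)
    observed = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    freq_a = Counter(answers_a)
    freq_b = Counter(answers_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return observed - expected
```

A worker who submits a constant answer (or one statistically unrelated to the peer's) scores near zero, while genuinely informative workers score positively; the paper conditions such correlations on LLM-generated labels to detect collusion.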

AAAI Conference 2025 Conference Paper

Exploring Enhanced Contextual Information for Video-Level Object Tracking

  • Ben Kang
  • Xin Chen
  • Simiao Lai
  • Yang Liu
  • Yi Liu
  • Dong Wang

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the Mamba layer and the cross-attention layer. The Mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it achieves 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing a new state-of-the-art performance.

NeurIPS Conference 2025 Conference Paper

Exploring Structural Degradation in Dense Representations for Self-supervised Learning

  • Siran Dai
  • Qianqian Xu
  • Peisong Wen
  • Yang Liu
  • Qingming Huang

In this work, we observe a counterintuitive phenomenon in self-supervised learning (SSL): longer training may impair the performance of dense prediction tasks (e.g., semantic segmentation). We refer to this phenomenon as Self-supervised Dense Degradation (SDD) and demonstrate its consistent presence across sixteen state-of-the-art SSL methods with various losses, architectures, and datasets. When the model performs suboptimally on dense tasks at the end of training, measuring the performance during training becomes essential. However, evaluating dense performance effectively without annotations remains an open challenge. To tackle this issue, we introduce a Dense representation Structure Estimator (DSE), composed of a class-relevance measure and an effective dimensionality measure. The proposed DSE is both theoretically grounded and empirically validated to be closely correlated with the downstream performance. Based on this metric, we introduce a straightforward yet effective model selection strategy and a DSE-based regularization method. Experiments on sixteen SSL methods across four benchmarks confirm that model selection improves mIoU by $3.0\%$ on average with negligible computational cost. Additionally, DSE regularization consistently mitigates the effects of dense degradation. Code is available at \url{https://github.com/EldercatSAM/SSL-Degradation}.
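An effective-dimensionality measure of the kind DSE builds on can be sketched as a participation ratio over the eigenvalue spectrum of the feature covariance. This is a toy illustration under that assumption; the paper's actual estimator also includes a class-relevance term:

```python
import numpy as np

def effective_dimensionality(features):
    """Participation-ratio effective rank of a feature matrix (n_samples x dim):
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues).
    Equals dim for isotropic features and ~1 for rank-1 (collapsed) features."""
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / len(X)
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0, None)  # guard against tiny negative round-off
    return eig.sum() ** 2 / (eig ** 2).sum()
```

Tracking such a quantity during training, without labels, is the kind of signal that makes annotation-free model selection possible.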

AAAI Conference 2025 Conference Paper

FastPERT: Towards Fast Microservice Application Latency Prediction via Structural Inductive Bias over PERT Networks

  • Da Sun Handason Tam
  • Huanle Xu
  • Yang Liu
  • Siyue Xie
  • Wing Cheong Lau

The recent surge in popularity of cloud-native applications using microservice architectures has led to a focus on accurate end-to-end latency prediction for proactive resource allocation. Existing models apply Graph Transformers to Microservice Call Graphs or Program Evaluation and Review Technique (PERT) graphs to capture complex temporal dependencies between microservices. However, these models incur a high computational cost during both training and inference phases. This paper introduces FastPERT, an efficient model for predicting end-to-end latency in microservice applications. FastPERT dissects an execution trace into several microservice tasks, using observations from prior execution traces of the application, akin to the PERT approach. Subsequently, a prediction model is constructed to estimate the completion time for each individual task. This information, coupled with the computational and structural inductive bias of the PERT graph, facilitates the efficient computation of the end-to-end latency of an execution trace. As a result, FastPERT can efficiently capture the complex temporal causality of different microservice tasks without relying on Graph Neural Networks, leading to more accurate and robust latency predictions across a variety of applications. An evaluation based on datasets generated from large-scale Alibaba microservice traces reveals that FastPERT significantly improves training and inference efficiency without compromising performance, demonstrating its potential as a superior solution for real-time end-to-end latency prediction in cloud-native microservice applications.
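Once per-task durations are predicted, the PERT-style structural bias reduces to a longest-path (critical-path) computation over the task dependency DAG. A minimal sketch with hypothetical inputs (the paper learns the durations; here they are given):

```python
def pert_end_to_end_latency(durations, deps):
    """Critical-path completion time over a task DAG.
    durations: {task: predicted duration}; deps: {task: [prerequisite tasks]}.
    A task starts when its slowest prerequisite finishes."""
    finish = {}

    def finish_time(task):
        if task not in finish:
            start = max((finish_time(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    return max(finish_time(t) for t in durations)
```

For example, two parallel tasks of 2 s and 3 s feeding a 1 s task give an end-to-end latency of 4 s, since only the slower branch matters.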

NeurIPS Conference 2025 Conference Paper

Federated Continual Learning via Orchestrating Multi-Scale Expertise

  • Xiaoyang Yi
  • Yang Liu
  • Binhan Yang
  • Jian Zhang

Federated continual learning (FCL) aims to maintain the model's performance on old tasks (i.e., stability) while enhancing its ability to acquire knowledge from current tasks (i.e., plasticity). With the development of pre-trained models (PTMs), fine-tuning PTMs on clients has become a promising approach to leveraging their extensive knowledge in FCL. In this paper, we propose MultiFCL, a novel FCL framework that fine-tunes PTMs to adapt to FCL while preserving their strong generalization capabilities. Specifically, to ensure stability, MultiFCL introduces lightweight adapters for task adaptation, which are subsequently frozen to prevent catastrophic forgetting. Moreover, by utilizing the semantic features of old tasks, MultiFCL performs multi-modal initialization of new task class prototypes. To enhance plasticity, MultiFCL employs a multi-expert training mechanism that integrates multi-scale feature learning with multi-teacher dynamic self-distillation. Through intra-client and inter-client expert communication, MultiFCL facilitates cross-task and cross-client knowledge fusion. Experimental results demonstrate that MultiFCL achieves state-of-the-art performance across multiple datasets and settings, showcasing its effectiveness in FCL scenarios.

AAAI Conference 2025 Conference Paper

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

  • Xueyi Wang
  • Lele Zhang
  • Zheng Fan
  • Yang Liu
  • Chen Chen
  • Fang Deng

Cross-view geo-localization aims at determining the geographic location of a query image by matching it against reference images. The matching pairs can be captured from diverse perspectives, such as those from satellites and drones. Most existing methods are supervised, requiring location-labeled images or matched and unmatched image pairs for training, which results in high labor costs. Moreover, current unsupervised methods perform instance matching directly between perspectives with dramatic discrepancies, resulting in poor performance. To address these issues, this paper proposes a novel matching and alignment framework that proceeds from the coarse instance-cluster level to the fine intermediate instance level for unsupervised cross-view geo-localization. We first introduce cluster-based contrastive learning, assigning pseudo-labels to instances and generating clusters within each view. Then we design a cross-view location alignment module that fully exploits the feature relationships between instances and clusters both within and across views. Finally, we design an intermediate state transition module that facilitates further alignment between views by constructing intermediate states and bringing both views closer to the intermediate domain simultaneously. Extensive experiments demonstrate that our method surpasses state-of-the-art unsupervised cross-view geo-localization methods and even achieves comparable performance to state-of-the-art supervised methods.

NeurIPS Conference 2025 Conference Paper

Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations

  • Panqi Chen
  • Yifan Sun
  • Lei Cheng
  • Yang Yang
  • Weichang Li
  • Yang Liu
  • Weiqing Liu
  • Jiang Bian

Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling shows promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, but struggle with the sparsely observed and continuous nature of real-world physical dynamics. To fill the gaps, we present SDIFT, Sequential DIffusion in Functional Tucker space, a novel framework that generates full-field evolution of physical dynamics from irregular sparse observations. SDIFT leverages the functional Tucker model as the latent space representer with proven universal approximation property, and represents sparse observations as latent functions and Tucker core sequences. We then construct a sequential diffusion model with temporally augmented UNet in the functional Tucker space, denoising noise drawn from a Gaussian process to generate the sequence of core tensors. At the posterior sampling stage, we propose a Message-Passing Posterior Sampling mechanism, enabling conditional generation of the entire sequence guided by observations at limited time steps. We validate SDIFT on three physical systems spanning astronomical (supernova explosions, light-year scale), environmental (ocean sound speed fields, kilometer scale), and molecular (organic liquid, millimeter scale) domains, demonstrating significant improvements in both reconstruction accuracy and computational efficiency compared to state-of-the-art approaches.

AAAI Conference 2025 Conference Paper

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

  • Dezhao Luo
  • Shaogang Gong
  • Jiabo Huang
  • Hailin Jin
  • Yang Liu

Video moment retrieval (VMR) aims to locate the most likely video moment(s) corresponding to a text query in untrimmed videos. Training of existing methods is limited by the lack of diverse and generalisable VMR datasets, hindering their ability to generalise moment-text associations to queries containing novel semantic concepts (unseen both visually and textually in a training source domain). For model generalisation to novel semantics, existing methods rely heavily on the assumption of access to both video and text sentence pairs from a target domain in addition to the source domain pair-wise training data. This is neither practical nor scalable. In this work, we introduce a more generalisable approach by assuming only that text sentences describing new semantics are available in model training, without having seen any videos from a target domain. To that end, we propose a Fine-grained Video Editing framework, termed FVE, that explores generative video diffusion to facilitate fine-grained video editing from the seen source concepts to the unseen target sentences consisting of new concepts. This enables generative hypotheses of unseen video moments corresponding to the novel concepts in the target domain. This fine-grained generative video diffusion retains the original video structure and subject specifics from the source domain while introducing semantic distinctions of unseen novel vocabularies in the target domain. A critical challenge is how to make this generative fine-grained diffusion process meaningful in optimising VMR, rather than just synthesising visually pleasing videos. We solve this problem by introducing a hybrid selection mechanism that integrates three quantitative metrics to selectively incorporate synthetic video moments (novel video hypotheses) as enlarged additions to the original source training data, whilst minimising potential detrimental noise or unnecessary repetitions in the novel synthetic videos harmful to VMR learning. Experiments on three datasets demonstrate the effectiveness of FVE on unseen novel semantic video moment retrieval tasks.

IROS Conference 2025 Conference Paper

GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric

  • Yue Lin
  • Xiaoxuan Zhang
  • Yang Liu
  • Dong Wang 0004
  • Huchuan Lu

Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the geometric feature metric, which enhances LiDAR localization accuracy by guiding the robot to avoid degraded areas. First, we derive the Geometric Feature Metric (GFM) from the fundamental LiDAR localization problem. Next, we design a 2D grid-based Metric Encoding Map (MEM) to efficiently store GFM values across the environment. A constant-time decoding algorithm is further proposed to retrieve GFM values for arbitrary poses from the MEM. Finally, we develop a perception-aware trajectory planning algorithm that improves LiDAR localization capabilities by guiding the robot in selecting trajectories through feature-rich areas. Both simulation and real-world experiments demonstrate that our approach enables the robot to actively select trajectories that significantly enhance LiDAR localization accuracy.

EAAI Journal 2025 Journal Article

Health- and behavior-aware energy management strategy for fuel cell hybrid electric vehicles based on parallel deep deterministic policy gradient learning

  • Haochen Sun
  • Jing Li
  • Chun Cheng
  • Suzhen Shi
  • Jing Wang
  • Jingjing Lin
  • Yang Liu

Most existing research on the energy management strategy (EMS) of fuel cell hybrid electric vehicles (FCHEVs) focuses on external driving conditions, while the driver's behavior, an equally important internal factor, also needs to be taken into account. In this paper, a health- and behavior-aware two-layer hierarchical energy management framework using an improved adaptive parallel deep deterministic policy gradient (DDPG) learning algorithm is proposed for obtaining the optimal EMS of a multi-source FCHEV. In the upper layer, machine learning approaches are employed to recognize the driver's behavior in real time, and Pontryagin's minimum principle is applied to calculate the optimal equivalent factor for each driver behavior. In the lower layer, to protect the service life of the fuel cell and battery as well as to increase learning efficiency, an adaptive fuzzy filter is used, and a health- and behavior-aware multi-objective adaptive equivalent consumption minimization strategy model is constructed and solved by an improved adaptive parallel DDPG-based algorithm. Simulation results show that the EMS obtained by the proposed DDPG algorithm can achieve the highest fuel cell (FC) working efficiency (approximately 56%), markedly reduce the degree of battery (BAT) degradation from 0.42% to 0.28%, and achieve a 9.24% reduction in total usage cost compared with a deep Q network (DQN)-based EMS.
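The equivalent consumption minimization step can be illustrated with a toy power-split rule: choose the fuel-cell power level that minimizes fuel use plus an equivalence-factor-weighted battery discharge. This is a deliberately simplified sketch (constant efficiencies, no health or filtering terms), not the paper's multi-objective model; all names are hypothetical:

```python
def ecms_split(p_demand, equivalence_factor, fc_power_levels):
    """Toy equivalent consumption minimization: among candidate fuel-cell
    power levels, pick the one minimizing (fuel-cell power as a fuel proxy)
    plus equivalence-factor-weighted battery discharge power."""
    def equivalent_cost(p_fc):
        p_batt = p_demand - p_fc              # battery covers the remainder
        return p_fc + equivalence_factor * max(p_batt, 0.0)
    return min(fc_power_levels, key=equivalent_cost)
```

With a high equivalence factor (battery energy is "expensive"), the rule shifts load onto the fuel cell; with a low one, onto the battery. The upper layer in the paper adapts this factor to the recognized driver behavior.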

AAAI Conference 2025 Conference Paper

Human and AI Perceptual Differences in Image Classification Errors

  • Minghao Liu
  • Jiaheng Wei
  • Yang Liu
  • James Davis

Artificial intelligence (AI) models for computer vision trained with supervised machine learning are assumed to solve classification tasks by imitating human behavior learned from training labels. Most efforts in recent vision research focus on measuring model task performance using standardized benchmarks such as accuracy. However, limited work has sought to understand the perceptual difference between humans and machines. To fill this gap, this study first analyzes the statistical distributions of mistakes from the two sources, and then explores how task difficulty level affects these distributions. We find that even when AI learns an excellent model from the training data, one that outperforms humans in overall accuracy, these AI models have significant and consistent differences from human perception. We demonstrate the importance of studying these differences with a simple human-AI teaming algorithm that outperforms humans alone, AI alone, or AI-AI teaming.

IJCAI Conference 2025 Conference Paper

In-Context Meta LoRA Generation

  • Yihua Shao
  • Minxi Yan
  • Yang Liu
  • Siyu Chen
  • Wenjie Chen
  • Xinwei Long
  • Ziyang Yan
  • Lei Li

Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with LLMs to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, to capture the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods and is useful for implementing task-specific enhancements of LoRA parameters. At the same time, our method occupies only 283 MB, about 1% of the storage required by the original LoRA models. The code is available at https://github.com/YihuaJerry/ICM-LoRA.
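The decoder side of this idea, mapping a task embedding plus a latent sample to low-rank LoRA factors whose product is the weight update, can be sketched as follows. Everything here is hypothetical (random projections stand in for the trained CVAE decoder); the point is only the shape of the output, a rank-limited update that merges into a frozen weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)  # stand-in for trained decoder weights

def decode_lora(task_embedding, latent, d_model=8, rank=2):
    """Hypothetical decoder sketch: map a task embedding and a latent sample
    to low-rank LoRA factors A (rank x d_model) and B (d_model x rank);
    the adapter update merged into the frozen weight is B @ A."""
    z = np.concatenate([task_embedding, latent])
    W_a = rng.standard_normal((rank * d_model, z.size)) * 0.01
    W_b = rng.standard_normal((d_model * rank, z.size)) * 0.01
    A = (W_a @ z).reshape(rank, d_model)
    B = (W_b @ z).reshape(d_model, rank)
    return B @ A
```

Because the update is a product of rank-`rank` factors, the generated matrix never exceeds that rank, which is what keeps the generated adapters small regardless of `d_model`.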

NeurIPS Conference 2025 Conference Paper

Incentivizing LLMs to Self-Verify Their Answers

  • Fuxiang Zhang
  • Jiacheng Xu
  • Chaojie Wang
  • Ce Cui
  • Yang Liu
  • Bo An

Large Language Models (LLMs) have demonstrated remarkable progress in complex reasoning tasks through both post-training and test-time scaling laws. While prevalent test-time scaling approaches are often realized by using external reward models to guide the model generation process, we find that only marginal gains can be acquired when scaling a model post-trained on specific reasoning tasks. We identify that the limited improvement stems from distribution discrepancies between the specific post-trained generator and the general reward model. To address this, we propose a framework that incentivizes LLMs to self-verify their own answers. By unifying answer generation and verification within a single reinforcement learning (RL) process, we train models that can effectively assess the correctness of their own solutions. The trained model can further scale its performance at inference time by verifying its generations, without the need for external verifiers. We train our self-verification models based on Qwen2.5-Math-7B and DeepSeek-R1-Distill-Qwen-1.5B, demonstrating their capabilities across varying reasoning context lengths. Experiments on multiple mathematical reasoning benchmarks show that our models can not only improve post-training performance but also enable effective test-time scaling. Our code is available at https://github.com/mansicer/self-verification.

NeurIPS Conference 2025 Conference Paper

INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning

  • Wujian Peng
  • Lingchen Meng
  • Yitong Chen
  • Yiweng Xie
  • Yang Liu
  • Tao Gui
  • Hang Xu
  • Xipeng Qiu

Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic level, they still struggle with instance-level understanding that requires a more fine-grained comprehension and alignment. Instance-level understanding is crucial for LMMs, as it focuses on the specific elements that we are most interested in. Excitingly, existing works find that the state-of-the-art LMMs exhibit strong instance understanding capabilities when provided with explicit visual cues. Motivated by this, we propose Inst-IT, a solution to enhance LMMs in Instance understanding via explicit visual prompt Instruction Tuning for instance guidance. Inst-IT consists of a benchmark to diagnose multimodal instance-level understanding, a large-scale instruction-tuning dataset, and a continuous instruction-tuning training paradigm to effectively enhance the spatial-temporal instance understanding capabilities of existing LMMs. Experimental results show that, enhanced by Inst-IT, our models not only achieve outstanding performance on Inst-IT-Bench and other instance understanding benchmarks, but also demonstrate significant improvements across various generic image and video understanding benchmarks. This highlights that our method not only boosts instance-level understanding but also strengthens the overall capabilities of generic image and video comprehension.

EAAI Journal 2025 Journal Article

Knowledge-enhanced multi-objective memetic algorithm for energy-efficient flexible job shop scheduling with limited multi-load automated guided vehicles

  • Lianghua Fan
  • Qi Lei
  • Yuchuan Song
  • Yang Liu
  • Yunfan Yang

In alignment with the national call for energy conservation and emission reduction, energy-efficient scheduling in manufacturing, especially in intelligent workshops, has become a key research area. Automated guided vehicles (AGVs), as the core component of intelligent logistics systems, especially multi-load AGVs, play a vital role in improving green manufacturing and optimizing logistics efficiency. While AGV transportation is considered in traditional energy-saving scheduling, most studies assume unlimited AGVs, each of which can only load one job. This paper is the first to study energy-efficient flexible job shop scheduling with limited multi-load AGVs (EFJSP-LMA), which integrates the sequencing of pickup and delivery tasks with the allocation strategy of machines and AGVs. To address this problem effectively, a multi-objective mixed-integer programming (MMIP) model is developed to optimize the makespan and total energy consumption. To solve the MMIP model, a knowledge-enhanced multi-objective memetic algorithm (KMMA) is proposed. In the proposed KMMA, problem-specific heuristics are designed to generate a high-quality initial population with strong convergence and diversity. Subsequently, five knowledge-enhanced variable neighborhood structures are designed to enhance the quality and diversity of solutions. Additionally, an energy-saving strategy is incorporated to further optimize energy consumption. The effect of AGV quantity and load modes on the performance of the production system is studied and analyzed. Furthermore, experimental results on 60 test instances indicate that KMMA outperforms comparison algorithms, demonstrating its effectiveness in addressing the EFJSP-LMA. Finally, real-world case studies further support our research, offering valuable insights for managing manufacturing environments.

NeurIPS Conference 2025 Conference Paper

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

  • Wanhua Li
  • Yujie Zhao
  • Minghan Qin
  • Yang Liu
  • Yuanhao Cai
  • Chuang Gan
  • Hanspeter Pfister

In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42× speedup and a 47× boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2, assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Codes and demos are available at our project page: https://langsplat-v2.github.io.
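The core trick described above, splat only ultra-low-dimensional sparse coefficients, then decode once per pixel with a shared global dictionary, can be sketched in plain NumPy. Shapes and names here are hypothetical (the paper performs this inside an optimized CUDA splatting kernel):

```python
import numpy as np

def splat_feature(alpha_weights, coeffs, dictionary):
    """Render one pixel's high-dimensional feature: alpha-blend each Gaussian's
    low-dimensional coefficient vector first, then decode once with the shared
    dictionary. coeffs: (n_gaussians x n_atoms), dictionary: (n_atoms x feat_dim)."""
    blended = np.asarray(alpha_weights) @ np.asarray(coeffs)  # splat cheap coefficients
    return blended @ np.asarray(dictionary)                   # single decode per pixel
```

Because blending and decoding are both linear, blending coefficients first gives the same result as blending full high-dimensional features, while the per-Gaussian work scales with the small number of dictionary atoms rather than the CLIP feature dimension.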

NeurIPS Conference 2025 Conference Paper

Latent Retrieval Augmented Generation of Cross-Domain Protein Binders

  • Zishen Zhang
  • Xiangzhe Kong
  • Wenbing Huang
  • Yang Liu

Designing protein binders targeting specific sites, which requires generating realistic and functional interaction patterns, is a fundamental challenge in drug discovery. Current structure-based generative models are limited in generating interfaces with sufficient rationality and interpretability. In this paper, we propose Retrieval-Augmented Diffusion for Aligned interface (RADiAnce), a new framework that leverages known interfaces to guide the design of novel binders. By unifying retrieval and generation in a shared contrastive latent space, our model efficiently identifies relevant interfaces for a given binding site and seamlessly integrates them through a conditional latent diffusion generator, enabling cross-domain interface transfer. Extensive experiments show that RADiAnce significantly outperforms baseline models across multiple metrics, including binding affinity and recovery of geometries and interactions. Additional experimental results validate cross-domain generalization, demonstrating that retrieving interfaces from diverse domains, such as peptides, antibodies, and protein fragments, enhances the generation performance of binders for other domains. Our work establishes a new paradigm for protein binder design that successfully bridges retrieval-based knowledge and generative AI, opening new possibilities for drug discovery.

NeurIPS Conference 2025 Conference Paper

Learning 3D Anisotropic Noise Distributions Improves Molecular Force Fields

  • Xixian Liu
  • Rui Jiao
  • Zhiyuan Liu
  • Yurou Liu
  • Yang Liu
  • Ziheng Lu
  • Wenbing Huang
  • Yang Zhang

Coordinate denoising has emerged as a promising method for 3D molecular pretraining due to its theoretical connection to learning molecular force fields. However, existing denoising methods rely on oversimplified molecular dynamics that assume atomic motions to be isotropic and homoscedastic. To address these limitations, we propose a novel denoising framework, AniDS: Anisotropic Variational Autoencoder for 3D Molecular Denoising. AniDS introduces a structure-aware anisotropic noise generator that can produce atom-specific, full covariance matrices for Gaussian noise distributions to better reflect directional and structural variability in molecular systems. These covariances are derived from pairwise atomic interactions as anisotropic corrections to an isotropic base. Our design ensures that the resulting covariance matrices are symmetric, positive semi-definite, and SO(3)-equivariant, while providing greater capacity to model complex molecular dynamics. Extensive experiments show that AniDS outperforms prior isotropic and homoscedastic denoising models and other leading methods on the MD17 and OC22 benchmarks, achieving average relative improvements of 8.9% and 6.2% in force prediction accuracy. Our case study on a crystal and a molecular structure shows that AniDS adaptively suppresses noise along the bonding direction, consistent with physicochemical principles. Our code is available at https://github.com/ZeroKnighting/AniDS.
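The "isotropic base plus anisotropic correction" construction can be written directly: adding a term of the form C Cᵀ to a scaled identity is symmetric and positive semi-definite by construction. A toy sketch of just this building block (the paper additionally derives C from pairwise interactions and enforces SO(3)-equivariance):

```python
import numpy as np

def anisotropic_covariance(correction, base_scale=1.0):
    """Build a symmetric PSD 3x3 covariance as an isotropic base plus an
    anisotropic correction C @ C.T (PSD for any real matrix C).
    Every eigenvalue is at least base_scale."""
    C = np.asarray(correction, dtype=float).reshape(3, 3)
    return base_scale * np.eye(3) + C @ C.T
```

The base term keeps the distribution non-degenerate, while the learned correction stretches noise along structure-dependent directions such as chemical bonds.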

NeurIPS Conference 2025 Conference Paper

Learning CAD Modeling Sequences via Projection and Part Awareness

  • Yang Liu
  • Daxuan Ren
  • Yijie Ding
  • Jianmin Zheng
  • Fang Deng

This paper presents PartCAD, a novel framework for reconstructing CAD modeling sequences directly from point clouds by projection-guided, part-aware geometry reasoning. It consists of (1) an autoregressive approach that decomposes point clouds into part-aware latent representations, serving as interpretable anchors for CAD generation; (2) a projection guidance module that provides explicit cues about underlying design intent via triplane projections; and (3) a non-autoregressive decoder to generate sketch-extrusion parameters in a single forward pass, enabling efficient and structurally coherent CAD instruction synthesis. By bridging geometric signals and semantic understanding, PartCAD tackles the challenge of reconstructing editable CAD models—capturing underlying design processes—from 3D point clouds. Extensive experiments show that PartCAD significantly outperforms existing methods for CAD instruction generation in both accuracy and robustness. The work sheds light on part-driven reconstruction of interpretable CAD models, opening new avenues in reverse engineering and CAD automation.

NeurIPS Conference 2025 Conference Paper

Learning Counterfactual Outcomes Under Rank Preservation

  • Peng Wu
  • Haoxuan Li
  • Chunyuan Zheng
  • Yan Zeng
  • Jiawei Chen
  • Yang Liu
  • Ruocheng Guo
  • Kun Zhang

Counterfactual inference aims to estimate the counterfactual outcome at the individual level given knowledge of an observed treatment and the factual outcome, with broad applications in fields such as epidemiology, econometrics, and management science. Previous methods rely on a known structural causal model (SCM) or assume the homogeneity of the exogenous variable and strict monotonicity between the outcome and exogenous variable. In this paper, we propose a principled approach for identifying and estimating the counterfactual outcome. We first introduce a simple and intuitive rank preservation assumption to identify the counterfactual outcome without relying on a known structural causal model. Building on this, we propose a novel ideal loss for theoretically unbiased learning of the counterfactual outcome and further develop a kernel-based estimator for its empirical estimation. Our theoretical analysis shows that the rank preservation assumption is no stronger than the homogeneity and strict monotonicity assumptions, that the proposed ideal loss is convex, and that the proposed estimator is unbiased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed method.
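As a rough illustration of the kind of kernel-based estimator the abstract alludes to, here is a generic Nadaraya-Watson kernel regressor; the paper's actual loss and estimator differ, and all names below are illustrative.

```python
import numpy as np

# Hedged sketch: generic Nadaraya-Watson kernel regression, illustrating the
# general family of kernel-based estimators; not the paper's actual estimator.
def kernel_estimate(x0: float, xs: np.ndarray, ys: np.ndarray,
                    bandwidth: float = 0.5) -> float:
    w = np.exp(-((xs - x0) ** 2) / (2 * bandwidth ** 2))  # Gaussian kernel weights
    return float(np.sum(w * ys) / np.sum(w))
```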

AAAI Conference 2025 Conference Paper

Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval

  • Yang Liu
  • Shudong Huang
  • Deng Xiong
  • Jiancheng Lv

Text-video retrieval is a foundational task in multi-modal research which aims to align texts and videos in the embedding space. The key challenge is to learn the similarity between videos and texts. A conventional approach involves directly aligning video-text pairs using cosine similarity. However, due to the disparity in the information conveyed by videos and texts, i.e., a single video can be described from multiple perspectives, the retrieval accuracy is suboptimal. An alternative approach employs cross-modal interaction to enable videos to dynamically acquire distinct features from various texts, thus facilitating similarity calculations. Nevertheless, this solution incurs a computational complexity of O(n^2) during retrieval. To this end, this paper proposes a novel method called Bidirectional Hierarchical Sliding Semantic Probe (BiHSSP), which calculates dynamic similarity between videos and texts with O(n) complexity during retrieval. We introduce a hierarchical semantic probe module that learns semantic probes at different scales for both video and text features. The semantic probes perform a sliding calculation of the cross-correlation between probes at different scales and embeddings from the other modality, allowing for dynamic similarity computation between video and text descriptions from various perspectives. Specifically, for text descriptions from different angles, we calculate the similarity at different locations within the video features and vice versa. This approach preserves the complete information of the video while addressing the issue of unequal information between video and text, without requiring cross-modal interaction. Additionally, our method can function as a plug-and-play module across various methods, thereby enhancing the corresponding performance. Experimental results demonstrate that our BiHSSP significantly outperforms the baseline.
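The sliding idea behind the O(n) claim can be sketched as follows (hypothetical names, not the BiHSSP implementation): slide a small text-derived probe along per-frame video features and keep the best-matching position, so a query costs linear time in the video length rather than full cross-modal interaction.

```python
import numpy as np

# Illustrative sketch of a sliding semantic probe (not the paper's code):
# cross-correlate a small probe with every window of the frame features and
# return the best match; one pass over n frames, i.e. O(n) per query.
def sliding_probe_similarity(video_feats: np.ndarray, probe: np.ndarray) -> float:
    """video_feats: (n, d) frame features; probe: (w, d) with w <= n."""
    n, w = video_feats.shape[0], probe.shape[0]
    scores = [float(np.sum(video_feats[i:i + w] * probe)) for i in range(n - w + 1)]
    return max(scores)
```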

NeurIPS Conference 2025 Conference Paper

LIFEBENCH: Evaluating Length Instruction Following in Large Language Models

  • Wei Zhang
  • Zhenhong Zhou
  • Kun Wang
  • Junfeng Fang
  • Rongwu Xu
  • Yuanhe Zhang
  • Rui Wang
  • Ge Zhang

While large language models (LLMs) can solve PhD-level reasoning problems over long context inputs, they still struggle with a seemingly simpler task: following explicit length instructions, e.g., write a 10,000-word novel. In practice, models often generate outputs that are far too short, terminate prematurely, or even refuse the request. Existing benchmarks focus primarily on evaluating generation quality, but often overlook whether the generations meet length constraints. To this end, we introduce the Length Instruction Following Evaluation Benchmark (LIFEBench) to comprehensively evaluate LLMs' ability to follow length instructions across diverse tasks and a wide range of specified lengths. LIFEBench consists of 10,800 instances across 4 task categories in both English and Chinese, covering length constraints ranging from 16 to 8192 words. We evaluate 26 widely-used LLMs and find that most models reasonably follow short-length instructions but deteriorate sharply beyond a certain threshold. Surprisingly, almost all models fail to reach the vendor-claimed maximum output lengths in practice, as further confirmed by our evaluations extending up to 32K words. Even long-context LLMs, despite their extended input-output windows, counterintuitively fail to improve length-instruction following. Notably, reasoning LLMs outperform even specialized long-text generation models, achieving state-of-the-art length following. Overall, LIFEBench uncovers fundamental limitations in current LLMs' length-instruction-following ability, offering critical insights for future progress.
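One simple way to score length-instruction following, shown here purely for illustration (this is not LIFEBench's official scorer), is the relative deviation of the word count from the instructed target, clipped to [0, 1].

```python
# Illustrative sketch (not the benchmark's scorer): score an output by how far
# its word count deviates from the instructed target, clipped to [0, 1].
def length_score(text: str, target_words: int) -> float:
    n = len(text.split())
    return max(0.0, 1.0 - abs(n - target_words) / target_words)
```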

AAAI Conference 2025 Conference Paper

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

  • Zhiming Li
  • Junzhe Jiang
  • Yushi Cao
  • Aixin Cui
  • Bozhi Wu
  • Bo Li
  • Yang Liu
  • Danny Dongning Sun

Deep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective at identifying market trends, causing them to miss good trading opportunities or suffer large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. However, such knowledge is abstract and hard to quantify. To effectively leverage abstract human expert knowledge, in this paper we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations on two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.

TMLR Journal 2025 Journal Article

Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

  • Usman Gohar
  • Zeyu Tang
  • Jialu Wang
  • Kun Zhang
  • Peter Spirtes
  • Yang Liu
  • Lu Cheng

The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.

AAAI Conference 2025 Conference Paper

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

  • Yuhao Wang
  • Xuehu Liu
  • Tianyu Yan
  • Yang Liu
  • Aihua Zheng
  • Pingping Zhang
  • Huchuan Lu

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. To be specific, we first employ a Parallel Feed-Forward Adapter (PFA) for adapting CLIP to multi-modal object ReID. Then, we propose the Synergistic Residual Prompt (SRP) to guide the joint learning of multi-modal features. Finally, leveraging Mamba's superior scalability for long sequences, we introduce Mamba Aggregation (MA) to efficiently model interactions between different modalities. As a result, MambaPro can extract more robust features with lower complexity. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods.

NeurIPS Conference 2025 Conference Paper

Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting

  • Zinan Zheng
  • Yang Liu
  • Jia Li

Graph neural networks have shown promising results in weather forecasting, which is critical for human activities such as agriculture planning and extreme weather preparation. However, most studies focus on finite and local areas for training, overlooking the influence of broader areas and limiting their ability to generalize effectively. Thus, in this work, we study global weather forecasting, where observations are irregularly distributed and dynamically varying in practice, requiring the model to generalize to unobserved locations. To address such challenges, we propose a general Mesh Interpolation Graph Network (MIGN) that models irregular weather station forecasting, consisting of two key designs: (1) learning spatially irregular data with a regular mesh interpolation network to align the data; (2) leveraging parametric spherical harmonics location embedding to further enhance spatial generalization ability. Extensive experiments on an up-to-date observation dataset show that MIGN significantly outperforms existing data-driven models. Besides, we show that MIGN has spatial generalization ability and is capable of generalizing to previously unseen stations.

NeurIPS Conference 2025 Conference Paper

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models

  • Zimeng Huang
  • Jinxin Ke
  • Xiaoxuan Fan
  • Yufeng Yang
  • Yang Liu
  • Liu Zhonghan
  • Zedi Wang
  • Junteng Dai

Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In this work, we aim to evaluate a fundamental yet underexplored intelligence: association, a cornerstone of human cognition for creative thinking and knowledge integration. Current benchmarks, often limited to closed-ended tasks, fail to capture the complexity of open-ended association reasoning vital for real-world applications. To address this, we present MM-OPERA, a systematic benchmark with 11,497 instances across two open-ended tasks: Remote-Item Association (RIA) and In-Context Association (ICA), aligning association intelligence evaluation with human psychometric principles. It challenges LVLMs to resemble the spirit of divergent thinking and convergent associative reasoning through free-form responses and explicit reasoning paths. We deploy tailored LLM-as-a-Judge strategies to evaluate open-ended outputs, applying process-reward-informed judgment to dissect reasoning with precision. Extensive empirical studies on state-of-the-art LVLMs, including sensitivity analysis of task instances, validity analysis of LLM-as-a-Judge strategies, and diversity analysis across abilities, domains, languages, cultures, etc., provide a comprehensive and nuanced understanding of the limitations of current LVLMs in associative reasoning, paving the way for more human-like and general-purpose AI. The dataset and code are available at https://github.com/MM-OPERA-Bench/MM-OPERA.

NeurIPS Conference 2025 Conference Paper

MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks

  • Rui Jiao
  • Hanlin Wu
  • Wenbing Huang
  • Yuxuan Song
  • Yawen Ouyang
  • Yu Rong
  • Tingyang Xu
  • Pengju Wang

Metal-Organic Frameworks (MOFs) have attracted considerable attention due to their unique properties including high surface area and tunable porosity, and promising applications in catalysis, gas storage, and drug delivery. Structure prediction for MOFs is a challenging task, as these frameworks are intrinsically periodic and hierarchically organized, where the entire structure is assembled from building blocks like metal nodes and organic linkers. To address this, we introduce MOF-BFN, a novel generative model for MOF structure prediction based on Bayesian Flow Networks (BFNs). Given the local geometry of building blocks, MOF-BFN jointly predicts the lattice parameters, as well as the positions and orientations of all building blocks within the unit cell. In particular, the positions are modeled in the fractional coordinate system to naturally incorporate the periodicity. Meanwhile, the orientations are modeled as unit quaternions sampled from learned Bingham distributions via the proposed Bingham BFN, enabling effective orientation generation on the 4D unit hypersphere. Experimental results demonstrate that MOF-BFN achieves state-of-the-art performance across multiple tasks, including structure prediction, geometric property evaluation, and de novo generation, offering a promising tool for designing complex MOF materials.

EAAI Journal 2025 Journal Article

Multi-class Agent Trajectory Prediction with Selective State Spaces for autonomous driving

  • Jin Fan
  • Zhanwen Liu
  • Yong Fang
  • Zeyu Huang
  • Yang Liu
  • Shan Lin

Understanding and predicting the movement of multi-class agents has become more critical and challenging in diverse applications such as autonomous driving and urban intelligent monitoring. Current research mainly focuses on the motion trajectories of single-class agents. However, due to the complexity of real traffic scenarios and the variability of interactive behaviors, the motion patterns displayed by various classes of agents show inherent randomness. In this paper, inspired by the linear-time sequence model Mamba, we propose Multi-class Agent Trajectory Prediction with Selective State Spaces (MTPSS) to model the interaction between different agents and better predict the trajectory of an individual. Specifically, MTPSS models relationships in both temporal and spatial dimensions. When encoding the spatial correlation within the trajectory graph, we construct a category-based sorting approach, which places large-size category nodes behind to enhance contextual access. The sorted nodes are then bi-directionally scanned through Mamba blocks, which makes the model more robust to permutations. In the temporal dimension, considering the highly dynamic nature of rapidly moving agents, we utilize Mamba's remarkable performance on sequential data to conduct temporal scans that capture long-range temporal dependencies. Finally, to compute physically feasible trajectories, MTPSS employs a Neural Ordinary Differential Equation to smooth the predicted trajectory of the agent. We conducted extensive experiments on two publicly available traffic datasets and compared our method with state-of-the-art methods. Quantitative experiments show that our performance metrics are superior to those of state-of-the-art methods, and qualitative experiments demonstrate that the predicted trajectories have good diversity, showing the method's potential in real-world traffic scenarios.

AAAI Conference 2025 Conference Paper

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang
  • Yang Liu
  • Bo Yang

Multifaceted user modeling aims to uncover fine-grained patterns and learn representations from user data, revealing their diverse interests and characteristics, such as profile, preference, and personality. Recent studies on foundation model-based recommendation have emphasized the Transformer architecture's remarkable ability to capture complex, non-linear user-item interaction relationships. This paper aims to advance foundation model-based recommender systems by introducing enhancements to multifaceted user modeling capabilities. We propose a novel Transformer layer designed specifically for recommendation, using the self-attention mechanism to capture sequential user-item interaction patterns. Specifically, we design a group gating network to identify user groups, enabling hierarchical discovery across different layers, thereby capturing the multifaceted nature of user interests through multiple Transformer layers. Furthermore, to broaden the data scope and further enhance multifaceted user modeling, we extend the framework to a federated setting, enabling the use of private datasets while ensuring privacy. Experimental validations on benchmark datasets demonstrate the superior performance of our proposed method.

YNIMG Journal 2025 Journal Article

Neural mechanisms of emotion-focused interventions: A meta-analytic review of fMRI studies

  • Yanlin Li
  • Geng Li
  • Yang Liu
  • Chengzhen Liu
  • Antao Chen

Emotion-focused interventions are emerging as promising tools to improve emotional functioning across clinical and nonclinical populations, yet their underlying neural mechanisms remain unclear. We conducted a coordinate-based meta-analysis (CBMA) using Seed-based d Mapping (SDM) of 20 task-based fMRI studies (N = 620) to quantify bidirectional activation changes associated with emotion-focused interventions. Results showed small-to-moderate improvements in emotional task performance (Hedges' g = 0.29) and self-reported affective outcomes (g = 0.54). Meta-analytic neuroimaging revealed increased activation in the right caudate and decreased activation in the right insula and left inferior frontal gyrus. Moderator analyses identified intervention type, emotional content, and delivery format as key modulators of these neural effects. Notably, reduced insula activity predicted better emotional outcomes, while right caudate activation increased with age. These findings are consistent with a dual-pathway model of neural plasticity: one marked by frontostriatal engagement (right caudate) and another by dampened salience and semantic-control responses (right insula, left inferior frontal gyrus). The results offer mechanistic insights into how emotion-focused training recalibrates regulatory networks and inform the development of targeted interventions.

NeurIPS Conference 2025 Conference Paper

Non-stationary Equivariant Graph Neural Networks for Physical Dynamics Simulation

  • Chaohao Yuan
  • Maoji Wen
  • Ercan KURUOGLU
  • Yang Liu
  • Jia Li
  • Tingyang Xu
  • Deli Zhao
  • Hong Cheng

To enhance the generalization ability of graph neural networks (GNNs) in learning and simulating physical dynamics, a series of equivariant GNNs have been developed to incorporate the symmetric inductive bias. However, the existing methods do not take into account the non-stationary nature of physical dynamics, where the joint distribution changes over time. Moreover, previous approaches for modeling non-stationary time series typically involve normalizing the data, which disrupts the symmetric assumption inherent in physical dynamics. To model non-stationary physical dynamics while preserving the symmetric inductive bias, we introduce a Non-Stationary Equivariant Graph Neural Network (NS-EGNN) that captures the non-stationarity in physical dynamics while preserving the symmetric property of the model. Specifically, NS-EGNN employs the Fourier transform on segments of physical dynamics to extract time-varying frequency information from the trajectories. It then uses first- and second-order differences to mitigate non-stationarity, followed by pooling for future predictions. By capturing varying frequency characteristics and alleviating the linear and quadratic trends in the raw physical dynamics, NS-EGNN better models the temporal dependencies in the physical dynamics. NS-EGNN has been applied to various types of physical dynamics, including molecular, motion, and protein dynamics. In various scenarios, NS-EGNN consistently surpasses the performance of existing state-of-the-art algorithms, underscoring its effectiveness. The implementation of NS-EGNN is available at https://github.com/MaojiWEN/NS-EGNN.
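The differencing step described above can be illustrated in a few lines (illustrative only, not the NS-EGNN architecture): first- and second-order differences remove linear and quadratic trends from a trajectory, leaving a more stationary signal.

```python
import numpy as np

# Illustrative sketch of differencing for non-stationarity (not NS-EGNN code):
# a second-order difference reduces a purely quadratic trend to a constant.
t = np.arange(10, dtype=float)
traj = 3.0 * t**2 + 2.0 * t + 1.0   # trajectory with a quadratic trend
d1 = np.diff(traj, n=1)             # removes the constant; a linear trend remains
d2 = np.diff(traj, n=2)             # removes the quadratic trend entirely
```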

AAAI Conference 2025 Conference Paper

Perception-Guided Jailbreak Against Text-to-Image Models

  • Yihao Huang
  • Le Liang
  • Tianlin Li
  • Xiaojun Jia
  • Run Wang
  • Weikai Miao
  • Geguang Pu
  • Yang Liu

In recent years, Text-to-Image (T2I) models have garnered significant attention due to their remarkable advancements. However, security concerns have emerged due to their potential to generate inappropriate or Not-Safe-For-Work (NSFW) images. In this paper, inspired by the observation that texts with different semantics can lead to similar human perceptions, we propose an LLM-driven perception-guided jailbreak method, termed PGJ. It is a black-box jailbreak method that requires no specific T2I model (model-free) and generates highly natural attack prompts. Specifically, we propose identifying a safe phrase that is similar in human perception yet inconsistent in text semantics with the target unsafe word and using it as a substitution. The experiments conducted on six open-source models and commercial online services with thousands of prompts have verified the effectiveness of PGJ.

IJCAI Conference 2025 Conference Paper

PeSANet: Physics-encoded Spectral Attention Network for Simulating PDE-Governed Complex Systems

  • Han Wan
  • Rui Zhang
  • Qi Wang
  • Yang Liu
  • Hao Sun

Accurately modeling and forecasting complex systems governed by partial differential equations (PDEs) is crucial in various scientific and engineering domains. However, traditional numerical methods struggle in real-world scenarios due to incomplete or unknown physical laws. Meanwhile, machine learning approaches often fail to generalize effectively when faced with scarce observational data and the challenge of capturing local and global features. To this end, we propose the Physics-encoded Spectral Attention Network (PeSANet), which integrates local and global information to forecast complex systems with limited data and incomplete physical priors. The model consists of two key components: a physics-encoded block that uses hard constraints to approximate local differential operators from limited data, and a spectral-enhanced block that captures long-range global dependencies in the frequency domain. Specifically, we introduce a novel spectral attention mechanism to model inter-spectrum relationships and learn long-range spatial features. Experimental results demonstrate that PeSANet outperforms existing methods across all metrics, particularly in long-term forecasting accuracy, providing a promising solution for simulating complex systems with limited data and incomplete physics.

JMLR Journal 2025 Journal Article

PFLlib: A Beginner-Friendly and Comprehensive Personalized Federated Learning Library and Benchmark

  • Jianqing Zhang
  • Yang Liu
  • Yang Hua
  • Hao Wang
  • Tao Song
  • Zhengui Xue
  • Ruhui Ma
  • Jian Cao

Amid the ongoing advancements in Federated Learning (FL), a machine learning paradigm that allows collaborative learning with data privacy protection, personalized FL (pFL) has gained significant prominence as a research direction within the FL domain. Whereas traditional FL (tFL) focuses on jointly learning a global model, pFL aims to balance each client's global and personalized goals in FL settings. To foster the pFL research community, we started and built PFLlib, a comprehensive pFL library with an integrated benchmark platform. In PFLlib, we implemented 37 state-of-the-art FL algorithms (8 tFL algorithms and 29 pFL algorithms) and provided various evaluation environments with three statistically heterogeneous scenarios and 24 datasets. At present, PFLlib has gained more than 1600 stars and 300 forks on GitHub.

ICLR Conference 2025 Conference Paper

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

  • Bocheng Zeng
  • Qi Wang 0123
  • Mengtao Yan
  • Yang Liu
  • Ruizhi Chengze
  • Yi Zhang 0164
  • Hongsheng Liu 0002
  • Zidong Wang 0010

Solving partial differential equations (PDEs) serves as a cornerstone for modeling complex dynamical systems. Recent progress has demonstrated the great benefits of data-driven neural models for predicting spatiotemporal dynamics (e.g., tremendous speedup gains compared with classical numerical methods). However, most existing neural models rely on rich training data, have limited extrapolation and generalization abilities, and struggle to produce precise or reliable physical predictions under intricate conditions (e.g., irregular mesh or geometry, complex boundary conditions, diverse PDE parameters, etc.). To this end, we propose a new graph learning approach, namely, Physics-encoded Message Passing Graph Network (PhyMPGN), to model spatiotemporal PDE systems on irregular meshes given small training datasets. Specifically, we incorporate a GNN into a numerical integrator to approximate the temporal marching of spatiotemporal dynamics for a given PDE system. Considering that many physical phenomena are governed by diffusion processes, we further design a learnable Laplace block, which encodes the discrete Laplace-Beltrami operator, to aid and guide the GNN learning in a physically feasible solution space. A boundary condition padding strategy is also designed to improve the model's convergence and accuracy. Extensive experiments demonstrate that PhyMPGN is capable of accurately predicting various types of spatiotemporal dynamics on coarse unstructured meshes, consistently achieves state-of-the-art results, and outperforms other baselines with considerable gains.
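The discrete operator that such a Laplace block encodes can be sketched concretely (illustrative only, not the paper's learnable block): the graph Laplacian of a mesh, whose action on a scalar field gives the diffusion-style quantity the physics prior builds on.

```python
import numpy as np

# Illustrative sketch (not PhyMPGN's learnable Laplace block): the combinatorial
# graph Laplacian of an undirected mesh; L @ u approximates diffusion of u.
def graph_laplacian(edges, n: int) -> np.ndarray:
    L = np.zeros((n, n))
    for i, j in edges:                 # undirected mesh edges
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    return L

L = graph_laplacian([(0, 1), (1, 2), (2, 0)], 3)  # a tiny triangular mesh
u = np.array([1.0, 0.0, 0.0])                      # scalar field on the nodes
```

Row sums of the Laplacian are zero, so diffusion conserves the total quantity, which is the kind of physically feasible behavior the block is meant to encode.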

ICLR Conference 2025 Conference Paper

PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows

  • Huaguan Chen
  • Yang Liu
  • Hao Sun 0002

Accurately predicting fluid dynamics and evolution has been a long-standing challenge in physical sciences. Conventional deep learning methods often rely on the nonlinear modeling capabilities of neural networks to establish mappings between past and future states, overlooking the fluid dynamics, or only modeling the velocity field, neglecting the coupling of multiple physical quantities. In this paper, we propose a new physics-informed learning approach that incorporates coupled physical quantities into the prediction process to assist with forecasting. Central to our method lies in the discretization of physical equations, which are directly integrated into the model architecture and loss function. This integration enables the model to provide robust, long-term future predictions. By incorporating physical equations, our model demonstrates temporal extrapolation and spatial generalization capabilities. Experimental results show that our approach achieves the state-of-the-art performance in spatiotemporal prediction across both numerical simulations and real-world extreme-precipitation nowcasting benchmarks.

AAAI Conference 2025 Conference Paper

PlanLLM: Video Procedure Planning with Refinable Large Language Models

  • Dejie Yang
  • Zijing Zhao
  • Yang Liu

Video procedure planning, i.e., planning a sequence of action steps given the video frames of start and goal states, is an essential ability for embodied AI. Recent works utilize Large Language Models (LLMs) to generate enriched action step description texts to guide action step decoding. Although LLMs are introduced, these methods decode the action steps into a closed set of one-hot vectors, limiting the model's capability of generalizing to new steps or tasks. Additionally, fixed action step descriptions based on world-level commonsense may contain noise in specific instances of visual states. In this paper, we propose PlanLLM, a cross-modal joint learning framework with LLMs for video procedure planning. We propose an LLM-Enhanced Planning module which fully uses the generalization ability of LLMs to produce free-form planning output and to enhance action step decoding. We also propose a Mutual Information Maximization module to connect world-level commonsense of step descriptions and sample-specific information of visual states, enabling LLMs to employ their reasoning ability to generate step sequences. With the assistance of LLMs, our method can handle both closed-set and open-vocabulary procedure planning tasks. Our PlanLLM achieves superior performance on three benchmarks, demonstrating the effectiveness of our designs.

JBHI Journal 2025 Journal Article

Predicting Surgical Outcome in Patients With Drug-Resistant Epilepsy Using Autoregressive Connectivity and Virtual Resection

  • Chunsheng Li
  • Heng Su
  • Yang Liu

Epilepsy is a brain network disorder that manifests through recurrent seizures. In cases of drug-resistant epilepsy, surgical removal of pivotal nodes within the epileptic brain network can lead to seizure freedom. Virtual resection on patient-specific brain network models can aid in the prediction of surgical outcomes. Some studies have investigated virtual resection on undirected brain connectivity networks, such as those using Pearson correlation or structural connectivity. We hypothesize that directed functional connectivity enhances prediction performance. This study proposes a new approach for surgical outcome prediction by applying virtual resection of autoregressive (AR) connectivity networks in epilepsy patients. Intracranial EEG recordings from 16 drug-resistant epilepsy patients were analyzed. The performance of the proposed approach was evaluated based on patients' surgical volumes and prognosis outcomes. We compared the performance of AR connectivity with six other measures and concurrently explored three distinct neural mass models. The results show that virtual resection on AR connectivity achieved a predictive accuracy of 87.5% when paired with the bistable neural mass model. Notably, all eight patients with poor outcomes were accurately identified. In addition, our data show that the estimated epileptic network is relatively stable during the interictal interval. Leveraging the AR model yields estimated directional connectivity among epileptic brain regions, which can then be used effectively for virtual resection. Our approach offers a promising avenue for clinicians in preoperative evaluation and augments existing clinical methodologies.

NeurIPS Conference 2025 Conference Paper

PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture

  • Yi Liu
  • Yang Liu
  • Leqian Zheng
  • Jue Hong
  • Junjie Shi
  • Qingyou Yang
  • Ye Wu
  • Cong Wang

With the rapid advancement of the digital economy, data collaboration between organizations has become a well-established business model, driving the growth of various industries. However, privacy concerns make direct data sharing impractical. To address this, Two-Party Split Learning (a.k.a. Vertical Federated Learning (VFL)) has emerged as a promising solution for secure collaborative learning. Despite its advantages, this architecture still suffers from low computational resource utilization and training efficiency. Specifically, its synchronous dependency design increases training latency, while resource and data heterogeneity among participants further hinder efficient computation. To overcome these challenges, we propose PubSub-VFL, a novel VFL paradigm with a Publisher/Subscriber architecture optimized for two-party collaborative learning with high computational efficiency. PubSub-VFL leverages the decoupling capabilities of the Pub/Sub architecture and the data parallelism of the parameter server architecture to design a hierarchical asynchronous mechanism, reducing training latency and improving system efficiency. Additionally, to mitigate the training imbalance caused by resource and data heterogeneity, we formalize an optimization problem based on participants' system profiles, enabling the selection of optimal hyperparameters while preserving privacy. We conduct a theoretical analysis to demonstrate that PubSub-VFL achieves stable convergence and is compatible with security protocols such as differential privacy. Extensive case studies on five benchmark datasets further validate its effectiveness, showing that, compared to state-of-the-art baselines, PubSub-VFL not only accelerates training by 2-7x without compromising accuracy but also achieves computational resource utilization of up to 91.07%.
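The publisher/subscriber decoupling described above can be sketched minimally (illustrative only, not the PubSub-VFL system): a bounded queue lets one party publish intermediate results while the other consumes them at its own pace, instead of the two running in lock-step.

```python
import queue
import threading

# Minimal sketch of Pub/Sub decoupling between two parties (illustrative only,
# not the PubSub-VFL implementation; all names are hypothetical).
def run_round(num_batches: int) -> list:
    buffer = queue.Queue(maxsize=4)   # bounded buffer between the two parties
    received = []

    def publisher():                  # party A publishes per-batch embeddings
        for b in range(num_batches):
            buffer.put(f"embedding-{b}")
        buffer.put(None)              # sentinel: round finished

    def subscriber():                 # party B subscribes and processes
        while (item := buffer.get()) is not None:
            received.append(item)

    threads = [threading.Thread(target=publisher), threading.Thread(target=subscriber)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because `put` blocks when the buffer is full and `get` blocks when it is empty, neither side waits for a full synchronous round trip, which is the latency-hiding idea the abstract describes.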

NeurIPS Conference 2025 Conference Paper

Quadratic Coreset Selection: Certifying and Reconciling Sequence and Token Mining for Efficient Instruction Tuning

  • Ziliang Chen
  • Yongsen Zheng
  • Zhao-Rong Lai
  • Zhanfu Yang
  • Cuixi Li
  • Yang Liu
  • Liang Lin

Instruction Tuning (IT) has recently demonstrated impressive data efficiency in post-training large language models (LLMs). However, the pursuit of efficiency predominantly focuses on sequence-level curation, often overlooking the nuanced impact of critical tokens and the inherent risks of token noise and biases. Drawing inspiration from bi-level coreset selection, our work provides a principled view of the motivation behind selecting instructions' responses. It leads to our approach, Quadratic Coreset Selection (QCS), which reconciles sequence-level and token-level influence contributions, deriving more expressive LLMs with established theoretical results. Although the original QCS framework is challenged by the prohibitive computation of inverting LLM-scale Hessian matrices, we overcome this barrier by proposing a novel QCS probabilistic variant, which relaxes the original formulation through re-parameterized densities. This innovative solver is efficiently learned using hierarchical policy gradients without requiring back-propagation, achieving provable convergence and certified asymptotic equivalence to the original objective. Our experiments demonstrate QCS's superior sequence-level data efficiency and reveal how strategically leveraging token-level influence elevates the performance ceiling of data-efficient IT. Furthermore, QCS's adaptability is showcased through its successes in regular IT and challenging targeted IT scenarios, particularly in the cases of free-form complex instruction-following and CoT reasoning. These results underscore QCS's potential for a wide array of versatile post-training applications.

ICML Conference 2025 Conference Paper

Robust Multi-bit Text Watermark with LLM-based Paraphrasers

  • Xiaojun Xu
  • Jinghan Jia
  • Yuanshun Yao
  • Yang Liu
  • Hang Li 0001

We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently so that their paraphrasing difference, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode the pre-defined binary code at the sentence level. Then we use a text classifier as the decoder to decode each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation.
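The sentence-level encode/decode loop described above can be illustrated with toy stand-ins; `paraphrase_0`, `paraphrase_1`, and `decode_bit` below are hypothetical placeholders for the two fine-tuned paraphrasers and the trained classifier:

```python
# One bit is embedded per sentence by choosing which of the two
# paraphrasers rewrites it; a decoder then classifies each sentence
# back into a bit. The toy paraphrasers here differ only in a suffix.

def paraphrase_0(sentence):       # stand-in for the "bit 0" paraphraser
    return sentence + " indeed."

def paraphrase_1(sentence):       # stand-in for the "bit 1" paraphraser
    return sentence + " in fact."

def decode_bit(sentence):         # stand-in for the trained text classifier
    return 1 if sentence.endswith("in fact.") else 0

def embed(sentences, bits):
    # pick the paraphraser matching each bit, one bit per sentence
    return [paraphrase_1(s) if b else paraphrase_0(s)
            for s, b in zip(sentences, bits)]

def extract(sentences):
    return [decode_bit(s) for s in sentences]

text = ["The model works well", "Results are strong", "Training is fast"]
code = [1, 0, 1]
watermarked = embed(text, code)
assert extract(watermarked) == code   # the binary code round-trips
```

In the actual pipeline the two rewriters are LLMs whose outputs differ only in subtle semantic cues, which is what makes the watermark imperceptible yet decodable.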

NeurIPS Conference 2025 Conference Paper

Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

  • Zheng Zhan
  • Liliang Ren
  • Shuohang Wang
  • Liyuan Liu
  • Yang Liu
  • Yeyun Gong
  • Yanzhi Wang
  • Yelong Shen

State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant per-step inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts. By sharing routing decisions between projection layers and lightweight sub-modules within Mamba across experts, RoM leverages synergies among linear projection experts for effective and efficient sparse scaling of Mamba layers. At a scale of 1.3B active parameters (10B total) and 16K training sequence length, RoM achieves language modeling performance equivalent to a dense Mamba model requiring over 2.3$\times$ more active parameters, and demonstrates consistent perplexity across context lengths. Experimental results further show RoM effectively scales hybrid language models, yielding a 23% FLOPS saving compared to dense Mamba scaling for similar performance. We release our training codebase at https://github.com/zhanzheng8585/Routing-Mamba.

AAAI Conference 2025 Conference Paper

S^3cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners

  • Yuchen Yan
  • Jin Jiang
  • Yang Liu
  • Yixin Cao
  • Xin Xu
  • Mengdi Zhang
  • Xunliang Cai
  • Jian Shao

Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, external knowledge introduction, multi-model collaboration, and similar techniques. In this paper, we propose a series of mathematical LLMs called S^3cMath, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning. This capability helps LLMs to recognize whether their ongoing inference tends to contain errors and simultaneously correct these errors to produce a more reliable response. We propose a method that employs step-level sampling to construct step-wise self-correction data for achieving this ability. Additionally, we implement a training strategy that uses the constructed data to equip LLMs with spontaneous step-level self-correction capacities. Our data and methods have been demonstrated to be effective across various foundation LLMs, consistently showing significant progress in evaluations on GSM8K, MATH, and other mathematical benchmarks. To the best of our knowledge, we are the first to introduce the spontaneous step-level self-correction ability of LLMs in mathematical reasoning.

IJCAI Conference 2025 Conference Paper

SAP: Privacy-Preserving Fine-Tuning on Language Models with Split-and-Privatize Framework

  • Xicong Shen
  • Yang Liu
  • Yi Liu
  • Peiran Wang
  • Huiqi Liu
  • Jue Hong
  • Bing Duan
  • Zirui Huang

Pre-trained Language Models (PLMs) have enabled a cost-effective approach to handling various downstream applications via Parameter-Efficient Fine-Tuning (PEFT) techniques. In this context, service providers have introduced a popular fine-tuning-based product service known as Model-as-a-Service (MaaS). This service offers users access to extensive PLMs and training resources. With MaaS, users can fine-tune, deploy, and utilize their customized models seamlessly, leveraging a one-stop platform that allows them to work with their private datasets efficiently. However, this service paradigm has recently been shown to risk leaking users' private data. To this end, we identify the data privacy leakage risks in MaaS-based PEFT and propose a Split-and-Privatize (SAP) framework, mitigating the privacy leakage by integrating split learning and differential privacy into MaaS PEFT. Furthermore, we propose Contributing-Token-Identification (CTI), a novel method to balance model utility degradation and privacy leakage. The proposed framework is comprehensively evaluated, demonstrating a 65% improvement in empirical privacy with only a 1% degradation in model performance on the Stanford Sentiment Treebank dataset, outperforming existing state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

  • Xiaonan Si
  • Meilin Zhu
  • Simeng Qin
  • Lijia Yu
  • Lijun Zhang
  • Shuaitong Liu
  • Xinfeng Li
  • Ranjie Duan

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often apply aggressive filtering, leading to unnecessary loss of valuable information and reduced reliability in generation. To address this problem, we propose SeCon-RAG, a two-stage semantic filtering and conflict-free framework for trustworthy RAG. In the first stage, we perform joint semantic and cluster-based filtering guided by an entity-intent-relation extractor (EIRE). EIRE extracts entities, latent objectives, and entity relations from both the user query and filtered documents, scores their semantic relevance, and selectively adds valuable documents into the clean retrieval database. In the second stage, we propose an EIRE-guided conflict-aware filtering module, which analyzes semantic consistency between the query, candidate answers, and retrieved knowledge before final answer generation, filtering out internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG effectively preserves useful knowledge while mitigating conflict contamination, achieving significant improvements in both generation robustness and output trustworthiness. Extensive experiments across various LLMs and datasets demonstrate that SeCon-RAG markedly outperforms state-of-the-art defense methods.

ICML Conference 2025 Conference Paper

Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning

  • Qi Xu 0008
  • Junyang Zhu
  • Dongdong Zhou
  • Hao Chen
  • Yang Liu
  • Jiangrong Shen
  • Qiang Zhang 0008

Deep neural networks (DNNs) excel in computer vision tasks, especially few-shot learning (FSL), which is increasingly important for generalizing from limited examples. However, DNNs are computationally expensive and face scalability issues in real-world settings. Spiking Neural Networks (SNNs), with their event-driven nature and low energy consumption, are particularly efficient in processing sparse and dynamic data, though they still encounter difficulties in capturing complex spatiotemporal features and performing accurate cross-class comparisons. To further enhance the performance and efficiency of SNNs in few-shot learning, we propose a few-shot learning framework based on SNNs, which combines a self-feature extractor module and a cross-feature contrastive module to refine feature representation and reduce power consumption. We apply the combination of temporal efficient training loss and InfoNCE loss to optimize the temporal dynamics of spike trains and enhance the discriminative power. Experimental results show that the proposed FSL-SNN significantly improves classification performance on the neuromorphic dataset N-Omniglot, and also achieves performance competitive with ANNs on static datasets such as CUB and miniImageNet with low power consumption.

IJCAI Conference 2025 Conference Paper

Sharpness-aware Zeroth-order Optimization for Graph Transformers

  • Yang Liu
  • Chuan Zhou
  • Yuhan Lin
  • Shuai Zhang
  • Yang Gao
  • Zhao Li
  • Shirui Pan

Graph Transformers (GTs) have emerged as powerful tools for handling graph-structured data through global attention mechanisms. While GTs can effectively capture long-range dependencies, they introduce difficulties in optimization due to their complex, non-differentiable operators, which cannot be directly handled by standard gradient-based optimizers (such as Adam or AdamW). To address these issues, this work adopts the Zeroth-Order Optimization (ZOO) line of techniques. However, direct integration of ZOO incurs considerable challenges due to the sharp loss landscape and steep gradients within the GT parameter space. Based on these observations, we propose a Sharpness-aware Zeroth-order Optimizer (SZO) that incorporates the Sharpness-Aware Minimization (SAM) technique to facilitate convergence within a flatter neighborhood and leverages parallel computing for efficient gradient estimation. Theoretically, we provide a comprehensive analysis of the optimizer from both convergence and generalization perspectives. Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over state-of-the-art optimizers.
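As a minimal illustration of the zeroth-order building block SZO rests on (the full method additionally adds the SAM flatness term and parallel perturbations, omitted here), a two-point gradient estimator queries only loss values, never gradients:

```python
import random

# Two-point zeroth-order gradient estimate: average finite differences
# of the loss along random Gaussian directions. This needs only
# function evaluations, which is what makes ZOO applicable when
# operators are non-differentiable.

def zo_gradient(loss, x, mu=1e-3, n_samples=32):
    d = len(x)
    grad = [0.0] * d
    for _ in range(n_samples):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        scale = (loss(x_plus) - loss(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * u[i] / n_samples
    return grad

# Sanity check on a quadratic: the true gradient of sum(x_i^2) is 2*x.
random.seed(0)
x = [1.0, -2.0]
g = zo_gradient(lambda v: sum(vi * vi for vi in v), x, n_samples=2000)
print(g)  # approximately [2.0, -4.0]
```

More samples reduce the estimator's variance, which is why the paper leans on parallel computing for the perturbation evaluations.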

EAAI Journal 2025 Journal Article

Short-term prediction of dissolved oxygen and water temperature using deep learning with dual proportional-integral-derivative error corrector in pond culture

  • Xinhui Zhou
  • Yinfeng Hao
  • Yang Liu
  • Lanxue Dang
  • Baojun Qiao
  • Xianyu Zuo

Dissolved oxygen (DO) and water temperature (WT) are the most important water quality indicators that directly affect the metabolism and development of aquatic products in pond cultures. Therefore, the accurate prediction of these two water quality factors is crucial for improving aquaculture efficiency; however, the complexity and variability of pond aquaculture environments make accurate and efficient water quality prediction challenging in this field. To address this problem, this study proposes a deep learning method combined with a proportional-integral-derivative (PID) error corrector for the accurate and rapid prediction of DO and WT. This method consists of two parts: a benchmark deep-network prediction model and a PID error corrector. The benchmark model is used for the multistep forward prediction of DO and WT, providing feedback input for the PID error corrector, which obtains the prediction error by comparing the feedback input with the actual values of the two predicted variables and then, based on the prediction error, calculates the error correction amount for the current predicted values of the benchmark model. The proposed method was validated using three chaotic time-series datasets and an actual pond aquaculture water environment dataset. The results demonstrate the effectiveness of the proposed PID error corrector, which significantly improved the prediction accuracy of the benchmark model despite its simple structure (with only three adjustable parameters). Thus, the PID error corrector has great engineering application prospects in the field of pond culture water quality monitoring.
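A discrete PID corrector with the three adjustable gains mentioned above can be sketched as follows; the gain values and signals are illustrative, not the paper's tuned settings:

```python
# Minimal discrete PID error corrector: the correction is a weighted
# sum of the current prediction error (P), its running sum (I), and
# its change since the last step (D) — exactly three tunable gains.

class PIDCorrector:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def correct(self, predicted, observed):
        error = observed - predicted            # feedback: actual minus forecast
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # correction added to the benchmark model's next prediction
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PIDCorrector(kp=0.5, ki=0.1, kd=0.05)
raw_pred, actual = 7.2, 8.0                     # e.g. DO in mg/L
corrected = raw_pred + pid.correct(raw_pred, actual)
print(round(corrected, 3))  # 7.72
```

In the paper's setup the benchmark deep network supplies `raw_pred` at each step, and the corrector nudges it toward the most recent observations.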

NeurIPS Conference 2025 Conference Paper

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

  • Yang Liu
  • Ming Ma
  • Xiaomin Yu
  • Pengxiang Ding
  • Han Zhao
  • Mingyang Sun
  • Siteng Huang
  • Donglin Wang

Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating spatial cues, such as point clouds or depth, either require specialized sensors or fail to effectively exploit depth information for higher-order reasoning. To this end, we propose SSR, a novel Spatial Sense and Reasoning framework that transforms raw depth data into structured, interpretable textual rationales. These textual rationales serve as meaningful intermediate representations to significantly enhance spatial reasoning capabilities. Additionally, we leverage knowledge distillation to compress the generated rationales into compact latent embeddings, which facilitate resource-efficient and plug-and-play integration into existing VLMs without retraining. To enable comprehensive evaluation, we introduce SSR-CoT, a million-scale visual-language reasoning dataset enriched with intermediate spatial reasoning annotations, and present SSRBench, a comprehensive multi-task benchmark. Extensive experiments on multiple benchmarks demonstrate that SSR substantially improves depth utilization and enhances spatial reasoning, thereby advancing VLMs toward more human-like multi-modal understanding. Project page: https://yliu-cs.github.io/SSR.

NeurIPS Conference 2025 Conference Paper

TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer

  • Yang Liu
  • Chuanchen Luo
  • Zimo Tang
  • Yingyan Li
  • Yuran Yang
  • Yuanyong Ning
  • Lue Fan
  • Junran Peng

Illumination and texture rerendering are critical dimensions for world-to-world transfer, which is valuable for applications including sim2real and real2real visual data scaling up for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly limited to the domain of training data (e.g., portrait) or fall into the bottleneck of temporal consistency and computation efficiency, especially when the input video involves complex dynamics and long durations. In this paper, we propose TC-Light, a novel paradigm characterized by the proposed two-stage post optimization mechanism. Starting from the video preliminarily relighted by an inflated video relighting model, it optimizes appearance embedding in the first stage to align global illumination. Then it optimizes the proposed canonical video representation, i.e., Unique Video Tensor (UVT), to align fine-grained texture and lighting in the second stage. To comprehensively evaluate performance, we also establish a long and highly dynamic video benchmark. Extensive experiments show that our method enables physically plausible re-rendering results with superior temporal coherence and low computation cost. The code and video demos are available at our Project Page.

AIJ Journal 2025 Journal Article

TeachText: CrossModal text-video retrieval through generalized distillation

  • Ioana Croitoru
  • Simion-Vlad Bogolin
  • Marius Leordeanu
  • Hailin Jin
  • Andrew Zisserman
  • Yang Liu
  • Samuel Albanie

In recent years, considerable progress on the task of text-video retrieval has been achieved by leveraging large-scale pretraining on visual and audio datasets to construct powerful video encoders. By contrast, despite the natural symmetry, the design of effective algorithms for exploiting large-scale language pretraining remains under-explored. In this work, we investigate the design of such algorithms and propose a novel generalized distillation method, TeachText, which leverages complementary cues from multiple text encoders to provide an enhanced supervisory signal to the retrieval model. TeachText yields significant gains on a number of video retrieval benchmarks without incurring additional computational overhead during inference and was used to produce the winning entry in the Condensed Movie Challenge at ICCV 2021. We show how TeachText can be extended to include multiple video modalities, reducing computational cost at inference without compromising performance. Finally, we demonstrate the application of our method to the task of removing noisy descriptions from the training partitions of retrieval datasets to improve performance. Code and data can be found at https://www.robots.ox.ac.uk/~vgg/research/teachtext/.

NeurIPS Conference 2025 Conference Paper

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

  • Wenju Sun
  • Qingyong Li
  • Wen Wang
  • Yang Liu
  • Yangliao Geng
  • Boyang Li

Multi-task model merging aims to consolidate knowledge from multiple fine-tuned task-specific experts into a unified model while minimizing performance degradation. Existing methods primarily approach this by minimizing differences between task-specific experts and the unified model, either from a parameter-level or a task-loss perspective. However, parameter-level methods exhibit a significant performance gap compared to the upper bound, while task-loss approaches entail costly secondary training procedures. In contrast, we observe that performance degradation closely correlates with feature drift, i.e., differences in feature representations of the same sample caused by model merging. Motivated by this observation, we propose Layer-wise Optimal Task Vector Merging (LOT Merging), a technique that explicitly minimizes feature drift between task-specific experts and the unified model in a layer-by-layer manner. LOT Merging can be formulated as a convex quadratic optimization problem, enabling us to analytically derive closed-form solutions for the parameters of linear and normalization layers. Consequently, LOT Merging achieves efficient model consolidation through basic matrix operations. Extensive experiments across vision and vision-language benchmarks demonstrate that LOT Merging significantly outperforms baseline methods, achieving improvements of up to 4.4% (ViT-B/32) over state-of-the-art approaches. The source code is available at https://github.com/SunWenJu123/model-merging.
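As a toy, scalar illustration of the closed-form idea (the paper's actual solution is the matrix analogue for full linear and normalization layers), minimizing feature drift for a single shared weight reduces to a least-squares problem:

```python
# Choose a merged weight w minimizing the feature drift
#   sum over tasks t and samples x of (w*x - w_t*x)^2,
# where w_t are the task experts' weights. Setting the derivative to
# zero gives the closed form below — no retraining, just arithmetic.

def lot_merge_scalar(expert_weights, features):
    sxx = sum(x * x for x in features)                        # sum of x^2
    sxy = sum(w * x * x for w in expert_weights for x in features)
    return sxy / (len(expert_weights) * sxx)

experts = [1.0, 3.0]          # two task-specific expert weights
xs = [0.5, -1.2, 2.0]         # shared input features
w = lot_merge_scalar(experts, xs)
print(w)  # ≈ 2.0: with identical features per task, the drift-minimizing
          # merge is the experts' mean
```

For a real linear layer the same derivation yields a matrix least-squares solution over the layer's input activations, which is why the method needs only basic matrix operations.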

EAAI Journal 2025 Journal Article

Towards non-visual bowel cancer diagnosis: A certainty-aware data-driven method of lesion characterisation using a vibrating capsule

  • Kenneth Omokhagbo Afebu
  • Yang Liu
  • Evangelos Papatheou
  • Shyam Prasad

Recent advances in miniaturised, dynamically actuated robots have opened new pathways for non-visual, in-situ disease diagnosis. This study explores a novel method for early bowel cancer detection using a self-propelled robotic capsule that navigates the bowel and detects lesions based on variations in tissue stiffness. The approach capitalises on the sensitivity of the capsule’s dynamic responses to surrounding tissue properties. A dual-phase machine learning framework is proposed. The first phase uses regression models including multilayer perceptron (MLP), support vector regression (SVR), and Gaussian process regression (GPR) to predict tissue stiffness from displacement signal features. The second phase uses a Gaussian mixture model (GMM) to cluster the predicted stiffness values into different categories. Unlike our previous work, this study emphasises the robustness of the models under varying data conditions using both accuracy and reliability-oriented metrics. Based on our studies, MLP provided the most reliable regression results for simulated data and downstream clustering, though GPR performed better on experimental datasets. SVR consistently underperformed, especially on experimental data. The GMM achieved over 89% clustering accuracy across both simulated and experimental datasets, with improved results when predictions from more accurate regression models are used as the inputs. This work demonstrates a promising step toward dynamic, in-situ lesion characterisation and highlights the potential for integrating lesion biomechanics into future endoscopic diagnosis.

EAAI Journal 2025 Journal Article

Uncovering the geometry-dependent optical asymmetry of gold nanorods helical assemblies using artificial neural networks

  • Yang Liu
  • Yongguang Chen
  • Xiyang Wei
  • Jianhua Shang
  • Lina Zhao

The optical asymmetry of gold nanorods (Au-NRs) helical assemblies is well-documented, with a wide range of applications. Nevertheless, the geometry-dependent optical asymmetry within these assemblies has not been adequately explored and quantified. The present study proposes a novel approach to predict the optical asymmetry of Au-NRs helical assemblies based on geometric characteristics using artificial neural networks (ANN). The performance of the ANN, termed 3NHL50NN, was significantly enhanced through optimization of the hidden layers and nodes, resulting in an R² of the outcomes exceeding 0.998 and a reduction in computational time exceeding 99.99%. In instances where specific geometric characteristics are needed to attain a desired optical asymmetry, retrieval of the geometric characteristics of Au-NRs helical assemblies was additionally investigated using a particle swarm optimization (PSO) algorithm featuring a traversing mechanism. The retrieval results were obtained within 6 s and demonstrate a high degree of accuracy and reliability. The combination of the 3NHL50NN and the PSO algorithm is capable of accurately predicting the optical asymmetry of Au-NRs helical assemblies and retrieving the geometric characteristics, thereby enabling a quantitative understanding of their overall geometry-dependent optical asymmetry.

NeurIPS Conference 2024 Conference Paper

3D Structure Prediction of Atomic Systems with Flow-based Direct Preference Optimization

  • Rui Jiao
  • Xiangzhe Kong
  • Wenbing Huang
  • Yang Liu

Predicting high-fidelity 3D structures of atomic systems is a fundamental yet challenging problem in scientific domains. While recent work demonstrates the advantage of generative models in this realm, the exploration of different probability paths is still insufficient, and hallucinations persistently occur during sampling. To address these pitfalls, we introduce FlowDPO, a novel framework that explores various probability paths with flow matching models and further suppresses hallucinations using Direct Preference Optimization (DPO) for structure generation. Our approach begins with a pre-trained flow matching model to generate multiple candidate structures for each training sample. These structures are then evaluated and ranked based on their distance to the ground truth, resulting in an automatic preference dataset. Using this dataset, we apply DPO to optimize the original model, improving its performance in generating structures closely aligned with the desired reference distribution. As confirmed by our theoretical analysis, such a paradigm and objective function are compatible with arbitrary Gaussian paths, exhibiting favorable universality. Extensive experimental results on antibodies and crystals demonstrate substantial benefits of our FlowDPO, highlighting its potential to advance the field of 3D structure prediction with generative models.
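The automatic preference-dataset step described above can be sketched as follows; structures are toy 2-D point sets and the distance metric is an illustrative stand-in for the structural error used in practice:

```python
import math

# Candidate structures are ranked by their distance to the ground
# truth; the closest and farthest become a DPO (chosen, rejected)
# pair, with no human labeling involved.

def distance(candidate, truth):
    # mean per-point Euclidean error between two point sets
    return sum(math.dist(p, q) for p, q in zip(candidate, truth)) / len(truth)

def preference_pair(candidates, truth):
    ranked = sorted(candidates, key=lambda c: distance(c, truth))
    return ranked[0], ranked[-1]   # (chosen, rejected)

truth = [(0.0, 0.0), (1.0, 0.0)]
cands = [
    [(0.1, 0.0), (1.0, 0.1)],      # close to the ground truth
    [(2.0, 2.0), (3.0, 2.0)],      # far from the ground truth
]
chosen, rejected = preference_pair(cands, truth)
assert chosen == cands[0] and rejected == cands[1]
```

The resulting (chosen, rejected) pairs then feed the standard DPO objective to fine-tune the flow matching model away from hallucinated structures.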

IJCAI Conference 2024 Conference Paper

3D Vision and Language Pretraining with Large-Scale Synthetic Data

  • Dejie Yang
  • Zhu Xu
  • Wentao Mo
  • Qingchao Chen
  • Siyuan Huang
  • Yang Liu

3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-trained model that can bridge 3D scenes with natural language, which is an important technique for embodied intelligence. However, current 3D-VLP datasets are hindered by limited scene-level diversity and insufficient fine-grained annotations (only 1.2K scenes and 280K textual annotations in ScanScribe), primarily due to the labor-intensive process of collecting and annotating 3D scenes. To overcome these obstacles, we construct SynVL3D, a comprehensive synthetic scene-text corpus with 10K indoor scenes and 1M descriptions at object, view, and room levels, which has the advantages of diverse scene data, rich textual descriptions, multi-grained 3D-text associations, and low collection cost. Utilizing the rich annotations in SynVL3D, we pre-train a simple and unified Transformer for aligning 3D and language with multi-grained pretraining tasks. Moreover, we propose a synthetic-to-real domain adaptation in the downstream fine-tuning process to address the domain shift. Through extensive experiments, we verify the effectiveness of our model design by achieving state-of-the-art performance on downstream tasks including visual grounding, dense captioning, and question answering. Codes are available at: https://github.com/idejie/3DSyn

IJCAI Conference 2024 Conference Paper

A General Black-box Adversarial Attack on Graph-based Fake News Detectors

  • Peican Zhu
  • Zechen Pan
  • Yang Liu
  • Jiwei Tian
  • Keke Tang
  • Zhen Wang

Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.

TCS Journal 2024 Journal Article

A mechanism design approach for multi-party machine learning

  • Mengjing Chen
  • Yang Liu
  • Weiran Shen
  • Yiheng Shen
  • Pingzhong Tang
  • Qiang Yang

In a multi-party machine learning system, different parties cooperate on optimizing towards better models by sharing data in a privacy-preserving way. A major challenge in learning is the incentive issue. For example, if there is competition among the parties, one may strategically hide their data to prevent other parties from getting better models. In this paper, we study the problem through the lens of mechanism design and incorporate the features of multi-party learning in our setting. First, each agent's valuation has externalities that depend on others' types and actions. Second, each agent can only misreport a type lower than his true type, but not the other way round. We provide the optimal truthful mechanism in the separable utility setting, as well as necessary and sufficient conditions for truthful mechanisms in general cases. Finally, we propose an algorithm to find the desirable mechanism that is truthful, individually rational, efficient and weakly budget-balanced, and analyze the computational complexity of the algorithm.

NeurIPS Conference 2024 Conference Paper

A Simple Image Segmentation Framework via In-Context Examples

  • Yang Liu
  • Chenchen Jing
  • Hengtao Li
  • Muzhi Zhu
  • Hao Chen
  • Xinlong Wang
  • Chunhua Shen

Recently, there have been explorations of generalist segmentation models that can effectively tackle a variety of image segmentation tasks within a unified in-context learning framework. However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. To address this issue, we present SINE, a simple image $\textbf{S}$egmentation framework utilizing $\textbf{in}$-context $\textbf{e}$xamples. Our approach leverages a Transformer encoder-decoder structure, where the encoder provides high-quality image representations and the decoder is designed to yield multiple task-specific output masks to eliminate task ambiguity effectively. Specifically, we introduce an In-context Interaction module to complement in-context information and produce correlations between the target image and the in-context example, and a Matching Transformer that uses fixed matching and a Hungarian algorithm to eliminate differences between tasks. In addition, we further refine the current evaluation system for in-context image segmentation, aiming to facilitate a holistic appraisal of these models. Experiments on various segmentation tasks show the effectiveness of the proposed method.

NeurIPS Conference 2024 Conference Paper

Achievable Fairness on Your Data With Utility Guarantees

  • Muhammad F. Taufiq
  • Jean-François Ton
  • Yang Liu

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

EAAI Journal 2024 Journal Article

Automated pixel-level pavement marking detection based on a convolutional transformer

  • Hang Zhang
  • Anzheng He
  • Zishuo Dong
  • Allen A. Zhang
  • Yang Liu
  • You Zhan
  • Kelvin C.P. Wang
  • Zhihao Lin

Accurate detection of pavement markings at the pixel level is crucial for enhancing traffic safety. The majority of current advanced deep-learning networks predominantly focus on localized features, neglecting the global context of pavement images. Such networks often result in discontinuous segmentation outcomes and suboptimal recovery of local details. In this paper, a robust model named C-Transformer is proposed to provide an effective solution to this challenge. The contributions of this paper primarily involve two aspects. Firstly, the proposed C-Transformer is designed to succinctly integrate convolution operations and self-attention, facilitating a comprehensive understanding of essential features. Secondly, an efficient Feed-Forward Network called the Inverse Residual Feed-Forward Network is also proposed in this paper and deployed in C-Transformer to improve latent representations. Experimental results demonstrate that, compared to other state-of-the-art networks, the proposed C-Transformer achieves a performance enhancement of 0.93% in F-measure and a 1.64% improvement in Intersection-Over-Union. In particular, the robustness and effectiveness of the C-Transformer in accurate pavement marking detection are proved through field test results. This paper illustrates the feasibility of employing a hybrid Convolutional neural network-Transformer-based network for automatic robust pavement marking detection under noisy conditions.

EAAI Journal 2024 Journal Article

Benchmarking deep Facial Expression Recognition: An extensive protocol with balanced dataset in the wild

  • Gianmarco Ipinze Tutuianu
  • Yang Liu
  • Ari Alamäki
  • Janne Kauttonen

Facial expression recognition (FER) is crucial in enhancing human-computer interaction. While current FER methods, leveraging various open-source deep learning models and training techniques, have shown promising accuracy and generalizability, their efficacy often diminishes in real-world scenarios that are not extensively studied. Addressing this gap, we introduce a novel in-the-wild balanced testing facial expression dataset designed for cross-domain validation, called BTFER. We rigorously evaluated widely utilized networks and self-designed architectures, adhering to a standardized protocol. Additionally, we explored different configurations, including input resolutions, class balance management, and pre-training strategies, to ascertain their impact on performance. Through comprehensive testing across three major FER datasets and our in-depth cross-validation, we have ranked these network architectures and formulated a series of practical guidelines for implementing deep learning-based FER solutions in real-life applications. This paper also delves into the ethical considerations, privacy concerns, and regulatory aspects relevant to the deployment of FER technologies in sectors such as marketing, education, entertainment, and healthcare, aiming to foster responsible and effective use. The BTFER dataset and the implementation code are available on Kaggle and GitHub, respectively.

AAAI Conference 2024 Conference Paper

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

  • Wentao Mo
  • Yang Liu

In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and limited visual content diversity hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are used in the ScanQA and SQA datasets). Current approaches resort to supplementing 3D reasoning with 2D information. However, these methods face challenges: either they use top-down 2D views that introduce overly complex and sometimes question-irrelevant visual clues, or they rely on globally aggregated scene/image-level representations from 2D VLMs, losing the fine-grained vision-language correlations. To overcome these limitations, our approach utilizes a question-conditional 2D view selection procedure, pinpointing semantically relevant 2D inputs for crucial visual clues. We then integrate this 2D knowledge into the 3D-VQA system via a two-branch Transformer structure. This structure, featuring a Twin-Transformer design, compactly combines the 2D and 3D modalities and captures fine-grained correlations between them, allowing the two modalities to mutually augment each other. Integrating the mechanisms above, we present BridgeQA, which offers a fresh perspective on multi-modal transformer-based architectures for 3D-VQA. Experiments validate that BridgeQA achieves state-of-the-art results on 3D-VQA datasets and significantly outperforms existing solutions. Code is available at https://github.com/matthewdm0816/BridgeQA.
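The question-conditional view selection described above can be pictured as a similarity ranking between a question embedding and candidate-view embeddings. A toy sketch, not the paper's implementation (the embedding shapes and cosine-similarity scoring are assumptions for illustration):

```python
import numpy as np

def select_views(q_emb, view_embs, k=2):
    # Rank candidate 2D views by cosine similarity to the question
    # embedding and keep the top-k as 2D input for the 3D branch.
    q = q_emb / np.linalg.norm(q_emb)
    v = view_embs / np.linalg.norm(view_embs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]
```

With a question embedding close to one view's embedding, that view ranks first; question-irrelevant views (near-orthogonal or opposite embeddings) are filtered out before fusion.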

JMLR Journal 2024 Journal Article

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation

  • Yang Liu
  • Yifan Zhou
  • Ping Li
  • Feifang Hu

The performance of A/B testing in both online and offline experimental settings hinges on mitigating network interference and achieving covariate balance. These experiments often involve an observable network with identifiable clusters, and measurable cluster-level and individual-level attributes. Exploiting these inherent characteristics holds potential for refining experimental design and subsequent statistical analyses. In this article, we propose a novel cluster-adaptive network A/B testing procedure, which comprises a cluster-adaptive randomization (CLAR) and a cluster-adjusted estimator (CAE) to facilitate the design of the experiment and enhance the performance of ATE estimation. CLAR sequentially assigns clusters to minimize the Mahalanobis distance, which in turn balances the cluster-level covariates and the within-cluster-averaged individual-level covariates. The cluster-adjusted estimator (CAE) is tailored to offset biases caused by network interference. The proposed procedure has two desirable properties. First, we show that the Mahalanobis distance calculated for the two levels of covariates is $O_p(m^{-1})$, where $m$ represents the number of clusters. This result justifies the simultaneous balance of the cluster-level and individual-level covariates. Second, under mild conditions, we derive the asymptotic normality of the CAE and demonstrate the benefit of covariate balancing in improving the precision of ATE estimation. The proposed A/B testing procedure is easy to compute, consistent, and achieves higher accuracy. Extensive numerical studies demonstrate the finite-sample properties of the proposed network A/B testing procedure.
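The sequential balancing idea can be sketched as a greedy rule: assign each arriving cluster to whichever arm keeps the Mahalanobis imbalance of covariate means smallest. This is a deterministic simplification for illustration, not the paper's CLAR procedure (which is a randomization scheme):

```python
import numpy as np

def imbalance(x_a, x_b, cov_inv):
    # Mahalanobis distance between arm-wise covariate means.
    d = x_a.mean(axis=0) - x_b.mean(axis=0)
    return float(d @ cov_inv @ d)

def sequential_assign(clusters):
    # clusters: (m, p) cluster-level covariates, in arrival order.
    # Greedy stand-in for cluster-adaptive assignment: put each new
    # cluster on the arm that minimizes covariate imbalance so far.
    cov_inv = np.linalg.pinv(np.cov(clusters.T))
    arms = [0, 1]  # seed one cluster per arm
    for i in range(2, len(clusters)):
        scores = []
        for arm in (0, 1):
            trial = np.array(arms + [arm])
            x = clusters[: i + 1]
            scores.append(imbalance(x[trial == 0], x[trial == 1], cov_inv))
        arms.append(int(np.argmin(scores)))
    return np.array(arms)
```

In the paper's setting the same distance is computed jointly over cluster-level and within-cluster-averaged individual-level covariates, and the assignment is randomized rather than greedy.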

AAAI Conference 2024 Conference Paper

Comprehensive Visual Grounding for Video Description

  • Wenhui Jiang
  • Yibo Cheng
  • Linxin Liu
  • Yuming Fang
  • Yuxin Peng
  • Yang Liu

The grounding accuracy of existing video captioners still falls short of expectations. The majority of existing methods perform grounded video captioning on sparse entity annotations, whereas the captioning accuracy often suffers from degenerated object appearances in the annotated area, such as motion blur and video defocus. Moreover, these methods seldom consider the complex interactions among entities. In this paper, we propose a comprehensive visual grounding network to improve video captioning, by explicitly linking the entities and actions to the visual clues across the video frames. Specifically, the network consists of spatial-temporal entity grounding and action grounding. The proposed entity grounding encourages the attention mechanism to focus on informative spatial areas across video frames, even though the entity is annotated in only one frame of a video. The action grounding dynamically associates the verbs to related subjects and the corresponding context, which keeps fine-grained spatial and temporal details for action prediction. Both entity grounding and action grounding are formulated as a unified task guided by a soft grounding supervision, which brings architecture simplification and improves training efficiency as well. We conduct extensive experiments on two challenging datasets, and demonstrate significant performance improvements of +2.3 CIDEr on ActivityNet-Entities and +2.2 CIDEr on MSR-VTT compared to state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

  • Haotong Du
  • Quanming Yao
  • Juzheng Zhang
  • Yang Liu
  • Zhen Wang

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model’s adaptability.

IJCAI Conference 2024 Conference Paper

Denoising Diffusion-Augmented Hybrid Video Anomaly Detection via Reconstructing Noised Frames

  • Kai Cheng
  • Yaning Pan
  • Yang Liu
  • Xinhua Zeng
  • Rui Feng

Video Anomaly Detection (VAD) is crucial for enhancing security and surveillance systems through automatic identification of irregular events, thereby enabling timely responses and augmenting overall situational awareness. Although existing methods have achieved decent detection performances on benchmarks, their predicted objects still remain ambiguous in terms of the semantic aspect. To overcome this limitation, we propose the Denoising diffusion-augmented Hybrid Video Anomaly Detection (DHVAD) framework. The proposed Denoising diffusion-based Reconstruction Unit (DRU) enhances the understanding of semantically accurate normality as a crucial component in DHVAD. Meanwhile, we propose a detection strategy that integrates the advantages of a prediction-based Frame Prediction Unit (FPU) with DRU by exploring the spatial-temporal consistency seamlessly. The competitive performance of DHVAD compared with state-of-the-art methods on three benchmark datasets proves the effectiveness of our framework. The extended experimental analysis demonstrates that our framework can gain a better understanding of the normality in terms of semantic accuracy for VAD and efficiently leverage the strengths of both components.

EAAI Journal 2024 Journal Article

Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition

  • Yang Liu
  • Xin Chen
  • Yuan Song
  • Yarong Li
  • Shengbei Wang
  • Weitao Yuan
  • Yongwei Li
  • Zhen Zhao

In speech emotion recognition, existing models often struggle to accurately classify emotions with high similarity. In this paper, we propose a novel architecture that integrates a multi-view attention network (MVAN) and a diffusion joint loss to alleviate confusion by placing a stronger focus on emotions that are challenging to classify accurately. First, we use logarithmic Mel-spectrograms (log-Mels), deltas, and delta-deltas of log-Mels as three-dimensional features to minimize external interference. Then, we design the MVAN to extract effective multi-time-scale emotion features, where channel and spatial attention are used to selectively localize the regions in the input features related to the target emotion. A multi-time-view bidirectional long short-term memory network is used to extract the shallow edge features and deep semantic features, and multi-scale self-attention fuses these features through cross-scale attention fusion to obtain multi-time-scale emotion features. Finally, a diffusion joint loss strategy is introduced to distinguish emotional embeddings with high similarity via complex emotion triplets generated in a diffusing fashion. We evaluated our proposed method on the Interactive Emotional Dyadic Motion Capture (IEMOCAP), Institute of Automation, Chinese Academy of Sciences (CASIA), and Berlin Emotional Speech Database (EMODB) corpora. The results show significant improvements over existing methods, achieving 86.87% WA, 86.60% UA, and 86.82% WF1 on IEMOCAP; 70.74% WA, 70.74% UA, and 70.25% WF1 on CASIA; and 93.65% WA, 91.13% UA, and 92.26% WF1 on EMODB. These results confirm the superiority of our method. Our code and model are available at https://github.com/Littleznnz/MVAN-DiffSEG.
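The delta and delta-delta features mentioned above are standard regression-based differences of log-Mel frames over time. A small sketch assuming frames along the second axis and a window width of 2 (both conventional choices, not specified in the abstract):

```python
import numpy as np

def deltas(c, w=2):
    # Regression-based delta features: weighted differences of frames up
    # to w steps away, with edge padding (the usual HTK-style formula).
    T = c.shape[1]
    pad = np.pad(c, ((0, 0), (w, w)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, w + 1))
    d = np.zeros_like(c, dtype=float)
    for k in range(1, w + 1):
        d += k * (pad[:, w + k : w + k + T] - pad[:, w - k : w - k + T])
    return d / denom
```

Applying deltas to the log-Mels gives the delta channel, and applying it again gives the delta-delta channel; stacking the three yields the three-dimensional input features.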

TMLR Journal 2024 Journal Article

Equivariant Graph Learning for High-density Crowd Trajectories Modeling

  • Yang Liu
  • Zinan Zheng
  • Yu Rong
  • Jia Li

Understanding the high-density crowd dynamics of urbanization plays an important role in architectural design and urban planning, preventing the occurrence of crowd crush. Most traditional methods rely on formulas designed based on expert knowledge, which are inflexible and incomplete for modeling complex real-world crowd trajectories. To address the issue, recent studies propose to simulate crowds via data-driven models. However, these models fail to learn the inherent symmetry of high-density crowd trajectories, leading to insufficient generalization ability. For example, existing models cannot predict left-to-right trajectories by learning right-to-left trajectories, even though they share similar patterns. In this work, we propose a novel Equivariant Graph Learning framework for high-density crowd dynamics modeling, called CrowdEGL. It utilizes an additional objective that encourages the model, given a transformed input, to predict the correspondingly transformed output. We summarize three types of transformation groups, which are determined by the symmetry of environments. To explicitly incorporate these augmented data, a multi-channel GNN is employed to learn the latent graph embedding of pedestrian patterns. Finally, to model dense crowd interactions, future positions of original and transformed inputs are obtained by multiple independent graph decoders. Extensive experiments on 8 datasets from 5 different environments show that CrowdEGL outperforms existing models by a large margin.

AAAI Conference 2024 Conference Paper

Fair Participation via Sequential Policies

  • Reilly Raab
  • Ross Boczar
  • Maryam Fazel
  • Yang Liu

Leading approaches to algorithmic fairness and policy-induced distribution shift are often misaligned with long-term objectives in sequential settings. We aim to correct these shortcomings by ensuring that both the objective and fairness constraints account for policy-induced distribution shift. First, we motivate this problem using an example in which individuals subject to algorithmic predictions modulate their willingness to participate with the policy maker. Fairness in this example is measured by the variance of group participation rates. Next, we develop a method for solving the resulting constrained, non-linear optimization problem and prove that this method converges to a fair, locally optimal policy given first-order information. Finally, we experimentally validate our claims in a semi-synthetic setting.

NeurIPS Conference 2024 Conference Paper

Fairness without Harm: An Influence-Guided Active Sampling Approach

  • Jinlong Pang
  • Jialu Wang
  • Zhaowei Zhu
  • Yuanshun Yao
  • Chen Qian
  • Yang Liu

The pursuit of fairness in machine learning (ML), ensuring that the models do not exhibit biases toward protected demographic groups, typically results in a compromise scenario. This compromise can be explained by a Pareto frontier where given certain resources (e.g., data), reducing the fairness violations often comes at the cost of lowering the model accuracy. In this work, we aim to train models that mitigate group fairness disparity without causing harm to model accuracy. Intuitively, acquiring more data is a natural and promising approach to achieve this goal by reaching a better Pareto frontier of the fairness-accuracy tradeoff. The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes. However, these sensitive attribute annotations should be protected due to privacy and safety concerns. In this paper, we propose a tractable active data sampling algorithm that does not rely on training group annotations, instead only requiring group annotations on a small validation set. Specifically, the algorithm first scores each new example by its influence on fairness and accuracy evaluated on the validation dataset, and then selects a certain number of examples for training. We theoretically analyze how acquiring more data can improve fairness without causing harm, and validate the possibility of our sampling approach in the context of risk disparity. We also provide the upper bound of generalization error and risk disparity as well as the corresponding connections. Extensive experiments on real-world data demonstrate the effectiveness of our proposed algorithm. Our code is available at github.com/UCSC-REAL/FairnessWithoutHarm.
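The selection step described above — score each candidate by its influence on validation accuracy and fairness, then pick the best examples — can be sketched once per-example influence estimates are available. The combined score and the trade-off weight lam are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_examples(acc_influence, fair_influence, k, lam=1.0):
    # acc_influence[i]: estimated gain in validation accuracy from
    # training on example i; fair_influence[i]: estimated increase in
    # the fairness disparity. Keep the k best-scoring examples.
    score = acc_influence - lam * fair_influence
    return np.argsort(-score)[:k]
```

In the paper's setting the influences are evaluated on a small validation set that carries the group annotations, so the training pool itself never needs sensitive-attribute labels.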

IJCAI Conference 2024 Conference Paper

Federated Adaptation for Foundation Model-based Recommendations

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Xiao Fang
  • Yang Song
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang

With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while preserving privacy. This paper proposes a novel federated adaptation mechanism to enhance the foundation model-based recommendation system in a privacy-preserving manner. Specifically, each client will learn a lightweight personalized adapter using its private data. The adapter then collaborates with pre-trained foundation models to provide recommendation services efficiently in a fine-grained manner. Importantly, users' private behavioral data remains secure as it is not shared with the server. This data localization-based privacy preservation is embodied via the federated learning framework. The model can ensure that shared knowledge is incorporated into all adapters while simultaneously preserving each user's personal preferences. Experimental results on four benchmark datasets demonstrate our method's superior performance. The code is available.

AAAI Conference 2024 Conference Paper

FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

  • Xinyuan Ji
  • Zhaowei Zhu
  • Wei Xi
  • Olga Gadyatskaya
  • Zilong Song
  • Yong Cai
  • Yang Liu

Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning approaches. To tackle this issue, we propose FedFixer, where the personalized model is introduced to cooperate with the global model to effectively select clean client-specific samples. In the dual models, updating the personalized model solely at a local level can lead to overfitting on noisy data due to limited samples, consequently affecting both the local and global models’ performance. To mitigate overfitting, we address this concern from two perspectives. Firstly, we employ a confidence regularizer to alleviate the impact of unconfident predictions caused by label noise. Secondly, a distance regularizer is implemented to constrain the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer can perform well in filtering noisy label samples on different clients, especially in highly heterogeneous label noise scenarios.

NeurIPS Conference 2024 Conference Paper

FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

  • Rui Ye
  • Rui Ge
  • Xinyu Zhu
  • Jingyi Chai
  • Yaxin Du
  • Yang Liu
  • Yanfeng Wang
  • Siheng Chen

Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has devoted massive effort to diverse aspects, including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM, and previous works all rely on artificially constructed datasets, failing to capture properties in real-world scenarios. Addressing this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM community. FedLLM-Bench encompasses three datasets (e.g., a user-annotated multilingual dataset) for federated instruction tuning and one dataset (e.g., a user-annotated preference dataset) for federated preference alignment, whose scale of client number ranges from 38 to 747. Our datasets incorporate several representative diversities: language, quality, quantity, instruction, length, embedding, and preference, capturing properties in real-world scenarios. Based on FedLLM-Bench, we conduct experiments on all datasets to benchmark existing FL methods and provide empirical insights (e.g., multilingual collaboration). We believe that our FedLLM-Bench can benefit the FedLLM community by reducing required efforts, providing a practical testbed, and promoting fair comparisons. Code and datasets are available at https://github.com/rui-ye/FedLLM-Bench.

AAAI Conference 2024 Conference Paper

FedMut: Generalized Federated Learning via Stochastic Mutation

  • Ming Hu
  • Yue Cao
  • Anran Li
  • Zhiming Li
  • Chengwei Liu
  • Tianlin Li
  • Mingsong Chen
  • Yang Liu

Although Federated Learning (FL) enables collaborative model training without sharing the raw data of clients, it encounters low-performance problems caused by various heterogeneous scenarios. Due to the limitation of dispatching the same global model to clients for local training, traditional Federated Average (FedAvg)-based FL models face the problem of easily getting stuck in a sharp solution, which results in training a low-performance global model. To address this problem, this paper presents a novel FL approach named FedMut, which mutates the global model according to the gradient change to generate several intermediate models for the next round of training. Each intermediate model is dispatched to a client for local training. Eventually, the global model converges into a flat area within the range of mutated models and generalizes better than the global model trained by FedAvg. Experimental results on well-known datasets demonstrate the effectiveness of our FedMut approach in various data heterogeneity scenarios.
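The mutation step described above — spreading intermediate models around the global model along the direction of the recent update — can be sketched as a sign-flipping perturbation. The coordinate-wise random signs and the scale alpha are illustrative assumptions; the paper's exact mutation rule may differ:

```python
import numpy as np

def mutate(global_w, delta_w, n_models, alpha=0.5, seed=0):
    # global_w: flattened global parameters; delta_w: last round's
    # aggregated update. Each intermediate model adds the update with
    # coordinate-wise random signs, so the mutants bracket the global
    # model along the recent gradient-change direction.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_models,) + global_w.shape)
    return global_w + alpha * signs * delta_w
```

Each client then trains one mutant locally; because the signs are symmetric, the mutants average back to the global model in expectation while probing the loss surface around it.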

AAAI Conference 2024 Conference Paper

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning

  • Jianqing Zhang
  • Yang Liu
  • Yang Hua
  • Jian Cao

Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due to its ability to support heterogeneous models and data. To reduce the high communication cost of transmitting model parameters, a major challenge in HtFL, prototype-based HtFL methods are proposed to solely share class representatives, a.k.a, prototypes, among heterogeneous clients while maintaining the privacy of clients’ models. However, these prototypes are naively aggregated into global prototypes on the server using weighted averaging, resulting in suboptimal global knowledge which negatively impacts the performance of clients. To overcome this challenge, we introduce a novel HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the server. By incorporating ACL, our approach enhances prototype separability while preserving semantic meaning. Extensive experiments with twelve heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art methods by up to 9.08% in accuracy while maintaining the communication and privacy advantages of prototype-based HtFL. Our code is available at https://github.com/TsingZ0/FedTGP.

NeurIPS Conference 2024 Conference Paper

Full-Atom Peptide Design with Geometric Latent Diffusion

  • Xiangzhe Kong
  • Yinjun Jia
  • Wenbing Huang
  • Yang Liu

Peptide design plays a pivotal role in therapeutics, opening up new possibilities to leverage target binding sites that were previously undruggable. Most existing methods are either inefficient or only concerned with the target-agnostic design of 1D sequences. In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD) given the binding site. We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and the literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second issue, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization ability across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation.
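The mapping into a shared standard space can be pictured as centering a point cloud and aligning its principal axes with the coordinate axes. This is a simplified rigid version for intuition only; the paper's transformation is affine and receptor-specific:

```python
import numpy as np

def to_standard_frame(coords):
    # Center 3D coordinates at their centroid, then rotate so the
    # principal axes of the point cloud align with the coordinate axes
    # (via the right singular vectors of the centered coordinates).
    center = coords.mean(axis=0)
    x = coords - center
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt.T, center, vt
```

After this transformation the coordinates have zero mean and a diagonal covariance, so binding sites of different shapes and poses land in comparable frames.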

AAAI Conference 2024 Conference Paper

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

  • Jimmy Lin
  • Junkai Li
  • Jiasi Gao
  • Weizhi Ma
  • Yang Liu

Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances. In this paper, we design the Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task differentiates the time order of tubelet inputs to model temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.

AAAI Conference 2024 Conference Paper

Knowledge Graph Error Detection with Contrastive Confidence Adaption

  • Xiangyu Liu
  • Yang Liu
  • Wei Hu

Knowledge graphs (KGs) often contain various errors. Previous works on detecting errors in KGs mainly rely on triplet embedding from graph structure. We conduct an empirical study and find that these works struggle to discriminate noise from semantically-similar correct triplets. In this paper, we propose a KG error detection model CCA to integrate both textual and graph structural information from triplet reconstruction for better distinguishing semantics. We design interactive contrastive learning to capture the differences between textual and structural patterns. Furthermore, we construct realistic datasets with semantically-similar noise and adversarial noise. Experimental results demonstrate that CCA outperforms state-of-the-art baselines, especially on semantically-similar noise and adversarial noise.

IJCAI Conference 2024 Conference Paper

Label Leakage in Vertical Federated Learning: A Survey

  • Yige Liu
  • Yiwei Lou
  • Yang Liu
  • Yongzhi Cao
  • Hanpin Wang

Vertical federated learning (VFL) is a distributed machine learning paradigm that collaboratively trains models using passive parties with features and an active party with additional labels. While VFL offers privacy preservation through data localization, the threat of label leakage remains a significant challenge. Label leakage occurs due to label inference attacks, where passive parties attempt to infer labels for their privacy and commercial value. Extensive research has been conducted on this specific VFL attack, but a comprehensive summary is still lacking. To bridge this gap, our paper aims to survey the existing label inference attacks and defenses. We propose two new taxonomies for both label inference attacks and defenses, respectively. Beyond summarizing the current state of research, we highlight techniques that we believe hold potential and could significantly influence future studies. Moreover, experimental benchmark datasets and evaluation metrics are summarized to provide a guideline for subsequent work.

NeurIPS Conference 2024 Conference Paper

Large Language Model Unlearning via Embedding-Corrupted Prompts

  • Chris Y. Liu
  • Yaxuan Wang
  • Jeffrey Flanigan
  • Yang Liu

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what a large language model should not know is important for ensuring alignment and thus safe use. However, accurately and efficiently unlearning knowledge from an LLM remains challenging due to the potential collateral damage caused by the fuzzy boundary between retention and forgetting, and the large computational requirements for optimization across state-of-the-art models with hundreds of billions of parameters. In this work, we present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models to address both the challenges of knowledge entanglement and unlearning efficiency. Instead of relying on the LLM itself to unlearn, we enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget. We learn corruptions added to prompt embeddings via zeroth order optimization toward the unlearning objective offline and corrupt prompts flagged by the classifier during inference. We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting. Through extensive experiments on unlearning, we demonstrate the superiority of our method in achieving promising unlearning at nearly zero side effects in general domains and domains closely related to the unlearned ones. Additionally, we highlight the scalability of our method to 100 LLMs, ranging from 0.5B to 236B parameters, incurring no additional cost as the number of parameters increases. We have made our code publicly available at https://github.com/chrisliu298/llm-unlearn-eco.
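The zeroth-order optimization mentioned above treats the unlearning objective as a black box and estimates gradients from loss queries alone. A generic two-point Gaussian-smoothing sketch (the quadratic loss in the usage below is a stand-in, not the paper's objective):

```python
import numpy as np

def zo_grad(loss, x, mu=1e-3, n=64, seed=0):
    # Two-point zeroth-order gradient estimate: probe the black-box loss
    # along random Gaussian directions and average the symmetric finite
    # differences. Needs only loss evaluations, no backpropagation.
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n):
        u = rng.standard_normal(x.shape)
        g += (loss(x + mu * u) - loss(x - mu * u)) / (2 * mu) * u
    return g / n
```

In the ECO setting, x would be the corruption added to the prompt embeddings and the loss would score how well the corrupted prompt satisfies the unlearning objective; here we only verify the estimator on a known quadratic.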

NeurIPS Conference 2024 Conference Paper

Learning Superconductivity from Ordered and Disordered Material Structures

  • Pin Chen
  • Luoxuan Peng
  • Rui Jiao
  • Qing Mo
  • Zhen Wang
  • Wenbing Huang
  • Yang Liu
  • Yutong Lu

Superconductivity is a fascinating phenomenon observed in certain materials under certain conditions. However, some critical aspects of it, such as the relationship between superconductivity and materials' chemical/structural features, still need to be understood. Recent successes of data-driven approaches in material science strongly inspire researchers to study this relationship with them, but a corresponding dataset is still lacking. Hence, we present a new dataset for data-driven approaches, namely SuperCon3D, containing both 3D crystal structures and experimental superconducting transition temperature (Tc) for the first time. Based on SuperCon3D, we propose two deep learning methods for designing high Tc superconductors. The first is SODNet, a novel equivariant graph attention model for screening known structures, which differs from existing models in incorporating both ordered and disordered geometric content. The second is a diffusion generative model DiffCSP-SC for creating new structures, which enables high Tc-targeted generation. Extensive experiments demonstrate that both our proposed dataset and models are advantageous for designing new high Tc superconducting candidates.

NeurIPS Conference 2024 Conference Paper

Learning the Optimal Policy for Balancing Short-Term and Long-Term Rewards

  • Qinwei Yang
  • Xueqing Liu
  • Yan Zeng
  • Ruocheng Guo
  • Yang Liu
  • Peng Wu

Learning the optimal policy to balance multiple short-term and long-term rewards has extensive applications across various domains. Yet, there is a noticeable scarcity of research addressing policy learning strategies in this context. In this paper, we aim to learn the optimal policy capable of effectively balancing multiple short-term and long-term rewards, especially in scenarios where the long-term outcomes are often missing due to data collection challenges over extended periods. Towards this goal, the conventional linear weighting method, which aggregates multiple rewards into a single surrogate reward through weighted summation, can only achieve sub-optimal policies when multiple rewards are related. Motivated by this, we propose a novel decomposition-based policy learning (DPPL) method that converts the whole problem into subproblems. The DPPL method is capable of obtaining optimal policies even when multiple rewards are interrelated. Nevertheless, the DPPL method requires a set of preference vectors specified in advance, posing challenges in practical applications where selecting suitable preferences is non-trivial. To mitigate this, we further theoretically transform the optimization problem in DPPL into an $\varepsilon$-constraint problem, where $\varepsilon$ represents the minimum acceptable levels of other rewards while maximizing one reward. This transformation provides intuition into the selection of preference vectors. Extensive experiments validate the effectiveness of the proposed method.
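As a hedged illustration of the $\varepsilon$-constraint transformation mentioned above (the abstract does not give the exact formulation, so the symbols below are assumptions), one reward is kept as the objective while the remaining rewards are lower-bounded:

```latex
\max_{\pi}\; V_1(\pi)
\quad \text{s.t.} \quad V_j(\pi) \ge \varepsilon_j, \qquad j = 2, \dots, m,
```

where $V_j(\pi)$ denotes the value of the $j$-th short- or long-term reward under policy $\pi$, and $\varepsilon_j$ is the minimum acceptable level of reward $j$ while $V_1$ is maximized. Choosing the $\varepsilon_j$ thresholds then plays the role that selecting preference vectors plays in DPPL.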

JBHI Journal 2024 Journal Article

MAD-Former: A Traceable Interpretability Model for Alzheimer's Disease Recognition Based on Multi-Patch Attention

  • Jiayu Ye
  • An Zeng
  • Dan Pan
  • Yiqun Zhang
  • Jingliang Zhao
  • Qiuping Chen
  • Yang Liu

The integration of structural magnetic resonance imaging (sMRI) and deep learning techniques is one of the important research directions for the automatic diagnosis of Alzheimer's disease (AD). Despite the satisfactory performance achieved by existing voxel-based models based on convolutional neural networks (CNNs), such models only handle AD-related brain atrophy at a single spatial scale and lack spatial localization of abnormal brain regions based on model interpretability. To address the above limitations, we propose a traceable interpretability model for AD recognition based on multi-patch attention (MAD-Former). MAD-Former consists of two parts: recognition and interpretability. In the recognition part, we design a 3D brain feature extraction network to extract local features, followed by constructing a dual-branch attention structure with different patch sizes to achieve global feature extraction, forming a multi-scale spatial feature extraction framework. Meanwhile, we propose an important attention similarity position loss function to assist in model decision-making. The interpretability part proposes a traceable method that can obtain a 3D ROI space through attention-based selection and receptive field tracing. This space encompasses key brain tissues that influence model decisions. Experimental results reveal the significant role of brain tissues such as the Fusiform Gyrus (FuG) in AD recognition. MAD-Former achieves outstanding performance in different tasks on ADNI and OASIS datasets, demonstrating reliable model interpretability.

IJCAI Conference 2024 Conference Paper

MAS-SAM: Segment Any Marine Animal with Aggregated Features

  • Tianyu Yan
  • Zifu Wan
  • Xinhao Deng
  • Pingping Zhang
  • Yang Liu
  • Huchuan Lu

Recently, Segment Anything Model (SAM) shows exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of the SAM's decoder might lead to the loss of fine-grained object details. To address the above issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which involves integrating effective adapters into the SAM's encoder and constructing a pyramidal decoder. More specifically, we first build a new SAM's encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method enables the extraction of richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM can obtain better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.

NeurIPS Conference 2024 Conference Paper

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

  • Shengfang Zhai
  • Huanran Chen
  • Yinpeng Dong
  • Jiajun Li
  • Qingni Shen
  • Yansong Gao
  • Hang Su
  • Yang Liu

Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also raising issues of privacy leakage and data copyright. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image diffusion models due to the high computation overhead and enhanced generalization capabilities. In this paper, we first identify a conditional overfitting phenomenon in text-to-image diffusion models, indicating that these models tend to overfit the conditional distribution of images given the corresponding text rather than the marginal distribution of images only. Based on this observation, we derive an analytical indicator, namely Conditional Likelihood Discrepancy (CLiD), to perform membership inference, which reduces the stochasticity in estimating the memorization of individual samples. Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and dataset scales. Additionally, our method shows superior resistance to overfitting mitigation strategies, such as early stopping and data augmentation.
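One reading of the CLiD indicator consistent with the description above (illustrative only; the paper's exact estimator may differ) contrasts the unconditional and text-conditional diffusion losses of a candidate pair $(x, c)$:

```latex
\mathrm{CLiD}(x, c) \;\propto\; \mathbb{E}_{t,\epsilon}\big[\ell_\theta(x_t, \varnothing)\big] \;-\; \mathbb{E}_{t,\epsilon}\big[\ell_\theta(x_t, c)\big],
```

where $\ell_\theta$ is the denoising loss at timestep $t$ with or without the conditioning text. Under the conditional overfitting phenomenon, a member sample's conditional loss is depressed relative to its unconditional loss, enlarging the discrepancy.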

NeurIPS Conference 2024 Conference Paper

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

  • Xin Ma
  • Yang Liu
  • Jingjing Liu
  • Xiaoxu Ma

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs’ applicative reach.

NeurIPS Conference 2024 Conference Paper

Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation

  • Xiaoying Zhang
  • Jean-François Ton
  • Wei Shen
  • Hongning Wang
  • Yang Liu

Reinforcement Learning from Human Feedback (RLHF) has been pivotal in aligning Large Language Models with human values but often suffers from overoptimization due to its reliance on a proxy reward model. To mitigate this limitation, we first propose a lightweight uncertainty quantification method that assesses the reliability of the proxy reward using only the last layer embeddings of the reward model. Enabled by this efficient uncertainty quantification method, we formulate AdvPO, a distributionally robust optimization procedure to tackle the reward overoptimization problem in RLHF. Through extensive experiments on the Anthropic HH and TL;DR summarization datasets, we verify the effectiveness of AdvPO in mitigating the overoptimization problem, resulting in enhanced RLHF performance as evaluated through human-assisted evaluation.

ICML Conference 2024 Conference Paper

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

  • Yu Zhang 0133
  • Qi Zhang 0020
  • Zixuan Gong
  • Yiwei Shi
  • Yepeng Liu
  • Duoqian Miao 0001
  • Yang Liu
  • Ke Liu

Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer supervision. Additionally, the retention of non-informative tokens leads to increased computational demands and time costs, particularly in CLIP’s ViT image encoder. To address these issues, we propose Multi-Perspective Language-Image Pretraining (MLIP). In MLIP, we leverage the frequency transform’s sensitivity to both high- and low-frequency variations, which complements the spatial domain’s sensitivity, limited to low-frequency variations only. By incorporating frequency transforms and token-level alignment, we expand CLIP’s single supervision into multi-domain and multi-level supervision, enabling a more thorough exploration of informative image features. Additionally, we introduce a token merging method guided by comprehensive semantics from the frequency and spatial domains. This allows us to merge tokens into multi-granularity tokens with a controllable compression rate to accelerate CLIP. Extensive experiments validate the effectiveness of our design.

NeurIPS Conference 2024 Conference Paper

Multi-LLM Debate: Framework, Principals, and Interventions

  • Andrew Estornell
  • Yang Liu

The flexible and generalized nature of large language models has allowed for their application in a wide array of language-based domains. Much like their human contemporaries, these models are capable of engaging in discussions and debates as a means of improving answer quality. We first take a theoretical approach to analyzing debate and provide a framework through which debate can be mathematically examined. Building on this framework, we provide several theoretical results for multi-agent debate. In particular, we demonstrate that similar model capabilities, or similar model responses, can result in static debate dynamics where the debate procedure simply converges to the majority opinion. When this majority opinion is the result of a common misconception (ingrained in the models through shared training data) debate is likely to converge to answers associated with that common misconception. Using insights from our theoretical results we then propose three interventions which improve the efficacy of debate. For each intervention, we provide theoretical results demonstrating how debate is improved. We also demonstrate that these interventions result in better performance on four common benchmark tasks.

ICML Conference 2024 Conference Paper

Multi-View Clustering by Inter-cluster Connectivity Guided Reward

  • Hao Dai
  • Yang Liu
  • Peng Su
  • Hecheng Cai
  • Shudong Huang
  • Jiancheng Lv 0001

Multi-view clustering has been widely explored for its effectiveness in harmonizing heterogeneity along with consistency in different views of data. Despite the significant progress made by recent works, the performance of most existing methods is heavily reliant on strong priori information regarding the true cluster number $\textit{K}$, which is rarely feasible in real-world scenarios. In this paper, we propose a novel graph-based multi-view clustering algorithm to infer unknown $\textit{K}$ through a graph consistency reward mechanism. To be specific, we evaluate the cluster indicator matrix during each iteration with respect to diverse $\textit{K}$. We formulate the inference process of unknown $\textit{K}$ as a parsimonious reinforcement learning paradigm, where the reward is measured by inter-cluster connectivity. As a result, our approach is capable of independently producing the final clustering result, free from the input of a predefined cluster number. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach in comparison to existing state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Novel Class Discovery in Chest X-rays via Paired Images and Text

  • Jiaying Zhou
  • Yang Liu
  • Qingchao Chen

Novel class discovery (NCD) aims to identify new classes undefined during the model training phase with the help of knowledge of known classes. Many methods have been proposed and have notably boosted the performance of NCD on natural images. However, no work has been done on discovering new classes based on medical images and disease categories, which is crucial for understanding and diagnosing specific diseases. Moreover, most of the existing methods only utilize information from the image modality and use labels as the only supervisory information. In this paper, we propose a multi-modal novel class discovery method based on paired images and text, inspired by the low classification accuracy of chest X-ray images and the relatively higher accuracy of the paired text. Specifically, we first pretrain the image encoder and text encoder with multi-modal contrastive learning on the entire dataset, and then we generate pseudo-labels separately on the image branch and text branch. We utilize intra-modal consistency to assess the quality of the pseudo-labels and adjust the weights of the pseudo-labels from both branches to generate the ultimate pseudo-labels for training. Experiments on eight subset splits of the MIMIC-CXR-JPG dataset show that our method improves the clustering performance of unlabeled classes by about 10% on average compared to state-of-the-art methods. Code is available at: https://github.com/zzzzzzzzjy/MMNCD-main.

NeurIPS Conference 2024 Conference Paper

P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics

  • Qi Wang
  • Pu Ren
  • Hao Zhou
  • Xin-Yang Liu
  • Zhiwen Deng
  • Yi Zhang
  • Ruizhi Chengze
  • Hongsheng Liu

When solving partial differential equations (PDEs), classical numerical methods often require fine mesh grids and small time stepping to meet stability, consistency, and convergence conditions, leading to high computational cost. Recently, machine learning has been increasingly utilized to solve PDE problems, but such methods often encounter challenges related to interpretability, generalizability, and strong dependency on rich labeled data. Hence, we introduce a new PDE-Preserved Coarse Correction Network (P$^2$C$^2$Net) to efficiently solve spatiotemporal PDE problems on coarse mesh grids in small data regimes. The model consists of two synergistic modules: (1) a trainable PDE block that learns to update the coarse solution (i.e., the system state), based on a high-order numerical scheme with boundary condition encoding, and (2) a neural network block that consistently corrects the solution on the fly. In particular, we propose a learnable symmetric Conv filter, with weights shared over the entire model, to accurately estimate the spatial derivatives of the PDE based on the neural-corrected system state. The resulting physics-encoded model is capable of handling limited training data (e.g., 3--5 trajectories) and accelerates the prediction of PDE solutions on coarse spatiotemporal grids while maintaining high accuracy. P$^2$C$^2$Net achieves consistent state-of-the-art performance with over 50% gain (e.g., in terms of relative prediction error) across four datasets covering complex reaction-diffusion processes and turbulent flows.
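To make the derivative-estimation idea concrete, the sketch below uses a fixed symmetric central-difference stencil applied by convolution, standing in for the learnable, weight-shared symmetric Conv filter the abstract describes; the function name and test function are illustrative assumptions.

```python
import numpy as np

def d_dx(u, dx):
    """Estimate du/dx with a symmetric 3-point stencil applied as a
    convolution (np.convolve flips the kernel, hence the [::-1])."""
    stencil = np.array([-0.5, 0.0, 0.5]) / dx
    return np.convolve(u, stencil[::-1], mode="same")

# Check against an analytic derivative: d/dx sin(x) = cos(x).
x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
u = np.sin(x)
du = d_dx(u, x[1] - x[0])
# Interior points should match cos(x) to second order in dx.
err = float(np.max(np.abs(du[1:-1] - np.cos(x[1:-1]))))
print(err)
```

A learnable version would simply treat the stencil entries as trainable parameters (with a symmetry constraint), shared across the whole grid, which is the weight-sharing property the abstract emphasizes.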

TIST Journal 2024 Journal Article

PerFedRec++: Enhancing Personalized Federated Recommendation with Self-Supervised Pre-Training

  • Sichun Luo
  • Yuanzhang Xiao
  • Xinyi Zhang
  • Yang Liu
  • Wenbo Ding
  • Linqi Song

Federated recommendation systems employ federated learning techniques to safeguard user privacy by transmitting model parameters instead of raw user data between user devices and the central server. Nevertheless, current federated recommender systems face three significant challenges: (1) data heterogeneity: the heterogeneity of users’ attributes and local data necessitates the acquisition of personalized models to improve the performance of federated recommendation; (2) model performance degradation: the privacy-preserving protocol design in federated recommendation, such as pseudo item labeling and differential privacy, can deteriorate model performance; (3) communication bottleneck: the standard federated recommendation algorithm can have a high communication overhead. Previous studies have attempted to address these issues, but none have been able to solve them simultaneously. In this article, we propose a novel framework, named PerFedRec++, to enhance personalized federated recommendation with self-supervised pre-training. Specifically, we utilize the privacy-preserving mechanism of federated recommender systems to generate two augmented graph views, which are used as contrastive tasks in self-supervised graph learning to pre-train the model. Pre-training enhances the performance of federated models by improving the uniformity of representation learning. Also, by providing a better initial state for federated training, pre-training makes the overall training converge faster, thus alleviating the heavy communication burden. We then construct a collaborative graph to learn the client representation through a federated graph neural network. Based on these learned representations, we cluster users into different user groups and learn personalized models for each cluster. Each user learns a personalized model by combining the global federated model, the cluster-level federated model, and its own fine-tuned local model. Experiments on three real-world datasets show that our proposed method achieves superior performance over existing methods.

AAAI Conference 2024 Conference Paper

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

  • Kun Jin
  • Tongxin Yin
  • Zhongzhu Chen
  • Zeyu Sun
  • Xueru Zhang
  • Yang Liu
  • Mingyan Liu

We consider a federated learning (FL) system consisting of multiple clients and a server, where the clients aim to collaboratively learn a common decision model from their distributed data. Unlike the conventional FL framework that assumes the client's data is static, we consider scenarios where the clients' data distributions may be reshaped by the deployed decision model. In this work, we leverage the idea of distribution shift mappings in performative prediction to formalize this model-dependent data distribution shift and propose a performative FL framework. We first introduce necessary and sufficient conditions for the existence of a unique performative stable solution and characterize its distance to the performative optimal solution. Then we propose the performative FedAvg algorithm and show that it converges to the performative stable solution at a rate of O(1/T) under both full and partial participation schemes. In particular, we use novel proof techniques and show how the clients' heterogeneity influences the convergence. Numerical results validate our analysis and provide valuable insights into real-world applications.
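The fixed-point behavior underlying performative stability can be illustrated with a toy scalar example: the deployed parameter shifts the data distribution, and repeated retraining converges to a stable point. The mean-shift map and constants below are illustrative assumptions, not the paper's federated setting or its FedAvg analysis.

```python
# Toy performative prediction loop: deploying theta shifts the data
# distribution to D(theta) = N(base_mean + eps * theta, 1), and retraining
# fits the mean of the shifted distribution.
def retrain(theta_deployed, base_mean=1.0, eps=0.5):
    """Best response to the distribution induced by the deployed model."""
    return base_mean + eps * theta_deployed

theta = 0.0
for _ in range(50):
    theta = retrain(theta)

# The performatively stable point solves theta = base_mean + eps * theta,
# i.e. theta* = base_mean / (1 - eps) = 2.0 for these constants.
print(round(theta, 6))
```

The contraction condition (here, eps < 1 on the shift map) is the toy analogue of the existence/uniqueness conditions the abstract refers to.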

AAAI Conference 2024 Conference Paper

Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models

  • Yihao Huang
  • Felix Juefei-Xu
  • Qing Guo
  • Jie Zhang
  • Yutong Wu
  • Ming Hu
  • Tianlin Li
  • Geguang Pu

Although recent personalization methods have democratized high-resolution image synthesis by enabling swift concept acquisition with minimal examples and lightweight computation, they also present an exploitable avenue for highly accessible backdoor attacks. This paper investigates a critical and unexplored aspect of text-to-image (T2I) diffusion models - their potential vulnerability to backdoor attacks via personalization. By studying the prompt processing of popular personalization methods (epitomized by Textual Inversion and DreamBooth), we have devised dedicated personalization-based backdoor attacks according to the different ways of dealing with unseen tokens and divide them into two families: nouveau-token and legacy-token backdoor attacks. In comparison to conventional backdoor attacks involving the fine-tuning of the entire text-to-image diffusion model, our proposed personalization-based backdoor attack method can facilitate more tailored, efficient, and few-shot attacks. Through comprehensive empirical study, we endorse the utilization of the nouveau-token backdoor attack due to its impressive effectiveness, stealthiness, and integrity, markedly outperforming the legacy-token backdoor attack.

AIJ Journal 2024 Journal Article

Polarized message-passing in graph neural networks

  • Tiantian He
  • Yang Liu
  • Yew-Soon Ong
  • Xiaohu Wu
  • Xin Luo

In this paper, we present Polarized message-passing (PMP), a novel paradigm to revolutionize the design of message-passing graph neural networks (GNNs). In contrast to existing methods, PMP captures the power of node-node similarity and dissimilarity to acquire dual sources of messages from neighbors. The messages are then coalesced to enable GNNs to learn expressive representations from sparse but strongly correlated neighbors. Three novel GNNs based on the PMP paradigm, namely PMP graph convolutional network (PMP-GCN), PMP graph attention network (PMP-GAT), and PMP graph PageRank network (PMP-GPN) are proposed to perform various downstream tasks. Theoretical analysis is also conducted to verify the high expressiveness of the proposed PMP-based GNNs. In addition, an empirical study of five learning tasks based on 12 real-world datasets is conducted to validate the performances of PMP-GCN, PMP-GAT, and PMP-GPN. The proposed PMP-GCN, PMP-GAT, and PMP-GPN outperform numerous strong message-passing GNNs across all five learning tasks, demonstrating the effectiveness of the proposed PMP paradigm.

ICRA Conference 2024 Conference Paper

Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

  • Yang Liu
  • Marius Hofert

Offline reinforcement learning (RL) aims to optimize a policy, based on pre-collected data, to maximize the cumulative rewards after performing a sequence of actions. Existing approaches learn a value function from historical data and then guide the updating of the policy parameters by maximizing the value function at a single time step. Driven by the gap between maximizing the cumulative rewards of RL and the greedy strategy of existing methods, we propose an approach of policy optimization by looking ahead (POLA) to mitigate the gap. Concretely, we optimize the policy on both current and future states, where the future states are predicted by a transition model. A trajectory contains numerous actions before the task is done, and performing the best action at every step does not guarantee an optimal trajectory in the end; sub-optimal or locally negative actions must occasionally be allowed. Existing methods, however, focus on generating the optimal action at each step according to the principle of maximizing the Q-value. This motivates our looking-ahead approach. Besides, hidden confounding factors may affect the decision-making process. To that end, we incorporate the correlations among dimensions of the state into the policy, providing more information about the environment for the policy to make decisions. Empirical results on the MuJoCo dataset show the effectiveness of the proposed approach.

AAAI Conference 2024 Conference Paper

Providing Fair Recourse over Plausible Groups

  • Jayanth Yetukuri
  • Ian Hardy
  • Yevgeniy Vorobeychik
  • Berk Ustun
  • Yang Liu

Machine learning models now automate decisions in applications where we may wish to provide recourse to adversely affected individuals. In practice, existing methods to provide recourse return actions that fail to account for latent characteristics that are not captured in the model (e.g., age, sex, marital status). In this paper, we study how the cost and feasibility of recourse can change across these latent groups. We introduce a notion of group-level plausibility to identify groups of individuals with a shared set of latent characteristics. We develop a general-purpose clustering procedure to identify groups from samples. Further, we propose a constrained optimization approach to learn models that equalize the cost of recourse over latent groups. We evaluate our approach through an empirical study on simulated and real-world datasets, showing that it can produce models that have better performance in terms of overall costs and feasibility at a group level.

ICLR Conference 2024 Conference Paper

Reinforcement Symbolic Regression Machine

  • Yilong Xu
  • Yang Liu
  • Hao Sun 0002

In nature, the behavior of many complex systems can be described by parsimonious math equations. Symbolic Regression (SR) is defined as the task of automatically distilling equations from limited data. Considerable effort has been devoted to tackling this issue, with demonstrated success in SR. However, there still exist bottlenecks that current methods struggle to break when the space of expressions to explore tends toward infinity, and especially when the underlying math formula is intricate. To this end, we propose a novel Reinforcement Symbolic Regression Machine (RSRM) that masters the capability of uncovering complex math equations from only scarce data. The RSRM model is composed of three key modules: (1) a Monte Carlo tree search (MCTS) agent, designed for exploration, that explores optimal math expression trees consisting of pre-defined math operators and variables, (2) a Double Q-learning block, designed for exploitation, that helps reduce the feasible search space of MCTS by properly understanding the distribution of reward, and (3) a modulated sub-tree discovery block that heuristically learns and defines new math operators to improve the representation ability of math expression trees. The binding of these modules yields the SOTA performance of RSRM in SR, as demonstrated by multiple benchmark datasets. RSRM shows clear superiority over several representative baseline models.

AAAI Conference 2024 Conference Paper

Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

  • Yang Liu

Many evaluation measures are used to evaluate social biases in masked language models (MLMs). However, we find that these previously proposed evaluation measures lack robustness in scenarios with limited datasets. This is because these measures are obtained by comparing the pseudo-log-likelihood (PLL) scores of the stereotypical and anti-stereotypical samples using an indicator function. The disadvantage is that this mines the PLL score sets only shallowly, without capturing their distributional information. In this paper, we represent a PLL score set as a Gaussian distribution and use Kullback-Leibler (KL) divergence and Jensen–Shannon (JS) divergence to construct evaluation measures for the distributions of stereotypical and anti-stereotypical PLL scores. Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are significantly more robust and interpretable than those proposed previously.
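The measure family described above can be sketched as follows: fit a univariate Gaussian to each PLL score set, then compare the two fitted distributions with KL (closed form) and JS (numeric, since no closed form exists for Gaussians). The score values below are synthetic and the exact measure definitions in the paper are not reproduced.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def js_divergence(mu_p, var_p, mu_q, var_q):
    """Jensen-Shannon divergence between two 1-D Gaussians via grid integration."""
    lo = min(mu_p, mu_q) - 6 * max(var_p, var_q) ** 0.5
    hi = max(mu_p, mu_q) + 6 * max(var_p, var_q) ** 0.5
    x = np.linspace(lo, hi, 4001)
    dx = x[1] - x[0]
    pdf = lambda m, v: np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    p, q = pdf(mu_p, var_p), pdf(mu_q, var_q)
    mix = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(np.maximum(a, 1e-300) / np.maximum(b, 1e-300))) * dx
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Synthetic stereotypical / anti-stereotypical PLL score sets.
rng = np.random.default_rng(0)
ster = rng.normal(-4.0, 1.0, 200)
anti = rng.normal(-4.5, 1.2, 200)
mu_s, var_s = ster.mean(), ster.var()
mu_a, var_a = anti.mean(), anti.var()
print(gaussian_kl(mu_s, var_s, mu_a, var_a), js_divergence(mu_s, var_s, mu_a, var_a))
```

Unlike an indicator-function comparison of paired scores, these divergences vary smoothly with the fitted means and variances, which is the distributional information the measures are designed to capture.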

NeurIPS Conference 2024 Conference Paper

Scalable Optimization in the Modular Norm

  • Tim Large
  • Yang Liu
  • Minyoung Huh
  • Hyojin Bahng
  • Phillip Isola
  • Jeremy Bernstein

To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively in tandem with the network architecture itself. We show that the modular norm has several promising applications. On the practical side, the modular norm can be used to normalize the updates of any base optimizer so that the learning rate becomes transferable across width and depth. This means that the user does not need to compute optimizer-specific scale factors in order to scale training. On the theoretical side, we show that for any neural network built from "well-behaved" atomic modules, the gradient of the network is Lipschitz-continuous in the modular norm, with the Lipschitz constant admitting a simple recursive formula. This characterization opens the door to porting standard ideas in optimization theory over to deep learning. We have created a Python package called Modula that automatically normalizes weight updates in the modular norm of the architecture. Both the Modula package and code for our experiments are provided in the supplementary material.
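A minimal sketch of the normalization idea, assuming a simple per-layer RMS norm as a stand-in for the paper's recursively defined modular norm: each layer's raw update is rescaled so the learning rate has the same meaning regardless of layer size.

```python
import numpy as np

def normalized_update(weights, grads, lr=0.1, eps=1e-12):
    """Rescale each layer's update to unit RMS norm before applying lr,
    so the effective step size is identical across layers of any width."""
    new = []
    for w, g in zip(weights, grads):
        rms = np.sqrt(np.mean(g ** 2)) + eps
        new.append(w - lr * g / rms)
    return new

# Two layers of very different widths receive steps of equal RMS size.
rng = np.random.default_rng(0)
w = [np.ones((4, 4)), np.ones((256, 256))]
g = [rng.standard_normal(layer.shape) for layer in w]
w2 = normalized_update(w, g)
steps = [float(np.sqrt(np.mean((a - b) ** 2))) for a, b in zip(w, w2)]
print(steps)
```

The actual modular norm is defined recursively over the architecture rather than per-layer, but the practical effect sketched here is the same: the learning rate becomes transferable across width without optimizer-specific scale factors.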

AAAI Conference 2024 Conference Paper

Semantic-Guided Novel Category Discovery

  • Weishuai Wang
  • Ting Lei
  • Qingchao Chen
  • Yang Liu

The Novel Category Discovery problem aims to cluster an unlabeled set with the help of a labeled set consisting of disjoint but related classes. However, existing models treat class names as discrete one-hot labels and ignore the semantic understanding of these classes. In this paper, we propose a new setting named Semantic-guided Novel Category Discovery (SNCD), which requires the model to not only cluster the unlabeled images but also semantically recognize these images based on a set of their class names. The first challenge we confront pertains to effectively leveraging the class names of unlabeled images, given the inherent gap between the visual and linguistic domains. To address this issue, we incorporate a semantic-aware recognition mechanism. This is achieved by constructing dynamic class-wise visual prototypes as well as a semantic similarity matrix that enables the projection of visual features into the semantic space. The second challenge originates from the granularity disparity between the classification and clustering tasks. To deal with this, we develop a semantic-aware clustering process to facilitate the exchange of knowledge between the two tasks. Through extensive experiments, we demonstrate the mutual benefits of the recognition and clustering tasks, which can be jointly optimized. Experimental results on multiple datasets confirm the effectiveness of our proposed method. Our code is available at https://github.com/wang-weishuai/Semantic-guided-NCD.

IS Journal 2024 Journal Article

Ship Grid: A Novel Anchor-Free Ship Detection Algorithm

  • Yantong Chen
  • Yanyan Zhang
  • Jialiang Wang
  • Yang Liu

Video-based ship detection is crucial for the real-time monitoring of maritime activities, aiding decision making in maritime traffic management, safety monitoring, and rescue operations. Current challenges include multiscale variations and occlusion issues affecting detection accuracy. Existing ship detection methods often address the multiscale problem by redesigning the network architecture, providing limited improvements. We present Ship Grid, an innovative anchor-free ship detection algorithm. Ship Grid tackles the challenges of ship feature capture in occluded scenarios by directly generating bounding boxes at the predicted centers during the label assignment phase. Moreover, it enables simultaneous ship feature extraction at multiple scales, effectively addressing the issues of insufficient feature extraction for small objects and imprecise localization for large objects caused by stark scale variations. In the bounding box regression phase, we introduce a scale-invariant localization loss that guides the regression process of prediction boxes at different scales. This approach allows the network to comprehensively learn ship features across multiple scales and further enhances performance in the presence of large ship scale variations. We rigorously evaluated Ship Grid on the SeaShips dataset, achieving 0.988 and 0.835 on the evaluation metrics of mean average precision (mAP) at an intersection over union (IoU) threshold of 0.5 and mAP at IoU thresholds ranging from 0.5 to 0.95, respectively. This outperforms state-of-the-art methods, demonstrating its advantage in ship detection.
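For readers unfamiliar with the metrics above: mAP@0.5 scores a detection as correct when its IoU with the ground-truth box exceeds 0.5, while mAP@[0.5:0.95] averages over thresholds 0.5, 0.55, ..., 0.95. A minimal IoU implementation for axis-aligned boxes (illustrative background only, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```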

NeurIPS Conference 2024 Conference Paper

TopoFR: A Closer Look at Topology Alignment on Face Recognition

  • Jun Dan
  • Yang Liu
  • Jiankang Deng
  • Haoyu Xie
  • Siyuan Li
  • Baigui Sun
  • Shan Luo

The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonstrated the effectiveness of data structure information. Considering that the FR task can leverage large-scale training data, which intrinsically contains significant structure information, we aim to investigate how to encode such critical structure information into the latent space. As revealed from our observations, directly aligning the structure information between the input and latent spaces inevitably suffers from an overfitting problem, leading to a structure collapse phenomenon in the latent space. To address this problem, we propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. Concretely, PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. To mitigate the impact of hard samples on the latent space structure, SDE accurately identifies hard samples by automatically computing a structure damage score (SDS) for each sample, and directs the model to prioritize optimizing these samples. Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods. Code and models are available at: https://github.com/modelscope/facechain/tree/main/face_module/TopoFR.

NeurIPS Conference 2024 Conference Paper

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

  • Mingcheng Li
  • Dingkang Yang
  • Yang Liu
  • Shunli Wang
  • Jiawei Chen
  • Shuaibing Wang
  • Jinjie Wei
  • Yue Jiang

Multimodal Sentiment Analysis (MSA) is an important research area that aims to understand and recognize human sentiment through multiple modalities. The complementary information provided by multimodal fusion promotes better sentiment analysis compared to utilizing only a single modality. Nevertheless, in real-world applications, many unavoidable factors may lead to situations of uncertain modality missing, thus hindering the effectiveness of multimodal modeling and degrading the model’s performance. To this end, we propose a Hierarchical Representation Learning Framework (HRLF) for the MSA task under uncertain missing modalities. Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction. Moreover, a hierarchical mutual information maximization mechanism is introduced to incrementally maximize the mutual information between multi-scale representations to align and reconstruct the high-level semantics in the representations. Ultimately, we propose a hierarchical adversarial learning mechanism that further aligns and adapts the latent distribution of sentiment-relevant representations to produce robust joint multimodal representations. Comprehensive experiments on three datasets demonstrate that HRLF significantly improves MSA performance under uncertain modality missing cases.

NeurIPS Conference 2024 Conference Paper

Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

  • Yuwei Zhang
  • Tong Xia
  • Jing Han
  • Yu Y. Wu
  • Georgios Rizos
  • Yang Liu
  • Mohammed Mosuily
  • Jagmohan Chauhan

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets ($\sim$136K samples, over 400 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.com/evelyn0414/OPERA.

NeurIPS Conference 2024 Conference Paper

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

  • Muzhi Zhu
  • Yang Liu
  • Zekai Luo
  • Chenchen Jing
  • Hao Chen
  • Guangkai Xu
  • Xinlong Wang
  • Chunhua Shen

The Diffusion Model has not only garnered noteworthy achievements in the realm of image generation but has also demonstrated its potential as an effective pretraining method utilizing unlabeled data. Drawing from the extensive potential unveiled by the Diffusion Model in both semantic correspondence and open vocabulary segmentation, our work initiates an investigation into employing the Latent Diffusion Model for Few-shot Semantic Segmentation. Recently, inspired by the in-context learning ability of large language models, Few-shot Semantic Segmentation has evolved into In-context Segmentation tasks, morphing into a crucial element in assessing generalist segmentation models. In this context, we concentrate on Few-shot Semantic Segmentation, establishing a solid foundation for the future development of a Diffusion-based generalist model for segmentation. Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework. Subsequently, we delve deeper into optimizing the infusion of information from the support mask and simultaneously re-evaluating how to provide reasonable supervision from the query mask. Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework and effectively utilizing the pre-training prior. Experimental results demonstrate that our method significantly outperforms the previous SOTA models in multiple settings.

NeurIPS Conference 2024 Conference Paper

User-Creator Feature Polarization in Recommender Systems with Dual Influence

  • Tao Lin
  • Kun Jin
  • Andrew Estornell
  • Xiaoying Zhang
  • Yiling Chen
  • Yang Liu

Recommender systems serve the dual purpose of presenting relevant content to users and helping content creators reach their target audience. The dual nature of these systems naturally influences both users and creators: users' preferences are affected by the items they are recommended, while creators may be incentivized to alter their content to attract more users. We define a model, called user-creator feature dynamics, to capture the dual influence of recommender systems. We prove that a recommender system with dual influence is guaranteed to polarize, causing diversity loss in the system. We then investigate, both theoretically and empirically, approaches for mitigating polarization and promoting diversity in recommender systems. Unexpectedly, we find that common diversity-promoting approaches do not work in the presence of dual influence, while relevancy-optimizing methods like top-$k$ truncation can prevent polarization and improve diversity of the system.
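Top-$k$ truncation, the relevancy-optimizing method the abstract finds effective against polarization, simply restricts each user's recommendation pool to their $k$ most relevant creators. A minimal sketch of this operation on a user-by-creator relevance matrix (illustrative only, not the paper's implementation; the function name is hypothetical):

```python
import numpy as np

def top_k_truncate(scores, k):
    """Zero out all but the k highest relevance scores in each row (user).
    The truncated matrix can then be normalized into recommendation
    probabilities, so each user is only ever matched to top-k creators."""
    out = np.zeros_like(scores, dtype=float)
    idx = np.argsort(scores, axis=1)[:, -k:]  # column indices of top-k per row
    np.put_along_axis(out, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    return out
```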

IJCAI Conference 2024 Conference Paper

Vision-based Discovery of Nonlinear Dynamics for 3D Moving Target

  • Zitong Zhang
  • Yang Liu
  • Hao Sun

Data-driven discovery of governing equations has kindled significant interests in many science and engineering areas. Existing studies primarily focus on uncovering equations that govern nonlinear dynamics based on direct measurement of the system states (e.g., trajectories). Limited efforts have been placed on distilling governing laws of dynamics directly from videos for moving targets in a 3D space. To this end, we propose a vision-based approach to automatically uncover governing equations of nonlinear dynamics for 3D moving targets via raw videos recorded by a set of cameras. The approach is composed of three key blocks: (1) a target tracking module that extracts plane pixel motions of the moving target in each video, (2) a Rodrigues' rotation formula-based coordinate transformation learning module that reconstructs the 3D coordinates with respect to a predefined reference point, and (3) a spline-enhanced library-based sparse regressor that uncovers the underlying governing law of dynamics. This framework is capable of effectively handling the challenges associated with measurement data, e.g., noise in the video, imprecise tracking of the target that causes data missing, etc. The efficacy of our method has been demonstrated through multiple sets of synthetic videos considering different nonlinear dynamics.

IJCAI Conference 2024 Conference Paper

VSGT: Variational Spatial and Gaussian Temporal Graph Models for EEG-based Emotion Recognition

  • Chenyu Liu
  • Xinliang Zhou
  • Jiaping Xiao
  • Zhengri Zhu
  • Liming Zhai
  • Ziyu Jia
  • Yang Liu

Electroencephalogram (EEG), which directly reflects the emotional activity of the brain, has been increasingly utilized for emotion recognition. Most works exploit the spatial and temporal dependencies in EEG to learn emotional feature representations, but they still have two limitations to reach their full potential. First, prior knowledge is rarely used to capture the spatial dependency of brain regions. Second, the cross temporal dependency between consecutive time slices for different brain regions is ignored. To address these limitations, in this paper, we propose Variational Spatial and Gaussian Temporal (VSGT) graph models to investigate the spatial and temporal dependencies for EEG-based emotion recognition. The VSGT has two key components: Variational Spatial Encoder (VSE) and Gaussian Temporal Encoder (GTE). The VSE leverages the upper bound theorem to identify the dynamic spatial dependency based on prior knowledge by the variational Bayesian method. Besides, the GTE exploits the conditional Gaussian graph transform that computes comprehensive temporal dependency between consecutive time slices. Finally, the VSGT utilizes a recurrent structure to calculate the spatial and temporal dependencies for all time slices. Extensive experiments show the superiority of VSGT over state-of-the-art methods on multiple EEG datasets.

NeurIPS Conference 2024 Conference Paper

WenMind: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Classical Literature and Language Arts

  • Jiahuan Cao
  • Yang Liu
  • Yongxin Shi
  • Kai Ding
  • Lianwen Jin

Large Language Models (LLMs) have made significant advancements across numerous domains, but their capabilities in Chinese Classical Literature and Language Arts (CCLLA) remain largely unexplored due to the limited scope and tasks of existing benchmarks. To fill this gap, we propose WenMind, a comprehensive benchmark dedicated to evaluating LLMs in CCLLA. WenMind covers the sub-domains of Ancient Prose, Ancient Poetry, and Ancient Literary Culture, comprising 4,875 question-answer pairs, spanning 42 fine-grained tasks, 3 question formats, and 2 evaluation scenarios: domain-oriented and capability-oriented. Based on WenMind, we conduct a thorough evaluation of 31 representative LLMs, including general-purpose models and ancient Chinese LLMs. The results reveal that even the best-performing model, ERNIE-4.0, only achieves a total score of 64.3, indicating significant room for improvement of LLMs in the CCLLA domain. We also provide insights into the strengths and weaknesses of different LLMs and highlight the importance of pre-training data in achieving better results. Overall, WenMind serves as a standardized and comprehensive baseline, providing valuable insights for future CCLLA research. Our benchmark and related code are available at \url{https://github.com/SCUT-DLVCLab/WenMind}.

IJCAI Conference 2024 Conference Paper

With a Little Help from Language: Semantic Enhanced Visual Prototype Framework for Few-Shot Learning

  • Hecheng Cai
  • Yang Liu
  • Shudong Huang
  • Jiancheng Lv

Few-shot learning (FSL) aims to recognize new categories given limited training samples. The core challenge is to avoid overfitting to the minimal data while ensuring good generalization to novel classes. One mainstream method employs prototypes from visual feature extractors as classifier weights, and its performance depends on the quality of the prototype. Since different categories may have similar visual features, the visual prototype has limitations. This is because existing methods only learn a simple visual feature extractor during the pre-training stage but neglect the importance of a well-developed feature space for the prototype. We introduce the Semantic Enhanced Visual Prototype framework (SEVpro) to address this issue. SEVpro refines prototype learning from the pre-training stage and serves as a versatile plug-and-play framework for all prototype-based FSL methods. Specifically, we enhance prototype discriminability by transforming semantic embeddings into the visual space, aiding in separating categories with similar visual features. For novel class learning, we leverage knowledge from base classes and incorporate semantic information to elevate prototype quality further. Extensive experiments on FSL benchmarks and ablation studies demonstrate the superiority of our proposed SEVpro for FSL.

ICLR Conference 2024 Conference Paper

ZeroFlow: Scalable Scene Flow via Distillation

  • Kyle Vedder
  • Neehar Peri
  • Nathaniel Chodosh
  • Ishan Khatri
  • Eric Eaton
  • Dinesh Jayaraman
  • Yang Liu
  • Deva Ramanan

Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feedforward methods are considerably faster, running on the order of tens to hundreds of milliseconds for full-size point clouds, but require expensive human supervision. To address both limitations, we propose _Scene Flow via Distillation_, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feedforward model. Our instantiation of this framework, _ZeroFlow_, achieves **state-of-the-art** performance on the _Argoverse 2 Self-Supervised Scene Flow Challenge_ while using zero human labels by simply training on large-scale, diverse unlabeled data. At test-time, ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on full-size point clouds (34 FPS vs 0.028 FPS) and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation (\\$394 vs ~\\$750,000). To facilitate further research, we will release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.

AAAI Conference 2023 Conference Paper

A Simple Yet Effective Subsequence-Enhanced Approach for Cross-Domain NER

  • Jinpeng Hu
  • DanDan Guo
  • Yang Liu
  • Zhuo Li
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Cross-domain named entity recognition (NER), aiming to address the limitation of labeled resources in the target domain, is a challenging yet important task. Most existing studies alleviate the data discrepancy across different domains at the coarse level via combining NER with language modeling or introducing domain-adaptive pre-training (DAPT). Notably, source and target domains tend to share more fine-grained local information within denser subsequences than global information within the whole sequence, such that subsequence features are easier to transfer, which has not been explored well. Besides, compared to token-level representation, subsequence-level information can help the model distinguish different meanings of the same word in different domains. In this paper, we propose to incorporate subsequence-level features for promoting the cross-domain NER. In detail, we first utilize a pre-trained encoder to extract the global information. Then, we re-express each sentence as a group of subsequences and propose a novel bidirectional memory recurrent unit (BMRU) to capture features from the subsequences. Finally, an adaptive coupling unit (ACU) is proposed to combine global information and subsequence features for predicting entity labels. Experimental results on several benchmark datasets illustrate the effectiveness of our model, which achieves considerable improvements.

JBHI Journal 2023 Journal Article

Analysis and Recognition of Voluntary Facial Expression Mimicry Based on Depressed Patients

  • Jiayu Ye
  • Yanhong Yu
  • Gang Fu
  • Yunshao Zheng
  • Yang Liu
  • Yitao Zhu
  • Qingxiang Wang

Many clinical studies have shown that facial expression recognition and cognitive function are impaired in depressed patients. Different from spontaneous facial expression mimicry (SFEM), 164 subjects (82 in a case group and 82 in a control group) participated in our voluntary facial expression mimicry (VFEM) experiment using expressions of neutrality, anger, disgust, fear, happiness, sadness and surprise. Our research is as follows. First, we collected a large amount of subject data for VFEM. Second, we extracted the geometric features of subject facial expression images for VFEM and used Spearman correlation analysis, a random forest, and logistic regression-based recursive feature elimination (LR-RFE) to perform feature selection. The features selected revealed the difference between the case group and the control group. Third, we combined geometric features with the original images and improved advanced deep learning facial expression recognition (FER) algorithms in different systems. We propose E-ViT and E-ResNet based on VFEM, both of which achieved higher accuracies and F1 scores than the baseline models. Our research proved that it is effective to use feature selection to screen geometric features and combine them with a deep learning model for depression facial expression recognition.

AAAI Conference 2023 Conference Paper

Anonymization for Skeleton Action Recognition

  • Saemi Moon
  • Myeonghyeon Kim
  • Zhenyue Qin
  • Yang Liu
  • Dongwoo Kim

Skeleton-based action recognition attracts practitioners and researchers due to the lightweight, compact nature of datasets. Compared with RGB-video-based action recognition, skeleton-based action recognition is a safer way to protect the privacy of subjects while having competitive recognition performance. However, due to improvements in skeleton recognition algorithms as well as motion and depth sensors, more details of motion characteristics can be preserved in the skeleton dataset, leading to potential privacy leakage. We first train classifiers to categorize private information from skeleton trajectories to investigate the potential privacy leakage from skeleton datasets. Our preliminary experiments show that the gender classifier achieves 87% accuracy on average, and the re-identification classifier achieves 80% accuracy on average with three baseline models: Shift-GCN, MS-G3D, and 2s-AGCN. We propose an anonymization framework based on adversarial learning to protect potential privacy leakage from the skeleton dataset. Experimental results show that an anonymized dataset can reduce the risk of privacy leakage while having marginal effects on action recognition performance even with simple anonymizer architectures. The code used in our experiments is available at https://github.com/ml-postech/Skeleton-anonymization/

AAAI Conference 2023 Conference Paper

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

  • Xiaoyang Huang
  • Yanjun Wang
  • Yang Liu
  • Bingbing Ni
  • Wenjun Zhang
  • Jinxian Liu
  • Teng Li

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of individuals, which is essential to produce accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of human bodies, particularly ears. To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears with single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point cloud ear scans with RGB images. To self-supervisedly train a reconstruction model, we further collect a 2D ear dataset composed of 2,000 images, each one with manual annotation of occlusion and 55 landmarks, named AudioEar2D. To our knowledge, both datasets have the largest scale and best quality of their kinds for public use. Further, we propose AudioEarM, a reconstruction method guided by a depth estimation network that is trained on synthetic data, with two loss functions tailored for ear data. Lastly, to fill the gap between the vision and acoustics community, we develop a pipeline to integrate the reconstructed ear mesh with an off-the-shelf 3D human body and simulate a personalized Head-Related Transfer Function (HRTF), which is the core of spatial audio rendering. Code and data are publicly available in https://github.com/seanywang0408/AudioEar.

AAAI Conference 2023 Conference Paper

Background-Mixed Augmentation for Weakly Supervised Change Detection

  • Rui Huang
  • Ruofei Wang
  • Qing Guo
  • Jieda Wei
  • Yuxiang Zhang
  • Wei Fan
  • Yang Liu

Change detection (CD) is to decouple object changes (i.e., object missing or appearing) from background changes (i.e., environment variations) like light and season variations in two images captured in the same scene over a long time span, presenting critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to have a high generalization against unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired-training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation efforts. In this work, for the first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that only needs image-level labels. Different from general augmentation techniques for classification, we propose the background-mixed augmentation that is specifically designed for change detection by augmenting examples under the guidance of a set of background changing images and letting deep CD models see diverse environment variations. Moreover, we propose the augmented & real data consistency loss that encourages the generalization increase significantly. Our method as a general framework can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments in two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.

EAAI Journal 2023 Journal Article

Boosting fish counting in sonar images with global attention and point supervision

  • Yunhong Duan
  • Shubin Zhang
  • Yang Liu
  • Jincun Liu
  • Dong An
  • Yaoguang Wei

Automatically counting fish in sonar images has been attracting increasing attention in recent years because extreme efforts are needed in manual counting. Density map regression provides a promising approach in the counting field, but two obstacles hinder fish counting in low-resolution sonar images: the difficulty of distinguishing fish from similar background noise, and the inconsistency between the strip-shaped fishes in input images and the dot-shaped ground-truth density map. To address these issues, we present GPNet, a novel encoder-decoder network with global attention and point supervision, to boost sonar image-based fish counting accuracy. To alleviate the impact of background noise, we incorporate a segmentation module (SM) with global self-attention to the neck of the network to identify the fish region and separate out background noise. Furthermore, feature enhancement modules (FEM) with a global receptive field are introduced to the encoder to enhance the feature representation and discrimination. To break down the performance upper bound resulting from target shape inconsistency between input and ground truth, we leverage fish center coordinates instead of the Gaussian density map to supervise the network training directly. Extensive experiments on a challenging public sonar image-based fish counting dataset, the ARIS dataset, demonstrate that GPNet achieves state-of-the-art performance both in counting accuracy and noise removal.

IJCAI Conference 2023 Conference Paper

CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

  • Weitao Chen
  • Hongbin Xu
  • Zhipeng Zhou
  • Yang Liu
  • Baigui Sun
  • Wenxiong Kang
  • Xuansong Xie

The core of Multi-view Stereo (MVS) is the matching process among reference and source pixels. Cost aggregation plays a significant role in this process, while previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs that fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to involve Transformer into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity caused by Transformer, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, the Residual Regression Transformer (RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.

JBHI Journal 2023 Journal Article

Cross-Domain Unpaired Learning for Low-Dose CT Imaging

  • Yang Liu
  • Gaofeng Chen
  • Shumao Pang
  • Dong Zeng
  • Youde Ding
  • Guoxi Xie
  • Jianhua Ma
  • Ji He

Supervised deep-learning techniques with paired training datasets have been widely studied for low-dose computed tomography (LDCT) imaging with excellent performance. However, the paired training datasets are usually difficult to obtain in clinical routine, which restricts the wide adoption of supervised deep-learning techniques in clinical practices. To address this issue, a general idea is to construct a pseudo paired training dataset based on the widely available unpaired data, after which, supervised deep-learning techniques can be adopted for improving the LDCT imaging performance by training on the pseudo paired training dataset. However, due to the complexity of noise properties in CT imaging, the LDCT data are difficult to generate in order to construct the pseudo paired training dataset. In this article, we propose a simple yet effective cross-domain unpaired learning framework for pseudo LDCT data generation and LDCT image reconstruction, which is denoted as CrossDuL. Specifically, a dedicated pseudo LDCT sinogram generative module is constructed based on a data-dependent noise model in the sinogram domain, and then instead of in the sinogram domain, a pseudo paired dataset is constructed in the image domain to train an LDCT image restoration module. To validate the effectiveness of the proposed framework, clinical datasets are adopted. Experimental results demonstrate that the CrossDuL framework can obtain promising LDCT imaging performance in both quantitative and qualitative measurements.

NeurIPS Conference 2023 Conference Paper

Crystal Structure Prediction by Joint Equivariant Diffusion

  • Rui Jiao
  • Wenbing Huang
  • Peijia Lin
  • Jiaqi Han
  • Pin Chen
  • Yutong Lu
  • Yang Liu

Crystal Structure Prediction (CSP) is crucial in various scientific disciplines. While CSP can be addressed by employing currently-prevailing generative models (e.g., diffusion models), this task encounters unique challenges owing to the symmetric geometry of crystal structures---the invariance of translation, rotation, and periodicity. To incorporate the above symmetries, this paper proposes DiffCSP, a novel diffusion model that learns the structure distribution from stable crystals. Specifically, DiffCSP jointly generates the lattice and atom coordinates for each crystal by employing a periodic-E(3)-equivariant denoising model, to better capture the crystal geometry. Notably, unlike related equivariant generative approaches, DiffCSP leverages fractional coordinates rather than Cartesian coordinates to represent crystals, considerably facilitating the diffusion and generation of atom positions. Extensive experiments verify that DiffCSP remarkably outperforms existing CSP methods, at a much lower computational cost than DFT-based methods. Moreover, the superiority of DiffCSP persists when it is extended to ab initio crystal generation.
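The fractional-coordinate representation expresses atom positions in the lattice basis: converting to Cartesian positions is a matrix product, and wrapping into [0, 1) makes the crystal's periodicity explicit. A minimal sketch (the lattice and atom positions below are toy values for illustration):

```python
import numpy as np

def frac_to_cart(frac, lattice):
    """Map fractional coordinates to Cartesian positions.
    Wrapping with % 1.0 applies periodic boundary conditions;
    the matrix product changes basis into Cartesian space."""
    return (frac % 1.0) @ lattice

lattice = np.array([[4.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0],
                    [0.0, 0.0, 6.0]])   # toy orthorhombic cell
frac = np.array([[0.5, 0.5, 0.25],
                 [1.5, -0.5, 0.75]])    # second atom wraps periodically
cart = frac_to_cart(frac, lattice)
```

Because a lattice translation shifts every fractional coordinate by an integer, the wrapped representation is invariant to periodicity, which is the property the diffusion model exploits.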

NeurIPS Conference 2023 Conference Paper

Deep Insights into Noisy Pseudo Labeling on Graph Data

  • Botao Wang
  • Jia Li
  • Yang Liu
  • Jiashun Cheng
  • Yu Rong
  • Wenjia Wang
  • Fugee Tsung

Pseudo labeling (PL) is a widely applied strategy to enlarge the labeled dataset by self-annotating potential samples during the training process. Several works have shown that it can generally improve the performance of graph learning models. However, we notice that incorrect labels can be fatal to the graph training process: inappropriate PL may degrade performance, especially on graph data, where noise can propagate. Surprisingly, the corresponding error is seldom analyzed theoretically in the literature. In this paper, we aim to give deep insights into PL on graph learning models. We first present an error analysis of the PL strategy, showing that the error is bounded by the confidence of the PL threshold and the consistency of multi-view predictions. Then, we theoretically illustrate the effect of PL on convergence properties. Based on this analysis, we propose a cautious pseudo labeling methodology that pseudo labels the samples with the highest confidence and multi-view consistency. Finally, extensive experiments demonstrate that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
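The cautious selection rule can be sketched as follows; the confidence threshold and the two-view toy probabilities are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cautious_pseudo_label(probs_views, threshold=0.9):
    """Select unlabeled nodes whose predictions agree across all views
    and whose worst-view confidence exceeds the threshold.

    probs_views: array of shape (n_views, n_nodes, n_classes) holding
    class probabilities from several augmented graph views.
    Returns the selected node indices and their pseudo labels.
    """
    preds = probs_views.argmax(axis=2)            # (n_views, n_nodes)
    conf = probs_views.max(axis=2).min(axis=0)    # worst-view confidence
    consistent = (preds == preds[0]).all(axis=0)  # all views agree
    mask = consistent & (conf >= threshold)
    return np.where(mask)[0], preds[0][mask]

# toy example: 2 views, 3 nodes, 2 classes
p = np.array([[[0.95, 0.05], [0.6, 0.4], [0.2, 0.8]],
              [[0.97, 0.03], [0.4, 0.6], [0.1, 0.9]]])
idx, labels = cautious_pseudo_label(p, threshold=0.75)  # node 1 is rejected
```

Node 1 is rejected because its two views disagree, which is exactly the error source the paper's bound penalizes.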

IJCAI Conference 2023 Conference Paper

DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback

  • Junfan Lin
  • Yuying Zhu
  • Lingbo Liu
  • Yang Liu
  • Guanbin Li
  • Liang Lin

Traffic Signal Control (TSC) aims to reduce the average travel time of vehicles in a road network, which in turn enhances fuel utilization efficiency, air quality, and road safety, benefiting society as a whole. Due to the complexity of long-horizon control and coordination, most prior TSC methods leverage deep reinforcement learning (RL) to search for a control policy and have witnessed great success. However, TSC still faces two significant challenges. 1) The travel time of a vehicle is delayed feedback on the effectiveness of the TSC policy at each intersection, since it is only obtained after the vehicle has left the road network. Although several heuristic reward functions have been proposed as substitutes for travel time, they are usually biased and do not lead the policy to improve in the correct direction. 2) The traffic condition of each intersection is influenced by non-local intersections, since vehicles traverse multiple intersections over time. Therefore, the TSC agent must leverage both local observations and non-local traffic conditions to comprehensively predict the long-horizon traffic conditions of each intersection. To address these challenges, we propose DenseLight, a novel RL-based TSC method that employs an unbiased reward function to provide dense feedback on policy effectiveness and a non-locally enhanced TSC agent to better predict future traffic conditions for more precise traffic control. Extensive experiments and ablation studies demonstrate that DenseLight consistently outperforms advanced baselines on various road networks with diverse traffic flows. The code is available at https://github.com/junfanlin/DenseLight.

AAAI Conference 2023 Conference Paper

EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition

  • Yang Liu
  • Jinpeng Hu
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Active learning is a critical technique for reducing labelling load by selecting the most informative data. Most previous works applied active learning to Named Entity Recognition (a token-level task) in the same way as text classification (a sentence-level task). They failed to consider the heterogeneity of uncertainty within each sentence and required the annotator to access the entire sentence when labelling. To overcome these limitations, in this paper we allow the active learning algorithm to query subsequences within sentences and propose Entity-Aware Subsequence-based Active Learning (EASAL), which utilizes an effective Head-Tail pointer to query one entity-aware subsequence per sentence based on BERT. For the tokens outside this subsequence, we randomly select 30% to be pseudo-labelled for joint training, where the model directly predicts their pseudo-labels. Experimental results on both news and biomedical datasets demonstrate the effectiveness of our proposed method. The code is released at https://github.com/lylylylylyly/EASAL.
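A simplified sketch of this querying recipe, assuming per-token uncertainty scores are already available (the brute-force span search below stands in for the learned Head-Tail pointer, and all names are illustrative):

```python
import random

def query_subsequence(token_uncertainty, max_len=5):
    """Pick the contiguous span with the highest total uncertainty,
    a simplified stand-in for the paper's Head-Tail pointer."""
    n = len(token_uncertainty)
    best, best_span = float("-inf"), (0, 0)
    for head in range(n):
        for tail in range(head, min(head + max_len, n)):
            score = sum(token_uncertainty[head:tail + 1])
            if score > best:
                best, best_span = score, (head, tail)
    return best_span

def split_tokens(n_tokens, span, pseudo_ratio=0.3, seed=0):
    """Tokens inside the span go to the annotator; a fraction of the
    remaining tokens are pseudo-labelled by the model, as in the
    training recipe described above."""
    head, tail = span
    outside = [i for i in range(n_tokens) if i < head or i > tail]
    rng = random.Random(seed)
    pseudo = sorted(rng.sample(outside, int(len(outside) * pseudo_ratio)))
    return list(range(head, tail + 1)), pseudo

u = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1]
span = query_subsequence(u, max_len=3)            # the uncertain middle span
annotate, pseudo = split_tokens(len(u), span)
```

Only the queried span ever reaches a human annotator, which is what reduces labelling load compared to sentence-level querying.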

NeurIPS Conference 2023 Conference Paper

Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion

  • Yang Liu
  • Feng Wang
  • Naiyan Wang
  • ZHAO-XIANG ZHANG

Radar is ubiquitous in autonomous driving systems due to its low cost and good adaptability to bad weather. Nevertheless, radar detection performance is usually inferior because its point cloud is sparse and inaccurate due to poor azimuth and elevation resolution. Moreover, point cloud generation algorithms already drop weak signals to reduce false targets, which may be suboptimal for deep fusion. In this paper, we propose a novel method named EchoFusion to skip the existing radar signal processing pipeline and incorporate the raw radar data with other sensors. Specifically, we first generate Bird's Eye View (BEV) queries and then take the corresponding spectrum features from the radar to fuse with other sensors. In this way, our method can utilize both the rich, lossless distance and speed clues from radar echoes and the rich semantic clues from images, surpassing all existing methods on the RADIal dataset and approaching the performance of LiDAR. The code will be released at https://github.com/tusen-ai/EchoFusion.

AAAI Conference 2023 Conference Paper

Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs

  • Rui Jiao
  • Jiaqi Han
  • Wenbing Huang
  • Yu Rong
  • Yang Liu

Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models incapable of characterizing 3D geometry and thus defective for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose adopting an equivariant energy-based model as the backbone for pretraining, which enjoys the merit of fulfilling the symmetry of 3D space. Then we develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure that the loss is E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model, pretrained on the large-scale 3D dataset GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of each proposed component. Code is available at https://github.com/jiaor17/3D-EMGP.

IJCAI Conference 2023 Conference Paper

Fairness via Group Contribution Matching

  • Tianlin Li
  • Zhiming Li
  • Anran Li
  • Mengnan Du
  • Aishan Liu
  • Qing Guo
  • Guozhu Meng
  • Yang Liu

Fairness issues in deep learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) during training to understand how such bias develops. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples are drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address these issues, we propose a simple but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that GCM effectively improves fairness and significantly outperforms other methods.
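One way to make a gradient-based contribution metric concrete is to measure the norm of each subgroup's mean gradient; the logistic-regression setting and toy data below are illustrative assumptions, and GCM's exact metric and matching procedure may differ:

```python
import numpy as np

def subgroup_contributions(X, y, w, groups):
    """Per-subgroup contribution to a logistic-regression update,
    measured as the norm of each subgroup's mean gradient.
    A large disparity between groups suggests unequal influence
    on the parameter update at this training step."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # sigmoid predictions
    per_sample_grad = (p - y)[:, None] * X   # d loss / d w, one row per sample
    contrib = {}
    for g in np.unique(groups):
        contrib[g] = np.linalg.norm(per_sample_grad[groups == g].mean(axis=0))
    return contrib

# toy data: two samples per sensitive group
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
groups = np.array([0, 0, 1, 1])
contrib = subgroup_contributions(X, y, np.zeros(2), groups)
```

Matching would then rescale each subgroup's gradient (or loss weight) so these norms are equalized across groups.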

NeurIPS Conference 2023 Conference Paper

Graph Contrastive Learning with Stable and Scalable Spectral Encoding

  • Deyu Bo
  • Yuan Fang
  • Yang Liu
  • Chuan Shi

Graph contrastive learning (GCL) aims to learn representations by capturing the agreements between different graph views. Traditional GCL methods generate views in the spatial domain, but it has been recently discovered that the spectral domain also plays a vital role in complementing spatial views. However, existing spectral-based graph views either ignore the eigenvectors that encode valuable positional information or suffer from high complexity when trying to address the instability of spectral features. To tackle these challenges, we first design an informative, stable, and scalable spectral encoder, termed EigenMLP, to learn effective representations from the spectral features. Theoretically, EigenMLP is invariant to the rotation and reflection transformations on eigenvectors and robust against perturbations. Then, we propose a spatial-spectral contrastive framework (Sp$^{2}$GCL) to capture the consistency between the spatial information encoded by graph neural networks and the spectral information learned by EigenMLP, thus effectively fusing these two graph views. Experiments on the node- and graph-level datasets show that our method not only learns effective graph representations but also achieves a 2--10x speedup over other spectral-based methods.

AAAI Conference 2023 Conference Paper

Human Mobility Modeling during the COVID-19 Pandemic via Deep Graph Diffusion Infomax

  • Yang Liu
  • Yu Rong
  • Zhuoning Guo
  • Nuo Chen
  • Tingyang Xu
  • Fugee Tsung
  • Jia Li

Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have shown effectiveness in slowing the transmission of COVID-19 by reducing contact between people. To support policy-makers, multiple studies have first modelled human mobility via macro indicators (e.g., average daily travel distance) and then studied the effectiveness of NPIs. In this work, we focus on mobility modelling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges and jointly model variables including a geometric graph, a set of diffusions and a set of locations, we propose a model named Deep Graph Diffusion Infomax (DGDI). We show that the maximization of DGDI can be bounded by two tractable components: a univariate Mutual Information (MI) between the geometric graph and the diffusion representation, and a univariate MI between the diffusion representation and the location representation. To facilitate research on COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.

EAAI Journal 2023 Journal Article

Imbalanced data classification: Using transfer learning and active sampling

  • Yang Liu
  • Guoping Yang
  • Shaojie Qiao
  • Meiqi Liu
  • Lulu Qu
  • Nan Han
  • Guan Yuan
  • Tao Wu

Recently, deep learning models have made great breakthroughs in the field of computer vision, relying on large-scale class-balanced datasets. However, most of them do not consider class-imbalanced data. In reality, a class-imbalanced distribution can degrade model performance and reduce the generalization of these models. In addition, in the era of big data, many applications need real-time visual data. These data come from different mobile devices, which continuously generate huge volumes of visual data. However, few studies use real-time data from information systems: real-time data is easy to capture but difficult to use. To solve the above problems, we propose a new transfer-learning-based model, the Transfer Learning Classifier (TLC), to deal with class-imbalanced data. The model includes an active sampling module, a real-time data augmentation module and a DenseNet module. Among them, (1) the newly proposed active sampling module can dynamically adjust the number of samples with skewed distributions; (2) the data augmentation module can expand the real-time data to avoid over-fitting and insufficient data; (3) the DenseNet module is a standard DenseNet pre-trained on the ImageNet dataset and transferred to TLC for relearning, after which we adjust the memory usage of the standard DenseNet to make it more efficient. In addition, we have deployed a new end-to-end real-time data storage and analysis system. Extensive experiments have been carried out on four different long-tailed datasets. Experimental results show that the proposed TLC model can effectively deal with both static and real-time data, and its classification of imbalanced data is better than that of existing models.
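For context, the plain oversampling baseline that active sampling improves upon can be sketched as follows (TLC's module adjusts the per-class counts dynamically rather than fixing them, so this is only the static starting point):

```python
import random

def oversample_balanced(samples, labels, seed=0):
    """Draw an equal number of samples per class by resampling the
    minority classes with replacement, a simple baseline for
    class-imbalanced training data."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(v) for v in by_class.values())
    out = []
    for l, items in by_class.items():
        # pad each minority class up to the majority-class count
        picked = items + [rng.choice(items) for _ in range(target - len(items))]
        out.extend((s, l) for s in picked)
    rng.shuffle(out)
    return out

# 8 samples of class 0 vs. 2 of class 1 -> balanced 8 vs. 8
data = oversample_balanced(list(range(10)), [0] * 8 + [1] * 2)
```

The known weakness of this baseline is repetition-driven over-fitting on the minority class, which is one motivation for adjusting sample counts adaptively during training instead.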

AAAI Conference 2023 Conference Paper

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

  • Lei Shang
  • Mouxiao Huang
  • Wu Shi
  • Yuchen Liu
  • Yang Liu
  • Wang Steven
  • Baigui Sun
  • Xuansong Xie

Data uncertainty is commonly observed in images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. From a probabilistic view of the current classification model, the temperature scalar is exactly the scale of the uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on this observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a confidence score to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of the variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS achieves top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS performs robustly on noisy datasets. The proposed module is lightweight and adds only negligible computation cost to the model.
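The core mechanism can be sketched as a softmax whose temperature is drawn per sample from a prior distribution; the gamma prior and its parameters below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def rts_softmax(logits, rng, shape=2.0, scale=0.5):
    """Softmax with a per-sample random temperature drawn from a
    gamma prior. Larger sampled temperatures flatten the output
    distribution, modelling higher input uncertainty."""
    t = rng.gamma(shape, scale, size=(logits.shape[0], 1)) + 1e-6
    z = logits / t
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True), t

rng = np.random.default_rng(0)
probs, temps = rts_softmax(np.array([[2.0, 0.5, 0.1]]), rng)
```

Since dividing by a positive temperature preserves the logit ordering, the predicted class is unchanged; only the confidence mass shifts, which is what makes the learned variance usable as an OOD score at test time.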

IJCAI Conference 2023 Conference Paper

Incentivizing Recourse through Auditing in Strategic Classification

  • Andrew Estornell
  • Yatong Chen
  • Sanmay Das
  • Yang Liu
  • Yevgeniy Vorobeychik

The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits.

EAAI Journal 2023 Journal Article

Intelligent multiobjective optimization for high-performance concrete mix proportion design: A hybrid machine learning approach

  • Sai Yang
  • Hongyu Chen
  • Zongbao Feng
  • Yawei Qin
  • Jian Zhang
  • Yuan Cao
  • Yang Liu

The concrete mix proportion design process is complex but important, especially in cold, ocean, underground and other complex engineering environments. In this study, a hybrid intelligent optimization method based on random forest (RF), recursive feature elimination (RFE), Bayesian optimization (BO), the least squares support vector machine (LSSVM) and the non-dominated sorting genetic algorithm III (NSGA-III) was proposed to optimize the concrete mix proportion and rapidly and accurately predict frost resistance, chloride ion penetration resistance and concrete strength (CS). Taking a key project in Jilin Province as an example, the RF-RFE-BO-LSSVM-NSGA-III algorithm achieved a significant optimization effect on the chloride ion permeability coefficient (CIPC), relative dynamic elastic modulus (RDEM) and 28-day CS. After optimization, the chloride ion penetration resistance, frost resistance and CS increased by 34.6%, 4.1% and 3.7%, respectively, over the average levels of the sample data. This study can provide a basis for concrete mix proportion design in complex environments.

TMLR Journal 2023 Journal Article

Learning to Incentivize Improvements from Strategic Agents

  • Yatong Chen
  • Jialu Wang
  • Yang Liu

Machine learning systems are often used in settings where individuals adapt their features to obtain a desired outcome. In such settings, strategic behavior leads to a sharp loss in model performance in deployment. In this work, we aim to address this problem by learning classifiers that encourage decision subjects to change their features in a way that leads to improvement in both predicted and true outcome. We frame the dynamics of prediction and adaptation as a two-stage game, and characterize optimal strategies for the model designer and its decision subjects. In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation.

NeurIPS Conference 2023 Conference Paper

Long-Term Fairness with Unknown Dynamics

  • Tongxin Yin
  • Reilly Raab
  • Mingyan Liu
  • Yang Liu

While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an online reinforcement learning problem for a policy affecting human populations. This formulation accommodates dynamical control objectives, such as achieving equitable population states, that cannot be incorporated into static formulations of fairness. We demonstrate that algorithmic solutions to the proposed fairness problem can adapt to unknown dynamics and, by sacrificing short-term incentives, drive the policy-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning and prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness. In the classification setting subject to group fairness, we compare our proposed algorithm to several baselines, including the repeated retraining of myopic or distributionally robust classifiers, and to a deep reinforcement learning algorithm that lacks fairness guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.

EAAI Journal 2023 Journal Article

LOSN: Lightweight ore sorting networks for edge device environment

  • Yang Liu
  • Xueyi Wang
  • Zelin Zhang
  • Fang Deng

Vision-based intelligent ore sorting technology has been widely applied in current mining production, a trend further facilitated by the emergence of deep learning. However, most available implementations are still based on image classification, i.e., dividing the overall sorting task into two processes, classification and localization, without end-to-end integration. Meanwhile, harsh sorting scenarios make edge computing devices the primary candidate for model deployment, imposing stringent limits on model size, computational complexity, and inference speed. Therefore, this study proposes integrating the operating processes to locate and classify ore particles simultaneously. Lightweight structures, attention mechanisms, and multi-scale feature fusion strategies are applied in the architecture design to meet the deployment requirements of edge device environments and achieve a preferred accuracy–efficiency tradeoff, leading to a new lightweight ore sorting network called LOSN. In the case study, LOSN achieves the highest accuracy in multi-type and multi-class ore sorting tasks (78.87% and 80.64% on the gas coal and anthracite datasets, respectively) with fewer parameters (5.970M), lower GFLOPs (6.829G) and higher FPS (89.92), outperforming commonly used high-performance object detection architectures (e.g., the YOLO series, EfficientDet, Faster R-CNN, and CenterNet). Grad-CAM visualizations also demonstrate the feature extraction capability of LOSN.

NeurIPS Conference 2023 Conference Paper

Model Sparsity Can Simplify Machine Unlearning

  • Jinghan Jia
  • Jiancheng Liu
  • Parikshit Ram
  • Yuguang Yao
  • Gaowen Liu
  • Yang Liu
  • Pranay Sharma
  • Sijia Liu

In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse prior to the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) when using our proposed sparsity-aware unlearning method. Furthermore, we showcase the practical impact of our proposed MU methods through two specific use cases: defending against backdoor attacks, and enhancing transfer learning through source class removal. These applications demonstrate the versatility and effectiveness of our approaches in addressing a variety of machine learning challenges beyond unlearning for data privacy. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
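The "prune first" step can be illustrated with one-shot magnitude pruning; approximate unlearning would then fine-tune only the unmasked weights on the retained data (a sketch under that assumption, not the paper's exact pipeline):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` of them are removed, and return the fixed binary mask.
    The mask is kept frozen during subsequent unlearning fine-tuning."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k - 1] if k > 0 else -np.inf
    mask = np.abs(w) > thresh          # keep strictly-above-threshold weights
    return w * mask, mask

w = np.array([0.01, -2.0, 0.3, -0.02, 1.5])
pruned, mask = magnitude_prune(w, sparsity=0.6)
```

The intuition behind the paradigm is that a sparser model has fewer parameters that can memorize the forget set, which shrinks the gap between approximate unlearning and retraining from scratch.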

EAAI Journal 2023 Journal Article

MSE-Fusion: Weakly supervised medical image fusion with modal synthesis and enhancement

  • Lifang Wang
  • Yang Liu
  • Jia Mi
  • Jiong Zhang

Existing multi-modal image fusion methods take multi-modal images as input, which requires imaging patients multiple times, harming patients' bodies and incurring large costs. Moreover, image fusion needs a large number of registered images, which are time-consuming and difficult to obtain, and the fused images have unclear texture and structure. Therefore, a weakly supervised medical image fusion method with modal synthesis and enhancement is proposed. In modal synthesis, a weakly supervised approach is used to train the model, reducing the requirement for registered images; MR images are used as input to synthesize CT images through a deep-structure and shallow-detail generator, which reduces the required input modalities and makes texture and structure clearer. In image enhancement, MR images are passed through a trained generator to produce enhanced MR images with sharper texture and structure. The synthesized CT and enhanced MR images, together with the original PET images, are then used as input to achieve tri-modal image fusion. Compared with 13 state-of-the-art modal synthesis and image fusion methods on the same datasets, the proposed method significantly improves performance on 7 objective evaluation metrics. The subjective visual effects and objective evaluation metrics of our method are better than those of the compared image fusion methods.

EAAI Journal 2023 Journal Article

Neurodynamics-based configuration transformation with engineering application to robot manipulators using two intelligent approaches

  • Boyu Ma
  • Zongwu Xie
  • Xiaohang Yang
  • Yang Liu
  • Zhengpu Wang
  • Zainan Jiang

Before performing various tasks, it is important to transform a robot manipulator from its current configuration to the desired initial configuration. Therefore, this paper proposes a jerk-level configuration transformation (JCT) strategy based on neurodynamics that simultaneously avoids the physical limits on joint angle, velocity, acceleration, and jerk. Moreover, two intelligent approaches are designed to solve the JCT strategy, namely a dynamic recurrent neural network and an intelligent iterative optimizer, and their convergence and computational complexity are analyzed theoretically. In terms of superiority, the JCT strategy is compared with other typical strategies in simulations of a 7-degree-of-freedom manipulator. In experimental verification for engineering applications, the JCT strategy is applied to the space manipulator on the China Space Station and verified on a ground hardware-in-the-loop experimental system to demonstrate its effectiveness and physical realizability.

IROS Conference 2023 Conference Paper

OA-Bug: An Olfactory-Auditory Augmented Bug Algorithm for Swarm Robots in a Denied Environment

  • Siqi Tan
  • Xiaoya Zhang
  • Jingyao Li
  • Ruitao Jing
  • Mufan Zhao
  • Yang Liu
  • Quan Quan

Searching in a denied environment is challenging for swarm robots as no assistance from GNSS, mapping, data sharing, and central processing is allowed. However, using olfactory and auditory signals to cooperate like animals could be an important way to improve the collaboration of swarm robots. In this paper, an Olfactory-Auditory augmented Bug algorithm (OA-Bug) is proposed for a swarm of autonomous robots to explore a denied environment. A simulation environment is built to measure the performance of OA-Bug. The coverage of the search task can reach 96.93% using OA-Bug, which is significantly improved compared with a similar algorithm, SGBA [1]. Furthermore, experiments are conducted on real swarm robots to prove the validity of OA-Bug. Results show that OA-Bug can improve the performance of swarm robots in a denied environment. Video: https://youtu.be/vj9cRiSmgeM.

EAAI Journal 2023 Journal Article

Optimization of high-performance concrete mix ratio design using machine learning

  • Bin Chen
  • Lei Wang
  • Zongbao Feng
  • Yang Liu
  • Xianguo Wu
  • Yawei Qin
  • Lingyu Xia

High-durability concrete is required in extremely cold or ocean environments, making the design of concrete mixes highly important and complicated. In this study, a hybrid intelligent framework for multi-objective optimization based on random forest (RF) and the non-dominated sorting genetic algorithm version II (NSGA-II) is developed to efficiently predict concrete durability and optimize the concrete mix ratio. The relative dynamic elastic modulus of concrete after 300 freeze–thaw cycles and the chloride ion permeability coefficient at 28 days are defined as the standard measures of durability. The concrete mix ratio is taken as the influencing parameter, and orthogonal test data and engineering practice data are collected as the datasets. The proposed framework is applied to a realistic expressway project in a cold region of China. The results demonstrate that (1) a hybrid intelligent framework based on RF-NSGA-II can effectively predict concrete durability and optimize the mix ratio. (2) The developed RF model has an excellent regression learning ability: the goodness of fit (R²) for the two durability measures reaches 0.9503 and 0.9551, respectively, with root mean square error (RMSE) values of only 0.096 and 0.043 and mean absolute percentage error (MAPE) values of 2.54% and 2.17%. (3) After optimization, the concrete durability reaches a high standard, with a frost resistance of >95% and a chloride ion permeability coefficient of <3×10⁻⁸ cm²/s, at a unit volume cost of only 376.77 yuan. Hence, the proposed framework can be used to effectively optimize the concrete mix design and provide guidance for similar projects.

AAAI Conference 2023 Conference Paper

Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization

  • Minghang Zheng
  • Sizhe Li
  • Qingchao Chen
  • Yuxin Peng
  • Yang Liu

In this paper, we address the problem of video temporal sentence localization, which aims to localize a target moment from videos according to a given language query. We observe that existing models suffer from a sheer performance drop when dealing with simple phrases contained in the sentence. It reveals the limitation that existing models only capture the annotation bias of the datasets but lack sufficient understanding of the semantic phrases in the query. To address this problem, we propose a phrase-level Temporal Relationship Mining (TRM) framework employing the temporal relationship relevant to the phrase and the whole sentence to have a better understanding of each semantic entity in the sentence. Specifically, we use phrase-level predictions to refine the sentence-level prediction, and use Multiple Instance Learning to improve the quality of phrase-level predictions. We also exploit the consistency and exclusiveness constraints of phrase-level and sentence-level predictions to regularize the training process, thus alleviating the ambiguity of each phrase prediction. The proposed approach sheds light on how machines can understand detailed phrases in a sentence and their compositions in their generality rather than learning the annotation biases. Experiments on the ActivityNet Captions and Charades-STA datasets show the effectiveness of our method on both phrase and sentence temporal localization and enable better model interpretability and generalization when dealing with unseen compositions of seen concepts. Code can be found at https://github.com/minghangz/TRM.

EAAI Journal 2023 Journal Article

Safety evaluation of buildings adjacent to shield construction in karst areas: An improved extension cloud approach

  • Hongyu Chen
  • Sai Yang
  • Zongbao Feng
  • Yang Liu
  • Yawei Qin

To accurately evaluate the safety risk status of buildings adjacent to karst shield construction areas, a safety evaluation standard for buildings adjacent to shield construction in karst areas and a safety risk assessment method based on optimal cloud entropy are proposed. Comprehensively considering tunnel characteristics, geological conditions, building conditions, construction, management and other influencing factors, a risk evaluation index system including 4 level-II indicators and 15 level-III indicators, together with evaluation criteria, is established for buildings adjacent to shield construction in karst areas. The traditional extension cloud theory is improved based on an optimal cloud entropy calculation method that adapts to the evaluation objects, and the clarity and fuzziness of index classification are considered. To verify the applicability of the proposed approach, it was applied to ten adjacent buildings in a karst geological section of Guiyang Rail Transit Line 3. The results show that (a) the proposed evaluation standard and improved extension cloud safety risk assessment method can effectively account for the uncertainty of risk events, and the evaluation results are consistent with the actual building safety risk status, with the calculated reliability factor of each building close to 1. (b) The key risk factors are identified through sensitivity analysis. According to the key risk factors and risk statuses, effective measures can be taken, and high-risk buildings can be monitored to maintain a safe control state. Thus, the proposed approach can be feasibly used in various applications and can provide guidance for other similar projects.

NeurIPS Conference 2023 Conference Paper

Sparse Modular Activation for Efficient Sequence Modeling

  • Liliang Ren
  • Yang Liu
  • Shuohang Wang
  • Yichong Xu
  • Chenguang Zhu
  • Cheng Xiang Zhai

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.

ICML Conference 2023 Conference Paper

Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis

  • Hu Sun
  • Ward Manchester
  • Meng Jin
  • Yang Liu
  • Yang Chen

Multi-channel imaging data is a prevalent data format in scientific fields such as astronomy and biology. The structured information and the high dimensionality of these 3-D tensor data make the analysis an intriguing but challenging topic for statisticians and practitioners. The low-rank scalar-on-tensor regression model, in particular, has received widespread attention and has been re-formulated as a tensor Gaussian Process (Tensor-GP) model with multi-linear kernel in Yu et al. (2018). In this paper, we extend the Tensor-GP model by introducing an integrative dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data. This is motivated by the solar flare forecasting problem with high dimensional multi-channel imaging data. We first estimate a latent, reduced-size tensor for each data tensor and then apply a multi-linear Tensor-GP on the latent tensor data for prediction. We introduce an anisotropic total-variation regularization when conducting the tensor contraction to obtain a sparse and smooth latent tensor. We then propose an alternating proximal gradient descent algorithm for estimation. We validate our approach via extensive simulation studies and by applying it to the solar flare forecasting problem.

IJCAI Conference 2023 Conference Paper

The Importance of Human-Labeled Data in the Era of LLMs

  • Yang Liu

The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.

AAAI Conference 2023 Conference Paper

Towards Credible Human Evaluation of Open-Domain Dialog Systems Using Interactive Setup

  • Sijia Liu
  • Patrick Lange
  • Behnam Hedayatnia
  • Alexandros Papangelis
  • Di Jin
  • Andrew Wirth
  • Yang Liu
  • Dilek Hakkani-Tur

Evaluating open-domain conversation models has been an open challenge due to the open-ended nature of conversations. In addition to static evaluations, recent work has started to explore a variety of per-turn and per-dialog interactive evaluation mechanisms and provide advice on the best setup. In this work, we adopt the interactive evaluation framework and further apply it to multiple models with a focus on per-turn evaluation techniques. Apart from the widely used setting where participants select the best response among different candidates at each turn, one more novel per-turn evaluation setting is adopted, where participants can select all appropriate responses with different fallback strategies to continue the conversation when no response is selected. We evaluate these settings based on sensitivity and consistency using four GPT2-based models that differ in model sizes or fine-tuning data. To better generalize to any model groups with no prior assumptions on their rankings and control evaluation costs for all setups, we also propose a methodology to estimate the required sample size given a minimum performance gap of interest before running most experiments. Our comprehensive human evaluation results shed light on how to conduct credible human evaluations of open-domain dialog systems using the interactive setup, and suggest additional future directions.

NeurIPS Conference 2023 Conference Paper

Uncertainty-Aware Instance Reweighting for Off-Policy Learning

  • Xiaoying Zhang
  • Junpu Chen
  • Hongning Wang
  • Hong Xie
  • Yang Liu
  • John C. S. Lui
  • Hang Li

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown its importance in various real-world applications, such as search engines and recommender systems. While the ground-truth logging policy is usually unknown, previous work simply takes its estimated value for off-policy learning, ignoring the negative impact of both the high bias and the high variance resulting from such an estimator. This impact is often magnified on samples with small and inaccurately estimated logging probabilities. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy, and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experimental results on synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy when compared against an extensive list of state-of-the-art baselines.
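UIPS builds on the standard inverse propensity score (IPS) estimator, which reweights each logged reward by the ratio of the target policy's propensity to the (estimated) logging propensity; clipping the weight is a common variance-control baseline. An illustrative sketch of that baseline, with hypothetical names, not the paper's UIPS:

```python
def clipped_ips(logs, target_policy, clip=10.0):
    """Estimate the value of target_policy from logged data via clipped IPS.

    logs: iterable of (context, action, reward, logging_propensity) tuples.
    target_policy(context, action) -> probability under the target policy.
    """
    total = 0.0
    n = 0
    for context, action, reward, mu in logs:
        # Importance weight, clipped to limit variance from tiny propensities
        weight = min(target_policy(context, action) / mu, clip)
        total += weight * reward
        n += 1
    return total / n
```

When the target policy equals the logging policy every weight is 1 and the estimate reduces to the average logged reward, a useful sanity check.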

NeurIPS Conference 2023 Conference Paper

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

  • Hongyu Zang
  • Xin Li
  • Leiji Zhang
  • Yang Liu
  • Baigui Sun
  • Riashat Islam
  • Remi Tachet des Combes
  • Romain Laroche

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at https://github.com/zanghyu/Offline_Bisimulation.
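For reference, the expectile operator applied here generalizes the mean in the way quantiles generalize the median: the τ-expectile minimizes an asymmetrically weighted squared loss. A small sketch of that loss (illustrative only, not the paper's code):

```python
def expectile_loss(u, tau):
    """Asymmetric squared loss on the error u = target - prediction.

    Positive errors (underestimates) are weighted by tau, negative errors
    by (1 - tau); tau = 0.5 recovers half the ordinary squared loss.
    """
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u
```

With τ > 0.5 the loss penalizes underestimates more heavily, which is why expectile-style operators can act as a soft, in-distribution stand-in for a max over incompletely observed data.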

AAAI Conference 2023 Conference Paper

Unsupervised Explanation Generation via Correct Instantiations

  • Sijie Cheng
  • Zhiyong Wu
  • Jiangjie Chen
  • Zhixing Li
  • Yang Liu
  • Lingpeng Kong

While large pre-trained language models (PLMs) have shown their great skills at solving discriminative tasks, a significant gap remains when compared with humans for explanation-related tasks. Among them, explaining the reason why a statement is wrong (e.g., against commonsense) is incredibly challenging. The major difficulty is finding the conflict point, where the statement contradicts our real world. This paper proposes Neon, a two-phase, unsupervised explanation generation framework. Neon first generates corrected instantiations of the statement (phase I), then uses them to prompt large PLMs to find the conflict point and complete the explanation (phase II). We conduct extensive experiments on two standard explanation benchmarks, i.e., ComVE and e-SNLI. According to both automatic and human evaluations, Neon outperforms baselines, even for those with human-annotated instantiations. In addition to explaining a negative prediction, we further demonstrate that Neon remains effective when generalizing to different scenarios. The resources of Neon are available at: https://github.com/Shark-NLP/Neon.

NeurIPS Conference 2022 Conference Paper

A Closer Look at the Adversarial Robustness of Deep Equilibrium Models

  • Zonghan Yang
  • Tianyu Pang
  • Yang Liu

Deep equilibrium models (DEQs) refrain from the traditional layer-stacking paradigm and turn to find the fixed point of a single layer. DEQs have achieved promising performance on different applications with notable memory efficiency. At the same time, the adversarial vulnerability of DEQs raises concerns. Several works propose to certify robustness for monotone DEQs. However, limited efforts are devoted to studying empirical robustness for general DEQs. To this end, we observe that an adversarially trained DEQ requires more forward steps to arrive at the equilibrium state, or even violates its fixed-point structure. Besides, the forward and backward tracks of DEQs are misaligned due to the black-box solvers. These facts cause gradient obfuscation when applying the ready-made attacks to evaluate or adversarially train DEQs. Given this, we develop approaches to estimate the intermediate gradients of DEQs and integrate them into the attacking pipelines. Our approaches facilitate fully white-box evaluations and lead to effective adversarial defense for DEQs. Extensive experiments on CIFAR-10 validate the adversarial robustness of DEQs competitive with deep networks of similar sizes.

IROS Conference 2022 Conference Paper

A Deep-Learning-based System for Indoor Active Cleaning

  • Yike Yun
  • Linjie Hou
  • Zijian Feng
  • Wei Jin
  • Yang Liu
  • Heng Wang
  • Ruonan He
  • Weitao Guo

Cleaning public areas like commercial complexes is challenging due to their sophisticated surroundings and the wide variety of real-life dirt. Robots are required to distinguish dirt types and apply corresponding cleaning strategies. In this work, we propose an active-cleaning framework utilizing deep-learning methods for both solid waste detection and liquid stain segmentation. Our system consists of 4 components: a Perception module integrated with deep-learning models, a Post-processing module for projection, a Tracking module for map localization, and a Planning and Control module for cleaning strategies. Compared with classic approaches, our vision-based system significantly improves cleaning efficiency. Besides, we released the largest real-world indoor hybrid dirt cleaning dataset (HD10K), containing 10K labeled images, together with a track-level evaluation metric for better cleaning performance measurement. The proposed deep-learning based system is verified with extensive experiments on our dataset, and deployed to Gaussian Robotics' robots operating globally. The dataset is available at: https://gaussianopensource.github.io/projects/active_cleaning.

AAAI Conference 2022 Conference Paper

A Label Dependence-Aware Sequence Generation Model for Multi-Level Implicit Discourse Relation Recognition

  • Changxing Wu
  • Liuwen Cao
  • Yubin Ge
  • Yang Liu
  • Min Zhang
  • Jinsong Su

Implicit discourse relation recognition (IDRR) is a challenging but crucial task in discourse analysis. Most existing methods train multiple models to predict multi-level labels independently, while ignoring the dependence between hierarchically structured labels. In this paper, we consider multi-level IDRR as a conditional label sequence generation task and propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it. Specifically, we first design a label attentive encoder to learn the global representation of an input instance and its level-specific contexts, where the label dependence is integrated to obtain better label embeddings. Then, we employ a label sequence decoder to output the predicted labels in a top-down manner, where the predicted higher-level labels are directly used to guide the label prediction at the current level. We further develop a mutual learning enhanced training method to exploit the label dependence in a bottom-up direction, which is captured by an auxiliary decoder introduced during training. Experimental results on the PDTB dataset show that our model achieves the state-of-the-art performance on multi-level IDRR. We release our code at https://github.com/nlpersECJTU/LDSGM.

NeurIPS Conference 2022 Conference Paper

A Variant of Anderson Mixing with Minimal Memory Size

  • Fuchao Wei
  • Chenglong Bao
  • Yang Liu
  • Guangwen Yang

Anderson mixing (AM) is a useful method that can accelerate fixed-point iterations by exploring the information from historical iterations. Despite its numerical success in various applications, the memory requirement in AM remains a bottleneck when solving large-scale optimization problems on a resource-limited machine. To address this problem, we propose a novel variant of the AM method, called Min-AM, which stores only one vector pair, the minimal memory requirement in AM. Our method forms a symmetric approximation to the inverse Hessian matrix and is proved to be equivalent to the full-memory Type-I AM for solving strongly convex quadratic optimization. Moreover, for general nonlinear optimization problems, we establish the convergence properties of Min-AM under reasonable assumptions and show that the mixing parameters can be adaptively chosen by estimating the eigenvalues of the Hessian. Finally, we extend Min-AM to solve stochastic programming problems. Experimental results on logistic regression and network training problems validate the effectiveness of the proposed Min-AM.

NeurIPS Conference 2022 Conference Paper

Adaptive Data Debiasing through Bounded Exploration

  • Yifan Yang
  • Yang Liu
  • Parinaz Naghizadeh

Biases in existing datasets used to train algorithmic decision rules can raise ethical and economic concerns due to the resulting disparate treatment of different groups. We propose an algorithm for sequentially debiasing such datasets through adaptive and bounded exploration in a classification problem with costly and censored feedback. Exploration in this context means that at times, and to a judiciously-chosen extent, the decision maker deviates from its (current) loss-minimizing rule, and instead accepts some individuals that would otherwise be rejected, so as to reduce statistical data biases. Our proposed algorithm includes parameters that can be used to balance between the ultimate goal of removing data biases (which will in turn lead to more accurate and fair decisions) and the exploration risks incurred to achieve this goal. We analytically show that such exploration can help debias data in certain distributions. We further investigate how fairness criteria can work in conjunction with our data debiasing algorithm. We illustrate the performance of our algorithm using experiments on synthetic and real-world datasets.

JBHI Journal 2022 Journal Article

An Expectation Maximization Based Adaptive Group Testing Method for Improving Efficiency and Sensitivity of Large-Scale Screening of COVID-19

  • Xiaofang Xia
  • Yang Liu
  • Bo Yang
  • Yingfan Liu
  • Jiangtao Cui
  • Yinlong Zhang

The pathogen of the ongoing coronavirus disease 2019 (COVID-19) pandemic is a newly discovered virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Testing individuals for SARS-CoV-2 plays a critical role in containing COVID-19. For saving medical personnel and consumables, many countries are implementing group testing against SARS-CoV-2. However, existing group testing methods have the following limitations: (1) The group size is determined without theoretical analysis, and hence is usually not optimal. This adversely impacts the screening efficiency. (2) These methods neglect the fact that mixing samples together usually leads to substantial dilution of the SARS-CoV-2 virus, which seriously impacts the sensitivity of tests. In this paper, we aim to screen individuals infected with COVID-19 with as few tests as possible, under the premise that the sensitivity of tests is high enough. We propose an eXpectation Maximization based Adaptive Group Testing (XMAGT) method. The basic idea is to adaptively adjust its testing strategy between a group testing strategy and an individual testing strategy such that the expected number of samples identified by a single test is larger. During the screening process, the XMAGT method can estimate the ratio of positive samples. With this ratio, the XMAGT method can determine a group size under which the group testing strategy can achieve a maximal expected number of negative samples and the sensitivity of tests is higher than a user-specified threshold. Experimental results show that the XMAGT method outperforms existing methods in terms of both efficiency and sensitivity.
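The efficiency trade-off XMAGT navigates can be seen in the classical Dorfman two-stage analysis (shown here purely for illustration; it is not XMAGT, which additionally estimates prevalence online and accounts for dilution): with prevalence p and pool size n, each sample costs one pooled test shared n ways plus an individual retest whenever its pool is positive.

```python
def expected_tests_per_sample(n, p):
    """Dorfman two-stage group testing: one pooled test per n samples,
    plus n individual retests when the pool tests positive."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

def optimal_group_size(p, n_max=100):
    """Pool size minimizing the expected number of tests per sample."""
    return min(range(2, n_max + 1),
               key=lambda n: expected_tests_per_sample(n, p))
```

At 1% prevalence the optimum is pools of 11, at roughly 0.196 expected tests per sample, about a fivefold saving over testing everyone individually.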

NeurIPS Conference 2022 Conference Paper

Certifying Some Distributional Fairness with Subpopulation Decomposition

  • Mintong Kang
  • Linyi Li
  • Maurice Weber
  • Yang Liu
  • Ce Zhang
  • Bo Li

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible to integrate additional non-skewness constraints and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.

TIST Journal 2022 Journal Article

Communication-Efficient Federated Learning with Adaptive Quantization

  • Yuzhu Mao
  • Zihao Zhao
  • Guangfeng Yan
  • Yang Liu
  • Tian Lan
  • Linqi Song
  • Wenbo Ding

Federated learning (FL) has attracted tremendous attention in recent years due to its privacy-preserving measures and great potential in some distributed but privacy-sensitive applications, such as finance and health. However, high communication overhead for transmitting high-dimensional networks and extra security masks remains a bottleneck of FL. This article proposes a communication-efficient FL framework with an Adaptive Quantized Gradient (AQG), which adaptively adjusts the quantization level based on a local gradient’s update to fully utilize the heterogeneity of local data distribution for reducing unnecessary transmissions. In addition, client dropout issues are taken into account and an Augmented AQG is developed, which could limit the dropout noise with an appropriate amplification mechanism for transmitted gradients. Theoretical analysis and experiment results show that the proposed AQG leads to 18% to 50% of additional transmission reduction as compared with existing popular methods, including Quantized Gradient Descent (QGD) and Lazily Aggregated Quantized (LAQ) gradient-based methods, without deteriorating convergence properties. Experiments with heterogeneous data distributions corroborate a more significant transmission reduction compared with independent and identically distributed data. The proposed AQG is robust to a client dropping rate up to 90% empirically, and the Augmented AQG manages to further improve the FL system’s communication efficiency with the presence of moderate-scale client dropouts commonly seen in practical FL scenarios.
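The kind of quantization AQG adapts can be illustrated with a fixed-level uniform scheme: each coordinate of a gradient vector is encoded as a shared norm plus a small signed integer level out of s, so only the levels (a few bits each) and one float need be transmitted. A deterministic (rounding) sketch, not the paper's AQG:

```python
def quantize(v, s):
    """Uniformly quantize vector v to s levels per coordinate.

    Returns (norm, levels); the per-coordinate reconstruction error is
    at most norm / (2 * s).
    """
    norm = max(abs(x) for x in v)  # shared max-norm scale
    if norm == 0.0:
        return 0.0, [0] * len(v)
    return norm, [round(x / norm * s) for x in v]

def dequantize(norm, levels, s):
    """Reconstruct the approximate vector from its quantized form."""
    return [norm * level / s for level in levels]
```

Increasing s trades bandwidth for precision, which is exactly the knob an adaptive scheme like AQG turns per client and per round.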

JBHI Journal 2022 Journal Article

Customized Federated Learning for Multi-Source Decentralized Medical Image Classification

  • Jeffry Wicaksana
  • Zengqiang Yan
  • Xin Yang
  • Yang Liu
  • Lixin Fan
  • Kwang-Ting Cheng

The performance of deep networks for medical image analysis is often constrained by limited medical data, which is privacy-sensitive. Federated learning (FL) alleviates the constraint by allowing different institutions to collaboratively train a federated model without sharing data. However, the federated model is often suboptimal with respect to the characteristics of each client's local data. Instead of training a single global model, we propose Customized FL (CusFL), for which each client iteratively trains a client-specific/private model based on a federated global model aggregated from all private models trained in the immediate previous iteration. Two overarching strategies employed by CusFL lead to its superior performance: 1) the federated model is mainly for feature alignment and thus only consists of feature extraction layers; 2) the federated feature extractor is used to guide the training of each private model. In that way, CusFL allows each client to selectively learn useful knowledge from the federated model to improve its personalized model. We evaluated CusFL on multi-source medical image datasets for the identification of clinically significant prostate cancer and the classification of skin lesions.

AAAI Conference 2022 Conference Paper

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization

  • Ming Zhong
  • Yang Liu
  • Yichong Xu
  • Chenguang Zhu
  • Michael Zeng

Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, are frequently over a few thousand words. There is still a lack of corresponding research and powerful tools to understand and process such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise, and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer input, we augment the model with sparse attention which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering tasks of dialogue summarization, abstractive question answering and topic segmentation. Experimentally, we show that our pre-trained model DIALOGLM significantly surpasses the state-of-the-art models across datasets and tasks. Source code and all the pretrained models are available on our GitHub repository (https://github.com/microsoft/DialogLM).

ICML Conference 2022 Conference Paper

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

  • Fei Huang 0005
  • Hao Zhou 0012
  • Yang Liu
  • Hang Li 0001
  • Minlie Huang

Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of the WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model to achieve competitive results with autoregressive Transformers without relying on knowledge distillation.

IJCAI Conference 2022 Conference Paper

Distilling Governing Laws and Source Input for Dynamical Systems from Videos

  • Lele Luan
  • Yang Liu
  • Hao Sun

Distilling interpretable physical laws from videos has led to expanded interest in the computer vision community recently thanks to the advances in deep learning, but still remains a great challenge. This paper introduces an end-to-end unsupervised deep learning framework to uncover the explicit governing equations of dynamics presented by moving object(s), based on recorded videos. Instead of in the pixel (spatial) coordinate system of image space, the physical law is modeled in a regressed underlying physical coordinate system where the physical states follow potential explicit governing equations. A numerical integrator-based sparse regression module is designed to serve as a physical constraint on the autoencoder and coordinate system regression and, meanwhile, to uncover the parsimonious closed-form governing equations from the learned physical states. Experiments on simulated dynamical scenes show that the proposed method is able to distill closed-form governing equations and simultaneously identify unknown excitation input for several dynamical systems recorded by videos, which fills in the gap in the literature where no existing methods are available and applicable for solving this type of problem.

AAAI Conference 2022 Conference Paper

DMN4: Few-Shot Learning via Discriminative Mutual Nearest Neighbor Neural Network

  • Yang Liu
  • Tu Zheng
  • Jie Song
  • Deng Cai
  • Xiaofei He

Few-shot learning (FSL) aims to classify images under low-data regimes, where the conventional pooled global feature is likely to lose useful local characteristics. Recent work has achieved promising performances by using deep descriptors. They generally take all deep descriptors from neural networks into consideration while ignoring that some of them are useless in classification due to their limited receptive field; e.g., task-irrelevant descriptors could be misleading and multiple aggregative descriptors from background clutter could even overwhelm the object’s presence. In this paper, we argue that a Mutual Nearest Neighbor (MNN) relation should be established to explicitly select the query descriptors that are most relevant to each task and discard less relevant ones from aggregative clutters in FSL. Specifically, we propose the Discriminative Mutual Nearest Neighbor Neural Network (DMN4) for FSL. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods on both fine-grained and generalized datasets.
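The MNN relation at the heart of DMN4 can be stated simply: a query descriptor is kept only if its nearest support descriptor also has that query as its own nearest neighbor, which filters out clutter that nothing reciprocates. A brute-force sketch over plain feature vectors (illustrative only, not the paper's network):

```python
def mutual_nearest_pairs(queries, supports):
    """Return (query_index, support_index) pairs that are mutual nearest
    neighbors under squared Euclidean distance."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Nearest support for each query, and nearest query for each support
    nn_q = [min(range(len(supports)), key=lambda j: d2(q, supports[j]))
            for q in queries]
    nn_s = [min(range(len(queries)), key=lambda i: d2(s, queries[i]))
            for s in supports]
    # Keep only reciprocal matches
    return [(i, j) for i, j in enumerate(nn_q) if nn_s[j] == i]
```

A query descriptor whose nearest support "belongs" to some other query is dropped, mirroring how DMN4 discards task-irrelevant descriptors.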

AAAI Conference 2022 Conference Paper

Exploring Motion and Appearance Information for Temporal Sentence Grounding

  • Daizong Liu
  • Xiaoye Qu
  • Pan Zhou
  • Yang Liu

This paper addresses temporal sentence grounding. Previous works typically solve this task by learning frame-level video features and aligning them with the textual information. A major limitation of these works is that they fail to distinguish ambiguous video frames with subtle appearance differences due to frame-level feature extraction. Recently, a few methods have adopted Faster R-CNN to extract detailed object features in each frame to differentiate fine-grained appearance similarities. However, the object-level features extracted by Faster R-CNN lack motion analysis, since the object detection model has no temporal modeling. To solve this issue, we propose a novel Motion-Appearance Reasoning Network (MARN), which incorporates both motion-aware and appearance-aware object features to better reason about object relations for modeling the activity among successive frames. Specifically, we first introduce two individual video encoders to embed the video into corresponding motion-oriented and appearance-aspect object representations. Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively. Finally, both motion and appearance information from the two branches are associated to generate more representative features for final grounding. Extensive experiments on two challenging datasets (Charades-STA and TACoS) show that our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.

NeurIPS Conference 2022 Conference Paper

Fairness Transferability Subject to Bounded Distribution Shift

  • Yatong Chen
  • Reilly Raab
  • Jialu Wang
  • Yang Liu

Given an algorithmic predictor that is "fair" on some source distribution, will it still be fair on an unknown target distribution that differs from the source within some bound? In this paper, we study the transferability of statistical group fairness for machine learning predictors (i.e., classifiers or regressors) subject to bounded distribution shift. Such shifts may be introduced by initial training data uncertainties, user adaptation to a deployed predictor, dynamic environments, or the use of pre-trained models in new settings. Herein, we develop a bound that characterizes such transferability, flagging potentially inappropriate deployments of machine learning for socially consequential tasks. We first develop a framework for bounding violations of statistical fairness subject to distribution shift, formulating a generic upper bound for transferred fairness violations as our primary result. We then develop bounds for specific worked examples, focusing on two commonly used fairness definitions (i.e., demographic parity and equalized odds) and two classes of distribution shift (i.e., covariate shift and label shift). Finally, we compare our theoretical bounds to deterministic models of distribution shift and against real-world data, finding that we are able to estimate fairness violation bounds in practice, even when simplifying assumptions are only approximately satisfied.

TIST Journal 2022 Journal Article

FedCVT: Semi-supervised Vertical Federated Learning with Cross-view Training

  • Yan Kang
  • Yang Liu
  • Xinle Liang

Federated learning allows multiple parties to build machine learning models collaboratively without exposing data. In particular, vertical federated learning (VFL) enables participating parties to build a joint machine learning model based upon distributed features of aligned samples. However, VFL requires all parties to share a sufficient amount of aligned samples. In reality, the set of aligned samples may be small, leaving the majority of the non-aligned data unused. In this article, we propose Federated Cross-view Training (FedCVT), a semi-supervised learning approach that improves the performance of the VFL model with limited aligned samples. More specifically, FedCVT estimates representations for missing features, predicts pseudo-labels for unlabeled samples to expand the training set, and trains three classifiers jointly based upon different views of the expanded training set to improve the VFL model’s performance. FedCVT does not require parties to share their original data and model parameters, thus preserving data privacy. We conduct experiments on NUS-WIDE, Vehicle, and CIFAR10 datasets. The experimental results demonstrate that FedCVT significantly outperforms vanilla VFL that only utilizes aligned samples. Finally, we perform ablation studies to investigate the contribution of each component of FedCVT to the performance of FedCVT.

NeurIPS Conference 2022 Conference Paper

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

  • Yushi Cao
  • Zhiming Li
  • Tianpei Yang
  • Hao Zhang
  • Yan Zheng
  • Yi Li
  • Jianye Hao
  • Yang Liu

Despite achieving superior performance in human-level control problems, deep reinforcement learning (DRL), unlike humans, lacks high-order intelligence (e.g., logic deduction and reuse) and thus learns and generalizes less effectively than humans in complex problems. Previous works attempt to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on imperative or declarative programming, each of which has a distinct limitation. The former ignores cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based and thus fails to synthesize programs with complex hierarchical logic. In this paper, we combine the two paradigms and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs. GALOIS leverages the program sketch and defines a new sketch-based hybrid program language for guiding the synthesis. Based on that, GALOIS proposes a sketch-based program synthesis method to automatically generate white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on various decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines regarding asymptotic performance, generalizability, and knowledge reusability across different environments.

TIST Journal 2022 Journal Article

GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning

  • Zelei Liu
  • Yuanyuan Chen
  • Han Yu
  • Yang Liu
  • Lizhen Cui

Federated Learning (FL) bridges the gap between collaborative machine learning and preserving data privacy. To sustain the long-term operation of an FL ecosystem, it is important to attract high-quality data owners with appropriate incentive schemes. As an important building block of such incentive schemes, it is essential to fairly evaluate participants’ contribution to the performance of the final FL model without exposing their private data. Shapley Value (SV)-based techniques have been widely adopted to provide a fair evaluation of FL participant contributions. However, existing approaches incur significant computation costs, making them difficult to apply in practice. In this article, we propose the Guided Truncation Gradient Shapley (GTG-Shapley) approach to address this challenge. It reconstructs FL models from gradient updates for SV calculation instead of repeatedly training with different combinations of FL participants. In addition, we design a guided Monte Carlo sampling approach combined with within-round and between-round truncation to further reduce the number of model reconstructions and evaluations required. We evaluate GTG-Shapley through extensive experiments under diverse realistic data distribution settings. The results demonstrate that GTG-Shapley can closely approximate actual Shapley values while significantly increasing computational efficiency compared with the state of the art, especially under non-i.i.d. settings.
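The sampling-with-truncation idea can be illustrated with a generic truncated Monte Carlo Shapley estimator. This is a sketch, not GTG-Shapley itself: the `utility` callback stands in for evaluating a model reconstructed from gradient updates, and the guided-sampling heuristics are omitted.

```python
import random

def truncated_mc_shapley(players, utility, rounds=200, trunc_tol=1e-4, seed=0):
    """Truncated Monte Carlo estimate of Shapley values.

    players: list of participant ids
    utility: callable scoring a frozenset of participants
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    full = utility(frozenset(players))      # grand-coalition utility
    for t in range(1, rounds + 1):
        perm = list(players)
        rng.shuffle(perm)
        prev = utility(frozenset())
        for k, p in enumerate(perm):
            if full - prev < trunc_tol:
                # Within-round truncation: remaining marginals are negligible,
                # so skip further (expensive) utility evaluations.
                marginal = 0.0
            else:
                cur = utility(frozenset(perm[:k + 1]))
                marginal, prev = cur - prev, cur
            phi[p] += (marginal - phi[p]) / t   # running average over rounds
    return phi
```

For an additive utility the marginal contributions are exact in every permutation, so the estimate matches the true Shapley values.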

JMLR Journal 2022 Journal Article

Improving Bayesian Network Structure Learning in the Presence of Measurement Error

  • Yang Liu
  • Anthony C. Constantinou
  • Zhigao Guo

Structure learning algorithms that learn the graph of a Bayesian network from observational data often do so by assuming the data correctly reflect the true distribution of the variables. However, this assumption does not hold in the presence of measurement error, which can lead to spurious edges. This is one of the reasons why the synthetic performance of these algorithms often overestimates real-world performance. This paper describes a heuristic algorithm that can be added as an additional learning phase at the end of any structure learning algorithm, and serves as a correction learning phase that removes potential false positive edges. The results show that the proposed correction algorithm successfully improves the graphical score of five well-established structure learning algorithms spanning different classes of learning in the presence of measurement error.

ICML Conference 2022 Conference Paper

Investigating Generalization by Controlling Normalized Margin

  • Alexander R. Farhang
  • Jeremy Bernstein
  • Kushal Tirumala
  • Yang Liu
  • Yisong Yue

Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that the answer is no: networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that the answer is yes: in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.
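For a linear classifier the quantity in question is straightforward to compute; a small sketch (binary labels in {-1, +1} assumed):

```python
import numpy as np

def normalized_margin(w, X, y):
    """Normalized margin gamma / ||w|| of a linear classifier w on (X, y).

    gamma is the smallest signed margin y_i * <w, x_i> over the data;
    dividing by ||w|| makes the quantity invariant to rescaling w.
    """
    margins = y * (X @ w)
    return margins.min() / np.linalg.norm(w)
```

Scaling `w` by any positive constant leaves the value unchanged, which is precisely why standard optimizers, which let $\|w\|$ grow freely, do not control it.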

IJCAI Conference 2022 Conference Paper

Learning Prototype via Placeholder for Zero-shot Recognition

  • Zaiquan Yang
  • Yang Liu
  • Wenjia Xu
  • Chong Huang
  • Lei Zhou
  • Chao Tong

Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic descriptions shared between seen classes and unseen classes. Current methods show that it is effective to learn visual-semantic alignment by projecting semantic embeddings into the visual space as class prototypes. However, such a projection function is only concerned with seen classes. When applied to unseen classes, the prototypes often perform suboptimally due to domain shift. In this paper, we propose to learn prototypes via placeholders, termed LPL, to eliminate the domain shift between seen and unseen classes. Specifically, we combine seen classes to hallucinate new classes which act as placeholders for the unseen classes in the visual and semantic space. Placed between seen classes, the placeholders encourage the prototypes of seen classes to be highly dispersed, sparing more space for the insertion of well-separated unseen classes. Empirically, well-separated prototypes help counteract the visual-semantic misalignment caused by domain shift. Furthermore, we exploit a novel semantic-oriented fine-tuning method to guarantee the semantic reliability of the placeholders. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of LPL over state-of-the-art methods.

JBHI Journal 2022 Journal Article

Margin Preserving Self-Paced Contrastive Learning Towards Domain Adaptation for Medical Image Segmentation

  • Zhizhe Liu
  • Zhenfeng Zhu
  • Shuai Zheng
  • Yang Liu
  • Jiayu Zhou
  • Yao Zhao

To bridge the gap between the source and target domains in unsupervised domain adaptation (UDA), the most common strategy focuses on matching the marginal distributions in the feature space through adversarial learning. However, such category-agnostic global alignment fails to exploit the class-level joint distributions, leaving the aligned distribution less discriminative. To address this issue, we propose in this paper a novel Margin Preserving Self-paced Contrastive Learning (MPSCL) model for cross-modal medical image segmentation. Unlike the conventional construction of contrastive pairs in contrastive learning, the domain-adaptive category prototypes are utilized to constitute the positive and negative sample pairs. With the guidance of progressively refined semantic prototypes, a novel margin preserving contrastive loss is proposed to boost the discriminability of the embedded representation space. To enhance the supervision for contrastive learning, more informative pseudo-labels are generated in the target domain in a self-paced way, thus benefiting the category-aware distribution alignment for UDA. Furthermore, the domain-invariant representations are learned through joint contrastive learning between the two domains. Extensive experiments on cross-modal cardiac segmentation tasks demonstrate that MPSCL significantly improves semantic segmentation performance, and outperforms a wide variety of state-of-the-art methods by a large margin.

NeurIPS Conference 2022 Conference Paper

Molecule Generation by Principal Subgraph Mining and Assembling

  • Xiangzhe Kong
  • Wenbing Huang
  • Zhixing Tan
  • Yang Liu

Molecule generation is central to a variety of applications. Recent work approaches the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembling depends solely on local arrangement. In this paper, we define a novel notion, the principal subgraph, that is closely related to the informative patterns within molecules. Interestingly, our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods are incapable of. Moreover, we develop a two-step subgraph assembling strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally as the final output molecule. Built upon a graph variational auto-encoder, our model is demonstrated to be effective in terms of several evaluation metrics and efficiency, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.

ICLR Conference 2022 Conference Paper

On Robust Prefix-Tuning for Text Classification

  • Zonghan Yang
  • Yang Liu

Recently, prefix-tuning has gained increasing attention as a parameter-efficient finetuning method for large-scale pretrained language models. The method keeps the pretrained models fixed and only updates the prefix token parameters for each downstream task. Despite being lightweight and modular, prefix-tuning still lacks robustness to textual adversarial attacks. However, most currently developed defense techniques necessitate auxiliary model update and storage, which inevitably hamper the modularity and low storage of prefix-tuning. In this work, we propose a robust prefix-tuning framework that preserves the efficiency and modularity of prefix-tuning. The core idea of our framework is leveraging the layerwise activations of the language model by correctly-classified training data as the standard for additional prefix finetuning. During the test phase, an extra batch-level prefix is tuned for each batch and added to the original prefix for robustness enhancement. Extensive experiments on three text classification benchmarks show that our framework substantially improves robustness over several strong baselines against five textual attacks of different types while maintaining comparable accuracy on clean texts. We also interpret our robust prefix-tuning framework from the optimal control perspective and pose several directions for future research.

AAAI Conference 2022 Conference Paper

Perceptual Quality Assessment of Omnidirectional Images

  • Yuming Fang
  • Liping Huang
  • Jiebin Yan
  • Xuelin Liu
  • Yang Liu

Omnidirectional images, also called 360° images, have attracted extensive attention in recent years due to the rapid development of virtual reality (VR) technologies. During omnidirectional image processing, including capture, transmission, and consumption, measuring the perceptual quality of omnidirectional images is highly desired, since it plays a key role in guaranteeing the immersive quality of experience (IQoE). In this paper, we conduct a comprehensive study of the perceptual quality of omnidirectional images from both subjective and objective perspectives. Specifically, we construct the largest subjective omnidirectional image quality database to date, where we consider several key influential elements from the user view, i.e., realistic non-uniform distortion, viewing conditions, and viewing behavior. In addition to subjective quality scores, we also record head and eye movement data. Besides, we make the first attempt to use the proposed database to train a convolutional neural network (CNN) for blind omnidirectional image quality assessment. To be consistent with human viewing behavior in a VR device, we extract viewports from each omnidirectional image and naturally incorporate the user viewing conditions in the proposed model. The proposed model is composed of two parts: a multi-scale CNN-based feature extraction module and a perceptual quality prediction module. The feature extraction module incorporates the multi-scale features, and the perceptual quality prediction module regresses them to perceived quality scores. The experimental results on our database verify that the proposed model achieves competitive performance compared with state-of-the-art methods.

AAAI Conference 2022 Conference Paper

SCALoss: Side and Corner Aligned Loss for Bounding Box Regression

  • Tu Zheng
  • Shuai Zhao
  • Yang Liu
  • Zili Liu
  • Deng Cai

Bounding box regression is an important component in object detection. Recent work achieves promising performance by optimizing the Intersection over Union (IoU). However, IoU-based losses suffer from a vanishing gradient problem in the case of low-overlapping bounding boxes, and the model can easily ignore these cases. In this paper, we propose the Side Overlap (SO) loss, which maximizes the side overlap of two bounding boxes and thus puts more penalty on low-overlapping cases. Besides, to speed up convergence, the Corner Distance (CD) is added to the objective function. Combining the Side Overlap and Corner Distance, we obtain a new regression objective, the Side and Corner Aligned Loss (SCALoss). SCALoss is well-correlated with IoU loss, which also benefits the evaluation metric, but produces more penalty for low-overlapping cases. It can serve as a comprehensive similarity measure, leading to better localization performance and faster convergence. Experiments on the COCO, PASCAL VOC, and LVIS benchmarks show that SCALoss brings consistent improvement and outperforms $\ell_n$ loss and IoU-based losses with popular object detectors such as YOLOv3, SSD, and Faster R-CNN. Code is available at: https://github.com/Turoad/SCALoss.
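The vanishing-gradient issue the abstract describes is easy to see from plain IoU: any two disjoint boxes score exactly zero no matter how far apart they are, so an IoU-only loss gives no signal for pulling them together. A standard IoU routine (not the paper's SCALoss) makes this concrete:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # clamp at zero
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A nearby disjoint box and a distant one both get IoU 0, which is the flat region SCALoss's side-overlap and corner-distance terms are designed to penalize.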

IJCAI Conference 2022 Conference Paper

Towards Controlling the Transmission of Diseases: Continuous Exposure Discovery over Massive-Scale Moving Objects

  • Ke Li
  • Lisi Chen
  • Shuo Shang
  • Haiyan Wang
  • Yang Liu
  • Panos Kalnis
  • Bin Yao

Infectious diseases have been recognized as major public health concerns for decades. Close contact discovery plays an indispensable role in preventing epidemic transmission. In this light, we study the continuous exposure search problem: given a collection of moving objects and a collection of moving queries, we continuously discover all objects that have been directly or indirectly exposed to at least one query over a period of time. Our problem targets a variety of applications, including but not limited to disease control, epidemic pre-warning, information spreading, and co-movement mining. To solve this problem, we develop an exact group processing algorithm with optimization strategies. Further, we propose an approximate algorithm that substantially improves efficiency without false dismissal. Extensive experiments offer insight into the effectiveness and efficiency of our proposed algorithms.

AAAI Conference 2022 Conference Paper

TransZero: Attribute-Guided Transformer for Zero-Shot Learning

  • Shiming Chen
  • Ziming Hong
  • Yang Liu
  • Guo-Sen Xie
  • Baigui Sun
  • Hao Li
  • Qinmu Peng
  • Ke Lu

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero takes a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves the new state of the art on three ZSL benchmarks. The code is available at: https://github.com/shiming-chen/TransZero.

IJCAI Conference 2022 Conference Paper

Vision Shared and Representation Isolated Network for Person Search

  • Yang Liu
  • Yingping Li
  • Chengyu Kong
  • Yuqiu Kong
  • Shenglan Liu
  • Feilong Wang

Person search is a widely studied computer vision task that aims to jointly solve the problems of pedestrian detection and person re-identification in panoramic scenes. However, pedestrian detection focuses on the consistency of pedestrians, while person re-identification attempts to extract their discriminative features. This inevitable conflict greatly restricts research on one-stage person search methods. To address this issue, we propose a Vision Shared and Representation Isolated (VSRI) network to decouple the two conflicting subtasks simultaneously, through which two independent representations are constructed for the two subtasks. To enhance the discrimination of the re-ID representation, a Multi-Level Feature Fusion (MLFF) module is proposed. The MLFF adopts the Spatial Pyramid Feature Fusion (SPFF) module to obtain diverse features from the stem network. Moreover, the multi-head self-attention mechanism is employed to construct a Multi-head Attention Driven Extraction (MADE) module, and the cascaded convolution unit is adopted to devise a Feature Decomposition and Cascaded Integration (FDCI) module, which facilitates the MLFF in obtaining more discriminative representations of the pedestrians. The proposed method outperforms state-of-the-art methods on the mainstream datasets.

AAAI Conference 2022 Conference Paper

Weakly Supervised Video Moment Localization with Contrastive Negative Sample Mining

  • Minghang Zheng
  • Yanjie Huang
  • Qingchao Chen
  • Yang Liu

Video moment localization aims at localizing the video segments most related to a given free-form natural language query. The weakly supervised setting, where only a video-level description is available during training, is receiving increasing attention due to its lower annotation cost. Prior weakly supervised methods mainly use sliding windows to generate temporal proposals, which are independent of video content and of low quality, and train the model to distinguish matched video-query pairs from unmatched ones collected from different videos, neglecting that what the model really needs is to distinguish the unaligned segments within the video. In this work, we propose a novel weakly supervised solution by introducing Contrastive Negative sample Mining (CNM). Specifically, we use a learnable Gaussian mask to generate positive samples, highlighting the video frames most related to the query, and consider other frames of the video and the whole video as easy and hard negative samples, respectively. We then train our network with an Intra-Video Contrastive loss to make our positive and negative samples more discriminative. Our method has two advantages: (1) our proposal generation process with a learnable Gaussian mask is more efficient and yields higher-quality positive samples; (2) the more difficult intra-video negative samples enable our model to distinguish highly confusing scenes. Experiments on two datasets show the effectiveness of our method. Code can be found at https://github.com/minghangz/cnm.

JBHI Journal 2021 Journal Article

Automatic Detection of QRS Complexes Using Dual Channels Based on U-Net and Bidirectional Long Short-Term Memory

  • Runnan He
  • Yang Liu
  • Kuanquan Wang
  • Na Zhao
  • Yongfeng Yuan
  • Qince Li
  • Henggui Zhang

Objective: Detecting changes in the QRS complexes of ECG signals is regarded as a straightforward, noninvasive, inexpensive, preliminary diagnostic approach for evaluating the cardiac health of patients. Therefore, detecting QRS complexes in ECG signals must be accurate over short times. However, the reliability of automatic QRS detection is restricted by various kinds of noise and complex signal morphologies. The objective of this paper is to address the automatic detection of QRS complexes. Methods: We propose a new algorithm for automatic detection of QRS complexes using dual channels based on U-Net and bidirectional long short-term memory. First, a proposed preprocessor with mean filtering and discrete wavelet transform is applied to remove different types of noise. Next, the signal is transformed and the annotations are relabeled. Finally, a method combining U-Net and bidirectional long short-term memory with dual channels is used for the automatic detection of QRS complexes. Results: The proposed algorithm was trained and tested using 44 ECG records from the MIT-BIH arrhythmia database and the CPSC2019 dataset, achieving 99.06% and 95.13% sensitivity, 99.22% and 82.03% positive predictivity, and 98.29% and 78.73% accuracy on the two datasets, respectively. Conclusion: Experimental results show that the proposed method is useful for the automatic detection of QRS complexes. Significance: The proposed method not only has application potential for QRS complex detection in large ECG datasets, but can also be extended to other medical signal research fields.
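The mean-filtering step in such a preprocessor can be sketched as moving-average baseline removal, a common first step against baseline wander before QRS detection. This is a hypothetical sketch (the window length is an assumption), not the paper's exact pipeline, which additionally uses discrete wavelet transform denoising:

```python
import numpy as np

def remove_baseline(signal, win=200):
    """Subtract a moving-average (mean-filter) estimate of the baseline.

    signal: 1-D array of ECG samples
    win:    averaging window length in samples (illustrative choice)
    """
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode='same')  # local mean
    return signal - baseline
```

On a flat signal the estimated baseline equals the signal in the interior, so the output there is zero; sharp QRS spikes, being much shorter than the window, survive largely intact.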

IJCAI Conference 2021 Conference Paper

AVA: Adversarial Vignetting Attack against Visual Recognition

  • Binyu Tian
  • Felix Juefei-Xu
  • Qing Guo
  • Xiaofei Xie
  • Xiaohong Li
  • Yang Liu

Vignetting is an inherent imaging phenomenon in almost all optical systems, appearing as a radial intensity darkening toward the corners of an image. Since it is a common effect in photography and usually appears as a slight intensity variation, people usually regard it as part of a photo and would not even want to post-process it. Due to this natural advantage, in this work, we study vignetting from a new viewpoint, i.e., the adversarial vignetting attack (AVA), which aims to embed intentionally misleading information into the vignetting and produce a natural adversarial example without noise patterns. This example can fool state-of-the-art deep convolutional neural networks (CNNs) but is imperceptible to humans. To this end, we first propose the radial-isotropic adversarial vignetting attack (RI-AVA) based on the physical model of vignetting, where the physical parameters (e.g., illumination factor and focal length) are tuned through the guidance of target CNN models. To achieve higher transferability across different CNNs, we further propose the radial-anisotropic adversarial vignetting attack (RA-AVA), which allows the effective regions of vignetting to be radial-anisotropic and shape-free. Moreover, we propose a geometry-aware level-set optimization method to solve for the adversarial vignetting regions and physical parameters jointly. We validate the proposed methods on three popular datasets, i.e., DEV, CIFAR10, and Tiny ImageNet, by attacking four CNNs, i.e., ResNet50, EfficientNet-B0, DenseNet121, and MobileNet-V2, demonstrating the advantages of our methods over baseline methods in both transferability and image quality.

NeurIPS Conference 2021 Conference Paper

Bandit Learning with Delayed Impact of Actions

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real world. For example, the capability to pay back a loan for people in a certain social group might depend on how frequently that group has historically had loan applications approved. If banks keep rejecting loan applications from people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits. We generalize the bandit setting to encode the dependency of this "bias" on the action history during learning. The goal is to maximize the collected utilities over time while taking into account the dynamics created by the delayed impacts of historical actions. We propose an algorithm that achieves a regret of $\tilde{O}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions that have long-term impacts, and have implications for designing fair algorithms.

NeurIPS Conference 2021 Conference Paper

Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial

  • Yang Liu
  • Jialu Wang

In this paper, we answer the question of when inserting label noise (less informative labels) can instead return more accurate and fair models. We are primarily inspired by three observations: 1) in contrast to reducing label noise rates, increasing the noise rates is easy to implement; 2) increasing a certain class of instances' label noise to balance the noise rates (increasing-to-balancing) results in an easier learning problem; 3) increasing-to-balancing improves fairness guarantees against label bias. In this paper, we first quantify the trade-offs introduced by increasing a certain group of instances' label noise rate w.r.t. the loss of label informativeness and the lowered learning difficulty. We analytically demonstrate when such an increase is beneficial, in terms of either improved generalization power or fairness guarantees. We then present a method to insert label noise properly for the task of learning with noisy labels, either with or without a fairness constraint. The primary technical challenge we face is that we would not know which data instances are suffering from higher noise, and we would not have the ground truth labels to verify any possible hypothesis. We propose a detection method that informs us which group of labels might suffer from higher noise, without using ground truth labels. We formally establish the effectiveness of the proposed solution and demonstrate it with extensive experiments.
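The increasing-to-balancing operation can be sketched for binary labels: flip extra labels in the lower-noise group until its total flip rate matches the higher-noise group's. Everything here is an illustrative assumption (the function, its inputs, and the premise that the two groups' noise rates are known, which is precisely what the paper's detection method avoids requiring):

```python
import random

def balance_noise_rates(labels, groups, noise_rates, seed=0):
    """Flip extra binary labels in the lower-noise group so both groups
    end up with (approximately) the higher group's noise rate.

    labels:      list of 0/1 labels
    groups:      parallel list of group ids
    noise_rates: dict group -> current (assumed known) flip probability
    """
    rng = random.Random(seed)
    lo = min(noise_rates, key=noise_rates.get)   # lower-noise group
    hi = max(noise_rates, key=noise_rates.get)
    # An independent extra flip with probability p turns a flip rate e_lo
    # into e_lo + p - 2*e_lo*p (flips can cancel); solving for e_hi gives:
    p = (noise_rates[hi] - noise_rates[lo]) / (1 - 2 * noise_rates[lo])
    noisy = list(labels)
    for i, g in enumerate(groups):
        if g == lo and rng.random() < p:
            noisy[i] = 1 - noisy[i]              # inject extra noise
    return noisy
```

The derivation assumes e_lo < 1/2; the higher-noise group is left untouched, matching observation 1) that adding noise is cheap while removing it is not.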

AAAI Conference 2021 Conference Paper

Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks

  • Xiyue Zhang
  • Xiaoning Du
  • Xiaofei Xie
  • Lei Ma
  • Yang Liu
  • Meng Sun

Recurrent Neural Networks (RNNs) have demonstrated their effectiveness in learning and processing sequential data (e.g., speech and natural language). However, due to the black-box nature of neural networks, understanding the decision logic of RNNs is quite challenging. Some recent progress has been made to approximate the behavior of an RNN by weighted automata. They provide better interpretability, but still suffer from poor scalability. In this paper, we propose a novel approach to extracting weighted automata with the guidance of a target RNN’s decision and context information. In particular, we identify the patterns of RNN’s step-wise predictive decisions to instruct the formation of automata states. Further, we propose a state composition method to enhance the context-awareness of the extracted model. Our in-depth evaluations on typical RNN tasks, including language modeling and classification, demonstrate the effectiveness and advantage of our method over state-of-the-art approaches. The evaluation results show that our method can achieve accurate approximation of an RNN even on large-scale tasks.

NeurIPS Conference 2021 Conference Paper

Deconvolutional Networks on Graph Data

  • Jia Li
  • Jiajin Li
  • Yang Liu
  • Jianwei Yu
  • Yueting Li
  • Hong Cheng

In this paper, we consider an inverse problem in the graph learning domain -- "given the graph representations smoothed by Graph Convolutional Network (GCN), how can we reconstruct the input graph signal?" We propose Graph Deconvolutional Network (GDN) and motivate the design of GDN via a combination of inverse filters in the spectral domain and de-noising layers in the wavelet domain, as the inverse operation results in a high frequency amplifier and may amplify the noise. We demonstrate the effectiveness of the proposed method on several tasks including graph feature imputation and graph structure generation.

AAAI Conference 2021 Conference Paper

EfficientDeRain: Learning Pixel-wise Dilation Filtering for High-Efficiency Single-Image Deraining

  • Qing Guo
  • Jingyang Sun
  • Felix Juefei-Xu
  • Lei Ma
  • Xiaofei Xie
  • Wei Feng
  • Yang Liu
  • Jianjun Zhao

Single-image deraining is rather challenging due to the unknown rain model. Existing methods often make specific assumptions of the rain model, which can hardly cover many diverse circumstances in the real world, compelling them to employ complex optimization or progressive refinement. This, however, significantly affects these methods’ efficiency and effectiveness for many efficiency-critical applications. To fill this gap, in this paper, we regard single-image deraining as a general image-enhancing problem and originally propose a model-free deraining method, i.e., EfficientDeRain, which is able to process a rainy image within 10 ms (i.e., around 6 ms on average), over 80 times faster than the state-of-the-art method (i.e., RCDNet), while achieving similar de-rain effects. We first propose the novel pixel-wise dilation filtering. In particular, a rainy image is filtered with the pixel-wise kernels estimated from a kernel prediction network, by which suitable multi-scale kernels for each pixel can be efficiently predicted. Then, to eliminate the gap between synthetic and real data, we further propose an effective data augmentation method (i.e., RainMix) that helps to train the network for handling real rainy images. We perform a comprehensive evaluation on both synthetic and real-world rainy datasets to demonstrate the effectiveness and efficiency of our method. We release the model and code at https://github.com/tsingqguo/efficientderain.git.
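The core filtering step can be sketched independently of the kernel prediction network. A minimal numpy version (assumed shapes and names; in the paper the `kernels` tensor would be predicted per image): every pixel gets its own k×k kernel, applied over a dilated neighbourhood:

```python
import numpy as np

def pixelwise_dilation_filter(img, kernels, dilation=1):
    """Apply a separate kxk kernel at every pixel (a sketch of pixel-wise
    dilation filtering; `kernels` would come from a kernel-prediction net).

    img     : (H, W) grayscale image
    kernels : (H, W, k, k) per-pixel kernels
    """
    H, W, k, _ = kernels.shape
    r = (k // 2) * dilation              # padding radius for the dilated window
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    offs = (np.arange(k) - k // 2) * dilation  # dilated tap offsets
    for i in range(k):
        for j in range(k):
            patch = padded[r + offs[i]: r + offs[i] + H,
                           r + offs[j]: r + offs[j] + W]
            out += kernels[:, :, i, j] * patch
    return out
```

A sanity check: delta kernels (1 at the centre tap, 0 elsewhere) must reproduce the input image at any dilation.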

JMLR Journal 2021 Journal Article

FATE: An Industrial Grade Platform for Collaborative Learning With Data Protection

  • Yang Liu
  • Tao Fan
  • Tianjian Chen
  • Qian Xu
  • Qiang Yang

Collaborative and federated learning has become an emerging solution for many industrial applications where data values from different sites are exploited jointly with privacy protection. We introduce FATE, an industrial-grade project that supports enterprises and institutions in building machine learning models collaboratively at large scale in a distributed manner. FATE supports a variety of secure computation protocols and machine learning algorithms, and features out-of-the-box usability with end-to-end building modules and visualization tools. Documentation is available at https://github.com/FederatedAI/FATE. Case studies and other information are available at https://www.fedai.org.

IS Journal 2021 Journal Article

Federated Digital Gateway: Methodologies, Tools, and Applications

  • Yang Liu
  • Ruolan Wang
  • Shishuai Du
  • Junbo Zhang
  • Yu Zheng

Federated machine learning (FML) is a new machine learning paradigm that is focused on training distributed models, where data are scattered in different places known as data silos, only necessary modeling information (not raw data) is exchanged, and data privacy and security are protected during the modeling. This research area has grown quickly over the past years, but the vision of making it a practical solution is still not fulfilled. Motivated by this, here we introduce an intelligent architecture, termed Federated Digital Gateway. It is designed to help algorithm engineers easily deploy FML methods for real-life tasks. It provides different modules such as secure communication tools, a database interface, an authentication center, an account system, and a user interface. This architecture has been shown to function smoothly in two real-world applications. Overall, the Federated Digital Gateway is practical and deployable for applying federated learning to solve real-life tasks.

IJCAI Conference 2021 Conference Paper

Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models

  • Shangwei Guo
  • Tianwei Zhang
  • Han Qiu
  • Yi Zeng
  • Tao Xiang
  • Yang Liu

Watermarking has become a popular technique for protecting the intellectual property of DNN models. Recent works, from the adversary's perspective, have attempted to subvert watermarking mechanisms by designing watermark removal attacks. However, these attacks mainly adopt sophisticated fine-tuning techniques, which have certain fatal drawbacks or unrealistic assumptions. In this paper, we propose a novel watermark removal attack from a different perspective. Instead of just fine-tuning the watermarked models, we design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations, which can effectively and blindly destroy the memorization of watermarked models to the watermark samples. We also introduce a lightweight fine-tuning strategy to preserve the model performance. Our solution requires far fewer resources and less knowledge about the watermarking scheme than prior works. Extensive experimental results indicate that our attack can bypass state-of-the-art watermarking solutions with very high success rates. Based on our attack, we propose watermark augmentation techniques to enhance the robustness of existing watermarks.

NeurIPS Conference 2021 Conference Paper

How Powerful are Performance Predictors in Neural Architecture Search?

  • Colin White
  • Arber Zela
  • Robin Ru
  • Yang Liu
  • Frank Hutter

Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks. To reduce this extreme computational cost, dozens of techniques have since been proposed to predict the final performance of neural architectures. Despite the success of such performance prediction methods, it is not well-understood how different families of techniques compare to one another, due to the lack of an agreed-upon evaluation metric and optimization for different constraints on the initialization time and query time. In this work, we give the first large-scale study of performance predictors by analyzing 31 techniques ranging from learning curve extrapolation, to weight-sharing, to supervised learning, to zero-cost proxies. We test a number of correlation- and rank-based performance measures in a variety of settings, as well as the ability of each technique to speed up predictor-based NAS frameworks. Our results act as recommendations for the best predictors to use in different settings, and we show that certain families of predictors can be combined to achieve even better predictive power, opening up promising research directions. We release our code, featuring a library of 31 performance predictors.

NeurIPS Conference 2021 Conference Paper

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

  • Shiming Chen
  • Guosen Xie
  • Yang Liu
  • Qinmu Peng
  • Baigui Sun
  • Hao Li
  • Xinge You
  • Ling Shao

Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a common (latent) space is adopted for associating the visual and semantic domains in ZSL. However, existing common space learning methods align the semantic and visual domains by merely mitigating distribution disagreement through one-step adaptation. This strategy is usually ineffective due to the heterogeneous nature of the feature representations in the two domains, which intrinsically contain both distribution and structure variations. To address this and advance ZSL, we propose a novel hierarchical semantic-visual adaptation (HSVA) framework. Specifically, HSVA aligns the semantic and visual domains by adopting a hierarchical two-step adaptation, i.e., structure adaptation and distribution adaptation. In the structure adaptation step, we take two task-specific encoders to encode the source data (visual domain) and the target data (semantic domain) into a structure-aligned common space. To this end, a supervised adversarial discrepancy (SAD) module is proposed to adversarially minimize the discrepancy between the predictions of two task-specific classifiers, thus making the visual and semantic feature manifolds more closely aligned. In the distribution adaptation step, we directly minimize the Wasserstein distance between the latent multivariate Gaussian distributions to align the visual and semantic distributions using a common encoder. Finally, the structure and distribution adaptation are derived in a unified framework under two partially-aligned variational autoencoders. Extensive experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL. The code is available at \url{https://github.com/shiming-chen/HSVA}.

ICML Conference 2021 Conference Paper

Learning by Turning: Neural Architecture Aware Optimisation

  • Yang Liu
  • Jeremy Bernstein
  • Markus Meister
  • Yisong Yue

Descent methods for deep networks are notoriously capricious: they require careful tuning of step size, momentum and weight decay, and which method will work best on a new benchmark is a priori unclear. To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator. Nero trains reliably without momentum or weight decay, works in situations where Adam and SGD fail, and requires little to no learning rate tuning. Also, Nero’s memory footprint is square root that of Adam or LAMB. Nero combines two ideas: (1) projected gradient descent over the space of balanced networks; (2) neuron-specific updates, where the step size sets the angle through which each neuron’s hyperplane turns. The paper concludes by discussing how this geometric connection between architecture and optimisation may impact theories of generalisation in deep learning.
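The abstract's two ingredients, projection onto balanced networks and neuron-specific steps, can be sketched in a few lines. This is a simplified illustration (names and the exact constraint set are assumptions, not the paper's implementation): after each gradient step, every neuron's weight vector is projected back to zero mean and unit norm, so the learning rate effectively controls the angle through which the neuron's hyperplane turns:

```python
import numpy as np

def nero_like_step(W, grad, lr=0.01):
    """One sketch of a Nero-style update: a plain gradient step followed by
    projection of each neuron (row of W) back onto the 'balanced' set of
    zero-mean, unit-norm weight vectors."""
    W = W - lr * grad                                    # gradient step
    W = W - W.mean(axis=1, keepdims=True)                # re-center each neuron
    W = W / np.linalg.norm(W, axis=1, keepdims=True)     # re-normalise each neuron
    return W
```

Because only the direction of each neuron survives the projection, no weight-decay or momentum state needs to be stored, which is consistent with the small memory footprint the abstract describes.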

AAAI Conference 2021 Conference Paper

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

  • Qingchao Chen
  • Yang Liu
  • Samuel Albanie

When can we expect a text-video retrieval system to work effectively on datasets that differ from its training domain? In this work, we investigate this question through the lens of unsupervised domain adaptation in which the objective is to match natural language queries and video content in the presence of domain shift at query-time. Such systems have significant practical applications since they are capable of generalising to new data sources without requiring corresponding text annotations. We make the following contributions: (1) We propose the UDAVR (Unsupervised Domain Adaptation for Video Retrieval) benchmark and employ it to study the performance of text-video retrieval in the presence of domain shift. (2) We propose Concept-Aware-Pseudo-Query (CAPQ), a method for learning discriminative and transferable features that bridge these cross-domain discrepancies to enable effective target domain retrieval using source domain supervision. (3) We show that CAPQ outperforms alternative domain adaptation strategies on UDAVR.

IROS Conference 2021 Conference Paper

Overlap Displacement Error: Are Your SLAM Poses Map-Consistent?

  • Christian Mostegel
  • Jianbo Ye
  • Yu Luo
  • Yang Liu

Localization is an essential module that supports many intelligent functions of a mobile robot such as transportation or inspection. However, justifying that a localization module is sufficiently accurate for supporting all downstream tasks is one of the most difficult questions to answer in practice. To overcome this problem, we move away from the traditional calculation of pose errors and propose a new approach that instead evaluates the potential map inconsistency introduced by those pose errors. For this purpose, we propose a new metric, which we call Overlap Displacement Error (ODE). This metric measures the relative displacements between multiple overlapping sensor frustums with respect to the ground truth. All that is needed to compute this metric is a query trajectory, a ground truth trajectory and the sensor frustum used for mapping. Having the sensor frustum and the map representation as part of the metric, the ODE is customized to the hardware configuration and the mapping strategy. This design allows the analysis of pose accuracy in a space that matters to map creation, and also allows the identification of problems sitting in the interplay between localization and mapping. We demonstrate the potential of this new analysis tool on synthetic and real-world sequences.

IJCAI Conference 2021 Conference Paper

Physics-informed Spline Learning for Nonlinear Dynamics Discovery

  • Fangzheng Sun
  • Yang Liu
  • Hao Sun

Dynamical systems are typically governed by a set of linear/nonlinear differential equations. Distilling the analytical form of these equations from very limited data remains intractable in many disciplines such as physics, biology, climate science, engineering and social science. To address this fundamental challenge, we propose a novel Physics-informed Spline Learning (PiSL) framework to discover parsimonious governing equations for nonlinear dynamics, based on sparsely sampled noisy data. The key concept is to (1) leverage splines to interpolate locally the dynamics, perform analytical differentiation and build the library of candidate terms, (2) employ sparse representation of the governing equations, and (3) use the physics residual in turn to inform the spline learning. The synergy between splines and discovered underlying physics leads to the robust capacity of dealing with high-level data scarcity and noise. A hybrid sparsity-promoting alternating direction optimization strategy is developed for systematically pruning the sparse coefficients that form the structure and explicit expression of the governing equations. The efficacy and superiority of the proposed method have been demonstrated by multiple well-known nonlinear dynamical systems, in comparison with two state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Policy Learning Using Weak Supervision

  • Jingkang Wang
  • Hongyi Guo
  • Zhaowei Zhu
  • Yang Liu

Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). These quality supervisions are either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervisions to perform policy learning efficiently. To handle this problem, we treat the ``weak supervision'' as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a ``correlated agreement'' with the peer agent's policy (instead of simple agreements). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy reward, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environments is high.
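The correlated-agreement idea is closely related to peer-loss-style objectives: score the agent's predictions against the weak labels, then subtract the same loss evaluated on randomly re-paired labels, so that blindly agreeing with noisy supervision is penalised. A minimal supervised-learning sketch (function names and the binary setup are assumptions for illustration):

```python
import numpy as np

def peer_loss(scores, weak_labels, rng, alpha=1.0):
    """Peer-loss sketch: usual loss on the weak labels minus the loss
    against randomly re-paired labels; the subtracted term punishes
    a predictor for overfitting to (possibly noisy) supervision.

    scores      : (n,) predicted probabilities for class 1
    weak_labels : (n,) 0/1 weak supervision signals
    """
    eps = 1e-12

    def ce(p, y):  # binary cross-entropy
        return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    peer_y = weak_labels[rng.permutation(len(weak_labels))]  # shuffled peer labels
    return np.mean(ce(scores, weak_labels) - alpha * ce(scores, peer_y))
```

A predictor that matches the labels scores well on the first term but, since the shuffled labels disagree half the time on balanced data, pays a large subtracted term, making its peer loss negative; a constant predictor gains nothing from either term.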

IS Journal 2021 Journal Article

SecureBoost: A Lossless Federated Learning Framework

  • Kewei Cheng
  • Tao Fan
  • Yilun Jin
  • Yang Liu
  • Tianjian Chen
  • Dimitrios Papadopoulos
  • Qiang Yang

The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks for data sharing that do not violate user privacy. To meet this goal, in this article, we propose a novel lossless privacy-preserving tree-boosting system known as SecureBoost in the setting of federated learning. SecureBoost first conducts entity alignment under a privacy-preserving protocol and then constructs boosting trees across multiple parties with a carefully designed encryption strategy. This federated learning system allows the learning process to be jointly conducted over multiple parties with common user samples but different feature sets, which corresponds to a vertically partitioned dataset. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about any private data provider. We show that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that require centralized data, and thus, it is highly scalable and practical for industrial applications such as credit risk analysis. To this end, we discuss information leakage during the protocol execution and propose ways to provably reduce it.

IJCAI Conference 2021 Conference Paper

Spline Positional Encoding for Learning 3D Implicit Signed Distance Fields

  • Peng-Shuai Wang
  • Yang Liu
  • Yu-Qi Yang
  • Xin Tong

Multilayer perceptrons (MLPs) have been successfully used to represent 3D shapes implicitly and compactly, by mapping 3D coordinates to the corresponding signed distance values or occupancy values. In this paper, we propose a novel positional encoding scheme, called Spline Positional Encoding, to map the input coordinates to a high dimensional space before passing them to MLPs, which helps recover 3D signed distance fields with fine-scale geometric details from unorganized 3D point clouds. We verified the superiority of our approach over other positional encoding schemes on tasks of 3D shape reconstruction and 3D shape space learning from input point clouds. The efficacy of our approach extended to image reconstruction is also demonstrated and evaluated.

TIST Journal 2021 Journal Article

StarFL: Hybrid Federated Learning Architecture for Smart Urban Computing

  • Anbu Huang
  • Yang Liu
  • Tianjian Chen
  • Yongkai Zhou
  • Quan Sun
  • Hongfeng Chai
  • Qiang Yang

From facial recognition to autonomous driving, Artificial Intelligence (AI) will transform the way we live and work over the next couple of decades. Existing AI approaches for urban computing suffer from various challenges, including dealing with the synchronization and processing of the vast amount of data generated from edge devices, as well as the privacy and security of individual users, including their bio-metrics, locations, and itineraries. Traditional centralized approaches require data in each organization to be uploaded to a central database, which may be prohibited by data protection acts, such as GDPR and CCPA. To decouple model training from the need to store the data in the cloud, a new training paradigm called Federated Learning (FL) is proposed. FL enables multiple devices to collaboratively learn a shared model while keeping the training data on devices locally, which can significantly mitigate privacy leakage risk. However, under urban computing scenarios, data are often communication-heavy, high-frequency, and asynchronous, posing new challenges to FL implementation. To handle these challenges, we propose a new hybrid federated learning architecture called StarFL. By combining Trusted Execution Environment (TEE), Secure Multi-Party Computation (MPC), and (Beidou) satellites, StarFL enables safe key distribution, encryption, and decryption, and provides a verification mechanism for each participant to ensure the security of the local data. In addition, StarFL can provide accurate timestamp matching to facilitate synchronization of multiple clients. All these improvements make StarFL more applicable to the security-sensitive scenarios for the next generation of urban computing.

NeurIPS Conference 2021 Conference Paper

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

  • Fuchao Wei
  • Chenglong Bao
  • Yang Liu

Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide usage in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning problems are not well explored. In this paper, by introducing damped projection and adaptive regularization to the classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when randomly choosing an iterate as the output. To further accelerate the convergence, we incorporate a variance reduction technique into the proposed SAM. We also propose a preconditioned mixing strategy for SAM which can empirically achieve faster convergence or better generalization ability. Finally, we apply the SAM method to train various neural networks including the vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet and LSTM. Experimental results on image classification and language modeling demonstrate the advantages of our method.
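For readers unfamiliar with the classical method being stochasticized here, a minimal deterministic Anderson mixing loop for x = g(x) looks as follows (a textbook sketch, not the paper's SAM variant): keep a short history of iterates and residuals, solve a small least-squares problem over residual differences, and extrapolate:

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=50, tol=1e-10):
    """Classical Anderson mixing for the fixed-point problem x = g(x)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    X, F = [], []                        # histories of iterates and residuals
    for _ in range(iters):
        gx = np.atleast_1d(g(x))
        f = gx - x                       # residual f(x) = g(x) - x
        if np.linalg.norm(f) < tol:
            break
        X.append(x)
        F.append(f)
        mk = min(m, len(F))
        if mk == 1:
            x = gx                       # plain fixed-point step
        else:
            # least-squares fit of the residual over recent residual differences
            dF = np.stack([F[-i] - F[-i - 1] for i in range(1, mk)], axis=1)
            dX = np.stack([X[-i] - X[-i - 1] for i in range(1, mk)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - (dX + dF) @ gamma   # mixed (extrapolated) update
    return x
```

On the scalar test problem x = cos(x), this converges to the fixed point near 0.7391 much faster than plain iteration.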

NeurIPS Conference 2021 Conference Paper

Synthetic Benchmarks for Scientific Research in Explainable Machine Learning

  • Yang Liu
  • Sujay Khandagale
  • Colin White
  • Willie Neiswanger

As machine learning models grow more complex and their applications become more high-stakes, tools for explaining model predictions have become increasingly important. This has spurred a flurry of research in model explainability and has given rise to feature attribution methods such as LIME and SHAP. Despite their widespread use, evaluating and comparing different feature attribution methods remains challenging: evaluations ideally require human studies, and empirical evaluation metrics are often data-intensive or computationally prohibitive on real-world datasets. In this work, we address this issue by releasing XAI-BENCH: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values that are needed to evaluate ground-truth Shapley values and other metrics. The synthetic datasets we release offer a wide variety of parameters that can be configured to simulate real-world data. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and across a variety of settings. The versatility and efficiency of our library will help researchers bring their explainability methods from development to deployment. Our code is available at https://github.com/abacusai/xai-bench.

AAMAS Conference 2021 Conference Paper

Temporal Watermarks for Deep Reinforcement Learning Models

  • Kangjie Chen
  • Shangwei Guo
  • Tianwei Zhang
  • Shuxin Li
  • Yang Liu

Watermarking has become a popular and attractive technique to protect the Intellectual Property (IP) of Deep Learning (DL) models. However, very few studies explore the possibility of watermarking Deep Reinforcement Learning (DRL) models. Common approaches in the DL context embed backdoors into the protected model and use special samples to verify the model ownership. These solutions are easy to detect, and can potentially affect the performance and behaviors of the target model. Such limitations make existing solutions less applicable to safety- and security-critical tasks and scenarios, where DRL has been widely used. In this work, we propose a novel watermarking scheme for DRL protection. Instead of using spatial watermarks as in DL models, we introduce temporal watermarks, which can reduce the potential impact and damage to the target model, while achieving ownership verification with high fidelity. Specifically, (1) we design a new damage metric to select sequential states for watermark generation; (2) we introduce a new reward function to efficiently alter the model’s behaviors for watermark embedding; (3) we propose to utilize a predefined probability density function of actions over the watermark states as the verification evidence. Our method is general and can be applied to various DRL tasks with either deterministic or stochastic reinforcement learning algorithms. Extensive experimental results show that it can effectively preserve the functionality of DRL models and exhibit significant robustness against common model modifications, e.g., fine-tuning and model compression.

IJCAI Conference 2021 Conference Paper

Understanding Structural Vulnerability in Graph Convolutional Networks

  • Liang Chen
  • Jintang Li
  • Qibiao Peng
  • Yang Liu
  • Zibin Zheng
  • Carl Yang

Recent studies have shown that Graph Convolutional Networks (GCNs) are vulnerable to adversarial attacks on the graph structure. Although multiple works have been proposed to improve their robustness against such structural adversarial attacks, the reasons for the success of the attacks remain unclear. In this work, we theoretically and empirically demonstrate that structural adversarial examples can be attributed to the non-robust aggregation scheme (i.e., the weighted mean) of GCNs. Specifically, our analysis takes advantage of the breakdown point which can quantitatively measure the robustness of aggregation schemes. The key insight is that weighted mean, as the basic design of GCNs, has a low breakdown point and its output can be dramatically changed by injecting a single edge. We show that adopting the aggregation scheme with a high breakdown point (e.g., median or trimmed mean) could significantly enhance the robustness of GCNs against structural attacks. Extensive experiments on four real-world datasets demonstrate that such a simple but effective method achieves the best robustness performance compared to state-of-the-art models.
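The breakdown-point argument is easy to demonstrate in isolation. In the toy aggregation below (illustrative only, not the paper's GCN code), a single adversarially injected neighbour with extreme features shifts the mean arbitrarily far, while the median barely moves:

```python
import numpy as np

def aggregate(features, how="mean"):
    """Aggregate a neighbourhood's feature rows. The mean has breakdown
    point 0 (one corrupted neighbour can move it arbitrarily); the median
    has breakdown point 1/2."""
    if how == "median":
        return np.median(features, axis=0)
    return np.mean(features, axis=0)

# nine benign neighbours with feature value 1, plus one injected neighbour
benign = np.ones((9, 4))
attacked = np.vstack([benign, 1000.0 * np.ones((1, 4))])
```

Here `aggregate(attacked, "mean")` jumps to about 100.9 per coordinate, while `aggregate(attacked, "median")` stays at 1.0, which is exactly the robustness gap the abstract exploits.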

NeurIPS Conference 2021 Conference Paper

Unintended Selection: Persistent Qualification Rate Disparities and Interventions

  • Reilly Raab
  • Yang Liu

Realistically---and equitably---modeling the dynamics of group-level disparities in machine learning remains an open problem. In particular, we desire models that do not suppose inherent differences between artificial groups of people---but rather endogenize disparities by appeal to unequal initial conditions of insular subpopulations. In this paper, agents each have a real-valued feature $X$ (e.g., credit score) informed by a ``true'' binary label $Y$ representing qualification (e.g., for a loan). Each agent alternately (1) receives a binary classification label $\hat{Y}$ (e.g., loan approval) from a Bayes-optimal machine learning classifier observing $X$ and (2) may update their qualification $Y$ by imitating successful strategies (e.g., seek a raise) within an isolated group $G$ of agents to which they belong. We consider the disparity of qualification rates $\Pr(Y=1)$ between different groups and how this disparity changes subject to a sequence of Bayes-optimal classifiers repeatedly retrained on the global population. We model the evolving qualification rates of each subpopulation (group) using the replicator equation, which derives from a class of imitation processes. We show that differences in qualification rates between subpopulations can persist indefinitely for a set of non-trivial equilibrium states due to uninformed classifier deployments, even when groups are identical in all aspects except initial qualification densities. We next simulate the effects of commonly proposed fairness interventions on this dynamical system along with a new feedback control mechanism capable of permanently eliminating group-level qualification rate disparities. We conclude by discussing the limitations of our model and findings and by outlining potential future work.
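The replicator dynamic at the heart of this model is compact enough to sketch. In the one-group toy below (a hedged illustration; the function names and the scalar `fitness_gap` abstraction are ours, not the paper's), the group's qualification rate q evolves as dq/dt = q(1 - q) * (fitness of qualified minus fitness of unqualified), where in the paper that gap would be induced by the deployed classifier's acceptance behavior:

```python
import numpy as np

def replicator_step(q, fitness_gap, dt=0.01):
    """One Euler step of the replicator equation for a group's
    qualification rate q: dq/dt = q * (1 - q) * fitness_gap."""
    return q + dt * q * (1.0 - q) * fitness_gap

def simulate(q0, gap_fn, steps=2000, dt=0.01):
    """Integrate the replicator dynamic; gap_fn(q) models how the
    deployed classifier rewards qualification at rate q."""
    q = q0
    for _ in range(steps):
        q = replicator_step(q, gap_fn(q), dt)
    return q
```

Note the fixed points at q = 0 and q = 1: a group starting at zero qualification never moves regardless of the incentive, which is the kind of persistent, initial-condition-driven equilibrium the abstract describes.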

AAAI Conference 2021 Conference Paper

Unsupervised 3D Learning for Shape Analysis via Multiresolution Instance Discrimination

  • Peng-Shuai Wang
  • Yu-Qi Yang
  • Qian-Fang Zou
  • Zhirong Wu
  • Yang Liu
  • Xin Tong

We propose an unsupervised method for learning a generic and efficient shape encoding network for different shape analysis tasks. Our key idea is to jointly encode and learn shape and point features from unlabeled 3D point clouds. For this purpose, we adapt HRNet to octree-based convolutional neural networks for jointly encoding shape and point features with fused multiresolution subnetworks and design a simple-yet-efficient Multiresolution Instance Discrimination (MID) loss for jointly learning the shape and point features. Our network takes a 3D point cloud as input and outputs both shape and point features. After training, our network is concatenated with simple task-specific back-ends and fine-tuned for different shape analysis tasks. We evaluate the efficacy and generality of our method with a set of shape analysis tasks, including shape classification, semantic shape segmentation, as well as shape registration tasks. With simple back-ends, our network demonstrates the best performance among all unsupervised methods and achieves competitive performance to supervised methods. For fine-grained shape segmentation on the PartNet dataset, our method even surpasses existing supervised methods by a large margin.

IJCAI Conference 2020 Conference Paper

A Multi-player Game for Studying Federated Learning Incentive Schemes

  • Kang Loon Ng
  • Zichen Chen
  • Zelei Liu
  • Han Yu
  • Yang Liu
  • Qiang Yang

Federated Learning (FL) enables participants to ``share'' their sensitive local data in a privacy preserving manner and collaboratively build machine learning models. In order to sustain long-term participation by high quality data owners (especially if they are businesses), FL systems need to provide suitable incentives. To design an effective incentive scheme, it is important to understand how FL participants respond under such schemes. This paper proposes FedGame, a multi-player game to study how FL participants make action selection decisions under different incentive schemes. It allows human players to role-play under various conditions. The decision-making processes can be analyzed and visualized to inform FL incentive mechanism design in the future.

IS Journal 2020 Journal Article

A Secure Federated Transfer Learning Framework

  • Yang Liu
  • Yan Kang
  • Chaoping Xing
  • Tianjian Chen
  • Qiang Yang

Machine learning relies on the availability of vast amounts of data for training. However, in reality, data are mostly scattered across different organizations and cannot be easily integrated due to many legal and practical constraints. To address this important challenge in the field of machine learning, we introduce a new technique and framework, known as federated transfer learning (FTL), to improve statistical modeling under a data federation. FTL allows knowledge to be shared without compromising user privacy and enables complementary knowledge to be transferred across domains in a data federation, thereby enabling a target-domain party to build flexible and effective models by leveraging rich labels from a source domain. This framework requires minimal modifications to the existing model structure and provides the same level of accuracy as non-privacy-preserving transfer learning. It is flexible and can be effectively adapted to various secure multiparty machine learning tasks.

IS Journal 2020 Journal Article

A Sustainable Incentive Scheme for Federated Learning

  • Han Yu
  • Zelei Liu
  • Yang Liu
  • Tianjian Chen
  • Mingshu Cong
  • Xi Weng
  • Dusit Niyato
  • Qiang Yang

In federated learning (FL), a federation distributedly trains a collective machine learning model by leveraging privacy preserving technologies. However, FL participants need to incur some cost for contributing to the FL models. The training and commercialization of the models will take time. Thus, there will be delays before the federation could pay back the participants. This temporary mismatch between contributions and rewards has not been accounted for by existing payoff-sharing schemes. To address this limitation, we propose the FL incentivizer (FLI). It dynamically divides a given budget in a context-aware manner among data owners in a federation by jointly maximizing the collective utility while minimizing the inequality among the data owners, in terms of the payoff received and the waiting time for receiving payoffs. Comparisons with five state-of-the-art payoff-sharing schemes show that FLI attracts high-quality data owners and achieves the highest expected revenue for a federation.

AAAI Conference 2020 Conference Paper

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

  • Yang Liu
  • Qi Liu
  • Hongke Zhao
  • Zhen Pan
  • Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategies execution. However, existing methods in QT face challenges such as representing noisy high-frequent financial data and finding the balance between exploration and exploitation of the trading agent with AI techniques. To address the challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies by an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies useful to balance between exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequent data. Experimental results demonstrate that our model can extract robust market features and be adaptive in different markets.

IJCAI Conference 2020 Conference Paper

Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

  • Huanle Xu
  • Yang Liu
  • Wing Cheong Lau
  • Rui Li

The problem of multi-armed bandit (MAB) with fairness constraint has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed round of pulls, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions, however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it outperforms the baselines substantially.

AAAI Conference 2020 Conference Paper

Continuous Multiagent Control Using Collective Behavior Entropy for Large-Scale Home Energy Management

  • Jianwen Sun
  • Yan Zheng
  • Jianye Hao
  • Zhaopeng Meng
  • Yang Liu

With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, an efficient Demand-Side Management (DSM) is urgent for energy savings and peak loads reduction. Traditional DSM works that focus on optimizing the energy activities of a single household cannot scale up to large-scale home energy management problems. Multi-agent Deep Reinforcement Learning (MA-DRL) shows a potential way to solve the problem of scalability, where modern homes interact together to reduce energy consumption while striking a balance between energy cost and peak loads reduction. However, it is difficult to solve such an environment with the non-stationarity, and existing MA-DRL approaches cannot effectively give incentives for expected group behavior. In this paper, we propose a collective MA-DRL algorithm with continuous action space to provide fine-grained control on a large-scale microgrid. To mitigate the non-stationarity of the microgrid environment, a novel predictive model is proposed to measure the collective market behavior. Besides, a collective behavior entropy is introduced to reduce the high peak loads incurred by the collective behaviors of all householders in the smart grid. Empirical results show that our approach significantly outperforms the state-of-the-art methods regarding power cost reduction and daily peak loads optimization.

IS Journal 2020 Journal Article

Crowd Intelligence: Conducting Asymmetric Impact-Performance Analysis Based on Online Reviews

  • Jian-Wu Bi
  • Yang Liu
  • Zhi-Ping Fan

Asymmetric impact-performance analysis (AIPA) is an effective technique for understanding customer satisfaction and formulating improvement strategies for products and services. Typically, AIPA is conducted based on data obtained from customer surveys, which are expensive in terms of time and money. As a new data source, online reviews have many advantages and are promising for conducting AIPA. To this end, this article proposes a method for conducting AIPA based on online reviews. To illustrate the feasibility and validity of the proposed method, a case study of AIPA for a five-star hotel in Singapore is given. The proposed method gives managers one more option for conducting AIPA at lower cost and in less time, since online reviews of products and services can be easily collected.

ICLR Conference 2020 Conference Paper

Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth

  • Igor Lovchinsky
  • Alon Daks
  • Israel Malkin
  • Pouya Samangouei
  • Ardavan Saeedi
  • Yang Liu
  • Swami Sankaranarayanan
  • Tomer Gafner

In most machine learning tasks unambiguous ground truth labels can easily be acquired. However, this luxury is often not afforded to many high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically exhibit very high levels of disagreement with one another. While prior works have focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification, our proposed metric can be evaluated in terms of simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used effectively to validate machine learning models using two real-world tasks from medical imaging. The discrepancy ratio metric reveals what conventional metrics do not: that our models not only vastly exceed the average human performance, but even exceed the performance of the best human experts in our datasets.

IS Journal 2020 Journal Article

Distributed Privacy-Preserving Iterative Summation Protocols

  • Yang Liu
  • Qingchen Liu
  • Xiong Zhang
  • Shuqi Qin
  • Xiaoping Lei

In this article, we study the problem of summation evaluation of secrets. The secrets are distributed over a network of nodes that form a ring graph. Privacy-preserving iterative protocols for computing the sum of the secrets are proposed, which are resilient against dynamic node join and leave situations. Theoretic bounds are derived regarding the utility and accuracy, and the proposed protocols are shown to comply with differential privacy requirements. Based on utility, accuracy, and privacy, we also provide guidance on appropriate selections of random noise parameters. Additionally, a few numerical examples that demonstrate their effectiveness and superiority are provided.
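
The paper's iterative protocols are not reproduced here, but the core idea of privacy-preserving summation on a ring can be conveyed with a classic additive-masking sketch (illustrative only; the paper's protocols additionally inject calibrated noise for differential privacy and handle node join/leave):

```python
import random

def ring_sum(secrets, mask_range=10**6):
    """Privacy-preserving summation on a ring of nodes via additive masking.

    The initiator (node 0) adds a large random mask to its secret before
    passing the running total around the ring, so no node ever sees another
    node's raw secret; the mask is removed when the total returns to node 0.
    """
    mask = random.randint(0, mask_range)
    running = secrets[0] + mask          # node 0 hides its own secret
    for s in secrets[1:]:                # each node folds in its secret
        running += s
    return running - mask                # node 0 removes the mask

print(ring_sum([3, 5, 7, 11]))  # 26
```

Because the mask cancels exactly, the result is the true sum; differential-privacy guarantees of the kind analyzed in the article instead come from noise that does not fully cancel, trading accuracy for privacy.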

IJCAI Conference 2020 Conference Paper

FakeSpotter: A Simple yet Robust Baseline for Spotting AI-Synthesized Fake Faces

  • Run Wang
  • Felix Juefei-Xu
  • Lei Ma
  • Xiaofei Xie
  • Yihao Huang
  • Jian Wang
  • Yang Liu

In recent years, generative adversarial networks (GANs) and their variants have achieved unprecedented success in image synthesis. They are widely adopted in synthesizing facial images, which brings potential security concerns as the fakes spread and fuel misinformation. However, robust detectors of these AI-synthesized fake faces are still in their infancy and are not ready to fully tackle this emerging challenge. In this work, we propose a novel approach, named FakeSpotter, based on monitoring neuron behaviors to spot AI-synthesized fake faces. Studies on neuron coverage and interactions have successfully shown that they can serve as testing criteria for deep learning systems, especially under the settings of being exposed to adversarial attacks. Here, we conjecture that monitoring neuron behavior can also serve as an asset in detecting fake faces, since layer-by-layer neuron activation patterns may capture more subtle features that are important for the fake detector. Experimental results on detecting four types of fake faces synthesized with the state-of-the-art GANs and evading four perturbation attacks show the effectiveness and robustness of our approach.

AAAI Conference 2020 Conference Paper

Generating Adversarial Examples for Holding Robustness of Source Code Processing Models

  • Huangzhao Zhang
  • Zhuo Li
  • Ge Li
  • Lei Ma
  • Yang Liu
  • Zhi Jin

Automated processing, analysis, and generation of source code are among the key activities in the software and system lifecycle. To this end, while deep learning (DL) exhibits a certain level of capability in handling these tasks, the current state-of-the-art DL models still suffer from non-robustness issues and can be easily fooled by adversarial attacks. Different from adversarial attacks on images, audio, and natural languages, the structured nature of programming languages brings new challenges. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique, named Metropolis-Hastings Modifier (MHM), which generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance achieved through adversarial training with MHM further confirm the usefulness of DL-based methods for future fully automated source code processing.

IJCAI Conference 2020 Conference Paper

Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning

  • Ruimin Shen
  • Yan Zheng
  • Jianye Hao
  • Zhaopeng Meng
  • Yingfeng Chen
  • Changjie Fan
  • Yang Liu

Generating diverse behaviors for game artificial intelligence (Game AI) has been long recognized as a challenging task in the game industry. Designing a Game AI with a satisfying behavioral characteristic (style) heavily depends on the domain knowledge and is hard to achieve manually. Deep reinforcement learning sheds light on advancing the automatic Game AI design. However, most of them focus on creating a superhuman Game AI, ignoring the importance of behavioral diversity in games. To bridge the gap, we introduce a new framework, named EMOGI, which can automatically generate desirable styles with almost no domain knowledge. More importantly, EMOGI succeeds in creating a range of diverse styles, providing behavior-diverse Game AIs. Evaluations on the Atari and real commercial games indicate that, compared to existing algorithms, EMOGI performs better in generating diverse behaviors and significantly improves the efficiency of Game AI design.

NeurIPS Conference 2020 Conference Paper

How do fair decisions fare in long-term qualification?

  • Xueru Zhang
  • Ruibo Tu
  • Yang Liu
  • Mingyan Liu
  • Hedvig Kjellstrom
  • Kun Zhang
  • Cheng Zhang

Although many fairness criteria have been proposed for decision making, their long-term impact on the well-being of a population remains unclear. In this work, we study the dynamics of population qualification and algorithmic decisions under a partially observed Markov decision problem setting. By characterizing the equilibrium of such dynamics, we analyze the long-term impact of static fairness constraints on the equality and improvement of group well-being. Our results show that static fairness constraints can either promote equality or exacerbate disparity depending on the driving factor of qualification transitions and the effect of sensitive attributes on feature distributions. We also consider possible interventions that can effectively improve group qualification or promote equality of group qualification. Our theoretical results and experiments on static real-world datasets with simulated dynamics show that our framework can be used to facilitate social science studies.

AIJ Journal 2020 Journal Article

How do fairness definitions fare? Testing public attitudes towards three algorithmic definitions of fairness in loan allocations

  • Nripsuta Ani Saxena
  • Karen Huang
  • Evan DeFilippis
  • Goran Radanovic
  • David C. Parkes
  • Yang Liu

What is the best way to define algorithmic fairness? While many definitions of fairness have been proposed in the computer science literature, there is no clear agreement over a particular definition. In this work, we investigate ordinary people's perceptions of three of these fairness definitions. Across three online experiments, we test which definitions people perceive to be the fairest in the context of loan decisions, and whether fairness perceptions change with the addition of sensitive information (i.e., race or gender of the loan applicants). Overall, one definition (calibrated fairness) tends to be more preferred than the others, and the results also provide support for the principle of affirmative action.

AAAI Conference 2020 Conference Paper

Image Formation Model Guided Deep Image Super-Resolution

  • Jinshan Pan
  • Yang Liu
  • Deqing Sun
  • Jimmy Ren
  • Ming-Ming Cheng
  • Jian Yang
  • Jinhui Tang

We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.
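
The pixel-substitution constraint described above can be sketched in a few lines (a simplification with illustrative names: grayscale image, a single known blur kernel, edge padding, and integer scale factor; the paper applies this inside a cascaded deep network):

```python
import numpy as np

def pixel_substitution(hr_estimate, lr_image, blur_kernel, scale):
    """One pixel-substitution step: blur the intermediate high-resolution
    estimate with the known kernel, then overwrite the pixels at the
    un-decimated (sampled) positions with the corresponding low-resolution
    pixel values, so the output satisfies the assumed image formation
    model exactly at those positions.
    """
    kh, kw = blur_kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(hr_estimate, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(hr_estimate, dtype=float)
    H, W = hr_estimate.shape
    for i in range(H):                       # naive 2-D convolution
        for j in range(W):
            blurred[i, j] = np.sum(padded[i:i + kh, j:j + kw] * blur_kernel)
    blurred[::scale, ::scale] = lr_image     # substitute sampled positions
    return blurred
```

The substituted output would then be fed back into the same network for refinement, as the abstract describes.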

IS Journal 2020 Journal Article

Introduction to the Special Issue on Federated Machine Learning

  • Yang Liu
  • Han Yu
  • Qiang Yang

The articles in this special section focus on federated machine learning, an emerging research paradigm focusing on solving data-silos challenges in real-world industrial applications. It is a broad discipline that touches many topics, including distributed and collaborative learning, privacy-preserving machine learning, edge computing, and data valuation, etc. Its interdisciplinary nature calls for collaborative efforts from a variety of fields to establish new protocols, frameworks and systems to address unique challenges, and open problems. These articles highlight a selection of high-quality and original works in this new area, including accepted papers to the 1st International Workshop on Federated Machine Learning in conjunction with IJCAI 2019.

NeurIPS Conference 2020 Conference Paper

Learning Strategy-Aware Linear Classifiers

  • Yiling Chen
  • Yang Liu
  • Chara Podimata

We address the question of repeatedly learning linear classifiers against agents who are strategically trying to game the deployed classifiers, and we use the Stackelberg regret to measure the performance of our algorithms. First, we show that Stackelberg and external regret for the problem of strategic classification are strongly incompatible: i.e., there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret and vice versa. Second, we present a strategy-aware algorithm for minimizing the Stackelberg regret for which we prove nearly matching upper and lower regret bounds. Finally, we provide simulations to complement our theoretical analysis. Our results advance the growing literature of learning from revealed preferences, which has so far focused on "smoother" assumptions from the perspective of the learner and the agents respectively.

IJCAI Conference 2020 Conference Paper

Modeling Voting for System Combination in Machine Translation

  • Xuancheng Huang
  • Jiacheng Zhang
  • Zhixing Tan
  • Derek F. Wong
  • Huanbo Luan
  • Jingfang Xu
  • Maosong Sun
  • Yang Liu

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.

AAAI Conference 2020 Conference Paper

Multi-Zone Unit for Recurrent Neural Networks

  • Fandong Meng
  • Jinchao Zhang
  • Yang Liu
  • Jie Zhou

Recurrent neural networks (RNNs) have been widely used to deal with sequence learning problems. The input-dependent transition function, which folds new observations into hidden states to sequentially construct fixed-length representations of arbitrary-length sequences, plays a critical role in RNNs. Based on single space composition, transition functions in existing RNNs often have difficulty in capturing complicated long-range dependencies. In this paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to design a transition function that is capable of modeling multiple space composition. The MZU consists of three components: zone generation, zone composition, and zone aggregation. Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU.

ICLR Conference 2020 Conference Paper

On Identifiability in Transformers

  • Gino Brunner
  • Yang Liu
  • Damian Pascual
  • Oliver Richter
  • Massimiliano Ciaramita
  • Roger Wattenhofer

In this paper we delve deep in the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain to a large degree their identity across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.

NeurIPS Conference 2020 Conference Paper

Optimal Query Complexity of Secure Stochastic Convex Optimization

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We study the secure stochastic convex optimization problem: a learner aims to learn the optimal point of a convex function through sequentially querying a (stochastic) gradient oracle; in the meantime, there exists an adversary who aims to free-ride and infer the learning outcome of the learner from observing the learner's queries. The adversary observes only the points of the queries but not the feedback from the oracle. The goal of the learner is to optimize the accuracy, i.e., obtaining an accurate estimate of the optimal point, while securing her privacy, i.e., making it difficult for the adversary to infer the optimal point. We formally quantify this tradeoff between the learner's accuracy and privacy and characterize the lower and upper bounds on the learner's query complexity as a function of desired levels of accuracy and privacy. For the analysis of lower bounds, we provide a general template based on information-theoretic analysis and then tailor the template to several families of problems, including stochastic convex optimization and (noisy) binary search. We also present a generic secure learning protocol that achieves the matching upper bound up to logarithmic factors.

AAAI Conference 2020 Conference Paper

Reinforcement Learning with Perturbed Rewards

  • Jingkang Wang
  • Yang Liu
  • Bo Li

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and firstly addresses the biased noisy reward setting without any assumptions on the true distribution (e.g., zero-mean Gaussian noise as made in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30% respectively.
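
For intuition, the unbiased-surrogate construction in the binary-reward case can be sketched as follows (illustrative variable names; the paper's general construction covers larger confusion matrices and estimates the flip rates rather than assuming them known):

```python
def surrogate_rewards(r_minus, r_plus, e_minus, e_plus):
    """Surrogate rewards for a binary reward channel perturbed by a known
    2x2 confusion matrix: the true reward r_plus is flipped to r_minus with
    probability e_plus, and r_minus to r_plus with probability e_minus.
    Returns (rhat_minus, rhat_plus) such that the surrogate of the observed
    reward is an unbiased estimate of the true reward."""
    denom = 1.0 - e_minus - e_plus       # assumes e_minus + e_plus < 1
    rhat_plus = ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    rhat_minus = ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom
    return rhat_minus, rhat_plus

# Unbiasedness check: the expectation of the surrogate under the noise
# model recovers the true reward exactly.
rm, rp = surrogate_rewards(r_minus=0.0, r_plus=1.0, e_minus=0.3, e_plus=0.1)
expected_when_true_plus = (1 - 0.1) * rp + 0.1 * rm    # -> 1.0
expected_when_true_minus = (1 - 0.3) * rm + 0.3 * rp   # -> 0.0
```

An RL agent trained on the surrogate values therefore optimizes the same expected return as one trained on clean rewards, which is what makes the framework plug into existing RL/DRL algorithms.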

AAAI Conference 2020 Conference Paper

Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning

  • Jianwen Sun
  • Tianwei Zhang
  • Xiaofei Xie
  • Lei Ma
  • Yan Zheng
  • Kangjie Chen
  • Yang Liu

Adversarial attacks against conventional Deep Learning (DL) systems and algorithms have been widely studied, and various defenses were proposed. However, the possibility and feasibility of such attacks against Deep Reinforcement Learning (DRL) are less explored. As DRL has achieved great success in various complex tasks, designing effective adversarial attacks is an indispensable prerequisite towards building robust DRL algorithms. In this paper, we introduce two novel adversarial attack techniques to stealthily and efficiently attack the DRL agents. These two techniques enable an adversary to inject adversarial samples in a minimal set of critical moments while causing the most severe damage to the agent. The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one. The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode. Experimental results demonstrate the effectiveness of our techniques. Specifically, to successfully attack the DRL agent, our critical point technique only requires 1 (TORCS) or 2 (Atari Pong and Breakout) steps, and the antagonist technique needs fewer than 5 steps (4 Mujoco tasks), which are significant improvements over state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Structure-Aware Feature Fusion for Unsupervised Domain Adaptation

  • Qingchao Chen
  • Yang Liu

Unsupervised domain adaptation (UDA) aims to learn and transfer generalized features from a labelled source domain to a target domain without any annotations. Existing methods align only the high-level representation, without exploiting the complex multi-class structure and local spatial structure. This is problematic as 1) the model is prone to negative transfer when the features from different classes are misaligned; 2) missing the local spatial structure poses a major obstacle to performing fine-grained feature alignment. In this paper, we integrate the valuable information conveyed in the classifier prediction and local feature maps into the global feature representation and then perform a single mini-max game to make it domain invariant. In this way, the domain-invariant feature not only describes the holistic representation of the original image but also preserves mode-structure and fine-grained spatial structural information. The feature integration is achieved by estimating and maximizing the mutual information (MI) among the global feature, local feature and classifier prediction simultaneously. As the MI is hard to measure directly in high-dimension spaces, we adopt a new objective function that implicitly maximizes the MI via an effective sampling strategy and a discriminator design. Our STructure-Aware Feature Fusion (STAFF) network achieves the state-of-the-art performances in various UDA datasets.

NeurIPS Conference 2020 Conference Paper

Watch out! Motion is Blurring the Vision of Your Deep Neural Networks

  • Qing Guo
  • Felix Juefei-Xu
  • Xiaofei Xie
  • Lei Ma
  • Jian Wang
  • Bing Yu
  • Wei Feng
  • Yang Liu

The state-of-the-art deep neural networks (DNNs) are vulnerable against adversarial examples with additive random-like noise perturbations. While such examples are hardly found in the physical world, the image blurring effect caused by object motion, on the other hand, commonly occurs in practice, making its study greatly important, especially for widely adopted real-time image processing tasks (e.g., object detection, tracking). In this paper, we initiate the first step to comprehensively investigate the potential hazards of the blur effect for DNNs caused by object motion. We propose a novel adversarial attack method that can generate visually natural motion-blurred adversarial examples, named motion-based adversarial blur attack (ABBA). To this end, we first formulate the kernel-prediction-based attack, where an input image is convolved with kernels in a pixel-wise way, and the misclassification capability is achieved by tuning the kernel weights. To generate visually more natural and plausible examples, we further propose the saliency-regularized adversarial kernel prediction, where the salient region serves as a moving object, and the predicted kernel is regularized to achieve naturally visual effects. Besides, the attack is further enhanced by adaptively tuning the translations of object and background. A comprehensive evaluation on the NeurIPS'17 adversarial competition dataset demonstrates the effectiveness of ABBA by considering various kernel sizes, translations, and regions. The in-depth study further confirms that our method shows a more effective penetrating capability against the state-of-the-art GAN-based deblurring mechanisms compared with other blurring methods. We release the code at https://github.com/tsingqguo/ABBA.
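
The pixel-wise kernel convolution at the heart of the kernel-prediction formulation can be sketched as follows (a simplified, grayscale illustration with assumed names; in the attack itself the per-pixel kernel weights would be optimized to induce misclassification):

```python
import numpy as np

def pixelwise_blur(image, kernels):
    """Apply a distinct blur kernel at every pixel.

    image:   (H, W) array.
    kernels: (H, W, k, k) array, one k-by-k kernel per pixel; a uniform
             line-shaped kernel approximates linear motion blur, while a
             delta kernel leaves the pixel unchanged.
    """
    H, W, k, _ = kernels.shape
    p = k // 2
    padded = np.pad(image, p, mode="edge")
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]     # local neighborhood
            out[i, j] = np.sum(patch * kernels[i, j])
    return out
```

Because each pixel gets its own kernel, the formulation can blur a salient object differently from the background, which is what the saliency-regularized variant exploits.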

AAAI Conference 2019 Conference Paper

A Multi-Agent Communication Framework for Question-Worthy Phrase Extraction and Question Generation

  • Siyuan Wang
  • Zhongyu Wei
  • Zhihao Fan
  • Yang Liu
  • Xuanjing Huang

Question generation aims to produce questions automatically given a piece of text as input. Existing research follows a sequence-to-sequence fashion that constructs a single question based on the input. Considering that each question usually focuses on a specific fragment of the input, especially in the scenario of reading comprehension, it is reasonable to identify the corresponding focus before constructing the question. In this paper, we propose to identify question-worthy phrases first and generate questions with the assistance of these phrases. We introduce a multi-agent communication framework, taking phrase extraction and question generation as two agents, and learn these two tasks simultaneously via a message-passing mechanism. Experimental results show the effectiveness of our framework: we can extract question-worthy phrases, which improve the performance of question generation. Besides, our system is able to extract more than one question-worthy phrase and generate multiple questions accordingly.

AAAI Conference 2019 Conference Paper

Bayesian Fairness

  • Christos Dimitrakakis
  • Yang Liu
  • David C. Parkes
  • Goran Radanovic

We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairness introduced in (Kleinberg, Mullainathan, and Raghavan 2016), we show how a Bayesian perspective can lead to well-performing and fair decision rules even under high uncertainty.

AAAI Conference 2019 Conference Paper

Dependency Grammar Induction with a Neural Variational Transition-Based Parser

  • Bowen Li
  • Jianpeng Cheng
  • Yang Liu
  • Frank Keller

Dependency grammar induction is the task of learning dependency syntax without annotated training data. Traditional graph-based models with global inference achieve state-of-the-art results on this task, but they require O(n³) run time. Transition-based models enable faster inference with O(n) time complexity, but their performance still lags behind. In this work, we propose a neural transition-based parser for dependency grammar induction, whose inference procedure utilizes rich neural features with O(n) time complexity. We train the parser with an integration of variational inference, posterior regularization and variance reduction techniques. The resulting framework outperforms previous unsupervised transition-based dependency parsers and achieves performance comparable to graph-based models, both on the English Penn Treebank and on the Universal Dependency Treebank. In an empirical comparison, we show that our approach substantially increases parsing speed over graph-based models.

NeurIPS Conference 2019 Conference Paper

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

  • Yaqin Zhou
  • Shangqing Liu
  • Jingkai Siow
  • Xiaoning Du
  • Yang Liu

Vulnerability identification is crucial to protecting software systems from cyber attacks. It is especially important to localize the vulnerable functions among the source code to facilitate the fix. However, it is a challenging and tedious process, and it also requires specialized security expertise. Inspired by the work on manually defined patterns of vulnerabilities from various code representation graphs and recent advances in graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification through learning on a rich set of code semantic representations. It includes a novel Conv module to efficiently extract useful features from the learned rich node representations for graph-level classification. The model is trained over manually labeled datasets built on 4 diversified large-scale open-source C projects that incorporate the high complexity and variety of real source code instead of the synthetic code used in previous works. The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with on average 10.51% higher accuracy and 8.68% higher F1 score, of which the Conv module contributes an average gain of 4.66% in accuracy and 6.37% in F1.

IJCAI Conference 2019 Conference Paper

DiffChaser: Detecting Disagreements for Deep Neural Networks

  • Xiaofei Xie
  • Lei Ma
  • Haijun Wang
  • Yuekang Li
  • Yang Liu
  • Xiaohong Li

Platform migration and customization have become an indispensable part of the deep neural network (DNN) development lifecycle. A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to a target device (e.g., a mobile device). A test set that effectively uncovers the disagreements between a DNN and its optimized variant provides useful feedback to debug and further enhance the optimization procedure. However, the minor inconsistencies between a DNN and its optimized version are often hard to detect and easily bypass the original test set. This paper proposes DiffChaser, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We demonstrate 1) its effectiveness by comparing with the state-of-the-art techniques, and 2) its usefulness in real-world DNN product deployment involving quantization and optimization.
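The notion of a disagreement between a model and its optimized variant can be illustrated with a toy black-box search. This sketch uses plain random perturbation of a seed input, whereas DiffChaser itself uses a guided genetic search; all names here are hypothetical:

```python
import numpy as np

def find_disagreement(model_a, model_b, seed_input, n_trials=1000, seed=0):
    """Black-box search for an input on which two versions of a model
    disagree (untargeted): randomly perturb a seed input and compare
    top-1 predictions. Returns a disagreeing input, or None."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x = seed_input + rng.normal(0.0, 0.1, size=seed_input.shape)
        if model_a(x).argmax() != model_b(x).argmax():
            return x  # the two variants classify this input differently
    return None
```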

AAAI Conference 2019 Short Paper

EWGAN: Entropy-Based Wasserstein GAN for Imbalanced Learning

  • Jinfu Ren
  • Yang Liu
  • Jiming Liu

In this paper, we propose a novel oversampling strategy dubbed Entropy-based Wasserstein Generative Adversarial Network (EWGAN) to generate data samples for minority classes in imbalanced learning. First, we construct an entropy-weighted label vector for each class to characterize the data imbalance in different classes. Then we concatenate this entropy-weighted label vector with the original feature vector of each data sample, and feed it into the WGAN model to train the generator. After the generator is trained, we concatenate the entropy-weighted label vector with random noise feature vectors, and feed them into the generator to generate data samples for minority classes. Experimental results on two benchmark datasets show that the samples generated by the proposed oversampling strategy can help to improve the classification performance when the data are highly imbalanced. Furthermore, the proposed strategy outperforms other state-of-the-art oversampling algorithms in terms of the classification accuracy.
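The first two steps (building entropy-weighted label vectors, then concatenating them with features) might look as follows. The abstract does not spell out the weighting formula, so the per-class -p·log p weighting below is an illustrative assumption:

```python
import numpy as np

def entropy_weighted_labels(y, n_classes):
    """One entropy-weighted label vector per class: a one-hot vector
    scaled by the class's contribution -p_c * log(p_c) to the label
    distribution's entropy (illustrative weighting, not necessarily
    the paper's exact formula)."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    w = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return np.eye(n_classes) * w  # row c is w_c * one_hot(c)

def augment(X, y, n_classes):
    """Concatenate each sample's features with its class's
    entropy-weighted label vector before feeding the GAN."""
    labels = entropy_weighted_labels(y, n_classes)
    return np.hstack([X, labels[y]])
```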

AIJ Journal 2019 Journal Article

Exploiting reverse target-side contexts for neural machine translation via asynchronous bidirectional decoding

  • Jinsong Su
  • Xiangwen Zhang
  • Qian Lin
  • Yue Qin
  • Junfeng Yao
  • Yang Liu

Based on a unified encoder-decoder framework with attentional mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream in the community of machine translation. Generally, NMT decoders produce translation in a left-to-right way. As a result, only left-to-right target-side contexts from the generated translations are exploited, while the right-to-left target-side contexts are completely unexploited for translation. In this paper, we extend the conventional attentional encoder-decoder NMT framework by introducing a backward decoder, in order to explore asynchronous bidirectional decoding for NMT. In the first step after encoding, our backward decoder learns to generate the target-side hidden states in a right-to-left manner. Next, in each timestep of translation prediction, our forward decoder concurrently considers both the source-side and the reverse target-side hidden states via two attention models. Compared with previous models, this architecture enables our model to fully exploit contexts from both the source side and the target side, which improves translation quality. We conducted experiments on NIST Chinese-English, WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model achieves significant improvements over the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU and TER points, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU and TER points, respectively. We released our code at https://github.com/DeepLearnXMU/ABD-NMT.

IJCAI Conference 2019 Conference Paper

Fair and Explainable Dynamic Engagement of Crowd Workers

  • Han Yu
  • Yang Liu
  • Xiguang Wei
  • Chuyu Zheng
  • Tianjian Chen
  • Qiang Yang
  • Xiong Peng

Years of rural-urban migration have resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient, explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explainable task recommendations to workers' personal work-management mobile apps, which are becoming popular, with the aim of addressing the above societal challenge.

TIST Journal 2019 Journal Article

Federated Machine Learning

  • Qiang Yang
  • Yang Liu
  • Tianjian Chen
  • Yongxin Tong

Today’s artificial intelligence still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated-learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning. We provide definitions, architectures, and applications for the federated-learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allowing knowledge to be shared without compromising user privacy.
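For the horizontal setting, the canonical aggregation step is federated averaging: clients train locally, only model weights leave the device, and the server combines them weighted by local dataset size. A generic FedAvg-style sketch, not a specific system from this survey:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One aggregation round of horizontal federated averaging:
    combine locally trained weight vectors, weighted by each
    client's dataset size. Raw data never leaves the clients."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # size-proportional mixing weights
    return sum(c * w for c, w in zip(coeffs, client_weights))
```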

IJCAI Conference 2019 Conference Paper

Graph and Autoencoder Based Feature Extraction for Zero-shot Learning

  • Yang Liu
  • Deyan Xie
  • Quanxue Gao
  • Jungong Han
  • Shujian Wang
  • Xinbo Gao

Zero-shot learning (ZSL) aims to build models that recognize novel visual categories that have no associated labelled training samples. The basic framework is to transfer knowledge from seen classes to unseen classes by learning a visual-semantic embedding. However, most approaches do not preserve the underlying sub-manifold of samples in the embedding space. In addition, whether the mapping can precisely reconstruct the original visual features has not been investigated in depth. To solve these problems, we formulate a novel framework named Graph and Autoencoder Based Feature Extraction (GAFE) to seek a low-rank mapping that preserves the sub-manifold of samples. Taking the encoder-decoder paradigm, the encoder part learns a mapping from the visual features to the semantic space, while the decoder part reconstructs the original features with the learned mapping. In addition, a graph is constructed to guarantee that the learned mapping preserves the local intrinsic structure of the data. To this end, an L2,1-norm sparsity constraint is imposed on the mapping to identify features relevant to the target domain. Extensive experiments on five attribute datasets demonstrate the effectiveness of the proposed model.

IJCAI Conference 2019 Conference Paper

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness

  • NhatHai Phan
  • Minh N. Vu
  • Yang Liu
  • Ruoming Jin
  • Dejing Dou
  • Xintao Wu
  • My T. Thai

In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint on the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound on the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
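For reference, the traditional Gaussian Mechanism that HGM relaxes calibrates its noise as follows; this is the standard (ε, δ)-differential-privacy result, and the ε ∈ (0, 1] constraint is exactly what the paper lifts:

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Classical Gaussian Mechanism: for epsilon in (0, 1], adding
    N(0, sigma^2) noise with
        sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    gives (epsilon, delta)-differential privacy."""
    assert 0 < epsilon <= 1, "classical bound only holds for epsilon in (0, 1]"
    if rng is None:
        rng = np.random.default_rng()
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))
```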

AAAI Conference 2019 Conference Paper

Implanting Rational Knowledge into Distributed Representation at Morpheme Level

  • Zi Lin
  • Yang Liu

Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings independent from the corpus, while such information plays an important role in expressing the exact meanings of words for parataxis languages like Chinese. In this paper, after constructing the Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting the structured rational knowledge into distributed representation at morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template to create the instances as pseudo-sentences merely from the pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical information and tackle the data sparseness problem, the instance proliferation technique is applied based on similarity to expand the collection of pseudo-sentences. The distributed representation for morphemes can then be trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge.

EAAI Journal 2019 Journal Article

Improving the effectiveness of keyword search in databases using query logs

  • Ziqiang Yu
  • Ajith Abraham
  • Xiaohui Yu
  • Yang Liu
  • Jing Zhou
  • Kun Ma

Using query logs to enhance user experience has been extensively studied in the Web IR literature. However, in the area of keyword search on structured data (relational databases in particular), most existing works have focused on improving search result quality via designing better scoring functions, without giving explicit consideration to query logs. However, query logs can reflect the user preferences, so our work taps into the wealth of information contained in query logs and aims to enhance the search effectiveness by explicitly taking into account the log information when ranking the query results. Different from existing approaches only relying on a schema graph or a data graph, our work designs a comprehensive solution based on both the schema graph and the data graph for discovering top-k results with two stages. First, we identify top-k candidate networks with a query-log-aware ranking strategy by employing the largest frequent subtrees mined from query logs. Since a candidate network usually corresponds to multiple joined tuple trees, we further rank these joined tuple trees with the PageRank principle based on the data graph in the second stage. Finally, user studies on a real dataset validate the effectiveness of the proposed ranking strategy.

IJCAI Conference 2019 Conference Paper

Matching User with Item Set: Collaborative Bundle Recommendation with Deep Attention Network

  • Liang Chen
  • Yang Liu
  • Xiangnan He
  • Lianli Gao
  • Zibin Zheng

Most recommendation research has concentrated on recommending single items to users, such as the considerable work on collaborative filtering that models the interaction between a user and an item. However, in many real-world scenarios, the platform needs to show users a set of items, e.g., a marketing strategy that offers multiple items for sale as one bundle. In this work, we consider recommending a set of items to a user, i.e., the Bundle Recommendation task, which concerns modeling the interaction between a user and a set of items. We contribute a neural network solution named DAM, short for Deep Attentive Multi-Task model, which features two special designs: 1) we design a factorized attention network to aggregate the item embeddings in a bundle to obtain the bundle's representation; 2) we jointly model user-bundle interactions and user-item interactions in a multi-task manner to alleviate the scarcity of user-bundle interactions. Extensive experiments on a real-world dataset show that DAM outperforms the state-of-the-art solution, verifying the effectiveness of our attention design and of multi-task learning in DAM.
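In its simplest form, the first design (aggregating a bundle's item embeddings with attention) reduces to softmax-weighted pooling. The single-vector scorer below is a simplification of the paper's factorized attention, and the names are ours:

```python
import numpy as np

def attention_pool(item_embs, w, temperature=1.0):
    """Aggregate a bundle's item embeddings (n_items, dim) into one
    bundle vector: score each item against a learned vector `w`,
    softmax the scores, and take the weighted sum."""
    scores = item_embs @ w / temperature
    scores -= scores.max()          # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()            # attention weights sum to 1
    return alpha @ item_embs        # (dim,) bundle representation
```

With a zero scoring vector the weights are uniform and the pooling degenerates to a plain mean, which makes the role of the learned scorer easy to see.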

IJCAI Conference 2019 Conference Paper

Multi-Agent Visualization for Explaining Federated Learning

  • Xiguang Wei
  • Quan Li
  • Yang Liu
  • Han Yu
  • Tianjian Chen
  • Qiang Yang

As an alternative, decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, which obscures understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what Federated Learning is and how it supports multi-agent coordination. To be specific, it allows users to participate in Federated Learning-empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users, helping them gain a deeper understanding of the technology.

AAAI Conference 2019 Conference Paper

Randomized Wagering Mechanisms

  • Yiling Chen
  • Yang Liu
  • Juntao Wang

Wagering mechanisms are one-shot betting mechanisms that elicit agents’ predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown the incompatibility of some desirable theoretical properties. In particular, Pareto optimality (no profitable side bet before allocation) cannot be achieved together with weak incentive compatibility, weak budget balance and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) a simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) another family of randomized wagering mechanisms, named surrogate wagering mechanisms, which are robust to noisy ground truth. Surrogate wagering mechanisms are inspired by an idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad set of randomized wagering mechanisms satisfy all desirable theoretical properties.

AAAI Conference 2019 Conference Paper

Recognizing Unseen Attribute-Object Pair with Generative Model

  • Zhixiong Nan
  • Yang Liu
  • Nanning Zheng
  • Song-Chun Zhu

In this paper, we study the problem of recognizing attribute-object pairs that do not appear in the training dataset, which is called unseen attribute-object pair recognition. Existing methods mainly learn a discriminative classifier or compose multiple classifiers to tackle this problem, and they exhibit poor performance for unseen pairs. The key reasons for this failure are that 1) they have not learned an intrinsic attribute-object representation, and 2) the attribute and object are processed either separately or equally, so that the inner relation between the attribute and object has not been explored. To explore the inner relation of attribute and object as well as the intrinsic attribute-object representation, we propose a generative model with an encoder-decoder mechanism that bridges visual and linguistic information in a unified end-to-end network. The encoder-decoder mechanism presents impressive potential to find an intrinsic attribute-object feature representation. In addition, combining visual and linguistic features in a unified model makes it possible to mine the relation between attribute and object. We conducted extensive experiments to compare our method with several state-of-the-art methods on two challenging datasets. The results show that our method outperforms all other methods.

YNIMG Journal 2019 Journal Article

Reduction of cerebral blood flow in community-based adults with subclinical cerebrovascular atherosclerosis: A 3.0T magnetic resonance imaging study

  • Hualu Han
  • Runhua Zhang
  • Gaifen Liu
  • Huiyu Qiao
  • Zhensen Chen
  • Yang Liu
  • Xiaoyi Chen
  • Dongye Li

Reduction in cerebral blood flow (CBF), one of the major metrics for cerebral perfusion, is associated with many brain disorders. Therefore, early characterization of CBF prior to the occurrence of symptoms is essential for the prevention of cerebral ischemic events. We hypothesized that large artery atherosclerosis might be a potential indicator of decline in cerebral perfusion. The aim of this study was to investigate the relationship between large artery atherosclerosis and CBF in asymptomatic adults. A total of 134 asymptomatic subjects (mean age, 56.2 ± 12.8 years; 54 males) were recruited and underwent magnetic resonance (MR) imaging of the brain and the intracranial and extracranial carotid arteries. Presence or absence of cerebrovascular atherosclerosis was determined on MR vessel wall images. CBF was measured with pseudo-continuous arterial spin labeling (pCASL) imaging. The CBF values in the internal carotid artery (ICA) (37.2 ± 5.8 vs. 39.0 ± 4.9 ml/100 g/min, P = 0.049) and vertebrobasilar artery (VA-BA) territories (42.0 ± 6.8 vs. 44.8 ± 7.0 ml/100 g/min, P = 0.023) were significantly reduced in subjects with cerebrovascular plaque compared to those without. Presence of cerebrovascular plaque was significantly associated with CBF of the VA-BA territory both before (odds ratio, 2.89; 95% confidence interval, 1.37–6.08; P = 0.005) and after adjusting for confounding factors including age, gender, body mass index, diabetes, systolic blood pressure, hyperlipidemia and history of cardiovascular disease (odds ratio, 2.76; 95% confidence interval, 1.18–6.46; P = 0.019). In conclusion, the presence of cerebrovascular atherosclerosis is independently associated with reduction in CBF measured by pCASL in asymptomatic adults, suggesting that cerebrovascular large artery atherosclerosis might be an effective indicator of impairment of cerebral microcirculation hemodynamics.

AAMAS Conference 2019 Conference Paper

Self-Improving Generative Adversarial Reinforcement Learning

  • Yang Liu
  • Yifeng Zeng
  • Yingke Chen
  • Jing Tang
  • Yinghui Pan

The lack of data efficiency and stability is one of the main challenges in end-to-end model-free reinforcement learning (RL) methods. Recent research addresses the problem by resorting to supervised learning methods that utilize human expert demonstrations, e.g., imitation learning. In this paper we present a novel framework which builds a self-improving process upon a policy improvement operator, used as a black box so that it has multiple implementation options for various applications. An agent is trained to iteratively imitate behaviors that are generated by the operator. Hence the agent can learn by itself without domain knowledge from humans. We employ generative adversarial networks (GAN) to implement the imitation module in the new framework. We evaluate the framework's performance over multiple application domains and provide comparative results in support.

IJCAI Conference 2019 Conference Paper

Worst-Case Discriminative Feature Selection

  • Shuangli Liao
  • Quanxue Gao
  • Feiping Nie
  • Yang Liu
  • Xiangdong Zhang

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods based on discriminative criteria that consider the overall (or average) separation of the data, WDFS adopts a new perspective, the worst-case view, which arguably is more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum between-class variance over all class pairs to the maximum within-class variance, and thus it duly considers the separation of all classes. In addition, we take a greedy strategy, finding one feature at a time, which is very easy to implement. Moreover, we also utilize the correlation between features to help reduce redundancy and extend WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithm, we conduct classification experiments on many real data sets. In the experiments, we calculate the correlation coefficients using either the original features or the score vectors of features over all class pairs, and analyze the experimental results in these two ways. Experimental results demonstrate the effectiveness of WDFS and UWDFS.
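The criterion and the one-feature-at-a-time strategy can be sketched as follows. Scoring features individually, as the greedy loop below does, is a simplified, hypothetical reading of the paper's approach, and the function names are ours:

```python
import numpy as np

def wdfs_score(x, y):
    """Worst-case discriminative score of a single feature: minimum
    squared between-class mean difference over all class pairs,
    divided by the maximum within-class variance. Higher is better."""
    classes = np.unique(y)
    means = np.array([x[y == c].mean() for c in classes])
    variances = np.array([x[y == c].var() for c in classes])
    between = min(
        (means[i] - means[j]) ** 2
        for i in range(len(classes))
        for j in range(i + 1, len(classes))
    )
    return between / variances.max()

def greedy_wdfs(X, y, k):
    """Pick the k individually highest-scoring features."""
    scores = [wdfs_score(X[:, j], y) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[::-1][:k].tolist())
```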

AAAI Conference 2018 Conference Paper

Asynchronous Bidirectional Decoding for Neural Machine Translation

  • Xiangwen Zhang
  • Jinsong Su
  • Yue Qin
  • Yang Liu
  • Rongrong Ji
  • Hongji Wang

The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABD-NMT.

AAAI Conference 2018 Short Paper

Bayesian Network Structure Learning: The Two-Step Clustering-Based Algorithm

  • Yikun Zhang
  • Jiming Liu
  • Yang Liu

In this paper we introduce a two-step clustering-based strategy, which can automatically generate prior information from data in order to further improve the accuracy and time efficiency of state-of-the-art algorithms for Bayesian network structure learning. Our clustering-based strategy is composed of two steps. In the first step, we divide the potential nodes into several groups via clustering analysis and apply Bayesian network structure learning to obtain some pre-existing arcs within each cluster. In the second step, with all the within-cluster arcs being well preserved, we learn the between-cluster structure of the given network. Experimental results on benchmark datasets show that a wide range of structure learning algorithms benefit from the proposed clustering-based strategy in terms of both accuracy and efficiency.

AAAI Conference 2018 Conference Paper

Dictionary Learning Inspired Deep Network for Scene Recognition

  • Yang Liu
  • Qingchao Chen
  • Wei Chen
  • Ian Wassell

Scene recognition remains one of the most challenging problems in image understanding. With the help of fully connected layers (FCL) and rectified linear units (ReLU), deep networks can extract the moderately sparse and discriminative feature representation required for scene recognition. However, few methods consider exploiting a sparsity model for learning the feature representation in order to provide enhanced discriminative capability. In this paper, we replace the conventional FCL and ReLU with a new dictionary learning layer, which is composed of a finite number of recurrent units, to simultaneously enhance the sparse representation and discriminative abilities of features via the determination of optimal dictionaries. In addition, with the help of the structure of the dictionary, we propose a new label-discriminative regressor to boost the discrimination ability. We also propose new constraints to prevent overfitting by incorporating the advantages of the Mahalanobis and Euclidean distances to balance recognition accuracy and generalization performance. Our proposed approach is evaluated using various scene datasets and shows performance superior to many state-of-the-art approaches.

AAAI Conference 2018 Conference Paper

Early Detection of Fake News on Social Media Through Propagation Path Classification with Recurrent and Convolutional Networks

  • Yang Liu
  • Yi-Fang Wu

In the midst of today’s pervasive influence of social media, automatically detecting fake news is drawing significant attention from both the academic communities and the general public. Existing detection approaches rely on machine learning algorithms with a variety of news characteristics to detect fake news. However, such approaches have a major limitation in detecting fake news early, i.e., the information required for detecting fake news is often unavailable or inadequate at the early stage of news propagation. As a result, the accuracy of early detection of fake news is low. To address this limitation, in this paper, we propose a novel model for early detection of fake news on social media through classifying news propagation paths. We first model the propagation path of each news story as a multivariate time series in which each tuple is a numerical vector representing characteristics of a user who engaged in spreading the news. Then, we build a time series classifier that incorporates both recurrent and convolutional networks, which capture the global and local variations of user characteristics along the propagation path respectively, to detect fake news. Experimental results on three real-world datasets demonstrate that our proposed model can detect fake news with 85% and 92% accuracy on Twitter and Sina Weibo, respectively, within 5 minutes after it starts to spread, which is significantly faster than state-of-the-art baselines.

IJCAI Conference 2018 Conference Paper

Energy-efficient Amortized Inference with Cascaded Deep Classifiers

  • Jiaqi Guan
  • Yang Liu
  • Qiang Liu
  • Jian Peng

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs in energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks of increasing size, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion with the REINFORCE algorithm to optimize a trade-off between computational cost and predictive accuracy. Our method is able to improve accuracy and efficiency simultaneously by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
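The early-exit idea behind the cascade can be sketched in a few lines. The sketch below is not the paper's method: the paper learns the selection module jointly with the networks via REINFORCE, whereas here the `make_model` stand-ins, the fixed confidence thresholds, and the doubling cost per stage are illustrative assumptions.

```python
def make_model(k):
    # Toy stand-in for the k-th network: larger k means more "capacity",
    # modelled here as higher confidence on the same input.
    def model(x):
        pred = 1 if x > 0 else 0
        conf = min(1.0, abs(x) * (k + 1))
        return pred, conf
    return model

def cascade_predict(x, models, thresholds):
    """Early-exit cascade: run models in order of increasing cost and stop
    once the current model is confident enough. (A fixed threshold stands
    in for the paper's learned selection module.)"""
    cost = 0
    for k, model in enumerate(models):
        cost += 2 ** k                      # assume stage k costs 2**k units
        pred, conf = model(x)
        if conf >= thresholds[k] or k == len(models) - 1:
            return pred, cost

models = [make_model(k) for k in range(3)]
thresholds = [0.9, 0.9, 0.0]                # last stage always answers
```

An easy instance (large |x|) exits at the first, cheapest stage, while a hard instance pays for the full cascade, which is exactly the cost-accuracy trade-off the abstract describes.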

AAAI Conference 2018 Conference Paper

Euler Sparse Representation for Image Classification

  • Yang Liu
  • Quanxue Gao
  • Jungong Han
  • Shujian Wang

Sparse representation based classification (SRC) has achieved great success in image recognition. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which may help improve the separability and margin between nearby data points, we propose Euler SRC for image classification, which is essentially SRC with Euler sparse representation. To be specific, it first maps the images into the complex space by the Euler representation, which is largely insensitive to outliers and illumination, and then performs complex SRC with the Euler representation. The major advantage of our method is that the Euler representation is explicit, with no increase in the dimensionality of the image space, thereby enabling this technique to be easily deployed in real applications. To solve Euler SRC, we present an efficient algorithm which is fast and converges well. Extensive experimental results illustrate that Euler SRC outperforms traditional SRC and achieves better performance for image classification.

AAMAS Conference 2018 Conference Paper

Gossip Gradient Descent

  • Yang Liu
  • Ji Liu
  • Tamer Basar

We consider the problem of learning a linear regression model distributively with a network of N interconnected agents which receive private streaming data. Each agent can deploy an online learning algorithm, e.g., stochastic gradient descent, to adaptively learn the regression model using its received private data. The goal is to devise an algorithm for each agent, under the constraint that each of them can communicate only with its neighboring agents based on a communication graph, that enables each agent to converge to the true model with performance comparable to that of the traditional centralized solution. We propose an algorithm called gossip gradient descent, and establish O(√(log t / ((1 − λ₂)Nt))) convergence in expectation and mean square, where λ₂ is the second largest eigenvalue of the expected gossip matrix corresponding to the underlying communication graph. For the case when agents are privacy sensitive, we propose a differentially private variant of the algorithm, which achieves ε-differential privacy and O(√(log² t / (ε(1 − λ₂)Nt))) convergence.
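As a rough illustration of the setting (not the paper's algorithm or its rates), the NumPy sketch below has N agents run local SGD on private streams generated from a shared linear model and periodically average iterates with a neighbour; the ring topology, step-size schedule, and noise level are all made-up assumptions.

```python
import numpy as np

def gossip_gradient_descent(n_agents=8, dim=3, steps=3000, seed=0):
    """Toy sketch: local SGD on private data plus pairwise gossip
    averaging over a ring graph (topology and step size are assumptions;
    the paper's algorithm and analysis are more refined)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)            # unknown regression model
    W = np.zeros((n_agents, dim))            # per-agent estimates
    for t in range(1, steps + 1):
        eta = 1.0 / (t + 5)                  # diminishing step size
        for i in range(n_agents):            # one private sample per agent
            x = rng.normal(size=dim)
            y = x @ w_true + 0.1 * rng.normal()
            W[i] -= eta * (W[i] @ x - y) * x  # local SGD step
        i = rng.integers(n_agents)           # pairwise gossip with a ring neighbour
        j = (i + 1) % n_agents
        W[i] = W[j] = (W[i] + W[j]) / 2
    return W, w_true
```

Running this drives every agent's estimate toward `w_true` even though each agent only ever exchanges iterates, never raw data, with its neighbours.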

AAAI Conference 2018 Conference Paper

Improved Text Matching by Enhancing Mutual Information

  • Yang Liu
  • Wenge Rong
  • Zhang Xiong

Text matching is a core issue for question answering (QA), information retrieval (IR) and many other fields. We propose to reformulate the original text, i.e., generate a new text that is semantically equivalent to the original, to improve the degree of text matching. Intuitively, the generated text increases the mutual information between the two text sequences. We employ a generative adversarial network as the reformulation model, in which a discriminator guides the text generation process. In this work, we focus on matching questions and answers. The task is to rank answers by their QA matching degree. We first reformulate the original question without changing the asker’s intent, then compute a relevance score for each answer. To evaluate the method, we collected questions and answers from Zhihu. In addition, we conduct substantial experiments on public data such as SemEval and WikiQA to compare our method with existing methods. Experimental results demonstrate that after adding the reformulated question, the ranking performance of different matching models improves consistently, indicating that the reformulated question has enhanced mutual information and effectively bridged the semantic gap between questions and answers.

AAAI Conference 2018 Conference Paper

Incentivizing High Quality User Contributions: New Arm Generation in Bandit Learning

  • Yang Liu
  • Chien-Ju Ho

We study the problem of incentivizing high quality contributions in user generated content platforms, in which users arrive sequentially with unknown quality. We are interested in designing a content displaying strategy which decides which content should be shown to users, with the goal of maximizing user experience (i.e., the likelihood of users liking the content). This goal naturally leads to a joint problem of incentivizing high quality contributions and learning the unknown content quality. To address the incentive issue, we consider a model in which users are strategic in deciding whether to contribute and are motivated by exposure, i.e., they aim to maximize the number of times their contributions are viewed. From the learning perspective, we model content quality as the probability of obtaining positive feedback (e.g., a like or upvote) from a random user. Naturally, the platform needs to resolve the classical trade-off between exploration (collecting feedback for all content) and exploitation (displaying the best content). We formulate this problem as a multi-armed bandit problem, where the number of arms (i.e., contributions) increases over time and depends on the strategic choices of arriving users. We first show that applying standard bandit algorithms incentivizes a flood of low cost contributions, which in turn leads to linear regret. We then propose Rand UCB, which adds an additional layer of randomization on top of the UCB algorithm to address the issue of flooding contributions. We show that Rand UCB eliminates the incentives for low quality contributions, provides incentives for high quality contributions (due to the bounded number of explorations of the low quality ones), and achieves sub-linear regret with respect to displaying the current best arms.
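For context, the base learner that Rand UCB randomizes on top of is plain UCB1 over a fixed arm set. The sketch below shows only that baseline with Bernoulli "like" feedback; Rand UCB's randomization layer and the strategic, growing arm set described above are omitted, and the arm means are made-up numbers.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Plain UCB1 with Bernoulli feedback (a like/upvote from a random
    user). This is only the exploration/exploitation baseline; it lacks
    the paper's randomization layer and arriving arms."""
    rng = random.Random(seed)
    pulls = [0] * len(true_means)
    wins = [0.0] * len(true_means)
    for t in range(1, horizon + 1):
        if t <= len(true_means):
            a = t - 1                        # initialize: pull each arm once
        else:
            a = max(range(len(true_means)),  # highest upper confidence bound
                    key=lambda i: wins[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        pulls[a] += 1
        wins[a] += reward
    return pulls
```

Over a long horizon the index concentrates the display budget on the highest-quality arm while still sampling the others logarithmically often, the behaviour the abstract says must be modified once arms are contributed strategically.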

NeurIPS Conference 2018 Conference Paper

Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing

  • Zehong Hu
  • Yitao Liang
  • Jie Zhang
  • Zhao Li
  • Yang Liu

Incentive mechanisms for crowdsourcing are designed to incentivize financially self-interested workers to generate and report high-quality labels. Existing mechanisms are often developed as one-shot static solutions, assuming a certain level of knowledge about worker models (expertise levels, costs for exerting effort, etc.). In this paper, we propose a novel inference-aided reinforcement mechanism that acquires data sequentially and requires no such prior assumptions. Specifically, we first design a Gibbs sampling augmented Bayesian inference algorithm to estimate workers' labeling strategies from the collected labels at each step. Then we propose a reinforcement incentive learning (RIL) method, built on top of the above estimates, to uncover how workers respond to different payments. RIL dynamically determines the payment without accessing any ground-truth labels. We theoretically prove that RIL is able to incentivize rational workers to provide high-quality labels both at each step and in the long run. Empirical results show that our mechanism performs consistently well under both rational and non-fully rational (adaptive learning) worker models. Besides, the payments offered by RIL are more robust and have lower variance than existing one-shot mechanisms.

IJCAI Conference 2018 Conference Paper

Joint Learning of Phenotypes and Diagnosis-Medication Correspondence via Hidden Interaction Tensor Factorization

  • Kejing Yin
  • William K. Cheung
  • Yang Liu
  • Benjamin C. M. Fung
  • Jonathan Poon

Non-negative tensor factorization has been shown effective for discovering phenotypes from EHR data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) has to be established before the factorization can be applied. Such correspondence information, however, is often missing. While different heuristics can be used to estimate the missing correspondence, any errors introduced will in turn cause inaccuracy in the subsequent phenotype discovery task. This is especially true for patients with multiple diagnosed diseases (e.g., under critical care). To alleviate this limitation, we propose hidden interaction tensor factorization (HITF), in which the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC III dataset. Our empirical results show that both the phenotypes and the correspondence inferred are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.

IJCAI Conference 2018 Conference Paper

Learning with Adaptive Neighbors for Image Clustering

  • Yang Liu
  • Quanxue Gao
  • Zhaohua Yang
  • Shujian Wang

Due to the importance and efficiency of learning complex structures hidden in data, graph-based methods have been widely studied and proven successful in unsupervised learning. Generally, most existing graph-based clustering methods require post-processing on the original data graph to extract the clustering indicators. However, there are two drawbacks to these methods: (1) the cluster structures are not explicit in the clustering results; (2) the final clustering performance is sensitive to the construction of the original data graph. To solve these problems, in this paper a novel learning model is proposed to learn a graph based on the given data graph such that the newly obtained optimal graph is more suitable for the clustering task. We also propose an efficient algorithm to solve the model. Extensive experimental results illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.

IJCAI Conference 2018 Conference Paper

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification

  • Gong Cheng
  • Decheng Gao
  • Yang Liu
  • Junwei Han

Convolutional neural networks (CNNs) have shown their promise for the image classification task. However, global CNN features still lack the geometric invariance needed to address intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy than existing state-of-the-art multi-label classification methods.

AAAI Conference 2018 Conference Paper

Robust Formulation for PCA: Avoiding Mean Calculation With L2,p-norm Maximization

  • Shuangli Liao
  • Jin Li
  • Yang Liu
  • Quanxue Gao
  • Xinbo Gao

Most existing robust principal component analysis (PCA) methods involve mean estimation for extracting the low-dimensional representation. However, they do not obtain the optimal mean for real data, which include outliers, under different robust distance metrics such as the L1-norm and L2,1-norm. This affects the robustness of the algorithms. Motivated by the fact that the variance of data can be characterized by the variation between each pair of data points, we propose a novel robust formulation for PCA that avoids computing the mean of the data in the criterion function. Our method employs the L2,p-norm as the distance metric to measure the variation in the criterion function, and seeks the projection matrix that maximizes the sum of the variation between each pair of projected data points. Both theoretical analysis and experimental results demonstrate that our methods are efficient and superior to most existing robust methods for data reconstruction.

IJCAI Conference 2018 Conference Paper

Zero Shot Learning via Low-rank Embedded Semantic AutoEncoder

  • Yang Liu
  • Quanxue Gao
  • Jin Li
  • Jungong Han
  • Ling Shao

Zero-shot learning (ZSL) has been widely researched and has achieved success in machine learning. Most existing ZSL methods aim to accurately recognize objects of unseen classes by learning a shared mapping from the feature space to a semantic space. However, such methods do not investigate in depth whether the mapping can precisely reconstruct the original visual features. Motivated by the fact that data often have low intrinsic dimensionality, e.g., lie in a low-dimensional subspace, we formulate a novel framework named Low-rank Embedded Semantic AutoEncoder (LESAE) to jointly seek a low-rank mapping that links visual features with their semantic representations. Following the encoder-decoder paradigm, the encoder part learns a low-rank mapping from the visual features to the semantic space, while the decoder part reconstructs the original data with the learned mapping. In addition, a non-greedy iterative algorithm is adopted to solve our model. Extensive experiments on six benchmark datasets demonstrate its superiority over several state-of-the-art algorithms.

AAAI Conference 2018 Conference Paper

Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

  • Yun Chen
  • Yang Liu
  • Victor Li

While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architecture) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.

EAAI Journal 2017 Journal Article

A hybrid harmony search algorithm with efficient job sequence scheme and variable neighborhood search for the permutation flow shop scheduling problems

  • Fuqing Zhao
  • Yang Liu
  • Yi Zhang
  • Weimin Ma
  • Chuck Zhang

The permutation flow shop scheduling problem (PFSSP), one of the most widely studied production scheduling problems, is a typical NP-hard combinatorial optimization problem. In this paper, a hybrid harmony search algorithm with an efficient job sequence mapping scheme and variable neighborhood search (VNS), named HHS, is proposed to solve the PFSSP with the objective of minimizing the makespan. First of all, to extend the HHS algorithm to solve the PFSSP effectively, an efficient smallest order value (SOV) rule based on random keys is introduced to convert a continuous harmony vector into a discrete job permutation, after fully investigating the effect of different job sequence mapping schemes. Secondly, an effective initialization scheme, based on the NEH heuristic combined with a chaotic sequence, is employed with the aim of improving the solution quality of the initial harmony memory (HM). Thirdly, an opposition-based learning technique in the selection process and the best harmony (best individual) in the pitch adjustment process are exploited to accelerate convergence and improve solution accuracy. Meanwhile, parameter sensitivity is studied to investigate the properties of HHS, and recommended values for the parameters adopted in HHS are presented. Finally, by making use of a novel variable neighborhood search, efficient insert and swap structures are incorporated into HHS to adequately emphasize local exploitation ability. Experimental simulations and comparisons on both continuous and combinatorial benchmark problems demonstrate that the HHS algorithm outperforms the standard HS algorithm and other recently proposed efficient algorithms in terms of solution quality and stability.
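The smallest order value (SOV) mapping mentioned in the abstract ranks the components of a continuous harmony vector to obtain a discrete job permutation. A minimal sketch, interpreting SOV as an argsort (details of the paper's random-key variant may differ):

```python
def sov(harmony):
    """Map a continuous harmony vector to a job permutation: the job with
    the smallest component is scheduled first, and so on (an argsort)."""
    return sorted(range(len(harmony)), key=harmony.__getitem__)
```

This is what lets a continuous metaheuristic like harmony search explore the discrete space of job permutations.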

AAAI Conference 2017 Conference Paper

Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

  • Meng Zhang
  • Haoruo Peng
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun

Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e. g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is however unreliable when there are only a limited number of seeds, which is a reasonable setting for resource-scarce languages. We tackle the limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find the matching mechanism to substantially improve the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with seeds as few as 10.

IJCAI Conference 2017 Conference Paper

Crowd Learning: Improving Online Decision Making Using Crowdsourced Data

  • Yang Liu
  • Mingyan Liu

We analyze an online learning problem that arises in crowdsourcing systems for users facing crowdsourced data: a user at each discrete time step t can choose K out of a total of N options (bandits), and receives randomly generated rewards dependent on user-specific and option-specific statistics unknown to the user. Each user aims to maximize her expected total rewards over a certain time horizon through a sequence of exploration and exploitation steps. Different from the typical regret/bandit learning setting, in this case a user may also exploit crowdsourced information to augment her learning process, i. e. , other users' choices or rewards from using these options. We consider two scenarios, one in which only their choices are shared, and the other in which users share full information including their choices and subsequent rewards. In both cases we derive bounds on the weak regret, the difference between the user's expected total reward and the reward from a user-specific best single-action policy; and show how they improve over their individual-learning counterpart. We also evaluate the performance of our algorithms using simulated data as well as the real-world movie ratings dataset MovieLens.

TCS Journal 2017 Journal Article

Fast quantum algorithms for least squares regression and statistic leverage scores

  • Yang Liu
  • Shengyu Zhang

Least squares regression is the simplest and most widely used technique for solving overdetermined systems of linear equations Ax = b, where A ∈ ℝ^{n×p} has full column rank and b ∈ ℝ^n. Though there is a well-known unique solution x* ∈ ℝ^p minimizing the squared error ‖Ax − b‖₂², the best known classical algorithm to find x* takes time Ω(n), even for sparse and well-conditioned matrices A, a fairly large class of input instances commonly seen in practice. In this paper, we design an efficient quantum algorithm to generate a quantum state proportional to |x*⟩. The algorithm takes only O(log n) time for sparse and well-conditioned A. When the condition number of A is large, a canonical solution is to use regularization. We give efficient quantum algorithms for two regularized regression problems, namely ridge regression and δ-truncated SVD, with similar costs and solution approximation. Given a matrix A ∈ ℝ^{n×p} of rank r with SVD A = UΣVᵀ, where U ∈ ℝ^{n×r}, Σ ∈ ℝ^{r×r} and V ∈ ℝ^{p×r}, the statistical leverage scores of A are the squared row norms of U, defined as sᵢ = ‖Uᵢ‖₂² for i = 1, …, n. The matrix coherence is the largest statistical leverage score. These quantities play an important role in many machine learning algorithms. The best known classical algorithm to approximate these values runs in time Ω(np). In this work, we introduce an efficient quantum algorithm to approximate sᵢ in time O(log n) when A is sparse and the ratio between A's largest singular value and smallest non-zero singular value is constant. This gives an exponential speedup over the best known classical algorithms. Unlike previous quantum speedups, which are mainly algebraic or number-theoretic, this problem is linear-algebraic. It also differs from previous quantum algorithms for solving linear equations and least squares regression, whose outputs compress the p-dimensional solution into a log(p)-qubit quantum state.

TIST Journal 2017 Journal Article

Implicit Visual Learning

  • Yan Liu
  • Yang Liu
  • Shenghua Zhong
  • Songtao Wu

According to the involvement of consciousness, human learning can be roughly classified into explicit learning and implicit learning. In strong contrast to explicit learning with clear targets and rules, such as school study of mathematics, learning is implicit when we acquire new information without intending to do so. Research from psychology indicates that implicit learning is ubiquitous in our daily life. Moreover, implicit learning plays an important role in human visual perception. But in the past 60 years, most well-known machine-learning models have aimed to simulate explicit learning, while work on modeling implicit learning has been relatively limited, especially for computer vision applications. This article proposes a novel unsupervised computational model for implicit visual learning by exploring dissipative systems, which provide a unifying macroscopic theory connecting biology with physics. We test the proposed Dissipative Implicit Learning Model (DILM) on various datasets. The experiments show that DILM not only provides a good match to human behavior but also noticeably improves explicit machine-learning performance on image classification tasks.

IJCAI Conference 2017 Conference Paper

Joint Training for Pivot-based Neural Machine Translation

  • Yong Cheng
  • Qian Yang
  • Yang Liu
  • Maosong Sun
  • Wei Xu

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

AAAI Conference 2017 Conference Paper

Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

  • Jinsong Su
  • Zhixing Tan
  • Deyi Xiong
  • Rongrong Ji
  • Xiaodong Shi
  • Yang Liu

Neural machine translation (NMT) heavily relies on word-level modelling to learn semantic representations of input sentences. However, for languages without natural word delimiters (e.g., Chinese), where input sentences have to be tokenized first, conventional NMT is confronted with two issues: 1) it is difficult to find an optimal tokenization granularity for source sentence modelling, and 2) errors in 1-best tokenizations may propagate to the encoder of NMT. To handle these issues, we propose word-lattice based Recurrent Neural Network (RNN) encoders for NMT, which generalize the standard RNN to word lattice topology. The proposed encoders take as input a word lattice that compactly encodes multiple tokenizations, and learn to generate new hidden states from arbitrarily many inputs and hidden states in preceding time steps. As such, the word-lattice based encoders not only alleviate the negative impact of tokenization errors but also are more expressive and flexible in embedding input sentences. Experiment results on Chinese-English translation demonstrate the superiority of the proposed encoders over the conventional encoder.

IJCAI Conference 2017 Conference Paper

Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation

  • Hao Zheng
  • Yong Cheng
  • Yang Liu

While neural machine translation (NMT) has made remarkable progress in translating a handful of high-resource language pairs recently, parallel corpora are not always available for many zero-resource language pairs. To deal with this problem, we propose an approach to zero-resource NMT via maximum expected likelihood estimation. The basic idea is to maximize the expectation with respect to a pivot-to-source translation model for the intended source-to-target model on a pivot-target parallel corpus. To approximate the expectation, we propose two methods to connect the pivot-to-source and source-to-target models. Experiments on two zero-resource language pairs show that the proposed approach yields substantial gains over baseline methods. We also observe that when trained jointly with the source-to-target model, the pivot-to-source translation model also obtains improvements over independent training.

AAAI Conference 2017 Conference Paper

Maximum Reconstruction Estimation for Generative Latent-Variable Models

  • Yong Cheng
  • Yang Liu
  • Wei Xu

Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.

AAAI Conference 2017 Conference Paper

Neural Machine Translation with Reconstruction

  • Zhaopeng Tu
  • Yang Liu
  • Lifeng Shang
  • Xiaohua Liu
  • Hang Li

Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring others. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, reconstructs the input source sentence from the hidden layer of the output target sentence, to ensure that the information on the source side is carried over to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.

AAAI Conference 2017 Conference Paper

Sequential Peer Prediction: Learning to Elicit Effort using Posted Prices

  • Yang Liu
  • Yiling Chen

Peer prediction mechanisms are often adopted to elicit truthful contributions from crowd workers when no ground-truth verification is available. Recently, mechanisms of this type have been developed to incentivize effort exertion, in addition to truthful elicitation. In this paper, we study a sequential peer prediction problem where a data requester wants to dynamically determine the reward level to optimize the trade-off between the quality of information elicited from workers and the total expected payment. In this problem, workers have homogeneous expertise and heterogeneous costs for exerting effort, both unknown to the requester. We propose a sequential posted-price mechanism to dynamically learn the optimal reward level from workers’ contributions and to incentivize effort exertion and truthful reporting. We show that (1) in our mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully is an equilibrium that returns the highest utility for every worker, and (2) the regret of our learning mechanism w.r.t. offering the optimal reward (price) is upper bounded by Õ(T^{3/4}), where T is the learning horizon. We further show the power of our learning approach when the reports of workers do not necessarily follow the game-theoretic equilibrium.

NeurIPS Conference 2016 Conference Paper

A Bandit Framework for Strategic Regression

  • Yang Liu
  • Yiling Chen

We consider a learner's problem of acquiring data dynamically for training a regression model, where the training data are collected from strategic data sources. A fundamental challenge is to incentivize data holders to exert effort to improve the quality of their reported data, despite the fact that this quality is not directly verifiable by the learner. In this work, we study a dynamic data acquisition process where data holders can contribute multiple times. Using a bandit framework, we leverage the long-term incentive of future job opportunities to incentivize high-quality contributions. We propose a Strategic Regression-Upper Confidence Bound (SR-UCB) framework, a UCB-style index combined with a simple payment rule, where the index of a worker approximates the quality of his past contributions and is used by the learner to determine whether the worker receives future work. For linear regression and a certain family of non-linear regression problems, we show that SR-UCB enables a $O(\sqrt{\log T/T})$-Bayesian Nash Equilibrium (BNE) in which each worker exerts the target effort level chosen by the learner, with $T$ being the number of data acquisition stages. The SR-UCB framework also has other desirable properties: (1) the indexes can be updated in an online fashion (hence computationally light); (2) a slight variant, namely Private SR-UCB (PSR-UCB), is able to preserve $(O(\log^{-1} T), O(\log^{-1} T))$-differential privacy for workers' data, with only a small compromise on incentives (achieving a $O(\log^{6} T/\sqrt{T})$-BNE).

IJCAI Conference 2016 Conference Paper

Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation

  • Yong Cheng
  • Shiqi Shen
  • Zhongjun He
  • Wei He
  • Hua Wu
  • Maosong Sun
  • Yang Liu

The attentional mechanism has proven to be effective in improving end-to-end neural machine translation. However, due to the intricate structural divergence between natural languages, unidirectional attention-based models might only capture partial aspects of attentional regularities. We propose agreement-based joint training for bidirectional attention-based end-to-end neural machine translation. Instead of training source-to-target and target-to-source translation models independently, our approach encourages the two complementary models to agree on word alignment matrices on the same training data. Experiments on Chinese-English and English-French translation tasks show that agreement-based joint training significantly improves both alignment and translation quality over independent training.

AAAI Conference 2016 Conference Paper

Building Earth Mover’s Distance on Bilingual Word Embeddings for Machine Translation

  • Meng Zhang
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Jie Hao

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mechanism to deal with many-to-many mappings that often show up across languages. We introduce Earth Mover’s Distance to this task by providing a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest neighbor. We further extend the formulation to a new task of identifying parallel sentences, which is useful for statistical machine translation systems, thereby expanding the application realm of bilingual word embeddings. We show encouraging performance on both tasks.

AAAI Conference 2016 Conference Paper

Finding One’s Best Crowd: Online Learning By Exploiting Source Similarity

  • Yang Liu
  • Mingyan Liu

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, whereby a user over time learns the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known, and (2) data from different sources can be acquired without cost, then a judicious selection of data from different sources can effectively enlarge the training sample size compared to using a single data source, thereby improving the rate and performance of learning; this is achieved by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertisement and network measurement.

AAAI Conference 2016 Conference Paper

Implicit Discourse Relation Classification via Multi-Task Neural Networks

  • Yang Liu
  • Sujian Li
  • Xiaodong Zhang
  • Zhifang Sui

Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.

AAAI Conference 2016 Conference Paper

Is It Harmful When Advisors Only Pretend to Be Honest?

  • Dongxia Wang
  • Tim Muller
  • Jie Zhang
  • Yang Liu

In trust systems, unfair rating attacks – where advisors provide ratings dishonestly – influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks, including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks, but an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user’s ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found that for all three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found that the ratings may still be useful to users.

IJCAI Conference 2016 Conference Paper

Learning to Incentivize: Eliciting Effort via Output Agreement

  • Yang Liu
  • Yiling Chen

In crowdsourcing, when there is a lack of verification for contributed answers, output agreement mechanisms are often used to incentivize participants to provide truthful answers when the correct answer is held by the majority. In this paper, we focus on using output agreement mechanisms to elicit effort, in addition to eliciting truthful answers, from a population of workers. We consider a setting where workers have heterogeneous costs of effort exertion and examine the data requester's problem of deciding the reward level in output agreement for optimal elicitation. In particular, when the requester knows the cost distribution, we derive the optimal reward level for output agreement mechanisms. This is achieved by first characterizing Bayesian Nash equilibria of output agreement mechanisms for a given reward level. When the cost distribution is unknown to the requester, we develop sequential mechanisms that combine learning the cost distribution with incentivizing effort exertion to approximately determine the optimal reward level.
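
The basic output agreement rule, and the threshold structure of effort exertion that the abstract analyzes, can be sketched as follows. The matching probabilities and the simple linear utility are assumptions for illustration, not the paper's model.

```python
def output_agreement_payment(report_i, report_j, reward):
    """Basic output agreement rule: a worker is paid the reward iff
    their answer matches a randomly chosen peer's answer."""
    return reward if report_i == report_j else 0.0

def exerts_effort(cost, reward, p_match_effort, p_match_no_effort):
    """Threshold rule (assumed form): a worker exerts effort iff the
    expected gain in matching probability, scaled by the reward,
    covers the worker's private cost of effort."""
    return reward * (p_match_effort - p_match_no_effort) >= cost
```

Under this toy utility, raising the reward raises the cost threshold below which workers exert effort, which is the trade-off the requester's reward-level choice navigates.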

JAIR Journal 2016 Journal Article

ProMoca: Probabilistic Modeling and Analysis of Agents in Commitment Protocols

  • Akın Günay
  • Yang Liu
  • Jie Zhang

Social commitment protocols regulate interactions of agents in multiagent systems. Several methods have been developed to analyze properties of commitment protocols. However, analysis of an agent's behavior in a commitment protocol, which should take into account the agent's goals and beliefs, has received less attention. In this paper we present the ProMoca framework to address this issue. Firstly, we develop an expressive formal language to model agents with respect to their commitments. Our language provides dedicated elements to define commitment protocols, and model agents in terms of their goals, behaviors, and beliefs. Furthermore, our language provides probabilistic and non-deterministic elements to model uncertainty in agents' beliefs. Secondly, we identify two essential properties of an agent with respect to a commitment protocol, namely compliance and goal satisfaction. We formalize these properties using a probabilistic variant of linear temporal logic. Thirdly, we adapt a probabilistic model checking algorithm to automatically analyze compliance and goal satisfaction properties. Finally, we present empirical results about the efficiency and scalability of ProMoca.

AAAI Conference 2016 Conference Paper

To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

  • Christian Hadiwinoto
  • Yang Liu
  • Hwee Tou Ng

Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the same order or are swapped in the translation output. Experiments on Chinese-to-English translation show a statistically significant improvement of 1.21 BLEU points using our approach, compared to a state-of-the-art statistical MT system that incorporates prior reordering approaches.

EAAI Journal 2015 Journal Article

A flood inundation modelling using v-support vector machine regression model

  • Yang Liu
  • Gareth Pender

Full two-dimensional (2D) hydrodynamic models have proven to be successful in a wide range of applications. The limitation of using full 2D models is their expensive computational requirement. Flood risk analysis and model uncertainty analysis usually need to run the numerical model and evaluate its performance thousands of times. However, in real-world applications, there is simply not enough time and resources to perform such a huge number of model runs. In this study, a computational framework, known as v-Support Vector Regression (SVR)-Fine Grid Model (FGM) or linear regression (LR)-FGM, is presented for solving computationally expensive simulation problems. The v-SVR-FGM or LR-FGM approach is demonstrated via a small number of FGM runs using a nonlinear or linear regression model with data preprocessing. The approximation model predicts the results of the FGM instead of running the time-consuming FGM itself. This approach can substantially reduce computational running time without loss of the FGM's accuracy. The simulation results suggest that the proposed method is able to achieve good predictive results (water depth and velocity) as well as provide considerable savings in computer time.

AAAI Conference 2015 Conference Paper

A Novel Neural Topic Model and Its Supervised Extension

  • Ziqiang Cao
  • Sujian Li
  • Yang Liu
  • Wenjie Li
  • Heng Ji

Topic modeling techniques have the benefits of modeling words and documents uniformly under a probabilistic framework. However, they also suffer from the limitations of sensitivity to initialization and unigram topic distribution, which can be remedied by deep learning techniques. To explore the combination of topic modeling and deep learning techniques, we first explain the standard topic model from the perspective of a neural network. Based on this, we propose a novel neural topic model (NTM) where the representations of words and documents are efficiently and naturally combined into a uniform framework. Extending from NTM, we can easily add a label layer and propose the supervised neural topic model (sNTM) to tackle supervised tasks. Experiments show that our models are competitive in both topic discovery and classification/regression tasks.

AAAI Conference 2015 Conference Paper

Automated Analysis of Commitment Protocols Using Probabilistic Model Checking

  • Akın Günay
  • Song Songzheng
  • Yang Liu
  • Jie Zhang

Commitment protocols provide an effective formalism for the regulation of agent interaction. Although existing work mainly focuses on the design-time development of static commitment protocols, recent studies propose methods to create them dynamically at run-time with respect to the goals of the agents. These methods require agents to verify new commitment protocols taking their goals, and beliefs about the other agents’ behavior, into account. Accordingly, in this paper, we first propose a probabilistic model to formally capture commitment protocols according to agents’ beliefs. Secondly, we identify a set of important properties for the verification of a new commitment protocol from an agent’s perspective and formalize these properties in our model. Thirdly, we develop probabilistic model checking algorithms with advanced reduction for efficient verification of these properties. Finally, we implement these algorithms as a tool and evaluate the proposed properties over different commitment protocols.

AAAI Conference 2015 Conference Paper

Contrastive Unsupervised Word Alignment with Non-Local Features

  • Yang Liu
  • Maosong Sun

Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of non-local features that are critical for capturing the divergence between natural languages. We propose a contrastive approach that aims to differentiate observed training examples from noises. It not only introduces prior knowledge to guide unsupervised learning but also cancels out partition functions. Based on the observation that the probability mass of log-linear models for word alignment is usually highly concentrated, we propose to use top-n alignments to approximate the expectations with respect to posterior distributions. This allows for efficient and accurate calculation of expectations of non-local features. Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods.

IJCAI Conference 2015 Conference Paper

Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

  • Meiping Dong
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Dakun Zhang

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality and coverage. As a result, learning translation models from nonparallel corpora has become increasingly important nowadays, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.

IJCAI Conference 2015 Conference Paper

Joint POS Tagging and Text Normalization for Informal Text

  • Chen Li
  • Yang Liu

Text normalization and part-of-speech (POS) tagging for social media data have been investigated recently; however, prior work has treated them separately. In this paper, we propose a joint Viterbi decoding process to determine each token’s POS tag and each non-standard token’s correct form at the same time. In order to evaluate our approach, we create two new data sets with POS tag labels and non-standard tokens’ correct forms. These are the first data sets with such annotation. The experiment results demonstrate the effect of non-standard words on POS tagging, and also show that our proposed methods perform better than the state-of-the-art systems in both POS tagging and normalization.
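
The joint decoding idea can be illustrated with a generic Viterbi over composite states, where each state would pair a POS tag with a candidate normalized form. The toy scoring functions below are stand-ins for the learned models, not the paper's actual features.

```python
def viterbi_joint(tokens, states, emit, trans, start):
    """Generic Viterbi over joint states. For the joint task each state
    is a (POS tag, normalized form) pair; emit/trans/start are log-score
    functions or tables supplied by the caller (toy stand-ins here)."""
    # dp[s] = (best log-score of a path ending in state s, that path)
    dp = {s: (start.get(s, float("-inf")) + emit(tokens[0], s), [s])
          for s in states}
    for tok in tokens[1:]:
        nxt = {}
        for s in states:
            best_score, best_path = max(
                ((dp[p][0] + trans.get((p, s), float("-inf")), dp[p][1])
                 for p in states),
                key=lambda x: x[0])
            nxt[s] = (best_score + emit(tok, s), best_path + [s])
        dp = nxt
    _, path = max(dp.values(), key=lambda x: x[0])
    return path
```

On a two-token example like "u c", decoding jointly can recover both the normalized forms ("you see") and their POS tags in a single pass, which is the point of coupling the two tasks.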

AAAI Conference 2015 Conference Paper

Learning Entity and Relation Embeddings for Knowledge Graph Completion

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun
  • Yang Liu
  • Xuan Zhu

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as a translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. We therefore propose TransR to build entity and relation embeddings in separate entity and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to the corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
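
The TransR scoring idea, projecting entities into a relation-specific space before measuring how well the relation vector translates head to tail, can be sketched as follows. The matrices and vectors are toy values rather than learned embeddings, and the squared L2 dissimilarity is one common choice; this shows only the score function, not the margin-based training.

```python
def matvec(M, v):
    """Apply a relation-specific projection matrix M to an entity vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transr_score(h, r, t, M_r):
    """TransR-style dissimilarity: project head h and tail t into the
    relation space with M_r, then measure how far M_r h + r falls from
    M_r t (squared L2 norm; lower means a more plausible triple)."""
    h_r, t_r = matvec(M_r, h), matvec(M_r, t)
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h_r, r, t_r))
```

With an identity projection, a triple whose tail equals head-plus-relation scores 0, while a mismatched tail scores strictly higher, which is the ranking signal link prediction relies on.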

IJCAI Conference 2015 Conference Paper

Quantifying Robustness of Trust Systems against Collusive Unfair Rating Attacks Using Information Theory

  • Dongxia Wang
  • Tim Muller
  • Jie Zhang
  • Yang Liu

Unfair rating attacks happen in existing trust and reputation systems, lowering the quality of the systems. There exists a formal model that measures the maximum impact of independent attackers [Wang et al., 2015] – based on information theory. We improve on these results in multiple ways: (1) we alter the methodology to be able to reason about colluding attackers as well, and (2) we extend the method to be able to measure the strength of any attack (rather than just the strongest attack). Using (1), we identify the strongest collusion attacks, helping to construct robust trust systems. Using (2), we identify the strength of (classes of) attacks that we found in the literature. Based on this, we help to overcome a shortcoming of current research into collusion-resistance – specific (types of) attacks are used in simulations, disallowing direct comparisons between analyses of systems.

AAAI Conference 2015 Conference Paper

Topical Word Embeddings

  • Yang Liu
  • Zhiyuan Liu
  • Tat-Seng Chua
  • Maosong Sun

Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.
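
The core TWE construction, a contextual embedding built from a word's vector and the vector of the topic assigned to it in that context, reduces to concatenation followed by cosine similarity. The two-dimensional vectors below are made-up illustrations, not trained embeddings.

```python
import math

def contextual_embedding(word_vec, topic_vec):
    """TWE-style contextual embedding: concatenate the word's vector
    with the vector of the topic assigned to it in this context."""
    return word_vec + topic_vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

The payoff is that the same word under two different topic assignments (say, "bank" under a finance topic vs. a river topic) yields two different contextual embeddings, so polysemous uses stop being forced onto a single vector.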

JBHI Journal 2014 Journal Article

Adaptive Shape Prior Constrained Level Sets for Bladder MR Image Segmentation

  • Xianjing Qin
  • Xuelong Li
  • Yang Liu
  • Hongbing Lu
  • Pingkun Yan

Three-dimensional bladder wall segmentation for thickness measurement can be very useful for bladder magnetic resonance (MR) image analysis, since thickening of the bladder wall can indicate abnormality. However, it is a challenging task due to the artifacts inside the bladder lumen, weak boundaries in the apex and base areas, and complicated outside intensity distributions. To deal with these difficulties, in this paper, an adaptive shape prior constrained directional level set model is proposed to segment the inner and outer boundaries of the bladder wall. In addition, a coupled directional level set model is presented to refine the segmentation by exploiting the prior knowledge of region information and minimum thickness. With our proposed method, the influence of the artifacts in the bladder lumen and the complicated outside tissues surrounding the bladder can be appreciably reduced. Furthermore, leakage on the weak boundaries can be avoided. Compared with other related methods, better results were obtained on 11 patients' 3-D bladder MR images by using the proposed method.

AAAI Conference 2013 Conference Paper

An Extended GHKM Algorithm for Inducing Lambda-SCFG

  • Peng Li
  • Yang Liu
  • Maosong Sun

Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, how to learn such λ-SCFG rules still remains a challenge because of the difficulty in determining the correspondence between NL sentences and logical forms. To alleviate this structural divergence problem, we extend the GHKM algorithm, which is a state-of-the-art algorithm for learning synchronous grammars in statistical machine translation, to induce λ-SCFG from pairs of NL sentences and logical forms. By treating logical forms as trees, we reformulate the theory behind GHKM that gives formal semantics to the alignment between NL words and logical form tokens. Experiments on the GEOQUERY dataset show that our semantic parser achieves an F-measure of 90.2%, the best result published to date.

IJCAI Conference 2013 Conference Paper

Improving Question Retrieval in Community Question Answering Using World Knowledge

  • Guangyou Zhou
  • Yang Liu
  • Fang Liu
  • Daojian Zeng
  • Jun Zhao

Community question answering (cQA), which provides a platform for people with diverse background to share information and knowledge, has become an increasingly popular research topic. In this paper, we focus on the task of question retrieval. The key problem of question retrieval is to measure the similarity between the queried questions and the historical questions which have been solved by other users. The traditional methods measure the similarity based on the bag-of-words (BOWs) representation. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this work, we first propose a way to build a concept thesaurus based on the semantic relations extracted from the world knowledge of Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance the question similarity in the concept space. Experiments conducted on a real cQA data set show that with the help of Wikipedia thesaurus, the performance of question retrieval is improved as compared to the traditional methods.

TCS Journal 2013 Journal Article

On testing monomials in multivariate polynomials

  • Zhixiang Chen
  • Bin Fu
  • Yang Liu
  • Robert Schweller

This paper presents a summary of our initial work on developing a theory of testing monomials in multivariate polynomials. The central question is to ask whether a polynomial represented by a certain economically compact structure has a multilinear monomial in its sum-product expansion. The complexity aspects of this problem and its variants are investigated with two objectives. One is to understand how this problem relates to critical problems in complexity, and if so to what extent. The other is to exploit possibilities of applying algebraic properties of polynomials to the study of those problems. A series of results about ΠΣΠ and ΠΣ polynomials is obtained in this paper, laying a basis for further study along this line. Several randomized and deterministic algorithms are devised for testing multilinear monomials or p-monomials in certain respective types of polynomials, where p is prime.
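
For the simplest case touched on in the abstract, a ΠΣ polynomial given as a product of sums of variables, the expansion contains a multilinear monomial exactly when one variable can be chosen from each factor with all choices distinct. A brute-force check makes the question concrete (exponential in the number of factors, where the paper's algorithms are far more efficient):

```python
from itertools import product

def has_multilinear_monomial(factors):
    """Brute-force test for a Pi-Sigma polynomial written as a product of
    sums of variables (each factor is a list of variable names): the
    expansion has a multilinear monomial iff some choice of one variable
    per factor uses each variable at most once. Illustration only."""
    for choice in product(*factors):
        if len(set(choice)) == len(choice):
            return True
    return False
```

For example, (x+y)(y+z)(x+z) expands to include the multilinear monomial xyz, while (x)(x) expands only to x^2, which is not multilinear.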

IJCAI Conference 2013 Conference Paper

Opinion Target Extraction Using Partially-Supervised Word Alignment Model

  • Kang Liu
  • Liheng Xu
  • Yang Liu
  • Jun Zhao

Mining opinion targets from online reviews is an important and challenging task in opinion mining. This paper proposes a novel approach to extract opinion targets by using partially-supervised word alignment model (PSWAM). At first, we apply PSWAM in a monolingual scenario to mine opinion relations in sentences and estimate the associations between words. Then, a graph-based algorithm is exploited to estimate the confidence of each candidate, and the candidates with higher confidence will be extracted as the opinion targets. Compared with existing syntax-based methods, PSWAM can effectively avoid parsing errors when dealing with informal sentences in online reviews. Compared with the methods using alignment model, PSWAM can capture opinion relations more precisely through partial supervision from partial alignment links. Moreover, when estimating candidate confidence, we make penalties on higher-degree vertices in our graph-based algorithm in order to decrease the probability of the random walk running into the unrelated regions in the graph. As a result, some errors can be avoided. The experimental results on three data sets with different sizes and languages show that our approach outperforms state-of-the-art methods.

YNIMG Journal 2012 Journal Article

Inferring consistent functional interaction patterns from natural stimulus FMRI data

  • Jiehuan Sun
  • Xintao Hu
  • Xiu Huang
  • Yang Liu
  • Kaiming Li
  • Xiang Li
  • Junwei Han
  • Lei Guo

There has been increasing interest in how the human brain responds to natural stimulus such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under the natural stimulus of video watching among known functional brain regions identified by task-based fMRI. We applied and compared four statistical approaches to infer consistent and reproducible functional interaction patterns among these brain regions: Bayesian network modeling with the search algorithms greedy equivalence search (GES), Peter and Clark (PC) analysis, and independent multiple greedy equivalence search (IMaGES), as well as the commonly used Granger causality analysis (GCA). Interestingly, a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages.

AAAI Conference 2012 Conference Paper

Sequence Labeling with Non-Negative Weighted Higher Order Features

  • Xian Qian
  • Yang Liu

In sequence labeling, using higher order features leads to high inference complexity. A lot of studies have been conducted to address this problem. In this paper, we propose a new exact decoding algorithm under the assumption that weights of all higher order features are non-negative. In the worst case, the time complexity of our algorithm is quadratic on the number of higher order features. Comparing with existing algorithms, our method is more efficient and easier to implement. We evaluate our method on two sequence labeling tasks: Optical Character Recognition and Chinese part-of-speech tagging. Our experimental results demonstrate that adding higher order features significantly improves the performance without much additional inference time.

AIIM Journal 2011 Journal Article

Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children

  • Keyur Gabani
  • Thamar Solorio
  • Yang Liu
  • Khairun-nisa Hassanali
  • Christine A. Dollaghan

Objectives: This paper explores the use of an automated method for analyzing narratives of monolingual English-speaking children to accurately predict the presence or absence of a language impairment. The goal is to exploit corpus-based approaches inspired by the fields of natural language processing and machine learning.
Methods and materials: We extract a large variety of features from language samples and use them to train language models and well-known machine learning algorithms as the underlying predictors. The methods are evaluated on two different datasets and three language tasks. One dataset contains samples of two spontaneous narrative tasks performed by 118 children with an average age of 13 years, and a second dataset contains play sessions from over 600 younger children with an average age of 6 years.
Results: We compare results against a cut-off baseline method and show that our results are far superior, reaching F-measures of over 85% in two of the three language tasks, and 48% in the third one.
Conclusions: The different experiments we present here show that corpus-based approaches can yield good prediction results in the problem of language impairment detection. These findings warrant further exploration of natural language processing techniques in the field of communication disorders. Moreover, the proposed framework can be easily adapted to analyze samples in languages other than English, since most of the features are language independent or can be customized with little effort.

TCS Journal 2011 Journal Article

Improved deterministic algorithms for weighted matching and packing problems

  • Jianer Chen
  • Qilong Feng
  • Yang Liu
  • Songjian Lu
  • Jianxin Wang

Based on the method of (n,k)-universal sets, we present a deterministic parameterized algorithm for the weighted r-d matching problem with time complexity O*(4^((r−1)k+o(k))), improving the previous best upper bound O*(4^(rk+o(k))). In particular, the algorithm applied to the unweighted 3-d matching problem results in a deterministic algorithm with time O*(16^(k+o(k))), improving the previous best result O*(21.26^k). For the weighted r-set packing problem, we present a deterministic parameterized algorithm with time complexity O*(2^((2r−1)k+o(k))), improving the previous best result O*(2^(2rk+o(k))). The algorithm, when applied to the unweighted 3-set packing problem, has running time O*(32^(k+o(k))), improving the previous best result O*(43.62^(k+o(k))). Moreover, for the weighted r-set packing and weighted r-d matching problems, we give a kernel of size O(k^r), which is the first kernelization algorithm for the weighted versions of these problems.

AAAI Conference 2011 Conference Paper

Ordinal Regression via Manifold Learning

  • Yang Liu
  • Yan Liu
  • Keith Chan

Ordinal regression is an important research topic in machine learning. It aims to automatically determine the implied rating of a data item on a fixed, discrete rating scale. In this paper, we present a novel ordinal regression approach via manifold learning, which is capable of uncovering the embedded nonlinear structure of the data set according to the observations in the high-dimensional feature space. By optimizing the order information of the observations and preserving the intrinsic geometry of the data set simultaneously, the proposed algorithm provides faithful ordinal regression for newly arriving data points. To offer a more general solution for data with a natural tensor structure, we further introduce the multilinear extension of the proposed algorithm, which can support ordinal regression of high-order data such as images. Experiments on various data sets validate the effectiveness of the proposed algorithm as well as its extension.

AAAI Conference 2010 Conference Paper

Forest-Based Semantic Role Labeling

  • Hao Xiong
  • Haitao Mi
  • Yang Liu
  • Qun Liu

Parsing plays an important role in semantic role labeling (SRL) because most SRL systems infer semantic relations from 1-best parses. Therefore, parsing errors inevitably lead to labeling mistakes. To alleviate this problem, we propose to use a packed forest, which compactly encodes all parses for a sentence. We design an algorithm to exploit exponentially many parses to learn semantic relations efficiently. Experimental results on the CoNLL-2005 shared task show that using forests achieves an absolute improvement of 1.2% in terms of F1 score over using 1-best parses and 0.6% over using 50-best parses.

AAAI Conference 2010 Conference Paper

Multilinear Maximum Distance Embedding Via L1-Norm Optimization

  • Yang Liu
  • Yan Liu
  • Keith Chan

Dimensionality reduction plays an important role in many machine learning and pattern recognition tasks. In this paper, we present a novel dimensionality reduction algorithm called multilinear maximum distance embedding (M2DE), which includes three key components. To preserve the local geometry and discriminant information in the embedded space, M2DE utilizes a new objective function, which aims to maximize the distances between some particular pairs of data points, such as the distances between nearby points and the distances between data points from different classes. To make the mapping of new data points straightforward, and more importantly, to keep the natural tensor structure of high-order data, M2DE integrates multilinear techniques to learn the transformation matrices sequentially. To provide reasonable and stable embedding results, M2DE employs the L1-norm, which is more robust to outliers, to measure the dissimilarity between data points. Experiments on various datasets demonstrate that M2DE achieves good embedding results of high-order data for classification tasks.
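
The distance-maximizing objective can be illustrated with a toy sketch. Everything below is illustrative rather than the paper's exact formulation: a single linear map `W` stands in for the sequentially learned multilinear transforms, and the neighbor/class pair selection is a plausible reading of "particular pairs":

```python
import numpy as np

def m2de_objective(X, y, W, k_neighbors=2):
    """Toy M2DE-style objective: sum of L1 distances between embedded
    nearest-neighbor pairs and embedded pairs from different classes.
    X: (n, d) data, y: (n,) labels, W: (d, m) linear map standing in
    for the paper's multilinear transformation matrices."""
    Z = X @ W                                   # embed the data
    n = len(X)
    total = 0.0
    for i in range(n):
        # L1 distances from point i to all others in the ORIGINAL space
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf                           # exclude the point itself
        neighbors = np.argsort(d)[:k_neighbors]
        for j in range(n):
            if j == i:
                continue
            if j in neighbors or y[j] != y[i]:
                # maximize embedded L1 distance for these particular pairs
                total += np.abs(Z[i] - Z[j]).sum()
    return total
```

Because the L1 norm is positively homogeneous, scaling `W` scales this objective linearly, which is why practical formulations constrain the transforms rather than maximizing freely.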

ICRA Conference 2010 Conference Paper

Performance approximation and bottleneck identification in re-entrant lines

  • Yang Liu
  • Jingshan Li
  • Shu-Yin Chiang

In this paper, we study a re-entrant line with unreliable exponential machines and finite buffers, operating under the last buffer first serve scheduling policy. First, an approximation method is presented to estimate the throughput of the re-entrant line. Then, a system-level approach to identifying the bottleneck based on blockage and starvation information is proposed. It is shown that the approximation method achieves acceptable accuracy, and that the bottleneck identification method correctly detects the bottleneck in most cases.
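
The blockage/starvation idea can be illustrated with the arrow-based rule known from the serial production line literature (a sketch under the assumption of a plain serial line; the paper's re-entrant, LBFS-scheduled setting is more involved, and this is not claimed to be its exact rule):

```python
def bottleneck_by_arrows(blockage, starvation):
    """Arrow-based bottleneck identification for a serial line.
    blockage[i], starvation[i]: measured probabilities that machine i
    is blocked / starved.  For each adjacent pair, an arrow points from
    machine i to i+1 when BL_i > ST_{i+1}, otherwise from i+1 to i;
    machines with no emanating arrows are declared bottlenecks."""
    n = len(blockage)
    emanating = [False] * n
    for i in range(n - 1):
        if blockage[i] > starvation[i + 1]:
            emanating[i] = True        # arrow i -> i+1
        else:
            emanating[i + 1] = True    # arrow i+1 -> i
    return [i for i in range(n) if not emanating[i]]
```

The appeal of such rules is that blockage and starvation frequencies are directly measurable on the factory floor, so no analytical model of the line is needed at identification time.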

ICRA Conference 2008 Conference Paper

Modeling and analysis of Bernoulli production systems with split and merge

  • Yang Liu
  • Jingshan Li

Many production systems have split and merge operations to increase production capacity and variety, improve product quality, and implement product control and scheduling policies. In this paper, we present analytical methods to model and analyze Bernoulli production systems with circulate and priority split/merge policies. The recursive procedures for performance analysis are derived, the convergence of the procedures and uniqueness of the solutions, along with the structural properties, are proved analytically, and the accuracy of the estimation is justified numerically with high precision.
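
The paper's method is analytical (recursive performance-evaluation procedures); as an independent sanity check, the throughput of the basic building block, a two-machine Bernoulli line, can be estimated by Monte Carlo simulation (a sketch with illustrative parameters and one particular blocking convention, not the authors' procedure):

```python
import random

def simulate_bernoulli_line(p1, p2, N, steps=200_000, seed=0):
    """Monte Carlo throughput estimate for a two-machine Bernoulli line
    with buffer capacity N.  Each time slot, machine i succeeds with
    probability p_i; machine 1 is blocked when the buffer is full and
    machine 2 is starved when it is empty (both act on the state at the
    start of the slot).  Returns parts produced per slot."""
    rng = random.Random(seed)
    buf, produced = 0, 0
    for _ in range(steps):
        take = buf > 0 and rng.random() < p2   # machine 2 consumes a part
        put = buf < N and rng.random() < p1    # machine 1 feeds the buffer
        buf += (1 if put else 0) - (1 if take else 0)
        produced += take
    return produced / steps
```

Enlarging the buffer decouples the two machines, so the estimated throughput should increase monotonically in N toward min(p1, p2); split/merge topologies like those in the paper compose such building blocks.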

IROS Conference 2005 Conference Paper

QoS management of supermedia enhanced teleoperation via overlay networks

  • Zhiwei Cen
  • Matt W. Mutka
  • Yang Liu
  • Amit Goradia
  • Ning Xi 0001

In supermedia-enhanced Internet-based teleoperation systems, the data flowing between the operator and the robot include robotic control commands, video, audio, haptic feedback, and other media types. The differences between an Internet-based teleoperation system and other Internet applications are that (1) teleoperation systems involve many media types, each with its own quality of service (QoS) requirement; and (2) some media types are very latency sensitive. Overlay networks have been proposed to improve the QoS of teleoperation applications. However, efficiently using overlay network resources and optimally distributing these resources among all supermedia streams remains an important problem. This paper provides a framework for QoS management of teleoperation systems over overlay networks. The validity and performance of the system are evaluated using the PlanetLab overlay network.

ICRA Conference 2004 Conference Paper

Multisensory Gripper and Local Autonomy of Extravehicular Mobile Robot

  • Yang Liu
  • Tao Mei
  • Xiaohua Wang
  • Bin Liang

This paper presents the development of a multisensory robot gripper for an extravehicular mobile robot (EMR) and its sensor-based local autonomy. For stable extravehicular walking and for performing delicate tasks in unstructured and complex environments, the EMR gripper employs a simple and reliable mechanism and is equipped with a multisensory apparatus. Local autonomy is an important requirement for on-orbit manipulation by a space robot, and detecting the contact state between the gripper and the environment is essential to achieving it. However, sensory information is often insufficient to determine the contact state. A new way to detect the contact state under inadequate sensory information is proposed: by combining force sensor information with gripper geometry and mechanical analysis, spatial contact information between the robot and the trusswork can be derived. The robot can then adjust its position and orientation through fine-motion displacement based on the contact information to achieve a steady grasp. This method is implemented on a walking/grasping task, a simple and important fundamental task for an extravehicular space robot.