Arrow Research search

Author name cluster

Chang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

107 papers
2 author rows

Possible papers (107)

AAAI Conference 2026 Conference Paper

Adaptive-Learngene: Continual Expansion and Task-Aware Selection of Learngenes for Dynamic Environments

  • Shuxia Lin
  • Qiufeng Wang
  • Chang Liu
  • Xu Yang
  • Xin Geng

Pre-trained Vision Transformer (ViT) models have achieved impressive performance across various computer vision tasks. However, most existing pre-trained models are built on fixed datasets and lack the flexibility to incorporate new pre-training data. When additional data becomes available, previous models must typically be retrained on both old and new data, which is costly and impractical, especially in privacy-sensitive or resource-constrained environments. Moreover, direct fine-tuning on downstream tasks does not provide mechanisms to adapt to the specific data distributions of those tasks, and it only supports fixed model sizes. To address these challenges, we propose Adaptive-Learngene, a novel framework in which the ancestry model is trained solely on newly available data, and a new component, termed a learngene, is extracted and added to a global learngene pool that expands incrementally. This design enables a dynamically evolving pool of learngenes without requiring access to previous data. For each new downstream task, the Task-Adaptive Learngene Selector (TALS) retrieves a sparse combination of learngenes that best match the data distribution of the target task. TALS requires only a small amount of downstream data for this selection, enabling descendant models of different sizes to be efficiently initialized and tailored to specific data distributions and resource constraints. Extensive experiments on diverse downstream tasks demonstrate that our method matches or outperforms existing approaches while offering superior scalability, adaptability, and efficiency in dynamic learning environments.
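The sparse retrieval step the abstract describes can be sketched as a top-k similarity lookup over the learngene pool. A minimal illustration, assuming learngenes are embedding vectors; the function name `select_learngenes` and the toy pool are ours, not the paper's actual TALS module:

```python
import numpy as np

def select_learngenes(pool, task_feat, k=2):
    """Retrieve a sparse (top-k) combination of pooled learngenes whose
    embeddings are most cosine-similar to a small-sample task embedding."""
    keys = np.stack([g / np.linalg.norm(g) for g in pool])
    query = task_feat / np.linalg.norm(task_feat)
    scores = keys @ query                      # cosine similarity per learngene
    top = np.argsort(scores)[::-1][:k]         # sparse selection
    w = np.exp(scores[top])
    w /= w.sum()                               # softmax mixture weights over the selected few
    return top, w

# Three hypothetical learngenes; the task embedding is closest to the first two.
pool = [np.array([1., 0.]), np.array([0.8, 0.6]), np.array([0., 1.])]
top, w = select_learngenes(pool, task_feat=np.array([1., 0.2]), k=2)
```

Only the selected entries receive nonzero weight, which is what keeps descendant-model initialization cheap regardless of pool size.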

AAAI Conference 2026 Conference Paper

AuthSig: Safeguarding Scanned Signatures Against Unauthorized Reuse in Paperless Workflows

  • Ruiqiang Zhang
  • Zehua Ma
  • Guanjie Wang
  • Chang Liu
  • Hengyi Wang
  • Weiming Zhang

With the deepening trend of paperless workflows, signatures as a means of identity authentication are gradually shifting from traditional ink-on-paper to electronic formats. Despite the availability of dynamic pressure-sensitive and PKI-based digital signatures, static scanned signatures remain prevalent in practice due to their convenience. However, these static images, having almost lost their authentication attributes, cannot be reliably verified and are vulnerable to malicious copying and reuse. To address these issues, we propose AuthSig, a novel static electronic signature framework based on generative models and watermarking, which binds authentication information to the signature image. Leveraging the human visual system’s insensitivity to subtle style variations, AuthSig finely modulates style embeddings during generation to implicitly encode watermark bits, enforcing a One Signature, One Use policy. To overcome the scarcity of handwritten signature data and the limitations of traditional augmentation methods, we introduce a keypoint-driven data augmentation strategy that effectively enhances style diversity to support robust watermark embedding. Experimental results show that AuthSig achieves over 98% extraction accuracy under both digital-domain distortions and signature-specific degradations, and remains effective even in print-scan scenarios.

AAAI Conference 2026 Conference Paper

BiHiTo: Biomolecular Hierarchy-inspired Tokenization

  • Ruochong Zheng
  • Yutian Liu
  • Yian Zhao
  • Zhiwei Nie
  • Xuehan Hou
  • Chang Liu
  • Siwei Ma
  • Youdong Mao

Three-dimensional atomic arrangements of biomolecules are key to demystifying biological functions. The rapid expansion of accessible structural data, driven by advances in AI for science, highlights the critical challenge of efficiently modeling large-scale biomolecular structures, which are high-dimensional systems shaped by biological assembly principles. To address this, we introduce BiHiTo, a multi-level Biomolecular Hierarchy-inspired Tokenizer that intrinsically mimics natural biological assembly hierarchies. Specifically, we design a multi-codebook quantizer that mirrors the natural hierarchy of biomolecular structure, enabling simultaneous capture of representations spanning atomic motifs to global conformational variations. This hierarchical alignment markedly improves the biological interpretability and reconstruction fidelity of biomolecular structure. Extensive experiments demonstrate that BiHiTo delivers state-of-the-art performance and robust generalization across molecular dynamics trajectories and macromolecular complexes, facilitating advances in structure generation and dynamic conformation exploration. In reconstructing multi-conformation protein data from CASP14 and the out-of-distribution FastFolding test set, our method achieves 17% and 51% reductions in RMSD, respectively, compared to Bio2Token.

AAAI Conference 2026 Conference Paper

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment

  • Henglin Liu
  • Nisha Huang
  • Chang Liu
  • Jiangpeng Yan
  • Huijuan Huang
  • Jixuan Ying
  • Tong-Yee Lee
  • Pengfei Wan

The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC. However, its inherently complex nature—spanning visual perception, cognition, and emotion—poses fundamental challenges. Although aesthetic descriptions offer a viable representation of this complexity, two critical challenges persist: (1) data scarcity and imbalance: existing datasets overly focus on visual perception and neglect deeper dimensions due to expensive manual annotation; and (2) model fragmentation: current visual networks isolate aesthetic attributes with multi-branch encoders, while multimodal methods represented by contrastive learning struggle to effectively process long-form textual descriptions. To resolve challenge (1), we first present the Refined Aesthetic Description (RAD) dataset, a large-scale (70k), multi-dimensional structured dataset, generated via an iterative pipeline without heavy annotation costs and easy to scale. To address challenge (2), we propose ArtQuant, an aesthetics assessment framework for artistic images that not only couples isolated aesthetic dimensions through joint description generation, but also better models long-text semantics with the help of LLM decoders. Besides, theoretical analysis confirms this symbiosis: RAD's semantic adequacy (data) and generation paradigm (model) collectively minimize prediction entropy, providing mathematical grounding for the framework. Our approach achieves state-of-the-art performance on several datasets while requiring only 33% of conventional training epochs, narrowing the cognitive gap between artistic images and aesthetic judgment. We will release both code and dataset to support future research.

AAAI Conference 2026 Conference Paper

CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots

  • Xiao Zhao
  • Chang Liu
  • Ruiteng Ji
  • Zheyuan Zhang
  • Mingxu Zhu
  • Linna Song
  • Zhe Ren
  • Luo Qingliang

Recent advances in vision language models (VLMs) have demonstrated remarkable potential in embodied navigation tasks. However, existing robot-centric datasets primarily focus on traditional 3D tasks such as perception and prediction, lacking adequate support for vision-language tasks. Vision-language-navigation (VLN) is a key capability for achieving human-like and interpretable navigation in complex environments. In this study, we present CoT-VLNBench, the first large-scale benchmark and dataset designed for chain-of-thought (CoT) reasoning in quadruped robot navigation. Our dataset encompasses a diverse range of indoor and outdoor scenes, multi-step navigation trajectories, and rich natural language instructions, all annotated with fine-grained CoT reasoning traces. Specifically, it contains 175K frames, 5.25M 3D bounding boxes, and 875K vision–question–answer (VQA) pairs. This comprehensive resource enables thorough evaluation of embodied agents’ perceptual and step-by-step reasoning abilities. Furthermore, we propose a novel CoT-VLN model, a state-of-the-art 7B VLN model that integrates visual, linguistic, and reasoning modules, to facilitate interpretable and effective navigation. Extensive experiments demonstrate that our approach significantly outperforms existing non-VLM baselines on the new benchmark, underscoring the importance of CoT-VLN in embodied navigation. We hope that CoT-VLNBench will serve as a valuable resource to advance research at the intersection of robotics, vision, language, and reasoning.

JBHI Journal 2026 Journal Article

PAM-CDR: Property-Aware Multi-Modal Drug Representation Learning for Accurate Cancer Drug Response Prediction

  • Yang Li
  • Chang Liu
  • Haijie Cui
  • Jianli Ma

Accurate prediction of cancer drug response is essential for advancing precision oncology, enabling tailored therapies that account for the molecular heterogeneity of tumors. While deep learning has shown promise in this domain, many existing approaches fail to incorporate physicochemical properties of drug compounds, limiting the biological interpretability and generalizability of learned representations. To address this gap, we present PAM-CDR, a property-aware multi-modal representation learning framework that integrates molecular graphs, fingerprints, and physicochemical descriptors with transcriptomic and genomic profiles of cancer cell lines. PAM-CDR employs a three-stage hierarchical fusion strategy to enable fine-grained representation learning across drug and cell modalities. In the first stage, property-guided attention injects biologically meaningful context to enrich molecular graph and fingerprint features. In the second stage, bidirectional cross-modality interactions capture complementary patterns and enhance multi-omic cellular representations. In the final stage, unified drug and cell line embeddings are integrated to accurately predict drug responses. Benefiting from these designs, PAM-CDR consistently outperforms competitive baselines, achieving an AUC of 0.9161 and an AUPR of 0.9313. Ablation studies confirm the critical contribution of physicochemical priors, while embedding visualizations reveal improved biological coherence in the learned molecular representations. The code is publicly available at https://github.com/catly/PAM-CDR.

AAAI Conference 2026 Conference Paper

ProAR: Probabilistic Autoregressive Modeling for Molecular Dynamics

  • Kaiwen Cheng
  • Yutian Liu
  • Zhiwei Nie
  • Mujie Lin
  • Yanzhen Hou
  • Yiheng Tao
  • Chang Liu
  • Jie Chen

Understanding the structural dynamics of biomolecules is crucial for uncovering biological functions. As molecular dynamics (MD) simulation data becomes more available, deep generative models have been developed to synthesize realistic MD trajectories. However, existing methods produce fixed-length trajectories by jointly denoising high-dimensional spatiotemporal representations, which conflicts with MD’s frame-by-frame integration process and fails to capture time-dependent conformational diversity. Inspired by MD's sequential nature, we introduce a new probabilistic autoregressive (ProAR) framework for trajectory generation. ProAR uses a dual-network system that models each frame as a multivariate Gaussian distribution and employs an anti-drifting sampling strategy to reduce cumulative errors. This approach captures conformational uncertainty and time-coupled structural changes while allowing flexible generation of trajectories of arbitrary length. Experiments on ATLAS, a large-scale protein MD dataset, demonstrate that for long trajectory generation, our model achieves a 7.5% reduction in reconstruction RMSE and an average 25.8% improvement in conformation change accuracy compared to previous state-of-the-art methods. For the conformation sampling task, it performs comparably to specialized time-independent models, providing a flexible and dependable alternative to standard MD simulations.
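The frame-by-frame generation the abstract describes can be sketched as an autoregressive Gaussian rollout. In the toy below, a fixed `step_fn` (drift toward the origin, constant noise) stands in for the paper's dual networks and anti-drifting sampler, which the abstract does not specify:

```python
import numpy as np

def generate_trajectory(x0, step_fn, n_frames, rng):
    """Autoregressive rollout: each frame is sampled from a per-frame Gaussian
    whose mean and std are predicted from the previous frame."""
    frames = [x0]
    for _ in range(n_frames - 1):
        mu, sigma = step_fn(frames[-1])      # dual networks: mean + uncertainty
        frames.append(rng.normal(mu, sigma)) # sample the next conformation
    return np.stack(frames)

# Toy step model: deterministic drift with small per-coordinate noise.
step_fn = lambda x: (0.9 * x, 0.05)
traj = generate_trajectory(np.ones(3), step_fn, n_frames=10,
                           rng=np.random.default_rng(0))
```

Because each frame conditions only on its predecessor, the loop runs for any `n_frames`, which is what enables arbitrary-length trajectories.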

AAAI Conference 2026 Conference Paper

S2D-Align: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

  • Jiechao Gao
  • Chang Liu
  • Yuangang Li

Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generation capabilities of Multimodal Large Language Models (MLLMs), primarily focusing on optimizing cross-modal alignment between radiographs and reports through Supervised Fine-Tuning (SFT). However, by only performing instance-level alignment with the image-text pairs, the standard SFT paradigm fails to establish anatomically-grounded alignment, where the templated nature of reports often leads to sub-optimal generation quality. To address this, we propose S2D-Align, a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities. S2D-Align implements a shallow-to-deep strategy, progressively enriching the alignment process: it begins with the coarse radiograph-report pairing, then introduces reference reports for instance-level guidance, and ultimately utilizes key phrases to ground the generation in specific anatomical details. To bridge the different alignment stages, we introduce a memory-based adapter that empowers feature sharing, thereby integrating coarse and fine-grained guidance. For evaluation, we conduct experiments on the public MIMIC-CXR and IU X-Ray benchmarks, where S2D-Align achieves state-of-the-art performance compared to existing methods. Ablation studies validate the effectiveness of our multi-stage, auxiliary-guided approach, highlighting a promising direction for enhancing grounding capabilities in complex, multi-modal generation tasks.

AAAI Conference 2026 Conference Paper

SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema

  • Yushen Fang
  • Jianjun Li
  • Mingqian Ding
  • Chang Liu
  • Xinchi Zou
  • Wenqi Yang

Although Large Language Model (LLM)-powered information extraction (IE) systems have shown impressive capabilities, current fine-tuning paradigms face two major limitations: high training costs and difficulties in aligning with LLM preferences. To address these issues, we propose a novel universal IE paradigm—the Self-Correcting Iterative Refinement (SCIR) framework—along with a Multi-task Bilingual (Chinese-English) Self-Correcting (MBSC) dataset containing over 100,000 entries. The SCIR framework achieves plug-and-play compatibility with existing LLMs and IE systems through its Dual-Path Self-Correcting module and feedback-driven optimization, thereby significantly reducing training costs. Concurrently, the MBSC dataset tackles the challenge of preference alignment by indirectly distilling GPT-4's capabilities into IE result detection models. Experimental results demonstrate that SCIR outperforms state-of-the-art IE methods across three key tasks—named entity recognition, relation extraction, and event extraction—achieving a 5.27 percent average improvement in span-based Micro-F1 while reducing training costs by 87 percent compared to baseline approaches. These advancements not only enhance the flexibility and accuracy of IE systems but also pave the way for lightweight and efficient IE paradigms.

AAAI Conference 2026 Conference Paper

WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

  • Zishan Shu
  • Juntong Wu
  • Wei Yan
  • Xudong Liu
  • Hongyu Zhang
  • Chang Liu
  • Youdong Mao
  • Jie Chen

Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency—from low-frequency global layout to high-frequency edges and textures—is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency–time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in O(N log N) time—far lower than attention. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to 1.6× higher throughput and 30% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a complementary modeling bias to heat-based methods, effectively capturing both global coherence and high-frequency details essential for rich visual semantics.
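A minimal sketch of frequency-time decoupled global mixing via the FFT, assuming a per-frequency underdamped response of the form e^(-γt)·cos(ωt) with a linear dispersion ω ∝ |k|; the actual WPO parameterization is not given in the abstract, so this only illustrates the O(N log N) mechanism:

```python
import numpy as np

def wave_propagation_operator(x, t=1.0, c=1.0, gamma=0.1):
    """Evolve each spatial frequency of a 2-D feature map under an assumed
    underdamped wave response, via FFT (O(N log N) global interaction)."""
    H, W = x.shape
    ky = np.fft.fftfreq(H)[:, None]
    kx = np.fft.fftfreq(W)[None, :]
    omega = 2 * np.pi * c * np.sqrt(kx**2 + ky**2)     # dispersion: omega ~ c|k|
    response = np.exp(-gamma * t) * np.cos(omega * t)  # closed-form per-frequency gain
    X = np.fft.fft2(x)
    return np.real(np.fft.ifft2(X * response))

feat = np.random.default_rng(0).normal(size=(8, 8))
out = wave_propagation_operator(feat)   # same shape, globally mixed
```

At t = 0 the response is identically 1, so the operator reduces to the identity; increasing t damps high frequencies faster than low ones, which is the frequency-time interaction the abstract refers to.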

AAAI Conference 2025 Conference Paper

Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation

  • Zesen Cheng
  • Kehan Li
  • Li Hao
  • Peng Jin
  • Xiawu Zheng
  • Chang Liu
  • Jie Chen

Temporally locating objects with arbitrary class texts is the primary pursuit of open-vocabulary Video Instance Segmentation (VIS). Because of the insufficient vocabulary of video data, previous methods leverage the image-text pretraining model for recognizing object instances by separately aligning each frame with class texts. As a result, the separation breaks the instance movement context of videos and requires a lot of inference overhead. To tackle these issues, we propose BridgeText Alignment (BTA) to link frame-level instance representations as a Brownian Bridge. On one hand, we can calculate the global descriptor of a Brownian bridge for capturing instance dynamics, which enables the alignment with texts to additionally consider temporal information rather than only the static information of each frame. On the other hand, according to the goal-conditioned property of the Brownian bridge, we can estimate the middle frame features via the start and the end frame features, so the global feature calculation of a Brownian bridge only needs to infer a few frames, which largely reduces inference overhead. We term our overall pipeline BriVIS. Following the training settings of previous works, BriVIS surpasses the SOTA (OV2Seg) by a clear margin. For example, on the challenging large-vocabulary datasets (BURST, LVVIS), BriVIS achieves 5.7 and 20.9 mAP, which exhibits +2.2∼+6.7 mAP improvement compared to OV2Seg. Furthermore, after training via BTA, using only the head and the tail frames for alignment improves the speed by 32% (2.77 → 1.88 s/iter) while just decreasing the performance by 0.2 mAP (21.1 → 20.9 mAP).
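The goal-conditioned property the abstract relies on has a simple closed form: the expected value of a Brownian bridge at time t, given its two endpoints, is a linear interpolation between them. A minimal sketch with toy feature vectors (not the paper's actual instance embeddings):

```python
import numpy as np

def bridge_expectation(x_start, x_end, t, T):
    """Goal-conditioned Brownian-bridge mean at time t in [0, T]:
    E[x_t | x_0, x_T] = (1 - t/T) * x_0 + (t/T) * x_T."""
    alpha = t / T
    return (1.0 - alpha) * x_start + alpha * x_end

# Estimate middle-frame features from only the head and tail frames.
x0 = np.zeros(4)    # head-frame instance feature (toy)
xT = np.ones(4)     # tail-frame instance feature (toy)
x_mid = bridge_expectation(x0, xT, t=2, T=4)
```

This is why inferring only the head and tail frames suffices for the global descriptor: every intermediate frame's expected feature is determined by the two endpoints.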

EAAI Journal 2025 Journal Article

Automatic weighted ensemble learning for ballast resistance estimation driven by vehicle-ground information fusion

  • Conghui Wang
  • Shiwu Yang
  • Chang Liu

Health management of transmission parameters for railway signal equipment is a key link in intelligent operation and maintenance. As a core parameter of track circuits, ballast resistance significantly affects signal transmission. To accurately and reliably assess its health state, an ensemble learning algorithm (ELA) is introduced to tackle deviations in appraisal decision boundaries. To address the complex weight calculation, model homogenization, and severe overfitting found in ELAs, an integration model based on an automatic weight allocation strategy (AWAS) is proposed, constructing a resistance estimation method driven by information fusion while maximizing generalization ability. First, to analyze the deterioration mechanism of ballast resistance, a transmission state model for vehicle-ground collaboration is established and its evolutionary rules are extracted. Second, the improved ELA leverages heterogeneous classifier optimization and automatic weighted soft voting, with its core ensemble strategy employing a secondary learner to map the fused datasets. Then, interpolation and denoising algorithms are applied as data preprocessing, facilitating effective fusion of heterogeneous vehicle-ground information. Finally, based on the occurrence of adverse conditions, an appropriate granularity is set to achieve state warning. The results indicate that the proposed AWAS achieves 98.52% testing accuracy for ballast resistance calculations and outperforms the alternatives.
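At its core, weighted soft voting is a normalized weighted average of per-classifier class probabilities. The sketch below illustrates only that fusion step; the paper's AWAS learns the weights via a secondary learner, which is omitted here, and all names and numbers are illustrative:

```python
import numpy as np

def weighted_soft_vote(probas, weights):
    """Fuse per-classifier class-probability matrices with given weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize so weights sum to 1
    stacked = np.stack(probas)                 # (n_models, n_samples, n_classes)
    fused = np.tensordot(weights, stacked, axes=1)
    return fused.argmax(axis=1), fused

# Three heterogeneous classifiers voting on two samples, two classes.
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.6, 0.4], [0.3, 0.7]])
p3 = np.array([[0.2, 0.8], [0.5, 0.5]])
labels, fused = weighted_soft_vote([p1, p2, p3], weights=[0.5, 0.3, 0.2])
```

Because the weights are normalized and each classifier outputs a proper distribution, the fused scores remain valid probabilities.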

IROS Conference 2025 Conference Paper

BoRe-Depth: Self-Supervised Monocular Depth Estimation with Boundary Refinement for Embedded Systems

  • Juan Li
  • Sheng Zhang
  • Chang Liu
  • Jie Li
  • Xu Zhang

Depth estimation is one of the key technologies for realizing 3D perception in unmanned systems. Monocular depth estimation has been widely researched because of its low-cost advantage, but the existing methods face the challenges of poor depth estimation performance and blurred object boundaries on embedded systems. In this paper, we propose a novel monocular depth estimation model, BoRe-Depth, which contains only 8.7M parameters. It can accurately estimate depth maps on embedded systems and significantly improves boundary quality. Firstly, we design an Enhanced Feature Adaptive Fusion Module (EFAF) which adaptively fuses depth features to enhance boundary detail representation. Secondly, we integrate semantic knowledge into the encoder to improve the object recognition and boundary perception capabilities. Finally, BoRe-Depth is deployed on NVIDIA Jetson Orin, and runs efficiently at 50.7 FPS. We demonstrate that the proposed model significantly outperforms previous lightweight models on multiple challenging datasets, and we provide detailed ablation studies for the proposed methods. The code is available at https://github.com/liangxiansheng093/BoRe-Depth.

ICML Conference 2025 Conference Paper

Counterfactual Voting Adjustment for Quality Assessment and Fairer Voting in Online Platforms with Helpfulness Evaluation

  • Chang Liu
  • Yixin Wang
  • Moontae Lee

Efficient access to high-quality information is vital for online platforms. To promote more useful information, users not only create new content but also evaluate existing content, often through helpfulness voting. Although aggregated votes help service providers rank their user content, these votes are often biased by disparate accessibility per position and the cascaded influence of prior votes. For a fairer assessment of information quality, we propose the Counterfactual Voting Adjustment (CVA), a causal framework that accounts for the context in which individual votes are cast. Through preliminary and semi-synthetic experiments, we show that CVA effectively models the position and herding biases, accurately recovering the predefined content quality. In a real experiment, we demonstrate that reranking content based on the learned quality by CVA exhibits stronger alignment with both user sentiment and quality evaluation assessed by GPT-4o, outperforming system rankings based on aggregated votes and model-based rerankings without causal inference. Beyond the individual quality inference, our embeddings offer comparative insights into the behavioral dynamics of expert user groups across 120 major StackExchange communities.

AAAI Conference 2025 Conference Paper

DCA: Dividing and Conquering Amnesia in Incremental Object Detection

  • Aoting Zhang
  • Dongbao Yang
  • Chang Liu
  • Xiaopeng Hong
  • Miao Shang
  • Yu Zhou

Incremental object detection (IOD) aims to cultivate an object detector that can continuously localize and recognize novel classes while preserving its performance on previous classes. Existing methods achieve certain success by improving knowledge distillation and exemplar replay for transformer-based detection frameworks, but the intrinsic forgetting mechanisms remain underexplored. In this paper, we dive into the cause of forgetting and discover forgetting imbalance between localization and recognition in transformer-based IOD, which means that localization is less-forgetting and can generalize to future classes, whereas catastrophic forgetting occurs primarily on recognition. Based on these insights, we propose a Divide-and-Conquer Amnesia (DCA) strategy, which redesigns the transformer-based IOD into a localization-then-recognition process. DCA can well maintain and transfer the localization ability, leaving decoupled fragile recognition to be specially conquered. To reduce feature drift in recognition, we leverage semantic knowledge encoded in pre-trained language models to anchor class representations within a unified feature space across incremental tasks. This involves designing a duplex classifier fusion and embedding class semantic features into the recognition decoding process in the form of queries. Extensive experiments validate that our approach achieves state-of-the-art performance, especially for long-term incremental scenarios. For example, under the four-step setting on MS-COCO, our DCA strategy significantly improves the final AP by 6.9%.

AAAI Conference 2025 Conference Paper

DigitalLLaVA: Incorporating Digital Cognition Capability for Physical World Comprehension in Multimodal LLMs

  • Shiyu Li
  • Pengxu Wei
  • Pengchong Qiao
  • Chang Liu
  • Jie Chen

Multimodal Large Language Models (MLLMs) have shown remarkable cognitive capabilities in various cross-modal tasks. However, existing MLLMs struggle with tasks that require physical digital cognition, such as accurately reading an electric meter or pressure gauge. This limitation significantly reduces their effectiveness in practical applications like industrial monitoring and home energy management, where digital sensors are not feasible. For humans, physical digits are artificially defined quantities presented on specific carriers, which require training to recognize. As existing MLLMs are only pre-trained in the manner of object recognition, they fail to comprehend the relationship between digital carriers and their readings. To this end, referring to human behavior, we propose a novel DigitalLLaVA method to explicitly inject digital cognitive abilities into MLLMs in a two-step manner. In the first step, to improve the MLLM's understanding of physical digit carriers, we propose a digit carrier mapping method. This step utilizes object-level text-image pairs to enhance the model's comprehension of objects containing physical digits. For the second step, unlike previous methods that rely on sequential digital prediction or digit regression, we propose a 32-bit floating-point simulation approach that treats digit prediction as a whole. Using digit-level text-image pairs, we train three float heads to predict 32-bit floating-point numbers using 0/1 binary classification. This step significantly reduces the search space, making the prediction process more robust and straightforward. Simple yet effective, our method can identify very precise metrics (i.e., accurate to ±0.001) and provide floating-point results, showing its applicability in digital carrier domains.
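Treating a reading as 32 binary classifications means the decoder only has to pack the predicted bits into an IEEE-754 float32. A sketch of that encode/decode step, assuming MSB-first bit order (helper names are ours; the paper's float heads are not reproduced):

```python
import struct

def bits_to_float(bits):
    """Decode 32 independent 0/1 decisions into an IEEE-754 float32 reading."""
    assert len(bits) == 32
    word = 0
    for b in bits:                  # MSB-first: sign, exponent, mantissa
        word = (word << 1) | (b & 1)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

def float_to_bits(value):
    """Encode a ground-truth reading as 32 binary training targets."""
    word = int.from_bytes(struct.pack(">f", value), "big")
    return [(word >> i) & 1 for i in range(31, -1, -1)]

bits = float_to_bits(3.125)      # 3.125 is exactly representable in float32
reading = bits_to_float(bits)    # round-trips to 3.125
```

Each head only ever faces a 2-way decision, which is the search-space reduction the abstract credits for robustness.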

NeurIPS Conference 2025 Conference Paper

E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products

  • Yunyang Li
  • Lin Huang
  • Zhihao Ding
  • Xinran Wei
  • Chu Wang
  • Han Yang
  • Zun Wang
  • Chang Liu

Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science. However, EGNNs face substantial computational challenges due to the high cost of constructing edge features via spherical tensor products, making them almost impractical for large-scale systems. To address this limitation, we introduce E2Former, an equivariant and efficient transformer architecture that incorporates a Wigner 6j convolution (Wigner 6j Conv). By shifting the computational burden from edges to nodes, Wigner 6j Conv reduces the complexity from O(|E|) to O(|V|) while preserving both the model's expressive power and rotational equivariance. We show that this approach achieves a 7x–30x speedup compared to conventional SO(3) convolutions. Furthermore, our empirical results demonstrate that the derived E2Former mitigates the computational challenges of existing approaches without compromising the ability to capture detailed geometric information. This development could suggest a promising direction for scalable molecular modeling.

AAAI Conference 2025 Conference Paper

Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training

  • Yunwei Lan
  • Zhigao Cui
  • Chang Liu
  • Jialun Peng
  • Nian Wang
  • Xin Luo
  • Dong Liu

Unpaired training has been verified as one of the most effective paradigms for real scene dehazing by learning from unpaired real-world hazy and clear images. Although numerous studies have been proposed, current methods generalize poorly to diverse real scenes due to limited feature representation and insufficient use of real-world priors. Inspired by the strong generative capabilities of diffusion models in producing both hazy and clear images, we exploit diffusion prior for real-world image dehazing, and propose an unpaired framework named Diff-Dehazer. Specifically, we leverage diffusion prior as bijective mapping learners within CycleGAN, a classic unpaired learning framework. Considering that physical priors contain pivotal statistical information about real-world data, we further excavate real-world knowledge by integrating physical priors into our framework. Furthermore, we introduce a new perspective for adequately leveraging the representation ability of diffusion models by removing degradation in image and text modalities, so as to improve the dehazing effect. Extensive experiments on multiple real-world datasets demonstrate the superior performance of our method.

NeurIPS Conference 2025 Conference Paper

FedWMSAM: Fast and Flat Federated Learning via Weighted Momentum and Sharpness-Aware Minimization

  • Tianle Li
  • Yongzhi Huang
  • Linshan Jiang
  • Chang Liu
  • Qipeng Xie
  • Wenfeng Du
  • Lu Wang
  • Kaishun Wu

In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local–global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a momentum-guided global perturbation from server-aggregated momentum to align clients' SAM directions with the global descent geometry, enabling a single-backprop SAM approximation that preserves efficiency. Second, we couple momentum and SAM via a cosine-similarity adaptive rule, yielding an early-momentum, late-SAM two-phase training schedule. On the theory side, we provide a non-IID convergence bound that explicitly models the perturbation-induced variance σ_ρ² = σ² + (Lρ)² and its dependence on (S, K, R, N). We conduct extensive experiments on multiple datasets and model architectures, and the results validate the effectiveness, adaptability, and robustness of our method, demonstrating its superiority in addressing the optimization challenges of federated learning. Our code is available at https://github.com/Li-Tian-Le/NeurlPS_FedWMSAM.
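The momentum-guided single-backprop SAM step can be sketched as: perturb the weights along the normalized server-aggregated momentum (scaled by ρ), then take one gradient step evaluated at the perturbed point. A toy quadratic-loss illustration with hypothetical names, not the paper's implementation:

```python
import numpy as np

def sam_step_with_global_momentum(w, grad_fn, m_global, rho=0.05, lr=0.1):
    """One single-backprop SAM update: perturb along the server momentum
    direction (instead of the local gradient), then descend from there."""
    eps = rho * m_global / (np.linalg.norm(m_global) + 1e-12)
    g = grad_fn(w + eps)        # the single backward pass, at the perturbed point
    return w - lr * g

# Toy client loss L(w) = 0.5 * ||w||^2, so grad_fn(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
m = np.array([1.0, 0.0])        # hypothetical server-aggregated momentum
w_new = sam_step_with_global_momentum(w, grad_fn, m, rho=0.1, lr=0.5)
```

Standard SAM needs a first backward pass just to compute the perturbation direction; reusing the shared momentum direction is what removes that extra pass.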

IROS Conference 2025 Conference Paper

Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR

  • Kelin Li
  • Shubham M. Wagh
  • Nitish Sharma
  • Saksham Bhadani
  • Wei Chen
  • Chang Liu
  • Petar Kormushev

Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains a challenge. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we conducted a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation compared to the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.

NeurIPS Conference 2025 Conference Paper

How Does Topology Bias Distort Message Passing in Graph Recommender? A Dirichlet Energy Perspective

  • Yanbiao Ji
  • Yue Ding
  • Dan Luo
  • Chang Liu
  • Yuxiang Lu
  • Xin Xin
  • Hongtao Lu

Graph-based recommender systems have achieved remarkable effectiveness by modeling high-order interactions between users and items. However, such approaches are significantly undermined by popularity bias, which distorts the interaction graph’s structure—referred to as topology bias. This leads to overrepresentation of popular items, thereby reinforcing biases and fairness issues through the user-system feedback loop. Despite attempts to study this effect, most prior work focuses on embedding- or gradient-level bias, overlooking how topology bias fundamentally distorts the message passing process itself. We bridge this gap by providing an empirical and theoretical analysis from a Dirichlet energy perspective, revealing that graph message passing inherently amplifies topology bias and consistently benefits highly connected nodes. To address these limitations, we propose Test-time Simplicial Propagation (TSP), which extends message passing to higher-order simplicial complexes. By incorporating richer structures beyond pairwise connections, TSP mitigates harmful topology bias and substantially improves the representation and recommendation of long-tail items during inference. Extensive experiments across five real-world datasets demonstrate the superiority of our approach in mitigating topology bias and enhancing recommendation quality. The implementation code is available at https://github.com/sotaagi/TSP.
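
The Dirichlet-energy lens used above is easy to reproduce on a toy graph (a self-contained sketch with hypothetical helper names, not the paper's code): the energy sums squared feature differences across edges, and one round of mean-neighbor message passing drives it down, i.e. smooths features toward their neighborhoods.

```python
def dirichlet_energy(x, edges):
    """Unnormalized Dirichlet energy of scalar node features x
    over a list of undirected edges (i, j)."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

def mean_aggregate(x, edges):
    """One round of simple mean-neighbor message passing."""
    nbrs = {i: [] for i in range(len(x))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return [sum(x[j] for j in nbrs[i]) / len(nbrs[i]) if nbrs[i] else x[i]
            for i in range(len(x))]
```

On the path graph 0–1–2 with features [0, 1, 2], the energy is 2 before aggregation and 0 after one round: message passing collapses differences, which is exactly the smoothing effect the paper argues favors highly connected nodes.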

JBHI Journal 2025 Journal Article

Improving Foundation Model for Endoscopy Video Analysis via Representation Learning on Long Sequences

  • Zhao Wang
  • Chang Liu
  • Lingting Zhu
  • Tongtong Wang
  • Shaoting Zhang
  • Qi Dou

Recent advancements in endoscopy video analysis have relied on the utilization of relatively short video clips extracted from longer videos or millions of individual frames. However, these approaches tend to neglect the domain-specific characteristics of endoscopy data, which is typically presented as a long stream containing valuable semantic spatial and temporal information. To address this limitation, we propose EndoFM-LV, a foundation model developed under a minute-level pre-training framework upon long endoscopy video sequences. To be specific, we propose a novel masked token modeling scheme within a teacher-student framework for self-supervised video pre-training, which is tailored for learning representations from long video sequences. For pre-training, we construct a large-scale long endoscopy video dataset comprising 6,469 long endoscopic video samples, each longer than 1 minute and totaling over 13 million frames. Our EndoFM-LV is evaluated on four types of endoscopy tasks, namely classification, segmentation, detection, and workflow recognition, serving as the backbone or temporal module. Extensive experimental results demonstrate that our framework outperforms previous state-of-the-art video-based and frame-based approaches by a significant margin, surpassing Endo-FM (5.6% F1, 9.3% Dice, 8.4% F1, and 3.3% accuracy for classification, segmentation, detection, and workflow recognition) and EndoSSL (5.0% F1, 8.1% Dice, 9.3% F1, and 3.1% accuracy for classification, segmentation, detection, and workflow recognition).

AAAI Conference 2025 Conference Paper

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

  • Yun Qu
  • Yuhang Jiang
  • Boyuan Wang
  • Yixiu Mao
  • Cheems Wang
  • Chang Liu
  • Xiangyang Ji

Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, sometimes with only an episodic reward. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Promisingly, large language models (LLMs) encompass rich decision-making knowledge and provide a plausible tool for reward redistribution. Even so, deploying LLMs in this setting is non-trivial due to the misalignment between linguistic knowledge and the required symbolic form, together with the inherent randomness and hallucinations of LLM inference. To tackle these issues, we introduce LaRe, a novel LLM-empowered symbolic-based decision-making framework, to improve credit assignment. Key to LaRe is the concept of the Latent Reward, which works as a multi-dimensional performance evaluation, enabling more interpretable goal attainment from various perspectives and facilitating more effective reward redistribution. We show that code semantically generated by an LLM can bridge linguistic knowledge and symbolic latent rewards, since the code is executable on symbolic objects. Meanwhile, we design latent reward self-verification to increase the stability and reliability of LLM inference. Theoretically, eliminating reward-irrelevant redundancy in the latent reward benefits RL performance through more accurate reward estimation. Extensive experimental results show that LaRe (i) achieves superior temporal credit assignment compared to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground-truth rewards on certain tasks.

NeurIPS Conference 2025 Conference Paper

Long-term Intracortical Neural activity and Kinematics (LINK): An intracortical neural dataset for chronic brain-machine interfaces, neuroscience, and machine learning

  • Hisham Temmar
  • Yixuan Wang
  • Nina Gill
  • Nicholas Mellon
  • Chang Liu
  • Luis Cubillos
  • Rio Parsons
  • Joseph Costello

Intracortical brain-machine interfaces (iBMIs) have enabled movement and speech in people living with paralysis by using neural data to decode behaviors in real-time. However, intracortical neural recordings exhibit significant instabilities over time, which poses problems for iBMIs, neuroscience, and machine learning. For iBMIs, neural instabilities require frequent decoder recalibration to maintain high performance, a critical bottleneck for real-world translation. Several approaches have been developed to address this issue, and the field has recognized the need for standardized datasets on which to compare them, but no standard dataset exists for evaluation over year-long timescales. In neuroscience, a growing body of research attempts to elucidate the latent computations performed by populations of neurons. Nonstationarity in neural recordings imposes significant challenges on the design of these studies, so a dataset containing recordings over large time spans would improve methods to account for instabilities. In machine learning, continuous domain adaptation of temporal data is an area of active research, and a dataset containing distribution shifts over long time scales would benefit researchers. To address these gaps, we present the LINK Dataset (Long-term Intracortical Neural activity and Kinematics), which contains intracortical spiking activity and kinematic data from 312 sessions of a non-human primate performing a dexterous, 2 degree-of-freedom finger movement task, spanning 1,242 days. We also present longitudinal analyses of the dataset’s neural spiking activity and its relationship to kinematics, as well as overall decoding performance using linear and neural network models. The LINK dataset (https://dandiarchive.org/dandiset/001201) and code (https://github.com/chesteklab/LINK_dataset) are freely available to the public.

YNIMG Journal 2025 Journal Article

Loss aversion and evidence accumulation in short-video addiction: A behavioral and neuroimaging investigation

  • Chang Liu
  • Jinlian Wang
  • Hanbing Li
  • Qianyi Shangguan
  • Weipeng Jin
  • Wenwei Zhu
  • Pinchun Wang
  • Xuyi Chen

Excessive use of short-video platforms not only impairs decision-making processes but also predisposes individuals to addictive behaviors. This study investigated the relationship between short-video addiction (SVA) symptoms and loss aversion (LA), delving into the underlying computational and neural mechanisms using the drift diffusion model (DDM) and the inter-subject representational similarity analysis (IS-RSA). Behavioral analyses revealed a significant negative correlation between SVA symptoms and the LA coefficient (lnλ). Additionally, the DDM-based drift rate (v) was found to mediate this relationship. Neuroimaging analyses further indicated that SVA symptoms were negatively associated with gain-related activity in the right precuneus, while positively correlating with loss-related activity in the right cerebellum and left postcentral gyrus. Notably, precuneus activation during gain processing mediated the relationship between SVA symptoms and both lnλ and drift rate. IS-RSA revealed that inter-subject variations in SVA symptoms were significantly associated with distinct activation patterns related to gain processing in the frontoparietal network (e.g., frontal pole, inferior frontal gyrus, and supramarginal gyrus) and motor network (e.g., precentral), as well as loss-related activation patterns in the motor networks (e.g., postcentral and pre-supplementary motor area). Similar patterns emerged when examining simultaneous gain and loss-related activation patterns. Mediation analyses further demonstrated that functional activation patterns in the motor network mediated the relationships between inter-subject variations in SVA symptoms and both loss-aversion and psychological processing patterns (e.g., decision threshold, drift rate, and non-decision time). 
These findings provide novel insights into the cognitive and neural mechanisms underlying the influence of SVA symptoms on loss aversion, and suggest the critical roles of evidence accumulation speed and specific brain activation patterns, particularly within the cognitive control and motor networks, in shaping decision-making biases associated with addiction.

EAAI Journal 2025 Journal Article

MirrorDiff: Prompt redescription for zero-shot grounded text-to-image generation with attention modulation

  • Chang Liu
  • Mingwen Shao
  • Zhengyi Gong
  • Xiang Lv
  • Lingzhuang Meng

Large-scale layout-conditioned text-to-image diffusion models have made significant progress and achieved remarkable results in generating diverse, high-quality images in which objects appear in specified regions. However, existing methods still suffer from attribute coupling, implausible spatial-relationship expression, and missing objects when the prompt is complex, with multiple objects each carrying multiple attributes. In addition, it is difficult for users to provide precise layout conditions for complex prompts. To address these issues, we propose MirrorDiff, a novel training-free grounded text-to-image-to-text framework that iteratively corrects inaccurate content in synthesized images through prompt redescription. Specifically, we first utilize large language models, which can understand visual concepts and propose plausible arrangements, as a layout generator that produces scene layouts for complex prompts, helping users obtain precise layouts more conveniently. Subsequently, to address missing small objects, we design a layout-guided attention modulation strategy that properly adjusts attention maps during the diffusion generation process, effectively increasing the attention given to small objects. Additionally, we propose semantic text regeneration supervision to constrain the redescription to remain semantically consistent with the given text, which mitigates attribute coupling and spatial-relationship failures. We conduct extensive experiments on four benchmarks, and our method achieves the best results in all categories on the Holistic, Reliable and Scalable benchmark, showing that MirrorDiff achieves state-of-the-art results both quantitatively and qualitatively compared with current superior models.

NeurIPS Conference 2025 Conference Paper

Near-Optimal Sample Complexity for Online Constrained MDPs

  • Chang Liu
  • Yunfan Li
  • Lin Yang

Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints while optimizing performance. However, existing methods often suffer from significant safety violations or require a high sample complexity to generate near-optimal policies. We address two settings: relaxed feasibility, where small violations are allowed, and strict feasibility, where no violation is allowed. We propose a model-based primal-dual algorithm that balances regret and bounded constraint violations, drawing on techniques from online RL and constrained optimization. For relaxed feasibility, we prove that our algorithm returns an $\varepsilon$-optimal policy with $\varepsilon$-bounded violation with arbitrarily high probability, requiring $\tilde{O}\left(\frac{SAH^3}{\varepsilon^2}\right)$ learning episodes, matching the lower bound for unconstrained MDPs. For strict feasibility, we prove that our algorithm returns an $\varepsilon$-optimal policy with zero violation with arbitrarily high probability, requiring $\tilde{O}\left(\frac{SAH^5}{\varepsilon^2\zeta^2}\right)$ learning episodes, where $\zeta$ is the problem-dependent Slater constant characterizing the size of the feasible region. This result matches the lower bound for learning CMDPs with access to a generative model. Episodic tabular CMDPs serve as a crucial benchmark for safe RL, providing a structured environment for theoretical analysis and algorithmic validation. Our results demonstrate that learning CMDPs in an online setting is as easy as learning with a generative model and is no more challenging than learning unconstrained MDPs when small violations are allowed.

YNIMG Journal 2025 Journal Article

Neural, psychological, and transcriptomic predictors of short video addiction: A multi-site longitudinal study of fear of missing out and negative affect

  • Chang Liu
  • Hanbing Li
  • Qianyi Shangguan
  • Yuyang Zeng
  • Pinchun Wang
  • Zong Zhang
  • Weipeng Jin
  • Qiang Wang

Short video addiction symptoms (SVAS) have become increasingly prevalent, yet their longitudinal neurobiological basis remains unclear. In a multi-site longitudinal study (n = 280), we examined whether baseline brain features and dispositional traits, negative affect (NA) and fear of missing out (FoMO), predict future SVAS. Participants completed self-report measures and MRI scans at baseline, with follow-up assessments conducted after 5 months to 5 years. Behaviorally, both baseline and follow-up NA and FoMO significantly predicted SVAS. Structurally, baseline gray matter volume (GMV) in the frontal-parietal network (FPN), default mode network (DMN), and hippocampal morphological patterns predicted follow-up SVAS severity. Functionally, baseline regional homogeneity (ReHo) in the FPN, DMN, ventral attention network (VAN), and sensorimotor network (SMN) also predicted SVAS. Parallel multiple mediation analyses revealed a dissociable neural architecture: hippocampal morphological patterns predicted SVAS via the unique indirect effect of follow-up FoMO, whereas DMN functional profiles (e.g., ReHo) predicted SVAS via follow-up NA. Notably, the VAN served as an integrative hub, exerting its influence via the unique indirect effects of both follow-up NA and FoMO. Transcriptomic analyses linked SVAS-related ReHo to two gene sets, namely positively correlated (SVAS-ReHo⁺) and negatively correlated (SVAS-ReHo⁻) genes. SVAS-ReHo⁺ genes were enriched in RNA processing and vascular signaling and expressed in endothelial cells; SVAS-ReHo⁻ genes were enriched in synaptic transmission and expressed in excitatory and inhibitory neurons. Spatial-temporal patterns showed SVAS-ReHo⁺ genes were expressed in subcortical regions across adolescence, whereas SVAS-ReHo⁻ genes were prominent in cortical-limbic areas during postnatal development. Functional decoding linked SVAS-ReHo⁺ genes to sensorimotor function and metabolism, and SVAS-ReHo⁻ genes to emotion and psychiatric risk.
Together, these findings highlight dissociable structural, functional, and molecular pathways through which FoMO and NA contribute to short video addiction development.

YNIMG Journal 2025 Journal Article

Neuroanatomical and functional substrates of the short video addiction and its association with brain transcriptomic and cellular architecture

  • Yuanyuan Gao
  • Ying Hu
  • Jinlian Wang
  • Chang Liu
  • Hohjin Im
  • Weipeng Jin
  • Wenwei Zhu
  • Wei Ge

Short video addiction (SVA) has emerged as a growing behavioral and social issue, driven by the widespread use of digital platforms that provide highly engaging, personalized, and brief video content. We investigated the neuroanatomical and functional substrates of SVA symptoms, alongside brain transcriptomic and cellular characteristics, using Inter-Subject Representational Similarity Analysis (IS-RSA) and transcriptomic approaches. Behaviorally, we found that dispositional envy was associated with SVA. Structurally, SVA was positively correlated with increased morphological volumes in the orbitofrontal cortex (OFC) and bilateral cerebellum. Functionally, the dorsolateral prefrontal cortex (DLPFC), posterior cingulate cortex (PCC), cerebellum, and temporal pole (TP) exhibited heightened spontaneous activity, which was positively correlated with SVA severity. Transcriptomic and cellular analyses also showed specific genes linked to gray matter volume (GMV) associated with SVA, with predominant expression in excitatory and inhibitory neurons. These genes showed distinct spatiotemporal expression patterns in the cerebellum during adolescence. This study offers a comprehensive framework integrating structural, functional, and neurochemical evidence to highlight the neural-transcriptomic underpinnings of SVA symptoms in a non-clinical population.

NeurIPS Conference 2025 Conference Paper

One Filters All: A Generalist Filter For State Estimation

  • Shiqi Liu
  • Wenhan Cao
  • Chang Liu
  • Zeyu He
  • Tianyi Zhang
  • Yinuo Wang
  • Shengbo Eben Li

Estimating hidden states in dynamical systems, also known as optimal filtering, is a long-standing problem in various fields of science and engineering. In this paper, we introduce a general filtering framework, LLM-Filter, which leverages large language models (LLMs) for state estimation by embedding noisy observations with text prototypes. In a number of experiments on classical dynamical systems, we find, first, that state estimation can significantly benefit from the knowledge embedded in pre-trained LLMs: by achieving proper modality alignment with the frozen LLM, LLM-Filter outperforms state-of-the-art learning-based approaches. Second, we carefully design the prompt structure, System-as-Prompt (SaP), incorporating task instructions that enable the LLM to understand tasks and adapt to specific systems. Guided by these prompts, LLM-Filter exhibits exceptional generalization, performing filtering tasks accurately in changed or even unseen environments. We further observe scaling-law behavior in LLM-Filter, where accuracy improves with larger model sizes and longer training times. These findings make LLM-Filter a promising foundation model for filtering.

ICML Conference 2025 Conference Paper

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

  • Siqi Kou
  • Jiachun Jin
  • Zhihong Liu
  • Chang Liu
  • Ye Ma
  • Jian Jia
  • Quan Chen 0006
  • Peng Jiang 0002

We introduce Orthus, a unified multimodal model that excels in generating interleaved images and text from mixed-modality inputs by simultaneously handling discrete text tokens and continuous image features under the AR modeling principle. The continuous treatment of visual signals minimizes the information loss while the fully AR formulation renders the characterization of the correlation between modalities straightforward. Orthus leverages these advantages through its modality-specific heads—one regular language modeling (LM) head predicts discrete text tokens and one diffusion head generates continuous image features. We devise an efficient strategy for building Orthus—by substituting the Vector Quantization (VQ) operation in the existing unified AR model with a soft alternative, introducing a diffusion head, and tuning the added modules to reconstruct images, we can create an Orthus-base model effortlessly (e.g., within 72 A100 GPU hours). Orthus-base can further embrace post-training to craft lengthy interleaved image-text, reflecting the potential for handling intricate real-world tasks. For visual understanding and generation, Orthus achieves a GenEval score of 0.58 and an MME-P score of 1265.8 using 7B parameters, outperforming competing baselines including Show-o and Chameleon.

TMLR Journal 2025 Journal Article

Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

  • Liya Guo
  • Zun Wang
  • Chang Liu
  • Junzhe Li
  • Pipi Hu
  • Yi Zhu
  • Tao Qin

The ensemble average of physical properties of molecules is closely related to the distribution of molecular conformations, and sampling such distributions is a fundamental challenge in physics and chemistry. Traditional methods like molecular dynamics (MD) simulations and Markov chain Monte Carlo (MCMC) sampling are commonly used but can be time-consuming and costly. Recently, diffusion models have emerged as efficient alternatives by learning the distribution of training data. However, obtaining an unbiased target distribution remains expensive, primarily because it requires satisfying ergodicity. To tackle these challenges, we propose Potential Score Matching (PSM), an approach that utilizes the potential energy gradient to guide generative models. PSM does not require exact energy functions and can debias sample distributions even when trained on limited and biased data. Our method outperforms existing state-of-the-art (SOTA) models on the Lennard-Jones (LJ) potential, a commonly used toy model. Furthermore, we extend the evaluation of PSM to high-dimensional problems using the MD17 and MD22 datasets. The results demonstrate that molecular distributions generated by PSM more closely approximate the Boltzmann distribution compared to traditional diffusion models.
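
The guidance signal such an approach relies on is the score of the Boltzmann density, ∇ log p(x) = −∇E(x)/kT. A minimal sketch on the LJ toy model mentioned above (finite-difference gradient, hypothetical function names; not the paper's estimator):

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def boltzmann_score(r, kT=1.0, h=1e-5):
    """Score of the Boltzmann density p(r) ~ exp(-E(r)/kT),
    d/dr log p(r) = -E'(r)/kT, via central finite differences."""
    dE = (lj_energy(r + h) - lj_energy(r - h)) / (2 * h)
    return -dE / kT
```

At the LJ minimum r = 2^(1/6)·σ the score vanishes; for smaller r it is positive and for larger r negative, i.e. the score always pushes samples back toward the potential well, which is the debiasing direction energy guidance exploits.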

IJCAI Conference 2025 Conference Paper

Query-Based and Unnoticeable Graph Injection Attack from Neighborhood Perspective

  • Chang Liu
  • Hai Huang
  • Xingquan Zuo

The robustness of Graph Neural Networks (GNNs) has become an increasingly important topic due to their expanding range of applications. Various attack methods have been proposed to explore the vulnerabilities of GNNs, ranging from Graph Modification Attacks (GMA) to the more practical and flexible Graph Injection Attacks (GIA). However, existing methods face two key challenges: (i) their reliance on surrogate models, which often leads to reduced attack effectiveness due to structural differences and prior biases, and (ii) existing GIA methods often sacrifice attack success rates in undefended settings to bypass certain defense models, thereby limiting their overall effectiveness. To overcome these limitations, we propose QUGIA, a Query-based and Unnoticeable Graph Injection Attack. QUGIA injects nodes by first selecting edges based on victim node connections and then generating node features using a Bayesian framework. This ensures that the injected nodes are similar to the original graph nodes, implicitly preserving homophily and making the attack more unnoticeable. Unlike previous methods, QUGIA does not rely on surrogate models, thereby avoiding performance degradation and achieving better generalization. Extensive experiments on six real-world datasets with diverse characteristics demonstrate that QUGIA achieves unnoticeable attacks and outperforms state-of-the-art attackers. Our code is available at: https://anonymous.4open.science/r/QUGIA-588E/.

AAAI Conference 2025 Conference Paper

Robust Heterogeneous Graph Classification for Molecular Property Prediction with Information Bottleneck

  • Zhibin Ni
  • Chang Liu
  • Hai Wan
  • Xibin Zhao

Heterogeneous Graph Neural Networks (HGNNs) have achieved state-of-the-art performance in classifying molecular graphs, capitalizing on their ability to capture rich semantics. However, HGNNs for molecule property prediction exhibit significant susceptibility to adversarial attacks—a challenge that prior research has entirely overlooked. To fill this gap, this paper introduces the first study focused on robust graph-level representation learning tailored for heterogeneous molecular graphs. To achieve this goal, we propose a comprehensive Robust Heterogeneous Graph Classification (RHGC) framework grounded in the Information Bottleneck principle, which aims to identify the most informative and least noisy heterogeneous subgraphs to derive robust, holistic representations. This is specifically accomplished through a dedicated Node Semantic Purifier, which enhances node-level and semantic-level robustness by eliminating label-irrelevant interference using graph stochastic attention and the Hilbert-Schmidt Independence Criterion, along with a Global Graph Disentanglement method, which improves graph-level robustness by addressing information leak. Experiments on three molecular benchmarks demonstrate that RHGC enhances accuracy by an average of 5.06% under all three attack settings and meanwhile by 4.33% on clean data.
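
The Hilbert-Schmidt Independence Criterion mentioned above has a compact biased empirical estimator, HSIC = tr(K̃L̃)/(n−1)², where K̃ and L̃ are doubly-centered kernel matrices of the two samples. A self-contained toy in plain Python (our function names, Gaussian kernels on scalar samples; not the RHGC code):

```python
import math

def _gauss_kernel(v, sigma):
    """Gaussian kernel matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2))."""
    n = len(v)
    return [[math.exp(-((v[i] - v[j]) ** 2) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def _center(K):
    """Double-center K, i.e. compute HKH with H = I - (1/n) * ones."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)]
            for i in range(n)]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace of the product of the centered
    kernel matrices, scaled by 1/(n-1)^2. Zero means (empirically)
    independent; larger values mean stronger dependence."""
    n = len(x)
    Kc = _center(_gauss_kernel(x, sigma))
    Lc = _center(_gauss_kernel(y, sigma))
    tr = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return tr / (n - 1) ** 2
```

Feeding a sample against itself gives a clearly positive value, while a constant (hence independent) second sample yields zero, which is the property a robustness framework can use to force representations to be independent of label-irrelevant noise.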

AAAI Conference 2025 Conference Paper

Specifying What You Know or Not for Multi-Label Class-Incremental Learning

  • Aoting Zhang
  • Dongbao Yang
  • Chang Liu
  • Xiaopeng Hong
  • Yu Zhou

Existing class-incremental learning is mainly designed for single-label classification tasks, leaving it ill-equipped for multi-label scenarios due to the inherent contradiction of learning objectives for samples with incomplete labels. We argue that the main challenge in overcoming this contradiction in multi-label class-incremental learning (MLCIL) lies in the model's inability to clearly distinguish between known and unknown knowledge. This ambiguity hinders the model's ability to retain historical knowledge, master current classes, and prepare for future learning simultaneously. In this paper, we aim to specify what is known or not, so as to accommodate Historical, Current, and Prospective knowledge for MLCIL, and propose a novel framework termed HCP. Specifically, (i) we clarify the known classes by dynamic feature purification and recall enhancement with a distribution prior, enhancing the precision and retention of known information; (ii) we design prospective knowledge mining to probe the unknown, preparing the model for future learning. Extensive experiments validate that our method effectively alleviates catastrophic forgetting in MLCIL, surpassing the previous state-of-the-art by 3.3% in average accuracy on the MS-COCO B0-C10 setting without replay buffers.

EAAI Journal 2025 Journal Article

Towards pedestrian head tracking: A benchmark dataset and a multi-source data fusion network

  • Kailai Sun
  • Xinwei Wang
  • Shaobo Liu
  • Qianchuan Zhao
  • Gao Huang
  • Chang Liu

Pedestrian detection and tracking in crowded video sequences have many applications, including autonomous driving, robot navigation and pedestrian flow analysis. However, detecting and tracking pedestrians in high-density crowds face many challenges, including intra-class occlusions, complex motions, and diverse poses. Although artificial intelligence (AI) models have achieved great progress in head detection, head tracking datasets and methods are extremely lacking. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference). It is of great importance to develop new head tracking datasets and methods. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-source Data Fusion Network (MDFN). The dataset has features that are of considerable interest, including 10 diverse scenes of 50,528 frames with about 2,366,249 heads and 2,358 tracks. Our dataset contains diverse pedestrian moving speeds, directions, and complex crowd pedestrian flows with collision avoidance behaviors. Existing state-of-the-art (SOTA) algorithms are tested and compared on the Cchead dataset. MDFN is the first end-to-end convolutional neural network (CNN)-based head detection and tracking network that jointly trains Red, Green, Blue (RGB) frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps in videos. Ablation experiments confirm the significance of multi-source data fusion. Compared with SOTA pedestrian detection and tracking methods, MDFN achieves superior performance across three datasets: Cchead, Restaurant and Crowd of Heads Dataset (CroHD). To promote further development, we share our source code and trained models with global researchers: https://github.com/kailaisun/Cchead. We hope our dataset becomes an essential resource for developing pedestrian tracking in dense crowds.

AAAI Conference 2025 Conference Paper

Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation

  • Wenzhao Xiang
  • Chang Liu
  • Hongyang Yu
  • Xilin Chen

Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks. However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily. Addressing this challenge requires a shift towards a compact representation of features during MIM reconstruction. Frequency domain analysis provides a promising avenue for achieving compact image feature representation. In contrast to the commonly used Fourier transform, wavelet transform not only offers frequency information but also preserves spatial characteristics and multi-level features of the image. Additionally, the multi-level decomposition process of wavelet transformation aligns well with the hierarchical architecture of modern neural networks. In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM. Specifically, we conduct multi-level decomposition of images using wavelet transform, utilizing wavelet coefficients from different levels to construct distinct reconstruction targets representing various frequencies and scales. These reconstruction targets are then integrated into the MIM process, with adjustable weights assigned to prioritize the most crucial information. Extensive experiments demonstrate that our method achieves comparable or superior performance across various downstream tasks while exhibiting higher training efficiency.
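
The multi-level decomposition described above can be illustrated with the simplest wavelet, an (unnormalized) Haar transform on a 1-D signal: each level splits the current approximation into a coarser approximation plus a detail band, and each band is a candidate reconstruction target at its own frequency and scale (a sketch with our own names, not the authors' pipeline):

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform: pairwise averages
    (approximation) and pairwise half-differences (detail) of an
    even-length 1-D signal."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Multi-level decomposition: repeatedly split the approximation,
    collecting one detail band per level."""
    bands = []
    for _ in range(levels):
        signal, detail = haar_step(signal)
        bands.append(detail)
    return signal, bands
```

For example, two levels on [4, 2, 6, 8] yield the coarse approximation [5] plus detail bands [1, −1] (fine scale) and [−2] (coarse scale); weighting such bands differently is the mechanism by which a wavelet-driven MIM objective can prioritize structure over texture.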

NeurIPS Conference 2024 Conference Paper

Assemblage: Automatic Binary Dataset Construction for Machine Learning

  • Chang Liu
  • Rebecca Saul
  • Yihao Sun
  • Edward Raff
  • Maya Fuchs
  • Townsend Southard Pantano
  • James Holt
  • Kristopher Micinski

Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpora of malicious binaries, obtaining high-quality corpora of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning-based pipelines for binary analysis utilize either costly commercial corpora (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities. To address these issues, we present Assemblage: an extensible cloud-based distributed system that crawls, configures, and builds Windows PE binaries to obtain high-quality binary corpora suitable for training state-of-the-art models in binary analysis. We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations. Assemblage is designed to be both reproducible and extensible, enabling users to publish "recipes" for their datasets, and facilitating the extraction of a wide array of features. We evaluated Assemblage by using its data to train modern learning-based pipelines for compiler provenance and binary function similarity. Our results illustrate the practical need for robust corpora of high-quality Windows PE binaries in training modern learning-based binary analyses.

EAAI Journal 2024 Journal Article

Attribute granules-based object entropy for outlier detection in nominal data

  • Chang Liu
  • Dezhong Peng
  • Hongmei Chen
  • Zhong Yuan

Concept lattice theory, one of the key mathematical models of granular computing, is capable of successfully dealing with uncertain information in nominal data. It has been applied to machine learning tasks such as data reduction, classification, and association rule mining. This paper presents a concept lattice theory-based approach to the problem of outlier detection in nominal data. First, subcontexts and concept lattices based on subsets of objects are discussed. Then, information entropy is introduced into the formal context, and an object entropy based on attribute granules is proposed. Finally, a nominal data-oriented outlier detection method is explored based on the proposed object entropy. The experimental results show that the proposed detection method can effectively detect outliers in nominal data. Besides, the results of hypothesis testing indicate that the proposed method is statistically significantly different from the other methods. The code is publicly available online at https://github.com/from-china-to/OEOD.
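As a loose intuition-builder only (the paper's object entropy over formal contexts is defined differently, and every name below is hypothetical), an entropy-style outlier score can rate an object by the surprisal of its attribute granules, i.e., the sets of objects sharing each of its nominal values:

```python
import math

def object_entropy(context, obj):
    """Entropy-style outlier score for one object in a nominal data table.

    context: list of dicts mapping attribute name -> nominal value.
    For each attribute, the 'granule' of obj is the set of objects sharing
    its value; rarer granules contribute more surprise (higher score).
    """
    n = len(context)
    score = 0.0
    for attr in context[obj]:
        granule = sum(1 for row in context if row[attr] == context[obj][attr])
        p = granule / n
        score += -math.log2(p)   # surprisal of obj's value on this attribute
    return score

data = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},   # plausible outlier
]
scores = [object_entropy(data, i) for i in range(len(data))]
```

Objects whose attribute values fall into small granules receive the highest scores, which is the intuition behind detecting outliers in nominal data.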

EAAI Journal 2024 Journal Article

Behavioral response of fish under ammonia nitrogen stress based on machine vision

  • Wenkai Xu
  • Chang Liu
  • Guangxu Wang
  • Yue Zhao
  • Jiaxuan Yu
  • Akhter Muhammad
  • Daoliang Li

The long-term accumulation of ammonia nitrogen in aquaculture seriously affects the life of fish and can even cause large-scale death. Moreover, changes in fish behavior as ammonia nitrogen begins to accumulate can serve as a criterion for early warning, preventing excessive ammonia nitrogen in the water. Therefore, this paper proposes a novel approach to monitoring water quality for aquaculture based on deep learning and three-dimensional movement trajectories. An improved YOLOv8 model was used as the object detection approach to obtain three-dimensional position information of fish by combining the Kalman filter, Kuhn-Munkres (KM) algorithm, and Kernelized Correlation Filters (KCF) algorithm. The proposed approach was evaluated in a recovery experiment of acute ammonia nitrogen stress on sturgeon, bass, and crucian. The experimental results show that the precision, recall, mAP@0.5, and mAP@0.5:0.95 of the improved YOLOv8 model are 0.964, 0.914, 0.979, and 0.602, respectively. In addition, the proposed three-dimensional positioning approach can qualitatively and quantitatively analyze fish behavior in different stages and further explore behavioral changes through behavior trajectories, volumes of exercise, spatial distribution, and movement velocity. This research provides a new method and idea for studying the abnormal behavior of aquatic animals under ammonia nitrogen stress and has theoretical and practical significance.
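The tracking stage combines detection with a Kalman filter; a minimal 1D constant-velocity Kalman filter (the generic textbook form, not the paper's 3D pipeline, with hypothetical names and noise settings) can be sketched as:

```python
def kalman_1d(measurements, q=1e-3, r=0.25):
    """Minimal 1D constant-velocity Kalman filter.

    State: [position, velocity]; predicts one step ahead (dt = 1), then
    corrects position with each noisy measurement. q/r are the process and
    measurement noise variances.
    """
    x = [measurements[0], 0.0]          # initial state
    P = [[1.0, 0.0], [0.0, 1.0]]        # state covariance
    estimates = []
    for z in measurements:
        # Predict: position += velocity; P <- F P F^T + q*I
        x = [x[0] + x[1], x[1]]
        P = [[P[0][0] + P[1][0] + P[0][1] + P[1][1] + q, P[0][1] + P[1][1]],
             [P[1][0] + P[1][1],                          P[1][1] + q]]
        # Update: correct with a measurement of position only (H = [1, 0])
        s = P[0][0] + r                 # innovation covariance
        k = [P[0][0] / s, P[1][0] / s]  # Kalman gain
        y = z - x[0]                    # innovation
        x = [x[0] + k[0] * y, x[1] + k[1] * y]
        P = [[(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
             [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]
        estimates.append(x[0])
    return estimates

est = kalman_1d([0.1, 1.0, 2.1, 2.9, 4.2, 5.0])
```

In a tracker, one such filter per axis smooths each fish's position, and the KM algorithm then matches predictions to new detections.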

AAAI Conference 2024 Conference Paper

Bootstrapping Large Language Models for Radiology Report Generation

  • Chang Liu
  • Yuanhe Tian
  • Weidong Chen
  • Yan Song
  • Yongdong Zhang

Radiology report generation (RRG) aims to automatically generate a free-text description from a specific clinical radiograph, e.g., chest X-ray images. Existing approaches tend to perform RRG with specific models trained from scratch on public yet limited data, where they often lead to inferior performance owing to inefficient capabilities in both aligning visual and textual features and generating informative reports accordingly. Currently, large language models (LLMs) offer a promising solution to text generation with their power in learning from big data, especially for cross-modal scenarios such as RRG. However, most existing LLMs are pre-trained on general data and, if applied to RRG, suffer from the same problem as conventional approaches, caused by the knowledge gap between the general and medical domains. Therefore, in this paper, we propose an approach to bootstrapping LLMs for RRG with an in-domain instance induction and a coarse-to-fine decoding process. Specifically, the in-domain instance induction process learns to align the LLM from general texts to radiology reports through contrastive learning. The coarse-to-fine decoding performs a text elevating process for those reports from the ranker, further enhanced with visual features and refinement prompts. Experimental results on two prevailing RRG datasets, namely IU X-Ray and MIMIC-CXR, demonstrate the superiority of our approach over previous state-of-the-art solutions. Further analyses illustrate that, for the LLM, the induction process enables it to better align with the medical domain, and the coarse-to-fine generation allows it to conduct more precise text generation.

NeurIPS Conference 2024 Conference Paper

Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models

  • Zun Wang
  • Chang Liu
  • Nianlong Zou
  • He Zhang
  • Xinran Wei
  • Lin Huang
  • Lijun Wu
  • Bin Shao

In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians. The DEQH model inherently captures the self-consistent nature of the Hamiltonian, a critical aspect often overlooked by traditional machine learning approaches to Hamiltonian prediction. By employing DEQ within our model architecture, we circumvent the need for DFT calculations during the training phase to introduce the Hamiltonian's self-consistency, thus addressing computational bottlenecks associated with large or complex systems. We propose a versatile framework that combines DEQ with off-the-shelf machine learning models for predicting Hamiltonians. When benchmarked on the MD17 and QH9 datasets, DEQHNet, an instantiation of the DEQH framework, demonstrates a significant improvement in prediction accuracy. Beyond a predictor, the DEQH model is a Hamiltonian solver, in the sense that it uses the fixed-point solving capability of the deep equilibrium model to iteratively solve for the Hamiltonian. Ablation studies of DEQHNet further elucidate the network's effectiveness, offering insights into the potential of DEQ-integrated networks for Hamiltonian learning. We open-source our implementation at https://github.com/Zun-Wang/DEQHNet.
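The fixed-point solving at the heart of DEQs can be illustrated on a toy scalar self-consistent equation. This sketch uses plain iteration, whereas DEQ implementations typically use accelerated root-finders (e.g., Anderson or Broyden); the names are hypothetical:

```python
import math

def solve_fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- f(x) until the update is below tol (simple DEQ-style solver)."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Toy self-consistent equation: x = cos(x), analogous in spirit to solving
# H = f(H, input) for a self-consistent Hamiltonian.
root = solve_fixed_point(math.cos, 1.0)
```

The analogy: instead of unrolling many iterations explicitly, a DEQ layer defines its output implicitly as the fixed point of a learned map.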

NeurIPS Conference 2024 Conference Paper

Is Function Similarity Over-Engineered? Building a Benchmark

  • Rebecca Saul
  • Chang Liu
  • Noah Fleischmann
  • Richard Zak
  • Kristopher Micinski
  • Edward Raff
  • James Holt

Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen functions can reduce the time it takes to understand a new file. However, given the complexity of assembly, and the NP-hard nature of determining function equivalence, this task is extremely difficult. Common approaches often use sophisticated disassembly and decompilation tools, graph analysis, and other expensive pre-processing steps to perform function similarity searches over some corpus. In this work, we identify a number of discrepancies between the current research environment and the underlying application need. To remedy this, we build a new benchmark, REFuSe-Bench, for binary function similarity detection consisting of high-quality datasets and tests that better reflect real-world use cases. In doing so, we address issues like data duplication and accurate labeling, experiment with real malware, and perform the first serious evaluation of ML binary function similarity models on Windows data. Our benchmark reveals that a new, simple baseline, one that looks only at the raw bytes of a function and requires no disassembly or other pre-processing, is able to achieve state-of-the-art performance in multiple settings. Our findings challenge conventional assumptions that complex models with highly-engineered features are being used to their full potential, and demonstrate that simpler approaches can provide significant value.
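To give a flavor of what a raw-bytes baseline can look like (the benchmark's actual baseline may differ in features and model; this sketch and its names are illustrative), a byte-histogram similarity between two functions needs no disassembly at all:

```python
import math

def byte_histogram(data: bytes):
    """256-bin frequency histogram of a function's raw bytes."""
    hist = [0] * 256
    for b in data:
        hist[b] += 1
    return hist

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

f1 = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])        # tiny x86-64 prologue/epilogue
f2 = bytes([0x55, 0x48, 0x89, 0xE5, 0x90, 0x5D, 0xC3])  # same with a NOP inserted
f3 = bytes([0x00, 0x01, 0x02, 0x03, 0x04, 0x05])        # unrelated bytes
sim_close = cosine_similarity(byte_histogram(f1), byte_histogram(f2))
sim_far = cosine_similarity(byte_histogram(f1), byte_histogram(f3))
```

Near-identical functions score high while unrelated byte sequences score low, which is the kind of cheap signal the benchmark shows can be surprisingly competitive.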

AAAI Conference 2024 Conference Paper

Learning Spatially Collaged Fourier Bases for Implicit Neural Representation

  • Jason Chun Lok Li
  • Chang Liu
  • Binxiao Huang
  • Ngai Wong

Existing approaches to Implicit Neural Representation (INR) can be interpreted as a global scene representation via a linear combination of Fourier bases of different frequencies. However, such universal basis functions can limit the representation capability in local regions where a specific component is unnecessary, resulting in unpleasant artifacts. To this end, we introduce a learnable spatial mask that effectively dispatches distinct Fourier bases into respective regions. This translates into collaging Fourier patches, thus enabling an accurate representation of complex signals. Comprehensive experiments demonstrate the superior reconstruction quality of the proposed approach over existing baselines across various INR tasks, including image fitting, video representation, and 3D shape representation. Our method outperforms all other baselines, improving the image fitting PSNR by over 3dB and 3D reconstruction to 98.81 IoU and 0.0011 Chamfer Distance.

NeurIPS Conference 2024 Conference Paper

MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models

  • Yichi Zhang
  • Yao Huang
  • Yitong Sun
  • Chang Liu
  • Zhe Zhao
  • Zhengwei Fang
  • Yifan Wang
  • Huanran Chen

Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by the multimodality and underscoring the necessity for advanced methodologies to enhance their reliability. For instance, typical proprietary models still struggle with the perception of visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose privacy in text and reveal ideological and cultural biases even when paired with irrelevant images in inference, indicating that the multimodality amplifies the internal risks from base LLMs. Additionally, we release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future advancements in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.

AAAI Conference 2024 Conference Paper

MuST: Robust Image Watermarking for Multi-Source Tracing

  • Guanjie Wang
  • Zehua Ma
  • Chang Liu
  • Xi Yang
  • Han Fang
  • Weiming Zhang
  • Nenghai Yu

In recent years, with the popularity of social media applications, massive digital images are available online, which brings great convenience to image recreation. However, the use of unauthorized image materials in multi-source composite images is still inadequately regulated, which may cause significant loss and discouragement to the copyright owners of the source image materials. Ideally, deep watermarking techniques could provide a solution for protecting these copyrights based on their encoder-noise-decoder training strategy. Yet existing image watermarking schemes, which are mostly designed for single images, cannot well address the copyright protection requirements in this scenario, since the multi-source image composing process commonly includes distortions that are not well investigated in previous methods, e.g., extreme downsizing. To meet such demands, we propose MuST, a multi-source tracing robust watermarking scheme, whose architecture includes a multi-source image detector and a minimum external rectangle operation for multiple watermark resynchronization and extraction. Furthermore, we constructed an image material dataset covering common image categories and designed a simulation model of the multi-source image composing process as the noise layer. Experiments demonstrate the excellent performance of MuST in tracing the sources of image materials from composite images compared with SOTA watermarking methods: it maintains an extraction accuracy above 98% when tracing the sources of at least 3 different image materials while keeping the average PSNR of watermarked image materials higher than 42.51 dB. We release our code at https://github.com/MrCrims/MuST.

AAAI Conference 2024 Conference Paper

Parallel Vertex Diffusion for Unified Visual Grounding

  • Zesen Cheng
  • Kehan Li
  • Peng Jin
  • Siheng Li
  • Xiangyang Ji
  • Li Yuan
  • Chang Liu
  • Jie Chen

Unified visual grounding (UVG) capitalizes on a wealth of task-related knowledge across various grounding tasks via one-shot training, which curtails retraining costs and task-specific architecture design efforts. Vertex generation-based UVG methods achieve this versatility by unified modeling of object box and contour prediction, and provide a text-powered interface to vast related multi-modal tasks, e.g., visual question answering and captioning. However, these methods typically generate vertexes sequentially through autoregression, which is prone to error accumulation and heavy computation, especially for high-dimension sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD) based on the parallelizability of diffusion models to accurately and efficiently generate vertexes in a parallel and scalable manner. Since the coordinates fluctuate greatly, training diffusion models without geometry constraints typically converges slowly. Therefore, we complete our PVD with two critical components, i.e., a center anchor mechanism and an angle summation loss, which serve to normalize coordinates and adopt a differentiable geometry descriptor from the point-in-polygon problem of computational geometry to constrain the overall difference between prediction and label vertexes. These innovative designs empower our PVD to demonstrate its superiority with state-of-the-art performance across various grounding tasks.
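The angle summation loss builds on the classic point-in-polygon test: the signed angles that a polygon's edges subtend at a point sum to about ±2π when the point is inside and about 0 when it is outside, and the quantity is differentiable in the vertex coordinates. A minimal sketch of that descriptor (illustrative only; the paper's loss compares prediction and label vertexes):

```python
import math

def angle_sum(point, polygon):
    """Sum of signed angles subtended at `point` by each polygon edge.

    ~ +/-2*pi when the point is inside the polygon, ~0 when outside.
    """
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        # Edge endpoints translated so `point` is the origin.
        x1, y1 = polygon[i][0] - px, polygon[i][1] - py
        x2, y2 = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # Signed angle between the two rays via cross and dot products.
        total += math.atan2(x1 * y2 - y1 * x2, x1 * x2 + y1 * y2)
    return total

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
inside = angle_sum((1.0, 1.0), square)    # close to 2*pi
outside = angle_sum((5.0, 5.0), square)   # close to 0
```

Because each term is a smooth function of the vertex coordinates, such a descriptor can serve as a geometry constraint during diffusion training.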

NeurIPS Conference 2024 Conference Paper

Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning

  • Yuxuan Ren
  • Dihan Zheng
  • Chang Liu
  • Peiran Jin
  • Yu Shi
  • Lin Huang
  • Jiyan He
  • Shengjie Luo

In recent years, machine learning has demonstrated impressive capability in handling molecular science tasks. To support various molecular properties at scale, machine learning models are trained in the multi-task learning paradigm. Nevertheless, data of different molecular properties are often not aligned: some quantities, e.g., equilibrium structure, demand more cost to compute than others, e.g., energy, so their data are often generated by cheaper computational methods at the cost of lower accuracy, which cannot be directly overcome through multi-task learning. Moreover, it is not straightforward to leverage abundant data of other tasks to benefit a particular task. To handle such data heterogeneity challenges, we exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches that allow different tasks to exchange information directly so as to improve one another. Particularly, we demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction, demonstrating a broad capability for integrating heterogeneous data.

JMLR Journal 2024 Journal Article

Pygmtools: A Python Graph Matching Toolkit

  • Runzhong Wang
  • Ziao Guo
  • Wenzheng Pan
  • Jiale Ma
  • Yikai Zhang
  • Nan Yang
  • Qi Liu
  • Longxuan Wei

Graph matching aims to find node-to-node matching among multiple graphs, which is a fundamental yet challenging problem. To facilitate graph matching in scientific research and industrial applications, pygmtools is released: a Python graph matching toolkit that implements a comprehensive collection of two-graph matching and multi-graph matching solvers, covering both learning-free solvers and learning-based neural graph matching solvers. Our implementation supports the Numpy, PyTorch, Jittor, and Paddle numerical backends, runs on Windows, MacOS, and Linux, and is easy to install and configure. Comprehensive documentation covering a beginner's guide, API reference, and examples is available online. pygmtools is open-sourced under the Mulan PSL v2 license.

AAAI Conference 2024 Conference Paper

Quantum Interference Model for Semantic Biases of Glosses in Word Sense Disambiguation

  • Junwei Zhang
  • Ruifang He
  • Fengyu Guo
  • Chang Liu

Word Sense Disambiguation (WSD) aims to determine the meaning of the target word according to the given context. Currently, a single representation enhanced by glosses from different dictionaries or languages is used to characterize each word sense. By analyzing the similarity between glosses of the same word sense, we find semantic biases among them, revealing that the glosses have their own descriptive perspectives. Therefore, the traditional approach of integrating all glosses by a single representation results in failing to present the unique semantics revealed by the individual glosses. In this paper, a quantum superposition state is employed to formalize the representations of multiple glosses of the same word sense to reveal their distributions. Furthermore, the quantum interference model is leveraged to calculate the probability that the target word belongs to this superposition state. The advantage is that the interference term can be regarded as a confidence level to guide word sense recognition. Finally, experiments are performed under standard WSD evaluation framework and the latest cross-lingual datasets, and the results verify the effectiveness of our model.

YNIMG Journal 2024 Journal Article

Relationships between brain structure-function coupling in normal aging and cognition: A cross-ethnicity population-based study

  • Chang Liu
  • Jing Jing
  • Jiyang Jiang
  • Wei Wen
  • Wanlin Zhu
  • Zixiao Li
  • Yuesong Pan
  • Xueli Cai

Increased efforts in neuroscience seek to understand how macro-anatomical and physiological connectomes cooperatively work to generate cognitive behaviors. However, the structure-function coupling characteristics of normal aging individuals remain unclear. Here, we developed an index, the Coupling in Brain Structural connectome and Functional connectome (C-BSF) index, to quantify regional structure-function coupling in a large community-based cohort. C-BSF used diffusion tensor imaging (DTI) and resting-state functional magnetic resonance imaging (fMRI) data from the Polyvascular Evaluation for Cognitive Impairment and Vascular Events (PRECISE) cohort (2007 individuals, age: 61.15 ± 6.49 years) and the Sydney Memory and Ageing Study (MAS) cohort (254 individuals, age: 83.45 ± 4.33 years). We observed that structure-function coupling was the strongest in the visual network and the weakest in the ventral attention network. We also observed that weaker structure-function coupling was associated with increased age and worse cognitive performance. Meanwhile, structure-function coupling in the visual network was associated with visuospatial performance and partially mediated the connection between age and visuospatial function. This work contributes to our understanding of the underlying brain mechanisms by which aging affects cognition and also helps establish early diagnosis and treatment approaches for neurological diseases in the elderly.
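Regional structure-function coupling indices of this general kind are commonly computed as the correlation between a region's structural and functional connectivity profiles. As a generic illustration only (not the exact C-BSF definition; the profiles and names below are made up):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One region's connectivity strengths to every other region, from each modality.
structural = [0.9, 0.1, 0.4, 0.7, 0.2]   # e.g., DTI-derived structural connectivity
functional = [0.8, 0.2, 0.5, 0.6, 0.1]   # e.g., resting-state fMRI correlations
coupling = pearson(structural, functional)
```

A value near 1 indicates that regions strongly connected anatomically are also strongly connected functionally; computing this per region yields a regional coupling map.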

IROS Conference 2024 Conference Paper

Risk-Aware Non-Myopic Motion Planner for Large-Scale Robotic Swarm Using CVaR Constraints

  • Xuru Yang
  • Yunze Hu
  • Han Gao
  • Kang Ding
  • Zhaoyang Li
  • Pingping Zhu
  • Ying Sun
  • Chang Liu

Swarm robotics has garnered significant attention due to its ability to accomplish elaborate and synchronized tasks. Existing methodologies for motion planning of swarm robotic systems mainly encounter difficulties in scalability and safety guarantee. To address these limitations, we propose a Risk-aware swarm mOtion planner using conditional ValuE-at-Risk (ROVER) that systematically navigates large-scale swarms through cluttered environments while ensuring safety. ROVER formulates a finite-time model predictive control (FTMPC) problem predicated upon the macroscopic state of the robot swarm represented by a Gaussian Mixture Model (GMM) and integrates conditional value-at-risk (CVaR) to ensure collision avoidance. The key component of ROVER is imposing a CVaR constraint on the distribution of the Signed Distance Function between the swarm GMM and obstacles in the FTMPC to enforce collision avoidance. Utilizing the analytical expression of CVaR of a GMM derived in this work, we develop a computationally efficient solution to solve the non-linear constrained FTMPC through sequential linear programming. Simulations and comparisons with representative benchmark approaches demonstrate the effectiveness of ROVER in flexibility, scalability, and safety guarantee.
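The empirical CVaR underlying such constraints is simply the mean of the worst tail of outcomes. ROVER derives an analytical CVaR for GMM-distributed signed distances, but the sample-based version (a sketch with hypothetical names; here "worst" means the smallest signed distances, i.e., closest to collision) conveys the idea:

```python
def cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of outcomes.

    For signed distances to obstacles, small/negative values mean
    near-collision, so CVaR averages the lowest alpha-quantile tail.
    """
    ordered = sorted(samples)
    k = max(1, int(len(ordered) * alpha))
    return sum(ordered[:k]) / k

# Sampled signed distances between a swarm's density and an obstacle.
distances = [0.8, 1.2, 0.5, 2.0, -0.1, 0.9, 1.5, 0.3, 1.1, 0.7]
risk = cvar(distances, alpha=0.2)   # mean of the two smallest distances
```

A constraint of the form `cvar(distances, alpha) >= margin` then bounds not just the average clearance but the expected clearance in the riskiest tail.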

IROS Conference 2024 Conference Paper

SwarmPRM: Probabilistic Roadmap Motion Planning for Large-Scale Swarm Robotic Systems

  • Yunze Hu
  • Xuru Yang
  • Kangjie Zhou
  • Qinghang Liu
  • Kang Ding
  • Han Gao
  • Pingping Zhu
  • Chang Liu

Large-scale swarm robotic systems consisting of numerous cooperative agents show considerable promise for performing autonomous tasks across various sectors. Nonetheless, traditional motion planning approaches often face a trade-off between scalability and solution quality due to the exponential growth of the joint state space of robots. In response, this work proposes SwarmPRM, a hierarchical, scalable, computationally efficient, and risk-aware sampling-based motion planning approach for large-scale swarm robots. SwarmPRM utilizes a Gaussian Mixture Model (GMM) to represent the swarm’s macroscopic state and constructs a Probabilistic Roadmap in Gaussian space, referred to as the Gaussian roadmap, to generate a transport trajectory of GMM. This trajectory is then followed by each robot at the microscopic stage. To enhance trajectory safety, SwarmPRM incorporates the conditional value-at-risk (CVaR) in the collision checking process to impart the property of risk awareness to the constructed Gaussian roadmap. SwarmPRM then crafts a linear programming formulation to compute the optimal GMM transport trajectory within this roadmap. Extensive simulations demonstrate that SwarmPRM outperforms state-of-the-art methods in computational efficiency, scalability, and trajectory quality while offering the capability to adjust the risk tolerance of generated trajectories.

EAAI Journal 2024 Journal Article

TL-TSD: A two-layer traffic sub-area division framework based on trajectory clustering

  • Chang Liu
  • Xinzheng Niu
  • Yong Ma
  • Shiyun Shao
  • Bing Wang

The development of intelligent traffic coordination and smart mobility under the digital economy has increased the need for effective traffic sub-area division. Traditional division methods rely on predefined urban administrative units, failing to adapt to varying traffic conditions. Therefore, data-driven approaches have been developed to divide traffic sub-areas, considering both the frequency of data updates and a balance between accuracy and traffic characteristics within the selected data. However, these approaches are affected by unsatisfactory zone boundaries, and the relevant clustering algorithms cannot efficiently support traffic sub-area division. To address these issues, this paper proposes a two-layer traffic sub-area division (TL-TSD) framework that considers factors such as traffic density, road structure, and the spatiotemporal characteristics inherent in the overall trajectory. Specifically, in the first layer, we introduce a specific equation based on dynamic time warping to adaptively perform trajectory cutting and record matching information while maintaining the shape features of the overall trajectory. Subsequently, we design a modified density-based spatial clustering of applications with noise (DBSCAN) algorithm to obtain initial clusters. In the second layer, based on the matching information, we introduce two trajectory refinement algorithms, designed respectively to produce the final clusters and well-defined boundaries. Extensive experimental results and statistical analysis on two real-world datasets indicate that the proposed framework can effectively address the aforementioned technical challenges and outperform the comparison algorithms in terms of the overall Dunn index. Moreover, the visualization results show that the final clusters with well-defined boundaries are more effective for dividing traffic sub-areas.
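Dynamic time warping, on which the first layer's cutting equation builds, can be computed with the standard dynamic-programming recurrence. This is the generic DTW (shown on 1D trajectories for brevity), not the paper's specific equation:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D trajectories, O(len(a)*len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # dp[i][j] = minimal cost of aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

d_same = dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4])   # time-warped copy: distance 0
d_diff = dtw_distance([1, 2, 3, 4], [8, 9, 10, 11])
```

DTW's tolerance to local time shifts is what lets trajectories with similar shapes but different speeds cluster together.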

NeurIPS Conference 2024 Conference Paper

Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation

  • Chang Liu
  • Xiwei Wu
  • Yuan Feng
  • Qinxiang Cao
  • Junchi Yan

Program verification is vital for ensuring software reliability, especially in the context of increasingly complex systems. Loop invariants, remaining true before and after each iteration of loops, are crucial for this verification process. Traditional provers and machine learning based methods for generating loop invariants often require expert intervention or extensive labeled data, and typically only handle numerical property verification. These methods struggle with programs involving complex data structures and memory manipulations, limiting their applicability and automation capabilities. This paper introduces a new benchmark named LIG-MM, specifically for programs with complex data structures and memory manipulations. We collect 312 programs from various sources, including daily programs from college homework, the international competition (SV-COMP), benchmarks from previous papers (SLING), and programs from real-world software systems (Linux Kernel, GlibC, LiteOS, and Zephyr). Based on LIG-MM, our findings indicate that previous methods, including GPT-4, fail to automate verification for these programs. Consequently, we propose a novel LLM-SE framework that coordinates LLM with symbolic execution, fine-tuned using self-supervised learning, to generate loop invariants. Experimental results on LIG-MM demonstrate that our LLM-SE outperforms state-of-the-art methods, offering a new direction toward automated program verification in real-world scenarios.

EAAI Journal 2023 Journal Article

A text mining-based approach for understanding Chinese railway incidents caused by electromagnetic interference

  • Chang Liu
  • Shiwu Yang

The high-speed railway is a deeply coupled system of strong and weak electrical equipment, and complex electromagnetic interference (EMI) consequently brings potential risks and hazards to signaling safety. Since incident reports on signaling failures intrinsically reflect the generation and evolution mechanisms of equipment failures, this paper relies on text mining technology to extract failure-related entities and construct a knowledge graph that clarifies the negative impact of the on-site electromagnetic environment. Firstly, a supervised deep learning model for Chinese text classification, based on convolutional neural networks (CNN), is established to generate a corpus containing only railway failures caused by EMI. Then, the bidirectional long short-term memory (BiLSTM) and bidirectional encoder representations from transformers (BERT) algorithms are adopted to build the named entity recognition (NER) model. A NER algorithm better suited to the features of Chinese text is proposed through ensemble modeling, training verification, and comparative evaluation. Finally, knowledge storage and visualization of the constructed relational graph are realized with the Neo4j database, based on the obtained failure-related entities. This knowledge topology network effectively explores the inherent relationship between EMI factors and railway safety, and provides support for improving safety assessment and enhancing the anti-interference performance of the equipment.

AAAI Conference 2023 Conference Paper

AutoStegaFont: Synthesizing Vector Fonts for Hiding Information in Documents

  • Xi Yang
  • Jie Zhang
  • Han Fang
  • Chang Liu
  • Zehua Ma
  • Weiming Zhang
  • Nenghai Yu

Hiding information in text documents has been a hot topic recently, with the most typical schemes utilizing fonts. By constructing several fonts with similar appearances, information can be effectively represented and embedded in documents. However, due to their unstructured characteristics, font vectors are more difficult to synthesize than font images. Existing methods mainly use handcrafted features to design the fonts manually, which is time-consuming and labor-intensive. Moreover, due to the diversity of fonts, handcrafted features do not generalize to different fonts. Besides, in practice, since documents might be distorted during transmission, ensuring extractability under distortions is also an important requirement. Therefore, three requirements are imposed on vector font generation in this domain: automaticity, generalizability, and robustness. However, none of the existing methods satisfies all of these requirements simultaneously. To meet them, we propose AutoStegaFont, an automatic vector font synthesis scheme for hiding information in documents. Specifically, we design a two-stage, dual-modality learning framework. In the first stage, we jointly train an encoder and a decoder to invisibly encode the font images with different information. To ensure robustness, we design a noise layer to work with the encoder and decoder during training. In the second stage, we employ a differentiable rasterizer to establish a connection between the image and the vector modality. Then, we design an optimization algorithm to convey the information from the encoded image to the corresponding vector. Thus, the encoded font vectors can be automatically generated. Extensive experiments demonstrate the superior performance of our scheme in automatically synthesizing vector fonts for hiding information in documents, with robustness to distortions caused by low-resolution screenshots, printing, and photography. Moreover, the proposed framework generalizes well to fonts with diverse styles and languages.

AAAI Conference 2023 Conference Paper

Context-Aware Transformer for 3D Point Cloud Automatic Annotation

  • Xiaoyan Qian
  • Chang Liu
  • Xiaojuan Qi
  • Siew-Chong Tan
  • Edmund Lam
  • Ngai Wong

3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious. However, existing methods are usually complicated, e.g., pipelined training for 3D foreground/background segmentation, cylindrical object proposals, and point completion. Furthermore, they often overlook the inter-object feature correlation that is particularly informative for hard samples in 3D annotation. To this end, we propose a simple yet effective end-to-end Context-Aware Transformer (CAT) as an automated 3D-box labeler that generates precise 3D box annotations from 2D boxes, trained with a small number of human annotations. We adopt the general encoder-decoder architecture, where the CAT encoder consists of an intra-object encoder (local) and an inter-object encoder (global), performing self-attention along the sequence and batch dimensions, respectively. The former models intra-object interactions among points and the latter extracts feature relations among different objects, thus boosting scene-level understanding. Via the local and global encoders, CAT generates high-quality 3D box annotations with a streamlined workflow, outperforming existing state-of-the-art methods by up to 1.79% 3D AP on the hard task of the KITTI test set.
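The "self-attention along the sequence and batch dimensions" idea can be illustrated with a minimal NumPy sketch. The single-head attention without learned projections and the mean-pooling of each object into one token are simplifications of mine, not details from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x):
    """Single-head attention without learned projections, for illustration."""
    return softmax(x @ x.T / np.sqrt(x.shape[-1])) @ x

# feats: (B objects, N points, d) point features for one scene
feats = np.random.randn(4, 64, 32)

# Intra-object (local) encoder: attend along the point-sequence dimension.
intra = np.stack([self_attention(obj) for obj in feats])

# Inter-object (global) encoder: attend along the batch dimension,
# here over one pooled token per object (a simplification of mine).
inter = self_attention(intra.mean(axis=1))
```

The local pass relates points within one object; the global pass relates the objects of a scene to each other, which is the scene-level context the abstract refers to.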

AAAI Conference 2023 Conference Paper

DeAR: A Deep-Learning-Based Audio Re-recording Resilient Watermarking

  • Chang Liu
  • Jie Zhang
  • Han Fang
  • Zehua Ma
  • Weiming Zhang
  • Nenghai Yu

Audio watermarking is widely used for tracing the source of leaks, and the robustness of the watermark determines the traceability of the algorithm. With the development of digital technology, audio re-recording (AR) has become an efficient and covert means of stealing secrets. The AR process can drastically destroy the watermark signal while preserving the original information, which imposes a new requirement on audio watermarking at this stage: robustness to AR distortions. Unfortunately, none of the existing algorithms can effectively resist AR attacks due to the complexity of the AR process. To address this limitation, this paper proposes DeAR, a deep-learning-based audio watermarking scheme resilient to re-recording. Inspired by DNN-based image watermarking, we pioneer a deep learning framework for audio carriers, within which the watermark signal can be effectively embedded and extracted. Meanwhile, to resist AR attacks, we carefully analyze the distortions introduced by the AR process and design a corresponding distortion layer to cooperate with the proposed watermarking framework. Extensive experiments show that the proposed algorithm resists not only common electronic-channel distortions but also AR distortions. Under the premise of high-quality embedding (SNR = 25.86 dB) and at a common re-recording distance (20 cm), the algorithm achieves an average bit recovery accuracy of 98.55%.

NeurIPS Conference 2023 Conference Paper

Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning

  • Yu Wang
  • Zhun Zhong
  • Pengchong Qiao
  • Xuxin Cheng
  • Xiawu Zheng
  • Chang Liu
  • Nicu Sebe
  • Rongrong Ji

Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task, aiming to classify unlabeled samples from both seen and novel classes using partially labeled samples from the seen classes. Previous works typically explore the relationship of samples as priors on pre-defined single-granularity labels to help novel class recognition. In fact, classes follow a taxonomy and samples can be classified at multiple levels of granularity, which contains more underlying relationships for supervision. We thus argue that learning with single-granularity labels results in sub-optimal representation learning and inaccurate pseudo labels, especially with unknown classes. In this paper, we take the initiative to explore and propose a unified framework, called Taxonomic context prIors Discovering and Aligning (TIDA), which exploits the relationship of samples under various granularities. It allows us to discover multi-granularity semantic concepts as taxonomic context priors (i.e., sub-class, target-class, and super-class), and then collaboratively leverage them to enhance representation learning and improve the quality of pseudo labels. Specifically, TIDA comprises two components: i) a taxonomic context discovery module that constructs a set of hierarchical prototypes in the latent space to discover the underlying taxonomic context priors; ii) a taxonomic context-based prediction alignment module that enforces consistency across hierarchical predictions to build reliable relationships between classes across granularities and provide additional supervision. We demonstrate that these two components are mutually beneficial for an effective OSSL framework, which is theoretically explained from the perspective of the EM algorithm. Extensive experiments on seven commonly used datasets show that TIDA significantly improves performance and achieves a new state of the art. The source code is publicly available at https://github.com/rain305f/TIDA.

AAAI Conference 2023 Conference Paper

ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation

  • Qiran Zou
  • Yu Yang
  • Wing Yin Cheung
  • Chang Liu
  • Xiangyang Ji

Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.

YNIMG Journal 2023 Journal Article

Non-contrast assessment of blood-brain barrier permeability to water in mice: An arterial spin labeling study at cerebral veins

  • Zhiliang Wei
  • Hongshuai Liu
  • Zixuan Lin
  • Minmin Yao
  • Ruoxuan Li
  • Chang Liu
  • Yuguo Li
  • Jiadi Xu

The blood-brain barrier (BBB) plays a critical role in protecting the brain from toxins and pathogens. However, in vivo tools to assess BBB permeability are scarce and often require exogenous contrast agents. In this study, we aimed to develop a non-contrast arterial-spin-labeling (ASL) based MRI technique to estimate BBB permeability to water in mice. By determining the relative fraction of labeled water spins that were exchanged into the brain tissue, as opposed to those that remained in the cerebral veins, we estimated indices of global BBB permeability to water, including the water extraction fraction (E) and the permeability surface-area product (PS). First, using ASL experiments with multiple post-labeling delays (PLDs), we estimated the bolus arrival time (BAT) of the labeled spins at the great vein of Galen (VG) to be 691.2 ± 14.5 ms (N = 5). Next, we investigated the dependence of the VG ASL signal on labeling duration and identified an optimal imaging protocol with a labeling duration of 1200 ms and a PLD of 100 ms. Quantitative E and PS values in wild-type mice were found to be 59.9 ± 3.2% and 260.9 ± 18.9 ml/100 g/min, respectively. In contrast, mice with Huntington's disease (HD) showed a significantly higher E (69.7 ± 2.4%, P = 0.026) and PS (318.1 ± 17.1 ml/100 g/min, P = 0.040), suggesting BBB breakdown in this mouse model. Reproducibility studies revealed a coefficient of variation (CoV) of 4.9 ± 1.7% and 6.1 ± 1.2% for E and PS, respectively. The proposed method may open new avenues for preclinical research on the pathophysiological mechanisms of brain diseases and for therapeutic trials in animal models.
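As a reading aid connecting the two reported indices: in single-capillary (Renkin-Crone) models commonly used in ASL permeability work, E and PS are linked through the cerebral blood flow f; whether this exact form underlies the paper's quantification is an assumption here, not stated in the abstract:

```latex
E = 1 - e^{-PS/f}
\qquad\Longleftrightarrow\qquad
PS = -f\,\ln(1 - E)
```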

AIIM Journal 2023 Journal Article

Reconstruction of central arterial pressure waveform based on CBi-SAN network from radial pressure waveform

  • Hanguang Xiao
  • Wangwang Song
  • Chang Liu
  • Bo Peng
  • Mi Zhu
  • Bin Jiang
  • Zhi Liu

Central arterial pressure (CAP) is an important physiological indicator of the human cardiovascular system, diseases of which are among the greatest threats to human health. Accurate non-invasive detection and reconstruction of CAP waveforms are crucial for the reliable treatment of cardiovascular diseases. However, traditional methods reconstruct the waveform with relatively low accuracy, and some deep learning models also have difficulty extracting features, so these methods leave room for further improvement. In this study, we proposed a novel model (CBi-SAN) to learn an end-to-end mapping from the radial artery pressure (RAP) waveform to the CAP waveform; it consists of a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and a self-attention mechanism to improve the performance of CAP reconstruction. Invasive measurements of CAP and RAP waveforms from 62 patients, recorded before and after medication, were used to develop and validate the CBi-SAN model for reconstructing the CAP waveform. We compared it with traditional methods and deep learning models in terms of mean absolute error (MAE), root mean square error (RMSE), and Spearman correlation coefficient (SCC). The results indicate that the CBi-SAN model reconstructs the CAP waveform well (MAE: 2.23 ± 0.11 mmHg, RMSE: 2.21 ± 0.07 mmHg); the best reconstruction was obtained for the central artery systolic pressure (CASP) and the central artery diastolic pressure (CADP) (RMSE CASP: 2.94 ± 0.48 mmHg, RMSE CADP: 1.96 ± 0.06 mmHg). These results imply that CAP reconstruction based on the CBi-SAN model is superior to existing methods and, it is hoped, can be effectively applied in clinical practice in the future.

EAAI Journal 2023 Journal Article

Sparse q-Laplace kernel online prediction for indoor localization in the Internet of Things

  • Chang Liu
  • Xifeng Li
  • Dongjie Bi
  • Libiao Peng
  • Yongle Xie

As an important component of IoT-oriented applications, indoor position estimation has attracted increasing attention with the rapid development of the IoT. However, the performance of indoor positioning relies heavily on the complexity of the environment, which is often full of noise, such as Gaussian noise mixed with impulsive noise. These noises can deteriorate the precision of indoor positioning systems. To address this problem, we propose a new kernel, called the generalized q-Laplace kernel, to produce a new q-Laplace kernel adaptive filtering algorithm (qLaKAF), which is combined with the recently proposed kernel mean p-power error criterion (KMPE). The proposed qLaKAF has two vital features. First, the q-Laplace kernel is employed to combat Gaussian noise together with abrupt noise in real-world scenarios. Second, the KMPE is utilized to obtain higher-order information in addition to second-order information, which helps suppress mixed noise. Furthermore, a Strengthened Surprise Criterion (SSC) is applied to qLaKAF to reduce the size of the network; the SSC-assisted algorithm is called the Strengthened Surprise Criterion q-Laplace kernel adaptive filtering algorithm (SSC-qLaKAF). Three experiments are carried out on two real-world scenarios to validate effectiveness and accuracy. The experimental results demonstrate that accuracy is improved by at least 3.6%; meanwhile, the SSC-qLaKAF network size can be reduced by up to 12.5% without much loss of accuracy compared to qLaKAF.

IJCAI Conference 2023 Conference Paper

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

  • Peng Jin
  • Hao Li
  • Zesen Cheng
  • Jinfa Huang
  • Zhennan Wang
  • Li Yuan
  • Chang Liu
  • Jie Chen

Text-video retrieval is a challenging cross-modal task that aims to align visual entities with natural language descriptions. Current methods either fail to leverage local details or are computationally expensive; worse, they fail to leverage the heterogeneous concepts in the data. In this paper, we propose Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide the coarse feature into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts corresponds to a set of textual concepts, we propose an adaptive pooling method to aggregate semantic concepts and address partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior in efficiency and granularity, ensuring fine-grained interactions at a computational complexity similar to coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms the existing state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

TG-VQA: Ternary Game of Video Question Answering

  • Hao Li
  • Peng Jin
  • Zesen Cheng
  • Songyang Zhang
  • Kai Chen
  • Zhennan Wang
  • Chang Liu
  • Jie Chen

Video question answering aims to answer a question about video content by reasoning over the alignment semantics between them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive-learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignment. In this work, we innovatively resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate fine-grained visual-linguistic alignment labels without label-intensive effort. Our TG-VQA outperforms the existing state of the art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges well on limited data (10^4 videos), surpassing most models pre-trained on large-scale data (10^7 videos).

IJCAI Conference 2023 Conference Paper

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

  • Zesen Cheng
  • Peng Jin
  • Hao Li
  • Kehan Li
  • Siheng Li
  • Xiangyang Ji
  • Chang Liu
  • Jie Chen

Top-down and bottom-up methods are the two mainstream approaches to referring segmentation, and each has its own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary in restraining each other's weaknesses, while a direct average combination leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods, on both the interaction and integration aspects, to achieve a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results, weighted by confidence scores sampled from the distributions. With WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets at reasonable extra cost, which justifies the effectiveness and generality of our method.

TMLR Journal 2022 Journal Article

Direct Molecular Conformation Generation

  • Jinhua Zhu
  • Yingce Xia
  • Chang Liu
  • Lijun Wu
  • Shufang Xie
  • Yusong Wang
  • Tong Wang
  • Tao Qin

Molecular conformation generation aims to generate the three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology. Previous methods usually first predict the interatomic distances, the gradients of interatomic distances, or the local structures (e.g., torsion angles) of a molecule, and then reconstruct its 3D conformation. How to directly generate the conformation without these intermediate values is not fully explored. In this work, we propose a method that directly predicts the coordinates of atoms: (1) the loss function is invariant to roto-translation of coordinates and permutation of symmetric atoms; (2) the newly proposed model adaptively aggregates bond and atom information and iteratively refines the coordinates of the generated conformation. Our method achieves the best results on the GEOM-QM9 and GEOM-Drugs datasets. Further analysis shows that our generated conformations have properties (e.g., the HOMO-LUMO gap) closer to the ground-truth conformations. In addition, our method improves molecular docking by providing better initial conformations. All the results demonstrate the effectiveness of our method and the great potential of the direct approach. The code is released at https://github.com/DirectMolecularConfGen/DMCG.
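A roto-translation-invariant coordinate loss of the kind the abstract mentions can be illustrated with a Kabsch-style alignment before computing RMSD. This is a generic sketch under my own assumptions, not the paper's exact loss, and the symmetric-atom permutation term is omitted:

```python
import numpy as np

def aligned_rmsd(pred, target):
    """RMSD after removing translation (centering) and rotation (Kabsch),
    so the loss is invariant to rigid motions of the prediction."""
    p = pred - pred.mean(axis=0)
    q = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(u @ vt))   # avoid reflections
    r = u @ np.diag([1.0, 1.0, d]) @ vt  # optimal proper rotation
    return np.sqrt(np.mean(np.sum((p @ r - q) ** 2, axis=1)))

coords = np.random.randn(10, 3)
loss = aligned_rmsd(coords + 5.0, coords)  # pure translation, loss is ~0
```

Because the loss first centers and optimally rotates the prediction, any rigid motion of the predicted conformation leaves it unchanged, which is the invariance property claimed for the coordinate loss.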

NeurIPS Conference 2022 Conference Paper

Distilling Representations from GAN Generator via Squeeze and Span

  • Yu Yang
  • Xiaotian Cheng
  • Chang Liu
  • Hakan Bilen
  • Xiangyang Ji

In recent years, generative adversarial networks (GANs) have been an actively studied topic and have been shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring these representations to downstream tasks is largely unexplored. In this paper, we propose to distill knowledge from GAN generators by squeezing and spanning their representations. We squeeze the generator features into representations that are invariant to semantic-preserving transformations through a network before they are distilled into the student network. We span the distilled representation from the synthetic domain to the real domain by also using real training data, to remedy the mode collapse of GANs and boost the student network's performance in the real domain. Experiments justify the efficacy of our method and reveal its significance for self-supervised representation learning. Code is available at https://github.com/yangyu12/squeeze-and-span.

IJCAI Conference 2022 Conference Paper

Test-time Fourier Style Calibration for Domain Generalization

  • Xingchen Zhao
  • Chang Liu
  • Anthony Sicilia
  • Seong Jae Hwang
  • Yun Fu

The topic of generalizing machine learning models learned on a collection of source domains to unknown target domains is challenging. While many domain generalization (DG) methods have achieved promising results, they primarily rely on the source domains at train-time without manipulating the target domains at test-time. Thus, it is still possible that those methods can overfit to source domains and perform poorly on target domains. Driven by the observation that domains are strongly related to styles, we argue that reducing the gap between source and target styles can boost models’ generalizability. To solve the dilemma of having no access to the target domain during training, we introduce Test-time Fourier Style Calibration (TF-Cal) for calibrating the target domain style on the fly during testing. To access styles, we utilize Fourier transformation to decompose features into amplitude (style) features and phase (semantic) features. Furthermore, we present an effective technique to Augment Amplitude Features (AAF) to complement TF-Cal. Extensive experiments on several popular DG benchmarks and a segmentation dataset for medical images demonstrate that our method outperforms state-of-the-art methods.
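The amplitude/phase decomposition the abstract describes is easy to sketch with a Fourier transform over a feature map; the linear mixing rule and the use of a source-domain amplitude average as the calibration target are illustrative assumptions of mine, not necessarily the paper's exact procedure:

```python
import numpy as np

def calibrate_style(feat, source_amp_mean, alpha=0.5):
    """Decompose a feature map into amplitude (style) and phase (semantics),
    then pull the amplitude toward a source-domain average while keeping
    the phase untouched."""
    f = np.fft.fft2(feat, axes=(-2, -1))
    amp, phase = np.abs(f), np.angle(f)
    amp_cal = (1 - alpha) * amp + alpha * source_amp_mean  # hypothetical mixing rule
    return np.real(np.fft.ifft2(amp_cal * np.exp(1j * phase), axes=(-2, -1)))

feat = np.random.randn(8, 16, 16)  # (channels, H, W) feature map at test time
src_amp = np.abs(np.fft.fft2(np.random.randn(8, 16, 16), axes=(-2, -1)))
calibrated = calibrate_style(feat, src_amp)
```

With `alpha = 0` the feature map passes through unchanged; larger `alpha` shifts more of the style toward the source domain while preserving the semantic (phase) content.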

EAAI Journal 2022 Journal Article

Transformer-based moving target tracking method for Unmanned Aerial Vehicle

  • Nianyi Sun
  • Jin Zhao
  • Guangwei Wang
  • Chang Liu
  • Peng Liu
  • Xiong Tang
  • Jinbiao Han

Unmanned Aerial Vehicle (UAV) moving target tracking is one of the fundamental tasks in remote sensing and has been widely applied in monitoring, search and rescue, pursuit-evasion, and other fields. Currently, most UAV tracking algorithms merely establish the local relationship between the template and the search region without fully using global context information, leading to problems such as target loss, misclassification, and imprecise bounding boxes. This paper proposes a novel UAV tracker, TransUAV, which overcomes these challenges with a feature correlation network based on the self-attention mechanism. The method efficiently combines global features between the search region and the template to reduce the influence of external interference, enhancing the precision and robustness of the tracking algorithm. Moreover, global spatio-temporal features are acquired by learning query embeddings and temporal update strategies to make predictions, enhancing adaptability to rapid changes in the appearance of the target object. The method uses no proposals or predetermined anchors, satisfying onboard speed requirements; therefore, no post-processing is required and the entire approach is end-to-end. The superiority of TransUAV is verified by an exhaustive evaluation on six challenging target-tracking video benchmarks, and its accuracy and robustness are compared with state-of-the-art methods.

JBHI Journal 2021 Journal Article

A Data-Driven Approach to Transfer Function Analysis for Superior Discriminative Power: Optimized Assessment of Dynamic Cerebral Autoregulation

  • Jia Liu
  • Zhen-Ni Guo
  • David Simpson
  • Pandeng Zhang
  • Chang Liu
  • Jia-Ning Song
  • Xinyi Leng
  • Yi Yang

Transfer function analysis (TFA) is extensively used to assess human physiological functions. However, extracting parameters from TFA is not usually optimized for detecting impaired function. In this study, we propose to use data-driven approaches to improve the performance of TFA in assessing blood flow control in the brain (dynamic cerebral autoregulation, dCA). Data were collected from two distinct groups of subjects deemed to have normal and impaired dCA. Continuous arterial blood pressure (ABP) and cerebral blood flow velocity (CBFV) were simultaneously recorded for approximately 10 min in 82 subjects (including 41 healthy controls) to give 328 labeled samples of the TFA variables. The recordings were further divided into 4,294 short data segments to generate 17,176 unlabeled samples of the TFA variables. We optimized TFA post-processing with a generic semi-supervised learning strategy and a novel semi-supervised stacked ensemble learning (SSEL) strategy for classification into normal and impaired dCA. The generic strategy led to a performance with no significant difference from that of the conventional dCA analysis methods, whereas the proposed new strategy boosted the performance of TFA to an accuracy of 93.3%. To our knowledge, this is the best dCA discrimination performance obtained to date and the first attempt at optimizing TFA through machine learning techniques. Equivalent methods can potentially also be applied to assessing a wide spectrum of other human physiological functions.

TIST Journal 2021 Journal Article

A GDPR-compliant Ecosystem for Speech Recognition with Transfer, Federated, and Evolutionary Learning

  • Di Jiang
  • Conghui Tan
  • Jinhua Peng
  • Chaotao Chen
  • Xueyang Wu
  • Weiwei Zhao
  • Yuanfeng Song
  • Yongxin Tong

Automatic Speech Recognition (ASR) plays a vital role in a wide range of real-world applications. However, commercial ASR solutions are typically "one-size-fits-all" products, and clients inevitably face the risk of severe performance degradation in field tests. Meanwhile, with new data regulations such as the European Union's General Data Protection Regulation (GDPR) coming into force, ASR vendors, which traditionally utilize speech training data in a centralized approach, are increasingly helpless to solve this problem, since accessing clients' speech data is prohibited. Here, we show that by seamlessly integrating three machine learning paradigms (i.e., Transfer learning, Federated learning, and Evolutionary learning (TFE)), we can build a win-win ecosystem for ASR clients and vendors and solve all the aforementioned problems plaguing them. Through large-scale quantitative experiments, we show that with TFE, clients can enjoy far better ASR solutions than the "one-size-fits-all" counterpart, and vendors can exploit the abundance of clients' data to effectively refine their own ASR products.

ICRA Conference 2021 Conference Paper

Computational Design and Fabrication of Corrugated Mechanisms from Behavioral Specifications

  • Chang Liu
  • Wenzhong Yan
  • Ankur Mehta

Orthogonally assembled double-layered corrugated (OADLC) mechanisms are a class of foldable structures that harness origami-inspired methods to enhance the structural stiffness of the resulting devices; these mechanisms have extensive applications due to their lightweight, compact nature and their high strength-to-weight ratio. However, the design of these mechanisms remains challenging. Here, we propose an efficient method to rapidly design OADLC mechanisms from desired behavioral specifications, i.e., in-plane stiffness and out-of-plane stiffness. Based on an equivalent plate model, we develop and validate analytical formulas for the behavioral specifications of OADLC mechanisms; the formulas can be written as expressions of the design parameters. On the basis of these analytical expressions, we formulate the design of OADLC mechanisms from behavioral specifications as an optimization problem that minimizes weight under given design constraints. The 2D folding patterns of the optimized OADLC mechanisms are generated automatically and can be delivered directly for fabrication. Our rapid design method is demonstrated by developing stiffness-enhanced mechanisms with a desired out-of-plane stiffness for a foldable gripper that enables a blimp to perch steadily under air disturbance and a weight limit.

IJCAI Conference 2021 Conference Paper

Generalizing to Unseen Domains: A Survey on Domain Generalization

  • Jindong Wang
  • Cuiling Lan
  • Chang Liu
  • Yidong Ouyang
  • Tao Qin

Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been achieved over the years. This paper presents the first review of recent advances in domain generalization. First, we provide a formal definition of domain generalization and discuss several related fields. Then, we categorize recent algorithms into three classes, data manipulation, representation learning, and learning strategy, and present them in detail; each class contains several popular algorithms. Third, we introduce the commonly used datasets and applications. Finally, we summarize the existing literature and present some potential research topics for the future.

IJCAI Conference 2021 Conference Paper

Knowledge-based Residual Learning

  • Guanjie Zheng
  • Chang Liu
  • Hua Wei
  • Porter Jenkins
  • Chacha Chen
  • Tao Wen
  • Zhenhui Li

Small data has been a barrier for many machine learning tasks, especially when applied in scientific domains. Fortunately, we can utilize domain knowledge to make up for the lack of data. Hence, in this paper, we propose a hybrid model, KRL, that treats the domain knowledge model as a weak learner and uses another neural net model to boost it. We prove that KRL is guaranteed to improve over both the pure domain knowledge model and the pure neural net model under certain loss functions. Extensive experiments have shown the superior performance of KRL over baselines. In addition, several case studies explain how domain knowledge can assist prediction.
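The boosting idea, fitting a second learner to the residuals that the domain-knowledge model leaves behind, can be sketched in a few lines. The toy physics model and the polynomial least-squares learner standing in for the neural net are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Toy data: the true relation is an imperfect first-principles law plus an
# unknown correction term.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))
y = 3.0 * x[:, 0] + 0.5 * np.sin(6 * x[:, 0])

def knowledge_model(x):
    """Imperfect domain-knowledge prediction (assumed known a priori)."""
    return 2.8 * x[:, 0]

# Fit a learner to the residuals of the knowledge model; a small polynomial
# least-squares model stands in for the boosting neural net here.
residual = y - knowledge_model(x)
features = np.vander(x[:, 0], 6)
coef, *_ = np.linalg.lstsq(features, residual, rcond=None)

def krl_predict(x):
    return knowledge_model(x) + np.vander(x[:, 0], 6) @ coef

mse_knowledge = np.mean((y - knowledge_model(x)) ** 2)
mse_krl = np.mean((y - krl_predict(x)) ** 2)
```

The hybrid prediction keeps whatever the weak knowledge model gets right and lets the learned component absorb only the systematic error, which is the guarantee-of-improvement intuition in the abstract.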

NeurIPS Conference 2021 Conference Paper

Learning Causal Semantic Representation for Out-of-Distribution Prediction

  • Chang Liu
  • Xinwei Sun
  • Jindong Wang
  • Haoyue Tang
  • Tao Li
  • Tao Qin
  • Wei Chen
  • Tie-Yan Liu

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on causal reasoning, so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is common and challenging. The methods are based on the causal invariance principle, with a novel variational-Bayes design for both efficient learning and easy prediction. Theoretically, we prove that under certain conditions, CSG can identify the semantic factor by fitting the training data, and this semantic identification guarantees the boundedness of the OOD generalization error and the success of adaptation. Empirical study shows improved OOD performance over prevailing baselines.

AAAI Conference 2021 Conference Paper

Noninvasive Self-attention for Side Information Fusion in Sequential Recommendation

  • Chang Liu
  • Xiaoguang Li
  • Guohao Cai
  • Zhenhua Dong
  • Hong Zhu
  • Lifeng Shang

Sequential recommender systems aim to model users' evolving interests from their historical behaviors and hence make customized, time-relevant recommendations. Compared with traditional models, deep learning approaches such as CNNs and RNNs have achieved remarkable advancements in recommendation tasks. Recently, the BERT framework has also emerged as a promising method, benefiting from its self-attention mechanism for processing sequential data. However, one limitation of the original BERT framework is that it only considers one input source, the natural language tokens; how to leverage various types of information under the BERT framework remains an open question. Nonetheless, it is intuitively appealing to utilize other side information, such as item category or tag, for more comprehensive depictions and better recommendations. In our pilot experiments, we found that naive approaches, which directly fuse various types of side information into the item embeddings, usually bring very little or even negative improvement. Therefore, in this paper, we propose the NOninVasive self-Attention mechanism (NOVA) to leverage side information effectively under the BERT framework. NOVA uses side information to generate better attention distributions, rather than directly altering the item embeddings, which may cause information overload. We validate the NOVA-BERT model on both public and commercial datasets, and our method stably outperforms state-of-the-art models with negligible computational overhead.
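The "non-invasive" idea, letting side information shape the attention distribution without ever entering the item representations themselves, can be sketched as follows. The additive fusion and the single unprojected head are simplifications of mine rather than the paper's exact architecture:

```python
import numpy as np

def nova_attention(item_emb, side_emb):
    """Side information steers the attention weights (through the query/key
    side) but never enters the value stream, so the item embedding space
    stays 'non-invaded'."""
    fused = item_emb + side_emb            # illustrative fusion; others possible
    d = item_emb.shape[-1]
    scores = fused @ fused.T / np.sqrt(d)  # queries/keys see item + side info
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ item_emb                    # values come from items only

items = np.random.randn(10, 16)  # a user's interaction sequence
side = np.random.randn(10, 16)   # e.g. category/tag embeddings
out = nova_attention(items, side)
```

Contrast this with the "invasive" baseline, which would compute the values from `items + side` as well and thereby let side information overwrite the item embeddings.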

NeurIPS Conference 2021 Conference Paper

Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

  • Jongjin Park
  • Younggyo Seo
  • Chang Liu
  • Li Zhao
  • Tao Qin
  • Jinwoo Shin
  • Tie-Yan Liu

Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations. However, behavioral cloning often suffers from the causal confusion problem, where a policy relies on the noticeable effects of expert actions because of their strong correlation, rather than on the causes we desire. This paper presents Object-aware REgularizatiOn (OREO), a simple technique that regularizes an imitation policy in an object-aware manner. Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions. To this end, we introduce a two-stage approach: (a) we extract semantic objects from images by utilizing discrete codes from a vector-quantized variational autoencoder, and (b) we randomly drop the units that share the same discrete code together, i.e., masking out semantic objects. Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment. We also show that our method even outperforms inverse reinforcement learning methods trained with a considerable amount of environment interaction.
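The object-level masking idea of step (b) can be illustrated with a minimal NumPy sketch. This is an illustration only, not the authors' implementation; the function name, shapes, and drop probability are hypothetical.

```python
import numpy as np

def object_aware_dropout(features, codes, drop_prob=0.5, rng=None):
    """Drop all spatial units that share a discrete VQ-VAE code together (hypothetical sketch).

    features: (C, H, W) feature map; codes: (H, W) discrete code indices per spatial unit.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    unique_codes = np.unique(codes)
    # each semantic "object" (a discrete code) is kept or dropped as a whole
    dropped = unique_codes[rng.random(unique_codes.size) < drop_prob]
    keep = ~np.isin(codes, dropped)        # (H, W) boolean keep-mask
    return features * keep[None, :, :]     # broadcast the mask over channels
```

Unlike unit-wise dropout, every spatial location carrying the same code is zeroed jointly, so the policy cannot rely on any single masked "object".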

NeurIPS Conference 2021 Conference Paper

On the Generative Utility of Cyclic Conditionals

  • Chang Liu
  • Haoyue Tang
  • Tao Qin
  • Jintao Wang
  • Tie-Yan Liu

We study whether and how we can model a joint distribution $p(x, z)$ using two conditional models $p(x|z)$ and $q(z|x)$ that form a cycle. This is motivated by the observation that deep generative models, in addition to a likelihood model $p(x|z)$, often also use an inference model $q(z|x)$ for extracting representations, but rely on a usually uninformative prior distribution $p(z)$ to define a joint distribution, which may cause problems such as posterior collapse and manifold mismatch. To explore the possibility of modeling a joint distribution using only $p(x|z)$ and $q(z|x)$, we study their compatibility and determinacy, corresponding to the existence and uniqueness of a joint distribution whose conditional distributions coincide with them. We develop a general theory of operable equivalence criteria for compatibility and sufficient conditions for determinacy. Based on the theory, we propose a novel generative modeling framework, CyGen, that uses only the two cyclic conditional models. We develop methods to achieve compatibility and determinacy, and to use the conditional models to fit and generate data. With the prior constraint removed, CyGen better fits data and captures more representative features, as supported by both synthetic and real-world experiments.

IROS Conference 2021 Conference Paper

Origami Logic Gates for Printable Robots

  • Wenzhong Yan
  • Chang Liu
  • Ankur Mehta

Origami robots, often called "printable" robots, are created using folding processes and have gained extensive attention due to their potential for rapid and accessible design and fabrication through simple structures with complex functionalities. However, almost all origami robots require conventional rigid electronics for control, which may hinder the integration and restrict the potential of these origami systems. Here we introduce origami logic gates that can be built through folding. The major enabling technology is a bistable switch that toggles between two different circuits to control the electrical flow. Based on the origami switch, we develop NOT, AND, and OR logic gates (demonstrating functional completeness) and demonstrate these logic gates by powering low-current LEDs. These logic gates are fabricated using cut-and-fold manufacturing and offer a potential way to integrate logic functions directly into origami machines without electronics.

NeurIPS Conference 2021 Conference Paper

Recovering Latent Causal Factor for Generalization to Distributional Shifts

  • Xinwei Sun
  • Botong Wu
  • Xiangyu Zheng
  • Chang Liu
  • Wei Chen
  • Tao Qin
  • Tie-Yan Liu

Distributional shifts between training and target domains may degrade the prediction accuracy of learned models, mainly because these models often learn features that possess only correlation rather than a causal relation with the output. Such a correlation, known statistically as a "spurious correlation," is domain-dependent and hence may fail to generalize to unseen domains. To avoid such spurious correlations, we propose Latent Causal Invariance Models (LaCIM), which specify the underlying causal structure of the data and the source of distributional shifts, guiding us to pursue only the causal factor for prediction. Specifically, LaCIM introduces a pair of correlated latent factors: (a) the causal factor and (b) others, with the extent of this correlation governed by a domain variable that characterizes the distributional shifts. On this basis, we prove that the distribution of observed variables conditioned on latent variables is shift-invariant. Equipped with such an invariance, we prove that the causal factor can be recovered without mixing in information from the others, which induces the ground-truth predicting mechanism. We propose a variational-Bayesian-based method to learn this invariance for prediction. The utility of our approach is verified by improved generalization to distributional shifts on various real-world data. Our code is freely available at https://github.com/wubotong/LaCIM.

YNIMG Journal 2021 Journal Article

Where does fear originate in the brain? A coordinate-based meta-analysis of explicit and implicit fear processing

  • Di Tao
  • Zonglin He
  • Yuchen Lin
  • Chang Liu
  • Qian Tao

Processing of fear is of crucial importance for human survival, and it can occur under both explicit and implicit conditions. Notably, explicit and implicit fear processing produce different behavioral and neurophysiological outcomes. The present study capitalizes on the Activation Likelihood Estimation (ALE) method of meta-analysis to identify: (a) the "core" network of fear processing in healthy individuals; and (b) common and specific neural activations associated with explicit and implicit processing of fear. Following PRISMA guidelines, a total of 92 fMRI and PET studies were included in the meta-analysis. The overall analysis shows that the core fear network comprises the amygdala, pulvinar, and fronto-occipital regions. Both implicit and explicit fear processing activated the amygdala, declive, fusiform gyrus, and middle frontal gyrus, suggesting that these two types of fear processing share a common neural substrate. Explicit fear processing elicited more activation in the pulvinar and parahippocampal gyrus, suggesting that visual attention/orientation and contextual association play important roles during explicit fear processing. In contrast, implicit fear processing elicited more activation along the cerebellum-amygdala-cortical pathway, indicating an 'alarm' system underlying implicit fear processing. These findings shed light on the neural mechanisms underlying fear processing at different levels of awareness.

IJCAI Conference 2020 Conference Paper

An AI-empowered Visual Storyline Generator

  • Chang Liu
  • Zhao Yong Lim
  • Han Yu
  • Zhiqi Shen
  • Ian Dixon
  • Zhanning Gao
  • Pan Wang
  • Peiran Ren

Video editing is currently a highly skill- and time-intensive process. One of the most important tasks in video editing is composing the visual storyline. This paper outlines the Visual Storyline Generator (VSG), an artificial intelligence (AI)-empowered system that automatically generates visual storylines based on a set of images and video footage provided by the user. It is designed to produce engaging and persuasive promotional videos with an easy-to-use interface. In addition, users can be involved in refining the AI-generated visual storylines, and the editing results can be used as training data to further improve the AI algorithms in VSG.

AAAI Conference 2020 Conference Paper

Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce

  • Juntao Li
  • Chang Liu
  • Jian Wang
  • Lidong Bing
  • Hongsong Li
  • Xiaozhong Liu
  • Dongyan Zhao
  • Rui Yan

With cross-border e-commerce prospering, there is an urgent demand for intelligent approaches that assist e-commerce sellers in offering local products to consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new, high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) that enhances pre-trained monolingual BERT representations with context-dependent cross-lingual mapping. Experimental results indicate that our proposed CLMN yields impressive results on this challenging task and that the context-dependent cross-lingual mapping on BERT yields a noticeable improvement over the pre-trained multilingual BERT model.

AAAI Conference 2020 Conference Paper

DWM: A Decomposable Winograd Method for Convolution Acceleration

  • Di Huang
  • Xishan Zhang
  • Rui Zhang
  • Tian Zhi
  • Deyuan He
  • Jiaming Guo
  • Chang Liu
  • Qi Guo

Winograd’s minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective for convolutions with a 3x3 kernel and a stride of 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3, and it fails on convolutions with strides larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to a wide range of general convolutions. DWM decomposes kernels with a large size or a large stride into several small kernels with stride 1, to which the Winograd method can then be applied, so that DWM reduces the number of multiplications while preserving numerical accuracy. It enables fast exploration of larger kernel sizes and stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared against the original Winograd algorithm, the proposed DWM supports all kinds of convolutions with a speedup of ∼2x, without affecting numerical accuracy.
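The stride-decomposition idea can be checked in one dimension: a stride-2 convolution equals the sum of two stride-1 convolutions applied to the even and odd phases of the input and kernel, each of which Winograd could then accelerate. This is a sketch of the decomposition identity only (assuming an even kernel length for simplicity), not the paper's code:

```python
import numpy as np

def corr1d(x, k):
    """Valid cross-correlation, stride 1."""
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(len(x) - len(k) + 1)])

def strided_corr1d(x, k, stride=2):
    """Valid cross-correlation with the given stride (the op we want to speed up)."""
    n = (len(x) - len(k)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(k)], k) for i in range(n)])

def decomposed_strided_corr1d(x, k):
    """Stride-2 correlation rewritten as two stride-1 correlations on even/odd phases."""
    assert len(k) % 2 == 0, "even kernel length keeps phase lengths aligned in this sketch"
    n = (len(x) - len(k)) // 2 + 1
    ye = corr1d(x[0::2], k[0::2])   # even input phase with even kernel taps
    yo = corr1d(x[1::2], k[1::2])   # odd input phase with odd kernel taps
    return ye[:n] + yo[:n]
```

The identity follows by splitting the kernel index into even and odd parts: y[i] = Σⱼ k[j]·x[2i+j] = Σₘ k[2m]·x[2(i+m)] + Σₘ k[2m+1]·x[2(i+m)+1].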

AAAI Conference 2020 Conference Paper

Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

  • Yingying Zhang
  • Junyu Gao
  • Xiaoshan Yang
  • Chang Liu
  • Yan Li
  • Changsheng Xu

With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers moments of a user’s major or special interest in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. First, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Second, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of objects within each frame, and a temporal graph to obtain an object-aware representation of each frame and capture global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets strongly evidence that VH-GNN achieves significant performance gains compared with the state of the art.

AAAI Conference 2020 Short Paper

Generating Engaging Promotional Videos for E-commerce Platforms (Student Abstract)

  • Chang Liu
  • Han Yu
  • Yi Dong
  • Zhiqi Shen
  • Yingxue Yu
  • Ian Dixon
  • Zhanning Gao
  • Pan Wang

There is an emerging trend for sellers to use videos to promote their products on e-commerce platforms such as Taobao.com. The current video production workflow includes the production of a visual storyline by human directors. We propose a system that automatically generates a visual storyline from an input set of visual materials (e.g., video clips or still images) and then produces a promotional video. In particular, we propose an algorithm called Shot Composition, Selection and Plotting (ShotCSP), which generates visual storylines leveraging film-making principles to improve viewing experience and perceived persuasiveness.

AIIM Journal 2020 Journal Article

Handling imbalanced medical image data: A deep-learning-based one-class classification approach

  • Long Gao
  • Lei Zhang
  • Chang Liu
  • Shandong Wu

In clinical settings, many medical image datasets suffer from a class-imbalance problem, which hampers the detection of outliers (rare health care events), as most classification methods assume an equal occurrence of classes. Identifying outliers in imbalanced datasets has thus become a crucial issue. To help address this challenge, one-class classification, which focuses on learning a model using samples from only a single given class, has attracted increasing attention. Previous one-class modeling usually uses feature mapping or feature fitting to enforce the feature-learning process. However, these methods are limited for medical images, which usually have complex features. In this paper, a novel method is proposed to enable deep learning models to optimally learn single-class-relevant inherent imaging features by leveraging the concept of imaging complexity. We investigate and compare the effects of simple but effective perturbation operations applied to images to capture imaging complexity and to enhance feature learning. Extensive experiments are performed on four clinical datasets to show that the proposed method outperforms four state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

  • Dezhao Luo
  • Chang Liu
  • Yu Zhou
  • Dongbao Yang
  • Can Ma
  • Qixiang Ye
  • Weiping Wang

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatiotemporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.

AAAI Conference 2019 Conference Paper

Attentive Tensor Product Learning

  • Qiuyuan Huang
  • Li Deng
  • Dapeng Wu
  • Chang Liu
  • Xiaodong He

This paper proposes a novel neural architecture — Attentive Tensor Product Learning (ATPL) — to represent grammatical structures of natural language in deep learning models. ATPL exploits Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science, to integrate deep learning with explicit natural language structures and rules. The key ideas of ATPL are: 1) unsupervised learning of role-unbinding vectors of words via the TPR-based deep neural network; 2) the use of attention modules to compute TPR; and 3) the integration of TPR with typical deep learning architectures including long short-term memory and feedforward neural networks. The novelty of our approach lies in its ability to extract the grammatical structure of a sentence by using role-unbinding vectors, which are obtained in an unsupervised manner. Our ATPL approach is applied to 1) image captioning, 2) part of speech (POS) tagging, and 3) constituency parsing of a natural language sentence. The experimental results demonstrate the effectiveness of the proposed approach in all these three natural language processing tasks.

YNIMG Journal 2019 Journal Article

BOLD-fMRI reveals the association between renal oxygenation and functional connectivity in the aging brain

  • Hechun Li
  • Weifang Cao
  • Xingxing Zhang
  • Bo Sun
  • Sisi Jiang
  • Jianfu Li
  • Chang Liu
  • Wenjie Yin

Aging is accompanied by a decline in physical and cognitive function. Vascular aging may provide a major influence on these measures. The purpose of this study was to explore the relationship between renal oxygenation and functional connectivity of the aging brain because of the anatomic and hemodynamic similarities between cerebral and renal vessels. Fifty-two healthy older adults were recruited to undergo a BOLD-fMRI scan of the brain and kidneys, and forty-four healthy younger subjects were recruited as the control group. First, cerebral functional connectivity density (FCD) was used to evaluate functional connectivity. Renal medullary and cortical R2* values were extracted respectively, and the ratio of medullary and cortical R2* values (MCR) was calculated. Then, the association between brain FCD and renal MCR was analyzed. Compared with younger adults, the elderly group showed higher renal medullary R2* and MCR, which might reflect a slight abnormality of renal oxygenation with aging. The older subjects also showed enhanced FCD in bilateral motor-related regions and decreased FCD in regions of the default mode network (DMN). The findings indicated that the functional connectivity in the DMN and motor cortices was vulnerable to aging. Moreover, the altered brain FCD values in the watershed regions, DMN and motor cortices were significantly correlated with the renal MCR value in the elderly group. The association between renal oxygenation abnormalities and spontaneous activity in the brain might reflect vascular aging and its influence on the kidney and brain during aging to some extent. This study provided a new perspective for understanding the relationship between tissue oxygenation and brain functional connectivity.

NeurIPS Conference 2019 Conference Paper

FreeAnchor: Learning to Match Anchors for Visual Object Detection

  • Xiaosong Zhang
  • Fang Wan
  • Chang Liu
  • Rongrong Ji
  • Qixiang Ye

Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach that breaks the IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to "free" anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor aims to learn features that best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing a detection-customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner. Experiments on MS-COCO demonstrate that FreeAnchor consistently outperforms its counterparts by significant margins.

IJCAI Conference 2018 Conference Paper

Curriculum Adversarial Training

  • Qi-Zhi Cai
  • Chang Liu
  • Dawn Song

Recently, deep learning has been applied to many security-sensitive applications, such as facial authentication. The existence of adversarial examples hinders such applications. The state-of-the-art result on defense shows that adversarial training can be applied to train a robust model on MNIST against adversarial examples, but it fails to achieve a high empirical worst-case accuracy on more complex tasks, such as CIFAR-10 and SVHN. In our work, we propose curriculum adversarial training (CAT) to resolve this issue. The basic idea is to develop a curriculum of adversarial examples generated by attacks with a wide range of strengths. With two techniques to mitigate the catastrophic forgetting and generalization issues, we demonstrate that CAT improves the prior art's empirical worst-case accuracy by a large margin of 25% on CIFAR-10 and 35% on SVHN. At the same time, the model's performance on non-adversarial inputs is comparable to that of state-of-the-art models.

YNIMG Journal 2018 Journal Article

Development of subcortical volumes across adolescence in males and females: A multisample study of longitudinal changes

  • Megan M. Herting
  • Cory Johnson
  • Kathryn L. Mills
  • Nandita Vijayakumar
  • Meg Dennison
  • Chang Liu
  • Anne-Lise Goddings
  • Ronald E. Dahl

The developmental patterns of subcortical brain volumes in males and females observed in previous studies have been inconsistent. To help resolve these discrepancies, we examined developmental trajectories using three independent longitudinal samples of participants in the age-span of 8–22 years (total 216 participants and 467 scans). These datasets, including Pittsburgh (PIT; University of Pittsburgh, USA), NeuroCognitive Development (NCD; University of Oslo, Norway), and Orygen Adolescent Development Study (OADS; The University of Melbourne, Australia), span three countries and were analyzed together and in parallel using mixed-effects modeling with both generalized additive models and general linear models. For all regions and across all samples, males were found to have significantly larger volumes as compared to females, and significant sex differences were seen in age trajectories over time. However, direct comparison of sample trajectories and sex differences identified within samples were not consistent. The trajectories for the amygdala, putamen, and nucleus accumbens were most consistent between the three samples. Our results suggest that even after using similar preprocessing and analytic techniques, additional factors, such as image acquisition or sample composition may contribute to some of the discrepancies in sex specific patterns in subcortical brain changes across adolescence, and highlight region-specific variations in congruency of developmental trajectories.

AAMAS Conference 2018 Conference Paper

Human-UAV Teaming in Dynamic and Uncertain Environments

  • Alper Turan Alan
  • Chang Liu
  • Elliot Salisbury
  • Stephen D. Prior
  • Sarvapali D. Ramchurn
  • Feng Wu
  • Kerry Tatlock
  • Gareth Rees

In this demonstrator we show how an algorithm developed for human-agent coordination can be used to coordinate human actors on the ground and unmanned aerial vehicles in a rescue mission. A video can be found here: http://goo.gl/QLQD7q.

AAAI Conference 2018 Conference Paper

Riemannian Stein Variational Gradient Descent for Bayesian Inference

  • Chang Liu
  • Jun Zhu

We develop Riemannian Stein Variational Gradient Descent (RSVGD), a Bayesian inference method that generalizes Stein Variational Gradient Descent (SVGD) to Riemann manifolds. The benefits are two-fold: (i) for inference tasks in Euclidean spaces, RSVGD has the advantage over SVGD of utilizing information geometry, and (ii) for inference tasks on Riemann manifolds, RSVGD brings the unique advantages of SVGD to the Riemannian world. To transfer appropriately to Riemann manifolds, we conceive novel and non-trivial techniques for RSVGD, required by the intrinsically different characteristics of general Riemann manifolds from Euclidean spaces. We also discover a Riemannian Stein's identity and a Riemannian kernelized Stein discrepancy. Experimental results show the advantages over SVGD of exploiting distribution geometry, and the advantages in particle efficiency, iteration effectiveness, and approximation flexibility over other inference methods on Riemann manifolds.
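For context, the Euclidean SVGD update that RSVGD generalizes can be sketched in a few lines of NumPy. This illustrates plain SVGD (kernelized gradient plus repulsion), not the Riemannian variant; the RBF bandwidth and step size below are arbitrary choices for the sketch:

```python
import numpy as np

def svgd_step(x, grad_logp, h=1.0, eps=0.1):
    """One SVGD update for 1-D particles x (shape (n,)) with an RBF kernel.

    phi(x_i) = mean_j [ k(x_j, x_i) * grad_logp(x_j) + d k(x_j, x_i) / d x_j ]
    """
    diff = x[:, None] - x[None, :]            # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2 * h**2))         # k(x_j, x_i)
    grad_k = -diff / h**2 * k                 # derivative of k w.r.t. x_j (repulsion term)
    phi = (k * grad_logp(x)[:, None] + grad_k).mean(axis=0)
    return x + eps * phi
```

For a standard normal target (grad log p(x) = -x), repeatedly applying `svgd_step` drives a cloud of particles toward the mode while the repulsion term keeps them spread out.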

NeurIPS Conference 2018 Conference Paper

Tree-to-tree Neural Networks for Program Translation

  • Xinyun Chen
  • Chang Liu
  • Dawn Song

Program translation is an important tool to migrate legacy code in one language into an ecosystem built in a different language. In this work, we are the first to employ deep neural networks toward tackling this problem. We observe that program translation is a modular procedure, in which a sub-tree of the source tree is translated into the corresponding target sub-tree at each step. To capture this intuition, we design a tree-to-tree neural network to translate a source tree into a target one. Meanwhile, we develop an attention mechanism for the tree-to-tree model, so that when the decoder expands one non-terminal in the target tree, the attention mechanism locates the corresponding sub-tree in the source tree to guide the expansion of the decoder. We evaluate the program translation capability of our tree-to-tree model against several state-of-the-art approaches. Compared against other neural translation models, we observe that our approach is consistently better than the baselines with a margin of up to 15 points. Further, our approach can improve the previous state-of-the-art program translation approaches by a margin of 20 points on the translation of real-world projects.

IJCAI Conference 2017 Conference Paper

MAT: A Multimodal Attentive Translator for Image Captioning

  • Chang Liu
  • Fuchun Sun
  • Changhu Wang
  • Feng Wang
  • Alan Yuille

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different from most existing work, where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects that serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words as the target sequence of the RNN model. To represent the image sequentially, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects related to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on the popular MS COCO benchmark dataset, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the MS COCO captioning challenge evaluation server and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).

AAMAS Conference 2016 Conference Paper

Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration

  • Chang Liu
  • Jessica B. Hamrick
  • Jaime F. Fisac
  • Anca D. Dragan
  • J. Karl Hedrick
  • S. Shankar Sastry
  • Thomas L. Griffiths

The study of human-robot interaction is fundamental to the design and use of robotics in real-world applications. Robots will need to predict and adapt to the actions of human collaborators in order to achieve good performance and improve safety and end-user adoption. This paper evaluates a human-robot collaboration scheme that combines the task allocation and motion levels of reasoning: the robotic agent uses Bayesian inference to predict the next goal of its human partner from his or her ongoing motion, and re-plans its own actions in real time. This anticipative adaptation is desirable in many practical scenarios, where humans are unable or unwilling to take on the cognitive overhead required to explicitly communicate their intent to the robot. A behavioral experiment indicates that the combination of goal inference and dynamic task planning significantly improves both objective and perceived performance of the human-robot team. Participants were highly sensitive to the differences between robot behaviors, preferring to work with a robot that adapted to their actions over one that did not.
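The Bayesian goal-inference step can be illustrated with a toy recursive update. This is a hypothetical sketch under a simple direction-alignment likelihood (motion aligned with the direction to a goal is more likely under that goal), not the paper's actual observation model; the function name and the rationality parameter `beta` are invented for illustration:

```python
import numpy as np

def update_goal_posterior(posterior, pos, step, goals, beta=2.0):
    """One Bayesian update of beliefs over candidate goals from an observed motion step.

    posterior: (G,) prior over goals; pos: (2,) current position;
    step: (2,) observed displacement; goals: (G, 2) goal locations.
    """
    directions = goals - pos                              # vectors toward each goal
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    step_dir = step / np.linalg.norm(step)
    likelihood = np.exp(beta * directions @ step_dir)     # higher when motion is aligned
    posterior = posterior * likelihood                    # Bayes rule: prior * likelihood
    return posterior / posterior.sum()                    # normalize
```

Moving straight toward one goal rapidly concentrates the posterior on it, which is what lets the robot re-plan before the human's intent is stated explicitly.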

NeurIPS Conference 2016 Conference Paper

Latent Attention For If-Then Program Synthesis

  • Chang Liu
  • Xinyun Chen
  • Eui Chul Shin
  • Mingcheng Chen
  • Dawn Song

Automatic translation from natural language descriptions into programs is a long-standing challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weights for the words in the description in a two-stage process with the goal of better leveraging the natural language structures that indicate the relevant parts for predicting program elements. Our architecture reduces the error rate by 28.57% compared to prior art. We also propose a one-shot learning scenario of If-Then program synthesis and simulate it with our existing dataset. We demonstrate a variation on the training procedure for this scenario that outperforms the original procedure, significantly closing the gap to the model trained with all data.

TCS Journal 2016 Journal Article

Linear-time computation of prefix table for weighted strings & applications

  • Carl Barton
  • Chang Liu
  • Solon P. Pissis

The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.
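For ordinary (unweighted) strings, the prefix table defined above can be computed in linear time with the classic Z-algorithm; the weighted-string version in the paper is substantially more involved, so this sketch shows only the standard-string baseline:

```python
def prefix_table(s):
    """z[i] = length of the longest factor starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n                 # the whole string trivially matches itself
    l, r = 0, 0              # [l, r) is the rightmost prefix-match window found so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])   # reuse information from the window
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                      # extend the match by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]             # update the rightmost window
    return z
```

Each character is compared successfully at most once across all iterations, which is what gives the O(n) bound; for example, `prefix_table("abacaba")` reports the factor "aba" at position 4 matching the prefix.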

NeurIPS Conference 2016 Conference Paper

Stochastic Gradient Geodesic MCMC Methods

  • Chang Liu
  • Jun Zhu
  • Yang Song

We propose two stochastic gradient MCMC methods for sampling from Bayesian posterior distributions defined on Riemann manifolds with a known geodesic flow, e.g., hyperspheres. Our methods are the first scalable sampling methods on these manifolds, with the aid of stochastic gradients. Novel dynamics are conceived and 2nd-order integrators are developed. By adopting embedding techniques and the geodesic integrator, the methods do not require a global coordinate system of the manifold and do not involve inner iterations. Synthetic experiments show the validity of the methods, and their application to the challenging inference for spherical topic models indicates practical usability and efficiency.
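On the unit hypersphere the geodesic flow has a closed form (great circles), which is what makes integrators like these possible without a global coordinate system. A minimal sketch of a single geodesic step, assuming a unit-norm point and an arbitrary velocity vector (this illustrates only the geodesic update, not the authors' 2nd-order stochastic-gradient integrators):

```python
import numpy as np

def geodesic_step(x, v, t):
    """Move along the great circle from x with tangent velocity v for time t.
    x lies on the unit sphere; v is projected to the tangent space at x first."""
    v = v - (v @ x) * x                  # remove the radial component
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return x                         # no tangent motion: stay put
    return np.cos(nv * t) * x + np.sin(nv * t) * (v / nv)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 2.0, 0.0])
y = geodesic_step(x, v, np.pi / 4)       # speed 2 over t = pi/4: quarter turn
```

The step stays exactly on the manifold by construction, so no projection or inner iteration is needed afterwards.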

TIST Journal 2015 Journal Article

TerraFly GeoCloud

  • Mingjin Zhang
  • Huibo Wang
  • Yun Lu
  • Tao Li
  • Yudong Guang
  • Chang Liu
  • Erik Edrosa
  • Hongtai Li

With the exponential growth in the usage of web map services, geo-data analysis has become increasingly popular. This article develops an online spatial data analysis and visualization system, TerraFly GeoCloud, which helps end-users visualize and analyze spatial data and share the analysis results. Built on the TerraFly geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also provides the MapQL technology to customize map visualization using SQL-like statements. The system is available at http://terrafly.fiu.edu/GeoCloud/.

IROS Conference 2014 Conference Paper

Modeling and controller design of cooperative robots in workspace sharing human-robot assembly teams

  • Chang Liu
  • Masayoshi Tomizuka

Human workers and robots are two major workforces in modern factories. For safety reasons, they are kept separated, which limits the productive potential of both parties. Combining the flexibility of humans with the productivity of robots in manufacturing is therefore promising. This paper investigates the modeling and controller design of workspace-sharing human-robot assembly teams and adopts a two-layer interaction model between the human and the robot. In theoretical analysis, enforcing invariance of a safe set guarantees safety. In implementation, an integrated method is proposed that combines online learning of the closed-loop human behavior with receding horizon control within the safe set. Simulation results in a 2D setup confirm the safety and efficiency of the algorithm.
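The safe-set invariance idea can be sketched with a crude velocity-scaling rule: if the nominal command would take the robot out of the set where it keeps a minimum distance from the human, scale the command back. All names and the margin `D_MIN` are illustrative assumptions; this does not reproduce the paper's receding horizon controller or human-behavior learning.

```python
import numpy as np

D_MIN = 0.5   # assumed safety margin (illustrative)

def safe_control(robot, human, u_nominal, dt):
    """Enforce invariance of the safe set {x : ||robot - human|| >= D_MIN}
    by scaling back the nominal velocity command whenever the next step
    would leave the set (a crude projection, not the paper's method)."""
    if np.linalg.norm(robot + dt * u_nominal - human) >= D_MIN:
        return u_nominal                     # nominal step is already safe
    # back off along the commanded direction until the step is safe
    for scale in np.linspace(1.0, 0.0, 21):
        if np.linalg.norm(robot + dt * scale * u_nominal - human) >= D_MIN:
            return scale * u_nominal
    return np.zeros_like(u_nominal)          # stop if no safe scaling exists

robot = np.array([0.0, 0.0])
human = np.array([1.0, 0.0])
u = np.array([10.0, 0.0])                    # nominal command drives at the human
u_safe = safe_control(robot, human, u, dt=0.1)
pos = robot + 0.1 * u_safe                   # resulting robot position
```

Because a full stop (`scale = 0`) trivially keeps the robot inside the set, the backed-off command always exists; a real controller would instead optimize the command over a horizon subject to the invariance constraint.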

AAAI Conference 2012 Conference Paper

Large Scale Temporal RDFS Reasoning Using MapReduce

  • Chang Liu
  • Guilin Qi
  • Yong Yu

In this work, we build a large-scale reasoning engine under temporal RDFS semantics using MapReduce. We identify the major challenges of applying the MapReduce framework to reasoning over temporal information, and present our solutions to tackle them.
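One temporal RDFS rule can be simulated in-process to show the shape of such a MapReduce job: `rdf:type` facts are joined with `rdfs:subClassOf` facts on the class, and a derived fact holds on the intersection of the two validity intervals. The map/reduce split and all data are illustrative assumptions, not the authors' engine.

```python
from collections import defaultdict

def mapper(triple):
    """Emit join keys: subClassOf facts keyed by the subclass,
    type facts keyed by their class."""
    s, p, o, start, end = triple
    if p == "rdfs:subClassOf":
        yield s, ("sub", o, start, end)
    elif p == "rdf:type":
        yield o, ("type", s, start, end)

def reducer(key, values):
    """Join rdf:type with rdfs:subClassOf on the shared class; the derived
    type fact is valid on the intersection of the two intervals."""
    subs = [v for v in values if v[0] == "sub"]
    types = [v for v in values if v[0] == "type"]
    for _, sup, s1, e1 in subs:
        for _, inst, s2, e2 in types:
            start, end = max(s1, s2), min(e1, e2)
            if start <= end:                 # non-empty temporal overlap
                yield (inst, "rdf:type", sup, start, end)

def run(triples):
    """In-process stand-in for one MapReduce round: group map output
    by key, then reduce each group."""
    groups = defaultdict(list)
    for t in triples:
        for k, v in mapper(t):
            groups[k].append(v)
    return [d for k, vs in groups.items() for d in reducer(k, vs)]

facts = [
    (":alice", "rdf:type", ":Student", 2008, 2012),
    (":Student", "rdfs:subClassOf", ":Person", 2000, 2010),
]
derived = run(facts)
```

Here `:alice` is derived to be a `:Person` only for 2008-2010, the overlap of the two source intervals; interval intersection is exactly the extra bookkeeping that temporal semantics adds on top of standard RDFS rule application.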