Arrow Research search

Author name cluster

Chao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

83 papers
2 author rows

Possible papers

83

AAAI Conference 2026 Conference Paper

AIR-DR: Adaptive Image Retargeting with Instance Relocation and Dual-guidance Repainting

  • Zhitong Dong
  • Chao Li
  • Yongjian Deng
  • Hao Chen

Image retargeting aims to adjust the aspect ratio of images to accommodate various display devices. While existing methods consider both foreground semantics and background inpainting, their seam-carving-based framework is inherently destructive, often compromising the structural integrity of foreground instances. Furthermore, conventional inpainting models struggle to achieve pixel-level accuracy with global-only guidance, leading to local inconsistencies and background distortions. To address these challenges, we reformulate image retargeting as an instance-level re-layout task. Through Adaptive Instance Relocation and Dual-guidance Repainting (AIR-DR), our method preserves the structural integrity of the foreground and recovers the background with consistent details. Additionally, we introduce an adaptive retargeting decision that maintains robustness across challenging retargeting scenarios and arbitrary ratios. Extensive experiments on multiple public datasets across various aspect ratios demonstrate that our approach consistently outperforms existing methods in both objective metrics and subjective evaluations. Comprehensive ablation studies further validate the effectiveness of each component.

AAAI Conference 2026 Conference Paper

Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency

  • Riling Wei
  • Kelu Yao
  • Chuanguang Yang
  • Jin Wang
  • Zhuoyan Gao
  • Chao Li

Cross-modal Knowledge Distillation has demonstrated promising performance on paired modalities with strong semantic connections, referred to as Symmetric Cross-modal Knowledge Distillation (SCKD). However, implementing SCKD becomes exceedingly constrained in real-world scenarios due to the limited availability of paired modalities. To this end, we investigate a general and effective knowledge learning concept under weak semantic consistency, dubbed Asymmetric Cross-modal Knowledge Distillation (ACKD), aiming to bridge modalities with limited semantic overlap. Nevertheless, the shift from strong to weak semantic consistency improves flexibility but increases knowledge transmission costs, which we rigorously verify based on optimal transport theory. To mitigate this issue, we further propose a framework, namely SemBridge, integrating a Student-Friendly Matching module and a Semantic-aware Knowledge Alignment module. The former leverages self-supervised learning to acquire semantic-based knowledge and provide personalized instruction for each student sample by dynamically selecting the relevant teacher samples. The latter seeks the optimal transport path by employing Lagrangian optimization. To facilitate the research, we curate a benchmark dataset derived from two modalities, namely Multi-Spectral (MS) and asymmetric RGB images, tailored for remote sensing scene classification. Comprehensive experiments show that our framework achieves state-of-the-art performance compared with 7 existing approaches on 6 different model architectures across various datasets.

AAAI Conference 2026 Conference Paper

Can Pseudo-Label Be More Reliable? A Simple yet Effective Topology-Aware Graph Self-Training Method

  • Gen Liu
  • Zhongying Zhao
  • Hui Zhou
  • Chao Li
  • Qingtian Zeng

Graph Neural Networks (GNNs) have demonstrated impressive success across a range of graph-based tasks. However, their performance in node classification typically relies on sufficient high-quality labeled data, which is difficult to obtain in practice. Self-training emerges as a promising solution to the issue of label scarcity. Most existing studies in this direction rely mainly on classification scores to explore high-confidence unlabeled samples. Nevertheless, these methods often introduce false positive samples, which hinders the capability of GNNs. To this end, we propose a simple yet effective Topology-Aware Graph Self-Training (TA-GST) method. Specifically, we first explore the origin of false positives in pseudo-labeled samples. We then design a topology-aware scoring method, which considers both the classification score and the connectivity pattern to enhance the reliability of pseudo-labeled samples. Besides, we free TA-GST from the traditional teacher-student pattern and simplify it into an end-to-end design. Extensive experiments on seven real-world datasets demonstrate the effectiveness of our method.

AAAI Conference 2026 Conference Paper

DARLING: Dual Hypergraph-Enhanced Curriculum-Guided Graph Structure Learning for Node Classification

  • Guangkai Wu
  • Gen Liu
  • Chao Li
  • Qingtian Zeng
  • Hui Zhou
  • Zhongying Zhao

Graph Structure Learning (GSL) aims to simultaneously enhance the original graph and the performance of Graph Neural Networks. However, existing GSL methods for node classification fail to consider neighborhood label dependencies during training, which limits their ability to refine the graph structure in an adaptive manner. Furthermore, the training of those methods lacks a proper schedule based on graph structure quality, thereby yielding suboptimal performance. To address these challenges, we propose a novel GSL framework for node classification, termed DuAl hypeRgraph-enhanced curricuLum-guided graph structure learnING for node classification (DARLING). It first introduces a graph structure curriculum module to effectively discriminate suboptimal graph structures by examining both the distribution of neighborhood labels and node degrees. Subsequently, a self-supervised dual hypergraph similarity learning module is proposed to capture higher-order neighborhood label dependencies, achieved by formulating a pre-training task that involves hyperedge batch-filling within the dual hypergraph of the input graph. The experimental results on six datasets demonstrate that the proposed DARLING significantly outperforms eleven state-of-the-art methods in terms of effectiveness and robustness.

AAAI Conference 2026 Conference Paper

FreqTAD: Multi-scale Frequency Encoding and Time-Frequency Attention for Anomaly Detection in Dynamic Graphs

  • Chao Li
  • Runshuo Liu
  • Zhongying Zhao
  • Hui Zhou
  • Qingtian Zeng

Anomaly detection in dynamic graphs aims to capture the dynamic evolution characteristics of graphs and then identify abnormal behaviors that deviate from normal patterns. However, previous studies fail to decouple periodic and bursty information during the time encoding process, which hinders their performance. In addition, most existing methods use attention mechanisms to capture the importance of time points but fail to leverage the normal and abnormal characteristics in the frequency domain. To address the above issues, we propose a model that integrates multi-scale Frequency encoding with Time-frequency Attention for Anomaly Detection in dynamic graphs, named FreqTAD. We design a multi-scale frequency encoder that decomposes time series into distinct periodic and bursty components. Moreover, we present an effective time-frequency attention mechanism that focuses on frequency components to differentiate frequency-domain features of normal and abnormal behaviors. Experimental results on four datasets demonstrate the superior performance of FreqTAD in both anomaly detection accuracy and computational efficiency.

AAAI Conference 2026 Conference Paper

ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models

  • Zhongyuan Wu
  • Jingyuan Wang
  • Zexuan Cheng
  • Yilong Zhou
  • Weizhi Wang
  • Juhua Pu
  • Chao Li
  • Changqing Ma

Anomaly detection (AD) is a fundamental task of critical importance across numerous domains. Current systems increasingly operate in rapidly evolving environments that generate diverse yet interconnected data modalities—such as time series, system logs, and tabular records—as exemplified by modern IT systems. Effective AD methods in such environments must therefore possess two critical capabilities: (1) the ability to handle heterogeneous data formats within a unified framework, allowing the model to process and detect multiple modalities in a consistent manner during anomalous events; (2) a strong generalization ability to quickly adapt to new scenarios without extensive retraining. However, most existing methods fall short of these requirements, as they typically focus on single modalities and lack the flexibility to generalize across domains. To address this gap, we introduce a novel paradigm: In-Context Anomaly Detection (ICAD), where anomalies are defined by their dissimilarity to a relevant reference set of normal samples. Under this paradigm, we propose ICAD-LLM, a unified AD framework leveraging Large Language Models' in-context learning abilities to process heterogeneous data within a single model. Extensive experiments demonstrate that ICAD-LLM achieves competitive performance with task-specific AD methods and exhibits strong generalization to previously unseen tasks, which substantially reduces deployment costs and enables rapid adaptation to new environments. To the best of our knowledge, ICAD-LLM is the first model capable of handling anomaly detection tasks across diverse domains and modalities.

AAAI Conference 2026 Conference Paper

Stage-Aware Graph Contrastive Learning with Node-oriented Mixture of Experts

  • Xiangkai Zhu
  • Yeyu Yan
  • Saiqin Long
  • Chao Li
  • Guanwen Chen
  • Longsheng Su

Text-attributed graphs (TAGs), which associate rich textual descriptions with each node, are widely employed to represent complex relationships among real-world textual entities. Currently, representation learning for TAGs leverages large language models (LLMs) to transform node-matched textual descriptions into node features or labels, followed by message passing in graph neural networks (GNNs) that further improves the expressiveness of graph representation learning. Nevertheless, a simple experiment we conducted demonstrates that not all LLMs are readily compatible with GNNs. A salient finding is that architectural heterogeneity among LLMs manifests as substantial performance gaps across diverse TAG representation learning tasks. Moreover, the node semantics encoded by LLMs are often misaligned with the message passing in GNNs, causing performance collapse. Motivated by this observation, we propose a novel self-supervised graph learning framework called Stage-Aware Graph Contrastive Learning (SAGCL). In particular, we propose the node-oriented mixture of experts (NodeMoE) to assign suitable candidate experts for each node. It flexibly balances the strengths of different language experts by low-rank decomposition and reparameterization strategies. Subsequently, to align the inductive biases of graph structures with the semantic perception capabilities of LLMs, the message passing in GNNs is decoupled into a feature transformation stage and a feature propagation stage. Given the two stage views, stage-aware graph contrastive learning is proposed to match the node semantics encoded by the LLM with the locally aware topological patterns within the GNN via self-supervised contrastive learning. Experiments on eight datasets and three downstream tasks demonstrate the effectiveness of SAGCL.

AAAI Conference 2026 Conference Paper

State-Derivative-Aware Neural Controlled Differential Equations for Multivariate Time Series Anomaly Detection and Diagnosis

  • Xin Sun
  • Heng Zhou
  • Yuhao Wu
  • Chao Li

Multivariate time series anomaly detection is crucial in real-world applications but challenging due to complex temporal dependencies and system dynamics. Reconstruction-based methods have made great improvements in recent years. However, we observe an issue these methods suffer from: when performing anomaly detection, they primarily measure deviations in the time points themselves but ignore changes in the dynamic properties of the system. In such cases, they are unable to produce sufficient reconstruction errors to detect anomalies, so potential abnormal time points caused by the dynamic evolution of the system are missed. To address this problem, we propose a novel method, SDA2D, which models system dynamics via the derivative of the NCDE-derived state vector with respect to time, enabling the joint learning of reconstruction deviation and system evolution. Our experimental results show that SDA2D achieves noticeable improvements on four benchmark datasets, and the visualization also provides further guidance for anomaly diagnosis, helping locate the sources of these anomalies.

AAAI Conference 2025 Conference Paper

Beyond Mandatory Federations: Balancing Egoism, Utilitarianism and Egalitarianism in Mixed-Motive Games

  • Shaokang Dong
  • Chao Li
  • Shangdong Yang
  • Hongye Cao
  • Wanqi Yang
  • Yang Gao

In the field of mixed-motive games, extensive multi-agent learning studies have explored the balance between egoism (individual interest), utilitarianism (collective interest), and egalitarianism (fairness). Traditional approaches often rely on manually designed reward functions, social norms, and alliance/federation mechanisms to transition agents from individualistic behaviors toward cooperative strategies. However, these methods typically require all agents to share private local information or to mandatorily participate in federations, which is impractical in real-world applications. To address these issues, this paper proposes a Flexible-Participation Federation (FPF) framework that allows agents to participate in the federation voluntarily. Furthermore, we extend the federation from a global to a Local Multi-Federation (LMF) framework, enabling agents to form multiple localized federations, thereby promoting more efficient and adaptive cooperation. Theoretical evidence demonstrates that the global FPF model, along with the discrepancy between decentralized egoistic policies and federated utilitarian policies, achieves an O(1/T) convergence rate. Agents in the LMF framework also reach consensus within a sublinear gap. Extensive experiments show that agents opting out of federation participation experience a reduction in egoism, and our approach outperforms multiple baselines in terms of both utilitarianism and egalitarianism.

AAAI Conference 2025 Conference Paper

Bridging Traffic State and Trajectory for Dynamic Road Network and Trajectory Representation Learning

  • Chengkai Han
  • Jingyuan Wang
  • Yongyao Wang
  • Xie Yu
  • Hao Lin
  • Chao Li
  • Junjie Wu

Effective urban traffic management is vital for sustainable city development, relying on intelligent systems with machine learning tasks such as traffic flow prediction and travel time estimation. Traditional approaches usually focus on static road network and trajectory representation learning, and overlook the dynamic nature of traffic states and trajectories, which is crucial for downstream tasks. To address this gap, we propose TRACK, a novel framework to bridge traffic state and trajectory data for dynamic road network and trajectory representation learning. TRACK leverages graph attention networks (GAT) to encode static and spatial road segment features, and introduces a transformer-based model for trajectory representation learning. By incorporating transition probabilities from trajectory data into GAT attention weights, TRACK captures dynamic spatial features of road segments. Meanwhile, TRACK designs a traffic transformer encoder to capture the spatial-temporal dynamics of road segments from traffic state data. To further enhance dynamic representations, TRACK proposes a co-attentional transformer encoder and a trajectory-traffic state matching task. Extensive experiments on real-life urban traffic datasets demonstrate the superiority of TRACK over state-of-the-art baselines. Case studies confirm TRACK’s ability to capture spatial-temporal dynamics effectively.

IJCAI Conference 2025 Conference Paper

CoLA-Former: Graph Transformer Using Communal Linear Attention for Lightweight Sequential Recommendation

  • Zhongying Zhao
  • Jinyu Zhang
  • Chuanxu Jia
  • Chao Li
  • Yanwei Yu
  • Qingtian Zeng

Graph Transformer has shown great promise in capturing the dynamics of user preferences for sequential recommendations. However, the self-attention mechanism within its structure is of quadratic complexity, posing challenges for deployment on devices with limited resources. To this end, we propose a Communal Linear Attention-enhanced Graph TransFormer for lightweight sequential recommendation, namely CoLA-Former. Specifically, we introduce a Communal Linear Attention (CoLAttention) mechanism. It utilizes low-rank yet reusable communal units to calculate the global correlations on sequential graphs. The weights from the units are also made communal across different training batches, enabling inter-batch global weighting. Moreover, we devise a low-rank approximation component. It utilizes weights distillation to reduce the scale of the trainable parameters in the Graph Transformer network. Extensive experimental results on three real-world datasets demonstrate that the proposed CoLA-Former significantly outperforms twelve state-of-the-art methods in accuracy and efficiency. The datasets and codes are available at https://github.com/ZZY-GraphMiningLab/CoLA_Former.

TAAS Journal 2025 Journal Article

DeFeed: Secure Decentralized Cross-Contract Data Feed in Web 3.0 for Connected Autonomous Vehicles

  • Xingchen Sun
  • Runhua Xu
  • Wei Ni
  • Li Duan
  • Chao Li

Smart contracts have been a topic of interest in blockchain research and are a key enabling technology for Connected Autonomous Vehicles (CAVs) in the era of Web 3.0. These contracts enable trustless interactions without the need for intermediaries, as they operate based on predefined rules encoded on the blockchain. However, smart contracts face significant challenges in cross-contract communication and information sharing, making it difficult to establish seamless connectivity and collaboration among CAVs with Web 3.0. In this paper, we propose DeFeed, a novel secure protocol that incorporates various gas-saving functions for CAVs, originating from in-depth research into the interaction among smart contracts for decentralized cross-contract data feed in Web 3.0. DeFeed allows smart contracts to obtain information from other contracts efficiently in a single click, without complicated operations. We judiciously design and complete various functions with DeFeed, including a pool function and a cache function for gas optimization, a subscribe function for facilitating data access, and an update function for the future iteration of our protocol. Tailored for CAVs with Web 3.0 use cases, DeFeed enables efficient data feed between smart contracts underpinning decentralized applications and vehicle coordination. Implemented and tested on the Ethereum official test network, DeFeed demonstrates significant improvements in contract interaction efficiency, reducing computational complexity and gas costs. Our solution represents a critical step towards seamless, decentralized communication in Web 3.0 ecosystems.

ICLR Conference 2025 Conference Paper

DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning

  • Chao Li
  • Ziwei Deng
  • Chenxing Lin
  • Wenqi Chen
  • Yongquan Fu
  • Weiquan Liu
  • Chenglu Wen
  • Cheng Wang 0003

Diffusion models have been widely adopted in image and language generation and are now being applied to reinforcement learning. However, the application of diffusion models in offline cooperative Multi-Agent Reinforcement Learning (MARL) remains limited. Although existing studies explore this direction, they suffer from scalability or poor cooperation issues due to the lack of design principles for diffusion-based MARL. The Individual-Global-Max (IGM) principle is a popular design principle for cooperative MARL. By satisfying this principle, MARL algorithms achieve remarkable performance with good scalability. In this work, we extend the IGM principle to the Individual-Global-identically-Distributed (IGD) principle. This principle stipulates that the generated outcome of a multi-agent diffusion model should be identically distributed as the collective outcomes from multiple individual-agent diffusion models. We propose DoF, a diffusion factorization framework for offline MARL. It uses a noise factorization function to factorize a centralized diffusion model into multiple diffusion models. We theoretically show that the noise factorization functions satisfy the IGD principle. Furthermore, DoF uses a data factorization function to model the complex relationship among data generated by multiple diffusion models. Through extensive experiments, we demonstrate the effectiveness of DoF. The source code is available at https://github.com/xmu-rl-3dv/DoF.

IJCAI Conference 2025 Conference Paper

Generate or Re-Weight? A Mutual-Guidance Method for Class-Imbalanced Graphs

  • Zhongying Zhao
  • Gen Liu
  • Qi Meng
  • Chao Li
  • Qingtian Zeng

Class imbalance is a widespread problem in graph-structured data. Existing studies tailored for class-imbalanced graphs are typically categorized into generative and re-weighting methods. However, the former merely focuses on quantity balance rather than learning balance, while the latter performs fine-tuning in a majority-minority paradigm, overlooking the authentic-generative one. In fact, combining them can relieve their respective limitations. To this end, we propose a Mutual-Guidance method for class-imbalanced graphs, namely GraphMuGu. Specifically, we first design an uncertainty-aware method to quantify the number of synthesized samples for each category. Furthermore, we devise a similarity-aware method to re-weight the importance of the authentic and generative samples. To the best of our knowledge, the proposed GraphMuGu is the first attempt to incorporate generative and re-weighting methods into a unified framework. The experimental results on five class-imbalanced datasets demonstrate the superiority of the proposed method. The source codes are available at https://github.com/ZZY-GraphMiningLab/GraphMuGu.

NeurIPS Conference 2025 Conference Paper

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Bingkun Bao
  • Yang Gao

In this paper, we consider fully decentralized cooperative multi-agent reinforcement learning, where each agent has access only to the states, its local actions, and the shared rewards. The absence of information about other agents' actions typically leads to the non-stationarity problem during per-agent value function updates, and the relative overgeneralization issue during value function estimation. However, existing works fail to address both issues simultaneously, as they lack the capability to model the agents' joint policy in a fully decentralized setting. To overcome this limitation, we propose a simple yet effective method named Return-Aware Context (RAC). RAC formalizes the dynamically changing task, as locally perceived by each agent, as a contextual Markov Decision Process (MDP), and addresses both non-stationarity and relative overgeneralization through return-aware context modeling. Specifically, the contextual MDP attributes the non-stationary local dynamics of each agent to switches between contexts, each corresponding to a distinct joint policy. Then, based on the assumption that the joint policy changes only between episodes, RAC distinguishes different joint policies by the training episodic return and constructs contexts using discretized episodic return values. Accordingly, RAC learns a context-based value function for each agent to address the non-stationarity issue during value function updates. For value function estimation, an individual optimistic marginal value is constructed to encourage the selection of optimal joint actions, thereby mitigating the relative overgeneralization problem. Experimentally, we evaluate RAC on various cooperative tasks (including matrix games, predator and prey, and SMAC), and its strong performance validates its effectiveness.

JBHI Journal 2025 Journal Article

Large Model Driven Multi-Granularity Medical Image Analysis: A Fuzzy Logic-Guided Framework

  • Guan Wang
  • Mingyu Xu
  • Chao Li
  • Xingsi Xue
  • Bo Yi
  • Jing Yang

The analysis of medical images requires sophisticated computational approaches that can handle the inherent complexity and uncertainty present in pathological structures. This paper presents a large model driven framework that integrates fuzzy logic principles with transformer-based architectures to enable multi-granularity medical image analysis. The proposed approach, termed ULVM-MG, employs a sophisticated feature extraction strategy that simultaneously processes pathological images at coarse, medium, and fine granularity levels, mirroring the systematic examination methodology employed by experienced pathologists. In particular, a fuzzy-guided cross-attention mechanism directs the transformer's attention toward diagnostically significant regions while preserving essential contextual information. Comprehensive evaluation on histopathological datasets demonstrates superior performance compared to state-of-the-art transformer-based approaches. ULVM-MG achieves 98.76% and 97.34% accuracy on the LC25000 and NCT datasets, respectively, outperforming the best baseline by 1.61% and 2.17%. The framework excels particularly in distinguishing morphologically similar tissue types and benign versus malignant classification tasks. Ablation studies confirm the critical contributions of multi-granularity processing and fuzzy uncertainty modeling, with statistical analysis revealing significant performance improvements across all evaluation metrics.

AAAI Conference 2025 Conference Paper

Lightweight Yet Fine-Grained: A Graph Capsule Convolutional Network with Subspace Alignment for Shared-Account Sequential Recommendation

  • Jinyu Zhang
  • Zhongying Zhao
  • Chao Li
  • Yanwei Yu

Shared-account Sequential Recommendation (SSR) aims to provide personalized recommendations for accounts shared by multiple users with varying sequential preferences. Previous studies on SSR struggle to capture the fine-grained associations between interactions and different latent users within the shared account's hybrid sequences. Moreover, most existing SSR methods (e.g., RNN-based or GCN-based methods) have quadratic computational complexities, hindering the deployment of SSRs on resource-constrained devices. To this end, we propose a Lightweight Graph Capsule Convolutional Network with subspace alignment for shared-account sequential recommendation, named LightGC2N. Specifically, we devise a lightweight graph capsule convolutional network. It facilitates the fine-grained matching between interactions and latent users by attentively propagating messages on the capsule graphs. Besides, we present an efficient subspace alignment method. This method refines the sequence representations and then aligns them with the finely clustered preferences of latent users. The experimental results on four real-world datasets indicate that LightGC2N outperforms nine state-of-the-art methods in accuracy and efficiency.

IROS Conference 2025 Conference Paper

MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping

  • Qingyu Fan
  • Yinghao Cai
  • Chao Li
  • Chunting Jiao
  • Xudong Zheng
  • Tao Lu 0006
  • Bin Liang
  • Shuo Wang 0001

Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features, which synergistically strikes a balance between focusing on fine geometric details and overall geometric structures. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.

ICML Conference 2025 Conference Paper

Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

  • Chao Li
  • Jiawei Fan
  • Anbang Yao

In this paper, we present $Morse$, a simple dual-sampling framework for accelerating diffusion models losslessly. The key insight of Morse is to reformulate the iterative generation (from noise to data) process via taking advantage of fast jump sampling and adaptive residual feedback strategies. Specifically, Morse involves two models called $Dash$ and $Dot$ that interact with each other. The Dash model is just the pre-trained diffusion model of any type, but operates in a jump sampling regime, creating sufficient space for sampling efficiency improvement. The Dot model is significantly faster than the Dash model, which is learnt to generate residual feedback conditioned on the observations at the current jump sampling point on the trajectory of the Dash model, lifting the noise estimate to easily match the next-step estimate of the Dash model without jump sampling. By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse exhibits the merit of flexibly attaining desired image generation performance while improving overall runtime efficiency. With our proposed weight sharing strategy between the Dash and Dot models, Morse is efficient for training and inference. Our method shows a lossless speedup of 1.78$\times$ to 3.31$\times$ on average over a wide range of sampling step budgets relative to 9 baseline diffusion models on 6 image generation tasks. Furthermore, we show that our method can also be generalized to improve the Latent Consistency Model (LCM-SDXL, which is already accelerated with the consistency distillation technique) tailored for few-step text-to-image synthesis. The code and models are available at https://github.com/deep-optimization/Morse.

NeurIPS Conference 2025 Conference Paper

Multivariate Time Series Anomaly Detection with Idempotent Reconstruction

  • Xin Sun
  • Heng Zhou
  • Chao Li

Reconstruction-based methods are competitive choices for multivariate time series anomaly detection (MTS AD). However, one challenge these methods may suffer from is overgeneralization, where abnormal inputs are also well reconstructed. In addition, balancing robustness and sensitivity is also important for final performance, as robustness ensures accurate detection in potentially noisy data, while sensitivity enables early detection of subtle anomalies. To address these problems, inspired by idempotent generative networks, we take a manifold view and propose a novel module named Idempotent Generation for Anomaly Detection (IGAD), which can be flexibly combined with a reconstruction-based method without introducing additional trainable parameters. We modify the manifold to ensure that normal time points can be mapped onto it while simultaneously tightening it to drop out abnormal time points. In light of the latest findings on AD metrics, we evaluated IGAD on various methods with four real-world datasets, and they achieve visible improvements in VUS-PR over their predecessors, demonstrating the potential of IGAD for further improvements in MTS AD tasks. Our instructions on integrating IGAD into customized models and example codes are available at https://github.com/ProEcho1/Idempotent-Generation-for-Anomaly-Detection-IGAD.

ICRA Conference 2025 Conference Paper

NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection

  • Qingyu Fan
  • Yinghao Cai
  • Chao Li
  • Wenzhe He
  • Xudong Zheng
  • Tao Lu 0006
  • Bin Liang
  • Shuo Wang 0001

Robotic grasping in scenes with transparent and specular objects presents great challenges for methods relying on accurate depth information. In this paper, we introduce NeuGrasp, a neural surface reconstruction method that leverages background priors for material-agnostic grasp detection. NeuGrasp integrates transformers and global prior volumes to aggregate multi-view features with spatial encoding, enabling robust surface reconstruction in narrow and sparse viewing conditions. By focusing on foreground objects through residual feature enhancement and refining spatial perception with an occupancy-prior volume, NeuGrasp excels in handling objects with transparent and specular surfaces. Extensive experiments in both simulated and real-world scenarios show that NeuGrasp outperforms state-of-the-art methods in grasping while maintaining comparable reconstruction quality. More details are available at https://neugrasp.github.io/.

IROS Conference 2025 Conference Paper

SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement

  • Ze Wang 0009
  • Yang Li
  • Long Xu 0002
  • Hao Shi 0004
  • Zunwang Ma
  • Zhen Chu
  • Chao Li
  • Fei Gao 0011

Dynamic jumping on high platforms and over gaps differentiates legged robots from wheeled counterparts. Compared to walking on rough terrains, dynamic locomotion on abrupt surfaces requires fusing proprioceptive and exteroceptive perception for explosive movements. In this paper, we propose SF-TIM (Simple Framework combining Terrain Imagination and Measurement), a single-policy method that enhances quadrupedal robot jumping agility while preserving fundamental blind walking capabilities. In addition, we introduce a terrain-guided reward design specifically to assist quadrupedal robots in high jumping, improving their performance in this task. To narrow the simulation-to-reality gap in quadrupedal robot learning, we introduce a stable and high-speed elevation map generation framework, enabling zero-shot simulation-to-reality transfer of locomotion ability. Our algorithm has been deployed and validated on both small- and large-size quadrupedal robots, demonstrating its effectiveness in real-world applications: the robot has successfully traversed various high platforms and gaps, showing the robustness of our proposed approach. A demo video has been made available at https://flysoaryun.github.io/SF-TIM.

AAAI Conference 2025 Conference Paper

Teacher-guided Edge Discriminator for Personalized Graph Masked Autoencoder

  • Qiqi Zhang
  • Chao Li
  • Zhongying Zhao

Graph Masked AutoEncoder (GMAE) has recently attracted vast interest in handling graph-related tasks by adopting the 'masking-reconstruction' learning paradigm. Most existing GMAE-based methods adhere to the homophily assumption, i.e., that connected nodes share the same attributes or labels. However, this assumption does not always hold, because most graphs from real-world applications contain a mixture of homophilic and heterophilic edges. Therefore, it is necessary to distinguish them to improve the representational ability of GMAE. In this paper, we propose a teacher-guided edge discriminator for the personalized graph masked autoencoder (TEDMAE). Specifically, we design a teacher-guided edge discriminator that distinguishes homophilic and heterophilic edges by leveraging the embeddings from teacher models with structure and attribute knowledge. Then, we present a personalized graph masked autoencoder that individually tailors the masking, encoding, and reconstruction processes for each graph. Finally, we optimize the model by minimizing two types of loss functions, i.e., the scaled cosine error (SCE) loss and the InfoNCE loss. Experimental results on 10 datasets demonstrate the superior performance of TEDMAE on the tasks of node classification and node clustering.

TMLR Journal 2024 Journal Article

Analyzing the Impact of Learnable Softmax Temperature in Contrastive Visual-Textual Alignment Systems: Benefits, Drawbacks, and Alternative Approaches

  • Zhun Sun
  • Chao Li

This work does NOT read like “fabricate motivation - propose something - obtain sota results”. Instead, we provide an in-depth analysis of the learnable softmax temperature parameter in the practical training of contrastive visual-textual alignment models, commonly known as CLIP models. This parameter is critical for optimal system performance, yet its mechanism and potential drawbacks have been largely overlooked. Our study addresses this gap and proposes a novel solution by utilizing the architecture of Vision Transformers (ViTs). We focus on the crucial role of the softmax temperature in managing noisy training data. We demonstrate that there is a balance in the gradient of the contrastive loss, with the temperature parameter acting as a distance scaling factor. If not properly calibrated, the model struggles to align positive pairs due to numerical issues in the loss term. Conversely, a high temperature can lead to unstable learning dynamics. We explore alternative approaches to mitigate this problem from a topological perspective of the contrastive loss. Ultimately, we leverage multiple class tokens embedded within the transformer architecture to present a concise solution. This configuration significantly enhances zero-shot classification performance, improving baseline CLIP models pretrained on large-scale datasets by an average of 6.1%.
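As a concrete reference point for the analysis above, here is a minimal NumPy sketch of the symmetric contrastive (InfoNCE) objective with a temperature acting as a distance scaling factor on cosine similarities; the function and variable names are illustrative, not the paper's implementation:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature):
    """Symmetric InfoNCE loss over matched image/text pairs; the
    temperature divides cosine similarities, i.e. scales distances."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B) scaled cosine similarities

    def xent(lg):
        # cross-entropy with matched pairs on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent(logits) + xent(logits.T))  # image->text and text->image

rng = np.random.default_rng(0)
B, d = 4, 8
loss_sharp = clip_contrastive_loss(rng.normal(size=(B, d)), rng.normal(size=(B, d)), 0.05)
loss_flat = clip_contrastive_loss(rng.normal(size=(B, d)), rng.normal(size=(B, d)), 5.0)
```

With a very large temperature the scaled logits flatten toward a uniform distribution and the loss approaches log B, one concrete face of the calibration issue the paper analyzes; in CLIP-style training the (inverse) temperature is typically a learnable scalar rather than the fixed constant used in this sketch.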

JBHI Journal 2024 Journal Article

Attention-Based Temporal Graph Representation Learning for EEG-Based Emotion Recognition

  • Chao Li
  • Feng Wang
  • Ziping Zhao
  • Haishuai Wang
  • Björn W. Schuller

Due to the objectivity of emotional expression in the central nervous system, EEG-based emotion recognition can effectively reflect humans' internal emotional states. In recent years, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have made significant strides in extracting local features and temporal dependencies from EEG signals. However, CNNs ignore spatial distribution information from EEG electrodes; moreover, RNNs may encounter issues such as exploding/vanishing gradients and high time consumption. To address these limitations, we propose an attention-based temporal graph representation network (ATGRNet) for EEG-based emotion recognition. First, a hierarchical attention mechanism is introduced to integrate feature representations from both frequency bands and channels ordered by priority in EEG signals. Second, a graph convolutional neural network with a top-k operation is utilized to capture internal relationships between EEG electrodes under different emotion patterns. Next, a residual-based graph readout mechanism is applied to accumulate the node-level EEG feature representations into graph-level representations. Finally, the obtained graph-level representations are fed into a temporal convolutional network (TCN) to extract the temporal dependencies between EEG frames. We evaluated our proposed ATGRNet on the SEED, DEAP and FACED datasets. The experimental findings show that the proposed ATGRNet surpasses state-of-the-art graph-based methods for EEG-based emotion recognition.

IROS Conference 2024 Conference Paper

Decentralized Communication-Maintained Coordination for Multi-Robot Exploration: Achieving Connectivity and Adaptability

  • Wei Tang
  • Chao Li
  • Jun Wu 0003
  • Qiuguo Zhu

The realm of multi-robot autonomous exploration tasks underscores the critical role of communication in coordinating group activities. This paper introduces an innovative decentralized multi-robot exploration algorithm, meticulously crafted to ensure unbroken communication within robotic groups, a crucial element for effective coordination. The motivation for our work is two-fold: firstly, seamless communication is vital for coordinating multi-robot autonomous exploration tasks; secondly, in applications such as disaster rescue operations or military maneuvers, there are numerous scenarios where the spatial congregation of multiple robots is imperative for joint task accomplishment. Our approach addresses these challenges through a stringent communication constraint, ensuring that each robot remains in constant communicative contact with the rest of the group. This is realized by employing a decentralized policy that integrates Graph Neural Network (GNN) layers with a self-attention mechanism. This policy network design allows adaptation to different numbers of robots and varied environments. After an initial imitation learning phase, the policy is refined through learning from experiences generated via a tree-search-based lookahead technique. Our experimental analysis validates that the algorithm not only maintains consistent communication links among all group members but also improves exploration efficiency under the communication constraints. These results highlight the potential of our method in enhancing the effectiveness of robotic group explorations while ensuring robust communication connections.

NeurIPS Conference 2024 Conference Paper

Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning

  • Runhua Xu
  • Shiqi Gao
  • Chao Li
  • James Joshi
  • Jianxin Li

Federated learning (FL) is inherently susceptible to privacy breaches and poisoning attacks. To tackle these challenges, researchers have separately devised secure aggregation mechanisms to protect data privacy and robust aggregation methods that withstand poisoning attacks. However, simultaneously addressing both concerns is challenging: secure aggregation facilitates poisoning attacks, as most anomaly detection techniques require access to unencrypted local model updates, which are obscured by secure aggregation. The few recent efforts to simultaneously tackle both challenges often depend on the impractical assumption of non-colluding two-server setups that disrupt FL's topology, or on three-party computation, which introduces scalability issues, complicating deployment and application. To overcome this dilemma, this paper introduces a Dual Defense Federated learning (DDFed) framework. DDFed simultaneously boosts privacy protection and mitigates poisoning attacks without introducing new participant roles or disrupting the existing FL topology. DDFed leverages cutting-edge fully homomorphic encryption (FHE) to securely aggregate model updates, without the impractical requirement of non-colluding two-server setups, and ensures strong privacy protection. Additionally, we propose a unique two-phase anomaly detection mechanism for encrypted model updates, featuring secure similarity computation and feedback-driven collaborative selection, with additional measures to prevent potential privacy breaches from Byzantine clients incorporated into the detection process. We conducted extensive experiments on various model poisoning attacks and FL scenarios, including both cross-device and cross-silo FL. Experiments on publicly available datasets demonstrate that DDFed successfully protects model privacy and effectively defends against model poisoning threats.

ICML Conference 2024 Conference Paper

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

  • Yixing Xu
  • Chao Li
  • Dong Li 0025
  • Xiao Sheng
  • Fan Jiang
  • Lu Tian
  • Ashish Sirasao
  • Emad Barsoum

Transformer models have been gaining substantial interest in the field of computer vision. Although a vision transformer contains two important components, the self-attention module and the feedforward network (FFN) module, the majority of research concentrates on modifying the former while leaving the latter in its original form. In this paper, we focus on improving the FFN module within the vision transformer. Through theoretical analysis, we demonstrate that the effect of the FFN module primarily lies in providing non-linearity, whose degree corresponds to the hidden dimensions. Thus, the computational cost of the FFN module can be reduced by enhancing the degree of non-linearity in the nonlinear function. Leveraging this insight, we propose an improved FFN (IFFN) module for vision transformers which involves using the arbitrary GeLU (AGeLU) function and integrating multiple instances of it to augment non-linearity, so that the number of hidden dimensions can be effectively reduced. In addition, a spatial enhancement part is included to further enrich the non-linearity in the proposed IFFN module. Experimental results show that our method can be applied to a wide range of state-of-the-art vision transformer models, irrespective of how they modify their self-attention part and overall architecture, reducing FLOPs and parameters without compromising classification accuracy on the ImageNet dataset.

IROS Conference 2024 Conference Paper

Feasible Region Construction by Polygon Merging for Continuous Bipedal Walking

  • Chao Li
  • Xuechao Chen
  • Hengbo Qi
  • Qingqing Li 0004
  • Qingrui Zhao
  • Yongliang Shi
  • Zhangguo Yu
  • Lingxuan Zhao

Feasible regions for continuous walking must provide the necessary information for footstep planning, including surrounding landing areas and details about obstacles to be avoided during foot swing. However, the current frame lacks sufficient information to construct the feasible region needed at the current moment due to knee occlusion. To this end, this paper uses polygon merging to construct an information-complete feasible region, merging polygons from the current frame and a specific previous frame. Since polygons are more concise and efficient than point clouds for environmental representation, construction can be completed quickly without GPU acceleration. Experiments show that the proposed method successfully constructs informative feasible regions within the allowed time frame, enabling the robot to navigate stairs.

ICML Conference 2024 Conference Paper

KernelWarehouse: Rethinking the Design of Dynamic Convolution

  • Chao Li
  • Anbang Yao

Dynamic convolution learns a linear mixture of $n$ static kernels weighted with their input-dependent attentions, demonstrating superior performance to normal convolution. However, it increases the number of convolutional parameters by $n$ times and is thus not parameter efficient. As a result, no prior research has explored the setting $n > 100$ (an order of magnitude larger than the typical setting $n < 10$) for pushing forward the performance boundary of dynamic convolution while enjoying parameter efficiency. To fill this gap, in this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of “kernels”, “assembling kernels” and “attention function” through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. We validate the effectiveness of KernelWarehouse on the ImageNet and MS-COCO datasets using various ConvNet architectures. Intriguingly, KernelWarehouse is also applicable to Vision Transformers, and it can even reduce the model size of a backbone while improving model accuracy. For instance, KernelWarehouse ($n = 4$) achieves a 5.61%|3.90%|4.38% absolute top-1 accuracy gain on the ResNet18|MobileNetV2|DeiT-Tiny backbone, and KernelWarehouse ($n = 1/4$), with a 65.10% model size reduction, still achieves a 2.29% gain on the ResNet18 backbone. The code and models are available at https://github.com/OSVAI/KernelWarehouse.
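For reference, the vanilla dynamic-convolution formulation that KernelWarehouse generalizes (an attention-weighted linear mixture of $n$ static kernels) can be sketched as follows; the pooling and attention parameterization here are illustrative simplifications:

```python
import numpy as np

def dynamic_conv_kernel(x_pooled, kernels, W_attn):
    """Mix n static kernels with input-dependent attention, as in vanilla
    dynamic convolution (not KernelWarehouse itself).

    x_pooled: (C,) globally pooled input features
    kernels:  (n, k, k, C_in, C_out) static kernel bank
    W_attn:   (C, n) attention projection (illustrative parameterization)
    """
    logits = x_pooled @ W_attn
    a = np.exp(logits - logits.max())
    a = a / a.sum()                      # softmax attention over the n kernels
    # linear mixture: sum_i a_i * W_i, assembled once per input
    return np.tensordot(a, kernels, axes=(0, 0))

rng = np.random.default_rng(1)
n, k, C_in, C_out = 8, 3, 16, 32
kernels = rng.normal(size=(n, k, k, C_in, C_out))
mixed = dynamic_conv_kernel(rng.normal(size=C_in), kernels, rng.normal(size=(C_in, n)))
```

The sketch makes the parameter-count problem visible: the kernel bank stores $n$ full kernels, which is what KernelWarehouse avoids by redefining kernels and sharing parameter dependencies within and across layers.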

ICRA Conference 2024 Conference Paper

LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots

  • Qingrui Zhao
  • Mingyuan Li
  • Yongliang Shi
  • Xuechao Chen
  • Zhangguo Yu
  • Lianqiang Han
  • Zhenyuan Fu
  • Jintao Zhang

High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated, allowing for both position and velocity updates from kinematic measurements. Additionally, the use of kinematic measurements results in an increased output state frequency of about 1 kHz. This ensures temporal continuity of the estimated state and makes it practical for the control of biped robots. We also release a biped robot dataset consisting of LiDAR, inertial measurement unit (IMU), joint encoder, force/torque (F/T) sensor, and motion capture ground-truth data to evaluate the proposed method. The dataset is collected during robot locomotion, and our approach achieves the best quantitative results among LIO-based methods and biped robot state estimation algorithms. The dataset and source code will be available at https://github.com/Mr-Zqr/LIKO.

NeurIPS Conference 2024 Conference Paper

Long-range Meta-path Search on Large-scale Heterogeneous Graphs

  • Chao Li
  • Zijie Guo
  • Qiuting He
  • Kun He

Utilizing long-range dependency, a concept extensively studied in homogeneous graphs, remains underexplored in heterogeneous graphs, especially on large ones, posing two significant challenges: Reducing computational costs while maximizing effective information utilization in the presence of heterogeneity, and overcoming the over-smoothing issue in graph neural networks. To address this gap, we investigate the importance of different meta-paths and introduce an automatic framework for utilizing long-range dependency on heterogeneous graphs, denoted as Long-range Meta-path Search through Progressive Sampling (LMSPS). Specifically, we develop a search space with all meta-paths related to the target node type. By employing a progressive sampling algorithm, LMSPS dynamically shrinks the search space with hop-independent time complexity. Through a sampling evaluation strategy, LMSPS conducts a specialized and effective meta-path selection, leading to retraining with only effective meta-paths, thus mitigating costs and over-smoothing. Extensive experiments across diverse heterogeneous datasets validate LMSPS's capability in discovering effective long-range meta-paths, surpassing state-of-the-art methods. Our code is available at https://github.com/JHL-HUST/LMSPS.

IROS Conference 2024 Conference Paper

Novel Multiport Output Twisted String Actuator with Self-differential Mechanism: Hand Glove Application

  • Dunwen Wei
  • Chengguang Cui
  • Haitao Yu
  • Tao Gao
  • Chao Li
  • Sajjad Hussain
  • Fanny Ficuciello

The differential mechanism can reduce the number of actuators and efficiently distribute force or power. We propose a novel multiport output twisted string actuator (MO-TSA) with a self-differential mechanism that employs a single actuator to achieve multiport outputs. The differential MO-TSA is adaptively controlled in accordance with the force differences at each output port, thus replacing traditional differential gears and whiffletree mechanisms. Inspired by the hand muscles, we designed a hand glove using the MO-TSA, aiming to enhance the range of achievable grasp configurations. The hand glove is capable of performing various grasps with a single actuator, resulting in a lighter and simpler hand design and revolutionizing the field of twisted string actuators (TSAs) by offering a streamlined solution for achieving versatile actuation.

AAAI Conference 2024 Conference Paper

Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Yupeng Zhang
  • Jianqi Wang
  • Yujing Hu
  • Shaokang Dong
  • Wenbin Li
  • Tangjie Lv
  • Changjie Fan

In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of the joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during the learning process leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or sub-optimal non-cooperation, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints to rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.

NeurIPS Conference 2024 Conference Paper

Prune and Repaint: Content-Aware Image Retargeting for any Ratio

  • Feihong Shen
  • Chao Li
  • Yifeng Geng
  • Yongjian Deng
  • Hao Chen

Image retargeting is the task of adjusting the aspect ratio of images to suit different display devices or presentation environments. However, existing retargeting methods often struggle to balance the preservation of key semantics and image quality, resulting in either deformation or loss of important objects, or the introduction of local artifacts such as discontinuous pixels and inconsistent regenerated content. To address these issues, we propose a content-aware retargeting method called PruneRepaint. It incorporates semantic importance for each pixel to guide the identification of regions that need to be pruned or preserved in order to maintain key semantics. Additionally, we introduce an adaptive repainting module that selects image regions for repainting based on the distribution of pruned pixels and the proportion between foreground size and target aspect ratio, thus achieving local smoothness after pruning. By focusing on the content and structure of the foreground, our PruneRepaint approach adaptively avoids key content loss and deformation, while effectively mitigating artifacts with local repainting. We conduct experiments on the public RetargetMe benchmark and demonstrate through objective experimental results and subjective user studies that our method outperforms previous approaches in terms of preserving semantics and aesthetics, as well as better generalization across diverse aspect ratios. Codes will be available at https://github.com/fhshen2022/PruneRepaint.

NeurIPS Conference 2024 Conference Paper

QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion

  • Yixing Xu
  • Chao Li
  • Dong Li
  • Xiao Sheng
  • Fan Jiang
  • Lu Tian
  • Emad Barsoum

The vision transformer (ViT) is widely used and performs well in vision tasks due to its ability to capture long-range dependencies. However, its time complexity and memory consumption increase quadratically with the number of input patches, which limits the usage of ViT in real-world applications. Previous methods have employed linear attention to mitigate the complexity of the original self-attention mechanism at the expense of effectiveness. In this paper, we propose QT-ViT models that improve the previous linear self-attention using quadratic Taylor expansion. Specifically, we substitute the softmax-based attention with a second-order Taylor expansion, and then accelerate the quadratic expansion by reducing the time complexity with a fast approximation algorithm. The proposed method capitalizes on the property of quadratic expansion to achieve superior performance while employing linear approximation for fast inference. Compared to previous studies of linear attention, our approach does not necessitate knowledge distillation or high-order attention residuals to facilitate the training process. Extensive experiments demonstrate the efficiency and effectiveness of the proposed QT-ViTs, showcasing the state-of-the-art results. Particularly, the proposed QT-ViTs consistently surpass the previous SOTA EfficientViTs under different model sizes, and achieve a new Pareto-front in terms of accuracy and speed.
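The core substitution can be illustrated directly: replace exp(q·k) in softmax attention with its second-order Taylor expansion 1 + q·k + (q·k)²/2. The sketch below shows the quadratic expansion only, not the paper's fast linear-time approximation of it; names are illustrative:

```python
import numpy as np

def taylor_attention(Q, K, V):
    """Attention with exp(q.k) replaced by its second-order Taylor
    expansion 1 + q.k + (q.k)^2 / 2, row-normalized like softmax."""
    S = Q @ K.T                          # (N, N) similarity scores
    W = 1.0 + S + 0.5 * S ** 2           # elementwise quadratic expansion
    W = W / W.sum(axis=1, keepdims=True) # normalize each query's weights
    return W @ V

rng = np.random.default_rng(2)
N, d = 6, 4
Q, K, V = (rng.normal(size=(N, d)) * 0.1 for _ in range(3))
out = taylor_attention(Q, K, V)
```

Because 1 + x + x²/2 is bounded below by 1/2, the resulting weights stay positive and the row normalization remains well defined; the speedup in the paper comes from approximating this quadratic form so it never materializes the full (N, N) matrix.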

NeurIPS Conference 2024 Conference Paper

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers

  • Jiawei Fan
  • Chao Li
  • Xiaolong Liu
  • Anbang Yao

In this paper, we question whether well pre-trained vision transformer (ViT) models could be used as teachers that exhibit scalable properties to advance cross-architecture knowledge distillation research, in the context of adopting mainstream large-scale visual recognition datasets for evaluation. To make this possible, our analysis underlines the importance of seeking effective strategies to align (1) feature computing paradigm differences, (2) model scale differences, and (3) knowledge density differences. By combining three closely coupled components, namely *cross attention projector*, *dual-view feature mimicking* and *teacher parameter perception*, tailored to address the alignment problems stated above, we present a simple and effective knowledge distillation method, called *ScaleKD*. Our method can train student backbones that span across a variety of convolutional neural network (CNN), multi-layer perceptron (MLP), and ViT architectures on image classification datasets, achieving state-of-the-art knowledge distillation performance. For instance, taking a well pre-trained Swin-L as the teacher model, our method gets 75.15\%|82.03\%|84.16\%|78.63\%|81.96\%|83.93\%|83.80\%|85.53\% top-1 accuracies for MobileNet-V1|ResNet-50|ConvNeXt-T|Mixer-S/16|Mixer-B/16|ViT-S/16|Swin-T|ViT-B/16 models trained on the ImageNet-1K dataset from scratch, showing 3.05\%|3.39\%|2.02\%|4.61\%|5.52\%|4.03\%|2.62\%|3.73\% absolute gains over the individually trained counterparts. Intriguingly, when scaling up the size of teacher models or their pre-training datasets, our method showcases the desired scalable properties, bringing increasingly larger gains to student models. We also empirically show that the student backbones trained by our method transfer well to the downstream MS-COCO and ADE20K datasets. More importantly, our method can serve as a more efficient alternative to the time-intensive pre-training paradigm for any target student model on large-scale datasets if a strong pre-trained ViT is available, reducing the number of viewed training samples by up to 195$\times$. The code is available at *https://github.com/deep-optimization/ScaleKD*.

IJCAI Conference 2024 Conference Paper

STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations

  • Chao Li
  • Yujing Hu
  • Shangdong Yang
  • Tangjie Lv
  • Changjie Fan
  • Wenbin Li
  • Chongjie Zhang
  • Yang Gao

This paper focuses on the problem of learning compressed state representations for multi-agent tasks. Under the assumption of rich observations, we pinpoint that the state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, whereas existing works typically fail to do so. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR) that explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize this problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify spatial representation compression as encoding the latent states from the joint observations of all agents, and achieve this by learning representations that approximate the latent states based on an information-theoretic principle. After that, we further extract the task-relevant features of each agent from these representations by aligning them based on their reward similarities, which we regard as temporal representation compression. Structurally, we implement these two compressions by learning a set of agent-specific decoding functions and incorporate them into a critic shared by the agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and its superior performance demonstrates its effectiveness.

NeurIPS Conference 2023 Conference Paper

Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation

  • Jiawei Fan
  • Chao Li
  • Xiaolong Liu
  • Meina Song
  • Anbang Yao

In recent years, knowledge distillation methods based on contrastive learning have achieved promising results on image classification and object detection tasks. However, in this line of research, we note that less attention is paid to semantic segmentation. Existing methods heavily rely on data augmentation and memory buffers, which entail high computational resource demands when applied to semantic segmentation, a task that requires preserving high-resolution feature maps for dense pixel-wise predictions. To address this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm to train compact and accurate deep neural networks for semantic segmentation applications. Af-DCD leverages a masked feature mimicking strategy and formulates a novel contrastive learning loss by taking advantage of tactful feature partitions across both channel and spatial dimensions, allowing dense and structured local knowledge learnt by the teacher model to be effectively transferred to a target student model while maintaining training efficiency. Extensive experiments on five mainstream benchmarks with various teacher-student network pairs demonstrate the effectiveness of our approach. For instance, the DeepLabV3-Res18|DeepLabV3-MBV2 model trained by Af-DCD reaches 77.03\%|76.38\% mIOU on the Cityscapes dataset when choosing DeepLabV3-Res101 as the teacher, setting new performance records. Besides that, Af-DCD achieves an absolute mIOU improvement of 3.26\%|3.04\%|2.75\%|2.30\%|1.42\% compared with the individually trained counterpart on Cityscapes|Pascal VOC|Camvid|ADE20K|COCO-Stuff-164K. Code is available at https://github.com/OSVAI/Af-DCD.

AAMAS Conference 2023 Conference Paper

Centralized Cooperative Exploration Policy for Continuous Control Tasks

  • Chao Li
  • Chen Gong
  • Qiang He
  • Xinwen Hou
  • Yu Liu

Despite recent works making great progress in continuous control tasks, exploration in these tasks remains insufficiently investigated. This paper proposes CCEP (Centralized Cooperative Exploration Policy), which utilizes the estimation biases of value functions to contribute to exploration capacity. CCEP keeps two value functions initialized with different parameters and generates diverse policies with multiple exploration styles from this pair of value functions. In addition, a centralized policy framework ensures that CCEP achieves message delivery between multiple policies, further contributing to exploring the environment cooperatively. Extensive experimental results demonstrate that CCEP achieves higher exploration capacity. Empirical analysis shows diverse exploration styles in the policies learned by CCEP, reaping benefits in more exploration regions. Besides, the exploration capabilities of CCEP have been demonstrated to outperform current state-of-the-art methods on multiple continuous control tasks.

AAAI Conference 2023 Conference Paper

Differentiable Meta Multigraph Search with Partial Message Propagation on Heterogeneous Information Networks

  • Chao Li
  • Hao Xu
  • Kun He

Heterogeneous information networks (HINs) are widely employed for describing real-world data with intricate entities and relationships. To automatically utilize their semantic information, graph neural architecture search has recently been developed for various tasks on HINs. Existing works, however, suffer from instability and inflexibility. To address these issues, we propose a novel method called Partial Message Meta Multigraph search (PMMM) to automatically optimize the neural architecture design on HINs. Specifically, to learn how graph neural networks (GNNs) propagate messages along various types of edges, PMMM adopts an efficient differentiable framework to search for a meaningful meta multigraph, which can capture more flexible and complex semantic relations than a meta graph. Differentiable search typically suffers from performance instability, so we further propose a stable algorithm called partial message search to ensure that the searched meta multigraph consistently surpasses manually designed meta-structures, i.e., meta-paths. Extensive experiments on six benchmark datasets over two representative tasks, including node classification and recommendation, demonstrate the effectiveness of the proposed method. Our approach outperforms the state-of-the-art heterogeneous GNNs, discovers meaningful meta multigraphs, and is significantly more stable. Our code is available at https://github.com/JHL-HUST/PMMM.

IROS Conference 2023 Conference Paper

DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching

  • Ranran Huang 0001
  • Jiancheng Cai
  • Zhuoyuan Wu
  • Xinmin Liu
  • Zhenhua Chai
  • Chao Li

The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF), which imposes rotations on the convolution kernels to improve the inherent rotation invariance of CNNs. Since RKF can be processed by subsequent re-parameterization, no extra computational cost is introduced at the inference stage. Moreover, we present Multi-oriented Feature Aggregation (MOFA), which aggregates features extracted from multiple rotated versions of the input image and can provide auxiliary knowledge for the training of RKF by leveraging a distillation strategy. We refer to the distilled RKF model as DRKF. Besides the evaluation on a rotation-augmented version of the public dataset HPatches, we also contribute a new dataset named DiverseBEV, which is collected during drone flight and consists of bird's eye view images with large viewpoint changes and camera rotations. Extensive experiments show that our method can outperform other state-of-the-art techniques when exposed to large rotation variations.
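The re-parameterization claim in this abstract rests on the linearity of convolution: rotated copies of a kernel can be pre-summed into a single fused kernel, so inference pays no extra cost. A minimal pure-Python sketch of that idea (helper names and toy numbers are hypothetical illustrations, not the paper's code):

```python
# Sketch of the RKF re-parameterization idea: because (cross-)correlation is
# linear in the kernel, applying four 90-degree-rotated copies of a kernel and
# summing the response maps equals one pass with a single pre-summed kernel.

def rot90(k):
    """Rotate a square kernel 90 degrees counter-clockwise."""
    n = len(k)
    return [[k[c][n - 1 - r] for c in range(n)] for r in range(n)]

def correlate2d(img, ker):
    """'Valid' 2-D cross-correlation over nested lists."""
    H, W, n = len(img), len(img[0]), len(ker)
    return [[sum(img[r + i][c + j] * ker[i][j]
                 for i in range(n) for j in range(n))
             for c in range(W - n + 1)]
            for r in range(H - n + 1)]

def add_kernels(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 8, 7, 6],
       [5, 4, 3, 2]]
ker = [[1, 0, -1],
       [2, 0, -2],
       [1, 0, -1]]

# Training-time view: run the kernel at four orientations, sum the maps.
rots, k = [], ker
for _ in range(4):
    rots.append(k)
    k = rot90(k)
slow = correlate2d(img, rots[0])
for r in rots[1:]:
    slow = [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(slow, correlate2d(img, r))]

# Inference-time view: fold the rotations into ONE kernel first.
fused = rots[0]
for r in rots[1:]:
    fused = add_kernels(fused, r)
fast = correlate2d(img, fused)

assert slow == fast  # the re-parameterization introduces no approximation
```

Because correlation is linear in the kernel, `slow` and `fast` agree exactly; the same folding argument is what lets rotated kernels be absorbed before deployment.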

NeurIPS Conference 2023 Conference Paper

Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

  • Chao Li
  • Chen Gong
  • Qiang He
  • Xinwen Hou

The combination of deep reinforcement learning (DRL) with ensemble methods has proven highly effective in addressing complex sequential decision-making problems. This success can be primarily attributed to the utilization of multiple models, which enhances both the robustness of the policy and the accuracy of value function estimation. However, there has thus far been limited analysis of the empirical success of current ensemble RL methods. Our new analysis reveals that the sample efficiency of previous ensemble DRL algorithms may be limited by sub-policies that are not as diverse as they could be. Motivated by these findings, our study introduces a new ensemble RL algorithm, termed Trajectories-awarE Ensemble exploratioN (TEEN). The primary goal of TEEN is to maximize the expected return while promoting more diverse trajectories. Through extensive experiments, we demonstrate that TEEN not only enhances the sample diversity of the ensemble policy compared to using sub-policies alone but also improves the performance over ensemble RL algorithms. On average, TEEN outperforms the baseline ensemble DRL algorithms by 41% in performance on the tested representative environments.

ICLR Conference 2023 Conference Paper

NORM: Knowledge Distillation via N-to-One Representation Matching

  • Xiaolong Liu
  • Lujun Li 0001
  • Chao Li
  • Anbang Yao

Existing feature distillation methods commonly adopt the One-to-one Representation Matching between any pre-selected teacher-student layer pair. In this paper, we present N-to-One Representation Matching (NORM), a new two-stage knowledge distillation method, which relies on a simple Feature Transform (FT) module consisting of two linear layers. In view of preserving the intact information learnt by the teacher network, during training, our FT module is merely inserted after the last convolutional layer of the student network. The first linear layer projects the student representation to a feature space with N times as many feature channels as the teacher representation from the last convolutional layer, and the second linear layer contracts the expanded output back to the original feature space. By sequentially splitting the expanded student representation into N non-overlapping feature segments having the same number of feature channels as the teacher's, they can be readily forced to approximate the intact teacher representation simultaneously, formulating a novel many-to-one representation matching mechanism conditioned on a single teacher-student layer pair. After training, such an FT module will be naturally merged into the subsequent fully connected layer thanks to its linear property, introducing no extra parameters or architectural modifications to the student network at inference. Extensive experiments on different visual recognition benchmarks demonstrate the leading performance of our method. For instance, the ResNet18|MobileNet|ResNet50-1/4 model trained by NORM reaches 72.14%|74.26%|68.03% top-1 accuracy on the ImageNet dataset when using a pre-trained ResNet34|ResNet50|ResNet50 model as the teacher, achieving an absolute improvement of 2.01%|4.63%|3.03% against the individually trained counterpart. Code is available at https://github.com/OSVAI/NORM.
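The merge-at-inference property described in this abstract follows from the FT module being purely linear: two stacked linear layers compose, by matrix multiplication, into the fully connected layer that follows them. A minimal sketch under that assumption (matrix sizes and values are illustrative, not from the paper):

```python
# Sketch of folding a two-linear-layer Feature Transform (FT) module into the
# subsequent fully connected layer: composing linear maps is just matrix
# multiplication, so inference pays no extra parameters or FLOPs.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

W1 = [[1, 2], [0, 1], [3, -1], [2, 2]]    # FT expansion:   2 -> 4 channels
W2 = [[1, 0, -1, 2], [0, 1, 1, -1]]       # FT contraction: 4 -> 2 channels
C  = [[2, 1], [-1, 3], [0, 1]]            # final FC layer: 2 -> 3 logits

s = [[5], [7]]                            # a student feature (column vector)

# Training-time path: student feature -> FT module -> classifier.
logits_train = matmul(C, matmul(W2, matmul(W1, s)))

# Inference-time path: pre-fold the FT into the classifier weights once.
merged = matmul(matmul(C, W2), W1)        # a single 3x2 matrix replaces C
logits_infer = matmul(merged, s)

assert logits_train == logits_infer       # the merge is exact by linearity
```

With integer weights the two paths agree exactly; in floating point they agree up to rounding, which is why the FT module can vanish at deployment.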

NeurIPS Conference 2023 Conference Paper

RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

  • Siqi Shen
  • Chennan Ma
  • Chao Li
  • Weiquan Liu
  • Yongquan Fu
  • Songzhu Mei
  • Xinwang Liu
  • Cheng Wang

Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of risk-sensitive action selections of each agent should be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as the Value at Risk (VaR) metric or distorted risk measurements. Therefore, we propose RiskQ to address this limitation, which models the joint return distribution by modeling its quantiles as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distorted risk metrics. We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
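As background for the risk metric named in this abstract: Value at Risk (VaR) at level alpha is simply the alpha-quantile of the return distribution. A minimal empirical sketch (the non-interpolating estimator below is a deliberately simple illustration, not the paper's implementation):

```python
# Sketch of empirical Value at Risk (VaR): sort sampled returns and take a
# lower quantile. Lower VaR means worse outcomes in the tail.

def var(samples, alpha):
    """alpha-quantile of sampled returns (simple non-interpolating estimator)."""
    s = sorted(samples)
    idx = max(0, int(alpha * len(s)) - 1)
    return s[idx]

returns = [10, -5, 3, 8, -2, 7, 1, -9, 4, 6]
assert var(returns, 0.1) == -9   # the worst 10% of sampled outcomes
assert var(returns, 0.5) == 3    # a median-level return
```

A risk-sensitive agent selecting actions by VaR prefers actions whose worst-case quantile is higher, rather than those with the best mean return.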

IJCAI Conference 2023 Conference Paper

Spatially Covariant Lesion Segmentation

  • Hang Zhang
  • Rongguang Wang
  • Jinwei Zhang
  • Dongdong Liu
  • Chao Li
  • Jiahao Li

Compared to natural images, medical images usually exhibit stronger visual patterns, which makes it possible to inject proper priors into neural networks and thereby add flexibility and elasticity for resource-limited clinical applications. In this paper, we propose the spatially covariant pixel-aligned classifier (SCP) to improve computational efficiency while maintaining or increasing accuracy for lesion segmentation. SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights; its parameters are obtained along with the backbone network training and later used to generate network weights that capture spatially covariant contextual information. We demonstrate the effectiveness and efficiency of the proposed SCP on two lesion segmentation tasks from different imaging modalities: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography. The network using SCP achieves 23.8%, 64.9% and 74.7% reductions in GPU memory usage, FLOPs, and network size, respectively, with similar or better accuracy for lesion segmentation.

NeurIPS Conference 2023 Conference Paper

Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks

  • Andong Wang
  • Chao Li
  • Mingyuan Bai
  • Zhong Jin
  • Guoxu Zhou
  • Qibin Zhao

Multi-channel learning has gained significant attention in recent applications, where neural networks with t-product layers (t-NNs) have shown promising performance through novel feature mapping in the transformed domain. However, despite the practical success of t-NNs, the theoretical analysis of their generalization remains unexplored. We address this gap by deriving upper bounds on the generalization error of t-NNs in both standard and adversarial settings. Notably, it reveals that t-NNs compressed with exact transformed low-rank parameterization can achieve tighter adversarial generalization bounds compared to non-compressed models. While exact transformed low-rank weights are rare in practice, the analysis demonstrates that through adversarial training with gradient flow, highly over-parameterized t-NNs with the ReLU activation can be implicitly regularized towards a transformed low-rank parameterization under certain conditions. Moreover, this paper establishes sharp adversarial generalization bounds for t-NNs with approximately transformed low-rank weights. Our analysis highlights the potential of transformed low-rank parameterization in enhancing the robust generalization of t-NNs, offering valuable insights for further research and development.

JBHI Journal 2022 Journal Article

Classification of Wideband Tympanometry by Deep Transfer Learning With Data Augmentation for Automatic Diagnosis of Otosclerosis

  • Leixin Nie
  • Chao Li
  • Franck Marzani
  • Haibin Wang
  • Francois Thibouw
  • Alexis Bozorg Grayeli

Otosclerosis is a common disease of the middle ear leading to stapedial fixation. Its rapid and non-invasive diagnosis could be achieved through wideband tympanometry (WBT), but the interpretation of the raw data provided by this tool is complex and time-consuming. Convolutional neural networks (CNN) could potentially be applied to this situation to help clinicians categorize WBT data. A dataset containing 135 samples from 80 patients with otosclerosis and 55 controls was obtained. We designed a lightweight CNN to categorize samples into otosclerosis and control groups. Receiver operating characteristic (ROC) analysis showed an area under the curve (AUC) of 0.95 ± 0.011, and the F1-score was 0.89 ± 0.031 (r = 10). The performance was further improved by data augmentation schemes and transfer learning strategies (AUC: 0.97 ± 0.010, F1-score: 0.94 ± 0.016, p < 0.05, ANOVA). Finally, the most relevant diagnostic features employed by the CNN were assessed via activation pattern heatmaps. These results are crucial for the visual interpretation of the WBT graphic outputs that clinicians use routinely, and for a better understanding of the WBT signal in relation to ossicular mechanics.

AAAI Conference 2022 Conference Paper

Deep Incomplete Multi-View Clustering via Mining Cluster Complementarity

  • Jie Xu
  • Chao Li
  • Yazhou Ren
  • Liang Peng
  • Yujie Mo
  • Xiaoshuang Shi
  • Xiaofeng Zhu

Incomplete multi-view clustering (IMVC) is an important unsupervised approach to group multi-view data containing missing data in some views. Previous IMVC methods suffer from the following issues: (1) inaccurate imputation or padding for missing data negatively affects the clustering performance, and (2) the quality of features after fusion might be degraded by low-quality views, especially inaccurately imputed views. To avoid these issues, this work presents an imputation-free and fusion-free deep IMVC framework. First, the proposed method builds a deep embedding feature learning and clustering model for each view individually. Our method then nonlinearly maps the embedding features of complete data into a high-dimensional space to discover linear separability. Concretely, this paper provides an implementation of the high-dimensional mapping as well as shows the mechanism to mine the multi-view cluster complementarity. This complementary information is then transformed into supervised information with high confidence, aiming to achieve multi-view clustering consistency for both complete and incomplete data. Furthermore, we design an EM-like optimization strategy to alternately promote feature learning and clustering. Extensive experiments on real-world multi-view datasets demonstrate that our method achieves superior clustering performance over state-of-the-art methods.

IROS Conference 2022 Conference Paper

Exploring mmWave Radar and Camera Fusion for High-Resolution and Long-Range Depth Imaging

  • Akarsh Prabhakara
  • Diana Zhang
  • Chao Li
  • Sirajum Munir
  • Aswin C. Sankaranarayanan
  • Anthony Rowe 0001
  • Swarun Kumar

Robotic geo-fencing and surveillance systems require accurate monitoring of objects if/when they violate perimeter restrictions. In this paper, we seek a solution for depth imaging of such objects of interest at high accuracy (few tens of cm) over extended ranges (up to 300 meters) from a single vantage point, such as a pole mounted platform. Unfortunately, the rich literature in depth imaging using camera, lidar and radar in isolation struggles to meet these tight requirements in real-world conditions. This paper proposes Metamoran, a solution that explores long-range depth imaging of objects of interest by fusing the strengths of two complementary technologies: mmWave radar and camera. Unlike cameras, mmWave radars offer excellent cm-scale depth resolution even at very long ranges. However, their angular resolution is at least 10x worse than camera systems. Fusing these two modalities is natural, but in scenes with high clutter and at long ranges, radar reflections are weak and experience spurious artifacts. Metamoran's core contribution is to leverage image segmentation and monocular depth estimation on camera images to help declutter radar and discover true object reflections. We perform a detailed evaluation of Metamoran's depth imaging capabilities in 400 diverse scenarios. Our evaluation shows that Metamoran estimates the depth of static objects up to 90 m away and moving objects up to 305 m away, with a median error of 28 cm, an improvement of 13x over a naive radar+camera baseline and 23x compared to monocular depth estimation.

AAAI Conference 2022 Conference Paper

Interpretable Generative Adversarial Networks

  • Chao Li
  • Kelu Yao
  • Jin Wang
  • Boyu Diao
  • Yongjun Xu
  • Quanshi Zhang

Learning a disentangled representation is still a challenge in the field of the interpretability of generative adversarial networks (GANs). This paper proposes a generic method to modify a traditional GAN into an interpretable GAN, which ensures that filters in an intermediate layer of the generator encode disentangled localized visual concepts. Each filter in the layer is supposed to consistently generate image regions corresponding to the same visual concept when generating different images. The interpretable GAN learns to automatically discover meaningful visual concepts without any annotations of visual concepts. The interpretable GAN enables people to modify a specific visual concept on generated images by manipulating feature maps of the corresponding filters in the layer. Our method can be broadly applied to different types of GANs. Experiments have demonstrated the effectiveness of our method.

ICLR Conference 2022 Conference Paper

Omni-Dimensional Dynamic Convolution

  • Chao Li
  • Aojun Zhou
  • Anbang Yao

Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs). Instead, recent research in dynamic convolution shows that learning a linear combination of n convolutional kernels weighted with their input-dependent attentions can significantly improve the accuracy of light-weight CNNs, while maintaining efficient inference. However, we observe that existing works endow convolutional kernels with the dynamic property through one dimension (regarding the convolutional kernel number) of the kernel space, but the other three dimensions (regarding the spatial size, the input channel number and the output channel number for each convolutional kernel) are overlooked. Inspired by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design, to advance this line of research. ODConv leverages a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for convolutional kernels along all four dimensions of the kernel space at any convolutional layer. As a drop-in replacement of regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones including both light-weight and large ones, e.g., 3.77%~5.71%|1.86%~3.72% absolute top-1 improvements to the MobileNetV2|ResNet family on the ImageNet dataset. Intriguingly, thanks to its improved feature learning ability, ODConv with even one single kernel can compete with or outperform existing dynamic convolution counterparts with multiple kernels, substantially reducing extra parameters. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights. Code and models will be available at https://github.com/OSVAI/ODConv.
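The four-dimensional attention this abstract describes can be pictured concretely: a bank of n candidate kernels is modulated along the kernel-number, output-channel, input-channel, and spatial dimensions, then collapsed into one kernel, so the convolution itself runs at static cost. A minimal pure-Python sketch with made-up attention values (all names and numbers are illustrative, not the paper's implementation):

```python
# Sketch of ODConv-style kernel aggregation: four complementary attentions
# weight a bank of n candidate kernels, which is then summed into a SINGLE
# kernel of shape (c_out, c_in, k, k) before the convolution is applied.

n, c_out, c_in, k = 2, 3, 2, 3

# Hypothetical kernel bank bank[m][o][i][u][v] and pre-normalized attentions.
bank = [[[[[(m + o + i + u + v) % 5 - 2 for v in range(k)]
           for u in range(k)] for i in range(c_in)]
         for o in range(c_out)] for m in range(n)]
a_num     = [0.25, 0.75]                 # per candidate kernel (sums to 1)
a_filter  = [1.0, 0.5, 0.5]             # per output channel
a_channel = [0.8, 0.2]                   # per input channel
a_spatial = [[1.0, 0.5, 1.0],
             [0.5, 1.0, 0.5],
             [1.0, 0.5, 1.0]]            # per kernel position

# Elementwise modulation along all four dimensions, summed over the bank.
agg = [[[[sum(a_num[m] * a_filter[o] * a_channel[i] * a_spatial[u][v]
              * bank[m][o][i][u][v] for m in range(n))
          for v in range(k)] for u in range(k)]
        for i in range(c_in)] for o in range(c_out)]

# The result has the shape of one static kernel: (c_out, c_in, k, k).
assert (len(agg), len(agg[0]), len(agg[0][0]), len(agg[0][0][0])) \
       == (c_out, c_in, k, k)
```

In the real design the four attention vectors are predicted from the input by a small attention branch; here they are fixed constants purely to show the aggregation arithmetic.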

AAAI Conference 2020 Conference Paper

Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition Under Reshuffling

  • Chao Li
  • Mohammad Emtiyaz Khan
  • Zhun Sun
  • Gang Niu
  • Bo Han
  • Shengli Xie
  • Qibin Zhao

Exact recovery of tensor decomposition (TD) methods is a desirable property in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical applications on real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property is not properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a practically widely-used CTD model, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that such property can be also achieved by a more general model than LCTD. In the new model, we generalize the classic tensor (un-)folding into the reshuffling operation, a more flexible mapping to relocate the entries of the matrix into a tensor. Armed with the reshuffling operations and exact-recovery property, we explore a totally novel application for (generalized) LCTD, i.e., image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms the state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Gated Convolutional Networks with Hybrid Connectivity for Image Classification

  • Chuanguang Yang
  • Zhulin An
  • Hui Zhu
  • Xiaolong Hu
  • Kun Zhang
  • Kaiqiang Xu
  • Chao Li
  • Yongjun Xu

We propose a simple yet effective method to reduce the redundancy of DenseNet by substantially decreasing the number of stacked modules, replacing the original bottleneck with our SMG module, which is augmented by a local residual. Furthermore, the SMG module is equipped with an efficient two-stage pipeline tailored to DenseNet-like architectures that need to integrate all previous outputs: it gradually squeezes the incoming informative but redundant features by hierarchical convolutions, as an hourglass shape, and then excites them by multi-kernel depthwise convolutions, so that the output is compact and holds more informative multi-scale features. We further develop a forget gate and an update gate by introducing popular attention modules to implement effective fusion instead of a simple addition between reused and new features. Due to the hybrid connectivity (a nested combination of global dense and local residual connections) and gated mechanisms, we call our network HCGNet. Experimental results on the CIFAR and ImageNet datasets show that HCGNet is markedly more efficient than DenseNet, and can also significantly outperform state-of-the-art networks with less complexity. Moreover, HCGNet also shows remarkable interpretability and robustness via network dissection and adversarial defense, respectively. On MS-COCO, HCGNet consistently learns better features than popular backbones.

AAAI Conference 2020 Conference Paper

Robust Tensor Decomposition via Orientation Invariant Tubal Nuclear Norms

  • Andong Wang
  • Chao Li
  • Zhong Jin
  • Qibin Zhao

Low-rank tensor recovery has been widely applied to computer vision and machine learning. Recently, tubal nuclear norm (TNN) based optimization was proposed with superior performance compared to other tensor nuclear norms. However, one major limitation is its orientation sensitivity: low-rankness is strictly defined along the tubal orientation, so it cannot simultaneously model spectral low-rankness in multiple orientations. To this end, we introduce two new tensor norms, called OITNN-O and OITNN-L, to exploit multi-orientational spectral low-rankness for arbitrary K-way (K ≥ 3) tensors. We further formulate two robust tensor decomposition models via the proposed norms and develop two algorithms as the solutions. Theoretically, we establish non-asymptotic error bounds which can predict the scaling behavior of the estimation error. Experiments on real-world datasets demonstrate the superiority and effectiveness of the proposed norms.

IJCAI Conference 2020 Conference Paper

Visual Encoding and Decoding of the Human Brain Based on Shared Features

  • Chao Li
  • Baolin Liu
  • Jianguo Wei

Using a convolutional neural network to build visual encoding and decoding models of the human brain is a good starting point for studying the relationship between deep learning and human visual cognitive mechanisms. However, related studies have not fully considered their differences. In this paper, we assume that only a portion of neural network features is directly related to human brain signals, which we call shared features. In the encoding process, we extract shared features from the lower and higher layers of the neural network, and then build a non-negative sparse map to predict brain activities. In the decoding process, we use back-propagation to reconstruct visual stimuli, and use dictionary learning and a deep image prior to improve the robustness and accuracy of the algorithm. Experiments on a public fMRI dataset confirm the rationality of the encoding models, and compared with a recently proposed method, our reconstruction results achieve significantly higher accuracy.

AAAI Conference 2019 Conference Paper

Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval

  • Chao Li
  • Cheng Deng
  • Lei Wang
  • De Xie
  • Xianglong Liu

In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, continuously compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or lack the ability to learn accurate correlations between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn powerful common representations, and an inner-cycle network is exploited to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.

NeurIPS Conference 2019 Conference Paper

Cross-Modal Learning with Adversarial Samples

  • Chao Li
  • Shangqian Gao
  • Cheng Deng
  • De Xie
  • Wei Liu

With the rapid developments of deep neural networks, numerous deep cross-modal analysis methods have been presented and are being applied in widespread real-world applications, including healthcare and safety-critical environments. However, recent studies on the robustness and stability of deep neural networks show that a microscopic modification, known as an adversarial sample, which is even imperceptible to humans, can easily fool a well-performing deep neural network, posing a new obstacle to exploring deep cross-modal correlations. In this paper, we propose a novel method for Cross-Modal correlation Learning with Adversarial samples, namely CMLA, which for the first time demonstrates the existence of adversarial samples in cross-modal data. Moreover, we provide a simple yet effective adversarial sample learning method, where inter- and intra-modality similarity regularizations across different modalities are simultaneously integrated into the learning of adversarial samples. Finally, our proposed CMLA is demonstrated to be highly effective in cross-modal hashing based retrieval. Extensive experiments on two cross-modal benchmark datasets show that the adversarial examples produced by our CMLA are effective in fooling a target deep cross-modal hashing network. On the other hand, such adversarial examples can significantly strengthen the robustness of the target network when used for adversarial training.

IJCAI Conference 2019 Conference Paper

Graph Convolutional Network Hashing for Cross-Modal Retrieval

  • Ruiqing Xu
  • Chao Li
  • Junchi Yan
  • Cheng Deng
  • Xianglong Liu

Deep network based cross-modal retrieval has recently made significant progress. However, bridging the modality gap to further enhance retrieval accuracy still remains a crucial bottleneck. In this paper, we propose a Graph Convolutional Hashing (GCH) approach, which learns modality-unified binary codes via an affinity graph. An end-to-end deep architecture is constructed with three main components: a semantic encoder module, two feature encoding networks, and a graph convolutional network (GCN). We design a semantic encoder as a teacher module to guide the feature encoding process, a.k.a. the student module, for semantic information exploiting. Furthermore, the GCN is utilized to explore the inherent similarity structure among data points, which helps to generate discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods.

AAAI Conference 2019 Conference Paper

Semantic Adversarial Network with Multi-Scale Pyramid Attention for Video Classification

  • De Xie
  • Cheng Deng
  • Hao Wang
  • Chao Li
  • Dapeng Tao

Two-stream architectures have shown strong performance on video classification tasks. The key idea is to learn spatiotemporal features by fusing convolutional networks spatially and temporally. However, there are some problems with such architectures. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream based deep framework for video classification that discovers spatial and temporal information from RGB frames only; moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA enables the network to capture global and local features to generate a comprehensive representation for a video, and the SAL makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our proposed method achieves state-of-the-art results on standard video datasets.

AAAI Conference 2019 Conference Paper

Tensor Ring Decomposition with Rank Minimization on Latent Space: An Efficient Approach for Tensor Completion

  • Longhao Yuan
  • Chao Li
  • Danilo Mandic
  • Jianting Cao
  • Qibin Zhao

In tensor completion tasks, the traditional low-rank tensor decomposition models suffer from the laborious model selection problem due to their high model sensitivity. In particular, for tensor ring (TR) decomposition, the number of model possibilities grows exponentially with the tensor order, which makes it rather challenging to find the optimal TR decomposition. In this paper, by exploiting the low-rank structure of the TR latent space, we propose a novel tensor completion method which is robust to model selection. In contrast to imposing the low-rank constraint on the data space, we introduce nuclear norm regularization on the latent TR factors, resulting in the optimization step using singular value decomposition (SVD) being performed at a much smaller scale. By leveraging the alternating direction method of multipliers (ADMM) scheme, the latent TR factors with optimal rank and the recovered tensor can be obtained simultaneously. Our proposed algorithm is shown to effectively alleviate the burden of TR-rank selection, thereby greatly reducing the computational cost. The extensive experimental results on both synthetic and real-world data demonstrate the superior performance and efficiency of the proposed approach against the state-of-the-art algorithms.
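As background for the abstract above: a tensor ring represents each entry of a d-way tensor as the trace of a product of per-mode core slices, and the TR-rank the paper discusses is the sizes of those slices. A minimal pure-Python sketch (toy ranks and values, not the paper's code), using the rank-(1,1,1) case, where TR reduces to an outer product, as a sanity check:

```python
# Sketch of tensor ring (TR) reconstruction: entry X[i,j,k] is the trace of
# G1[i] @ G2[j] @ G3[k], where Gm[i_m] is an r_m x r_{m+1} matrix and the
# ranks wrap around (r_4 = r_1), closing the "ring".

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def tr_entry(cores, idx):
    """cores[m][i_m] is the i_m-th slice of the mode-m core."""
    M = cores[0][idx[0]]
    for G, i in zip(cores[1:], idx[1:]):
        M = matmul(M, G[i])
    return trace(M)

# Rank-(1,1,1) cores make TR a rank-1 (outer-product) tensor, i.e.
# X[i,j,k] = u[i] * v[j] * w[k] -- an easy property to verify.
u, v, w = [2, 3], [1, 4, 5], [7, 2]
cores = ([[[x]] for x in u],    # each slice is a 1x1 matrix
         [[[x]] for x in v],
         [[[x]] for x in w])

X = [[[tr_entry(cores, (i, j, k)) for k in range(2)]
      for j in range(3)] for i in range(2)]

assert X[1][2][0] == u[1] * v[2] * w[0]   # 3 * 5 * 7 = 105
```

Larger slice sizes (higher TR-ranks) give the model more capacity; the paper's nuclear norm regularization acts on these latent cores rather than on the full tensor, which is why its SVD steps stay small.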

IJCAI Conference 2018 Conference Paper

Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

  • Chao Li
  • Qiaoyong Zhong
  • Di Xie
  • Shiliang Pu

Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeletons' temporal evolution. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. Firstly, point-level information of each joint is encoded independently. Then it is assembled into semantic representations in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. Besides, raw skeleton coordinates as well as their temporal differences are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction and PKU-MMD.

IJCAI Conference 2018 Conference Paper

Data-driven Onboard Scheduling for an Autonomous Observation Satellite

  • Chao Li
  • Yingwu Chen
  • Patrick De Causmaecker

Observation requests for autonomous observation satellites are dynamically generated. Considering the limited computing resources, a data-driven onboard scheduling method combining AI techniques and polynomial-time heuristics is proposed in this work. To construct observation schedules, a framework with offline learning and onboard scheduling is adopted. A neural network is trained offline in ground stations to assign the scheduling priority to observation requests in the onboard scheduling, based on the optimized historical schedules obtained by genetic algorithms which are computationally demanding to run onboard. The computational simulations show that the performance of the scheduling heuristic is enhanced using the data-driven framework.
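The division of labor described above (an offline-trained model supplies priorities; a cheap onboard heuristic builds the schedule) can be sketched as follows. This is a toy illustration, not the paper's method: the `priority` lambda stands in for the trained neural network, and the heuristic is a simple greedy interval scheduler.

```python
def greedy_schedule(requests, priority):
    """Polynomial-time onboard heuristic: visit requests in descending
    learned priority and accept each one whose observation window does
    not overlap an already accepted request."""
    accepted = []
    for req in sorted(requests, key=priority, reverse=True):
        s, e = req["window"]
        if all(e <= s2 or s >= e2 for s2, e2 in (r["window"] for r in accepted)):
            accepted.append(req)
    return accepted

requests = [
    {"id": 1, "window": (0, 10), "value": 3},
    {"id": 2, "window": (5, 15), "value": 9},   # overlaps request 1
    {"id": 3, "window": (20, 30), "value": 1},
]
plan = greedy_schedule(requests, priority=lambda r: r["value"])
# the high-priority request 2 displaces the overlapping request 1
```

The heuristic itself runs in O(n^2) time with no learned component onboard, which is the point: the expensive optimization (genetic algorithms over historical schedules) stays on the ground.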

IJCAI Conference 2018 Conference Paper

Deep Joint Semantic-Embedding Hashing

  • Ning Li
  • Chao Li
  • Cheng Deng
  • Xianglong Liu
  • Xinbo Gao

Hashing has been widely deployed in large-scale image retrieval due to its low storage cost and fast query speed. Almost all deep hashing methods do not sufficiently discover semantic correlation from label information, which leaves the learned hash codes less discriminative. In this paper, we propose a novel Deep Joint Semantic-Embedding Hashing (DSEH) approach that contains LabNet and ImgNet. Specifically, LabNet is explored to capture abundant semantic correlation between sample pairs and supervise ImgNet at both the semantic level and the hash-code level, which is conducive to generating hash codes that are more discriminative and similarity-preserving. Extensive experiments on three benchmark datasets show that the proposed model outperforms the state-of-the-art methods.
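The retrieval side of a hashing pipeline like this follows a standard pattern: quantize real-valued embeddings to binary codes, then rank the database by Hamming distance. The sketch below shows only that generic pattern, not DSEH's LabNet/ImgNet training; `db` and `q` stand in for learned image features.

```python
import numpy as np

def binarize(emb):
    """Quantize real-valued embeddings to binary hash codes (sign rule)."""
    return (emb > 0).astype(np.uint8)

def hamming_rank(query, db_codes):
    """Database indices sorted by Hamming distance to the query code."""
    return np.argsort((db_codes != query).sum(axis=1), kind="stable")

db = binarize(np.array([[0.9, -0.2, 0.4],
                        [-0.5, 0.8, 0.1],
                        [0.7, -0.9, 0.3]]))
q = binarize(np.array([1.0, -0.1, 0.5]))
order = hamming_rank(q, db)   # items 0 and 2 (distance 0) precede item 1
```

The binary codes explain the low storage cost, and the XOR-and-popcount distance explains the fast query speed the abstract refers to.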

IJCAI Conference 2018 Conference Paper

Generative Adversarial Positive-Unlabelled Learning

  • Ming Hou
  • Brahim Chaib-draa
  • Chao Li
  • Qibin Zhao

In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing discriminative PU models attempt to seek an optimal reweighting strategy for U data so that a decent decision boundary can be found. However, given limited P data, conventional PU models tend to suffer from overfitting when adapted to very flexible deep neural networks. In contrast, we are the first to introduce a new paradigm for the binary PU task from the perspective of generative learning, by leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing positive and negative realistic samples. We provide theoretical analysis to justify that, at equilibrium, GenPU is capable of recovering both positive and negative data distributions. Moreover, we show GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With infinite realistic and diverse sample streams generated from GenPU, a very flexible classifier can then be trained using deep neural networks.

JBHI Journal 2017 Journal Article

Canonical Polyadic Decomposition With Auxiliary Information for Brain–Computer Interface

  • Junhua Li
  • Chao Li
  • Andrzej Cichocki

Physiological signals are often organized in the form of multiple dimensions (e.g., channel, time, task, and 3-D voxel), so it is better to preserve the original organizational structure during processing. Unlike vector-based methods that destroy data structure, canonical polyadic decomposition (CPD) processes physiological signals in the form of a multiway array, which considers relationships between dimensions and preserves the structural information contained in the signal. Typically, CPD is utilized as an unsupervised method for feature extraction in a classification problem. A classifier, such as a support vector machine, is then required to classify those features, so the classification task is achieved in two isolated steps. We propose supervised CPD, which directly incorporates auxiliary label information during decomposition, so that a classification task can be achieved without an extra step of classifier training. The proposed method merges decomposition and classifier learning, simplifying the classification procedure compared with performing decomposition and classification separately. To evaluate the performance of the proposed method, three different kinds of signals were used: synthetic, EEG, and MEG. The results on both synthetic and real signals demonstrate that the proposed method is effective and efficient.

AAAI Conference 2017 Conference Paper

Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity

  • Xiaoshuai Sun
  • Jiewei Cao
  • Chao Li
  • Lei Zhu
  • Heng Tao Shen

In this paper, we present an automatic approach for on-line discovery of visual-lingual semantic fragments from weakly labeled Internet images. Instead of learning region-entity correspondences from well-labeled image-sentence pairs, our approach directly collects and enhances weakly labeled visual contents from the Web and constructs an adaptive visual representation which automatically links generic lingual phrases to their related visual contents. To ensure reliable and efficient semantic discovery, we adopt non-parametric density estimation to re-rank the related visual instances and propose a fast self-similarity-based quality assessment method to identify the high-quality semantic fragments. The discovered semantic fragments provide an adaptive joint representation for texts and images, based on which lingual-visual similarity can be defined for further co-analysis of heterogeneous multimedia data. Experimental results on semantic fragment quality assessment, sentence-based image retrieval, automatic multimedia insertion and ordering demonstrate the effectiveness of the proposed framework. The experiments show that the proposed methods make effective use of Web knowledge and generate competitive results compared to state-of-the-art approaches in various tasks.
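The density re-ranking idea can be illustrated with a toy kernel density estimate (the paper's exact kernel and bandwidth are not reproduced here; `feats` stands in for visual instance features): instances lying in dense regions of feature space are treated as reliable examples of the concept, while isolated outliers sink to the bottom of the ranking.

```python
import numpy as np

def density_rerank(feats, bandwidth=1.0):
    """Rank instances by a Gaussian KDE over the instance set itself:
    higher estimated density ranks first."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    return np.argsort(-density, kind="stable")

feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
order = density_rerank(feats)   # the isolated point at (10, 10) ranks last
```

Being non-parametric, this requires no labeled training data, which fits the weakly labeled Web setting the abstract describes.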

TAAS Journal 2016 Journal Article

Managing Server Clusters on Renewable Energy Mix

  • Chao Li
  • Rui Wang
  • Depei Qian
  • Tao Li

As climate change has become a global concern and server energy demand continues to soar, many IT companies have started to explore server clusters running on various renewable energy sources. Existing green data center designs often yield suboptimal performance as they only look at a certain specific type of energy source. This article explores data centers powered by hybrid renewable energy systems. We propose GreenWorks, a framework for HPC data centers running on a renewable energy mix. Specifically, GreenWorks features a cross-layer power management scheme tailored to the timing behaviors and capacity constraints of different energy sources. Using realistic workload traces and renewable energy data, we show that GreenWorks could provide a near-optimal workload performance (within 3% difference) on average. It can also reduce the worst-case performance degradation by 43% compared to the state-of-the-art design. Moreover, the performance improvements are based on carbon-neutral operations and are not at the cost of significant efficiency degradation and reduced battery lifecycle. Our technique becomes more efficient when servers become more energy proportional and can effectively handle the ever-increasing depth of renewable power penetration in green data centers.

AAAI Conference 2015 Conference Paper

Acronym Disambiguation Using Word Embedding

  • Chao Li
  • Lei Ji
  • Jun Yan

According to AcronymFinder.com, one of the world's largest and most comprehensive dictionaries of acronyms, an average of 37 new human-edited acronym definitions are added every day. The site currently lists 379,918 acronyms with 4,766,899 definitions, i.e. each acronym has 12.5 definitions on average. Identifying what exactly an acronym means in a given context is an important research topic for document comprehension as well as for document retrieval. In this paper, we propose two word-embedding-based models for acronym disambiguation. Word embedding represents words in a continuous, multidimensional vector space, so that the semantic similarity between words can be computed from the distance between their vectors. We evaluate the models on the MSH and ScienceWISE datasets, and both models outperform the state-of-the-art methods in accuracy. The experimental results show that word embedding helps to improve acronym disambiguation.
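The basic embedding-based disambiguation idea can be sketched in a few lines. Everything here is a hand-made stand-in, not the paper's models: the 2-D vectors play the role of trained word embeddings, and the candidate senses are hypothetical expansions of "MR".

```python
import numpy as np

def disambiguate(context_vecs, candidates):
    """Average the context word vectors and return the candidate
    definition whose vector has the highest cosine similarity."""
    ctx = np.mean(context_vecs, axis=0)
    ctx = ctx / np.linalg.norm(ctx)
    return max(candidates, key=lambda n: float(candidates[n] @ ctx
                                               / np.linalg.norm(candidates[n])))

context = np.array([[0.9, 0.1], [0.8, 0.2]])       # "medical"-leaning context
candidates = {"magnetic resonance": np.array([1.0, 0.0]),
              "mister":             np.array([0.0, 1.0])}
best = disambiguate(context, candidates)           # picks "magnetic resonance"
```

Because cosine similarity in the embedding space tracks semantic relatedness, the context vectors alone suffice to select the sense, with no hand-crafted features.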

AAAI Conference 2015 Conference Paper

Multi-tensor Completion with Common Structures

  • Chao Li
  • Qibin Zhao
  • Junhua Li
  • Andrzej Cichocki
  • Lili Guo

In multi-data learning, it is usually assumed that common latent factors exist among the datasets, but this assumption may lead to deteriorated performance when the datasets are heterogeneous and unbalanced. In this paper, we propose a novel common structure for multi-data learning. Instead of common latent factors, we assume that datasets share a Common Adjacency Graph (CAG) structure, which is more robust to heterogeneity and imbalance across datasets. Furthermore, we utilize the CAG structure to develop a new method for multi-tensor completion, which exploits the common structure to improve completion performance. Numerical results demonstrate that the proposed method not only outperforms state-of-the-art methods for video inpainting, but also recovers missing data well even in cases where conventional methods are not applicable.