Arrow Research search

Author name cluster

Chao Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

74 papers
2 author rows

Possible papers

74

AAAI Conference 2026 Conference Paper

Approximation Algorithm for Constrained k-Center Clustering: A Local Search Approach

  • Chaoqi Jia
  • Longkun Guo
  • Kewen Liao
  • Zhigang Lu
  • Chao Chen
  • Jason Xue

Clustering is a long-standing research problem and a fundamental tool in AI and data analysis. The traditional k-center problem, known as a fundamental theoretical challenge in clustering, has a best possible approximation ratio of 2, and any improvement to a ratio of 2 - ε would imply P = NP. In this work, we study the constrained k-center clustering problem, where instance-level cannot-link (CL) and must-link (ML) constraints are incorporated as background knowledge. Although general CL constraints significantly increase the hardness of approximation, previous work has shown that disjoint CL sets permit constant-factor approximations. However, whether local search can achieve such a guarantee in this setting remains an open question. To this end, we propose a novel local search framework based on a transformation to a dominating matching set problem, achieving the best possible approximation ratio of 2. The experimental results on both real-world and synthetic datasets demonstrate that our algorithm outperforms baselines in solution quality.
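The 2-approximation bound referenced above is achieved for the unconstrained k-center problem by the classical greedy farthest-point algorithm of Gonzalez. A minimal sketch of that baseline follows (plain Python, illustrative only; this is the textbook unconstrained algorithm, not the paper's constrained local-search method):

```python
import math

def gonzalez_k_center(points, k):
    """Classical greedy 2-approximation for unconstrained k-center
    (Gonzalez, 1985): repeatedly add the point farthest from the
    current set of centers."""
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        for j, p in enumerate(points):
            dist[j] = min(dist[j], math.dist(p, points[i]))
    return centers, max(dist)  # chosen centers and clustering radius
```

The returned radius is guaranteed to be at most twice the optimal radius, matching the hardness bound the abstract cites.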

AAAI Conference 2026 Conference Paper

CoCo-MILP: Inter-Variable Contrastive and Intra-Constraint Competitive MILP Solution Prediction

  • Tianle Pu
  • Jianing Li
  • Yingying Gao
  • Shixuan Liu
  • Zijie Geng
  • Haoyang Liu
  • Chao Chen
  • Changjun Fan

Mixed-Integer Linear Programming (MILP) is a cornerstone of combinatorial optimization, yet solving large-scale instances remains a significant computational challenge. Recently, Graph Neural Networks (GNNs) have shown promise in accelerating MILP solvers by predicting high-quality solutions. However, we identify that existing methods misalign with the intrinsic structure of MILP problems at two levels. At the learning objective level, the Binary Cross-Entropy (BCE) loss treats variables independently, neglecting their relative priority and yielding implausible logits. At the model architecture level, standard GNN message passing inherently smooths the representations across variables, masking the natural competitive relationships within constraints. To address these challenges, we propose CoCo-MILP, which explicitly models inter-variable Contrast and intra-constraint Competition for advanced MILP solution prediction. At the objective level, CoCo-MILP introduces the Inter-Variable Contrastive Loss (VCL), which explicitly maximizes the embedding margin between variables assigned one versus zero. At the architectural level, we design an Intra-Constraint Competitive GNN layer that, instead of homogenizing features, learns to differentiate representations of competing variables within a constraint, capturing their exclusionary nature. Experimental results on standard benchmarks demonstrate that CoCo-MILP significantly outperforms existing learning-based approaches, reducing the solution gap by up to 68.12% compared to traditional solvers.
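The distinction the abstract draws between an independent per-variable BCE objective and a margin-style contrastive objective can be illustrated with a toy sketch (plain Python; the pairwise hinge form and function names below are illustrative assumptions, not the paper's actual VCL):

```python
import math

def bce_loss(logits, labels):
    """Binary cross-entropy: each variable is scored independently,
    with no notion of relative ordering between variables."""
    loss = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(logits)

def pairwise_margin_loss(logits, labels, margin=1.0):
    """Toy contrastive objective: every variable labeled 1 should score
    at least `margin` higher than every variable labeled 0, making the
    loss depend on cross-variable comparisons rather than each logit alone."""
    ones = [z for z, y in zip(logits, labels) if y == 1]
    zeros = [z for z, y in zip(logits, labels) if y == 0]
    pairs = [(a, b) for a in ones for b in zeros]
    return sum(max(0.0, margin - (a - b)) for a, b in pairs) / len(pairs)
```

The margin loss vanishes as soon as the one-assigned variables are separated from the zero-assigned ones by the margin, whereas BCE keeps penalizing each logit in isolation.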

AAAI Conference 2026 Conference Paper

DW-DGAT: Dynamically Weighted Dual Graph Attention Network for Neurodegenerative Disease Diagnosis

  • Chengjia Liang
  • Zhenjiong Wang
  • Chao Chen
  • Ruizhi Zhang
  • Songxi Liang
  • Hai Xie
  • Haijun Lei
  • Zhongwei Huang

Parkinson's disease (PD) and Alzheimer's disease (AD) are the two most prevalent and incurable neurodegenerative diseases (NDs) worldwide, for which early diagnosis is critical to delay their progression. However, the high dimensionality of multi-metric data with diverse structural forms, the heterogeneity of neuroimaging and phenotypic data, and class imbalance collectively pose significant challenges to early ND diagnosis. To address these challenges, we propose a dynamically weighted dual graph attention network (DW-DGAT) that integrates: (1) a general-purpose data fusion strategy to merge three structural forms of multi-metric data; (2) a dual graph attention architecture based on brain regions and inter-sample relationships to extract both micro- and macro-level features; and (3) a class weight generation mechanism combined with two stable and effective loss functions to mitigate class imbalance. Rigorous experiments, based on the Parkinson's Progression Markers Initiative (PPMI) and Alzheimer's Disease Neuroimaging Initiative (ADNI) studies, demonstrate the state-of-the-art performance of our approach.

AAAI Conference 2026 Conference Paper

FinMathBench: A Formula-Driven Benchmark for Evaluating LLMs’ Math Reasoning Capabilities in Finance

  • Yi He
  • Ping Wang
  • Shiqiang Xiong
  • Chao Chen
  • Haixiang Hu

Many existing financial math reasoning benchmarks suffer from data contamination and high manual construction costs. To address this, we propose a novel formula-driven approach to dynamically construct math reasoning benchmarks in finance. Our two-stage approach (1) generates single-formula questions by LLMs using a "Mask-for-Solve" paradigm for ground truth answers, and (2) synthesizes multi-formula questions through hierarchical tree-based DAGs. Our approach ensures novelty (via LLMs' creativity) and controllability of difficulty (via DAG structure). Based on a self-constructed financial formula bank, we utilize the proposed method to build FinMathBench, the first formula-driven and fully LLM-generated benchmark aimed at assessing LLMs' math reasoning abilities in finance, containing 946 questions across 4 complexity levels. Evaluation results on 40 LLMs demonstrate significant accuracy drops in multi-formula questions, e.g., 72.9% (1-Formula) to 14.0% (4-Formula) for GPT-4o under Chain-of-Thought prompting. Three critical flaws of LLMs are also observed: poor direct calculation performance, bias toward frequently solved variables in formulas, and erroneous "correction" of valid but extreme financial values. These findings highlight gaps in current LLMs' domain-specific reasoning and underscore FinMathBench's value for advancing robust financial LLMs.

AAAI Conference 2026 Conference Paper

From Attribution to Action: Jointly ALIGNing Predictions and Explanations

  • Dongsheng Hong
  • Chao Chen
  • Yanhui Chen
  • Shanshan Lin
  • Zhihao Chen
  • Xiangwen Liao

Explanation-guided learning (EGL) has shown promise in aligning model predictions with interpretable reasoning, particularly in computer vision tasks. However, most approaches rely on external annotations or heuristic-based segmentation to supervise model explanations, which can be noisy, imprecise and difficult to scale. In this work, we provide both empirical and theoretical evidence that low-quality supervision signals can degrade model performance rather than improve it. In response, we propose ALIGN, a novel framework that jointly trains a classifier and a masker in an iterative manner. The masker learns to produce soft, task-relevant masks that highlight informative regions, while the classifier is optimized for both prediction accuracy and alignment between its saliency maps and the learned masks. By leveraging high-quality masks as guidance, ALIGN improves both interpretability and generalizability, showing its superiority across various settings. Experiments on two domain generalization benchmarks, VLCS and Terra Incognita, show that ALIGN consistently outperforms six strong baselines in both in-distribution and out-of-distribution settings. ALIGN also yields superior explanation quality concerning sufficiency and comprehensiveness, highlighting its effectiveness in producing accurate and interpretable models.

AAAI Conference 2026 Conference Paper

Improved Streaming Algorithm for Fair k-Center Clustering

  • Longkun Guo
  • Zeyu Lin
  • Chaoqi Jia
  • Chao Chen

Many real-world applications call for incorporating fairness constraints into the k-center clustering problem, where the dataset is partitioned into m demographic groups, each with a specified upper bound on the number of centers to ensure fairness. Focusing on big data scenarios, this paper addresses the problem in a streaming setting, where data points arrive sequentially in a continuous stream. Leveraging a structure called the λ-independent center set, we propose a one-pass streaming algorithm that first computes a reserved set of points during the streaming process. In the post-streaming process, we then select centers from the reserved point set by analyzing three possible cases and transforming the most complex one into a specially constrained vertex-cover problem on an auxiliary graph. Our algorithm achieves an approximation ratio of 5 + ε and memory complexity O(k log Δ), where Δ is the aspect ratio and ε > 0 is any small constant. Furthermore, we extend our approach to semi-structured data streams, where data points arrive in groups. In this setting, we present a (3 + ε)-approximation algorithm for m = 2, which can be readily adapted to solve the offline fair k-center problem, achieving an approximation ratio of 3 that matches the current state of the art. Lastly, we conduct extensive experiments to evaluate the performance of our approaches, demonstrating that they outperform existing baselines in both clustering cost and runtime efficiency.

AAAI Conference 2026 Conference Paper

MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation

  • Fuqiang Gu
  • Yuanke Li
  • Xianlei Long
  • Kangping Ji
  • Chao Chen
  • Qingyi Gu
  • Zhenliang Ni

Semantic segmentation is a fundamental task in computer vision with wide-ranging applications, including autonomous driving and robotics. While RGB-based methods have achieved strong performance with CNNs and Transformers, their effectiveness degrades under fast motion, low-light, or high dynamic range conditions due to limitations of frame cameras. Event cameras offer complementary advantages such as high temporal resolution and low latency, yet lack color and texture, making them insufficient on their own. To address this, recent research has explored multimodal fusion of RGB and event data; however, many existing approaches are computationally expensive and focus primarily on spatial fusion, neglecting the temporal dynamics inherent in event streams. In this work, we propose MambaSeg, a novel dual-branch semantic segmentation framework that employs parallel Mamba encoders to efficiently model RGB images and event streams. To reduce cross-modal ambiguity, we introduce the Dual-Dimensional Interaction Module (DDIM), comprising a Cross-Spatial Interaction Module (CSIM) and a Cross-Temporal Interaction Module (CTIM), which jointly perform fine-grained fusion along both spatial and temporal dimensions. This design improves cross-modal alignment, reduces ambiguity, and leverages the complementary properties of each modality. Extensive experiments on the DDD17 and DSEC datasets demonstrate that MambaSeg achieves state-of-the-art segmentation performance while significantly reducing computational cost, showcasing its promise for efficient, scalable, and robust multimodal perception.

AAAI Conference 2026 Conference Paper

Optimized Algorithms for Text Clustering with LLM-Generated Constraints

  • Chaoqi Jia
  • Weihong Wu
  • Longkun Guo
  • Zhigang Lu
  • Chao Chen
  • Kok-Leong Ong

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have proposed incorporating background knowledge, typically in the form of must‑link and cannot‑link constraints, to guide the clustering process. With the recent advent of large language models (LLMs), there is growing interest in improving clustering quality through LLM-based automatic constraint generation. In this paper, we propose a novel constraint‑generation approach that reduces resource consumption by generating constraint sets rather than using traditional pairwise constraints. This improves both query efficiency and constraint accuracy compared to state‑of‑the‑art methods. We further introduce a constrained clustering algorithm tailored to the characteristics of LLM-generated constraints. Our method incorporates a confidence threshold and a penalty mechanism to address potentially inaccurate constraints. We evaluate our approach on five text datasets, considering both the cost of constraint generation and overall clustering performance. The results show that our method achieves clustering accuracy comparable to the state-of-the-art algorithms while reducing the number of LLM queries by more than 20 times.

JBHI Journal 2026 Journal Article

Text-Driven Weakly Supervised OCT Lesion Segmentation With Structural Guidance

  • Jiaqi Yang
  • Nitish Mehta
  • Xiaoling Hu
  • Chao Chen
  • Chia-Ling Tsai

Accurate segmentation of Optical Coherence Tomography (OCT) images is crucial for diagnosing and monitoring retinal diseases. However, the labor-intensive nature of pixel-level annotation limits the scalability of supervised learning for large datasets. Weakly Supervised Semantic Segmentation (WSSS) offers a promising alternative by using weaker forms of supervision, such as image-level labels, to reduce the annotation burden. Despite its advantages, weak supervision inherently carries limited information. We propose a novel WSSS framework with only image-level labels for OCT lesion segmentation that integrates structural and text-driven guidance to produce high-quality, pixel-level pseudo labels. The framework employs two visual processing modules: one that processes the original OCT images and another that operates on layer segmentations augmented with anomalous signals, enabling the model to associate lesions with their corresponding anatomical layers. Complementing these visual cues, we leverage large-scale pretrained models to provide two forms of textual guidance: label-derived descriptions that encode local semantics, and domain-agnostic synthetic descriptions that, although expressed in natural image terms, capture spatial and relational semantics useful for generating globally consistent representations. By fusing these visual and textual features in a multi-modal framework, our method aligns semantic meaning with structural relevance, thereby improving lesion localization and segmentation performance. Experiments on three OCT datasets demonstrate state-of-the-art results, highlighting its potential to advance diagnostic accuracy and efficiency in medical imaging.

TMLR Journal 2025 Journal Article

A Theoretical Study of Neural Network Expressive Power via Manifold Topology

  • Jiachen Yao
  • Lingjie Yi
  • Mayank Goswami
  • Chao Chen

A prevalent assumption regarding real-world data is that it lies on or close to a low-dimensional manifold. When deploying a neural network on data manifolds, the required size, i.e., the number of neurons of the network, heavily depends on the intricacy of the underlying latent manifold. While significant advancements have been made in understanding the geometric attributes of manifolds, it's essential to recognize that topology, too, is a fundamental characteristic of manifolds. In this study, we investigate network expressive power in terms of the latent data manifold. Integrating both topological and geometric facets of the data manifold, we present a size upper bound of ReLU neural networks.

NeurIPS Conference 2025 Conference Paper

Controlling Thinking Speed in Reasoning Models

  • Zhengkai Lin
  • Zhihang Fu
  • Ze Chen
  • Chao Chen
  • Liang Xie
  • Wenxiao Wang
  • Deng Cai
  • Zheng Wang

Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to signal reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis for complex reasoning. Without any training or additional cost, our plug-and-play method yields an average +1.3% accuracy with -8.6% token usage across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented based on vLLM and are expected to support broader applications and inspire future research.

JMLR Journal 2025 Journal Article

Enhancing Graph Representation Learning with Localized Topological Features

  • Zuoyu Yan
  • Qi Zhao
  • Ze Ye
  • Tengfei Ma
  • Liangcai Gao
  • Zhi Tang
  • Yusu Wang
  • Chao Chen

Representation learning on graphs is a fundamental problem that can be crucial in various tasks. Graph neural networks, the dominant approach for graph representation learning, are limited in their representation power. Therefore, it can be beneficial to explicitly extract and incorporate high-order topological and geometric information into these models. In this paper, we propose a principled approach to extract the rich connectivity information of graphs based on the theory of persistent homology. Our method utilizes the topological features to enhance the representation learning of graph neural networks and achieve state-of-the-art performance on various node classification and link prediction benchmarks. We also explore the option of end-to-end learning of the topological features, i.e., treating topological computation as a differentiable operator during learning. Our theoretical analysis and empirical study provide insights and potential guidelines for employing topological features in graph learning tasks.

NeurIPS Conference 2025 Conference Paper

Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments

  • Jiahui Wang
  • Chao Chen
  • Jiacheng Xu
  • Zongzhang Zhang
  • Yang Yu

Visual reinforcement learning has shown promise in various real-world applications. However, deploying policies in complex real-world environments with visual perturbations remains a significant challenge. We notice that humans tend to filter information at the object level prior to decision-making, facilitating efficient skill transfer across different contexts. Inspired by this, we introduce Focus-Then-Reuse (FTR), a method utilizing a novel object selection mechanism to focus on task-relevant objects, and directly reuse the simulation-trained policy on them. The training of the object selection mechanism integrates prior knowledge from a vision-language model and feedback from the environment. Experimental results on challenging tasks based on DeepMind Control Suite and Franka Emika Robotics demonstrate that FTR enables rapid adaptation in visual perturbation environments and achieves state-of-the-art performance. The source code is available at https://github.com/LAMDA-RL/FTR.

JBHI Journal 2025 Journal Article

Incomplete Multi-view Data Learning via Adaptive Embedding and Partial l2,1 Norm Constraints for Parkinson's Disease Diagnosis

  • Zhongwei Huang
  • Kai Wang
  • Chao Chen
  • Jianxia Chen
  • Jun Wan
  • Zhi Yang
  • Ran Zhou
  • Haitao Gan

Parkinson's disease (PD) is a progressive neurodegenerative disorder characterized by mental abnormalities and motor dysfunction. Its early classification and prediction of clinical scores have been major concerns for researchers. Currently, multi-view data learning has become an essential research area due to the capacity of multiple views to provide complementary insights from various perspectives. However, the discontinuous distribution, data missing complexity, small sample size, and redundant features in multi-view datasets pose a substantial obstacle, and most existing multi-view learning methods are unable to handle these challenges effectively. In this study, we propose a novel incomplete multi-view data learning framework (IMVDL) via dynamic embedding and partial l2,1 norm constraints for PD diagnosis. Specifically, multi-view dynamic embedding can adapt to any view-missing scenario, thereby linearly/nonlinearly mapping incomplete multi-view data to low-dimensional manifold spaces and generating complete multi-view data representations. The partial l2,1 norm constraint can ignore larger feature weight values and apply l2,1 norm sparsity to the remaining weights, thereby avoiding the sparse bias problem caused by larger weight values. An efficient iterative algorithm is derived to find the optimal solution of the IMVDL method. We conduct extensive experiments using multi-modal neuroimage data from the Parkinson's Progression Markers Initiative (PPMI) database. The results demonstrate that the IMVDL method is superior to other comparative methods. The source code for IMVDL is available at https://github.com/a610lab/IMVDL/.

NeurIPS Conference 2025 Conference Paper

MATCH: Multi-faceted Adaptive Topo-Consistency for Semi-Supervised Histopathology Segmentation

  • Meilong Xu
  • Xiaoling Hu
  • Shahira Abousamra
  • Chen Li
  • Chao Chen

In semi-supervised segmentation, capturing meaningful semantic structures from unlabeled data is essential. This is particularly challenging in histopathology image analysis, where objects are densely distributed. To address this issue, we propose a semi-supervised segmentation framework designed to robustly identify and preserve relevant topological features. Our method leverages multiple perturbed predictions obtained through stochastic dropouts and temporal training snapshots, enforcing topological consistency across these varied outputs. This consistency mechanism helps distinguish biologically meaningful structures from transient and noisy artifacts. A key challenge in this process is to accurately match the corresponding topological features across the predictions in the absence of ground truth. To overcome this, we introduce a novel matching strategy that integrates spatial overlap with global structural alignment, minimizing discrepancies among predictions. Extensive experiments demonstrate that our approach effectively reduces topological errors, resulting in more robust and accurate segmentations essential for reliable downstream analysis. Code is available at https://github.com/Melon-Xu/MATCH.

AAAI Conference 2025 Conference Paper

MSR: A Multifaceted Self-Retrieval Framework for Microscopic Cascade Prediction

  • Dongsheng Hong
  • Chao Chen
  • Xujia Li
  • Shuhui Wang
  • Wen Lin
  • Xiangwen Liao

The microscopic cascade prediction task has wide applications in downstream areas like ''rumor detection''. Its goal is to forecast the diffusion routes of an information cascade within networks. Existing works typically formulate it as a classification task, which fails to align well with the Social Homophily assumption, as it uses only the features of ''infected'' users while neglecting those of ''uninfected'' users in representation learning. Moreover, these methods focus primarily on social relationships, thereby dismissing other vital dimensions like users' historical behavior and the preferences underlying it. To address these challenges, we introduce the MSR (Multifaceted Self-Retrieval) framework. During encoding, in addition to the existing social graph, we construct a preference graph to represent ''behavioral preferences'' and further propose a modified multi-channel GRAU for multi-view analysis of cascade phenomena. For decoding, our approach diverges from classification-based methods by reformulating the task as an information retrieval problem that predicts the target user with similarity measures. Empirical evaluations on public datasets demonstrate that this framework significantly outperforms baselines on Hits@κ and MAP@κ, affirming its enhanced predictive ability.

ICRA Conference 2025 Conference Paper

NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments

  • Taiyi Pan
  • Junyang He
  • Chao Chen
  • Yiming Li 0003
  • Chen Feng 0002

Visual place recognition (VPR) enables autonomous robots to identify previously visited locations, which contributes to tasks like simultaneous localization and mapping (SLAM). VPR faces challenges such as accurate image neighbor retrieval and appearance change in scenery. Event cameras, also known as dynamic vision sensors, are a new sensor modality for VPR and offer a promising solution to the challenges with their unique attributes: high temporal resolution (1MHz clock), ultra-low latency (in μs), and high dynamic range (>120dB). These attributes make event cameras less susceptible to motion blur and more robust in variable lighting conditions, making them suitable for addressing VPR challenges. However, the scarcity of event-based VPR datasets, partly due to the novelty and cost of event cameras, hampers their adoption. To fill this data gap, our paper introduces the NYC-Event-VPR dataset to the robotics and computer vision communities, featuring the Prophesee IMX636 HD event sensor (1280x720 resolution), combined with RGB camera and GPS module. It encompasses over 13 hours of geotagged event data, spanning 260 kilometers across New York City, covering diverse lighting and weather conditions, day/night scenarios, and multiple visits to various locations. Furthermore, our paper employs three frameworks to conduct generalization performance assessments, promoting innovation in event-based VPR and its integration into robotics applications.

NeurIPS Conference 2025 Conference Paper

RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains

  • Tianle Pu
  • Zijie Geng
  • Haoyang Liu
  • Shixuan Liu
  • Jie Wang
  • Li Zeng
  • Chao Chen
  • Changjun Fan

Mixed-Integer Linear Programming (MILP) is a fundamental and powerful framework for modeling complex optimization problems across diverse domains. Recently, learning-based methods have shown great promise in accelerating MILP solvers by predicting high-quality solutions. However, most existing approaches are developed and evaluated in single-domain settings, limiting their ability to generalize to unseen problem distributions. This limitation poses a major obstacle to building scalable and general-purpose learning-augmented solvers. To address this challenge, we introduce RoME, a domain-Robust Mixture-of-Experts (MoE) framework for predicting MILP solutions across domains. RoME dynamically routes problem instances to specialized experts based on learned task embeddings. The model is trained using a two-level distributionally robust optimization strategy: inter-domain to mitigate global shifts across domains, and intra-domain to enhance local robustness by introducing perturbations on task embeddings. We reveal that cross-domain training not only enhances the model's generalization capability to unseen domains but also improves performance within each individual domain by encouraging the model to capture more general intrinsic combinatorial patterns. Specifically, a single RoME model trained on three domains achieves an average improvement of 67.7% when evaluated on five diverse domains. We further test the pretrained model on MIPLIB in a zero-shot setting, demonstrating its ability to deliver measurable performance gains on challenging real-world instances where existing learning-based approaches often struggle to generalize.

AAAI Conference 2025 Conference Paper

Sharpening Neural Implicit Functions with Frequency Consolidation Priors

  • Chao Chen
  • Yu-Shen Liu
  • Zhizhong Han

Signed Distance Functions (SDFs) are vital implicit representations to represent high fidelity 3D surfaces. Current methods mainly leverage a neural network to learn an SDF from various supervisions including signed distances, 3D point clouds, or multi-view images. However, due to various reasons including the bias of neural network on low frequency content, 3D unaware sampling, sparsity in point clouds, or low resolutions of images, neural implicit representations still struggle to represent geometries with high frequency components like sharp structures, especially for the ones learned from images or point clouds. To overcome this challenge, we introduce a method to sharpen a low frequency SDF observation by recovering its high frequency components, pursuing a sharper and more complete surface. Our key idea is to learn a mapping from a low frequency observation to a full frequency coverage in a data-driven manner, leading to a prior knowledge of shape consolidation in the frequency domain, dubbed frequency consolidation priors. To better generalize a learned prior to unseen shapes, we propose representing frequency components as embeddings and disentangling the embedding of the low frequency component from the embedding of the full frequency component. This disentanglement allows the prior to generalize on an unseen low frequency observation by simply recovering its full frequency embedding through a test-time self-reconstruction. Our evaluations under widely used benchmarks or real scenes show that our method can recover high frequency components and produce more accurate surfaces than the latest methods.

AAAI Conference 2025 Conference Paper

UniTR: A Unified Framework for Joint Representation Learning of Trajectories and Road Networks

  • Jie Zhao
  • Chao Chen
  • Yuanshao Zhu
  • Mingyu Deng
  • Yuxuan Liang

Representation learning of urban spatial-temporal data is fundamental and critical, serving a wide range of intelligent applications. Given that road networks and trajectories are inherently interrelated, their joint representation learning can significantly enhance the accuracy and utility of these applications. However, effectively learning joint representations for these two types of data remains challenging, particularly due to the complexities of interaction modeling and cross-scale optimization. To this end, we propose a unified framework, named UniTR, for joint representation learning of road networks and trajectories. Specifically, we first design a hierarchical propagation mechanism to model the complex many-to-many interactions between road networks and trajectories, thereby generating informative embeddings. Then, a triple-level contrastive optimization module is incorporated to systematically select valid positive and negative samples, further refining the embeddings. Experiments conducted on real-world datasets from two cities clearly demonstrate the effectiveness and superiority of UniTR.

ICLR Conference 2025 Conference Paper

Wasserstein-Regularized Conformal Prediction under General Distribution Shift

  • Rui Xu
  • Chao Chen
  • Yue Sun 0001
  • Parvathinathan Venkitasubramaniam
  • Sihong Xie

Conformal prediction yields a prediction set with guaranteed $1-\alpha$ coverage of the true target under the i.i.d. assumption, which can fail and lead to a gap between $1-\alpha$ and the actual coverage. Prior studies bound the gap using total variation distance, which cannot identify the gap changes under distribution shift at different $\alpha$, thus serving as a weak indicator of prediction set validity. Besides, existing methods are mostly limited to covariate shifts, while general joint distribution shifts are more common in practice but less researched. In response, we first propose a Wasserstein distance-based upper bound of the coverage gap and analyze the bound using probability measure pushforwards between the shifted joint data and conformal score distributions, enabling a separation of the effect of covariate and concept shifts over the coverage gap. We exploit the separation to design algorithms based on importance weighting and regularized representation learning (WR-CP) to reduce the Wasserstein bound with a finite-sample error bound. WR-CP achieves a controllable balance between conformal prediction accuracy and efficiency. Experiments on six datasets prove that WR-CP can reduce coverage gaps to 3.2% across different confidence levels and produce prediction sets that are, on average, 37% smaller than those of the worst-case approach.
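The i.i.d. guarantee the abstract starts from comes from the standard split conformal procedure, sketched below for regression (plain Python; an illustration of the textbook baseline only, not of WR-CP itself):

```python
import math

def split_conformal_halfwidth(cal_residuals, alpha):
    """Standard split conformal prediction for regression: given absolute
    residuals |y_i - f(x_i)| on a held-out calibration set of size n,
    return the half-width q such that [f(x) - q, f(x) + q] covers a new
    i.i.d. test point with probability at least 1 - alpha."""
    n = len(cal_residuals)
    # finite-sample corrected quantile rank: ceil((n + 1) * (1 - alpha))
    rank = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_residuals)[min(rank, n) - 1]
```

Under distribution shift the calibration residuals no longer represent the test distribution, which is exactly the coverage gap the paper's Wasserstein bound quantifies.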

AAMAS Conference 2024 Conference Paper

Deep Anomaly Detection via Active Anomaly Search

  • Chao Chen
  • Dawei Wang
  • Feng Mao
  • Jiacheng Xu
  • Zongzhang Zhang
  • Yang Yu

Anomaly detection (AD) holds substantial practical value, and given the scarcity of labeled data, semi-supervised anomaly detection techniques have garnered increasing attention. We find that previous methods suffer from insufficient exploitation of labeled data and under-exploration of unlabeled data. To tackle this problem, we search for possible anomalies in unlabeled data and use the discovered anomalies to enhance performance. We innovatively model this search process as a Markov decision process and utilize a reinforcement learning algorithm to solve it. Our method, Deep Anomaly Detection and Search (DADS), integrates the exploration of unlabeled data and the exploitation of labeled data into one framework. Experimentally, we compare DADS with several state-of-the-art methods on widely used benchmarks, and the results show that DADS can efficiently search for anomalies in unlabeled data and learn from them, thus achieving good performance. Code: https://github.com/LAMDA-RL/DADS

IJCAI Conference 2024 Conference Paper

Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness in the Physical World

  • Hangtao Zhang
  • Shengshan Hu
  • Yichen Wang
  • Leo Yu Zhang
  • Ziqi Zhou
  • Xianlong Wang
  • Yanjun Zhang
  • Chao Chen

Object detection tasks, crucial in safety-critical systems like autonomous driving, focus on pinpointing object locations. These detectors are known to be susceptible to backdoor attacks. However, existing backdoor techniques have primarily been adapted from classification tasks, overlooking deeper vulnerabilities specific to object detection. This paper bridges this gap by introducing Detector Collapse (DC), a brand-new backdoor attack paradigm tailored for object detection. DC is designed to instantly incapacitate detectors (i.e., severely impairing the detector's performance and culminating in denial-of-service). To this end, we develop two innovative attack schemes: Sponge for triggering widespread misidentifications and Blinding for rendering objects invisible. Remarkably, we introduce a novel poisoning strategy exploiting natural objects, enabling DC to act as a practical backdoor in real-world environments. Our experiments on different detectors across several benchmarks show a significant improvement (~10%-60% absolute and ~2-7x relative) in attack efficacy over state-of-the-art attacks.

NeurIPS Conference 2024 Conference Paper

Enhancing LLM’s Cognition via Structurization

  • Kai Liu
  • Zhihang Fu
  • Chao Chen
  • Wei Zhang
  • Rongxin Jiang
  • Fan Zhou
  • Yaowu Chen
  • Yue Wu

When reading long-form text, human cognition is complex and structurized. While large language models (LLMs) process input contexts through a causal and sequential perspective, this approach can potentially limit their ability to handle intricate and complex inputs effectively. To enhance LLMs' cognition capability, this paper presents a novel concept of context structurization. Specifically, we transform plain, unordered contextual sentences into well-ordered and hierarchically structurized elements. By doing so, LLMs can better grasp intricate and extended contexts through precise attention and information-seeking along the organized structures. Extensive evaluations are conducted across various model architectures and sizes (including a series of auto-regressive LLMs as well as BERT-like masking models) on a diverse set of NLP tasks (e.g., context-based question-answering, exhaustive hallucination evaluation, and passage-level dense retrieval). Empirical results show consistent and significant performance gains afforded by a single-round structurization. In particular, we boost the open-sourced LLaMA2-70B model to achieve performance comparable to GPT-3.5-Turbo as the hallucination evaluator. Besides, we show the feasibility of distilling advanced LLMs' language processing abilities into a smaller yet effective StruXGPT-7B to execute structurization, addressing the practicality of our approach. Code is available at https://github.com/alibaba/struxgpt.

AAAI Conference 2024 Conference Paper

Focus-Then-Decide: Segmentation-Assisted Reinforcement Learning

  • Chao Chen
  • Jiacheng Xu
  • Weijian Liao
  • Hao Ding
  • Zongzhang Zhang
  • Yang Yu
  • Rui Zhao

Visual Reinforcement Learning (RL) is a promising approach to achieve human-like intelligence. However, it currently faces challenges in learning efficiently within noisy environments. In contrast, humans can quickly identify task-relevant objects in distraction-filled surroundings by applying previously acquired common knowledge. Recently, foundational models in natural language processing and computer vision have achieved remarkable successes, and the common knowledge within these models can significantly benefit downstream task training. Inspired by these achievements, we aim to incorporate common knowledge from foundational models into visual RL. We propose a novel Focus-Then-Decide (FTD) framework, allowing the agent to make decisions based solely on task-relevant objects. To achieve this, we introduce an attention mechanism to select task-relevant objects from the object set returned by a foundational segmentation model, and only use the task-relevant objects for the subsequent training of the decision module. Additionally, we employ two generic self-supervised objectives to facilitate the rapid learning of this attention mechanism. Experimental results on challenging tasks based on DeepMind Control Suite and Franka Emika Robotics demonstrate that our method can quickly and accurately pinpoint objects of interest in noisy environments. Consequently, it achieves a significant performance improvement over current state-of-the-art algorithms. Project Page: https://www.lamda.nju.edu.cn/chenc/FTD.html Code: https://github.com/LAMDA-RL/FTD

NeurIPS Conference 2024 Conference Paper

Inferring Neural Signed Distance Functions by Overfitting on Single Noisy Point Clouds through Finetuning Data-Driven based Priors

  • Chao Chen
  • Yu-Shen Liu
  • Zhizhong Han

It is important to estimate an accurate signed distance function (SDF) from a point cloud in many computer vision applications. The latest methods learn neural SDFs using either a data-driven or an overfitting-based strategy. However, these two kinds of methods suffer from either poor generalization or slow convergence, which limits their capability in challenging scenarios like highly noisy point clouds. To resolve this issue, we propose a method that combines the strengths of both data-driven and overfitting-based methods for better generalization, faster inference, and higher accuracy in learning neural SDFs. We introduce a novel statistical reasoning algorithm in local regions which is able to finetune data-driven priors without signed distance supervision, clean point clouds, or point normals. This helps our method start with a good initialization and converge to a minimum much faster. Our numerical and visual comparisons with state-of-the-art methods show our superiority in surface reconstruction and point cloud denoising on widely used shape and scene benchmarks. The code is available at https://github.com/chenchao15/LocalN2NM.

TMLR Journal 2024 Journal Article

Learning to Abstain From Uninformative Data

  • Yikai Zhang
  • Songzhu Zheng
  • Mina Dalirrooyfard
  • Pengxiang Wu
  • Anderson Schneider
  • Anant Raj
  • Yuriy Nevmyvaka
  • Chao Chen

Learning and decision-making in domains with naturally high noise-to-signal ratios – such as Finance or Healthcare – is often challenging, while the stakes are very high. In this paper, we study the problem of learning and acting under a general noisy generative process. In this problem, the data distribution has a significant proportion of uninformative samples with high noise in the label, while part of the data contains useful information represented by low label noise. This dichotomy is present during both training and inference, which requires the proper handling of uninformative data during both training and testing. We propose a novel approach to learning under these conditions via a loss inspired by the selective learning theory. By minimizing this loss, the model is guaranteed to make a near-optimal decision by distinguishing informative data from uninformative data and making predictions. We build upon the strength of our theoretical guarantees by describing an iterative algorithm, which jointly optimizes both a predictor and a selector, and evaluates its empirical performance in a variety of settings.

NeurIPS Conference 2024 Conference Paper

Linear Uncertainty Quantification of Graphical Model Inference

  • Chenghua Guo
  • Han Yu
  • Jiaxin Liu
  • Chao Chen
  • Qi Li
  • Sihong Xie
  • Xi Zhang

Uncertainty Quantification (UQ) is vital for decision makers as it offers insights into the potential reliability of data and model, enabling more informed and risk-aware decision-making. Graphical models, capable of representing data with complex dependencies, are widely used across domains. Existing sampling-based UQ methods are unbiased but cannot guarantee convergence and are time-consuming on large-scale graphs. There are fast UQ methods for graphical models with closed-form solutions and convergence guarantee but with uncertainty underestimation. We propose LinUProp, a UQ method that utilizes a novel linear propagation of uncertainty to model uncertainty among related nodes additively instead of multiplicatively, to offer linear scalability, guaranteed convergence, and closed-form solutions without underestimating uncertainty. Theoretically, we decompose the expected prediction error of the graphical model and prove that the uncertainty computed by LinUProp is the generalized variance component of the decomposition. Experimentally, we demonstrate that LinUProp is consistent with the sampling-based method but with linear scalability and fast convergence. Moreover, LinUProp outperforms competitors in uncertainty-based active learning on four real-world graph datasets, achieving higher accuracy with a lower labeling budget.
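The additive-versus-multiplicative contrast in this abstract can be illustrated on a toy chain graph. This is a generic sketch of the two propagation styles under an assumed edge weight and assumed local uncertainties, not LinUProp's actual update rule:

```python
import numpy as np

# Chain graph 0 -> 1 -> 2 -> 3 with a single edge weight w and a fixed
# per-node local uncertainty. Multiplicative propagation shrinks toward
# zero away from the source (underestimation); additive propagation
# accumulates uncertainty instead.
w = 0.5
u_local = np.array([0.2, 0.2, 0.2, 0.2])

u_add = np.zeros(4)  # additive (linear) propagation
u_mul = np.zeros(4)  # multiplicative propagation
u_add[0] = u_mul[0] = u_local[0]
for i in range(1, 4):
    u_add[i] = u_local[i] + w * u_add[i - 1]  # accumulates along the chain
    u_mul[i] = u_local[i] * (w * u_mul[i - 1])  # collapses toward 0
```

With w < 1 the additive recursion converges geometrically (here toward u_local / (1 - w) = 0.4), loosely mirroring the linear scalability and guaranteed convergence the abstract describes, while the multiplicative variant vanishes within a few hops.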

NeurIPS Conference 2024 Conference Paper

MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi-Step

  • Takeshi Noda
  • Chao Chen
  • Weiqi Zhang
  • Xinhai Liu
  • Yu-Shen Liu
  • Zhizhong Han

Reconstructing a continuous surface from a raw 3D point cloud is a challenging task. The latest methods employ supervised learning or pretrained priors to learn a signed distance function (SDF). However, neural networks tend to smooth local details due to the lack of ground-truth signed distances or normals, which limits the performance of learning-based methods in reconstruction tasks. To resolve this issue, we propose a novel method, named MultiPull, to learn multi-scale implicit fields from raw point clouds and optimize accurate SDFs from coarse to fine. We achieve this by mapping 3D query points into a set of frequency features, which makes it possible to leverage multi-level features during optimization. Meanwhile, we introduce optimization constraints from the perspectives of spatial distance and normal consistency, which play a key role in point cloud reconstruction based on multi-scale optimization strategies. Our experiments on widely used object and scene benchmarks demonstrate that our method outperforms the state-of-the-art methods in surface reconstruction.

ICLR Conference 2024 Conference Paper

OWL: A Large Language Model for IT Operations

  • Hongcheng Guo
  • Jian Yang 0030
  • Jiaheng Liu
  • Liqun Yang
  • Linzheng Chai
  • Jiaqi Bai 0001
  • Junran Peng
  • Xiaorong Hu

With the rapid advancement of IT operations, efficiently managing and analyzing large data volumes for practical applications has become increasingly critical. Natural Language Processing (NLP) techniques have demonstrated remarkable capabilities in various tasks, including named entity recognition, machine translation, and dialogue systems. Recently, Large Language Models (LLMs) have achieved significant improvements across various domain-specific areas. However, there is a noticeable gap in the development of specialized LLMs tailored for IT operations. In this paper, we introduce OWL, a large language model trained on our constructed Owl-Instruct dataset covering a wide range of IT-related information. Specifically, to cope with the maximum input length, we propose the \textbf{H}omogeneous \textbf{M}arkov \textbf{C}ontext \textbf{E}xtension method (HMCE). A mixture-of-adapter strategy is leveraged to improve parameter-efficient tuning across different domains or tasks. Further, we evaluate the performance of OWL on our Owl-Bench and on open IT-related benchmarks. OWL demonstrates superior results on IT tasks, outperforming existing models by significant margins. Moreover, we hope that the findings of our work will provide more insights to revolutionize the techniques of IT operations with specialized LLMs.

NeurIPS Conference 2024 Conference Paper

Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

  • Kai Liu
  • Zhihang Fu
  • Sheng Jin
  • Chao Chen
  • Ze Chen
  • Rongxin Jiang
  • Fan Zhou
  • Yaowu Chen

Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to avoid unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two common challenges faced by different OOD detectors: misidentifying tail-class ID samples as OOD, while erroneously predicting OOD samples as ID head classes. To explain this phenomenon, we introduce a generalized statistical framework, termed ImOOD, to formulate the OOD detection problem on imbalanced data distributions. The theoretical analysis reveals that a class-aware bias term exists between balanced and imbalanced OOD detection, which contributes to the performance gap. Building upon this finding, we present a unified training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs. Our theoretically grounded method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks against several state-of-the-art OOD detection approaches. Code is available at https://github.com/alibaba/imood.

AAAI Conference 2024 Conference Paper

Sketch and Refine: Towards Fast and Accurate Lane Detection

  • Chao Chen
  • Jie Liu
  • Chang Zhou
  • Jie Tang
  • Gangshan Wu

Lane detection is to determine the precise location and shape of lanes on the road. Despite efforts made by current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or keypoint-based, suffer from depicting lanes effectively and efficiently. Proposal-based methods detect lanes by distinguishing and regressing a collection of proposals in a streamlined top-down way, yet lack sufficient flexibility in lane representation. Keypoint-based methods, on the other hand, construct lanes flexibly from local descriptors, which typically entail complicated post-processing. In this paper, we present a “Sketch-and-Refine” paradigm that utilizes the merits of both keypoint-based and proposal-based methods. The motivation is that local directions of lanes are semantically simple and clear. At the “Sketch” stage, local directions of keypoints can be easily estimated by fast convolutional layers. Then we can build a set of lane proposals accordingly with moderate accuracy. At the “Refine” stage, we further optimize these proposals via a novel Lane Segment Association Module (LSAM), which allows adaptive lane segment adjustment. Last but not least, we propose multi-level feature integration to enrich lane feature representations more efficiently. Based on the proposed “Sketch-and-Refine” paradigm, we propose a fast yet effective lane detector dubbed “SRLane”. Experiments show that our SRLane can run at a fast speed (i.e., 278 FPS) while yielding an F1 score of 78.9%. The source code is available at: https://github.com/passerer/SRLane.

NeurIPS Conference 2024 Conference Paper

Training for Stable Explanation for Free

  • Chao Chen
  • Chenghua Guo
  • Rufeng Chen
  • Guixiang Ma
  • Ming Zeng
  • Xiangwen Liao
  • Xi Zhang
  • Sihong Xie

To foster trust in machine learning models, explanations must be faithful and stable to yield consistent insights. Existing relevant works rely on the $\ell_p$ distance for stability assessment, which diverges from human perception. Besides, existing adversarial training (AT), with its intensive computation, may lead to an arms race. To address these challenges, we introduce a novel metric to assess the stability of top-$k$ salient features. We introduce R2ET, which trains for stable explanations via an efficient and effective regularizer, and we analyze R2ET through multi-objective optimization to prove the numerical and statistical stability of its explanations. Moreover, theoretical connections between R2ET and certified robustness justify R2ET's stability under all attacks. Extensive experiments across various data modalities and model architectures show that R2ET achieves superior stability against stealthy attacks and generalizes effectively across different explanation methods. The code can be found at https://github.com/ccha005/R2ET.

NeurIPS Conference 2023 Conference Paper

Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

  • Kai Liu
  • Zhihang Fu
  • Chao Chen
  • Sheng Jin
  • Ze Chen
  • Mingyuan Tao
  • Rongxin Jiang
  • Jieping Ye

The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have provided significant advances on both issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for the current classification task, while spurious contexts further identify spurious (similar but not truly in-category) OOD samples for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a certain category: first roughly classify a sample to the predicted category, then delicately identify whether it is truly an ID sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. Extensive experiments are conducted to demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses the rivals by a large margin with several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications.

AAAI Conference 2023 Short Paper

Deep Anomaly Detection and Search via Reinforcement Learning (Student Abstract)

  • Chao Chen
  • Dawei Wang
  • Feng Mao
  • Zongzhang Zhang
  • Yang Yu

Semi-supervised anomaly detection is a data mining task which aims at learning features from partially-labeled datasets. We propose Deep Anomaly Detection and Search (DADS) with reinforcement learning. During the training process, the agent searches for possible anomalies in the unlabeled dataset to enhance performance. Empirically, we compare DADS with several methods in the setting of leveraging known anomalies to detect both other known and unknown anomalies. Results show that DADS achieves good performance.

IJCAI Conference 2023 Conference Paper

Denial-of-Service or Fine-Grained Control: Towards Flexible Model Poisoning Attacks on Federated Learning

  • Hangtao Zhang
  • Zeming Yao
  • Leo Yu Zhang
  • Shengshan Hu
  • Chao Chen
  • Alan Liew
  • Zhetao Li

Federated learning (FL) is vulnerable to poisoning attacks, where adversaries corrupt the global aggregation results and cause denial-of-service (DoS). Unlike recent model poisoning attacks that optimize the amplitude of malicious perturbations along certain prescribed directions to cause DoS, we propose a flexible model poisoning attack (FMPA) that can achieve versatile attack goals. We consider a practical threat scenario where no extra knowledge about the FL system (e.g., aggregation rules or updates on benign devices) is available to adversaries. FMPA exploits the global historical information to construct an estimator that predicts the next round of the global model as a benign reference. It then fine-tunes the reference model to obtain the desired poisoned model with low accuracy and small perturbations. Besides the goal of causing DoS, FMPA can be naturally extended to launch a fine-grained controllable attack, making it possible to precisely reduce the global accuracy. Armed with precise control, malicious FL service providers can gain advantages over their competitors without getting noticed, hence opening a new attack surface in FL other than DoS. Even for the purpose of DoS, experiments show that FMPA significantly decreases the global accuracy, outperforming six state-of-the-art attacks.

AAAI Conference 2023 Conference Paper

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

  • Mingrui Wu
  • Jiaxin Gu
  • Yunhang Shen
  • Mingbao Lin
  • Chao Chen
  • Xiaoshuai Sun

Most existing Human-Object Interaction (HOI) Detection methods rely heavily on full annotations with predefined HOI categories, which is limited in diversity and costly to scale further. We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously. The fundamental challenges are to discover potential human-object pairs and identify novel HOI categories. To overcome the above challenges, we propose a novel End-to-end zero-shot HOI Detection (EoID) framework via vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to achieve interaction distinguishment for human-object pairs in an action-agnostic manner. Then we transfer the distribution of action probability from the pretrained vision-language teacher as well as the seen ground truth to the HOI model to attain zero-shot HOI classification. Extensive experiments on HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. Finally, our method outperforms the previous SOTA under various zero-shot settings. Moreover, our method is generalizable to large-scale object detection data to further scale up the action sets. The source code is available at: https://github.com/mrwu-mac/EoID.

AAAI Conference 2023 Conference Paper

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

  • Yulei Qin
  • Xingyu Chen
  • Chao Chen
  • Yunhang Shen
  • Bo Ren
  • Yun Gu
  • Jie Yang
  • Chunhua Shen

Recently, webly supervised learning (WSL) has been studied to leverage numerous and accessible data from the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain. However, only by tackling this performance gap can we fully exploit the practical value of web datasets. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which only needs a few labeled examples from reality and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the ``realistic" prototype. Then, the intra-class distance between web instances and ``realistic" prototypes is narrowed by contrastive learning. Finally, we measure image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and involved in removing distant out-of-distribution samples. In experiments, FoPro is trained on web datasets guided by a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at https://github.com/yuleiqin/fopro.

AAAI Conference 2023 Conference Paper

From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-resolution

  • Jie Liu
  • Chao Chen
  • Jie Tang
  • Gangshan Wu

Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performance in image SR. They divide images into fixed-size patches and apply self-attention on these patches to model long-range dependencies among pixels. However, this architectural design originated in high-level vision tasks and lacks design guidelines informed by SR knowledge. In this paper, we aim to design a new attention block whose insights come from the interpretation of Local Attribution Maps (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map where the most important pixels are located in a fine area of a patch and some less important pixels are spread in a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in an image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch, and then a spatial convolution is applied to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance the perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.

NeurIPS Conference 2023 Conference Paper

Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection

  • Chao Chen
  • Zhihang Fu
  • Kai Liu
  • Ze Chen
  • Mingyuan Tao
  • Jieping Ye

For a machine learning model deployed in real-world scenarios, the ability to detect out-of-distribution (OOD) samples is indispensable and challenging. Most existing OOD detection methods focus on exploring advanced training skills or training-free tricks to prevent the model from yielding overconfident confidence scores for unknown samples. Training-based methods incur expensive training costs and rely on OOD samples which are not always available, while most training-free methods cannot efficiently utilize the prior information from the training data. In this work, we propose an \textbf{O}ptimal \textbf{P}arameter and \textbf{N}euron \textbf{P}runing (\textbf{OPNP}) approach, which aims to identify and remove the parameters and neurons that lead to over-fitting. The method is divided into two steps. In the first step, we evaluate the sensitivity of the model parameters and neurons by averaging gradients over all training samples. In the second step, the parameters and neurons with exceptionally large or close-to-zero sensitivities are removed for prediction. Our proposal is training-free, compatible with other post-hoc methods, and exploits the information from all training data. Extensive experiments are performed on multiple OOD detection tasks and model architectures, showing that our proposed OPNP consistently outperforms the existing methods by a large margin.
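The two-step recipe in this abstract (average gradients over training samples, then remove parameters with extreme sensitivities) can be sketched generically. The toy linear model, the squared loss, and the 10%/90% pruning thresholds below are all invented for illustration and are not the paper's actual choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "model": predictions = X @ W, with synthetic data.
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200).astype(float)
W = rng.normal(size=(8,))

# Step 1: sensitivity of each weight = |gradient of a squared loss|,
# averaged over all training samples (a generic stand-in).
pred = X @ W
grad_per_sample = (pred - y)[:, None] * X            # shape (200, 8)
sensitivity = np.abs(grad_per_sample.mean(axis=0))   # averaged over samples

# Step 2: zero out weights whose sensitivity is near zero or
# exceptionally large (hypothetical quantile thresholds).
low, high = np.quantile(sensitivity, [0.1, 0.9])
mask = (sensitivity > low) & (sensitivity < high)
W_pruned = np.where(mask, W, 0.0)
```

Because the sensitivities come from training data already seen, this style of pruning is post-hoc and training-free, in line with the abstract's claims.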

NeurIPS Conference 2023 Conference Paper

Topology-Aware Uncertainty for Image Segmentation

  • Saumya Gupta
  • Yikai Zhang
  • Xiaoling Hu
  • Prateek Prasanna
  • Chao Chen

Segmentation of curvilinear structures such as vasculature and road networks is challenging due to relatively weak signals and complex geometry/topology. To facilitate and accelerate large-scale annotation, one has to adopt semi-automatic approaches such as proofreading by experts. In this work, we focus on uncertainty estimation for such tasks, so that highly uncertain, and thus error-prone, structures can be identified for human annotators to verify. Unlike most existing works, which provide pixel-wise uncertainty maps, we stipulate it is crucial to estimate uncertainty in the units of topological structures, e.g., small pieces of connections and branches. To achieve this, we leverage tools from topological data analysis, specifically discrete Morse theory (DMT), to first capture the structures, and then reason about their uncertainties. To model the uncertainty, we (1) propose a joint prediction model that estimates the uncertainty of a structure while taking the neighboring structures into consideration (inter-structural uncertainty); (2) propose a novel Probabilistic DMT to model the inherent uncertainty within each structure (intra-structural uncertainty) by sampling its representations via a perturb-and-walk scheme. On various 2D and 3D datasets, our method produces better structure-wise uncertainty maps compared to existing works. Code available at: https://github.com/Saumya-Gupta-26/struct-uncertainty

JBHI Journal 2022 Journal Article

Attention-Guided Discriminative Region Localization and Label Distribution Learning for Bone Age Assessment

  • Chao Chen
  • Zhihong Chen
  • Xinyu Jin
  • Lanjuan Li
  • William Speier
  • Corey W. Arnold

Bone age assessment (BAA) is clinically important as it can be used to diagnose endocrine and metabolic disorders during child development. Existing deep learning based methods for classifying bone age use the global image as input, or exploit local information by annotating extra bounding boxes or key points. However, training with the global image underutilizes discriminative local information, while providing extra annotations is expensive and subjective. In this paper, we propose an attention-guided approach to automatically localize the discriminative regions for BAA without any extra annotations. Specifically, we first train a classification model to learn the attention maps of the discriminative regions, finding the hand region, the most discriminative region (the carpal bones), and the next most discriminative region (the metacarpal bones). Guided by those attention maps, we then crop the informative local regions from the original image and aggregate different regions for BAA. Instead of taking BAA as a general regression task, which is suboptimal due to the label ambiguity problem in the age label space, we propose using joint age distribution learning and expectation regression, which makes use of the ordinal relationship among hand images with different individual ages and leads to more robust age estimation. Extensive experiments are conducted on the RSNA pediatric bone age data set. Without using extra manual annotations, our method achieves competitive results compared with existing state-of-the-art deep learning-based methods that require manual annotation. Code is available at https://github.com/chenchao666/Bone-Age-Assessment.

NeurIPS Conference 2022 Conference Paper

Neural Approximation of Graph Topological Features

  • Zuoyu Yan
  • Tengfei Ma
  • Liangcai Gao
  • Zhi Tang
  • Yusu Wang
  • Chao Chen

Topological features based on persistent homology capture high-order structural information so as to augment graph neural network methods. However, computing extended persistent homology summaries remains slow for large and dense graphs and can be a serious bottleneck for the learning pipeline. Inspired by recent success in neural algorithmic reasoning, we propose a novel graph neural network to estimate extended persistence diagrams (EPDs) on graphs efficiently. Our model is built on algorithmic insights, and benefits from better supervision and closer alignment with the EPD computation algorithm. We validate our method with convincing empirical results on approximating EPDs and downstream graph representation learning tasks. Our method is also efficient; on large and dense graphs, we accelerate the computation by nearly 100 times.

AAAI Conference 2022 Conference Paper

Resistance Training Using Prior Bias: Toward Unbiased Scene Graph Generation

  • Chao Chen
  • Yibing Zhan
  • Baosheng Yu
  • Liu Liu
  • Yong Luo
  • Bo Du

Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for scene graph generation. Specifically, RTPB uses a distribution-based prior bias to improve the model's ability to detect less frequent relationships during training, thus improving the model generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for unbiased scene graph generation. Specifically, our RTPB achieves an improvement of over 10% in mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods by a large margin. Code is available at https://github.com/ChCh1999/RTPB

JBHI Journal 2021 Journal Article

Attention-RefNet: Interactive Attention Refinement Network for Infected Area Segmentation of COVID-19

  • Titinunt Kitrungrotsakul
  • Qingqing Chen
  • Huitao Wu
  • Yutaro Iwamoto
  • Hongjie Hu
  • Wenchao Zhu
  • Chao Chen
  • Fangyi Xu

COVID-19 pneumonia is a disease that causes an existential health crisis in many people by directly affecting and damaging lung cells. The segmentation of infected areas from computed tomography (CT) images can be used to assist and provide useful information for COVID-19 diagnosis. Although several deep learning-based segmentation methods have been proposed for COVID-19 segmentation and have achieved state-of-the-art results, the segmentation accuracy is still not high enough (approximately 85%) due to the variations of COVID-19 infected areas (such as shape and size variations) and the similarities between COVID-19 and non-COVID-infected areas. To improve the segmentation accuracy of COVID-19 infected areas, we propose an interactive attention refinement network (Attention-RefNet). The interactive attention refinement network can be connected with any segmentation network and trained with the segmentation network in an end-to-end fashion. We propose a skip connection attention module to improve the important features in both segmentation and refinement networks and a seed point module to enhance the important seeds (positions) for interactive refinement. The effectiveness of the proposed method was demonstrated on public datasets (COVID-19CTSeg and MICCAI) and our private multicenter dataset. The segmentation accuracy was improved to more than 90%. We also confirmed the generalizability of the proposed network on our multicenter dataset. The proposed method can still achieve high segmentation accuracy.

AAAI Conference 2021 Conference Paper

Localization in the Crowd with Topological Constraints

  • Shahira Abousamra
  • Minh Hoai
  • Dimitris Samaras
  • Chao Chen

We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within the same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topological constraint that teaches the model to reason about the spatial arrangement of dots. To enforce this constraint, we define a persistence loss based on the theory of persistent homology. The loss compares the topographic landscape of the likelihood map and the topology of the ground truth. Topological reasoning improves the quality of the localization algorithm, especially near cluttered regions. On multiple public benchmarks, our method outperforms previous localization methods. Additionally, we demonstrate the potential of our method in improving the performance in the crowd counting task.

JBHI Journal 2021 Journal Article

Non-Invasive Heart Rate Estimation From Ballistocardiograms Using Bidirectional LSTM Regression

  • Changzhe Jiao
  • Chao Chen
  • Shuiping Gou
  • Dong Hai
  • Bo-Yu Su
  • Marjorie Skubic
  • Licheng Jiao
  • Alina Zare

Non-invasive heart rate estimation is of great importance in daily monitoring of cardiovascular diseases. In this paper, a bidirectional long short-term memory (bi-LSTM) regression network is developed for non-invasive heart rate estimation from ballistocardiogram (BCG) signals. The proposed deep regression model provides an effective solution to the existing challenges in BCG heart rate estimation, such as the mismatch between the BCG signals and ground-truth reference, multi-sensor fusion, and effective time series feature learning. Allowing label uncertainty in the estimation can reduce the manual cost of data annotation while further improving the heart rate estimation performance. Compared with the state-of-the-art BCG heart rate estimation methods, the strong fitting and generalization ability of the proposed deep regression model maintains better robustness to noise (e.g., sensor noise) and perturbations (e.g., body movements) in the BCG signals and provides a more reliable solution for long-term heart rate monitoring.

AAAI Conference 2021 Conference Paper

Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning

  • Chao Chen
  • Dongsheng Li
  • Junchi Yan
  • Hanchi Huang
  • Xiaokang Yang

One-bit matrix completion is an important class of positive-unlabeled (PU) learning problems where the observations consist of only positive examples, e.g., in top-N recommender systems. For the first time, we show that 1-bit matrix completion can be formulated as the problem of recovering clean graph signals from noise-corrupted signals in hypergraphs. This makes it possible to enjoy recent advances in graph signal learning. Then, we propose the spectral graph matrix completion (SGMC) method, which can recover the underlying matrix in distributed systems by filtering the noisy data in the graph frequency domain. Meanwhile, it can provide micro- and macro-level explanations by following vertex-frequency analysis. To tackle the computational and memory issues of performing graph signal operations on large graphs, we construct a scalable Nyström algorithm which can efficiently compute orthonormal eigenvectors. Furthermore, we also develop polynomial and sparse frequency filters to remedy the accuracy loss caused by the approximations. We demonstrate the effectiveness of our algorithms on top-N recommendation tasks, and the results on three large-scale real-world datasets show that SGMC can outperform state-of-the-art top-N recommendation algorithms in accuracy while requiring only a small fraction of the training time of the baselines.
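
The graph-frequency filtering idea at the heart of this abstract can be sketched in a few NumPy lines (an illustrative toy only, not the paper's SGMC: the dense eigendecomposition below would not scale to the large graphs SGMC targets, and the function name is hypothetical):

```python
import numpy as np

def lowpass_graph_filter(A, x, k):
    """Keep only the k lowest graph frequencies of signal x on a
    graph with symmetric adjacency matrix A (toy spectral filtering)."""
    L = np.diag(A.sum(axis=1)) - A   # combinatorial graph Laplacian
    w, U = np.linalg.eigh(L)         # eigenvectors = graph Fourier basis
    xh = U.T @ x                     # graph Fourier transform of x
    xh[k:] = 0.0                     # discard high-frequency (noisy) components
    return U @ xh                    # inverse transform back to vertex domain
```

On a path graph, filtering a constant signal with k = 1 returns it unchanged, since the constant vector is the zero-frequency eigenvector of the Laplacian.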

IJCAI Conference 2021 Conference Paper

Structure Guided Lane Detection

  • Jinming Su
  • Chao Chen
  • Ke Zhang
  • Junfeng Luo
  • Xiaoming Wei
  • Xiaolin Wei

Recently, lane detection has made great progress with the rapid development of deep neural networks and autonomous driving. However, there exist three main problems: characterizing lanes, modeling the structural relationship between scenes and lanes, and supporting more attributes (e.g., instance and type) of lanes. In this paper, we propose a novel structure-guided framework to solve these problems simultaneously. In the framework, we first introduce a new lane representation to characterize each instance. Then a top-down vanishing point guided anchoring mechanism is proposed to produce intensive anchors, which efficiently capture various lanes. Next, multi-level structural constraints are used to improve the perception of lanes. In the process, pixel-level perception with binary segmentation is introduced to promote features around anchors and restore lane details from the bottom up, a lane-level relation is put forward to model structures (i.e., parallelism) around lanes, and an image-level attention is used to adaptively attend to different regions of the image from the perspective of scenes. With the help of structural guidance, anchors are effectively classified and regressed to obtain precise locations and shapes. Extensive experiments on public benchmark datasets show that the proposed approach outperforms state-of-the-art methods, running at 117 FPS on a single GPU.

NeurIPS Conference 2021 Conference Paper

Topological Detection of Trojaned Neural Networks

  • Songzhu Zheng
  • Yikai Zhang
  • Hubert Wagner
  • Mayank Goswami
  • Chao Chen

Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model's behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle -- yet critical -- structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

JBHI Journal 2021 Journal Article

Using BI-RADS Stratifications as Auxiliary Information for Breast Masses Classification in Ultrasound Images

  • Jie Xing
  • Chao Chen
  • Qinyang Lu
  • Xun Cai
  • Aijun Yu
  • Yi Xu
  • Xiaoling Xia
  • Yue Sun

Breast Ultrasound (BUS) imaging has been recognized as an essential imaging modality for breast mass classification in China. Current deep learning (DL) based solutions for BUS classification seek to feed ultrasound (US) images into deep convolutional neural networks (CNNs) to learn a hierarchical combination of features for discriminating malignant and benign masses. One existing problem in current DL-based BUS classification is the lack of spatial and channel-wise feature weighting, which inevitably allows interference from redundant features and leads to low sensitivity. In this study, we aim to incorporate the instructive information provided by the Breast Imaging Reporting and Data System (BI-RADS) within DL-based classification. A novel DL-based BI-RADS Vector-Attention Network (BVA Net) that trains with both texture information and decoded information from BI-RADS stratifications was proposed for the task. Three baseline models, pre-trained DenseNet-121, ResNet-50, and Residual-Attention Network (RA Net), were included for comparison. Experiments were conducted on a large-scale private main dataset and two public datasets, UDIAT and BUSI. On the main dataset, BVA Net outperformed the other models in terms of AUC (area under the receiver operating curve, 0.908), ACC (accuracy, 0.865), sensitivity (0.812), and precision (0.795). BVA Net also achieved high AUC (0.87 and 0.882) and ACC (0.859 and 0.843) on UDIAT and BUSI. Moreover, we proposed a method that integrates both BVA Net binary classification and BI-RADS stratification estimation, called integrated classification. The introduction of integrated classification helped improve the overall sensitivity while maintaining a high specificity.

NeurIPS Conference 2020 Conference Paper

A Topological Filter for Learning with Label Noise

  • Pengxiang Wu
  • Songzhu Zheng
  • Mayank Goswami
  • Dimitris Metaxas
  • Chao Chen

Noisy labels can impair the performance of deep neural networks. To tackle this problem, in this paper, we propose a new method for filtering label noise. Unlike most existing methods relying on the posterior probability of a noisy classifier, we focus on the much richer spatial behavior of data in the latent representational space. By leveraging the high-order topological information of data, we are able to collect most of the clean data and train a high-quality model. Theoretically, we prove that this topological approach is guaranteed to collect the clean data with high probability. Empirical results show that our method outperforms the state of the art and is robust to a broad spectrum of noise types and levels.

NeurIPS Conference 2020 Conference Paper

Deep Variational Instance Segmentation

  • Jialin Yuan
  • Chao Chen
  • Fuxin Li

Instance segmentation, which seeks to obtain both class and instance labels for each pixel in the input image, is a challenging task in computer vision. State-of-the-art algorithms often employ a search-based strategy, which first divides the output image with a regular grid and generates proposals at each grid cell; the proposals are then classified and their boundaries refined. In this paper, we propose a novel algorithm that directly utilizes a fully convolutional network (FCN) to predict instance labels. Specifically, we propose a variational relaxation of instance segmentation as minimizing an optimization functional for a piecewise-constant segmentation problem, which can be used to train an FCN end-to-end. It extends the classical Mumford-Shah variational segmentation algorithm to handle the permutation-invariant ground truth in instance segmentation. Experiments on PASCAL VOC 2012 and the MSCOCO 2017 dataset show that the proposed approach efficiently tackles the instance segmentation task.

ICML Conference 2020 Conference Paper

DRWR: A Differentiable Renderer without Rendering for Unsupervised 3D Structure Learning from Silhouette Images

  • Zhizhong Han
  • Chao Chen
  • Yu-Shen Liu
  • Matthias Zwicker

Differentiable renderers have been used successfully for unsupervised 3D structure learning from 2D images because they can bridge the gap between 3D and 2D. To optimize 3D shape parameters, current renderers rely on pixel-wise losses between rendered images of 3D reconstructions and ground truth images from corresponding viewpoints. Hence they require interpolation of the recovered 3D structure at each pixel, visibility handling, and optionally evaluating a shading model. In contrast, here we propose a Differentiable Renderer Without Rendering (DRWR) that omits these steps. DRWR only relies on a simple but effective loss that evaluates how well the projections of reconstructed 3D point clouds cover the ground truth object silhouette. Specifically, DRWR employs a smooth silhouette loss to pull the projection of each individual 3D point inside the object silhouette, and a structure-aware repulsion loss to push each pair of projections that fall inside the silhouette far away from each other. Although we omit surface interpolation, visibility handling, and shading, our results demonstrate that DRWR achieves state-of-the-art accuracies under widely used benchmarks, outperforming previous methods both qualitatively and quantitatively. In addition, our training times are significantly lower due to the simplicity of DRWR.

AAAI Conference 2020 Conference Paper

HoMM: Higher-Order Moment Matching for Unsupervised Domain Adaptation

  • Chao Chen
  • Zhihang Fu
  • Zhihong Chen
  • Sheng Jin
  • Zhaowei Cheng
  • Xinyu Jin
  • Xian-Sheng Hua

Minimizing the discrepancy of feature distributions between different domains is one of the most promising directions in unsupervised domain adaptation. From the perspective of moment matching, most existing discrepancy-based methods are designed to match the second-order or lower moments, which, however, have limited expressive power for the statistical characteristics of non-Gaussian distributions. In this work, we propose a Higher-order Moment Matching (HoMM) method, and further extend HoMM into reproducing kernel Hilbert spaces (RKHS). In particular, our proposed HoMM can perform arbitrary-order moment matching; we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL). Moreover, HoMM (order ≥ 3) is expected to perform fine-grained domain alignment, as higher-order statistics can approximate more complex, non-Gaussian distributions. Besides, we also exploit the pseudo-labeled target samples to learn discriminative representations in the target domain, which further improves the transfer performance. Extensive experiments are conducted, showing that our proposed HoMM consistently outperforms the existing moment matching methods by a large margin. Codes are available at https://github.com/chenchao666/HoMM-Master
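
The arbitrary-order moment matching described here can be illustrated with a toy discrepancy that compares only the elementwise (diagonal) order-p moments of centered features (a simplified sketch under that assumption, not the authors' HoMM, which matches full moment tensors and has an RKHS extension; the function name is hypothetical):

```python
import numpy as np

def diag_moment_discrepancy(Xs, Xt, order=3):
    """Squared distance between the order-p elementwise moments of
    centered source and target feature matrices (rows = samples)."""
    Xs = Xs - Xs.mean(axis=0)        # center source features
    Xt = Xt - Xt.mean(axis=0)        # center target features
    ms = (Xs ** order).mean(axis=0)  # per-dimension order-p moment
    mt = (Xt ** order).mean(axis=0)
    return float(np.sum((ms - mt) ** 2))
```

The discrepancy is zero when source and target coincide and grows as their distributions diverge; in a training loop it would be added to the task loss as an alignment penalty.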

AAAI Conference 2020 Conference Paper

Local Regularizer Improves Generalization

  • Yikai Zhang
  • Hui Qu
  • Dimitris Metaxas
  • Chao Chen

Regularization plays an important role in the generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on a training method called Locally Regularized Stochastic Gradient Descent (LRSGD). LRSGD leverages a proximal-type penalty in gradient descent steps to regularize SGD in training. We show that by carefully choosing relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.

AAAI Conference 2020 Conference Paper

SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation

  • Sheng Jin
  • Shangchen Zhou
  • Yao Liu
  • Chao Chen
  • Xiaoshuai Sun
  • Hongxun Yao
  • Xian-Sheng Hua

Deep hashing methods have been proved to be effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practical cases. Current solutions to this issue utilize Generative Adversarial Networks (GANs) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generation and hashing learning as two isolated processes, leading to generation ineffectiveness. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generated images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistent loss. The experimental results show that our method can significantly improve state-of-the-art models on both widely used hashing datasets and fine-grained datasets.

IJCAI Conference 2019 Conference Paper

Heuristic Search for Homology Localization Problem and Its Application in Cardiac Trabeculae Reconstruction

  • Xudong Zhang
  • Pengxiang Wu
  • Changhe Yuan
  • Yusu Wang
  • Dimitris Metaxas
  • Chao Chen

Cardiac trabeculae are fine rod-like muscles whose ends are attached to the inner walls of ventricles. Accurate extraction of trabeculae is important yet challenging, due to the background noise and limited resolution of cardiac images. Existing works proposed to handle this task by modeling the trabeculae as topological handles for better extraction. Computing optimal representation of these handles is essential yet very expensive. In this work, we formulate the problem as a heuristic search problem, and propose novel heuristic functions based on advanced topological techniques. We show in experiments that the proposed heuristic functions improve the computation in both time and memory.

AAAI Conference 2019 Conference Paper

Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation

  • Chao Chen
  • Zhihong Chen
  • Boyuan Jiang
  • Xinyu Jin

Recently, considerable effort has been devoted to deep domain adaptation in the computer vision and machine learning communities. However, most existing work concentrates only on learning a shared feature representation by minimizing the distribution discrepancy across different domains. Because all domain alignment approaches can only reduce, but not remove, the domain shift, target domain samples distributed near the edge of the clusters, or far from their corresponding class centers, are easily misclassified by the hyperplane learned from the source domain. To alleviate this issue, we propose to jointly perform domain alignment and discriminative feature learning, which could benefit both domain alignment and final classification. Specifically, an instance-based discriminative feature learning method and a center-based discriminative feature learning method are proposed, both of which guarantee domain-invariant features with better intra-class compactness and inter-class separability. Extensive experiments show that learning the discriminative features in the shared feature space can significantly boost the performance of deep domain adaptation methods.

AAAI Conference 2019 Conference Paper

Point Cloud Processing via Recurrent Set Encoding

  • Pengxiang Wu
  • Chao Chen
  • Jingru Yi
  • Dimitris Metaxas

We present a new permutation-invariant network for 3D point cloud processing. Our network is composed of a recurrent set encoder and a convolutional feature aggregator. Given an unordered point set, the encoder firstly partitions its ambient space into parallel beams. Points within each beam are then modeled as a sequence and encoded into subregional geometric features by a shared recurrent neural network (RNN). The spatial layout of the beams is regular, and this allows the beam features to be further fed into an efficient 2D convolutional neural network (CNN) for hierarchical feature aggregation. Our network is effective at spatial feature learning, and competes favorably with the state-of-the-arts (SOTAs) on a number of benchmarks. Meanwhile, it is significantly more efficient compared to the SOTAs.

IJCAI Conference 2019 Conference Paper

Taming the Noisy Gradient: Train Deep Neural Networks with Small Batch Sizes

  • Yikai Zhang
  • Hui Qu
  • Chao Chen
  • Dimitris Metaxas

Deep learning architectures are usually proposed with millions of parameters, resulting in a memory issue when training deep neural networks with stochastic gradient descent type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches/noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. Such a regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to vanilla SGD even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
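
The proximal regularization described above amounts to taking gradient steps on a surrogate f(w) + (λ/2)‖w − w_anchor‖², whose extra term makes the surrogate strongly convex and damps gradient noise. A minimal sketch of one such step (an illustration only, not the paper's algorithm; all names are hypothetical):

```python
import numpy as np

def proximal_step(w, w_anchor, grad_f, lr=0.1, lam=1.0):
    """One gradient step on the surrogate f(w) + (lam/2)||w - w_anchor||^2.
    grad_f is the (possibly noisy) gradient of f at w; the proximal term
    pulls the iterate toward the anchor, stabilizing the update."""
    return w - lr * (grad_f + lam * (w - w_anchor))
```

For example, on f(w) = w²/2 with the anchor fixed at 0, each step contracts w by a factor 1 − lr(1 + lam), so the iterates converge to the minimizer.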

NeurIPS Conference 2019 Conference Paper

Topology-Preserving Deep Image Segmentation

  • Xiaoling Hu
  • Fuxin Li
  • Dimitris Samaras
  • Chao Chen

Segmentation algorithms are prone to make topological errors on fine-scale structures, e.g., broken connections. We propose a novel method that learns to segment with correct topology. In particular, we design a continuous-valued loss function that enforces a segmentation to have the same topology as the ground truth, i.e., having the same Betti number. The proposed topology-preserving loss function is differentiable and can be incorporated into end-to-end training of a deep neural network. Our method achieves much better performance on the Betti number error, which directly accounts for topological correctness. It also performs better on other topology-relevant metrics, e.g., the Adjusted Rand Index and the Variation of Information, without sacrificing per-pixel accuracy. We illustrate the effectiveness of the proposed method on a broad spectrum of natural and biomedical datasets.
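
The Betti number error mentioned here compares topological invariants of the prediction and the ground truth. For 2D binary masks, the two Betti numbers can be approximated with plain connected-component counts (a common discrete shortcut for evaluation, not the differentiable persistent-homology loss the paper proposes; the function name is hypothetical):

```python
import numpy as np
from scipy import ndimage

def betti_numbers_2d(mask):
    """Approximate Betti numbers of a 2D binary mask:
    b0 = number of connected foreground components;
    b1 = number of holes, counted as connected components of the
    padded background minus the single outer background region."""
    mask = np.asarray(mask, dtype=bool)
    _, b0 = ndimage.label(mask)                       # foreground components
    padded = np.pad(~mask, 1, constant_values=True)   # surround with background
    _, n_bg = ndimage.label(padded)                   # background components
    return b0, n_bg - 1                               # drop the outer background
```

A Betti number error between a predicted and a ground-truth mask would then be |b0_pred − b0_gt| + |b1_pred − b1_gt|.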

AAAI Conference 2017 Conference Paper

ERMMA: Expected Risk Minimization for Matrix Approximation-based Recommender Systems

  • Dongsheng Li
  • Chao Chen
  • Qin Lv
  • Li Shang
  • Stephen Chu
  • Hongyuan Zha

Matrix approximation (MA) is one of the most popular techniques in today's recommender systems. In most MA-based recommender systems, the problem of risk minimization should be defined, and how to achieve minimum expected risk in model learning is one of the most critical problems for recommendation accuracy. This paper addresses the expected risk minimization problem, in which the expected risk can be bounded by the sum of optimization error and generalization error. Based on uniform stability theory, we propose an expected risk minimized matrix approximation method (ERMMA), which is designed to achieve a better tradeoff between optimization error and generalization error in order to reduce the expected risk of the learned MA models. Theoretical analysis shows that ERMMA can achieve a lower expected risk bound than existing MA methods. Experimental results on the MovieLens and Netflix datasets demonstrate that ERMMA outperforms six state-of-the-art MA-based recommendation methods in both the rating prediction problem and the item ranking problem.

AAAI Conference 2017 Conference Paper

ESPACE: Accelerating Convolutional Neural Networks via Eliminating Spatial and Channel Redundancy

  • Shaohui Lin
  • Rongrong Ji
  • Chao Chen
  • Feiyue Huang

Recent years have witnessed the extensive popularity of convolutional neural networks (CNNs) in various computer vision and artificial intelligence applications. However, the performance gains have come at the cost of substantially intensive computation complexity, which prohibits their usage in resource-limited applications like mobile or embedded devices. While increasing attention has been paid to the acceleration of the internal network structure, the redundancy of the visual input is rarely considered. In this paper, we make the first attempt at reducing spatial and channel redundancy directly from the visual input for CNN acceleration. The proposed method, termed ESPACE (Elimination of SPAtial and Channel rEdundancy), works in the following three steps: First, the 3D channel redundancy of convolutional layers is reduced by a set of low-rank approximations of convolutional filters. Second, a novel mask-based selective processing scheme is proposed, which further speeds up the convolution operations by skipping unsalient spatial locations of the visual input. Third, the accelerated network is fine-tuned using the training data via back-propagation. The proposed method is evaluated on ImageNet 2012 with implementations on two widely adopted CNNs, i.e., AlexNet and GoogLeNet. In comparison to several recent methods of CNN acceleration, the proposed scheme demonstrates new state-of-the-art acceleration performance, with speedups of 5.48× and 4.12× on AlexNet and GoogLeNet, respectively, and a minimal decrease in classification accuracy.

AAAI Conference 2017 Conference Paper

GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering

  • Chao Chen
  • Dongsheng Li
  • Qin Lv
  • Junchi Yan
  • Li Shang
  • Stephen Chu

Recommender systems have achieved great success in recent years, and matrix approximation (MA) is one of the most popular techniques for collaborative filtering (CF) based recommendation. However, a major issue is that MA methods perform poorly at detecting strong localized associations among closely related users and items. Recently, some MA-based CF methods adopt clustering methods to discover meaningful user-item subgroups and perform ensembling over different clusterings to improve recommendation accuracy. However, ensemble learning suffers from lower efficiency due to the increased overall computation overhead. In this paper, we propose GLOMA, a new clustering-based matrix approximation method, which can embed global information in local matrix approximation models to improve recommendation accuracy. In GLOMA, an MA model is first trained on the entire data to capture global information. The global MA model is then utilized to guide the training of cluster-based local MA models, such that the local models can detect strong localized associations shared within clusters and at the same time preserve global associations shared among all users/items. Evaluation results using the MovieLens and Netflix datasets demonstrate that, by integrating global information in local models, GLOMA can outperform five state-of-the-art MA-based CF methods in recommendation accuracy while achieving decent efficiency.

NeurIPS Conference 2017 Conference Paper

Mixture-Rank Matrix Approximation for Collaborative Filtering

  • Dongsheng Li
  • Chao Chen
  • Wei Liu
  • Tun Lu
  • Ning Gu
  • Stephen Chu

Low-rank matrix approximation (LRMA) methods have achieved excellent accuracy among today's collaborative filtering (CF) methods. In existing LRMA methods, the rank of user/item feature matrices is typically fixed, i.e., the same rank is adopted to describe all users/items. However, our studies show that submatrices with different ranks can coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, leading to inferior recommendation accuracy. In this paper, a mixture-rank matrix approximation (MRMA) method is proposed, in which user-item ratings can be characterized by a mixture of LRMA models with different ranks. Meanwhile, a learning algorithm capitalizing on iterated conditional modes is proposed to tackle the non-convex optimization problem pertaining to MRMA. Experimental studies on the MovieLens and Netflix datasets demonstrate that MRMA can outperform six state-of-the-art LRMA-based CF methods in terms of recommendation accuracy.

TIST Journal 2017 Journal Article

SPACE-TA

  • Leye Wang
  • Daqing Zhang
  • Dingqi Yang
  • Animesh Pathak
  • Chao Chen
  • Xiao Han
  • Haoyi Xiong
  • Yasha Wang

Data quality and budget are two primary concerns in urban-scale mobile crowdsensing. Traditional research on mobile crowdsensing mainly takes sensing coverage ratio as the data quality metric rather than the overall sensed data error in the target-sensing area. In this article, we propose to leverage spatiotemporal correlations among the sensed data in the target-sensing area to significantly reduce the number of sensing task assignments. In particular, we exploit both intradata correlations within the same type of sensed data and interdata correlations among different types of sensed data in the sensing task. We propose a novel crowdsensing task allocation framework called SPACE-TA (SPArse Cost-Effective Task Allocation), combining compressive sensing, statistical analysis, active learning, and transfer learning, to dynamically select a small set of subareas for sensing in each timeslot (cycle), while inferring the data of unsensed subareas under a probabilistic data quality guarantee. Evaluations on real-life temperature, humidity, air quality, and traffic monitoring datasets verify the effectiveness of SPACE-TA. In the temperature-monitoring task leveraging intradata correlations, SPACE-TA requires data from only 15.5% of the subareas while keeping the inference error below 0.25°C in 95% of the cycles, reducing the number of sensed subareas by 18.0% to 26.5% compared to baselines. When multiple tasks run simultaneously, for example, for temperature and humidity monitoring, SPACE-TA can further reduce ∼10% of the sensed subareas by exploiting interdata correlations.
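The core inference step — sense a few subarea/timeslot cells and infer the rest from spatiotemporal low-rank structure — can be sketched with a simple iterative truncated-SVD imputation. This is a generic compressive-sensing-style completion stand-in, not SPACE-TA's actual framework; the toy rank-1 temperature field and the mask are fabricated for illustration.

```python
import numpy as np

def infer_unsensed(M, mask, rank=1, iters=200):
    """Fill unsensed entries of a subarea-by-timeslot matrix by iterative
    truncated-SVD imputation; mask is True where we actually sensed."""
    X = np.where(mask, M, np.mean(M[mask]))      # start from the sensed mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M, low)               # keep sensed values, update rest
    return X

# Toy temperature field with rank-1 spatiotemporal structure.
spatial = np.array([1.0, 1.2, 0.9, 1.1])       # per-subarea scale
temporal = np.array([20.0, 21.0, 19.5, 22.0])  # per-timeslot baseline
M = np.outer(spatial, temporal)

# Sense only about half of the cells, plus one full row and column so the
# rank-1 completion is well determined.
rng = np.random.default_rng(1)
mask = rng.random(M.shape) < 0.5
mask[0, :] = True
mask[:, 0] = True

X = infer_unsensed(M, mask)
err = np.abs(X - M).max()
```

In the real system the number of sensed cells is chosen adaptively under a probabilistic quality guarantee; here the sensing pattern is fixed just to show the inference step.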

IJCAI Conference 2016 Conference Paper

Solving M-Modes Using Heuristic Search

  • Cong Chen
  • Changhe Yuan
  • Chao Chen

M-Modes for graphical models is the problem of finding the top M label configurations of highest probability in their local neighborhoods. The state-of-the-art method for solving M-Modes is a dynamic programming algorithm which computes global modes by first computing local modes of each subgraph and then searching through all their consistent combinations. A drawback of the algorithm is that most of its time is wasted on computing local modes that are never used in global modes. This paper introduces new algorithms that directly search the space of consistent local modes in finding the global modes, enabled by a novel search operator designed to search a subgraph of variables at a time. As a result, the search algorithms only need to generate and verify a small number of local modes and can hence lead to significant improvements in efficiency and scalability.
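What the algorithms compute can be made concrete with a brute-force baseline (the kind of enumeration the paper's search avoids): find the top-M configurations that are local maxima under single-variable flips in a tiny chain MRF. The potentials below are made up for illustration.

```python
import itertools
import numpy as np

# Tiny chain MRF: 3 binary variables with unary and pairwise potentials.
unary = np.array([[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]])
pair = np.array([[0.9, 0.1], [0.1, 0.9]])  # neighbors prefer agreement

def score(x):
    """Unnormalized probability of a full labeling x."""
    s = np.prod([unary[i, x[i]] for i in range(3)])
    return s * pair[x[0], x[1]] * pair[x[1], x[2]]

def is_local_mode(x):
    """x is a mode if no single-variable flip (Hamming-1 neighbor) scores higher."""
    for i in range(3):
        y = list(x)
        y[i] = 1 - y[i]
        if score(tuple(y)) > score(x):
            return False
    return True

configs = list(itertools.product([0, 1], repeat=3))
modes = sorted((x for x in configs if is_local_mode(x)), key=score, reverse=True)
top_m = modes[:2]  # M = 2
```

Brute force enumerates all 2^n configurations; the paper's contribution is to reach the same global modes while generating only a small number of consistent local modes.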

NeurIPS Conference 2014 Conference Paper

Mode Estimation for High Dimensional Discrete Tree Graphical Models

  • Chao Chen
  • Han Liu
  • Dimitris Metaxas
  • Tianqi Zhao

This paper studies the following problem: given samples from a high dimensional discrete distribution, we want to estimate the leading $(\delta, \rho)$-modes of the underlying distribution. A point is defined to be a $(\delta, \rho)$-mode if it is a local optimum of the density within a $\delta$-neighborhood under metric $\rho$. As we increase the "scale" parameter $\delta$, the neighborhood size increases and the total number of modes monotonically decreases. The sequence of $(\delta, \rho)$-modes reveals intrinsic topographical information about the underlying distribution. Though the mode finding problem is generally intractable in high dimensions, this paper unveils that, if the distribution can be approximated well by a tree graphical model, mode characterization is significantly easier. An efficient algorithm with provable theoretical guarantees is proposed and applied to tasks such as data analysis and multiple predictions.
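The $(\delta, \rho)$-mode definition and the monotone effect of the scale parameter can be checked directly on a toy distribution (brute force here, feasible only in low dimensions; the mass function is fabricated and $\rho$ is taken to be the Hamming metric):

```python
# Unnormalized probability mass over 3 binary variables (made-up values).
p = {(0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.02, (0, 1, 1): 0.08,
     (1, 0, 0): 0.04, (1, 0, 1): 0.06, (1, 1, 0): 0.10, (1, 1, 1): 0.35}

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def modes(delta):
    """Points that maximize p within their delta-Hamming ball (ties kept)."""
    return [x for x in p
            if all(p[x] >= p[y] for y in p if 0 < hamming(x, y) <= delta)]

m1 = modes(1)  # small neighborhoods: both local peaks survive
m3 = modes(3)  # delta spans the whole cube: only the global mode remains
```

Growing $\delta$ merges basins, so the mode set can only shrink — here from two modes at $\delta = 1$ down to the single global mode at $\delta = 3$.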

JBHI Journal 2014 Journal Article

WE-CARE: An Intelligent Mobile Telecardiology System to Enable mHealth Applications

  • Anpeng Huang
  • Chao Chen
  • Kaigui Bian
  • Xiaohui Duan
  • Min Chen
  • Hongqiao Gao
  • Chao Meng
  • Qian Zheng

Recently, cardiovascular disease (CVD) has become one of the leading causes of death worldwide, and it contributes to 41% of all deaths each year in China. This disease has incurred a cost of more than 400 billion US dollars in China in healthcare expenditures and lost productivity over the past ten years. It has been shown that CVD can be effectively prevented by an interdisciplinary approach that leverages technology development in both the IT and electrocardiogram (ECG) fields. In this paper, we present WE-CARE, an intelligent telecardiology system using mobile 7-lead ECG devices. Because of the improved mobility resulting from wearable and mobile ECG devices, the WE-CARE system has a wider variety of applications than existing resting ECG systems that reside in hospitals. Meanwhile, it meets the requirements of dynamic ECG systems for mobile users in terms of detection accuracy and latency. We carried out clinical trials by deploying the WE-CARE systems at Peking University Hospital. The clinical results clearly showed that our solution achieves a high detection rate of over 95% against common types of anomalies in ECG, while it only incurs a small detection latency of around one second, both of which meet the criteria of real-time medical diagnosis. As demonstrated by the clinical results, the WE-CARE system is a useful and efficient mHealth (mobile health) tool for cardiovascular disease diagnosis and treatment in medical platforms.

TIST Journal 2013 Journal Article

Web media semantic concept retrieval via tag removal and model fusion

  • Chao Chen
  • Qiusha Zhu
  • Lin Lin
  • Mei-Ling Shyu

Multimedia data on social websites contain rich semantics and are often accompanied by user-defined tags. To enhance Web media semantic concept retrieval, the fusion of tag-based and content-based models can be used, though it is very challenging. In this article, a novel semantic concept retrieval framework that incorporates tag removal and model fusion is proposed to tackle this challenge. Tags with useful information can facilitate media search, but they are often imprecise, which makes it important to apply noisy tag removal (by deleting uncorrelated tags) to improve the performance of semantic concept retrieval. Therefore, a multiple correspondence analysis (MCA)-based tag removal algorithm is proposed, which utilizes MCA's ability to capture the relationships among nominal features and identify representative and discriminative tags holding strong correlations with the target semantic concepts. To further improve the retrieval performance, a novel model fusion method is also proposed to combine ranking scores from both tag-based and content-based models, where the adjustment of ranking scores, the reliability of models, and the correlations between the intervals divided on the ranking scores and the semantic concepts are all considered. Extensive comparative experiments on the NUS-WIDE-LITE and NUS-WIDE-270K benchmark datasets with 81 semantic concepts show that the proposed framework outperforms the baselines and the other comparison methods, with each component also evaluated separately.
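The late-fusion step — making two rankers' scores comparable and combining them by reliability — can be sketched generically. This is not the paper's fusion method (which also models score intervals and their concept correlations); the min-max normalization, the weights, and the toy scores below are illustrative assumptions.

```python
import numpy as np

def fuse(tag_scores, content_scores, w_tag=0.4, w_content=0.6):
    """Late fusion of two rankers: min-max normalize each model's scores
    so they are comparable, then take a reliability-weighted sum."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return w_tag * norm(tag_scores) + w_content * norm(content_scores)

# Five candidate images scored by the tag-based and content-based models.
tag_scores = [0.9, 0.2, 0.5, 0.1, 0.7]
content_scores = [0.3, 0.8, 0.6, 0.2, 0.9]

fused = fuse(tag_scores, content_scores)
ranking = np.argsort(-fused)  # best candidates first
```

Weighting the content-based model more heavily here stands in for the "reliability of models" consideration: the weights would normally be estimated from validation performance rather than fixed by hand.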