Arrow Research search

Author name cluster

Ke Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

45 papers
2 author rows

Possible papers

45

AAAI Conference 2026 Conference Paper

CASL: Curvature-Augmented Self-supervised Learning for 3D Anomaly Detection

  • Yaohua Zha
  • Xue Yuerong
  • Chunlin Fan
  • Yuansong Wang
  • Tao Dai
  • Ke Chen
  • Shu-Tao Xia

Deep learning-based 3D anomaly detection methods have demonstrated significant potential in industrial manufacturing. However, many approaches are specifically designed for anomaly detection tasks, which limits their generalizability to other 3D tasks. In contrast, self-supervised point cloud models aim for general representation learning, yet our investigation reveals that these classical models are suboptimal at anomaly detection under the unified fine-tuning paradigm. This motivates us to develop a more generalizable 3D model that can effectively detect anomalies without relying on task-specific designs. Interestingly, we find that using only the curvature of each point as its anomaly score already outperforms several classical self-supervised and dedicated anomaly detection models, highlighting the critical role of curvature in 3D anomaly detection. In this paper, we propose a Curvature-Augmented Self-supervised Learning (CASL) framework based on a reconstruction paradigm. Built upon the classical U-Net architecture, our approach introduces multi-scale curvature prompts to guide the decoder in predicting the coordinates of each point. Without relying on any dedicated anomaly detection mechanisms, it achieves leading detection performance through straightforward anomaly classification fine-tuning. Moreover, the learned representations generalize well to standard 3D understanding tasks such as point cloud classification.
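The curvature-as-anomaly-score observation lends itself to a quick sketch. The following is a minimal illustration of scoring points by local curvature, assuming a simple PCA-based surface-variation proxy and a hypothetical neighborhood size `k`; it is not the authors' CASL implementation:

```python
import numpy as np

def curvature_scores(points, k=16):
    """Per-point curvature proxy via local PCA (illustrative, not the CASL code).

    For each point, take its k nearest neighbors (including itself), form the
    3x3 covariance of that neighborhood, and use the surface-variation ratio
    lambda_min / (lambda_1 + lambda_2 + lambda_3) as the score: near zero on
    flat patches, larger near edges, bumps, and defects.
    """
    pts = np.asarray(points, dtype=float)
    # brute-force pairwise squared distances; use a KD-tree for large clouds
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]        # k nearest points, self included
    scores = np.empty(len(pts))
    for i in range(len(pts)):
        cov = np.cov(pts[nn[i]].T)            # 3x3 local covariance
        eig = np.sort(np.linalg.eigvalsh(cov))
        scores[i] = eig[0] / max(eig.sum(), 1e-12)
    return scores
```

On an exactly flat patch the smallest eigenvalue vanishes, so any point that bends its local neighborhood out of plane stands out with a positive score.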

TMLR Journal 2026 Journal Article

Large Language Model-based Data Science Agent: A Survey

  • Ke Chen
  • Peiran Wang
  • Yaoning Yu
  • Xianyang Zhan
  • Haohan Wang

The rapid advancement of Large Language Models (LLMs) has driven novel applications across diverse domains, with LLM-based agents emerging as a crucial area of exploration. This survey presents a comprehensive analysis of LLM-based agents designed for data science tasks, summarizing insights from recent studies. From the agent perspective, we discuss the key design principles, covering agent roles, execution, knowledge, and reflection methods. From the data science perspective, we identify key processes for LLM-based agents, including data preprocessing, model development, evaluation, visualization, etc. Our work offers two key contributions: (1) a comprehensive review of recent developments in applying LLM-based agents to data science tasks; (2) a dual-perspective framework that connects general agent design principles with the practical workflows in data science.

NeurIPS Conference 2025 Conference Paper

Angular Constraint Embedding via SpherePair Loss for Constrained Clustering

  • Shaojie Zhang
  • Ke Chen

Constrained clustering integrates domain knowledge through pairwise constraints. However, existing deep constrained clustering (DCC) methods are either limited by anchors inherent in end-to-end modeling or struggle with learning discriminative Euclidean embedding, restricting their scalability and real-world applicability. To avoid their respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. Using the SpherePair loss with a geometric formulation, our method faithfully encodes pairwise constraints and leads to embeddings that are clustering-friendly in angular space, effectively separating representation learning from clustering. SpherePair preserves pairwise relations without conflict, removes the need to specify the exact number of clusters, generalizes to unseen data, enables rapid inference of the number of clusters, and is supported by rigorous theoretical guarantees. Comparative evaluations with state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at our repository.
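As a rough sketch of the angular-embedding idea, a pairwise loss on unit-normalized embeddings could take the following hinge form (the margin value and hinge shape are assumptions for illustration, not the authors' exact SpherePair loss):

```python
import numpy as np

def spherepair_style_loss(z_i, z_j, must_link, margin=0.0):
    """Illustrative angular pairwise loss (not the exact SpherePair loss).

    Embeddings are L2-normalized onto the unit sphere. Must-link pairs are
    pulled toward cosine similarity 1; cannot-link pairs are pushed down to
    an angular margin (0.0 = orthogonality) via a hinge.
    """
    zi = np.asarray(z_i, dtype=float)
    zj = np.asarray(z_j, dtype=float)
    cos = float(zi @ zj) / (np.linalg.norm(zi) * np.linalg.norm(zj))
    if must_link:
        return 1.0 - cos            # zero iff the pair coincides on the sphere
    return max(0.0, cos - margin)   # zero once the pair is pushed past the margin
```

Because the loss depends only on the angle between embeddings, pairwise constraints can be satisfied without fixing a cluster count in advance, which is the property the abstract emphasizes.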

JBHI Journal 2025 Journal Article

Automatic Multi-Task Segmentation and Vulnerability Assessment of Carotid Plaque on Contrast-Enhanced Ultrasound Images and Videos via Deep Learning

  • Bokai Hu
  • Han Zhang
  • Caixia Jia
  • Ke Chen
  • Xiangjiang Tang
  • Da He
  • Luni Zhang
  • Shiyao Gu

Intraplaque neovascularization (IPN) within carotid plaque is a crucial indicator of plaque vulnerability. Contrast-enhanced ultrasound (CEUS) is a valuable tool for assessing IPN by evaluating the location and quantity of microbubbles within the carotid plaque. However, this task is typically performed by experienced radiologists. Here we propose a deep learning-based multi-task model for the automatic segmentation and IPN grade classification of carotid plaque on CEUS images and videos. We also compare the performance of our model with that of radiologists. To simulate the clinical practice of radiologists, who often use CEUS videos with dynamic imaging to track microbubble flow and identify IPN, we develop a workflow for plaque vulnerability assessment using CEUS videos. Our multi-task model outperformed individually trained segmentation and classification models, achieving superior performance in IPN grade classification based on CEUS images. Specifically, our model achieved a high segmentation Dice coefficient of 84.64% and a high classification accuracy of 81.67%. Moreover, our model surpassed the performance of junior and medium-level radiologists, providing more accurate IPN grading of carotid plaque on CEUS images. For CEUS videos, our model achieved a classification accuracy of 80.00% in IPN grading. Overall, our multi-task model demonstrates great performance in the automatic, accurate, objective, and efficient IPN grading in both CEUS images and videos. This work holds significant promise for enhancing the clinical diagnosis of plaque vulnerability associated with IPN in CEUS evaluations.

AAAI Conference 2025 Conference Paper

CogSQL: A Cognitive Framework for Enhancing Large Language Models in Text-to-SQL Translation

  • Hongwei Yuan
  • Xiu Tang
  • Ke Chen
  • Lidan Shou
  • Gang Chen
  • Huan Li

Large language models (LLMs) have significantly advanced the performance of various natural language processing tasks, including text-to-SQL. Current LLM-based text-to-SQL schemes mainly focus on improving the understanding of natural language questions (NLQs) or refining the quality of generated SQL. While these strategies are effective, they often address specific, nuanced aspects. In contrast, humans approach text-to-SQL with a holistic view, applying transitional logical reasoning across multiple steps to arrive at the final answer. We believe LLMs can leverage human cognitive processes to achieve greater accuracy in text-to-SQL. In this paper, we present COGSQL, a framework featuring a suite of tailored models and strategies aimed at replicating human cognitive processes for enhanced LLM-based text-to-SQL. COGSQL consists of three key modules: (1) SQL preparation: we employ coarse-to-fine schema linking and syntax keyword prediction, akin to how humans recall and align key concepts for better understanding. (2) SQL generation: we introduce concept-enhanced chain-of-thought prompting, enhancing the NLQ interpretation and SQL composition of LLMs, similar to how humans draft SQL queries. (3) SQL correction: we develop NLQ consistency and result consistency techniques to correct various errors, mirroring how humans evaluate and refine reasoning. We conduct extensive experiments using diverse benchmarks and LLMs. The results and analysis verify the effectiveness and generalizability of COGSQL.

NeurIPS Conference 2025 Conference Paper

GSPN-2: Efficient Parallel Sequence Modeling

  • Hongjun Wang
  • Yitong Jiang
  • Collin McCarthy
  • David Wehr
  • Hanrong Ye
  • Xinhao Li
  • Ka Chun Cheung
  • Wonmin Byeon

Efficient vision transformer inference remains a bottleneck for real-world applications involving high-resolution images and long videos. The Generalized Spatial Propagation Network (GSPN) (Wang et al., 2025) addresses this by replacing quadratic self-attention with a line-scan propagation scheme, bringing the cost close to linear in the number of rows or columns while retaining accuracy. Despite this advancement, the existing GSPN implementation still suffers from (i) heavy overhead due to repeatedly launching GPU kernels, (ii) excessive data transfers from global GPU memory, and (iii) redundant computations caused by maintaining separate propagation weights for each channel. We introduce GSPN-2, a joint algorithm-system redesign. In particular, we fuse thousands of micro-launches from the previous implementation into a single 2D kernel, explicitly pin one warp to each channel slice, and stage the previous column's activations in shared memory. On the model side, we introduce a set of channel-shared propagation weights that replace per-channel matrices, trimming parameters and aligning naturally with the affinity map used in transformer attention. Experiments demonstrate GSPN-2's effectiveness across image classification and text-to-image synthesis tasks, matching transformer-level accuracy with significantly lower computational cost. GSPN-2 establishes a new efficiency frontier for modeling global spatial context in vision applications through its unique combination of structured matrix transformations and GPU-optimized implementation.

AAAI Conference 2025 Conference Paper

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

  • Songjun Tu
  • Jingbo Sun
  • Qichao Zhang
  • Yaocheng Zhang
  • Jia Liu
  • Ke Chen
  • Dongbin Zhao

Offline preference-based reinforcement learning (PbRL) typically operates in two phases: first, use human preferences to learn a reward model and annotate rewards for a reward-free offline dataset; second, learn a policy by optimizing the learned reward via offline RL. However, accurately modeling step-wise rewards from trajectory-level preference feedback presents inherent challenges. The reward bias introduced, particularly the overestimation of predicted rewards, leads to optimistic trajectory stitching, which undermines the pessimism mechanism critical to the offline RL phase. To address this challenge, we propose In-Dataset Trajectory Return Regularization (DTR) for offline PbRL, which leverages conditional sequence modeling to mitigate the risk of learning inaccurate trajectory stitching under reward bias. Specifically, DTR employs Decision Transformer and TD-Learning to strike a balance between maintaining fidelity to the behavior policy with high in-dataset trajectory returns and selecting optimal actions based on high reward labels. Additionally, we introduce an ensemble normalization technique that effectively integrates multiple reward models, balancing the trade-off between reward differentiation and accuracy. Empirical evaluations on various benchmarks demonstrate the superiority of DTR over other state-of-the-art baselines.

IJCAI Conference 2025 Conference Paper

Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning

  • Yaohua Zha
  • Tao Dai
  • Hang Guo
  • Yanzi Wang
  • Bin Chen
  • Ke Chen
  • Shu-Tao Xia

Point clouds, as a primary representation of 3D data, can be categorized into scene-domain point clouds and object-domain point clouds. Point cloud self-supervised learning (SSL) has become a mainstream paradigm for learning 3D representations. However, existing point cloud SSL primarily focuses on learning domain-specific 3D representations within a single domain, neglecting the complementary nature of cross-domain knowledge, which limits the learning of 3D representations. In this paper, we propose to learn a comprehensive Point cloud Mixture-of-Domain-Experts model (Point-MoDE) via a block-to-scene pre-training strategy. Specifically, we first propose a mixture-of-domain-experts model consisting of scene-domain experts and multiple shared object-domain experts. Furthermore, we propose a block-to-scene pre-training strategy, which leverages the features of point blocks in the object domain to regress their initial positions in the scene domain through object-level block mask reconstruction and scene-level block position regression. By integrating the complementary knowledge between object and scene, this strategy simultaneously facilitates the learning of both object-domain and scene-domain representations, leading to a more comprehensive 3D representation. Extensive experiments on downstream tasks demonstrate the superiority of our model.

AAMAS Conference 2025 Conference Paper

Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning

  • Jingbo Sun
  • Songjun Tu
  • Qichao Zhang
  • Ke Chen
  • Dongbin Zhao

Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning, where agents often overfit to the specific visual observations of the training environment. In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information. As a result, agents may deviate from the optimal behaviors learned during training, thereby hindering visual generalization. To address this issue, we propose the Salience-Invariant Consistent Policy Learning (SCPL) algorithm, an efficient framework for zero-shot generalization. Our approach introduces a novel value consistency module alongside a dynamics module to effectively capture task-relevant representations. The value consistency module, guided by saliency, ensures the agent focuses on task-relevant pixels in both original and perturbed observations, while the dynamics module uses augmented data to help the encoder capture dynamic- and reward-relevant representations. Additionally, our theoretical analysis highlights the importance of policy consistency for generalization. To strengthen this, we introduce a policy consistency module with a KL divergence constraint to maintain consistent policies across original and perturbed observations. Extensive experiments on the DMC-GB, Robotic Manipulation, and CARLA benchmarks demonstrate that SCPL significantly outperforms state-of-the-art methods in terms of generalization. Notably, SCPL achieves average performance improvements of 14%, 39%, and 69% in the challenging DMC video hard setting, the Robotic hard setting, and the CARLA benchmark, respectively. Project Page: https://sites.google.com/view/scpl-rl.

ICRA Conference 2025 Conference Paper

Ultrasound-Guided Robotic Blood Drawing and In Vivo Studies on Submillimetre Vessels of Rats

  • Shuaiqi Jing
  • Tianliang Yao
  • Ke Zhang
  • Di Wu 0053
  • Qiulin Wang
  • Zixi Chen
  • Ke Chen
  • Peng Qi 0001

Billions of vascular access procedures are performed annually worldwide, serving as a crucial first step in various clinical diagnostic and therapeutic procedures. For pediatric or elderly individuals, whose vessels are small in size (typically 2 to 3 mm in diameter for adults and <1 mm in children), vascular access can be highly challenging. This study presents an image-guided robotic system aimed at enhancing the accuracy of difficult vascular access procedures. The system integrates a 6-DoF (Degrees of Freedom) robotic arm with a 3-DoF end-effector, ensuring precise navigation and needle insertion. Multi-modal imaging and sensing technologies have been utilized to endow the medical robot with precision and safety, while ultrasound (US) imaging guidance is specifically evaluated in this study. To evaluate in vivo vascular access in submillimeter vessels, we conducted ultrasound-guided robotic blood drawing on the tail veins (with a diameter of 0.7 ± 0.2 mm) of 40 rats. The results demonstrate that the system achieved a first-attempt success rate of 95%. The high first-attempt success rate in intravenous vascular access, even with small blood vessels, demonstrates the system's effectiveness in performing these procedures. This capability reduces the risk of failed attempts, minimizes patient discomfort, and enhances clinical efficiency.

ICLR Conference 2025 Conference Paper

Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation

  • Jingbo Sun
  • Songjun Tu
  • Qichao Zhang
  • Haoran Li 0010
  • Xin Liu 0039
  • Yaran Chen
  • Ke Chen
  • Dongbin Zhao

Online unsupervised reinforcement learning (URL) can discover diverse skills via reward-free pre-training and exhibits impressive downstream task adaptation abilities through further fine-tuning. However, online URL methods face challenges in achieving zero-shot generalization, i.e., directly applying pre-trained policies to downstream tasks without additional planning or learning. In this paper, we propose a novel Dual-Value Forward-Backward representation (DVFB) framework with a contrastive entropy intrinsic reward to achieve both zero-shot generalization and fine-tuning adaptation in online URL. On the one hand, we demonstrate that poor exploration in forward-backward representations can lead to limited data diversity in online URL, impairing successor measures, and ultimately constraining generalization ability. To address this issue, the DVFB framework learns successor measures through a skill value function while promoting data diversity through an exploration value function, thus enabling zero-shot generalization. On the other hand, and somewhat surprisingly, by employing a straightforward dual-value fine-tuning scheme combined with a reward mapping technique, the pre-trained policy further enhances its performance through fine-tuning on downstream tasks, building on its zero-shot performance. Through extensive multi-task generalization experiments, DVFB demonstrates both superior zero-shot generalization (outperforming on all 12 tasks) and fine-tuning adaptation (leading on 10 out of 12 tasks) abilities, surpassing state-of-the-art URL methods.

AAAI Conference 2024 Conference Paper

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition

  • Cheng Peng
  • Ke Chen
  • Lidan Shou
  • Gang Chen

Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities. The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data. Recent studies are mainly devoted to exploring various fusion strategies to integrate multi-modal information into a unified representation for all labels. However, such a learning scheme not only overlooks the specificity of each modality but also fails to capture individual discriminative features for different labels. Moreover, dependencies of labels and modalities cannot be effectively modeled. To address these issues, this paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task. Specifically, we devise a reconstruction-based fusion mechanism to better model fine-grained modality-to-label dependencies by contrastively learning modal-separated and label-specific features. To further exploit the modality complementarity, we introduce a shuffle-based aggregation strategy to enrich co-occurrence collaboration among labels. Experiments on two benchmark datasets CMU-MOSEI and M3ED demonstrate the effectiveness of CARAT over state-of-the-art methods. Code is available at https://github.com/chengzju/CARAT.

IJCAI Conference 2024 Conference Paper

Retrieval Guided Music Captioning via Multimodal Prefixes

  • Nikita Srivatsan
  • Ke Chen
  • Shlomo Dubnov
  • Taylor Berg-Kirkpatrick

In this paper we put forward a new approach to music captioning, the task of automatically generating natural language descriptions for songs. These descriptions are useful both for categorization and analysis, and also from an accessibility standpoint as they form an important component of closed captions for video content. Our method supplements an audio encoding with a retriever, allowing the decoder to condition on multimodal signal both from the audio of the song itself as well as a candidate caption identified by a nearest neighbor system. This lets us retain the advantages of a retrieval based approach while also allowing for the flexibility of a generative one. We evaluate this system on a dataset of 200k music-caption pairs scraped from Audiostock, a royalty-free music platform, and on MusicCaps, a dataset of 5.5k pairs. We demonstrate significant improvements over prior systems across both automatic metrics and human evaluation.

AAAI Conference 2024 Conference Paper

Variational Hybrid-Attention Framework for Multi-Label Few-Shot Aspect Category Detection

  • Cheng Peng
  • Ke Chen
  • Lidan Shou
  • Gang Chen

Multi-label few-shot aspect category detection (FS-ACD) is a challenging sentiment analysis task, which aims to learn a multi-label learning paradigm with limited training data. The difficulty of this task is how to use limited data to generalize effective discriminative representations for different categories. Nowadays, all advanced FS-ACD works utilize the prototypical network to learn label prototypes to represent different aspects. However, such point-based estimation methods are inherently noise-susceptible and bias-vulnerable. To this end, this paper proposes a novel Variational Hybrid-Attention Framework (VHAF) for the FS-ACD task. Specifically, to alleviate the data noise, we adopt a hybrid-attention mechanism to generate more discriminative aspect-specific embeddings. Then, based on these embeddings, we introduce the variational distribution inference to obtain the aspect-specific distribution as a more robust aspect representation, which can eliminate the scarce data bias for better inference. Moreover, we further leverage an adaptive threshold estimation to help VHAF better identify multiple relevant aspects. Extensive experiments on three datasets demonstrate the effectiveness of our VHAF over other state-of-the-art methods. Code is available at https://github.com/chengzju/VHAF.

TMLR Journal 2023 Journal Article

Deep Operator Learning Lessens the Curse of Dimensionality for PDEs

  • Ke Chen
  • Chunmei Wang
  • Haizhao Yang

Deep neural networks (DNNs) have achieved remarkable success in numerous domains, and their application to PDE-related problems has been rapidly advancing. This paper provides an estimate for the generalization error of learning Lipschitz operators over Banach spaces using DNNs, with applications to various PDE solution operators. The goal is to specify the DNN width, depth, and number of training samples needed to guarantee a certain testing error. Under mild assumptions on data distributions or operator structures, our analysis shows that deep operator learning can have a relaxed dependence on the discretization resolution of PDEs and hence lessen the curse of dimensionality in many PDE-related problems, including elliptic equations, parabolic equations, and Burgers equations. Our results also give insights into discretization invariance in operator learning.

IJCAI Conference 2023 Conference Paper

Deep Partial Multi-Label Learning with Graph Disambiguation

  • Haobo Wang
  • Shisong Yang
  • Gengyu Lyu
  • Weiwei Liu
  • Tianlei Hu
  • Ke Chen
  • Songhe Feng
  • Gang Chen

In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have been prevalent to deal with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Effective Continual Learning for Text Classification with Lightweight Snapshots

  • Jue Wang
  • Dajie Dong
  • Lidan Shou
  • Ke Chen
  • Gang Chen

Continual learning is known for suffering from catastrophic forgetting, a phenomenon where previously learned concepts are forgotten upon learning new tasks. A natural remedy is to use trained models for old tasks as ‘teachers’ to regularize the update of the current model to prevent such forgetting. However, this requires storing all past models, which is very space-consuming for large models, e.g. BERT, thus impractical in real-world applications. To tackle this issue, we propose to construct snapshots of seen tasks whose key knowledge is captured in lightweight adapters. During continual learning, we transfer knowledge from past snapshots to the current model through knowledge distillation, allowing the current model to review previously learned knowledge while learning new tasks. We also design representation recalibration to better handle the class-incremental setting. Experiments over various task sequences show that our approach effectively mitigates catastrophic forgetting and outperforms all baselines.

IJCAI Conference 2023 Conference Paper

FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training

  • Xin'ao Wang
  • Huan Li
  • Ke Chen
  • Lidan Shou

This study proposes FEDBFPT (Federated BERT Further Pre-Training), a Federated Learning (FL) framework for further pre-training the BERT language model in specialized domains while addressing privacy concerns. FEDBFPT enables multiple clients to collaboratively train the shallower layers of BERT, which are crucial in the pre-training stage, without the need to share private data. To achieve this, FEDBFPT involves building a local model for each client, progressively training the shallower layers of local models while sampling deeper layers, and aggregating trained parameters on a server to create the final global model. This approach utilizes multiple smaller local models to further pre-train a global model targeted at specific tasks via fine-tuning, resulting in reduced resource usage while maintaining model accuracy. Theoretical analysis is conducted to support the efficiency of FEDBFPT, and experiments are conducted on corpora across domains such as medicine, biology, and computer science. Results indicate that FEDBFPT achieves performance levels comparable to traditional FL methods while reducing computation and communication costs by 46.70% and 7.04%, respectively, even approaching the performance of centralized training models. The source code is released at https://github.com/Hanzhouu/FedBFPT.

IJCAI Conference 2023 Conference Paper

Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose

  • Yichen Zhang
  • Jiehong Lin
  • Ke Chen
  • Zelin Xu
  • Yaowei Wang
  • Kui Jia

The domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, into a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations for regression tasks. Moreover, we learn unified implicit neural functions to estimate the relative direction and distance of targets to their nearest class bins, refining target classification predictions and gaining robustness against the inconsistent feature scaling to which UDA regressors are sensitive. Experimental results on three public benchmarks of the challenging 6D pose estimation task verify the effectiveness of our method, which consistently achieves performance superior to the state of the art for UDA on 6D pose estimation. Codes and pre-trained models are available at https://github.com/Gorilla-Lab-SCUT/MAST.

ICRA Conference 2023 Conference Paper

NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving

  • Alexander Popov
  • Patrik Gebhardt
  • Ke Chen
  • Ryan Oldja

Detecting obstacles is crucial for safe and efficient autonomous driving. To this end, we present NVRadarNet, a deep neural network (DNN) that detects dynamic obstacles and drivable free space using automotive RADAR sensors. The network utilizes temporally accumulated data from multiple RADAR sensors to detect dynamic obstacles and compute their orientation in a top-down bird's-eye view (BEV). The network also regresses drivable free space to detect unclassified obstacles. Our DNN is the first of its kind to utilize sparse RADAR signals in order to perform obstacle and free space detection in real time from RADAR data only. The network has been successfully used for perception on our autonomous vehicles in real self-driving scenarios. The network runs faster than real time on an embedded GPU and shows good generalization across geographic regions. Video at https://youtu.be/WlwJJMltoJY.

AAAI Conference 2023 Conference Paper

Quality-Aware Self-Training on Differentiable Synthesis of Rare Relational Data

  • Chongsheng Zhang
  • Yaxin Hou
  • Ke Chen
  • Shuang Cao
  • Gaojuan Fan
  • Ji Liu

Data scarcity is a very common real-world problem that poses a major challenge to data-driven analytics. Although many data-balancing approaches have been proposed to mitigate this problem, they may drop useful information or fall into the overfitting problem. Generative Adversarial Network (GAN) based data synthesis methods can alleviate such a problem but lack quality control over the generated samples. Moreover, the latent associations between the attribute set and the class labels in relational data cannot be easily captured by a vanilla GAN. In light of this, we introduce an end-to-end self-training scheme (namely, Quality-Aware Self-Training) for rare relational data synthesis, which generates labeled synthetic data via pseudo labeling on GAN-based synthesis. We design a semantic pseudo labeling module to first control the quality of the generated features/samples, then calibrate their semantic labels via a classifier committee consisting of multiple pre-trained shallow classifiers. The high-confidence generated samples with calibrated pseudo labels are then fed into a semantic classification network as augmented samples for self-training. We conduct extensive experiments on 20 benchmark datasets of different domains, including 14 industrial datasets. The results show that our method significantly outperforms state-of-the-art methods, including two recent GAN-based data synthesis schemes. Codes are available at https://github.com/yaxinhou/QAST.

IJCAI Conference 2022 Conference Paper

BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation

  • Zelin Xu
  • Yichen Zhang
  • Ke Chen
  • Kui Jia

The challenges of learning a robust 6D pose function lie in 1) severe occlusion and 2) systematic noise in depth images. Inspired by the success of point-pair features, the goal of this paper is to recover the 6D pose of an object instance segmented from RGB-D images by locally matching pairs of oriented points between the model and camera space. To this end, we propose a novel Bi-directional Correspondence Mapping Network (BiCo-Net) to first generate point clouds guided by a typical pose regression, which can thus incorporate pose-sensitive information to optimize the generation of local coordinates and their normal vectors. As pose predictions via geometric computation rely on only a single pair of local oriented points, our BiCo-Net can achieve robustness against sparse and occluded point clouds. An ensemble of redundant pose predictions from local matching and direct pose regression further refines the final pose output against noisy observations. Experimental results on three popular benchmark datasets verify that our method achieves state-of-the-art performance, especially in the more challenging severely occluded scenes. Source code is available at https://github.com/Gorilla-Lab-SCUT/BiCo-Net.

IJCAI Conference 2022 Conference Paper

Continual Federated Learning Based on Knowledge Distillation

  • Yuhang Ma
  • Zhongle Xie
  • Jue Wang
  • Ke Chen
  • Lidan Shou

Federated learning (FL) is a promising approach for learning a shared global model on decentralized data owned by multiple clients without exposing their privacy. In real-world scenarios, data accumulated at the client-side varies in distribution over time. As a consequence, the global model tends to forget the knowledge obtained from previous tasks while learning new tasks, showing signs of "catastrophic forgetting". Previous studies in centralized learning use techniques such as data replay and parameter regularization to mitigate catastrophic forgetting. Unfortunately, these techniques cannot adequately solve the non-trivial problem in FL. We propose Continual Federated Learning with Distillation (CFeD) to address catastrophic forgetting under FL. CFeD performs knowledge distillation on both the clients and the server, with each party independently having an unlabeled surrogate dataset, to mitigate forgetting. Moreover, CFeD assigns different learning objectives, namely learning the new task and reviewing old tasks, to different clients, aiming to improve the learning ability of the model. The results show that our method performs well in mitigating catastrophic forgetting and achieves a good trade-off between the two objectives.

AAAI Conference 2022 Conference Paper

Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data

  • Ke Chen
  • Xingjian Du
  • Bilei Zhu
  • Zejun Ma
  • Taylor Berg-Kirkpatrick
  • Shlomo Dubnov

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.

ICRA Conference 2021 Conference Paper

Balance Control of a Novel Wheel-legged Robot: Design and Experiments

  • Shuai Wang 0007
  • Leilei Cui 0002
  • Jingfan Zhang
  • Jie Lai
  • Dongsheng Zhang
  • Ke Chen
  • Yu Zheng 0001
  • Zhengyou Zhang

This paper presents a balance control technique for a novel wheel-legged robot. We first derive a dynamic model of the robot and then apply a linear feedback controller based on output regulation and linear quadratic regulator (LQR) methods to keep the robot standing on the ground without excessive back-and-forth motion. To account for nonlinearities of the model and obtain a large domain of stability, a nonlinear controller based on the interconnection and damping assignment passivity-based control (IDA-PBC) method is exploited to control the robot in more general scenarios. Physical experiments are performed with various control tasks. Experimental results demonstrate that the proposed linear output regulator can maintain the standing of the robot, while the proposed nonlinear controller can balance the robot from an initial starting angle far from the equilibrium point, or under a changing robot height.

AAAI Conference 2021 Conference Paper

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

  • Jue Wang
  • Ke Chen
  • Lidan Shou
  • Sai Wu
  • Gang Chen

Slot filling is a challenging task in Spoken Language Understanding (SLU). Supervised methods usually require large amounts of annotation to maintain desirable performance. A solution to relieve the heavy dependency on labeled data is to employ bootstrapping, which leverages unlabeled data. However, bootstrapping is known to suffer from semantic drift. We argue that semantic drift can be tackled by exploiting the correlation between slot values (phrases) and their respective types. By using some particular weakly-labeled data, namely the plain phrases included in sentences, we propose a weakly-supervised slot filling approach. Our approach trains two models, namely a classifier and a tagger, which can effectively learn from each other on the weakly-labeled data. The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting.

NeurIPS Conference 2021 Conference Paper

Sparse Steerable Convolutions: An Efficient Learning of SE(3)-Equivariant Features for Estimation and Tracking of Object Poses in 3D Space

  • Jiehong Lin
  • Hongyang Li
  • Ke Chen
  • Jiangbo Lu
  • Kui Jia

As a basic component of SE(3)-equivariant deep feature learning, steerable convolution has recently demonstrated its advantages for 3D semantic analysis. The advantages are, however, brought by expensive computations on dense, volumetric data, which prevent its practical use for efficient processing of 3D data that are inherently sparse. In this paper, we propose a novel design of Sparse Steerable Convolution (SS-Conv) to address the shortcoming; SS-Conv greatly accelerates steerable convolution with sparse tensors, while strictly preserving the property of SE(3)-equivariance. Based on SS-Conv, we propose a general pipeline for precise estimation of object poses, wherein a key design is a Feature-Steering module that takes full advantage of SE(3)-equivariance and is able to conduct efficient pose refinement. To verify our designs, we conduct thorough experiments on three tasks of 3D object semantic analysis, including instance-level 6D pose estimation, category-level 6D pose and size estimation, and category-level 6D pose tracking. Our proposed pipeline based on SS-Conv outperforms existing methods on almost all the metrics evaluated by the three tasks. Ablation studies also show the superiority of our SS-Conv over alternative convolutions in terms of both accuracy and efficiency. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/SS-Conv.

AAAI Conference 2020 Conference Paper

Cascading Convolutional Color Constancy

  • Huanglin Yu
  • Ke Chen
  • Kaiqi Wang
  • Yanlin Qian
  • Zhaoxiang Zhang
  • Kui Jia

Regressing the illumination of a scene from the representations of object appearances is popularly adopted in computational color constancy. However, it remains challenging due to intrinsic appearance and label ambiguities caused by unknown illuminants, diverse reflection properties of materials, and extrinsic imaging factors (such as different camera sensors). In this paper, we introduce a novel algorithm, Cascading Convolutional Color Constancy (in short, C4), to improve the robustness of regression learning and achieve stable generalization capability across datasets (different cameras and scenes) in a unique framework. The proposed C4 method ensembles a series of dependent illumination hypotheses from each cascade stage via a weighted multiply-accumulate loss function, which can inherently capture different modes of illumination and explicitly enforce coarse-to-fine network optimization. Experimental results on the public Color Checker and NUS 8-Camera benchmarks demonstrate the superior performance of the proposed algorithm in comparison with state-of-the-art methods, especially for more difficult scenes.

NeurIPS Conference 2020 Conference Paper

Feature Importance Ranking for Deep Learning

  • Maksymilian Wojtas
  • Ke Chen

Feature importance ranking has become a powerful tool for explainable AI. However, its nature of combinatorial optimization poses a great challenge for deep learning. In this paper, we propose a novel dual-net architecture consisting of an operator and a selector for discovering an optimal feature subset of a fixed size and ranking the importance of the features in that subset simultaneously. During learning, the operator is trained for a supervised learning task via optimal feature subset candidates generated by the selector, which learns to predict the learning performance of the operator on different optimal subset candidates. We develop an alternate learning algorithm that trains the two nets jointly and incorporates a stochastic local search procedure into learning to address the combinatorial optimization challenge. In deployment, the selector generates an optimal feature subset and ranks feature importance, while the operator makes predictions based on the optimal subset for test data. A thorough evaluation on synthetic, benchmark, and real-world datasets suggests that our approach outperforms several state-of-the-art feature importance ranking and supervised feature selection methods. (Our source code is available at https://github.com/maksym33/FeatureImportanceDL)

AAAI Conference 2020 Conference Paper

Incorporating Label Embedding and Feature Augmentation for Multi-Dimensional Classification

  • Haobo Wang
  • Chen Chen
  • Weiwei Liu
  • Ke Chen
  • Tianlei Hu
  • Gang Chen

Feature augmentation, which manipulates the feature space by integrating the label information, is one of the most popular strategies for solving Multi-Dimensional Classification (MDC) problems. However, the vanilla feature augmentation approaches fail to consider the intra-class exclusiveness, and may achieve degenerated performance. To fill this gap, a novel neural network based model is proposed which seamlessly integrates the Label Embedding and Feature Augmentation (LEFA) techniques to learn label correlations. Specifically, based on attentional factorization machine, a cross correlation aware network is introduced to learn a low-dimensional label representation that simultaneously depicts the inter-class correlations and the intra-class exclusiveness. Then the learned latent label vector can be used to augment the original feature space. Extensive experiments on seven real-world datasets demonstrate the superiority of LEFA over state-of-the-art MDC approaches.

IJCAI Conference 2020 Conference Paper

Learning From Multi-Dimensional Partial Labels

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Tianlei Hu
  • Ke Chen
  • Gang Chen

Multi-dimensional classification (MDC) has attracted considerable attention from the community. Though most studies consider fully annotated data, in practice obtaining fully labeled data in MDC tasks is usually intractable. In this paper, we propose a novel learning paradigm: Multi-Dimensional Partial Label Learning (MDPL), where the ground-truth labels of each instance are concealed in multiple candidate label sets. We first introduce the partial Hamming loss for MDPL, which incurs a large loss if the predicted labels are not in the candidate label sets, and provide an empirical risk minimization (ERM) framework. Theoretically, we rigorously prove the conditions for ERM learnability of MDPL in both the independent and dependent cases. Furthermore, we present two MDPL algorithms under our proposed ERM framework. Comprehensive experiments on both synthetic and real-world datasets validate the effectiveness of our proposals.

IROS Conference 2020 Conference Paper

MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

  • Ke Chen
  • Ryan Oldja
  • Nikolai Smolyanskiy
  • Stan Birchfield
  • Alexander Popov
  • David Wehr
  • Ibrahim Eden
  • Joachim Pehserl

Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages use an encoder-decoder architecture. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.

IJCAI Conference 2019 Conference Paper

Deep Cascade Generation on Point Sets

  • Kaiqi Wang
  • Ke Chen
  • Kui Jia

This paper proposes a deep cascade network to generate 3D geometry of an object on a point cloud, consisting of a set of permutation-insensitive points. Such a surface representation is easy to learn from, but inhibits exploiting rich low-dimensional topological manifolds of the object shape due to lack of geometric connectivity. For benefiting from its simple structure yet utilizing rich neighborhood information across points, this paper proposes a two-stage cascade model on point sets. Specifically, our method adopts the state-of-the-art point set autoencoder to generate a sparsely coarse shape first, and then locally refines it by encoding neighborhood connectivity on a graph representation. An ensemble of sparse refined surface is designed to alleviate the suffering from local minima caused by modeling complex geometric manifolds. Moreover, our model develops a dynamically-weighted loss function for jointly penalizing the generation output of cascade levels at different training stages in a coarse-to-fine manner. Comparative evaluation on the publicly benchmarking ShapeNet dataset demonstrates superior performance of the proposed model to the state-of-the-art methods on both single-view shape reconstruction and shape autoencoding applications.

IJCAI Conference 2017 Conference Paper

Cross-Granularity Graph Inference for Semantic Video Object Segmentation

  • Huiling Wang
  • Tinghuai Wang
  • Ke Chen
  • Joni-Kristian Kämäräinen

We address semantic video object segmentation via a novel cross-granularity hierarchical graphical model that integrates tracklet and object-proposal reasoning with superpixel labeling. Tracklets characterize the varying spatial-temporal relations of a video object but quite often suffer from sporadic local outliers. To acquire high-quality tracklets, we propose a transductive inference model capable of calibrating short-range noisy object tracklets with respect to long-range dependencies and high-level context cues. At the center of this work lies a new paradigm for semantic video object segmentation that goes beyond modeling the appearance and motion of objects locally, where the semantic label is inferred by jointly exploiting multi-scale contextual information and the spatial-temporal relations of the video object. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state of the art by achieving superior accuracy compared with other leading methods.

TIST Journal 2016 Journal Article

Learning Contextualized Music Semantics from Tags Via a Siamese Neural Network

  • Ubai Sandouk
  • Ke Chen

Music information retrieval faces a challenge in modeling contextualized musical concepts formulated by a set of co-occurring tags. In this article, we investigate the suitability of our recently proposed approach based on a Siamese neural network in fighting off this challenge. By means of tag features and probabilistic topic models, the network captures contextualized semantics from tags via unsupervised learning. This leads to a distributed semantics space and a potential solution to the out of vocabulary problem, which has yet to be sufficiently addressed. We explore the nature of the resultant music-based semantics and address computational needs. We conduct experiments on three public music tag collections—namely, CAL500, MagTag5K and Million Song Dataset—and compare our approach to a number of state-of-the-art semantics learning approaches. Comparative results suggest that this approach outperforms previous approaches in terms of semantic priming and music tag completion.

NeurIPS Conference 2011 Conference Paper

Extracting Speaker-Specific Information with a Regularized Siamese Deep Network

  • Ke Chen
  • Ahmad Salman

Speech conveys different yet mixed information ranging from linguistic to speaker-specific components, and each of them should be exclusively used in a specific task. However, it is extremely difficult to extract a specific information component given the fact that nearly all existing acoustic representations carry all types of speech information. Thus, the use of the same representation in both speech and speaker recognition hinders a system from producing better performance due to interference of irrelevant information. In this paper, we present a deep neural architecture to extract speaker-specific information from MFCCs. As a result, a multi-objective loss function is proposed for learning speaker-specific characteristics and regularization via normalizing interference of non-speaker related information and avoiding information loss. With LDC benchmark corpora and a Chinese speech corpus, we demonstrate that a resultant speaker-specific representation is insensitive to text/languages spoken and environmental mismatches and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. We discuss relevant issues and relate our approach to previous work.

NeurIPS Conference 2007 Conference Paper

Regularized Boost for Semi-Supervised Learning

  • Ke Chen
  • Shihai Wang

Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work.

NeurIPS Conference 1998 Conference Paper

Perceiving without Learning: From Spirals to Inside/Outside Relations

  • Ke Chen
  • Deliang Wang

As a benchmark task, the spiral problem is well known in neural networks. Unlike previous work that emphasizes learning, we approach the problem from a generic perspective that does not involve learning. We point out that the spiral problem is intrinsically connected to the inside/outside problem. A generic solution to both problems is proposed based on oscillatory correlation using a time delay network. Our simulation results are qualitatively consistent with human performance, and we interpret human limitations in terms of synchrony and time delays, both biologically plausible. As a special case, our network without time delays can always distinguish these figures regardless of shape, position, size, and orientation.

ICRA Conference 1994 Conference Paper

Systematic Generation of Assembly Precedence Graphs

  • Ke Chen
  • Jean-Michel Henrioud

In this paper, the authors present a complete method for the systematic generation of assembly precedence graphs for mechanical products. The generated precedence graphs can be used as input data for different assembly line design methods, such as line balancing. The method involves two stages: the first stage generates all feasible operations, from which the second stage determines all the precedence graphs. The first stage is already operational, so attention is paid to describing the principle of the second stage and its mathematical support. The basic idea of this stage is to decompose the set of all feasible operations into several subsets that determine all the precedence graphs. The mathematical support for this idea is a fundamental theorem whose proof is given. Several notions are introduced in this paper, such as the assembly graph and the base graph. The algorithm of this stage is also proposed.