Author name cluster

Chen Feng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

TMLR Journal 2026 Journal Article

α-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

Sanbao Su
Nuo Chen
Chenchen Lin
Felix Juefei-Xu
Chen Feng
Fei Miao

Comprehending 3D scenes is paramount for tasks such as planning and mapping for autonomous vehicles and robotics. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address this, we propose an uncertainty-aware OCC method (α-OCC). We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58% and semantic segmentation by up to 12.95% across various OCC models. For uncertainty quantification (UQ), we propose the hierarchical conformal prediction (HCP) method, effectively handling the high-level class imbalance in OCC datasets. On the geometry level, the novel KL-based score function significantly improves the occupied recall (45%) of safety-critical classes with minimal performance overhead (3.4% reduction). On UQ, our HCP achieves smaller prediction set sizes while maintaining the defined coverage guarantee. Compared with baselines, it reduces set size by up to 90%, with a further 18% reduction when integrated with Depth-UP. Our contributions advance OCC accuracy and robustness, marking a noteworthy step forward in autonomous perception systems. Our code is publicly available at https://coperception.github.io/alpha-OCC/.

NeurIPS Conference 2025 Conference Paper

OmniDraft: A cross-vocabulary, online adaptive drafter for on-device speculative decoding

Ramchalam Kinattinkara Ramakrishnan
Zhaocong Yuan
Jay Zhuo
Chen Feng
Yicheng Lin
Chenzheng Su
Xiaopeng Zhang

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the “one drafter for all” paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.

AAAI Conference 2025 Conference Paper

PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks

Chen Feng
Ziquan Liu
Zhuo Zhi
Ilija Bogunovic
Carsten Gerner-Beuerle
Miguel Rodrigues

It is widely known that state-of-the-art machine learning models, including vision and language models, can be seriously compromised by adversarial perturbations. It is therefore increasingly relevant to develop capabilities to certify their performance in the presence of the most effective adversarial attacks. Our paper offers a new approach to certify the performance of machine learning models in the presence of adversarial attacks with population level risk guarantees. In particular, we introduce the notion of (α,ζ)-safe machine learning model. We propose a hypothesis testing procedure, based on the availability of a calibration set, to derive statistical guarantees providing that the probability of declaring that the adversarial (population) risk of a machine learning model is less than α (i.e. the model is safe), while the model is in fact unsafe (i.e. the model adversarial population risk is higher than α), is less than ζ. We also propose Bayesian optimization algorithms to determine efficiently whether a machine learning model is (α,ζ)-safe in the presence of an adversarial attack, along with statistical guarantees. We apply our framework to a range of machine learning models - including various sizes of vision Transformer (ViT) and ResNet models - impaired by a variety of adversarial attacks, such as PGDAttack, MomentumAttack, GenAttack and BanditAttack, to illustrate the operation of our approach. Importantly, we show that ViTs are generally more robust to adversarial attacks than ResNets, and large models are generally more robust than smaller models. Our approach goes beyond existing empirical adversarial risk-based certification guarantees. It formulates rigorous (and provable) performance guarantees that can be used to satisfy regulatory requirements mandating the use of state-of-the-art technical tools.

ICRA Conference 2025 Conference Paper

Robot-Based Automatic Charging for Electric Vehicles Using Incremental Learning and Biomimetic Control

Chao Zeng 0002
Dexi Ye
Ning Wang 0009
Chen Feng
Chenguang Yang 0001

With the growing popularity of electric vehicles, the demand for robot-based unmanned automatic charging has become both urgent and challenging. Two key challenges need to be addressed: how to efficiently locate the charging port, and how to compliantly insert the connector into the port. In this paper, we propose an incremental learning method based on the broad learning system to address the visual positioning error of the charging port. This method allows the robot to transfer and generalize the search skills learned in simulation to real-world scenarios. As a result, the robot can rapidly locate the charging port in real-world environments without the need for complex contact state modeling, time-consuming data collection, or model retraining. Subsequently, a biomimetic admittance controller is designed to enable the robot to adapt its compliant behavior online during the plugging process. Finally, experiments are performed on a UR robot to verify the effectiveness of our method.

TMLR Journal 2025 Journal Article

TFAR: A Training-Free Framework for Autonomous Reliable Reasoning in Visual Question Answering

Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
Lu Yin
Da Li
Ziquan Liu

Recent approaches introduce chain-of-thought (CoT) reasoning to mitigate challenges such as hallucination and reasoning deficits in multimodal large language models (MLLMs) and to enhance performance. However, existing CoT-based methods often rely on extensive data annotation and training. To overcome these limitations, we propose a training-free framework for autonomous and reliable reasoning (TFAR), which only uses common lightweight vision tools to improve the reasoning ability of MLLMs. TFAR enables an MLLM to autonomously and accurately identify relevant regions of interest (RoIs) and support CoT reasoning, without requiring additional training or annotations, and with low computational overhead during inference. However, the use of external tools introduces noise and uncertainty. To mitigate the uncertainty introduced by external tools and select the optimal pathway, we propose a conformal prediction-based uncertainty quantification method that calibrates the outputs from external tools and dynamically selects the most appropriate tool based on the MLLM’s output uncertainty. Experiments across five datasets demonstrate that TFAR improves performance over the base MLLM by an average of 4.6%, in some cases even outperforming fine-tuned baselines, while maintaining low inference cost. These results offer new insights into training-free CoT guidance for MLLMs and underscore the value of reliable visual tools.

ICLR Conference 2025 Conference Paper

Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings

Di Wu 0057
Siyuan Li 0002
Chen Feng
Lu Cao
Yue Zhang
Jie Yang 0033
Mohamad Sawan

Recent advancements in brain-computer interfaces (BCIs) and deep learning have made decoding lexical tones from intracranial recordings possible, providing the potential to restore the communication ability of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Particularly, the existing heterogeneous decoding paradigm (training subject-specific models with individual data) suffers from an intrinsic limitation: it fails to learn generalized neural representations and to leverage data across subjects. To this end, we introduce Homogeneity-Heterogeneity Disentangled Learning for Neural Representations (H2DiLR), a framework that disentangles and learns the homogeneity and heterogeneity from intracranial recordings of multiple subjects. To verify the effectiveness of H2DiLR, we collected stereoelectroencephalography (sEEG) from multiple participants reading Mandarin materials containing 407 syllables (covering nearly all Mandarin characters). Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, outperforms the naive heterogeneous decoding paradigm by a large margin. We also empirically show that H2DiLR indeed captures homogeneity and heterogeneity during neural representation learning.

AAAI Conference 2024 Short Paper

Biomedical Knowledge Graph Embedding with Householder Projection (Student Abstract)

Sensen Zhang
Xun Liang
Simin Niu
Xuan Zhang
Chen Feng
Yuefeng Ma

Researchers have applied knowledge graph embedding (KGE) techniques with advanced neural network techniques, such as capsule networks, for predicting drug-drug interactions (DDIs) and achieved remarkable results. However, most ignore molecular structure and position features between drug pairs. They also cannot model the significant relational mapping properties (RMPs: 1-N, N-1, N-N) of relations in the biomedical field. To solve these problems, we propose CDHse, which consists of two crucial modules: 1) Entity embedding module: we obtain the position feature with PubMedBERT and a Convolutional Neural Network (CNN), obtain the molecular structure feature with a Graph Neural Network (GNN), obtain the entity embedding feature of drug pairs, and then incorporate these features into one synthetic feature. 2) Knowledge graph embedding module: the synthetic feature undergoes Householder projection and is then embedded in the complex vector space for training. In this paper, we have selected several advanced models for the DDIs task and performed experiments on three standard BioKG to validate the effectiveness of CDHse.

NeurIPS Conference 2024 Conference Paper

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Yiming Li
Zehong Wang
Yue Wang
Zhiding Yu
Zan Gojcic
Marco Pavone
Chen Feng
Jose M. Alvarez

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust 3D representation learning problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residual mining, and robust optimization, 3DGM simultaneously performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

NeurIPS Conference 2024 Conference Paper

Multiview Scene Graph

Juexiao Zhang
Gao Zhu
Sihang Li
Xinhao Liu
Haorui Song
Xinran Tang
Chen Feng

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction, 3D bounding boxes in object detection, or voxel grids in occupancy prediction, or topological, such as pose graphs with loop closures in SLAM or visibility graphs in SfM. In this work, we propose to build Multiview Scene Graphs (MSG) from unposed images, representing a scene topologically with interconnected place and object nodes. The task of building MSG is challenging for existing representation learning methods since it needs to jointly address visual place recognition, object detection, and object association from images with limited fields of view and potentially large viewpoint changes. To evaluate any method tackling this task, we developed an MSG dataset and annotation based on a public 3D dataset. We also propose an evaluation metric based on the intersection-over-union score of MSG edges. Moreover, we develop a novel baseline method built on mainstream pretrained vision models, combining visual place recognition and object association into one Transformer decoder architecture. Experiments demonstrate that our method has superior performance compared to existing relevant baselines.

IROS Conference 2024 Conference Paper

Roofus: Learning-based Robotic Moisture Mapping on Flat Rooftops with Ground Penetrating Radar

Kevin Lee 0003
Wei-Heng Lin
Talha Javed
Sruti Madhusudhan
Bilal Sher
Chen Feng

Robust moisture detection is crucial for building maintenance and cost reduction. Current methods are often limited by the type of roofing material or are cumbersome and expensive. Ground Penetrating Radar (GPR) has shown promise in recent works in moisture detection due to its effectiveness across a broader range of materials, its compactness and lightweight nature, and its ability to image the subsurface. We introduce Roofus, an integrated robotic moisture detection system for flat rooftops, designed to overcome traditional method limitations. It combines a remotely controlled robot with deep learning GPR data processing and automatic map generation. Real-world data is collected and manually annotated for supervised learning. We investigate a novel approach to interpreting GPR data via deep learning using Transformer-based classifiers. LiDAR inertial odometry is employed to integrate multiple individual GPR scans into a holistic moisture map over the rooftop. When evaluated against existing methods such as infrared thermal imaging, electrical capacitance surveys, and nuclear moisture gauges, our system shows promising viability for industry application.

NeurIPS Conference 2024 Conference Paper

Stepping Forward on the Last Mile

Chen Feng
Shaojie Zhuo
Xiaopeng Zhang
Ramchalam K. Ramakrishnan
Zhaocong Yuan
Andrew Z. Li

Continuously adapting pre-trained models to local data on resource constrained edge devices is the last mile for model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low power neural processing engines (e.g., NPUs, DSPs, MCUs, etc.) are designed as fixed-point inference accelerators, without training capabilities. Forward gradients, solely based on directional derivatives computed from two forward calls, have been recently used for model training, with substantial savings in computation and memory. However, the performance of quantized training with fixed-point forward gradients remains unclear. In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients, by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both vision and audio domains. We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation. An empirical study on how training with forward gradients navigates in the loss landscape is further explored. Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.

NeurIPS Conference 2021 Conference Paper

Learning Distilled Collaboration Graph for Multi-Agent Perception

Yiming Li
Shunli Ren
Pengxiang Wu
Siheng Chen
Chen Feng
Wenjun Zhang

To promote better performance-bandwidth trade-off for multi-agent perception, we propose a novel distilled collaboration graph (DiscoGraph) to model trainable, pose-aware, and adaptive collaboration among agents. Our key novelties lie in two aspects. First, we propose a teacher-student framework to train DiscoGraph via knowledge distillation. The teacher model employs an early collaboration with holistic-view inputs; the student model is based on intermediate collaboration with single-view inputs. Our framework trains DiscoGraph by constraining post-collaboration feature maps in the student model to match the correspondences in the teacher model. Second, we propose a matrix-valued edge weight in DiscoGraph. In such a matrix, each element reflects the inter-agent attention at a specific spatial region, allowing an agent to adaptively highlight the informative regions. During inference, we only need to use the student model named as the distilled collaboration network (DiscoNet). Attributed to the teacher-student framework, multiple agents with the shared DiscoNet could collaboratively approach the performance of a hypothetical teacher model with a holistic view. Our approach is validated on V2X-Sim 1.0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation. Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. Our code is available on https://github.com/ai4ce/DiscoNet.

EAAI Journal 2021 Journal Article

Learning dynamic regression with automatic distractor repression for real-time UAV tracking

Changhong Fu
Fangqiang Ding
Yiming Li
Jin Jin
Chen Feng

With high efficiency and efficacy, the trackers based on the discriminative correlation filter have experienced rapid development in the field of unmanned aerial vehicle (UAV) over the past decade. In literature, these trackers aim at solving a regression problem in which the circulated samples are mapped into a Gaussian label for online filter training. However, the fixed target label for regression makes trackers lose adaptivity in uncertain tracking scenarios. One of the typical failure cases is that the distractors, e.g., background clutter, camouflage, and similar objects, are prone to confuse these trackers. In this work, an efficient approach to instantly monitor the local maximums of the response map for discovering distractors automatically is proposed. In addition, the regression target is accordingly learned, i.e., the location possessing local maximum indicates latent distractor and thus should be repressed by reducing its target response value in filter training. Qualitative and quantitative experiments performed on three challenging well-known benchmarks demonstrate that the presented method not only outperforms the state-of-the-art handcrafted feature-based trackers but also exhibits comparable performance compared to deep learning-based approaches. Specifically, the presented tracker has phenomenal practicability in real-time UAV applications with an average speed of ∼50 frames per second on an affordable CPU.

EAAI Journal 2020 Journal Article

A T–S fuzzy model identification approach based on evolving MIT2-FCRM and WOS-ELM algorithm

Chunyang Wei
Chaoshun Li
Chen Feng
Jianzhong Zhou
Yongchuan Zhang

Inter type-2 fuzzy model has been confirmed to be more effective in Takagi–Sugeno (T–S) fuzzy model identification compared to type-1 fuzzy model. It is indisputable that some algorithms based on inter type-2 fuzzy model have already been developed and shown remarkable modeling performance. To further improve the modeling accuracy, the optimization methods and the neural network are taken into consideration. In this paper, an evolving modified inter type-2 fuzzy c-regression model (MIT2-FCRM) algorithm based on gravitational search algorithm (GSA) and a consequent parameter identification method based on extreme learning machine algorithm with forgetting factor for processing online sequences (namely WOS-ELM) were proposed. Then a novel approach for T–S fuzzy modeling was presented, in which, the coefficients of the upper and lower hyperplanes were obtained by evolving MIT2-FCRM algorithm based on GSA, a hyper-plane-shaped membership function (MF) was utilized to identify the antecedent parameters of the T–S fuzzy model, and WOS-ELM was employed to identify the consequent parameters. The modeling results of six examples indicate that the proposed approach is superior to other studies in terms of identification accuracy, compact fuzzy rules and noise resistance ability.

ICRA Conference 2016 Conference Paper

A fully automated robotic system for three-dimensional cell rotation

Zenan Wang
Chen Feng
Ramadass Muruganandam
Joyce Mathew
Peng Cheang Wong
Wei Tech Ang
Steven Yih Min Tan
Win Tun Latt

Injection and extraction of materials (e.g., protein, sperm, DNA, and blastomeres) into and from cells are essential operations for in vitro fertilisation (IVF), intracytoplasmic sperm injection (ICSI) and preimplantation genetic diagnosis (PGD). In order to perform the injection and extraction without damage to the cell, the cell needs to be at an optimal orientation. In this paper, a fully automated microrobotic system for cell orientation is presented. The proposed system overcomes several inherent problems in manual cell manipulation, including low efficiency, inconsistent output and poor success rate. The system can rotate a single fish embryo to a desired orientation by employing motion control, fluidic flow control, and computer vision. It is evaluated by using 60 Zebrafish embryos. Experimental results show that the system is capable of performing fast embryo orientation at a speed of 13 s/cell with a high success rate of 97.5% and a high in-plane rotation precision of 0.5°.
