Author name cluster

Lu Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers

2 author rows

AAAI Conference 2026 Conference Paper

Uncertainty-Guided View-Strength-Aware Feature Utilization for Multi-View Classification

Li Lv
Qian Guo
Li Zhang
Liang Du
Bingbing Jiang
Lu Chen
Xinyan Liang

In multi-view classification (MVC) tasks, each view provides a unique perspective on the data, offering complementary information that can improve classification performance when properly integrated. However, traditional methods typically adopt a uniform processing strategy for all views before fusion, overlooking the fact that different views may require different treatments due to variations in their quality and informativeness. To address this limitation, we propose a novel framework called Uncertainty-Guided View-Strength-Aware Feature Utilization (UVF) for multi-view classification. Our approach introduces a view uncertainty estimation module to quantify the discriminative strength of each view. Based on this estimation, a Differentiated Feature Selector (DFS) adaptively selects features, retaining informative dimensions in weak views while preserving original features in strong views. Furthermore, we employ an uncertainty-guided fusion strategy that assigns dynamic weights to each view's contribution based on its uncertainty score, enhancing the robustness and reliability of the final decision. Experimental results on benchmark datasets demonstrate that our method significantly outperforms conventional approaches, achieving better classification accuracy and interpretability through strength-aware feature processing and fusion.
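The uncertainty-guided fusion described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the rule used here (weight proportional to one minus the uncertainty score, then normalised) is an assumption, and the function name `fuse_views` is hypothetical rather than taken from the paper.

```python
def fuse_views(view_probs, uncertainties):
    """Fuse per-view class probabilities using uncertainty-based weights.

    Assumption (not from the paper): each view's weight is proportional
    to 1 - uncertainty, so less uncertain views contribute more.
    """
    confidences = [max(0.0, 1.0 - u) for u in uncertainties]
    total = sum(confidences)
    if total == 0.0:
        # Every view is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(view_probs)] * len(view_probs)
    else:
        weights = [c / total for c in confidences]
    n_classes = len(view_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, view_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Two views, two classes: the first view is confident, the second is not.
view_probs = [[0.9, 0.1], [0.4, 0.6]]
uncertainties = [0.2, 0.8]
fused, weights = fuse_views(view_probs, uncertainties)
print(weights, fused)
```

With these illustrative numbers the low-uncertainty view receives a weight of about 0.8, so the fused prediction follows it rather than the weaker view.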

IJCAI Conference 2025 Conference Paper

An Association-based Fusion Method for Speech Enhancement

Shijie Wang
Qian Guo
Lu Chen
Liang Du
Zikun Jin
Zhian Yuan
Xinyan Liang

Deep learning-based speech enhancement (SE) methods predominantly draw upon two architectural frameworks: generative adversarial networks and diffusion models. In the realm of SE, capturing the local and global relations between signal frames is crucial for the success of these methods. These frameworks typically employ a UNet architecture as their foundational backbone, integrating Long Short-Term Memory (LSTM) networks or attention mechanisms within the UNet to effectively model both local and global signal relations. However, this coupled way of modeling relations may not fully harness their potential. In this paper, we propose an innovative Association-based Fusion Speech Enhancement method (AFSE), a decoupled method. AFSE first constructs a graph that encapsulates the association between each time window of the speech signal, and then models the global relations between frames by fusing the features of these time windows in a manner akin to graph neural networks. Furthermore, AFSE leverages a UNet with dilated convolutions to model the local relations, enabling the network to maintain a high-resolution representation while benefiting from a wider receptive field. Experimental results demonstrate that the AFSE method significantly improves performance in speech enhancement tasks, validating the effectiveness and superiority of our approach. The code is available at https://github.com/jie019/AFSE_IJCAI2025.

EAAI Journal 2025 Journal Article

BV-NORM: A neural operator learning framework for parametric boundary value problems on complex geometric domains in engineering

Zhiliang Deng
Qinglu Meng
Yingguang Li
Xu Liu
Gengxiang Chen
Lu Chen
Changqing Liu
Xiaozhong Hao

Boundary Value Problems (BVPs) are extensively employed in engineering for process modelling and optimisation. These problems frequently involve complex geometries and require solving large numbers of BVPs under different boundary conditions (BCs). Neural operators (NOs), capable of learning mappings between infinite-dimensional function spaces, present a potential solution for solving parametric BVPs. However, existing NOs are typically designed for scenarios where the input and output functions share the same domain, and are thus not applicable to BVPs in which the BC and solution functions are defined over different complex domains. Therefore, this study presents a novel deep learning framework called Boundary Value Neural Operator on Riemannian Manifolds (BV-NORM) for solving parametric BVPs involving complex geometric domains. BV-NORM introduces two sub-networks, Geometry-NET (Geo-NET) and Boundary condition-NET (BC-NET), to encode geometric information and boundary conditions. Consequently, the geometric information of the output domain can be incorporated into the learning process. Furthermore, the Laplace kernel integration module of NORM is employed to construct the sub-networks, thereby enhancing the capacity to analyse complex geometric domains. The performance of the proposed method is evaluated in four benchmark cases, including toy partial differential equation (PDE) cases and engineering applications, through comparisons with existing baseline neural operators. The experimental results validate that BV-NORM effectively addresses BVPs across various engineering scenarios.

NeurIPS Conference 2025 Conference Paper

Causal Spatio-Temporal Prediction: An Effective and Efficient Multi-Modal Approach

Yuting Huang
Ziquan Fang
Zhihao Zeng
Lu Chen
Yunjun Gao

Spatio-temporal prediction plays a crucial role in intelligent transportation, weather forecasting, and urban planning. While integrating multi-modal data has shown potential for enhancing prediction accuracy, key challenges persist: (i) inadequate fusion of multi-modal information, (ii) confounding factors that obscure causal relations, and (iii) high computational complexity of prediction models. To address these challenges, we propose E$^2$-CSTP, an Effective and Efficient Causal multi-modal Spatio-Temporal Prediction framework. E$^2$-CSTP leverages cross-modal attention and gating mechanisms to effectively integrate multi-modal data. Building on this, we design a dual-branch causal inference approach: the primary branch focuses on spatio-temporal prediction, while the auxiliary branch mitigates bias by modeling additional modalities and applying causal interventions to uncover true causal dependencies. To improve model efficiency, we integrate GCN with the Mamba architecture for accelerated spatio-temporal encoding. Extensive experiments on 4 real-world datasets show that E$^2$-CSTP significantly outperforms 9 state-of-the-art methods, achieving up to 9.66% improvements in accuracy as well as 17.37%-56.11% reductions in computational overhead.

AAAI Conference 2025 Conference Paper

Contrasting Adversarial Perturbations: The Space of Harmless Perturbations

Lu Chen
Shaofeng Li
Benhao Huang
Fan Yang
Zheng Li
Jie Li
Yuan Luo

Existing works have extensively studied adversarial examples, which are minimal perturbations that can mislead the output of deep neural networks (DNNs) while remaining imperceptible to humans. However, in this work, we reveal the existence of a harmless perturbation space: perturbations drawn from this space, regardless of their magnitudes, leave the network output unchanged when applied to inputs. Essentially, the harmless perturbation space emerges from the use of non-injective functions (linear or non-linear layers) within DNNs, enabling multiple distinct inputs to be mapped to the same output. For linear layers with input dimensions exceeding output dimensions, any linear combination of the orthogonal bases of the nullspace of the parameter matrix consistently yields no change in their output. For non-linear layers, the harmless perturbation space may expand, depending on the properties of the layers and input samples. Inspired by this property of DNNs, we solve for a family of general perturbation spaces that are redundant for the DNN's decision, and can be used to hide sensitive data and serve as a means of model identification. Our work highlights the distinctive robustness of DNNs (i.e., consistency under large-magnitude perturbations) in contrast to adversarial examples (vulnerability to small noise).
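The nullspace argument for linear layers in the abstract above can be checked on a toy case. The sketch below is an illustration under stated assumptions, not the authors' code: the 2x3 weight matrix `W`, the input `x`, and the helper names are all made up for the example. For a 2x3 matrix with independent rows, the cross product of the rows spans the nullspace.

```python
def matvec(W, x):
    """Apply a bias-free linear layer: returns W @ x as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cross(a, b):
    """Cross product of two 3-vectors; orthogonal to both inputs."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


# Hypothetical layer with 3 inputs and 2 outputs (input dim > output dim).
W = [[1, 2, 3], [4, 5, 6]]

# The cross product of the two rows is orthogonal to both rows,
# so it lies in the nullspace of W.
null_dir = cross(W[0], W[1])


def perturb(x, scale):
    """Add a perturbation of arbitrary magnitude along the nullspace direction."""
    return [xi + scale * ni for xi, ni in zip(x, null_dir)]


x = [1, 0, -1]
print(matvec(W, x))                   # output for the clean input
print(matvec(W, perturb(x, 1000)))    # output for the heavily perturbed input
```

Both calls print the same vector even though the perturbed input is far from `x`, which is the sense in which such a perturbation is harmless for this layer. Integer arithmetic is used so the equality is exact.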

IROS Conference 2025 Conference Paper

Disambiguate Gripper State in Grasp-Based Tasks: Pseudo-Tactile as Feedback Enables Pure Simulation Learning

Yifei Yang
Lu Chen
Zherui Song
Yenan Chen
Wentao Sun
Zhongxiang Zhou
Rong Xiong
Yue Wang 0020

Grasp-based manipulation tasks are fundamental to robots interacting with their environments, yet gripper state ambiguity significantly reduces the robustness of imitation learning policies for these tasks. Data-driven solutions face the challenge of high real-world data costs, while simulation data, despite its low cost, is limited by the sim-to-real gap. We identify the root cause of gripper state ambiguity as the lack of tactile feedback. To address this, we propose a novel approach employing pseudo-tactile signals as feedback, inspired by the idea of using a force-controlled gripper as a tactile sensor. This method enhances policy robustness without additional data collection or hardware, while providing a noise-free binary gripper state observation for the policy and thus facilitating pure simulation learning to unleash the power of simulation. Experimental results across three real-world grasp-based tasks demonstrate the necessity, effectiveness, and efficiency of our approach. Videos are available on the project page.

EAAI Journal 2025 Journal Article

Fast prediction and compensation of curing deformation behaviours of composite parts with complex geometry based on neural operator on Riemannian manifolds

Lu Chen
Yingguang Li
Jingyan Su
Weiwei Xu
Lin Hu
Gengxiang Chen
Xu Liu
Xiaozhong Hao

Controlling the curing deformation of composite parts is becoming increasingly challenging with the ever-increasing performance requirements of aerospace equipment. Mould surface compensation, which adjusts the mould surface to minimise the discrepancy between the cured part geometry and the nominal part geometry, has become a primary means of deformation control in engineering. The existing mirror compensation methods focus on deformation dominated by spring-in and struggle with complex deformation modes. Surrogate model-based shape optimisation offers a feasible alternative, but establishing a surrogate model to predict curing deformation fields on complex part geometries remains a challenge. Therefore, this study explores a novel neural operator-driven framework for fast curing deformation prediction and compensation. A clustering-based deformation field segmentation method is proposed to manipulate the mould surface morphing using limited design variables. The neural operator on Riemannian manifolds is introduced for the first time to establish the surrogate model between the mould surface and the curing deformation fields on complex part geometries. To control the global error distribution of the composite part, two error metrics are designed to optimise the mould surface with a genetic algorithm. The verification results show that the proposed framework exhibits significant potential in predicting and compensating for the curing deformation field of composite parts with complex geometry.

IJCAI Conference 2025 Conference Paper

Frequency-Aware Deep Depth from Focus

Tao Yan
Yingying Wang
Jiangfeng Zhang
Yuhua Qian
Jieru Jia
Lu Chen
Feijiang Li

In large aperture imaging, the shallow depth of field (DoF) phenomenon requires capturing multiple images at different focal levels, allowing us to infer depth information using depth from focus (DFF) techniques. However, most previous works design convolutional neural networks from a time domain perspective, often leading to blurred fine details in depth estimation. In this work, we propose a frequency-aware deep DFF network (FAD) that couples multi-scale spatial domain local features with frequency domain global structural features. Our main innovations include two key points: First, we introduce a frequency domain feature extraction module that uses the Fourier transform to transfer latent focus features into the frequency domain. This module adaptively captures essential frequency information for focus changes through element-wise multiplication, enhancing fine details in depth results while preserving global structural integrity. Second, the time-frequency joint module of FAD improves the consistency of depth information in sparse texture regions and the continuity in transition areas from both local and global complementary perspectives. Comprehensive experiments demonstrate that our model achieves compelling generalization and state-of-the-art depth prediction across various datasets. Additionally, it can be quickly adapted to real-world applications as a pre-trained model.
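The frequency-domain step described above (Fourier transform, element-wise multiplication, inverse transform) can be sketched on a 1-D signal. This is a minimal illustration under assumptions: the paper operates on latent focus features with learned weights, whereas here the weights are fixed by hand, the transform is a naive DFT, and the names `dft`, `idft`, and `frequency_filter` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def frequency_filter(signal, weights):
    """Reweight each frequency bin element-wise, then return to the signal domain.

    Only the real part is returned; for a real-valued result the weights
    should be symmetric across positive and negative frequencies.
    """
    spectrum = dft(signal)
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [v.real for v in idft(filtered)]


signal = [1.0, 2.0, 3.0, 4.0]
# Keeping every bin reproduces the input up to rounding error.
print(frequency_filter(signal, [1, 1, 1, 1]))
# Keeping only the zero-frequency bin leaves the global average.
print(frequency_filter(signal, [1, 0, 0, 0]))
```

Because each frequency bin depends on every sample, a single element-wise weight acts on the whole signal at once, which is why such a module can capture global structure that local convolutions miss.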

NeurIPS Conference 2025 Conference Paper

FuncGenFoil: Airfoil Generation and Editing Model in Function Space

Jinouwen Zhang
Junjie Ren
Ma Qianhong
Jianyu Wu
Aobo Yang
Yan Lu
Lu Chen
Hairun Xie

Aircraft manufacturing is the jewel in the crown of industry, in which generating high-fidelity airfoil geometries with controllable and editable representations remains a fundamental challenge. Existing deep learning methods, which typically rely on predefined parametric representations (e.g., Bézier curves) or discrete point sets, face an inherent trade-off between expressive power and resolution adaptability. To tackle this challenge, we introduce FuncGenFoil, a novel function-space generative model that directly reconstructs airfoil geometries as function curves. Our method inherits the advantages of arbitrary-resolution sampling and smoothness from parametric functions, as well as the strong expressiveness of discrete point-based representations. Empirical evaluations demonstrate that FuncGenFoil improves upon state-of-the-art methods in airfoil generation, achieving a relative 74.4% reduction in label error and a 23.2% increase in diversity on the AF-200K dataset. Our results highlight the advantages of function-space modeling for aerodynamic shape optimization, offering a powerful and flexible framework for high-fidelity airfoil design.

JBHI Journal 2025 Journal Article

Hierarchical Multi-Scale Enhanced Transformer for Medical Image Segmentation

Yantao Song
Yunli Lu
Lu Chen
Yimin Luo

Segmentation is an important prerequisite for developing model healthcare systems, particularly for disease diagnosis and treatment planning. In the field of medical image segmentation, the U-shaped architecture, commonly referred to as U-Net, has emerged as the de facto standard and achieved remarkable success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Recent transformer-based models, designed for sequence-to-sequence prediction, have emerged as an alternative to traditional architectures, featuring innate global self-attention mechanisms. Unfortunately, they may sometimes suffer from limited localization abilities due to a lack of sufficient low-level details. To combine the merits of both Transformers and U-Net, in this paper, we propose a novel two-channel self-attention mechanism U-network, which performs feature extraction from two channels, CNN and Transformer, respectively. Compared to previous models, we propose two hierarchical feature fusion strategies from both spatial and channel dimensions. Moreover, to further improve model performance, a loss function that can dynamically adjust the weights according to the output of each layer is constructed. Experimental results on five different datasets show that our method consistently outperforms state-of-the-art methods, and it also generalizes well to various medical image modalities.

IJCAI Conference 2025 Conference Paper

Human-Centric Foundation Models: Perception, Generation and Agentic Modeling

Shixiang Tang
Yizhou Wang
Lu Chen
Yuan Wang
Sida Peng
Dan Xu
Wanli Ouyang

Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs), inspired by the success of generalist models such as large language and vision models, have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis; and (4) Human-centric Agentic Foundation Models that extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied tasks. We review state-of-the-art techniques and discuss emerging challenges and future research directions. This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent modeling of digital humans and embodiments. The website is https://github.com/HumanCentricModels/Awesome-Human-Centric-Foundation-Models/

NeurIPS Conference 2025 Conference Paper

MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation

Yang Han
Pengyu Wang
Kai Yu
Xin Chen
Lu Chen

Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint–molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation tasks. The pretrained model is subsequently transferred to experimental spectra through finetuning on fingerprint predictions generated with MIST, a pre-trained spectral inference model, thereby enhancing robustness to real-world spectral variability. While finetuning alleviates the distributional difference, MS-BART still suffers from molecular hallucination and requires further alignment. We therefore introduce a chemical feedback mechanism that guides the model toward generating molecules closer to the reference structure. Extensive evaluations demonstrate that MS-BART achieves SOTA performance across 5/12 key metrics on MassSpecGym and NPLIB1 and is faster by one order of magnitude than competing diffusion-based methods, while comprehensive ablation studies systematically validate the model's effectiveness and robustness. We provide the data and code at https://github.com/OpenDFM/MS-BART.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
Shulin Xin
Linhao Zhang
Qi Liu
Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.

IROS Conference 2025 Conference Paper

Natural Humanoid Robot Locomotion with Generative Motion Prior

Haodong Zhang
Liang Zhang
Zhenghan Chen
Lu Chen
Yue Wang 0020
Rong Xiong

Natural and lifelike locomotion remains a fundamental challenge for humanoid robots to interact with human society. However, previous methods either neglect motion naturalness or rely on unstable and ambiguous style rewards. In this paper, we propose a novel Generative Motion Prior (GMP) that provides fine-grained motion-level supervision for the task of natural humanoid robot locomotion. To leverage natural human motions, we first employ whole-body motion retargeting to effectively transfer them to the robot. Subsequently, we train a generative model offline to predict future natural reference motions for the robot based on a conditional variational auto-encoder. During policy training, the generative motion prior serves as a frozen online motion generator, delivering precise and comprehensive supervision at the trajectory level, including joint angles and keypoint positions. The generative motion prior significantly enhances training stability and improves interpretability by offering detailed and dense guidance throughout the learning process. Experimental results in both simulation and real-world environments demonstrate that our method achieves superior motion naturalness compared to existing approaches. Project page can be found at https://sites.google.com/view/humanoid-gmp

EAAI Journal 2025 Journal Article

Objective transformation-based and niche-based many-objective evolutionary algorithm with a two-step coordination mechanism

Jiale Luo
Qinghua Gu
Xuexian Li
Lu Chen

Evolutionary algorithms have emerged as powerful tools for optimization. However, striking a balance between convergence and diversity in many-objective optimization remains a significant challenge. To address this gap, we propose TSEA-OTN, an objective transformation-based and niche-based many-objective evolutionary algorithm with a two-step coordination mechanism. Uniquely, TSEA-OTN operates without relying on relaxed Pareto dominance, reference vectors, or additional indicators. Instead, it utilizes prior knowledge about the curvature of the PF (Pareto optimal front) to transform the objectives of the population and establish niches. Additionally, a niche-assisted density estimation method is designed to measure the distribution of individuals. The environmental selection process incorporates a two-step mechanism: in the first step, the niche-assisted density estimation method identifies crowded individuals to prioritize diversity; in the second step, the Euclidean distance among transformed individuals and convergence evaluation criteria are used to eliminate individuals within the same niche to promote convergence. Finally, TSEA-OTN is evaluated against six state-of-the-art algorithms on the DTLZ (Deb-Thiele-Laumanns-Zitzler), MaF (Many-objective function), and WFG (Walking Fish Group) benchmark suites, as well as an engineering case study. Experimental results demonstrate the competitive performance of TSEA-OTN in solving many-objective optimization problems. This research not only advances the field of evolutionary computation but also provides novel solutions for real-world optimization.

NeurIPS Conference 2025 Conference Paper

Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

Da Ma
Gonghu Shang
Zhi Chen
Libo Qin
Yijie LUO
Hongshen Xu
Lei Pan
Shuai Fan

Instruction tuning improves the ability of large language models (LLMs) to follow diverse human instructions, but achieving strong performance on specific target tasks remains challenging. A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment methods, the latter of which critically rely on the underlying sample representation. In practice, most distribution alignment methods, from shallow features (e.g., BM25) to neural embeddings (e.g., BGE, LLM2Vec), may fail to capture how the model internally processes samples. To bridge this gap, we adopt a model-centric strategy in which each sample is represented by its neuronal activation pattern in the model, directly reflecting internal computation. However, directly using raw neuron activations leads to spurious similarity between unrelated samples due to neuron polysemanticity, where a single neuron may respond to multiple, unrelated concepts. To address this, we employ sparse autoencoders to disentangle polysemantic activations into sparse, monosemantic representations, and introduce a dedicated similarity metric for this space to better identify task-relevant data. Comprehensive experiments across multiple instruction datasets, models, tasks, and selection ratios show that our approach consistently outperforms existing data selection baselines in both stability and task-specific performance.

NeurIPS Conference 2025 Conference Paper

TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems

Jiahao Yu
Haozhuang Liu
Yeqiu Yang
Lu Chen
Jian Wu
Yuning Jiang
Bo Zheng

Regression models are crucial in recommender systems. However, the retransformation bias problem has been conspicuously neglected within the community. While many works in other fields have devised effective bias correction methods, all of them are post-hoc cures external to the model, facing practical challenges when applied to real-world recommender systems. Hence, we propose a preemptive paradigm to eradicate the bias intrinsically from the models via minor model refinement. Specifically, a novel TranSUN method is proposed that learns the bias jointly with the model, offering theoretically guaranteed unbiasedness with empirically superior convergence. It is further generalized into a novel generic regression model family, termed Generalized TranSUN (GTS), which not only offers more theoretical insights but also serves as a generic framework for flexibly developing various bias-free models. Comprehensive experimental results demonstrate the superiority of our methods across data from various domains. The methods have been successfully deployed in two real-world industrial recommendation scenarios, i.e., product and short video recommendation in the Guess What You Like business domain on the homepage of the Taobao App (a leading e-commerce platform with DAU > 300M), to serve the major online traffic.

IJCAI Conference 2025 Conference Paper

View-Association-Guided Dynamic Multi-View Classification

Xinyan Liang
Li Lv
Qian Guo
Bingbing Jiang
Feijiang Li
Liang Du
Lu Chen

In multi-view classification tasks, integrating information from multiple views effectively is crucial for improving model performance. However, most existing methods fail to fully leverage the complex relationships between views, often treating them independently or using static fusion strategies. In this paper, we propose a View-Association-Guided Dynamic Multi-View Classification method (AssoDMVC) to address these limitations. Our approach dynamically models and incorporates the relationships between different views during the classification process. Specifically, we introduce a view-relation-guided mechanism that captures the dependencies and interactions between views, allowing for more flexible and adaptive feature fusion. This dynamic fusion strategy ensures that each view contributes optimally based on its contextual relevance and the inter-view relationships. Extensive experiments on multiple benchmark datasets demonstrate that our method outperforms traditional multi-view classification techniques, offering a more robust and efficient solution for tasks involving complex multi-view data.

EAAI Journal 2024 Journal Article

A ship-radiated noise classification method based on domain knowledge embedding and attention mechanism

Lu Chen
Xinwei Luo
Hanlu Zhou

Ship classification based on machine learning (ML) has proven to be a significant underwater acoustic research direction. One of the critical challenges is how to embed domain signal knowledge into ML models to obtain suitable features that highly correlate with the classification and create better predictors. In this paper, a novel ML-based ship classification model, Hierarchical Underwater Acoustic Transformer (HUAT), is proposed to improve the classification performance. Firstly, the Detection of Envelope Modulation on Noise (DEMON) spectra of ship-radiated noise signals are estimated by cyclostationary analysis. The motivation for using a DEMON-based preprocessing scheme is that valuable propeller information can be revealed by exploiting the second-order cyclostationarity of ship-radiated noise signals. Secondly, the useful features of DEMON spectra are enhanced using a multi-head self-attention module, and the potential features of the Mel spectrograms are extracted using a Convolutional Neural Network (CNN) module. The two kinds of features are fused to provide ship classification patterns. The challenge of feature learning in the deep classification model is reduced by leveraging domain-related classification knowledge. Finally, the Swin Transformer, based on a shifted-window self-attention mechanism, is used to learn high-level feature representations and conduct ship classification. Experimental results show that the HUAT model achieves excellent classification performance on the ship-radiated noise datasets ShipsEar and DeepShip. Its classification efficiency is also better than that of a model based on the traditional Transformer architecture. In addition, the proposed method provides technical support for underwater intelligent systems capable of automatically sensing sailing vessels and recognizing vessel types.

ICLR Conference 2024 Conference Paper

Defining and extracting generalizable interaction primitives from DNNs

Lu Chen
Siyu Lou
Benhao Huang
Quanshi Zhang

Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns without losing much information represents a core challenge in explainable AI. To this end, Ren et al. (2024) have derived a series of theorems to prove that the inference score of a DNN can be explained as a small set of interactions between input variables. However, the lack of generalization power makes it still hard to consider such interactions as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions can better reflect common knowledge shared by different DNNs.

Details

TAAS Journal 2024 Journal Article

FASDSA: A Flexible Adaptive and Secure Data Sharing Architecture

Zixuan Wang
Pan Wang
Zhixin Sun
Xiaokang Zhou
MengYi Fu
MinYao Liu
Xintong Wang
Lu Chen

With the development of Web 3.0 and Metaverse technologies, the ability of autonomous vehicles has been dramatically improved. These technologies have decentralized features that break the traditional data-sharing mode, grant users control over their data, and achieve benefits through data sharing, promoting the widespread circulation of data. To ensure data exchange security, flexibility, and reliability, this paper proposes FASDSA: A Flexible, Adaptive, and Secure Data Sharing Architecture for CAVs with Web 3.0 and Metaverse. This architecture has three advantages: First, it adopts a decentralized, federated learning and CAV role division method, which allows different computational power CAVs to participate in data sharing according to their roles, achieving flexible data privacy protection. Second, it has the ability of tampered model detection based on interpretable analysis, which can effectively ensure that the model is not tampered with. Third, it has a reward mechanism based on work contribution and trust assessment, which uses blockchain technology to ensure the continuous security operation of this architecture. To verify the performance of FASDSA, we used the UNSW-NB15 dataset to conduct three experiments. The experimental results indicate that compared to traditional methods, FASDSA possesses greater flexibility and security while maintaining similar or even superior model performance.

Details DOI

AAAI Conference 2024 Conference Paper

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

Yaqi Zhang
Di Huang
Bin Liu
Shixiang Tang
Yan Lu
Lu Chen
Lei Bai
Qi Chu

Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for generating consecutive human motions by treating multimodal signals as special input tokens in large language models (LLMs). Specifically, we first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction to ask the LLMs to generate the motion answer. Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters. To the best of our knowledge, MotionGPT is the first method to generate human motion by multimodal control signals, which we hope can shed light on this new direction. Visit our webpage at https://qiqiapink.github.io/MotionGPT/.

PDF Details DOI

IROS Conference 2024 Conference Paper

Multimodal Evolutionary Encoder for Continuous Vision-Language Navigation

Zongtao He
Liuyi Wang
Lu Chen
Shu Li 0005
Qingqing Yan
Chengju Liu
Qijun Chen

Can a multimodal encoder evolve when facing increasingly tough circumstances? Our work investigates this possibility in the context of continuous vision-language navigation (continuous VLN), which aims to navigate robots under linguistic supervision and visual feedback. We propose a multimodal evolutionary encoder (MEE) comprising a unified multimodal encoder architecture and an evolutionary pre-training strategy. The unified multimodal encoder unifies rich modalities, including depth and sub-instruction, to strengthen the understanding of environments and tasks. It also effectively utilizes monocular observation, reducing the reliance on panoramic vision. The evolutionary pre-training strategy exposes the encoder to increasingly unfamiliar data domains and difficult objectives. The multi-stage adaptation helps the encoder establish robust intra- and inter-modality connections and improve its generalization to unfamiliar environments. To achieve such evolution, we collect a large-scale multi-stage dataset with specialized objectives, addressing the absence of suitable continuous VLN pre-training. Evaluation on VLN-CE demonstrates the superiority of MEE over other direct action-predicting methods. Furthermore, we deploy MEE in real scenes using self-developed service robots, showcasing its effectiveness and potential for real-world applications. Our code and dataset are available at https://github.com/RavenKiller/MEE.

Details

AAAI Conference 2024 Conference Paper

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhennan Shen
Baocai Chen
Lu Chen
Kai Yu

Recently, there has been growing interest in using Large Language Models (LLMs) for scientific research. Numerous benchmarks have been proposed to evaluate the ability of LLMs for scientific research. However, current benchmarks are mostly based on pre-collected objective questions. This design suffers from the data leakage problem and lacks evaluation of subjective Q/A ability. In this paper, we propose SciEval, a comprehensive and multi-disciplinary evaluation benchmark to address these issues. Based on Bloom's taxonomy, SciEval covers four dimensions to systematically evaluate scientific research ability. In particular, we design a "dynamic" subset based on scientific principles to protect the evaluation from potential data leakage. Both objective and subjective questions are included in SciEval. These characteristics make SciEval a more effective benchmark for evaluating the scientific research ability of LLMs. Comprehensive experiments on the most advanced LLMs show that, although GPT-4 achieves SOTA performance compared to other LLMs, there is still substantial room for improvement, especially for dynamic questions. The code and data are publicly available at https://github.com/OpenDFM/SciEval.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Ruisheng Cao
Fangyu Lei
Haoyuan Wu
Jixuan Chen
Yeqiao Fu
Hongcheng Gao
Xinzhuang Xiong
Hanchong Zhang

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflow. Our code and data are available at https://spider2-v.github.io.

PDF Details DOI

ICLR Conference 2023 Conference Paper

DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images

Bing Wang 0013
Lu Chen
Bo Yang 0027

In this paper, we study the problem of 3D scene geometry decomposition and manipulation from 2D views. By leveraging the recent implicit neural representation techniques, particularly the appealing neural radiance fields, we introduce an object field component to learn unique codes for all individual objects in 3D space only from 2D supervision. The key to this component is a series of carefully designed loss functions to enable every 3D point, especially in non-occupied space, to be effectively optimized even without 3D labels. In addition, we introduce an inverse query algorithm to freely manipulate any specified 3D object shape in the learned scene representation. Notably, our manipulation algorithm can explicitly tackle key issues such as object collisions and visual occlusions. Our method, called DM-NeRF, is among the first to simultaneously reconstruct, decompose, manipulate and render complex 3D scenes in a single pipeline. Extensive experiments on three datasets clearly show that our method can accurately decompose all 3D objects from 2D views, allowing any interested object to be freely manipulated in 3D space such as translation, rotation, size adjustment, and deformation.

Details

ICML Conference 2023 Conference Paper

HarsanyiNet: Computing Accurate Shapley Values in a Single Forward Propagation

Lu Chen
Siyu Lou
Keyan Zhang
Jin Huang
Quanshi Zhang

The Shapley value is widely regarded as a trustworthy attribution metric. However, when people use Shapley values to explain the attribution of input variables of a deep neural network (DNN), it usually requires a very high computational cost to approximate relatively accurate Shapley values in real-world applications. Therefore, we propose a novel network architecture, the HarsanyiNet, which makes inferences on the input sample and simultaneously computes the exact Shapley values of the input variables in a single forward propagation. The HarsanyiNet is designed on the theoretical foundation that the Shapley value can be reformulated as the redistribution of Harsanyi interactions encoded by the network.
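The identity the abstract relies on, that a Shapley value is a redistribution of Harsanyi interactions, can be checked by brute force on a small game. The sketch below is not HarsanyiNet: it enumerates every subset, which is exactly the exponential cost the paper aims to avoid. The three-player `toy_game` and all function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def harsanyi_interactions(players, v):
    """Harsanyi interaction I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(players)
    }


def shapley_from_harsanyi(players, v):
    """Each interaction I(S) is split evenly among the |S| players in S."""
    interactions = harsanyi_interactions(players, v)
    return {
        i: sum(Fraction(I, len(S)) for S, I in interactions.items() if i in S)
        for i in players
    }


def shapley_by_permutations(players, v):
    """Classical definition: average marginal contribution over all orderings."""
    totals = {i: Fraction(0) for i in players}
    orders = list(permutations(players))
    for order in orders:
        seen = frozenset()
        for i in order:
            totals[i] += v(seen | {i}) - v(seen)
            seen = seen | {i}
    return {i: total / len(orders) for i, total in totals.items()}


def toy_game(S):
    """Hypothetical value function with one pairwise and one three-way effect."""
    score = 0
    if "a" in S:
        score += 2
    if "b" in S:
        score += 1
    if {"a", "b"} <= S:
        score += 3
    if {"a", "b", "c"} <= S:
        score += 6
    return score


PLAYERS = ("a", "b", "c")
print(shapley_from_harsanyi(PLAYERS, toy_game))
print(shapley_by_permutations(PLAYERS, toy_game))
```

Both routes give the same attributions for this game, and the three values sum to the value of the full player set, as the efficiency property of the Shapley value requires.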

Details

NeurIPS Conference 2023 Conference Paper

Large Language Models Are Semi-Parametric Reinforcement Learning Agents

Danyang Zhang
Lu Chen
Situo Zhang
Hongshen Xu
Zihan Zhao
Kai Yu

Inspired by insights from cognitive science into human memory and reasoning mechanisms, a novel evolvable LLM-based (Large Language Model) agent framework is proposed as Rememberer. By equipping the LLM with a long-term experience memory, Rememberer is capable of exploiting experiences from past episodes even for different task goals, which gives it an advantage over an LLM-based agent with fixed exemplars or one equipped with a transient working memory. We further introduce Reinforcement Learning with Experience Memory (RLEM) to update the memory. Thus, the whole system can learn from the experiences of both success and failure, and evolve its capability without fine-tuning the parameters of the LLM. In this way, the proposed Rememberer constitutes a semi-parametric RL agent. Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. The average results with different initialization and training sets exceed the prior SOTA by 4% and 2% for the success rate on two task sets and demonstrate the superiority and robustness of Rememberer.

PDF Details

ICML Conference 2023 Conference Paper

Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation

Ronghao Dang
Lu Chen
Liuyi Wang
Zongtao He
Chengju Liu
Qijun Chen

We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together. Based on the MAD paradigm, we design a multiple thinking (MT) model that leverages distinct thinking to abstract various meta-abilities. Our method decouples meta-abilities from three aspects: input, encoding, and reward while employing the multiple thinking collaboration (MTC) module to promote mutual cooperation between thinking. MAD introduces a novel qualitative and quantitative interpretability system for object navigation. Through extensive experiments on AI2-Thor and RoboTHOR, we demonstrate that our method outperforms state-of-the-art (SOTA) methods on both typical and zero-shot object navigation tasks.

Details

IROS Conference 2023 Conference Paper

Robust Real-Time Motion Retargeting via Neural Latent Prediction

Tiantian Wang 0006
Haodong Zhang
Lu Chen
Dongqi Wang 0002
Yue Wang 0020
Rong Xiong

Human-robot motion retargeting is a crucial approach for quickly learning motion skills. Achieving real-time retargeting demands high levels of synchronization and accuracy. Even though existing retargeting methods compute quickly, they still introduce a time delay into synchronous retargeting. To mitigate this issue, this paper proposes a motion retargeting method guided by prediction, which effectively reduces the adverse impact of the time delay. The proposed pipeline contains motion retargeting in a spatial-temporal graph-based structure and motion prediction in the latent space. The motion sequence retargeting builds a mapping and paired data from human poses to corresponding robot configurations for training the prediction model, and the generated robot motion satisfies joint-limit and self-collision constraints. The controller guided by prediction imports future robot joint motion to achieve advanced trajectory tracking, thereby compensating for the delay time spent on calculation and tracking. Experimental results show that our method outperforms other methods in terms of synchronization and similarity. Furthermore, our method exhibits fault-tolerant capability in scenarios involving the loss of human information input.

Details

NeurIPS Conference 2022 Conference Paper

Less-forgetting Multi-lingual Fine-tuning

Yuren Mao
Yaobo Liang
Nan Duan
Haobo Wang
Kai Wang
Lu Chen
Yunjun Gao

Multi-lingual fine-tuning (MLF), which fine-tunes a multi-lingual language model (MLLM) with multiple source languages, aims to gain good zero-shot performance on target languages. In MLF, the fine-tuned model tends to fit the source languages while forgetting its cross-lingual knowledge obtained from the pre-training stage. This forgetting phenomenon degenerates the zero-shot performance of MLF, which remains under-explored. To fill this gap, this paper proposes a multi-lingual fine-tuning method, dubbed Less-forgetting Multi-lingual Fine-tuning (LF-MLF). In LF-MLF, we cast multi-lingual fine-tuning as a constrained optimization problem, where the optimization objective is to minimize forgetting, and constraints are reducing the fine-tuning loss. The proposed method has superior zero-shot performance; furthermore, it can achieve the Pareto stationarity. Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals.

PDF Details

IJCAI Conference 2022 Conference Paper

MetaER-TTE: An Adaptive Meta-learning Model for En Route Travel Time Estimation

Yu Fan
Jiajie Xu
Rui Zhou
Jianxin Li
Kai Zheng
Lu Chen
Chengfei Liu

En route travel time estimation (ER-TTE) aims to predict the travel time on the remaining route. Since the traveled and remaining parts of a trip usually have some common characteristics like driving speed, it is desirable to explore these characteristics for improved performance via effective adaptation. This yet faces the severe problem of data sparsity due to the few sampled points in a traveled partial trajectory. Since trajectories with different contextual information tend to have different characteristics, the existing meta-learning method for ER-TTE cannot fit each trajectory well because it uses the same model for all trajectories. To this end, we propose a novel adaptive meta-learning model called MetaER-TTE. Particularly, we utilize soft-clustering and derive cluster-aware initialized parameters to better transfer the shared knowledge across trajectories with similar contextual information. In addition, we adopt a distribution-aware approach for adaptive learning rate optimization, so as to avoid task-overfitting which will occur when guiding the initial parameters with a fixed learning rate for tasks under imbalanced distribution. Finally, we conduct comprehensive experiments to demonstrate the superiority of MetaER-TTE.

PDF Details DOI

JBHI Journal 2022 Journal Article

RFormer: Transformer-Based Generative Adversarial Network for Real Fundus Image Restoration on a New Clinical Benchmark

Zhuo Deng
Yuanhao Cai
Lu Chen
Zheng Gong
Qiqi Bao
Xue Yao
Dong Fang
Wenming Yang

Ophthalmologists have used fundus images to screen and diagnose eye diseases. However, differences in equipment and between ophthalmologists cause large variations in the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, no real clinical benchmark has been explored for this task so far. In this paper, we investigate the real clinical fundus image restoration problem. First, we establish a clinical dataset, Real Fundus (RF), including 120 low- and high-quality (HQ) image pairs. Then we propose a novel Transformer-based Generative Adversarial Network (RFormer) to restore the real degradation of clinical fundus images. The key component in our network is the Window-based Self-Attention Block (WSAB), which captures non-local self-similarity and long-range dependencies. To produce more visually pleasant results, a Transformer-based discriminator is introduced. Extensive experiments on our clinical benchmark show that the proposed RFormer significantly outperforms the state-of-the-art (SOTA) methods. In addition, experiments on downstream tasks such as vessel segmentation and optic disc/cup detection demonstrate that our proposed RFormer benefits clinical fundus image analysis and applications.

Details DOI

IJCAI Conference 2022 Conference Paper

When Transfer Learning Meets Cross-City Urban Flow Prediction: Spatio-Temporal Adaptation Matters

Ziquan Fang
Dongen Wu
Lu Pan
Lu Chen
Yunjun Gao

Urban flow prediction is a fundamental task in building smart cities, where neural networks have become the most popular method. However, deep learning methods typically rely on massive training data that are probably inaccessible in the real world. In light of this, the community calls for knowledge transfer. However, when adapting transfer learning for cross-city prediction tasks, existing studies are built on static knowledge transfer, ignoring the fact that inter-city correlations change dynamically over time. The dynamic correlations make urban feature transfer challenging. This paper proposes a novel Spatio-Temporal Adaptation Network (STAN) to perform urban flow prediction for data-scarce cities via the spatio-temporal knowledge transferred from data-rich cities. STAN encompasses three modules: i) a spatial adversarial adaptation module that adopts an adversarial manner to capture the transferable spatial features; ii) a temporal attentive adaptation module that attends to critical dynamics for temporal feature transfer; iii) a prediction module that aims to learn task-driven transferable knowledge. Extensive experiments on five real datasets show that STAN substantially outperforms state-of-the-art methods.

PDF Details DOI

AAAI Conference 2021 Conference Paper

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Boer Lyu
Lu Chen
Su Zhu
Kai Yu

Chinese short text matching is a fundamental task in natural language processing. Existing approaches usually take Chinese characters or words as input tokens. They have two limitations: 1) Some Chinese words are polysemous, and semantic information is not fully utilized. 2) Some models suffer potential issues caused by word segmentation. Here we introduce HowNet as an external knowledge base and propose a Linguistic knowledge Enhanced graph Transformer (LET) to deal with word ambiguity. Additionally, we adopt the word lattice graph as input to maintain multi-granularity information. Our model is also complementary to pre-trained language models. Experimental results on two Chinese datasets show that our models outperform various typical text matching approaches. Ablation study also indicates that both semantic information and multi-granularity information are important for text matching modeling.

PDF Details

NeurIPS Conference 2021 Conference Paper

Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness

Jie Ren
Die Zhang
Yisen Wang
Lu Chen
Zhanpeng Zhou
Yiting Chen
Xu Cheng
Xin Wang

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing robustness-boosting methods in a principled way. Besides, our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features. Our code is available online at https://github.com/Jie-Ren/A-Unified-Game-Theoretic-Interpretation-of-Adversarial-Robustness.

PDF Details

ICRA Conference 2020 Conference Paper

Optimized Foothold Planning and Posture Searching for Energy-Efficient Quadruped Locomotion over Challenging Terrains

Lu Chen
Shusheng Ye
Caiming Sun
Aidong Zhang 0002
Ganyu Deng
Tianjiao Liao

Energy-efficient locomotion is of primary importance for legged robots to extend operation time in practical applications. This paper presents an approach to achieve energy-efficient locomotion for a quadrupedal robot walking over challenging terrains. First, we optimize the nominal stance parameters based on an analysis of the leg torque distribution. Second, we propose a foothold planner and a center of gravity (COG) trajectory planner that work together to guide the robot to place its standing legs in an energy-saving stance posture. We have validated the effectiveness of our method on a real quadrupedal robot in experiments including autonomously walking on plain ground and climbing stairs.

Details

IJCAI Conference 2020 Conference Paper

Pivot-based Maximal Biclique Enumeration

Aman Abidi
Rui Zhou
Lu Chen
Chengfei Liu

Enumerating maximal bicliques in a bipartite graph is an important problem in data mining, with innumerable real-world applications across different domains such as web community, bioinformatics, etc. Although substantial research has been conducted on this problem, surprisingly, we find that pivot-based search space pruning, which is quite effective in clique enumeration, has not been exploited in biclique scenario. Therefore, in this paper, we explore the pivot-based pruning for biclique enumeration. We propose an algorithm for implementing the pivot-based pruning, powered by an effective index structure Containment Directed Acyclic Graph (CDAG). Meanwhile, existing literature indicates contradictory findings on the order of vertex selection in biclique enumeration. As such, we re-examine the problem and suggest an offline ordering of vertices which expedites the pivot pruning. We conduct an extensive performance study using real-world datasets from a wide range of domains. The experimental results demonstrate that our algorithm is more scalable and outperforms all the existing algorithms across all datasets and can achieve a significant speedup against the previous algorithms.
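For readers unfamiliar with the problem, the sketch below enumerates the maximal bicliques of a tiny bipartite graph by brute force. It is only a definition-level baseline, not the pivot-based algorithm or the CDAG index from the paper: it tries every subset of left vertices, takes their common right neighbours, and then closes the left side. The function name and the example graph are hypothetical.

```python
from itertools import combinations


def maximal_bicliques(adj):
    """Return all maximal bicliques (left_set, right_set) with both sides non-empty.

    `adj` maps each left vertex to the set of its right neighbours.
    Runs in time exponential in the number of left vertices.
    """
    left = sorted(adj)
    found = set()
    for size in range(1, len(left) + 1):
        for combo in combinations(left, size):
            # Right vertices adjacent to every chosen left vertex.
            common = set.intersection(*(set(adj[u]) for u in combo))
            if not common:
                continue
            # Close the left side: every left vertex adjacent to all of `common`.
            closed_left = frozenset(u for u in left if common <= set(adj[u]))
            found.add((closed_left, frozenset(common)))
    return found


# Hypothetical example graph: u1-{v1,v2}, u2-{v1,v2,v3}, u3-{v3}.
example = {
    "u1": {"v1", "v2"},
    "u2": {"v1", "v2", "v3"},
    "u3": {"v3"},
}

for left_side, right_side in sorted(
    maximal_bicliques(example), key=lambda b: (sorted(b[0]), sorted(b[1]))
):
    print(sorted(left_side), sorted(right_side))
```

Many left subsets collapse to the same closed pair, which is the redundancy that pruning techniques such as the pivot-based search described above are designed to cut.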

PDF Details DOI

AAAI Conference 2020 Conference Paper

Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks

Lu Chen
Boer Lv
Chi Wang
Su Zhu
Bowen Tan
Kai Yu

Dialogue state tracking (DST) aims at estimating the current dialogue state given all the preceding conversation. For multi-domain DST, the data sparsity problem is also a major obstacle due to the increased number of state candidates. Existing approaches generally predict the value for each slot independently and do not consider slot relations, which may aggravate the data sparsity problem. In this paper, we propose a Schema-guided multi-domain dialogue State Tracker with graph attention networks (SST) that predicts dialogue states from dialogue utterances and schema graphs which contain slot relations in edges. We also introduce a graph attention matching network to fuse information from utterances and graphs, and a recurrent graph attention network to control state updating. Experimental results show that our approach obtains new state-of-the-art performance on both the MultiWOZ 2.0 and MultiWOZ 2.1 benchmarks.

PDF Details

AAAI Conference 2020 Conference Paper

Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders

Yanbin Zhao
Lu Chen
Zhi Chen
Kai Yu

Text simplification (TS) rephrases long sentences into simplified variants while preserving inherent semantics. Traditional sequence-to-sequence models heavily rely on the quantity and quality of parallel sentences, which limits their applicability in different languages and domains. This work investigates how to leverage large amounts of unpaired corpora in the TS task. We adopt the back-translation architecture in unsupervised machine translation (NMT), including denoising autoencoders for language modeling and automatic generation of parallel data by iterative back-translation. However, it is non-trivial to generate appropriate complex-simple pairs if we directly treat the sets of simple and complex corpora as two different languages, since the two types of sentences are quite similar and it is hard for the model to capture the characteristics of the different types of sentences. To tackle this problem, we propose asymmetric denoising methods for sentences with separate complexity. When modeling simple and complex sentences with autoencoders, we introduce different types of noise into the training process. Such a method can significantly improve the simplification performance. Our model can be trained in both unsupervised and semi-supervised manners. Automatic and human evaluations show that our unsupervised model outperforms the previous systems, and with limited supervision, our model can perform competitively with multiple state-of-the-art simplification systems.

PDF Details