Author name cluster

Bin Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

EAAI Journal 2026 Journal Article

Global relationship awareness 3-dimensional object detection using 4-dimensional radar

Pianzhang Duan
Li Wang
Cheng Fang
Ziying Song
Ming Gao
Mo Zhou
Ying Li
Yibo Zhang

4D (4-dimensional) radar sensing technology is essential for high-precision autonomous driving perception systems, owing to its superior detection capability at long range compared with traditional LiDAR (Light Detection and Ranging). However, due to the sparsity of point clouds and the low resolution of millimeter-wave radar, voxel-based methods may fail to detect distant or closely adjacent objects, leading to inadequate detection accuracy. To mitigate the accuracy issues arising from the sparse nature of point clouds in such scenarios, we propose a novel object detection network: GRA-Net (Global Relation-Aware object detection Network). By leveraging a self-attention mechanism, GRA-Net effectively learns critical features from each radar pillar, enhancing the network’s capacity to capture relevant information about nearby objects. Furthermore, we introduce a global perception module that integrates key features within the pillars and global features, mitigating the impact of point cloud sparsity, particularly in distant regions. We conducted a series of experiments to evaluate the performance of GRA-Net. On the Astyx HiRes 2019 dataset, our method achieved 33.63 mAP (mean Average Precision) and 43.93 mAP at the moderate level; on the View-of-Delft dataset, our method achieved 47.74 mAP in the entire annotated area and 69.25 mAP in the driving corridor area.

Details DOI

NeurIPS Conference 2025 Conference Paper

AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios

Yunjia Qi
Hao Peng
Xiaozhi Wang
Amy Xin
Youfeng Liu
Bin Xu
Lei Hou
Juanzi Li

Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications. Growing research efforts aim to develop LLM-based agents to address practical demands, introducing a new challenge: agentic scenarios often involve lengthy instructions with complex constraints, such as extended system prompts and detailed tool specifications. While adherence to such instructions is crucial for agentic applications, whether LLMs can reliably follow them remains underexplored. In this paper, we introduce AgentIF, the first benchmark for systematically evaluating LLM instruction-following ability in agentic scenarios. AgentIF features three key characteristics: (1) Realistic, constructed from 50 real-world agentic applications. (2) Long, averaging 1,723 words with a maximum of 15,630 words. (3) Complex, averaging 11.9 constraints per instruction, covering diverse constraint types, such as tool specifications and condition constraints. To construct AgentIF, we collect 707 human-annotated instructions across 50 agentic tasks from industrial application agents and open-source agentic systems. For each instruction, we annotate the associated constraints and corresponding evaluation metrics, including code-based evaluation, LLM-based evaluation, and hybrid code-LLM evaluation. We use AgentIF to systematically evaluate existing advanced LLMs. We observe that current models generally perform poorly, especially in handling complex constraint structures and tool specifications. We further conduct error analysis and analytical experiments on instruction length and meta constraints, providing some findings about the failure modes of existing LLMs. We have released the code and data to facilitate future research.

PDF Details

IJCAI Conference 2025 Conference Paper

CAN-ST: Clustering Adaptive Normalization for Spatio-temporal OOD Learning

Min Yang
Yang An
Jinliang Deng
Xiaoyu Li
Bin Xu
Ji Zhong
Xiankai Lu
Yongshun Gong

Spatio-temporal data mining is crucial for decision-making and planning in diverse domains. However, in real-world scenarios, training and testing data are often not independent or identically distributed due to rapid changes in data distributions over time and space, resulting in spatio-temporal out-of-distribution (OOD) challenges. This non-stationarity complicates accurate predictions and has motivated research efforts focused on mitigating non-stationarity through normalization operations. Existing methods, nonetheless, often address individual time series in isolation, neglecting correlations across series, which limits their capacity to handle complex spatio-temporal dynamics and results in suboptimal solutions. To overcome these challenges, we propose Clustering Adaptive Normalization (CAN-ST), a general and model-agnostic method that mitigates non-stationarity by capturing both localized distributional changes and shared patterns across nodes via adaptive clustering and a parameter register. As a plugin, CAN-ST can be easily integrated into various spatio-temporal prediction models. Extensive experiments on multiple datasets with diverse forecasting models demonstrate that CAN-ST consistently improves performance by over 20% on average and outperforms state-of-the-art normalization methods.

PDF Details DOI

EAAI Journal 2025 Journal Article

Deeppipe: A physics-enhanced adaptive multi-modal fused neural network for predicting contamination length interval in multi-product pipeline

Jian Du
Jianqin Zheng
Pengtao Niu
Shiyuan Pan
Qi Liao
Rui Qiu
Bin Xu
Ning Xu

During the operation of a multi-product pipeline, accurate and effective prediction of the contamination length interval is key to guiding the formulation of the cutting plan and improving economic returns. However, existing methods focus on extracting implicit principles and insufficient feature correlations in a data-driven pattern but overlook the potential knowledge in the scientific theory of contamination development, failing to provide practically useful results. Consequently, in this study, the holistic feature correlations and physical knowledge are extracted and integrated into the neural network to propose a physics-enhanced adaptive multi-modal fused neural network (PE-AMFNN) for contamination length interval prediction. In PE-AMFNN, a multi-modal adaptive feature fusion module is created to establish a comprehensive feature space with quantified feature importance, thus capturing sufficient feature correlations. Subsequently, a mechanism-coupled customized neural network is designed to incorporate the explicit scientific principle into the forward and backward propagation. In addition, a physics-embedded loss function, which introduces interval differences and interval correlation constraints, is established to unearth the latent physical knowledge in contamination development and force the model to draw physically reasonable results. Validation on real-world cases indicates that the proposed model outperforms state-of-the-art techniques and the latest achievements, with Root Mean Squared Relative Errors reduced by 31% and 36% in lower and upper limit prediction, respectively. Furthermore, the sensitivity analysis of model modules suggests that both the multi-modal feature fusion and physical principles are crucial for model improvement.

Details DOI

YNIMG Journal 2025 Journal Article

Mesoscale functional connectivity of amygdala to the auditory and prefrontal cortex of macaque monkeys revealed by INS-fMRI

Qianbing Li
An Ping
Yuqi Feng
Bin Xu
Baorong Zhang
Anna Wang Roe
Lixia Gao
Xinjian Li

Mammals rely heavily on their auditory system to perceive environmental threats, socially communicate, and care for the young. As an extension of multiple sensory systems, including the auditory system, the amygdala evaluates the emotional salience of acoustic stimuli and mediates its impact on sensory, cognitive, and physiological aspects of emotional processing in the acoustic domain via the lateral amygdala (LA), basal amygdala (BA), and central amygdala (CeA) nuclei. However, the functional connections of LA, BA, and CeA with the auditory cortex (AC) and the prefrontal cortex (PFC) remain unclear, particularly at the mesoscale level. Here we employed a novel method called INS-fMRI (Infrared Neural Stimulation combined with high-resolution functional magnetic resonance imaging) in macaque monkeys. This method permits stimulation of multiple sites within single animals in vivo, so that the relative organization of auditory networks can be studied. We found that: (1) Focal INS stimulation of the amygdala elicited robust and reliable responses in both the AC and the PFC; (2) Amygdala stimulation mainly activated ipsilateral AC and PFC; (3) The stimulation of the amygdala mainly activated the secondary AC and the dorsolateral PFC; (4) The connection between the amygdala and the cortex is mainly mediated by neurons in the LA and BA connection area. Our study further revealed the functional connectivity among the amygdala subnuclei, the auditory cortex, and the prefrontal cortex, and will shed light on research into the processing of biologically meaningful complex sounds.

Details DOI

ECAI Conference 2025 Conference Paper

Model Recovery at the Edge Under Resource Constraints for Physical AI

Bin Xu
Ayan Banerjee
Sandeep K. S. Gupta

Model Recovery (MR) enables safe, explainable decision-making in mission-critical autonomous systems (MCAS) by learning governing dynamical equations, but its deployment on edge devices is hindered by the iterative nature of neural ordinary differential equations (NODE), which are inefficient on FPGAs. Memory and energy consumption are the main concerns when running MR in real time on edge devices. We propose MERINDA, a novel FPGA-accelerated MR framework that replaces iterative solvers with a parallelizable neural architecture equivalent to NODEs. MERINDA achieves nearly 11× lower DRAM usage and 2.2× faster runtime compared to mobile GPUs. Experiments reveal an inverse relationship between memory and energy at fixed accuracy, highlighting MERINDA’s suitability for resource-constrained, real-time MCAS. The implementation and datasets are publicly available at github.com/ImpactLabASU/ECAI2025.

Details

IJCAI Conference 2025 Conference Paper

SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation

Bin Xu
Yiguan Lin
Yinghao Li
Yang Gao

Large language models exhibit remarkable performance in simple code generation tasks. However, they encounter significant challenges when addressing complex problems that require reasoning and question decomposition. To tackle this, we propose a self-driven reasoning augmentation process, SRA-MCTS, which incorporates Monte Carlo Tree Search (MCTS) for reasoning data generation. SRA-MCTS enables LLMs to self-generate intermediate reasoning steps and perform iterative self-evaluation, facilitating self-improvement. Specifically, it utilizes MCTS to produce diverse intermediate reasoning steps. During each iteration, MCTS generates a step and employs self-evaluation to guide the selection of subsequent branches, ultimately forming a sufficiently diverse reasoning path referred to as “thinking”. This thinking guides the model in generating corresponding code, and both are combined as training data for supervised fine-tuning. Experimental results demonstrate that SRA-MCTS achieves consistent performance improvements across three model scales without additional supervisory assistance. Applied to the Meta-Llama-3.1-8B-Instruct model, it delivers an 11-point improvement on the MBPP-Complex dataset, underscoring the significant potential for model self-improvement. The code and data are available at https://github.com/DIRECT-BIT/SRA-MCTS.

PDF Details DOI

EAAI Journal 2024 Journal Article

A novel multi-step ahead prediction method for landslide displacement based on autoregressive integrated moving average and intelligent algorithm

Peng Shao
Hong Wang
Guangyu Long
Jianxing Liao
Fei Gan
Bin Xu
Ke Hu
Yuhang Teng

Accurate landslide displacement prediction is crucial for prevention and early warning. In this paper, we propose a novel hybrid multi-step-ahead prediction model (ARIMA-IM) that combines autoregressive integrated moving average (ARIMA) and intelligent models (IMs) for landslide displacement prediction. This model integrates the linear prediction strengths of ARIMA, the signal decomposition capabilities of variational mode decomposition (VMD) and empirical wavelet transform (EWT), and the nonlinear change-capturing ability of IMs. The proposed model can not only effectively capture abrupt and long-term trends in landslide displacement but also enable high-precision multi-step-ahead prediction. To validate the effectiveness of the proposed model, we predicted the displacement of the Bazimen landslide in the Three Gorges Reservoir by using four IMs, namely long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), and artificial neural network (ANN), in both continuous and jump strategies for multi-step-ahead prediction. Results showed that ARIMA-IM, especially the hybrid model combining ARIMA and deep learning, achieved high accuracy in 1–5-step-ahead prediction. Continuous multi-step-ahead prediction exhibited higher accuracy and provided more prediction information for decision-making. Compared to other models, ARIMA-IM not only achieved multi-step-ahead prediction but also exhibited comparable or better prediction accuracy, which is of high practical significance for landslide disaster early warning.

Details DOI

NeurIPS Conference 2024 Conference Paper

CogVLM: Visual Expert for Pretrained Language Models

Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
Yan Wang
Junhui Ji
Zhuoyi Yang

We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method, which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and the image encoder by a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables a deep fusion of vision-language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 17 classic cross-modal benchmarks, including 1) image captioning datasets: NoCaps, Flickr30k, 2) VQA datasets: OKVQA, TextVQA, OCRVQA, ScienceQA, 3) LVLM benchmarks: MM-Vet, MMBench, SEED-Bench, LLaVABench, POPE, MMMU, MathVista, 4) visual grounding datasets: RefCOCO, RefCOCO+, RefCOCOg, Visual7W. Code and checkpoints are available on GitHub.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Lookup Table meets Local Laplacian Filter: Pyramid Reconstruction Network for Tone Mapping

Feng Zhang
Ming Tian
Zhiqiang Li
Bin Xu
Qingbo Lu
Changxin Gao
Nong Sang

Tone mapping aims to convert high dynamic range (HDR) images to low dynamic range (LDR) representations, a critical task in the camera imaging pipeline. In recent years, 3-Dimensional LookUp Table (3D LUT) based methods have gained attention due to their ability to strike a favorable balance between enhancement performance and computational efficiency. However, these methods often fail to deliver satisfactory results in local areas since the look-up table is a global operator for tone mapping, which works based on pixel values and fails to incorporate crucial local information. To this end, this paper aims to address this issue by exploring a novel strategy that integrates global and local operators by utilizing closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we employ image-adaptive 3D LUTs to manipulate the tone in the low-frequency image by leveraging the specific characteristics of the frequency information. Furthermore, we utilize local Laplacian filters to refine the edge details in the high-frequency components in an adaptive manner. Local Laplacian filters are widely used to preserve edge details in photographs, but their conventional usage involves manual tuning and fixed implementation within camera imaging pipelines or photo editing tools. We propose to learn parameter value maps progressively for local Laplacian filters from annotated data using a lightweight network. Our model achieves simultaneous global tone manipulation and local edge detail preservation in an end-to-end manner. Extensive experimental results on two benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.

PDF Details

AAAI Conference 2020 Conference Paper

Image Enhanced Event Detection in News Articles

Meihan Tong
Shuai Wang
Yixin Cao
Bin Xu
Juanzi Li
Lei Hou
Tat-Seng Chua

Event detection is a crucial and challenging sub-task of event extraction, which suffers from a severe ambiguity issue of trigger words. Existing works mainly focus on using textual context information, while there naturally exist many images accompanying news articles that are yet to be explored. We believe that images not only reflect the core events of the text, but are also helpful for the disambiguation of trigger words. In this paper, we first contribute an image dataset supplement to ED benchmarks (i.e., ACE2005) for training and evaluation. We then propose a novel Dual Recurrent Multimodal Model, DRMM, to conduct deep interactions between images and sentences for modality feature aggregation. DRMM utilizes pre-trained BERT and ResNet to encode sentences and images, and employs an alternating dual attention to select informative features for mutual enhancement. Our superior performance compared to six state-of-the-art baselines, as well as further ablation studies, demonstrates the significance of the image modality and the effectiveness of the proposed architecture. The code and image dataset are available at https://github.com/shuaiwa16/image-enhanced-event-extraction.

PDF Details

EAAI Journal 2019 Journal Article

Empirical study on character level neural network classifier for Chinese text

Tonglee Chung
Bin Xu
Yongbin Liu
Chunping Ouyang
Siliang Li
Lingyun Luo

Character-level models have been drawing attention recently. A number of these models have been proposed and shown to be successful in Natural Language Processing tasks. While most of these models have been evaluated mainly on English or other alphabetic languages, a number of problems arise when they are applied to non-alphabetic languages such as Chinese. In this study, we investigated the problems encountered when transferring these models to Chinese and put forward some solutions. We propose a double embedding neural network model that is also character-level and consists of both a CNN and an RNN with two separate embeddings. The model is applied to a fundamental Natural Language Processing task, text classification. Experimental results on the Chinese corpus demonstrated that our character-level neural network model performs as well as or better than word-level classification models. Our model reaches 95.9% accuracy on a Chinese Fudan news dataset, which outperforms the state-of-the-art models.

Details DOI

IROS Conference 2018 Conference Paper

Design and Implementation of a Novel Aerial Manipulator with Tandem Ducted Fans

Yibo Zhang
Changle Xiang
Bin Xu
Yang Wang
Xiaoliang Wang

This paper proposes a novel aerial manipulator with tandem ducted fans, which takes both trafficability and effective loading into account. The aerial manipulator is particularly suitable for grasping in complex and narrow environments that traditional multi-rotors and helicopters cannot access. A comprehensive integrated dynamic model is established by taking the aerial vehicle dynamics and manipulator dynamics as a whole. On this basis, a multilayer composite controller with feedforward compensation is designed, considering the mutual reactive influence between the aerial vehicle and the manipulator to improve the stability of the system under the motion of the manipulator. Simulation and actual flight tests verify the effectiveness of the design and show good stability and tracking performance of the system.

Details

ICRA Conference 2018 Conference Paper

Safe Teleoperation of Dynamic UAVs Through Control Barrier Functions

Bin Xu
Koushil Sreenath

This paper presents a method for assisting human operators to teleoperate highly dynamic systems such as quadrotors inside a constrained environment with safety guarantees. Our method enables human operators to focus on manually operating and flying quadrotor systems without the need to focus on avoiding potential obstacles. This is achieved with the presented supervisory controller overriding human input to enforce safety constraints when necessary. This method can be used as an assistive training solution for novice pilots to begin flying quadrotors without crashing them. Our supervisory controller uses an Exponential control barrier function based quadratic program to achieve safe human teleoperated flight. We demonstrate and validate our control approach through several experiments with multiple users with varying skill levels for three different scenarios of a quadrotor flying in a motion capture environment with virtual and physical constraints.
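The supervisory idea in this abstract can be illustrated with a much simpler system. The following is a minimal sketch, assuming a 1-D double integrator (position `x`, velocity `v`, commanded acceleration `u`) and a single wall at `x_max`, not the quadrotor model or the controller from the paper. With the barrier h = x_max - x and two real poles `p1`, `p2`, the exponential control barrier condition h'' + (p1 + p2) h' + p1 p2 h >= 0 becomes an upper bound on `u`, and the quadratic program "stay as close as possible to the human input subject to that bound" has the closed-form solution `min(u_human, bound)`. All function names, gains, and parameter values are hypothetical.

```python
def safe_acceleration(x, v, u_human, x_max, p1=2.0, p2=3.0):
    """Filter a human acceleration command for a 1-D double integrator.

    Barrier: h = x_max - x, so h' = -v and h'' = -u.
    Exponential CBF condition: h'' + k1*h' + k0*h >= 0
    with k0 = p1*p2 and k1 = p1 + p2 (hypothetical pole choices).
    """
    k0 = p1 * p2
    k1 = p1 + p2
    # Rearranged condition: u <= k0*(x_max - x) - k1*v
    u_bound = k0 * (x_max - x) - k1 * v
    # 1-D QP: minimize (u - u_human)^2 subject to u <= u_bound
    return min(u_human, u_bound)


def simulate(u_human=5.0, x_max=1.0, dt=0.001, steps=5000):
    """Push constantly toward the wall and return the largest position reached."""
    x, v = 0.0, 0.0
    max_x = x
    for _ in range(steps):
        u = safe_acceleration(x, v, u_human, x_max)
        v += u * dt  # semi-implicit Euler integration
        x += v * dt
        max_x = max(max_x, x)
    return max_x


if __name__ == "__main__":
    print("largest position reached:", simulate())
```

The filter leaves the human command untouched while it is far from the constraint and overrides it only when the bound becomes active, which mirrors the "override when necessary" behavior described above.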

Details

ICRA Conference 2017 Conference Paper

Real-time visual tracking via robust Kernelized Correlation Filter

Xiaoliang Wang
Marie O'Brien
Changle Xiang
Bin Xu
Homayoun Najjaran

There has been increasing interest in the use of correlation filters for visual object tracking due to their impressive tracking performance. However, existing correlation filter based tracking methods, such as Struck and Kernelized Correlation Filter (KCF), cannot always solve tracking problems in complicated conditions such as heavy occlusion and aggressive motion. In this paper, we propose a real-time visual tracker via a robust KCF. We start by implementing a search window alignment, based on a motion model with uncertainty, which increases the tracking accuracy for fast-moving targets and reduces the padding value to accelerate tracking speed. Next, we establish a combined confidence measurement including occlusion information, which is utilized for robust updating. Then we apply an adaptive Kalman filter to improve the tracking accuracy. Qualitative and quantitative experimental results show that the proposed algorithm outperforms state-of-the-art methods such as KCF and Struck.

Details

IJCAI Conference 2013 Conference Paper

Harmonious Hashing

Bin Xu
Jiajun Bu
Yue Lin
Chun Chen
Xiaofei He
Deng Cai

Hashing-based fast nearest neighbor search has attracted great attention in both research and industry recently. Many existing hashing approaches encode data with projection-based hash functions and represent each projected dimension with 1 bit. However, dimensions with high variance hold a large share of the energy or information of the data but are treated the same as dimensions with low variance, which leads to serious information loss. In this paper, we introduce a novel hashing algorithm called Harmonious Hashing, which aims at learning hash functions with low information loss. Specifically, we learn a set of optimized projections to preserve the maximum cumulative energy and meet the constraint of equivalent variance on each dimension as much as possible. In this way, we can minimize the information loss after binarization. Despite its extreme simplicity, our method outperforms many state-of-the-art hashing methods in large-scale and high-dimensional nearest neighbor search experiments.
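For readers unfamiliar with the baseline this abstract refers to, the sketch below shows generic projection-based hashing with 1 bit per projected dimension. It uses random Gaussian projections as an assumption for illustration; it is not the Harmonious Hashing algorithm, which learns its projections. The function names are hypothetical, and input vectors are assumed to be mean-centered already.

```python
import random


def fit_random_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian projection vectors (illustrative baseline only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, projections):
    """Encode x with one bit per projection: the sign of the projected value."""
    return tuple(
        1 if sum(w * v for w, v in zip(row, x)) >= 0 else 0
        for row in projections
    )


def hamming(a, b):
    """Number of differing bits between two codes of equal length."""
    return sum(p != q for p, q in zip(a, b))


if __name__ == "__main__":
    projections = fit_random_projections(dim=4, n_bits=16, seed=1)
    query = [0.5, -1.0, 2.0, 0.25]
    neighbor = [0.6, -0.9, 2.1, 0.2]
    print(hamming(hash_code(query, projections), hash_code(neighbor, projections)))
```

Because each bit keeps only the sign, a high-variance projection and a low-variance one contribute equally to the Hamming distance, which is the information loss the abstract describes.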

PDF Details DOI

AAAI Conference 2012 Conference Paper

A Bregman Divergence Optimization Framework for Ranking on Data Manifold and Its New Extensions

Bin Xu
Jiajun Bu
Chun Chen
Deng Cai

Recently, graph-based ranking algorithms have received considerable interest in the machine learning, computer vision, and information retrieval communities. Ranking on data manifold (or manifold ranking, MR) is one of the representative approaches. One of the limitations of manifold ranking is its high computational complexity (O(n^3), where n is the number of samples in the database). In this paper, we cast manifold ranking into a Bregman divergence optimization framework under which we transform the original MR to an equivalent optimal kernel matrix learning problem. With this new formulation, two effective and efficient extensions are proposed to enhance the ranking performance. Extensive experimental results on two real-world image databases show the effectiveness of the proposed approach.
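As background for the cubic cost mentioned in this abstract: in its standard formulation, manifold ranking scores all n samples against a query via f* = (I - alpha S)^(-1) y, where S is the symmetrically normalized affinity matrix, and inverting that n-by-n matrix is the cubic step. The sketch below uses the equivalent fixed-point iteration f <- alpha S f + (1 - alpha) y, which converges to the same ranking up to a constant factor. It illustrates the original manifold ranking only, not the Bregman divergence framework proposed in the paper; the Gaussian affinity, the parameter values, and the function name are assumptions.

```python
import math


def manifold_ranking(points, query_index, alpha=0.9, sigma=1.0, iterations=200):
    """Rank all points against one query by iterating f <- alpha*S*f + (1-alpha)*y."""
    n = len(points)
    # Gaussian affinity matrix W with a zero diagonal (assumed kernel choice)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # y marks the query; f holds the ranking scores
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
            for i in range(n)
        ]
    return f


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
    scores = manifold_ranking(pts, query_index=0)
    print(sorted(range(len(pts)), key=lambda i: -scores[i]))
```

Points in the same cluster as the query receive much higher scores than points in the distant cluster, because scores spread along the graph rather than by raw distance alone.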

PDF Details

EAAI Journal 2004 Journal Article

Direct identification of structural parameters from dynamic responses with neural networks

Bin Xu
Zhishen Wu
Genda Chen
Koichi Yokoyama

Details DOI