Arrow Research search

Author name cluster

Bin Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
1 author row

Possible papers

21

JBHI Journal 2026 Journal Article

Attention-based Multimodal Spatiotemporal Enhanced Interaction Network For Major Depressive Disorder Detection

  • Changxu Dong
  • Xinwei Liu
  • Shuoqiu Gan
  • Zongyun Gu
  • Bin Luo
  • DIRECT Consortium
  • Dengdi Sun

Although deep learning models have shown promising results in detecting major depressive disorder (MDD), two main limitations remain: insufficient exploitation of interactive information across multimodal brain networks and a lack of adaptive mechanisms for capturing crucial spatiotemporal dependencies among brain regions. To address these challenges, we propose the Attention-based Multimodal Spatiotemporal Enhanced Interaction Network (AM-SEIN) for MDD detection. Specifically, to tackle the first challenge, we integrate structural information from 3D structural magnetic resonance imaging (sMRI) with functional temporal data from functional magnetic resonance imaging (fMRI). Additionally, we design the Cross-Modal Interaction Network (CMIN) and fusion layer to enhance mutual information aggregation and facilitate interactive guidance between the two modalities. For the second challenge, we develop an attention-based adaptive spatiotemporal feature-extracting architecture for both modalities, incorporating the fMRI-based Adaptive Spatiotemporal Fusion (fASF) and the sMRI-based Regional-Level Content-Dependent (sRLCD) modules. This approach enables the effective encoding of inter-regional interactions relevant to MDD detection. Finally, the proposed AM-SEIN is evaluated on the Rest-meta-MDD (RMM) and Rest-meta-MDD-V2 (RMM-V2) datasets, achieving state-of-the-art performance.

AAAI Conference 2026 Conference Paper

CHASE: Contextual History for Adaptive and Simple Exploitation in Large Language Model Jailbreaking

  • Zhiqiang Hao
  • Chuanyi Li
  • Ye Fan
  • Jun Cai
  • Xiao Fu
  • Shangqi Wang
  • Hao Shen
  • Jiao Yin

We propose Contextual History for Adaptive and Simple Exploitation (CHASE), a novel multi-turn method for Large Language Model (LLM) jailbreaking. Rather than directly attacking an LLM that may be difficult to jailbreak, CHASE first collects jailbroken histories from an easy-to-jailbreak LLM and then transfers them to the target LLM. Through this history transfer process, CHASE misleads the target LLM into thinking that it is responsible for producing the jailbroken histories and increases the chances of successful jailbreaking by prompting it to continue the conversation. Extensive evaluations on mainstream LLMs show that CHASE consistently achieves higher attack success rates and demands fewer computational resources compared to existing methods.

AAAI Conference 2026 Conference Paper

Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark

  • Aihua Zheng
  • Hao Xie
  • Xixi Wan
  • Zi Wang
  • Shihao Li
  • Jin Tang
  • Bin Luo

Aerial-Ground Person Re-IDentification (AGPReID) aims to extract identity-discriminative representations from heterogeneous perspectives across different platforms in complex real-world environments. However, existing methods primarily focus on visual appearance modeling and make insufficient use of semantic attribute priors, which limits their ability to bridge the aerial-ground view gap. To address this limitation, we propose a Semantic-driven Visual Progressive Refinement framework for AGPReID (SVPR-ReID), which effectively leverages textual attribute priors to guide the extraction of fine-grained visual cues. Specifically, we design a View-Decoupled Feature Extractor that incorporates view-aware textual prompts to decouple view-invariant identity features. Then, to alleviate inter-class ambiguity, we propose an Attribute-Scattered Mixture-of-Experts module that integrates attribute semantics into the visual space, thereby improving discrimination among visually similar pedestrians. Finally, we design a Context-Vision Progressive Refinement module for progressive refinement of attribute and view-invariant features, obtaining robust cross-view identity representations. Furthermore, we contribute a comprehensive benchmark for AGPReID, named CP2108, which contains 142,817 images of 2,108 identities annotated with 22 attributes. Notably, it includes 191 identities captured across different times, enabling both short- and long-term ReID evaluation and addressing the limitation of existing datasets that focus only on short-term scenarios. Extensive experimental results validate the effectiveness of our SVPR-ReID on four AGPReID datasets.

AAAI Conference 2025 Conference Paper

Alignment-Free RGB-T Salient Object Detection: A Large-Scale Dataset and Progressive Correlation Network

  • Kunpeng Wang
  • Keke Chen
  • Chenglong Li
  • Zhengzheng Tu
  • Bin Luo

Alignment-free RGB-Thermal (RGB-T) salient object detection (SOD) aims to achieve robust performance in complex scenes by directly leveraging the complementary information of unaligned visible-thermal image pairs, without requiring manual alignment. However, the labor-intensive process of collecting and annotating image pairs limits the scale of existing benchmarks, hindering the advancement of alignment-free RGB-T SOD. In this paper, we construct a large-scale, high-diversity unaligned RGB-T SOD dataset named UVT20K, comprising 20,000 image pairs, 407 scenes, and 1,256 object categories. All samples are collected from real-world scenarios with various challenges, such as low illumination, image clutter, and complex salient objects. To support further research, each sample in UVT20K is annotated with a comprehensive set of ground truths, including saliency masks, scribbles, boundaries, and challenge attributes. In addition, we propose a Progressive Correlation Network (PCNet), which models inter- and intra-modal correlations on the basis of explicit alignment to achieve accurate predictions on unaligned image pairs. Extensive experiments conducted on two unaligned, three weakly aligned, and three aligned datasets demonstrate the effectiveness of our method.

AAAI Conference 2025 Conference Paper

RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

  • Andong Lu
  • Wanyu Wang
  • Chenglong Li
  • Jin Tang
  • Bin Luo

Existing RGBT tracking methods often design various interaction models to perform cross-modal fusion at each layer, but, due to the large computational burden, cannot execute feature interactions among all layers, which play a critical role in robust multimodal representation. To address this issue, this paper presents a novel All-layer multimodal Interaction Network, named AINet, which performs efficient and effective feature interactions across all modalities and layers in a progressive fusion Mamba for robust RGBT tracking. Even though modality features in different layers are known to contain different cues, building multimodal interactions in each layer is challenging because of the difficulty of balancing interaction capability and efficiency. Meanwhile, considering that the feature discrepancy between the RGB and thermal modalities reflects their complementary information to some extent, we design a Difference-based Fusion Mamba (DFM) to achieve enhanced fusion of the two modalities with linear complexity. When interacting with features from all layers, a huge number of token sequences (3,840 tokens in this work) are involved, so the computational burden is large. To handle this problem, we design an Order-dynamic Fusion Mamba (OFM) to execute efficient and effective feature interactions across all layers by dynamically adjusting the scan order of different layers in Mamba. Extensive experiments on four public RGBT tracking datasets show that AINet achieves leading performance against existing state-of-the-art methods. We will release the code upon acceptance of the paper.

TMLR Journal 2025 Journal Article

Sparse-Input Neural Network using Group Concave Regularization

  • Bin Luo
  • Susan Halabi

Simultaneous feature selection and non-linear function estimation is challenging in modeling, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the problem of feature selection in neural networks. Although the group least absolute shrinkage and selection operator (LASSO) has been utilized to select variables for learning with neural networks, it tends to select unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of weights from all outgoing connections of each input node, and thus obtain a neural net that only uses a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization to yield stable solution paths, in order to tackle the challenge of complex optimization landscapes. We provide a rigorous theoretical analysis of the proposed framework, establishing finite-sample guarantees for both variable selection consistency and prediction accuracy. These results are supported by extensive simulation studies and real data applications, which demonstrate the finite-sample performance of the estimator in feature selection and prediction across continuous, binary, and time-to-event outcomes.
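The core idea of the abstract above — applying a concave penalty to the $l_2$ norm of each input node's outgoing weights so that unimportant features are zeroed out as a group — can be sketched in a few lines. This is an illustrative sketch only: the MCP-style penalty, the hyperparameter values, and the function names here are assumptions, not the paper's exact configuration.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP), one common concave penalty,
    applied elementwise to nonnegative values t."""
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    0.5 * gamma * lam**2)

def group_concave_penalty(W1, lam=0.1, gamma=3.0):
    """Penalize the l2 norm of all outgoing weights of each input node.

    W1: first-layer weight matrix of shape (n_inputs, n_hidden);
    row j holds the outgoing connections of input feature j.
    """
    group_norms = np.linalg.norm(W1, axis=1)  # one l2 norm per input feature
    return float(mcp(group_norms, lam, gamma).sum())

rng = np.random.default_rng(0)
W1 = rng.normal(size=(20, 8))
W1[5:] = 0.0  # a sparse net that uses only the first 5 input features
print(group_concave_penalty(W1))
```

Zeroed-out rows contribute nothing to the penalty, so minimizing the training loss plus this term drives the network toward using only a small subset of the input variables.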

IJCAI Conference 2023 Conference Paper

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

  • Jun-Yan He
  • Zhi-Qi Cheng
  • Chenyang Li
  • Wangmeng Xiang
  • Binghui Chen
  • Bin Luo
  • Yifeng Geng
  • Xuansong Xie

In the realm of autonomous driving, real-time perception or streaming perception remains under-explored. This research introduces DAMO-StreamNet, a novel framework that merges the cutting-edge elements of the YOLO series with a detailed examination of spatial and temporal perception techniques. DAMO-StreamNet's main inventions include: (1) a robust neck structure employing deformable convolution, bolstering receptive field and feature alignment capabilities; (2) a dual-branch structure synthesizing short-path semantic features and long-path temporal features, enhancing the accuracy of motion state prediction; (3) logits-level distillation facilitating efficient optimization, which aligns the logits of teacher and student networks in semantic space; and (4) a real-time prediction mechanism that updates the features of support frames with the current frame, providing smooth streaming perception during inference. Our testing shows that DAMO-StreamNet surpasses current state-of-the-art methodologies, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without requiring additional data. This study not only establishes a new standard for real-time perception but also offers valuable insights for future research. The source code is at https://github.com/zhiqic/DAMO-StreamNet.

IJCAI Conference 2023 Conference Paper

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

  • Hanyuan Chen
  • Jun-Yan He
  • Wangmeng Xiang
  • Zhi-Qi Cheng
  • Wei Liu
  • Hanbing Liu
  • Bin Luo
  • Yifeng Geng

Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order "joint-joint", second-order "bone-joint", and high-order "hyperbone-joint" interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on the Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is at https://github.com/hyer/HDFormer.

IJCAI Conference 2022 Conference Paper

Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code

  • Changan Niu
  • Chuanyi Li
  • Bin Luo
  • Vincent Ng

Recent years have seen the successful application of deep learning to software engineering (SE). In particular, the development and use of pre-trained models of source code has enabled state-of-the-art results to be achieved on a wide variety of SE tasks. This paper provides an overview of this rapidly advancing field of research and reflects on future research directions.

AAAI Conference 2020 Conference Paper

Multi-Spectral Vehicle Re-Identification: A Challenge

  • Hongchao Li
  • Chenglong Li
  • Xianpeng Zhu
  • Aihua Zheng
  • Bin Luo

Vehicle re-identification (Re-ID) is a crucial task in smart cities and intelligent transportation, aiming to match vehicle images across non-overlapping surveillance camera views. Currently, most works focus on RGB-based vehicle Re-ID, which limits its applicability in adverse environments such as darkness and bad weather. IR (Infrared) spectrum imaging offers complementary information to relieve the illumination issue in computer vision tasks. Furthermore, vehicle Re-ID faces the major challenge of diverse appearance across different views, such as trucks. In this work, we address the RGB and IR vehicle Re-ID problem and contribute a multi-spectral vehicle Re-ID benchmark named RGBN300, including RGB and NIR (Near Infrared) vehicle images of 300 identities from 8 camera views, giving 50125 RGB images and 50125 NIR images in total. In addition, we have acquired additional TIR (Thermal Infrared) data for 100 vehicles from RGBN300 to form another dataset for three-spectral vehicle Re-ID. Furthermore, we propose a Heterogeneity-collaboration Aware Multi-stream convolutional Network (HAMNet) that automatically fuses different spectrum features in an end-to-end learning framework. Comprehensive experiments on prevalent networks show that our HAMNet can effectively integrate multi-spectral data for robust vehicle Re-ID in day and night. Our work provides a benchmark dataset for RGB-NIR and RGB-NIR-TIR multi-spectral vehicle Re-ID and a baseline network for both research and industrial communities. The dataset and baseline codes are available at: https://github.com/ttaalle/multi-modal-vehicle-Re-ID.

NeurIPS Conference 2017 Conference Paper

Graph Matching via Multiplicative Update Algorithm

  • Bo Jiang
  • Jin Tang
  • Chris Ding
  • Yihong Gong
  • Bin Luo

As a fundamental problem in computer vision, the graph matching problem can usually be formulated as a Quadratic Programming (QP) problem with doubly stochastic and discrete (integer) constraints. Since it is NP-hard, approximate algorithms are required. In this paper, we present a new algorithm, called Multiplicative Update Graph Matching (MPGM), that develops a multiplicative update technique to solve the QP matching problem. MPGM has three main benefits: (1) theoretically, MPGM naturally solves the general QP problem with a doubly stochastic constraint, with guaranteed convergence and KKT optimality; (2) empirically, MPGM generally returns a sparse solution and thus can also incorporate the discrete constraint approximately; (3) it is efficient and simple to implement. Experimental results show the benefits of the MPGM algorithm.

AAAI Conference 2017 Conference Paper

Nonnegative Orthogonal Graph Matching

  • Bo Jiang
  • Jin Tang
  • Chris Ding
  • Bin Luo

The graph matching problem that incorporates pair-wise constraints can be formulated as a Quadratic Assignment Problem (QAP). The optimal solution of QAP is discrete and combinatorial, which makes the QAP problem NP-hard. Thus, many algorithms have been proposed to find approximate solutions. In this paper, we propose a new algorithm, called Nonnegative Orthogonal Graph Matching (NOGM), for the QAP matching problem. NOGM is motivated by our new observation that the discrete mapping constraint of QAP can be equivalently encoded by a nonnegative orthogonal constraint, which is much easier to handle computationally. Based on this observation, we develop an effective multiplicative update algorithm to solve NOGM and thus find an effective approximate solution for the QAP problem. Compared with many traditional continuous methods, which usually obtain continuous solutions that must be further discretized, NOGM obtains a sparse solution and thus incorporates the desirable discrete constraint naturally in its optimization. Promising experimental results demonstrate the benefits of the NOGM algorithm.
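The observation the abstract hinges on — that permutation matrices are exactly the nonnegative orthogonal matrices — can be sanity-checked numerically. This is a small illustration of the constraint equivalence, not the NOGM algorithm itself:

```python
import numpy as np

# A permutation matrix satisfies both constraints:
# nonnegativity and orthogonality (P^T P = I).
P = np.eye(4)[[2, 0, 3, 1]]  # permute the rows of the identity
assert (P >= 0).all()
assert np.allclose(P.T @ P, np.eye(4))

# A generic orthogonal matrix (e.g., a 2D rotation) has P^T P = I
# but violates nonnegativity, so it is excluded by the joint constraint.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(R.T @ R, np.eye(2))
assert not (R >= 0).all()
```

Imposing both constraints together therefore confines the feasible set to (relaxations of) discrete assignments, which is why NOGM's solutions tend toward sparsity.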

TIST Journal 2016 Journal Article

A Spatial-Temporal Topic Model for the Semantic Annotation of POIs in LBSNs

  • Tieke He
  • Hongzhi Yin
  • Zhenyu Chen
  • Xiaofang Zhou
  • Shazia Sadiq
  • Bin Luo

Semantic tags of points of interest (POIs) are a crucial prerequisite for location search, recommendation services, and data cleaning. However, most POIs in location-based social networks (LBSNs) are either tag-missing or tag-incomplete. This article aims to develop semantic annotation techniques to automatically infer tags for POIs. We first analyze two LBSN datasets and observe that there are two types of tags, category-related ones and sentimental ones, which have unique characteristics. Category-related tags are hierarchical, whereas sentimental ones are category-aware. All existing related work has adopted classification methods to predict high-level category-related tags in the hierarchy, but they cannot apply to infer either low-level category tags or sentimental ones. In light of this, we propose a latent-class probabilistic generative model, namely the spatial-temporal topic model (STM), to infer personal interests, the temporal and spatial patterns of topics/semantics embedded in users' check-in activities, the interdependence between category-topic and sentiment-topic, and the correlation between sentimental tags and rating scores from users' check-in and rating behaviors. Then, this learned knowledge is utilized to automatically annotate all POIs with both category-related and sentimental tags in a unified way. We conduct extensive experiments to evaluate the performance of the proposed STM on a real large-scale dataset. The experimental results show the superiority of our proposed STM, and we also observe that the real challenge of inferring category-related tags for POIs lies in the low-level ones of the hierarchy and that the challenge of predicting sentimental tags lies in those with neutral ratings.

IJCAI Conference 2016 Conference Paper

Robust Out-of-Sample Data Recovery

  • Bo Jiang
  • Chris Ding
  • Bin Luo

Trace norm based rank regularization techniques have been successfully applied to learn low-rank recoveries of high-dimensional noisy data. In many applications, it is desirable to add new samples to previously recovered data, which is known as the out-of-sample data recovery problem. However, traditional trace norm based regularization methods cannot naturally cope with new samples and thus fail at out-of-sample data recovery. In this paper, we propose a new Robust Out-of-Sample data Recovery (ROSR) model for trace norm based regularization methods. An effective iterative algorithm, with a proof of convergence, is presented to find the optimal solution of the ROSR problem. As an application, we apply ROSR to the image classification task. Experimental results on six image datasets demonstrate the effectiveness and benefits of the proposed ROSR method.
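The trace norm the abstract builds on is simply the sum of a matrix's singular values, used as the convex surrogate for rank. A quick check of that definition (the data here is synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# A rank-2 matrix corrupted by small noise, the typical setting
# for trace norm based low-rank recovery.
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
noisy = A + 0.01 * rng.normal(size=A.shape)

# Trace (nuclear) norm = sum of singular values.
nuclear_norm = np.linalg.norm(noisy, ord='nuc')
singular_values = np.linalg.svd(noisy, compute_uv=False)
assert np.isclose(nuclear_norm, singular_values.sum())
```

Minimizing this norm shrinks small singular values toward zero, which is what yields the low-rank recovery; the out-of-sample difficulty is that the SVD of the training matrix says nothing about how to embed a new sample.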

AAAI Conference 2014 Conference Paper

Robust Non-Negative Dictionary Learning

  • Qihe Pan
  • Deguang Kong
  • Chris Ding
  • Bin Luo

Dictionary learning plays an important role in machine learning, where data vectors are modeled as sparse linear combinations of basis factors (i.e., the dictionary). However, how to conduct dictionary learning in noisy environments has not been well studied. Moreover, in practice, the dictionary (i.e., the lower-rank approximation of the data matrix) and the sparse representations are often required to be nonnegative, as in applications such as image annotation, document summarization, and microarray analysis. In this paper, we propose a new formulation for non-negative dictionary learning in noisy environments, where structured sparsity is enforced on the sparse representation. The proposed formulation is also robust to data with noise and outliers, due to the robust loss function used. We derive an efficient multiplicative updating algorithm to solve the optimization problem, where the dictionary and sparse representation are updated iteratively. We rigorously prove the convergence and correctness of the proposed algorithm. We show how the dictionary differs at different levels of the sparsity constraint. The proposed algorithm can be adapted for clustering and semi-supervised learning.

IJCAI Conference 2011 Conference Paper

Angular Decomposition

  • Dengdi Sun
  • Chris Ding
  • Bin Luo
  • Jin Tang

Dimensionality reduction plays a vital role in pattern recognition. However, for normalized vector data, existing methods do not utilize the fact that the data is normalized. In this paper, we propose to employ an angular decomposition of the normalized vector data, which corresponds to embedding the data on a unit sphere. For graph data given as similarity/kernel matrices with constant diagonal elements, we propose an angular decomposition of the similarity matrices, which corresponds to embedding objects on a unit sphere. In these angular embeddings, the Euclidean distance is equivalent to the cosine similarity. Thus, data structures best described by cosine similarity and data structures best captured by Euclidean distance can both be effectively detected in our angular embedding. We provide a theoretical analysis, derive the computational algorithm, and evaluate the angular embedding on several datasets. Experiments on data clustering demonstrate that our method can provide a more discriminative subspace.
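The equivalence the abstract relies on is a standard identity: for unit-norm vectors, squared Euclidean distance and cosine similarity determine each other. A quick numerical check (synthetic vectors, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 5))
x /= np.linalg.norm(x)  # project both vectors onto the unit sphere
y /= np.linalg.norm(y)

cos_sim = x @ y
sq_dist = np.sum((x - y) ** 2)

# On the unit sphere: ||x - y||^2 = 2 * (1 - cos(x, y)),
# since ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y = 2 - 2 x.y.
assert np.isclose(sq_dist, 2 * (1 - cos_sim))
```

Because the map between the two quantities is monotone, nearest neighbors under cosine similarity and under Euclidean distance coincide in the angular embedding.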