Arrow Research search

Author name cluster

Bing Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
2 author rows

Possible papers (34)

AAAI Conference 2026 Conference Paper

Collaborative Feature Matching with Progressive Correspondence Learning

  • Xin Liu
  • Yanbing Han
  • Rong Qin
  • Bing Wang
  • Jufeng Yang

Accurate feature matching between image pairs is fundamental for various computer vision applications. In detector-based pipelines, the feature matcher aims to find the optimal feature correspondences, and the match filter is used to further remove mismatches. However, their connection is rarely exploited since they are usually treated as two separate issues in previous methods, which may lead to suboptimal results. In this paper, we propose an end-to-end collaborative feature matching (CFM) method, which contains a keypoint learning (KL) module and a correspondence learning (CL) module, to bridge the gap between the two types of work. The former improves the discrimination of keypoints and provides high-quality dynamic matches for the CL module. The latter further captures the rich context of matches and gives effective feedback to the KL module. These two modules reinforce each other in a progressive manner. Besides, we develop an efficient version of CFM, named ECFM, which uses an adaptive sampling strategy to avoid the negative influence of uninformative keypoints. Experimental results indicate that both methods outperform state-of-the-art competitors on relative pose estimation and visual localization.
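
The abstract does not give CFM's internals, but the classical starting point such learned matchers refine, finding putative correspondences between two descriptor sets, can be sketched with a mutual nearest-neighbour check (a toy illustration with made-up descriptors, not the authors' method):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Return index pairs (i, j) where descriptor i in A and j in B
    are each other's nearest neighbour under Euclidean distance."""
    # Pairwise squared distances between the two descriptor sets.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    nn_ab = d2.argmin(axis=1)   # best B index for each A descriptor
    nn_ba = d2.argmin(axis=0)   # best A index for each B descriptor
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy descriptors: rows 0/1 of A match rows 1/0 of B; row 2 is unmatched.
a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
b = np.array([[1.1, 0.9], [0.1, -0.1]])
print(mutual_nn_matches(a, b))  # [(0, 1), (1, 0)]
```

The mutual check already acts as a crude "match filter"; CFM's point is to learn matching and filtering jointly rather than applying such fixed rules.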

AAAI Conference 2026 Conference Paper

Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective

  • Bing Wang
  • Ximing Li
  • Yanjun Wang
  • Changchun Li
  • Lin Yuanbo Wu
  • Buyu Wang
  • Shengsheng Wang

Multimodal Misinformation Detection (MMD) refers to the task of detecting social media posts involving misinformation, where a post often contains text and image modalities. By observing MMD posts, however, we hold that the text modality may be much more informative than the image modality, because the text generally describes the whole event/story of the post while the image often presents only partial scenes. Our preliminary empirical results indicate that the image modality indeed contributes less to MMD. Building on this idea, we propose a new MMD method named RETSIMD. Specifically, we suppose that each text can be divided into several segments, each describing a partial scene that can be presented by an image. Accordingly, we split the text into a sequence of segments, and feed these segments into a pre-trained text-to-image generator to augment a sequence of images. We further incorporate two auxiliary objectives concerning text-image and image-label mutual information, and post-train the generator on an auxiliary text-to-image generation benchmark dataset. Additionally, we propose a graph structure by defining three heuristic relationships between images, and use a graph neural network to generate the fused features. Extensive empirical results validate the effectiveness of RETSIMD.

JBHI Journal 2026 Journal Article

MVGF-DR: Multi-view Graph Feature Fusion Approach for Drug Repositioning

  • Bing Wang
  • Tongxin Wu
  • Jun Zhang
  • Peng Chen
  • Kun Lu

Drug repositioning, exploring new indications for existing drugs, is emerging as a promising approach to accelerate drug discovery and reduce the risk of research failure. Recent advances in this topic applying graph neural networks have enabled researchers to achieve significant results by extracting latent features from the original data. However, previous studies have not fully considered the distinctive information embedded within different construction graphs, which may lead to insufficient classification performance due to the lack of more detailed features. This work therefore proposes a novel approach, namely MVGF-DR, which leverages graph network construction and multi-view graph feature fusion for drug repositioning. MVGF-DR builds a comprehensive graph network from both similarity and association information, i.e., a similarity graph network is constructed from drug-drug and disease-disease similarities, where similarity information is extracted by graph isomorphism networks, and an association graph network from drug-disease associations, where drug-disease relationships are explored by graph convolutional networks. Additionally, a maximum value selection strategy is introduced to filter features from different channels for feature fusion and noise reduction. The average AUROC and AUPR achieved by MVGF-DR across the three datasets reached 95.38% and 51.20%, respectively, outperforming the other five state-of-the-art models. Multiple experiments further demonstrated the flexibility and practical applicability of MVGF-DR.
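
The maximum value selection strategy described above can be illustrated with a minimal sketch: take the element-wise maximum across the per-channel feature matrices (the names, shapes, and values here are hypothetical; the abstract does not specify MVGF-DR's exact module):

```python
import numpy as np

def max_select_fuse(channel_feats):
    """Fuse per-channel feature matrices by keeping, element-wise,
    the maximum value observed across channels."""
    stacked = np.stack(channel_feats, axis=0)   # (n_channels, n, d)
    return stacked.max(axis=0)

sim_feats = np.array([[0.2, 0.9], [0.5, 0.1]])    # similarity-graph channel
assoc_feats = np.array([[0.7, 0.3], [0.4, 0.8]])  # association-graph channel
print(max_select_fuse([sim_feats, assoc_feats]))
# [[0.7 0.9]
#  [0.5 0.8]]
```

Keeping only the strongest per-dimension activation is one simple way to suppress weak, noisy responses from any single channel.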

AAAI Conference 2026 Conference Paper

VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

  • Qimao Chen
  • Fang Li
  • Shaoqing Xu
  • Zhiyi Lai
  • Zixun Xie
  • Yuechen Luo
  • Shengyin Jiang
  • Hanbing Li

The safe deployment of autonomous driving (AD) systems is fundamentally hindered by the long-tail problem, where rare yet critical driving scenarios are severely underrepresented in real-world data. Existing solutions, including safety-critical scenario generation and closed-loop learning, often rely on rule-based heuristics, resampling methods, and generative models learned from offline datasets, limiting their ability to produce diverse and novel challenges. While recent works leverage Vision Language Models (VLMs) to produce scene descriptions that guide a separate, downstream model in generating hazardous trajectories for agents, such a two-stage framework constrains the generative potential of VLMs, as the diversity of the final trajectories is ultimately limited by the generalization ceiling of the downstream algorithm. To overcome these limitations, we introduce VILTA (VLM-In-the-Loop Trajectory Adversary), a novel framework that integrates a VLM into the closed-loop training of AD agents. Unlike prior works, VILTA actively participates in the training loop by comprehending the dynamic driving environment and strategically generating challenging scenarios through direct, fine-grained editing of surrounding agents' future trajectories. This direct-editing approach fully leverages the VLM's powerful generalization capabilities to create a diverse curriculum of plausible yet challenging scenarios that extend beyond the scope of traditional methods. We demonstrate that our approach substantially enhances the safety and robustness of the resulting AD policy, particularly in its ability to navigate critical long-tail events.

IROS Conference 2025 Conference Paper

AVIP: Acoustic-Visual-Inertial-Pressure Fusion-based Underwater Localization System with Multi-Centric Calibration

  • Yuanbo Xue
  • Yang Hu
  • Dejin Zhang
  • Chih-Yung Wen
  • Bing Wang

Underwater localization is a crucial capability for ensuring robust and accurate vehicle navigation. Although various well-developed localization systems exist, their primary focus is on ground and aerial applications. The challenges posed by underwater environments, such as sparse textures and dynamic disturbances, make multi-modal fusion a promising solution for localization. This paper presents AVIP, a localization method that fuses Acoustic, Visual, Inertial, and Pressure modalities for underwater applications. To integrate the information from all modalities during initialization, the visual and inertial modalities are alternately assigned as centric sensors to pairwise predict and update the estimations of the other modalities. The multi-centric calibration problem is addressed through factor graph optimization, which is fully integrated into the graph-based AVIP system as the calibration factor. The proposed method is evaluated against state-of-the-art approaches using semi-physical datasets recorded by a BlueROV2 robot and public real-world datasets. Extensive experiments demonstrate that AVIP achieves superior localization accuracy and exhibits adaptability across a range of sensor configurations.

NeurIPS Conference 2025 Conference Paper

Balancing Positive and Negative Classification Error Rates in Positive-Unlabeled Learning

  • Ximing Li
  • Yuanchao Dai
  • Bing Wang
  • Changchun Li
  • Jianfeng Qu
  • Renchu Guan

Positive and Unlabeled (PU) learning is a special case of binary classification with weak supervision, where only positive labeled and unlabeled data are available. Previous studies suggest several specific risk estimators for PU learning, such as non-negative PU (nnPU), which are unbiased and consistent with the expected risk of supervised binary classification. In nnPU, the negative-class empirical risk is estimated from positive labeled and unlabeled data with a non-negativity constraint. However, its negative-class empirical risk estimator approaches 0, so the negative class is over-weighted, resulting in imbalanced error rates between the positive and negative classes. To solve this problem, we suppose that the expected risks of the positive and negative classes should be close. Accordingly, we constrain the negative-class empirical risk estimator to be lower bounded by the positive-class empirical risk, instead of 0, and also incorporate an explicit equality constraint between them. We suggest a risk estimator of PU learning that balances positive and negative classification error rates, named DC-PU, and suggest an efficient training method for DC-PU based on the augmented Lagrange multiplier framework. We theoretically analyze the estimation error of DC-PU and empirically validate that DC-PU achieves higher accuracy and converges more stably than other risk estimators of PU learning. Additionally, DC-PU also achieves accuracy competitive with practical PU learning methods.
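
For context, the standard nnPU estimator clips the negative-class term at 0; the balancing idea described above replaces that floor with the positive-class empirical risk. A minimal sketch under a sigmoid surrogate loss (an illustration of the idea only, not the paper's exact DC-PU objective, which also adds an equality constraint):

```python
import numpy as np

def pu_risk(scores_p, scores_u, prior, lower_bound_by_positive=False):
    """Empirical PU risk with a sigmoid surrogate loss.
    scores_p / scores_u: classifier scores on positive-labeled and
    unlabeled samples; prior: class prior pi = P(y = +1)."""
    loss = lambda z: 1.0 / (1.0 + np.exp(z))   # sigmoid loss l(z)
    r_p_pos = loss(scores_p).mean()            # positives scored as positive
    r_p_neg = loss(-scores_p).mean()           # positives scored as negative
    r_u_neg = loss(-scores_u).mean()           # unlabeled scored as negative
    neg_risk = r_u_neg - prior * r_p_neg       # negative-class risk estimator
    # nnPU clips the estimator at 0; the balanced variant instead
    # lower-bounds it by the positive-class empirical risk.
    floor = prior * r_p_pos if lower_bound_by_positive else 0.0
    return prior * r_p_pos + max(floor, neg_risk)

sp, su = np.array([2.0, 1.0]), np.array([-1.0, 0.5])
print(pu_risk(sp, su, 0.5) <= pu_risk(sp, su, 0.5, lower_bound_by_positive=True))  # True
```

The stronger floor prevents the optimizer from driving the negative-class term toward 0, which is the over-weighting failure mode the abstract describes.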

TIST Journal 2025 Journal Article

Cross-platform Prediction of Depression Treatment Outcome Using Location Sensory Data on Smartphones

  • Soumyashree Sahoo
  • Md. Zakir Hossain
  • Chinmaey Shende
  • Parit Patel
  • Yushuo Niu
  • Reynaldo Morillo
  • Xinyu Wang
  • Shweta Ware

Currently, depression treatment relies on closely monitoring patients' response to treatment and adjusting the treatment as needed. Using self-reported or physician-administered questionnaires to monitor treatment response is, however, subjective, costly, and subject to recall bias. In this paper, we explore using location sensory data collected passively on smartphones to predict treatment outcome. To address heterogeneous data collection on Android and iOS phones, the two predominant smartphone platforms, we explore using domain adaptation techniques to map their data to a common feature space, and then use the data jointly to train machine learning models. We further explore integrating contrastive learning with domain adaptation to augment data and learn feature embeddings. These learned embeddings are then used to train machine learning models to predict depression treatment outcomes. Our evaluation shows that using embeddings learned by jointly integrating contrastive learning and domain adaptation leads to the best prediction accuracy. In addition, our results show that using location features and the baseline self-reported questionnaire score can lead to an F1 score of up to 0.76. This accuracy is comparable to that obtained using periodic self-reported questionnaires, indicating that location data is a promising direction for predicting depression treatment outcome. Lastly, when all location and questionnaire data are used together, the F1 score further increases to 0.79.
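
The "map their data to a common feature space" step can be illustrated with a classical second-order statistics alignment (a CORAL-style sketch; the paper's actual adaptation technique is not specified in the abstract, and the feature scales below are invented):

```python
import numpy as np

def coral_align(source, target, eps=1e-6):
    """CORAL-style alignment: whiten source features, then re-color them
    with the target covariance so both platforms share one feature space."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m, inverse=False):
        # Matrix square root via eigendecomposition (covariances are SPD).
        vals, vecs = np.linalg.eigh(m)
        vals = 1.0 / np.sqrt(vals) if inverse else np.sqrt(vals)
        return (vecs * vals) @ vecs.T

    centered = source - source.mean(0)
    return centered @ sqrtm(cs, inverse=True) @ sqrtm(ct) + target.mean(0)

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 3)) * [1.0, 5.0, 0.1]  # "Android-like" scales (toy)
tgt = rng.normal(size=(300, 3))                    # "iOS-like" scales (toy)
aligned = coral_align(src, tgt)
```

After alignment, the source samples carry (approximately) the target's mean and covariance, so one model can be trained on both platforms jointly.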

NeurIPS Conference 2025 Conference Paper

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

  • Xiangyu Guo
  • Zhanqian Wu
  • Kaixin Xiong
  • Ziyang Xu
  • Lijun Zhou
  • Gangwei Xu
  • Shaoqing Xu
  • Haiyang Sun

We present Genesis, a unified world model for joint generation of multi-view driving videos and LiDAR sequences with spatio-temporal and cross-modal consistency. Genesis employs a two-stage architecture that integrates a DiT-based video diffusion model with 3D-VAE encoding, and a BEV-represented LiDAR generator with NeRF-based rendering and adaptive sampling. Both modalities are directly coupled through a shared condition input, enabling coherent evolution across visual and geometric domains. To guide the generation with structured semantics, we introduce DataCrafter, a captioning module built on vision-language models that provides scene-level and instance-level captions. Extensive experiments on the nuScenes benchmark demonstrate that Genesis achieves state-of-the-art performance across video and LiDAR metrics (FVD 16.95, FID 4.24, Chamfer 0.611), and benefits downstream tasks including segmentation and 3D detection, validating the semantic fidelity and practical utility of the synthetic data.

AIIM Journal 2025 Journal Article

IHE-Net: Hidden feature discrepancy fusion and triple consistency training for semi-supervised medical image segmentation

  • Mengyi Ju
  • Bing Wang
  • Zutong Zhao
  • Shiyin Zhang
  • Shuo Yang
  • Zhihong Wei

Teacher-Student (TS) networks have become the mainstream frameworks of semi-supervised deep learning, and are widely used in medical image segmentation. However, traditional TS frameworks based on single or homogeneous encoders often struggle to capture the rich semantic details required for complex, fine-grained tasks. To address this, we propose a novel semi-supervised medical image segmentation framework (IHE-Net), which makes good use of the feature discrepancies of two heterogeneous encoders to improve segmentation performance. The two encoders are instantiated by networks from different learning paradigms, namely CNN and Transformer/Mamba, respectively, to extract richer and more robust context representations from unlabeled data. On this basis, we propose a simple yet powerful multi-level feature discrepancy fusion module (MFDF), which effectively integrates different modal features and their discrepancies from the two heterogeneous encoders. This design enhances the representational capacity of the model through efficient fusion without introducing additional computational overhead. Furthermore, we introduce a triple consistency learning strategy to improve predictive stability by setting dual decoders and adding mixed output consistency. Extensive experimental results on three skin lesion segmentation datasets, ISIC2017, ISIC2018, and PH2, demonstrate the superiority of our framework. Ablation studies further validate the rationale and effectiveness of the proposed method. Code is available at: https://github.com/joey-AI-medical-learning/IHE-Net.
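
The abstract does not detail MFDF, but one common way to expose encoder discrepancies is to fuse both feature maps together with their element-wise difference as an extra cue (a hypothetical toy sketch, not the paper's module):

```python
import numpy as np

def discrepancy_fuse(f_cnn, f_trans):
    """Toy fusion of two heterogeneous encoder features: keep both
    feature vectors plus their element-wise discrepancy as a third cue."""
    return np.concatenate([f_cnn, f_trans, np.abs(f_cnn - f_trans)], axis=-1)

f1 = np.array([[1.0, 2.0]])   # e.g. a CNN-branch feature (toy values)
f2 = np.array([[0.5, 3.0]])   # e.g. a Transformer-branch feature (toy values)
print(discrepancy_fuse(f1, f2).tolist())  # [[1.0, 2.0, 0.5, 3.0, 0.5, 1.0]]
```

The discrepancy channel highlights exactly where the two paradigms disagree, which is the signal IHE-Net aims to exploit.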

NeurIPS Conference 2025 Conference Paper

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

  • Zhenyu Yang
  • Kairui Zhang
  • Yuhang Hu
  • Bing Wang
  • Shengsheng Qian
  • Bin Wen
  • Fan Yang
  • Tingting Gao

Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous frame-by-frame inputs and determine optimal response timing, often compromising real-time responsiveness and narrative coherence. To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. Specifically, LiveStar incorporates: (1) a training strategy enabling incremental video-language alignment for variable-length video streams, preserving temporal consistency across dynamically evolving frame sequences; (2) a response-silence decoding framework that determines optimal proactive response timing via a single forward pass verification; (3) memory-aware acceleration via peak-end memory compression for online inference on 10+ minute videos, combined with a streaming key-value cache to achieve 1.53× faster inference. We also construct OmniStar, a comprehensive dataset for training and benchmarking that encompasses 15 diverse real-world scenarios and 5 evaluation tasks for online video understanding. Extensive experiments across three benchmarks demonstrate LiveStar's state-of-the-art performance, achieving an average 19.5% improvement in semantic correctness with 18.1% reduced timing difference compared to existing online Video-LLMs, while improving FPS by 12.0% across all five OmniStar tasks. Our model and dataset can be accessed at https://github.com/yzy-bupt/LiveStar.

JBHI Journal 2025 Journal Article

MCD-LightGBM System for Intelligent Analyzing Heterogeneous Clinical Drug Therapeutic Effects

  • Xiao-Hui Yang
  • Hao-Jie Liao
  • Pei-Yu Sun
  • Jing Ma
  • Bing Wang
  • Yan He
  • Liu-Gen Xue
  • Li-Min Su

Causal effect estimation of individual heterogeneity is a core issue in the field of causal inference, and its application in medicine poses an active and challenging problem. In high-risk decision-making domains such as healthcare, inappropriate treatments can have serious negative impacts on patients. Recently, machine learning-based methods have been proposed to improve the accuracy of causal effect estimation. However, many of these methods concentrate on estimating causal effects of continuous outcome variables under binary intervention conditions, and give less consideration to multivariate intervention conditions or discrete outcome variables, limiting their scope of application. To tackle this issue, we combine the double machine learning framework with the Light Gradient Boosting Machine (LightGBM) and propose a double LightGBM model, which can estimate binary causal effects more accurately and in less time. Two cyclic structures were added to the model. A data correction method was introduced and improved to transform discrete outcome variables into continuous ones. A Multivariate Cyclic Double LightGBM model (MCD-LightGBM) was proposed to intelligently estimate multivariate treatment effects. A visual human-computer interaction system for heterogeneous causal effect estimation was designed, which can be applied to different types of data. In two clinical problems, the system improved the Logarithm of the Minimum Angle of Resolution (LogMAR) of visual acuity change after anti-Vascular Endothelial Growth Factor (anti-VEGF) treatment in patients with diabetic macular degeneration from 0.05 to 0.33, and reduced the readmission rate of diabetic patients after cure from 48.4% to 10.5%. These results demonstrate the potential of the proposed system for predicting heterogeneous clinical drug treatment effects.
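
The double machine learning backbone the paper builds on can be sketched with cross-fitted partialling-out; here ordinary least squares stands in for the paper's LightGBM nuisance learners, and the synthetic data is invented (a simplified illustration, not the MCD-LightGBM system):

```python
import numpy as np

def dml_ate(y, d, x, folds=2, seed=0):
    """Cross-fitted partialling-out sketch of double machine learning:
    fit nuisance models for E[y|x] and E[d|x] out-of-fold (OLS stands
    in for the paper's LightGBM learners), then regress the outcome
    residual on the treatment residual to estimate the effect."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    res_y, res_d = np.empty_like(y), np.empty_like(d)
    for k in range(folds):
        test = idx[k::folds]
        train = np.setdiff1d(idx, test)
        X_tr = np.c_[np.ones(len(train)), x[train]]
        X_te = np.c_[np.ones(len(test)), x[test]]
        res_y[test] = y[test] - X_te @ np.linalg.lstsq(X_tr, y[train], rcond=None)[0]
        res_d[test] = d[test] - X_te @ np.linalg.lstsq(X_tr, d[train], rcond=None)[0]
    return float(res_d @ res_y / (res_d @ res_d))

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 1))
d = x[:, 0] + rng.normal(size=500)            # treatment depends on x
y = 2.0 * d + x[:, 0] + rng.normal(size=500)  # true effect is 2.0
print(round(dml_ate(y, d, x), 1))
```

Cross-fitting keeps the nuisance fits out-of-sample, which is what makes the residual-on-residual regression robust to flexible learners such as LightGBM.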

ICLR Conference 2025 Conference Paper

McEval: Massively Multilingual Code Evaluation

  • Linzheng Chai
  • Shukai Liu
  • Jian Yang 0030
  • Yuwei Yin
  • Ke Jin
  • Jiaheng Liu
  • Tao Sun 0016
  • Ge Zhang 0009

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks primarily focus on Python and are still restricted to a limited number of languages, where the other languages are translated from the Python samples, degrading the data diversity. To further facilitate research on code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. The benchmark contains challenging code completion, understanding, and generation evaluation tasks, along with the finely curated massively multilingual instruction corpora McEval-Instruct. In addition, we introduce an effective multilingual coder, mCoder, trained on McEval-Instruct to support multilingual programming language generation. Extensive experimental results on McEval show that a substantial gap remains between open-source models and closed-source LLMs across numerous languages. The instruction corpora and evaluation benchmark are available at https://github.com/MCEVAL/McEval.

NeurIPS Conference 2025 Conference Paper

Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers

  • Gangwei Xu
  • Haotong Lin
  • Hongcheng Luo
  • Xianqi Wang
  • Jingfeng Yao
  • Lianghui Zhu
  • Yuechuan Pu
  • Cheng Chi

This paper presents Pixel-Perfect Depth, a monocular depth estimation model based on pixel-space diffusion generation that produces high-quality, flying-pixel-free point clouds from estimated depth maps. Current generative depth estimation models fine-tune Stable Diffusion and achieve impressive performance. However, they require a VAE to compress depth maps into the latent space, which inevitably introduces flying pixels at edges and details. Our model addresses this challenge by directly performing diffusion generation in the pixel space, avoiding VAE-induced artifacts. To overcome the high complexity associated with pixel-space generation, we introduce two novel designs: 1) Semantics-Prompted Diffusion Transformers (SP-DiT), which incorporate semantic representations from vision foundation models into DiT to prompt the diffusion process, thereby preserving global semantic consistency while enhancing fine-grained visual details; and 2) a Cascade DiT design that progressively increases the number of tokens to further enhance efficiency and accuracy. Our model achieves the best performance among all published generative models across five benchmarks, and significantly outperforms all other models in edge-aware point cloud evaluation. Project page: https://pixel-perfect-depth.github.io/.

IJCAI Conference 2025 Conference Paper

Robust Misinformation Detection by Visiting Potential Commonsense Conflict

  • Bing Wang
  • Ximing Li
  • Changchun Li
  • Bingrui Zhao
  • Bo Fu
  • Renchu Guan
  • Shengsheng Wang

The development of Internet technology has led to an increased prevalence of misinformation, causing severe negative effects across diverse domains. To mitigate this challenge, Misinformation Detection (MD), aiming to detect online misinformation automatically, has emerged as a rapidly growing research topic in the community. In this paper, we propose a novel plug-and-play augmentation method for the MD task, namely Misinformation Detection with Potential Commonsense Conflict (MD-PCC). We take inspiration from prior studies indicating that fake articles are more likely to involve commonsense conflict. Accordingly, we construct commonsense expressions for articles, which express potential commonsense conflicts inferred from the difference between each article's extracted commonsense triplets and golden ones produced by the well-established commonsense reasoning tool COMET. These expressions are then specified for each article as augmentation, and any specific MD method can then be trained on the commonsense-augmented articles. Besides, we also collect a novel commonsense-oriented dataset named CoMis, in which all fake articles involve commonsense conflict. We integrate MD-PCC with various existing MD backbones and compare them across both 4 public benchmark datasets and CoMis. Empirical results demonstrate that MD-PCC consistently outperforms existing MD baselines.

EAAI Journal 2025 Journal Article

SkeletonDETR: A novel multimodal fusion based object detection framework for chemical safety applications

  • Yudi Tang
  • Bing Wang
  • Wangli He
  • Feng Qian
  • Zhen Liu

For object detection in field operations in chemical plants, accurately detecting objects carried and used by construction workers has become a crucial challenge in safety monitoring. Current object detection algorithms usually perform well on large objects but can easily miss small objects, particularly under partial occlusion. Moreover, existing methods fail to recognize the significance of workers' pose information during the construction process, which provides significant benefits for the detection task, especially when the detection targets are closely associated with the construction workers. To solve this problem, we propose a novel multimodal fusion based object detection framework, which can effectively use human pose information to improve the detection of small and occluded targets. Furthermore, we propose a multimodal sampling module to fully utilize the features of different modalities and enhance the encoder's ability to aggregate features. Compared with the baseline model, our proposed method achieves an 8.3% improvement in small object detection performance. Comprehensive experiments demonstrate that our proposed method outperforms existing efficient models, especially for field operations in chemical plants.

AAAI Conference 2025 Conference Paper

Towards Unbiased Information Extraction and Adaptation in Cross-Domain Recommendation

  • Yibo Wang
  • Yingchun Jian
  • Wenhao Yang
  • Shiyin Lu
  • Lei Shen
  • Bing Wang
  • Xiaoyi Zeng
  • Lijun Zhang

Cross-Domain Recommendation (CDR) leverages additional knowledge from auxiliary domains to address the long-standing data sparsity issue. However, existing methods typically acquire this knowledge by minimizing the average loss over all domains, overlooking the fact that different domains possess different user-preference distributions. As a result, the acquired knowledge may contain biased information from data-rich domains, leading to performance degradation in data-scarce domains. In this paper, we propose a novel CDR method, which takes domain distinctions into consideration to extract and adapt unbiased information. Specifically, our method consists of two key components: Unbiased Information Extraction (UIE) and Unbiased Information Adaptation (UIA). In the UIE, inspired by distributionally robust optimization, we optimize the worst-case performance across all domains to extract domain-invariant information, preventing the potential bias from auxiliary domains. In the UIA, we introduce a new user-item attention module, which employs domain-specific information from historically interacted items to attend the adaptation of domain-invariant information. To verify the effectiveness of our method, we conduct extensive experiments on three real-world datasets, each of which contains three extremely sparse domains. Experimental results demonstrate the considerable superiority of our proposed method compared to baselines.
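
The distributionally robust idea behind UIE, optimizing the worst-performing domain rather than the average over all samples, can be sketched in a few lines (a toy illustration with invented losses; the paper's actual optimization procedure is not specified in the abstract):

```python
import numpy as np

def worst_case_domain_loss(per_sample_losses, domain_ids):
    """Distributionally robust objective: instead of the average loss
    over all samples, return the loss of the worst-performing domain."""
    domains = np.unique(domain_ids)
    domain_losses = np.array(
        [per_sample_losses[domain_ids == d].mean() for d in domains])
    return domain_losses.max()

losses = np.array([0.1, 0.2, 1.0, 1.5])   # two samples per domain (toy)
domains = np.array([0, 0, 1, 1])          # domain 1 is the data-scarce one
print(worst_case_domain_loss(losses, domains))  # 1.25
```

Minimizing this objective keeps a data-rich, easy domain from dominating the gradient, which is exactly the bias the abstract attributes to average-loss training.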

AAAI Conference 2025 Conference Paper

XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

  • Linzheng Chai
  • Jian Yang
  • Tao Sun
  • Hongcheng Guo
  • Jiaheng Liu
  • Bing Wang
  • Xinnian Liang
  • Jiaqi Bai

Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCoT-Instruct) is created to encourage the semantic alignment of multiple languages. We introduce cross-lingual in-context few-shot learning (xICL) to accelerate multilingual agreement in instruction tuning, where some fragments of source languages in examples are randomly substituted by their counterpart translations in target languages. During multilingual instruction tuning, we adopt a random online CoT strategy to enhance the multilingual reasoning ability of the large language model by first translating the query into another language and then answering in English. To further facilitate language transfer, we leverage high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results demonstrate the superior performance of xCoT in reducing the gap among different languages, highlighting its potential to reduce the cross-lingual gap.
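
The fragment-substitution step of xICL can be illustrated with a toy token-level code-switcher (the lexicon, ratio, and token granularity here are hypothetical; the paper substitutes whole fragments with their translations):

```python
import random

def code_switch(tokens, translations, ratio=0.5, seed=0):
    """Randomly substitute a fraction of source-language tokens with
    their target-language counterparts (toy xICL-style augmentation)."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok in translations and rng.random() < ratio:
            out.append(translations[tok])
        else:
            out.append(tok)
    return out

# Hypothetical toy English-French lexicon.
lex = {"think": "pense", "step": "étape"}
print(code_switch(["think", "step", "by", "step"], lex, ratio=1.0))
# ['pense', 'étape', 'by', 'étape']
```

Mixing languages inside a single example forces the model to align representations across languages, which is the "multilingual agreement" the abstract refers to.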

AAAI Conference 2024 Conference Paper

Aspect-Based Sentiment Analysis with Explicit Sentiment Augmentations

  • Jihong Ouyang
  • Zhiyao Yang
  • Silong Liang
  • Bing Wang
  • Yimeng Wang
  • Ximing Li

Aspect-based sentiment analysis (ABSA), a fine-grained sentiment classification task, has received much attention recently. Many works investigate sentiment information through opinion words, such as "good" and "bad". However, implicit sentiment data widely exists in ABSA datasets, whose sentiment polarity is hard to determine due to the lack of distinct opinion words. To deal with implicit sentiment, this paper proposes an ABSA method that integrates explicit sentiment augmentations (ABSA-ESA) to add more sentiment clues. We propose an ABSA-specific explicit sentiment generation method to create such augmentations. Specifically, we post-train T5 on rule-based data and employ three strategies to constrain the sentiment polarity and aspect term of the generated augmentations. We employ Syntax Distance Weighting and Unlikelihood Contrastive Regularization in the training procedure to guide the model to generate explicit opinion words with the same polarity as the input sentence. Meanwhile, we utilize Constrained Beam Search to ensure the augmentations are aspect-related. We test ABSA-ESA on two ABSA benchmarks. The results show that ABSA-ESA outperforms the SOTA baselines on implicit and explicit sentiment accuracy.
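
Constrained decoding of the kind used to keep augmentations aspect-related can be sketched as a toy beam search that discards beams missing a required token (an illustration with an invented scoring table, not the paper's T5-based generator):

```python
def constrained_beam_search(vocab_scores, length, must_contain, beam=3):
    """Toy constrained beam search: expand sequences by per-token
    log-scores, keep the top `beam`, and finally accept only beams
    that contain the required aspect token."""
    beams = [((), 0.0)]
    for _ in range(length):
        grown = [(seq + (tok,), score + s)
                 for seq, score in beams
                 for tok, s in vocab_scores.items()]
        beams = sorted(grown, key=lambda x: -x[1])[:beam]
    valid = [b for b in beams if must_contain in b[0]]
    return max(valid, key=lambda x: x[1])[0] if valid else None

scores = {"good": -0.1, "food": -0.5, "bad": -2.0}   # toy log-scores
print(constrained_beam_search(scores, 2, must_contain="food"))  # ('good', 'food')
```

Real constrained beam search enforces the lexical constraint during expansion rather than by post-filtering, but the effect is the same: the highest-scoring sequence containing the aspect term survives.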

EAAI Journal 2024 Journal Article

Feature extraction of multi-sensors for early bearing fault diagnosis using deep learning based on minimum unscented kalman filter

  • Haihong Tang
  • Yanmin Tang
  • Yuxiang Su
  • Wuwei Feng
  • Bing Wang
  • Peng Chen
  • Dunwen Zuo

Bearing fault diagnosis is vital for ensuring the reliability and safety of high-speed trains and wind turbines. Therefore, a minimum unscented Kalman filter-aided deep belief network is proposed to extract invariant features from vibration signals collected by multiple sensors. This is particularly crucial given the dynamic nature of environmental noise and internal bearing degradation, which pose challenges to accurate diagnosis. Firstly, the Gramian angular summation field is employed to transform the multi-sensor signals into 2-D feature maps. This transformation retains the absolute temporal relations within the time-series signals, mitigating feature distortion and enhancing noise elimination for early detection. Secondly, a deep belief network is utilized to construct a robust deep learning framework capable of analysing the translated 2-D feature maps for effective diagnosis. In addition, a minimum unscented transform technique and an adaptive scaling process for noise are integrated into the diagnostic model. These components exhibit exceptional dynamic tracking capabilities, allowing key parameters to be adjusted in response to the prolonged and evolving bearing degradation process. The proposed methodology was rigorously evaluated against nine distinct methods on two diverse bearing datasets, achieving accuracy rates exceeding 98% and 99% on the two datasets, respectively. This establishes an intelligent diagnosis model with high precision and exceptional generalisation capabilities for bearings within rotating machinery, laying a robust foundation for future research, particularly in the realm of transfer learning.
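
The Gramian angular summation field used in the first step has a standard closed form: rescale the series to [-1, 1], map each value to an angle phi = arccos(x), and form cos(phi_i + phi_j). A minimal sketch on a toy series:

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field: rescale a 1-D series to [-1, 1],
    map values to angles phi = arccos(x), and return cos(phi_i + phi_j)."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # min-max to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

field = gasf([0.0, 1.0, 2.0, 3.0])   # toy vibration snippet
print(field.shape)                   # (4, 4)
print(np.round(field[0, 0], 3))      # cos(2 * arccos(-1)) = cos(2*pi) = 1.0
```

Each pixel (i, j) encodes a trigonometric interaction between time steps i and j, which is how the 2-D map preserves the absolute temporal relations the abstract mentions.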

AAAI Conference 2024 Conference Paper

Non-stationary Projection-Free Online Learning with Dynamic and Adaptive Regret Guarantees

  • Yibo Wang
  • Wenhao Yang
  • Wei Jiang
  • Shiyin Lu
  • Bing Wang
  • Haihong Tang
  • Yuanyu Wan
  • Lijun Zhang

Projection-free online learning has drawn increasing interest due to its efficiency in solving high-dimensional problems with complicated constraints. However, most existing projection-free online methods focus on minimizing the static regret, which unfortunately fails to capture the challenge of changing environments. In this paper, we investigate non-stationary projection-free online learning, and choose dynamic regret and adaptive regret to measure the performance. Specifically, we first provide a novel dynamic regret analysis for an existing projection-free method named BOGD_IP, and establish an O(T^¾ (1+P_T)) dynamic regret bound, where P_T denotes the path-length of the comparator sequence. Then, we improve the upper bound to O(T^¾ (1+P_T)^¼) by running multiple BOGD_IP algorithms with different step sizes in parallel, and tracking the best one on the fly. Our results are the first general-case dynamic regret bounds for projection-free online learning, and can recover the existing O(T^¾) static regret by setting P_T = 0. Furthermore, we propose a projection-free method to attain an O(τ^¾) adaptive regret bound for any interval with length τ, which nearly matches the static regret over that interval. The essential idea is to maintain a set of BOGD_IP algorithms dynamically, and combine them by a meta algorithm. Moreover, we demonstrate that it is also equipped with an O(T^¾ (1+P_T)^¼) dynamic regret bound. Finally, empirical studies verify our theoretical findings.
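The "run several base learners with different step sizes and track the best on the fly" idea is classically implemented with an exponentially weighted meta-algorithm. A minimal sketch, assuming per-round losses for each base learner are observable and using an illustrative learning rate (not the paper's tuning):

```python
import numpy as np

def hedge_combine(expert_losses, eta=0.5):
    """Exponentially weighted combination of K base learners over T rounds.

    expert_losses: array of shape (K, T) with each learner's per-round loss.
    Returns the cumulative loss of the weighted meta-play, which tracks the
    best learner up to an additive term.
    """
    K, T = expert_losses.shape
    w = np.ones(K) / K                 # uniform initial weights
    meta_loss = 0.0
    for t in range(T):
        meta_loss += float(w @ expert_losses[:, t])    # loss of weighted play
        w = w * np.exp(-eta * expert_losses[:, t])     # multiplicative update
        w /= w.sum()
    return meta_loss
```

With one learner incurring zero loss and another constant loss, the meta-loss stays bounded instead of growing linearly, which is the "tracking" behaviour the paper exploits.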

EAAI Journal 2024 Journal Article

Robotic assembly control reconfiguration based on transfer reinforcement learning for objects with different geometric features

  • Yuhang Gai
  • Bing Wang
  • Jiwen Zhang
  • Dan Wu
  • Ken Chen

Robotic force-based compliance control is a preferred approach to achieve high-precision assembly tasks. When the geometric features of assembly objects are asymmetric or irregular, reinforcement learning (RL) agents are gradually incorporated into the compliance controller to adapt to complex force-pose mapping, which is hard to model analytically. Since force-pose mapping is strongly dependent on geometric features, a compliance controller is only optimal for the current geometric features. To reduce the learning cost for assembly objects with different geometric features, this paper is devoted to answering how to reconfigure existing controllers for new assembly objects with different geometric features. In this paper, model-based parameters are first reconfigured based on the proposed Equivalent Theory of Compliance Law (ETCL). Then the RL agent is transferred based on the proposed Weighted Dimensional Policy Distillation (WDPD) method. The experiment results demonstrate that the control reconfiguration method costs less time and achieves better control performance, which confirms the validity of the proposed methods.

EAAI Journal 2024 Journal Article

TL-TSD: A two-layer traffic sub-area division framework based on trajectory clustering

  • Chang Liu
  • Xinzheng Niu
  • Yong Ma
  • Shiyun Shao
  • Bing Wang

The development of intelligent traffic coordination and smart mobility under the digital economy has increased the need for effective traffic sub-area division. Traditional division methods rely on predefined urban administrative units, failing to adapt to varying traffic conditions. Therefore, data-driven approaches have been developed to divide traffic sub-areas, considering both the frequency of data updates and a balance between accuracy and traffic characteristics within the selected data. However, these approaches are affected by unsatisfactory zone boundaries, and the relevant clustering algorithms cannot efficiently support traffic sub-area division. To address these issues, this paper proposes a two-layer traffic sub-area division (TL-TSD) framework that considers factors such as traffic density, road structure, and spatiotemporal characteristics inherent in the overall trajectory. Specifically, in the first layer, we introduce a specific equation based on dynamic time warping to adaptively perform trajectory cutting and record matching information while maintaining the shape features of the overall trajectory. Subsequently, we design a modified density-based spatial clustering of applications with noise (DBSCAN) algorithm to obtain initial clusters. In the second layer, based on the matching information, we introduce two trajectory refinement algorithms, designed respectively to produce the final clusters and well-defined boundaries. Extensive experimental results and statistical analysis on two real-world datasets indicate that the proposed framework can effectively address the aforementioned technical challenges and outperform the comparison algorithms in terms of the overall Dunn metric. Moreover, the visualization results show that the final clusters with well-defined boundaries are more effective for dividing traffic sub-areas.
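The dynamic-time-warping primitive underlying the trajectory-cutting equation is the classic O(nm) recursion; a self-contained sketch for 1-D trajectories (the paper's actual cutting equation builds on this, but is not reproduced here):

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D trajectories.

    Fills the cumulative-cost matrix D, where each cell adds the local
    cost |a_i - b_j| to the cheapest of the three predecessor cells.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

DTW is invariant to local time warping, so a trajectory and a stretched copy of it have distance zero, which is what makes it suitable for shape-preserving cutting.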

IJCAI Conference 2024 Conference Paper

WPML3CP: Wasserstein Partial Multi-Label Learning with Dual Label Correlation Perspectives

  • Ximing Li
  • Yuanchao Dai
  • Bing Wang
  • Changchun Li
  • Renchu Guan
  • Fangming Gu
  • Jihong Ouyang

Partial multi-label learning (PMLL) refers to a weakly-supervised classification problem, where each instance is associated with a set of candidate labels covering its ground-truth labels but also including irrelevant ones. The current methodology of PMLL is to estimate the ground-truth confidences of candidate labels, i.e., the likelihood of a candidate label being a ground-truth one, and induce the multi-label predictor with them rather than the candidate labels. In this paper, we aim to estimate precise ground-truth confidences by leveraging precise label correlations, which are also required to estimate. To this end, we propose to capture label correlations from both measuring and modeling perspectives. Specifically, we measure the loss between ground-truth confidences and predictions by employing the Wasserstein distance involving label correlations, and form a label correlation-aware regularization to constrain predictive parameters. The two techniques are coupled to promote precise estimations of label correlations. Upon these ideas, we propose a novel PMLL method, namely Wasserstein Partial Multi-Label Learning with dual Label Correlation Perspectives (WPML3CP). We conduct extensive experiments on several benchmark datasets. Empirical results demonstrate that WPML3CP can outperform the existing PMLL baselines.
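A Wasserstein loss between two label distributions under a correlation-derived cost matrix is commonly computed with entropy-regularized Sinkhorn iterations. A minimal sketch, with an assumed regularization strength and iteration count (not the paper's exact objective):

```python
import numpy as np

def sinkhorn_cost(p, q, C, reg=0.1, iters=200):
    """Entropy-regularized optimal transport cost between distributions
    p and q under cost matrix C, via Sinkhorn matrix scaling.

    C[i, j] would encode label dissimilarity (low cost between
    correlated labels), so mass moves cheaply between related labels.
    """
    p, q, C = np.asarray(p, float), np.asarray(q, float), np.asarray(C, float)
    K = np.exp(-C / reg)               # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iters):             # alternate marginal projections
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]    # transport plan
    return float((P * C).sum())
```

When p and q agree and the cost is zero on the diagonal, the optimal plan is (near-)diagonal and the cost is close to zero.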

NeurIPS Conference 2022 Conference Paper

BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

  • Tingting Liang
  • Hongwei Xie
  • Kaicheng Yu
  • Zhongyu Xia
  • Zhiwei Lin
  • Yongtao Wang
  • Tao Tang
  • Bing Wang

Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks. Current methods rely on point clouds from the LiDAR sensor as queries to leverage the feature from the image space. However, this underlying assumption makes the current fusion framework unable to produce any prediction when there is a LiDAR malfunction, whether minor or major. This fundamentally limits the deployment capability to realistic autonomous driving scenarios. In contrast, we propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data, thus addressing the downside of previous methods. We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings. Under the robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses the state-of-the-art methods by 15.7% to 28.9% mAP. To the best of our knowledge, we are the first to handle realistic LiDAR malfunction and can be deployed to realistic scenarios without any post-processing procedure.
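The independence idea — the camera branch must still yield a usable feature when lidar is absent — can be illustrated with a trivial fusion stub that substitutes zeros for a missing stream (an illustration of the principle only, not the paper's network):

```python
import numpy as np

def fuse_bev(cam_feat, lidar_feat=None):
    """Concatenate camera and lidar BEV features along the channel axis.

    If the lidar stream is absent (malfunction), a zero tensor of the
    same shape is substituted, so the camera branch alone can still
    drive a prediction head downstream.
    """
    cam_feat = np.asarray(cam_feat, float)
    if lidar_feat is None:
        lidar_feat = np.zeros_like(cam_feat)   # simulate lidar malfunction
    return np.concatenate([cam_feat, np.asarray(lidar_feat, float)], axis=-1)
```

The key design choice is that neither branch queries the other, so a failure in one stream degrades rather than disables the output.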

AAAI Conference 2022 Conference Paper

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

  • Xiwen Liang
  • Fengda Zhu
  • Yi Zhu
  • Bingqian Lin
  • Bing Wang
  • Xiaodan Liang

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction. Previous works learn to navigate step-by-step following an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations, harming the robustness and generalizability of the navigation policy. In this paper, we propose a Contrastive Instruction-Trajectory Learning (CITL) framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation. Specifically, we propose: (1) a coarse-grained contrastive learning objective to enhance vision-and-language representations by contrasting semantics of full trajectory observations and instructions, respectively; (2) a fine-grained contrastive learning objective to perceive instructions by leveraging the temporal information of the sub-instructions; (3) a pairwise sample-reweighting mechanism for contrastive learning to mine hard samples and hence mitigate the influence of data sampling bias in contrastive learning. Our CITL can be easily integrated with VLN backbones to form a new learning paradigm and achieve better generalizability in unseen environments. Extensive experiments show that the model with CITL surpasses the previous state-of-the-art methods on R2R, R4R, and RxR. Code is available at https://github.com/liangcici/CITL-VLN.

IJCAI Conference 2022 Conference Paper

Corner Affinity: A Robust Grouping Algorithm to Make Corner-guided Detector Great Again

  • Haoran Wei
  • Chenglong Liu
  • Ping Guo
  • Yangguang Zhu
  • Jiamei Fu
  • Bing Wang
  • Peng Wang

Corner-guided detectors enjoy the potential to yield precise bounding boxes. However, unreliable corner pairs, generated by heuristic grouping guidance, hinder the development of this class of detector. In this paper, we propose a novel corner grouping algorithm, termed Corner Affinity, to significantly boost the reliability and robustness of corner grouping. The proposed Corner Affinity couples two interacting factors: 1) the structure affinity (SA), which generates preliminary corner pairs from the corresponding object's shallow structural information; and 2) the contexts affinity (CA), which refines corner pairs by embedding deeper semantic features of the affiliated instances. Equipped with the Corner Affinity, a detector can produce high-quality bounding boxes upon well-paired corner keypoints. Experimental results show the superiority of our design on multiple benchmark datasets. Specifically, for the CornerNet baseline, the proposed Corner Affinity brings AP gains of 5.8% on COCO, 35.8% on CityPersons, and 17.2% on UCAS-AOD without bells and whistles.

JBHI Journal 2022 Journal Article

Transformer Model for Functional Near-Infrared Spectroscopy Classification

  • Zenghui Wang
  • Jun Zhang
  • Xiaochu Zhang
  • Peng Chen
  • Bing Wang

Functional near-infrared spectroscopy (fNIRS) is a promising neuroimaging technology. The fNIRS classification problem has always been the focus of the brain-computer interface (BCI). Inspired by the success of Transformer based on self-attention mechanism in the fields of natural language processing and computer vision, we propose an fNIRS classification network based on Transformer, named fNIRS-T. We explore the spatial-level and channel-level representation of fNIRS signals to improve data utilization and network representation capacity. Besides, a preprocessing module, which consists of one-dimensional average pooling and layer normalization, is designed to replace filtering and baseline correction of data preprocessing. It makes fNIRS-T an end-to-end network, called fNIRS-PreT. Compared with traditional machine learning classifiers, convolutional neural network (CNN), and long short-term memory (LSTM), the proposed models obtain the best accuracy on three open-access datasets. Specifically, in the most extensive ternary classification task (30 subjects) that includes three types of overt movements, fNIRS-T, CNN, and LSTM obtain 75.49%, 72.89%, and 61.94% on test sets, respectively. Compared to traditional classifiers, fNIRS-T is at least 27.41% higher than statistical features and 6.79% higher than well-designed features. In the individual subject experiment of the ternary classification task, fNIRS-T achieves an average subject accuracy of 78.22% and surpasses CNN and LSTM by a large margin of +4.75% and +11.33%. fNIRS-PreT using raw data also achieves competitive performance to fNIRS-T. Therefore, the proposed models improve the performance of fNIRS-based BCI significantly.
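The preprocessing module the abstract describes (1-D average pooling followed by layer normalization) has a direct numpy rendering. A minimal sketch, with an assumed pooling size of 2 (the paper's hyperparameters are not reproduced here):

```python
import numpy as np

def preprocess(x, pool=2, eps=1e-5):
    """1-D average pooling then layer normalization over a signal.

    Pooling smooths the raw series (standing in for filtering), and
    layer normalization removes its mean and scale (standing in for
    baseline correction), yielding an end-to-end-trainable front end.
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // pool) * pool
    pooled = x[:n].reshape(-1, pool).mean(axis=1)   # 1-D average pooling
    mu, var = pooled.mean(), pooled.var()
    return (pooled - mu) / np.sqrt(var + eps)       # layer normalization
```

The output has (approximately) zero mean and unit variance regardless of the raw signal's baseline drift.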

ICRA Conference 2021 Conference Paper

PSF-LO: Parameterized Semantic Features Based Lidar Odometry

  • Guibin Chen
  • Bosheng Wang
  • Xiaoliang Wang
  • Huanjun Deng
  • Bing Wang
  • Shuo Zhang

Lidar odometry (LO) is a key technology in numerous reliable and accurate localization and mapping systems of autonomous driving. The state-of-the-art LO methods generally leverage geometric information to perform point cloud registration. Furthermore, obtaining the point cloud semantic information describing the environment more abundantly will facilitate the registration. We present a novel semantic lidar odometry method based on self-designed parameterized semantic features (PSFs) to achieve low-drift ego-motion estimation for autonomous vehicles in real time. We first use a convolutional neural network-based algorithm to obtain point-wise semantics from the input laser point cloud, and then use semantic labels to separate road, building, traffic sign and pole-like point clouds and fit them separately to obtain corresponding PSFs. A fast PSF-based matching enables us to refine geometric features (GeFs) registration, thereby reducing the impact of blurred submap surfaces on the accuracy of GeFs matching. Besides, we design an efficient instance-level method to accurately recognize and remove the dynamic objects while retaining static ones in the semantic point cloud, which is beneficial to further improve the accuracy of LO. We evaluate our method, namely PSF-LO, on the public dataset KITTI Odometry Benchmark and rank #1 among semantic lidar methods with an average translational error of 0.82% on the test dataset.

AAAI Conference 2021 Conference Paper

VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization

  • Kaichen Zhou
  • Changhao Chen
  • Bing Wang
  • Muhamad Risqi U. Saputra
  • Niki Trigoni
  • Andrew Markham

Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation, which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works directly adapting the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets and the results prove the efficacy of our model. The source code is available at https://github.com/Zalex97/VMLoc.
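For diagonal Gaussian experts, a Product-of-Experts has a well-known closed form: precisions add, and the fused mean is the precision-weighted average of the expert means. A textbook sketch of that PoE step (not the full VMLoc objective, which additionally uses importance weighting and attention):

```python
import numpy as np

def poe_fuse(mus, vars_):
    """Fuse diagonal Gaussian posteriors from several modality experts.

    mus, vars_: arrays of shape (num_experts, dim). The product of the
    expert Gaussians is itself Gaussian, with precision equal to the sum
    of expert precisions and mean the precision-weighted average.
    """
    mus, vars_ = np.asarray(mus, float), np.asarray(vars_, float)
    prec = 1.0 / vars_
    fused_var = 1.0 / prec.sum(axis=0)
    fused_mu = fused_var * (prec * mus).sum(axis=0)
    return fused_mu, fused_var
```

Note the fused variance is never larger than any single expert's, which is how agreement between modalities sharpens the latent estimate.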

JBHI Journal 2020 Journal Article

A Deep Learning-Based Chemical System for QSAR Prediction

  • ShanShan Hu
  • Peng Chen
  • Pengying Gu
  • Bing Wang

Research on quantitative structure-activity relationships (QSAR) provides an effective approach to determine new hits and promising lead compounds during drug discovery. In the past decades, various works have achieved good performance for QSAR with the development of machine learning. The rise of deep learning, along with massive accessible chemical databases, has further improved QSAR performance. This article proposes a novel deep-learning-based method to implement QSAR prediction by the concatenation of an end-to-end encoder-decoder model and a convolutional neural network (CNN) architecture. The encoder-decoder model is mainly used to generate fixed-size latent features to represent chemical molecules, while these features are then input into the CNN framework to train a robust and stable model and finally to predict active chemicals. Two models with different schemes are investigated to evaluate the validity of our proposed model on the same data sets. Experimental results showed that our proposed method outperforms other state-of-the-art methods in successfully identifying whether a chemical molecule is active.

AAAI Conference 2020 Conference Paper

Accelerating Primal Solution Findings for Mixed Integer Programs Based on Solution Prediction

  • Jian-Ya Ding
  • Chao Zhang
  • Lei Shen
  • Shengyin Li
  • Bing Wang
  • Yinghui Xu
  • Le Song

Mixed Integer Programming (MIP) is one of the most widely used modeling techniques for combinatorial optimization problems. In many applications, a similar MIP model is solved on a regular basis, maintaining remarkable similarities in model structures and solution appearances but differing in formulation coefficients. This offers the opportunity for machine learning methods to explore the correlations between model structures and the resulting solution values. To address this issue, we propose to represent a MIP instance using a tripartite graph, based on which a Graph Convolutional Network (GCN) is constructed to predict solution values for binary variables. The predicted solutions are used to generate a local branching type cut which can be either treated as a global (invalid) inequality in the formulation resulting in a heuristic approach to solve the MIP, or as a root branching rule resulting in an exact approach. Computational evaluations on 8 distinct types of MIP problems show that the proposed framework improves the primal solution finding performance significantly on a state-of-the-art open-source MIP solver.

AAAI Conference 2020 Conference Paper

AtLoc: Attention Guided Camera Localization

  • Bing Wang
  • Changhao Chen
  • Chris Xiaoxuan Lu
  • Peijun Zhao
  • Niki Trigoni
  • Andrew Markham

Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-image) or geometry-constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance on common benchmarks, even when using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is available at https://github.com/BingCS/AtLoc.
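Attention-guided reweighting of a feature map can be sketched as standard non-local self-attention: each spatial position is re-expressed as a softmax-weighted mixture of all positions, so geometrically stable features can dominate. A minimal numpy sketch (illustrative of the mechanism, not the exact AtLoc module):

```python
import numpy as np

def self_attend(feats):
    """Self-attention over a flattened feature map of shape (N, d).

    Scores are scaled dot products between positions; softmax turns each
    row into mixing weights, and the output mixes the original features.
    """
    feats = np.asarray(feats, float)
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # rows sum to 1
    return w @ feats
```

Because each output row is a convex combination of input rows, attention can softly suppress positions dominated by dynamic content.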

EAAI Journal 2019 Journal Article

A MEMCIF-IN method for safety risk assessment in oil and gas industry based on interval numbers and risk attitudes

  • Donghong Tian
  • Chunlan Zhao
  • Bing Wang
  • Meng Zhou

This paper mainly proposes a novel method to construct a risk matrix for assessing safety risks in the oil and gas industry. There are often multiple experts and multiple criteria involved in safety risk assessment problems, and the assessment data are often given in the form of interval numbers. In order to better assess risks, a definition of the interval number with distribution function and utility function is proposed in this paper. The frequency and the consequence of a risk are the only two indicators needed in a risk matrix, and their values are needed in the form of crisp values. So a multi-expert and multi-criterion information fusion based on interval numbers (MEMCIF-IN) model is built in this paper. Firstly, a multi-expert and multi-criterion fusion model is constructed to combine individual interval numbers into a collective interval number and integrate multiple criteria into a comprehensive index. In the fusion model, the weights of assessment experts are calculated based on the objective weights and the subjective weights simultaneously, and the information of individual interval numbers is preserved without information loss in the final result. Secondly, a Continuous Weighted Ordered Weighted Aggregation (C-WOWA) operator is proposed. In the C-WOWA operator, the position weights, which are generated by the utility function, and the importance weights, which are generated by the probability density function, are considered at the same time. The position weights in the C-WOWA operator can correct the impact of experts' risk attitudes, and the importance weights can reflect the importance of the points themselves in an interval number. Finally, a risk matrix is constructed to show which risks are high and which are low. In addition, an application is implemented to show the practicality and rationality of the proposed method.
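The plain ordered weighted averaging (OWA) operator that C-WOWA extends is simple to state: sort the arguments in descending order, then take the dot product with position weights. A minimal sketch (the C-WOWA operator additionally derives its weights from utility and density functions, which is not reproduced here):

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted averaging of a set of crisp assessments.

    Weights attach to positions, not to sources: weights[0] multiplies
    the largest value, weights[1] the second largest, and so on, which
    is what lets position weights encode a risk attitude.
    """
    ordered = np.sort(np.asarray(values, float))[::-1]   # descending
    return float(ordered @ np.asarray(weights, float))
```

Putting more weight on early positions models a pessimistic (risk-averse) attitude; weighting late positions models an optimistic one.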

IS Journal 2014 Journal Article

How Effective Are the Prevailing Attack-Defense Models for Cybersecurity Anyway?

  • Daojing He
  • Sammy Chan
  • Yan Zhang
  • Chunming Wu
  • Bing Wang

Attack-defense models play an important role in the design of cybersecurity systems. Here, the authors review some traditional and prevailing attack-defense models along with their weaknesses. Then, they survey some recently proposed paradigm shifts to these models based on which more effective security strategies can be designed. Further, they provide some suggestions on how to adopt the new models, and present challenges that need to be addressed in this field.