Author name cluster

Bo Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers

1 author row

AAAI Conference 2026 Conference Paper

Dynamic Weight Adaptation in Spiking Neural Networks Inspired by Biological Homeostasis

Yunduo Zhou
Bo Dong
Chang Li
Yuanchen Wang
Xuefeng Yin
Yang Wang
Xin Yang

Homeostatic mechanisms play a crucial role in maintaining optimal functionality within the neural circuits of the brain. By regulating physiological and biochemical processes, these mechanisms ensure the stability of an organism’s internal environment, enabling it to better adapt to external changes. Among these mechanisms, the Bienenstock, Cooper, and Munro (BCM) theory has been extensively studied as a key principle for maintaining the balance of synaptic strengths in biological systems. Despite the extensive development of spiking neural networks (SNNs) as a model for bionic neural networks, no prior work in the machine learning community has integrated biologically plausible BCM formulations into SNNs to provide homeostasis. In this study, we propose a Dynamic Weight Adaptation Mechanism (DWAM) for SNNs, inspired by the BCM theory. DWAM can be integrated into the host SNN, dynamically adjusting network weights in real time to regulate neuronal activity, providing homeostasis to the host SNN without any fine-tuning. We validated our method through dynamic obstacle avoidance and continuous control tasks under both normal and specifically designed degraded conditions. Experimental results demonstrate that DWAM not only enhances the performance of SNNs without existing homeostatic mechanisms under various degraded conditions but also further improves the performance of SNNs that already incorporate homeostatic mechanisms.
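
For readers unfamiliar with the BCM theory mentioned in this abstract, the following is a minimal sketch of the textbook rate-based BCM rule for a single linear neuron. It is not the paper's DWAM. The function name `bcm_step` and the parameters `lr` and `tau` are hypothetical and chosen for illustration only. The weight change is proportional to x * y * (y - theta), and the sliding threshold theta tracks a running average of y squared, which is what gives the rule its homeostatic character.

```python
def bcm_step(w, x, theta, lr=0.01, tau=10.0):
    """One Euler step of a textbook rate-based BCM rule (illustrative only).

    w     : list of synaptic weights
    x     : list of presynaptic activities (same length as w)
    theta : current sliding modification threshold
    lr    : learning rate (hypothetical value)
    tau   : time constant of the threshold average (hypothetical value)
    """
    # Postsynaptic activity of a linear neuron.
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Potentiate when y exceeds theta, depress when y is below it.
    new_w = [wi + lr * xi * y * (y - theta) for wi, xi in zip(w, x)]
    # The threshold slides toward the recent average of y**2.
    new_theta = theta + (y * y - theta) / tau
    return new_w, new_theta, y
```

With activity above the threshold the weights grow, and with activity below it they shrink; because the threshold itself rises with sustained high activity, runaway potentiation is counteracted.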

AAAI Conference 2026 Conference Paper

Enhancing Pre-training Data Detection in LLMs Through Discriminative and Symmetric Prefix Selection

Kai Sun
Yuxin Lin
Bo Dong
Jingyao Zhang
Bin Shi

The rapid development of large language models (LLMs) has relied on access to high-quality, large-scale datasets, yet growing concerns around data privacy and security have spurred substantial research into pre-training data detection. While state-of-the-art (SOTA) methods such as RECALL and CON-RECALL leverage auxiliary prefixes to enhance detection performance, their dependence on individual prefixes introduces notable instability across varying prefix conditions. To address this, we first conduct a theoretical analysis to assess the impact of prefixes on existing prefix-based methods. Building on the analysis, we propose a novel prefix selection method to identify optimal prefixes. Specifically, our method derives two key criteria: Discriminability and Symmetry. These criteria quantify the effectiveness of prefixes in detecting pre-training data, enabling precise selection of high-performing candidate prefixes. Experiments on the WikiMIA dataset demonstrate that our method consistently improves the performance of RECALL and CON-RECALL, achieving gains of up to 21.1% in AUC scores while significantly enhancing robustness.

AAAI Conference 2026 Conference Paper

Scope Delineation Before Localization: A Two-Stage Framework for Enhancing Failure Attribution in Multi-Agent Systems

Kai Sun
Wenqiang Li
Bo Dong
Yuxin Lin
Jingyao Zhang
Bin Shi

Large language models (LLMs) are seeing growing adoption in multi-agent systems. In these systems, efficient failure attribution is critical for ensuring robustness and interpretability. Current LLM-based attribution methods often struggle with lengthy logs and a lack of expert knowledge. Drawing inspiration from human debugging strategies, we propose an automated failure attribution framework, Scope Delineation Before Localization, which operates in two key stages: (1) identifying the failure scope and (2) pinpointing the failure step. By decoupling failure attribution into these two stages, our approach alleviates the reasoning workload of LLMs, enabling more precise failure attribution. To support scope delineation, we further introduce two strategies: Stepwise Scope Delineation and Expertise-Assisted Scope Delineation. Experiments on the Who&When dataset validate the efficacy of our two-stage framework, demonstrating substantial improvements over prior methods (up to 24.27% in step-level accuracy).

AAAI Conference 2026 Conference Paper

View-on-Graph: Zero-Shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs

Yuanyuan Liu
Haiyang Mei
Dongyang Zhan
Jiayue Zhao
Dongsheng Zhou
Bo Dong
Xin Yang

3D visual grounding (3DVG) identifies objects in 3D scenes from language descriptions. Existing zero-shot approaches leverage 2D vision–language models (VLMs) by converting 3D spatial information (SI) into forms amenable to VLM processing, typically as composite inputs such as specified-view renderings or video sequences with overlaid object markers. However, this VLM ⊕ SI paradigm yields entangled visual representations that compel the VLM to process entire cluttered cues, making it hard to exploit spatial–semantic relationships effectively. In this work, we propose a new VLM ⊗ SI paradigm that externalizes the 3D SI into a form enabling the VLM to incrementally retrieve only what it needs during reasoning. We instantiate this paradigm with a novel View-on-Graph (VoG) method, which organizes the scene into a multi-modal, multi-layer scene graph and allows the VLM to operate as an active agent that selectively accesses necessary cues as it traverses the scene. This design offers two intrinsic advantages: (i) by structuring 3D context into a spatially and semantically coherent scene graph rather than confounding the VLM with densely entangled visual inputs, it lowers the VLM's reasoning difficulty; and (ii) by actively exploring and reasoning over the scene graph, it naturally produces transparent, step-by-step traces for interpretable 3DVG. Extensive experiments show that VoG achieves state-of-the-art zero-shot performance, establishing structured scene exploration as a promising strategy for advancing zero-shot 3DVG.

IJCAI Conference 2025 Conference Paper

Federated Multi-view Graph Clustering with Incomplete Attribute Imputation

Wei Feng
Zeyu Bi
Qianqian Wang
Bo Dong

Federated Multi-View Clustering (FedMVC) aims to uncover consistent clustering structures from distributed multi-view data while preserving data privacy. However, existing FedMVC methods under vertical settings either ignore the ubiquitous incomplete view issue or require uploading data features, which may lead to privacy leakage or induce high communication costs. To mitigate the view incompleteness issue and simultaneously maintain privacy and efficiency, we propose a novel Federated Multi-view Graph Clustering with Incomplete Attribute Imputation (FMVC-IAI). This method constructs a consensus graph structure through complementary multi-view data and then utilizes a non-parametric graph neural network (GNN) to impute missing features. Additionally, it utilizes the adjacency graph as the knowledge carrier to share and fuse the multi-view information. To alleviate the high communication cost due to graph sharing, we propose to share the anchor graph for global adjacency graph construction, which reduces communication cost and also helps to reduce privacy leakage risk. Extensive experiments demonstrate the superiority of our method in FedMVC tasks with incomplete views.

NeurIPS Conference 2025 Conference Paper

Fully Autonomous Neuromorphic Navigation and Dynamic Obstacle Avoidance

Xiaochen Shang
Pengwei Luo
Xinning Wang
Jiayue Zhao
Huilin Ge
Bo Dong
Xin Yang

Unmanned aerial vehicles (UAVs) can accurately accomplish complex navigation and obstacle avoidance tasks under external control. However, enabling UAVs to rely solely on onboard computation and sensing for real-time navigation and dynamic obstacle avoidance remains a significant challenge due to stringent latency and energy constraints. Inspired by the efficiency of biological systems, we propose a fully neuromorphic framework achieving end-to-end obstacle avoidance during navigation with an overall latency of just 2.3 milliseconds. Specifically, our bio-inspired approach enables accurate moving object detection and avoidance without requiring target recognition or trajectory computation. Additionally, we introduce the first monocular event-based pose correction dataset with over 50,000 paired and labeled event streams. We validate our system on an autonomous quadrotor using only onboard resources, demonstrating reliable navigation and avoidance of diverse obstacles moving at speeds up to 10 m/s.

AAAI Conference 2025 Conference Paper

Out-of-Distribution Generalization on Graphs via Progressive Inference

Yiming Xu
Bin Shi
Zhen Peng
Huixiang Liu
Bo Dong
Chen Chen

The development and evaluation of graph neural networks (GNNs) generally follow the independent and identically distributed (i.i.d.) assumption. Yet this assumption is often untenable in practice due to the uncontrollable data generation mechanism. In particular, when the data distribution shows a significant shift, most GNNs fail to produce reliable predictions and may even make decisions randomly. One of the most promising solutions to improve model generalization is to pick out the causal invariant parts of the input graph. Nonetheless, we observe a significant distribution gap between the causal parts learned by existing methods and the ground truth, leading to undesirable performance. In response to the above issues, this paper presents GPro, a model that learns graph causal invariance with progressive inference. Specifically, the complicated graph causal invariant learning is decomposed into multiple intermediate inference steps from easy to hard, and the perception of GPro is continuously strengthened through a progressive inference process to extract causal features that are stable to distribution shifts. We also enlarge the training distribution by creating counterfactual samples to enhance the capability of GPro in capturing the causal invariant parts. Extensive experiments demonstrate that our proposed GPro outperforms the state-of-the-art methods by 4.91% on average. For datasets with more severe distribution shifts, the performance improvement can be up to 6.86%.

AAAI Conference 2025 Conference Paper

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

Yiming Xu
Zhen Peng
Bin Shi
Xu Hua
Bo Dong
Song Wang
Chen Chen

The superiority of graph contrastive learning (GCL) has prompted its application to anomaly detection tasks for more powerful risk warning systems. Unfortunately, existing GCL-based models tend to excessively prioritize overall detection performance while neglecting robustness to structural imbalance, which can be problematic for many real-world networks following power-law degree distributions. Particularly, GCL-based methods may fail to capture tail anomalies (abnormal nodes with low degrees). This raises concerns about the security and robustness of current anomaly detection algorithms and therefore hinders their applicability in a variety of realistic high-risk scenarios. To the best of our knowledge, research on the robustness of graph anomaly detection to structural imbalance has received little scrutiny. To address the above issues, this paper presents a novel GCL-based framework named AD-GCL. It devises the neighbor pruning strategy to filter noisy edges for head nodes and facilitate the detection of genuine tail nodes by aligning from head nodes to forged tail nodes. Moreover, AD-GCL actively explores potential neighbors to enlarge the receptive field of tail nodes through anomaly-guided neighbor completion. We further introduce intra- and inter-view consistency loss of the original and augmentation graph for enhanced representation. The performance evaluation of the whole, head, and tail nodes on multiple datasets validates the comprehensive superiority of the proposed AD-GCL in detecting both head anomalies and tail anomalies.

AAAI Conference 2025 Conference Paper

Separating the Wheat from the Chaff: Spatio-Temporal Transformer with View-interweaved Attention for Photon-Efficient Depth Sensing

Letian Yu
Jiaxi Yang
Bo Dong
Qirui Bao
Yuanbo Wang
Felix Heide
Xiaopeng Wei
Xin Yang

Time-resolved imaging is an emerging sensing modality that has been shown to enable advanced applications, including remote sensing, fluorescence lifetime imaging, and even non-line-of-sight sensing. Single-photon avalanche diodes (SPADs) outperform relevant time-resolved imaging technologies thanks to their excellent photon sensitivity and superior temporal resolution on the order of tens of picoseconds. The capability of exceeding the sensing limits of conventional cameras for SPADs also draws attention to the photon-efficient imaging area. However, photon-efficient imaging under degraded conditions with low photon counts and low signal-to-background ratio (SBR) still remains an inevitable challenge. In this paper, we propose a spatio-temporal transformer network for photon-efficient imaging under low-flux scenarios. In particular, we introduce a view-interweaved attention mechanism (VIAM) to extract both spatial-view and temporal-view self-attention in each transformer block. We also design an adaptive-weighting scheme to dynamically adjust the weights between different views of self-attention in VIAM for different signal-to-background levels. We extensively validate and demonstrate the effectiveness of our approach on the simulated Middlebury dataset and a specially self-collected dataset with real-world-captured SPAD measurements and well-annotated ground truth depth maps.

IJCAI Conference 2025 Conference Paper

Tensorial Multi-view Clustering with Deep Anchor Graph Projection

Wei Feng
Dongyuvan Wei
Qianqian Wang
Bo Dong

Multi-view clustering (MVC) has emerged as an important unsupervised multi-view learning method that leverages consistent and complementary information to enhance clustering performance. Recently, tensorized MVC, which processes multi-view data as a tensor to capture their cross-view information, has received considerable attention. However, existing tensorized MVC methods generally overlook deep structures within each view and rely on post-processing to derive clustering results, leading to potential information loss and degraded performance. To address these issues, we develop Tensorial Multi-view Clustering with Deep Anchor Graph Projection (TMVC-DAGP), which performs deep projection on the anchor graph, thus improving model scalability. Besides, we utilize a sparsity regularization to eliminate the redundancy and enforce the projected anchor graph to retain a clear clustering structure. Furthermore, TMVC-DAGP leverages weighted Tensor Schatten $p$-norm to exploit the consistent and complementary information. Extensive experiments on multiple datasets demonstrate TMVC-DAGP's effectiveness and superiority.

NeurIPS Conference 2025 Conference Paper

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Pengwei Liu
Hangjie Yuan
Bo Dong
Jiazheng Xing
Jinwang Wang
Rui Zhao
Weihua Chen
Fan Wang

Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow-matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.

AAAI Conference 2025 Conference Paper

VERO: Verification and Zero-Shot Feedback Acquisition for Few-Shot Multimodal Aspect-Level Sentiment Classification

Kai Sun
Hao Wu
Bin Shi
Samuel Mensah
Peng Liu
Bo Dong

Deep learning approaches for multimodal aspect-level sentiment classification (MALSC) often require extensive data, which is costly and time-consuming to obtain. To mitigate this, current methods typically fine-tune small-scale pretrained models like BERT and BART with few-shot examples. While these models have shown success, Large Vision-Language Models (LVLMs) offer significant advantages due to their greater capacity and ability to understand nuanced language in both zero-shot and few-shot settings. However, there is limited work on fine-tuning LVLMs for MALSC. A major challenge lies in selecting few-shot examples that effectively capture the underlying patterns in data for these LVLMs. To bridge this research gap, we propose an acquisition function designed to select challenging samples for the few-shot learning of LVLMs for MALSC. We compare our approach, Verification and ZERO-shot feedback acquisition (VERO), with diverse acquisition functions for few-shot learning in MALSC. Our experiments show that VERO outperforms prior methods, achieving an F1 score improvement of up to 6.07% on MALSC benchmark datasets.

IJCAI Conference 2024 Conference Paper

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

Yang Wang
Haiyang Mei
Qirui Bao
Ziqi Wei
Mike Zheng Shou
Haizhou Li
Bo Dong
Xin Yang

We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample, coarser temporal cues found in conventional frames for effective emotion recognition. Consequently, our method adeptly interprets both temporal and spatial information from the conventional frame domain, eliminating the need for specialized sensing devices, e.g., event-based cameras. The effectiveness of our approach is thoroughly demonstrated using both existing and our compiled single-eye emotion recognition datasets, achieving unparalleled performance in accuracy and efficiency over existing state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

Efficient Federated Multi-View Clustering with Integrated Matrix Factorization and K-Means

Wei Feng
Zhenwei Wu
Qianqian Wang
Bo Dong
Zhiqiang Tao
Quanxue Gao

Multi-view clustering is a popular unsupervised multi-view learning method. Real-world multi-view data are often distributed across multiple entities, presenting a challenge for performing multi-view clustering. Federated learning provides a solution by enabling multiple entities to collaboratively train a global model. However, existing federated multi-view clustering methods usually conduct feature extraction and clustering in separate steps, potentially leading to a degradation in clustering performance. To address this issue and for the sake of efficiency, we propose a novel Federated Multi-View Clustering method with Integrated Matrix Factorization and K-Means (FMVC-IMK), which integrates matrix factorization and multi-view K-means into one step. Additionally, an adaptive weight is employed to balance the influence of data from each view. FMVC-IMK further incorporates a graph-based regularizer to preserve the original data's geometric structure within the learned global clustering structure. We also develop a federated optimization approach to collaboratively learn a global clustering result without sharing any original data. Experimental results on multiple datasets demonstrate the effectiveness of FMVC-IMK.

IJCAI Conference 2024 Conference Paper

Federated Multi-View Clustering via Tensor Factorization

Wei Feng
Zhenwei Wu
Qianqian Wang
Bo Dong
Zhiqiang Tao
Quanxue Gao

Multi-view clustering is an effective method to process massive unlabeled multi-view data. Since data of different views may be collected and held by different parties, it becomes impractical to train a multi-view clustering model in a centralized way, for the sake of privacy. However, federated multi-view clustering is challenging because multi-view learning has to consider the complementary and consistent information among views distributed across different clients. Moreover, efficiency is highly desirable in federated scenarios. Therefore, we propose a novel federated multi-view clustering method with tensor factorization (TensorFMVC), which is built on K-means and hence is more efficient. Besides, TensorFMVC avoids initializing centroids to address the performance degradation of K-means due to its sensitivity to centroid initialization. A third-order tensor stacked from cluster assignment matrices is introduced to exploit the complementary information and spatial structure of different views. Furthermore, we divide the optimization into several subproblems and develop a federated optimization approach to support cooperative model training. Extensive experiments on several datasets demonstrate that our proposed method exhibits superior performance in federated multi-view clustering.

AAAI Conference 2024 Conference Paper

Partial Multi-View Clustering via Self-Supervised Network

Wei Feng
Guoshuai Sheng
Qianqian Wang
Quanxue Gao
Zhiqiang Tao
Bo Dong

Partial multi-view clustering is a challenging and practical research problem for data analysis in real-world applications, due to the potential missing-data issue in different views. However, most existing methods have not fully explored the correlation information among various incomplete views. In addition, these existing clustering methods often ignore discovering discriminative features inside the data itself in this unsupervised task. To tackle these challenges, we propose Partial Multi-View Clustering via Self-Supervised Network (PVC-SSN) in this paper. Specifically, we employ contrastive learning to obtain a more discriminative and consistent subspace representation, which is guided by a self-supervised module. Self-supervised learning can exploit effective cluster information through the data itself to guide the learning process of clustering tasks. Thus, it can pull together embedding features from the same cluster and push apart those from different clusters. Extensive experiments on several benchmark datasets show that the proposed PVC-SSN method outperforms several state-of-the-art clustering methods.

AAAI Conference 2024 Conference Paper

RR-PU: A Synergistic Two-Stage Positive and Unlabeled Learning Framework for Robust Tax Evasion Detection

Shuzhi Cao
Jianfei Ruan
Bo Dong
Bin Shi
Qinghua Zheng

Tax evasion, an unlawful practice in which taxpayers deliberately conceal information to avoid paying tax liabilities, poses significant challenges for tax authorities. Effective tax evasion detection is critical for assisting tax authorities in mitigating tax revenue loss. Recently, machine-learning-based methods, particularly those employing positive and unlabeled (PU) learning, have been adopted for tax evasion detection, achieving notable success. However, these methods exhibit two major practical limitations. First, their success heavily relies on the strong assumption that the label frequency (the fraction of identified taxpayers among tax evaders) is known in advance. Second, although some methods attempt to estimate label frequency using approaches like Mixture Proportion Estimation (MPE) without making any assumptions, they subsequently construct a classifier based on the error-prone label frequency obtained from the previous estimation. This two-stage approach may not be optimal, as it neglects error accumulation in classifier training resulting from the estimation bias in the first stage. To address these limitations, we propose a novel PU learning-based tax evasion detection framework called RR-PU, which can revise the bias in a two-stage synergistic manner. Specifically, RR-PU refines the label frequency initialization by leveraging a regrouping technique to fortify the MPE perspective. Subsequently, we integrate a trainable slack variable to fine-tune the initial label frequency, concurrently optimizing this variable and the classifier to eliminate latent bias in the initial stage. Experimental results on three real-world tax datasets demonstrate that RR-PU outperforms state-of-the-art methods in tax evasion detection tasks.

AAAI Conference 2024 Conference Paper

The Evidence Contraction Issue in Deep Evidential Regression: Discussion and Solution

Yuefei Wu
Bin Shi
Bo Dong
Qinghua Zheng
Hua Wei

Deep Evidential Regression (DER) places a prior on the original Gaussian likelihood and treats learning as an evidence acquisition process to quantify uncertainty. For the validity of the evidence theory, DER requires specialized activation functions to ensure that the prior parameters remain non-negative. However, such constraints will trigger evidence contraction, causing sub-optimal performance. In this paper, we analyse DER theoretically, revealing the intrinsic limitations for sub-optimal performance: the non-negativity constraints on the Normal Inverse-Gamma (NIG) prior parameter trigger the evidence contraction under the specialized activation function, which hinders the optimization of DER performance. On this basis, we design a Non-saturating Uncertainty Regularization term, which effectively ensures that the performance is further optimized in the right direction. Experiments on real-world datasets show that our proposed approach improves the performance of DER while maintaining the ability to quantify uncertainty.

IJCAI Conference 2023 Conference Paper

Dichotomous Image Segmentation with Frequency Priors

Yan Zhou
Bo Dong
Yuanfeng Wu
Wentao Zhu
Geng Chen
Yanning Zhang

Dichotomous image segmentation (DIS) has a wide range of real-world applications and has gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative frequency priors. Our model, called FP-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency prior generator to jointly utilize a fixed filter and learnable filters to extract informative frequency priors. Before embedding the frequency priors into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency prior embedding module to embed the frequency priors into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FP-DIS outperforms state-of-the-art methods by a large margin in terms of key evaluation metrics.

AAAI Conference 2023 Conference Paper

Head-Free Lightweight Semantic Segmentation with Linear Transformer

Bo Dong
Pichao Wang
Fan Wang

Existing semantic segmentation works have mainly focused on designing effective decoders; however, the computational load introduced by the overall structure has long been ignored, which hinders their application on resource-constrained hardware. In this paper, we propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer (AFFormer). AFFormer adopts a parallel architecture to leverage prototype representations as specific learnable local descriptions, which replace the decoder and preserve the rich image semantics on high-resolution features. Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources. Therefore, we employ heterogeneous operators (CNN and vision Transformer) for pixel embedding and prototype representations to further save computational costs. Moreover, it is very difficult to linearize the complexity of the vision Transformer from the perspective of the spatial domain. Because semantic segmentation is very sensitive to frequency information, we construct a lightweight prototype learning block with an adaptive frequency filter of complexity O(n) to replace standard self-attention with complexity O(n^2). Extensive experiments on widely adopted datasets demonstrate that AFFormer achieves superior accuracy while retaining only 3M parameters. On the ADE20K dataset, AFFormer achieves 41.8 mIoU and 4.6 GFLOPs, which is 4.4 mIoU higher than Segformer, with 45% fewer GFLOPs. On the Cityscapes dataset, AFFormer achieves 78.7 mIoU and 34.4 GFLOPs, which is 2.5 mIoU higher than Segformer with 72.5% fewer GFLOPs. Code is available at https://github.com/dongbo811/AFFormer.

IJCAI Conference 2023 Conference Paper

NerCo: A Contrastive Learning Based Two-Stage Chinese NER Method

Zai Zhang
Bin Shi
Haokun Zhang
Huang Xu
Yaodong Zhang
Yuefei Wu
Bo Dong
Qinghua Zheng

Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition (NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in the target representation space, which could negatively affect the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. We then present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo first guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart those of different classes. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance.


NeurIPS Conference 2022 Conference Paper

Biologically Inspired Dynamic Thresholds for Spiking Neural Networks

Jianchuan Ding
Bo Dong
Felix Heide
Yufei Ding
Yunduo Zhou
Baocai Yin
Xin Yang

The dynamic membrane potential threshold, as one of the essential properties of a biological neuron, is a spontaneous regulation mechanism that maintains neuronal homeostasis, i.e., the constant overall spiking firing rate of a neuron. As such, the neuron firing rate is regulated by a dynamic spiking threshold, which has been extensively studied in biology. Existing work in the machine learning community does not employ bioinspired spiking threshold schemes. This work aims at bridging this gap by introducing a novel bioinspired dynamic energy-temporal threshold (BDETT) scheme for spiking neural networks (SNNs). The proposed BDETT scheme mirrors two bioplausible observations: a dynamic threshold has 1) a positive correlation with the average membrane potential and 2) a negative correlation with the preceding rate of depolarization. We validate the effectiveness of the proposed BDETT on robot obstacle avoidance and continuous control tasks under both normal conditions and various degraded conditions, including noisy observations, weights, and dynamic environments. We find that the BDETT outperforms existing static and heuristic threshold approaches by significant margins in all tested conditions, and we confirm that the proposed bioinspired dynamic threshold scheme offers homeostasis to SNNs in complex real-world tasks.


AAAI Conference 2022 Conference Paper

Regularized Modal Regression on Markov-Dependent Observations: A Theoretical Assessment

Tieliang Gong
Yuxin Dong
Hong Chen
Wei Feng
Bo Dong
Chen Li

Modal regression, a widely used regression protocol, has been extensively investigated in the statistical and machine learning communities due to its robustness to outliers and heavy-tailed noise. Understanding modal regression’s theoretical behavior can be fundamental in learning theory. Despite significant progress in characterizing its statistical properties, the majority of the results are based on the assumption that samples are independent and identically distributed (i.i.d.), which is too restrictive for real-world applications. This paper concerns the statistical properties of regularized modal regression (RMR) within an important dependence structure: Markov dependence. Specifically, we establish an upper bound for the RMR estimator under moderate conditions and give an explicit learning rate. Our results show that Markov dependence impacts the generalization error in that the sample size is discounted by a multiplicative factor depending on the spectral gap of the underlying Markov chain. This result sheds new light on characterizing the theoretical underpinnings of robust regression.


AAAI Conference 2020 Short Paper

Few Sample Learning without Data Storage for Lifelong Stream Mining (Student Abstract)

Zhuoyi Wang
Yigong Wang
Yu Lin
Bo Dong
Hemeng Tao
Latifur Khan

Continuously mining complex data streams has recently been attracting an increasing amount of attention, due to the rapid growth of real-world vision/signal applications such as self-driving cars and online social media messages. In this paper, we aim to address two significant problems in the lifelong/incremental stream mining scenario: first, how to make learning algorithms generalize to unseen classes from only a few labeled samples; second, whether it is possible to avoid storing instances from previously seen classes while still solving the catastrophic forgetting problem. We introduce a novel stream mining framework to classify an infinite stream of data with different categories that occur at different times. We apply a few-sample learning strategy to make the model recognize novel classes with limited samples; at the same time, we implement an incremental generative model to maintain old knowledge when learning newly arriving categories, while also avoiding violations of data privacy and memory restrictions. We evaluate our approach in the continual class-incremental setup on classification tasks and ensure sufficient model capacity to accommodate learning the new incoming categories.


JBHI Journal 2020 Journal Article

Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS—The Heart Sounds Shenzhen Corpus

Fengquan Dong
Kun Qian
Zhao Ren
Alice Baird
Xinjian Li
Zhenyu Dai
Bo Dong
Florian Metze

Auscultation of the heart is a widely studied technique, which requires precise hearing from practitioners as a means of distinguishing subtle differences in heart-beat rhythm. This technique is popular due to its non-invasive nature, and can be an early diagnosis aid for a range of cardiac conditions. Machine listening approaches can support this process, monitoring continuously and allowing for a representation of both mild and chronic heart conditions. Despite this potential, relevant databases and benchmark studies are scarce. In this paper, we introduce our publicly accessible database, the Heart Sounds Shenzhen Corpus (HSS), which was first released during the recent INTERSPEECH 2018 ComParE Heart Sound sub-challenge. Additionally, we provide a survey of machine learning work in the area of heart sound recognition, as well as a benchmark for HSS utilising standard acoustic features and machine learning models. At best, our support vector machine with Log Mel features achieves 49.7% unweighted average recall on a three-category task (normal, mild, moderate/severe).
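
For readers unfamiliar with the metric, unweighted average recall (UAR) is the mean of the per-class recalls, so each class counts equally no matter how many samples it has. The sketch below is a minimal illustration, not the benchmark code from the paper; the function name and the toy labels are assumptions made for the example.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: a classifier that always answers "mild".
y_true = ["normal"] * 2 + ["mild"] * 6 + ["severe"] * 2
y_pred = ["mild"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f} UAR={uar:.3f}")
```

On this toy input the always-"mild" classifier reaches 60% accuracy but only one third UAR, which is the chance level for three classes. That is the baseline against which a three-class UAR figure is usually read.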


IJCAI Conference 2019 Conference Paper

ATTENet: Detecting and Explaining Suspicious Tax Evasion Groups

Qinghua Zheng
Yating Lin
Huan He
Jianfei Ruan
Bo Dong

In this demonstration, we present ATTENet, a novel visual analytic system for detecting and explaining suspicious affiliated-transaction-based tax evasion (ATTE) groups. First, the system constructs a taxpayer interest interacted network, which contains economic behaviors and social relationships between taxpayers. Then, the system combines the basic features and structure features of each group in the network using the network embedding method structure2Vec, and detects suspicious ATTE groups with a random forest algorithm. Last, to explore and explain the detection results, the system provides an ATTENet visualization with three coordinated views and interactive tools. To verify the usefulness of our system, we demonstrate ATTENet on a non-confidential dataset that contains two years of real tax data obtained from our cooperating tax authorities.


AAAI Conference 2019 Conference Paper

Multistream Classification with Relative Density Ratio Estimation

Bo Dong
Yang Gao
Swarup Chandra
Latifur Khan

In supervised learning, the availability of sufficient labeled data is of prime importance. Unfortunately, such data are only sparsely available in many real-world applications. Particularly when performing classification over a non-stationary data stream, the unavailability of sufficient labeled data undermines the classifier’s long-term performance by limiting its adaptability to changes in data distribution over time. Recently, studies in such settings have appealed to transfer learning techniques over a data stream while detecting drifts in data distribution over time. Here, the data stream is represented by two independent non-stationary streams, one containing labeled data instances (called the source stream) having a biased distribution compared to the unlabeled data instances (called the target stream). The task of label prediction under this representation is called Multistream Classification, where instances in the two streams occur independently. While these studies have addressed various challenges in the multistream setting, they still suffer from large computational overhead, mainly due to the frequent bias correction and drift adaptation methods employed. In this paper, we focus on utilizing an alternative bias correction technique, called relative density-ratio estimation, which is known to be computationally faster. Importantly, we propose a novel mechanism to automatically learn an appropriate mixture of relative density that adapts to changes in the multistream setting over time. We theoretically study its properties and empirically demonstrate its superior performance, within a multistream framework called MSCRDR, on benchmark datasets by comparing with other competing methods.
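
The abstract does not state the formula, so the sketch below assumes the definition commonly used in the relative density-ratio literature: r_alpha(x) = p_T(x) / (alpha * p_T(x) + (1 - alpha) * p_S(x)), where p_T and p_S are the target and source densities and alpha is the mixture parameter. The Gaussian densities and all names are hypothetical choices for illustration. In practice the densities are unknown and the ratio is estimated from samples; the paper's mechanism for learning the mixture automatically is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_density_ratio(p_target, p_source, alpha):
    """Assumed standard definition: p_T / (alpha * p_T + (1 - alpha) * p_S).

    alpha = 0 gives the plain ratio p_T / p_S; for alpha > 0 the value
    can never exceed 1 / alpha.
    """
    return p_target / (alpha * p_target + (1.0 - alpha) * p_source)


# Hypothetical toy streams: source centred at 0, target centred at 3.
x = 4.0
p_t = gaussian_pdf(x, mean=3.0, std=1.0)
p_s = gaussian_pdf(x, mean=0.0, std=1.0)

plain = relative_density_ratio(p_t, p_s, alpha=0.0)     # unbounded in general
relative = relative_density_ratio(p_t, p_s, alpha=0.5)  # at most 1 / 0.5 = 2
print(f"plain ratio={plain:.1f} relative ratio={relative:.3f}")
```

The bound of 1 / alpha is the usual motivation for the relative form: where the source density is tiny, the plain ratio yields very large importance weights, while the relative ratio stays bounded.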
