Arrow Research search

Author name cluster

Bo Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

78 papers
2 author rows

Possible papers

78

AAAI Conference 2026 Conference Paper

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

  • Yexing Du
  • Kaiyuan Liu
  • Youcheng Pan
  • Zheng Chu
  • Bo Yang
  • Xiaocheng Feng
  • Ming Liu
  • Yang Xiang

As Large Language Models (LLMs) are increasingly popularized in the multilingual world, ensuring hallucination-free factuality becomes markedly crucial. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) predominantly focus on textual or visual modalities with a primary emphasis on English, which creates a gap in evaluation when processing multilingual input, especially in speech. To bridge this gap, we propose a novel Cross-lingual and Cross-modal Factuality benchmark (CCFQA). Specifically, the CCFQA benchmark contains parallel speech-text factual questions across 8 languages, designed to systematically evaluate MLLMs' cross-lingual and cross-modal factuality capabilities. Our experimental results demonstrate that current MLLMs still face substantial challenges on the CCFQA benchmark. Furthermore, we propose a few-shot transfer learning strategy that effectively transfers the Question Answering (QA) capabilities of LLMs in English to multilingual Spoken Question Answering (SQA) tasks, achieving competitive performance with GPT-4o-mini-Audio using just 5-shot training. We release CCFQA as a foundational research resource to promote the development of MLLMs with more robust and reliable speech understanding capabilities.

AAAI Conference 2026 Conference Paper

MISF: MLLM Guided Iterative Sample Filtering for Data Fault Detection

  • Guoying Chen
  • Ruizhuo Zhao
  • Zhewei Xu
  • Bo Yang
  • Kunlong Wang

High-quality datasets are critical for training reliable machine learning models, yet data faults caused by insufficient annotation expertise or malicious poisoning attacks remain prevalent. Traditional classifier-based methods rely on manually curated subsets for fault detection, but their limited scale frequently leads to model overfitting. While methods based on multimodal large language models (MLLMs) offer promising detection capabilities, their few-shot learning limitations hinder generalization in domain-specific tasks. To address these challenges, we propose MLLM Guided Iterative Sample Filtering (MISF), a novel framework that combines the strengths of MLLM-based initialization and iterative data refinement. Our framework initializes the detection model with MLLM-generated synthetic images and a curated clean subset, then iteratively refines it by progressively selecting high-certainty clean samples, improving both domain adaptation and detection accuracy. Extensive experiments on the RESISC45 and Oxford-IIIT Pets datasets demonstrate that MISF effectively identifies data faults, outperforming existing approaches. MISF provides a robust, scalable solution for improving dataset quality in specialized domains.

AAAI Conference 2026 Conference Paper

Single-Stage fMRI-to-3D Reconstruction via Viewpoint-Aware Embedding and Hierarchical Guidance

  • Xun Zhang
  • Weihao Xia
  • Yulong Liu
  • Bo Yang
  • Alessandro Bozzon
  • Pan Wang

Understanding the neural basis of three-dimensional (3D) perception is a fundamental objective in cognitive neuroscience. Despite advances in decoding 2D visual stimuli from neural data, reconstructing high-fidelity 3D objects with detailed texture and geometry remains largely unexplored. In this work, we introduce NeuroSculptor3D, the first single-stage, end-to-end framework for reconstructing textured 3D shapes directly from brain activity. NeuroSculptor3D integrates a viewpoint-aware brain embedding module that captures fine-grained spatial variations across visual perspectives, and a hierarchical guidance mechanism that aligns brain-derived features with perceptual, semantic, and structural priors. Together, these components facilitate the generation of consistent multi-view embeddings, which are then decoded via TRELLIS to produce high-quality textured 3D reconstructions. Experiments on the fMRI-Shape dataset demonstrate that NeuroSculptor3D outperforms existing baselines across multiple settings, achieving significant improvements in both structural accuracy and semantic consistency. Code will be released to facilitate further research.

AAAI Conference 2026 Conference Paper

SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts

  • Jiaqi Liu
  • Ronghao Fu
  • Lang Sun
  • Haoran Liu
  • Xiao Yang
  • Weipeng Zhang
  • Xu Na
  • Zhuoran Duan

The emergence of large vision-language models (VLMs) has significantly enhanced the efficiency and flexibility of geospatial interpretation. However, general-purpose VLMs remain suboptimal for remote sensing (RS) tasks. Existing geospatial VLMs typically adopt a unified modeling strategy and struggle to differentiate between task types and interpretation granularities, limiting their ability to balance local detail perception and global contextual understanding. In this paper, we present SkyMoE, a Mixture-of-Experts (MoE) vision-language model tailored for multimodal, multi-task RS interpretation. SkyMoE employs an adaptive router that generates task- and granularity-aware routing instructions, enabling specialized large language model experts to handle diverse sub-tasks. To further promote expert decoupling and granularity sensitivity, we introduce a context-disentangled augmentation strategy that creates contrastive pairs between local and global features, guiding experts toward level-specific representation learning. We also construct MGRS-Bench, a comprehensive benchmark covering multiple RS interpretation tasks and granularity levels, to evaluate generalization in complex scenarios. Extensive experiments on 21 public datasets demonstrate that SkyMoE achieves state-of-the-art performance across tasks, validating its adaptability, scalability, and superior multi-granularity understanding in remote sensing.

ICRA Conference 2025 Conference Paper

Brain-Inspired Spatial Continuous State Encoding for Efficient Spiking-Based Navigation

  • Qingao Chai
  • Jiashuo Wang
  • Runhao Jiang
  • Bo Yang
  • Rui Yan
  • Huajin Tang

Spiking neural networks (SNNs) show great potential in mapless navigation tasks due to their low power consumption, but the continuous representation of spatial information poses a challenge to SNN training. Neuroscience findings reveal that spatial cognition cells encode spatial information through population spike patterns. Inspired by this, we propose a navigation method based on SNNs, leveraging spatial cognition cells, which include grid cells (GCs), head direction cells (HDCs), and boundary vector cells (BVCs). Our method integrates spike-based information to achieve precise navigation goal encoding and egocentric environment perception, significantly improving SNN navigation capabilities in complex environments. Simulation and real-world experiments demonstrate that our method achieves significant improvements in navigation success rate and energy efficiency, showcasing superior adaptability across environments. Our work provides a novel approach to developing efficient brain-inspired navigation systems.

IJCAI Conference 2025 Conference Paper

Distilling A Universal Expert from Clustered Federated Learning

  • Zeqi Leng
  • Chunxu Zhang
  • Guodong Long
  • Riting Xia
  • Bo Yang

Clustered Federated Learning (CFL) addresses the challenges posed by non-IID data by training multiple group- or cluster-specific expert models. However, existing methods often overlook the shared information across clusters, which represents the generalizable knowledge valuable to all participants in the Federated Learning (FL) system. To overcome this limitation, this paper introduces a novel FL framework that distills a universal expert model from the knowledge of multiple clusters. This universal expert captures globally shared information across all clients and is subsequently distributed to each client as the initialization for the next round of model training. The proposed FL framework operates in three iterative steps: (1) local model training at each client, (2) cluster-specific model aggregation, and (3) universal expert distillation. This three-step learning paradigm ensures the preservation of fine-grained non-IID characteristics while effectively incorporating shared knowledge across clusters. Compared to traditional gradient-based aggregation methods, the distillation-based model aggregation introduces greater flexibility in handling model heterogeneity and reduces conflicts among cluster-specific experts. Extensive experimental results demonstrate the superior performance of the proposed method across various scenarios, highlighting its potential to advance the state of CFL by balancing personalized and shared knowledge more effectively.

IROS Conference 2025 Conference Paper

Heterogeneous Graph Network-Based UWB Localization for Complex Indoor Environments

  • Bo Yang
  • Luyang Li
  • Sizhen He
  • Weinan Chen
  • Hong Zhang

Accurate indoor location-based services are important for mobile robots, especially in complex indoor environments. In this paper, we propose a heterogeneous graph network-based ultra-wideband (UWB) localization method to provide accurate and robust localization results for mobile robots in complex indoor scenarios. The core of our approach lies in constructing the anchors, ranging measurements, and tags into a heterogeneous graph structure according to the topological structure of the UWB localization system, and then designing a spatial-temporal heterogeneous graph attention neural network to extract high-level features and estimate the tag locations from the graph. In this way, the geometric relationships contained in the UWB localization system are comprehensively established, while the spatial and temporal information contained in the ranging measurements can also be extracted. We validate the proposed method through real-world experiments. The results demonstrate that, compared to existing deep learning-based methods, the constructed heterogeneous graph better represents the geometric structure of the UWB localization system, and the designed heterogeneous graph neural network effectively extracts the spatial-temporal and geometric features. Consequently, the accuracy and robustness of UWB localization are significantly improved.

ICRA Conference 2025 Conference Paper

HSRL: A Hierarchical Control System Based on Spiking Deep Reinforcement Learning for Robot Navigation

  • Bo Yang
  • Shibo Zhou
  • Chaohui Lin
  • Qingao Chai
  • Rui Yan 0005
  • De Ma
  • Gang Pan 0001
  • Huajin Tang

Reinforcement Learning (RL) has shown promise in robotic navigation tasks, yet applying it to real-world environments remains challenging due to dynamic complexities and the need for dynamically feasible actions. We propose a hierarchical control framework based on Spiking Deep Reinforcement Learning (SDRL) for robust robot navigation in real environments. Our approach utilizes a two-layer architecture: a high-level decision layer powered by a Spiking GRU network for handling partially observable environments, and a low-level executive layer employing Continuous Attractor Neural Networks (CANNs) to ensure precise and continuous actions. This hierarchical structure allows real-time decision-making that respects the physical constraints of the robot. Experimental results show that our method adapts effectively to new environments without fine-tuning and surpasses existing methods in performance. We also explore the implementation on the Darwin3 chip, paving the way for biologically inspired motion control in future robotic applications.

AAAI Conference 2025 Conference Paper

Improving Integrated Gradient-based Transferable Adversarial Examples by Refining the Integration Path

  • Yuchen Ren
  • Zhengyu Zhao
  • Chenhao Lin
  • Bo Yang
  • Lu Zhou
  • Zhe Liu
  • Chao Shen

Transferable adversarial examples are known to cause threats in practical, black-box attack scenarios. A notable approach to improving transferability is using integrated gradients (IG), originally developed for model interpretability. In this paper, we find that existing IG-based attacks have limited transferability due to their naive adoption of IG in model interpretability. To address this limitation, we focus on the IG integration path and refine it in three aspects: multiplicity, monotonicity, and diversity, supported by theoretical analyses. We propose the Multiple Monotonic Diversified Integrated Gradients (MuMoDIG) attack, which can generate highly transferable adversarial examples on different CNN and ViT models and defenses. Experiments validate that MuMoDIG outperforms the latest IG-based attack by up to 37.3% and other state-of-the-art attacks by 8.4%. In general, our study reveals that migrating established techniques to improve transferability may require non-trivial efforts.

AAAI Conference 2025 Conference Paper

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang
  • Yang Liu
  • Bo Yang

Multifaceted user modeling aims to uncover fine-grained patterns and learn representations from user data, revealing their diverse interests and characteristics, such as profile, preference, and personality. Recent studies on foundation model-based recommendation have emphasized the Transformer architecture's remarkable ability to capture complex, non-linear user-item interaction relationships. This paper aims to advance foundation model-based recommender systems by introducing enhancements to multifaceted user modeling capabilities. We propose a novel Transformer layer designed specifically for recommendation, using the self-attention mechanism to capture sequential user-item interaction patterns. Specifically, we design a group gating network to identify user groups, enabling hierarchical discovery across different layers, thereby capturing the multifaceted nature of user interests through multiple Transformer layers. Furthermore, to broaden the data scope and further enhance multifaceted user modeling, we extend the framework to a federated setting, enabling the use of private datasets while ensuring privacy. Experimental validations on benchmark datasets demonstrate the superior performance of our proposed method.

JAIR Journal 2025 Journal Article

Rumor Detection with Adaptive Data Augmentation and Adversarial Training

  • Ying Wang
  • Fuyuan Ma
  • Zhaoqi Yang
  • Yaodi Zhu
  • Bo Yang
  • Pengfei Shen
  • Lei Yun

Rumors are widely spread on social media, which has a negative impact on social stability. To address this problem, many rumor detection methods have been proposed. However, most existing methods overlook the potential impact of noise and adversarial attacks on their detection performance, which could compromise their effectiveness when applied in an unknown environment. To overcome these challenges and improve the framework's robustness to noise and adversarial attacks, we propose a novel rumor detection framework with Adaptive Data Augmentation and Adversarial Training, named ADAAT. Our framework utilizes the adaptive data augmentation module to calculate the importance of edges and features and adaptively modify the less important ones with greater probability. In addition, it contains a hard-sample generation module that generates adversarial representations through adversarial training. These adversarial representations are treated as hard samples and utilized in contrastive learning to learn essential features, thereby improving the robustness of the framework. Our framework demonstrates its superiority on rumor detection tasks, increasing accuracy by an average of 3.6%, 4.5% and 2.5% over the state-of-the-art methods on Twitter15, Twitter16 and PHEME, respectively. When the ADAAT framework is applied to attacked test data, the detection accuracy decreases by only 1.3%, 1.4%, and 1.2%. This paper appears in the AI & Society Track.

IJCAI Conference 2025 Conference Paper

State Feedback Enhanced Graph Differential Equations for Multivariate Time Series Forecasting

  • Jiaxu Cui
  • Qipeng Wang
  • Yiming Zhao
  • Bingyi Sun
  • Pengfei Wang
  • Bo Yang

Multivariate time series forecasting holds significant theoretical and practical importance in various fields, including web analytics and transportation. Recently, graph neural networks and graph differential equations have shown exceptional capabilities in modeling spatio-temporal features. However, existing methods often suffer from over-smoothing, hindering real-world problem-solving. In this work, we analyze the graph propagation process as a dynamical system and propose a novel feedback mechanism to enhance representation power, adaptively adjusting the representations to align with desired performance outcomes, thereby fundamentally mitigating the issue of over-smoothing. Moreover, we introduce an effective multivariate time series forecasting model called SF-GDE, based on the proposed graph propagation with the feedback mechanism. Intensive experiments are conducted on three real-world datasets from diverse fields. Results show that SF-GDE outperforms the state of the art, and the feedback mechanism can serve as a universal booster to improve performance for graph propagation models.

NeurIPS Conference 2025 Conference Paper

TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting

  • Mingyuan Xia
  • Chunxu Zhang
  • Zijian Zhang
  • Hao Miao
  • Qidong Liu
  • Yuanshao Zhu
  • Bo Yang

Temporal non-stationarity, the phenomenon that time series distributions change over time, poses fundamental challenges to reliable time series forecasting. Intuitively, a complex time series can be decomposed into two factors, i.e., time-invariant and time-varying components, which indicate static and dynamic patterns, respectively. Nonetheless, existing methods often conflate the time-varying and time-invariant components, and jointly learn the combined long-term patterns and short-term fluctuations, leading to suboptimal performance in the face of distribution shifts. To address this issue, we propose a lightweight static-dynamic decomposition framework, TimeEmb, for time series forecasting. TimeEmb innovatively separates time series into two complementary components: (1) a time-invariant component, captured by a novel global embedding module that learns persistent representations across time series, and (2) a time-varying component, processed by an efficient frequency-domain filtering mechanism inspired by full-spectrum analysis in signal processing. Experiments on real-world datasets demonstrate that TimeEmb outperforms state-of-the-art baselines and requires fewer computational resources. We conduct comprehensive quantitative and qualitative analyses to verify the efficacy of static-dynamic disentanglement. This lightweight framework can also improve existing time-series forecasting methods with simple integration. To ease reproducibility, our code is available at https://github.com/showmeon/TimeEmb.

IJCAI Conference 2025 Conference Paper

Towards Generalizable Neural Simulators: Addressing Distribution Shifts Induced by Environmental and Temporal Variations

  • Jiaqi Liu
  • Jiaxu Cui
  • Shiang Sun
  • Yizhu Zhao
  • Bo Yang

With advancements in deep learning, neural simulators have become increasingly important for improving the efficiency and effectiveness of simulating complex dynamical systems in various scientific and technological fields. This paper presents a novel neural simulator called Context-informed Polymorphic Neural ODE Processes (CoPoNDP), aimed at addressing the challenges of modeling dynamical systems encountering concurrent environmental and temporal distribution shifts, which are common in real-world scenarios. CoPoNDP employs a context-driven neural stochastic process governed by a combination of basic differential equations in a time-sensitive manner to adaptively modulate the evolution of system states. This allows for flexible adaptation to changing temporal dynamics and generalization across different environments. Extensive experiments conducted on dynamical systems from ecology, chemistry, physics, and energy demonstrate that by effectively utilizing contextual information, CoPoNDP outperforms the state-of-the-art models in handling joint distribution shifts. It also shows robustness in sparse and noisy settings, making it a promising approach for modeling dynamical systems in complex real-world applications.

IROS Conference 2024 Conference Paper

CASRL: Collision Avoidance with Spiking Reinforcement Learning Among Dynamic, Decision-Making Agents

  • Chengjun Zhang
  • Ka-Wa Yip
  • Bo Yang
  • Zhiyong Zhang
  • Mengwen Yuan
  • Rui Yan 0005
  • Huajin Tang

Developing an efficient collision avoidance policy with Spiking Reinforcement Learning for dynamic, decision-making agents remains challenging. Moreover, the implementation of energy-efficient collision avoidance is important for mobile robots that operate with limited on-board computing resources. Most existing energy-efficient methods via spiking reinforcement learning are predominantly concerned with the navigational capabilities of a single agent and are unable to handle a large and possibly varying number of agents. To overcome these limitations, we propose a model called collision avoidance with spiking reinforcement learning (CASRL), based on proximal policy optimization algorithms. This proposed model consists of an actor with spiking neural networks (SNNs) and a critic with deep neural networks (DNNs). Our spiking reinforcement learning algorithm can handle an arbitrary number of other agents by virtue of a spiking-gated transformer (SpikeGTr) architecture and an accumulate-to-fire (ATF) module. Extensive experimental results demonstrate that CASRL obtains a competitive navigation success rate and exhibits higher time-efficiency in crowded scenarios compared to traditional DNN-based methods.

NeurIPS Conference 2024 Conference Paper

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

  • Sili Huang
  • Jifeng Hu
  • Zhejian Yang
  • Liwei Yang
  • Tao Luo
  • Hechang Chen
  • Lichao Sun
  • Bo Yang

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficient ability to process long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of Decision Transformer (DT). Then, we propose a Decision Mamba-Hybrid (DM-H) with the merits of transformers and Mamba in high-quality prediction and long-term memory. Specifically, DM-H first generates high-value sub-goals from long-term memory through the Mamba model. Then, we use sub-goals to prompt the transformer, establishing high-quality predictions. Experimental results demonstrate that DM-H achieves state-of-the-art performance in long- and short-term tasks, such as the D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, the online testing of DM-H in the long-term task is 28$\times$ faster than the transformer-based baselines.

IJCAI Conference 2024 Conference Paper

Federated Adaptation for Foundation Model-based Recommendations

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Xiao Fang
  • Yang Song
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang

With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models to recommendation has become a new paradigm for improving existing recommendation systems. It remains an open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while preserving privacy. This paper proposes a novel federated adaptation mechanism to enhance foundation model-based recommendation systems in a privacy-preserving manner. Specifically, each client learns a lightweight personalized adapter using its private data. The adapter then collaborates with pre-trained foundation models to provide recommendation services efficiently and in a fine-grained manner. Importantly, users' private behavioral data remains secure as it is not shared with the server. This data localization-based privacy preservation is embodied via the federated learning framework. The model ensures that shared knowledge is incorporated into all adapters while simultaneously preserving each user's personal preferences. Experimental results on four benchmark datasets demonstrate our method's superior performance. The code is available.

AAAI Conference 2024 Conference Paper

Fine-Grained Prototypes Distillation for Few-Shot Object Detection

  • Zichen Wang
  • Bo Yang
  • Haonan Yue
  • Zhenghao Ma

Few-shot object detection (FSOD) aims at extending a generic detector for novel object detection with only a few training examples. It has attracted considerable attention recently due to its practical significance. Meta-learning has been demonstrated to be an effective paradigm for this task. In general, methods based on meta-learning employ an additional support branch to encode novel examples (a.k.a. support images) into class prototypes, which are then fused with the query branch to facilitate the model prediction. However, the class-level prototypes are difficult to precisely generate, and they also lack detailed information, leading to instability in performance. New methods are required to capture the distinctive local context for more robust novel object detection. To this end, we propose to distill the most representative support features into fine-grained prototypes. These prototypes are then assigned into query feature maps based on the matching results, modeling the detailed feature relations between two branches. This process is realized by our Fine-Grained Feature Aggregation (FFA) module. Moreover, in terms of high-level feature fusion, we propose a Balanced Class-Agnostic Sampling (B-CAS) strategy and a Non-Linear Fusion (NLF) module from different perspectives. They are complementary to each other and depict the high-level feature relations more effectively. Extensive experiments on PASCAL VOC and MS COCO benchmarks show that our method sets a new state-of-the-art performance in most settings. Our code is available at https://github.com/wangchen1801/FPD.

IJCAI Conference 2024 Conference Paper

Stochastic Neural Simulator for Generalizing Dynamical Systems across Environments

  • Liu Jiaqi
  • Jiaxu Cui
  • Jiayi Yang
  • Bo Yang

Neural simulators for modeling complex dynamical systems have been extensively studied for various real-world applications, such as weather forecasting, ocean current prediction, and computational fluid dynamics simulation. Although they have demonstrated powerful fitting and prediction capabilities, most existing models are built to learn only single-system dynamics. Several recent studies have considered learning dynamics across environments, which can exploit the potential commonalities among the dynamics across environments and adapt to new environments. However, these methods are still prone to data-scarcity problems, where per-environment data is sparse or limited. Therefore, we propose a novel CoNDP (Context-Informed Neural ODE Processes) to learn system dynamics from sparse observations across environments. It can fully use the contextual information of each environment to better capture the intrinsic commonalities across environments and the distinguishable differences among them while modeling the uncertainty of system evolution, producing more accurate predictions. Intensive experiments are conducted on five complex dynamical systems in various fields. Results show that the proposed CoNDP achieves optimal results compared with common neural simulators and state-of-the-art cross-environmental models.

IJCAI Conference 2023 Conference Paper

Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?

  • Bing Liu
  • Wei Luo
  • Gang Li
  • Jing Huang
  • Bo Yang

As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, and illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embedding. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.

ICRA Conference 2023 Conference Paper

DriveIRL: Drive in Real Life with Inverse Reinforcement Learning

  • Tung Phan-Minh
  • Forbes Howington
  • Ting-Sheng Chu
  • Momchil S. Tomov
  • Robert E. Beaudoin
  • Sang Uk Lee
  • Nanxiang Li
  • Caglayan Dicle

In this paper, we introduce the first published planner to drive a car in dense, urban traffic using Inverse Reinforcement Learning (IRL). Our planner, DriveIRL, generates a diverse set of trajectory proposals and scores them with a learned model. The best trajectory is tracked by our self-driving vehicle's low-level controller. We train our trajectory scoring model on a 500+ hour real-world dataset of expert driving demonstrations in Las Vegas within the maximum entropy IRL framework. DriveIRL's benefits include: a simple design due to only learning the trajectory scoring function, a flexible and relatively interpretable feature engineering approach, and strong real-world performance. We validated DriveIRL on the Las Vegas Strip and demonstrated fully autonomous driving in heavy traffic, including scenarios involving cut-ins, abrupt braking by the lead vehicle, and hotel pickup/dropoff zones. Our dataset, a part of nuPlan, has been released to the public to help further research in this area.

IJCAI Conference 2023 Conference Paper

Dual Personalization on Federated Recommendation

  • Chunxu Zhang
  • Guodong Long
  • Tianyi Zhou
  • Peng Yan
  • Zijian Zhang
  • Chengqi Zhang
  • Bo Yang

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions typically combine distributed recommendation algorithms with privacy-preserving mechanisms, and thus inherently take the form of heavyweight models at the server, hindering the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild fine-tuning of item embeddings for each user to generate user-specific views of item representations, which can be integrated into existing federated recommendation methods to gain immediate improvements. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed new insights on the design of recommender systems in federated settings. The code is available.

NeurIPS Conference 2023 Conference Paper

Learning Generalizable Agents via Saliency-guided Features Decorrelation

  • Sili Huang
  • Yanchao Sun
  • Jifeng Hu
  • Siyuan Guo
  • Hechang Chen
  • Yi Chang
  • Lichao Sun
  • Bo Yang

In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.

JBHI Journal 2023 Journal Article

Lung Sound Recognition Method Based on Multi-Resolution Interleaved Net and Time-Frequency Feature Enhancement

  • Lukui Shi
  • Jingye Zhang
  • Bo Yang
  • Yingjie Gao

Air pollution and an aging population have caused rates of lung disease, particularly among the elderly, to increase year by year. At the same time, the outbreak of COVID-19 has brought challenges to the medical system, placing higher demands on preventing lung diseases and improving diagnostic efficiency. Artificial intelligence can alleviate the burden on the medical system by analyzing lung sound signals to help diagnose lung diseases. Existing models for lung sound recognition struggle to capture the correlation between time and frequency information: convolutional neural networks have difficulty capturing multi-scale features across different resolutions, and feature fusion ignores the differing influences of time and frequency features. To address these issues, a lung sound recognition model based on a multi-resolution interleaved net and time-frequency feature enhancement was proposed, which consists of a heterogeneous dual-branch time-frequency feature extractor (TFFE), a time-frequency feature enhancement module based on branch attention (FEBA), and a fusion semantic classifier based on semantic mapping (FSC). TFFE independently extracts the time and frequency information of lung sounds through a multi-resolution interleaved net and a Transformer, which maintains the correlation between time-frequency features. FEBA focuses on the differences in the influence of time and frequency information on prediction results via branch attention. The proposed model achieved an accuracy of 91.56% on the combined dataset, an improvement of over 2.13% compared to other models.

JBHI Journal 2023 Journal Article

Natural Grasp Intention Recognition Based on Gaze in Human–Robot Interaction

  • Bo Yang
  • Jian Huang
  • Xinxing Chen
  • Xiaolong Li
  • Yasuhisa Hasegawa

Objective: While neuroscience research has established a link between vision and intention, studies on gaze data features for intention recognition are absent. The majority of existing gaze-based intention recognition approaches are based on deliberate long-term fixation and suffer from insufficient accuracy. In order to address the lack of features and insufficient accuracy in previous studies, the primary objective of this study is to suppress noise from human gaze data and extract useful features for recognizing grasp intention. Methods: We conduct gaze movement evaluation experiments to investigate the characteristics of gaze motion. The target-attracted gaze movement model (TAGMM) is proposed as a quantitative description of gaze movement based on the findings. A Kalman filter (KF) is used to reduce the noise in the gaze data based on TAGMM. We conduct gaze-based natural grasp intention recognition evaluation experiments to collect the subject's gaze data. Four types of features describing gaze point dispersion ($f_{var}$), gaze point movement ($f_{gm}$), head movement ($f_{hm}$), and distance from the gaze points to objects ($f_{d_{j}}$) are then proposed to recognize the subject's grasp intentions. With the proposed features, we perform intention recognition experiments, employing various classifiers, and the results are compared with different methods. Results: The statistical analysis reveals that the proposed features differ significantly across intentions, offering the possibility of employing these features to recognize grasp intentions. We demonstrated the intention recognition performance utilizing the TAGMM and the proposed features in within-subject and cross-subject experiments. The results indicate that the proposed method can recognize the intention with accuracy improvements of 44.26% (within-subject) and 30.67% (cross-subject) over the fixation-based method. The proposed method also consumes less time (34.87 ms) to recognize the intention than the fixation-based method (about 1 s). Conclusion: This work introduces a novel TAGMM for modeling gaze movement and a variety of practical features for recognizing grasp intentions. Experiments confirm the effectiveness of our approach. Significance: The proposed TAGMM is capable of modeling gaze movements and can be utilized to process gaze data, and the proposed features can reveal the user's intentions. These results contribute to the development of gaze-based human-robot interaction.

NeurIPS Conference 2023 Conference Paper

NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos

  • Jinxi Li
  • Ziyang Song
  • Bo Yang

In this paper, we aim to model 3D scene dynamics from multi-view videos. Unlike the majority of existing works which usually focus on the common task of novel view synthesis within the training time period, we propose to simultaneously learn the geometry, appearance, and physical velocity of 3D scenes only from video frames, such that multiple desirable applications can be supported, including future frame extrapolation, unsupervised 3D semantic scene decomposition, and dynamic motion transfer. Our method consists of three major components, 1) the keyframe dynamic radiance field, 2) the interframe velocity field, and 3) a joint keyframe and interframe optimization module which is the core of our framework to effectively train both networks. To validate our method, we further introduce two dynamic 3D datasets: 1) Dynamic Object dataset, and 2) Dynamic Indoor Scene dataset. We conduct extensive experiments on multiple datasets, demonstrating the superior performance of our method over all baselines, particularly in the critical tasks of future frame extrapolation and unsupervised 3D semantic scene decomposition.

AAAI Conference 2023 Conference Paper

OPT-GAN: A Broad-Spectrum Global Optimizer for Black-Box Problems by Learning Distribution

  • Minfang Lu
  • Shuai Ning
  • Shuangrong Liu
  • Fengyang Sun
  • Bo Zhang
  • Bo Yang
  • Lin Wang

Black-box optimization (BBO) algorithms are concerned with finding the best solutions for problems with missing analytical details. Most classical methods for such problems are based on strong and fixed a priori assumptions, such as Gaussianity. However, complex real-world problems, especially when the global optimum is desired, can be very far from these a priori assumptions because of their diversity, causing unexpected obstacles. In this study, we propose a generative adversarial net-based broad-spectrum global optimizer (OPT-GAN) which gradually estimates the distribution of the optimum, with strategies to balance the exploration-exploitation trade-off. It has the potential to better adapt to the regularity and structure of diversified landscapes than other methods with a fixed prior, e.g., a Gaussian assumption or separability. Experiments on diverse BBO benchmarks and high-dimensional real-world applications show that OPT-GAN outperforms other traditional and neural net-based BBO algorithms. The code and Appendix are available at https://github.com/NBICLAB/OPT-GAN

NeurIPS Conference 2023 Conference Paper

RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency

  • Zhuoman Liu
  • Bo Yang
  • Yan Luximon
  • Ajay Kumar
  • Jinxi Li

In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF

IROS Conference 2023 Conference Paper

Spiking Reinforcement Learning with Memory Ability for Mapless Navigation

  • Bo Yang
  • Mengwen Yuan
  • Chengjun Zhang
  • Chaofei Hong
  • Gang Pan 0001
  • Huajin Tang

Our study focuses on mapless navigation in robotics, which involves navigating without an established obstacle map of the environment. Spiking Neural Networks (SNNs) have recently been applied to this task using Deep Reinforcement Learning (DRL), but face challenges in dynamic and partially observable environments, as well as inaccuracies in transmitted data. To overcome these issues, we propose a Multi-Critic DDPG with Spiking Memory (MC-DDPGSM) framework. Our approach introduces a spiking Gated Recurrent Unit layer (Spiking-GRU) to provide a memory function and evaluates the state-action value with multi-critic networks. The experimental results demonstrate that our method achieves better performance (success rate, navigation distance, navigation time spent, and power consumption) in complex navigation tasks compared to state-of-the-art approaches. Furthermore, our model can be transferred to unseen environments without the need for fine-tuning.

IJCAI Conference 2023 Conference Paper

STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary 2D Exemplar?

  • Xin Zhao
  • Jifeng Guo
  • Lin Wang
  • Fanqi Li
  • Jiahao Li
  • Junteng Zheng
  • Bo Yang

Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in computational photography. However, existing methods generally fail to accurately learn arbitrary textures, which may result in the failure to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to extend the given 2D exemplar to arbitrary 3D solid textures. In STS-GAN, multi-scale 2D texture discriminators evaluate the similarity between the given 2D exemplar and slices from the generated 3D texture, driving the 3D texture generator to synthesize realistic solid textures. Finally, experiments demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the 2D exemplar.

JBHI Journal 2022 Journal Article

An Expectation Maximization Based Adaptive Group Testing Method for Improving Efficiency and Sensitivity of Large-Scale Screening of COVID-19

  • Xiaofang Xia
  • Yang Liu
  • Bo Yang
  • Yingfan Liu
  • Jiangtao Cui
  • Yinlong Zhang

The pathogen of the ongoing coronavirus disease 2019 (COVID-19) pandemic is a newly discovered virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Testing individuals for SARS-CoV-2 plays a critical role in containing COVID-19. To save medical personnel and consumables, many countries are implementing group testing against SARS-CoV-2. However, existing group testing methods have the following limitations: (1) The group size is determined without theoretical analysis, and hence is usually not optimal. This adversely impacts the screening efficiency. (2) These methods neglect the fact that mixing samples together usually leads to substantial dilution of the SARS-CoV-2 virus, which seriously impacts the sensitivity of tests. In this paper, we aim to screen individuals infected with COVID-19 with as few tests as possible, under the premise that the sensitivity of tests remains high enough. We propose an eXpectation Maximization based Adaptive Group Testing (XMAGT) method. The basic idea is to adaptively switch between a group testing strategy and an individual testing strategy such that the expected number of samples identified by a single test is larger. During the screening process, the XMAGT method estimates the ratio of positive samples. With this ratio, the XMAGT method can determine a group size under which the group testing strategy achieves a maximal expected number of negative samples while the sensitivity of tests stays above a user-specified threshold. Experimental results show that the XMAGT method outperforms existing methods in terms of both efficiency and sensitivity.
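The efficiency-versus-group-size trade-off that XMAGT optimizes can be illustrated with classic Dorfman pooling: for positive ratio p and group size n, the expected number of tests per sample is 1/n + 1 - (1-p)^n. The sketch below is a simplified illustration, not XMAGT itself (which additionally estimates p during screening and enforces a sensitivity threshold):

```python
def expected_tests_per_sample(p, n):
    """Dorfman pooling: one pooled test shared by n samples, plus n
    individual retests whenever the pool is positive (prob. 1 - (1-p)**n)."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

def best_group_size(p, max_n=64):
    """Group size minimizing the expected number of tests per sample."""
    return min(range(2, max_n + 1), key=lambda n: expected_tests_per_sample(p, n))

n = best_group_size(0.01)                  # at a 1% positive ratio, n == 11
rate = expected_tests_per_sample(0.01, n)  # about 0.196 tests per sample
```

At low prevalence this already identifies most negative samples with roughly a fifth of the tests that individual testing would need, which is the gain adaptive group testing tries to preserve without sacrificing sensitivity.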

NeurIPS Conference 2022 Conference Paper

HSDF: Hybrid Sign and Distance Field for Modeling Surfaces with Arbitrary Topologies

  • Li Wang
  • Jie Yang
  • Weikai Chen
  • Xiaoxu Meng
  • Bo Yang
  • Jintao Li
  • Lin Gao

Neural implicit function based on signed distance field (SDF) has achieved impressive progress in reconstructing 3D models with high fidelity. However, such approaches can only represent closed shapes. Recent works based on the unsigned distance function (UDF) have been proposed to handle both watertight and open surfaces. Nonetheless, as UDF is signless, its direct output is limited to point clouds, which imposes an additional challenge on extracting high-quality meshes from discrete points. To address this issue, we present a new learnable implicit representation, coded HSDF, that connects the good ends of SDF and UDF. In particular, HSDF is able to represent arbitrary topologies containing both closed and open surfaces while being compatible with existing iso-surface extraction techniques for easy field-to-mesh conversion. In addition to predicting a UDF, we propose to learn an additional sign field via a simple classifier. Unlike traditional SDF, HSDF is able to locate the surface of interest before level surface extraction by generating surface points following NDF (Chibane et al., 2020). We are then able to obtain open surfaces via an adaptive meshing approach that only instantiates regions containing surface into a polygon mesh. We also propose HSDF-Net, a dedicated learning framework that factorizes the learning of HSDF into two easier problems. Experiments on multiple datasets show that HSDF outperforms state-of-the-art techniques both qualitatively and quantitatively.

NeurIPS Conference 2022 Conference Paper

OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds

  • Ziyang Song
  • Bo Yang

In this paper, we study the problem of 3D object segmentation from raw point clouds. Unlike all existing methods which usually require a large amount of human annotations for full supervision, we propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass, without needing any type of human annotations. The key to our approach is to fully leverage the dynamic motion patterns over sequential point clouds as supervision signals to automatically discover rigid objects. Our method consists of three major components, 1) the object segmentation network to directly estimate multi-object masks from a single point cloud frame, 2) the auxiliary self-supervised scene flow estimator, and 3) our core object geometry consistency component. By carefully designing a series of loss functions, we effectively take into account the multi-object rigid consistency and the object shape invariance in both temporal and spatial scales. This allows our method to truly discover the object geometry even in the absence of annotations. We extensively evaluate our method on five datasets, demonstrating the superior performance for object part instance segmentation and general object segmentation in both indoor and the challenging outdoor scenarios.

NeurIPS Conference 2022 Conference Paper

Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images

  • Yafei Yang
  • Bo Yang

In this paper, we study the problem of unsupervised object segmentation from single images. We do not introduce a new algorithm, but systematically investigate the effectiveness of existing unsupervised models on challenging real-world images. We firstly introduce four complexity factors to quantitatively measure the distributions of object- and scene-level biases in appearance and geometry for datasets with human annotations. With the aid of these factors, we empirically find that, not surprisingly, existing unsupervised models catastrophically fail to segment generic objects in real-world images, although they can easily achieve excellent performance on numerous simple synthetic datasets, due to the vast gap in objectness biases between synthetic and real images. By conducting extensive experiments on multiple groups of ablated real-world datasets, we ultimately find that the key factors underlying the colossal failure of existing unsupervised models on real-world images are the challenging distributions of object- and scene-level biases in appearance and geometry. Because of this, the inductive biases introduced in existing unsupervised models can hardly capture the diverse object distributions. Our research results suggest that future work should exploit more explicit objectness biases in the network design.

AAAI Conference 2022 Conference Paper

Towards Robust Off-Policy Learning for Runtime Uncertainty

  • Da Xu
  • Yuting Ye
  • Chuanwei Ruan
  • Bo Yang

Off-policy learning plays a pivotal role in optimizing and evaluating policies prior to online deployment. However, during real-time serving, we observe a variety of interventions and constraints that cause inconsistency between the online and offline settings, which we summarize and term runtime uncertainty. Such uncertainty cannot be learned from the logged data due to its abnormal and rare nature. To assert a certain level of robustness, we perturb the off-policy estimators along an adversarial direction in view of the runtime uncertainty. This allows the resulting estimators to be robust not only to observed but also to unexpected runtime uncertainties. Leveraging this idea, we bring runtime-uncertainty robustness to three major off-policy learning methods: the inverse propensity score method, the reward-model method, and the doubly robust method. We theoretically justify the robustness of our methods to runtime uncertainty, and demonstrate their effectiveness using both simulations and real-world online experiments.
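Two of the three estimators named above (inverse propensity score and doubly robust, the latter built on a reward model) are compact enough to sketch. The toy logged data, target policy, and reward model below are hypothetical, and no adversarial robustification is shown:

```python
def ips(logs, policy):
    """Inverse propensity score estimate of the target policy's value.
    Each log entry is (context, logged_action, logging_propensity, reward)."""
    return sum((a == policy(x)) / p * r for x, a, p, r in logs) / len(logs)

def doubly_robust(logs, policy, q_hat):
    """DR estimate: reward-model baseline q_hat plus an IPS correction term."""
    total = 0.0
    for x, a, p, r in logs:
        total += q_hat(x, policy(x)) + (a == policy(x)) / p * (r - q_hat(x, a))
    return total / len(logs)

logs = [(0, 1, 0.5, 1.0), (1, 0, 0.5, 0.0), (0, 1, 0.5, 1.0)]
policy = lambda x: 1           # hypothetical target policy: always take action 1
q_hat = lambda x, a: 0.5 * a   # hypothetical reward model

v_ips = ips(logs, policy)                  # 4/3
v_dr = doubly_robust(logs, policy, q_hat)  # 3.5/3
```

The DR estimator falls back on the reward model wherever the logged action disagrees with the target policy, which is the kind of baseline the paper then perturbs adversarially for robustness.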

AAAI Conference 2021 Conference Paper

Cost-aware Graph Generation: A Deep Bayesian Optimization Approach

  • Jiaxu Cui
  • Bo Yang
  • Bingyi Sun
  • Jiming Liu

Graph-structured data is ubiquitous throughout the natural and social sciences, ranging from complex drug molecules to artificial neural networks. Evaluating their functional properties, e.g., drug effectiveness and prediction accuracy, is usually costly in terms of time, money, energy, or environment, becoming a bottleneck for the graph generation task. In this work, from the perspective of saving cost, we propose a novel Cost-Aware Graph Generation (CAGG) framework to generate graphs with optimal properties at as low a cost as possible. By introducing a robust Bayesian graph neural network as the surrogate model and a goal-oriented training scheme for the generation model, CAGG can approximate the real expensive evaluation function and generate a search space close to the optimal property, to avoid unnecessary evaluations. Intensive experiments conducted on two challenging real-world applications, including molecular discovery and neural architecture search, demonstrate its effectiveness and applicability. The results show that it can generate the optimal graphs and reduce evaluation costs significantly compared to the state-of-the-art.

AAMAS Conference 2021 Conference Paper

Drone Formation Control via Belief-Correlated Imitation Learning

  • Bo Yang
  • Chaofan Ma
  • Xiaofang Xia

The proliferation of unmanned aerial vehicles (UAVs) has given rise to various intelligent services, in which effective coordination plays a significant role in enhancing swarm execution efficiency. However, due to unreliable communication in the air as well as heterogeneity in operation modes, it is challenging to achieve highly coordinated actions, particularly in a fully distributed environment with incomplete observations. In this paper, we leverage the generative adversarial imitation learning (GAIL) technique to coordinate the drones' actions by directly imitating peers' demonstrations. In order to characterize the true environment state under local incomplete observations, we transform historical observation-action trajectories into belief representations, which are trained in conjunction with the imitation policies. We also obtain regularized belief representations by correlating the prediction of future states, the trace of historical contexts, and the action-assisted guidance information, which contribute to more accurate imitation policies. We evaluate the proposed algorithm on a drone formation control scenario. Evaluation results show its advantages in imitation accuracy, teamwork execution time, and energy cost.

AAAI Conference 2021 Conference Paper

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

  • Yu Cheng
  • Bo Wang
  • Bo Yang
  • Robby T. Tan

Despite the recent progress, 3D multi-person pose estimation from monocular videos is still challenging due to the commonly encountered problem of missing information caused by occlusion, partially out-of-frame target persons, and inaccurate person detection. To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses that does not require camera parameters. In particular, we introduce a human-joint GCN, which unlike the existing GCN, is based on a directed graph that employs the 2D pose estimator's confidence scores to improve the pose estimation results. We also introduce a human-bone GCN, which models the bone connections and provides more information beyond human joints. The two GCNs work together to estimate the spatial frame-wise 3D poses, and can make use of both visible joint and bone information in the target frame to estimate the occluded or missing human-part information. To further refine the 3D pose estimation, we use our temporal convolutional networks (TCNs) to enforce the temporal and human-dynamics constraints. We use a joint-TCN to estimate person-centric 3D poses across frames, and propose a velocity-TCN to estimate the speed of 3D joints to ensure the consistency of the 3D pose estimation in consecutive frames. Finally, to estimate the 3D human poses for multiple persons, we propose a root-TCN that estimates camera-centric 3D poses without requiring camera parameters. Quantitative and qualitative evaluations demonstrate the effectiveness of the proposed method. Our code and models are available at https://github.com/3dpose/GnTCN.

NeurIPS Conference 2021 Conference Paper

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

  • Fan Yang
  • Kai He
  • Linxiao Yang
  • Hongxia Du
  • Jingbang Yang
  • Bo Yang
  • Liang Sun

Rule sets are highly interpretable logical models in which the predicates for decision are expressed in disjunctive normal form (DNF, OR-of-ANDs), or, equivalently, the overall model comprises an unordered collection of if-then decision rules. In this paper, we consider a submodular optimization based approach for learning rule sets. The learning problem is framed as a subset selection task in which a subset of all possible rules needs to be selected to form an accurate and interpretable rule set. We employ an objective function that exhibits submodularity and thus is amenable to submodular optimization techniques. To overcome the difficulty arising from the exponential-sized ground set of rules, the subproblem of searching for a rule is cast as another subset selection task that asks for a subset of features. We show it is possible to write the induced objective function for the subproblem as a difference of two submodular (DS) functions, making it approximately solvable by DS optimization algorithms. Overall, the proposed approach is simple, scalable, and likely to benefit from further research on submodular optimization. Experiments on real datasets demonstrate the effectiveness of our method.
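As a concrete reading of "DNF, OR-of-ANDs": the model predicts positive exactly when at least one rule fires, and a rule fires when all of its predicates hold. A minimal sketch (the predicates and feature names are invented for illustration):

```python
def rule_set_predict(x, rules):
    """DNF rule set: positive iff any rule fires; a rule fires iff all of
    its predicates hold on the example x."""
    return any(all(pred(x) for pred in rule) for rule in rules)

# Two hypothetical if-then rules over a feature dictionary:
rules = [
    [lambda x: x["age"] > 60, lambda x: x["bp"] == "high"],  # rule 1 (AND)
    [lambda x: x["smoker"]],                                 # rule 2
]
hit = rule_set_predict({"age": 70, "bp": "high", "smoker": False}, rules)  # True
miss = rule_set_predict({"age": 40, "bp": "low", "smoker": False}, rules)  # False
```

The subset selection task in the paper is choosing which such rules (and, within each rule, which feature predicates) to include so the set stays both accurate and small.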

JBHI Journal 2021 Journal Article

Neuron Image Segmentation via Learning Deep Features and Enhancing Weak Neuronal Structures

  • Bo Yang
  • Weixun Chen
  • Huiqiong Luo
  • Yinghui Tan
  • Min Liu
  • Yaonan Wang

Neuron morphology reconstruction (tracing) in 3D volumetric images is critical for neuronal research. However, most existing neuron tracing methods are not applicable in challenging datasets where the neuron images are contaminated by noise or contain weak filament signals. In this paper, we present a two-stage 3D neuron segmentation approach via learning deep features and enhancing weak neuronal structures, to reduce the impact of image noise in the data and enhance the weak-signal neuronal structures. In the first stage, we train a voxel-wise multi-level fully convolutional network (FCN), which specializes in learning deep features, to obtain the 3D neuron image segmentation maps in an end-to-end manner. In the second stage, a ray-shooting model is employed to detect the discontinued segments in the segmentation results of the first stage; the local neuron diameter at the broken point is estimated and the direction of the filamentary fragment is detected by a rayburst sampling algorithm. Then, a Hessian-repair model is built to repair the broken structures by enhancing weak neuronal structures in a fibrous structure determined by the estimated local neuron diameter and the filamentary fragment direction. Experimental results demonstrate that our proposed segmentation approach achieves better segmentation performance than other state-of-the-art methods for 3D neuron segmentation. Compared with the neuron reconstruction results on the segmented images produced by other segmentation methods, the proposed approach gains 47.83% and 34.83% improvement in the average distance scores. The average Precision and Recall rates of the branch point detection with our proposed method are 38.74% and 22.53% higher than the detection results without segmentation.

NeurIPS Conference 2021 Conference Paper

OctField: Hierarchical Implicit Functions for 3D Modeling

  • Jia-Heng Tang
  • Weikai Chen
  • Jie Yang
  • Bo Wang
  • Songrun Liu
  • Bo Yang
  • Lin Gao

Recent advances in localized implicit functions have enabled neural implicit representation to be scalable to large scenes. However, the regular subdivision of 3D space employed by these approaches fails to take into account the sparsity of the surface occupancy and the varying granularities of geometric details. As a result, its memory footprint grows cubically with the input volume, leading to a prohibitive computational cost even at a moderately dense decomposition. In this work, we present a learnable hierarchical implicit representation for 3D surfaces, coded OctField, that allows high-precision encoding of intricate surfaces with low memory and computational budget. The key to our approach is an adaptive decomposition of 3D scenes that only distributes local implicit functions around the surface of interest. We achieve this goal by introducing a hierarchical octree structure to adaptively subdivide the 3D space according to the surface occupancy and the richness of part geometry. As octree is discrete and non-differentiable, we further propose a novel hierarchical network that models the subdivision of octree cells as a probabilistic process and recursively encodes and decodes both octree structure and surface geometry in a differentiable manner. We demonstrate the value of OctField for a range of shape modeling and reconstruction tasks, showing superiority over alternative approaches.

AAAI Conference 2020 Conference Paper

3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training

  • Yu Cheng
  • Bo Yang
  • Bo Wang
  • Robby T. Tan

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years. Generally, the performance of existing methods drops when the target person is too small/large, or the motion is too fast/slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not designed or trained under severe occlusion explicitly, making their performance on handling occlusion compromised. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear in different scales and have various motion speeds, we apply multi-scale spatial features for 2D joints or keypoints prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe occlusion, so that our network can learn better and become robust to various degrees of occlusion. As there are limited 3D ground truth data, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public data sets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.

NeurIPS Conference 2020 Conference Paper

Curvature Regularization to Prevent Distortion in Graph Embedding

  • Hongbin Pei
  • Bingzhe Wei
  • Kevin Chang
  • Chunxu Zhang
  • Bo Yang

Recent research on graph embedding has achieved success in various applications. Most graph embedding methods preserve the proximity in a graph into a manifold in an embedding space. We identify an important but neglected problem with this proximity-preserving strategy: graph topology patterns, while preserved well into an embedding manifold, may be distorted in the ambient embedding Euclidean space, making them difficult for machine learning models to detect. To address this problem, we propose curvature regularization, which enforces flatness of embedding manifolds, thereby preventing the distortion. We present a novel angle-based sectional curvature, termed ABS curvature, and accordingly three kinds of curvature regularization to induce flat embedding manifolds during graph embedding. We integrate curvature regularization into five popular proximity-preserving embedding methods, and empirical results in two applications show significant improvements on a wide range of open graph datasets.

NeurIPS Conference 2020 Conference Paper

Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

  • Songtao Lu
  • Meisam Razaviyayn
  • Bo Yang
  • Kejun Huang
  • Mingyi Hong

This paper proposes two efficient algorithms for computing approximate second-order stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints. While finding (approximate) SOSPs for the class of smooth non-convex linearly constrained problems is computationally intractable, we show that generic problem instances in this class can be solved efficiently. Specifically, for a generic problem instance, we show that a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions. Based on this condition, we design an algorithm named Successive Negative-curvature grAdient Projection (SNAP), which performs either conventional gradient projection or negative-curvature-based projection steps to find SOSPs. SNAP is a second-order algorithm that requires $\widetilde{\mathcal{O}}(\max\{1/\epsilon^2_G, 1/\epsilon^3_H\})$ iterations to compute an $(\epsilon_G, \epsilon_H)$-SOSP, where $\widetilde{\mathcal{O}}$ hides the iteration complexity for eigenvalue decomposition. Building on SNAP, we propose a first-order algorithm, named SNAP$^+$, that requires $\mathcal{O}(1/\epsilon^{2.5})$ iterations to compute an $(\epsilon, \sqrt{\epsilon})$-SOSP. The per-iteration computational complexities of our algorithms are polynomial in the number of constraints and the problem dimension. To the best of our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and a global sublinear rate have been designed to find SOSPs of this important class of non-convex problems with linear constraints (almost surely).

AAMAS Conference 2019 Conference Paper

Attack-Resilient Connectivity Game for UAV Networks using Generative Adversarial Learning

  • Bo Yang
  • Min Liu

Continuous link connectivity is critical for the efficient collaboration of multiple unmanned aerial vehicles (UAVs). However, UAV communication environments are not only harsh but also confronted with the threats of smart attackers, which pose great barriers to keeping the links unblocked. In this paper, we leverage the paradigm of the Generative Adversarial Network (GAN) to formulate an attack-resilient connectivity game between a pair of neighboring UAVs and an attacker. In this three-agent adversarial game, the attacker acts as the generator, which attempts to generate information closely approximating that of the UAVs so as to maximize its jamming capability, while the pairwise UAVs act as the discriminators, which attempt to enhance their capability of rejecting the fake information (i.e., the opponent's attack). As state-of-the-art GAN learning algorithms suffer from an instability dilemma (i.e., either failing to converge or achieving low generation/discrimination performance), we incorporate the conditional GAN with the least-squares objective loss function as well as the mean squared error, so that the attacker can improve its detection capability from the UAVs' historical activity patterns and the UAVs can adjust their connectivity strategy accordingly. We validate the effectiveness of the proposed algorithm through extensive evaluations. Results demonstrate that the proposed algorithm can improve convergence efficiency, reduce connection latency, and significantly enhance attack resilience.

AAAI Conference 2019 Conference Paper

Deep Bayesian Optimization on Attributed Graphs

  • Jiaxu Cui
  • Bo Yang
  • Xia Hu

Attributed graphs, which contain rich contextual features beyond network structure, are ubiquitous and have been observed to benefit various network analytics applications. Graph structure optimization, which aims to find optimal graphs in terms of specific measures, has become an effective computational tool in complex network analysis. However, traditional model-free methods suffer from the expensive computational cost of evaluating graphs, and existing vectorial Bayesian optimization methods cannot be directly applied to attributed graphs and have scalability issues due to their use of Gaussian processes (GPs). To bridge the gap, in this paper we propose a novel scalable Deep Graph Bayesian Optimization (DGBO) method on attributed graphs. The proposed DGBO avoids the cubic complexity of GPs by adopting a deep graph neural network to surrogate black-box functions, and can scale linearly with the number of observations. Intensive experiments are conducted on both artificial and real-world problems, including molecular discovery and urban road network design, and demonstrate the effectiveness of DGBO compared with the state-of-the-art.

NeurIPS Conference 2019 Conference Paper

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

  • Bo Yang
  • Jianan Wang
  • Ronald Clark
  • Qingyong Hu
  • Sen Wang
  • Andrew Markham
  • Niki Trigoni

We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance. It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. 3D-BoNet is single-stage, anchor-free and end-to-end trainable. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient. Comprehensive ablation studies demonstrate the effectiveness of our design.

IS Journal 2019 Journal Article

Parallel Cyber-Physical-Social Systems Based Smart Energy Robotic Dispatcher and Knowledge Automation: Concepts, Architectures, and Challenges

  • Lefeng Cheng
  • Tao Yu
  • Xiaoshun Zhang
  • Bo Yang

We propose a novel concept of a robot for energy control based on parallel cyber-physical-social systems for the next generation of energy and electric power systems. We thoroughly investigate its knowledge automation technologies and discuss its main challenges in preparation for the promising evolution from Energy 4.0 to Energy 5.0 in China.

IJCAI Conference 2019 Conference Paper

Topology Optimization based Graph Convolutional Network

  • Liang Yang
  • Zesheng Kang
  • Xiaochun Cao
  • Di Jin
  • Bo Yang
  • Yuanfang Guo

In the past few years, semi-supervised node classification in attributed networks has developed rapidly. Inspired by the success of deep learning, researchers have adopted the convolutional neural network to develop Graph Convolutional Networks (GCNs), achieving surprising classification accuracy by considering the topological information and employing a fully connected network (FCN). However, the given network topology may also induce performance degradation if it is employed directly in classification, because it may possess high sparsity and certain noise. Besides, the lack of learnable filters in GCN also limits performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Network (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN over state-of-the-art approaches.

IJCAI Conference 2018 Conference Paper

3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations

  • Zhihua Wang
  • Stefano Rosa
  • Bo Yang
  • Sen Wang
  • Niki Trigoni
  • Andrew Markham

The ability to interact with and understand the environment is a fundamental prerequisite for a wide range of applications, from robotics to augmented reality. In particular, predicting how deformable objects will react to applied forces in real time is a significant challenge. This is further confounded by the fact that shape information about objects encountered in the real world is often impaired by occlusions, noise and missing regions; e.g., a robot manipulating an object will only be able to observe a partial view of the entire solid. In this work we present a framework, 3D-PhysNet, which is able to predict how a three-dimensional solid will deform under an applied force using intuitive physics modelling. In particular, we propose a new method to encode the physical properties of the material and the applied force, enabling generalisation over materials. The key is to combine deep variational autoencoders with adversarial training, conditioned on the applied force and the material properties. We further propose a cascaded architecture that takes a single 2.5D depth view of the object and predicts its deformation. Training data is provided by a physics simulator. The network is fast enough to be used in real-time applications from partial views. Experimental results show the viability and the generalisation properties of the proposed architecture.

AAAI Conference 2018 Conference Paper

Group Sparse Bayesian Learning for Active Surveillance on Epidemic Dynamics

  • Hongbin Pei
  • Bo Yang
  • Jiming Liu
  • Lei Dong

Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes, such as infectious disease spread and information propagation. This task is intractable, especially when surveillance resources are very limited. To address the challenge, we study the problem of active surveillance, i.e., how to identify a small portion of system components as sentinels for monitoring, such that the epidemic dynamics of an entire system can be readily predicted from the partial data collected by such sentinels. We propose a novel measure, the γ value, to identify the sentinels by modeling a sentinel network with a row-sparsity structure. We design a flexible group sparse Bayesian learning algorithm to mine the sentinel network, suitable for handling both linear and non-linear dynamical systems, by using the expectation maximization method and variational approximation. The efficacy of the proposed algorithm is theoretically analyzed and empirically validated using both synthetic and real-world data.

IJCAI Conference 2018 Conference Paper

Keeping in Touch with Collaborative UAVs: A Deep Reinforcement Learning Approach

  • Bo Yang
  • Min Liu

Effective collaborations among autonomous unmanned aerial vehicles (UAVs) rely on timely information sharing. However, the time-varying flight environment and the intermittent link connectivity pose great challenges to message delivery. In this paper, we leverage the deep reinforcement learning (DRL) technique to address the UAVs' optimal links discovery and selection problem in uncertain environments. As the multi-agent learning efficiency is constrained by the high-dimensional and continuous action spaces, we slice the whole action spaces into a number of tractable fractions to achieve efficient convergences of optimal policies in continuous domains. Moreover, for the nonstationarity issue that particularly challenges the multi-agent DRL with local perceptions, we present a multi-agent mutual sampling method that jointly interacts the intra-agent and inter-agent state-action information to stabilize and expedite the training procedure. We evaluate the proposed algorithm on the UAVs' continuous network connection task. Results show that the associated UAVs can quickly select the optimal connected links, which facilitate the UAVs' teamwork significantly.

AAAI Conference 2015 Conference Paper

Bayesian Approach to Modeling and Detecting Communities in Signed Network

  • Bo Yang
  • Xuehua Zhao
  • Xueyan Liu

There has been an increasing interest in exploring signed networks with positive and negative links in that they contain more information than unsigned networks. As fundamental problems of signed network analysis, community detection and sign (or attitude) prediction are still primary challenges. To address them, we propose a generative Bayesian approach, in which 1) a signed stochastic blockmodel is proposed to characterize the community structure in context of signed networks, by means of explicitly formulating the distributions of both density and frustration of signed links from a stochastic perspective, and 2) a model learning algorithm is proposed by theoretically deriving a variational Bayes EM for parameter estimation and a variation based approximate evidence for model selection. Through the comparisons with state-of-the-art methods on synthetic and real-world networks, the proposed approach shows its superiority in both community detection and sign prediction for exploratory networks.

AAAI Conference 2015 Conference Paper

On the Scalable Learning of Stochastic Blockmodel

  • Bo Yang
  • Xuehua Zhao

Stochastic blockmodel (SBM) enables us to decompose and analyze an exploratory network without a priori knowledge about its intrinsic structure. However, the task of effectively and efficiently learning a SBM from a large-scale network is still challenging due to the high computational cost of its model selection and parameter estimation. To address this issue, we present a novel SBM learning algorithm referred to as BLOS (BLOckwise Sbm learning). Distinct from the literature, the model selection and parameter estimation of SBM are concurrently, rather than alternately, executed in BLOS by embedding the minimum message length criterion into a block-wise EM algorithm, which greatly reduces the time complexity of SBM learning without losing learning accuracy and modeling flexibility. Its effectiveness and efficiency have been tested through rigorous comparisons with the state-of-the-art methods on both synthetic and real-world networks.

AAAI Conference 2014 Conference Paper

Modeling and Mining Spatiotemporal Patterns of Infection Risk from Heterogeneous Data for Active Surveillance Planning

  • Bo Yang
  • Hua Guo
  • Yi Yang
  • Benyun Shi
  • Xiaonong Zhou
  • Jiming Liu

Active surveillance is a desirable way to prevent the spread of infectious diseases in that it aims to timely discover individual incidences through an active searching for patients. However, in practice active surveillance is difficult to implement especially when monitoring space is large but available resources are limited. Therefore, it is extremely important for public health authorities to know how to distribute their very sparse resources to high-priority regions so as to maximize the outcomes of active surveillance. In this paper, we raise the problem of active surveillance planning and provide an effective method to address it via modeling and mining spatiotemporal patterns of infection risks from heterogeneous data sources. Taking malaria as an example, we perform an empirical study on real-world data to validate our method and provide our new findings.

IROS Conference 2013 Conference Paper

Path-generating regulator along a straight passage for two-wheeled mobile robots

  • Bo Yang
  • Naohiko Hanajima
  • Atsushi Yamamoto
  • Mototada Ayamura
  • Jun Dai

In this paper, the path-generating regulator is extended to the tracking problem along a straight passage for two-wheeled mobile robots. As most mobile robots are subject to nonholonomic constraints, it is difficult to make them converge to the target state with a single control law. Many methods have been proposed to solve this problem. One of them is the Path-Generating Regulator (PGR), which designs a nonlinear regulator that achieves asymptotic convergence to a given trajectory family. However, the original method is not well suited for passages. In this paper, we present the extended PGR for the tracking problem along a straight passage. Numerical simulations and experiments are also performed to show the effectiveness of this method.

IJCAI Conference 2013 Conference Paper

Social Collaborative Filtering by Trust

  • Bo Yang
  • Yu Lei
  • Dayou Liu
  • Jiming Liu

Accurately and proactively providing users with information or services of potential interest is the main task of a recommender system. Collaborative filtering is one of the most widely adopted recommender algorithms, but it suffers from the issues of data sparsity and cold start, which severely degrade the quality of recommendations. To address these issues, this article proposes a novel method that improves the performance of collaborative filtering recommendation by carefully integrating two kinds of sparse information: the conventional rating data given by users and the social trust network among the same users. It is a model-based method that adopts the matrix factorization technique to map users into low-dimensional latent feature spaces in terms of their trust relationships, aiming to reflect users' reciprocal influence on their own opinions more reasonably. Validations against a real-world dataset show that the proposed method performs much better than state-of-the-art recommendation algorithms for social collaborative filtering by trust.

ICRA Conference 2011 Conference Paper

Design and implementation of a pneumatically-actuated robot for breast biopsy under continuous MRI

  • Bo Yang
  • U-Xuan Tan
  • Alan B. McMillan
  • Rao P. Gullapalli
  • Jaydev P. Desai

Magnetic Resonance Imaging (MRI) is superior to other imaging modalities such as Ultrasound and Computed Tomography and is used for both diagnostic and therapeutic procedures. However, current breast biopsy procedures based on MR images obtained a priori use a blind targeting approach, which can be long and painful. Current approaches, due to possible patient motion, can lead to tool-tip positioning errors, thereby affecting diagnostic accuracy and causing significant patient discomfort if repeated procedures are required. Hence, it is desirable to develop an MRI-compatible robot for breast biopsy procedures that operates without removing the patient from the MRI bore. This approach could potentially avoid multiple biopsy needle insertions and minimize sampling errors. Due to the working principle of MRI, material, actuation, and sensing techniques are limited, as the MR images must not be affected significantly during the procedure. In addition, the limited space of the MRI bore requires the robot to be compact. This paper presents a four-degree-of-freedom robot with a compact parallel mechanism, of which three degrees of freedom are pneumatically actuated while the needle driver mechanism is actuated by a piezo motor. A fiber-optic force sensor is also designed, developed, and mounted on the top mobile platform of the parallel mechanism to sense the needle-tissue interaction forces. Position control of the pneumatic cylinders is implemented using PI control with a modified integration term to achieve slow and smooth motion.

ICRA Conference 2010 Conference Paper

Design and development of a 3-axis MRI-compatible force sensor

  • U-Xuan Tan
  • Bo Yang
  • Rao P. Gullapalli
  • Jaydev P. Desai

Magnetic resonance imaging (MRI) has been gaining popularity over standard imaging modalities like ultrasound and CT because of its ability to provide excellent soft-tissue contrast. However, due to the working principle of MRI, a number of conventional force sensors are not compatible. One popular solution is to develop a fiber-optic force sensor. However, the measurements along the principal axes of a number of these force sensors are highly cross-coupled. One of the objectives of this paper is to minimize this coupling effect. In addition, this paper describes the design of an elastic frame structure that is obtained systematically by an algorithm and not purely based on design intuition. We used a topology optimization technique, which has two major advantages: 1) it aids engineers in design when given a constrained boundary, and 2) it optimizes the displacement amplification, which in turn increases stiffness and bandwidth and improves sensing resolution. To ensure that the frames are linked from the input to the output, a solution for topology optimization is proposed. The sensor is then fabricated using a plastic material (ABS), as it is one of the ideal materials for the MRI environment. However, the hysteresis effect seen in the displacement-load graph of plastic materials is known to affect accuracy. Hence, this paper also proposes modeling and addressing this hysteretic effect using Prandtl-Ishlinskii play operators. Finally, experiments are conducted to evaluate the sensor's performance, as well as its compatibility in MRI under continuous imaging.

JAAMAS Journal 2009 Journal Article

An autonomy-oriented computing approach to community mining in distributed and dynamic networks

  • Bo Yang
  • Jiming Liu
  • Dayou Liu

A network community refers to a special type of network structure that contains a group of nodes connected based on certain relationships or similar properties. Our ability to mine communities hidden inside networks will readily enable us to effectively understand and exploit such networks. So far, various methods and algorithms have been developed to perform the task of community mining, where it is often required that the networks be processed in a centralized manner and that their structures not change dynamically. However, in the real world, many applications involve distributed and dynamically evolving networks, in which resources and controls are not only decentralized but also updated frequently. It would be difficult for the existing methods to deal with these types of networks, since their global topological representations are either not available or too hard to obtain due to their huge size, decentralization, and/or dynamic updates. The aim of our work is to address the problem of mining communities from a distributed and dynamic network. It differs from previous work in that here we introduce the notion of self-organizing agent networks, and provide an autonomy-oriented computing (AOC) approach to distributed and incremental mining of network communities. The AOC-based method utilizes reactive agents that can collectively detect and update community structures in a distributed and dynamically evolving network, based only on their local views and interactions. While providing detailed formulations, we present the results of our systematic validations using real-world benchmark networks as well as synthetic networks, including a distributed intelligent Portable Digital Assistant (iPDA) network example.