Arrow Research search

Author name cluster

Tao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

102 papers
2 author rows

Possible papers

102

AAAI Conference 2026 Conference Paper

CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution

  • Baoliang Tian
  • Yuxuan Si
  • Jilong Wang
  • LingYao Li
  • Zhongyuan Bao
  • Zineng Zhou
  • Tao Wang
  • Sixu Li

Multimodal Large Language Models are primarily trained and evaluated on aligned image-text pairs, which leaves their ability to detect and resolve real-world inconsistencies largely unexplored. In open-domain applications visual and textual cues often conflict, requiring models to perform structured reasoning beyond surface-level alignment. We introduce CrossCheck-Bench, a diagnostic benchmark for evaluating contradiction detection in multimodal inputs. The benchmark adopts a hierarchical task framework covering three levels of reasoning complexity and defines seven atomic capabilities essential for resolving cross-modal inconsistencies. CrossCheck-Bench includes 15k question-answer pairs sourced from real-world artifacts with synthetically injected contradictions. The dataset is constructed through a multi-stage annotation pipeline involving more than 450 expert hours to ensure semantic validity and calibrated difficulty across perception, integration, and reasoning. We evaluate 13 state-of-the-art vision-language models and observe a consistent performance drop as tasks shift from perceptual matching to logical contradiction detection. Most models perform well on isolated entity recognition but fail when multiple clues must be synthesized for conflict reasoning. Capability-level analysis further reveals uneven skill acquisition, especially in tasks requiring multi-step inference or rule-based validation. Additional probing shows that conventional prompting strategies such as Chain-of-Thought and Set-of-Mark yield only marginal gains. By contrast, methods that interleave symbolic reasoning with grounded visual processing achieve more stable improvements. These results highlight a persistent bottleneck in multimodal reasoning and suggest new directions for building models capable of robust cross-modal verification.

AAAI Conference 2026 Conference Paper

DiMA: Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation

  • Fan Zhang
  • Jinpeng Chen
  • Tao Wang
  • Huan Li
  • Senzhang Wang
  • Feifei Kou
  • Ye Ji
  • Kaimin Wei

Out-of-Town (OOT) recommendation aims to provide personalized suggestions for users in unfamiliar cities. However, OOT recommendation faces two fundamental challenges: the difficulty of reasoning across modalities, as preference signals in disparate formats such as images and text are hard to compare; and the preference deviation problem, since a user's resident and tourist preferences often diverge, rendering simple preference transfer ineffective. To address these challenges, we propose Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation (DiMA), a framework for re-ranking Points of Interest (POIs). To tackle the multimodal challenge, DiMA first leverages Multimodal Large Language Models and Large Language Models (LLMs) to transform heterogeneous POI data into unified semantic tags, enabling both cross-modal reasoning and efficient downstream processing. To address preference deviation, a "teacher" LLM executes a custom Chain-of-Thought (CoT) process to disentangle resident and tourist preferences from multi-city histories for re-ranking. Finally, a lightweight student model learns this CoT reasoning via Supervised Fine-Tuning and is then refined with Direct Preference Optimization to align with true user choices, with the potential to surpass the teacher. Extensive experiments on a real-world dataset demonstrate that DiMA significantly enhances the performance of baseline models in the OOT recommendation re-ranking task.

AAAI Conference 2026 Conference Paper

Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing

  • Li Yuan
  • Qingfei Huang
  • Bingshan Zhu
  • Yi Cai
  • Qingbao Huang
  • Changmeng Zheng
  • Zikun Deng
  • Tao Wang

Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final answer correctness, neglecting the quality of intermediate reasoning and robustness to visually rephrased inputs. To address this limitation, we introduce MMQAKE, the first benchmark for multimodal multihop question answering with knowledge editing. MMQAKE evaluates: (1) a model’s ability to reason over 2–5-hop factual chains that span both text and images, including performance at each intermediate step; (2) robustness to visually rephrased inputs in multihop questions. Our evaluation shows that current MKE methods often struggle to consistently update and reason over multimodal reasoning chains following knowledge edits. To overcome these challenges, we propose Hybrid-DMKG, a hybrid reasoning framework built on a dynamic multimodal knowledge graph (DMKG) to enable accurate multihop reasoning over updated multimodal knowledge. Hybrid-DMKG first uses a large language model to decompose multimodal multihop questions into sequential sub-questions, then applies a multimodal retrieval model to locate updated facts by jointly encoding each sub-question with candidate entities and their associated images. For answer inference, a hybrid reasoning module operates over the DMKG via two parallel paths: (1) relation-linking prediction; (2) RAG Reasoning with large vision-language models. A background-reflective decision module then aggregates evidence from both paths to select the most credible answer. Experimental results on MMQAKE show that Hybrid-DMKG significantly outperforms existing MKE approaches, achieving higher accuracy and improved robustness to knowledge updates.

JBHI Journal 2026 Journal Article

USRMamba: Adaptive Routing-Guided State Space Model for Ultrasound Super-Resolution

  • Tao Wang
  • Zihan Zhou
  • Chufeng Jin
  • Tianyi Liu
  • Baike Shi
  • Guangquan Zhou
  • Rongjun Ge
  • Jean-Louis Coatrieux

In ultrasound (US) imaging, resolution degradation caused by the acoustic diffraction limit and transducer array density can significantly reduce image quality, which has negative impacts on clinical diagnosis. Super-resolution (SR) reconstruction is a more flexible and cost-effective measure compared to system upgrades. However, the complexity and diversity of tissue acoustic properties make it difficult to establish a unified model for US image SR reconstruction. In this context, this paper pioneers a Mamba-based single US image SR method, referred to as USRMamba. First, a simple and efficient Enhanced Transform Combine Module (ETCM) is designed for shallow feature extraction, which achieves multi-scale decoupling through Laplacian sharpening and wavelet transform to counter the interference of high-frequency information loss and speckle noise in US images. More importantly, an Adaptive Top-k Prompt Module (ATPM) is proposed, whose core is to generate semantic prompts through an adaptive routing-guided strategy to suppress the interference of fuzzy region labels caused by attenuation on detail reconstruction. In addition, a Frequency Channel Attention Module (FCAM) is developed, forming a "frequency-spatial domain reconstruction" modeling strategy in parallel with ATPM, further improving the fidelity of US image SR reconstruction. Qualitative and quantitative experiments demonstrate that USRMamba exhibits superior performance on several US datasets. In particular, at scale factor ×2, the proposed method achieves an average PSNR 1.31 dB higher than state-of-the-art (SOTA) methods.

IROS Conference 2025 Conference Paper

A Crab-Inspired Soft Gripper with Single-Finger Dexterous Grasping Capabilities

  • Yunce Zhang
  • Haobin Lv
  • Yixiang Liu
  • Zhe Min
  • Shizhao Zhou
  • Tao Wang
  • Shiqiang Zhu
  • Rui Song 0002

Soft grippers conform to the shape and surface properties of the objects to be grasped, effectively avoiding damage to soft and fragile items. Despite the variety of existing soft gripper designs, their structures lack sufficient flexibility for effectively grasping slender objects or operating in narrow spaces. To address these challenges, we propose a soft gripper with single-finger grasping capabilities, inspired by the structure of crab claws. The structural design and the fabrication method of the gripper are introduced, and the analytical bending model is derived. Experiments are conducted under typical operating conditions to validate the model, and the results indicate that the measured data are in good accordance with the predicted responses. Furthermore, a series of grasping experiments are carried out to test the single-finger grasping capabilities of the proposed soft gripper. The results indicate that the proposed soft gripper can efficiently and stably grasp slender or irregular objects with a single finger. In particular, it demonstrates suitability for operations in narrow spaces and shows potential for handling complex tasks. This innovative design effectively reduces the complexity of the system, while exhibiting promising capabilities in grasping slender or irregular objects and operating within restricted spaces.

TIST Journal 2025 Journal Article

A GPT-assisted Multi-Granularity Contrastive Learning approach for Knowledge Graph Entity Typing

  • Hongbin Zhang
  • Tao Wang
  • Zhuowei Wang
  • Nankai Lin
  • Chong Chen
  • Lianglun Cheng

Knowledge graph entity typing (KGET) is an efficient way to infer possible missing types for entities, and has become a key instrument for enhancing the construction of knowledge graphs (KGs). Existing models for KGET have mainly focused on a single granularity of information, such as distinct entity information, while other granularities, including entity-to-type-cluster, same-cluster, and interaction information, have not been fully explored, resulting in incorrect inferred types in KGs. To address this, we propose a GPT-assisted Multi-Granularity Contrastive Learning (GMGCL) approach that acquires entity-to-type-cluster, entity, type-cluster, and relation information through GPT-assisted entity-to-type-cluster clustering and entity-based, cluster-based, and relation-based contrastive learning, respectively. Our approach is evaluated on the FB15kET and YAGO43kET datasets, outperforming other baselines and obtaining at least a 1.35% average improvement in MRR.

NeurIPS Conference 2025 Conference Paper

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

  • Lingfeng Wang
  • Hualing Lin
  • Senda Chen
  • Tao Wang
  • Changxu Cheng
  • Yangyang Zhong
  • Dong Zheng
  • Wuyue Zhao

While humans effortlessly draw visual objects and shapes by adaptively allocating attention based on their complexity, existing multimodal large language models (MLLMs) remain constrained by rigid token representations. Bridging this gap, we propose ALTo, an adaptive-length tokenizer for autoregressive mask generation. To achieve this, a novel token length predictor is designed, along with a length regularization term and a differentiable token chunking strategy. We further build ALToLLM, which seamlessly integrates ALTo into an MLLM. Preferences over the trade-off between mask quality and efficiency are implemented via group relative policy optimization (GRPO). Experiments demonstrate that ALToLLM achieves state-of-the-art performance with adaptive token cost on popular segmentation benchmarks. Code and models will be released.

NeurIPS Conference 2025 Conference Paper

Association-Focused Path Aggregation for Graph Fraud Detection

  • Tian Qiu
  • Wenda Li
  • Zunlei Feng
  • Jie Lei
  • Tao Wang
  • Yi Gao
  • Mingli Song
  • Yang Gao

Fraudulent activities have caused substantial negative social impacts and are exhibiting emerging characteristics such as intelligence and industrialization, posing challenges of high-order interactions, intricate dependencies, and the sparse yet concealed nature of fraudulent entities. Existing graph fraud detectors are limited by their narrow "receptive fields", as they focus only on the relations between an entity and its neighbors while neglecting longer-range structural associations hidden between entities. To address this issue, we propose a novel fraud detector based on Graph Path Aggregation (GPA). It operates through variable-length path sampling, semantic-associated path encoding, path interaction and aggregation, and aggregation-enhanced fraud detection. To further facilitate interpretable association analysis, we synthesize G-Internet, the first benchmark dataset in the field of internet fraud detection. Extensive experiments across datasets in multiple fraud scenarios demonstrate that the proposed GPA outperforms mainstream fraud detectors by up to +15% in Average Precision (AP). Additionally, GPA exhibits enhanced robustness to noisy labels and provides excellent interpretability by uncovering implicit fraudulent patterns across broader contexts. Code is available at https://github.com/horrible-dong/GPA.

IJCAI Conference 2025 Conference Paper

Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction

  • Li Yuan
  • Yi Cai
  • Xudong Shen
  • Qing Li
  • Qingbao Huang
  • Zikun Deng
  • Tao Wang

Multimodal Information Extraction (MIE) has gained attention for extracting structured information from multimedia sources. Traditional methods tackle MIE tasks separately, missing opportunities to share knowledge across tasks. Recent approaches unify these tasks into a generation problem using instruction-based T5 models with visual adaptors, optimized through full-parameter fine-tuning. However, this method is computationally intensive, and multi-task fine-tuning often faces gradient conflicts, limiting performance. To address these challenges, we propose collaborative multi-LoRA experts with achievement-based multi-task loss (C-LoRAE) for MIE tasks. C-LoRAE extends the low-rank adaptation (LoRA) method by incorporating a universal expert to learn shared multimodal knowledge from cross-MIE tasks and task-specific experts to learn specialized instructional task features. This configuration enhances the model’s generalization ability across multiple tasks while maintaining the independence of various instruction tasks and mitigating gradient conflicts. Additionally, we propose an achievement-based multi-task loss to balance training progress across tasks, addressing the imbalance caused by varying numbers of training samples in MIE tasks. Experimental results on seven benchmark datasets across three key MIE tasks demonstrate that C-LoRAE achieves superior overall performance compared to traditional fine-tuning methods and LoRA methods while utilizing a comparable number of training parameters to LoRA.

JBHI Journal 2025 Journal Article

CSAI: Conditional Self-Attention Imputation for Healthcare Time-series

  • Linglong Qian
  • Joseph Arul Raj
  • Hugh Logan-Ellis
  • Ao Zhang
  • Yuezhou Zhang
  • Tao Wang
  • Richard JB Dobson
  • Zina Ibrahim

We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address imputation challenges in multivariate time series derived from hospital electronic health records (EHRs). CSAI introduces key novelties specific to EHR data: a) attention-based hidden state initialisation to capture both long- and short-range temporal dependencies, b) domain-informed temporal decay to mimic clinical recording patterns, and c) a non-uniform masking strategy that models non-random missingness. Comprehensive evaluation across four EHR benchmark datasets demonstrates CSAI's effectiveness compared to state-of-the-art architectures in data restoration and downstream tasks. CSAI is integrated into PyPOTS, an open-source Python toolbox for partially observed time series. This work significantly advances the state of neural network imputation applied to EHRs by more closely aligning algorithmic imputation with clinical realities.

JBHI Journal 2025 Journal Article

Design of a Multi-Parameter Fusion Sensor and System for Respiratory Monitoring of Mechanically Ventilated Patients in the ICU

  • Shuai Ren
  • Xiaohan Wang
  • Maolin Cai
  • Yan Shi
  • Tao Wang
  • Zujin Luo

In order to achieve precise respiratory therapy for mechanically ventilated patients, real-time monitoring of the state parameters of inhaled and exhaled gases is required. These parameters are primarily measured by ventilators, with limitations such as insufficient monitoring parameters, circuit leaks, and constraints imposed by distance and obstacles. This paper designs a low-power wireless sensor for multi-parameter monitoring near the patient, which can be used continuously for approximately 60 days. Based on this sensor, an intelligent respiratory monitoring system with a distributed architecture is proposed to achieve intelligent patient-ventilator asynchrony (PVA) perception. Experimental results show that the system can stably and accurately collect and transmit data, with measurement errors for pressure, flow, temperature, humidity, and CO₂ concentration being ±1.3%, ±2.1%, ±0.6°, ±1% RH, and ±0.3 mmHg respectively. The proposed sensor and system have the potential to enhance the efficiency and intelligence of medical care significantly.

IROS Conference 2025 Conference Paper

Enhancing the Flexibility of a Quadruped Robot with a 2-DOF Active Spine Using Nonlinear Model Predictive Control

  • Zeyi Yang
  • Zhiyong Xu
  • Haoming Rong
  • Shaolin Mo
  • Yuying Chen
  • Zujian Chen
  • Tao Wang
  • Hui Cheng

For quadrupeds, a flexible spine allows them to traverse space and make quick turns. From the perspective of mechanical design in quadruped robots, an active spine with 2 degrees of freedom (2-DOF) can achieve dynamic posture adjustment similar to biological organisms, allowing for pitch and yaw control. In this work, we present a novel approach to enhance the flexibility of a quadruped robot, Yatsen Lion II, by incorporating a 2-DOF active spine, which is mechanically designed as a linkage-driven parallelogram mechanism. To optimize its motion, we utilize nonlinear model predictive control (NMPC), which combines centroidal dynamics with full kinematics. By incorporating the two extra DOFs of the spinal joint into the generalized coordinates and velocities, we represent the robot as a hybrid dynamic system, capturing the intricate interplay between the legs and spine. Centroidal dynamics act as a crucial bridge between joint movements and the robot's overall momentum, enabling the controller to synchronize the quadruped's movements with dynamic spinal adjustments and adaptive gait patterns. We validate our approach through both simulation and real-world experiments. We compare the spined quadruped robot to its rigid-spine counterpart across key locomotion metrics, including in-place turning, straight-line speed, and turning radius. The results indicate that the spined quadrupedal robot outperforms its rigid counterpart by up to 26%, highlighting its flexibility.

NeurIPS Conference 2025 Conference Paper

Foundations of Top-$k$ Decoding for Language Models

  • Georgy Noarov
  • Soham Mallick
  • Tao Wang
  • Sunay Joshi
  • Yan Sun
  • Yangxinyu Xie
  • Mengxin Yu
  • Edgar Dobriban

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We introduce *Bregman decoders* obtained by minimizing a separable Bregman divergence (for both the *primal* and *dual* cases) with a sparsity-inducing $\ell_0$-regularization; in particular, these decoders are *adaptive* in the sense that the sparsity parameter $k$ is chosen depending on the underlying token distribution. Despite the combinatorial nature of the sparse Bregman objective, we show how to optimize it efficiently for a large class of divergences. We prove that (i) the optimal decoding strategies are greedy, and further that (ii) the objective is discretely convex in $k$, such that the optimal $k$ can be identified in logarithmic time. We note that standard top-$k$ decoding arises as a special case for the KL divergence, and construct new decoding strategies with substantially different behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
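The standard top-k procedure that the paper generalizes can be sketched in a few lines; the function name and toy logits below are illustrative, not taken from the paper:

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Standard top-k decoding: keep the k largest next-token
    probabilities, renormalize them to sum to one, then sample."""
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    top = np.argsort(probs)[-k:]            # indices of the k largest probs
    kept = probs[top]
    kept /= kept.sum()                      # renormalize to unity
    return rng.choice(top, p=kept)

rng = np.random.default_rng(0)
logits = np.array([0.1, 2.0, 0.5])
token = top_k_sample(logits, k=2, rng=rng)  # one of the two largest: 1 or 2
```

With k = 1 this reduces to greedy decoding; the adaptive Bregman decoders in the paper instead choose k per token from the underlying distribution.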

NeurIPS Conference 2025 Conference Paper

Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS

  • Tao Wang
  • Mengyu Li
  • Geduo Zeng
  • Cheng Meng
  • Qiong Zhang

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for radiance field rendering, but it typically requires millions of redundant Gaussian primitives, overwhelming memory and rendering budgets. Existing compaction approaches address this by pruning Gaussians based on heuristic importance scores, without global fidelity guarantees. To bridge this gap, we propose a novel optimal transport perspective that casts 3DGS compaction as global Gaussian mixture reduction. Specifically, we first minimize the composite transport divergence over a KD-tree partition to produce a compact geometric representation, and then decouple appearance from geometry by fine-tuning color and opacity attributes with far fewer Gaussian primitives. Experiments on benchmark datasets show that our method (i) yields negligible loss in rendering quality (PSNR, SSIM, LPIPS) compared to vanilla 3DGS with only 10% of the Gaussians; and (ii) consistently outperforms state-of-the-art 3DGS compaction techniques. Notably, our method is applicable to any stage of vanilla or accelerated 3DGS pipelines, providing an efficient and agnostic pathway to lightweight neural rendering.
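The abstract frames compaction as global Gaussian mixture reduction. A standard mixture-reduction primitive, shown here as a hedged sketch (the paper's transport-divergence update over a KD-tree partition may differ), is moment-preserving merging of the Gaussians assigned to one partition cell:

```python
import numpy as np

def merge_gaussians(weights, means, covs):
    """Merge a group of weighted Gaussians into a single component that
    matches the group's total weight, mean, and covariance (moment
    preservation). A generic mixture-reduction step, not the paper's
    exact optimal-transport update."""
    w = np.sum(weights)
    mu = np.einsum('i,ij->j', weights, means) / w
    cov = np.zeros_like(covs[0])
    for wi, mi, ci in zip(weights, means, covs):
        d = (mi - mu)[:, None]
        cov += wi * (ci + d @ d.T)   # within- plus between-component spread
    return w, mu, cov / w
```

Applied per KD-tree cell, such a merge replaces many primitives with one while preserving the cell's first two moments; appearance attributes would then be fine-tuned separately, as the abstract describes.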

JBHI Journal 2025 Journal Article

High-Frequency SSVEP-BCI With Row-Column Dual-Frequency Encoding and Decoding Strategy for Reduced Training Data

  • Yufeng Ke
  • Xiaohe Chen
  • Wei Xu
  • Tao Wang
  • Shuaishuai Shen
  • Dong Ming

Steady-state visual evoked potentials (SSVEP)-based brain-computer interfaces (BCIs) have the potential to be utilized in various fields due to their high accuracies and information transfer rates (ITR). High-frequency (HF) visual stimuli have shown promise in reducing visual fatigue and enhancing user comfort. However, these HF-SSVEP-BCIs often face limitations in the number of commands and typically require extensive individual training data to achieve high performance. In this study, we proposed a row-column dual-frequency encoding and decoding method using HF stimulation to develop a comfortable BCI system that supports multiple commands and reduces training costs. We arranged 20 targets in a matrix of five rows and four columns, with each target modulated by left-and-right field stimulation using two frequency-phase combinations. Targets in each row or column share a unique frequency-phase combination, allowing EEG data from the same row or column to be used collectively to train a row/column index decoding model for target identification. To evaluate the performance of our method, we constructed a 20-target asynchronous robotic arm control system with the adaptive window method. With only four training trials per target, the online system achieved an ITR of 105.14 ± 14.15 bits/min, a true positive rate of 98.18 ± 2.87%, a false positive rate of 7.39 ± 6.73%, and a classification accuracy of 91.88 ± 5.75%, with an average data length of 925.70 ± 45.44 ms. These results indicate that the proposed protocol can deliver accurate and rapid command outputs for a comfortable SSVEP-based BCI with minimal training data and fewer frequencies.
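ITR figures like the one reported are conventionally computed with the Wolpaw formula from the number of targets N, the classification accuracy P, and the average selection time T. A sketch under that standard definition (the paper's exact selection-time accounting, e.g. gaze-shift intervals, is not specified here, so the numbers below are not meant to reproduce its 105.14 bits/min):

```python
import math

def wolpaw_itr(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bits/min: bits per selection
    times selections per minute."""
    n, p = n_targets, accuracy
    if p >= 1.0:
        bits = math.log2(n)
    else:
        bits = (math.log2(n) + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits * 60.0 / selection_time_s
```

For example, at perfect accuracy a 20-target system carries log2(20) ≈ 4.32 bits per selection; lower accuracy and longer selection windows both reduce the ITR.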

JBHI Journal 2025 Journal Article

How Deep is Your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

  • Linglong Qian
  • Hugh Logan Ellis
  • Tao Wang
  • Jun Wang
  • Robin Mitra
  • Richard Dobson
  • Zina Ibrahim

We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how the interplay between architectural and framework design decisions gives rise to higher-level properties of a given deep imputer model and distinct biases towards complex data characteristics. Our investigation reveals the varying capabilities of deep imputers in capturing complex spatio-temporal dependencies within EHRs, and that the effectiveness of the model depends on how its combined biases align with the characteristics of the medical time series. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments further reveal variations of up to 20% in imputation performance based on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.

IROS Conference 2025 Conference Paper

IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter

  • Xiaohong Liu
  • Xulong Zhao
  • Gang Liu
  • Zili Wu
  • Tao Wang
  • Lei Meng
  • Yuhan Wang

3D Multi-Object Tracking (MOT) provides the trajectories of surrounding objects, assisting robots or vehicles in smarter path planning and obstacle avoidance. Existing 3D MOT methods based on the Tracking-by-Detection framework typically use a single motion model to track an object throughout its entire tracking process. However, objects may change their motion patterns due to variations in the surrounding environment. In this paper, we introduce the Interacting Multiple Model filter in IMM-MOT, which accurately fits the complex motion patterns of individual objects, overcoming the limitation of single-model tracking in existing approaches. In addition, we incorporate a Damping Window mechanism into the trajectory lifecycle management, leveraging the continuous association status of trajectories to control their creation and termination, reducing the occurrence of overlooked low-confidence true targets. Furthermore, we propose the Distance-Based Score Enhancement module, which enhances the differentiation between false positives and true positives by adjusting detection scores, thereby improving the effectiveness of the Score Filter. On the NuScenes Val dataset, IMM-MOT outperforms most other single-modal models using 3D point clouds, achieving an AMOTA of 73.8%. Our project is available at https://github.com/Ap01lo/IMM-MOT.
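At the core of the Interacting Multiple Model filter the paper builds on is a per-step update of model probabilities: predicted probabilities are mixed through a Markov transition matrix and then reweighted by each motion model's measurement likelihood. A minimal sketch of that update (variable names are illustrative; the full IMM also mixes the per-model state estimates, which is omitted here):

```python
import numpy as np

def imm_model_probs(mu, trans, likelihoods):
    """One IMM model-probability update.
    mu: current model probabilities, shape (m,)
    trans: Markov model-transition matrix, trans[i, j] = P(j | i)
    likelihoods: measurement likelihood under each model, shape (m,)"""
    c = trans.T @ mu                 # mixing / prediction: c_j = sum_i p_ij mu_i
    mu_new = likelihoods * c         # Bayes reweighting per model
    return mu_new / mu_new.sum()     # renormalize

# Two motion models, equal prior; model 0 explains the measurement twice as well.
probs = imm_model_probs(np.array([0.5, 0.5]), np.eye(2), np.array([2.0, 1.0]))
```

This is how the tracker shifts weight toward whichever motion model (e.g. constant velocity vs. turning) currently explains an object's measurements best.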

JBHI Journal 2025 Journal Article

Improving Patient-Ventilator Synchrony During Pressure Support Ventilation Based on Reinforcement Learning Algorithm

  • Liming Hao
  • Xiaohan Wang
  • Shuai Ren
  • Yan Shi
  • Maolin Cai
  • Tao Wang
  • Zujin Luo

Mechanical ventilation is an effective treatment for critically ill patients and those with pulmonary diseases. However, patient-ventilator asynchrony (PVA) remains a significant challenge, potentially leading to high mortality. Improving patient-ventilator synchrony poses a complex decision-making problem in clinical practice. Traditional methods rely heavily on clinicians' experience, often resulting in inefficiencies, delayed ventilator adjustments, and resource shortages. This paper proposes a novel approach using a deep reinforcement learning (RL) algorithm based on deep Q-learning (DQN) to enhance patient-ventilator synchrony during pressure support ventilation. The action space and reward function are established from clinical experience, and a pneumatic model of the mechanical ventilation system is constructed to simulate various patient conditions and types of PVAs. Clinical data are used to evaluate the RL algorithm qualitatively and quantitatively. The RL-optimized ventilation strategy reduces the proportion of breaths containing PVAs from 37.52% to 7.08%, demonstrating its effectiveness in assisting clinical decision-making, improving synchrony, and enabling intelligent ventilator control, bedside monitoring, and automatic weaning.

ICML Conference 2025 Conference Paper

Improving Value Estimation Critically Enhances Vanilla Policy Gradient

  • Tao Wang
  • Ruipeng Zhang
  • Sicun Gao

Modern policy gradient algorithms, such as TRPO and PPO, outperform vanilla policy gradient in many RL tasks. Questioning the common belief that enforcing approximate trust regions leads to steady policy improvement in practice, we show that the more critical factor is the enhanced value estimation accuracy from more value update steps in each iteration. To demonstrate, we show that by simply increasing the number of value update steps per iteration, vanilla policy gradient itself can achieve performance comparable to or better than PPO in all the standard continuous control benchmark environments. Importantly, this simple change to vanilla policy gradient is significantly more robust to hyperparameter choices, opening up the possibility that RL algorithms may still become more effective and easier to use.
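The paper's central intervention, taking more value-function update steps per policy iteration, can be illustrated with a minimal linear value fit; everything below (linear features, the learning rate, the synthetic data) is an illustrative assumption, not the paper's neural-network setup:

```python
import numpy as np

def fit_value(features, returns, n_value_steps, lr=0.1):
    """The knob the paper studies: number of value-function gradient steps
    per iteration. More steps give a more accurate baseline V(s), and hence
    lower-variance advantages R - V(s) for the vanilla policy-gradient update."""
    w = np.zeros(features.shape[1])
    for _ in range(n_value_steps):
        pred = features @ w
        grad = features.T @ (pred - returns) / len(returns)  # grad of mean squared error / 2
        w -= lr * grad
    return w

# Per-iteration structure suggested by the abstract:
#   1. collect rollouts, compute Monte Carlo returns
#   2. run many value steps:      w = fit_value(feats, returns, n_value_steps)
#   3. one vanilla PG step using: advantages = returns - feats @ w
```

On a toy regression this already shows the effect: 200 value steps leave a far smaller value-estimation error than a single step per iteration.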

JBHI Journal 2025 Journal Article

Integrative Graph-Based Framework for Predicting circRNA Drug Resistance Using Disease Contextualization and Deep Learning

  • Yongtian Wang
  • Wenkai Shen
  • Yewei Shen
  • Shang Feng
  • Tao Wang
  • Xuequn Shang
  • Jiajie Peng

Circular RNAs (circRNAs) play a crucial role in gene regulation and have been implicated in the development of drug resistance in cancer, representing a significant challenge in oncological therapeutics. Despite advancements in computational models predicting RNA-drug interactions, existing frameworks often overlook the complex interplay between circRNAs, drug mechanisms, and disease contexts. This study aims to bridge this gap by introducing a novel computational model, circRDRP, that enhances prediction accuracy by integrating disease-specific contexts into the analysis of circRNA-drug interactions. It employs a hybrid graph neural network that combines features from Graph Attention Networks (GAT) and Graph Convolutional Networks (GCN) in a two-layer structure, with further enhancement through convolutional neural networks. This approach allows for sophisticated feature extraction from integrated networks of circRNAs, drugs, and diseases. Our results demonstrate that the circRDRP model outperforms existing models in predicting drug resistance, showing significant improvements in accuracy, precision, and recall. Specifically, the model shows robust predictive capability in case studies involving major anticancer drugs such as Cisplatin and Methotrexate, indicating its potential utility in precision medicine. In conclusion, circRDRP offers a powerful tool for understanding and predicting drug resistance mediated by circRNAs, with implications for designing more effective cancer therapies.

ICML Conference 2025 Conference Paper

Open-Det: An Efficient Learning Framework for Open-Ended Detection

  • Guiping Cao
  • Tao Wang
  • Wenjian Huang 0001
  • Xiangyuan Lan
  • Jianguo Zhang 0001
  • Dongmei Jiang

Open-Ended object Detection (OED) is a novel and challenging task that detects objects and generates their category names in a free-form manner, without requiring additional vocabularies during inference. However, existing OED models, such as GenerateU, require large-scale datasets for training, suffer from slow convergence, and exhibit limited performance. To address these issues, we present a novel and efficient Open-Det framework, consisting of four collaborative parts. Specifically, Open-Det accelerates model training in both the bounding box and object name generation process by reconstructing the Object Detector and the Object Name Generator. To bridge the semantic gap between Vision and Language modalities, we propose a Vision-Language Aligner with V-to-L and L-to-V alignment mechanisms, incorporating the Prompts Distiller to transfer knowledge from the VLM into VL-prompts, enabling accurate object name generation for the LLM. In addition, we design a Masked Alignment Loss to eliminate contradictory supervision and introduce a Joint Loss to enhance classification, resulting in more efficient training. Compared to GenerateU, Open-Det, using only 1.5% of the training data (0.077M vs. 5.077M), 20.8% of the training epochs (31 vs. 149), and fewer GPU resources (4 V100 vs. 16 A100), achieves even higher performance (+1.0% in APr). The source codes are available at: https://github.com/Med-Process/Open-Det.

JBHI Journal 2025 Journal Article

Resting-State Electroencephalographic Signatures Predict Treatment Efficacy of tACS for Refractory Auditory Hallucinations in Schizophrenic Patients

  • Xiaojuan Wang
  • Ruxin Hu
  • Tao Wang
  • Yuan Chang
  • Xiaoya Liu
  • Meijuan Li
  • Ying Gao
  • Shuang Liu

Transcranial alternating current stimulation (tACS) has been reported to treat refractory auditory hallucinations in schizophrenia. However, as with all treatments currently employed in clinical practice, tACS does not demonstrate uniform efficacy across all patients. This study aims to find biomarkers predicting individual responses to tACS, guiding treatment decisions and preventing healthcare resource wastage. We divided 17 schizophrenic patients with refractory auditory hallucinations into responsive (RE) and non-responsive (NR) groups based on their auditory hallucination symptom reduction rates after one month of tACS treatment. The pre-treatment resting-state electroencephalogram (rsEEG) was recorded, from which we computed absolute power spectral density (PSD) and Hjorth parameters (HPs: Hjorth activity (HA), Hjorth mobility (HM), and Hjorth complexity (HC)) from different frequency bands to characterize the brain oscillations. The results demonstrated that statistically significant differences were localized within the high gamma frequency bands of the right brain hemisphere. We then input the significant dissociable features into popular machine learning algorithms; the Cascade Forward Neural Network achieved the best recognition accuracy of 93.87%. These findings preliminarily imply that high gamma oscillations in the right brain hemisphere may be the main factor underlying different responses to tACS treatment, and that incorporating rsEEG signatures could improve personalized decisions for integrating tACS into clinical treatment.
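The Hjorth parameters mentioned above have standard closed-form definitions; a minimal sketch using those standard formulas (not the authors' pipeline):

```python
import numpy as np

# Hjorth parameters of a 1-D signal:
#   activity   = var(x)
#   mobility   = sqrt(var(x') / var(x))
#   complexity = mobility(x') / mobility(x)
def hjorth(x):
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

t = np.linspace(0, 1, 1000, endpoint=False)
a, m, c = hjorth(np.sin(2 * np.pi * 5 * t))  # pure 5 Hz sine
```

For a pure sinusoid the complexity is close to 1, since the derivative of a sine is a sinusoid of the same frequency; broadband signals yield larger values.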

NeurIPS Conference 2025 Conference Paper

SALS: Sparse Attention in Latent Space for KV Cache Compression

  • Junlin Mu
  • Hantao Huang
  • Jihang Zhang
  • Minghui Yu
  • Tao Wang
  • Yidong Li

Large Language Models (LLMs) capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value (KV) cache size and high memory bandwidth requirements. Previous research has demonstrated that the KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding (RoPE) mechanism in modern LLMs, naive low-rank compression suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space (SALS) framework. SALS projects the KV cache into a compact latent space via low-rank projection, and performs sparse token selection using RoPE-free query-key interactions in this space. By reconstructing only a small subset of important tokens, it avoids the overhead of full KV cache reconstruction. We comprehensively evaluate SALS on various tasks using two large-scale models, LLaMA2-7b-chat and Mistral-7b, and additionally verify its scalability on the RULER-128k benchmark with LLaMA3.1-8B-Instruct. Experimental results demonstrate that SALS achieves SOTA performance while maintaining competitive accuracy. Under different settings, SALS achieves 6.4-fold KV cache compression and 5.7-fold speed-up in the attention operator compared to FlashAttention2 on the 4K sequence. For end-to-end throughput, we achieve 1.4-fold and 4.5-fold improvements compared to GPT-fast on 4K and 32K sequences, respectively. The source code will be publicly available in the future.
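The latent-space mechanism the abstract describes can be sketched as follows; the projection here is an arbitrary orthonormal matrix standing in for whatever projection SALS actually uses, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n, k = 64, 16, 512, 32  # head dim, latent rank, cached tokens, tokens kept

# Hypothetical low-rank projection with orthonormal columns (d x r).
P = np.linalg.qr(rng.normal(size=(d, r)))[0]

K = rng.normal(size=(n, d))      # full key cache (never stored in this scheme)
K_lat = K @ P                    # compact latent cache: n x r, a d/r = 4x compression

q = rng.normal(size=(d,))
scores = K_lat @ (P.T @ q)       # RoPE-free query-key scoring done in latent space
top = np.argsort(scores)[-k:]    # sparse selection of the most relevant tokens

K_rec = K_lat[top] @ P.T         # reconstruct only k rows, not the full cache
```

The point of the sketch is the asymmetry: scoring touches only the r-dimensional latent cache, and full d-dimensional keys are rebuilt for just the k selected tokens.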

IROS Conference 2025 Conference Paper

Seamless Transition Control in Spring-Legged Quadrotors: A Hybrid Dynamics Perspective with Guaranteed Feasibility

  • Hongli Li
  • Botao Zhang
  • Rui Mao
  • Tao Wang
  • Hui Cheng

Legged aerial-terrestrial robots have garnered significant research attention in recent years due to their enhanced environmental adaptability through combined aerial and terrestrial locomotion. However, existing passive spring-legged aerial robots exhibit limited motion versatility, demonstrating single stance gait during ground impacts, which constrains their task adaptability and creates substantial challenges in hybrid trajectory optimization and switching control. To address these difficulties, this work presents a systematic solution to achieve diverse hybrid locomotion. We innovatively establish the differential flatness property for spring-legged quadrotors in both aerial and terrestrial domains, and propose a unified hybrid trajectory optimization framework that generates smooth, agile, and dynamically feasible multi-modal trajectories incorporating diverse stance gait patterns. Furthermore, a hybrid nonlinear model predictive controller with a trajectory extension strategy is developed to enhance hybrid tracking precision and mode transition execution. Compared to existing methods, we achieve a 27% reduction in tracking error during hybrid locomotion while maintaining high-precision foot placement. The source code will be released to benefit the community.

JBHI Journal 2024 Journal Article

An Accurate Non-Contact Photoplethysmography via Active Cancellation of Reflective Interference

  • Yonggang Tong
  • Zhipei Huang
  • Feng Qiu
  • Tao Wang
  • Yiquan Wang
  • Fei Qin
  • Ming Yin

Imaging Photoplethysmography (IPPG) is an emerging and efficient optical method for non-contact measurement of pulse waves using an image sensor. While the contactless way brings convenience, the inevitable distance between the sensor and the subject results in massive specular reflection interference on the skin surface, which leads to a low Signal to Interference plus Noise Ratio (SINR) of IPPG. To ease this challenge, this work proposes a novel modulation illumination approach to measure the accurate arterial pulse wave via surface reflection interference isolation from IPPG. Based on the proposed skin reflection model, a specific modulation illumination is designed to separate the surface reflections and obtain the subcutaneous diffuse reflections containing the pulse wave information. Compared with the results under ambient illumination and constant supplemental illumination, the SINR of the proposed method is improved by 4.56 and 3.74 dB, respectively.

IROS Conference 2024 Conference Paper

An Online Automatic Calibration Method for Infrastructure-Based LiDAR-Camera via Cross-modal Object Matching

  • Tao Wang
  • Yuesheng He
  • Hanyang Zhuang
  • Ming Yang 0002

In indoor environments where the Global Navigation Satellite System (GNSS) is not available, the infrastructure-based LiDAR-camera joint array can provide high-precision localization for mobile robots, such as in Autonomous Valet Parking (AVP). The primary challenge in employing the infrastructure-based LiDAR-camera joint array is the extrinsic calibration between the LiDAR and the camera. Moreover, to handle interference deviation caused by vibrations or inadequate mounting stiffness during operation, the calibration's extrinsic parameters must be automatically updated online, placing higher demands on infrastructure-based LiDAR-camera extrinsic calibration. This paper proposes an infrastructure LiDAR-camera online automatic calibration method based on prior knowledge of cross-modal target registration. The method requires no manual targets or initial pose guesses to achieve extrinsic calibration. The object-prior model, based on a lightweight object detection algorithm, can rapidly detect scenes favorable for extrinsic calibration in sub-images of camera images. This creates favorable conditions for cross-modal registration and LiDAR-camera pose optimization. Additionally, because a lightweight algorithm is used, the process does not compromise efficiency or consume excessive computational resources. Experimental results demonstrate that the proposed calibration method is suitable for calibrating infrastructure-based LiDAR-camera arrays, with comparable accuracy and the ability to perform online calibration. Comparative experiments also show that the object-prior model can indeed select better scenes for LiDAR-camera extrinsic calibration, improving the accuracy and stability of extrinsic calibration to some extent.

ICML Conference 2024 Conference Paper

Controlled Decoding from Language Models

  • Sidharth Mudgal
  • Jong Lee
  • Harish Ganapathy
  • YaGuang Li
  • Tao Wang
  • Yanping Huang
  • Zhifeng Chen
  • Heng-Tze Cheng

KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We show that the benefits of applying CD transfer to an unseen base model with no further tuning as well. Finally, we show that CD can be applied in a blockwise decoding fashion at inference-time, essentially bridging the gap between the popular best-of-$K$ strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.
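The tokenwise control described above, a frozen base model steered by a separate prefix scorer at decode time, can be sketched schematically; the value vector and the additive blending rule here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Schematic tokenwise controlled decoding: sample from
# softmax(base_logits + alpha * V), where V[t] is a hypothetical prefix
# scorer's value for appending token t to the current prefix.
def cd_step(base_logits, values, alpha=1.0):
    z = base_logits + alpha * values
    z = z - z.max()                  # numerical stability
    p = np.exp(z)
    return p / p.sum()

base = np.array([2.0, 1.0, 0.0])
vals = np.array([0.0, 0.0, 5.0])     # scorer strongly prefers token 2
p0 = cd_step(base, vals, alpha=0.0)  # alpha=0 recovers the frozen base model
p1 = cd_step(base, vals, alpha=1.0)  # alpha>0 shifts mass toward high-value tokens
```

Because control lives entirely in the added term, scorers for multiple rewards can be summed at inference time, which is the multi-objective property the abstract highlights.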

ECAI Conference 2024 Conference Paper

CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

  • Wei Zhu
  • Yicheng Liu
  • Yuping He
  • Tangfei Liao
  • Kang Zheng
  • Xiaoqiu Xu
  • Tao Wang
  • Tong Lu

In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose CorrAdaptor, a novel architecture that introduces a dual-branch structure capable of adaptively adjusting local contexts through both explicit and implicit local graph learning. Specifically, the explicit branch uses KNN-based graphs tailored for initial neighborhood identification, while the implicit branch leverages a learnable matrix to softly assign neighbors and adaptively expand the local context scope, significantly enhancing the model’s robustness and adaptability to complex image variations. Moreover, we design a motion injection module to integrate motion consistency into the network to suppress the impact of outliers and refine local context learning, resulting in substantial performance improvements. The experimental results on extensive correspondence-based tasks indicate that our CorrAdaptor achieves state-of-the-art performance both qualitatively and quantitatively.

NeurIPS Conference 2024 Conference Paper

DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment

  • Gongpei Zhao
  • Tao Wang
  • Congyan Lang
  • Yi Jin
  • Yidong Li
  • Haibin Ling

Graph neural networks (GNNs) are recognized for their strong performance across various applications, with the backpropagation (BP) algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-backpropagation (non-BP) training algorithms, such as direct feedback alignment (DFA), have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the independent and identically distributed (i.i.d.) assumption in graph data and the difficulty in accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and the unique architecture of GNNs, incorporating the information of graph topology into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we developed a pseudo error generator that spreads residual errors from training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.
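The DFA principle that the paper extends to GNNs can be illustrated on a plain two-layer network, where a fixed random matrix replaces the transpose of the forward weights in the feedback path (a schematic sketch of DFA itself, not the paper's GNN variant):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
W2 = rng.normal(scale=0.1, size=(n_hid, n_out))
B = rng.normal(scale=0.1, size=(n_out, n_hid))  # fixed random feedback, never trained

x = rng.normal(size=(32, n_in))
y = rng.normal(size=(32, n_out))

loss0 = np.mean((np.tanh(x @ W1) @ W2 - y) ** 2)

lr = 0.05
for _ in range(300):
    h = np.tanh(x @ W1)
    e = h @ W2 - y                  # output error
    dW2 = h.T @ e / len(x)          # local delta rule at the output layer
    dh = (e @ B) * (1 - h ** 2)     # DFA: random projection B replaces W2.T
    W1 -= lr * (x.T @ dh / len(x))
    W2 -= lr * dW2

loss_final = np.mean((np.tanh(x @ W1) @ W2 - y) ** 2)
```

No error signal ever traverses `W2` backwards, which is what makes the scheme forward-only and layer-parallel; the paper's contribution is routing such feedback through graph topology instead of a dense `B`.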

NeurIPS Conference 2024 Conference Paper

Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation

  • Lili Wei
  • Congyan Lang
  • Ziyi Chen
  • Tao Wang
  • Yidong Li
  • Jun Liu

Few-shot 3D point cloud semantic segmentation aims to segment query point clouds with only a few annotated support point clouds. Existing prototype-based methods learn prototypes from the 3D support set to guide the segmentation of query point clouds. However, they encounter the challenge of low prototype quality due to constrained semantic information in the 3D support set and class information bias between support and query sets. To address these issues, in this paper, we propose a novel framework called Generated and Pseudo Content guided Prototype Refinement (GPCPR), which explicitly leverages LLM-generated content and reliable query context to enhance prototype quality. GPCPR achieves prototype refinement through two core components: LLM-driven Generated Content-guided Prototype Refinement (GCPR) and Pseudo Query Context-guided Prototype Refinement (PCPR). Specifically, GCPR integrates diverse and differentiated class descriptions generated by large language models to enrich prototypes with comprehensive semantic knowledge. PCPR further aggregates reliable class-specific pseudo-query context to mitigate class information bias and generate more suitable query-specific prototypes. Furthermore, we introduce a dual-distillation regularization term, enabling knowledge transfer between early-stage entities (prototypes or pseudo predictions) and their deeper counterparts to enhance refinement. Extensive experiments demonstrate the superiority of our method, surpassing the state-of-the-art methods by up to 12.10% and 13.75% mIoU on S3DIS and ScanNet, respectively.

ICML Conference 2024 Conference Paper

Mollification Effects of Policy Gradient Methods

  • Tao Wang
  • Sylvia L. Herbert
  • Sicun Gao

Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy search, as well as the downside of it: while making the objective function smoother and easier to optimize, the stochastic objective deviates further from the original problem. We demonstrate the equivalence between policy gradient methods and solving backward heat equations. Following the ill-posedness of backward heat equations from PDE theory, we present a fundamental challenge to the use of policy gradient under stochasticity. Moreover, we make the connection between this limitation and the uncertainty principle in harmonic analysis to understand the effects of exploration with stochastic policies in RL. We also provide experimental results to illustrate both the positive and negative aspects of mollification effects in practice.
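The mollification effect, that a stochastic objective is a Gaussian-smoothed (heat-kernel) version of the original, can be illustrated numerically; this is an illustration of the idea on a toy function, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# J_sigma(theta) = E[J(theta + sigma * eps)] is the Gaussian (heat-kernel)
# mollification of J. Take J = |theta|, which is non-smooth at 0.
J = np.abs

def mollified(theta, sigma, n=200000):
    eps = rng.normal(size=n)
    return np.mean(J(theta + sigma * eps))

# At theta = 0 the true objective is 0, but mollification lifts it:
# E|sigma * eps| = sigma * sqrt(2/pi), growing with sigma.
j_small = mollified(0.0, 0.1)
j_big = mollified(0.0, 1.0)
```

Larger noise makes the surrogate smoother and easier to optimize, but as the abstract notes, the smoothed objective drifts further from the original: here the deviation at the minimum grows linearly in sigma.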

NeurIPS Conference 2024 Conference Paper

PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

  • Mishaal Kazmi
  • Hadrien Lautraite
  • Alireza Akbari
  • Qiaoyue Tang
  • Mauricio Soroco
  • Tao Wang
  • Sébastien Gambs
  • Mathias Lécuyer

We present PANORAMIA, a privacy leakage measurement framework for machine learning models that relies on membership inference attacks using generated data as non-members. By relying on generated non-member data, PANORAMIA eliminates the common dependency of privacy measurement tools on in-distribution non-member data. As a result, PANORAMIA does not modify the model, training data, or training process, and only requires access to a subset of the training data. We evaluate PANORAMIA on ML models for image and tabular data classification, as well as on large-scale language models.

AAAI Conference 2024 Conference Paper

Trend-Aware Supervision: On Learning Invariance for Semi-supervised Facial Action Unit Intensity Estimation

  • Yingjie Chen
  • Jiarui Zhang
  • Tao Wang
  • Yun Liang

With the increasing need for facial behavior analysis, semi-supervised AU intensity estimation using only keyframe annotations has emerged as a practical and effective solution to relieve the burden of annotation. However, the lack of annotations makes the spurious correlation problem caused by AU co-occurrences and subject variation much more prominent, leading to non-robust intensity estimation that is entangled among AUs and biased among subjects. We observe that trend information inherent in keyframe annotations could act as extra supervision, and that raising awareness of AU-specific facial appearance changing trends during training is the key to learning invariant AU-specific features. To this end, we propose Trend-Aware Supervision (TAS), which pursues three kinds of trend awareness: intra-trend ranking awareness, intra-trend speed awareness, and inter-trend subject awareness. TAS alleviates the spurious correlation problem by raising trend awareness during training to learn AU-specific features that represent the corresponding facial appearance changes, achieving intensity estimation invariance. Experiments conducted on two commonly used AU benchmark datasets, BP4D and DISFA, show the effectiveness of each kind of awareness. Under trend-aware supervision, performance can be improved without extra computational or storage costs during inference.

JBHI Journal 2024 Journal Article

Using Semi-Supervised Domain Adaptation to Enhance EEG-Based Cross-Task Mental Workload Classification Performance

  • Tao Wang
  • Yufeng Ke
  • Yichao Huang
  • Feng He
  • Wenxiao Zhong
  • Shuang Liu
  • Dong Ming

Mental workload (MWL) assessment is critical for accident prevention and operator safety. However, achieving cross-task generalization of MWL classification models is a significant challenge for real-world applications. Classifiers trained on labeled samples from one task often experience a notable performance drop when directly applied to samples from other tasks, limiting their use cases. To address this issue, we propose a semi-supervised cross-task domain adaptation (SCDA) method using power spectral density (PSD) features for MWL recognition across tasks (MATB-II and n-back). Our results demonstrated that the SCDA method achieved the best cross-task classification performance on our data and the COG-BCI public dataset, with accuracies of 90.98% ± 9.36% and 96.61% ± 4.35%, respectively. Furthermore, in the cross-task classification of cross-subject scenarios, SCDA showed the highest average accuracy (75.39% ± 9.56% on our data, 90.98% ± 9.36% on the COG-BCI public dataset). The findings indicate that the semi-supervised transfer learning approach using PSD features is feasible and effective for cross-task MWL assessment.

AAAI Conference 2024 Conference Paper

VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

  • Tangfei Liao
  • Xiaoqin Zhang
  • Li Zhao
  • Tao Wang
  • Guobao Xiao

Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. Finding them is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of existing methods is usually limited by the lack of visual cues (e.g., texture, illumination, structure) of scenes. In this paper, we propose a Visual-Spatial Fusion Transformer (VSFormer) to identify inliers and recover camera poses accurately. Firstly, we obtain highly abstract visual cues of a scene with the cross attention between local features of two-view images. Then, we model these visual cues and correspondences by a joint visual-spatial fusion module, simultaneously embedding visual cues into correspondences for pruning. Additionally, to mine the consistency of correspondences, we also design a novel module that combines the KNN-based graph and the transformer, effectively capturing both local and global contexts. Extensive experiments have demonstrated that the proposed VSFormer outperforms state-of-the-art methods on outdoor and indoor benchmarks. Our code is provided at the following repository: https://github.com/sugar-fly/VSFormer.

AAAI Conference 2024 Conference Paper

Zero-Shot Aerial Object Detection with Visual Description Regularization

  • Zhengqing Zang
  • Chenyu Lin
  • Chenwei Tang
  • Tao Wang
  • Jiancheng Lv

Existing object detection models are mainly trained on large-scale labeled datasets. However, annotating data for novel aerial object classes is expensive since it is time-consuming and may require expert knowledge. Thus, it is desirable to study label-efficient object detection methods on aerial images. In this work, we propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. Concretely, we identify the weak semantic-visual correlation of the aerial objects and aim to address the challenge with prior descriptions of their visual appearance. Instead of directly encoding the descriptions into class embedding space, which suffers from the representation gap problem, we propose to infuse the prior inter-class visual similarity conveyed in the descriptions into the embedding learning. The infusion process is accomplished with a newly designed similarity-aware triplet loss which incorporates structured regularization on the representation space. We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art ZSD methods with complex projection designs and generative frameworks, e.g., DescReg outperforms the best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in HM. We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture. Codes will be released at https://github.com/zq-zang/DescReg.
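A similarity-aware triplet loss can be sketched in a minimal form where the margin for a class pair shrinks with their description-derived visual similarity; the margin rule and all names here are assumptions for illustration, not DescReg's exact loss:

```python
import numpy as np

# Triplet loss with a similarity-modulated margin: classes described as
# visually similar are allowed to sit closer in embedding space.
def triplet_loss(anchor, pos, neg, sim_an, base_margin=1.0):
    margin = base_margin * (1.0 - sim_an)  # similar pair -> smaller margin
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.5, 0.0])
n = np.array([1.0, 0.0])
l_similar = triplet_loss(a, p, n, sim_an=0.9)     # visually similar classes
l_dissimilar = triplet_loss(a, p, n, sim_an=0.1)  # dissimilar classes
```

With identical embeddings, the dissimilar pair incurs a larger penalty, so the learned space is pushed to respect the description-derived similarity structure.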

NeurIPS Conference 2023 Conference Paper

Fractal Landscapes in Policy Optimization

  • Tao Wang
  • Sylvia Herbert
  • Sicun Gao

Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist a gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and Hölder exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of the objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
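The sample-based smoothness estimate described above amounts to fitting a local Hölder exponent as the log-log slope of objective increments; a minimal sketch of that general idea (illustrative, not the authors' implementation):

```python
import numpy as np

# Estimate a local Holder exponent at theta: the slope of
# log|J(theta + h) - J(theta)| versus log h over shrinking perturbations h.
# Slope ~1 indicates local smoothness; slope < 1 indicates a rougher landscape.
def holder_exponent(J, theta, hs):
    diffs = np.abs([J(theta + h) - J(theta) for h in hs])
    slope, _ = np.polyfit(np.log(hs), np.log(diffs), 1)
    return slope

hs = np.logspace(-6, -2, 20)
smooth = holder_exponent(lambda t: t ** 2, 1.0, hs)  # differentiable at theta=1
rough = holder_exponent(np.sqrt, 0.0, hs)            # only Holder-1/2 at theta=0
```

In the paper's setting J(theta) is the noisy policy return rather than a closed-form function, so the increments are themselves estimated from rollouts, but the diagnostic slope is the same.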

IJCAI Conference 2023 Conference Paper

Graph Propagation Transformer for Graph Representation Learning

  • Zhe Chen
  • Hao Tan
  • Tao Wang
  • Tianrun Shen
  • Tong Lu
  • Qiuying Peng
  • Cheng Cheng
  • Yue Qi

This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e., node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models. The code will be released at https://github.com/czczup/GPTrans.

IJCAI Conference 2023 Conference Paper

Orion: Online Backdoor Sample Detection via Evolution Deviance

  • Huayang Huang
  • Qian Wang
  • Xueluan Gong
  • Tao Wang

Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.

NeurIPS Conference 2023 Conference Paper

Punctuation-level Attack: Single-shot and Single Punctuation Can Fool Text Models

  • Wenqiang Wang
  • Chongyang Du
  • Tao Wang
  • Kaihao Zhang
  • Wenhan Luo
  • Lin Ma
  • Wei Liu
  • Xiaochun Cao

The adversarial attacks have attracted increasing attention in various fields including natural language processing. The current textual attacking models primarily focus on fooling models by adding character-/word-/sentence-level perturbations, ignoring their influence on human perception. In this paper, for the first time in the community, we propose a novel mode of textual attack, punctuation-level attack. With various types of perturbations, including insertion, displacement, deletion, and replacement, the punctuation-level attack achieves promising fooling rates against SOTA models on typical textual tasks and maintains minimal influence on human perception and understanding of the text by mere perturbation of single-shot single punctuation. Furthermore, we propose a search method named Text Position Punctuation Embedding and Paraphrase (TPPEP) to accelerate the pursuit of optimal position to deploy the attack, without exhaustive search, and we present a mathematical interpretation of TPPEP. Thanks to the integrated Text Position Punctuation Embedding (TPPE), the punctuation attack can be applied at a constant cost of time. Experimental results on public datasets and SOTA models demonstrate the effectiveness of the punctuation attack and the proposed TPPE. We additionally apply the single punctuation attack to summarization, semantic-similarity-scoring, and text-to-image tasks, and achieve encouraging results.

JBHI Journal 2023 Journal Article

SemiMAR: Semi-Supervised Learning for CT Metal Artifact Reduction

  • Tao Wang
  • Hui Yu
  • Zhiwen Wang
  • Hu Chen
  • Yan Liu
  • Jingfeng Lu
  • Yi Zhang

Metal artifacts lead to CT imaging quality degradation. With the success of deep learning (DL) in medical imaging, a number of DL-based supervised methods have been developed for metal artifact reduction (MAR). Nonetheless, fully-supervised MAR methods based on simulated data do not perform well on clinical data due to the domain gap. Although this problem can be avoided in an unsupervised way to a certain degree, severe artifacts cannot be well suppressed in clinical practice. Recently, semi-supervised metal artifact reduction (MAR) methods have gained wide attention due to their ability in narrowing the domain gap and improving MAR performance in clinical data. However, these methods typically require large model sizes, posing challenges for optimization. To address this issue, we propose a novel semi-supervised MAR framework. In our framework, only the artifact-free parts are learned, and the artifacts are inferred by subtracting these clean parts from the metal-corrupted CT images. Our approach leverages a single generator to execute all complex transformations, thereby reducing the model's scale and preventing overlap between clean part and artifacts. To recover more tissue details, we distill the knowledge from the advanced dual-domain MAR network into our model in both image domain and latent feature space. The latent space constraint is achieved via contrastive learning. We also evaluate the impact of different generator architectures by investigating several mainstream deep learning-based MAR backbones. Our experiments demonstrate that the proposed method competes favorably with several state-of-the-art semi-supervised MAR techniques in both qualitative and quantitative aspects.

JBHI Journal 2023 Journal Article

Trustworthy Data and AI Environments for Clinical Prediction: Application to Crisis-Risk in People With Depression

  • Yamiko Joseph Msosa
  • Arturas Grauslys
  • Yifan Zhou
  • Tao Wang
  • Iain Buchan
  • Paul Langan
  • Steven Foster
  • Michael Walker

Depression is a common mental health condition that often occurs in association with other chronic illnesses, and varies considerably in severity. Electronic Health Records (EHRs) contain rich information about a patient's medical history and can be used to train, test and maintain predictive models to support and improve patient care. This work evaluated the feasibility of implementing an environment for predicting mental health crisis among people living with depression based on both structured and unstructured EHRs. A large EHR from a mental health provider, Mersey Care, was pseudonymised and ingested into the Natural Language Processing (NLP) platform CogStack, allowing text content in binary clinical notes to be extracted. All unstructured clinical notes and summaries were semantically annotated by the MedCAT and BioYODIE NLP services. Cases of crisis in patients with depression were then identified. Random forest models, gradient boosting trees, and Long Short-Term Memory (LSTM) networks, with varying feature arrangements, were trained to predict the occurrence of crisis. The results showed that all the prediction models can use a combination of structured and unstructured EHR information to predict crisis in patients with depression with useful accuracy. The LSTM network trained on a modified dataset with only the 1000 most important features from the random forest model with temporality showed the best performance, with a mean AUC of 0.901 (standard deviation 0.006) on the training dataset and a mean AUC of 0.810 (standard deviation 0.01) on a hold-out test dataset. Comparing the results from the technical evaluation with the views of psychiatrists shows that there are now opportunities to refine and integrate such prediction models into pragmatic point-of-care clinical decision support tools for supporting mental healthcare delivery.

AAAI Conference 2023 Conference Paper

Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method

  • Tao Wang
  • Kaihao Zhang
  • Tianrun Shen
  • Wenhan Luo
  • Bjorn Stenger
  • Tong Lu

As the quality of optical sensors improves, there is a need for processing large-scale images. In particular, the ability of devices to capture ultra-high definition (UHD) images and video places new demands on the image processing pipeline. In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method. The core components of LLFormer are the axis-based multi-head self-attention and the cross-layer attention fusion block, which significantly reduce computational complexity. Extensive experiments on the new dataset and existing public datasets show that LLFormer outperforms state-of-the-art methods. We also show that employing existing LLIE methods trained on our benchmark as a pre-processing step significantly improves the performance of downstream tasks, e.g., face detection in low-light conditions. The source code and pre-trained models are available at https://github.com/TaoWangzj/LLFormer.

AAAI Conference 2022 Conference Paper

Causal Intervention for Subject-Deconfounded Facial Action Unit Recognition

  • Yingjie Chen
  • Diqi Chen
  • Tao Wang
  • Yizhou Wang
  • Yun Liang

Subject-invariant facial action unit (AU) recognition remains challenging because the data distribution varies among subjects. In this paper, we propose a causal inference framework for subject-invariant facial action unit recognition. To illustrate the causal effect present in the AU recognition task, we formulate the causalities among facial images, subjects, latent AU semantic relations, and estimated AU occurrence probabilities via a structural causal model. By constructing such a causal diagram, we clarify the causal effect among variables and propose a plug-in causal intervention module, CIS, to deconfound the confounder Subject in the causal diagram. Extensive experiments conducted on two commonly used AU benchmark datasets, BP4D and DISFA, show the effectiveness of our CIS, and the model with CIS inserted, CISNet, achieves state-of-the-art performance.

AAAI Conference 2022 Conference Paper

Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-Supervised Action Recognition

  • Tianyu Guo
  • Hong Liu
  • Zhan Chen
  • Mengyuan Liu
  • Tao Wang
  • Runwei Ding

In recent years, self-supervised representation learning for skeleton-based action recognition has developed alongside advances in contrastive learning methods. Existing contrastive learning methods use normal augmentations to construct similar positive samples, which limits the ability to explore novel movement patterns. In this paper, to make better use of the movement patterns introduced by extreme augmentations, a Contrastive Learning framework utilizing Abundant Information Mining for self-supervised action Representation (AimCLR) is proposed. First, the extreme augmentations and the Energy-based Attention-guided Drop Module (EADM) are proposed to obtain diverse positive samples, which bring novel movement patterns that improve the universality of the learned representations. Second, since directly using extreme augmentations may not boost performance due to drastic changes in the original identity, the Dual Distributional Divergence Minimization Loss (D3M Loss) is proposed to minimize the distribution divergence in a gentler way. Third, Nearest Neighbors Mining (NNM) is proposed to further expand positive samples and make the abundant information mining process more reasonable. Exhaustive experiments on the NTU RGB+D 60, PKU-MMD, and NTU RGB+D 120 datasets verify that our AimCLR performs favorably against state-of-the-art methods under a variety of evaluation protocols, with observably higher-quality action representations. Our code is available at https://github.com/Levigty/AimCLR.

IJCAI Conference 2022 Conference Paper

Discrete Listwise Personalized Ranking for Fast Top-N Recommendation with Implicit Feedback

  • Fangyuan Luo
  • Jun Wu
  • Tao Wang

We address the efficiency problem of personalized ranking from implicit feedback by hashing users and items with binary codes, so that top-N recommendation can be executed quickly in a Hamming space by bit operations. However, current hashing methods for top-N recommendation fail to align their learning objectives (such as pointwise or pairwise loss) with the benchmark metrics for ranking quality (e.g., Average Precision, AP), resulting in sub-optimal accuracy. To this end, we propose a Discrete Listwise Personalized Ranking (DLPR) model that optimizes AP under discrete constraints for fast and accurate top-N recommendation. To resolve the challenging DLPR problem, we devise an efficient algorithm that can directly learn binary codes in a relaxed continuous solution space. Specifically, theoretical analysis shows that the optimal solution to the relaxed continuous optimization problem is exactly the same as that of the original discrete DLPR problem. Through extensive experiments on two real-world datasets, we show that DLPR consistently surpasses state-of-the-art hashing methods for top-N recommendation.
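The speedup the abstract appeals to (top-N ranking in a Hamming space via bit operations) can be sketched as follows; the helper name is hypothetical and unrelated to the paper's code:

```python
def hamming_topn(user_code: int, item_codes: list, n: int) -> list:
    """Rank items for a user by Hamming distance between binary codes.

    Distance is computed with XOR plus a popcount, which is why hashed
    user/item codes make top-N recommendation fast. Illustrative sketch,
    not the DLPR learning algorithm itself.
    """
    scored = [(bin(user_code ^ c).count("1"), i) for i, c in enumerate(item_codes)]
    scored.sort()                       # smallest Hamming distance first
    return [i for _, i in scored[:n]]   # indices of the top-n items
```

Learning the codes is the hard part (that is what DLPR optimizes); once learned, scoring reduces to the cheap bit arithmetic above.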

AAAI Conference 2022 Conference Paper

FedInv: Byzantine-Robust Federated Learning by Inversing Local Model Updates

  • Bo Zhao
  • Peng Sun
  • Tao Wang
  • Keyu Jiang

Federated learning (FL) is a privacy-preserving distributed machine learning paradigm that enables multiple clients to collaboratively train statistical models without disclosing raw training data. However, the inaccessible local training data and uninspectable local training process make FL susceptible to various Byzantine attacks (e.g., data poisoning and model poisoning attacks), which aim to manipulate the FL model training process and degrade the model performance. Most of the existing Byzantine-robust FL schemes cannot effectively defend against stealthy poisoning attacks that craft poisoned models statistically similar to benign models. Things worsen when many clients are compromised or data among clients are highly non-independent and identically distributed (non-IID). In this work, to address these issues, we propose FedInv, a novel Byzantine-robust FL framework that inverses local model updates. Specifically, in each round of local model aggregation in FedInv, the parameter server first inverses the local model updates submitted by each client to generate a corresponding dummy dataset. Then, the server identifies those dummy datasets with exceptional Wasserstein distances from others and excludes the related local model updates from model aggregation. We conduct an exhaustive experimental evaluation of FedInv. The results demonstrate that FedInv significantly outperforms the existing robust FL schemes in defending against stealthy poisoning attacks under highly non-IID data partitions.
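The exclusion rule in the abstract (drop clients whose dummy datasets sit at exceptional Wasserstein distance from the rest) can be sketched for 1-D samples; the functions and the "drop the k farthest" rule below are illustrative assumptions, not FedInv's actual criterion:

```python
def w1(a, b):
    """Wasserstein-1 distance between two equally sized 1-D samples:
    the mean absolute difference of the sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def exclude_outliers(dummy_sets, k=1):
    """Return indices of clients kept after dropping the k clients whose
    dummy datasets are, on average, farthest (in W1) from all others."""
    n = len(dummy_sets)
    avg = [sum(w1(dummy_sets[i], dummy_sets[j]) for j in range(n) if j != i) / (n - 1)
           for i in range(n)]
    worst = sorted(range(n), key=lambda i: avg[i], reverse=True)[:k]
    return [i for i in range(n) if i not in worst]
```

The point of operating on dummy datasets rather than on the updates themselves is that a poisoned model can look statistically benign in parameter space while its inverted data does not.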

ICML Conference 2022 Conference Paper

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

  • Nan Du 0002
  • Yanping Huang
  • Andrew M. Dai
  • Simon Tong
  • Dmitry Lepikhin
  • Yuanzhong Xu
  • Maxim Krikun
  • Yanqi Zhou

Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference, while still achieving better overall few-shot performance across 29 NLP tasks.

AAAI Conference 2022 Conference Paper

Pose-Guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

  • Tao Wang
  • Hong Liu
  • Pinhao Song
  • Tianyu Guo
  • Wei Shi

Occluded person re-identification is a challenging task, as human body parts can be occluded by obstacles (e.g., trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods solve this problem by aligning body parts according to graph matching, but these graph-based methods are unintuitive and complicated. Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method that utilizes pose information to clearly disentangle semantic components (e.g., human body or joint parts) and selectively match non-occluded parts correspondingly. First, the Vision Transformer (ViT) is used to extract patch features with its strong capability. Second, to preliminarily disentangle the pose information from patch information, a matching and distributing mechanism is leveraged in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, those semantic views are not guaranteed to be related to the body without additional supervision. Therefore, a Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent the interference of occlusions, we design a Pose-guided Push Loss to emphasize the features of visible body parts. Extensive experiments over five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that our proposed PFD performs favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net.

AAAI Conference 2022 Conference Paper

Powerful Graph Convolutional Networks with Adaptive Propagation Mechanism for Homophily and Heterophily

  • Tao Wang
  • Di Jin
  • Rui Wang
  • Dongxiao He
  • Yuxiao Huang

Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. Typical GCNs and their variants work under a homophily assumption (i.e., nodes with the same class tend to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining the immediate representations, which introduces noise and irrelevant information into the result. These methods do not change the propagation mechanism itself, which works under the homophily assumption and is a fundamental part of GCNs. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which are learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.
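The adaptive mechanism described above can be caricatured for scalar node features: each edge carries a homophily degree h in [0, 1], and the neighbor's contribution flips sign as h crosses 0.5. The mapping 2h - 1 and the function itself are illustrative assumptions, not the paper's GCN layer:

```python
def adaptive_propagate(x, edges, homophily):
    """One propagation step with per-edge homophily degrees.

    x          : list of scalar node features
    edges      : list of (i, j) undirected edges
    homophily  : learned degree h in [0, 1] per edge; homophilic edges
                 (h near 1) pull neighbors together, heterophilic edges
                 (h near 0) push them apart.
    """
    out = list(x)
    for (i, j), h in zip(edges, homophily):
        w = 2.0 * h - 1.0        # map [0, 1] -> [-1, 1]
        out[i] += w * x[j]
        out[j] += w * x[i]
    return out
```

Under pure homophily (h = 1) this reduces to standard neighborhood averaging up to normalization, while h = 0 actively separates the two endpoints' representations.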

NeurIPS Conference 2022 Conference Paper

Rethinking Image Restoration for Object Detection

  • Shangquan Sun
  • Wenqi Ren
  • Tao Wang
  • Xiaochun Cao

Although image restoration has achieved significant progress, its potential to assist object detectors in adverse imaging conditions has received little attention. It has been reported that existing image restoration methods cannot improve object detector performance and sometimes even reduce it. To address this issue, we propose a targeted adversarial attack in the restoration procedure to boost object detection performance after restoration. Specifically, we present an ADAM-like adversarial attack to generate pseudo ground truth for restoration training. The resulting restored images are close to the original sharp images and, at the same time, lead to better object detection results. We conduct extensive experiments on image dehazing and low-light enhancement and show the superiority of our method over conventional training and other domain adaptation and multi-task methods. The proposed pipeline can be applied to all restoration methods and to both one- and two-stage detectors.

IJCAI Conference 2022 Conference Paper

Uncertainty-Guided Pixel Contrastive Learning for Semi-Supervised Medical Image Segmentation

  • Tao Wang
  • Jianglin Lu
  • Zhihui Lai
  • Jiajun Wen
  • Heng Kong

Recently, contrastive learning has shown great potential in medical image segmentation. Due to the lack of expert annotations, however, it is challenging to apply contrastive learning in semi-supervised scenes. To solve this problem, we propose a novel uncertainty-guided pixel contrastive learning method for semi-supervised medical image segmentation. Specifically, we construct an uncertainty map for each unlabeled image and then remove the uncertainty region in the uncertainty map to reduce the possibility of noise sampling. The uncertainty map is determined by a well-designed consistency learning mechanism, which generates comprehensive predictions for unlabeled data by encouraging consistent network outputs from two different decoders. In addition, we suggest that the effective global representations learned by an image encoder should be equivariant to different geometric transformations. To this end, we construct an equivariant contrastive loss to strengthen global representation learning ability of the encoder. Extensive experiments conducted on popular medical image benchmarks demonstrate that the proposed method achieves better segmentation performance than the state-of-the-art methods.
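The uncertainty-guided sampling idea above (exclude pixels where the two decoders disagree from contrastive learning) can be sketched per pixel; the threshold rule and helper name are illustrative assumptions, not the paper's exact criterion:

```python
def certain_pixel_mask(p1, p2, tau=0.1):
    """Keep only pixels where two decoders' foreground probabilities
    agree within `tau`; the rest are treated as uncertain and excluded
    from contrastive sampling to reduce the chance of noisy positives
    or negatives on unlabeled images.
    """
    return [abs(a - b) <= tau for a, b in zip(p1, p2)]
```

Consistency between decoder outputs thus serves double duty: it is both a training signal (the consistency loss) and a filter for where the pixel contrastive loss may be applied.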

TIST Journal 2022 Journal Article

Weakly Supervised Video Object Segmentation via Dual-attention Cross-branch Fusion

  • Lili Wei
  • Congyan Lang
  • Liqian Liang
  • Songhe Feng
  • Tao Wang
  • Shidi Chen

Recently, given the challenge of collecting large-scale explicitly annotated videos, weakly supervised video object segmentation (WSVOS) using video tags has attracted much attention. Existing WSVOS approaches follow a general pipeline with two phases, i.e., a pseudo-mask generation phase and a refinement phase. To explore the intrinsic properties and correlations buried in the video frames, most of them focus on the latter phase by introducing optical flow as temporal information to provide more supervision. However, these optical-flow-based studies are greatly affected by illumination and distortion and lack consideration of the discriminative capacity of multi-level deep features. In this article, with the goal of capturing more effective temporal information and investigating a corresponding temporal information fusion strategy, we propose a unified WSVOS model that adopts a two-branch architecture with a multi-level cross-branch fusion strategy, named the dual-attention cross-branch fusion network (DACF-Net). Concretely, the two branches of DACF-Net, i.e., a temporal prediction subnetwork (TPN) and a spatial segmentation subnetwork (SSN), are used for extracting temporal information and generating predicted segmentation masks, respectively. To perform the cross-branch fusion between TPN and SSN, we propose a dual-attention fusion module that can be plugged into the SSN flexibly. We also pose a cross-frame coherence loss (CFCL) to achieve smooth segmentation results by exploiting the coherence of the masks produced by TPN and SSN. Extensive experiments demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods on two challenging datasets, i.e., Davis-2016 and YouTube-Objects.

AAAI Conference 2021 Short Paper

An Entity-Aware Adversarial Domain Adaptation Network for Cross-Domain Named Entity Recognition (Student Abstract)

  • Qi Peng
  • Changmeng Zheng
  • Yi Cai
  • Tao Wang
  • Haoran Xie
  • Qing Li

Existing methods for named entity recognition rely critically on labeled data. To handle the situation in which the data is fully unlabeled, we propose an entity-aware adversarial domain adaptation network, which utilizes labeled source data and then adapts to the unlabeled target domain. We first apply adversarial training to reduce the distribution gap between different domains. Furthermore, we introduce an entity-aware attention mechanism to guide the adversarial process toward the alignment of entity features. The experiments show that our model outperforms state-of-the-art approaches.

IJCAI Conference 2021 Conference Paper

Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots

  • Huiqiao Fu
  • Kaiqiang Tang
  • Peng Li
  • Wenqi Zhang
  • Xinpeng Wang
  • Guizhou Deng
  • Tao Wang
  • Chunlin Chen

Legged locomotion in a complex environment requires careful planning of the footholds of legged robots. In this paper, a novel Deep Reinforcement Learning (DRL) method is proposed to implement multi-contact motion planning for hexapod robots moving on uneven plum-blossom piles. First, the motion of hexapod robots is formulated as a Markov Decision Process (MDP) with a specified reward function. Second, a transition feasibility model is proposed for hexapod robots, which describes the feasibility of a state transition under the condition of satisfying kinematics and dynamics, and in turn determines the rewards. Third, the foothold and Center-of-Mass (CoM) sequences are sampled from a diagonal Gaussian distribution and optimized through learning the optimal policies using the designed DRL algorithm. Both simulation and experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.

NeurIPS Conference 2021 Conference Paper

Direct Multi-view Multi-person 3D Pose Estimation

  • Tao Wang
  • Jianfeng Zhang
  • Yujun Cai
  • Shuicheng Yan
  • Jiashi Feng

We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from costly volumetric representations or reconstructing the per-person 3D pose from multiple detected 2D poses as in previous methods, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of such a simple pipeline, MvP presents a hierarchical scheme to concisely represent query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided attention mechanism, called projective attention, to more precisely fuse the cross-view information for each joint. MvP also introduces a RayConv operation to integrate the view-dependent camera geometry into the feature representations for augmenting the projective attention. We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [35] by 9.8%. MvP is general and also extendable to recovering human mesh represented by the SMPL model, and is thus useful for modeling multi-person body shapes. Code and models are available at https://github.com/sail-sg/mvp.

TIST Journal 2021 Journal Article

Fine-Grained Semantic Image Synthesis with Object-Attention Generative Adversarial Network

  • Min Wang
  • Congyan Lang
  • Liqian Liang
  • Songhe Feng
  • Tao Wang
  • Yutong Gao

Semantic image synthesis is a new rising and challenging vision problem accompanied by the recent promising advances in generative adversarial networks. The existing semantic image synthesis methods only consider the global information provided by the semantic segmentation mask, such as class label, global layout, and location, so the generative models cannot capture the rich local fine-grained information of the images (e.g., object structure, contour, and texture). To address this issue, we adopt a multi-scale feature fusion algorithm to refine the generated images by learning the fine-grained information of the local objects. We propose OA-GAN, a novel object-attention generative adversarial network that allows attention-driven, multi-fusion refinement for fine-grained semantic image synthesis. Specifically, the proposed model first generates multi-scale global image features and local object features, respectively, then the local object features are fused into the global image features to improve the correlation between the local and the global. In the process of feature fusion, the global image features and the local object features are fused through the channel-spatial-wise fusion block to learn ‘what’ and ‘where’ to attend in the channel and spatial axes, respectively. The fused features are used to construct correlation filters to obtain feature response maps to determine the locations, contours, and textures of the objects. Extensive quantitative and qualitative experiments on COCO-Stuff, ADE20K and Cityscapes datasets demonstrate that our OA-GAN significantly outperforms the state-of-the-art methods.

TIST Journal 2020 Journal Article

End-to-End Text-to-Image Synthesis with Spatial Constrains

  • Min Wang
  • Congyan Lang
  • Liqian Liang
  • Songhe Feng
  • Tao Wang
  • Yutong Gao

Although the performance of automatically generating high-resolution realistic images from text descriptions has been significantly boosted, many challenging issues in image synthesis have not been fully investigated, due to shape variations, viewpoint changes, pose changes, and the relations of multiple objects. In this article, we propose a novel end-to-end approach for text-to-image synthesis with spatial constraints by mining object spatial location and shape information. Instead of learning a hierarchical mapping from text to image, our algorithm directly generates multi-object fine-grained images through the guidance of the generated semantic layouts. By fusing text semantic and spatial information into a synthesis module and jointly fine-tuning them with the multi-scale semantic layouts generated, the proposed networks show impressive performance in text-to-image synthesis for complex scenes. We evaluate our method both on the single-object CUB dataset and the multi-object MS-COCO dataset. Comprehensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches consistently across different evaluation metrics.

AAAI Conference 2020 Conference Paper

Finding Action Tubes with a Sparse-to-Dense Framework

  • Yuxi Li
  • Weiyao Lin
  • Tao Wang
  • John See
  • Rui Qian
  • Ning Xu
  • Limin Wang
  • Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy renders our framework about 7.6 times more efficient than the nearest competitor.

TIST Journal 2019 Journal Article

Co-saliency Detection with Graph Matching

  • Zun Li
  • Congyan Lang
  • Jiashi Feng
  • Yidong Li
  • Tao Wang
  • Songhe Feng

Recently, co-saliency detection, which aims to automatically discover common and salient objects appeared in several relevant images, has attracted increased interest in the computer vision community. In this article, we present a novel graph-matching based model for co-saliency detection in image pairs. A solution of graph matching is proposed to integrate the visual appearance, saliency coherence, and spatial structural continuity for detecting co-saliency collaboratively. Since the saliency and the visual similarity have been seamlessly integrated, such a joint inference schema is able to produce more accurate and reliable results. More concretely, the proposed model first computes the intra-saliency for each image by aggregating multiple saliency cues. The common and salient regions across multiple images are thus discovered via a graph matching procedure. Then, a graph reconstruction scheme is proposed to refine the intra-saliency iteratively. Compared to existing co-saliency detection methods that only utilize visual appearance cues, our proposed model can effectively exploit both visual appearance and structure information to better guide co-saliency detection. Extensive experiments on several challenging image pair databases demonstrate that our model outperforms state-of-the-art baselines significantly.

ICRA Conference 2019 Conference Paper

Eagle Shoal: A new designed modular tactile sensing dexterous hand for domestic service robots

  • Tao Wang
  • Zhanxiao Geng
  • Bo Kang
  • Xiaochuan Luo

This paper introduces a newly designed modular tactile sensing dexterous hand for domestic service robots. This fully-actuated hand consists of 1 palm and 3 fingers, with embedded tactile sensors, motors and control boards. The palm and each finger have 2 degrees of freedom (DOFs). The modular design makes it easy to attach and detach the hand, even for inexperienced users. The tactile sensor unit, with its new structure, can help decrease the number of sensors while maintaining good sensing ability. A series of experiments was performed to test the sensor unit and evaluate the hand's performance on an object set. The results show that the sensor unit provides precise sensing results and perceives continuous vibration data, and that the hand has excellent grasp ability. In addition to its good performance, the hand costs $500 USD at a scale of one hundred sets, making it affordable for researchers and for domestic service robots in the consumer market. In future research, this hand will be used to promote robotic manipulation research based on visual and tactile data.

AAAI Conference 2019 Conference Paper

Partial Multi-Label Learning by Low-Rank and Sparse Decomposition

  • Lijuan Sun
  • Songhe Feng
  • Tao Wang
  • Congyan Lang
  • Yi Jin

Multi-Label Learning (MLL) aims to learn from training data where each example is represented by a single instance while associated with a set of candidate labels. Most existing MLL methods are typically designed to handle the problem of missing labels. However, in many real-world scenarios, the labeling information for multi-label data is often redundant, which cannot be solved by classical MLL methods; thus the Partial Multi-label Learning (PML) framework has been proposed to cope with this problem, i.e., to remove the noisy labels from the multi-label sets. In this paper, in order to further improve the denoising capability of the PML framework, we utilize the low-rank and sparse decomposition scheme and propose a novel Partial Multi-label Learning by Low-Rank and Sparse decomposition (PML-LRS) approach. Specifically, we first reformulate the observed label set into a label matrix, and then decompose it into a ground-truth label matrix and an irrelevant label matrix, where the former is constrained to be low-rank and the latter is assumed to be sparse. Next, we utilize a feature mapping matrix to explore the label correlations and meanwhile constrain the feature mapping matrix to be low-rank to prevent the proposed method from overfitting. Finally, we obtain the ground-truth labels by minimizing the label loss, where the Augmented Lagrange Multiplier (ALM) algorithm is incorporated to solve the optimization problem. Extensive experimental results demonstrate that PML-LRS achieves superior or competitive performance against other state-of-the-art methods.
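A standard building block of ALM-style low-rank plus sparse decompositions, like the one the abstract describes, is the element-wise shrinkage (soft-thresholding) operator used to update the sparse component. The sketch below shows that operator in isolation (hypothetical helper, not the full PML-LRS algorithm):

```python
def soft_threshold(M, lam):
    """Element-wise shrinkage: S = sign(M) * max(|M| - lam, 0).

    Entries with magnitude below `lam` are zeroed, which is what drives
    the "irrelevant label" matrix toward sparsity in each ALM iteration.
    """
    return [[(abs(v) - lam if abs(v) > lam else 0.0) * (1 if v >= 0 else -1)
             for v in row] for row in M]
```

The low-rank component is updated analogously by shrinking singular values rather than matrix entries.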

ICRA Conference 2018 Conference Paper

A Fluid-Filled Tubular Dielectric Elastomer Variable Stiffness Structure Inspired by the Hydrostatic Skeleton Principle *Research supported by the National Natural Science Foundation of China (No. 51675413)

  • Tao Wang
  • Yue Li
  • Yuanjie Li
  • Jinhua Zhang
  • Jun Hong 0002
  • Michael Yu Wang

This work presents a novel variable stiffness structure consisting of a fiber-constrained dielectric elastomer tube filled with insulating oil. The tensile stiffness of the structure can be adjusted by applied voltage, and its initial value can be customized according to the initial pre-stretch of the material. The structure has dimensions of ∼30 mm diameter × 50 mm length. A mathematical model is established to predict the initial tensile stiffness of the structure. The change in tensile stiffness under applied voltage is verified experimentally: the results show a 25% decrease in tensile stiffness at 4 kV, with the decrement also depending on the elongation of the structure. With different pre-stretches and dimensions of the dielectric elastomer, one can obtain devices with different ranges of stiffness variation.

ICRA Conference 2017 Conference Paper

Design and control of an inchworm-inspired soft robot with omega-arching locomotion

  • Huaxia Guo
  • Jinhua Zhang
  • Tao Wang
  • Yuanjie Li
  • Jun Hong 0002
  • Yue Li

This paper presents an inchworm-inspired soft robot composed of a soft body, a front foot, and a back foot. Compared with traditional inchworm-type robots consisting of rigid components, the actuation of the soft robot is simpler, and the inchworm-inspired soft robot achieves higher locomotion efficiency than other bionic soft robots. The main idea of this paper is to imitate the "Ω" motion shape of the biological inchworm using a silicone square tube with strain-limiting layers. In addition, each foot of the robot, made by 3D printing together with a metal sheet, produces different friction coefficients to achieve anchor-and-move locomotion. Under certain actuation patterns, the robot thus realizes inchworm-like locomotion. Experimental results show that the proposed robot has excellent performance.

IJCAI Conference 2017 Conference Paper

Interactive Image Segmentation via Pairwise Likelihood Learning

  • Tao Wang
  • Quansen Sun
  • Qi Ge
  • Zexuan Ji
  • Qiang Chen
  • Guiyu Xia

This paper presents an interactive image segmentation approach in which the segmentation problem is formulated as probabilistic estimation. Instead of measuring the distances between unseeded pixels and seeded pixels, we measure the similarities between pixel pairs and seed pairs to improve robustness to the seeds. The unary prior probability of each pixel belonging to the foreground F or background B can be effectively estimated from the similarities with the label pairs (F, F), (F, B), (B, F), and (B, B). A likelihood learning framework is then proposed to fuse the region and boundary information of the image by imposing a smoothing constraint on the unary potentials. Experiments on challenging data sets demonstrate that the proposed method obtains better performance than state-of-the-art methods.
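A much-simplified, scalar-intensity sketch of the pair-based idea (our own illustration, not the paper's estimator: real pixels are feature vectors, and the full method uses all four seed-pair classes within a learned likelihood):

```python
import numpy as np

def pair_unary_prior(pixels, seeds_f, seeds_b, sigma=10.0):
    """Unary prior P(F | x) for each pixel, scored by how plausibly the
    pixel forms a same-label pair with foreground vs. background seeds.
    Pixels and seeds are scalar intensities in this toy version."""
    def same_label_score(x, seeds):
        # Gaussian similarity of each (pixel, seed) pair, averaged over seeds
        d = np.abs(x[:, None] - seeds[None, :])
        return np.exp(-(d ** 2) / (2 * sigma ** 2)).mean(axis=1)
    pf = same_label_score(pixels, seeds_f)
    pb = same_label_score(pixels, seeds_b)
    return pf / (pf + pb + 1e-12)
```

Thresholding this prior at 0.5 already separates well-contrasted regions; the paper additionally smooths these unaries with boundary information.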

ICRA Conference 2016 Conference Paper

A continuous jumping robot on water mimicking water striders

  • Jihong Yan
  • Kai Yang 0008
  • Tao Wang
  • Xinbin Zhang
  • Jie Zhao 0003

Aiming at mimicking the jumping locomotion of water striders, a new continuous jumping robot on water is proposed. Compared with the horizontal rowing motion, the jumping capability of water striders is challengeable to imitate, since the impact force on water is easy to cause the sinking of the robot. In this paper, a jumping mechanism based on springs is designed to produce a large thrust for the robot to jump. The shape of supporting legs and center of gravity of the robot are carefully designed so that the robot can jump on the surface continuously and smoothly. Influences of several critical factors, including the area of supporting legs, spring stiffness and jumping angle, on jump performance are analyzed by means of dynamic simulation and experiments. The fabricated robot weighs about 10. 2 g and can continuously jump on water with the maximum leap height and length of 120 mm and 410 mm, respectively.

AAAI Conference 2016 Conference Paper

Convolutional Neural Networks over Tree Structures for Programming Language Processing

  • Lili Mou
  • Ge Li
  • Lu Zhang
  • Tao Wang
  • Zhi Jin

Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also aroused growing interest in the artificial intelligence community. However, different from a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs’ abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
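A toy, untrained version of the idea over Python's own ASTs (a sketch only: random weights, a single convolution layer, and a mean over children instead of the paper's continuous binary tree weighting):

```python
import ast
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
_type_vecs = {}

def type_vec(node):
    """Look up (or lazily create) a random embedding for the AST node type."""
    name = type(node).__name__
    if name not in _type_vecs:
        _type_vecs[name] = rng.standard_normal(DIM)
    return _type_vecs[name]

W_self = rng.standard_normal((DIM, DIM)) * 0.1
W_child = rng.standard_normal((DIM, DIM)) * 0.1

def conv_node(node):
    """One tree-convolution step: combine a node's embedding with its children's."""
    h = W_self @ type_vec(node)
    children = list(ast.iter_child_nodes(node))
    if children:
        h = h + W_child @ np.mean([type_vec(c) for c in children], axis=0)
    return np.maximum(h, 0.0)  # ReLU

def program_feature(src):
    """Max-pool convolved node features over the whole tree (dynamic pooling)."""
    tree = ast.parse(src)
    feats = [conv_node(n) for n in ast.walk(tree)]
    return np.max(feats, axis=0)
```

The resulting fixed-size vector could feed a classifier; in TBCNN the weights are of course learned end-to-end rather than random.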

AAAI Conference 2016 Conference Paper

Path Following with Adaptive Path Estimation for Graph Matching

  • Tao Wang
  • Haibin Ling

Graph matching plays an important role in many fields of computer vision. It is a well-known NP-hard problem that has been investigated for decades. Among the many algorithms for graph matching, those utilizing the path following strategy exhibit state-of-the-art performance. However, the main drawback of this category of algorithms lies in its high computational burden. In this paper, we propose a novel path following strategy for graph matching that improves its computational efficiency. We first propose a path estimation method to reduce the computational cost of each iteration, and then a method of adaptive step length to accelerate convergence. The proposed approach can be integrated into any algorithm that utilizes the path following strategy. To validate our approach, we compare it with several recently proposed graph matching algorithms on three benchmark image datasets. Experimental results show that our approach significantly improves the computational efficiency of the original algorithms while offering similar or better matching results.
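The strategy can be illustrated on a one-dimensional toy problem (our own construction, not the paper's graph-matching objective): interpolate from a convex relaxation `f0` to a concave one `f1`, warm-start each solve, and adapt the step in λ according to how far the minimizer moved:

```python
def f0(x):                      # convex relaxation (smooth, easy to minimize)
    return (x - 0.7) ** 2

def f1(x):                      # concave relaxation (minima at the corners {0, 1})
    return -(x - 0.5) ** 2

def minimize_interp(lam, x0, lr=0.05, steps=200):
    """Projected gradient descent on F_lam = (1-lam)*f0 + lam*f1 over [0, 1]."""
    x = x0
    for _ in range(steps):
        g = (1 - lam) * 2 * (x - 0.7) + lam * (-2) * (x - 0.5)
        x = min(1.0, max(0.0, x - lr * g))
    return x

def path_follow(step0=0.05, tol=1e-3):
    """Track the minimizer as lam goes 0 -> 1, warm-starting each solve."""
    lam, step, x = 0.0, step0, 0.5
    while lam < 1.0:
        lam = min(1.0, lam + step)
        x_new = minimize_interp(lam, x)   # warm start from the previous minimizer
        # adaptive step length: grow while the path barely moves, shrink otherwise
        step = step * 2 if abs(x_new - x) < tol else max(step0, step / 2)
        x = x_new
    return x
```

The concave end pushes the solution to a discrete corner (here x = 1), which is exactly why matching algorithms follow this path from a continuous relaxation to a combinatorial solution.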

IS Journal 2014 Journal Article

Characterizing the Evolution of Social Computing Research

  • Tao Wang
  • Zhong Liu
  • Baoxin Xiu
  • Hong Mo
  • Qingpeng Zhang

With Web 2.0 advances, social computing has become an emerging research field in the past decade. This article analyzes the characteristics of social computing research from both static and dynamic perspectives. First, the authors present the overlapping relationships of content, represented by keywords, as of 2011. Next, they show the dynamics of social computing research by analyzing keyword trends and the topological evolution of co-word networks. The article characterizes the key features and the evolution of social computing from a quantitative perspective.
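Co-word analysis of the kind described here starts from a simple co-occurrence count over each paper's keyword list (a minimal sketch; real studies typically add frequency thresholds and normalization before building the network):

```python
from collections import Counter
from itertools import combinations

def coword_edges(papers):
    """Count how often each keyword pair co-occurs in a paper's keyword list.
    `papers` is a list of keyword lists; edges are returned as sorted pairs."""
    edges = Counter()
    for kws in papers:
        for a, b in combinations(sorted(set(kws)), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting weighted edge list is what topological measures (degree, clustering, components) are then computed on.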

IS Journal 2014 Journal Article

Collaboration Pattern and Topic Analysis on Intelligence and Security Informatics Research

  • Wenli Liu
  • Xiaolong Zheng
  • Tao Wang
  • Hui Wang

In this article, researcher collaboration patterns and research topics on Intelligence and Security Informatics (ISI) are investigated using social network analysis approaches. The collaboration networks exhibit scale-free property and small-world effect. From these networks, the authors obtain the key researchers, institutions, and three important topics.

ICRA Conference 2014 Conference Paper

On-board inertial-assisted visual odometer on an embedded system

  • Guyue Zhou
  • Jiaxin Ye
  • Wei Ren
  • Tao Wang
  • Zexiang Li 0001

In this paper, we propose a novel inertial-assisted visual odometry system intended for low-cost micro aerial vehicles (MAVs). The system sensor assembly consists of two downward-facing cameras and an inertial measurement unit (IMU) with three-axis accelerometers/gyroscopes. Real-time implementation of the system is enabled by a low-cost embedded system via two important features: firstly, simple pixel-level algorithms are integrated in a low-end FPGA and accelerated via pipeline and combinational logic techniques; secondly, a fast yaw-and-translation estimation algorithm works well with a novel outlier rejection scheme based on probabilistic predetermined operations rather than hypothesis testing iterations. We illustrate the performance of our system by hovering a MAV in a GPS-denied environment. Its feasibility and robustness are also demonstrated in complex outdoor environments.

ICML Conference 2013 Conference Paper

Deep learning with COTS HPC systems

  • Adam Coates 0002
  • Brody Huval
  • Tao Wang
  • David J. Wu 0001
  • Bryan Catanzaro
  • Andrew Y. Ng

Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.
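The data-parallel pattern behind such GPU/MPI clusters can be sketched in a few lines (a single-process simulation: the list of shards stands in for workers, and the mean over gradients stands in for an `MPI_Allreduce`; the model here is a toy least-squares fit, not the paper's networks):

```python
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one worker's shard: d/dw mean((Xw - y)^2)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous step: each worker computes its shard gradient, then an
    allreduce-style average replaces the MPI communication, and all workers
    apply the same update."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    g = np.mean(grads, axis=0)        # stand-in for MPI_Allreduce(avg)
    return w - lr * g
```

With equal shard sizes the averaged gradient equals the full-batch gradient, so the distributed iteration matches the serial one exactly.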

TIST Journal 2011 Journal Article

Automatic player labeling, tracking and field registration and trajectory mapping in broadcast soccer video

  • Xiaofeng Tong
  • Jia Liu
  • Tao Wang
  • Yimin Zhang

In this article, we present a method for automatic player trajectory mapping based on player detection, unsupervised labeling, efficient multi-object tracking, and playfield registration in broadcast soccer videos. The player detector determines the players' positions and scales by combining dominant-color-based background subtraction with a boosting detector using Haar features. We first learn the dominant color with an accumulated color histogram at the beginning of processing, then use the player detector to collect hundreds of player samples, and learn a player appearance codebook by unsupervised clustering. In a soccer game, a player can be labeled as one of four categories: one of the two teams, referee, or outlier. This learning capability enables the method to generalize well to different videos without any manual initialization. With the dominant color and the player appearance model, we can locate and label each player. After that, we perform multi-object tracking using Markov Chain Monte Carlo (MCMC) data association to generate player trajectories. Several data-driven dynamics, such as label consistency, motion consistency, and track length, are proposed to improve the Markov chain's efficiency. Finally, we extract key points, find the mapping from the image plane to the standard field model, and map the players' positions and trajectories onto the field. Extensive experimental results on FIFA World Cup 2006 videos demonstrate that this method achieves high detection and labeling precision, reliable tracking under player occlusion, moderate camera motion, and pose variation, and promising field registration results.
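The dominant-color background-subtraction step can be sketched as a histogram peak over quantized colors (a toy single-frame version; the paper accumulates the histogram over time and applies it per channel to the playfield color):

```python
import numpy as np

def dominant_color_mask(frame, n_bins=8):
    """Mask pixels whose quantized color equals the most frequent (dominant)
    one, a stand-in for the green playfield in broadcast soccer video.
    `frame` is an (H, W, 3) uint8 image."""
    q = (frame // (256 // n_bins)).astype(int)                 # quantize channels
    codes = q[..., 0] * n_bins * n_bins + q[..., 1] * n_bins + q[..., 2]
    dominant = np.bincount(codes.ravel()).argmax()             # histogram peak
    return codes == dominant                                   # True = background
```

Everything outside the mask (players, referee, ball) becomes the foreground passed to the boosting detector.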

IJCAI Conference 2007 Conference Paper

  • Jianguo Li
  • Changshui Zhang
  • Tao Wang
  • Yimin Zhang

Bayesian network classifiers (BNCs) have received considerable attention in the machine learning field. Several special-structure BNCs have been proposed and have demonstrated promising performance. However, recent work shows that structure learning in BNs may suffer from a non-negligible posterior problem, i.e., many structures may have similar posterior scores. In this paper, we propose a generalized additive Bayesian network classifier, which transforms the structure learning problem into a generalized additive model (GAM) learning problem. We first generate a series of very simple BNs and place them in the GAM framework, then adopt a gradient-based algorithm to learn the combining parameters, thus constructing a more powerful classifier. On a large suite of benchmark data sets, the proposed approach outperforms many traditional BNCs, such as naive Bayes and TAN, and achieves comparable or better performance than boosted Bayesian network classifiers.
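The combining step — fitting weights over a fixed set of simple base classifiers by gradient descent — can be sketched with a logistic loss (our own minimal stand-in: `scores` holds each base classifier's real-valued output, whereas the paper combines simple BN scores within a GAM):

```python
import numpy as np

def fit_additive(scores, y, lr=0.5, steps=500):
    """Learn combining weights alpha for base-classifier scores by gradient
    descent on the logistic loss. `scores` is (n_samples, n_classifiers),
    labels y are in {0, 1}."""
    n, k = scores.shape
    alpha = np.zeros(k)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-scores @ alpha))   # combined prediction
        alpha -= lr * scores.T @ (p - y) / n        # logistic-loss gradient
    return alpha
```

An informative base classifier ends up with a large weight while an uninformative one is driven toward zero, which is the "more powerful classifier" effect the abstract describes.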

IJCAI Conference 2007 Conference Paper

  • Daniel Lizotte
  • Tao Wang
  • Michael Bowling
  • Dale Schuurmans

Gait optimization is a basic yet challenging problem for both quadrupedal and bipedal robots. Although techniques for automating the process exist, most involve local function optimization procedures that suffer from three key drawbacks. Local optimization techniques are naturally plagued by local optima, make no use of the expensive gait evaluations once a local step is taken, and do not explicitly model noise in gait evaluation. These drawbacks increase the need for a large number of gait evaluations, making optimization slow, data inefficient, and manually intensive. We present a Bayesian approach based on Gaussian process regression that addresses all three drawbacks. It uses a global search strategy based on a posterior model inferred from all of the individual noisy evaluations. We demonstrate the technique on a quadruped robot, using it to optimize two different criteria: speed and smoothness. We show that, in both cases, our technique requires dramatically fewer gait evaluations than state-of-the-art local gradient approaches.
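The loop can be sketched as standard GP-based Bayesian optimization (our own minimal 1-D version with a made-up noisy "speed" curve peaking near x = 0.6; the paper's kernel, acquisition details, and gait parameterization differ):

```python
import numpy as np
from math import erf

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_posterior(xs, X, y, noise=1e-4):
    """GP posterior mean and variance at query points xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

_ncdf = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / 2 ** 0.5)))

def expected_improvement(mu, var, best):
    """EI acquisition for maximization: explores high variance, exploits high mean."""
    sd = np.sqrt(var)
    z = (mu - best) / sd
    return (mu - best) * _ncdf(z) + sd * np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(1)

def speed(x):
    """Hypothetical noisy gait-speed objective, peak near x = 0.6."""
    return np.exp(-30.0 * (x - 0.6) ** 2) + 0.01 * rng.standard_normal()

grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.1, 0.9])                   # two initial gait evaluations
y = np.array([speed(x) for x in X])
for _ in range(15):                        # far fewer evaluations than local search
    mu, var = gp_posterior(grid, X, y)
    x_next = grid[np.argmax(expected_improvement(mu, var, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, speed(x_next))
best_x = X[np.argmax(y)]
```

Because every noisy evaluation updates the posterior, no measurement is wasted, which is precisely the data-efficiency argument in the abstract.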

NeurIPS Conference 2007 Conference Paper

Stable Dual Dynamic Programming

  • Tao Wang
  • Michael Bowling
  • Dale Schuurmans
  • Daniel Lizotte

Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.

AAAI Conference 2006 Short Paper

Action Selection in Bayesian Reinforcement Learning

  • Tao Wang

My research attempts to address on-line action selection in reinforcement learning from a Bayesian perspective. The idea is to develop more effective action selection techniques by exploiting information in a Bayesian posterior, while also selecting actions by growing an adaptive, sparse lookahead tree. I further augment the approach by considering a new value function approximation strategy for the belief-state Markov decision processes induced by Bayesian learning.

AAAI Conference 2006 Conference Paper

Compact, Convex Upper Bound Iteration for Approximate POMDP Planning

  • Tao Wang
  • Michael Bowling

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that can deliver reasonable results in practice has proved to be a significant challenge. In this paper, we present a new approach to approximate value-iteration for POMDP planning that is based on quadratic rather than piecewise linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach can achieve competitive approximation quality to current techniques while still maintaining a bounded size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that is only linear in the number of iterations (horizon time).