Arrow Research

Author name cluster

Yi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

125 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

  • Guanghao Zhang
  • Tao Zhong
  • Yan Xia
  • Mushui Liu
  • Zhelun Yu
  • Haoyuan Li
  • Wanggui He
  • Dong She

While previous multimodal slow-thinking methods have demonstrated remarkable success in single-image understanding scenarios, their effectiveness becomes fundamentally constrained when extended to more complex multi-image comprehension tasks. This limitation stems from their predominant reliance on text-based intermediate reasoning processes. Humans engaged in sophisticated multi-image analysis, by contrast, typically perform two complementary cognitive operations: (1) continuous cross-image visual comparison through region-of-interest matching, and (2) dynamic memorization of critical visual concepts throughout the reasoning chain. Motivated by these observations, we propose the Complex Multi-Modal Chain-of-Thought (CMMCoT) framework, a multi-step reasoning framework that mimics human-like "slow thinking" for multi-image understanding. Our approach incorporates two key innovations: (1) The construction of interleaved multimodal multi-step reasoning chains, which use critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals. This mechanism not only facilitates comprehensive cross-modal understanding but also enhances model interpretability. (2) The introduction of a test-time memory augmentation module that expands the model’s reasoning capacity during inference while preserving parameter efficiency. Furthermore, to facilitate research in this direction, we have curated a novel multi-image slow-thinking dataset. Extensive experiments demonstrate the effectiveness of our model.

AAAI Conference 2026 Conference Paper

Diffusion Reconstruction-based Data Likelihood Estimation for Core-Set Selection

  • Mingyang Chen
  • Jiawei Du
  • Bo Huang
  • Yi Wang
  • Xiaobo Zhang
  • Wei Wang

Existing core-set selection methods predominantly rely on heuristic scoring signals such as training dynamics or model uncertainty, lacking explicit modeling of data likelihood. This omission may hinder the constructed subset from capturing subtle yet critical distributional structures that underpin effective model training. In this work, we propose a novel, theoretically grounded approach that leverages diffusion models to estimate data likelihood via reconstruction deviation induced by partial reverse denoising. Specifically, we establish a formal connection between reconstruction error and data likelihood, grounded in the Evidence Lower Bound (ELBO) of Markovian diffusion processes, thereby enabling a principled, distribution-aware scoring criterion for data selection. Complementarily, we introduce an efficient information-theoretic method to identify the optimal reconstruction timestep, ensuring that the deviation provides a reliable signal indicative of underlying data likelihood. Extensive experiments on ImageNet demonstrate that reconstruction deviation offers an effective scoring criterion, consistently outperforming existing baselines across selection ratios, and closely matching full-data training using only 50% of the data. Further analysis shows that the likelihood-informed nature of our score reveals informative insights in data selection, shedding light on the interplay between data distributional characteristics and model learning preferences.
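The scoring idea in this abstract (noise a sample part-way, denoise it back, and use the deviation as a likelihood signal) can be illustrated with a minimal sketch. It assumes a hypothetical `denoise` callable standing in for a trained diffusion model's partial reverse-denoising pass, a single fixed `alpha_bar_t` instead of the paper's information-theoretic timestep search, and a squared-error deviation. Whether low- or high-deviation samples should be kept is also an assumption, exposed as `keep_lowest`. Only the forward step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, is the standard DDPM formula.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for a trained model's partial reverse-denoising pass
# from timestep t back to a clean estimate of x_0.
Denoiser = Callable[[Vector], Vector]


def forward_noise(x0: Sequence[float], alpha_bar_t: float, rng: random.Random) -> Vector:
    """Standard DDPM forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [a * xi + s * rng.gauss(0.0, 1.0) for xi in x0]


def reconstruction_deviation(x0: Sequence[float], denoise: Denoiser,
                             alpha_bar_t: float, rng: random.Random) -> float:
    """Squared error between a sample and its noise-then-denoise reconstruction."""
    x_hat = denoise(forward_noise(x0, alpha_bar_t, rng))
    return sum((a - b) ** 2 for a, b in zip(x0, x_hat))


def select_core_set(data: Sequence[Sequence[float]], denoise: Denoiser,
                    alpha_bar_t: float, ratio: float,
                    keep_lowest: bool = True, seed: int = 0) -> List[int]:
    """Return sorted indices of the selected subset (selection direction is an assumption)."""
    rng = random.Random(seed)
    scores = [reconstruction_deviation(x, denoise, alpha_bar_t, rng) for x in data]
    order = sorted(range(len(data)), key=lambda i: scores[i], reverse=not keep_lowest)
    k = max(1, int(round(ratio * len(data))))
    return sorted(order[:k])


# Toy usage: a "denoiser" that always predicts zeros makes the deviation
# equal to the squared norm of each sample, regardless of the noise.
data = [[3.0], [1.0], [2.0], [0.5]]
print(select_core_set(data, lambda x: [0.0] * len(x), alpha_bar_t=0.5, ratio=0.5))  # → [1, 3]
```

With a 0.5 ratio the function keeps half of the samples, mirroring the 50% setting in the abstract; a faithful version would replace the toy denoiser, the fixed timestep, and the ranking direction with the paper's choices.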

AAAI Conference 2026 Conference Paper

FreeMem: Enhancing Consistency in Long Video Generation via Tuning-Free Memory

  • Jibin Peng
  • Di Lin
  • Zhecheng Xu
  • Haoran Lu
  • Ruonan Liu
  • Wuyuan Xie
  • Miaohui Wang
  • Lingyu Liang

Text-to-Video (T2V) generation has advanced greatly, yet maintaining consistency remains challenging, especially for tuning-free long video generation. We attribute the consistency problem to deviations that accumulate at three levels during long video generation: uncorrelated random noise introduces initial deviation between frames; discrepancies in semantic feature tokens between denoising network blocks accumulate as the frame count grows, leading to larger deviations; and attention mechanisms struggle to capture global relationships across distant frames in long videos. To address these issues, we propose FreeMem, a tuning-free framework that leverages hierarchical memory update and injection: the noise memory stabilizes consistency by manipulating low- and high-frequency components in the initial noise space; the token memory combats inconsistency through adaptive fusion of historical and current semantic feature tokens between denoising network blocks; and the attention memory establishes a persistent cache to model long-range relationships within self-attention layers. Evaluated on VBench, FreeMem improves subject and background consistency metrics across various methods, offering a practical solution for low-cost, high-consistency long video generation.

EAAI Journal 2026 Journal Article

Granular-ball based robust representation learning for social recommendation

  • Xiaofei Zhu
  • Shiyan Wu
  • Li Liu
  • Shuyin Xia
  • Yi Wang
  • Guoyin Wang

Social recommendation systems seek to leverage social relationships to mitigate data sparsity and cold-start issues by augmenting user–item interactions. However, existing methods encounter two critical limitations: (1) they predominantly model user–item interactions at the fine-grained level of individual user/item nodes, neglecting potential coarse-grained collaborative patterns; and (2) they usually suppress noisy edges in social graphs from a single granular perspective, failing to adjust the denoising granularity according to the actual strength of relationships between users. To address these challenges, we propose GBRSR, a novel Granular-ball based Robust Representation Learning framework. Inspired by the “Global-first” cognitive principle, Granular-ball Computing (GBC), which represents data as granular-ball units with geometric significance, has garnered significant attention due to its outstanding performance in many fields. We leverage GBC theory for representation distillation, transferring coarse-grained knowledge to enhance fine-grained node-level representations. In addition, we employ a granular-ball based structure denoising strategy to prune noisy user relationships, while simultaneously alleviating noise in user representations through a diffusion process. Extensive experiments on three real-world benchmark datasets validate the superiority of GBRSR in recommendation accuracy and robustness, particularly under noisy and sparse conditions.

AAAI Conference 2026 Conference Paper

InterCoser: Interactive 3D Character Creation with Disentangled Fine-Grained Features

  • Yi Wang
  • Jian Ma
  • Zhuo Su
  • Guidong Wang
  • Jingyu Yang
  • Yu-Kun Lai
  • Kun Li

This paper aims to interactively generate and edit disentangled 3D characters based on precise user instructions. Existing methods generate and edit 3D characters via rough and simple editing guidance and entangled representations, making it difficult to achieve precise and comprehensive control over fine-grained local editing and free clothing transfer for characters. To enable accurate and intuitive control over the generation and editing of high-quality 3D characters with freely interchangeable clothing, we propose a novel user-interactive approach for disentangled 3D character creation. Specifically, to achieve precise control over 3D character generation and editing, we introduce two user-friendly interaction approaches: a sketch-based layered character generation/editing method, which supports clothing transfer; and a 3D-proxy-based part-level editing method, enabling fine-grained disentangled editing. To enhance 3D character quality, we propose a 3D Gaussian reconstruction strategy guided by geometric priors, ensuring that 3D characters exhibit detailed local geometry and smooth global surfaces. Extensive experiments on both public datasets and in-the-wild data demonstrate that our approach not only generates high-quality disentangled 3D characters but also supports precise and fine-grained editing through user interaction.

JBHI Journal 2026 Journal Article

mTIPs: Multimodal Time-Series Imaging and Prediction System for Early Differentiation of Retinal Organoids

  • Yi Wang
  • Yilin Qiao
  • Junao Song
  • Yuan-zhi Liu
  • Min Ye
  • Jianbo Mao
  • Ronald X. Xu
  • Mingzhai Sun

Retinal organoids (ROs) hold immense promise for disease modeling and therapy, but their production is hindered by developmental heterogeneity, necessitating robust characterization and early prediction. However, the limited application of current methods to human stem cell–derived organoids reduces clinical relevance; the omission of 3D multimodal imaging leads to incomplete morphological assessment; and the lack of time-series analysis diminishes applicability. To address these gaps, we developed mTIPs (Multimodal Time Series Imaging and Prediction System), which enables automated, early-stage quality control of human embryonic stem cell (hESC)-derived ROs. Firstly, we present the first publicly available multimodal, time-series imaging dataset of hESC-derived retinal organoids, establishing a foundational resource with significant clinical relevance that was previously lacking. Furthermore, we highlight the advantages of 3D imaging, demonstrating that Optical Coherence Tomography (OCT) derived volumetric data offers unique insights into development and significantly enhances early predictive accuracy compared to traditional 2D bright-field microscopy. Notably, we developed a practical, unified model using a single time-encoding network that achieves an AUROC exceeding 0.8 at all time points, including as early as Day 6, significantly outperforming expert manual classification. mTIPs offers a scalable framework to standardize organoid production, enhancing the reliability of ROs for research and future clinical applications.
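The AUROC figure quoted above can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it directly from that pairwise definition, with ties counting half. It is an illustrative O(n·m) implementation, not the evaluation code used for mTIPs, and the 1/0 labels (for example, organoids judged good or poor) and scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four organoids (1 = good, 0 = poor).
print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # → 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking, so a value above 0.8 means positives are ranked above negatives in more than 80% of positive/negative pairs.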

EAAI Journal 2026 Journal Article

Multimodal and view image processing for three-dimensional object detection in autonomous driving

  • Yi Wang
  • Hang Dong
  • Hua Bo

Three-dimensional object detection is critical for AI-driven autonomous driving, as it allows vehicles to detect and accurately locate objects in complex and dynamic environments. Traditional methods that rely on light detection and ranging or camera data often face challenges in scenarios involving occlusion, varying lighting conditions, and cluttered backgrounds. This paper proposes a novel multimodal approach that integrates radar point cloud data with multiview image features, including bird’s eye view and front view, to leverage complementary information for enhanced detection accuracy and robustness. We introduce a “voting mechanism” that generates high-quality candidate regions, combined with a target attention mechanism that improves feature extraction and object localization, particularly in difficult environments. By effectively combining the strengths of three-dimensional and two-dimensional data, our method addresses occlusion and noise issues while enhancing detection capabilities across diverse conditions. We validated the effectiveness of the proposed method on three object detection datasets, achieving mean average precision of 90.30%, 72.4%, and 75.1%, respectively.

AAAI Conference 2026 Conference Paper

P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering

  • Wenlin Zhong
  • Chengyuan Liu
  • Yiquan Wu
  • Bovin Tan
  • Changlong Sun
  • Yi Wang
  • Xiaozhong Liu
  • Kun Kuang

While reinforcement learning with verifiable rewards (RLVR) has advanced LLM reasoning in structured domains like mathematics and programming, its application to general-domain reasoning tasks remains challenging due to the absence of verifiable reward signals. To this end, methods like Reinforcement Learning with Reference Probability Reward (RLPR) have emerged, leveraging the probability of generating the final answer as a reward signal. However, these outcome-focused approaches neglect crucial step-by-step supervision of the reasoning process itself. To address this gap, we introduce Probabilistic Process Supervision (P2S), a novel self-supervision framework that provides fine-grained process rewards without requiring a separate reward model or human-annotated reasoning steps. During reinforcement learning, P2S synthesizes and filters a high-quality reference reasoning chain (gold-CoT). The core of our method is to calculate a Path Faithfulness Reward (PFR) for each reasoning step, which is derived from the conditional probability of generating the gold-CoT's suffix, given the model's current reasoning prefix. Crucially, this PFR can be flexibly integrated with any outcome-based reward, directly tackling the reward sparsity problem by providing dense guidance. Extensive experiments on reading comprehension and medical Question Answering benchmarks show that P2S significantly outperforms strong baselines.
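The Path Faithfulness Reward is described here only at a high level, so the following is a hedged sketch of one way to turn "the conditional probability of generating the gold-CoT's suffix, given the model's current reasoning prefix" into a per-step reward. The `logprob_fn` interface, the pairing of each prefix with a gold suffix, the length normalisation via the geometric-mean token probability, and the weighted sum with the outcome reward are all assumptions for illustration, not the P2S formulation.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: per-token log-probabilities of `suffix` under the
# policy model when it is conditioned on `prefix`.
LogProbFn = Callable[[str, str], List[float]]


def path_faithfulness_rewards(steps: Sequence[str], gold_suffixes: Sequence[str],
                              logprob_fn: LogProbFn) -> List[float]:
    """One reward per reasoning step: the geometric-mean token probability of the
    gold suffix paired with that step, given the model's prefix up to that step.
    If the two sequences differ in length, extra items are ignored (zip)."""
    rewards: List[float] = []
    prefix = ""
    for step, gold_suffix in zip(steps, gold_suffixes):
        prefix += step
        token_lps = logprob_fn(prefix, gold_suffix)
        mean_lp = sum(token_lps) / max(1, len(token_lps))
        rewards.append(math.exp(mean_lp))
    return rewards


def combined_reward(outcome_reward: float, step_rewards: Sequence[float],
                    weight: float = 0.5) -> float:
    """Outcome reward plus a weighted average of the dense per-step rewards."""
    dense = sum(step_rewards) / max(1, len(step_rewards))
    return outcome_reward + weight * dense


# Toy "model": gold tokens become likelier once the prefix contains "good".
def toy_logprob(prefix: str, suffix: str) -> List[float]:
    p = 0.5 if "good" in prefix else 0.1
    return [math.log(p)] * len(suffix.split())


step_rewards = path_faithfulness_rewards(["bad step. ", "good step. "], ["a b", "c"], toy_logprob)
print(step_rewards, combined_reward(1.0, step_rewards))
```

Because every step receives its own value, the dense term still carries signal when the outcome reward is identical across rollouts, which is the sparsity issue the abstract targets.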

AAAI Conference 2026 Conference Paper

Permutation Equivariant Framelet-based Hypergraph Neural Networks

  • Ming Li
  • Yi Wang
  • Chengling Gao
  • Lu Bai
  • Yujie Fang
  • Xiaosheng Zhuang
  • Pietro Lio

Hypergraphs provide a natural and expressive framework for modeling high-order relationships, enabling the representation of group-wise interactions beyond pairwise connections. While hypergraph neural networks (HNNs) have shown promise for learning on such structures, existing models often rely on shallow message passing and lack the ability to extract multiscale patterns. Framelet-based techniques offer a principled solution by decomposing signals into multiple frequency bands. However, most prior framelet systems, particularly Haar-type ones, are sensitive to node ordering and fail to ensure consistent representations under permutation, leading to instability in hypergraph learning. To address this, we propose Permutation Equivariant Framelet-based Hypergraph Neural Networks (PEF-HNN), a novel framework that integrates multiscale framelet analysis with permutation-consistent learning. We construct a new family of permutation equivariant Haar-type framelets specifically designed for hypergraphs, supported by theoretical analysis of their stability and decomposition properties. Built upon these framelets, PEF-HNN incorporates both low-pass and high-pass components across multiple scales into a unified neural architecture. Extensive experiments on nine benchmark datasets, including three homophilic and four heterophilic hypergraphs, as well as two real-world datasets for visual object classification, demonstrate the effectiveness of our approach, consistently outperforming existing HNN baselines and highlighting the advantages of permutation equivariant framelet design in hypergraph representation learning.

AAAI Conference 2026 Conference Paper

REFO: Reinforced Evolutionary Faithfulness Optimization for Large Language Models

  • Yi Wang
  • Xiaqiang Tang
  • Keyu Hu
  • Haojie Lu
  • Sihong Xie

Despite its success in enriching LLMs with external knowledge, RAG remains plagued by faithfulness hallucinations, where generated text contradicts the retrieved source information. Previous research on faithfulness hallucination in LLMs is frequently hindered by prohibitive manual annotation costs and a dependency on static datasets, which caps their performance and adaptability. Furthermore, these models lack a clear training mechanism to explicitly promote contextual focus. In this work, we propose a novel iterative self-evolution framework to enhance model faithfulness. This framework autonomously generates high-quality data and leverages it for the continuous self-optimization of the model, leading to significant improvements in faithfulness. Our experimental analysis reveals that improving model faithfulness encourages a closer alignment of the attention distribution with the given context. Based on this finding, we design an attention-based loss function to further promote this process. Experimental results show that our model achieves state-of-the-art faithfulness on a range of context-based question-answering datasets, marking a significant advancement over previous approaches.

AAAI Conference 2026 Conference Paper

Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction

  • Jiafu Huang
  • Chao Peng
  • Chenyang Xu
  • Zhengfeng Yang
  • Kecheng Cai
  • Chenhao Zhang
  • Yi Wang
  • Yiwei Gong

Neural algorithmic reasoning has recently emerged as a popular research direction. It aims to train neural networks to mimic the step-by-step behavior of classical rule-based algorithms. More specifically, the execution of such algorithms can be abstracted as a sequence of states, where each state represents the intermediate outcome after an execution step. The training objective is to generate state sequences that replicate the underlying algorithmic process. A common framework for this task adopts an "encoder-processor-decoder" architecture, where the encoder learns representations of states, the processor simulates algorithmic steps, and the decoder reconstructs output states. While prior work has primarily focused on improving the processor, the role of the encoder in representation learning has received little attention. Most existing methods rely on simple MLP encoders, raising the question of whether such representations are sufficiently informative for supporting algorithmic reasoning. This paper investigates how to improve encoder representations for neural algorithmic reasoning. We propose a reconstruction module that aims to recover the input state from its encoded representation. This auxiliary reconstruction task encourages the encoder to retain critical information about the input. We demonstrate that incorporating this task during training improves the performance of existing neural architectures on standard benchmarks. Furthermore, we observe that current encoders often underutilize the correlations among features within a state. To address this, we draw inspiration from self-supervised learning and design an enhanced variant of the auxiliary task that encourages the encoder to capture intra-state feature dependencies. Experimental results show that our method enables the encoder to learn richer representations, thereby enhancing the performance of existing processors on algorithmic reasoning tasks.
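A minimal sketch of the auxiliary objective: a linear encoder maps the input state to a hidden vector, a task head predicts the output, and a separate reconstruction head tries to recover the input state from the same hidden vector, with the two errors combined as task + lam * reconstruction. The linear maps, the mean-squared errors, the weight `lam`, and the omission of the processor are simplifying assumptions; the paper's enhanced variant for intra-state feature dependencies is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def loss_with_reconstruction(state: Sequence[float], target: Sequence[float],
                             w_enc: Matrix, w_task: Matrix, w_rec: Matrix,
                             lam: float = 0.1) -> Tuple[float, float, float]:
    """Return (total, task, reconstruction) losses for one input state."""
    hidden = matvec(w_enc, state)               # encoder representation
    task = mse(matvec(w_task, hidden), target)  # main prediction error
    rec = mse(matvec(w_rec, hidden), state)     # auxiliary: recover the input state
    return task + lam * rec, task, rec


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# A reconstruction head that outputs zeros cannot recover the state [1, 2].
print(loss_with_reconstruction([1.0, 2.0], [3.0], identity, [[1.0, 1.0]], zeros))
```

The reconstruction term penalises encoders that discard input information even when the task loss alone would not notice, which is the motivation the abstract gives for the auxiliary task.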

AAAI Conference 2026 Conference Paper

Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation

  • Chenggong Hu
  • Yi Wang
  • Mengqi Xue
  • Haofei Zhang
  • Jie Song
  • Li Sun

Textile pattern generation (TPG) aims to synthesize fine-grained textile pattern images based on given clothing images. Although previous studies have not explicitly investigated TPG, existing image-to-image models appear to be natural candidates for this task. However, when applied directly, these methods often produce unfaithful results, failing to preserve fine-grained details due to feature confusion between complex textile patterns and the inherent non-rigid texture distortions in clothing images. In this paper, we propose a novel method, SLDDM-TPG, for faithful and high-fidelity TPG. Our method consists of two stages: (1) a latent disentangled network (LDN) that resolves feature confusion in clothing representations and constructs a multi-dimensional, independent clothing feature space; and (2) a semi-supervised latent diffusion model (S-LDM), which receives guidance signals from LDN and generates faithful results through semi-supervised diffusion training, combined with our designed fine-grained alignment strategy. Extensive evaluations show that SLDDM-TPG reduces FID by 4.1 and improves SSIM by up to 0.116 on our CTP-HD dataset, and also demonstrate good generalization on the VITON-HD dataset.

AAAI Conference 2026 Conference Paper

SSTODE: Ocean-Atmosphere Physics-Informed Neural ODEs for Sea Surface Temperature Prediction

  • Zheng Jiang
  • Wei Wang
  • Gaowei Zhang
  • Yi Wang

Sea Surface Temperature (SST) is crucial for understanding upper-ocean thermal dynamics and ocean-atmosphere interactions, which have profound economic and social impacts. While data-driven models show promise in SST prediction, their black-box nature often limits interpretability and overlooks key physical processes. Recently, physics-informed neural networks have been gaining momentum but struggle with complex ocean-atmosphere dynamics due to 1) inadequate characterization of seawater movement (e.g., coastal upwelling) and 2) insufficient integration of external SST drivers (e.g., turbulent heat fluxes). To address these challenges, we propose SSTODE, a physics-informed Neural Ordinary Differential Equations (Neural ODEs) framework for SST prediction. First, we derive ODEs from fluid transport principles, incorporating both advection and diffusion to model ocean spatiotemporal dynamics. Through variational optimization, we recover a latent velocity field that explicitly governs the temporal dynamics of SST. Building upon these ODEs, we introduce an Energy Exchanges Integrator (EEI), inspired by ocean heat budget equations, to account for external forcing factors. Variations in the components of these factors thus provide deeper insight into SST dynamics. Extensive experiments demonstrate that SSTODE achieves state-of-the-art performance on global and regional SST forecasting benchmarks. Furthermore, SSTODE visually reveals the impact of advection dynamics, thermal diffusion patterns, and diurnal heating-cooling cycles on SST evolution. These findings demonstrate the model's interpretability and physical consistency.
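The fluid-transport ODE described above has the textbook advection-diffusion form dT/dt = -u * dT/dx + kappa * d2T/dx2 (in one dimension). The sketch below advances a 1-D periodic temperature field by one explicit Euler step, using first-order upwind differences for advection and a central-difference Laplacian for diffusion. It is a standard discretization chosen for illustration, not SSTODE's solver: the velocity `u` is given rather than recovered by variational optimization, and the EEI forcing term is omitted.

```python
from typing import List, Sequence


def advection_diffusion_step(T: Sequence[float], u: Sequence[float], kappa: float,
                             dx: float, dt: float) -> List[float]:
    """One explicit Euler step of dT/dt = -u * dT/dx + kappa * d2T/dx2 on a periodic 1-D grid."""
    n = len(T)
    out: List[float] = []
    for i in range(n):
        left, right = T[(i - 1) % n], T[(i + 1) % n]
        # First-order upwind difference: look "upstream" of the local velocity.
        dTdx = (T[i] - left) / dx if u[i] >= 0 else (right - T[i]) / dx
        laplacian = (right - 2.0 * T[i] + left) / dx ** 2
        out.append(T[i] + dt * (-u[i] * dTdx + kappa * laplacian))
    return out


# A warm anomaly diffusing (no current) on a 4-cell ring.
print(advection_diffusion_step([0.0, 1.0, 0.0, 0.0], [0.0] * 4, kappa=0.1, dx=1.0, dt=1.0))
```

Keeping |u| * dt / dx + 2 * kappa * dt / dx**2 <= 1 makes every update coefficient non-negative, which is sufficient for this explicit scheme to stay stable.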

AAAI Conference 2026 Conference Paper

Think Then Rewrite: Reasoning Enhanced Query Rewriting for Domain Specific Retrieval

  • Ang Li
  • Yufei Shi
  • Yuxuan Si
  • Yiquan Wu
  • Ming Cai
  • Xu Tan
  • Yi Wang
  • Changlong Sun

Query rewriting is a crucial task for improving retrieval, especially in professional domains such as law and medicine, where user queries are often underspecified and ambiguous. While large language models (LLMs) offer strong understanding and generation capabilities, existing LLM-based approaches reduce the task to text transformation or expansion, neglecting reasoning to disambiguate queries, which fails to bridge the cognitive gap between user queries and specialized documents. In this paper, we propose Think-Then-Rewrite (TTR), a reinforcement learning based framework that unleashes LLMs' reasoning ability for domain-specific query rewriting. TTR introduces a contrastive mutual information reward to encourage the LLM to generate reasoning processes that effectively distinguish confusing distractors. To boost early-stage training, TTR also constructs golden query rewrites as off‑policy data, providing strong guidance for RL learning. A mixed-policy optimization then combines on-policy and off-policy signals, ensuring both effectiveness and stability. Extensive experiments on legal and medical retrieval benchmarks demonstrate that TTR achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning

  • Meng Chu
  • Yukang Chen
  • Haokun GUI
  • Shaozuo Yu
  • Yi Wang
  • Jiaya Jia

Tourism and travel planning increasingly rely on digital assistance, yet existing multimodal AI systems often lack specialized knowledge and contextual understanding of urban environments. We present TraveLLaMA, a specialized multimodal language model designed for comprehensive travel assistance. Our work addresses the fundamental challenge of developing practical AI travel assistants through three key contributions: (1) TravelQA, a novel dataset of 265k question-answer pairs combining 160k text QA from authentic travel sources, 100k vision-language QA featuring maps and location imagery, and 5k expert-annotated Chain-of-Thought reasoning examples; (2) Travel-CoT, a structured reasoning framework that decomposes travel queries into spatial, temporal, and practical dimensions, improving answer accuracy by 10.8% while providing interpretable decision paths; and (3) an interactive agent system validated through extensive user studies. Through fine-tuning experiments on state-of-the-art vision-language models (LLaVA, Qwen-VL, Shikra), we achieve 6.2-9.4% base improvements, further enhanced by Travel-CoT reasoning. Our model demonstrates superior capabilities in contextual travel recommendations, map interpretation, and scene understanding while providing practical information such as operating hours and cultural insights. User studies with 500 participants show TraveLLaMA achieves a System Usability Scale score of 82.5, significantly outperforming general-purpose models and establishing new standards for multimodal travel assistance systems.
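The System Usability Scale score reported above follows a fixed scoring rule: ten 1-5 Likert items, where odd (positively worded) items contribute (response - 1), even (negatively worded) items contribute (5 - response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The sketch computes one respondent's score; a study-level figure such as 82.5 is typically the mean over participants, and the sample responses below are invented.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """System Usability Scale score (0-100) for one respondent's ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # → 82.5
```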

AAAI Conference 2026 Conference Paper

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

  • Zikang Wang
  • Boyu Chen
  • Zhengrong Yue
  • Yi Wang
  • Yu Qiao
  • Limin Wang
  • Yali Wang

Recent advances in video understanding have been driven by MLLMs. However, these MLLMs are good at analyzing short videos but struggle to understand videos with longer contexts. To address this difficulty, several agent paradigms have recently been proposed that use MLLMs as agents to retrieve extra contextual knowledge from a long video. However, most existing agents ignore the key fact that a long video is composed of multiple shots; that is, to answer a user question about a long video, it is critical to deeply understand its relevant shots, as humans do. Without such insight, these agents often mistakenly retrieve redundant or even noisy temporal context, restricting their capacity for long video understanding. To fill this gap, we propose VideoChat-A1, a novel long video agent paradigm. Unlike previous works, VideoChat-A1 can think deeply with long videos via a distinct chain-of-shot reasoning paradigm. More specifically, it progressively selects the shots relevant to the user question and examines these shots in a coarse-to-fine partition. By multimodal reasoning along the shot chain, VideoChat-A1 effectively mimics the step-by-step human thinking process, enabling interactive discovery of preferable temporal context for thoughtful understanding of long videos. Extensive experiments show that VideoChat-A1 achieves state-of-the-art performance on mainstream long video QA benchmarks, e.g., 77.0 on VideoMME (w/ subs) and 70.1 on EgoSchema, outperforming its strong baselines (e.g., InternVL2.5-8B and InternVideo2.5-8B) by up to 10.1% and 6.2%. Compared to the leading closed-source GPT-4o and Gemini 1.5 Pro, VideoChat-A1 offers competitive accuracy while using only 7% of the input frames and 12% of the inference time on average.

IROS Conference 2025 Conference Paper

A Deep Learning-Driven Autonomous System for Retinal Vein Cannulation: Validation Using a Chicken Embryo Model

  • Yi Wang
  • Peiyao Zhang
  • Mojtaba Esfandiari
  • Peter Gehlbach
  • Iulian I. Iordachita

Retinal vein cannulation (RVC) is a minimally invasive microsurgical procedure for treating retinal vein occlusion (RVO), a leading cause of vision impairment. However, the small size and fragility of retinal veins, coupled with the need for high-precision, tremor-free needle manipulation, create significant technical challenges. These limitations highlight the need for robotic assistance to improve accuracy and stability. This study presents an automated robotic system with a top-down microscope and B-scan optical coherence tomography (OCT) imaging for precise depth sensing. Deep learning-based models enable real-time needle navigation, contact detection, and vein puncture recognition, using a chicken embryo model as a surrogate for human retinal veins. The system autonomously detects needle position and puncture events with 85% accuracy. The experiments demonstrate notable reductions in navigation and puncture times compared to manual methods. Our results demonstrate the potential of integrating advanced imaging and deep learning to automate microsurgical tasks, providing a pathway for safer and more reliable RVC procedures with enhanced precision and reproducibility.

EAAI Journal 2025 Journal Article

Adaptive adversarial pattern contrast algorithm for black-box model and domain attack

  • Weidong Wang
  • Yi Wang
  • Zhi Li
  • Long Zheng
  • Li Zhang

The transferability of adversarial attacks in deep neural networks (DNNs) is a significant challenge, especially for achieving effective attacks across models and data domains. Unfortunately, existing attack approaches primarily focus on cross-model transferability, often overlooking the potential for black-box attacks across diverse data domains. This paper proposes the Adaptive adversarial Pattern Contrast (APEC) algorithm, designed to achieve cross-model and domain adversarial attacks with high transferability. Firstly, APEC generates transferable adversarial examples by leveraging spatial characteristics such as regional homogeneity, repetition, and density, thereby increasing classifier misclassification rates. Secondly, a key innovation in APEC is the similarity contrast loss inspired by contrastive learning. It guides the model to learn discriminative adversarial features by aligning adversarial examples with adversarial patterns and distancing them from clean examples. Importantly, this optimization is performed label-free, enhancing APEC’s practicality in real-world black-box scenarios. Additionally, we introduce a Gaussian low-pass filter in APEC to generate adversarial perturbation patterns adaptively. This operation suppresses high-frequency information while preserving the low-frequency characteristics of natural examples, enhancing APEC’s attack capabilities. The APEC algorithm shows relative improvement across models and data domains compared to state-of-the-art transferability attacks. Our code is available at https://github.com/cs-igps/APEC-TransferAttack.
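The Gaussian low-pass step can be sketched in one dimension: build a normalised Gaussian kernel and convolve it with a perturbation signal, which attenuates rapidly alternating (high-frequency) components while leaving slowly varying ones largely intact. APEC operates on image perturbations and its filter parameters are not given here, so the 1-D setting, the `sigma` value, the 3-sigma kernel radius, and the clamped border handling are assumptions.

```python
import math
from typing import List, Optional, Sequence


def gaussian_kernel(sigma: float, radius: Optional[int] = None) -> List[float]:
    """Normalised 1-D Gaussian weights on the offsets -radius..radius."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def low_pass(signal: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    out: List[float] = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


# An alternating (highest-frequency) perturbation is almost entirely removed,
# while a constant (zero-frequency) one passes through unchanged.
noisy = [(-1.0) ** i for i in range(20)]
print(max(abs(v) for v in low_pass(noisy, sigma=1.0)[3:-3]))
print(low_pass([0.2] * 5, sigma=1.0))
```

Larger `sigma` values remove more of the high-frequency content; in an attack setting this would trade perturbation detail for smoother, lower-frequency patterns.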

IJCAI Conference 2025 Conference Paper

All Roads Lead to Rome: Exploring Edge Distribution Shifts for Heterophilic Graph Learning

  • Yi Wang
  • Changqin Huang
  • Ming Li
  • Tingyi Cai
  • Zhonglong Zheng
  • Xiaodi Huang

Heterophilic graph neural networks (GNNs) have gained prominence for their ability to learn effective representations in graphs with diverse, attribute-aware relationships. While existing methods leverage attribute inference during message passing to improve performance, they often struggle with challenging heterophilic graphs. This is due to edge distribution shifts introduced by diverse connection patterns, which blur attribute distinctions and undermine message-passing stability. This paper introduces H₂OGNN, a novel framework that reframes edge attribute inference as an out-of-distribution (OOD) detection problem. H₂OGNN introduces a simple yet effective symbolic energy regularization approach for OOD learning, ensuring robust classification boundaries between homophilic and heterophilic edge attributes. This design significantly improves the stability and reliability of GNNs across diverse connectivity patterns. Through theoretical analysis, we show that H₂OGNN addresses the graph denoising problem by going beyond feature smoothing, offering deeper insights into how precise edge attribute identification boosts model performance. Extensive experiments on nine benchmark datasets demonstrate that H₂OGNN not only achieves state-of-the-art performance but also consistently outperforms other heterophilic GNN frameworks, particularly on datasets with high heterophily.
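The "symbolic energy regularization" used by H₂OGNN is not specified in this abstract. As a point of reference, the sketch below shows the widely used energy score for OOD detection, E(x) = -T * logsumexp(f(x) / T), together with a squared-hinge margin loss that pushes in-distribution energies below `m_in` and OOD energies above `m_out`. Treating homophilic edges as in-distribution and heterophilic edges as OOD, and the margin values, are hypothetical choices for illustration, not the paper's method.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Energy score E = -T * logsumexp(logits / T); lower means more in-distribution."""
    return -temperature * logsumexp([z / temperature for z in logits])


def energy_margin_loss(e_in: Sequence[float], e_out: Sequence[float],
                       m_in: float = -5.0, m_out: float = -1.0) -> float:
    """Squared hinge: penalise in-distribution energies above m_in and OOD energies below m_out."""
    loss_in = sum(max(0.0, e - m_in) ** 2 for e in e_in) / max(1, len(e_in))
    loss_out = sum(max(0.0, m_out - e) ** 2 for e in e_out) / max(1, len(e_out))
    return loss_in + loss_out


# Hypothetical edge-classifier logits: confident vs. uncertain.
print(energy([8.0, 0.0]), energy([0.1, 0.0]))
print(energy_margin_loss([-4.0], [-2.0]))  # → 2.0
```

The margin loss opens a gap between the two energy populations, which is one concrete way to obtain the "robust classification boundaries" between edge types that the abstract describes.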

ICRA Conference 2025 Conference Paper

Asymptotically Optimal Sampling-Based Motion Planning Through Anytime Incremental Lazy Bidirectional Heuristic Search

  • Yi Wang
  • Bingxian Mu
  • Oren Salzman

This paper introduces Bidirectional Lazy Informed Trees (BLIT*), the first algorithm to incorporate anytime incremental lazy bidirectional heuristic search (Bi-HS) into batch-wise sampling-based motion planning (Bw-SBMP). BLIT* operates on batches of informed states (states that can potentially improve the cost of the incumbent solution) structured as an implicit random geometric graph (RGG). The computational cost of collision detection is mitigated via a new lazy edge-evaluation strategy by focusing on states near obstacles. Experimental results, especially in high dimensions, show that BLIT* outperforms existing Bw-SBMP planners by efficiently finding an initial solution and effectively improving the quality as more computational resources are available.

EAAI Journal 2025 Journal Article

Automatic collaborative learning for drug repositioning

  • Yi Wang
  • Yajie Meng
  • Chang Zhou
  • Xianfang Tang
  • Pan Zeng
  • Chu Pan
  • Qiang Zhu
  • Bengong Zhang

Drug repositioning seeks to identify new therapeutic uses for existing drugs, accelerating development and reducing costs. While traditional wet-lab experiments are costly, computational methods offer a low-cost, efficient alternative. Despite their potential, most research in this field has uncritically adopted the standard message-passing mechanism of Graph Neural Networks (GNNs), which limits the assessment of collaborative effects on prediction accuracy. In this paper, we introduce an automatic collaborative learning framework for drug repositioning. First, we propose a metric that measures the level of interaction among neighbors and integrate it with the intrinsic message-passing mechanism of GNNs, thereby strengthening the contribution of various collaborative effects to prediction accuracy. Furthermore, we introduce an advanced contrastive learning technique that enforces feature consistency between the disease–drug association space and the customized neighbor space. This approach leverages the inherent regularities across feature dimensions to minimize feature redundancy. Extensive experiments on three benchmark datasets demonstrate substantial improvements of the model over various state-of-the-art methods. Case studies further highlight its practical utility.

IJCAI Conference 2025 Conference Paper

Bidirectional Search while Ensuring Meet-In-The-Middle via Effective and Efficient-to-Compute Termination Conditions

  • Yi Wang
  • Eyal Weiss
  • Bingxian Mu
  • Oren Salzman

In bidirectional heuristic search, the meeting-in-the-middle property (MMP) and the theory of must-expand pairs (MEP) have driven significant recent developments in search efficiency. However, these methodologies typically terminate the search based on minimal priority metrics in the forward and backward open lists, which requires exploring all potentially better solutions and can incur a substantial computational burden. In this paper, we investigate the causes of this potential inefficiency in the MM algorithm and introduce a tighter termination condition that enables earlier termination without exhaustive exploration while still ensuring both MMP and optimality. The result is a highly efficient bidirectional search algorithm. Experimental comparisons demonstrate that our algorithm outperforms MM in running time by at least two orders of magnitude and is on par with or better than A*, highlighting its potential in a wide range of applications.
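The abstract does not spell out the paper's tighter termination condition. For context, the sketch below shows the classic stopping rule for uniform-cost bidirectional search (bidirectional Dijkstra, i.e. a zero heuristic): stop once the two smallest open-list keys sum to at least μ, the cost of the best complete path found so far. This is an example of the minimal-priority-based rule the abstract describes as potentially wasteful, not the authors' condition. The function names and the adjacency-dict graph encoding are assumptions.

```python
import heapq


def reverse_graph(graph):
    """Build the reversed adjacency dict: u -> [(v, w)] becomes v -> [(u, w)]."""
    rgraph = {}
    for u, nbrs in graph.items():
        for v, w in nbrs:
            rgraph.setdefault(v, []).append((u, w))
    return rgraph


def bidirectional_dijkstra(graph, rgraph, source, target):
    """Return the shortest source->target cost, or None if unreachable.

    graph[u] lists (v, w) pairs for edges u->v; weights must be nonnegative.
    """
    if source == target:
        return 0
    dist = [{source: 0}, {target: 0}]          # tentative labels per side
    heaps = [[(0, source)], [(0, target)]]     # open lists per side
    adj = [graph, rgraph]
    settled = [set(), set()]
    best = float("inf")                        # mu: best path cost so far
    while heaps[0] and heaps[1]:
        # Classic stopping rule: no undiscovered path can beat `best`.
        # Stale heap entries only make this test more conservative.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        # Expand the side whose smallest key is lower.
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue                           # skip stale entries
        settled[side].add(u)
        for v, w in adj[side].get(u, []):
            nd = d + w
            if nd < dist[side].get(v, float("inf")):
                dist[side][v] = nd
                heapq.heappush(heaps[side], (nd, v))
            # If the other search has labeled v, this edge closes a path.
            other = dist[1 - side].get(v)
            if other is not None:
                best = min(best, nd + other)
    return None if best == float("inf") else best
```

Expanding the side with the smaller key is one common alternation policy. On the small graph `s→a (1), s→b (4), a→b (1), a→t (5), b→t (1)`, the search stops with cost 3 (path s, a, b, t).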

NeurIPS Conference 2025 Conference Paper

CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding

  • Hongyong Han
  • Wei Wang
  • Gaowei Zhang
  • Mingjie Li
  • Yi Wang

Coral reefs are vital yet vulnerable ecosystems that require continuous monitoring to support conservation. While coral reef images provide essential information in coral monitoring, interpreting such images remains challenging due to the need for domain expertise. Visual Question Answering (VQA), powered by Large Vision-Language Models (LVLMs), has great potential for user-friendly interaction with coral reef images. However, applying VQA to coral imagery demands a dedicated dataset that addresses two key challenges: domain-specific annotations and multidimensional questions. In this work, we introduce CoralVQA, the first large-scale VQA dataset for coral reef analysis. It contains 12,805 real-world coral images from 67 coral genera collected from 3 oceans, along with 277,653 question-answer pairs that comprehensively assess ecological and health-related conditions. To construct this dataset, we develop a semi-automatic data construction pipeline in collaboration with marine biologists to ensure both scalability and professional-grade data quality. CoralVQA presents novel challenges and provides a comprehensive benchmark for studying vision-language reasoning in the context of coral reef images. By evaluating several state-of-the-art LVLMs, we reveal key limitations and opportunities. These insights form a foundation for future LVLM development, with a particular emphasis on supporting coral conservation efforts.

EAAI Journal 2025 Journal Article

Coupled flows as guidance for model-based policy optimization

  • Shengrong Gong
  • Yi Wang
  • Xin Du
  • Yuya Sun
  • Lifan Zhou
  • Shan Zhong

Model-based reinforcement learning (MBRL) offers high sample efficiency but suffers from cumulative multi-step prediction errors that degrade long-term performance. To address this, we propose a coupled flows-guided policy optimization framework, where two coupled flows quantify and minimize the discrepancy between the true and learned state–action distributions. By reducing this divergence, the loss functions serve both as a discriminator, selecting more accurate rollouts for policy learning, and as a reward signal, refining the dynamics model to mitigate multi-step errors. Theoretical analysis establishes a bound on the expected return discrepancy. Empirical evaluations demonstrate that our method achieves higher cumulative rewards than representative model-based approaches across diverse control tasks. This highlights its applicability in data-scarce domains such as robotics, recommendation systems, and autonomous driving.

AAAI Conference 2025 Conference Paper

Deep Hypergraph Neural Networks with Tight Framelets

  • Ming Li
  • Yujie Fang
  • Yi Wang
  • Han Feng
  • Yongchun Gu
  • Lu Bai
  • Pietro Liò

Hypergraphs provide a flexible framework for modeling high-order (complex) interactions among multiple entities, extending beyond the pairwise correlations of traditional graph structures. However, deep hypergraph neural networks (HGNNs) often suffer from oversmoothing as depth increases, similar to graph neural networks (GNNs). While oversmoothing in GNNs has been extensively studied, its implications for hypergraphs are less explored. This paper addresses this gap by first theoretically exploring the causes of oversmoothing in deep HGNNs. Our insights suggest that a spectral-based hypergraph convolution equipped with both low-pass and high-pass filters can potentially mitigate these effects. Motivated by these findings, we introduce FrameHGNN, a framework that uses framelet-based hypergraph convolutions integrating tight framelet transforms with both low-pass and high-pass components, together with two strategies commonly used in deep GNN architectures: initial residual connections and identity mapping. Experimental results on diverse benchmark datasets demonstrate that FrameHGNN outperforms several state-of-the-art models, effectively reducing oversmoothing while improving predictive accuracy. Our contributions not only advance the theoretical understanding of deep hypergraph learning but also provide a practical spectral-based approach for HGNNs, emphasizing the design of multi-frequency channels.
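The initial-residual and identity-mapping strategies mentioned above are best known from GCNII. As a hedged illustration (the abstract does not give FrameHGNN's exact layer), a GCNII-style layer with a generic normalized propagation operator $\mathcal{P}$ can be written as follows. For a hypergraph, $\mathcal{P}$ could be, for example, the HGNN operator or a framelet filter bank.

```latex
X^{(\ell+1)} = \sigma\!\left( \Big( (1-\alpha)\,\mathcal{P}\, X^{(\ell)} + \alpha\, X^{(0)} \Big) \Big( (1-\beta_\ell)\, I + \beta_\ell\, W^{(\ell)} \Big) \right)
```

Here $X^{(0)}$ is the input embedding. $\alpha \in (0,1)$ sets the initial residual, so every layer retains a fraction of the input signal, which counters oversmoothing. $\beta_\ell$, which decays with depth in GCNII, keeps the effective weight matrix close to the identity. If $\mathcal{P}$ is a pure low-pass operator, repeated application drives features toward a smooth subspace. Adding high-pass channels, as FrameHGNN does, is intended to preserve the non-smooth components.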

JBHI Journal 2025 Journal Article

Diffusion Tensor Magnetic Resonance Image Registration Based on Parallel Dual-Channel VoxelMorph

  • Yi Wang
  • Shufan Geng
  • Haopeng Jia
  • Yang Cai
  • Zhe Guo
  • Yilong Niu
  • Asoke K. Nandi

Diffusion Tensor Magnetic Resonance Imaging (DTI) is a non-invasive technique for studying brain structure in vivo by measuring the diffusion properties of water molecules. Unlike conventional medical imaging that captures scalar intensity data, DTI data is typically stored as a 4D volume, where each voxel in 3D space is a 3×3 Cartesian tensor. DTI characterizes tensor-based diffusion profiles and captures information about the orientation of fiber bundles. During alignment, voxels must be spatially transformed while preserving the correspondence of tensor orientations, which leads to complex computations. Traditional DTI registration methods often suffer from slow iteration and low accuracy, posing challenges for clinical applications. In this paper, a novel DTI Registration method Based on Parallel Dual-channel VoxelMorph (DTI-RBPDV) is proposed. The core of the method is a two-branch convolutional neural network that processes two input patterns simultaneously to improve the accuracy of deformation-field prediction: (1) fractional anisotropy (FA) images and (2) principal eigenvectors from the moving and fixed DTI volumes. Channel-spatial attention modules are also integrated into the network decoder; they dynamically highlight salient anatomical features and orientation consistency, improving the model's sensitivity to key structural alignments. Experimental results show that DTI-RBPDV overcomes the slow iterative computation of traditional methods and the difficulty of applying deep learning to high-dimensional DTI data, significantly improving both registration accuracy and computational speed.

NeurIPS Conference 2025 Conference Paper

Efficient Knowledge Transfer in Federated Recommendation for Joint Venture Ecosystem

  • Yichen Li
  • Yijing Shan
  • Yi Liu
  • Haozhao Wang
  • Cheng Wang
  • Yi Wang
  • Ruixuan Li

The current Federated Recommendation System (FedRS) focuses on personalized recommendation services and assumes clients are personalized IoT devices (e.g., mobile phones). In this paper, we examine new but practical FedRS applications within the joint venture ecosystem, where subsidiaries participate as clients with their own users and items. In this setting, merely exchanging item embeddings is insufficient, because user bases exhibit both overlapping and exclusive segments, which makes user information complex. Meanwhile, directly uploading user information violates privacy and is unacceptable. To tackle these challenges, we propose an efficient and privacy-enhanced federated recommendation framework for the joint venture ecosystem (FR-JVE), in which each client transfers more common knowledge from other clients using a user rating preference distilled from its local dataset. More specifically, before federated training we first transform the local data into a new format and apply model inversion techniques to distill the rating preference with frozen user gradients. Then, a bridge function is employed on each client to align the local rating preference with the aggregated global preference in a privacy-friendly manner. Finally, each client matches similar users to make better predictions for overlapping users. From a theoretical perspective, we analyze how effectively FR-JVE can guarantee user privacy. Empirically, we show that FR-JVE achieves superior performance compared to state-of-the-art methods.

EAAI Journal 2025 Journal Article

Electric vehicle charging and discharging control and microgrid energy management based on single agent deep reinforcement learning

  • Shiliang Guo
  • Yaochen Wang
  • Kai Ma
  • Jie Yang
  • Yi Wang

Electric vehicles can effectively reduce end-use emissions as a clean alternative. However, the grid load fluctuations and multi-energy coordination dilemmas caused by their large-scale integration highlight the urgent need for a new energy management system. To this end, this study proposes a multi-objective optimization framework for industrial microgrids that reconciles grid peak-shaving needs with user economics by coordinating the spatiotemporal energy interaction among distributed generators, energy storage systems, and electric vehicles. The method integrates deep reinforcement learning with feature extraction: first, a Long Short-Term Memory (LSTM) time-series network is constructed to analyze the nonlinear coupling among electricity price, renewable generation, and load; then, a flexible strategy-generation mechanism is designed based on the maximum-entropy principle of the Soft Actor-Critic (SAC) algorithm. Finally, the simulation results are analyzed and the feasibility of using time-of-use pricing to guide electric vehicles to participate in demand response is explored. Experimental data indicate that the proposed LSTM-SAC model reduces the average daily operational cost of the microgrid by 2.86% and the cost of distributed generation by 58.4% compared to the SAC algorithm. This study achieves coordinated optimization of fixed and mobile energy storage. The decision-making mechanism it constructs confirms that the dual economic and environmental goals of industrial microgrids are attainable, and provides a reusable example for the flexible regulation of energy systems with a high proportion of renewables.

NeurIPS Conference 2025 Conference Paper

Enhancing Privacy in Multimodal Federated Learning with Information Theory

  • Tianzhe Xiao
  • Yichen Li
  • Yining Qi
  • Yi Liu
  • Haozhao Wang
  • Yi Wang
  • Ruixuan Li

Multimodal federated learning (MMFL) has gained increasing popularity due to its ability to leverage correlations between modalities while preserving data privacy for different clients. However, recent studies show that correlation between modalities increases the vulnerability of federated learning to Gradient Inversion Attacks (GIA). Privacy preservation in MMFL is complicated by two factors: 1) different modalities transmit different amounts of information and thus require different protection strengths; 2) correlations between modalities must be taken into account. This paper takes an information-theoretic perspective to analyze privacy leakage in MMFL and proposes a more principled protection method, Sec-MMFL, which assesses the information-leakage potential of each modality via conditional mutual information and adjusts the corresponding protection strength. Moreover, we use mutual information to reduce cross-modality information leakage in MMFL. Experiments show that our method provides more balanced and comprehensive protection at an acceptable cost.

AAAI Conference 2025 Conference Paper

Enhancing Vision-Language Models with Morphological and Taxonomic Knowledge: Towards Coral Recognition for Ocean Health

  • Hongyong Han
  • Wei Wang
  • Gaowei Zhang
  • Mingjie Li
  • Yi Wang

Coral reefs play a crucial role in marine ecosystems, offering a nutrient-rich environment and safe shelter for numerous marine species. Automated coral image recognition aids in monitoring ocean health at scale without manual expert effort. Recently, large vision-language models like CLIP have greatly enhanced zero-shot and low-shot classification capabilities for various visual tasks. However, these models struggle with fine-grained coral-related tasks due to a lack of specific knowledge. To bridge this gap, we compile a fine-grained coral image dataset consisting of 16,659 images with taxonomy labels (from Kingdom to Species), accompanied by morphology-specific text descriptions for each species. Based on the dataset, we propose CORAL-Adapter, integrating two complementary kinds of coral-specific knowledge (biological taxonomy and coral morphology) with general knowledge learned by CLIP. CORAL-Adapter is a simple yet powerful extension of CLIP with only a few parameter updates and can be used as a plug-and-play module with various CLIP-based methods. We show improvements in accuracy across diverse coral recognition tasks, e.g., recognizing corals unseen during training that are prone to bleaching or originate from different oceans.

NeurIPS Conference 2025 Conference Paper

FAPEX: Fractional Amplitude-Phase Expressor for Robust Cross-Subject Seizure Prediction

  • Ruizhe Zheng
  • Lingyan Mao
  • DINGDING HAN
  • Tian Luo
  • Yi Wang
  • Jing Ding
  • Yuguo Yu

Precise, generalizable subject-agnostic seizure prediction (SASP) remains a fundamental challenge due to the intrinsic complexity and significant spectral variability of electrophysiological signals across individuals and recording modalities. We propose FAPEX, a novel architecture that introduces a learnable fractional neural frame operator (FrNFO) for adaptive time–frequency decomposition. Unlike conventional models that exhibit spectral bias toward low frequencies, our FrNFO employs fractional-order convolutions to capture both high- and low-frequency dynamics, achieving approximately 10% improvement in F1-score and sensitivity over state-of-the-art baselines. The FrNFO enables the extraction of instantaneous phase and amplitude representations that are particularly informative for preictal biomarker discovery and enhance out-of-distribution generalization. FAPEX further integrates structural state-space modeling and channel-wise attention, allowing it to handle heterogeneous electrode montages. Evaluated across 12 benchmarks spanning species (human, rat, dog, macaque) and modalities (scalp EEG, SEEG, ECoG, LFP), FAPEX consistently outperforms 23 supervised and 10 self-supervised baselines under nested cross-validation, with gains of up to 15% in sensitivity in complex cross-domain scenarios. It also demonstrates superior performance in several external validation cohorts. To our knowledge, these results establish FAPEX as the first epilepsy model to show consistent superiority in SASP, offering a promising route toward epileptic biomarker discovery and clinical translation, and providing evidence for the existence of a distinct, identifiable preictal state.

AAAI Conference 2025 Conference Paper

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

  • Wanggui He
  • Siming Fu
  • Mushui Liu
  • Xierui Wang
  • Wenyi Xiao
  • Fangxun Shu
  • Yi Wang
  • Lei Zhang

Auto-regressive models have made significant progress in text-to-image (T2I) synthesis, yet devising a model architecture and training strategy that achieves satisfactory performance remains an important avenue of exploration. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This component integrates pre-trained LLMs by processing linguistic and visual information independently, freezing the textual component while fine-tuning the visual component. This methodology preserves the NLP capabilities of LLMs while imbuing them with exceptional visual understanding. Building upon the pre-trained Qwen-7B, MARS supports bilingual generation from both English and Chinese prompts as well as joint image and text generation. The flexibility of this framework also makes it adaptable to any-to-any tasks. Furthermore, MARS employs a multi-stage training strategy that first establishes robust image-text alignment through complementary bidirectional tasks and subsequently concentrates on refining the T2I generation process, significantly improving text-image consistency and the granularity of image details. Notably, MARS requires only 9% of the GPU days needed by SD1.5, yet it achieves remarkable results across a variety of benchmarks, illustrating its training efficiency and potential for swift deployment in various applications.

AAAI Conference 2025 Conference Paper

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection

  • Tingyi Cai
  • Yunliang Jiang
  • Ming Li
  • Changqin Huang
  • Yi Wang
  • Qionghao Huang

Out-of-distribution (OOD) detection on graph-structured data is crucial for deploying graph neural networks securely in open-world scenarios. However, existing methods have overlooked the prevalent real-world scenario of multi-label classification. In this work, we investigate the unexplored problem of OOD detection in multi-label node classification. We propose ML-GOOD, a simple yet sufficient approach that uses an energy function to gauge the OOD score for each label. We further develop a strategy for aggregating multiple label energies, making comprehensive use of label information to tackle the primary challenges of multi-label scenarios. Extensive experiments were conducted on seven diverse real-world multi-label graph datasets, including cross-domain scenarios. Compared with previous methods, ML-GOOD improves AUROC by 5.26% in intra-domain and 6.54% in cross-domain settings. These results not only affirm the robustness of our methodology but also open new avenues for further exploration in this emerging field of research.
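The abstract does not specify ML-GOOD's energy function or its aggregation strategy. As a minimal sketch, assume each label k is treated as an independent binary classifier with logit f_k. A per-label energy can then be defined as E_k = -log(1 + e^{f_k}), and one simple aggregation, summing the negated energies in the spirit of JointEnergy for multi-label OOD detection, gives a node-level score where higher means more in-distribution. All function names and the threshold are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def label_energies(logits):
    """Per-label energy E_k = -log(1 + exp(f_k)); lower = more in-distribution."""
    return [-softplus(f) for f in logits]


def ood_score(logits):
    """Hypothetical aggregation: sum of negated per-label energies.

    Higher scores indicate a more in-distribution node.
    """
    return -sum(label_energies(logits))


def is_ood(logits, threshold):
    """Flag a node as OOD when its aggregated score falls below the threshold."""
    return ood_score(logits) < threshold
```

A node with several confidently positive label logits, such as `[5.0, 4.0, -3.0]`, scores about 9.1, while one with all-negative logits, such as `[-5.0, -4.0, -3.0]`, scores below 0.1. In practice the threshold would be set on held-out in-distribution data.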

AAAI Conference 2025 Conference Paper

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding

  • Jiaze Wang
  • Yi Wang
  • Ziyu Guo
  • Renrui Zhang
  • Donghao Zhou
  • Guangyong Chen
  • Anfeng Liu
  • Pheng-Ann Heng

We introduce MM-Mixing, a multi-modal mixing alignment framework for 3D understanding. MM-Mixing applies mixing-based methods to multi-modal data, preserving and optimizing cross-modal connections while enhancing diversity and improving alignment across modalities. Our proposed two-stage training pipeline combines feature-level and input-level mixing to optimize the 3D encoder. The first stage employs feature-level mixing with contrastive learning to align 3D features with their corresponding modalities. The second stage incorporates both feature-level and input-level mixing, introducing mixed point cloud inputs to further refine 3D feature representations. MM-Mixing enhances intermodality relationships, promotes generalization, and ensures feature consistency while providing diverse and realistic training samples. We demonstrate that MM-Mixing significantly improves baseline performance across various learning scenarios, including zero-shot 3D classification, linear probing 3D classification, and cross-modal 3D shape retrieval. Notably, we improved the zero-shot classification accuracy on ScanObjectNN from 51.3% to 61.9%, and on Objaverse-LVIS from 46.8% to 51.4%. Our findings highlight the potential of multi-modal mixing-based alignment to significantly advance 3D object recognition and understanding while remaining straightforward to implement and integrate into existing frameworks.
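The abstract does not state MM-Mixing's exact mixing operators. As a minimal sketch, the two levels of mixing could look like the code below. It assumes a subset-sampling input mix, a common choice for point clouds because points have no fixed ordering to interpolate, and a convex combination of features using the same ratio λ. The function names are hypothetical. In a full pipeline, the paired text and image features would presumably be mixed with the same λ so that the mixed 3D sample keeps a consistent cross-modal target.

```python
import random


def mix_point_clouds(pc_a, pc_b, lam, rng):
    """Input-level mix: keep round(lam * N) points of pc_a and fill the rest from pc_b.

    pc_a and pc_b are lists of (x, y, z) tuples with the same length N.
    """
    n = len(pc_a)
    if len(pc_b) != n:
        raise ValueError("point clouds must have the same number of points")
    k = round(lam * n)
    part_a = rng.sample(pc_a, k)       # sampled without replacement
    part_b = rng.sample(pc_b, n - k)
    return part_a + part_b


def mix_features(f_a, f_b, lam):
    """Feature-level mix: convex combination with the same ratio lam."""
    return [lam * a + (1 - lam) * b for a, b in zip(f_a, f_b)]
```

With `lam = 0.7` and two 10-point clouds, the mixed cloud contains 7 points from the first cloud and 3 from the second. Passing a seeded `random.Random` makes the sampling reproducible.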

YNIMG Journal 2025 Journal Article

Pain in focus: How persistent pain disrupts the attentional bias towards pain-related information

  • Jia Li
  • Xiaohan Lyu
  • Xiaoyun Li
  • Xilin Yang
  • Lingling Weng
  • Yi Wang
  • Weiwei Peng

Pain modulates attentional biases, contributing to chronic pain development and maintenance through enhanced focus on pain-related stimuli. This study employed drift-diffusion modeling (DDM) and multivariate EEG to investigate how sustained pain affects attention allocation. Using a crossover design, 58 healthy volunteers underwent two sessions (capsaicin-induced pain vs. control cream) while performing word- and picture-based dot-probe tasks. Probes appeared in locations either congruent or incongruent with pain-related stimuli, or after neutral stimulus pairs. Behavioral and neural responses to congruency and incongruency effects were compared between pain states. DDM revealed increased incongruency effects during pain, characterized by slower drift rates and narrower decision thresholds, suggesting impaired evidence accumulation. EEG analyses revealed two distinct pain-state modulations: (1) amplified P3 amplitudes (300-600 ms) during incongruent trials, and (2) multivariate decoding of δ/θ oscillations (1-7 Hz, 116-364 ms post-stimulus) that uniquely differentiated incongruent from neutral conditions specifically under pain. These behavioral and neural signatures of attentional disruption manifested selectively during verbal tasks, with no parallel effects observed in pictorial processing. Our findings demonstrate how pain disrupts cognitive control: impaired expectancy processing (early δ/θ oscillations), compromised decision formation (altered DDM parameters), and deficient response inhibition (P3 modulation). These results highlight verbal information processing as a key vulnerability in pain-related attentional bias, suggesting targeted interventions for cognitive control components could mitigate chronic pain consequences.

YNIMG Journal 2025 Journal Article

Potential separation of multiple system atrophy and Parkinson’s disease by susceptibility-derived components

  • Su Yan
  • Jun Lu
  • Bingfang Duan
  • Shun Zhang
  • Dong Liu
  • Yuanyuan Qin
  • Alexey V. Dimov
  • Junghun Cho

BACKGROUND: Substantial evidence emphasizes the dysregulation of iron homeostasis, demyelination, and oxidative stress in the neurodegenerative process of multiple system atrophy (MSA) and Parkinson's disease (PD), although their clinical implications remain unclear. Recent MRI post-processing techniques leveraging magnetic susceptibility properties provide a noninvasive means to characterize alterations in iron, myelin content, and oxygen metabolism. This study aims to investigate subcortical alterations of susceptibility-derived metrics in these two synucleinopathies. METHODS: A cohort comprising 180 patients (122 with PD and 58 with MSA) and 77 healthy controls (HCs) underwent clinical evaluation and multi-echo gradient echo MRI scans. Susceptibility source separation, susceptibility-based oxygen extraction fraction (OEF) mapping, and semiautomatic subcortical nuclei segmentation were used to derive parametric values of deep gray matter in all subjects. RESULTS: MSA patients showed markedly elevated paramagnetic susceptibility values in the putamen, globus pallidus (GP), and thalamus; increased diamagnetic susceptibility values in the putamen and dentate nucleus; and reduced OEF values across all nuclei compared with PD patients and HCs. In contrast, PD patients exhibited increased positive susceptibility values in the substantia nigra and more negative values in the GP, similar to MSA. Notably, age-related reductions in OEF were evident in HCs, a pattern that was altered by MSA pathology. Paramagnetic susceptibility correlated with disease severity. Moreover, susceptibility-derived metrics of the striatum and midbrain nuclei proved to be effective predictors for distinguishing PD from MSA (AUC = 0.833). CONCLUSION: Susceptibility-derived metrics could detect pathological involvement specific to each disease, offering significant potential for differentiating between MSA and PD in clinical settings.

NeurIPS Conference 2025 Conference Paper

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling

  • Rongkun Zheng
  • Lu Qi
  • Xi Chen
  • Yi Wang
  • Kun Wang
  • Hengshuang Zhao

While visual autoregressive modeling (VAR) strategies have shed light on image generation with the autoregressive models, their potential for segmentation, a task that requires precise low-level spatial perception, remains unexplored. Inspired by the multi-scale modeling of classic Mask2Former-based models, we propose Seg-VAR, a novel framework that rethinks segmentation as a conditional autoregressive mask generation problem. This is achieved by replacing the discriminative learning with the latent learning process. Specifically, our method incorporates three core components: (1) an image encoder generating latent priors from input images, (2) a spatial-aware seglat (a latent expression of segmentation mask) encoder that maps segmentation masks into discrete latent tokens using a location-sensitive color mapping to distinguish instances, and (3) a decoder reconstructing masks from these latents. A multi-stage training strategy is introduced: first learning seglat representations via image-seglat joint training, then refining latent transformations, and finally aligning image-encoder-derived latents with seglat distributions. Experiments show Seg-VAR outperforms previous discriminative and generative methods on various segmentation tasks and validation benchmarks. By framing segmentation as a sequential hierarchical prediction task, Seg-VAR opens new avenues for integrating autoregressive reasoning into spatial-aware vision systems.

NeurIPS Conference 2025 Conference Paper

Semantic Representation Attack against Aligned Large Language Models

  • Jiawei Lian
  • Jianhong Pan
  • Lefan Wang
  • Yi Wang
  • Shaohui Mei
  • Lap-Pui Chau

Large Language Models (LLMs) increasingly employ alignment techniques to prevent harmful outputs. Despite these safeguards, attackers can circumvent them by crafting prompts that induce LLMs to generate harmful content. Current methods typically target exact affirmative responses, suffering from limited convergence, unnatural prompts, and high computational costs. We introduce semantic representation attacks, a novel paradigm that fundamentally reconceptualizes adversarial objectives against aligned LLMs. Rather than targeting exact textual patterns, our approach exploits the semantic representation space that can elicit diverse responses that share equivalent harmful meanings. This innovation resolves the inherent trade-off between attack effectiveness and prompt naturalness that plagues existing methods. Our Semantic Representation Heuristic Search (SRHS) algorithm efficiently generates semantically coherent adversarial prompts by maintaining interpretability during incremental search. We establish rigorous theoretical guarantees for semantic convergence and demonstrate that SRHS achieves unprecedented attack success rates (89.4% averaged across 18 LLMs, including 100% on 11 models) while significantly reducing computational requirements. Extensive experiments show that our method consistently outperforms existing approaches.

NeurIPS Conference 2025 Conference Paper

StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models

  • Haoxin Yang
  • Bangzhen Liu
  • Xuemiao Xu
  • Cheng Xu
  • Yuyang Yu
  • Zikai Huang
  • Yi Wang
  • Shengfeng He

The advancement of diffusion models has enhanced the realism of AI-generated content but also raised concerns about misuse, necessitating robust copyright protection and tampering localization. Although recent methods have made progress toward unified solutions, their reliance on post hoc processing introduces considerable application inconvenience and compromises forensic reliability. We propose StableGuard, a novel framework that seamlessly integrates a binary watermark into the diffusion generation process, ensuring copyright protection and tampering localization in Latent Diffusion Models through an end-to-end design. We develop a Multiplexing Watermark VAE (MPW-VAE) by equipping a pretrained Variational Autoencoder (VAE) with a lightweight latent residual-based adapter, enabling the generation of paired watermarked and watermark-free images. These pairs, fused via random masks, create a diverse dataset for training a tampering-agnostic forensic network. To further enhance forensic synergy, we introduce a Mixture-of-Experts Guided Forensic Network (MoE-GFN) that dynamically integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for precise watermark verification and tampered region detection. The MPW-VAE and MoE-GFN are jointly optimized in a self-supervised, end-to-end manner, so that watermark embedding and forensic accuracy reinforce each other during training. Extensive experiments demonstrate that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.

NeurIPS Conference 2025 Conference Paper

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

  • Xiangyu Zeng
  • Kefan Qiu
  • Qingyu Zhang
  • Xinhao Li
  • Jing Wang
  • Jiaxin Li
  • Ziang Yan
  • Kun Tian

Multimodal Large Language Models (MLLMs) have recently achieved remarkable progress in video understanding. However, their effectiveness in real-time streaming scenarios remains limited due to storage constraints on historical visual features and insufficient real-time spatiotemporal reasoning. To address these challenges, we propose StreamForest, a novel architecture specifically designed for streaming video understanding. Central to StreamForest is the Persistent Event Memory Forest, a memory mechanism that adaptively organizes video frames into multiple event-level tree structures. This process is guided by penalty functions based on temporal distance, content similarity, and merge frequency, enabling efficient long-term memory retention under limited computational resources. To enhance real-time perception, we introduce a Fine-grained Spatiotemporal Window, which captures detailed short-term visual cues to improve current scene perception. Additionally, we present OnlineIT, an instruction-tuning dataset tailored for streaming video tasks. OnlineIT significantly boosts MLLM performance in both real-time perception and future prediction. To evaluate generalization in practical applications, we introduce ODV-Bench, a new benchmark focused on real-time streaming video understanding in autonomous driving scenarios. Experimental results demonstrate that StreamForest achieves state-of-the-art performance, with accuracies of 77.3% on StreamingBench, 60.5% on OVBench, and 55.6% on OVO-Bench. In particular, even under extreme visual token compression (limited to 1024 tokens), the model retains 96.8% of its average accuracy across eight benchmarks relative to the default setting. These results underscore the robustness, efficiency, and generalizability of StreamForest for streaming video understanding.

NeurIPS Conference 2025 Conference Paper

V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation

  • Hanyue Lou
  • Jinxiu Liang
  • Minggui Teng
  • Yi Wang
  • Boxin Shi

Event-based cameras offer unique advantages such as high temporal resolution, high dynamic range, and low power consumption. However, the massive storage requirements and I/O burdens of existing synthetic data generation pipelines, together with the scarcity of real data, prevent event-based training datasets from scaling up, limiting the development and generalization capabilities of event vision models. To address this challenge, we introduce Video-to-Voxel (V2V), an approach that directly converts conventional video frames into event-based voxel grid representations, bypassing the storage-intensive event stream generation entirely. V2V enables a 150× reduction in storage requirements while supporting on-the-fly parameter randomization for enhanced model robustness. Leveraging this efficiency, we train several video reconstruction and optical flow estimation model architectures on 10,000 diverse videos totaling 52 hours, an order of magnitude larger than existing event datasets, yielding substantial improvements.
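To make "converting video frames directly into a voxel grid" concrete, here is a minimal sketch, assuming the common event-camera model in which an event fires each time log intensity changes by a contrast threshold. The function name `frames_to_voxel_grid`, the threshold value, and the frame-to-bin mapping are hypothetical illustrations, not the V2V implementation.

```python
import numpy as np

def frames_to_voxel_grid(frames, num_bins, contrast_threshold=0.2, eps=1e-3):
    """Approximate an event voxel grid directly from video frames (hypothetical sketch).

    frames: array of shape (T, H, W) with linear intensities in [0, 1].
    Returns an array of shape (num_bins, H, W) holding signed event counts.
    """
    frames = np.asarray(frames, dtype=np.float64)
    t = frames.shape[0]
    if t < 2:
        raise ValueError("need at least two frames")
    # Event cameras respond to changes in log intensity.
    log_frames = np.log(frames + eps)
    voxel = np.zeros((num_bins,) + frames.shape[1:])
    for i in range(t - 1):
        delta = log_frames[i + 1] - log_frames[i]
        # Signed number of threshold crossings between consecutive frames.
        counts = np.sign(delta) * np.floor(np.abs(delta) / contrast_threshold)
        # Map frame interval i (0 .. t-2) onto a temporal bin (0 .. num_bins-1).
        b = i * num_bins // (t - 1)
        voxel[b] += counts
    return voxel
```

No intermediate event stream is ever materialized, which is the storage saving the abstract describes. Because `contrast_threshold` is just an argument, it could be resampled per clip during training, one plausible form of on-the-fly parameter randomization.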

NeurIPS Conference 2025 Conference Paper

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

  • Ziang Yan
  • Yinan He
  • Xinhao Li
  • Zhengrong Yue
  • Xiangyu Zeng
  • Yali Wang
  • Yu Qiao
  • Limin Wang

Inducing reasoning in multimodal large language models (MLLMs) is critical for achieving human-level perception and understanding. Existing methods mainly leverage LLM reasoning to analyze parsed visuals and are often limited by static perception stages. This paper introduces Visual Test-Time Scaling (VTTS), a novel approach that enhances MLLMs' reasoning via iterative perception during inference. VTTS mimics humans' hierarchical attention by progressively refining focus on high-confidence spatio-temporal regions, guided by updated textual predictions. Specifically, VTTS employs an Iterative Perception (ITP) mechanism, incorporating reinforcement learning with spatio-temporal supervision to optimize reasoning. To support this paradigm, we also present VTTS-80K, a dataset tailored for iterative perception. These designs allow an MLLM to enhance its performance by increasing its perceptual compute. Extensive experiments validate VTTS's effectiveness and generalization across diverse tasks and benchmarks. Our newly introduced VideoChat-R1.5 model achieves remarkable improvements, with an average increase of over 5% compared to strong baselines such as Qwen2.5-VL-3B and -7B, across more than 15 benchmarks that encompass video conversation, video reasoning, and spatio-temporal perception.

NeurIPS Conference 2025 Conference Paper

What We Miss Matters: Learning from the Overlooked in Point Cloud Transformers

  • Yi Wang
  • Jiaze Wang
  • Ziyu Guo
  • Renrui Zhang
  • Donghao Zhou
  • Guangyong Chen
  • Anfeng Liu
  • Pheng-Ann Heng

Point Cloud Transformers have become a cornerstone in 3D representation for their ability to model long-range dependencies via self-attention. However, these models tend to overemphasize salient regions while neglecting other informative regions, which limits feature diversity and compromises robustness. To address this challenge, we introduce BlindFormer, a novel contrastive attention learning framework that redefines saliency by explicitly incorporating features typically neglected by the model. The proposed Attentional Blindspot Mining (ABM) suppresses highly attended regions during training, thereby guiding the model to explore its own blind spots. This redirection of attention expands the model’s perceptual field and uncovers richer geometric cues. To consolidate these overlooked features, BlindFormer employs Blindspot-Aware Joint Optimization (BJO), a joint learning objective that integrates blindspot feature alignment with the original pretext task. BJO enhances feature discrimination while preserving performance on the primary task, leading to more robust and generalizable representations. We validate BlindFormer on several challenging benchmarks and demonstrate consistent performance gains across multiple Transformer backbones. Notably, it improves Point-MAE by +13.4% and PointGPT-S by +6.3% on OBJ-BG under Gaussian noise. These results highlight the importance of mitigating attentional biases in 3D representation learning, revealing BlindFormer’s superior ability to handle perturbations and improve feature discrimination.
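The core idea of Attentional Blindspot Mining, suppressing the most-attended regions so the model must rely on the rest, can be sketched as a top-k mask over per-token attention scores. This is a minimal illustration under assumptions: the `suppress_ratio` parameter, the use of one score per token, and zeroing features are hypothetical choices, not BlindFormer's published design.

```python
import numpy as np

def blindspot_mask(attention, suppress_ratio=0.25):
    """Return a boolean keep-mask that hides the most-attended tokens.

    attention: 1-D array of per-token attention scores (e.g. CLS-to-patch attention).
    suppress_ratio: fraction of tokens to suppress (hypothetical hyperparameter).
    """
    attention = np.asarray(attention, dtype=np.float64)
    n_suppress = int(round(suppress_ratio * attention.size))
    keep = np.ones(attention.size, dtype=bool)
    if n_suppress > 0:
        # Indices of the n_suppress highest attention scores.
        top = np.argsort(attention)[-n_suppress:]
        keep[top] = False
    return keep

def apply_mask(tokens, keep):
    """Zero out suppressed token features (tokens: array of shape (N, D))."""
    return tokens * keep[:, None]
```

During training, the masked token set would be fed to the backbone alongside the unmasked one, so that an alignment objective can pull the two representations together. How BlindFormer actually combines them (BJO) is not specified here.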

AAAI Conference 2025 Conference Paper

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

  • Ming Li
  • Yongchun Gu
  • Yi Wang
  • Yujie Fang
  • Lu Bai
  • Xiaosheng Zhuang
  • Pietro Liò

Hypergraph neural networks (HNNs) have shown promise in handling tasks characterized by high-order correlations, achieving notable success across various applications. However, there has been limited focus on heterophilic hypergraph learning (HHL), in contrast to the increasing attention given to graph neural networks designed for graphs exhibiting heterophily. This paper aims to pave the way for HHL by addressing key gaps from multiple perspectives: measurement, dataset diversity, and baseline model development. First, we introduce metrics to quantify heterophily in hypergraphs, providing a numerical basis for assessing the homophily/heterophily ratio. Second, we develop diverse benchmark datasets across various real-world scenarios, facilitating comprehensive evaluations of existing HNNs and advancing research in HHL. Additionally, as a novel baseline model, we propose HyperUFG, a framelet-based HNN integrating both low-pass and high-pass filters. Extensive experiments conducted on synthetic and benchmark datasets highlight the challenges current HNNs face with heterophilic hypergraphs, while showcasing that HyperUFG performs competitively and often outperforms many existing models in such scenarios. Overall, our study underscores the urgent need for further exploration and development in this emerging field, with the potential to inspire and guide future research in HHL.

AIIM Journal 2024 Journal Article

An in-depth survey on Deep Learning-based Motor Imagery Electroencephalogram (EEG) classification

  • Xianheng Wang
  • Veronica Liesaputra
  • Zhaobin Liu
  • Yi Wang
  • Zhiyi Huang

Electroencephalogram (EEG)-based Brain–Computer Interfaces (BCIs) build a communication path between the human brain and external devices. Among EEG-based BCI paradigms, the most commonly used is motor imagery (MI). As a popular research topic, MI EEG-based BCI has contributed substantially to the medical field and the smart home industry. However, because of the low signal-to-noise ratio (SNR) and the non-stationary nature of EEG data, it is difficult to correctly classify different types of MI-EEG signals. Recent advances in Deep Learning (DL) have significantly facilitated the development of MI EEG-based BCIs. In this paper, we provide a systematic survey of DL-based MI-EEG classification methods. Specifically, we first comprehensively discuss several important aspects of DL-based MI-EEG classification, covering input formulations, network architectures, public datasets, etc. Then, we summarize problems in model performance comparison and give guidelines for fair performance comparison in future studies. Next, we fairly evaluate representative DL-based models using source code released by the authors and meticulously analyse the evaluation results. By performing an ablation study on the network architectures, we found that (1) effective feature fusion is indispensable for multi-stream CNN-based models; (2) LSTM should be combined with spatial feature extraction techniques to obtain good classification performance; (3) the use of dropout contributes little to improving model performance; and (4) adding fully connected layers to the models significantly increases their parameters but might not improve their performance. Finally, we raise several open issues in MI-EEG classification and provide possible future research directions.

EAAI Journal 2024 Journal Article

An integrated interval-valued spherical fuzzy Choquet integral based decision making model for prioritizing risk in Fine-Kinney

  • Yi Wang
  • Weizhong Wang
  • Muhammet Deveci
  • Xinyue Yu

The Fine-Kinney model has been expanded to serve as a tool for analyzing systemic risk in different industries, including occupational hazards. However, existing models for prioritizing risk in Fine-Kinney fail to account for the interconnectedness of experts’ cognitive information in interval-valued spherical fuzzy situations. This paper creates a new fuzzy compromise ranking of alternatives from distance to ideal solution (CRADIS) method to address the limitations of the Fine-Kinney model in occupational risk analysis. Interval-valued spherical fuzzy numbers are incorporated into the conventional risk scales of the Fine-Kinney model to provide a method for processing uncertain risk rating data. Then, a weighted averaging operator is developed based on the Choquet integral and interval-valued spherical fuzzy sets to construct the collective risk assessment matrix. This operator can capture interdependencies among risk data. Next, an enhanced CRADIS method with interval-valued spherical fuzzy sets and entropy measures is presented to address the risk ranking issue in the application of Fine-Kinney. To demonstrate the implementation of the synthesized occupational risk framework, a numerical case study analyzes the occupational hazards in the metro construction process. The proposed framework undergoes parameter sensitivity analysis to test its rationality, and a comparison with the risk-prioritizing methods used in Fine-Kinney assesses its advantages. The outcome suggests that the framework offers an appropriate and dependable approach to prioritizing occupational risks in subjective and ambiguous scenarios.
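For readers unfamiliar with the two building blocks, the sketch below shows them in their crisp (non-fuzzy) forms: the classic Fine-Kinney score R = P × E × C (probability × exposure × consequence), and the discrete Choquet integral, which aggregates scores with respect to a fuzzy measure over subsets of criteria and can therefore capture interdependencies among them. The paper's interval-valued spherical fuzzy versions are not reproduced; the measure and numbers used below are hypothetical.

```python
def fine_kinney_score(probability, exposure, consequence):
    """Classic crisp Fine-Kinney risk score R = P * E * C."""
    return probability * exposure * consequence

def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` with respect to a fuzzy measure.

    values: dict mapping criterion -> score.
    measure: dict mapping frozenset of criteria -> weight in [0, 1]; it should be
             monotone, with measure[frozenset()] == 0 and measure[all criteria] == 1.
    """
    # Criteria ordered by ascending score.
    ordered = sorted(values, key=values.get)
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(ordered):
        # Coalition of criteria scoring at least as much as this one.
        coalition = frozenset(ordered[i:])
        total += (values[criterion] - previous) * measure[coalition]
        previous = values[criterion]
    return total
```

With an additive measure (each coalition weighted by the sum of its members' weights), the Choquet integral reduces to an ordinary weighted mean. A non-additive measure lets interacting criteria count for more or less than the sum of their parts.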

NeurIPS Conference 2024 Conference Paper

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

  • Qingsong Zhao
  • Yi Wang
  • Jilan Xu
  • Yinan He
  • Zifan Song
  • Limin Wang
  • Yu Qiao
  • Cairong Zhao

Video understanding relies on accurate action detection for temporal analysis. However, existing mainstream methods have limitations in real-world applications due to their offline and closed-set evaluation approaches, as well as their dependence on manual annotations. To address these challenges and enable real-time action understanding in open-world scenarios, we propose OV-OAD, a zero-shot online action detector that leverages vision-language models and learns solely from text supervision. By introducing an object-centered decoder unit into a Transformer-based model, we aggregate frames with similar semantics using video-text correspondence. Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods. Specifically, it achieves 37.5% mean average precision on THUMOS’14 and 73.8% calibrated average precision on TVSeries. This research establishes a robust baseline for zero-shot transfer in online action detection, enabling scalable solutions for open-world temporal understanding. The code will be available for download at https://github.com/OpenGVLab/OV-OAD.

JBHI Journal 2024 Journal Article

DPFNet: Fast Reconstruction of Multi-Coil MRI Based on Dual Domain Parallel Fusion Network

  • Yi Wang
  • Bing Luo
  • Yuan Zhang
  • Zhenting Xiao
  • Miaomiao Wang
  • Yilong Niu
  • Asoke K. Nandi

There are relatively few studies of existing Magnetic Resonance Imaging (MRI) reconstruction methods on the multi-coil task, and these methods suffer from problems such as insufficient reconstruction detail and high memory usage during training. Therefore, a new Dual-domain Parallel Fusion Reconstruction Network (DPFNet) is proposed in this paper. The network consists of a coil sensitivity map estimation module, a dual domain feature extraction module, a dual domain dynamic error correction module, and a dual domain dynamic fusion module, with a U-Net as the backbone. The network reconstructs under-sampled MRI images and K-space data simultaneously in two branches, one in the image domain and one in the K-space domain, and the fusion module enables reconstruction information to be exchanged between the two branches. In addition, a new dual domain consistency loss is proposed, which reduces the error between the image and K-space outputs for the same MRI slice and achieves high-quality reconstruction. A series of comparative and ablation experiments are conducted on the open Calgary-Campinas-359 brain MRI dataset. The results show that the proposed DPFNet achieves state-of-the-art performance and is superior to traditional algorithms and other deep-learning-based reconstruction methods, with particularly strong results under Cartesian sampling.
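A dual domain consistency loss ties the two branches together by mapping the image-branch output into k-space and comparing it with the k-space-branch output. The sketch below assumes the standard multi-coil forward model k_c = F(S_c · x), a centered orthonormal 2-D FFT, and a mean squared error; the function names and the exact loss form are illustrative assumptions, since the abstract does not give DPFNet's formula.

```python
import numpy as np

def coil_kspace(image, sensitivities):
    """Forward model: per-coil k-space F(S_c * x) using a centered orthonormal FFT.

    image: complex array (H, W); sensitivities: complex array (C, H, W).
    """
    coil_images = sensitivities * image[None, ...]
    shifted = np.fft.ifftshift(coil_images, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(shifted, norm="ortho"), axes=(-2, -1))

def dual_domain_consistency_loss(image_pred, kspace_pred, sensitivities):
    """Mean squared mismatch between image-branch and k-space-branch outputs.

    image_pred: complex (H, W) image from the image-domain branch.
    kspace_pred: complex (C, H, W) multi-coil k-space from the k-space branch.
    sensitivities: complex (C, H, W) coil sensitivity maps.
    """
    diff = coil_kspace(image_pred, sensitivities) - kspace_pred
    return float(np.mean(np.abs(diff) ** 2))
```

The loss is zero exactly when the two branches describe the same slice under the forward model, which is the agreement the consistency term is meant to encourage. In a trained network the same expression would be written with a differentiable framework's FFT.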

NeurIPS Conference 2024 Conference Paper

F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning

  • Huiping Zhuang
  • Yuchen Liu
  • Run He
  • Kai Tong
  • Ziqian Zeng
  • Cen Chen
  • Yi Wang
  • Lap-Pui Chau

Online Class Incremental Learning (OCIL) aims to train models incrementally, where data arrive in mini-batches and previous data are not accessible. A major challenge in OCIL is Catastrophic Forgetting, i.e., the loss of previously learned knowledge. Among existing baselines, replay-based methods show competitive results but require extra memory for storing exemplars, while exemplar-free methods (i.e., data need not be stored for replay in production) are resource friendly but often lack accuracy. In this paper, we propose an exemplar-free approach, Forward-only Online Analytic Learning (F-OAL). Unlike traditional methods, F-OAL does not rely on back-propagation and is forward-only, significantly reducing memory usage and computational time. Cooperating with a pre-trained frozen encoder with Feature Fusion, F-OAL only needs to update a linear classifier by recursive least squares. This approach simultaneously achieves high accuracy and low resource consumption. Extensive experiments on benchmark datasets demonstrate F-OAL’s robust performance in OCIL scenarios. Code is available at: https://github.com/liuyuchen-cz/F-OAL
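Updating a linear classifier "by recursive least squares" means each mini-batch refines a closed-form ridge-regression solution without gradients or stored data. The sketch below shows generic block RLS (via the Woodbury identity) on already-extracted features; the frozen encoder and Feature Fusion are omitted, and the class name, `gamma`, and preallocating all classes up front are assumptions rather than F-OAL's actual code.

```python
import numpy as np

class RecursiveLeastSquaresClassifier:
    """Linear classifier on frozen features, updated by block recursive least squares.

    After any sequence of mini-batches, W equals the ridge solution
    argmin_W ||X W - Y||^2 + gamma * ||W||^2 over all data seen so far,
    computed without storing past samples or back-propagating.
    """

    def __init__(self, feature_dim, num_classes, gamma=1.0):
        # R tracks (X^T X + gamma * I)^{-1} for all data seen so far.
        self.R = np.eye(feature_dim) / gamma
        self.W = np.zeros((feature_dim, num_classes))

    def partial_fit(self, features, labels):
        X = np.asarray(features, dtype=np.float64)
        Y = np.eye(self.W.shape[1])[labels]  # one-hot targets
        # Woodbury update: R <- (R^{-1} + X^T X)^{-1}
        K = np.linalg.solve(np.eye(X.shape[0]) + X @ self.R @ X.T, X @ self.R)
        self.R = self.R - self.R @ X.T @ K
        # Correct the weights using the residual on the new batch.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)
        return self

    def predict(self, features):
        return np.argmax(np.asarray(features) @ self.W, axis=1)
```

Each update costs one solve of size batch × batch and touches only the current batch, which is why this style of learner is forward-only and memory-light. Streaming the data in batches gives the same weights as solving the ridge problem on all of it at once.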

EAAI Journal 2024 Journal Article

MFFSP: Multi-scale feature fusion scene parsing network for landslides detection based on high-resolution satellite images

  • Penglei Li
  • Yi Wang
  • Tongzhen Si
  • Kashif Ullah
  • Wei Han
  • Lizhe Wang

Fast and efficient landslide detection plays an important role in post-disaster rescue and risk assessment. Existing convolutional neural network (CNN)-based landslide detection methods struggle to exploit global long-distance dependencies due to their limited receptive fields. Considering that landslide occurrence is susceptible to local and global conditions, we propose a novel multi-scale feature fusion scene parsing (MFFSP) framework that explores information at different scales by coupling CNN with Transformer to learn local and global clues for landslide detection based on satellite data. In the encoder, we design three modules, a visual geometry module (VGM), a residual learning module (RLM), and a Transformer module (TRM), to exploit multi-scale features. Specifically, VGM and RLM are constructed from convolution operations to explore local features by learning low-level and mid-level information, while TRM is built on a self-attention mechanism to learn long-distance dependencies. In the decoder, TRM and VGM are further extended to encourage the model to mine long-distance dependencies and detailed spatial information by deeply fusing features from multiple scales. To demonstrate the performance of the model, we conduct experiments on two study areas with four test regions and compare against seven state-of-the-art deep learning models. Extensive experiments demonstrate that MFFSP greatly outperforms the other algorithms. In addition, we conduct numerous ablation experiments, showing that MFFSP fully combines the complementary advantages of CNN and Transformer to mine robust features.

IROS Conference 2024 Conference Paper

OW3Det: Toward Open-World 3D Object Detection for Autonomous Driving

  • Wenfei Hu
  • Weikai Lin
  • Hongyu Fang
  • Yi Wang
  • Dingsheng Luo

Despite their success in LiDAR object detection, modern detectors are vulnerable to uncommon instances and corner cases (e.g., a runaway tire) since they are closed-set and static. Networks under the closed-set setup only predict labels of seen classes, while static models suffer from catastrophic forgetting when gradually learning novel concepts. This motivates us to formulate the open-world 3D object detection task for autonomous driving, which aims to 1) tackle the closed-set issue by identifying unseen instances as unknown and 2) incrementally learn novel classes without forgetting previously obtained knowledge. To achieve these open-world objectives, we propose the Open-World 3D Detector (OW3Det), the first framework for open-world 3D object detection. OW3Det comprises a base detector, a self-supervised unknown identifier, and a knowledge-distillation-restricted incremental learner. Although knowledge distillation helps preserve memories, imposing penalties on areas containing unknown objects hinders the incremental learning process. We mitigate this hindrance by employing an unknown-driven pivotal mask, which eliminates unnecessary restrictions on regions overlapping with novel instances. Extensive experiments and visualizations demonstrate that OW3Det attains state-of-the-art performance.

AAAI Conference 2024 Conference Paper

PointPatchMix: Point Cloud Mixing with Patch Scoring

  • Yi Wang
  • Jiaze Wang
  • Jinpeng Li
  • Zixu Zhao
  • Guangyong Chen
  • Anfeng Liu
  • Pheng Ann Heng

Data augmentation is an effective regularization strategy for mitigating overfitting in deep neural networks, and it plays a crucial role in 3D vision tasks, where point cloud data is relatively limited. While mixing-based augmentation has shown promise for point clouds, previous methods mix point clouds at either the block level or the point level, which constrains their ability to strike a balance between generating diverse training samples and preserving the local characteristics of point clouds. Moreover, the significance of each component of a point cloud has not been fully considered: not all parts contribute equally to the classification task, and some parts may contain unimportant or redundant information. To overcome these challenges, we propose PointPatchMix, a novel approach that mixes point clouds at the patch level and integrates a patch scoring module to generate content-based targets for mixed point clouds. Our approach preserves local features at the patch level, while the patch scoring module assigns targets based on content-based significance scores from a pre-trained teacher model. We evaluate PointPatchMix on two benchmark datasets, ModelNet40 and ScanObjectNN, and demonstrate significant improvements over various baselines on both synthetic and real-world datasets, as well as in few-shot settings. With Point-MAE as our baseline, our model surpasses previous methods by a significant margin. Furthermore, our approach shows strong generalization across various point cloud methods and enhances the robustness of the baseline model. Code is available at https://jiazewang.com/projects/pointpatchmix.html.
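Patch-level mixing with content-based targets can be sketched as follows: replace some patches of cloud A with patches of cloud B, then weight B's label by the share of significance that the mixed sample inherits from B. This is a minimal illustration assuming the patches are already formed and scored. The same-index pairing of patches and the function name `patch_mix` are hypothetical simplifications, and the teacher model that produces the scores is not shown.

```python
import numpy as np

def patch_mix(patches_a, patches_b, scores_a, scores_b, replace_idx):
    """Mix two patch-partitioned point clouds and compute a content-based target weight.

    patches_a, patches_b: arrays of shape (P, K, 3), P patches of K points each.
    scores_a, scores_b: non-negative per-patch significance scores, shape (P,).
    replace_idx: indices of patches in A replaced by the same-index patches of B.
    Returns (mixed_patches, lam_b), where the mixed target is
    (1 - lam_b) * y_a + lam_b * y_b.
    """
    mixed = np.array(patches_a, copy=True)
    mixed[replace_idx] = patches_b[replace_idx]
    keep = np.ones(len(scores_a), dtype=bool)
    keep[replace_idx] = False
    # Significance retained from A versus significance contributed by B.
    mass_a = scores_a[keep].sum()
    mass_b = scores_b[replace_idx].sum()
    lam_b = mass_b / (mass_a + mass_b)
    return mixed, lam_b
```

Unlike a target based only on the number of replaced points, this weighting lets a few highly significant patches from B dominate the label, which is the motivation the abstract gives for content-based targets.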

IJCAI Conference 2024 Conference Paper

Purpose Enhanced Reasoning through Iterative Prompting: Uncover Latent Robustness of ChatGPT on Code Comprehension

  • Yi Wang
  • Qidong Zhao
  • Dongkuan Xu
  • Xu Liu

Code comments are crucial for gaining in-depth insights that facilitate code comprehension. The key to obtaining these insights lies in precisely summarizing the main purpose of the code. Recent approaches to code comment generation rely on prompting large language models (LLMs) such as ChatGPT instead of training or fine-tuning specific models. Although ChatGPT demonstrates impressive performance in code comprehension, it still faces robustness challenges in consistently producing high-quality code comments. This is because ChatGPT prioritizes the semantics of code tokens, which makes it vulnerable to commonly encountered benign perturbations such as variable name replacements. This study proposes a modular prompting paradigm, Perthept, to effectively mitigate the negative effects of such minor perturbations. Perthept iteratively deepens reasoning to reach the main purpose of the code. Perthept demonstrates robustness even when ChatGPT's responses are stochastic or unreliable. We give a comprehensive evaluation across four public datasets to show the consistent robustness improvement of our methodology over other models.

NeurIPS Conference 2024 Conference Paper

SyncVIS: Synchronized Video Instance Segmentation

  • Rongkun Zheng
  • Lu Qi
  • Xi Chen
  • Yi Wang
  • Kun Wang
  • Yu Qiao
  • Hengshuang Zhao

Recent DETR-based methods have advanced the development of Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information. Despite remarkable progress, existing works follow asynchronous designs, which model video sequences either via video-level queries alone or via query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of current solutions, and propose to conduct synchronized modeling via a new framework named SyncVIS. Specifically, SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize video-level query embeddings with frame-level query embeddings: a synchronized video-frame modeling paradigm and a synchronized embedding optimization strategy. The former promotes mutual learning between frame- and video-level embeddings, and the latter divides large video sequences into small clips for easier optimization. Extensive experimental evaluations are conducted on the challenging YouTube-VIS 2019, 2021, and 2022 and OVIS benchmarks, where SyncVIS achieves state-of-the-art results, demonstrating the effectiveness and generality of the proposed approach. The code is available at https://github.com/rkzheng99/SyncVIS.

NeurIPS Conference 2024 Conference Paper

Task-oriented Time Series Imputation Evaluation via Generalized Representers

  • Zhixian Wang
  • Linxiao Yang
  • Liang Sun
  • Qingsong Wen
  • Yi Wang

Time series analysis is widely used in many fields such as power and energy, economics, and transportation, for tasks such as forecasting, anomaly detection, and classification. Missing values are common in these tasks and often have unpredictable negative effects on existing methods, hindering their further application. In response, existing time series imputation methods mainly focus on restoring sequences based on their data characteristics while ignoring how the restored sequences perform in downstream tasks. Considering the different requirements of downstream tasks (e.g., forecasting), this paper proposes an efficient downstream task-oriented time series imputation evaluation approach. By combining time series imputation with the neural network models used for downstream tasks, the gain of different imputation strategies on downstream tasks is estimated without retraining, and the most favorable imputation values for downstream tasks are obtained by combining different imputation strategies according to the estimated gain.

YNIMG Journal 2024 Journal Article

Trajectories and sex differences of brain structure, oxygenation and perfusion functions in normal aging

  • Di Wu
  • Yuanhao Li
  • Shun Zhang
  • Qiuyue Chen
  • Jiayu Fang
  • Junghun Cho
  • Yi Wang
  • Su Yan

BACKGROUND: Brain structure, oxygenation and perfusion are important factors in aging. Coupling between regional cerebral oxygen consumption and perfusion also reflects the function of the neurovascular unit (NVU). Their trajectories and sex differences during normal aging, which are important for clinical interpretation, are still not well defined. In this study, we aim to investigate the relationship between brain structure, brain function, and age, and to examine sex disparities. METHOD: < 0.05 was considered statistically significant. RESULTS: < 0.05). CONCLUSION: The sex disparities and age trajectories of brain structure and function, as well as NVU coupling, in healthy individuals provide insights into normal aging and are potential targets for the study of pathological conditions.

NeurIPS Conference 2024 Conference Paper

Vision Mamba Mender

  • Jiacong Hu
  • Anda Cao
  • Zunlei Feng
  • Shengxuming Zhang
  • Yi Wang
  • Lingxiang Jia
  • Mingli Song

Mamba, a state-space model with selective mechanisms and a hardware-aware architecture, has demonstrated outstanding performance in long sequence modeling tasks and has been widely explored and applied in computer vision. While existing works hold mixed opinions on its application to visual tasks, understanding its internal workings and optimizing its performance remain urgent and worthwhile research questions given its status as a novel model. Existing optimizations of the Mamba model, especially in the visual domain, have primarily relied on predefined methods such as improving scanning mechanisms or integrating other architectures, often requiring strong priors and extensive trial and error. In contrast to these approaches, this paper proposes the Vision Mamba Mender, a systematic approach for understanding the workings of Mamba, identifying flaws within it, and subsequently optimizing model performance. Specifically, we present methods for predictive correlation analysis of Mamba's hidden states from both internal and external perspectives, along with corresponding definitions of correlation scores, aimed at understanding the workings of Mamba in visual recognition tasks and identifying flaws therein. Additionally, tailored repair methods are proposed to eliminate the identified external and internal state flaws and optimize model performance. Extensive experiments validate the efficacy of the proposed methods on prevalent Mamba architectures, significantly enhancing Mamba's performance.

NeurIPS Conference 2024 Conference Paper

Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion

  • Lubo Wang
  • Di Lin
  • Kairui Yang
  • Ruonan Liu
  • Qing Guo
  • Wuyuan Xie
  • Miaohui Wang
  • Lingyu Liang

Semantic scene completion is a difficult task that involves completing the geometry and semantics of a scene from point clouds in a large-scale environment. Many current methods use 3D/2D convolutions or attention mechanisms, but these have limitations in directly constructing geometry and accurately propagating features from related voxels: completion is likely to fail when features are propagated in a single pass without considering multiple potential pathways. Moreover, these methods are generally suitable only for static scenes and struggle to handle dynamic aspects. This paper introduces the Voxel Proposal Network (VPNet), which completes scenes from 3D and Bird's-Eye-View (BEV) perspectives. It includes a Confident Voxel Proposal module, based on voxel-wise coordinates, that proposes highly reliable voxels for completion. This method reconstructs the scene geometry and implicitly models the uncertainty of voxel-wise semantic labels by presenting multiple possibilities for each voxel. VPNet employs Multi-Frame Knowledge Distillation based on the point clouds of multiple adjacent frames to accurately predict voxel-wise labels by condensing the various possibilities of voxel relationships. VPNet has shown superior performance and achieved state-of-the-art results on the SemanticKITTI and SemanticPOSS datasets.

YNIMG Journal 2023 Journal Article

Age-dependent changes in brain iron deposition and volume in deep gray matter nuclei using quantitative susceptibility mapping

  • Gaiying Li
  • Rui Tong
  • Miao Zhang
  • Kelly M. Gillen
  • Wenqing Jiang
  • Yasong Du
  • Yi Wang
  • Jianqi Li

BACKGROUND: Microstructural changes in deep gray matter (DGM) nuclei are related to physiological behavior, cognition, and memory. Therefore, it is critical to study age-dependent trajectories of biomarkers in DGM nuclei for understanding brain development and aging, as well as for predicting cognitive or neurodegenerative diseases. OBJECTIVES: We aimed to (1) characterize age-dependent trajectories of mean susceptibility, adjusted volume, and total iron content simultaneously in DGM nuclei using quantitative susceptibility mapping (QSM); (2) examine potential contributions of sex-related effects to the different age-dependent trajectories of volume and iron deposition; and (3) evaluate the ability to predict brain age by combining the mean magnetic susceptibility and volume of DGM nuclei. METHODS: Magnetic susceptibilities and volumetric values of DGM nuclei were obtained from 220 healthy participants (aged 10-70 years) scanned on a 3T MRI system. Regions of interest (ROIs) were drawn manually on the QSM images. Univariate regression analysis between age and each of the MRI measurements in a single ROI was performed. Pearson correlation coefficients were calculated between magnetic susceptibility and adjusted volume in a single ROI. The statistical significance of sex differences in the age-dependent trajectories of magnetic susceptibilities and adjusted volumes was determined using one-way ANCOVA. Multiple regression analysis was used to evaluate the ability to estimate brain age using a combination of the mean susceptibilities and adjusted volumes in multiple DGM nuclei. RESULTS: = 0.586). CONCLUSIONS: QSM can be used to simultaneously investigate age- and sex-dependent changes in the magnetic susceptibility and volume of DGM nuclei, thus enabling a comprehensive understanding of the developmental trajectories of iron accumulation and volume in DGM nuclei during brain development and aging.

NeurIPS Conference 2023 Conference Paper

Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method

  • Tianyi Liu
  • Kejun Wu
  • Yi Wang
  • Wenyang Liu
  • Kim-Hui Yap
  • Lap-Pui Chau

The past decade has witnessed great strides in video recovery by specialist technologies such as video inpainting, completion, and error concealment. However, these methods typically simulate missing content with manually designed error masks and thus fail to fill in the realistic video loss that occurs in video communication (e.g., telepresence, live streaming, and internet video) and multimedia forensics. To address this, we introduce the bitstream-corrupted video (BSCV) benchmark, the first benchmark dataset with more than 28,000 video clips, which can be used for bitstream-corrupted video recovery in the real world. BSCV comprises 1) a proposed three-parameter corruption model for video bitstreams, 2) a large-scale dataset containing rich error patterns, multiple corruption levels, and flexible dataset branches, and 3) a new video recovery framework that serves as a benchmark. We evaluate state-of-the-art video inpainting methods on the BSCV dataset, demonstrating the limitations of existing approaches and the advantages of our framework in solving the bitstream-corrupted video recovery problem. The benchmark and dataset are released at https://github.com/LIUTIGHE/BSCV-Dataset.

AAAI Conference 2023 Conference Paper

Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

  • Qingyu Wang
  • Tielin Zhang
  • Minglun Han
  • Yi Wang
  • Duzhen Zhang
  • Bo Xu

Spiking neural networks (SNNs) built from leaky integrate-and-fire (LIF) neurons have been commonly used in automatic speech recognition (ASR) tasks. However, the LIF neuron is still relatively simple compared to neurons in the biological brain, and further research on more types of neurons with different scales of neuronal dynamics is necessary. Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated by the spiking transformer, yielding the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). We found that DyTr-SNN handles a non-toy automatic speech recognition task well, achieving a lower phoneme error rate, lower computational cost, and higher robustness. These results indicate that closer cooperation between SNNs and neural dynamics at the neuron and network scales is promising, especially for ASR tasks.
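For readers unfamiliar with the baseline neuron, the standard LIF dynamics mentioned above can be sketched in discrete time as follows. This is a generic textbook LIF model, not one of the four neuronal dynamics introduced in DyTr-SNN; the function name and the values of the time constant, threshold, and resting/reset potentials are assumptions chosen for illustration.

```python
def lif_simulate(inputs, tau=10.0, v_threshold=1.0, v_rest=0.0, v_reset=0.0, dt=1.0):
    """Simulate one leaky integrate-and-fire neuron in discrete time.

    inputs: sequence of input currents, one per time step (illustrative units).
    Returns a list with 1 where the neuron spiked and 0 otherwise.
    """
    v = v_rest
    decay = dt / tau  # fraction of the gap to v_rest that leaks away per step
    spikes = []
    for current in inputs:
        # Leak toward the resting potential, then integrate the input current.
        v = v + decay * (v_rest - v) + current
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes
```

With a constant input of 0.3, the membrane potential accumulates over three steps before crossing the threshold on the fourth, which illustrates the single time scale (tau) that the abstract describes as relatively simple.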

EAAI Journal 2023 Journal Article

Deep time–frequency learning for interpretable weak signal enhancement of rotating machineries

  • Jiakai Ding
  • Yi Wang
  • Yi Qin
  • Baoping Tang

Rotating machinery generates periodic impulses when weak faults occur, but the fault resonant bands (FRB), which contain rich fault information, are usually seriously contaminated by random interference and background noise. With the development of deep learning (DL), DL-based signal enhancement methods have emerged, but they behave as a “black box”, i.e., they offer limited interpretability, so fault components with clear physical meaning (CPM) cannot be extracted. To address these problems, an interpretable weak signal enhancement method based on deep time–frequency learning (DTFL) is proposed. First, a series of simulated template signals of different resonant bands with CPM are constructed as training samples for DTFL. In addition, a resonant band time–frequency ratio mask with CPM is generated as the training target from the time–frequency representations (TFR) of pure and noise-added simulated template signals. This builds a non-linear mapping between the TFR of the simulated template signals and the resonant bands, achieving accurate enhancement of the FRB and credible mask suppression of background noise interference. The time–frequency ratio mask with CPM obtained from the DTFL model can therefore be used to enhance the FRB, which makes the enhanced FRB interpretable and, in turn, makes the constructed DTFL model interpretable. The DTFL method overcomes the heavy reliance of existing signal enhancement methods on expert experience and can provide a CPM of the FRB to support the diagnosis of weak faults in rotating machinery.

YNIMG Journal 2023 Journal Article

Evaluation of whole-brain oxygen metabolism in Alzheimer's disease using QSM and quantitative BOLD

  • Aocai Yang
  • Hangwei Zhuang
  • Lei Du
  • Bing Liu
  • Kuan Lv
  • Jixin Luan
  • Pianpian Hu
  • Feng Chen

OBJECTIVE: ) perturbation in Alzheimer's disease (AD) and investigate the relationship between regional cerebral oxygen metabolism and global cognition. METHODS: analyses were performed. The associations between these measures in substructures of deep brain gray matter and MMSE scores were assessed. RESULTS: values in the bilateral hippocampus positively correlated with the MMSE score. CONCLUSION: in the hippocampus may be a useful tool for monitoring cognitive impairment.

YNIMG Journal 2023 Journal Article

Exploring the neural mechanisms underlying achalasia: A study of functional connectivity and regional brain activity

  • Nina Zhang
  • Binyu Teng
  • Xinyi Lu
  • Liangliang Shi
  • Li Liu
  • Fan Zhou
  • Ni Jiang
  • Xin Zhang

BACKGROUND AND AIMS: The pathophysiology of achalasia, which involves central nuclei abnormalities, remains unknown. We investigated the resting-state functional MRI (rs-fMRI) features of patients with achalasia. METHODS: We applied rs-fMRI to investigate brain features in patients with achalasia (n = 27) compared to healthy controls (n = 29). Focusing on three regions of interest (ROIs), the dorsal motor nucleus of the vagus (DMV), the nucleus ambiguus (NA), and the nucleus of the solitary tract (NTS), we analyzed variations in resting-state functional connectivity (rs-FC), fractional amplitude of low-frequency fluctuations (fALFF), and regional homogeneity (ReHo). RESULTS: Achalasia patients demonstrated stronger functional connectivity between the NA and the right precentral gyrus, left postcentral gyrus, and left insula. No significant changes were found in the DMV or NTS. The fMRI analysis showed higher rs-FC values for NA-DMV and NA-NTS connections in achalasia patients. Achalasia patients exhibited decreased fALFF values in the NA, DMV, and NTS regions, as well as increased ReHo values in the NA and DMV regions. A positive correlation was observed between fALFF values in all six ROIs and the width of the barium meal. The NTS fALFF value and NA ReHo value displayed a positive correlation with integrated relaxation pressure (IRP), while the ReHo value in the right precentral gyrus showed an inverse correlation with the height of the barium meal. CONCLUSIONS: Abnormal rs-FC and regional brain activity were found in patients with achalasia. Our study provides new insights into the pathophysiology of achalasia and highlights the potential of rs-fMRI in improving the diagnosis and treatment of this condition.

EAAI Journal 2023 Journal Article

Hybrid path planning based on adaptive visibility graph initialization and edge computing for mobile robots

  • Junlin Ou
  • Seong Hyeon Hong
  • Ge Song
  • Yi Wang

This paper presents a new initialization method that combines adaptive visibility graphs and the A* algorithm to improve the exploration, accuracy, and computing efficiency of hybrid path planning for mobile robots. First, segments/links in the full visibility graphs are removed randomly in an iterative and adaptive manner, yielding adaptive visibility graphs. Then the A* algorithm is applied to find the shortest paths in these adaptive visibility graphs. Next, high-quality paths featuring low fitness values are chosen to initialize the subsequent heuristic optimization in hybrid path planning. Specifically, in the present study, the genetic algorithm (GA) is implemented on a CPU/GPU edge computing device (Jetson AGX Xavier) to exploit its massively parallel processing threads, and a strategy for judicious CPU/GPU resource utilization is also developed. Numerical experiments are conducted to determine proper hyperparameters and configure GA with balanced performance. The proposed method obtains various optimal paths that weigh practical factors for robot path planning differently. Compared to the other benchmark methods, ours significantly improves the diversity of initial paths and exploration, optimization accuracy, and computing speed (within 5 s, mostly under 2 s). Furthermore, real-time experiments are carried out to demonstrate the effectiveness and application of the proposed algorithm on mobile robots.
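The initialization step described above (random link removal followed by A*) might be sketched as below. This is a simplified, hypothetical illustration rather than the authors' implementation: a fixed drop probability stands in for the paper's iterative, adaptive removal scheme, path length stands in for the fitness value, the graph is assumed to be given as coordinates plus a symmetric adjacency map, and the function names and parameters (`drop_prob`, `n_paths`, `seed`) are assumptions. The GA stage and edge-computing deployment are not shown.

```python
import heapq
import math
import random


def a_star(nodes, edges, start, goal):
    """A* on a graph. nodes: {id: (x, y)}, edges: {id: set of neighbor ids}."""
    def h(n):
        return math.dist(nodes[n], nodes[goal])  # admissible straight-line heuristic

    open_heap = [(h(start), 0.0, start)]
    came_from = {}
    best_g = {start: 0.0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1], g
        if g > best_g.get(current, math.inf):
            continue  # stale heap entry superseded by a cheaper route
        for nb in edges.get(current, ()):
            ng = g + math.dist(nodes[current], nodes[nb])
            if ng < best_g.get(nb, math.inf):
                best_g[nb] = ng
                came_from[nb] = current
                heapq.heappush(open_heap, (ng + h(nb), ng, nb))
    return None, math.inf


def adaptive_initial_paths(nodes, edges, start, goal, n_paths, drop_prob=0.3, seed=0):
    """Hypothetical sketch: drop random links, run A*, collect distinct paths.

    Returns (path, length) pairs sorted by length (used here as the fitness).
    """
    rng = random.Random(seed)
    # Each undirected link once (assumes a symmetric adjacency map).
    links = {(a, b) for a in edges for b in edges[a] if a < b}
    found = {}
    for _ in range(n_paths * 10):  # bounded number of attempts
        kept = [link for link in links if rng.random() >= drop_prob]
        sub = {n: set() for n in nodes}
        for a, b in kept:
            sub[a].add(b)
            sub[b].add(a)
        path, cost = a_star(nodes, sub, start, goal)
        if path is not None:
            found[tuple(path)] = cost
        if len(found) >= n_paths:
            break
    return sorted(found.items(), key=lambda item: item[1])
```

Removing links forces A* onto different routes across attempts, so the collected paths are more diverse than repeated runs on the full graph, which would always return the same shortest path.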

NeurIPS Conference 2023 Conference Paper

JourneyDB: A Benchmark for Generative Image Understanding

  • Keqiang Sun
  • Junting Pan
  • Yuying Ge
  • Hao Li
  • Haodong Duan
  • Xiaoshi Wu
  • Renrui Zhang
  • Aojun Zhou

While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models can comprehend generated images remains uncertain. Compared to real data, synthetic images exhibit greater diversity in both content and style, which makes them significantly harder for models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. Our meticulously curated dataset comprises 4 million distinct and high-quality generated images, each paired with the corresponding text prompt that was employed in its creation. Furthermore, we introduce an external subset with results of another 22 text-to-image generative models, which makes JourneyDB a comprehensive benchmark for evaluating the comprehension of generated images. On our dataset, we have devised four benchmarks to assess the performance of generated image comprehension in relation to both content and style interpretation. These benchmarks encompass prompt inversion, style retrieval, image captioning, and visual question answering. Lastly, we evaluate the performance of state-of-the-art multi-modal models when applied to the JourneyDB dataset, providing a comprehensive analysis of their strengths and limitations in comprehending generated content. We anticipate that the proposed dataset and benchmarks will facilitate further research in the field of generative content understanding. The dataset is publicly available at https://journeydb.github.io.

YNIMG Journal 2023 Journal Article

LARO: Learned acquisition and reconstruction optimization to accelerate quantitative susceptibility mapping

  • Jinwei Zhang
  • Pascal Spincemaille
  • Hang Zhang
  • Thanh D. Nguyen
  • Chao Li
  • Jiahao Li
  • Ilhami Kovanlikaya
  • Mert R. Sabuncu

Quantitative susceptibility mapping (QSM) involves acquisition and reconstruction of a series of images at multi-echo time points to estimate the tissue field, which prolongs scan time and requires a specific reconstruction technique. In this paper, we present our new framework, called Learned Acquisition and Reconstruction Optimization (LARO), which aims to accelerate the multi-echo gradient echo (mGRE) pulse sequence for QSM. Our approach involves optimizing a Cartesian multi-echo k-space sampling pattern with a deep reconstruction network. Next, we implemented this optimized sampling pattern in an mGRE sequence using Cartesian fan-beam k-space segmenting and ordering for prospective scans. Furthermore, we propose to insert a recurrent temporal feature fusion module into the reconstruction network to capture signal redundancies along echo time. Our ablation studies show that both the optimized sampling pattern and the proposed reconstruction strategy help improve the quality of the multi-echo image reconstructions. Generalization experiments show that LARO is robust on test data with new pathologies and different sequence parameters. Our code is available at https://github.com/Jinwei1209/LARO-QSM.git.

EAAI Journal 2023 Journal Article

Multi-agent deep reinforcement learning for task offloading in group distributed manufacturing systems

  • Jianyu Xiong
  • Peng Guo
  • Yi Wang
  • Xiangyin Meng
  • Jian Zhang
  • Linmao Qian
  • Zhenglin Yu

The rapid development of cloud computing and the Internet of Things (IoT) has facilitated near-real-time optimization of group distributed manufacturing systems. Currently, the most common technique for near-real-time optimization is cloud–edge cooperation for offloading optimization tasks: some tasks are offloaded to the cloud to be completed, and the rest are kept at the edge. Because of the complexity of task offloading, such as capacity restrictions on cloud and edge computing resources and task deadlines, tasks are often offloaded to the cloud and edge in an unbalanced or insufficient way, causing time delays. To address this imbalance and insufficiency in the task offloading process, a mixed-integer programming model was developed to reduce task computation latency. The task offloading problem is decomposed into two sub-problems: 1) defining task priorities in near real time, and 2) determining whether each task is offloaded to the cloud. A multi-agent deep reinforcement learning with attention mechanism (MaDRLAM) framework is proposed to solve this two-step decision problem. The MaDRLAM framework consists of two agents, each corresponding to one sub-problem. Each agent comprises an encoder and a decoder, and the two agents cooperate in devising an offloading strategy for the tasks. The encoder and decoder of each agent are based on the Transformer architecture; unlike the traditional Transformer, we add Pointer Networks to solve the proposed decision problem. Besides, an improved multi-actor, single-critic strategy based on the REINFORCE algorithm is designed to train the proposed MaDRLAM. Finally, extensive computational experiments are conducted on instances with varying numbers of tasks, different task data sizes, and different cloud computing capacities. Computational results show that the proposed framework can find a solution with a GAP value of less than 1% within 1 s for each instance. The proposed framework is competitive in both solution accuracy and solution time compared with other offloading strategies.

AAAI Conference 2023 Conference Paper

ScatterFormer: Locally-Invariant Scattering Transformer for Patient-Independent Multispectral Detection of Epileptiform Discharges

  • Ruizhe Zheng
  • Jun Li
  • Yi Wang
  • Tian Luo
  • Yuguo Yu

Patient-independent detection of epileptic activities based on visual spectral representations of continuous EEG (cEEG) has been widely used for diagnosing epilepsy. However, precise detection remains a considerable challenge due to subtle variabilities across subjects, channels, and time points. Capturing fine-grained, discriminative features of EEG patterns, which are associated with high-frequency textural information, therefore remains an open problem. In this work, we propose Scattering Transformer (ScatterFormer), an invariant scattering transform-based hierarchical Transformer that specifically pays attention to subtle features. In particular, the disentangled frequency-aware attention (FAA) enables the Transformer to capture clinically informative high-frequency components, offering a novel clinical explainability based on visual encoding of multichannel EEG signals. Evaluations on two distinct epileptiform detection tasks demonstrate the effectiveness of our method. Our proposed model achieves a median AUCROC and accuracy of 98.14% and 96.39%, respectively, in patients with Rolandic epilepsy. On a neonatal seizure detection benchmark, it outperforms the state-of-the-art by 9% in terms of average AUCROC.

JMLR Journal 2023 Journal Article

SQLFlow: An Extensible Toolkit Integrating DB and AI

  • Jun Zhou
  • Ke Zhang
  • Lin Wang
  • Hua Wu
  • Yi Wang
  • Chaochao Chen

Integrating AI algorithms into databases is an ongoing effort in both academia and industry. We introduce SQLFlow, a toolkit seamlessly combining data manipulations and AI operations that can be run locally or remotely. SQLFlow extends SQL syntax to support typical AI tasks including model training, inference, interpretation, and mathematical optimization. It is compatible with a variety of database management systems (DBMS) and AI engines, including MySQL, TiDB, MaxCompute, and Hive, as well as TensorFlow, scikit-learn, and XGBoost. Documentation and case studies are available at https://sqlflow.org. The source code and additional details can be found at https://github.com/sql-machine-learning/sqlflow. © JMLR 2023.

NeurIPS Conference 2023 Conference Paper

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

  • Adam Stewart
  • Nils Lehmann
  • Isaac Corley
  • Yi Wang
  • Yi-Chia Chang
  • Nassim Ait Ait Ali Braham
  • Shradha Sehgal
  • Caleb Robinson

The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4–5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks. All datasets and model weights are available via the TorchGeo library, making reproducibility and experimentation easy, and enabling scientific advancements in the burgeoning field of remote sensing for a multitude of downstream applications.

NeurIPS Conference 2023 Conference Paper

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

  • Rongkun Zheng
  • Lu Qi
  • Xi Chen
  • Yi Wang
  • Kun Wang
  • Yu Qiao
  • Hengshuang Zhao

Training on large-scale datasets can boost the performance of video instance segmentation (VIS), but annotated VIS datasets are hard to scale up due to the high labor cost. What we possess are numerous isolated field-specific datasets, so it is appealing to jointly train models across an aggregation of datasets to enhance data volume and diversity. However, due to heterogeneity in the category space, even as mask precision increases with data volume, simply using multiple datasets dilutes the models' attention across different taxonomies. Thus, increasing the data scale and enriching the taxonomy space while improving classification precision is important. In this work, our analysis shows that providing extra taxonomy information can help models concentrate on specific taxonomies, and we propose Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation (TMT-VIS) to address this vital challenge. Specifically, we design a two-stage taxonomy aggregation module that first compiles taxonomy information from input videos and then aggregates these taxonomy priors into instance queries before the transformer decoder. We conduct extensive experimental evaluations on four popular and challenging benchmarks, including YouTube-VIS 2019, YouTube-VIS 2021, OVIS, and UVO. Our model shows significant improvement over the baseline solutions and sets new state-of-the-art records on all these benchmarks. These appealing and encouraging results demonstrate the effectiveness and generality of our proposed approach. The code and trained models will be publicly available.

YNICL Journal 2022 Journal Article

A comparative assessment of myelin-sensitive measures in multiple sclerosis patients and healthy subjects

  • Reza Rahmanzadeh
  • Matthias Weigel
  • Po-Jui Lu
  • Lester Melie-Garcia
  • Thanh D. Nguyen
  • Alessandro Cagol
  • Francesco La Rosa
  • Muhamed Barakovic

INTRODUCTION: Multiple Sclerosis (MS) is a common neurological disease primarily characterized by myelin damage in lesions and in normal-appearing white and gray matter (NAWM, NAGM). Several quantitative MRI (qMRI) methods are sensitive to myelin characteristics by measuring specific tissue biophysical properties. However, there are currently few studies assessing the relative reproducibility and sensitivity of qMRI measures to MS pathology in vivo in patients. METHODS: We performed two studies. The first study assessed the sensitivity of qMRI measures to MS pathology: we recruited 150 MS patients and 100 healthy subjects, who underwent brain MRI at 3 T including quantitative T1 mapping (qT1), quantitative susceptibility mapping (QSM), magnetization transfer saturation imaging (MTsat), and myelin water imaging for myelin water fraction (MWF). The sensitivity of qMRI measures to MS focal pathology (MS lesions vs peri-plaque white/gray matter (PPWM/PPGM)) was studied lesion-wise; the sensitivity to diffuse normal-appearing (NA) pathology was measured using voxel-wise threshold-free cluster enhancement (TFCE) in NAWM and vertex-wise inflated cortex analysis in NAGM. Furthermore, the sensitivity of qMRI to the identification of lesion tissue was investigated using a voxel-wise logistic regression analysis to distinguish MS lesion and PP voxels. The second study assessed the reproducibility of myelin-sensitive qMRI measures in a single scanner. To evaluate the intra-session and inter-session reproducibility of qMRI measures, we investigated 10 healthy subjects, who underwent two 3 T brain MRIs on the same day (without repositioning) and one after a 1-week interval. Five regions of interest (ROIs) in white and deep gray matter areas were segmented, and inter- and intra-session reproducibility was studied using the intra-class correlation coefficient (ICC). Further, we also investigated the voxel-wise reproducibility of qMRI measures in NAWM and NAGM.
RESULTS: qT1 and QSM showed the highest sensitivity to distinguish MS focal WM and cortical pathology from peri-plaque WM (P < 0.0001), although QSM also showed the highest variance when applied to lesions. MWF and MTsat exhibited the highest sensitivity to NAWM pathology (P < 0.01). On the other hand, qT1 appeared to be the most sensitive measure to NAGM pathology (P < 0.01). All myelin-sensitive qMRI measures exhibited high inter- and intra-session ICCs in various WM and deep GM ROIs, in NAWM, and in NAGM (ICC 0.82 ± 0.12). CONCLUSION: This work shows that the applied qT1, MWF, MTsat, and QSM are highly reproducible and exhibit differential sensitivity to focal and diffuse WM and GM pathology in MS patients.
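The reproducibility analysis above relies on the intra-class correlation coefficient. The abstract does not state which ICC form was used, so the sketch below assumes the one-way random-effects, single-measurement form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of repeated sessions per subject. The function name and the data layout (one row per subject, one column per session) are illustrative assumptions.

```python
def icc_oneway(ratings):
    """One-way random-effects, single-measure ICC(1,1).

    ratings: list of rows, one per subject, each holding k repeated measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of sessions")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values near 1 mean that differences between subjects dominate session-to-session variation, which is the sense in which an ICC of 0.82 ± 0.12 indicates high reproducibility.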

ICRA Conference 2022 Conference Paper

Audio-Visual Grounding Referring Expression for Robotic Manipulation

  • Yefei Wang
  • Kaili Wang
  • Yi Wang
  • Di Guo 0002
  • Huaping Liu 0001
  • Fuchun Sun 0001

Referring expressions are commonly used to refer to a specific target in people's daily dialogue. In this paper, we propose a novel task of audio-visual grounding of referring expressions for robotic manipulation. The robot leverages both audio and visual information to understand the referring expression in a given manipulation instruction and then performs the corresponding manipulation. To solve the proposed task, an audio-visual framework is proposed for visual localization and sound recognition. We have also established a dataset containing visual data, auditory data, and manipulation instructions for evaluation. Finally, extensive offline and online experiments verify the effectiveness of the proposed audio-visual framework and demonstrate that the robot performs better with audio-visual data than with visual data alone.

IJCAI Conference 2022 Conference Paper

Diversity Features Enhanced Prototypical Network for Few-shot Intent Detection

  • Fengyi Yang
  • Xi Zhou
  • Yi Wang
  • Abibulla Atawulla
  • Ran Bi

Few-shot Intent Detection (FSID) is a challenging task in dialogue systems due to the scarcity of annotated utterances. Although existing few-shot learning approaches have made remarkable progress, they fall short in adapting to the Generalized Few-shot Intent Detection (GFSID) task, where both seen and unseen classes are present. A core problem in both settings is that the limited training samples fail to cover the diversity of user expressions. In this paper, we propose an effective Diversity Features Enhanced Prototypical Network (DFEPN) that enhances diversity features for novel intents by fully exploiting the diversity of known intent samples. Specifically, DFEPN generates diversity features of samples in the hidden space via a diversity feature generator module and then fuses these features with the original support vectors to obtain a more suitable prototype vector for each class. To evaluate the effectiveness of our model on both the FSID and GFSID tasks, we carry out extensive experiments on two benchmark intent detection datasets. Results demonstrate that our proposed model outperforms existing state-of-the-art methods and maintains stable performance on both tasks.
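DFEPN builds on the prototypical-network idea of representing each class by a single prototype vector. A minimal sketch of that baseline follows, assuming that a prototype is the mean of its class's support embeddings and that a query is assigned to the nearest prototype by squared Euclidean distance. DFEPN's diversity feature generator and its fusion step are not modeled, and the function names are hypothetical.

```python
def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype vector."""
    sums = {}
    counts = {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to the query (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))
```

In this baseline, a class with only a few support utterances gets a prototype that reflects only those few expressions; the diversity features described in the abstract are meant to enrich exactly this averaging step.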

ICRA Conference 2022 Conference Paper

Learning Friction Model for Magnet-Actuated Tethered Capsule Robot

  • Yi Wang
  • Yuyang Tu
  • Yuchen He 0004
  • Xutian Deng
  • Ziwei Lei
  • Jianwei Zhang 0001
  • Miao Li 0002

Potential diagnostic applications of magnet-actuated capsules have increased greatly in recent years. Most of these applications demand accurate position control of the capsule. However, the friction between the robot and the environment, as well as the drag force from the tether, plays a significant role in the motion control of the capsule. Moreover, these forces, especially friction, are typically hard to model beforehand. In this paper, we first designed a magnet-actuated tethered capsule robot in which the driving magnet is mounted on the end of a robotic arm. Then, we proposed a learning-based approach to model the friction force between the capsule and the environment, with the goal of increasing the control accuracy of the whole system. Finally, several real-robot experiments demonstrate the effectiveness of our proposed approach.

ICLR Conference 2022 Conference Paper

Nonlinear ICA Using Volume-Preserving Transformations

  • Xiaojiang Yang
  • Yi Wang
  • Jiacheng Sun
  • Xing Zhang
  • Shifeng Zhang
  • Zhenguo Li
  • Junchi Yan

Nonlinear ICA is a fundamental problem in machine learning, aiming to identify the underlying independent components (sources) from data that is assumed to be a nonlinear function (mixing function) of these sources. Recent works prove that if the sources have some particular structure (e.g., temporal structure), they are theoretically identifiable even if the mixing function is arbitrary. However, in many cases such restrictions on the sources are difficult to satisfy or even verify, which limits the applicability of these methods. Different from these works, we propose a general framework for nonlinear ICA in which the mixing function is assumed to be a volume-preserving transformation, while the conditions on the sources can be much looser. We provide an insightful proof of the identifiability of the proposed framework. We implement the framework with volume-preserving flow-based models and verify our theory by experiments on artificial data and synthesized images. Moreover, results on real-world images indicate that our framework can disentangle interpretable features.
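One standard way to build a volume-preserving flow is an additive coupling layer in the style of NICE. It is offered here only as an illustrative assumption, since the abstract does not name the layers used. Half of the input passes through unchanged and the other half is shifted by a function of the first half, so the Jacobian is block lower-triangular with identity blocks on the diagonal and its determinant is exactly 1 for any shift function. The `shift_fn` argument is a hypothetical stand-in for a neural network, and the finite-difference check is included only to make the unit determinant visible for a 2-D input.

```python
def additive_coupling(x, shift_fn):
    """Forward pass: y1 = x1, y2 = x2 + shift_fn(x1). x is a list of even length."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    shift = shift_fn(x1)
    return x1 + [a + b for a, b in zip(x2, shift)]


def additive_coupling_inverse(y, shift_fn):
    """Exact inverse: x1 = y1, x2 = y2 - shift_fn(y1)."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    shift = shift_fn(y1)
    return y1 + [a - b for a, b in zip(y2, shift)]


def jacobian_det_2d(shift_fn, x, eps=1e-6):
    """Central-difference Jacobian determinant of the coupling map for a 2-D input."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus = list(x)
        minus = list(x)
        plus[j] += eps
        minus[j] -= eps
        f_plus = additive_coupling(plus, shift_fn)
        f_minus = additive_coupling(minus, shift_fn)
        for i in range(2):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
```

Because the inverse only subtracts the same shift, the layer is invertible without inverting `shift_fn` itself, and stacking such layers (with the halves swapped between layers) keeps the overall transformation volume-preserving.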

YNICL Journal 2022 Journal Article

QSMRim-Net: Imbalance-aware learning for identification of chronic active multiple sclerosis lesions on quantitative susceptibility maps

  • Hang Zhang
  • Thanh D. Nguyen
  • Jinwei Zhang
  • Melanie Marcille
  • Pascal Spincemaille
  • Yi Wang
  • Susan A. Gauthier
  • Elizabeth M. Sweeney

BACKGROUND AND PURPOSE: Chronic active multiple sclerosis (MS) lesions are characterized by a paramagnetic rim at the edge of the lesion and are associated with increased disability in patients. Quantitative susceptibility mapping (QSM) is an MRI technique that is sensitive to chronic active lesions, termed rim+ lesions on QSM. We present QSMRim-Net, a data imbalance-aware deep neural network that fuses lesion-level radiomic and convolutional image features for automated identification of rim+ lesions on QSM. METHODS: QSM and T2-weighted Fluid-Attenuated Inversion Recovery (T2-FLAIR) MRI of the brain were collected at 3 T for 172 MS patients. Rim+ lesions were manually annotated by two human experts, followed by consensus from a third expert, for a total of 177 rim+ and 3986 rim-negative (rim-) lesions. Our automated rim+ detection algorithm, QSMRim-Net, consists of a two-branch feature extraction network and a synthetic minority oversampling network to classify rim+ lesions. The first network branch extracts image features from the QSM and T2-FLAIR, and the second network branch is a fully connected network for QSM lesion-level radiomic feature extraction. The oversampling network is designed to increase classification performance with imbalanced data. RESULTS: At the lesion level, in a five-fold cross-validation framework, the proposed QSMRim-Net detected rim+ lesions with a partial area under the receiver operating characteristic curve (pROC AUC) of 0.760, where clinically relevant false positive rates of less than 0.1 were considered. The method attained an area under the precision-recall curve (PR AUC) of 0.704. QSMRim-Net outperformed other state-of-the-art methods applied to QSM on both pROC AUC and PR AUC. At the subject level, comparing the predicted rim+ lesion count with the count annotated by human experts, QSMRim-Net achieved the lowest mean square error of 0.98 and the highest correlation of 0.89 (95% CI: 0.86, 0.92).
CONCLUSION: This study develops a novel automated deep neural network for rim+ MS lesion identification using T2-FLAIR and QSM images.

AAAI Conference 2021 Conference Paper

Adversarial Defence by Diversified Simultaneous Training of Deep Ensembles

  • Bo Huang
  • Zhiwei Ke
  • Yi Wang
  • Wei Wang
  • Linlin Shen
  • Feng Liu

Learning-based classifiers are susceptible to adversarial examples. Existing defence methods are mostly devised for individual classifiers. Recent studies showed that it is viable to increase adversarial robustness by promoting diversity over an ensemble of models. In this paper, we propose an adversarial defence that encourages ensemble diversity in learned high-level feature representations and gradient dispersion during simultaneous training of deep ensemble networks. We perform extensive evaluations under white-box and black-box attacks, including transferred examples and adaptive attacks. Our approach achieves a significant gain of up to 52% in adversarial robustness compared with the baseline and the state-of-the-art method on image benchmarks with complex data scenes. The proposed approach complements the defence paradigm of adversarial training and can further boost its performance. The source code is available at https://github.com/ALIS-Lab/AAAI2021-PDD.

YNICL Journal 2021 Journal Article

ALL-Net: Anatomical information lesion-wise loss function integrated into neural network for multiple sclerosis lesion segmentation

  • Hang Zhang
  • Jinwei Zhang
  • Chao Li
  • Elizabeth M. Sweeney
  • Pascal Spincemaille
  • Thanh D. Nguyen
  • Susan A. Gauthier
  • Yi Wang

Accurate detection and segmentation of multiple sclerosis (MS) brain lesions on magnetic resonance images are important for disease diagnosis and treatment. This is a challenging task because lesions vary greatly in size, shape, location, and image contrast. The objective of our study was to develop an algorithm based on a deep convolutional neural network integrated with anatomical information and a lesion-wise loss function (ALL-Net) for fast and accurate automated segmentation of MS lesions. Distance transformation mapping was used to construct a convolutional module that encodes lesion-specific anatomical information. To overcome lesion size imbalance during network training and improve the detection of small lesions, we developed a lesion-wise loss function in which individual lesions are modeled as spheres of equal size. On the ISBI-2015 longitudinal MS lesion segmentation challenge dataset (19 subjects in total), ALL-Net achieved an overall score of 93.32 and was among the top-performing methods. On the larger Cornell MS dataset (176 subjects in total), ALL-Net significantly improved both voxel-wise metrics (Dice improvement of 3.9% to 35.3%, with p-values ranging from p < 0.01 to p < 0.0001, and improvement in the AUC of the voxel-wise precision-recall curve of 2.1% to 29.8%) and lesion-wise metrics (lesion-wise F1 score improvement of 12.6% to 29.8%, all with p < 0.0001, and improvement in the AUC of the lesion-wise ROC curve of 1.4% to 20.0%) compared to leading publicly available MS lesion segmentation tools.

AAAI Conference 2021 Conference Paper

Efficient Folded Attention for Medical Image Reconstruction and Segmentation

  • Hang Zhang
  • Jinwei Zhang
  • Rongguang Wang
  • Qihao Zhang
  • Pascal Spincemaille
  • Thanh D. Nguyen
  • Yi Wang

Recently, deep neural networks for 3D medical image reconstruction (MIR) and segmentation (MIS) have produced promising results, and attention mechanisms have been designed to further enhance performance. However, the large size of 3D volume images poses a great computational challenge to traditional attention methods. In this paper, we propose a folded attention (FA) approach to improve the computational efficiency of traditional attention methods on 3D medical images. The main idea is to apply tensor folding and unfolding operations to construct four small sub-affinity matrices that approximate the original affinity matrix. Through four consecutive sub-attention modules of FA, each element in the feature tensor can aggregate spatial-channel information from all other elements. Compared to traditional attention methods, FA moderately improves accuracy while substantially reducing computational complexity and GPU memory consumption. We demonstrate the superiority of our method on two challenging 3D MIR and MIS tasks: quantitative susceptibility mapping and multiple sclerosis lesion segmentation.

YNIMG Journal 2021 Journal Article

Estimation of Multiple Sclerosis lesion age on magnetic resonance imaging

  • Elizabeth M. Sweeney
  • Thanh D. Nguyen
  • Amy Kuceyeski
  • Sarah M. Ryan
  • Shun Zhang
  • Lily Zexter
  • Yi Wang
  • Susan A. Gauthier

We introduce the first-ever statistical framework for estimating the age of Multiple Sclerosis (MS) lesions from magnetic resonance imaging (MRI). Estimating lesion age is an important step when studying the longitudinal behavior of MS lesions and can be used in applications such as studying the temporal dynamics of chronic active MS lesions. Our lesion age estimation models use first-order radiomic features over a lesion, derived from conventional T1-weighted (T1w), T2-weighted (T2w), fluid-attenuated inversion recovery (FLAIR), T1w with gadolinium contrast (T1w+c), and Quantitative Susceptibility Mapping (QSM) MRI sequences, as well as demographic information. For this analysis, we have a total of 32 patients with 53 new lesions observed at 244 time points. A one- or two-step random forest model for lesion age is fit on a training set using a lesion volume cutoff of 15 mm³ or 50 mm³. We explore the performance of nine different modeling scenarios that include various combinations of the MRI sequences, demographic information, and one- or two-step random forest models, as well as simpler models that use only the mean radiomic feature from each MRI sequence. The best-performing model on the validation set is a two-step random forest model on the radiomic features from all of the MRI sequences with demographic information, using a lesion volume cutoff of 50 mm³. This model has a mean absolute error of 7.23 months (95% CI: [6.98, 13.43]) and a median absolute error of 5.98 months (95% CI: [5.26, 13.25]) in the validation set. For this model, the predicted age and actual age have a statistically significant association (p < 0.001) in the validation set.

YNICL Journal 2021 Journal Article

GAMER MRI: Gated-attention mechanism ranking of multi-contrast MRI in brain pathology

  • Po-Jui Lu
  • Youngjin Yoo
  • Reza Rahmanzadeh
  • Riccardo Galbusera
  • Matthias Weigel
  • Pascal Ceccaldi
  • Thanh D. Nguyen
  • Pascal Spincemaille

INTRODUCTION: During the last decade, a multitude of novel quantitative and semiquantitative MRI techniques have provided new information about the pathophysiology of neurological diseases. Yet, selection of the most relevant contrasts for a given pathology remains challenging. In this work, we developed and validated a method, Gated-Attention MEchanism Ranking of multi-contrast MRI in brain pathology (GAMER MRI), to rank the relative importance of MR measures in the classification of well understood ischemic stroke lesions. Subsequently, we applied this method to the classification of multiple sclerosis (MS) lesions, where the relative importance of MR measures is less understood. METHODS: GAMER MRI was developed based on the gated attention mechanism, which computes attention weights (AWs) as proxies of importance of hidden features in the classification. In the first two experiments, we used Trace-weighted (Trace), apparent diffusion coefficient (ADC), Fluid-Attenuated Inversion Recovery (FLAIR), and T1-weighted (T1w) images acquired in 904 acute/subacute ischemic stroke patients and in 6,230 healthy controls and patients with other brain pathologies to assess if GAMER MRI could produce clinically meaningful importance orders in two different classification scenarios. In the first experiment, GAMER MRI with a pretrained convolutional neural network (CNN) was used in conjunction with Trace, ADC, and FLAIR to distinguish patients with ischemic stroke from those with other pathologies and healthy controls. In the second experiment, GAMER MRI with a patch-based CNN used Trace, ADC and T1w to differentiate acute ischemic stroke lesions from healthy tissue. The last experiment explored the performance of patch-based CNN with GAMER MRI in ranking the importance of quantitative MRI measures to distinguish two groups of lesions with different pathological characteristics and unknown quantitative MR features. 
Specifically, GAMER MRI was applied to assess the relative importance of the myelin water fraction (MWF), quantitative susceptibility mapping (QSM), T1 relaxometry map (qT1), and neurite density index (NDI) in distinguishing 750 juxtacortical lesions from 242 periventricular lesions in 47 MS patients. Pairwise permutation t-tests were used to evaluate the differences between the AWs obtained for each quantitative measure. RESULTS: In the first experiment, we achieved a mean test AUC of 0.881, and the AW of FLAIR and the sum of the AWs of Trace and ADC were 0.11 and 0.89, respectively, as expected based on previous knowledge. In the second experiment, we achieved a mean test F1 score of 0.895 and mean AWs of 0.49 for Trace, 0.28 for ADC, and 0.23 for T1w, confirming the findings of the first experiment. In the third experiment, MS lesion classification achieved a test balanced accuracy of 0.777, sensitivity of 0.739, and specificity of 0.814. The mean AWs of qT1, MWF, NDI, and QSM were 0.29, 0.26, 0.24, and 0.22, respectively (p < 0.001). CONCLUSIONS: This work demonstrates that the proposed GAMER MRI might be a useful method to assess the relative importance of MRI measures in neurological diseases with focal pathology. Moreover, the obtained AWs may help to choose the best combination of MR contrasts for a specific classification problem.

YNICL Journal 2021 Journal Article

Increased risk for cerebral small vessel disease is associated with quantitative susceptibility mapping in HIV infected and uninfected individuals

  • Kyle D. Murray
  • Md Nasir Uddin
  • Madalina E. Tivarus
  • Bogachan Sahin
  • Henry Z. Wang
  • Meera V. Singh
  • Xing Qiu
  • Lu Wang

The aim of this study was to assess, in the context of cerebral small vessel disease (CSVD), whether cardiovascular risk factors and white matter hyperintensities (WMHs) are associated with brain tissue susceptibility as measured by quantitative susceptibility mapping (QSM). Because CSVD is diagnosed by the presence of lacunar strokes, periventricular and deep WMHs, enlarged perivascular spaces, and microbleeds, we expected that QSM could capture changes in brain tissue due to underlying CSVD pathology. We compared a cohort of 101 HIV-infected individuals (mean age ± SD = 53.2 ± 10.9 years) with mild to moderate cardiovascular risk scores, as measured by the Reynolds risk score, to 102 age-matched controls (mean age ± SD = 50.3 ± 15.7 years) with similar Reynolds scores. We performed brain MRI to assess CSVD burden by acquiring 3D T1-MPRAGE, 3D FLAIR, 2D T2-TSE, and mGRE for QSM. We found that signs of CSVD are significantly more prevalent in HIV-infected individuals than in controls and that WMH volumes are significantly correlated with age and cardiovascular risk scores. Regional QSM was associated with cardiovascular risk factors, age, sex, and WMH volumes, but not with HIV status. These results suggest that QSM may be an early imaging marker reflecting alterations in brain microcirculation.

YNIMG Journal 2020 Journal Article

Fidelity imposed network edit (FINE) for solving ill-posed image reconstruction

  • Jinwei Zhang
  • Zhe Liu
  • Shun Zhang
  • Hang Zhang
  • Pascal Spincemaille
  • Thanh D. Nguyen
  • Mert R. Sabuncu
  • Yi Wang

Deep learning (DL) is increasingly used to solve ill-posed inverse problems in medical imaging, such as reconstruction from noisy and/or incomplete data, because DL offers advantages over conventional methods that rely on explicit image features and hand-engineered priors. However, supervised DL-based methods may perform poorly when the test data deviate from the training data, for example, when they contain pathologies not encountered in the training data. Furthermore, DL-based image reconstructions do not always incorporate the underlying forward physical model, which could improve performance. Therefore, in this work we introduce a novel approach, called fidelity imposed network edit (FINE), which modifies the weights of a pre-trained reconstruction network for each case in the test dataset. This is achieved by minimizing an unsupervised fidelity loss function based on the forward physical model. FINE is applied to two important inverse problems in neuroimaging: quantitative susceptibility mapping (QSM) and under-sampled image reconstruction in MRI. Our experiments demonstrate that FINE can improve reconstruction accuracy.

AAAI Conference 2020 Conference Paper

GraphER: Token-Centric Entity Resolution with Graph Convolutional Neural Networks

  • Bing Li
  • Wei Wang
  • Yifang Sun
  • Linhan Zhang
  • Muhammad Asif Ali
  • Yi Wang

Entity resolution (ER) aims to identify entity records that refer to the same real-world entity, which is a critical problem in data cleaning and integration. Most existing models are attribute-centric, that is, they match entity pairs by comparing the similarities of pre-aligned attributes; they therefore require the record schemas to be identical and are too coarse-grained to capture subtle key information within a single attribute. In this paper, we propose GraphER, a novel graph-based ER model. Our model is token-centric: the final matching results are generated by directly aggregating token-level comparison features, in which both semantic and structural information have been softly embedded into the token embeddings by training an Entity Record Graph Convolutional Network (ER-GCN). To the best of our knowledge, our work is the first to perform token-centric entity resolution using GCNs. Extensive experiments on two real-world datasets demonstrate that our model consistently outperforms state-of-the-art models.

AAAI Conference 2020 Conference Paper

Group-Wise Dynamic Dropout Based on Latent Semantic Variations

  • Zhiwei Ke
  • Zhiwei Wen
  • Weicheng Xie
  • Yi Wang
  • Linlin Shen

Dropout regularization has been widely used in various deep neural networks to combat overfitting. It works by training a network to be more robust on information-degraded data points for better generalization. Conventional dropout and its variants are often applied to individual hidden units in a layer to break up co-adaptations of feature detectors. In this paper, we propose an adaptive dropout that reduces co-adaptations in a group-wise manner, guided by coarse semantic information, to improve feature discriminability. In particular, we show that adjusting the dropout probability based on local feature densities not only improves classification performance significantly but also, in some cases, enhances network robustness against adversarial examples. The proposed approach was evaluated against the baseline and several state-of-the-art adaptive dropout methods on four public datasets: Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.

YNICL Journal 2020 Journal Article

Hippocampal plasticity underpins long-term cognitive gains from resistance exercise in MCI

  • Kathryn M. Broadhouse
  • Maria Fiatarone Singh
  • Chao Suo
  • Nicola Gates
  • Wei Wen
  • Henry Brodaty
  • Nidhi Jain
  • Guy C. Wilson

Dementia affects 47 million individuals worldwide and, assuming the status quo, is projected to rise to 150 million by 2050. Evidence continues to accumulate for preventing age-related cognitive impairment in older persons through lifestyle interventions, but whether such interventions can combat underlying neurodegeneration is unknown. The Study of Mental Activity and Resistance Training (SMART) trial has previously reported within-training findings; the aim of this study was to investigate the long-term neurostructural and cognitive impact of resistance exercise in Mild Cognitive Impairment (MCI). For the first time, we show that hippocampal subareas particularly susceptible to volume loss in Alzheimer's disease (AD) are protected by resistance exercise for up to one year after training. One hundred MCI participants were randomised to one of four training groups: (1) combined high-intensity progressive resistance and computerised cognitive training (PRT+CCT), (2) PRT+Sham CCT, (3) CCT+Sham PRT, and (4) sham physical + sham cognitive training (SHAM+SHAM). Physical, neuropsychological, and MRI assessments were carried out at baseline, at 6 months (directly after training), and at 18 months from baseline (12 months after intervention cessation). Here we report neurostructural and functional changes over the 18-month trial period and their association with global cognitive and executive function measures. PRT, but not CCT or PRT+CCT, led to global long-term cognitive improvements above the SHAM intervention at the 18-month follow-up. Furthermore, hippocampal subfields susceptible to atrophy in AD were protected by PRT, with an elimination of long-term atrophy in the left subiculum and an attenuation of atrophy in the left CA1 and dentate gyrus compared to SHAM+SHAM (p = 0.023, p = 0.020, and p = 0.027, respectively). These neuroprotective effects mediated a significant portion of the long-term cognitive benefits.
By contrast, within-training posterior cingulate plasticity decayed after training cessation and was unrelated to long-term cognitive benefits. Neither general physical activity levels nor fitness changes over the 18-month period mediated the hippocampal trajectory, demonstrating that enduring hippocampal subfield plasticity is not a simple reflection of post-training changes in fitness or physical activity participation. Notably, resting-state fMRI analysis revealed that both the hippocampus and the posterior cingulate participate in a functional network that continued to be upregulated after intervention cessation. Multiple structural mechanisms, developing along different time courses but functionally linked, may contribute to the long-term global cognitive benefit of resistance exercise. Six months of high-intensity resistance exercise can thus not only promote better cognition in those with MCI but also protect AD-vulnerable hippocampal subfields from degeneration for at least 12 months post-intervention. These findings emphasise the therapeutic potential of resistance exercise; however, future work will need to establish how long-lived these outcomes are and whether they are sufficient to delay dementia.

IROS Conference 2020 Conference Paper

On a videoing control system based on object detection and tracking

  • Yanhao Ren
  • Yi Wang
  • Qi Tang
  • Haijun Jiang
  • Wenlian Lu

In this paper, we propose a camera control system for videoing preassigned objects. Based on real-time visual detection and tracking using the Kalman filter and re-identification (ReID), we propose continuous shot composition based on atomic shot rules, plan the camera trajectory, and derive a PID controller for the pan-tilt unit. Through both simulation and emulation via frame-wise cropping of video clips, we illustrate the efficiency of this method. Based on this model, we design and build an AI-driven automatic camera for live photography and video clip recording.

NeurIPS Conference 2020 Conference Paper

RANet: Region Attention Network for Semantic Segmentation

  • Dingguo Shen
  • Yuanfeng Ji
  • Ping Li
  • Yi Wang
  • Di Lin

Recent semantic segmentation methods model the relationships between pixels to construct contextual representations. In this paper, we introduce the Region Attention Network (RANet), a novel attention network for modeling the relationships between object regions. RANet divides the image into object regions, from which representative information is selected. In contrast to previous methods, RANet configures information pathways between pixels in different regions, enabling regions to exchange regional context and thereby enhance all pixels in the image. We jointly train the construction of object regions, the selection of representative regional content, the configuration of information pathways, and the context exchange between pixels to improve segmentation accuracy. We extensively evaluate our method on challenging segmentation benchmarks, demonstrating that RANet helps achieve state-of-the-art results.

YNIMG Journal 2019 Journal Article

3D texture analyses within the substantia nigra of Parkinson's disease patients on quantitative susceptibility maps and R2∗ maps

  • Gaiying Li
  • Guoqiang Zhai
  • Xinxin Zhao
  • Hedi An
  • Pascal Spincemaille
  • Kelly M. Gillen
  • Yixuan Ku
  • Yi Wang

Iron accumulation in the substantia nigra (SN) is spatially heterogeneous, yet no study has quantitatively evaluated how the texture of quantitative susceptibility maps (QSM) and R2∗ maps differs between patients with Parkinson's disease (PD) and healthy controls (HC). The aim of this study was to discriminate between patients with PD and HC using texture analysis in the SN on QSM and R2∗ maps. QSM and R2∗ maps were obtained from 28 PD patients and 28 HC on a clinical 3T MR scanner using a 3D multi-echo gradient-echo sequence. First- and second-order texture features of the QSM and R2∗ images were extracted, and group differences were evaluated using two-tailed t-tests. After correction for multiple comparisons, in the first-order analysis, the susceptibility of the SN in patients with PD was significantly greater (p = 0.017) than in HC. In the second-order texture analysis, angular second moment, entropy, and sum of entropy showed significant differences in QSM (p < 0.001) and R2∗ maps (p < 0.01). In addition, correlation, contrast, sum of variance, and difference of variance significantly separated the subject groups in QSM maps (p < 0.05) but not in R2∗ images. Receiver operating characteristic analysis showed that entropy and sum of entropy of the QSM maps in the SN yielded the highest performance for differentiating PD patients from HC (area under the curve = 0.89). In conclusion, most first- and second-order QSM texture features successfully distinguished PD patients from HC and significantly outperformed R2∗ texture analysis. The second-order texture features were more accurate and sensitive than first-order texture features for classifying PD patients.

IJCAI Conference 2019 Conference Paper

Model-Agnostic Adversarial Detection by Random Perturbations

  • Bo Huang
  • Yi Wang
  • Wei Wang

Adversarial examples deliberately induce model classification errors, which has raised concerns about the security of machine learning techniques. Many existing countermeasures are compromised by adaptive adversaries and transferred examples. We propose a model-agnostic approach to this problem that analyses the model's responses to an input under random perturbations, and we study the robustness of detecting norm-bounded adversarial distortions in a theoretical framework. Extensive evaluations are performed on the MNIST, CIFAR-10, and ImageNet datasets. The results demonstrate that our detection method is effective and resilient against various attacks, including black-box attacks and the powerful CW attack with four adversarial adaptations.

JBHI Journal 2018 Journal Article

Automatic Fetal Head Circumference Measurement in Ultrasound Using Random Forest and Fast Ellipse Fitting

  • Jing Li
  • Yi Wang
  • Baiying Lei
  • Jie-Zhi Cheng
  • Jing Qin
  • Tianfu Wang
  • Shengli Li
  • Dong Ni

Head circumference (HC) is one of the most important biometrics for assessing fetal growth during prenatal ultrasound examinations. However, manual measurement of this biometric often requires substantial experience. We developed a learning-based framework that uses prior knowledge and a fast ellipse fitting method (ElliFit) to measure HC automatically. We first integrated prior knowledge about gestational age and ultrasound scanning depth into a random forest classifier to localize the fetal head. We then used phase symmetry to detect the center line of the fetal skull and employed ElliFit to fit the HC ellipse for measurement. Experimental results on 145 HC images showed that our method had an average measurement error of 1.7 mm and outperformed traditional methods, demonstrating that it shows great promise for clinical practice.

EAAI Journal 2018 Journal Article

Cope with diverse data structures in multi-fidelity modeling: A Gaussian process method

  • Haitao Liu
  • Yew-Soon Ong
  • Jianfei Cai
  • Yi Wang

Multi-fidelity modeling (MFM) frameworks, especially Bayesian MFM, have gained popularity in simulation-based modeling, uncertainty quantification, and optimization because of their potential to reduce the computational budget. From a multi-output modeling perspective, MFM approximates the high- and low-fidelity outputs simultaneously by considering the output correlations; in particular, it transfers knowledge from the inexpensive low-fidelity outputs, which have many training points, to enhance the modeling of the expensive high-fidelity output, which has few training points. This article presents a novel multi-fidelity Gaussian process for modeling with diverse data structures. Diverse data structures here mainly refer to the diversity of high-fidelity sample distributions, i.e., the high-fidelity points may randomly fill the domain or, more challengingly, cluster in some subregions. The proposed multi-fidelity model is composed of a global trend term and a local residual term. In particular, the flexible residual term extracts both shared and output-specific residual information via a data-driven weight parameter. Numerical experiments on two synthetic examples, an aircraft example, and a stochastic incompressible flow example reveal that this promising Bayesian MFM approach can effectively extract low-fidelity information to facilitate the modeling of the high-fidelity output under diverse data structures.

YNICL Journal 2018 Journal Article

Diagnostic accuracy of semiautomatic lesion detection plus quantitative susceptibility mapping in the identification of new and enhancing multiple sclerosis lesions

  • Shun Zhang
  • Thanh D. Nguyen
  • Yize Zhao
  • Susan A. Gauthier
  • Yi Wang
  • Ajay Gupta

Purpose: To evaluate the diagnostic accuracy of a novel non-contrast brain MRI method based on semiautomatic lesion detection, using a T2w FLAIR subtraction image with the statistical detection of change (SDC) algorithm (T2w + SDC), and quantitative susceptibility mapping (QSM). This method identifies new lesions and discriminates between enhancing and nonenhancing lesions in multiple sclerosis (MS). Methods: Thirty-three MS patients who had MRIs at two different time points, with at least one new Gd-enhancing lesion on the second MRI, were included in the study. As the reference standard, new lesions were identified by two neuroradiologists on T2w and post-Gd T1w images with the help of T2w + SDC. The diagnostic accuracy of the proposed method based on QSM and T2w + SDC lesion detection (T2w + SDC + QSM) for assessing lesion enhancement status was determined. Receiver operating characteristic (ROC) analysis was performed to compute the optimal lesion susceptibility cutoff value. Results: The optimal QSM cutoff was 15.4 ppb, with a sensitivity of 77.9% and a specificity of 94.0% (0.93; 95% CI, 0.89-0.97). Conclusion: The proposed T2w + SDC + QSM method is highly accurate for identifying new MS lesions and predicting their enhancement status without the use of Gd injection.

IJCAI Conference 2018 Conference Paper

Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams

  • Yi Wang
  • Nan Xue
  • Xin Fan
  • Jiebo Luo
  • Risheng Liu
  • Bin Chen
  • Haojie Li
  • Zhongxuan Luo

Data stream analysis aims to extract discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while updating the model efficiently and stably, especially for chunk data. This paper proposes a fast factorization-free kernel learning method that unifies novelty detection and incremental learning for unlabeled chunk data streams in one framework. The proposed method constructs a joint reproducing kernel Hilbert space from known class centers by solving a linear system in kernel space. Unlabeled data can thus be detected and classified among multiple classes by a single decision model. Projecting samples into the discriminative feature space reduces to the product of two small kernel matrices, without time-consuming factorizations such as QR decomposition or singular value decomposition. Moreover, inserting a novel class can be treated as adding a new orthogonal basis to the existing feature space, resulting in fast and stable updating schemes. Both theoretical analysis and experimental validation on real-world datasets demonstrate that the proposed methods learn chunk data streams at significantly lower computational cost than the state of the art, with comparable or superior accuracy.

NeurIPS Conference 2018 Conference Paper

Image Inpainting via Generative Multi-column Convolutional Neural Networks

  • Yi Wang
  • Xin Tao
  • Xiaojuan Qi
  • Xiaoyong Shen
  • Jiaya Jia

In this paper, we propose a generative multi-column network for image inpainting. This network synthesizes different image components in parallel within one stage. To better characterize global structures, we design a confidence-driven reconstruction loss, while an implicit diversified MRF regularization is adopted to enhance local details. The multi-column network, combined with the reconstruction and MRF losses, propagates local and global information derived from context to the target inpainting regions. Extensive experiments on challenging street view, face, natural object, and scene datasets show that our method produces visually compelling results even without the post-processing common in previous methods.

TIST Journal 2018 Journal Article

RelationLines

  • Wei Chen
  • Jing Xia
  • Xumeng Wang
  • Yi Wang
  • Jun Chen
  • Liang Chang

The increased accessibility of urban sensor data and the popularity of social network applications are enabling the discovery of crowd mobility and personal communication patterns. However, studying the egocentric relationships of an individual can be very challenging because the available data may refer to direct contacts, such as phone calls between individuals, or indirect contacts, such as paired location presence. In this article, we develop methods to integrate three facets extracted from heterogeneous urban data (timelines, calls, and locations) through a progressive visual reasoning and inspection scheme. Our approach uses a detect-and-filter scheme: prior to visual refinement and analysis, a coarse detection is performed to extract the target individual and construct the target's timeline. It then detects spatio-temporal co-occurrences or call-based contacts to build the egocentric network of the individual. The filtering stage is enhanced with a line-based visual reasoning interface that facilitates flexible and comprehensive investigation of egocentric relationships and connections in terms of time, space, and social networks. The integrated system, RelationLines, is demonstrated using a dataset that contains taxi GPS data, cell-based mobility data, mobile calling data, microblog data, and point-of-interest (POI) data from a city with millions of citizens. We examine the effectiveness and efficiency of our system with three case studies and a user review.

KR Conference 2018 Conference Paper

Weight Learning in a Probabilistic Extension of Answer Set Programs

  • Joohyung Lee
  • Yi Wang

LPMLN is a probabilistic extension of answer set programs with a weight scheme derived from that of Markov Logic. Previous work has shown how inference in LPMLN can be achieved. In this paper, we present the concept of weight learning in LPMLN and learning algorithms for LPMLN derived from those for Markov Logic. We also present a prototype implementation that uses answer set solvers for learning, as well as some example domains that illustrate distinct features of LPMLN learning. Learning in LPMLN follows the stable model semantics, so it learns parameters for probabilistic extensions of knowledge-rich domains where answer set programming has proven useful but has been limited to the deterministic case, such as reachability analysis and reasoning about actions in dynamic domains. We also apply the method to learn the parameters for probabilistic abductive reasoning about actions.

AAAI Conference 2017 Conference Paper

Fast Online Incremental Learning on Mixture Streaming Data

  • Yi Wang
  • Xin Fan
  • Zhongxuan Luo
  • Tianzhu Wang
  • Maomao Min
  • Jiebo Luo

The explosion of streaming data poses challenges to feature learning methods, including linear discriminant analysis (LDA). Many existing LDA algorithms are not efficient enough to update incrementally with samples that arrive sequentially in various ways. First, we propose a new fast batch LDA (FLDA/QR) learning algorithm that uses the cluster centers to solve a lower triangular system that is optimized by Cholesky factorization. To take advantage of the intrinsically incremental mechanism of the matrix, we further develop an exact incremental algorithm (IFLDA/QR). The Gram-Schmidt process with reorthogonalization in IFLDA/QR significantly reduces space and time costs compared with the rank-one QR updating used by most existing methods. IFLDA/QR is able to handle streaming data containing 1) new labeled samples in the existing classes, 2) samples of an entirely new (novel) class, and, more significantly, 3) a chunk of examples mixing those in 1) and 2). Both theoretical analysis and numerical experiments demonstrate much lower space and time costs (2 to 10 times faster) than the state of the art, with comparable classification accuracy.
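The abstract credits part of IFLDA/QR's savings to Gram-Schmidt with reorthogonalization. The sketch below illustrates only that generic building block, not the IFLDA/QR algorithm itself.

It appends a new vector, such as a new class center, to an existing thin QR factorization. The projection step runs twice ("twice is enough") to counter loss of orthogonality in floating point. The function name and tolerance are assumptions.

```python
import numpy as np


def append_column(Q: np.ndarray, R: np.ndarray, v: np.ndarray, tol: float = 1e-12):
    """Extend a thin QR factorization A = Q @ R by one column v.

    Returns (Q_new, R_new) with [A, v] = Q_new @ R_new, using classical
    Gram-Schmidt with one reorthogonalization pass.
    """
    w = v.astype(float).copy()
    r = np.zeros(Q.shape[1])
    for _ in range(2):  # project out the existing directions twice
        c = Q.T @ w
        w -= Q @ c
        r += c
    rho = np.linalg.norm(w)
    if rho < tol:
        raise ValueError("v is (numerically) in the span of Q")
    k = Q.shape[1]
    Q_new = np.hstack([Q, (w / rho)[:, None]])
    R_new = np.zeros((k + 1, k + 1))
    R_new[:k, :k] = R
    R_new[:k, k] = r  # accumulated projection coefficients
    R_new[k, k] = rho
    return Q_new, R_new
```

Starting from `Q = np.zeros((n, 0))` and `R = np.zeros((0, 0))`, repeated calls build the factorization one column at a time.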

AAAI Conference 2017 Conference Paper

Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images

  • Xin Yang
  • Lequan Yu
  • Lingyun Wu
  • Yi Wang
  • Dong Ni
  • Jing Qin
  • Pheng-Ann Heng

Boundary incompleteness poses great challenges to automatic prostate segmentation in ultrasound images. A shape prior can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address these issues with a novel framework. The proposed framework seamlessly integrates feature extraction and shape prior exploration, and estimates the complete boundary in a sequential manner. Our framework is composed of three key modules. First, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with biologically plausible Recurrent Neural Networks (RNNs). This module is shown to be effective in dealing with boundary incompleteness. Second, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Third, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks where boundary incompleteness is one of the main challenges.

FLAP Journal 2017 Journal Article

Fuzzy Propositional Formulas under the Stable Model Semantics

  • Joohyung Lee
  • Yi Wang

We define a stable model semantics for fuzzy propositional formulas, which generalizes both fuzzy propositional logic and the stable model semantics of classical propositional formulas. The syntax of the language is the same as the syntax of fuzzy propositional logic, but its semantics distinguishes stable models from non-stable models. The generality of the language allows for highly configurable nonmonotonic reasoning for dynamic domains involving graded truth degrees. We show that several properties of Boolean stable models are naturally extended to this many-valued setting, and discuss how it is related to other approaches to combining fuzzy logic and the stable model semantics.

YNICL Journal 2017 Journal Article

Quantitative susceptibility mapping to evaluate the early stage of Alzheimer's disease

  • Hyug-Gi Kim
  • Soonchan Park
  • Hak Young Rhee
  • Kyung Mi Lee
  • Chang-Woo Ryu
  • Sun Jung Rhee
  • Soo Yeol Lee
  • Yi Wang

The objective of this study was to evaluate susceptibility changes caused by iron accumulation in cognitively normal (CN) elderly subjects, those with amnestic mild cognitive impairment (aMCI), and those with early-stage AD, and to compare the findings with gray matter volume (GMV) changes caused by neuronal loss. The participants included 19 elderly CN, 19 aMCI, and 19 AD subjects. The voxel-based quantitative susceptibility map (QSM) and GMV of the brain were calculated and compared among the three groups. The differences in QSM data and GMVs among the three groups were investigated by voxel-based and region-of-interest (ROI)-based comparisons using a one-way analysis of covariance (ANCOVA) test with gender and age as covariates. Finally, a receiver-operating-characteristic (ROC) curve analysis was performed. The voxel-based results showed that QSM demonstrated more areas with significant differences between the CN and AD groups than GMV did. GMVs were decreased, but QSM values were increased, in the aMCI and AD groups compared with the CN group. QSM better differentiated aMCI from CN than GMV in the precuneus and allocortex regions. In the regions where iron and amyloid β accumulate, QSM can be used to differentiate between CN and aMCI groups, indicating that it may be a useful auxiliary imaging method for the early diagnosis of AD.

IJCAI Conference 2016 Conference Paper

Bayesian Optimization of Partition Layouts for Mondrian Processes

  • Yi Wang
  • Bin Li
  • Xuhui Fan
  • Yang Wang
  • Fang Chen

The Mondrian process (MP) produces hierarchical partitions on a product space as a kd-tree, which can serve as a flexible yet parsimonious partition prior for relational modeling. Because partitions are generated recursively and the dimensionality of the partition state space varies, inference for MP relational modeling is extremely difficult. The prevalent inference method for this problem, reversible-jump MCMC, requires a number of unnecessary retrospective steps to transition from one partition state to a very similar one, and it is prone to getting trapped in local optima. In this paper, we attempt to circumvent these drawbacks by proposing an alternative method for inferring the MP partition structure. Based on the observation that similar cutting rate measures on the partition space lead to similar partition layouts, we propose to impose a nonhomogeneous cutting rate measure on the partition space to control the layouts of the generated partitions; the original MCMC sampling problem is thus transformed into a Bayesian global optimization problem. Empirical tests demonstrate that Bayesian optimization is able to find better partition structures than MCMC sampling with the same number of partition structure proposals.
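The sketch below samples a Mondrian process on an axis-aligned box using the standard homogeneous generative recipe:

1. The cost of the next cut is exponential, with rate equal to the sum of the side lengths.
2. The cut dimension is chosen with probability proportional to side length.
3. The cut position is uniform along that side.
4. Recursion stops once the accumulated cost exceeds the budget.

It illustrates the prior only, not the paper's nonhomogeneous cutting rate or Bayesian optimization. `sample_mondrian` and the box representation are assumptions.

```python
import random
from typing import List, Tuple

Box = List[Tuple[float, float]]  # one (low, high) interval per dimension


def sample_mondrian(box: Box, budget: float, rng: random.Random) -> List[Box]:
    """Recursively sample a Mondrian partition of `box`; return its leaf blocks."""
    lengths = [hi - lo for lo, hi in box]
    cost = rng.expovariate(sum(lengths))  # waiting time until the next cut
    if cost > budget:
        return [box]  # budget exhausted: this block is a leaf
    # Choose the cut dimension with probability proportional to its length.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    x = rng.uniform(lo, hi)  # cut location, uniform along that side
    left = list(box)
    left[d] = (lo, x)
    right = list(box)
    right[d] = (x, hi)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))
```

A larger budget yields more (and finer) blocks. The returned leaves always tile the original box.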

ICML Conference 2016 Conference Paper

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

  • Dario Amodei
  • Sundaram Ananthanarayanan
  • Rishita Anubhai
  • Jingliang Bai
  • Eric Battenberg
  • Carl Case
  • Jared Casper
  • Bryan Catanzaro

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents, and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

AAAI Conference 2016 Conference Paper

The Ostomachion Process

  • Xuhui Fan
  • Bin Li
  • Yi Wang
  • Yang Wang
  • Fang Chen

Stochastic partition processes for exchangeable graphs produce axis-aligned blocks on a product space. In relational modeling, the resulting blocks uncover the underlying interactions between two sets of entities of the relational data. Although some flexible axis-aligned partition processes, such as the Mondrian process, have been able to capture complex interacting patterns in a hierarchical fashion, they still fall short of capturing dependence between dimensions. To overcome this limitation, we propose the Ostomachion process (OP), which relaxes the cutting direction by allowing for oblique cuts. The partitions generated by an OP are convex polygons that can capture inter-dimensional dependence. The OP also exhibits interesting properties: 1) along the timeline, the cutting times can be characterized by a homogeneous Poisson process, and 2) on the partition space, the areas of the resulting components comply with a Dirichlet distribution. We can thus control the expected number of cuts and the expected areas of components through hyper-parameters. We adapt the reversible-jump MCMC algorithm for inferring OP partition structures. The experimental results on relational modeling and decision tree classification validate the merit of the OP.

KR Conference 2016 Conference Paper

Weighted Rules under the Stable Model Semantics

  • Joohyung Lee
  • Yi Wang

We introduce the concept of weighted rules under the stable model semantics following the log-linear models of Markov Logic. This provides versatile methods to overcome the deterministic nature of the stable model semantics, such as resolving inconsistencies in answer set programs, ranking stable models, associating probability to stable models, and applying statistical inference to computing weighted stable models. We also present formal comparisons with related formalisms, such as answer set programs, Markov Logic, ProbLog, and P-log.
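"Associating probability to stable models" can be made concrete with a brute-force sketch of the LPMLN distribution for a tiny ground program. It follows the LPMLN definition:

- An interpretation I gets weight exp(sum of the weights of the rules it satisfies) if I is a stable model of those satisfied rules.
- Otherwise its weight is 0.
- The weights are then normalized.

The sketch assumes soft-weighted normal rules only, with no hard rules, disjunctions, or constraints. It enumerates all 2^n interpretations, so it is for illustration only. The rule encoding and the function names are assumptions.

```python
import itertools
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding of a weighted normal rule "w : head <- pos, not neg".
Rule = Tuple[float, str, FrozenSet[str], FrozenSet[str]]


def satisfies(interp: FrozenSet[str], rule: Rule) -> bool:
    _, head, pos, neg = rule
    body_true = pos <= interp and not (neg & interp)
    return head in interp or not body_true


def least_model_of_reduct(rules: List[Rule], interp: FrozenSet[str]) -> FrozenSet[str]:
    """Gelfond-Lifschitz reduct w.r.t. interp, then its least model."""
    # Drop rules whose negative body is falsified by interp; drop "not" literals.
    definite = [(head, pos) for _, head, pos, neg in rules if not (neg & interp)]
    model = set()
    changed = True
    while changed:  # naive fixpoint iteration
        changed = False
        for head, pos in definite:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return frozenset(model)


def lpmln_distribution(atoms: List[str], rules: List[Rule]) -> Dict[FrozenSet[str], float]:
    weights = {}
    for bits in itertools.product([False, True], repeat=len(atoms)):
        interp = frozenset(a for a, b in zip(atoms, bits) if b)
        sat = [r for r in rules if satisfies(interp, r)]  # the subset Pi_I
        if least_model_of_reduct(sat, interp) == interp:  # I is in SM[Pi_I]
            weights[interp] = math.exp(sum(r[0] for r in sat))
        else:
            weights[interp] = 0.0
    z = sum(weights.values())
    return {i: w / z for i, w in weights.items()}
```

Consider the program `2: p.` and `1: q <- p.` The interpretation {q} gets probability 0, because no satisfied rule supports q. The other three interpretations get weights e, e², and e³. Under plain Markov Logic, {q} would instead receive nonzero weight.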

YNIMG Journal 2015 Journal Article

Age and sex related differences in subcortical brain iron concentrations among healthy adults

  • Ninni Persson
  • Jianlin Wu
  • Qing Zhang
  • Ting Liu
  • Jing Shen
  • Ruyi Bao
  • Mingfei Ni
  • Tian Liu

Age and sex can influence brain iron levels. We studied the influence of these variables on deep gray matter magnetic susceptibilities. In 183 healthy volunteers (44.7 ± 14.2 years, range 20–69, ♀ 49%), in vivo quantitative susceptibility mapping (QSM) at 1.5 T was performed to estimate brain iron accumulation in the following regions of interest (ROIs): caudate nucleus (Cd), putamen (Pt), globus pallidus (Gp), thalamus (Th), pulvinar (Pul), red nucleus (Rn), substantia nigra (Sn) and the cerebellar dentate nuclei (Dn). We gauged the influence of age and sex on magnetic susceptibility by specifying a series of structural equation models. The distributions of susceptibility varied in degree across the structures, conforming to histologic findings (Hallgren and Sourander, 1958), with the highest degree of susceptibility in the Gp and the lowest in the Th. Iron increase correlated across several ROIs, which may reflect an underlying age-related process. Advanced age was associated with a particularly strong linear rise of susceptibility in the striatum. Nonlinear age trends were found in the Rn, where they were the most pronounced, followed by the Pul and Sn, while minimal nonlinear trends were observed for the Pt, Th, and Dn. Moreover, sex-related variations were observed: women showed lower levels of susceptibility in the Sn after accounting for age. Regional susceptibility of the Pul increased linearly with age in men but exhibited a nonlinear association with age in women, leveling off starting from midlife. Women expected to be postmenopausal (51+ years) showed lower total magnetic susceptibility in the subcortical gray matter. The current report is not only consistent with previous reports of age-related variations of brain iron, but also adds to current knowledge by reporting age-related changes in less studied, smaller subcortical nuclei.
This is the first in-vivo report to show lower total subcortical brain iron levels selectively in women from midlife, compared to men and younger women. These results encourage further assessment of sex differences in brain iron. We anticipate that age and sex are important co-factors to take into account when establishing a baseline level for differentiating pathologic neurodegeneration from healthy aging. The variations in regional susceptibility reported herein should be evaluated further using a longitudinal study design to determine within-person changes in aging.

AAAI Conference 2015 Conference Paper

Handling Uncertainty in Answer Set Programming

  • Yi Wang
  • Joohyung Lee

We present a probabilistic extension of logic programs under the stable model semantics, inspired by the concept of Markov Logic Networks. The proposed language takes advantage of both formalisms in a single framework, allowing us to represent commonsense reasoning problems that require both logical and probabilistic reasoning in an intuitive and elaboration tolerant way.

EAAI Journal 2015 Journal Article

Maximum mutual information regularized classification

  • Jim Jing-Yan Wang
  • Yi Wang
  • Shiguang Zhao
  • Xin Gao

In this paper, a novel pattern classification approach is proposed that regularizes classifier learning to maximize the mutual information between the classification response and the true class label. We argue that, with the learned classifier, the uncertainty about the true class label of a data sample should be reduced as much as possible by knowing its classification response. The reduced uncertainty is measured by the mutual information between the classification response and the true class label. To this end, when learning a linear classifier, we propose to maximize the mutual information between the classification responses and the true class labels of training samples, in addition to minimizing the classification error and reducing the classifier complexity. An objective function is constructed by modeling mutual information with entropy estimation, and it is optimized by a gradient descent method in an iterative algorithm. Experiments on two real-world pattern classification problems show significant improvements achieved by maximum mutual information regularization.
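The regularizer rests on the mutual information I(R; Y) = H(Y) - H(Y | R) between classification responses R and labels Y. The paper estimates this quantity with entropy estimation inside a gradient-based optimizer.

As a simpler, non-differentiable illustration of the quantity itself, the sketch below computes a plug-in estimate after binning real-valued responses. The function name and the equal-width binning scheme are assumptions.

```python
import numpy as np


def plugin_mutual_information(responses: np.ndarray, labels: np.ndarray,
                              bins: int = 10) -> float:
    """Plug-in estimate of I(R; Y) in nats from equal-width binned responses."""
    edges = np.histogram_bin_edges(responses, bins=bins)
    r = np.digitize(responses, edges[1:-1])  # bin index in 0..bins-1
    _, y = np.unique(labels, return_inverse=True)  # labels -> 0..K-1
    joint = np.zeros((bins, y.max() + 1))
    np.add.at(joint, (r, y), 1.0)  # joint histogram of (bin, label)
    joint /= joint.sum()
    pr = joint.sum(axis=1, keepdims=True)  # marginal of binned responses
    py = joint.sum(axis=0, keepdims=True)  # marginal of labels
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pr @ py)[mask])))
```

Responses that perfectly separate two balanced classes give ln 2, the label entropy. Responses independent of the labels give 0.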

YNIMG Journal 2015 Journal Article

Multi-vendor reliability of arterial spin labeling perfusion MRI using a near-identical sequence: Implications for multi-center studies

  • Henri J.M.M. Mutsaerts
  • Matthias J.P. van Osch
  • Fernando O. Zelaya
  • Danny J.J. Wang
  • Wibeke Nordhøy
  • Yi Wang
  • Stephen Wastling
  • Maria A. Fernandez-Seara

Introduction: A main obstacle that impedes standardized clinical and research applications of arterial spin labeling (ASL) is the substantial difference between the commercial ASL implementations of major MRI vendors. In this study, we compare a single identical 2D gradient-echo EPI pseudo-continuous ASL (PCASL) sequence implemented on 3T scanners from three vendors (General Electric Healthcare, Philips Healthcare and Siemens Healthcare) within the same center and with the same subjects. Material and methods: Fourteen healthy volunteers (50% male, age 26.4 ± 4.7 years) were scanned twice on each scanner in an interleaved manner within 3 h. Because of differences in gradient and coil specifications, two separate studies were performed with slightly different sequence parameters, with one scanner used in both studies for comparison. Reproducibility was evaluated by means of quantitative cerebral blood flow (CBF) agreement and inter-session variation, at both the region-of-interest (ROI) and voxel level. In addition, three experienced neuroradiologists performed a qualitative similarity comparison of the CBF maps. Results: There were no CBF differences between vendors in study 1 (p > 0.1), but there were CBF differences of 2–19% between vendors in study 2 (p < 0.001 in most gray matter ROIs) and a 10–22% difference in CBF values obtained with the same vendor between studies (p < 0.001 in most gray matter ROIs). The inter-vendor inter-session variation was not significantly larger than the intra-vendor variation in all ROIs (p > 0.1) but one (p < 0.001). Conclusion: This study demonstrates that comparable CBF maps can be acquired on scanners from different vendors. Small differences in sequence parameters can have a larger effect on the reproducibility of ASL than hardware or software differences between vendors.
These results suggest that researchers should strive to employ identical labeling and readout strategies in multi-center ASL studies.

TIST Journal 2015 Journal Article

Peacock

  • Yi Wang
  • Xuemin Zhao
  • Zhenlong Sun
  • Hao Yan
  • Lifeng Wang
  • Zhihui Jin
  • Liubin Wang
  • Yang Gao

Latent Dirichlet allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engines and online advertising systems. A main underlying reason is that the topic models used have been too small in scale to be useful; for example, some of the largest LDA models reported in the literature have up to 10³ topics, which can hardly cover long-tail semantic word sets. In this article, we show that the number of topics is a key factor that can significantly boost the utility of topic-modeling systems. In particular, we show that a "big" LDA model with at least 10⁵ topics inferred from 10⁹ search queries can achieve a significant improvement on industrial search engine and online advertising systems, both of which serve hundreds of millions of users. We develop a novel distributed system called Peacock to learn big LDA models from big data. The main features of Peacock include a hierarchical distributed architecture, real-time prediction, and topic de-duplication. We empirically demonstrate that the Peacock system is capable of providing significant benefits via highly scalable LDA topic models for several industrial applications.

IJCAI Conference 2015 Conference Paper

Saliency Detection with a Deeper Investigation of Light Field

  • Jun Zhang
  • Meng Wang
  • Jun Gao
  • Yi Wang
  • Xudong Zhang
  • Xindong Wu

Although the light field has recently been recognized as helpful in saliency detection, it has not yet been comprehensively explored. In this work, we propose a new saliency detection model with light field data. The idea behind the proposed model originates from the following observations. (1) People can distinguish regions at different depth levels by adjusting the focus of their eyes. Similarly, a light field image can generate a set of focal slices focusing at different depth levels, which suggests that the background can be weighted by selecting the corresponding slice. We show that background priors encoded by light field focusness have advantages in eliminating background distraction and enhancing saliency by weighting the light field contrast. (2) Regions at closer depth ranges tend to be salient, while regions far in the distance mostly belong to the background. We show that foreground objects can be easily separated from similar or cluttered backgrounds by exploiting their light field depth. Extensive evaluations on the recently introduced Light Field Saliency Dataset (LFSD) [Li et al., 2014] support these findings. The evaluations include studies of different light field cues, comparisons with Li et al.'s method (the only reported light field saliency detection approach to our knowledge), and comparisons with 2D/3D state-of-the-art approaches extended with light field depth/focusness information. They show that the investigated light field properties are complementary to each other and lead to improvements on 2D/3D models, and that our approach produces superior results compared with the state of the art.

YNIMG Journal 2015 Journal Article

Simultaneous multi-slice Turbo-FLASH imaging with CAIPIRINHA for whole brain distortion-free pseudo-continuous arterial spin labeling at 3 and 7 T

  • Yi Wang
  • Steen Moeller
  • Xiufeng Li
  • An T. Vu
  • Kate Krasileva
  • Kamil Ugurbil
  • Essa Yacoub
  • Danny J.J. Wang

Simultaneous multi-slice (SMS) or multiband (MB) imaging has recently been attempted for arterial spin labeled (ASL) perfusion MRI in conjunction with echo-planar imaging (EPI) readout. It was found that SMS-EPI can reduce the T1 relaxation effect of the label and improve image coverage and resolution with little penalty in signal-to-noise ratio (SNR). However, EPI still suffers from geometric distortion and signal dropout from field inhomogeneity effects, especially at high and ultrahigh magnetic fields. Here we present a novel scheme for achieving high-fidelity, distortion-free quantitative perfusion imaging by combining pseudo-continuous ASL (pCASL) with SMS Turbo-FLASH (TFL) readout at both 3 and 7 T. Bloch equation simulation was performed to characterize and optimize the TFL-based pCASL perfusion signal. Two MB factors (3 and 5) were implemented in SMS-TFL pCASL and compared with standard 2D TFL and EPI pCASL sequences. The temporal SNR of SMS-TFL pCASL relative to that of standard TFL pCASL was 0.76 ± 0.10 and 0.74 ± 0.11 at 7 T and 0.70 ± 0.05 and 0.65 ± 0.05 at 3 T for MB factors of 3 and 5, respectively. By implementing background suppression in conjunction with SMS-TFL at 3 T, the relative temporal SNR improved to 0.84 ± 0.09 and 0.79 ± 0.10 for MB factors of 3 and 5, respectively. Compared to EPI pCASL, significantly increased temporal SNR (p < 0.001) and improved visualization of the orbitofrontal cortex were achieved using SMS-TFL pCASL. By combining SMS acceleration with TFL pCASL, we demonstrated the feasibility of whole-brain, distortion-free quantitative mapping of cerebral blood flow at high and ultrahigh magnetic fields.
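The relative temporal SNR figures quoted above are ratios of tSNR between two sequences. The sketch below computes such a ratio using the common voxel-wise definition of tSNR: the temporal mean divided by the temporal standard deviation of a time series.

It is not the authors' processing pipeline. The array layout (time on the last axis), the averaging over a boolean ROI, and the function names are all assumptions.

```python
import numpy as np


def temporal_snr(series: np.ndarray) -> np.ndarray:
    """Voxel-wise tSNR for an array shaped (..., time)."""
    mean = series.mean(axis=-1)
    std = series.std(axis=-1, ddof=1)  # sample standard deviation over time
    return mean / std


def relative_tsnr(test: np.ndarray, reference: np.ndarray, roi: np.ndarray) -> float:
    """Mean tSNR of `test` over a boolean ROI, relative to `reference`."""
    return float(temporal_snr(test)[roi].mean() / temporal_snr(reference)[roi].mean())
```

A ratio below 1 means the test sequence (for example, an SMS variant) has lower tSNR than the reference over that ROI.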

JBHI Journal 2014 Journal Article

Part-Based Multiderivative Edge Cross-Sectional Profiles for Polyp Detection in Colonoscopy

  • Yi Wang
  • Wallapak Tavanapong
  • Johnny Wong
  • JungHwan Oh
  • Piet C. de Groen

This paper presents a novel technique for automated detection of protruding polyps in colonoscopy images using edge cross-section profiles (ECSP). We propose a part-based multi-derivative ECSP that computes derivative functions of an edge cross-section profile and segments each of these profiles into parts. Therefore, we can model or extract features suitable for each part. Our features obtained from the parts can effectively describe complex properties of protruding polyps including the shape of the parts, texture, and protrusion and smoothness of the polyp surface. We evaluated our method against two existing polyp image detection techniques on 42 different polyps, including those with little protrusion. Each polyp has a large variation of appearance in viewing angles, light conditions, and scales in different images. The evaluation showed that our technique outperformed the existing techniques in both accuracy and analysis time. Our method has a higher area under the free-response receiver operating characteristic curve. For instance, when both techniques have a true positive rate for polyp image detection of 81.4%, the average number of false regions per image of our technique is 0.32 compared to 1.8 of the best existing technique under study. Additionally, our technique can precisely mark edges of candidate polyp regions as visual feedback. These results altogether indicate that our technique is promising to provide visual feedback of polyp regions in clinical practice.

YNIMG Journal 2013 Journal Article

Magnetic susceptibility anisotropy: Cylindrical symmetry from macroscopically ordered anisotropic molecules and accuracy of MRI measurements using few orientations

  • Cynthia Wisnieff
  • Tian Liu
  • Pascal Spincemaille
  • Shuai Wang
  • Dong Zhou
  • Yi Wang

White matter is an essential component of the central nervous system and is of major concern in neurodegenerative diseases such as multiple sclerosis (MS). Recent MRI studies have explored the unique anisotropic magnetic properties of white matter using susceptibility tensor imaging. However, these measurements are hindered in practice by the large number of different head orientations needed to accurately reconstruct the susceptibility tensor. Adding reasonable constraints reduces the number of model parameters and can help condition the tensor reconstruction from a small number of orientations. The macroscopic magnetic susceptibility is decomposed as a sum of molecular magnetic polarizabilities. This decomposition demonstrates that macroscopic order in molecular arrangement is essential to the existence and symmetry of susceptibility anisotropy, and that cylindrical symmetry is a natural outcome of an ordered molecular arrangement. Noise propagation in the susceptibility tensor reconstruction is analyzed through its condition number. The analysis shows that the tensor reconstruction is highly sensitive to the distribution of acquired subject orientations and to the tensor symmetry properties, with a substantial over- or under-estimation of susceptibility anisotropy in fiber directions not favorably oriented with respect to the acquired orientations. It was found that a careful acquisition of three non-coplanar orientations and the use of cylindrical symmetry guided by diffusion tensor imaging allowed reasonable estimation of magnetic susceptibility anisotropy in certain major white matter tracts in the human brain.

JBHI Journal 2013 Journal Article

Near Real-Time Retroflexion Detection in Colonoscopy

  • Yi Wang
  • W. Tavanapong
  • J. Wong
  • JungHwan Oh
  • P. C. de Groen

Colonoscopy is the most popular screening tool for colorectal cancer. Recent studies reported that retroflexion during colonoscopy helped to detect more polyps. Retroflexion is an endoscope maneuver that enables visualization of internal mucosa along the shaft of the endoscope, enabling visualization of the mucosa area that is difficult to see with typical forward viewing. This paper describes our new method that detects retroflexion during colonoscopy. We propose region shape and location (RSL) features and edgeless edge cross-section profile (ECSP) features that encapsulate important properties of endoscope appearance and edge information during retroflexion. Our experimental results on 50 colonoscopy test videos show that a simple ensemble classifier using both ECSP and RSL features can effectively identify retroflexion in terms of analysis time and detection rate.

AIJ Journal 2012 Journal Article

Model-based multidimensional clustering of categorical data

  • Tao Chen
  • Nevin L. Zhang
  • Tengfei Liu
  • Kin Man Poon
  • Yi Wang

Existing models for cluster analysis typically consist of a number of attributes that describe the objects to be partitioned and one single latent variable that represents the clusters to be identified. When one analyzes data using such a model, one is looking for one way to cluster data that is jointly defined by all the attributes. In other words, one performs unidimensional clustering. This is not always appropriate. For complex data with many attributes, it is more reasonable to consider multidimensional clustering, i.e., to partition data along multiple dimensions. In this paper, we present a method for performing multidimensional clustering on categorical data and show its superiority over unidimensional clustering.

YNIMG Journal 2012 Journal Article

Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map

  • Jing Liu
  • Tian Liu
  • Ludovic de Rochefort
  • James Ledoux
  • Ildar Khalidov
  • Weiwei Chen
  • A. John Tsiouris
  • Cynthia Wisnieff

The magnetic susceptibility of tissue can be determined in gradient echo MRI by deconvolving the local magnetic field with the magnetic field generated by a unit dipole. This Quantitative Susceptibility Mapping (QSM) problem is unfortunately ill-posed. By transforming the problem to the Fourier domain, the susceptibility appears to be undersampled only at points where the dipole kernel is zero, suggesting that a modest amount of additional information may be sufficient for uniquely resolving susceptibility. A Morphology Enabled Dipole Inversion (MEDI) approach is developed that exploits the structural consistency between the susceptibility map and the magnitude image reconstructed from the same gradient echo MRI. Specifically, voxels that are part of edges in the susceptibility map but not in the edges of the magnitude image are considered to be sparse. In this approach an L1 norm minimization is used to express this sparsity property. Numerical simulations and phantom experiments are performed to demonstrate the superiority of this L1 minimization approach over the previous L2 minimization method. Preliminary brain imaging results in healthy subjects and in patients with intracerebral hemorrhages illustrate that QSM is feasible in practice.
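The zeros mentioned above come from the unit-dipole kernel in k-space. With B0 along z, it is commonly written D(k) = 1/3 - k_z²/|k|². The kernel vanishes on the cone where k_z²/|k|² = 1/3, roughly 54.7° from B0.

The sketch below builds this kernel on an FFT grid. It illustrates only the ill-posedness, not MEDI's L1-regularized inversion. The grid size and the D(0) = 0 convention are assumptions.

```python
import numpy as np


def dipole_kernel(shape=(64, 64, 64)) -> np.ndarray:
    """Unit dipole kernel D(k) = 1/3 - kz^2 / |k|^2 on an FFT grid (B0 along z)."""
    kx, ky, kz = np.meshgrid(*(np.fft.fftfreq(n) for n in shape), indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    with np.errstate(divide="ignore", invalid="ignore"):
        d = 1.0 / 3.0 - kz**2 / k2  # 0/0 at the origin is fixed below
    d[k2 == 0] = 0.0  # convention: D(0) = 0
    return d
```

Wherever D(k) is zero or nearly so, dividing the measured field by D amplifies noise. This is why extra information, such as the magnitude-image edge prior described above, is needed.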

YNIMG Journal 2011 Journal Article

DTI registration in atlas based fiber analysis of infantile Krabbe disease

  • Yi Wang
  • Aditya Gupta
  • Zhexing Liu
  • Hui Zhang
  • Maria L. Escolar
  • John H. Gilmore
  • Sylvain Gouttard
  • Pierre Fillard

In recent years, diffusion tensor imaging (DTI) has become the modality of choice to investigate white matter pathology in the developing brain. To study neonate Krabbe disease with DTI, we evaluate the performance of linear and non-linear DTI registration algorithms for atlas-based fiber tract analysis. The DTI scans of 10 age-matched neonates with infantile Krabbe disease are mapped into an atlas for the analysis of major fiber tracts: the genu and splenium of the corpus callosum, the internal capsule tracts, and the uncinate fasciculi. The neonate atlas is based on 377 healthy control subjects and was generated using an unbiased diffeomorphic atlas building method. To evaluate the performance of one linear and seven nonlinear commonly used registration algorithms for DTI, we propose the use of two novel evaluation metrics: a regional matching quality criterion incorporating the local tensor orientation similarity, and a fiber property profile based metric using normative correlation. Our experimental results indicate that the whole tensor based registration method within the DTI-ToolKit (DTI-TK) shows the best performance for our application.

IROS Conference 2009 Conference Paper

Real-time social touch gesture recognition for sensate robots

  • Heather Knight
  • Robert Lopez Toscano
  • Walter Dan Stiehl
  • Angela Chang
  • Yi Wang
  • Cynthia Breazeal

This paper describes the hardware and algorithms for a realtime social touch gesture recognition system. Early experiments involve a sensate bear test-rig with full body touch sensing, sensor visualization and gesture recognition capabilities. Algorithms are based on real humans interacting with a plush bear. In developing a preliminary gesture library with thirteen symbolic gestures and eight touch subtypes, we have taken the first steps toward a robotic touch API, showing that the huggable robot behavior system will be able to stream currently active sensors to detect regional social gestures and local sub-gestures in realtime. The system demonstrates the infrastructure to detect three types of touching: social touch, local touch, and sensor-level touch.

AAAI Conference 2008 Conference Paper

Latent Tree Models and Approximate Inference in Bayesian Networks

  • Yi Wang

We propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and then, online, perform inference with the LTM instead of the original BN. Because LTMs are tree-structured, inference takes linear time. At the same time, they can represent complex relationships among leaf nodes, so the approximation accuracy is often good. Empirical evidence shows that our method can achieve good approximation accuracy at low online computational cost.

AIIM Journal 2008 Journal Article

Latent tree models and diagnosis in traditional Chinese medicine

  • Nevin L. Zhang
  • Shihong Yuan
  • Tao Chen
  • Yi Wang

Objective: TCM (traditional Chinese medicine) is an important avenue for disease prevention and treatment for the Chinese people and is gaining popularity among others. However, many remain skeptical and even critical of TCM because of a number of its shortcomings. One key shortcoming is the lack of objective diagnosis standards. We endeavor to alleviate this shortcoming using machine learning techniques. Method: TCM diagnosis consists of two steps: patient information gathering and syndrome differentiation. We focus on the latter. Viewed as a black box, syndrome differentiation is simply a classifier that assigns patients to different classes based on their symptoms. A fundamental question is: do those classes exist in reality? To answer this question from a machine learning perspective, one would naturally use cluster analysis. Previous clustering methods are unable to cope with the complexity of TCM, so we have developed a new clustering method in the form of latent tree models. We conducted a case study in which we first collected a data set on a TCM domain called kidney deficiency and then used latent tree models to analyze it. Results: Our analysis found natural clusters in the data set that correspond well to TCM syndrome types. This is an important discovery because (1) it provides statistical validation of TCM syndrome types and (2) it suggests the possibility of establishing objective and quantitative diagnosis standards for syndrome differentiation. In this paper, we summarize research on latent tree models and report the aforementioned case study.

IROS Conference 2003 Conference Paper

User-guided reinforcement learning of robot assistive tasks for an intelligent environment

  • Yi Wang
  • Manfred Huber
  • Vinay N. Papudesi
  • Diane J. Cook

Autonomous robots could perform a variety of assistive tasks in intelligent environments. However, widespread use of robot assistants in these environments requires that they be easy to use for individuals who are generally not skilled robot operators. In this paper, we present a method of training robots that bridges the gap between user programming of a robot and autonomous learning of a robot task. In our approach to variable autonomy, we integrate user commands at varying levels of abstraction into a reinforcement learner to permit faster policy acquisition. We illustrate the ideas with a robot assistant task: retrieving medicine for an inhabitant of a smart home.