Arrow Research search

Author name cluster

Qi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

87 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

A Chinese financial event knowledge graph-based retrieval-augmented generation framework for financial question answering

  • Haitao Cheng
  • Ke Wang
  • Qi Wang
  • Tao Liu
  • Kai Sheng

Financial question answering in the Chinese domain presents significant challenges due to complex domain-specific terminology and the integration of heterogeneous financial research reports from multiple institutions. To address these issues, we propose a Chinese financial event knowledge graph-based retrieval-augmented generation framework. The framework constructs a structured index via semantic-aware text chunking and large language model-driven triplet extraction, incorporating a generation–verification mechanism to ensure reliable and relevant information retrieval. To handle vague or underspecified user queries, which are common in Chinese due to implicit expressions and unclear word boundaries, a reinforcement learning-based query reformulation module generates domain-specific representations, improving retrieval intent alignment. A dual-level retrieval mechanism is designed to retrieve core entities via semantic similarity and then expand event chains through knowledge graph-based neighbor expansion. Experimental results across three question types (single-hop, multi-hop, and open-ended) and four evaluation dimensions (comprehensiveness, diversity, empowerment, and overall performance) demonstrate that the proposed framework consistently outperforms baseline models across financial question answering tasks.
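The dual-level retrieval step (semantic matching of core entities, then neighbor expansion in the knowledge graph) can be pictured with a minimal sketch. It is not the authors' implementation: cosine similarity is assumed as the similarity measure, and the toy embeddings, entity names, graph, and the `top_k`/`hops` parameters are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dual_level_retrieve(query_vec, entity_vecs, graph, top_k=2, hops=1):
    """Level 1: rank entities by similarity to the query and keep the top_k.
    Level 2: expand those core entities through the graph for `hops` steps (BFS)."""
    ranked = sorted(entity_vecs, key=lambda e: cosine(query_vec, entity_vecs[e]), reverse=True)
    core = ranked[:top_k]
    selected, seen, frontier = list(core), set(core), list(core)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    selected.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return core, selected


# Hypothetical toy data: 2-D "embeddings" and a tiny event graph.
entities = {"rate_cut": [1.0, 0.0], "bank_A": [0.9, 0.1], "oil_price": [0.0, 1.0]}
event_graph = {"rate_cut": ["bond_yield"], "bank_A": ["loan_growth", "rate_cut"], "oil_price": ["inflation"]}
core, context = dual_level_retrieve([1.0, 0.0], entities, event_graph)
print(core)     # ['rate_cut', 'bank_A']
print(context)  # ['rate_cut', 'bank_A', 'bond_yield', 'loan_growth']
```

The expansion is a breadth-first walk, so `hops` bounds how far an event chain can extend beyond the entities matched semantically.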

AAAI Conference 2026 Conference Paper

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

  • Boyu Chang
  • Qi Wang
  • Xi Guo
  • Zhixiong Nan
  • Yazhou Yao
  • Tianfei Zhou

Visual abductive reasoning (VAR) is a challenging task that requires AI systems to infer the most likely explanation for incomplete visual observations. While recent MLLMs have developed strong general-purpose multimodal reasoning capabilities, they still fall short of humans in abductive inference. To bridge this gap, we draw inspiration from the interplay between verbal and pictorial abduction in human cognition, and propose to strengthen the abduction of MLLMs by mimicking such dual-mode behavior. Concretely, we introduce AbductiveMLLM, comprising two synergistic components: REASONER and IMAGINER. The REASONER operates in the verbal domain. It first explores a broad space of possible explanations using a blind LLM and then prunes visually incongruent hypotheses based on cross-modal causal alignment. The remaining hypotheses are introduced into the MLLM as targeted priors, steering its reasoning toward causally coherent explanations. The IMAGINER, on the other hand, further guides MLLMs by emulating human-like pictorial thinking. It conditions a text-to-image diffusion model on both the input video and the REASONER’s output embeddings to “imagine” plausible visual scenes that correspond to the verbal explanation, thereby enriching MLLMs' contextual grounding. The two components are trained jointly in an end-to-end manner. Experiments on standard VAR benchmarks show that AbductiveMLLM achieves state-of-the-art performance, consistently outperforming traditional solutions and advanced MLLMs.

AAAI Conference 2026 Conference Paper

Beyond Retraining: Training-Free Unknown Class Filtering for Source-Free Open Set Domain Adaptation of Vision–Language Models

  • Yongguang Li
  • Jindong Li
  • Qi Wang
  • QianLi Xing
  • Runliang Niu
  • Shengsheng Wang
  • Menglin Yang

Vision-language models (VLMs) have gained widespread attention for their strong zero-shot capabilities across numerous downstream tasks. However, these models assume that each test image’s class label is drawn from a predefined label set and lack a reliable mechanism to reject samples from emerging unknown classes when only unlabeled data are available. To address this gap, open-set domain adaptation methods retrain models to push potential unknowns away from known clusters. Yet some unknown samples remain stably anchored to specific known classes in the VLM feature space due to semantic relevance, a phenomenon termed Semantic Affinity Anchoring (SAA). Forcibly repelling these samples unavoidably distorts the native geometry of VLMs and degrades performance. Meanwhile, existing score-based unknown detectors use simplistic thresholds and suffer from threshold sensitivity, resulting in sub-optimal performance. To address these issues, we propose VLM-OpenXpert, which comprises two training-free, plug-and-play inference modules. SUFF performs SVD on high-confidence unknowns to extract a low-rank "unknown subspace". Each sample’s projection onto this subspace is weighted and softly removed from its feature, suppressing unknown components while preserving semantics. BGAT corrects score skewness via a Box–Cox transform, then fits a bimodal Gaussian mixture to adaptively estimate the optimal threshold balancing known-class recognition and unknown-class rejection. Experiments on 9 benchmarks and three backbones (CLIP, SigLIP, ALIGN) under Source-Free OSDA settings show that our training-free pipeline matches or outperforms retraining-heavy state-of-the-art methods, establishing a powerful lightweight inference calibration paradigm for open-set VLM deployment.
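The SUFF idea (extract an "unknown subspace" by SVD, then softly remove each feature's projection onto it) can be illustrated with a dependency-free sketch. Assumptions: the subspace is rank 1, its basis is the leading right singular vector found by power iteration rather than a full SVD, and the removal `weight` is a free hypothetical parameter; how the paper weights each sample is not reproduced here.

```python
import math


def top_singular_vector(rows, iters=100):
    """Leading right singular vector of the n x d matrix `rows`, via power
    iteration on X^T X. It spans a rank-1 'unknown subspace'."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(dim)) for r in rows]                      # X v
        w = [sum(xv[i] * rows[i][j] for i in range(len(rows))) for j in range(dim)]    # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def soft_remove(feature, basis, weight):
    """Subtract `weight` times the projection of `feature` onto the unit vector `basis`.
    weight=1.0 removes the component fully; smaller values remove it partially."""
    coef = sum(f * b for f, b in zip(feature, basis))
    return [f - weight * coef * b for f, b in zip(feature, basis)]


# Hypothetical features of "high-confidence unknown" samples, all varying along axis 0.
unknown_feats = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
u = top_singular_vector(unknown_feats)
print(u)                                      # [1.0, 0.0, 0.0]
print(soft_remove([2.0, 1.0, 0.0], u, 0.5))   # [1.0, 1.0, 0.0]
```

With a library such as NumPy, the basis would more typically come from the top right singular vectors of a full SVD; the power-iteration form just keeps the sketch self-contained.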

AAAI Conference 2026 Conference Paper

CoGrad3D: Spatially-Coupled Timestep Optimization with Orthogonal Gradient Fusion for 3D Generation

  • Haoyang Tong
  • Hongbo Wang
  • Jin Liu
  • Qi Wang
  • Jie Cao
  • Ran He

Score Distillation Sampling has driven recent advances in text-to-3D generation. However, current approaches often fail to produce 3D assets that are both rich in detail and consistent across viewpoints. These limitations primarily arise from imbalanced guidance on fine-grained details and an overdependence on single-view optimization—issues exacerbated by the excessive randomness in selecting diffusion timesteps and camera configurations. Such deficiencies commonly lead to blurry textures and inter-view inconsistencies, which degrade visual realism and hinder practical deployment. To tackle these challenges, we introduce CoGrad3D, a unified generative refinement framework that adopts a continuously adaptive optimization strategy. By dynamically modulating the optimization focus based on real-time convergence signals, CoGrad3D ensures balanced progress toward both geometric completeness and high-fidelity detail. Concretely, we propose an adaptive region sampling strategy that emphasizes under-converged viewing areas, promoting stable and uniform optimization. To facilitate the transition from coarse geometry to fine-grained reconstruction, we develop a region-aware temporal scheduling scheme that integrates global training dynamics with local convergence feedback. Furthermore, we introduce a gradient fusion mechanism that consolidates historical gradients from adjacent viewpoints, mitigating view-specific artifacts and promoting the emergence of coherent 3D structures. Extensive experiments demonstrate that CoGrad3D substantially surpasses existing methods in both geometric consistency and texture fidelity, enabling the generation of high-quality, view-consistent 3D models from textual descriptions.

EAAI Journal 2026 Journal Article

Deep learning-aided Laser Doppler Velocimeter-Inertial Measurement Unit Fusion for Robust Vehicle Localization in Global Navigation Satellite Systems-denied environments

  • Zhiyi Xiang
  • Qi Wang
  • Xiaoming Nie
  • Jian Zhou

Achieving reliable and precise vehicle positioning is paramount for modern autonomous systems, yet it remains a formidable challenge in Global Navigation Satellite Systems (GNSS)-denied environments, especially when relying on ubiquitous low-cost Micro-Electro-Mechanical Systems (MEMS) Inertial Measurement Units (IMUs). This paper introduces a solution that enhances MEMS IMU capabilities by integrating two symmetrically mounted dual-beam Laser Doppler Velocimeters (LDVs). Our core innovation lies in leveraging two specialized Long Short-Term Memory (LSTM) networks that robustly regress the vehicle’s yaw and lateral velocities by effectively fusing both LDV and IMU outputs. To further elevate system accuracy, we propose an LDV outlier handling strategy and a method for LSTM prediction reliability detection designed to mitigate the adverse effects of anomalous network outputs. The vehicle velocities from the LDVs, augmented by our LSTM-derived yaw and lateral velocities, are then fused with MEMS IMU data within a Lie group-based Kalman filter. Experimental validation through two rigorous test sets demonstrates that our method significantly reduces system positioning errors under prolonged GNSS-denied conditions, outperforming existing LDV-based methods. This work underscores the potential of combining precise LDV measurements with the predictive power of deep learning and a robust Lie group-based data fusion strategy for accurate and reliable autonomous vehicle localization.

JBHI Journal 2026 Journal Article

Dual-Student Adversarial Framework With Discriminator and Consistency-Driven Learning for Semi-Supervised Medical Image Segmentation

  • Haifan Wu
  • Yuhan Geng
  • Di Gai
  • Jieying Tu
  • Xin Xiong
  • Qi Wang
  • Zheng Huang

Semi-supervised medical image segmentation is essential for alleviating the cost of manual annotation in clinical applications. However, existing methods often suffer from unreliable pseudo-labels and confirmation bias in consistency-based training, which can lead to unstable optimization and degraded performance. To address these issues, we propose a dual-student adversarial framework with discriminator and consistency-driven learning for semi-supervised medical image segmentation. Specifically, an adversarial learning-based segmentation refinement (ALSR) module is designed to encourage prediction diversity between two student networks and leverages a shared discriminator for adversarial refinement of pseudo-labels. To further stabilize the consistency process, the uncertainty estimation with inter-instance consistency measurement (UIM) module applies a residual exponential moving average (R-EMA) to construct a robust teacher model and selectively filters noisy voxel predictions based on uncertainty estimation. In addition, a Contrastive Representation Stabilization (CRS) module is developed to enhance voxel-level semantic alignment by performing contrastive learning only on confident regions, improving feature discriminability and structural consistency. Extensive experiments on benchmark datasets demonstrate that our method consistently outperforms prior state-of-the-art approaches.
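Two generic building blocks behind the teacher-model and filtering steps can be sketched as follows. This shows the standard mean-teacher EMA update, not the paper's residual variant (R-EMA), and uses predictive entropy as one common uncertainty measure; the entropy criterion, threshold, and all names here are assumptions for illustration.

```python
import math


def ema_update(teacher, student, decay=0.99):
    """Standard mean-teacher EMA (not R-EMA): teacher <- decay * teacher + (1 - decay) * student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name] for name in teacher}


def entropy(probs):
    """Shannon entropy (nats) of one voxel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confident_mask(voxel_probs, max_entropy):
    """True for voxels whose predictive entropy is at most `max_entropy` (hypothetical threshold)."""
    return [entropy(p) <= max_entropy for p in voxel_probs]


teacher = {"w": 1.0}                      # hypothetical one-parameter "model"
for _ in range(3):
    teacher = ema_update(teacher, {"w": 0.0}, decay=0.9)
print(teacher["w"])                       # approximately 0.729

print(confident_mask([[0.95, 0.05], [0.5, 0.5]], max_entropy=0.5))  # [True, False]
```

The second voxel sits at maximum binary entropy (ln 2, about 0.693), so it is excluded; only the confident voxel would contribute to the consistency loss.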

AAAI Conference 2026 Conference Paper

Exploring Generalizable Remote Sensing Change Detection via Low-Rank Exchange Adaptation of Vision Foundation Model

  • Mingwei Zhang
  • Jingtao Hu
  • Qiang Li
  • Qi Wang

Remote sensing change detection (CD) has achieved remarkable progress in recent years. However, little attention has been paid to generalizable change detection (GCD) methods that can effectively generalize to unseen scenarios or domains beyond the training distribution. The major challenges in GCD arise from domain diversity and bitemporal domain shifts in remote sensing images, caused by variations in imaging platforms, acquisition times, geographic regions, and observed events. To tackle these challenges, we propose GenCD, a GCD framework built upon vision foundation models (VFMs). Specifically, GenCD introduces two key components: (1) a Low-Rank Exchange Adaptation (LREA) strategy of VFMs that aligns bitemporal representations while preserving the generalization capacity of VFMs on single-temporal inputs; and (2) a Token-Guided Feature Refinement (TGFR) mechanism that leverages an input-independent token as a guide to refine difference features, improving the discrimination between changed and unchanged regions. We conduct extensive cross-dataset evaluations on eight diverse datasets across three binary CD tasks: land cover, land use, and building-only CD. The results consistently demonstrate the superior generalization of GenCD over SoTA methods, highlighting its effectiveness in GCD.

AAAI Conference 2026 Conference Paper

HISE-KT: Synergizing Heterogeneous Information Networks and LLMs for Explainable Knowledge Tracing with Meta-Path Optimization

  • Zhiyi Duan
  • Zixing Shi
  • Hongyu Yuan
  • Qi Wang

Knowledge Tracing (KT) aims to mine students’ evolving knowledge states and predict their future question-answering performance. Existing methods based on heterogeneous information networks (HINs) are prone to introducing noise due to manual or random selection of meta-paths and lack quality assessment of meta-path instances. Conversely, recent methods based on large language models (LLMs) ignore the rich information across students, and both paradigms struggle to deliver consistently accurate and evidence-based explanations. To address these issues, we propose an innovative framework, HIN-LLM Synergistic Enhanced Knowledge Tracing (HISE-KT), which seamlessly integrates HINs with LLMs. HISE-KT first builds a multi-relationship HIN containing diverse node types to capture the structural relations through multiple meta-paths. The LLM is then employed to intelligently score and filter meta-path instances and retain high-quality paths, pioneering automated meta-path quality assessment. Inspired by educational psychology principles, a meta-path-based similar-student retrieval mechanism is designed to provide more informative context for prediction. Finally, HISE-KT uses a structured prompt to integrate the target student's history with the retrieved similar trajectories, enabling the LLM to generate not only accurate predictions but also evidence-backed, explainable analysis reports. Experiments on four public datasets show that HISE-KT outperforms existing KT baselines in both prediction performance and interpretability.

JBHI Journal 2026 Journal Article

HyperSynergyX: Synergistic Drug Combination Prediction via Hypergraph Modeling and Knowledge Graph-Enhanced Retrieval-Augmented Generation

  • Qi Wang
  • Bingzheng Wu
  • Minglang Xu
  • Xiya Liu
  • Yiming Mao
  • Zhiheng Zhou
  • Guiying Yan

Drug combination therapy is pivotal for complex diseases, but identifying synergistic three-drug regimens remains challenging due to both combinatorial explosion and the opacity of existing computational models. To address this, we introduce HyperSynergyX, an explainable framework that integrates synergy prediction with mechanistic explanation. Its core predictive component, a Dual-Biased Random Walk on Hypergraphs (DBRWH), models higher-order interactions among drugs on a three-drug hypergraph and identifies latent combination patterns via tensor decomposition. To enhance interpretability, we couple DBRWH with a knowledge-graph-enhanced retrieval-augmented generation (KG-RAG) module that retrieves mechanistically relevant subgraphs and uses them to generate biologically grounded hypotheses for predicted synergies. On breast cancer data, DBRWH achieves an AUROC/AUPRC of 0.9593/0.9453 under 5-fold cross-validation, and on lung cancer data it achieves 0.9262/0.9481, outperforming strong deep learning and hypergraph baselines. By linking predictive performance with mechanistic interpretability, HyperSynergyX provides a robust and transparent tool to accelerate multi-drug discovery and support rational regimen design in precision oncology. The code is available at: https://github.com/wangqi27/HyperSynergyX.

AAAI Conference 2026 Conference Paper

PIMRL: Physics-Informed Multi-Scale Recurrent Learning for Burst-Sampled Spatiotemporal Dynamics

  • Han Wan
  • Qi Wang
  • Yuan Mi
  • Rui Zhang
  • Hao Sun

Deep learning has shown strong potential in modeling complex spatiotemporal dynamics. However, most existing methods depend on densely and uniformly sampled data, which is often unavailable in practice due to sensor and cost limitations. In many real-world settings, such as mobile sensing and physical experiments, data are burst-sampled, with short high-frequency segments followed by long gaps, making it difficult to learn accurate dynamics from sparse observations. To address this issue, we propose Physics-Informed Multi-Scale Recurrent Learning (PIMRL), a novel framework specifically designed for burst-sampled spatiotemporal data. PIMRL combines macro-scale latent dynamics inference with micro-scale adaptive refinement guided by incomplete prior information from partial differential equations (PDEs). It further introduces a temporal message-passing mechanism to effectively propagate information across burst intervals. This multi-scale architecture enables PIMRL to model complex systems accurately even under severe data scarcity. We evaluate our approach on five benchmark datasets involving 1D to 3D multi-scale PDEs. The results show that PIMRL consistently outperforms state-of-the-art baselines, reducing errors by up to 80% in the most challenging settings. Our work demonstrates the effectiveness of physics-informed recurrent learning for accurate and efficient modeling of sparse spatiotemporal systems.
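To make the burst-sampled regime concrete, the sketch below lists which time steps would be observed when short high-frequency bursts alternate with long gaps. It only illustrates the data setting described above, not PIMRL itself; the function name and the `burst_len`/`gap` parameters are hypothetical.

```python
def burst_sample_indices(total_steps, burst_len, gap):
    """Indices of a burst-sampled series: `burst_len` consecutive observed steps,
    then `gap` unobserved steps, repeated until `total_steps` is reached."""
    if burst_len < 1 or gap < 0:
        raise ValueError("burst_len must be >= 1 and gap must be >= 0")
    indices = []
    start = 0
    while start < total_steps:
        indices.extend(range(start, min(start + burst_len, total_steps)))
        start += burst_len + gap
    return indices


print(burst_sample_indices(20, burst_len=3, gap=5))
# [0, 1, 2, 8, 9, 10, 16, 17, 18]
```

Here only 9 of 20 steps are observed, and a model must bridge each 5-step gap; this is the role the abstract assigns to temporal message passing across burst intervals.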

EAAI Journal 2026 Journal Article

Predicting dielectric properties of polyetherimide-based composite via combined molecular dynamics simulation and machine learning

  • Yue Zhang
  • Zheng Gong
  • Changhai Zhang
  • Yongquan Zhang
  • Chao Yin
  • Xubin Wang
  • Tiandong Zhang
  • Xiajie Yi

The design of high-performance polymer dielectrics for capacitor energy storage is crucial but often hindered by time-consuming, resource-intensive development cycles. Polyetherimide is a promising matrix material, yet its performance is limited by a low dielectric constant and breakdown strength. To accelerate the design process, we propose and validate an integrated computational framework combining molecular dynamics simulations with interpretable machine learning. In terms of the Artificial Intelligence contribution, a weighted ensemble model was developed from a dual database of molecular dynamics parameters and molecular descriptors to predict the dielectric properties of Polyetherimide-based composites. The model was then deconstructed using the SHapley Additive exPlanations framework, which unveiled a multi-scale design hierarchy. This analysis revealed that filler weight fraction and intrinsic dielectric constant are the dominant predictors, followed by interfacial compatibility and molecular polarity. Regarding the engineering application, to validate our computational approach, model-selected Benzil and Acetophenone were fabricated into composite films. Experimental results confirmed the model's high accuracy, identifying optimal contents of 15 wt% for Benzil and 10 wt% for Acetophenone. Notably, the Polyetherimide-based composite with 10 wt% Acetophenone achieved an excellent discharge energy density of 10.3 J/cm³, representing a 58% enhancement over pristine Polyetherimide. Ultimately, this study not only developed a promising material but also established a reliable data-driven methodology providing clear guidance for designing next-generation polymer dielectrics.
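A weighted ensemble combines several regressors' predictions into one weighted average. The sketch below shows that arithmetic only; the abstract does not say how the weights were chosen, so the inverse-validation-error weighting offered here is an assumption, and the model outputs are hypothetical.

```python
def inverse_error_weights(val_errors):
    """One possible weighting (an assumption): weight each model by 1 / its validation error."""
    return [1.0 / e for e in val_errors]


def weighted_ensemble(predictions, weights):
    """Weighted average of per-model prediction lists; weights are normalized to sum to 1."""
    total = sum(weights)
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total for i in range(n)]


preds = [[3.0, 4.0], [5.0, 6.0]]             # two hypothetical models, two samples
print(weighted_ensemble(preds, [1.0, 3.0]))   # [4.5, 5.5]
print(inverse_error_weights([2.0, 1.0]))      # [0.5, 1.0]
```

Normalizing by the weight sum keeps the ensemble on the same scale as its members, so any non-negative weighting scheme can be plugged in.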

AAAI Conference 2026 Conference Paper

Reasoning via Implicit Self-supervised Emergence for Instruction Segmentation

  • Qing Zhou
  • Lichang Yang
  • Yuyu Jia
  • Junyu Gao
  • Weiping Ni
  • Junzheng Wu
  • Qi Wang

We challenge the assumption that complex instruction-guided segmentation tasks necessitate equally complex and explicit supervision. This paper introduces RISE (Reasoning via Implicit Self-supervised Emergence), a framework that learns intricate compositional reasoning, spanning spatial relations to world knowledge, without a single ground-truth mask. To achieve this, RISE employs reinforcement learning with GRPO guided by a single, strikingly simple reward: the semantic alignment score between the textual instruction and the predicted image region. Our primary discovery is the implicit emergence of a high-quality chain-of-thought process from this minimalist signal. Within a structured format, the model autonomously learns to understand instructions by accessing its latent knowledge, inferring spatial relationships—capabilities inherent in its architecture but unlocked by our simple objective. Remarkably, our emergent reasoning yields highly competitive results: RISE achieves 58.7 gIoU on the ReasonSeg benchmark, on par with methods using geometric rewards. Furthermore, we show extreme data efficiency: a variant trained on only 2,000 ImageNet-label pairs establishes a new state-of-the-art for annotation-free referring segmentation with 79.6 cIoU on RefCOCO.

EAAI Journal 2026 Journal Article

Research on cable terminal interface defect state detection based on electric field characteristics and multi-core improved support vector machine

  • Yujing Tang
  • Yang Fu
  • Qin Cai
  • Jieping Wu
  • Qi Wang
  • Guoqiang Gao

As key equipment for high-speed rail power transmission and the connection of high-voltage systems, cable terminals are crucial to ensuring the stable operation of the railway system. However, existing detection methods for cable terminals are easily affected by on-site noise and have low detection accuracy. Therefore, this paper proposes a method for detecting the interface defect status of high-speed cable terminals based on an electric field strength feature set and a multi-kernel support vector machine (MK-SVM). First, a spatial electric field detection platform was built to extract the electric field intensity of prefabricated defective cable terminals of different lengths. Second, the characteristic parameters of the electric field strength of defective cable terminals were optimized using the Pearson coefficient method. To improve recognition performance and model generalization, an MK-SVM combining a linear kernel function and a radial basis kernel function was proposed. Finally, a comparative study was conducted on the optimization effects of the particle swarm algorithm, firefly algorithm, simulated annealing algorithm, and genetic algorithm on the MK-SVM. The results show that using the genetic algorithm for parameter optimization of the MK-SVM yields the best performance, with recognition accuracy, average precision, average recall, and average F1 score of 95.6%, 96%, 95.6%, and 0.96, respectively. Compared with the unoptimized SVM, these four metrics increased by 8.9%, 7.9%, 8.9%, and 9.6%, respectively.
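The combined kernel behind an MK-SVM can be written as a convex mix of a linear and an RBF kernel. The sketch below assumes that form (weight `mix` on the linear term, `1 - mix` on the RBF term); the paper's exact combination rule and parameter names are not given in the abstract, so `mix` and `gamma` are hypothetical.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def multi_kernel(x, y, mix=0.5, gamma=1.0):
    """Convex combination mix * linear + (1 - mix) * RBF, with 0 <= mix <= 1."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * linear_kernel(x, y) + (1.0 - mix) * rbf_kernel(x, y, gamma)


def gram_matrix(samples, mix=0.5, gamma=1.0):
    """Kernel (Gram) matrix of the combined kernel over a list of feature vectors."""
    return [[multi_kernel(a, b, mix, gamma) for b in samples] for a in samples]


X = [[1.0, 0.0], [0.0, 1.0]]   # two hypothetical electric-field feature vectors
print(gram_matrix(X))          # diagonal 1.0, off-diagonal 0.5 * exp(-2), about 0.0677
```

A non-negative combination of valid kernels is itself a valid kernel, so the mix stays usable in an SVM. Parameters such as `mix`, `gamma`, and the SVM's `C` are the kind of values a search method like the genetic algorithm would tune. scikit-learn's `SVC` also accepts a callable `kernel` that returns the Gram matrix between two sample arrays, which is one way to plug such a kernel into a real classifier.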

AAAI Conference 2026 Conference Paper

Slender3D: Curve-Guided Multi-View Reconstruction of Slender Structures

  • Suqin Wang
  • Zeyi Wang
  • Min Shi
  • Zhaoxin Li
  • Qi Wang
  • Xiujuan Chai
  • Dengming Zhu

Although geometric reconstruction of general objects from images has made remarkable progress in recent years, slender structures remain largely underexplored, despite their critical importance in engineering, biomedical, and agricultural applications. To bridge this gap, we propose a dedicated 2DGS-based geometric reconstruction framework tailored for slender structures, achieving accurate and faithful geometry recovery. Our method first addresses the challenge that most slender objects are texture-less, which hinders reliable feature matching and pose estimation in traditional SfM pipelines. By leveraging the curve-like nature of slender structures, we perform a curve-guided SfM process that provides robust camera poses and accurate 3D curve initialization for Gaussian primitives. To ensure SfM reliability, we introduce a high-precision mask extraction strategy that integrates geometric priors with a segmentation network, effectively handling self-occlusion and thin geometry. Furthermore, to enhance fine geometric recovery, we incorporate a differentiable Poisson reconstruction module to extract an initial mesh during training, which is then refined via image-space iterative optimization using differentiable mesh rasterization. In contrast to conventional approaches that rely on differentiable Gaussian rasterization followed by TSDF-based mesh extraction, our method avoids the additional geometric errors and artifacts introduced during the intermediate TSDF conversion, thereby improving the overall reconstruction quality. Comprehensive experiments on both synthetic and real-world datasets validate that our method achieves superior reconstruction quality compared to state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts

  • Qi Wang
  • Hanyang Peng
  • Yue Yu

Mixture-of-Experts (MoE) models enable scalable performance by activating large parameter sets sparsely, minimizing computational overhead. To mitigate the prohibitive cost of training MoEs from scratch, recent work employs upcycling, reusing a single pre-trained dense model by replicating its feed-forward network (FFN) layers into experts. However, this limits expert diversity, as all experts originate from a single pre-trained dense model. This paper addresses this limitation by constructing powerful MoE models using experts sourced from multiple identically-architected but disparate pre-trained models (e.g., Qwen2.5-Coder and Qwen2). A key challenge lies in the fact that these source models occupy disparate, dissonant regions of the parameter space, making direct upcycling prone to severe performance degradation. To overcome this, we propose Symphony-MoE, a novel two-stage framework designed to harmonize these models into a single, coherent expert mixture. First, we establish this harmony in a training-free manner: we construct a shared backbone via a layer-aware fusion strategy and, crucially, alleviate parameter misalignment among experts using activation-based functional alignment. Subsequently, a stage of post-training coordinates the entire architecture. Experiments demonstrate that our method successfully integrates experts from heterogeneous sources, achieving an MoE model that significantly surpasses baselines in multi-domain tasks and out-of-distribution generalization.

AAAI Conference 2026 Conference Paper

TAPO: Dynamic Teacher and Perturbed Answer Injection for Policy Optimization

  • Maowei Jiang
  • Zihang Wang
  • Qi Wang
  • Peter Búš
  • Moquan Cheng
  • Yifan Wang
  • Quangao Liu
  • Ruiqi Li

Reinforcement learning (RL) has emerged as a powerful framework to improve the reasoning performance of large language models (LLMs), with approaches such as Group Relative Policy Optimization (GRPO) showing promising results. However, GRPO and its variants struggle with collapsed groups (i.e., all-correct or all-incorrect completions), which yield zero-variance rewards and ineffective gradient signals. Moreover, focusing solely on final-answer correctness while ignoring the reasoning process, along with rigid length penalties, can hinder training stability and output quality. To address these issues, we introduce TAPO, a reinforcement learning framework that enhances optimization signals by modifying sampled completions within training groups. TAPO incorporates three core techniques: (1) Dynamic Teacher Injection (DTI), which selectively injects high-quality or adversarial examples to restore effective gradient signals in collapsed groups; (2) Perturbed Answer Injection (PAI), which constructs partially correct completions (sound reasoning with a wrong final answer) to provide contrastive supervision that separates reasoning correctness from answer correctness; and (3) InfoLen-Aware Reward Shaping, a fine-grained reward strategy that penalizes outputs based on both length and semantic redundancy, encouraging concise yet informative responses. Extensive experimental results demonstrate that TAPO significantly improves the mathematical reasoning capabilities of LLMs across multiple challenging benchmarks, outperforming the GRPO baseline by a substantial margin. Component-wise ablations further validate the contribution of each proposed technique.
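Why collapsed groups stall GRPO follows from its group-relative advantage, (r − mean) / std within each group: if every completion gets the same reward, every advantage is zero. The sketch shows that, plus a deliberately simplified, hypothetical stand-in for teacher injection that swaps one completion's reward. It is not TAPO's actual DTI procedure, and implementations differ on population versus sample standard deviation (population is assumed here).

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def inject_if_collapsed(rewards, injected_reward):
    """Hypothetical simplification of teacher injection: if every completion in the
    group got the same reward, replace the last one with an injected completion's reward."""
    if len(set(rewards)) == 1 and injected_reward != rewards[0]:
        return rewards[:-1] + [injected_reward]
    return list(rewards)


collapsed = [0.0, 0.0, 0.0, 0.0]                 # all-incorrect group
print(group_advantages(collapsed))               # [0.0, 0.0, 0.0, 0.0]: no gradient signal
repaired = inject_if_collapsed(collapsed, 1.0)   # [0.0, 0.0, 0.0, 1.0]
print(group_advantages(repaired))                # non-zero advantages again
```

After injection, the injected completion receives a positive advantage and the others a negative one, so the policy gradient for that group is no longer zero.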

AAAI Conference 2026 Conference Paper

Target-Balanced Score Distillation

  • Zhou Xu
  • Qi Wang
  • Yuxiao Yang
  • Luyuan Zhang
  • Zhang Liang
  • Yang Li

Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate this issue, recent variants have incorporated negative prompts. However, these methods face a critical trade-off: either limited texture optimization, or significant texture gains accompanied by shape distortion. In this work, we first conduct a systematic analysis and reveal that this trade-off is fundamentally governed by how negative prompts are used: Target Negative Prompts (TNP), which embed target information in the negative prompts, dramatically enhance texture realism and fidelity but induce shape distortions. Informed by this key insight, we introduce Target-Balanced Score Distillation (TBSD). It formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that effectively resolves the aforementioned trade-off. Extensive experiments demonstrate that TBSD significantly outperforms existing state-of-the-art methods, yielding 3D assets with high-fidelity textures and geometrically accurate shapes.

NeurIPS Conference 2025 Conference Paper

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

  • Yixiu Mao
  • Yun Qu
  • Qi Wang
  • Xiangyang Ji

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.
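The neighborhood constraint restricts the Bellman-target action to the union of balls around dataset actions, with a radius per data point. The sketch below checks that constraint and picks the highest-Q feasible action from a fixed candidate list. This is a simplification: ANQ solves the inner maximization with bilevel optimization rather than a discrete search, and the radii, candidates, and toy Q-function here are hypothetical.

```python
import math


def in_union_of_neighborhoods(action, data_actions, radii):
    """True if `action` lies within radius r_i of at least one dataset action a_i."""
    return any(math.dist(action, a) <= r for a, r in zip(data_actions, radii))


def constrained_target_action(candidates, data_actions, radii, q_value):
    """Highest-Q candidate that satisfies the neighborhood constraint;
    falls back to the best dataset action if no candidate is feasible."""
    feasible = [c for c in candidates if in_union_of_neighborhoods(c, data_actions, radii)]
    return max(feasible or data_actions, key=q_value)


def toy_q(action):
    """Hypothetical Q-function that simply prefers larger actions."""
    return action[0]


data_actions = [(0.0,), (1.0,)]
radii = [0.1, 0.3]                                # hypothetical per-sample radii
candidates = [(0.05,), (0.5,), (1.2,), (2.0,)]
print(constrained_target_action(candidates, data_actions, radii, toy_q))  # (1.2,)
```

Without the constraint the toy Q-function would pick the out-of-distribution action `(2.0,)`; with it, the target stays near the data, and a larger radius around higher-quality samples loosens the constraint only where the data justify it.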

AAAI Conference 2025 Conference Paper

CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification

  • Jiyang Xu
  • Qi Wang
  • Xin Xiong
  • Di Gai
  • Ruihua Zhou
  • Dong Wang

With the emergence of vision-language pre-trained models such as CLIP, textual prompts have recently been introduced into re-identification (Re-ID) tasks to obtain more robust multimodal information. However, most textual descriptions in vehicle Re-ID tasks contain only identity index words, without specific words describing the vehicle's view, which makes them difficult to apply widely in vehicle Re-ID tasks with view variations. This motivates us to propose a CLIP-driven view-aware prompt learning framework for unsupervised vehicle Re-ID. We first design a learnable textual prompt template called view-aware context optimization (ViewCoOp) based on dynamic multi-view word embeddings, which can fully capture the proportion and position encoding of each view in the whole vehicle body region. Subsequently, a cross-modal mutual graph is constructed to explore inter-modal and intra-modal connections. Each sample is treated as a graph node, represented by textual features extracted with ViewCoOp and the visual features of its image. Moreover, the connectivity between graph node pairs is determined by leveraging the inter-cluster and intra-cluster correlations in the bimodal clustering results. Lastly, the proposed cross-modal mutual graph method uses supervised information from the bimodal gap to directly fine-tune the image encoder of CLIP for downstream unsupervised vehicle Re-ID tasks. Extensive experiments verify that the proposed method effectively obtains cross-modal description ability from multiple views.

EAAI Journal 2025 Journal Article

Development of intelligent equipment for weed identification and variable spraying in lettuce fields based on instance segmentation framework

  • Long-Tao Niu
  • Wen-Hao Su
  • He-Yi Zhang
  • Qi Wang
  • Bo-Wen Dong
  • Yankun Peng

Weeds in the field compete with crops for nutrients, water and sunlight, hindering the early growth of crops. If not controlled in time, weeds may adversely affect crop growth and yield. Although chemical weed control is low cost, efficient and widely applicable, excessive use of chemical agents may lead to herbicide residues and environmental pollution. In this study, intelligent equipment based on instance segmentation was developed for weed recognition and targeted variable-rate spraying in lettuce fields. The You-Only-Look-Once version 8 segmentation (YOLOv8-seg) model was optimized through three key enhancements. First, Depthwise Separable Convolution (DSConv) was adopted to replace standard convolutional layers, reducing model complexity and improving computational efficiency. Second, a novel Faster Implementation of Cross Stage Partial Bottleneck with 2 Convolutions-Star shaped Convolutional (C2f_Star) module was proposed, which integrates the StarBlock from the Star-shaped Convolutional Neural Network (StarNet) into the existing structure, thereby enhancing the feature extraction capabilities of the model. Finally, the Simple Attention Module (SimAM), a parameter-free attention mechanism, was introduced to improve the model's attention to relevant features without increasing the number of parameters. These improvements led to the development of the YOLOv8n-seg model, which achieved a mean Average Precision (mAP) of 90.15% at 0.5 Intersection over Union (IoU), with 2,281,702 parameters and an inference speed of 15.7 ms per frame. Compared with the original model, the average precision and inference speed increased by 2.65% and 4.3%, respectively, while the number of parameters was reduced by 30%. By combining this model with post-processing algorithms, a precision variable spraying algorithm and equipment were developed.
Laboratory experiments at three different weed density levels demonstrated that the system achieved an average recognition accuracy of 95.2% and a target spraying success rate of 97.2% for weeds in lettuce fields. Herbicide dosage was reduced by 88.42%, 65.25%, and 37.30% at the three density levels, respectively. This research provides essential theoretical and technical support for the development of precision spraying and weeding robots.
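
The parameter savings from replacing a standard convolution with a depthwise separable one can be checked with simple arithmetic. The layer shape below is hypothetical and the counts ignore biases; the 30% reduction reported above refers to the whole model, not to a single layer.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k kernel per (input channel, output channel) pair; bias omitted.
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise step: one k x k kernel per input channel.
    # Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128  # hypothetical layer shape
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
print(std, dsc, round(dsc / std, 3))  # 73728 8768 0.119
```

For a 3 x 3 kernel the separable layer needs roughly 1/9 + 1/c_out of the original parameters, so the saving grows with the number of output channels.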

JBHI Journal 2025 Journal Article

Edge-Guided Multi-Scale Frequency Attention Network for Gastrointestinal Cancer Image Segmentation

  • Zhiwen Liao
  • Qi Wang
  • Xinyi Tang
  • Han Wang
  • Jun Hu
  • Pengxiang Su
  • Evangelos K. Markakis
  • Peng Luo

Image segmentation is a critical technology for improving the accuracy of clinical decisions and treatments in computer-aided diagnostic systems. However, the diverse morphology and fuzzy boundaries of gastrointestinal tumors pose substantial challenges for existing segmentation models, leading to inaccurate feature capture and suboptimal results. To solve these problems, we design an edge-guided multi-scale frequency attention network for the gastrointestinal tumor segmentation task, termed EGMFA-Net, which consists of a Kernel Adaptive Enhancement Module (KAEM) and a Frequency-domain Self-attention Module (FDSA). Specifically, KAEM adaptively adjusts the feature extraction kernel based on the morphology of different lesion regions, which enhances the recognition of regions with different morphologies via a progressive optimization strategy of feature expression. Furthermore, FDSA effectively aggregates multi-scale features in the frequency domain to achieve global receptive fields while preserving more high-frequency details, thereby enhancing adaptability to complex pathological contexts. Extensive experiments on eight medical image benchmark datasets, including SEED, Kvasir, ClinicDB, ColonDB, ETIS, BKAI, CVC-300, and Synapse, show that EGMFA-Net attains state-of-the-art performance over existing methods. Our implementation is available at https://github.com/med-segment/egmfa-net.

NeurIPS Conference 2025 Conference Paper

Gains: Fine-grained Federated Domain Adaptation in Open Set

  • Zhengyi Zhong
  • Wenzheng Jiang
  • Weidong Bao
  • Ji Wang
  • Qi Wang
  • Guanbo Wang
  • Yongheng Deng
  • Ju Ren

Conventional federated learning (FL) assumes a closed world with a fixed total number of clients. In real-world scenarios, by contrast, new clients continuously join the FL process, introducing new knowledge. This raises two critical demands: detecting new knowledge, i.e., knowledge discovery, and integrating it into the global model, i.e., knowledge adaptation. Existing research focuses on coarse-grained knowledge discovery and often sacrifices source-domain performance and adaptation efficiency. To this end, we propose a fine-grained federated domain adaptation approach in open set (Gains). Gains splits the model into an encoder and a classifier, empirically revealing that features extracted by the encoder are sensitive to domain shifts while classifier parameters are sensitive to class increments. Based on this, we develop fine-grained knowledge discovery and contribution-driven aggregation techniques to identify and incorporate new knowledge. Additionally, an anti-forgetting mechanism is designed to preserve source-domain performance, ensuring balanced adaptation. Experimental results on multi-domain datasets across three typical data-shift scenarios demonstrate that Gains significantly outperforms other baselines for both source-domain and target-domain clients. Code is available at: https://github.com/Zhong-Zhengyi/Gains.

AAAI Conference 2025 Conference Paper

GTDE: Grouped Training with Decentralized Execution for Multi-agent Actor-Critic

  • Mengxian Li
  • Qi Wang
  • Yongjun Xu

The rapid advancement of multi-agent reinforcement learning (MARL) has given rise to diverse training paradigms for learning the policy of each agent in a multi-agent system. The paradigms of decentralized training and execution (DTDE) and centralized training with decentralized execution (CTDE) have been proposed and widely applied. However, as the number of agents increases, the inherent limitations of these frameworks significantly degrade performance metrics such as win rate and total reward. To reduce the influence of the growing number of agents on these metrics, we propose a novel training paradigm, grouped training with decentralized execution (GTDE). This framework eliminates the need for a centralized module and relies solely on local information, effectively meeting the training requirements of large-scale multi-agent systems. Specifically, we first introduce an adaptive grouping module, which assigns agents to groups based on their observation histories. To enable end-to-end training, GTDE uses Gumbel-Sigmoid for efficient point-to-point sampling on the grouping distribution while still allowing gradients to backpropagate. To handle the uncertain number of members in a group, two methods are used to implement a group information aggregation module that merges member information within the group. Empirical results show that in a cooperative environment with 495 agents, GTDE increased the total reward by an average of 382% compared to the baseline. In a competitive environment with 64 agents, GTDE achieved a 100% win rate against the baseline.
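
To make the Gumbel-Sigmoid step concrete, the sketch below draws a relaxed binary sample per logit, where each logit could be read as a hypothetical score for "agent i belongs to group j". It uses only the standard library, so it shows forward sampling only; the gradient path GTDE relies on would come from an autodiff framework via the straight-through trick noted in the docstring. The names and temperature are illustrative assumptions, not GTDE's implementation.

```python
import math
import random
from typing import List, Optional, Tuple

_EPS = 1e-12

def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def _gumbel(rng: random.Random) -> float:
    # Standard Gumbel noise via inverse CDF: -log(-log(U)), with U kept away from 0 and 1.
    u = min(max(rng.random(), _EPS), 1.0 - _EPS)
    return -math.log(-math.log(u))

def gumbel_sigmoid(
    logits: List[float], tau: float = 1.0, rng: Optional[random.Random] = None
) -> Tuple[List[float], List[int]]:
    """Relaxed Bernoulli sample per logit.

    soft = sigmoid((logit + g1 - g2) / tau) with g1, g2 ~ Gumbel(0, 1);
    hard = 1 if soft > 0.5 else 0. In an autodiff framework one would use
    the straight-through form hard + soft - stop_gradient(soft).
    """
    rng = rng or random.Random(0)
    soft = [_sigmoid((logit + _gumbel(rng) - _gumbel(rng)) / tau) for logit in logits]
    hard = [1 if s > 0.5 else 0 for s in soft]
    return soft, hard

# Hypothetical membership logits for one agent over three groups.
soft, hard = gumbel_sigmoid([2.0, -1.0, 0.5], tau=0.5)
print(hard)
```

A lower `tau` pushes `soft` toward 0 or 1 (closer to a hard assignment), while a higher `tau` gives smoother but more biased gradients.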

NeurIPS Conference 2025 Conference Paper

H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting

  • Bing He
  • Yunuo Chen
  • Guo Lu
  • Qi Wang
  • Qunshan Gu
  • Rong Xie
  • Li Song
  • Wenjun Zhang

Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points, and for compactness, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods predominantly rely on gradient-based optimization for all motion information. Due to the high degree of freedom, they struggle to converge on real-world datasets exhibiting complex motion. To preserve the compactness of motion representation and address convergence challenges, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained using a hybrid strategy combining optical flow back-projection and gradient-based methods. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, components of 3D motion that project onto the image plane are directly acquired via optical flow back-projection, while unobservable portions are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques. Remarkably, our method converges within just 100 iterations and achieves a per-frame processing speed of 2 seconds on a single NVIDIA RTX 4070 GPU.

IJCAI Conference 2025 Conference Paper

Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly

  • Ruiyuan Zhang
  • Qi Wang
  • Jiaxiang Liu
  • Yuchi Huo
  • Chao Wu

3D part assembly aims to understand part relationships and predict the parts' 6-DoF poses to construct realistic 3D shapes, addressing the growing demand for autonomous assembly, which is crucial for robots. Existing methods mainly estimate the transformation of each part by training neural networks under supervision, which requires a substantial quantity of manually labeled data. However, the high cost of data collection and the immense variability of real-world shapes and parts make traditional methods impractical for large-scale applications. In this paper, we propose the first zero-shot part assembly method, which utilizes pre-trained point cloud diffusion models as discriminators in the assembly process, guiding the manipulation of parts to form realistic shapes. Specifically, we theoretically demonstrate that utilizing a diffusion model for zero-shot part assembly can be transformed into an Iterative Closest Point (ICP) process. Then, we propose a novel pushing-away strategy to address overlapping parts, further enhancing the robustness of the method. To verify our work, we conduct extensive experiments and quantitative comparisons with several strong baseline methods, demonstrating the effectiveness of the proposed approach, which even surpasses the supervised learning method. The code has been released at https://github.com/Ruiyuan-Zhang/Zero-Shot-Assembly.

IJCAI Conference 2025 Conference Paper

PeSANet: Physics-encoded Spectral Attention Network for Simulating PDE-Governed Complex Systems

  • Han Wan
  • Rui Zhang
  • Qi Wang
  • Yang Liu
  • Hao Sun

Accurately modeling and forecasting complex systems governed by partial differential equations (PDEs) is crucial in various scientific and engineering domains. However, traditional numerical methods struggle in real-world scenarios due to incomplete or unknown physical laws. Meanwhile, machine learning approaches often fail to generalize effectively when faced with scarce observational data and the challenge of capturing local and global features. To this end, we propose the Physics-encoded Spectral Attention Network (PeSANet), which integrates local and global information to forecast complex systems with limited data and incomplete physical priors. The model consists of two key components: a physics-encoded block that uses hard constraints to approximate local differential operators from limited data, and a spectral-enhanced block that captures long-range global dependencies in the frequency domain. Specifically, we introduce a novel spectral attention mechanism to model inter-spectrum relationships and learn long-range spatial features. Experimental results demonstrate that PeSANet outperforms existing methods across all metrics, particularly in long-term forecasting accuracy, providing a promising solution for simulating complex systems with limited data and incomplete physics.

EAAI Journal 2025 Journal Article

Quantization-based deep diversified ensemble for medical image segmentation

  • Jiawei Zhang
  • Jialin Wang
  • Qi Wang
  • Yanchun Zhang
  • Weihong Han
  • Yangyang Mei
  • Yiyu Shi
  • Jian Zhuang

Recent advancements in fully convolutional networks (FCNs) have significantly improved medical image segmentation. Ensemble methods are often used to further enhance performance, with diversity among learners being a critical factor. However, many current approaches focus on diversifying training samples or predictions while overlooking the diversity of internal multi-scale features. This oversight can lead to high correlations among features across different learners, limiting overall effectiveness. Additionally, traditional quantization methods aim to minimize accuracy loss by maintaining a rigid quantization process. This rigidity can eliminate the randomness introduced by quantization, further reducing ensemble diversity and effectiveness. In this paper, we propose a novel approach called Quantization-based Deep Diversified Ensemble (QDD-Ens) for medical image segmentation. Our method enhances the diversity of internal features among ensemble learners through two mechanisms: a deep diversified loss, which focuses on feature diversity rather than segmentation accuracy, and deep diversified quantization, which preserves beneficial randomness in the quantization process. Furthermore, QDD-Ens facilitates a deeper form of ensemble learning by employing a meta-learner to integrate diversified features at multiple resolution levels from various base learners, which are diversified by the two mechanisms above. Extensive experiments on five public medical image segmentation datasets show that our method significantly improves segmentation accuracy and outperforms existing ensemble techniques. The source code is publicly available to support future research (https://github.com/JerRuy/QDD-Ens).

JBHI Journal 2025 Journal Article

Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction

  • Hao Zhang
  • Qi Wang
  • Jian Sun
  • Zhijie Wen
  • Jun Shi
  • Shihui Ying

Magnetic Resonance Imaging (MRI) is widely used in clinical practice but suffers from prolonged acquisition time. Although deep learning methods have been proposed to accelerate acquisition and demonstrate promising performance, they rely on high-quality fully-sampled datasets for supervised training. However, such datasets are time-consuming and expensive to collect, which constrains their broader application. On the other hand, self-supervised methods offer an alternative by enabling learning from under-sampled data alone, but most existing methods further partition the under-sampled k-space data and use a subset as the model's input during training, which causes an input distribution shift between the training stage and the inference stage. Additionally, their models do not effectively incorporate comprehensive image priors, leading to degraded reconstruction performance. In this paper, we propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues when only under-sampled datasets are available. Specifically, by incorporating a re-visible dual-domain loss, all under-sampled k-space data are utilized during training to mitigate the input distribution shift caused by further partitioning. This design enables the model to implicitly adapt to all under-sampled k-space data as input. Additionally, we design a Deep Unfolding Network based on the Chambolle and Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction. By employing a Spatial-Frequency Feature Extraction (SFFE) block to capture both global and local representations, the model effectively integrates imaging physics with comprehensive image priors to enhance reconstruction performance. Experiments on both single-coil and multi-coil datasets demonstrate that our method outperforms state-of-the-art approaches in terms of reconstruction performance and generalization capability.

NeurIPS Conference 2025 Conference Paper

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility

  • Haoyu He
  • Haozheng Luo
  • Yan Chen
  • Qi Wang

Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. Methodologically, RHYTHM employs temporal tokenization to partition each trajectory into daily segments and encode them as discrete tokens with hierarchical attention that captures both daily and weekly dependencies, thereby quadratically reducing the sequence length while preserving cyclical information. Additionally, we enrich token representations by adding pre-computed prompt embeddings for trajectory segments and prediction targets via a frozen LLM, and feeding these combined embeddings back into the LLM backbone to capture complex interdependencies. Computationally, RHYTHM keeps the pretrained LLM backbone frozen, yielding faster training and lower memory usage. We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. Code is publicly available at https://github.com/he-h/rhythm.
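
The effect of daily temporal tokenization on sequence length can be sketched as follows. The 30-minute sampling resolution is a hypothetical assumption, and the sketch only splits the trajectory and counts attention pairs; RHYTHM additionally encodes each segment into a token with hierarchical attention and adds frozen-LLM prompt embeddings, which are not shown.

```python
from typing import List, Sequence

def daily_segments(traj: Sequence[int], steps_per_day: int) -> List[List[int]]:
    """Split a trajectory of location IDs into consecutive daily segments."""
    if steps_per_day <= 0:
        raise ValueError("steps_per_day must be positive")
    return [list(traj[i:i + steps_per_day]) for i in range(0, len(traj), steps_per_day)]

def attention_pairs(seq_len: int) -> int:
    # Full self-attention scores every (query, key) pair.
    return seq_len * seq_len

steps_per_day = 48                      # hypothetical: one location every 30 minutes
week = list(range(7 * steps_per_day))   # 336 placeholder location IDs
segments = daily_segments(week, steps_per_day)
print(len(week), len(segments))                                    # 336 7
print(attention_pairs(len(week)), attention_pairs(len(segments)))  # 112896 49
```

Shrinking the sequence by a factor of `steps_per_day` shrinks the number of full-attention pairs by that factor squared, which is where the quadratic saving comes from.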

NeurIPS Conference 2025 Conference Paper

Selective Learning for Deep Time Series Forecasting

  • Yisong Fu
  • Zezhi Shao
  • Chengqing Yu
  • Yujie Li
  • Zhulin An
  • Qi Wang
  • Yongjun Xu
  • Fei Wang

Benefiting from its high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss and learns uncertain and anomalous timesteps without distinction, ultimately resulting in overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of all timesteps to calculate the MSE loss during optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance of typical state-of-the-art deep models, including a 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
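
The core idea of computing the MSE loss only on selected timesteps can be sketched with boolean masks. The thresholding rule in `threshold_mask` is a hypothetical stand-in: the paper's uncertainty mask uses residual entropy and its anomaly mask uses residual lower bound estimation, neither of which is reproduced here.

```python
from typing import List, Sequence

def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over the timesteps where mask is True."""
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    kept = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not kept:
        raise ValueError("mask excludes every timestep")
    return sum(kept) / len(kept)

def threshold_mask(pred: Sequence[float], target: Sequence[float],
                   max_abs_residual: float) -> List[bool]:
    # Hypothetical stand-in mask: keep timesteps whose residual is small.
    return [abs(p - t) <= max_abs_residual for p, t in zip(pred, target)]

pred = [1.0, 2.0, 3.0, 4.0]
target = [1.0, 2.0, 3.0, 14.0]                # last timestep behaves like an anomaly
uncertainty_mask = [True, True, True, True]   # placeholder: keep every timestep
anomaly_mask = threshold_mask(pred, target, max_abs_residual=5.0)
# Dual mask: a timestep contributes only if both masks keep it.
mask = [u and a for u, a in zip(uncertainty_mask, anomaly_mask)]
print(masked_mse(pred, target, [True] * 4), masked_mse(pred, target, mask))  # 25.0 0.0
```

In a training loop the same mask would multiply the per-timestep loss tensor before averaging, so excluded timesteps contribute no gradient.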

NeurIPS Conference 2025 Conference Paper

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

  • Qi Wang
  • Yanrui Yu
  • Ye Yuan
  • Rui Mao
  • Tianfei Zhou

Reinforcement fine-tuning (RFT) has shown great promise in achieving human-level reasoning capabilities in Large Language Models (LLMs), and has recently been extended to MLLMs. Nevertheless, reasoning about videos, which is a fundamental aspect of human intelligence, remains a persistent challenge due to the complex logic, temporal and causal structures inherent in video data. To fill this gap, we propose VideoRFT, a novel approach that extends the RFT paradigm to cultivate human-like video reasoning capabilities in MLLMs. VideoRFT follows the standard two-stage scheme in RFT: supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations, followed by reinforcement learning (RL) to improve generalization. A central challenge in achieving this in the video domain lies in the scarcity of large-scale, high-quality video CoT datasets. We address this by building a multi-expert-driven, cognition-inspired CoT curation pipeline. First, we devise a cognition-inspired prompting strategy to elicit a reasoning LLM to generate preliminary CoTs based solely on rich, structured, and literal representations of video content. Subsequently, these CoTs are revised by an MLLM conditioned on the actual video, ensuring visual consistency and reducing visual hallucinations. This pipeline results in two new datasets, i.e., VideoRFT-CoT-102K for SFT and VideoRFT-RL-310K for RL. To further strengthen the RL phase, we introduce a novel semantic-consistency reward that explicitly promotes the alignment between textual reasoning and visual evidence. This reward encourages the model to produce coherent, context-aware reasoning outputs grounded in visual input. Extensive experiments show that VideoRFT achieves state-of-the-art performance on six video reasoning benchmarks.

AAAI Conference 2024 Conference Paper

Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

  • Qi Rao
  • Ke Sun
  • Xiaohan Wang
  • Qi Wang
  • Bang Zhang

Continuous sign language recognition (CSLR) aims to recognize gloss sequences from continuous sign videos. Recent works enhance the gloss representation consistency by mining correlations between visual and contextual modules within individual sentences. However, much richer correlations remain among glosses across different sentences. In this paper, we present a simple yet effective Cross-Sentence Gloss Consistency (CSGC), which enforces glosses belonging to the same category to be more consistent in representation than those belonging to different categories, across all training sentences. Specifically, in CSGC, a prototype is maintained for each gloss category and benefits gloss discrimination in a contrastive way. Thanks to the well-distinguished gloss prototypes, an auxiliary similarity classifier is devised to enhance the recognition clues, thus yielding more accurate results. Extensive experiments conducted on three CSLR datasets show that our proposed CSGC significantly boosts the performance of CSLR, surpassing existing state-of-the-art works by large margins (i.e., 1.6% on PHOENIX14, 2.4% on PHOENIX14-T, and 5.7% on CSL-Daily).
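
One common way to maintain a per-category prototype and use it as a similarity classifier is an exponential moving average (EMA) of features combined with cosine similarity; the sketch below shows that generic pattern. Whether CSGC updates its prototypes this way is an assumption, the class name `PrototypeBank` and the momentum value are hypothetical, and the contrastive loss that pulls features toward their own prototype is omitted.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

class PrototypeBank:
    """Hypothetical per-category prototypes updated by EMA."""

    def __init__(self, momentum: float = 0.9) -> None:
        self.momentum = momentum
        self.protos: Dict[int, List[float]] = {}

    def update(self, label: int, feat: Sequence[float]) -> None:
        if label not in self.protos:
            # First feature seen for this category initializes its prototype.
            self.protos[label] = list(feat)
        else:
            m = self.momentum
            self.protos[label] = [m * p + (1.0 - m) * f
                                  for p, f in zip(self.protos[label], feat)]

    def similarities(self, feat: Sequence[float]) -> Dict[int, float]:
        # Auxiliary similarity scores: cosine similarity to every prototype.
        return {c: cosine(feat, p) for c, p in self.protos.items()}

    def predict(self, feat: Sequence[float]) -> int:
        sims = self.similarities(feat)
        return max(sims, key=sims.get)

bank = PrototypeBank(momentum=0.9)
bank.update(0, [1.0, 0.0])        # gloss category 0
bank.update(1, [0.0, 1.0])        # gloss category 1
print(bank.predict([0.9, 0.1]))   # 0
```

Because prototypes aggregate features from every training sentence, they carry the cross-sentence information the abstract emphasizes, rather than only the context of a single sentence.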

NeurIPS Conference 2024 Conference Paper

Doubly Mild Generalization for Offline Reinforcement Learning

  • Yixiu Mao
  • Qi Wang
  • Yun Qu
  • Yuhang Jiang
  • Xiangyang Ji

Offline Reinforcement Learning (RL) suffers from the extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions. Significant efforts have been devoted to mitigating such generalization, and recent in-sample learning approaches have further succeeded in entirely eschewing it. Nevertheless, we show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions. To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation. The former refers to selecting actions in a close neighborhood of the dataset to maximize the Q values. Even so, the potential erroneous generalization can still be propagated, accumulated, and exacerbated by bootstrapping. In light of this, the latter concept is introduced to mitigate the generalization propagation without impeding the propagation of RL learning signals. Theoretically, DMG guarantees better performance than the in-sample optimal policy in the oracle generalization scenario. Even under worst-case generalization, DMG can still control value overestimation at a certain level and lower bound the performance. Empirically, DMG achieves state-of-the-art performance across Gym-MuJoCo locomotion tasks and challenging AntMaze tasks. Moreover, benefiting from its flexibility in both generalization aspects, DMG enjoys a seamless transition from offline to online learning and attains strong online fine-tuning performance.

IJCAI Conference 2024 Conference Paper

Error-aware Sampling in Adaptive Shells for Neural Surface Reconstruction

  • Qi Wang
  • Yuchi Huo
  • Qi Ye
  • Rui Wang
  • Hujun Bao

Neural implicit surfaces with signed distance functions (SDFs) achieve superior quality in 3D geometry reconstruction. However, training SDFs is time-consuming because it requires a large number of samples both to calculate accurate weight distributions and, once drawn from those distributions, to integrate the rendering results. Some existing sampling strategies address this problem. During training, however, they assume a spatially consistent convergence speed of the kernel size and thus still suffer from slow convergence or errors. Instead, we introduce an error-aware sampling method based on thin intervals of valid weight distributions, dubbed adaptive shells, to reduce the number of samples while still maintaining reconstruction accuracy. To this end, we first extend Laplace-based neural implicit surfaces with learned spatially-varying kernel sizes, which indicate the range of valid weight distributions. Then, the adaptive shell for each ray is determined by an efficient double-clipping strategy with spatially-varying SDF values and kernel sizes, fitting larger kernel sizes to wider shells. Finally, we calculate the error-bounded cumulative distribution functions (CDFs) of shells to conduct efficient importance sampling, achieving low-variance rendering with fewer calculations. Extensive results in various scenes demonstrate the superiority of our sampling technique, which significantly reduces sample counts and training time while even improving reconstruction quality. The code is available at https://github.com/erernan/ESampling.
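
The final step, importance sampling from a CDF along a ray, follows the standard inverse-transform pattern used in NeRF-style renderers. The sketch below draws depths from a piecewise-constant PDF over ray bins; the paper's error-bounded CDFs over adaptive shells are not reproduced, and the function name, bin edges, and weights are hypothetical.

```python
import bisect
import random
from typing import List, Optional, Sequence

def sample_from_weights(edges: Sequence[float], weights: Sequence[float], n: int,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Draw n depths by inverting the piecewise-constant PDF that puts
    weights[i] (normalized) uniformly on the bin [edges[i], edges[i + 1])."""
    if len(edges) != len(weights) + 1:
        raise ValueError("need exactly one more edge than weights")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform in [0, 1)
        # Bin i satisfies cdf[i] <= u < cdf[i + 1]; bisect_right skips zero-weight bins.
        i = min(max(bisect.bisect_right(cdf, u) - 1, 0), len(weights) - 1)
        width = cdf[i + 1] - cdf[i]
        frac = (u - cdf[i]) / width if width > 0 else 0.0
        frac = min(max(frac, 0.0), 1.0)  # guard against float round-off
        samples.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    return samples

edges = [0.0, 1.0, 2.0, 3.0]   # hypothetical bin boundaries along the ray
weights = [0.0, 1.0, 0.0]      # all weight concentrated in [1, 2)
depths = sample_from_weights(edges, weights, n=8)
print(all(1.0 <= d <= 2.0 for d in depths))  # True
```

Concentrating the weights in a thin interval, as adaptive shells do, means every sample lands where the surface can actually be, so fewer samples are needed for the same rendering variance.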

IJCAI Conference 2024 Conference Paper

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

  • Chenhui Wang
  • Tao Chen
  • Zhihao Chen
  • Zhizhong Huang
  • Taoran Jiang
  • Qi Wang
  • Hongming Shan

Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues, which are caused by the stochastic nature of diffusion and by latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

NeurIPS Conference 2024 Conference Paper

GO4Align: Group Optimization for Multi-Task Alignment

  • Jiayi Shen
  • Qi Wang
  • Zehao Xiao
  • Nanne van Noord
  • Marcel Worring

This paper proposes GO4Align, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, comprising two techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indicators, which exploit consistent task correlations with risk information from previous iterations. Comprehensive experimental results on diverse benchmarks demonstrate our method's performance superiority with even lower computational costs.

JBHI Journal 2024 Journal Article

Improving Needle Tip Tracking and Detection in Ultrasound-Based Navigation System Using Deep Learning-Enabled Approach

  • Hui Che
  • Jiaxin Qin
  • Yao Chen
  • Zihan Ji
  • Yibo Yan
  • Jing Yang
  • Qi Wang
  • Chaofeng Liang

Ultrasound-guided percutaneous interventions have numerous advantages over traditional techniques. Accurate needle placement in the target anatomy is crucial for successful intervention, and reliable visual information is essential to achieve this. However, previous studies have revealed several challenges, such as the variability in needle echogenicity and the common misalignment of the ultrasound beam and the needle. Advanced techniques have been developed to optimize needle visualization, including hardware-based and image-processing-based methods. This paper proposes a novel strategy of integrating ultrasound-based deep learning approaches into an optical navigation system to enhance needle visualization and improve tip positioning accuracy. Both the tracking and detection algorithms are optimized utilizing optical tracking information. The information is introduced into the tracking network to define the search patch update strategy and form a trajectory reference to correct tracking results. In the detection network, the original image is processed according to the needle insertion position and current position given by the optical localization system to locate a coarse region, and the depth-score criterion is adopted to optimize detection results. Extensive experiments demonstrate that our approach achieves promising tip tracking and detection performance with tip localization errors of 1.11 ± 0.59 mm and 1.17 ± 0.70 mm, respectively. Moreover, we establish a paired dataset consisting of ultrasound images and their corresponding spatial tip coordinates acquired from the optical tracking system and conduct real puncture experiments to verify the effectiveness of the proposed methods. Our approach significantly improves needle visualization and provides physicians with visual guidance for posture adjustment.

TMLR Journal 2024 Journal Article

Large Language Models can be Guided to Evade AI-generated Text Detection

  • Ning Lu
  • Shengcai Liu
  • Rui He
  • Yew-Soon Ong
  • Qi Wang
  • Ke Tang

Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public. However, the increasing concerns regarding the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be universally used against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that the SICO-generated text achieves human-level readability and task completion rates, while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against SICO attacks.

EAAI Journal 2024 Journal Article

Machine learning-driven feature importance appraisal of seismic parameters on tunnel damage and seismic fragility prediction

  • Qi Wang
  • Ping Geng
  • Liangjie Wang
  • Dingwei He
  • Huoming Shen

This study proposes a machine learning-driven approach for analyzing the importance of seismic parameters for tunnel damage and for predicting seismic fragility. The Incremental Dynamic Analysis (IDA) method provides the underlying database for the vulnerability analysis. Strength and deformation yield criteria are chosen to comprehensively assess the impact of different seismic parameters on the seismic vulnerability of tunnels. Three machine learning algorithms, namely Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM), are used to develop classification and regression models of tunnel damage under seismic conditions. After parameter tuning, the models' performance in multi-class classification, binary classification, and regression is assessed, with the XGBoost and RF models performing outstandingly. Feature importance analysis of the seismic parameters in the XGBoost and RF models for multi-class classification, binary classification, and regression is performed using Shapley additive explanations (SHAP). Correlation analysis between SHAP-based feature values and predictions reveals that Peak Ground Displacement (PGD) has the greatest influence in the regression model. Using the interaction dependencies among the crucial features in the regression model, fragility curves for tunnels based on these key features are derived. The predicted fragility curves closely match those derived from IDA, illustrating that machine learning saves time while performing well in nonlinear dynamic computations.
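Seismic fragility curves are commonly expressed as a lognormal cumulative distribution: the probability of reaching a damage state is $\Phi(\ln(\mathrm{IM}/\theta)/\beta)$ for an intensity measure IM (here PGD), median $\theta$, and dispersion $\beta$. The abstract derives its curves from machine learning predictions and does not state a functional form, so the lognormal form, the `fragility` helper, and the parameter values below are assumptions used only to illustrate what a fragility curve computes.

```python
import math

def fragility(im, median, beta):
    """Lognormal fragility curve: P(damage state reached | intensity measure = im).

    median: intensity at which the exceedance probability is 0.5.
    beta:   logarithmic standard deviation (dispersion) of the curve.
    """
    if median <= 0 or beta <= 0:
        raise ValueError("median and beta must be positive")
    if im <= 0:
        return 0.0
    z = math.log(im / median) / beta
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameters for PGD in metres; not values from the paper.
for pgd in (0.05, 0.1, 0.2, 0.4):
    print(f"PGD = {pgd:.2f} m -> P(exceed) = {fragility(pgd, median=0.2, beta=0.5):.3f}")
```

By construction the curve passes through 0.5 at the median and rises monotonically with the intensity measure.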

NeurIPS Conference 2024 Conference Paper

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

  • Qi Wang
  • Junming Yang
  • Yunbo Wang
  • Xin Jin
  • Wenjun Zeng
  • Xiaokang Yang

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the “test bed” for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.

EAAI Journal 2024 Journal Article

Meta-fourier neural operators for multi-task modeling of film cooling in gas turbine endwalls

  • Qi Wang
  • Jian Lou
  • Yang Li
  • Li Yang

Film cooling is a key technology for protecting gas turbine endwalls from thermal ablation. Precise local temperature control is important for film cooling design on turbine endwalls, which requires fast prediction of the two-dimensional cooling effectiveness. Supervised deep learning methods can meet this demand but still face challenges from scarce data and limited generalization: a prediction model trained for a specific endwall cannot be generalized to others at low cost. To overcome this bottleneck, this study proposed a meta-learning method for film cooling prediction that leverages historical data to reduce the modeling cost and improve generalization on a new film cooling prediction task. Four historical tasks and two new tasks for film cooling prediction were created by changing the endwall pressure gradients. The number of samples available for modeling was limited to fewer than 10 for each new task. A Fourier Neural Operator was adopted to regress the film cooling effectiveness on endwall surfaces. Results showed that the proposed method reduced the amount of data required by 80% and the prediction error by 55% on the new film cooling design tasks.
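The central operation of a Fourier Neural Operator is a spectral convolution: transform the input to Fourier space, keep a fixed number of low-frequency modes, multiply them by learned complex weights, and transform back. The pure-Python sketch below shows that single step for one channel, with fixed weights and a naive DFT in place of an FFT. It omits the pointwise linear path, nonlinearity, lifting/projection layers, and the paper's meta-learning procedure, and the helper names are hypothetical.

```python
import cmath
import math

def dft_modes(u, modes):
    """First `modes` DFT coefficients c_k = sum_n u[n] * exp(-2*pi*i*k*n/N)."""
    n_pts = len(u)
    return [
        sum(u[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(modes)
    ]

def spectral_conv_1d(u, weights, modes):
    """One Fourier-layer step on a real 1-D signal: keep `modes` low frequencies,
    scale each by a complex weight (fixed here, learned in an FNO), transform back."""
    n_pts = len(u)
    if len(weights) != modes or 2 * (modes - 1) >= n_pts:
        raise ValueError("need len(weights) == modes and modes - 1 < N / 2")
    coeffs = [c * w for c, w in zip(dft_modes(u, modes), weights)]
    out = []
    for n in range(n_pts):
        # DC term; its imaginary part is dropped to keep the output real.
        total = coeffs[0].real
        # Positive frequencies; their conjugate negative frequencies double the real part.
        for k in range(1, modes):
            total += 2.0 * (coeffs[k] * cmath.exp(2j * math.pi * k * n / n_pts)).real
        out.append(total / n_pts)
    return out

N = 8
low = [math.cos(2 * math.pi * n / N) for n in range(N)]       # frequency 1
high = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]  # frequency 3
print(spectral_conv_1d(low, [1.0, 1.0], modes=2))   # reproduces `low`
print(spectral_conv_1d(high, [1.0, 1.0], modes=2))  # near zero: mode 3 is truncated
```

Truncating to a few modes is what makes the layer resolution-independent and cheap; the learned weights then act as a global filter on those retained frequencies.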

IJCAI Conference 2024 Conference Paper

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

  • Zihao Wang
  • Shuyu Li
  • Tao Zhang
  • Qi Wang
  • Pengfei Yu
  • Jinyang Luo
  • Yan Liu
  • Ming Xi

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a large-scale, private dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of CaiMD for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music.

NeurIPS Conference 2024 Conference Paper

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

  • Yixiu Mao
  • Qi Wang
  • Chen Chen
  • Yun Qu
  • Xiangyang Ji

In offline reinforcement learning (RL), addressing the out-of-distribution (OOD) action issue has been a focus, but we argue that an OOD state issue also impairs performance and remains underexplored. This issue arises when the agent encounters states outside the offline dataset during the test phase, leading to uncontrolled behavior and performance degradation. To this end, we propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL. Technically, SCAS achieves value-aware OOD state correction, steering the agent from OOD states back to high-value in-distribution states. Theoretical and empirical results show that SCAS also has the effect of suppressing OOD actions. On standard offline RL benchmarks, SCAS achieves excellent performance without additional hyperparameter tuning. Moreover, benefiting from its OOD state correction feature, SCAS demonstrates enhanced robustness against environmental perturbations.

NeurIPS Conference 2024 Conference Paper

P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics

  • Qi Wang
  • Pu Ren
  • Hao Zhou
  • Xin-Yang Liu
  • Zhiwen Deng
  • Yi Zhang
  • Ruizhi Chengze
  • Hongsheng Liu

When solving partial differential equations (PDEs), classical numerical methods often require fine mesh grids and small time steps to meet stability, consistency, and convergence conditions, leading to high computational cost. Recently, machine learning has been increasingly used to solve PDE problems, but such methods often face challenges related to interpretability, generalizability, and strong dependence on rich labeled data. Hence, we introduce a new PDE-Preserved Coarse Correction Network (P$^2$C$^2$Net) to efficiently solve spatiotemporal PDE problems on coarse mesh grids in small-data regimes. The model consists of two synergistic modules: (1) a trainable PDE block that learns to update the coarse solution (i.e., the system state) based on a high-order numerical scheme with boundary condition encoding, and (2) a neural network block that consistently corrects the solution on the fly. In particular, we propose a learnable symmetric convolution filter, with weights shared over the entire model, to accurately estimate the spatial derivatives of the PDE based on the neural-corrected system state. The resulting physics-encoded model can handle limited training data (e.g., 3 to 5 trajectories) and accelerates the prediction of PDE solutions on coarse spatiotemporal grids while maintaining high accuracy. P$^2$C$^2$Net achieves consistent state-of-the-art performance, with over 50% gain (e.g., in terms of relative prediction error) across four datasets covering complex reaction-diffusion processes and turbulent flows.
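The abstract mentions a learnable symmetric convolution filter, with shared weights, for estimating spatial derivatives. One plausible reading, assumed here and not confirmed by the abstract, is a stencil whose taps are mirror-tied about the center; that constraint suits even-order operators such as the Laplacian, whereas an odd-order derivative would need antisymmetric taps. The sketch fixes the weights to the standard 3-point second-difference stencil on a periodic grid; `symmetric_kernel` and `apply_periodic` are hypothetical helpers, not the paper's filter.

```python
import math

def symmetric_kernel(center, side):
    """Mirror-tied stencil [..., side[1], side[0], center, side[0], side[1], ...].

    In a learnable version, `center` and `side` would be the trainable parameters,
    so the left and right taps can never drift apart.
    """
    return list(reversed(side)) + [center] + list(side)

def apply_periodic(u, kernel):
    """Correlate u with an odd-length kernel, wrapping around at the boundaries."""
    half = len(kernel) // 2
    n = len(u)
    return [
        sum(kernel[j] * u[(i + j - half) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

N = 64
h = 2 * math.pi / N
x = [i * h for i in range(N)]
u = [math.sin(v) for v in x]
# Fixed weights equal to the classic 3-point Laplacian; training would adjust them.
lap = symmetric_kernel(-2.0 / h**2, [1.0 / h**2])
d2u = apply_periodic(u, lap)
print(max(abs(a + math.sin(v)) for a, v in zip(d2u, x)))  # small: d2u approximates -sin(x)
```

Tying the weights halves the number of free parameters per stencil and guarantees the estimated operator has the same mirror symmetry as the continuous Laplacian.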

NeurIPS Conference 2024 Conference Paper

Resource-Aware Federated Self-Supervised Learning with Global Class Representations

  • Mingyi Li
  • Xiao Zhang
  • Qi Wang
  • Tengfei Liu
  • Ruofan Wu
  • Weiqiang Wang
  • Fuzhen Zhuang
  • Hui Xiong

Due to heterogeneous architectures and class skew, training global representation models in resource-adaptive federated self-supervised learning faces two tricky challenges: $\textit{deviated representation abilities}$ and $\textit{inconsistent representation spaces}$. In this work, we are the first to propose a multi-teacher knowledge distillation framework, namely $\textit{FedMKD}$, to learn global representations with whole-class knowledge from heterogeneous clients, even under extreme class skew. First, an adaptive knowledge integration mechanism is designed to learn better representations from all heterogeneous models with deviated representation abilities. Then, a weighted combination of the self-supervised loss and the distillation loss enables the global model to encode all classes from the clients into a unified space. In addition, a global-knowledge-anchored alignment module draws the local representation spaces closer to the global space, which further improves the representation abilities of the local models. Finally, extensive experiments on two datasets demonstrate the effectiveness of $\textit{FedMKD}$, which outperforms state-of-the-art baselines by 4.78% on average under linear evaluation.

IJCAI Conference 2024 Conference Paper

ScreenAgent: A Vision Language Model-driven Computer Control Agent

  • Runliang Niu
  • Jindong Li
  • Shiqi Wang
  • Yali Fu
  • Xiyu Hu
  • Xueyuan Leng
  • He Kong
  • Yi Chang

Large Language Models (LLMs) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled by a trained LLM agent. Powered by the computer, we can hopefully build a more generalized agent to assist humans in various daily digital work. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphical User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences recorded while completing daily computer tasks. Finally, we train a model, ScreenAgent, which achieves computer control capabilities comparable to GPT-4V and demonstrates more precise UI positioning. Our attempts could inspire further research on building a generalist LLM agent. The code and more detailed information are available at https://github.com/niuzaisheng/ScreenAgent.

NeurIPS Conference 2024 Conference Paper

Theoretical Investigations and Practical Enhancements on Tail Task Risk Minimization in Meta Learning

  • Yiqin Lv
  • Qi Wang
  • Dong Liang
  • Zheng Xie

Meta learning is a promising paradigm in the era of large models, and task distributional robustness has become an indispensable consideration in real-world scenarios. Recent advances have examined the effectiveness of tail task risk minimization in improving the robustness of fast adaptation \citep{wang2023simple}. This work contributes further theoretical investigations and practical enhancements in this field. Specifically, we reduce the distributionally robust strategy to a max-min optimization problem, adopt the Stackelberg equilibrium as the solution concept, and estimate the convergence rate. In the presence of tail risk, we further derive the generalization bound, establish connections with estimated quantiles, and practically improve the studied strategy. Extensive evaluations demonstrate the significance of our proposal in boosting robustness.

YNICL Journal 2024 Journal Article

Unveiling MRI markers for Parkinson’s Disease: GABAergic dysfunction and cortical changes

  • Yuan Tian
  • Sijia Geng
  • Tianyi Liu
  • Qi Wang
  • Jianxiu Lian
  • Liangjie Lin
  • Jiayu Li
  • Tao Gong

OBJECTIVE: The study aimed to investigate changes in basal levels of the inhibitory neurotransmitter γ-aminobutyric acid (GABA) in the sensorimotor cortex (SMC) and in cortical gyrification in patients with Parkinson's disease (PD), in order to identify potential imaging biomarkers for PD, particularly in patients with early-onset Parkinson's disease (EOPD). METHOD: Fifty patients with PD (EOPD: 10; late-onset Parkinson's disease [LOPD]: 40) and fifty-two age- and gender-matched healthy controls (HC) underwent GABA-edited 1H MRS of the SMC and high-resolution 3D T1-weighted brain imaging. GABA levels and the local gyrification index (LGI) were calculated to assess GABAergic and cortical gyrification deficits in PD. RESULT: Pearson correlation coefficients revealed significant negative associations between eight indicators, namely the GABA/Cr level and the LGI of specific cortical regions (precentral, postcentral, entorhinal, superior temporal, posterior cingulate, cuneus, and transverse temporal cortex), and the likelihood of Parkinson's disease (r < -0.4, p < 0.001). Additionally, GABA levels in the SMC were significantly lower in both EOPD and LOPD patients than in healthy controls (mean ± SD [u.i.]: EOPD = 0.081 ± 0.022 vs. Young-HC = 0.112 ± 0.021, p = 0.003; LOPD = 0.054 ± 0.024 vs. Old-HC = 0.099 ± 0.021, p < 0.001). A logistic regression model was established using multivariate analysis, identifying two statistically significant indicators: GABA/Cr and the LGI of the transverse temporal cortex. The combined model exhibited the highest AUC values in both younger and older populations. CONCLUSION: GABAergic dysfunction may play an important role in the pathogenesis of PD. Neurotransmitter and morphological changes may serve as potential markers for the preclinical diagnosis and progression of PD, including EOPD.

NeurIPS Conference 2023 Conference Paper

A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

  • Qi Wang
  • Yiqin Lv
  • Yanghe Feng
  • Zheng Xie
  • Jincai Huang

Meta learning is a promising paradigm for enabling skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting poor fast adaptation on the worst subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective and meta-trains models with a measure of tail task risk. We adopt a two-stage strategy as a heuristic to solve the robust meta learning problem, controlling the worst fast adaptation cases at a certain probabilistic level. Experimental results show that our simple method improves the robustness of meta learning to task distributions and reduces the conditional expectation of the worst fast adaptation risk.
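The two-stage idea of controlling the worst fast-adaptation cases at a given probability level can be illustrated on a batch of per-task losses: first estimate a quantile threshold (value-at-risk), then average the losses at or beyond it (an empirical conditional value-at-risk). This is a minimal sketch under our own assumptions: a simple sorted-batch quantile, a hypothetical `tail_task_risk` helper, and toy losses. The paper's estimator and how it feeds into meta-gradient updates may differ.

```python
import math

def tail_task_risk(task_losses, tail_fraction):
    """Two-stage estimate of tail task risk from a batch of per-task losses.

    Stage 1: estimate the value-at-risk threshold (empirical quantile of the losses).
    Stage 2: average the losses of the tasks at or above that threshold (empirical CVaR).
    Returns (threshold, tail_mean, indices_of_tail_tasks).
    """
    if not task_losses or not 0.0 < tail_fraction <= 1.0:
        raise ValueError("need a non-empty batch and 0 < tail_fraction <= 1")
    order = sorted(range(len(task_losses)), key=lambda i: task_losses[i], reverse=True)
    # Number of worst tasks kept; the small epsilon guards against float round-up.
    k = max(1, math.ceil(len(task_losses) * tail_fraction - 1e-9))
    tail = order[:k]
    threshold = task_losses[tail[-1]]
    tail_mean = sum(task_losses[i] for i in tail) / k
    return threshold, tail_mean, tail

losses = [0.3, 1.2, 0.8, 2.5, 0.1, 1.9]  # hypothetical per-task adaptation losses
threshold, tail_mean, tail = tail_task_risk(losses, tail_fraction=0.5)
# In a training loop, only the tasks in `tail` would contribute to the meta-update.
print(threshold, tail_mean, tail)
```

Setting `tail_fraction=1.0` recovers ordinary empirical risk minimization (the mean over all tasks), which makes the relationship to the non-robust baseline explicit.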

EAAI Journal 2023 Journal Article

An autonomous cooperative system of multi-AUV for underwater targets detection and localization

  • Qi Wang
  • Bo He
  • Yixiao Zhang
  • Fei Yu
  • Xiaochao Huang
  • Rong Yang

This paper proposes a cooperative online target detection methodology using multiple autonomous underwater vehicles (Multi-AUV) equipped with side-scan sonar (SSS) sensors for real-time, accurate, and efficient underwater target detection and positioning in unknown environments. Because SSS images suffer from unfavorable factors such as severe noise and geometric deformation, this study combines prior-based threshold segmentation with multi-scale cascaded networks (MSCNet) to significantly reduce the high false alarm rate. Specifically, to meet the real-time requirements of the AUV computing platform, this study proposes a sequential dual-branch lightweight block (LWBlock) as a baseline to obtain dense feature maps, which provides a good trade-off between accuracy and speed. Meanwhile, this study establishes a comprehensive correction model that fuses the predicted results to obtain accurate target positioning information. Furthermore, based on the target information provided by the automatic target recognition (ATR) system, a data-driven behavior-based (DDBB) path re-planning algorithm enables each AUV to autonomously scan above a target of interest in detail through designed maneuver behaviors. Simulation and sea trial results show that the proposed method outperforms other state-of-the-art algorithms, achieving a recognition accuracy of 92.16%, an inference time of 2.45 s, and the best false positive rate (FPR) on three SSS targets: 2.54% (metal ball), 1.96% (seabed rock), and 1.03% (metal rod). In addition, the proposed algorithm improves detection efficiency by at least 40% compared with a single AUV, and can be widely used in marine mission exploration and resource deployment.

EAAI Journal 2023 Journal Article

An online path planning algorithm for autonomous marine geomorphological surveys based on AUV

  • Yixiao Zhang
  • Qi Wang
  • Yue Shen
  • Bo He

This paper proposes a data-driven bi-pattern (DDBP) path planning algorithm for ocean geomorphological surveys based on Autonomous Underwater Vehicles (AUVs). When an AUV surveys an unknown area, it uses real-time side-scan sonar observations to model the environment, which drives independent online path re-planning (PRP) according to the feature density of the targets of interest. With the DDBP algorithm, the AUV can autonomously focus on regions with a rich target distribution and move away from regions with a sparse target distribution, without prior knowledge of the task region. Focusing on underwater detection areas with high feature density improves the quality and efficiency of AUV-based surveys. The DDBP algorithm includes two patterns, rough scan and fine scan, and the planning pattern is selected according to the distribution of the detected targets. The AUV then performs online PRP in the selected pattern according to a pre-identified strategy set. We conducted simulation experiments and selected sand waves and fish reefs as natural and artificial structures, respectively, for typical marine survey tests. Compared with the traditional marine survey method, survey efficiency increased by 33.6% and 29.6% in the two DDBP survey experiments on sand waves, and by 32.9% and 36.7% in the two groups of DDBP survey experiments on artificial reefs. The proposed general technical framework for online path planning driven by real-time observation data has good application prospects in underwater archaeology, rapid understanding of specific targets on the seafloor, and searches for specific targets.

YNICL Journal 2023 Journal Article

Cortical anatomical variations, gene expression profiles, and clinical phenotypes in patients with schizophrenia

  • Yong Han
  • Yongfeng Yang
  • Zhilu Zhou
  • Xueyan Jin
  • Han Shi
  • Minglong Shao
  • Meng Song
  • Xi Su

BACKGROUND AND HYPOTHESIS: Patients with schizophrenia (SZ) display significant structural brain abnormalities; nevertheless, the genetic mechanisms regulating cortical anatomical variation and its correlation with the disease phenotype remain unclear. STUDY DESIGN: We characterized anatomical variation using a surface-based method derived from structural magnetic resonance imaging of patients with SZ and age- and sex-matched healthy controls (HCs). Partial least-squares regression was performed across cortical regions between anatomical variation and the average transcriptional profiles of SZ risk genes and of all qualified genes from the Allen Human Brain Atlas. The morphological features of each brain region were correlated with symptom variables in patients with SZ using partial correlation analysis. STUDY RESULTS: A total of 203 patients with SZ and 201 HCs were included in the final analysis. We observed significant differences between the SZ and HC groups in cortical thickness in 55 regions, volume in 23 regions, area in 7 regions, and local gyrification index (LGI) in 55 regions. Expression profiles of 4 SZ risk genes and of 96 genes from all qualified genes correlated with anatomical variability; however, after correction for multiple comparisons, the correlations were no longer significant. LGI variability in multiple frontal subregions was associated with specific symptoms of SZ, whereas cognitive function involving attention/vigilance was linked to LGI variability across nine brain regions. CONCLUSIONS: Cortical anatomical variation in patients with schizophrenia is associated with gene transcriptome profiles as well as clinical phenotypes.

NeurIPS Conference 2023 Conference Paper

Episodic Multi-Task Learning with Heterogeneous Neural Processes

  • Jiayi Shen
  • Xiantong Zhen
  • Qi Wang
  • Marcel Worring

This paper focuses on the data-insufficiency problem in multi-task learning within an episodic training setup. Specifically, we explore the potential of heterogeneous information across tasks and meta-knowledge among episodes to effectively tackle each task with limited data. Existing meta-learning methods often fail to take advantage of crucial heterogeneous information in a single episode, while multi-task learning models neglect to reuse experience from earlier episodes. To address the problem of insufficient data, we develop Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within the framework of hierarchical Bayes, HNPs effectively capitalize on prior experiences as meta-knowledge and capture task-relatedness among heterogeneous tasks, mitigating data insufficiency. Meanwhile, transformer-structured inference modules are designed to enable efficient inference of meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful functional priors for adapting to novel heterogeneous tasks in each meta-test episode. Experimental results show the superior performance of the proposed HNPs over typical baselines, and ablation studies verify the effectiveness of the designed inference modules.

IROS Conference 2023 Conference Paper

Hierarchical Attention Network for Planning-Informed Multi-Agent Trajectory Prediction

  • Wenyi Xiong
  • Jian Chen
  • Xinfang Zhang
  • Qi Wang
  • Ziheng Qi

Accurate prediction of neighboring vehicles' trajectories affects the safety of autonomous vehicles. However, it is challenging for existing methods to anticipate the trajectories of nearby vehicles due to the uncertainty of driving behaviors and the complex interaction patterns of traffic flows. In this study, we incorporate the planning information of the ego vehicle and propose a novel trajectory prediction approach based on a hierarchical attention mechanism. First, a spatio-temporal attention module is presented to extract the social interactions of surrounding vehicles and capture the temporal dependence of consecutive frames of historical and planning information. Then, a hard-soft attention module is designed to perform two tasks: weighing the importance of both historical and future information, and learning different location information about the target vehicles. Our method is evaluated on two national highway datasets, and the experimental results show that it achieves state-of-the-art performance.

EAAI Journal 2023 Journal Article

Meteorological data layout and task scheduling in a multi-cloud environment

  • Yongsheng Hao
  • Jie Cao
  • Qi Wang
  • Tinghuai Ma
  • Qin Wang
  • Xin Zhang

The meteorological cloud mainly provides computing capacity and meteorological datasets for meteorological model tasks. If a required dataset is stored in a different location from where the task executes, transferring the data for the task consumes a large amount of time and bandwidth. Meteorological data layout allocates meteorological datasets to various clouds. Because meteorological datasets are required by multiple meteorological model tasks, many times over, data layout is very important in meteorological clouds. This paper focuses on how to lay out meteorological datasets based on their associations (both within the meteorological datasets and between meteorological model tasks) and how to schedule resources for meteorological model tasks in the meteorological cloud. First, to find the associations among meteorological datasets and meteorological models, we use the Apriori algorithm to mine frequent itemsets among the datasets used by different meteorological models, and then use the result to guide the data layout. After that, we present a heuristic algorithm for scheduling meteorological tasks. Finally, a simulation comparison shows that the meteorological data layout method achieves the lowest values in the number of clouds involved per task, the average size of datasets transmitted from other clouds, and the average transmission time of datasets between clouds. We also show that the scheduling method based on the data layout increases the number of tasks completed before their deadlines and reduces the average execution time.
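The abstract uses the Apriori algorithm to find datasets that are frequently used together by meteorological models. A textbook version is sketched below: each transaction is the set of datasets one model task reads, and itemsets meeting a minimum support count are grown one item at a time, pruning any candidate with an infrequent subset. The dataset names, the support threshold, and the `apriori` function are hypothetical, and the step that turns frequent itemsets into a data layout is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all itemsets whose
    support count (number of transactions containing them) is >= min_support."""
    sets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in sets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    size = 2
    while frequent:
        prev = list(frequent)
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
        # Count the support of each surviving candidate.
        frequent = {}
        for c in candidates:
            count = sum(1 for t in sets if c <= t)
            if count >= min_support:
                frequent[c] = count
        result.update(frequent)
        size += 1
    return result

# Hypothetical model tasks, each listing the datasets it reads.
tasks = [
    {"d1", "d2", "d3"},
    {"d1", "d2"},
    {"d1", "d3"},
    {"d2", "d4"},
    {"d1", "d2", "d3"},
]
frequent_sets = apriori(tasks, min_support=2)
for itemset, count in sorted(frequent_sets.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), count)
```

Datasets that appear together in a large frequent itemset (here `d1`, `d2`, `d3`) are natural candidates for placement in the same cloud, which is the intuition behind association-based layout.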

IJCAI Conference 2023 Conference Paper

Multi-level Graph Contrastive Prototypical Clustering

  • Yuchao Zhang
  • Yuan Yuan
  • Qi Wang

Recently, graph neural networks (GNNs) have attracted a surge of research in deep graph clustering. Nevertheless, existing approaches tend to be semantic-agnostic, since GNNs have inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations of different granularities may conflict with each other, severely degrading clustering performance. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate conflicts between objectives, we learn representations of different granularities in separate feature-, prototype-, and cluster-level spaces through feature decorrelation, prototype contrast, and cluster-space consistency, respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC over state-of-the-art graph clustering approaches.

EAAI Journal 2023 Journal Article

SAR ship localization method with denoising and feature refinement

  • Cheng Zha
  • Weidong Min
  • Qing Han
  • Wei Li
  • Xin Xiong
  • Qi Wang
  • Meng Zhu

Synthetic Aperture Radar (SAR) ship detection is very important for marine transportation monitoring and fishery resource management. To improve the detection accuracy of small ships, this paper proposes a SAR ship localization method with Denoising and Feature Refinement (DFR). It consists of three parts. The first part is a denoising module, which uses non-local means to suppress the speckle noise of the SAR image. The second part is a Hierarchical Feature Fusion (HFF) module, which integrates more low-level features by adding skip connections. This prevents the low-level spatial position information of the fused features from being diluted by high-level semantic information, which benefits the detection of small ships. The third part is a center-based ship predictor with Feature Refinement (FR). The FR module refines the features and reduces background interference, helping to locate ships more accurately. Extensive experiments show that adding the denoising and FR modules increases AP0.5 by 1.7% and 2.3%, respectively, which confirms the effectiveness of both modules. In inshore and offshore scenarios, DFR achieves the best AP0.5 values of 0.884 and 0.966, respectively. The proposed method can also be generalized to marking lesion locations in medical images and detecting offshore oil production platforms.
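Non-local means, used by the denoising module, replaces each pixel with a weighted average of pixels whose surrounding patches look similar, so repeated structure is averaged while edges are largely preserved. For brevity the sketch below works on a 1-D signal rather than a 2-D SAR image; the patch radius, search radius, filtering parameter `h`, and the function name are hypothetical, and the sketch makes no claim about how the paper configures the filter for speckle.

```python
import math

def nl_means_1d(signal, patch_radius=1, search_radius=3, h=0.5):
    """Non-local means on a 1-D signal.

    Each output sample is a weighted average of samples within `search_radius`,
    weighted by exp(-d2 / h^2), where d2 is the mean squared difference between
    the patches around the two samples. Indices are clamped at the edges.
    """
    n = len(signal)

    def patch(i):
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        weight_sum = 0.0
        acc = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            p_j = patch(j)
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
            w = math.exp(-d2 / (h * h))
            weight_sum += w  # always >= 1 because j == i contributes weight 1
            acc += w * signal[j]
        out.append(acc / weight_sum)
    return out

# Hypothetical noisy step edge.
noisy_step = [0.2, -0.1, 0.0, 0.3, -0.2, 0.1, 9.8, 10.2, 10.1, 9.9, 10.0, 10.3]
print([round(v, 2) for v in nl_means_1d(noisy_step)])
```

Patches straddling the edge differ strongly from patches on either side, so their weights are tiny and the step is not blurred, unlike a plain moving average.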

EAAI Journal 2023 Journal Article

SHDM-NET: Heat map detail guidance with image matting for industrial weld semantic segmentation network

  • Qi Wang
  • Jingwu Mei
  • Wuming Jiang
  • Hegui Zhu

Welding is widely used for metal components. The firmness of welded components is very important in applications such as buildings, bridges, cars, and airplanes, and weld seam quality inspection is essential to ensure product quality. The area and shape of the weld seam are the basis for quality assessment, so segmentation of the weld area is very important. To address weld seam segmentation, this paper proposes a weld seam segmentation network based on heat map detail guidance with matting, which provides a new approach to fine-grained segmentation of the weld seam region. Existing DCNN-based semantic segmentation models segment boundaries poorly and produce jagged segmentation boundaries, which is unacceptable for weld segmentation, where clear and precise boundary localization is required. To solve this problem, this paper makes three innovations to DCNN-based semantic segmentation networks. (1) A heat map detail guidance module focuses segmentation boundary information on shallow features and enhances the representation of boundary information. (2) An improved segmentation head for fine-grained semantic segmentation is proposed. (3) Because details lost during encoding and decoding in the semantic segmentation network degrade boundary accuracy, this paper introduces a matting algorithm to calibrate the boundary of the segmented weld seam region. Extensive experiments on industrial weld datasets demonstrate the effectiveness of our method, which reaches an MIoU (Mean Intersection over Union) of 96.32%. Notably, this performance is comparable to human manual segmentation (MIoU 96.38%).
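MIoU averages, over classes, the ratio of correctly labeled pixels (intersection) to pixels labeled that class in either the prediction or the ground truth (union). The sketch below computes it for a single pair of flattened label maps with a hypothetical labeling (0 = background, 1 = weld seam). Conventions differ between papers, for example accumulating counts over the whole test set versus averaging per image, or how classes absent from both maps are handled; this version skips absent classes, and the abstract does not say which convention was used.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either the prediction or the target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in either map")
    return sum(ious) / len(ious)

# Flattened toy masks: 0 = background, 1 = weld seam (hypothetical labels).
target = [0, 0, 1, 1, 0, 0]
pred = [0, 0, 1, 1, 1, 0]
print(f"MIoU = {mean_iou(pred, target, num_classes=2):.4f}")  # prints: MIoU = 0.7083
```

Here background IoU is 3/4 and weld IoU is 2/3; the single over-segmented boundary pixel lowers both, which is why jagged boundaries are penalized by this metric.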

IJCAI Conference 2022 Conference Paper

A Speech-driven Sign Language Avatar Animation System for Hearing Impaired Applications

  • Li Hu
  • Jiahui Li
  • Jiashuo Zhang
  • Qi Wang
  • Bang Zhang
  • Ping Tan

Sign language is the language used for communication in the hearing-impaired community. Research on sign language production has recently made great progress but still needs to cope with some critical challenges. In this paper, we propose a system-level scheme and push forward the implementation of sign language production for practical use. We build a system capable of translating speech into a sign language avatar. Unlike previous approaches that focus on a single technology, we systematically combine algorithms for language translation, body gesture animation, and facial avatar generation. We also develop two applications, a Sign Language Interpretation APP and a Virtual Sign Language Anchor, to facilitate easy and clear communication for hearing-impaired people.

IJCAI Conference 2022 Conference Paper

AttExplainer: Explain Transformer via Attention by Reinforcement Learning

  • Runliang Niu
  • Zhepei Wei
  • Yan Wang
  • Qi Wang

Transformer and its variants, built on attention mechanisms, have recently achieved remarkable performance in many NLP tasks. Most existing work on Transformer explanation reveals and uses the attention matrix qualitatively, guided by subjective human intuition. However, the high dimensionality of the attention matrix makes it hard for these methods to analyze it quantitatively. In this paper, we therefore propose AttExplainer, a novel reinforcement learning (RL) based framework for explaining Transformers via the attention matrix. The RL agent learns to perform step-by-step masking operations by observing the change in attention matrices. We adapt our method to two scenarios: perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared to state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code of this paper is available at https://github.com/niuzaisheng/AttExplainer.

EAAI Journal 2022 Journal Article

Dual-branch framework: AUV-based target recognition method for marine survey

  • Fei Yu
  • Bo He
  • Jixin Liu
  • Qi Wang

Autonomous recognition of marine targets is considered a promising technology for autonomous underwater vehicle (AUV) marine surveys, and recognition with side-scan sonar (SSS) mounted on an AUV is key to such surveys. As a fundamental function, SSS recognition remains unsolved due to challenging SSS image conditions and insufficient algorithm robustness. This paper proposes an accurate, real-time dual-branch recognition framework containing a segmentation branch and a refinement branch. First, the segmentation branch uses a lightweight network to analyze the data comprehensively. In this branch, we propose a densely connected local attention recurrent residual (LAR2) block as the backbone and also introduce atrous convolution. This branch focuses on the features of interest in the image, ensuring better feature representation from low-resolution SSS information while guiding the next branch. Second, the refinement branch adjusts the results of the previous branch and combines low-level and high-level features. We propose a holistic attention (HA) block in this branch, which further improves target recognition performance. Finally, we adopt bilinear pooling as the feature fusion method to integrate the results of the two branches and output a high-precision recognition image. In offline experiments and sea trials, our proposed method outperforms competing algorithms on four semantic segmentation metrics and achieves a computation speed of 92.66 ms (±0.86 ms) per image on AUV-dedicated hardware. The method is robust, meets real-time requirements, and can be widely used in AUV marine surveys.
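The fusion step above uses bilinear pooling. In its common generic form (as in bilinear CNNs), the outer products of the two branches' feature vectors are summed over spatial positions, flattened, and then passed through a signed square root and L2 normalization. The Python sketch below shows only that generic, globally pooled form; the abstract does not say how the paper keeps spatial resolution for its pixel-level output, so the shapes and post-processing here are assumptions.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Sum of outer products over positions, then signed sqrt and L2 normalization.

    feat_a: list of L vectors of length C1 (branch A, one vector per position)
    feat_b: list of L vectors of length C2 (branch B, same positions)
    Returns a flattened vector of length C1 * C2.
    """
    if len(feat_a) != len(feat_b) or not feat_a:
        raise ValueError("both branches need the same, non-zero number of positions")
    c1, c2 = len(feat_a[0]), len(feat_b[0])
    pooled = [[0.0] * c2 for _ in range(c1)]
    for a, b in zip(feat_a, feat_b):  # accumulate the outer product a b^T at every position
        for i in range(c1):
            for j in range(c2):
                pooled[i][j] += a[i] * b[j]
    flat = [v for row in pooled for v in row]
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]  # signed square root
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Two positions, 2-channel features from each branch (toy values).
fused = bilinear_pool([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(len(fused), round(sum(v * v for v in fused), 6))
```

The fused vector has C1 × C2 entries and unit L2 norm. The outer product is what lets every channel of one branch interact with every channel of the other, which simple concatenation does not.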

YNIMG Journal 2022 Journal Article

Focal fMRI signal enhancement with implantable inductively coupled detectors

  • Yi Chen
  • Qi Wang
  • Sangcheon Choi
  • Hang Zeng
  • Kengo Takahashi
  • Chunqi Qian
  • Xin Yu

Despite extensive efforts to increase the signal-to-noise ratio (SNR) of fMRI images for brain-wide mapping, technical advances in focal brain signal enhancement are lacking, particularly for animal brain imaging. Emerging studies have combined fMRI with fiber optic-based optogenetics to decipher circuit-specific neuromodulation from meso- to macroscales. High-resolution fMRI is needed to integrate hemodynamic responses into cross-scale functional dynamics, but SNR remains a limiting factor given the complex implantation setup in animal brains. Here, we developed a multimodal fMRI imaging platform with an implanted inductive coil detector. This detector boosts the tSNR of MRI images, showing a 2- to 3-fold sensitivity gain over the conventional coil configuration. In contrast to cryoprobes or array coils, which leave limited space for an implanted brain interface, this setup offers a unique advantage for studying brain circuit connectivity with optogenetic stimulation and can be further extended to other multimodal fMRI mapping schemes.

NeurIPS Conference 2022 Conference Paper

Learning Expressive Meta-Representations with Mixture of Expert Neural Processes

  • Qi Wang
  • Herke van Hoof

Neural processes (NPs) formulate exchangeable stochastic processes and are promising models for meta learning that do not require gradient updates during the testing phase. However, most NP variants place a strong emphasis on a global latent variable. This weakens the approximation power and restricts the scope of applications using NP variants, especially when data generative processes are complicated. To resolve these issues, we propose to combine the Mixture of Expert models with Neural Processes to develop more expressive exchangeable stochastic processes, referred to as Mixture of Expert Neural Processes (MoE-NPs). Then we apply MoE-NPs to both few-shot supervised learning and meta reinforcement learning tasks. Empirical results demonstrate MoE-NPs' strong generalization capability to unseen tasks in these benchmarks.

JBHI Journal 2022 Journal Article

MRI Generated From CT for Acute Ischemic Stroke Combining Radiomics and Generative Adversarial Networks

  • Eryan Feng
  • Pinle Qin
  • Rui Chai
  • Jianchao Zeng
  • Qi Wang
  • Yanfeng Meng
  • Peng Wang

Compared to computed tomography (CT), magnetic resonance imaging (MRI) is more sensitive to acute ischemic stroke lesions. However, MRI is time-consuming, expensive, and susceptible to interference from metal implants. Generating MRI images from CT images can address these limitations. The key problem in this process is obtaining lesion information from CT. In this study, we propose a cross-modal CT-to-MRI image generation algorithm for acute ischemic stroke that combines radiomics with generative adversarial networks. First, a lesion candidate region was obtained using radiomics, the radiomic features of the region were extracted, and the feature with the largest information gain was selected and visualized as a feature map. Then, the extracted feature map was concatenated with the CT image and fed into the generator. We added a residual module after the downsampling path of the generator, which follows the general shape of U-Net; this deepens the network without causing degradation problems. In addition, we introduced a lesion feature similarity loss function to focus the model on the similarity of the lesion. Both the subjective judgment of two experienced radiologists and quantitative evaluation metrics showed that the generated MRI images were very similar to the real MRI images. Moreover, the lesions were correctly located and their shapes were similar to those of the real lesions, which can help doctors with timely diagnosis and treatment.

JBHI Journal 2021 Journal Article

iPhantom: A Framework for Automated Creation of Individualized Computational Phantoms and Its Application to CT Organ Dosimetry

  • Wanyi Fu
  • Shobhit Sharma
  • Ehsan Abadi
  • Alexandros-Stavros Iliopoulos
  • Qi Wang
  • Joseph Y. Lo
  • Xiaobai Sun
  • William P. Segars

Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or “digital twins (DT)” from patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: Given a volume of patient CT images, iPhantom segments selected anchor organs and structures (e.g., liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs that are challenging to segment (e.g., intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom voxels. The resulting digital-twin phantoms are used to assess organ doses during routine CT exams. Result: iPhantom was validated on both a set of XCAT digital phantoms (n = 50) and an independent clinical dataset (n = 10), with similar accuracy. iPhantom precisely predicted all organ locations, yielding Dice Similarity Coefficients (DSC) of 0.6-1 for anchor organs and 0.3-0.9 for all other organs. iPhantom showed <10% error in estimated radiation dose for the majority of organs, notably better than the state-of-the-art baseline method (20-35% dose errors). Conclusion: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. Significance: Through automation, the new framework brings the creation and application of CHPs (computational human phantoms) to the level of individualized CHPs, achieving wide and precise organ localization and paving the way for clinical monitoring, personalized optimization, and large-scale research.
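The Dice Similarity Coefficient used to validate iPhantom is 2|A ∩ B| / (|A| + |B|) for a predicted organ mask A and a reference mask B, so it ranges from 0 (no overlap) to 1 (identical masks). A minimal sketch over flattened binary masks follows; returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient of two equally sized, flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2 * inter / (size_a + size_b)


print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

For 3D CT volumes, the same function applies after flattening each organ's voxel mask.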

EAAI Journal 2021 Journal Article

Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework

  • Qi Wang
  • Yongsheng Hao
  • Jie Cao

Combinatorial optimization (CO) problems on graphs are core, classic problems in artificial intelligence (AI) and operations research (OR). For example, the Vehicle Routing Problem (VRP) and the Traveling Salesman Problem (TSP) are challenging NP-hard problems of great significance for existing transportation systems. Traditional methods such as heuristics, exact algorithms, and solvers can already find approximate solutions on small-scale graphs. However, they struggle with large-scale graphs and other problems with similar structures. Moreover, traditional methods often require manually designed heuristic functions to aid decision-making. In recent years, more and more work has applied deep learning and reinforcement learning (RL) to learn heuristics, which makes it possible to learn the internal structure of the graph end-to-end and find the optimal path under the guidance of heuristic rules. However, most of these approaches still need manual assistance, and the RL methods used suffer from low sample efficiency and a small searchable space. This paper proposes a novel framework, called OmegaZero, based on AlphaGo Zero, which requires neither expert experience nor labeled data and is instead trained through self-play. We divide learning into two stages: in the first stage, we employ a graph attention network (GAT) and a GRU to learn node representations and memorize historical trajectories. In the second stage, we employ Monte Carlo tree search (MCTS) and deep RL to search the solution space and train the model.
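OmegaZero is described as based on AlphaGo Zero, whose tree search selects the child that maximizes Q(s, a) + U(s, a), with U(s, a) = c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The Python sketch below shows only that PUCT selection rule; treating actions as candidate next nodes on the graph and the default c_puct = 1.0 are illustrative assumptions, not details from the paper.

```python
import math


def puct_select(children, c_puct=1.0):
    """Pick the action maximizing Q + U (AlphaGo Zero-style PUCT rule).

    children maps an action (e.g. a hypothetical next graph node) to a tuple
    (prior P from the policy network, visit count N, total backed-up value W).
    """
    total_visits = sum(n for _, n, _ in children.values())
    best_action, best_score = None, -math.inf
    for action, (prior, visits, value_sum) in children.items():
        q = value_sum / visits if visits > 0 else 0.0  # mean action value
        u = c_puct * prior * math.sqrt(total_visits) / (1 + visits)  # exploration bonus
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action


# An unvisited node with a high prior beats a well-explored, mediocre one.
children = {"node_3": (0.9, 0, 0.0), "node_7": (0.1, 10, 5.0)}
print(puct_select(children))  # node_3
```

The bonus U shrinks as a child is visited more often, so search gradually shifts from the policy network's prior toward the values observed during self-play.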

YNIMG Journal 2020 Journal Article

Inter-subject pattern analysis: A straightforward and powerful scheme for group-level MVPA

  • Qi Wang
  • Bastien Cagna
  • Thierry Chaminade
  • Sylvain Takerkart

Multivariate pattern analysis (MVPA) has become very popular for analyzing functional neuroimaging data. At the group level, two main strategies are used in the literature. The standard one is hierarchical, combining within-subject decoding results in a second-level analysis. The alternative, inter-subject pattern analysis, works directly at the group level, e.g., by using leave-one-subject-out cross-validation. This study provides a thorough comparison of these two group-level decoding schemes, using both a large number of artificial datasets, in which the size of the multivariate effect and the amount of inter-individual variability are parametrically controlled, and two real fMRI datasets comprising 15 and 39 subjects, respectively. We show that the two strategies uncover distinct significant regions with partial overlap, and that inter-subject pattern analysis is able to detect smaller effects and facilitates interpretation. The core source code and data are openly available, allowing most of these results to be fully reproduced.
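Leave-one-subject-out cross-validation, cited above as a typical way to run inter-subject pattern analysis, trains a decoder on all subjects but one and tests it on the held-out subject, repeating once per subject. The sketch below generates those folds in plain Python; the sample layout and subject labels are hypothetical. In practice, scikit-learn's `LeaveOneGroupOut` produces equivalent splits when subject IDs are passed as `groups`.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject that sample i (e.g. one trial's pattern) came from.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


subjects = ["sub-01", "sub-01", "sub-02", "sub-03", "sub-03"]
for held_out, train, test in leave_one_subject_out(subjects):
    print(held_out, "train:", train, "test:", test)
```

Because no sample from the test subject is ever seen during training, the accuracy of each fold directly measures how well a pattern generalizes across individuals.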

NeurIPS Conference 2020 Conference Paper

Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning

  • Tao Han
  • Junyu Gao
  • Yuan Yuan
  • Qi Wang

Learning from unlabeled data has attracted considerable attention recently. However, it remains elusive to extract the expected high-level semantic features with unsupervised learning alone. Meanwhile, semi-supervised learning (SSL) shows promise in leveraging a few labeled samples. In this paper, we combine both to propose an Unsupervised Semantic Aggregation and Deformable Template Matching (USADTM) framework for SSL, which strives to improve classification performance with few labeled data and thereby reduce annotation cost. Specifically, unsupervised semantic aggregation based on a Triplet Mutual Information (T-MI) loss is explored to generate semantic labels for unlabeled data. The semantic labels are then aligned to the actual classes under the supervision of labeled data. Furthermore, a feature pool that stores the labeled samples is dynamically updated to assign proxy labels to unlabeled data, which are used as targets for cross-entropy minimization. Extensive experiments and analysis across four standard semi-supervised learning benchmarks validate that USADTM achieves top performance (e.g., 90.46% accuracy on CIFAR-10 with 40 labels and 95.20% accuracy with 250 labels). The code is released at https://github.com/taohan10200/USADTM.

AAAI Conference 2020 Conference Paper

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

  • Zhijie Lin
  • Zhou Zhao
  • Zhu Zhang
  • Qi Wang
  • Huasheng Liu

Video moment retrieval aims to find the moment that is most relevant to a given natural language query. Existing methods are mostly trained in a fully supervised setting, which requires full temporal-boundary annotations for each query. However, manual annotation is time-consuming and expensive. In this paper, we propose a novel weakly supervised moment retrieval framework that requires only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates context information to generate and score all candidate proposals in a single pass. We then devise an algorithm that considers both exploitation and exploration to select the top-K proposals. Next, we build a semantic completion module that measures the semantic similarity between the selected proposals and the query, computes a reward, and provides feedback to the proposal generation module for scoring refinement. Experiments on the ActivityCaptions and Charades-STA datasets demonstrate the effectiveness of our proposed method.

AAAI Conference 2019 Conference Paper

ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition

  • Yuan Yuan
  • Zhitong Xiong
  • Qi Wang

RGB image classification has achieved significant performance improvement with the resurgence of deep convolutional neural networks. However, mono-modal deep models for RGB images still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distribution, so object-level local features should be considered in addition to the global scene representation. 2) Multi-modal features in RGB-D scene classification are still underutilized; simply combining these modality-specific features suffers from the semantic gaps between modalities. 3) Most existing methods neglect the complex relationships among multiple modality features. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. To make better use of modality-specific cues, this approach mines the intra-modality relationships among the selected local features from one modality. To leverage multi-modal knowledge more effectively, the proposed approach models the inter-modality relationships between two modalities through a cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets, SUN-RGBD and NYUD V2, where it achieves state-of-the-art performance.

AAAI Conference 2019 Conference Paper

Memory-Augmented Temporal Dynamic Learning for Action Recognition

  • Yuan Yuan
  • Dong Wang
  • Qi Wang

Human actions captured in video sequences contain two crucial factors for action recognition: visual appearance and motion dynamics. To model these two aspects, most existing successful methods adopt Convolutional and Recurrent Neural Networks (CNNs and RNNs). However, CNN-based methods are limited in modeling long-term motion dynamics. RNNs are able to learn temporal motion dynamics but lack effective ways to tackle unsteady dynamics in long-duration motion. In this work, we propose a memory-augmented temporal dynamic learning network, which learns to write the most evident information into an external memory module and to ignore irrelevant information. In particular, we present a differential memory controller that makes a discrete decision on whether the external memory module should be updated with the current feature. The discrete memory controller takes the memory history, context embedding, and current feature as inputs and controls the flow of information into the external memory module. Additionally, we train this discrete memory controller using a straight-through estimator. We evaluate this end-to-end system on benchmark human action recognition datasets (UCF101 and HMDB51). The experimental results show consistent improvements on both datasets over prior work and our baselines.

YNIMG Journal 2018 Journal Article

Correlation of neural activity with behavioral kinematics reveals distinct sensory encoding and evidence accumulation processes during active tactile sensing

  • Ioannis Delis
  • Jacek P. Dmochowski
  • Paul Sajda
  • Qi Wang

Many real-world decisions rely on active sensing, a dynamic process of directing our sensors (e.g., eyes or fingers) across a stimulus to maximize information gain. Although active sensing is ecologically pervasive, limited work has focused on identifying its neural correlates. In tactile perception, we often make decisions about an object or surface by actively exploring its shape or texture. Here we investigate the neural correlates of active tactile decision-making by simultaneously measuring electroencephalography (EEG) and finger kinematics while subjects interrogated a haptic surface to make perceptual judgments. Since sensorimotor behavior underlies decision formation in active sensing tasks, we hypothesized that the neural correlates of decision-related processes would be detectable by relating active sensing to neural activity. A novel brain-behavior correlation analysis revealed three distinct EEG components, localizing to the right-lateralized occipital cortex (LOC), middle frontal gyrus (MFG), and supplementary motor area (SMA), respectively, that were coupled with active sensing, as their activity significantly correlated with finger kinematics. To probe the functional role of these components, we fit their single-trial couplings to decision-making performance using a hierarchical drift-diffusion model (HDDM), revealing that the LOC modulated the encoding of the tactile stimulus whereas the MFG predicted the rate of information integration towards a choice. Interestingly, the MFG component did not appear among the components uncovered from control subjects who performed active sensing but were not required to make perceptual decisions. By uncovering the neural correlates of distinct stimulus encoding and evidence accumulation processes, this study delineated, for the first time, the functional role of cortical areas in active tactile decision-making.

AAAI Conference 2018 Conference Paper

Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach

  • Suping Zhou
  • Jia Jia
  • Qi Wang
  • Yufei Dong
  • Yufeng Yin
  • Kehua Lei

To give more humanized responses in Voice Dialogue Applications (VDAs), inferring emotional states from users’ queries may play an important role. However, VDAs have a tremendous number of users and a massive amount of unlabeled data with high-dimensional features from multimodal information, which challenges traditional speech emotion recognition methods. In this paper, to better infer emotion from conversational voice data, we propose a semi-supervised multi-path generative neural network. Specifically, we first build a novel supervised multi-path deep neural network framework. To avoid high-dimensional input, raw features are trained in groups by local classifiers. The high-level features of each local classifier are then concatenated as the input of a global classifier. These two kinds of classifiers are trained simultaneously through a single objective function to achieve more effective and discriminative emotion inference. To further address labeled-data scarcity, we extend the multi-path deep neural network to a generative model based on a semi-supervised variational autoencoder (semi-VAE), which can train on labeled and unlabeled data simultaneously. Experiments on a real-world dataset of 24,000 samples collected from Sogou Voice Assistant (SVAD13) and on the benchmark dataset IEMOCAP show that our method significantly outperforms existing state-of-the-art results.

IJCAI Conference 2018 Conference Paper

Nonrigid Points Alignment with Soft-weighted Selection

  • Xuelong Li
  • Jian Yang
  • Qi Wang

Point set registration (PSR) is a crucial problem in computer vision and pattern recognition. Existing PSR methods cannot align point sets robustly under degradations such as deformation, noise, occlusion, outliers, and multi-view changes. In this paper, we present a self-selected regularized Gaussian fields criterion for nonrigid point matching. Unlike most existing methods, we formulate the registration problem as a sparse approximation task with a low-rank constraint in a reproducing kernel Hilbert space (RKHS). A self-selection mechanism dynamically assigns a real-valued label to each point in an accuracy-aware weighting manner, which makes the model focus more on points whose positions are reliable. Based on these labels, an equivalent matching number optimization is embedded into the nonrigid criterion to enhance the reliability of the approximation. Experimental results show that the proposed method achieves better registration accuracy and more correct matches than state-of-the-art approaches.

AAAI Conference 2017 Conference Paper

A Multiview-Based Parameter Free Framework for Group Detection

  • Xuelong Li
  • Mulin Chen
  • Feiping Nie
  • Qi Wang

Group detection is fundamentally important for analyzing crowd behaviors and has attracted plenty of attention in artificial intelligence. However, existing works are mostly limited by insufficient use of crowd properties and arbitrary processing of individuals. In this paper, we propose the Multiview-based Parameter Free (MPF) approach to detect groups in crowd scenes. The main contributions of this study are threefold: (1) a new structural context descriptor is designed to characterize the structural property of individuals in crowd motions; (2) a self-weighted multiview clustering method is proposed to cluster feature points by incorporating their motion and context similarities; (3) a novel framework is introduced for group detection, which determines the number of groups automatically without any parameter or threshold to tune. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed approach and show its superiority over state-of-the-art group detection techniques.

IJCAI Conference 2017 Conference Paper

Convolutional 2D LDA for Nonlinear Dimensionality Reduction

  • Qi Wang
  • Zequn Qin
  • Feiping Nie
  • Yuan Yuan

Representing high-volume and high-order data is an essential problem, especially in machine learning. Although existing two-dimensional (2D) discriminant analysis achieves promising performance, its single, linear projection features make it difficult to analyze more complex data. In this paper, we propose a novel convolutional two-dimensional linear discriminant analysis (2D LDA) method for data representation. To deal with nonlinear data, a specially designed Convolutional Neural Network (CNN) is presented, whose objective function is proved equivalent to that of common 2D LDA. In this way, the discriminant ability benefits not only from the nonlinearity of the CNN but also from its powerful learning process. Experimental results on several datasets show that the proposed method outperforms other state-of-the-art methods in classification accuracy.

IJCAI Conference 2017 Conference Paper

Locality Adaptive Discriminant Analysis

  • Xuelong Li
  • Mulin Chen
  • Feiping Nie
  • Qi Wang

Linear Discriminant Analysis (LDA) is a popular technique for supervised dimensionality reduction, and it performs satisfactorily on Gaussian-distributed data. However, its neglect of local data structure makes LDA inapplicable to many real-world situations. Some works therefore focus on discriminant analysis between neighboring points, which can easily be affected by noise in the original data space. In this paper, we propose a new supervised dimensionality reduction method, Locality Adaptive Discriminant Analysis (LADA), to learn a representative subspace of the data. Compared to LDA and its variants, the proposed method has three salient advantages: (1) it finds the principal projection directions without imposing any assumption on the data distribution; (2) it is able to exploit the local manifold structure of the data in the desired subspace; (3) it exploits the points’ neighbor relationships automatically without introducing any additional parameter to be tuned. Results on synthetic and real-world benchmark datasets demonstrate the superiority of the proposed method.

AAAI Conference 2017 Conference Paper

Quantifying and Detecting Collective Motion by Manifold Learning

  • Qi Wang
  • Mulin Chen
  • Xuelong Li

The analysis of collective motion has attracted many researchers in artificial intelligence. Although plenty of work has been done on this topic, performance is still unsatisfying due to the complex nature of collective motions. By investigating the similarity of individuals, this paper proposes a novel framework for both quantifying and detecting collective motions. Our main contributions are threefold: (1) the time-varying dynamics of individuals are investigated in depth to better characterize individual motion; (2) a structure-based collectiveness measurement is designed to precisely quantify both individual-level and scene-level properties of collective motions; (3) a multi-stage clustering strategy is presented to achieve a more comprehensive understanding of crowd scenes, covering both local and global collective motions. Extensive experimental results on real-world datasets show that our method is capable of handling crowd scenes with complicated structures and various dynamics, and demonstrate its superior performance against state-of-the-art competitors.

TCS Journal 2016 Journal Article

The Space Complexity Analysis in the General Number Field Sieve Integer Factorization

  • Qi Wang
  • Xiubin Fan
  • Hongyan Zang
  • Yu Wang

The General Number Field Sieve is the most efficient known algorithm for factoring large integers. It consists of polynomial selection, sieving, solving equations, and finding square roots. Root lifting of polynomials is discussed in this paper. The p-adic evaluation provided by each root and the expected p-value are also given. Finally, we derive the space complexity of sieving and of building the equations over the ring Z/2Z.
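A standard way to lift polynomial roots from a prime p to a prime power p^k is Hensel's lemma: a simple root r of f modulo p (f(r) ≡ 0 and f'(r) ≢ 0 mod p) lifts to a unique root modulo p^k, which lets the sieve account for prime powers as well as primes. The Python sketch below illustrates that general technique, not the paper's specific procedure; it assumes integer coefficients listed from the constant term upward, handles only simple roots, and needs Python 3.8+ for `pow(x, -1, m)`.

```python
def poly_eval(coeffs, x, mod):
    """Evaluate sum(coeffs[i] * x**i) modulo mod using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % mod
    return acc


def hensel_lift(coeffs, root, p, k):
    """Lift a simple root of f modulo p to the unique root modulo p**k."""
    deriv = [i * c for i, c in enumerate(coeffs)][1:]  # coefficients of f'
    if poly_eval(coeffs, root, p) != 0:
        raise ValueError("root is not a root of f modulo p")
    if poly_eval(deriv, root, p) == 0:
        raise ValueError("f'(root) is 0 mod p; this sketch only lifts simple roots")
    r, modulus = root % p, p
    for _ in range(k - 1):
        modulus *= p
        # Newton step: r <- r - f(r) / f'(r), computed modulo the next power of p
        inv = pow(poly_eval(deriv, r, modulus), -1, modulus)
        r = (r - poly_eval(coeffs, r, modulus) * inv) % modulus
    return r


# f(x) = x^2 - 2 has the root 3 modulo 7 (9 = 2 mod 7); lift it to modulo 7^3 = 343.
print(hensel_lift([-2, 0, 1], root=3, p=7, k=3))  # 108
```

Each step gains one power of p; 108² − 2 = 11662 = 34 · 343 confirms the lifted root.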

YNIMG Journal 2010 Journal Article

Identifying gene regulatory networks in schizophrenia

  • Steven G. Potkin
  • Fabio Macciardi
  • Guia Guffanti
  • James H. Fallon
  • Qi Wang
  • Jessica A. Turner
  • Anita Lakatos
  • Michael F. Miles

The imaging genetics approach to studying the genetic basis of disease leverages the individual strengths of both neuroimaging and genetic studies by visualizing and quantifying brain activation patterns in the context of genetic background. Brain imaging as an intermediate phenotype can help clarify the functional link among genes, the molecular networks in which they participate, and brain circuitry and function. Integrating genetic data from a genome-wide association study (GWAS) with brain imaging as a quantitative trait (QT) phenotype can increase the statistical power to identify risk genes. A QT analysis using brain imaging (DLPFC activation during a working memory task) as a quantitative trait has identified unanticipated risk genes for schizophrenia. Several of these genes (RSRC1, ARHGAP18, ROBO1-ROBO2, GPC1, TNIK, and CTXN3-SLC12A2) have functions related to progenitor cell proliferation, migration, and differentiation, cytoskeleton reorganization, axonal connectivity, and development of forebrain structures. These genes, however, do not function in isolation but rather through gene regulatory networks. To obtain a deeper understanding of how the GWAS-identified genes participate in larger gene regulatory networks, we measured correlations among transcript levels in mouse and human postmortem tissue and performed a gene set enrichment analysis (GSEA) that identified several microRNAs associated with schizophrenia (448, 218, and 137). The results of such computational approaches can be further validated in animal experiments in which the networks are experimentally studied and perturbed with specific compounds. For example, Glypican 1 and FGF17 mouse models can be used to study such gene regulatory networks. The model demonstrates epistatic interactions between FGF and glypican in brain development and may be a useful model of negative-symptom schizophrenia.

IROS Conference 2005 Conference Paper

The Pantograph Mk-II: a haptic instrument

  • Gianni Campion
  • Qi Wang
  • Vincent Hayward

We describe the redesign and performance evaluation of a high-performance haptic device system called the Pantograph. The device is based on a two-degree-of-freedom parallel mechanism that was designed for optimized dynamic performance but is also kinematically well conditioned. The results show that the system can produce accurate tactile signals in the DC-400 Hz range and can resolve displacements on the order of 10 μm. Future improvements are discussed.

IROS Conference 2002 Conference Paper

A prototype virtual haptic bronchoscope

  • Qi Wang
  • Yongsheng Ou
  • Yangsheng Xu

In this paper, we describe the design of the hardware and software for a virtual bronchoscope with force feedback. A haptic interface allows surgeons to feel the reaction force of virtual pneumonic surgery as if they were touching the area directly. We present novel algorithms for haptic force rendering and examine their ability to display force. The rendering algorithms have been interfaced with a force-reflecting device. This virtual haptic bronchoscope is significant for training inexperienced doctors in pneumonic diagnosis and surgery.

ICRA Conference 2000 Conference Paper

On Tracking Control of Mobile Manipulators

  • Wenjie Dong
  • Yangsheng Xu
  • Qi Wang

This paper studies the tracking control problem of mobile manipulators with consideration of the interaction between the mobile platform and the manipulator. A global tracking controller is proposed based on the dynamics of the defined tracking error and the extended Barbalat's lemma. The proposed controller ensures that the full state of the system asymptotically tracks the given desired trajectory globally in the presence of the system coupling. Extensive simulations presented in the paper show the effectiveness of the proposed approach.

ICRA Conference 1998 Conference Paper

Towards Real-Time Robot Programming by Human Demonstration for 6D Force Controlled Actions

  • Qi Wang
  • Joris De Schutter

An approach for real-time robot programming by human demonstration for 6D force controlled actions is presented. A human operator utilises a joystick to guide a robot with a force sensor to execute a task including continuous contact between a manipulated object and an unmodelled environment. During the demonstration, the position, velocity and force of the manipulated object as well as the human commands via the joystick are recorded. In real time, the recorded information is translated into a textual robot program providing more robust execution in the presence of uncertainties. This approach has three main features: (1) online control type adjustment; (2) automatic subtask termination; (3) real-time program generation. Experiments show the potential industrial applicability.

IROS Conference 1996 Conference Paper

An environment for compliant motion programming by human demonstration

  • Sean Graves
  • Qi Wang
  • Wim Witvrouw
  • Joris De Schutter

An integrated system for programming by demonstration, visualizing, and executing compliant motion programs is described. A human operator utilises a joystick to guide a robot with a force sensor to do a task including continuous contact between manipulator and environment. The demonstration may be executed either on an actual robot or in a graphically simulated environment. During the demonstration, the position, velocity and force of the manipulated object are acquired. Then the recorded data are processed, analysed, and translated into a textual robot program, which provides more robust execution in the presence of uncertainties. The system is composed of a model-based reaction force simulator, a visualization package, a rule-based translator, and an interpreter for compliant motion programs. Experiments show the industrial applicability.

ICRA Conference 1996 Conference Paper

Derivation of compliant motion programs based on human demonstration

  • Qi Wang
  • Joris De Schutter
  • Wim Witvrouw
  • Sean Graves

An approach to force controlled robot programming by human demonstration is presented. A human operator utilises a joystick to guide a robot with a force sensor to do a task including continuous contact between a manipulated object and an unmodelled environment. During the demonstration, the position, velocity and force of the manipulated object are acquired. Then the recorded data are processed, analysed, and translated into a textual robot program, which provides more robust execution in the presence of uncertainties. This approach consists of three key techniques: data processing, subtask segmentation, and termination condition identification. A software package was developed to generate the programs automatically. Experiments show the industrial applicability.
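The demonstration-based papers above rely on subtask segmentation and termination condition identification, but the abstracts do not give the rules used. As a hedged illustration only, the sketch below splits a recorded force-magnitude trace into free-motion and contact segments using a threshold with hysteresis; each segment boundary is a candidate subtask termination point. The function name and the `enter`/`leave` thresholds (assumed to be in newtons) are hypothetical, and the authors' rule-based translator is likely more elaborate.

```python
def segment_by_contact(forces, enter=5.0, leave=2.0):
    """Split a force-magnitude trace into free-motion and contact segments.

    Hysteresis: contact starts when |F| rises above `enter` and ends when it
    drops below `leave`, which suppresses chattering from sensor noise.
    Returns a list of (start_index, end_index_exclusive, label) tuples.
    """
    segments = []
    in_contact = False
    start = 0
    for i, f in enumerate(forces):
        if not in_contact and f > enter:
            new_state = True
        elif in_contact and f < leave:
            new_state = False
        else:
            continue  # no state change at this sample
        if i > start:  # close the segment that just ended
            segments.append((start, i, "contact" if in_contact else "free"))
        start, in_contact = i, new_state
    if start < len(forces):  # close the final open segment
        segments.append((start, len(forces), "contact" if in_contact else "free"))
    return segments


# Hypothetical force magnitudes sampled during a demonstration.
forces = [0.1, 0.3, 6.0, 7.5, 4.0, 3.0, 1.0, 0.5]
print(segment_by_contact(forces))  # [(0, 2, 'free'), (2, 6, 'contact'), (6, 8, 'free')]
```

Note that the samples at 4.0 and 3.0 stay in the contact segment even though they fall below `enter`; with a single threshold they would have ended contact early.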