Author name cluster

Qi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

87 papers

2 author rows

EAAI Journal 2026 Journal Article

A Chinese financial event knowledge graph-based retrieval-augmented generation framework for financial question answering

Haitao Cheng
Ke Wang
Qi Wang
Tao Liu
Kai Sheng

Financial question answering in the Chinese domain presents significant challenges due to complex domain-specific terminology and the integration of heterogeneous financial research reports from multiple institutions. To address these issues, we propose a Chinese financial event knowledge graph-based retrieval-augmented generation framework. The framework constructs a structured index via semantic-aware text chunking and large language model-driven triplet extraction, incorporating a generation–verification mechanism to ensure reliable and relevant information retrieval. To mitigate vague or underspecified user queries that commonly occur in Chinese due to implicit expressions and unclear word boundaries, a reinforcement learning-based query reformulation module generates domain-specific representations, improving retrieval intent alignment. A dual-level retrieval mechanism is designed to retrieve core entities via semantic similarity and then expand event chains through knowledge graph-based neighbor expansion. Experimental results across three question types (single-hop, multi-hop, and open-ended) and four evaluation dimensions (comprehensiveness, diversity, empowerment, and overall performance) demonstrate that the proposed framework consistently outperforms baseline models, showing superior performance across various financial question answering tasks.
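The dual-level retrieval step described in the abstract above can be illustrated with a minimal sketch. It assumes entities already have embedding vectors and that the knowledge graph is a plain adjacency dictionary; the entity names, vectors, and the `dual_level_retrieve` function are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dual_level_retrieve(query_vec, entity_vecs, graph, top_k=1, hops=1):
    """Level 1: pick core entities by similarity. Level 2: expand graph neighbors."""
    ranked = sorted(
        entity_vecs,
        key=lambda name: cosine(query_vec, entity_vecs[name]),
        reverse=True,
    )
    core = ranked[:top_k]

    # Breadth-first neighbor expansion, one ring of neighbors per hop.
    seen = set(core)
    frontier = list(core)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return core, seen - set(core)


# Hypothetical toy data: three embedded events and a small event chain.
ENTITY_VECS = {
    "rate_cut": [1.0, 0.0],
    "earnings_miss": [0.0, 1.0],
    "bond_rally": [0.5, 0.5],
}
GRAPH = {
    "rate_cut": ["bond_rally"],
    "bond_rally": ["bank_margin_pressure"],
    "earnings_miss": ["share_drop"],
}

if __name__ == "__main__":
    print(dual_level_retrieve([0.9, 0.1], ENTITY_VECS, GRAPH, top_k=1, hops=2))
```

Separating the two levels keeps the similarity search small (entities only) while the graph walk supplies the chained events that a pure vector search would likely miss.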

Details DOI

AAAI Conference 2026 Conference Paper

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Boyu Chang
Qi Wang
Xi Guo
Zhixiong Nan
Yazhou Yao
Tianfei Zhou

Visual abductive reasoning (VAR) is a challenging task that requires AI systems to infer the most likely explanation for incomplete visual observations. While recent MLLMs have developed strong general-purpose multimodal reasoning capabilities, they still fall short of humans in abductive inference. To bridge this gap, we draw inspiration from the interplay between verbal and pictorial abduction in human cognition, and propose to strengthen the abduction of MLLMs by mimicking such dual-mode behavior. Concretely, we introduce AbductiveMLLM, comprising two synergistic components: REASONER and IMAGINER. The REASONER operates in the verbal domain. It first explores a broad space of possible explanations using a blind LLM and then prunes visually incongruent hypotheses based on cross-modal causal alignment. The remaining hypotheses are introduced into the MLLM as targeted priors, steering its reasoning toward causally coherent explanations. The IMAGINER, on the other hand, further guides MLLMs by emulating human-like pictorial thinking. It conditions a text-to-image diffusion model on both the input video and the REASONER’s output embeddings to “imagine” plausible visual scenes that correspond to the verbal explanations, thereby enriching MLLMs' contextual grounding. The two components are trained jointly in an end-to-end manner. Experiments on standard VAR benchmarks show that AbductiveMLLM achieves state-of-the-art performance, consistently outperforming traditional solutions and advanced MLLMs.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Beyond Retraining: Training-Free Unknown Class Filtering for Source-Free Open Set Domain Adaptation of Vision–Language Models

Yongguang Li
Jindong Li
Qi Wang
QianLi Xing
Runliang Niu
Shengsheng Wang
Menglin Yang

Vision-language models (VLMs) have gained widespread attention for their strong zero-shot capabilities across numerous downstream tasks. However, these models assume that each test image’s class label is drawn from a predefined label set and lack a reliable mechanism to reject samples from emerging unknown classes when only unlabeled data are available. To address this gap, open-set domain adaptation methods retrain models to push potential unknowns away from known clusters. Yet some unknown samples remain stably anchored to specific known classes in the VLM feature space due to semantic relevance, a phenomenon termed Semantic Affinity Anchoring (SAA). Forcibly repelling these samples unavoidably distorts the native geometry of VLMs and degrades performance. Meanwhile, existing score-based unknown detectors use simplistic thresholds and suffer from threshold sensitivity, resulting in sub-optimal performance. To address these issues, we propose VLM-OpenXpert, which comprises two training-free, plug-and-play inference modules. SUFF performs SVD on high-confidence unknowns to extract a low-rank "unknown subspace". Each sample’s projection onto this subspace is weighted and softly removed from its feature, suppressing unknown components while preserving semantics. BGAT corrects score skewness via a Box–Cox transform, then fits a bimodal Gaussian mixture to adaptively estimate the optimal threshold balancing known-class recognition and unknown-class rejection. Experiments on 9 benchmarks and three backbones (CLIP, SigLIP, ALIGN) under Source-Free OSDA settings show that our training-free pipeline matches or outperforms retraining-heavy state-of-the-art methods, establishing a powerful lightweight inference calibration paradigm for open-set VLM deployment.
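The "softly removed" projection mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the orthonormal basis of the unknown subspace is taken as given (the SVD step is omitted), and the scalar `weight` is a hypothetical stand-in, since the abstract does not say how the weighting is computed.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_remove(feature, basis, weight=1.0):
    """Subtract a weighted projection of `feature` onto span(basis).

    `basis` is assumed to be a list of orthonormal vectors.
    weight=1.0 removes the component entirely; weight=0.0 leaves the
    feature unchanged; values in between remove it partially.
    """
    out = list(feature)
    for u in basis:
        coeff = dot(feature, u)  # coordinate of the feature along u
        out = [o - weight * coeff * ui for o, ui in zip(out, u)]
    return out


if __name__ == "__main__":
    # Hypothetical 3-D feature with a one-dimensional unknown subspace.
    print(soft_remove([3.0, 4.0, 5.0], [[0.0, 0.0, 1.0]], weight=0.5))
```

Because the basis is orthonormal, each coefficient can be computed from the original feature independently, so the order of the basis vectors does not matter.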

PDF Details DOI

AAAI Conference 2026 Conference Paper

CoGrad3D: Spatially-Coupled Timestep Optimization with Orthogonal Gradient Fusion for 3D Generation

Haoyang Tong
Hongbo Wang
Jin Liu
Qi Wang
Jie Cao
Ran He

Score Distillation Sampling has driven recent advances in text-to-3D generation. However, current approaches often fail to produce 3D assets that are both rich in detail and consistent across viewpoints. These limitations primarily arise from imbalanced guidance on fine-grained details and an overdependence on single-view optimization—issues exacerbated by the excessive randomness in selecting diffusion timesteps and camera configurations. Such deficiencies commonly lead to blurry textures and inter-view inconsistencies, which degrade visual realism and hinder practical deployment. To tackle these challenges, we introduce CoGrad3D, a unified generative refinement framework that adopts a continuously adaptive optimization strategy. By dynamically modulating the optimization focus based on real-time convergence signals, CoGrad3D ensures balanced progress toward both geometric completeness and high-fidelity detail. Concretely, we propose an adaptive region sampling strategy that emphasizes under-converged viewing areas, promoting stable and uniform optimization. To facilitate the transition from coarse geometry to fine-grained reconstruction, we develop a region-aware temporal scheduling scheme that integrates global training dynamics with local convergence feedback. Furthermore, we introduce a gradient fusion mechanism that consolidates historical gradients from adjacent viewpoints, mitigating view-specific artifacts and promoting the emergence of coherent 3D structures. Extensive experiments demonstrate that CoGrad3D substantially surpasses existing methods in both geometric consistency and texture fidelity, enabling the generation of high-quality, view-consistent 3D models from textual descriptions.

PDF Details DOI

EAAI Journal 2026 Journal Article

Deep learning-aided Laser Doppler Velocimeter-Inertial Measurement Unit Fusion for Robust Vehicle Localization in Global Navigation Satellite Systems-denied environments

Zhiyi Xiang
Qi Wang
Xiaoming Nie
Jian Zhou

Achieving reliable and precise vehicle positioning is paramount for modern autonomous systems, yet it remains a formidable challenge in Global Navigation Satellite Systems (GNSS)-denied environments, especially when relying on ubiquitous low-cost Micro-Electro-Mechanical Systems (MEMS) Inertial Measurement Units (IMUs). This paper introduces a solution that enhances MEMS IMU capabilities by integrating two symmetrically mounted dual-beam Laser Doppler Velocimeters (LDVs). Our core innovation lies in leveraging two specialized Long Short-Term Memory (LSTM) networks that robustly regress the vehicle’s yaw and lateral velocities by effectively fusing both LDV and IMU outputs. To further elevate system accuracy, we propose an LDV outlier handling strategy and a method for LSTM prediction reliability detection designed to mitigate the adverse effects of anomalous network outputs. The vehicle velocities from the LDVs, augmented by our LSTM-derived yaw and lateral velocities, are then fused with MEMS IMU data within a Lie group-based Kalman filter. Experimental validation through two rigorous test sets demonstrates that our method significantly reduces system positioning errors under prolonged GNSS-denied conditions, outperforming existing LDV-based methods. This work underscores the potential of combining precise LDV measurements with the predictive power of deep learning and a robust Lie group-based data fusion strategy for accurate and reliable autonomous vehicle localization.

Details DOI

JBHI Journal 2026 Journal Article

Dual-Student Adversarial Framework With Discriminator and Consistency-Driven Learning for Semi-Supervised Medical Image Segmentation

Haifan Wu
Yuhan Geng
Di Gai
Jieying Tu
Xin Xiong
Qi Wang
Zheng Huang

Semi-supervised medical image segmentation is essential for alleviating the cost of manual annotation in clinical applications. However, existing methods often suffer from unreliable pseudo-labels and confirmation bias in consistency-based training, which can lead to unstable optimization and degraded performance. To address these issues, a novel method named dual-Student adversarial framework with discriminator and consistency-driven learning for semi-supervised medical image segmentation is proposed. Specifically, an adversarial learning-based segmentation refinement (ALSR) module is designed to encourage prediction diversity between two student networks and leverage a shared discriminator for adversarial refinement of pseudo-labels. To further stabilize the consistency process, a residual exponential moving average (R-EMA) is applied in the uncertainty estimation with inter-instance consistency measurement (UIM) module to construct a robust teacher model, while noisy voxel predictions are selectively filtered based on uncertainty estimation. In addition, a Contrastive Representation Stabilization (CRS) module is developed to enhance voxel-level semantic alignment by performing contrastive learning only on confident regions, improving feature discriminability and structural consistency. Extensive experiments on benchmark datasets demonstrate that our method consistently outperforms prior state-of-the-art approaches.

Details DOI

AAAI Conference 2026 Conference Paper

Exploring Generalizable Remote Sensing Change Detection via Low-Rank Exchange Adaptation of Vision Foundation Model

Mingwei Zhang
Jingtao Hu
Qiang Li
Qi Wang

Remote sensing change detection (CD) has achieved remarkable progress in recent years. However, little attention has been paid to generalizable change detection (GCD) methods that can effectively generalize to unseen scenarios or domains beyond the training distribution. The major challenges in GCD arise from domain diversity and bitemporal domain shifts in remote sensing images, caused by variations in imaging platforms, acquisition times, geographic regions, and observed events. To tackle these challenges, we propose GenCD, a GCD framework built upon vision foundation models (VFMs). Specifically, GenCD introduces two key components: (1) a Low-Rank Exchange Adaptation (LREA) strategy of VFMs that aligns bitemporal representations while preserving the generalization capacity of VFMs on single-temporal inputs; and (2) a Token-Guided Feature Refinement (TGFR) mechanism that leverages an input-independent token as a guide to refine difference features, improving the discrimination between changed and unchanged regions. We conduct extensive cross-dataset evaluations on eight diverse datasets across three binary CD tasks: land cover, land use, and building-only CD. The results consistently demonstrate the superior generalization of GenCD over SoTA methods, highlighting its effectiveness in GCD.

PDF Details DOI

AAAI Conference 2026 Conference Paper

HISE-KT: Synergizing Heterogeneous Information Networks and LLMs for Explainable Knowledge Tracing with Meta-Path Optimization

Zhiyi Duan
Zixing Shi
Hongyu Yuan
Qi Wang

Knowledge Tracing (KT) aims to mine students’ evolving knowledge states and predict their future question-answering performance. Existing methods based on heterogeneous information networks (HINs) are prone to introducing noise due to manual or random selection of meta-paths and lack the necessary quality assessment of meta-path instances. Conversely, recent large language model (LLM)-based methods ignore the rich information across students, and both paradigms struggle to deliver consistently accurate and evidence-based explanations. To address these issues, we propose an innovative framework, HIN-LLM Synergistic Enhanced Knowledge Tracing (HISE-KT), which seamlessly integrates HINs with LLMs. HISE-KT first builds a multi-relationship HIN containing diverse node types to capture the structural relations through multiple meta-paths. The LLM is then employed to intelligently score and filter meta-path instances and retain high-quality paths, pioneering automated meta-path quality assessment. Inspired by educational psychology principles, a similar-student retrieval mechanism based on meta-paths is designed to provide a more valuable context for prediction. Finally, HISE-KT uses a structured prompt to integrate the target student's history with the retrieved similar trajectories, enabling the LLM to generate not only accurate predictions but also evidence-backed, explainable analysis reports. Experiments on four public datasets show that HISE-KT outperforms existing KT baselines in both prediction performance and interpretability.

PDF Details DOI

JBHI Journal 2026 Journal Article

HyperSynergyX: Synergistic Drug Combination Prediction via Hypergraph Modeling and Knowledge Graph-Enhanced Retrieval-Augmented Generation

Qi Wang
Bingzheng Wu
Minglang Xu
Xiya Liu
Yiming Mao
Zhiheng Zhou
Guiying Yan

Drug combination therapy is pivotal for complex diseases, but identifying synergistic three-drug regimens remains challenging due to both combinatorial explosion and the opacity of existing computational models. To address this, we introduce HyperSynergyX, an explainable framework that integrates synergy prediction with mechanistic explanation. Its core predictive component, a Dual-Biased Random Walk on Hypergraphs (DBRWH), models higher-order interactions among drugs on a three-drug hypergraph and identifies latent combination patterns via tensor decomposition. To enhance interpretability, we couple DBRWH with a knowledge graph-enhanced retrieval-augmented generation (KG-RAG) module that retrieves mechanistically relevant subgraphs and uses them to generate biologically grounded hypotheses for predicted synergies. On breast cancer data, DBRWH achieves AUROC/AUPRC of 0.9593/0.9453 under 5-fold cross-validation, and on lung cancer data it achieves 0.9262/0.9481, outperforming strong deep learning and hypergraph baselines. By linking predictive performance with mechanistic interpretability, HyperSynergyX provides a robust and transparent tool to accelerate multi-drug discovery and support rational regimen design in precision oncology. The code is available at: https://github.com/wangqi27/HyperSynergyX.

Details DOI

AAAI Conference 2026 Conference Paper

PIMRL: Physics-Informed Multi-Scale Recurrent Learning for Burst-Sampled Spatiotemporal Dynamics

Han Wan
Qi Wang
Yuan Mi
Rui Zhang
Hao Sun

Deep learning has shown strong potential in modeling complex spatiotemporal dynamics. However, most existing methods depend on densely and uniformly sampled data, which is often unavailable in practice due to sensor and cost limitations. In many real-world settings, such as mobile sensing and physical experiments, data are burst-sampled with short high-frequency segments followed by long gaps, making it difficult to learn accurate dynamics from sparse observations. To address this issue, we propose Physics-Informed Multi-Scale Recurrent Learning (PIMRL), a novel framework specifically designed for burst-sampled spatiotemporal data. PIMRL combines macro-scale latent dynamics inference with micro-scale adaptive refinement guided by incomplete prior information from partial differential equations (PDEs). It further introduces a temporal message-passing mechanism to effectively propagate information across burst intervals. This multi-scale architecture enables PIMRL to model complex systems accurately even under severe data scarcity. We evaluate our approach on five benchmark datasets involving 1D to 3D multi-scale PDEs. The results show that PIMRL consistently outperforms state-of-the-art baselines, achieving substantial improvements and reducing errors by up to 80% in the most challenging settings, which demonstrates the clear advantage of our model. Our work demonstrates the effectiveness of physics-informed recurrent learning for accurate and efficient modeling of sparse spatiotemporal systems.

PDF Details DOI

EAAI Journal 2026 Journal Article

Predicting dielectric properties of polyetherimide-based composite via combined molecular dynamics simulation and machine learning

Yue Zhang
Zheng Gong
Changhai Zhang
Yongquan Zhang
Chao Yin
Xubin Wang
Tiandong Zhang
Xiajie Yi

The design of high-performance polymer dielectrics for capacitor energy storage is crucial but often hindered by time-consuming, resource-intensive development cycles. Polyetherimide is a promising matrix material, yet its performance is limited by a low dielectric constant and breakdown strength. To accelerate the design process, we propose and validate an integrated computational framework combining molecular dynamics simulations with interpretable machine learning. In terms of the Artificial Intelligence contribution, a weighted ensemble model was developed from a dual database of molecular dynamics parameters and molecular descriptors to predict the dielectric properties of Polyetherimide-based composites. The model was then deconstructed using the SHapley Additive exPlanations framework, which unveiled a multi-scale design hierarchy. This analysis revealed that filler weight fraction and intrinsic dielectric constant are the most dominant predictors, followed by interfacial compatibility and molecular polarity. Regarding the engineering application, to validate our computational approach, model-selected Benzil and Acetophenone were fabricated into composite films. Experimental results confirmed the model's high accuracy, identifying optimal contents of 15 wt% for Benzil and 10 wt% for Acetophenone. Notably, the Polyetherimide-based composite with 10 wt% of Acetophenone achieved an excellent discharge energy density of 10.3 J/cm³, representing a 58% enhancement over pristine Polyetherimide. Ultimately, this study not only developed a promising material but also established a reliable data-driven methodology providing clear guidance for designing next-generation polymer dielectrics.

Details DOI

AAAI Conference 2026 Conference Paper

Reasoning via Implicit Self-supervised Emergence for Instruction Segmentation

Qing Zhou
Lichang Yang
Yuyu Jia
Junyu Gao
Weiping Ni
Junzheng Wu
Qi Wang

We challenge the assumption that complex instruction-guided segmentation tasks necessitate equally complex and explicit supervision. This paper introduces RISE (Reasoning via Implicit Self-supervised Emergence), a framework that learns intricate compositional reasoning, spanning spatial relations to world knowledge, without a single ground-truth mask. To achieve this, RISE employs reinforcement learning with GRPO guided by a single, strikingly simple reward: the semantic alignment score between the textual instruction and the predicted image region. Our primary discovery is the implicit emergence of a high-quality chain-of-thought process from this minimalist signal. Within a structured format, the model autonomously learns to understand instructions by accessing its latent knowledge and inferring spatial relationships, capabilities that are inherent in its architecture but unlocked by our simple objective. Remarkably, our emergent reasoning yields highly competitive results: RISE achieves 58.7 gIoU on the ReasonSeg benchmark, on par with methods using geometric rewards. Furthermore, we show extreme data efficiency: a variant trained on only 2,000 ImageNet-label pairs establishes a new state-of-the-art for annotation-free referring segmentation with 79.6 cIoU on RefCOCO.

PDF Details DOI

EAAI Journal 2026 Journal Article

Research on cable terminal interface defect state detection based on electric field characteristics and multi-core improved support vector machine

Yujing Tang
Yang Fu
Qin Cai
Jieping Wu
Qi Wang
Guoqiang Gao

As key equipment for high-speed rail power transmission and the connection of high-voltage systems, cable terminals are crucial to ensuring the stable operation of the railway system. However, existing detection methods for cable terminals are easily affected by on-site noise and have low detection accuracy. Therefore, this paper proposes a method for detecting the interface defect status of high-speed cable terminals based on the electric field strength feature set and a multi-kernel support vector machine (MK-SVM). Firstly, a spatial electric field detection platform was built to extract the electric field intensity of the prefabricated defective cable terminals of different lengths. Secondly, the characteristic parameters of the electric field strength of defective cable terminals were optimized using the Pearson coefficient method. To improve the recognition effect and model generalization ability, an MK-SVM combining a linear kernel function and a radial basis kernel function was proposed. Finally, a comparative study was conducted on the optimization effects of the particle swarm algorithm, firefly algorithm, simulated annealing algorithm, and genetic algorithm on the MK-SVM. The results show that using the genetic algorithm for parameter optimization of the multi-kernel SVM gives the best performance, with recognition accuracy, average precision, average recall, and average F1 score of 95.6%, 96%, 95.6%, and 0.96, respectively. Compared with the unoptimized SVM, these four metrics increased by 8.9%, 7.9%, 8.9%, and 9.6%, respectively.
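A kernel that combines a linear and a radial basis function, as the abstract above describes, can be sketched as a weighted sum. The convex-combination form, the `weight` and `gamma` parameters, and the function names are assumptions for illustration; the abstract does not state how the two kernels are mixed.

```python
import math


def linear_kernel(x, y):
    """Plain inner product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def multi_kernel(x, y, weight=0.5, gamma=1.0):
    """Convex combination of the linear and RBF kernels.

    With 0 <= weight <= 1 the result is still a valid (positive
    semi-definite) kernel, because a non-negative weighted sum of valid
    kernels is itself a valid kernel.
    """
    return weight * linear_kernel(x, y) + (1.0 - weight) * rbf_kernel(x, y, gamma)


if __name__ == "__main__":
    print(multi_kernel([1.0, 2.0], [1.5, 1.0], weight=0.5, gamma=1.0))
```

In a setup like this, `weight` and `gamma` (together with the SVM penalty term) would be the natural targets for a search procedure such as the genetic algorithm mentioned in the abstract.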

Details DOI

AAAI Conference 2026 Conference Paper

Slender3D: Curve-Guided Multi-View Reconstruction of Slender Structures

Suqin Wang
Zeyi Wang
Min Shi
Zhaoxin Li
Qi Wang
Xiujuan Chai
Dengming Zhu

Although geometric reconstruction of general objects from images has made remarkable progress in recent years, slender structures remain largely underexplored, despite their critical importance in engineering, biomedical, and agricultural applications. To bridge this gap, we propose a dedicated 2DGS-based geometric reconstruction framework tailored for slender structures, achieving accurate and faithful geometry recovery. Our method first addresses the challenge that most slender objects are texture-less, which hinders reliable feature matching and pose estimation in traditional SfM pipelines. By leveraging the curve-like nature of slender structures, we perform a curve-guided SfM process that provides robust camera poses and accurate 3D curve initialization for Gaussian primitives. To ensure SfM reliability, we introduce a high-precision mask extraction strategy that integrates geometric priors with a segmentation network, effectively handling self-occlusion and thin geometry. Furthermore, to enhance fine geometric recovery, we incorporate a differentiable Poisson reconstruction module to extract an initial mesh during training, which is then refined via image-space iterative optimization using differentiable mesh rasterization. In contrast to conventional approaches that rely on differentiable Gaussian rasterization followed by TSDF-based mesh extraction, our method avoids the additional geometric errors and artifacts introduced during the intermediate TSDF conversion, thereby improving the overall reconstruction quality. Comprehensive experiments on both synthetic and real-world datasets validate that our method achieves superior reconstruction quality compared to state-of-the-art approaches.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts

Qi Wang
Hanyang Peng
Yue Yu

Mixture-of-Experts (MoE) models enable scalable performance by activating large parameter sets sparsely, minimizing computational overhead. To mitigate the prohibitive cost of training MoEs from scratch, recent work employs upcycling, reusing a single pre-trained dense model by replicating its feed-forward network (FFN) layers into experts. However, this limits expert diversity, as all experts originate from a single pre-trained dense model. This paper addresses this limitation by constructing powerful MoE models using experts sourced from multiple identically-architected but disparate pre-trained models (e.g., Qwen2.5-Coder and Qwen2). A key challenge lies in the fact that these source models occupy disparate, dissonant regions of the parameter space, making direct upcycling prone to severe performance degradation. To overcome this, we propose Symphony-MoE, a novel two-stage framework designed to harmonize these models into a single, coherent expert mixture. First, we establish this harmony in a training-free manner: we construct a shared backbone via a layer-aware fusion strategy and, crucially, alleviate parameter misalignment among experts using activation-based functional alignment. Subsequently, a stage of post-training coordinates the entire architecture. Experiments demonstrate that our method successfully integrates experts from heterogeneous sources, achieving an MoE model that significantly surpasses baselines in multi-domain tasks and out-of-distribution generalization.
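The sparse activation that the abstract above attributes to MoE models can be illustrated with a generic top-k gating sketch. This is a common routing scheme shown for orientation only, not the Symphony-MoE router; the toy experts and function names are hypothetical.

```python
import math


def top_k_route(router_logits, k=2):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Gate weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the routed experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(gate * experts[i](x) for i, gate in gates.items())


if __name__ == "__main__":
    # Hypothetical scalar "experts"; real experts would be FFN layers.
    experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
    print(moe_forward(3.0, experts, [0.0, 0.0, -5.0], k=2))
```

Only `k` experts run per input, which is why total parameter count can grow without a matching growth in per-token compute.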

PDF Details DOI

AAAI Conference 2026 Conference Paper

TAPO: Dynamic Teacher and Perturbed Answer Injection for Policy Optimization

Maowei Jiang
Zihang Wang
Qi Wang
Peter Búš
Moquan Cheng
Yifan Wang
Quangao Liu
Ruiqi Li

Reinforcement learning (RL) has emerged as a powerful framework to improve the reasoning performance of large language models (LLMs), with approaches such as Group Relative Policy Optimization (GRPO) showing promising results. However, GRPO and its variants struggle with collapsed groups (i.e., all-correct or all-incorrect completions), leading to zero-variance rewards and ineffective gradient signals. Moreover, focusing solely on final answer correctness while ignoring the reasoning process, along with rigid length penalties, can hinder training stability and output quality. To address these issues, we introduce TAPO, a reinforcement learning framework that enhances optimization signals by modifying sampled completions within training groups. TAPO incorporates three core techniques: (1) Dynamic Teacher Injection (DTI), which selectively injects high-quality or adversarial examples to restore effective gradient signals in collapsed groups; (2) Perturbed Answer Injection (PAI), which creates partially correct completions, with correct reasoning but a wrong final answer, to provide contrastive supervision that separates reasoning correctness from answer correctness within the trajectories; and (3) InfoLen-Aware Reward Shaping, a fine-grained reward strategy that penalizes outputs based on both length and semantic redundancy, encouraging concise yet informative responses. Extensive experimental results demonstrate that TAPO significantly improves the mathematical reasoning capabilities of LLMs across multiple challenging benchmarks, outperforming the GRPO baseline by a substantial margin. Component-wise ablations further validate the contribution of each proposed technique.
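The collapsed-group problem described in the abstract above follows directly from how group-relative advantages are normalized. The sketch below uses the commonly cited form (reward minus group mean, divided by group standard deviation); the `eps` constant and the use of the population standard deviation are assumptions, and the injection step shown in the demo is a hypothetical illustration rather than TAPO's actual rule.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; 0.0 for a collapsed group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Collapsed group: every completion is wrong, so every advantage is zero
    # and the policy gradient carries no signal.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))

    # Hypothetical injection: swap one completion for a correct one, which
    # restores reward variance and therefore non-zero advantages.
    print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
```

The same zero-signal outcome occurs for an all-correct group, since only the spread of rewards within the group matters, not their absolute level.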

PDF Details DOI

AAAI Conference 2026 Conference Paper

Target-Balanced Score Distillation

Zhou Xu
Qi Wang
Yuxiao Yang
Luyuan Zhang
Zhang Liang
Yang Li

Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate this issue, recent variants have incorporated negative prompts. However, these methods face a critical trade-off: limited texture optimization, or significant texture gains with shape distortion. In this work, we first conduct a systematic analysis and reveal that this trade-off is fundamentally governed by the utilization of the negative prompts, where Target Negative Prompts (TNP) that embed target information in the negative prompts dramatically enhance texture realism and fidelity but induce shape distortions. Informed by this key insight, we introduce Target-Balanced Score Distillation (TBSD). It formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that effectively resolves the aforementioned trade-off. Extensive experiments demonstrate that TBSD significantly outperforms existing state-of-the-art methods, yielding 3D assets with high-fidelity textures and geometrically accurate shapes.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.
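The neighborhood constraint in the abstract above restricts target actions to the union of neighborhoods around dataset actions, with a radius that can differ per data point. A membership test for that set can be sketched as follows; the Euclidean norm, the omission of state conditioning, and the function name are simplifying assumptions, not details confirmed by the abstract.

```python
import math


def in_neighborhood_union(action, dataset_actions, radii):
    """True if `action` lies within radius r_i of at least one dataset action a_i.

    `radii` holds one radius per dataset action, which allows pointwise
    conservatism: a small radius keeps the policy close to that sample,
    a large radius grants more freedom around it.
    """
    return any(
        math.dist(action, a) <= r
        for a, r in zip(dataset_actions, radii)
    )


if __name__ == "__main__":
    dataset_actions = [[0.0, 0.0], [1.0, 1.0]]
    radii = [0.1, 0.5]  # hypothetical per-sample radii
    print(in_neighborhood_union([1.2, 1.0], dataset_actions, radii))
    print(in_neighborhood_union([0.3, 0.0], dataset_actions, radii))
```

Shrinking every radius toward zero recovers a sample-style constraint (only dataset actions are allowed), while larger radii loosen it, which is one way to read the flexibility the abstract claims.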

PDF Details

AAAI Conference 2025 Conference Paper

CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification

Jiyang Xu
Qi Wang
Xin Xiong
Di Gai
Ruihua Zhou
Dong Wang

With the emergence of vision-language pre-trained models, such as CLIP, textual prompts have recently been introduced into re-identification (Re-ID) tasks to obtain considerably robust multimodal information. However, most textual descriptions used in vehicle Re-ID tasks contain only identity index words, without specific words describing vehicle view information, which makes them difficult to apply widely in vehicle Re-ID tasks with view variations. This motivates us to propose a CLIP-driven view-aware prompt learning framework for unsupervised vehicle Re-ID. We first design a learnable textual prompt template called view-aware context optimization (ViewCoOp) based on dynamic multi-view word embeddings, which can fully obtain the proportion and position encoding of each view in the whole vehicle body region. Subsequently, a cross-modal mutual graph is constructed to explore inter-modal and intra-modal connections. Each sample is treated as a graph node, which extracts textual features based on ViewCoOp and the visual features of images. Moreover, the connectivity between graph node pairs is determined by leveraging the inter-cluster and intra-cluster correlations in the bimodal clustering results. Lastly, the proposed cross-modal mutual graph method utilizes supervised information from the bimodal gap to directly fine-tune the image encoder of CLIP for downstream unsupervised vehicle Re-ID tasks. Extensive experiments verify that the proposed method is capable of effectively obtaining cross-modal description ability from multiple views.

PDF Details DOI

EAAI Journal 2025 Journal Article

Development of intelligent equipment for weed identification and variable spraying in lettuce fields based on instance segmentation framework

Long-Tao Niu
Wen-Hao Su
He-Yi Zhang
Qi Wang
Bo-Wen Dong
Yankun Peng

Weeds in the field compete with crops for nutrients, water and sunlight, hindering the early growth of crops. If not controlled in time, weeds may adversely affect crop growth and yield. Although chemical weed control is low cost, efficient and widely applicable, excessive use of chemical agents may lead to herbicide residues and environmental pollution. In this study, an instance segmentation-based intelligent equipment was developed for weed recognition and targeted variable-rate spraying in lettuce fields. The You-Only-Look-Once version 8 segmentation (YOLOv8-seg) model was optimized through three key enhancements. Initially, Depthwise Separable Convolution (DSConv) was adopted to replace standard convolutional layers, effectively reducing model complexity, and improving computational efficiency. After that, a novel Faster Implementation of Cross Stage Partial Bottleneck with 2 Convolutions-Star shaped Convolutional (C2f_Star) module was proposed, which integrated the StarBlock from the Star-shaped Convolutional Neural Network (StarNet) into the existing structure, thereby enhancing the feature extraction capabilities of the model. Finally, the Simple Attention Module (SimAM), a parameter-free attention mechanism, was introduced to improve the model's attention to relevant features without increasing the number of parameters. These improvements led to the development of the YOLOv8n-seg model, which achieved a mean Average Precision (mAP) of 90.15% at 0.5 Intersection over Union (IoU), with 2,281,702 parameters and an inference speed of 15.7 ms per frame. Compared with the original model, the average precision and inference speed increased by 2.65% and 4.3%, respectively, while the number of parameters was reduced by 30%. By combining this model with post-processing algorithms, a precision variable spraying algorithm and equipment were developed. Laboratory experiments at three different weed density levels demonstrated that the system achieved an average recognition accuracy of 95.2% and a target spraying success rate of 97.2% for weeds in lettuce fields. Herbicide dosage was reduced by 88.42%, 65.25%, and 37.30% at the three density levels, respectively. This research provides essential theoretical and technical support for the development of precision spraying and weeding robots.

Details DOI

JBHI Journal 2025 Journal Article

Edge-Guided Multi-Scale Frequency Attention Network for Gastrointestinal Cancer Image Segmentation

Zhiwen Liao
Qi Wang
Xinyi Tang
Han Wang
Jun Hu
Pengxiang Su
Evangelos K. Markakis
Peng Luo

Image segmentation is a critical technology to improve the accuracy of clinical decisions and treatments in computer-aided diagnostic systems. However, the diverse morphology and fuzzy boundaries of gastrointestinal tumors pose substantial challenges for existing segmentation models, leading to inaccurate feature capture and suboptimal results. To solve these problems, we design an edge-guided multi-scale frequency attention network for the gastrointestinal tumor segmentation task, termed EGMFA-Net, which consists of a Kernel Adaptive Enhancement Module (KAEM) and a Frequency-domain Self-attention Module (FDSA). Specifically, KAEM adaptively adjusts the feature extraction kernel based on the morphology of different lesion regions, which enhances the recognition of different morphology regions via a progressive optimization strategy of feature expression. Furthermore, FDSA effectively aggregates multi-scale features in the frequency domain to achieve global receptive fields while preserving more high-frequency details, thereby enhancing adaptability to complex pathological contexts. Extensive experiments on eight medical image benchmark datasets, including SEED, Kvasir, ClinicDB, ColonDB, ETIS, BKAI, CVC-300, and Synapse, show that EGMFA-Net attains state-of-the-art performance over existing methods. Our implementation is available at https://github.com/med-segment/egmfa-net.

Details DOI

NeurIPS Conference 2025 Conference Paper

Gains: Fine-grained Federated Domain Adaptation in Open Set

Zhengyi Zhong
Wenzheng Jiang
Weidong Bao
Ji Wang
Qi Wang
Guanbo Wang
Yongheng Deng
Ju Ren

Conventional federated learning (FL) assumes a closed world with a fixed total number of clients. In contrast, new clients continuously join the FL process in real-world scenarios, introducing new knowledge. This raises two critical demands: detecting new knowledge, i.e., knowledge discovery, and integrating it into the global model, i.e., knowledge adaptation. Existing research focuses on coarse-grained knowledge discovery, and often sacrifices source domain performance and adaptation efficiency. To this end, we propose a fine-grained federated domain adaptation approach in open set (Gains). Gains splits the model into an encoder and a classifier, empirically revealing features extracted by the encoder are sensitive to domain shifts while classifier parameters are sensitive to class increments. Based on this, we develop fine-grained knowledge discovery and contribution-driven aggregation techniques to identify and incorporate new knowledge. Additionally, an anti-forgetting mechanism is designed to preserve source domain performance, ensuring balanced adaptation. Experimental results on multi-domain datasets across three typical data-shift scenarios demonstrate that Gains significantly outperforms other baselines in performance for both source-domain and target-domain clients. Code is available at: https://github.com/Zhong-Zhengyi/Gains.

PDF Details

AAAI Conference 2025 Conference Paper

GTDE: Grouped Training with Decentralized Execution for Multi-agent Actor-Critic

Mengxian Li
Qi Wang
Yongjun Xu

The rapid advancement of multi-agent reinforcement learning (MARL) has given rise to diverse training paradigms to learn the policies of each agent in the multi-agent system. The paradigms of decentralized training and execution (DTDE) and centralized training with decentralized execution (CTDE) have been proposed and widely applied. However, as the number of agents increases, the inherent limitations of these frameworks significantly degrade the performance metrics, such as win rate, total reward, etc. To reduce the influence of the increasing number of agents on the performance metrics, we propose a novel training paradigm of grouped training decentralized execution (GTDE). This framework eliminates the need for a centralized module and relies solely on local information, effectively meeting the training requirements of large-scale multi-agent systems. Specifically, we first introduce an adaptive grouping module, which divides each agent into different groups based on their observation history. To implement end-to-end training, GTDE uses Gumbel-Sigmoid for efficient point-to-point sampling on the grouping distribution while ensuring gradient backpropagation. To adapt to the uncertainty in the number of members in a group, two methods are used to implement a group information aggregation module that merges member information within the group. Empirical results show that in a cooperative environment with 495 agents, GTDE increased the total reward by an average of 382% compared to the baseline. In a competitive environment with 64 agents, GTDE achieved a 100% win rate against the baseline.
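The Gumbel-Sigmoid sampling mentioned in the abstract above can be sketched as follows. This is a generic forward-pass illustration of the relaxation, not GTDE's grouping module: the function names, the temperature default, and the 0.5 threshold are assumptions, and the gradient path (for example a straight-through estimator in an autodiff framework) is not shown because plain Python has no automatic differentiation.

```python
import math
import random

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gumbel_sigmoid_sample(logit, temperature=1.0, rng=random):
    """Draw a relaxed Bernoulli sample in [0, 1] from a membership logit.

    Hypothetical helper: the difference of two Gumbel(0, 1) draws is
    Logistic(0, 1) noise, which is sampled here directly from a uniform.
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / temperature)

def hard_membership(soft_sample, threshold=0.5):
    """Discretize a relaxed sample into a 0/1 group-membership decision."""
    return 1 if soft_sample > threshold else 0
```

Lower temperatures push the relaxed samples toward 0 or 1, while the hard decision is 1 with probability equal to the sigmoid of the logit regardless of temperature.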

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting

Bing He
Yunuo Chen
Guo Lu
Qi Wang
Qunshan Gu
Rong Xie
Li Song
Wenjun Zhang

Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points, and for compactness, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods predominantly rely on gradient-based optimization for all motion information. Due to the high degree of freedom, they struggle to converge on real-world datasets exhibiting complex motion. To preserve the compactness of motion representation and address convergence challenges, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained using a hybrid strategy combining optical flow back-projection and gradient-based methods. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, components of 3D motion that project onto the image plane are directly acquired via optical flow back-projection, while unobservable portions are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques. Remarkably, our method converges within just 100 iterations and achieves a per-frame processing speed of 2 seconds on a single NVIDIA RTX 4070 GPU.

PDF Details

IJCAI Conference 2025 Conference Paper

Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly

Ruiyuan Zhang
Qi Wang
Jiaxiang Liu
Yuchi Huo
Chao Wu

3D part assembly aims to understand part relationships and predict their 6-DoF poses to construct realistic 3D shapes, addressing the growing demand for autonomous assembly, which is crucial for robots. Existing methods mainly estimate the transformation of each part by training neural networks under supervision, which requires a substantial quantity of manually labeled data. However, the high cost of data collection and the immense variability of real-world shapes and parts make traditional methods impractical for large-scale applications. In this paper, we propose the first zero-shot part assembly method that utilizes pre-trained point cloud diffusion models as discriminators in the assembly process, guiding the manipulation of parts to form realistic shapes. Specifically, we theoretically demonstrate that utilizing a diffusion model for zero-shot part assembly can be transformed into an Iterative Closest Point (ICP) process. Then, we propose a novel pushing-away strategy to address overlapping parts, thereby further enhancing the robustness of the method. To verify our work, we conduct extensive experiments and quantitative comparisons with several strong baseline methods, demonstrating the effectiveness of the proposed approach, which even surpasses the supervised learning method. The code has been released at https://github.com/Ruiyuan-Zhang/Zero-Shot-Assembly.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

PeSANet: Physics-encoded Spectral Attention Network for Simulating PDE-Governed Complex Systems

Han Wan
Rui Zhang
Qi Wang
Yang Liu
Hao Sun

Accurately modeling and forecasting complex systems governed by partial differential equations (PDEs) is crucial in various scientific and engineering domains. However, traditional numerical methods struggle in real-world scenarios due to incomplete or unknown physical laws. Meanwhile, machine learning approaches often fail to generalize effectively when faced with scarce observational data and the challenge of capturing local and global features. To this end, we propose the Physics-encoded Spectral Attention Network (PeSANet), which integrates local and global information to forecast complex systems with limited data and incomplete physical priors. The model consists of two key components: a physics-encoded block that uses hard constraints to approximate local differential operators from limited data, and a spectral-enhanced block that captures long-range global dependencies in the frequency domain. Specifically, we introduce a novel spectral attention mechanism to model inter-spectrum relationships and learn long-range spatial features. Experimental results demonstrate that PeSANet outperforms existing methods across all metrics, particularly in long-term forecasting accuracy, providing a promising solution for simulating complex systems with limited data and incomplete physics.

PDF Details DOI

EAAI Journal 2025 Journal Article

Quantization-based deep diversified ensemble for medical image segmentation

Jiawei Zhang
Jialin Wang
Qi Wang
Yanchun Zhang
Weihong Han
Yangyang Mei
Yiyu Shi
Jian Zhuang

Recent advancements in fully convolutional networks (FCNs) have significantly improved medical image segmentation. Ensemble methods are often used to further enhance performance, with diversity among learners being a critical factor. However, many current approaches focus on diversifying training samples or predictions while overlooking the diversity of internal multi-scale features. This oversight can lead to high correlations among features across different learners, limiting overall effectiveness. Additionally, traditional quantization methods aim to minimize accuracy loss by maintaining a rigid quantization process. This rigidity can eliminate the randomness introduced by quantization, further reducing ensemble diversity and effectiveness. In this paper, we propose a novel approach called Quantization-based Deep Diversified Ensemble (QDD-Ens) for medical image segmentation. Our method enhances the diversity of internal features among ensemble learners through two mechanisms: deep diversified loss, which focuses on feature diversity rather than segmentation accuracy, and deep diversified quantization, which preserves beneficial randomness in the quantization process. Furthermore, QDD-Ens facilitates a deeper form of ensemble learning by employing a meta-learner to integrate diversified features at multiple resolution levels from various base learners, which are diversified by the two diversity-enhancement mechanisms above. Extensive experiments on five public medical image segmentation datasets show that our method significantly improves segmentation accuracy and outperforms existing ensemble techniques. The source code is publicly available to support future research (https://github.com/JerRuy/QDD-Ens).

Details DOI

JBHI Journal 2025 Journal Article

Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction

Hao Zhang
Qi Wang
Jian Sun
Zhijie Wen
Jun Shi
Shihui Ying

Magnetic Resonance Imaging (MRI) is widely used in clinical practice, but suffers from prolonged acquisition time. Although deep learning methods have been proposed to accelerate acquisition and demonstrate promising performance, they rely on high-quality fully-sampled datasets for training in a supervised manner. However, such datasets are time-consuming and expensive to collect, which constrains their broader applications. On the other hand, self-supervised methods offer an alternative by enabling learning from under-sampled data alone, but most existing methods rely on further partitioned under-sampled k-space data as the model's input for training, which causes an input distribution shift between the training stage and the inference stage. Additionally, their models have not effectively incorporated comprehensive image priors, leading to degraded reconstruction performance. In this paper, we propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues when only under-sampled datasets are available. Specifically, by incorporating re-visible dual-domain loss, all under-sampled k-space data are utilized during training to mitigate the input distribution shift caused by further partitioning. This design enables the model to implicitly adapt to all under-sampled k-space data as input. Additionally, we design a Deep Unfolding Network based on Chambolle and Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction. By employing a Spatial-Frequency Feature Extraction (SFFE) block to capture both global and local representations, the model effectively integrates imaging physics with comprehensive image priors to enhance reconstruction performance. Experiments on both single-coil and multi-coil datasets demonstrate that our method outperforms state-of-the-art approaches in terms of reconstruction performance and generalization capability.

Details DOI

NeurIPS Conference 2025 Conference Paper

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility

Haoyu He
Haozheng Luo
Yan Chen
Qi Wang

Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. Methodologically, RHYTHM employs temporal tokenization to partition each trajectory into daily segments and encode them as discrete tokens with hierarchical attention that captures both daily and weekly dependencies, thereby quadratically reducing the sequence length while preserving cyclical information. Additionally, we enrich token representations by adding pre-computed prompt embeddings for trajectory segments and prediction targets via a frozen LLM, and feeding these combined embeddings back into the LLM backbone to capture complex interdependencies. Computationally, RHYTHM keeps the pretrained LLM backbone frozen, yielding faster training and lower memory usage. We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. Code is publicly available at https://github.com/he-h/rhythm.

PDF Details

NeurIPS Conference 2025 Conference Paper

Selective Learning for Deep Time Series Forecasting

Yisong Fu
Zezhi Shao
Chengqing Yu
Yujie Li
Zhulin An
Qi Wang
Yongjun Xu
Fei Wang

Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss and learns those uncertain and anomalous timesteps without difference, ultimately resulting in overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of the whole timesteps to calculate the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance for typical state-of-the-art deep models, including 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
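The idea of computing the MSE loss over a screened subset of timesteps, as described in the abstract above, can be sketched with two boolean masks. This is a minimal sketch under stated assumptions: both masks are taken as given inputs (the residual-entropy and residual-lower-bound estimators from the abstract are not implemented here), and the function name and the choice to average over the kept timesteps are hypothetical.

```python
def selective_mse(predictions, targets, uncertainty_mask, anomaly_mask):
    """Mean squared error over timesteps kept by both masks.

    Hypothetical helper: a mask value of True means "keep this timestep".
    How the masks are estimated is outside the scope of this sketch.
    """
    kept = [
        (p - t) ** 2
        for p, t, keep_u, keep_a in zip(
            predictions, targets, uncertainty_mask, anomaly_mask
        )
        if keep_u and keep_a
    ]
    if not kept:
        return 0.0  # assumption: no kept timesteps contributes zero loss
    return sum(kept) / len(kept)
```

For example, with predictions `[1.0, 2.0, 3.0, 10.0]`, targets `[1.5, 2.0, 2.0, 0.0]`, the third timestep masked as uncertain and the fourth masked as anomalous, only the first two squared errors (0.25 and 0.0) enter the average.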

PDF Details

NeurIPS Conference 2025 Conference Paper

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang
Yanrui Yu
Ye Yuan
Rui Mao
Tianfei Zhou

Reinforcement fine-tuning (RFT) has shown great promise in achieving human-level reasoning capabilities of Large Language Models (LLMs), and has recently been extended to MLLMs. Nevertheless, reasoning about videos, which is a fundamental aspect of human intelligence, remains a persistent challenge due to the complex logic, temporal and causal structures inherent in video data. To fill this gap, we propose VideoRFT, a novel approach that extends the RFT paradigm to cultivate human-like video reasoning capabilities in MLLMs. VideoRFT follows the standard two-stage scheme in RFT: supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations, followed by reinforcement learning (RL) to improve generalization. A central challenge to achieve this in the video domain lies in the scarcity of large-scale, high-quality video CoT datasets. We address this by building a multi-expert-driven, cognition-inspired CoT curation pipeline. First, we devise a cognition-inspired prompting strategy to elicit a reasoning LLM to generate preliminary CoTs based solely on rich, structured, and literal representations of video content. Subsequently, these CoTs are revised by an MLLM conditioned on the actual video, ensuring visual consistency and reducing visual hallucinations. This pipeline results in two new datasets, i.e., VideoRFT-CoT-102K for SFT and VideoRFT-RL-310K for RL. To further strengthen the RL phase, we introduce a novel semantic-consistency reward that explicitly promotes the alignment between textual reasoning and visual evidence. This reward encourages the model to produce coherent, context-aware reasoning outputs grounded in visual input. Extensive experiments show that VideoRFT achieves state-of-the-art performance on six video reasoning benchmarks.

PDF Details

AAAI Conference 2024 Conference Paper

Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition

Qi Rao
Ke Sun
Xiaohan Wang
Qi Wang
Bang Zhang

Continuous sign language recognition (CSLR) aims to recognize gloss sequences from continuous sign videos. Recent works enhance the gloss representation consistency by mining correlations between visual and contextual modules within individual sentences. However, much richer correlations remain among glosses across different sentences. In this paper, we present a simple yet effective Cross-Sentence Gloss Consistency (CSGC), which enforces glosses belonging to the same category to be more consistent in representation than those belonging to different categories, across all training sentences. Specifically, in CSGC, a prototype is maintained for each gloss category and benefits the gloss discrimination in a contrastive way. Thanks to the well-distinguished gloss prototype, an auxiliary similarity classifier is devised to enhance the recognition clues, thus yielding more accurate results. Extensive experiments conducted on three CSLR datasets show that our proposed CSGC significantly boosts the performance of CSLR, surpassing existing state-of-the-art works by large margins (i.e., 1.6% on PHOENIX14, 2.4% on PHOENIX14-T, and 5.7% on CSL-Daily).

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Doubly Mild Generalization for Offline Reinforcement Learning

Yixiu Mao
Qi Wang
Yun Qu
Yuhang Jiang
Xiangyang Ji

Offline Reinforcement Learning (RL) suffers from the extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions. Significant efforts have been devoted to mitigating such generalization, and recent in-sample learning approaches have further succeeded in entirely eschewing it. Nevertheless, we show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions. To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation. The former refers to selecting actions in a close neighborhood of the dataset to maximize the Q values. Even so, the potential erroneous generalization can still be propagated, accumulated, and exacerbated by bootstrapping. In light of this, the latter concept is introduced to mitigate the generalization propagation without impeding the propagation of RL learning signals. Theoretically, DMG guarantees better performance than the in-sample optimal policy in the oracle generalization scenario. Even under worst-case generalization, DMG can still control value overestimation at a certain level and lower bound the performance. Empirically, DMG achieves state-of-the-art performance across Gym-MuJoCo locomotion tasks and challenging AntMaze tasks. Moreover, benefiting from its flexibility in both generalization aspects, DMG enjoys a seamless transition from offline to online learning and attains strong online fine-tuning performance.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

Error-aware Sampling in Adaptive Shells for Neural Surface Reconstruction

Qi Wang
Yuchi Huo
Qi Ye
Rui Wang
Hujun Bao

Neural implicit surfaces with signed distance functions (SDFs) achieve superior quality in 3D geometry reconstruction. However, training SDFs is time-consuming because it requires a great number of samples to calculate accurate weight distributions and a considerable number of samples drawn from the distribution for integrating the rendering results. Some existing sampling strategies focus on this problem. During the training, they assume a spatially-consistent convergence speed of kernel size, thus still suffering from low convergence or errors. Instead, we introduce an error-aware sampling method based on thin intervals of valid weight distributions, dubbed adaptive shells, to reduce the number of samples while still maintaining the reconstruction accuracy. To this end, we first extend Laplace-based neural implicit surfaces with learned spatially-varying kernel sizes which indicate the range of valid weight distributions. Then, the adaptive shell for each ray is determined by an efficient double-clipping strategy with spatially-varying SDF values and kernel sizes, fitting larger kernel sizes to wider shells. Finally, we calculate the error-bounded cumulative distribution functions (CDFs) of shells to conduct efficient importance sampling, achieving low-variance rendering with fewer calculations. Extensive results in various scenes demonstrate the superiority of our sampling technique, including significantly reducing sample counts and training time, even improving the reconstruction quality. The code is available at https://github.com/erernan/ESampling.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Chenhui Wang
Tao Chen
Zhihao Chen
Zhizhong Huang
Taoran Jiang
Qi Wang
Hongming Shan

Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

GO4Align: Group Optimization for Multi-Task Alignment

Jiayi Shen
Qi Wang
Zehao Xiao
Nanne van Noord
Marcel Worring

This paper proposes GO4Align, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, comprising two techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indicators, which exploit consistent task correlations with risk information from previous iterations. Comprehensive experimental results on diverse benchmarks demonstrate our method's performance superiority with even lower computational costs.

PDF Details DOI

JBHI Journal 2024 Journal Article

Improving Needle Tip Tracking and Detection in Ultrasound-Based Navigation System Using Deep Learning-Enabled Approach

Hui Che
Jiaxin Qin
Yao Chen
Zihan Ji
Yibo Yan
Jing Yang
Qi Wang
Chaofeng Liang

Ultrasound-guided percutaneous interventions have numerous advantages over traditional techniques. Accurate needle placement in the target anatomy is crucial for successful intervention, and reliable visual information is essential to achieve this. However, previous studies have revealed several challenges, such as the variability in needle echogenicity and the common misalignment of the ultrasound beam and the needle. Advanced techniques have been developed to optimize needle visualization, including hardware-based and image-processing-based methods. This paper proposes a novel strategy of integrating ultrasound-based deep learning approaches into an optical navigation system to enhance needle visualization and improve tip positioning accuracy. Both the tracking and detection algorithms are optimized utilizing optical tracking information. The information is introduced into the tracking network to define the search patch update strategy and form a trajectory reference to correct tracking results. In the detection network, the original image is processed according to the needle insertion position and current position given by the optical localization system to locate a coarse region, and the depth-score criterion is adopted to optimize detection results. Extensive experiments demonstrate that our approach achieves promising tip tracking and detection performance with tip localization errors of 1.11 ± 0.59 mm and 1.17 ± 0.70 mm, respectively. Moreover, we establish a paired dataset consisting of ultrasound images and their corresponding spatial tip coordinates acquired from the optical tracking system and conduct real puncture experiments to verify the effectiveness of the proposed methods. Our approach significantly improves needle visualization and provides physicians with visual guidance for posture adjustment.

Details DOI

TMLR Journal 2024 Journal Article

Large Language Models can be Guided to Evade AI-generated Text Detection

Ning Lu
Shengcai Liu
Rui He
Yew-Soon Ong
Qi Wang
Ke Tang

Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public. However, the increasing concerns regarding the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be universally used against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that the SICO-generated text achieves human-level readability and task completion rates, while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against the SICO attack.

PDF Details

EAAI Journal 2024 Journal Article

Machine learning-driven feature importance appraisal of seismic parameters on tunnel damage and seismic fragility prediction

Qi Wang
Ping Geng
Liangjie Wang
Dingwei He
Huoming Shen

This study proposes a machine learning-driven approach for the analysis of the feature importance of seismic parameters on tunnel damage and seismic fragility prediction. The Incremental Dynamic Analysis (IDA) method serves as the fundamental database for vulnerability analysis. Strength and deformation yield criteria are chosen to comprehensively assess the impact of different seismic parameters on the vulnerability of tunnels to seismic events. Three machine learning algorithms, namely Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM), are utilized to develop models for classifying and regressing tunnel damage under seismic conditions. Following parameter tuning, the models' performance in multi-classification, binary classification, and regression prediction is assessed, with XGBoost and RF models exhibiting outstanding performance. Feature importance analysis of seismic parameters in XGBoost and RF models for multi-classification, binary classification, and regression is performed using Shapley additive explanations (SHAP). The correlation analysis between SHAP-based feature values and predictions reveals that Peak Ground Displacement (PGD) has the highest influence in the regression model. Utilizing the interaction dependencies among crucial features in the regression model, fragility curves for tunnels based on these key features are effectively derived. The predicted fragility curves closely align with those derived from IDA, illustrating the time-saving and high-performance capabilities of machine learning in nonlinear dynamic computations.

Details DOI

NeurIPS Conference 2024 Conference Paper

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Qi Wang
Junming Yang
Yunbo Wang
Xin Jin
Wenjun Zeng
Xiaokang Yang

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the “test bed” for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.

PDF Details DOI

EAAI Journal 2024 Journal Article

Meta-fourier neural operators for multi-task modeling of film cooling in gas turbine endwalls

Qi Wang
Jian Lou
Yang Li
Li Yang

Film cooling was a key technology to protect gas turbine endwalls from thermal ablation. Precise local temperature control was important for film cooling design on turbine endwalls, which required fast prediction of the two-dimensional cooling effectiveness. Supervised deep learning methods were feasible methods to fulfill such demand, but still faced challenges in the lack of data and generalization. A prediction model trained for a specific endwall could not be generalized to others at a low cost. To break through this bottleneck, this study proposed a meta learning method for film cooling prediction, which leveraged historical data to reduce the modeling cost and improve the generalization on a new film cooling prediction task. Four historical tasks and two new tasks for the film cooling prediction were created by changing the endwall pressure gradients. The number of samples available for modeling was limited to less than 10 for each new task. A Fourier Neural Operator was adopted to regress the film cooling effectiveness on endwall surfaces. Results showed that the proposed method reduced the amount of data required by 80% and the prediction error by 55% on the new film cooling design tasks.

Details DOI

IJCAI Conference 2024 Conference Paper

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Zihao Wang
Shuyu Li
Tao Zhang
Qi Wang
Pengfei Yu
Jinyang Luo
Yan Liu
Ming Xi

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a large-scale, private dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of CaiMD for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

Yixiu Mao
Qi Wang
Chen Chen
Yun Qu
Xiangyang Ji

In offline reinforcement learning (RL), addressing the out-of-distribution (OOD) action issue has been a focus, but we argue that there exists an OOD state issue that also impairs performance yet has been underexplored. Such an issue describes the scenario when the agent encounters states out of the offline dataset during the test phase, leading to uncontrolled behavior and performance degradation. To this end, we propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL. Technically, SCAS achieves value-aware OOD state correction, capable of correcting the agent from OOD states to high-value in-distribution states. Theoretical and empirical results show that SCAS also exhibits the effect of suppressing OOD actions. On standard offline RL benchmarks, SCAS achieves excellent performance without additional hyperparameter tuning. Moreover, benefiting from its OOD state correction feature, SCAS demonstrates enhanced robustness against environmental perturbations.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics

Qi Wang
Pu Ren
Hao Zhou
Xin-Yang Liu
Zhiwen Deng
Yi Zhang
Ruizhi Chengze
Hongsheng Liu

When solving partial differential equations (PDEs), classical numerical methods often require fine mesh grids and small time stepping to meet stability, consistency, and convergence conditions, leading to high computational cost. Recently, machine learning has been increasingly utilized to solve PDE problems, but such methods often encounter challenges related to interpretability, generalizability, and strong dependency on rich labeled data. Hence, we introduce a new PDE-Preserved Coarse Correction Network (P$^2$C$^2$Net) to efficiently solve spatiotemporal PDE problems on coarse mesh grids in small data regimes. The model consists of two synergistic modules: (1) a trainable PDE block that learns to update the coarse solution (i.e., the system state), based on a high-order numerical scheme with boundary condition encoding, and (2) a neural network block that consistently corrects the solution on the fly. In particular, we propose a learnable symmetric Conv filter, with weights shared over the entire model, to accurately estimate the spatial derivatives of the PDE based on the neural-corrected system state. The resulting physics-encoded model is capable of handling limited training data (e.g., 3 to 5 trajectories) and accelerates the prediction of PDE solutions on coarse spatiotemporal grids while maintaining high accuracy. P$^2$C$^2$Net achieves consistent state-of-the-art performance with over 50% gain (e.g., in terms of relative prediction error) across four datasets covering complex reaction-diffusion processes and turbulent flows.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Resource-Aware Federated Self-Supervised Learning with Global Class Representations

Mingyi Li
Xiao Zhang
Qi Wang
Tengfei Liu
Ruofan Wu
Weiqiang Wang
Fuzhen Zhuang
Hui Xiong

Due to heterogeneous architectures and class skew, training global representation models in resource-adaptive federated self-supervised learning faces two tricky challenges: $\textit{deviated representation abilities}$ and $\textit{inconsistent representation spaces}$. In this work, we are the first to propose a multi-teacher knowledge distillation framework, namely $\textit{FedMKD}$, to learn global representations with whole class knowledge from heterogeneous clients even under extreme class skew. Firstly, the adaptive knowledge integration mechanism is designed to learn better representations from all heterogeneous models with deviated representation abilities. Then the weighted combination of the self-supervised loss and the distillation loss can support the global model to encode all classes from clients into a unified space. Besides, the global knowledge anchored alignment module can make the local representation spaces close to the global spaces, which further improves the representation abilities of local ones. Finally, extensive experiments conducted on two datasets demonstrate the effectiveness of $\textit{FedMKD}$, which outperforms state-of-the-art baselines by 4.78% on average under linear evaluation.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

ScreenAgent: A Vision Language Model-driven Computer Control Agent

Runliang Niu
Jindong Li
Shiqi Wang
Yali Fu
Xiyu Hu
Xueyuan Leng
He Kong
Yi Chang

Large Language Models (LLMs) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled by a trained LLM agent. Powered by the computer, we can hopefully build a more generalized agent to assist humans in various daily digital work. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphical User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences when completing daily computer tasks. Finally, we train a model, ScreenAgent, which achieves computer control capabilities comparable to GPT-4V and demonstrates more precise UI positioning capabilities. Our attempts could inspire further research on building a generalist LLM agent. The code and more detailed information are at https://github.com/niuzaisheng/ScreenAgent.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Theoretical Investigations and Practical Enhancements on Tail Task Risk Minimization in Meta Learning

Yiqin Lv
Qi Wang
Dong Liang
Zheng Xie

Meta learning is a promising paradigm in the era of large models and task distributional robustness has become an indispensable consideration in real-world scenarios. Recent advances have examined the effectiveness of tail task risk minimization in fast adaptation robustness improvement \citep{wang2023simple}. This work contributes to more theoretical investigations and practical enhancements in the field. Specifically, we reduce the distributionally robust strategy to a max-min optimization problem, constitute the Stackelberg equilibrium as the solution concept, and estimate the convergence rate. In the presence of tail risk, we further derive the generalization bound, establish connections with estimated quantiles, and practically improve the studied strategy. Accordingly, extensive evaluations demonstrate the significance of our proposal in boosting robustness.

PDF Details DOI

YNICL Journal 2024 Journal Article

Unveiling MRI markers for Parkinson’s Disease: GABAergic dysfunction and cortical changes

Yuan Tian
Sijia Geng
Tianyi Liu
Qi Wang
Jianxiu Lian
Liangjie Lin
Jiayu Li
Tao Gong

OBJECTIVE: The study aimed to investigate changes in basal levels of the inhibitory γ-aminobutyric acid (GABA) neurotransmitter in the sensorimotor cortex (SMC) and cortical gyrification in patients with Parkinson's disease (PD), which could further identify potential imaging biomarkers for PD, particularly in patients with early-onset Parkinson's disease (EOPD). METHOD: Fifty patients with PD (EOPD: 10, late-onset Parkinson's disease [LOPD]: 40) and fifty-two age- and gender-matched healthy controls (HC) underwent GABA-edited 1H MRS of the SMC and high-resolution 3D T1-weighted brain imaging. GABA levels and local gyrification index (LGI) were calculated to assess GABAergic and cortical gyrification deficits in PD. RESULT: The Pearson correlation coefficients revealed significant negative associations between eight indicators, including GABA/Cr level and LGI of specific cortical regions (precentral, postcentral, entorhinal, superiortemporal, posteriorcingulate, cuneus, and transversetemporal cortex), and the likelihood of Parkinson's disease (r < -0.4, p < 0.001). Additionally, GABA levels were significantly lower in the SMC region of both EOPD and LOPD patients compared to healthy controls (mean ± SD [u.i.]: EOPD=0.081 ± 0.022 vs. Young-HC=0.112 ± 0.021, p = 0.003; LOPD=0.054 ± 0.024 vs. Old-HC=0.099 ± 0.021, p < 0.001). A logistic regression model was established using multivariate analysis, identifying two statistically significant indicators: GABA/Cr and LGI of the transversetemporal cortex. The combined model exhibited the highest AUC values in both younger and older populations. CONCLUSION: GABAergic dysfunction may play an important role in the pathogenesis of PD. Changes in neurotransmitter levels and cortical morphology may serve as potential markers for the preclinical diagnosis and progression of PD, including EOPD.

Details DOI

NeurIPS Conference 2023 Conference Paper

A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

Qi Wang
Yiqin Lv
Yanghe Feng
Zheng Xie
Jincai Huang

Meta learning is a promising paradigm to enable skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting worst fast adaptation to a subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective and meta trains models with the measure of tail task risk. We take the two-stage strategy as heuristics to solve the robust meta learning problem, controlling the worst fast adaptation cases at a certain probabilistic level. Experimental results show that our simple method can improve the robustness of meta learning to task distributions and reduce the conditional expectation of the worst fast adaptation risk.
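The abstract above speaks of controlling the worst fast adaptation cases at a certain probabilistic level and of the conditional expectation of the worst fast adaptation risk. One common way to read this is as a conditional value-at-risk (CVaR) style quantity: the mean loss over the worst fraction of tasks. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `tail_task_risk` and the parameter `alpha` are hypothetical.

```python
import math


def tail_task_risk(task_losses, alpha=0.7):
    """Mean loss over the worst (1 - alpha) fraction of tasks.

    Assumption: this is an empirical CVaR-style estimate, used here only
    to illustrate what a "tail task risk" could look like.
    """
    if not task_losses:
        raise ValueError("task_losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Rank tasks from highest to lowest adaptation loss.
    ranked = sorted(task_losses, reverse=True)
    # Size of the tail; rounding guards against floating-point noise
    # such as (1 - 0.7) * 10 evaluating to slightly more than 3.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ranked), 9)))
    tail = ranked[:k]
    return sum(tail) / len(tail)
```

With `alpha=0.0` the function reduces to the ordinary average over all tasks, which corresponds to the empirical risk minimization view that the abstract contrasts with the tail-focused objective.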

PDF Details

EAAI Journal 2023 Journal Article

An autonomous cooperative system of multi-AUV for underwater targets detection and localization

Qi Wang
Bo He
Yixiao Zhang
Fei Yu
Xiaochao Huang
Rong Yang

This paper proposes a cooperative online target detection methodology by multiple autonomous underwater vehicles (Multi-AUV) equipped with the side-scan sonar (SSS) sensor for real-time, accurate, and efficient underwater target detection and positioning in unknown environments. Due to the existence of unfavorable factors such as severe noises and geometric deformation of SSS images, this study incorporates the prior-based threshold segmentation with multi-scale cascaded networks (MSCNet) to reduce the high false alarm rate significantly. Specifically, to meet the real-time requirements of the AUV's computational platform, this study proposes the sequentially dual-branch lightweight block (LWBlock) as a baseline to obtain dense feature maps, which provide a good trade-off between accuracy and speed. Meanwhile, this study establishes the comprehensive correction model, which obtains accurate target positioning information by fusing it with the predicted results. Furthermore, according to the target information provided by the automatic target recognition (ATR) system, the data-driven behavior-based (DDBB) path re-planning algorithm is performed, which enables each AUV to scan above the target of interest autonomously and in detail through designed maneuver behavior. Simulation and actual sea trial experimental results show that the proposed method outperforms other state-of-the-art algorithms, achieving a recognition accuracy of 92.16%, an inference speed of 2.45 s, and the best FPR indicator for three SSS targets: 2.54% (metal ball), 1.96% (seabed rock), and 1.03% (metal rod). At the same time, the proposed algorithm can improve detection efficiency by at least 40% compared with a single AUV, and it can be widely used in marine mission exploration and resource deployment.

Details DOI

EAAI Journal 2023 Journal Article

An online path planning algorithm for autonomous marine geomorphological surveys based on AUV

Yixiao Zhang
Qi Wang
Yue Shen
Bo He

This paper proposed a data-driven bi-pattern (DDBP) path planning algorithm for ocean geomorphological surveys based on Autonomous Underwater Vehicles (AUVs). When an AUV conducts surveys in unknown areas, it uses the observation data of real-time side-scan sonar to conduct environment modeling to drive independent online path re-planning (PRP) according to the feature density of the interesting targets. Based on the DDBP algorithm, the AUV can autonomously focus on regions with rich target distribution and deviate from regions with sparse target distribution without prior knowledge of the task region. The quality and efficiency of the AUV-based surveys can be improved by focusing on the underwater detection area with high feature density. The DDBP algorithm includes two patterns: rough and fine scan, and the corresponding planning pattern is selected according to the distribution of the detected targets. AUV performs online PRP in the corresponding pattern according to the pre-identified strategy set. We conducted simulation experiments and selected sand waves and fish reefs as natural and artificial structures to conduct typical marine survey tests. Compared with the traditional marine survey method, the survey efficiency was increased by 33.6% and 29.6%, respectively, in the two DDBP survey experiments for sand waves; the efficiency increased by 32.9% and 36.7%, respectively, in the two groups of DDBP survey experiments on artificial reefs. The proposed general technical framework for online path planning driven by real-time observation data has good application prospects in underwater archaeology, rapid understanding of specific targets on the seafloor, and search of specific targets.

Details DOI

YNICL Journal 2023 Journal Article

Cortical anatomical variations, gene expression profiles, and clinical phenotypes in patients with schizophrenia

Yong Han
Yongfeng Yang
Zhilu Zhou
Xueyan Jin
Han Shi
Minglong Shao
Meng Song
Xi Su

BACKGROUND AND HYPOTHESIS: Schizophrenia (SZ) patients display significant structural brain abnormalities; nevertheless, the genetic mechanisms regulating cortical anatomical variations and their correlation with the disease phenotype are still ambiguous. STUDY DESIGN: We characterized anatomical variation using a surface-based method derived from structural magnetic resonance imaging of patients with SZ and age- and sex-matched healthy controls (HCs). Partial least-squares regression was performed across cortex regions between anatomical variation and average transcriptional profiles of SZ risk genes and all qualified genes from the Allen Human Brain Atlas. The morphological features of each brain region were correlated to symptomology variables in patients with SZ using partial correlation analysis. STUDY RESULTS: A total of 203 SZ and 201 HCs were included in the final analysis. We observed significant variation of 55 regions of cortical thickness, 23 regions of volume, 7 regions of area, and 55 regions of local gyrification index (LGI) between SZ and HC groups. Expression profiles of 4 SZ risk genes and 96 genes from all qualified genes showed a correlation to anatomical variability, however, after multiple comparisons, the correlations were no longer significant. LGI variability in multiple frontal subregions was associated with specific symptoms of SZ, whereas cognitive function involving attention/vigilance was linked to LGI variability across nine brain regions. CONCLUSIONS: Cortical anatomical variation of patients with schizophrenia is associated with gene transcriptome profiles as well as clinical phenotypes.

Details DOI

NeurIPS Conference 2023 Conference Paper

Episodic Multi-Task Learning with Heterogeneous Neural Processes

Jiayi Shen
Xiantong Zhen
Qi Wang
Marcel Worring

This paper focuses on the data-insufficiency problem in multi-task learning within an episodic training setup. Specifically, we explore the potential of heterogeneous information across tasks and meta-knowledge among episodes to effectively tackle each task with limited data. Existing meta-learning methods often fail to take advantage of crucial heterogeneous information in a single episode, while multi-task learning models neglect reusing experience from earlier episodes. To address the problem of insufficient data, we develop Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within the framework of hierarchical Bayes, HNPs effectively capitalize on prior experiences as meta-knowledge and capture task-relatedness among heterogeneous tasks, mitigating data-insufficiency. Meanwhile, transformer-structured inference modules are designed to enable efficient inferences toward meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful functional priors for adapting to novel heterogeneous tasks in each meta-test episode. Experimental results show the superior performance of the proposed HNPs over typical baselines, and ablation studies verify the effectiveness of the designed inference modules.

PDF Details

IROS Conference 2023 Conference Paper

Hierarchical Attention Network for Planning-Informed Multi-Agent Trajectory Prediction

Wenyi Xiong
Jian Chen
Xinfang Zhang
Qi Wang
Ziheng Qi

The accurate prediction of the neighboring vehicles' trajectories affects the security of autonomous driving vehicles. However, it is challenging for existing methods to anticipate the trajectories of vehicles in the vicinity due to the uncertainty of driving behaviors and the complex interaction patterns of traffic flows. In this study, incorporating the planning information of the ego vehicle, we propose a novel trajectory prediction approach based on the hierarchical attention mechanism. Firstly, a spatio-temporal attention module is presented to extract the social interaction of surrounding vehicles and capture the temporal dependence of continuous frame historical information and planning information. Then, a hard-soft attention module is designed to perform two tasks: weighing the importance of both historical and future information, and learning different location information about the target vehicles. Our method is evaluated on two national highway datasets. The experimental results show that our algorithm achieves state-of-the-art performance.

Details

EAAI Journal 2023 Journal Article

Meteorological data layout and task scheduling in a multi-cloud environment

Yongsheng Hao
Jie Cao
Qi Wang
Tinghuai Ma
Qin Wang
Xin Zhang

The meteorological cloud mainly provides computing ability and meteorological datasets for meteorological model tasks. If the location of the required dataset and the execution location of the task are different, a large amount of time and bandwidth is consumed to transfer the data for the task. Meteorological data layout allocates meteorological datasets to various clouds. Because meteorological datasets are required by multiple meteorological model tasks and multiple times, the data layout is very important in the meteorological clouds. This paper focuses on how to lay out the meteorological datasets based on the association (within meteorological datasets and between meteorological model tasks) and schedule resources for meteorological model tasks in the meteorological cloud. First, to find the association in the meteorological datasets and meteorological models, we use the Apriori algorithm to mine frequent itemsets between datasets used by different meteorological models, and then we use the result to help lay out meteorological data. After that, we present a heuristic algorithm for scheduling meteorological tasks. Finally, simulation comparison shows that the meteorological data layout method has the lowest value in the number of involved clouds for every task, the average size of transmitted datasets from other clouds, and the average time of transmitted datasets between clouds. We also show that the scheduling method based on the data layout increases the number of tasks completed before their deadlines and reduces the average execution time.
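The frequent-itemset step mentioned above can be illustrated with a small level-wise Apriori routine. This is a minimal sketch, not the paper's code: it assumes each model task is represented as the set of dataset names it reads, and the function name `frequent_itemsets` and the dataset names in any example are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori.

    transactions: iterable of collections of hashable items
                  (here: datasets used by one task).
    min_support:  minimum number of transactions an itemset must occur in.
    Returns a dict mapping frozenset itemsets to their support counts.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and prune any
        # candidate that has an infrequent (k-1)-subset (Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result
```

Datasets that appear together in a frequent itemset are natural candidates for placement in the same cloud, since tasks that need one of them tend to need the others.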

Details DOI

IJCAI Conference 2023 Conference Paper

Multi-level Graph Contrastive Prototypical Clustering

Yuchao Zhang
Yuan Yuan
Qi Wang

Recently, graph neural networks (GNNs) have drawn a surge of investigations in deep graph clustering. Nevertheless, existing approaches are predominantly semantic-agnostic, since GNNs exhibit inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations from different granularities may conflict with each other, yielding severe performance degradation for clustering. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate the issue of conflicting objectives, we propose to learn representations of different granularities in separate feature-, prototype-, and cluster-level spaces via feature decorrelation, prototype contrast, and cluster space consistency, respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC against the state-of-the-art graph clustering approaches.

PDF Details DOI

EAAI Journal 2023 Journal Article

SAR ship localization method with denoising and feature refinement

Cheng Zha
Weidong Min
Qing Han
Wei Li
Xin Xiong
Qi Wang
Meng Zhu

Synthetic Aperture Radar (SAR) ship detection is greatly important to marine transportation monitoring and fishery resource management. To improve the detection accuracy of small ships, an SAR ship localization method with Denoising and Feature Refinement (DFR) is proposed in this paper. It consists of three parts. The first part is the denoising module, which uses non-local means to suppress the speckle noise of the SAR image. The second part is the Hierarchical Feature Fusion (HFF) module. It can integrate more low-level features by adding skip connections. This prevents the low-level spatial position information of the fused features from being diluted by high-level semantic information, and is therefore beneficial to the detection of small ships. The third part is a center-based ship predictor with Feature Refinement (FR). The FR module is proposed to refine the features and reduce the background interference, which helps locate ships more accurately. Extensive experiments are conducted. The experimental results show that after adding the denoising and FR modules, the value of AP 0.5 is increased by 1.7% and 2.3%, respectively, which demonstrates the effectiveness of these two modules. In inshore and offshore scenarios, the AP 0.5 values of DFR are 0.884 and 0.966, respectively, achieving the best results. The proposed method can also be generalized to mark lesion locations in medical images and detect offshore oil production platforms.

Details DOI

EAAI Journal 2023 Journal Article

SHDM-NET: Heat map detail guidance with image matting for industrial weld semantic segmentation network

Qi Wang
Jingwu Mei
Wuming Jiang
Hegui Zhu

Welding is widely used in metal components. The firmness of welded components is very important in different applications, such as buildings, bridges, cars, and airplanes. Weld seam quality inspection is essential to ensure product quality. The area and shape of the weld seam are the basis for quality assessment, so the segmentation of the weld area is very important for quality assessment. To address the problem of segmentation of the weld seam region, a weld seam segmentation network based on heat map detail guidance with matting is proposed in this paper, which provides a new idea for fine-grained segmentation of the weld seam region. Existing DCNN-based semantic segmentation models segment poorly at boundaries and produce jagged segmentation boundaries, which is unacceptable for the weld segmentation problem that requires clear and precise boundary positioning. To solve this problem, three innovations are made in this paper on the DCNN-based semantic segmentation network. (1) A heat map detail guidance module makes the segmentation boundary information focus on shallow features and enhances the representation of boundary information. (2) A segmentation head improvement method for fine-grained semantic segmentation is proposed. (3) In response to the loss of process details in the coding and decoding of the semantic segmentation network, resulting in poor segmentation boundary accuracy, this paper introduces a matting algorithm to calibrate the boundary of the weld seam segmentation region. Through many experiments on industrial weld data sets, the effectiveness of our method is demonstrated, and the MIoU (Mean Intersection over Union) reaches 96.32%. It is worth noting that this performance is comparable to human manual segmentation (MIoU 96.38%).
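For readers unfamiliar with the MIoU metric quoted above, the following sketch shows one standard way to compute it from a predicted and a ground-truth label map. It is an illustration only, not the paper's evaluation code; the function name `mean_iou` and the nested-list input format are assumptions, and classes absent from both maps are skipped when averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two label maps of equal shape.

    pred, target: nested lists (rows of integer class labels).
    Classes that appear in neither map are excluded from the mean.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counted once in both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Mismatch enlarges the union of both involved classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels found")
    return sum(ious) / len(ious)
```

For a two-class weld/background problem, `num_classes` would be 2, and a perfect prediction gives a value of 1.0.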

Details DOI

IJCAI Conference 2022 Conference Paper

A Speech-driven Sign Language Avatar Animation System for Hearing Impaired Applications

Li Hu
Jiahui Li
Jiashuo Zhang
Qi Wang
Bang Zhang
Ping Tan

Sign language is the communication language used in the hearing-impaired community. Recently, research on sign language production has made great progress but still needs to cope with some critical challenges. In this paper, we propose a system-level scheme and push forward the implementation of sign language production for practical usage. We build a system capable of translating speech into a sign language avatar. Unlike previous approaches that focus on a single technology, we systematically combine algorithms of language translation, body gesture animation, and facial avatar generation. We also develop two applications, Sign Language Interpretation APP and Virtual Sign Language Anchor, to facilitate easy and clear communication for hearing-impaired people.

PDF Details DOI

IJCAI Conference 2022 Conference Paper

AttExplainer: Explain Transformer via Attention by Reinforcement Learning

Runliang Niu
Zhepei Wei
Yan Wang
Qi Wang

Transformer and its variants, built based on attention mechanisms, have recently achieved remarkable performance in many NLP tasks. Most existing works on Transformer explanation tend to reveal and utilize the attention matrix with human subjective intuitions in a qualitative manner. However, the huge size of dimensions directly challenges these methods to quantitatively analyze the attention matrix. Therefore, in this paper, we propose a novel reinforcement learning (RL) based framework for Transformer explanation via attention matrix, namely AttExplainer. The RL agent learns to perform step-by-step masking operations by observing the change in attention matrices. We have adapted our method to two scenarios, perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared to state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code of this paper is available at https://github.com/niuzaisheng/AttExplainer.

PDF Details DOI

JBHI Journal 2022 Journal Article

Corrections to “iPhantom: A Framework for Automated Creation of Individualized Computational Phantoms and its Application to CT Organ Dosimetry” [Aug 21 3061-3072]

Wanyi Fu
Shobhit Sharma
Ehsan Abadi
Alexandros-Stavros Iliopoulos
Qi Wang
Joseph Y. Lo
Xiaobai Sun
William P. Segars

In [1], the dose estimation accuracy using the alternative baseline method under modulated tube current was not correctly calculated due to an unintentional simulation error.

Details DOI

EAAI Journal 2022 Journal Article

Dual-branch framework: AUV-based target recognition method for marine survey

Fei Yu
Bo He
Jixin Liu
Qi Wang

Autonomous recognition of marine targets is considered a promising technology for autonomous underwater vehicle (AUV) marine survey, and an AUV equipped with side-scan sonar (SSS) for recognition is the key to surveys. As a fundamental function, SSS recognition remains unsolved due to the challenging image conditions of SSS and insufficient algorithm robustness. This paper proposes an accurate and real-time dual-branch recognition framework containing segmentation and refinement branches. Firstly, the segmentation branch uses a lightweight learning network to analyze the data comprehensively. In this branch, we propose a densely connected local attention recurrent residual (LAR2) block as the backbone, and at the same time, an atrous convolution is introduced. This branch can focus on the features of interest in the image, ensuring better feature representation with low-resolution SSS information while guiding the next branch. Secondly, the refinement branch adjusts the previous branch’s results and combines the low-level and high-level features. We propose a holistic attention (HA) block in this branch, which can further improve the target recognition performance. Finally, we adopt the feature fusion method of bilinear pooling to integrate the results of the two branches to output a high-precision recognition image. In offline experiments and sea trials, our proposed method outperforms other competing algorithms in the four indicators of semantic segmentation, and achieves a computation speed of 92.66 ms (± 0.86 ms) per image on AUV dedicated hardware. The method has strong robustness, meets real-time performance, and can be widely used in AUV marine survey.

Details DOI

YNIMG Journal 2022 Journal Article

Focal fMRI signal enhancement with implantable inductively coupled detectors

Yi Chen
Qi Wang
Sangcheon Choi
Hang Zeng
Kengo Takahashi
Chunqi Qian
Xin Yu

Despite extensive efforts to increase the signal-to-noise ratio (SNR) of fMRI images for brain-wide mapping, technical advances of focal brain signal enhancement are lacking, in particular, for animal brain imaging. Emerging studies have combined fMRI with fiber optic-based optogenetics to decipher circuit-specific neuromodulation from meso to macroscales. High-resolution fMRI is needed to integrate hemodynamic responses into cross-scale functional dynamics, but the SNR remains a limiting factor given the complex implantation setup of animal brains. Here, we developed a multimodal fMRI imaging platform with an implanted inductive coil detector. This detector boosts the tSNR of MRI images, showing a 2-3-fold sensitivity gain over conventional coil configuration. In contrast to the cryoprobe or array coils with limited spaces for implanted brain interface, this setup offers a unique advantage to study brain circuit connectivity with optogenetic stimulation and can be further extended to other multimodal fMRI mapping schemes.

Details DOI

NeurIPS Conference 2022 Conference Paper

Learning Expressive Meta-Representations with Mixture of Expert Neural Processes

Qi Wang
Herke van Hoof

Neural processes (NPs) formulate exchangeable stochastic processes and are promising models for meta learning that do not require gradient updates during the testing phase. However, most NP variants place a strong emphasis on a global latent variable. This weakens the approximation power and restricts the scope of applications using NP variants, especially when data generative processes are complicated. To resolve these issues, we propose to combine the Mixture of Expert models with Neural Processes to develop more expressive exchangeable stochastic processes, referred to as Mixture of Expert Neural Processes (MoE-NPs). Then we apply MoE-NPs to both few-shot supervised learning and meta reinforcement learning tasks. Empirical results demonstrate MoE-NPs' strong generalization capability to unseen tasks in these benchmarks.

PDF Details

JBHI Journal 2022 Journal Article

MRI Generated From CT for Acute Ischemic Stroke Combining Radiomics and Generative Adversarial Networks

Eryan Feng
Pinle Qin
Rui Chai
Jianchao Zeng
Qi Wang
Yanfeng Meng
Peng Wang

Compared to computed tomography (CT), magnetic resonance imaging (MRI) is more sensitive to acute ischemic stroke lesion. However, MRI is time-consuming, expensive, and susceptible to interference from metal implants. Generating MRI images from CT images can address the limitations of MRI. The key problem in the process is obtaining lesion information from CT. In this study, we propose a cross-modal image generation algorithm from CT to MRI for acute ischemic stroke by combining radiomics with generative adversarial networks. First, the lesion candidate region was obtained using radiomics, the radiomic features of the region were extracted, and the feature with the largest information gain was selected and visualized as a feature map. Then, the concatenation of the extracted feature map and the CT image was input in the generator. We added a residual module after the downsampling of the generator, following the general shape of U-Net, which can deepen the network without causing degradation problems. In addition, we introduced the lesion feature similarity loss function to focus the model on the similarity of the lesion. Through the subjective judgment of two experienced radiologists and using evaluation metrics, the results showed that the generated MRI images were very similar to the real MRI images. Moreover, the locations of the lesions were correct, and the shapes of lesions were similar to those of the real lesions, which can help doctors with timely diagnosis and treatment.

Details DOI

JBHI Journal 2021 Journal Article

iPhantom: A Framework for Automated Creation of Individualized Computational Phantoms and Its Application to CT Organ Dosimetry

Wanyi Fu
Shobhit Sharma
Ehsan Abadi
Alexandros-Stavros Iliopoulos
Qi Wang
Joseph Y. Lo
Xiaobai Sun
William P. Segars

Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or “digital-twins (DT)” using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: Given a volume of patient CT images, iPhantom segments selected anchor organs and structures (e.g., liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs which are challenging to segment (e.g., intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom-voxels. The resulting digital-twin phantoms are used to assess organ doses during routine CT exams. Result: iPhantom was validated on both a set of XCAT digital phantoms (n = 50) and an independent clinical dataset (n = 10), with similar accuracy. iPhantom precisely predicted all organ locations, yielding Dice Similarity Coefficients (DSC) of 0.6-1 for anchor organs and DSC of 0.3-0.9 for all other organs. iPhantom showed < 10% errors in estimated radiation dose for the majority of organs, which was notably superior to the state-of-the-art baseline method (20-35% dose errors). Conclusion: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. Significance: The new framework brings the creation and application of CHPs (computational human phantoms) to the level of individual CHPs through automation, achieving wide and precise organ localization, paving the way for clinical monitoring, personalized optimization, and large-scale research.

Details DOI

EAAI Journal 2021 Journal Article

Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework

Qi Wang
Yongsheng Hao
Jie Cao

The combinatorial optimization (CO) problems on the graph are the core and classic problems in artificial intelligence (AI) and operations research (OR). For example, the Vehicle Routing Problem (VRP) and Traveling Salesman Problem (TSP) are fascinating NP-hard problems and have important significance for the existing transportation system. Traditional methods such as heuristic methods, exact algorithms, and solution solvers can already find approximate solutions on small-scale graphs. However, they are helpless for large-scale graphs and other problems with similar structures. Moreover, traditional methods often require artificially designed heuristic functions to aid decision-making. In recent years, more and more work has focused on applying deep learning and reinforcement learning (RL) to learn heuristics, which allows us to learn the internal structure of the graph end-to-end and find the optimal path under the guidance of heuristic rules. However, most of these still need manual assistance, and the RL method used has the problems of low sampling efficiency and small searchable space. This paper proposes a novel framework (called OmegaZero) based on AlphaGo Zero, which does not prescribe expert experience or label data but is trained through self-play. We divide the learning into two stages: in the first stage, we employ graph attention network (GAT) and GRU to learn node representations and memory history trajectories. In the second stage, we employ Monte Carlo tree search (MCTS) and deep RL to search for the solution space and train the model.

Details DOI

YNIMG Journal 2020 Journal Article

Inter-subject pattern analysis: A straightforward and powerful scheme for group-level MVPA

Qi Wang
Bastien Cagna
Thierry Chaminade
Sylvain Takerkart

Multivariate pattern analysis (MVPA) has become vastly popular for analyzing functional neuroimaging data. At the group level, two main strategies are used in the literature. The standard one is hierarchical, combining the outcomes of within-subject decoding results in a second-level analysis. The alternative one, inter-subject pattern analysis, directly works at the group-level by using, e.g., a leave-one-subject-out cross-validation. This study provides a thorough comparison of these two group-level decoding schemes, using both a large number of artificial datasets where the size of the multivariate effect and the amount of inter-individual variability are parametrically controlled, as well as two real fMRI datasets comprising 15 and 39 subjects, respectively. We show that these two strategies uncover distinct significant regions with partial overlap, and that inter-subject pattern analysis is able to detect smaller effects and to facilitate the interpretation. The core source code and data are openly available, allowing to fully reproduce most of these results.

Details DOI

NeurIPS Conference 2020 Conference Paper

Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning

Tao Han
Junyu Gao
Yuan Yuan
Qi Wang

Unlabeled data learning has attracted considerable attention recently. However, it is still elusive to extract the expected high-level semantic feature with mere unsupervised learning. In the meantime, semi-supervised learning (SSL) demonstrates a promising future in leveraging few samples. In this paper, we combine both to propose an Unsupervised Semantic Aggregation and Deformable Template Matching (USADTM) framework for SSL, which strives to improve the classification performance with few labeled data and then reduce the cost in data annotating. Specifically, unsupervised semantic aggregation based on Triplet Mutual Information (T-MI) loss is explored to generate semantic labels for unlabeled data. Then the semantic labels are aligned to the actual class by the supervision of labeled data. Furthermore, a feature pool that stores the labeled samples is dynamically updated to assign proxy labels for unlabeled data, which are used as targets for cross-entropy minimization. Extensive experiments and analysis across four standard semi-supervised learning benchmarks validate that USADTM achieves top performance (e.g., 90.46% accuracy on CIFAR-10 with 40 labels and 95.20% accuracy with 250 labels). The code is released at https://github.com/taohan10200/USADTM.

PDF Details

AAAI Conference 2020 Conference Paper

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

Zhijie Lin
Zhou Zhao
Zhu Zhang
Qi Wang
Huasheng Liu

Video moment retrieval is to search the moment that is most relevant to the given natural language query. Existing methods are mostly trained in a fully-supervised setting, which requires the full annotations of temporal boundary for each query. However, manually labeling the annotations is actually time-consuming and expensive. In this paper, we propose a novel weakly-supervised moment retrieval framework requiring only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates the context information to generate and score all candidate proposals in one single pass. We then devise an algorithm that considers both exploitation and exploration to select top-K proposals. Next, we build a semantic completion module to measure the semantic similarity between the selected proposals and query, compute reward and provide feedbacks to the proposal generation module for scoring refinement. Experiments on the ActivityCaptions and Charades-STA demonstrate the effectiveness of our proposed method.

PDF Details

AAAI Conference 2019 Conference Paper

ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition

Yuan Yuan
Zhitong Xiong
Qi Wang

RGB image classification has achieved significant performance improvement with the resurge of deep convolutional neural networks. However, mono-modal deep models for RGB image still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distribution, so the object-level local features should also be considered in addition to global scene representation. 2) Multi-modal features in RGB-D scene classification are still under-utilized. Simply combining these modal-specific features suffers from the semantic gaps between different modalities. 3) Most existing methods neglect the complex relationships among multiple modality features. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. In order to make better use of the modal-specific cues, this approach mines the intra-modality relationships among the selected local features from one modality. To leverage the multi-modal knowledge more effectively, the proposed approach models the inter-modality relationships between two modalities through the cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets: SUN-RGBD and NYUD V2, and the proposed method achieves state-of-the-art performance.

PDF Details

AAAI Conference 2019 Conference Paper

Memory-Augmented Temporal Dynamic Learning for Action Recognition

Yuan Yuan
Dong Wang
Qi Wang

Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics. To model these two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are adopted in most existing successful methods for recognizing actions. However, CNN-based methods are limited in modeling long-term motion dynamics. RNNs are able to learn temporal motion dynamics but lack effective ways to tackle unsteady dynamics in long-duration motion. In this work, we propose a memory-augmented temporal dynamic learning network, which learns to write the most evident information into an external memory module and ignore irrelevant ones. In particular, we present a differential memory controller to make a discrete decision on whether the external memory module should be updated with the current feature. The discrete memory controller takes in the memory history, context embedding and current feature as inputs and controls information flow into the external memory module. Additionally, we train this discrete memory controller using a straight-through estimator. We evaluate this end-to-end system on benchmark datasets (UCF101 and HMDB51) of human action recognition. The experimental results show consistent improvements on both datasets over prior works and our baselines.

PDF Details

YNIMG Journal 2018 Journal Article

Correlation of neural activity with behavioral kinematics reveals distinct sensory encoding and evidence accumulation processes during active tactile sensing

Ioannis Delis
Jacek P. Dmochowski
Paul Sajda
Qi Wang

Many real-world decisions rely on active sensing, a dynamic process for directing our sensors (e.g., eyes or fingers) across a stimulus to maximize information gain. Though ecologically pervasive, limited work has focused on identifying neural correlates of the active sensing process. In tactile perception, we often make decisions about an object/surface by actively exploring its shape/texture. Here we investigate the neural correlates of active tactile decision-making by simultaneously measuring electroencephalography (EEG) and finger kinematics while subjects interrogated a haptic surface to make perceptual judgments. Since sensorimotor behavior underlies decision formation in active sensing tasks, we hypothesized that the neural correlates of decision-related processes would be detectable by relating active sensing to neural activity. Novel brain-behavior correlation analysis revealed that three distinct EEG components, localizing to right-lateralized occipital cortex (LOC), middle frontal gyrus (MFG), and supplementary motor area (SMA), respectively, were coupled with active sensing as their activity significantly correlated with finger kinematics. To probe the functional role of these components, we fit their single-trial-couplings to decision-making performance using a hierarchical-drift-diffusion-model (HDDM), revealing that the LOC modulated the encoding of the tactile stimulus whereas the MFG predicted the rate of information integration towards a choice. Interestingly, the MFG disappeared from components uncovered from control subjects performing active sensing but not required to make perceptual decisions. By uncovering the neural correlates of distinct stimulus encoding and evidence accumulation processes, this study delineated, for the first time, the functional role of cortical areas in active tactile decision-making.

Details DOI

AAAI Conference 2018 Conference Paper

Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach

Suping Zhou
Jia Jia
Qi Wang
Yufei Dong
Yufeng Yin
Kehua Lei

To give a more humanized response in Voice Dialogue Applications (VDAs), inferring emotion states from users’ queries may play an important role. However, in VDAs, we have a tremendous number of VDA users and a massive scale of unlabeled data with high-dimensional features from multimodal information, which challenge traditional speech emotion recognition methods. In this paper, to better infer emotion from conversational voice data, we propose a semi-supervised multi-path generative neural network. Specifically, first, we build a novel supervised multi-path deep neural network framework. To avoid high-dimensional input, raw features are trained by groups in local classifiers. Then high-level features of each local classifier are concatenated as input of a global classifier. These two kinds of classifiers are trained simultaneously through a single objective function to achieve more effective and discriminative emotion inference. To further solve the labeled-data-scarcity problem, we extend the multi-path deep neural network to a generative model based on a semi-supervised variational autoencoder (semi-VAE), which is able to train on labeled and unlabeled data simultaneously. Experiments based on a 24,000 real-world dataset collected from Sogou Voice Assistant (SVAD13) and a benchmark dataset, IEMOCAP, show that our method significantly outperforms the existing state-of-the-art results.

PDF Details

IJCAI Conference 2018 Conference Paper

Nonrigid Points Alignment with Soft-weighted Selection

Xuelong Li
Jian Yang
Qi Wang

Point set registration (PSR) is a crucial problem in computer vision and pattern recognition. Existing PSR methods cannot align point sets robustly due to degradations, such as deformation, noise, occlusion, outlier, and multi-view changes. In this paper, we present a self-selected regularized Gaussian fields criterion for nonrigid point matching. Unlike most existing methods, we formulate the registration problem as a sparse approximation task with low rank constraint in reproducing kernel Hilbert space (RKHS). A self-selected mechanism is used to dynamically assign real-valued label for each point in an accuracy-aware weighting manner, which makes the model focus more on the reliable points in position. Based on the label, an equivalent matching number optimization is embedded into the non-rigid criterion to enhance the reliability of the approximation. Experimental results show that the proposed method can achieve a better result in both registration accuracy and correct matches compared to state-of-the-art approaches.

PDF Details

AAAI Conference 2017 Conference Paper

A Multiview-Based Parameter Free Framework for Group Detection

Xuelong Li
Mulin Chen
Feiping Nie
Qi Wang

Group detection is fundamentally important for analyzing crowd behaviors, and has attracted plenty of attention in artificial intelligence. However, existing works mostly have limitations due to the insufficient utilization of crowd properties and the arbitrary processing of individuals. In this paper, we propose the Multiview-based Parameter Free (MPF) approach to detect groups in crowd scenes. The main contributions made in this study are threefold: (1) a new structural context descriptor is designed to characterize the structural property of individuals in crowd motions; (2) a self-weighted multiview clustering method is proposed to cluster feature points by incorporating their motion and context similarities; (3) a novel framework is introduced for group detection, which is able to determine the group number automatically without any parameter or threshold to be tuned. Extensive experiments on various real world datasets demonstrate the effectiveness of the proposed approach, and show its superiority against state-of-the-art group detection techniques.

PDF Details

IJCAI Conference 2017 Conference Paper

Convolutional 2D LDA for Nonlinear Dimensionality Reduction

Qi Wang
Zequn Qin
Feiping Nie
Yuan Yuan

Representing high-volume and high-order data is an essential problem, especially in the machine learning field. Although existing two-dimensional (2D) discriminant analysis achieves promising performance, the single and linear projection features make it difficult to analyze more complex data. In this paper, we propose a novel convolutional two-dimensional linear discriminant analysis (2D LDA) method for data representation. In order to deal with nonlinear data, a specially designed Convolutional Neural Network (CNN) is presented, which can be proved to have an objective function equivalent to that of common 2D LDA. In this way, the discriminant ability can benefit from not only the nonlinearity of Convolutional Neural Networks, but also the powerful learning process. Experimental results on several datasets show that the proposed method performs better than other state-of-the-art methods in terms of classification accuracy.

PDF Details

IJCAI Conference 2017 Conference Paper

Locality Adaptive Discriminant Analysis

Xuelong Li
Mulin Chen
Feiping Nie
Qi Wang

Linear Discriminant Analysis (LDA) is a popular technique for supervised dimensionality reduction, and its performance is satisfying when dealing with Gaussian distributed data. However, the neglect of local data structure makes LDA inapplicable to many real-world situations. So some works focus on the discriminant analysis between neighbor points, which can be easily affected by the noise in the original data space. In this paper, we propose a new supervised dimensionality reduction method, Locality Adaptive Discriminant Analysis (LADA), to learn a representative subspace of the data. Compared to LDA and its variants, the proposed method has three salient advantages: (1) it finds the principal projection directions without imposing any assumption on the data distribution; (2) it’s able to exploit the local manifold structure of data in the desired subspace; (3) it exploits the points’ neighbor relationship automatically without introducing any additional parameter to be tuned. Performance on synthetic datasets and real-world benchmark datasets demonstrates the superiority of the proposed method.

PDF Details

AAAI Conference 2017 Conference Paper

Quantifying and Detecting Collective Motion by Manifold Learning

Qi Wang
Mulin Chen
Xuelong Li

The analysis of collective motion has attracted many researchers in artificial intelligence. Though plenty of works have been done on this topic, the achieved performance is still unsatisfying due to the complex nature of collective motions. By investigating the similarity of individuals, this paper proposes a novel framework for both quantifying and detecting collective motions. Our main contributions are threefold: (1) the time-varying dynamics of individuals are deeply investigated to better characterize the individual motion; (2) a structure-based collectiveness measurement is designed to precisely quantify both individual-level and scene-level properties of collective motions; (3) a multi-stage clustering strategy is presented to discover a more comprehensive understanding of the crowd scenes, containing both local and global collective motions. Extensive experimental results on real world data sets show that our method is capable of handling crowd scenes with complicated structures and various dynamics, and demonstrate its superior performance against state-of-the-art competitors.

PDF Details

TCS Journal 2016 Journal Article

The Space Complexity Analysis in the General Number Field Sieve Integer Factorization

Qi Wang
Xiubin Fan
Hongyan Zang
Yu Wang

The General Number Field Sieve is the most efficient algorithm for integer factorization. It consists of polynomial selection, sieving, solving equations and finding square roots. Root lifting of polynomials is discussed in this paper. The p-adic evaluation provided by each root and the expected p-value are also given. Then we obtain the space complexity of sieving and building equations over the ring Z/2Z.

Details DOI

YNIMG Journal 2010 Journal Article

Identifying gene regulatory networks in schizophrenia

Steven G. Potkin
Fabio Macciardi
Guia Guffanti
James H. Fallon
Qi Wang
Jessica A. Turner
Anita Lakatos
Michael F. Miles

The imaging genetics approach to studying the genetic basis of disease leverages the individual strengths of both neuroimaging and genetic studies by visualizing and quantifying the brain activation patterns in the context of genetic background. Brain imaging as an intermediate phenotype can help clarify the functional link among genes, the molecular networks in which they participate, and brain circuitry and function. Integrating genetic data from a genome-wide association study (GWAS) with brain imaging as a quantitative trait (QT) phenotype can increase the statistical power to identify risk genes. A QT analysis using brain imaging (DLPFC activation during a working memory task) as a quantitative trait has identified unanticipated risk genes for schizophrenia. Several of these genes (RSRC1, ARHGAP18, ROBO1-ROBO2, GPC1, TNIK, and CTXN3-SLC12A2) have functions related to progenitor cell proliferation, migration, and differentiation, cytoskeleton reorganization, axonal connectivity, and development of forebrain structures. These genes, however, do not function in isolation but rather through gene regulatory networks. To obtain a deeper understanding how the GWAS-identified genes participate in larger gene regulatory networks, we measured correlations among transcript levels in the mouse and human postmortem tissue and performed a gene set enrichment analysis (GSEA) that identified several microRNA associated with schizophrenia (448, 218, 137). The results of such computational approaches can be further validated in animal experiments in which the networks are experimentally studied and perturbed with specific compounds. Glypican 1 and FGF17 mouse models for example, can be used to study such gene regulatory networks. The model demonstrates epistatic interactions between FGF and glypican on brain development and may be a useful model of negative symptom schizophrenia.

Details DOI

IROS Conference 2005 Conference Paper

The Pantograph Mk-II: a haptic instrument

Gianni Campion
Qi Wang
Vincent Hayward

We describe the redesign and the performance evaluation of a high-performance haptic device system called the Pantograph. The device is based on a two degree-of-freedom parallel mechanism which was designed for optimized dynamic performance, but which also is well kinematically conditioned. The results show that the system is capable of producing accurate tactile signals in the DC-400 Hz range and can resolve displacements of the order of 10 μm. Future improvements are discussed.

Details

IROS Conference 2002 Conference Paper

A prototype virtual haptic bronchoscope

Qi Wang
Yongsheng Ou
Yangsheng Xu

In this paper, we describe the design of the hardware and software for a virtual bronchoscope with force feedback. A haptic interface allows surgeons to feel the reaction force of virtual pneumonic surgery as if they were touching the area directly. We present novel algorithms for haptic force rendering, and examine its ability to display force. The rendering algorithms have been interfaced with a force-reflecting device. This virtual haptic bronchoscope is of significance in training inexperienced doctors in pneumonic diagnosis and surgery.

Details

ICRA Conference 2000 Conference Paper

On Tracking Control of Mobile Manipulators

Wenjie Dong
Yangsheng Xu
Qi Wang

This paper studies the tracking control problem of mobile manipulators with consideration of the interaction between the mobile platform and the manipulator. A global tracking controller is proposed based on the dynamics of the defined tracking error and the extended Barbalat's lemma. The proposed controller ensures that the full state of the system globally asymptotically tracks the given desired trajectory in the presence of the system coupling. Extensive simulations presented in the paper show the effectiveness of the proposed approach.

Details

ICRA Conference 1998 Conference Paper

Towards Real-Time Robot Programming by Human Demonstration for 6D Force Controlled Actions

Qi Wang
Joris De Schutter

An approach for real-time robot programming by human demonstration for 6D force controlled actions is presented. A human operator utilises a joystick to guide a robot with a force sensor to execute a task including continuous contact between a manipulated object and an unmodelled environment. During the demonstration, the position, velocity and force of the manipulated object as well as the human commands via the joystick are recorded. In real time, the recorded information is translated into a textual robot program providing more robust execution in the presence of uncertainties. This approach has three main features: (1) online control type adjustment; (2) automatic subtask termination; and (3) real-time program generation. Experiments show the potential industrial applicability.

Details

IROS Conference 1996 Conference Paper

An environment for compliant motion programming by human demonstration

Sean Graves
Qi Wang
Wim Witvrouw
Joris De Schutter

An integrated system for programming by demonstration, visualizing, and executing compliant motion programs is described. A human operator utilises a joystick to guide a robot with a force sensor to do a task including continuous contact between manipulator and environment. The demonstration may be executed either on an actual robot or in a graphically simulated environment. During the demonstration, the position, velocity and force of the manipulated object are acquired. Then the recorded data are processed, analysed, and translated into a textual robot program, which provides more robust execution in the presence of uncertainties. The system is composed of a model-based reaction force simulator, a visualization package, a rule-based translator, and an interpreter for compliant motion programs. Experiments show the industrial applicability.

Details

ICRA Conference 1996 Conference Paper

Derivation of compliant motion programs based on human demonstration

Qi Wang
Joris De Schutter
Wim Witvrouw
Sean Graves

An approach to force controlled robot programming by human demonstration is presented. A human operator utilises a joystick to guide a robot with a force sensor to do a task including continuous contact between a manipulated object and an unmodelled environment. During the demonstration, the position, velocity and force of the manipulated object are acquired. Then the recorded data are processed, analysed, and translated into a textual robot program, which provides more robust execution in the presence of uncertainties. This approach consists of three key techniques: data processing, subtask segmentation, and termination condition identification. A software package is developed to generate the programs automatically. Experiments show the industrial applicability.

Details