Arrow Research search

Author name cluster

Qian Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

65 papers
2 author rows

Possible papers

65

EAAI Journal 2026 Journal Article

A lesion region awareness and adaptive label-relation graph algorithm for multi-label chest X-ray image classification

  • Qian Wang
  • Weilun Meng
  • Congfan Gan
  • Hongnian Yu
  • Yongqiang Cheng

Multi-label chest X-ray (CXR) diagnosis remains challenging due to the high variability of lesion regions and the complex inter-disease relationships. Existing algorithms often rely on single-scale features and static label co-occurrence, limiting their ability to capture subtle lesions and dynamic label dependencies. This paper proposes a lesion region awareness and adaptive label-relation graph algorithm for multi-label CXR image classification. Firstly, the multi-scale feature-aware semantic learning method is proposed, which localizes disease regions of interest within features of different image scales, thereby extracting representations rich in the contextual information of the disease. Secondly, the adaptive label relation graph method is proposed, which dynamically models the dependencies among diseases for each sample and propagates discriminative features containing disease relations. Finally, the class-level feature enhancement method is proposed. Through the intra-class supervised contrastive learning strategy, the aggregation of disease-specific features is enhanced, further improving the discriminative ability and robustness of the algorithm. Experimental results on the CXR dataset demonstrate that the proposed algorithm outperforms state-of-the-art algorithms, verifying its effectiveness in multi-label chest disease classification.

AAAI Conference 2026 Conference Paper

Tell as You Want: Customizing Image Narrative with Knowledge and Thoughts

  • Ziwei Yao
  • Qian Wang
  • Ruiping Wang
  • Xilin Chen

With the advancement of vision-language models, image captioning has made significant progress, leading to the generation of more accurate and detailed descriptions. Current image captioning primarily focuses on describing the apparent visual characteristics, which are easily observed by most humans, but less helpful in real-world scenarios. When users seek a deeper understanding of visual content, they may be concerned with fine-grained categories, function properties, and other background knowledge, rather than merely appearances. Additionally, as users' interests vary, there is a growing demand for customizable content generation. To address these challenges, we propose the task of image narrative generation, which aims to produce knowledge-rich natural language responses for input images, customized to the user preference. Furthermore, we propose T^4, an image narrative generation model progressing through cascade steps: Tailor, reTrieve, Think, and Tell. Specifically, it takes the image and various types of prompts as input, and first refines or predicts potentially interesting queries that are tailored to the user expertise level. Subsequently, the model enriches contextual knowledge through retrieval-augmentation and employs chain-of-thoughts to decompose the generation process step by step, thereby telling an accurate and logically coherent image narrative. In addition, we construct the ImgNarr-23K dataset to support task training and evaluation. Experimental results demonstrate that the proposed approach generates image narratives that better satisfy user requirements, and achieves state-of-the-art performance in knowledge-based VQA tasks without additional finetuning. T^4 presents a promising solution for customized content generation in specialized domains.

IROS Conference 2025 Conference Paper

ad-trait: A Fast and Flexible Automatic Differentiation Library in Rust

  • Chen Liang
  • Qian Wang
  • Andy Xu
  • Daniel Rakita

The Rust programming language is an attractive choice for robotics and related fields, offering highly efficient and memory-safe code. However, a key limitation preventing its broader adoption in these domains is the lack of high-quality, well-supported Automatic Differentiation (AD)—a fundamental technique that enables convenient derivative computation by systematically accumulating data during function evaluation. In this work, we introduce ad-trait, a new Rust-based AD library. Our implementation overloads Rust’s standard floating-point type with a flexible trait that can efficiently accumulate necessary information for derivative computation. The library supports both forward-mode and reverse-mode automatic differentiation, making it the first operator-overloading AD implementation in Rust to offer both options. Additionally, ad-trait leverages Rust’s performance-oriented features, such as Single Instruction, Multiple Data acceleration in forward-mode AD, to enhance efficiency. Through benchmarking experiments, we show that our library is among the fastest AD implementations across several programming languages for computing derivatives. Moreover, it is already integrated into a Rust-based robotics library, where we showcase its ability to facilitate fast optimization procedures. We conclude with a discussion of the limitations and broader implications of our work.
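The operator-overloading approach the abstract describes can be sketched in miniature with dual numbers. This is a Python analogy of the general forward-mode technique, not the ad-trait API; all names below are invented for illustration:

```python
import math

class Dual:
    """Dual number: a value plus its derivative, propagated through overloaded ops."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule through the elementary function
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    return x * x + sin(x)   # analytically, f'(x) = 2x + cos(x)

y = f(Dual(1.5, 1.0))       # seed the input derivative with 1
```

Here `y.val` holds f(1.5) and `y.dot` holds f'(1.5), accumulated automatically during evaluation; overloading Rust's floating-point type with a trait follows the same pattern.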

NeurIPS Conference 2025 Conference Paper

AlignedGen: Aligning Style Across Generated Images

  • Jiexuan Zhang
  • Yiheng Du
  • Qian Wang
  • Weiqi Li
  • Yu Gu
  • Jian Zhang

Diffusion-based generative models struggle to maintain high style consistency across generated images via text description. Although several style-aligned image generation methods have been proposed to address this issue, they exhibit suboptimal performance and are primarily built upon the U-Net architecture, limiting their compatibility with DiT diffusion models such as Flux, which have emerged as predominant models in the field of image generation. To address these limitations, we propose AlignedGen, a novel training-free style-aligned image generation method for DiT models to significantly enhance style consistency across generated images. Specifically, AlignedGen incorporates two key components to achieve this: Shifted Position Embedding (ShiftPE) and Advanced Attention Sharing (AAS). ShiftPE alleviates the text controllability degradation observed in prior methods when applied to DiT models through its non-overlapping position indices design, while AAS comprises three specialized techniques to unleash the full potential of DiT for style-aligned generation. Furthermore, to broaden the applicability of our method, we present an efficient query, key, and value feature extraction algorithm, enabling our method to seamlessly incorporate external images as style references. Extensive experimental results validate that our method effectively enhances style consistency across generated images while maintaining favorable text controllability. Code: https://github.com/Jiexuanz/AlignedGen

AAAI Conference 2025 Conference Paper

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

  • Xie Tianyidan
  • Rui Ma
  • Qian Wang
  • Xiaoqian Ye
  • Feixuan Liu
  • Ying Tai
  • Zhenyu Zhang
  • Lanjun Wang

Recent advancements in image-conditioned image generation have demonstrated substantial progress. However, foreground-conditioned image generation remains underexplored, encountering challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility. These challenges arise from current end-to-end inpainting models, which suffer from inaccurate training masks, limited foreground semantic understanding, data distribution biases, and inherent interference between visual and textual prompts. To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach. In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency. Our framework is further enhanced with the ability to incorporate optional user textual inputs, perform automated quality assessments, and initiate re-generation as needed. Comprehensive experiments demonstrate that this modular design effectively overcomes the limitations of existing end-to-end models, resulting in higher fidelity, quality, diversity and controllability in foreground-conditioned image generation. Additionally, the Anywhere framework is extensible, allowing it to benefit from future advancements in each individual agent.

NeurIPS Conference 2025 Conference Paper

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

  • Hongyi Zhou
  • Weiran Liao
  • Xi Huang
  • Yucheng Tang
  • Fabian Otto
  • Xiaogang Jia
  • Xinkai Jiang
  • Simon Hilber

We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures generating smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, and (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks while (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
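The smoothness claim follows from the B-spline formulation itself: a handful of control-point "tokens" decode to a C2-continuous trajectory. A minimal sketch of that decoding for a uniform cubic B-spline (not BEAST's implementation; names and values are invented):

```python
def cubic_bspline(ctrl, t):
    """Evaluate a uniform cubic B-spline at t in [0, len(ctrl) - 3]."""
    i = min(int(t), len(ctrl) - 4)   # which segment of the spline
    u = t - i                        # local parameter in [0, 1]
    # the four cubic basis weights; they always sum to 1
    b = [(1 - u) ** 3 / 6,
         (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
         (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
         u ** 3 / 6]
    return sum(w * p for w, p in zip(b, ctrl[i:i + 4]))

# six control tokens decode to a dense, smooth 1-D action trajectory
tokens = [0.0, 0.2, 1.0, 0.8, 0.3, 0.0]
traj = [cubic_bspline(tokens, 3 * k / 99) for k in range(100)]
```

Because every trajectory sample is a convex combination of four neighboring tokens, adjacent segments join without discontinuities, which is the property the abstract leverages.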

EAAI Journal 2025 Journal Article

Deep bioinspired evolutionary stacking algorithm for unpaired multimodal cell classification calibration

  • Lili Zhao
  • Di Xu
  • Xueping Tan
  • Jinzhao Yang
  • Weiping Ding
  • Hengde Zhu
  • Lichi Zhang
  • Qian Wang

Single-modality deep learning approaches for cell image classification exhibit inherent limitations in informational diversity when processing cross-institutional datasets acquired under varied imaging protocols. In contrast, multimodal cell imaging has emerged as a promising alternative for addressing data heterogeneity through comprehensive information integration. This study introduces a novel multimodal alternate training decision-making architecture based on a stacking algorithm for unpaired multimodal cell classification calibration. The method leverages deep bioinspired evolutionary networks combined with kernel-based support vector machines. Specifically, deep base classifiers incorporating multimodal concepts are derived from heterogeneous cell datasets. Each base classifier employs a bioinspired strategy to perform alternate training between two evolutionary densely connected attention networks. To mitigate class imbalance, where diseased cells are significantly outnumbered by normal cells, we incorporate a Shannon entropy loss term. Finally, multiple kernel-based support vector machines serve as meta classifiers, transforming high-level multimodal concepts into a separable feature space for robust decision-making. Experimental results demonstrate the superiority of our approach over existing algorithms for unpaired multimodal cell image classification. Our findings emphasize the importance of alternate training intra-modality classifiers and inter-modality fusion calibration for accurate and reliable medical image classification. All source code for this work will be publicly available on GitHub.

EAAI Journal 2025 Journal Article

Deep Content and Contrastive Perception learning for automatic fetal nuchal translucency image quality assessment

  • Lili Zhao
  • Yuanyuan Xu
  • Jian Xu
  • Weiping Ding
  • Jinzhao Yang
  • Huiyu Zhou
  • Yiming Du
  • Bin Hu

Automatic quality assessment of fetal nuchal translucency ultrasound images can assist physicians in obtaining standard planes and improve the reproducibility of nuchal translucency screening. At present, there are no dedicated studies or methods for the quality assessment of fetal nuchal translucency ultrasound images. For this task, the main challenges are low image quality, identifying structural integrity and relative positional relationships, and the time required for data collection and fine-grained annotation. To address these challenges, we propose a framework based on the DenseNet model, which includes a preprocessing module, a content perception module, an attention learning module, and a contrastive regularization module. Experiments show that these modules effectively improve the framework's quality assessment performance, and that the framework outperforms fourteen other deep learning models. The framework can also provide the sonographer with an interpretable reference map. Bland–Altman analysis further verifies the consistency between the results of the automatic quality assessment framework and the manually annotated clinical dataset. The proposed quality assessment framework for fetal nuchal translucency ultrasound images therefore has clear prospects and value for clinical application.

IROS Conference 2025 Conference Paper

Differential-Flatness-Based Tracking Control for Tractor-Trailers in Reversing Maneuvers

  • Bo Yang 0064
  • Zhenhao Zhuang
  • Zitian Yu
  • Qian Wang
  • Junqing Wei
  • Yilin Mo
  • Wen Yang

In this paper, we propose a differential-flatness-based controller (DFBC) for precise trajectory tracking of tractor-trailers, particularly during reversing maneuvers, which are challenging due to unstable equilibrium points. The proposed controller leverages the differential flatness property of tractor-trailers, equivalently transforming the nonlinear kinematics into a Brunovsky canonical form and allowing the application of linear control theory for control design. Compared to traditional linear quadratic regulator (LQR) controllers, the proposed DFBC method achieves higher precision and robustness in reversing maneuvers. We also showcase the performance of the proposed DFBC method through physical experiments conducted on our self-developed 1/10 scale autonomous tractor-trailer.
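The payoff of a Brunovsky (integrator-chain) form is that tracking reduces to linear feedback on the flat-output error. The sketch below illustrates that generic idea on a plain double integrator, not the paper's tractor-trailer kinematics; gains and horizon are invented:

```python
def simulate(k1=4.0, k2=4.0, dt=1e-3, steps=5000):
    """In Brunovsky coordinates the error dynamics are z1' = z2, z2' = v;
    a linear law v = -k1*z1 - k2*z2 drives the tracking error to zero."""
    z1, z2 = 1.0, 0.0            # initial error and error rate
    for _ in range(steps):
        v = -k1 * z1 - k2 * z2   # poles of s^2 + 4s + 4 at s = -2
        z1 += dt * z2            # forward-Euler integration
        z2 += dt * v
    return z1, z2

e, edot = simulate()             # error after 5 simulated seconds
```

In the paper's setting the nonlinear flatness transform supplies these coordinates; once there, standard linear tools (LQR, pole placement) apply directly.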

EAAI Journal 2025 Journal Article

Enhanced cross-domain lithology classification in imbalanced datasets using an unsupervised domain Adversarial Network

  • Yunxin Xie
  • Liangyu Jin
  • Chenyang Zhu
  • Weibin Luo
  • Qian Wang

Recent advancements in Artificial Intelligence (AI), particularly deep learning, have significantly improved lithology identification in reservoir exploration by leveraging micrographic rock imagery. Deep neural networks excel in feature extraction, enhancing classification accuracy. However, these models are prone to domain shifts, which often degrade their performance in real-world applications. This paper proposes an unsupervised domain adaptation framework that integrates Fisher linear discriminant analysis and Online Hard Example Mining (OHEM) to mitigate domain shifts and improve classification, particularly in datasets with imbalanced classes. The model employs a ω-balanced global–local domain discriminator to align feature distributions between different domains and introduces focal loss with class-wise weighted factors for better handling of imbalanced data. Additionally, an adapted version of OHEM identifies difficult samples during training, allowing the model to concentrate on challenging cases. The proposed method is validated on micrographic rock imagery from the Tibet, Qinghai, and Xinjiang regions, achieving an average accuracy of 83.2%, which is 13.8% higher than ResNet50 and at least 1% superior to other domain adaptation models. This research highlights the potential of AI-driven solutions in geoscientific applications and provides a robust framework for unsupervised lithology classification.

NeurIPS Conference 2025 Conference Paper

FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

  • Jiang Lin
  • Xinyu Chen
  • Song Wu
  • Zhiqiu Zhang
  • Jizhi Zhang
  • Ye Wang
  • Qiang Tang
  • Qian Wang

Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting flexibility and generalization. Inversion-based approaches offer stronger alignment but incur high inference cost due to dual-path denoising. We present FreeControl, a training-free framework for semantic structural control in diffusion models. Unlike prior methods that extract attention across multiple timesteps, FreeControl performs one-step attention extraction from a single, optimally chosen timestep and reuses it throughout denoising. This enables efficient structural guidance without inversion or retraining. To further improve quality and stability, we introduce Latent-Condition Decoupling (LCD): a principled separation of the timestep condition and the noised latent used in attention extraction. LCD provides finer control over attention quality and eliminates structural artifacts. FreeControl also supports compositional control via reference images assembled from multiple sources, enabling intuitive scene layout design and stronger prompt alignment. FreeControl introduces a new paradigm for test-time control: enabling structurally and semantically aligned, visually coherent generation directly from raw images, with the flexibility for intuitive compositional design and compatibility with modern diffusion models at ~5% additional cost.

NeurIPS Conference 2025 Conference Paper

Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

  • Jingmin An
  • Yilong Song
  • Ruolin Yang
  • Nai Ding
  • Lingxi Lu
  • Yuxuan Wang
  • Wei Wang
  • Chu Zhuang

Large Language Models (LLMs) demonstrate human-level or even superior language abilities, effectively modeling syntactic structures, yet the specific computational units responsible remain unclear. A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. Our results show that models such as GPT-2, Gemma, Gemma 2, Llama 2, Llama 3.1, and GLM-4 process syntax in analogous layers, while the human brain relies on distinct cortical regions for different syntactic levels. Representational similarity analysis reveals a stronger alignment between LLM representations and the left hemisphere of the brain (dominant in language processing). Notably, upgraded models exhibit divergent trends: Gemma 2 shows greater brain similarity than Gemma, while Llama 3.1 shows less alignment with the brain compared to Llama 2. These findings offer new insights into the interpretability of LLM behavioral improvements, raising questions about whether these advancements are driven by human-like or non-human-like mechanisms, and establish HFTP as a valuable tool bridging computational linguistics and cognitive neuroscience. This project is available at https://github.com/LilTiger/HFTP.
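The frequency-tagging principle behind this kind of probe can be illustrated on a synthetic signal: components locked to the word and sentence presentation rates produce spectral peaks at exactly those frequencies, while untagged frequencies stay flat. A toy sketch (not the HFTP code; rates and amplitudes are invented):

```python
import cmath, math

def power_at(signal, freq, fs):
    """Normalized DFT magnitude of `signal` at `freq` Hz, sampled at `fs` Hz."""
    coef = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
               for k, x in enumerate(signal))
    return abs(coef) / len(signal)

fs, seconds = 64, 8
# a strong word-rate (4 Hz) component plus a weaker sentence-rate (1 Hz) one
resp = [math.sin(2 * math.pi * 4 * k / fs) + 0.5 * math.sin(2 * math.pi * 1 * k / fs)
        for k in range(fs * seconds)]
word_pow = power_at(resp, 4.0, fs)   # peak at the word rate
sent_pow = power_at(resp, 1.0, fs)   # peak at the sentence rate
ctrl_pow = power_at(resp, 2.5, fs)   # untagged control frequency, near zero
```

Applied to MLP activations or intracranial recordings, peaks at the phrase and sentence rates indicate that the unit tracks those syntactic levels.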

EAAI Journal 2025 Journal Article

Loosen Attention: Integrating localized channel and coarse spatial attention for enhanced analysis of complex aurora images

  • Qian Wang
  • Shihao Jing
  • Rui Yang
  • Zhenpei Liu
  • Yao Tang
  • Han Pan

Auroral images present a significant challenge for automated analysis due to their highly variable and dynamic morphology, influenced by complex interactions between the solar wind and Earth’s magnetosphere. These natural phenomena exhibit considerable randomness in shape, brightness, and motion, making them a unique and challenging signal source for artificial intelligence methods. In this work, we propose Loosen Attention (LA), a novel and lightweight attention mechanism tailored to capture the unpredictable and fluid-like nature of auroral patterns. LA integrates localized channel attention with coarse-grained spatial attention, forming a flexible attention framework that enhances the robustness and adaptability of feature extraction in deep learning models. The LA module is engineered around four key strategies: volumetric feature generation, volume-wise attention computation, refinement and reconstruction, and enhanced feature fusion, enabling efficient focus on subtle yet significant auroral structures while tolerating less informative regions. Unlike conventional attention methods that may struggle with complex visual patterns, LA is explicitly designed to process diverse auroral images in a structured and computationally efficient way. We validate the LA mechanism on multiple vision tasks to demonstrate consistent performance improvements over state-of-the-art attention modules. The results highlight the potential of LA to improve deep learning pipelines in geospace imaging and other domains dealing with similarly complex natural signals. This work offers a promising approach for intelligent processing of Earth–space interaction data and other visually ambiguous scientific images.

AAAI Conference 2025 Conference Paper

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

  • Yitao Zhu
  • Sheng Wang
  • Mengjie Xu
  • Zixu Zhuang
  • Zhixin Wang
  • Kaidong Wang
  • Han Zhang
  • Qian Wang

Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration—a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. In this study, we introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views. Initially, we utilize a pre-trained human body encoder to process each camera view individually, enabling the reconstruction of human body models and parameters for each view along with predicted camera positions. Rather than merely averaging the models across views, we develop a neural network trained to assign weights to individual views for all human body joints, based on the estimated distribution of joint distances from each camera. Additionally, we focus on the mesh surface of the human body for dynamic fusion, allowing for the seamless integration of facial expressions and body shape into a unified human body model. Our method has shown excellent performance in reconstructing the human body on two public datasets, advancing beyond previous work from the SMPL model to the SMPL-X model. This extension incorporates more complex hand poses and facial expressions, enhancing the detail and accuracy of the reconstructions. Crucially, it supports the flexible ad-hoc deployment of any number of cameras, offering significant potential for various applications.

JBHI Journal 2025 Journal Article

Multi-Gate Mixture of Multi-View Graph Contrastive Learning on Electronic Health Record

  • Yu Cao
  • Qian Wang
  • Xu Wang
  • Dezhong Peng
  • Peilin Li

Electronic Health Record (EHR) is the digital form of patient visits that contains various medical data, including diagnosis, treatment, and lab events. Representation learning of EHR with deep learning methods has been beneficial for patient-related prediction tasks. Recently, studies have focused on revealing the inherent graph structure between medical events in EHR. Graph neural network (GNN) methods are prevalent and perform well in various prediction tasks. However, the inherent relationships between various medical events must be annotated, which is complicated and time-consuming. Most research works adopt a straightforward GNN structure on a single prediction task, which cannot fully exploit the potential of EHR representations. Compared with previous work, multi-task prediction can utilize the latent information of concealed correlations between different prediction tasks. In addition, self-contrastive learning on graphs can improve the representation learned by GNN. We propose a multi-gate mixture of multi-view graph contrastive learning (MMMGCL) method, aiming to obtain a more reasonable EHR representation and improve performance on downstream tasks. First, each patient visit is represented as a graph with a well-designed hierarchically fully-connected pattern. Second, node features in the manually constructed graph are pre-trained via the GloVe method with hierarchical ontology knowledge. Finally, MMMGCL processes the pre-trained graph and adopts a joint learning strategy to simultaneously optimize task and contrastive losses. We verify our method on two large open-source medical datasets, Medical Information Mart for Intensive Care (MIMIC-III) and the eICU Collaborative Research Database (eICU). Experiment results show that our method improves performance over straightforward graph-based methods on prediction of patient readmission, mortality, and length of stay.

ICLR Conference 2025 Conference Paper

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

  • Cunxiang Wang
  • Ruoxi Ning
  • Boqi Pan
  • Tonghui Wu
  • Qipeng Guo
  • Cheng Deng 0001
  • Guangsheng Bao
  • Xiangkun Hu

Recent advancements in Large Language Models (LLMs) have pushed the boundaries of natural language processing, especially in long-context understanding. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark tailored for evaluating LLMs with complex, extended narratives. NovelQA, constructed from English novels, offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper details the design and construction of NovelQA, focusing on its comprehensive manual annotation process and the variety of question types aimed at evaluating nuanced comprehension. Our evaluation of long-context LLMs on NovelQA reveals significant insights into their strengths and weaknesses. Notably, the models struggle with multi-hop reasoning, detail-oriented questions, and handling extremely long inputs, averaging over 200,000 tokens. Results highlight the need for substantial advancements in LLMs to enhance their long-context comprehension and contribute effectively to computational literary analysis.

NeurIPS Conference 2025 Conference Paper

PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning

  • Xiaogang Jia
  • Qian Wang
  • Anrui Wang
  • Han Wang
  • Balázs Gyenes
  • Emiliyan Gospodinov
  • Xinkai Jiang
  • Ge Li

Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Because the points retain a regular grid structure, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https://point-map.github.io/Point-Map/
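A point map in the sense described above can be built by back-projecting a depth image while keeping its pixel grid intact, so 2D convolutional machinery still applies. This sketch assumes a simple pinhole camera with invented intrinsics and is not the paper's implementation:

```python
def depth_to_point_map(depth, fx, fy, cx, cy):
    """Back-project a depth image into a point map: same H x W grid,
    but each cell holds an (x, y, z) point in camera coordinates."""
    pmap = []
    for v, row in enumerate(depth):
        prow = []
        for u, z in enumerate(row):
            # pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
            prow.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        pmap.append(prow)
    return pmap

# toy 2x2 depth image with hypothetical unit-focal-length intrinsics
depth = [[1.0, 2.0], [1.0, 2.0]]
pmap = depth_to_point_map(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Unlike an unordered point cloud, `pmap[v][u]` keeps the pixel neighborhood structure, which is what lets established image backbones consume 3D data directly.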

JBHI Journal 2025 Journal Article

Topological GCN Guided Improved Conformer for Detection of Hip Landmarks From Ultrasound Images

  • Tianxiang Huang
  • Jing Shi
  • Ge Jin
  • Juncheng Li
  • Jun Wang
  • Qian Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has shown its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants within 6 months. Hip landmark detection is a feasible way for the CAD of DDH according to the Graf's method. However, existing landmark detection algorithms mainly focus on designing special models to capture the features from hip ultrasound images, but generally ignore the important spatial relations among different landmarks. To this end, a novel weakly supervised learning-based algorithm, the Topological Graph Convolutional Network (TGCN) guided Improved Conformer (TGCN-ICF), is proposed for detecting landmarks from hip ultrasound images. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and constraint vectors from ultrasound images, and a TGCN subnetwork to additionally explore topological relations among hip landmarks with the guidance of class labels for further refining and improving the detection accuracy. Moreover, a new Mutual Modulation Fusion (MMF) module is developed to fully exchange and fuse the extracted feature information from the convolutional neural network (CNN) and Transformer branches in ICF. Meanwhile, a novel Mutual Supervision Constraint (MSC) strategy is designed to provide a constraint for detection of each hip landmark. The experimental results on two real-world DDH datasets demonstrate that the TGCN-ICF outperforms all the compared algorithms, suggesting its potential applications.

IJCAI Conference 2024 Conference Paper

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

  • Juan Hu
  • Xin Liao
  • Difei Gao
  • Satoshi Tsutsui
  • Qian Wang
  • Zheng Qin
  • Mike Zheng Shou

Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domain Deepfake videos. Our method consists of two stages named recovering and localization. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.

IROS Conference 2024 Conference Paper

DiaGBT: An Explainable and Evolvable Robot Control Framework using Dialogue Generative Behavior Trees

  • Jinde Liang
  • Yuan Chang
  • Qian Wang
  • Yanzhen Wang
  • Xiaodong Yi 0002

Manipulating robots using natural language is the preferred way for non-technical specialists. The challenge lies in reliability and adaptability, especially when robots operate in unstructured surroundings. In this paper, we propose a novel framework called Dialogue Generative Behavior Trees (DiaGBT). Natural language instructions from human operators are transformed into behavior trees (BTs) and then executed by robots. Compared to the emerging Large Language Models (LLMs), DiaGBT is comparable in terms of semantic understanding but more lightweight, since the parsing rules are produced by an LLM but tailored for task-correlated instructions. Besides, DiaGBT allows multi-round human-robot interaction, where robots learn reusable skills in real time. For evaluation, we generate a dataset with 4k instruction-BT pairs covering 4 different scenarios. On average, DiaGBT reaches over 90% parsability and 80% plausibility. Similar results on the VEIL-500 dataset outperform the current state of the art.

AAAI Conference 2024 Conference Paper

Dynamic Budget Throttling in Repeated Second-Price Auctions

  • Zhaohua Chen
  • Chang Wang
  • Qian Wang
  • Yuqi Pan
  • Zhuming Shi
  • Zheng Cai
  • Yukun Ren
  • Zhihua Zhu

In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
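As a toy illustration of the throttling mechanism described above — selecting only a subset of auctions to participate in — the sketch below simulates an advertiser whose participation probability is nudged down when spend runs ahead of the budget pace and up when it lags. This is an invented rule for illustration, not the paper's OGD-CB algorithm; all values, bids, and update constants are hypothetical.

```python
import random

def run_throttling(values, competing_bids, budget, horizon):
    """Probabilistic throttling in repeated second-price auctions (toy rule)."""
    p = 1.0                    # current participation probability
    spend = 0.0
    utility = 0.0
    pace = budget / horizon    # target spend per round
    for t, (v, b) in enumerate(zip(values, competing_bids), start=1):
        # participate only if the potential payment still fits the budget
        if spend + b <= budget and random.random() < p:
            if v > b:          # wins the auction and pays the second price b
                spend += b
                utility += v - b
        # throttle harder when spend runs ahead of the budget pace
        if spend > pace * t:
            p = max(0.05, p * 0.9)
        else:
            p = min(1.0, p * 1.05)
    return spend, utility

random.seed(0)
values = [random.uniform(0.0, 1.0) for _ in range(1000)]
competing_bids = [random.uniform(0.0, 1.0) for _ in range(1000)]
spend, utility = run_throttling(values, competing_bids, budget=50.0, horizon=1000)
```

By construction the advertiser never exceeds the budget and, bidding truthfully, never earns negative utility in a second-price auction.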

AAAI Conference 2024 Conference Paper

Every Node Is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

  • Pengfei Zhu
  • Qian Wang
  • Yu Wang
  • Jialu Li
  • Qinghua Hu

Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.
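The per-node fusion idea can be sketched as gating over task embeddings with node-specific softmax weights. This is a hedged illustration only, not the released DyFSS code; all shapes, names, and values are invented.

```python
import numpy as np

def fuse_task_embeddings(task_embs, gate_logits):
    """Fuse per-task node embeddings with node-specific softmax weights.

    task_embs: (num_tasks, num_nodes, dim) -- one embedding per SSL task
    per node. gate_logits: (num_nodes, num_tasks) scores from a gating
    network. Returns a (num_nodes, dim) fused embedding matrix.
    """
    # softmax over tasks, independently for every node
    z = gate_logits - gate_logits.max(axis=1, keepdims=True)
    w = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # (nodes, tasks)
    # weighted sum over the task axis
    return np.einsum("nt,tnd->nd", w, task_embs)

rng = np.random.default_rng(0)
embs = rng.normal(size=(3, 5, 4))    # 3 SSL tasks, 5 nodes, embedding dim 4
logits = rng.normal(size=(5, 3))     # stand-in for gating network outputs
fused = fuse_task_embeddings(embs, logits)
```

Each node thus receives its own mixture of the task-specific embeddings rather than a single global weighting shared by all nodes.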

EAAI Journal 2024 Journal Article

Graph Confident Learning for Software Vulnerability Detection

  • Qian Wang
  • Zhengdao Li
  • Hetong Liang
  • Xiaowei Pan
  • Hui Li
  • Tingting Li
  • Xiaochen Li
  • Chenchen Li

Code vulnerability exposes millions of software systems to the possibility of being attacked, as evidenced every year by increasing reports of security issues such as information leaks, system compromise, and denial of service. Despite the many vulnerability detection models proposed so far, their effectiveness is still limited because they ignore syntactic structural information in source code and handle labeling errors improperly. To address these issues, we propose the Graph Confident Learning for Software Vulnerability Detection (GCL4SVD) model, a machine learning model to detect software vulnerabilities in the development phase. It comprises two components: code graph embedding and graph confident learning denoising. To address the limited analysis of syntactic structural information, the code graph embedding component extracts the structural and semantic information of source code with a sliding window mechanism, and then encodes source code into a graph structure to capture the patterns and characteristics of code vulnerabilities. Additionally, the graph confident learning denoising component identifies labeling errors to improve the quality of the training set. Experimental results show that GCL4SVD outperforms the state-of-the-art vulnerability detection models on four open-source datasets by 3.7%, 3.3%, 2.5%, and 0.8% in terms of Accuracy, respectively, and by 10.2%, 21.8%, 8.2%, and 11.2% in terms of F1-score.

ICAPS Conference 2024 Conference Paper

Improving Learnt Local MAPF Policies with Heuristic Search

  • Rishi Veerapaneni
  • Qian Wang
  • Kevin Ren
  • Arthur Jakobsson
  • Jiaoyang Li 0001
  • Maxim Likhachev

Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies.
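One simple way to use a policy's full output distribution rather than its argmax, in the spirit of the approach above, is greedy collision resolution with fallback to lower-ranked actions. This toy sketch is not the paper's method; the move encoding, positions, and probabilities are invented.

```python
def resolve_actions(action_probs, positions, moves):
    """Greedy collision resolution over a learnt policy's action distributions.

    Agents are processed in order; each takes its highest-probability move
    whose target cell is not already claimed, falling back to lower-ranked
    moves. Move index 0 is "wait", so a fallback usually exists when an
    agent's own cell is unclaimed. Toy sketch only.
    """
    occupied = set()
    chosen = []
    for agent, probs in enumerate(action_probs):
        ranked = sorted(range(len(probs)), key=lambda a: -probs[a])
        for a in ranked:
            x, y = positions[agent]
            dx, dy = moves[a]
            target = (x + dx, y + dy)
            if target not in occupied:
                occupied.add(target)
                chosen.append(a)
                break
    return chosen

# moves: wait, east, west, north, south (invented encoding)
moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
positions = [(0, 0), (2, 0)]
# both agents' policies favour moving toward the cell (1, 0)
action_probs = [[0.1, 0.6, 0.1, 0.1, 0.1],
                [0.1, 0.1, 0.6, 0.1, 0.1]]
chosen = resolve_actions(action_probs, positions, moves)
```

Here the first agent claims (1, 0), so the second agent falls back from its most likely move to waiting in place instead of colliding.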

ICAPS Conference 2024 Conference Paper

MAPF in 3D Warehouses: Dataset and Analysis

  • Qian Wang
  • Rishi Veerapaneni
  • Yu Wu
  • Jiaoyang Li 0001
  • Maxim Likhachev

Recent works have made significant progress in multi-agent path finding (MAPF), with modern methods being able to scale to hundreds of agents, handle unexpected delays, work in groups, etc. The vast majority of these methods have focused on 2D "grid world" domains. However, modern warehouses often utilize multi-agent robotic systems that can move in 3D, enabling dense storage but resulting in a more complex multi-agent planning problem. Motivated by this, we introduce and experimentally analyze the application of MAPF to 3D warehouse management, and release the first open-source 3D MAPF dataset (see http://mapf.info/index.php/Main/Benchmarks). We benchmark two state-of-the-art MAPF methods, EECBS and MAPF-LNS2, and show how different hyper-parameters affect these methods across various 3D MAPF problems. We also investigate how the warehouse structure itself affects MAPF performance. Based on our experimental analysis, we find that a fast low-level search is critical for 3D MAPF, EECBS

TMLR Journal 2024 Journal Article

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

  • Qian Wang
  • Biao Zhang
  • Michael Birsak
  • Peter Wonka

Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, including intermediate latent, conditional embedding, cross attention maps, guidance, and predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. In particular, we identify one specific configuration as a new type of control that manipulates the predicted noise and can perform higher-quality edits than previous work for a variety of local and global edits.

AAAI Conference 2024 Conference Paper

Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis

  • Zihao Zhao
  • Sheng Wang
  • Qian Wang
  • Dinggang Shen

Obtaining large-scale radiology reports can be difficult for medical images due to ethical concerns, limiting the effectiveness of contrastive pre-training in the medical image domain and underscoring the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without ethical issues. By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist has similar gazes for two medical images, it may indicate semantic similarity for diagnosis, and these images should be treated as positive pairs when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks. McGIP uses radiologist gaze to guide contrastive pre-training. We evaluate our method using two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP, indicating its high potential for various clinical scenarios and applications.

NeurIPS Conference 2024 Conference Paper

Multi-Chain Graphs of Graphs: A New Approach to Analyzing Blockchain Datasets

  • Bingqiao Luo
  • Zhen Zhang
  • Qian Wang
  • Bingsheng He

Machine learning applied to blockchain graphs offers significant opportunities for enhanced data analysis and applications. However, the potential of this field is constrained by the lack of a large-scale, cross-chain dataset that includes hierarchical graph-level data. To address this issue, we present novel datasets that provide detailed label information at the token level and integrate interactions between tokens across multiple blockchain platforms. We model transactions within each token as local graphs and the relationships between tokens as global graphs, collectively forming a "Graphs of Graphs" (GoG) approach. This innovative approach facilitates a deeper understanding of systemic structures and hierarchical interactions, which are essential for applications such as link prediction, anomaly detection, and token classification. We conduct a series of experiments demonstrating that this dataset delivers new insights and challenges for exploring GoG within the blockchain domain. Our work promotes advancements and opens new avenues for research in both the blockchain and graph communities. Source code and datasets are available at https://github.com/Xtra-Computing/Cryptocurrency-Graphs-of-graphs.

JBHI Journal 2024 Journal Article

RCPS: Rectified Contrastive Pseudo Supervision for Semi-Supervised Medical Image Segmentation

  • Xiangyu Zhao
  • Zengxin Qi
  • Sheng Wang
  • Qian Wang
  • Xuehai Wu
  • Ying Mao
  • Lichi Zhang

Medical image segmentation methods are generally designed as fully-supervised to guarantee model performance, which requires a significant amount of expert-annotated samples that are high-cost and laborious. Semi-supervised image segmentation can alleviate the problem by utilizing a large number of unlabeled images along with limited labeled images. However, learning a robust representation from numerous unlabeled images remains challenging due to potential noise in pseudo labels and insufficient class separability in feature space, which undermines the performance of current semi-supervised segmentation approaches. To address the issues above, we propose a novel semi-supervised segmentation method named Rectified Contrastive Pseudo Supervision (RCPS), which combines rectified pseudo supervision and voxel-level contrastive learning to improve the effectiveness of semi-supervised segmentation. Particularly, we design a novel rectification strategy for the pseudo supervision method based on uncertainty estimation and consistency regularization to reduce the noise influence in pseudo labels. Furthermore, we introduce a bidirectional voxel contrastive loss in the network to ensure intra-class consistency and inter-class contrast in feature space, which increases class separability in the segmentation. The proposed RCPS segmentation method has been validated on two public datasets and an in-house clinical dataset. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art methods in semi-supervised medical image segmentation.

NeurIPS Conference 2024 Conference Paper

ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization

  • Huayang Huang
  • Yu Wu
  • Qian Wang

Watermarking generative content serves as a vital tool for authentication, ownership protection, and mitigation of potential misuse. Existing watermarking methods face the challenge of balancing robustness and concealment. They empirically inject a watermark that is both invisible and robust and passively achieve concealment by limiting the strength of the watermark, thus reducing the robustness. In this paper, we propose to explicitly introduce a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks. To be specific, we implant a robust watermark in an intermediate diffusion state and then guide the model to hide the watermark in the final generated image. We employ an adversarial optimization algorithm to produce the optimal hiding prompt guiding signal for each watermark. The prompt embedding is optimized to minimize artifacts in the generated image, while the watermark is optimized to achieve maximum strength. The watermark can be verified by reversing the generation process. Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering and shows superior invisibility compared to other state-of-the-art robust watermarking methods.

YNIMG Journal 2024 Journal Article

Time courses of brain plasticity underpinning visual motion perceptual learning

  • Yongqian Song
  • Qian Wang
  • Fang Fang

Visual perceptual learning (VPL) refers to a long-term improvement of visual task performance through training or experience, reflecting brain plasticity even in adults. In human subjects, VPL has been mostly studied using functional magnetic resonance imaging (fMRI). However, due to the low temporal resolution of fMRI, how VPL affects the time course of visual information processing is largely unknown. To address this issue, we trained human subjects to perform a visual motion direction discrimination task. Their behavioral performance and magnetoencephalography (MEG) signals responding to the motion stimuli were measured before, immediately after, and two weeks after training. Training induced a long-lasting behavioral improvement for the trained direction. Based on the MEG signals from occipital sensors, we found that, for the trained motion direction, VPL increased the motion direction decoding accuracy, reduced the motion direction decoding latency, enhanced the direction-selective channel response, and narrowed the tuning profile. Following the MEG source reconstruction, we showed that VPL enhanced the cortical response in early visual cortex (EVC) and strengthened the feedforward connection from EVC to V3A. These VPL-induced neural changes co-occurred in 160-230 ms after stimulus onset. Complementary to previous fMRI findings on VPL, this study provides a comprehensive description on the neural mechanisms of visual motion perceptual learning from a temporal perspective and reveals how VPL shapes the time course of visual motion processing in the adult human brain.

IJCAI Conference 2023 Conference Paper

Detecting Adversarial Faces Using Only Real Face Self-Perturbations

  • Qian Wang
  • Yongqin Xian
  • Hefei Ling
  • Jinyuan Zhang
  • Xiaorui Lin
  • Ping Li
  • Jiazhong Chen
  • Ning Yu

Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.

AAAI Conference 2023 System Paper

FC-TrackNet: Fast Convergence Net for 6D Pose Tracking in Synthetic Domains

  • Di Jia
  • Qian Wang
  • Jun Cao
  • Peng Cai
  • Zhiyang Jin

In this work, we propose a fast convergence track net, or FC-TrackNet, based on a synthetic data-driven approach to maintaining long-term 6D pose tracking. Comparison experiments are performed on two different datasets. The results demonstrate that our approach can achieve a consistent tracking frequency of 90.9 Hz as well as higher accuracy than state-of-the-art approaches.

IJCAI Conference 2023 Conference Paper

Orion: Online Backdoor Sample Detection via Evolution Deviance

  • Huayang Huang
  • Qian Wang
  • Xueluan Gong
  • Tao Wang

Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.
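The core signal above — divergence between a shallow side head's prediction and the final prediction — can be illustrated with a toy KL-divergence check. The probability vectors below are made up, and this is not Orion's actual side-net architecture.

```python
import numpy as np

def evolution_deviance(shallow_probs, deep_probs):
    """KL divergence of the deep (final) prediction from the shallow
    side-head prediction; a large value flags a suspicious input."""
    p = np.clip(shallow_probs, 1e-12, 1.0)
    q = np.clip(deep_probs, 1e-12, 1.0)
    return float(np.sum(q * np.log(q / p)))

# benign input: shallow and deep heads roughly agree
benign = evolution_deviance(np.array([0.7, 0.2, 0.1]),
                            np.array([0.8, 0.15, 0.05]))
# backdoored input: the deep head has been flipped to the attacker's
# target class while the shallow head still favours the true class
poisoned = evolution_deviance(np.array([0.7, 0.2, 0.1]),
                              np.array([0.05, 0.05, 0.9]))
```

Thresholding such a deviance score is one way a detector could flag triggered inputs without assuming benign and poisoned samples separate in the final feature space.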

NeurIPS Conference 2023 Conference Paper

Revisiting Adversarial Robustness Distillation from the Perspective of Robust Fairness

  • Xinli Yue
  • Ningping Mou
  • Qian Wang
  • Lingchen Zhao

Adversarial Robustness Distillation (ARD) aims to transfer the robustness of large teacher models to small student models, facilitating the attainment of robust performance on resource-limited devices. However, existing research on ARD primarily focuses on the overall robustness of student models, overlooking the crucial aspect of robust fairness. Specifically, these models may demonstrate strong robustness on some classes of data while exhibiting high vulnerability on other classes. Unfortunately, the "buckets effect" implies that the robustness of the deployed model depends on the classes with the lowest level of robustness. In this paper, we first investigate the inheritance of robust fairness during ARD and reveal that student models only partially inherit robust fairness from teacher models. We further validate this issue through fine-grained experiments with various model capacities and find that it may arise due to the gap in capacity between teacher and student models, as well as the existing methods treating each class equally during distillation. Based on these observations, we propose Fair Adversarial Robustness Distillation (Fair-ARD), a novel framework for enhancing the robust fairness of student models by increasing the weights of difficult classes, and design a geometric perspective-based method to quantify the difficulty of different classes for determining the weights. Extensive experiments show that Fair-ARD surpasses both state-of-the-art ARD methods and existing robust fairness algorithms in terms of robust fairness (e.g., the worst-class robustness under AutoAttack is improved by at most 12.3% and 5.3% using ResNet18 on CIFAR10, respectively), while also slightly improving overall robustness. Our code is available at: https://github.com/NISP-official/Fair-ARD.
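The class-reweighting idea can be sketched as a per-class weighted distillation loss. This is a simplified stand-in for Fair-ARD's objective — the paper's geometric difficulty weighting is not reproduced — and all arrays below are invented.

```python
import numpy as np

def weighted_distillation_loss(student_logp, teacher_p, labels, class_weights):
    """Mean per-sample KL(teacher || student), with each sample weighted by
    the weight of its ground-truth class so harder classes count more."""
    log_t = np.log(np.clip(teacher_p, 1e-12, 1.0))
    kl = (teacher_p * (log_t - student_logp)).sum(axis=1)  # per-sample KL
    w = class_weights[labels]                              # per-sample weight
    return float((w * kl).mean())

rng = np.random.default_rng(0)
n, c = 6, 3
logits = rng.normal(size=(n, c))
student_logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
teacher_p = np.full((n, c), 1.0 / c)           # a (dummy) uniform teacher
labels = np.array([0, 1, 2, 0, 1, 2])
uniform_loss = weighted_distillation_loss(student_logp, teacher_p, labels,
                                          np.ones(c))
# up-weight class 2, treating it as the hardest class
fair_loss = weighted_distillation_loss(student_logp, teacher_p, labels,
                                       np.array([1.0, 1.0, 3.0]))
```

Raising a class's weight makes its samples dominate the distillation objective, pushing the student to match the teacher more closely on that class.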

IJCAI Conference 2022 Conference Paper

A Few Seconds Can Change Everything: Fast Decision-based Attacks against DNNs

  • Ningping Mou
  • Baolin Zheng
  • Qian Wang
  • Yunjie Ge
  • Binqing Guo

Previous research has demonstrated deep learning models' vulnerabilities to decision-based adversarial attacks, which craft adversarial examples based solely on information from output decisions (top-1 labels). However, existing decision-based attacks have two major limitations, i.e., expensive query cost and being easy to detect. To bridge this gap and demonstrate real threats to commercial applications, we propose a novel and efficient decision-based attack against black-box models, dubbed FastDrop, which only requires a few queries and works well under strong defenses. The crux of the innovation is that, unlike existing adversarial attacks that rely on gradient estimation and additive noise, FastDrop generates adversarial examples by dropping information in the frequency domain. Extensive experiments on three datasets demonstrate that FastDrop can escape the detection of the state-of-the-art (SOTA) black-box defenses and reduce the number of queries by 13~133× under the same level of perturbations compared with the SOTA attacks. FastDrop only needs 10~20 queries to conduct an attack against various black-box models within 1 s. Besides, on commercial vision APIs provided by Baidu and Tencent, FastDrop achieves an attack success rate (ASR) of 100% with 10 queries on average, which poses a real and severe threat to real-world applications.
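The core mechanism — dropping information in the frequency domain rather than adding noise — can be sketched with a 2D FFT low-pass mask. The real attack additionally searches over how much to drop while querying the target model; this standalone sketch omits that query loop, and the image is a random stand-in.

```python
import numpy as np

def frequency_drop(image, keep_radius):
    """Low-pass the image in the frequency domain: keep only components
    within keep_radius of the spectrum centre and reconstruct."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    f[dist > keep_radius] = 0          # drop high-frequency information
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

rng = np.random.default_rng(1)
img = rng.uniform(size=(32, 32))       # stand-in for a query image
dropped = frequency_drop(img, keep_radius=8)
```

Because the perturbation removes content instead of injecting structured noise, it evades detectors tuned to additive noise patterns while only the drop radius needs to be tuned against the target model's decision.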

JBHI Journal 2022 Journal Article

GAN-Guided Deformable Attention Network for Identifying Thyroid Nodules in Ultrasound Images

  • Jintao Lu
  • Xi Ouyang
  • Xueda Shen
  • Tianjiao Liu
  • Zhiming Cui
  • Qian Wang
  • Dinggang Shen

Early detection and identification of malignant thyroid nodules, a vital precursor to treatment, is a difficult task even for experienced clinicians. Many Computer-Aided Diagnosis (CAD) systems have been developed to assist clinicians in performing this task on ultrasound images. Learning-based CAD systems for thyroid nodules generally accommodate both nodule detection/segmentation and fine-grained classification of malignancy, and prior research often treats the aforementioned tasks in separate stages, leading to additional computational costs. In this paper, we utilize an online class activation mapping (CAM) mechanism to guide the network to learn discriminative features for identifying thyroid nodules in ultrasound images, called the CAM attention network. It takes nodule masks as localization cues for direct spatial attention of the classification module, thereby avoiding isolated training for classification. Meanwhile, we propose a deformable convolution module to add offsets to the regular grid sampling locations in the standard convolution, guiding the network to capture more discriminative features of nodule areas. Furthermore, we use a generative adversarial network (GAN) to ensure reliable deformations of nodules from the deformable convolution module. Our proposed CAM attention network achieved 2nd place in the classification task of TN-SCUI 2020, a MICCAI 2020 Challenge with, to our knowledge, the largest set of thyroid nodule ultrasound images. The further inclusion of our proposed GAN-guided deformable module allows for capturing more fine-grained features between benign and malignant nodules, and further improves the classification accuracy to a new state-of-the-art level.

AAAI Conference 2022 Conference Paper

How to Distribute Data across Tasks for Meta-Learning?

  • Alexandru Cioba
  • Michael Bromberg
  • Qian Wang
  • Ritwik Niyogi
  • Georgios Batzolis
  • Jezabel Garcia
  • Da-shan Shiu
  • Alberto Bernacchia

Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary and it is unknown how it affects performance at testing. Since labelling of data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and fewer to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between number of tasks and number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks. Interestingly, Neuroscience experiments have shown that human visual skills also transfer better from easy tasks. We prove these results mathematically on mixed linear regression, and we show empirically that the same results hold for few-shot image classification on CIFAR-FS and mini-ImageNet. Our results provide guidance for allocating labels across tasks when collecting data for meta-learning.

AAAI Conference 2022 Conference Paper

Parameter Differentiation Based Multilingual Neural Machine Translation

  • Qian Wang
  • Jiajun Zhang

Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model and has proved successful thanks to effective knowledge transfer among different languages with shared parameters. However, it is still an open question which parameters should be shared and which ones need to be task-specific. Currently, the common practice is to heuristically design or search language-specific modules, which makes it difficult to find the optimal configuration. In this paper, we propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific during training. Inspired by cellular differentiation, each shared parameter in our method can dynamically differentiate into more specialized types. We further define the differentiation criterion as inter-task gradient similarity. Therefore, parameters with conflicting inter-task gradients are more likely to be language-specific. Extensive experiments on multilingual datasets have demonstrated that our method significantly outperforms various strong baselines with different parameter sharing configurations. Further analyses reveal that the parameter sharing configuration obtained by our method correlates well with linguistic proximities.
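The differentiation criterion can be illustrated as a check on pairwise cosine similarity of per-task gradients. This is a minimal sketch of the idea, not the paper's implementation; the gradient vectors and threshold are hypothetical.

```python
import numpy as np

def should_differentiate(task_grads, threshold=0.0):
    """Return True when the minimum pairwise cosine similarity among the
    per-task gradients of a shared parameter falls below threshold,
    i.e. when at least two tasks pull the parameter in conflicting
    directions."""
    sims = []
    for i in range(len(task_grads)):
        for j in range(i + 1, len(task_grads)):
            a, b = task_grads[i], task_grads[j]
            sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return min(sims) < threshold

# hypothetical gradients for one shared parameter under two languages
aligned = [np.array([1.0, 0.5]), np.array([0.9, 0.6])]
conflicting = [np.array([1.0, 0.0]), np.array([-1.0, 0.1])]
keep_shared = not should_differentiate(aligned)      # gradients agree
split_param = should_differentiate(conflicting)      # gradients conflict
```

Parameters whose per-language gradients point in opposing directions are the ones that benefit from splitting into language-specific copies; aligned gradients suggest sharing is harmless.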

AAAI Conference 2022 Conference Paper

Reliability Exploration with Self-Ensemble Learning for Domain Adaptive Person Re-identification

  • Zongyi Li
  • Yuxuan Shi
  • Hefei Ling
  • Jiazhong Chen
  • Qian Wang
  • Fengfan Zhou

Person re-identification (Re-ID) based on unsupervised domain adaptation (UDA) aims to transfer the pre-trained model from one labeled source domain to an unlabeled target domain. Existing methods tackle this problem by using clustering methods to generate pseudo labels. However, pseudo labels produced by these techniques may be unstable and noisy, substantially deteriorating models' performance. In this paper, we propose a Reliability Exploration with Self-ensemble Learning (RESL) framework for domain adaptive person Re-ID. First, to increase the feature diversity, multiple branches are presented to extract features from different data augmentations. Taking the temporally average model as a mean teacher model, online label refining is conducted by using its dynamic ensemble predictions from different branches as soft labels. Second, to combat the adverse effects of unreliable samples in clusters, sample reliability is estimated by evaluating the consistency of different clusters' results, followed by selecting reliable instances for training and re-weighting sample contributions within Re-ID losses. A contrastive loss is also utilized with cluster-level memory features, which are updated by the mean feature. The experiments demonstrate that our method can significantly surpass the state-of-the-art performance on unsupervised domain adaptive person Re-ID.

IJCAI Conference 2021 Conference Paper

InverseNet: Augmenting Model Extraction Attacks with Training Data Inversion

  • Xueluan Gong
  • Yanjiao Chen
  • Wenbin Yang
  • Guanghao Mei
  • Qian Wang

Cloud service providers, including Google, Amazon, and Alibaba, have now launched machine-learning-as-a-service (MLaaS) platforms, allowing clients to access sophisticated cloud-based machine learning models via APIs. Unfortunately, however, the commercial value of these models makes them alluring targets for theft, and their strategic position as part of the IT infrastructure of many companies makes them an enticing springboard for conducting further adversarial attacks. In this paper, we put forth a novel and effective attack strategy, dubbed InverseNet, that steals the functionality of black-box cloud-based models with only a small number of queries. The crux of the innovation is that, unlike existing model extraction attacks that rely on public datasets or adversarial samples, InverseNet constructs inversed training samples to increase the similarity between the extracted substitute model and the victim model. Further, only a small number of data samples with high confidence scores (rather than an entire dataset) are used to reconstruct the inversed dataset, which substantially reduces the attack cost. Extensive experiments conducted on three simulated victim models and Alibaba Cloud's commercially-available API demonstrate that InverseNet yields a model with significantly greater functional similarity to the victim model than the current state-of-the-art attacks at a substantially lower query budget.

IJCAI Conference 2021 Conference Paper

Recent Advances in Adversarial Training for Adversarial Robustness

  • Tao Bai
  • Jinqi Luo
  • Jun Zhao
  • Bihan Wen
  • Qian Wang

Adversarial training is one of the most effective approaches for deep learning models to defend against adversarial examples. Unlike other defense strategies, adversarial training aims to enhance the robustness of models intrinsically. During the past few years, adversarial training has been studied and discussed from various aspects, which deserves a comprehensive review. For the first time in this survey, we systematically review the recent progress on adversarial training for adversarial robustness with a novel taxonomy. Then we discuss the generalization problems in adversarial training from three perspectives and highlight the challenges which are not fully tackled. Finally, we present potential future directions.

AAAI Conference 2021 Conference Paper

Synchronous Interactive Decoding for Multilingual Neural Machine Translation

  • Hao He
  • Qian Wang
  • Zhipeng Yu
  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong

To simultaneously translate a source language into multiple different target languages is one of the most common scenarios of multilingual translation. However, existing methods cannot make full use of translation model information during decoding, such as intra-lingual and inter-lingual future information, and therefore may suffer from issues such as unbalanced outputs. In this paper, we present a new approach for synchronous interactive multilingual neural machine translation (SimNMT), which predicts each target language output simultaneously and interactively using historical and future information of all target languages. Specifically, we first propose a synchronous cross-interactive decoder in which generation of each target output does not only depend on its generated sequences, but also relies on its future information, as well as history and future contexts of other target languages. Then, we present a new interactive multilingual beam search algorithm that enables synchronous interactive decoding of all target languages in a single model. We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and M-NMT models.

AAMAS Conference 2021 Conference Paper

The Tight Bound for Pure Price of Anarchy in an Extended Miner's Dilemma Game

  • Qian Wang
  • Yurong Chen

The pool block withholding attack, which reduces the effective mining power in the system and leads to potential systemic instability in the blockchain, can be modeled as a non-cooperative game called “the miner’s dilemma”. However, existing literature on the game-theoretic properties of this attack gives only a preliminary analysis. In this paper, we establish the existence and uniqueness of a pure Nash equilibrium for the two-player miner’s dilemma. We then give a tight upper bound of 2 on the pure price of anarchy (PPoA), which measures how much mining power is wasted in the game. Moreover, we show that the uniqueness and the tight bound still hold in a more general setting with a betrayal assumption. Inspired by experiments on games among three mining pools, we conjecture that similar results hold for the N-player miner’s dilemma game (N ≥ 2).

AAAI Conference 2020 Conference Paper

Attention-over-Attention Field-Aware Factorization Machine

  • Zhibo Wang
  • Jinxin Ma
  • Yongquan Zhang
  • Qian Wang
  • Ju Ren
  • Peng Sun

Factorization Machine (FM) has been a popular approach in supervised predictive tasks, such as click-through rate prediction and recommender systems, due to its great performance and efficiency. Recently, several variants of FM have been proposed to improve its performance. However, most of the state-of-the-art prediction algorithms neglected the field information of features, and they also failed to discriminate the importance of feature interactions due to the problem of redundant features. In this paper, we present a novel algorithm called Attention-over-Attention Field-aware Factorization Machine (AoAFFM) for better capturing the characteristics of feature interactions. Specifically, we propose the field-aware embedding layer to exploit the field information of features, and combine it with the attention-over-attention mechanism to learn both feature-level and interaction-level attention to estimate the weight of feature interactions. Experimental results show that the proposed AoAFFM improves on FM and FFM by a large margin, and outperforms state-of-the-art algorithms on three public benchmark datasets.

YNICL Journal 2020 Journal Article

Systems modeling of white matter microstructural abnormalities in Alzheimer's disease

  • Emrin Horgusluoglu-Moloch
  • Gaoyu Xiao
  • Minghui Wang
  • Qian Wang
  • Xianxiao Zhou
  • Kwangsik Nho
  • Andrew J. Saykin
  • Eric Schadt

INTRODUCTION: Microstructural abnormalities in white matter (WM) are often reported in Alzheimer's disease (AD). However, it is unclear which brain regions have the strongest WM changes in presymptomatic AD and what biological processes underlie WM abnormality during disease progression. METHODS: We developed a systems biology framework to integrate matched diffusion tensor imaging (DTI), genetic and transcriptomic data to investigate regional vulnerability to AD and identify genetic risk factors and gene subnetworks underlying WM abnormality in AD. RESULTS: We quantified regional WM abnormality and identified the most vulnerable brain regions. The SNP rs2203712 in CELF1 was most significantly associated with several DTI-derived features in the hippocampus, the top-ranked brain region. An immune response gene subnetwork in the blood was most correlated with DTI features across all the brain regions. DISCUSSION: Incorporation of image analysis with gene network analysis enhances our understanding of disease progression and facilitates identification of novel therapeutic strategies for AD.

AAAI Conference 2020 Conference Paper

Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling

  • Qian Wang
  • Toby Breckon

Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from significant performance drop when directly applied to the samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (i.e., Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate that our approach outperforms contemporary state-of-the-art methods.
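The core recipe summarized above (cluster the target-domain features, then trust only the pseudo-labels of samples that sit close to a cluster center) can be sketched in a few lines. This is an illustrative simplification with assumed names: the paper's actual selection criterion is based on structured prediction, and the plain distance-to-centroid rule below is used only as a stand-in.

```python
import numpy as np

def selective_pseudo_labels(features, centroids, keep_ratio=0.5):
    """Assign each target sample the label of its nearest class centroid,
    then keep only the keep_ratio fraction of samples closest to their
    assigned centroid as 'reliable' pseudo-labels."""
    # Pairwise distances: shape (n_samples, n_classes)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Smaller distance to the assigned centroid = more confident pseudo-label
    assigned = dists[np.arange(len(features)), labels]
    n_keep = int(len(features) * keep_ratio)
    keep = np.argsort(assigned)[:n_keep]
    return labels, keep

# Two well-separated target clusters; half the samples are kept as reliable.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
labels, keep = selective_pseudo_labels(feats, cents, keep_ratio=0.5)
print([int(l) for l in labels], sorted(int(i) for i in keep))
```

In a full pipeline the selected subset would be used to retrain the classifier, after which the centroids and pseudo-labels are recomputed iteratively.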

YNICL Journal 2019 Journal Article

Quantitative susceptibility mapping based hybrid feature extraction for diagnosis of Parkinson's disease

  • Bin Xiao
  • Naying He
  • Qian Wang
  • Zenghui Cheng
  • Yining Jiao
  • E. Mark Haacke
  • Fuhua Yan
  • Feng Shi

Parkinson's disease is the second most common neurodegenerative disease in the elderly after Alzheimer's disease. The aetiology and pathogenesis of Parkinson's disease (PD) are still unclear, but the loss of dopaminergic cells and the excessive iron deposition in the substantia nigra (SN) are associated with the pathophysiology. As an imaging technique that can quantitatively reflect the amount of iron deposition, Quantitative Susceptibility Mapping (QSM) has been shown to be a promising modality for the diagnosis of PD. In the present work, we propose a hybrid feature extraction method for PD diagnosis using QSM images. First, we extract radiomics features from the SN using QSM and employ machine learning algorithms to classify PD and normal controls (NC). This approach allows us to investigate which features are most vulnerable to the effects of the disease. Along with this approach, we propose a Convolutional Neural Network (CNN) based method which can extract different features from the QSM image to further support the diagnosis of PD. Finally, we combine these two types of features and we find that the radiomics features and CNN features are complementary to each other, which helps further improve the classification (diagnostic) performance. We conclude that: (1) radiomics features from QSM data have significant clinical value for the diagnosis of PD; (2) CNN features are also useful in the diagnosis of PD; and (3) the combination of radiomics features and CNN features can enhance the diagnostic accuracy.

JBHI Journal 2019 Journal Article

Regression Convolutional Neural Network for Automated Pediatric Bone Age Assessment From Hand Radiograph

  • Xuhua Ren
  • Tingting Li
  • Xiujun Yang
  • Shuai Wang
  • Sahar Ahmad
  • Lei Xiang
  • Shaun Richard Stone
  • Lihong Li

Skeletal bone age assessment is a common clinical practice to investigate endocrine, genetic, and growth disorders of children. However, clinical interpretation and bone age analyses are time-consuming, labor intensive, and often subject to inter-observer variability. This advocates the need of a fully automated method for bone age assessment. We propose a regression convolutional neural network (CNN) to automatically assess the pediatric bone age from hand radiograph. Our network is specifically trained to place more attention to those bone age related regions in the X-ray images. Specifically, we first adopt the attention module to process all images and generate the coarse/fine attention maps as inputs for the regression network. Then, the regression CNN follows the supervision of the dynamic attention loss during training; thus, it can estimate the bone age of the hard (or “outlier”) images more accurately. The experimental results show that our method achieves an average discrepancy of 5.2–5.3 months between clinical and automatic bone age evaluations on two large datasets. In conclusion, we propose a fully automated deep learning solution to process X-ray images of the hand for bone age assessment, with the accuracy comparable to human experts but with much better efficiency.

IROS Conference 2019 Conference Paper

Robot Learning via Human Adversarial Games

  • Jiali Duan
  • Qian Wang
  • Lerrel Pinto
  • C.-C. Jay Kuo
  • Stefanos Nikolaidis

Much work in robotics has focused on “human-in-the-loop” learning techniques that improve the efficiency of the learning process. However, these algorithms have made the strong assumption of a cooperating human supervisor that assists the robot. In reality, human observers tend to also act in an adversarial manner towards deployed robotic systems. We show that this can in fact improve the robustness of the learned models by proposing a physical framework that leverages perturbations applied by a human adversary, guiding the robot towards more robust models. In a manipulation task, we show that grasping success improves significantly when the robot trains with a human adversary as compared to training in a self-supervised manner.

TCS Journal 2017 Journal Article

Online algorithms for scheduling on batch processing machines with interval graph compatibilities between jobs

  • Qian Wang
  • Ji Tian
  • Ruyan Fu
  • Xiangjuan Yao

We consider the online (over time) scheduling problem of minimizing the makespan on m unbounded parallel-batch machines, in which jobs in the same batch have to be pairwise compatible. Compatibility is a symmetric binary relation, which is represented by an interval compatibility graph. The processing time of a batch is equal to the maximum processing time of the jobs in it, and all jobs in the same batch start and finish at the same time. For this problem, we first show that no online algorithm can have a competitive ratio less than 2. We then provide an online algorithm with a competitive ratio of 2 + (m − 1)/(m + 1), which is optimal for the case m = 1. When all jobs have the same processing times, we also give an optimal online algorithm.
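As a quick illustration (mine, not from the paper), the competitive ratio 2 + (m − 1)/(m + 1) stated above equals the lower bound of 2 at m = 1, which is why the algorithm is optimal for a single machine, and increases toward (but never reaches) 3 as the number of machines m grows:

```python
def competitive_ratio(m: int) -> float:
    """Competitive ratio 2 + (m - 1)/(m + 1) of the online algorithm
    for makespan on m unbounded parallel-batch machines."""
    return 2 + (m - 1) / (m + 1)

# m = 1 meets the general lower bound of 2, so the algorithm is optimal there;
# the ratio grows with m but always stays strictly below 3.
for m in (1, 2, 5, 100):
    print(m, competitive_ratio(m))
```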

TCS Journal 2016 Journal Article

Online scheduling on the unbounded drop-line batch machines to minimize the maximum delivery completion time

  • Ji Tian
  • Qian Wang
  • Ruyan Fu
  • Jinjiang Yuan

We consider online scheduling on m unbounded drop-line batch machines with delivery times. Here a drop-line batch machine can process several jobs in a batch, so that the processing time of a batch is equal to the longest processing time of the jobs in the batch, the jobs in a batch have the same starting time, and the completion time of a job is equal to the sum of its starting time and its processing time. Once the processing of a job is completed on the machine, we immediately deliver it to its destination. The objective is to minimize the time by which all jobs have been delivered. For this problem, we present a best possible online algorithm with a competitive ratio of 1 + α_m, where α_m is the positive root of the equation α² + mα − 1 = 0.
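Since α_m is defined by a quadratic, it has the closed form α_m = (√(m² + 4) − m)/2. A small sketch (my own, not from the paper) evaluates the resulting ratio; for m = 1 the ratio is the golden ratio, approximately 1.618:

```python
import math

def alpha(m: int) -> float:
    """Positive root of alpha^2 + m*alpha - 1 = 0, by the quadratic formula."""
    return (math.sqrt(m * m + 4) - m) / 2

def competitive_ratio(m: int) -> float:
    """Best-possible competitive ratio 1 + alpha_m for m drop-line batch machines."""
    return 1 + alpha(m)

for m in (1, 2, 5):
    a = alpha(m)
    assert abs(a * a + m * a - 1) < 1e-12  # verify it is indeed a root
    print(m, round(competitive_ratio(m), 4))
```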

NeurIPS Conference 2014 Conference Paper

Attentional Neural Network: Feature Selection Using Cognitive Feedback

  • Qian Wang
  • Jiaxing Zhang
  • Sen Song
  • Zheng Zhang

Attentional Neural Network is a new framework that integrates top-down cognitive bias and bottom-up feature extraction in one coherent architecture. The top-down influence is especially effective when dealing with high noise or difficult segmentation problems. Our system is modular and extensible. It is also easy to train and cheap to run, and yet can accommodate complex behaviors. We obtain classification accuracy better than or competitive with state-of-the-art results on the MNIST variation dataset, and successfully disentangle overlaid digits with high success rates. We view such a general purpose framework as an essential foundation for a larger system emulating the cognitive abilities of the whole brain.

CSL Conference 2013 Conference Paper

Semantics of Intensional Type Theory extended with Decidable Equational Theories

  • Qian Wang
  • Bruno Barras

Incorporating extensional equality into a dependent intensional type system such as the Calculus of Constructions (CC) provides stronger type-checking capabilities and makes proof development closer to intuition. Since strong forms of extensionality generally lead to undecidable type-checking, it seems a reasonable trade-off to extend intensional equality with a decidable first-order theory, as experimented with in earlier work on CoqMTU and its implementation CoqMT. In this work, CoqMTU is extended with strong eliminations. The meta-theoretical study, particularly the part relying on semantic arguments, is more complex. A set-theoretical model of the equational theory is the key ingredient to derive the logical consistency of the formalism. Strong normalization, the main lemma from which type-decidability follows, is proved by attaching realizability information to the values of the model. The approach we have followed is to first consider an abstract notion of first-order equational theory, and then instantiate it with a particular instance, Presburger Arithmetic. These results have been formalized using Coq.