Arrow Research search

Author name cluster

Qian Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

65 papers
2 author rows

Possible papers

65

EAAI Journal 2026 Journal Article

A lesion region awareness and adaptive label-relation graph algorithm for multi-label chest X-ray image classification

  • Qian Wang
  • Weilun Meng
  • Congfan Gan
  • Hongnian Yu
  • Yongqiang Cheng

Multi-label chest X-ray (CXR) diagnosis remains challenging due to the high variability of lesion regions and the complex inter-disease relationships. Existing algorithms often rely on single-scale features and static label co-occurrence, limiting their ability to capture subtle lesions and dynamic label dependencies. This paper proposes a lesion region awareness and adaptive label-relation graph algorithm for multi-label CXR image classification. Firstly, the multi-scale feature-aware semantic learning method is proposed, which localizes disease regions of interest within features of different image scales, thereby extracting representations rich in the contextual information of the disease. Secondly, the adaptive label relation graph method is proposed, which dynamically models the dependencies among diseases for each sample and propagates discriminative features containing disease relations. Finally, the class-level feature enhancement method is proposed. Through the intra-class supervised contrastive learning strategy, the aggregation of disease-specific features is enhanced, further improving the discriminative ability and robustness of the algorithm. Experimental results on the CXR dataset demonstrate that the proposed algorithm outperforms state-of-the-art algorithms, verifying its effectiveness in multi-label chest disease classification.

AAAI Conference 2026 Conference Paper

Tell as You Want: Customizing Image Narrative with Knowledge and Thoughts

  • Ziwei Yao
  • Qian Wang
  • Ruiping Wang
  • Xilin Chen

With the advancement of vision-language models, image captioning has made significant progress, leading to the generation of more accurate and detailed descriptions. Current image captioning primarily focuses on describing the apparent visual characteristics, which are easily observed by most humans, but less helpful in real-world scenarios. When users seek a deeper understanding of visual content, they may be concerned with fine-grained categories, function properties, and other background knowledge, rather than merely appearances. Additionally, as users' interests vary, there is a growing demand for customizable content generation. To address these challenges, we propose the task of image narrative generation, which aims to produce knowledge-rich natural language responses for input images, customized to the user preference. Furthermore, we propose T^4, an image narrative generation model progressing through cascade steps: Tailor, reTrieve, Think, and Tell. Specifically, it takes the image and various types of prompts as input, and first refines or predicts potentially interesting queries that are tailored to the user expertise level. Subsequently, the model enriches contextual knowledge through retrieval-augmentation and employs chain-of-thoughts to decompose the generation process step by step, thereby telling an accurate and logically coherent image narrative. In addition, we construct the ImgNarr-23K dataset to support task training and evaluation. Experimental results demonstrate that the proposed approach generates image narratives that better satisfy user requirements, and achieves state-of-the-art performance in knowledge-based VQA tasks without additional finetuning. T^4 presents a promising solution for customized content generation in specialized domains.

IROS Conference 2025 Conference Paper

ad-trait: A Fast and Flexible Automatic Differentiation Library in Rust

  • Chen Liang
  • Qian Wang
  • Andy Xu
  • Daniel Rakita

The Rust programming language is an attractive choice for robotics and related fields, offering highly efficient and memory-safe code. However, a key limitation preventing its broader adoption in these domains is the lack of high-quality, well-supported Automatic Differentiation (AD)—a fundamental technique that enables convenient derivative computation by systematically accumulating data during function evaluation. In this work, we introduce ad-trait, a new Rust-based AD library. Our implementation overloads Rust’s standard floating-point type with a flexible trait that can efficiently accumulate necessary information for derivative computation. The library supports both forward-mode and reverse-mode automatic differentiation, making it the first operator-overloading AD implementation in Rust to offer both options. Additionally, ad-trait leverages Rust’s performance-oriented features, such as Single Instruction, Multiple Data acceleration in forward-mode AD, to enhance efficiency. Through benchmarking experiments, we show that our library is among the fastest AD implementations across several programming languages for computing derivatives. Moreover, it is already integrated into a Rust-based robotics library, where we showcase its ability to facilitate fast optimization procedures. We conclude with a discussion of the limitations and broader implications of our work.
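The operator-overloading approach the abstract describes can be sketched in miniature with dual numbers. This is a Python analogy of the general forward-mode technique, not the ad-trait API; all names below are invented for illustration:

```python
import math

class Dual:
    """Dual number: a value plus its derivative, propagated through overloaded ops."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule through the elementary function
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    return x * x + sin(x)   # analytically, f'(x) = 2x + cos(x)

y = f(Dual(1.5, 1.0))       # seed the input derivative with 1
```

Here `y.val` holds f(1.5) and `y.dot` holds f'(1.5), accumulated automatically during evaluation; overloading Rust's floating-point type with a trait follows the same pattern.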

NeurIPS Conference 2025 Conference Paper

AlignedGen: Aligning Style Across Generated Images

  • Jiexuan Zhang
  • Yiheng Du
  • Qian Wang
  • Weiqi Li
  • Yu Gu
  • Jian Zhang

Diffusion-based generative models struggle to maintain high style consistency across generated images via text description. Although several style-aligned image generation methods have been proposed to address this issue, they exhibit suboptimal performance and are primarily built upon the U-Net architecture, limiting their compatibility with DiT diffusion models such as Flux, which have emerged as predominant models in the field of image generation. To address these limitations, we propose AlignedGen, a novel training-free style-aligned image generation method for DiT models to significantly enhance style consistency across generated images. Specifically, AlignedGen incorporates two key components to achieve this: Shifted Position Embedding (ShiftPE) and Advanced Attention Sharing (AAS). ShiftPE alleviates the text controllability degradation observed in prior methods when applied to DiT models through its non-overlapping position indices design, while AAS comprises three specialized techniques to unleash the full potential of DiT for style-aligned generation. Furthermore, to broaden the applicability of our method, we present an efficient query, key, and value feature extraction algorithm, enabling our method to seamlessly incorporate external images as style references. Extensive experimental results validate that our method effectively enhances style consistency across generated images while maintaining favorable text controllability. Code: https://github.com/Jiexuanz/AlignedGen

AAAI Conference 2025 Conference Paper

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

  • Xie Tianyidan
  • Rui Ma
  • Qian Wang
  • Xiaoqian Ye
  • Feixuan Liu
  • Ying Tai
  • Zhenyu Zhang
  • Lanjun Wang

Recent advancements in image-conditioned image generation have demonstrated substantial progress. However, foreground-conditioned image generation remains underexplored, encountering challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility. These challenges arise from current end-to-end inpainting models, which suffer from inaccurate training masks, limited foreground semantic understanding, data distribution biases, and inherent interference between visual and textual prompts. To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach. In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency. Our framework is further enhanced with the ability to incorporate optional user textual inputs, perform automated quality assessments, and initiate re-generation as needed. Comprehensive experiments demonstrate that this modular design effectively overcomes the limitations of existing end-to-end models, resulting in higher fidelity, quality, diversity and controllability in foreground-conditioned image generation. Additionally, the Anywhere framework is extensible, allowing it to benefit from future advancements in each individual agent.

NeurIPS Conference 2025 Conference Paper

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

  • Hongyi Zhou
  • Weiran Liao
  • Xi Huang
  • Yucheng Tang
  • Fabian Otto
  • Xiaogang Jia
  • Xinkai Jiang
  • Simon Hilber

We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures generating smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, and (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks while (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
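The smoothness claim follows from the B-spline formulation itself: a handful of control-point "tokens" decode to a C2-continuous trajectory. A minimal sketch of that decoding for a uniform cubic B-spline (not BEAST's implementation; names and values are invented):

```python
def cubic_bspline(ctrl, t):
    """Evaluate a uniform cubic B-spline at t in [0, len(ctrl) - 3]."""
    i = min(int(t), len(ctrl) - 4)   # which segment of the spline
    u = t - i                        # local parameter in [0, 1]
    # the four cubic basis weights; they always sum to 1
    b = [(1 - u) ** 3 / 6,
         (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
         (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
         u ** 3 / 6]
    return sum(w * p for w, p in zip(b, ctrl[i:i + 4]))

# six control tokens decode to a dense, smooth 1-D action trajectory
tokens = [0.0, 0.2, 1.0, 0.8, 0.3, 0.0]
traj = [cubic_bspline(tokens, 3 * k / 99) for k in range(100)]
```

Because every trajectory sample is a convex combination of four neighboring tokens, adjacent segments join without discontinuities, which is the property the abstract leverages.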

EAAI Journal 2025 Journal Article

Deep bioinspired evolutionary stacking algorithm for unpaired multimodal cell classification calibration

  • Lili Zhao
  • Di Xu
  • Xueping Tan
  • Jinzhao Yang
  • Weiping Ding
  • Hengde Zhu
  • Lichi Zhang
  • Qian Wang

Single-modality deep learning approaches for cell image classification exhibit inherent limitations in informational diversity when processing cross-institutional datasets acquired under varied imaging protocols. In contrast, multimodal cell imaging has emerged as a promising alternative for addressing data heterogeneity through comprehensive information integration. This study introduces a novel multimodal alternate training decision-making architecture based on a stacking algorithm for unpaired multimodal cell classification calibration. The method leverages deep bioinspired evolutionary networks combined with kernel-based support vector machines. Specifically, deep base classifiers incorporating multimodal concepts are derived from heterogeneous cell datasets. Each base classifier employs a bioinspired strategy to perform alternate training between two evolutionary densely connected attention networks. To mitigate class imbalance, where diseased cells are significantly outnumbered by normal cells, we incorporate a Shannon entropy loss term. Finally, multiple kernel-based support vector machines serve as meta classifiers, transforming high-level multimodal concepts into a separable feature space for robust decision-making. Experimental results demonstrate the superiority of our approach over existing algorithms for unpaired multimodal cell image classification. Our findings emphasize the importance of alternate training intra-modality classifiers and inter-modality fusion calibration for accurate and reliable medical image classification. All source code for this work will be publicly available on GitHub.

EAAI Journal 2025 Journal Article

Deep Content and Contrastive Perception learning for automatic fetal nuchal translucency image quality assessment

  • Lili Zhao
  • Yuanyuan Xu
  • Jian Xu
  • Weiping Ding
  • Jinzhao Yang
  • Huiyu Zhou
  • Yiming Du
  • Bin Hu

Automatic quality assessment of fetal nuchal translucency ultrasound images can assist physicians in obtaining standard planes and improve the reproducibility of nuchal translucency screening. At present, there are no dedicated studies or methods for the quality assessment of fetal nuchal translucency ultrasound images. For this task, the main challenges are low image quality, identifying structural integrity and relative positional relationships, and the time required for data collection and fine-grained annotation. To address these challenges, we propose a framework based on the DenseNet model, which includes a preprocessing module, a content perception module, an attention learning module, and a contrastive regularization module. Experiments show that these modules effectively improve the framework's quality assessment performance, and that the framework outperforms fourteen other deep learning models. The framework can also provide the sonographer with an interpretable reference map. Bland–Altman analysis further verifies the consistency between the results of the automatic quality assessment framework and the manually annotated clinical dataset. The proposed quality assessment framework for fetal nuchal translucency ultrasound images therefore has clear prospects and value for clinical application.

IROS Conference 2025 Conference Paper

Differential-Flatness-Based Tracking Control for Tractor-Trailers in Reversing Maneuvers

  • Bo Yang 0064
  • Zhenhao Zhuang
  • Zitian Yu
  • Qian Wang
  • Junqing Wei
  • Yilin Mo
  • Wen Yang

In this paper, we propose a differential-flatness-based controller (DFBC) for precise trajectory tracking of tractor-trailers, particularly during reversing maneuvers, which are challenging due to unstable equilibrium points. The proposed controller leverages the differential flatness property of tractor-trailers, equivalently transforming the nonlinear kinematics into a Brunovsky canonical form and allowing the application of linear control theory for control design. Compared to traditional linear quadratic regulator (LQR) controllers, the proposed DFBC method achieves higher precision and robustness in reversing maneuvers. We also showcase the performance of the proposed DFBC method through physical experiments conducted on our self-developed 1/10 scale autonomous tractor-trailer.
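The payoff of a Brunovsky (integrator-chain) form is that tracking reduces to linear feedback on the flat-output error. The sketch below illustrates that generic idea on a plain double integrator, not the paper's tractor-trailer kinematics; gains and horizon are invented:

```python
def simulate(k1=4.0, k2=4.0, dt=1e-3, steps=5000):
    """In Brunovsky coordinates the error dynamics are z1' = z2, z2' = v;
    a linear law v = -k1*z1 - k2*z2 drives the tracking error to zero."""
    z1, z2 = 1.0, 0.0            # initial error and error rate
    for _ in range(steps):
        v = -k1 * z1 - k2 * z2   # poles of s^2 + 4s + 4 at s = -2
        z1 += dt * z2            # forward-Euler integration
        z2 += dt * v
    return z1, z2

e, edot = simulate()             # error after 5 simulated seconds
```

In the paper's setting the nonlinear flatness transform supplies these coordinates; once there, standard linear tools (LQR, pole placement) apply directly.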

EAAI Journal 2025 Journal Article

Enhanced cross-domain lithology classification in imbalanced datasets using an unsupervised domain Adversarial Network

  • Yunxin Xie
  • Liangyu Jin
  • Chenyang Zhu
  • Weibin Luo
  • Qian Wang

Recent advancements in Artificial Intelligence (AI), particularly deep learning, have significantly improved lithology identification in reservoir exploration by leveraging micrographic rock imagery. Deep neural networks excel in feature extraction, enhancing classification accuracy. However, these models are prone to domain shifts, which often degrade their performance in real-world applications. This paper proposes an unsupervised domain adaptation framework that integrates Fisher linear discriminant analysis and Online Hard Example Mining (OHEM) to mitigate domain shifts and improve classification, particularly in datasets with imbalanced classes. The model employs a ω-balanced global–local domain discriminator to align feature distributions between different domains and introduces focal loss with class-wise weighted factors for better handling of imbalanced data. Additionally, an adapted version of OHEM identifies difficult samples during training, allowing the model to concentrate on challenging cases. The proposed method is validated on micrographic rock imagery from the Tibet, Qinghai, and Xinjiang regions, achieving an average accuracy of 83.2%, which is 13.8% higher than ResNet50 and at least 1% superior to other domain adaptation models. This research highlights the potential of AI-driven solutions in geoscientific applications and provides a robust framework for unsupervised lithology classification.

NeurIPS Conference 2025 Conference Paper

FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

  • Jiang Lin
  • Xinyu Chen
  • Song Wu
  • Zhiqiu Zhang
  • Jizhi Zhang
  • Ye Wang
  • Qiang Tang
  • Qian Wang

Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting flexibility and generalization. Inversion-based approaches offer stronger alignment but incur high inference cost due to dual-path denoising. We present FreeControl, a training-free framework for semantic structural control in diffusion models. Unlike prior methods that extract attention across multiple timesteps, FreeControl performs one-step attention extraction from a single, optimally chosen timestep and reuses it throughout denoising. This enables efficient structural guidance without inversion or retraining. To further improve quality and stability, we introduce Latent-Condition Decoupling (LCD): a principled separation of the timestep condition and the noised latent used in attention extraction. LCD provides finer control over attention quality and eliminates structural artifacts. FreeControl also supports compositional control via reference images assembled from multiple sources, enabling intuitive scene layout design and stronger prompt alignment. FreeControl introduces a new paradigm for test-time control: enabling structurally and semantically aligned, visually coherent generation directly from raw images, with the flexibility for intuitive compositional design and compatibility with modern diffusion models at ~5% additional cost.

NeurIPS Conference 2025 Conference Paper

Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

  • Jingmin An
  • Yilong Song
  • Ruolin Yang
  • Nai Ding
  • Lingxi Lu
  • Yuxuan Wang
  • Wei Wang
  • Chu Zhuang

Large Language Models (LLMs) demonstrate human-level or even superior language abilities, effectively modeling syntactic structures, yet the specific computational units responsible remain unclear. A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. Our results show that models such as GPT-2, Gemma, Gemma 2, Llama 2, Llama 3.1, and GLM-4 process syntax in analogous layers, while the human brain relies on distinct cortical regions for different syntactic levels. Representational similarity analysis reveals a stronger alignment between LLM representations and the left hemisphere of the brain (dominant in language processing). Notably, upgraded models exhibit divergent trends: Gemma 2 shows greater brain similarity than Gemma, while Llama 3.1 shows less alignment with the brain compared to Llama 2. These findings offer new insights into the interpretability of LLM behavioral improvements, raising questions about whether these advancements are driven by human-like or non-human-like mechanisms, and establish HFTP as a valuable tool bridging computational linguistics and cognitive neuroscience. This project is available at https://github.com/LilTiger/HFTP.
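The frequency-tagging principle behind this kind of probe can be illustrated on a synthetic signal: components locked to the word and sentence presentation rates produce spectral peaks at exactly those frequencies, while untagged frequencies stay flat. A toy sketch (not the HFTP code; rates and amplitudes are invented):

```python
import cmath, math

def power_at(signal, freq, fs):
    """Normalized DFT magnitude of `signal` at `freq` Hz, sampled at `fs` Hz."""
    coef = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
               for k, x in enumerate(signal))
    return abs(coef) / len(signal)

fs, seconds = 64, 8
# a strong word-rate (4 Hz) component plus a weaker sentence-rate (1 Hz) one
resp = [math.sin(2 * math.pi * 4 * k / fs) + 0.5 * math.sin(2 * math.pi * 1 * k / fs)
        for k in range(fs * seconds)]
word_pow = power_at(resp, 4.0, fs)   # peak at the word rate
sent_pow = power_at(resp, 1.0, fs)   # peak at the sentence rate
ctrl_pow = power_at(resp, 2.5, fs)   # untagged control frequency, near zero
```

Applied to MLP activations or intracranial recordings, peaks at the phrase and sentence rates indicate that the unit tracks those syntactic levels.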

EAAI Journal 2025 Journal Article

Loosen Attention: Integrating localized channel and coarse spatial attention for enhanced analysis of complex aurora images

  • Qian Wang
  • Shihao Jing
  • Rui Yang
  • Zhenpei Liu
  • Yao Tang
  • Han Pan

Auroral images present a significant challenge for automated analysis due to their highly variable and dynamic morphology, influenced by complex interactions between the solar wind and Earth’s magnetosphere. These natural phenomena exhibit considerable randomness in shape, brightness, and motion, making them a unique and challenging signal source for artificial intelligence methods. In this work, we propose Loosen Attention (LA), a novel and lightweight attention mechanism tailored to capture the unpredictable and fluid-like nature of auroral patterns. LA integrates localized channel attention with coarse-grained spatial attention, forming a flexible attention framework that enhances the robustness and adaptability of feature extraction in deep learning models. The LA module is engineered around four key strategies: volumetric feature generation, volume-wise attention computation, refinement and reconstruction, and enhanced feature fusion, enabling efficient focus on subtle yet significant auroral structures while tolerating less informative regions. Unlike conventional attention methods that may struggle with complex visual patterns, LA is explicitly designed to process diverse auroral images in a structured and computationally efficient way. We validate the LA mechanism on multiple vision tasks to demonstrate consistent performance improvements over state-of-the-art attention modules. The results highlight the potential of LA to improve deep learning pipelines in geospace imaging and other domains dealing with similarly complex natural signals. This work offers a promising approach for intelligent processing of Earth–space interaction data and other visually ambiguous scientific images.

AAAI Conference 2025 Conference Paper

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

  • Yitao Zhu
  • Sheng Wang
  • Mengjie Xu
  • Zixu Zhuang
  • Zhixin Wang
  • Kaidong Wang
  • Han Zhang
  • Qian Wang

Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration—a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. In this study, we introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views. Initially, we utilize a pre-trained human body encoder to process each camera view individually, enabling the reconstruction of human body models and parameters for each view along with predicted camera positions. Rather than merely averaging the models across views, we develop a neural network trained to assign weights to individual views for all human body joints, based on the estimated distribution of joint distances from each camera. Additionally, we focus on the mesh surface of the human body for dynamic fusion, allowing for the seamless integration of facial expressions and body shape into a unified human body model. Our method has shown excellent performance in reconstructing the human body on two public datasets, advancing beyond previous work from the SMPL model to the SMPL-X model. This extension incorporates more complex hand poses and facial expressions, enhancing the detail and accuracy of the reconstructions. Crucially, it supports the flexible ad-hoc deployment of any number of cameras, offering significant potential for various applications.

JBHI Journal 2025 Journal Article

Multi-Gate Mixture of Multi-View Graph Contrastive Learning on Electronic Health Record

  • Yu Cao
  • Qian Wang
  • Xu Wang
  • Dezhong Peng
  • Peilin Li

Electronic Health Record (EHR) is the digital form of patient visits that contains various medical data, including diagnosis, treatment, and lab events. Representation learning of EHR with deep learning methods has been beneficial for patient-related prediction tasks. Recently, studies have focused on revealing the inherent graph structure between medical events in EHR. Graph neural network (GNN) methods are prevalent and perform well in various prediction tasks. However, the inherent relationships between various medical events must be annotated, which is complicated and time-consuming. Most research works adopt a straightforward GNN structure on a single prediction task, which cannot fully exploit the potential of EHR representations. Compared with previous work, multi-task prediction can utilize the latent information of concealed correlations between different prediction tasks. In addition, self-contrastive learning on graphs can improve the representation learned by GNN. We propose a multi-gate mixture of multi-view graph contrastive learning (MMMGCL) method, aiming to obtain a more reasonable EHR representation and improve performance on downstream tasks. First, each patient visit is represented as a graph with a well-designed hierarchically fully-connected pattern. Second, node features in the manually constructed graph are pre-trained via the GloVe method with hierarchical ontology knowledge. Finally, MMMGCL processes the pre-trained graph and adopts a joint learning strategy to simultaneously optimize task and contrastive losses. We verify our method on two large open-source medical datasets, Medical Information Mart for Intensive Care (MIMIC-III) and the eICU Collaborative Research Database (eICU). Experiment results show that our method improves performance over straightforward graph-based methods on prediction of patient readmission, mortality, and length of stay.

ICLR Conference 2025 Conference Paper

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

  • Cunxiang Wang
  • Ruoxi Ning
  • Boqi Pan
  • Tonghui Wu
  • Qipeng Guo
  • Cheng Deng 0001
  • Guangsheng Bao
  • Xiangkun Hu

Recent advancements in Large Language Models (LLMs) have pushed the boundaries of natural language processing, especially in long-context understanding. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark tailored for evaluating LLMs with complex, extended narratives. NovelQA, constructed from English novels, offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper details the design and construction of NovelQA, focusing on its comprehensive manual annotation process and the variety of question types aimed at evaluating nuanced comprehension. Our evaluation of long-context LLMs on NovelQA reveals significant insights into their strengths and weaknesses. Notably, the models struggle with multi-hop reasoning, detail-oriented questions, and handling extremely long inputs, averaging over 200,000 tokens. Results highlight the need for substantial advancements in LLMs to enhance their long-context comprehension and contribute effectively to computational literary analysis.

NeurIPS Conference 2025 Conference Paper

PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning

  • Xiaogang Jia
  • Qian Wang
  • Anrui Wang
  • Han Wang
  • Balázs Gyenes
  • Emiliyan Gospodinov
  • Xinkai Jiang
  • Ge Li

Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Because the points retain a regular grid structure, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https://point-map.github.io/Point-Map/
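A point map in the sense described above can be built by back-projecting a depth image while keeping its pixel grid intact, so 2D convolutional machinery still applies. This sketch assumes a simple pinhole camera with invented intrinsics and is not the paper's implementation:

```python
def depth_to_point_map(depth, fx, fy, cx, cy):
    """Back-project a depth image into a point map: same H x W grid,
    but each cell holds an (x, y, z) point in camera coordinates."""
    pmap = []
    for v, row in enumerate(depth):
        prow = []
        for u, z in enumerate(row):
            # pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
            prow.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        pmap.append(prow)
    return pmap

# toy 2x2 depth image with hypothetical unit-focal-length intrinsics
depth = [[1.0, 2.0], [1.0, 2.0]]
pmap = depth_to_point_map(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Unlike an unordered point cloud, `pmap[v][u]` keeps the pixel neighborhood structure, which is what lets established image backbones consume 3D data directly.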

JBHI Journal 2025 Journal Article

Topological GCN Guided Improved Conformer for Detection of Hip Landmarks From Ultrasound Images

  • Tianxiang Huang
  • Jing Shi
  • Ge Jin
  • Juncheng Li
  • Jun Wang
  • Qian Wang
  • Jun Du
  • Jun Shi

The B-mode ultrasound based computer-aided diagnosis (CAD) has shown its effectiveness for diagnosis of Developmental Dysplasia of the Hip (DDH) in infants within 6 months. Hip landmark detection is a feasible way for the CAD of DDH according to the Graf's method. However, existing landmark detection algorithms mainly focus on designing special models to capture the features from hip ultrasound images, but generally ignore the important spatial relations among different landmarks. To this end, a novel weakly supervised learning-based algorithm, the Topological Graph Convolutional Network (TGCN) guided Improved Conformer (TGCN-ICF), is proposed for detecting landmarks from hip ultrasound images. The TGCN-ICF includes two subnetworks: an Improved Conformer (ICF) subnetwork to generate heatmaps and constraint vectors from ultrasound images, and a TGCN subnetwork to additionally explore topological relations among hip landmarks with the guidance of class labels for further refining and improving the detection accuracy. Moreover, a new Mutual Modulation Fusion (MMF) module is developed to fully exchange and fuse the extracted feature information from the convolutional neural network (CNN) and Transformer branches in ICF. Meanwhile, a novel Mutual Supervision Constraint (MSC) strategy is designed to provide a constraint for detection of each hip landmark. The experimental results on two real-world DDH datasets demonstrate that the TGCN-ICF outperforms all the compared algorithms, suggesting its potential applications.

IJCAI Conference 2024 Conference Paper

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

  • Juan Hu
  • Xin Liao
  • Difei Gao
  • Satoshi Tsutsui
  • Qian Wang
  • Zheng Qin
  • Mike Zheng Shou

Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domain Deepfake videos. Our method consists of two stages named recovering and localization. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.

IROS Conference 2024 Conference Paper

DiaGBT: An Explainable and Evolvable Robot Control Framework using Dialogue Generative Behavior Trees

  • Jinde Liang
  • Yuan Chang
  • Qian Wang
  • Yanzhen Wang
  • Xiaodong Yi 0002

Manipulating robots using natural language is the preferred way for non-technical specialists. The challenge lies in reliability and adaptability, especially when robots operate in unstructured surroundings. In this paper, we propose a novel framework called Dialogue Generative Behavior Trees (DiaGBT). Natural language instructions from human operators are transformed into behavior trees (BTs) and then executed by robots. Compared to the emerging Large Language Models (LLMs), DiaGBT is comparable in terms of semantic understanding but more lightweight, since the parsing rules are produced by an LLM but tailored for task-correlated instructions. Besides, DiaGBT allows multi-round human-robot interaction, where robots learn reusable skills in real time. For evaluation, we generate a dataset with 4k instruction-BT pairs covering 4 different scenarios. On average, DiaGBT reaches over 90% parsability and 80% plausibility. Similar results on the VEIL-500 dataset outperform the current state of the art.

AAAI Conference 2024 Conference Paper

Dynamic Budget Throttling in Repeated Second-Price Auctions

  • Zhaohua Chen
  • Chang Wang
  • Qian Wang
  • Yuqi Pan
  • Zhuming Shi
  • Zheng Cai
  • Yukun Ren
  • Zhihua Zhu

In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
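As a toy illustration of the throttling mechanism described above — selecting only a subset of auctions to participate in — the sketch below simulates an advertiser whose participation probability is nudged down when spend runs ahead of the budget pace and up when it lags. This is an invented rule for illustration, not the paper's OGD-CB algorithm; all values, bids, and update constants are hypothetical.

```python
import random

def run_throttling(values, competing_bids, budget, horizon):
    """Probabilistic throttling in repeated second-price auctions (toy rule)."""
    p = 1.0                    # current participation probability
    spend = 0.0
    utility = 0.0
    pace = budget / horizon    # target spend per round
    for t, (v, b) in enumerate(zip(values, competing_bids), start=1):
        # participate only if the potential payment still fits the budget
        if spend + b <= budget and random.random() < p:
            if v > b:          # wins the auction and pays the second price b
                spend += b
                utility += v - b
        # throttle harder when spend runs ahead of the budget pace
        if spend > pace * t:
            p = max(0.05, p * 0.9)
        else:
            p = min(1.0, p * 1.05)
    return spend, utility

random.seed(0)
values = [random.uniform(0.0, 1.0) for _ in range(1000)]
competing_bids = [random.uniform(0.0, 1.0) for _ in range(1000)]
spend, utility = run_throttling(values, competing_bids, budget=50.0, horizon=1000)
```

By construction the advertiser never exceeds the budget and, bidding truthfully, never earns negative utility in a second-price auction.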

AAAI Conference 2024 Conference Paper

Every Node Is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

  • Pengfei Zhu
  • Qian Wang
  • Yu Wang
  • Jialu Li
  • Qinghua Hu

Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.
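The per-node fusion idea can be sketched as gating over task embeddings with node-specific softmax weights. This is a hedged illustration only, not the released DyFSS code; all shapes, names, and values are invented.

```python
import numpy as np

def fuse_task_embeddings(task_embs, gate_logits):
    """Fuse per-task node embeddings with node-specific softmax weights.

    task_embs: (num_tasks, num_nodes, dim) -- one embedding per SSL task
    per node. gate_logits: (num_nodes, num_tasks) scores from a gating
    network. Returns a (num_nodes, dim) fused embedding matrix.
    """
    # softmax over tasks, independently for every node
    z = gate_logits - gate_logits.max(axis=1, keepdims=True)
    w = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # (nodes, tasks)
    # weighted sum over the task axis
    return np.einsum("nt,tnd->nd", w, task_embs)

rng = np.random.default_rng(0)
embs = rng.normal(size=(3, 5, 4))    # 3 SSL tasks, 5 nodes, embedding dim 4
logits = rng.normal(size=(5, 3))     # stand-in for gating network outputs
fused = fuse_task_embeddings(embs, logits)
```

Each node thus receives its own mixture of the task-specific embeddings rather than a single global weighting shared by all nodes.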

EAAI Journal 2024 Journal Article

Graph Confident Learning for Software Vulnerability Detection

  • Qian Wang
  • Zhengdao Li
  • Hetong Liang
  • Xiaowei Pan
  • Hui Li
  • Tingting Li
  • Xiaochen Li
  • Chenchen Li

Code vulnerability exposes millions of software systems to the possibility of being attacked, as evidenced every year by increasing reports of security issues such as information leaks, system compromise, and denial of service. Despite the many vulnerability detection models proposed so far, their effectiveness is still limited because they ignore syntactic structural information in source code and handle labeling errors improperly. To address these issues, we propose the Graph Confident Learning for Software Vulnerability Detection (GCL4SVD) model, a machine learning model to detect software vulnerabilities in the development phase. It comprises two components: code graph embedding and graph confident learning denoising. To address the limited analysis of syntactic structural information, the code graph embedding component extracts the structural and semantic information of source code with a sliding window mechanism, and then encodes source code into a graph structure to capture the patterns and characteristics of code vulnerabilities. Additionally, the graph confident learning denoising component identifies labeling errors to improve the quality of the training set. Experimental results show that GCL4SVD outperforms the state-of-the-art vulnerability detection models on four open-source datasets by 3.7%, 3.3%, 2.5%, and 0.8% in terms of Accuracy, respectively, and by 10.2%, 21.8%, 8.2%, and 11.2% in terms of F1-score.

ICAPS Conference 2024 Conference Paper

Improving Learnt Local MAPF Policies with Heuristic Search

  • Rishi Veerapaneni
  • Qian Wang
  • Kevin Ren
  • Arthur Jakobsson
  • Jiaoyang Li 0001
  • Maxim Likhachev

Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies.
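One simple way to use a policy's full output distribution rather than its argmax, in the spirit of the approach above, is greedy collision resolution with fallback to lower-ranked actions. This toy sketch is not the paper's method; the move encoding, positions, and probabilities are invented.

```python
def resolve_actions(action_probs, positions, moves):
    """Greedy collision resolution over a learnt policy's action distributions.

    Agents are processed in order; each takes its highest-probability move
    whose target cell is not already claimed, falling back to lower-ranked
    moves. Move index 0 is "wait", so a fallback usually exists when an
    agent's own cell is unclaimed. Toy sketch only.
    """
    occupied = set()
    chosen = []
    for agent, probs in enumerate(action_probs):
        ranked = sorted(range(len(probs)), key=lambda a: -probs[a])
        for a in ranked:
            x, y = positions[agent]
            dx, dy = moves[a]
            target = (x + dx, y + dy)
            if target not in occupied:
                occupied.add(target)
                chosen.append(a)
                break
    return chosen

# moves: wait, east, west, north, south (invented encoding)
moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
positions = [(0, 0), (2, 0)]
# both agents' policies favour moving toward the cell (1, 0)
action_probs = [[0.1, 0.6, 0.1, 0.1, 0.1],
                [0.1, 0.1, 0.6, 0.1, 0.1]]
chosen = resolve_actions(action_probs, positions, moves)
```

Here the first agent claims (1, 0), so the second agent falls back from its most likely move to waiting in place instead of colliding.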

ICAPS Conference 2024 Conference Paper

MAPF in 3D Warehouses: Dataset and Analysis

  • Qian Wang
  • Rishi Veerapaneni
  • Yu Wu
  • Jiaoyang Li 0001
  • Maxim Likhachev

Recent works have made significant progress in multi-agent path finding (MAPF), with modern methods being able to scale to hundreds of agents, handle unexpected delays, work in groups, etc. The vast majority of these methods have focused on 2D "grid world" domains. However, modern warehouses often utilize multi-agent robotic systems that can move in 3D, enabling dense storage but resulting in a more complex multi-agent planning problem. Motivated by this, we introduce and experimentally analyze the application of MAPF to 3D warehouse management, and release the first open-source 3D MAPF dataset (see http://mapf.info/index.php/Main/Benchmarks). We benchmark two state-of-the-art MAPF methods, EECBS and MAPF-LNS2, and show how different hyper-parameters affect these methods across various 3D MAPF problems. We also investigate how the warehouse structure itself affects MAPF performance. Based on our experimental analysis, we find that a fast low-level search is critical for 3D MAPF, EECBS

TMLR Journal 2024 Journal Article

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

  • Qian Wang
  • Biao Zhang
  • Michael Birsak
  • Peter Wonka

Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, including intermediate latent, conditional embedding, cross attention maps, guidance, and predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. In particular, we identify one specific configuration as a new type of control that manipulates the predicted noise and can perform higher-quality edits than previous work for a variety of local and global edits.

AAAI Conference 2024 Conference Paper

Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis

  • Zihao Zhao
  • Sheng Wang
  • Qian Wang
  • Dinggang Shen

Obtaining large-scale radiology reports can be difficult for medical images due to ethical concerns, limiting the effectiveness of contrastive pre-training in the medical image domain and underscoring the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without ethical issues. By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist has similar gazes for two medical images, it may indicate semantic similarity for diagnosis, and these images should be treated as positive pairs when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks. McGIP uses radiologist gaze to guide contrastive pre-training. We evaluate our method using two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP, indicating its high potential for various clinical scenarios and applications.

NeurIPS Conference 2024 Conference Paper

Multi-Chain Graphs of Graphs: A New Approach to Analyzing Blockchain Datasets

  • Bingqiao Luo
  • Zhen Zhang
  • Qian Wang
  • Bingsheng He

Machine learning applied to blockchain graphs offers significant opportunities for enhanced data analysis and applications. However, the potential of this field is constrained by the lack of a large-scale, cross-chain dataset that includes hierarchical graph-level data. To address this issue, we present novel datasets that provide detailed label information at the token level and integrate interactions between tokens across multiple blockchain platforms. We model transactions within each token as local graphs and the relationships between tokens as global graphs, collectively forming a "Graphs of Graphs" (GoG) approach. This innovative approach facilitates a deeper understanding of systemic structures and hierarchical interactions, which are essential for applications such as link prediction, anomaly detection, and token classification. We conduct a series of experiments demonstrating that this dataset delivers new insights and challenges for exploring GoG within the blockchain domain. Our work promotes advancements and opens new avenues for research in both the blockchain and graph communities. Source code and datasets are available at https://github.com/Xtra-Computing/Cryptocurrency-Graphs-of-graphs.

JBHI Journal 2024 Journal Article

RCPS: Rectified Contrastive Pseudo Supervision for Semi-Supervised Medical Image Segmentation

  • Xiangyu Zhao
  • Zengxin Qi
  • Sheng Wang
  • Qian Wang
  • Xuehai Wu
  • Ying Mao
  • Lichi Zhang

Medical image segmentation methods are generally designed as fully-supervised to guarantee model performance, which requires a significant amount of expert-annotated samples that are high-cost and laborious. Semi-supervised image segmentation can alleviate the problem by utilizing a large number of unlabeled images along with limited labeled images. However, learning a robust representation from numerous unlabeled images remains challenging due to potential noise in pseudo labels and insufficient class separability in feature space, which undermines the performance of current semi-supervised segmentation approaches. To address the issues above, we propose a novel semi-supervised segmentation method named Rectified Contrastive Pseudo Supervision (RCPS), which combines rectified pseudo supervision and voxel-level contrastive learning to improve the effectiveness of semi-supervised segmentation. Particularly, we design a novel rectification strategy for the pseudo supervision method based on uncertainty estimation and consistency regularization to reduce the noise influence in pseudo labels. Furthermore, we introduce a bidirectional voxel contrastive loss in the network to ensure intra-class consistency and inter-class contrast in feature space, which increases class separability in the segmentation. The proposed RCPS segmentation method has been validated on two public datasets and an in-house clinical dataset. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art methods in semi-supervised medical image segmentation.

NeurIPS Conference 2024 Conference Paper

ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization

  • Huayang Huang
  • Yu Wu
  • Qian Wang

Watermarking generative content serves as a vital tool for authentication, ownership protection, and mitigation of potential misuse. Existing watermarking methods face the challenge of balancing robustness and concealment. They empirically inject a watermark that is both invisible and robust and passively achieve concealment by limiting the strength of the watermark, thus reducing the robustness. In this paper, we propose to explicitly introduce a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks. To be specific, we implant a robust watermark in an intermediate diffusion state and then guide the model to hide the watermark in the final generated image. We employ an adversarial optimization algorithm to produce the optimal hiding prompt guiding signal for each watermark. The prompt embedding is optimized to minimize artifacts in the generated image, while the watermark is optimized to achieve maximum strength. The watermark can be verified by reversing the generation process. Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering and shows superior invisibility compared to other state-of-the-art robust watermarking methods.

YNIMG Journal 2024 Journal Article

Time courses of brain plasticity underpinning visual motion perceptual learning

  • Yongqian Song
  • Qian Wang
  • Fang Fang

Visual perceptual learning (VPL) refers to a long-term improvement of visual task performance through training or experience, reflecting brain plasticity even in adults. In human subjects, VPL has been mostly studied using functional magnetic resonance imaging (fMRI). However, due to the low temporal resolution of fMRI, how VPL affects the time course of visual information processing is largely unknown. To address this issue, we trained human subjects to perform a visual motion direction discrimination task. Their behavioral performance and magnetoencephalography (MEG) signals responding to the motion stimuli were measured before, immediately after, and two weeks after training. Training induced a long-lasting behavioral improvement for the trained direction. Based on the MEG signals from occipital sensors, we found that, for the trained motion direction, VPL increased the motion direction decoding accuracy, reduced the motion direction decoding latency, enhanced the direction-selective channel response, and narrowed the tuning profile. Following the MEG source reconstruction, we showed that VPL enhanced the cortical response in early visual cortex (EVC) and strengthened the feedforward connection from EVC to V3A. These VPL-induced neural changes co-occurred in 160-230 ms after stimulus onset. Complementary to previous fMRI findings on VPL, this study provides a comprehensive description on the neural mechanisms of visual motion perceptual learning from a temporal perspective and reveals how VPL shapes the time course of visual motion processing in the adult human brain.

IJCAI Conference 2023 Conference Paper

Detecting Adversarial Faces Using Only Real Face Self-Perturbations

  • Qian Wang
  • Yongqin Xian
  • Hefei Ling
  • Jinyuan Zhang
  • Xiaorui Lin
  • Ping Li
  • Jiazhong Chen
  • Ning Yu

Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.

AAAI Conference 2023 System Paper

FC-TrackNet: Fast Convergence Net for 6D Pose Tracking in Synthetic Domains

  • Di Jia
  • Qian Wang
  • Jun Cao
  • Peng Cai
  • Zhiyang Jin

In this work, we propose a fast convergence track net, or FC-TrackNet, based on a synthetic data-driven approach to maintaining long-term 6D pose tracking. Comparison experiments are performed on two different datasets. The results demonstrate that our approach can achieve a consistent tracking frequency of 90.9 Hz as well as higher accuracy than state-of-the-art approaches.

IJCAI Conference 2023 Conference Paper

Orion: Online Backdoor Sample Detection via Evolution Deviance

  • Huayang Huang
  • Qian Wang
  • Xueluan Gong
  • Tao Wang

Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.
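The core signal above — divergence between a shallow side head's prediction and the final prediction — can be illustrated with a toy KL-divergence check. The probability vectors below are made up, and this is not Orion's actual side-net architecture.

```python
import numpy as np

def evolution_deviance(shallow_probs, deep_probs):
    """KL divergence of the deep (final) prediction from the shallow
    side-head prediction; a large value flags a suspicious input."""
    p = np.clip(shallow_probs, 1e-12, 1.0)
    q = np.clip(deep_probs, 1e-12, 1.0)
    return float(np.sum(q * np.log(q / p)))

# benign input: shallow and deep heads roughly agree
benign = evolution_deviance(np.array([0.7, 0.2, 0.1]),
                            np.array([0.8, 0.15, 0.05]))
# backdoored input: the deep head has been flipped to the attacker's
# target class while the shallow head still favours the true class
poisoned = evolution_deviance(np.array([0.7, 0.2, 0.1]),
                              np.array([0.05, 0.05, 0.9]))
```

Thresholding such a deviance score is one way a detector could flag triggered inputs without assuming benign and poisoned samples separate in the final feature space.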

NeurIPS Conference 2023 Conference Paper

Revisiting Adversarial Robustness Distillation from the Perspective of Robust Fairness

  • Xinli Yue
  • Ningping Mou
  • Qian Wang
  • Lingchen Zhao

Adversarial Robustness Distillation (ARD) aims to transfer the robustness of large teacher models to small student models, facilitating the attainment of robust performance on resource-limited devices. However, existing research on ARD primarily focuses on the overall robustness of student models, overlooking the crucial aspect of robust fairness. Specifically, these models may demonstrate strong robustness on some classes of data while exhibiting high vulnerability on other classes. Unfortunately, the "buckets effect" implies that the robustness of the deployed model depends on the classes with the lowest level of robustness. In this paper, we first investigate the inheritance of robust fairness during ARD and reveal that student models only partially inherit robust fairness from teacher models. We further validate this issue through fine-grained experiments with various model capacities and find that it may arise due to the gap in capacity between teacher and student models, as well as the existing methods treating each class equally during distillation. Based on these observations, we propose Fair Adversarial Robustness Distillation (Fair-ARD), a novel framework for enhancing the robust fairness of student models by increasing the weights of difficult classes, and design a geometric perspective-based method to quantify the difficulty of different classes for determining the weights. Extensive experiments show that Fair-ARD surpasses both state-of-the-art ARD methods and existing robust fairness algorithms in terms of robust fairness (e.g., the worst-class robustness under AutoAttack is improved by at most 12.3% and 5.3% using ResNet18 on CIFAR10, respectively), while also slightly improving overall robustness. Our code is available at: https://github.com/NISP-official/Fair-ARD.
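The class-reweighting idea can be sketched as a per-class weighted distillation loss. This is a simplified stand-in for Fair-ARD's objective — the paper's geometric difficulty weighting is not reproduced — and all arrays below are invented.

```python
import numpy as np

def weighted_distillation_loss(student_logp, teacher_p, labels, class_weights):
    """Mean per-sample KL(teacher || student), with each sample weighted by
    the weight of its ground-truth class so harder classes count more."""
    log_t = np.log(np.clip(teacher_p, 1e-12, 1.0))
    kl = (teacher_p * (log_t - student_logp)).sum(axis=1)  # per-sample KL
    w = class_weights[labels]                              # per-sample weight
    return float((w * kl).mean())

rng = np.random.default_rng(0)
n, c = 6, 3
logits = rng.normal(size=(n, c))
student_logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
teacher_p = np.full((n, c), 1.0 / c)           # a (dummy) uniform teacher
labels = np.array([0, 1, 2, 0, 1, 2])
uniform_loss = weighted_distillation_loss(student_logp, teacher_p, labels,
                                          np.ones(c))
# up-weight class 2, treating it as the hardest class
fair_loss = weighted_distillation_loss(student_logp, teacher_p, labels,
                                       np.array([1.0, 1.0, 3.0]))
```

Raising a class's weight makes its samples dominate the distillation objective, pushing the student to match the teacher more closely on that class.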

IJCAI Conference 2022 Conference Paper

A Few Seconds Can Change Everything: Fast Decision-based Attacks against DNNs

  • Ningping Mou
  • Baolin Zheng
  • Qian Wang
  • Yunjie Ge
  • Binqing Guo

Previous research has demonstrated deep learning models' vulnerabilities to decision-based adversarial attacks, which craft adversarial examples based solely on information from output decisions (top-1 labels). However, existing decision-based attacks have two major limitations, i.e., expensive query cost and being easy to detect. To bridge this gap and demonstrate real threats to commercial applications, we propose a novel and efficient decision-based attack against black-box models, dubbed FastDrop, which only requires a few queries and works well under strong defenses. The crux of the innovation is that, unlike existing adversarial attacks that rely on gradient estimation and additive noise, FastDrop generates adversarial examples by dropping information in the frequency domain. Extensive experiments on three datasets demonstrate that FastDrop can escape the detection of the state-of-the-art (SOTA) black-box defenses and reduce the number of queries by 13~133× under the same level of perturbations compared with the SOTA attacks. FastDrop only needs 10~20 queries to conduct an attack against various black-box models within 1 s. Besides, on commercial vision APIs provided by Baidu and Tencent, FastDrop achieves an attack success rate (ASR) of 100% with 10 queries on average, which poses a real and severe threat to real-world applications.
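The core mechanism — dropping information in the frequency domain rather than adding noise — can be sketched with a 2D FFT low-pass mask. The real attack additionally searches over how much to drop while querying the target model; this standalone sketch omits that query loop, and the image is a random stand-in.

```python
import numpy as np

def frequency_drop(image, keep_radius):
    """Low-pass the image in the frequency domain: keep only components
    within keep_radius of the spectrum centre and reconstruct."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    f[dist > keep_radius] = 0          # drop high-frequency information
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

rng = np.random.default_rng(1)
img = rng.uniform(size=(32, 32))       # stand-in for a query image
dropped = frequency_drop(img, keep_radius=8)
```

Because the perturbation removes content instead of injecting structured noise, it evades detectors tuned to additive noise patterns while only the drop radius needs to be tuned against the target model's decision.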

JBHI Journal 2022 Journal Article

GAN-Guided Deformable Attention Network for Identifying Thyroid Nodules in Ultrasound Images

  • Jintao Lu
  • Xi Ouyang
  • Xueda Shen
  • Tianjiao Liu
  • Zhiming Cui
  • Qian Wang
  • Dinggang Shen

Early detection and identification of malignant thyroid nodules, a vital precursor to treatment, is a difficult task even for experienced clinicians. Many Computer-Aided Diagnosis (CAD) systems have been developed to assist clinicians in performing this task on ultrasound images. Learning-based CAD systems for thyroid nodules generally accommodate both nodule detection/segmentation and fine-grained classification of malignancy, and prior research often treats the aforementioned tasks in separate stages, leading to additional computational costs. In this paper, we utilize an online class activation mapping (CAM) mechanism to guide the network to learn discriminative features for identifying thyroid nodules in ultrasound images, called the CAM attention network. It takes nodule masks as localization cues for direct spatial attention of the classification module, thereby avoiding isolated training for classification. Meanwhile, we propose a deformable convolution module to add offsets to the regular grid sampling locations in the standard convolution, guiding the network to capture more discriminative features of nodule areas. Furthermore, we use a generative adversarial network (GAN) to ensure reliable deformations of nodules from the deformable convolution module. Our proposed CAM attention network achieved 2nd place in the classification task of TN-SCUI 2020, a MICCAI 2020 Challenge with, to our knowledge, the largest set of thyroid nodule ultrasound images. The further inclusion of our proposed GAN-guided deformable module allows for capturing more fine-grained features between benign and malignant nodules, and further improves the classification accuracy to a new state-of-the-art level.

AAAI Conference 2022 Conference Paper

How to Distribute Data across Tasks for Meta-Learning?

  • Alexandru Cioba
  • Michael Bromberg
  • Qian Wang
  • Ritwik Niyogi
  • Georgios Batzolis
  • Jezabel Garcia
  • Da-shan Shiu
  • Alberto Bernacchia

Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary and it is unknown how it affects performance at testing. Since labelling of data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and fewer to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between number of tasks and number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks. Interestingly, Neuroscience experiments have shown that human visual skills also transfer better from easy tasks. We prove these results mathematically on mixed linear regression, and we show empirically that the same results hold for few-shot image classification on CIFAR-FS and mini-ImageNet. Our results provide guidance for allocating labels across tasks when collecting data for meta-learning.

AAAI Conference 2022 Conference Paper

Parameter Differentiation Based Multilingual Neural Machine Translation

  • Qian Wang
  • Jiajun Zhang

Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model and has proved successful thanks to effective knowledge transfer among different languages with shared parameters. However, it is still an open question which parameters should be shared and which ones need to be task-specific. Currently, the common practice is to heuristically design or search language-specific modules, which makes it difficult to find the optimal configuration. In this paper, we propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific during training. Inspired by cellular differentiation, each shared parameter in our method can dynamically differentiate into more specialized types. We further define the differentiation criterion as inter-task gradient similarity. Therefore, parameters with conflicting inter-task gradients are more likely to be language-specific. Extensive experiments on multilingual datasets have demonstrated that our method significantly outperforms various strong baselines with different parameter sharing configurations. Further analyses reveal that the parameter sharing configuration obtained by our method correlates well with linguistic proximities.
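The differentiation criterion can be illustrated as a check on pairwise cosine similarity of per-task gradients. This is a minimal sketch of the idea, not the paper's implementation; the gradient vectors and threshold are hypothetical.

```python
import numpy as np

def should_differentiate(task_grads, threshold=0.0):
    """Return True when the minimum pairwise cosine similarity among the
    per-task gradients of a shared parameter falls below threshold,
    i.e. when at least two tasks pull the parameter in conflicting
    directions."""
    sims = []
    for i in range(len(task_grads)):
        for j in range(i + 1, len(task_grads)):
            a, b = task_grads[i], task_grads[j]
            sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return min(sims) < threshold

# hypothetical gradients for one shared parameter under two languages
aligned = [np.array([1.0, 0.5]), np.array([0.9, 0.6])]
conflicting = [np.array([1.0, 0.0]), np.array([-1.0, 0.1])]
keep_shared = not should_differentiate(aligned)      # gradients agree
split_param = should_differentiate(conflicting)      # gradients conflict
```

Parameters whose per-language gradients point in opposing directions are the ones that benefit from splitting into language-specific copies; aligned gradients suggest sharing is harmless.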

AAAI Conference 2022 Conference Paper

Reliability Exploration with Self-Ensemble Learning for Domain Adaptive Person Re-identification

  • Zongyi Li
  • Yuxuan Shi
  • Hefei Ling
  • Jiazhong Chen
  • Qian Wang
  • Fengfan Zhou

Person re-identification (Re-ID) based on unsupervised domain adaptation (UDA) aims to transfer the pre-trained model from one labeled source domain to an unlabeled target domain. Existing methods tackle this problem by using clustering methods to generate pseudo labels. However, pseudo labels produced by these techniques may be unstable and noisy, substantially deteriorating models' performance. In this paper, we propose a Reliability Exploration with Self-ensemble Learning (RESL) framework for domain adaptive person Re-ID. First, to increase the feature diversity, multiple branches are presented to extract features from different data augmentations. Taking the temporally average model as a mean teacher model, online label refining is conducted by using its dynamic ensemble predictions from different branches as soft labels. Second, to combat the adverse effects of unreliable samples in clusters, sample reliability is estimated by evaluating the consistency of different clusters' results, followed by selecting reliable instances for training and re-weighting sample contributions within Re-ID losses. A contrastive loss is also utilized with cluster-level memory features, which are updated by the mean feature. The experiments demonstrate that our method can significantly surpass the state-of-the-art performance on unsupervised domain adaptive person Re-ID.

IJCAI Conference 2021 Conference Paper

InverseNet: Augmenting Model Extraction Attacks with Training Data Inversion

  • Xueluan Gong
  • Yanjiao Chen
  • Wenbin Yang
  • Guanghao Mei
  • Qian Wang

Cloud service providers, including Google, Amazon, and Alibaba, have now launched machine-learning-as-a-service (MLaaS) platforms, allowing clients to access sophisticated cloud-based machine learning models via APIs. Unfortunately, however, the commercial value of these models makes them alluring targets for theft, and their strategic position as part of the IT infrastructure of many companies makes them an enticing springboard for conducting further adversarial attacks. In this paper, we put forth a novel and effective attack strategy, dubbed InverseNet, that steals the functionality of black-box cloud-based models with only a small number of queries. The crux of the innovation is that, unlike existing model extraction attacks that rely on public datasets or adversarial samples, InverseNet constructs inversed training samples to increase the similarity between the extracted substitute model and the victim model. Further, only a small number of data samples with high confidence scores (rather than an entire dataset) are used to reconstruct the inversed dataset, which substantially reduces the attack cost. Extensive experiments conducted on three simulated victim models and Alibaba Cloud's commercially-available API demonstrate that InverseNet yields a model with significantly greater functional similarity to the victim model than the current state-of-the-art attacks at a substantially lower query budget.

IJCAI Conference 2021 Conference Paper

Recent Advances in Adversarial Training for Adversarial Robustness

  • Tao Bai
  • Jinqi Luo
  • Jun Zhao
  • Bihan Wen
  • Qian Wang

Adversarial training is one of the most effective approaches for deep learning models to defend against adversarial examples. Unlike other defense strategies, adversarial training aims to enhance the robustness of models intrinsically. During the past few years, adversarial training has been studied and discussed from various aspects, which deserves a comprehensive review. For the first time in this survey, we systematically review the recent progress on adversarial training for adversarial robustness with a novel taxonomy. Then we discuss the generalization problems in adversarial training from three perspectives and highlight the challenges which are not fully tackled. Finally, we present potential future directions.

AAAI Conference 2021 Conference Paper

Synchronous Interactive Decoding for Multilingual Neural Machine Translation

  • Hao He
  • Qian Wang
  • Zhipeng Yu
  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong

To simultaneously translate a source language into multiple different target languages is one of the most common scenarios of multilingual translation. However, existing methods cannot make full use of translation model information during decoding, such as intra-lingual and inter-lingual future information, and therefore may suffer from issues such as unbalanced outputs. In this paper, we present a new approach for synchronous interactive multilingual neural machine translation (SimNMT), which predicts each target language output simultaneously and interactively using historical and future information of all target languages. Specifically, we first propose a synchronous cross-interactive decoder in which generation of each target output does not only depend on its generated sequences, but also relies on its future information, as well as history and future contexts of other target languages. Then, we present a new interactive multilingual beam search algorithm that enables synchronous interactive decoding of all target languages in a single model. We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and M-NMT models.

AAMAS Conference 2021 Conference Paper

The Tight Bound for Pure Price of Anarchy in an Extended Miner's Dilemma Game

  • Qian Wang
  • Yurong Chen

The pool block withholding attack, which reduces the effective mining power in the system and leads to potential systemic instability in the blockchain, can be modeled as a non-cooperative game called “the miner’s dilemma”. However, existing literature on the game-theoretic properties of this attack gives only a preliminary analysis. In this paper, we establish the existence and uniqueness of a pure Nash equilibrium for the two-player miner’s dilemma. We then give a tight upper bound of 2 on the pure price of anarchy (PPoA), which measures how much mining power is wasted in the game. Moreover, we show that the uniqueness and the tight bound still hold in a more general setting with a betrayal assumption. Inspired by experiments on games among three mining pools, we conjecture that similar results hold for the N-player miner’s dilemma game (N ≥ 2).

AAAI Conference 2020 Conference Paper

Attention-over-Attention Field-Aware Factorization Machine

  • Zhibo Wang
  • Jinxin Ma
  • Yongquan Zhang
  • Qian Wang
  • Ju Ren
  • Peng Sun

Factorization Machine (FM) has been a popular approach in supervised predictive tasks, such as click-through rate prediction and recommender systems, due to its great performance and efficiency. Recently, several variants of FM have been proposed to improve its performance. However, most of the state-of-the-art prediction algorithms neglected the field information of features, and they also failed to discriminate the importance of feature interactions due to the problem of redundant features. In this paper, we present a novel algorithm called Attention-over-Attention Field-aware Factorization Machine (AoAFFM) for better capturing the characteristics of feature interactions. Specifically, we propose the field-aware embedding layer to exploit the field information of features, and combine it with the attention-over-attention mechanism to learn both feature-level and interaction-level attention to estimate the weight of feature interactions. Experimental results show that the proposed AoAFFM improves on FM and FFM by a large margin, and outperforms state-of-the-art algorithms on three public benchmark datasets.

YNICL Journal 2020 Journal Article

Systems modeling of white matter microstructural abnormalities in Alzheimer's disease

  • Emrin Horgusluoglu-Moloch
  • Gaoyu Xiao
  • Minghui Wang
  • Qian Wang
  • Xianxiao Zhou
  • Kwangsik Nho
  • Andrew J. Saykin
  • Eric Schadt

INTRODUCTION: Microstructural abnormalities in white matter (WM) are often reported in Alzheimer's disease (AD). However, it is unclear which brain regions have the strongest WM changes in presymptomatic AD and what biological processes underlie WM abnormality during disease progression. METHODS: We developed a systems biology framework to integrate matched diffusion tensor imaging (DTI), genetic and transcriptomic data to investigate regional vulnerability to AD and identify genetic risk factors and gene subnetworks underlying WM abnormality in AD. RESULTS: We quantified regional WM abnormality and identified the most vulnerable brain regions. The SNP rs2203712 in CELF1 was most significantly associated with several DTI-derived features in the hippocampus, the top-ranked brain region. An immune response gene subnetwork in the blood was most correlated with DTI features across all the brain regions. DISCUSSION: Incorporation of image analysis with gene network analysis enhances our understanding of disease progression and facilitates identification of novel therapeutic strategies for AD.

AAAI Conference 2020 Conference Paper

Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling

  • Qian Wang
  • Toby Breckon

Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from significant performance drop when directly applied to the samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (i.e., Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate that our approach outperforms contemporary state-of-the-art methods.
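The core recipe summarized above (cluster the target-domain features, then trust only the pseudo-labels of samples that sit close to a cluster center) can be sketched in a few lines. This is an illustrative simplification with assumed names: the paper's actual selection criterion is based on structured prediction, and the plain distance-to-centroid rule below is used only as a stand-in.

```python
import numpy as np

def selective_pseudo_labels(features, centroids, keep_ratio=0.5):
    """Assign each target sample the label of its nearest class centroid,
    then keep only the keep_ratio fraction of samples closest to their
    assigned centroid as 'reliable' pseudo-labels."""
    # Pairwise distances: shape (n_samples, n_classes)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Smaller distance to the assigned centroid = more confident pseudo-label
    assigned = dists[np.arange(len(features)), labels]
    n_keep = int(len(features) * keep_ratio)
    keep = np.argsort(assigned)[:n_keep]
    return labels, keep

# Two well-separated target clusters; half the samples are kept as reliable.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
labels, keep = selective_pseudo_labels(feats, cents, keep_ratio=0.5)
print([int(l) for l in labels], sorted(int(i) for i in keep))
```

In a full pipeline the selected subset would be used to retrain the classifier, after which the centroids and pseudo-labels are recomputed iteratively.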

YNICL Journal 2019 Journal Article

Quantitative susceptibility mapping based hybrid feature extraction for diagnosis of Parkinson's disease

  • Bin Xiao
  • Naying He
  • Qian Wang
  • Zenghui Cheng
  • Yining Jiao
  • E. Mark Haacke
  • Fuhua Yan
  • Feng Shi

Parkinson's disease is the second most common neurodegenerative disease in the elderly after Alzheimer's disease. The aetiology and pathogenesis of Parkinson's disease (PD) are still unclear, but the loss of dopaminergic cells and the excessive iron deposition in the substantia nigra (SN) are associated with the pathophysiology. As an imaging technique that can quantitatively reflect the amount of iron deposition, Quantitative Susceptibility Mapping (QSM) has been shown to be a promising modality for the diagnosis of PD. In the present work, we propose a hybrid feature extraction method for PD diagnosis using QSM images. First, we extract radiomics features from the SN using QSM and employ machine learning algorithms to classify PD and normal controls (NC). This approach allows us to investigate which features are most vulnerable to the effects of the disease. Along with this approach, we propose a Convolutional Neural Network (CNN) based method which can extract different features from the QSM image to further support the diagnosis of PD. Finally, we combine these two types of features and we find that the radiomics features and CNN features are complementary to each other, which helps further improve the classification (diagnostic) performance. We conclude that: (1) radiomics features from QSM data have significant clinical value for the diagnosis of PD; (2) CNN features are also useful in the diagnosis of PD; and (3) the combination of radiomics features and CNN features can enhance the diagnostic accuracy.

JBHI Journal 2019 Journal Article

Regression Convolutional Neural Network for Automated Pediatric Bone Age Assessment From Hand Radiograph

  • Xuhua Ren
  • Tingting Li
  • Xiujun Yang
  • Shuai Wang
  • Sahar Ahmad
  • Lei Xiang
  • Shaun Richard Stone
  • Lihong Li

Skeletal bone age assessment is a common clinical practice to investigate endocrine, genetic, and growth disorders of children. However, clinical interpretation and bone age analyses are time-consuming, labor intensive, and often subject to inter-observer variability. This advocates the need of a fully automated method for bone age assessment. We propose a regression convolutional neural network (CNN) to automatically assess the pediatric bone age from hand radiograph. Our network is specifically trained to place more attention to those bone age related regions in the X-ray images. Specifically, we first adopt the attention module to process all images and generate the coarse/fine attention maps as inputs for the regression network. Then, the regression CNN follows the supervision of the dynamic attention loss during training; thus, it can estimate the bone age of the hard (or “outlier”) images more accurately. The experimental results show that our method achieves an average discrepancy of 5.2–5.3 months between clinical and automatic bone age evaluations on two large datasets. In conclusion, we propose a fully automated deep learning solution to process X-ray images of the hand for bone age assessment, with the accuracy comparable to human experts but with much better efficiency.

IROS Conference 2019 Conference Paper

Robot Learning via Human Adversarial Games

  • Jiali Duan
  • Qian Wang
  • Lerrel Pinto
  • C.-C. Jay Kuo
  • Stefanos Nikolaidis

Much work in robotics has focused on “human-in-the-loop” learning techniques that improve the efficiency of the learning process. However, these algorithms have made the strong assumption of a cooperating human supervisor that assists the robot. In reality, human observers tend to also act in an adversarial manner towards deployed robotic systems. We show that this can in fact improve the robustness of the learned models by proposing a physical framework that leverages perturbations applied by a human adversary, guiding the robot towards more robust models. In a manipulation task, we show that grasping success improves significantly when the robot trains with a human adversary as compared to training in a self-supervised manner.

TCS Journal 2017 Journal Article

Online algorithms for scheduling on batch processing machines with interval graph compatibilities between jobs

  • Qian Wang
  • Ji Tian
  • Ruyan Fu
  • Xiangjuan Yao

We consider the online (over time) scheduling problem of minimizing the makespan on m unbounded parallel-batch machines, in which jobs in the same batch have to be pairwise compatible. Compatibility is a symmetric binary relation, which is represented by an interval compatibility graph. The processing time of a batch is equal to the maximum processing time of the jobs in it, and all jobs in the same batch start and finish at the same time. For this problem, we first show that no online algorithm can have a competitive ratio less than 2. We then provide an online algorithm with a competitive ratio of 2 + (m − 1)/(m + 1), which is optimal for the case m = 1. When all jobs have the same processing times, we also give an optimal online algorithm.
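As a quick illustration (mine, not from the paper), the competitive ratio 2 + (m − 1)/(m + 1) stated above equals the lower bound of 2 at m = 1, which is why the algorithm is optimal for a single machine, and increases toward (but never reaches) 3 as the number of machines m grows:

```python
def competitive_ratio(m: int) -> float:
    """Competitive ratio 2 + (m - 1)/(m + 1) of the online algorithm
    for makespan on m unbounded parallel-batch machines."""
    return 2 + (m - 1) / (m + 1)

# m = 1 meets the general lower bound of 2, so the algorithm is optimal there;
# the ratio grows with m but always stays strictly below 3.
for m in (1, 2, 5, 100):
    print(m, competitive_ratio(m))
```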

TCS Journal 2016 Journal Article

Online scheduling on the unbounded drop-line batch machines to minimize the maximum delivery completion time

  • Ji Tian
  • Qian Wang
  • Ruyan Fu
  • Jinjiang Yuan

We consider online scheduling on m unbounded drop-line batch machines with delivery times. Here a drop-line batch machine can process several jobs in a batch, so that the processing time of a batch is equal to the longest processing time of the jobs in the batch, the jobs in a batch have the same starting time, and the completion time of a job is equal to the sum of its starting time and its processing time. Once the processing of a job is completed on the machine, we immediately deliver it to its destination. The objective is to minimize the time by which all jobs have been delivered. For this problem, we present a best possible online algorithm with a competitive ratio of 1 + α_m, where α_m is the positive root of the equation α² + mα − 1 = 0.
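Since α_m is defined by a quadratic, it has the closed form α_m = (√(m² + 4) − m)/2. A small sketch (my own, not from the paper) evaluates the resulting ratio; for m = 1 the ratio is the golden ratio, approximately 1.618:

```python
import math

def alpha(m: int) -> float:
    """Positive root of alpha^2 + m*alpha - 1 = 0, by the quadratic formula."""
    return (math.sqrt(m * m + 4) - m) / 2

def competitive_ratio(m: int) -> float:
    """Best-possible competitive ratio 1 + alpha_m for m drop-line batch machines."""
    return 1 + alpha(m)

for m in (1, 2, 5):
    a = alpha(m)
    assert abs(a * a + m * a - 1) < 1e-12  # verify it is indeed a root
    print(m, round(competitive_ratio(m), 4))
```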

NeurIPS Conference 2014 Conference Paper

Attentional Neural Network: Feature Selection Using Cognitive Feedback

  • Qian Wang
  • Jiaxing Zhang
  • Sen Song
  • Zheng Zhang

Attentional Neural Network is a new framework that integrates top-down cognitive bias and bottom-up feature extraction in one coherent architecture. The top-down influence is especially effective when dealing with high noise or difficult segmentation problems. Our system is modular and extensible. It is also easy to train and cheap to run, and yet can accommodate complex behaviors. We obtain classification accuracy better than or competitive with state-of-the-art results on the MNIST variation dataset, and successfully disentangle overlaid digits with high success rates. We view such a general purpose framework as an essential foundation for a larger system emulating the cognitive abilities of the whole brain.

CSL Conference 2013 Conference Paper

Semantics of Intensional Type Theory extended with Decidable Equational Theories

  • Qian Wang
  • Bruno Barras

Incorporating extensional equality into a dependent intensional type system such as the Calculus of Constructions (CC) provides stronger type-checking capabilities and makes proof development closer to intuition. Since strong forms of extensionality generally lead to undecidable type-checking, it seems a reasonable trade-off to extend intensional equality with a decidable first-order theory, as experimented with in earlier work on CoqMTU and its implementation CoqMT. In this work, CoqMTU is extended with strong eliminations. The meta-theoretical study, particularly the part relying on semantic arguments, is more complex. A set-theoretical model of the equational theory is the key ingredient to derive the logical consistency of the formalism. Strong normalization, the main lemma from which type-decidability follows, is proved by attaching realizability information to the values of the model. The approach we have followed is to first consider an abstract notion of first-order equational theory, and then instantiate it with a particular instance, Presburger Arithmetic. These results have been formalized using Coq.