Arrow Research search

Author name cluster

Qing Guo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
1 author row

Possible papers (29)

AAAI Conference 2026 Conference Paper

Exploiting Geometric Structures for Modeling Multi-Agent Behaviors: A New Thinking

  • Bohao Qu
  • Xiaofeng Cao
  • Bing Li
  • Menglin Zhang
  • Tuan-Anh Vu
  • Di Lin
  • Qing Guo

In this paper, we rethink modeling agent behaviors from a geometric-structure perspective in multi-agent reinforcement learning. Modeling agent behaviors is essential for understanding how agents interact and for facilitating effective decisions. The key lies in capturing the dependencies and sequential relationships among agent decisions. Since each decision influences the subsequent choices, the decisions form a hierarchical, nested, tree-like structure of interdependencies. However, modeling tree-like data in Euclidean spaces can cause distortion, resulting in a loss of agent decision-structure information. Motivated by this, we reconsider modeling agent behaviors in hyperbolic space and propose the Hyperbolic Multi-Agent Representations (HMAR) method, which projects agent behaviors into a Poincaré ball and leverages hyperbolic neural networks to learn agent policy representations. Additionally, we design a contrastive loss function to train this network, minimizing the distance in feature space between different representations of the same agent while maximizing the distance between representations of distinct agents. Experimental results provide empirical evidence for the effectiveness of HMAR in cooperative and competitive environments, demonstrating the potential of hyperbolic agent representations for effective decision-making in multi-agent environments.
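
The two ingredients the abstract names, projection into a Poincaré ball and a distance-based contrastive loss, can be sketched in a few lines. This is a minimal illustration of the general technique, not the authors' HMAR implementation; the function names and the margin-based loss form are our own assumptions.

```python
import torch

def exp_map_origin(v, eps=1e-7):
    # Exponential map at the origin of the Poincare ball (curvature -1):
    # projects a Euclidean feature v to tanh(||v||) * v / ||v||.
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm

def poincare_dist(u, v, eps=1e-7):
    # Geodesic distance: arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2))).
    sq = (u - v).pow(2).sum(-1)
    den = (1 - u.pow(2).sum(-1)).clamp_min(eps) * (1 - v.pow(2).sum(-1)).clamp_min(eps)
    return torch.acosh((1 + 2 * sq / den).clamp_min(1.0 + eps))

def agent_contrastive_loss(anchor, positive, negatives, margin=2.0):
    # Pull two representations of the same agent together; push
    # representations of distinct agents at least `margin` apart.
    d_pos = poincare_dist(anchor, positive)          # anchor, positive: (D,)
    d_neg = poincare_dist(anchor.unsqueeze(0), negatives)  # negatives: (K, D)
    return d_pos + torch.relu(margin - d_neg).mean()
```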

AAAI Conference 2026 Conference Paper

FreeMem: Enhancing Consistency in Long Video Generation via Tuning-Free Memory

  • Jibin Peng
  • Di Lin
  • Zhecheng Xu
  • Haoran Lu
  • Ruonan Liu
  • Wuyuan Xie
  • Miaohui Wang
  • Lingyu Liang

Text-to-Video (T2V) generation has advanced greatly, yet maintaining consistency remains challenging, especially for tuning-free long video generation. We attribute the consistency problem to cumulative deviations that arise at three levels in long video generation: the random noise, lacking correlation, causes initial deviation between frames; discrepancy in semantic feature tokens between denoising network blocks gradually accumulates as the frame count grows, leading to greater deviations; and attention mechanisms struggle to capture global relationships across distant frames in long videos. To address these issues, we propose FreeMem, a tuning-free framework leveraging hierarchical memory update and injection: the noise memory stabilizes consistency by manipulating low- and high-frequency components in the initial noise space; the token memory combats inconsistency through adaptive fusion of historical and current semantic feature tokens between denoising network blocks; and the attention memory establishes a persistent cache to model long-range relationships within self-attention layers. Evaluated on VBench, FreeMem improves subject- and background-consistency metrics across various methods, offering a practical solution for low-cost, high-consistency long video generation.
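
As a rough sketch of the noise-memory idea (correlating initial noise across frames by frequency band), one could blend the low frequencies of a shared reference noise with the high frequencies of fresh per-frame noise. The cutoff value and circular masking scheme here are illustrative assumptions, not FreeMem's actual design.

```python
import torch

def blend_initial_noise(ref_noise, new_noise, cutoff=0.25):
    # Mix noises in Fourier space: shared low frequencies stabilize
    # cross-frame consistency, fresh high frequencies keep diversity.
    f_ref = torch.fft.fftshift(torch.fft.fft2(ref_noise), dim=(-2, -1))
    f_new = torch.fft.fftshift(torch.fft.fft2(new_noise), dim=(-2, -1))
    h, w = ref_noise.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    low = ((yy**2 + xx**2).sqrt() <= cutoff).float()  # centered low-pass mask
    mixed = f_ref * low + f_new * (1 - low)
    return torch.fft.ifft2(torch.fft.ifftshift(mixed, dim=(-2, -1))).real
```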

AAAI Conference 2026 Conference Paper

MAGIC: Mastering Physical Adversarial Generation in Context Through Collaborative LLM Agents

  • Yun Xing
  • Nhat Chung
  • Jie Zhang
  • Yue Cao
  • Ivor Tsang
  • Yang Liu
  • Lei Ma
  • Qing Guo

Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains non-trivial due to diverse real-world environmental influences. Existing approaches either struggle to generalize to dynamic environments or fail to achieve consistent physical attack performance. To address these challenges, we propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents that automatically understands the scene context at test time and generates adversarial patches through synergistic interaction of language and vision understanding. Specifically, MAGIC orchestrates three specialized LLM agents: the adv-patch generation agent masters the creation of deceptive patches via strategic prompt manipulation for text-to-image models; the adv-patch deployment agent ensures contextual coherence by determining optimal deployment strategies based on scene understanding; and the self-examination agent completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our approach in both digital and physical scenarios, i.e., nuImage and real-world scenes, where both statistical and visual results demonstrate that MAGIC is powerful and effective for attacking widely applied object detection systems, such as the YOLO and DETR series.

AAAI Conference 2026 Conference Paper

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

  • Qi Guo
  • Xiaojun Jia
  • Shanmin Pang
  • Simeng Qin
  • Lin Wang
  • Ju Jia
  • Yang Liu
  • Qing Guo

Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks—particularly adversarial patch attacks—which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models. Due to the more complex architectures and strong reasoning capabilities of MLLMs, these approaches perform poorly when transferred to MLLM-based systems. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms state-of-the-art (SOTA) methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.

JBHI Journal 2025 Journal Article

Adversarial Exposure Attack on Diabetic Retinopathy Imagery Grading

  • Yupeng Cheng
  • Qing Guo
  • Felix Juefei-Xu
  • Huazhu Fu
  • Shang-Wei Lin
  • Weisi Lin

Diabetic Retinopathy (DR) is a leading cause of vision loss around the world. To help diagnose it, numerous cutting-edge works have built powerful deep neural networks (DNNs) to automatically grade DR via retinal fundus images (RFIs). However, RFIs are commonly affected by camera exposure issues that may lead to incorrect grades, and mis-graded results can pose high risks of aggravating the condition. In this paper, we study this problem from the viewpoint of adversarial attacks. We identify and introduce a novel solution to an entirely new task, termed the adversarial exposure attack, which is able to produce naturally exposed images that mislead state-of-the-art DNNs. We validate our proposed method on a real-world public DR dataset with three DNNs, i.e., ResNet50, MobileNet, and EfficientNet, demonstrating that our method achieves high image quality and success rate in transferring the attacks. Our method reveals potential threats to DNN-based automatic DR grading and should benefit the development of exposure-robust DR grading methods in the future.
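
A toy version of the attack surface described here, assuming a single scalar exposure multiplier optimized in log space. The paper's actual attack produces spatially varying, natural-looking exposure; this simplification and all names below are ours.

```python
import torch
import torch.nn.functional as F

def scalar_exposure_attack(model, image, label, steps=50, lr=0.05):
    # Optimize one log-exposure value so the DR grader's loss on the true
    # grade rises, while the image stays a plausibly exposed fundus photo.
    log_e = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_e], lr=lr)
    for _ in range(steps):
        adv = (image * log_e.exp()).clamp(0, 1)   # image: (C, H, W) in [0, 1]
        loss = -F.cross_entropy(model(adv.unsqueeze(0)), torch.tensor([label]))
        opt.zero_grad(); loss.backward(); opt.step()
    return (image * log_e.exp()).clamp(0, 1).detach()
```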

NeurIPS Conference 2025 Conference Paper

AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant Adversarial Patches

  • Wenjun Ji
  • Yuxiang Fu
  • Luyang Ying
  • Deng-Ping Fan
  • Yuyi Wang
  • Ming-Ming Cheng
  • Ivor Tsang
  • Qing Guo

Cutting-edge works have demonstrated that text-to-image (T2I) diffusion models can generate adversarial patches that mislead state-of-the-art object detectors in the physical world, revealing detectors' vulnerabilities and risks. However, these methods neglect the T2I patches' attack effectiveness when observed from different views in the physical world (i.e., the angle robustness of T2I adversarial patches). In this paper, we study the angle robustness of T2I adversarial patches comprehensively, revealing their angle-robustness issues, demonstrating that texts significantly affect the angle robustness of generated patches, and showing that task-specific linguistic instructions fail to enhance it. Motivated by these studies, we introduce Angle-Robust Concept Learning (AngleRoCL), a simple and flexible approach that learns a generalizable concept (i.e., text embeddings in implementation) representing the capability of generating angle-robust patches. The learned concept can be incorporated into textual prompts and guides T2I models to generate patches whose attack effectiveness is inherently resistant to viewpoint variations. Through extensive simulation and physical-world experiments on five SOTA detectors across multiple views, we demonstrate that AngleRoCL significantly enhances the angle robustness of T2I adversarial patches compared to baseline methods. Our patches maintain high attack success rates even under challenging viewing conditions, with over 50% average relative improvement in attack effectiveness across multiple angles. This research advances the understanding of physically angle-robust patches and provides insights into the relationship between textual concepts and physical properties in T2I-generated content. We release our code at https://github.com/tsingqguo/anglerocl.

AAAI Conference 2025 Conference Paper

Concept Matching with Agent for Out-of-Distribution Detection

  • Yuxiao Lee
  • Xiaofeng Cao
  • Jingcai Guo
  • Wei Ye
  • Qing Guo
  • Yi Chang

The remarkable achievements of Large Language Models (LLMs) have captivated the attention of both academia and industry, transcending their initial role in dialogue generation. To expand the usage scenarios of LLMs, some works enhance the effectiveness and capabilities of the model by introducing external information, an approach called the agent paradigm. Based on this idea, we propose a new method that integrates the agent paradigm into the out-of-distribution (OOD) detection task, aiming to improve its robustness and adaptability. Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process. These agents function as dynamic observers and communication hubs, interacting with both in-distribution (ID) labels and data inputs to form vector triangle relationships. This triangular framework offers a more nuanced approach than the traditional binary relationship, allowing for better separation and identification of ID and OOD inputs. Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods in a diverse array of real-world scenarios.
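
One plausible, deliberately simplified reading of the scoring rule, assuming L2-normalized CLIP features: compare an input's similarity to the ID label embeddings against its similarity to the neutral agent prompts. The triangular formulation in the paper is richer than this sketch; the function name and the max-minus-max score are our assumptions.

```python
import torch

def cma_style_score(image_feat, id_text_feats, agent_feats):
    # image_feat: (D,); id_text_feats: (K, D); agent_feats: (M, D).
    # All features are assumed L2-normalized CLIP embeddings.
    sim_id = image_feat @ id_text_feats.T     # cosine similarity to ID labels
    sim_agent = image_feat @ agent_feats.T    # similarity to neutral agents
    # Higher score means more ID-like; thresholding gives the OOD call.
    return sim_id.max() - sim_agent.max()
```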

NeurIPS Conference 2025 Conference Paper

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

  • Yun Xing
  • Yue Cao
  • Nhat Chung
  • Jie Zhang
  • Ivor Tsang
  • Ming-Ming Cheng
  • Yang Liu
  • Lei Ma

Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, i.e., when deployed as patches, limiting their practical utility for stress-testing stereo depth estimation systems. In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances patch attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and the texture elements. Our generated adversarial patches can be inserted into arbitrary scenes and successfully attack advanced stereo depth estimation methods of different paradigms, i.e., RAFT-Stereo and STTR. Most critically, our patches can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating their practical relevance for security assessment of stereo systems. The code is officially released at https://github.com/WiWiN42/DepthVanish.
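
The structural finding, repeated textures separated by regular blank intervals to form a grid, is easy to reproduce mechanically. A sketch with assumed canvas conventions (float image, constant fill value):

```python
import numpy as np

def grid_patch(texture, reps=4, interval=8, fill=0.5):
    # Tile an optimized texture `reps` x `reps` times, leaving a regular
    # blank interval between repetitions -- the grid structure the paper
    # finds crucial for physical attack performance.
    th, tw = texture.shape[:2]
    ch, cw = th + interval, tw + interval
    shape = (reps * ch - interval, reps * cw - interval) + texture.shape[2:]
    canvas = np.full(shape, fill, dtype=texture.dtype)  # assumes float in [0, 1]
    for i in range(reps):
        for j in range(reps):
            canvas[i * ch:i * ch + th, j * cw:j * cw + tw] = texture
    return canvas
```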

AAMAS Conference 2025 Conference Paper

Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

  • Xingrui Yu
  • Zhenglin Wan
  • David Mark Bossens
  • Yueming LYU
  • Qing Guo
  • Ivor W. Tsang

Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail at this task because most of them are designed to learn one specific behavior, even from multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning, which bridge quality diversity optimization and imitation learning methods, are needed to solve this challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art imitation learning methods, achieving near-expert or beyond-expert quality diversity performance on challenging continuous control tasks derived from MuJoCo environments.

NeurIPS Conference 2025 Conference Paper

Mask Image Watermarking

  • Runyi Hu
  • Jie Zhang
  • Shiqian Zhao
  • Nils Lukas
  • Jiwei Li
  • Qing Guo
  • Han Qiu
  • Tianwei Zhang

We present MaskWM, a simple, efficient, and flexible framework for image watermarking. MaskWM has two variants: (1) MaskWM-D, which supports global watermark embedding, watermark localization, and local watermark extraction for applications such as tamper detection; and (2) MaskWM-ED, which focuses on local watermark embedding and extraction, offering enhanced robustness in small regions to support fine-grained image protection. MaskWM-D builds on the classical encoder-distortion layer-decoder training paradigm, introducing a simple masking mechanism during the decoding stage that enables both global and local watermark extraction. During training, the decoder is guided by various types of masks applied to watermarked images before extraction, helping it learn to localize watermarks and extract them from the corresponding local areas. MaskWM-ED extends this design by incorporating the mask into the encoding stage as well, guiding the encoder to embed the watermark in designated local regions, which improves robustness under regional attacks. Extensive experiments show that MaskWM achieves state-of-the-art performance in global and local watermark extraction, watermark localization, and multi-watermark embedding. It outperforms all existing baselines, including the recent leading local watermarking model WAM, while preserving high visual quality of the watermarked images. MaskWM is also highly efficient and adaptable: it requires only 20 hours of training on a single A6000 GPU, a 15× gain in computational efficiency over WAM, and by simply adjusting the distortion layer it can be quickly fine-tuned to meet varying robustness requirements.
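
The masking mechanism, as described, is simple enough to sketch. This is a schematic of the decoding step under our own naming, not the released MaskWM code:

```python
import torch

def masked_decode(decoder, watermarked, mask):
    # MaskWM-D style extraction: zero out everything outside the queried
    # region before decoding, so the decoder learns to localize and read
    # the watermark from that area alone. An all-ones mask recovers
    # global extraction.
    return decoder(watermarked * mask)  # mask: 1 inside the queried region

# Training samples masks of varied shape and size, so one decoder serves
# tamper localization, local extraction, and global extraction.
```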

NeurIPS Conference 2025 Conference Paper

Open-Vocabulary Part Segmentation via Progressive and Boundary-Aware Strategy

  • Xinlong Li
  • Di Lin
  • Shaoyiyi Gao
  • Jiaxin Li
  • Ruonan Liu
  • Qing Guo

Open-vocabulary part segmentation (OVPS) struggles with structurally connected boundaries due to the inherent conflict between continuous image features and discrete classification mechanisms. To address this, we propose PBAPS, a novel training-free framework specifically designed for OVPS. PBAPS leverages structural knowledge of object-part relationships to guide a progressive segmentation from objects to fine-grained parts. To further improve accuracy at challenging boundaries, we introduce a Boundary-Aware Refinement (BAR) module that identifies ambiguous boundary regions by quantifying classification uncertainty, enhances the discriminative features of these ambiguous regions using high-confidence context, and adaptively refines part prototypes to better align with the specific image. Experiments on Pascal-Part-116, ADE20K-Part-234, and PartImageNet demonstrate that PBAPS significantly outperforms state-of-the-art methods, achieving 46.35% mIoU and 34.46% bIoU on Pascal-Part-116. Our code is available at https://github.com/TJU-IDVLab/PBAPS.

NeurIPS Conference 2024 Conference Paper

ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation

  • Yizhuo Ma
  • Shanmin Pang
  • Qi Guo
  • Tianyu Wei
  • Qing Guo

Commercial text-to-image deep generation models (e.g., DALL·E) can produce high-quality images from input language descriptions. These models incorporate a black-box safety filter to prevent the generation of unsafe or unethical content, such as violent, criminal, or hateful imagery. Recent jailbreaking methods generate adversarial prompts capable of bypassing safety filters and producing unsafe content, exposing vulnerabilities in influential commercial models. However, once such adversarial prompts are identified, the safety filter can be updated to prevent the generation of unsafe images. In this work, we propose an effective, simple, and difficult-to-detect jailbreaking solution: generate safe content initially with normal text prompts, then edit the generations to embed unsafe content. The intuition behind this idea is that the deep generation model cannot reject safe generation with normal text prompts, while the editing models focus on modifying local regions of images and do not involve a safety strategy. However, implementing such a solution is non-trivial, and we must overcome several challenges: how to automatically determine normal prompts to substitute for the unsafe ones, and how to effectively perform editable replacement and naturally generate unsafe content. We therefore propose collaborative generation and editing for jailbreaking text-to-image deep generation (ColJailBreak), which comprises three key components: adaptive normal safe substitution, inpainting-driven injection of unsafe content, and contrastive language-image-guided collaborative optimization. We validate our method on three datasets and compare it to two baseline methods. Our method can generate unsafe content through two commercial deep generation models, including GPT-4 and DALL·E 2.

NeurIPS Conference 2024 Conference Paper

Geometry Awakening: Cross-Geometry Learning Exhibits Superiority over Individual Structures

  • Yadong Sun
  • Xiaofeng Cao
  • Yu Wang
  • Wei Ye
  • Jingcai Guo
  • Qing Guo

Recent research has underscored the efficacy of Graph Neural Networks (GNNs) in modeling diverse geometric structures within graph data. However, real-world graphs typically exhibit geometrically heterogeneous characteristics, rendering the confinement to a single geometric paradigm insufficient for capturing their intricate structural complexities. To address this limitation, we examine the performance of GNNs across various geometries through the lens of knowledge distillation (KD) and introduce a novel cross-geometric framework. This framework encodes graphs by integrating both Euclidean and hyperbolic geometries in a space-mixing fashion. Our approach employs multiple teacher models, each generating hint embeddings that encapsulate distinct geometric properties. We then implement a structure-wise knowledge transfer module that optimally leverages these embeddings within their respective geometric contexts, thereby enhancing the training efficacy of the student model. Additionally, our framework incorporates a geometric optimization network designed to bridge the distributional disparities among these embeddings. Experimental results demonstrate that our model-agnostic framework more effectively captures topological graph knowledge, resulting in superior performance of the student models when compared to traditional KD methodologies.

AAAI Conference 2024 Conference Paper

Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models

  • Yihao Huang
  • Felix Juefei-Xu
  • Qing Guo
  • Jie Zhang
  • Yutong Wu
  • Ming Hu
  • Tianlin Li
  • Geguang Pu

Although recent personalization methods have democratized high-resolution image synthesis by enabling swift concept acquisition with minimal examples and lightweight computation, they also present an exploitable avenue for highly accessible backdoor attacks. This paper investigates a critical and unexplored aspect of text-to-image (T2I) diffusion models: their potential vulnerability to backdoor attacks via personalization. By studying the prompt processing of popular personalization methods (epitomized by Textual Inversion and DreamBooth), we devise dedicated personalization-based backdoor attacks according to the different ways these methods deal with unseen tokens, dividing them into two families: nouveau-token and legacy-token backdoor attacks. In comparison to conventional backdoor attacks that involve fine-tuning the entire text-to-image diffusion model, our proposed personalization-based backdoor attack method enables more tailored, efficient, and few-shot attacks. Through a comprehensive empirical study, we endorse the nouveau-token backdoor attack for its impressive effectiveness, stealthiness, and integrity, markedly outperforming the legacy-token backdoor attack.

NeurIPS Conference 2024 Conference Paper

Sim2Real-Fire: A Multi-modal Simulation Dataset for Forecast and Backtracking of Real-world Forest Fire

  • Yanzhi Li
  • Keqiu Li
  • Guohui Li
  • Zumin Wang
  • Changqing Ji
  • Lubo Wang
  • Die Zuo
  • Qing Guo

The latest research on wildfire forecasting and backtracking has adopted AI models, which require a large amount of data from wildfire scenarios to capture fire spread patterns. This paper explores using cost-effective simulated wildfire scenarios to train AI models and applying them to the analysis of real-world wildfires. This solution requires AI models to minimize the Sim2Real gap, a brand-new topic in the fire-spread analysis research community. To investigate the possibility of minimizing the Sim2Real gap, we collect the Sim2Real-Fire dataset, which contains 1M simulated scenarios with multi-modal environmental information for training AI models, and we prepare 1K real-world wildfire scenarios for testing them. We also propose a deep transformer, S2R-FireTr, which excels at considering multi-modal environmental information for forecasting and backtracking wildfires. S2R-FireTr surpasses state-of-the-art methods in real-world wildfire scenarios.

AAAI Conference 2024 Short Paper

Spatial-Temporal Augmentation for Crime Prediction (Student Abstract)

  • Hongzhu Fu
  • Fan Zhou
  • Qing Guo
  • Qiang Gao

Crime prediction stands as a pivotal concern within the realm of urban management due to its potential threats to public safety. While prior research has predominantly focused on unraveling the intricate dependencies among urban regions and temporal dynamics, the challenges posed by the scarcity and uncertainty of historical crime data have not been thoroughly investigated. This study introduces an innovative spatial-temporal augmented learning framework for crime prediction, namely STAug. In STAug, we devise CrimeMix to improve generalization ability. Furthermore, we harness spatial-temporal aggregation to capture and incorporate multiple correlations covering the temporal, spatial, and crime-type aspects. Experiments on two real-world datasets underscore the superiority of STAug over several baselines.
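
The abstract does not spell CrimeMix out, but the name suggests a mixup-style recipe over sparse crime histories. A generic sketch of that recipe (our assumption, not the paper's exact formulation):

```python
import numpy as np

def crime_mix(x1, y1, x2, y2, alpha=0.2):
    # Mixup-style augmentation: convexly blend two region-time crime
    # tensors and their targets to densify sparse historical data.
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```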

NeurIPS Conference 2024 Conference Paper

Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion

  • Lubo Wang
  • Di Lin
  • Kairui Yang
  • Ruonan Liu
  • Qing Guo
  • Wuyuan Xie
  • Miaohui Wang
  • Lingyu Liang

Semantic scene completion is a difficult task that involves completing the geometry and semantics of a scene from point clouds in a large-scale environment. Many current methods use 3D/2D convolutions or attention mechanisms, but these have limitations in directly constructing geometry and accurately propagating features from related voxels, the completion likely fails while propagating features in a single pass without considering multiple potential pathways. And they are generally only suitable for static scenes and struggle to handle dynamic aspects. This paper introduces Voxel Proposal Network (VPNet) that completes scenes from 3D and Bird's-Eye-View (BEV) perspectives. It includes Confident Voxel Proposal based on voxel-wise coordinates to propose confident voxels with high reliability for completion. This method reconstructs the scene geometry and implicitly models the uncertainty of voxel-wise semantic labels by presenting multiple possibilities for voxels. VPNet employs Multi-Frame Knowledge Distillation based on the point clouds of multiple adjacent frames to accurately predict the voxel-wise labels by condensing various possibilities of voxel relationships. VPNet has shown superior performance and achieved state-of-the-art results on the SemanticKITTI and SemanticPOSS datasets.

AAAI Conference 2023 Conference Paper

Background-Mixed Augmentation for Weakly Supervised Change Detection

  • Rui Huang
  • Ruofei Wang
  • Qing Guo
  • Jieda Wei
  • Yuxiang Zhang
  • Wei Fan
  • Yang Liu

Change detection (CD) aims to decouple object changes (i.e., objects missing or appearing) from background changes (i.e., environment variations such as light and season changes) in two images captured in the same scene over a long time span, with critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to generalize well to unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation effort. In this work, as a first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that needs only image-level labels. Different from general augmentation techniques for classification, we propose background-mixed augmentation, specifically designed for change detection: examples are augmented under the guidance of a set of background-changing images, letting deep CD models see diverse environment variations. Moreover, we propose an augmented & real data consistency loss that significantly encourages generalization. Our method, as a general framework, can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments on two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.
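
Schematically, the two proposed pieces could look like the sketch below; the blending weight, function names, and the MSE form of the consistency term are illustrative assumptions, not the released BgMix code.

```python
import torch
import torch.nn.functional as F

def bg_mix(img_pair, bg_pair, alpha=0.5):
    # Blend an input image pair with a background-changing image pair so
    # the detector sees the same object change under new environments.
    return tuple(alpha * im + (1 - alpha) * bg
                 for im, bg in zip(img_pair, bg_pair))

def consistency_loss(pred_real, pred_aug):
    # Augmented & real data consistency: change predictions on the
    # augmented pair should match those on the original pair.
    return F.mse_loss(pred_aug, pred_real.detach())
```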

NeurIPS Conference 2023 Conference Paper

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

  • Yazhou Zhang
  • Yang Yu
  • Qing Guo
  • Benyou Wang
  • Dongming Zhao
  • Sagar Uprety
  • Dawei Song
  • Qiuchi Li

Human communication has a multi-modal and multi-affection nature. The inter-relatedness of different emotions and sentiments poses a challenge to jointly detecting multiple human affections with multi-modal clues. Recent advances in this field employ multi-task learning paradigms to render the inter-relatedness across tasks, but the scarcity of publicly available resources limits the potential of such work. To fill this gap, we build the first Chinese Multi-modal Multi-Affection conversation (CMMA) dataset, which contains 3,000 multi-party conversations and 21,795 multi-modal utterances collected from various styles of TV series. CMMA contains a wide variety of affection labels, including sentiment, emotion, sarcasm, and humor, as well as novel inter-correlation values between certain pairs of tasks. Moreover, it provides topic and speaker information for each conversation, which promotes better modeling of conversational context. On this dataset, we empirically analyze the influence of different data modalities and conversational contexts on different affection analysis tasks, and we exhibit the practical benefit of inter-task correlations. The full dataset will be publicly available for research at https://github.com/annoymity2022/Chinese-Dataset.

IJCAI Conference 2023 Conference Paper

Fairness via Group Contribution Matching

  • Tianlin Li
  • Zhiming Li
  • Anran Li
  • Mengnan Du
  • Aishan Liu
  • Qing Guo
  • Guozhu Meng
  • Yang Liu

Fairness issues in deep learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we analyze the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) during training to understand how such bias develops. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of unfairness. One way to balance the contribution of each subgroup is oversampling, which ensures that an equal number of samples is drawn from each subgroup during each training iteration. However, we find that even with a balanced number of samples, the contribution of each group remains unequal, so unfairness persists under the oversampling strategy. To address these issues, we propose a simple but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that GCM effectively improves fairness and significantly outperforms other methods.
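
The gradient-based contribution metric can be sketched directly from the description: measure, per subgroup, how hard that subgroup's loss gradient pushes on the shared parameters. The L2-norm choice and naming below are our assumptions.

```python
import torch

def subgroup_contributions(model, loss_fn, batches_by_group):
    # For each subgroup (same sensitive attribute), compute the L2 norm of
    # its loss gradient w.r.t. the model parameters; unequal norms signal
    # unequal training contributions, which GCM then seeks to match.
    contrib = {}
    for group, (x, y) in batches_by_group.items():
        model.zero_grad()
        loss_fn(model(x), y).backward()
        contrib[group] = torch.sqrt(sum(p.grad.pow(2).sum()
                                        for p in model.parameters()
                                        if p.grad is not None))
    return contrib
```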

AIIM Journal 2022 Journal Article

Attribute-aware interpretation learning for thyroid ultrasound diagnosis

  • Ming Kong
  • Qing Guo
  • Shuowen Zhou
  • Mengze Li
  • Kun Kuang
  • Zhengxing Huang
  • Fei Wu
  • Xiaohong Chen

Thyroid nodule diagnosis from ultrasound images is a critical computer-aided diagnosis task. Previous works tried to imitate the doctor's diagnostic logic by considering key attributes to improve diagnosis performance and explain the conclusion. However, their clinical feasibility remains ambiguous because they ignore the correlation between attribute features and global characteristics and lack a clinical evaluation of the effectiveness of their result interpretations. Following the common logic of ultrasonic investigation, we design a novel Attribute-Aware Interpretation Learning (AAIL) model consisting of an attribute-property discovery module and an attribute-global feature fusion module. Adequate result interpretation ensures the reliability and transparency of diagnostic conclusions, including the visualization of attribute features and of the relationship between attributes and the global feature. Extensive experiments on a practical dataset demonstrate the model's effectiveness, and an innovative human-computer collaborative experiment demonstrates that the interpretations provide auxiliary diagnostic ability that can benefit professional doctors.

NeurIPS Conference 2022 Conference Paper

Generative Status Estimation and Information Decoupling for Image Rain Removal

  • Di Lin
  • Xin Wang
  • Jia Shen
  • Renjie Zhang
  • Ruonan Liu
  • Miaohui Wang
  • Wuyuan Xie
  • Qing Guo

Image rain removal requires accurate separation between the pixels of rain streaks and those of object textures. But the confusing appearances of rain and objects lead to the misclassification of pixels, leaving rain streaks or losing object details in the result. In this paper, we propose SEIDNet, equipped with generative Status Estimation and Information Decoupling for rain removal. In the status estimation, we embed pixel-wise statuses into a status space, where each status indicates whether a pixel belongs to rain or an object. The status space allows sampling multiple statuses for a pixel, thus capturing confusing rain or objects. In the information decoupling, we respect the pixel-wise statuses, decoupling the appearance information of rain and object from the pixel. Based on the decoupled information, we construct a kernel space, where multiple kernels are sampled for each pixel to remove the rain and recover the object appearance. We evaluate SEIDNet on public datasets, achieving state-of-the-art performance in image rain removal. The experimental results also demonstrate the generalization of SEIDNet, which can be easily extended to achieve state-of-the-art performance on other image restoration tasks (e.g., snow, haze, and shadow removal).

NeurIPS Conference 2022 Conference Paper

Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

  • Qing Guo
  • Junya Chen
  • Dong Wang
  • Yuewei Yang
  • Xinwei Deng
  • Jing Huang
  • Larry Carin
  • Fan Li

Successful applications of InfoNCE (Information Noise-Contrastive Estimation) and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds from the lens of unnormalized statistical modeling and convex optimization. Our investigation yields a new unified theoretical framework encompassing popular variational MI bounds, and leads to a novel, simple, and powerful contrastive MI estimator we name FLO. Theoretically, we show that the FLO estimator is tight, and it converges under stochastic gradient descent. Empirically, the proposed FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using extensive benchmarks, and we further inspire the community with novel applications in meta-learning. Our presentation underscores the foundational importance of variational MI estimation in data-efficient learning.
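
For context, the large-batch dependence criticized here is visible directly in the InfoNCE bound: with a critic $f$ and a batch of $K$ pairs, the estimator is a lower bound on mutual information that can never exceed $\log K$, so tightness demands large batches.

```latex
\hat{I}_{\mathrm{NCE}}
  = \mathbb{E}\!\left[\frac{1}{K}\sum_{i=1}^{K}
      \log \frac{e^{f(x_i, y_i)}}{\frac{1}{K}\sum_{j=1}^{K} e^{f(x_i, y_j)}}\right],
\qquad
\hat{I}_{\mathrm{NCE}} \le I(X;Y),
\quad
\hat{I}_{\mathrm{NCE}} \le \log K .
```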

IJCAI Conference 2021 Conference Paper

AVA: Adversarial Vignetting Attack against Visual Recognition

  • Binyu Tian
  • Felix Juefei-Xu
  • Qing Guo
  • Xiaofei Xie
  • Xiaohong Li
  • Yang Liu

Vignetting is an inherent imaging phenomenon within almost all optical systems, appearing as a radial intensity darkening toward the corners of an image. Since it is a common effect in photography and usually appears as a slight intensity variation, people usually regard it as part of a photo and do not even want to post-process it away. Exploiting this natural advantage, in this work we study vignetting from a new viewpoint, i.e., the adversarial vignetting attack (AVA), which aims to embed intentionally misleading information into the vignetting and produce a natural adversarial example without noise patterns. Such an example can fool state-of-the-art deep convolutional neural networks (CNNs) yet remains imperceptible to humans. To this end, we first propose the radial-isotropic adversarial vignetting attack (RI-AVA), based on the physical model of vignetting, where the physical parameters (e.g., illumination factor and focal length) are tuned under the guidance of target CNN models. To achieve higher transferability across different CNNs, we further propose the radial-anisotropic adversarial vignetting attack (RA-AVA), which allows the effective regions of vignetting to be radial-anisotropic and shape-free. Moreover, we propose a geometry-aware level-set optimization method to solve for the adversarial vignetting regions and physical parameters jointly. We validate the proposed methods on three popular datasets, i.e., DEV, CIFAR10, and Tiny ImageNet, by attacking four CNNs, i.e., ResNet50, EfficientNet-B0, DenseNet121, and MobileNet-V2, demonstrating the advantages of our methods over baseline methods in both transferability and image quality.
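
The physical model being tuned is the classical cos^4 vignetting falloff. A minimal rendering of it, with parameter names assumed, and the paper's attack adversarially tuning exactly these kinds of parameters:

```python
import numpy as np

def apply_vignetting(image, focal=1.5, illum=1.0):
    # Classical cos^4 vignetting: intensity falls off as
    # illum * cos^4(arctan(r / focal)) toward the corners, with r the
    # normalized radial distance from the image center.
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    falloff = illum * np.cos(np.arctan(r / focal)) ** 4
    return image * falloff[..., None]  # assumes an (H, W, C) float image
```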

AAAI Conference 2021 Conference Paper

EfficientDeRain: Learning Pixel-wise Dilation Filtering for High-Efficiency Single-Image Deraining

  • Qing Guo
  • Jingyang Sun
  • Felix Juefei-Xu
  • Lei Ma
  • Xiaofei Xie
  • Wei Feng
  • Yang Liu
  • Jianjun Zhao

Single-image deraining is rather challenging due to the unknown rain model. Existing methods often make specific assumptions about the rain model, which can hardly cover the many diverse circumstances of the real world, compelling them to employ complex optimization or progressive refinement. This, however, significantly affects these methods' efficiency and effectiveness in many efficiency-critical applications. To fill this gap, we regard single-image deraining as a general image-enhancing problem and propose a model-free deraining method, EfficientDeRain, which is able to process a rainy image within 10 ms (around 6 ms on average), over 80 times faster than the state-of-the-art method (RCDNet), while achieving similar deraining effects. We first propose novel pixel-wise dilation filtering: a rainy image is filtered with pixel-wise kernels estimated by a kernel prediction network, which efficiently predicts suitable multi-scale kernels for each pixel. Then, to close the gap between synthetic and real data, we further propose an effective data augmentation method, RainMix, that helps train the network to handle real rainy images. We perform a comprehensive evaluation on both synthetic and real-world rainy datasets to demonstrate the effectiveness and efficiency of our method. We release the model and code at https://github.com/tsingqguo/efficientderain.git.
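
The core operation, applying a predicted k x k kernel at every pixel with a chosen dilation, can be written compactly with unfold. The shapes follow the description above; everything else is a sketch, not the released code.

```python
import torch
import torch.nn.functional as F

def pixelwise_dilation_filter(image, kernels, dilation=1):
    # image: (B, C, H, W); kernels: (B, k*k, H, W), e.g. softmax-normalized
    # outputs of a kernel prediction network.
    b, c, h, w = image.shape
    k = int(kernels.shape[1] ** 0.5)
    pad = dilation * (k // 2)
    patches = F.unfold(image, k, dilation=dilation, padding=pad)  # (B, C*k*k, H*W)
    patches = patches.view(b, c, k * k, h, w)
    # Weighted sum of each pixel's (dilated) neighborhood with its own kernel.
    return (patches * kernels.unsqueeze(1)).sum(dim=2)
```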

AAAI Conference 2020 Conference Paper

An Attentional Recurrent Neural Network for Personalized Next Location Recommendation

  • Qing Guo
  • Zhu Sun
  • Jie Zhang
  • Yin-Leng Theng

Most existing studies on next location recommendation propose to model the sequential regularity of check-in sequences, but suffer from the severe data sparsity issue where most locations have fewer than five following locations. To this end, we propose an Attentional Recurrent Neural Network (ARNN) to jointly model both the sequential regularity and transition regularities of similar locations (neighbors). In particular, we first design a meta-path based random walk over a novel knowledge graph to discover location neighbors based on heterogeneous factors. A recurrent neural network is then adopted to model the sequential regularity by capturing various contexts that govern user mobility. Meanwhile, the transition regularities of the discovered neighbors are integrated via the attention mechanism, which seamlessly cooperates with the sequential regularity as a unified recurrent framework. Experimental results on multiple real-world datasets demonstrate that ARNN outperforms state-of-the-art methods.

NeurIPS Conference 2020 Conference Paper

Watch out! Motion is Blurring the Vision of Your Deep Neural Networks

  • Qing Guo
  • Felix Juefei-Xu
  • Xiaofei Xie
  • Lei Ma
  • Jian Wang
  • Bing Yu
  • Wei Feng
  • Yang Liu

State-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples with additive, random-noise-like perturbations. While such examples are hardly found in the physical world, the image blurring effect caused by object motion commonly occurs in practice, making its study highly important, especially for widely adopted real-time image processing tasks (e.g., object detection, tracking). In this paper, we take the first step toward comprehensively investigating the potential hazards that motion-induced blur poses to DNNs. We propose a novel adversarial attack method that can generate visually natural motion-blurred adversarial examples, named the motion-based adversarial blur attack (ABBA). To this end, we first formulate the kernel-prediction-based attack, where an input image is convolved with kernels in a pixel-wise way and misclassification is achieved by tuning the kernel weights. To generate visually more natural and plausible examples, we further propose saliency-regularized adversarial kernel prediction, where the salient region serves as a moving object and the predicted kernel is regularized to achieve natural visual effects. The attack is further enhanced by adaptively tuning the translations of object and background. A comprehensive evaluation on the NeurIPS'17 adversarial competition dataset demonstrates the effectiveness of ABBA across various kernel sizes, translations, and regions. An in-depth study further confirms that our method penetrates state-of-the-art GAN-based deblurring mechanisms more effectively than other blurring methods. We release the code at https://github.com/tsingqguo/ABBA.
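
The kernel-prediction formulation is the same per-pixel convolution sketched under the EfficientDeRain entry above, but optimized in the opposite direction. One illustrative signed-gradient ascent step, a simplification of ours rather than ABBA itself:

```python
import torch
import torch.nn.functional as F

def blur_attack_step(model, image, kernel_logits, label, lr=0.05):
    # Blur the image with softmax-normalized per-pixel kernels, then take
    # one ascent step on the classification loss w.r.t. the kernel weights.
    # Reuses pixelwise_dilation_filter from the EfficientDeRain sketch.
    kernel_logits = kernel_logits.detach().requires_grad_(True)
    blurred = pixelwise_dilation_filter(image, kernel_logits.softmax(dim=1))
    loss = F.cross_entropy(model(blurred), label)  # label: (B,) class indices
    loss.backward()
    return (kernel_logits + lr * kernel_logits.grad.sign()).detach()
```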

YNIMG Journal 2014 Journal Article

A systematic review of the reporting of sample size calculations and corresponding data components in observational functional magnetic resonance imaging studies

  • Qing Guo
  • Lehana Thabane
  • Geoffrey Hall
  • Margaret McKinnon
  • Ron Goeree
  • Eleanor Pullenayegum

Anecdotal evidence suggests that functional magnetic resonance imaging (fMRI) studies rarely consider statistical power when setting a sample size. This raises concerns since undersized studies may fail to detect effects of interest and encourage data dredging. Although sample size methodology in this field exists, implementation requires specifications of estimated effect size and variance components. We therefore systematically evaluated how often estimates of effect size and variance components were reported in observational fMRI studies involving clinical human participants published in six leading journals between January 2010 and December 2011. A random sample of 100 eligible articles was included in data extraction and analyses. Two independent reviewers assessed the reporting of sample size calculations and the data components required to perform the calculations in the fMRI literature. One article (1%) reported sample size calculations. The reporting of parameter estimates for effect size (8%), between-subject variance (4%), within-subject variance (1%) and temporal autocorrelation matrix (0%) was uncommon. Three articles (3%) reported Cohen's d or F effect sizes. The majority (83%) reported peak or average t, z or F statistics. The inter-rater agreement was very good, with a prevalence-adjusted bias-adjusted kappa (PABAK) value greater than 0.88. We concluded that sample size calculations were seldom reported in fMRI studies. Moreover, omission of parameter estimates for effect size, between- and within-subject variances, and temporal autocorrelation matrix could limit investigators' ability to perform power analyses for new studies. We suggest routine reporting of these quantities, and recommend strategies for reducing bias in their reported values.
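
The calculation the review finds missing is routine once effect size is reported. For example, with statsmodels (the effect size and thresholds below are placeholders, not values from the review):

```python
from statsmodels.stats.power import TTestPower

# Solve for the subjects needed to detect a Cohen's d of 0.8 with a
# one-sample t-test at 80% power and a strict alpha of 0.001.
n = TTestPower().solve_power(effect_size=0.8, alpha=0.001, power=0.8,
                             alternative="two-sided")
print(f"required sample size: {n:.1f} subjects")
```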

AAMAS Conference 2011 Conference Paper

Modeling Bounded Rationality of Agents During Interactions

  • Qing Guo
  • Piotr Gmytrasiewicz

In this paper, we propose that bounded rationality of another agent be modeled as errors the agent is making while deciding on its action. We are motivated by the work on quantal response equilibria in behavioral game theory, which uses Nash equilibria as the solution concept. In contrast, we use decision-theoretic maximization of expected utility. Quantal response assumes that a decision maker is approximately rational, i.e., is maximizing its expected utility but with an error rate characterized by a single error parameter. Another agent's error rate may be unknown and needs to be estimated during an interaction. We show that this error rate can be estimated using Bayesian update of a suitable conjugate prior, and that it has a sufficient statistic of fixed dimension under strong simplifying assumptions. However, if the simplifying assumptions are relaxed, the quantal response does not admit a finite-dimensional sufficient statistic, and a more complex update is needed.
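
The quantal response model referenced here has a standard closed form: an agent chooses action a with probability proportional to exp(lambda * EU(a)), where lambda is the single error parameter. A direct sketch:

```python
import numpy as np

def quantal_response(expected_utilities, lam):
    # P(a) proportional to exp(lambda * EU(a)): lambda -> infinity recovers
    # exact expected-utility maximization; lambda = 0 is uniform random play.
    z = lam * (expected_utilities - np.max(expected_utilities))  # stabilized
    p = np.exp(z)
    return p / p.sum()
```

Estimating lambda from another agent's observed actions is then the Bayesian-update problem the paper analyzes.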