Author name cluster

Jiawei Du

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

2 author rows

AAAI Conference 2026 Conference Paper

Diffusion Reconstruction-based Data Likelihood Estimation for Core-Set Selection

Mingyang Chen
Jiawei Du
Bo Huang
Yi Wang
Xiaobo Zhang
Wei Wang

Existing core-set selection methods predominantly rely on heuristic scoring signals such as training dynamics or model uncertainty, lacking explicit modeling of data likelihood. This omission may hinder the constructed subset from capturing subtle yet critical distributional structures that underpin effective model training. In this work, we propose a novel, theoretically grounded approach that leverages diffusion models to estimate data likelihood via reconstruction deviation induced by partial reverse denoising. Specifically, we establish a formal connection between reconstruction error and data likelihood, grounded in the Evidence Lower Bound (ELBO) of Markovian diffusion processes, thereby enabling a principled, distribution-aware scoring criterion for data selection. Complementarily, we introduce an efficient information-theoretic method to identify the optimal reconstruction timestep, ensuring that the deviation provides a reliable signal indicative of underlying data likelihood. Extensive experiments on ImageNet demonstrate that reconstruction deviation offers an effective scoring criterion, consistently outperforming existing baselines across selection ratios, and closely matching full-data training using only 50% of the data. Further analysis shows that the likelihood-informed nature of our score reveals informative insights in data selection, shedding light on the interplay between data distributional characteristics and model learning preferences.

PDF Details DOI

TMLR Journal 2025 Journal Article

AutoAnnotator: A Collaborative Annotation Framework for Large and Small Language Models

Yao Lu
Ji Zhaiyuan
Jiawei Du
Yu Shanqing
Qi Xuan
Joey Tianyi Zhou

Although the annotation paradigm based on Large Language Models (LLMs) has made significant breakthroughs in recent years, its actual deployment still has two core bottlenecks: first, the cost of calling commercial APIs in large-scale annotation is very expensive; second, in scenarios that require fine-grained semantic understanding, such as sentiment classification and toxicity classification, the annotation accuracy of LLMs is even lower than that of Small Language Models (SLMs) dedicated to this field. To address these problems, we propose a new paradigm of multi-model cooperative annotation and design a fully automatic annotation framework AutoAnnotator based on this. Specifically, AutoAnnotator consists of two layers. The upper-level meta-controller layer uses the generation and reasoning capabilities of LLMs to select SLMs for annotation, automatically generate annotation code and verify difficult samples; the lower-level task‑specialist layer consists of multiple SLMs that perform annotation through multi-model voting. In addition, we use the difficult samples obtained by the secondary review of the meta-controller layer as the reinforcement learning set and fine-tune the SLMs in stages through a continual learning strategy, thereby improving the generalization of SLMs. Extensive experiments show that AutoAnnotator outperforms existing open-source/API LLMs in zero-shot, one-shot, CoT, and majority voting settings. Notably, AutoAnnotator reduces the annotation cost by 74.15% compared to directly annotating with GPT-3.5-turbo, while still improving the accuracy by 6.21%. The code is available in https://github.com/Zhaiyuan-Ji/AutoAnnotator.

PDF Details

AAAI Conference 2025 Conference Paper

Balancing Privacy and Performance: A Many-in-One Approach for Image Anonymization

Xuemei Jia
Jiawei Du
Hui Wei
Ruinian Xue
Zheng Wang
Hongyuan Zhu
Jun Chen

The effective utilization of data through Deep Neural Networks (DNNs) has profoundly influenced various aspects of society. The growing demand for high-quality, particularly personalized, data has spurred research efforts to prevent data leakage and protect privacy in recent years. Early privacy-preserving methods primarily relied on instance-wise modifications, such as erasing or obfuscating essential features for de-identification. However, this approach highlights an inherent trade-off: minimal modification offers insufficient privacy protection, while excessive modification significantly degrades task performance. In this paper, we propose a novel Recombining for Obfuscation (FRO) approach to address this trade-off. Unlike existing methods that generate one anonymized instance by perturbing the original data on a one-to-one basis, our FRO approach generates an anonymized instance by reassembling mixed ID-related features from multiple original data sources on a many-in-one basis. Instead of introducing additional noise for de-identification, our approach leverages the existing non-polluted features from other instances to anonymize data. Extensive experiments on identity identification tasks demonstrate that FRO outperforms previous state-of-the-art methods, not only in utility performance but also in visual anonymization.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation

Xin Zhang
Ziruo Zhang
Jiawei Du
Zuozhu Liu
Joey Tianyi Zhou

Multimodal Dataset Distillation (MDD) seeks to condense large-scale image-text datasets into compact surrogates while retaining their effectiveness for cross-modal learning. Despite recent progress, existing MDD approaches often suffer from ***Modality Collapse***, characterized by over-concentrated intra-modal representations and enlarged distributional gap across modalities. In this paper, at the first time, we identify this issue as stemming from a fundamental conflict between the over-compression behavior inherent in dataset distillation and the cross-modal supervision imposed by contrastive objectives. To alleviate modality collapse, we introduce **RepBlend**, a novel MDD framework that weakens overdominant cross-modal supervision via representation blending, thereby significantly enhancing intra-modal diversity. Additionally, we observe that current MDD methods impose asymmetric supervision across modalities, resulting in biased optimization. To address this, we propose symmetric projection trajectory matching, which synchronizes the optimization dynamics using modality-specific projection heads, thereby promoting balanced supervision and enhancing cross-modal alignment. Experiments on Flickr-30K and MS-COCO show that RepBlend consistently outperforms prior state-of-the-art MDD methods, achieving significant gains in retrieval performance (e. g. , +9. 4 IR@10, +6. 3 TR@10 under the 100-pair setting) and offering up to 6. 7$\times$ distillation speedup.

PDF Details

ICLR Conference 2025 Conference Paper

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

Xin Zhang 0092
Jiawei Du
Ping Liu 0004
Joey Tianyi Zhou

Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation. In practice, INFER demonstrates state-of-the-art performance across benchmark datasets. For instance, in the $\texttt{ipc} = 50$ setting on ImageNet-1k with the same compression level, it outperforms SRe2L by 34.5\% using ResNet18. Codes are available at https://github.com/zhangxin-xd/UFC.

Details

IROS Conference 2025 Conference Paper

DPGP: A Hybrid 2D-3D Dual Path Potential Ghost Probe Zone Prediction Framework for Safe Autonomous Driving

Weiming Qu
Jiawei Du
Shenghai Yuan 0001
Jia Wang
Yang Sun
Shengyi Liu
Yuanhao Zhu
Jiayi Rao

Modern robots must coexist with humans in dense urban environments. A key challenge is the ghost probe problem, where pedestrians or objects unexpectedly rush into traffic paths. This issue affects both autonomous vehicles and human drivers. Existing works propose vehicle-to-everything (V2X) strategies and non-line-of-sight (NLOS) imaging for ghost probe zone detection. However, most require high computational power or specialized hardware, limiting real-world feasibility. Additionally, many methods do not explicitly address this issue. To tackle this, we propose DPGP, a hybrid 2D-3D fusion framework for ghost probe zone prediction using only a monocular camera during training and inference. With unsupervised depth prediction, we observe ghost probe zones align with depth discontinuities, but different depth representations offer varying robustness. To exploit this, we fuse multiple feature embeddings to improve prediction. To validate our approach, we created a 12K-image dataset annotated with ghost probe zones, carefully sourced and cross-checked for accuracy. Experimental results show our framework outperforms existing methods while remaining cost-effective. To our knowledge, this is the first work extending ghost probe zone prediction beyond vehicles, addressing diverse non-vehicle objects. We will open-source our code and dataset for community benefit.

Details

ICLR Conference 2025 Conference Paper

Influence-Guided Diffusion for Dataset Distillation

Mingyang Chen
Jiawei Du
Bo Huang 0017
Yi Wang 0017
Xiaobo Zhang
Wei Wang 0011

Dataset distillation aims to streamline the training process by creating a compact yet effective dataset for a much larger original dataset. However, existing methods often struggle with distilling large, high-resolution datasets due to prohibitive resource costs and limited performance, primarily stemming from sample-wise optimizations in the pixel space. Motivated by the remarkable capabilities of diffusion generative models in learning target dataset distributions and controllably sampling high-quality data tailored to user needs, we propose framing dataset distillation as a controlled diffusion generation task aimed at generating data specifically tailored for effective training purposes. By establishing a correlation between the overarching objective of dataset distillation and the trajectory influence function, we introduce the Influence-Guided Diffusion (IGD) sampling framework to generate training-effective data without the need to retrain diffusion models. An efficient guided function is designed by leveraging the trajectory influence function as an indicator to steer diffusions to produce data with influence promotion and diversity enhancement. Extensive experiments show that the training performance of distilled datasets generated by diffusions can be significantly improved by integrating with our IGD method and achieving state-of-the-art performance in distilling ImageNet datasets. Particularly, an exceptional result is achieved on the ImageNet-1K, reaching 60.3\% at IPC=50. Our code is available at https://github.com/mchen725/DD_IGD.

Details

AAAI Conference 2025 Conference Paper

KPL: Training-Free Medical Knowledge Mining of Vision-Language Models

Jiaxiang Liu
Tianxiang Hu
Jiawei Du
Ruiyuan Zhang
Joey Tianyi Zhou
Zuozhu Liu

Visual Language Models such as CLIP excel in image recognition due to extensive image-text pre-training. However, applying the CLIP inference in zero-shot classification, particularly for medical image diagnosis, faces challenges due to: 1) the inadequacy of representing image classes solely with single category names; 2) the modal gap between the visual and text spaces generated by CLIP encoders. Despite attempts to enrich disease descriptions with large language models, the lack of class-specific knowledge often leads to poor performance. In addition, empirical evidence suggests that existing proxy learning methods for zero-shot image classification on natural image datasets exhibit instability when applied to medical datasets. To tackle these challenges, we introduce the Knowledge Proxy Learning (KPL) to mine knowledge from CLIP. KPL is designed to leverage CLIP's multimodal understandings for medical image classification through Text Proxy Optimization and Multimodal Proxy Learning. Specifically, KPL retrieves image-relevant knowledge descriptions from the constructed knowledge-enhanced base to enrich semantic text proxies. It then harnesses input images and these descriptions, encoded via CLIP, to stably generate multimodal proxies that boost the zero-shot classification performance. Extensive experiments conducted on both medical and natural image datasets demonstrate that KPL enables effective zero-shot image classification, outperforming all baselines. These findings highlight the great potential in this paradigm of mining knowledge from CLIP for medical image classification and broader areas.

PDF Details DOI

IROS Conference 2025 Conference Paper

Online Iterative Learning with Forward Simulation for Sub-minimum End-effector Displacement Positioning

Weiming Qu
Tianlin Liu
Jiawei Du
Xihong Wu
Dingsheng Luo

Precision is a crucial performance indicator for robot arms. During interacting with human, high precision enables a robot arm to be used effectively and safely, while low precision may lead to safety issues. Traditional methods for improving robot arm precision rely on error compensation. However, these methods are often not robust and lack adaptability. Learning-based methods offer greater flexibility and adaptability, while current researches show that they often fall short in achieving high precision and struggle to handle many scenarios requiring high precision. In this paper, we propose a novel high-precision robot arm manipulation framework based on online iterative learning and forward simulation, which can achieve positioning error (precision) less than end-effector physical minimum displacement. In other words, our proposed method can compensate for the precision-limitation of the hardware structure of the robot arms. Furthermore, we consider the joint angular resolution of the real robot arm, which is usually neglected in related works. A series of experiments on both simulation and real UR3 robot arm platforms demonstrate that our proposed method is effective and promising. The related code will be available soon.

Details

IROS Conference 2025 Conference Paper

SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction

Weiming Qu
Jia Wang
Jiawei Du
Yuanhao Zhu
Jianfeng Yu
Rui Xia
Song Cao
Xihong Wu

Trajectory prediction is a fundamental technology for advanced autonomous driving systems and represents one of the most challenging problems in the field of cognitive intelligence. Accurately predicting the future trajectories of each traffic participant is a prerequisite for building high safety and high reliability decision-making, planning, and control capabilities in autonomous driving. However, existing methods often focus solely on the motion of other traffic participants without considering the underlying intent behind that motion, which increases the uncertainty in trajectory prediction. Autonomous vehicles operate in real-time environments, meaning that trajectory prediction algorithms must be able to process data and generate predictions in real-time. While many existing methods achieve high accuracy, they often struggle to effectively handle heterogeneous traffic scenarios. In this paper, we propose a Subjective Intent-based Low-latency framework for Multiple traffic participants joint trajectory prediction. Our method explicitly incorporates the subjective intent of traffic participants based on their key points, and predicts the future trajectories jointly without map, which ensures promising performance while significantly reducing the prediction latency. Additionally, we introduce a novel dataset designed specifically for trajectory prediction. Related code and dataset will be available soon.

Details

NeurIPS Conference 2024 Conference Paper

Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment

Jiawei Du
Xin Zhang
Juncheng Hu
Wenxing Huang
Joey T. Zhou

The sharp increase in data-related expenses has motivated research into condensing datasets while retaining the most informative features. Dataset distillation has thus recently come to the fore. This paradigm generates synthetic datasets that are representative enough to replace the original dataset in training a neural network. To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach. Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance. Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset. Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet, and ImageNet-1K, demonstrate the superior performance of our method, highlighting its effectiveness in producing diverse and representative synthetic datasets with minimal computational expense. Our code is available at https: //github. com/AngusDujw/Diversity-Driven-Synthesis.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Sequential Subset Matching for Dataset Distillation

Jiawei Du
Qin Shi
Joey Tianyi Zhou

Dataset distillation is a newly emerging task that synthesizes a small-size dataset used in training deep neural networks (DNNs) for reducing data storage and model training costs. The synthetic datasets are expected to capture the essence of the knowledge contained in real-world datasets such that the former yields a similar performance as the latter. Recent advancements in distillation methods have produced notable improvements in generating synthetic datasets. However, current state-of-the-art methods treat the entire synthetic dataset as a unified entity and optimize each synthetic instance equally. This static optimization approach may lead to performance degradation in dataset distillation. Specifically, we argue that static optimization can give rise to a coupling issue within the synthetic data, particularly when a larger amount of synthetic data is being optimized. This coupling issue, in turn, leads to the failure of the distilled dataset to extract the high-level features learned by the deep neural network (DNN) in the latter epochs. In this study, we propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch), which tackles this problem by adaptively optimizing the synthetic data to encourage sequential acquisition of knowledge during dataset distillation. Our analysis indicates that SeqMatch effectively addresses the coupling issue by sequentially generating the synthetic instances, thereby enhancing its performance significantly. Our proposed SeqMatch outperforms state-of-the-art methods in various datasets, including SVNH, CIFAR-10, CIFAR-100, and Tiny ImageNet.

PDF Details

ICLR Conference 2022 Conference Paper

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Jiawei Du
Hanshu Yan
Jiashi Feng
Joey Tianyi Zhou
Liangli Zhen
Rick Siow Mong Goh
Vincent Y. F. Tan

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM’s computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM’s efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies—StochasticWeight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-`a-vis base optimizers, while test accuracies are preserved or even improved.

Details

NeurIPS Conference 2022 Conference Paper

Sharpness-Aware Training for Free

Jiawei Du
Daquan Zhou
Jiashi Feng
Vincent Tan
Joey Tianyi Zhou

Modern deep neural networks (DNNs) have achieved state-of-the-art performances but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead of the given base optimizer (e. g. SGD) for approximating the sharpness measure. In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates the sharp landscape at almost zero additional computational cost over the base optimizer. Intuitively, SAF achieves this by avoiding sudden drops in the loss in the sharp local minima throughout the trajectory of the updates of the weights. Specifically, we suggest a novel trajectory loss, based on the KL-divergence between the outputs of DNNs with the current weights and past weights, as a replacement of the SAM's sharpness measure. This loss captures the rate of change of the training loss along the model's update trajectory. By minimizing it, SAF ensures the convergence to a flat minimum with improved generalization capabilities. Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.

PDF Details

ICLR Conference 2020 Conference Paper

On Robustness of Neural Ordinary Differential Equations

Hanshu Yan
Jiawei Du
Vincent Y. F. Tan
Jiashi Feng

Neural ordinary differential equations (ODEs) have been attracting increasing attention in various research domains recently. There have been some works studying optimization issues and approximation capabilities of neural ODEs, but their robustness is still yet unclear. In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. We first present an empirical study on the robustness of the neural ODE-based networks (ODENets) by exposing them to inputs with various types of perturbations and subsequently investigating the changes of the corresponding outputs. In contrast to conventional convolutional neural networks (CNNs), we find that the ODENets are more robust against both random Gaussian perturbations and adversarial attack examples. We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are non-intersecting. Our work suggests that, due to their intrinsic robustness, it is promising to use neural ODEs as a basic block for building robust deep network models. To further enhance the robustness of vanilla neural ODEs, we propose the time-invariant steady neural ODE (TisODE), which regularizes the flow on perturbed data via the time-invariant property and the imposition of a steady-state constraint. We show that the TisODE method outperforms vanilla neural ODEs and also can work in conjunction with other state-of-the-art architectural methods to build more robust deep networks.

Details

ICLR Conference 2020 Conference Paper

Query-efficient Meta Attack to Deep Neural Networks

Jiawei Du
Hu Zhang 0005
Joey Tianyi Zhou
Yi Yang 0001
Jiashi Feng

Black-box attack methods aim to infer suitable attack patterns to targeted DNN models by only using output feedback of the models and the corresponding input queries. However, due to lack of prior and inefficiency in leveraging the query and feedback information, existing methods are mostly query-intensive for obtaining effective attack patterns. In this work, we propose a meta attack approach that is capable of attacking a targeted model with much fewer queries. Its high query-efficiency stems from effective utilization of meta learning approaches in learning generalizable prior abstraction from the previously observed attack patterns and exploiting such prior to help infer attack patterns from only a few queries and outputs. Extensive experiments on MNIST, CIFAR10 and tiny-Imagenet demonstrate that our meta-attack method can remarkably reduce the number of model queries without sacrificing the attack performance. Besides, the obtained meta attacker is not restricted to a particular model but can be used easily with a fast adaptive ability to attack a variety of models. Our code will be released to the public.

Details

AAAI Conference 2018 Conference Paper

SC2Net: Sparse LSTMs for Sparse Coding

Joey Tianyi Zhou
Kai Di
Jiawei Du
Xi Peng
Hao Yang
Sinno Jialin Pan
Ivor Tsang
Yong Liu

The iterative hard-thresholding algorithm (ISTA) is one of the most popular optimization solvers to achieve sparse codes. However, ISTA suffers from following problems: 1) ISTA employs non-adaptive updating strategy to learn the parameters on each dimension with a ﬁxed learning rate. Such a strategy may lead to inferior performance due to the scarcity of diversity; 2) ISTA does not incorporate the historical information into the updating rules, and the historical information has been proven helpful to speed up the convergence. To address these challenging issues, we propose a novel formulation of ISTA (named as adaptive ISTA) by introducing a novel adaptive momentum vector. To efﬁciently solve the proposed adaptive ISTA, we recast it as a recurrent neural network unit and show its connection with the well-known long short term memory (LSTM) model. With a new proposed unit, we present a neural network (termed SC2Net) to achieve sparse codes in an end-to-end manner. To the best of our knowledge, this is one of the ﬁrst works to bridge the 1-solver and LSTM, and may provide novel insights in understanding model-based optimization and LSTM. Extensive experiments show the effectiveness of our method on both unsupervised and supervised tasks.

PDF Details