Arrow Research search

Author name cluster

Jianyu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

AAAI 2026 (Conference Paper)

Boosting Noisy Correspondence Discrimination via Dynamic Neighborhood Semantic Verification

  • Yu Wang
  • Fengxia Han
  • Jianyu Wang

Noisy correspondence, characterized by mismatches in cross-modal data pairs, presents a significant challenge for real-world applications. Current approaches primarily rely on direct cross-modal pairwise similarity metrics, which suffer from two critical limitations: noise sensitivity, where direct similarity calculations are easily corrupted by noisy or ambiguous instances, and contextual blindness, where isolated pairwise comparisons fail to exploit the rich semantic context embedded in neighboring instances. To address these limitations, we propose to improve noisy correspondence discrimination through a well-designed Dynamic Neighborhood Semantic association verification paradigm, termed DNS. Specifically, we hypothesize that the matching degree of a sample pair can be quantified through the interrelationships among its semantic neighbors. To this end, we develop a novel semantic drift distance and a local relation proximity measure based on dynamic neighborhood association. Furthermore, going beyond implicit approaches to modeling the semantic gap in cross-modal data, we introduce an explicit decomposition framework that disentangles the gap into a semantic orientation and a scalar magnitude. Through the integration of these mechanisms, DNS substantially improves noisy correspondence discrimination, yielding remarkable performance gains. Extensive experiments on three widely used benchmark datasets, Flickr30K, MS-COCO, and Conceptual Captions, demonstrate the superiority of DNS over state-of-the-art methods.
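
As a rough illustration of the neighborhood-verification intuition (not the paper's DNS implementation; the embeddings, k, and the overlap score below are our assumptions):

```python
import numpy as np

def neighborhood_match_score(img_emb, txt_emb, i, k=5):
    """Toy proxy for neighborhood-based verification: score pair i by how much
    its image-side and text-side semantic neighborhoods agree. Inputs are
    (N, d) L2-normalized embedding matrices for aligned image-text pairs."""
    img_sims = img_emb @ img_emb[i]                  # image i vs. all images
    txt_sims = txt_emb @ txt_emb[i]                  # text i vs. all texts
    img_nbrs = set(np.argsort(-img_sims)[1:k + 1])   # k nearest, excluding self
    txt_nbrs = set(np.argsort(-txt_sims)[1:k + 1])
    # A clean pair tends to share neighbors across modalities; a mismatched
    # (noisy-correspondence) pair tends not to.
    return len(img_nbrs & txt_nbrs) / k
```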

ICML 2025 (Conference Paper)

Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

  • Jianyu Wang
  • Zhiqiang Hu
  • Lidong Bing

We propose a novel prompt design paradigm that challenges conventional wisdom in large language model (LLM) prompting. While conventional wisdom prioritizes well-crafted instructions and demonstrations for in-context learning (ICL), we show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks. Notably, the "gibberish" always matches or surpasses state-of-the-art automatic prompt optimization techniques, achieving substantial gains regardless of LLM alignment. Nevertheless, discovering an effective pruning strategy is non-trivial, as existing attribution methods and prompt compression algorithms fail to deliver robust results, let alone human intuition. To this end, we propose PromptQuine, a self-discovering, evolutionary prompt optimization framework that automatically searches for the pruning strategy by itself using only low-data regimes. Much like the emergent complexity in nature, such as symbiosis and self-organization arising in response to resource constraints, our framework evolves and refines unconventional yet highly effective prompts by leveraging only the tokens present within the context. We demonstrate its effectiveness across classification, multi-choice question answering, generation, and math reasoning tasks across LLMs, while achieving decent runtime efficiency. We hope our findings can guide mechanistic studies of in-context learning and serve as a call to action, paving the way for more open-ended search algorithms for more effective LLM prompting.
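
A minimal sketch of an evolutionary token-pruning loop of this kind (population size, mutation rate, and the truncation selection scheme are our assumptions, not PromptQuine's actual configuration):

```python
import random

def evolve_prompt(tokens, fitness, pop_size=16, generations=30, p_flip=0.05):
    """Toy evolutionary token pruning: each individual is a keep/drop mask
    over the prompt tokens, selected by a small dev-set fitness score."""
    def apply(mask):
        return [t for t, keep in zip(tokens, mask) if keep]

    pop = [[True] * len(tokens)]                      # seed with the full prompt
    pop += [[random.random() > 0.3 for _ in tokens] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(apply(m)), reverse=True)
        parents = pop[: pop_size // 2]                # truncation selection
        children = [[keep != (random.random() < p_flip)   # bit-flip mutation
                     for keep in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return apply(max(pop, key=lambda m: fitness(apply(m))))
```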

ICML 2025 (Conference Paper)

Instruction-Following Pruning for Large Language Models

  • Bairu Hou
  • Qibin Chen
  • Jianyu Wang
  • Guoli Yin
  • Chong Wang
  • Nan Du 0002
  • Ruoming Pang
  • Shiyu Chang

With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models from scratch. In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task. To identify and activate effective parameters, we jointly optimize the sparse mask predictor and the LLM, leveraging both instruction-following data and the pre-training corpus. Experimental results demonstrate the effectiveness of our approach on a wide range of evaluation benchmarks. For example, our 3B activated model improves over the 3B dense model by 5-8 points of absolute margin on domains such as math and coding, and rivals the performance of a 9B model.
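
A toy sketch of an input-dependent sparse mask over FFN channels, conditioned on an instruction embedding (the layer shapes and hard top-k gating below are our assumptions; the paper jointly trains the predictor with the LLM):

```python
import torch
import torch.nn as nn

class MaskedFFN(nn.Module):
    """Toy instruction-conditioned structured pruning: a predictor reads the
    pooled instruction embedding and gates FFN channels per input."""

    def __init__(self, d_model=512, d_ff=2048, sparsity=0.5):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.mask_predictor = nn.Linear(d_model, d_ff)
        self.k = int(d_ff * (1 - sparsity))      # channels kept per input

    def forward(self, x, instruction_emb):
        # x: (B, T, d_model); instruction_emb: (B, d_model)
        scores = self.mask_predictor(instruction_emb)             # (B, d_ff)
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)   # hard top-k
        # NOTE: hard top-k is non-differentiable; joint training (as in the
        # paper) would need a soft or straight-through relaxation here.
        h = torch.relu(self.up(x)) * mask.unsqueeze(1)            # gate channels
        return self.down(h)
```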

NeurIPS 2025 (Conference Paper)

Point4Bit: Post Training 4-bit Quantization for Point Cloud 3D Detection

  • Jianyu Wang
  • Yu Wang
  • Shengjie Zhao
  • Sifan Zhou

Voxel-based 3D object detectors have achieved remarkable performance in point cloud perception, yet their high computational and memory demands pose significant challenges for deployment on resource-constrained edge devices. Post-training quantization (PTQ) provides a practical means to compress models and accelerate inference; however, existing PTQ methods for point cloud detection are typically limited to INT8 and lack support for lower-bit formats such as INT4, which restricts their deployment potential. In this paper, we present Point4Bit, the first general 4-bit PTQ framework tailored for voxel-based 3D object detectors. To tackle challenges in low-bit quantization, we propose two key techniques: (1) Foreground-aware Piecewise Activation Quantization (FA-PAQ), which leverages foreground structural cues to improve the quantization of sparse activations; and (2) Gradient-guided Key Weight Quantization (G-KWQ), which preserves task-critical weights through gradient-based analysis to reduce quantization-induced degradation. Extensive experiments demonstrate that Point4Bit achieves INT4 quantization with less than a 1.5% accuracy drop. Moreover, we validate its generalization ability on point cloud classification and segmentation tasks, demonstrating broad applicability. Our method pushes the bit-width limit of point cloud quantization down to 4 bits, demonstrating strong potential for efficient deployment on resource-constrained edge devices.
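
For orientation, a plain symmetric per-channel 4-bit fake-quantization baseline (the generic PTQ starting point, not Point4Bit's FA-PAQ or G-KWQ techniques):

```python
import torch

def quantize_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-channel 4-bit fake quantization for a 2-D weight matrix:
    round to the signed INT4 grid, then dequantize for simulation."""
    max_abs = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)  # per output row
    scale = max_abs / 7.0                    # signed INT4 covers [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale                         # dequantized ("fake-quant") weights
```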

ICLR 2024 (Conference Paper)

FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent

  • Ziyao Wang
  • Jianyu Wang
  • Ang Li 0005

The theoretical landscape of federated learning (FL) is evolving rapidly, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of the most critical. Among the diverse hyperparameter adjustments, adapting the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3× faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper yields a remarkable accuracy gain of up to 15% over FedAvg under suboptimal initial learning rate settings.
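
The hypergradient mechanism that FedHyper builds on fits in a few lines; below is plain hypergradient descent on a single learning rate (our simplification, not FedHyper's federated scheduler):

```python
import numpy as np

def hypergradient_sgd(grad_fn, w, lr=0.1, beta=1e-3, steps=100):
    """Hypergradient descent on the learning rate: nudge lr up while
    consecutive gradients agree, down when they oppose."""
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        lr += beta * float(np.dot(g, prev_g))  # hypergradient step on lr
        w = w - lr * g                         # SGD step with the adapted lr
        prev_g = g
    return w, lr
```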

TMLR 2024 (Journal Article)

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

  • Jianyu Wang
  • Rudrajit Das
  • Gauri Joshi
  • Satyen Kale
  • Zheng Xu
  • Tong Zhang

Existing theoretical results (e.g., Woodworth et al., 2020a) predict that the performance of federated averaging (FedAvg) degrades under high data heterogeneity. However, in practice, FedAvg converges quite well on several naturally heterogeneous datasets. To explain this seemingly unreasonable effectiveness of FedAvg, which contradicts previous theoretical predictions, this paper introduces the client consensus hypothesis: on certain federated datasets, the average of local model updates on clients starting from the optimum is close to zero. We prove that under this hypothesis, data heterogeneity does not degrade the convergence of FedAvg. Moreover, we show that this hypothesis holds for a linear regression problem and for some naturally heterogeneous datasets such as FEMNIST and StackOverflow. We therefore believe that this hypothesis can better explain the performance of FedAvg in practice.
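
A hedged restatement of the hypothesis in our own notation (M clients, global optimum w*, and Δ_m are our symbols, not the paper's):

```latex
% Client consensus hypothesis, paraphrased: when every client starts its
% local updates from the global optimum w^*, the *average* drift across
% clients nearly cancels, even if individual clients drift.
\[
  \Bigl\| \frac{1}{M} \sum_{m=1}^{M} \Delta_m(w^*) \Bigr\| \approx 0,
  \qquad \Delta_m(w^*) = \text{cumulative local update of client } m
  \text{ started from } w^*.
\]
```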

IROS 2023 (Conference Paper)

SELVO: A Semantic-Enhanced Lidar-Visual Odometry

  • Kun Jiang
  • Shuang Gao
  • Xudong Zhang
  • Jijunnan Li
  • Yandong Guo
  • Shijie Liu
  • Chunlai Li
  • Jianyu Wang

In the face of complex external environments, single-sensor information can no longer meet the accuracy requirements of low-drift SLAM. In this paper, we focus on a fusion scheme for cameras and lidar, and explore the gain that semantic information brings to a SLAM system. A Semantic-Enhanced Lidar-Visual Odometry (SELVO) is proposed to achieve pose estimation with high accuracy and robustness by applying semantics and utilizing strategies for initialization and sensor fusion. In the loop closure detection thread, we propose a novel place recognition method based on semantic information to maintain the global consistency of the map. In the back-end, we design a joint optimization framework including visual odometry, lidar odometry, and loop closure detection, and innovatively propose to recognize degraded scenes with semantic information. We have conducted extensive experiments on the KITTI [1] and KITTI-360 [2] datasets, and the results show that our system achieves high accuracy and competitive performance in comparison with state-of-the-art methods.

ICLR 2023 (Conference Paper)

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

  • John Nguyen
  • Jianyu Wang
  • Kshitiz Malik
  • Maziar Sanjabi
  • Mike Rabbat

An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to client devices having different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before starting federated training. Using four standard federated learning benchmark datasets, we empirically study the impact of starting from a pre-trained model in federated learning. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. We recommend that future work proposing and evaluating federated optimization methods report performance when starting from both random and pre-trained initializations. This study raises several questions for further work on understanding the role of heterogeneity in federated optimization.

JMLR 2021 (Journal Article)

Cooperative SGD: A Unified Framework for the Design and Analysis of Local-Update SGD Algorithms

  • Jianyu Wang
  • Gauri Joshi

When training machine learning models using stochastic gradient descent (SGD) with a large number of nodes or massive edge devices, the communication cost of synchronizing gradients at every iteration is a key bottleneck that limits the scalability of the system and hinders the benefit of parallel computation. Local-update SGD algorithms, where worker nodes perform local iterations of SGD and periodically synchronize their local models, can effectively reduce the communication frequency and save communication delay. In this paper, we propose a powerful framework, named Cooperative SGD, that subsumes a variety of local-update SGD algorithms (such as local SGD, elastic averaging SGD, and decentralized parallel SGD) and provides a unified convergence analysis. Notably, special cases of the unified convergence analysis provided by the Cooperative SGD framework yield 1) the first convergence analysis of elastic averaging SGD for general non-convex objectives, and 2) improvements upon previous analyses of local SGD and decentralized parallel SGD. Moreover, we design new algorithms such as elastic averaging SGD with overlapped computation and communication, and decentralized periodic averaging, which are shown to be 4× or more faster than the baseline in reaching the same training loss.
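
A minimal sketch of the simplest family member, local SGD with periodic averaging (the worker count, τ, and full-averaging rule are our simplifications; the elastic and decentralized variants change only the synchronization step):

```python
import numpy as np

def local_sgd(grad_fns, w0, lr=0.05, tau=10, rounds=20):
    """Local SGD: each worker runs tau local SGD steps, then all workers
    synchronize to the average model."""
    avg = np.array(w0, dtype=float)
    for _ in range(rounds):
        updated = []
        for grad in grad_fns:             # one grad_fn per worker
            w = avg.copy()
            for _ in range(tau):          # tau local steps between syncs
                w -= lr * grad(w)
            updated.append(w)
        avg = np.mean(updated, axis=0)    # periodic synchronization
    return avg
```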

ICLR 2020 (Conference Paper)

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

  • Jianyu Wang
  • Vinayak Tantia
  • Nicolas Ballas
  • Mike Rabbat

Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllReduce-based methods, which use blocking communication before every update, the resulting models may be less accurate after the same number of updates. Inspired by the BMUF method of Chen & Huo (2016), we propose a slow momentum (SlowMo) framework, where workers periodically synchronize and perform a momentum update, after multiple iterations of a base optimization algorithm. Experiments on image classification and machine translation tasks demonstrate that SlowMo consistently yields improvements in optimization and generalization performance relative to the base optimizer, even when the additional overhead is amortized over many updates so that the SlowMo runtime is on par with that of the base optimizer. We provide theoretical convergence guarantees showing that SlowMo converges to a stationary point of smooth non-convex losses. Since BMUF can be expressed through the SlowMo framework, our results also correspond to the first theoretical convergence guarantees for BMUF.
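
The outer SlowMo update can be sketched directly from the description above (variable names are ours; the base optimizer that produces the averaged model x_avg is abstracted away):

```python
import numpy as np

def slowmo_round(x_prev, x_avg, u, base_lr, slow_lr=1.0, beta=0.7):
    """One SlowMo outer step: treat the averaged local progress as a
    pseudo-gradient and apply momentum to it at a slow timescale."""
    d = (x_prev - x_avg) / base_lr        # pseudo-gradient from this round
    u = beta * u + d                      # slow momentum buffer update
    x_next = x_prev - slow_lr * base_lr * u
    return x_next, u
```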

NeurIPS 2020 (Conference Paper)

Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

  • Jianyu Wang
  • Qinghua Liu
  • Hao Liang
  • Gauri Joshi
  • H. Vincent Poor

In federated learning, heterogeneity in the clients' local datasets and computation speeds results in large variations in the number of local updates performed by each client in each communication round. Naive weighted aggregation of such models causes objective inconsistency, that is, the global model converges to a stationary point of a mismatched objective function which can be arbitrarily different from the true objective. This paper provides a general framework to analyze the convergence of federated heterogeneous optimization algorithms. It subsumes previously proposed methods such as FedAvg and FedProx and provides the first principled understanding of the solution bias and the convergence slowdown due to objective inconsistency. Using insights from this analysis, we propose FedNova, a normalized averaging method that eliminates objective inconsistency while preserving fast error convergence.
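
A sketch of normalized averaging as described above (the interface, and the reading of client_deltas as cumulative local updates w_local - w_global, are our assumptions):

```python
def fednova_aggregate(w_global, client_deltas, local_steps, weights=None):
    """Normalized averaging: divide each client's cumulative update by its
    own number of local steps, then rescale by the effective step count,
    so clients doing more local work do not bias the global objective."""
    n = len(client_deltas)
    if weights is None:
        weights = [1.0 / n] * n
    tau_eff = sum(p * t for p, t in zip(weights, local_steps))
    normalized = sum(p * d / t
                     for p, d, t in zip(weights, client_deltas, local_steps))
    return w_global + tau_eff * normalized
```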

NeurIPS 2019 (Conference Paper)

Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training

  • Haichao Zhang
  • Jianyu Wang

We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffers from issues such as label leaking, as noted in recent works. In contrast, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking inter-sample relationships into consideration. We analyze model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets, comparing against state-of-the-art approaches.
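
A simplified sketch of unsupervised, feature-space attack generation in this spirit (the paper maximizes an optimal-transport distance between clean and perturbed feature distributions; the plain L2 discrepancy and the model.features() helper below are our stand-ins):

```python
import torch

def feature_scatter_attack(model, x, eps=8/255, alpha=2/255, steps=7):
    """Perturb a batch to maximize a feature-space discrepancy with the
    clean batch; no labels are used, so label leaking is avoided."""
    clean_feats = model.features(x).detach()        # assumed feature extractor
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        # Plain L2 discrepancy here; the paper uses an optimal-transport
        # distance between clean and perturbed feature distributions.
        loss = (model.features(x_adv) - clean_feats).pow(2).mean()
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()         # ascend the discrepancy
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```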