Arrow Research search

Author name cluster

Jue Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

AAAI Conference 2026 Conference Paper

RipAlert: A Future-Frame-Aware Framework for Rip Current Forecasting and Early Alerting

  • Meng Wan
  • Qi Su
  • Zhixin Xia
  • Kanglin Chen
  • Jue Wang
  • Tiantian Liu
  • Rongqiang Cao
  • Hui Cui

Rip currents cause over 100 drowning deaths and more than 30,000 rescues annually in the United States, posing a severe threat to beach safety worldwide. However, most existing detection methods are reactive, identifying rip currents only after they form, leaving limited time for intervention. We propose RipAlert, a future-frame-aware framework that forecasts near-future coastal dynamics and proactively identifies rip current risks. We design a region-sensitive optical flow prediction method with a novel entropy-based object detector to capture early-stage reverse-flow anomalies. Unlike static-image approaches, RipAlert leverages temporal motion patterns to detect rip currents up to 5 seconds before they visibly form. To support real-world deployment, we design a lightweight mobile application and release a curated dataset with over 2,000 annotated images. Experiments on the RipVIS benchmark show that our approach achieves state-of-the-art performance. The system has been deployed at high-risk beaches in China, issuing successful early warnings during real-world events. Our work advances AI-driven coastal safety and contributes to SDG 3 (Good Health and Well-Being) and SDG 13 (Climate Action).

JBHI Journal 2025 Journal Article

A Post-Quantum Blockchain and Autonomous AI-Enabled Scheme for Secure Healthcare Information Exchange

  • Linlin He
  • Siyuan Rao
  • Kexin Tian
  • Yuyuan Liu
  • Jue Wang
  • Shuanggen Liu
  • Xiuhua Lu

Secure healthcare information exchange (HIE) is critical to improving medical services, enabling data interoperability, and ensuring patient privacy. However, the increasing threat posed by quantum computing challenges the reliability of conventional cryptographic mechanisms. To address this, we propose a post-quantum secure healthcare data-sharing scheme that combines the Extended Merkle Signature Scheme (XMSS) and consortium blockchain technology to guarantee the integrity, authenticity, and traceability of electronic medical records (EMRs). Furthermore, the scheme incorporates autonomous artificial intelligence (AI) to assist healthcare professionals in generating accurate and intelligent diagnostic reports, enhancing clinical decision-making. We theoretically analyze the scheme's security in the random oracle model, demonstrating that it effectively resists various threats. Performance evaluation shows that the scheme is particularly suitable for HIE scenarios, as it reduces total computational overhead by about 49% and blockchain storage by 36% compared to other schemes.

AAAI Conference 2025 Conference Paper

FoldToken: Learning Protein Language via Vector Quantization and Beyond

  • Zhangyang Gao
  • Cheng Tan
  • Jue Wang
  • Yufei Huang
  • Lirong Wu
  • Stan Z. Li

Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. This approach involves projecting residue types and structures into a discrete space, guided by a reconstruction loss for information preservation. We name the learned discrete symbols FoldTokens, and the sequence of FoldTokens serves as a new protein language, transforming the protein sequence-structure into a unified modality. We apply the created protein language to the general backbone inpainting task, building the first GPT-style model (FoldGPT) for sequence-structure co-generation with promising results. Key to our success is the substantial enhancement of the vector quantization module, Soft Conditional Vector Quantization (SoftCVQ).

ICML Conference 2025 Conference Paper

Improving Model Alignment Through Collective Intelligence of Open-Source Models

  • Junlin Wang
  • Roy Xie
  • Shang Zhu
  • Jue Wang
  • Ben Athiwaratkun
  • Bhuwan Dhingra
  • Shuaiwen Leon Song
  • Ce Zhang 0001

Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which necessitates high-quality human-labeled data. Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), which leverages the collective strengths of various language models to provide high-quality data for model alignment. By employing MoAA, we enhance both supervised fine-tuning and preference optimization, leading to improved performance compared to using a single model alone to generate alignment data (e.g., using GPT-4o alone). Evaluation results show that our approach can improve the win rate of LLaMA-3.1-8B-Instruct from 19.5 to 48.3 on Arena-Hard and from 22.33 to 57.23 on AlpacaEval 2, highlighting a promising direction for model alignment through this new scalable and diverse synthetic data recipe. Furthermore, we demonstrate that MoAA enables a self-improvement pipeline, where models fine-tuned on MoA-generated data surpass their own initial capabilities, providing evidence that our approach can push the frontier of open-source LLMs without reliance on stronger external supervision. Data and code will be released.

ICML Conference 2025 Conference Paper

Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping

  • Muru Zhang
  • Mayank Mishra
  • Zhongzhu Zhou
  • William Brandon
  • Jue Wang
  • Yoon Kim
  • Jonathan Ragan-Kelley
  • Shuaiwen Leon Song

Large language model inference is both memory-intensive and time-consuming, often requiring distributed algorithms to efficiently scale. Various model parallelism strategies are used in multi-GPU training and inference to partition computation across multiple devices, reducing memory load and computation time. However, using model parallelism necessitates communication of information between GPUs, which has been a major bottleneck and limits the gains obtained by scaling up the number of devices. We introduce Ladder Residual, a simple architectural modification applicable to all residual-based models that enables straightforward overlapping and effectively hides the latency of communication. Our insight is that in addition to systems optimization, one can also redesign the model architecture to decouple communication from computation. While Ladder Residual can allow communication-computation decoupling in conventional parallelism patterns, we focus on Tensor Parallelism in this paper, which is particularly bottlenecked by its heavy communication. For a Transformer model with 70B parameters, applying Ladder Residual to all its layers achieves a 29% end-to-end wall-clock speedup at inference time with TP sharding over 8 devices. We refer to the resulting Transformer model as the Ladder Transformer. We train a 1B and 3B Ladder Transformer from scratch and observe comparable performance to a standard dense transformer baseline. We also show that it is possible to convert parts of the Llama-3.1 8B model to our Ladder Residual architecture with minimal accuracy degradation by retraining on only 3B tokens.

IJCAI Conference 2025 Conference Paper

MCloudNet: An Ultra-Short-Term Photovoltaic Power Forecasting Framework With Multi-Layer Cloud Coverage

  • Meng Wan
  • Tiantian Liu
  • Yuxuan Bi
  • Jue Wang
  • Hui Cui
  • Rongqiang Cao
  • Jiaxiang Wang
  • Peng Shi

Over 4.15 million low-income households across nearly 60,000 villages in China benefit from photovoltaic (PV) poverty alleviation power stations. However, weak infrastructure and limited capabilities make these systems vulnerable to fluctuations. One of the United Nations' Sustainable Development Goals (SDG 7) seeks to ensure access to affordable and reliable energy for all, especially in underdeveloped regions. This paper proposes MCloudNet, a multi-modal framework designed to improve ultra-short-term PV prediction in data-scarce, cloud-dynamic environments. MCloudNet explicitly models multi-layer cloud structures from satellite imagery and fuses them with time-series meteorological data to enhance prediction accuracy and interpretability. A province-level dispatch system with MCloudNet has been deployed in Hebei, supporting scheduling across rural PV stations. Experiments conducted in counties such as Shexian and Luxi highlight the framework's effectiveness for use in underdeveloped micro-grids. Operational results show that the system has reduced over 60 million kWh of solar curtailment and generated 24 million CNY in economic value, benefiting approximately 50,000 rural households. By minimizing power fluctuations and improving rural energy scheduling, MCloudNet supports essential services such as lighting, medical facilities, and communications. The source code is available at: https://github.com/AI4SClab/MCloudNet.

ICLR Conference 2025 Conference Paper

Mixture-of-Agents Enhances Large Language Model Capabilities

  • Junlin Wang
  • Jue Wang
  • Ben Athiwaratkun
  • Ce Zhang 0001
  • James Y. Zou

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, Arena-Hard, MT-Bench, and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs achieves a score of 65.1% on AlpacaEval 2.0 compared to 57.5% by GPT-4 Omni.
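
The layered aggregation described in this abstract can be summarized in a few lines of code. The sketch below is an illustration of that flow, not the authors' implementation; the `generate` callable is a hypothetical stand-in for any LLM completion call, and the prompt format is an assumption.

```python
# Minimal sketch of a layered Mixture-of-Agents (MoA) pipeline, assuming a
# hypothetical generate(model_name, prompt) helper that returns a completion.
from typing import Callable, List

def mixture_of_agents(
    prompt: str,
    layers: List[List[str]],                  # each inner list names the agents in one layer
    aggregator: str,                          # model that produces the final answer
    generate: Callable[[str, str], str],      # generate(model_name, prompt) -> completion
) -> str:
    previous_outputs: List[str] = []
    for layer in layers:
        current_outputs = []
        for model_name in layer:
            # Each agent sees the user prompt plus all outputs from the previous layer.
            aux = "\n\n".join(f"[Response {i + 1}]\n{o}" for i, o in enumerate(previous_outputs))
            agent_prompt = f"{prompt}\n\nPrevious responses to consider:\n{aux}" if aux else prompt
            current_outputs.append(generate(model_name, agent_prompt))
        previous_outputs = current_outputs

    # A final aggregator model synthesizes the last layer's responses.
    synthesis_prompt = (
        f"{prompt}\n\nSynthesize the best possible answer from these responses:\n"
        + "\n\n".join(previous_outputs)
    )
    return generate(aggregator, synthesis_prompt)
```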

ICLR Conference 2025 Conference Paper

Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation

  • Linda He
  • Jue Wang
  • Maurice Weber
  • Shang Zhu
  • Ben Athiwaratkun
  • Ce Zhang 0001

Large Language Models (LLMs) struggle with long-context reasoning, not only due to the quadratic scaling of computational complexity with sequence length but also because of the scarcity and expense of annotating long-context data. There has been barely any open-source work that systematically ablates long-context data, nor is there any openly available instruction tuning dataset with contexts surpassing 100K tokens. To bridge this gap, we introduce a novel post-training synthetic data generation strategy designed to efficiently extend the context window of LLMs while preserving their general task performance. Our approach scalably extends to arbitrarily long context lengths, unconstrained by the length of available real-world data, which effectively addresses the scarcity of raw long-context data. Through a step-by-step rotary position embedding (RoPE) scaling training strategy, we demonstrate that our model, with a context length of up to 1M tokens, performs well on the RULER benchmark and InfiniteBench and maintains robust performance on general language tasks.

IJCAI Conference 2025 Conference Paper

SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines

  • Meng Wan
  • Rongqiang Cao
  • Yanghao Li
  • Jue Wang
  • Zijian Wang
  • Qi Su
  • Lei Qiu
  • Peng Shi

Deep-learning-based lossless compression is of immense importance in real-world applications, such as cold data persistence, sensor data collection, and astronomical data transmission. However, existing compressors typically model data using single-byte symbols as tokens, which makes it hard to capture the inherent correlations and cannot effectively utilize the parallel capabilities of GPUs and multi-core CPUs. This paper proposes SEP, a novel lossless compression framework for most time-series backbone neural networks. We first introduce a semantic enhancement module to capture the complex intra-patch relationships of binary byte streams. To improve the compression speed, we design multi-stream pipelines that dynamically assign parallel tasks to GPU streams and multi-cores. We further propose a novel GPU memory optimization strategy, which reuses GPU memory via a shared pool across streams. We conduct experiments on seven real-world datasets and the results demonstrate that our SEP framework outperforms state-of-the-art compressors with an average speed improvement of 30.0% and an average compression ratio gain of 5.1%, which is further elevated to 7.6% with the use of pre-trained models. The GPU memory footprint is reduced by as much as 63.1% and by an average of 36.2%. The source code is available at: https://github.com/damonwan1/SEP.

ICML Conference 2024 Conference Paper

Soft Prompt Recovers Compressed LLMs, Transferably

  • Zhaozhuo Xu
  • Zirui Liu 0001
  • Beidi Chen
  • Shaochen (Henry) Zhong
  • Yuxin Tang
  • Jue Wang
  • Kaixiong Zhou
  • Xia Hu 0001

Model compression is one of the most popular approaches to improve the accessibility of Large Language Models (LLMs) by reducing their memory footprint. However, gaining such efficiency benefits often demands extensive engineering efforts and intricate designs to mitigate the performance decline. In this work, we leverage (Soft) Prompt Tuning in its most vanilla form and discover that such conventionally learned soft prompts can recover the performance of compressed LLMs. More surprisingly, we observe this recovery effect to be transferable among different tasks and models (albeit natural tokenizer and dimensionality limitations), resulting in further overhead reduction and yet, subverting the common belief that learned soft prompts are task-specific. Our work is fully orthogonal and compatible with model compression frameworks such as pruning and quantization, where we enable an up to $8\times$ compressed LLM (with a joint 4-bit quantization and 50% weight pruning compression) to match its uncompressed counterpart on popular benchmarks. We note that we are the first to reveal that vanilla Parameter-Efficient Fine-Tuning (PEFT) techniques have the potential to be utilized under a compression recovery context, opening a new line of opportunities for model accessibility advancement while freeing our fellow researchers from the previously present engineering burdens and constraints. The code is available at https://github.com/zirui-ray-liu/compress-then-prompt.
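
For readers unfamiliar with "vanilla" soft prompt tuning, the sketch below shows the mechanism the abstract refers to: the (compressed) backbone stays frozen and only a small learned prompt is prepended to the input embeddings. The toy backbone and dimensions are assumptions for illustration, not the paper's setup.

```python
# Minimal PyTorch sketch of vanilla soft prompt tuning on a frozen backbone.
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():
            p.requires_grad = False          # the compressed backbone is not updated
        # The only trainable parameters: the soft prompt embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.model(torch.cat([prompt, input_embeds], dim=1))

# Usage with a toy frozen backbone standing in for a pruned/quantized LLM.
backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
wrapper = SoftPromptWrapper(backbone, embed_dim=64, prompt_len=8)
out = wrapper(torch.randn(2, 10, 64))        # (2, 18, 64) after prepending the prompt
```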

NeurIPS Conference 2024 Conference Paper

UniIF: Unified Molecule Inverse Folding

  • Zhangyang Gao
  • Jue Wang
  • Cheng Tan
  • Lirong Wu
  • Yufei Huang
  • Siyuan Li
  • Zhirui Ye
  • Stan Z. Li

Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and material science. Although specialized models have been proposed for different small- or macro-molecules, few have attempted to unify the learning process, resulting in redundant efforts. Complementary to recent advancements in molecular structure prediction, such as RoseTTAFold All-Atom and AlphaFold3, we propose the unified model UniIF for the inverse folding of all molecules. We achieve this unification at two levels: 1) Data-Level: We propose a unified block graph data form for all molecules, including the local frame building and geometric feature initialization. 2) Model-Level: We introduce a geometric block attention network, comprising geometric interaction, interactive attention, and virtual long-term dependency modules, to capture the 3D interactions of all molecules. Through comprehensive evaluations across various tasks such as protein design, RNA design, and material design, we demonstrate that our proposed method surpasses state-of-the-art methods on all tasks. UniIF offers a versatile and effective solution for general molecule inverse folding.

NeurIPS Conference 2024 Conference Paper

Video Token Merging for Long Video Understanding

  • Seon-Ho Lee
  • Jue Wang
  • Zhikang Zhang
  • David Fan
  • Xinyu Li

As the scale of data and models for video understanding rapidly expands, handling long-form video input in transformer-based models presents a practical challenge. Rather than resorting to input sampling or token dropping, which may result in information loss, token merging shows promising results when used in collaboration with transformers. However, the application of token merging to long-form video processing is not trivial. We begin with the premise that token merging should not rely solely on the similarity of video tokens; the saliency of tokens should also be considered. To address this, we explore various video token merging strategies for long-form video classification, starting with a simple extension of image token merging, moving to region-concentrated merging, and finally proposing a learnable video token merging (VTM) algorithm that dynamically merges tokens based on their saliency. Extensive experimental results show that we achieve better or comparable performance on the LVU, COIN, and Breakfast datasets. Moreover, our approach significantly reduces memory costs by 84% and boosts throughput by approximately 6.89 times compared to baseline algorithms.
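
To make the "similarity plus saliency" idea concrete, the sketch below merges the least salient tokens into their most similar kept tokens. It is an illustrative scheme under simple assumptions (cosine similarity, norm-based saliency proxy), not the learnable VTM algorithm from the paper.

```python
# Illustrative saliency-aware token merging: keep the most salient tokens and
# fold every dropped token into its most similar kept token by averaging.
import torch

def saliency_aware_merge(tokens: torch.Tensor, saliency: torch.Tensor, keep: int):
    """tokens: (N, D), saliency: (N,); returns (keep, D) merged tokens."""
    order = torch.argsort(saliency, descending=True)
    kept_idx, drop_idx = order[:keep], order[keep:]
    kept, dropped = tokens[kept_idx], tokens[drop_idx]

    # Assign each dropped token to its most similar kept token (cosine similarity).
    sim = torch.nn.functional.normalize(dropped, dim=-1) @ \
          torch.nn.functional.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)                       # (N - keep,)

    merged = kept.clone()
    counts = torch.ones(keep)
    for i, j in enumerate(assign.tolist()):
        merged[j] += dropped[i]
        counts[j] += 1
    return merged / counts.unsqueeze(-1)              # average each merged group

# Example: 196 video tokens of dim 64 reduced to 49.
toks = torch.randn(196, 64)
sal = toks.norm(dim=-1)                               # a simple saliency proxy
out = saliency_aware_merge(toks, sal, keep=49)        # (49, 64)
```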

ICML Conference 2023 Conference Paper

CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks

  • Jue Wang
  • Yucheng Lu 0003
  • Binhang Yuan
  • Beidi Chen
  • Percy Liang
  • Christopher De Sa
  • Christopher Ré
  • Ce Zhang 0001

Distributed training of foundation models, especially large language models (LLMs), is communication-intensive and so has heavily relied on centralized data centers with fast interconnects. Can we train on slow networks and unlock the potential of decentralized infrastructure for foundation models? In this paper, we propose CocktailSGD, a novel communication-efficient training framework that combines three distinct compression techniques – random sparsification, top-K sparsification, and quantization – to achieve much greater compression than each individual technique alone. We justify the benefit of such a hybrid approach through a theoretical analysis of convergence. Empirically, we show that CocktailSGD achieves up to 117$\times$ compression in fine-tuning LLMs up to 20 billion parameters without hurting convergence. On a 500Mbps network, CocktailSGD only incurs $\sim$1.2$\times$ slowdown compared with data center networks.
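
The sketch below stacks the three compression primitives named in the abstract on a single gradient tensor, to show how their ratios multiply. The composition order, the absence of error feedback, and all hyperparameters are illustrative assumptions; CocktailSGD's actual procedure differs.

```python
# Illustrative composition: random sparsification -> top-K -> low-bit quantization.
import torch

def cocktail_compress(grad: torch.Tensor, random_keep=0.1, topk_keep=0.1, bits=4):
    flat = grad.flatten()

    # 1) Random sparsification: keep a random subset of coordinates.
    n_rand = max(1, int(flat.numel() * random_keep))
    rand_idx = torch.randperm(flat.numel())[:n_rand]
    rand_vals = flat[rand_idx]

    # 2) Top-K sparsification among the surviving coordinates.
    n_top = max(1, int(n_rand * topk_keep))
    _, top_pos = rand_vals.abs().topk(n_top)
    idx, vals = rand_idx[top_pos], rand_vals[top_pos]

    # 3) Uniform quantization of the kept values to `bits` bits.
    scale = vals.abs().max().clamp(min=1e-12)
    levels = 2 ** (bits - 1) - 1
    q = torch.round(vals / scale * levels).to(torch.int8)
    return idx, q, scale                       # sparse indices + low-precision values

def cocktail_decompress(idx, q, scale, shape, bits=4):
    levels = 2 ** (bits - 1) - 1
    out = torch.zeros(torch.Size(shape)).flatten()
    out[idx] = q.float() / levels * scale
    return out.reshape(shape)
```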

AAAI Conference 2023 Conference Paper

CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying

  • Weihuang Liu
  • Xiaodong Cun
  • Chi-Man Pun
  • Menghan Xia
  • Yong Zhang
  • Jue Wang

Image inpainting aims to fill the missing hole of the input. It is hard to solve this task efficiently when facing high-resolution images due to two reasons: (1) A large receptive field needs to be handled for high-resolution image inpainting. (2) The general encoder and decoder network synthesizes many background pixels synchronously due to the form of the image matrix. In this paper, we try to break the above limitations for the first time thanks to the recent development of continuous implicit representation. In detail, we down-sample and encode the degraded image to produce the spatial-adaptive parameters for each spatial patch via an attentional Fast Fourier Convolution (FFC)-based parameter generation network. Then, we take these parameters as the weights and biases of a series of multi-layer perceptrons (MLPs), where the input is the encoded continuous coordinates and the output is the synthesized color value. Thanks to the proposed structure, we only encode the high-resolution image at a relatively low resolution to capture a larger receptive field. Then, the continuous position encoding helps synthesize photo-realistic high-frequency textures by re-sampling the coordinates at a higher resolution. Also, our framework enables us to query only the coordinates of missing pixels, in parallel, yielding a more efficient solution than previous methods. Experiments show that the proposed method achieves real-time performance on 2048×2048 images using a single GTX 2080 Ti GPU and can handle 4096×4096 images, with much better performance than existing state-of-the-art methods visually and numerically. The code is available at: https://github.com/NiFangBaAGe/CoordFill.

ICML Conference 2023 Conference Paper

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

  • Zichang Liu
  • Jue Wang
  • Tri Dao
  • Tianyi Zhou 0002
  • Binhang Yuan
  • Zhao Song 0002
  • Anshumali Shrivastava
  • Ce Zhang 0001

Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo the LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware. We hypothesize that contextual sparsity, i.e., small, input-dependent sets of attention heads and MLP parameters that yield approximately the same output as the dense model for a given input, can address these issues. We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising the LLM's quality or in-context learning ability. Based on these insights, we propose DejaVu, a system that uses a low-cost algorithm to predict contextual sparsity on the fly given inputs to each layer, along with an asynchronous and hardware-aware implementation that speeds up LLM inference. We validate that DejaVu can reduce the inference latency of OPT-175B by over 2$\times$ compared to the state-of-the-art FasterTransformer, and over 6$\times$ compared to the widely used Hugging Face implementation, without compromising model quality. The code is available at https://github.com/FMInference/DejaVu.
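
The core of the "low-cost predictor" idea can be illustrated with a tiny module that scores attention heads from the layer input and masks out all but the top few. This is a schematic sketch under assumed shapes, not the DejaVu system's predictor or its asynchronous implementation.

```python
# Sketch: a small MLP predicts, per input, which attention heads to run.
import torch
import torch.nn as nn

class HeadSparsityPredictor(nn.Module):
    def __init__(self, hidden_dim: int, num_heads: int, keep_ratio: float = 0.3):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4), nn.ReLU(),
            nn.Linear(hidden_dim // 4, num_heads),
        )
        self.keep = max(1, int(num_heads * keep_ratio))

    def forward(self, layer_input: torch.Tensor) -> torch.Tensor:
        # layer_input: (batch, seq, hidden); score heads from the mean token state.
        scores = self.scorer(layer_input.mean(dim=1))      # (batch, num_heads)
        topk = scores.topk(self.keep, dim=-1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(1, topk, 1.0)                        # 1 = run this head
        return mask                                        # (batch, num_heads)

predictor = HeadSparsityPredictor(hidden_dim=512, num_heads=16)
mask = predictor(torch.randn(2, 128, 512))   # per-input head mask, ~30% of heads active
```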

AAAI Conference 2023 Conference Paper

Effective Continual Learning for Text Classification with Lightweight Snapshots

  • Jue Wang
  • Dajie Dong
  • Lidan Shou
  • Ke Chen
  • Gang Chen

Continual learning is known for suffering from catastrophic forgetting, a phenomenon where previously learned concepts are forgotten upon learning new tasks. A natural remedy is to use trained models for old tasks as ‘teachers’ to regularize the update of the current model to prevent such forgetting. However, this requires storing all past models, which is very space-consuming for large models, e.g. BERT, thus impractical in real-world applications. To tackle this issue, we propose to construct snapshots of seen tasks whose key knowledge is captured in lightweight adapters. During continual learning, we transfer knowledge from past snapshots to the current model through knowledge distillation, allowing the current model to review previously learned knowledge while learning new tasks. We also design representation recalibration to better handle the class-incremental setting. Experiments over various task sequences show that our approach effectively mitigates catastrophic forgetting and outperforms all baselines.

TMLR Journal 2023 Journal Article

Holistic Evaluation of Language Models

  • Percy Liang
  • Rishi Bommasani
  • Tony Lee
  • Dimitris Tsipras
  • Dilara Soylu
  • Michihiro Yasunaga
  • Yian Zhang
  • Deepak Narayanan

Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what’s missing or underrepresented (e.g. question answering for neglected English dialects, metrics for trustworthiness). Second, we adopt a multi-metric approach: We measure 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for each of 16 core scenarios to the extent possible (87.5% of the time), ensuring that metrics beyond accuracy don’t fall to the wayside, and that trade-offs across models and metrics are clearly exposed. We also perform 7 targeted evaluations, based on 26 targeted scenarios, to more deeply analyze specific aspects (e.g. knowledge, reasoning, memorization/copyright, disinformation). Third, we conduct a large-scale evaluation of 30 prominent language models (spanning open, limited-access, and closed models) on all 42 scenarios, including 21 scenarios that were not previously used in mainstream LM evaluation. Prior to HELM, models on average were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: now all 30 models have been densely benchmarked on a set of core scenarios and metrics under standardized conditions. Our evaluation surfaces 25 top-level findings concerning the interplay between different scenarios, metrics, and models. For full transparency, we release all raw model prompts and completions publicly for further analysis, as well as a general modular toolkit for easily adding new scenarios, models, metrics, and prompting strategies. We intend for HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.

AAAI Conference 2023 Conference Paper

InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting

  • Haizhou Cao
  • Zhenhao Huang
  • Tiechui Yao
  • Jue Wang
  • Hui He
  • Yangang Wang

Long-term time series forecasting (LTSF) provides substantial benefits for numerous real-world applications, while placing essential demands on model capacity to capture long-range dependencies. Recent Transformer-based models have significantly improved LTSF performance. It is worth noting that the Transformer with the self-attention mechanism was originally proposed to model language sequences whose tokens (i.e., words) are discrete and highly semantic. However, unlike language sequences, most time series are sequential and continuous numeric points. Time steps with temporal redundancy are weakly semantic, and leveraging only time-domain tokens makes it hard to depict the overall properties of time series (e.g., the overall trend and periodic variations). To address these problems, we propose a novel Transformer-based forecasting model named InParformer with an Interactive Parallel Attention (InPar Attention) mechanism. The InPar Attention is proposed to learn long-range dependencies comprehensively in both frequency and time domains. To improve its learning capacity and efficiency, we further design several mechanisms, including query selection, key-value pair compression, and recombination. Moreover, InParformer is constructed with evolutionary seasonal-trend decomposition modules to enhance intricate temporal pattern extraction. Extensive experiments on six real-world benchmarks show that InParformer outperforms the state-of-the-art forecasting Transformers.

NeurIPS Conference 2023 Conference Paper

Skill-it! A data-driven skills framework for understanding and training language models

  • Mayee Chen
  • Nicholas Roberts
  • Kush Bhatia
  • Jue Wang
  • Ce Zhang
  • Frederic Sala
  • Christopher Ré

The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic dataset in the continual pre-training setting, Skill-It obtains 37.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.
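
The flavor of online skill-weighted sampling can be conveyed with a multiplicative-weights-style update: skills the model still loses on get sampled more in the next round. This is an illustrative stand-in, not the exact Skill-It update rule, and the losses and learning rate below are hypothetical.

```python
# Illustrative online mixing over skills: higher validation loss -> higher sampling weight.
import numpy as np

def update_skill_weights(weights: np.ndarray, skill_losses: np.ndarray, eta: float = 1.0):
    """Multiplicative-weights-style update over skill sampling proportions."""
    new = weights * np.exp(eta * skill_losses)
    return new / new.sum()

weights = np.full(4, 0.25)                  # uniform start over 4 skills
losses = np.array([0.2, 0.9, 0.5, 0.1])     # hypothetical per-skill validation losses
for _ in range(3):
    weights = update_skill_weights(weights, losses)
print(weights.round(3))                     # sampling mass shifts toward the hardest skill
```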

AAAI Conference 2023 Conference Paper

Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos

  • Zixiao Wang
  • Junwu Weng
  • Chun Yuan
  • Jue Wang

Learning with noisy labels is a classic problem that has been extensively studied for image tasks, but much less so for video in the literature. A straightforward migration from images to videos without considering temporal semantics and computational cost is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method dubbed Channel Truncation for feature-based label noise detection. This method selects the most discriminative channels to split clean and noisy instances in each category. 2) A novel contrastive strategy dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms the existing baselines. By reducing the feature dimension to 10% of its original size, our method achieves a noise detection F1-score of over 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.

NeurIPS Conference 2022 Conference Paper

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

  • Shoufa Chen
  • Chongjian Ge
  • Zhan Tong
  • Jiangliu Wang
  • Yibing Song
  • Jue Wang
  • Ping Luo

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual domains. To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently. It possesses several benefits more appealing than prior arts. Firstly, AdaptFormer introduces lightweight modules that only add less than 2% extra parameters to a ViT, while it is able to increase the ViT's transferability without updating its original pre-trained parameters, significantly outperforming the existing 100\% fully fine-tuned models on action recognition benchmarks. Secondly, it can be plug-and-play in different Transformers and scalable to many visual tasks. Thirdly, extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains. For example, when updating just 1. 5% extra parameters, it achieves about 10% and 19% relative improvement compared to the fully fine-tuned models on Something-Something~v2 and HMDB51, respectively. Code is available at https: //github. com/ShoufaChen/AdaptFormer.
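
A lightweight adapter of this kind is typically a small bottleneck branch added alongside a frozen block, whose scaled output is summed back into the residual path. The sketch below is a minimal illustration in that spirit; the dimensions, scaling factor, and stand-in MLP block are assumptions, not AdaptFormer's exact design.

```python
# Minimal parallel bottleneck adapter around a frozen transformer block.
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    def __init__(self, frozen_block: nn.Module, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():
            p.requires_grad = False            # original ViT weights stay fixed
        self.down = nn.Linear(dim, bottleneck)  # trainable down-projection
        self.up = nn.Linear(bottleneck, dim)    # trainable up-projection
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        adapter_out = self.up(torch.relu(self.down(x))) * self.scale
        return self.block(x) + adapter_out     # parallel branch added to the frozen path

# Example with a frozen MLP standing in for a ViT feed-forward block.
mlp = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
layer = ParallelAdapter(mlp, dim=768, bottleneck=64)
y = layer(torch.randn(2, 197, 768))            # only ~0.1M adapter params are trainable
```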

NeurIPS Conference 2022 Conference Paper

Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

  • Zeyu Qin
  • Yanbo Fan
  • Yi Liu
  • Li Shen
  • Yong Zhang
  • Jue Wang
  • Baoyuan Wu

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which can produce erroneous predictions by injecting imperceptible perturbations. In this work, we study the transferability of adversarial examples, which is significant due to its threat to real-world applications where model architecture or parameters are usually unknown. Many existing works reveal that adversarial examples are likely to overfit the surrogate model that they are generated from, limiting their transfer attack performance against different target models. To mitigate the overfitting of the surrogate model, we propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss of a single adversarial point, we advocate seeking adversarial examples located in a region with uniformly low loss values, by injecting the worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure. The adversarial attack with RAP is formulated as a min-max bi-level optimization problem. By integrating RAP into the iterative process for attacks, our method can find more stable adversarial examples that are less sensitive to changes of the decision boundary, mitigating the overfitting of the surrogate model. Comprehensive experimental comparisons demonstrate that RAP can significantly boost adversarial transferability. Furthermore, RAP can be naturally combined with many existing black-box attack techniques to further boost the transferability. When attacking a real-world image recognition system, Google Cloud Vision API, we obtain a 22% performance improvement of targeted attacks over the compared method. Our codes are available at https://github.com/SCLBD/Transfer_attack_RAP.
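
The min-max structure described here can be sketched as: at each attack iteration, first take a small gradient-ascent step (the reverse perturbation) to the locally worst point, then take the attack's descent step using the gradient at that point. The code below is a schematic single-step illustration under assumed step sizes; it is not the authors' full attack.

```python
# Schematic RAP-style attack step. `loss_fn` is the attack objective the
# attacker wants to MINIMIZE (e.g., cross-entropy to a target class).
import torch

def rap_attack_step(model, x_adv, y, loss_fn, eps_rap=0.01, alpha=0.004, eps=0.03, x_clean=None):
    # Inner maximization: one ascent step gives the reverse adversarial perturbation.
    x_adv = x_adv.detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_inner = (x_adv + eps_rap * x_adv.grad.sign()).detach()

    # Outer minimization: take the attack step using the gradient at the worst-case point.
    x_inner.requires_grad_(True)
    loss_fn(model(x_inner), y).backward()
    x_next = x_adv.detach() - alpha * x_inner.grad.sign()

    if x_clean is not None:                     # project back into the epsilon ball
        x_next = x_clean + (x_next - x_clean).clamp(-eps, eps)
    return x_next.clamp(0, 1).detach()
```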

IJCAI Conference 2022 Conference Paper

Continual Federated Learning Based on Knowledge Distillation

  • Yuhang Ma
  • Zhongle Xie
  • Jue Wang
  • Ke Chen
  • Lidan Shou

Federated learning (FL) is a promising approach for learning a shared global model on decentralized data owned by multiple clients without exposing their privacy. In real-world scenarios, data accumulated at the client-side varies in distribution over time. As a consequence, the global model tends to forget the knowledge obtained from previous tasks while learning new tasks, showing signs of "catastrophic forgetting". Previous studies in centralized learning use techniques such as data replay and parameter regularization to mitigate catastrophic forgetting. Unfortunately, these techniques cannot adequately solve the non-trivial problem in FL. We propose Continual Federated Learning with Distillation (CFeD) to address catastrophic forgetting under FL. CFeD performs knowledge distillation on both the clients and the server, with each party independently having an unlabeled surrogate dataset, to mitigate forgetting. Moreover, CFeD assigns different learning objectives, namely learning the new task and reviewing old tasks, to different clients, aiming to improve the learning ability of the model. The results show that our method performs well in mitigating catastrophic forgetting and achieves a good trade-off between the two objectives.

NeurIPS Conference 2022 Conference Paper

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees

  • Jue Wang
  • Binhang Yuan
  • Luka Rimanic
  • Yongjun He
  • Tri Dao
  • Beidi Chen
  • Christopher Ré
  • Ce Zhang

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style training, compressing the activations for models trained with pipeline parallelism is still an open problem. In this paper, we propose AQ-SGD, a novel activation compression algorithm for communication-efficient pipeline parallelism training over slow networks. Different from previous efforts in activation compression, instead of compressing activation values directly, AQ-SGD compresses the changes of the activations. This allows us to show, to the best of our knowledge for the first time, that one can still achieve $O(1/\sqrt{T})$ convergence rate for non-convex objectives under activation compression, without making assumptions on gradient unbiasedness that do not hold for deep learning models with non-linear activation functions. We then show that AQ-SGD can be optimized and implemented efficiently, without additional end-to-end runtime overhead. We evaluated AQ-SGD to fine-tune language models with up to 1.5 billion parameters, compressing activations to 2-4 bits. AQ-SGD provides up to $4.3\times$ end-to-end speed-up in slower networks, without sacrificing model quality. Moreover, we also show that AQ-SGD can be combined with state-of-the-art gradient compression algorithms to enable end-to-end communication compression: all communications between machines, including model gradients, forward activations, and backward gradients, are compressed into lower precision. This provides up to $4.9\times$ end-to-end speed-up, without sacrificing model quality.
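
The key mechanism stated in the abstract, quantizing the change in activations rather than the activations themselves, can be sketched as a simple delta codec: both sides keep a running reference, only the quantized delta crosses the network, and the reference is advanced by the same quantized delta so sender and receiver stay in sync. The uniform quantizer and bookkeeping below are simplified assumptions, not AQ-SGD's implementation.

```python
# Sketch: communicate quantized activation deltas against a shared reference.
import torch

def quantize_uniform(x: torch.Tensor, bits: int = 4):
    scale = x.abs().max().clamp(min=1e-12)
    levels = 2 ** (bits - 1) - 1
    q = torch.round(x / scale * levels)
    return q / levels * scale                     # dequantized low-precision tensor

class ActivationDeltaCodec:
    def __init__(self):
        self.reference = None                     # last communicated activation estimate

    def encode(self, activation: torch.Tensor, bits: int = 4) -> torch.Tensor:
        if self.reference is None:
            self.reference = torch.zeros_like(activation)
        delta_q = quantize_uniform(activation - self.reference, bits)
        self.reference = self.reference + delta_q  # same update applied on the receiver
        return delta_q                             # only this crosses the slow network

codec = ActivationDeltaCodec()
for step in range(3):
    act = torch.randn(4, 16)
    sent = codec.encode(act, bits=4)
```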

NeurIPS Conference 2022 Conference Paper

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

  • Yiming Zhu
  • Hongyu Liu
  • Yibing Song
  • Ziyang Yuan
  • Xintong Han
  • Chun Yuan
  • Qifeng Chen
  • Jue Wang

Free-form text prompts allow users to describe their intentions during image manipulation conveniently. Based on the visual latent space of StyleGAN [21] and the text embedding space of CLIP [34], studies focus on how to map these two latent spaces for text-driven attribute manipulations. Currently, the latent mapping between these two spaces is empirically designed, which constrains each manipulation model to handle only one fixed text prompt. In this paper, we propose a method named Free-Form CLIP (FFCLIP), aiming to establish an automatic latent mapping so that one manipulation model handles free-form text prompts. Our FFCLIP has a cross-modality semantic modulation module containing semantic alignment and injection. The semantic alignment performs the automatic latent mapping via linear transformations with a cross-attention mechanism. After alignment, we inject semantics from text prompt embeddings into the StyleGAN latent space. For one type of image (e.g., 'human portrait'), one FFCLIP model can be learned to handle free-form text prompts. Meanwhile, we observe that although each training text prompt only contains a single semantic meaning, FFCLIP can leverage text prompts with multiple semantic meanings for image manipulation. In the experiments, we evaluate FFCLIP on three types of images (i.e., 'human portraits', 'cars', and 'churches'). Both visual and numerical results show that FFCLIP effectively produces semantically accurate and visually realistic images. Project page: https://github.com/KumapowerLIU/FFCLIP.

NeurIPS Conference 2022 Conference Paper

OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training

  • Liang Chen
  • Yong Zhang
  • Yibing Song
  • Jue Wang
  • Lingqiao Liu

State-of-the-art deepfake detectors perform well in identifying forgeries when they are evaluated on a test set similar to the training set, but struggle to maintain good performance when the test forgeries exhibit different characteristics from the training images, e.g., forgeries created by unseen deepfake methods. Such a weak generalization capability hinders the applicability of deepfake detectors. In this paper, we introduce a new learning paradigm specially designed for the generalizable deepfake detection task. Our key idea is to construct a test-sample-specific auxiliary task to update the model before applying it to the sample. Specifically, we synthesize pseudo-training samples from each test image and create a test-time training objective to update the model. Moreover, we propose to leverage meta-learning to ensure that a fast single-step test-time gradient descent, dubbed one-shot test-time training (OST), can be sufficient for good deepfake detection performance. Extensive results across several benchmark datasets demonstrate that our approach performs favorably against existing arts in terms of generalization to unseen data and robustness to different post-processing steps.

NeurIPS Conference 2022 Conference Paper

Stability Analysis and Generalization Bounds of Adversarial Training

  • Jiancong Xiao
  • Yanbo Fan
  • Ruoyu Sun
  • Jue Wang
  • Zhi-Quan Luo

In adversarial machine learning, deep neural networks can fit the adversarial examples on the training dataset but have poor generalization ability on the test set. This phenomenon is called robust overfitting, and it can be observed when adversarially training neural nets on common datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet. In this paper, we study the robust overfitting issue of adversarial training by using tools from uniform stability. One major challenge is that the outer function (as a maximization of the inner function) is nonsmooth, so the standard technique (e.g., Hardt et al., 2016) cannot be applied. Our approach is to consider $\eta$-approximate smoothness: we show that the outer function satisfies this modified smoothness assumption with $\eta$ being a constant related to the adversarial perturbation $\epsilon$. Based on this, we derive stability-based generalization bounds for stochastic gradient descent (SGD) on the general class of $\eta$-approximate smooth functions, which covers the adversarial loss. Our results suggest that robust test accuracy decreases in $\epsilon$ when $T$ is large, with a speed between $\Omega(\epsilon\sqrt{T})$ and $\mathcal{O}(\epsilon T)$. This phenomenon is also observed in practice. Additionally, we show that a few popular techniques for adversarial training (e.g., early stopping, cyclic learning rate, and stochastic weight averaging) are stability-promoting in theory.

NeurIPS Conference 2022 Conference Paper

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

  • Zhan Tong
  • Yibing Song
  • Jue Wang
  • Limin Wang

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This simple design makes video reconstruction a more challenging and meaningful self-supervision task, thus encouraging extracting more effective video representations during the pre-training process. We obtain three important findings with VideoMAE: (1) An extremely high masking ratio (i.e., 90% to 95%) still yields favorable performance for VideoMAE. The temporally redundant video content enables a higher masking ratio than that of images. (2) VideoMAE achieves impressive results on very small datasets (i.e., around 3k-4k videos) without using any extra data. This is partially ascribed to the challenging task of video reconstruction enforcing high-level structure learning. (3) VideoMAE shows that data quality is more important than data quantity for SSVP. Domain shift between pre-training and target datasets is an important factor. Notably, our VideoMAE with the vanilla ViT backbone can achieve 87.4% on Kinetics-400, 75.4% on Something-Something V2, 91.3% on UCF101, and 62.6% on HMDB51, without using any extra data. Code is available at https://github.com/MCG-NJU/VideoMAE.
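
Tube masking, as described here, samples one spatial patch mask at a very high ratio and repeats it across all frames, so a masked patch stays masked for the whole clip. The sketch below shows just that mask construction; the shapes are assumptions for illustration.

```python
# Minimal tube-masking sketch: one spatial mask, repeated along time.
import numpy as np

def tube_mask(num_frames: int, patches_per_frame: int, mask_ratio: float = 0.9, seed=None):
    rng = np.random.default_rng(seed)
    num_masked = int(patches_per_frame * mask_ratio)
    spatial = np.zeros(patches_per_frame, dtype=bool)
    spatial[rng.choice(patches_per_frame, num_masked, replace=False)] = True
    # Repeat the same spatial mask along time: a "tube" of masked patches.
    return np.tile(spatial, (num_frames, 1))        # (T, N), True = masked

mask = tube_mask(num_frames=16, patches_per_frame=196, mask_ratio=0.9)
print(mask.shape, mask.mean())                      # (16, 196), ~0.9 of patches masked
```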

AAAI Conference 2021 Conference Paper

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

  • Jue Wang
  • Ke Chen
  • Lidan Shou
  • Sai Wu
  • Gang Chen

Slot filling is a challenging task in Spoken Language Understanding (SLU). Supervised methods usually require large amounts of annotation to maintain desirable performance. A solution to relieve the heavy dependency on labeled data is to employ bootstrapping, which leverages unlabeled data. However, bootstrapping is known to suffer from semantic drift. We argue that semantic drift can be tackled by exploiting the correlation between slot values (phrases) and their respective types. By using some particular weakly-labeled data, namely the plain phrases included in sentences, we propose a weakly-supervised slot filling approach. Our approach trains two models, namely a classifier and a tagger, which can effectively learn from each other on the weakly-labeled data. The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting.

NeurIPS Conference 2021 Conference Paper

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

  • Chongjian Ge
  • Youwei Liang
  • Yibing Song
  • Jianbo Jiao
  • Jue Wang
  • Ping Luo

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor. T-stream contains two transformers, two projectors, and a predictor. T-stream connects to the CNN encoders and runs in parallel with the remaining C-stream. During training, we perform SSL in both streams simultaneously and use the T-stream output to supervise C-stream. The features from the CNN encoders are modulated in T-stream for visual attention enhancement and become suitable for the SSL scenario. We use these modulated features to supervise C-stream for learning attentive CNN encoders. To this end, we revitalize CNN attention by using transformers as guidance. Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to state-of-the-art performance.

AAAI Conference 2020 Conference Paper

When AWGN-Based Denoiser Meets Real Noises

  • Yuqian Zhou
  • Jianbo Jiao
  • Haibin Huang
  • Yang Wang
  • Jue Wang
  • Honghui Shi
  • Thomas Huang

Discriminative learning based image denoisers have achieved promising performance on synthetic noises such as Additive White Gaussian Noise (AWGN). The synthetic noises adopted in most previous work are pixel-independent, but real noises are mostly spatially/channel-correlated and spatially/channel-variant. This domain gap yields unsatisfactory performance on images with real noises if the model is only trained with AWGN. In this paper, we propose a novel approach to boost the performance of a real image denoiser which is trained only with synthetic pixel-independent noise data dominated by AWGN. First, we train a deep model that consists of a noise estimator and a denoiser with mixed AWGN and Random Value Impulse Noise (RVIN). We then investigate a Pixel-shuffle Down-sampling (PD) strategy to adapt the trained model to real noises. Extensive experiments demonstrate the effectiveness and generalization of the proposed approach. Notably, our method achieves state-of-the-art performance on real sRGB images in the DND benchmark among models trained with synthetic noises. Codes are available at https://github.com/yzhouas/PD-Denoising-pytorch.

AAAI Conference 2019 Short Paper

Adaptation Strategies for Applying AWGN-Based Denoiser to Realistic Noise

  • Yuqian Zhou
  • Jianbo Jiao
  • Haibin Huang
  • Jue Wang
  • Thomas Huang

Discriminative learning based denoising models trained with Additive White Gaussian Noise (AWGN) perform well on synthesized noise. However, realistic noise can be spatial-variant, signal-dependent, and a mixture of complicated noises. In this paper, we explore multiple strategies for applying an AWGN-based denoiser to realistic noise. Specifically, we trained a deep network integrating noise estimation and denoising with mixed Gaussian (AWGN) and Random Value Impulse Noise (RVIN). To adapt the model to realistic noises, we investigated multi-channel, multi-scale, and super-resolution approaches. Our preliminary results demonstrate the effectiveness of the newly-proposed noise model and adaptation strategies.

AAAI Conference 2019 Conference Paper

Video Inpainting by Jointly Learning Temporal Structure and Spatial Details

  • Chuan Wang
  • Haibin Huang
  • Xiaoguang Han
  • Jue Wang

We present a new data-driven video inpainting method for recovering missing regions of video frames. A novel deep learning architecture is proposed which contains two sub-networks: a temporal structure inference network and a spatial detail recovering network. The temporal structure inference network is built upon a 3D fully convolutional architecture: it only learns to complete a low-resolution video volume given the expensive computational cost of 3D convolution. The low-resolution result provides temporal guidance to the spatial detail recovering network, which performs image-based inpainting with a 2D fully convolutional network to produce recovered video frames in their original resolution. Such a two-step network design ensures both the spatial quality of each frame and the temporal coherence across frames. Our method jointly trains both sub-networks in an end-to-end manner. We provide qualitative and quantitative evaluation on three datasets, demonstrating that our method outperforms previous learning-based video inpainting methods.

NeurIPS Conference 2012 Conference Paper

Unsupervised Template Learning for Fine-Grained Object Recognition

  • Shulin Yang
  • Liefeng Bo
  • Jue Wang
  • Linda Shapiro

Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of birds, animals or plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape or structure shared within a category, and the differences are in the details of the object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for this purpose, which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the state-of-the-art algorithms.

NeurIPS Conference 2010 Conference Paper

Avoiding False Positive in Multi-Instance Learning

  • Yanjun Han
  • Qing Tao
  • Jue Wang

In multi-instance learning, there are two kinds of prediction failure, i.e., false negatives and false positives. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.

IS Journal 2008 Journal Article

AI in China: A Survey

  • Xiao-Shan Gao
  • Dan-tong Ouyang
  • Ji-gui Sun
  • San-jiang Li
  • Tian-shun Yao
  • Ru-zhan Lu
  • Chun-yi Shi
  • Zhan-gang Han

This article consists of nine short essays discussing research pursued by AI researchers in China and their perspectives on research in several AI subareas. The article first introduces the mechanization of mathematics, an area in which Chinese scientists have made significant contributions. It then discusses research in automated reasoning, temporal and spatial knowledge representation and reasoning, natural language understanding, intelligent diagnosis, multiagent systems, computational intelligence, large-scale knowledge processing, and several research streams integrating AI techniques with methods from other fields. Finally, the article makes suggestions concerning future AI research in China.

AAAI Conference 2008 Conference Paper

Hybrid Markov Logic Networks

  • Jue Wang

Markov logic networks (MLNs) combine first-order logic and Markov networks, allowing us to handle the complexity and uncertainty of real-world problems in a single consistent framework. However, in MLNs all variables and features are discrete, while most real-world applications also contain continuous ones. In this paper we introduce hybrid MLNs, in which continuous properties (e.g., the distance between two objects) and functions over them can appear as features. Hybrid MLNs have all distributions in the exponential family as special cases (e.g., multivariate Gaussians), and allow much more compact modeling of non-i.i.d. data than propositional representations like hybrid Bayesian networks. We also introduce inference algorithms for hybrid MLNs, by extending the MaxWalkSAT and MC-SAT algorithms to continuous domains. Experiments in a mobile robot mapping domain—involving joint classification, clustering and regression—illustrate the power of hybrid MLNs as a modeling language, and the accuracy and efficiency of the inference algorithms.

IS Journal 2008 Journal Article

Machine Learning: The State of the Art

  • Jue Wang
  • Qing Tao

The two fundamental problems in machine learning (ML) are statistical analysis and algorithm design. The former tells us the principles of the mathematical models that we establish from the observation data. The latter defines the conditions on which implementation of data models and data sets rely. A newly discovered challenge to ML is the Rashomon effect, which means that data are possibly generated from a mixture of heterogeneous sources. A simple classification standard can shed light on emerging forms of ML. This article is part of a special issue on AI in China.

IROS Conference 2005 Conference Paper

Designing robots for long-term social interaction

  • Rachel Gockley
  • Allison Bruce
  • Jodi Forlizzi
  • Marek P. Michalowski
  • Anne Mundell
  • Stephanie Rosenthal
  • Brennan Sellner
  • Reid G. Simmons

Valerie the roboceptionist is the most recent addition to Carnegie Mellon's social robots project. A permanent installation in the entranceway to Newell-Simon hall, the robot combines useful functionality - giving directions, looking up weather forecasts, etc. - with an interesting and compelling character. We are using Valerie to investigate human-robot social interaction, especially long-term human-robot "relationships". Over a nine-month period, we have found that many visitors continue to interact with the robot on a daily basis, but that few of the individual interactions last for more than 30 seconds. Our analysis of the data has indicated several design decisions that should facilitate more natural human-robot interactions.

IS Journal 2005 Journal Article

Rule + Exception Strategies for Security Information Analysis

  • Yiyu Yao
  • Fei-Yue Wang
  • Jue Wang
  • Daniel Zeng

Broadly defined, intelligence and security informatics is "the study of the use and development of advanced information technologies, systems, algorithms, and databases for national- and homeland-security-related applications". Processing security-related information is a critical component of ISI research, which involves studying a wide range of technical and systems challenges related to the acquisition, collection, storage, retrieval, synthesis, analysis, visualization, presentation, and understanding of security-related information. Our research aims to develop a unified data description and understanding framework to enable discovery of useful knowledge and events from data sets related to international, homeland, or other types of security. In particular, this article focuses on a common security information analysis task: how to develop an efficient knowledge representation framework and related automated learning and mining mechanisms to describe and identify abnormal situations or behavior. We advocate the use of a specific knowledge representation and data mining framework based on rules and exceptions for analysis of security-related information. In this rule+exception framework, normal and abnormal situations or behaviors occur as pairs of dual entities: rules succinctly summarize normal situations, and exceptions characterize abnormal situations. The rule+exception approach, which closely resembles how humans understand, organize, and use knowledge, has the potential to evolve into a unified, multilevel data description and understanding framework applicable across many security informatics applications.