Arrow Research search

Author name cluster

Yue Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers

42

AAAI Conference 2026 Conference Paper

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

  • Shuo Yang
  • Qihui Zhang
  • Yuyang Liu
  • Yue Huang
  • Xiaojun Jia
  • Kun-Peng Ning
  • Jia-Yu Yao
  • Jigang Wang

Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction—defined by weight differences between aligned (safe) and unaligned models—rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve it, revealing the parameter space as a "narrow safety basin". To address this, we propose AsFT (Anchoring Safety in Fine-Tuning) to maintain safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, AsFT effectively constrains the model within the "narrow safety basin," thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that AsFT reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks.
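The anchoring idea in this abstract — penalizing the component of a weight update that is orthogonal to the alignment direction — can be sketched as follows. This is an illustrative reconstruction from the abstract only, not the paper's actual implementation; the function and parameter names are hypothetical:

```python
import numpy as np

def asft_orthogonal_penalty(update, w_aligned, w_unaligned, lam=0.1):
    """Illustrative penalty: project a weight update onto the alignment
    direction (aligned-minus-unaligned weights) and penalize the
    orthogonal residual, which the abstract says leaves the 'narrow
    safety basin'."""
    d = (w_aligned - w_unaligned).ravel()
    d = d / np.linalg.norm(d)          # unit alignment direction
    u = update.ravel()
    parallel = np.dot(u, d) * d        # component along the alignment direction
    orthogonal = u - parallel          # component leaving the safety basin
    return lam * np.dot(orthogonal, orthogonal)
```

An update parallel to the alignment direction incurs zero penalty; a purely orthogonal update is penalized in proportion to its squared norm.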

AAAI Conference 2026 Conference Paper

Better Datasets Start from RefineLab: Automatic Optimization for High-Quality Dataset Refinement

  • Xiaonan Luo
  • Yue Huang
  • Ping He
  • Xiangliang Zhang

High-quality Question–Answer (QA) datasets are foundational for reliable Large Language Model (LLM) evaluation, yet even expert-crafted datasets exhibit persistent gaps in domain coverage, misaligned difficulty distributions, and factual inconsistencies. The recent surge in generative-model-powered datasets has compounded these quality challenges. In this work, we introduce RefineLab, the first LLM-driven framework that automatically refines raw QA textual data into high-quality datasets under a controllable token-budget constraint. RefineLab takes as input the original dataset, a specified set of target quality dimensions, and a token budget, and performs selective edits within that budget to ensure practicality and efficiency. In essence, RefineLab addresses a constrained optimization problem: improving the quality of QA samples as much as possible while respecting resource limitations. Given a set of available refinement operations, an assignment module determines which operations to apply to each QA sample, selecting refinement strategies that maximize overall dataset quality while adhering to the budget constraint. Experiments demonstrate that RefineLab consistently narrows divergence from expert datasets across coverage, difficulty alignment, factual fidelity, and distractor quality. RefineLab pioneers a scalable, customizable path to reproducible dataset design, with broad implications for LLM evaluation.


AAAI Conference 2026 Conference Paper

Decompose and Attribute: Boosting Generalizable Open-Set Object Detection via Objectness Score

  • Yuxuan Yuan
  • Lichen Wei
  • Luyao Tang
  • Chaoqi Chen
  • Zheyuan Cai
  • Yue Huang
  • Xinghao Ding

Open-set object detection (OSOD) aims to recognize known object categories while localizing previously unseen instances. However, real-world scenarios often involve co-occurring domain shifts and novel object categories. Existing OSOD methods typically overlook domain shifts, relying on source-trained representations that entangle domain-specific style with semantic content, thereby hindering generalization to both unseen domains and novel categories. To address this challenge, we propose a unified framework, termed DecOmpose and ATtribute (DOAT), which disentangles domain-specific style from semantic structure, thereby facilitating generalizable object detection. DOAT employs wavelet-based feature decomposition to separate style information from high-frequency structural details, thus enabling an explicit separation of domain and category shifts. To account for domain shift, the low-frequency components are perturbed within a style subspace to simulate diverse domain appearances. For unknown object discovery, the high-frequency components are utilized to estimate objectness scores via an attribution mechanism that fuses wavelet energy with semantic distance to known-category prototypes. Extensive experiments on standard open-set benchmarks have demonstrated the superior generalization performance of DOAT.

AAAI Conference 2026 Conference Paper

FCMO: A Flow-Curv Mamba Operator for Large-Scale 3D Vehicle Aerodynamics

  • Yuchen Xie
  • Yufeng Xie
  • Hanyu He
  • Yue Huang
  • Lijuan Sun
  • Hengyi Ren

Large-scale three-dimensional vehicle aerodynamics prediction poses critical computational challenges in modern automotive design, where traditional CFD methods require prohibitive simulation times that conflict with rapid design-iteration demands. While recent neural operator approaches show promise, existing methods struggle with computational complexity on dense meshes and fail to preserve essential topological information when processing large-scale point clouds. We propose FCMO, a physics-aware neural operator that integrates fluid-mechanics principles with selective state-space modeling for efficient large-scale vehicle aerodynamics. FCMO introduces four synergistic components: (1) FlowCurv Anchor Sampling, which selects mesh nodes based on normalized local curvature and windward sensitivity; (2) dual-scale physics-aware position encoding with adaptive k-NN construction, which transforms 3D irregular meshes into causality-preserving sequences through feature-guided serpentine scanning; (3) a flow-aware Mamba processor whose selective mechanisms dynamically modulate state transitions based on wall distance and flow characteristics; and (4) a physics-constrained decoder that enforces conservation laws through mixed weighted interpolation. Extensive experiments on the Ahmed-Body and DrivAerNet benchmarks demonstrate that FCMO achieves consistent state-of-the-art performance, with a 5.2% improvement in surface-pressure prediction, a 9.3% enhancement in wall-shear-stress estimation, and an 11.4% boost in drag-coefficient accuracy, while maintaining superior computational efficiency with 9.4% fewer FLOPs and 9.9% lower memory usage than existing methods.

AAAI Conference 2026 Conference Paper

RMO: Towards Better LLM Alignment via Reshaping Reward Margin Distributions

  • Yanchi Ru
  • Yue Huang
  • Xiangliang Zhang

Large Language Models (LLMs) have achieved remarkable success in instruction-following and dialogue tasks, yet aligning them with human preferences remains a critical challenge. Recent advances such as Direct Preference Optimization (DPO) simplify the alignment pipeline by bypassing explicit reward modeling, but they often suffer from suboptimal reward margin distributions, leading to weak supervision signals and reduced discriminative capacity. In this work, we propose Reward Margin Optimization (RMO), a framework that reshapes reward margin distributions during training to improve alignment performance. RMO comprises three components: (1) a Dual Denoising Filtering strategy that filters ambiguous and noisy preference pairs based on reward margin dynamics; (2) Batch Margin Diversification, which maximizes intra-batch margin variance to enhance learning signal diversity; and (3) Pairwise Margin Amplification, an auxiliary regularization term that encourages larger margins between preferred and dispreferred responses. Extensive experiments on multiple LLMs and datasets demonstrate that RMO consistently improves win rates over strong baselines such as DPO and SimPO, while remaining compatible with various preference-based optimization methods. Our results highlight the critical role of reward margin distribution in preference alignment and establish RMO as an effective and scalable enhancement to existing alignment techniques.
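For context, the reward margin that RMO reshapes can be illustrated with the standard DPO implicit reward. This is a sketch based on the abstract's description only; the paper's exact formulation, thresholds, and function names are assumptions:

```python
import numpy as np

def dpo_margins(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Implicit DPO reward margin per preference pair: beta times the
    difference of policy-vs-reference log-ratios for the preferred (w)
    and dispreferred (l) responses."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

def filter_ambiguous(margins, low, high):
    """Illustrative dual-threshold filter in the spirit of Dual Denoising
    Filtering: drop pairs whose margin is near zero (ambiguous) or
    extreme (likely noisy). `low` and `high` are hypothetical
    hyperparameters."""
    return (np.abs(margins) > low) & (np.abs(margins) < high)
```

Under this view, a margin near zero gives a weak supervision signal, which motivates reshaping the margin distribution during training.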

AAAI Conference 2026 Conference Paper

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization

  • Yue Huang
  • Xiangqi Wang
  • Xiangliang Zhang

In high-stakes scenarios—such as self-harm, legal, or medical queries—LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new alignment paradigm that enforces a strict “trustworthy-before-helpful” ordering: optimization of helpfulness is conditioned on first meeting trustworthy thresholds (e.g., harmlessness or honesty). To realize this, we introduce Self-Priority Alignment (SPA)—a fully unsupervised framework that generates diverse responses, self-evaluates and refines them using the model itself, and applies dual-criterion denoising to remove inconsistency and control variance. From this, SPA constructs lexicographically ordered preference pairs and fine-tunes the model using an uncertainty-weighted alignment loss that emphasizes high-confidence, high-gap decisions. Experiments across multiple benchmarks show that SPA improves helpfulness without compromising safety, outperforming strong baselines while preserving general capabilities. Our results demonstrate that SPA provides a scalable and interpretable alignment strategy for critical LLM applications.

AAAI Conference 2026 Conference Paper

TASE: Token Awareness and Structured Evaluation for Multilingual Language Models

  • Chenzhuo Zhao
  • Xinda Wang
  • Yue Huang
  • Junting Lu
  • Ziqian Liu

While large language models (LLMs) have demonstrated remarkable performance on high-level semantic tasks, they often struggle with fine-grained, token-level understanding and structural reasoning—capabilities that are essential for applications requiring precision and control. We introduce TASE, a comprehensive benchmark designed to evaluate LLMs' ability to perceive and reason about token-level information across languages. TASE covers 10 tasks under two core categories: token awareness and structural understanding, spanning Chinese, English, and Korean, with a 35,927-instance evaluation set and a scalable synthetic data generation pipeline for training. Tasks include character counting, token alignment, syntactic structure parsing, and length constraint satisfaction. We evaluate over 30 leading commercial and open-source LLMs, including O3, Claude 4, Gemini 2.5 Pro, and DeepSeek-R1, and train a custom Qwen2.5-14B model using the GRPO training method. Results show that human performance significantly outpaces current LLMs, revealing persistent weaknesses in token-level reasoning. TASE sheds light on these limitations and provides a new diagnostic lens for future improvements in low-level language understanding and cross-lingual generalization.

ECAI Conference 2025 Conference Paper

A Style-Aware Polytomous Diagnostic Model for Individual Traits

  • Yixuan Wang
  • Jiale Feng
  • Yue Huang
  • Xuruo Pan
  • Zhongjing Huang
  • Zhi Liu
  • Hong Qian

Diagnostic models aim to precisely infer individuals’ cognitive or non-cognitive competencies from their response logs, such as mathematical or social-emotional skills. While deep learning shows success in cognitive diagnosis, it remains underexplored in the equally important area of non-cognitive trait diagnosis. Accurate non-cognitive trait estimation is critical for individuals’ development. Unlike cognitive assessments using right-or-wrong responses, non-cognitive trait assessments typically use subjective Likert-scale items with ordinal polytomous options to reflect latent trait levels. Furthermore, individual response styles, such as tendencies toward higher or lower options, introduce bias in trait inference, yielding estimates that deviate from true trait levels. Thus, preserving the options’ ordinal semantic structure and mitigating response-style bias in trait estimation are two major challenges for accurate trait diagnosis. To address these issues, this paper proposes a Style-Aware Polytomous Diagnosis (SAPD) model. Specifically, to capture the ordinal semantics of response options, SAPD constructs an Ordinal Option Graph (OOG) that explicitly encodes the ordinal relationship among polytomous options, where higher options reflect higher latent trait levels. To mitigate the bias caused by individual response styles, we first design a Style-Aware Relational Graph (SARG), a heterogeneous graph that integrates multiple interactions among participants, items, options, and traits, implicitly embedding response-style information within node representations. We then propose a Response Style Corrector (RSC) that explicitly captures individual response tendencies and disentangles response-style bias during trait diagnosis, allowing for dynamic and adaptive correction of trait levels. Extensive experiments on five real-world datasets show that SAPD improves accuracy by an average of 4% over competitive methods. Visualizations confirm SAPD effectively disentangles response-style effects, leading to more accurate and interpretable trait diagnosis.

AAAI Conference 2025 Conference Paper

Accelerated Diffusion via High-Low Frequency Decomposition for Pan-Sharpening

  • Ge Meng
  • Jingjia Huang
  • Jingyan Tu
  • Yingying Wang
  • Yunlong Lin
  • Xiaotong Tu
  • Yue Huang
  • Xinghao Ding

Pan-sharpening aims to preserve the spectral information of the multi-spectral (MS) image while leveraging the high-frequency details from the guided high-resolution panchromatic (PAN) image to enhance its spatial resolution. The key challenge is how to preserve the spectral information from the MS image and the spatial details from the PAN image as much as possible. Diffusion models have achieved favorable results in image restoration and synthesis tasks but suffer from excessive consumption of computational resources and time. In this paper, we design a novel and computationally efficient diffusion-based pan-sharpening network that achieves accelerated diffusion while reducing task complexity by decoupling the high- and low-frequency components of the fused image. Specifically, leveraging the information-preserving characteristic of the wavelet transformation, we introduce a Wavelet-based Low-frequency Diffusion Model (WLDM). WLDM generates the low-frequency coefficient of the high-resolution MS (HRMS) image from the low-resolution MS (LRMS) image. This approach significantly reduces computational resources and complexity compared to direct restoration of the HRMS image. Furthermore, we devise a High-frequency Information Restoration Module (HIRM) to restore the high-frequency information in the HRMS image through the interaction of high-frequency coefficients from the PAN image in three directions. Extensive experiments on three different datasets demonstrate that our method outperforms existing approaches in quantitative metrics, qualitative results, and inference efficiency.
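The wavelet-based high/low-frequency decoupling described in this abstract can be illustrated with a single-level 2D Haar split. This is a generic sketch of the decomposition idea; WLDM's actual wavelet choice and architecture are not specified here:

```python
import numpy as np

def haar_decompose_2d(img):
    """Single-level 2D Haar wavelet split of an even-sized grayscale
    image into one low-frequency band (LL) and three high-frequency
    detail bands (LH, HL, HH). The transform is information-preserving:
    the original image can be reconstructed from the four bands."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh
```

For a constant image the three detail bands are zero, which is why restricting an expensive model to the LL band (as WLDM does for diffusion) shrinks the problem without discarding structure.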

NeurIPS Conference 2025 Conference Paper

Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search

  • Yanbo Wang
  • Zixiang Xu
  • Yue Huang
  • Gao Chujie
  • Siyuan Wu
  • Jiayi Ye
  • Pin-Yu Chen
  • Xiuying Chen

Large Language Models (LLMs) often struggle to maintain their original performance when faced with semantically coherent but task-irrelevant contextual information. Although prior studies have explored this issue using fixed-template or retrieval-based distractions, such static methods show limited effectiveness against contemporary models. To address this problem, we propose a dynamic distraction generation framework based on tree search, where the generation process is guided by model behavior. Without modifying the original question or answer, the method efficiently produces challenging adaptive distractions across multiple datasets, enabling systematic stress testing of LLMs’ contextual robustness. Experiments on four benchmarks demonstrate that the generated distractions lead to an average performance drop of over 45% for mainstream models. Further comparisons of mitigation strategies show that prompt-based optimization methods yield limited gains, whereas post-training approaches (e.g., DPO) significantly enhance the model's contextual robustness. The results indicate that these issues do not stem from knowledge deficits in LLMs, but from a fundamental inability to maintain consistent reasoning under contextual distraction, posing a major challenge to the reliability of LLMs in real-world applications.

NeurIPS Conference 2025 Conference Paper

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

  • Xiangqi Wang
  • Yue Huang
  • Yanbo Wang
  • Xiaonan Luo
  • Kehan Guo
  • Yujun Zhou
  • Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work “well enough” across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees of fast convergence and a sublinear policy gap, corroborated by experiments. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.

IJCAI Conference 2025 Conference Paper

Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction To Generation and Beyond

  • Kehan Guo
  • Yili Shen
  • Gisela Abigail Gonzalez-Montiel
  • Yue Huang
  • Yujun Zhou
  • Mihir Surve
  • Zhichun Guo
  • Payel Das

The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry, yet the application of these methods to spectroscopic and spectrometric data—termed Spectroscopy Machine Learning (SpectraML)—remains relatively underexplored. Modern spectroscopic techniques (MS, NMR, IR, Raman, UV-Vis) generate an ever-growing volume of high-dimensional data, creating a pressing need for automated and intelligent analysis beyond traditional expert-based workflows. In this survey, we provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks (molecule-to-spectrum prediction) and inverse tasks (spectrum-to-molecule inference). We trace the historical evolution of ML in spectroscopy—from early pattern recognition to the latest foundation models capable of advanced reasoning—and offer a taxonomy of representative neural architectures, including graph-based and transformer-based methods. Addressing key challenges such as data quality, multimodal integration, and computational scalability, we highlight emerging directions like synthetic data generation, large-scale pretraining, and few- or zero-shot learning. To foster reproducible research, we release an open-source repository containing curated datasets and code implementations. Our survey serves as a roadmap for researchers, guiding advancements at the intersection of spectroscopy and AI.

NeurIPS Conference 2025 Conference Paper

ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions

  • Yue Huang
  • Zhengzhe Jiang
  • Xiaonan Luo
  • Kehan Guo
  • Haomin Zhuang
  • Yujun Zhou
  • Zhengqing Yuan
  • Xiaoqi Sun

Empowering large language models (LLMs) with chemical intelligence remains a challenge due to the scarcity of high-quality, domain-specific instruction-response datasets and the misalignment of existing synthetic data generation pipelines with the inherently hierarchical and rule-governed structure of chemical information. To address this, we propose ChemOrch, a framework that synthesizes chemically grounded instruction–response pairs through a two-stage process: task-controlled instruction generation and tool-aware response construction. ChemOrch enables controllable diversity and levels of difficulty for the generated tasks and ensures response precision through tool planning and distillation, and tool-based self-repair mechanisms. The effectiveness of ChemOrch is evaluated based on: 1) the high quality of generated instruction data, demonstrating superior diversity and strong alignment with chemical constraints; 2) the dynamic generation of evaluation tasks that more effectively reveal LLM weaknesses in chemistry; and 3) the significant improvement of LLM chemistry capabilities when the generated instruction data are used for fine-tuning. Our work thus represents a critical step toward scalable and verifiable chemical intelligence in LLMs. The code is available at https://anonymous.4open.science/r/ChemOrch-854A.

AAAI Conference 2025 Conference Paper

DPLUT: Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

  • Yunlong Lin
  • Zhenqi Fu
  • Kairun Wen
  • Tian Ye
  • Sixiang Chen
  • Ge Meng
  • Yingying Wang
  • Chui Kong

Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent advanced LIE techniques rely on deep neural networks, which require large numbers of low/normal-light image pairs, many network parameters, and substantial computational resources. As a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework based on diffusion priors and lookup tables (DPLUT) to achieve efficient low-light image recovery. The proposed approach comprises two critical components: a light adjustment lookup table (LLUT) and a noise suppression lookup table (NLUT). LLUT is optimized with a set of unsupervised losses and predicts pixel-wise curve parameters for the dynamic-range adjustment of a specific image. NLUT is designed to remove the noise amplified after brightening. As diffusion models are sensitive to noise, diffusion priors are introduced to achieve high-performance noise suppression. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of visual quality and efficiency.

NeurIPS Conference 2025 Conference Paper

DyFlow: Dynamic Workflow Framework for Agentic Reasoning

  • Yanbo Wang
  • Zixiang Xu
  • Yue Huang
  • Xiangqi Wang
  • Zirui Song
  • Lang Gao
  • Chenxi Wang
  • Robert Tang

Agent systems based on large language models (LLMs) have shown great potential in complex reasoning tasks, but building efficient and generalizable workflows remains a major challenge. Most existing approaches rely on manually designed processes, which limits their adaptability across different tasks. While a few methods attempt automated workflow generation, they are often tied to specific datasets or query types and make limited use of intermediate feedback, reducing system robustness and reasoning depth. Moreover, their operations are typically predefined and inflexible. To address these limitations, we propose DyFlow, a dynamic workflow generation framework that adaptively constructs and adjusts reasoning procedures based on task requirements and real-time intermediate feedback, thereby enhancing cross-task generalization. DyFlow consists of two core components: a designer and an executor. The designer decomposes complex problems into a sequence of sub-goals defined by high-level objectives and dynamically plans the next steps based on intermediate outputs and feedback. These plans are then carried out by the executor, which executes each operation using dynamic operators with context-aware parameterization, enabling flexible and semantically grounded reasoning. We systematically evaluate DyFlow across diverse domains, including social reasoning, biomedical tasks, mathematical problem solving, and code generation. Results demonstrate that DyFlow significantly outperforms existing baselines, achieving substantial Pass@k improvements and exhibiting robust generalization across diverse domains.

NeurIPS Conference 2025 Conference Paper

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

  • Kairun Wen
  • Runyu Chen
  • Hui Zheng
  • Yunlong Lin
  • Panwang Pan
  • Chenxin Li
  • Wenyan Cong
  • Jian Zhang

Understanding the dynamic physical world, characterized by its evolving 3D structure, real-world motion, and semantic content with textual descriptions, is crucial for human-agent interaction and enables embodied agents to perceive and act within real environments with human-like capabilities. However, existing datasets are often derived from limited simulators or utilize traditional Structure-from-Motion for up-to-scale annotation and offer limited descriptive captioning, which restricts the capacity of foundation models to accurately interpret real-world dynamics from monocular videos, commonly sourced from the internet. To bridge these gaps, we introduce DynamicVerse, a physical-scale, multimodal 4D world modeling framework for dynamic real-world video. We employ large vision, geometric, and multimodal models to interpret metric-scale static geometry, real-world dynamic motion, instance-level masks, and holistic descriptive captions. By integrating window-based Bundle Adjustment with global optimization, our method converts long real-world video sequences into a comprehensive 4D multimodal format. DynamicVerse delivers a large-scale dataset consisting of 100K+ videos with 800K+ annotated masks and 10M+ frames from internet videos. Experimental evaluations on three benchmark tasks, namely video depth estimation, camera pose estimation, and camera intrinsics estimation, demonstrate that our 4D modeling achieves superior performance in capturing physical-scale measurements with greater global accuracy than existing methods.

NeurIPS Conference 2025 Conference Paper

FRN: Fractal-Based Recursive Spectral Reconstruction Network

  • Ge Meng
  • Zhongnan Cai
  • Ruizhe Chen
  • Jingyan Tu
  • Yingying Wang
  • Yue Huang
  • Xinghao Ding

Generating hyperspectral images (HSIs) from RGB images through spectral reconstruction can significantly reduce the cost of HSI acquisition. In this paper, we propose a Fractal-Based Recursive Spectral Reconstruction Network (FRN), which differs from existing paradigms that attempt to directly integrate the full-spectrum information from the R, G, and B channels in a one-shot manner. Instead, it treats spectral reconstruction as a progressive process, predicting from broad to narrow bands or employing a coarse-to-fine approach for predicting the next wavelength. Inspired by fractals in mathematics, FRN establishes a novel spectral reconstruction paradigm by recursively invoking an atomic reconstruction module. In each invocation, only the spectral information from neighboring bands is used to provide clues for the generation of the image at the next wavelength, which follows the low-rank property of spectral data. Moreover, we design a band-aware state space model that employs a pixel-differentiated scanning strategy at different stages of the generation process, further suppressing interference from low-correlation regions caused by reflectance differences. Through extensive experimentation across different datasets, FRN achieves superior reconstruction performance compared to state-of-the-art methods. Code is available at https://github.com/mongko007/frn.

NeurIPS Conference 2025 Conference Paper

Pan-LUT: Efficient Pan-sharpening via Learnable Look-Up Tables

  • Zhongnan Cai
  • Yingying Wang
  • Hui Zheng
  • Panwang Pan
  • Zixu Lin
  • Ge Meng
  • Chenxin Li
  • Chunming He

Recently, deep learning-based pan-sharpening algorithms have achieved notable advancements over traditional methods. However, deep learning-based methods incur substantial computational overhead during inference, especially with large images. This excessive computational demand limits the applicability of these methods in real-world scenarios, particularly in the absence of dedicated computing devices such as GPUs and TPUs. To address these challenges, we propose Pan-LUT, a novel learnable look-up table (LUT) framework for pan-sharpening that strikes a balance between performance and computational efficiency for large remote sensing images. Our method makes it possible to process 15K×15K remote sensing images on a 24GB GPU. To finely control the spectral transformation, we devise the PAN-guided look-up table (PGLUT) for channel-wise spectral mapping. To effectively capture fine-grained spatial details, we introduce the spatial details look-up table (SDLUT). Furthermore, to adaptively aggregate channel information for generating high-resolution multispectral images, we design an adaptive output look-up table (AOLUT). Our model contains fewer than 700K parameters and processes a 9K×9K image in under 1 ms using one RTX 2080 Ti GPU, demonstrating significantly faster performance compared to other methods. Experiments reveal that Pan-LUT efficiently processes large remote sensing images in a lightweight manner, bridging the gap to real-world applications. Furthermore, our model surpasses SOTA methods in full-resolution scenes under real-world conditions, highlighting its effectiveness and efficiency. We also extend our method to general image fusion tasks.
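As a point of reference, the basic mechanism behind look-up tables — indexing a small table with interpolation instead of running a deep network per pixel — can be sketched generically. This is not the paper's PGLUT/SDLUT/AOLUT design; the function name, table size, and value range are illustrative:

```python
import numpy as np

def apply_1d_lut(channel, lut):
    """Apply a 1D look-up table to one image channel with linear
    interpolation. `lut` has N entries covering input values in [0, 1];
    each pixel's output is interpolated between its two nearest table
    entries, so inference cost is independent of any network depth."""
    n = len(lut)
    x = np.clip(channel, 0.0, 1.0) * (n - 1)
    lo = np.floor(x).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = x - lo
    return (1 - frac) * lut[lo] + frac * lut[hi]
```

Making the table entries trainable parameters (and conditioning them on guidance such as the PAN image) is what turns a fixed LUT into a learnable one while keeping inference this cheap.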

AAAI Conference 2025 Conference Paper

Sp3ctralMamba: Physics-Driven Joint State Space Model for Hyperspectral Image Reconstruction

  • Ge Meng
  • Jingyan Tu
  • Jingjia Huang
  • Yunlong Lin
  • Yingying Wang
  • Xiaotong Tu
  • Yue Huang
  • Xinghao Ding

Hyperspectral image (HSI) reconstruction aims to restore the original 3D HSIs from the 2D hyperspectral snapshot compressive images (SCIs). The key to high-fidelity HSI reconstruction lies in designing refined spatial and spectral attention mechanisms, which are crucial for generating fine-grained representations of HSI based on the limited spatial and spectral information available in SCI. Recently, Mamba has demonstrated remarkable performance and efficiency in modeling spatial correlations. Its implicit attention mechanism generates three orders of magnitude more attention matrices than transformers, significantly raising the performance ceiling for HSI reconstruction. In this paper, we propose a novel joint SSM network named Sp3ctralMamba for HSI reconstruction. Sp3ctralMamba integrates frequency domain knowledge and physical priors to enhance reconstruction quality. Specifically, we first perform hierarchical decomposition of the 3D HSI embedding to mitigate the negative impact of distant bands on reconstruction. Next, we design a joint SSM block S3Mamba (S3MAB) to perform parallel scans of the embeddings from different bands. In addition to the conventional vanilla scan, S3MAB introduces a local scanning scheme to address the reconstruction challenges posed by the spatial sparsity of spectral information. Furthermore, a spiral scanning scheme in the frequency domain is incorporated to enhance the order correlation between different frequency signals. Finally, we introduce energy priors and structural priors to constrain the generation of spectral and spatial representations during the training process. Extensive experiments on both simulated and real datasets demonstrate that Sp3ctralMamba significantly elevates HSI reconstruction performance to a new level, surpassing SOTA methods in both quantitative and qualitative metrics.

AAAI Conference 2025 Conference Paper

STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

  • Jieyi Wang
  • Yue Huang
  • Zeming Liu
  • Dexuan Xu
  • Chuan Wang
  • Xiaoming Shi
  • Ruiyuan Guan
  • Hongxing Wang

Online psychological counseling dialogue systems are trending, offering a convenient and accessible alternative to traditional in-person therapy. However, existing psychological counseling dialogue systems mainly focus on basic empathetic dialogue or QA with minimal professional knowledge and without goal guidance. In many real-world counseling scenarios, clients often seek multiple types of help, such as diagnosis, consultation, therapy, consolation, and answers to common questions, but existing dialogue systems struggle to combine different dialogue types naturally. In this paper, we identify this challenge as how to construct mixed-type dialogue systems for psychological counseling that enable clients to clarify their goals before proceeding with counseling. To mitigate the challenge, we collect a mixed-type counseling dialogue corpus termed STAMPsy, covering five dialogue types (task-oriented dialogue for diagnosis, knowledge-grounded dialogue, conversational recommendation, empathetic dialogue, and question answering) across more than 5,000 conversations. Moreover, spatiotemporal-aware knowledge gives systems world awareness and has been shown to affect one's mental health. Therefore, we link dialogues in STAMPsy to spatiotemporal states and propose a spatiotemporal-aware mixed-type psychological counseling dataset. Additionally, we build baselines on STAMPsy and develop an iterative self-feedback psychological dialogue generation framework, named Self-STAMPsy. Results indicate that clarifying dialogue goals in advance and utilizing spatiotemporal states are effective.

NeurIPS Conference 2024 Conference Paper

Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM

  • Chenxin Li
  • Yuzhi Huang
  • Wuyang Li
  • Hengyu Liu
  • Xinyu Liu
  • Qing Xu
  • Zhen Chen
  • Yue Huang

As vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with even subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework. Project page: https://a-sa-m.github.io/.

NeurIPS Conference 2024 Conference Paper

HonestLLM: Toward an Honest and Helpful Large Language Model

  • Chujie Gao
  • Siyuan Wu
  • Yue Huang
  • Dongping Chen
  • Qihui Zhang
  • Zhengyan Fu
  • Yao Wan
  • Lichao Sun

Large Language Models (LLMs) have achieved remarkable success across various industries and applications, owing to their exceptional generative capabilities. Nevertheless, honesty and helpfulness, which ensure safe and useful real-world deployments, have long been regarded as cornerstones in practice. In this paper, we first establish comprehensive principles for honest LLMs and create the HoneSet, with 930 queries across six categories, designed to evaluate LLMs' ability to maintain honesty. Then, we improve the honesty and helpfulness of LLMs in both training-free and fine-tuning settings. Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about the given query and then optimize their responses. Moreover, we also propose a two-stage fine-tuning approach, inspired by curriculum learning, to enhance the honesty and helpfulness of LLMs. The method first teaches LLMs to distinguish between honest and dishonest responses, and then trains them to respond more helpfully. Experimental results demonstrate that both proposed methods improve the helpfulness of LLMs while maintaining their honesty. Our research has paved the way for more reliable and trustworthy LLMs in real-world applications.

AAAI Conference 2024 Conference Paper

Progressive High-Frequency Reconstruction for Pan-Sharpening with Implicit Neural Representation

  • Ge Meng
  • Jingjia Huang
  • Yingying Wang
  • Zhenqi Fu
  • Xinghao Ding
  • Yue Huang

Pan-sharpening aims to leverage the high-frequency signal of the panchromatic (PAN) image to enhance the resolution of its corresponding multi-spectral (MS) image. However, deep neural networks (DNNs) tend to prioritize learning the low-frequency components during the training process, which limits the restoration of high-frequency edge details in MS images. To overcome this limitation, we treat pan-sharpening as a coarse-to-fine high-frequency restoration problem and propose a novel method for achieving high-quality restoration of edge information in MS images. Specifically, to effectively obtain fine-grained multi-scale contextual features, we design a Band-limited Multi-scale High-frequency Generator (BMHG) that generates high-frequency signals from the PAN image within different bandwidths. During training, higher-frequency signals are progressively injected into the MS image, and corresponding residual blocks are introduced into the network simultaneously. This design enables gradients to flow from later to earlier blocks smoothly, encouraging intermediate blocks to concentrate on missing details. Furthermore, to address the issue of pixel position misalignment arising from multi-scale features fusion, we propose a Spatial-spectral Implicit Image Function (SIIF) that employs implicit neural representation to effectively represent and fuse spatial and spectral features in the continuous domain. Extensive experiments on different datasets demonstrate that our method outperforms existing approaches in terms of quantitative and visual measurements for high-frequency detail recovery.

ICRA Conference 2024 Conference Paper

SGCalib: A Two-stage Camera-LiDAR Calibration Method Using Semantic Information and Geometric Features

  • Zhipeng Lin
  • Zhi Gao 0005
  • Xinyi Liu 0002
  • Jialiang Wang
  • Weiwei Song
  • Ben M. Chen
  • Chenyang Li
  • Yue Huang

Extrinsic calibration is an essential prerequisite for the applications of camera-LiDAR fusion. Existing methods either suffer from the complex offline setting of man-made targets or tend to produce suboptimal and non-robust results. In this paper, we propose an online two-stage calibration method that estimates robust and accurate extrinsic parameters between camera and LiDAR. Ours is a novel approach that jointly uses semantic information and geometric features in calibration to promote accuracy and robustness. In the first stage, we detect objects in the image and point cloud and build graphs on the objects using Delaunay triangulation. Then, we design a novel graph matching algorithm to associate the objects in the two data domains and extract pairs of 2D-3D points. Using the PnP solver, we obtain robust initial extrinsic parameters. In the second stage, we design a new optimization formulation with semantic information and geometric features to generate accurate extrinsic parameters from the initial value produced by the first stage. Extensive experiments on solid-state LiDAR, conventional spinning LiDAR, and KITTI datasets have verified the robustness and accuracy of our method, which outperforms existing works. We will share the code publicly to benefit the community (after review stages).

NeurIPS Conference 2023 Conference Paper

CODA: Generalizing to Open and Unseen Domains with Compaction and Disambiguation

  • Chaoqi Chen
  • Luyao Tang
  • Yue Huang
  • Xiaoguang Han
  • Yizhou Yu

The generalization capability of machine learning systems degenerates notably when the test distribution drifts from the training distribution. Recently, Domain Generalization (DG) has been gaining momentum in enabling machine learning models to generalize to unseen domains. However, most DG methods assume that training and test data share an identical label space, ignoring the potential unseen categories in many real-world applications. In this paper, we delve into a more general but difficult problem termed Open Test-Time DG (OTDG), where both domain shift and open class may occur on the unseen test data. We propose Compaction and Disambiguation (CODA), a novel two-stage framework for learning compact representations and adapting to open classes in the wild. To meaningfully regularize the model's decision boundary, CODA introduces virtual unknown classes and optimizes a new training objective to insert unknowns into the latent space by compacting the embedding space of source known classes. To adapt target samples to the source model, we then disambiguate the decision boundaries between known and unknown classes with a test-time training objective, mitigating the adaptivity gap and catastrophic forgetting challenges. Experiments reveal that CODA can significantly outperform the previous best method on standard DG datasets and harmonize the classification accuracy between known and unknown classes.

AAAI Conference 2023 Conference Paper

Self-Supervised Image Denoising Using Implicit Deep Denoiser Prior

  • Huangxing Lin
  • Yihong Zhuang
  • Xinghao Ding
  • Delu Zeng
  • Yue Huang
  • Xiaotong Tu
  • John Paisley

We devise a new regularization for denoising with self-supervised learning. The regularization uses a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a "prior" that we again denoise after "re-noising." The network is updated to minimize the discrepancy between the twice-denoised image and its prior. We demonstrate that this regularization enables the network to learn to denoise even if it has not seen any clean images. The effectiveness of our method is based on the fact that CNNs naturally tend to capture low-level image statistics. Since our method utilizes the image prior implicitly captured by the deep denoising CNN to guide denoising, we refer to this training strategy as an Implicit Deep Denoiser Prior (IDDP). IDDP can be seen as a mixture of learning-based methods and traditional model-based denoising methods, in which regularization is adaptively formulated using the output of the network. We apply IDDP to various denoising tasks using only observed corrupted data and show that it achieves better denoising results than other self-supervised denoising methods.

NeurIPS Conference 2022 Conference Paper

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

  • Chaoqi Chen
  • Luyao Tang
  • Feng Liu
  • Gangming Zhao
  • Yue Huang
  • Yizhou Yu

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations via enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains by virtue of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic objects. ASTR introduces relation graphs to represent semantic topology, which is progressively refined via the interactions between local feature aggregation and global cross-domain relational reasoning. Experiments on multiple DG benchmarks validate the effectiveness and robustness of the proposed MiRe.

AAAI Conference 2022 Conference Paper

Unsupervised Underwater Image Restoration: From a Homology Perspective

  • Zhenqi Fu
  • Huangxing Lin
  • Yan Yang
  • Shu Chai
  • Liyan Sun
  • Yue Huang
  • Xinghao Ding

Underwater images suffer from degradation due to light scattering and absorption. It remains challenging to restore such degraded images using deep neural networks since real-world paired data is scarcely available while synthetic paired data cannot approximate real-world data perfectly. In this paper, we propose an UnSupervised Underwater Image Restoration method (USUIR) by leveraging the homology property between a raw underwater image and a re-degraded image. Specifically, USUIR first estimates three latent components of the raw underwater image, i.e., the global background light, the transmission map, and the scene radiance (the clean image). Then, a re-degraded image is generated by randomly mixing up the estimated scene radiance and the raw underwater image. We demonstrate that imposing a homology constraint between the raw underwater image and the re-degraded image is equivalent to minimizing the restoration error and hence can be used for the unsupervised restoration. Extensive experiments show that USUIR achieves promising performance in both inference time and restoration quality.

JBHI Journal 2021 Journal Article

Curriculum Feature Alignment Domain Adaptation for Epithelium-Stroma Classification in Histopathological Images

  • Qi Qi
  • Xin Lin
  • Chaoqi Chen
  • Weiping Xie
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In recent years, deep learning methods have received more attention in epithelial-stroma (ES) classification tasks. Traditional deep learning methods assume that the training and test data have the same distribution, an assumption that is seldom satisfied in complex imaging procedures. Unsupervised domain adaptation (UDA) transfers knowledge from a labelled source domain to a completely unlabeled target domain, and is more suitable for ES classification tasks to avoid tedious annotation. However, existing UDA methods for this task ignore the semantic alignment across domains. In this paper, we propose a Curriculum Feature Alignment Network (CFAN) to gradually align discriminative features across domains through selecting effective samples from the target domain and minimizing intra-class differences. Specifically, we developed the Curriculum Transfer Strategy (CTS) and Adaptive Centroid Alignment (ACA) steps to train our model iteratively. We validated the method using three independent public ES datasets, and experimental results demonstrate that our method achieves better performance in ES classification compared with commonly used deep learning methods and existing deep domain adaptation methods.

IJCAI Conference 2021 Conference Paper

Noise2Grad: Extract Image Noise to Denoise

  • Huangxing Lin
  • Yihong Zhuang
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In many image denoising tasks, the difficulty of collecting noisy/clean image pairs limits the application of supervised CNNs. We consider such a case in which paired data and noise statistics are not accessible, but unpaired noisy and clean images are easy to collect. To form the necessary supervision, our strategy is to extract the noise from the noisy image to synthesize new data. To ease the interference of the image background, we use a noise removal module to aid noise extraction. The noise removal module first roughly removes noise from the noisy image, which is equivalent to excluding much background information. A noise approximation module can therefore easily extract a new noise map from the removed noise to match the gradient of the noisy input. This noise map is added to a random clean image to synthesize a new data pair, which is then fed back to the noise removal module to correct the noise removal process. These two modules cooperate to extract noise finely. After convergence, the noise removal module can remove noise without damaging other background details, so we use it as our final denoising network. Experiments show that the denoising performance of the proposed method is competitive with other supervised CNNs.

JBHI Journal 2020 Journal Article

An Adversarial Learning Approach to Medical Image Synthesis for Lesion Detection

  • Liyan Sun
  • Jiexiang Wang
  • Yue Huang
  • Xinghao Ding
  • Hayit Greenspan
  • John Paisley

The identification of lesions within medical image data is necessary for diagnosis, treatment, and prognosis. Segmentation and classification approaches are mainly based on supervised learning with well-paired image-level or voxel-level labels. However, labeling lesions in medical images is laborious and requires highly specialized knowledge. We propose a medical image synthesis model named abnormal-to-normal translation generative adversarial network (ANT-GAN) to generate a normal-looking medical image based on its abnormal-looking counterpart without the need for paired training data. Unlike typical GANs, whose aim is to generate realistic samples with variations, our more restrictive model aims at producing a normal-looking image corresponding to one containing lesions, and thus requires a special design. Being able to provide a "normal" counterpart to a medical image can provide useful side information for medical imaging tasks like lesion segmentation or classification, as validated by our experiments. Conversely, the ANT-GAN model is also capable of producing a highly realistic lesion-containing image corresponding to a healthy one, which shows its potential for data augmentation, as verified in our experiments.

JBHI Journal 2019 Journal Article

Label-Efficient Breast Cancer Histopathological Image Classification

  • Qi Qi
  • Yanlong Li
  • Jitian Wang
  • Han Zheng
  • Yue Huang
  • Xinghao Ding
  • Gustavo Kunde Rohde

The automatic classification of breast cancer histopathological images has great significance in computer-aided diagnosis. Recently, deep learning via neural networks has enabled pattern detection and prediction using large, labeled datasets; however, collecting and annotating sufficient histological data using professional pathologists is time-consuming, tedious, and extremely expensive. In this paper, a deep active learning framework is designed and implemented for classification of breast cancer histopathological images, with the goal of maximizing the learning accuracy from very limited labeling. This method involves manual annotation of the most valuable unlabeled samples, which are then integrated into the training set. The model is then iteratively updated with an increasing training set. Here, two selection strategies are discussed for the proposed deep active learning framework: an entropy-based strategy and a confidence-boosting strategy. The proposed method has been validated using a publicly available breast cancer histopathological image dataset, wherein each image patch is binarily classified as benign or malignant. The experimental results demonstrate that, compared with random selection, our proposed framework can reduce annotation costs by up to 66.67%, with higher accuracy and less expensive annotation than the standard query strategy.

JBHI Journal 2019 Journal Article

SetSVM: An Approach to Set Classification in Nuclei-Based Cancer Detection

  • Chi Liu
  • Yue Huang
  • John A. Ozolek
  • Matthew G. Hanna
  • Rajendra Singh
  • Gustavo K. Rohde

Due to the importance of nuclear structure in cancer diagnosis, several predictive models have been described for diagnosing a wide variety of cancers based on nuclear morphology. In many computer-aided diagnosis (CAD) systems, cancer detection tasks can be generally formulated as set classification problems, which cannot be directly solved by classifying single instances. In this paper, we propose a novel set classification approach, SetSVM, to build a predictive model that considers any nuclei set as a whole without specific assumptions. SetSVM offers highly discriminative power in cancer detection challenges in the sense that it not only optimizes the classifier decision boundary but also transfers discriminative information to set representation learning. During model training, these two processes are unified in the support vector machine (SVM) maximum separation margin problem. Experimental results show that SetSVM provides significant improvements compared with five commonly used approaches in cancer detection tasks, using data from 260 patients in total across three different cancer types, namely, thyroid cancer, liver cancer, and melanoma. In addition, we show that SetSVM enables visual interpretation of the discriminative nuclear characteristics representing the nuclei set. These features make SetSVM a potentially practical tool in building accurate and interpretable CAD systems for cancer detection.

AAAI Conference 2018 Conference Paper

Compressed Sensing MRI Using a Recursive Dilated Network

  • Liyan Sun
  • Zhiwen Fan
  • Yue Huang
  • Xinghao Ding
  • John Paisley

Compressed sensing magnetic resonance imaging (CS-MRI) is an active research topic in the field of inverse problems. Conventional CS-MRI algorithms usually exploit the sparse nature of MRI in an iterative manner. These optimization-based CS-MRI methods are often time-consuming at test time, and are based on fixed transform bases or shallow dictionaries, which limits modeling capacity. Recently, deep models have been introduced to the CS-MRI problem. One main challenge for CS-MRI methods based on deep learning is the trade-off between model performance and network size. We propose a recursive dilated network (RDN) for CS-MRI that achieves good performance while reducing the number of network parameters. We adopt dilated convolutions in each recursive block to aggregate multi-scale information within the MRI. We also adopt a modified shortcut strategy to help features flow into deeper layers. Experimental results show that the proposed RDN model achieves state-of-the-art performance in CS-MRI while using far fewer parameters than previously required.

JBHI Journal 2017 Journal Article

Epithelium-Stroma Classification via Convolutional Neural Networks and Unsupervised Domain Adaptation in Histopathological Images

  • Yue Huang
  • Han Zheng
  • Chi Liu
  • Xinghao Ding
  • Gustavo K. Rohde

Epithelium-stroma classification is a necessary preprocessing step in histopathological image analysis. Current deep learning based recognition methods for histology data require collection of large volumes of labeled data in order to train a new neural network when there are changes to the image acquisition procedure. However, it is extremely expensive for pathologists to manually label sufficient volumes of data for each pathology study in a professional manner, which limits real-world applications. In this paper, we propose a very simple but effective deep learning method that introduces the concept of unsupervised domain adaptation to a simple convolutional neural network (CNN). Inspired by transfer learning, our paper assumes that the training data and testing data follow different distributions, and introduces an adaptation operation to more accurately estimate the CNN kernels used in feature extraction, in order to enhance performance by transferring knowledge from labeled data in the source domain to unlabeled data in the target domain. The model has been evaluated using three independent public epithelium-stroma datasets by cross-dataset validations. The experimental results demonstrate that, for epithelium-stroma classification, the proposed framework outperforms the state-of-the-art deep neural network model, and it also achieves better performance than other existing deep domain adaptation methods. The proposed model can be considered a better option for real-world applications in histopathological image analysis, since there is no longer a requirement for large-scale labeled data in each specified domain.

JBHI Journal 2015 Journal Article

An Implantable RFID Sensor Tag toward Continuous Glucose Monitoring

  • Zhibin Xiao
  • Xi Tan
  • Xianliang Chen
  • Sizheng Chen
  • Zijian Zhang
  • Hualei Zhang
  • Junyu Wang
  • Yue Huang

This paper presents a wirelessly powered implantable electrochemical sensor tag for continuous blood glucose monitoring. The system is remotely powered by a 13.56-MHz inductive link and utilizes the ISO 15693 radio frequency identification (RFID) standard for communication. The system provides reliable and accurate measurements of changing glucose levels. The sensor tag employs a long-term glucose sensor, a winding ferrite antenna, an RFID front-end, a potentiostat, a 10-bit sigma-delta analog-to-digital converter, an on-chip temperature sensor, and a digital baseband for protocol processing and control. A high-frequency external reader is used to power, command, and configure the sensor tag. The only off-chip support circuitry required is a tuned antenna and a glucose microsensor. The integrated chip, fabricated in an SMIC 0.13-μm CMOS process, occupies an area of 1.2 mm × 2 mm and consumes 50 μW. The power sensitivity of the whole system is -4 dBm. The sensor tag achieves a measured glucose range of 0-30 mM with a sensitivity of 0.75 nA/mM.