Arrow Research search

Author name cluster

Zhe Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

C³TG: Conflict-aware, Composite, and Collaborative Controlled Text Generation

  • Yu Li
  • Zhe Yang
  • Yi Huang
  • Xin Liu
  • Guilin Qi

Recent advancements in large language models (LLMs) have demonstrated remarkable text generation capabilities. However, controlling specific attributes of generated text remains challenging without architectural modifications or extensive fine-tuning. Current methods typically toggle a single, basic attribute but struggle with precise multi-attribute control. In scenarios where attribute requirements conflict, existing methods lack coordination mechanisms, causing interference between desired attributes. Furthermore, these methods fail to incorporate iterative optimization processes in the controlled generation pipeline. To address these limitations, we propose Conflict-aware, Composite, and Collaborative Controlled Text Generation (C³TG), a two-phase framework for fine-grained, multi-dimensional text attribute control. During generation, C³TG selectively pairs the LLM with the required attribute classifiers from the 17 available dimensions and employs weighted KL-divergence to adjust token probabilities. The optimization phase then leverages an energy function combining classifier scores and penalty terms to resolve attribute conflicts through iterative feedback, enabling precise control over multiple dimensions simultaneously while preserving natural text flow. Experiments show that C³TG significantly outperforms baselines across multiple metrics including attribute accuracy, linguistic fluency, and output diversity, while simultaneously reducing toxicity. These results establish C³TG as an effective and flexible solution for multi-dimensional text attribute control that requires no costly model modifications.

TMLR Journal 2026 Journal Article

Data Compressibility Quantifies LLM Memorization

  • Yizhan Huang
  • Zhe Yang
  • Meifang Chen
  • HUANG Nianchen
  • Jianping Zhang
  • Michael R. Lyu

Large Language Models (LLMs) are known to memorize portions of their training data, sometimes even reproducing content verbatim when prompted appropriately. Despite substantial interest, existing LLM memorization research has offered limited insight into how training data influences memorization and largely lacks quantitative characterization. In this work, we build upon the line of research that seeks to quantify memorization through data compressibility. We analyze why prior attempts fail to yield a reliable quantitative measure and show that a surprisingly simple shift from instance-level to set-level metrics uncovers a robust phenomenon, which we term the Entropy–Memorization (EM) Linearity. This law states that a set-level data entropy estimator exhibits a linear correlation with memorization scores. We validate the EM Linearity through extensive experiments across a wide range of open-source models and experimental configurations. We further investigate the role of the token space, an implicit yet pivotal factor in our method, and identify an additional variant of the EM Law. As a side observation, we find that EM Linearity enables a simple application: distinguishing LLM training data from test data.

EAAI Journal 2025 Journal Article

A Dual-Population Constrained Multi-Objective Evolutionary Algorithm with Success Incentive Mechanism and its application to uncertain multimodal transportation problems

  • Zhe Yang
  • Libao Deng
  • Yuanzhu Di
  • Chunlei Li
  • Yifan Qin
  • Lili Zhang

The evolution of the transportation industry has heightened the focus on environmentally sustainable multimodal transport, particularly in addressing carbon emissions. In modern logistics, path planning under uncertainty has become a pivotal research area. This paper proposes a multi-objective, multi-constraint optimization model for multimodal transport that aims to concurrently minimize cost, carbon emissions, and time. The model accounts for numerous operational constraints, including timetables, as well as dual sources of uncertainty from demand and the transport environment. To solve this complex problem, this paper introduces a new algorithmic framework. The proposed algorithm, a Dual-Population Constrained Multi-Objective Evolutionary Algorithm with a Success Incentive Mechanism (DSCMOEA), integrates three key innovations: a universal priority-based encoding/decoding adapter, a specialized constraint-handling architecture, and an adaptive operator selection mechanism. The adapter is central to the framework, enabling continuous-domain evolutionary algorithms to solve the discrete transport problem without internal modification. This approach also provides the versatility to handle various uncertainty paradigms through a multi-scenario simulation context. Experimental analysis validates the superiority of the proposed algorithm against eight established competitors, demonstrating its effectiveness in solving complex multimodal transport problems under uncertainty.
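The abstract does not describe how the priority-based encoding/decoding adapter works internally. As a rough illustration only, the sketch below swaps in random-key decoding, a common generic way to let a continuous-domain optimizer drive a discrete ordering: each element gets a real-valued priority, and sorting the priorities yields the order. The function name `decode_priorities`, the hub labels, and the candidate vector are all hypothetical and are not taken from the paper.

```python
def decode_priorities(priorities, nodes):
    """Map a continuous priority vector to a discrete ordering of nodes.

    Assumption (not from the paper): a higher priority means the node is
    visited earlier. Ties are broken by original position, so the decoding
    is deterministic.
    """
    if len(priorities) != len(nodes):
        raise ValueError("one priority per node is required")
    # Sort indices by descending priority; the index is the tie-breaker.
    order = sorted(range(len(nodes)), key=lambda i: (-priorities[i], i))
    return [nodes[i] for i in order]


# Hypothetical example: four transfer hubs and one candidate solution vector
# of the kind a continuous evolutionary algorithm might produce.
hubs = ["A", "B", "C", "D"]
candidate = [0.12, 0.87, 0.45, 0.87]
route = decode_priorities(candidate, hubs)
print(route)  # ['B', 'D', 'C', 'A']
```

Because the optimizer only ever sees real-valued vectors, its internals need no change; all problem-specific logic sits in the decoder.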

AAAI Conference 2025 Conference Paper

Exploring Activation Patterns of Parameters in Language Models

  • Yudong Wang
  • Damai Dai
  • Zhe Yang
  • Jingyuan Ma
  • Zhifang Sui

Most work treats large language models as black boxes without an in-depth understanding of their internal working mechanism. To explain the internal representations of LLMs, we utilize a gradient-based metric to assess the activation level of model parameters. Based on this metric, we obtain three preliminary findings. (1) When the inputs are in the same domain, parameters in the shallow layers will be activated densely, which means a larger portion of parameters will have great impacts on the outputs. In contrast, parameters in the deep layers are activated sparsely. (2) When the inputs are across different domains, parameters in shallow layers exhibit higher similarity in the activation behavior than in deep layers. (3) In deep layers, the similarity of the distributions of activated parameters is positively correlated to the empirical data relevance. Further, we develop three validation experiments to solidify these findings. (1) Firstly, starting from the first finding, we attempt to configure different sparsities for different layers and find this method can benefit model pruning. (2) Secondly, we find that a pruned model based on one calibration set can better handle tasks related to the calibration task than those not related, which validates the second finding. (3) Thirdly, based on the STS-B and SICK benchmarks, we find that two sentences with consistent semantics tend to share similar parameter activation patterns in deep layers, which aligns with our third finding. Our work sheds light on the behavior of parameter activation in LLMs, and we hope these findings will inspire more practical applications.

NeurIPS Conference 2025 Conference Paper

ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization

  • Bo Du
  • Xuekang Zhu
  • Xiaochen Ma
  • Chenfan Qu
  • Kaiwen Feng
  • Zhe Yang
  • Chi-Man Pun
  • Jian Liu

The field of Fake Image Detection and Localization (FIDL) is highly fragmented, encompassing four domains: deepfake detection (Deepfake), image manipulation detection and localization (IMDL), artificial intelligence-generated image detection (AIGC), and document image manipulation localization (Doc). Although individual benchmarks exist in some domains, a unified benchmark for all domains in FIDL is still missing. The absence of a unified benchmark results in significant domain silos, where each domain independently constructs its datasets, models, and evaluation protocols without interoperability, preventing cross-domain comparisons and hindering the development of the entire FIDL field. To break down these domain silos, we propose ForensicHub, the first unified benchmark & codebase for all-domain fake image detection and localization. Considering drastic variations in dataset, model, and evaluation configurations across all domains, as well as the scarcity of open-sourced baseline models and the lack of individual benchmarks in some domains, ForensicHub: i) proposes a modular and configuration-driven architecture that decomposes forensic pipelines into interchangeable components across datasets, transforms, models, and evaluators, allowing flexible composition across all domains; ii) fully implements 10 baseline models (3 of which are reproduced from scratch), 6 backbones, 2 new benchmarks for AIGC and Doc, and integrates 2 existing benchmarks of DeepfakeBench and IMDLBenCo through an adapter-based design; iii) establishes an image forensic fusion protocol evaluation mechanism that supports unified training and testing of diverse forensic models across tasks; iv) conducts in-depth analysis based on ForensicHub, offering 8 key actionable insights into FIDL model architecture, dataset characteristics, and evaluation standards.
Specifically, ForensicHub includes 4 forensic tasks, 23 datasets, 42 baseline models, 6 backbones, 11 GPU-accelerated pixel- and image-level evaluation metrics, and realizes 16 kinds of cross-domain evaluations. ForensicHub represents a significant leap forward in breaking the domain silos in the FIDL field and inspiring future breakthroughs. Code is available at: https://github.com/scu-zjz/ForensicHub.

AAAI Conference 2025 Conference Paper

Hyperbolic-Constraint Point Cloud Reconstruction from Single RGB-D Images

  • Wenrui Li
  • Zhe Yang
  • Wei Han
  • Hengyu Man
  • Xingtao Wang
  • Xiaopeng Fan

Reconstructing desired objects and scenes has long been a primary goal in 3D computer vision. Single-view point cloud reconstruction has become a popular technique due to its low cost and accurate results. However, single-view reconstruction methods often rely on expensive CAD models and complex geometric priors. Effectively utilizing prior knowledge about the data remains a challenge. In this paper, we introduce hyperbolic space to 3D point cloud reconstruction, enabling the model to represent and understand complex hierarchical structures in point clouds with low distortion. We build upon previous methods by proposing a hyperbolic Chamfer distance and a regularized triplet loss to enhance the relationship between partial and complete point clouds. Additionally, we design adaptive boundary conditions to improve the model's understanding and reconstruction of 3D structures. Our model outperforms most existing models, and ablation studies demonstrate the significance of our model and its components. Experimental results show that our method significantly improves feature extraction capabilities. Our model achieves outstanding performance in 3D reconstruction tasks.
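The abstract names a hyperbolic Chamfer distance without defining it. The sketch below is one plausible reading and should not be taken as the paper's formulation: it plugs the standard Poincaré-ball geodesic distance into the ordinary symmetric Chamfer distance. The choice of the Poincaré model, the function names, and the toy point clouds are assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def chamfer(cloud_a, cloud_b, dist=poincare_distance):
    """Symmetric Chamfer distance: mean nearest-neighbour distance both ways."""
    a_to_b = sum(min(dist(p, q) for q in cloud_b) for p in cloud_a) / len(cloud_a)
    b_to_a = sum(min(dist(q, p) for p in cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


# Toy clouds (hypothetical): a partial cloud and a more complete one.
partial = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]
complete = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.3)]
print(chamfer(partial, complete))
```

Passing a Euclidean distance function as `dist` recovers the usual Chamfer distance, which makes the two easy to compare on the same clouds.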

ICLR Conference 2025 Conference Paper

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

  • Shaolei Zhang 0001
  • Qingkai Fang
  • Zhe Yang
  • Yang Feng 0004

The advent of real-time large multimodal models (LMMs) like GPT-4o has sparked considerable interest in efficient LMMs. LMM frameworks typically encode visual inputs into vision tokens (continuous representations) and integrate them and textual instructions into the context of large language models (LLMs), where large-scale parameters and numerous context tokens (predominantly vision tokens) result in substantial computational overhead. Previous efforts towards efficient LMMs always focus on replacing the LLM backbone with smaller models, while neglecting the crucial issue of token quantity. In this paper, we introduce LLaVA-Mini, an efficient LMM with minimal vision tokens. To achieve a high compression ratio of vision tokens while preserving visual information, we first analyze how LMMs understand vision tokens and find that most vision tokens only play a crucial role in the early layers of LLM backbone, where they mainly fuse visual information into text tokens. Building on this finding, LLaVA-Mini introduces modality pre-fusion to fuse visual information into text tokens in advance, thereby facilitating the extreme compression of vision tokens fed to LLM backbone into one token. LLaVA-Mini is a unified large multimodal model that can support the understanding of images, high-resolution images, and videos in an efficient manner. Experiments across 11 image-based and 7 video-based benchmarks demonstrate that LLaVA-Mini outperforms LLaVA-v1.5 with just 1 vision token instead of 576. Efficiency analyses reveal that LLaVA-Mini can reduce FLOPs by 77%, deliver low-latency responses within 40 milliseconds, and process over 10,000 frames of video on the GPU hardware with 24GB of memory.

EAAI Journal 2025 Journal Article

Optimization of Built-In Self-Test test chain configuration in 2.5D Integrated Circuits Using Constrained Multi-Objective Evolutionary Algorithm

  • Zhe Yang
  • Libao Deng
  • Chunlei Li
  • Lili Zhang

2.5D Integrated Circuit (2.5D IC) is an advanced packaging technology. This technology facilitates the dense integration of multiple dies by adding passive components to the silicon interposer. However, the high-density stacking of dies has increased the complexity of the IC, and its highly interconnected nature presents a series of challenges for IC testing. For 2.5D IC testing that utilizes the Built-In Self-Test (BIST) architecture, a dual-objective optimization model is established for test chain configuration. The model aims to minimize both hardware testing costs and test time, considering constraints related to maximum power consumption and maximum testing time. Subsequently, a generic encoding and decoding method is devised to enable the direct application of existing Multi-Objective Evolutionary Algorithms (MOEAs) and Constrained Multi-Objective Evolutionary Algorithms (CMOEAs) to solve the test chain configuration problem. In addition, a dual-stage dual-population CMOEA is devised, integrating diverse local search strategies and adaptively adjusting them based on a success incentive mechanism. A constraint handling framework is also introduced, utilizing the relationship between the Constrained Pareto Front (CPF) and Unconstrained Pareto Front (UPF) to assist in searching for solutions that satisfy constraints. This framework also adaptively allocates computational resources by assessing the collaborative effects of auxiliary tasks. Multiple experiments are designed and conducted to demonstrate the correctness and effectiveness of the proposed model and algorithm. Comparative analysis with 8 MOEAs or CMOEAs reveals that the proposed algorithm outperforms them in solving the test chain configuration problem.

EAAI Journal 2023 Journal Article

DPCTN: Dual path context-aware transformer network for medical image segmentation

  • Pengfei Song
  • Zhe Yang
  • Jinjiang Li
  • Hui Fan

Accurate segmentation of lesions in medical images is a key step to assist clinicians in diagnosis and analysis. Most studies combine the Transformer model with CNN at a single scale or use the highest-level feature tensor extracted by CNN as input to Transformer without fully exploiting Transformer’s potential. In addition, to address problems such as blurred structural boundaries and heterogeneous textures in medical images, most existing methods rely on contour information but simply fuse it and ignore the potential relationship between the regions and the contours. We propose the DPCTN network based on the traditional encoding–decoding structure, consisting of CNN and Transformer dual backbone networks and parallel attention mechanisms, to achieve accurate segmentation of lesions in medical images. Local and global multiscale feature information is extracted by CNN and Transformer. The Transformer block of channel cross fusion can implement multiscale information fusion of the high-level local features and reduce the impact of the redundant information. The dual backbone feature fusion module effectively couples the local and global high-level feature information. The decoder refines and enriches the boundary and regional features, layer by layer, to achieve effective supervision of the boundary and region. Considering the possible dimension collapse in the attention mechanism, a novel three-branch transposed self-attention module is designed to reduce the information loss caused by feature pooling. To verify the effectiveness of our proposed method, subjective and objective comparative experiments and ablation experiments were performed on four medical segmentation tasks: polyps, skin lesions, glands and breast tumors. A large number of experimental results show that our method is superior to the current state-of-the-art method, reduces the standard deviation and is more robust.
Source code is released at https://github.com/sd-spf/DPCTN.

EAAI Journal 2020 Journal Article

Short-term natural gas consumption prediction based on Volterra adaptive filter and improved whale optimization algorithm

  • Weibiao Qiao
  • Zhe Yang
  • Zhangyang Kang
  • Zhen Pan

Short-term natural gas consumption prediction is an important input to natural gas pipeline network planning and design. The purpose of this study is to propose a novel hybrid forecast model based on the Volterra adaptive filter and an improved whale optimization algorithm to predict short-term natural gas consumption. Firstly, Gauss smoothing and the C–C method are adopted to pretreat and reconstruct the short-term natural gas consumption time series; secondly, to improve the performance of the whale optimization algorithm, an adaptive search-surround mechanism and spiral position and jumping behavior are introduced into it; thirdly, the Volterra adaptive filter is used to predict the short-term natural gas consumption, and its important parameters (e.g., embedding dimension) are optimized by the improved whale optimization algorithm. Finally, an actual example is given to test the performance of the developed prediction model. The results indicate that (1) the short-term natural gas consumption time series has chaotic characteristics; (2) the performance of the improved whale optimization algorithm is better than that of some comparative algorithms (e.g., the cuckoo optimization algorithm) based on the different evaluation indicators; (3) the exploration factor is the main operational factor; (4) the performance of the proposed prediction model is better than that of some advanced prediction models (e.g., back propagation neural network). It can be concluded that such an innovative hybrid prediction model may provide a reference for natural gas companies to achieve intelligent scheduling.
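The abstract mentions reconstructing the time series and tuning an embedding dimension. The sketch below shows only the generic time-delay (phase-space) reconstruction step that such a pipeline typically relies on, with the embedding dimension and delay supplied by hand. It does not implement the C–C method, the Volterra filter, or the whale optimization algorithm, and the function name `delay_embed` and the toy readings are hypothetical.

```python
def delay_embed(series, dim, delay):
    """Reconstruct phase-space vectors from a scalar time series.

    Each vector is (x[t], x[t + delay], ..., x[t + (dim - 1) * delay]).
    """
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    span = (dim - 1) * delay  # distance between first and last coordinate
    if len(series) <= span:
        raise ValueError("series is too short for this dim and delay")
    return [
        tuple(series[t + k * delay] for k in range(dim))
        for t in range(len(series) - span)
    ]


# Toy consumption readings (made-up values, for illustration only).
readings = [10, 12, 15, 11, 9, 14, 16]
vectors = delay_embed(readings, dim=3, delay=2)
print(vectors)  # [(10, 15, 9), (12, 11, 14), (15, 9, 16)]
```

In a full pipeline, `dim` and `delay` would be chosen by a selection procedure or an optimizer instead of being fixed by hand.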

TCS Journal 2004 Journal Article

Encoding types in ML-like languages

  • Zhe Yang

This article presents several general approaches to programming with type-indexed families of values within a Hindley–Milner type system. A type-indexed family of values is a function that maps a family of types to a family of values. The function performs a case analysis on the input types and returns values of possibly different types. Such a case analysis on types seems to be prohibited by the Hindley–Milner type system. Our approaches solve the problem by using type encodings. The compile-time types of the type encodings reflect the types themselves, thereby making the approaches type-safe, in the sense that the underlying type system statically prevents any mismatch between the input type and the function arguments that depend on this type. A type encoding could be either value-dependent, meaning that the type encoding is tied to a specific type-indexed family, or value-independent, meaning that the type encoding can be shared by various type-indexed families. Our first approach is value-dependent: we simply interpret a type as its corresponding value. Our second approach provides value-independent type encodings through embedding and projection functions; they are universal type interpretations, in that they can be used to compute other type interpretations. We also present an alternative approach to value-independent type encodings, using higher-order functors. We demonstrate our techniques through applications such as C printf-like formatting, type-directed partial evaluation, and subtype coercions.