Arrow Research search

Author name cluster

Ling Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

TIST Journal 2026 Journal Article

Matching Accounts on Blockchain via Pseudo Fine-tuning of Language Models

  • Sihao Hu
  • Tiansheng Huang
  • Fatih Ilhan
  • Selim Furkan Tekin
  • Greg Eisenhauer
  • Margaret L. Loper
  • Ling Liu

Web 3.0, built on blockchain technology, prioritizes user privacy and autonomy, presenting new opportunities for financial systems while also complicating the regulation of illicit activities. In this study, we present a novel infrastructure named Pseudo Fine-tuning (PFT) that provides account matching services to combat financial crimes, such as money laundering through coin-mixing services, on account-based blockchains. The significance of PFT lies in overcoming the need for real labels to fine-tune language models for account matching, given the limited availability of labeled account pairs for the task. Specifically, our design involves (1) crafting pseudo-labeled pairs from transactions of an account across different periods, and (2) fine-tuning language models to distill knowledge from pseudo pairs, which is transferable to the target task. We provide an in-depth analysis to investigate the inherent knowledge acquired during the PFT process and the conditions conducive to its effectiveness. Comprehensive experiments on real-world datasets collected from coin-mixing services and ENS name services corroborate that the framework delivers pronounced enhancements over state-of-the-art approaches. Our implementation is released at https://github.com/git-disl/PFT.
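
The pseudo-pair construction in step (1) can be sketched roughly as follows. This is a minimal illustration with hypothetical helper names, not the authors' implementation; the real PFT pipeline operates on transaction data prepared for language-model fine-tuning:

```python
from itertools import combinations

def craft_pseudo_pairs(histories):
    """Craft pseudo-labeled account pairs without real labels.

    `histories` maps an account id to its time-ordered transaction list.
    Splitting one account's history into two periods yields a positive
    pair (label 1); periods drawn from different accounts yield a
    negative pair (label 0).
    """
    halves = {}
    for acct, txs in histories.items():
        mid = len(txs) // 2
        halves[acct] = (txs[:mid], txs[mid:])

    pairs = []
    for acct, (early, late) in halves.items():
        pairs.append((early, late, 1))                 # same account: positive
    for a, b in combinations(halves, 2):
        pairs.append((halves[a][0], halves[b][1], 0))  # cross-account: negative
    return pairs
```

A model fine-tuned on such pairs learns "do these two transaction histories belong to the same account", which is the knowledge the abstract describes as transferable to real account matching.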

ICLR Conference 2025 Conference Paper

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

  • Aparna Elangovan
  • Lei Xu 0040
  • Jongwoo Ko
  • Mahsa Elyasi
  • Ling Liu
  • Sravan Babu Bodapati
  • Dan Roth 0001

The effectiveness of automatic evaluation of generative models is typically measured by comparing the labels generated via automation with human labels using correlation metrics. However, metrics like Krippendorff's $\alpha$ and Randolph's $\kappa$ were originally designed to measure the reliability of human labeling, and thus make assumptions about typical human labeling behavior that may not be applicable to machine-generated labels. In this paper, we show how *relying on a single aggregate correlation score* can obscure fundamental differences between human labels and those from automatic evaluation, including LLM-as-a-Judge. Specifically, we demonstrate that when the proportion of samples with variation or uncertainty in human-assigned labels is relatively high, machine labels (generated by automatic evaluation methods) may superficially appear to have similar or better correlation with the human majority label than the human-to-human (HH) correlation. This can create the illusion that labels from automatic evaluation approximate the human majority label. However, as the proportion of samples with consistent human labels increases, the correlation between machine and human labels falls well below the HH correlation. Based on these findings, we first propose *stratifying data by human label uncertainty* to provide a more robust analysis of automatic evaluation performance. Second, recognizing that uncertainty and variation are inherent in perception-based human evaluations, such as those involving attitudes or preferences, we introduce a new metric, *binned Jensen-Shannon Divergence for perception*, to better measure the effectiveness of automatic evaluations in such scenarios. Third, we present visualization techniques, *perception charts*, to contextualize correlation measures appropriately and to show the strengths and limitations of automatic evaluation.
We have open-sourced our analysis and visualization tools at https://github.com/amazon-science/BeyondCorrelation.
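
The stratify-then-compare idea, binning samples by human agreement and measuring Jensen-Shannon divergence between the human and machine label distributions inside each bin, might look roughly like this. It is a simplified sketch; the paper's exact binning scheme and metric definition may differ:

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two categorical
    distributions given as dicts mapping label -> probability."""
    labels = set(p) | set(q)
    m = {l: 0.5 * (p.get(l, 0.0) + q.get(l, 0.0)) for l in labels}
    def kl(a, b):
        return sum(a.get(l, 0.0) * math.log2(a[l] / b[l])
                   for l in labels if a.get(l, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def binned_jsd(human_labels, machine_labels, agreement_threshold=1.0):
    """Stratify samples by human agreement, then compare the pooled
    human and machine label distributions within each stratum via JSD.

    `human_labels` is a list of per-sample annotator label lists;
    `machine_labels` is a list of per-sample machine labels.
    """
    bins = {"consistent": ([], []), "uncertain": ([], [])}
    for anns, m in zip(human_labels, machine_labels):
        top = Counter(anns).most_common(1)[0][1] / len(anns)
        key = "consistent" if top >= agreement_threshold else "uncertain"
        bins[key][0].extend(anns)
        bins[key][1].append(m)

    out = {}
    for key, (h, mach) in bins.items():
        if not h or not mach:
            continue
        hp = {l: c / len(h) for l, c in Counter(h).items()}
        mp = {l: c / len(mach) for l, c in Counter(mach).items()}
        out[key] = js_divergence(hp, mp)
    return out
```

Reporting the divergence per stratum rather than one pooled correlation is exactly what prevents high-uncertainty samples from masking disagreement on the consistently labeled ones.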

TIST Journal 2025 Journal Article

Robust Few-Shot Ensemble Learning with Focal Diversity-Based Pruning

  • Selim Furkan Tekin
  • Fatih Ilhan
  • Tiansheng Huang
  • Sihao Hu
  • Margaret Loper
  • Ling Liu

This article presents FusionShot, a focal diversity-optimized few-shot ensemble learning approach for boosting the robustness and generalization performance of pre-trained few-shot models. The article makes three original contributions. First, we explore the unique characteristics of few-shot learning to ensemble multiple few-shot (FS) models by creating three alternative fusion channels. Second, we introduce the concept of focal error diversity to learn the most efficient ensemble teaming strategy, rather than assuming that an ensemble of a larger number of base models will outperform those sub-ensembles of smaller size. We develop a focal diversity ensemble pruning method to effectively prune out the candidate ensembles with low ensemble error diversity and recommend top-K FS ensembles with the highest focal error diversity. Finally, we capture the complex non-linear patterns of ensemble few-shot predictions by designing the learn-to-combine algorithm, which can learn the diverse weight assignments for robust ensemble fusion over different member models. Extensive experiments on representative few-shot benchmarks show that the top-K ensembles recommended by FusionShot can outperform the representative state-of-the-art (SOTA) few-shot models on novel tasks (different distributions and unknown at training) and can prevail over existing few-shot learners in both cross-domain settings and adversarial settings. For reproducibility purposes, FusionShot trained models, results, and code are made available at https://github.com/sftekin/fusionshot.
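
The pruning step, scoring candidate sub-ensembles by an error-diversity measure and recommending the top-K, can be illustrated with a toy pairwise-disagreement proxy. This proxy is a simplification for illustration only, not the paper's focal error diversity metric:

```python
from itertools import combinations

def error_diversity(error_sets, team):
    """Proxy diversity score for an ensemble `team`: averaged over
    model pairs, the fraction of samples where exactly one of the two
    errs, among samples where at least one errs. Identical error sets
    score 0; disjoint error sets score 1."""
    scores = []
    for a, b in combinations(team, 2):
        union = error_sets[a] | error_sets[b]
        if not union:
            scores.append(1.0)
            continue
        overlap = error_sets[a] & error_sets[b]
        scores.append(1.0 - len(overlap) / len(union))
    return sum(scores) / len(scores)

def prune_ensembles(error_sets, size, top_k):
    """Enumerate candidate sub-ensembles of a given size, prune the
    low-diversity teams, and keep the top-k by diversity score."""
    teams = list(combinations(sorted(error_sets), size))
    ranked = sorted(teams, key=lambda t: error_diversity(error_sets, t),
                    reverse=True)
    return ranked[:top_k]
```

The point of the ranking is the one the abstract makes: a small team whose members fail on different samples can beat a larger team whose members repeat each other's errors.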

TIST Journal 2024 Journal Article

Hierarchical Pruning of Deep Ensembles with Focal Diversity

  • Yanzhao Wu
  • Ka-Ho Chow
  • Wenqi Wei
  • Ling Liu

Deep neural network ensembles combine the wisdom of multiple deep neural networks to improve the generalizability and robustness over individual networks. Studying and applying deep ensemble techniques has gained increasing popularity in the deep learning community. Some mission-critical applications utilize a large number of deep neural networks to form deep ensembles to achieve desired accuracy and resilience, which introduces high time and space costs for ensemble execution. However, whether a small subset of the entire deep ensemble can achieve the same or better generalizability, and how to effectively identify such small deep ensembles to improve the space and time efficiency of ensemble execution, remain critical challenges. This article presents a novel deep ensemble pruning approach, which can efficiently identify smaller deep ensembles and provide higher ensemble accuracy than the entire deep ensemble of a large number of member networks. Our hierarchical ensemble pruning approach (HQ) leverages three novel ensemble pruning techniques. First, we show that the focal ensemble diversity metrics can accurately capture the complementary capacity of the member networks of an ensemble team, which can guide ensemble pruning. Second, we design a focal ensemble diversity based hierarchical pruning approach, which iteratively finds high-quality deep ensembles with low cost and high accuracy. Third, we develop a focal diversity consensus method to integrate multiple focal diversity metrics to refine ensemble pruning results, where smaller deep ensembles can be effectively identified to offer high accuracy, high robustness, and high ensemble execution efficiency. Evaluated using popular benchmark datasets, we demonstrate that the proposed hierarchical ensemble pruning approach can effectively identify high-quality deep ensembles with better classification generalizability while being more time- and space-efficient in ensemble decision making. 
We have released the source code on GitHub at https://github.com/git-disl/HQ-Ensemble.
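
The consensus step, integrating multiple focal diversity metrics to refine the pruning result, could be approximated by simple rank aggregation. This is a hypothetical stand-in for the paper's consensus method, shown only to make the idea concrete:

```python
def diversity_consensus(metric_scores):
    """Combine several diversity metrics by averaging each candidate
    ensemble's rank under every metric (lower mean rank = better).

    `metric_scores` maps metric name -> {ensemble: score}, where a
    higher score means more diverse under that metric. Returns the
    candidate ensembles ordered best-first by consensus.
    """
    ensembles = list(next(iter(metric_scores.values())))
    mean_rank = {}
    for ens in ensembles:
        ranks = []
        for scores in metric_scores.values():
            ordered = sorted(scores, key=scores.get, reverse=True)
            ranks.append(ordered.index(ens))      # rank under this metric
        mean_rank[ens] = sum(ranks) / len(ranks)
    return sorted(ensembles, key=mean_rank.get)
```

An ensemble that ranks highly under every diversity metric is a safer pruning keep than one that happens to top a single metric.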

NeurIPS Conference 2024 Conference Paper

Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

  • Tiansheng Huang
  • Sihao Hu
  • Fatih Ilhan
  • Selim F. Tekin
  • Ling Liu

Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-break effect can be mitigated by separating the fine-tuning stage into two states that respectively optimize over the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when the number of steps invested in its alignment state is too small, leading to downgraded alignment performance. By statistical analysis, we show that the *excess drift* towards the switching iterates of the two states could be a probable reason for the instability. To remedy this issue, we propose Lazy safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.
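
The proximal idea, penalizing drift from the iterate at the last state switch while alternating between the two states, can be sketched on a toy problem. The quadratic losses and all hyperparameters below are illustrative assumptions, not the paper's LLM training setup:

```python
import numpy as np

def lisa_steps(grad_align, grad_user, theta0, rho=1.0, lr=0.1,
               rounds=3, steps_per_state=5):
    """Alternate between an alignment state and a user fine-tuning
    state; within each state, add a proximal pull rho * (theta - anchor)
    toward the switching iterate (the parameters at the last state
    switch) to limit how far each state can drift."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(rounds):
        for grad in (grad_align, grad_user):
            anchor = theta.copy()              # switching iterate
            for _ in range(steps_per_state):
                g = grad(theta) + rho * (theta - anchor)
                theta = theta - lr * g
    return theta
```

With two states pulling toward conflicting optima, a larger `rho` keeps each state's iterates closer to the switching point, which is the stabilizing effect the abstract attributes to the proximal term.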

TIST Journal 2024 Journal Article

TS-Fastformer: Fast Transformer for Time-series Forecasting

  • Sangwon Lee
  • Junho Hong
  • Ling Liu
  • Wonik Choi

Many real-world applications require precise and fast time-series forecasting. Recent trends in time-series forecasting models are shifting from LSTM-based models to Transformer-based models. However, the Transformer-based model has a limited ability to represent sequential relationships in time-series data. In addition, the Transformer-based model suffers from slow training and inference due to the bottleneck incurred by a deep encoder and step-by-step decoder inference. To address these problems, we propose a Transformer model optimized for time-series forecasting, called TS-Fastformer. TS-Fastformer introduces three new optimizations. First, we propose the Sub Window Tokenizer for compressing input in a simple manner. The Sub Window Tokenizer reduces the length of input sequences to mitigate the complexity of self-attention and enables both single- and multi-sequence learning. Second, we propose the Time-series Pre-trained Encoder to extract effective representations through pre-training. This optimization enables TS-Fastformer to capture both seasonal and trend representations as well as to mitigate the bottlenecks of conventional Transformer models. Third, we propose the Past Attention Decoder to forecast the target by incorporating past long- and short-term dependency patterns. Furthermore, the Past Attention Decoder achieves a high performance improvement by removing a trend distribution that changes over a long period. We evaluate the efficiency of our model with extensive experiments using seven real-world datasets and compare our model to six representative time-series forecasting approaches. The results show that the proposed TS-Fastformer reduces MSE by 10.1% compared to the state-of-the-art model and trains 21.6% faster than the fastest existing Transformer.
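
The length compression performed by the Sub Window Tokenizer can be illustrated with mean pooling over non-overlapping windows. Mean pooling is an assumption made here for illustration; the paper's tokenizer details may differ:

```python
def sub_window_tokenize(sequence, window):
    """Compress a long input sequence into non-overlapping sub-window
    tokens: each token summarizes `window` consecutive points (mean
    pooling here), shrinking the length that self-attention must cover
    from O(n) to O(n / window). Trailing points that do not fill a
    complete window are dropped."""
    n = len(sequence) // window
    return [sum(sequence[i * window:(i + 1) * window]) / window
            for i in range(n)]
```

Since self-attention cost grows quadratically with sequence length, shortening the token sequence by a factor of `window` cuts attention cost by roughly `window**2`.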

NeurIPS Conference 2024 Conference Paper

Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack

  • Tiansheng Huang
  • Sihao Hu
  • Ling Liu

The new paradigm of fine-tuning-as-a-service introduces a new attack surface for Large Language Models (LLMs): a few harmful data points uploaded by users can easily trick the fine-tuning into producing an alignment-broken model. We conduct an empirical analysis and uncover a *harmful embedding drift* phenomenon, showing a probable cause of the alignment-broken effect. Inspired by our findings, we propose Vaccine, a perturbation-aware alignment technique to mitigate the security risk of user fine-tuning. The core idea of Vaccine is to produce invariant hidden embeddings by progressively adding crafted perturbations to them in the alignment phase. This enables the embeddings to withstand harmful perturbation from unsanitized user data in the fine-tuning phase. Our results on mainstream open-source LLMs (e.g., Llama2, OPT, Vicuna) demonstrate that Vaccine can boost the robustness of alignment against embedding drift induced by harmful prompts while preserving reasoning ability on benign prompts. Our code is available at https://github.com/git-disl/Vaccine.
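
The perturbation-aware step, crafting a worst-case perturbation of the hidden embedding and then updating the weights against the loss at the perturbed embedding, might be sketched on a toy linear model. Everything below (the linear "embedding", the squared-error loss, all names) is an illustrative assumption, not the paper's method:

```python
import numpy as np

def perturbation_aware_step(w, x, y, eps=0.1, lr=0.05):
    """One toy alignment step on a linear 'embedding' model e = w @ x.

    Craft a perturbation of the embedding in the direction that
    increases the loss (the gradient w.r.t. the embedding, scaled to
    norm `eps`), then update the weights against the loss at the
    perturbed embedding, so the learned representation stays close to
    the target even when the embedding drifts."""
    e = w @ x                        # hidden embedding
    grad_e = 2 * (e - y)             # dL/de for squared error
    norm = np.linalg.norm(grad_e) + 1e-12
    delta = eps * grad_e / norm      # crafted worst-case perturbation
    e_adv = e + delta
    grad_w = np.outer(2 * (e_adv - y), x)   # dL/dw at perturbed embedding
    return w - lr * grad_w
```

Training against the perturbed embedding rather than the clean one is what gives the representation slack to absorb the drift later induced by unsanitized fine-tuning data.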

NeurIPS Conference 2023 Conference Paper

Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training

  • Tiansheng Huang
  • Sihao Hu
  • Ka-Ho Chow
  • Fatih Ilhan
  • Selim Tekin
  • Ling Liu

Federated learning (FL) is vulnerable to backdoor attacks due to its distributed computing nature. Existing defense solutions usually require a large amount of computation in either the training or the testing phase, which limits their practicality in resource-constrained scenarios. A more practical defense, i.e., a neural network (NN) pruning-based defense, has been proposed in the centralized backdoor setting. However, our empirical study shows that traditional pruning-based solutions suffer from a *poison-coupling* effect in FL, which significantly degrades the defense performance. This paper presents Lockdown, an isolated subspace training method to mitigate the poison-coupling effect. Lockdown follows three key procedures. First, it modifies the training protocol by isolating the training subspaces for different clients. Second, it utilizes randomness in initializing isolated subspaces, and performs subspace pruning and subspace recovery to segregate the subspaces of malicious and benign clients. Third, it introduces quorum consensus to cure the global model by purging malicious/dummy parameters. Empirical results show that Lockdown achieves *superior* and *consistent* defense performance compared to existing representative approaches against backdoor attacks. Another value-added property of Lockdown is communication efficiency and model complexity reduction, both of which are critical for resource-constrained FL scenarios. Our code is available at https://github.com/git-disl/Lockdown.
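
The quorum-consensus cure over isolated subspaces can be sketched as mask-aware aggregation. This is a simplified stand-in that omits the subspace pruning and recovery procedures; parameter layout and the averaging rule are illustrative assumptions:

```python
import numpy as np

def quorum_consensus(client_params, client_masks, quorum):
    """Aggregate isolated-subspace client updates into a global model.

    Each client trains only the coordinates selected by its binary
    mask. Average each parameter over the clients whose mask covers
    it, then purge (zero out) parameters covered by fewer than
    `quorum` clients, since backdoor/dummy weights tend to live in
    coordinates that few clients agree on."""
    masks = np.array(client_masks, dtype=float)
    params = np.array(client_params, dtype=float)
    votes = masks.sum(axis=0)                 # how many clients cover each coord
    total = (params * masks).sum(axis=0)
    avg = np.divide(total, votes, out=np.zeros_like(total),
                    where=votes > 0)
    return np.where(votes >= quorum, avg, 0.0)
```

In the toy usage below, the outlier weight 9.0 survives in only one client's subspace, so the quorum of 2 purges it while the broadly agreed coordinates are kept.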

TIST Journal 2023 Journal Article

Selecting and Composing Learning Rate Policies for Deep Neural Networks

  • Yanzhao Wu
  • Ling Liu

The choice of learning rate (LR) functions and policies has evolved from a simple fixed LR to the decaying LR and the cyclic LR, aiming to improve the accuracy and reduce the training time of Deep Neural Networks (DNNs). This article presents a systematic approach to selecting and composing an LR policy for effective DNN training to meet desired target accuracy and reduce training time within the pre-defined training iterations. It makes three original contributions. First, we develop an LR tuning mechanism for auto-verification of a given LR policy with respect to the desired accuracy goal under the pre-defined training time constraint. Second, we develop an LR policy recommendation system (LRBench) to select and compose good LR policies from the same and/or different LR functions through dynamic tuning, and avoid bad choices, for a given learning task, DNN model, and dataset. Third, we extend LRBench by supporting different DNN optimizers and show the significant mutual impact of different LR policies and different optimizers. Evaluated using popular benchmark datasets and different DNN models (LeNet, CNN3, ResNet), we show that our approach can effectively deliver high DNN test accuracy, outperform the existing recommended default LR policies, and reduce the DNN training time by 1.6-6.7× to meet a targeted model accuracy.
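
Composing an LR policy from segments of different LR functions, as the article describes, can be sketched as piecewise scheduling. The policy names and the `(start_iter, policy)` composition format below are hypothetical, not the LRBench API:

```python
import math

def fixed_lr(k0):
    """Constant learning rate."""
    return lambda t: k0

def decay_lr(k0, gamma):
    """Exponential decay: k0 * gamma**t."""
    return lambda t: k0 * gamma ** t

def cyclic_lr(k_min, k_max, period):
    """Triangular cyclic LR oscillating between k_min and k_max."""
    def policy(t):
        phase = abs((t % period) / (period / 2) - 1.0)   # 1 -> 0 -> 1
        return k_min + (k_max - k_min) * (1.0 - phase)
    return policy

def compose(schedule):
    """Compose LR policies: `schedule` is a list of (start_iter,
    policy) pairs; the policy with the latest start not after t
    applies, with t re-based to that segment's start."""
    def policy(t):
        start = max(s for s, _ in schedule if s <= t)
        return dict(schedule)[start](t - start)
    return policy
```

For example, `compose([(0, fixed_lr(0.1)), (10, decay_lr(0.1, 0.5)), (20, cyclic_lr(0.01, 0.1, 10))])` warms up at a fixed rate, decays, then cycles, which is the kind of mixed policy the article evaluates.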

IROS Conference 2018 Conference Paper

HERO: Accelerating Autonomous Robotic Tasks with FPGA

  • Xuesong Shi
  • Lu Cao
  • Dawei Wang
  • Ling Liu
  • Ganmei You
  • Shuang Liu
  • Chunjie Wang

The Heterogeneous Extensible Robot Open (HERO) platform is designed for autonomous robotics research. While bringing in flexible computational capacity through its CPU and FPGA, it addresses the challenges of heterogeneous computing by embracing OpenCL programming. We propose heterogeneous computing approaches for three fundamental robotic tasks: simultaneous localization and mapping (SLAM), motion planning, and convolutional neural network (CNN) inference. With FPGA acceleration, the SLAM and motion planning tasks run 2–4 times faster on the HERO platform than fine-tuned software implementations. For CNN inference, the platform can process 20–30 images per second with VGG-16 or ResNet-50. We expect that the open platform and the development experience shared in this paper can facilitate future robotics research, especially for compute-intensive tasks in perception, movement, and manipulation.

TIST Journal 2013 Journal Article

Mondrian tree

  • Myungcheol Doo
  • Ling Liu

With ubiquitous wireless connectivity and technological advances in mobile devices, we witness the growing demand and increasing market share of mobile intelligent systems and technologies for real-time decision making and location-based knowledge discovery. Spatial alarms are considered one of the fundamental capabilities of intelligent mobile location-based systems. Like time-based alarms that remind us of the arrival of a future time point, spatial alarms remind us of the arrival at a future spatial point. Existing approaches for scaling spatial alarm processing focus on computing Alarm-Free Regions (AFRs) and Alarm-Free Periods (AFPs) such that mobile objects traveling within an AFR can safely hibernate the alarm evaluation process for the computed AFP, to save battery power, until approaching the nearest alarm of interest. A key technical challenge in scaling spatial alarm processing is to compute AFRs and AFPs efficiently while maintaining high accuracy. In this article we argue that on-demand computation of AFRs is expensive and may not scale well to dense populations of mobile objects. Instead, we propose to maintain an index over both spatial alarms and empty regions (AFRs) such that, for a given mobile user's location, we can more efficiently find the relevant spatial alarms and determine whether the user is in an alarm-free region. We also show that conventional spatial indexing methods, such as the R-tree family, k-d tree, Quadtree, and Grid, are by design not well suited to indexing empty regions. We present the Mondrian Tree, a region-partitioning tree for indexing both spatial alarms and alarm-free regions. We first introduce the Mondrian Tree indexing algorithms, including index construction, search, and maintenance. Then we describe a suite of Mondrian Tree optimizations to further enhance the performance of spatial alarm processing. 
Our experimental evaluation shows that the Mondrian Tree index, as an intelligent technology for mobile systems, outperforms traditional index methods, such as the R-tree, Quadtree, and k-d tree, for spatial alarm processing.
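
The core idea of indexing both alarms and alarm-free regions can be sketched with guillotine splits of empty regions around alarm rectangles. This flat-list sketch is a stand-in for the tree, not the Mondrian Tree algorithms themselves; all function names are illustrative:

```python
def split_region(region, alarm):
    """Guillotine-split an empty region around an overlapping alarm
    rectangle, returning the up-to-four empty pieces that remain.
    Rectangles are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    rx1, ry1, rx2, ry2 = region
    ax1, ay1, ax2, ay2 = alarm
    ax1, ay1 = max(ax1, rx1), max(ay1, ry1)   # clip alarm to region
    ax2, ay2 = min(ax2, rx2), min(ay2, ry2)
    if ax1 >= ax2 or ay1 >= ay2:
        return [region]                        # no overlap: keep whole
    pieces = []
    if rx1 < ax1: pieces.append((rx1, ry1, ax1, ry2))   # left strip
    if ax2 < rx2: pieces.append((ax2, ry1, rx2, ry2))   # right strip
    if ry1 < ay1: pieces.append((ax1, ry1, ax2, ay1))   # bottom strip
    if ay2 < ry2: pieces.append((ax1, ay2, ax2, ry2))   # top strip
    return pieces

def build_index(space, alarms):
    """Partition `space` into alarm cells plus alarm-free cells."""
    empty = [space]
    for alarm in alarms:
        empty = [p for r in empty for p in split_region(r, alarm)]
    return empty

def locate(empty, alarms, point):
    """Point query: return the matching alarm, or the alarm-free
    cell in which the mobile object can safely hibernate."""
    px, py = point
    inside = lambda r: r[0] <= px < r[2] and r[1] <= py < r[3]
    for a in alarms:
        if inside(a):
            return ("alarm", a)
    for r in empty:
        if inside(r):
            return ("free", r)
    return None
```

Because every point falls in exactly one cell, a single lookup answers both questions the abstract poses: which alarms are relevant, and whether the user sits inside an alarm-free region.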