Arrow Research search

Author name cluster

Kai Tian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

NeurIPS Conference 2025 Conference Paper

Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

  • Junqi Gao
  • Zhichang Guo
  • Dazhi Zhang
  • Dong Li
  • Runze Liu
  • Pengfei Li
  • Kai Tian
  • Biqing Qi

Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domains for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, which fail to adjust dynamically to the target LLM's varying capabilities across domains and lead to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. By organizing knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during the target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities.
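
The adaptive allocation loop described in the abstract (bandit-style sampling proportions driven by per-domain pass/fail feedback, with a sliding-window binomial likelihood-ratio test to detect capability shifts) can be illustrated with a minimal Python sketch. The window size, detection threshold, and softmax-style allocation below are illustrative assumptions, not the paper's actual DynaBranches/SWBLRT implementation.

```python
import math
from collections import defaultdict, deque

def binomial_llr(successes, n, p_ref):
    """Log-likelihood ratio of the window's empirical pass rate vs. a reference rate."""
    if n == 0:
        return 0.0
    p_hat = min(max(successes / n, 1e-6), 1 - 1e-6)
    p_ref = min(max(p_ref, 1e-6), 1 - 1e-6)
    return (successes * math.log(p_hat / p_ref)
            + (n - successes) * math.log((1 - p_hat) / (1 - p_ref)))

class SlidingWindowTracker:
    """Per-domain sliding window of pass/fail outcomes with an LLR-based shift test."""
    def __init__(self, window=50, threshold=3.0):
        self.window, self.threshold = window, threshold
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.reference_rate = defaultdict(lambda: 0.5)  # assumed baseline pass rate

    def update(self, domain, passed):
        self.history[domain].append(1 if passed else 0)

    def shift_detected(self, domain):
        w = self.history[domain]
        llr = binomial_llr(sum(w), len(w), self.reference_rate[domain])
        return abs(llr) > self.threshold

def sampling_proportions(domains, tracker, temperature=1.0):
    """Allocate more synthetic data to domains where the target model is weaker."""
    weakness = {d: 1.0 - (sum(tracker.history[d]) / max(len(tracker.history[d]), 1))
                for d in domains}
    z = sum(math.exp(v / temperature) for v in weakness.values())
    return {d: math.exp(v / temperature) / z for d, v in weakness.items()}
```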

NeurIPS Conference 2025 Conference Paper

DePass: Unified Feature Attributing by Simple Decomposed Forward Pass

  • Xiangyu Hong
  • Che Jiang
  • Kai Tian
  • Biqing Qi
  • Youbang Sun
  • Ning Ding
  • Bowen Zhou

Attributing the behavior of Transformer models to internal computations is a central challenge in mechanistic interpretability. We introduce DePass, a unified framework for feature attribution based on a single decomposed forward pass. DePass decomposes hidden states into customized additive components, then propagates them with attention scores and MLP activations held fixed. It achieves faithful, fine-grained attribution without requiring auxiliary training. We validate DePass across token-level, model component-level, and subspace-level attribution tasks, demonstrating its effectiveness and fidelity. Our experiments highlight its potential to attribute information flow between arbitrary components of a Transformer model. We hope DePass serves as a foundational tool for broader applications in interpretability.
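
The decomposed forward pass described above can be illustrated for one simplified Transformer block: the hidden state is split into additive components, and each component is propagated with the attention pattern and the MLP activation mask frozen from the ordinary forward pass, so every operation is linear and the components keep summing to the full hidden state. The single-head, ReLU-gated, no-layer-norm setup below is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def decomposed_block(components, attn_weights, W_v, W_o, W_in, W_out, mlp_gate):
    """
    components : (C, T, d) additive decomposition of the hidden states
                 (their sum equals the original hidden state).
    attn_weights : (T, T) attention pattern from the ordinary forward pass (frozen).
    mlp_gate : (T, d_ff) activation pattern (e.g. a 0/1 ReLU mask) from the ordinary
               forward pass (frozen), so the MLP acts linearly on each component.
    Returns the decomposition after attention and MLP; because every frozen-pattern
    operation is linear, the output components still sum to the full block output.
    """
    out = []
    for c in components:
        attn_c = attn_weights @ (c @ W_v) @ W_o   # attention with frozen scores
        h = c + attn_c                            # residual stream, componentwise
        mlp_c = (mlp_gate * (h @ W_in)) @ W_out   # MLP with frozen activations
        out.append(h + mlp_c)
    return np.stack(out)
```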

NeurIPS Conference 2024 Conference Paper

Exploring Adversarial Robustness of Deep State Space Models

  • Biqing Qi
  • Yiang Luo
  • Junqi Gao
  • Pengfei Li
  • Kai Tian
  • Zhiyuan Ma
  • Bowen Zhou

Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO.
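
For context, the Adversarial Training referenced above is usually instantiated as min-max training on PGD-perturbed inputs; the sketch below shows that standard procedure with illustrative step sizes and iteration counts, not the paper's AdS mechanism.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent: find a worst-case perturbation in an L-inf ball."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep inputs in a valid range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One AT step: train the model on the adversarially perturbed batch."""
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```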

AAAI Conference 2021 Conference Paper

GIF Thumbnails: Attract More Clicks to Your Videos

  • Yi Xu
  • Fan Bai
  • Yingxuan Shi
  • Qiuyu Chen
  • Longwen Gao
  • Kai Tian
  • Shuigeng Zhou
  • Huyang Sun

With the rapid increase of mobile devices and online media, more and more people prefer posting and viewing videos online. Generally, these videos are presented on video streaming sites with image thumbnails and text titles. Facing huge numbers of videos, a viewer is far more likely to click through to a video with an eye-catching thumbnail. However, current video thumbnails are created manually, which is time-consuming and offers no quality guarantee. Moreover, static image thumbnails carry very limited information about the corresponding videos, which prevents users from finding what they really want to view. In this paper, we address a novel problem, GIF thumbnail generation, which aims to automatically generate GIF thumbnails for videos and consequently boost their Click-Through Rate (CTR). Here, a GIF thumbnail is an animated GIF file consisting of multiple segments from the video, and it conveys more information about the target video than a static image thumbnail. To support this study, we build the first GIF thumbnail benchmark dataset, consisting of 1070 videos covering a total duration of 69.1 hours and 5394 corresponding manually annotated GIFs. To solve this problem, we propose a learning-based automatic GIF thumbnail generation model called Generative Variational Dual-Encoder (GEVADEN). Since it does not rely on any user interaction information (e.g., time-sync comments and real-time view counts), the model is applicable to newly uploaded and rarely viewed videos. Experiments on our dataset show that GEVADEN significantly outperforms several baselines, including video-summarization and highlight-detection based ones. Furthermore, we develop a pilot application of the proposed model on an online video platform with 9814 videos covering 1231 hours, where our model achieves a 37.5% CTR improvement over traditional image thumbnails. This further validates the effectiveness of the proposed model and the promising application prospects of GIF thumbnails.

AAAI Conference 2020 Conference Paper

Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance

  • Kai Tian
  • Yi Xu
  • Jihong Guan
  • Shuigeng Zhou

Despite their powerful representation ability, deep neural networks (DNNs) are prone to over-fitting because of over-parametrization. Existing works have explored various regularization techniques to tackle the over-fitting problem. Some of them employ soft targets rather than one-hot labels to guide network training (e.g., label smoothing in classification tasks); we call these target-based regularization approaches in this paper. To alleviate the over-fitting problem, we propose a new and general regularization framework that introduces an auxiliary network to dynamically incorporate guided semantic disturbance into the labels. We call it Network as Regularization (NaR for short). During training, the disturbance is constructed as a convex combination of the predictions of the target network and the auxiliary network. The two networks are initialized separately, and the auxiliary network is trained independently of the target network while progressively providing instance-level and class-level semantic information to the latter. We conduct extensive experiments to validate the effectiveness of the proposed method. Experimental results show that NaR outperforms many state-of-the-art target-based regularization methods, and that other regularization approaches (e.g., mixup) can also benefit from being combined with NaR.
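
The abstract states only that the label disturbance is a convex combination of the two networks' predictions; below is a minimal sketch of one plausible loss of this form, where the mixing weights alpha and beta and the way the disturbance is blended with the one-hot label are assumptions for illustration, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def nar_loss(target_logits, aux_logits, labels, alpha=0.7, beta=0.9):
    """
    Cross-entropy against a softened target: the one-hot label mixed with a
    convex combination of the target and auxiliary networks' predictions.
    alpha and beta are illustrative mixing weights, not the paper's values.
    """
    with torch.no_grad():
        p_target = F.softmax(target_logits, dim=-1)
        p_aux = F.softmax(aux_logits, dim=-1)
        disturbance = alpha * p_aux + (1 - alpha) * p_target   # convex combination
        one_hot = F.one_hot(labels, num_classes=target_logits.size(-1)).float()
        soft_target = beta * one_hot + (1 - beta) * disturbance
    log_q = F.log_softmax(target_logits, dim=-1)
    return -(soft_target * log_q).sum(dim=-1).mean()
```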

AAAI Conference 2019 Conference Paper

Learning Competitive and Discriminative Reconstructions for Anomaly Detection

  • Kai Tian
  • Shuigeng Zhou
  • Jianping Fan
  • Jihong Guan

Most existing methods for anomaly detection use only positive data to learn the data distribution, so they usually need a pre-defined threshold at the detection stage to determine whether a test instance is an outlier. Unfortunately, a good threshold is vital for performance, and it is hard to find an optimal one. In this paper, we take into consideration the discriminative information implied in unlabeled data and propose a new method for anomaly detection that can learn the labels of unlabeled data directly. Our proposed method has an end-to-end architecture with one encoder and two decoders that are trained to model the data distributions of inliers and outliers in a competitive way. This architecture works in a discriminative manner without suffering from overfitting, and the training algorithm of our model is adapted from SGD, so it is efficient and scalable even for large-scale datasets. Empirical studies on 7 datasets, including KDD99, MNIST, Caltech-256, and ImageNet, show that our model outperforms state-of-the-art methods.
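
A minimal sketch of the one-encoder, two-decoder competitive setup described above is shown below; the layer sizes, the hard winner-take-all assignment, and the mean-squared reconstruction error are illustrative assumptions rather than the paper's exact model.

```python
import torch
import torch.nn as nn

class CompetitiveReconstructor(nn.Module):
    """One shared encoder and two decoders that compete to reconstruct each sample.
    Samples won by the inlier decoder are labeled normal, the rest anomalous."""
    def __init__(self, in_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.dec_inlier = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                        nn.Linear(128, in_dim))
        self.dec_outlier = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                         nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.dec_inlier(z), self.dec_outlier(z)

def competitive_step(model, optimizer, x):
    """One SGD step: each sample's loss comes only from the decoder that wins it."""
    rec_in, rec_out = model(x)
    err_in = ((rec_in - x) ** 2).mean(dim=1)
    err_out = ((rec_out - x) ** 2).mean(dim=1)
    is_outlier = (err_out < err_in).float()           # winner-take-all assignment
    loss = ((1 - is_outlier) * err_in + is_outlier * err_out).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return is_outlier                                 # predicted labels for the batch
```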