Arrow Research search

Author name cluster

Kuan Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

ICML 2025 Conference Paper

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

  • Yuan Zhang 0020
  • Chun-Kai Fan
  • Junpeng Ma
  • Wenzhao Zheng
  • Tao Huang 0020
  • Kuan Cheng
  • Denis A. Gudovskiy
  • Tomoyuki Okuno

In vision-language models (VLMs), visual tokens usually incur significant computational overhead despite carrying sparser information than text tokens. To address this, most existing methods learn a network to prune redundant visual tokens and require additional training data. In contrast, we propose an efficient training-free token optimization mechanism dubbed SparseVLM, with no extra parameters or fine-tuning costs. Concretely, given that visual tokens complement text tokens in VLMs for linguistic reasoning, we select visual-relevant text tokens to rate the significance of vision tokens within the self-attention matrix extracted from the VLMs. Then we progressively prune irrelevant tokens. To maximize sparsity while retaining essential information, we introduce a rank-based strategy to adaptively determine the sparsification ratio for each layer, alongside a token recycling method that compresses pruned tokens into more compact representations. Experimental results show that SparseVLM improves the efficiency of various VLMs across a range of image and video understanding tasks. In particular, when LLaVA is equipped with SparseVLM, it achieves a 54% reduction in FLOPs, lowers CUDA time by 37%, and maintains an accuracy rate of 97%. Our code is available at https://github.com/Gumpest/SparseVLMs.
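The attention-guided scoring the abstract describes can be sketched in a few lines. Everything below (the function name, averaging the text rows of the attention matrix, the fixed keep ratio) is an illustrative assumption, not the paper's actual implementation, which additionally uses per-layer adaptive ratios and token recycling:

```python
def prune_visual_tokens(attn, text_idx, visual_idx, keep_ratio=0.5):
    """Illustrative sketch of attention-guided visual token pruning.

    attn: square self-attention matrix as a list of rows, where
          attn[i][j] is how much token i attends to token j.
    text_idx / visual_idx: positions of the selected text tokens and of
          the visual tokens in the sequence.
    Returns the (sorted) indices of the visual tokens to keep.
    """
    # Score each visual token by the mean attention it receives from
    # the selected (visual-relevant) text tokens.
    scores = []
    for j in visual_idx:
        s = sum(attn[i][j] for i in text_idx) / len(text_idx)
        scores.append((s, j))
    # Keep only the highest-scoring fraction of visual tokens.
    n_keep = max(1, int(len(visual_idx) * keep_ratio))
    kept = sorted(scores, key=lambda t: -t[0])[:n_keep]
    return sorted(j for _, j in kept)
```

For example, if two text tokens both attend strongly to one visual token, that token survives pruning while low-attention visual tokens are dropped.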

NeurIPS 2024 Conference Paper

Unveiling the Tapestry of Consistency in Large Vision-Language Models

  • Yuan Zhang
  • Fei Xiao
  • Tao Huang
  • Chun-Kai Fan
  • Hongyuan Dong
  • Jiawen Li
  • Jiacong Wang
  • Kuan Cheng

Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts whose solution spaces differ in size, LVLMs fail to give consistent answers regarding the same knowledge point. This inconsistency of answers across solution spaces is prevalent in LVLMs and erodes trust. To this end, we provide a multi-modal benchmark, ConBench, to intuitively analyze how LVLMs perform when the solution space of a prompt revolves around a knowledge point. Based on the ConBench tool, we are the first to reveal this tapestry, with the following findings: (1) In the discriminative realm, the larger the solution space of the prompt, the lower the accuracy of the answers. (2) Connecting the discriminative and generative realms, the accuracy of a discriminative question type exhibits a strong positive correlation with its consistency with the caption. (3) Compared to open-source models, closed-source models exhibit a pronounced advantage in consistency. Finally, we improve the consistency of LVLMs via trigger-based diagnostic refinement, indirectly improving the quality of their captions. We hope this paper will help the research community better evaluate their models and encourage future advances in the consistency domain.

ICLR 2023 Conference Paper

On The Relative Error of Random Fourier Features for Preserving Kernel Distance

  • Kuan Cheng
  • Shaofeng H.-C. Jiang
  • Luojian Wei
  • Zhide Wei

The method of random Fourier features (RFF), proposed in a seminal paper by Rahimi and Recht (NIPS'07), is a powerful technique for finding approximate low-dimensional representations of points in (high-dimensional) kernel space, for shift-invariant kernels. While RFF has been analyzed under various notions of error guarantee, its ability to preserve the kernel distance with relative error is less understood. We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. We complement this by showing that as long as the shift-invariant kernel is analytic, RFF with poly(ε⁻¹ log n) dimensions achieves ε-relative error for the pairwise kernel distances of n points, and the dimension bound improves to poly(ε⁻¹ log k) for the specific application of kernel k-means. Finally, going beyond RFF, we take a first step towards data-oblivious dimension reduction for general shift-invariant kernels, obtaining a similar poly(ε⁻¹ log n) dimension bound for Laplacian kernels. We also validate the dimension-error tradeoff of our methods on simulated datasets, where they demonstrate superior performance compared with other popular methods, including random projection and the Nyström method.
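As background, the basic RFF construction is short enough to sketch. The version below targets the Gaussian kernel k(x, y) = exp(−‖x − y‖²/2), using the standard map z_i(x) = √(2/D)·cos(w_i·x + b_i) with w_i ~ N(0, I) and b_i ~ Uniform[0, 2π]; it is a generic illustration of the kind of feature map the paper analyzes, not the paper's own construction, and all names here are ours:

```python
import math
import random

def rff_map(x, ws, bs):
    # Random Fourier feature map for the Gaussian kernel: the inner
    # product z(x).z(y) approximates k(x, y) in expectation.
    D = len(ws)
    return [math.sqrt(2.0 / D)
            * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)]

def gaussian_kernel(x, y):
    # Exact shift-invariant kernel value exp(-||x - y||^2 / 2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)

random.seed(0)
d, D = 3, 4000  # input dimension, number of random features
ws = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
bs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

x, y = [0.2, -0.1, 0.4], [0.1, 0.3, 0.0]
approx = sum(a * b for a, b in zip(rff_map(x, ws, bs), rff_map(y, ws, bs)))
exact = gaussian_kernel(x, y)
```

The additive error shrinks like O(1/√D); the paper's question is precisely how large D must be when one instead demands *relative* error on the kernel distance.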

SODA 2021 Conference Paper

Efficient Linear and Affine Codes for Correcting Insertions/Deletions

  • Kuan Cheng
  • Venkatesan Guruswami
  • Bernhard Haeupler
  • Xin Li 0006

This paper studies linear and affine error-correcting codes for correcting synchronization errors such as insertions and deletions. We call such codes linear/affine insdel codes. Linear codes that can correct even a single deletion are limited to an information rate of at most 1/2 (achieved by the trivial 2-fold repetition code). Previously it was (erroneously) reported that, more generally, no non-trivial linear codes correcting k deletions exist, i.e., that the (k+1)-fold repetition code and its rate of 1/(k+1) are basically optimal for any k. We disprove this and show the existence of binary linear codes of length n and rate just below 1/2 capable of correcting Ω(n) insertions and deletions. This identifies rate 1/2 as a sharp threshold for recovery from deletions for linear codes, and reopens the quest for a better understanding of the capabilities of linear codes for correcting insertions/deletions. We prove novel outer bounds and existential inner bounds for the rate vs. (edit) distance trade-off of linear insdel codes. We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically good linear code for Hamming errors into an asymptotically good linear code for insdel errors. Lastly, we show that the rate-1/2 limitation does not hold for affine codes by giving an explicit affine code of rate 1 − ε which can efficiently correct a constant fraction of insdel errors.
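The trivial baseline the abstract mentions, the (k+1)-fold repetition code, is easy to make concrete. The decoder below is an illustrative sketch, not anything from the paper: since every run of k+1 repeated symbols loses at most k of them, no run vanishes or merges, so each received run of length L must have come from ⌈L/(k+1)⌉ message bits:

```python
from itertools import groupby

def rep_encode(msg, k):
    # (k+1)-fold repetition: repeat each message bit k+1 times.
    return "".join(c * (k + 1) for c in msg)

def rep_decode(received, k):
    # Decode after at most k total deletions: round each received
    # run length L up to ceil(L / (k+1)) original message bits.
    out = []
    for c, grp in groupby(received):
        L = len(list(grp))
        out.append(c * ((L + k) // (k + 1)))
    return "".join(out)
```

For instance, with k = 2 the message "0110" encodes to "000111111000", and any version of it with up to two deletions decodes back to "0110". The code's rate 1/(k+1) is exactly the bound the paper shows is not optimal for k > 1.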

FOCS 2021 Conference Paper

Exponential Lower Bounds for Locally Decodable and Correctable Codes for Insertions and Deletions

  • Jeremiah Blocki
  • Kuan Cheng
  • Elena Grigorescu
  • Xin Li 0006
  • Yu Zheng 0014
  • Minshen Zhu

Locally Decodable Codes (LDCs) are error-correcting codes for which individual message symbols can be quickly recovered despite errors in the codeword. LDCs for Hamming errors have been studied extensively in the past few decades, where a major goal is to understand the amount of redundancy that is necessary and sufficient to decode from large amounts of error with small query complexity. Despite exciting progress, we still do not have satisfactory answers in several important parameter regimes. For example, in the case of 3-query LDCs, the gap between existing constructions and lower bounds is superpolynomial in the message length. In this work we study LDCs for insertion and deletion errors, called Insdel LDCs. Their study was initiated by Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015), who gave a reduction from Hamming LDCs to Insdel LDCs with a small blowup in the code parameters. On the other hand, the only known lower bounds for Insdel LDCs come from those for Hamming LDCs, so there is no separation between them. Here we prove new, strong lower bounds for the existence of Insdel LDCs. In particular, we show that 2-query linear Insdel LDCs do not exist, and give an exponential lower bound on the length of all q-query Insdel LDCs with constant q. For q ≥ 3, our bounds are exponential in the existing lower bounds for Hamming LDCs. Furthermore, our exponential lower bounds continue to hold for adaptive decoders, and even in private-key settings where the encoder and decoder share secret randomness. This exhibits a strict separation between Hamming LDCs and Insdel LDCs. Our strong lower bounds also hold for the related notion of Insdel LCCs (except in the private-key setting), via an Insdel analogue of the reduction from Hamming LCCs to Hamming LDCs.
Our techniques are based on a delicate design and analysis of hard distributions of insertion and deletion errors, which depart significantly from typical techniques used in analyzing Hamming LDCs.

FOCS 2018 Conference Paper

Deterministic Document Exchange Protocols, and Almost Optimal Binary Codes for Edit Errors

  • Kuan Cheng
  • Zhengzhong Jin
  • Xin Li 0006
  • Ke Wu 0001

We study two basic problems regarding edit errors (insertions and deletions). The first is document exchange, where two parties Alice and Bob hold two strings x and y with a bounded edit distance k. The goal is to have Alice send a short sketch to Bob, so that Bob can recover x based on y and the sketch. The second is the fundamental problem of designing error-correcting codes for edit errors, where the goal is to construct an explicit code to transmit a message x through a channel that can add at most k worst-case insertions and deletions, so that the original message x can be successfully recovered at the other end of the channel. Both problems have been extensively studied for decades, and in this paper we focus on deterministic document exchange protocols and binary codes for insertions and deletions (insdel codes). If the length of x is n, then it is known that for small k (e.g., k ≤ n/4), in both problems the optimal sketch size or the optimal number of redundant bits is Θ(k log(n/k)). In particular, this implies the existence of binary codes that can correct an ε fraction of insertions and deletions with rate 1 − Θ(ε log(1/ε)). However, known constructions are far from achieving these bounds. In this paper we significantly improve previous results on both problems. For document exchange, we give an efficient deterministic protocol with sketch size O(k log²(n/k)). This significantly improves the previous best known deterministic protocol, which has sketch size O(k² + k log² n) [2]. 
For binary insdel codes, we obtain the following results: 1) An explicit binary insdel code which encodes an n-bit message x against k errors with redundancy O(k log²(n/k)). In particular, this implies an explicit family of binary insdel codes that can correct an ε fraction of insertions and deletions with rate 1 − O(ε log²(1/ε)) = 1 − Õ(ε). This significantly improves the previous best known result, which only achieves rate 1 − Õ(√ε) [11], [10], and is optimal up to a log(1/ε) factor. 2) An explicit binary insdel code which encodes an n-bit message x against k errors with redundancy O(k log n). This significantly improves the previous best known result of [4], which only works for constant k and has redundancy O(k² log k log n); and that of [2], which has redundancy O(k² + k log² n). Our code has optimal redundancy for k ≤ n^(1−α), for any constant 0 < α < 1. This is the first explicit construction of binary insdel codes with optimal redundancy for a wide range of error parameters k, and it brings our understanding of binary insdel codes much closer to that of standard binary error-correcting codes. In obtaining our results we introduce several new techniques. Most notably, we introduce the notions of ε-self-matching hash functions and ε-synchronization hash functions. We believe our techniques can have further applications in the literature.
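Several of the abstracts above measure corruption in the insertion/deletion (edit) metric. For reference, that distance has a standard textbook dynamic program; this is background material, not a technique from any of the papers (note it allows only insertions and deletions, no substitutions, matching the insdel setting):

```python
def insdel_distance(x, y):
    # Minimum number of insertions and deletions turning x into y.
    # dp[i][j] = insdel distance between x[:i] and y[:j].
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all remaining symbols of x
    for j in range(m + 1):
        dp[0][j] = j  # insert all remaining symbols of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]       # symbols match, no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],  # delete x[i-1]
                                   dp[i][j - 1])  # insert y[j-1]
    return dp[n][m]
```

With this metric, "bounded edit distance k" in the document-exchange setting means Bob's string y satisfies insdel_distance(x, y) ≤ k.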