Arrow Research search

Author name cluster

Ziteng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

9

ICRA Conference 2025 Conference Paper

FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality

  • Junyuan Ding
  • Ziteng Wang
  • Chang Gao 0002
  • Min Liu
  • Qinyu Chen

Eye tracking is a key technology for gaze-based interactions in Extended Reality (XR), but traditional frame-based systems struggle to meet XR's demands for high accuracy, low latency, and power efficiency. Event cameras offer a promising alternative due to their high temporal resolution and low power consumption. In this paper, we present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data, optimized for real-time XR applications. The ellipse output can be directly used in subsequent ellipse-based pupil trackers. We enhance the EV-Eye dataset by expanding annotated data and converting original mask labels to ellipse-based annotations to train the model. Besides, a novel trigonometric loss is adopted to address angle discontinuities and a fast causal event volume event representation method is put forward. On the enhanced EV-Eye test set, FACET achieves an average pupil center error of $\mathbf{0. 2 0}$ pixels and an inference time of 0. 53 ms, reducing pixel error and inference time by $1. 6 \times$ and $1. 8 \times$ compared to the prior art, EV-Eye, with $4. 4 \times$ and $11. 7 \times$ less parameters and arithmetic operations. The code is available at https://github.com/DeanJY/FACET.

ICLR Conference 2025 Conference Paper

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

  • Ziteng Wang
  • Jun Zhu 0001
  • Jianfei Chen 0001

Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget. However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for the conventional TopK+Softmax routing, utilizing ReLU as the router instead. We further propose methods to regulate the router's sparsity while balancing the load among experts. ReMoE’s continuous nature enables efficient dynamic allocation of computation across tokens and layers, while also exhibiting domain specialization. Our experiments demonstrate that ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity. Furthermore, ReMoE exhibits superior scalability with respect to the number of experts, surpassing traditional MoE architectures. The implementation based on Megatron-LM is available at https://github.com/thu-ml/ReMoE.

NeurIPS Conference 2025 Conference Paper

VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction-Editing Data and Long Captions

  • Ziteng Wang
  • Siqi Yang
  • Limeng Qiao
  • Lin Ma

Despite the success of Vision-Language Models (VLMs) like CLIP in aligning vision and language, their proficiency in detailed, fine-grained visual comprehension remains a key challenge. We present CLIP-IN, a novel framework that bolsters CLIP's fine-grained perception through two core innovations. Firstly, we leverage instruction-editing datasets, originally designed for image manipulation, as a unique source of hard negative image-text pairs. Coupled with a symmetric hard negative contrastive loss, this enables the model to effectively distinguish subtle visual-semantic differences. Secondly, CLIP-IN incorporates long descriptive captions, utilizing rotary positional encodings to capture rich semantic context often missed by standard CLIP. Our experiments demonstrate that CLIP-IN achieves substantial gains on the MMVP benchmark and various fine-grained visual recognition tasks, without compromising robust zero-shot performance on broader classification and retrieval tasks. Critically, integrating CLIP-IN's visual representations into Multimodal Large Language Models significantly reduces visual hallucinations and enhances reasoning abilities. This work underscores the considerable potential of synergizing targeted, instruction-based contrastive learning with comprehensive descriptive information to elevate the fine-grained understanding of VLMs.

ICLR Conference 2024 Conference Paper

Efficient Backpropagation with Variance Controlled Adaptive Sampling

  • Ziteng Wang
  • Jianfei Chen 0001
  • Jun Zhu 0001

Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or backpropagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations to training, such algorithms may not consistently maintain accuracy across various tasks. In this work, we introduce a variance-controlled adaptive sampling (VCAS) method designed to minimize the computational load of BP. VCAS computes an unbiased stochastic gradient with fine-grained layerwise importance sampling in data dimension for activation gradient calculation and leverage score sampling in token dimension for weight gradient calculation. To preserve accuracy, we control the additional variance introduced by learning the sample ratio jointly with model parameters during training. We assessed VCAS on multiple fine-tuning and pre-training tasks in both vision and natural language domains. On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy with an up to 73.87% FLOPs reduction of BP and 49.58% FLOPs reduction of the whole training process. The implementation is available at https://github.com/thu-ml/VCAS.

AAAI Conference 2023 Short Paper

Self-Paced Learning Based Graph Convolutional Neural Network for Mixed Integer Programming (Student Abstract)

  • Li Chen
  • Hua Xu
  • Ziteng Wang
  • Chengming Wang
  • Yu Jiang

Graph convolutional neural network (GCN) based methods have achieved noticeable performance in solving mixed integer programming problems (MIPs). However, the generalization of existing work is limited due to the problem structure. This paper proposes a self-paced learning (SPL) based GCN network (SPGCN) with curriculum learning (CL) to make the utmost of samples. SPGCN employs a GCN model to imitate the branching variable selection during the branch and bound process, while the training process is conducted in a self-paced fashion. Specifically, SPGCN contains a loss-based automatic difficulty measurer, where the training loss of the sample represents the difficulty level. In each iteration, a dynamic training dataset is constructed according to the difficulty level for GCN model training. Experiments on four NP-hard datasets verify that CL can lead to generalization improvement and convergence speedup in solving MIPs, where SPL performs better than predefined CL methods.

ICRA Conference 2021 Conference Paper

Projector-Guided Non-Holonomic Mobile 3D Printing

  • Xuchu Xu
  • Ziteng Wang
  • Chen Feng 0002

Fused deposition modeling (FDM) using mobile robots instead of the gantry-based 3D printer enables additive manufacturing at a larger scale with higher speed. This introduces challenges including accurate localization, control of the printhead, and design of a stable mobile manipulator with low vibrations and proper degrees of freedom. We proposed and developed a low-cost non-holonomic mobile 3D printing system guided by a projector via learning-based visual servoing. It requires almost no manual calibration of the system parameters. Using a regular top-down projector without any expensive external localization device for pose feedback, this system enabled mobile robots to accurately follow pre-designed millimeter-level printing trajectories with speed control. We evaluate the system in terms of its trajectory accuracy and printing quality compared with original 3D designs. We further demonstrated the potential of this system using two such mobile robots to collaboratively print a 3D object with dimensions of 80 cm × 30 cm size, which exceeds the limitation of common desktop FDM 3D printers.

JMLR Journal 2016 Journal Article

Differentially Private Data Releasing for Smooth Queries

  • Ziteng Wang
  • Chi Jin
  • Kai Fan
  • Jiaqi Zhang
  • Junliang Huang
  • Yiqiao Zhong
  • Liwei Wang

In the past few years, differential privacy has become a standard concept in the area of privacy. One of the most important problems in this field is to answer queries while preserving differential privacy. In spite of extensive studies, most existing work on differentially private query answering assumes the data are discrete (i.e., in $\{0,1\}^d$) and focuses on queries induced by \emph{Boolean} functions. In real applications however, continuous data are at least as common as binary data. Thus, in this work we explore a less studied topic, namely, differential privately query answering for continuous data with continuous function. As a first step towards the continuous case, we study a natural class of linear queries on continuous data which we refer to as smooth queries. A linear query is said to be $K$-smooth if it is specified by a function defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all bounded. We develop two $\epsilon$-differentially private mechanisms which are able to answer all smooth queries. The first mechanism outputs a summary of the database and can then give answers to the queries. The second mechanism is an improvement of the first one and it outputs a synthetic database. The two mechanisms both achieve an accuracy of $O (n^{-\frac{K}{2d+K}}/\epsilon )$. Here we assume that the dimension $d$ is a constant. It turns out that even in this parameter setting (which is almost trivial in the discrete case), using existing discrete mechanisms to answer the smooth queries is difficult and requires more noise. Our mechanisms are based on $L_{\infty}$-approximation of (transformed) smooth functions by low-degree even trigonometric polynomials with uniformly bounded coefficients. We also develop practically efficient variants of the mechanisms with promising experimental results. [abs] [ pdf ][ bib ] &copy JMLR 2016. ( edit, beta )

NeurIPS Conference 2015 Conference Paper

Fast Second Order Stochastic Backpropagation for Variational Inference

  • Kai Fan
  • Ziteng Wang
  • Jeff Beck
  • James Kwok
  • Katherine Heller

We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous function. Our method is practical, scalable and model free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates.

NeurIPS Conference 2013 Conference Paper

Efficient Algorithm for Privately Releasing Smooth Queries

  • Ziteng Wang
  • Kai Fan
  • Jiaqi Zhang
  • Liwei Wang

We study differentially private mechanisms for answering \emph{smooth} queries on databases consisting of data points in $\mathbb{R}^d$. A $K$-smooth query is specified by a function whose partial derivatives up to order $K$ are all bounded. We develop an $\epsilon$-differentially private mechanism which for the class of $K$-smooth queries has accuracy $O (\left(\frac{1}{n}\right)^{\frac{K}{2d+K}}/\epsilon)$. The mechanism first outputs a summary of the database. To obtain an answer of a query, the user runs a public evaluation algorithm which contains no information of the database. Outputting the summary runs in time $O(n^{1+\frac{d}{2d+K}})$, and the evaluation algorithm for answering a query runs in time $\tilde O (n^{\frac{d+2+\frac{2d}{K}}{2d+K}} )$. Our mechanism is based on $L_{\infty}$-approximation of (transformed) smooth functions by low degree even trigonometric polynomials with small and efficiently computable coefficients.