Arrow Research search

Author name cluster

Wang Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

10

AAAI Conference 2026 Conference Paper

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

  • Bofei Zhang
  • Zirui Shang
  • Zhi Gao
  • Wang Zhang
  • Rui Xie
  • Xiaojian Ma
  • Tao Yuan
  • Xinxiao Wu

Building Graphical User Interface (GUI) agents is a promising research direction, which simulates human interaction with computers or mobile phones to perform diverse GUI tasks. However, a major challenge in developing generalized GUI agents is the lack of sufficient trajectory data across various operating systems and applications, mainly due to the high cost of manual annotations. In this paper, we propose the TongUI framework that transforms millions of multimodal web tutorials into GUI trajectories for generalized GUI agents. Concretely, we crawl GUI videos and articles from the Internet and process them into GUI agent trajectory data. Based on this, we construct the GUI-Net-1M dataset, which contains 1 million trajectories across five operating systems and over 280 applications. To the best of our knowledge, this is the largest open-source GUI trajectory dataset. We develop the TongUI agent by fine-tuning Qwen2.5-VL-3B/7B/32B models on GUI-Net-1M, which shows consistent performance improvements on commonly used grounding and navigation benchmarks, outperforming baseline agents by 10\% on multiple benchmarks, showing the effectiveness of the GUI-Net-1M dataset and underscoring the significance of our TongUI framework.

NeurIPS Conference 2025 Conference Paper

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

  • Qiying Yu
  • Zheng Zhang
  • Ruofei Zhu
  • Yufeng Yuan
  • Xiaochen Zuo
  • Yu Yue
  • Weinan Dai
  • Tiantian Fan

Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results. We propose the D ecoupled Clip and D ynamic s A mpling P olicy O ptimization ( DAPO ) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using Qwen2. 5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.

AAAI Conference 2025 Conference Paper

Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single Image Denoising

  • Huaqiu Li
  • Wang Zhang
  • Xiaowan Hu
  • Tao Jiang
  • Zikang Chen
  • Haoqian Wang

Many studies have concentrated on constructing supervised models utilizing paired datasets for image denoising, which proves to be expensive and time-consuming. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pairs sampling, resulting in pixel information loss and destruction of detailed structural information, thereby significantly constraining the efficacy of such methods. In this paper, we introduce Prompt-SID, a prompt-learning-based single image denoising framework that emphasizes the preservation of structural details. This approach is trained in a self-supervised manner using downsampled image pairs. It captures original-scale image information through structural encoding and integrates this prompt into the denoiser. To achieve this, we propose a structural representation generation model based on the latent diffusion process and design a structural attention module within the transformer-based denoiser architecture to decode the prompt. Additionally, we introduce a scale replay training mechanism, which effectively mitigates the scale gap from images of different resolutions. We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, showcasing the remarkable effectiveness of Prompt-SID.

AAAI Conference 2025 Conference Paper

Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising

  • Zikang Chen
  • Tao Jiang
  • Xiaowan Hu
  • Wang Zhang
  • Huaqiu Li
  • Haoqian Wang

Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal utilization of both inter-frame and intra-frame information, and it also neglects the potential of optical flow alignment under self-supervised conditions, leading to biased and insufficient denoising outcomes. To this end, we first explore the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we utilize bidirectional blind-spot feature propagation through the proposed blind-spot alignment block to ensure accurate temporal alignment and effectively capture long-range dependencies. In the spatial domain, we introduce the spatial receptive field expansion module, which enhances the receptive field and improves global perception capabilities. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during optical flow alignment. Our method demonstrates superior performance across both synthetic and real-world video denoising datasets.

AAAI Conference 2024 Conference Paper

Enhancing Multi-Scale Diffusion Prediction via Sequential Hypergraphs and Adversarial Learning

  • Pengfei Jiao
  • Hongqian Chen
  • Qing Bao
  • Wang Zhang
  • Huaming Wu

Information diffusion prediction plays a crucial role in understanding the propagation of information in social networks, encompassing both macroscopic and microscopic prediction tasks. Macroscopic prediction estimates the overall impact of information diffusion, while microscopic prediction focuses on identifying the next user to be influenced. While prior research often concentrates on one of these aspects, a few tackle both concurrently. These two tasks provide complementary insights into the diffusion process at different levels, revealing common traits and unique attributes. The exploration of leveraging common features across these tasks to enhance information prediction remains an underexplored avenue. In this paper, we propose an intuitive and effective model that addresses both macroscopic and microscopic prediction tasks. Our approach considers the interactions and dynamics among cascades at the macro level and incorporates the social homophily of users in social networks at the micro level. Additionally, we introduce adversarial training and orthogonality constraints to ensure the integrity of shared features. Experimental results on four datasets demonstrate that our model significantly outperforms state-of-the-art methods.

AAAI Conference 2024 Conference Paper

One Step Closer to Unbiased Aleatoric Uncertainty Estimation

  • Wang Zhang
  • Ziwen Martin Ma
  • Subhro Das
  • Tsui-Wei Lily Weng
  • Alexandre Megretski
  • Luca Daniel
  • Lam M. Nguyen

Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address this issue, we proposed a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.

ICML Conference 2023 Conference Paper

ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction

  • Wang Zhang
  • Tsui-Wei Weng
  • Subhro Das
  • Alexandre Megretski
  • Luca Daniel
  • Lam M. Nguyen

Deep neural networks (DNN) have shown great capacity of modeling a dynamical system; nevertheless, they usually do not obey physics constraints such as conservation laws. This paper proposes a new learning framework named $\textbf{ConCerNet}$ to improve the trustworthiness of the DNN based dynamics modeling to endow the invariant properties. $\textbf{ConCerNet}$ consists of two steps: (i) a contrastive learning method to automatically capture the system invariants (i. e. conservation properties) along the trajectory observations; (ii) a neural projection layer to guarantee that the learned dynamics models preserve the learned invariants. We theoretically prove the functional relationship between the learned latent representation and the unknown system invariant function. Experiments show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics by a large margin. With neural network based parameterization and no dependence on prior knowledge, our method can be extended to complex and large-scale dynamics by leveraging an autoencoder.

IJCAI Conference 2021 Conference Paper

Learning Stochastic Equivalence based on Discrete Ricci Curvature

  • Xuan Guo
  • Qiang Tian
  • Wang Zhang
  • Wenjun Wang
  • Pengfei Jiao

Role-based network embedding methods aim to preserve node-centric connectivity patterns, which are expressions of node roles, into low-dimensional vectors. However, almost all the existing methods are designed for capturing a relaxation of automorphic equivalence or regular equivalence. They may be good at structure identification but could show poorer performance on role identification. Because automorphic equivalence and regular equivalence strictly tie the role of a node to the identities of all its neighbors. To mitigate this problem, we construct a framework called Curvature-based Network Embedding with Stochastic Equivalence (CNESE) to embed stochastic equivalence. More specifically, we estimate the role distribution of nodes based on discrete Ricci curvature for its excellent ability to concisely representing local topology. We use a Variational Auto-Encoder to generate embeddings while a degree-guided regularizer and a contrastive learning regularizer are leveraged to improving both its robustness and discrimination ability. The effectiveness of our proposed CNESE is demonstrated by extensive experiments on real-world networks.

NeurIPS Conference 2021 Conference Paper

Robust Deep Reinforcement Learning through Adversarial Loss

  • Tuomas Oikarinen
  • Wang Zhang
  • Alexandre Megretski
  • Luca Daniel
  • Tsui-Wei Weng

Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs, which raises concerns about deploying such agents in the real world. To address this issue, we propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against $l_p$-norm bounded adversarial attacks. Our framework is compatible with popular deep reinforcement learning algorithms and we demonstrate its performance with deep Q-learning, A3C and PPO. We experiment on three deep RL benchmarks (Atari, MuJoCo and ProcGen) to show the effectiveness of our robust training algorithm. Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy-Worst-Case Reward (GWC) to measure attack agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks. All code used for our experiments is available at https: //github. com/tuomaso/radial_rl_v2.

IROS Conference 2015 Conference Paper

Compliant wing design for a flapping wing micro air vehicle

  • David Colmenares
  • Randall Kania
  • Wang Zhang
  • Metin Sitti

In this work, we examine several wing designs for a motor-driven, flapping-wing micro air vehicle capable of liftoff. The full system consists of two wings independently driven by geared pager motors that include a spring in parallel with the output shaft. The linear transmission allows for resonant operation, while control is achieved by direct drive of the wing angle. Wings used in previous work were chosen to be fully rigid for simplicity of modeling and fabrication. However, biological wings are highly flexible and other micro air vehicles have successfully utilized flexible wing structures for specialized tasks. The goal of our study is to determine if wing flexibility can be generally used to increase wing performance. Two approaches to lift improvement using flexible wings are explored, resonance of the wing cantilever structure and dynamic wing twisting. We design and test several wings that are compared using different figures of merit. A twisted design improved lift per power by 73. 6% and maximum lift production by 53. 2% compared to the original rigid design. Wing twist is then modeled in order to propose optimal wing twist profiles that can maximize either wing efficiency or lift production.