Arrow Research search

Author name cluster

Wei Tang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
2 author rows

Possible papers

34

JBHI Journal 2025 Journal Article

Aleatoric-Uncertainty-Aware Maximum Intensity Projection-Based GAN for 7T-Like Generation From 3T TOF-MRA

  • Wei Tang
  • Yuxiang Dai
  • Boyu Zhang
  • Zhang Shi
  • Ying-Hua Chu
  • Peixian Zhuang
  • Dinggang Shen
  • Chengyan Wang

Time-of-flight magnetic resonance angiography (TOF-MRA) is a prevalent vascular imaging technique for assessing cerebrovascular diseases. Compared to routine 3T TOF-MRA, 7T TOF-MRA provides vascular structures with a higher signal-to-noise ratio (SNR) and better vessel contrast, revealing greater vascular details. However, the inaccessibility of 7T scanners and specific physiological and technical concerns limit its clinical application. Therefore, we aimed to generate high-quality 7T-like TOF-MRA from 3T TOF-MRA. Considering the spatial sparsity of vessel signals, the visibility discrepancy of distal and small vessels between 3T and 7T images, and the subtle spatial misalignment between paired data, we proposed a novel aleatoric-uncertainty-aware maximum intensity projection-based generative adversarial network (AU-MIPGAN). In our method, we employed a knowledge distillation (KD) framework to incorporate multi-directional MIP information into the 3T-to-7T learning process to strengthen the learning of vessels and provide three-dimensional (3D) vascular morphological knowledge for the student model, facilitating accurate generation of vascular structures. Furthermore, we exploited AU modeling to compensate for the spatial misalignment between paired 3T and 7T images during the training procedure, which helped the model concentrate more on learning the intrinsic gap between 3T and 7T images. Qualitative and quantitative results demonstrated that the proposed AU-MIPGAN can achieve promising performance for 7T-like TOF-MRA generation.

IJCAI Conference 2025 Conference Paper

Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms

  • Kangning Cui
  • Rongkun Zhu
  • Manqi Wang
  • Wei Tang
  • Gregory D. Larsen
  • Victor P. Pauca
  • Sarra Alqahtani
  • Fan Yang

Palms are ecologically and economically important indicators of tropical forest health, biodiversity, and human impact that support local economies and global forest product supply chains. While palm detection in plantations is well-studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV-derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state-of-the-art object detectors based on efficiency and performance, integrating zero-shot SAM 2 as the segmentation backbone and refining the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM is adaptable for identifying other natural objects, such as eastern white pines. Future work will explore transfer learning for lower-resolution datasets (0.5–1 m). Data and code can be found at github.com/Zippppo/PRISM.

AAAI Conference 2025 Conference Paper

Fast Multi-Instance Partial-Label Learning

  • Yin-Fang Yang
  • Wei Tang
  • Min-Ling Zhang

Multi-instance partial-label learning (MIPL) is a paradigm where each training example is encapsulated as a multi-instance bag associated with a candidate label set, which includes one true label and several false positives. Current MIPL algorithms typically assume that all instances are independent, thereby neglecting the dependencies and heterogeneity inherent in MIPL data. Moreover, these algorithms often prove excessively time-consuming on complex datasets, significantly limiting the practical application of MIPL. In this paper, we propose FastMIPL, a framework that employs a mixed-effects model to explicitly capture the dependencies and heterogeneity among instances and bags. FastMIPL learns from MIPL data both effectively and efficiently by utilizing a predefined dependency modeling module and leveraging a posterior predictive probability disambiguation strategy. Experiments show that the performance of FastMIPL is highly competitive with state-of-the-art methods while significantly reducing computational time on benchmark and real-world datasets.

NeurIPS Conference 2025 Conference Paper

No-Regret Online Autobidding Algorithms in First-price Auctions

  • Yilin Li
  • Yuan Deng
  • Wei Tang
  • Hanrui Zhang

Automated bidding to optimize online advertising under various constraints, e.g., ROI constraints and budget constraints, is widely adopted by advertisers. A key challenge lies in designing algorithms for non-truthful mechanisms with ROI constraints. While prior work has addressed truthful auctions or non-truthful auctions with weaker benchmarks, this paper provides a significant improvement: we develop online bidding algorithms for repeated first-price auctions with ROI constraints, benchmarking against the optimal randomized strategy in hindsight. In the full feedback setting, where the maximum competing bid is observed, our algorithm achieves a near-optimal $\tilde O(\sqrt{T})$ regret bound, and in the bandit feedback setting (where the bidder only observes whether they win each auction), our algorithm attains an $\tilde O(T^{3/4})$ regret bound.

IROS Conference 2025 Conference Paper

Region-Centric 6-Dof Grasp Detection: A Data-Efficient Solution for Cluttered Scenes

  • Siang Chen
  • Wei Tang
  • Pengwei Xie
  • Dingchang Hu
  • Wenming Yang
  • Guijin Wang

Robotic grasping, serving as the cornerstone of robot manipulation, is fundamental for embodied intelligence. Manipulation in challenging scenarios demands grasp detection algorithms with higher efficiency and generalizability. However, for general 6-Dof grasp detection, most data-driven methods directly extract scene-level features to generate grasp predictions, relying on a relatively heavy scene-level feature encoder and a significant amount of data with dense grasp labels for model training. In this letter, we propose a novel data-efficient 6-Dof grasp detection framework for cluttered scenes, named Region-Centric Grasp Detection (RCGD), consisting of an Iterative Search Module (ISM) and a Region Grasp Model (RGM). Concretely, ISM retrieves potential region centers and aggregates multiple regions in a coarse-to-fine way. Then, RGM extracts aligned grasp-related embeddings and predicts grasps within these local regions. Benefiting from the region-centric paradigm and the training-free location strategy, RCGD significantly outperforms previous methods and shows minimal performance loss even with a very small portion of training data or labels. Furthermore, real-world robotic experiments in two distinct settings highlight the effectiveness of our method with a 95% success rate.

NeurIPS Conference 2025 Conference Paper

SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models

  • Sen Wang
  • Jingyi Tian
  • Le Wang
  • Zhimin Liao
  • Huaiyi Dong
  • Kun Xia
  • Sanping Zhou
  • Wei Tang

World models allow agents to simulate the consequences of actions in imagined environments for planning, control, and long-horizon decision-making. However, existing autoregressive world models struggle with visually coherent predictions due to disrupted spatial structure, inefficient decoding, and inadequate motion modeling. In response, we propose Scale-wise Autoregression with Motion PrOmpt (SAMPO), a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation. Specifically, SAMPO integrates temporal causal decoding with bidirectional spatial attention, which preserves spatial locality and supports parallel decoding within each scale. This design significantly enhances both temporal consistency and rollout efficiency. To further improve dynamic scene understanding, we devise an asymmetric multi-scale tokenizer that preserves spatial details in observed frames and extracts compact dynamic representations for future frames, optimizing both memory usage and model performance. Additionally, we introduce a trajectory-aware motion prompt module that injects spatiotemporal cues about object and robot trajectories, focusing attention on dynamic regions and improving temporal consistency and physical realism. Extensive experiments show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control, improving generation quality with 4.4× faster inference. We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks and benefit from larger model sizes.

NeurIPS Conference 2025 Conference Paper

The Rise of Parameter Specialization for Knowledge Storage in Large Language Models

  • Yihuai Hong
  • Yiran Zhao
  • Wei Tang
  • Yang Deng
  • Yu Rong
  • Wenxuan Zhang

Over time, a growing wave of large language models from various series has been introduced to the community. Researchers are striving to maximize the performance of language models with constrained parameter sizes. However, from a microscopic perspective, there has been limited research on how to better store knowledge in model parameters, particularly within MLPs, to enable more effective utilization of this knowledge by the model. In this work, we analyze twenty publicly available open-source large language models to investigate the relationship between their strong performance and the way knowledge is stored in their corresponding MLP parameters. Our findings reveal that as language models become more advanced and demonstrate stronger knowledge capabilities, their parameters exhibit increased specialization. Specifically, parameters in the MLPs tend to be more focused on encoding similar types of knowledge. We experimentally validate that this specialized distribution of knowledge contributes to improving the efficiency of knowledge utilization in these models. Furthermore, by conducting causal training experiments, we confirm that this specialized knowledge distribution plays a critical role in improving the model's efficiency in leveraging stored knowledge.

NeurIPS Conference 2024 Conference Paper

Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models

  • Jiahao Ying
  • Yixin Cao
  • Yushi Bai
  • Qianru Sun
  • Bo Wang
  • Wei Tang
  • Zhaojun Ding
  • Yizhe Yang

Large language models (LLMs) have achieved impressive performance across various natural language benchmarks, prompting a continual need to curate more difficult datasets for larger LLMs, which is costly and time-consuming. In this paper, we propose to automate dataset updating and provide systematic analysis of its effectiveness in dealing with the benchmark leakage issue, difficulty control, and stability. Thus, once the current benchmark has been mastered or leaked, we can update it for timely and reliable evaluation. There are two updating strategies: 1) a mimicking strategy that generates similar samples based on original data, preserving stylistic and contextual essence, and 2) an extending strategy that further expands existing samples at varying cognitive levels by adapting Bloom’s taxonomy of educational objectives. Extensive experiments on updated MMLU and BIG-Bench demonstrate the stability of the proposed strategies and show that the mimicking strategy can effectively alleviate overestimation from benchmark leakage. In cases where the efficient mimicking strategy fails, our extending strategy still shows promising results. Additionally, by controlling the difficulty, we can better discern the models’ performance and enable fine-grained analysis: neither too difficult nor too easy an exam can fairly judge students’ learning status. To the best of our knowledge, we are the first to automate benchmark updating for reliable and timely evaluation. Our demo leaderboard can be found at https://yingjiahao14.github.io/Automating-DatasetUpdates/.

IROS Conference 2024 Conference Paper

CFD-enabled Approach for Optimizing CPG Control Network for Underwater Soft Robotic Fish

  • Yunfei Wang
  • Weiyuan Sun
  • Wei Tang
  • Xianrui Zhang
  • Zhenping Yu
  • Shunxiang Cao
  • Juntian Qu

The Central Pattern Generator (CPG) nonlinear oscillation network is increasingly used in the control of multi-joint collaborative robots. The motion attitude of a robot can be effectively adjusted by tuning the parameters of the CPG neural network. However, the mapping from CPG parameters to motion attitude is relatively complicated. To improve motion performance, we propose an optimization method combining computational fluid dynamics (CFD) with the CPG network. In this work, we design a three-joint biomimetic soft robotic fish modeled on the body structure of the trevally, and an improved CPG network based on the Hopf model is incorporated into the control system. Because directly optimizing swimming performance through physical experiments is time-consuming and complex, a mode of first tuning parameters on a simulation platform and then refining them on the robot is usually adopted. Therefore, a CFD simulation platform using hydrodynamic solutions has been established to assist in analyzing the swimming effect. Finally, the experimental results show that the CFD swimming simulation closely matches the real test, and the swimming performance after the improved CPG network optimization is significantly increased.

IROS Conference 2024 Conference Paper

Decentralized Communication-Maintained Coordination for Multi-Robot Exploration: Achieving Connectivity and Adaptability

  • Wei Tang
  • Chao Li
  • Jun Wu 0003
  • Qiuguo Zhu

The realm of multi-robot autonomous exploration tasks underscores the critical role of communication in coordinating group activities. This paper introduces an innovative decentralized multi-robot exploration algorithm, meticulously crafted to ensure unbroken communication within robotic groups, a crucial element for effective coordination. The motivation for our work is two-fold: firstly, seamless communication is vital for coordinating multi-robot autonomous exploration tasks; secondly, in applications such as disaster rescue operations or military maneuvers, there are numerous scenarios where spatial congregation of multiple robots is imperative for joint task accomplishment. Our approach addresses these challenges through a stringent communication constraint, ensuring that each robot remains in constant communicative contact with the rest of the group. This is realized by employing a decentralized policy that integrates Graph Neural Network (GNN) layers with a self-attention mechanism. This policy network design allows adaptation to different numbers of robots and varied environments. After an initial imitation learning phase, the policy is refined by learning from experiences generated via a tree-search-based lookahead technique. Our experimental analysis validates that the algorithm not only maintains consistent communication links among all group members but also improves exploration efficiency under the communication constraints. These results highlight the potential of our method to enhance the effectiveness of robotic group exploration while ensuring robust communication connections.

JBHI Journal 2024 Journal Article

DiffMAR: A Generalized Diffusion Model for Metal Artifact Reduction in CT Images

  • Tianxiao Cai
  • Xiang Li
  • Chenglan Zhong
  • Wei Tang
  • Jixiang Guo

X-ray imaging frequently introduces varying degrees of metal artifacts into computed tomography (CT) images when metal implants are present. For the metal artifact reduction (MAR) task, existing end-to-end methods often exhibit limited generalization capability, while methods based on multiple iterations often suffer from accumulative error, resulting in lower-quality restoration outcomes. In this work, we present a generalized diffusion model for Metal Artifact Reduction (DiffMAR). The proposed method utilizes a linear degradation process to simulate the physical formation of metal artifacts in CT images and directly learns an iterative restoration process from paired CT images in the reverse process. During the reverse process of DiffMAR, a Time-Latent Adjustment (TLA) module is designed to adjust time embeddings at the latent level, thereby minimizing accumulative error during iterative restoration. We also design a structure information extraction (SIE) module that utilizes linear interpolation data in the image domain to guide the generation of anatomical structures during iterative restoration. This leads to more accurate and robust shadow-free image generation. Comprehensive analysis, including both synthesized data and clinical evidence, confirms that our proposed method surpasses current state-of-the-art (SOTA) MAR methods in both image generation quality and generalization.

IJCAI Conference 2024 Conference Paper

Exploiting Conjugate Label Information for Multi-Instance Partial-Label Learning

  • Wei Tang
  • Weijia Zhang
  • Min-Ling Zhang

Multi-instance partial-label learning (MIPL) addresses scenarios where each training sample is represented as a multi-instance bag associated with a candidate label set containing one true label and several false positives. Existing MIPL algorithms have primarily focused on mapping multi-instance bags to candidate label sets for disambiguation, disregarding the intrinsic properties of the label space and the supervised information provided by non-candidate label sets. In this paper, we propose an algorithm named ELIMIPL, i.e., Exploiting conjugate Label Information for Multi-Instance Partial-Label learning, which exploits the conjugate label information to improve the disambiguation performance. To achieve this, we extract the label information embedded in both candidate and non-candidate label sets, incorporating the intrinsic properties of the label space. Experimental results obtained from benchmark and real-world datasets demonstrate the superiority of the proposed ELIMIPL over existing MIPL algorithms and other well-established partial-label learning algorithms.

NeurIPS Conference 2024 Conference Paper

Intrinsic Robustness of Prophet Inequality to Strategic Reward Signaling

  • Wei Tang
  • Haifeng Xu
  • Ruimin Zhang
  • Derek Zhu

Prophet inequality concerns a basic optimal stopping problem and states that simple threshold stopping policies --- i.e., accepting the first reward larger than a certain threshold --- can achieve a tight $\frac{1}{2}$-approximation to the optimal prophet value. Motivated by its economic applications, this paper studies the robustness of this approximation to natural strategic manipulations in which each random reward is associated with a self-interested player who may selectively reveal his realized reward to the searcher in order to maximize his probability of being selected. We say a threshold policy is $\alpha$(-strategically)-robust if it (a) achieves the $\alpha$-approximation to the prophet value for strategic players; and (b) meanwhile remains a $\frac{1}{2}$-approximation in the standard non-strategic setting. Starting with a characterization of each player's optimal information revealing strategy, we demonstrate the intrinsic robustness of prophet inequalities to strategic reward signaling through the following results: (1) for arbitrary reward distributions, there is a threshold policy that is $\frac{1-\frac{1}{e}}{2}$-robust, and this ratio is tight; (2) for i.i.d. reward distributions, there is a threshold policy that is $\frac{1}{2}$-robust, which is tight for the setting; and (3) for log-concave (but non-identical) reward distributions, the $\frac{1}{2}$-robustness can also be achieved under certain regularity assumptions.
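As a purely illustrative companion to the abstract, the classic non-strategic $\frac{1}{2}$-approximation can be checked with a small Monte Carlo sketch: a median-of-the-maximum threshold rule is simulated against the clairvoyant prophet value. The i.i.d. Uniform(0, 1) rewards, the trial count, and the function name are assumptions for illustration, not details from the paper.

```python
import random

def prophet_threshold_demo(n=5, trials=20000, seed=0):
    """Monte Carlo sketch (illustrative only) of the classic threshold policy.

    Threshold tau is chosen so P(max reward >= tau) = 1/2; for n i.i.d.
    Uniform(0, 1) rewards, P(max < tau) = tau**n, hence tau = (1/2)**(1/n).
    """
    rng = random.Random(seed)
    tau = 0.5 ** (1.0 / n)
    policy_total, prophet_total = 0.0, 0.0
    for _ in range(trials):
        rewards = [rng.random() for _ in range(n)]
        # Threshold stopping rule: accept the first reward >= tau, else get 0.
        accepted = next((r for r in rewards if r >= tau), 0.0)
        policy_total += accepted
        prophet_total += max(rewards)  # clairvoyant prophet takes the max
    return policy_total / trials, prophet_total / trials

policy, prophet = prophet_threshold_demo()
ratio = policy / prophet  # theory guarantees this is at least 1/2 in expectation
```

With these assumed uniform rewards the empirical ratio comfortably exceeds the guaranteed $\frac{1}{2}$; the paper's contribution concerns how much of this guarantee survives when rewards are revealed strategically.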

AAAI Conference 2024 Conference Paper

Invisible Backdoor Attack against 3D Point Cloud Classifier in Graph Spectral Domain

  • Linkun Fan
  • Fazhi He
  • Tongzhen Si
  • Wei Tang
  • Bing Li

3D point clouds have been widely used in security-critical domains, such as self-driving and 3D face recognition. A backdoor attack is a serious threat that typically compromises Deep Neural Networks (DNN) in the training stage. Though a few 3D backdoor attacks are designed to achieve guaranteed attack efficiency, their deformations can alert human inspection. To obtain invisible backdoored point clouds, this paper proposes a novel 3D backdoor attack, named IBAPC, which generates the backdoor trigger in the graph spectral domain. Its effectiveness is grounded in the advantage of the graph spectral signal: it can induce both the global structure and local points to be responsible for the resulting deformation in the spatial domain. In detail, a new backdoor implanting function is proposed that transforms the point cloud into a graph spectral signal for embedding the backdoor trigger. Then, we design a backdoor training procedure that updates the parameters of the backdoor implanting function and the victim 3D DNN alternately. Finally, the backdoored 3D DNN and its associated backdoor implanting function are obtained upon finishing the backdoor training procedure. Experimental results suggest that IBAPC achieves SOTA attack stealthiness in three aspects: objective distance measurement, subjective human evaluation, and graph spectral signal residual. At the same time, it obtains competitive attack efficiency. The code is available at https://github.com/f-lk/IBAPC.

NeurIPS Conference 2024 Conference Paper

Multi-Instance Partial-Label Learning with Margin Adjustment

  • Wei Tang
  • Yin-Fang Yang
  • Zhaofei Wang
  • Weijia Zhang
  • Min-Ling Zhang

Multi-instance partial-label learning (MIPL) is an emerging learning framework where each training sample is represented as a multi-instance bag associated with a candidate label set. Existing MIPL algorithms often overlook the margins for attention scores and predicted probabilities, leading to suboptimal generalization performance. A critical issue with these algorithms is that the highest prediction probability of the classifier may appear on a non-candidate label. In this paper, we propose an algorithm named MIPLMA, i.e., Multi-Instance Partial-Label learning with Margin Adjustment, which adjusts the margins for attention scores and predicted probabilities. We introduce a margin-aware attention mechanism to dynamically adjust the margins for attention scores and propose a margin distribution loss to constrain the margins between the predicted probabilities on candidate and non-candidate label sets. Experimental results demonstrate the superior performance of MIPLMA over existing MIPL algorithms, as well as other well-established multi-instance learning algorithms and partial-label learning algorithms.

ICML Conference 2024 Conference Paper

Performative Prediction with Bandit Feedback: Learning through Reparameterization

  • Yatong Chen
  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu 0018

Performative prediction, as introduced by Perdomo et al., is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model. Existing work in this field usually hinges on three assumptions that are easily violated in practice: that the performative risk is convex over the deployed model, that the mapping from the model to the data distribution is known to the model designer in advance, and that first-order information of the performative risk is available. In this paper, we initiate the study of performative prediction problems that do not require these assumptions. Specifically, we develop a reparameterization framework that parametrizes the performative prediction objective as a function of the induced data distribution. We also develop a two-level zeroth-order optimization procedure, where the first level performs iterative optimization in the distribution parameter space, and the second level learns the model that induces a particular target distribution parameter at each iteration. Under mild conditions, this reparameterization allows us to transform the non-convex objective into a convex one and achieve provable regret guarantees. In particular, we provide a regret bound that is sublinear in the total number of performative samples taken and only polynomial in the dimension of the model parameter.

NeurIPS Conference 2023 Conference Paper

Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning

  • Wei Tang
  • Weijia Zhang
  • Min-Ling Zhang

In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set, which consists of one ground-truth label and several false positive labels. Multi-instance partial-label learning (MIPL) is a learning paradigm for dealing with such tasks and has achieved favorable performance. Existing MIPL approaches follow the instance-space paradigm by assigning the augmented candidate label set of a bag to each of its instances and aggregating bag-level labels from instance-level labels. However, this scheme may be suboptimal, as global bag-level information is ignored and the predicted labels of bags are sensitive to predictions on negative instances. In this paper, we study an alternative scheme where a multi-instance bag is embedded into a single vector representation. Accordingly, an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning, is proposed. DEMIPL employs a disambiguation attention mechanism to aggregate a multi-instance bag into a single vector representation, followed by a momentum-based disambiguation strategy to identify the ground-truth label from the candidate label set. Furthermore, we introduce a real-world MIPL dataset for colorectal cancer classification. Experimental results on benchmark and real-world datasets validate the superiority of DEMIPL over the compared MIPL and partial-label learning approaches.
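For readers unfamiliar with attention-based bag embedding, the generic operation the abstract refers to (aggregating a bag of instances into one vector via softmax attention) can be sketched in a few lines. This is a minimal, dependency-free illustration, not DEMIPL's actual disambiguation attention mechanism; the linear scoring function and all names are assumptions.

```python
import math

def attention_pool(bag, w):
    """Illustrative softmax attention pooling over a multi-instance bag.

    Each instance x gets a scalar score w . x, scores are softmax-normalized
    into attention weights, and the bag embedding is the attention-weighted
    sum of the instances. Returns (embedding, attention_weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in bag]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(bag[0])
    emb = [sum(a * x[j] for a, x in zip(attn, bag)) for j in range(dim)]
    return emb, attn

# Two 2-D instances; the scoring vector favors the first dimension.
emb, attn = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In a real MIPL method the scores would come from a learned network and the resulting bag vector would feed a classifier over the candidate label set; this sketch only shows the pooling step.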

NeurIPS Conference 2023 Conference Paper

Dynamic Pricing and Learning with Bayesian Persuasion

  • Shipra Agrawal
  • Yiding Feng
  • Wei Tang

We consider a novel dynamic pricing and learning setting where, in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to ‘advertising schemes’. That is, at the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product’s quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers’ valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller’s expected revenue. Without any a priori knowledge of the buyers’ demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m \log T )^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers’ demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.

NeurIPS Conference 2023 Conference Paper

Encoding Human Behavior in Information Design through Deep Learning

  • Guanghui Yu
  • Wei Tang
  • Saumik Narayanan
  • Chien-Ju Ho

We initiate the study of $\textit{behavioral information design}$ through deep learning. In information design, a $\textit{sender}$ aims to persuade a $\textit{receiver}$ to take certain actions by strategically revealing information. We address scenarios in which the receiver might exhibit behavior patterns other than the standard Bayesian rational assumption. We propose HAIDNet, a neural-network-based optimization framework for information design that can adapt to multiple representations of human behavior. Through extensive simulation, we show that HAIDNet can not only recover information policies that are near-optimal compared with known analytical solutions, but also extend to designing information policies for settings that are computationally challenging (e.g., when there are multiple receivers) or for which no solutions are known in general (e.g., when the receiver behavior does not follow the Bayesian rational assumption). We also conduct real-world human-subject experiments and demonstrate that our framework can capture human behavior from data and lead to more effective information policies for real-world human receivers.

AAAI Conference 2023 Conference Paper

Multi-Stream Representation Learning for Pedestrian Trajectory Prediction

  • Yuxuan Wu
  • Le Wang
  • Sanping Zhou
  • Jinghai Duan
  • Gang Hua
  • Wei Tang

Forecasting the future trajectory of pedestrians is an important task in computer vision with a range of applications, from security cameras to autonomous driving. It is very challenging because pedestrians not only move individually across time but also interact spatially, and the spatial and temporal information is deeply coupled with one another in a multi-agent scenario. Learning such complex spatio-temporal correlation is a fundamental issue in pedestrian trajectory prediction. Inspired by the procedure by which the hippocampus processes and integrates spatio-temporal information to form memories, we propose a novel multi-stream representation learning module to learn complex spatio-temporal features of pedestrian trajectories. Specifically, we learn temporal, spatial, and cross spatio-temporal correlation features in three respective pathways and then adaptively integrate these features with learnable weights via a gated network. Besides, we leverage a sparse attention gate to select informative interactions and correlations brought by complex spatio-temporal modeling and to reduce the complexity of our model. We evaluate our proposed method on two commonly used datasets, i.e., ETH-UCY and SDD, and the experimental results demonstrate that our method achieves state-of-the-art performance. Code: https://github.com/YuxuanIAIR/MSRL-master

AAAI Conference 2022 Conference Paper

Learning Disentangled Classification and Localization Representations for Temporal Action Localization

  • Zixin Zhu
  • Le Wang
  • Wei Tang
  • Ziyi Liu
  • Nanning Zheng
  • Gang Hua

A common approach to Temporal Action Localization (TAL) is to generate action proposals and then perform action classification and localization on them. For each proposal, existing methods universally use a shared proposal-level representation for both tasks. However, our analysis indicates that this shared representation focuses on the most discriminative frames for classification, e.g., “take-offs” rather than “run-ups” in distinguishing “high jump” from “long jump”, while frames most relevant to localization, such as the start and end frames of an action, are largely ignored. In other words, such a shared representation cannot simultaneously handle both classification and localization tasks well, making precise TAL difficult. To address this challenge, this paper disentangles the shared representation into classification and localization representations. The disentangled classification representation focuses on the most discriminative frames, and the disentangled localization representation focuses on the action phase as well as the action start and end. Our model can be divided into two sub-networks, i.e., the disentanglement network and the context-based aggregation network. The disentanglement network is an autoencoder that learns orthogonal hidden variables for classification and localization. The context-based aggregation network aggregates the classification and localization representations by modeling local and global contexts. We evaluate our proposed method on two popular benchmarks for TAL, on which it outperforms all state-of-the-art methods.

AAAI Conference 2021 Conference Paper

ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization

  • Ziyi Liu
  • Le Wang
  • Qilin Zhang
  • Wei Tang
  • Junsong Yuan
  • Nanning Zheng
  • Gang Hua

The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the video-level classification task. This strategy frequently confuses context with the actual action in the localization result. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been largely ignored in the literature. In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes context into account for accurate action localization. It consists of two branches (i.e., the Foreground-Background branch and the Action-Context branch). The Foreground-Background branch first distinguishes foreground from background within the entire video, while the Action-Context branch further separates the foreground into action and context. We associate video snippets with two latent components (i.e., a positive component and a negative component), whose different combinations can effectively characterize foreground, action, and context. Furthermore, we introduce extended labels with auxiliary context categories to facilitate the learning of action-context separation. Experiments on the THUMOS14 and ActivityNet v1.2/v1.3 datasets demonstrate that ACSNet outperforms existing state-of-the-art WS-TAL methods by a large margin.

NeurIPS Conference 2021 Conference Paper

Bandit Learning with Delayed Impact of Actions

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real world. For example, the capability of people in a certain social group to pay back a loan might depend on how frequently that group has historically had loan applications approved. If banks keep rejecting loan applications from people in a disadvantaged group, it could create a feedback loop and further damage the chances of people in that group getting loans. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits. We generalize the bandit setting to encode the dependency of this "bias" on the action history during learning. The goal is to maximize the collected utilities over time while taking into account the dynamics created by the delayed impact of historical actions. We propose an algorithm that achieves a regret of $\tilde{O}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications for designing fair algorithms.
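The "delayed impact" dynamic from this abstract can be illustrated with a toy simulation in which an arm's reward depends on the fraction of past rounds it was chosen, echoing the loan example. The reward function, the 0.5 impact coefficient, and the round-robin policy below are all made-up illustrations, not the paper's model or algorithm.

```python
import numpy as np

def delayed_reward(base, history_frac, impact=0.5):
    """Reward = arm's base quality shifted by its share of past pulls."""
    return base + impact * history_frac

pulls = np.zeros(2)              # pull counts for two arms ("groups")
total_reward = 0.0
for t in range(100):
    arm = t % 2                  # naive round-robin policy for the sketch
    frac = pulls[arm] / max(1, pulls.sum())
    total_reward += delayed_reward(base=[0.4, 0.6][arm], history_frac=frac)
    pulls[arm] += 1

# An arm that is never pulled keeps history_frac = 0 and never improves,
# which is the feedback loop the abstract describes.
assert pulls.sum() == 100
```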

AAAI Conference 2021 Conference Paper

Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context

  • Ziyi Liu
  • Le Wang
  • Wei Tang
  • Junsong Yuan
  • Nanning Zheng
  • Gang Hua

Weakly-supervised Temporal Action Localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision. Existing WS-TAL methods rely on deep features learned for action recognition. However, due to the mismatch between classification and localization, these features cannot distinguish the frequently co-occurring contextual background, i.e., the context, from the actual action instances. We term this challenge action-context confusion, and it adversely affects action localization accuracy. To address this challenge, we introduce a framework that learns two feature subspaces, one for actions and one for their context. By explicitly accounting for action visual elements, the action instances can be localized more precisely without distraction from the context. To facilitate the learning of these two feature subspaces with only video-level categorical labels, we leverage the predictions from both spatial and temporal streams for snippet grouping. In addition, an unsupervised learning task is introduced to make the proposed module focus on mining temporal information. The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks, i.e., the THUMOS14, ActivityNet v1.2, and v1.3 datasets.

NeurIPS Conference 2020 Conference Paper

Optimal Query Complexity of Secure Stochastic Convex Optimization

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We study the \emph{secure} stochastic convex optimization problem: a learner aims to learn the optimal point of a convex function by sequentially querying a (stochastic) gradient oracle; in the meantime, there exists an adversary who aims to free-ride and infer the learning outcome of the learner from observing the learner's queries. The adversary observes only the points of the queries but not the feedback from the oracle. The goal of the learner is to optimize accuracy, i.e., obtain an accurate estimate of the optimal point, while securing her privacy, i.e., making it difficult for the adversary to infer the optimal point. We formally quantify this tradeoff between the learner's accuracy and privacy and characterize the lower and upper bounds on the learner's query complexity as a function of the desired levels of accuracy and privacy. For the analysis of lower bounds, we provide a general template based on information-theoretic analysis and then tailor the template to several families of problems, including stochastic convex optimization and (noisy) binary search. We also present a generic secure learning protocol that achieves the matching upper bound up to logarithmic factors.

AAMAS Conference 2019 Conference Paper

Bandit Learning with Biased Human Feedback

  • Wei Tang
  • Chien-Ju Ho

We study a multi-armed bandit problem with biased human feedback. In our setting, each arm is associated with an unknown reward distribution. When an arm is played, a user receives a realized reward drawn from the distribution of the arm. She then provides feedback, a biased report of the realized reward, that depends on both the realized reward and the feedback history of the arm. The learner can observe only the biased feedback but not the realized rewards. The goal is to design a strategy to sequentially choose arms to maximize the total rewards users receive while only having access to the biased user feedback. We explore two natural feedback models. When user feedback is biased only by the average feedback of the arm (i.e., the ratio of positive feedback), we demonstrate that the evolution of the average feedback over time is mathematically equivalent to users performing online gradient descent for some latent function with a decreasing step size. With this mathematical connection, we show that under some mild conditions, it is possible to design a bandit algorithm achieving regret (i.e., the difference between the algorithm performance and the optimal performance of always choosing the best arm) sublinear in the number of rounds. However, in another model when user feedback is biased by both the average feedback and the number of feedback instances, we show that there exist no bandit algorithms that could achieve sublinear regret. Our results demonstrate the importance of understanding human behavior when applying bandit approaches in systems with humans in the loop.
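The first feedback model in this abstract (reports biased by the arm's average feedback) can be illustrated with a toy "herding" simulation for a single arm. The linear mixing function and the herding coefficient below are made-up illustrations, not the paper's actual bias model.

```python
import numpy as np

def biased_report(realized_reward, avg_feedback, herd=0.3, rng=None):
    """Report 1 with a probability that mixes the true reward with the
    arm's current average feedback (the herding bias)."""
    p = (1 - herd) * realized_reward + herd * avg_feedback
    return 1 if (rng or np.random.default_rng()).random() < p else 0

rng = np.random.default_rng(1)
true_p = 0.7                  # arm's true positive-reward probability
pos, n = 0, 0
for _ in range(2000):
    reward = rng.random() < true_p          # realized (hidden) reward
    avg = pos / n if n else 0.5             # current average feedback
    pos += biased_report(float(reward), avg, rng=rng)
    n += 1

avg_feedback = pos / n
assert 0.0 <= avg_feedback <= 1.0
```

With this particular linear mixing, the average feedback happens to converge toward the true reward rate; the abstract's point is that the dynamics of the average track an online-gradient-descent-like process, which the learner must account for.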

AAAI Conference 2017 Conference Paper

How to Train a Compact Binary Neural Network with High Accuracy?

  • Wei Tang
  • Gang Hua
  • Liang Wang

How to train a binary neural network (BinaryNet) with both a high compression rate and high accuracy on large-scale datasets? We answer this question through a careful analysis of previous work on BinaryNets, in terms of training strategies, regularization, and activation approximation. Our findings first reveal that a low learning rate is highly preferred to avoid frequent sign changes of the weights, which often make the learning of BinaryNets unstable. Secondly, we propose to use PReLU instead of ReLU in a BinaryNet to conveniently absorb the scale factor for the weights into the activation function, which enjoys high computational efficiency for binarized layers while maintaining high approximation accuracy. Thirdly, instead of imposing L2 regularization, which drives all weights to zero and contradicts the setting of BinaryNets, we introduce a regularization term that encourages the weights to be bipolar. Fourthly, we discover that the failure of binarizing the last layer, which is essential for a high compression rate, is due to an improper output range, and we propose to use a scale layer to bring it back to normal. Last but not least, we propose multiple binarizations to improve the approximation of the activations. The composition of all of these enables us to train BinaryNets with both a high compression rate and high accuracy, which is strongly supported by our extensive empirical study.
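The bipolar regularizer mentioned in this abstract can be sketched as a penalty that is zero when weights sit at {-1, +1} and grows as they approach 0, the opposite of L2. The exact form below, sum((1 - w^2)^2), is one common illustrative choice and is not necessarily the paper's formulation.

```python
import numpy as np

def bipolar_reg(w):
    """Penalty is zero when every weight is exactly +1 or -1, maximal at 0."""
    return np.sum((1.0 - w ** 2) ** 2)

def bipolar_reg_grad(w):
    """Analytic gradient of the regularizer, for use in SGD updates."""
    return -4.0 * w * (1.0 - w ** 2)

# Weights already at +/-1 incur no penalty; a weight at 0 is penalized most.
assert bipolar_reg(np.array([-1.0, 1.0])) == 0.0
assert bipolar_reg(np.array([0.0])) == 1.0
```

Added to the task loss, this term pushes real-valued shadow weights toward the two binary levels, so the sign() binarization used at inference loses less information than it would under L2.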