Arrow Research

Author name cluster

Weiguo Gao

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

4 papers
1 author row

Possible papers (4)

TMLR · 2025 · Journal Article

Evolution of Discriminator and Generator Gradients in GAN Training: From Fitting to Collapse

  • Weiguo Gao
  • Ming Li

Generative Adversarial Networks (GANs) are powerful generative models but often suffer from mode mixture and mode collapse. We propose a perspective that views GAN training as a two-phase progression from fitting to collapse, in which mode mixture and mode collapse are treated as interconnected. Inspired by the particle-model interpretation of GANs, we leverage the discriminator gradient to analyze particle movement and the generator gradient, specifically its "steepness," to quantify the severity of mode mixture by measuring the generator's sensitivity to changes in the latent space. Using these theoretical insights into the evolution of gradients, we design a specialized metric that integrates both gradients to detect the transition from fitting to collapse. This metric forms the basis of an early-stopping algorithm, which stops training at a point that retains sample quality and diversity. Experiments on synthetic and real-world datasets, including MNIST, Fashion-MNIST, and CIFAR-10, validate our theoretical findings and demonstrate the effectiveness of the proposed algorithm.
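
The paper's exact metric is not reproduced here, but the quantities the abstract names are easy to approximate. Below is a minimal, hypothetical PyTorch sketch of that kind of monitoring: the mean discriminator gradient norm on generated samples (a proxy for particle movement) and a finite-difference "steepness" of the generator under latent perturbations. The function names, the finite-difference estimator, and the combination rule are all illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch (not the authors' code): monitoring the two gradient
# quantities the abstract describes during GAN training.
import torch

def discriminator_gradient_norm(D, fake):
    # Mean norm of dD/dx on generated samples: a proxy for how strongly
    # the discriminator pushes the generated "particles".
    fake = fake.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(D(fake).sum(), fake)
    return grad.flatten(1).norm(dim=1).mean()

def generator_steepness(G, z, eps=1e-2):
    # Finite-difference sensitivity of G to latent perturbations; large
    # values suggest samples jumping between modes (mode mixture).
    dz = eps * torch.randn_like(z)
    diff = (G(z + dz) - G(z)).flatten(1)
    return (diff.norm(dim=1) / dz.flatten(1).norm(dim=1)).mean()

def transition_signal(D, G, z, alpha=1.0):
    # Illustrative combination only; the paper defines its own metric that
    # integrates both gradients to detect the fitting-to-collapse transition.
    return discriminator_gradient_norm(D, G(z)) - alpha * generator_steepness(G, z)
```

An early-stopping loop would track such a signal across iterations and halt once it crosses a calibrated threshold.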

JMLR · 2024 · Journal Article

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

  • Jinchi Chen
  • Jie Feng
  • Weiguo Gao
  • Ke Wei

This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. The $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point has been established under standard assumptions, where $n$ is the number of agents. It indicates that MDNPG can achieve the optimal convergence rate for decentralized policy gradient methods and possesses a linear speedup in contrast to centralized optimization methods. Moreover, superior empirical performance of MDNPG over other state-of-the-art algorithms has been demonstrated by extensive numerical experiments.
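
The abstract names three algorithmic ingredients: natural gradient, momentum-based variance reduction, and gradient tracking over a gossip matrix. The NumPy sketch below shows one conventional way these compose in decentralized stochastic ascent; it is an assumption-laden illustration (the signatures of `stoch_grad` and `nat_precond`, the STORM-style momentum form, and the update order are ours, not the paper's).

```python
# Hypothetical sketch of a decentralized natural-gradient step with
# momentum-based variance reduction and gradient tracking. Illustrative only.
import numpy as np

def decentralized_npg_step(theta, theta_prev, v, g_prev, W,
                           stoch_grad, nat_precond, eta=0.1, beta=0.9):
    # theta, theta_prev, v, g_prev: (n, d) per-agent parameters, tracking
    # variables, and previous gradient estimates; W: (n, n) doubly
    # stochastic gossip matrix of the undirected communication graph.
    n, _ = theta.shape
    g = np.empty_like(theta)
    for i in range(n):
        # Momentum-based variance reduction (STORM-style): evaluate the same
        # stochastic sample at the current and previous iterates.
        grad_new, grad_old = stoch_grad(i, theta[i], theta_prev[i])
        g[i] = grad_new + (1.0 - beta) * (g_prev[i] - grad_old)
    # Gradient tracking: v follows the network-average gradient estimate.
    v = W @ v + g - g_prev
    # Natural-gradient ascent (precondition by an inverse Fisher estimate),
    # then consensus mixing with the neighbors' parameters.
    ascent = np.stack([nat_precond(i, theta[i]) @ v[i] for i in range(n)])
    return W @ theta + eta * ascent, v, g
```

Roughly, the $n^{-1}$ factor in the sample complexity reflects that gossip averaging lets the $n$ agents' stochastic gradient noise partially cancel, which is the source of the linear speedup.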

NeurIPS · 2021 · Conference Paper

SOFT: Softmax-free Transformer with Linear Complexity

  • Jiachen Lu
  • Jinghan Yao
  • Junge Zhang
  • Xiatian Zhu
  • Hang Xu
  • Weiguo Gao
  • Chunjing Xu
  • Tao Xiang

Vision transformers (ViTs) have pushed the state of the art for various visual recognition tasks via patch-wise image tokenization followed by self-attention. However, the use of self-attention modules results in quadratic complexity in both computation and memory. Various attempts at approximating the self-attention computation with linear complexity have been made in natural language processing; however, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximation. Specifically, conventional self-attention is computed by normalizing the scaled dot product between token feature vectors, and keeping this softmax operation challenges any subsequent linearization effort. Based on this insight, a softmax-free transformer, SOFT, is proposed for the first time. To remove the softmax in self-attention, a Gaussian kernel function replaces the dot-product similarity without further normalization. This enables the full self-attention matrix to be approximated via a low-rank matrix decomposition, and the robustness of the approximation is achieved by computing its Moore-Penrose inverse with a Newton-Raphson method. Extensive experiments on ImageNet show that SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, its linear complexity permits much longer token sequences, resulting in a superior trade-off between accuracy and complexity.
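
To make the mechanism concrete, here is a minimal PyTorch sketch of softmax-free attention in this style: a Gaussian kernel replaces the normalized dot product, the full kernel matrix is approximated through a small set of landmark tokens, and the landmark block is pseudo-inverted with a Newton-Schulz iteration (a Newton-Raphson-type method for the Moore-Penrose inverse). The landmark sampling, shapes, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of Gaussian-kernel, softmax-free attention with a
# low-rank landmark approximation. Illustrative; not the official SOFT code.
import torch

def gaussian_kernel(a, b):
    # exp(-||a_i - b_j||^2 / 2): similarity without softmax normalization.
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2))

def newton_pinv(A, iters=20):
    # Newton-Schulz iteration for the Moore-Penrose pseudoinverse of A.
    alpha = 1.0 / (torch.linalg.matrix_norm(A, ord=1) *
                   torch.linalg.matrix_norm(A, ord=float('inf')))
    X = alpha * A.transpose(-1, -2)
    I = torch.eye(A.shape[-1], device=A.device, dtype=A.dtype)
    for _ in range(iters):
        X = X @ (2.0 * I - A @ X)
    return X

def soft_style_attention(q, v, m=64):
    # q: (n, d) tokens (SOFT shares queries and keys); v: (n, d_v) values.
    idx = torch.randperm(q.shape[0])[:m]       # illustrative landmark choice
    landmarks = q[idx]                         # (m, d)
    C = gaussian_kernel(q, landmarks)          # (n, m)
    A = gaussian_kernel(landmarks, landmarks)  # (m, m)
    # Full n x n attention ~= C @ pinv(A) @ C.T, applied to v in O(n*m) memory.
    return C @ (newton_pinv(A) @ (C.t() @ v))
```

Because queries and keys coincide under the Gaussian kernel, the landmark block A is symmetric positive semidefinite, which helps keep the pseudoinverse iteration well behaved.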

IJCAI · 2020 · Conference Paper

Multi-scale Two-way Deep Neural Network for Stock Trend Prediction

  • Guang Liu
  • Yuzhao Mao
  • Qi Sun
  • Hailong Huang
  • Weiguo Gao
  • Xuan Li
  • Jianping Shen
  • Ruifan Li

Stock Trend Prediction (STP) has drawn wide attention from various fields, especially artificial intelligence. Most previous studies are single-scale oriented, which results in information loss from a multi-scale perspective. In fact, multi-scale behavior is vital for making intelligent investment decisions: a mature investor thoroughly investigates the state of a stock market at various time scales. To automatically learn the multi-scale information in stock data, we propose a Multi-scale Two-way Deep Neural Network. It learns multi-scale patterns from two types of scale information, wavelet-based and downsampling-based, via eXtreme Gradient Boosting and a Recurrent Convolutional Neural Network, respectively. After combining the patterns learned by the two ways, our model achieves state-of-the-art performance on FI-2010 and CSI-2016, where the latter is a long-range stock dataset we release to support future studies of the STP task. Extensive experimental results on the two datasets indicate that multi-scale information can significantly improve STP performance and that our model is superior at capturing such information.
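
As an illustration of the two kinds of scale information the abstract mentions, the sketch below builds wavelet-based and downsampling-based views of a 1-D price series with NumPy and PyWavelets. The db4 wavelet, the number of levels, and the stride factors are arbitrary choices for the example, and the two model branches (XGBoost and the recurrent convolutional network) are not shown.

```python
# Hypothetical sketch: constructing multi-scale views of a price series.
import numpy as np
import pywt  # PyWavelets

def wavelet_scales(series, wavelet="db4", levels=3):
    # Progressively smoother reconstructions: view k zeroes out the k
    # finest detail bands before inverting the wavelet transform.
    coeffs = pywt.wavedec(series, wavelet, level=levels)
    views = []
    for k in range(1, levels + 1):
        kept = list(coeffs)
        for j in range(len(kept) - k, len(kept)):
            kept[j] = np.zeros_like(kept[j])
        views.append(pywt.waverec(kept, wavelet)[: len(series)])
    return views

def downsampled_scales(series, factors=(2, 4, 8)):
    # Coarser views by keeping every 2nd/4th/8th observation.
    return [series[::f] for f in factors]

prices = np.cumsum(np.random.randn(256))  # toy price series
multi_scale_inputs = wavelet_scales(prices) + downsampled_scales(prices)
```

Each view would then feed the appropriate branch, with the wavelet views and downsampled views playing the roles of the two types of scale information described above.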