Author name cluster

Richard Yi Da Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

1 author row

IJCAI Conference 2025 Conference Paper

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

Miaoge Li
Jingcai Guo
Richard Yi Da Xu
Dongsheng Wang
Xiaofeng Cao
Zhijie Rao
Song Guo

Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In this paper, we revisit conditional transport (CT) theory and its homology to the visual-semantics interaction in CZSL, and propose a novel Trisets Consistency Alignment framework (dubbed TsCA) that addresses these issues. Concretely, we utilize three distinct yet semantically homologous sets, i.e., patches, primitives, and compositions, to construct pairwise CT costs that minimize their semantic discrepancies. To further ensure consistent transfer within these sets, we implement a cycle-consistency constraint that refines the learning by guaranteeing the feature consistency of the self-mapping during transport flow, regardless of modality. Moreover, we extend the CT plans to an open-world setting, which enables the model to effectively filter out infeasible pairs, thereby speeding up inference as well as increasing accuracy. Extensive experiments are conducted to verify the effectiveness of the proposed method. The code is available at https://github.com/keepgoingjkg/TsCA.

TMLR Journal 2023 Journal Article

Analyzing Deep PAC-Bayesian Learning with Neural Tangent Kernel: Convergence, Analytic Generalization Bound, and Efficient Hyperparameter Selection

Wei Huang
Chunrui Liu
Yilan Chen
Richard Yi Da Xu
Miao Zhang
Tsui-Wei Weng

PAC-Bayes is a well-established framework for analyzing generalization performance in machine learning models. This framework provides a bound on the expected population error by considering the sum of training error and the divergence between posterior and prior distributions. In addition to being a successful generalization bound analysis tool, the PAC-Bayesian bound can also be incorporated into an objective function for training probabilistic neural networks, which we refer to simply as Deep PAC-Bayesian Learning. Deep PAC-Bayesian learning has been shown to achieve competitive expected test set error and provide a tight generalization bound in practice at the same time through gradient descent training. Despite its empirical success, theoretical analysis of deep PAC-Bayesian learning for neural networks is rarely explored. To this end, this paper proposes a theoretical convergence and generalization analysis for Deep PAC-Bayesian learning. For a deep and wide probabilistic neural network, our analysis shows that PAC-Bayesian learning corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as the kernel. We utilize this outcome in conjunction with the PAC-Bayes $\mathcal{C}$-bound, enabling us to derive an analytical and guaranteed PAC-Bayesian generalization bound for the first time. Finally, drawing insight from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is proven to be time-saving on various benchmarks. Our work not only provides a better understanding of the theoretical underpinnings of Deep PAC-Bayesian learning, but also offers practical tools for improving the training and generalization performance of these models.

AAAI Conference 2023 Conference Paper

Domain Decorrelation with Potential Energy Ranking

Sen Pei
Jiaxi Sun
Richard Yi Da Xu
Shiming Xiang
Gaofeng Meng

Machine learning systems, especially methods based on deep learning, enjoy great success in modern computer vision tasks under ideal experimental settings. Generally, these classic deep learning methods are built on the i.i.d. assumption, supposing the training and test data are drawn independently from the same distribution. However, this i.i.d. assumption does not generally hold in real-world scenarios, which leads to sharp performance decay of deep learning algorithms. Domain shift is one of the primary factors behind this decay. To tackle this problem, we propose using Potential Energy Ranking (PoER) to decouple the object feature and the domain feature in given images, promoting the learning of label-discriminative representations while filtering out the irrelevant correlations between the objects and the background. PoER employs the ranking loss in shallow layers to make features with identical category and domain labels close to each other and vice versa. This makes the neural networks aware of both object and background characteristics, which is vital for generating domain-invariant features. Subsequently, with the stacked convolutional blocks, PoER further uses the contrastive loss to make features within the same category distribute densely regardless of domain, progressively filtering out the domain information for feature alignment. PoER reports superior performance on domain generalization benchmarks, improving the average top-1 accuracy by at least 1.20% compared to existing methods. Moreover, we use PoER in the ECCV 2022 NICO Challenge, achieving top place with only a vanilla ResNet-18 and winning the jury award. The code has been made publicly available at: https://github.com/ForeverPs/PoER.

AAAI Conference 2023 Conference Paper

Robust Feature Rectification of Pretrained Vision Models for Object Recognition

Shengchao Zhou
Gaofeng Meng
Zhaoxiang Zhang
Richard Yi Da Xu
Shiming Xiang

Pretrained vision models for object recognition often suffer a dramatic performance drop with degradations unseen during training. In this work, we propose a RObust FEature Rectification module (ROFER) to improve the performance of pretrained models against degradations. Specifically, ROFER first estimates the type and intensity of the degradation that corrupts the image features. Then, it leverages a Fully Convolutional Network (FCN) to rectify the features from the degradation by pulling them back to clear features. ROFER is a general-purpose module that can address various degradations simultaneously, including blur, noise, and low contrast. Besides, it can be plugged into pretrained models seamlessly to rectify the degraded features without retraining the whole model. Furthermore, ROFER can be easily extended to address composite degradations by adopting a beam search algorithm to find the composition order. Evaluations on CIFAR-10 and Tiny-ImageNet demonstrate that the accuracy of ROFER is 5% higher than that of SOTA methods on different degradations. With respect to composite degradations, ROFER improves the accuracy of a pretrained CNN by 10% and 6% on CIFAR-10 and Tiny-ImageNet respectively.

AAAI Conference 2021 Conference Paper

Capturing Uncertainty in Unsupervised GPS Trajectory Segmentation Using Bayesian Deep Learning

Christos Markos
James J. Q. Yu
Richard Yi Da Xu

Intelligent transportation management requires not only statistical information on users’ mobility patterns, but also knowledge of their corresponding transportation modes. While GPS trajectories can be readily obtained from GPS sensors found in modern smartphones and vehicles, these massive geospatial data are neither automatically annotated nor segmented by transportation mode, subsequently complicating transportation mode identification. In addition, predictive uncertainty caused by the learned model parameters or variable noise in GPS sensor readings typically remains unaccounted for. To jointly address the above issues, we propose a Bayesian deep learning framework for unsupervised GPS trajectory segmentation. After unlabeled GPS trajectories are preprocessed into sequences of motion features, they are used in unsupervised training of a channel-calibrated temporal convolutional neural network for timestep-level transportation mode identification. At test time, we approximate variational inference via Monte Carlo dropout sampling, leveraging the mean and variance of the predicted distributions to classify each input timestep and estimate its predictive uncertainty, respectively. The proposed approach outperforms both its non-Bayesian variant and established GPS trajectory segmentation baselines on Microsoft’s Geolife dataset without using any labels.
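
The test-time step described in this abstract, Monte Carlo dropout sampling, can be illustrated with a minimal sketch. It is not the paper's channel-calibrated temporal convolutional network: the toy linear model, the function names `forward` and `mc_dropout_predict`, and all parameter values are assumptions chosen for illustration. The sketch only shows the general idea of keeping dropout active at inference, running several stochastic forward passes, and reading the mean as the prediction and the variance as an uncertainty estimate.

```python
import random
import statistics


def forward(x, weights, drop_p, rng):
    """One stochastic forward pass of a toy linear model with dropout left on."""
    keep = 1.0 - drop_p
    total = 0.0
    for xi, wi in zip(x, weights):
        # Inverted dropout: drop each input with probability drop_p and
        # rescale the survivors so the expected output is unchanged.
        if rng.random() < keep:
            total += (xi * wi) / keep
    return total


def mc_dropout_predict(x, weights, drop_p=0.5, n_samples=200, seed=0):
    """Return (mean, variance) over n_samples stochastic forward passes."""
    rng = random.Random(seed)
    samples = [forward(x, weights, drop_p, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


# Hypothetical input features and weights, for illustration only.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]

mean, var = mc_dropout_predict(x, w)
print(f"prediction={mean:.3f}  uncertainty (variance)={var:.3f}")

# With dropout disabled every pass is identical, so the variance collapses.
det_mean, det_var = mc_dropout_predict(x, w, drop_p=0.0)
```

In a classifier, the same procedure would average the per-class probabilities across passes instead of a scalar output.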

IJCAI Conference 2021 Conference Paper

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Wei Huang
Weitao Du
Richard Yi Da Xu

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well-proven. However, while the same is believed to also hold for nonlinear networks when the dynamical isometry condition is satisfied, the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs) with orthogonal initialization via neural tangent kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network should theoretically remain constant. This suggests that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to the prevailing thinking. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set to achieve a linear regime in nonlinear activation, orthogonal initialization can improve the learning speed with a large learning rate or large depth.
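
For readers unfamiliar with the term, an orthogonal initialization draws a weight matrix whose rows are orthonormal rather than independent Gaussian entries. The sketch below is one generic way to build such a matrix, by applying Gram-Schmidt to Gaussian random vectors. It is not taken from the paper, and the function name `orthogonal_init` and the matrix size are assumptions for illustration.

```python
import random


def orthogonal_init(n, seed=0):
    """Return an n x n matrix (list of rows) with orthonormal rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        # Start from an i.i.d. Gaussian vector, as in Gaussian initialization.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Modified Gram-Schmidt: remove components along accepted rows.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


W = orthogonal_init(4)

# W W^T should be the identity matrix up to floating-point error.
gram = [[sum(a * b for a, b in zip(r, s)) for s in W] for r in W]
for row in gram:
    print(" ".join(f"{value:6.3f}" for value in row))
```

Deep learning libraries typically obtain the same kind of matrix from a QR decomposition of a Gaussian matrix; Gram-Schmidt is used here only to keep the example dependency-free.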

AAAI Conference 2020 Conference Paper

Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation

Yang Li
Kan Li
Shuai Jiang
Ziyue Zhang
Congzhentao Huang
Richard Yi Da Xu

The neural network based approach for 3D human pose estimation from monocular images has attracted growing interest. However, annotating 3D poses is a labor-intensive and expensive process. In this paper, we propose a novel self-supervised approach to avoid the need of manual annotations. Different from existing weakly/self-supervised methods that require extra unpaired 3D ground-truth data to alleviate the depth ambiguity problem, our method trains the network only relying on geometric knowledge without any additional 3D pose annotations. The proposed method follows the two-stage pipeline: 2D pose estimation and 2D-to-3D pose lifting. We design the transform re-projection loss that is an effective way to explore multi-view consistency for training the 2D-to-3D lifting network. Besides, we adopt the confidences of 2D joints to integrate losses from different views to alleviate the influence of noises caused by the self-occlusion problem. Finally, we design a two-branch training architecture, which helps to preserve the scale information of re-projected 2D poses during training, resulting in accurate 3D pose predictions. We demonstrate the effectiveness of our method on two popular 3D human pose datasets, Human3.6M and MPI-INF-3DHP. The results show that our method significantly outperforms recent weakly/self-supervised approaches.

IJCAI Conference 2016 Conference Paper

Copula Mixed-Membership Stochastic Blockmodel

Xuhui Fan
Richard Yi Da Xu
Longbing Cao

The Mixed-Membership Stochastic Blockmodel (MMSB) is a popular framework for modelling social relationships by fully exploiting each individual node's participation (or membership) in a social network. Despite its powerful representations, MMSB assumes that the membership indicators of each pair of nodes (i.e., people) are distributed independently. However, such an assumption often does not hold in real-life social networks, in which certain known groups of people may correlate with each other in terms of factors such as their membership categories. To expand MMSB's ability to model such dependent relationships, a new framework, a Copula Mixed-Membership Stochastic Blockmodel, is introduced in this paper for modelling intra-group correlations: an individual Copula function jointly models the membership pairs of those nodes within the group of interest. This framework enables various Copula functions to be used on demand, while maintaining the membership indicator's marginal distribution needed for modelling membership indicators with other nodes outside of the group of interest. Sampling algorithms for both the finite and infinite number of groups are also detailed. Our experimental results show its superior performance in capturing group interactions when compared with the baseline models on both synthetic and real-world datasets.
