Arrow Research search

Author name cluster

Cong Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

NeurIPS Conference 2025 Conference Paper

Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables

  • Yu Gui
  • Cong Ma
  • Zongming Ma

Multi-modal contrastive learning as a self-supervised representation learning technique has achieved great success in foundation model training, such as CLIP (Radford et al., 2021). In this paper, we study the theoretical properties of the learned representations from multi-modal contrastive learning beyond linear representations and specific data distributions. Our analysis reveals that, enabled by temperature optimization, multi-modal contrastive learning not only maximizes mutual information between modalities but also adapts to intrinsic dimensions of data, which can be much lower than user-specified dimensions for representation vectors. Experiments on both synthetic and real-world datasets demonstrate the ability of contrastive learning to learn low-dimensional and informative representations, bridging theoretical insights and practical performance.
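The temperature-scaled objective described in the abstract can be pictured with a minimal numpy sketch of a symmetric InfoNCE-style loss between paired embeddings from two modalities. This is our illustration, not the paper's exact training setup; all names and shapes are assumptions.

```python
import numpy as np

def multimodal_infonce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss for paired embeddings from two modalities.

    z_a, z_b: (n, d) arrays of L2-normalized embeddings; row i of z_a and
    row i of z_b form the positive pair, all other rows are negatives.
    """
    logits = (z_a @ z_b.T) / temperature            # (n, n) scaled similarities

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    loss_ab = -np.diag(log_softmax(logits, axis=1)).mean()  # modality a -> b
    loss_ba = -np.diag(log_softmax(logits, axis=0)).mean()  # modality b -> a
    return 0.5 * (loss_ab + loss_ba)

# Toy usage: perfectly aligned modalities yield a small loss at low temperature.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
aligned_loss = multimodal_infonce(z, z)
```

Lowering the temperature sharpens the softmax over negatives, which is the knob the paper's analysis ties to adapting the effective dimension of the representation.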

NeurIPS Conference 2024 Conference Paper

Off-policy estimation with adaptively collected data: the power of online learning

  • Jeonghwan Lee
  • Cong Ma

We consider estimation of a linear functional of the treatment effect from adaptively collected data. This problem finds a variety of applications including off-policy evaluation in contextual bandits, and estimation of the average treatment effect in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties including semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first present generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depend on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations in (1) the tabular case; (2) the case of linear function approximation; and (3) the case of general function approximation for the outcome model. We then provide a local minimax lower bound to show the instance-dependent optimality of the AIPW estimator using no-regret online learning algorithms.
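For reference, the AIPW form combines an outcome-model (direct-method) term with an inverse-propensity correction. A minimal numpy sketch for the average treatment effect with binary treatments and known logging propensities; the adaptive-collection and online-learning machinery of the paper is omitted, and all names are ours.

```python
import numpy as np

def aipw_ate(y, a, prop, mu1, mu0):
    """Augmented inverse propensity weighting (AIPW) estimate of the ATE.

    y    : (n,) observed outcomes
    a    : (n,) binary treatment indicators (0/1)
    prop : (n,) logging propensities P(A=1 | history); with adaptively
           collected data these are known from the logging policy
    mu1, mu0 : (n,) outcome-model predictions under treatment / control
    """
    direct = mu1 - mu0                               # outcome-model term
    correction = a / prop * (y - mu1) - (1 - a) / (1 - prop) * (y - mu0)
    return float(np.mean(direct + correction))

# Toy check: with a perfect outcome model the correction term vanishes.
rng = np.random.default_rng(1)
a = (rng.random(100) < 0.5).astype(float)
y = 2.0 * a                                          # true treatment effect is 2
est = aipw_ate(y, a, np.full(100, 0.5), np.full(100, 2.0), np.zeros(100))
```

The paper's reduction replaces the fixed `mu1`, `mu0` with a sequence of online-learned outcome estimates so the sequentially weighted error above is minimized.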

AAAI Conference 2024 Conference Paper

SwiftPillars: High-Efficiency Pillar Encoder for Lidar-Based 3D Detection

  • Xin Jin
  • Kai Liu
  • Cong Ma
  • Ruining Yang
  • Fei Hui
  • Wei Wu

Lidar-based 3D Detection is one of the significant components of Autonomous Driving. However, current methods over-focus on improving the performance of 3D Lidar perception, which causes network architectures to become complicated and hard to deploy, making the methods difficult to apply to real-time processing in Autonomous Driving. In this paper, we propose a high-efficiency network, SwiftPillars, which includes a Swift Pillar Encoder (SPE) and a Multi-scale Aggregation Decoder (MAD). The SPE is constructed from a concise Dual-attention Module with lightweight operators. The Dual-attention Module utilizes feature pooling, matrix multiplication, etc. to speed up point-wise and channel-wise attention extraction and fusion. The MAD interconnects multi-scale features extracted by the SPE at minimal computational cost to boost performance. In our experiments, SwiftPillars achieves 61.3% NDS and 53.2% mAP on the nuScenes dataset. In addition, we evaluate inference time on several platforms (P4, T4, A2, MLU370, RTX3080), where SwiftPillars runs in as little as 13.3 ms (75 FPS) on an NVIDIA Tesla T4. Compared with PointPillars, SwiftPillars is on average 26.58% faster in inference speed on equivalent GPUs and achieves approximately 3.2% higher mAP on the nuScenes dataset.

NeurIPS Conference 2023 Conference Paper

Conformalized matrix completion

  • Yu Gui
  • Rina Barber
  • Cong Ma

Matrix completion aims to estimate missing entries in a data matrix, using the assumption of a low-complexity structure (e.g., low-rankness) so that imputation is possible. While many effective estimation algorithms exist in the literature, uncertainty quantification for this problem has proved to be challenging, and existing methods are extremely sensitive to model misspecification. In this work, we propose a distribution-free method for predictive inference in the matrix completion problem. Our method adapts the framework of conformal prediction, which provides prediction intervals with guaranteed distribution-free validity in the setting of regression, to the problem of matrix completion. Our resulting method, conformalized matrix completion (cmc), offers provable predictive coverage regardless of the accuracy of the low-rank model. Empirical results on simulated and real data demonstrate that cmc is robust to model misspecification while matching the performance of existing model-based methods when the model is correct.
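The conformal core of the method can be illustrated with a split-conformal sketch: hold out some observed entries, compute absolute residuals against the low-rank predictions, and take a conformal quantile as the interval radius. This is an unweighted toy version under an exchangeability assumption; the paper's cmc procedure additionally weights residuals to handle nonuniform missingness.

```python
import numpy as np

def conformal_radius(obs_vals, pred_vals, alpha=0.1):
    """Split-conformal interval radius from held-out observed entries.

    obs_vals, pred_vals : (n,) held-out observed entries and the low-rank
    model's predictions for them. Returns q such that [pred - q, pred + q]
    covers a fresh entry with probability >= 1 - alpha, assuming the
    held-out and missing entries' residuals are exchangeable.
    """
    scores = np.abs(obs_vals - pred_vals)            # nonconformity scores
    n = len(scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))          # conformal quantile index
    return np.sort(scores)[min(k, n) - 1]
```

The resulting interval `pred ± q` is valid no matter how wrong the low-rank model is, which is the distribution-free guarantee the abstract refers to.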

NeurIPS Conference 2022 Conference Paper

Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

  • Gene Li
  • Cong Ma
  • Nati Srebro

We present a family $\{\widehat{\pi}_p\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\widehat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\widehat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\widehat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\widehat{\pi}_2$.
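One way to picture the family: for a confidence set {theta : ||theta - theta_hat||_p <= beta}, Hölder's inequality bounds the worst-case value of an action by theta_hat . phi - beta * ||phi||_q with 1/p + 1/q = 1, so p = infinity yields an l1 penalty. The numpy sketch below is our simplified illustration of this pessimistic selection rule, not the paper's exact estimator.

```python
import numpy as np

def pessimistic_action(theta_hat, phi, beta, p=np.inf):
    """Pick the action maximizing a lower confidence bound on its value.

    For a confidence set {theta : ||theta - theta_hat||_p <= beta}, the
    worst-case value of action a is
        theta_hat . phi[a] - beta * ||phi[a]||_q,  with 1/p + 1/q = 1,
    by Holder's inequality.

    theta_hat : (d,) estimated reward parameter
    phi       : (A, d) feature vectors, one row per action
    """
    q = 1.0 if np.isinf(p) else (np.inf if p == 1 else p / (p - 1))
    penalty = beta * np.linalg.norm(phi, ord=q, axis=1)  # dual-norm penalty
    lcb = phi @ theta_hat - penalty
    return int(np.argmax(lcb))
```

With `beta = 0` the rule is greedy; increasing `beta` penalizes actions whose features have large dual norm, i.e. those the data constrain least.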

JMLR Journal 2022 Journal Article

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

  • Tian Tong
  • Cong Ma
  • Ashley Prater-Bennette
  • Erin Tripp
  • Yuejie Chi

Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems --- tensor completion and tensor regression --- as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.

JMLR Journal 2021 Journal Article

Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

  • Tian Tong
  • Cong Ma
  • Yuejie Chi

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and then optimize these factors directly via simple iterative methods such as gradient descent and alternating minimization. Despite nonconvexity, the recent literature has shown that these simple heuristics in fact achieve linear convergence when initialized properly for a growing number of problems of interest. However, upon closer examination, existing approaches can still be computationally expensive, especially for ill-conditioned matrices: the convergence rate of gradient descent depends linearly on the condition number of the low-rank matrix, while the per-iteration cost of alternating minimization is often prohibitive for large matrices. The goal of this paper is to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent (ScaledGD), which can be viewed as preconditioned or diagonally-scaled gradient descent, where the preconditioners are adaptive and iteration-varying with a minimal computational overhead. With tailored variants for low-rank matrix sensing, robust principal component analysis and matrix completion, we theoretically show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix, similar to alternating minimization, while maintaining the low per-iteration cost of gradient descent. Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties over a wide range of low-rank matrix estimation tasks.
At the core of our analysis is the introduction of a new distance function that takes into account the preconditioners when measuring the distance between the iterates and the ground truth. Finally, numerical examples are provided to demonstrate the effectiveness of ScaledGD in accelerating the convergence of ill-conditioned low-rank matrix estimation in a wide range of applications.
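For the fully observed case, the ScaledGD update is easy to state: precondition the gradient of 0.5 * ||L R^T - M||_F^2 by (R^T R)^{-1} and (L^T L)^{-1}. The numpy sketch below covers fully observed matrix factorization only; the sensing, robust PCA, and completion variants in the paper need more care, and the step size and iteration count here are illustrative.

```python
import numpy as np

def scaled_gd(M, r, eta=0.5, iters=100):
    """ScaledGD for a fully observed low-rank matrix M ~ L @ R.T.

    The vanilla gradient of 0.5 * ||L R^T - M||_F^2 w.r.t. L is (L R^T - M) R;
    ScaledGD right-multiplies it by the preconditioner (R^T R)^{-1}
    (symmetrically for R), which removes the condition-number dependence
    from the convergence rate.
    """
    # spectral initialization: balanced factors from the top-r SVD
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])
    R = Vt[:r].T * np.sqrt(s[:r])
    for _ in range(iters):
        E = L @ R.T - M                              # residual
        L_new = L - eta * E @ R @ np.linalg.inv(R.T @ R)
        R = R - eta * E.T @ L @ np.linalg.inv(L.T @ L)
        L = L_new
    return L, R

# Toy usage on an ill-conditioned rank-2 matrix (condition number 1000).
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.normal(size=(20, 2)))
V0, _ = np.linalg.qr(rng.normal(size=(20, 2)))
M = (U0 * np.array([1000.0, 1.0])) @ V0.T
L, R = scaled_gd(M, r=2)
```

With `eta = 1` each preconditioned step reduces to an exact least-squares update, which is why the per-iteration progress does not degrade as the condition number grows.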

NeurIPS Conference 2021 Conference Paper

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

  • Paria Rashidinejad
  • Banghua Zhu
  • Cong Ma
  • Jiantao Jiao
  • Stuart Russell

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main methods are used: imitation learning which is suitable for expert datasets, and vanilla offline RL which often requires uniform coverage datasets. From a practical standpoint, datasets often deviate from these two extremes and the exact data composition is usually unknown. To bridge this gap, we present a new offline RL framework that smoothly interpolates between the two extremes of data composition, hence unifying imitation learning and vanilla offline RL. The new framework is centered around a weak version of the concentrability coefficient that measures the deviation of the behavior policy from the expert policy alone. Under this new framework, we ask: can one develop an algorithm that achieves a minimax optimal rate adaptive to unknown data composition? To address this question, we consider a lower confidence bound (LCB) algorithm developed based on pessimism in the face of uncertainty in offline RL. We study finite-sample properties of LCB as well as information-theoretic limits in multi-armed bandits, contextual bandits, and Markov decision processes (MDPs). Our analysis reveals surprising facts about optimality rates. In particular, in both contextual bandits and RL, LCB achieves a faster rate of $1/N$ for nearly-expert datasets compared to the usual rate of $1/\sqrt{N}$ in offline RL, where $N$ is the batch dataset sample size. In contextual bandits with at least two contexts, we prove that LCB is adaptively optimal for the entire data composition range, achieving a smooth transition from imitation learning to offline RL. We further show that LCB is almost adaptively optimal in MDPs.
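The LCB idea is simplest to see in the multi-armed bandit case: penalize each arm's empirical mean by a confidence width that shrinks with its sample count, so arms the behavior policy rarely logged are treated pessimistically. A toy numpy version with a Hoeffding-style width, which is our illustrative choice rather than the paper's exact bonus:

```python
import numpy as np

def lcb_policy(rewards_by_arm, delta=0.05):
    """Select the arm with the highest lower confidence bound (LCB).

    rewards_by_arm : list of 1-D arrays of logged rewards, one per arm.
    Arms the behavior policy rarely pulled get a wide confidence interval
    and hence a low LCB -- pessimism in the face of uncertainty.
    """
    lcbs = []
    for r in rewards_by_arm:
        n = len(r)
        if n == 0:
            lcbs.append(-np.inf)                     # never-logged arm: maximally pessimistic
        else:
            width = np.sqrt(np.log(2 / delta) / (2 * n))  # Hoeffding-style width
            lcbs.append(r.mean() - width)
    return int(np.argmax(lcbs))
```

A greedy rule would chase a lucky arm pulled once; LCB instead prefers a well-sampled arm with a slightly lower mean, which is the behavior behind the fast 1/N rate on nearly-expert datasets.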

AAAI Conference 2020 Conference Paper

Multi-Spectral Salient Object Detection by Adversarial Domain Adaptation

  • Shaoyue Song
  • Hongkai Yu
  • Zhenjiang Miao
  • Jianwu Fang
  • Kang Zheng
  • Cong Ma
  • Song Wang

Although there are many existing research works on salient object detection (SOD) in RGB images, there are still many complex situations in which regular RGB images cannot provide enough cues for accurate SOD, such as shadow effects, similar appearance between background and foreground, and strong or insufficient illumination. Because of the success of the near-infrared spectrum in many computer vision tasks, we explore multi-spectral SOD on synchronized RGB and near-infrared (NIR) images for both simple and complex situations. We assume that RGB SOD on existing RGB image datasets can provide references for the multi-spectral SOD problem. In this paper, we first collect, and will release, a large multi-spectral dataset of 780 synchronized RGB and NIR image pairs for the multi-spectral SOD problem in both simple and complex situations. We model this research problem as adversarial domain adaptation from the existing RGB image dataset (source domain) to the collected multi-spectral dataset (target domain). Experimental results show the effectiveness and accuracy of the proposed adversarial domain adaptation for multi-spectral SOD.

JBHI Journal 2019 Journal Article

Moving-Tolerant Augmented Reality Surgical Navigation System Using Autostereoscopic Three-Dimensional Image Overlay

  • Cong Ma
  • Guowen Chen
  • Xinran Zhang
  • Guochen Ning
  • Hongen Liao

Augmented reality (AR) surgical navigation systems based on image overlay have been used in minimally invasive surgery. However, conventional systems still suffer from a limited viewing zone and a shortage of intuitive three-dimensional (3D) image guidance, and cannot be moved freely. To fuse the 3D overlay image with the patient in situ, it is essential to track the overlay device while it is moving, and a direct line-of-sight must be maintained between the optical markers and the tracker camera. In this study, we propose a moving-tolerant AR surgical navigation system using autostereoscopic image overlay, which avoids the use of an optical tracking system during the intraoperative period. The system captures binocular image sequences of environmental changes in the operating room to locate the overlay device, rather than tracking the device directly. Therefore, it is no longer necessary to maintain a direct line-of-sight between the tracker and the tracked devices, and the movable range of the system is not limited by the scope of the tracker camera. Computer simulation experiments demonstrate the reliability of the proposed moving-tolerant AR surgical navigation system. We also fabricate a computer-generated integral-photography-based 3D overlay AR system to validate the feasibility of the proposed moving-tolerant approach. Qualitative and quantitative experiments demonstrate that the proposed system can always fuse the 3D image with the patient, thus increasing the feasibility and reliability of traditional 3D image-overlay AR surgical navigation systems.