Author name cluster

Jialei Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers

2 author rows

ICML Conference 2025 Conference Paper

OmniAudio: Generating Spatial Audio from 360-Degree Video

Huadai Liu
Tianyi Luo
Kaicheng Luo
Qikai Jiang
Peiwen Sun
Jialei Wang
Rongjie Huang 0001
Qian Chen 0003

Traditional video-to-audio generation techniques primarily focus on perspective video and non-spatial audio, often missing the spatial cues necessary for accurately representing sound sources in 3D environments. To address this limitation, we introduce a novel task, 360V2SA, to generate spatial audio from 360-degree videos, specifically producing First-order Ambisonics (FOA) audio, a standard format for representing 3D spatial audio that captures sound directionality and enables realistic 3D audio reproduction. We first create Sphere360, a novel dataset tailored for this task that is curated from real-world data. We also design an efficient semi-automated pipeline for collecting and cleaning paired video-audio data. To generate spatial audio from 360-degree video, we propose a novel framework, OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data. Furthermore, OmniAudio features a dual-branch framework that utilizes both panoramic and perspective video inputs to capture comprehensive local and global information from 360-degree videos. Experimental results demonstrate that OmniAudio achieves state-of-the-art performance across both objective and subjective metrics on Sphere360. Code and datasets are available at https://github.com/liuhuadai/OmniAudio. The project website is available at https://OmniAudio-360V2SA.github.io.
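To make the FOA format concrete, here is a minimal sketch that encodes a mono source at a given direction into the four first-order channels. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), azimuth measured counter-clockwise from the front, and elevation measured upward from the horizontal plane. The abstract does not say which FOA convention Sphere360 uses, and the function names are hypothetical.

```python
import math


def encode_foa_ambix(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics.

    Assumed convention: AmbiX (ACN order W, Y, Z, X; SN3D normalization),
    azimuth counter-clockwise from the front, elevation upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left/right figure-of-eight
    z = sample * math.sin(el)                 # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)  # front/back figure-of-eight
    return (w, y, z, x)


def encode_signal(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal (list of floats) sample by sample."""
    return [encode_foa_ambix(s, azimuth_deg, elevation_deg) for s in signal]
```

A source directly to the listener's left (azimuth 90°, elevation 0°) ends up only in W and Y. That directional cue is exactly what a non-spatial track lacks.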

NeurIPS Conference 2025 Conference Paper

Orient Anything V2: Unifying Orientation and Rotation Understanding

Zehan Wang
Ziang Zhang
Jiayang Xu
Jialei Wang
Tianyu Pang
Chao Du
Hengshuang Zhao
Zhou Zhao

This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anything V1, which defines orientation via a single unique front face, V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. These improvements are enabled by four key innovations: 1) Scalable 3D assets synthesized by generative models, ensuring broad category coverage and balanced data distribution; 2) An efficient, model-in-the-loop annotation system that robustly identifies 0 to N valid front faces for each object; 3) A symmetry-aware, periodic distribution fitting objective that captures all plausible front-facing orientations, effectively modeling object rotational symmetry; 4) A multi-frame architecture that directly predicts relative object rotations. Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks.

NeurIPS Conference 2025 Conference Paper

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue

While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging. As in professional sound design in the creative industries, this generation requires sophisticated reasoning about visual dynamics, acoustic environments, and temporal relationships. We present ThinkSound, a novel framework that leverages Chain-of-Thought (CoT) reasoning to enable stepwise, interactive audio generation and editing for videos. Our approach decomposes the process into three complementary stages: foundational foley generation that creates semantically coherent soundscapes, interactive object-centric refinement through precise user interactions, and targeted editing guided by natural language instructions. At each stage, a multimodal large language model generates contextually aligned CoT reasoning that guides a unified audio foundation model. Furthermore, we introduce AudioCoT, a comprehensive dataset with structured reasoning annotations that establishes connections between visual content, textual descriptions, and sound synthesis. Experiments demonstrate that ThinkSound achieves state-of-the-art performance in video-to-audio generation across both audio metrics and CoT metrics, and excels in the out-of-distribution Movie Gen Audio benchmark. The project page is available at https://ThinkSound-Project.github.io.

NeurIPS Conference 2024 Conference Paper

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Yu Zhang
Changhao Pan
Wenxiang Guo
Ruiqi Li
Zhiyuan Zhu
Jialei Wang
Wenhao Xu
Jingyu Lu

The scarcity of high-quality, multi-task singing datasets significantly hinders the development of diverse, controllable, and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. In particular, (1) we collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset; (2) 20 professional singers across nine widely spoken languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six commonly used singing techniques, supporting technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing voices are accompanied by manual phoneme-to-audio alignments, global style labels, and 16.16 hours of paired speech for various singing tasks. Moreover, to facilitate the use of GTSinger, we conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.

AAAI Conference 2024 Conference Paper

HACDR-Net: Heterogeneous-Aware Convolutional Network for Diabetic Retinopathy Multi-Lesion Segmentation

Qihao Xu
Xiaoling Luo
Chao Huang
Chengliang Liu
Jie Wen
Jialei Wang
Yong Xu

Diabetic Retinopathy (DR), the leading cause of blindness in diabetic patients, is diagnosed from the condition of multiple retinal lesions. As a difficult task in medical image segmentation, DR multi-lesion segmentation faces two main challenges. On the one hand, retinal lesions vary in location, shape, and size. On the other hand, because some lesions occupy only a very small part of the entire fundus image, the high proportion of background makes lesion segmentation difficult. To solve these problems, we propose a heterogeneous-aware convolutional network (HACDR-Net) composed of heterogeneous cross-convolution, heterogeneous modulated deformable convolution, and optional near-far-aware convolution. Our network introduces an adaptive aggregation module to summarize the heterogeneous feature maps and capture diverse lesion areas in the heterogeneous receptive field along the channel and spatial dimensions. In addition, to address the highly imbalanced proportion of focal areas, we design a new medical image segmentation loss function, Noise Adjusted Loss (NALoss). NALoss balances the predictive feature distribution of background and lesion by combining Gaussian noise with hard example mining, thus enhancing awareness of lesions. We conduct experiments on the public datasets IDRiD and DDR, and the results show that the proposed method outperforms other state-of-the-art methods. The code is open-sourced on github.com/xqh180110910537/HACDR-Net.

JBHI Journal 2023 Journal Article

A Revised Approach to Orthodontic Treatment Monitoring From Oralscan Video

Yan Tian
Guotang Jian
Jialei Wang
Hong Chen
Lei Pan
Zhaocheng Xu
Jianyuan Li
Ruili Wang

Research on orthodontic treatment monitoring from oralscan video is a new direction in dental digitalization. We designed an approach to reconstruct, segment, and estimate the pose of individual teeth to measure orthodontic treatment. To handle the semantic gap that arises when heterogeneous data are combined linearly, we present a multimedia interaction network (MIN) that combines heterogeneous information in point cloud segmentation by extending the graph attention mechanism. Moreover, a structure-aware quadruple loss is designed to explore the relations among multiple, diverse unmatched points in point cloud registration. The performance of our approach is evaluated on multiple tooth registration datasets, and extensive experiments show that it improves the inlier ratio on the Aoralscan3 dataset by a margin of 1.4% compared with prevailing approaches.

JMLR Journal 2019 Journal Article

Stochastic Canonical Correlation Analysis

Chao Gao
Dan Garber
Nathan Srebro
Jialei Wang
Weiran Wang

We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve $\epsilon$-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with $N(\epsilon, \Delta, \gamma)$ samples, where $\Delta$ is the singular value gap of the whitened cross-covariance matrix and $1/\gamma$ is an upper bound on the condition number of the auto-covariance matrices. Moreover, we can achieve the same learning accuracy by drawing a sample of the same order and solving the empirical objective approximately with a stochastic optimization algorithm; this algorithm is based on shift-and-invert power iterations and only needs to process the dataset for $\mathcal{O} \left(\log \frac{1}{\epsilon} \right)$ passes. Finally, we show that, given an estimate of the canonical correlation, the streaming version of the shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity, by processing the data only once.

JMLR Journal 2019 Journal Article

Utilizing Second Order Information in Minibatch Stochastic Variance Reduced Proximal Iterations

Jialei Wang
Tong Zhang

We present a novel minibatch stochastic optimization method for empirical risk minimization of linear predictors. The method efficiently leverages both sub-sampled first-order and higher-order information, by incorporating variance-reduction and acceleration techniques. We prove improved iteration complexity over state-of-the-art methods under suitable conditions. In particular, the approach enjoys global fast convergence for quadratic convex objectives and local fast convergence for general convex objectives. Experiments are provided to demonstrate the empirical advantage of the proposed method over existing approaches in the literature.

NeurIPS Conference 2018 Conference Paper

Gradient Sparsification for Communication-Efficient Distributed Optimization

Jianqiao Wangni
Jialei Wang
Ji Liu
Tong Zhang

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as stochastic gradients among different workers. In this paper, to reduce the communication cost, we propose a convex optimization formulation to minimize the coding length of stochastic gradients. The key idea is to randomly drop out coordinates of the stochastic gradient vectors and amplify the remaining coordinates appropriately to ensure that the sparsified gradient is unbiased. To compute the optimal sparsification efficiently, several simple and fast algorithms are proposed for finding an approximate solution, with a theoretical guarantee on sparseness. Experiments on $\ell_2$-regularized logistic regression, support vector machines, and convolutional neural networks validate our sparsification approaches.
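The drop-and-amplify idea can be sketched as follows. This is a simplified, hypothetical variant, not the paper's algorithm: each coordinate is kept with probability proportional to its magnitude (capped at 1) under an assumed budget on the expected number of kept coordinates. The paper instead derives its probabilities from the convex coding-length formulation and computes them with its own approximate algorithms.

```python
import random


def sparsify(grad, budget, rng=random):
    """Unbiased random sparsification of a gradient vector (illustrative).

    Hypothetical scheme: keep coordinate i with probability
    p_i = min(1, budget * |g_i| / sum_j |g_j|) and divide kept coordinates
    by p_i, so that E[output_i] = p_i * g_i / p_i = g_i.
    """
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        p = min(1.0, budget * abs(g) / total)
        if p > 0.0 and rng.random() < p:
            out.append(g / p)  # amplify the survivor to keep the estimate unbiased
        else:
            out.append(0.0)    # dropped coordinate, not transmitted
    return out
```

Large coordinates, for which p_i = 1, are sent exactly. Small ones are sent rarely but amplified when they are. The budget therefore trades communication volume against the variance added to the gradient.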

NeurIPS Conference 2018 Conference Paper

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

Blake Woodworth
Jialei Wang
Adam Smith
Brendan McMahan
Nati Srebro

We suggest a general oracle-based framework that captures parallel stochastic optimization in different parallelization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds to study several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the "natural" algorithms are not known to be optimal.

JMLR Journal 2017 Journal Article

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

Shun Zheng
Jialei Wang
Fen Xia
Wei Xu
Tong Zhang

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the data parallelism approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2017; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover, with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version, and the new result improves on previous ones (Yang, 2013; Ma et al., 2017), whose iteration complexities grow linearly with the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves on the previous state-of-the-art distributed dual coordinate optimization algorithms.

ICML Conference 2017 Conference Paper

Efficient Distributed Learning with Sparsity

Jialei Wang
Mladen Kolar
Nathan Srebro
Tong Zhang 0001

We propose a novel, efficient approach for distributed sparse learning with observations randomly partitioned across machines. In each round of the proposed method, worker machines compute the gradient of the loss on local data and the master machine solves a shifted $\ell_1$ regularized loss minimization problem. After a number of communication rounds that scales only logarithmically with the number of machines, and independent of other parameters of the problem, the proposed approach provably matches the estimation error bound of centralized methods.

ICML Conference 2017 Conference Paper

Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

Jialei Wang
Lin Xiao

We consider empirical risk minimization of linear predictors with convex loss functions. Such problems can be reformulated as convex-concave saddle point problems and solved by primal-dual first-order algorithms. However, primal-dual algorithms often require explicit strongly convex regularization in order to obtain fast linear convergence, and the required dual proximal mapping may not admit closed-form or efficient solution. In this paper, we develop both batch and randomized primal-dual algorithms that can exploit strong convexity from data adaptively and are capable of achieving linear convergence even without regularization. We also present dual-free variants of adaptive primal-dual algorithms that do not need the dual proximal mapping, which are especially suitable for logistic regression.

NeurIPS Conference 2017 Conference Paper

Multi-Information Source Optimization

Matthias Poloczek
Jialei Wang
Peter Frazier

We consider Bayesian methods for multi-information source optimization (MISO), in which we seek to optimize an expensive-to-evaluate black-box objective function while also accessing cheaper but biased and noisy approximations ("information sources"). We present a novel algorithm that outperforms the state of the art for this problem by using a Gaussian process covariance kernel better suited to MISO than those used by previous approaches, and an acquisition function based on a one-step optimality analysis supported by efficient parallelization. We also provide a novel technique to guarantee the asymptotic quality of the solution provided by this algorithm. Experimental evaluations demonstrate that this algorithm consistently finds designs of higher value at less cost than previous approaches.

NeurIPS Conference 2016 Conference Paper

Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

Weiran Wang
Jialei Wang
Dan Garber
Nati Srebro

We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples. Although several stochastic gradient based optimization algorithms have been recently proposed to solve this problem, no global convergence guarantee was provided by any of them. Inspired by the alternating least squares/power iterations formulation of CCA, and the shift-and-invert preconditioning method for PCA, we propose two globally convergent meta-algorithms for CCA, both of which transform the original problem into sequences of least squares problems that need only be solved approximately. We instantiate the meta-algorithms with state-of-the-art SGD methods and obtain time complexities that significantly improve upon that of previous work. Experimental results demonstrate their superior performance.

JMLR Journal 2016 Journal Article

Large Scale Online Kernel Learning

Jing Lu
Steven C.H. Hoi
Jialei Wang
Peilin Zhao
Zhi-Yong Liu

In this paper, we present a new framework for large scale online kernel learning, making kernel methods efficient and scalable for large-scale online learning applications. Unlike the regular budget online kernel learning scheme that usually uses some budget maintenance strategies to bound the number of support vectors, our framework explores a completely different approach of kernel functional approximation techniques to make the subsequent online learning task efficient and scalable. Specifically, we present two different online kernel machine learning algorithms: (i) Fourier Online Gradient Descent (FOGD) algorithm that applies the random Fourier features for approximating kernel functions; and (ii) Nyström Online Gradient Descent (NOGD) algorithm that applies the Nyström method to approximate large kernel matrices. We explore these two approaches to tackle three online learning tasks: binary classification, multi-class classification, and regression. The encouraging results of our experiments on large-scale datasets validate the effectiveness and efficiency of the proposed algorithms, making them potentially more practical than the family of existing budget online kernel learning approaches.

TIST Journal 2016 Journal Article

Soft Confidence-Weighted Learning

Jialei Wang
Peilin Zhao
Steven C. H. Hoi

Online learning plays an important role in many big data mining problems because of its high efficiency and scalability. In the literature, many online learning algorithms using gradient information have been applied to online classification problems. Recently, more effective second-order algorithms have been proposed that exploit the correlation between features to improve learning efficiency. Among them, Confidence-Weighted (CW) learning algorithms are very effective: they assume that the classification model is drawn from a Gaussian distribution, which enables the model to be updated effectively with the second-order information of the data stream. Despite being studied actively, these CW algorithms do not handle nonseparable and noisy datasets well. In this article, we propose a family of Soft Confidence-Weighted (SCW) learning algorithms for both binary and multiclass classification tasks, which is the first family of online classification algorithms that enjoys four salient properties simultaneously: (1) large margin training, (2) confidence weighting, (3) capability to handle nonseparable data, and (4) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. Compared with a variety of state-of-the-art algorithms (including AROW, NAROW, and NHERD), SCW in general achieves better or at least comparable predictive performance while being considerably more efficient (i.e., using fewer updates and less time). To facilitate future research, we release all the datasets and source code to the public at http://libol.stevenhoi.org/.

UAI Conference 2014 Conference Paper

A Consistent Estimator of the Expected Gradient Outerproduct

Shubhendu Trivedi
Jialei Wang
Samory Kpotufe
Gregory Shakhnarovich

In high-dimensional classification or regression problems, the expected gradient outerproduct (EGOP) of the unknown regression function $f$, namely $\mathbb{E}_X \nabla f(X) \cdot \nabla f(X)^\top$, is known to recover the directions $v \in \mathbb{R}^d$ most relevant to predicting the output $Y$. However, just as in gradient estimation, optimal estimators of the EGOP can be expensive in practice. We show that a simple rough estimator, much cheaper in practice, suffices to obtain significant improvements on real-world nonparametric classification and regression tasks. Furthermore, we prove that, despite its simplicity, this rough estimator remains statistically consistent under mild conditions.
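The sketch below shows one cheap way to build such a rough estimator. It rests on assumptions the abstract does not state: fit a Gaussian-kernel (Nadaraya-Watson) regression estimate, approximate its gradient by central finite differences with step t, and average the outer products of those gradients over the sample. The function names, the bandwidth, and the choice of regressor are hypothetical.

```python
import math


def nadaraya_watson(X, Y, bandwidth):
    """Return a Gaussian-kernel regression estimate f_n(x) fitted to (X, Y)."""
    def f_n(x):
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi))
                            / (2.0 * bandwidth ** 2)) for xi in X]
        total = sum(weights)
        if total == 0.0:
            return 0.0  # query point far from all samples (weights underflow)
        return sum(w * y for w, y in zip(weights, Y)) / total
    return f_n


def egop_estimate(f_hat, X, t=1e-3):
    """Rough EGOP estimate: mean outer product of finite-difference gradients.

    The i-th gradient coordinate at x is approximated by the central difference
    (f_hat(x + t e_i) - f_hat(x - t e_i)) / (2 t).
    """
    d = len(X[0])
    G = [[0.0] * d for _ in range(d)]
    for x in X:
        grad = []
        for i in range(d):
            xp = list(x)
            xm = list(x)
            xp[i] += t
            xm[i] -= t
            grad.append((f_hat(xp) - f_hat(xm)) / (2.0 * t))
        for i in range(d):
            for j in range(d):
                G[i][j] += grad[i] * grad[j] / len(X)
    return G
```

In use, one would call `egop_estimate(nadaraya_watson(X, Y, bandwidth), X)`. The leading eigenvectors of the returned $d \times d$ matrix then point along the directions most relevant to predicting $Y$.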

JMLR Journal 2014 Journal Article

LIBOL: A Library for Online Learning Algorithms

Steven C.H. Hoi
Jialei Wang
Peilin Zhao

LIBOL is an open-source library for large-scale online learning, which consists of a large family of efficient and scalable state-of-the-art online learning algorithms for large-scale online classification tasks. We offer easy-to-use command-line tools and examples for users and developers, and have made comprehensive documentation available for both beginners and advanced users. LIBOL is not only a machine learning toolbox, but also a comprehensive experimental platform for conducting online learning research.

AIJ Journal 2014 Journal Article

Online Transfer Learning

Peilin Zhao
Steven C.H. Hoi
Jialei Wang
Bin Li

In this paper, we propose a novel machine learning framework called “Online Transfer Learning” (OTL), which aims to address an online learning task on a target domain by transferring knowledge from some source domain. We do not assume that data in the target domain follow the same distribution as those in the source domain; the motivation of our work is to enhance a supervised online learning task on a target domain by exploiting existing knowledge learned from training data in source domains. OTL is in general very challenging, since data in the source and target domains can differ not only in their class distributions but also in their feature representations. As a first attempt at this new research problem, we investigate two different settings of OTL: (i) OTL on homogeneous domains of common feature space, and (ii) OTL across heterogeneous domains of different feature spaces. For each setting, we propose effective OTL algorithms to solve online classification tasks, and present theoretical bounds for the algorithms. In addition, we apply the OTL technique to challenging online learning tasks with concept-drifting data streams. Finally, we conduct extensive empirical studies on a comprehensive testbed, in which encouraging results validate the efficacy of our techniques.
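For the homogeneous setting, a minimal sketch is a weighted combination of a fixed source classifier and an online target classifier, where each weight is updated multiplicatively according to that classifier's loss. Everything here is an assumption made for illustration, not the paper's algorithm: the projection of scores to [0, 1], the squared loss used for reweighting, the learning rate eta, and the perceptron used as the target learner. The class and method names are hypothetical.

```python
import math


def clip01(v):
    """Map a real-valued score (or a label in {-1, +1}) to [0, 1] (assumed projection)."""
    return max(0.0, min(1.0, (v + 1.0) / 2.0))


class HomogeneousOTL:
    """Illustrative ensemble of a fixed source classifier and an online target learner."""

    def __init__(self, source_fn, dim, eta=0.5):
        self.h = source_fn        # classifier learned on the source domain (kept fixed)
        self.f = [0.0] * dim      # linear target classifier, learned online
        self.w_src, self.w_tgt = 0.5, 0.5
        self.eta = eta

    def _f(self, x):
        return sum(a * b for a, b in zip(self.f, x))

    def predict(self, x):
        p = self.w_src * clip01(self.h(x)) + self.w_tgt * clip01(self._f(x))
        return 1 if p >= 0.5 else -1

    def update(self, x, y):
        """Predict on x, then learn from its label y in {-1, +1}."""
        pred = self.predict(x)
        y01 = clip01(y)  # -1 -> 0.0, +1 -> 1.0
        # Reweight each classifier by exp(-eta * squared loss), then renormalize.
        self.w_src *= math.exp(-self.eta * (clip01(self.h(x)) - y01) ** 2)
        self.w_tgt *= math.exp(-self.eta * (clip01(self._f(x)) - y01) ** 2)
        total = self.w_src + self.w_tgt
        self.w_src, self.w_tgt = self.w_src / total, self.w_tgt / total
        # Perceptron step on the target classifier when it errs (or scores 0).
        if y * self._f(x) <= 0:
            self.f = [a + y * b for a, b in zip(self.f, x)]
        return pred
```

If the source classifier is misleading on the target domain, the multiplicative updates quickly shift weight to the target learner. If it transfers well, it keeps most of the weight from the first rounds, before the target learner has seen much data.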

IJCAI Conference 2013 Conference Paper

Large Scale Online Kernel Classification

Jialei Wang
Steven C. H. Hoi
Peilin Zhao
Jinfeng Zhuang
Zhi-Yong Liu

In this work, we present a new framework for large-scale online kernel classification, making kernel methods efficient and scalable for large-scale online learning tasks. Unlike the regular budget kernel online learning scheme, which usually uses different strategies to bound the number of support vectors, our framework explores a functional approximation approach to approximating a kernel function/matrix in order to make the subsequent online learning task efficient and scalable. Specifically, we present two different online kernel machine learning algorithms: (i) the Fourier Online Gradient Descent (FOGD) algorithm that applies the random Fourier features for approximating kernel functions; and (ii) the Nyström Online Gradient Descent (NOGD) algorithm that applies the Nyström method to approximate large kernel matrices. We offer theoretical analysis of the proposed algorithms, and conduct experiments on large-scale online classification tasks, including a data set with over 1 million instances. Our encouraging results validate the effectiveness and efficiency of the proposed algorithms, making them potentially more practical than the family of existing budget kernel online learning approaches.
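The FOGD idea can be sketched briefly. Each input is mapped through random Fourier features so that inner products approximate a Gaussian kernel. Ordinary online gradient descent on the hinge loss then runs in that fixed-dimensional space, so no support vectors need to be stored or budgeted. The Gaussian kernel, the hinge loss, and the parameter names (n_features, sigma, eta) are assumptions for this sketch, not details taken from the paper.

```python
import math
import random


class FourierOGD:
    """Illustrative online kernel classifier using random Fourier features.

    Approximates k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) by z(x) . z(x'),
    with z(x)_k = sqrt(2/D) * cos(w_k . x + b_k), w_k ~ N(0, sigma^-2 I),
    b_k ~ U[0, 2 pi], then runs online gradient descent on the hinge loss.
    """

    def __init__(self, dim, n_features=200, sigma=1.0, eta=0.2, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
                  for _ in range(n_features)]
        self.b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        self.scale = math.sqrt(2.0 / n_features)
        self.theta = [0.0] * n_features  # linear weights in feature space
        self.eta = eta

    def features(self, x):
        return [self.scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bk)
                for w, bk in zip(self.W, self.b)]

    def decision(self, x):
        return sum(t * zk for t, zk in zip(self.theta, self.features(x)))

    def update(self, x, y):
        """Predict on x, then take one OGD step on the hinge loss; y in {-1, +1}."""
        z = self.features(x)
        score = sum(t * zk for t, zk in zip(self.theta, z))
        if y * score < 1.0:  # hinge loss is positive, so its subgradient is -y * z
            self.theta = [t + self.eta * y * zk for t, zk in zip(self.theta, z)]
        return 1 if score >= 0 else -1
```

The per-example cost is O(D·d) for D features and input dimension d, regardless of how many examples have been seen. This constant cost is the source of the scalability the abstract claims over budget-based kernel methods.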

ICML Conference 2012 Conference Paper

Exact Soft Confidence-Weighted Learning

Steven C. H. Hoi
Jialei Wang
Peilin Zhao

ICML Conference 2012 Conference Paper

Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning

Steven C. H. Hoi
Jialei Wang
Peilin Zhao
Rong Jin 0001
Pengcheng Wu
