Arrow Research search

Author name cluster

Yi Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

JBHI Journal 2026 Journal Article

Decoding Covert Speech from EEG by Functional Areas Spatio-Temporal Transformer

  • Muyun Jiang
  • Wei Zhang
  • Yi Ding
  • Kok Ann Colin Teo
  • LaiGuan Fong
  • Shuailei Zhang
  • Zhiwei Guo
  • Chenyu Liu

Covert speech involves imagining speaking without audible sound or any movements. Decoding covert speech from electroencephalogram (EEG) is challenging due to a limited understanding of neural pronunciation mapping and the low signal-to-noise ratio of the signal. In this study, we developed a large-scale multi-utterance speech EEG dataset from 57 right-handed native English-speaking subjects, each performing covert and overt speech tasks by repeating the same word in five utterances within a ten-second duration. Given the spatio-temporal nature of the neural activation process during speech pronunciation, we developed a Functional Areas Spatio-temporal Transformer (FAST), an effective framework for converting EEG signals into tokens and utilizing transformer architecture for sequence encoding. Our results reveal distinct and interpretable speech neural features by the visualization of FAST-generated activation maps across frontal and temporal brain regions with each word being covertly spoken, providing new insights into the discriminative features of the neural representation of covert speech. This is the first report of such a study, which provides interpretable evidence for speech decoding from EEG. The code for this work has been made public at https://github.com/Jiang-Muyun/FAST

AAAI Conference 2026 Conference Paper

EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training

  • Yuting Tang
  • Weibang Jiang
  • Shanglin Li
  • Yong Li
  • Chenyu Liu
  • Xinliang Zhou
  • Yi Ding
  • Cuntai Guan

Large-scale EEG foundation models have shown strong generalization across a range of downstream tasks, but their training remains resource-intensive due to the volume and variable quality of EEG data. In this work, we introduce EEG-DLite, a data distillation framework that enables more efficient pre-training by selectively removing noisy and redundant samples from large EEG datasets. EEG-DLite begins by encoding EEG segments into compact latent representations using a self-supervised autoencoder, allowing sample selection to be performed efficiently and with reduced sensitivity to noise. Based on these representations, EEG-DLite filters out outliers and minimizes redundancy, resulting in a smaller yet informative subset that retains the diversity essential for effective foundation model training. Through extensive experiments, we demonstrate that training on only 5 percent of a 2,500-hour dataset curated with EEG-DLite yields performance comparable to, and in some cases better than, training on the full dataset across multiple downstream tasks. To our knowledge, this is the first systematic study of pre-training data distillation in the context of EEG foundation models. EEG-DLite provides a scalable and practical path toward more effective and efficient physiological foundation modeling.

AAAI Conference 2026 Conference Paper

ForeDiffusion: Foresight-Conditioned Diffusion Policy via Future View Construction for Robot Manipulation

  • Weize Xie
  • Yi Ding
  • Ying He
  • Leilei Wang
  • Binwen Bai
  • Zheyi Zhao
  • Chenyang Wang
  • F. Richard Yu

Diffusion strategies have advanced visual motor control by progressively denoising high-dimensional action sequences, providing a promising method for robot manipulation. However, as task complexity increases, the success rate of existing baseline models decreases considerably. Analysis indicates that current diffusion strategies are confronted with two limitations. First, these strategies only rely on short-term observations as conditions. Second, the training objective remains limited to a single denoising loss, which leads to error accumulation and causes grasping deviations. To address these limitations, this paper proposes Foresight-Conditioned Diffusion (ForeDiffusion), by injecting the predicted future view representation into the diffusion process. As a result, the policy is guided to be forward-looking, enabling it to correct trajectory deviations. Following this design, ForeDiffusion employs a dual loss mechanism, combining the traditional denoising loss and the consistency loss of future observations, to achieve the unified optimization. Extensive evaluation on the Adroit suite and the MetaWorld benchmark demonstrates that ForeDiffusion achieves an average success rate of 80% for the overall task, significantly outperforming the existing mainstream diffusion methods by approximately 20% in high difficulty tasks, while maintaining more stable performance across the entire tasks.

EAAI Journal 2025 Journal Article

Constructing a robust Short-Text Clustering Model for contrastive learning based on optimized adaptive optimal transport for pseudo-label generation

  • Jiahui Liu
  • Chun Yan
  • Wei Liu
  • Yi Ding

Short-text data often suffers from noise and class imbalances, posing challenges for effective clustering. To address these issues, we propose a Short-Text Clustering Model based on Pseudo-Labels and Contrastive Learning (SCPCL). The model comprises two key components: (1) a pseudo-label acquisition module, which introduces the optimal transport theory into short-text clustering and adopts a dynamically adjusted prior distribution to enhance the clustering of minority classes; and (2) a contrastive learning module combining a supervised clustering network, an instance contrastive head, and an anchor network. These components ensure intraclass compactness, interclass separability, and robustness to noise. Experiments on six benchmark datasets showed that SCPCL achieves an average clustering accuracy improvement of 2.61%, with a maximum gain of 6.47% for long-tailed distributions. This model provides an effective solution for clustering complex short text data.

JBHI Journal 2025 Journal Article

EEG-Deformer: A Dense Convolutional Transformer for Brain-Computer Interfaces

  • Yi Ding
  • Yong Li
  • Hao Sun
  • Rui Liu
  • Chengxuan Tong
  • Chenyu Liu
  • Xinliang Zhou
  • Cuntai Guan

Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks (cognitive attention, driving fatigue, and mental workload detection) consistently confirm the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms or performs comparably to existing state-of-the-art methods. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks.

ICLR Conference 2025 Conference Paper

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

  • Yi Ding
  • Bolian Li
  • Ruqi Zhang

Vision Language Models (VLMs) have become essential backbones for multi-modal intelligence, yet significant safety challenges limit their real-world application. While textual inputs can often be effectively safeguarded, adversarial visual inputs can often easily bypass VLM defense mechanisms. Existing defense methods are either resource-intensive, requiring substantial data and compute, or fail to simultaneously ensure safety and usefulness in responses. To address these limitations, we propose a novel two-phase inference-time alignment framework, Evaluating Then Aligning (ETA): i) Evaluating input visual contents and output responses to establish a robust safety awareness in multimodal settings, and ii) Aligning unsafe behaviors at both shallow and deep levels by conditioning the VLMs' generative distribution with an interference prefix and performing sentence-level best-of-$N$ to search the most harmless and helpful generation paths. Extensive experiments show that ETA outperforms baseline methods in terms of harmlessness, helpfulness, and efficiency, reducing the unsafe rate by 87.5% in cross-modality attacks and achieving 96.6% win-ties in GPT-4 helpfulness evaluation.

NeurIPS Conference 2025 Conference Paper

REFED: A Subject Real-time Dynamic Labeled EEG-fNIRS Synchronized Recorded Emotion Dataset

  • Xiaojun Ning
  • Jing Wang
  • Zhiyang Feng
  • Tianzuo Xin
  • Shuo Zhang
  • Shaoqi Zhang
  • Zheng Lian
  • Yi Ding

Affective brain-computer interfaces (aBCIs) play a crucial role in personalized human–computer interaction and neurofeedback modulation. To develop practical and effective aBCI paradigms and to investigate the spatial-temporal dynamics of brain activity under emotional inducement, portable electroencephalography (EEG) signals have been widely adopted. To further enhance spatial-temporal perception, functional near-infrared spectroscopy (fNIRS) has attracted increasing interest in the aBCI field and has been explored in combination with EEG. However, existing datasets typically provide only static fixation labels, overlooking the dynamic changes in subjects' emotions. Notably, some studies have attempted to collect continuously annotated emotional data, but they have recorded only peripheral physiological signals without directly observing brain activity, limiting insight into underlying neural states under different emotions. To address these challenges, we present the Real-time labeled EEG-fNIRS Dataset (REFED). To the best of our knowledge, this is the first EEG-fNIRS dataset with real-time dynamic emotional annotations. REFED simultaneously records brain signals from both EEG and fNIRS modalities while providing continuous, real-time annotations of valence and arousal. The results of the data analysis demonstrate the effectiveness of emotion inducement and the reliability of real-time annotation. This dataset offers the possibility for studying the neurovascular coupling mechanism under emotional evolution and for developing dynamic, robust affective BCIs.

JBHI Journal 2025 Journal Article

REI-Net: A Reference Electrode Standardization Interpolation Technique Based 3D CNN for Motor Imagery Classification

  • Meiyan Xu
  • Jie Jiao
  • Duo Chen
  • Yi Ding
  • Qingqing Chen
  • Jipeng Wu
  • Peipei Gu
  • Yijie Pan

High-quality scalp EEG datasets are extremely valuable for motor imagery (MI) analysis. However, due to electrode size and montage, different datasets inevitably experience channel information loss, posing a significant challenge for MI decoding. A 2D representation that focuses on the time domain may lose the spatial information in EEG. In contrast, a 3D representation based on topography may suffer from channel loss and introduce noise through different padding methods. In this paper, we propose a framework called Reference Electrode Standardization Interpolation Network (REI-Net). Through an interpolation of 3D representation, REI-Net retains the temporal information in 2D scalp EEG while improving the spatial resolution within a certain montage. Additionally, to overcome the data variability caused by individual differences, transfer learning is employed to enhance the decoding robustness. Our approach achieves promising performance on two widely-recognized MI datasets, with an accuracy of 77.99% on BCI-C IV-2a and an accuracy of 63.94% on Kaya2018. The proposed algorithm outperforms the SOTAs, leading to more accurate and robust results.

NeurIPS Conference 2025 Conference Paper

Sherlock: Self-Correcting Reasoning in Vision-Language Models

  • Yi Ding
  • Ruqi Zhang

Reasoning Vision-Language Models (VLMs) have shown promising performance on complex multimodal tasks. However, they still face significant challenges: they are highly sensitive to reasoning errors, require large volumes of annotated data or accurate verifiers, and struggle to generalize beyond specific domains. To address these limitations, we explore self-correction as a strategy to enhance reasoning VLMs. We first conduct an in-depth analysis of reasoning VLMs’ self-correction abilities and identify key gaps. Based on our findings, we introduce Sherlock, a self-correction and self-improvement training framework. Sherlock introduces a trajectory-level self-correction objective, a preference data construction method based on visual perturbation, and a dynamic $\beta$ for preference tuning. Once the model acquires self-correction capabilities using only 20k randomly sampled annotated data, it continues to self-improve without external supervision. Built on the Llama3.2-Vision-11B model, Sherlock achieves remarkable results across eight benchmarks, reaching an average accuracy of 64.1 with direct generation and 65.4 after self-correction. It outperforms LLaVA-CoT (63.2), Mulberry (63.9), and LlamaV-o1 (63.4) while using less than 20% of the annotated data.

EAAI Journal 2024 Journal Article

Enhancing OCR with line segmentation mask for container text recognition in container terminal

  • Zhichao Zhang
  • Yi Ding
  • Rui Li
  • Kaimin Chen

Optical Character Recognition (OCR) plays a pivotal role in enhancing the operational efficiency of container ports. However, challenges such as angle limitations and the complexity of container fonts in traditional OCR systems lead to tilted text and text adhesion, thereby reducing the recognition rate. Recognizing containers at a high speed is equally crucial for port operations. In this study, we address these challenges by introducing an Enhanced OCR (EOCR) system, incorporating Line Segmentation Mask (LSM)-based detection and Scanline-based recognition. LSM tackles the issue of text adhesion caused by traditional segmentation, while recognition based on scan lines accelerates efficiency. Additionally, we propose the arbitrary angle quadrilateral fitting algorithm targeting sloping quad areas in images taken at a container terminal. Experimental results on a dataset of container images from the Shanghai Port demonstrate superior performance compared to existing algorithms, achieving a recognition accuracy rate of up to 98.7%. Furthermore, an ablation study confirms that our EOCR significantly enhances recognition accuracy while ensuring real-time performance.

JBHI Journal 2024 Journal Article

MASA-TCN: Multi-Anchor Space-Aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition

  • Yi Ding
  • Su Zhang
  • Chuangao Tang
  • Cuntai Guan

Emotion recognition from electroencephalogram (EEG) signals is a critical domain in biomedical research with applications ranging from mental disorder regulation to human-computer interaction. In this paper, we address two fundamental aspects of EEG emotion recognition: continuous regression of emotional states and discrete classification of emotions. While classification methods have garnered significant attention, regression methods remain relatively under-explored. To bridge this gap, we introduce MASA-TCN, a novel unified model that leverages the spatial learning capabilities of Temporal Convolutional Networks (TCNs) for EEG emotion regression and classification tasks. The key innovation lies in the introduction of a space-aware temporal layer, which empowers TCN to capture spatial relationships among EEG electrodes, enhancing its ability to discern nuanced emotional states. Additionally, we design a multi-anchor block with attentive fusion, enabling the model to adaptively learn dynamic temporal dependencies within the EEG signals. Experiments on two publicly available datasets show that MASA-TCN achieves higher results than the state-of-the-art methods for both EEG emotion regression and classification tasks.

ICML Conference 2024 Conference Paper

Predictive Dynamic Fusion

  • Bing Cao
  • Yinan Xia
  • Yi Ding
  • Changqing Zhang
  • Qinghua Hu

Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability. To address this issue, we propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning. We proceed to reveal the multimodal fusion from a generalization perspective and theoretically derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of generalization error. Accordingly, we further propose a relative calibration strategy to calibrate the predicted Co-Belief for potential uncertainty. Extensive experiments on multiple benchmarks confirm our superiority. Our code is available at https://github.com/Yinan-Xia/PDF.

NeurIPS Conference 2024 Conference Paper

Test-Time Dynamic Image Fusion

  • Bing Cao
  • Yinan Xia
  • Yi Ding
  • Changqing Zhang
  • Qinghua Hu

The inherent challenge of image fusion lies in capturing the correlation of multi-source images and comprehensively integrating effective information from different sources. Most existing techniques fail to perform dynamic image fusion while notably lacking theoretical guarantees, leading to potential deployment risks in this field. Is it possible to conduct dynamic image fusion with a clear theoretical justification? In this paper, we give our solution from a generalization perspective. We proceed to reveal the generalized form of image fusion and derive a new test-time dynamic image fusion paradigm. It provably reduces the upper bound of generalization error. Specifically, we decompose the fused image into multiple components corresponding to its source data. The decomposed components represent the effective information from the source data, thus the gap between them reflects the Relative Dominability (RD) of the uni-source data in constructing the fusion image. Theoretically, we prove that the key to reducing generalization error hinges on the negative correlation between the RD-based fusion weight and the uni-source reconstruction loss. Intuitively, RD dynamically highlights the dominant regions of each source and can be naturally converted to the corresponding fusion weight, achieving robust results. Extensive experiments and discussions with in-depth analysis on multiple benchmarks confirm our findings and superiority. Our code is available at https://github.com/Yinan-Xia/TTD.

EAAI Journal 2023 Journal Article

Self-adaptive physics-driven deep learning for seismic wave modeling in complex topography

  • Yi Ding
  • Su Chen
  • Xiaojun Li
  • Suyang Wang
  • Shaokai Luan
  • Hao Sun

Solving for the scattered wavefield is a key scientific problem in the field of seismology and earthquake engineering. Physics-informed neural networks (PINNs) developed in recent years have great potential in possibly increasing the flexibility and efficacy of seismic modeling and inversion. Inspired by self-adaptive physics-informed neural networks (SA-PINNs), we introduce a framework for modeling seismic waves in complex topography. The relevant theoretical model construction was performed using the one-dimensional (1D) wave equation as an example. Using SA-PINNs and combining them with sparse initial wavefield data formed by the spectral element method (SEM), we carry out a numerical simulation of two-dimensional (2D) SH wave propagation to realize typical cases such as infinite/semi-infinite domain and arc-shaped canyon/hill topography. For complex scattered wavefields, a sequential learning method with time-domain decomposition was introduced in SA-PINNs to improve the scalability and solution accuracy of the network. The accuracy and reliability of the proposed method to simulate wave propagation in complex topography were verified by comparing the displacement seismograms calculated by the SA-PINNs method with those calculated by the SEM. The results show that the SA-PINNs have the advantage of gridless and fine-grained simulation and can realize numerical simulation conditions, such as free surface and side-boundary wavefield transmission.

JBHI Journal 2022 Journal Article

MVFusFra: A Multi-View Dynamic Fusion Framework for Multimodal Brain Tumor Segmentation

  • Yi Ding
  • Wei Zheng
  • Ji Geng
  • Zhen Qin
  • Kim-Kwang Raymond Choo
  • Zhiguang Qin
  • Xiaolin Hou

Medical practitioners generally rely on multimodal brain images, for example based on the information from the axial, coronal, and sagittal views, to inform brain tumor diagnosis. Hence, to further utilize the 3D information embedded in such datasets, this paper proposes a multi-view dynamic fusion framework (hereafter, referred to as MVFusFra) to improve the performance of brain tumor segmentation. The proposed framework consists of three key building blocks. First, a multi-view deep neural network architecture, which represents multi learning networks for segmenting the brain tumor from different views and each deep neural network corresponds to multi-modal brain images from one single view. Second, the dynamic decision fusion method, which is mainly used to fuse segmentation results from multi-views into an integrated method. Then, two different fusion methods (i.e., voting and weighted averaging) are used to evaluate the fusing process. Third, the multi-view fusion loss (comprising segmentation loss, transition loss, and decision loss) is proposed to facilitate the training process of multi-view learning networks, so as to ensure consistency in appearance and space, for both fusing segmentation results and the training of the learning network. We evaluate the performance of MVFusFra on the BRATS 2015 and BRATS 2018 datasets. Findings from the evaluations suggest that fusion results from multi-views achieve better performance than segmentation results from the single view, which also implies the effectiveness of the proposed multi-view fusion loss. A comparative summary also shows that MVFusFra achieves better segmentation performance, in terms of efficiency, in comparison to other competing approaches.
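
The two fusion rules named in this abstract, voting and weighted averaging, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-view weights, and the 0.5 decision threshold are hypothetical, and each view's output is assumed to be a flattened list of per-pixel values.

```python
from typing import List, Sequence


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Fuse binary masks (one per view) by per-pixel majority vote."""
    n_views = len(masks)
    # A pixel is foreground when more than half of the views mark it.
    return [1 if sum(pixel) * 2 > n_views else 0 for pixel in zip(*masks)]


def weighted_average(
    probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[int]:
    """Fuse per-view foreground probabilities with fixed per-view weights."""
    total = sum(weights)
    fused = []
    for pixel in zip(*probs):
        score = sum(w * p for w, p in zip(weights, pixel)) / total
        fused.append(1 if score >= threshold else 0)
    return fused


# Three views (e.g. axial, coronal, sagittal), three pixels each.
voted = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
averaged = weighted_average(
    [[0.9, 0.2], [0.4, 0.6], [0.3, 0.1]], weights=[0.5, 0.3, 0.2]
)
```

Voting treats every view equally, whereas weighted averaging lets a more reliable view dominate; how the weights are chosen is left open here.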

AAAI Conference 2021 Short Paper

Improving Label Noise Robustness with Data Augmentation and Semi-Supervised Learning (Student Abstract)

  • Kento Nishi
  • Yi Ding
  • Alex Rich
  • Tobias Höllerer

Modern machine learning algorithms typically require large amounts of labeled training data to fit a reliable model. To minimize the cost of data collection, researchers often employ techniques such as crowdsourcing and web scraping. However, web data and human annotations are known to exhibit high margins of error, resulting in sizable amounts of incorrect labels. Poorly labeled training data can cause models to overfit to the noise distribution, crippling performance in real-world applications. In this work, we investigate the viability of using data augmentation in conjunction with semisupervised learning to improve the label noise robustness of image classification models. We conduct several experiments using noisy variants of the CIFAR-10 image classification dataset to benchmark our method against existing algorithms. Experimental results show that our augmentative SSL approach improves upon the state-of-the-art.

NeurIPS Conference 2020 Conference Paper

A polynomial-time algorithm for learning nonparametric causal graphs

  • Ming Gao
  • Yi Ding
  • Bryon Aragam

We establish finite-sample guarantees for a polynomial-time algorithm for learning a nonlinear, nonparametric directed acyclic graphical (DAG) model from data. The analysis is model-free and does not assume linearity, additivity, independent noise, or faithfulness. Instead, we impose a condition on the residual variances that is closely related to previous work on linear models with equal variances. Compared to an optimal algorithm with oracle knowledge of the variable ordering, the additional cost of the algorithm is linear in the dimension $d$ and the number of samples $n$. Finally, we compare the proposed algorithm to existing approaches in a simulation study.

AAAI Conference 2020 Short Paper

Exploring the Benefits of Depth Information in Object Pixel Masking (Student Abstract)

  • Anish Kachinthaya
  • Yi Ding
  • Tobias Hollerer

In this paper, we look at how depth data can benefit existing object masking methods applied in occluded scenes. Masking the pixel locations of objects within scenes helps computers get a spatial awareness of where objects are within images. The current state-of-the-art algorithm for masking objects in images is Mask R-CNN, which builds on the Faster R-CNN network to mask object pixels rather than just detecting their bounding boxes. This paper examines the weaknesses Mask R-CNN has in masking people when they are occluded in a frame. It then looks at how depth data gathered from an RGB-D sensor can be used. We provide a case study to show how simply applying thresholding methods on the depth information can aid in distinguishing occluded persons. The intention of our research is to examine how features from depth data can benefit object pixel masking methods in an explainable manner, especially in complex scenes with multiple objects.
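
As a rough illustration of depth thresholding, the sketch below splits one merged mask into a near part and a far part using a single depth cutoff. It is an assumption-laden toy, not the procedure from the paper: the function name, the depth units (metres), and the cutoff value are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def split_mask_by_depth(
    mask: Grid, depth: List[List[float]], threshold_m: float
) -> Tuple[Grid, Grid]:
    """Split a binary mask into pixels nearer and farther than threshold_m."""
    near = [[0] * len(row) for row in mask]
    far = [[0] * len(row) for row in mask]
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if not inside:
                continue  # pixel is outside the original mask
            if depth[r][c] < threshold_m:
                near[r][c] = 1
            else:
                far[r][c] = 1
    return near, far


# One merged mask covering two people standing at different distances.
mask = [[1, 1, 0], [1, 1, 1]]
depth = [[1.0, 1.1, 5.0], [1.2, 3.0, 3.1]]
near, far = split_mask_by_depth(mask, depth, threshold_m=2.0)
```

A fixed cutoff only works when the occluding and occluded subjects are well separated in depth; choosing it per scene is outside this sketch.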

NeurIPS Conference 2020 Conference Paper

Handling Missing Data with Graph Representation Learning

  • Jiaxuan You
  • Xiaobai Ma
  • Yi Ding
  • Mykel J. Kochenderfer
  • Jure Leskovec

Machine learning with missing data has been approached in many different ways, including feature imputation where missing feature values are estimated based on observed values and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label predictions often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a framework for feature imputation as well as label prediction. GRAPE tackles the missing data problem using graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks, compared with existing state-of-the-art methods.
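
The bipartite view described in this abstract can be sketched in a few lines: observations and features become two node types, and each observed value becomes an edge. The code below covers only that graph construction, under the assumption that missing entries are encoded as None; the node labels and function names are hypothetical, and no Graph Neural Network is included.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, int]
Edge = Tuple[Node, Node, float]


def build_bipartite_edges(data: List[List[Optional[float]]]) -> List[Edge]:
    """Return one edge per observed entry: (observation node, feature node, value)."""
    edges = []
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                edges.append((("obs", i), ("feat", j), value))
    return edges


def missing_pairs(data: List[List[Optional[float]]]) -> List[Tuple[int, int]]:
    """Return (observation, feature) index pairs with no edge, i.e. imputation targets."""
    return [
        (i, j)
        for i, row in enumerate(data)
        for j, value in enumerate(row)
        if value is None
    ]


table = [[1.0, None], [None, 2.5]]
edges = build_bipartite_edges(table)
targets = missing_pairs(table)
```

In this framing, imputing a missing value amounts to predicting the value of an absent edge, which matches the edge-level prediction task the abstract describes.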

NeurIPS Conference 2017 Conference Paper

Multiresolution Kernel Approximation for Gaussian Process Regression

  • Yi Ding
  • Risi Kondor
  • Jonathan Eskreis-Winkler

Gaussian process regression generally does not scale to beyond a few thousand data points without applying some sort of kernel approximation method. Most approximations focus on the high eigenvalue part of the spectrum of the kernel matrix, $K$, which leads to bad performance when the length scale of the kernel is small. In this paper we introduce Multiresolution Kernel Approximation (MKA), the first true broad bandwidth kernel approximation algorithm. Important points about MKA are that it is memory efficient, and it is a direct method, which means that it also makes it easy to approximate $K^{-1}$ and $\mathop{\textrm{det}}(K)$.

AAAI Conference 2015 Conference Paper

An Adaptive Gradient Method for Online AUC Maximization

  • Yi Ding
  • Peilin Zhao
  • Steven Hoi
  • Yew-Soon Ong

Learning for maximizing AUC performance is an important research problem in machine learning. Unlike traditional batch learning methods for maximizing AUC which often suffer from poor scalability, recent years have witnessed some emerging studies that attempt to maximize AUC by single-pass online learning approaches. Despite their encouraging results reported, the existing online AUC maximization algorithms often adopt simple stochastic gradient descent approaches, which fail to exploit the geometry knowledge of the data observed in the online learning process, and thus could suffer from relatively slow convergence. To overcome the limitation of the existing studies, in this paper, we propose a novel algorithm of Adaptive Online AUC Maximization (AdaOAM), by applying an adaptive gradient method for exploiting the knowledge of historical gradients to perform more informative online learning. The new adaptive updating strategy by AdaOAM is less sensitive to parameter settings due to its natural effect of tuning the learning rate. In addition, the time complexity of the new algorithm remains the same as the previous non-adaptive algorithms. To demonstrate the effectiveness of the proposed algorithm, we analyze its theoretical bound, and further evaluate its empirical performance on both public benchmark datasets and anomaly detection datasets. The encouraging empirical results clearly show the effectiveness and efficiency of the proposed algorithm.
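
To make the idea of an adaptive gradient update concrete, here is a generic AdaGrad-style step in which each coordinate's step size shrinks with its accumulated squared gradients. This is a textbook sketch and not the AdaOAM algorithm itself: the function name, learning rate, and epsilon are assumptions, and the AUC-specific pairwise loss is omitted.

```python
import math
from typing import List, Tuple


def adagrad_step(
    w: List[float],
    grad: List[float],
    accum: List[float],
    lr: float = 0.1,    # hypothetical base learning rate
    eps: float = 1e-8,  # avoids division by zero
) -> Tuple[List[float], List[float]]:
    """Apply one AdaGrad-style update and return (new weights, new accumulator)."""
    # Accumulate squared gradients per coordinate.
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    # Scale each coordinate's step by the inverse root of its history.
    new_w = [
        wi - lr * g / (math.sqrt(a) + eps)
        for wi, g, a in zip(w, grad, new_accum)
    ]
    return new_w, new_accum


w, accum = adagrad_step([0.0, 0.0], grad=[1.0, -2.0], accum=[0.0, 0.0])
```

Because the effective step size is derived from the gradient history, coordinates with large past gradients move more cautiously, which is the sense in which such methods reduce sensitivity to the initial learning rate.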

AAAI Conference 2014 Conference Paper

Learning Relative Similarity by Stochastic Dual Coordinate Ascent

  • Pengcheng Wu
  • Yi Ding
  • Peilin Zhao
  • Chunyan Miao
  • Steven Hoi

Learning relative similarity from pairwise instances is an important problem in machine learning and has a wide range of applications. Despite being studied for years, some existing methods solved by Stochastic Gradient Descent (SGD) techniques generally suffer from slow convergence. In this paper, we investigate the application of Stochastic Dual Coordinate Ascent (SDCA) technique to tackle the optimization task of relative similarity learning by extending from vector to matrix parameters. Theoretically, we prove the optimal linear convergence rate for the proposed SDCA algorithm, beating the well-known sublinear convergence rate by the previous best metric learning algorithms. Empirically, we conduct extensive experiments on both standard and large-scale data sets to validate the effectiveness of the proposed algorithm for retrieval tasks.