Arrow Research

Author name cluster

Songcan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers (42)

AAAI Conference 2026 Conference Paper

Beyond Observations: Reconstruction Error-Guided Irregularly Sampled Time Series Representation Learning

  • Jiexi Liu
  • Meng Cao
  • Songcan Chen

Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling primarily rely on observed values to impute unobserved ones or infer latent dynamics. However, these methods overlook a critical source of learning signal: the reconstruction error inherently produced during model training. Such error implicitly reflects how well a model captures the underlying data structure and can serve as an informative proxy for unobserved values. To exploit this insight, we propose iTimER, a simple yet effective self-supervised pre-training framework for ISTS representation learning. iTimER models the distribution of reconstruction errors over observed values and generates pseudo-observations for unobserved timestamps through a mixup strategy between sampled errors and the last available observations. This transforms unobserved timestamps into noise-aware training targets, enabling meaningful reconstruction signals. A Wasserstein metric aligns reconstruction error distributions between observed and pseudo-observed regions, while a contrastive learning objective enhances the discriminability of learned representations. Extensive experiments on classification, interpolation, and forecasting tasks demonstrate that iTimER consistently outperforms state-of-the-art methods under the ISTS setting.
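
A minimal sketch of the error-sampling mixup idea described above, under one plausible reading (NumPy; the function name, the fixed mixing weight, and the error-sampling scheme are illustrative assumptions, not iTimER's exact formulation):

import numpy as np

def pseudo_observe(x, mask, recon, lam=0.5, seed=0):
    # x, mask, recon: (T, C) arrays; mask is 1 where observed, 0 where unobserved.
    rng = np.random.default_rng(seed)
    errors = (x - recon)[mask == 1]        # empirical reconstruction-error pool
    out = x.copy()
    for c in range(x.shape[1]):
        last = 0.0                         # last available observation per channel
        for t in range(x.shape[0]):
            if mask[t, c] == 1:
                last = x[t, c]
            else:
                e = rng.choice(errors)     # sample an error from the pool
                # mixup between the last observation and its error-perturbed copy
                out[t, c] = lam * last + (1 - lam) * (last + e)
    return out                             # noise-aware targets for unobserved slots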

AAAI Conference 2026 Conference Paper

HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition

  • Jinpeng Zheng
  • Shao-Yuan Li
  • Gan Xu
  • Wenhai Wan
  • Zijian Tao
  • Songcan Chen
  • Kangkan Wang

Long-tail recognition remains challenging for pre-trained foundation models like CLIP, which often suffer from performance degradation under imbalanced data. This stems not only from overfitting/underfitting issues during fine-tuning but, more fundamentally, from the inherent bias inherited from the long-tail distribution of their massive pre-training datasets. To address this, we propose HGLTR (Hierarchy-Guided Long-Tail Recognition), a novel framework that calibrates pre-trained models by injecting objective class hierarchy knowledge. We argue that the semantic proximity defined by a hierarchy provides a robust, data-independent prior to counteract model bias. Our method is specifically designed for vision-language models' dual-modality architecture. At the feature level, we align image embeddings with a hierarchy-guided text similarity structure. At the classifier level, we employ a distillation loss to regularize predictions using soft labels derived from the hierarchy. This dual-level injection effectively transfers knowledge from head to tail classes. Experiments on ImageNet-LT, Places-LT, and iNaturalist 2018 demonstrate that HGLTR achieves state-of-the-art performance, particularly in tail-class accuracy, highlighting the importance of leveraging structural priors to calibrate foundation models for real-world data.

AAAI Conference 2026 Conference Paper

The Finer the Better: Towards Granular-aware Open-set Domain Generalization

  • Yunyun Wang
  • Zheng Duan
  • Xinyue Liao
  • Ke-Jia CHEN
  • Songcan Chen

Open-Set Domain Generalization (OSDG) aims to generalize to unseen target domains containing open classes, and the core challenge lies in identifying unknown samples never encountered during training. Recently, CLIP has exhibited impressive performance in OSDG, yet it still faces a dilemma between the structural risk of known classes and the open-space risk from unknown classes, and easily suffers from over-confidence, especially when distinguishing known-like unknown samples. To this end, we propose a Semantic-enhanced CLIP (SeeCLIP) framework that leverages fine-grained semantics to boost unknown detection, so as to accommodate both risks and enable precise discrimination among categories. In SeeCLIP, we propose a semantic-aware prompt enhancement module to extract fine-grained key semantic features and establish a fine-grained vision-language alignment. Duplex contrastive learning is proposed for prompt learning, jointly optimizing duplex losses such that the unknown prompt stays similar to known prompts yet exhibits key semantic differences. We also design a semantic-guided diffusion module to enable nuanced generation. By injecting perturbed key semantics into a diffusion model as control conditions, it generates the closest unknowns, or pseudo-open samples with high similarity yet low belongingness to known classes. We formulate a generalization bound for OSDG and show that SeeCLIP can achieve a lower generalization risk. Extensive experiments on benchmark datasets validate the superiority of SeeCLIP: it outperforms the SOTA methods by nearly 3% in accuracy and 5% in H-index.

ECAI Conference 2025 Conference Paper

Beyond Myopia: Enhancing Few-Shot Open-Set Recognition via Hyperopia Distillation

  • Chuanxing Geng
  • Xiangshu Ding
  • Songcan Chen
  • Pong C. Yuen

Existing few-shot open-set recognition (FSOR) methods primarily employ the meta-learning mechanism, in which each meta-task randomly selects a small subset of base classes as knowns and samples an equal number of classes from the remaining base classes as pseudo-unknowns. While effective, these methods potentially face two critical weaknesses: i) Class-identity overlapping: the same classes are designated as knowns in one meta-task but may be considered pseudo-unknowns in another, leading to conflicts across meta-tasks and consequently degrading the model's performance; ii) Narrow pseudo-unknown utilization: each meta-task selects only a limited number of base classes as pseudo-unknowns rather than providing a broader view of the available base classes. Fundamentally, these issues arise from the myopia of the meta-learners in existing methods, as they lack a broad view of all available base classes. To this end, we propose a novel Hyperopia Distillation Enhancement framework (HDE) for FSOR, which encourages the meta-learner to observe a broader set of pseudo-unknown classes without worrying too much about the class-imbalance issue, while effectively mitigating the class-identity overlapping problem. The key to HDE lies in its dual hyperopia distillation mechanism, which enhances the meta-learner by hyperopically distilling the available full-view inter-class relationships and more pseudo-unknown knowledge. Extensive experiments verify the effectiveness of our HDE.

ICML Conference 2025 Conference Paper

Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning

  • Xinrui Wang
  • Shao-Yuan Li
  • Jiaqiang Zhang
  • Songcan Chen

Multi-Label Online Continual Learning (MOCL) requires models to learn continuously from endless multi-label data streams, facing complex challenges including persistent catastrophic forgetting, potential missing labels, and uncontrollable imbalanced class distributions. While existing MOCL methods attempt to address these challenges through various techniques, they all overlook label-specific region identification and feature learning - a fundamental solution rooted in multi-label learning but challenging to achieve in the online setting with incremental and partial supervision. To this end, we first leverage the inherent structural information of input data to evaluate and verify the innate localization capability of different pre-trained models. Then, we propose CUTER (CUT-out-and-Experience-Replay), a simple yet versatile strategy that provides fine-grained supervision signals by further identifying, strengthening and cutting out label-specific regions for efficient experience replay. It not only enables models to simultaneously address the catastrophic forgetting, missing label, and class imbalance challenges, but also serves as an orthogonal solution that seamlessly integrates with existing approaches. Extensive experiments on multiple multi-label image benchmarks demonstrate the superiority of our proposed method. The code is available at https://github.com/wxr99/Cut-Replay
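
The cut-out step admits a toy illustration. The snippet below assumes per-label class activation maps as a stand-in for the localization capability the paper verifies in pre-trained models; the interface and threshold are hypothetical:

import numpy as np

def cut_label_regions(image, cams, labels, thr=0.6):
    # image: (H, W) single-channel for brevity; cams: (L, H, W) activation maps;
    # labels: indices of labels present in the image.
    crops = []
    for y in labels:
        m = cams[y] >= thr * cams[y].max()      # label-specific high-activation mask
        rows, cols = np.where(m)
        r0, r1 = rows.min(), rows.max() + 1     # tight bounding box around the mask
        c0, c1 = cols.min(), cols.max() + 1
        crops.append((y, image[r0:r1, c0:c1]))  # label-specific crop for replay
    return crops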

IJCAI Conference 2025 Conference Paper

DM-POSA: Enhancing Open-World Test-Time Adaptation with Dual-Mode Matching and Prompt-Based Open Set Adaptation

  • Shiji Zhao
  • Shao-Yuan Li
  • Chuanxing Geng
  • Sheng-Jun Huang
  • Songcan Chen

The need to generalize pre-trained deep learning models to unknown test-time data distributions has spurred research into test-time adaptation (TTA). Existing studies have mainly focused on closed-set TTA with only covariate shifts, while largely overlooking open-set TTA that involves semantic shifts, i.e., unknown open-set classes. However, addressing adaptation to unknown classes is crucial for open-world safety-critical applications such as autonomous driving. In this paper, we emphasize that accurate identification of open-set samples is rather challenging in TTA. Semantic shift and covariate shift are entangled and mutually confuse the network's discriminative capability, and this co-interference is further exacerbated by the single-pass data nature and low-latency requirements. With this understanding, we propose Dual-mode Matching and Prompt-based Open Set Adaptation (DM-POSA) for open-set TTA to enhance discriminative feature learning and unknown-class discrimination with minimal time cost. DM-POSA identifies open-set samples via dual-mode matching strategies, including model-parameter-based and feature-space-based matching. It also optimizes the model with a random pairing discrepancy loss, enhancing the distributional difference between open-set and closed-set samples and thus improving the model's ability to recognize unknown categories. Extensive experiments show the superiority of DM-POSA over state-of-the-art baselines on both closed-set class adaptation and open-set class detection.

AAAI Conference 2025 Conference Paper

Expand Horizon: Graph Out-of-Distribution Generalization via Multi-Level Environment Inference

  • Jiaqiang Zhang
  • Songcan Chen

Graph neural networks (GNNs) are widely used for node classification tasks, but when encountering distribution shifts due to environmental change in real-world scenarios, they tend to learn unstable correlations between features and labels. To overcome this dilemma, a powerful class of approaches views the environment as the root cause of those unstable correlations, so their key focus is to infer the environment involved, enabling the model to avoid capturing environment-sensitive correlations. However, their inferences rely solely on single-level information from one low-hop ego-graph, neglecting both global information and multi-granularity information in local ego-graphs with different hops. Although applying deeper GNNs on high-hop ego-graphs could capture global information, it brings the side effect of over-smoothed node representations. To tackle these issues, we propose a novel Multi-Level Environment Inference model named MLEI, which effectively broadens the horizon of training GNNs under node-level distribution shifts. Specifically, MLEI first leverages a linear graph transformer to go beyond the scope of the ego-graph, efficiently enabling high-level global environment inference. This global environment is in turn used as an overview to assist layer-by-layer environment inference on local multi-hop ego-graphs. Finally, we combine the environments from the global and local views and utilize the designed objective function to capture stable predictive patterns. Extensive experiments on real-world datasets demonstrate that our model achieves satisfactory performance compared with state-of-the-art methods under various distribution shifts.

IJCAI Conference 2025 Conference Paper

LoD: Loss-difference OOD Detection by Intentionally Label-Noisifying Unlabeled Wild Data

  • Chuanxing Geng
  • Qifei Li
  • Xinrui Wang
  • Dong Liang
  • Songcan Chen
  • Pong C. Yuen

Using unlabeled wild data containing both in-distribution (ID) and out-of-distribution (OOD) data to improve the safety and reliability of models has recently received increasing attention. Existing methods either design customized losses for labeled ID and unlabeled wild data and then perform joint optimization, or first filter out OOD data from the latter and then learn an OOD detector. While achieving varying degrees of success, two potential issues remain: (i) labeled ID data typically dominates the learning of models, inevitably making models tend to fit OOD data as ID; (ii) the selection of thresholds for identifying OOD data in unlabeled wild data usually faces a dilemma due to the unavailability of pure OOD samples. To address these issues, we propose a novel loss-difference OOD detection framework (LoD) that intentionally label-noisifies the unlabeled wild data. Such operations not only enable labeled ID data and the OOD data in unlabeled wild data to jointly dominate the models' learning, but also ensure the distinguishability of the losses between ID and OOD samples in unlabeled wild data, allowing a classic clustering technique (e.g., K-means) to filter these OOD samples without requiring thresholds. We also provide a theoretical foundation for LoD's viability, and extensive experiments verify its superiority.
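
The threshold-free filtering step can be sketched directly from the abstract: cluster per-sample losses on the wild set and flag one cluster as OOD. Which cluster is OOD is chosen here by the larger mean loss, purely for illustration:

import numpy as np
from sklearn.cluster import KMeans

def split_wild_by_loss(losses):
    # Cluster 1-D per-sample losses into two groups; no threshold required.
    losses = np.asarray(losses, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(losses)
    ood_cluster = int(km.cluster_centers_.argmax())   # higher-mean-loss cluster
    return km.labels_ == ood_cluster

# toy usage: two well-separated loss groups
is_ood = split_wild_by_loss([0.1, 0.2, 0.15, 1.4, 1.7, 1.5])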

AAAI Conference 2025 Conference Paper

MLC-NC: Long-Tailed Multi-Label Image Classification Through the Lens of Neural Collapse

  • Zijian Tao
  • Shao-Yuan Li
  • Wenhai Wan
  • Jinpeng Zheng
  • Jia-Yao Chen
  • Yuchen Li
  • Sheng-Jun Huang
  • Songcan Chen

Long-tailed (LT) data distributions are common in multi-label image classification (MLC) and can significantly impact the performance of classification models. One reason is the challenge of learning unbiased instance representations (i.e., features) on imbalanced datasets. Additionally, the co-occurrence of head/tail classes within the same instance, along with complex label dependencies, introduces further challenges. In this work, we delve into this problem through the lens of neural collapse (NC). NC refers to a phenomenon where the last-layer features and classifier of a deep neural network exhibit a simplex Equiangular Tight Frame (ETF) structure during the terminal phase of training. This structure creates an optimal linearly separable state. However, the phenomenon typically occurs on balanced datasets and rarely on imbalanced ones. To induce NC properties under long-tailed multi-label classification (LT-MLC) conditions, we propose an approach named MLC-NC, which aims to learn high-quality data representations and improve the model's generalization ability. Specifically, MLC-NC accounts for the fact that different labels correspond to different feature regions within an image, and extracts class-wise features from each instance through a cross-attention mechanism. To guide the features toward the ETF structure, we introduce visual-semantic feature alignment with a fixed ETF-structured label embedding, which helps to learn evenly distributed class centers. To reduce within-class feature variation, we introduce collapse calibration within a lower-dimensional feature space. To mitigate classification bias, we concatenate features and feed them into a binarized fixed ETF classifier. As an orthogonal approach to existing methods, MLC-NC can be seamlessly integrated into various frameworks. Extensive experiments on widely used benchmarks demonstrate the effectiveness of our method.
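
The simplex ETF structure invoked here has a standard closed form, M = sqrt(K/(K-1)) U (I - 11^T/K) with U a column-orthonormal d x K matrix; any two columns then have cosine similarity exactly -1/(K-1). A small NumPy sketch of this general construction (not MLC-NC's full pipeline):

import numpy as np

def simplex_etf(num_classes, dim, seed=0):
    # Returns a (dim, num_classes) matrix whose unit-norm columns form a simplex ETF.
    K, d = num_classes, dim
    assert d >= K, "need feature dim >= number of classes"
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))      # U^T U = I_K
    return np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(num_classes=5, dim=16)
cos = M.T @ M            # columns are unit norm; off-diagonals equal -1/(K-1) = -0.25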

IJCAI Conference 2025 Conference Paper

ProMEA: Prompt-driven Expansion and Alignment for Single Domain Generalization

  • Yunyun Wang
  • Yi Guo
  • Xiaodong Liu
  • Songcan Chen

In single Domain Generalization (single-DG), data scarcity in the single source domain hampers the learning of invariant features, leading to overfitting on the source domain and poor generalization to unseen target domains. Existing single-DG methods primarily augment the source domain by adversarial generation. However, two key challenges remain. i) With simple feature perturbation to confuse the classifier, they may generate unnatural samples with semantic ambiguity or distortion. ii) It is still difficult to cover sufficient shift in a real domain by generating samples indistinguishable from the source data, so the learning model cannot escape overfitting to the single source domain. To this end, we turn to augmenting the domain prompt, considering that text prompt perturbations are easier to generate and generalize. The source domain is then expanded under the guidance of augmented text prompts, which are learnable with both semantic consistency and style diversity. Specifically, we propose a ProMpt-driven Expansion and Alignment (ProMEA) method for single-DG, in which a Domain Prompt Expansion module is first developed to expand the single source domain with frequency features of augmented text prompts, since the amplitude spectrum predominantly harbors the domain style information. With source prompts, a Domain Prompt Alignment module is further designed at inference to adapt target samples to the expanded source domains, in order to reduce the domain discrepancy. Finally, empirical results on single-DG benchmarks demonstrate the superiority of our proposal.
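
The premise that the amplitude spectrum carries domain style underlies Fourier-based augmentation in general. A generic sketch of amplitude mixing is given below; note the paper's module derives the style carrier from augmented text prompts rather than from a second image:

import numpy as np

def amplitude_mix(img, style, alpha=0.5):
    # Interpolate amplitude spectra (style) while keeping the phase (content).
    f_img = np.fft.fft2(img, axes=(0, 1))
    f_sty = np.fft.fft2(style, axes=(0, 1))
    amp = (1 - alpha) * np.abs(f_img) + alpha * np.abs(f_sty)
    mixed = amp * np.exp(1j * np.angle(f_img))     # original phase, mixed amplitude
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))

rng = np.random.default_rng(0)
aug = amplitude_mix(rng.random((32, 32)), rng.random((32, 32)), alpha=0.3)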

AAAI Conference 2025 Conference Paper

TimeCHEAT: A Channel Harmony Strategy for Irregularly Sampled Multivariate Time Series Analysis

  • Jiexi Liu
  • Meng Cao
  • Songcan Chen

Irregularly sampled multivariate time series (ISMTS) are prevalent in reality. Due to their non-uniform intervals between successive observations and varying sampling rates among series, the channel-independent (CI) strategy, which recent studies have shown to be preferable for complete multivariate time series forecasting, fails. This failure can be further attributed to sampling sparsity, which provides insufficient information for effective CI learning, thereby reducing its capacity. When we resort to the channel-dependent (CD) strategy, even higher capacity cannot mitigate the potential loss of diversity from learning similar embedding patterns across different channels. We find that existing work considers CI and CD strategies to be mutually exclusive, primarily because they apply these strategies at the global level. However, we hold the view that channel strategies do not necessarily have to be used globally. Instead, by appropriately applying them locally and globally, we can create an opportunity to take full advantage of both strategies. This leads us to introduce the Channel Harmony ISMTS Transformer (TimeCHEAT), which utilizes the CD strategy locally and the CI strategy globally. Specifically, we segment the ISMTS into sub-series-level patches. Locally, the CD strategy aggregates information within each patch for time embedding learning, maximizing the use of relevant observations while reducing long-range irrelevant interference. Here, we enhance generality by transforming embedding learning into an edge weight prediction task using bipartite graphs, eliminating the need for special prior knowledge. Globally, the CI strategy is applied across patches, allowing the Transformer to learn individualized attention patterns for each channel. Experimental results indicate that the proposed TimeCHEAT demonstrates competitive state-of-the-art performance across three mainstream tasks: classification, forecasting, and interpolation.

AAAI Conference 2025 Conference Paper

Unlocking Better Closed-Set Alignment Based on Neural Collapse for Open-Set Recognition

  • Chaohua Li
  • Enhao Zhang
  • Chuanxing Geng
  • Songcan Chen

In the recent Open-set Recognition (OSR) community, a prevailing belief is that enhancing the discriminative boundaries of closed-set classes can improve the robustness of Deep Neural Networks (DNNs) against open data during testing. Typical studies validate this implicitly with empirical evidence, without a formalized understanding of how DNNs help closed-set features obtain more discriminative boundaries. For this, we provide an answer from the Neural Collapse (NC) perspective: DNNs align the closed-set classes with a Simplex Equiangular Tight Frame (ETF) structure that has geometric and mathematical interpretability. Regrettably, although NC naturally occurs in DNNs, we discover that typical studies cannot guarantee that the learned features strictly align with the ETF. Thus, we introduce a novel concept, the Fixed ETF Template (FiT), which holds an ideal structure associated with the closed-set classes. To force class means and classifier vectors to align with FiT, we further design a Dual ETF (DEF) loss involving two components. Specifically, the F-DEF loss is designed to align class means strictly with FiT, yielding optimal inter-class separability. Meanwhile, we extend a dual form to classifier vectors, termed the C-DEF loss, which guides class means and classifier vectors to satisfy self-duality. Our theoretical analysis proves the validity of the proposed approach, and extensive experiments demonstrate that DEF achieves comparable or superior results with reduced computational resources on standard OSR benchmarks.

AAAI Conference 2024 Conference Paper

All Beings Are Equal in Open Set Recognition

  • Chaohua Li
  • Enhao Zhang
  • Chuanxing Geng
  • Songcan Chen

In open-set recognition (OSR), a promising strategy is exploiting pseudo-unknown data outside the given K known classes as an additional K+1-th class to explicitly model the potential open space. However, treating unknown classes without distinction puts them on an unequal footing relative to known classes, due to the category-agnostic and scale-agnostic nature of the unknowns. This inevitably not only disrupts the inherent distributions of unknown classes but also incurs both class-wise and instance-wise imbalances between known and unknown classes. Ideally, the OSR problem should model the whole class space as K+∞, but enumerating all unknowns is impractical. Since the core of OSR is to effectively model the boundaries of known classes, focusing only on the unknowns near the boundaries of the targeted known classes seems sufficient. Thus, as a compromise, we convert the open classes from infinite to K with a novel concept, Target-Aware Universum (TAU), and propose a simple yet effective framework, Dual Contrastive Learning with Target-Aware Universum (DCTAU). In detail, guided by the targeted known classes, TAU automatically expands the unknown classes from the previous 1 to K, effectively alleviating the distribution disruption and the imbalance issues mentioned above. Then, a novel Dual Contrastive (DC) loss is designed, where all instances, whether known or TAU, are considered as positives to contrast with their respective negatives. Experimental results indicate that DCTAU sets a new state of the art.

IJCAI Conference 2024 Conference Paper

Dynamic against Dynamic: An Open-Set Self-Learning Framework

  • Haifeng Yang
  • Chuanxing Geng
  • Pong C. Yuen
  • Songcan Chen

In open-set recognition, existing methods generally learn statically fixed decision boundaries to reject unknown classes. Though they have achieved promising results, such decision boundaries are evidently insufficient for universal unknown classes in dynamic and open scenarios, as these can potentially appear at any position in the feature space. Moreover, these methods simply reject unknown class samples during testing without any effective utilization of them. In fact, such samples can constitute the true instantiated representation of the unknown classes and further enhance the model's performance. To address these issues, this paper proposes a novel dynamic-against-dynamic idea, i.e., a dynamic method against a dynamically changing open-set world, for which an open-set self-learning (OSSL) framework is correspondingly developed. OSSL starts with a good closed-set classifier trained on known classes and utilizes available test samples for model adaptation during testing, thus gaining adaptability to changing data distributions. In particular, a novel self-matching module is designed for OSSL, which achieves adaptation by automatically identifying known class samples while rejecting unknown class samples, the latter being further utilized to enhance the discriminability of the model as the instantiated representation of unknown classes. Our method establishes new performance milestones in almost all standard and cross-data benchmarks.

NeurIPS Conference 2024 Conference Paper

Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning

  • Xinrui Wang
  • Chuanxing Geng
  • Wenhai Wan
  • Shao-Yuan Li
  • Songcan Chen

Online continual learning (OCL) requires models to learn from constant, endless streams of data. While significant efforts have been made in this field, most were focused on mitigating the catastrophic forgetting issue to achieve better classification ability, at the cost of a much heavier training workload. They overlooked that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that model throughput, defined as the maximum number of training samples that a model can process within a unit of time, is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting: (i) model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput; (ii) model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and an excessively sparse classifier, resulting in a gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features. Extensive experiments demonstrate the substantial improvements of our framework in performance, throughput and real-world practicality.

AAAI Conference 2024 Conference Paper

Mixup-Induced Domain Extrapolation for Domain Generalization

  • Meng Cao
  • Songcan Chen

Domain generalization aims to learn a well-performing classifier on multiple source domains for unseen target domains under domain shift. Domain-invariant representation (DIR) is an intuitive approach and has attracted great interest. In practice, since the targets are variable and agnostic, a few sources are not sufficient to reflect the entire domain population, leading to biased DIR. Derived from the PAC-Bayes framework, we provide a novel generalization bound involving the number of domains sampled from the environment (N) and the radius of the Wasserstein ball centred on the target (r), which have rarely been considered before. From it, we obtain two natural and significant findings: as N increases, 1) the gap between the source and target sampling environments can be gradually mitigated; 2) the target can be better approximated within the Wasserstein ball. These findings prompt us to collect adequate domains against domain shift. For convenience, we design a novel yet simple Extrapolation Domain strategy induced by the Mixup scheme, namely EDM. Through a reverse Mixup scheme to generate extrapolated domains, combined with the interpolated domains, we expand the interpolation space spanned by the sources, providing more abundant domains that increase sampling intersections and shorten r. Moreover, EDM is easy to implement and plug-and-play. In experiments, EDM has been plugged into several methods in both closed- and open-set settings, achieving up to 5.73% improvement.
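
A minimal sketch of domain extrapolation via Mixup, assuming "reverse Mixup" means letting the mixing weight leave [0, 1] so the generated sample falls outside the convex hull of the sources (the paper's exact scheme may differ):

import numpy as np

def mixup_extrapolate(x_a, x_b, lam):
    # lam in (0, 1): interpolated domain; lam > 1 or lam < 0: extrapolated domain.
    return lam * x_a + (1 - lam) * x_b

rng = np.random.default_rng(0)
a, b = rng.random(8), rng.random(8)
interp = mixup_extrapolate(a, b, lam=0.4)   # inside the span of the two sources
extrap = mixup_extrapolate(a, b, lam=1.5)   # beyond them: a new pseudo-domain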

IJCAI Conference 2024 Conference Paper

No Regularization Is Needed: Efficient and Effective Incomplete Label Distribution Learning

  • Xiang Li
  • Songcan Chen

In reality, it is laborious to obtain complete label degrees, giving birth to Incomplete Label Distribution Learning (InLDL), where some degrees are missing. Existing InLDL methods often assume that degrees are missing uniformly at random. However, this is often not the case in practice, which gives rise to the first issue. Besides, they often adopt explicit regularization to compensate for the incompleteness, leading to burdensome parameter tuning and extra computation, which causes the second issue. To address the first issue, we adopt a more practical setting, i.e., small degrees are more prone to be missing, since large degrees are likely to catch more attention. To tackle the second issue, we argue that the label distribution itself already contains abundant knowledge, such as label correlations and ranking order, and thus may provide enough prior for learning. It is precisely because existing methods overlook this prior that they are forced to adopt explicit regularization. By directly utilizing the label degree prior, we design a properly weighted objective function, removing the need for explicit regularization. Moreover, we provide rigorous theoretical analysis, revealing in principle that the weighting plays an implicit regularization role. To sum up, our method has four advantages: it is 1) model-selection free; 2) equipped with a closed-form solution (per sub-problem) and easy to implement (a few lines of code); 3) of linear computational complexity in the number of samples, thus scalable to large datasets; 4) competitive with the state of the art in both random and non-random missing scenarios.
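
The "weighting instead of explicit regularization" idea admits a closed-form sketch under a linear model; the weights and model below are illustrative assumptions, not the paper's actual objective:

import numpy as np

def weighted_lsq_per_label(X, D, V):
    # X: (n, d) features; D: (n, m) observed label degrees; V: (n, m) weights,
    # e.g. zero on missing entries and larger on degrees deemed more reliable.
    # Each column solves min_w sum_i V[i, j] * (x_i @ w - D[i, j])^2 in closed form.
    d, m = X.shape[1], D.shape[1]
    W = np.zeros((d, m))
    for j in range(m):
        Vj = np.diag(V[:, j])
        A = X.T @ Vj @ X
        b = X.T @ Vj @ D[:, j]
        W[:, j] = np.linalg.lstsq(A, b, rcond=None)[0]   # no explicit regularizer
    return W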

AAAI Conference 2024 Conference Paper

TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning

  • Jiexi Liu
  • Songcan Chen

Learning universal time series representations applicable to various types of downstream tasks is challenging but valuable in real applications. Recently, researchers have attempted to leverage the success of self-supervised contrastive learning (SSCL) in Computer Vision (CV) and Natural Language Processing (NLP) to tackle time series representation. Nevertheless, due to their special temporal characteristics, relying solely on empirical guidance from other domains may be ineffective for time series and difficult to adapt to multiple downstream tasks. To this end, we review three parts involved in SSCL: 1) designing augmentation methods for positive pairs, 2) constructing (hard) negative pairs, and 3) designing the SSCL loss. For 1) and 2), we find that unsuitable positive and negative pair construction may introduce inappropriate inductive biases, which neither preserve temporal properties nor provide sufficient discriminative features. For 3), exploring only segment- or instance-level semantic information is not enough for learning universal representations. To remedy these issues, we propose a novel self-supervised framework named TimesURL. Specifically, we first introduce a frequency-temporal-based augmentation to keep the temporal property unchanged. Then, we construct double Universums as a special kind of hard negative to guide better contrastive learning. Additionally, we introduce time reconstruction as a joint optimization objective with contrastive learning to capture both segment-level and instance-level information. As a result, TimesURL can learn high-quality universal representations and achieve state-of-the-art performance on 6 different downstream tasks, including short- and long-term forecasting, imputation, classification, anomaly detection and transfer learning.

AAAI Conference 2024 Conference Paper

Unlocking the Power of Open Set: A New Perspective for Open-Set Noisy Label Learning

  • Wenhai Wan
  • Xinrui Wang
  • Ming-Kun Xie
  • Shao-Yuan Li
  • Sheng-Jun Huang
  • Songcan Chen

Learning from noisy data has attracted much attention, where most methods focus on closed-set label noise. However, a more common scenario in the real world is the presence of both open-set and closed-set noise. Existing methods typically identify and handle these two types of label noise separately by designing a specific strategy for each type. However, in many real-world scenarios it would be challenging to identify open-set examples, especially when the dataset has been severely corrupted. Unlike previous works, we explore how models behave when faced with open-set examples, and find that some open-set examples gradually become integrated into certain known classes, which is beneficial for the separation among known classes. Motivated by this phenomenon, we propose a novel two-step contrastive learning method, CECL (Class Expansion Contrastive Learning), which aims to deal with both types of label noise by exploiting the useful information in open-set examples. Specifically, we incorporate some open-set examples into closed-set classes to enhance performance while treating others as delimiters to improve representation ability. Extensive experiments on synthetic and real-world datasets with diverse label noise demonstrate the effectiveness of CECL.

IJCAI Conference 2023 Conference Paper

ALL-E: Aesthetics-guided Low-light Image Enhancement

  • Ling Li
  • Dong Liang
  • Yuanhang Gao
  • Sheng-Jun Huang
  • Songcan Chen

Evaluating the performance of low-light image enhancement (LLE) is highly subjective, making the integration of human preferences into image enhancement a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic criteria for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and frames training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project pages/ALLE.html
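
The recursive per-pixel adjustment is the mechanism shared by curve-estimation enhancement methods; a toy sketch of that generic mechanism follows (not ALL-E's learned, aesthetics-rewarded policy):

import numpy as np

def apply_curves(img, alphas):
    # img: values in [0, 1]; alphas: per-step, per-pixel maps in [-1, 1].
    # Each step applies the monotone curve x <- x + a * x * (1 - x), which keeps
    # values in [0, 1]; the sequence of alphas plays the role of recursive actions.
    x = img.copy()
    for a in alphas:
        x = x + a * x * (1.0 - x)
    return x

rng = np.random.default_rng(0)
dark = rng.random((8, 8)) * 0.2                    # toy under-exposed image
actions = [np.full((8, 8), 0.8) for _ in range(4)] # four brightening steps
bright = apply_curves(dark, actions)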

NeurIPS Conference 2023 Conference Paper

Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends

  • Wang Xinrui
  • Wenhai Wan
  • Chuanxing Geng
  • Shao-Yuan Li
  • Songcan Chen

Learning binary classifiers from positive and unlabeled data (PUL) is vital in many real-world applications, especially when verifying negative examples is difficult. Despite the impressive empirical performance of recent PUL methods, challenges like accumulated errors and increased estimation bias persist due to the absence of negative labels. In this paper, we unveil an intriguing yet long-overlooked observation in PUL: resampling the positive data in each training iteration to ensure a balanced distribution between positive and unlabeled examples results in strong early-stage performance. Furthermore, predictive trends for positive and negative classes display distinctly different patterns. Specifically, the scores (output probability) of unlabeled negative examples consistently decrease, while those of unlabeled positive examples show largely chaotic trends. Instead of focusing on classification within individual time frames, we innovatively adopt a holistic approach, interpreting the scores of each example as a temporal point process (TPP). This reformulates the core problem of PUL as recognizing trends in these scores. We then propose a novel TPP-inspired measure for trend detection and prove its asymptotic unbiasedness in predicting changes. Notably, our method accomplishes PUL without requiring additional parameter tuning or prior assumptions, offering an alternative perspective for tackling this problem. Extensive experiments verify the superiority of our method, particularly in a highly imbalanced real-world setting, where it achieves improvements of up to 11.3% in key metrics.
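
The trend-recognition reformulation can be illustrated with a generic monotone-trend statistic over one example's score trajectory; this Mann-Kendall-style pair count is a simple stand-in, not the paper's TPP-derived measure:

import numpy as np

def trend_statistic(scores):
    # Sum of sign(s_j - s_i) over all pairs i < j: strongly negative values
    # indicate a consistently decreasing trajectory (the unlabeled-negative
    # signature observed above); near-zero values indicate a chaotic one.
    s = np.asarray(scores, dtype=float)
    n = len(s)
    return sum(np.sign(s[j] - s[i]) for i in range(n) for j in range(i + 1, n))

falling = trend_statistic([0.9, 0.8, 0.75, 0.6, 0.5])   # -10: all pairs decrease
chaotic = trend_statistic([0.4, 0.7, 0.3, 0.8, 0.5])    # near zero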

NeurIPS Conference 2022 Conference Paper

Can Adversarial Training Be Manipulated By Non-Robust Features?

  • Lue Tao
  • Lei Feng
  • Hongxin Wei
  • Jinfeng Yi
  • Sheng-Jun Huang
  • Songcan Chen

Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named the stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbation. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus that adaptive defense is necessary to maintain robustness.

IJCAI Conference 2022 Conference Paper

Reconstruction Enhanced Multi-View Contrastive Learning for Anomaly Detection on Attributed Networks

  • Jiaqiang Zhang
  • Senzhang Wang
  • Songcan Chen

Detecting abnormal nodes in attributed networks is of great importance in many real applications, such as financial fraud detection and cyber security. This task is challenging due to both the complex interactions between the anomalous nodes and their counterparts and their inconsistency in terms of attributes. This paper proposes a self-supervised learning framework that jointly optimizes a multi-view contrastive learning-based module and an attribute reconstruction-based module to more accurately detect anomalies on attributed networks. Specifically, two contrastive learning views are first established, which allow the model to better encode rich local and global information related to abnormality. Motivated by the attribute consistency principle between neighboring nodes, a masked autoencoder-based reconstruction module is also introduced to identify the nodes with large reconstruction errors, which are then regarded as anomalies. Finally, the two complementary modules are integrated for more accurate detection of anomalous nodes. Extensive experiments conducted on five benchmark datasets show our model outperforms current state-of-the-art models.
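
The reconstruction-based scoring step reduces to a few lines; the "reconstruction" below is a trivial stand-in, whereas the paper uses a trained masked autoencoder over the attributed network:

import numpy as np

def reconstruction_scores(X, reconstruct):
    # Per-node anomaly score = error of reconstructing the node's attributes.
    return np.linalg.norm(X - reconstruct(X), axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
X[:3] += 6.0                                   # inject three attribute anomalies
mean_profile = X.mean(axis=0)                  # toy "reconstruction": mean profile
scores = reconstruction_scores(X, lambda Z: np.broadcast_to(mean_profile, Z.shape))
anomalies = np.argsort(scores)[-3:]            # recovers the injected nodes 0, 1, 2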

AAAI Conference 2022 Conference Paper

With False Friends Like These, Who Can Notice Mistakes?

  • Lue Tao
  • Lei Feng
  • Jinfeng Yi
  • Songcan Chen

Adversarial examples crafted by an explicit adversary have attracted significant attention in machine learning. However, the security risk posed by a potential false friend has been largely overlooked. In this paper, we unveil the threat of hypocritical examples—inputs that are originally misclassified yet perturbed by a false friend to force correct predictions. While such perturbed examples seem harmless, we point out for the first time that they could be maliciously used to conceal the mistakes of a substandard (i.e., not as good as required) model during an evaluation. Once a deployer trusts the hypocritical performance and applies the “well-performed” model in real-world applications, unexpected failures may happen even in benign environments. More seriously, this security risk seems to be pervasive: we find that many types of substandard models are vulnerable to hypocritical examples across multiple datasets. Furthermore, we provide the first attempt to characterize the threat with a metric called hypocritical risk and try to circumvent it via several countermeasures. Results demonstrate the effectiveness of the countermeasures, while the risk remains non-negligible even after adaptive robust training.
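
A hypocritical perturbation is the mirror image of a PGD adversarial attack: descend, rather than ascend, the loss at the true label so a misclassified input flips to a correct prediction. A minimal PyTorch-style sketch with a hypothetical model interface and common L_inf conventions:

import torch
import torch.nn.functional as F

def hypocritical_perturb(model, x, y_true, eps=8/255, steps=10):
    # Bounded "false friend" perturbation: minimize loss w.r.t. the TRUE label.
    delta = torch.zeros_like(x, requires_grad=True)
    step = eps / 4
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y_true)
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()   # descend: help, do not hurt
            delta.clamp_(-eps, eps)             # stay inside the L_inf ball
            delta.grad.zero_()
    # (in practice also clamp x + delta back to the valid input range)
    return (x + delta).detach()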

NeurIPS Conference 2021 Conference Paper

Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training

  • Lue Tao
  • Lei Feng
  • Jinfeng Yi
  • Sheng-Jun Huang
  • Songcan Chen

Delusive attacks aim to substantially deteriorate the test accuracy of the learning model by slightly perturbing the features of correctly labeled training examples. By formalizing this malicious attack as finding the worst-case training data within a specific $\infty$-Wasserstein ball, we show that minimizing adversarial risk on the perturbed data is equivalent to optimizing an upper bound of natural risk on the original data. This implies that adversarial training can serve as a principled defense against delusive attacks. Thus, the test accuracy decreased by delusive attacks can be largely recovered by adversarial training. To further understand the internal mechanism of the defense, we disclose that adversarial training can resist the delusive perturbations by preventing the learner from overly relying on non-robust features in a natural setting. Finally, we complement our theoretical findings with a set of experiments on popular benchmark datasets, which show that the defense withstands six different practical attacks. Both theoretical and empirical results vote for adversarial training when confronted with delusive adversaries.

AAAI Conference 2021 Conference Paper

Improving Model Robustness by Adaptively Correcting Perturbation Levels with Active Queries

  • Kun-Peng Ning
  • Lue Tao
  • Songcan Chen
  • Sheng-Jun Huang

In addition to high accuracy, robustness is becoming increasingly important for machine learning models in various applications. Recently, much research has been devoted to improving the model robustness by training with noise perturbations. Most existing studies assume a fixed perturbation level for all training examples, which however hardly holds in real tasks. In fact, excessive perturbations may destroy the discriminative content of an example, while deficient perturbations may fail to provide helpful information for improving the robustness. Motivated by this observation, we propose to adaptively adjust the perturbation levels for each example in the training process. Specifically, a novel active learning framework is proposed to allow the model to interactively query the correct perturbation level from human experts. By designing a cost-effective sampling strategy along with a new query type, the robustness can be significantly improved with a few queries. Both theoretical analysis and experimental studies validate the effectiveness of the proposed approach.

ICML Conference 2020 Conference Paper

Accelerated Stochastic Gradient-free and Projection-free Methods

  • Feihu Huang 0001
  • Lue Tao
  • Songcan Chen

In this paper, we propose a class of accelerated stochastic gradient-free and projection-free (a.k.a. zeroth-order Frank-Wolfe) methods to solve constrained stochastic and finite-sum nonconvex optimization. Specifically, we propose an accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW) method based on the variance-reduction technique of SPIDER/SpiderBoost and a novel momentum acceleration technique. Moreover, under some mild conditions, we prove that Acc-SZOFW has a function query complexity of $O(d\sqrt{n}\epsilon^{-2})$ for finding an $\epsilon$-stationary point in the finite-sum problem, which improves the existing best result by a factor of $O(\sqrt{n}\epsilon^{-2})$, and a function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem, which improves the existing best result by a factor of $O(\epsilon^{-1})$. To relax the large batches required in Acc-SZOFW, we further propose a novel accelerated stochastic zeroth-order Frank-Wolfe method (Acc-SZOFW*) based on a new variance-reduction technique of STORM, which still reaches the function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem without relying on any large batches. In particular, we present an accelerated framework of Frank-Wolfe methods based on the proposed momentum acceleration technique. Extensive experimental results on black-box adversarial attack and robust black-box classification demonstrate the efficiency of our algorithms.
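
The two ingredients, a function-query-only gradient estimate and a projection-free update, can be sketched as a plain zeroth-order Frank-Wolfe loop (not the accelerated, variance-reduced Acc-SZOFW itself):

import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=20, rng=None):
    # Two-point estimator: average directional finite differences along random
    # Gaussian directions; only function evaluations are needed.
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

def frank_wolfe_step(x, g, radius, step):
    # Projection-free update over an L1 ball: the linear minimizer is a vertex.
    s = np.zeros_like(x)
    i = int(np.argmax(np.abs(g)))
    s[i] = -radius * np.sign(g[i])
    return x + step * (s - x)

f = lambda z: float(np.sum((z - 0.3) ** 2))     # toy smooth objective
x, rng = np.zeros(10), np.random.default_rng(0)
for t in range(50):
    x = frank_wolfe_step(x, zo_gradient(f, x, rng=rng), radius=5.0, step=2 / (t + 2))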

TIST Journal 2020 Journal Article

Moment-Guided Discriminative Manifold Correlation Learning on Ordinal Data

  • Qing Tian
  • Wenqiang Zhang
  • Meng Cao
  • Liping Wang
  • Songcan Chen
  • Hujun Yin

Canonical correlation analysis (CCA) is a typical and useful learning paradigm in big data analysis for capturing correlation across multiple views of the same objects. When dealing with data carrying additional ordinal information, traditional CCA suffers from poor performance due to ignoring the ordinal relationships within the data. Such data is becoming increasingly common, as either temporal or sequential information is often associated with the data collection process. To incorporate the ordinal information into the objective function of CCA, so-called ordinal discriminative CCA has been presented in the literature. Although ordinal discriminative CCA can yield better ordinal regression results, its performance deteriorates when data is corrupted with noise and outliers, as these tend to smear the order information contained in class centers. To address this issue, in this article we construct a robust manifold-preserved ordinal discriminative correlation regression (rmODCR). The robustness is achieved by replacing the traditional ($\ell_2$-norm) class centers with $\ell_p$-norm centers, where $p$ is efficiently estimated according to the moments of the data distributions, as well as by incorporating the manifold distribution information of the data in the objective optimization. In addition, we further extend the robust manifold-preserved ordinal discriminative correlation regression to deep convolutional architectures. Extensive experimental evaluations have demonstrated the superiority of the proposed methods.

AAAI Conference 2020 Conference Paper

Uncertainty Aware Graph Gaussian Process for Semi-Supervised Learning

  • Zhao-Yang Liu
  • Shao-Yuan Li
  • Songcan Chen
  • Yao Hu
  • Sheng-Jun Huang

Graph-based semi-supervised learning (GSSL) studies the problem where, in addition to a set of data points with few available labels, there also exists a graph structure that describes the underlying relationships between data items. In practice, structure uncertainty often occurs in graphs when edges exist between data with different labels, which may further result in prediction uncertainty of labels. Considering that Gaussian processes generalize well with few labels and can naturally model uncertainty, in this paper we propose an Uncertainty aware Graph Gaussian Process based approach (UaGGP) for GSSL. UaGGP exploits prediction uncertainty and label smoothness regularization to guide each other during learning. To further subdue the effect of irrelevant neighbors, UaGGP also aggregates the clean representation in the original space and the learned representation. Experiments on benchmarks demonstrate the effectiveness of the proposed approach.

AAAI Conference 2019 Conference Paper

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

  • Feihu Huang
  • Bin Gu
  • Zhouyuan Huo
  • Songcan Chen
  • Heng Huang

Proximal gradient methods play an important role in solving many machine learning tasks, especially nonsmooth problems. However, in some machine learning problems such as the bandit model and the black-box learning problem, proximal gradient methods could fail because the explicit gradients of these problems are difficult or infeasible to obtain. Gradient-free (zeroth-order) methods can address these problems because only the objective function values are required in the optimization. Recently, the first zeroth-order proximal stochastic algorithm was proposed to solve nonconvex nonsmooth problems. However, its convergence rate is $O(1/\sqrt{T})$ for nonconvex problems, which is significantly slower than the best convergence rate $O(1/T)$ of zeroth-order stochastic algorithms, where $T$ is the iteration number. To fill this gap, in this paper we propose a class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, denoted as ZO-ProxSVRG and ZO-ProxSAGA, respectively. In the theoretical analysis, we address the main challenge that an unbiased estimate of the true gradient does not hold in the zeroth-order case, which was required in previous theoretical analyses of both SVRG and SAGA. Moreover, we prove that both the ZO-ProxSVRG and ZO-ProxSAGA algorithms have $O(1/T)$ convergence rates. Finally, the experimental results verify that our algorithms have a faster convergence rate than the existing zeroth-order proximal stochastic algorithm.

ICML Conference 2019 Conference Paper

Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization

  • Feihu Huang 0001
  • Songcan Chen
  • Heng Huang 0001

In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization by using a new stochastic path-integrated differential estimator (SPIDER), called SPIDER-ADMM. We prove that SPIDER-ADMM achieves a record-breaking incremental first-order oracle (IFO) complexity for finding an $\epsilon$-approximate solution. As one of the major contributions of this paper, we provide a new theoretical analysis framework for nonconvex stochastic ADMM methods that yields the optimal IFO complexity. Based on this new analysis framework, we study the hitherto unresolved optimal IFO complexity of the existing nonconvex SVRG-ADMM and SAGA-ADMM methods and prove their optimal IFO complexity. Thus, SPIDER-ADMM improves on the existing stochastic ADMM methods. Moreover, we extend SPIDER-ADMM to the online setting and propose a faster online SPIDER-ADMM. Our theoretical analysis also derives the IFO complexity of the online SPIDER-ADMM. Finally, the experimental results on benchmark datasets validate that the proposed algorithms have a faster convergence rate than existing ADMM algorithms for nonconvex optimization.

AAAI Conference 2019 Conference Paper

One-Pass Incomplete Multi-View Clustering

  • Menglei Hu
  • Songcan Chen

Real data are often captured in multiple modalities or from multiple heterogeneous sources, thus forming so-called multi-view data, which is receiving more and more attention in machine learning. Multi-view clustering (MVC) has become an important paradigm for such data. In real-world applications, some views often suffer from missing instances. Clustering on such multi-view datasets is called incomplete multi-view clustering (IMC) and is quite challenging. To date, though many approaches have been developed, most of them are offline and have high computational and memory costs, especially for large-scale datasets. To address this problem, in this paper we propose a One-Pass Incomplete Multi-view Clustering framework (OPIMC). With the help of regularized matrix factorization and weighted matrix factorization, OPIMC can deal with this problem relatively easily. Unlike the only existing online IMC method, OPIMC can directly obtain clustering results and effectively determine the termination of the iteration process by introducing two global statistics. Finally, extensive experiments conducted on four real datasets demonstrate the efficiency and effectiveness of the proposed OPIMC method.

IJCAI Conference 2019 Conference Paper

Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

  • Feihu Huang
  • Shangqian Gao
  • Songcan Chen
  • Heng Huang

The alternating direction method of multipliers (ADMM) is a popular optimization tool for composite and constrained problems in machine learning. However, in many machine learning problems such as black-box learning and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or even infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems because only the objective function values are required in the optimization. Recently, a few zeroth-order ADMM methods have appeared, but they build on the convexity of the objective function, so these existing zeroth-order methods are limited in many applications. In this paper, we thus propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate of $O(1/T)$ for nonconvex optimization, but are also able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct experiments on black-box binary classification and structured adversarial attacks on black-box deep neural networks to validate the efficiency of our algorithms.

IJCAI Conference 2018 Conference Paper

Doubly Aligned Incomplete Multi-view Clustering

  • Menglei Hu
  • Songcan Chen

Nowadays, multi-view clustering has attracted more and more attention. To date, almost all previous studies assume that views are complete. However, in reality it is often the case that each view may contain some missing instances. Such incompleteness makes it impossible to directly use traditional multi-view clustering methods. In this paper, we propose a Doubly Aligned Incomplete Multi-view Clustering algorithm (DAIMC) based on weighted semi-nonnegative matrix factorization (semi-NMF). Specifically, on the one hand, DAIMC utilizes the given instance alignment information to learn a common latent feature matrix for all the views. On the other hand, DAIMC establishes a consensus basis matrix with the help of $L_{2,1}$-norm regularized regression to reduce the influence of missing instances. Consequently, compared with existing methods, besides inheriting the strength of semi-NMF in handling negative entries, DAIMC has two unique advantages: 1) it solves the incomplete view problem by introducing a respective weight matrix for each view, making it able to easily adapt to cases with more than two views; 2) it reduces the influence of view incompleteness on clustering by enforcing the basis matrices of individual views to be aligned with the help of regression. Experiments on four real-world datasets demonstrate its advantages.
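
The weighting idea (missing instances contribute nothing to a view's loss) can be sketched with plain gradient descent on masked factorization losses; this is illustrative only, whereas DAIMC's actual updates use weighted semi-NMF with $L_{2,1}$-regularized basis alignment:

import numpy as np

def incomplete_mvc_sketch(views, masks, k=4, iters=300, lr=0.01, seed=0):
    # views: list of (n, d_v) matrices (rows of missing instances are arbitrary);
    # masks: list of (n,) 0/1 vectors, 1 if the instance is present in that view.
    # Learns a shared latent U and one basis per view; cluster U afterwards.
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    U = rng.standard_normal((n, k)) * 0.1
    Bs = [rng.standard_normal((k, X.shape[1])) * 0.1 for X in views]
    for _ in range(iters):
        for X, w, B in zip(views, masks, Bs):
            R = w[:, None] * (U @ B - X)   # residual, zeroed for missing instances
            gU, gB = R @ B.T, U.T @ R
            U -= lr * gU
            B -= lr * gB                   # in-place update keeps Bs current
    return U, Bs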

IJCAI Conference 2017 Conference Paper

Multi-instance multi-label active learning

  • Sheng-Jun Huang
  • Nengneng Gao
  • Songcan Chen

Multi-instance multi-label learning (MIML) has been successfully applied to many real-world applications. Along with the enhanced expressive power, the cost of labelling a MIML example increases significantly, and thus it becomes an important task to train an effective MIML model with as few labelled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is a main approach to reducing labelling cost. Existing active learning methods have achieved great success in traditional learning tasks, but cannot be directly applied to MIML problems. In this paper, we propose a MIML active learning algorithm, which exploits diversity and uncertainty in both the input and output space to query the most valuable information. The algorithm designs a novel query strategy specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relative rank among instances and labels.

AAAI Conference 2017 Conference Paper

Semi-Supervised Multi-View Correlation Feature Learning with Application to Webpage Classification

  • Xiao-Yuan Jing
  • Fei Wu
  • Xiwei Dong
  • Shiguang Shan
  • Songcan Chen

Webpage classification has attracted a lot of research interest. Webpage data is often multi-view and high-dimensional, and the webpage classification application is usually semi-supervised. Due to these characteristics, using semi-supervised multi-view feature learning (SMFL) techniques to deal with the webpage classification problem has recently received much attention. However, there is still room for improvement in this kind of feature learning. How to effectively utilize the correlation information among the multiple views of webpage data is an important research topic, since correlation analysis on multi-view data can facilitate extraction of complementary information. In this paper, we propose a novel SMFL approach, named semi-supervised multi-view correlation feature learning (SMCFL), for webpage classification. SMCFL seeks a discriminant common space by learning a multi-view shared transformation in a semi-supervised manner. In the discriminant space, the correlation between intra-class samples is maximized, while the correlation between inter-class samples and the global correlation among both labeled and unlabeled samples are minimized simultaneously. We transform the matrix-variable-based nonconvex objective function of SMCFL into a convex quadratic programming problem with one real variable, which admits a globally optimal solution. Experiments on widely used datasets demonstrate the effectiveness and efficiency of the proposed approach.
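
Schematically, an objective of this kind can be written as

$$\max_{W}\ \operatorname{tr}\!\left(W^{\top} C_{\text{intra}} W\right) \;-\; \alpha\,\operatorname{tr}\!\left(W^{\top} C_{\text{inter}} W\right) \;-\; \beta\,\operatorname{tr}\!\left(W^{\top} C_{\text{global}} W\right),$$

where $W$ is the shared transformation and $C_{\text{intra}}$, $C_{\text{inter}}$, $C_{\text{global}}$ denote intra-class, inter-class, and global correlation matrices; these symbols and the trade-off weights $\alpha,\beta$ are illustrative, not the paper's exact formulation.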

IJCAI Conference 2016 Conference Paper

Transfer Learning with Active Queries from Source Domain

  • Sheng-Jun Huang
  • Songcan Chen

To learn with limited labeled data, active learning queries more labels from an oracle, while transfer learning utilizes labeled data from a related source domain. However, in many real cases there is very little labeled data in both the source and target domains, and no oracle is available in the target domain. To solve this practical yet rarely studied problem, in this paper we jointly perform transfer learning and active learning by querying the most valuable information from the source domain. The computation of importance weights for domain adaptation and the instance selection for active queries are integrated into one unified framework based on distribution matching, which is solved with alternating optimization. The effectiveness of the proposed method is validated by experiments on 15 datasets for sentiment analysis and text categorization.
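
A least-squares caricature of the distribution-matching step, choosing source weights so the weighted source mean matches the target mean in feature space (function name and regularizer are ours, not the paper's):

```python
import numpy as np

def mmd_weights(Xs, Xt, lam=1e-2):
    """Solve min_w ||Xs.T @ w - mean(Xt)||^2 + lam*||w||^2,
    a simplified stand-in for the unified distribution-matching
    framework described in the abstract."""
    n = Xs.shape[0]
    A = Xs @ Xs.T + lam * np.eye(n)   # regularized Gram matrix
    b = Xs @ Xt.mean(axis=0)          # match the target mean embedding
    w = np.linalg.solve(A, b)
    return np.clip(w, 0.0, None)      # keep importance weights nonnegative
```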

IJCAI Conference 2015 Conference Paper

Multi-Label Active Learning: Query Type Matters

  • Sheng-Jun Huang
  • Songcan Chen
  • Zhi-Hua Zhou

Active learning reduces labeling cost by selectively querying the most valuable information from the annotator. It is especially important for multi-label learning, where the labeling cost is rather high because each object may be associated with multiple labels. Existing multi-label active learning (MLAL) research mainly focuses on the task of selecting which instances to query. In this paper, we show for the first time that the query type, which decides what information to query for the selected instance, is more important. Based on this observation, we propose a novel MLAL framework that queries the relevance ordering of label pairs, which obtains richer information from each query and requires less expertise from the annotator. By incorporating a simple selection strategy and a label ranking model into our framework, the proposed approach can reduce the labeling effort of annotators significantly. Experiments on 20 benchmark datasets and a manually labeled real-world dataset validate that our approach not only achieves superior classification performance, but also provides accurate ranking of relevant labels.
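
A toy version of the pair-selection idea, which picks, for a chosen instance, the two labels whose predicted scores are hardest to order (an illustrative strategy, not the paper's exact one):

```python
import numpy as np

def select_label_pair(scores, i):
    """For instance i, return the pair of adjacent labels (in score
    order) with the smallest score gap -- the relevance ordering the
    current model is least sure about, hence most worth querying."""
    s = scores[i]
    order = np.argsort(s)            # labels sorted by predicted score
    gaps = np.diff(s[order])         # gaps between adjacent labels
    j = int(np.argmin(gaps))
    return order[j], order[j + 1]    # ask: which of these is more relevant?
```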

IJCAI Conference 2015 Conference Paper

Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction

  • Xiao-Yuan Jing
  • Qian Liu
  • Fei Wu
  • Baowen Xu
  • Yangping Zhu
  • Songcan Chen

Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is also commonly high-dimensional. Thus, how to extract useful features from this kind of data in the multi-view semi-supervised scenario is important for web page classification. To our knowledge, only one method has been specifically proposed for this topic, and compared with the few semi-supervised multi-view feature extraction methods developed for other applications, there is still much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which fully utilizes the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with this constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.
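
In schematic form, such manifold discriminant feature extraction solves something like

$$\min_{P}\ \operatorname{tr}\!\big(P^{\top}(S_w - \alpha S_b + \beta L)P\big)\quad \text{s.t.}\ P^{\top}P = I,$$

where $S_w$ and $S_b$ collect intra-view and inter-view within-class and between-class scatter, $L$ is a graph Laplacian built from the neighborhoods of unlabeled samples, and $\alpha,\beta$ are trade-off weights; these symbols are illustrative rather than the paper's exact notation.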

AAAI Conference 2012 Conference Paper

Ensemble Feature Weighting Based on Local Learning and Diversity

  • Yun Li
  • Suyan Gao
  • Songcan Chen

Recently, besides performance, the stability of feature selection (robustness, i.e., the variation in feature selection results due to small changes in the data set) has received more attention. Ensemble feature selection, where multiple feature selection outputs are combined to yield more robust results without sacrificing performance, is an effective method for stable feature selection. To further improve performance (classification accuracy), we present a diversity-regularized ensemble feature weighting framework in which the base feature selector is based on local learning with a logistic loss, chosen for its robustness to large numbers of irrelevant features and small sample sizes. We also analyze the sample complexity of the proposed ensemble feature weighting algorithm based on VC theory. Experiments on different kinds of data sets show that the proposed ensemble method can achieve higher accuracy than other ensemble methods and other stable feature selection strategies (such as sample weighting) without sacrificing stability.
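
A compressed sketch of a local-learning base selector of the kind described, with Relief-style nearest-hit/nearest-miss margins fit under a logistic loss; all names and the simple projected-gradient solver are ours:

```python
import numpy as np

def local_learning_weights(X, y, lam=1.0, lr=0.1, iters=100):
    """Fit nonnegative feature weights w by minimizing the logistic
    loss mean(log(1 + exp(-z_i @ w))) over per-sample margin vectors
    z_i = |x_i - nearest_miss| - |x_i - nearest_hit| (a simplified
    version of the local-learning base learner)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    margins = np.zeros((n, d))
    for i in range(n):
        diff = np.abs(X - X[i])                  # per-feature distances
        dist = diff.sum(axis=1)
        dist[i] = np.inf                         # exclude the sample itself
        hit = int(np.argmin(np.where(y == y[i], dist, np.inf)))
        miss = int(np.argmin(np.where(y != y[i], dist, np.inf)))
        margins[i] = diff[miss] - diff[hit]      # margin vector z_i
    w = np.ones(d)
    for _ in range(iters):
        m = margins @ w
        g = -(margins * (1.0 / (1.0 + np.exp(m)))[:, None]).mean(axis=0)
        w = np.maximum(w - lr * (g + lam * w / n), 0.0)  # project to w >= 0
    return w
```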

IJCAI Conference 2009 Conference Paper

Learning the Optimal Neighborhood Kernel for Classification

  • Jun Liu
  • Jianhui Chen
  • Songcan Chen
  • Jieping Ye

Kernel methods have been applied successfully in many applications. The kernel matrix plays an important role in kernel-based learning methods, but the “ideal” kernel matrix is usually unknown in practice and needs to be estimated. In this paper, we propose to directly learn the “ideal” kernel matrix (called the optimal neighborhood kernel matrix) from a pre-specified kernel matrix for improved classification performance. We assume that the pre-specified kernel matrix generated from the specific application is a noisy observation of the ideal one. The resulting optimal neighborhood kernel matrix is shown to be the sum of the pre-specified kernel matrix and a rank-one matrix. We formulate the problem of learning the optimal neighborhood kernel as a constrained quartic problem and propose to solve it with two methods: the level method and constrained gradient descent. Empirical results on several benchmark data sets demonstrate the efficiency and effectiveness of the proposed algorithms.
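
The stated structure of the solution is simple to express; the sketch below only illustrates the rank-one form, with v and c standing in for the quantities the paper actually learns:

```python
import numpy as np

def neighborhood_kernel(K, v, c):
    """Structural form from the abstract: the learned kernel equals
    the pre-specified kernel K plus a rank-one correction. Here v
    (a unit vector) and c >= 0 are illustrative placeholders."""
    return K + c * np.outer(v, v)

# A rank-one update with c >= 0 preserves positive semidefiniteness:
K = np.eye(3)
v = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
print(np.linalg.eigvalsh(neighborhood_kernel(K, v, 0.5)))  # all >= 1
```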