Arrow Research

Author name cluster

Ivor Tsang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

Correspondence Coverage Matters for Multi-Modal Dataset Distillation

  • Zhuohang Dang
  • Minnan Luo
  • Chengyou Jia
  • Hangwei Qian
  • Xinyu Zhang
  • Xiaojun Chang
  • Ivor Tsang

Multi-modal dataset distillation (DD) condenses large datasets into compact ones that retain task efficacy by capturing correspondence patterns, i.e., shared semantics between paired modalities. However, such patterns rely on cross-modal similarity and cannot be faithfully captured by the intra-modal similarity used in current unimodal strategies. As a result, current multi-modal DD methods tend to over-concentrate, redundantly encoding similar correspondence patterns and thus limiting generalizability. To address this, we propose a novel multi-modal DD framework to systematically Promote Correspondence coverage, i.e., ProCo. Initially, we develop a correspondence consistency metric based on cross-modal retrieval distributions to cluster correspondence patterns. These clusters capture the underlying correspondence distribution, enabling ProCo to initialize distilled data with representative patterns while regularizing optimization to promote correspondence representativeness and diversity. Moreover, we employ conditional neural fields for efficient distilled data parameterization, enhancing fine-grained pattern capture while allowing more distilled data under a fixed budget to boost correspondence coverage. Extensive experiments verify that ProCo achieves superior and elastic budget-efficacy trade-offs, surpassing prior methods by over 15% with a 10x reduction in distillation budget, highlighting its real-world practicality.

AAAI Conference 2026 System Paper

GenMatLab: A Generative Platform for Inverse Materials Design

  • Hangwei Qian
  • Yang He
  • Yaxin Shi
  • Ivor Tsang

In this demo, we present GenMatLab, a user-friendly web platform that makes the latest AI techniques accessible for inverse materials design. The platform integrates data analysis and generative modeling into an easy-to-use interface, enabling researchers, materials domain experts, and practitioners to explore and apply AI techniques without requiring advanced coding expertise. At its core are generative AI models that support interactive operations, allowing users to conduct inverse design and investigate generated candidates in an intuitive and exploratory way. By lowering technical barriers, GenMatLab empowers a broader community to leverage cutting-edge AI methods for accelerating materials discovery.

AAAI Conference 2026 Conference Paper

MAGIC: Mastering Physical Adversarial Generation in Context Through Collaborative LLM Agents

  • Yun Xing
  • Nhat Chung
  • Jie Zhang
  • Yue Cao
  • Ivor Tsang
  • Yang Liu
  • Lei Ma
  • Qing Guo

Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains non-trivial due to diverse real-world environmental influences. Existing approaches either struggle to generalize to dynamic environments or fail to achieve consistent physical attack performance. To address these challenges, we propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents that automatically understands the scene context at test time and generates adversarial patches through the synergistic interaction of language and vision understanding. Specifically, MAGIC orchestrates three specialized LLM agents: the adv-patch generation agent masters the creation of deceptive patches via strategic prompt manipulation for text-to-image models; the adv-patch deployment agent ensures contextual coherence by determining optimal deployment strategies based on scene understanding; and the self-examination agent completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our approach in both digital and physical scenarios, i.e., nuImage and real-world scenes, where both statistical and visual results demonstrate that MAGIC is powerful and effective at attacking widely applied object detection systems, such as the YOLO and DETR series.
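
The generate-deploy-examine loop described in this abstract can be made concrete with a small sketch. Everything below is a hypothetical stub: the paper's agents are multi-modal LLMs and a text-to-image model, and none of these function names come from its code; only the control flow mirrors the described trilogy.

    def generate_agent(scene, feedback=""):
        # adv-patch generation agent: crafts a T2I prompt, refined by critique
        return f"adversarial patch prompt for {scene} {feedback}".strip()

    def text_to_image(prompt):
        # stand-in for the text-to-image model that renders the patch
        return {"patch_from": prompt}

    def deploy_agent(scene, patch):
        # adv-patch deployment agent: picks a contextually coherent placement
        return {"location": "roadside", "scale": 0.3}

    def examine_agent(scene, patch, placement):
        # self-examination agent: verdict plus critique for the next round
        return "accept", ""

    def magic_loop(scene, max_rounds=3):
        prompt = generate_agent(scene)
        for _ in range(max_rounds):
            patch = text_to_image(prompt)
            placement = deploy_agent(scene, patch)
            verdict, feedback = examine_agent(scene, patch, placement)
            if verdict == "accept":
                return patch, placement
            prompt = generate_agent(scene, feedback)  # iterative refinement
        return patch, placement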

NeurIPS Conference 2025 Conference Paper

AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant Adversarial Patches

  • Wenjun Ji
  • Yuxiang Fu
  • Luyang Ying
  • Deng-Ping Fan
  • Yuyi Wang
  • Ming-Ming Cheng
  • Ivor Tsang
  • Qing Guo

Cutting-edge works have demonstrated that text-to-image (T2I) diffusion models can generate adversarial patches that mislead state-of-the-art object detectors in the physical world, revealing detectors' vulnerabilities and risks. However, these methods neglect the T2I patches' attack effectiveness when observed from different views in the physical world (i.e., the angle robustness of T2I adversarial patches). In this paper, we study the angle robustness of T2I adversarial patches comprehensively, revealing their angle-robustness issues, demonstrating that texts significantly affect the angle robustness of generated patches, and showing that task-specific linguistic instructions fail to enhance it. Motivated by these studies, we introduce Angle-Robust Concept Learning (AngleRoCL), a simple and flexible approach that learns a generalizable concept (i.e., text embeddings in implementation) representing the capability of generating angle-robust patches. The learned concept can be incorporated into textual prompts and guides T2I models to generate patches whose attack effectiveness is inherently resistant to viewpoint variations. Through extensive simulation and physical-world experiments on five SOTA detectors across multiple views, we demonstrate that AngleRoCL significantly enhances the angle robustness of T2I adversarial patches compared to baseline methods. Our patches maintain high attack success rates even under challenging viewing conditions, with over 50% average relative improvement in attack effectiveness across multiple angles. This research advances the understanding of physically angle-robust patches and provides insights into the relationship between textual concepts and physical properties in T2I-generated content. We released our code at https://github.com/tsingqguo/anglerocl.

NeurIPS Conference 2025 Conference Paper

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

  • Yun Xing
  • Yue Cao
  • Nhat Chung
  • Jie Zhang
  • Ivor Tsang
  • Ming-Ming Cheng
  • Yang Liu
  • Lei Ma

Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, i.e., when deployed as patches, limiting their practical utility for stress-testing stereo depth estimation systems. In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances the patch attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence the adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and the texture elements. Our generated adversarial patches can be inserted into any scene and successfully attack advanced stereo depth estimation methods of different paradigms, i.e., RAFT-Stereo and STTR. Most critically, our patches can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating their practical relevance for security assessment of stereo systems. The code is officially released at https://github.com/WiWiN42/DepthVanish

TMLR Journal 2025 Journal Article

Graph Potential Field Neural Network for Massive Agents Group-wise Path Planning

  • Yueming LYU
  • Xiaowei Zhou
  • Xingrui Yu
  • Ivor Tsang

Multi-agent path planning is important in both multi-agent path finding and multi-agent reinforcement learning areas. However, continual group-wise multi-agent path planning that requires the agents to perform as a team to pursue high team scores instead of individually is less studied. To address this problem, we propose a novel graph potential field-based neural network (GPFNN), which models a valid potential field map for path planning. Our GPFNN unfolds the T-step iterative optimization of the potential field maps as a T-layer feedforward neural network. Thus, a deeper GPFNN leads to more precise potential field maps without the over-smoothing issue. A potential field map inherently provides a monotonic potential flow from any source node to the target nodes to construct the optimal path (w.r.t. the potential decay), equipping our GPFNN with an elegant planning ability. Moreover, we incorporate dynamically updated boundary conditions into our GPFNN to address group-wise multi-agent path planning that supports both static targets and dynamic moving targets. Empirically, experiments on three different-sized mazes (up to $1025 \times 1025$ sized mazes) with up to 1,000 agents demonstrate the planning ability of our GPFNN to handle both static and dynamic moving targets. Experiments on extensive graph node classification tasks on six graph datasets (up to millions of nodes) demonstrate the learning ability of our GPFNN.
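
To make the potential-field idea tangible, here is a minimal numpy sketch (not the authors' network): a Laplace-style relaxation with the target clamped as a sink plays the role of the iterative optimization that GPFNN unfolds into layers, and a greedy descent follows the monotone potential decay. Grid size, iteration count, and wall placement are illustrative assumptions.

    import numpy as np

    def potential_field(free, target, iters=500):
        # free: boolean grid of traversable cells; target: (row, col) sink
        U = np.ones_like(free, dtype=float)       # start at the potential ceiling
        for _ in range(iters):                    # T iterations ~ T layers
            avg = 0.25 * (np.roll(U, 1, 0) + np.roll(U, -1, 0)
                          + np.roll(U, 1, 1) + np.roll(U, -1, 1))
            U[free] = avg[free]                   # relax only traversable cells
            U[target] = 0.0                       # boundary condition: target sink
        return U

    def descend(U, start):
        # follow the monotonic potential flow down to the target
        path, pos = [start], start
        while True:
            r, c = pos
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            nxt = min(nbrs, key=lambda p: U[p])
            if U[nxt] >= U[pos]:
                return path                       # reached the potential minimum
            path.append(nxt); pos = nxt

    grid = np.ones((65, 65), dtype=bool)
    grid[0, :] = grid[-1, :] = grid[:, 0] = grid[:, -1] = False   # walled border
    grid[32, 10:55] = False                                       # interior wall
    U = potential_field(grid, target=(60, 60))
    print(descend(U, start=(2, 2))[-1])           # ends at the target sink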

IJCAI Conference 2025 Conference Paper

Grounding Open-Domain Knowledge from LLMs to Real-World Reinforcement Learning Tasks: A Survey

  • Haiyan Yin
  • Hangwei Qian
  • Yaxin Shi
  • Ivor Tsang
  • Yew-Soon Ong

Grounding open-domain knowledge from large language models (LLMs) into real-world reinforcement learning (RL) tasks represents a transformative frontier in developing intelligent agents capable of advanced reasoning, adaptive planning, and robust decision-making in dynamic environments. In this paper, we introduce the LLM-RL Grounding Taxonomy, a systematic framework that categorizes emerging methods for integrating LLMs into RL systems by bridging their open-domain knowledge and reasoning capabilities with the task-specific dynamics, constraints, and objectives inherent to real-world RL environments. This taxonomy encompasses both training-free approaches, which leverage the zero-shot and few-shot generalization capabilities of LLMs without fine-tuning, and fine-tuning paradigms that adapt LLMs to environment-specific tasks for improved performance. We critically analyze these methodologies, highlight practical examples of effective knowledge grounding, and examine the challenges of alignment, generalization, and real-world deployment. Our work not only illustrates the potential of LLM-RL agents for enhanced decision-making, but also offers actionable insights for advancing the design of next-generation RL systems that integrate open-domain knowledge with adaptive learning.

NeurIPS Conference 2025 Conference Paper

InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

  • Haotian Chi
  • Zeyu Feng
  • Yueming LYU
  • Chengqi Zheng
  • Linbo Luo
  • Yew Soon Ong
  • Ivor Tsang
  • Hechang Chen

Long-horizon planning in robotic manipulation tasks requires translating underspecified, symbolic goals into executable control programs satisfying spatial, temporal, and physical constraints. However, language model-based planners often struggle with long-horizon task decomposition, robust constraint satisfaction, and adaptive failure recovery. We introduce InstructFlow, a multi-agent framework that establishes a symbolic, feedback-driven flow of information for code generation in robotic manipulation tasks. InstructFlow employs an InstructFlow Planner to construct and traverse a hierarchical instruction graph that decomposes goals into semantically meaningful subtasks, while a Code Generator produces executable code snippets conditioned on this graph. Crucially, when execution failures occur, a Constraint Generator analyzes feedback and induces symbolic constraints, which are propagated back into the instruction graph to guide targeted code refinement without regenerating from scratch. This dynamic, graph-guided flow enables structured, interpretable, and failure-resilient planning, significantly improving task success rates and robustness across diverse manipulation benchmarks, especially in constraint-sensitive and long-horizon scenarios.

IJCAI Conference 2025 Conference Paper

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

  • Yuanyuan Chang
  • Yinghua Yao
  • Tao Qin
  • Mengmeng Wang
  • Ivor Tsang
  • Guang Dai

Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual prompt crafting, which can be time-consuming, introduce irrelevant details, and significantly limit editing performance. In this work, we propose optimizing semantic embeddings guided by attribute classifiers to steer text-to-image models toward desired edits, without relying on text prompts or requiring any training or fine-tuning of the diffusion model. We utilize classifiers to learn precise semantic embeddings at the dataset level. The learned embeddings are theoretically justified as the optimal representation of attribute semantics, enabling disentangled and accurate edits. Experiments further demonstrate that our method achieves high levels of disentanglement and strong generalization across different domains of data. Code is available at https://github.com/Chang-yuanyuan/CASO.
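
The core recipe here, steering a frozen generator by optimizing a semantic embedding against a frozen attribute classifier, looks roughly like the PyTorch sketch below. The toy linear classifier, embedding size, and ascent loop are illustrative assumptions, not the paper's components or its theoretical construction.

    import torch

    def optimize_embedding(classifier, dim, target_attr, steps=200, lr=0.05):
        e = torch.zeros(1, dim, requires_grad=True)     # semantic embedding to learn
        opt = torch.optim.Adam([e], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            score = classifier(e)[0, target_attr]       # frozen attribute classifier
            (-score).backward()                         # ascend the attribute score
            opt.step()
        return e.detach()                               # condition the frozen T2I model on this

    clf = torch.nn.Linear(768, 10).requires_grad_(False)  # toy stand-in classifier
    emb = optimize_embedding(clf, dim=768, target_attr=3)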

AAAI Conference 2025 Conference Paper

Max-Mahalanobis Anchors Guidance for Multi-View Clustering

  • Pei Zhang
  • Yuangang Pan
  • Siwei Wang
  • Shengju Yu
  • Huiying Xu
  • En Zhu
  • Xinwang Liu
  • Ivor Tsang

Anchor selection or learning has become a critical component in large-scale multi-view clustering. Existing anchor-based methods, which either select-then-fix or initialize-then-optimize with orthogonality, yield promising performance. However, these methods still suffer from instability of initialization or insufficient depiction of data distribution. Moreover, the desired properties of anchors in multi-view clustering remain unspecified. To address these issues, this paper first formalizes the desired characteristics of anchors, namely Diversity, Balance and Compactness. We then devise and mathematically validate anchors that satisfy these properties by maximizing the Mahalanobis distance between anchors. Furthermore, we introduce a novel method called Max-Mahalanobis Anchors Guidance for multi-view Clustering (MAGIC), which guides the cross-view representations to progressively align with our well-defined anchors. This process yields highly discriminative and compact representations, significantly enhancing the performance of multi-view clustering. Experimental results show that our meticulously designed strategy significantly outperforms existing anchor-based methods in enhancing anchor efficacy, leading to substantial improvement in multi-view clustering performance.

TMLR Journal 2025 Journal Article

Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

  • Cheng Chen
  • Atsushi Nitanda
  • Ivor Tsang

While Large Language Models (LLMs) have revolutionized chatbot interactions, they often fall short of aligning responses with the nuanced preferences of individual users, a challenge rooted in the inherently subjective and proprietary nature of those preferences. Consequently, prompt-based learning, though effective in enhancing factual accuracy due to its emphasis on universal correctness, remains insufficient for achieving accurate personalized response alignment. Because user preferences vary widely across individuals and contexts, aligning responses requires a more personalized and context-aware approach. To address this limitation, we propose Consistent Marginalization (CM), a novel framework that aims to unlearn misalignment by constructing a personalized memory bank of instance-response-dependent discrepancies, built from a small set of user preference samples. This personalized memory bank equips LLMs with the ability to understand, recall, and adapt to individual preferences, enabling more consistent and personalized responses. Evaluated across a diverse range of domain-specific datasets and model architectures, CM yields notable improvements in response alignment and robustness. We believe Consistent Marginalization represents a valuable step toward enabling LLMs to become genuinely personable and adaptive conversational agents by understanding user preferences and generating responses that are better aligned with individual user expectations.

IJCAI Conference 2024 Conference Paper

Fast Unpaired Multi-view Clustering

  • Xingfeng Li
  • Yuangang Pan
  • Yinghui Sun
  • Quansen Sun
  • Ivor Tsang
  • Zhenwen Ren

Anchor-based pairwise multi-view clustering often assumes that multi-view data are paired, and has demonstrated significant advances in recent years. However, this assumption is easily violated, and data are commonly fully unpaired in practical applications due to the influence of data collection and storage processes. Addressing unpaired large-scale multi-view data through anchor learning remains a research gap. The absence of pairing in multi-view data disrupts the consistency and complementarity of multiple views, posing significant challenges in learning powerful and meaningful anchors and bipartite graphs from unpaired multi-view data. To tackle this challenge, this study proposes a novel Fast Unpaired Multi-view Clustering (FUMC) framework for fully unpaired large-scale multi-view data. Specifically, FUMC first designs an inverse local manifold learning paradigm to guide the learned anchors toward effective pairing and balancing, ensuring alignment, fairness, and power in unpaired multi-view data. Meanwhile, a novel bipartite graph matching framework is developed to align unpaired bipartite graphs, creating a consistent bipartite graph from unpaired multi-view data. The efficacy, efficiency, and superiority of our FUMC are corroborated through extensive evaluations on numerous benchmark datasets against shallow and deep SOTA methods.

NeurIPS Conference 2024 Conference Paper

Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting

  • Jinliang Deng
  • Feiyang Ye
  • Du Yin
  • Xuan Song
  • Ivor Tsang
  • Hui Xiong

Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis, characterized by extensive input sequences, as opposed to the shorter spans typical of traditional approaches. While longer sequences inherently offer richer information for enhanced predictive precision, prevailing studies often respond by escalating model complexity. These intricate models can inflate into millions of parameters, resulting in prohibitive parameter scales. Our study demonstrates, through both theoretical and empirical evidence, that decomposition is key to containing excessive model inflation while achieving uniformly superior and robust results across various datasets. Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks, using over 99\% fewer parameters than the majority of competing methods. Through this work, we aim to unleash the power of a restricted set of parameters by capitalizing on domain characteristics—a timely reminder that in the realm of LTSF, bigger is not invariably better. The code is available at \url{https://anonymous.4open.science/r/SSCNN-321D/}.
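
As a flavor of why decomposition keeps parameter counts tiny, the numpy sketch below splits a series into a moving-average trend and a seasonal residual and fits one small least-squares linear map per component (a DLinear-style baseline under assumed window and horizon sizes, not the paper's model).

    import numpy as np

    def decompose(x, window=25):
        # moving-average trend plus the seasonal residual
        pad = window // 2
        xp = np.pad(x, (pad, pad), mode="edge")
        trend = np.convolve(xp, np.ones(window) / window, mode="valid")
        return trend, x - trend

    def fit_linear(history, horizon, series):
        # least-squares map from `history` past points to `horizon` future ones
        n = len(series) - history - horizon
        X = np.stack([series[i:i + history] for i in range(n)])
        Y = np.stack([series[i + history:i + history + horizon] for i in range(n)])
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W                                  # only history * horizon parameters

    t = np.arange(2000)
    series = np.sin(t / 20) + 0.001 * t
    trend, seasonal = decompose(series)
    W_t = fit_linear(96, 24, trend)               # one tiny model per component
    W_s = fit_linear(96, 24, seasonal)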

NeurIPS Conference 2024 Conference Paper

Sharpness-Aware Minimization Activates the Interactive Teaching's Understanding and Optimization

  • Mingwei Xu
  • Xiaofeng Cao
  • Ivor Tsang

Teaching is a potentially effective approach for understanding interactions among multiple intelligences. Previous explorations have convincingly shown that teaching presents additional opportunities for observation and demonstration within the learning model, such as data distillation and selection. However, the underlying optimization principles and convergence of interactive teaching lack theoretical analysis, and in this regard co-teaching serves as a notable prototype. In this paper, we discuss its role as a reduction of the larger loss landscape derived from Sharpness-Aware Minimization (SAM). Then, we classify it as an iterative parameter estimation process using Expectation-Maximization. The convergence of this typical interactive teaching is achieved by continuously optimizing a variational lower bound on the log marginal likelihood. This lower bound represents the expected value of the log posterior distribution of the latent variables under a scaled, factorized variational distribution. To further enhance interactive teaching's performance, we incorporate SAM's strong generalization information into interactive teaching, referred to as Sharpness Reduction Interactive Teaching (SRIT). This integration can be viewed as a novel sequential optimization process. Finally, we validate the performance of our approach through multiple experiments.

TMLR Journal 2023 Journal Article

Contrastive Attraction and Contrastive Repulsion for Representation Learning

  • Huangjie Zheng
  • Xu Chen
  • Jiangchao Yao
  • Hongxia Yang
  • Chunyuan Li
  • Ya Zhang
  • Hao Zhang
  • Ivor Tsang

Contrastive learning (CL) methods effectively learn data representations in a self-supervised manner, where the encoder contrasts each positive sample over multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet. However, most of them consider augmented views from the same instance to be positive pairs, and views from other instances to be negative ones. Such a binary partition insufficiently considers the relation between samples and tends to yield worse performance when generalized on images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on various datasets, we propose a doubly CL strategy that contrasts positive samples and negative ones within themselves separately. We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also exert a greater force to repel closer negative samples. Theoretical analysis reveals that CACR generalizes CL's behavior by positive attraction and negative repulsion. It further considers the intra-contrastive relation within the positive and negative pairs to narrow the gap between the sampled and true distributions, which is important when datasets are less curated. Extensive large-scale experiments on standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark datasets, but also shows better robustness when generalized to imbalanced image datasets.

NeurIPS Conference 2023 Conference Paper

Nonparametric Teaching for Multiple Learners

  • Chen Zhang
  • Xiaofeng Cao
  • Weiyang Liu
  • Ivor Tsang
  • James Kwok

We study the problem of teaching multiple learners simultaneously in the nonparametric iterative teaching setting, where the teacher iteratively provides examples to the learner for accelerating the acquisition of a target concept. This problem is motivated by the gap between current single-learner teaching setting and the real-world scenario of human instruction where a teacher typically imparts knowledge to multiple students. Under the new problem formulation, we introduce a novel framework -- Multi-learner Nonparametric Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model. To achieve this, we frame the problem as teaching a vector-valued target model and extend the target model space from a scalar-valued reproducing kernel Hilbert space used in single-learner scenarios to a vector-valued space. Furthermore, we demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching, particularly when the multiple learners can communicate with each other. Lastly, we conduct extensive experiments to validate the practicality and efficiency of MINT.

AAAI Conference 2022 Conference Paper

Multi-View Clustering on Topological Manifold

  • Shudong Huang
  • Ivor Tsang
  • Zenglin Xu
  • Jiancheng Lv
  • Quan-Hui Liu

Multi-view clustering has received a lot of attention in data mining recently. Though plenty of works have investigated this topic, it remains a severe challenge due to the complex nature of the multiple heterogeneous features. In particular, existing multi-view clustering algorithms fail to consider the topological structure in the data, which is essential for clustering data on a manifold. In this paper, we propose to exploit the implied data manifold by learning the topological relationship between data points. Our method coalesces multiple view-wise graphs with the topological relevance considered, and learns the weights as well as the consensus graph interactively in a unified framework. Furthermore, we manipulate the consensus graph with a connectivity constraint such that the data points from the same cluster are precisely connected into the same component. Substantial experiments on benchmark datasets are conducted to validate the effectiveness of the proposed method compared to state-of-the-art algorithms in clustering performance.

NeurIPS Conference 2022 Conference Paper

Multi-view Subspace Clustering on Topological Manifold

  • Shudong Huang
  • Hongjie Wu
  • Yazhou Ren
  • Ivor Tsang
  • Zenglin Xu
  • Wentao Feng
  • Jiancheng Lv

Multi-view subspace clustering aims to exploit a common affinity representation by means of self-expression. Plenty of works have been presented to boost the clustering performance, yet they seldom consider the topological structure in data, which is crucial for clustering data on a manifold. Orthogonal to existing works, in this paper, we argue that it is beneficial to explore the implied data manifold by learning the topological relationship between data points. Our model seamlessly integrates multiple affinity graphs into a consensus one with the topological relevance considered. Meanwhile, we manipulate the consensus graph with a connectivity constraint such that the connected components precisely indicate different clusters. Hence our model is able to directly obtain the final clustering result without reliance on any label discretization strategy as previous methods do. Experimental results on several benchmark datasets illustrate the effectiveness of the proposed model compared to state-of-the-art competitors in clustering performance.

IJCAI Conference 2022 Conference Paper

Neural Subgraph Explorer: Reducing Noisy Information via Target-oriented Syntax Graph Pruning

  • Bowen Xing
  • Ivor Tsang

Recent years have witnessed the emerging success of leveraging syntax graphs for the target sentiment classification task. However, we discover that existing syntax-based models suffer from two issues: noisy information aggregation and loss of distant correlations. In this paper, we propose a novel model termed Neural Subgraph Explorer, which (1) reduces noisy information by pruning target-irrelevant nodes on the syntax graph, and (2) introduces beneficial first-order connections between the target and its related words into the obtained graph. Specifically, we design a multi-hop action score estimator to evaluate the value of each word regarding the specific target. The discrete action sequence is sampled via Gumbel-Softmax and then applied to both the syntax graph and the self-attention graph. To introduce first-order connections between the target and its relevant words, the two pruned graphs are merged. Finally, graph convolution is conducted on the obtained unified graph to update the hidden states, and this process is stacked over multiple layers. To our knowledge, this is the first attempt at target-oriented syntax graph pruning for this task. Experimental results demonstrate the superiority of our model, which achieves new state-of-the-art performance.

TMLR Journal 2022 Journal Article

Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning

  • Baijiong Lin
  • Feiyang Ye
  • Yu Zhang
  • Ivor Tsang

Multi-Task Learning (MTL) has achieved success in various fields. However, training with equal weights for all tasks may cause unsatisfactory performance for some of the tasks. To address this problem, many works carefully design dynamic loss/gradient weighting strategies, but the basic random baselines needed to examine their effectiveness are ignored. In this paper, we propose the Random Weighting (RW) methods, including Random Loss Weighting (RLW) and Random Gradient Weighting (RGW), where an MTL model is trained with random loss/gradient weights sampled from a distribution. To show the effectiveness and necessity of RW methods, we theoretically analyze the convergence of RW and reveal that RW has a higher probability of escaping local minima, resulting in better generalization ability. Empirically, we extensively evaluate the proposed RW methods against twelve state-of-the-art methods on five image datasets and two multilingual problems from the XTREME benchmark, showing that RW methods can achieve performance comparable to state-of-the-art baselines. We therefore regard the RW methods as important baselines for MTL that deserve more attention.
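
For concreteness, Random Loss Weighting reduces to a few lines: sample weights from a distribution at each step and combine the task losses. A minimal PyTorch sketch, assuming the normal-then-softmax sampling variant (other distributions plug in the same way):

    import torch

    def rlw_total_loss(task_losses):
        # task_losses: list of scalar tensors, one per task
        w = torch.softmax(torch.randn(len(task_losses)), dim=0)  # random simplex weights
        return sum(wi * li for wi, li in zip(w, task_losses))

    # inside a training step:
    #   loss = rlw_total_loss([loss_a, loss_b, loss_c])
    #   loss.backward(); optimizer.step()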

NeurIPS Conference 2020 Conference Paper

Graph Cross Networks with Vertex Infomax Pooling

  • Maosen Li
  • Siheng Chen
  • Ya Zhang
  • Ivor Tsang

We propose a novel graph cross network (GXN) to achieve comprehensive feature learning from multiple scales of a graph. Based on trainable hierarchical representations of a graph, GXN enables the interchange of intermediate features across scales to promote information flow. Two key ingredients of GXN are a novel vertex infomax pooling (VIPool), which creates multiscale graphs in a trainable manner, and a novel feature-crossing layer, enabling feature interchange across scales. The proposed VIPool selects the most informative subset of vertices based on the neural estimation of mutual information between vertex features and neighborhood features. The intuition is that a vertex is informative when it can maximally reflect its neighboring information. The proposed feature-crossing layer fuses intermediate features between two scales for mutual enhancement by improving information flow and enriching multiscale features at hidden layers. The cross shape of the feature-crossing layer distinguishes GXN from many other multiscale architectures. Experimental results show that the proposed GXN improves classification accuracy by 2.12% and 1.15% on average for graph classification and vertex classification, respectively. Based on the same network, the proposed VIPool consistently outperforms other graph-pooling methods.

NeurIPS Conference 2020 Conference Paper

Subgroup-based Rank-1 Lattice Quasi-Monte Carlo

  • Yueming LYU
  • Yuan Yuan
  • Ivor Tsang

Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science. In the QMC area, the rank-1 lattice is important due to its simple operation and nice properties for point set construction. However, constructing the generating vector of a rank-1 lattice is usually time-consuming, requiring an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove lower and upper bounds on the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our method generates near-optimal rank-1 lattices compared with the Korobov exhaustive search in terms of the $l_1$-norm and $l_2$-norm minimum distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.
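
For reference, a rank-1 lattice is just the fractional parts of integer multiples of a generating vector. The numpy sketch below uses a Korobov-form vector and checks the minimum pairwise toroidal distance; the paper's closed-form subgroup construction of the generating vector is not reproduced here, and the parameter a is an illustrative choice.

    import numpy as np

    def rank1_lattice(n, d, a):
        # n points in [0,1)^d from generating vector z = (1, a, a^2, ...) mod n
        z = np.array([pow(a, k, n) for k in range(d)])
        i = np.arange(n).reshape(-1, 1)
        return (i * z % n) / n                    # frac(i * z / n)

    def min_toroidal_distance(points):
        # by the group structure of the lattice, distances from x_0 = 0 suffice
        diff = points[1:]
        diff = np.minimum(diff, 1.0 - diff)       # wrap-around (toroidal) metric
        return np.sqrt((diff ** 2).sum(axis=1)).min()

    X = rank1_lattice(n=101, d=4, a=12)
    print(X.shape, min_toroidal_distance(X))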

NeurIPS Conference 2018 Conference Paper

Co-teaching: Robust training of deep neural networks with extremely noisy labels

  • Bo Han
  • Quanming Yao
  • Xingrui Yu
  • Gang Niu
  • Miao Xu
  • Weihua Hu
  • Ivor Tsang
  • Masashi Sugiyama

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they first memorize training data with clean labels and then those with noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, the two networks communicate with each other about what data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to state-of-the-art methods in the robustness of trained deep models.
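
The three-step mini-batch procedure translates almost line-for-line into code. A minimal PyTorch sketch, assuming two networks f and g with their optimizers and a known forget (noise) rate; illustrative, not the released implementation:

    import torch
    import torch.nn.functional as F

    def co_teaching_step(f, g, opt_f, opt_g, x, y, forget_rate):
        n_keep = int((1.0 - forget_rate) * len(y))
        # 1) each network feeds forward all data and ranks it by loss
        loss_f = F.cross_entropy(f(x), y, reduction="none")
        loss_g = F.cross_entropy(g(x), y, reduction="none")
        # 2) each selects its small-loss (likely clean) subset for its peer
        idx_for_g = torch.argsort(loss_f)[:n_keep]
        idx_for_f = torch.argsort(loss_g)[:n_keep]
        # 3) each network updates on the samples chosen by the other
        opt_f.zero_grad()
        F.cross_entropy(f(x[idx_for_f]), y[idx_for_f]).backward()
        opt_f.step()
        opt_g.zero_grad()
        F.cross_entropy(g(x[idx_for_g]), y[idx_for_g]).backward()
        opt_g.step()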

AAAI Conference 2018 Conference Paper

Compact Multi-Label Learning

  • Xiaobo Shen
  • Weiwei Liu
  • Ivor Tsang
  • Quan-Sen Sun
  • Yew-Soon Ong

Embedding methods have shown promising performance in multi-label prediction, as they can discover the dependency of labels. Most embedding methods cannot well align the input and output, which leads to degradation in prediction performance. Besides, they suffer from expensive prediction computational costs when applied to large-scale datasets. To address these issues, this paper proposes a Co-Hashing (CoH) method by formulating multi-label learning from the perspective of cross-view learning. CoH first regards the input and output as two views, and then aims to learn a common latent Hamming space, where input and output pairs are compressed into compact binary embeddings. CoH enjoys two key benefits: 1) the input and output can be well aligned, and their correlations are explored; 2) prediction is very efficient using fast cross-view kNN search in the Hamming space. Moreover, we provide the generalization error bound for our method. Extensive experiments on eight real-world datasets demonstrate the superiority of the proposed CoH over state-of-the-art methods in terms of both prediction accuracy and efficiency.

AAAI Conference 2018 Conference Paper

Doubly Approximate Nearest Neighbor Classification

  • Weiwei Liu
  • Zhuanghua Liu
  • Ivor Tsang
  • Wenjie Zhang
  • Xuemin Lin

Nonparametric classification models, such as K-Nearest Neighbor (KNN), have become particularly powerful tools in machine learning and data mining, due to their simplicity and flexibility. However, the testing time of the KNN classifier becomes unacceptable, and KNN's performance deteriorates significantly, when applied to data sets with millions of dimensions. We observe that state-of-the-art approximate nearest neighbor (ANN) methods aim either to reduce the number of distance comparisons based on a tree structure or to decrease the cost of distance computation by dimension reduction methods. In this paper, we propose a doubly approximate nearest neighbor classification strategy, which marries the two branches: compressing the dimensions to decrease distance computation cost, and reducing the number of distance comparisons instead of performing a full scan. Under this strategy, we build a compressed dimensional tree (CD-Tree) to avoid unnecessary distance calculations. In each decision node, we propose a novel feature selection paradigm by optimizing the feature selection vector as well as the separator (indicator variables for splitting instances) with the maximum margin. An efficient algorithm is then developed to find the globally optimal solution with a convergence guarantee. Furthermore, we also provide a data-dependent generalization error bound for our model, which reveals a new insight for the design of ANN classification algorithms. Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, while running up to three orders of magnitude faster than state-of-the-art libraries in very high dimensions.

NeurIPS Conference 2018 Conference Paper

Masking: A New Perspective of Noisy Supervision

  • Bo Han
  • Jiangchao Yao
  • Gang Niu
  • Mingyuan Zhou
  • Ivor Tsang
  • Ya Zhang
  • Masashi Sugiyama

It is important to learn various types of classifiers given training data with noisy labels. Noisy labels, in the most popular noise model hitherto, are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, by estimating this matrix, classifiers can escape from overfitting those noisy labels. However, such estimation is practically difficult, due to either the indirect nature of two-step approaches or insufficient data to afford end-to-end approaches. In this paper, we propose a human-assisted approach called ''Masking'' that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges of structure extraction and structure alignment. Thanks to Masking, we only estimate unmasked noise transition probabilities, and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures, as well as the industrial-level Clothing1M with an agnostic noise structure, and the results show that Masking can significantly improve the robustness of classifiers.
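
One way to picture the masking idea: parameterize the noise transition matrix so that human-invalidated transitions get exactly zero probability, leaving only the unmasked entries to estimate. A minimal PyTorch sketch with an assumed mask shape; this is illustrative, not the paper's structure-aware probabilistic model:

    import torch

    def masked_transition(logits, mask):
        # logits: (C, C) free parameters; mask: (C, C) 0/1 valid transitions
        neg_inf = torch.finfo(logits.dtype).min
        masked = logits.masked_fill(mask == 0, neg_inf)
        return torch.softmax(masked, dim=1)       # rows: p(noisy | clean)

    C = 4
    mask = torch.eye(C) + torch.diag(torch.ones(C - 1), 1)  # e.g. self + next class only
    T = masked_transition(torch.zeros(C, C, requires_grad=True), mask)
    # hypothetical use in a forward-correction loss:
    #   loss = F.nll_loss(torch.log(p_clean @ T), noisy_labels)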

AAAI Conference 2018 Conference Paper

SC2Net: Sparse LSTMs for Sparse Coding

  • Joey Tianyi Zhou
  • Kai Di
  • Jiawei Du
  • Xi Peng
  • Hao Yang
  • Sinno Jialin Pan
  • Ivor Tsang
  • Yong Liu

The iterative shrinkage-thresholding algorithm (ISTA) is one of the most popular optimization solvers for obtaining sparse codes. However, ISTA suffers from the following problems: 1) ISTA employs a non-adaptive updating strategy, learning the parameters on each dimension with a fixed learning rate; such a strategy may lead to inferior performance due to the lack of diversity; 2) ISTA does not incorporate historical information into its updating rules, yet historical information has been proven helpful for speeding up convergence. To address these challenging issues, we propose a novel formulation of ISTA (named adaptive ISTA) by introducing a novel adaptive momentum vector. To efficiently solve the proposed adaptive ISTA, we recast it as a recurrent neural network unit and show its connection with the well-known long short-term memory (LSTM) model. With the newly proposed unit, we present a neural network (termed SC2Net) to achieve sparse codes in an end-to-end manner. To the best of our knowledge, this is one of the first works to bridge the $\ell_1$-solver and LSTM, and it may provide novel insights into understanding model-based optimization and LSTM. Extensive experiments show the effectiveness of our method on both unsupervised and supervised tasks.
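
A minimal numpy sketch of the starting point: ISTA's soft-thresholded gradient step augmented with a momentum term. The paper's learned, per-dimension adaptive momentum (and the LSTM recasting) is replaced here by a fixed scalar beta for illustration.

    import numpy as np

    def soft_threshold(v, t):
        # the shrinkage operator at the heart of ISTA
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista_momentum(A, x, lam, steps=200, beta=0.5):
        L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
        z = np.zeros(A.shape[1])
        m = np.zeros_like(z)
        for _ in range(steps):
            grad = A.T @ (A @ z - x)
            m = beta * m + grad             # momentum accumulates history
            z = soft_threshold(z - m / L, lam / L)
        return z

    A = np.random.randn(50, 100)
    z_true = np.zeros(100); z_true[:5] = 1.0
    z_hat = ista_momentum(A, A @ z_true, lam=0.1)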

AAAI Conference 2017 Conference Paper

Approximate Conditional Gradient Descent on Multi-Class Classification

  • Zhuanghua Liu
  • Ivor Tsang

Conditional gradient descent, aka the Frank-Wolfe algorithm, has regained popularity in recent years. The key advantage of Frank-Wolfe is that at each step the expensive projection is replaced with a much more efficient linear optimization step. Similar to gradient descent, the loss function of Frank-Wolfe scales with the data size, so training on big data poses a challenge. Recently, stochastic Frank-Wolfe methods have been proposed to address this, but they do not perform well in practice. In this work, we study the problem of approximating the Frank-Wolfe algorithm on the large-scale multi-class classification problem, which is a typical application of the Frank-Wolfe algorithm. We present a simple but effective method that employs the internal structure of the data to approximate Frank-Wolfe on the large-scale multi-class classification problem. Empirical results verify that our method outperforms state-of-the-art stochastic projection-free methods.

AAAI Conference 2017 Conference Paper

Compressed K-Means for Large-Scale Clustering

  • Xiaobo Shen
  • Weiwei Liu
  • Ivor Tsang
  • Fumin Shen
  • Quan-Sen Sun

Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
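
To see why binary codes make clustering cheap, here is a numpy sketch of k-means under the Hamming metric, where centroids are per-bit majority votes. The joint learning of codes and clusters in CKM is omitted; this only illustrates the fast-clustering half.

    import numpy as np

    def hamming_kmeans(B, k, iters=30, seed=0):
        # B: (n, bits) array of 0/1 codes
        rng = np.random.default_rng(seed)
        C = B[rng.choice(len(B), k, replace=False)]
        for _ in range(iters):
            dist = (B[:, None, :] != C[None]).sum(-1)   # Hamming distances (n, k)
            a = dist.argmin(1)
            # majority vote per bit gives the Hamming-metric centroid
            C = np.stack([(B[a == c].mean(0) > 0.5).astype(B.dtype)
                          if (a == c).any() else C[c] for c in range(k)])
        return a, C

    codes = (np.random.rand(1000, 64) > 0.5).astype(np.uint8)
    labels, _ = hamming_kmeans(codes, k=8)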

AAAI Conference 2017 Conference Paper

Latent Smooth Skeleton Embedding

  • Li Wang
  • Qi Mao
  • Ivor Tsang

Learning a smooth skeleton in a low-dimensional space from noisy data has become important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embeddings given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets in comparison with existing methods.

NeurIPS Conference 2017 Conference Paper

Sparse Embedded $k$-Means Clustering

  • Weiwei Liu
  • Xiaobo Shen
  • Ivor Tsang

The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n, d\}\,\epsilon^2 \log(d)/k$ for a data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ for matrix multiplication and this cost will be prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(nnz(X))$ ($nnz(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance.
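
The $\mathcal{O}(nnz(X))$ cost is characteristic of a CountSketch-style embedding: each input feature contributes to exactly one output coordinate with a random sign, so projecting touches each nonzero once. A minimal numpy/scikit-learn sketch under assumed sizes; the paper's exact embedding and accuracy analysis are not reproduced here.

    import numpy as np
    from sklearn.cluster import KMeans

    def sparse_embed(X, m, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        h = rng.integers(0, m, size=d)        # target coordinate per feature
        s = rng.choice([-1.0, 1.0], size=d)   # random sign per feature
        Y = np.zeros((X.shape[0], m))
        for j in range(d):                    # one signed update per feature column
            Y[:, h[j]] += s[j] * X[:, j]
        return Y

    X = np.random.randn(500, 1000)
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(sparse_embed(X, m=64))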

AAAI Conference 2016 Conference Paper

Optimizing Multivariate Performance Measures from Multi-View Data

  • Jim Jing-Yan Wang
  • Ivor Tsang
  • Xin Gao

To date, many machine learning applications have multiple views of features, and different applications require specific multivariate performance measures, such as the F-score for retrieval. However, existing multivariate performance measure optimization methods are limited to single-view data, while traditional multi-view learning methods cannot optimize multivariate performance measures directly. To fill this gap, in this paper, we propose the problem of optimizing multivariate performance measures from multi-view data, and an effective method to solve it. We propose to learn linear discriminant functions for different views, and combine them to construct an overall multivariate mapping function for multi-view data. To learn the parameters of the linear discriminant functions of different views to optimize a given multivariate performance measure, we formulate an optimization problem. In this problem, we propose to minimize the complexity of the linear discriminant function of each view, promote the consistency of the responses of different views over the same data points, and minimize an upper bound of the corresponding loss of a given multivariate performance measure. To optimize this problem, we develop an iterative cutting-plane algorithm. Experiments on four benchmark data sets show that it not only outperforms traditional single-view based multivariate performance optimization methods, but also achieves better results than ordinary multi-view learning methods.

AAAI Conference 2016 Conference Paper

Robust Semi-Supervised Learning through Label Aggregation

  • Yan Yan
  • Zhongwen Xu
  • Ivor Tsang
  • Guodong Long
  • Yi Yang

Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real world applications increases significantly, conventional semi-supervised algorithms usually incur massive computational cost and cannot be applied to large-scale datasets. In addition, label noise is usually present in practical applications due to human annotation, which often results in a marked degradation of performance in semi-supervised methods. To address these two challenges, in this paper, we propose an efficient RObust Semi-Supervised Ensemble Learning (ROSSEL) method, which generates pseudo-labels for unlabeled data using a set of weak annotators, and combines them to approximate the ground-truth labels to assist semi-supervised learning. We formulate the weighted combination process as a multiple label kernel learning (MLKL) problem, which can be solved efficiently. Compared with other semi-supervised learning algorithms, the proposed method has linear time complexity. Extensive experiments on five benchmark datasets demonstrate the superior effectiveness, efficiency and robustness of the proposed algorithm.

AAAI Conference 2016 Conference Paper

Sparse Perceptron Decision Tree for Millions of Dimensions

  • Weiwei Liu
  • Ivor Tsang

Owing to their nonlinear but highly interpretable representations, decision tree (DT) models have attracted significant attention from researchers. However, DT models usually suffer from the curse of dimensionality and achieve degenerated performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of a sparse perceptron decision node (SPDN) with a budget constraint on the weight coefficients, and propose a sparse perceptron decision tree (SPDT) algorithm to achieve nonlinear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SPDT, we present a pruning strategy by learning classifiers to minimize cross-validation errors on each SPDN. Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate, decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features.

AAAI Conference 2016 Conference Paper

Transfer Learning for Cross-Language Text Categorization through Active Correspondences Construction

  • Joey Zhou
  • Sinno Pan
  • Ivor Tsang
  • Shen-Shyang Ho

Most existing heterogeneous transfer learning (HTL) methods for cross-language text classification rely on sufficient cross-domain instance correspondences to learn a mapping across heterogeneous feature spaces, and assume that such correspondences are given in advance. However, in practice, correspondences between domains are usually unknown. In this case, extensive manual effort is required to establish accurate correspondences across multilingual documents based on their content and meta-information. In this paper, we present a general framework that integrates active learning to construct correspondences between heterogeneous domains for HTL, namely HTL through active correspondences construction (HTLA). Based on this framework, we develop a new HTL method. On top of the new HTL method, we further propose a strategy to actively construct correspondences between domains. Extensive experiments are conducted on various multilingual text classification tasks to verify the effectiveness of HTLA.

AAAI Conference 2015 Conference Paper

Effectively Predicting Whether and When a Topic Will Become Prevalent in a Social Network

  • Weiwei Liu
  • Zhi-Hong Deng
  • Xiuwen Gong
  • Frank Jiang
  • Ivor Tsang

Effective forecasting of future prevalent topics plays an important role in social network business development. It involves two challenging aspects: predicting whether a topic will become prevalent, and when. This cannot be directly handled by existing algorithms in topic modeling, item recommendation and action forecasting. The classic forecasting framework based on time series models may be able to predict a hot topic when its user-discussion frequency undergoes a series of systematic, periodic changes. However, the frequency of topics discussed by users often changes irregularly in social networks. In this paper, a generic probabilistic framework is proposed for hot topic prediction, and machine learning methods are explored to predict hot topic patterns. Two effective models, PreWHether and PreWHen, are introduced to predict whether and when a topic will become prevalent. In the PreWHether model, we model features constructed from previously observed frequency changes for better prediction. In the PreWHen model, distributions of time intervals associated with the emergence to prevalence of a topic are modeled. Extensive experiments on real datasets demonstrate that our method outperforms the baselines and generates more effective predictions.

AAAI Conference 2015 Conference Paper

Large Margin Metric Learning for Multi-Label Prediction

  • Weiwei Liu
  • Ivor Tsang

Canonical correlation analysis (CCA) and maximum margin output coding (MMOC) methods have shown promising results for multi-label prediction, where each instance is associated with multiple labels. However, these methods require an expensive decoding procedure to recover the multiple labels of each testing instance. The testing complexity becomes unacceptable when there are many labels. To avoid decoding completely, we present a novel large margin metric learning paradigm for multi-label prediction. In particular, the proposed method learns a distance metric to discover label dependency such that instances with very different multiple labels will be moved far away. To handle many labels, we present an accelerated proximal gradient procedure to speed up the learning process. Comprehensive experiments demonstrate that our proposed method is significantly faster than CCA and MMOC in terms of both training and testing complexities. Moreover, our method achieves superior prediction performance compared with state-of-the-art methods.

NeurIPS Conference 2015 Conference Paper

On the Optimality of Classifier Chain for Multi-label Classification

  • Weiwei Liu
  • Ivor Tsang

To capture the interdependencies between labels in multi-label classification problems, classifier chain (CC) tries to take the multiple labels of each instance into account under a deterministic high-order Markov Chain model. Since its performance is sensitive to the choice of label order, the key issue is how to determine the optimal label order for CC. In this work, we first generalize the CC model over a random label order. Then, we present a theoretical analysis of the generalization error for the proposed generalized model. Based on our results, we propose a dynamic programming based classifier chain (CC-DP) algorithm to search the globally optimal label order for CC and a greedy classifier chain (CC-Greedy) algorithm to find a locally optimal CC. Comprehensive experiments on a number of real-world multi-label data sets from various domains demonstrate that our proposed CC-DP algorithm outperforms state-of-the-art approaches and the CC-Greedy algorithm achieves comparable prediction performance with CC-DP.
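
For context, the vanilla classifier chain whose ordering CC-DP and CC-Greedy optimize looks like the scikit-learn sketch below: each link sees the original features plus the labels earlier in the order (true labels at training time, predicted ones at test time). The DP/greedy search over orders itself is omitted.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_chain(X, Y, order):
        # Y: (n, L) binary label matrix; order: permutation of label indices
        chain, Z = [], X
        for j in order:
            clf = LogisticRegression(max_iter=1000).fit(Z, Y[:, j])
            chain.append(clf)
            Z = np.hstack([Z, Y[:, [j]]])        # feed true label to later links
        return chain

    def predict_chain(chain, X, order):
        Z, out = X, np.zeros((len(X), len(order)), dtype=int)
        for clf, j in zip(chain, order):
            out[:, j] = clf.predict(Z)
            Z = np.hstack([Z, out[:, [j]]])      # feed predicted label downstream
        return out

    # usage: chain = fit_chain(X, Y, order=list(range(Y.shape[1])))
    #        Y_hat = predict_chain(chain, X_test, order=list(range(Y.shape[1])))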

IJCAI Conference 2015 Conference Paper

Scalable Maximum Margin Matrix Factorization by Active Riemannian Subspace Search

  • Yan Yan
  • Mingkui Tan
  • Ivor Tsang
  • Yi Yang
  • Chengqi Zhang
  • Qinfeng Shi

The user ratings in recommendation systems are usually in the form of ordinal discrete values. To give more accurate prediction of such rating data, maximum margin matrix factorization (M3F) was proposed. Existing M3F algorithms, however, either have massive computational cost or require expensive model selection procedures to determine the number of latent factors (i.e., the rank of the matrix to be recovered), making them less practical for large scale data sets. To address these two challenges, in this paper, we formulate M3F with a known number of latent factors as the Riemannian optimization problem on a fixed-rank matrix manifold and present a block-wise nonlinear Riemannian conjugate gradient method to solve it efficiently. We then apply a simple and efficient active subspace search scheme to automatically detect the number of latent factors. Empirical studies on both synthetic data sets and large real-world data sets demonstrate the superior efficiency and effectiveness of the proposed method.

AAAI Conference 2014 Conference Paper

Hybrid Heterogeneous Transfer Learning through Deep Learning

  • Joey Zhou
  • Sinno Pan
  • Ivor Tsang
  • Yan Yan

Most previous heterogeneous transfer learning methods learn a cross-domain feature mapping between heterogeneous feature spaces based on a few cross-domain instance correspondences, and these corresponding instances are assumed to be representative of the source and target domains respectively. However, in many real-world scenarios, this assumption may not hold. As a result, the constructed feature mapping may not be precise due to the bias of the correspondences in the target and/or source domains. In this case, a classifier trained on the labeled transformed source-domain data may not be useful for the target domain. In this paper, we present a new transfer learning framework called Hybrid Heterogeneous Transfer Learning (HHTL), which allows the corresponding instances across domains to be biased in either the source or target domain. Specifically, we propose a deep learning approach to learn a feature mapping between cross-domain heterogeneous features as well as a better feature representation for the mapped data to reduce the bias issue caused by the cross-domain correspondences. Extensive experiments on several multilingual sentiment classification tasks verify the effectiveness of our proposed approach compared with baseline methods.

AAAI Conference 2012 Conference Paper

Convex Matching Pursuit for Large-Scale Sparse Coding and Subset Selection

  • Mingkui Tan
  • Ivor Tsang
  • Li Wang
  • Xinming Zhang

In this paper, a new convex matching pursuit scheme is proposed for tackling large-scale sparse coding and subset selection problems. In contrast with current matching pursuit algorithms such as subspace pursuit (SP), the proposed algorithm has a convex formulation and guarantees that the objective value can be monotonically decreased. Moreover, theoretical analysis and experimental results show that the proposed method achieves better scalability while maintaining similar or better decoding ability compared with state-of-the-art methods on large-scale problems.

NeurIPS Conference 2006 Conference Paper

Large-Scale Sparsified Manifold Regularization

  • Ivor Tsang
  • James Kwok

Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM), which has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating the task as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC, while the LapSVM can only handle several thousand patterns.