Arrow Research search

Author name cluster

Irwin King

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

81 papers
2 author rows

Possible papers (81)

AAAI Conference 2026 Conference Paper

ConSurv: Multimodal Continual Learning for Survival Analysis

  • Dianzhi Yu
  • Conghao Xiong
  • Yankai Chen
  • Wenqian Cui
  • Xinni Zhang
  • Yifei Zhang
  • Hao Chen
  • Joseph J. Y. Sung

Survival prediction of cancers is crucial for clinical practice, as it informs mortality risks and influences treatment plans. However, a static model trained on a single dataset fails to adapt to the dynamically evolving clinical environment and continuous data streams, limiting its practical utility. While continual learning (CL) offers a solution to learn dynamically from new datasets, existing CL methods primarily focus on unimodal inputs and suffer from severe catastrophic forgetting in survival prediction. In real-world scenarios, multimodal inputs often provide comprehensive and complementary information, such as whole slide images and genomics; and neglecting inter-modal correlations negatively impacts the performance. To address the two challenges of catastrophic forgetting and complex inter-modal interactions between gigapixel whole slide images and genomics, we propose ConSurv, the first multimodal continual learning (MMCL) method for survival analysis. ConSurv incorporates two key components: Multi-staged Mixture of Experts (MS-MoE) and Feature Constrained Replay (FCR). MS-MoE captures both task-shared and task-specific knowledge at different learning stages of the network, including two modality encoders and the modality fusion component, learning inter-modal relationships. FCR further enhances learned knowledge and mitigates forgetting by restricting feature deviation of previous data at different levels, including encoder-level features of two modalities and the fusion-level representations. Additionally, we introduce a new benchmark integrating four datasets, Multimodal Survival Analysis Incremental Learning (MSAIL), for comprehensive evaluation in the CL setting. Extensive experiments demonstrate that ConSurv outperforms competing methods across multiple metrics.
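The Feature Constrained Replay idea above, restricting how far features of replayed old-task samples may drift, can be sketched minimally. The function and data names below are illustrative, not the paper's implementation; applying the same term at encoder level and fusion level just means summing it over several feature maps.

```python
import numpy as np

# Minimal sketch of a feature-constrained replay penalty: when training on a
# new task, penalize how far the current model's features on replayed samples
# deviate from the features cached from the previous-task model.

def fcr_penalty(feats_now, feats_prev):
    # Mean squared deviation between current and stored (frozen) features.
    return ((feats_now - feats_prev) ** 2).mean()

stored = np.ones((4, 8))             # features cached before the new task
drifted = stored + 0.5               # current model has drifted uniformly
loss = fcr_penalty(drifted, stored)
```

No drift gives a zero penalty, so the term only activates when new-task updates move the representation of old data.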

AAAI Conference 2025 Conference Paper

Context-aware Inductive Knowledge Graph Completion with Latent Type Constraints and Subgraph Reasoning

  • Muzhi Li
  • Cehao Yang
  • Chengjin Xu
  • Zixing Song
  • Xuhui Jiang
  • Jian Guo
  • Ho-fung Leung
  • Irwin King

Inductive knowledge graph completion (KGC) aims to predict missing triples with unseen entities. Recent works focus on modeling reasoning paths between the head and tail entity as direct supporting evidence. However, these methods depend heavily on the existence and quality of reasoning paths, which limits their general applicability in different scenarios. In addition, we observe that latent type constraints and neighboring facts inherent in KGs are also vital in inferring missing triples. To effectively utilize all useful information in KGs, we introduce CATS, a novel context-aware inductive KGC solution. With sufficient guidance from proper prompts and supervised fine-tuning, CATS activates the strong semantic understanding and reasoning capabilities of large language models to assess the existence of query triples. CATS consists of two modules: first, the type-aware reasoning module evaluates whether the candidate entity matches the latent entity type required by the query relation; then, the subgraph reasoning module selects relevant reasoning paths and neighboring facts and evaluates their correlation to the query triple. Experiment results on three widely used datasets demonstrate that CATS significantly outperforms state-of-the-art methods in 16 out of 18 transductive, inductive, and few-shot settings, with an average absolute MRR improvement of 7.2%.

NeurIPS Conference 2025 Conference Paper

Hyperbolic Fine-Tuning for Large Language Models

  • Menglin Yang
  • Ram B
  • Aosong Feng
  • Bo Xiong
  • Jiahong Liu
  • Irwin King
  • Rex Ying

Large language models (LLMs) have demonstrated remarkable performance across various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for LLMs. In this study, we investigate the geometric characteristics of LLMs, focusing specifically on tokens and their embeddings. Our findings reveal that token frequency follows a power-law distribution, where high-frequency tokens (e.g., the, that) constitute the minority, while low-frequency tokens (e.g., apple, dog) constitute the majority. Furthermore, high-frequency tokens cluster near the origin, whereas low-frequency tokens are positioned farther away in the embedding space. Additionally, token embeddings exhibit hyperbolic characteristics, indicating a latent tree-like structure within the embedding space. Motivated by these observations, we propose HypLoRA, an efficient fine-tuning approach that operates in hyperbolic space to exploit these underlying hierarchical structures better. HypLoRA performs low-rank adaptation directly in hyperbolic space, thereby preserving hyperbolic modeling capabilities throughout the fine-tuning process. Extensive experiments across various base models and reasoning benchmarks, specifically arithmetic and commonsense reasoning tasks, demonstrate that HypLoRA substantially improves LLM performance.
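As a rough illustration of low-rank adaptation carried out in hyperbolic rather than Euclidean space, the sketch below applies a rank-2 update in the tangent space at the origin of a Poincaré ball. The exp/log maps are standard; the composition is an assumption for illustration, not the paper's actual formulation.

```python
import numpy as np

# Hedged sketch of hyperbolic low-rank adaptation: apply the rank-2 update
# in the tangent space at the origin of a Poincare ball, then map the
# result onto the ball with the exponential map.

def expmap0(v, c=1.0):
    # Exponential map at the origin of the Poincare ball with curvature -c.
    norm = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=1e-9)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def logmap0(x, c=1.0):
    # Logarithmic map at the origin (inverse of expmap0).
    norm = np.linalg.norm(x, axis=-1, keepdims=True).clip(min=1e-9)
    return np.arctanh(np.clip(np.sqrt(c) * norm, 0.0, 1 - 1e-9)) * x / (np.sqrt(c) * norm)

def hyp_lora(h, A, B, c=1.0):
    # Low-rank tangent-space update, then exp map onto the ball.
    return expmap0(h + (h @ A) @ B, c)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 16)) * 0.1    # small Euclidean activations
A = rng.normal(size=(16, 2)) * 0.01   # rank-2 adapter factors
B = rng.normal(size=(2, 16)) * 0.01
out = hyp_lora(h, A, B)
```

The output always lies strictly inside the unit ball, which is what "preserving hyperbolic modeling capabilities" requires of each adapted layer.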

ICML Conference 2025 Conference Paper

NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

  • Qichao Wang
  • Ziqiao Meng
  • Wenqian Cui
  • Yifei Zhang
  • Pengcheng Wu
  • Bingzhe Wu
  • Irwin King
  • Liang Chen 0001

Inspired by the impressive capabilities of GPT-4o, there is growing interest in enabling speech language models (SLMs) to engage in natural, fluid spoken interactions with humans. Recent advancements have led to the development of several SLMs that demonstrate promising results in this area. However, current approaches have yet to fully exploit dual-channel speech data, which inherently captures the structure and dynamics of human conversation. In this work, we systematically explore the use of dual-channel speech data in the context of modern large language models, and introduce a novel generative modeling paradigm—Next-Token-Pair Prediction (NTPP)—to enable speaker-independent dual-channel spoken dialogue learning using decoder-only architectures for the first time. We evaluate our approach on standard benchmarks, and empirical results show that our proposed method, NTPP, significantly improves the conversational abilities of SLMs in terms of turn-taking prediction, response coherence, and naturalness. Moreover, compared to existing methods, NTPP achieves substantially lower inference latency, highlighting its practical efficiency for real-time applications. Demo and code can be found at https://audio-3059.pages.dev.
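The token-pair layout that "next-token-pair prediction" implies can be illustrated with plain lists: at each step the decoder handles one aligned (channel A, channel B) token pair, so both speakers' streams advance in lockstep. The token values and padding scheme below are made up for the sketch.

```python
# Illustrative sketch of a next-token-pair data layout for dual-channel
# dialogue: align the two channels into pairs, then build (context pair,
# next pair) prediction examples. Not the paper's actual tokenizer.

def to_token_pairs(chan_a, chan_b, pad=0):
    # Zip two channel token streams into aligned pairs, padding the shorter.
    n = max(len(chan_a), len(chan_b))
    a = chan_a + [pad] * (n - len(chan_a))
    b = chan_b + [pad] * (n - len(chan_b))
    return list(zip(a, b))

def pair_lm_examples(pairs):
    # (context pair, next pair) examples: at step t, predict the pair at t+1.
    return [(pairs[t], pairs[t + 1]) for t in range(len(pairs) - 1)]

pairs = to_token_pairs([5, 6, 7], [9, 9])   # speaker B falls silent
examples = pair_lm_examples(pairs)
```

Because silence is just another padded token in its channel, turn-taking shows up naturally in the pair sequence rather than requiring explicit turn markers.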

AAAI Conference 2024 Conference Paper

A Diffusion-Based Pre-training Framework for Crystal Property Prediction

  • Zixing Song
  • Ziqiao Meng
  • Irwin King

Many significant problems involving crystal property prediction from 3D structures have limited labeled data due to expensive and time-consuming physical simulations or lab experiments. To overcome this challenge, we propose a pretrain-finetune framework for the crystal property prediction task named CrysDiff based on diffusion models. In the pre-training phase, CrysDiff learns the latent marginal distribution of crystal structures via the reconstruction task. Subsequently, CrysDiff can be fine-tuned under the guidance of the new sparse labeled data, fitting the conditional distribution of the target property given the crystal structures. To better model the crystal geometry, CrysDiff notably captures the full symmetric properties of the crystals, including the invariance of reflection, rotation, and periodic translation. Extensive experiments demonstrate that CrysDiff can significantly improve the performance of the downstream crystal property prediction task on multiple target properties, outperforming all the SOTA pre-training models for crystals by good margins on the popular JARVIS-DFT dataset.

TIST Journal 2024 Journal Article

A Survey of Trustworthy Federated Learning: Issues, Solutions, and Challenges

  • Yifei Zhang
  • Dun Zeng
  • Jinglong Luo
  • Xinyu Fu
  • Guanzhong Chen
  • Zenglin Xu
  • Irwin King

Trustworthy artificial intelligence (TAI) has proven invaluable in curbing potential negative repercussions tied to AI applications. Within the TAI spectrum, federated learning (FL) emerges as a promising solution to safeguard personal information in distributed settings across a multitude of practical contexts. However, the realm of FL is not without its challenges. Especially worrisome are adversarial attacks targeting its algorithmic robustness and systemic confidentiality. Moreover, the presence of biases and opacity in prediction outcomes further complicates FL’s broader adoption. Consequently, there is a growing expectation for FL to instill trust. To address this, we chart out a comprehensive road-map for Trustworthy Federated Learning (TFL) and provide an overview of existing efforts across four pivotal dimensions: Privacy and Security, Robustness, Fairness, and Explainability. For each dimension, we identify potential pitfalls that might undermine TFL and present a curated selection of defensive strategies, enriched by a discourse on technical solutions tailored for TFL. Furthermore, we present potential challenges and future directions to be explored for in-depth TFL research with broader impacts.

IJCAI Conference 2024 Conference Paper

A Systematic Survey on Federated Semi-supervised Learning

  • Zixing Song
  • Xiangli Yang
  • Yifei Zhang
  • Xinyu Fu
  • Zenglin Xu
  • Irwin King

Federated learning (FL) revolutionizes distributed machine learning by enabling devices to collaboratively learn a model while maintaining data privacy. However, FL usually faces a critical challenge with limited labeled data, making semi-supervised learning (SSL) crucial for utilizing abundant unlabeled data. The integration of SSL within the federated framework gives rise to federated semi-supervised learning (FSSL), a novel approach that exploits unlabeled data across devices without compromising privacy. This paper systematically explores FSSL, shedding light on its four basic problem settings that commonly appear in real-world scenarios. By examining the unique challenges, generic solutions, and representative methods tailored for each setting of FSSL, we aim to provide a cohesive overview of the current state of the art and pave the way for future research directions in this promising field.

ICLR Conference 2024 Conference Paper

An Unforgeable Publicly Verifiable Watermark for Large Language Models

  • Aiwei Liu
  • Leyi Pan
  • Xuming Hu
  • Shuang Li 0015
  • Lijie Wen 0001
  • Irwin King
  • Philip S. Yu

Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which makes the detection network achieve a high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Our code is available at https://github.com/THU-BPM/unforgeable_watermark

AAAI Conference 2024 Conference Paper

Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

  • Yankai Chen
  • Yixiang Fang
  • Qiongyan Wang
  • Xin Cao
  • Irwin King

The classic problem of node importance estimation has been conventionally studied with homogeneous network topology analysis. To deal with practical network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern is that their fully adaptive learning process may lead to insufficient information exploration, reducing the problem to isolated node value prediction with underperformance and limited interpretability. In this work, we propose a novel learning framework named SKES. Different from previous automatic learning designs, SKES exploits heterogeneous structural knowledge to enrich the informativeness of node representations. Then, based on a sufficiently uninformative reference, SKES estimates the importance value for any input node by quantifying its informativeness disparity against the reference. This establishes an interpretable node importance computation paradigm. Furthermore, SKES builds on the understanding that "nodes with similar characteristics are prone to have similar importance values" and guarantees that the informativeness disparity between any two different nodes is consistently reflected by the embedding distance of their associated latent features. Extensive experiments on three widely-evaluated benchmarks demonstrate the performance superiority of SKES over several recent competing methods.

AAAI Conference 2024 Conference Paper

HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval

  • Zexuan Qiu
  • Jiahong Liu
  • Yankai Chen
  • Irwin King

Existing unsupervised deep product quantization methods primarily aim to increase the similarity between different views of the same image, whereas the delicate multi-level semantic similarities preserved between images are overlooked. Moreover, these methods predominantly focus on the Euclidean space for computational convenience, compromising their ability to map the multi-level semantic relationships between images effectively. To mitigate these shortcomings, we propose a novel unsupervised product quantization method dubbed Hierarchical Hyperbolic Product Quantization (HiHPQ), which learns quantized representations by incorporating hierarchical semantic similarity within hyperbolic geometry. Specifically, we propose a hyperbolic product quantizer, where the hyperbolic codebook attention mechanism and the quantized contrastive learning on the hyperbolic product manifold are introduced to expedite quantization. Furthermore, we propose a hierarchical semantics learning module, designed to enhance the distinction between similar and non-matching images for a query by utilizing the extracted hierarchical semantics as an additional training supervision. Experiments on benchmark image datasets show that our proposed method outperforms state-of-the-art baselines.

AAAI Conference 2024 Conference Paper

Influential Exemplar Replay for Incremental Learning in Recommender Systems

  • Xinni Zhang
  • Yankai Chen
  • Chenhao Ma
  • Yixiang Fang
  • Irwin King

Personalized recommender systems have found widespread applications for effective information filtering. Conventional models engage in knowledge mining within the static setting to reconstruct singular historical data. Nonetheless, the dynamics of real-world environments are in a constant state of flux, rendering acquired model knowledge inadequate for accommodating emergent trends and thus leading to notable recommendation performance decline. Given the typically prohibitive cost of exhaustive model retraining, incremental learning for recommender systems with ever-growing data has emerged as an active research direction. In this paper, we propose an effective model-agnostic framework, namely INFluential Exemplar Replay (INFER). INFER facilitates recommender models in retaining the earlier assimilated knowledge, e.g., users' enduring preferences, while concurrently accommodating evolving trends manifested in users' new interaction behaviors. We commence with a vanilla implementation that centers on identifying the most representative data samples for effective consolidation of early knowledge. Subsequently, we propose an advanced solution, namely INFERONCE, to optimize the computational overhead associated with the vanilla implementation. Extensive experiments on four prototypical backbone models, two classic recommendation tasks, and four widely used benchmarks consistently demonstrate the effectiveness of our method as well as its compatibility with several incremental recommender models.

NeurIPS Conference 2024 Conference Paper

On the Necessity of Collaboration for Online Model Selection with Decentralized Data

  • Junfan Li
  • Zheshun Wu
  • Zenglin Xu
  • Irwin King

We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients. Previous work proposed various federated algorithms without demonstrating their necessity, while we answer the question from a novel perspective of computational constraints. We prove lower bounds on the regret, and propose a federated algorithm and analyze the upper bound. Our results show (i) collaboration is unnecessary in the absence of computational constraints on clients; (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces. We clarify the unnecessary nature of collaboration in previous federated algorithms for distributed online multi-kernel learning, and improve the regret bounds at a smaller computational and communication cost. Our algorithm relies on three new techniques including an improved Bernstein's inequality for martingales, a federated online mirror descent framework, and decoupling model selection and prediction, which might be of independent interest.

IJCAI Conference 2024 Conference Paper

Towards Geometric Normalization Techniques in SE(3) Equivariant Graph Neural Networks for Physical Dynamics Simulations

  • Ziqiao Meng
  • Liang Zeng
  • Zixing Song
  • Tingyang Xu
  • Peilin Zhao
  • Irwin King

SE(3) equivariance is a fundamental property that is highly desirable to maintain in physical dynamics modeling. This property ensures that neural outputs remain consistent when the inputs are translated or rotated. Recently, there have been several proposals for SE(3) equivariant graph neural networks (GNNs) that have shown promising results in simulating particle dynamics. However, existing works have neglected an important issue: current SE(3) equivariant GNNs cannot scale to large particle systems. Although some simple normalization techniques are already in use to stabilize the training dynamics of equivariant graph networks, they actually break the SE(3) equivariance of the architectures. In this work, we first show the numerical instability of training equivariant GNNs on large particle systems and then analyze some existing normalization strategies adopted in modern works. We propose a new normalization layer called GeoNorm, which can satisfy the SE(3) equivariance and simultaneously stabilize the training process. We conduct comprehensive experiments on N-body system simulation tasks with larger particle system sizes. The experimental results demonstrate that GeoNorm successfully preserves the SE(3) equivariance compared to baseline techniques and stabilizes the training dynamics of SE(3) equivariant GNNs on large systems.
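The design constraint the abstract describes can be demonstrated numerically: a normalization layer keeps rotation equivariance if it scales vector features by a rotation-invariant statistic (here, the RMS of their norms) instead of per-coordinate mean/variance. The sketch below checks that such a layer commutes with rotations; it illustrates the constraint, not GeoNorm's actual formulation.

```python
import numpy as np

# Equivariance-preserving normalization sketch: divide 3D vector features by
# a rotation-invariant scale, so rotating before or after the layer gives
# the same result. Per-coordinate batch-norm-style statistics would break this.

def geo_normalize(vectors, eps=1e-8):
    # vectors: (n_particles, 3) translation-free vector features.
    scale = np.sqrt((vectors ** 2).sum(axis=-1).mean()) + eps
    return vectors / scale

def random_rotation(rng):
    # QR of a Gaussian matrix gives an orthogonal matrix, enough to check
    # O(3) equivariance numerically.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3)) * 7.0    # large-magnitude features
R = random_rotation(rng)
lhs = geo_normalize(x @ R.T)          # rotate, then normalize
rhs = geo_normalize(x) @ R.T          # normalize, then rotate
```

Because the scale depends only on vector norms, which any orthogonal transform preserves, the two orderings agree to floating-point precision, while the output magnitude is brought to unit RMS for training stability.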

IJCAI Conference 2023 Conference Paper

A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges

  • Ziqiao Meng
  • Peilin Zhao
  • Yang Yu
  • Irwin King

Reaction and retrosynthesis prediction are two fundamental tasks in computational chemistry. In recent years, these two tasks have attracted great attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these two problems and have achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based reaction and retrosynthesis prediction models. We first summarize the design mechanisms, strengths, and weaknesses of the state-of-the-art approaches. Then we further discuss the limitations of current solutions and open challenges in the problem itself. Last but not least, we present some promising directions to facilitate future research. To the best of our knowledge, this paper is the first comprehensive and systematic survey offering a unified understanding of reaction and retrosynthesis prediction.

IJCAI Conference 2023 Conference Paper

Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification

  • Conghao Xiong
  • Hao Chen
  • Joseph J. Y. Sung
  • Irwin King

Multiple Instance Learning (MIL) and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification. However, unlike human pathologists who selectively observe specific regions of histopathology tissues under different magnifications, most methods do not incorporate multiple resolutions of the WSIs, hierarchically and attentively, thereby leading to a loss of focus on the WSIs and information from other resolutions. To resolve this issue, we propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs. This framework can dynamically and attentively discover the discriminative regions across multiple resolutions of the WSIs. Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer and obtain a more holistic WSI (bag) representation. This transformer consists of multiple Integrated Attention Modules, each of which combines a transformer layer with an aggregation module that produces a bag representation based on every instance representation in that bag. The experimental results show that our method achieved state-of-the-art performance on multiple datasets, including Camelyon16, TCGA-RCC, TCGA-NSCLC, and an in-house IMGC dataset. The code is available at https://github.com/BearCleverProud/HAG-MIL.

IJCAI Conference 2023 Conference Paper

Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction

  • Ziqiao Meng
  • Peilin Zhao
  • Yang Yu
  • Irwin King

Organic reaction prediction is a critical task in drug discovery. Recently, researchers have achieved non-autoregressive reaction prediction by modeling the redistribution of electrons, resulting in state-of-the-art top-1 accuracy, and enabling parallel sampling. However, the current non-autoregressive decoder does not satisfy two essential rules of electron redistribution modeling simultaneously: the electron-counting rule and the symmetry rule. This violation of the physical constraints of chemical reactions impairs model performance. In this work, we propose a new framework called ReactionSink that combines two doubly stochastic self-attention mappings to obtain electron redistribution predictions that follow both constraints. We further extend our solution to a general multi-head attention mechanism with augmented constraints. To achieve this, we apply Sinkhorn's algorithm to iteratively update self-attention mappings, which imposes doubly conservative constraints as additional informative priors on electron redistribution modeling. We theoretically demonstrate that our ReactionSink can simultaneously satisfy both rules, which the current decoder mechanism cannot do. Empirical results show that our approach consistently improves the predictive performance of non-autoregressive models and does not incur prohibitive additional computational cost.
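The doubly stochastic constraint can be imposed on a raw attention score matrix with Sinkhorn's algorithm: alternately normalize rows and columns until both sum to one. This is the generic textbook construction, shown here as a sketch rather than ReactionSink's exact decoder mechanism.

```python
import numpy as np

# Sinkhorn iterations in log space: alternating row and column
# normalizations drive exp(log_p) toward a doubly stochastic matrix
# (rows and columns each summing to 1).

def sinkhorn(logits, n_iters=50):
    log_p = logits.astype(float).copy()
    for _ in range(n_iters):
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))  # rows
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))  # cols
    return np.exp(log_p)

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 5))   # raw self-attention logits
attn = sinkhorn(scores)
```

Each iteration is differentiable, which is why the same loop can sit inside an attention layer and be trained end to end.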

IJCAI Conference 2023 Conference Paper

FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks

  • Xinyu Fu
  • Irwin King

Heterogeneous graph neural networks (HGNNs) can learn from typed and relational graph data more effectively than conventional GNNs. With larger parameter spaces, HGNNs may require more training data, which is often scarce in real-world applications due to privacy regulations (e.g., GDPR). Federated graph learning (FGL) enables multiple clients to train a GNN collaboratively without sharing their local data. However, existing FGL methods mainly focus on homogeneous GNNs or knowledge graph embeddings; few have considered heterogeneous graphs and HGNNs. In federated heterogeneous graph learning, clients may have private graph schemas. Conventional FL/FGL methods attempting to define a global HGNN model would violate schema privacy. To address these challenges, we propose FedHGN, a novel and general FGL framework for HGNNs. FedHGN adopts schema-weight decoupling to enable schema-agnostic knowledge sharing and employs coefficients alignment to stabilize the training process and improve HGNN performance. With better privacy preservation, FedHGN consistently outperforms local training and conventional FL methods on three widely adopted heterogeneous graph datasets with varying client numbers. The code is available at https://github.com/cynricfu/FedHGN.

AAAI Conference 2023 Conference Paper

Graph Component Contrastive Learning for Concept Relatedness Estimation

  • Yueen Ma
  • Zixing Song
  • Xuming Hu
  • Jingjing Li
  • Yifei Zhang
  • Irwin King

Concept relatedness estimation (CRE) aims to determine whether two given concepts are related. Existing methods only consider the pairwise relationship between concepts, while overlooking the higher-order relationship that could be encoded in a concept-level graph structure. We discover that this underlying graph satisfies a set of intrinsic properties of CRE, including reflexivity, commutativity, and transitivity. In this paper, we formalize the CRE properties and introduce a graph structure named ConcreteGraph. To address the data scarcity issue in CRE, we introduce a novel data augmentation approach to sample new concept pairs from the graph. As it is intractable for data augmentation to fully capture the structural information of the ConcreteGraph due to the large number of potential concept pairs, we further introduce a novel Graph Component Contrastive Learning framework to implicitly learn the complete structure of the ConcreteGraph. Empirical results on three datasets show significant improvement over the state-of-the-art model. Detailed ablation studies demonstrate that our proposed approach can effectively capture the high-order relationship among concepts.

ICML Conference 2023 Conference Paper

Hyperbolic Representation Learning: Revisiting and Advancing

  • Menglin Yang 0001
  • Min Zhou 0006
  • Rex Ying
  • Yankai Chen 0001
  • Irwin King

The non-Euclidean geometry of hyperbolic spaces has recently garnered considerable attention in the realm of representation learning. Current endeavors in hyperbolic representation largely presuppose that the underlying hierarchies can be automatically inferred and preserved through the adaptive optimization process. This assumption, however, is questionable and requires further validation. In this work, we first introduce a position-tracking mechanism to scrutinize existing prevalent hyperbolic models, revealing that the learned representations are sub-optimal and unsatisfactory. To address this, we propose a simple yet effective method, hyperbolic informed embedding (HIE), by incorporating cost-free hierarchical information deduced from the hyperbolic distance of the node to the origin (i.e., induced hyperbolic norm) to advance existing hyperbolic models. The proposed method HIE is both task-agnostic and model-agnostic, enabling its seamless integration with a broad spectrum of models and tasks. Extensive experiments across various models and different tasks demonstrate the versatility and adaptability of the proposed method. Notably, our method achieves an improvement of up to 21.4% compared to the competing baselines.
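The "induced hyperbolic norm" mentioned above is simply the hyperbolic distance from a point to the origin of the model space. On the Poincaré ball, points near the origin (root-like nodes) get a small norm while points near the boundary (leaf-like nodes) get a large one, which is the cost-free hierarchy signal HIE exploits. The sketch below uses the standard closed form; curvature handling is simplified.

```python
import numpy as np

# Induced hyperbolic norm on the Poincare ball:
#   d(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||)
# which grows slowly near the origin and blows up near the boundary.

def hyperbolic_norm(x, c=1.0):
    eu = np.linalg.norm(x, axis=-1)
    return (2.0 / np.sqrt(c)) * np.arctanh(np.clip(np.sqrt(c) * eu, 0.0, 1 - 1e-9))

shallow = hyperbolic_norm(np.array([0.01, 0.0]))   # near the origin
deep = hyperbolic_norm(np.array([0.95, 0.0]))      # near the boundary
```

The value is computed from the embedding alone, no labels or extra supervision, which is why the abstract can call the hierarchical information "cost-free".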

NeurIPS Conference 2023 Conference Paper

Mitigating the Popularity Bias of Graph Collaborative Filtering: A Dimensional Collapse Perspective

  • Yifei Zhang
  • Hao Zhu
  • Yankai Chen
  • Zixing Song
  • Piotr Koniusz
  • Irwin King

Graph-based Collaborative Filtering (GCF) is widely used in personalized recommendation systems. However, GCF suffers from a fundamental problem where features tend to occupy the embedding space inefficiently (by spanning only a low-dimensional subspace). Such an effect is characterized in GCF by the embedding space being dominated by a few popular items, with the user embeddings highly concentrated around them. This enhances the so-called Matthew effect of the popularity bias, where popular items are highly recommended whereas the remaining items are ignored. In this paper, we analyze the above effect in GCF and reveal that the simplified graph convolution operation (typically used in GCF) shrinks the singular space of the feature matrix. As typical approaches (i.e., optimizing the uniformity term) fail to prevent the embedding space degradation, we propose a decorrelation-enhanced GCF objective that promotes feature diversity by leveraging the so-called principle of redundancy reduction in embeddings. However, unlike conventional methods that use the Euclidean geometry to relax hard constraints for decorrelation, we exploit non-Euclidean geometry. Such a choice helps maintain the range space of the matrix and obtain a small condition number, which prevents the embedding space degradation. Our method outperforms contrastive-based GCF models on several benchmark datasets and improves the performance for unpopular items.
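The redundancy-reduction principle the abstract invokes can be illustrated with its textbook Euclidean form: standardize the embedding features, then penalize off-diagonal entries of the feature correlation matrix so embeddings spread over many dimensions instead of collapsing. This is a generic sketch, not the paper's non-Euclidean objective.

```python
import numpy as np

# Decorrelation penalty: a collapsed (rank-1) embedding matrix has a
# correlation matrix full of ones and is punished heavily; a full-rank
# embedding matrix with decorrelated features scores near zero.

def decorrelation_loss(z):
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)  # standardize features
    corr = (z.T @ z) / z.shape[0]                      # feature correlations
    off_diag = corr - np.diag(np.diag(corr))
    return (off_diag ** 2).sum()                       # punish redundancy

rng = np.random.default_rng(0)
collapsed = np.tile(rng.normal(size=(128, 1)), (1, 8))  # rank-1 embeddings
diverse = rng.normal(size=(128, 8))                     # full-rank embeddings
```

In the dimensional-collapse reading of popularity bias, minimizing such a term counteracts the shrinking singular space that the simplified graph convolution induces.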

ECAI Conference 2023 Conference Paper

Neural Architecture Search for Explainable Networks

  • Yaoman Li
  • Irwin King

One of the main challenges in machine learning is providing understandable explanations for complex models. Despite outperforming humans in many tasks, machine learning models are often treated as black boxes that are difficult to interpret. Post-hoc explanation methods have been developed to create interpretable surrogate models that explain the behavior of black-box models. However, these methods have been shown to perpetuate bad practices and lack stability. Recently, inherent explainable approaches have been proposed to provide built-in explainability to models. However, most of these methods sacrifice performance. This paper proposes the Neural Architecture Search for Explainable Networks (NASXNet) approach to address the trade-off between performance and interpretability. Our method applies architecture search to generate high-performance and explainable neural networks for image classification tasks. We conduct experiments on four datasets: CUB-200-2011, Stanford Cars, CIFAR 10, and CIFAR 100. The results demonstrate that our models provide a high-level interpretation of prediction results, achieving state-of-the-art performance that is on par with non-explainable models. This paper contributes by solving the trade-off problem between performance and interpretability. It is the first to apply neural architecture search to develop explainable deep learning models, generating state-of-the-art explainable models that outperform existing approaches. Additionally, a new training process is proposed that enables faster convergence during model training.

NeurIPS Conference 2023 Conference Paper

No Change, No Gain: Empowering Graph Neural Networks with Expected Model Change Maximization for Active Learning

  • Zixing Song
  • Yifei Zhang
  • Irwin King

Graph Neural Networks (GNNs) are crucial for machine learning applications with graph-structured data, but their success depends on sufficient labeled data. We present a novel active learning (AL) method for GNNs, extending the Expected Model Change Maximization (EMCM) principle to improve prediction performance on unlabeled data. By presenting a Bayesian interpretation for the node embeddings generated by GNNs under the semi-supervised setting, we efficiently compute the closed-form EMCM acquisition function as the selection criterion for AL without re-training. Our method establishes a direct connection with expected prediction error minimization, offering theoretical guarantees for AL performance. Experiments demonstrate our method's effectiveness compared to existing approaches, in terms of both accuracy and efficiency.
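
The EMCM selection principle can be sketched for a plain linear model; this is an illustrative stand-in, not the paper's closed-form acquisition for GNN embeddings (`emcm_scores`, `select_batch`, and the pseudo-label source are assumptions):

```python
import numpy as np

def emcm_scores(X, w, pseudo_y):
    """Expected-model-change scores for a linear regressor: labeling x
    would trigger the SGD update (w.x - y) * x, so each unlabeled point is
    scored by that gradient's norm. `pseudo_y` are surrogate labels, e.g.
    from a bootstrap ensemble (assumed here)."""
    residual = np.abs(X @ w - pseudo_y)
    return residual * np.linalg.norm(X, axis=1)   # ||(w.x - y) * x||

def select_batch(X, w, pseudo_y, k):
    """Pick the k points whose labels would change the model the most."""
    return np.argsort(-emcm_scores(X, w, pseudo_y))[:k]
```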

NeurIPS Conference 2023 Conference Paper

Optimal Block-wise Asymmetric Graph Construction for Graph-based Semi-supervised Learning

  • Zixing Song
  • Yifei Zhang
  • Irwin King

Graph-based semi-supervised learning (GSSL) serves as a powerful tool to model the underlying manifold structures of samples in high-dimensional spaces. It involves two phases: constructing an affinity graph from available data and inferring labels for unlabeled nodes on this graph. While numerous algorithms have been developed for label inference, the crucial graph construction phase has received comparatively less attention, despite its significant influence on the subsequent phase. In this paper, we present an optimal asymmetric graph structure for the label inference phase with theoretical motivations. Unlike existing graph construction methods, we differentiate the distinct roles that labeled nodes and unlabeled nodes could play. Accordingly, we design an efficient block-wise graph learning algorithm with a global convergence guarantee. Other benefits induced by our method, such as enhanced robustness to noisy node features, are explored as well. Finally, we perform extensive experiments on synthetic and real-world datasets to demonstrate its superiority to the state-of-the-art graph construction methods in GSSL.

NeurIPS Conference 2023 Conference Paper

Predicting Global Label Relationship Matrix for Graph Neural Networks under Heterophily

  • Langzhang Liang
  • Xiangjing Hu
  • Zenglin Xu
  • Zixing Song
  • Irwin King

Graph Neural Networks (GNNs) have been shown to achieve remarkable performance on node classification tasks by exploiting both graph structures and node features. The majority of existing GNNs rely on the implicit homophily assumption. Recent studies have demonstrated that GNNs may struggle to model heterophilous graphs where nodes with different labels are more likely connected. To address this issue, we propose a generic GNN applicable to both homophilous and heterophilous graphs, namely Low-Rank Graph Neural Network (LRGNN). Our analysis demonstrates that a signed graph's global label relationship matrix has a low rank. This insight inspires us to predict the label relationship matrix by solving a robust low-rank matrix approximation problem, as prior research has proven that low-rank approximation could achieve perfect recovery under certain conditions. The experimental results reveal that the solution bears a strong resemblance to the label relationship matrix, presenting two advantages for graph modeling: a block diagonal structure and varying distributions of within-class and between-class entries.

AAAI Conference 2023 Conference Paper

Spectral Feature Augmentation for Graph Contrastive Learning and Beyond

  • Yifei Zhang
  • Hao Zhu
  • Zixing Song
  • Piotr Koniusz
  • Irwin King

Although augmentations (e.g., perturbation of graph edges, image crops) boost the efficiency of Contrastive Learning (CL), feature-level augmentation is another plausible, complementary, yet not well researched strategy. Thus, we present a novel spectral feature augmentation for contrastive learning on graphs (and images). To this end, for each data view, we estimate a low-rank approximation per feature map and subtract that approximation from the map to obtain its complement. This is achieved by the herein proposed incomplete power iteration, a non-standard power iteration regime which enjoys two valuable byproducts (with merely one or two iterations): (i) it partially balances the spectrum of the feature map, and (ii) it injects noise into the rebalanced singular values of the feature map (spectral augmentation). For two views, we align these rebalanced feature maps, as such an improved alignment step can focus more on the less dominant singular values of the matrices of both views, whereas the spectral augmentation does not affect the spectral angle alignment (singular vectors are not perturbed). We derive the analytical form for: (i) the incomplete power iteration, to capture its spectrum-balancing effect, and (ii) the variance of singular values augmented implicitly by the noise. We also show that the spectral augmentation improves the generalization bound. Experiments on graph/image datasets show that our spectral feature augmentation outperforms baselines, is complementary with other augmentation strategies, and is compatible with various contrastive losses.
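
The incomplete power iteration described here can be sketched as follows; this simplified version (names assumed) only removes an under-converged rank-1 estimate, which is the source of both the spectrum rebalancing and the implicit noise:

```python
import numpy as np

def spectral_feature_augment(F, n_iter=2, rng=None):
    """Sketch of incomplete power iteration: estimate the leading rank-1
    component of a feature map F (n x d) using only one or two iterations,
    then subtract it. Truncating the iteration flattens the dominant
    singular value and leaves estimation noise in the spectrum."""
    rng = np.random.default_rng(rng)
    v = rng.standard_normal(F.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):            # deliberately NOT run to convergence
        v = F.T @ (F @ v)
        v /= np.linalg.norm(v)
    u = F @ v                          # scaled approximate top singular direction
    return F - np.outer(u, v)          # complement of the rank-1 approximation
```

On an exactly rank-1 input the iteration converges immediately, so the augmented map is (numerically) zero; on realistic feature maps the truncation leaves the noisy residual that acts as the augmentation.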

AAAI Conference 2022 Conference Paper

Hierarchical Heterogeneous Graph Attention Network for Syntax-Aware Summarization

  • Zixing Song
  • Irwin King

The task of summarization often requires a non-trivial understanding of the given text at the semantic level. In this work, we incorporate the constituent structure into single document summarization via Graph Neural Networks to learn the semantic meaning of tokens. More specifically, we propose a novel hierarchical heterogeneous graph attention network over constituency-based parse trees for syntax-aware summarization. This approach reflects psychological findings that humans pinpoint specific selection patterns to construct summaries hierarchically. Extensive experiments demonstrate that our model is effective for both abstractive and extractive summarization tasks on five benchmark datasets from various domains. Moreover, further performance improvement can be obtained by virtue of state-of-the-art pre-trained models.

AAAI Conference 2022 Conference Paper

Text Revision By On-the-Fly Representation Optimization

  • Jingjing Li
  • Zichao Li
  • Tao Ge
  • Irwin King
  • Michael R. Lyu

Text revision refers to a family of natural language generation tasks, where the source and target sequences share moderate resemblance in surface form but differ in attributes, such as text formality and simplicity. Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems, which rely on large-scale parallel training corpora. In this paper, we present an iterative in-place editing approach for text revision, which requires no parallel data. In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification. During inference, the editing at each iteration is realized by two-step span replacement. At the first step, the distributed representation of the text is optimized on the fly towards an attribute function. At the second step, a text span is masked and a new one is proposed conditioned on the optimized representation. Empirical experiments on two typical and important text revision tasks, text formalization and text simplification, show the effectiveness of our approach. It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification, and gains better performance than strong unsupervised methods on text formalization. Our code and model are released at https://github.com/jingjingli01/OREO.

NeurIPS Conference 2022 Conference Paper

Towards Efficient Post-training Quantization of Pre-trained Language Models

  • Haoli Bai
  • Lu Hou
  • Lifeng Shang
  • Xin Jiang
  • Irwin King
  • Michael R Lyu

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT) that requires end-to-end training with full access to the entire dataset. Therefore, they suffer from slow training, large memory overhead, and data accessibility issues. In this paper, we study post-training quantization (PTQ) of PLMs, and propose module-wise quantization error minimization (MREM), an efficient solution to mitigate these issues. By partitioning the PLM into multiple modules, we minimize the reconstruction error incurred by quantization for each module. In addition, we design a new model parallel training strategy such that each module can be trained locally on separate computing devices without waiting for preceding modules, which brings nearly the theoretical training speed-up (e.g., $4\times$ on $4$ GPUs). Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.

IJCAI Conference 2020 Conference Paper

Efficient Community Search over Large Directed Graph: An Augmented Index-based Approach

  • Yankai Chen
  • Jie Zhang
  • Yixiang Fang
  • Xin Cao
  • Irwin King

Given a graph G and a query vertex q, the topic of community search (CS), aiming to retrieve a dense subgraph of G containing q, has gained much attention. Most existing works focus on undirected graphs, overlooking the rich information carried by the edge directions. Recently, the problem of community search over directed graphs (or CSD problem) has been studied [Fang et al., 2019b]; it finds a connected subgraph containing q, where the in-degree and out-degree of each vertex within the subgraph are at least k and l, respectively. However, existing solutions are inefficient, especially on large graphs. To tackle this issue, in this paper we propose a novel index called D-Forest, which allows a CSD query to be completed within the optimal time cost. We further propose efficient index construction methods. Extensive experiments on six real large graphs show that our index-based query algorithm is up to two orders of magnitude faster than existing solutions.

AAAI Conference 2020 Conference Paper

Few Shot Network Compression via Cross Distillation

  • Haoli Bai
  • Jiaxiang Wu
  • Irwin King
  • Michael Lyu

Model compression has been widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few shot network compression lies in the compressed network's high estimation errors relative to the original network during inference, since the compressed network can easily overfit the few training instances. The estimation errors could propagate and accumulate layer-wise and finally deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving hidden layers of the teacher and student networks, layer-wise accumulated estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation can significantly improve the student network's accuracy when only a few training instances are available.
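
The layer-interweaving idea can be sketched with layers as plain callables; this toy version (names assumed) feeds each student layer the teacher's previous hidden state so errors cannot compound, which captures the gist rather than the paper's full scheme:

```python
import numpy as np

def cross_distillation_loss(teacher_layers, student_layers, x):
    """Toy layer-wise cross distillation: each student layer receives the
    *teacher's* previous hidden state instead of its own, so estimation
    errors cannot accumulate across layers; the loss matches each student
    layer's output to the corresponding teacher output."""
    h_t, loss = x, 0.0
    for f_t, f_s in zip(teacher_layers, student_layers):
        s_out = f_s(h_t)              # student layer run on teacher's input
        h_t = f_t(h_t)                # advance the teacher path
        loss += float(((s_out - h_t) ** 2).mean())
    return loss
```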

AAAI Conference 2020 Conference Paper

Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network

  • Wenxiang Jiao
  • Michael Lyu
  • Irwin King

Real-time emotion recognition (RTER) in conversations is significant for developing emotionally intelligent chatting machines. Without the future context in RTER, it becomes critical to build the memory bank carefully for capturing historical context and summarize the memories appropriately to retrieve relevant information. We propose an Attention Gated Hierarchical Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used convolutional neural networks (CNNs) for utterance feature extraction are less compatible in the memory modules; (2) Unidirectional gated recurrent units (GRUs) only allow each historical utterance to have context before it, preventing information propagation in the opposite direction; (3) The Soft Attention for summarizing loses the positional and ordering information of memories, regardless of how the memory bank is built. Particularly, we propose a Hierarchical Memory Network (HMN) with a bidirectional GRU (BiGRU) as the utterance reader and a BiGRU fusion layer for the interaction between historical utterances. For memory summarizing, we propose an Attention GRU (AGRU) where we utilize the attention weights to update the internal state of GRU. We further promote the AGRU to a bidirectional variant (BiAGRU) to balance the contextual information from recent memories and that from distant memories. We conduct experiments on two emotion conversation datasets with extensive analysis, demonstrating the efficacy of our AGHMN models.

NeurIPS Conference 2020 Conference Paper

Revisiting Parameter Sharing for Automatic Neural Channel Number Search

  • Jiaxing Wang
  • Haoli Bai
  • Jiaxiang Wu
  • Xupeng Shi
  • Junzhou Huang
  • Irwin King
  • Michael Lyu
  • Jian Cheng

Recent advances in neural architecture search inspire many channel number search algorithms~(CNS) for convolutional neural networks. To improve searching efficiency, parameter sharing is widely applied, which reuses parameters among different channel configurations. Nevertheless, it is unclear how parameter sharing affects the searching process. In this paper, we aim at providing a better understanding and exploitation of parameter sharing for CNS. Specifically, we propose affine parameter sharing~(APS) as a general formulation to unify and quantitatively analyze existing channel search algorithms. It is found that with parameter sharing, weight updates of one architecture can simultaneously benefit other candidates. However, it also results in less confidence in choosing good architectures. We thus propose a new strategy of parameter sharing towards a better balance between training efficiency and architecture discrimination. Extensive analysis and experiments demonstrate the superiority of the proposed strategy in channel configuration against many state-of-the-art counterparts on benchmark datasets.

NeurIPS Conference 2020 Conference Paper

Unsupervised Text Generation by Learning from Search

  • Jingjing Li
  • Zichao Li
  • Lili Mou
  • Xin Jiang
  • Michael Lyu
  • Irwin King

In this work, we propose TGLS, a novel framework for unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results while smoothing out the noise of search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, unsupervised paraphrasing and text formalization. Our model significantly outperforms unsupervised baseline methods on both tasks. Notably, it achieves comparable performance to strong supervised methods for paraphrase generation.
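
The search half of the framework can be sketched as generic simulated annealing; in TGLS the objects would be sentences and `score` the heuristic quality objective (all names here are assumptions):

```python
import math
import random

def sa_search(init, propose, score, steps=2000, t0=1.0):
    """Simulated-annealing search over discrete objects: always accept
    uphill edits, accept downhill edits with Boltzmann probability under a
    cooling temperature, and return the best candidate seen. In TGLS the
    accepted outputs would then become training targets for the generator."""
    x = best = init
    for t in range(1, steps + 1):
        y = propose(x)
        delta = score(y) - score(x)
        temp = t0 / t                               # simple cooling schedule
        if delta >= 0 or random.random() < math.exp(delta / temp):
            x = y
        if score(x) > score(best):
            best = x
    return best
```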

AAAI Conference 2019 Conference Paper

DDFlow: Learning Optical Flow with Unlabeled Data Distillation

  • Pengpeng Liu
  • Irwin King
  • Michael R. Lyu
  • Jia Xu

We present DDFlow, a data distillation approach to learning optical flow estimation from unlabeled data. The approach distills reliable predictions from a teacher network, and uses these predictions as annotations to guide a student network to learn optical flow. Unlike existing work relying on handcrafted energy terms to handle occlusion, our approach is data-driven, and learns optical flow for occluded pixels. This enables us to train our model with a much simpler loss function, and achieve a much higher accuracy. We conduct a rigorous evaluation on the challenging Flying Chairs, MPI Sintel, KITTI 2012 and 2015 benchmarks, and show that our approach significantly outperforms all existing unsupervised learning methods, while running at real time.

IJCAI Conference 2019 Conference Paper

Difficulty Controllable Generation of Reading Comprehension Questions

  • Yifan Gao
  • Lidong Bing
  • Wang Chen
  • Michael Lyu
  • Irwin King

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, and meanwhile the generation is under the control of specified difficulty labels---the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU, but also comply with the specified difficulty labels.

AAAI Conference 2019 Conference Paper

Generating Distractors for Reading Comprehension Questions from Real Examinations

  • Yifan Gao
  • Lidong Bing
  • Piji Li
  • Irwin King
  • Michael R. Lyu

We investigate the task of distractor generation for multiple choice reading comprehension questions from examinations. In contrast to all previous works, we do not aim at preparing word or short-phrase distractors; instead, we endeavor to generate longer and semantically rich distractors which are closer to distractors in real reading comprehension examinations. Taking a reading comprehension article, a question, and its correct option as input, our goal is to generate several distractors which are somehow related to the answer, consistent with the semantic context of the question, and have some trace in the article. We propose a hierarchical encoder-decoder framework with static and dynamic attention mechanisms to tackle this task. Specifically, the dynamic attention can combine sentence-level and word-level attention varying at each recurrent time step to generate a more readable sequence. The static attention modulates the dynamic attention not to focus on question-irrelevant sentences or sentences which contribute to the correct option. Our proposed framework outperforms several strong baselines on the first prepared distractor generation dataset of real reading comprehension questions. For human evaluation, compared with those distractors generated by baselines, our generated distractors are more effective at confusing the annotators.

IJCAI Conference 2019 Conference Paper

Parallel Wasserstein Generative Adversarial Nets with Multiple Discriminators

  • Yuxin Su
  • Shenglin Zhao
  • Xixian Chen
  • Irwin King
  • Michael Lyu

Wasserstein Generative Adversarial Nets (GANs) are newly proposed GAN algorithms widely used in computer vision, web mining, information retrieval, etc. However, the existing algorithms with approximated Wasserstein loss converge slowly due to heavy computation cost and usually generate unstable results as well. In this paper, we solve the computation cost problem by speeding up Wasserstein GANs with a well-designed, communication-efficient parallel architecture. Specifically, we develop a new problem formulation targeting the accurate evaluation of the Wasserstein distance and propose an easily parallelized optimization algorithm to train Wasserstein GANs. Compared to traditional parallel architectures, our proposed framework is designed explicitly for the skewed parameter updates between the generator network and the discriminator network. Rigorous experiments reveal that our proposed framework achieves a significant improvement in convergence speed with comparable stability on generating images, compared to state-of-the-art Wasserstein GAN algorithms.

IJCAI Conference 2019 Conference Paper

STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems

  • Jiani Zhang
  • Xingjian Shi
  • Shenglin Zhao
  • Irwin King

We propose a new STAcked and Reconstructed Graph Convolutional Networks (STAR-GCN) architecture to learn node representations for boosting the performance in recommender systems, especially in the cold start scenario. STAR-GCN employs a stack of GCN encoder-decoders combined with intermediate supervision to improve the final prediction performance. Unlike the graph convolutional matrix completion model with one-hot encoding node inputs, our STAR-GCN learns low-dimensional user and item latent factors as the input to restrain the model space complexity. Moreover, our STAR-GCN can produce node embeddings for new nodes by reconstructing masked input node embeddings, which essentially tackles the cold start problem. Furthermore, we discover a label leakage issue when training GCN-based models for link prediction tasks and propose a training strategy to avoid the issue. Empirical results on multiple rating prediction benchmarks demonstrate our model achieves state-of-the-art performance in four out of five real-world datasets and significant improvements in predicting ratings in the cold start scenario. The code implementation is available at https://github.com/jennyzhang0215/STAR-GCN.

AAAI Conference 2019 Conference Paper

Title-Guided Encoding for Keyphrase Generation

  • Wang Chen
  • Yifan Gao
  • Jiani Zhang
  • Irwin King
  • Michael R. Lyu

Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently, several attempts have been made under the generative setting using deep neural networks. However, the state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role of the title in the overall document. To solve this problem, we introduce a new model called Title-Guided Network (TG-Net) for the automatic keyphrase generation task based on the encoder-decoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a title-guided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models by a large margin, especially for documents with either very low or very high title length ratios.

IJCAI Conference 2018 Conference Paper

A Generic Approach for Accelerating Stochastic Zeroth-Order Convex Optimization

  • XiaoTian Yu
  • Irwin King
  • Michael R. Lyu
  • Tianbao Yang

In this paper, we propose a generic approach for accelerating the convergence of existing algorithms to solve the problem of stochastic zeroth-order convex optimization (SZCO). Standard techniques for accelerating the convergence of stochastic zeroth-order algorithms either explore multiple functional evaluations (e.g., two-point evaluations) or exploit global conditions of the problem (e.g., smoothness and strong convexity). Nevertheless, these classic acceleration techniques necessarily restrict the applicability of newly developed algorithms. The key of our proposed generic approach is to explore a local growth condition (also called a local error bound condition) of the objective function in SZCO. The benefits of the proposed acceleration technique are: (i) it is applicable to both settings with one-point and two-point evaluations; (ii) it does not necessarily require strong convexity or smoothness of the objective function; (iii) it yields an improvement on convergence for a broad family of problems. Empirical studies in various settings demonstrate the effectiveness of the proposed acceleration approach.
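
A standard two-point evaluation scheme of the kind referenced above can be sketched as follows (names assumed; this is the textbook estimator, not the paper's acceleration itself):

```python
import numpy as np

def two_point_grad(f, x, delta=1e-4, rng=None):
    """Two-point zeroth-order gradient estimator, a standard primitive in
    SZCO: probe f along a random unit direction u and scale the finite
    difference by the dimension; its expectation over u recovers the
    gradient of a smooth f up to O(delta) bias."""
    rng = np.random.default_rng(rng)
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    coeff = x.size * (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta)
    return coeff * u
```

Averaging many such estimates approximates the true gradient, which is why only function values (no derivatives) are needed by the algorithms this paper accelerates.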

NeurIPS Conference 2018 Conference Paper

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

  • Han Shao
  • XiaoTian Yu
  • Irwin King
  • Michael Lyu

In linear stochastic bandits, it is commonly assumed that payoffs have sub-Gaussian noise. In this paper, under a weaker assumption on noises, we study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the distributions have finite moments of order $1+\epsilon$, for some $\epsilon\in (0, 1]$. We rigorously analyze the regret lower bound of LinBET as $\Omega(T^{\frac{1}{1+\epsilon}})$, implying that finite moments of order 2 (i.e., finite variances) yield the bound of $\Omega(\sqrt{T})$, with $T$ being the total number of rounds to play bandits. The provided lower bound also indicates that the state-of-the-art algorithms for LinBET are far from optimal. By adopting median of means with a well-designed allocation of decisions and truncation based on historical information, we develop two novel bandit algorithms, where the regret upper bounds match the lower bound up to polylogarithmic factors. To the best of our knowledge, we are the first to solve LinBET optimally in the sense of the polynomial order on $T$. Our proposed algorithms are evaluated based on synthetic datasets, and outperform the state-of-the-art results.
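
The median-of-means primitive adopted by the algorithms can be sketched directly:

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means mean estimator: split the samples into k groups,
    average each group, and take the median of the group means. A single
    heavy-tailed outlier can corrupt at most one group, so the median
    ignores it, which is what makes the estimator robust."""
    groups = np.array_split(np.asarray(samples, dtype=float), k)
    return float(np.median([g.mean() for g in groups]))
```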

IJCAI Conference 2018 Conference Paper

Code Completion with Neural Attention and Pointer Networks

  • Jian Li
  • Yue Wang
  • Michael R. Lyu
  • Irwin King

Intelligent code completion has become an essential research task to accelerate modern software development. To facilitate effective code completion for dynamically-typed programming languages, we apply neural language models by learning from large codebases, and develop a tailored attention mechanism for code completion. However, standard neural language models even with attention mechanism cannot correctly predict the out-of-vocabulary (OoV) words that restrict the code completion performance. In this paper, inspired by the prevalence of locally repeated terms in program source code, and the recently proposed pointer copy mechanism, we propose a pointer mixture network for better predicting OoV words in code completion. Based on the context, the pointer mixture network learns to either generate a within-vocabulary word through an RNN component, or regenerate an OoV word from local context through a pointer component. Experiments on two benchmarked datasets demonstrate the effectiveness of our attention mechanism and pointer mixture network on the code completion task.

UAI Conference 2018 Conference Paper

GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs

  • Jiani Zhang
  • Xingjian Shi
  • Junyuan Xie
  • Hao Ma
  • Irwin King
  • Dit-Yan Yeung

We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head’s importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem on large graphs. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three realworld datasets show that our GaAN framework achieves state-of-the-art results on both tasks.

UAI Conference 2018 Conference Paper

Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs

  • Xiaotian Yu
  • Han Shao
  • Michael R. Lyu
  • Irwin King

Inspired by heavy-tailed distributions in practical scenarios, we investigate the problem of pure exploration of Multi-Armed Bandits (MAB) with heavy-tailed payoffs by breaking the assumption of payoffs with sub-Gaussian noises in MAB, and assuming that stochastic payoffs from bandits have finite p-th moments, where p ∈ (1, +∞). The main contributions in this paper are three-fold. First, we technically analyze tail probabilities of the empirical average and the truncated empirical average (TEA) for estimating expected payoffs in sequential decisions with heavy-tailed noises via martingales. Second, we propose two effective bandit algorithms based on different prior information (i.e., fixed confidence or fixed budget) for pure exploration of MAB generating payoffs with finite p-th moments. Third, we derive theoretical guarantees for the proposed two bandit algorithms, and demonstrate their effectiveness in pure exploration of MAB with heavy-tailed payoffs on synthetic data and real-world financial data.
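
The truncated empirical average (TEA) analyzed in the first contribution can be sketched as follows; the paper grows the truncation threshold with the sample count, while a fixed threshold is used here for brevity:

```python
import numpy as np

def truncated_empirical_average(payoffs, threshold):
    """Truncated empirical average (TEA): zero out any payoff whose
    magnitude exceeds the truncation threshold before averaging, so
    occasional heavy-tailed spikes cannot dominate the estimate."""
    x = np.asarray(payoffs, dtype=float)
    return float(np.where(np.abs(x) <= threshold, x, 0.0).mean())
```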

AAAI Conference 2017 Conference Paper

CBRAP: Contextual Bandits with RAndom Projection

  • XiaoTian Yu
  • Michael R. Lyu
  • Irwin King

Contextual bandits with linear payoffs, also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual data. Due to the curse of dimensionality, most current bandit algorithms face two challenges: high time complexity, and extremely large regret upper bounds with high-dimensional data. In this paper, to attack these two challenges effectively, we develop an algorithm of Contextual Bandits via RAndom Projection (CBRAP) in the setting of linear payoffs, which works especially well for high-dimensional contextual data. The proposed CBRAP algorithm is time-efficient and flexible, because it enables players to choose an arm in a low-dimensional space, and relaxes the sparsity assumption of a constant number of non-zero components in previous work. Besides, we prove an upper regret bound for the proposed algorithm, which is associated with the reduced dimension. By comparing with three benchmark algorithms, we demonstrate improved performance on cumulative payoffs of CBRAP during its sequential decisions on both synthetic and real-world datasets, as well as its superior time-efficiency.
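
The random-projection step at the heart of CBRAP can be sketched as follows (function name assumed); the bandit itself would then run, e.g. LinUCB-style, on the projected contexts:

```python
import numpy as np

def project_contexts(X, k, rng=None):
    """Compress d-dimensional contexts to k dims with a scaled Gaussian
    random projection, which approximately preserves inner products, so a
    standard linear bandit can operate in the cheap k-dimensional space.
    Returns the projection matrix too, so future contexts are projected
    consistently."""
    rng = np.random.default_rng(rng)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R, R
```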

UAI Conference 2017 Conference Paper

FROSH: FasteR Online Sketching Hashing

  • Xixian Chen
  • Irwin King
  • Michael R. Lyu

Many hashing methods, especially those that are in the data-dependent category with good learning accuracy, are still inefficient when dealing with three critical problems in modern data analysis. First, data usually come in a streaming fashion, but most of the existing hashing methods are batch-based models. Second, when data become huge, the extensive computational time, large space requirement, and multiple passes to load the data into memory will be prohibitive. Third, data often lack sufficient label information. Although the recently proposed Online Sketching Hashing (OSH) is promising to alleviate all three issues mentioned above, its training procedure still suffers from a high time complexity. In this paper, we propose a FasteR Online Sketching Hashing (FROSH) method to make the training process faster. Compared with OSH, we leverage fast transforms to sketch data more compactly. Particularly, we derive independent transformations to guarantee the sketching accuracy, and design a novel implementation to make such transformations applicable to online data sketching without increasing the space cost. We rigorously prove that our method can yield a comparable learning accuracy with a lower time complexity and an equal space cost compared with OSH. Finally, extensive experiments on synthetic and real-world datasets demonstrate the excellent performance of our method.

ICML Conference 2017 Conference Paper

Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data

  • Xixian Chen
  • Michael R. Lyu
  • Irwin King

Estimating covariance matrices is a fundamental technique in various domains, most notably machine learning and signal processing. To tackle the challenges of extensive communication costs, large storage requirements, and high processing time complexity when handling massive high-dimensional and distributed data, we propose an efficient and accurate covariance matrix estimation method via data compression. In contrast to previous data-oblivious compression schemes, we leverage a data-aware weighted sampling method to construct low-dimensional data for such estimation. We rigorously prove that our proposed estimator is unbiased and, with specially designed sampling distributions, requires less data to achieve the same accuracy. We also show that the computational procedures in our algorithm are efficient. Together, these results imply an improved tradeoff between estimation accuracy and computational cost. Finally, extensive experiments on synthetic and real-world datasets validate the superior properties of our method and illustrate that it significantly outperforms state-of-the-art algorithms.
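The flavor of data-aware sampling can be shown with a generic importance-sampling estimator of the covariance matrix that samples rows with probability proportional to their squared norm and reweights to stay unbiased. This is a toy sketch; the paper's specially designed sampling distributions are more refined than the squared-row-norm scheme used here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 5000, 8, 1000       # samples, dimension, sampled subset size (illustrative)
X = rng.normal(size=(n, d))

# data-aware sampling: probability proportional to squared row norm
p = (X ** 2).sum(axis=1)
p = p / p.sum()
idx = rng.choice(n, size=m, replace=True, p=p)

# reweighting each sampled outer product by 1/(n * m * p_i) makes the
# estimator unbiased for the full covariance (1/n) X^T X
C_est = np.zeros((d, d))
for i in idx:
    C_est += np.outer(X[i], X[i]) / (n * m * p[i])

C_full = X.T @ X / n
err = np.linalg.norm(C_est - C_full) / np.linalg.norm(C_full)
```

Only m of the n rows are touched after the sampling probabilities are computed, which is where the communication and storage savings come from.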

TIST Journal 2016 Journal Article

A Unified Point-of-Interest Recommendation Framework in Location-Based Social Networks

  • Chen Cheng
  • Haiqin Yang
  • Irwin King
  • Michael R. Lyu

Location-based social networks (LBSNs), such as Gowalla, Facebook, Foursquare, and Brightkite, have attracted millions of users to share their social friendships and their locations via check-ins in the past few years. Plenty of valuable information accumulates from these check-in behaviors, which makes it possible to learn users' moving patterns as well as their preferences. In LBSNs, point-of-interest (POI) recommendation is one of the most significant tasks because it can help targeted users explore their surroundings as well as help third-party developers provide personalized services. Matrix factorization is a promising method for this task because it can capture users' preferences for locations and is widely adopted in traditional recommender systems such as movie recommendation. However, the sparsity of the check-in data makes it difficult to capture users' preferences accurately. Geographical influence can help alleviate this problem and has a large impact on the final recommendation result. By studying users' moving patterns, we find that users tend to check in around several centers and that different users have different numbers of centers. Based on this, we propose a Multi-center Gaussian Model (MGM) to capture this pattern by modeling the probability of a user's check-in on a location. Moreover, users are usually more interested in the top 20 or even top 10 recommended POIs, which makes personalized ranking important in this task. From previous work, directly optimizing pairwise ranking, as in Bayesian Personalized Ranking (BPR), achieves better performance in top-k recommendation than matrix factorization that minimizes the point-wise rating error. To consider users' preferences, geographical influence, and personalized ranking, we propose a unified POI recommendation framework that unifies all three. Specifically, we first fuse MGM with matrix factorization methods and further with BPR using two different approaches. We conduct experiments on the Gowalla and Foursquare datasets, two large-scale real-world LBSN datasets publicly available online. The results on both datasets show that our unified POI recommendation framework produces better performance.
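The multi-center idea can be sketched as a weighted sum of isotropic Gaussians around a user's check-in centers: locations near any center score high, locations far from all centers score near zero. This is a toy illustration with hypothetical coordinates, weights, and bandwidth, not the fitted MGM from the paper:

```python
import math

# hypothetical per-user check-in centers as ((lat, lon), weight) pairs
centers = [((22.40, 114.10), 0.7), ((22.28, 114.16), 0.3)]
sigma = 0.05  # assumed common bandwidth (degrees)

def checkin_score(loc, centers, sigma):
    """Probability-like score of a check-in: weighted sum of isotropic Gaussians."""
    lat, lon = loc
    score = 0.0
    for (clat, clon), w in centers:
        d2 = (lat - clat) ** 2 + (lon - clon) ** 2
        score += w * math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return score

near = checkin_score((22.41, 114.11), centers, sigma)  # close to a center
far = checkin_score((23.50, 113.00), centers, sigma)   # far from every center
```

In the unified framework this geographic score multiplies (or otherwise fuses with) the matrix-factorization preference score before ranking.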

IJCAI Conference 2016 Conference Paper

Modeling the Homophily Effect between Links and Communities for Overlapping Community Detection

  • Hongyi Zhang
  • Tong Zhao
  • Irwin King
  • Michael R. Lyu

Overlapping community detection has drawn much attention recently since it allows nodes in a network to have multiple community memberships. A standard framework to deal with overlapping community detection is Matrix Factorization (MF). Although all existing MF-based approaches use links as input to identify communities, the relationship between links and communities is still under-investigated. Most of the approaches only view links as consequences of communities (community-to-link) but fail to explore how nodes' community memberships can be represented by their linked neighbors (link-to-community). In this paper, we propose a Homophily-based Nonnegative Matrix Factorization (HNMF) to model both-sided relationships between links and communities. From the community-to-link perspective, we apply a preference-based pairwise function by assuming that nodes with common communities have a higher probability to build links than those without common communities. From the link-to-community perspective, we propose a new community representation learning with network embedding by assuming that linked nodes have similar community representations. We conduct experiments on several real-world networks and the results show that our HNMF model is able to find communities with better quality compared with state-of-the-art baselines.

AAAI Conference 2016 Conference Paper

STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation

  • Shenglin Zhao
  • Tong Zhao
  • Haiqin Yang
  • Michael Lyu
  • Irwin King

Successive point-of-interest (POI) recommendation in location-based social networks (LBSNs) is a significant task, since it helps users navigate candidate POIs and provides the best POI recommendations based on users' most recent check-in knowledge. However, all existing methods for successive POI recommendation focus only on modeling the correlation between POIs based on users' check-in sequences, ignoring an important fact: successive POI recommendation is a time-subtle recommendation task. In fact, even with the same previous check-in information, users would prefer different successive POIs at different times. To capture the impact of time on successive POI recommendation, in this paper we propose a spatial-temporal latent ranking (STELLAR) method to explicitly model the interactions among user, POI, and time. In particular, the proposed STELLAR model is built upon a ranking-based pairwise tensor factorization framework with fine-grained modeling of user-POI, POI-time, and POI-POI interactions for successive POI recommendation. Moreover, we propose a new interval-aware weight utility function to differentiate successive check-ins' correlations, which breaks the time-interval constraint in prior work. Evaluations on two real-world datasets demonstrate that the STELLAR model outperforms state-of-the-art successive POI recommendation models by about 20% in Precision@5 and Recall@5.

IJCAI Conference 2015 Conference Paper

Exploiting k-Degree Locality to Improve Overlapping Community Detection

  • Hongyi Zhang
  • Michael R. Lyu
  • Irwin King

Community detection is of crucial importance in understanding the structures of complex networks. In many real-world networks, communities naturally overlap, since a node usually has multiple community memberships. One popular technique for overlapping community detection is Matrix Factorization (MF). However, existing MF-based models have ignored the fact that, besides neighbors, "local non-neighbors" (e.g., my friend's friend who is not my direct friend) are helpful when discovering communities. In this paper, we propose a Locality-based Non-negative Matrix Factorization (LNMF) model that refines a preference-based model by incorporating locality into the learning objective. We define a subgraph called the "k-degree local network" to set a boundary between local non-neighbors and other non-neighbors. By treating these two classes of non-neighbors differently, our model is able to capture the process of community formation. We propose a fast sampling strategy within the stochastic gradient descent based learning algorithm. We compare our LNMF model with several baseline methods on various real-world networks, including large ones with ground-truth communities. Results show that our model outperforms state-of-the-art approaches.

UAI Conference 2015 Conference Paper

Fast Relative-Error Approximation Algorithm for Ridge Regression

  • Shouyuan Chen
  • Yang Liu 0003
  • Michael R. Lyu
  • Irwin King
  • Shengyu Zhang 0002

Ridge regression is one of the most popular and effective regularized regression methods. One case of particular interest is when the number of features $p$ is much larger than the number of samples $n$, i.e., $p \gg n$. In this case, the standard optimization algorithm for ridge regression computes the optimal solution $x^*$ in $O(n^2 p + n^3)$ time. In this paper, we propose a fast relative-error approximation algorithm for ridge regression. More specifically, our algorithm outputs a solution $\tilde{x}$ satisfying $\|\tilde{x} - x^*\|_2 \le \epsilon \|x^*\|_2$ with high probability and runs in $\tilde{O}(\mathrm{nnz}(A) + n^3/\epsilon^2)$ time, where $\mathrm{nnz}(A)$ is the number of non-zero entries of the matrix $A$. To the best of our knowledge, this is the first algorithm for ridge regression that runs in $o(n^2 p)$ time with a provable relative-error approximation bound on the output vector. In addition, we analyze the risk inflation bound of our algorithm and apply our techniques to two generalizations of ridge regression, including multiple-response ridge regression and a non-linear ridge regression problem. Finally, we show empirical results on both synthetic and real datasets.
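For context, the $O(n^2 p + n^3)$ baseline the abstract refers to is the dual (kernel) form of ridge regression, which replaces a $p \times p$ solve with an $n \times n$ one when $p \gg n$. The sketch below shows this exact baseline with made-up dimensions; it is not the paper's approximation algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 50, 2000, 1.0     # p >> n, as in the abstract (sizes illustrative)
A = rng.normal(size=(n, p))
y = rng.normal(size=n)

# primal closed form (A^T A + lam*I_p)^{-1} A^T y needs a p x p solve;
# the dual identity below needs only an n x n solve: O(n^2 p + n^3) total
x_dual = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), y)

# optimality check: the ridge gradient A^T (A x - y) + lam*x vanishes at x*
grad = A.T @ (A @ x_dual - y) + lam * x_dual
```

The paper's contribution is to beat this $o(n^2 p)$ barrier with a sketching-based approximation; the dual solve above is the exact reference it is compared against.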

AAAI Conference 2015 Conference Paper

Incorporating Implicit Link Preference Into Overlapping Community Detection

  • Hongyi Zhang
  • Irwin King
  • Michael Lyu

Community detection is an important technique for understanding structures and patterns in complex networks. Recently, overlapping community detection has become a trend due to the ubiquity of overlapping and nested communities in the real world. However, existing approaches have ignored the use of implicit link preference information, i.e., links can reflect a node's preference over the targets of the connections it wants to build. This information has a strong impact on community detection, since a node prefers to build links with nodes inside its community rather than those outside it. In this paper, we propose a preference-based nonnegative matrix factorization (PNMF) model to incorporate implicit link preference information. Unlike conventional matrix factorization approaches, which simply approximate the original adjacency matrix in value, our model maximizes the likelihood of the preference order for each node, following the intuition that a node prefers its neighbors to other nodes. Our model overcomes the indiscriminate penalty problem, in which non-linked pairs inside one community are penalized in the objective function as heavily as those across two communities. We propose a learning algorithm that learns the node-community membership matrix via stochastic gradient descent with bootstrap sampling. We evaluate our PNMF model on several real-world networks. Experimental results show that our model outperforms state-of-the-art approaches and can be applied to large datasets.
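The pairwise preference idea, pushing a node's score for a linked neighbor above its score for a non-linked node, can be sketched in one latent dimension with a BPR-style gradient-ascent update. This is a toy illustration of the preference-order objective, not the PNMF model or its bootstrap-sampling learner:

```python
import math
import random

random.seed(4)
# one latent value per node; "u" links to "i" but not to "j" (toy setup)
emb = {n: random.uniform(-0.1, 0.1) for n in "uij"}
lr = 0.1

def score(a, b):
    return emb[a] * emb[b]

for _ in range(200):
    x = score("u", "i") - score("u", "j")   # current preference gap
    g = 1.0 / (1.0 + math.exp(x))           # gradient of log sigmoid(x)
    emb["u"] += lr * g * (emb["i"] - emb["j"])
    emb["i"] += lr * g * emb["u"]
    emb["j"] -= lr * g * emb["u"]

gap = score("u", "i") - score("u", "j")     # should now favor the linked node
```

Maximizing the log-sigmoid of the gap only requires the linked node to outrank the non-linked one, which is exactly what avoids penalizing non-linked intra-community pairs as hard as inter-community ones.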

AAAI Conference 2015 Conference Paper

Kernelized Online Imbalanced Learning with Fixed Budgets

  • Junjie Hu
  • Haiqin Yang
  • Irwin King
  • Michael Lyu
  • Anthony Man-Cho So

Online learning from imbalanced streaming data, capturing the nonlinearity and heterogeneity of the data, is significant in machine learning and data mining. To tackle this problem, we propose a kernelized online imbalanced learning (KOIL) algorithm that directly maximizes the area under the ROC curve (AUC). We address two further challenges: (1) how to control the number of support vectors without sacrificing model performance; and (2) how to restrict the fluctuation of the learned decision function to attain smooth updating. To this end, we introduce two buffers with fixed budgets (buffer sizes), one for the positive class and one for the negative class, to store the learned support vectors, which allows us to capture the global information of the decision boundary. When determining the weight of a new support vector, we confine its influence to its k-nearest opposite support vectors. This restricts the effect of new instances and mitigates the harm of outliers. More importantly, we design a sophisticated scheme that compensates the model after a replacement occurs when either buffer is full. With this compensation, the learned model approaches the one learned with infinite budgets. We present both theoretical analysis and extensive experimental comparisons to demonstrate the effectiveness of our proposed KOIL.

IJCAI Conference 2015 Conference Paper

Training-Efficient Feature Map for Shift-Invariant Kernels

  • Xixian Chen
  • Haiqin Yang
  • Irwin King
  • Michael R. Lyu

Random feature maps are widely used to scale up kernel methods. However, employing a large number of mapped features to ensure an accurate approximation still makes training time-consuming. In this paper, we aim to improve the training efficiency of shift-invariant kernels by using fewer, more informative features without sacrificing precision. We propose a novel feature map method that extends Random Kitchen Sinks with fast data-dependent subspace embedding to generate the desired features. More specifically, we describe two algorithms with different tradeoffs between running speed and accuracy, and prove that O(l) features induced by them perform as accurately as O(l^2) features from other feature map methods. In addition, several experiments conducted on real-world datasets demonstrate the superiority of our proposed algorithms.
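Random Kitchen Sinks, which the paper builds on, approximate a shift-invariant kernel with an explicit low-dimensional feature map: draw frequencies from the kernel's spectral density, then take cosine features. The vanilla, data-oblivious version for an RBF kernel is sketched below with illustrative dimensions; the paper's data-dependent subspace embedding on top of this is not shown:

```python
import numpy as np

rng = np.random.default_rng(3)
d, D, gamma = 5, 4000, 0.5    # input dim, number of random features (made up)

# Random Kitchen Sinks for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2):
# frequencies from the Gaussian spectral density, random phases, cosine features
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
k_exact = float(np.exp(-gamma * np.sum((x - y) ** 2)))
k_approx = float(phi(x) @ phi(y))    # inner product of features approximates k(x, y)
```

A linear model trained on phi(x) then behaves like a kernel machine, and the abstract's point is that fewer, better-chosen features can match the accuracy of many oblivious ones.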

NeurIPS Conference 2014 Conference Paper

Combinatorial Pure Exploration of Multi-Armed Bandits

  • Shouyuan Chen
  • Tian Lin
  • Irwin King
  • Michael Lyu
  • Wei Chen

We study the {\em combinatorial pure exploration (CPE)} problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a \emph{decision class}, which is a collection of subsets of arms with certain combinatorial structures such as size-$K$ subsets, matchings, spanning trees or paths, etc. The CPE problem represents a rich class of pure exploration tasks which covers not only many existing models but also novel cases where the object of interest has a non-trivial combinatorial structure. In this paper, we provide a series of results for the general CPE problem. We present general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings. We prove problem-dependent upper bounds of our algorithms. Our analysis exploits the combinatorial structures of the decision classes and introduces a new analytic tool. We also establish a general problem-dependent lower bound for the CPE problem. Our results show that the proposed algorithms achieve the optimal sample complexity (within logarithmic factors) for many decision classes. In addition, applying our results back to the problems of top-$K$ arms identification and multiple bandit best arms identification, we recover the best available upper bounds up to constant factors and partially resolve a conjecture on the lower bounds.

IJCAI Conference 2013 Conference Paper

A Unified Framework for Reputation Estimation in Online Rating Systems

  • Guang Ling
  • Irwin King
  • Michael R. Lyu

Online rating systems are now ubiquitous due to the success of recommender systems. In such systems, users are allowed to rate items (movies, songs, commodities) within a predefined range of values. The collected ratings can be used to infer users' preferences as well as items' intrinsic features, which are then matched to perform personalized recommendation. Most previous work focuses on improving prediction accuracy or ranking capability. Little attention has been paid to the problem of spammers or low-reputed users in such systems. Spammers contaminate the rating system by assigning unreasonable scores to items, which may affect the accuracy of a recommender system. There is evidence supporting the existence of spammers in online rating systems. Reputation estimation methods can be employed to keep track of users' reputations and detect spammers in such systems. In this paper, we propose a unified framework for computing the reputation score of a user, given only users' ratings on items. We show that previously proposed reputation estimation methods can be captured as special cases of our framework. We propose a new low-rank matrix factorization based reputation estimation method and demonstrate its superior discrimination ability.
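The general flavor of iterative reputation estimation, not the paper's matrix-factorization method, can be sketched with a toy scheme in which a user's reputation decreases with their deviation from reputation-weighted item averages. All users, items, and ratings below are made up:

```python
# hypothetical user -> {item: rating in [1, 5]} matrix
ratings = {
    "honest_a": {"m1": 4, "m2": 2, "m3": 5},
    "honest_b": {"m1": 4, "m2": 3, "m3": 5},
    "spammer":  {"m1": 1, "m2": 5, "m3": 1},
}
items = {i for r in ratings.values() for i in r}

rep = {u: 1.0 for u in ratings}          # start everyone with full reputation
for _ in range(20):
    # reputation-weighted item means
    item_mean = {}
    for item in items:
        num = sum(rep[u] * r[item] for u, r in ratings.items())
        den = sum(rep[u] for u in ratings)
        item_mean[item] = num / den
    # reputation: inversely related to a user's average deviation from the means
    for u, r in ratings.items():
        dev = sum(abs(r[i] - item_mean[i]) for i in r) / len(r)
        rep[u] = 1.0 / (1.0 + dev)
```

Alternating between weighted item means and user reputations lets consistent raters pull the consensus toward themselves, which is the behavior the framework in the paper generalizes.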

NeurIPS Conference 2013 Conference Paper

Exact and Stable Recovery of Pairwise Interaction Tensors

  • Shouyuan Chen
  • Michael Lyu
  • Irwin King
  • Zenglin Xu

Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from $O(nr\log^2(n))$ observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results.

IJCAI Conference 2013 Conference Paper

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

  • Jianping Shi
  • Naiyan Wang
  • Yang Xia
  • Dit-Yan Yeung
  • Irwin King
  • Jiaya Jia

Matrix factorization (MF) is a popular collaborative filtering approach for recommender systems due to its simplicity and effectiveness. Existing MF methods either assume that all latent features are uncorrelated or assume that all are correlated. To address the important issue of what structure should be imposed on the features, we investigate the covariance matrix of the latent features learned from real data. Based on the findings, we propose an MF model with a sparse covariance prior which favors a sparse yet non-diagonal covariance matrix. Not only can this reflect the semantics more faithfully, but imposing sparsity can also have a side effect of preventing overfitting. Starting from a probabilistic generative model with a sparse covariance prior, we formulate the model inference problem as a maximum a posteriori (MAP) estimation problem. The optimization procedure makes use of stochastic gradient descent and majorization-minimization. For empirical validation, we conduct experiments using the MovieLens and Netflix datasets to compare the proposed method with two strong baselines which use different priors. Experimental results show that our sparse covariance prior can lead to performance improvement.

IJCAI Conference 2013 Conference Paper

Where You Like to Go Next: Successive Point-of-Interest Recommendation

  • Chen Cheng
  • Haiqin Yang
  • Michael R. Lyu
  • Irwin King

Personalized point-of-interest (POI) recommendation is a significant task in location-based social networks (LBSNs), as it can provide a better user experience and enable third-party services, e.g., launching advertisements. To provide good recommendations, various research has been conducted in the literature. However, previous efforts mainly consider the check-ins as a whole and omit their temporal relation, so they can only recommend POIs globally and cannot predict where a user would like to go tomorrow or in the next few days. In this paper, we consider the task of successive personalized POI recommendation in LBSNs, which is a much harder task than standard personalized POI recommendation or prediction. To solve this task, we observe two prominent properties in the check-in sequence: personalized Markov chains and region localization. Hence, we propose a novel matrix factorization method, namely FPMC-LR, to embed the personalized Markov chains and the localized regions. Our proposed FPMC-LR not only exploits the personalized Markov chain in the check-in sequence, but also takes into account users' movement constraint, i.e., moving around a localized region. More importantly, by utilizing the information of localized regions, we not only greatly reduce the computation cost but also discard noisy information to boost recommendation performance. Results on two real-world LBSN datasets demonstrate the merits of our proposed FPMC-LR.
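The personalized Markov chain ingredient rests on first-order transitions between successive check-ins. Counting and normalizing those transitions, the raw statistics that FPMC-style models then factorize rather than store, can be sketched as follows (the sequences are invented; this is not FPMC-LR itself):

```python
from collections import defaultdict

# hypothetical check-in sequences: user -> ordered list of visited POIs
sequences = {
    "u1": ["cafe", "office", "gym", "cafe", "office"],
    "u2": ["cafe", "office", "cafe", "office"],
}

# first-order transition counts over successive pairs
trans = defaultdict(lambda: defaultdict(int))
for seq in sequences.values():
    for prev, nxt in zip(seq, seq[1:]):
        trans[prev][nxt] += 1

def next_poi_probs(prev):
    """Row-normalized transition probabilities out of a POI."""
    total = sum(trans[prev].values())
    return {poi: c / total for poi, c in trans[prev].items()}

probs = next_poi_probs("cafe")
```

FPMC-style methods replace these sparse count tables with low-rank factors, and the LR variant further restricts candidate transitions to a localized region around the user.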

AAAI Conference 2012 Conference Paper

A Data-Driven Approach to Question Subjectivity Identification in Community Question Answering

  • Tom Chao Zhou
  • Xiance Si
  • Edward Y. Chang
  • Irwin King
  • Michael R. Lyu

Automatic Subjective Question Answering (ASQA), which aims at answering users' subjective questions using summaries of multiple opinions, is becoming increasingly important. One challenge of ASQA is that expected answers for subjective questions may not readily exist on the Web. The rise and popularity of Community Question Answering (CQA) sites, which provide platforms for people to post and answer questions, provide an alternative for ASQA. One important task in ASQA is question subjectivity identification, which identifies whether a user is asking a subjective question. Unfortunately, there has been little labeled training data available for this task. In this paper, we propose an approach to collect training data automatically by utilizing social signals in CQA sites, without any manual labeling. Experimental results show that our data-driven approach achieves a 9.37% relative improvement over the supervised approach using manually labeled data, and a 5.15% relative gain over a state-of-the-art semi-supervised approach. In addition, we propose several heuristic features for question subjectivity identification. By adding these features, we achieve an 11.23% relative improvement over word n-gram features under the same experimental setting.

AAAI Conference 2012 Conference Paper

Fused Matrix Factorization with Geographical and Social Influence in Location-Based Social Networks

  • Chen Cheng
  • Haiqin Yang
  • Irwin King
  • Michael Lyu

Recently, location-based social networks (LBSNs), such as Gowalla, Foursquare, Facebook, and Brightkite, have attracted millions of users to share their social friendships and their locations via check-ins. The available check-in information makes it possible to mine users' preferences for locations and to recommend their favorites. Personalized point-of-interest (POI) recommendation is a significant task in LBSNs, since it can help targeted users explore their surroundings as well as help third-party developers provide personalized services. For this task, matrix factorization is a promising tool due to its success in recommender systems. However, previously proposed matrix factorization (MF) methods do not explore geographical influence, e.g., the multi-center check-in property, which yields suboptimal recommendations. In this paper, to the best of our knowledge, we are the first to fuse MF with geographical and social influence for POI recommendation in LBSNs. We first capture the geographical influence by modeling the probability of a user's check-in on a location with a Multi-center Gaussian Model (MGM). Next, we include social information and fuse the geographical influence into a generalized matrix factorization framework. Our solution to POI recommendation is efficient and scales linearly with the number of observations. Finally, we conduct thorough experiments on a large-scale real-world LBSN dataset and demonstrate that the fused matrix factorization framework with MGM utilizes the distance information sufficiently and significantly outperforms other state-of-the-art methods.

UAI Conference 2012 Conference Paper

Response Aware Model-Based Collaborative Filtering

  • Guang Ling
  • Haiqin Yang
  • Michael R. Lyu
  • Irwin King

Previous work on recommender systems mainly focuses on fitting the ratings provided by users. However, the response patterns, i.e., which items are rated and which are not, are generally ignored. We argue that failing to model such response patterns can lead to biased parameter estimation and sub-optimal model performance. Although several pieces of work have tried to model users' response patterns, they lack the effectiveness and interpretability of the successful matrix factorization collaborative filtering approaches. To bridge this gap, in this paper we unify explicit response models and probabilistic matrix factorization (PMF) to establish the Response Aware Probabilistic Matrix Factorization (RAPMF) framework. We show that RAPMF subsumes PMF as a special case. Empirically, we demonstrate the merits of RAPMF from various aspects.

TIST Journal 2011 Journal Article

Learning to recommend with explicit and implicit social relations

  • Hao Ma
  • Irwin King
  • Michael R. Lyu

Recommender systems have been well studied and developed recently, both in academia and in industry. However, traditional recommender systems assume that all users are independent and identically distributed; this assumption ignores the connections among users, which is inconsistent with the real-world observation that we always turn to our trusted friends for recommendations. Aiming to model recommender systems more accurately and realistically, we propose a novel probabilistic factor analysis framework that naturally fuses users' tastes and their trusted friends' favors together. The proposed framework is quite general and can also be applied to a pure user-item rating matrix even when we do not have explicit social trust information among users. In this framework, we coin the term social trust ensemble to represent the formulation of social trust restrictions on recommender systems. The complexity analysis indicates that our approach can be applied to very large datasets, since it scales linearly with the number of observations, while the experimental results show that our method outperforms state-of-the-art approaches.

AAAI Conference 2011 Conference Paper

Learning to Suggest Questions in Online Forums

  • Tom Zhou
  • Chin-Yew Lin
  • Irwin King
  • Michael R. Lyu
  • Young-In Song
  • Yunbo Cao

Online forums contain interactive and semantically related discussions on various questions. The extracted question-answer archives are invaluable knowledge that can be used to improve Question Answering services. In this paper, we address the problem of Question Suggestion, which aims to suggest questions that are semantically related to a queried question. Existing bag-of-words approaches suffer from the shortcoming that they cannot bridge the lexical chasm between semantically related questions. Therefore, we present a new framework to suggest questions and propose the Topic-enhanced Translation-based Language Model (TopicTRLM), which fuses both lexical and latent semantic knowledge. Extensive experiments have been conducted with a large real-world dataset. Experimental results indicate that our approach is very effective and outperforms other popular methods on several metrics.

AAAI Conference 2010 Conference Paper

Diversifying Query Suggestion Results

  • Hao Ma
  • Michael Lyu
  • Irwin King

To improve the user search experience, Query Suggestion, a technique for generating alternative queries for Web users, has become an indispensable feature of commercial search engines. However, previous work mainly focuses on suggesting queries relevant to the original query while ignoring diversity in the suggestions, which can leave Web users' information needs unsatisfied. In this paper, we present a novel unified method to suggest both semantically relevant and diverse queries to Web users. The proposed approach is based on Markov random walks and hitting time analysis on the query-URL bipartite graph. It can effectively prevent semantically redundant queries from receiving a high rank, hence encouraging diversity in the results. We evaluate our method on a large commercial clickthrough dataset in terms of relevance and diversity measurements. The experimental results show that our method is very effective in generating both relevant and diverse query suggestions.
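Hitting time on a query-URL bipartite graph can be illustrated with a tiny Monte-Carlo walk: queries whose clicked URLs overlap are reached quickly, unrelated queries slowly or never. This is a toy estimate over an invented click graph; the paper works analytically with a large commercial clickthrough dataset:

```python
import random

# hypothetical query -> clicked-URL click graph
clicks = {
    "mars": ["nasa.gov", "wiki/Mars"],
    "red planet": ["nasa.gov", "wiki/Mars"],
    "chocolate bar": ["candy.com"],
}
# invert the graph: URL -> queries that clicked it
backlinks = {}
for q, urls in clicks.items():
    for u in urls:
        backlinks.setdefault(u, []).append(q)

def hitting_time(start, target, walks=2000, max_steps=20, seed=7):
    """Monte-Carlo estimate of the expected number of hops for a
    query -> URL -> query walk to first reach `target` (capped at max_steps)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        node, steps = start, 0
        while node != target and steps < max_steps:
            url = rng.choice(clicks[node])
            node = rng.choice(backlinks[url])
            steps += 2          # two hops: query -> URL -> query
        total += steps
    return total / walks

near = hitting_time("mars", "red planet")      # shares clicked URLs
far = hitting_time("mars", "chocolate bar")    # no shared URLs
```

Ranking candidate suggestions by small hitting time keeps them semantically related to the query, while the diversity mechanism then suppresses candidates that are mutually redundant.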

AAAI Conference 2010 Conference Paper

Smooth Optimization for Effective Multiple Kernel Learning

  • Zenglin Xu
  • Rong Jin
  • Shenghuo Zhu
  • Michael Lyu
  • Irwin King

Multiple Kernel Learning (MKL) can be formulated as a convex-concave min-max optimization problem whose saddle point corresponds to the optimal solution to MKL. Most MKL methods employ L1-norm simplex constraints on the kernel combination weights, which therefore involves optimizing a non-smooth function of the kernel weights. These methods usually divide the optimization into two cycles: one cycle deals with the optimization of the kernel combination weights, and the other updates the parameters of the SVM. Despite their efficiency, they tend to discard informative complementary kernels. To improve accuracy, we introduce smoothness into the optimization procedure. Furthermore, we transform the optimization into a single smooth convex optimization problem and employ Nesterov's method to solve it efficiently. Experiments on benchmark data sets demonstrate that the proposed algorithm clearly improves on current MKL methods in a number of scenarios.

AAAI Conference 2010 Conference Paper

UserRec: A User Recommendation Framework in Social Tagging Systems

  • Tom Zhou
  • Hao Ma
  • Michael Lyu
  • Irwin King

Social tagging systems have emerged as an effective way for users to annotate and share objects on the Web. However, with the growth of social tagging systems, users are easily overwhelmed by the large amount of data, and it is very difficult for them to dig out the information they are interested in. Though tagging systems provide interest-based social network features that enable users to keep track of other users' tagging activities, there is still no automatic and effective way for a user to discover other users with common interests. In this paper, we propose a User Recommendation (UserRec) framework for user interest modeling and interest-based user recommendation, aiming to boost information sharing among users with similar interests. Our work brings three major contributions to the research community: (1) we propose a tag-graph based community detection method to model users' personal interests, which are further represented by discrete topic distributions; (2) the similarity values between users' topic distributions are measured by Kullback-Leibler divergence (KL-divergence), and these similarity values are further used to perform interest-based user recommendation; and (3) by analyzing users' roles in a tagging system, we find that they are similar to those of Web pages on the Internet. Experiments on a tagging dataset of Web pages (Yahoo! Delicious) show that UserRec outperforms other state-of-the-art recommender system approaches.

IJCAI Conference 2009 Conference Paper

  • Zenglin Xu
  • Rong Jin
  • Michael R. Lyu
  • Irwin King

We consider the problem of semi-supervised feature selection, where we are given a small amount of labeled examples and a large amount of unlabeled examples. Since a small number of labeled samples are usually insufficient for identifying the relevant features, the critical problem arising from semi-supervised feature selection is how to take advantage of the information underneath the unlabeled data. To address this problem, we propose a novel discriminative semi-supervised feature selection method based on the idea of manifold regularization. The proposed method selects features by maximizing the classification margin between different classes while simultaneously exploiting the geometry of the probability distribution that generates both labeled and unlabeled data. We formulate the proposed feature selection method as a convex-concave optimization problem, where the saddle point corresponds to the optimal solution. To find the optimal solution, the level method, a fairly recent optimization method, is employed. We also present a theoretical proof of the convergence rate of the level method when applied to our problem. Empirical evaluation on several benchmark data sets demonstrates the effectiveness of the proposed semi-supervised feature selection method.

NeurIPS Conference 2009 Conference Paper

Adaptive Regularization for Transductive Support Vector Machine

  • Zenglin Xu
  • Rong Jin
  • Jianke Zhu
  • Irwin King
  • Michael Lyu
  • Zhirong Yang

We discuss the framework of Transductive Support Vector Machine (TSVM) from the perspective of the regularization strength induced by the unlabeled data. In this framework, SVM and TSVM can be regarded as a learning machine without regularization and one with full regularization from the unlabeled data, respectively. Therefore, to supplement this framework of the regularization strength, it is necessary to introduce data-dependent partial regularization. To this end, we reformulate TSVM into a form with controllable regularization strength, which includes SVM and TSVM as special cases. Furthermore, we introduce a method of adaptive regularization that is data-dependent and is based on the smoothness assumption. Experiments on a set of benchmark data sets indicate the promising results of the proposed work compared with state-of-the-art TSVM algorithms.

NeurIPS Conference 2009 Conference Paper

Heavy-Tailed Symmetric Stochastic Neighbor Embedding

  • Zhirong Yang
  • Irwin King
  • Zenglin Xu
  • Erkki Oja

Stochastic Neighbor Embedding (SNE) has been shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of t-SNE to accommodate various heavy-tailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions, and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast as and performs as well as the best known t-SNE implementation, and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as a qualitative advantage in cluster separation over t-SNE.
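
The family of heavy-tailed embedding similarities the abstract describes can be sketched with a single tail-heaviness parameter. The parameterisation below is one common choice with the right limiting behaviour (it recovers t-SNE's Student-t kernel and SNE's Gaussian kernel at its extremes); HSSNE's exact parameterisation may differ.

```python
import numpy as np

def heavy_tailed_similarity(d2, alpha=1.0):
    """Embedding similarity on squared distance d2, parameterised by
    tail-heaviness alpha (illustrative family, not HSSNE's exact form).

    alpha = 1 gives the Student-t kernel 1 / (1 + d2) used by t-SNE;
    alpha -> 0 approaches the Gaussian kernel exp(-d2) of plain SNE.
    """
    d2 = np.asarray(d2, dtype=float)
    if alpha < 1e-8:
        return np.exp(-d2)                 # Gaussian limit
    return (1.0 + alpha * d2) ** (-1.0 / alpha)
```

Heavier tails (larger alpha) decay more slowly with distance, which is what lets moderately dissimilar points repel each other and alleviates the crowding problem in the low-dimensional embedding.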

NeurIPS Conference 2008 Conference Paper

An Extended Level Method for Efficient Multiple Kernel Learning

  • Zenglin Xu
  • Rong Jin
  • Irwin King
  • Michael Lyu

We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method.

NeurIPS Conference 2008 Conference Paper

Learning with Consistency between Inductive Functions and Kernels

  • Haixuan Yang
  • Irwin King
  • Michael Lyu

Regularized Least Squares (RLS) algorithms have the ability to avoid over-fitting problems and to express solutions as kernel expansions. However, we observe that current RLS algorithms cannot provide a satisfactory interpretation even of a constant function. On the other hand, while kernel-based algorithms have developed to the point that almost all learning algorithms are kernelized or being kernelized, a basic fact is often ignored: the function learned from the data and the kernel fits the data well, but may not be consistent with the kernel. Based on these considerations and on the intuition that a good kernel-based inductive function should be consistent with both the data and the kernel, a novel learning scheme is proposed. The advantages of this scheme lie in its corresponding Representer Theorem, its strong ability to interpret what kind of functions should not be penalized, and its promising accuracy improvements shown in a number of experiments. Furthermore, we provide a detailed technical description of heat kernels, which serves as an example for readers to apply similar techniques to other kernels. Our work provides a preliminary step in a new direction: exploring the varying consistency between inductive functions and kernels under various distributions.
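
The heat kernel the abstract singles out has a closed form on Euclidean space, which makes it easy to evaluate concretely. The sketch below computes the standard heat kernel on R^n, the fundamental solution of the heat equation; it is included only as a concrete instance of the kernel family the paper analyses.

```python
import numpy as np

def heat_kernel(x, y, t):
    """Heat kernel on R^n at time t > 0:

        k_t(x, y) = (4 * pi * t)^(-n/2) * exp(-||x - y||^2 / (4 * t))

    Larger t diffuses mass further, so the kernel flattens out.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    d2 = float(np.sum((x - y) ** 2))
    return (4.0 * np.pi * t) ** (-n / 2.0) * np.exp(-d2 / (4.0 * t))
```

For fixed t the kernel decays monotonically in the distance between x and y, which is the behaviour any similarity-style kernel should exhibit.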

NeurIPS Conference 2007 Conference Paper

Efficient Convex Relaxation for Transductive Support Vector Machine

  • Zenglin Xu
  • Rong Jin
  • Jianke Zhu
  • Irwin King
  • Michael Lyu

We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem into a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n²) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.

JMLR Journal 2004 Journal Article

The Minimum Error Minimax Probability Machine

  • Kaizhu Huang
  • Haiqin Yang
  • Irwin King
  • Michael R. Lyu
  • Laiwan Chan

We construct a distribution-free Bayes optimal classifier called the Minimum Error Minimax Probability Machine (MEMPM) in a worst-case setting, i.e., under all possible choices of class-conditional densities with a given mean and covariance matrix. By assuming no specific distributions for the data, our model is thus distinguished from traditional Bayes optimal approaches, where an assumption on the data distribution is a must. This model is extended from the Minimax Probability Machine (MPM), a recently-proposed novel classifier, and is demonstrated to be the general case of MPM. Moreover, it includes another special case named the Biased Minimax Probability Machine, which is appropriate for handling biased classification. One appealing feature of MEMPM is that it contains an explicit performance indicator, i.e., a lower bound on the worst-case accuracy, which is shown to be tighter than that of MPM. We provide conditions under which the worst-case Bayes optimal classifier converges to the Bayes optimal classifier. We demonstrate how to apply a more general statistical framework to estimate model input parameters robustly. We also show how to extend our model to nonlinear classification by exploiting kernelization techniques. A series of experiments on both synthetic data sets and real world benchmark data sets validates our proposition and demonstrates the effectiveness of our model.
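
The "explicit performance indicator" mentioned above rests on a distribution-free tail bound: over all distributions with a given mean and covariance, the probability that a point falls on the correct side of a hyperplane is bounded below by a simple function of the margin. The sketch below evaluates that bound for a fixed hyperplane (a, b); it illustrates the MPM-style worst-case guarantee, not MEMPM's full optimization over a and b.

```python
import numpy as np

def worst_case_accuracy(a, b, mean, cov):
    """Worst-case probability that a^T z >= b over all distributions
    with the given mean and covariance (distribution-free tail bound).

    With kappa^2 = (a^T mean - b)^2 / (a^T cov a), the bound is
    kappa^2 / (1 + kappa^2), valid when a^T mean > b.
    """
    a = np.asarray(a, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    margin = float(a @ mean - b)
    if margin <= 0:
        return 0.0                       # bound is vacuous on this side
    kappa2 = margin ** 2 / float(a @ cov @ a)
    return kappa2 / (1.0 + kappa2)
```

The bound tightens toward 1 as the class mean moves further (in Mahalanobis distance) from the decision threshold, matching the intuition that well-separated classes admit strong worst-case guarantees.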