Author name cluster

Ruobing Xie

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers

2 author rows

AAAI Conference 2026 Conference Paper

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

Yixing Li
Ruobing Xie
Zhen Yang
Xingwu Sun
Shuaipeng Li
Weidong Han
Zhanhui Kang
Di Wang

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity, offer promising efficiency gains but suffer from unstable contextual learning and multitask generalization. Some works build layer-level hybrid structures that combine Transformer and Mamba layers, aiming to make full use of the advantages of both. This paper proposes TransMamba, a novel sequence-level hybrid framework that unifies Transformer and Mamba through shared parameter matrices (QKV and CBx), and can thus dynamically switch between attention and SSM mechanisms at different token lengths and layers. We design the Memory Converter to bridge Transformer and Mamba by converting attention outputs into SSM-compatible states, ensuring seamless information flow at TransPoints where the transformation happens. The TransPoint scheduling is also thoroughly explored for balancing effectiveness and efficiency. We conducted extensive experiments demonstrating that TransMamba achieves superior training efficiency and performance compared to single and hybrid baselines, and validated the deeper consistency between Transformer and Mamba paradigms at sequence level, offering a scalable solution for next-generation language modeling.


ICLR Conference 2025 Conference Paper

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan
Ganqu Cui
Hanbin Wang
Ning Ding 0002
Xingyao Wang 0002
Boji Shan
Zeyuan Liu
Jia Deng

We introduce EURUS, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B, Llama-3-8B, and Mixtral-8x22B, EURUS models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, EURUX-8X22B outperforms GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 12 test sets covering five tasks. The strong performance of EURUS can be primarily attributed to ULTRAINTERACT, our newly-curated large-scale, high-quality training dataset specifically designed for complex reasoning tasks. ULTRAINTERACT can be used in supervised fine-tuning, preference learning, and reward modeling. It pairs each instruction with a preference tree consisting of (1) reasoning chains with diverse planning strategies in a unified format, (2) multi-turn interaction trajectories with the environment and the critique, and (3) pairwise positive and negative responses to facilitate preference learning. ULTRAINTERACT allows us to conduct an in-depth exploration of preference learning for reasoning tasks. Our investigation reveals that some well-established preference learning algorithms may be less suitable for reasoning tasks compared to their effectiveness in general conversations. The hypothesis is that in reasoning tasks, the space of correct answers is much smaller than that of incorrect ones, so it is necessary to explicitly increase the reward of chosen data. Therefore, in addition to increasing the reward margin as many preference learning algorithms do, the absolute values of positive responses’ rewards should be positive and may serve as a proxy for performance. Inspired by this, we derive a novel reward modeling objective and empirically show that it leads to a stable reward modeling curve and better performance. Together with ULTRAINTERACT, we obtain a strong reward model.


ICML Conference 2025 Conference Paper

Autonomy-of-Experts Models

Ang Lv
Ruobing Xie
Yining Qian
Songhao Wu
Xingwu Sun
Zhanhui Kang
Di Wang 0052
Rui Yan 0001

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router’s decision-making and the experts’ execution is a critical yet overlooked issue, leading to suboptimal expert selection and learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
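
The router-free selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary keys `down` and `up` (standing in for the two factors of a low-rank weight factorization), the helper names, and the unweighted sum over the selected experts are all assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))


def aoe_layer(token, experts, top_k):
    """Return (chosen expert indices, summed output) for one token."""
    # Every expert pre-computes a cheap activation (hypothetical "down" factor).
    scored = []
    for index, expert in enumerate(experts):
        cached = matvec(expert["down"], token)
        scored.append((l2_norm(cached), index, cached))
    # Rank experts by activation norm; only the top_k continue, the rest abort.
    scored.sort(key=lambda item: item[0], reverse=True)
    chosen = scored[:top_k]
    # Selected experts reuse their cached activation for the rest of the pass.
    # Summing the outputs without weights is an assumption of this sketch.
    output = [0.0] * len(experts[0]["up"])
    for _, index, cached in chosen:
        expert_out = matvec(experts[index]["up"], cached)
        output = [o + e for o, e in zip(output, expert_out)]
    return [index for _, index, _ in chosen], output
```

No separate router parameters appear anywhere in the sketch; the ranking signal comes entirely from the experts' own activations.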


AAAI Conference 2025 Conference Paper

Curriculum Conditioned Diffusion for Multimodal Recommendation

Yimeng Yang
Haokai Ma
Lei Meng
Shuo Xu
Ruobing Xie
Xiangxu Meng

Multimodal recommendation (MMRec) aims to integrate multimodal information of items to address the inherent data sparsity issue in collaborative-based recommendation. Traditional MMRec methods typically capture the structure-level item representations from the observed user behaviors within the multimodal graph, overlooking the potential impact of negative instances for personalized preference understanding. In light of the outstanding generative ability and step-by-step inference characteristic of Diffusion Models (DMs), we propose a Curriculum Conditioned Diffusion framework for Multimodal Recommendation (CCDRec), which precisely excavates the modality-aware distribution-level correlation among multi-modalities and elegantly integrates the reverse phase of DMs into negative sampling to highlight the most suitable instances in a curricular manner. Specifically, CCDRec proposes the Diffusion-controlled Multimodal Aligning module (DMA) to align multimodal knowledge with collaborative signals by capturing the fine-grained relationships among multi-modalities in the probabilistic distribution space. Furthermore, CCDRec designs the Negative-sensitive Diffusive Inferring module (NDI) to progressively synthesize the negative sample pool with diverse hardness to support the following knowledge-aware negative sampling. To gradually ramp up the training complexity, CCDRec further introduces a Curricular Negative Sampler (CNS) to tally the curriculum learning paradigm with the reverse phase of DMA, thereby adaptively sampling the gold-standard negative instances to enhance optimization. Extensive experiments on three datasets with four diverse backbones demonstrate the effectiveness and robustness of our CCDRec. The visualization analyses also clarify the underlying mechanism of our DMA in multimodal representation alignment and CNS in curricular negative discovery. The code and the corresponding dataset will be uploaded in the Appendix.


AAAI Conference 2025 Conference Paper

Enhancing Contrastive Learning Inspired by the Philosophy of “The Blind Men and the Elephant”

Yudong Zhang
Ruobing Xie
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Yu Wang

Contrastive learning is a prevalent technique in self-supervised vision representation learning, typically generating positive pairs by applying two data augmentations to the same image. Designing effective data augmentation strategies is crucial for the success of contrastive learning. Inspired by the story of the blind men and the elephant, we introduce JointCrop and JointBlur. These methods generate more challenging positive pairs by leveraging the joint distribution of the two augmentation parameters, thereby enabling contrastive learning to acquire more effective feature representations. To the best of our knowledge, this is the first effort to explicitly incorporate the joint distribution of two data augmentation parameters into contrastive learning. As a plug-and-play framework without additional computational overhead, JointCrop and JointBlur enhance the performance of SimCLR, BYOL, MoCo v1, MoCo v2, MoCo v3, SimSiam, and Dino baselines with notable improvements.


NeurIPS Conference 2025 Conference Paper

Flexible Realignment of Language Models

Wenhong Zhu
Ruobing Xie
Weinan Zhang
Rui Wang

Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of alignment degree during training and inference. This framework incorporates Training-time Realignment (TrRa), which efficiently realigns the reference model by leveraging the controllable fusion of logits from both the reference and already aligned models. For example, TrRa reduces token usage by 54.63% on DeepSeek-R1-Distill-Qwen-1.5B without any performance degradation, outperforming DeepScaleR-1.5B’s 33.86%. To complement TrRa during inference, we introduce a layer adapter that enables smooth Inference-time Realignment (InRa). This adapter is initialized to perform an identity transformation at the bottom layer and is inserted preceding the original layers. During inference, input embeddings are simultaneously processed by the adapter and the original layer, followed by the remaining layers, and then controllably interpolated at the logit level. We upgraded DeepSeek-R1-Distill-Qwen-7B from a slow-thinking model to one that supports both fast and slow thinking, allowing flexible alignment control even during inference. By encouraging deeper reasoning, it even surpassed its original performance.
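
The "controllable fusion of logits" mentioned in this abstract can be pictured as an interpolation between two models' next-token logits. The sketch below assumes a simple linear form with a coefficient `lam`; the abstract does not state the exact fusion rule, so the formula and all names here are illustrative assumptions.

```python
import math


def fuse_logits(reference_logits, aligned_logits, lam):
    """Linearly interpolate two logit vectors (assumed form, not from the paper).

    lam = 0.0 reproduces the reference model, lam = 1.0 the aligned model.
    """
    return [(1.0 - lam) * r + lam * a
            for r, a in zip(reference_logits, aligned_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a two-token vocabulary.
reference = [0.0, 2.0]
aligned = [4.0, 0.0]
halfway = softmax(fuse_logits(reference, aligned, 0.5))
```

Sweeping `lam` between 0 and 1 gives a continuous dial over the degree of alignment, which is the kind of quantitative control the abstract describes.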


ICLR Conference 2025 Conference Paper

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Weize Chen
Ziming You
Ran Li
Yitong Guan
Chen Qian
Chenyang Zhao
Cheng Yang 0002
Ruobing Xie

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. We will release our code to facilitate further research.


ICML Conference 2025 Conference Paper

Scaling Laws for Floating-Point Quantization Training

Xingwu Sun
Shuaipeng Li
Ruobing Xie
Weidong Han 0006
Kan Wu
Zhen Yang
Yixing Li
An Wang

Low-precision training is considered an effective strategy for reducing both training and downstream inference costs. Previous scaling laws for precision mainly focus on integer quantization; they pay less attention to the constituents of floating-point (FP) quantization and thus cannot fit the LLM losses well in this scenario. In contrast, while FP quantization training is more commonly implemented in production, its research has been relatively superficial. In this paper, we thoroughly explore the effects of FP quantization targets, exponent bits, mantissa bits, and the calculation granularity of the scaling factor on the FP quantization training performance of LLMs. In addition to an accurate FP quantization unified scaling law, we also provide valuable suggestions for the community: (1) Exponent bits contribute slightly more to the model performance than mantissa bits. We provide the optimal exponent-mantissa bit ratio for different bit numbers, which is available for future reference by hardware manufacturers; (2) We discover the formation of the critical data size in low-precision LLM training. Too much training data exceeding the critical data size will inversely bring in degradation of LLM performance; (3) The optimal FP quantization precision is directly proportional to the computational power, but within a wide computational power range. We estimate that the best cost-performance precision should lie between 4-8 bits.


ICLR Conference 2024 Conference Paper

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen
Yusheng Su
Jingwei Zuo
Cheng Yang 0002
Chenfei Yuan
Chi-Min Chan
Heyang Yu
Yaxi Lu

Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AgentVerse that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AgentVerse can proficiently deploy multi-agent groups that outperform a single agent. Extensive experiments on text understanding, reasoning, coding, tool utilization, and embodied AI confirm the effectiveness of AgentVerse. Moreover, our analysis of agent interactions within AgentVerse reveals the emergence of specific collaborative behaviors, contributing to heightened group efficiency. We will release our codebase, AgentVerse, to further facilitate multi-agent research.


ICML Conference 2024 Conference Paper

Exploring the Benefit of Activation Sparsity in Pre-training

Zhengyan Zhang
Chaojun Xiao
Qiujieli Qin
Yankai Lin 0001
Zhiyuan Zeng
Xu Han 0007
Zhiyuan Liu 0001
Ruobing Xie

Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we first study how activation properties change during pre-training. Our examination reveals that Transformers exhibit sparse activation throughout the majority of the pre-training process while the activation correlation keeps evolving as training progresses. Leveraging this observation, we propose Switchable Sparse-Dense Learning (SSD). SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training. Compared to dense training, SSD achieves comparable performance with identical model size and reduces pre-training costs. Moreover, the models trained with SSD can be directly used as MoE models for sparse inference and achieve the same performance as dense models with up to $2\times$ faster inference speed. Codes are available at https://github.com/thunlp/moefication.


AAAI Conference 2024 Conference Paper

Plug-In Diffusion Model for Sequential Recommendation

Haokai Ma
Ruobing Xie
Lei Meng
Xin Chen
Xu Zhang
Leyu Lin
Zhanhui Kang

Pioneering efforts have verified the effectiveness of the diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in corpus for user interest prediction, leading to the ignorance of the user's generalized preference contained within other items, thereby remaining constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to jointly take full advantage of the diffusion-generating user preferences on all items. Specifically, PDRec first infers the users' dynamic preferences on all items via a time-interval diffusion model and proposes a Historical Behavior Reweighting (HBR) mechanism to identify the high-quality behaviors and suppress noisy behaviors. In addition to the observed items, PDRec proposes a Diffusion-based Positive Augmentation (DPA) strategy to leverage the top-ranked unobserved items as the potential positive samples, bringing in informative and diverse soft signals to alleviate data sparsity. To alleviate the false negative sampling issue, PDRec employs Noise-free Negative Sampling (NNS) to select stable negative samples for ensuring effective model optimization. Extensive experiments and analyses on four datasets have verified the superiority of the proposed PDRec over the state-of-the-art baselines and showcased the universality of PDRec as a flexible plugin for commonly-used sequential encoders in different recommendation scenarios. The code is available in https://github.com/hulkima/PDRec.


IJCAI Conference 2024 Conference Paper

SeeDRec: Sememe-based Diffusion for Sequential Recommendation

Haokai Ma
Ruobing Xie
Lei Meng
Yimeng Yang
Xingwu Sun
Zhanhui Kang

Inspired by the power of Diffusion Models (DM) verified in various fields, some pioneering works have started to explore DM in recommendation. However, these prevailing endeavors commonly implement diffusion on item indices, leading to the increasing time complexity, the lack of transferability, and the inability to fully harness item semantic information. To tackle these challenges, we propose SeeDRec, a sememe-based diffusion framework for sequential recommendation (SR). Specifically, inspired by the notion of sememe in NLP, SeeDRec first defines a similar concept of recommendation sememe to represent the minimal interest unit and upgrades the specific diffusion objective from the item level to the sememe level. With the Sememe-to-Interest Diffusion Model (S2IDM), SeeDRec can accurately capture the user's diffused interest distribution learned from both local interest evolution and global interest generalization while maintaining low computational costs. Subsequently, an Interest-aware Prompt-enhanced (IPE) strategy is proposed to better guide each user's sequential behavior modeling via the learned user interest distribution. Extensive experiments on nine SR datasets and four cross-domain SR datasets verify its effectiveness and universality. The code is available in https://github.com/hulkima/SeeDRec.


ICLR Conference 2024 Conference Paper

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin
Shihao Liang
Yining Ye
Kunlun Zhu
Lan Yan
Yaxi Lu
Yankai Lin 0001
Xin Cong

Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT. Specifically, the construction can be divided into three stages: (i) API collection: we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub; (ii) instruction generation: we prompt ChatGPT to generate diverse instructions involving these APIs, covering both single-tool and multi-tool scenarios; (iii) solution path annotation: we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To enhance the reasoning capabilities of LLMs, we develop a novel depth-first search-based decision tree algorithm. It enables LLMs to evaluate multiple reasoning traces and expand the search space. Moreover, to evaluate the tool-use capabilities of LLMs, we develop an automatic evaluator: ToolEval. Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it with a neural API retriever to recommend appropriate APIs for each instruction. Experiments show that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. Our ToolLLaMA also demonstrates strong zero-shot generalization ability in an out-of-distribution tool-use dataset: APIBench.


ICML Conference 2024 Conference Paper

ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback

Ganqu Cui
Lifan Yuan
Ning Ding 0002
Guanming Yao
Bingxiang He
Wei Zhu 0016
Yuan Ni
Guotong Xie

Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small sizes or limited topics of current datasets. This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality AI feedback automatically for a scalable alternative. Specifically, we identify scale and diversity as the key factors for feedback data to take effect. Accordingly, we first broaden instructions and responses in both amount and breadth to encompass a wider range of user-assistant interactions. Then, we meticulously apply a series of techniques to mitigate annotation biases for more reliable AI feedback. We finally present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset, which contains over 1 million pieces of GPT-4 feedback for 250k user-assistant conversations from various aspects. Built upon UltraFeedback, we align a LLaMA-based model by best-of-$n$ sampling and reinforcement learning, demonstrating its exceptional performance on chat benchmarks. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research.


AAAI Conference 2023 Conference Paper

Visually Grounded Commonsense Knowledge Acquisition

Yuan Yao
Tianyu Yu
Ao Zhang
Mengdi Li
Ruobing Xie
Cornelius Weber
Zhiyuan Liu
Hai-Tao Zheng

Large-scale commonsense knowledge bases empower a broad range of AI applications, where the automatic extraction of commonsense knowledge (CKE) is a fundamental and challenging problem. CKE from text is known for suffering from the inherent sparsity and reporting bias of commonsense in text. Visual perception, on the other hand, contains rich commonsense knowledge about real-world entities, e.g., (person, can_hold, bottle), which can serve as promising sources for acquiring grounded commonsense knowledge. In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances. To address the problem, CLEVER leverages vision-language pre-training models for deep understanding of each image in the bag, and selects informative instances from the bag to summarize commonsense entity relations via a novel contrastive attention mechanism. Comprehensive experimental results in held-out and human evaluation show that CLEVER can extract commonsense knowledge in promising quality, outperforming pre-trained language model-based methods by 3.9 AUC and 6.4 mAUC points. The predicted commonsense scores show strong correlation with human judgment with a 0.78 Spearman coefficient. Moreover, the extracted commonsense can also be grounded into images with reasonable interpretability. The data and codes can be obtained at https://github.com/thunlp/CLEVER.


NeurIPS Conference 2021 Conference Paper

Curriculum Disentangled Recommendation with Noisy Multi-feedback

Hong Chen
Yudong Chen
Xin Wang
Ruobing Xie
Rui Wang
Feng Xia
Wenwu Zhu

Learning disentangled representations for user intentions from multi-feedback (i.e., positive and negative feedback) can enhance the accuracy and explainability of recommendation algorithms. However, learning such disentangled representations from multi-feedback data is challenging because i) multi-feedback is complex: there exist complex relations among different types of feedback (e.g., click, unclick, and dislike) as well as various user intentions, and ii) multi-feedback is noisy: there exists noisy (useless) information both in features and labels, which may deteriorate the recommendation performance. Existing works on disentangled representation learning only focus on positive feedback, failing to handle the complex relations and noise hidden in multi-feedback data. To solve this problem, in this work we propose a Curriculum Disentangled Recommendation (CDR) model that is capable of efficiently learning disentangled representations from complex and noisy multi-feedback for better recommendation. Concretely, we design a co-filtering dynamic routing mechanism that simultaneously captures the complex relations among different behavioral feedback and user intentions and denoises the representations at the feature level. We then present an adjustable self-evaluating curriculum that is able to evaluate sample difficulties for better model training and conduct denoising at the label level by disregarding useless information. Our extensive experiments on several real-world datasets demonstrate that the proposed CDR model can significantly outperform several state-of-the-art methods in terms of recommendation accuracy.


AAAI Conference 2021 Conference Paper

Hierarchical Reinforcement Learning for Integrated Recommendation

Ruobing Xie
Shaoliang Zhang
Rui Wang
Feng Xia
Leyu Lin

Integrated recommendation aims to jointly recommend heterogeneous items in the main feed from different sources via multiple channels, which needs to capture user preferences on both item and channel levels. It has been widely used in practical systems by billions of users, while few works concentrate on the integrated recommendation systematically. In this work, we propose a novel Hierarchical reinforcement learning framework for integrated recommendation (HRL-Rec), which divides the integrated recommendation into two tasks to recommend channels and items sequentially. The low-level agent is a channel selector, which generates a personalized channel list. The high-level agent is an item recommender, which recommends specific items from heterogeneous channels under the channel constraints. We design various rewards for both recommendation accuracy and diversity, and propose four losses for fast and stable model convergence. We also conduct an online exploration for sufficient training. In experiments, we conduct extensive offline and online experiments on a billion-level real-world dataset to show the effectiveness of HRL-Rec. HRL-Rec has also been deployed on WeChat Top Stories, affecting millions of users. The source codes are released in https://github.com/modriczhang/HRL-Rec.


IJCAI Conference 2020 Conference Paper

Deep Feedback Network for Recommendation

Ruobing Xie
Cheng Ling
Yalong Wang
Rui Wang
Feng Xia
Leyu Lin

Both explicit and implicit feedbacks can reflect user opinions on items, which are essential for learning user preferences in recommendation. However, most current recommendation algorithms merely focus on implicit positive feedbacks (e.g., click), ignoring other informative user behaviors. In this paper, we aim to jointly consider explicit/implicit and positive/negative feedbacks to learn user unbiased preferences for recommendation. Specifically, we propose a novel Deep feedback network (DFN) modeling click, unclick and dislike behaviors. DFN has an internal feedback interaction component that captures fine-grained interactions between individual behaviors, and an external feedback interaction component that uses precise but relatively rare feedbacks (click/dislike) to extract useful information from rich but noisy feedbacks (unclick). In experiments, we conduct both offline and online evaluations on a real-world recommendation system WeChat Top Stories used by millions of users. The significant improvements verify the effectiveness and robustness of DFN. The source code is in https://github.com/qqxiaochongqq/DFN.


IJCAI Conference 2020 Conference Paper

Internal and Contextual Attention Network for Cold-start Multi-channel Matching in Recommendation

Ruobing Xie
Zhijie Qiu
Jun Rao
Yi Liu
Bo Zhang
Leyu Lin

Real-world integrated personalized recommendation systems usually deal with millions of heterogeneous items. It is extremely challenging to conduct full corpus retrieval with complicated models due to the tremendous computation costs. Hence, most large-scale recommendation systems consist of two modules: a multi-channel matching module to efficiently retrieve a small subset of candidates, and a ranking module for precise personalized recommendation. However, multi-channel matching usually suffers from cold-start problems when adding new channels or new data sources. To solve this issue, we propose a novel Internal and contextual attention network (ICAN), which highlights channel-specific contextual information and feature field interactions between multiple channels. In experiments, we conduct both offline and online evaluations with case studies on a real-world integrated recommendation system. The significant improvements confirm the effectiveness and robustness of ICAN, especially for cold-start channels. Currently, ICAN has been deployed on WeChat Top Stories used by millions of users. The source code can be obtained from https://github.com/zhijieqiu/ICAN.


AAAI Conference 2020 Conference Paper

Neural Snowball for Few-Shot Relation Learning

Tianyu Gao
Xu Han
Ruobing Xie
Zhiyuan Liu
Fen Lin
Leyu Lin
Maosong Sun

Knowledge graphs typically undergo open-ended growth of new relations. This cannot be well handled by relation extraction that focuses on pre-defined relations with sufficient training data. To address new relations with few-shot instances, we propose a novel bootstrapping approach, Neural Snowball, to learn new relations by transferring semantic knowledge about existing relations. More specifically, we use Relational Siamese Networks (RSN) to learn the metric of relational similarities between instances based on existing relations and their labeled data. Afterwards, given a new relation and its few-shot instances, we use RSN to accumulate reliable instances from unlabeled corpora; these instances are used to train a relation classifier, which can further identify new facts of the new relation. The process is conducted iteratively like a snowball. Experiments show that our model can gather high-quality instances for better few-shot relation learning and achieves significant improvement compared to baselines. Codes and datasets are released on https://github.com/thunlp/Neural-Snowball.

PDF Details

AAAI Conference 2018 Conference Paper

Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning With Confidence

Ruobing Xie
Zhiyuan Liu
Fen Lin
Leyu Lin

Knowledge graphs (KGs), which could provide essential relational information between entities, have been widely utilized in various knowledge-driven applications. Since the overall human knowledge is innumerable that still grows explosively and changes frequently, knowledge construction and update inevitably involve automatic mechanisms with less human supervision, which usually bring in plenty of noises and conflicts to KGs. However, most conventional knowledge representation learning methods assume that all triple facts in existing KGs share the same significance without any noises. To address this problem, we propose a novel confidence-aware knowledge representation learning framework (CKRL), which detects possible noises in KGs while learning knowledge representations with confidence simultaneously. Specifically, we introduce the triple confidence to conventional translation-based methods for knowledge representation learning. To make triple confidence more flexible and universal, we only utilize the internal structural information in KGs, and propose three kinds of triple confidences considering both local and global structural information. In experiments, we evaluate our models on knowledge graph noise detection, knowledge graph completion and triple classification. Experimental results demonstrate that our confidence-aware models achieve significant and consistent improvements on all tasks, which confirms the capability of CKRL modeling confidence with structural information in both KG noise detection and knowledge representation learning.
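
As a rough illustration of attaching a triple confidence to a translation-based objective, the sketch below scales a margin-based ranking loss by a per-triple confidence value. It is not taken from the paper: the function names, the L1 energy, the margin value, and the way confidence enters the loss are all assumptions made for this example, and the abstract's three confidence estimators are not modeled.

```python
def translation_energy(h, r, t):
    """L1 energy ||h + r - t|| of a triple; lower means more plausible."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def confidence_weighted_loss(positive, corrupted, confidence, margin=1.0):
    """Margin-based ranking loss for one (positive, corrupted) triple pair.

    `confidence` is assumed to lie in [0, 1]; a low value shrinks the
    contribution of a possibly noisy triple to training.
    """
    gap = margin + translation_energy(*positive) - translation_energy(*corrupted)
    return confidence * max(0.0, gap)


# Hypothetical 2-dimensional embeddings for head, relation, and tail.
positive = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])   # energy 0.0
corrupted = ([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])  # energy 0.5
loss = confidence_weighted_loss(positive, corrupted, confidence=0.5)
```

With a confidence of 0, the triple contributes nothing to the loss, which is the intuition behind down-weighting suspected noise.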

PDF Details

IJCAI Conference 2017 Conference Paper

Image-embodied Knowledge Representation Learning

Ruobing Xie
Zhiyuan Liu
Huanbo Luan
Maosong Sun

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

PDF Details

IJCAI Conference 2017 Conference Paper

Iterative Entity Alignment via Joint Knowledge Embeddings

Hao Zhu
Ruobing Xie
Zhiyuan Liu
Maosong Sun

Entity alignment aims to link entities and their counterparts among multiple knowledge graphs (KGs). Most existing methods typically rely on external information of entities such as Wikipedia links and require costly manual feature construction to complete alignment. In this paper, we present a novel approach for entity alignment via joint knowledge embeddings. Our method jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. During this process, we can align entities according to their semantic distance in this joint semantic space. More specifically, we present an iterative and parameter sharing method to improve alignment performance. Experiment results on real-world datasets show that, as compared to baselines, our method achieves significant improvements on entity alignment, and can further improve knowledge graph completion performance on various KGs with the favor of joint knowledge embeddings.
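
A minimal sketch of the alignment step described in this abstract, assuming entities from two KGs have already been embedded into one shared space: each source entity is matched to the target entity at the smallest distance. The entity names, vectors, and the choice of L1 distance are hypothetical, and the iterative and parameter-sharing parts of the method are not shown.

```python
def l1_distance(a, b):
    """L1 distance between two embedding vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def align_entity(source_vec, target_entities):
    """Return the name of the target entity closest to `source_vec`.

    `target_entities` maps entity names to vectors in the joint space.
    """
    return min(
        target_entities,
        key=lambda name: l1_distance(source_vec, target_entities[name]),
    )


# Hypothetical embeddings of entities from a second knowledge graph.
targets = {"Paris_fr": [1.0, 0.0], "Berlin_de": [0.0, 1.0]}
match = align_entity([0.9, 0.1], targets)
```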

PDF Details

IJCAI Conference 2017 Conference Paper

Lexical Sememe Prediction via Word Embeddings and Matrix Factorization

Ruobing Xie
Xingchi Yuan
Zhiyuan Liu
Maosong Sun

Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and form linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we for the first time explore to automatically predict lexical sememes based on semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we take a real-world sememe knowledge base HowNet for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and annotation suggestion of new words and phrases.
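
One plausible way to predict sememes from word embeddings is to let annotated words vote for their sememes, weighted by cosine similarity to the target word. The sketch below shows that idea only; the scoring rule, function names, and toy data are assumptions for illustration and are not taken from the paper, and the matrix factorization component is not covered.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_sememes(word_vec, annotated, top_k=2):
    """Rank sememes by similarity-weighted votes from annotated words.

    `annotated` maps each annotated word to (embedding, set of sememes).
    """
    scores = {}
    for vec, sememes in annotated.values():
        sim = cosine(word_vec, vec)
        for sememe in sememes:
            scores[sememe] = scores.get(sememe, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical annotated vocabulary with 2-dimensional embeddings.
annotated = {
    "apple": ([1.0, 0.0], {"fruit"}),
    "car": ([0.0, 1.0], {"vehicle"}),
}
predicted = predict_sememes([0.9, 0.1], annotated, top_k=1)
```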

PDF Details

AAAI Conference 2016 Conference Paper

Representation Learning of Knowledge Graphs with Entity Descriptions

Ruobing Xie
Zhiyuan Liu
Jia Jia
Huanbo Luan
Maosong Sun

Representation learning (RL) of knowledge graphs aims to project both entities and relations into a continuous low-dimensional space. Most methods concentrate on learning representations with knowledge triples indicating relations between entities. In fact, in most knowledge graphs there are usually concise descriptions for entities, which cannot be well utilized by existing methods. In this paper, we propose a novel RL method for knowledge graphs taking advantages of entity descriptions. More specifically, we explore two encoders, including continuous bag-of-words and deep convolutional neural models to encode semantics of entity descriptions. We further learn knowledge representations with both triples and descriptions. We evaluate our method on two tasks, including knowledge graph completion and entity classification. Experimental results on real-world datasets show that, our method outperforms other baselines on the two tasks, especially under the zero-shot setting, which indicates that our method is capable of building representations for novel entities according to their descriptions. The source code of this paper can be obtained from https://github.com/xrb92/DKRL.

PDF Details

IJCAI Conference 2016 Conference Paper

Representation Learning of Knowledge Graphs with Hierarchical Types

Ruobing Xie
Zhiyuan Liu
Maosong Sun

Representation learning of knowledge graphs aims to encode both entities and relations into a continuous low-dimensional vector space. Most existing methods only concentrate on learning representations with structured information located in triples, regardless of the rich information located in hierarchical types of entities, which could be collected in most knowledge graphs. In this paper, we propose a novel method named Type-embodied Knowledge Representation Learning (TKRL) to take advantages of hierarchical entity types. We suggest that entities should have multiple representations in different types. More specifically, we consider hierarchical types as projection matrices for entities, with two type encoders designed to model hierarchical structures. Meanwhile, type information is also utilized as relation-specific type constraints. We evaluate our models on two tasks including knowledge graph completion and triple classification, and further explore the performances on long-tail dataset. Experimental results show that our models significantly outperform all baselines on both tasks, especially with long-tail distribution. It indicates that our models are capable of capturing hierarchical type information which is significant when constructing representations of knowledge graphs. The source code of this paper can be obtained from https://github.com/thunlp/TKRL.

PDF Details