Arrow Research search

Author name cluster

Dawei Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
1 author row

Possible papers

31

AAAI Conference 2026 Conference Paper

AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization

  • Qiyang Li
  • Rui Kong
  • Yuchen Li
  • Hengyi Cai
  • Shuaiqiang Wang
  • Linghe Kong
  • Guihai Chen
  • Dawei Yin

The integration of dynamic, sparse structures like Mixture-of-Experts (MoE) with parameter-efficient adapters (e.g., LoRA) is a powerful technique for enhancing Large Language Models (LLMs). However, this architectural enhancement comes at a steep cost: despite minimal increases in computational load, the inference latency often skyrockets, with decoding slowing by more than 2.5x. Through a fine-grained performance analysis, we pinpoint the primary bottleneck not in the computation itself, but in the severe overhead from fragmented, sequential CUDA kernel launches required for conventional dynamic routing. To address this challenge, we introduce AdaFuse, a framework built on a tight co-design between the algorithm and the underlying hardware system to enable efficient dynamic adapter execution. Departing from conventional layer-wise or block-wise routing, AdaFuse employs a token-level pre-gating strategy, which makes a single, global routing decision for all adapter layers before a token is processed. This "decide-once, apply-everywhere" approach effectively staticizes the execution path for each token, creating an opportunity for holistic optimization. We capitalize on this by developing a custom CUDA kernel that performs a fused switching operation, merging the parameters of all selected LoRA adapters into the backbone model in a single, efficient pass. Experimental results on popular open-source LLMs show that AdaFuse achieves accuracy on par with state-of-the-art dynamic adapters while cutting decoding latency by over 2.4x, thereby bridging the gap between model capability and inference efficiency.
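
The "decide-once, apply-everywhere" idea can be illustrated with a toy sketch. Everything below (function names, shapes, NumPy in place of the paper's fused CUDA kernel) is an invented simplification, not the authors' implementation:

```python
# Toy sketch of token-level pre-gating: one global routing decision per token,
# after which all selected LoRA adapter deltas are merged into the backbone in
# a single pass (the paper fuses this step into one CUDA kernel).
import numpy as np

def pre_gate(token_embedding, router_weights, k=2):
    """Make a single, global routing decision for a token: pick top-k adapters."""
    scores = router_weights @ token_embedding          # one score per adapter
    return np.argsort(scores)[-k:]                     # indices of selected adapters

def fused_merge(backbone_w, lora_As, lora_Bs, selected):
    """Merge all selected low-rank deltas (B @ A) into the backbone at once."""
    merged = backbone_w.copy()
    for i in selected:
        merged += lora_Bs[i] @ lora_As[i]              # rank-r update per adapter
    return merged

rng = np.random.default_rng(0)
d, r, n_adapters = 8, 2, 4
token = rng.normal(size=d)
router = rng.normal(size=(n_adapters, d))
W = rng.normal(size=(d, d))
As = [rng.normal(size=(r, d)) for _ in range(n_adapters)]
Bs = [rng.normal(size=(d, r)) for _ in range(n_adapters)]

chosen = pre_gate(token, router)                       # decide once...
W_merged = fused_merge(W, As, Bs, chosen)              # ...apply everywhere
```

Because routing happens once per token, the adapter merge runs in one pass rather than one kernel launch per layer, which is exactly the launch overhead the abstract identifies.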

AAAI Conference 2026 Conference Paper

Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

  • Xiaolong Wei
  • Yuehu Dong
  • Xingliang Wang
  • Xingyu Zhang
  • Zhejun Zhao
  • Dongdong Shen
  • Long Xia
  • Dawei Yin

Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), systematically enhancing the Planner's tool selection accuracy and global planning awareness through structured DAG-based planning. When integrated with a capable executor, our framework achieves state-of-the-art performance on the StableToolBench benchmark for complex user queries, demonstrating superior end-to-end execution capabilities and robust handling of intricate multi-tool workflows.
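
The planner-centric idea, emitting a global tool DAG up front and then executing it, can be sketched with Python's standard-library topological sorter; the tool names and dependencies below are invented examples, not drawn from ComplexTool-Plan:

```python
# A hypothetical tool plan as a DAG: each tool maps to the set of tools it
# depends on. The executor runs tools in topological order, so independent
# branches (flights, hotels) require no incremental ReAct-style decisions.
from graphlib import TopologicalSorter

plan = {
    "search_flights": set(),
    "search_hotels": set(),
    "currency_convert": {"search_flights", "search_hotels"},
    "summarize": {"currency_convert"},
}

order = list(TopologicalSorter(plan).static_order())
```

Planning globally before executing is what lets the framework avoid the local-optimization traps of step-by-step decision-making.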

AAAI Conference 2026 Conference Paper

Efficient Thought Space Exploration Through Strategic Intervention

  • Ziheng Li
  • Hengyi Cai
  • Xiaochi Wei
  • Yuchen Li
  • Shuaiqiang Wang
  • Zhi-Hong Deng
  • Dawei Yin

While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs through exhaustive sampling. Through analyzing decoding trajectories, we observe that most next-token predictions align well with the golden output, except for a few critical tokens that lead to deviations. Inspired by this phenomenon, we propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components: 1) a hinter (powerful LLM) that provides probabilistic guidance at critical decision points, and 2) a practitioner (efficient smaller model) that executes major reasoning steps. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), a theoretically-grounded metric that dynamically identifies intervention points by quantifying the divergence between the practitioner's reasoning trajectory and the hinter's expected distribution in a tree-structured probabilistic space. Through iterative tree updates guided by DIR, HPR reweights promising reasoning paths while deprioritizing low-probability branches. Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs: it achieves performance comparable to self-consistency and MCTS baselines while decoding only 1/5 as many tokens, and outperforms existing methods by up to 5.1% absolute accuracy while maintaining similar or lower FLOPs.
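
The intervention idea can be sketched in a few lines. KL divergence below is a stand-in for the paper's DIR metric, and the toy distributions are invented; the point is only that most steps agree and one deviating step triggers the hinter:

```python
# Flag "critical tokens" where a small practitioner model's next-token
# distribution diverges sharply from the hinter's expected distribution.
import math

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def intervention_points(practitioner_dists, hinter_dists, threshold=0.5):
    """Return step indices where divergence exceeds the threshold."""
    return [t for t, (p, q) in enumerate(zip(practitioner_dists, hinter_dists))
            if kl_divergence(p, q) > threshold]

# Steps 0 and 2 agree closely; step 1 deviates and should trigger the hinter.
practitioner = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.15]]
hinter       = [[0.88, 0.12], [0.9, 0.1], [0.8, 0.2]]
critical = intervention_points(practitioner, hinter)  # -> [1]
```

Intervening only at such points is what lets the practitioner handle the bulk of decoding while the hinter is consulted sparingly.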

AAAI Conference 2026 Conference Paper

Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

  • Wenda Wei
  • Yu-An Liu
  • Ruqing Zhang
  • Jiafeng Guo
  • Lixin Su
  • Shuaiqiang Wang
  • Dawei Yin
  • Maarten de Rijke

Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps. This often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current reasoning is from the answer and how well it addresses the question. To optimize reasoning under these bidirectional signals, we adopt a multi-objective reinforcement learning framework with a cascading reward structure that emphasizes early trajectory alignment. Empirical results on seven question answering benchmarks demonstrate that Bi-RAR surpasses previous methods and enables efficient interaction and reasoning with the search engine during training and inference.
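
The bidirectional scoring idea can be sketched as two negative log-probabilities, one facing the answer and one facing the question. The `lm_logprob` function below is a hypothetical word-overlap stand-in for a real language model's conditional generation probability:

```python
# Sketch: approximate information distance with negative log-probabilities,
# measuring both how far a reasoning step is from the answer (forward) and
# how well it addresses the question (backward).
import math

def lm_logprob(target, context):
    """Toy conditional model: probability rises with word overlap (stand-in only)."""
    t, c = set(target.split()), set(context.split())
    overlap = len(t & c) / max(len(t), 1)
    return math.log(0.05 + 0.9 * overlap)

def bidirectional_distance(question, step, answer):
    forward = -lm_logprob(answer, step)     # distance from the answer
    backward = -lm_logprob(step, question)  # relevance to the question
    return forward + backward

good = bidirectional_distance("capital of France",
                              "Paris is the capital of France", "Paris")
bad = bidirectional_distance("capital of France",
                             "bananas are yellow", "Paris")
```

An on-topic step scores a smaller bidirectional distance than an off-topic one, which is the per-step signal the multi-objective reward builds on.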

TIST Journal 2025 Journal Article

Graph Machine Learning in the Era of Large Language Models (LLMs)

  • Shijie Wang
  • Jiani Huang
  • Zhikai Chen
  • Yu Song
  • Wenzhuo Tang
  • Haitao Mao
  • Wenqi Fan
  • Hui Liu

Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graphs. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications, such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML’s generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations, such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding for researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterophily and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

NeurIPS Conference 2025 Conference Paper

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

  • Yiqun Chen
  • Lingyong Yan
  • Weiwei Sun
  • Xinyu Ma
  • Yi Zhang
  • Shuaiqiang Wang
  • Dawei Yin
  • Yiming Yang

Retrieval-augmented generation (RAG) is widely utilized to incorporate external knowledge into large language models, thereby enhancing factuality and reducing hallucinations in question-answering (QA) tasks. A standard RAG pipeline consists of several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tuning, which can lead to misalignments between the objectives of individual components and the overarching aim of generating accurate answers. Although recent efforts have explored using reinforcement learning (RL) to optimize specific RAG components, these approaches often focus on simple pipelines with only two components or do not adequately address the complex interdependencies and collaborative interactions among the modules. To overcome these limitations, we propose treating the complex RAG pipeline with multiple components as a multi-agent cooperative task, in which each component can be regarded as an RL agent. Specifically, we present MMOA-RAG, a Multi-Module joint Optimization Algorithm for RAG, which employs multi-agent reinforcement learning to harmonize all agents' goals toward a unified reward, such as the F1 score of the final answer. Experiments conducted on various QA benchmarks demonstrate that MMOA-RAG effectively boosts the overall performance of the pipeline and outperforms existing baselines. Furthermore, comprehensive ablation studies validate the contributions of individual components and demonstrate that MMOA-RAG can be adapted to different RAG pipelines and benchmarks. The code of MMOA-RAG is available at https://github.com/chenyiqun/MMOA-RAG.
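
The unified-reward idea can be sketched in a few lines: every module is treated as a cooperative agent and receives the same final-answer F1 as its reward. The agent names and toy answers below are illustrative, not from the paper:

```python
# Cooperative reward assignment: compute token-level F1 of the final answer
# and broadcast it to every module in the RAG pipeline.
from collections import Counter

def token_f1(prediction, reference):
    p, r = Counter(prediction.split()), Counter(reference.split())
    common = sum((p & r).values())
    if common == 0:
        return 0.0
    precision, recall = common / sum(p.values()), common / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def shared_rewards(agents, prediction, reference):
    """All agents share one reward: the final answer's F1 score."""
    reward = token_f1(prediction, reference)
    return {agent: reward for agent in agents}

rewards = shared_rewards(["rewriter", "retriever", "filter", "generator"],
                         "the eiffel tower is in paris", "eiffel tower paris")
```

Because every module is optimized against the same end-to-end signal, no component can improve its local objective at the expense of the final answer.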

AAAI Conference 2025 Conference Paper

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

  • Yutong Wu
  • Di Huang
  • Wenxuan Shi
  • Wei Wang
  • Yewen Pu
  • Lingzhe Gao
  • Shihao Liu
  • Ziyuan Nan

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a fine-tuned open-source model to generate additional data to augment its instruction-tuning dataset. We make two observations: (1) A code snippet can serve as the response to different instructions. (2) Instruction-tuned code LLMs perform better at translating code into instructions than the reverse. Based on these observations, we propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions of code responses from its own training dataset. The additional instruction-response pairs are added to the original dataset, and a stronger code LLM can be obtained by fine-tuning on the augmented dataset. We empirically validate Inverse-Instruct on a range of open-source code models (e.g. CodeLlama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000 and MultiPL-E), showing it consistently improves the base models.

NeurIPS Conference 2025 Conference Paper

Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers

  • Zhengliang Shi
  • Lingyong Yan
  • Dawei Yin
  • Suzan Verberne
  • Maarten de Rijke
  • Zhaochun Ren

Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. However, effectively enabling LLMs to seek accurate knowledge in complex tasks remains a challenge due to the complexity of multi-hop queries as well as the irrelevant retrieved content. To address these limitations, we propose ExSearch, an agentic search framework in which the LLM learns to retrieve useful information as the reasoning unfolds through a self-incentivized process. At each step, the LLM decides what to retrieve (thinking), triggers an external retriever (search), and extracts fine-grained evidence (recording) to support next-step reasoning. To equip the LLM with this capability, we adopt a Generalized Expectation-Maximization algorithm. In the E-step, the LLM generates multiple search trajectories and assigns an importance weight to each; the M-step trains the LLM on them with a re-weighted loss function. This creates a self-incentivized loop, where the LLM iteratively learns from its own generated data, progressively improving itself for search. We further theoretically analyze this training process, establishing convergence guarantees. Extensive experiments on four knowledge-intensive benchmarks show that ExSearch substantially outperforms baselines, e.g., +7.8% improvement on exact match score. Motivated by these promising results, we introduce ExSearch-Zoo, an extension of our method to broader scenarios, to facilitate future work.
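
The E-step/M-step loop described above can be sketched numerically. The softmax-of-reward weighting and the toy NLL values are invented simplifications of the paper's derived importance weights:

```python
# Sketch of the self-incentivized loop: weight sampled trajectories (E-step),
# then train with a re-weighted negative log-likelihood loss (M-step).
import math

def e_step(trajectory_rewards):
    """Assign a normalized importance weight to each sampled trajectory."""
    exp = [math.exp(r) for r in trajectory_rewards]
    z = sum(exp)
    return [e / z for e in exp]

def m_step_loss(trajectory_nlls, weights):
    """Re-weighted training loss: scale each trajectory's NLL by its weight."""
    return sum(w * nll for w, nll in zip(weights, trajectory_nlls))

weights = e_step([1.0, 0.2, -0.5])        # best trajectory gets the largest weight
loss = m_step_loss([2.0, 3.1, 4.0], weights)
```

Iterating these two steps is what makes the loop self-incentivized: the model's own best trajectories dominate the next round of training.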

TIST Journal 2025 Journal Article

Open Spatio-Temporal Foundation Models for Traffic Prediction

  • Zhonghang Li
  • Long Xia
  • Lei Shi
  • Yong Xu
  • Dawei Yin
  • Chao Huang

Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in handling the spatial and temporal heterogeneity of traffic data, coupled with the significant distribution shift across time and space. In this work, we aim to unlock new possibilities for building versatile, resilient and adaptive spatio-temporal foundation models for traffic prediction. We introduce OpenCity, a foundation model that captures underlying spatio-temporal patterns from diverse data, facilitating zero-shot generalization across urban environments. OpenCity integrates Transformers with graph neural networks to capture complex spatio-temporal dependencies in traffic data. By pre-training OpenCity on large-scale, heterogeneous traffic data from web platforms, we enable the model to learn rich, generalizable representations that can be seamlessly applied to a wide range of traffic forecasting scenarios. Experiments show OpenCity excels in zero-shot prediction and exhibits scaling laws, highlighting its potential as a universal one-for-all traffic prediction solution adaptable to new urban contexts with minimal overhead. Source codes are available at: https://github.com/HKUDS/OpenCity

AAAI Conference 2025 Conference Paper

Towards S²-Challenges Underlying LLM-Based Augmentation for Personalized News Recommendation

  • Shicheng Wang
  • Hengzhu Tang
  • Li Gao
  • Shu Guo
  • Suqi Cheng
  • Junfeng Wang
  • Dawei Yin
  • Tingwen Liu

Personalized news recommendation aims to recommend candidate news to the target user. Since the data and knowledge involved in traditional recommender systems are restricted, recent studies utilize large language models (LLMs) to generate news articles and augment the original dataset. However, despite the superiority of LLM-based augmentation in news recommendation, previous studies still suffer from two serious problems, i.e., structure-level deficiency and semantic-level noise. Since LLM-based augmentation is mainly implemented at the semantic level, collaborative signals, the critical structure information in recommender systems, are neglected during the generation process. Thus, it is inappropriate to perform recommendation based on the augmented user-news bipartite graph, which manifests as multiple isolated cliques. Moreover, utilizing the open-world knowledge of LLMs to extend closed systems will inevitably introduce noise, leading to difficulties in mining users' real preferences. In this paper, we propose a novel Structure-aware and Semantic-aware approach for LLM-Empowered personalized News Recommendation, named S²LENR, to tackle the mentioned problems. Specifically, we propose a structure-aware refinement module to inject collaborative information in a parametric way, in order to construct a valid augmented bipartite graph. Besides, we devise a semantic-aware denoising module utilizing a contrastive learning paradigm to overcome the negative effects of noise. Finally, we calculate the relevance score between the target user and candidate news representations. We conduct experiments on two real-world news recommendation datasets, MIND-Large and MIND-Small, and empirical results demonstrate the effectiveness of our approach from multiple perspectives.

NeurIPS Conference 2024 Conference Paper

Cross-model Control: Improving Multiple Large Language Models in One-time Training

  • Jiayi Wu
  • Hao Sun
  • Hengyi Cai
  • Lixin Su
  • Shuaiqiang Wang
  • Dawei Yin
  • Xiang Li
  • Ming Gao

The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards, such as instruction following or avoiding the output of sensitive information from the real world. However, how to reuse the fine-tuning outcomes of one model for other models to reduce training costs remains a challenge. To bridge this gap, we introduce Cross-model Control (CMC), a method that improves multiple LLMs in one-time training with a portable tiny language model. Specifically, we have observed that the logit shift before and after fine-tuning is remarkably similar across different models. Based on this insight, we incorporate a tiny language model with a minimal number of parameters. By training alongside a frozen template LLM, the tiny model gains the capability to alter the logits output by the LLMs. To make this tiny language model applicable to models with different vocabularies, we propose a novel token mapping strategy named PM-MinED. We have conducted extensive experiments on instruction tuning and unlearning tasks, demonstrating the effectiveness of CMC. Our code is available at https://github.com/wujwyi/CMC.
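
The core observation, that one learned logit shift can steer several backbones, reduces to a simple additive adjustment. Shapes and names below are illustrative; a real tiny model would produce the delta conditioned on context rather than hold it fixed:

```python
# Sketch: a frozen backbone's logits are adjusted by a delta from a portable
# tiny model, so a shift trained once transfers across backbones.
import numpy as np

def steer_logits(backbone_logits, tiny_model_delta):
    """Apply the tiny model's learned logit shift to any backbone's output."""
    return backbone_logits + tiny_model_delta

rng = np.random.default_rng(1)
vocab = 16
delta = rng.normal(scale=0.1, size=vocab)    # trained once, reused everywhere
for _ in range(3):                           # three different "backbones"
    logits = rng.normal(size=vocab)
    steered = steer_logits(logits, delta)
```

The PM-MinED token mapping in the abstract handles the harder case where the backbones do not even share a vocabulary; this sketch assumes they do.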

TIST Journal 2024 Journal Article

Explainability for Large Language Models: A Survey

  • Haiyan Zhao
  • Hanjie Chen
  • Fan Yang
  • Ninghao Liu
  • Huiqi Deng
  • Hengyi Cai
  • Shuaiqiang Wang
  • Dawei Yin

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.

NeurIPS Conference 2024 Conference Paper

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

  • Pengyue Jia
  • Yiding Liu
  • Xiaopeng Li
  • Yuhao Wang
  • Yantong Du
  • Xiao Han
  • Xuetao Wei
  • Shuaiqiang Wang

Worldwide geolocalization aims to locate, at the coordinate level, the precise location of photos taken anywhere on Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context. They may easily confuse distant images with similar visual contents, or cannot adapt to various locations worldwide with different amounts of relevant data. To resolve these limitations, we propose G3, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification, to optimize both the retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS and textual descriptions, which allows us to capture location-aware semantics for retrieving nearby images for a given query. During Geo-diversification, we leverage a prompt ensembling method that is robust to inconsistent retrieval performance for different image queries. Finally, we combine both retrieved and generated GPS candidates in Geo-verification for location prediction. Experiments on two well-established datasets, IM2GPS3k and YFCC4k, verify the superiority of G3 compared to other state-of-the-art methods. Our code is available online at https://github.com/Applied-Machine-Learning-Lab/G3 for reproduction.

IJCAI Conference 2024 Conference Paper

GS2P: A Generative Pre-trained Learning to Rank Model with Over-parameterization for Web-Scale Search (Extended Abstract)

  • Yuchen Li
  • Haoyi Xiong
  • Linghe Kong
  • Jiang Bian
  • Shuaiqiang Wang
  • Guihai Chen
  • Dawei Yin

While Learning to Rank (LTR) is widely employed in web search to prioritize pertinent webpages among the retrieved contents for input queries, traditional LTR models face two principal obstacles that lead to subpar performance: 1) the lack of well-annotated query-webpage pairs with ranking scores covering search queries across the popularity spectrum, and 2) ill-trained models that are incapable of inducing generalized representations for LTR, culminating in overfitting. To tackle the above challenges, we propose a Generative Semi-supervised Pre-trained (GS2P) LTR model. Specifically, GS2P first generates pseudo-labels for the unlabeled samples using tree-based LTR models after a series of co-training procedures, then learns the representations of query-webpage pairs with self-attentive transformers via both discriminative and generative losses. Finally, GS2P boosts the performance of LTR by incorporating Random Fourier Features to over-parameterize the models into the "interpolating regime", so as to enjoy the further descent of generalization errors with learned representations. We conduct extensive offline experiments on a publicly available dataset and a real-world dataset collected from a large-scale search engine. The results show that GS2P achieves the best performance on both datasets compared to baselines. We also deploy GS2P at a large-scale web search engine with realistic traffic, where we can still observe significant improvement in real-world applications.
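
The Random Fourier Features step can be sketched directly: inputs are lifted into a much higher-dimensional random cosine basis so a simple head can interpolate the training data. Dimensions below are invented; only the RFF mapping itself is standard:

```python
# Random Fourier Features: map x to sqrt(2/D) * cos(Wx + b), with frequencies W
# drawn from a Gaussian (approximating an RBF kernel) and phases b uniform.
import numpy as np

def random_fourier_features(x, W, b):
    """Lift an input into a D-dimensional random cosine feature space."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

rng = np.random.default_rng(3)
d, D = 8, 512                       # D >> d over-parameterizes the model
W = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)
features = random_fourier_features(rng.normal(size=d), W, b)
```

Choosing D much larger than both the input dimension and the sample count is what pushes the ranker into the "interpolating regime" the abstract refers to.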

IJCAI Conference 2024 Conference Paper

MPGraf: a Modular and Pre-trained Graphformer for Learning to Rank at Web-scale (Extended Abstract)

  • Yuchen Li
  • Haoyi Xiong
  • Linghe Kong
  • Zeyi Sun
  • Hongyang Chen
  • Shuaiqiang Wang
  • Dawei Yin

Both Transformer and Graph Neural Networks (GNNs) have been used in learning to rank (LTR); however, they adhere to two distinct yet complementary problem formulations, i.e., ranking score regression based on query-webpage pairs and link prediction within query-webpage bipartite graphs, respectively. Though it is possible to pre-train GNNs or Transformers on source datasets and fine-tune them subject to sparsely annotated LTR datasets separately, the source-target distribution shifts across the pair and bipartite-graph domains make it extremely difficult to integrate these diverse models into a single LTR framework at web-scale. We introduce the novel MPGraf model, which utilizes a modular and capsule-based pre-training approach, aiming to cohesively incorporate the regression capacities of Transformers and the link prediction capabilities of GNNs. We conduct extensive experiments to evaluate the performance of MPGraf using real-world datasets collected from large-scale search engines. The results show that MPGraf can outperform baseline algorithms on several major metrics. Further, we deploy and evaluate MPGraf atop a large-scale search engine with realistic web traffic via A/B tests, where we can still observe significant improvement. MPGraf performs consistently in both offline and online evaluations.

NeurIPS Conference 2023 Conference Paper

Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking

  • Juanhui Li
  • Harry Shomer
  • Haitao Mao
  • Shenglai Zeng
  • Yao Ma
  • Neil Shah
  • Jiliang Tang
  • Dawei Yin

Link prediction attempts to predict whether an unseen edge exists based on only a portion of the graph. A flurry of methods has been created in recent years that attempt to make use of graph neural networks (GNNs) for this task. Furthermore, new and diverse datasets have also been created to better evaluate the effectiveness of these new models. However, multiple limitations currently exist that hinder our ability to properly evaluate these new methods. These include, but are not limited to: (1) the underreporting of performance on multiple baselines, (2) the lack of a unified data split and evaluation metric on some datasets, and (3) an unrealistic evaluation setting that produces negative samples that are easy to classify. To overcome these challenges we first conduct a fair comparison across prominent methods and datasets, utilizing the same dataset and hyperparameter settings. We then create a new real-world evaluation setting that samples difficult negative samples via multiple heuristics. The new evaluation setting helps promote new challenges and opportunities in link prediction by aligning the evaluation with real-world situations.
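
One heuristic for difficult negatives can be sketched on a toy graph: instead of uniformly random non-edges (often trivially easy), rank non-neighbors by common-neighbor count, since those are the candidates a heuristic scorer would confuse with true edges. The graph and function names below are invented examples:

```python
# Heuristic-based hard negative sampling for link prediction evaluation:
# among non-neighbors of u, prefer nodes sharing many neighbors with u.
def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def hard_negatives(adj, u, k=2):
    """Non-neighbors of u ranked by common-neighbor count (hardest first)."""
    candidates = [v for v in adj if v != u and v not in adj[u]]
    return sorted(candidates, key=lambda v: -common_neighbors(adj, u, v))[:k]

# Toy graph: a 4-cycle 0-1-3-2-0 plus a detached edge 4-5.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
negs = hard_negatives(adj, 0)   # node 3 shares two neighbors with node 0
```

Node 3 is the hardest negative for node 0 precisely because a common-neighbor heuristic would rank the non-edge (0, 3) highly.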

AAAI Conference 2023 Conference Paper

Feature-Level Debiased Natural Language Understanding

  • Yougang Lyu
  • Piji Li
  • Yechang Yang
  • Maarten de Rijke
  • Pengjie Ren
  • Yukun Zhao
  • Dawei Yin
  • Zhaochun Ren

Natural language understanding (NLU) models often rely on dataset biases rather than intended task-relevant features to achieve high performance on specific datasets. As a result, these models perform poorly on datasets outside the training distribution. Some recent studies address this issue by reducing the weights of biased samples during the training process. However, these methods still encode biased latent features in representations and neglect the dynamic nature of bias, which hinders model prediction. We propose an NLU debiasing method, named debiasing contrastive learning (DCT), to simultaneously alleviate the above problems based on contrastive learning. We devise a debiasing positive sampling strategy to mitigate biased latent features by selecting the least similar biased positive samples. We also propose a dynamic negative sampling strategy to capture the dynamic influence of biases by employing a bias-only model to dynamically select the most similar biased negative samples. We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance. We also verify that DCT can reduce biased latent features from the model's representation.
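
The two sampling strategies can be sketched with cosine similarity over toy vectors: among biased samples, the least similar one becomes the positive and the most similar one the negative. The vectors and function names are illustrative stand-ins for learned representations and the bias-only model's selections:

```python
# Sketch of DCT-style sampling: least-similar biased sample as the positive
# (to pull representations away from biased features), most-similar biased
# sample as the negative (to track the bias dynamically).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dct_sampling(anchor, biased_pool):
    sims = [cosine(anchor, x) for x in biased_pool]
    positive = biased_pool[int(np.argmin(sims))]   # least-similar biased positive
    negative = biased_pool[int(np.argmax(sims))]   # most-similar biased negative
    return positive, negative

anchor = np.array([1.0, 0.0])
pool = [np.array([0.9, 0.1]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
pos, neg = dct_sampling(anchor, pool)
```

Contrasting against these choices pushes the encoder to discard the features that biased samples share with the anchor.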

NeurIPS Conference 2023 Conference Paper

Learning to Tokenize for Generative Retrieval

  • Weiwei Sun
  • Lingyong Yan
  • Zheng Chen
  • Shuaiqiang Wang
  • Haichao Zhu
  • Pengjie Ren
  • Zhumin Chen
  • Dawei Yin

As a new paradigm in information retrieval, generative retrieval directly generates a ranked list of document identifiers (docids) for a given query using generative language models (LMs). How to assign each document a unique docid (denoted as document tokenization) is a critical problem, because it determines whether the generative retrieval model can precisely retrieve any document by simply decoding its docid. Most existing methods adopt rule-based tokenization, which is ad-hoc and does not generalize well. In contrast, in this paper we propose a novel document tokenization learning method, GenRet, which learns to encode the complete document semantics into docids. GenRet learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. We develop a progressive training scheme to capture the autoregressive nature of docids and diverse clustering techniques to stabilize the training process. Based on the semantic-embedded docids of any set of documents, the generative retrieval model can learn to generate the most relevant docid only according to the docids' semantic relevance to the queries. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets. GenRet establishes the new state-of-the-art on the NQ320K dataset. Compared to generative retrieval baselines, GenRet can achieve significant improvements on unseen documents. Moreover, GenRet can also outperform comparable baselines on MS MARCO and BEIR, demonstrating the method's generalizability.
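
The idea of mapping document semantics into short discrete docids can be sketched with per-position nearest-centroid quantization. This is a drastic, invented simplification of GenRet's learned discrete auto-encoding, intended only to show the shape of the output:

```python
# Toy docid tokenization: quantize a document embedding into a short sequence
# of discrete codes, one nearest-centroid assignment per codebook position.
import numpy as np

def tokenize_doc(embedding, codebooks):
    """Map an embedding to one discrete code per codebook (one docid token each)."""
    return tuple(int(np.argmin(np.linalg.norm(cb - embedding, axis=1)))
                 for cb in codebooks)

rng = np.random.default_rng(2)
codebooks = [rng.normal(size=(4, 3)) for _ in range(2)]  # 2 positions, 4 codes each
doc = rng.normal(size=3)
docid = tokenize_doc(doc, codebooks)   # e.g. a 2-token discrete docid
```

Because similar embeddings land on similar code sequences, decoding a docid token-by-token can home in on semantically relevant documents.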

NeurIPS Conference 2022 Conference Paper

A Large Scale Search Dataset for Unbiased Learning to Rank

  • Lixin Zou
  • Haitao Mao
  • Xiaokai Chu
  • Jiliang Tang
  • Wenwen Ye
  • Shuaiqiang Wang
  • Dawei Yin

The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on the existing benchmark datasets may not extend to the practical scenario due to some limitations of existing datasets. First, their semantic feature extractions are outdated, while state-of-the-art large-scale pre-trained language models like BERT cannot be utilized due to the lack of original text. Second, display features are incomplete, making in-depth study of ULTR impossible, such as analyzing the click bias associated with the displayed abstract. Third, synthetic user feedback has been adopted by most existing datasets, and real-world user feedback is largely missing. To overcome these disadvantages, we introduce the Baidu-ULTR dataset. It involves 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries (397,572 query-document pairs). Baidu-ULTR is the first billion-level dataset for ULTR. In particular, it offers: (1) the original semantic features and pre-trained language models of different sizes; (2) sufficient display information such as position, displayed height, and displayed abstract, enabling the comprehensive study of multiple display biases; and (3) rich user feedback on search result pages (SERPs) like dwell time, allowing for user engagement optimization and promoting the exploration of multi-task learning in ULTR. Furthermore, we present the design principles of Baidu-ULTR and the performance of representative ULTR algorithms on Baidu-ULTR. The dataset and corresponding baseline implementations are available at https://github.com/ChuXiaokai/baidu_ultr_dataset, and the dataset homepage is at https://searchscience.baidu.com/dataset.html.

IJCAI Conference 2020 Conference Paper

Exemplar Guided Neural Dialogue Generation

  • Hengyi Cai
  • Hongshen Chen
  • Yonghao Song
  • Xiaofang Zhao
  • Dawei Yin

Humans benefit from previous experiences when taking actions. Similarly, related examples from the training data also provide exemplary information for neural dialogue models when responding to a given input message. However, effectively fusing such exemplary information into dialogue generation is non-trivial: useful exemplars must be not only literally similar but also topically related to the given context. Noisy exemplars impair the neural dialogue model's understanding of the conversation topics and can even corrupt response generation. To address these issues, we propose an exemplar-guided neural dialogue generation model in which exemplar responses are retrieved in terms of both text similarity and topic proximity through a two-stage exemplar retrieval model. In the first stage, a small subset of conversations is retrieved from the training set given a dialogue context. These candidate exemplars are then finely ranked by topical proximity to choose the best-matched exemplar response. To further induce the neural dialogue generation model to consult the exemplar response and the conversation topics more faithfully, we introduce a multi-source sampling mechanism that provides the dialogue model with both local exemplary semantics and global topical guidance during decoding. Empirical evaluations on a large-scale conversation dataset show that the proposed approach significantly outperforms the state of the art in terms of both quantitative metrics and human evaluations.

AAAI Conference 2020 Conference Paper

Learning from Easy to Complex: Adaptive Multi-Curricula Learning for Neural Dialogue Generation

  • Hengyi Cai
  • Hongshen Chen
  • Cheng Zhang
  • Yonghao Song
  • Xiaofang Zhao
  • Yangxi Li
  • Dongsheng Duan
  • Dawei Yin

Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effectiveness of neural dialogue generation models. Moreover, there is so far no unified dialogue complexity measurement, and dialogue complexity embodies multiple attributes: specificity, repetitiveness, relevance, etc. Inspired by human behaviors of learning to converse, where children learn from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper we first analyze five dialogue attributes to measure dialogue complexity from multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is established upon the reinforcement learning paradigm, automatically choosing different curricula over the course of training according to the learning status of the neural dialogue generation model. Extensive experiments conducted on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.

IJCAI Conference 2020 Conference Paper

Modeling Topical Relevance for Multi-Turn Dialogue Generation

  • Hainan Zhang
  • Yanyan Lan
  • Liang Pang
  • Hongshen Chen
  • Zhuoye Ding
  • Dawei Yin

Topic drift is a common phenomenon in multi-turn dialogue. Therefore, an ideal dialogue generation model should be able to capture the topic information of each context, detect the relevant contexts, and produce appropriate responses accordingly. However, existing models usually use word- or sentence-level similarities to detect the relevant contexts, which fail to capture topic-level relevance well. In this paper, we propose a new model, named STAR-BTM, to tackle this problem. First, the Biterm Topic Model is pre-trained on the whole training dataset. Then, topic-level attention weights are computed based on the topic representation of each context. Finally, the attention weights and the topic distribution are utilized in the decoding process to generate the corresponding responses. Experimental results on both Chinese customer-service data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods in terms of both metric-based and human evaluations.
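As a rough illustration of topic-level attention (a simplified stand-in, not STAR-BTM itself, which derives topic representations from a pre-trained Biterm Topic Model), one can weight each context turn by the similarity of its topic distribution to the current turn's:

```python
import numpy as np

def topic_attention(context_topics, current_topic):
    """Softmax over topic-distribution similarities: turns whose topics match
    the current turn receive larger attention weights (illustrative form)."""
    sims = np.array([ct @ current_topic for ct in context_topics])
    e = np.exp(sims - sims.max())  # numerically stable softmax
    return e / e.sum()

# Hypothetical topic distributions over 3 topics for 3 context turns.
ctx = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.2, 0.2, 0.6]])
cur = np.array([0.7, 0.2, 0.1])
weights = topic_attention(ctx, cur)
```

The first context turn, whose topic mass overlaps most with the current turn, receives the largest weight, which is the behavior that lets a decoder down-weight turns after a topic drift.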

AAAI Conference 2020 Conference Paper

Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network

  • Shaoxiong Feng
  • Hongshen Chen
  • Kan Li
  • Dawei Yin

Neural conversational models learn to generate responses by taking the dialogue history into account. These models are typically optimized over query-response pairs with a maximum likelihood estimation objective. However, query-response tuples are naturally loosely coupled, and multiple responses can answer a given query, which makes learning burdensome for the conversational model. Moreover, the general dull-response problem is worsened when the model is confronted with meaningless response training instances. Intuitively, a high-quality response not only responds to the given query but also links up to the future conversation. In this paper, we therefore leverage query-response-future-turn triples to induce generated responses that consider both the given context and the future conversation. To facilitate the modeling of these triples, we further propose a novel encoder-decoder-based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN), which consists of a forward and a backward generative discriminator that cooperatively encourage the generated response to be informative and coherent from two complementary assessment perspectives. Experimental results demonstrate that our method effectively boosts the informativeness and coherence of the generated responses on both automatic and human evaluation, which verifies the advantage of considering two assessment perspectives.

IJCAI Conference 2019 Conference Paper

Semi-supervised User Profiling with Heterogeneous Graph Attention Networks

  • Weijian Chen
  • Yulong Gu
  • Zhaochun Ren
  • Xiangnan He
  • Hongtao Xie
  • Tong Guo
  • Dawei Yin
  • Yongdong Zhang

Aiming to represent user characteristics and personal interests, the task of user profiling plays an increasingly important role in many real-world applications, e.g., e-commerce and social network platforms. By exploiting data such as texts and user behaviors, most existing solutions address user profiling as a classification task in which each user is formulated as an individual data instance. Nevertheless, a user's profile is not only reflected in her/his affiliated data but can also be inferred from other users, e.g., users with similar co-purchase behaviors in e-commerce, friends in social networks, etc. In this paper, we approach user profiling in a semi-supervised manner, developing a generic solution based on heterogeneous graph learning. On the graph, nodes represent the entities of interest (e.g., users, items, attributes of items, etc.), and edges represent the interactions between entities. Our heterogeneous graph attention networks (HGAT) method learns the representation of each entity by accounting for the graph structure, and exploits the attention mechanism to discriminate the importance of each neighboring entity. Through such a learning scheme, HGAT can leverage both unsupervised information and limited user labels to build the predictor. Extensive experiments on a real-world e-commerce dataset verify the effectiveness and rationality of our HGAT for user profiling.
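The attention-over-neighbors idea can be sketched in the style of graph attention networks; this single-relation, single-head toy with a hypothetical attention vector `a` is a minimal sketch, not the paper's HGAT:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_neighbors(h_self, h_neighbors, a):
    """GAT-style aggregation: score each neighbor against the center node with
    a learned vector `a`, normalize the scores, and take a weighted sum.
    h_self: (d,), h_neighbors: (k, d), a: (2d,)."""
    scores = np.array([a @ np.concatenate([h_self, h_n]) for h_n in h_neighbors])
    alpha = softmax(scores)             # per-neighbor importance
    return alpha, alpha @ h_neighbors   # aggregated neighborhood embedding

# Hypothetical embeddings: one center entity, three neighbors, dimension 4.
rng = np.random.default_rng(0)
h_u = rng.normal(size=4)
h_nb = rng.normal(size=(3, 4))
a = rng.normal(size=8)
alpha, h_agg = attend_neighbors(h_u, h_nb, a)
```

In a heterogeneous setting one would additionally type-project users, items, and attributes into a shared space before scoring, which is the part this sketch omits.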

AAAI Conference 2019 Conference Paper

Spectral Clustering in Heterogeneous Information Networks

  • Xiang Li
  • Ben Kao
  • Zhaochun Ren
  • Dawei Yin

A heterogeneous information network (HIN) is one whose objects are of different types and whose links can model different relations between objects. We study how spectral clustering can be effectively applied to HINs. In particular, we focus on how meta-path relations are used to construct an effective similarity matrix on which spectral clustering is based. We formulate the similarity matrix construction as an optimization problem and propose the SClump algorithm for solving it. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
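The pipeline (combine per-meta-path similarity matrices with weights, then spectrally cluster the result) can be sketched as follows. The matrices and fixed weights are illustrative; the actual algorithm optimizes the combination weights jointly with the clustering objective:

```python
import numpy as np

def spectral_clusters(sim_matrices, weights, k):
    """Weighted combination of per-meta-path similarity matrices, followed by
    standard spectral clustering on the combined graph."""
    S = sum(w * M for w, M in zip(weights, sim_matrices))
    d = S.sum(axis=1)
    L = np.diag(d) - S                 # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    U = vecs[:, :k]                    # embedding from the k smallest eigenvectors
    # Toy assignment for k=2: the sign of the Fiedler vector splits the graph.
    return (U[:, 1] > 0).astype(int) if k == 2 else U

# Two hypothetical similarity matrices over 4 objects; S1 has two blocks
# {0,1} and {2,3}, S2 carries no grouping information.
S1 = np.array([[1.0, 0.9, 0.1, 0.0],
               [0.9, 1.0, 0.0, 0.1],
               [0.1, 0.0, 1.0, 0.8],
               [0.0, 0.1, 0.8, 1.0]])
S2 = np.eye(4)
labels = spectral_clusters([S1, S2], [0.8, 0.2], k=2)
```

With these weights the block structure of `S1` dominates, so the Fiedler vector separates objects {0, 1} from {2, 3}; a full implementation would replace the sign split with k-means on the rows of `U`.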

IJCAI Conference 2018 Conference Paper

Learning Tag Dependencies for Sequence Tagging

  • Yuan Zhang
  • Hongshen Chen
  • Yihong Zhao
  • Qun Liu
  • Dawei Yin

Sequence tagging is the basis for multiple applications in natural language processing. Despite successes in learning long-term token sequence dependencies with neural networks, tag dependencies have rarely been considered. Sequence tagging in fact involves complex dependencies and interactions among the input tokens and the output tags. We propose a novel multi-channel model that handles different ranges of token-tag dependencies and their interactions simultaneously. A tag LSTM is augmented to manage the output tag dependencies and word-tag interactions, while three mechanisms are presented to efficiently incorporate token context representations and tag dependencies. Extensive experiments on part-of-speech tagging and named entity recognition tasks show that the proposed model outperforms the BiLSTM-CRF baseline by effectively incorporating the tag dependency feature.

AAAI Conference 2016 Conference Paper

Recommendation with Social Dimensions

  • Jiliang Tang
  • Suhang Wang
  • Xia Hu
  • Dawei Yin
  • Yingzhou Bi
  • Yi Chang
  • Huan Liu

The pervasive presence of social media greatly enriches online users' social activities, resulting in abundant social relations. Social relations provide an independent source for recommendation, bringing about new opportunities for recommender systems. Exploiting social relations to improve recommendation performance has attracted a great deal of attention in recent years. Most existing social recommender systems treat social relations homogeneously and make use of direct connections (or strong dependency connections). However, connections in online social networks are intrinsically heterogeneous and are a composite of various relations. Connected users in online social networks form groups, and users in a group share similar interests; weak dependency connections are thus established among these users even when they are not directly connected. In this paper, we investigate how to exploit the heterogeneity of social relations and weak dependency connections for recommendation. In particular, we employ social dimensions to simultaneously capture the heterogeneity of social relations and weak dependency connections, provide principled ways to model social dimensions, and propose a recommendation framework, SoDimRec, which incorporates the heterogeneity of social relations and weak dependency connections based on social dimensions. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. We conduct further experiments to understand the important role of social dimensions in the proposed framework.

IJCAI Conference 2016 Conference Paper

Timeline Summarization from Social Media with Life Cycle Models

  • Yi Chang
  • Jiliang Tang
  • Dawei Yin
  • Makoto Yamada
  • Yan Liu

The popularity of social media shatters the barrier for online users to create and share information at any place and any time. As a consequence, it has become increasingly difficult to locate relevant information about an entity. Timelines have proven to provide effective and efficient access for understanding an entity by displaying a list of episodes about the entity in chronological order. However, summarizing the timeline of an entity with social media data faces new challenges. First, key timeline episodes about the entity are typically unavailable in existing social media services. Second, the short, noisy, and informal nature of social media posts means that content-based summarization alone can be insufficient. In this paper, we investigate the problem of timeline summarization and propose a novel framework, Timeline-Sumy, which consists of episode detection and summary ranking. In episode detection, we explicitly model temporal information with life cycle models to detect timeline episodes, since episodes usually exhibit sudden-rise-and-heavy-tail patterns in time series. In summary ranking, we rank the social media posts in each episode via a learning-to-rank approach. Experimental results on social media datasets demonstrate the effectiveness of the proposed framework.

AAAI Conference 2015 Conference Paper

Causal Inference via Sparse Additive Models with Application to Online Advertising

  • Wei Sun
  • Pengyuan Wang
  • Dawei Yin
  • Jian Yang
  • Yi Chang

Advertising effectiveness measurement is a fundamental problem in online advertising. Various causal inference methods have been employed to measure the causal effects of ad treatments. However, existing methods mainly focus on linear logistic regression for univariate and binary treatments and are not well suited for complex, multi-dimensional ad treatments, where each dimension can be discrete or continuous. In this paper we propose a novel two-stage causal inference framework for assessing the impact of complex ad treatments. In the first stage, we estimate the propensity parameter via a sparse additive model; in the second stage, a propensity-adjusted regression model is applied to measure the treatment effect. Our approach is shown to provide an unbiased estimate of ad effectiveness under regularity conditions. To demonstrate the efficacy of our approach, we apply it to a real online advertising campaign to evaluate the impact of three ad treatments: ad frequency, ad channel, and ad size. We show that ad frequency usually has a treatment effect cap when ads are shown on mobile devices. In addition, the strategies for choosing the best ad size are completely different for mobile ads and online ads.
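The two-stage idea (first model the treatment given covariates, then estimate the effect in a propensity-adjusted outcome model) can be sketched with plain linear regressions standing in for the paper's sparse additive models; the data and the true effect of 2.0 below are synthetic:

```python
import numpy as np

def two_stage_effect(X, t, y):
    """Toy two-stage adjustment: (1) regress treatment on covariates to get a
    propensity-style prediction; (2) regress the outcome on covariates plus the
    residualized treatment and read off the treatment coefficient."""
    X = np.column_stack([np.ones(len(X)), X])
    # Stage 1: predicted treatment given covariates (linear propensity proxy).
    beta_t, *_ = np.linalg.lstsq(X, t, rcond=None)
    t_hat = X @ beta_t
    # Stage 2: outcome on covariates + treatment residual; by the
    # Frisch-Waugh-Lovell theorem this recovers the adjusted effect of t.
    Z = np.column_stack([X, t - t_hat])
    beta_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(beta_y[-1])

# Synthetic confounded data: the covariate drives both treatment and outcome.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
t = 0.5 * x + rng.normal(size=500)             # treatment depends on covariate
y = 2.0 * t + 1.5 * x + rng.normal(size=500)   # true treatment effect is 2.0
effect = two_stage_effect(x.reshape(-1, 1), t, y)
```

A naive regression of `y` on `t` alone would be biased upward here because `x` confounds both; the adjusted estimate lands near the true 2.0.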

AAAI Conference 2011 Conference Paper

Temporal Dynamics of User Interests in Tagging Systems

  • Dawei Yin
  • Liangjie Hong
  • Zhenzhen Xue
  • Brian Davison

Collaborative tagging systems are now deployed extensively to help users share and organize resources. Tag prediction and recommendation systems generally model user behavior, as research has shown that accuracy can be significantly improved by modeling users' preferences. However, these preferences are usually treated as constant over time, neglecting the temporal factor within users' interests. On the other hand, little is known about how this factor may influence prediction in social bookmarking systems. In this paper, we investigate the temporal dynamics of user interests in tagging systems and propose a user-tag-specific temporal interests model for tracking users' interests over time. Additionally, we analyze the phenomenon of topic switches in social bookmarking systems, showing that a temporal interests model can benefit from the integration of topic switch detection and that the temporal characteristics of social tagging systems differ from traditional concept drift problems. We conduct experiments on three public datasets, demonstrating the importance of personalization and user-tag specialization in tagging systems. Experimental results show that our method can outperform state-of-the-art tag prediction algorithms. We also incorporate our model within existing content-based methods, yielding significant improvements in performance.
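A user-tag-specific temporal interest can be illustrated with a simple exponential-decay weighting of past tag uses; this is a hypothetical form chosen for illustration, not the paper's exact model:

```python
import numpy as np

def decayed_interest(tag_times, now, half_life):
    """Recency-weighted interest score for one user-tag pair: each past use
    of the tag contributes a weight that halves every `half_life` time units."""
    ages = now - np.asarray(tag_times, dtype=float)
    return float(np.sum(0.5 ** (ages / half_life)))

# A tag used recently outranks one used more often but long ago.
recent = decayed_interest([98, 99, 100], now=100, half_life=7)
stale = decayed_interest([1, 2, 3, 4, 5], now=100, half_life=7)
```

Under such a scheme a topic switch shows up as old tags decaying away while new ones accumulate weight, which is the kind of drift the paper's topic switch detection targets.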