Arrow Research search

Author name cluster

Maarten de Rijke

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

TMLR Journal 2026 Journal Article

A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning

  • Shashank Gupta
  • Chaitanya Ahuja
  • Tsung-Yu Lin
  • Sreya Dutta Roy
  • Harrie Oosterhuis
  • Maarten de Rijke
  • Satya Narayan Shukla

Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. Proximal policy optimization (PPO) is a popular choice of method for policy optimization. While effective in terms of performance and sample complexity, PPO is highly sensitive to hyper-parameters and involves substantial computational overhead. REINFORCE, on the other hand, mitigates some implementation complexities, such as high memory overhead and sensitive hyper-parameter tuning, but has suboptimal performance due to high variance and, crucially, sample inefficiency, which is the primary notion of efficiency we study in this work. While the variance of REINFORCE can be reduced by sampling multiple actions per input prompt and using a baseline correction term, it still suffers from sample inefficiency. To address these challenges, we systematically analyze the sample efficiency-effectiveness trade-off between REINFORCE and PPO, and propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. LOOP combines variance reduction techniques from REINFORCE, such as sampling multiple actions per input prompt and a baseline correction term, with the robustness and sample efficiency of PPO via clipping and importance sampling. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives, and achieves a better balance between sample efficiency and final performance.
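
A minimal sketch of the two ingredients named above — a leave-one-out baseline computed over multiple samples per prompt and a PPO-style clipped, importance-weighted update — might look as follows. The function name, tensor shapes, and clipping constant are illustrative assumptions, not the authors' implementation.

    import torch

    def loop_style_loss(log_probs_new, log_probs_old, rewards, clip_eps=0.2):
        """Illustrative LOOP-style objective for one prompt (not the paper's code).

        log_probs_new: (K,) log-probabilities of K sampled trajectories under the current policy
        log_probs_old: (K,) log-probabilities under the policy that generated the samples
        rewards:       (K,) black-box rewards for the K samples
        """
        K = rewards.shape[0]
        # Leave-one-out baseline: for each sample, the mean reward of the other K-1 samples.
        baseline = (rewards.sum() - rewards) / (K - 1)
        advantages = rewards - baseline
        # PPO-style clipped, importance-weighted surrogate.
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()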

AAAI Conference 2026 Conference Paper

Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

  • Wenda Wei
  • Yu-An Liu
  • Ruqing Zhang
  • Jiafeng Guo
  • Lixin Su
  • Shuaiqiang Wang
  • Dawei Yin
  • Maarten de Rijke

Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. However, most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps, which often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current reasoning is from the answer and how well it addresses the question. To optimize reasoning under these bidirectional signals, we adopt a multi-objective reinforcement learning framework with a cascading reward structure that emphasizes early trajectory alignment. Empirical results on seven question answering benchmarks demonstrate that Bi-RAR surpasses previous methods and enables efficient interaction and reasoning with the search engine during training and inference.
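
As a rough illustration of approximating a bidirectional information distance with language-model probabilities, each intermediate step could be scored with negative log-likelihoods in both directions; the lm.logprob helper and the exact conditioning below are hypothetical assumptions, not the paper's definition.

    def neg_log_likelihood(text, context, lm):
        # -log p(text | context); lm.logprob is a hypothetical helper returning the
        # summed token log-probability of `text` given `context`.
        return -lm.logprob(text, context=context)

    def bidirectional_distance(question, step, answer, lm):
        # Forward: how much information is still missing to reach the answer from this step.
        forward = neg_log_likelihood(answer, context=question + "\n" + step, lm=lm)
        # Backward: how well the intermediate step is accounted for by the question.
        backward = neg_log_likelihood(step, context=question, lm=lm)
        return forward, backward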

JAIR Journal 2026 Journal Article

Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles

  • Siddharth Mehrotra
  • Jin Huang
  • Xuelong Fu
  • Roel Dobbe
  • Clara I. Sánchez
  • Maarten de Rijke

Background: Trustworthy AI serves as a foundational pillar for two major AI ethics conferences: AIES and FAccT. Current research often adopts techno-centric approaches, focusing primarily on technical attributes such as accuracy, reliability, robustness, and fairness, while overlooking the sociotechnical dimensions critical to understanding AI trustworthiness in real-world contexts. Objectives: This scoping review aims to examine how the AIES and FAccT communities conceptualize, measure, and validate AI trustworthiness, identifying major gaps and opportunities for advancing a holistic understanding of trustworthy AI systems. Methods: We conduct a scoping review of the AIES and FAccT conference proceedings to date, systematically analyzing how trustworthiness is defined, operationalized, and applied across different research domains. Our analysis focuses on conceptualization approaches, measurement methods, verification and validation techniques, application areas, and underlying values. Results: While significant progress has been made in defining technical attributes such as transparency, accountability, and robustness, our findings reveal critical gaps. Current research predominantly emphasizes technical precision at the expense of social and ethical considerations. The sociotechnical nature of AI systems remains underexplored, and trustworthiness emerges as a contested concept shaped by those with the power to define it. Conclusions: An interdisciplinary approach combining technical rigor with social, cultural, and institutional considerations is essential for advancing trustworthy AI. We propose actionable measures for the AI ethics community to adopt holistic frameworks that genuinely address the complex interplay between AI systems and society, ultimately promoting responsible technological development that benefits all stakeholders.

AAAI Conference 2025 Conference Paper

Attack-in-the-Chain: Bootstrapping Large Language Models for Attacks Against Black-Box Neural Ranking Models

  • Yu-An Liu
  • Ruqing Zhang
  • Jiafeng Guo
  • Maarten de Rijke
  • Yixing Fan
  • Xueqi Cheng

Neural ranking models (NRMs) have been shown to be highly effective in terms of retrieval performance. Unfortunately, they have also displayed a higher degree of sensitivity to attacks than previous generation models. To help expose and address this lack of robustness, we introduce a novel ranking attack framework named Attack-in-the-Chain, which tracks interactions between large language models (LLMs) and NRMs based on chain-of-thought (CoT) prompting to generate adversarial examples under black-box settings. Our approach starts by identifying anchor documents with higher ranking positions than the target document as nodes in the reasoning chain. We then dynamically assign the number of perturbation words to each node and prompt LLMs to execute attacks. Finally, we verify the attack performance of all nodes at each reasoning step and proceed to generate the next reasoning step. Empirical results on two web search benchmarks show the effectiveness of our method.

UAI Conference 2025 Conference Paper

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

  • Sami Jullien
  • Romain Deffayet
  • Jean-Michel Renders
  • Paul Groth
  • Maarten de Rijke

Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracts rich feedback from environment samples. The commonly used quantile regression approach to distributional RL – based on asymmetric $L_1$ losses – provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
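
For reference, the two asymmetric losses involved are standard: the pinball (asymmetric L1) loss targets quantiles, while the asymmetric L2 loss targets expectiles. A small sketch of the per-sample losses, with the paper's joint learning scheme omitted:

    import numpy as np

    def quantile_loss(err, tau):
        # Asymmetric L1 (pinball) loss for the tau-quantile; err = target - prediction.
        return np.where(err >= 0, tau * err, (tau - 1) * err)

    def expectile_loss(err, tau):
        # Asymmetric L2 loss for the tau-expectile; err = target - prediction.
        return np.where(err >= 0, tau, 1 - tau) * err ** 2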

AAAI Conference 2025 Conference Paper

ExcluIR: Exclusionary Neural Information Retrieval

  • Wenhao Zhang
  • Mengqi Zhang
  • Shiguang Wu
  • Jiahuan Pei
  • Zhaochun Ren
  • Maarten de Rijke
  • Zhumin Chen
  • Pengjie Ren

Exclusion is an important and universal linguistic skill that humans use to express what they do not want. There is little research on exclusionary retrieval, where users express what they do not want to be part of the results produced for their queries. We investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval, consisting of an evaluation benchmark and a training set for helping retrieval models to comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses, obtaining three main observations: (i) existing retrieval models with different architectures struggle to comprehend exclusionary queries effectively; (ii) although integrating our training data can improve the performance of retrieval models on exclusionary retrieval, there still exists a gap compared to human performance; and (iii) generative retrieval models have a natural advantage in handling exclusionary queries.

ICLR Conference 2025 Conference Paper

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization

  • Yougang Lyu
  • Lingyong Yan
  • Zihan Wang 0002
  • Dawei Yin 0001
  • Pengjie Ren
  • Maarten de Rijke
  • Zhaochun Ren

As large language models (LLMs) are rapidly advancing and achieving near-human capabilities on specific tasks, aligning them with human values is becoming more urgent. In scenarios where LLMs outperform humans, we face a weak-to-strong alignment problem where we need to effectively align strong student LLMs through weak supervision generated by weak teachers. Existing alignment methods mainly focus on strong-to-weak alignment and self-alignment settings, and it is impractical to adapt them to the much harder weak-to-strong alignment setting. To fill this gap, we propose a multi-agent contrastive preference optimization (MACPO) framework. MACPO enables weak teachers and strong students to learn from each other by iteratively reinforcing unfamiliar positive behaviors while penalizing familiar negative ones. To achieve this, we devise a mutual positive behavior augmentation strategy to encourage weak teachers and strong students to learn from each other's positive behavior and provide higher-quality positive behavior for the next iteration. Additionally, we propose a hard negative behavior construction strategy to induce weak teachers and strong students to generate familiar negative behavior by fine-tuning on negative behavioral data. Experimental results on the HH-RLHF and PKU-SafeRLHF datasets, evaluated using both automatic metrics and human judgments, demonstrate that MACPO simultaneously improves the alignment performance of strong students and weak teachers. Moreover, as the number of weak teachers increases, MACPO achieves better weak-to-strong alignment performance through more iterative optimization rounds.

TMLR Journal 2024 Journal Article

Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

  • Maurits Bleeker
  • Mariya Hendriksen
  • Andrew Yates
  • Maarten de Rijke

Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose representations of images and captions. We focus on the situation when one image is associated with several captions, each caption containing both information shared among all captions and unique information per caption about the scene depicted in the image. In such cases, it is unclear whether contrastive losses are sufficient for learning task-optimal representations that contain all the information provided by the captions or whether the contrastive learning setup encourages the learning of a simple shortcut that minimizes contrastive loss. We introduce synthetic shortcuts for vision-language: a training and evaluation framework where we inject synthetic shortcuts into image-text data. We show that contrastive VLMs trained from scratch or fine-tuned with data containing these synthetic shortcuts mainly learn features that represent the shortcut. Hence, contrastive losses are not sufficient to learn task-optimal representations, i.e., representations that contain all task-relevant information shared between the image and associated captions. We examine two methods to reduce shortcut learning in our training and evaluation framework: (i) latent target decoding and (ii) implicit feature modification. We show empirically that both methods improve performance on the evaluation task, but only partly reduce shortcut learning when training and evaluating with our shortcut learning framework. Hence, we show the difficulty and challenge of our shortcut learning framework for contrastive vision-language representation learning.

NeurIPS Conference 2024 Conference Paper

Generative Retrieval Meets Multi-Graded Relevance

  • Yubao Tang
  • Ruqing Zhang
  • Jiafeng Guo
  • Maarten de Rijke
  • Wei Chen
  • Xueqi Cheng

Generative retrieval represents a novel approach to information retrieval, utilizing an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current implementations are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier. To address these challenges, we introduce a new framework called GRaded Generative Retrieval (GR$^2$). Our approach focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Firstly, we aim to create identifiers that are both semantically relevant and sufficiently distinct to represent individual documents effectively. This is achieved by jointly optimizing the relevance and distinctness of docids through a combination of docid generation and autoencoder models. Secondly, we incorporate information about the relationship between relevance grades to guide the training process. Specifically, we leverage a constrained contrastive training strategy to bring the representations of queries and the identifiers of their relevant documents closer together, based on their respective relevance grades. Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of our method.

AAAI Conference 2024 Conference Paper

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

  • Yu-An Liu
  • Ruqing Zhang
  • Mingkun Zhang
  • Wei Chen
  • Maarten de Rijke
  • Jiafeng Guo
  • Xueqi Cheng

Neural ranking models (NRMs) have shown great success in information retrieval (IR). But their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense approach to adversarial attacks against NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness. Then, we define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error for attainable computation. Informed by our theoretical analysis, we design a novel perturbation-invariant adversarial training (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss, in which one term encourages the effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PIAT compared to existing adversarial defenses.
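
A hedged sketch of such a regularized surrogate — a pairwise ranking loss on clean documents plus a smoothness term tying scores under perturbation to the clean scores — is given below; the specific loss terms and weighting are illustrative, not the PIAT formulation itself.

    import torch
    import torch.nn.functional as F

    def piat_style_loss(scores_clean, scores_perturbed, pairwise_labels, reg_weight=1.0):
        """Illustrative regularized surrogate for one query.

        scores_clean, scores_perturbed: (N,) ranking scores for N candidate documents.
        pairwise_labels: (N, N) with 1 where document i should rank above document j, else 0.
        """
        diff = scores_clean.unsqueeze(1) - scores_clean.unsqueeze(0)   # (N, N) score margins
        effectiveness = (pairwise_labels * F.softplus(-diff)).sum()    # pairwise logistic ranking loss
        smoothness = F.mse_loss(scores_perturbed, scores_clean)        # perturbation invariance
        return effectiveness + reg_weight * smoothness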

ECAI Conference 2024 Conference Paper

QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

  • Weijia Zhang 0004
  • Vaishali Pal
  • Jia-Hong Huang
  • Evangelos Kanoulas
  • Maarten de Rijke

Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users’ information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users’ information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4,909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.

TMLR Journal 2023 Journal Article

A Simulation Environment and Reinforcement Learning Method for Waste Reduction

  • Sami Jullien
  • Mozhdeh Ariannezhad
  • Paul Groth
  • Maarten de Rijke

In retail (e.g., grocery stores, apparel shops, online retailers), inventory managers have to balance short-term risk (no items to sell) with long-term risk (over-ordering leading to product waste). This balancing task is made especially hard due to the lack of information about future customer purchases. In this paper, we study the problem of restocking a grocery store’s inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers. This problem is of high relevance today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent’s actions, making the environment partially observable. We make two main contributions. First, we introduce a new reinforcement learning environment, RetaiL, based on real grocery store data and expert knowledge. This environment is highly stochastic, and presents a unique challenge for reinforcement learning practitioners. We show that uncertainty about the future behavior of the environment is not handled well by classical supply chain algorithms, and that distributional approaches are a good way to account for the uncertainty. Second, we introduce GTDQN, a distributional reinforcement learning algorithm that learns a generalized lambda distribution over the reward space. GTDQN provides a strong baseline for our environment. It outperforms other distributional reinforcement learning approaches in this partially observable setting, in both overall reward and generated waste.

AAAI Conference 2023 Conference Paper

Contrastive Learning Reduces Hallucination in Conversations

  • Weiwei Sun
  • Zhengliang Shi
  • Shen Gao
  • Pengjie Ren
  • Maarten de Rijke
  • Zhaochun Ren

Pre-trained language models (LMs) store knowledge in their parameters and can generate informative responses when used in conversational systems. However, LMs suffer from the problem of “hallucination:” they may generate plausible-looking statements that are irrelevant or factually incorrect. To address this problem, we propose a contrastive learning scheme, named MixCL. A novel mixed contrastive objective is proposed to explicitly optimize the implicit knowledge elicitation process of LMs, and thus reduce their hallucination in conversations. We also examine negative sampling strategies of retrieved hard negatives and model-generated negatives. We conduct experiments on Wizard-of-Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. We show that MixCL achieves comparable performance to state-of-the-art KB-based approaches while enjoying notable advantages in terms of efficiency and scalability.

AAAI Conference 2023 Conference Paper

Feature-Level Debiased Natural Language Understanding

  • Yougang Lyu
  • Piji Li
  • Yechang Yang
  • Maarten de Rijke
  • Pengjie Ren
  • Yukun Zhao
  • Dawei Yin
  • Zhaochun Ren

Natural language understanding (NLU) models often rely on dataset biases rather than intended task-relevant features to achieve high performance on specific datasets. As a result, these models perform poorly on datasets outside the training distribution. Some recent studies address this issue by reducing the weights of biased samples during the training process. However, these methods still encode biased latent features in representations and neglect the dynamic nature of bias, which hinders model prediction. We propose an NLU debiasing method, named debiasing contrastive learning (DCT), to simultaneously alleviate the above problems based on contrastive learning. We devise a debiasing positive sampling strategy to mitigate biased latent features by selecting the least similar biased positive samples. We also propose a dynamic negative sampling strategy to capture the dynamic influence of biases by employing a bias-only model to dynamically select the most similar biased negative samples. We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance. We also verify that DCT can reduce biased latent features from the model's representation.

TMLR Journal 2023 Journal Article

Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval

  • Maurits Bleeker
  • Andrew Yates
  • Maarten de Rijke

To train image-caption retrieval (ICR) methods, contrastive loss functions are a common choice for optimization functions. Unfortunately, contrastive ICR methods are vulnerable to predictive feature suppression. Predictive features are features that correctly indicate the similarity between a query and a candidate item. However, in the presence of multiple predictive features during training, encoder models tend to suppress redundant predictive features, since these features are not needed to learn to discriminate between positive and negative pairs. We introduce an approach to reduce predictive feature suppression for resource-constrained ICR methods: latent target decoding (LTD). We add an additional decoder to the contrastive ICR framework, to reconstruct the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features. We implement the LTD objective as an optimization constraint, to ensure that the reconstruction loss is below a bound value while primarily optimizing for the contrastive loss. Importantly, LTD does not depend on additional training data or expensive (hard) negative mining strategies. Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores than a contrastive ICR baseline. Moreover, we show that LTD should be implemented as an optimization constraint instead of a dual optimization objective. Finally, we show that LTD can be used with different contrastive learning losses and a wide variety of resource-constrained ICR methods.

AIJ Journal 2022 Journal Article

Bayesian feature interaction selection for factorization machines

  • Yifan Chen
  • Yang Wang
  • Pengjie Ren
  • Meng Wang
  • Maarten de Rijke

Factorization machines are a generic supervised method that can effectively model feature interactions and are used for a wide range of tasks in artificial intelligence, such as prediction and inference. However, handling combinations of features is expensive due to the exponential growth of feature interactions with the order. By nature, not all feature interactions are equally useful for prediction. Recently, a large number of methods that perform feature interaction selection have attracted great attention because of their effectiveness at filtering out useless feature interactions. Current feature interaction selection methods suffer from the following limitations: (1) they assume that all users share the same feature interactions; and (2) they select pairwise feature interactions only. In this paper, we propose novel Bayesian variable selection methods, targeting feature interaction selection for factorization machines, which effectively reduce the number of interactions. We study personalized feature interaction selection to account for individual preferences, and further extend the model to investigate higher-order feature interaction selection on higher-order factorization machines. We provide empirical evidence for the advantages of the proposed Bayesian feature interaction selection methods using different prediction tasks.

AAAI Conference 2022 Conference Paper

FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

  • Ana Lucic
  • Harrie Oosterhuis
  • Hinda Haned
  • Maarten de Rijke

Model interpretability has become an important problem in Machine Learning (ML) due to the increased effect that algorithmic decisions have on humans. Counterfactual explanations can help users understand not only why ML models make certain decisions, but also how these decisions can be changed. We frame the problem of finding counterfactual explanations as a gradient-based optimization task and extend previous work that could only be applied to differentiable models. In order to accommodate non-differentiable models such as tree ensembles, we use probabilistic model approximations in the optimization framework. We introduce an approximation technique that is effective for finding counterfactual explanations for predictions of the original model and show that our counterfactual examples are significantly closer to the original instances than those produced by other methods specifically designed for tree ensembles.
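
The general recipe — relax hard tree splits with sigmoids so the ensemble becomes differentiable, then run gradient descent on the input — can be sketched as follows; the single-split "ensemble", temperature, and distance weight are simplifying assumptions rather than the paper's exact procedure.

    import torch

    def soft_stump(x, feature, threshold, left_val, right_val, temperature=0.1):
        # Sigmoid relaxation of one split: smoothly interpolates between the two leaf values.
        p_right = torch.sigmoid((x[feature] - threshold) / temperature)
        return (1 - p_right) * left_val + p_right * right_val

    def find_counterfactual(x0, stumps, target=1.0, steps=200, lr=0.05, dist_weight=0.1):
        # Gradient search for a nearby input whose soft ensemble prediction reaches `target`.
        x = x0.detach().clone().requires_grad_(True)   # x0: float tensor of feature values
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            pred = torch.stack([soft_stump(x, *s) for s in stumps]).mean()
            loss = (pred - target) ** 2 + dist_weight * torch.norm(x - x0)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x.detach()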

TMLR Journal 2022 Journal Article

sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification

  • Gabriel Bénédict
  • Hendrik Vincent Koops
  • Daan Odijk
  • Maarten de Rijke

Multilabel classification is the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of the multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). These multilabel classification reductions do not accommodate the prediction of varying numbers of labels per example. Moreover, the loss functions are distant estimates of the performance metrics. We propose sigmoidF1, a loss function that is an approximation of the F1 score that (i) is smooth and tractable for stochastic gradient descent, (ii) naturally approximates a multilabel metric, and (iii) estimates both label suitability and label counts. We show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. sigmoidF1 outperforms other loss functions on one text and two image datasets over several metrics. These results show the effectiveness of using inference-time metrics as loss functions for non-trivial classification problems like multilabel classification.
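
A compact sketch of a smooth F1 surrogate in this spirit — soft true-positive, false-positive, and false-negative counts computed from sigmoid-activated scores — is shown below; the beta and eta parameters follow the tunable-sigmoid idea, but the details should be read as an approximation of the paper's loss, not its reference implementation.

    import torch

    def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0):
        # logits, targets: (batch, num_labels), targets in {0, 1}.
        probs = torch.sigmoid(beta * (logits + eta))      # soft label predictions
        tp = (probs * targets).sum()
        fp = (probs * (1 - targets)).sum()
        fn = ((1 - probs) * targets).sum()
        soft_f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
        return 1 - soft_f1                                # minimize 1 - soft F1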

IJCAI Conference 2021 Conference Paper

Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions (Extended Abstract)

  • Harrie Oosterhuis
  • Maarten de Rijke

State-of-the-art Learning to Rank (LTR) methods for optimizing ranking systems based on user interactions are divided into online approaches – that learn by direct interaction – and counterfactual approaches – that learn from historical interactions. We propose a novel intervention-aware estimator to bridge this online/counterfactual division. The estimator corrects for the effect of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions. To the best of our knowledge, this is the first method that is shown to be highly effective in both online and counterfactual scenarios.

ECAI Conference 2020 Conference Paper

Bidirectional Scene Text Recognition with a Single Decoder

  • Maurits J. R. Bleeker
  • Maarten de Rijke

Scene Text Recognition (STR) is the problem of recognizing the correct word or character sequence in a cropped word image. To obtain more robust output sequences, the notion of bidirectional STR has been introduced. So far, bidirectional STRs have been implemented by using two separate decoders; one for left-to-right decoding and one for right-to-left. Having two separate decoders for almost the same task with the same output space is undesirable from a computational and optimization point of view. We introduce the Bidirectional Scene Text Transformer (Bi-STET), a novel bidirectional STR method with a single decoder for bidirectional text decoding. With its single decoder, Bi-STET outperforms methods that apply bidirectional decoding by using two separate decoders while also being more efficient than those methods. Furthermore, we achieve or beat state-of-the-art (SOTA) methods on all STR benchmarks with Bi-STET. Finally, we provide analyses and insights into the performance of Bi-STET.

AAAI Conference 2020 Conference Paper

RefNet: A Reference-Aware Network for Background Based Conversation

  • Chuan Meng
  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue. Here, the generated responses are grounded in some background information. The proposed methods for BBCs are able to generate more informative responses; however, they either cannot generate natural responses or have difficulties in locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.

ECAI Conference 2020 Conference Paper

Retrospective and Prospective Mixture-of-Generators for Task-Oriented Dialogue Response Generation

  • Jiahuan Pei
  • Pengjie Ren
  • Christof Monz
  • Maarten de Rijke

Dialogue response generation (DRG) is a critical component of task-oriented dialogue systems (TDSs). Its purpose is to generate proper natural language responses given some context, e.g., historical utterances, system states, etc. State-of-the-art work focuses on how to better tackle DRG in an end-to-end way. Typically, such studies assume that each token is drawn from a single distribution over the output vocabulary, which may not always be optimal. Responses vary greatly with different intents, e.g., domains, system actions. We propose a novel mixture-of-generators network (MoGNet) for DRG, where we assume that each token of a response is drawn from a mixture of distributions. MoGNet consists of a chair generator and several expert generators. Each expert is specialized for DRG w.r.t. a particular intent. The chair coordinates multiple experts and combines the output they have generated to produce more appropriate responses. We propose two strategies to help the chair make better decisions, namely, a retrospective mixture-of-generators (RMoG) and a prospective mixture-of-generators (PMoG). The former only considers the historical expert-generated responses until the current time step while the latter also considers possible expert-generated responses in the future by encouraging exploration. In order to differentiate experts, we also devise a global-and-local (GL) learning scheme that forces each expert to be specialized towards a particular intent using a local loss and trains the chair and all experts to coordinate using a global loss. We carry out extensive experiments on the MultiWOZ benchmark dataset. MoGNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, demonstrating its effectiveness for DRG.

AAAI Conference 2020 Conference Paper

Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation

  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Background Based Conversations (BBCs) have been introduced to help conversational systems avoid generating overly generic responses. In a BBC, the conversation is grounded in a knowledge source. A key challenge in BBCs is Knowledge Selection (KS): given a conversational context, try to find the appropriate background knowledge (a text fragment containing related facts or comments, etc.) based on which to generate the next response. Previous work addresses KS by employing attention and/or pointer mechanisms. These mechanisms use a local perspective, i.e., they select a token at a time based solely on the current decoding state. We argue for the adoption of a global perspective, i.e., pre-selecting some text fragments from the background knowledge that could help determine the topic of the next response. We enhance KS in BBCs by introducing a Global-to-Local Knowledge Selection (GLKS) mechanism. Given a conversational context and background knowledge, we first learn a topic transition vector to encode the most likely text fragments to be used in the next response, which is then used to guide the local KS at each decoding timestamp. In order to effectively learn the topic transition vector, we propose a distantly supervised learning schema. Experimental results show that the GLKS model significantly outperforms state-of-the-art methods in terms of both automatic and human evaluation. More importantly, GLKS achieves this without requiring any extra annotations, which demonstrates its high degree of scalability.

UAI Conference 2019 Conference Paper

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

  • Chang Li 0003
  • Branislav Kveton
  • Tor Lattimore
  • Ilya Markov
  • Maarten de Rijke
  • Csaba Szepesvári
  • Masrour Zoghi

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search. However, this approach lacks exploration and thus is limited by the information content of the offline training data. In the online setting, an algorithm can experiment with lists and learn from feedback on them in a sequential fashion. Bandit algorithms are well-suited for this setting but they tend to learn user preferences from scratch, which results in a high initial cost of exploration. This poses an additional challenge of safe exploration in ranked lists. We propose BubbleRank, a bandit algorithm for safe re-ranking that combines the strengths of both the offline and online settings. The algorithm starts with an initial base list and improves it online by gradually exchanging higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive experiments on a large-scale real-world click dataset.
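
The core exchange idea can be sketched as follows; the click statistic and the acceptance rule are simplified placeholders for the paper's confidence-based test.

    import random

    def bubblerank_step(ranking, click_wins):
        """One illustrative re-ranking pass: compare adjacent items and swap a pair only
        when accumulated click evidence says the lower item is more attractive.

        ranking: list of item ids, best first.
        click_wins: dict mapping (i, j) to the number of impressions in which item i was
                    clicked while item j, shown adjacent to it, was not (a hypothetical
                    statistic gathered from display logs).
        """
        start = random.choice([0, 1])   # randomize which adjacent pairs are compared
        for pos in range(start, len(ranking) - 1, 2):
            upper, lower = ranking[pos], ranking[pos + 1]
            if click_wins.get((lower, upper), 0) > click_wins.get((upper, lower), 0):
                ranking[pos], ranking[pos + 1] = lower, upper
        return ranking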

IJCAI Conference 2019 Conference Paper

Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model

  • Chang Li
  • Maarten de Rijke

Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the n-step regret of these algorithms. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset.
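
As an illustration, a sliding-window UCB statistic per item — forgetting observations older than the window so that abrupt preference changes are discarded, as in CascadeSWUCB — might be maintained like this; the exploration bonus is a generic choice, not necessarily the constant used in the paper's analysis.

    import math
    from collections import deque

    class SlidingWindowUCB:
        """Illustrative per-item statistics for a sliding-window cascading bandit."""

        def __init__(self, window=1000, c=0.5):
            self.window, self.c, self.obs = window, c, deque()

        def update(self, clicked):
            self.obs.append(1.0 if clicked else 0.0)
            if len(self.obs) > self.window:
                self.obs.popleft()       # forget observations outside the window

        def ucb(self, t):
            if not self.obs:
                return 1.0               # optimistic value for unexplored items
            n = len(self.obs)
            mean = sum(self.obs) / n
            return mean + self.c * math.sqrt(math.log(min(t, self.window)) / n)

The ranker would then display the K items with the highest UCB values at round t (t starting at 1) and update the statistics of the items examined by the user.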

AAAI Conference 2019 Conference Paper

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

  • Ziming Li
  • Julia Kiseleva
  • Maarten de Rijke

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.

AAAI Conference 2019 Conference Paper

RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-Based Recommendation

  • Pengjie Ren
  • Zhumin Chen
  • Jing Li
  • Zhaochun Ren
  • Jun Ma
  • Maarten de Rijke

Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of their promising performance. Repeat consumption is a common phenomenon in many recommendation scenarios (e.g., e-commerce, music, and TV program recommendations), where the same item is re-consumed repeatedly over time. However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user’s history and recommend them at the right time. We report on extensive experiments on three benchmark datasets. RepeatNet outperforms state-of-the-art baselines on all three datasets in terms of MRR and Recall. Furthermore, as the dataset size and the repeat ratio increase, the improvements of RepeatNet over the baselines also increase, which demonstrates its advantage in handling repeat recommendation scenarios.

ICML Conference 2018 Conference Paper

Finding Influential Training Samples for Gradient Boosted Decision Trees

  • Boris Sharchilev
  • Yury Ustinovskiy
  • Pavel Serdyukov
  • Maarten de Rijke

We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.
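
The naive leave-one-out formalization the paper starts from can be written directly — retrain without sample i and compare predictions — as in the sketch below; the paper's contribution is a fast approximation of this quantity that avoids retraining, which this brute-force version does not attempt.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def loo_influence(X, y, x_test, i, **gbdt_params):
        # Brute-force leave-one-out influence of training sample i on the prediction for x_test.
        full = GradientBoostingRegressor(random_state=0, **gbdt_params).fit(X, y)
        mask = np.arange(len(y)) != i
        loo = GradientBoostingRegressor(random_state=0, **gbdt_params).fit(X[mask], y[mask])
        x_test = np.asarray(x_test).reshape(1, -1)
        return full.predict(x_test)[0] - loo.predict(x_test)[0]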

NeurIPS Conference 2015 Conference Paper

Copeland Dueling Bandits

  • Masrour Zoghi
  • Zohar Karnin
  • Shimon Whiteson
  • Maarten de Rijke

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.
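
For concreteness, the Copeland winner of a known preference matrix is simply the arm with the most pairwise victories; CCB and SCB add confidence bounds so the winner can be found from noisy duels, which this sketch omits.

    import numpy as np

    def copeland_winner(p):
        """Copeland winner of a K x K preference matrix, where p[i, j] is the probability
        that arm i beats arm j. A Copeland winner always exists, even when no arm beats
        every other arm (i.e., no Condorcet winner exists)."""
        wins = (p > 0.5).sum(axis=1) - np.diag(p > 0.5)   # count victories, ignore self-comparisons
        return int(np.argmax(wins)), wins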

ECAI Conference 2014 Conference Paper

Detecting the Reputation Polarity of Microblog Posts

  • Cristina Garbacea
  • Manos Tsagkias
  • Maarten de Rijke

We address the task of detecting the reputation polarity of social media updates, that is, deciding whether the content of an update has positive or negative implications for the reputation of a given entity. Typical approaches to this task include sentiment lexicons and linguistic features. However, they fall short in the social media domain because of its unedited and noisy nature, and, more importantly, because reputation polarity is not only encoded in sentiment-bearing words but it is also embedded in other word usage. To this end, automatic methods for extracting discriminative features for reputation polarity detection can play a role. We propose a data-driven, supervised approach for extracting textual features, which we use to train a reputation polarity classifier. Experiments on the RepLab 2013 collection show that our model outperforms the state-of-the-art method based on sentiment analysis by 20% accuracy.

ICML Conference 2014 Conference Paper

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

  • Masrour Zoghi
  • Shimon Whiteson
  • Rémi Munos
  • Maarten de Rijke

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log t) on a very general class of dueling bandit problems that matches a lower bound proven in (Yue et al., 2012). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.
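
A sketch of the optimistic selection underlying this approach — upper confidence bounds on pairwise win probabilities, with any arm whose bounds never drop below 1/2 kept as a potential champion — is given below; the bonus form and tie-breaking are illustrative assumptions rather than the exact algorithm.

    import numpy as np

    def rucb_candidate_arm(wins, t, alpha=0.51):
        """wins[i, j] = number of duels in which arm i beat arm j; t = current round (>= 2)."""
        n = wins + wins.T
        with np.errstate(divide="ignore", invalid="ignore"):
            mean = np.where(n > 0, wins / n, 0.5)
            bonus = np.where(n > 0, np.sqrt(alpha * np.log(t) / n), 1.0)
        ucb = np.minimum(mean + bonus, 1.0)
        np.fill_diagonal(ucb, 0.5)
        candidates = np.flatnonzero((ucb >= 0.5).all(axis=1))   # arms that might still be best
        if candidates.size == 0:
            return int(np.random.randint(len(wins)))
        return int(np.random.choice(candidates))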

ECAI Conference 2008 Conference Paper

Finding Key Bloggers, One Post At A Time

  • Wouter Weerkamp
  • Krisztian Balog
  • Maarten de Rijke

User generated content in general, and blogs in particular, form an interesting and relatively little explored domain for mining knowledge. We address the task of blog distillation: to find blogs that are principally devoted to a given topic, as opposed to blogs that merely happen to discuss the topic in passing. Working in the setting of statistical language modeling, we model the task by aggregating a blogger's blog posts to collect evidence of relevance to the topic and persistence of interest in the topic. This approach achieves state-of-the-art performance. On top of this baseline, we extend our model by incorporating a number of blog-specific features, concerning document structure, social structure, and temporal structure. These blog-specific features yield further improvements.

IJCAI Conference 2007 Conference Paper

  • Krisztian Balog
  • Maarten de Rijke

The profile of an individual is a record of the types and areas of skills of that individual ("topical profile") plus a description of her collaboration network ("social profile"). In this paper we define and formalize the task of automatically determining an expert profile of a person from a heterogeneous corpus made up of a large organization's intranet. We propose multiple models for addressing the topical profiling task. Our main methods build on ideas from information retrieval, while refinements bring in filtering (allowing an area into a person's profile only if she is among the top ranking experts in the area). An evaluation based on the W3C-corpus made available by TREC, shows significant improvements of the refined methods over the baseline. We apply our profiling algorithms to significantly enhance the performance of a state-of-the-art expert finding algorithm and to help users of an operational expert search system find the person they would contact, given a specific problem, topic or information need. Finally, we address the task of determining a social profile for a given person, using graph-based methods.

TIME Conference 2004 Conference Paper

CTL Model Checking for Processing Simple XPath Queries

  • Loredana Afanasiev
  • Massimo Franceschet
  • Maarten Marx
  • Maarten de Rijke

The eXtensible Markup Language (XML) was designed to describe the content of a document and its hierarchical structure, and the XML Path language (XPath) is a language for selecting elements from XML documents. There is a close connection between the query processing problem for XPath and the model checking problem for temporal logics. Both boil down to checking which nodes of a graph satisfy a property. We investigate the potential of a technique based on computation tree logic (CTL) model checking for evaluating queries expressed in (a subset of) XPath. To this aim, we isolate a simple fragment of XPath that is naturally embeddable into CTL. We report on experiments based on the model checker NuSMV, and compare our results with alternative academic XPath processors. We comment on the advantages and drawbacks of the application of our model checking-based approach to XPath processing.

TIME Conference 2003 Conference Paper

Hybrid Logics on Linear Structures: Expressivity and Complexity

  • Massimo Franceschet
  • Maarten de Rijke
  • Bernd-Holger Schlingloff

We investigate expressivity and complexity of hybrid logics on linear structures. Hybrid logics are an enrichment of modal logics with certain first-order features which are algorithmically well behaved. Therefore, they are well suited for the specification of certain properties of computational systems. We show that hybrid logics are more expressive than usual modal and temporal logics on linear structures, and exhibit a hierarchy of hybrid languages. We determine the complexities of the satisfiability problem for these languages and define an existential fragment of hybrid logic for which satisfiability is still NP-complete. Finally, we examine the linear time model checking problem for hybrid logics and its complexity.

KER Journal 1997 Journal Article

Formalisms for multi-agent systems

  • Mark d'Inverno
  • Michael Fisher
  • Alessio Lomuscio
  • Michael Luck
  • Maarten de Rijke
  • Mark Ryan
  • Michael Wooldridge

As computer scientists, our goals are motivated by the desire to improve computer systems in some way: making them easier to design and implement, more robust and less prone to error, easier to use, faster, cheaper, and so on. In the field of multi-agent systems, our goal is to build systems capable of flexible autonomous decision making, with societies of such systems cooperating with one another. There is a lot of formal theory in the area, but it is often not obvious what such theories should represent and what role the theory is intended to play. Theories of agents are often abstract and obtuse and not related to concrete computational models.

CSL Conference 1996 Conference Paper

A Proof System for Finite Trees

  • Patrick Blackburn
  • Wilfried Meyer-Viol
  • Maarten de Rijke

In this paper we introduce a description language for finite trees. Although we briefly note some of its intended applications, the main goal of the paper is to provide it with a sound and complete proof system. We do so using standard axioms from modal provability logic and modal logics of programs, and prove completeness by extending techniques due to Van Benthem and Meyer-Viol (1994) and Blackburn and Meyer-Viol (1994). We conclude with a proof of the EXPTIME-completeness of the satisfiability problem, and a discussion of issues related to complexity and theorem proving.