Author name cluster

Xiaotie Deng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

54 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Haoran Sun
Yurong Chen
Siwei Wang
Chu Xu
Wei Chen
Xiaotie Deng

Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. As aggregation algorithms advance, a potential economic scenario arises in which fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, but this could harm the aggregation performance. This paper addresses such incentive issues by framing them as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We primarily focus on training rules that maximize social welfare subject to certain regularizations, referred to as SW-Max rules. First, we show that under most circumstances, truthful reporting is sub-optimal under an SW-Max rule alone, thereby highlighting the necessity of payments. Second, we extend the VCG payment to implement SW-Max rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence and derive the necessary conditions for a payment rule to implement an SW-Max rule in DSIC and other principles. Third, we demonstrate that our mechanism is approximately DSIC with perturbed input, showcasing its robustness against the inevitable errors in real-world applications. Experiments on real LLM training results further confirm the practical implications of our results.
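For readers unfamiliar with the VCG payment mentioned in the abstract, the following is a minimal sketch of the textbook VCG rule over a finite set of candidate outcomes. It is not the paper's mechanism: the regularized SW-Max training rule and the extended payment are not reproduced here, and the function name `vcg` and the table of reported valuations are hypothetical.

```python
def vcg(valuations):
    """Textbook VCG over a finite outcome set (illustrative assumption).

    valuations[i][o] is agent i's reported value for outcome o.
    Returns the welfare-maximizing outcome and each agent's payment.
    """
    n_outcomes = len(valuations[0])

    def welfare(outcome, skip=None):
        # Total reported value for `outcome`, optionally leaving one agent out.
        return sum(v[outcome] for i, v in enumerate(valuations) if i != skip)

    # Choose the outcome that maximizes reported social welfare.
    chosen = max(range(n_outcomes), key=welfare)

    payments = []
    for i in range(len(valuations)):
        # Externality of agent i: what the others could get without i,
        # minus what the others get at the chosen outcome.
        best_without_i = max(welfare(o, skip=i) for o in range(n_outcomes))
        payments.append(best_without_i - welfare(chosen, skip=i))
    return chosen, payments


if __name__ == "__main__":
    # Two agents, three candidate outcomes (hypothetical numbers).
    print(vcg([[5, 1, 0], [0, 3, 4]]))
```

In this toy instance the first agent's report changes the outcome the second agent would otherwise have obtained, so the first agent pays for that externality while the second pays nothing.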


TCS Journal 2025 Journal Article

On the optimal mixing problem of approximate Nash equilibria in bimatrix games

Xiaotie Deng
Dongchen Li
Hanyu Li

This paper introduces the optimal mixing problem, a natural extension of the computation of approximate Nash Equilibria (NE) in bimatrix games. The problem focuses on determining the optimal convex combination of given strategies that minimizes the approximation (i.e., regret) in NE computation. We develop algorithms for the exact and approximate optimal mixing problems and present new complexity results that bridge both practical and theoretical aspects of NE computation. Practically, our algorithms can be used to enhance and integrate arbitrary existing constant-approximate NE algorithms, offering a powerful tool for the design of approximate NE algorithms. Theoretically, these algorithms allow us to explore the implications of support restrictions on approximate NE and derive the upper-bound separations between approximate NE and exact NE. Consequently, this work contributes to theoretical understandings of the computational complexity of approximate NE under various constraints and practical improvements in multi-agent reinforcement learning (MARL) and other fields where NE computation is involved.
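To make the notion of approximation (regret) concrete, here is a small sketch that evaluates the regret of a strategy profile in a bimatrix game and grid-searches the mixing weights between two given strategies per player. The exact problem formulation in the paper may differ; the brute-force grid search, the function names, and the restriction to two strategies per player are assumptions made only for illustration, not the paper's algorithms.

```python
def payoff(matrix, x, y):
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * matrix[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def approximation(R, C, x, y):
    """Largest gain either player can obtain by deviating to a pure strategy."""
    row_best = max(sum(R[i][j] * y[j] for j in range(len(y)))
                   for i in range(len(x)))
    col_best = max(sum(x[i] * C[i][j] for i in range(len(x)))
                   for j in range(len(y)))
    return max(row_best - payoff(R, x, y), col_best - payoff(C, x, y))


def mix(p, q, w):
    """Convex combination w * p + (1 - w) * q of two mixed strategies."""
    return [w * a + (1 - w) * b for a, b in zip(p, q)]


def best_mixture(R, C, row_pair, col_pair, steps=10):
    """Grid search over mixing weights (a naive stand-in, not the paper's method)."""
    best = None
    for a in range(steps + 1):
        for b in range(steps + 1):
            wa, wb = a / steps, b / steps
            x = mix(row_pair[0], row_pair[1], wa)
            y = mix(col_pair[0], col_pair[1], wb)
            eps = approximation(R, C, x, y)
            if best is None or eps < best[0]:
                best = (eps, wa, wb)
    return best


if __name__ == "__main__":
    # Matching pennies with payoffs in [0, 1]; the given strategies are pure.
    R = [[1, 0], [0, 1]]
    C = [[0, 1], [1, 0]]
    print(best_mixture(R, C, ([1, 0], [0, 1]), ([1, 0], [0, 1])))
```

In matching pennies each pure profile has regret 1, while the equal-weight mixture of the given strategies is an exact equilibrium, so mixing strictly improves on every input.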


AAMAS Conference 2025 Conference Paper

Optimal Mechanism Design for Crowdfunding of Public Goods

Yukun Cheng
Xiaotie Deng
Baqiao Quan

Mechanisms for crowdfunding public goods are essential for ensuring that societies can collectively benefit from public goods. Unlike previous research on crowdfunding for public goods, which focused on binary outcomes (either full provision or none at all), this paper proposes an auction framework to examine the partial provision of public goods, based on the funds raised, with the goal of maximizing the final investment amount. We develop truthful investment mechanisms that achieve the (approximate) optimal expected investment amount across different models, taking into account the number of agents.


IJCAI Conference 2025 Conference Paper

Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach

Jichen Li
Lijia Xie
Hanting Huang
Bo Zhou
Binfeng Song
Wanying Zeng
Xiaotie Deng
Xiao Zhang

Strategic mining attacks, such as selfish mining, exploit blockchain consensus protocols by deviating from honest behavior to maximize rewards. Markov Decision Process (MDP) analysis faces scalability challenges in modern digital economics, including blockchain. To address these limitations, reinforcement learning (RL) provides a scalable alternative, enabling adaptive strategy optimization in complex dynamic environments. In this survey, we examine RL’s role in strategic mining analysis, comparing it to MDP-based approaches. We begin by reviewing foundational MDP models and their limitations, before exploring RL frameworks that can learn near-optimal strategies across various protocols. Building on this analysis, we compare RL techniques and their effectiveness in deriving security thresholds, such as the minimum attacker power required for profitable attacks. Expanding the discussion further, we classify consensus protocols and propose open challenges, such as multi-agent dynamics and real-world validation. This survey highlights the potential of reinforcement learning to address the challenges of selfish mining, including protocol design, threat detection, and security analysis, while offering a strategic roadmap for researchers in decentralized systems and AI-driven analytics.
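As an illustration of what a security threshold means here, the sketch below uses the widely cited closed-form relative revenue of the original selfish-mining strategy (from Eyal and Sirer's analysis, recalled from the literature rather than taken from this survey). The parameter names `alpha` (attacker's share of mining power) and `gamma` (fraction of honest miners that build on the attacker's block during a tie) are the conventional ones; the function names are hypothetical, and the formula is only meant for `0 <= alpha < 0.5`.

```python
def selfish_mining_revenue(alpha, gamma):
    """Relative revenue of the classic selfish-mining strategy (closed form)."""
    numerator = (alpha * (1 - alpha) ** 2
                 * (4 * alpha + gamma * (1 - 2 * alpha))
                 - alpha ** 3)
    denominator = 1 - alpha * (1 + (2 - alpha) * alpha)
    return numerator / denominator


def profitability_threshold(gamma):
    """Smallest mining share at which selfish mining matches honest mining."""
    return (1 - gamma) / (3 - 2 * gamma)


if __name__ == "__main__":
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, profitability_threshold(gamma))
    # Above the threshold the attacker earns more than its fair share alpha.
    print(selfish_mining_revenue(0.4, 0.0))
```

Honest mining yields a relative revenue equal to `alpha`, so the attack is profitable exactly when the returned revenue exceeds `alpha`. RL-based analyses of the kind the survey reviews aim to estimate such thresholds for protocols where no closed form is available.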


AAAI Conference 2024 Conference Paper

Competition among Pairwise Lottery Contests

Xiaotie Deng
Hangxin Gan
Ningyuan Li
Weian Li
Qi Qi

We investigate a two-stage competitive model involving multiple contests. In this model, each contest designer chooses two participants from a pool of candidate contestants and determines the biases. Contestants strategically distribute their efforts across various contests within their budget. We first show the existence of a pure strategy Nash equilibrium (PNE) for the contestants, and propose a fully polynomial-time approximation scheme to compute an approximate PNE. In the scenario where designers simultaneously decide the participants and biases, the subgame perfect equilibrium (SPE) may not exist. Nonetheless, when designers' decisions are made in two substages, the existence of SPE is established. In the scenario where designers can hold multiple contests, we show that the SPE always exists under mild conditions and can be computed efficiently.


NeurIPS Conference 2024 Conference Paper

Contextual Decision-Making with Knapsacks Beyond the Worst Case

Zhaohua Chen
Rui Ai
Mingwei Yang
Yuqi Pan
Chang Wang
Xiaotie Deng

We study the framework of a dynamic decision-making scenario with resource constraints. In this framework, an agent, whose target is to maximize the total reward under the initial inventory, selects an action in each round upon observing a random request, leading to a reward and resource consumptions that are further associated with an unknown random external factor. While previous research has already established an $\widetilde{O}(\sqrt{T})$ worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one for the worst-case gap between benchmarks and another for logarithmic regret rates. We first show that an $\Omega(\sqrt{T})$ distance between the commonly used fluid benchmark and the online optimum is unavoidable when the former has a degenerate optimal solution. On the algorithmic side, we merge the re-solving heuristic with distribution estimation skills and propose an algorithm that achieves an $\widetilde{O}(1)$ regret as long as the fluid LP has a unique and non-degenerate solution. Furthermore, we prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases and extend these results to the setting where the request and external factor are continuous. Regarding information structure, our regret results are obtained under two feedback models, respectively, where the algorithm accesses the external factor at the end of each round and at the end of a round only when a non-null action is executed.


AAAI Conference 2024 Conference Paper

Dynamic Budget Throttling in Repeated Second-Price Auctions

Zhaohua Chen
Chang Wang
Qian Wang
Yuqi Pan
Zhuming Shi
Zheng Cai
Yukun Ren
Zhihua Zhu

In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
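The throttling setting can be pictured with a toy simulation: one advertiser bids truthfully in a sequence of second-price auctions but sits out some of them to stay within budget. The rule below (skip any auction whose value is under a fixed cutoff `min_value`, or whose value exceeds the remaining budget) is a hypothetical static rule chosen for simplicity. It is not the OGD-CB algorithm from the abstract, whose details are not given here.

```python
def simulate_throttling(values, competing_bids, budget, min_value):
    """Toy throttling in repeated second-price auctions (illustrative only).

    values[t]         -- the advertiser's value in round t
    competing_bids[t] -- the highest competing bid in round t
    Returns (total utility, total spend).
    """
    spent = 0.0
    utility = 0.0
    for value, rival in zip(values, competing_bids):
        remaining = budget - spent
        # Throttle: skip low-value rounds and rounds that could overspend.
        # A winner pays at most her own bid, so remaining >= value is safe.
        if value < min_value or remaining < value:
            continue
        # Truthful bid; the winner pays the second-highest bid.
        if value > rival:
            spent += rival
            utility += value - rival
    return utility, spent


if __name__ == "__main__":
    print(simulate_throttling(
        values=[0.9, 0.2, 0.8, 0.7],
        competing_bids=[0.5, 0.1, 0.6, 0.3],
        budget=2.0,
        min_value=0.5,
    ))
```

Pacing, by contrast, would enter every auction and scale the bid down instead of skipping rounds. The comparison between the two in the abstract concerns adaptive versions of both ideas.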


ICLR Conference 2024 Conference Paper

Learning Thresholds with Latent Values and Censored Feedback

Jiahao Zhang
Tao Lin 0013
Weiqiang Zheng
Zhe Feng 0004
Yifeng Teng
Xiaotie Deng

In this paper, we investigate the problem of *actively* learning a threshold in latent space, where the *unknown* reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and the latent value $v$, and can be achieved *only* if the threshold is lower than or equal to the *unknown* latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignments in crowdsourcing, setting recruiting bars in hiring, etc. We first characterize the query complexity of learning a threshold with the expected reward at most $\epsilon$ smaller than the optimum and prove that the number of queries needed can be infinitely large even when $g(\gamma, v)$ is monotone with respect to both $\gamma$ and $v$. On the positive side, we provide a tight query complexity $\tilde{\Theta}(1/\epsilon^3)$ when $g$ is monotone and the CDF of value distribution is Lipschitz. Moreover, we show a tight $\tilde{\Theta}(1/\epsilon^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $\Theta(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.
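The censored-feedback model can be sketched as follows: a query at threshold $\gamma$ returns the reward $g(\gamma, v)$ only when $\gamma \le v$, and the learner never sees $v$ itself. The code estimates the best threshold on a fixed grid by averaging observed rewards. The grid, the reward of zero for a failed query, the reuse of one sample of latent values across all thresholds, and the function name are assumptions for illustration. This is a naive estimator, not the paper's query-optimal procedure.

```python
def estimate_best_threshold(grid, latent_values, g):
    """Pick the grid threshold with the highest average observed reward.

    latent_values simulates the hidden draws; only the resulting rewards
    would be visible to a real learner (assumption: failed queries pay 0).
    """
    def average_reward(gamma):
        observed = [g(gamma, v) if gamma <= v else 0.0 for v in latent_values]
        return sum(observed) / len(observed)

    best = max(grid, key=average_reward)
    return best, average_reward(best)


if __name__ == "__main__":
    # Reserve-price example: the reward is the price itself when the
    # bidder's value meets it (hypothetical numbers).
    print(estimate_best_threshold(
        grid=[0.3, 0.5, 0.9],
        latent_values=[0.3, 0.5, 0.5, 0.9],
        g=lambda gamma, v: gamma,
    ))
```

A higher threshold earns more per success but succeeds less often, which is the trade-off that makes the number of queries needed depend on the regularity of $g$ and of the value distribution.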