Arrow Research search

Author name cluster

Jieming Mao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

ICML Conference 2025 Conference Paper

Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model

  • Rachel Cummings
  • Alessandro Epasto
  • Jieming Mao
  • Tamalika Mukherjee
  • Tingting Ou
  • Peilin Zhong

The turnstile continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|\mathcal{U}|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|\mathcal{U}|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problems of counting distinct elements in the turnstile streaming model. Our algorithm achieves, on arbitrary streams, $O_{\eta}(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0, 1)$. Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which is linear, approaching the known $\Omega(T^{1/4})$ additive error lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $O_{\eta}(\sqrt{W})$ additive error, using $O_{\eta}(\sqrt{W})$ space. This additive error asymptotically matches that of prior work which required instead linear space. Our results address an open question posed by Jain et al. about designing low-memory mechanisms for this problem. We complement this results with a space lower bound for this problem, which shows that any algorithm that uses similar techniques must use space $\Omega(T^{1/3})$.

ICML Conference 2025 Conference Paper

Retraining with Predicted Hard Labels Provably Increases Model Accuracy

  • Rudrajit Das
  • Inderjit S. Dhillon
  • Alessandro Epasto
  • Adel Javanmard
  • Jieming Mao
  • Vahab Mirrokni
  • Sujay Sanghavi
  • Peilin Zhong

The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i. e. , $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP), which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. For example, when training ResNet-18 on CIFAR-100 with $\epsilon=3$ label DP, we obtain more than $6$% improvement in accuracy with consensus-based retraining.

NeurIPS Conference 2024 Conference Paper

Autobidder's Dilemma: Why More Sophisticated Autobidders Lead to Worse Auction Efficiency

  • Yuan Deng
  • Jieming Mao
  • Vahab Mirrokni
  • Hanrui Zhang
  • Song Zuo

The recent increasing adoption of autobidding has inspired the growing interest in analyzing the performance of classic mechanism with value-maximizing autobidders both theoretically and empirically. It is known that optimal welfare can be obtained in first-price auctions if autobidders are restricted to uniform bid-scaling and the price of anarchy is $2$ when non-uniform bid-scaling strategies are allowed. In this paper, we provide a fine-grained price of anarchy analysis for non-uniform bid-scaling strategies in first-price auctions, demonstrating the reason why more powerful (individual) non-uniform bid-scaling strategies may lead to worse (aggregated) performance in social welfare. Our theoretical results match recent empirical findings that a higher level of non-uniform bid-scaling leads to lower welfare performance in first-price auctions.

NeurIPS Conference 2024 Conference Paper

Efficiency of the First-Price Auction in the Autobidding World

  • Yuan Deng
  • Jieming Mao
  • Vahab Mirrokni
  • Hanrui Zhang
  • Song Zuo

We study the price of anarchy of first-price auctions in the autobidding world, where bidders can be either utility maximizers (i. e. , traditional bidders) or value maximizers (i. e. , autobidders). We show that with autobidders only, the price of anarchy of first-price auctions is $1/2$, and with both kinds of bidders, the price of anarchy degrades to about $0. 457$ (the precise number is given by an optimization). These results complement the recent result by [Jin and Lu, 2022] showing that the price of anarchy of first-price auctions with traditional bidders is $1 - 1/e^2$. We further investigate a setting where the seller can utilize machine-learned advice to improve the efficiency of the auctions. There, we show that as the accuracy of the advice increases, the price of anarchy improves smoothly from about $0. 457$ to $1$.

ICLR Conference 2022 Conference Paper

Shuffle Private Stochastic Convex Optimization

  • Albert Cheu
  • Matthew Joseph
  • Jieming Mao
  • Binghui Peng

In shuffle privacy, each user sends a collection of randomized messages to a trusted shuffler, the shuffler randomly permutes these messages, and the resulting shuffled collection of messages must satisfy differential privacy. Prior work in this model has largely focused on protocols that use a single round of communication to compute algorithmic primitives like means, histograms, and counts. In this work, we present interactive shuffle protocols for stochastic convex optimization. Our optimization protocols rely on a new noninteractive protocol for summing vectors of bounded $\ell_2$ norm. By combining this sum subroutine with techniques including mini-batch stochastic gradient descent, accelerated gradient descent, and Nesterov's smoothing method, we obtain loss guarantees for a variety of convex loss functions that significantly improve on those of the local model and sometimes match those of the central model.

NeurIPS Conference 2021 Conference Paper

Robust Auction Design in the Auto-bidding World

  • Santiago Balseiro
  • Yuan Deng
  • Jieming Mao
  • Vahab Mirrokni
  • Song Zuo

In classic auction theory, reserve prices are known to be effective for improving revenue for the auctioneer against quasi-linear utility maximizing bidders. The introduction of reserve prices, however, usually do not help improve total welfare of the auctioneer and the bidders. In this paper, we focus on value maximizing bidders with return on spend constraints---a paradigm that has drawn considerable attention recently as more advertisers adopt auto-bidding algorithms in advertising platforms---and show that the introduction of reserve prices has a novel impact on the market. Namely, by choosing reserve prices appropriately the auctioneer can improve not only the total revenue but also the total welfare. Our results also demonstrate that reserve prices are robust to bidder types, i. e. , reserve prices work well for different bidder types, such as value maximizers and utility maximizers, without using bidder type information. We generalize these results for a variety of auction mechanisms such as VCG, GSP, and first-price auctions. Moreover, we show how to combine these results with additive boosts to improve the welfare of the outcomes of the auction further. Finally, we complement our theoretical observations with an empirical study confirming the effectiveness of these ideas using data from online advertising auctions.

SODA Conference 2020 Conference Paper

Exponential Separations in Local Differential Privacy

  • Matthew Joseph
  • Jieming Mao
  • Aaron Roth 0001

We prove a general connection between the communication complexity of two-player games and the sample complexity of their multi-player locally private analogues. We use this connection to prove sample complexity lower bounds for locally differentially private protocols as straightforward corollaries of results from communication complexity. In particular, we 1) use a communication lower bound for the hidden layers problem to prove an exponential sample complexity separation between sequentially and fully interactive locally private protocols, and 2) use a communication lower bound for the pointer chasing problem to prove an exponential sample complexity separation between k -round and ( k + 1)-round sequentially interactive locally private protocols, for every k.

NeurIPS Conference 2020 Conference Paper

Smoothly Bounding User Contributions in Differential Privacy

  • Alessandro Epasto
  • Mohammad Mahdian
  • Jieming Mao
  • Vahab Mirrokni
  • Lijie Ren

A differentially private algorithm guarantees that the input of a single user won’t significantly change the output distribution of the algorithm. When a user contributes more data points, more information can be collected to improve the algorithm’s performance. But at the same time, more noise might need to be added to the algorithm in order to keep the algorithm differentially private and this might hurt the algorithm’s performance. Amin et al. (2019) initiates the study on bounding user contributions and proposes a very natural algorithm which limits the number of samples each user can contribute by a threshold. For a better trade-off between utility and privacy guarantee, we propose a method which smoothly bounds user contributions by setting appropriate weights on data points and apply it to estimating the mean/quantiles, linear regression, and empirical risk minimization. We show that our algorithm provably outperforms the sample limiting algorithm. We conclude with experimental evaluations which validate our theoretical results.

ICML Conference 2019 Conference Paper

Differentially Private Fair Learning

  • Matthew Jagielski
  • Michael J. Kearns
  • Jieming Mao
  • Alina Oprea
  • Aaron Roth 0001
  • Saeed Sharifi-Malvajerdi
  • Jonathan R. Ullman

Motivated by settings in which predictive models may be required to be non-discriminatory with respect to certain attributes (such as race), but even collecting the sensitive attribute may be forbidden or restricted, we initiate the study of fair learning under the constraint of differential privacy. Our first algorithm is a private implementation of the equalized odds post-processing approach of (Hardt et al. , 2016). This algorithm is appealingly simple, but must be able to use protected group membership explicitly at test time, which can be viewed as a form of “disparate treatment”. Our second algorithm is a differentially private version of the oracle-efficient in-processing approach of (Agarwal et al. , 2018) which is more complex but need not have access to protected group membership at test time. We identify new tradeoffs between fairness, accuracy, and privacy that emerge only when requiring all three properties, and show that these tradeoffs can be milder if group membership may be used at test time. We conclude with a brief experimental evaluation.

NeurIPS Conference 2019 Conference Paper

Locally Private Gaussian Estimation

  • Matthew Joseph
  • Janardhan Kulkarni
  • Jieming Mao
  • Steven Wu

We study a basic private estimation problem: each of n users draws a single i. i. d. sample from an unknown Gaussian distribution N(\mu, \sigma^2), and the goal is to estimate \mu while guaranteeing local differential privacy for each user. As minimizing the number of rounds of interaction is important in the local setting, we provide adaptive two-round solutions and nonadaptive one-round solutions to this problem. We match these upper bounds with an information-theoretic lower bound showing that our accuracy guarantees are tight up to logarithmic factors for all sequentially interactive locally private protocols.

FOCS Conference 2019 Conference Paper

The Role of Interactivity in Local Differential Privacy

  • Matthew Joseph
  • Jieming Mao
  • Seth Neel
  • Aaron Roth 0001

We study the power of interactivity in local differential privacy. First, we focus on the difference between fully interactive and sequentially interactive protocols. Sequentially interactive protocols may query users adaptively in sequence, but they cannot return to previously queried users. The vast majority of existing lower bounds for local differential privacy apply only to sequentially interactive protocols, and before this paper it was not known whether fully interactive protocols were more powerful. We resolve this question. First, we classify locally private protocols by their compositionality, the multiplicative factor by which the sum of a protocol's single-round privacy parameters exceeds its overall privacy guarantee. We then show how to efficiently transform any fully interactive compositional protocol into an equivalent sequentially interactive protocol with a blowup in sample complexity linear in this compositionality. Next, we show that our reduction is tight by exhibiting a family of problems such that any sequentially interactive protocol requires this blowup in sample complexity over a fully interactive compositional protocol. We then turn our attention to hypothesis testing problems. We show that for a large class of compound hypothesis testing problems - which include all simple hypothesis testing problems as a special case - a simple noninteractive test is optimal among the class of all (possibly fully interactive) tests.

NeurIPS Conference 2018 Conference Paper

Contextual Pricing for Lipschitz Buyers

  • Jieming Mao
  • Renato Leme
  • Jon Schneider

We investigate the problem of learning a Lipschitz function from binary feedback. In this problem, a learner is trying to learn a Lipschitz function $f: [0, 1]^d \rightarrow [0, 1]$ over the course of $T$ rounds. On round $t$, an adversary provides the learner with an input $x_t$, the learner submits a guess $y_t$ for $f(x_t)$, and learns whether $y_t > f(x_t)$ or $y_t \leq f(x_t)$. The learner's goal is to minimize their total loss $\sum_t\ell(f(x_t), y_t)$ (for some loss function $\ell$). The problem is motivated by \textit{contextual dynamic pricing}, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price. For the symmetric loss $\ell(f(x_t), y_t) = \vert f(x_t) - y_t \vert$, we provide an algorithm for this problem achieving total loss $O(\log T)$ when $d=1$ and $O(T^{(d-1)/d})$ when $d>1$, and show that both bounds are tight (up to a factor of $\sqrt{\log T}$). For the pricing loss function $\ell(f(x_t), y_t) = f(x_t) - y_t {\bf 1}\{y_t \leq f(x_t)\}$ we show a regret bound of $O(T^{d/(d+1)})$ and show that this bound is tight. We present improved bounds in the special case of a population of linear buyers.

SODA Conference 2018 Conference Paper

On Simultaneous Two-player Combinatorial Auctions

  • Mark Braverman
  • Jieming Mao
  • S. Matthew Weinberg

We consider the following communication problem: Alice and Bob each have some valuation functions υ 1 (·) and υ 2 (·) over subsets of m items, and their goal is to partition the items into S, in a way that maximizes the welfare, . We study both the allocation problem, which asks for a welfare-maximizing partition and the decision problem, which asks whether or not there exists a partition guaranteeing certain welfare, for binary XOS valuations. For interactive protocols with poly( m ) communication, a tight 3/4-approximation is known for both [29, 23]. For interactive protocols, the allocation problem is provably harder than the decision problem: any solution to the allocation problem implies a solution to the decision problem with one additional round and log m additional bits of communication via a trivial reduction. Surprisingly, the allocation problem is provably easier for simultaneous protocols. Specifically, we show: • There exists a simultaneous, randomized protocol with polynomial communication that selects a partition whose expected welfare is at least 3/4 of the optimum. This matches the guarantee of the best interactive, randomized protocol with polynomial communication. • For all ε > 0, any simultaneous, randomized protocol that decides whether the welfare of the optimal partition is ≥ 1 or ≤ 3/4 – 1/108 + ε correctly with probability > 1/2 + 1/poly( m ) requires exponential communication. This provides a separation between the attainable approximation guarantees via interactive (3/4) versus simultaneous (≤ 3/4 – 1/108) protocols with polynomial communication. In other words, this trivial reduction from decision to allocation problems provably requires the extra round of communication. We further discuss the implications of our results for the design of truthful combinatorial auctions in general, and extensions to general XOS valuations. In particular, our protocol for the allocation problem implies a new style of truthful mechanisms.

SODA Conference 2017 Conference Paper

Competitive analysis of the top- K ranking problem

  • Xi Chen 0010
  • Sivakanth Gopi
  • Jieming Mao
  • Jon Schneider

Motivated by applications in recommender systems, web search, social choice and crowdsourcing, we consider the problem of identifying the set of top K items from noisy pairwise comparisons. In our setting, we are non-actively given r pairwise comparisons between each pair of n items, where each comparison has noise constrained by a very general noise model called the strong stochastic transitivity (SST) model. We analyze the competitive ratio of algorithms for the top- K problem. In particular, we present a linear time algorithm for the top- K problem which has a competitive ratio of i. e. to solve any instance of top -K, our algorithm needs at most times as many samples needed as the best possible algorithm for that instance (in contrast, all previous known algorithms for the top- K problem have competitive ratios of Ω( n ) or worse). We further show that this is tight: any algorithm for the top- K problem has competitive ratio at least

FOCS Conference 2015 Conference Paper

Near-Optimal Bounds on Bounded-Round Quantum Communication Complexity of Disjointness

  • Mark Braverman
  • Ankit Garg 0001
  • Young Kun-Ko
  • Jieming Mao
  • Dave Touchette

We prove a near optimal round-communication tradeoff for the two-party quantum communication complexity of disjointness. For protocols with r rounds, we prove a lower bound of Omega(n/r) on the communication required for computing disjointness of input size n, which is optimal up to logarithmic factors. The previous best lower bound was Omega(n/r̂ 2 ) due to Jain, Radhakrishnan and Sen. Along the way, we develop several tools for quantum information complexity, one of which is a lower bound for quantum information complexity in terms of the generalized discrepancy method. As a corollary, we get that the quantum communication complexity of any boolean function f is at most 2 ̂O(QIC(f)), where QIC(f) is the prior-free quantum information complexity of f (with error 1/3).