Author name cluster

Weidong Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

1 author row

IJCAI Conference 2024 Conference Paper

Unbiased Active Semi-supervised Binary Classification Models

JooChul Lee
Weidong Ma
Ziyang Wang

Active learning is known to be a well-motivated algorithm that aims to maximize model performance with relatively small data, but it introduces sampling bias due to active selection. To adjust the bias, current literature utilizes corrective weights in a supervised learning approach. However, those methods consider only a small amount of actively sampled data and thus estimation efficiency can be improved using unsampled data together. In this paper, we develop an actively improved augmented estimation equation (AI-AEE) based on corrective weights as well as imputation models that allow us to leverage unlabeled data. The asymptotic distribution of the proposed estimator as the solution to the AI-AEE is derived, and an optimal sampling scheme to minimize the asymptotic mean squared error of the estimator is proposed. We then propose a general practical algorithm for training prediction models in the active and semi-supervised learning framework. The superiority of our method is demonstrated on synthetic and real data examples.

PDF Details DOI

IJCAI Conference 2017 Conference Paper

Efficient Mechanism Design for Online Scheduling (Extended Abstract)

Xujin Chen
Xiaodong Hu
Tie-Yan Liu
Weidong Ma
Tao Qin
Pingzhong Tang
Changjun Wang
Bo Zheng

This work concerns the mechanism design for online scheduling in a strategic setting. In this setting, each job is owned by a self-interested agent who may misreport the release time, deadline, length, and value of her job, while we need to determine not only the schedule of the jobs, but also the payment of each agent. We focus on the design of incentive compatible (IC) mechanisms, and study the maximization of social welfare (i. e. , the aggregated value of completed jobs) by competitive analysis. We first derive two lower bounds on the competitive ratio of any deterministic IC mechanism to characterize the landscape of our research: one bound is 5, which holds for equal-length jobs; the other bound is $\frac{\kappa}{\ln\kappa}+1-o(1)$, which holds for unequal-length jobs, where $\kappa$ is the maximum ratio between lengths of any two jobs. We then propose a deterministic IC mechanism and show that such a simple mechanism works very well for two models: (1) In the preemption-restart model, the mechanism can achieve the optimal competitive ratio of 5 for equal-length jobs and a near optimal ratio of $(\frac{1}{(1-\epsilon)^2}+o(1)) \frac{\kappa}{\ln\kappa}$ for unequal-length jobs, where $0<\epsilon<1$ is a small constant; (2) In the preemption-resume model, the mechanism can achieve the optimal competitive ratio of 5 for equal-length jobs and a near optimal competitive ratio (within factor 2) for unequal-length jobs.

PDF Details

NeurIPS Conference 2017 Conference Paper

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

Guolin Ke
Qi Meng
Thomas Finley
Taifeng Wang
Wei Chen
Weidong Ma
Qiwei Ye
Tie-Yan Liu

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: \emph{Gradient-based One-Side Sampling} (GOSS) and \emph{Exclusive Feature Bundling} (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i. e. , they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB \emph{LightGBM}. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

PDF Details

AAAI Conference 2017 Conference Paper

Randomized Mechanisms for Selling Reserved Instances in Cloud

Jia Zhang
Weidong Ma
Tao Qin
Xiaoming Sun
Tie-Yan Liu

Selling reserved instances (or virtual machines) is a basic service in cloud computing. In this paper, we consider a more ﬂexible pricing model for instance reservation, in which a customer can propose the time length and number of resources of her request, while in today’s industry, customers can only choose from several predeﬁned reservation packages. Under this model, we design randomized mechanisms for customers coming online to optimize social welfare and providers’ revenue. We ﬁrst consider a simple case, where the requests from the customers do not vary too much in terms of both length and value density. We design a randomized mechanism that achieves a competitive ratio 1 42 for both social welfare and revenue, which is a improvement as there is usually no revenue guarantee in previous works such as (Azar et al. 2015; Wang et al. 2015). This ratio can be improved up to 1 11 when we impose a realistic constraint on the maximum number of resources used by each request. On the hardness side, we show an upper bound 1 3 on competitive ratio for any randomized mechanism. We then extend our mechanism to the general case and achieve a competitive ratio 1 42 log k log T for both social welfare and revenue, where T is the ratio of the maximum request length to the minimum request length and k is the ratio of the maximum request value density to the minimum request value density. This result outperforms the previous upper bound 1 CkT for deterministic mechanisms (Wang et al. 2015). We also prove an upper bound 2 log 8kT for any randomized mechanism. All the mechanisms we provide are in a greedy style. They are truthful and easy to be integrated into practical cloud systems.

PDF Details

IJCAI Conference 2016 Conference Paper

Budgeted Multi-Armed Bandits with Multiple Plays

Yingce Xia
Tao Qin
Weidong Ma
Nenghai Yu
Tie-Yan Liu

We study the multi-play budgeted multi-armed bandit (MP-BMAB) problem, in which pulling an arm receives both a random reward and a random cost, and a player pulls L( ≥ 1) arms at each round. The player targets at maximizing her total expected reward under a budget constraint B for the pulling costs. We present a multiple ratio confidence bound policy: At each round, we first calculate a truncated upper (lower) confidence bound for the expected reward (cost) of each arm, and then pull the L arms with the maximum ratio of the sum of the upper confidence bounds of rewards to the sum of the lower confidence bounds of costs. We design 0-1 integer linear fractional programming oracle that can pick such the L arms within polynomial time. We prove that the regret of our policy is sublinear in general and is log-linear for certain parameter settings. We further consider two special cases of MP-BMABs: (1) We derive a lower bound for any consistent policy for MP-BMABs with Bernoulli reward and cost distributions. (2) We show that the proposed policy can also solve conventional budgeted MAB problem (a special case of MP-BMABs with L = 1) and provides better theoretical results than existing UCB-based pulling policies.

PDF Details

JAIR Journal 2016 Journal Article

Efficient Mechanism Design for Online Scheduling

Xujin Chen
Xiaodong Hu
Tie-Yan Liu
Weidong Ma
Tao Qin
Pingzhong Tang
Changjun Wang
Bo Zheng

This paper concerns the mechanism design for online scheduling in a strategic setting. In this setting, each job is owned by a self-interested agent who may misreport the release time, deadline, length, and value of her job, while we need to determine not only the schedule of the jobs, but also the payment of each agent. We focus on the design of incentive compatible (IC) mechanisms, and study the maximization of social welfare (i.e., the aggregated value of completed jobs) by competitive analysis. We first derive two lower bounds on the competitive ratio of any deterministic IC mechanism to characterize the landscape of our research. We then propose a deterministic IC mechanism and show that such a simple mechanism works very well for both the preemption-restart model and the preemption-resume model. We show the mechanism can achieve the optimal competitive ratio of 5 for equal-length jobs and a near optimal competitive ratio (within a constant factor) for unequal-length jobs.

PDF Details DOI

AAMAS Conference 2016 Conference Paper

Online Non-Preemptive Story Scheduling in Web Advertising

Tie-Yan Liu
Weidong Ma
Tao Qin
Pingzhong Tang
Guang Yang
Bo Zheng

PDF

IJCAI Conference 2015 Conference Paper

Selling Reserved Instances in Cloud Computing

Changjun Wang
Weidong Ma
Tao Qin
Xujin Chen
Xiaodong Hu
Tie-Yan Liu

In this paper, we study the problem of designing new mechanisms for selling reserved instances (also referred to as virtual machines) in cloud computing. Unlike the practice in today’s clouds in which users only have a few predefined options to reserve instances (i. e. , either 1-year reservation or 3-year reservation), we allow users to reserve resources for any length and from any time point in the future. Our goal is to maximize the social welfare. We propose two mechanisms, one for the case where all the jobs are tight (their lengths are exactly their reservation time intervals), and the other for the more general case where jobs are delayable and have some flexibility on their reservations. Both of the mechanisms are prompt in the sense that the acceptance and the payment for a job is determined at the very moment of its arrival. We use competitive analysis to evaluate the performance of our mechanisms, and show that both of the mechanisms have a competitive ratio of O(ln(kT)) under some mild assumption, where k (res. T) is the maximum ratio between per-instance-hour valuation (res. length) of any two jobs. We then prove that no algorithm can achieve a competitive ratio better than ln(2kT) under the same assumption. Therefore, our mechanisms are optimal within a constant factor.

PDF Details

TCS Journal 2013 Journal Article

Reducing price of anarchy of selfish task allocation with more selfishness

Xujin Chen
Xiaodong Hu
Weidong Ma
Changjun Wang

In this paper we consider the task allocation problem from a new game theoretic perspective. We assume that tasks and machines are both controlled by selfish agents with two distinct objectives, which stands in contrast to the passive role of machines in the traditional model of selfish task allocation. To characterize the outcome of this new game where two classes of players interact, we introduce the concept of dual equilibrium. We prove that the price of anarchy with respect to dual equilibria is 1. 4, which is considerably smaller than the counterpart 2 in the traditional model. Our study shows that activating more freedom and selfishness in a game may bring about a better global outcome.

Details DOI

TCS Journal 2012 Journal Article

Pairwise cooperations in selfish ring routing for minimax linear latency

Xujin Chen
Xiaodong Hu
Weidong Ma

This paper studies the selfish routing game in ring networks with a load-dependent linear latency on each link. We adopt the asymmetric atomic routing model. Each player selfishly chooses a route to connect his source-destination pair, aiming at the lowest latency of his route, while the system objective is to minimize the maximum latency among all routes of players. The effectiveness of these routing games is often measured by the price of anarchy (PoA), the worst-case ratio between the maximum latencies in a Nash equilibrium (NE) and in a system optimum, where NE refers to a “stable state” among all players, from which no player has the incentive to deviate unilaterally. In classical setting, no cooperation is allowed and 16 stands as the current best upper bound on the PoA of such selfish ring routing. In this paper we show that the PoA is at most 10. 16 provided cooperations within pairs of players are allowed, where any two players could change their routes simultaneously if neither would experience a longer latency and at least one would experience a shorter latency.

Details DOI