Author name cluster

Qingwei Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers

2 author rows

TMLR Journal 2026 Journal Article

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Mengzhuo Chen
Jiani zheng
Lu Wang
Fangkai Yang
Chaoyun Zhang
Lingrui Mei
Wenjie Yin
Qingwei Lin

Training Vision-Language Models (VLMs) for Graphical User Interfaces (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples action utility learning from policy optimization by leveraging a pretrained Value Environment Model (VEM), which requires no live environment interaction during policy optimization. VEM predicts value-aligned action utilities directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., “Does this action advance the user’s goal?”). The framework operates in two stages: (1) pretraining VEM to learn action-level utility signals and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation. Evaluated across diverse benchmarks including Android-in-the-Wild for mobile apps and Multimodal-Mind2Web for web environments, VEM achieves state-of-the-art or highly competitive performance in both offline and online settings. It significantly outperforms environment-free baselines and matches or exceeds environment-based approaches, crucially without incurring interaction costs. Importantly, VEM demonstrates that robust, generalizable GUI agents can be trained efficiently using semantic-aware action utility prediction, proving effective across distinct interaction platforms like mobile and web. The code is available at https://github.com/microsoft/GUI-Agent-RL.

PDF Details

NeurIPS Conference 2025 Conference Paper

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu
Kanzhi Cheng
Rui Yang
Chaoyun Zhang
Jianwei Yang
Huiqiang Jiang
Jian Mu
Baolin Peng

One of the principal challenges in building VLM-powered GUI agents is visual grounding—localizing the appropriate screen region for action execution based on both the visual content and the textual plans. Most existing work formulates this as a text-based coordinate generation task. However, these approaches suffer from several limitations: weak spatial-semantic alignment due to lack of explicit spatial supervision; inability to handle ambiguous supervision targets, as single-point predictions penalize valid variations; and a mismatch between the dense nature of screen coordinates and the coarse, patch-level granularity of visual features extracted by models like Vision Transformers. In this paper, we propose GUI-Actor, a VLM-based method for coordinate-free GUI grounding. At its core, GUI-Actor introduces an attention-based action head that learns to align a dedicated `` token with all relevant visual patch tokens, enabling the model to propose one or more action regions in a single forward pass. In line with this, we further design a grounding verifier to evaluate and select the most plausible action region from the candidates proposed for action execution. Extensive experiments show that GUI-Actor outperforms prior state-of-the-art methods on multiple GUI action grounding benchmarks, with improved generalization to unseen screen resolutions and layouts. Notably, GUI-Actor-7B achieves scores of 40. 7 with Qwen2-VL and 44. 6 with Qwen2. 5-VL as backbones, outperforming UI-TARS-72B (38. 1) on ScreenSpot-Pro, with significantly fewer parameters and training data. Furthermore, by incorporating the verifier, we find that fine-tuning only the newly introduced action head (~100M parameters for 7B model) while keeping the VLM backbone frozen is sufficient to achieve performance comparable to previous state-of-the-art models, highlighting that GUI-Actor can endow the underlying VLM with effective grounding capabilities without compromising its general-purpose strengths. Project page: https: //aka. ms/GUI-Actor

PDF Details

TMLR Journal 2025 Journal Article

Large Action Models: From Inception to Implementation

Lu Wang
Fangkai Yang
Chaoyun Zhang
Junting Lu
Jiaxu Qian
Shilin He
Pu Zhao
Bo Qiao

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications.

PDF Details

TMLR Journal 2025 Journal Article

Large Language Model-Brained GUI Agents: A Survey

Chaoyun Zhang
Shilin He
Jiaxu Qian
Bowen Li
Liqun Li
Si Qin
Yu Kang
Minghua Ma

Graphical User Interfaces (GUIs) have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. Traditionally, automating GUI interactions relied on script-based or rule-based approaches, which, while effective for fixed workflows, lacked the flexibility and adaptability required for dynamic, real-world applications. The advent of Large Language Models (LLMs), particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, task generalization, and visual processing. This has paved the way for a new generation of ''LLM-brained'' GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address critical research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents. We anticipate that this survey will serve both as a practical cookbook for constructing LLM-powered GUI agents, and as a definitive reference for advancing research in this rapidly evolving domain.

PDF Details

ICLR Conference 2025 Conference Paper

OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Junjielong Xu
Qinan Zhang
Zhiqing Zhong
Shilin He
Chaoyun Zhang
Qingwei Lin
Dan Pei
Pinjia He

Large language models (LLMs) are driving substantial advancements in software engineering, with successful applications like Copilot and Cursor transforming real-world development practices. However, current research predominantly focuses on the early stages of development, such as code generation, while overlooking the post-development phases that are crucial to user experience. To explore the potential of LLMs in this direction, we propose OpenRCA, a benchmark dataset and evaluation framework for assessing LLMs’ ability to identify the root cause of software failures. OpenRCA includes 335 failures from three enterprise software systems, along with over 68 GB of telemetry data (logs, metrics, and traces). Given a failure case and its associated telemetry, the LLM is tasked to identify the root cause that triggered the failure, requiring comprehension of software dependencies and reasoning over heterogeneous, long-context telemetry data. Our results show substantial room for improvement, as current models can only handle the simplest cases. Even with the specially designed RCA-agent, the best-performing model, Claude 3.5, solved only 11.34% failure cases. Our work paves the way for future research in this direction.

Details

ICLR Conference 2025 Conference Paper

RuAG: Learned-rule-augmented Generation for Large Language Models

Yudi Zhang 0006
Pei Xiao 0007
Lu Wang 0029
Chaoyun Zhang
Meng Fang
Yali Du 0001
Yevgeniy Puzyrev
Randolph Yao

In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework to automatically distill large volumes of offline data into interpretable first-order logic rules, which are injected into LLMs to boost their reasoning capabilities. Our method begins by formulating the search process relying on LLMs' commonsense, where LLMs automatically define head and body predicates. Then, we apply Monte Carlo Tree Search (MCTS) to address the combinational searching space and efficiently discover logic rules from data. The resulting logic rules are translated into natural language, allowing targeted knowledge injection and seamless integration into LLM prompts for LLM's downstream task reasoning. We evaluate our framework on public and private industrial tasks, including Natural Language Processing (NLP), time-series, decision-making, and industrial tasks, demonstrating its effectiveness in enhancing LLM's capability over diverse tasks.

Details

ICLR Conference 2025 Conference Paper

Self-Evolved Reward Learning for LLMS

Chenghua Huang
Zhizhen Fan
Lu Wang 0029
Fangkai Yang
Pu Zhao 0004
Zeqi Lin
Qingwei Lin
Dongmei Zhang 0001

Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences and is a key factor in the success of modern conversational models like GPT-4, ChatGPT, and Llama 2. A significant challenge in employing RLHF lies in training a reliable RM, which relies on high-quality labels. Typically, these labels are provided by human experts or a stronger AI, both of which can be costly and introduce bias that may affect the language model's responses. As models improve, human input may become less effective in enhancing their performance. This paper explores the potential of using the RM itself to generate additional training data for a more robust RM. Our experiments demonstrate that reinforcement learning from self-feedback outperforms baseline approaches. We conducted extensive experiments with our approach on multiple datasets, such as HH-RLHF and UltraFeedback, and models including Mistral and Llama 3, comparing it against various baselines. Our results indicate that, even with a limited amount of human-labeled data, learning from self-feedback can robustly enhance the performance of the RM, thereby improving the capabilities of large language models.

Details

NeurIPS Conference 2025 Conference Paper

SWE-bench Goes Live!

LingHao Zhang
Shilin He
Chaoyun Zhang
Yu Kang
Bowen Li
Chengxing Xie
Junhao Wang
Maoquan Wang

The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a key benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench has become the dominant benchmark in this domain, it suffers from several limitations: it has not been updated since its release, is restricted to only 12 repositories, and relies heavily on manual effort for constructing test instances and setting up executable environments, significantly limiting its scalability. We present SWE-bench-Live, a live-updatable benchmark designed to address these limitations. SWE-bench-Live currently includes 1, 890 tasks derived from real GitHub issues created since 2024, spanning 223 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. Additionally, we introduce an automated curation pipeline that streamlines the entire process from instance creation to environment setup, removing manual bottlenecks and enabling scalability and continuous updates. We evaluate a range of state-of-the-art models and agent frameworks on SWE-bench-Live, offering detailed empirical insights into their real-world bug-fixing capabilities. By providing a fresh, diverse, and executable benchmark grounded in live repository activity, SWE-bench-Live supports reliable, large-scale assessment of code LLMs and code agents in realistic development settings.

PDF Details

ICLR Conference 2025 Conference Paper

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao 0004
Jian-Guang Lou
Chongyang Tao
Xiubo Geng
Qingwei Lin

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of LLMs, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses all other open-source LLMs by a substantial margin. Furthermore, WizardMath 70B even outperforms ChatGPT-3.5, Claude Instant, Gemini Pro and Mistral Medium. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.

Details

TMLR Journal 2025 Journal Article

Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

Jiaxu Qian
Chendong Wang
Yifan Yang
Chaoyun Zhang
Huiqiang Jiang
Xufang Luo
Yu Kang
Qingwei Lin

Multimodal large language models (MLLMs) such as GPT-4o, Gemini Pro, and Claude 3.5 have enabled unified reasoning over text and visual inputs, yet they often hallucinate in real-world scenarios—especially when small objects or fine spatial context are involved. We pinpoint two core causes of this failure: the absence of region-adaptive attention and inflexible token budgets that force uniform downsampling, leading to critical information loss. To overcome these limitations, we introduce Zoomer a visual prompting framework that delivers token-efficient, detail-preserving image representations for black-box MLLMs. Zoomer integrates (1) a prompt-aware emphasis module to highlight semantically relevant regions, (2) a spatial-preserving orchestration schema to maintain object relationships, and (3) a budget-aware strategy to optimally allocate tokens between global context and local details. Extensive experiments on nine benchmarks and three commercial MLLMs demonstrate that Zoomer boosts accuracy by up to 27% while cutting image token usage by up to 67\%. Our approach establishes a principled methodology for robust, resource-aware multimodal understanding in settings where model internals are inaccessible.

PDF Details

AAAI Conference 2024 Conference Paper

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

Shangding Gu
Bilgehan Sel
Yuhao Ding
Lu Wang
Qingwei Lin
Ming Jin
Alois Knoll

Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.

PDF Details DOI

ECAI Conference 2024 Conference Paper

Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides

Kaikai An
Fangkai Yang
Junting Lu
Liqun Li
Zhixing Ren
Hao Huang
Lu Wang 0029
Pu Zhao 0004

Effective incident management is pivotal for the smooth operation of Microsoft cloud services. In order to expedite incident mitigation, service teams gather troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to On-Call Engineers (OCEs). While automated pipelines are enabled to resolve the most frequent and easy incidents, there still exist complex incidents that require OCEs’ intervention. In addition, TSGs are often unstructured and incomplete, which requires manual interpretation by OCEs, leading to on-call fatigue and decreased productivity, especially among new-hire OCEs. In this work, we propose Nissist which leverages unstructured TSGs and incident mitigation history to provide proactive incident mitigation suggestions, reducing human intervention. Leveraging Large Language Models (LLM), Nissist extracts knowledge from unstructured TSGs and incident mitigation history, forming a comprehensive knowledge base. Its multi-agent system design enhances proficiency in precisely discerning OCE intents, retrieving relevant information, and delivering systematic plans consecutively. Through our user experiments, we demonstrate that Nissist significantly reduce Time to Mitigate (TTM) in incident mitigation, alleviating operational burdens on OCEs and improving service reliability. Our webpage is available at https: //aka. ms/nissist.

Details

UAI Conference 2024 Conference Paper

SMuCo: Reinforcement Learning for Visual Control via Sequential Multi-view Total Correlation

Tong Cheng
Hang Dong 0004
Lu Wang 0029
Bo Qiao 0001
Qingwei Lin
Saravan Rajmohan
Thomas Moscibroda

The advent of abundant image data has catalyzed the advancement of visual control in reinforcement learning (RL) systems, leveraging multiple view- points to capture the same physical states, which could enhance control performance theoretically. However, integrating multi-view data into representation learning remains challenging. In this paper, we introduce SMuCo, an innovative multi-view reinforcement learning algorithm that constructs robust latent representations by optimizing multi- view sequential total correlation. This technique effectively captures task-relevant information and temporal dynamics while filtering out irrelevant data. Our method supports an unlimited number of views and demonstrates superior performance over leading model-free and model-based RL algorithms. Empirical results from the DeepMind Control Suite and the Sapien Basic Manipulation Task confirm SMuCo’s enhanced efficacy, significantly improving task performance across diverse scenarios and views.

Details

NeurIPS Conference 2024 Conference Paper

WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena

Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Qingwei Lin
Jianguang Lou
Shifeng Chen
Yansong Tang

Recent work demonstrates that, post-training large language models with open-domain instruction following data have achieved colossal success. Simultaneously, human Chatbot Arena has emerged as one of the most reasonable benchmarks for model evaluation and developmental guidance. However, the processes of manually curating high-quality training data and utilizing online human evaluation platforms are both expensive and limited. To mitigate the manual and temporal costs associated with post-training, this paper introduces a Simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. For evaluation scenario, WizardArena can efficiently predict accurate performance rankings among different models based on offline test set. For training scenario, we simulate arena battles among various state-of-the-art models on a large scale of instruction data, subsequently leveraging the battle results to constantly enhance target model in both the supervised fine-tuning and reinforcement learning. Experimental results demonstrate that our WizardArena aligns closely with the online human arena rankings, and our models trained on offline extensive battle data exhibit significant performance improvements during SFT, DPO, and PPO stages.

PDF Details DOI

ICLR Conference 2024 Conference Paper

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Ziyang Luo
Can Xu
Pu Zhao 0004
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma 0004

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated remarkable performance in various code-related tasks. However, different from their counterparts in the general language modeling field, the technique of instruction fine-tuning remains relatively under-researched in this domain. In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code, enhancing Code LLMs to create novel models, WizardCoder. Through comprehensive experiments on five prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, DS-1000, and MultiPL-E, our models showcase outstanding performance. They consistently outperform all other open-source Code LLMs by a significant margin. Remarkably, WizardCoder 15B even surpasses the well-known closed-source LLMs, including Anthropic's Claude and Google's Bard, on the HumanEval and HumanEval+ benchmarks. Additionally, WizardCoder 34B not only achieves a HumanEval score comparable to GPT3.5 (ChatGPT) but also surpasses it on the HumanEval+ benchmark. Furthermore, our preliminary exploration highlights the pivotal role of instruction complexity in achieving exceptional coding performance.

Details

ICLR Conference 2024 Conference Paper

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

Can Xu
Qingfeng Sun
Kai Zheng 0021
Xiubo Geng
Pu Zhao 0004
Jiazhan Feng
Chongyang Tao
Qingwei Lin

Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Both automatic and human evaluations consistently indicate that WizardLM outperforms baselines such as Alpaca (trained from Self-Instruct) and Vicuna (trained from human-created instructions). The experimental results demonstrate that the quality of instruction-following dataset crafted by Evol-Instruct can significantly improve the performance of LLMs.

Details

NeurIPS Conference 2023 Conference Paper

Conservative State Value Estimation for Offline Reinforcement Learning

Liting Chen
Jie Yan
Zhengdao Shao
Lu Wang
Qingwei Lin
Saravanakumar Rajmohan
Thomas Moscibroda
Dongmei Zhang

Offline reinforcement learning faces a significant challenge of value over-estimation due to the distributional drift between the dataset and the current learned policy, leading to learning failure in practice. The common approach is to incorporate a penalty term to reward or value estimation in the Bellman iterations. Meanwhile, to avoid extrapolation on out-of-distribution (OOD) states and actions, existing methods focus on conservative Q-function estimation. In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns conservative V-function via directly imposing penalty on OOD states. Compared to prior work, CSVE allows more effective state value estimation with conservative guarantees and further better policy optimization. Further, we apply CSVE and develop a practical actor-critic algorithm in which the critic does the conservative value estimation by additionally sampling and penalizing the states around the dataset, and the actor applies advantage weighted updates extended with state exploration to improve the policy. We evaluate in classic continual control tasks of D4RL, showing that our method performs better than the conservative Q-function learning methods and is strongly competitive among recent SOTA methods.

PDF Details

IJCAI Conference 2023 Conference Paper

PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem

Yiyuan Wang
Chenghou Jin
Shaowei Cai
Qingwei Lin

The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. In the last decade, despite being a theoretical hard problem, researchers design various algorithms for solving SIP. In this work, we propose three main heuristics and develop an improved exact algorithm for SIP. First, we design a probing search procedure to try whether the search procedure can successfully obtain a solution at first sight. Second, we design a novel matching ordering as a value-ordering heuristic, which uses some useful information obtained from the probing search procedure to preferentially select some promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of SIP and present an adaptive propagation method to make a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for the SIP.

PDF Details DOI

ICLR Conference 2023 Conference Paper

Towards Lightweight, Model-Agnostic and Diversity-Aware Active Anomaly Detection

Xu Zhang 0024
Yuan Zhao 0014
Ziang Cui
Liqun Li
Shilin He
Qingwei Lin
Yingnong Dang
Saravan Rajmohan

Active Anomaly Discovery (AAD) is flourishing in the anomaly detection research area, which aims to incorporate analysts’ feedback into unsupervised anomaly detectors. However, existing AAD approaches usually prioritize the samples with the highest anomaly scores for user labeling, which hinders the exploration of anomalies that were initially ranked lower. Besides, most existing AAD approaches are specially tailored for a certain unsupervised detector, making it difficult to extend to other detection models. To tackle these problems, we propose a lightweight, model-agnostic and diversity-aware AAD method, named LMADA. In LMADA, we design a diversity-aware sample selector powered by Determinantal Point Process (DPP). It considers the diversity of samples in addition to their anomaly scores for feedback querying. Furthermore, we propose a model-agnostic tuner. It approximates diverse unsupervised detectors with a unified proxy model, based on which the feedback information is incorporated by a lightweight non-linear representation adjuster. Through extensive experiments on 8 public datasets, LMADA achieved 74% F1-Score improvement on average, outperforming other comparative AAD approaches. Besides, LMADA can also achieve significant performance boosting under any unsupervised detectors.

Details

ICLR Conference 2022 Conference Paper

Automatic Loss Function Search for Predict-Then-Optimize Problems with Strong Ranking Property

Boshi Wang
Jialin Yi
Hang Dong 0004
Bo Qiao 0001
Chuan Luo 0002
Qingwei Lin

Combinatorial optimization problems with parameters to be predicted from side information are commonly seen in a variety of problems during the paradigm shift from reactive decision making to proactive decision making. Due to the misalignment between the continuous prediction results and the discrete decisions in optimization problems, it is hard to achieve a satisfactory prediction result with the ordinary $l_2$ loss in the prediction phase. To properly connect the prediction loss with the optimization goal, in this paper we propose a total group preorder (TGP) loss and its differential version called approximated total group preorder (ATGP) loss for predict-then-optimize (PTO) problems with strong ranking property. These new losses are provably more robust than the usual $l_2$ loss in a linear regression setting and have great potential to extend to other settings. We also propose an automatic searching algorithm that adapts the ATGP loss to PTO problems with different combinatorial structures. Extensive experiments on the ranking problem, the knapsack problem, and the shortest path problem have demonstrated that our proposed method can achieve a significant performance compared to the other methods designed for PTO problems.

Details

IJCAI Conference 2022 Conference Paper

T-SMOTE: Temporal-oriented Synthetic Minority Oversampling Technique for Imbalanced Time Series Classification

Pu Zhao
Chuan Luo
Bo Qiao
Lu Wang
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang

Time series classification is a popular and important topic in machine learning, and it suffers from the class imbalance problem in many real-world applications. In this paper, to address the class imbalance problem, we propose a novel and practical oversampling method named T-SMOTE, which can make full use of the temporal information of time-series data. In particular, for each sample of minority class, T-SMOTE generates multiple samples that are close to class border. Then, based on those samples near class border, T-SMOTE synthesizes more samples. Finally, a weighted sampling method is called on both generated samples near class border and synthetic samples. Extensive experiments on a diverse set of both univariate and multivariate time-series datasets demonstrate that T-SMOTE consistently outperforms the current state-of-the-art methods on imbalanced time series classification. More encouragingly, our empirical evaluations show that T-SMOTE performs better in the scenario of early prediction, an important application scenario in industry, which indicates that T-SMOTE could bring benefits in practice.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

A Runtime Analysis of Typical Decomposition Approaches in MOEA/D Framework for Many-objective Optimization Problems

Zhengxin Huang
Yuren Zhou
Chuan Luo
Qingwei Lin

Decomposition approach is an important component in multi-objective evolutionary algorithm based on decomposition (MOEA/D), which is a popular method for handing many-objective optimization problems (MaOPs). This paper presents a theoretical analysis on the convergence ability of using the typical weighted sum (WS), Tchebycheff (TCH) or penalty-based boundary intersection (PBI) approach in a basic MOEA/D for solving two benchmark MaOPs. The results show that using WS, the algorithm can always find an optimal solution for any subproblem in polynomial expected runtime. In contrast, the algorithm needs at least exponential expected runtime for some subproblems if using TCH or PBI. Moreover, our analyses discover an obvious shortcoming of using WS, that is, the optimal solutions of different subproblems are easily corresponding to the same solution. In addition, this analysis indicates that if using PBI, a small value of the penalty parameter is a good choice for faster converging to the Pareto front, but it may lose the diversity. This study reveals some optimization behaviors of using three typical decomposition approaches in the well-known MOEA/D framework for solving MaOPs.

PDF Details DOI

NeurIPS Conference 2021 Conference Paper

A Surrogate Objective Framework for Prediction+Programming with Soft Constraints

Kai Yan
Jie Yan
Chuan Luo
Liting Chen
Qingwei Lin
Dongmei Zhang

Prediction+optimization is a common real-world paradigm where we have to predict problem parameters before solving the optimization problem. However, the criteria by which the prediction model is trained are often inconsistent with the goal of the downstream optimization problem. Recently, decision-focused prediction approaches, such as SPO+ and direct optimization, have been proposed to fill this gap. However, they cannot directly handle the soft constraints with the max operator required in many real-world objectives. This paper proposes a novel analytically differentiable surrogate objective framework for real-world linear and semi-definite negative quadratic programming problems with soft linear and non-negative hard constraints. This framework gives the theoretical bounds on constraints’ multipliers, and derives the closed-form solution with respect to predictive parameters and thus gradients for any variable in the problem. We evaluate our method in three applications extended with soft constraints: synthetic linear programming, portfolio optimization, and resource provisioning, demonstrating that our method outperforms traditional two-staged methods and other decision-focused approaches

PDF Details

AAAI Conference 2021 Conference Paper

Correlation-Aware Heuristic Search for Intelligent Virtual Machine Provisioning in Cloud Systems

Chuan Luo
Bo Qiao
Wenqian Xing
Xin Chen
Pu Zhao
Chao Du
Randolph Yao
Hongyu Zhang

The optimization of resource is crucial for the operation of public cloud systems such as Microsoft Azure, as well as servers dedicated to the workloads of large customers such as Microsoft 365. Those optimization tasks often need to take unknown parameters into consideration and can be formulated as Prediction+Optimization problems. This paper proposes a new Prediction+Optimization method named Correlation-Aware Heuristic Search (CAHS) that is capable of accounting for the uncertainty in unknown parameters and delivering effective solutions to difficult optimization problems. We apply this method to solving the predictive virtual machine (VM) provisioning (PreVMP) problem, where the VM provisioning plans are optimized based on the predicted demands of different VM types, to ensure rapid provisions upon customers’ requests and to pursue high resource utilization. Unlike the current state-of-the-art PreVMP approaches that assume independence among the demands for different VM types, CAHS incorporates demand correlation when conducting prediction and optimization in a novel and effective way. Our experiments on two public benchmarks and one industrial benchmark demonstrate that CAHS can achieve better performance than its nine state-of-the-art competitors. CAHS has been successfully deployed in Microsoft Azure and significantly improved its performance. The main ideas of CAHS have also been leveraged to improve the efficiency and the reliability of the cloud services provided by Microsoft 365.

PDF Details

AAAI Conference 2021 Conference Paper

NuQClq: An Effective Local Search Algorithm for Maximum Quasi-Clique Problem

Jiejiang Chen
Shaowei Cai
Shiwei Pan
Yiyuan Wang
Qingwei Lin
Mengyu Zhao
Minghao Yin

The maximum quasi-clique problem (MQCP) is an important extension of maximum clique problem with wide applications. Recent heuristic MQCP algorithms can hardly solve large and hard graphs effectively. This paper develops an efficient local search algorithm named NuQClq for the MQCP, which has two main ideas. First, we propose a novel vertex selection strategy, which utilizes cumulative saturation information to be a selection criterion when the candidate vertices have equal values on the primary scoring function. Second, a variant of configuration checking named BoundedCC is designed by setting an upper bound for the threshold of forbidding strength. When the threshold value of vertex exceeds the upper bound, we reset its threshold value to increase the diversity of search process. Experiments on a broad range of classic benchmarks and sparse instances show that NuQ- Clq significantly outperforms the state-of-the-art MQCP algorithms for most instances.

PDF Details

IJCAI Conference 2021 Conference Paper

Predictive Job Scheduling under Uncertain Constraints in Cloud Computing

Hang Dong
Boshi Wang
Bo Qiao
Wenqian Xing
Chuan Luo
Si Qin
Qingwei Lin
Dongmei Zhang

Capacity management has always been a great challenge for cloud platforms due to massive, heterogeneous on-demand instances running at different times. To better plan the capacity for the whole platform, a class of cloud computing instances have been released to collect computing demands beforehand. To use such instances, users are allowed to submit jobs to run for a pre-specified uninterrupted duration in a flexible range of time in the future with a discount compared to the normal on-demand instances. Proactively scheduling those pre-collected job requests considering the capacity status over the platform can greatly help balance the computing workloads along time. In this work, we formulate the scheduling problem for these pre-collected job requests under uncertain available capacity as a Prediction + Optimization problem with uncertainty in constraints, and propose an effective algorithm called Controlling under Uncertain Constraints (CUC), where the predicted capacity guides the optimization of job scheduling and job scheduling results are leveraged to improve the prediction of capacity through Bayesian optimization. The proposed formulation and solution are commonly applicable for proactively scheduling problems in cloud computing. Our extensive experiments on three public, industrial datasets shows that CUC has great potential for supporting high reliability in cloud platforms.

PDF Details DOI

AAAI Conference 2021 Conference Paper

PULNS: Positive-Unlabeled Learning with Effective Negative Sample Selector

Chuan Luo
Pu Zhao
Chen Chen
Bo Qiao
Chao Du
Hongyu Zhang
Wei Wu
Shaowei Cai

Positive-unlabeled learning (PU learning) is an important case of binary classification where the training data only contains positive and unlabeled samples. The current stateof-the-art approach for PU learning is the cost-sensitive approach, which casts PU learning as a cost-sensitive classification problem and relies on unbiased risk estimator for correcting the bias introduced by the unlabeled samples. However, this approach requires the knowledge of class prior and is subject to the potential label noise. In this paper, we propose a novel PU learning approach dubbed PULNS, equipped with an effective negative sample selector, which is optimized by reinforcement learning. Our PULNS approach employs an effective negative sample selector as the agent responsible for selecting negative samples from the unlabeled data. While the selected, likely negative samples can be used to improve the classifier, the performance of classifier is also used as the reward to improve the selector through the REINFORCE algorithm. By alternating the updates of the selector and the classifier, the performance of both is improved. Extensive experimental studies on 7 real-world application benchmarks demonstrate that PULNS consistently outperforms the current state-of-the-art methods in PU learning, and our experimental results also confirm the effectiveness of the negative sample selector underlying PULNS.

PDF Details

IJCAI Conference 2020 Conference Paper

Intelligent Virtual Machine Provisioning in Cloud Computing

Chuan Luo
Bo Qiao
Xin Chen
Pu Zhao
Randolph Yao
Hongyu Zhang
Wei Wu
Andrew Zhou

Virtual machine (VM) provisioning is a common and critical problem in cloud computing. In industrial cloud platforms, there are a huge number of VMs provisioned per day. Due to the complexity and resource constraints, it needs to be carefully optimized to make cloud platforms effectively utilize the resources. Moreover, in practice, provisioning a VM from scratch requires fairly long time, which would degrade the customer experience. Hence, it is advisable to provision VMs ahead for upcoming demands. In this work, we formulate the practical scenario as the predictive VM provisioning (PreVMP) problem, where upcoming demands are unknown and need to be predicted in advance, and then the VM provisioning plan is optimized based on the predicted demands. Further, we propose Uncertainty-Aware Heuristic Search (UAHS) for solving the PreVMP problem. UAHS first models the prediction uncertainty, and then utilizes the prediction uncertainty in optimization. Moreover, UAHS leverages Bayesian optimization to interact prediction and optimization to improve its practical performance. Extensive experiments show that UAHS performs much better than state-of-the-art competitors on two public datasets and an industrial dataset. UAHS has been successfully applied in Microsoft Azure and brought practical benefits in real-world applications.

PDF Details DOI

IJCAI Conference 2020 Conference Paper

Two-goal Local Search and Inference Rules for Minimum Dominating Set

Shaowei Cai
Wenying Hou
Yiyuan Wang
Chuan Luo
Qingwei Lin

Minimum dominating set (MinDS) is a canonical NP-hard combinatorial optimization problem with applications. For large and hard instances one must resort to heuristic approaches to obtain good solutions within reasonable time. This paper develops an efficient local search algorithm for MinDS, which has two main ideas. The first one is a novel local search framework, while the second is a construction procedure with inference rules. Our algorithm named FastDS is evaluated on 4 standard benchmarks and 3 massive graphs benchmarks. FastDS obtains the best performance for almost all benchmarks, and obtains better solutions than state-of-the-art algorithms on massive graphs.

PDF Details DOI

IJCAI Conference 2019 Conference Paper

Local Search with Efficient Automatic Configuration for Minimum Vertex Cover

Chuan Luo
Holger H. Hoos
Shaowei Cai
Qingwei Lin
Hongyu Zhang
Dongmei Zhang

Minimum vertex cover (MinVC) is a prominent NP-hard problem in artificial intelligence, with considerable importance in applications. Local search solvers define the state of the art in solving MinVC. However, there is no single MinVC solver that works best across all types of MinVC instances, and finding the most suitable solver for a given application poses considerable challenges. In this work, we present a new local search framework for MinVC called MetaVC, which is highly parametric and incorporates many effective local search techniques. Using an automatic algorithm configurator, the performance of MetaVC can be optimized for particular types of MinVC instances. Through extensive experiments, we demonstrate that MetaVC significantly outperforms previous solvers on medium-size hard MinVC instances, and shows competitive performance on large MinVC instances. We further introduce a neural-network-based approach for enhancing the automatic configuration process, by identifying and terminating unpromising configuration runs. Our results demonstrate that MetaVC, when automatically configured using this method, can achieve improvements in the best known solutions for 16 large MinVC instances.

PDF Details