Arrow Research search

Author name cluster

Cheng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

SMIDT: High-Performance Inference Framework for MoE Models with Dynamic Top-K Routing

  • Zewen Jin
  • Shen Fu
  • Chengjie Tang
  • Youhui Bai
  • Shengnan Wang
  • Jiaan Zhu
  • Chizheng Fang
  • Ping Gong

To accelerate Mixture-of-Experts (MoE) inference, the hybrid parallelism paradigm first applies pipeline parallelism (PP) to divide the model vertically into stages, then divides each stage horizontally using tensor or expert parallelism. On the algorithm side, dynamic Top-K routing reduces computation by activating fewer experts per token on average. In this paper, we explore the application of dynamic Top-K routing to PP-enabled MoE inference, aiming to fully unleash their combined potential. We identify key performance bottlenecks arising from Top-K value variation across layers, which conflicts with PP's typically uniform stage partitioning, as well as opportunities to optimize memory usage through their integration. To address these challenges, we present SMIDT, an efficient MoE inference framework tailored for dynamic Top-K routing. SMIDT features: (1) an adaptive, module-level uneven partitioning strategy to balance computation across PP stages, (2) a memory-aware expert replication scheme (DPMoE) that reduces communication overhead, and (3) a lightweight search algorithm combining binary search and dynamic programming to generate efficient parallelism plans. We implement SMIDT on SGLang, a state-of-the-art LLM inference framework, evaluate it on 32 A40 GPUs and 16 A100 GPUs, and compare it with manually tuned parallelism strategies. Experimental results show that, when co-locating the prefill and decoding phases, SMIDT achieves 1.20–3.13x throughput improvements for prefill-only tasks and 1.05–1.89x for prefill-decoding tasks. When disaggregating prefill and decoding tasks, SMIDT improves average and P99 time-to-first-token (TTFT) by 1.10–1.17x and 1.21–1.26x, respectively.
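
As a rough illustration of the dynamic Top-K idea above, a router can activate only as many experts as a token's confidence requires, so the effective K varies per token and per layer. The cumulative-probability threshold rule and all names below are assumptions for this sketch, not SMIDT's actual routing logic:

```python
import math

def dynamic_topk_route(logits, threshold=0.7, max_k=4):
    """Pick the smallest set of experts whose cumulative softmax
    probability reaches `threshold`, capped at `max_k` experts.
    Hypothetical rule for illustration, not SMIDT's actual router."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen, cum = [], 0.0
    for idx in ranked:
        chosen.append(idx)
        cum += probs[idx]
        if cum >= threshold or len(chosen) == max_k:
            break
    return chosen  # expert indices; length varies per token

# A confident token activates one expert; an uncertain token several.
print(dynamic_topk_route([8.0, 0.1, 0.0, -1.0]))      # [0]
print(len(dynamic_topk_route([0.0, 0.0, 0.0, 0.0])))  # 3
```

This per-token, per-layer variation in the number of active experts is exactly what makes uniform PP stage partitioning suboptimal, motivating SMIDT's uneven partitioning.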

JBHI Journal 2025 Journal Article

Accurate and Real-Time Hierarchical Ensemble Network for Activity Classification in Construction Worker

  • Guoyu Zuo
  • Qifei Wu
  • Wenbin Gao
  • Cheng Li
  • Liangkun Sun
  • Shuangyue Yu

Accurate and real-time locomotion classification is crucial for exoskeletons to assist construction workers in completing multiple tasks. However, state-of-the-art algorithms for classifying multiple activities face multifaceted challenges in both accuracy and real-time capability. In addition, advanced studies typically provide a single solution based on a certain sensor combination, which may not transfer to different assistive devices (e.g., an algorithm using feet IMUs is not suited for bilateral portable hip exoskeletons or unilateral knee exoskeletons), limiting its practicality and applicability in diverse applications. To fill these two gaps, first, we developed a novel hierarchical ensemble network framework that can accurately classify 11 typical lower-limb activities of construction workers in real time. Second, building upon this hierarchical ensemble network framework, we developed 6 configurations with IMU sensors worn on different body segments, which can potentially serve different wearable devices. Experimental results with leave-one-out cross-validation on 10 able-bodied subjects validated the effectiveness of the proposed algorithm. Compared to the baseline ANN-based algorithm, our algorithm under the 6 configurations improved accuracy, precision, recall, and F1-score by an average of 4.97%, 3.40%, 4.97%, and 5.31%, respectively, and reduced the number of parameters and inference time by 71.86% and 47.85%, respectively. This study showcases multiple solutions with different wearable sensor configurations, offering high accuracy and strong real-time performance for classifying multiple activities, which can be deployed to controllers for multiple types of assistive devices targeting construction workers.

ICRA Conference 2025 Conference Paper

AVD2: Accident Video Diffusion for Accident Video Description

  • Cheng Li
  • Keyuan Zhou
  • Tong Liu
  • Yu Wang
  • Mingqiao Zhuang
  • Huan-ang Gao
  • Bu Jin
  • Hao Zhao 0002

Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating accident videos aligned with detailed natural language descriptions and reasoning, resulting in the contributed EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset. Empirical results reveal that the integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations, markedly advancing the domains of accident analysis and prevention. Project resources are available at https://an-answer-tree.github.io

AAAI Conference 2025 Conference Paper

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference

  • Zewen Jin
  • Shengnan Wang
  • Jiaan Zhu
  • Hongrui Zhan
  • Youhui Bai
  • Lin Zhang
  • Zhenyu Ming
  • Cheng Li

The Mixture-of-Experts (MoE) structure scales Transformer-based large language models (LLMs) and improves their performance with only a sub-linear increase in computation resources. Recently, the fine-grained DeepSeekMoE structure was proposed, which can further improve the computing efficiency of MoE without performance degradation. However, the All-to-All communication introduced by MoE has become a bottleneck, especially for the fine-grained structure, which typically involves and activates more experts, hence contributing to heavier communication overhead. In this paper, we propose a novel MoE structure named BigMac, which is also fine-grained but communication-efficient. The key innovation of BigMac is that it abandons the Communicate-Descend-Ascend-Communicate (CDAC) manner used by fine-grained MoE, in which the All-to-All communication always takes place at the highest dimension. Instead, BigMac adopts an efficient Descend-Communicate-Communicate-Ascend (DCCA) manner. Specifically, we add a descending and an ascending projection at the entrance and exit of the expert, respectively, which enables the communication to be performed at a very low dimension. Furthermore, to adapt to DCCA, we re-design the structure of the small experts, ensuring that each expert in BigMac has enough capacity to process tokens. Experimental results show that BigMac achieves comparable or even better model quality than fine-grained MoEs with the same number of experts and a similar number of total parameters. Equally importantly, BigMac reduces end-to-end latency by up to 3.09x for training and increases throughput by up to 3.11x for inference on state-of-the-art AI computing frameworks, including Megatron, Tutel, and DeepSpeed-Inference.
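
The benefit of DCCA over CDAC can be seen with a back-of-the-envelope communication count: performing both All-to-Alls after the descending projection shrinks the payload by roughly the ratio of the model dimension to the expert's low dimension. The dimensions and token counts below are illustrative, not BigMac's actual configuration:

```python
def all_to_all_bytes(n_tokens, dim, bytes_per_elem=2):
    """Payload of one All-to-All dispatch in fp16, ignoring headers."""
    return n_tokens * dim * bytes_per_elem

# CDAC: dispatch and combine both happen at the full model dimension.
# DCCA: descend to a low expert dimension first, then communicate.
d_model, d_low, n_tokens = 4096, 512, 8192   # hypothetical sizes
cdac = 2 * all_to_all_bytes(n_tokens, d_model)  # two A2As at d_model
dcca = 2 * all_to_all_bytes(n_tokens, d_low)    # two A2As at d_low
print(f"communication reduced by {cdac / dcca:.0f}x")  # 8x here
```

The reduction factor is simply d_model / d_low, which is why performing the projections before communication pays off as the fine-grained structure activates more experts.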

AAAI Conference 2025 Conference Paper

De-singularity Subgradient for the q-th-Powered lₚ-Norm Weber Location Problem

  • Zhao-Rong Lai
  • Xiaotian Wu
  • Liangda Fang
  • Ziliang Chen
  • Cheng Li

The Weber location problem is widely used in several artificial intelligence scenarios. However, the gradient of the objective does not exist at a considerable set of singular points. Recently, a de-singularity subgradient method was proposed to fix this problem, but it can only handle the q-th-powered l_2-norm case (1 ≤ q < 2), which has only finitely many singular points. In this paper, we further establish the de-singularity subgradient for the q-th-powered l_p-norm case with 1 ≤ q ≤ p and 1 ≤ p < 2, which covers all the remaining unsolved situations of this problem. This is a challenging task because the singular set is a continuum, and the complicated geometry of the objective function makes the characterizations of the subgradients, the minimum, and the descent directions difficult. We develop a q-th-powered l_p-norm Weiszfeld Algorithm without Singularity (qPpNWAWS) for this problem, which ensures convergence and the descent property of the objective function. Extensive experiments on six real-world data sets demonstrate that qPpNWAWS successfully solves the singularity problem and achieves a linear computational convergence rate in practical scenarios.
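
For intuition, the classical Weiszfeld iteration below solves the base (q = 1, p = 2) Weber case, where the singularity is an iterate landing exactly on a data point; the paper's qPpNWAWS generalizes this to q-th-powered l_p norms with proper de-singularity subgradients. A minimal sketch (the early return on hitting a datum is only a crude stand-in for that machinery):

```python
def weiszfeld(points, iters=200, eps=1e-9):
    """Classical Weiszfeld iteration for the (q=1, p=2) Weber point:
    minimize sum_i ||x - a_i||_2 over x, starting from the centroid."""
    x = [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]
    for _ in range(iters):
        num = [0.0] * len(x)
        den = 0.0
        for p in points:
            dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p)) ** 0.5
            if dist < eps:
                # Singular point: iterate sits on a datum, where the
                # gradient does not exist (the de-singularity subgradient
                # handles this case rigorously; we just stop here).
                return list(p)
            w = 1.0 / dist
            den += w
            num = [ni + w * pi for ni, pi in zip(num, p)]
        x = [ni / den for ni in num]
    return x

# Four symmetric points: the Weber point is the center of the square.
center = weiszfeld([(0, 0), (2, 0), (0, 2), (2, 2)])
print([round(c, 6) for c in center])  # [1.0, 1.0]
```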

JBHI Journal 2025 Journal Article

Medical Vision-Language Modeling With Semantic Interaction and Adaptive Refinement Prompting for Bias Mitigation

  • Cheng Li
  • Weijian Huang
  • Hao Yang
  • Jiarun Liu
  • Yong Liang
  • Shanshan Wang

Vision-Language Models (VLMs) have demonstrated impressive capabilities across various medical tasks, including report generation and visual question answering (VQA). However, pixel-level tasks such as image segmentation remain relatively underexplored, despite their critical importance for clinical decision-making, surgical planning, and model interpretability. Moreover, the scarcity of high-quality segmentation annotations in the medical domain often leads to biased data distributions, characterized by imbalances in disease types, anatomical coverage, and image quality. These biases are frequently overlooked during both model development and evaluation, limiting the robustness and real-world applicability of VLMs in healthcare scenarios. In this study, we propose a unified medical vision-language model applicable to a variety of clinical tasks, including report generation, VQA, and pixel-level image segmentation. Within the model, we propose a semantic interaction mechanism aimed at enhancing pixel-level vision and language representation learning. To mitigate the impact of biased data distributions, we explicitly develop an adaptive refinement prompting method involving the iterative re-prompting of hard samples. The proposed method is thoroughly validated through experiments on eight datasets and comparisons with nine state-of-the-art methods. The experimental results indicate that our model achieves superior performance in both medical VQA and segmentation tasks. These results highlight the potential of our approach in advancing the deployment of medical VLMs in real-world clinical applications. Code will be released at: https://github.com/SZUHvern/Unified-Medical-Vision-Language-Modeling

NeurIPS Conference 2025 Conference Paper

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

  • Yifei Liu
  • Li Lyna Zhang
  • Yi Zhu
  • Bingcheng Dong
  • Xudong Zhou
  • Ning Shang
  • Fan Yang
  • Cheng Li

Advancing code reasoning in large language models (LLMs) is fundamentally limited by the scarcity of high-difficulty datasets, especially those with verifiable input-output test cases necessary for rigorous solution validation at scale. We introduce rStar-Coder, which significantly improves LLM code reasoning capabilities by constructing a large-scale, verified dataset of 418K competition-level code problems and 580K long-reasoning solutions, along with rich test cases of varying difficulty. This is achieved through three core contributions: (1) we curate competitive programming code problems and solutions to synthesize new, solvable problems; (2) we introduce a reliable input-output test case synthesis pipeline that decouples the generation into a three-step input generation method and a mutual verification mechanism for effective output labeling; (3) we augment problems with high-quality, test-case-verified long-reasoning solutions. Extensive experiments on Qwen models (1.5B–14B) across various code reasoning benchmarks demonstrate the superiority of the rStar-Coder dataset, achieving leading performance comparable to frontier reasoning LLMs with significantly smaller model sizes. On LiveCodeBench, rStar-Coder improves Qwen2.5-7B from 17.4% to an impressive 57.3%, and Qwen2.5-14B from 23.3% to 62.5%, surpassing o3-mini (low) by 3.1%. On the more challenging USA Computing Olympiad, our 7B model achieves an average pass@1 accuracy of 16.15%, outperforming the frontier-level QWQ-32B. The rStar-Coder dataset is publicly available at https://huggingface.co/datasets/microsoft/rStar-Coder.
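
The mutual-verification step in contribution (2) can be pictured as a voting scheme: several independently generated solutions are run on a synthesized input, and the input is kept as a labeled test case only when a clear majority agrees on the output. A simplified sketch with hypothetical toy solutions (the paper's pipeline is considerably more elaborate):

```python
from collections import Counter

def mutual_verify(solutions, test_input, quorum=0.6):
    """Label `test_input` with the majority output of several independently
    written solutions; discard the case if agreement is below `quorum`.
    A simplified sketch of the mutual-verification idea."""
    outputs = [sol(test_input) for sol in solutions]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes / len(outputs) >= quorum:
        return winner   # trusted output label
    return None         # ambiguous: drop this test case

# Three hypothetical solutions to "sum of digits"; one is buggy.
sols = [
    lambda s: sum(int(c) for c in s),
    lambda s: sum(map(int, s)),
    lambda s: len(s),   # buggy solution gets outvoted
]
print(mutual_verify(sols, "1234"))  # 10
```

Raising the quorum trades dataset size for label reliability: with `quorum=0.8` the 2-of-3 agreement above would no longer suffice and the case would be dropped.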

AAAI Conference 2025 Conference Paper

World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving

  • Mingliang Zhai
  • Cheng Li
  • Zengyuan Guo
  • Ningrui Yang
  • Xiameng Qin
  • Sanyuan Zhao
  • Junyu Han
  • Ji Tao

The Multi-modal Large Language Models (MLLMs) with extensive world knowledge have revitalized autonomous driving, particularly in reasoning tasks within perceivable regions. However, when faced with perception-limited areas (dynamic or static occlusion regions), MLLMs struggle to effectively integrate perception ability with world knowledge for reasoning. These perception-limited regions can conceal crucial safety information, especially for vulnerable road users. In this paper, we propose a framework that aims to improve autonomous driving performance under perception-limited conditions by enhancing the integration of perception capabilities and world knowledge. Specifically, we propose a plug-and-play instruction-guided interaction module that bridges modality gaps and significantly reduces the input sequence length, allowing it to adapt effectively to multi-view video inputs. Furthermore, to better integrate world knowledge with driving-related tasks, we have collected and refined a large-scale multi-modal dataset that includes 2 million natural language QA pairs and 1.7 million grounding-task samples. To evaluate the model's utilization of world knowledge, we introduce an object-level risk assessment dataset comprising 200K QA pairs, where the questions necessitate multi-step reasoning leveraging world knowledge for resolution. Extensive experiments validate the effectiveness of our proposed method.

NeurIPS Conference 2024 Conference Paper

A Globally Optimal Portfolio for m-Sparse Sharpe Ratio Maximization

  • Yizun Lin
  • Zhao-Rong Lai
  • Cheng Li

The Sharpe ratio is an important and widely used risk-adjusted return measure in financial engineering. In modern portfolio management, one may require an m-sparse (no more than m active assets) portfolio to save managerial and financial costs. However, few existing methods can optimize the Sharpe ratio with the m-sparse constraint, due to the nonconvexity and the complexity of this constraint. We propose to convert the m-sparse fractional optimization problem into an equivalent m-sparse quadratic programming problem. The semi-algebraic property of the resulting objective function allows us to exploit the Kurdyka-Lojasiewicz property to develop an efficient Proximal Gradient Algorithm (PGA) that leads to a portfolio which achieves the globally optimal m-sparse Sharpe ratio under certain conditions. The convergence rates of PGA are also provided. To the best of our knowledge, this is the first proposal that achieves a globally optimal m-sparse Sharpe ratio with a theoretically sound guarantee.
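
What makes the nonconvex m-sparse constraint tractable inside a Proximal Gradient Algorithm is that its proximal operator is simple hard thresholding: keep the m largest-magnitude entries and zero the rest. A toy sketch of that projection and one PGA step (the gradient values are illustrative; the paper applies this to the equivalent quadratic program, not to a raw Sharpe objective):

```python
def project_m_sparse(w, m):
    """Euclidean projection onto the m-sparse set: keep the m entries
    of largest magnitude, zero the rest."""
    keep = set(sorted(range(len(w)), key=lambda i: -abs(w[i]))[:m])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]

def pga_step(w, grad, lr, m):
    """One proximal-gradient step: gradient descent, then m-sparse
    projection (toy sketch of the paper's PGA iteration)."""
    return project_m_sparse([wi - lr * gi for wi, gi in zip(w, grad)], m)

print(project_m_sparse([0.5, -0.1, 0.9, 0.05], m=2))  # [0.5, 0.0, 0.9, 0.0]
```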

NeurIPS Conference 2024 Conference Paper

CultureLLM: Incorporating Cultural Differences into Large Language Models

  • Cheng Li
  • Mengzhuo Chen
  • Jindong Wang
  • Sunayana Sitaram
  • Xing Xie

Large language models (LLMs) have been observed to exhibit bias towards certain cultures due to the predominance of training data obtained from English corpora. Considering that multilingual cultural data are often expensive to procure, existing methodologies address this challenge through prompt engineering or culture-specific pre-training. However, these strategies may neglect the knowledge deficiency of low-resource cultures and necessitate substantial computing resources. In this paper, we propose CultureLLM, a cost-effective solution to integrate cultural differences into LLMs. CultureLLM employs the World Values Survey (WVS) as seed data and generates semantically equivalent training data through the proposed semantic data augmentation. Using only 50 seed samples from WVS with augmented data, we fine-tune culture-specific LLMs as well as a unified model (CultureLLM-One) for 9 cultures, encompassing both rich- and low-resource languages. Extensive experiments conducted on 60 culture-related datasets reveal that CultureLLM significantly surpasses various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%), demonstrating performance comparable to or exceeding that of GPT-4. Our human study indicates that the generated samples maintain semantic equivalence to the original samples, offering an effective solution for LLM augmentation. Code is released at https://github.com/Scarelette/CultureLLM.

NeurIPS Conference 2024 Conference Paper

CulturePark: Boosting Cross-cultural Understanding in Large Language Models

  • Cheng Li
  • Damien Teney
  • Linyi Yang
  • Qingsong Wen
  • Xing Xie
  • Jindong Wang

Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based models either match or outperform GPT-4 on 41 datasets. Regarding cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. Furthermore, for cultural education of human participants, our models demonstrate superior outcomes in both learning efficacy and user experience compared to GPT-4. CulturePark marks an important step in addressing cultural bias and advancing the democratization of AI, highlighting the critical role of culturally inclusive data in model training. Code is released at https://github.com/Scarelette/CulturePark.

AAAI Conference 2024 Conference Paper

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

  • Conglong Li
  • Zhewei Yao
  • Xiaoxia Wu
  • Minjia Zhang
  • Connor Holmes
  • Cheng Li
  • Yuxiong He

Recent advances in deep learning models come at the price of formidable training costs. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as model scale, and the training cost is proportional to both. Compared to the rapidly evolving model architecture, how to efficiently use the training data (especially for expensive foundation model pretraining) is both less explored and difficult to realize, due to the lack of a convenient framework that focuses on data efficiency capabilities. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost ($3.7K if rented on Azure), while still maintaining 95% of the model quality compared to the baseline with full data and cost ($46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or achieve better model quality under the same data/time/cost. DeepSpeed Data Efficiency is easy to use and tune, enabling us to easily apply it and verify its benefit on additional tasks, including GPT-3 MoE model pretraining and small-scale GPT-2/ViT finetuning.

AAAI Conference 2024 Conference Paper

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

  • Zhewei Yao
  • Xiaoxia Wu
  • Cheng Li
  • Stephen Youn
  • Yuxiong He

Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs). However, a systematic examination of various quantization schemes, model families, and quantization bit precisions has been absent from the literature. In this paper, we conduct a comprehensive analysis of these factors by investigating the effects of PTQ on weight-only, activation-only, and weight-and-activation quantization using diverse methods such as round-to-nearest (RTN), GPTQ, ZeroQuant, and their variants. We apply these methods to two distinct model families with parameters ranging from 125M to 176B. Our contributions include: (1) a sensitivity analysis revealing that activation quantization is generally more susceptible than weight quantization, with smaller models often outperforming larger models in terms of activation quantization; (2) an evaluation and comparison of existing PTQ methods to optimize model size reduction while minimizing the impact on accuracy, revealing that none of the current methods can achieve the original model quality for quantization with either INT4 weights or INT4 weights and INT8 activations; (3) based on these insights, we propose an optimized method called Low-Rank Compensation (LoRC), which employs low-rank matrices to enhance model quality recovery with a minimal increase in model size.
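
To make the LoRC idea concrete: round-to-nearest quantization leaves a residual error matrix E = W − Ŵ, and LoRC fits low-rank factors to E so the deployed weight is Ŵ plus a cheap low-rank correction. A pure-Python sketch that only forms the residual (the weight values and bit width are illustrative, and the low-rank fit itself is omitted):

```python
def quantize_rtn(row, n_bits=4):
    """Symmetric round-to-nearest quantization of one weight row."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in row) / qmax
    return [round(w / scale) * scale for w in row]

# Hypothetical 2x4 weight matrix (illustrative values only).
W = [[0.31, -0.12, 0.88, -0.40], [0.05, 0.67, -0.93, 0.21]]
W_hat = [quantize_rtn(row) for row in W]
# LoRC would now fit low-rank factors U, V to the residual E = W - W_hat,
# so the deployed weight is W_hat + U @ V; here we only form E.
E = [[w - wh for w, wh in zip(rw, rwh)] for rw, rwh in zip(W, W_hat)]
max_err = max(abs(e) for row in E for e in row)
print(f"max per-weight INT4 error before compensation: {max_err:.4f}")
```

Because the factors U and V are tall-and-thin, the correction adds only a small fraction of the original parameter count while recovering much of the quality lost to INT4 rounding.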

IJCAI Conference 2024 Conference Paper

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

  • Xu Wang
  • Cheng Li
  • Yi Chang
  • Jindong Wang
  • Yuan Wu

Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.

AAAI Conference 2024 Conference Paper

Self-Distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

  • Ziyin Zhang
  • Ning Lu
  • Minghui Liao
  • Yongshuai Huang
  • Cheng Li
  • Min Wang
  • Wei Peng

Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term in the CTC loss to emphasize individual supervision, and leverages the maximum a posteriori of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free, requiring no extra parameters, longer inference lag, or additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks.
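
The framewise regularization term can be pictured as a per-frame distillation penalty added on top of the sequence-level CTC loss: the student's frame posteriors are pulled toward teacher posteriors derived from the model's own latent alignment. A minimal sketch with hypothetical frame distributions (the actual DCTC loss also includes the CTC term and the MAP alignment step, both omitted here):

```python
import math

def frame_kl(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student) for one frame's class distribution."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(p_teacher, p_student))

def framewise_distill(teacher_frames, student_frames):
    """Framewise self-distillation regularizer (sketch): average KL
    over frames.  Teacher posteriors would come from the model's own
    latent alignment; the tensors here are hypothetical."""
    return sum(frame_kl(t, s)
               for t, s in zip(teacher_frames, student_frames)) / len(student_frames)

# Identical distributions incur zero penalty; diverging frames are penalized.
same = [[0.7, 0.2, 0.1]] * 3
diff = [[0.1, 0.2, 0.7]] * 3
print(framewise_distill(same, same))       # 0.0
print(framewise_distill(same, diff) > 0)   # True
```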

ICML Conference 2024 Conference Paper

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

  • Cheng Li
  • Jindong Wang 0001
  • Yixuan Zhang 0001
  • Kaijie Zhu
  • Xinyi Wang
  • Wenxin Hou
  • Jianxun Lian
  • Fang Luo

Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions and why. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifically, we propose three approaches: 1) EmotionPrompt to enhance AI model performance, 2) EmotionAttack to impair AI model performance, and 3) EmotionDecode to explain the effects of emotional stimuli, both benign and malignant. Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. More importantly, EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain. Our work heralds a novel avenue for exploring psychology to enhance our understanding of generative AI models, thus boosting the research and development of human-AI collaboration and mitigating potential risks.

AAAI Conference 2023 Conference Paper

Unsupervised Paraphrasing under Syntax Knowledge

  • Tianyuan Liu
  • Yuqing Sun
  • Jiaqi Wu
  • Xi Xu
  • Yuchen Han
  • Cheng Li
  • Bin Gong

The soundness of syntax is an important issue for the paraphrase generation task. Most methods control the syntax of paraphrases by embedding the syntax and semantics in the generation process, which cannot guarantee the syntactical correctness of the results. Differing from them, in this paper we investigate the structural patterns of word usage, termed word-composable knowledge, and integrate it into paraphrase generation to control the syntax in an explicit way. This syntax knowledge is pretrained on a large corpus using dependency relationships and formulated as probabilistic functions of word-level syntactical soundness. For sentence-level correctness, we design a hierarchical syntax structure loss to quantitatively verify the syntactical soundness of the paraphrase against the given dependency template. Thus, the generation process can select appropriate words with consideration of both semantics and syntax. The proposed method is evaluated on several paraphrase datasets. The experimental results show that our proposed method outperforms the compared methods in paraphrase quality, especially in terms of syntax correctness.

IROS Conference 2022 Conference Paper

A Value-based Dynamic Learning Approach for Vehicle Dispatch in Ride-Sharing

  • Cheng Li
  • David Parker 0001
  • Qi Hao 0003

To ensure real-time response to passengers, existing solutions to the vehicle dispatch problem typically optimize dispatch policies using small batch windows and ignore the spatial-temporal dynamics over the long-term horizon. In this paper, we focus on improving the long-term performance of ride-sharing services and propose a deep reinforcement learning based approach for the ride-sharing dispatch problem. In particular, this work includes: (1) an offline policy evaluation (OPE) based method to learn a value function that indicates the expected reward of a vehicle reaching a particular state; (2) an online learning procedure to update the offline trained value function to capture the real-time dynamics during the operation; (3) an efficient online dispatch method that optimizes the matching policy by considering both past and future influences. Extensive simulations are conducted based on New York City taxi data, and show that the proposed solution further increases the service rate compared to the state-of-the-art farsighted ride-sharing dispatch approach.
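
The value-based scoring in components (1)-(3) can be sketched as a TD-style advantage: a vehicle-request match is scored by its immediate reward plus the discounted change in the learned state value, and requests are then matched to vehicles. The greedy assignment and all names below are illustrative stand-ins for the paper's batched matching optimization:

```python
def match_score(trip_reward, v_dest, v_here, gamma=0.99):
    """Immediate trip reward plus discounted value gain of moving the
    vehicle to the trip's destination zone (TD-style scoring).
    Sketch only; the paper's objective is richer."""
    return trip_reward + gamma * v_dest - v_here

def greedy_dispatch(candidates, scores):
    """Assign each request to the highest-scoring free vehicle.
    `candidates` maps request -> candidate vehicles; `scores` maps
    (vehicle, request) -> precomputed match_score.  Greedy stand-in
    for the batched optimal matching solved in the paper."""
    taken, plan = set(), {}
    for req, cands in candidates.items():
        free = [v for v in cands if v not in taken]
        if not free:
            continue
        best = max(free, key=lambda v: scores[(v, req)])
        plan[req] = best
        taken.add(best)
    return plan

# Hypothetical zone values: ending downtown is worth more than a suburb.
V = {"downtown": 9.0, "suburb": 4.0}
s = match_score(trip_reward=5.0, v_dest=V["downtown"], v_here=V["suburb"])
print(round(s, 2))  # 9.91
print(greedy_dispatch({"r1": ["v1", "v2"]},
                      {("v1", "r1"): 1.0, ("v2", "r1"): 2.0}))  # {'r1': 'v2'}
```

Updating V online, as the paper does, lets the same scoring rule track real-time supply-demand dynamics rather than the offline historical pattern alone.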

AAAI Conference 2022 Conference Paper

Deep Spatial Adaptive Network for Real Image Demosaicing

  • Tao Zhang
  • Ying Fu
  • Cheng Li

Demosaicing is a crucial step in the image processing pipeline and a highly ill-posed inverse problem. Recently, various deep learning based demosaicing methods have achieved promising performance, but they often design the same nonlinear mapping function for different spatial locations and do not adequately consider the differences in the mosaic pattern for each color. In this paper, we propose a deep spatial adaptive network (SANet) for real image demosaicing, which can adaptively learn the nonlinear mapping function for different locations. The weights of the spatial adaptive convolution layer are generated from the pattern information in the receptive field. Besides, we collect a paired real demosaicing dataset to train and evaluate the deep network, which makes the learned demosaicing network more practical in the real world. The experimental results show that our SANet outperforms the state-of-the-art methods in both comprehensive quantitative metrics and perceptual quality, in both noiseless and noisy cases.

JMLR Journal 2022 Journal Article

Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

  • Rajarshi Guhaniyogi
  • Cheng Li
  • Terrance D. Savitsky
  • Sanvesh Srivastava

Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples with much smaller sizes. Then, we formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which is used as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function. We quantify the required orders of the subset sample sizes and the number of subsets. The empirical results show that the combination schemes that satisfy our theoretical assumptions, including the AMC posterior, have better estimation performance than their main competitors across diverse simulations and in a real data analysis.

JBHI Journal 2022 Journal Article

Single-Channel Selection for EEG-Based Emotion Recognition Using Brain Rhythm Sequencing

  • Jia Wen Li
  • Shovan Barma
  • Peng Un Mak
  • Fei Chen
  • Cheng Li
  • Ming Tao Li
  • Mang I Vai
  • Sio Hang Pun

Recently, electroencephalography (EEG) signals have shown great potential for emotion recognition. Nevertheless, multichannel EEG recordings lead to redundant data, computational burden, and hardware complexity. Hence, efficient channel selection, especially single-channel selection, is vital. For this purpose, a technique termed brain rhythm sequencing (BRS) has been proposed that interprets EEG in terms of the dominant brain rhythm, i.e., the rhythm with maximum instantaneous power, at each 0.2 s timestamp. Dynamic time warping (DTW) is then used for rhythm sequence classification through a similarity measure. After evaluating the rhythm sequences on the emotion recognition task, the representative channel that produces impressive accuracy can be found, which realizes single-channel selection accordingly. In addition, the appropriate time segment for emotion recognition is estimated during the assessments. The results from a music emotion recognition (MER) experiment and three emotional datasets (SEED, DEAP, and MAHNOB) indicate that classification accuracies of 70–82% are achieved from single-channel data with a 10 s time length. Such performance is remarkable when minimal data sources are the primary concern. Furthermore, individual characteristics in emotion recognition are investigated based on the channels and time segments found. Therefore, this study provides a novel method for single-channel selection for emotion recognition.
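
The DTW similarity measure used here to compare rhythm sequences can be sketched with the textbook dynamic program over symbol sequences; the band labels and example sequences below are purely illustrative, not data from the paper:

```python
def dtw_distance(seq_a, seq_b, cost=lambda a, b: 0 if a == b else 1):
    """Classic dynamic-programming DTW distance between two symbol sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # dp[i][j] = minimal cumulative cost aligning seq_a[:i] with seq_b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            dp[i][j] = c + min(dp[i - 1][j],      # stretch seq_b
                               dp[i][j - 1],      # stretch seq_a
                               dp[i - 1][j - 1])  # step both
    return dp[n][m]

# Hypothetical dominant-rhythm sequences sampled at fixed timestamps.
s1 = ["alpha", "alpha", "beta", "theta"]
s2 = ["beta", "beta", "theta", "theta"]
d = dtw_distance(s1, s2)  # symbol-level DTW distance
```

Because warping allows repeated symbols to align, identical sequences (and pure repetitions) have distance 0, which is why a dominant-rhythm encoding pairs naturally with DTW.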

JBHI Journal 2021 Journal Article

Infant Facial Expression Analysis: Towards a Real-Time Video Monitoring System Using R-CNN and HMM

  • Cheng Li
  • A. Pourtaherian
  • L. van Onzenoort
  • W. E. Tjon a Ten
  • P. H. N. de With

Manual monitoring of young infants suffering from diseases like reflux is important, since infants can hardly articulate their feelings. In this work, we propose a video-based infant monitoring system for the analysis of infant expressions and states, approaching real-time performance. The expressions of interest consist of discomfort, unhappy, joy and neutral, whereas states include sleep, pacifier and open mouth. Benefiting from the expression analysis, the detected discomfort moments can also be correlated with a symptom-related disease, such as a reflux measurement for the diagnosis of gastroesophageal reflux. The system consists of three components: infant expression and state detection, object tracking, and detection compensation. The proposed system combines expression detection using Fast R-CNN with compensated detection that analyzes information from the previous frame and utilizes a Hidden Markov Model. The experimental results show a mean average precision of 81.9% and 84.8% for 4 infant expressions and 3 states, evaluated on both clinical and daily datasets. Meanwhile, the average precision for discomfort detection reaches up to 90%.

ICRA Conference 2021 Conference Paper

Optimal Online Dispatch for High-Capacity Shared Autonomous Mobility-on-Demand Systems

  • Cheng Li
  • David Parker 0001
  • Qi Hao 0003

Shared autonomous mobility-on-demand systems hold great promise for improving the efficiency of urban transportation, but are challenging to implement due to the huge scheduling search space and highly dynamic nature of requests. This paper presents a novel optimal schedule pool (OSP) assignment approach to optimally dispatch high-capacity ride-sharing vehicles in real time, including: (1) an incremental search algorithm that can efficiently compute the exact lowest-cost schedule of a ride-sharing trip with a reduced search space; (2) an iterative online re-optimization strategy to dynamically alter the assignment policy for new incoming requests, in order to maximize the service rate. Experimental results based on New York City taxi data show that our proposed approach outperforms the state-of-the-art in terms of service rate and system scalability.

IROS Conference 2021 Conference Paper

Vehicle Dispatch in On-Demand Ride-Sharing with Stochastic Travel Times

  • Cheng Li
  • David Parker 0001
  • Qi Hao 0003

On-demand ride-sharing is a promising way to improve mobility efficiency and reliability. The quality of passenger experience and the profit achieved by these platforms are strongly affected by the vehicle dispatch policy. However, existing ride-sharing research seldom considers travel time uncertainty, which leads to inaccurate dispatch allocations. This paper proposes a framework for dynamic vehicle dispatch that leverages stochastic travel time models to improve the performance of a fleet of shared vehicles. The novelty of this work includes: (1) a stochastic on-demand ride-sharing scheme to maximize the service rate (percentage of requests served) and reliability (probability of on-time arrival); (2) a technique based on approximate stochastic shortest path algorithms to compute the reliability of a ride-sharing trip; (3) a method to maximize the profit when a penalty for late arrivals is introduced. Based on New York City taxi data, it is shown that by considering travel time uncertainty, the ride-sharing service achieves a higher service rate, reliability, and profit.
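
The reliability term, the probability of on-time arrival along a trip, is easy to estimate once per-edge travel-time distributions are fixed. A Monte Carlo sketch, assuming independent lognormal edge times (an assumption for illustration; the paper uses approximate stochastic shortest-path algorithms instead):

```python
import numpy as np

def on_time_probability(edge_mus, edge_sigmas, deadline, n_samples=100_000, rng=None):
    """P(total travel time <= deadline) for a fixed route, estimated by
    sampling independent lognormal per-edge travel times (an assumption)."""
    rng = rng or np.random.default_rng(0)
    samples = rng.lognormal(mean=edge_mus, sigma=edge_sigmas,
                            size=(n_samples, len(edge_mus)))
    totals = samples.sum(axis=1)          # route time = sum of edge times
    return float((totals <= deadline).mean())

# Hypothetical 3-edge route: median travel time ~5 minutes per edge.
mus = np.log([5.0, 5.0, 5.0])
sigmas = np.array([0.25, 0.25, 0.25])
p = on_time_probability(mus, sigmas, deadline=20.0)
```

With a per-late-arrival penalty, the same estimate plugs directly into an expected-profit objective, which is how reliability and profit maximization connect.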

AAAI Conference 2018 Conference Paper

A Cascaded Inception of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation

  • Wentao Liu
  • Jie Chen
  • Cheng Li
  • Chen Qian
  • Xiao Chu
  • Xiaolin Hu

Accurate keypoint localization of human pose needs diversified features: the high level for contextual dependencies and the low level for detailed refinement of joints. However, the importance of the two factors varies from case to case, and how to use the features efficiently is still an open problem. Existing methods have limitations in preserving low-level features, adaptively adjusting the importance of different levels of features, and modeling the human perception process. This paper presents three novel techniques, step by step, to efficiently utilize different levels of features for human pose estimation. Firstly, an inception of inception (IOI) block is designed to emphasize the low-level features. Secondly, an attention mechanism is proposed to adjust the importance of individual levels according to the context. Thirdly, a cascaded network is proposed to sequentially localize the joints, enforcing message passing from joints of stand-alone parts like the head and torso to remote joints like the wrist or ankle. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both MPII and LSP benchmarks.

AAAI Conference 2018 Conference Paper

Merge or Not? Learning to Group Faces via Imitation Learning

  • Yue He
  • Kaidi Cao
  • Cheng Li
  • Chen Loy

Face grouping remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representations. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns a clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a. apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decisions, dynamically deciding when to merge two face instances/groups driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines.

JMLR Journal 2018 Journal Article

Scalable Bayes via Barycenter in Wasserstein Space

  • Sanvesh Srivastava
  • Cheng Li
  • David B. Dunson

Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in parallel on all the subsets, and combine posterior samples from all the subsets to approximate the full data posterior distribution. The smaller size of any subset compared to the full data implies that posterior sampling on any subset is computationally more efficient than sampling from the true posterior distribution. Since the combination step takes negligible time relative to sampling, posterior computations can be scaled to massive data by dividing the full data into a sufficiently large number of data subsets. One such approach relies on the geometry of posterior distributions estimated across different subsets and combines them through their barycenter in a Wasserstein space of probability measures. We provide theoretical guarantees on the accuracy of approximation that are valid in many applications. We show that the geometric method approximates the full data posterior distribution better than its competitors across diverse simulations and reproduces known results when applied to a movie ratings database.
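
For a one-dimensional parameter, the Wasserstein-2 barycenter has a convenient closed form: its quantile function is the average of the subset posteriors' quantile functions. A minimal sketch of that combination step (the subset samples are synthetic Gaussians for illustration, not the paper's implementation):

```python
import numpy as np

def wasserstein2_barycenter_1d(subset_samples, n_grid=1000):
    """W2 barycenter of 1-D empirical measures: average their quantile functions."""
    qs = np.linspace(0.0, 1.0, n_grid)
    # Quantile function of each subset posterior, evaluated on a common grid.
    quantiles = np.stack([np.quantile(s, qs) for s in subset_samples])
    return quantiles.mean(axis=0)  # barycenter quantiles on the grid

rng = np.random.default_rng(0)
# Hypothetical subset posteriors: Gaussians with slightly different centers.
subsets = [rng.normal(loc=mu, scale=1.0, size=5000) for mu in (-0.3, 0.0, 0.3)]
bary = wasserstein2_barycenter_1d(subsets)
```

Averaging quantile functions (rather than densities) is what keeps the barycenter from becoming an artificially multimodal mixture of the subset posteriors.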

NeurIPS Conference 2017 Conference Paper

From which world is your graph

  • Cheng Li
  • Felix Wong
  • Zhenming Liu
  • Varun Kanade

Discovering statistical structure from links is a fundamental problem in the analysis of social networks. Choosing a misspecified model, or equivalently an incorrect inference algorithm, will result in an invalid analysis or even falsely uncover patterns that are in fact artifacts of the model. This work focuses on unifying two of the most widely used link-formation models: the stochastic block model (SBM) and the small world (or latent space) model (SWM). Integrating techniques from kernel learning, spectral graph theory, and nonlinear dimensionality reduction, we develop the first statistically sound polynomial-time algorithm to discover latent patterns in sparse graphs for both models. When the network comes from an SBM, the algorithm outputs a block structure. When it is from an SWM, the algorithm outputs estimates of each node's latent position.

IJCAI Conference 2017 Conference Paper

High Dimensional Bayesian Optimization using Dropout

  • Cheng Li
  • Sunil Gupta
  • Santu Rana
  • Vu Nguyen
  • Svetha Venkatesh
  • Alistair Shilton

Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods rely either on a limited set of “active” variables or on an additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they can inform the derivation of our algorithm. We demonstrate the efficacy of our algorithm for optimization on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.
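
The core dropout idea, optimizing the acquisition function over a random subset of coordinates each iteration while filling the remaining coordinates from the incumbent, can be sketched as follows. This uses GP-UCB with an RBF kernel and a random candidate search; all names, settings, and the fill-in rule are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def ucb(cand, X, y, beta=2.0, noise=1e-6):
    """GP-UCB acquisition: posterior mean + beta * posterior std."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(cand, X)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = np.clip(1.0 - np.einsum("ij,ji->i", Ks, v), 0.0, None)
    return mu + beta * np.sqrt(var)

def dropout_bo(f, dim, n_iters=30, subset_size=2, n_cand=200, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(5, dim))           # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iters):
        incumbent = X[np.argmax(y)]
        idx = rng.choice(dim, size=subset_size, replace=False)  # dims kept this round
        cand = np.tile(incumbent, (n_cand, 1))          # fill the rest from incumbent
        cand[:, idx] = rng.uniform(-1.0, 1.0, size=(n_cand, subset_size))
        x_next = cand[np.argmax(ucb(cand, X, y))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], float(y.max())

# Maximize a simple 10-D objective with optimum 0 at the origin.
best_x, best_val = dropout_bo(lambda x: -np.sum(x ** 2), dim=10)
```

The point of the sketch is the search-space reduction: each acquisition maximization is over `subset_size` dimensions rather than all `dim`, which is what makes the per-iteration cost tractable in high dimensions.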