Arrow Research search

Author name cluster

Ming Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

46 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

  • Yexing Du
  • Kaiyuan Liu
  • Youcheng Pan
  • Zheng Chu
  • Bo Yang
  • Xiaocheng Feng
  • Ming Liu
  • Yang Xiang

As Large Language Models (LLMs) are increasingly popularized in the multilingual world, ensuring hallucination-free factuality becomes markedly crucial. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) predominantly focus on textual or visual modalities with a primary emphasis on English, which creates a gap in evaluation when processing multilingual input, especially in speech. To bridge this gap, we propose a novel Cross-lingual and Cross-modal Factuality benchmark (CCFQA). Specifically, the CCFQA benchmark contains parallel speech-text factual questions across 8 languages, designed to systematically evaluate MLLMs' cross-lingual and cross-modal factuality capabilities. Our experimental results demonstrate that current MLLMs still face substantial challenges on the CCFQA benchmark. Furthermore, we propose a few-shot transfer learning strategy that effectively transfers the Question Answering (QA) capabilities of LLMs in English to multilingual Spoken Question Answering (SQA) tasks, achieving competitive performance with GPT-4o-mini-Audio using just 5-shot training. We release CCFQA as a foundational research resource to promote the development of MLLMs with more robust and reliable speech understanding capabilities.

AAAI Conference 2026 Conference Paper

From Sampling to Cognition: Modeling Internal Cognitive Confidence in Language Models for Robust Uncertainty Calibration

  • Hao Li
  • Tao He
  • Jiafeng Liang
  • Zheng Chu
  • Ming Liu

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet they generally lack self-awareness, often displaying overconfidence when confronted with questions beyond their knowledge boundaries. This limitation severely hinders their trustworthiness in high-stakes scenarios. Existing calibration methods typically rely on sampling accuracy, derived from multiple outputs, as a proxy for model confidence. However, this coarse-grained metric fails to capture the model’s internal cognitive states, such as confusion, hallucination, or persistent belief in false knowledge. To address this, we propose CogConf (Cognitive Confidence), a cognitively grounded uncertainty signal that extends sampling accuracy by incorporating the semantic diversity of incorrect answers and the model’s abstention behaviors. By shifting the focus from sampling-based to cognition-oriented uncertainty modeling, CogConf offers a more faithful reflection of the model's internal beliefs. Building on this signal, we introduce CogAlign, a simple yet effective alignment framework that explicitly aligns the model’s verbalized confidence with CogConf, thereby producing uncertainty estimates that better reflect the model’s internal cognition. Experimental results on six knowledge-intensive in-domain and out-of-domain QA datasets demonstrate that CogConf robustly characterizes the model's internal uncertainty. Building on this foundation, CogAlign guides the model's expression to significantly enhance the trustworthiness and utility of its uncertainty calibration without compromising its underlying QA capabilities, while also demonstrating strong cross-task generalization and output stability. This offers a new pathway toward building more trustworthy LLMs.

AAAI Conference 2026 Conference Paper

Is Your (Reasoning) Multimodal Language Model Vulnerable Toward Distractions?

  • Ming Liu
  • Hao Chen
  • Jindong Wang
  • Liwen Wang
  • Jingchen Sun
  • Wensheng Zhang

Vision-Language Models (VLMs) have achieved success in tasks such as visual question answering, yet their resilience to distractions remains underexplored. Understanding how distractions affect VLMs' performance is crucial for real-world applications, as input data often contains noisy or irrelevant content. This paper assesses the robustness of VLMs—including general-purpose models and those specialized for reasoning—against distractions in the context of science question answering. We introduce I-ScienceQA, a new benchmark based on the ScienceQA dataset, which systematically injects distractions into both visual and textual contexts. We evaluate how distractions perturb the underlying reasoning processes of these models by analyzing changes in textual explanations leading to answers. Our findings show that most VLMs are vulnerable to distractions, with a noticeable degradation in reasoning when extraneous content is present. In particular, some models (including GPT-o4 mini) exhibit a higher degree of robustness. We also observe that textual distractions generally cause greater performance declines than visual distractions. Finally, we explore mitigation strategies such as prompt engineering. Although these strategies improve resilience modestly, our analysis highlights considerable room for further improvement in the robustness of VLMs.

AAAI Conference 2026 Conference Paper

RefSTAR: Blind Face Image Restoration with Reference Selection, Transfer, and Reconstruction

  • Zhicun Yin
  • Junjie Chen
  • Ming Liu
  • Zhixin Wang
  • Fan Li
  • Renjing Pei
  • Xiaoming Li
  • Rynson W. H. Lau

Introducing high-quality references can largely alleviate the uncertainty in blind face image restoration tasks, yet the equivocal use of reference priors still makes it a struggle to preserve human identity well. We attribute the identity inconsistency to two deficiencies of existing reference-based face restoration methods, namely the inability to effectively determine which features need to be transferred, and the failure to preserve the structure and details of the selected features. This work mainly focuses on these two issues, and we present a novel blind face image restoration method that considers reference selection, transfer, and reconstruction (RefSTAR) to introduce proper features from reference images. Specifically, we construct a reference selection (RefSel) module, which can generate accurate masks to select reference features. For training the RefSel module, we construct a RefSel-HQ dataset through a mask generation pipeline, which contains annotated masks for 10,000 ground truth-reference pairs. To guarantee the exact introduction of selected reference features, a feature fusion paradigm is designed for reference feature transferring, and a Mask-Compatible Cycle-Consistency Loss is redesigned based on reference reconstruction to further ensure the presence of selected reference image features in the output image. Experiments on various backbone models demonstrate superior performance, showing better identity preservation ability and reference feature transfer quality.

ICLR Conference 2025 Conference Paper

Effective Interplay between Sparsity and Quantization: From Theory to Practice

  • Simla Burcu Harma
  • Ayan Chakraborty
  • Elizaveta Kostenok
  • Danila Mishin
  • Dongho Ha
  • Babak Falsafi
  • Martin Jaggi
  • Ming Liu

The increasing size of deep neural networks (DNNs) necessitates effective model compression to reduce their computational and memory footprints. Sparsity and quantization are two prominent compression methods that have been shown to reduce DNNs' computational and memory footprints significantly while preserving model accuracy. However, how these two methods interact when combined together remains a key question for developers, as many tacitly assume that they are orthogonal, meaning that their combined use does not introduce additional errors beyond those introduced by each method independently. In this paper, we provide the first mathematical proof that sparsity and quantization are non-orthogonal. We corroborate these results with experiments spanning a range of large language models, including the OPT and LLaMA model families (with 125M to 8B parameters), and vision models like ViT and ResNet. We show that the order in which we apply these methods matters because applying quantization before sparsity may disrupt the relative importance of tensor elements, which may inadvertently remove significant elements from a tensor. More importantly, we show that even if applied in the correct order, the compounded errors from sparsity and quantization can significantly harm accuracy. Our findings extend to the efficient deployment of large models in resource-constrained compute platforms to reduce serving cost, offering insights into best practices for applying these compression methods to maximize hardware resource efficiency without compromising accuracy.
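
The order-dependence described in this abstract can be illustrated with a toy example. The sketch below is not the paper's proof or code: the tensor values, the step size, the top-k pruning rule, and the position-based tie-breaking are all assumptions chosen so that quantizing first creates a tie between two elements that pruning would otherwise rank differently.

```python
def quantize(values, step):
    """Uniform quantization: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def prune_top_k(values, k):
    """Magnitude pruning: keep the k largest-magnitude entries, zero the rest.

    Ties are broken by position (earlier index wins); this is an assumption
    of the sketch, not something stated in the abstract.
    """
    order = sorted(range(len(values)), key=lambda i: -abs(values[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def squared_error(reference, approx):
    """Sum of squared differences between the original and compressed tensor."""
    return sum((r - a) ** 2 for r, a in zip(reference, approx))


x = [0.9, 0.26, 0.34, -0.05]  # hypothetical tensor values
step, k = 0.5, 2              # hypothetical quantization step and sparsity level

# Sparsity first: pruning sees the true magnitudes and keeps 0.9 and 0.34.
sparsity_then_quant = quantize(prune_top_k(x, k), step)

# Quantization first: 0.26 and 0.34 both become 0.5, so pruning can no
# longer tell which one mattered more and may drop the larger original value.
quant_then_sparsity = prune_top_k(quantize(x, step), k)

print(sparsity_then_quant)   # [1.0, 0.0, 0.5, 0.0]
print(quant_then_sparsity)   # [1.0, 0.5, 0.0, 0.0]
print(squared_error(x, sparsity_then_quant) < squared_error(x, quant_then_sparsity))
```

In this toy case the quantize-then-prune order ends with a larger squared error than the prune-then-quantize order. The example only shows the ordering effect, not the compounded-error result the abstract also reports.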

ECAI Conference 2025 Conference Paper

Endexformer: Hierarchical Endogenous-Exogenous Synergy for Multivariate Time Series Forecasting

  • Zhiquan Huang
  • Ruijuan Zheng
  • Junlong Zhu
  • Luxin Liu
  • Meiwen Li
  • Ming Liu

Exogenous variables provide complementary information that enhances endogenous representations, thereby facilitating more accurate multivariate time series forecasting (MTSF). However, existing methods typically overlook the synergistic interplay between exogenous and endogenous variables by adopting shallow fusion strategies such as simple concatenation or separate encoding, which fail to capture the dynamic dependencies essential for modeling complex temporal patterns. To address this issue, we propose Endexformer, a novel hierarchical Endogenous-Exogenous modeling framework built upon the Transformer architecture. Specifically, Endexformer adopts a hierarchical architecture to jointly model temporal embeddings of endogenous variables and structural embeddings of exogenous variables, enabling a unified representation of cross-variable dependencies. To capture the fine-grained temporal patterns of endogenous variables, we present a multilevel temporal attention mechanism that leverages variable-level embeddings to adaptively incorporate exogenous information. Furthermore, we design a dynamic interactive attention mechanism that selectively emphasizes informative endogenous and exogenous patterns, mitigating redundancy and preserving semantic integrity in variable representations. Extensive experiments on eight real-world datasets show that Endexformer achieves outstanding performance against competing benchmark approaches in MTSF tasks across various temporal scenarios.

ICLR Conference 2025 Conference Paper

Is Your Video Language Model a Reliable Judge?

  • Ming Liu
  • Wensheng Zhang

As video language models (VLMs) gain more applications in various scenarios, the need for robust and scalable evaluation of their performance becomes increasingly critical. The traditional human expert-based evaluation of VLMs has limitations in consistency and scalability, which sparked interest in automatic methods such as employing VLMs to evaluate VLMs. However, the reliability of VLMs as judges remains underexplored. Existing methods often rely on a single VLM as the evaluator. However, this approach can be unreliable or biased because such a model may lack the ability to fully understand the content and may have inherent biases, ultimately compromising evaluation reliability. A remedy is to apply the principle of collective thoughts, aggregating evaluations from multiple VLMs to enhance reliability. This study investigates the efficacy of such approaches, particularly when the pool of judges includes both reliable and unreliable models. Our findings reveal that incorporating collective judgments from such a mixed pool does not necessarily improve the accuracy of the final evaluation. The inclusion of less reliable judges can introduce noise, undermining the overall reliability of the outcomes. To explore the factors that impact evaluation reliability, we fine-tune an underperforming VLM judge, Video-LLaVA, and observe that improved understanding ability alone is insufficient to make VLM judges more reliable. These findings stress the limitations of collective thought approaches and highlight the need for more advanced methods that can account for the reliability of individual models. Our study promotes the development of more reliable evaluation methods for VLMs.
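
The aggregation idea in this abstract, and the way a mixed pool of judges can hurt it, can be sketched with simple majority voting. Everything below is a hypothetical illustration: the labels and judge verdicts are made-up toy values, not data from the paper, and majority voting is only one assumed way to aggregate judgments.

```python
def majority_vote(verdicts_per_judge):
    """Aggregate binary verdicts item by item; an item is 1 if most judges say 1."""
    n_judges = len(verdicts_per_judge)
    return [
        1 if sum(item_votes) * 2 > n_judges else 0
        for item_votes in zip(*verdicts_per_judge)
    ]


def accuracy(predicted, truth):
    """Fraction of items where the verdict matches the reference label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical reference labels and judge verdicts (toy values only).
truth         = [1, 1, 0, 1, 0, 0, 1, 0]
reliable      = [1, 1, 0, 1, 0, 0, 1, 1]  # wrong on one item
unreliable_a  = [0, 1, 1, 0, 0, 1, 1, 0]  # wrong on four items
unreliable_b  = [0, 0, 1, 0, 1, 0, 0, 0]  # wrong on six items

pooled = majority_vote([reliable, unreliable_a, unreliable_b])

print(accuracy(reliable, truth))  # 0.875
print(accuracy(pooled, truth))    # 0.625
```

With these toy values the pooled verdict is less accurate than the single reliable judge, because the two unreliable judges outvote it on several items. A weighting scheme that accounts for per-judge reliability would behave differently, which is the direction the abstract points to.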

NeurIPS Conference 2025 Conference Paper

On Fairness of Unified Multimodal Large Language Model for Image Generation

  • Ming Liu
  • Hao Chen
  • Jindong Wang
  • Liwen Wang
  • Bhiksha Raj
  • Wensheng Zhang

Unified multimodal large language models (U-MLLMs) have demonstrated impressive performance in end-to-end visual understanding and generation tasks. However, compared to generation-only systems (e.g., Stable Diffusion), the unified architecture of U-MLLMs introduces new risks of propagating demographic stereotypes. In this paper, we benchmark several state-of-the-art U-MLLMs and show that they exhibit significant gender and race biases in the generated outputs. To diagnose the source of these biases, we propose a locate-then-fix framework: we first audit the vision and language components, using techniques such as linear probing and controlled generation, and find that the language model appears to be a primary origin of the observed generative bias. Moreover, we observe a "partial alignment" phenomenon, where the U-MLLMs exhibit less bias in understanding tasks yet produce substantially biased images. To address this, we introduce a novel balanced preference loss that enforces uniform generation probabilities across demographics by leveraging a synthetically balanced dataset. Extensive experiments show that our approach significantly reduces demographic bias while preserving semantic fidelity and image quality. Our findings underscore the need for targeted debiasing strategies in unified multimodal systems and introduce a practical approach to mitigate biases.

AAAI Conference 2025 Conference Paper

Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues

  • Tao He
  • Lizi Liao
  • Yixin Cao
  • Yuanxing Liu
  • Yiheng Sun
  • Zerui Chen
  • Ming Liu
  • Bing Qin

Recent advancements in proactive dialogues have garnered significant attention, particularly for more complex objectives (e.g. emotion support and persuasion). Unlike traditional task-oriented dialogues, proactive dialogues demand advanced policy planning and adaptability, requiring rich scenarios and comprehensive policy repositories to develop such systems. However, existing approaches tend to rely on Large Language Models (LLMs) for user simulation and online learning, leading to biases that diverge from realistic scenarios and result in suboptimal efficiency. Moreover, these methods depend on manually defined, context-independent, coarse-grained policies, which not only incur high expert costs but also raise concerns regarding their completeness. In our work, we highlight the potential for automatically discovering policies directly from raw, real-world dialogue records. To this end, we introduce a novel dialogue policy planning framework, LDPP. It fully automates the process from mining policies in dialogue records to learning policy planning. Specifically, we employ a variant of the Variational Autoencoder to discover fine-grained policies represented as latent vectors. After automatically annotating the data with these latent policy labels, we propose an Offline Hierarchical Reinforcement Learning (RL) algorithm in the latent space to develop effective policy planning capabilities. Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios, even surpassing ChatGPT with only a 1.8-billion-parameter LLM.

EAAI Journal 2024 Journal Article

Coal allocation optimization based on a hybrid residual prediction model with an improved genetic algorithm

  • Ming Liu
  • Ziqi Yu
  • Boran Li
  • Qingjie Wang
  • Huawei Ren
  • Dong Xu

The objective of the coal blending optimization problem is to find an optimal coal blend in the feasible domain such that the blended coal meets the quality requirements at the end of the coking process and the cost of coal blending is minimized. This paper proposes a hybrid residual prediction model and an improved genetic algorithm to solve this problem and predict coke quality. For this purpose, a hybrid residual prediction model is used to predict coke quality. The model first uses a random forest feature extraction method to reduce the dimensionality of the data, and then trains several prediction models such as eXtreme Gradient Boosting (XGBoost), Adaboost and Light Gradient-Boosting Machine (lightGBM) for different coke indicators. An improved genetic algorithm based on the adaptive weighted genetic algorithm (awGA), and another improved genetic algorithm based on a priori knowledge and an adaptive random initialization method (P-awGA), were designed and implemented to solve the optimization problem under strict constraints. The experimental results show that using the hybrid residual prediction model and the improved genetic algorithm can accurately predict the coke quality and use less time to obtain a lower-cost coal blending solution.

NeurIPS Conference 2024 Conference Paper

Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

  • Jingchang Chen
  • Hongxuan Tang
  • Zheng Chu
  • Qianglong Chen
  • Zekun Wang
  • Ming Liu
  • Bing Qin

Despite recent progress made by large language models in code generation, they still struggle with programs that meet complex requirements. Recent work utilizes plan-and-solve decomposition to decrease the complexity and leverage self-tests to refine the generated program. Yet, planning deep-inside requirements in advance can be challenging, and the tests need to be accurate to accomplish self-improvement. To this end, we propose FunCoder, a code generation framework incorporating the divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composited to attain more complex objectives. Additionally, we designate functions via a consensus formed by identifying similarities in program behavior, mitigating error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average in HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: With FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis reveals that our proposed dynamic function decomposition is capable of handling complex requirements, and the functional consensus prevails over self-testing in correctness evaluation.
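
The "consensus formed by identifying similarities in program behavior" mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not FunCoder's implementation: the abstract does not give the similarity measure, so the pairwise output-agreement score, the helper names (`run`, `select_by_consensus`), and the toy candidates are all hypothetical.

```python
_FAILED = object()  # sentinel marking a candidate that raised on some input


def run(candidate, arg):
    """Call a candidate function, mapping any exception to the sentinel."""
    try:
        return candidate(arg)
    except Exception:
        return _FAILED


def select_by_consensus(candidates, inputs):
    """Pick the candidate whose behavior agrees most with the other candidates.

    Agreement is counted per input: two candidates agree on an input when both
    ran successfully and returned equal outputs (an assumed scoring rule).
    """
    outputs = [[run(f, x) for x in inputs] for f in candidates]

    def agreement(a, b):
        return sum(1 for u, v in zip(a, b) if u is not _FAILED and u == v)

    scores = [
        sum(agreement(outputs[i], outputs[j])
            for j in range(len(candidates)) if j != i)
        for i in range(len(candidates))
    ]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


# Hypothetical sampled implementations of "absolute value"; the last is buggy.
candidates = [
    lambda x: x if x >= 0 else -x,
    abs,
    lambda x: x,
]

chosen = select_by_consensus(candidates, inputs=[-2, 0, 3])
print(chosen(-5))  # 5
```

The two correct candidates agree with each other on every input, so either one outscores the buggy candidate, which only matches them on non-negative inputs. No test cases with expected outputs are needed for this selection, which is the contrast with self-testing drawn in the abstract.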

IJCAI Conference 2024 Conference Paper

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

  • Jiafeng Liang
  • Shixin Jiang
  • Zekun Wang
  • Haojie Pan
  • Zerui Chen
  • Zheng Chu
  • Ming Liu
  • Ruiji Fu

There are a substantial number of instructional videos on the Internet, which provide us with tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the Guide (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to our daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions and timestamps. Our proposed benchmark consists of three sub-tasks to evaluate the comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos. (2) Guideline Summarization: models have to mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models have to generate captions for specific steps under the guidance of the guideline. We evaluate plenty of foundation models with Guide and perform in-depth analysis. Given the diversity and practicality of Guide, we believe that it can be used as a better benchmark for instructional video comprehension.

IJCAI Conference 2024 Conference Paper

MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem

  • Mingkai Tang
  • Yuanhang Li
  • Hongji Liu
  • Yingbing Chen
  • Ming Liu
  • Lujia Wang

With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Single-agent pathfinding (DVS). However, this paper demonstrates that the methods based on DVS cannot always obtain the optimal solution. To obtain the optimal result, we propose the Multi-Goal Conflict-Based Search (MGCBS), which is based on Decoupling the goal Safe interval visiting order search and the Single-agent pathfinding (DSS). Additionally, we present the Time-Interval-Space Forest (TIS Forest) to enhance the efficiency of MGCBS by maintaining the shortest paths from any start point at any start time step to each safe interval at the goal points. The experiment demonstrates that our method can consistently obtain optimal results and execute up to 7 times faster than the state-of-the-art method in our evaluation.

IJCAI Conference 2023 Conference Paper

A Survey on Out-of-Distribution Evaluation of Neural NLP Models

  • Xinzhe Li
  • Ming Liu
  • Shang Gao
  • Wray Buntine

Adversarial robustness, domain generalization and dataset biases are three active lines of research contributing to out-of-distribution (OOD) evaluation on neural NLP models. However, a comprehensive, integrated discussion of the three research lines is still lacking in the literature. This survey will 1) compare the three lines of research under a unifying definition; 2) summarize their data-generating processes and evaluation protocols for each line of research; and 3) emphasize the challenges and opportunities for future work.

EAAI Journal 2023 Journal Article

Counterfactual-based minority oversampling for imbalanced classification

  • Shu Wang
  • Hao Luo
  • Shanshan Huang
  • Qingsong Li
  • Li Liu
  • Guoxin Su
  • Ming Liu

A key challenge of oversampling in imbalanced classification is that the generation of new minority samples often neglects the use of majority classes, resulting in most new minority samples spreading across the whole minority space. In view of this, we present a new oversampling framework based on the counterfactual theory. Our framework introduces a counterfactual objective by leveraging the rich inherent information of majority classes and explicitly perturbing majority samples to generate new samples in the territory of minority space. It can be analytically shown that the new minority samples satisfy the minimum inversion. Therefore, most of them are located near the decision boundary. The empirical evaluation on six benchmark datasets shows that our approach clearly outperforms the state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Enhanced Multi-Relationships Integration Graph Convolutional Network for Inferring Substitutable and Complementary Items

  • Huajie Chen
  • Jiyuan He
  • Weisheng Xu
  • Tao Feng
  • Ming Liu
  • Tianyu Song
  • Runfeng Yao
  • Yuanyuan Qiao

Understanding the relationships between items can improve the accuracy and interpretability of recommender systems. Among these relationships, the substitute and complement relationships attract the most attention in e-commerce platforms. The substitutable items are interchangeable and might be compared with each other before purchasing, while the complementary items are used in conjunction and are usually bought together with the query item. In this paper, we focus on two issues of inferring the substitutable and complementary items: 1) how to model their mutual influence to improve the performance of downstream tasks, 2) how to further discriminate them by considering the strength of relationship for different item pairs. We propose a novel multi-task learning framework named Enhanced Multi-Relationships Integration Graph Convolutional Network (EMRIGCN). We regard the relationship inference task as a link prediction task in heterogeneous graph with different types of edges between nodes (items). To model the mutual influence between substitute and complement, EMRIGCN adopts a two-level integration module, i.e., feature and structure integration, based on experts sharing mechanism during message passing. To obtain the strength of relationship for item pairs, we build an auxiliary loss function to further increase or decrease the distances between embeddings of items with weak or strong relation in latent space. Extensive experiments on both public and industrial datasets prove that EMRIGCN significantly outperforms the state-of-the-art solutions. We also conducted A/B tests on real world recommender systems of Meituan Maicai, an online supermarket platform in China, and obtained 15.3% improvement on VBR and 15.34% improvement on RPM.

IJCAI Conference 2023 Conference Paper

GTR: A Grafting-Then-Reassembling Framework for Dynamic Scene Graph Generation

  • Jiafeng Liang
  • Yuxin Wang
  • Zekun Wang
  • Ming Liu
  • Ruiji Fu
  • Zhongyuan Wang
  • Bing Qin

Dynamic scene graph generation aims to identify visual relationships (subject-predicate-object) in frames based on spatio-temporal contextual information in the video. Previous work implicitly models the spatio-temporal interaction simultaneously, which leads to entanglement of spatio-temporal contextual information. To this end, we propose a Grafting-Then-Reassembling framework (GTR), which explicitly extracts intra-frame spatial information and inter-frame temporal information in two separate stages to decouple spatio-temporal contextual information. Specifically, we first graft a static scene graph generation model to generate static visual relationships within frames. Then we propose the temporal dependency model to extract the temporal dependencies across frames, and explicitly reassemble static visual relationships into dynamic scene graphs. Experimental results show that GTR achieves the state-of-the-art performance on Action Genome dataset. Further analyses reveal that the reassembling stage is crucial to the success of our framework.

IROS Conference 2022 Conference Paper

360ST-Mapping: An Online Semantics-Guided Topological Mapping Module for Omnidirectional Visual SLAM

  • Hongji Liu
  • Huajian Huang
  • Sai-Kit Yeung
  • Ming Liu

As an abstract representation of the environment structure, a topological map has advantageous properties for path-planning and navigation. Here we propose an online topological mapping method, 360ST-Mapping, using omnidirectional vision. The 360° field-of-view allows the agent to obtain consistent observation and incrementally extract topological environment information. Moreover, we leverage semantic information to guide topological place recognition, further improving performance. The topological map possessing semantic information has the potential to support semantics-related advanced tasks. After integrating the topological mapping module into the omnidirectional visual SLAM system, we conducted extensive experiments in several large-scale indoor scenes and validated the method's effectiveness.

IJCAI Conference 2022 Conference Paper

Cost Ensemble with Gradient Selecting for GANs

  • Minghui Liu
  • Jiali Deng
  • Meiyi Yang
  • Xuan Cheng
  • Nianbo Liu
  • Ming Liu
  • Xiaomin Wang

Generative Adversarial Networks (GANs) are powerful generative models on numerous tasks and datasets but are also known for their training instability and mode collapse. The latter is because the optimal transportation map is discontinuous, but DNNs can only approximate continuous ones. One way to solve the problem is to introduce multiple discriminators or generators. However, their impacts are limited because the cost function of each component is the same. That is, they are homogeneous. In contrast, multiple discriminators with different cost functions can yield various gradients for the generator, which indicates we can use them to search for more transportation maps in the latent space. Inspired by this, we have proposed a framework to combat the mode collapse problem, containing multiple discriminators with different cost functions, named CES-GAN. Unfortunately, it may also lead to the generator being hard to train because the performance between discriminators is unbalanced, according to the Cannikin Law. Thus, a gradient selecting mechanism is also proposed to pick up proper gradients. We provide mathematical statements to prove our assumptions and conduct extensive experiments to verify the performance. The results show that CES-GAN is lightweight and more effective for fighting against the mode collapse problem than similar works.

EAAI Journal 2022 Journal Article

Localization of myocardial infarction using a multi-branch weight sharing network based on 2-D vectorcardiogram

  • Cong He
  • Ming Liu
  • Peng Xiong
  • Jianli Yang
  • Haiman Du
  • Jinpeng Xu
  • Zengguang Hou
  • Xiuling Liu

Early diagnosis and localization of myocardial infarction (MI) assist clinicians in saving numerous lives through the timely treatment for patients with MI. Vectorcardiogram (VCG) can reflect the characteristic changes of cardiac electrical activity in MI in detail. In this context, the present study reports a multi-branch weight sharing network model based on 2-D VCG constructed to realize the automatic localization of MI. The three-branch network extracted the spatial morphological features of the three planes of the 2-D VCG, respectively, and the weight-sharing part of the network obtained the spatial correlation information among the three planes. Subsequently, the Softmax classifier was employed to classify normal individuals and MI patients (11 class infarct sites). To evaluate the performance of the proposed method for MI localization, the PTB (Physikalisch-Technische Bundesanstalt) diagnostic ECG database was employed. The localization accuracy, sensitivity, and specificity achieved using the proposed method were 99.87%, 99.92%, and 99.99%, respectively. Thus, the proposed scheme is expected to be useful in assisting cardiologists in interpreting VCG for clinical diagnosis.

JBHI Journal 2022 Journal Article

Reinforcement Learning Based Diagnosis and Prediction for COVID-19 by Optimizing a Mixed Cost Function From CT Images

  • Siying Chen
  • Minghui Liu
  • Pan Deng
  • Jiali Deng
  • Yi Yuan
  • Xuan Cheng
  • Tianshu Xie
  • Libo Xie

The novel coronavirus disease (COVID-19) is a pandemic disease that has caused 4 million deaths and more than 200 million infections worldwide (as of August 4, 2021). Rapid and accurate diagnosis of COVID-19 infection is critical to controlling the spread of the epidemic. In order to quickly and efficiently detect COVID-19 and reduce the threat of COVID-19 to human survival, we first propose a detection framework based on reinforcement learning for COVID-19 diagnosis, which constructs a mixed loss function that can integrate the advantages of multiple loss functions. This paper uses the accuracy of the validation set as the reward value, and obtains the initial model for the next epoch by searching the model corresponding to the maximum reward value in each epoch. We also propose a prediction framework that integrates multiple detection frameworks using parameter sharing to predict the progression of patients' disease without additional training. This paper also constructs a higher-quality version of the CT image dataset containing 247 cases screened by professional physicians, and obtains better results on this dataset. Meanwhile, we used two other COVID-19 datasets as external verifications, and still achieved a high accuracy rate without additional training. Finally, the experimental results show that our classification accuracy can reach 98.31%, and the precision, sensitivity, specificity, and AUC (Area Under Curve) are 98.82%, 97.99%, 98.67%, and 0.989, respectively. The accuracy of external verification can reach 93.34% and 91.05%. Moreover, the accuracy of our prediction framework is 91.54%. A large number of experiments demonstrate that our proposed method is effective and robust for COVID-19 detection and prediction.

NeurIPS Conference 2022 Conference Paper

Self-Supervised Image Restoration with Blurry and Noisy Pairs

  • Zhilu Zhang
  • RongJian Xu
  • Ming Liu
  • Zifei Yan
  • Wangmeng Zuo

When taking photos in an environment with insufficient light, the exposure time and the sensor gain usually need to be carefully chosen to obtain images with satisfying visual quality. For example, images with high ISO usually have inescapable noise, while long-exposure ones may be blurry due to camera shake or object motion. Existing solutions generally seek a balance between noise and blur, and learn denoising or deblurring models under either full or self-supervision. However, real-world training pairs are difficult to collect, and self-supervised methods that rely merely on blurry or noisy images are limited in performance. In this work, we tackle this problem by jointly leveraging the short-exposure noisy image and the long-exposure blurry image for better image restoration. Such a setting is practically feasible because short-exposure and long-exposure images can be either acquired by two individual cameras or synthesized from a long burst of images. Moreover, the short-exposure images are hardly blurry, and the long-exposure ones have negligible noise. Their complementarity makes it feasible to learn a restoration model in a self-supervised manner. Specifically, the noisy images can be used as the supervision information for deblurring, while the sharp areas in the blurry images can be utilized as the auxiliary supervision information for self-supervised denoising. By learning in a collaborative manner, the deblurring and denoising tasks in our method can benefit each other. Experiments on synthetic and real-world images show the effectiveness and practicality of the proposed method. Code is available at https://github.com/cszhilu1998/SelfIR.

EAAI Journal 2021 Journal Article

A multi-dimensional association information analysis approach to automated detection and localization of myocardial infarction

  • Jieshuo Zhang
  • Ming Liu
  • Peng Xiong
  • Haiman Du
  • Hong Zhang
  • Feng Lin
  • Zengguang Hou
  • Xiuling Liu

Developing an accurate and automatic algorithm for detection and localization of myocardial infarction (MI) remains a great challenge for multi-lead electrocardiograph (ECG) signals. The core is a novel technique of multi-dimensional association information analysis for a multi-lead ECG tensor. Tensorization based on the Discrete Wavelet Transform is investigated to construct an effective ECG tensor containing multi-dimensional association information from 12-lead ECG signals. A multi-lead feature extraction algorithm based on Parallel Factor Analysis is developed to automatically extract the low-dimensional and highly recognizable lead characteristic features of the tensor. After that, a bagged decision tree is constructed to categorize 12 types of heartbeats (healthy controls and 11 kinds of MI) from the lead features. Using the PTB database, we compare with existing MI diagnosis methods. For MI detection, significant improvements in accuracy, sensitivity and specificity are achieved, reaching as high as 99.88%, 99.98% and 99.39%, respectively. Furthermore, an experiment with 36-dimensional features obtained from the ECG tensor is conducted for the localization of 11 kinds of MI, and our proposed method achieves an accuracy of 99.40%, sensitivity of 99.86%, and specificity of 99.89%. The proposed algorithm can effectively accomplish the localization of 11 categories of MI by using the lead features extracted from the multi-dimensional association ECG tensor, which has not been achieved in the literature. The accurate and comprehensive tool will greatly help cardiologists diagnose MI from 12-lead ECG signals.

ECAI Conference 2020 Conference Paper

Dual Attention-Based Adversarial Autoencoder for Attributed Network Embedding

  • Ming Liu
  • Jianxin Liao
  • Jingyu Wang 0001
  • Qi Qi 0001
  • Haifeng Sun 0001

Existing embedding methods for attributed networks aim to learn low-dimensional embeddings for nodes, which can preserve both consistency and complementarity for network structures and node attributes. The main assumption is that nodes with similar structures and/or similar attributes should be close in the embedding space. In reality, nodes with similar attributes might be far away from each other in topology and vice versa. The conflict is often caused by noisy links or incomplete network structures. Previous methods either independently project embeddings based on the assumption without considering the conflicts, or encode embeddings into a shared space ignoring the complementarity. In this paper, we propose a Dual Attention-based Adversarial Attributed Network Embedding framework (DAANE) to preserve the consistency and complementarity between structures and attributes, and reduce the conflict caused by their discrepancy. DAANE includes an attribute attention mechanism designed to detect and weaken the impact of noisy links and a structure attention mechanism applied to assign weights to network structures of different scales and capture a more complete global context. Furthermore, we develop efficient adversarial learning when combining the two heterogeneous embeddings. The adversarial auto-encoder projects embeddings of attributes and structures into the same space. Meanwhile, it completely circumvents the interference of various types of noise by removing the constraints of the embedding space. Extensive experiments on three real-world network datasets indicate that the proposed model achieves state-of-the-art results.

AAAI Conference 2020 Conference Paper

Enhancing Personalized Trip Recommendation with Attractive Routes

  • Jiqing Gu
  • Chao Song
  • Wenjun Jiang
  • Xiaomin Wang
  • Ming Liu

Personalized trip recommendation tries to recommend a sequence of points of interest (POIs) for a user. Most existing studies search POIs only according to the popularity of the POIs themselves. In fact, the routes among the POIs are also attractive to visitors, and some of these routes have high popularity. We term this kind of route an Attractive Route (AR), which brings extra user experience. In this paper, we study attractive routes to improve personalized trip recommendation. To deal with the challenges of discovering and evaluating ARs, we propose a personalized Trip Recommender with POIs and Attractive Routes (TRAR). It discovers the attractive routes based on the popularity and the Gini coefficient of POIs, then utilizes a gravity model in a category space to estimate the rating scores and preferences of the attractive routes. Based on that, TRAR recommends a trip with ARs to maximize user experience and leverage the tradeoff between the time cost and the user experience. The experimental results show the superiority of TRAR compared with other state-of-the-art methods.
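
The Gini coefficient mentioned in this abstract is a standard inequality measure, and a minimal sketch of it may help. The function name `gini` and the sample visit counts are hypothetical, and how TRAR combines the coefficient with popularity to discover routes is not stated in the abstract, so that step is not shown.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0.0 when all values are equal and approaches 1.0 as the
    total concentrates in a single item.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data (i = 1..n):
    #   G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    # Hypothetical visit counts for the POIs along two candidate routes.
    print(gini([5, 5, 5, 5]))    # evenly visited
    print(gini([0, 0, 0, 10]))   # visits concentrated on one POI
```

A low value indicates that visits are spread evenly over the POIs in question, while a high value indicates that a few POIs dominate.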

NeurIPS Conference 2020 Conference Paper

Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point

  • Bita Darvish Rouhani
  • Daniel Lo
  • Ritchie Zhao
  • Ming Liu
  • Jeremy Fowers
  • Kalin Ovtcharov
  • Anna Vinogradsky
  • Sarah Massengill

In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware. Through the co-evolution of hardware design and algorithms, MSFP achieves accuracy comparable to or better than industry standards Bfloat16 and INT8 at 3x and 4x lower cost, respectively. MSFP incurs negligible impact to accuracy (<1%), requires no changes to the model topology, and is integrated with a mature cloud production pipeline. MSFP supports various classes of deep learning models including CNNs, RNNs, and Transformers without modification. Finally, we characterize the accuracy and implementation of MSFP and demonstrate its efficacy on a number of production scenarios, including models that power major online scenarios such as web search, question-answering, and image classification.

AAAI Conference 2020 Conference Paper

RobuTrans: A Robust Transformer-Based Text-to-Speech Model

  • Naihan Li
  • Yanqing Liu
  • Yu Wu
  • Shujie Liu
  • Sheng Zhao
  • Ming Liu

Recently, neural network based speech synthesis has achieved outstanding results, by which the synthesized audios are of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue, which results in abnormal audios (bad cases), especially for unusual text (unseen context). To build a neural model which can synthesize both natural and stable audios, in this paper, we make a deep analysis of why the previous neural TTS models are not robust, based on which we propose RobuTrans (Robust Transformer), a robust neural TTS model based on Transformer. Compared to TransformerTTS, our model first converts input texts to linguistic features, including phonemic features and prosodic features, then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. Besides, the position embedding is replaced with a 1-D CNN, since it constrains the maximum length of synthesized audio. With these modifications, our model not only fixes the robustness problem, but also achieves a MOS (4.36) on par with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.

EAAI Journal 2019 Journal Article

Multi-lead model-based ECG signal denoising by guided filter

  • Huaqing Hao
  • Ming Liu
  • Peng Xiong
  • Haiman Du
  • Hong Zhang
  • Feng Lin
  • Zengguang Hou
  • Xiuling Liu

Electrocardiogram (ECG) denoising is of paramount importance for accurate disease diagnosis, but individual differences bring great difficulties for ECG denoising, especially for Dynamic Electrocardiography (DCG). In this paper, a multi-lead model-based ECG signal denoising method is proposed, in which a guided filter is inherently adapted to denoise the ECG signal. For each person, a patient-specific statistical model is constructed by a sparse autoencoder (SAE), which can effectively preserve the detailed signal features. Thus, the guided signal produced by the statistical model can perform well in the guided filter. In particular, even sudden morphological changes are preserved in the denoised ECG signals. The results on the 12-lead Arrhythmia Database and the MIT-BIH Arrhythmia Database demonstrate that the signal-to-noise ratio (SNR) improvement of the proposed method can reach as high as 21.54 dB, and the mean squared error (MSE) is less than 0.0401. Besides achieving minimum signal distortion in comparison with most current denoising algorithms for complex noise environments, the proposed method demonstrates robustness to complex interference, especially in tracing sudden morphological changes of ECG signals. Due to its remarkable superiority in preserving diagnostic and detail features of ECG signals, the proposed method can handle ECG signals with abnormal heart beats, and can thereby improve the accuracy of disease detection.

AAAI Conference 2019 Conference Paper

Neural Speech Synthesis with Transformer Network

  • Naihan Li
  • Shujie Liu
  • Yanqing Liu
  • Sheng Zhao
  • Ming Liu

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies using current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper, we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs at different times are connected directly by a self-attention mechanism, which solves the long-range dependency problem effectively. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio results. Experiments are conducted to test the efficiency and performance of our new network. For efficiency, our Transformer TTS network speeds up training by about 4.25 times compared with Tacotron2. For performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforming Tacotron2 with a gap of 0.048) and is very close to human quality (4.39 vs 4.44 in MOS).
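
The claim that any two inputs at different times are connected directly can be illustrated with a bare scaled dot-product self-attention step. This is only a sketch under simplifying assumptions: it uses a single head, no learned query/key/value projections, and plain Python lists, so it is not the Transformer TTS network itself. The names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x):
    """Single-head self-attention with Q = K = V = x (no learned weights).

    x is a list of T vectors of dimension d. Every output position is a
    weighted average over ALL T input positions, so distant time steps
    interact in one step rather than through a recurrent chain.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        # Similarity of this position's query to every position's key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(frames):
        print(row)
```

Because each output row depends only on the full input, all rows can be computed independently, which is the property behind the parallel construction of hidden states described above.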

AAAI Conference 2019 Conference Paper

Understanding Pictograph with Facial Features: End-to-End Sentence-Level Lip Reading of Chinese

  • Xiaobing Zhang
  • Haigang Gong
  • Xili Dai
  • Fan Yang
  • Nianbo Liu
  • Ming Liu

With the breakthrough of deep learning, lip reading technologies are making extraordinarily rapid progress. It is well-known that Chinese is the most widely spoken language in the world. Unlike alphabetic languages, it involves more than 1,000 pronunciations as Pinyin, and nearly 90,000 pictographic characters as Hanzi, which makes lip reading of Chinese very challenging. In this paper, we implement visual-only Chinese lip reading of unconstrained sentences in a two-step end-to-end architecture (LipCH-Net), in which two deep neural network models are employed to perform the recognition of Picture-to-Pinyin (mouth motion pictures to pronunciations) and the recognition of Pinyin-to-Hanzi (pronunciations to texts) respectively, before a joint optimization to improve the overall performance. In addition, two modules in the Pinyin-to-Hanzi model are pre-trained separately with large auxiliary data in advance of sequence-to-sequence training to make the best of long sequence matches for avoiding ambiguity. We collect 6 months of daily news broadcasts from the China Central Television (CCTV) website, and semi-automatically label them into a 20.95 GB dataset with 20,495 natural Chinese sentences. When trained on the CCTV dataset, the LipCH-Net model outperforms all state-of-the-art lip reading frameworks. According to the results, our scheme not only accelerates training and reduces overfitting, but also overcomes the syntactic ambiguity of Chinese, which provides a baseline for future relevant work.

IJCAI Conference 2018 Conference Paper

Topic-to-Essay Generation with Neural Networks

  • Xiaocheng Feng
  • Ming Liu
  • Jiahao Liu
  • Bing Qin
  • Yibo Sun
  • Ting Liu

We focus on essay generation, which is a challenging task that generates a paragraph-level text with multiple topics. Progress towards understanding different topics and expressing diversity in this task requires more powerful generators and richer training and evaluation resources. To address this, we develop a multi-topic aware long short-term memory (MTA-LSTM) network. In this model, we maintain a novel multi-topic coverage vector, which learns the weight of each topic and is sequentially updated during the decoding process. Afterwards this vector is fed to an attention model to guide the generator. Moreover, we automatically construct two paragraph-level Chinese essay corpora, 305,000 essay paragraphs and 55,000 question-and-answer pairs. Empirical results show that our approach obtains much better BLEU score compared to various baselines. Furthermore, human judgment shows that MTA-LSTM has the ability to generate essays that are not only coherent but also closely related to the input topics.

EAAI Journal 2016 Journal Article

ECG signal enhancement based on improved denoising auto-encoder

  • Peng Xiong
  • Hongrui Wang
  • Ming Liu
  • Suiping Zhou
  • Zengguang Hou
  • Xiuling Liu

The electrocardiogram (ECG) is a primary diagnostic tool for examining cardiac tissue and structures. ECG signals are often contaminated by noise, which can manifest with similar morphologies as an ECG waveform in the frequency domain. In this paper, a novel deep neural network (DNN) is proposed to solve the above mentioned problem. This DNN is created from an improved denoising auto-encoder (DAE) reformed by a wavelet transform (WT) method. A WT with scale-adaptive thresholding method is used to filter most of the noise. A DNN based on improved DAE is then used to remove any residual noise, which is often complex with an unknown distribution in the frequency domain. The proposed method was evaluated on ECG signals from the MIT-BIH Arrhythmia database, and added noise signals were obtained from the MIT-BIH Noise Stress Test database. The results show that the average of output signal-to-noise ratio (SNR) is from 21.56 dB to 22.96 dB, and the average of root mean square error (RMSE) is less than 0.037. The proposed method showed significant improvement in SNR and RMSE compared with the individual processing with either a WT or DAE, thus providing promising approaches for ECG signal enhancement.

YNICL Journal 2016 Journal Article

Repeated acupuncture treatments modulate amygdala resting state functional connectivity of depressive patients

  • Xiaoyun Wang
  • Zengjian Wang
  • Jian Liu
  • Jun Chen
  • Xian Liu
  • Guangning Nie
  • Joon-Seok Byun
  • Yilin Liang

As a widely-applied alternative therapy, acupuncture is gaining popularity in Western society. One challenge that remains, however, is incorporating it into mainstream medicine. One solution is to combine acupuncture with other conventional, mainstream treatments. In this study, we investigated the combination effect of acupuncture and the antidepressant fluoxetine, as well as its underlying mechanism using resting state functional connectivity (rsFC) in patients with major depressive disorders. Forty-six female depressed patients were randomized into a verum acupuncture plus fluoxetine or a sham acupuncture plus fluoxetine group for eight weeks. Resting-state fMRI data was collected before the first and last treatments. Results showed that compared with those in the sham acupuncture treatment, verum acupuncture treatment patients showed 1) greater clinical improvement as indicated by Montgomery-Åsberg Depression Rating Scale (MADRS) and Self-Rating Depression Scale (SDS) scores; 2) increased rsFC between the left amygdala and subgenual anterior cingulate cortex (sgACC)/pregenual anterior cingulate cortex (pgACC); 3) increased rsFC between the right amygdala and left parahippocampus (Para)/putamen (Pu). The strength of the amygdala-sgACC/pgACC rsFC was positively associated with corresponding clinical improvement (as indicated by a negative correlation with MADRS and SDS scores). Our findings demonstrate the additive effect of acupuncture to antidepressant treatment and suggest that this effect may be achieved through the limbic system, especially the amygdala and the ACC.

AAAI Conference 2016 Conference Paper

Write-righter: An Academic Writing Assistant System

  • Yuanchao Liu
  • Xin Wang
  • Ming Liu
  • Xiaolong Wang

Writing academic articles in English is a challenging task for non-native speakers, as more effort has to be spent to enhance their language expressions. This paper presents an academic writing assistant system called Write-righter, which can provide real-time hints and recommendations by analyzing the input context. To achieve this goal, some novel strategies, e.g., semantic extension based sentence retrieval and LDA based sentence structure identification, have been proposed. Write-righter is expected to help people express their ideas correctly by recommending the top N most likely expressions.

IJCAI Conference 2015 Conference Paper

VRCA: A Clustering Algorithm for Massive Amount of Texts

  • Ming Liu
  • Lei Chen
  • Bingquan Liu
  • Xiaolong Wang

Large numbers of texts appear on the web every day, causing the amount of text on the web to explode. Therefore, how to deal with large-scale text collections becomes more and more important. Clustering is a generally accepted solution for text organization. Because it is unsupervised, users can easily dig out the useful information they want. However, traditional clustering algorithms can only deal with small-scale text collections; as a collection grows, their performance degrades. The main reason is the high-dimensional vectors generated from texts. Therefore, to cluster large amounts of text, this paper proposes a novel clustering algorithm, where only the features that can represent a cluster are preserved in the cluster’s vector. In this algorithm, the clustering process is separated into two parts. In one part, each feature’s weight is fine-tuned to make the cluster partition meet an optimization function. In the other part, features are reordered and only the useful features that can represent a cluster are kept in the cluster’s vector. Experimental results demonstrate that our algorithm obtains high performance on both small-scale and large-scale text collections.

TCS Journal 2013 Journal Article

Approximation algorithms for parallel machine scheduling with linear deterioration

  • Ming Liu
  • Feifeng Zheng
  • Shijin Wang
  • Yinfeng Xu

This paper deals with a parallel machine scheduling problem. Unlike the fixed processing time assumption in classical scheduling, a job’s processing time is a simple linear increasing function of its starting time. The aim is makespan minimization, and our focus is on the case with an arbitrary number of parallel machines. We prove that the LIST rule is a $(1 + b_{\max})^{\frac{m-1}{m}}$-approximation, where $m$ is the number of machines and $b_{\max}$ is the maximum deteriorating rate of the jobs. We then propose a heuristic, LDR (Largest Deteriorating Rate first). The heuristic is proved to be a $(1 + b_{\min})^{\frac{m-1}{m}}$-approximation, where $b_{\min}$ is the minimum deteriorating rate. We further show that this ratio is tight when $m = 2$, $3$ and $4$.
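
The two rules named in this abstract can be sketched as follows. Several details are assumptions rather than statements from the abstract: "simple linear" is taken to mean that a job with rate $b_j$ started at time $t$ takes $b_j t$ time units (so it completes at $t(1 + b_j)$), all machines are taken to become available at a common time `t0 > 0`, and the function names are hypothetical.

```python
import heapq


def list_schedule(rates, m, t0=1.0):
    """LIST rule: take jobs in the given order and put each one on the
    machine that becomes free first. Returns the makespan.

    Assumed model: a job with deteriorating rate b started at time t
    finishes at t * (1 + b).
    """
    free_times = [t0] * m
    heapq.heapify(free_times)
    for b in rates:
        start = heapq.heappop(free_times)          # earliest available machine
        heapq.heappush(free_times, start * (1.0 + b))
    return max(free_times)


def ldr_schedule(rates, m, t0=1.0):
    """LDR heuristic: LIST applied to jobs sorted by largest rate first."""
    return list_schedule(sorted(rates, reverse=True), m, t0)


if __name__ == "__main__":
    rates = [1.0, 1.0, 3.0]
    print(list_schedule(rates, m=2))  # makespan in list order
    print(ldr_schedule(rates, m=2))   # makespan with largest rate first
```

With rates `[1, 1, 3]` on two machines, list order gives makespan 8 while largest-rate-first gives 4. Under the model assumed here, 4 is also the best possible for this instance, so the ratio of 2 matches $(1 + b_{\max})^{(m-1)/m} = 4^{1/2}$ if the LIST bound is read that way.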

TCS Journal 2012 Journal Article

New results on single-machine scheduling with past-sequence-dependent delivery times

  • Ming Liu
  • Feifeng Zheng
  • Chengbin Chu
  • Yinfeng Xu

Scheduling with past-sequence-dependent (psd) delivery times is motivated by questions that arise in the electronic manufacturing industry: an electronic component may be exposed to a certain electromagnetic field while waiting for processing and is required to neutralize the effect of electromagnetism. The time spent on the neutralization process has been modeled as psd delivery time in the literature. In this paper, we consider single-machine scheduling problems with psd delivery times. We derive polynomial algorithms for each of the following objective functions: the minimization of the total weighted completion time, the total weighted discounted completion time, the total absolute differences in completion times and the sum of earliness, tardiness and common due date penalty. Finally, for the criterion of minimizing the total weighted tardiness, we propose a polynomial algorithm to optimally solve the problem under a certain condition.

TCS Journal 2012 Journal Article

Optimal algorithms for online single machine scheduling with deteriorating jobs

  • Ming Liu
  • Feifeng Zheng
  • Shijin Wang
  • Jiazhen Huo

In many realistic scenarios of job processing, a job takes longer to process the later it starts. This phenomenon is known as the job’s deterioration effect. This effect is unexplored in the context of the online environment. In this paper we study online single machine scheduling for deteriorating jobs, where jobs arrive over time and become known to any online algorithm on their arrivals. The processing time of each job is a linearly increasing function of its start time. We mainly investigate three online models that minimize makespan, total completion time and maximum delivery time, respectively. For each model we present an online algorithm that is optimal in terms of competitiveness.

TCS Journal 2011 Journal Article

Optimal algorithms for online scheduling on parallel machines to minimize the makespan with a periodic availability constraint

  • Ming Liu
  • Feifeng Zheng
  • Chengbin Chu
  • Yinfeng Xu

In this paper we investigate two online scheduling problems. The first one is online scheduling on $m$ parallel machines with one machine periodically unavailable. The second problem is online scheduling on two uniform parallel machines where one machine is periodically unavailable. The online paradigm is that jobs arrive over list, i.e., when a job presents, we have to irrevocably assign it before the next one is seen. Preemption is not allowed. The objective is to minimize makespan. We suppose that the length of each available period is normalized to 1 and the length of each unavailable period is $\alpha > 0$. For the first problem, we give an optimal algorithm with competitive ratio 2. For the second problem, we assume that the speed of the periodically unavailable machine is normalized to 1, while the speed of the other one is $s > 0$. In the case where $s \ge 1$, we design an algorithm and show that it is optimal with competitive ratio $1 + \frac{1}{s}$. Then we further give some lower bounds on competitive ratio in the case $0 < s < 1$. We also study a special case and prove that the LPT algorithm proposed in Xu et al. (2009) [7] is optimal with competitive ratio $\frac{3}{2}$.

TCS Journal 2009 Journal Article

Online scheduling on m uniform machines to minimize total (weighted) completion time

  • Ming Liu
  • Chengbin Chu
  • Yinfeng Xu
  • Feifeng Zheng

We study two online problems on $m$ uniform machines with speeds $s_1 \le \cdots \le s_m$. The problems are online in the sense that all jobs arrive over time. Each job’s characteristics, such as processing time and weight, become known at its arrival time. For the first problem $Q \mid r_j, online \mid \sum C_j$, we prove that R-LIST algorithm is 4 m − 3 + 3 2 -competitive. For the second problem $Q \mid r_j, online, pmtn \mid \sum w_j C_j$, we show that WSPT-1 algorithm is 2-competitive if $s_i / s_m \ge \sum_{h=1}^{i} s_h / \sum_{h=1}^{m} s_h$ for $i = 1, \ldots, m-1$. Then we study a special case where $s_1 = s_2 = \cdots = s_{m-1} \le s_m$. We obtain that algorithm WSPT-1 is 2-competitive if $s_m (m-2) \le s_1 (m-1)$.

TCS Journal 2009 Journal Article

Online scheduling on two uniform machines to minimize the makespan

  • Ming Liu
  • Yinfeng Xu
  • Chengbin Chu
  • Feifeng Zheng

We consider two problems of online scheduling on two uniform machines: online scheduling under a grade of service (GoS) and online scheduling with reassignment. These problems are online in the sense that when a job presents, we have to irrevocably assign it to one of the machines before the next job is seen. The objective is to minimize the makespan. In the first problem, GoS means that some jobs have to be processed by some machine so that they can be guaranteed a higher quality. Assume that the speed of the higher GoS machine is normalized to 1, while the speed of the other one is $s$. We show that a lower bound of competitive ratio is $1 + \frac{2s}{s+2}$ in the case $0 < s \le 1$ and $1 + \frac{s+1}{s(2s+1)}$ in the case $s > 1$. Then we propose and analyze two online algorithms: HSF algorithm and EX-ONLINE algorithm. HSF is optimal in the case where $s > 1$ and Σ 1 ≥ Σ 2 s, where $\Sigma_1$ and $\Sigma_2$ denote the total processing time of jobs which request higher GoS machine and the total processing time of jobs which request the normal one, respectively. EX-ONLINE is optimal in the case $2(\sqrt{2} - 1) \le s \le 1$. In the second problem, we study two subproblems $P_L$ and $P_A$ proposed in [Z. Tan, S. Yu, Online scheduling with reassignment, Operations Research Letters 36 (2008) 250–254]. Assume that the speeds of 2 uniform machines are 1 and $s \ge 1$, respectively. For $P_L$ where we can reassign the last $k$ jobs of the sequence, we show a lower bound of competitive ratio $1 + \frac{1}{1+s}$. For $P_A$ where we can reassign arbitrary $k$ jobs, we show a lower bound of competitive ratio $\frac{(s+1)^2}{s^2+s+1}$. We propose a $\frac{s+1}{s}$-competitive algorithm HSF-1 for both $P_L$ and $P_A$. For $P_A$, we propose a $\frac{(s+1)^2}{s+2}$-competitive algorithm EX-RA, which is superior to HSF-1 when $1 \le s \le \sqrt{2}$.
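
The closing comparison between EX-RA and HSF-1 can be checked numerically. The sketch below rests on an assumption about how the flattened formulas in this abstract should be read: the HSF-1 ratio as (s+1)/s, the EX-RA ratio as (s+1)^2/(s+2), and the lower bound for the arbitrary-reassignment subproblem as (s+1)^2/(s^2+s+1). The function names are hypothetical.

```python
import math


def hsf1_ratio(s):
    """Assumed competitive ratio of HSF-1: (s + 1) / s."""
    return (s + 1) / s


def exra_ratio(s):
    """Assumed competitive ratio of EX-RA: (s + 1)^2 / (s + 2)."""
    return (s + 1) ** 2 / (s + 2)


def pa_lower_bound(s):
    """Assumed lower bound with arbitrary reassignment: (s + 1)^2 / (s^2 + s + 1)."""
    return (s + 1) ** 2 / (s * s + s + 1)


if __name__ == "__main__":
    for s in (1.0, 1.2, math.sqrt(2), 1.5, 2.0):
        better = "EX-RA" if exra_ratio(s) < hsf1_ratio(s) else "HSF-1"
        print(f"s={s:.3f}  lower={pa_lower_bound(s):.4f}  "
              f"EX-RA={exra_ratio(s):.4f}  HSF-1={hsf1_ratio(s):.4f}  smaller: {better}")
```

Under this reading the two ratios cross where s^2 = 2, so EX-RA has the smaller ratio for speeds from 1 up to the square root of 2, and both ratios stay at or above the lower bound for every s of at least 1.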

TCS Journal 2009 Journal Article

Online scheduling to minimize modified total tardiness with an availability constraint

  • Ming Liu
  • Yinfeng Xu
  • Chengbin Chu
  • Feifeng Zheng

We consider online scheduling problems to minimize modified total tardiness. The problems are online in the sense that jobs arrive over time. For each job $J_j$, its processing time $p_j$, due date $d_j$ and weight $w_j$ become known at its arrival time (or release time) $r_j$. Preemption is not allowed. We first show that there is no finite competitive ratio for problem $1 \mid online, r_j, d_j \mid \sum w_j T_j$. So we focus on problem $1 \mid online, r_j, d_j \mid \sum w_j (T_j + d_j)$ and show that D-SWPT (Delayed Shortest Weighted Processing Time) algorithm is 3-competitive. We further study two problems $1 \mid online, r_j, d_j, h(1), res \mid \sum w_j (T_j + d_j)$ and $1 \mid online, r_j, d_j, h(1), N\text{-}res \mid \sum w_j (T_j + d_j)$, where $res$ and $N\text{-}res$ denote resumable and non-resumable models respectively, and $h(1)$ denotes a non-available time interval $[s, \alpha s]$ with $s > 0$ and $\alpha \ge 1$. We give a lower bound of $1 + \alpha$ for both problems and prove that M-D-SWPT (Modified D-SWPT) is $3\alpha$ and $6\alpha$-competitive in the resumable and non-resumable models, respectively. Moreover, we extend the upper bounds to the scenario of parallel machine scheduling with uniform job weight and an assumption that all machines have the same non-available time interval $[s, \alpha s]$. A lower bound of min { α, 1 + α m } is given as well for the scenario.

IROS Conference 2006 Conference Paper

Identification of Attitude Flight Dynamics for An Unconventional UAV

  • Ming Liu
  • Gregory K. Egan
  • YunJian Ge

The unconventional airframes used by many small fixed-wing unmanned aerial vehicles (UAVs) pose difficulties in determining their dynamic models accurately. This is further complicated by the low-Reynolds-number flight regimes encountered by these aircraft. This paper studies the attitude flight dynamics identification of an unconventional model-scale UAV which has only two independently actuated elevon surfaces (no rudder or elevators). The study presents the first steps towards the development of autopilots for this kind of UAV. Utilizing the real-time flight data collected from a human-controlled test flight, the standard identification approaches were applied to obtain a MIMO linear model with a given configuration deduced from a theoretical study. A simulation-based validation of a simplified model was undertaken, which shows acceptable modelling accuracy. Further flight tests controlled by an autopilot tuned according to the model are currently being undertaken.

ICRA Conference 2000 Conference Paper

A 3-Step Set-Point Control Algorithm for Robot Arms

  • Nghe Huan Quach
  • Ming Liu

For robot arm set-point control, a novel PD based algorithm for the estimation and compensation of gravity force and static friction is proposed. Based on the linear-in-parameter property of the gravity force and the steady state feature of PD set-point control, the algorithm modifies the set-points according to the steady state position errors. Using the steady state equations near the given set-points the unknown static friction force and gravity force can be calculated online. This information can then be used to compensate the effects of gravity force and static friction to eliminate set-point errors.

ICRA Conference 1997 Conference Paper

Decentralized adaptive control for robot arm tracking

  • Ming Liu

We propose a decentralized (or so called independent-joint) adaptive control scheme. The adaptive control is developed based on the passivity and linear-in-parameters properties of the robot motion equation. The proposed scheme, which consists of a PD term, a nonlinear term and an adaptation term, ensures the globally ultimately bounded tracking errors and parameter estimation errors. The practical significance of the scheme lies in the fact that it can be implemented in most robot manipulators without hardware alteration.

ICRA Conference 1995 Conference Paper

Computed Torque Scheme Based Adaptive Tracking for Robot Manipulators

  • Ming Liu

A computed torque scheme based adaptive tracking control approach for robot manipulators, which can be regarded as a modified version of Craig's adaptive control law (1987), is presented in this paper. Utilising a two component structure, this controller consists of a linear PD control component and an adaptive component. The PD control, using some reasonable estimates of system parameters, is used to stabilise the robot system in the sense that all internal signals remain bounded. The adaptive control component is to reduce the tracking errors for the resultant error dynamics. In this scheme, even though the acceleration signals are still needed in parameter estimation, there is no need to compute the inverse of the estimated inertial matrix, which is required in Craig's scheme. The adaptive controller design is based on the standard Lyapunov method and the linear-in-parameter property of robot motion equations. The scheme ensures the global convergence of the tracking errors and will approach the computed torque scheme provided that the parameter estimates converge to their true values.