Arrow Research search

Author name cluster

Ping Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers


AIIM Journal 2025 Journal Article

Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought

  • Zhanzhong Gu
  • Wenjing Jia
  • Massimo Piccardi
  • Ping Yu

Background: Understanding and extracting valuable information from electronic health records (EHRs) is important for improving healthcare delivery and health outcomes. Large language models (LLMs) have demonstrated significant proficiency in natural language understanding and processing, offering promise for automating the typically labor-intensive and time-consuming analytical tasks with EHRs. Despite the active application of LLMs in the healthcare setting, many foundation models lack real-world healthcare relevance. Applying LLMs to EHRs is still in its early stage. To advance this field, in this study, we pioneer a generation-augmented prompting paradigm “GAPrompt” to empower generic LLMs for automated clinical assessment, in particular, quantitative stroke severity assessment, using data extracted from EHRs. Methods: The GAPrompt paradigm comprises five components: (i) prompt-driven selection of LLMs, (ii) generation-augmented construction of a knowledge base, (iii) summary-based generation-augmented retrieval (SGAR), (iv) inferencing with a hierarchical chain-of-thought (HCoT), and (v) ensembling of multiple generations. Results: GAPrompt addresses the limitations of generic LLMs in clinical applications in a progressive manner. It efficiently evaluates the applicability of LLMs in specific tasks through LLM selection prompting, enhances their understanding of task-specific knowledge from the constructed knowledge base, improves the accuracy of knowledge and demonstration retrieval via SGAR, elevates LLM inference precision through HCoT, and enhances generation robustness and reduces LLM hallucinations via ensembling. Experiment results demonstrate the capability of our method to empower LLMs to automatically assess EHRs and generate quantitative clinical assessment results. Conclusion: Our study highlights the applicability of enhancing the capabilities of foundation LLMs in medical domain-specific tasks, i.e., automated quantitative analysis of EHRs, addressing the challenges of labor-intensive and often manually conducted quantitative assessment of stroke in clinical practice and research. This approach offers a practical and accessible GAPrompt paradigm for researchers and industry practitioners seeking to leverage the power of LLMs in domain-specific applications. Its utility extends beyond the medical domain, applicable to a wide range of fields.

ICML Conference 2025 Conference Paper

R.I.P.: Better Models by Survival of the Fittest Prompts

  • Ping Yu
  • Weizhe Yuan
  • Olga Golovneva
  • Tianhao Wu 0002
  • Sainbayar Sukhbaatar
  • Jason Weston
  • Jing Xu 0014

Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high variance and low quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets, or to make high quality synthetic datasets, yielding large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, from 18th place to 6th overall in the leaderboard.
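
The filtering rule described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PreferencePair` record, the function name `rip_filter`, and both thresholds are hypothetical, and the reward values are assumed to come from some external reward model.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One prompt with rewards for its chosen and rejected responses (hypothetical record)."""
    prompt: str
    chosen_reward: float
    rejected_reward: float


def rip_filter(pairs, min_rejected_reward, max_reward_gap):
    """Keep prompts whose rejected response is still good and whose reward gap is small.

    Both thresholds are illustrative assumptions; the abstract does not state values.
    """
    kept = []
    for pair in pairs:
        gap = pair.chosen_reward - pair.rejected_reward
        # A low rejected reward or a large gap is treated as a sign of a low-quality prompt.
        if pair.rejected_reward >= min_rejected_reward and gap <= max_reward_gap:
            kept.append(pair)
    return kept


pairs = [
    PreferencePair("clear prompt", chosen_reward=0.9, rejected_reward=0.8),
    PreferencePair("ambiguous prompt", chosen_reward=0.9, rejected_reward=0.1),
    PreferencePair("noisy prompt", chosen_reward=0.3, rejected_reward=0.2),
]
kept = rip_filter(pairs, min_rejected_reward=0.5, max_reward_gap=0.3)
print([pair.prompt for pair in kept])
```

Only the first prompt passes both checks here; the other two are dropped because their rejected responses score below the assumed quality floor.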

EAAI Journal 2024 Journal Article

A novel multiphase flow water cut modeling framework based on flow behavior-heuristic deep learning

  • Weidong Dang
  • Dongmei Lv
  • Feng Jing
  • Ping Yu
  • Wei Guo
  • Zhongke Gao

Industry 4.0 is of great significance for the development of the oil industry. One of the pivotal steps towards achieving oil Industry 4.0 is accurately mastering the oilfield production dynamics, especially the changes of process parameters. Among many process parameters, the accurate modeling of water cut is extremely critical and difficult. In this paper, under laboratory conditions, oil–water flows with various preset water cuts are simulated. A fluid sensor, equipped with eight concave-shaped conductive electrodes, is employed to capture multi-channel measurement data, continuously recording the oil–water flow process from multiple angles. Subsequently, a novel flow behavior-heuristic deep learning model, named the FBHWC model, is developed to model the relationship between the measurement data and water cut, achieving water cut measurement. The FBHWC model is guided by the complex flow behaviors of oil–water flows and consists of two key modules. In particular, the multi-level feature fusion module focuses on the high-order feature extraction and fusion of sensor measurement data, while the multi-scale measurement module uses a fully convolutional design to achieve accurate measurement of water cut. Experimental results show that the FBHWC model has excellent performance in measuring water cut, with a mean square error of 0.016%. All these open up a new avenue for exploring industrial multiphase flows through combining multi-electrode sensors and deep learning.

AIIM Journal 2024 Journal Article

Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model

  • Zhanzhong Gu
  • Xiangjian He
  • Ping Yu
  • Wenjing Jia
  • Xiguang Yang
  • Gang Peng
  • Penghui Hu
  • Shiyan Chen

Background: Stroke is a prevalent disease with a significant global impact. Effective assessment of stroke severity is vital for an accurate diagnosis, appropriate treatment, and optimal clinical outcomes. The National Institutes of Health Stroke Scale (NIHSS) is a widely used scale for quantitatively assessing stroke severity. However, the current manual scoring of NIHSS is labor-intensive, time-consuming, and sometimes unreliable. Applying artificial intelligence (AI) techniques to automate the quantitative assessment of stroke on vast amounts of electronic health records (EHRs) has attracted much interest. Objective: This study aims to develop an automatic, quantitative stroke severity assessment framework through automating the entire NIHSS scoring process on Chinese clinical EHRs. Methods: Our approach consists of two major parts: Chinese clinical named entity recognition (CNER) with a domain-adaptive pre-trained large language model (LLM) and automated NIHSS scoring. To build a high-performing CNER model, we first construct a stroke-specific, densely annotated dataset “Chinese Stroke Clinical Records” (CSCR) from EHRs provided by our partner hospital, based on a stroke ontology that defines semantically related entities for stroke assessment. We then pre-train a Chinese clinical LLM coined “CliRoberta” through domain-adaptive transfer learning and construct a deep learning-based CNER model that can accurately extract entities directly from Chinese EHRs. Finally, an automated, end-to-end NIHSS scoring pipeline is proposed by mapping the extracted entities to relevant NIHSS items and values, to quantitatively assess the stroke severity. Results: Results obtained on a benchmark dataset CCKS2019 and our newly created CSCR dataset demonstrate the superior performance of our domain-adaptive pre-trained LLM and the CNER model, compared with the existing benchmark LLMs and CNER models. The high F1 score of 0.990 ensures the reliability of our model in accurately extracting the entities for the subsequent automatic NIHSS scoring. Subsequently, our automated, end-to-end NIHSS scoring approach achieved excellent inter-rater agreement (0.823) and intraclass consistency (0.986) with the ground truth and significantly reduced the processing time from minutes to a few seconds. Conclusion: Our proposed automatic and quantitative framework for assessing stroke severity demonstrates exceptional performance and reliability through directly scoring the NIHSS from diagnostic notes in Chinese clinical EHRs. Moreover, this study also contributes a new clinical dataset, a pre-trained clinical LLM, and an effective deep learning-based CNER model. The deployment of these advanced algorithms can improve the accuracy and efficiency of clinical assessment, and help improve the quality, affordability and productivity of healthcare services.

ICLR Conference 2024 Conference Paper

Self-Alignment with Instruction Backtranslation

  • Xian Li 0003
  • Ping Yu
  • Chunting Zhou
  • Timo Schick
  • Omer Levy
  • Luke Zettlemoyer
  • Jason Weston
  • Mike Lewis

We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model. Finetuning LLaMa on two iterations of our approach yields a model that outperforms all other LLaMa-based models on the Alpaca leaderboard not relying on distillation data, demonstrating highly effective self-alignment.

EAAI Journal 2024 Journal Article

Unsupervised Signal Anomaly Transformer method: Achieving bearing life anomaly detection without the need for failure samples

  • Ping Yu
  • Mengmeng Ping
  • Jialin Ma
  • Jie Cao

Bearings are a crucial component, whose performance and lifespan directly affect the safety and operational efficiency of the entire mechanical system. However, in the case of new components or new process bearing equipment that lack complete life-cycle data, supervised learning methods relying on labels may fail. Therefore, a specialized unsupervised anomaly detection method for bearing signals has been proposed. Drawing on the ideas of the anomaly transformer, an unsupervised signal anomaly transformer method is proposed. This method optimizes the anomaly transformer based on signal input and uses a convolutional neural network (CNN) encoder–decoder structure to encode and reconstruct signals. Given the instability of symmetric Kullback–Leibler (KL) divergence in the overlapping area of probability distributions, this method uses the Wasserstein distance to measure the distance between two distributions. Additionally, a new metric is proposed for comparing the decoder-reconstructed signal to the original signal. To verify the effectiveness of this method, an objective evaluation was conducted using simulated signals, achieving an average accuracy of 99.54%. Furthermore, the 2012 IEEE and XJTU-SY real datasets were used for subjective evaluation of anomaly detection signals. Multiple results confirm that this method has strong competitiveness in bearing signal anomaly detection, significantly improving the prediction accuracy and reliability of anomalies in bearings without full life data.
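
The general contrast between symmetric KL divergence and the Wasserstein distance can be illustrated with a small sketch. It assumes two histograms over the same equally spaced bins, each summing to 1; the function names are hypothetical, and the example shows one well-known failure mode of KL (zero-probability bins), not necessarily the specific instability the paper analyzes.

```python
import math


def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on the same equally spaced bins."""
    cdf_gap = 0.0  # running difference between the two cumulative distributions
    total = 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        total += abs(cdf_gap) * bin_width
    return total


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p); infinite when one histogram has mass where the other has none."""
    def kl(a, b):
        result = 0.0
        for a_i, b_i in zip(a, b):
            if a_i == 0.0:
                continue  # 0 * log(0 / b) is taken as 0
            if b_i == 0.0:
                return math.inf
            result += a_i * math.log(a_i / b_i)
        return result
    return kl(p, q) + kl(q, p)


p = [1.0, 0.0, 0.0]
q = [0.0, 0.0, 1.0]
print(wasserstein_1d(p, q))  # finite: mass moves two bins
print(symmetric_kl(p, q))    # infinite: the supports do not overlap
```

The Wasserstein value stays finite and grows with how far the mass has to move, which is why it is often preferred as a training signal when distributions barely overlap.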

NeurIPS Conference 2023 Conference Paper

LIMA: Less Is More for Alignment

  • Chunting Zhou
  • Pengfei Liu
  • Puxin Xu
  • Srinivasan Iyer
  • Jiao Sun
  • Yuning Mao
  • Xuezhe Ma
  • Avia Efrat

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.

TIST Journal 2023 Journal Article

Obfuscating the Dataset: Impacts and Applications

  • Guangsheng Yu
  • Xu Wang
  • Caijun Sun
  • Ping Yu
  • Wei Ni
  • Ren Ping Liu

Obfuscating a dataset by adding random noise to protect the privacy of sensitive samples in the training dataset is crucial to prevent data leakage to untrusted parties when dataset sharing is essential. We conduct comprehensive experiments to investigate how dataset obfuscation can affect the resultant model weights (in terms of the model accuracy, ℓ2-distance-based model distance, and level of data privacy) and discuss the potential applications with the proposed Privacy, Utility, and Distinguishability (PUD)-triangle diagram to visualize the requirement preferences. Our experiments are based on the popular MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID settings. Significant results include a tradeoff between the model accuracy and privacy level and a tradeoff between the model difference and privacy level. The results indicate broad application prospects for training outsourcing and guarding against attacks in federated learning, both of which have been increasingly attractive in many areas, particularly learning in edge computing.
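
The two quantities this abstract relies on, noise-based obfuscation and an ℓ2 distance between model weights, can be sketched as below. This is only an illustration: Gaussian noise, the `sigma` parameter, and the function names are assumptions, since the abstract says only that random noise is added.

```python
import math
import random


def obfuscate(samples, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma to every feature (assumed noise model)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[x + rng.gauss(0.0, sigma) for x in sample] for sample in samples]


def l2_distance(weights_a, weights_b):
    """Euclidean distance between two flattened weight vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(weights_a, weights_b)))


clean = [[0.0, 1.0], [1.0, 0.0]]
noisy = obfuscate(clean, sigma=0.1)
print(noisy)
print(l2_distance([0.0, 0.0], [3.0, 4.0]))
```

A larger `sigma` hides individual samples more strongly, and the tradeoffs reported in the abstract concern how accuracy and the weight distance change as that noise level grows.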

ICLR Conference 2020 Conference Paper

Bayesian Meta Sampling for Fast Uncertainty Adaptation

  • Zhenyi Wang 0001
  • Yang Zhao
  • Ping Yu
  • Ruiyi Zhang 0002
  • Changyou Chen

Meta learning has been making impressive progress for fast model adaptation. However, limited work has been done on learning fast uncertainty adaption for Bayesian modeling. In this paper, we propose to achieve the goal by placing meta learning on the space of probability measures, inducing the concept of meta sampling for fast uncertainty adaption. Specifically, we propose a Bayesian meta sampling framework consisting of two main components: a meta sampler and a sample adapter. The meta sampler is constructed by adopting a neural-inverse-autoregressive-flow (NIAF) structure, a variant of the recently proposed neural autoregressive flows, to efficiently generate meta samples to be adapted. The sample adapter moves meta samples to task-specific samples, based on a newly proposed and general Bayesian sampling technique, called optimal-transport Bayesian sampling. The combination of the two components allows a simple learning procedure for the meta sampler to be developed, which can be efficiently optimized via standard back-propagation. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed framework, obtaining better sample quality and faster uncertainty adaption compared to related methods.

ICML Conference 2020 Conference Paper

Feature Quantization Improves GAN Training

  • Yang Zhao
  • Chunyuan Li
  • Ping Yu
  • Jianfeng Gao 0001
  • Changyou Chen

The instability in GANs’ training has been a long-standing problem despite remarkable research efforts. We identify that instability issues stem from difficulties of performing feature matching with mini-batch statistics, due to a fragile balance between the fixed target distribution and the progressively generated distribution. In this work, we propose feature quantization (FQ) for the discriminator, to embed both true and fake data samples into a shared discrete space. The quantized values of FQ are constructed as an evolving dictionary, which is consistent with feature statistics of the recent distribution history. Hence, FQ implicitly enables robust feature matching in a compact space. Our method can be easily plugged into existing GAN models, with little computational overhead in training. Extensive experimental results show that the proposed FQ-GAN can improve the FID scores of baseline methods by a large margin on a variety of tasks, including three representative GAN models on 10 benchmarks, achieving new state-of-the-art performance.
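
The idea of quantizing features against an evolving dictionary can be sketched in a few lines. This is a simplified illustration, not the FQ-GAN implementation: the per-sample moving-average update, the `momentum` value, and the function names are assumptions.

```python
def nearest_code(feature, dictionary):
    """Return the index of the dictionary entry closest to the feature in squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(dictionary):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_and_update(features, dictionary, momentum=0.9):
    """Replace each feature by its nearest code, then nudge that code toward the feature.

    The moving-average update is an assumed way of keeping the dictionary
    consistent with recently seen features; the dictionary is modified in place.
    """
    quantized = []
    for feature in features:
        index = nearest_code(feature, dictionary)
        quantized.append(list(dictionary[index]))
        dictionary[index] = [
            momentum * c + (1.0 - momentum) * f
            for c, f in zip(dictionary[index], feature)
        ]
    return quantized


dictionary = [[0.0, 0.0], [1.0, 1.0]]
quantized = quantize_and_update([[0.9, 1.2]], dictionary, momentum=0.5)
print(quantized)   # the feature is snapped to the second code
print(dictionary)  # the second code has moved halfway toward the feature
```

Because real and generated features are mapped onto the same small set of codes, comparisons between them happen in a compact discrete space rather than on raw mini-batch statistics.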

AAAI Conference 2020 Conference Paper

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

  • Zhenyi Wang
  • Ping Yu
  • Yang Zhao
  • Ruiyi Zhang
  • Yufan Zhou
  • Junsong Yuan
  • Changyou Chen

Human-motion generation is a long-standing challenging task due to the requirement of accurately modeling complex and diverse dynamic patterns. Most existing methods adopt sequence models such as RNN to directly model transitions in the original action space. Due to high dimensionality and potential noise, such modeling of action transitions is particularly challenging. In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality. Conditioned on a latent sequence, actions are generated by a frame-wise decoder shared by all latent action-poses. Specifically, an implicit RNN is defined to model smooth latent sequences, whose randomness (diversity) is controlled by noise from the input. Different from standard action-prediction methods, our model can generate action sequences from pure noise without any conditional action poses. Remarkably, it can also generate unseen actions from mixed classes during training. Our model is learned with a bi-directional generative-adversarial-net framework, which can not only generate diverse action sequences of a particular class or mix classes, but also learns to classify action sequences within the same model. Experimental results show the superiority of our method in both diverse action-sequence generation and classification, relative to existing methods.

JMLR Journal 2019 Journal Article

Robust Estimation of Derivatives Using Locally Weighted Least Absolute Deviation Regression

  • Wenwu Wang
  • Ping Yu
  • Lu Lin
  • Tiejun Tong

In nonparametric regression, the derivative estimation has attracted much attention in recent years due to its wide applications. In this paper, we propose a new method for the derivative estimation using the locally weighted least absolute deviation regression. Different from the local polynomial regression, the proposed method does not require a finite variance for the error term and so is robust to the presence of heavy-tailed errors. Meanwhile, it does not require a zero median or a positive density at zero for the error term in comparison with the local median regression. We further show that the proposed estimator with random difference is asymptotically equivalent to the (infinitely) composite quantile regression estimator. In other words, running one regression is equivalent to combining infinitely many quantile regressions. In addition, the proposed method is also extended to estimate the derivatives at the boundaries and to estimate higher-order derivatives. For the equidistant design, we derive theoretical results for the proposed estimators, including the asymptotic bias and variance, consistency, and asymptotic normality. Finally, we conduct simulation studies to demonstrate that the proposed method has better performance than the existing methods in the presence of outliers and heavy-tailed errors, and analyze the Chinese house price data for the past ten years to illustrate the usefulness of the proposed method.