Arrow Research search

Author name cluster

Ying Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

EAAI Journal 2026 Journal Article

A collaborative approach based on large language model and knowledge graphs for information integration towards smart manufacturing

  • Ruihao Li
  • Chong Chen
  • Ying Liu
  • Tao Wang
  • Haidong Shao
  • Lianglun Cheng

In the era of smart manufacturing, integrating vast amounts of information has become an essential task. Knowledge Graph (KG) is a key technology for improving information integration, which can greatly improve the performance of question-answering for Large Language Models (LLMs). However, the existing approach mainly adopts KG as the plug-in database for Retrieval-Augmented Generation (RAG), which cannot achieve accurate answering due to the imperfections of KG. In order to address this challenge, a collaborative LLM-KG framework is proposed to iteratively update the KG, which can provide fine-grained knowledge for RAG. The methodology firstly constructs a foundational ontology, and adopts LLM for knowledge triples extraction to establish an initial KG based on multi-source data. Then, competency questions (CQs) are designed for the evaluation and optimization of the initial KG. After ontology optimization, a fine-grained KG is obtained to facilitate a robust question-answering mechanism through RAG. The proposed iterative approach can effectively refine the system's decision-support capabilities. An experimental study based on the real-world shipbuilding process data is implemented. The experimental results demonstrate that the answering accuracy can be improved from 86.18% to 93.09% with the enhancement of the proposed approach.

YNIMG Journal 2026 Journal Article

Enhanced visual and auditory inhibitory control in musicians: EEG evidence

  • Ying Liu
  • Jiarui Ma
  • Jing Ning
  • Jiejia Chen

Musicians' perceptual advantages are often concentrated in the auditory rather than the visual modality. However, whether the relationship between musical training and the core cognitive ability of inhibitory control exhibits auditory modality specificity remains unclear. To address this gap, the present study employed matched visual and auditory Go/No-go tasks combined with electroencephalography (EEG) to compare differences in inhibitory control between university students with long-term musical training (musical training group) and their untrained peers (control group). The results showed that in both visual and auditory inhibitory control tasks, even after controlling for eight potential confounding variables including age, socioeconomic status, IQ, and the Big Five personality traits, the musical training group not only demonstrated behavioral advantages (higher d' scores) but also exhibited enhanced neural activity during conflict monitoring (smaller N2 amplitudes and increased theta power) and motor inhibition (larger P3 amplitudes). These findings suggest a modality-general effect in the relationship between musical training and enhanced inhibitory control. Meanwhile, the musically trained group showed specific advantages only in the early processing stage of auditory stimuli, reflecting the potential strengthening effect of the auditory system brought about by musical experience. This study is the first to reveal, from a dual-modality perspective, the relationship between musical training and cross-modal inhibitory control. It contributes to our understanding of the cognitive mechanisms associated with musical training and provides empirical evidence for its potential applications in music therapy and education.
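
For readers unfamiliar with the d' score mentioned in this abstract, the following minimal sketch computes the standard signal-detection sensitivity index, d' = z(hit rate) − z(false-alarm rate). The function name and the example rates are illustrative assumptions; the abstract does not say how the study handled extreme rates of 0 or 1, so no correction is applied here.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; any correction for
    extreme rates is left out because the source does not describe one.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Made-up rates for a Go/No-go task: 90% hits on Go trials,
# 20% false alarms on No-go trials.
print(round(d_prime(0.90, 0.20), 3))
```

A higher d' means Go and No-go trials are discriminated better, independent of response bias.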

YNIMG Journal 2026 Journal Article

Gestational age-specific DTI templates of the neonatal brain: Application in preterm developmental study

  • Xiaochen Jiang
  • Mengyi Wang
  • Ying Liu
  • Tianhao Zhang
  • Guangjuan Mao
  • Qi Zhou
  • Shilun Zhao
  • Baoci Shan

Due to significant differences in brain volume, morphology, and white matter integrity among neonates of varying gestational ages, using a single full-term template for preterm analysis inevitably introduces analytical errors. To address this, we aimed to develop gestational-age-specific stereotaxic DTI templates using retrospective diffusion MRI scans from 161 neonates acquired between August 2021 and January 2024. The cohort was stratified into four WHO-defined subgroups: extremely preterm (n = 31), very preterm (n = 29), moderate to late preterm (n = 28), and full-term (n = 73). Templates were constructed via iterative registration, with corresponding atlases transformed from JHU space and manually corrected. Quantitative evaluation using the Jacobian determinant and standard deviation revealed that our age-specific templates demonstrated significantly lower deformation magnitude and registration error compared to a standard full-term template. When applied to investigate developmental differences, we observed progressively more extensive fractional anisotropy reductions from moderate-to-late to extremely preterm neonates. Notably, commissural fibers, particularly the corpus callosum body (0.194 ± 0.005 in extremely preterm vs. 0.230 ± 0.003 in full-term, p < 0.001), exhibited significant developmental gradients. Consequently, these constructed gestational-age-specific DTI templates offer a robust tool to improve the accuracy of morbidity risk predictions and facilitate multicenter studies of preterm neonates.

YNIMG Journal 2026 Journal Article

Hierarchical neurobehavioral model reveals that shared flexibility, not individual stability, supports rhythmic coordination

  • Ruoyu Niu
  • Yanan Li
  • Lei Liu
  • Yafeng Pan
  • Ying Liu

Interpersonal coordination requires balancing individual control with interaction-derived synergy, yet it remains unclear when neural coupling contributes beyond behavior. Using an fNIRS hyperscanning paradigm, we examined dyadic rhythmic coordination and jointly modeled behavioral stability, dispositional structure, and interbrain synchrony within a hierarchical neurobehavioral framework. Across models, mean individual stability was negatively associated with dyadic performance, whereas interaction-derived shared flexibility (i.e., dyad-level behavioral stability synergy) was the most robust positive predictor. Incorporating dispositional structure showed that larger within-dyad differences in figure-embedding performance impaired coordination, whereas higher dyad-level self-esteem facilitated coordination. The neural coupling index showed no reliable main effect after accounting for behavioral and trait factors, but moderation analyses indicated a conditional contribution: interbrain synchrony compensated when shared flexibility was low, with diminishing benefit as synergy increased. Together, these findings support a hierarchical neurobehavioral architecture in which shared flexibility provides the primary foundation of coordination, dispositional structure shapes the conditions for synergy, and interbrain synchrony contributes in a context-dependent manner.

AAAI Conference 2026 Conference Paper

PointSLAM++: Robust Dense Neural Gaussian Point Cloud-based SLAM

  • Xu Wang
  • Boyao Han
  • Xiaojun Chen
  • Ying Liu
  • Ruihui Li

Real-time 3D reconstruction is crucial for robotics and augmented reality, yet current simultaneous localization and mapping (SLAM) approaches often struggle to maintain structural consistency and robust pose estimation in the presence of depth noise. This work introduces PointSLAM++, a novel RGB-D SLAM system that leverages a hierarchically constrained neural Gaussian representation to preserve structural relationships while generating Gaussian primitives for scene mapping. It also employs progressive pose optimization to mitigate depth sensor noise, significantly enhancing localization accuracy. Furthermore, it utilizes a dynamic neural representation graph that adjusts the distribution of Gaussian nodes based on local geometric complexity, enabling the map to adapt to intricate scene details in real time. This combination yields high-precision 3D mapping and photorealistic scene rendering. Experimental results show PointSLAM++ outperforms existing 3DGS-based SLAM methods in reconstruction accuracy and rendering quality, demonstrating its advantages for large-scale AR and robotics.

AAAI Conference 2026 Conference Paper

Sparse-Scale Transformer with Bidirectional Awareness for Time Series Forecasting

  • Ying Liu
  • Bo Liu
  • Sheng Huang
  • Gang Luo
  • Wenbo Hu
  • Meng Wang
  • Richang Hong

Time series forecasting (TSF) plays a crucial role in many real-world applications, such as weather prediction and economic planning. While Transformer-based models have shown strong capabilities in modeling long-range dependencies, effectively capturing the multi-scale temporal dynamics inherent in time series remains a major challenge. Existing methods often adopt time-windows of varying sizes, which may introduce noisy or irrelevant representations when mismatched with the underlying temporal patterns, potentially leading to overfitting. In this paper, we propose Sparse-Scale Transformer (SSformer) with Bidirectional Awareness for Time Series Forecasting to enhance the multi-scale modeling for time series. Specifically, we propose a novel Sparse-Scale Convolution (SSC) block that imposes sparsity on scales to obtain the informative representations by evaluating the intra-scale segment similarity of time series, and utilizes scale-specific convolutions to extract local patterns. Furthermore, we design a Bidirectional-Scale Interaction (BSI) block to explicitly model scale correlations in both coarse-to-fine and fine-to-coarse directions. Finally, scale predictions are ensembled to fully exploit the complementary forecasting capabilities across scales. Extensive experiments on various real-world datasets demonstrate that SSformer achieves state-of-the-art performance with superior efficiency.

ICRA Conference 2025 Conference Paper

Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark

  • Ying Liu
  • Yijing Hua
  • Haojiang Chai
  • Yanbo Wang
  • TengQi Ye

Open-vocabulary detectors are proposed to locate and recognize objects in novel classes. However, variations in vision-aware language vocabulary data used for open-vocabulary learning can lead to unfair and unreliable evaluations. Recent evaluation methods have attempted to address this issue by incorporating object properties or adding locations and characteristics to the captions. Nevertheless, since these properties and locations depend on the specific details of the images instead of classes, detectors cannot make accurate predictions without precise descriptions provided through human annotation. This paper introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to the open-vocabulary setting. Our task is intuitive and challenging, requiring a deep understanding of Fine-grained captions and careful attention to Fine-grained details in images in order to accurately detect Fine-grained objects. Additionally, due to the scarcity of qualified fine-grained object detection datasets, we have created a new dataset, NEU-171K, tailored for both supervised and open-vocabulary settings. We benchmark state-of-the-art object detectors on our dataset for both settings. Furthermore, we propose a simple yet effective post-processing technique. Our data, annotations and code are available at https://github.com/tengerye/3FOVD.

JBHI Journal 2025 Journal Article

Image-enhanced Multi-Modal Contrastive Transformer for subcellular spatial transcriptomics

  • Wanwan Shi
  • Ying Liu
  • Qiu Xiao
  • Yuting Bai
  • Xiao Liang
  • Xinling Zeng
  • Chee Keong Kwoh
  • Jiawei Luo

Recent advances in spatial molecular imaging technologies have enabled gene expression profiling alongside high-resolution imaging, providing unprecedented opportunities to resolve molecular heterogeneity at subcellular resolution. However, these technologies fail to fully capture cellular characteristics due to the limited number of genes they can detect, which hinders downstream analysis. Because spatial imaging data provide high-resolution and fine-grained morphology information, developing computational methods that effectively integrate image features with transcriptomic profiles is crucial for enabling comprehensive subcellular data analysis. In this study, we present SIMMT, an image-enhanced multi-modal contrastive transformer framework for identifying spatial domains and enhancing subcellular data. In the framework, we design a dual transformer architecture to learn multi-modal representations for cells by modeling transcriptomics and morphological images respectively. To fully capture modality interactions within spatial contexts, we introduce a contrastive learning module that enhances cell representation by aligning tissue morphology and gene expression at the cell level. We tested SIMMT on subcellular spatial transcriptomics datasets from human lung cancer tissue, mouse brain tissue, human colorectal cancer tissue, and human ovarian cancer tissue. The results demonstrated that SIMMT consistently outperformed state-of-the-art methods in spatial clustering and gene expression pattern analysis. Our method also effectively demonstrated its ability to identify tumor spatial heterogeneity and uncover potential gene biomarkers in the human bronchiolar adenoma (BA) dataset. The code and dataset of SIMMT can be downloaded from https://github.com/LWanzi/SIMMT

AAAI Conference 2025 Conference Paper

Iterative Self-Training with Class-Aware Text-to-Image Synthesis for Visual Task Learning

  • Xiang Zhang
  • Wanqing Zhao
  • Pengyang Li
  • Ying Liu
  • Hangzai Luo
  • Sheng Zhong
  • Jinye Peng
  • Jianping Fan

Generative models are widely used to produce synthetic images with annotations, alleviating the burden of image collection and annotation for training deep visual models. However, challenges such as limited image diversity, noisy pseudo labels, and domain gaps between synthetic and real images often undermine their effectiveness in downstream visual tasks. This paper introduces the Iterative Self-Training with Class-Aware Text-to-Image Synthesis (IST-CATS) framework, which addresses these challenges by integrating a class-aware text-to-image synthesis (CATS) component with an iterative self-training (IST) strategy. CATS innovatively introduces a class-aware chain approach to generate detailed descriptions. These descriptions act as prompts for a diffusion model, enabling the creation of a diverse set of images accompanied by distinguishable objects against the background. The generated images can be easily pseudo-labeled by an unsupervised instance segmentation method, and then noisy pseudo labels can be effectively purified by a novel feature similarity-based filtering mechanism. The generated images underpin our IST, which progressively enhances vision models and refines pseudo labels through self-training and our proposed label filtering strategy (LabFilt). LabFilt meticulously improves the quality of pseudo labels by employing class-adaptive techniques at both the pixel and object levels, ensuring refined pseudo-label accuracy. IST-CATS demonstrates superior performance in object detection and semantic segmentation compared to traditional synthetic and semi/weakly-supervised methods, effectively addressing data collection and annotation challenges.

AAAI Conference 2025 Conference Paper

Semi-Implicit Neural Ordinary Differential Equations

  • Hong Zhang
  • Ying Liu
  • Romit Maulik

Classical neural ODEs trained with explicit methods are intrinsically limited by stability, crippling their efficiency and robustness for stiff learning problems that are common in graph learning and scientific machine learning. We present a semi-implicit neural ODE approach that exploits the partitionable structure of the underlying dynamics. Our technique leads to an implicit neural network with significant computational advantages over existing approaches because of enhanced stability and efficient linear solves during time integration. We show that our approach outperforms existing approaches on a variety of applications including graph classification and learning complex dynamical systems. We also demonstrate that our approach can train challenging neural ODEs where both explicit methods and fully implicit methods are intractable.
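
The stability argument in this abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it applies a textbook first-order semi-implicit (IMEX) Euler step to an assumed scalar test equation u' = −λu + cos(t), treating the stiff linear part implicitly and the forcing term explicitly. All names and parameter values are illustrative assumptions.

```python
import math

def explicit_euler(lam, h, steps, u0=1.0):
    """Explicit Euler for u' = -lam*u + cos(t); unstable when h*lam > 2."""
    u, t = u0, 0.0
    for _ in range(steps):
        u = u + h * (-lam * u + math.cos(t))
        t += h
    return u

def semi_implicit_euler(lam, h, steps, u0=1.0):
    """IMEX Euler: the stiff term -lam*u is implicit, cos(t) is explicit."""
    u, t = u0, 0.0
    for _ in range(steps):
        # Solve (1 + h*lam) * u_next = u + h*cos(t); here a scalar "linear solve".
        u = (u + h * math.cos(t)) / (1.0 + h * lam)
        t += h
    return u

# Assumed stiff setting: h*lam = 10, far beyond the explicit stability limit.
lam, h, steps = 1000.0, 0.01, 200
print(abs(explicit_euler(lam, h, steps)))       # blows up
print(abs(semi_implicit_euler(lam, h, steps)))  # stays small and bounded
```

With the same step size, the explicit iterate is multiplied by roughly −9 at every step, while the semi-implicit iterate is damped by a factor of 1/11. This is the kind of stability gain that a partitioned scheme can exploit.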

YNIMG Journal 2025 Journal Article

Uncovering the neural basis of risk preferences in cooperative Dyads: A fNIRS study

  • Qianlan Yin
  • Jing Wen
  • Shuo Chen
  • Tianya Hou
  • Ying Liu
  • Danni Yang
  • Guorui Liu
  • Peiqi Shi

BACKGROUND: Individuals' risk preferences have been shown to influence their decision-making in various contexts. However, the neural mechanisms underlying the relationship between risk preference and decision-making in a social setting remain unclear. This study utilized functional near-infrared spectroscopy (fNIRS) to investigate the neural correlates of dyadic decision-making under risk and the modulating effect of individual risk preference. METHOD: This study examined the impact of risk preference on group decision-making using a two-phase experimental design. Based on G-power software calculations, 168 right-handed participants (62 males, 106 females, mean age 21.26±1.70) were recruited. Participants first completed a single-player Sequential Risk Task to measure risk preference, followed by group classification into three groups: Risky&Risky, Risky&Safe, and Safe&Safe. Task performance and decision-making behavior were recorded. Functional Near-Infrared Spectroscopy (fNIRS) was employed to measure cortical activation in the prefrontal cortex, focusing on inter-brain synchrony and coupling directionality using wavelet coherence and Granger causality (GC) analyses. Data were preprocessed to remove noise, and statistical analyses included repeated measures ANOVAs, Support Vector Regression and multiple regression analyses. RESULTS: = 0.173 and 0.191). CONCLUSION: This study employed fNIRS hyperscanning to investigate how individual differences in risk preference impact decision-making in dyadic contexts. The results indicated that variations in connectivity and information transfer between the orbitofrontal and medial prefrontal cortices underlie the distinct risk-taking behaviors exhibited by dyadic pairs. These findings underscore the pivotal role of affective and cognitive control mechanisms and individual risk personality traits in cooperative decision-making under conditions of uncertainty.

JBHI Journal 2024 Journal Article

An Explainable and Personalized Cognitive Reasoning Model Based on Knowledge Graph: Toward Decision Making for General Practice

  • Qianghua Liu
  • Yu Tian
  • Tianshu Zhou
  • Kewei Lyu
  • Zhixiao Wang
  • Yixiao Zheng
  • Ying Liu
  • Jingjing Ren

General practice plays a prominent role in primary health care (PHC). However, evidence has shown that the quality of PHC is still unsatisfactory, and the accuracy of clinical diagnosis and treatment must be improved in China. Decision making tools based on artificial intelligence can help general practitioners diagnose diseases, but most existing research is not sufficiently scalable and explainable. An explainable and personalized cognitive reasoning model based on knowledge graph (CRKG) proposed in this article can provide personalized diagnosis, perform decision making in general practice, and simulate the mode of thinking of human beings utilizing patients’ electronic health records (EHRs) and knowledge graph. Taking abdominal diseases as the application point, an abdominal disease knowledge graph is first constructed in a semiautomated manner. Then, the CRKG designed referring to dual process theory in cognitive science involves the update strategy of global graph representations and reasoning on a personal cognitive graph by adopting the idea of graph neural networks and attention mechanisms. For the diagnosis of diseases in general practice, the CRKG outperforms all the baselines with a precision@1 of 0.7873, recall@10 of 0.9020 and hits@10 of 0.9340. Additionally, the visualization of the reasoning process for each visit of a patient based on the knowledge graph enhances clinicians' comprehension and contributes to explainability. This study is of great importance for the exploration and application of decision making based on EHRs and knowledge graph.

YNICL Journal 2024 Journal Article

Quantitative comparison of CSVD imaging markers between patients with possible amyloid small vessel disease and with non-amyloid small vessel disease

  • Chun-Qiang Lu
  • Ying Liu
  • Jia-Rong Huang
  • Meng-Shuang Li
  • Yan-Shuang Wang
  • Yan Gu
  • Di Chang

The spatial distribution patterns of cerebral microbleeds are associated with different types of cerebral small vessel disease (CSVD). This study aims to examine the disparities in brain imaging markers of CSVD among patients diagnosed with possible amyloid and non-amyloid small vessel disease. The head MR scans including susceptibility-weighted imaging (SWI) sequences from 351 patients at our institute were collected for analysis. CSVD imaging markers were quantified or graded across various CSVD dimensions in the patient images. Patients were categorized into the cerebral amyloid angiopathy group (CAA), hypertensive arteriopathy group (HA), or mixed small vessel disease group (Mixed), based on the spatial distribution of microbleeds. White matter lesions (WML) were segmented using an artificial neural network and assessed via a voxel-wise approach. Significant differences were observed among the three groups in several indices: microbleed count, lacune count at the centrum semiovale and basal ganglia levels, grade of enlarged perivascular space (EPVS) at the basal ganglia, and white matter lesion volume. These indices were substantially higher in the Mixed group compared to the other groups. Additionally, the incidences of cerebral hemorrhages (χ2 = 7.659, P = 0.006) and recent small subcortical infarcts (χ2 = 4.660, P = 0.031) were significantly more frequent in the HA group than in the CAA group. These results indicate that mixed spatial distribution patterns of microbleeds demonstrated the highest burden of cerebral small vessel disease. Microbleeds located in the deep brain regions were associated with a higher incidence of recent small subcortical infarcts and cerebral hemorrhages compared to those in the cortical areas.

YNIMG Journal 2022 Journal Article

No smoking signs with strong smoking symbols induce weak cravings: an fMRI and EEG study

  • Wanwan Lü
  • Qichao Wu
  • Ying Liu
  • Ying Wang
  • Zhengde Wei
  • Yu Li
  • Chuan Fan
  • An-Li Wang

No smoking signs (NSSs) that combine smoking symbols (SSs) and prohibition symbols (PSs) represent common examples of reward and prohibition competition. To evaluate how SSs within NSSs influence their effectiveness in guiding reward vs. prohibition, we studied 93 male smokers. We collected self-reported craving ratings (N=30), cue reactivity under fMRI/EEG (N=33), and smoking-behavior anticipation for paired NSSs and SSs (N=30). We found that NSS-induced cravings were negatively correlated with SS-induced cravings and PS-induced inhibition. fMRI indicated that both correlations were mediated by activation of the inferior frontal gyrus and precuneus, suggesting that the effects of SSs and PSs interact with each other. EEG revealed that the prohibition response occurs after the cigarette response, indicating that the cigarette response might be precluded by the prohibition, supporting the effect of SSs in discouraging smoking. Moreover, stronger SSs induced stronger slow positive waves and late positive potentials, and the stronger the late positive potentials, the stronger the late positive potentials. Both the amplitudes of late positive potentials and slow positive waves were positively correlated with the amplitude of N2, which was positively correlated with the attention grabbed score by the NSS. In addition, the weaker the NSS-induced craving, the greater the smoking behavior anticipation reduction, indicating the capability of NSSs to decrease smoking behavior. Our study provides empirical evidence for selecting the most effective NSSs: those combining strong SS and PS, offering insights about competition between cigarette reward and prohibition and providing neural evidence on how cigarette reward and prohibition interact.

ICRA Conference 2021 Conference Paper

Elevation control of a soft jumping robot

  • Huimin Chen
  • Jiaming Liang
  • Zicong Miao
  • Guo Zhou
  • Ying Liu
  • Min Zhang 0031

Jumping with controllable elevation is significant for insect-scale robots to improve terrain adaptability and to escape from risks. However, jumping robots based on soft materials with low stiffness cannot transmit displacement precisely, exhibiting poor control of jumping. Here, we propose a modified two-bars catapult mechanism combined with an asynchronous sequential releasing strategy to realize elevation controllable jumping. In this work, an 80 mg prototype robot, 56 mm (long) × 29 mm (wide) × 3 mm (high) in size, is designed with the controllable elevation range from 63° to 112°. The soft robot is mainly composed of a shape memory alloy actuator and four electrostatic pads acting as the lock/release structures. Elevation control is realized by asynchronously releasing the electrostatic pads in a small time interval (about 10 ms). A maximum jump height of 62 mm and a maximum half-distance of 41 mm are also achieved.

JBHI Journal 2019 Journal Article

Inferring MicroRNA Targets Based on Restricted Boltzmann Machines

  • Ying Liu
  • Jiawei Luo
  • Pingjian Ding

Predicting the miRNA-target interactions (MTIs) is a critical task for elucidating mechanistic roles of miRNAs in pathophysiology. However, most existing techniques have a high false positive rate because the precise miRNA target mechanisms are poorly known. Considering that ensemble methods can take advantage of the complementary knowledge in different methods, we propose an alternative optimization framework, Inferring MiRNA Targets based on Restricted Boltzmann Machines (IMTRBM), to enhance the accuracy of previous prediction results. First, the proposed method directly constructs a weighted MTI network through the results predicted by individual methods, and each miRNA-target pair is weighted based on how frequently it appears in these results. Second, we transform the miRNA-target prediction problem into a complete bipartite graph model, named restricted Boltzmann machine, and utilize a practical learning procedure to train our model and make predictions. Our results show that the algorithm outperforms individual miRNA-target prediction approaches in the number of validated miRNA targets at cutoffs of the top list. Moreover, our framework can tolerate the decrease and increase of predicted MTIs and even discover new miRNA targets, which have been a challenge to predict for any individual method. Finally, for the miRNAs that do not appear in IMTRBM, we design a new method to supplement IMTRBM based on the intuition that similar miRNAs have similar functions, which also achieves a comparable result. The source code of IMTRBM is available at https://github.com/liuying201705/IMTRBM.
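
The first step described in this abstract, weighting each miRNA-target pair by how often the individual predictors report it, can be sketched as follows. This is only an illustration under stated assumptions: the method names, miRNA labels, and gene labels are placeholders, and normalizing the count by the number of methods is an assumed choice that the abstract does not specify. The restricted Boltzmann machine step is not shown.

```python
from collections import Counter

def weighted_mti_network(predictions_by_method):
    """Weight each (miRNA, target) pair by the fraction of methods predicting it.

    `predictions_by_method` maps a method name to an iterable of
    (miRNA, target) pairs. Duplicates within one method count once.
    """
    counts = Counter()
    for pairs in predictions_by_method.values():
        counts.update(set(pairs))
    n_methods = len(predictions_by_method)
    return {pair: count / n_methods for pair, count in counts.items()}

# Placeholder predictions from three hypothetical tools.
predictions = {
    "method_A": [("miR-x", "GENE1"), ("miR-x", "GENE2")],
    "method_B": [("miR-x", "GENE1"), ("miR-y", "GENE3")],
    "method_C": [("miR-x", "GENE1"), ("miR-x", "GENE2")],
}
network = weighted_mti_network(predictions)
for pair, weight in sorted(network.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 2))
```

Pairs supported by every method receive weight 1.0, while pairs reported by a single method receive the lowest weight.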

IJCAI Conference 2019 Conference Paper

LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

  • Weibin Meng
  • Ying Liu
  • Yichen Zhu
  • Shenglin Zhang
  • Dan Pei
  • Yuqing Liu
  • Yihao Chen
  • Ruizhi Zhang

Recording runtime status via logs is common for almost every computer system, and detecting anomalies in logs is crucial for identifying system malfunctions in a timely manner. However, manually detecting anomalies in logs is time-consuming, error-prone, and infeasible. Existing automatic log anomaly detection approaches, using indexes rather than semantics of log templates, tend to cause false alarms. In this work, we propose LogAnomaly, a framework to model an unstructured log stream as a natural language sequence. Empowered by template2vec, a novel, simple yet effective method to extract the semantic information hidden in log templates, LogAnomaly can detect both sequential and quantitative log anomalies simultaneously, which has not been done by any previous work. Moreover, LogAnomaly can avoid the false alarms caused by the newly appearing log templates between periodic model retrainings. Our evaluation on two public production log datasets shows that LogAnomaly outperforms existing log-based anomaly detection methods.

JBHI Journal 2015 Journal Article

Predicting Days in Hospital Using Health Insurance Claims

  • Yang Xie
  • Gunter Schreier
  • David C. W. Chang
  • Sandra Neubauer
  • Ying Liu
  • Stephen J. Redmond
  • Nigel H. Lovell

Health-care administrators worldwide are striving to lower the cost of care while improving the quality of care given. Hospitalization is the largest component of health expenditure. Therefore, earlier identification of those at higher risk of being hospitalized would help health-care administrators and health insurers to develop better plans and strategies. In this paper, a method was developed, using large-scale health insurance claims data, to predict the number of hospitalization days in a population. We utilized a regression decision tree algorithm, along with insurance claim data from 242 075 individuals over three years, to provide predictions of number of days in hospital in the third year, based on hospital admissions and procedure claims data. The proposed method performs well in the general population as well as in subpopulations. Results indicate that the proposed model significantly improves predictions over two established baseline methods (predicting a constant number of days for each customer and using the number of days in hospital of the previous year as the forecast for the following year). A reasonable predictive accuracy (AUC $= 0.843$) was achieved for the whole population. Analysis of two subpopulations, namely elderly persons aged 63 years or older in 2011 and patients hospitalized for at least one day in the previous year, revealed that the medical information (e.g., diagnosis codes) contributed more to predictions for these two subpopulations, in comparison to the population as a whole.
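
The two baselines named in this abstract are simple enough to sketch. The example below is an illustration only: the toy day counts are made up, using the mean of the earlier year as the constant is an assumption, and mean absolute error is an assumed metric, not necessarily the one used in the paper.

```python
def constant_baseline(train_days, n_customers):
    """Predict the same number of days (here: the training mean) for everyone."""
    constant = sum(train_days) / len(train_days)
    return [constant] * n_customers

def previous_year_baseline(prev_year_days):
    """Predict that each customer repeats last year's days in hospital."""
    return list(prev_year_days)

def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Made-up days in hospital for five customers in two consecutive years.
year2_days = [0, 3, 0, 10, 1]
year3_days = [0, 2, 1, 7, 0]

constant_pred = constant_baseline(year2_days, len(year3_days))
previous_pred = previous_year_baseline(year2_days)
print(mean_absolute_error(constant_pred, year3_days))
print(mean_absolute_error(previous_pred, year3_days))
```

Any proposed model, such as the regression decision tree in the abstract, would be expected to beat both of these reference errors on held-out data.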

NeurIPS Conference 2013 Conference Paper

Learning Gaussian Graphical Models with Observed or Latent FVSs

  • Ying Liu
  • Alan Willsky

Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity $O(k^{2}n)$ using message-passing algorithms, where $k$ is the size of the FVS, and $n$ is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of high-degree nodes, or hubs, while the rest of the network is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate in $O(kn^{2}+n^{2}\log n)$ if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank correction with complexity $O(kn^{2}+n^{2}\log n)$ per iteration. We also perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. We show that empirically the family of GGMs with FVSs of size $O(\log n)$ strikes a good balance between the modeling capacity and the efficiency.
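
The definition of a feedback vertex set used in this abstract can be checked mechanically. The sketch below tests whether removing a candidate node set leaves an undirected graph without cycles, using a union-find structure. It covers only the definition, not the paper's learning or inference algorithms, and the hub-plus-chain example graph is a made-up illustration.

```python
def is_feedback_vertex_set(n, edges, fvs):
    """Return True if deleting `fvs` from the undirected graph leaves no cycle.

    Nodes are labeled 0..n-1 and `edges` is a list of (u, v) pairs.
    """
    removed = set(fvs)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue  # edge disappears together with the removed node
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge closes a cycle among remaining nodes
        parent[root_u] = root_v
    return True

# Hub node 0 is connected to every node on the chain 1-2-3-4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
print(is_feedback_vertex_set(5, edges, {0}))    # removing the hub leaves a chain
print(is_feedback_vertex_set(5, edges, set()))  # the full graph has cycles
```

In this example a single hub is an FVS of size $k = 1$, which mirrors the abstract's picture of a few high-degree nodes sitting on top of a tree.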

AAAI Conference 2007 Conference Paper

TableRank: A Ranking Algorithm for Table Search and Retrieval

  • Ying Liu
  • Prasenjit Mitra

Tables are ubiquitous in web pages and scientific documents. With the explosive development of the web, tables have become a valuable information repository. Therefore, effectively and efficiently searching tables becomes a challenge. Existing search engines do not provide satisfactory search results, largely because the current ranking schemes are inadequate for table search and because automatic table understanding and extraction are rather difficult in general. In this work, we design and evaluate a novel table ranking algorithm, TableRank, to improve the performance of our table search engine TableSeer. Given a keyword-based table query, TableRank enables TableSeer to return the most relevant tables by tailoring the classic vector space model. TableRank adopts an innovative term weighting scheme by aggregating multiple weighting factors from three levels: term, table and document. The experimental results show that our table search engine outperforms existing search engines on table search. In addition, incorporating multiple weighting factors can significantly improve the ranking results.
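
For context, the classic vector space model that this abstract says TableRank tailors can be sketched as TF-IDF vectors ranked by cosine similarity. The code below is that textbook baseline only, not TableRank: the table-level and document-level weighting factors are not described in the abstract and are not reproduced. Function names, the smoothed IDF formula, and the toy token lists are assumptions.

```python
import math
from collections import Counter

def idf_weights(tables):
    """Smoothed inverse document frequency over tokenized tables."""
    n = len(tables)
    df = Counter(term for table in tables for term in set(table))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Term frequency times IDF; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_tables(query, tables):
    """Return table indices ordered from most to least similar to the query."""
    idf = idf_weights(tables)
    query_vec = tfidf_vector(query, idf)
    scores = [cosine(query_vec, tfidf_vector(t, idf)) for t in tables]
    return sorted(range(len(tables)), key=lambda i: -scores[i])

# Toy "tables", each reduced to a bag of tokens.
tables = [
    ["protein", "binding", "energy"],
    ["stock", "price", "table"],
    ["protein", "structure"],
]
print(rank_tables(["protein", "binding"], tables))
```

A multi-level scheme like the one the abstract describes would replace the single TF-IDF weight with a combination of several factors, while the cosine ranking step could stay the same.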