Arrow Research search

Author name cluster

Yasha Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance

  • Yue Fang
  • Yuxin Guo
  • Jiaran Gao
  • Hongxin Ding
  • Xinke Jiang
  • Weibin Liao
  • Yongxin Xu
  • Yinghao Zhu

Improving large language models (LLMs) for electronic health record (EHR) reasoning is essential for enabling accurate and generalizable clinical predictions. While LLMs excel at medical text understanding, they underperform on EHR-based prediction tasks due to challenges in modeling temporally structured, high-dimensional data. Existing approaches often rely on hybrid paradigms, where LLMs serve merely as frozen prior retrievers while downstream deep learning (DL) models handle prediction, failing to improve the LLM’s intrinsic reasoning capacity and inheriting the generalization limitations of DL models. To this end, we propose EAG-RL, a novel two-stage training framework designed to intrinsically enhance LLMs’ EHR reasoning ability through expert attention guidance, where expert EHR models refer to task-specific DL models trained on EHR data. Concretely, EAG-RL first constructs high-quality, stepwise reasoning trajectories using expert-guided Monte Carlo Tree Search to effectively initialize the LLM’s policy. Then, EAG-RL further optimizes the policy via reinforcement learning by aligning the LLM’s attention with clinically salient features identified by expert EHR models. Extensive experiments on two real-world EHR datasets show that EAG-RL improves the intrinsic EHR reasoning ability of LLMs by an average of 14.62%, while also enhancing robustness to feature perturbations and generalization to unseen clinical domains. These results demonstrate the practical potential of EAG-RL for real-world deployment in clinical prediction tasks.

AAAI Conference 2025 Conference Paper

DearLLM: Enhancing Personalized Healthcare via Large Language Models-Deduced Feature Correlations

  • Yongxin Xu
  • Xinke Jiang
  • Xu Chu
  • Rihong Qiu
  • Yujie Feng
  • Hongxin Ding
  • Junfeng Zhao
  • Yasha Wang

Exploring the correlations between medical features is essential for extracting patient health patterns from electronic health records (EHR) data, and strengthening medical predictions and decision-making. To constrain the hypothesis space of pure data-driven deep learning in the context of limited annotated data, a common trend is to incorporate external knowledge, especially knowledge priors related to personalized health contexts, to optimize model training. However, most existing methods lack flexibility and are constrained by the uncertainties brought about by fixed feature correlation priors. In addition, in utilizing knowledge, these methods overlook the knowledge informative for personalized healthcare. To this end, we propose DearLLM, a novel and effective framework that leverages feature correlations deduced by large language models (LLMs) to enhance personalized healthcare. Concretely, DearLLM captures and learns quantitative correlations between medical features by calculating the conditional perplexity of LLMs’ deduction based on personalized patient backgrounds. Then, DearLLM enhances healthcare predictions by emphasizing knowledge that carries unique patient information through a feature-frequency-aware graph pooling method. Extensive experiments on two real-world benchmark datasets show significant performance gains brought by DearLLM. Furthermore, the discovered findings align well with medical literature, offering meaningful clinical interpretations.
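As a rough illustration of the conditional perplexity mentioned in the abstract above, the sketch below computes perplexity from per-token log-probabilities of a statement given some context. The function name and the log-probability values are hypothetical. The listing does not say how DearLLM obtains these values or turns them into correlation scores, so this shows only the generic definition.

```python
import math

def conditional_perplexity(token_logprobs):
    """Perplexity of a continuation given its context.

    token_logprobs: natural-log probabilities log p(token_i | context, tokens_<i),
    one per continuation token. The values used below are made up.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Perplexity is the exponential of the mean negative log-likelihood.
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical log-probs for the same statement under two patient backgrounds.
ppl_a = conditional_perplexity([-0.2, -0.1, -0.3])
ppl_b = conditional_perplexity([-1.5, -2.0, -1.0])

# Lower perplexity means the model finds the statement more plausible in that context.
print(ppl_a < ppl_b)
```

In this toy comparison, the first background makes the statement more plausible to the model than the second.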

ICLR Conference 2025 Conference Paper

DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing

  • Xinyu Ma
  • Yifeng Xu
  • Yang Lin
  • Tianlong Wang
  • Xu Chu
  • Xin Gao
  • Junfeng Zhao 0001
  • Yasha Wang

We introduce DRESS, a novel approach for generating stylized large language model (LLM) responses through representation editing. Existing methods like prompting and fine-tuning are either insufficient for complex style adaptation or computationally expensive, particularly in tasks like NPC creation or character role-playing. Our approach leverages the over-parameterized nature of LLMs to disentangle a style-relevant subspace within the model's representation space to conduct representation editing, ensuring a minimal impact on the original semantics. By applying adaptive editing strengths, we dynamically adjust the steering vectors in the style subspace to maintain both stylistic fidelity and semantic integrity. We develop two stylized QA benchmark datasets to validate the effectiveness of DRESS, and the results demonstrate significant improvements compared to baseline methods such as prompting and ITI. In short, DRESS is a lightweight, training-free solution for enhancing LLMs with flexible and effective style control, making it particularly useful for developing stylized conversational agents. Code and benchmark datasets are available at https://github.com/ArthurLeoM/DRESS-LLM.

ICML Conference 2025 Conference Paper

Efficient Graph Continual Learning via Lightweight Graph Neural Tangent Kernels-based Dataset Distillation

  • Rihong Qiu
  • Xinke Jiang
  • Yuchen Fang 0001
  • Hongbin Lai
  • Hao Miao 0001
  • Xu Chu
  • Junfeng Zhao 0001
  • Yasha Wang

Graph Neural Networks (GNNs) have emerged as a fundamental tool for modeling complex graph structures across diverse applications. However, directly applying pretrained GNNs to varied downstream tasks without fine-tuning-based continual learning remains challenging, as this approach incurs high computational costs and hinders the development of Large Graph Models (LGMs). In this paper, we investigate an efficient and generalizable dataset distillation framework for Graph Continual Learning (GCL) across multiple downstream tasks, implemented through a novel Lightweight Graph Neural Tangent Kernel (LIGHTGNTK). Specifically, LIGHTGNTK employs a low-rank approximation of the Laplacian matrix via Bernoulli sampling and linear association within the GNTK. This design enables efficient capture of both structural and feature relationships while supporting gradient-based dataset distillation. Additionally, LIGHTGNTK incorporates a unified subgraph anchoring strategy, allowing it to handle graph-level, node-level, and edge-level tasks under diverse input structures. Comprehensive experiments on several datasets show that LIGHTGNTK achieves state-of-the-art performance in GCL scenarios, promoting the development of adaptive and scalable LGMs.

AAAI Conference 2025 Conference Paper

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

  • Ruizhe Zhang
  • Yongxin Xu
  • Yuzhen Xiao
  • Runchuan Zhu
  • Xinke Jiang
  • Xu Chu
  • Junfeng Zhao
  • Yasha Wang

By integrating external knowledge, Retrieval-Augmented Generation (RAG) has become an effective strategy for mitigating the hallucination problems that large language models (LLMs) encounter when dealing with knowledge-intensive tasks. However, in the process of integrating external non-parametric supporting evidence with internal parametric knowledge, inevitable knowledge conflicts may arise, leading to confusion in the model's responses. To enhance the knowledge selection of LLMs in various contexts, some research has focused on refining their behavior patterns through instruction-tuning. Nonetheless, due to the absence of explicit negative signals and comparative objectives, models fine-tuned in this manner may still exhibit undesirable behaviors such as contextual ignorance and contextual overinclusion. To this end, we propose a Knowledge-aware Preference Optimization strategy, dubbed KnowPO, aimed at achieving adaptive knowledge selection based on contextual relevance in real retrieval scenarios. Concretely, we propose a general paradigm for constructing knowledge conflict datasets that comprehensively cover various error types, so that the model learns to avoid these negative signals through preference optimization methods. We also propose a rewriting strategy and a data ratio optimization strategy to address preference imbalances. Experimental results show that KnowPO outperforms previous methods for handling knowledge conflicts by over 37%, while also exhibiting robust generalization across various out-of-distribution datasets.

NeurIPS Conference 2025 Conference Paper

Magical: Medical Lay Language Generation via Semantic Invariance and Layperson-tailored Adaptation

  • Weibin Liao
  • Tianlong Wang
  • Yinghao Zhu
  • Yasha Wang
  • Junyi Gao
  • Liantao Ma

Medical Lay Language Generation (MLLG) plays a vital role in improving the accessibility of complex scientific content for broader audiences. Recent work on MLLG commonly employs parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) to fine-tune large language models (LLMs) using paired expert-lay language datasets. However, LoRA struggles with the challenges posed by multi-source heterogeneous MLLG datasets. Specifically, through a series of exploratory experiments, we reveal that standard LoRA fails to meet the requirements for semantic fidelity and diverse lay-style generation in the MLLG task. To address these limitations, we propose Magical, an asymmetric LoRA architecture tailored for MLLG under heterogeneous data scenarios. Magical employs a shared matrix A for abstractive summarization, along with multiple isolated matrices B for diverse lay-style generation. To preserve semantic fidelity during the lay language generation process, Magical introduces a Semantic Invariance Constraint to mitigate semantic subspace shifts on matrix A. Furthermore, to better adapt to diverse lay-style generation, Magical incorporates the Recommendation-guided Switch, an external interface that prompts the LLM to switch between different matrices B. Experimental results on three real-world lay language generation datasets demonstrate that Magical consistently outperforms prompt-based methods, vanilla LoRA, and its recent variants, while also reducing trainable parameters by 31.66%. Our code is publicly available at https://github.com/tianlwang/Magical.git.
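The shared-A, multiple-B layout described in the abstract above can be pictured with a toy example. The sketch below is not the Magical implementation: the matrix sizes, style names, and helper functions are all hypothetical, and the Semantic Invariance Constraint and Recommendation-guided Switch are left out. It only shows how a single down-projection A can be combined with a per-style up-projection B to form the low-rank update B·A on a frozen weight W.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(X, Y)]

# Frozen base weight W (2x3), one shared rank-1 down-projection A (1x3),
# and one up-projection B (2x1) per hypothetical lay style.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B_by_style = {
    "style_0": [[0.5], [0.0]],
    "style_1": [[0.0], [-0.5]],
}

def effective_weight(style):
    """W + B_style @ A: the base weight plus the style-specific low-rank update."""
    delta = matmul(B_by_style[style], A)  # (2x1)(1x3) -> 2x3
    return add(W, delta)

def forward(x, style):
    """Apply the adapted layer to a column vector x (3x1)."""
    return matmul(effective_weight(style), x)

x = [[1.0], [2.0], [3.0]]
print(forward(x, "style_0"))
print(forward(x, "style_1"))
```

Because A is shared, adding a style in this toy setup costs only one extra B matrix rather than a full A/B pair.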

ICLR Conference 2025 Conference Paper

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

  • Weibin Liao
  • Xu Chu
  • Yasha Wang

In the domain of complex reasoning tasks, such as mathematical reasoning, recent advancements have proposed the use of Direct Preference Optimization (DPO) to suppress output of dispreferred responses, thereby enhancing the long-chain reasoning capabilities of large language models (LLMs). To this end, these studies employed LLMs to generate preference trees via Tree-of-thoughts (ToT) and sample the paired preference responses required by the DPO algorithm. However, the DPO algorithm based on binary preference optimization is unable to learn from multiple responses with varying degrees of preference/dispreference that are provided by the preference trees, resulting in incomplete preference learning. In this work, we introduce Tree Preference Optimization (TPO), which does not sample paired preference responses from the preference tree; instead, it directly learns from the entire preference tree during fine-tuning. Specifically, TPO formulates the language model alignment as a Preference List Ranking problem, where the policy can potentially learn more effectively from a ranked preference list of responses given the prompt. In addition, to further assist LLMs in identifying discriminative steps within long-chain reasoning and increase the relative reward margin in the preference list, TPO utilizes Adaptive Step Reward to adjust the reward values of each step in a trajectory for performing fine-grained preference optimization. We carry out extensive experiments on mathematical reasoning tasks to evaluate TPO. The experimental results indicate that TPO consistently outperforms DPO across five public large language models on four datasets.
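To make the contrast between pairwise and list-based preference learning concrete, the sketch below scores a whole ranked list at once. The abstract does not give TPO's objective, so this uses the standard Plackett-Luce (ListMLE) negative log-likelihood as a stand-in. The function name and the score values are hypothetical, and Adaptive Step Reward is not modeled.

```python
import math

def listmle_loss(scores_in_preference_order):
    """Negative log-likelihood of a ranking under a Plackett-Luce model.

    scores_in_preference_order: model scores for the responses, listed from
    most preferred to least preferred.
    """
    s = scores_in_preference_order
    loss = 0.0
    for i in range(len(s)):
        tail = s[i:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(v - m) for v in tail))
        # -log P(item i is picked first among the items not yet ranked)
        loss += log_norm - s[i]
    return loss

# Scores that agree with the preference order give a lower loss
# than scores that reverse it.
agree = listmle_loss([3.0, 2.0, 1.0])
reverse = listmle_loss([1.0, 2.0, 3.0])
print(agree < reverse)
```

A pairwise objective would see only one preferred/dispreferred pair per update, while this loss uses the ordering of every response in the list.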

NeurIPS Conference 2024 Conference Paper

RAGraph: A General Retrieval-Augmented Graph Learning Framework

  • Xinke Jiang
  • Rihong Qiu
  • Yongxin Xu
  • Wentao Zhang
  • Yichen Zhu
  • Ruizhe Zhang
  • Yuchen Fang
  • Xu Chu

Graph Neural Networks (GNNs) have become essential in interpreting relational data across various domains, yet they often struggle to generalize to unseen graph data that differs markedly from training instances. In this paper, we introduce a novel framework called General Retrieval-Augmented Graph Learning (RAGraph), which brings external graph data into the general graph foundation model to improve model generalization on unseen scenarios. On top of our framework is a toy graph vector library that we established, which captures key attributes, such as features and task-specific label information. During inference, RAGraph retrieves similar toy graphs based on key similarities in downstream tasks, integrating the retrieved data to enrich the learning context via the message-passing prompting mechanism. Our extensive experimental evaluations demonstrate that RAGraph significantly outperforms state-of-the-art graph learning methods in multiple tasks such as node classification, link prediction, and graph classification across both dynamic and static datasets. Furthermore, extensive testing confirms that RAGraph consistently maintains high performance without the need for task-specific fine-tuning, highlighting its adaptability, robustness, and broad applicability.

NeurIPS Conference 2024 Conference Paper

SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction

  • Zhihao Yu
  • Xu Chu
  • Yujie Jin
  • Yasha Wang
  • Junfeng Zhao

Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often obsess over difficult-to-interpolate details and may introduce additional noise when making clinical predictions. To tackle this problem, we propose SMART, a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction. SMART encodes missing information via missing-aware temporal and variable attentions, and learns to impute missing values through a novel self-supervised pre-training approach that reconstructs missing data representations in the latent space rather than in the input space as usual. By adopting elaborate attention mechanisms and focusing on learning higher-order representations, SMART promotes better generalization and robustness to missing data. We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

Fused Gromov-Wasserstein Graph Mixup for Graph-level Classifications

  • Xinyu Ma
  • Xu Chu
  • Yasha Wang
  • Yang Lin
  • Junfeng Zhao
  • Liantao Ma
  • Wenwu Zhu

Graph data augmentation has shown superiority in enhancing generalizability and robustness of GNNs in graph-level classifications. However, existing methods primarily focus on the augmentation in the graph signal space and the graph structure space independently, neglecting the joint interaction between them. In this paper, we address this limitation by formulating the problem as an optimal transport problem that aims to find an optimal inter-graph node matching strategy considering the interactions between graph structures and signals. To solve this problem, we propose a novel graph mixup algorithm called FGWMixup, which seeks a "midpoint" of source graphs in the Fused Gromov-Wasserstein (FGW) metric space. To enhance the scalability of our method, we introduce a relaxed FGW solver that accelerates FGWMixup by improving the convergence rate from $\mathcal{O}(t^{-1})$ to $\mathcal{O}(t^{-2})$. Extensive experiments conducted on five datasets using both classic (MPNNs) and advanced (Graphormers) GNN backbones demonstrate that FGWMixup effectively improves the generalizability and robustness of GNNs. Code is available at https://github.com/ArthurLeoM/FGWMixup.

AAAI Conference 2023 Conference Paper

KerPrint: Local-Global Knowledge Graph Enhanced Diagnosis Prediction for Retrospective and Prospective Interpretations

  • Kai Yang
  • Yongxin Xu
  • Peinie Zou
  • Hongxin Ding
  • Junfeng Zhao
  • Yasha Wang
  • Bing Xie

While recent developments of deep learning models have led to record-breaking achievements in many areas, the lack of sufficient interpretation remains a problem for many specific applications, such as the diagnosis prediction task in healthcare. Previous knowledge graph (KG) enhanced approaches mainly focus on learning clinically meaningful representations, the importance of medical concepts, and even the knowledge paths from inputs to labels. However, it is infeasible to interpret the diagnosis prediction, which needs to consider different medical concepts, various medical relationships, and the time-effectiveness of knowledge triples in different patient contexts. More importantly, the retrospective and prospective interpretations of disease processes are valuable to clinicians for the patients' confounding diseases. We propose KerPrint, a novel KG enhanced approach for retrospective and prospective interpretations to tackle these problems. Specifically, we propose a time-aware KG attention method to solve the problem of knowledge decay over time for trustworthy retrospective interpretation. We also propose a novel element-wise attention method to select candidate global knowledge using comprehensive representations from the local KG for prospective interpretation. We validate the effectiveness of KerPrint through an extensive experimental study on a real-world dataset and a public dataset. The results show that our proposed approach not only achieves significant improvement over knowledge-enhanced methods but also provides interpretability of diagnosis prediction in both retrospective and prospective views.

IJCAI Conference 2023 Conference Paper

VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data

  • Yongxin Xu
  • Kai Yang
  • Chaohe Zhang
  • Peinie Zou
  • Zhiyuan Wang
  • Hongxin Ding
  • Junfeng Zhao
  • Yasha Wang

Due to the insufficiency of electronic health records (EHR) data utilized in practical diagnosis prediction scenarios, most works are devoted to learning powerful patient representations either from structured EHR data (e.g., temporal medical events, lab test results, etc.) or unstructured data (e.g., clinical notes, etc.). However, synthesizing rich information from both of them still needs to be explored. Firstly, the heterogeneous semantic biases across them heavily hinder the synthesis of representation spaces, which is critical for diagnosis prediction. Secondly, the intermingled quality of partial clinical notes leads to inadequate representations of to-be-predicted patients. Thirdly, typical attention mechanisms mainly focus on aggregating information from similar patients, ignoring important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, which indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.

ICML Conference 2023 Conference Paper

Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks

  • Xu Chu
  • Yujie Jin
  • Xin Wang 0019
  • Shanghang Zhang
  • Yasha Wang
  • Wenwu Zhu 0001
  • Hong Mei 0001

Graph size generalization is hard for message passing neural networks (MPNNs). The graph-level classification performance of MPNNs degrades across various graph sizes. Recent theoretical studies reveal that a slow uncontrollable convergence rate w.r.t. graph size could adversely affect the size generalization. To address the uncontrollable convergence rate caused by correlations across nodes in the underlying dimensional signal-generating space, we propose to use Wasserstein barycenters as graph-level consensus to combat node-level correlations. Methodologically, we propose a Wasserstein barycenter matching (WBM) layer that represents an input graph by Wasserstein distances between its MPNN-filtered node embeddings and some learned class-wise barycenters. Theoretically, we show that the convergence rate of an MPNN with a WBM layer is controllable and independent of the dimensionality of the signal-generating space. Thus MPNNs with WBM layers are less susceptible to slow uncontrollable convergence rate and size variations. Empirically, the WBM layer significantly improves the size generalization over vanilla MPNNs with different backbones (e.g., GCN, GIN, and PNA) on real-world graph datasets.

ICML Conference 2022 Conference Paper

DNA: Domain Generalization with Diversified Neural Averaging

  • Xu Chu
  • Yujie Jin
  • Wenwu Zhu 0001
  • Yasha Wang
  • Xin Wang 0019
  • Shanghang Zhang
  • Hong Mei 0001

The inaccessibility of the target domain data makes domain generalization (DG) methods prone to forgetting target discriminative features, and challenges the pervasive theme in existing literature of pursuing a single classifier with an ideal joint risk. In contrast, this paper investigates model misspecification and attempts to bridge DG with classifier ensemble theoretically and methodologically. By introducing a pruned Jensen-Shannon (PJS) loss, we show that the target square-root risk w.r.t. the PJS loss of the $\rho$-ensemble (the averaged classifier weighted by a quasi-posterior $\rho$) is bounded by the averaged source square-root risk of the Gibbs classifiers. We derive a tighter bound by enforcing a positive principled diversity measure of the classifiers. We give a PAC-Bayes upper bound on the target square-root risk of the $\rho$-ensemble. Methodologically, we propose a diversified neural averaging (DNA) method for DG, which optimizes the proposed PAC-Bayes bound approximately. The DNA method samples Gibbs classifiers transversely and longitudinally by simultaneously considering the dropout variational family and optimization trajectory. The $\rho$-ensemble is approximated by averaging the longitudinal weights in a single run with dropout shut down, ensuring a fast ensemble with low computational overhead. Empirically, the proposed DNA method achieves state-of-the-art classification performance on standard DG benchmark datasets.

IJCAI Conference 2022 Conference Paper

Domain Generalization through the Lens of Angular Invariance

  • Yujie Jin
  • Xu Chu
  • Yasha Wang
  • Wenwu Zhu

Domain generalization (DG) aims at generalizing a classifier trained on multiple source domains to an unseen target domain with domain shift. A pervasive theme in existing DG literature is domain-invariant representation learning with various invariance assumptions. However, prior works restrict themselves to an impractical assumption for real-world challenges: If a mapping induced by a deep neural network (DNN) could align the source domains well, then such a mapping aligns a target domain as well. In this paper, we simply take DNNs as feature extractors to relax the requirement of distribution alignment. Specifically, we put forward a novel angular invariance and the accompanying norm shift assumption. Based on the proposed term of invariance, we propose a novel deep DG method dubbed Angular Invariance Domain Generalization Network (AIDGN). The optimization objective of AIDGN is developed with a von Mises-Fisher (vMF) mixture model. Extensive experiments on multiple DG benchmark datasets validate the effectiveness of the proposed AIDGN method.
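The intuition behind an angle-based invariance with a shift in feature norm can be shown with a small numeric check: rescaling a feature vector changes its norm but not its angle to a class direction. The sketch below is only that check. The vectors, the scale factor, and the function names are hypothetical, and nothing here reproduces AIDGN's objective or its vMF mixture model.

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

class_direction = [1.0, 0.0]          # hypothetical class prototype direction
source_feature = [3.0, 4.0]           # hypothetical feature from a source domain
# Simulate a norm shift in a target domain: same direction, 10x the magnitude.
target_feature = [10.0 * a for a in source_feature]

print(norm(source_feature), norm(target_feature))
print(cosine(source_feature, class_direction),
      cosine(target_feature, class_direction))
```

The two norms differ by the scale factor while the two cosine values match, which is why a classifier that depends only on angles is unaffected by this kind of shift.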

AAAI Conference 2021 Conference Paper

GRASP: Generic Framework for Health Status Representation Learning Based on Incorporating Knowledge from Similar Patients

  • Chaohe Zhang
  • Xin Gao
  • Liantao Ma
  • Yasha Wang
  • Jiangtao Wang
  • Wen Tang

Deep learning models have been applied to many healthcare tasks based on electronic medical records (EMR) data and shown substantial performance. Existing methods commonly embed the records of a single patient into a representation for medical tasks. Such methods learn inadequate representations and lead to inferior performance, especially when the patient’s data is sparse or low-quality. To address this problem, we propose GRASP, a generic framework for healthcare models. For a given patient, GRASP first finds patients in the dataset who have similar conditions and similar results (i.e., the similar patients), and then enhances the representation learning and prognosis of the given patient by leveraging knowledge extracted from these similar patients. GRASP defines similarities with different meanings between patients for different clinical tasks, finds similar patients with useful information accordingly, and then learns a cohort representation to extract valuable knowledge contained in the similar patients. The cohort information is fused with the current patient’s representation to conduct final clinical tasks. Experimental evaluations on two real-world datasets show that GRASP can be seamlessly integrated into state-of-the-art models with consistent performance improvements. Besides, under the guidance of medical experts, we verified the findings extracted by GRASP, and the findings are consistent with the existing medical knowledge, indicating that GRASP can generate useful insights for relevant predictions.

IJCAI Conference 2021 Conference Paper

Learning Groupwise Explanations for Black-Box Models

  • Jingyue Gao
  • Xiting Wang
  • Yasha Wang
  • Yulan Yan
  • Xing Xie

We study two user demands that are important during the exploitation of explanations in practice: 1) understanding the overall model behavior faithfully with limited cognitive load and 2) predicting the model behavior accurately on unseen instances. We illustrate that the two user demands correspond to two major sub-processes in the human cognitive process and propose a unified framework to fulfill them simultaneously. Given a local explanation method, our framework jointly 1) learns a limited number of groupwise explanations that interpret the model behavior on most instances with high fidelity and 2) specifies the region where each explanation applies. Experiments on six datasets demonstrate the effectiveness of our method.

AAAI Conference 2020 Conference Paper

AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration

  • Liantao Ma
  • Junyi Gao
  • Yasha Wang
  • Chaohe Zhang
  • Jiangtao Wang
  • Wenjie Ruan
  • Wen Tang
  • Xin Gao

Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of the biomarker in diverse time scales plays a vital role in indicating the health status, but it has not been explicitly extracted by existing works. Second, key factors that strongly indicate the health risk are different among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as black boxes limits their reliability in clinical practice. However, none of the existing works can provide satisfying interpretability while achieving high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long- and short-term variations of biomarkers as clinical features to depict the health status in multiple time scales. It also models the correlation between clinical features to enhance the ones which strongly indicate the health status and thus can maintain state-of-the-art performance in terms of prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experiment results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.

AAAI Conference 2020 Conference Paper

ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context

  • Liantao Ma
  • Chaohe Zhang
  • Yasha Wang
  • Wenjie Ruan
  • Jiangtao Wang
  • Wen Tang
  • Xinyu Ma
  • Xin Gao

Predicting the patient’s clinical outcome from the historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning the clinical visit embedding and exploring the relations between visits. Although those works have shown superior performance in healthcare prediction, they fail to explore the personal characteristics during the clinical visits thoroughly. Moreover, existing works usually assume that more recent records carry more weight in the prediction, but this assumption is not suitable for all conditions. In this paper, we propose ConCare to handle the irregular EMR data and extract feature interrelationships to perform individualized healthcare prediction. Our solution can embed the feature sequences separately by modeling the time-aware distribution. ConCare further improves the multi-head self-attention via the cross-head decorrelation, so that the inter-dependencies among dynamic features and static baseline information can be effectively captured to form the personal health context. Experimental results on two real-world EMR datasets demonstrate the effectiveness of ConCare. The medical findings extracted by ConCare are also empirically confirmed by human experts and medical literature.

AAAI Conference 2020 Conference Paper

COTSAE: CO-Training of Structure and Attribute Embeddings for Entity Alignment

  • Kai Yang
  • Shaoqin Liu
  • Junfeng Zhao
  • Yasha Wang
  • Bing Xie

Entity alignment is a fundamental and vital task in Knowledge Graph (KG) construction and fusion. Previous works mainly focus on capturing the structural semantics of entities by learning the entity embeddings on the relational triples and pre-aligned “seed entities”. Some works also seek to incorporate the attribute information to help refine the entity embeddings. However, there are still many problems not considered, which dramatically limits the utilization of attribute information in the entity alignment. Different KGs may have lots of different attribute types, and even the same attribute may have diverse data structures and value granularities. Most importantly, attributes may have various “contributions” to the entity alignment. To solve these problems, we propose COTSAE, which combines the structure and attribute information of entities by co-training two embedding learning components, respectively. We also propose a joint attention method in our model to learn the attentions of attribute types and values cooperatively. We verified COTSAE on several datasets from real-world KGs, and the results showed that it is significantly better than the latest entity alignment methods. The structure and attribute information can complement each other and both contribute to performance improvement.

ICML Conference 2020 Conference Paper

Distance Metric Learning with Joint Representation Diversification

  • Xu Chu
  • Yang Lin
  • Yasha Wang
  • Xiting Wang
  • Hailong Yu
  • Xin Gao
  • Qi Tong

Distance metric learning (DML) is to learn a representation space equipped with a metric, such that similar examples are closer than dissimilar examples concerning the metric. The recent success of DNNs motivates many DML losses that encourage the intra-class compactness and inter-class separability. The trade-off between intra-class compactness and inter-class separability shapes the DML representation space by determining how much information of the original inputs to retain. In this paper, we propose a Distance Metric Learning with Joint Representation Diversification (JRD) that allows a better balancing point between intra-class compactness and inter-class separability. Specifically, we propose a Joint Representation Similarity regularizer that captures different abstract levels of invariant features and diversifies the joint distributions of representations across multiple layers. Experiments on three deep DML benchmark datasets demonstrate the effectiveness of the proposed approach.
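
As a rough illustration of the two quantities this abstract balances, the sketch below measures intra-class compactness and inter-class separability as mean pairwise Euclidean distances over a toy embedding set. It is not the JRD method or its regularizer; the function name `compactness_and_separability` and the sample points are assumptions made only for this example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compactness_and_separability(embeddings, labels):
    """Return (mean same-class distance, mean different-class distance).

    A smaller first value means tighter classes (compactness); a larger
    second value means classes sit further apart (separability).
    """
    intra, inter = [], []
    for (ea, la), (eb, lb) in combinations(zip(embeddings, labels), 2):
        (intra if la == lb else inter).append(euclidean(ea, eb))

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)


# Hypothetical 2-D embeddings for two classes, "a" and "b".
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = ["a", "a", "b", "b"]
intra, inter = compactness_and_separability(points, labels)
print(f"intra-class mean distance: {intra:.3f}")
print(f"inter-class mean distance: {inter:.3f}")
```

A DML loss would typically push the first number down and the second up; how far to push each is the trade-off the abstract refers to.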

AAAI Conference 2019 Conference Paper

Explainable Recommendation through Attentive Multi-View Learning

  • Jingyue Gao
  • Xiting Wang
  • Yasha Wang
  • Xing Xie

Recommender systems have been playing an increasingly important role in our daily life due to the explosive growth of information. Accuracy and explainability are two core aspects when we evaluate a recommendation model and have become one of the fundamental trade-offs in machine learning. In this paper, we propose to alleviate the trade-off between accuracy and explainability by developing an explainable deep model that combines the advantages of deep learning-based models and existing explainable methods. The basic idea is to build an initial network based on an explainable deep hierarchy (e.g., Microsoft Concept Graph) and improve the model accuracy by optimizing key variables in the hierarchy (e.g., node importance and relevance). To ensure accurate rating prediction, we propose an attentive multi-view learning framework. The framework enables us to handle sparse and noisy data by co-regularizing among different feature levels and combining predictions attentively. To mine readable explanations from the hierarchy, we formulate personalized explanation generation as a constrained tree node selection problem and propose a dynamic programming algorithm to solve it. Experimental results show that our model outperforms state-of-the-art methods in terms of both accuracy and explainability.

IJCAI Conference 2019 Conference Paper

MLRDA: A Multi-Task Semi-Supervised Learning Framework for Drug-Drug Interaction Prediction

  • Xu Chu
  • Yang Lin
  • Yasha Wang
  • Leye Wang
  • Jiangtao Wang
  • Jingyue Gao

Drug-drug interactions (DDIs) are a major cause of preventable hospitalizations and deaths. Recently, researchers in the AI community try to improve DDI prediction in two directions, incorporating multiple drug features to better model the pharmacodynamics and adopting multi-task learning to exploit associations among DDI types. However, these two directions are challenging to reconcile due to the sparse nature of the DDI labels which inflates the risk of overfitting of multi-task learning models when incorporating multiple drug features. In this paper, we propose a multi-task semi-supervised learning framework MLRDA for DDI prediction. MLRDA effectively exploits information that is beneficial for DDI prediction in unlabeled drug data by leveraging a novel unsupervised disentangling loss CuXCov. The CuXCov loss cooperates with the classification loss to disentangle the DDI prediction relevant part from the irrelevant part in a representation learnt by an autoencoder, which helps to ease the difficulty in mining useful information for DDI prediction in both labeled and unlabeled drug data. Moreover, MLRDA adopts a multi-task learning framework to exploit associations among DDI types. Experimental results on real-world datasets demonstrate that MLRDA significantly outperforms state-of-the-art DDI prediction methods by up to 10.3% in AUPR.

TIST Journal 2017 Journal Article

SPACE-TA

  • Leye Wang
  • Daqing Zhang
  • Dingqi Yang
  • Animesh Pathak
  • Chao Chen
  • Xiao Han
  • Haoyi Xiong
  • Yasha Wang

Data quality and budget are two primary concerns in urban-scale mobile crowdsensing. Traditional research on mobile crowdsensing mainly takes sensing coverage ratio as the data quality metric rather than the overall sensed data error in the target-sensing area. In this article, we propose to leverage spatiotemporal correlations among the sensed data in the target-sensing area to significantly reduce the number of sensing task assignments. In particular, we exploit both intradata correlations within the same type of sensed data and interdata correlations among different types of sensed data in the sensing task. We propose a novel crowdsensing task allocation framework called SPACE-TA (SPArse Cost-Effective Task Allocation), combining compressive sensing, statistical analysis, active learning, and transfer learning, to dynamically select a small set of subareas for sensing in each timeslot (cycle), while inferring the data of unsensed subareas under a probabilistic data quality guarantee. Evaluations on real-life temperature, humidity, air quality, and traffic monitoring datasets verify the effectiveness of SPACE-TA. In the temperature-monitoring task leveraging intradata correlations, SPACE-TA requires data from only 15.5% of the subareas while keeping the inference error below 0.25°C in 95% of the cycles, reducing the number of sensed subareas by 18.0% to 26.5% compared to baselines. When multiple tasks run simultaneously, for example, for temperature and humidity monitoring, SPACE-TA can further reduce ∼10% of the sensed subareas by exploiting interdata correlations.
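
The quality criterion quoted in this abstract (inference error below 0.25°C in 95% of the cycles) can be read as a simple check over per-cycle errors. The sketch below shows only that check, not SPACE-TA's task allocation or inference; the function name `meets_quality` and the error values are hypothetical.

```python
def meets_quality(errors, epsilon, p):
    """True if at least a fraction p of cycles has error below epsilon."""
    if not errors:
        raise ValueError("need at least one cycle")
    within = sum(1 for e in errors if e < epsilon)
    return within / len(errors) >= p


# Hypothetical per-cycle inference errors in degrees Celsius (20 cycles).
good_run = [0.10] * 19 + [0.40]      # 19 of 20 cycles under the bound
bad_run = [0.10] * 18 + [0.40] * 2   # only 18 of 20 cycles under the bound

print(meets_quality(good_run, epsilon=0.25, p=0.95))
print(meets_quality(bad_run, epsilon=0.25, p=0.95))
```

Under this reading, a run may exceed the error bound in a few cycles and still satisfy the guarantee, which is what makes the guarantee probabilistic rather than absolute.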