Arrow Research search

Author name cluster

Yu Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

AAAI 2026 Conference Paper

Grow-on-Demand: Sparse and Adaptive Expert Expansion for Continual Instruction Tuning

  • Ying Zhang
  • Xingyue Guo
  • Yu Zhao
  • Xuhui Sui
  • Baohang Zhou
  • Xinying Qian
  • Xiaojie Yuan

Continual instruction tuning aims to incrementally adapt large language models to new tasks without forgetting previously acquired knowledge. Existing approaches often struggle to balance plasticity and stability: replay-based methods retrain on historical data, which raises privacy concerns, while architecture-based methods allocate task-specific components, resulting in significant parameter growth. To address this, we consider a structure-sharing strategy that enables parameter reuse across similar tasks and expands only when necessary, avoiding any data replay. Specifically, we introduce Grow-on-Demand (GoD-MoE), a parameter-efficient framework based on sparse and adaptive expert-module expansion for continual instruction tuning. GoD-MoE inserts multiple LoRA-based experts into attention layers and dynamically activates a small subset of experts for each task. To avoid redundant parameter growth, we develop an Expert Demand Detector that determines whether new experts should be added, facilitating adaptive structural sharing and minimizing parameter overhead. We conduct comprehensive experiments on the TRACE benchmark, demonstrating that GoD-MoE achieves state-of-the-art performance. Furthermore, it effectively mitigates catastrophic forgetting and even outperforms several advanced replay-based baselines.

AAAI 2026 Conference Paper

Knowledge Graph Guided Heterogeneity-Informed Diffusion Model for Spatio-Temporal Generation

  • Zi'ang Wang
  • Lei Chen
  • Yuanchang Jin
  • Pan Deng
  • Shuangshuang Pang
  • Junting Liu
  • Yu Zhao

Spatio-temporal data generation aims to synthesize realistic urban data across graph nodes by learning spatial and temporal dependencies. This task plays a crucial role in urban planning by enabling the simulation of unobserved nodes. However, existing approaches face critical limitations: time-series generation methods fail to generalize to unseen nodes, while spatio-temporal generative models are either restricted to the trajectory generation task or dependent on auxiliary data inputs. To bridge these gaps, we propose a Knowledge Graph Guided Heterogeneity-Informed Diffusion Model (KGDiff) in this paper through the following key innovations. First, we design a geometry-aware mixture of experts integrating Euclidean, hyperbolic, and hyperspherical representations to comprehensively encode urban structural knowledge. Next, we present a learnable meta spatio-temporal pattern module that normalizes node-specific heterogeneity before the generation process, and a conditional denoising process that progressively transforms random noise into realistic samples under structural guidance. Finally, extensive experiments across real-world urban datasets demonstrate that KGDiff achieves state-of-the-art performance in generating realistic urban spatio-temporal data.

AAAI 2026 Conference Paper

Transferable Graph Condensation from the Causal Perspective

  • Huaming Du
  • Yijie Huang
  • Su Yao
  • Yiying Wang
  • Yueyang Zhou
  • Jingwen Yang
  • Jinshi Zhang
  • Han Ji

The increasing scale of graph datasets has significantly improved the performance of graph representation learning methods, but it has also introduced substantial training challenges. Graph dataset condensation techniques have emerged to compress large datasets into smaller yet information-rich datasets, while maintaining similar test performance. However, these methods strictly require downstream applications to match the original dataset and task, which often fails in cross-task and cross-domain scenarios. To address these challenges, we propose a novel causal-invariance-based and transferable graph dataset condensation method, named TGCC, providing effective and transferable condensed datasets. Specifically, to preserve domain-invariant knowledge, we first extract domain causal-invariant features from the spatial domain of the graph using causal interventions. Then, to fully capture the structural and feature information of the original graph, we perform enhanced condensation operations. Finally, through spectral-domain enhanced contrastive learning, we inject the causal-invariant features into the condensed graph, ensuring that the compressed graph retains the causal information of the original graph. Experimental results on five public datasets and our novel FinReport dataset demonstrate that TGCC achieves up to a 13.41% improvement in cross-task and cross-domain complex scenarios compared to existing methods, and achieves state-of-the-art performance on 5 out of 6 datasets in the single-dataset, single-task scenario.

ICRA 2025 Conference Paper

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

  • Peiyuan Zhi
  • Zhiyuan Zhang
  • Yu Zhao
  • Muzhi Han
  • Zeyu Zhang 0001
  • Zhitian Li
  • Ziyuan Jiao
  • Baoxiong Jia

Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (∼35%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.

JBHI 2025 Journal Article

Estimation of Ankle Joint Moment From Plantar Pressure Through an Optimized Sensor Layout Using Genetic Algorithm and Deep Forest Regression

  • Mingxia Gong
  • Wenxuan Chen
  • Yih-Kuen Jan
  • Yu Zhao
  • Jie Yao
  • Yan Wang
  • Weiyan Ren
  • Fang Pu

Objective: Ankle joint moments are critical in gait analysis, with accurate assessments typically necessitating complex inverse dynamics modeling. Pressure insoles are widely used wearable devices that have shown feasibility in estimating joint angles. However, achieving cost-effective, high-precision estimation of ankle joint moment remains challenging. This study combines a genetic algorithm (GA) with deep forest regression (DFR) to optimize the number and layout of plantar pressure sensors, and to estimate ankle joint moment from plantar pressure. Methods: 26 healthy young participants were recruited to collect motion trajectories, ground reaction forces, and plantar pressure data while walking at fast, medium, and slow speeds. Ten gait cycles per speed per participant were analyzed for ankle joint moments using inverse dynamics, constituting the dataset. An optimization algorithm was constructed by combining GA with DFR, using the fitness function as the objective for sensor number and layout optimization. Leave-one-out cross-validation was employed to evaluate the precision of the model. Results: The highest fitness was achieved with an optimized layout using 9 sensors. The Pearson correlation coefficients for the sagittal, coronal, and transverse plane moments were 0.967 ± 0.014, 0.918 ± 0.027, and 0.894 ± 0.073, respectively. The optimized layout showed no significant difference in estimation accuracy across walking speeds (P > 0.05). Conclusion: The proposed GA-DFR algorithm is capable of accurately estimating ankle joint moment and optimizing the number and layout of sensors. Significance: The algorithm and optimized sensor layout enable accurate and rapid estimation of ankle joint moment from plantar pressure insoles with a favorable trade-off between precision and sensor cost.
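
As a loose illustration of this kind of search loop (not the authors' GA-DFR pipeline: the fitness function below is a hypothetical stand-in, whereas the real method scores layouts by DFR estimation accuracy), a genetic algorithm over fixed-size sensor subsets can be sketched as:

```python
import random

def ga_select(n_sensors, k, fitness, pop_size=20, generations=50, seed=0):
    """Toy elitist GA choosing k of n_sensors positions, maximizing `fitness`,
    which maps a frozenset of sensor indices to a score."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n_sensors), k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            pool = list(a | b)                    # crossover: union of two parents
            child = set(rng.sample(pool, min(k, len(pool))))
            if rng.random() < 0.2:                # mutation: swap out one sensor
                child.discard(rng.choice(list(child)))
            while len(child) < k:                 # top up to exactly k sensors
                child.add(rng.randrange(n_sensors))
            children.append(frozenset(child))
        pop = survivors + children
    return max(pop, key=fitness)

# Stand-in fitness: overlap with a hypothetical set of "informative" positions.
target = {1, 4, 7, 10, 13, 16, 19, 22, 25}
best = ga_select(n_sensors=30, k=9, fitness=lambda s: len(s & target))
print(sorted(best))
```

In the paper's setting the fitness call would wrap a full DFR training-and-validation run, which is why limiting the population and generation counts matters.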

IROS 2025 Conference Paper

GIPD: Global Intent Prediction and Decomposition of Cooperative Multi-Robot System in Non-Communication Environments

  • Yu Zhao
  • Zhe Liu
  • Haoyu Wei
  • Kai Wang
  • Haitao Wang
  • Duwen Zhai
  • Kefan Jin
  • Haibin Shao

In complex multi-robot application scenarios, particularly in dynamically adversarial, hazardous, or disaster environments, traditional cooperation paradigms face significant challenges due to unreliable or absent communication links. Achieving efficient cooperation in the absence of communication has become a key bottleneck limiting the performance of multi-robot systems. In this paper, we propose a Global Intent Prediction and Decomposition (GIPD) framework that enables robots to perform cooperative behavior without relying on communication. Each robot independently infers a globally consistent intent based solely on its local observations, ensuring implicit alignment across the system. Given the inferred global intent, robots autonomously determine their responsibilities and select the most appropriate tasks. They then base their local decision-making on the global intent, selected tasks, and individual observations, thereby facilitating effective execution and cooperation. We validate our approach on the MPE and SMAC benchmarks. Additionally, real-world experiments involving multiple ships demonstrate the effectiveness and practical applicability of the proposed GIPD method.

AIJ 2025 Journal Article

Grammar induction from visual, speech and text

  • Yu Zhao
  • Hao Fei
  • Shengqiong Wu
  • Meishan Zhang
  • Min Zhang
  • Tat-Seng Chua

Grammar Induction (GI) seeks to uncover the underlying grammatical rules and linguistic patterns of a language, positioning it as a pivotal research topic within Artificial Intelligence (AI). Although extensive research in GI has predominantly focused on text or other singular modalities, we reveal that GI can significantly benefit from rich heterogeneous signals, such as text, vision, and acoustics, in which features from distinct modalities essentially serve complementary roles to each other. With this intuition, this work introduces a novel unsupervised visual-audio-text grammar induction task (named VAT-GI) to induce constituent grammar trees from parallel image, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond text, we argue that text need not be the predominant modality in grammar induction. Thus, we further introduce a textless setting of VAT-GI, wherein the task relies solely on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing. In addition, a more challenging benchmark dataset is constructed to assess the generalization ability of the VAT-GI system. Experiments on two benchmark datasets demonstrate that our proposed VaTiora system is more effective in incorporating the various multimodal signals and presents new state-of-the-art performance on VAT-GI. Further in-depth analyses are provided to gain a deeper understanding of the VAT-GI task and how our VaTiora system advances it. Our code and data: https://github.com/LLLogen/VAT-GI/

NeurIPS 2025 Conference Paper

MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

  • Zhaowei Wang
  • Wenhao Yu
  • Xiyu REN
  • Jipeng Zhang
  • Yu Zhao
  • Rohit Saxena
  • Liang Cheng
  • Ginny Wong

The rapid extension of context windows in large vision-language models has given rise to long-context vision-language models (LCVLMs), which are capable of handling hundreds of images with interleaved text tokens in a single forward pass. In this work, we introduce MMLongBench, the first benchmark covering a diverse set of long-context vision-language tasks, to evaluate LCVLMs effectively and thoroughly. MMLongBench is composed of 13,331 examples spanning five different categories of downstream tasks, such as Visual RAG and Many-Shot ICL. It also provides broad coverage of image types, including various natural and synthetic images. To assess the robustness of the models to different input lengths, all examples are delivered at five standardized input lengths (8K-128K tokens) via a cross-modal tokenization scheme that combines vision patches and text tokens. Through a thorough benchmarking of 46 closed-source and open-source LCVLMs, we provide a comprehensive analysis of the current models' vision-language long-context ability. Our results show that: i) performance on a single task is a weak proxy for overall long-context capability; ii) both closed-source and open-source models face challenges in long-context vision-language tasks, indicating substantial room for future improvement; iii) models with stronger reasoning ability tend to exhibit better long-context performance. By offering wide task coverage, various image types, and rigorous length control, MMLongBench provides the missing foundation for diagnosing and advancing the next generation of LCVLMs.

AAAI 2025 Conference Paper

Structured Packing in LLM Training Improves Long Context Utilization

  • Konrad Staniszewski
  • Szymon Tworkowski
  • Sebastian Jaszczur
  • Yu Zhao
  • Henryk Michalewski
  • Łukasz Kuciński
  • Piotr Miłoś

Recent advancements in long-context language modeling have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. To efficiently address this issue, we introduce the Structured Packing for Long Context, SPLiCe, a method that uses retrieval to collate mutually relevant documents into long training samples. We demonstrate that SPLiCe improves performance on long-context tasks, particularly by achieving perfect accuracy on the synthetic Needle in the Haystack benchmark, and effectively mitigating the ‘lost-in-the-middle’ phenomenon often observed in large language models. Notably, these long-context capabilities also extend to realistic downstream tasks, such as Qasper, across multiple model sizes—3B, 7B, and 13B—and are achieved with only brief fine-tuning on 2-6 billion tokens. We supplement these results with a detailed analysis of SPLiCe, examining the impact of hyperparameter choices, the different mixtures and proportions of SPLiCe-generated training data, and the choice of the retriever. We also study the transfer of long-context utilization skills between the modalities. An intriguing finding from our analysis is that training on a corpus of code can enhance performance on natural language tasks.
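
The core idea, collating retrieved documents into one long training sample, can be sketched minimally. This is an illustration under stated assumptions, not the paper's implementation: Jaccard similarity over token sets stands in for the real retriever, and the budget counts whitespace-split tokens.

```python
def jaccard(a, b):
    """Similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def splice_pack(docs, budget):
    """Greedily chain mutually relevant documents into one long training
    sample, in the spirit of SPLiCe: always append the document most
    similar to the last one added, until the token budget is exceeded."""
    token_sets = [set(d.split()) for d in docs]
    remaining = set(range(1, len(docs)))
    sample, used, current = [docs[0]], len(docs[0].split()), 0
    while remaining and used < budget:
        nxt = max(remaining, key=lambda i: jaccard(token_sets[current], token_sets[i]))
        remaining.remove(nxt)
        sample.append(docs[nxt])
        used += len(docs[nxt].split())
        current = nxt
    return sample

docs = [
    "gradient descent optimizer learning rate",
    "cats purr and sleep all day",
    "learning rate schedules for gradient descent",
    "dogs bark at the mailman",
]
packed = splice_pack(docs, budget=12)
print(packed)  # the two related optimization documents end up adjacent
```

Placing related documents adjacently in a sample is what gives the model long-range dependencies worth attending to, which is the effect the paper measures.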

ICLR 2025 Conference Paper

Training-free LLM-generated Text Detection by Mining Token Probability Sequences

  • Yihuai Xu
  • Yongwei Wang
  • Yifei Bi
  • Huangsen Cao
  • Zhouhan Lin
  • Yu Zhao
  • Fei Wu 0001

Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains. However, the potential misuse of LLMs has raised significant concerns, underscoring the urgent need for reliable detection of LLM-generated texts. Conventional training-based detectors often struggle with generalization, particularly in cross-domain and cross-model scenarios. In contrast, training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability. Despite this, existing training-free detection methods typically rely on global text sequence statistics, neglecting the modeling of local discriminative features, thereby limiting their detection efficacy. In this work, we introduce a novel training-free detector, termed Lastde, that synergizes local and global statistics for enhanced detection (the code and data are released at https://github.com/TrustMedia-zju/Lastde_Detector). For the first time, we introduce time series analysis to LLM-generated text detection, capturing the temporal dynamics of token probability sequences. By integrating these local statistics with global ones, our detector reveals significant disparities between human- and LLM-generated texts. We also propose an efficient alternative, Lastde++, to enable real-time detection. Extensive experiments on six datasets involving cross-domain, cross-model, and cross-lingual detection scenarios, under both white-box and black-box settings, demonstrate that our method consistently achieves state-of-the-art performance. Furthermore, our approach exhibits greater robustness against paraphrasing attacks compared to existing baseline methods.
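
To make the local-vs-global distinction concrete, here is a toy statistic (purely illustrative, not the actual Lastde detector, which uses a more sophisticated time-series descriptor): compare the mean token log-probability (global) with the average sliding-window standard deviation (local), on the intuition that LLM text tends to have flatter, less bursty token-probability dynamics than human text.

```python
import statistics

def toy_detector_stats(token_logprobs, window=5):
    """Return (global, local) statistics for a token log-probability sequence:
    global = mean log-prob of the whole sequence;
    local  = mean standard deviation over sliding windows."""
    global_stat = statistics.mean(token_logprobs)
    local_stds = [statistics.pstdev(token_logprobs[i:i + window])
                  for i in range(len(token_logprobs) - window + 1)]
    local_stat = statistics.mean(local_stds)
    return global_stat, local_stat

# Hypothetical sequences: LLM-like (uniformly probable) vs human-like (bursty).
llm_like = [-1.0, -1.2, -0.9, -1.1, -1.0, -1.3, -1.0, -1.1]
human_like = [-0.2, -4.0, -0.5, -3.5, -0.3, -5.0, -0.4, -2.8]
print(toy_detector_stats(llm_like))
print(toy_detector_stats(human_like))
```

A purely global detector sees only the first number; the second captures the local dynamics the paper argues are discriminative.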

AAAI 2024 Conference Paper

A Label Disambiguation-Based Multimodal Massive Multiple Instance Learning Approach for Immune Repertoire Classification

  • Fan Xu
  • Yu Zhao
  • Bingzhe Wu
  • Yueshan Huang
  • Qin Ren
  • Yang Xiao
  • Bing He
  • Jie Zheng

An individual human's immune repertoire consists of a huge set of adaptive immune receptors at a given time point, representing that individual's adaptive immune state. Immune repertoire classification and associated receptor identification have the potential to make a transformative contribution to the development of novel vaccines and therapies. The vast number of instances and exceedingly low witness rate pose a great challenge to immune repertoire classification, which can be formulated as a Massive Multiple Instance Learning (MMIL) problem. Traditional MIL methods, at both bag level and instance level, confront substantial computational burden or supervision ambiguity when handling massive instances. To address these issues, we propose a novel label disambiguation-based multimodal massive multiple instance learning approach (LaDM³IL) for immune repertoire classification. LaDM³IL adapts the instance-level MIL paradigm to deal with the high computational cost and employs a specially designed label disambiguation module for label correction, mitigating the impact of misleading supervision. To achieve a more comprehensive representation of each receptor, LaDM³IL leverages a multimodal fusion module with gating-based attention and tensor fusion to integrate the information from gene segments and amino acid (AA) sequences of each immune receptor. Extensive experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate the superior performance of the proposed LaDM³IL for both immune repertoire classification and associated receptor identification tasks. The code is publicly available at https://github.com/Josie-xufan/LaDM3IL.

NeurIPS 2024 Conference Paper

DePLM: Denoising Protein Language Models for Property Optimization

  • Zeyuan Wang
  • Keyan Ding
  • Ming Qin
  • Xiaotong Li
  • Xiang Zhuang
  • Yu Zhao
  • Jianhua Yao
  • Qiang Zhang

Protein optimization is a fundamental biological task aimed at enhancing the performance of proteins by modifying their sequences. Computational methods primarily rely on evolutionary information (EI) encoded by protein language models (PLMs) to predict the fitness landscape for optimization. However, these methods suffer from a few limitations. (1) Evolutionary processes involve the simultaneous consideration of multiple functional properties, often overshadowing the specific property of interest. (2) Measurements of these properties tend to be tailored to experimental conditions, leading to reduced generalizability of trained models to novel proteins. To address these limitations, we introduce Denoising Protein Language Models (DePLM), a novel approach that refines the evolutionary information embodied in PLMs for improved protein optimization. Specifically, we conceptualize EI as comprising both property-relevant and irrelevant information, with the latter acting as “noise” for the optimization task at hand. Our approach involves denoising this EI in PLMs through a diffusion process conducted in the rank space of property values, thereby enhancing model generalization and ensuring dataset-agnostic learning. Extensive experimental results have demonstrated that DePLM not only surpasses the state-of-the-art in mutation effect prediction but also exhibits strong generalization capabilities for novel proteins.

EAAI 2024 Journal Article

Differentiable sampling based efficient architecture search for automatic fault diagnosis

  • Xingwu Zhang
  • Rui Ma
  • Yu Zhao
  • Chenxi Wang
  • Zhibin Zhao
  • Xuefeng Chen

Intelligent diagnosis of rotating machinery has developed rapidly, but different methods show fluctuating performance and require fussy design, limiting their effectiveness in practical applications. It would therefore be valuable to automatically generate the optimal method for a given diagnosis task, as differentiable neural architecture search (DNAS) does. However, three challenges severely restrict DNAS methods in industrial scenarios: 1) vibration signals are multi-scale and non-stationary; 2) the huge memory cost of supernet-based search is unsuitable for practical diagnosis; 3) manual architecture derivation causes a performance collapse between architecture search and practical diagnosis. Thus, we propose Differentiable Sampling based Efficient Architecture Search (DS-EAS), which generates architectures by differentiable sampling. First, the involution operator is introduced to adaptively extract critical features from noisy signals. Second, Gumbel Max-Softmax is adopted to forward-sample and backward-propagate the gradient on a single sub-architecture at each iteration, alleviating the huge memory cost. Third, progressive pruning is proposed to eliminate manual discretization error, leading to the final architecture with zero operators. Based on the searched architecture, a deeper one is built to test its real performance. A traction motor experiment is performed to assess the performance of DS-EAS on three different sample cases. Compared with other state-of-the-art methods, the superior performance of DS-EAS is verified.
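
The Gumbel-max trick at the heart of such single-path sampling is easy to sketch (an illustrative fragment with made-up logits, not the DS-EAS code; in the actual method a soft Gumbel-Softmax relaxation carries gradients back to the architecture parameters):

```python
import math
import random

def gumbel_noise(rng):
    """Draw standard Gumbel(0, 1) noise via inverse transform sampling."""
    u = max(rng.random(), 1e-300)        # guard against log(0)
    return -math.log(-math.log(u))

def gumbel_max_sample(logits, rng):
    """Sample index i with probability softmax(logits)[i] without computing
    softmax, by taking argmax_i (logits[i] + Gumbel noise)."""
    noisy = [l + gumbel_noise(rng) for l in logits]
    return max(range(len(logits)), key=noisy.__getitem__)

rng = random.Random(0)
logits = [2.0, 0.5, 0.0]                 # architecture weights for 3 candidate ops
counts = [0, 0, 0]
for _ in range(10000):
    counts[gumbel_max_sample(logits, rng)] += 1
print([c / 10000 for c in counts])       # empirical frequencies ≈ softmax(logits)
```

Sampling exactly one candidate operation per iteration is what keeps only a single sub-architecture in memory, rather than the whole supernet.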

AAAI 2024 Conference Paper

Harnessing Holistic Discourse Features and Triadic Interaction for Sentiment Quadruple Extraction in Dialogues

  • Bobo Li
  • Hao Fei
  • Lizi Liao
  • Yu Zhao
  • Fangfang Su
  • Fei Li
  • Donghong Ji

Dialogue Aspect-based Sentiment Quadruple (DiaASQ) is a newly emergent task aiming to extract sentiment quadruples (i.e., targets, aspects, opinions, and sentiments) from conversations. While showing promising performance, the prior DiaASQ approach unfortunately falls short on the key challenges of the task, including insufficient modeling of discourse features and a lack of cohesive quadruple extraction, which hinders further improvement. To this end, we introduce a novel framework that not only capitalizes on comprehensive discourse-feature modeling but also captures intrinsic interactions for optimal quadruple extraction. On the one hand, drawing upon multiple discourse features, our approach constructs a token-level heterogeneous graph and enhances token interactions through a heterogeneous attention network. On the other hand, we propose a novel triadic scorer that strengthens weak token relations within a quadruple, thereby enhancing the cohesion of quadruple extraction. Experimental results on the DiaASQ benchmark show that our model significantly outperforms existing baselines on both the English and Chinese datasets. Our code is available at https://bit.ly/3v27pqA.

NeurIPS 2024 Conference Paper

Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image

  • Yu Zhao
  • Hao Fei
  • Xiangtai Li
  • Libo Qin
  • Jiayi Ji
  • Hongyuan Zhu
  • Meishan Zhang
  • Min Zhang

In the visual spatial understanding (VSU) field, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding, due to the difficulty of 3D-wise spatial feature modeling. In this work, we consider modeling SI2T and ST2I together under a dual learning framework. Within this dual framework, we propose to represent the 3D spatial scene features with a novel 3D scene graph (3DSG) representation that can be shared between and is beneficial to both tasks. Further, inspired by the intuition that the easier 3D→image and 3D→text processes exist symmetrically in ST2I and SI2T, respectively, we propose the Spatial Dual Discrete Diffusion (SD³) framework, which utilizes the intermediate features of the 3D→X processes to guide the hard X→3D processes, such that ST2I and SI2T benefit each other overall. On the visual spatial understanding dataset VSD, our system significantly outperforms mainstream T2I and I2T methods. Further in-depth analysis reveals how our dual learning strategy advances performance.

IJCAI 2023 Conference Paper

A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification

  • MingCai Chen
  • Yu Zhao
  • Zhonghuang Wang
  • Bing He
  • Jianhua Yao

Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, traditional instance-space MIL, which directly assigns bag-level labels to instances, suffers from a massive amount of noisy labels and an extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: the initial labels are smoothed to be asymmetric and are progressively corrected using the model's predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known “confirmation bias” problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's effectiveness and superior performance on sequence-level and repertoire-level tasks. Code is available at https://github.com/TencentAILabHealthcare/NLL-IRC.
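
The smoothing-then-progressive-correction recipe can be illustrated in a few lines (a schematic sketch only, with made-up numbers and a plain exponential-moving-average update standing in for the paper's correction rule):

```python
def smooth(hard_label, n_classes, eps=0.2):
    """Soften a bag-level hard label assigned to an instance (a simple
    stand-in for the paper's asymmetric smoothing)."""
    return [1 - eps if c == hard_label else eps / (n_classes - 1)
            for c in range(n_classes)]

def correct_labels(soft_labels, predictions, momentum=0.9):
    """One round of progressive correction: move each soft label a small
    step toward the model's current prediction."""
    return [[momentum * y + (1 - momentum) * p for y, p in zip(ys, ps)]
            for ys, ps in zip(soft_labels, predictions)]

labels = [smooth(1, 2), smooth(1, 2)]      # both instances inherit bag label 1
preds = [[0.95, 0.05], [0.10, 0.90]]       # model is confident instance 0 is class 0
for _ in range(20):                        # labels drift toward confident predictions
    labels = correct_labels(labels, preds)
print([[round(v, 2) for v in ys] for ys in labels])  # → [[0.86, 0.14], [0.11, 0.89]]
```

With fixed predictions the update has the closed form m^n·y₀ + (1 − m^n)·p, so the first instance's label is gradually "corrected" away from the noisy bag label, which is the behavior the training strategy relies on.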

AAAI 2023 Conference Paper

Causal Conditional Hidden Markov Model for Multimodal Traffic Prediction

  • Yu Zhao
  • Pan Deng
  • Junting Liu
  • Xiaofeng Jia
  • Mulan Wang

Multimodal traffic flow can reflect the health of the transportation system, and its prediction is crucial to urban traffic management. Recent works overemphasize spatio-temporal correlations of traffic flow, ignoring the physical concepts that lead to the generation of observations and their causal relationships. Spatio-temporal correlations are unstable under different conditions, and spurious correlations may exist in observations. In this paper, we analyze the physical concepts affecting the generation of multimodal traffic flow from the perspective of the observation generation principle and propose a Causal Conditional Hidden Markov Model (CCHMM) to predict multimodal traffic flow. In the latent variable inference stage, a posterior network disentangles the causal representations of the concepts of interest from conditional information and observations, and a causal propagation module mines their causal relationships. In the data generation stage, a prior network samples the causal latent variables from the prior distribution and feeds them into the generator to generate multimodal traffic flow. We use a mutually supervised training method for the prior and posterior networks to enhance the identifiability of the model. Experiments on real-world datasets show that CCHMM can effectively disentangle the causal representations of concepts of interest, identify causality, and accurately predict multimodal traffic flow.

AIIM 2023 Journal Article

DDI-GCN: Drug-drug interaction prediction via explainable graph convolutional networks

  • Yi Zhong
  • Houbing Zheng
  • Xiaoming Chen
  • Yu Zhao
  • Tingfang Gao
  • Huiqun Dong
  • Heng Luo
  • Zuquan Weng

Drug-drug interactions (DDI) may lead to unexpected side effects, which is a growing concern in both academia and industry. Many DDIs have been reported, but the underlying mechanisms are not well understood. Predicting and understanding DDIs can help researchers improve drug safety and protect patient health. Here, we introduce DDI-GCN, a method that utilizes graph convolutional networks (GCN) to predict DDIs based on chemical structures. We demonstrate that this method achieves state-of-the-art prediction performance on an independent hold-out set. It can also provide visualizations of structural features associated with DDIs, which can help us study the underlying mechanisms. To make it easy and accessible to use, we developed a web server for DDI-GCN, which is freely available at http://wengzq-lab.cn/ddi/.

YNIMG 2023 Journal Article

Functional alterations in bipartite network of white and grey matters during aging

  • Yurui Gao
  • Yu Zhao
  • Muwei Li
  • Richard D. Lawless
  • Kurt G. Schilling
  • Lyuan Xu
  • Andrea T. Shafer
  • Lori L. Beason-Held

The effects of normal aging on functional connectivity (FC) within various gray matter (GM) brain networks have been well documented. However, the age effects on networks of FC between white matter (WM) and GM, namely WM-GM FC, remain unclear. Evaluating crucial properties, such as global efficiency (GE), for a WM-GM FC network poses a challenge due to the absence of the closed triangle paths that are essential for assessing network properties in traditional graph models. In this study, we propose a bipartite graph model to characterize the WM-GM FC network and quantify these challenging network properties. Leveraging this model, we assessed WM-GM FC network properties at multiple scales across 1,462 cognitively normal subjects aged 22-96 years from three repositories (ADNI, BLSA and OASIS-3) and investigated the age effects on these properties throughout adulthood and during late adulthood (age ≥ 70 years). Our findings reveal that (1) heterogeneous alterations occurred in region-specific WM-GM FC over adulthood, with decline predominating during late adulthood; (2) the FC density of WM bundles engaged in memory, executive function and processing speed declined with age over adulthood, particularly in later years; and (3) the GE of the attention, default, somatomotor, frontoparietal and limbic networks reduced with age over adulthood, and the GE of the visual network declined during late adulthood. These findings provide unprecedented insights into multi-scale alterations in networks of WM-GM functional synchronization during normal aging. Furthermore, our bipartite graph model offers an extendable framework for quantifying WM-engaged networks, which may contribute to a wide range of neuroscience research.
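
For intuition, global efficiency on a bipartite graph can be computed from breadth-first-search shortest paths (an illustrative sketch using the generic definition of GE, the mean inverse shortest-path length over node pairs, not the authors' specific quantification):

```python
from collections import deque

def global_efficiency(wm, gm, edges):
    """Global efficiency of a bipartite graph whose edges connect WM nodes
    to GM nodes only, so every path alternates between the two parts."""
    nodes = list(wm) + list(gm)
    adj = {n: set() for n in nodes}
    for w, g in edges:
        adj[w].add(g)
        adj[g].add(w)
    total, pairs = 0.0, 0
    for src in nodes:
        dist = {src: 0}                      # BFS from src (unweighted graph)
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst in nodes:
            if dst != src:
                pairs += 1
                if dst in dist:              # unreachable pairs contribute 0
                    total += 1.0 / dist[dst]
    return total / pairs

# Toy network: two WM bundles, three GM regions.
ge = global_efficiency(["w1", "w2"], ["g1", "g2", "g3"],
                       [("w1", "g1"), ("w1", "g2"), ("w2", "g2"), ("w2", "g3")])
print(round(ge, 3))  # → 0.642
```

Note that no triangle-based measure (e.g. clustering coefficient) appears here: path-based properties like GE remain well defined on bipartite graphs, which is what makes this framing workable.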

EAAI 2023 Journal Article

Inference and analysis of a new evidential reasoning rule-based performance evaluation model

  • Jie Wang
  • Zhi-Jie Zhou
  • Peng-Yun Ning
  • Shuai-Tong Liu
  • Xiang-Yi Zhou
  • Yu Zhao

In current studies on the evidential reasoning (ER) rule, a performance evaluation model that utilizes the ER rule to combine evidence constituted by a single observation indicator (single indicator-based ER rule, SER) has been developed with excellent scalability. However, the belief distribution (BD) of the SER-based evaluation model is composed of a set of single grades that describe the system performance and the corresponding belief degrees. As a result, common uncertain judgments involving local ignorance cannot be considered in the inference process, which reduces the accuracy and validity of the evaluation results. In this paper, a new SER-based performance evaluation model with an extended BD is proposed. Firstly, on the basis of the existing BD in SER, new elements capable of describing local ignorance are introduced. Secondly, by calculating and optimizing the relevant parameters, the new evaluation model is formed. Thirdly, according to the Stone–Weierstrass theorem and information entropy theory, the approximation ability and uncertainty of the evaluation model are discussed respectively. Finally, a practical example is given to illustrate the potential applications of the proposed model in engineering practice.

JBHI Journal 2023 Journal Article

MSHT: Multi-Stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer

  • Tianyi Zhang
  • Yunlu Feng
  • Yu Zhao
  • Guangda Fan
  • Aiming Yang
  • Shangqing Lyu
  • Peng Zhang
  • Fan Song

Pancreatic cancer is one of the most malignant cancers with high mortality. The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnostic workflow of pancreatic cancer by immediately analyzing the fast-stained cytopathological images with on-site pathologists. However, the broader expansion of ROSE diagnosis has been hindered by the shortage of experienced pathologists. Deep learning has great potential for the automatic classification of ROSE images in diagnosis. But it is challenging to model the complicated local and global image features. The traditional convolutional neural network (CNN) structure can effectively extract spatial features, while it tends to ignore global features when the prominent local features are misleading. In contrast, the Transformer structure has excellent advantages in capturing global features and long-range relations, while it has limited ability in utilizing local features. We propose a multi-stage hybrid Transformer (MSHT) to combine the strengths of both, where a CNN backbone robustly extracts multi-stage local features at different scales as the attention guidance, and a Transformer encodes them for sophisticated global modeling. Going beyond the strength of each single method, the MSHT can simultaneously enhance the Transformer global modeling ability with the local guidance from CNN features. To evaluate the method in this unexplored field, a dataset of 4240 ROSE images is collected where MSHT achieves 95.68% in classification accuracy with more accurate attention regions. The distinctively superior results compared to the state-of-the-art models make MSHT extremely promising for cytopathological image analysis.

IJCAI Conference 2023 Conference Paper

Multi-Scale Subgraph Contrastive Learning

  • Yanbei Liu
  • Yu Zhao
  • Xiao Wang
  • Lei Geng
  • Zhitao Xiao

Graph-level contrastive learning, which aims to learn representations for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply treat a graph and its augmented graph as a positive pair, and otherwise as a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with the original graph structure, and that whether two augmented graphs form a positive or negative pair is highly related to the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parameter analyses on eight real-world graph classification datasets demonstrate the effectiveness of the proposed method.
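
As an illustration of the contrastive objective underlying such architectures (a generic InfoNCE-style loss, not the paper's exact formulation), the following sketch pulls an anchor view toward its positive views and away from negatives; the function name and temperature value are assumptions.

```python
import math

def info_nce(anchor, positives, negatives, tau=0.5):
    """InfoNCE-style contrastive loss: the anchor view should be more
    similar (cosine) to positive views than to negative ones.
    Lower loss = positives dominate the softmax over similarities."""
    def cos(a, b):
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    pos = [math.exp(cos(anchor, p) / tau) for p in positives]
    neg = [math.exp(cos(anchor, n) / tau) for n in negatives]
    denom = sum(pos) + sum(neg)
    # average negative log-probability of each positive pair
    return -sum(math.log(p / denom) for p in pos) / len(pos)
```

A multi-scale variant would evaluate such a loss per scale (global vs. local subgraph views), with the positive/negative assignment decided by the semantic associations the paper describes.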

EAAI Journal 2023 Journal Article

Single-image HDR reconstruction by dual learning the camera imaging process

  • Lei She
  • Mao Ye
  • Shuai Li
  • Yu Zhao
  • Ce Zhu
  • Hu Wang

It is a very challenging problem to reconstruct a high dynamic range (HDR) image from a single exposure image. Three problems arise: the many-to-many mapping between low dynamic range (LDR) images and HDR images, the image quality problem caused by the change of dynamic range, and the problem of unpaired LDR–HDR training images. These problems can be solved to some extent through a dual learning framework that simultaneously learns the forward and reverse camera imaging processes. This framework is divided into a primary module, which reconstructs HDR from LDR, and a secondary module, which reversely maps the HDR back to LDR. The secondary module guides the learning of the primary module by constraining the outputs of the primary module. An attention mechanism is then used to solve the problem of unnatural perception caused by the change of dynamic range. Finally, taking advantage of our dual learning framework, unpaired data is further explored to train our model, which enriches the training samples. Compared with the state-of-the-art methods, a large number of quantitative and qualitative experiments confirm that our method achieves better performance.

AAAI Conference 2023 Conference Paper

Spatio-Temporal Neural Structural Causal Models for Bike Flow Prediction

  • Pan Deng
  • Yu Zhao
  • Junting Liu
  • Xiaofeng Jia
  • Mulan Wang

Bike flow prediction is the fundamental issue of managing bike-sharing systems, a representative form of public transportation. Recent methods overemphasize the spatio-temporal correlations in the data, ignoring the effects of contextual conditions on the transportation system and the inter-regional time-varying causality. In addition, due to the disturbance of incomplete observations in the data, random contextual conditions lead to spurious correlations between data and features, making the prediction of the model ineffective in special scenarios. To overcome this issue, we propose a Spatio-temporal Neural Structural Causal Model (STNSCM) from the perspective of causality. First, we build a causal graph to describe the traffic prediction, and further analyze the causal relationship between the input data, contextual conditions, spatio-temporal states, and prediction results. Second, we propose to apply the frontdoor criterion to eliminate confounding biases in the feature extraction process. Finally, we propose a counterfactual representation reasoning module to extrapolate the spatio-temporal state under the factual scenario to future counterfactual scenarios to improve the prediction performance. Experiments on real-world datasets demonstrate the superior performance of our model, especially its resistance to fluctuations caused by the external environment. The source code and data will be released.

NeurIPS Conference 2023 Conference Paper

WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction

  • Sebastian Gerard
  • Yu Zhao
  • Josephine Sullivan

We present a multi-temporal, multi-modal remote-sensing dataset for predicting how active wildfires will spread at a resolution of 24 hours. The dataset consists of 13607 images across 607 fire events in the United States from January 2018 to October 2021. For each fire event, the dataset contains a full time series of daily observations, containing detected active fires and variables related to fuel, topography and weather conditions. The dataset is challenging due to: a) its inputs being multi-temporal, b) the high number of 23 multi-modal input channels, c) highly imbalanced labels and d) noisy labels, due to smoke, clouds, and inaccuracies in the active fire detection. The underlying complexity of the physical processes adds to these challenges. Compared to existing public datasets in this area, WildfireSpreadTS allows for multi-temporal modeling of spreading wildfires, due to its time series structure. Furthermore, we provide additional input modalities and a high spatial resolution of 375m for the active fire maps. We publish this dataset to encourage further research on this important task with multi-temporal, noise-resistant or generative methods, uncertainty estimation or advanced optimization techniques that deal with the high-dimensional input space.

YNIMG Journal 2022 Journal Article

Detection of functional activity in brain white matter using fiber architecture informed synchrony mapping

  • Yu Zhao
  • Yurui Gao
  • Zhongliang Zu
  • Muwei Li
  • Kurt G. Schilling
  • Adam W. Anderson
  • Zhaohua Ding
  • John C. Gore

A general linear model is widely used for analyzing fMRI data, in which the blood oxygenation-level dependent (BOLD) signals in gray matter (GM) evoked in response to neural stimulation are modeled by convolving the time course of the expected neural activity with a canonical hemodynamic response function (HRF) obtained a priori. The maps of brain activity produced reflect the magnitude of local BOLD responses. However, detecting BOLD signals in white matter (WM) is more challenging as the BOLD signals are weaker and the HRF is different, and may vary more across the brain. Here we propose a model-free approach to detect changes in BOLD signals in WM by measuring task-evoked increases of BOLD signal synchrony in WM fibers. The proposed approach relies on a simple assumption that, in response to a functional task, BOLD signals in relevant fibers are modulated by stimulus-evoked neural activity and thereby show greater synchrony than when measured in a resting state, even if their magnitudes do not change substantially. This approach is implemented in two technical stages. First, for each voxel a fiber-architecture-informed spatial window is created with orientation distribution functions constructed from diffusion imaging data. This provides the basis for defining neighborhoods in WM that share similar local fiber architectures. Second, a modified principal component analysis (PCA) is used to estimate the synchrony of BOLD signals in each spatial window. The proposed approach is validated using a 3T fMRI dataset from the Human Connectome Project (HCP) at a group level. The results demonstrate that neural activity can be reliably detected as increases in fMRI signal synchrony within WM fibers that are engaged in a task with high sensitivities and reproducibility.
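
The second stage's synchrony measure can be illustrated with a small stdlib sketch: the fraction of total variance captured by the first principal component of the signals in a spatial window, estimated here with power iteration rather than a full PCA. This is a simplified stand-in for the modified PCA described above, and the function name is an assumption.

```python
import random

def synchrony(signals, iters=200):
    """Synchrony of a set of time series: lambda_1 / trace(C), the share
    of total variance explained by the dominant eigenvector of the
    covariance matrix C. Power iteration estimates lambda_1."""
    n = len(signals)            # voxels in the spatial window
    t = len(signals[0])         # time points
    means = [sum(s) / t for s in signals]
    centered = [[x - m for x in s] for s, m in zip(signals, means)]
    # covariance matrix C (n x n) across voxels
    C = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (t - 1)
          for j in range(n)] for i in range(n)]
    v = [random.random() + 0.1 for _ in range(n)]
    for _ in range(iters):      # power iteration for dominant eigenvector
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam1 = sum(v[i] * sum(C[i][j] * v[j] for j in range(n)) for i in range(n))
    trace = sum(C[i][i] for i in range(n))
    return lam1 / trace
```

Identical (perfectly synchronous) signals give a value of 1, while uncorrelated signals spread variance across components and score lower, matching the intuition that task-evoked modulation raises synchrony within engaged fibers.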

AAAI Conference 2022 Conference Paper

Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

  • Zhimin Li
  • Cheng Zou
  • Yu Zhao
  • Boxun Li
  • Sheng Zhong

Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding. We propose PhraseHOI, containing an HOI branch and a novel phrase branch, to leverage language priors and improve relation expression. Specifically, the phrase branch is supervised by semantic embeddings, whose ground truths are automatically converted from the original HOI annotations without extra human effort. Meanwhile, a novel label composition method is proposed to deal with the long-tailed problem in HOI, which composites novel phrase labels from semantic neighbors. Further, to optimize the phrase branch, a loss composed of a distilling loss and a balanced triplet loss is proposed. Extensive experiments are conducted to prove the effectiveness of the proposed PhraseHOI, which achieves significant improvement over the baseline and surpasses previous state-of-the-art methods on the Full and NonRare settings of the challenging HICO-DET benchmark.

JBHI Journal 2020 Journal Article

Coarse-to-Fine Adversarial Networks and Zone-Based Uncertainty Analysis for NK/T-Cell Lymphoma Segmentation in CT/PET Images

  • Xiaobin Hu
  • Rui Guo
  • Jieneng Chen
  • Hongwei Li
  • Diana Waldmannstetter
  • Yu Zhao
  • Biao Li
  • Kuangyu Shi

Extranodal natural killer/T cell lymphoma (ENKL), nasal type, is a rare disease with a low survival rate that primarily affects Asian and South American populations. Segmentation of ENKL lesions is crucial for clinical decision support and treatment planning. This paper is the first study on computer-aided diagnosis systems for the ENKL segmentation problem. We propose an automatic, coarse-to-fine approach for ENKL segmentation using adversarial networks. In the coarse stage, we extract the region of interest bounding the lesions utilizing a segmentation neural network. In the fine stage, we use an adversarial segmentation network and further introduce a multi-scale L1 loss function to drive the network to learn both global and local features. The generator and discriminator are alternately trained by backpropagation in an adversarial fashion in a min-max game. Furthermore, we present the first exploration of zone-based uncertainty estimates based on the Monte Carlo dropout technique in the context of deep networks for medical image segmentation. Specifically, we propose uncertainty criteria based on the lesion and the background, and then linearly normalize them to a specific interval. This not only provides a crucial criterion for evaluating the superiority of the algorithm, but also permits subsequent optimization by engineers and revision by clinicians after quantitatively understanding the main source of uncertainty, whether from the background or the lesion zone. Experimental results demonstrate that the proposed method is more effective and lesion-zone stable than state-of-the-art deep-learning-based segmentation models.
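
The Monte Carlo dropout idea behind such zone-based uncertainty estimates can be sketched as follows: repeated stochastic forward passes with dropout active at inference yield a per-output variance that serves as an uncertainty map. The `predict` callable and its dropout behavior below are hypothetical stand-ins for the actual segmentation network.

```python
import random

def mc_dropout_uncertainty(predict, x, n_samples=50, seed=0):
    """Monte Carlo dropout sketch: run n_samples stochastic forward
    passes of `predict(x, rng)` (a model whose dropout mask is drawn
    from `rng` at inference time) and return the per-output mean and
    variance; the variance acts as a voxel-wise uncertainty estimate."""
    rng = random.Random(seed)
    samples = [predict(x, rng) for _ in range(n_samples)]
    n_out = len(samples[0])
    mean = [sum(s[k] for s in samples) / n_samples for k in range(n_out)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n_samples
           for k in range(n_out)]
    return mean, var
```

Zone-based criteria would then aggregate this variance separately over lesion and background voxels before linearly normalizing each to a fixed interval.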

JBHI Journal 2019 Journal Article

Knowledge-Aided Convolutional Neural Network for Small Organ Segmentation

  • Yu Zhao
  • Hongwei Li
  • Shaohua Wan
  • Anjany Sekuboyina
  • Xiaobin Hu
  • Giles Tetteh
  • Marie Piraud
  • Bjoern Menze

Accurate and automatic organ segmentation is critical for computer-aided analysis towards clinical decision support and treatment planning. State-of-the-art approaches have achieved remarkable segmentation accuracy on large organs, such as the liver and kidneys. However, most of these methods do not perform well on small organs, such as the pancreas, gallbladder, and adrenal glands, especially when lacking sufficient training data. This paper presents an automatic approach for small organ segmentation with limited training data using two cascaded steps: localization and segmentation. The localization stage involves the extraction of the region of interest after registering the images to a common template; during the segmentation stage, a voxel-wise label map of the extracted region of interest is obtained and then transformed back to the original space. In the localization step, we propose to utilize a graph-based groupwise image registration method to build the template for registration so as to minimize the potential bias and avoid producing a fuzzy template. More importantly, a novel knowledge-aided convolutional neural network is proposed to improve segmentation accuracy in the second stage. This proposed network is flexible and can combine the efforts of both deep learning and traditional methods, consequently achieving better segmentation than either individual method. The ISBI 2015 VISCERAL challenge dataset is used to evaluate the presented approach. Experimental results demonstrate that the proposed method outperforms cutting-edge deep learning approaches, traditional forest-based approaches, and multi-atlas approaches in the segmentation of small organs.

JBHI Journal 2018 Journal Article

Size-Scalable Content-Based Histopathological Image Retrieval From Database That Consists of WSIs

  • Yushan Zheng
  • Zhiguo Jiang
  • Haopeng Zhang
  • Fengying Xie
  • Yibing Ma
  • Huaqiang Shi
  • Yu Zhao

Content-based image retrieval (CBIR) has been widely researched for histopathological images. It is challenging to retrieve regions with similar content from histopathological whole slide images (WSIs) for regions of interest (ROIs) of different sizes. In this paper, we propose a novel CBIR framework for a database that consists of WSIs and size-scalable query ROIs. Each WSI in the database is encoded into a matrix of binary codes. When retrieving, a group of region proposals that have a similar size to the query ROI are first located in the database through an efficient table-lookup approach. Then, these regions are ranked by a designed multi-binary-code-based similarity measurement. Finally, the top relevant regions and their locations in the WSIs, as well as the corresponding diagnostic information, are returned to assist pathologists. The effectiveness of the proposed framework is evaluated on a fine-annotated WSI database of epithelial breast tumors. The experimental results prove that the proposed framework is effective for retrieval from a database that consists of WSIs. Specifically, for query ROIs of 4096 × 4096 pixels, the retrieval precision of the top 20 returns reaches 96% and the retrieval time is less than 1.5 s.
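
A minimal sketch of the binary-code ranking step (an illustration of the general idea, not the paper's similarity measurement): each region is represented by a list of binary codes stored as integers, and candidate regions are ranked by summed Hamming distance to the query's codes. The function names are assumptions.

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def rank_regions(query_codes, regions):
    """Rank candidate regions by summed Hamming distance between the
    query ROI's binary codes and each region's codes (smaller distance
    = more similar). `regions` maps a region id to its code list."""
    def dist(codes):
        return sum(hamming(q, c) for q, c in zip(query_codes, codes))
    return sorted(regions, key=lambda rid: dist(regions[rid]))
```

XOR plus popcount makes each comparison a few machine operations, which is what keeps ranking thousands of table-lookup candidates fast.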

YNIMG Journal 2018 Journal Article

Spatio-temporal modeling of connectome-scale brain network interactions via time-evolving graphs

  • Jing Yuan
  • Xiang Li
  • Jinhe Zhang
  • Liao Luo
  • Qinglin Dong
  • Jinglei Lv
  • Yu Zhao
  • Xi Jiang

Many recent studies have revealed interesting dynamics patterns of functional brain networks derived from fMRI data. However, it has been rarely explored how functional networks spatially overlap (or interact) and how such connectome-scale network interactions temporally evolve. To explore these unanswered questions, this paper presents a novel framework for spatio-temporal modeling of connectome-scale functional brain network interactions via two main effective computational methodologies. First, to integrate, pool and compare brain networks across individuals and their cognitive states under task performances, we designed a novel group-wise dictionary learning scheme to derive connectome-scale consistent brain network templates that can be used to define the common reference space of brain network interactions. Second, the temporal dynamics of spatial network interactions is modeled by a weighted time-evolving graph, and then a data-driven unsupervised learning algorithm based on the dynamic behavioral mixed-membership model (DBMM) is adopted to identify behavioral patterns of brain networks during the temporal evolution process of spatial overlaps/interactions. Experimental results on the Human Connectome Project (HCP) task fMRI data showed that our methods can reveal meaningful, diverse behavior patterns of connectome-scale network interactions. In particular, those networks' behavior patterns are distinct across HCP tasks such as motor, working memory, language and social tasks, and their dynamics well correspond to the temporal changes of specific task designs. In general, our framework offers a new approach to characterizing human brain function by quantitative description for the temporal evolution of spatial overlaps/interactions of connectome-scale brain networks in a standard reference space.

JBHI Journal 2017 Journal Article

Breast Histopathological Image Retrieval Based on Latent Dirichlet Allocation

  • Yibing Ma
  • Zhiguo Jiang
  • Haopeng Zhang
  • Fengying Xie
  • Yushan Zheng
  • Huaqiang Shi
  • Yu Zhao

In the field of pathology, the whole slide image (WSI) has become the major carrier of visual and diagnostic information. Content-based image retrieval among WSIs can aid the diagnosis of an unknown pathological image by finding its similar regions in WSIs with diagnostic information. However, the huge size and complex content of WSIs pose several challenges for retrieval. In this paper, we propose an unsupervised, accurate, and fast retrieval method for breast histopathological images. Specifically, the method presents a local statistical feature capturing the morphology and distribution of nuclei, and employs the Gabor feature to describe the texture information. The latent Dirichlet allocation model is utilized for high-level semantic mining. Locality-sensitive hashing is used to speed up the search. Experiments on a WSI database with more than 8000 images from 15 types of breast histopathology demonstrate that our method achieves about 0.9 retrieval precision as well as promising efficiency. Based on the proposed framework, we are developing a search engine for an online digital slide browsing and retrieval platform, which can be applied in computer-aided diagnosis, pathology education, and WSI archiving and management.
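
The locality-sensitive hashing step can be illustrated with the classic random-hyperplane scheme for cosine similarity (a common LSH construction, assumed here rather than taken from the paper): each random hyperplane contributes one signature bit, so similar feature vectors tend to collide on the same signature and land in the same hash bucket.

```python
import random

def make_planes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-D feature space."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    """LSH for cosine similarity: one bit per hyperplane, set when the
    feature vector lies on its positive side. The resulting integer
    can index a hash table of candidate images."""
    bits = 0
    for p in planes:
        dot = sum(a * b for a, b in zip(vec, p))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits
```

Vectors differing only in magnitude hash identically, which suits cosine-style comparisons of topic or texture feature vectors.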

YNICL Journal 2016 Journal Article

Connectome-scale group-wise consistent resting-state network analysis in autism spectrum disorder

  • Yu Zhao
  • Hanbo Chen
  • Yujie Li
  • Jinglei Lv
  • Xi Jiang
  • Fangfei Ge
  • Tuo Zhang
  • Shu Zhang

Understanding the organizational architecture of human brain function and its alteration patterns in diseased brains, such as those of Autism Spectrum Disorder (ASD) patients, is of great interest. In-vivo functional magnetic resonance imaging (fMRI) offers a unique window to investigate the mechanism of brain function and to identify functional network components of the human brain. Previously, we have shown that multiple concurrent functional networks can be derived from fMRI signals using whole-brain sparse representation. Yet it is still an open question how to derive group-wise consistent networks featured in ASD patients and controls. Here we proposed an effective volumetric network descriptor, named the connectivity map, to compactly describe the spatial patterns of brain network maps, and implemented a fast framework in the Apache Spark environment that can effectively identify group-wise consistent networks in big fMRI datasets. Our experimental results identified 144 group-wise common intrinsic connectivity networks (ICNs) shared between ASD patients and healthy control subjects, where some ICNs are substantially different between the two groups. Moreover, further analysis of the functional connectivity and spatial overlap between these 144 common ICNs reveals connectomics signatures characterizing ASD patients and controls. In particular, the computing time of our Spark-enabled functional connectomics framework is significantly reduced from 240 hours (C++ code, single core) to 20 hours, exhibiting great potential to handle fMRI big data in the future.

YNIMG Journal 2015 Journal Article

Optimization of large-scale mouse brain connectome via joint evaluation of DTI and neuron tracing data

  • Hanbo Chen
  • Tao Liu
  • Yu Zhao
  • Tuo Zhang
  • Yujie Li
  • Meng Li
  • Hongmiao Zhang
  • Hui Kuang

Tractography based on diffusion tensor imaging (DTI) data has been used by a large number of recent studies as a tool to investigate the structural connectome. Despite its great success in offering unique 3D neuroanatomy information, DTI is an indirect observation with limited resolution and accuracy, and its reliability is still unclear. Thus, it is essential to answer this fundamental question: how reliable is DTI tractography in constructing a large-scale connectome? To answer this question, we employed neuron tracing data of 1772 experiments on the mouse brain released by the Allen Mouse Brain Connectivity Atlas (AMCA) as the ground truth to assess the performance of DTI tractography in inferring white matter fiber pathways and inter-regional connections. For the first time in the neuroimaging field, the performance of whole-brain DTI tractography in constructing a large-scale connectome has been evaluated by comparison with tracing data. Our results suggest that only with optimized tractography parameters and an appropriate scale of brain parcellation scheme can DTI produce relatively reliable fiber pathways and a large-scale connectome. Meanwhile, a considerable number of errors were also identified in the optimized DTI tractography results, which we believe could potentially be alleviated by efforts to develop better DTI tractography approaches. In this scenario, our framework could serve as a reliable and quantitative test bed to identify errors in tractography results, which will facilitate the development of such novel tractography algorithms and the selection of optimal parameters.

AAAI Conference 2015 Conference Paper

Phrase Type Sensitive Tensor Indexing Model for Semantic Composition

  • Yu Zhao
  • Zhiyuan Liu
  • Maosong Sun

Compositional semantics aims at constructing the meaning of phrases or sentences according to the compositionality of word meanings. In this paper, we propose to synchronously learn the representations of individual words and extracted high-frequency phrases. Representations of extracted phrases are treated as a gold standard for constructing more general operations to compose the representations of unseen phrases. We propose a grammatical-type-specific model that improves composition flexibility by adopting vector-tensor-vector operations. Our model embodies the compositional characteristics of the traditional additive and multiplicative models. Empirical results show that our model outperforms state-of-the-art composition methods in the task of computing phrase similarities.
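
A minimal sketch of a vector-tensor-vector composition (illustrative only; the `compose` helper and the tensor layout are assumptions, not the paper's model): each output dimension k is the bilinear form u^T T[k] v, so with identity-like tensor slices the operation reduces to the traditional multiplicative (elementwise) composition.

```python
def compose(u, v, T):
    """Phrase representation via a vector-tensor-vector operation:
    p_k = u^T T[k] v, i.e. each output dimension is a bilinear form of
    the two word vectors through one d x d slice of a 3rd-order tensor."""
    d = len(u)
    out = []
    for k in range(len(T)):       # one d x d slice per output dimension
        slice_k = T[k]
        out.append(sum(u[i] * slice_k[i][j] * v[j]
                       for i in range(d) for j in range(d)))
    return out
```

Learning a different tensor per grammatical phrase type (e.g. adjective-noun vs. verb-object) is what gives the composition its type-specific flexibility.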

IJCAI Conference 2015 Conference Paper

Representation Learning for Measuring Entity Relatedness with Rich Information

  • Yu Zhao
  • Zhiyuan Liu
  • Maosong Sun

Incorporating multiple types of relational information from heterogeneous networks has been proved effective in data mining. Although Wikipedia is one of the most famous heterogeneous networks, previous work on semantic analysis of Wikipedia is mostly limited to a single type of relation. In this paper, we aim at incorporating multiple types of relations to measure the semantic relatedness between Wikipedia entities. We propose a framework of coordinate matrix factorization to construct low-dimensional continuous representations for entities, categories and words in the same semantic space. We formulate this task as the completion of a sparse entity-entity association matrix, in which each entry quantifies the strength of relatedness between the corresponding entities. We evaluate our model on the task of judging pair-wise word similarity. Experimental results show that our model outperforms both traditional entity relatedness algorithms and other representation learning models.
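
The matrix-completion view can be sketched with a plain SGD factorization over the observed entries (a generic illustration, not the paper's coordinate matrix factorization, which jointly embeds entities, categories and words): learn embeddings P and Q such that P[i] · Q[j] approximates each observed association strength, then read unobserved entries off the dot products.

```python
import random

def factorize(entries, n_rows, n_cols, dim=4, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """SGD matrix factorization: complete a sparse association matrix
    from observed (i, j, strength) triples by fitting low-dimensional
    embeddings P, Q with score(i, j) = P[i] . Q[j], plus L2 shrinkage."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, r in entries:
            pred = sum(a * b for a, b in zip(P[i], Q[j]))
            err = r - pred
            for k in range(dim):  # gradient step on both factors
                pi, qj = P[i][k], Q[j][k]
                P[i][k] += lr * (err * qj - reg * pi)
                Q[j][k] += lr * (err * pi - reg * qj)
    return P, Q

def score(P, Q, i, j):
    """Predicted relatedness between row entity i and column entity j."""
    return sum(a * b for a, b in zip(P[i], Q[j]))
```

A coordinate version would share the entity factors across several such matrices (entity-entity, entity-category, entity-word), which is how multiple relation types land in one semantic space.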