Arrow Research search

Author name cluster

Yan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

83 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Active3D: Active High-Fidelity 3D Reconstruction via Multi-Level Uncertainty Quantification

  • Yan Li
  • Yingzhao Li
  • Gim Hee Lee

In this paper, we present an active exploration framework for high-fidelity 3D reconstruction that incrementally builds a multi-level uncertainty space and selects next-best-views through an uncertainty-driven motion planner. We introduce a hybrid implicit–explicit representation that fuses neural fields with Gaussian primitives to jointly capture global structural priors and locally observed details. Based on this hybrid state, we derive a hierarchical uncertainty volume that quantifies both implicit global structure quality and explicit local surface confidence. To focus optimization on the most informative regions, we propose an uncertainty-driven keyframe selection strategy that anchors high-entropy viewpoints as sparse attention nodes, coupled with a viewpoint-space sliding window for uncertainty-aware local refinement. The planning module formulates next-best-view selection as an Expected Hybrid Information Gain problem and incorporates a risk-sensitive path planner to ensure efficient and safe exploration. Extensive experiments on challenging benchmarks demonstrate that our approach consistently achieves state-of-the-art accuracy, completeness, and rendering quality, highlighting its effectiveness for real-world active reconstruction and robotic perception tasks.

EAAI Journal 2026 Journal Article

Deep reinforcement learning-based energy management strategy integrating physics information and expert system: efficient regenerative braking energy recovery in urban rail transit traction power system

  • Yan Li
  • Fei Lin
  • Zhongping Yang
  • Xiaochun Fang

Considering the observability of the urban rail transit traction power system (TPS) in engineering practice, traditional deep reinforcement learning (DRL)-based methods are often unsuitable for real-time online control of wayside regenerative energy utilization devices or fail to effectively recover regenerative braking energy (RBE) in scenarios involving TPS load fluctuations. To address this issue, this paper proposes a hybrid DRL-based energy management framework. The long-term management (LTM) derives an expert strategy through constrained DRL under offline conditions. The short-term management (STM) addresses real-world observability by incorporating the TPS power conservation law into the reward function and leveraging observable states at the wayside to identify load power. When operating conditions change online, STM performs real-time identification at the backend, outputs estimated load values, and feeds them back to correct the LTM inputs. The expert system then carries out online control at the frontend based on the corrected results. Simulation results based on engineering data demonstrate that the proposed method can significantly enhance energy efficiency performance under dynamic operating conditions. Across multiple case scenarios, the collaborative operation of LTM and STM achieves up to a 40% reduction in RBE losses. Compared with state-of-the-art DRL-based general energy management frameworks, the proposed method improves energy efficiency by 11%. The method not only enhances adaptability and reliability but also ensures deployability in engineering applications.

AAAI Conference 2026 Conference Paper

DSAP: Enhancing Generalization in Goal-Conditioned Reinforcement Learning

  • Yiming Wang
  • Kaiyan Zhao
  • Ming Yang
  • Yan Li
  • Furui Liu
  • Jiayu Chen
  • Leong Hou U

Goal-conditioned Reinforcement Learning (RL) is a promising direction for training agents capable of tackling a variety of tasks. However, generalizing to new goals in different environments remains a central challenge for goal-conditioned RL agents. Existing methods often rely on state abstraction, which involves learning abstracted state representations by excluding irrelevant features, to improve generalization. Despite their success in simplified settings, these methods often fail to generalize effectively to realistic environments with varied goals. In this work, we propose to enhance generalization through state abstraction from the perspective of causal inference. We hypothesize that the generalization gap arises in part due to unobserved confounders: latent variables that simultaneously influence both the global and goal states. To address this, we introduce Deconfounded State Abstraction for Policy learning (DSAP), a novel framework that mitigates backdoor confounding by employing a learned causal graph as a proxy for the hidden confounders. We provide theoretical analysis demonstrating that DSAP improves both the learning process and the generalization capability of goal-conditioned policies. Extensive experiments across different settings of multiple benchmarks show that our method significantly outperforms existing methods.

AAAI Conference 2026 Conference Paper

Explore to Learn: Latent Exploration Through Disentangled Synergy Patterns for Reinforcement Learning in Overactuated Control

  • Yiming Wang
  • Kaiyan Zhao
  • Xu Li
  • Yan Li
  • Jiayu Chen
  • Steven Morad
  • Leong Hou U

Control in high-dimensional action spaces remains a fundamental challenge in reinforcement learning (RL), primarily due to inefficient exploration of the action space. While recent methods attempt to guide exploration, they often fall short of achieving the agility and coordination exhibited in biological motor control. Inspired by how organisms exploit muscle synergies for efficient movement, we propose Explore to Learn (ETL), a two-stage framework that first discovers fundamental synergy patterns and then leverages them for task-specific policy learning. In the first stage, ETL discovers underlying synergy patterns by deploying a targeted exploration policy. These patterns are modeled as latent directions in a low-dimensional space, along which the agent is guided to collect diverse and structured muscle activation trajectories. A variational autoencoder (VAE) is then trained to encode high-dimensional actions into a latent space whose dimensions correspond to the synergy patterns. In the second stage, the policy is trained entirely in this synergy-aware latent space, producing synergy coefficients that the decoder maps back to full-dimensional muscle actions. This structured representation significantly reduces the complexity of learning, while the decoder is further fine-tuned to enhance expressiveness and generalization across downstream tasks. Extensive experiments across musculoskeletal environments and the DMControl suite demonstrate that ETL consistently outperforms prior methods in both exploration efficiency and control performance, achieving superior scalability and generalization in overactuated control tasks.

AAAI Conference 2026 Conference Paper

Latent State-Predictive Exploration for Deep Reinforcement Learning

  • Yiming Wang
  • Kaiyan Zhao
  • Borong Zhang
  • Yan Li
  • Leong Hou U

Reinforcement learning (RL) has achieved promising results in continuous control tasks, where efficient exploration of the state space is crucial for success. However, many recent RL approaches still struggle with sample inefficiency and insufficient exploration for long-horizon tasks, particularly in environments characterized by high-dimensional and complex state spaces. To address these challenges, we propose a novel exploration framework, Latent State Predictive Exploration (LSPE). The core idea behind LSPE is to endow the agent with a form of "foresight" to enhance exploration in long-horizon settings. Specifically, LSPE employs a state encoder to learn compact latent representations from high-dimensional visual observations, effectively filtering out irrelevant or noisy information. To further enrich and stabilize these representations, we incorporate a diffusion-based self-predictive module that enforces temporal consistency by predicting future states, thereby improving both exploration and downstream predictive control. Additionally, we introduce an Exploration Reward Function (ERF) that explicitly encourages the agent to visit novel latent states. This reward signal promotes more efficient and scalable exploration in complex environments. We evaluate LSPE across a diverse set of challenging long-horizon navigation and manipulation tasks, spanning simulation environments such as Habitat and Robosuite, as well as deployment on a real robot in a physical indoor environment. Experimental results show that LSPE substantially enhances exploration efficiency and scales effectively to complex, high-dimensional tasks.

AAAI Conference 2026 Conference Paper

Manipulating the Mind’s Eye: A-SAGE, the Attention-Based Attack on ViT Explainability

  • Boshi Zheng
  • Yan Li
  • Jiabin Liu

The rise of Vision Transformers (ViTs) as cornerstone models in safety-critical applications like autonomous driving and medical diagnosis has shifted the focus from pure accuracy to verifiable trustworthiness. However, the very mechanisms used to explain these models, their internal attention maps, are themselves vulnerable. This creates a critical "trust gap," as the model's apparent reasoning can be maliciously manipulated. To systematically investigate this vulnerability, we introduce A-SAGE (Attention-based Steering Adversarial Generation by Corrupting Explanations), a dual-objective attack framework that forces a model to misclassify an input while simultaneously corrupting its internal attention patterns to generate a misleading explanation. A-SAGE achieves this by optimizing a unified loss that combines a standard classification objective with two explanation-specific terms: an attention entropy loss to diffuse the model's focus and an attention map distortion loss to steer the corrupted explanation towards a desired target. Our primary finding is A-SAGE's exceptional black-box transferability. Using a CaiT-S as a white-box surrogate, adversarial examples generated with imperceptible perturbations achieve attack success rates of 79.4% on ViT-B, 49.7% on ResNet-50, and over 81.5% on other transformers (DeiT-B, TNT-S). Crucially, these successful attacks do not merely destroy the explanation; they generate a coherent but false attention map that deceptively "justifies" the wrong prediction. These results reveal a systemic vulnerability in the core reasoning of modern foundation models, establishing A-SAGE as a critical benchmark for auditing the robustness of AI explainability.

TIST Journal 2026 Journal Article

Mining High Average Utility Nonoverlapping Patterns from Sequential Database

  • Meng Geng
  • Youxi Wu
  • Yan Li
  • Jing Liu
  • Lei Guo
  • Xingquan Zhu
  • Xindong Wu

As a crucial aspect of data mining, high average utility sequential pattern mining (SPM) aims to discover low frequency and high average utility patterns (subsequences) in sequence data. Most existing high average utility SPM methods overlook the repetitive occurrences of patterns in each sequence, resulting in some important patterns being ignored. To address this issue, we focus on the problem of mining high average utility nonoverlapping patterns (HUPs) from sequential databases and propose the HUP-Miner algorithm. To reduce the need for repeated scanning of the original database, we use a position dictionary to record the occurrence information of each item. To reduce the number of candidate patterns generated, we adopt a pattern join strategy and explore four pruning strategies. To efficiently calculate the average utility of a pattern, we propose an SPC algorithm that utilizes the occurrence positions of sub-patterns. When compared with 12 competitive algorithms, the experimental results on 14 databases show that HUP-Miner gives superior results. Furthermore, we use information gain as the utility for each item, and find that the HUPs discovered in this way can generate better performance via a clustering analysis. All of the algorithms and databases used here are available from https://github.com/wuc567/Pattern-Mining/tree/master/HUP-Miner.

AAAI Conference 2026 Conference Paper

Rethink Representation Learning for Questionnaire Data

  • Guanhua Ye
  • Jifeng He
  • Yan Li
  • Junping Du
  • Zhe Xue
  • Yingxia Shao
  • Meiyu Liang
  • Yawen Li

Questionnaire data serve as a valuable resource across numerous scientific domains, offering insights into human behavior, health, and social trends. Traditional downsampling-based representation learning methods—such as standardization and one-hot encoding—reformat these data into tabular structures that inherently discard semantic richness and obscure inter-sample and inter-feature relationships. Consequently, advanced deep learning models often underperform compared to simpler approaches like gradient-boosted decision trees (GBDT), due to their limited capacity to extract meaningful representations from semantically sparse inputs. To address this limitation, we introduce SemantiQ, a novel upsampling-based representation learning framework that embeds questionnaire responses into a unified semantic space. Leveraging Retrieval-Augmented Generation (RAG) in conjunction with large language models (LLMs), SemantiQ transforms question text, option text, and external knowledge into semantically enriched natural language statements. These statements are then encoded into semantic embeddings, which are further refined through a three-stage training mechanism and test-time training (TTT), enabling the model to capture complex sample- and feature-wise dependencies. Extensive experiments on multiple real-world datasets demonstrate that SemantiQ consistently outperforms state-of-the-art baselines.

AAAI Conference 2026 Conference Paper

RiemanLine: Riemannian Manifold Representation of 3D Lines for Factor Graph Optimization

  • Yan Li
  • Ze Yang
  • Keisuke Tateno
  • Federico Tombari
  • Liang Zhao
  • Gim Hee Lee

Minimal parametrization of 3D lines plays a critical role in camera localization and structural mapping. Existing representations in robotics and computer vision predominantly handle independent lines, overlooking structural regularities such as sets of parallel lines that are pervasive in man-made environments. This paper introduces RiemanLine, a unified minimal representation for 3D lines formulated on Riemannian manifolds that jointly accommodates both individual lines and parallel-line groups. Our key idea is to decouple each line landmark into global and local components: a shared vanishing direction optimized on the unit sphere, and scaled normal vectors constrained on orthogonal subspaces, enabling compact encoding of structural regularities. For n parallel lines, the proposed representation reduces the parameter space from 4n (orthonormal form) to 2n+2, naturally embedding parallelism without explicit constraints. We further integrate this parameterization into a factor graph framework, allowing global direction alignment and local reprojection optimization within a unified manifold-based bundle adjustment. Extensive experiments on ICL-NUIM, TartanAir, and synthetic benchmarks demonstrate that our method achieves significantly more accurate pose estimation and line reconstruction, while reducing parameter dimensionality and improving convergence stability.
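
The parameter-count claim in the abstract above (4n for the orthonormal form versus 2n+2 for a parallel-line group) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the split into "2 per line plus 2 shared" is one reading of the abstract (a direction on the unit sphere has two degrees of freedom), not code from the paper.

```python
def orthonormal_params(n: int) -> int:
    """Parameter count for n independent 3D lines at 4 parameters each."""
    return 4 * n


def parallel_group_params(n: int) -> int:
    """Parameter count for n parallel lines as stated in the abstract: 2n + 2.

    Assumed reading: 2 parameters for the shared direction on the unit
    sphere, plus 2 per line for its scaled normal vector.
    """
    return 2 * n + 2


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        full = orthonormal_params(n)
        grouped = parallel_group_params(n)
        print(f"n={n}: orthonormal={full}, grouped={grouped}, saved={full - grouped}")
```

Under this count a single line costs the same in both forms, and the saving grows as 2n - 2 with the size of the parallel group.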

EAAI Journal 2026 Journal Article

Robust guaranteed neural learning-based output tracking control for uncertain nonlinear systems: An uncertainty feedback compensation method

  • Chengbo Dai
  • Jie Li
  • Zhenlong Wu
  • Yan Li
  • Donghai Li

The proven efficacy of neural network-based control schemes has spurred their application to physical systems. However, ensuring performance robustness when deploying such controllers in uncertain physical environments remains a significant challenge. This article proposes an uncertainty feedback compensation framework to guarantee the performance robustness of neural learning-based output tracking control for uncertain nonlinear systems. Active Disturbance Rejection Control (ADRC) is incorporated as an ancillary compensator, requiring only that the varying rate of uncertainty be bounded. A single critic network-based output tracking control is then developed by constructing an augmented nominal model and adaptive dynamic programming (ADP), while ADRC operates in parallel to compensate for general uncertainties in real time. Furthermore, a desired dynamic equation-based parameter tuning rule is proposed to configure ADRC for effective tracking using nominal model information. The convergence of neural network weights is established via Lyapunov analysis, and the closed-loop stability and performance robustness are further demonstrated by analyzing the boundedness of ADRC’s estimation and tracking errors under general system uncertainties. Finally, the effectiveness of the proposed method is validated through both numerical simulations and practical experiments, demonstrating substantial improvements in the safety and practical applicability of neural learning-based control.

EAAI Journal 2026 Journal Article

Siamese network for insulated gate bipolar transistor welding layer localization in computed laminography images using sharpness guidance

  • Yan Li
  • Shuangquan Liu
  • Cunfeng Wei
  • Baodong Liu
  • Chao Fan

Insulated gate bipolar transistors (IGBTs) are pivotal components in energy conversion and transmission systems. Void defects in IGBT welding layers degrade thermal exchange efficiency and operational stability, thus serving as a critical metric for packaging quality assessment. Computed laminography (CL) has become a premier non-destructive inspection technique for IGBT welding layers, benefiting from its high spatial resolution. However, CL-based three-dimensional (3D) reconstruction generates thousands of heterogeneous two-dimensional (2D) tomographic slices, and extracting high-quality welding layer images from this dataset is essential for subsequent void defect detection. A key challenge arises from the embedding of welding layers within massive tomographic slices and inter-layer aliasing artifacts that render adjacent layers highly similar, impeding accurate discrimination. To resolve this, we propose a two-stage method: first, we model inter-layer aliasing as an image defocus problem and welding layer localization as a focusing task, then optimize a sharpness evaluation algorithm to screen candidate slices; second, we adopt a transfer learning-enhanced Siamese network for efficient feature extraction and quality scoring, with optimal slices selected via score ranking. Experimental results show that the proposed method achieves a top-1 accuracy of 97.51%. By integrating high-resolution CL imaging with deep learning, this work provides a robust automated solution for IGBT packaging quality inspection, laying a foundation for reliable void defect evaluation. Our code is available at https://github.com/YanLi1808/IGBT_Layer_Localization.

AAAI Conference 2026 Conference Paper

UniAPO: Unified Multimodal Automated Prompt Optimization

  • Qipeng Zhu
  • Yanzhe Chen
  • Huasong Zhong
  • Jie Chen
  • Yan Li
  • Zhixin Zhang
  • Junping Zhang
  • Zhenheng Yang

Prompting is fundamental to unlocking the full potential of large language models. To automate and enhance this process, automatic prompt optimization (APO) has been developed, demonstrating effectiveness primarily in text-only input scenarios. However, extending existing APO methods to multimodal tasks—such as video-language generation—introduces two core challenges: (i) visual token inflation, where long visual-token sequences restrict context capacity and result in insufficient feedback signals; (ii) a lack of process-level supervision, as existing methods focus on outcome-level supervision and overlook intermediate supervision, limiting prompt optimization. We present UniAPO: Unified Multimodal Automated Prompt Optimization, the first framework tailored for multimodal APO. UniAPO adopts an EM-inspired optimization process that decouples feedback modeling and prompt refinement, making the optimization more stable and goal-driven. To further address the aforementioned challenges, we introduce a short-long term memory mechanism: historical feedback mitigates context limitations, while historical prompts provide directional guidance for effective prompt optimization. UniAPO achieves consistent gains across text, image, and video benchmarks, establishing a unified framework for efficient and transferable prompt optimization.

JMLR Journal 2026 Journal Article

Unsupervised Feature Selection via Nonnegative Orthogonal Constrained Regularized Minimization

  • Yan Li
  • Defeng Sun
  • Liping Zhang

Unsupervised feature selection has drawn wide attention in the era of big data, since it serves as a fundamental technique for dimensionality reduction. However, many existing unsupervised feature selection models and solution methods are primarily designed for practical applications, and often lack rigorous theoretical support, such as convergence guarantees. In this paper, we first establish a novel unsupervised feature selection model based on regularized minimization with nonnegative orthogonality constraints, which has advantages of embedding feature selection into the nonnegative spectral clustering and preventing overfitting. To solve the proposed model, we develop an effective inexact augmented Lagrangian multiplier method, in which the subproblems are addressed using a proximal alternating minimization approach. We rigorously prove the algorithm's sequence converges to a stationary point of the model. Extensive numerical experiments on popular datasets demonstrate the stability and robustness of our method. Moreover, comparative results show that our method outperforms some existing state-of-the-art methods in terms of clustering evaluation metrics. The code is available at https://github.com/liyan-amss/NOCRM_code.

EAAI Journal 2025 Journal Article

An agent-based emotional persuasion model driven by integrated trust assessment

  • Jinghua Wu
  • Ya Zhang
  • Ruiyang Cao
  • Yan Li

Recent research on automated negotiation has primarily focused on improving the artificial intelligence of agents and equipping them with more flexible internal mechanisms to facilitate high-quality negotiations. However, research on the systematic modeling of human-like psychological and behavioral activities and their role in the negotiation process is still in its early stages. In light of this, this paper proposes an emotional persuasion model that takes into account the effect of negotiators' integrated trust assessments on negotiation. Firstly, the paper presents a negotiation agent with both cognitive and emotional functions, detailing its internal system and operating mechanism. Secondly, the integrated trust of a negotiator is obtained by evaluating multiple single trusts, and the mapping of the integrated trust to the negotiation round parameter is modeled. Integrated trust is also parameterized into the agent's cognitive processes. Finally, the paper introduces a new framework for the generation of emotional persuasive behavior to assist agents in making new proposals. A series of experiments were conducted, yielding the following results: Compared with the non-emotional model, the performance of negotiation rounds and utility differences improved by 7.97% and 4.81%, respectively. Furthermore, the trust-driven emotional persuasion model outperformed several existing competing models by at least 31.1% in utility difference and 81.0% in negotiation rounds. Additionally, a case study of human-computer negotiation demonstrated that the agent designed using the proposed method has negotiating capabilities comparable to those of a real human, which further showcases the application effect of the artificial intelligence agent in practice.

EAAI Journal 2025 Journal Article

Application of deep learning-based multimodal fusion technology in cancer diagnosis: A survey

  • Yan Li
  • Liangrui Pan
  • Yijun Peng
  • Xiaoyu Li
  • Xiang Wang
  • Limeng Qu
  • Qiya Song
  • Qingchun Liang

Relying solely on a single type of medical data for cancer diagnosis may increase the risk of misdiagnosis and missed diagnosis. Multimodal data provides comprehensive information on disease characteristics and can effectively promote the development of precision oncology. This paper first introduces the genomic, pathological, radiological and clinical information in cancer multimodal data. Secondly, the common subfields of cancer multimodal data fusion are reviewed, with emphasis on data fusion techniques. The evolution of architectures under different fusion classes is compared, highlighting their comparative advantages and limitations. Importantly, we systematically reviewed the last five years of deep learning-based multimodal cancer data fusion, focusing on the application of multimodal techniques to cancer survival prediction and subtype typing. Finally, we present the challenges and possible solutions for multimodal applications in cancer. The purpose of this paper is to promote the fusion and application of multimodal tumor data and highlight potential research directions in the future.

IJCAI Conference 2025 Conference Paper

BILE: An Effective Behavior-based Latent Exploration Scheme for Deep Reinforcement Learning

  • Yiming Wang
  • Kaiyan Zhao
  • Yan Li
  • Leong Hou U

Efficient exploration of state spaces is critical for the success of deep reinforcement learning (RL). While many methods leverage exploration bonuses to encourage exploration instead of relying solely on extrinsic rewards, these bonus-based approaches often face challenges with learning efficiency and scalability, especially in environments with high-dimensional state spaces. To address these issues, we propose BehavIoral metric-based Latent Exploration (BILE). The core idea is to learn a compact representation within the behavioral metric space that preserves value differences between states. By introducing additional rewards to encourage exploration in this latent space, BILE drives the agent to visit states with higher value diversity and exhibit more behaviorally distinct actions, leading to more effective exploration of the state space. Additionally, we present a novel behavioral metric for efficient and robust training of the state encoder, backed by theoretical guarantees. Extensive experiments on high-dimensional environments, including realistic indoor scenarios in Habitat, robotic tasks in Robosuite, and challenging discrete Minigrid benchmarks, demonstrate the superiority and scalability of our method over other approaches.

NeurIPS Conference 2025 Conference Paper

CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations

  • Guangyi Chen
  • Yunlong Deng
  • Peiyuan Zhu
  • Yan Li
  • Yifan Shen
  • Zijian Li
  • Kun Zhang

Causal Representation Learning (CRL) aims to uncover the data-generating process and identify the underlying causal variables and relations, whose evaluation remains inherently challenging due to the requirement of known ground-truth causal variables and causal structure. Existing evaluations often rely on either simplistic synthetic datasets or downstream performance on real-world tasks, generally suffering a dilemma between realism and evaluative precision. In this paper, we introduce a new benchmark for CRL using high-fidelity simulated visual data that retains both realistic visual complexity and, more importantly, access to ground-truth causal generating processes. The dataset comprises around 200 thousand images and 3 million video frames across 24 sub-scenes in four domains: static image generation, dynamic physical simulations, robotic manipulations, and traffic situation analysis. These scenarios range from static to dynamic settings, simple to complex structures, and single to multi-agent interactions, offering a comprehensive testbed that hopefully bridges the gap between rigorous evaluation and real-world applicability. In addition, we provide flexible access to the underlying causal structures, allowing users to modify or configure them to align with the required assumptions in CRL, such as available domain labels, temporal dependencies, or intervention histories. Leveraging this benchmark, we evaluated representative CRL methods across diverse paradigms and offered empirical insights to assist practitioners and newcomers in choosing or extending appropriate CRL frameworks to properly address specific types of real problems that can benefit from the CRL perspective. Project page: https://causal-verse.github.io/, Dataset: https://huggingface.co/CausalVerse

NeurIPS Conference 2025 Conference Paper

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

  • Tao Zhang
  • Cheng Da
  • Kun Ding
  • Huan Yang
  • Kun Jin
  • Yan Li
  • Tingting Gao
  • Di Zhang

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models face challenges in handling noisy images of different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels. Accordingly, we propose the Latent Reward Model (LRM), which repurposes components of the diffusion model to predict preferences of latent images at arbitrary timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a step-level preference optimization method conducted directly in the noisy latent space. Experimental results indicate that LPO significantly improves the model's alignment with general, aesthetic, and text-image alignment preferences, while achieving a 2.5-28x training speedup over existing preference optimization methods.

IJCAI Conference 2025 Conference Paper

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

  • Kaiyan Zhao
  • Yiming Wang
  • Yuyang Chen
  • Yan Li
  • Leong Hou U
  • Xiaoguang Niu

Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity-based Experience Replay (EDER). EDER employs a determinantal point process to model the diversity between samples and prioritizes replay based on the diversity between samples. To further enhance learning efficiency, we incorporate Cholesky decomposition for handling large state spaces in realistic environments. Additionally, rejection sampling is applied to select samples with higher diversity, thereby improving overall learning efficacy. Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments.
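
The abstract above names two ingredients, a determinantal point process (DPP) for measuring diversity and Cholesky decomposition for efficiency. The following is a minimal sketch of how those two ideas fit together in general, not the EDER implementation: under a DPP, a subset's unnormalized probability is the determinant of its kernel submatrix, and the log-determinant can be read off a Cholesky factor. The RBF kernel, the `jitter` term, and all function names are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Similarity kernel between rows of X (an assumed choice of kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * length_scale ** 2))


def dpp_log_score(X: np.ndarray, jitter: float = 1e-6) -> float:
    """Log-determinant of the kernel matrix of X, computed via Cholesky.

    Higher values mean the samples are less similar to each other.
    The jitter keeps the matrix positive definite for near-duplicates.
    """
    K = rbf_kernel(X) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    return float(2.0 * np.sum(np.log(np.diag(L))))


if __name__ == "__main__":
    spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    clustered = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01]])
    print("spread   :", dpp_log_score(spread))
    print("clustered:", dpp_log_score(clustered))
```

A batch of well-separated states scores higher than a batch of near-duplicates, which is the property a diversity-based replay priority would rely on.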

EAAI Journal 2025 Journal Article

Exploiting Non-likelihood Adversarial Training for Chinese Counterfactual Data Augmentation

  • Dezhi An
  • Fang Wang
  • Shengcai Zhang
  • Yan Li

Chinese natural language processing (NLP) models often struggle with out-of-distribution generalization due to data biases and shortcut learning. Existing counterfactual data augmentation (CDA) methods heavily rely on manual intervention, limiting their scalability. To address this issue, we propose CAT-CDA (Chinese Non-likelihood Adversarial Training for Counterfactual Data Augmentation), an automated approach that generates counterfactual data with opposite labels while maintaining semantic consistency. CAT-CDA employs a classifier to identify causal features and optimizes a generator through non-likelihood adversarial training, ensuring both diversity and fluency. Experimental results demonstrate that CAT-CDA significantly improves model robustness, enhances out-of-distribution performance, and mitigates shortcut learning while requiring lower computational resources than existing methods, making it highly applicable to Chinese NLP tasks.

EAAI Journal 2025 Journal Article

Intelligent traffic accident detection system in complex dynamic scenarios based on the dual-stream spatiotemporal-fusion model

  • Huilin Liu
  • Xiaolong Hu
  • Guanghan Sun
  • Wenkang Zhang
  • Jialei Zhan
  • Haobo Fang
  • Yan Li
  • Wanqi Ma

Rapid urbanization and the global surge in motor vehicle usage have led to an increase in traffic accidents, posing threats to public safety and socio-economic development. Traditional traffic accident detection methods often face limitations in both accuracy and robustness, especially in complex scenarios involving diverse accident types and external environmental factors. This study proposes a Multi-View Dual-Stream Temporal-Spatial Accident Detection Network (MDTS-ADNet) to tackle these challenges in traffic accident detection. The MDTS-ADNet architecture systematically integrates spatial information derived from RGB visual frames with temporal dynamics extracted from optical flow data, enabling a comprehensive characterization of accident-related patterns under varying conditions. The framework incorporates specialized modules tailored for multi-scale spatial feature extraction and spatiotemporal behavior encoding, significantly enhancing the reliability and robustness of traffic accident detection. To empirically validate the proposed method, we constructed the 4 Multi-View Traffic Accident Dataset (4M-TAD), a meticulously annotated, extensive dataset capturing diverse real-world traffic environments and accident scenarios. Experimental results demonstrate that MDTS-ADNet outperforms existing state-of-the-art techniques, achieving an Area Under the Curve (AUC) of 84.33%. Further evaluation on the public AI City Challenge dataset demonstrates its strong generalization capability across different traffic domains. Additionally, we established a detection and early warning system that integrates MDTS-ADNet with monitoring equipment, substantially improving the efficiency of traffic accident detection. Overall, this research provides critical advancements in automated accident detection technologies, substantially contributing to the reduction of traffic-related fatalities and economic losses. Code is available at https://github.com/Jasoncode0115/MDTS-ADNet.

AAAI Conference 2025 Conference Paper

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

  • Jian Jia
  • Yipei Wang
  • Yan Li
  • Honggang Chen
  • Xuehan Bai
  • Zhaocheng Liu
  • Jian Liang
  • Quan Chen

Contemporary recommendation systems predominantly rely on ID embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance and poor generalizations. Leveraging the capability of large language models to comprehend and reason about textual content presents a promising avenue for advancing recommendation systems. To achieve this, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through experiments on the real large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach in industry application. We also achieve state-of-the-art performance on six Amazon Review datasets to verify the superiority of our method.

ICLR Conference 2025 Conference Paper

Learning vector fields of differential equations on manifolds with geometrically constrained operator-valued kernels

  • Daning Huang
  • Hanyang He
  • John Harlim
  • Yan Li

We address the problem of learning ordinary differential equations (ODEs) on manifolds. Existing machine learning methods, particularly those using neural networks, often struggle with high computational demands. To overcome this issue, we introduce a geometrically constrained operator-valued kernel that allows us to represent vector fields on tangent bundles of smooth manifolds. The construction of the kernel imposes the geometric constraints that are estimated from the data and ensures the computational feasibility for learning high dimensional systems of ODEs. Once the vector fields are estimated, e.g., by the kernel ridge regression, we need an ODE solver that guarantees the solution to stay on (or close to) the manifold. To overcome this issue, we propose a geometry-preserving ODE solver that approximates the exponential maps corresponding to the ODE solutions. We deduce a theoretical error bound for the proposed solver that guarantees the approximate solutions to lie on the manifold in the limit of large data. We verify the effectiveness of the proposed approach on high-dimensional dynamical systems, including the cavity flow problem, the beating and travelling waves in Kuramoto-Sivashinsky equations, and the reaction-diffusion dynamics.

AAMAS Conference 2025 Conference Paper

Lite-DIO Is Actually What You Need for Efficient Inertial Localization

  • Yan Li
  • Meng Liu
  • Zhongchen Shi
  • Yanqing Hou
  • Liang Xie
  • Hongbo Chen
  • Erwei Yin

In this work, we propose a simple and effective framework (i.e., Lite-DIO), marking the first attempt to accelerate deep inertial odometry with knowledge distillation. In Lite-DIO, we first independently construct the Transformer-based teacher model and a lightweight student network. Then, adaptive transferring knowledge is enabled between the teacher model and the student network in a dual-level contrastive distillation manner. With such design, the distilled knowledge comes from not only the teacher model’s predictions but also the latent high-order collaborative semantics preserved in embeddings. Extensive experiments conducted on three real-world datasets demonstrate that the proposed Lite-DIO significantly reduces model size and inference time compared to existing popular alternatives, while the compressed model still maintains competitive localization accuracy.

ECAI Conference 2025 Conference Paper

MInF: Multi-Band Invariant Feature Learning for Efficient Inertial Navigation

  • Yan Li
  • Yingying Wang 0003

Neural Inertial Navigation (NIN) plays a pivotal role in self-localization, aiming to infer the position of a mobile entity using noisy data from the onboard inertial measurement unit (IMU). Most existing methods rely on convolutional neural networks (CNNs) to capture dependencies among multiple variables, yet the time-frequency and invariant underlying features of IMU measurements remain underexplored. In this paper, we propose MInF, a Multi-Band Invariant Feature Learning for Efficient Inertial Navigation. The MInF advances mainly in two aspects. First, we design a Wavelet-based Multi-Band Mixer (MBMixer) for neural inertial navigation, which leverages the merits of multi-band 1D wavelet decomposition and Multi-Layer Perceptron (MLP)-based mixing to efficiently extract information in both the time and frequency domains in IMU measurements. Second, we introduce a self-supervised learning (SSL) method for learning invariant underlying features from inertial data without the need for any semantic labels. On the one hand, we learn a multi-task MBMixer via jointly classifying different transformations (i.e., pretext tasks) applied to an input signal for extracting invariant underlying features. On the other hand, we use the learned MBMixer in pretext task as the pre-trained model and fine-tune it to regress velocity in the neural inertial navigation (i.e., downstream task). Extensive experiments conducted on two real-world datasets demonstrate that the proposed MInF achieves SOTA results in neural inertial navigation, leading to 15% performance improvement while maintaining a low memory footprint and computational cost.

JBHI Journal 2025 Journal Article

P2TC: A Lightweight Pyramid Pooling Transformer-CNN Network for Accurate 3D Whole Heart Segmentation

  • Hengfei Cui
  • Yifan Wang
  • Fan Zheng
  • Yan Li
  • Yanning Zhang
  • Yong Xia

Cardiovascular disease is a leading global cause of death, requiring accurate heart segmentation for diagnosis and surgical planning. Deep learning methods have been demonstrated to achieve superior performances in cardiac structures segmentation. However, there are still limitations in 3D whole heart segmentation, such as inadequate spatial context modeling, difficulty in capturing long-distance dependencies, high computational complexity, and limited representation of local high-level semantic information. To tackle the above problems, we propose a lightweight Pyramid Pooling Transformer-CNN (P2TC) network for accurate 3D whole heart segmentation. The proposed architecture comprises a dual encoder-decoder structure with a 3D pyramid pooling Transformer for multi-scale information fusion and a lightweight large-kernel Convolutional Neural Network (CNN) for local feature extraction. The decoder has two branches for precise segmentation and contextual residual handling. The first branch is used to generate segmentation masks for pixel-level classification based on the features extracted by the encoder to achieve accurate segmentation of cardiac structures. The second branch highlights contextual residuals across slices, enabling the network to better handle variations and boundaries. Extensive experimental results on the Multi-Modality Whole Heart Segmentation (MM-WHS) 2017 challenge dataset demonstrate that P2TC outperforms the most advanced methods, achieving the Dice scores of 92.6% and 88.1% in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities respectively, which surpasses the baseline model by 1.5% and 1.7%, and achieves state-of-the-art segmentation results.

NeurIPS Conference 2025 Conference Paper

SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought

  • Guanghao Li
  • Wenhao Jiang
  • Mingfeng Chen
  • Yan Li
  • Hao Yu
  • Shuting Dong
  • Tao Ren
  • Ming Tang

Chain-of-Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step-by-step thinking. However, CoT-based methods depend on intermediate reasoning steps, which limits scalability and generalization. Recent work explores recursive reasoning, where LLMs reuse internal layers across iterations to refine latent representations without explicit CoT supervision. While promising, these approaches often require costly pretraining and lack a principled framework for how reasoning should evolve across iterations. We address this gap by introducing Flow Chain-of-Thought (Flow CoT), a reasoning paradigm that models recursive inference as a progressive trajectory of latent cognitive states. Flow CoT frames each iteration as a distinct cognitive stage—deepening reasoning across iterations without relying on manual supervision. To realize this, we propose SCOUT (Stepwise Cognitive Optimization Using Teachers), a lightweight fine-tuning framework that enables Flow CoT-style reasoning without the need for pretraining. SCOUT uses progressive distillation to align each iteration with a teacher of appropriate capacity, and a cross-attention-based retrospective module that integrates outputs from previous iterations while preserving the model’s original computation flow. Experiments across eight reasoning benchmarks show that SCOUT consistently improves both accuracy and explanation quality, achieving up to 1.8% gains under fine-tuning. Qualitative analyses further reveal that SCOUT enables progressively deeper reasoning across iterations—refining both belief formation and explanation granularity. These results not only validate the effectiveness of SCOUT, but also demonstrate the practical viability of Flow CoT as a scalable framework for enhancing reasoning in LLMs.

NeurIPS Conference 2025 Conference Paper

Towards Self-Refinement of Vision-Language Models with Triangular Consistency

  • Yunlong Deng
  • Guangyi Chen
  • Tianpei Gu
  • Lingjing Kong
  • Yan Li
  • Zeyu Tang
  • Kun Zhang

Vision-Language Models (VLMs) integrate visual knowledge with the analytical capabilities of Large Language Models (LLMs) through supervised visual instruction tuning, using image-question-answer triplets. However, the potential of VLMs trained without supervised instruction remains largely unexplored. This study validates that VLMs possess inherent self-refinement capabilities, enabling them to generate high-quality supervised data without external inputs and thereby learn autonomously. Specifically, to stimulate the self-refinement ability of VLMs, we propose a self-refinement framework based on a Triangular Consistency principle: within the image-query-answer triangle, any masked elements should be consistently and accurately reconstructed. The framework involves three steps: (1) We enable the instruction generation ability of VLMs by adding multi-task instruction tuning like image$\rightarrow$question-answer or image-answer$\rightarrow$question. (2) We generate image-query-answer triplets from unlabeled images and use the Triangular Consistency principle for filtering. (3) The model is further updated using the filtered synthetic data. To investigate the underlying mechanisms behind this self-refinement capability, we conduct a theoretical analysis from a causal perspective. Using the widely recognized LLaVA-1.5 as our baseline, our experiments reveal that the model can autonomously achieve consistent, though deliberately modest, improvements across multiple benchmarks without any external supervision, such as human annotations or environmental feedback. We expect that the insights of this study on the self-refinement ability of VLMs can inspire future research on the learning mechanism of VLMs. Code is available at https://github.com/dengyl20/SRF-LLaVA-1.5.

IJCAI Conference 2025 Conference Paper

VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding

  • Yihao Ding
  • Soyeon Caren Han
  • Yan Li
  • Josiah Poon

Visually Rich Document Understanding (VRDU) has emerged as a critical field in document intelligence, enabling automated extraction of key information from complex documents across domains such as medical, financial, and educational applications. However, form-like documents pose unique challenges due to their complex layouts, multi-stakeholder involvement, and high structural variability. Addressing these issues, the VRD-IU Competition was introduced, focusing on extracting and localizing key information from multi-format forms within the Form-NLU dataset, which includes digital, printed, and handwritten documents. This paper presents insights from the competition, which featured two tracks: Track A, emphasizing entity-based key information retrieval, and Track B, targeting end-to-end key information localization from raw document images. With over 20 participating teams, the competition showcased various state-of-the-art methodologies, including hierarchical decomposition, transformer-based retrieval, multimodal feature fusion, and advanced object detection techniques. The top-performing models set new benchmarks in VRDU, providing valuable insights into document intelligence.

NeurIPS Conference 2025 Conference Paper

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

  • Yan Shu
  • Hangui Lin
  • Yexin Liu
  • Yan Zhang
  • Gangyan Zeng
  • Yan Li
  • Yu Zhou
  • Ser Nam Lim

Large Multimodal Models (LMMs) have achieved impressive progress in visual perception and reasoning. However, when confronted with visually ambiguous or non-semantic scene text, they often struggle to accurately spot and understand the content, frequently generating semantically plausible yet visually incorrect answers, which we refer to as semantic hallucination. In this work, we investigate the underlying causes of semantic hallucination and identify a key finding: Transformer layers in LLM with stronger attention focus on scene text regions are less prone to producing semantic hallucinations. Thus, we propose a training-free semantic hallucination mitigation framework comprising two key components: (1) ZoomText, a coarse-to-fine strategy that identifies potential text regions without external detectors; and (2) Grounded Layer Correction, which adaptively leverages the internal representations from layers less prone to hallucination to guide decoding, correcting hallucinated outputs for non-semantic samples while preserving the semantics of meaningful ones. To enable rigorous evaluation, we introduce TextHalu-Bench, a benchmark of 1,740 samples spanning both semantic and non-semantic cases, with manually curated question–answer pairs designed to probe model hallucinations. Extensive experiments demonstrate that our method not only effectively mitigates semantic hallucination but also achieves strong performance on public benchmarks for scene text spotting and understanding.

YNIMG Journal 2025 Journal Article

White matter hyperintensity tissue property spatial variations as a function of cognitive status in Parkinson’s disease

  • Mariyemuguli Reheman
  • Sagar Buch
  • Naying He
  • Pei Huang
  • Qiurong Yu
  • Xinhui Wang
  • Yu Liu
  • Youmin Zhang

BACKGROUND AND PURPOSE: The pathological relationship between white matter hyperintensities (WMH) and cognitive impairment in Parkinson's disease (PD) remains unclear due to their variable locations, heterogeneity, and limited assessment of underlying tissue properties. This study integrates T2-FLAIR and quantitative MRI (qMRI) to investigate burden, spatial distribution, and extent of tissue alterations in WMH, aiming to elucidate their role in cognitive decline among PD patients. METHODS: A total of 122 age- and sex-matched PD patients and 65 healthy controls (HC) were recruited, with PD patients grouped by Montreal Cognitive Assessment (MoCA) score including normal, mild cognitive impairment (MCI) or PD with dementia (PDD). WMH burden was compared across groups and cognitive status. Water content, T1, and T2* measures were derived from qMRI data and tissue property heatmaps and periventricular distance profiles were constructed for all groups to visualize location-dependent tissue alterations of WMH relative to the lateral ventricles. In addition, voxel-wise analysis was performed to examine the correlation between WMH lesion tissue properties and MoCA scores. RESULTS: WMH volume was significantly higher in PDD compared to other groups (p < 0.05) and negatively correlated with MoCA scores (r = -0.352, p < 0.001). WMH appeared predominantly around the lateral ventricles, with anterior horn involvement common to all groups and posterior horn involvement specific to PDD. qMRI measures were significantly elevated in WMH compared to normal appearing white matter (NAWM) (p < 0.001), with heatmaps showing a negative gradient of tissue property changes from the lateral ventricles to the NAWM. Voxel-wise analysis revealed a significant negative correlation between the qMRI tissue properties of periventricular WMH and MoCA scores, with the strongest association observed in the periventricular WM situated just beyond the boundary of the lateral ventricles. 
CONCLUSION: Over and above volume differences, the spatial distribution and tissue property variations of WMH were closely linked to cognitive impairment in PD patients, with distinct patterns across different cognitive stages.

JBHI Journal 2024 Journal Article

3D-DGGAN: A Data-Guided Generative Adversarial Network for High Fidelity in Medical Image Generation

  • Jion Kim
  • Yan Li
  • Byeong-Seok Shin

Three-dimensional images are frequently used in medical imaging research for classification, segmentation, and detection. However, the limited availability of 3D images hinders research progress due to network training difficulties. Generative methods have been proposed to create medical images using AI techniques. Nevertheless, 2D approaches have difficulty dealing with 3D anatomical structures, which can result in discontinuities between slices. To mitigate these discontinuities, several 3D generative networks have been proposed. However, the scarcity of available 3D images makes training these networks with limited samples inadequate for producing high-fidelity 3D images. We propose a data-guided generative adversarial network to provide high fidelity in 3D image generation. The generator creates fake images with noise using reference code obtained by extracting features from real images. The generator also creates decoded images using reference code without noise. These decoded images are compared to the real images to evaluate fidelity in the reference code. This generation process can create high-fidelity 3D images from only a small amount of real training data. Additionally, our method employs three types of discriminator: volume (evaluates all the slices), slab (evaluates a set of consecutive slices), and slice (evaluates randomly selected slices). The proposed discriminator enhances fidelity by differentiating between real and fake images based on detailed characteristics. Results from our method are compared with existing methods by using quantitative analysis such as Fréchet inception distance and maximum mean discrepancy. The results demonstrate that our method produces more realistic 3D images than existing methods.

ICLR Conference 2024 Conference Paper

A Unified and General Framework for Continual Learning

  • Zhenyi Wang 0001
  • Yan Li
  • Li Shen 0008
  • Heng Huang 0001

Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. However, these methods lack a unified framework and common terminology for describing their approaches. This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies. Notably, this new framework is capable of encompassing established CL approaches as special instances within a unified and general optimization objective. An intriguing finding is that despite their diverse origins, these methods share common mathematical structures. This observation highlights the compatibility of these seemingly distinct techniques, revealing their interconnectedness through a shared underlying optimization objective. Moreover, the proposed general framework introduces an innovative concept called *refresh learning*, specifically designed to enhance the CL performance. This novel approach draws inspiration from neuroscience, where the human brain often sheds outdated information to improve the retention of crucial knowledge and facilitate the acquisition of new information. In essence, *refresh learning* operates by initially unlearning current data and subsequently relearning it. It serves as a versatile plug-in that seamlessly integrates with existing CL methods, offering an adaptable and effective enhancement to the learning process. Extensive experiments on CL benchmarks and theoretical analysis demonstrate the effectiveness of the proposed *refresh learning*.

AIIM Journal 2024 Journal Article

Efficient pyramid channel attention network for pathological myopia recognition with pretraining-and-finetuning

  • Xiaoqing Zhang
  • Jilu Zhao
  • Yan Li
  • Hao Wu
  • Xiangtian Zhou
  • Jiang Liu

Pathological myopia (PM) is the leading ocular disease for impaired vision worldwide. Clinically, the characteristics of pathology distribution in PM are global-local on the fundus image, which plays a significant role in assisting clinicians in diagnosing PM. However, most existing deep neural networks focused on designing complex architectures but rarely explored the pathology distribution prior of PM. To tackle this issue, we propose an efficient pyramid channel attention (EPCA) module, which fully leverages the potential of the clinical pathology prior of PM with pyramid pooling and multi-scale context fusion. Then, we construct EPCA-Net for automatic PM recognition based on fundus images by stacking a sequence of EPCA modules. Moreover, motivated by the recent pretraining-and-finetuning paradigm, we attempt to adapt pre-trained natural image models for PM recognition by freezing them and treating the EPCA and other attention modules as adapters. In addition, we construct a PM recognition benchmark termed PM-fundus by collecting fundus images of PM from publicly available datasets. The comprehensive experiments demonstrate the superiority of EPCA-Net over state-of-the-art methods in the PM recognition task. For example, EPCA-Net achieves 97.56% accuracy and outperforms ViT by 2.85% accuracy on the PM-fundus dataset. The results also show that our method based on the pretraining-and-finetuning paradigm achieves competitive performance through comparisons to part of previous methods based on traditional fine-tuning paradigm with fewer tunable parameters, which has the potential to leverage more natural image foundation models to address the PM recognition task in limited medical data regime.

JBHI Journal 2024 Journal Article

Generalizable Polyp Segmentation via Randomized Global Illumination Augmentation

  • Zuyu Zhang
  • Yan Li
  • Byeong-Seok Shin

Accurately segmenting polyps from colonoscopy images is essential for diagnosing colorectal cancer. Despite the tremendous success of the deep convolutional neural networks in automatic polyp segmentation, it suffers from domain shift issues, where the trained model yields performance deterioration on unseen test datasets. This article proposes an illumination enhancement-based domain generalization approach to improve the generalization capability of the model on unseen test datasets and alleviate this issue. In particular, an image decomposition module (IDM) was developed to separate colonoscopy images into reflectance, local, and global illumination components. An illumination transform module (ITM) was proposed to augment images with different global illuminations by synthesizing target-like global illumination maps. A novel illumination variance insensitiveness (IViSen) is also introduced to evaluate the robustness of the model against illumination disturbance. IViSen is easy to compute and correlates well with model generalizability. The segmentation performance of the proposed model on four colonoscopy datasets was examined: CVC-ClinicDB, CVC-ColonDB, ETIS-Larib, and Kvasir-SEG. The method outperformed the competitive methods when tested on unseen domains. In particular, the proposed approach yielded 60.82% and 53.19% in terms of mean Dice and IoU, respectively, with 2.06% and 2.31% improvements.
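For readers unfamiliar with the two metrics reported above, the following Python sketch shows the standard definitions of the Dice coefficient and intersection-over-union (IoU) for a pair of binary masks. The function name, the `eps` smoothing term, and the toy masks are assumptions for illustration; the paper's own evaluation code may differ (for example, in how it averages over images).

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-8):
    """Dice and IoU for two binary masks of the same shape.

    eps is a hypothetical smoothing term that avoids division by
    zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)

# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred_mask, true_mask)
```

For these toy masks the overlap is 2 pixels and the union is 4, so Dice is 4/6 and IoU is 2/4; Dice is never lower than IoU for the same pair of masks.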

AIIM Journal 2024 Journal Article

Harnessing machine learning for EEG signal analysis: Innovations in depth of anaesthesia assessment

  • Thomas Schmierer
  • Tianning Li
  • Yan Li

Anaesthesia, crucial to surgical practice, is undergoing renewed scrutiny due to the integration of artificial intelligence in its medical use. The precise control over the temporary loss of consciousness is vital to ensure safe, pain-free procedures. Traditional methods of depth of anaesthesia (DoA) assessment, reliant on physical characteristics, have proven inconsistent due to individual variations. In response, electroencephalography (EEG) techniques have emerged, with indices such as the Bispectral Index offering quantifiable assessments. This literature review explores the current scope and frontier of DoA research, emphasising methods utilising EEG signals for effective clinical monitoring. This review offers a critical synthesis of recent advances, specifically focusing on electroencephalography (EEG) techniques and their role in enhancing clinical monitoring. By examining 117 high-impact papers, the review delves into the nuances of feature extraction, model building, and algorithm design in EEG-based DoA analysis. Comparative assessments of these studies highlight their methodological approaches and performance, including clinical correlations with established indices like the Bispectral Index. The review identifies knowledge gaps, particularly the need for improved collaboration for data access, which is essential for developing superior machine learning models and real-time predictive algorithms for patient management. It also calls for refined model evaluation processes to ensure robustness across diverse patient demographics and anaesthetic agents. The review underscores the potential of technological advancements to enhance precision, safety, and patient outcomes in anaesthesia, paving the way for a new standard in anaesthetic care. The findings of this review contribute to the ongoing discourse on the application of EEG in anaesthesia, providing insights into the potential for technological advancement in this critical area of medical practice.

YNICL Journal 2024 Journal Article

Temporal evolution of microstructural integrity in cerebellar peduncles in Parkinson’s disease: Stage-specific patterns and dopaminergic correlates

  • Chentao He
  • Rui Yang
  • Siming Rong
  • Piao Zhang
  • Xi Chen
  • Qi Qi
  • Ziqi Gao
  • Yan Li

BACKGROUND: Previous research revealed differences in cerebellar white matter integrity by disease stages, indicating a compensatory role in Parkinson's disease (PD). However, the temporal evolution of cerebellar white matter microstructure in patients with PD (PwPD) remains unclear. OBJECTIVE: To unravel temporal evolution of cerebellar white matter and its dopaminergic correlates in PD. METHODS: We recruited 124 PwPD from the PPMI study. The participants were divided into two subsets: Subset 1 (n = 41) had three MRI scans (baseline, 2 years, and 4 years), and Subset 2 (n = 106) had at least two MRI scans at baseline, 1 year, and/or 2 years. Free water-corrected diffusion metrics were used to measure the microstructural integrity in cerebellar peduncles (CP), the main white matter tracts connecting to and from the cerebellum. The ACAPULCO processing pipeline was used to assess cerebellar lobules volumes. Linear mixed-effect models were used to study longitudinal changes. We also examined the relationships between microstructural integrity in CP, striatal dopamine transporter specific binding ratio (SBR), and clinical symptoms. RESULTS: Microstructural changes in CP showed a non-linear pattern in PwPD. Free water-corrected fractional anisotropy (FAt) increased in the first two years but declined from 2 to 4 years, while free water-corrected mean diffusivity exhibited the opposite trend. The initial increased FAt in CP correlated with cerebellar regional volume atrophy, striatal dopaminergic SBR decline, and worsening clinical symptoms, but this correlation varied across disease stages. CONCLUSIONS: Our findings suggest a non-linear evolution of microstructural integrity in CP throughout the course of PD, indicating the adaptive structural reorganization of the cerebellum simultaneously with progressive striatal dopaminergic degeneration in PD.

YNIMG Journal 2024 Journal Article

The enhanced connectivity between the frontoparietal, somatomotor network and thalamus as the most significant network changes of chronic low back pain

  • Kun Zhu
  • Jianchao Chang
  • Siya Zhang
  • Yan Li
  • Junxun Zuo
  • Haoyu Ni
  • Bingyong Xie
  • Jiyuan Yao

The prolonged duration of chronic low back pain (cLBP) inevitably leads to changes in the cognitive, attentional, sensory and emotional processing brain regions. Currently, it remains unclear how these alterations are manifested in the interplay between brain functional and structural networks. This study aimed to predict the Oswestry Disability Index (ODI) in cLBP patients using multimodal brain magnetic resonance imaging (MRI) data and to identify the most significant features within the multimodal networks to aid in distinguishing patients from healthy controls (HCs). We constructed dynamic functional connectivity (dFC) and structural connectivity (SC) networks for all participants (n = 112) and employed the Connectome-based Predictive Modeling (CPM) approach to predict ODI scores, utilizing various feature selection thresholds to identify the most significant network change features in dFC and SC outcomes. Subsequently, we utilized these significant features for optimal classifier selection and the integration of multimodal features. The results revealed enhanced connectivity among the frontoparietal network (FPN), somatomotor network (SMN) and thalamus in cLBP patients compared to HCs. The thalamus transmits pain-related sensations and emotions to the cortical areas through the dorsolateral prefrontal cortex (dlPFC) and primary somatosensory cortex (SI), leading to alterations in whole-brain network functionality and structure. Regarding the model selection for the classifier, we found that Support Vector Machine (SVM) best fit these significant network features. The combined model based on dFC and SC features significantly improved classification performance between cLBP patients and HCs (AUC = 0.9772). Finally, the results from an external validation set support our hypotheses and provide insights into the potential applicability of the model in real-world scenarios. Our discovery of enhanced connectivity between the thalamus and both the dlPFC (FPN) and SI (SMN) provides a valuable supplement to prior research on cLBP.

JBHI Journal 2023 Journal Article

An Improved Combination of Faster R-CNN and U-Net Network for Accurate Multi-Modality Whole Heart Segmentation

  • Hengfei Cui
  • Yifan Wang
  • Yan Li
  • Di Xu
  • Lei Jiang
  • Yong Xia
  • Yanning Zhang

Detailed information on the substructures of the whole heart is usually vital in the diagnosis of cardiovascular diseases and in 3D modeling of the heart. Deep convolutional neural networks have been demonstrated to achieve state-of-the-art performance in 3D cardiac structure segmentation. However, when dealing with high-resolution 3D data, current methods employing tiling strategies usually degrade segmentation performance due to GPU memory constraints. This work develops a two-stage multi-modality whole heart segmentation strategy, which adopts an improved Combination of Faster R-CNN and 3D U-Net (CFUN+). More specifically, the bounding box of the heart is first detected by Faster R-CNN, and then the original Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) images of the heart aligned with the bounding box are input into 3D U-Net for segmentation. The proposed CFUN+ method redefines the bounding box loss function by replacing the previous Intersection over Union (IoU) loss with Complete Intersection over Union (CIoU) loss. Meanwhile, the integration of the edge loss makes the segmentation results more accurate, and also improves the convergence speed. The proposed method achieves an average Dice score of 91.1% on the Multi-Modality Whole Heart Segmentation (MM-WHS) 2017 challenge CT dataset, which is 5.2% higher than the baseline CFUN model, and achieves state-of-the-art segmentation results. In addition, the segmentation time for a single heart has been dramatically reduced from a few minutes to less than 6 seconds.
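As a rough illustration of why a CIoU term can be preferable to a plain IoU term for bounding box regression, the sketch below computes the commonly published CIoU loss for two axis-aligned 2D boxes. It is not the CFUN+ implementation: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, the 2D setting, and the `eps` stabilizer are all assumptions made for this example.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two axis-aligned 2D boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration only; the paper's own loss
    (and its box parameterization) may differ.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))
apart = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
```

For two disjoint boxes a plain IoU loss is always 1 regardless of how far apart they are, whereas the centre-distance term here keeps the loss sensitive to that distance.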

EAAI Journal 2023 Journal Article

Deep learning-powered vessel traffic flow prediction with spatial-temporal attributes and similarity grouping

  • Yan Li
  • Maohan Liang
  • Huanhuan Li
  • Zaili Yang
  • Liang Du
  • Zhongshuo Chen

Perceiving the future trend of Vessel Traffic Flow (VTF) in advance has great application value in the maritime industry. However, using such big data from the Automatic Identification System (AIS) for accurate VTF prediction remains challenging. Deep training networks can learn valuable features from extensive historical data. This paper proposes a new learning-based prediction network, an improved Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) with similarity grouping, which includes three views. To effectively enable the training network to capture the temporal and periodic (i.e., a spatial attribute) change characteristics of VTF, the CNN and LSTM are employed to compose spatial and temporal views, respectively. Hence, the original one-dimensional data is transformed into a matrix (hour of the day × day) to fit the input of the proposed methodology. In practical applications, the VTF of multiple adjacent target regions needs to be predicted simultaneously, and the changes of VTF in different areas may influence each other. To explore their hidden relationships, the similarity grouping view aims to find the target area that exhibits the most similarity with the VTF change trend of the current research area. Furthermore, similar information is combined with the features generated from the other two views to obtain the prediction results. In summary, the new advantage lies in mining the spatiotemporal attributes of data and fusing the similarity information of adjacent regions. Comparative experiments with eleven other methods on realistic VTF datasets show that the proposed method demonstrates superior prediction accuracy and stability performance.
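The reshaping of a one-dimensional series into an hour-of-day by day matrix, as described in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `to_hour_day_matrix`, the assumption of hourly samples starting at hour 0, and the choice to drop a trailing partial day are all hypothetical.

```python
def to_hour_day_matrix(hourly_counts, hours_per_day=24):
    """Reshape a flat hourly series into rows = hour of day, columns = day.

    Assumes the series starts at hour 0 of the first day; any trailing
    partial day is dropped (an assumption made for this sketch).
    """
    n_days = len(hourly_counts) // hours_per_day
    if n_days == 0:
        raise ValueError("need at least one full day of hourly samples")
    trimmed = hourly_counts[: n_days * hours_per_day]
    # Entry [h][d] is the count observed at hour h on day d.
    return [
        [trimmed[d * hours_per_day + h] for d in range(n_days)]
        for h in range(hours_per_day)
    ]


matrix = to_hour_day_matrix(list(range(48)))
```

In this layout each row collects the same hour across consecutive days, so daily periodicity appears along rows, which is presumably what lets a convolutional view pick it up.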

YNIMG Journal 2023 Journal Article

Diagnosing Parkinson's disease by combining neuromelanin and iron imaging features using an automated midbrain template approach

  • Mojtaba Jokar
  • Zhijia Jin
  • Pei Huang
  • Ying Wang
  • Youmin Zhang
  • Yan Li
  • Zenghui Cheng
  • Yu Liu

BACKGROUND AND PURPOSE: Early diagnosis of Parkinson's disease (PD) is still a clinical challenge. Most previous studies using manual or semi-automated methods for segmenting the substantia nigra (SN) are time-consuming and, despite raters being well-trained, individual variation can be significant. In this study, we used a template-based, automatic, SN subregion segmentation pipeline to detect the neuromelanin (NM) and iron features in the SN and SN pars compacta (SNpc) derived from a single 3D magnetization transfer contrast (MTC) gradient echo (GRE) sequence in an attempt to develop a comprehensive imaging biomarker that could be used to diagnose PD. MATERIALS AND METHODS: volume, SNpc volume and iron content with a variety of thresholds as well as the N1 sign in diagnosing PD. Correlation analyses were performed to study the relationship between these imaging measures and the clinical scales in PD. RESULTS: = 0.04, p = 0.013) in PD patients. CONCLUSION: volume, SNpc volume and iron content) resulted in an AUC of 0.947 and provided a comprehensive set of imaging biomarkers that, potentially, could be used to diagnose PD clinically.

AIIM Journal 2023 Journal Article

Diagnosis of Alzheimer’s disease by joining dual attention CNN and MLP based on structural MRIs, clinical and genetic data

  • Yan-Rui Qiang
  • Shao-Wu Zhang
  • Jia-Ni Li
  • Yan Li
  • Qin-Yi Zhou

Alzheimer’s disease (AD) is an irreversible degenerative disease of the central nervous system, while mild cognitive impairment (MCI) is a precursor state of AD. Accurate early diagnosis of AD is conducive to the prevention and early intervention treatment of AD. Although some computational methods have been developed for AD diagnosis, most employ only neuroimaging, ignoring other data (e.g., genetic, clinical) that may carry disease information. In addition, the results of some methods lack interpretability. In this work, we propose a novel method (called DANMLP) that joins a dual attention convolutional neural network (CNN) and a multilayer perceptron (MLP) for computer-aided AD diagnosis by integrating multi-modality data: structural magnetic resonance imaging (sMRI), clinical data (i.e., demographics, neuropsychology), and APOE genetic data. Our DANMLP consists of four primary components: (1) the Patch-CNN for extracting the image characteristics from each local patch, (2) the position self-attention block for capturing the dependencies between features within a patch, (3) the channel self-attention block for capturing dependencies of inter-patch features, (4) two MLP networks for extracting the clinical features and outputting the AD classification results, respectively. Compared with other state-of-the-art methods in the 5CV test, DANMLP achieves 93% and 82.4% classification accuracy for the AD vs. MCI and MCI vs. NC tasks on the ADNI database, which is 0.2% to 15.2% and 3.4% to 26.8% higher than that of five other methods, respectively. The individualized visualization of focal areas can also help clinicians in the early diagnosis of AD. These results indicate that DANMLP can be effectively used for diagnosing AD and MCI patients.

YNIMG Journal 2023 Journal Article

Hyperpolarized [2–13C]pyruvate MR molecular imaging with whole brain coverage

  • Brian T. Chung
  • Yaewon Kim
  • Jeremy W. Gordon
  • Hsin-Yu Chen
  • Adam W. Autry
  • Philip M. Lee
  • Jasmine Y. Hu
  • Chou T. Tan

Hyperpolarized (HP) 13C Magnetic Resonance Imaging (MRI) was applied for the first time to image and quantify the uptake and metabolism of [2–13C]pyruvate in the human brain to provide new metabolic information on cerebral energy metabolism. HP [2–13C]pyruvate was injected intravenously and imaged in 5 healthy human volunteer exams with whole brain coverage in a 1-minute acquisition using a specialized spectral-spatial multi-slice echoplanar imaging (EPI) pulse sequence to acquire 13C-labeled volumetric and dynamic images of [2–13C]pyruvate and downstream metabolites [5–13C]glutamate and [2–13C]lactate. Metabolic ratios and apparent conversion rates of pyruvate-to-lactate ($k_{PL}$) and pyruvate-to-glutamate ($k_{PG}$) were quantified to investigate simultaneously glycolytic and oxidative metabolism in a single injection.

YNICL Journal 2023 Journal Article

Locus coeruleus and substantia nigra neuromelanin magnetic resonance imaging differentiates Parkinson’s disease and essential tremor

  • Xinhui Wang
  • Pei Huang
  • Ewart Mark Haacke
  • Yu Liu
  • Youmin Zhang
  • Zhijia Jin
  • Yan Li
  • Qiuyun Xu

BACKGROUND: Differential diagnosis of essential tremor (ET) and Parkinson's disease (PD) can still be a challenge in clinical practice. These two tremor disorders may have different pathogenesis related to the substantia nigra (SN) and locus coeruleus (LC). Characterizing neuromelanin (NM) in these structures may help improve the differential diagnosis. METHODS: from ET was assessed with a receiver operative characteristic curve, and the area under the curve (AUC) was calculated. RESULTS: from ET. CONCLUSION: and ET, and the investigation of the underlying pathophysiology.

YNICL Journal 2023 Journal Article

Multi-parametric hyperpolarized 13C/1H imaging reveals Warburg-related metabolic dysfunction and associated regional heterogeneity in high-grade human gliomas

  • Adam W. Autry
  • Sana Vaziri
  • Marisa LaFontaine
  • Jeremy W. Gordon
  • Hsin-Yu Chen
  • Yaewon Kim
  • Javier E. Villanueva-Meyer
  • Annette Molinaro

BACKGROUND: C imaging approach, we investigated dynamic and steady-state metabolism, together with physiological parameters, in high-grade gliomas to characterize active tumor. METHODS: and treatment effects. RESULTS: C]lactate and modified ratios relative to treatment effects. CONCLUSIONS: H imaging techniques.

NeurIPS Conference 2023 Conference Paper

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

  • Alexander Bukharin
  • Yan Li
  • Yue Yu
  • Qingru Zhang
  • Zhehui Chen
  • Simiao Zuo
  • Chao Zhang
  • Songan Zhang

Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real-world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we show that we can gain robustness by controlling a policy’s Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Motivated by these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. The ERNIE framework provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. However, ERNIE’s adversarial regularization may introduce some training instability. To reduce this instability, we reformulate adversarial regularization as a Stackelberg game. We demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments. In addition, we extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest. Our code is available at https://github.com/abukharin3/ERNIE.

EAAI Journal 2023 Journal Article

TFG-Net: Tropical Cyclone Intensity Estimation from a Fine-grained perspective with the Graph convolution neural network

  • Guangning Xu
  • Yan Li
  • Chi Ma
  • Xutao Li
  • Yunming Ye
  • Qingquan Lin
  • Zhichao Huang
  • Shidong Chen

Tropical Cyclone Intensity Estimation (TIE) is a fundamental study subject for tropical cyclone development, flood or landslide avoidance, etc. Despite considerable efforts, two main challenges remain unresolved in this critical endeavor. The first challenge is that the TIE task is frequently conducted as a coarse-grained recognition problem rather than a fine-grained one. The second challenge is that the prediction fails to consider general wind speed information. To address these two challenges, we offer a novel model, namely Tropical cyclone intensity estimation from a Fine-grained perspective with the Graph convolution neural Network (TFG-Net). It is composed of three key components, viz., the Backbone, the Fine-grained Tropical cyclone Features Extractor (FTFE), and the Wind Scale Transition Rule Generator (WTRG), which aim at extracting general spatial features, subtle spatial features, and general wind speed information, respectively. To validate the proposed method, extensive experiments on a well-known real-world tropical dataset named GridSat were carried out. Following the standard benchmark task setting, in which the model estimates the wind speed from a given satellite image, the proposed TFG-Net reaches 11.12 knots in the RMSE metric, improving on the traditional method and the state-of-the-art deep learning method by 33.33% and 2.54%, respectively. The code is available on GitHub: https://github.com/xuguangning1218/TI_Estimation and its reproducible result is available on Code Ocean: https://doi.org/10.24433/CO.6606867.v1.

AAAI Conference 2022 Conference Paper

A Hybrid Causal Structure Learning Algorithm for Mixed-Type Data

  • Yan Li
  • Rui Xia
  • Chunchen Liu
  • Liang Sun

Inferring the causal structure of a set of random variables is a crucial problem in many disciplines of science. Over the past two decades, various approaches have been proposed for causal discovery from observational data. However, most of the existing methods are designed for either purely discrete or continuous data, which limits their practical usage. In this paper, we target the problem of causal structure learning from observational mixed-type data. Although there are a few methods that are able to handle mixed-type data, they suffer from restrictions, such as linear assumptions and poor scalability. To overcome these weaknesses, we formulate the causal mechanisms via a mixed structure equation model and prove its identifiability under mild conditions. A novel locally consistent score, named CVMIC, is proposed for causal directed acyclic graph (DAG) structure learning. Moreover, we propose an efficient conditional independence test, named MRCIT, for mixed-type data, which is used in causal skeleton learning and final pruning to further improve the computational efficiency and precision of our model. Experimental results on both synthetic and real-world data demonstrate that our proposed hybrid model outperforms the other state-of-the-art methods. Our source code is available at https://github.com/DAMO-DI-ML/AAAI2022-HCM.

YNICL Journal 2022 Journal Article

Assessment of higher-order singular value decomposition denoising methods on dynamic hyperpolarized [1-13C]pyruvate MRI data from patients with glioma

  • Sana Vaziri
  • Adam W. Autry
  • Marisa LaFontaine
  • Yaewon Kim
  • Jeremy W. Gordon
  • Hsin-Yu Chen
  • Jasmine Y. Hu
  • Janine M. Lupo

BACKGROUND: C]pyruvate MRI data acquired from patients with glioma. METHODS: ) conversion rates within regions of interest (ROIs) before and after denoising was then compared. RESULTS: modeling error increased from 0% to 15% (TRI) and 8% (GL-HOSVD). CONCLUSION: C data and thereby improve monitoring of metabolic changes in patients with glioma following treatment.

NeurIPS Conference 2022 Conference Paper

Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement

  • Yan Li
  • Xinjiang Lu
  • Yaqing Wang
  • Dejing Dou

Time series forecasting has been a widely explored task of great importance in many applications. However, it is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series. In this work, we propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder (BVAE) equipped with diffusion, denoise, and disentanglement, namely D3VAE. Specifically, a coupled diffusion probabilistic model is proposed to augment the time series data without increasing the aleatoric uncertainty and implement a more tractable inference process with BVAE. To ensure the generated series move toward the true target, we further propose to adapt and integrate the multiscale denoising score matching into the diffusion process for time series forecasting. In addition, to enhance the interpretability and stability of the prediction, we treat the latent variable in a multivariate manner and disentangle them on top of minimizing total correlation. Extensive experiments on synthetic and real-world data show that D3VAE outperforms competitive algorithms with remarkable margins. Our implementation is available at https://github.com/PaddlePaddle/PaddleSpatial/tree/main/research/D3VAE.

AAAI Conference 2022 Conference Paper

Neighborhood-Adaptive Structure Augmented Metric Learning

  • Pandeng Li
  • Yan Li
  • Hongtao Xie
  • Lei Zhang

Most metric learning techniques typically focus on sample embedding learning, while implicitly assuming a homogeneous local neighborhood around each sample, based on the metrics used in training (e.g., hypersphere for Euclidean distance or unit hyperspherical crown for cosine distance). As real-world data often lies on a low-dimensional manifold curved in a high-dimensional space, it is unlikely that every region of the manifold shares the same local structures in the input space. Besides, considering the non-linearity of neural networks, the local structure in the output embedding space may not be as homogeneous as assumed. Therefore, representing each sample simply with its embedding while ignoring its individual neighborhood structure would have limitations in Embedding-Based Retrieval (EBR). By exploiting the heterogeneity of local structures in the embedding space, we propose a Neighborhood-Adaptive Structure Augmented metric learning framework (NASA), where the neighborhood structure is realized as a structure embedding, and learned along with the sample embedding in a self-supervised manner. In this way, without any modifications, most indexing techniques can be used to support large-scale EBR with NASA embeddings. Experiments on six standard benchmarks with two kinds of embeddings, i.e., binary embeddings and real-valued embeddings, show that our method significantly outperforms the state-of-the-art methods.

YNIMG Journal 2022 Journal Article

Predicting brain age from functional connectivity in symptomatic and preclinical Alzheimer disease

  • Peter R. Millar
  • Patrick H. Luckett
  • Brian A. Gordon
  • Tammie L.S. Benzinger
  • Suzanne E. Schindler
  • Anne M. Fagan
  • Randall J. Bateman
  • Jae-Hong Lee

"Brain-predicted age" quantifies apparent brain age compared to normative neuroimaging trajectories. Advanced brain-predicted age has been well established in symptomatic Alzheimer disease (AD), but is underexplored in preclinical AD. Prior brain-predicted age studies have typically used structural MRI, but resting-state functional connectivity (FC) remains underexplored. Our model predicted age from FC in 391 cognitively normal, amyloid-negative controls (ages 18-89). We applied the trained model to 145 amyloid-negative, 151 preclinical AD, and 156 symptomatic AD participants to test group differences. The model accurately predicted age in the training set. FC-predicted brain age gaps (FC-BAG) were significantly older in symptomatic AD and significantly younger in preclinical AD compared to controls. There was minimal correspondence between networks predictive of age and AD. Elevated FC-BAG may reflect network disruption during symptomatic AD. Reduced FC-BAG in preclinical AD was opposite to the expected direction, and may reflect a biphasic response to preclinical AD pathology or may be driven by inconsistency between age-related vs. AD-related networks. Overall, FC-predicted brain age may be a sensitive AD biomarker.

IJCAI Conference 2022 Conference Paper

Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction

  • De Cheng
  • Yan Li
  • Dingwen Zhang
  • Nannan Wang
  • Xinbo Gao
  • Jiande Sun

Single image dehazing, as a fundamental low-level vision task, is essential for the development of robust intelligent surveillance systems. In this paper, we make an early effort to consider dehazing robustness under variational haze density, which is a realistic yet under-studied problem in the research field of single image dehazing. To properly address this problem, we propose a novel density-variational learning framework to improve the robustness of the image dehazing model, assisted by a variety of negative hazy images, to better deal with various complex hazy scenarios. Specifically, the dehazing network is optimized under the consistency-regularized framework with the proposed Contrast-Assisted Reconstruction Loss (CARL). The CARL can fully exploit the negative information to facilitate the traditional positive-oriented dehazing objective function, by squeezing the dehazed image to its clean target from different directions. Meanwhile, the consistency regularization keeps outputs consistent given multi-level hazy images, thus improving the model robustness. Extensive experimental results on two synthetic and three real-world datasets demonstrate that our method significantly surpasses the state-of-the-art approaches.

JBHI Journal 2021 Journal Article

Deep Learning-Based End-to-End Diagnosis System for Avascular Necrosis of Femoral Head

  • Yang Li
  • Yan Li
  • Hua Tian

As the first diagnostic imaging modality of avascular necrosis of the femoral head (AVNFH), accurately staging AVNFH from a plain radiograph is critical yet challenging for orthopedists. Thus, we propose a deep learning-based AVNFH diagnosis system (AVN-net). The proposed AVN-net reads plain radiographs of the pelvis, conducts diagnosis, and visualizes results automatically. Deep convolutional neural networks are trained to provide an end-to-end diagnosis solution, covering tasks of femoral head detection, exam-view identification, side classification, AVNFH diagnosis, and key clinical notes generation. AVN-net is able to obtain state-of-the-art testing AUC of 0.97 (95% CI: 0.97-0.98) in AVNFH detection and significantly greater F1 scores than less-to-moderately experienced orthopedists in all diagnostic tests (p < 0.01). Furthermore, two real-world pilot studies were conducted for diagnosis support and education assistance, respectively, to assess the utility of AVN-net. The experimental results are promising. With the AVN-net diagnosis as a reference, the diagnostic accuracy and consistency of all orthopedists considerably improved while requiring only 1/4 of the time. Students self-studying the AVNFH diagnosis using AVN-net can learn better and faster than the control group. To the best of our knowledge, this study is the first research on the prospective use of a deep learning-based diagnosis system for AVNFH by conducting two pilot studies representing real-world application scenarios. We have demonstrated that the proposed AVN-net achieves expert-level AVNFH diagnosis performance, provides efficient support in clinical decision-making, and effectively passes clinical experience to students.

AAAI Conference 2021 Conference Paper

Deep Metric Learning with Self-Supervised Ranking

  • Zheren Fu
  • Yan Li
  • Zhendong Mao
  • Quan Wang
  • Yongdong Zhang

Deep metric learning aims to learn a deep embedding space, where similar objects are pulled together and different objects are pushed apart. Existing approaches typically use inter-class characteristics, e.g., class-level information or instance-level similarity, to obtain semantic relevance of data points and get a large margin between different classes in the embedding space. However, the intra-class characteristics, e.g., local manifold structure or relative relationship within the same class, are usually overlooked in the learning process. Hence the data structure cannot be fully exploited and the output embeddings have limitations in retrieval. More importantly, retrieval results lack a good ranking. This paper presents a novel self-supervised ranking auxiliary framework, which captures intra-class characteristics as well as inter-class characteristics for better metric learning. Our method defines specific transform functions to simulate the intra-class local structure change in the initial image domain, and formulates a self-supervised learning procedure to fully exploit this property and preserve it in the embedding space. Extensive experiments on three standard benchmarks show that our method outperforms the state-of-the-art methods on both retrieval and ranking by 2%-4%.

TCS Journal 2021 Journal Article

Equitable list tree-coloring of bounded treewidth graphs

  • Yan Li
  • Xin Zhang

The equitable list tree-coloring model is a useful tool to formulate a structure decomposition problem on the complex network with some security considerations. In this paper, it is proved that the equitable list vertex arboricity of every graph with treewidth $\omega$ is at most $\lceil \Delta(G)/2 \rceil + \omega - 2$ whenever $\Delta(G) \ge 4\omega + 1$, and moreover, if such a graph does not contain $K_{3,3}$ as a topological minor, then its equitable list vertex arboricity is at most $\lceil \Delta(G)/2 \rceil$ provided that $\omega \in \{2, 3, 4\}$ and $\Delta(G) \ge 6\omega - 3$.
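To make the two bounds concrete, the sketch below simply evaluates the formulas exactly as stated in the abstract for a given maximum degree and treewidth. The function name `equitable_list_arboricity_bound`, its parameters, and the convention of returning `None` when neither hypothesis applies are assumptions for illustration; the code checks no graph property itself.

```python
def equitable_list_arboricity_bound(max_degree, treewidth,
                                    k33_topological_minor_free=False):
    """Upper bound on the equitable list vertex arboricity, per the abstract.

    Hypothetical helper: the caller supplies Delta(G), the treewidth, and
    whether G has no K_{3,3} topological minor. Returns None when neither
    stated hypothesis is met.
    """
    half_degree = (max_degree + 1) // 2  # ceil(Delta(G) / 2) for integers

    # Sharper bound for graphs without a K_{3,3} topological minor.
    if (k33_topological_minor_free
            and treewidth in (2, 3, 4)
            and max_degree >= 6 * treewidth - 3):
        return half_degree

    # General bound for bounded-treewidth graphs.
    if max_degree >= 4 * treewidth + 1:
        return half_degree + treewidth - 2

    return None


general = equitable_list_arboricity_bound(13, 3)
sharper = equitable_list_arboricity_bound(15, 3, k33_topological_minor_free=True)
```

With treewidth 3, a maximum degree of 13 meets the first hypothesis and gives 7 + 3 - 2 = 8, while a maximum degree of 15 with no $K_{3,3}$ topological minor meets the second and gives 8 directly.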

YNIMG Journal 2021 Journal Article

Imaging iron and neuromelanin simultaneously using a single 3D gradient echo magnetization transfer sequence: Combining neuromelanin, iron and the nigrosome-1 sign as complementary imaging biomarkers in early stage Parkinson's disease

  • Naying He
  • Kiarash Ghassaban
  • Pei Huang
  • Mojtaba Jokar
  • Ying Wang
  • Zenghui Cheng
  • Zhijia Jin
  • Yan Li

Diagnosing early stage Parkinson's disease (PD) is still a clinical challenge. Previous studies using iron, neuromelanin (NM) or the Nigrosome-1 (N1) sign in the substantia nigra (SN) by themselves have been unable to provide sufficiently high diagnostic performance for these methods to be adopted clinically. Our goal in this study was to extract the NM complex volume, iron content and volume representing the entire SN, and the N1 sign as potential complementary imaging biomarkers using a single 3D magnetization transfer contrast (MTC) gradient echo sequence and to evaluate their diagnostic performance and clinical correlations in early stage PD. A total of 40 early stage idiopathic PD subjects and 40 age- and sex-matched healthy controls (HCs) were imaged at 3T. NM boundaries (representing the SN pars compacta (SNpc) and parabrachial pigmented nucleus) and iron boundaries representing the total SN (SNpc and SN pars reticulata) were determined semi-automatically using a dynamic programming (DP) boundary detection algorithm. Receiver operating characteristic analyses were performed to evaluate the utility of these imaging biomarkers in diagnosing early stage PD. A correlation analysis was used to study the relationship between these imaging measures and the clinical scales. We also introduced the concept of NM and total iron overlap volumes to demonstrate the loss of NM relative to the iron containing SN. Furthermore, all 80 cases were evaluated for the N1 sign independently. The NM and SN volumes were lower while the iron content was higher in the SN for PD subjects compared to HCs. Interestingly, the PD subjects with bilateral loss of the N1 sign had the highest iron content. The area under the curve (AUC) values for the average of both hemispheres for single measures were: .960 for NM complex volume; .788 for total SN volume; .740 for SN iron content and .891 for the N1 sign. Combining NM complex volume with each of the following measures through binary logistic regression led to AUC values for the averaged right and left sides of: .976 for total iron content; .969 for total SN volume; .965 for overlap volume and .983 for the N1 sign. We found a negative correlation between SN volume and UPDRS-III (R² = .22, p = .002). While the N1 sign performed well, it does not contain any quantitative information about iron content or NM; therefore, marrying this sign with the NM and iron measures provides a better physiological explanation of what is happening when the N1 sign disappears in PD subjects. In summary, the combination of NM complex volume, SN volume, iron content and the N1 sign as derived from a single MTC sequence provides complementary information for understanding and diagnosing early stage PD.

NeurIPS Conference 2021 Conference Paper

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

  • Minshuo Chen
  • Yan Li
  • Ethan Wang
  • Zhuoran Yang
  • Zhaoran Wang
  • Tuo Zhao

Mean-Field Multi-Agent Reinforcement Learning (MF-MARL) is attractive in applications involving a large population of homogeneous agents, as it exploits the permutation invariance of agents and avoids the curse of many agents. Most existing results only focus on online settings, in which agents can interact with the environment during training. In some applications such as social welfare optimization, however, the interaction during training can be prohibitive or even unethical in societal systems. To bridge such a gap, we propose a SAFARI (peSsimistic meAn-Field vAlue iteRatIon) algorithm for offline MF-MARL, which only requires a handful of pre-collected experience data. Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function; the gap is independent of the number of agents. Numerical experiments are provided.

AAAI Conference 2021 Conference Paper

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

  • Fanchao Lin
  • Hongtao Xie
  • Yan Li
  • Yongdong Zhang

Weakly-supervised video object segmentation (WVOS) is an emerging video task that can track and segment the target given a simple bounding box label. However, existing WVOS methods are still unsatisfactory in either speed or accuracy, since they only use the exemplar frame to guide the prediction while they neglect the reference from other frames. To solve the problem, we propose a novel Re-Aggregation based framework, which uses feature matching to efficiently find the target and capture the temporal dependencies from multiple frames to guide the segmentation. Based on a two-stage structure, our framework builds an information-symmetric matching process to achieve robust aggregation. In each stage, we design a Query-Memory Aggregation (QMA) module to gather features from the past frames and make bidirectional aggregation to adaptively weight the aggregated features, which relieves the latent misguidance in unidirectional aggregation. To exploit the information from different aggregation stages, we propose a novel coarse-fine constraint by using the Cascaded Refinement Module (CRM) to combine the predictions from different stages and further boost the performance. Experimental results on three benchmarks show that our method achieves state-of-the-art performance in WVOS (e.g., an overall score of 84.7% on the DAVIS 2016 validation set).

TCS Journal 2021 Journal Article

Star-critical Ramsey number of large cycle and book of different orders

  • Yan Li
  • Yusheng Li
  • Ye Wang

For graphs $F$, $G$ and $H$, let $F \to (G, H)$ signify that any red/blue edge coloring of $F$ contains either a red $G$ or a blue $H$. The Ramsey number $R(G, H)$ is defined to be the minimum $r$ such that $K_r \to (G, H)$, and the star-critical Ramsey number $R_S(G, H)$ is defined to be the maximum $t$ such that $K_r \setminus K_{1,t} \to (G, H)$, where $r = R(G, H)$. In this note, we shall determine $R_S(B_n, C_m)$ for almost same orders.

TIST Journal 2020 Journal Article

A Joint Neural Model for User Behavior Prediction on Social Networking Platforms

  • Junwei Li
  • Le Wu
  • Richang Hong
  • Kun Zhang
  • Yong Ge
  • Yan Li

Social networking services provide platforms for users to perform two kinds of behaviors: consumption behavior (e.g., recommending items of interest) and social link behavior (e.g., recommending potential social links). Accurately modeling and predicting these two kinds of user behavior are two core tasks on these platforms, with various applications. Recently, with the advance of neural networks, many neural-based models have been designed to predict a single kind of user behavior, i.e., social link behavior or consumption behavior. Compared to the classical shallow models, these neural-based models show better performance in predicting a user’s behavior by modeling the complex patterns. However, few works have explored whether it is possible to design a neural-based model to jointly predict users’ two kinds of behaviors to further enhance the prediction performance. In fact, social scientists have already shown that users’ two kinds of behaviors are not isolated; people tend to follow the consumption recommendations of friends on social platforms and would like to make new friends with like-minded users. While some previous works jointly model users’ two kinds of behaviors with shallow models, we argue that the correlation between users’ two kinds of behaviors is complex and cannot be modeled well with shallow linear models. To this end, in this article, we propose a Neural Joint Behavior Prediction Model (NJBP) to mutually enhance the prediction performance of these two tasks on social networking platforms. Specifically, there are two key characteristics of our proposed model: First, to model the correlation of users’ two kinds of behaviors, we design a fusion layer in the neural network to model the positive correlation of users’ two kinds of behaviors. Second, as the observed links in the social network are often very sparse, we design a new link-based loss function that could preserve the social network topology.
After that, we design a joint optimization function to allow the two behavior modeling tasks to be trained to mutually enhance each other. Finally, extensive experimental results on two real-world datasets show that our proposed method is on average 7.14% better than the best baseline on social link behavior prediction and 6.21% better on consumption behavior prediction. Compared with the pair-wise loss function on two datasets, our proposed link-based loss function improves by at least 4.69% on the social link behavior prediction and 4.72% on the consumption behavior prediction.

IJCAI Conference 2020 Conference Paper

Bilinear Graph Neural Network with Neighbor Interactions

  • Hongmin Zhu
  • Fuli Feng
  • Xiangnan He
  • Xiang Wang
  • Yan Li
  • Kai Zheng
  • Yongdong Zhang

Graph Neural Network (GNN) is a powerful model to learn representations and make predictions on graph data. Existing efforts on GNN have largely defined the graph convolution as a weighted sum of the features of the connected nodes to form the representation of the target node. Nevertheless, the operation of weighted sum assumes the neighbor nodes are independent of each other, and ignores the possible interactions between them. When such interactions exist, such as when the co-occurrence of two neighbor nodes is a strong signal of the target node's characteristics, existing GNN models may fail to capture the signal. In this work, we argue the importance of modeling the interactions between neighbor nodes in GNN. We propose a new graph convolution operator, which augments the weighted sum with pairwise interactions of the representations of neighbor nodes. We term this framework Bilinear Graph Neural Network (BGNN), which improves GNN representation ability with bilinear interactions between neighbor nodes. In particular, we specify two BGNN models named BGCN and BGAT, based on the well-known GCN and GAT, respectively. Empirical results on three public benchmarks of semi-supervised node classification verify the effectiveness of BGNN: BGCN (BGAT) outperforms GCN (GAT) by 1.6% (1.5%) in classification accuracy. Codes are available at: https://github.com/zhuhm1996/bgnn.
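
The operator described in the abstract above can be illustrated with a minimal sketch for a single target node. This is not the authors' implementation: the function name `bilinear_aggregate`, the shared transform `weight`, the mixing weight `alpha`, the uniform weighting of neighbors and the averaging over neighbor pairs are all assumptions made for illustration.

```python
import numpy as np

def bilinear_aggregate(neighbors, weight, alpha=0.5):
    """Mix a weighted-sum aggregation with pairwise neighbor interactions.

    neighbors: (n, d_in) array of neighbor representations (hypothetical layout)
    weight:    (d_in, d_out) linear transform shared by both terms (assumption)
    alpha:     mixing weight between the two terms (assumption)
    """
    z = np.asarray(neighbors, dtype=float) @ np.asarray(weight, dtype=float)
    n = z.shape[0]
    linear = z.mean(axis=0)            # uniformly weighted sum of neighbors
    if n < 2:
        return linear                  # no neighbor pairs to interact
    # Sum over i < j of the elementwise product z_i * z_j, computed with
    # the identity ((sum z)^2 - sum z^2) / 2 instead of an explicit double loop.
    pair_sum = (z.sum(axis=0) ** 2 - (z ** 2).sum(axis=0)) / 2.0
    bilinear = pair_sum / (n * (n - 1) / 2.0)   # average over all pairs
    return (1.0 - alpha) * linear + alpha * bilinear
```

With `alpha=0` the sketch reduces to the plain neighbor average, so the pairwise term can be switched off to compare the two aggregations on the same input.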

YNICL Journal 2020 Journal Article

Characterization of serial hyperpolarized 13C metabolic imaging in patients with glioma

  • Adam W. Autry
  • Jeremy W. Gordon
  • Hsin-Yu Chen
  • Marisa LaFontaine
  • Robert Bok
  • Mark Van Criekinge
  • James B. Slater
  • Lucas Carvajal

BACKGROUND: C imaging in patients undergoing treatment for brain tumors and determine whether there is evidence of aberrant metabolism in the tumor lesion compared to normal-appearing tissue. METHODS: was measured in terms of the coefficient of variation (CV). RESULTS: . CONCLUSION: in gadolinium-enhancing and non-enhancing lesions. Larger prospective studies with homogeneous patient populations are planned to evaluate metabolic changes following treatment.

TCS Journal 2020 Journal Article

Complete bipartite graphs deleted in Ramsey graphs

  • Yan Li
  • Yusheng Li
  • Ye Wang

For graphs $F$, $G$ and $H$, let $F \to (G, H)$ signify that any red/blue edge coloring of $F$ contains either a red $G$ or a blue $H$. The Ramsey number $R(G, H)$ is defined as $\min\{r \mid K_r \to (G, H)\}$. In this note, we consider an optimization problem to bound the complete bipartite-critical Ramsey number $R_\Lambda(G, H)$ defined as $\max\{t \mid K_r \setminus K_{t,t} \to (G, H)\}$ where $r = R(G, H)$ and $\Lambda$ is a set of $K_{t,t}$, and determine $R_\Lambda(G, H)$ for some pairs $(G, H)$.

AAAI Conference 2020 Conference Paper

Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

  • Yingying Zhang
  • Junyu Gao
  • Xiaoshan Yang
  • Chang Liu
  • Yan Li
  • Changsheng Xu

With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers moments of a user’s major or special interest in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. Firstly, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Secondly, current state-of-the-art approaches often adopt the pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of objects within each frame, and a temporal graph to obtain an object-aware representation of each frame and capture the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets provide strong evidence that VH-GNN obtains significant performance gains compared with state-of-the-art methods.

YNICL Journal 2020 Journal Article

Imaging the Nigrosome 1 in the substantia nigra using susceptibility weighted imaging and quantitative susceptibility mapping: An application to Parkinson's disease

  • Zenghui Cheng
  • Naying He
  • Pei Huang
  • Yan Li
  • Rongbiao Tang
  • Sean K. Sethi
  • Kiarash Ghassaban
  • Kiran Kumar Yerramsetty

Parkinson's disease (PD) is a clinically heterogeneous chronic progressive neuro-degenerative disease with loss of dopaminergic neurons in the nigrosome 1 (N1) territory of the substantia nigra pars compacta (SNpc). To date, there has been a major effort to identify changes in the N1 territory by monitoring increases of iron in the SNpc. However, there is no standard protocol being used to visualize or characterize the N1 territory. Therefore, the purpose of this study was to create a robust high quality, rapid imaging protocol, determine a slice by slice characterization of the appearance of N1 (the "N1 sign") and evaluate the loss of the N1 sign in order to differentiate healthy controls (HCs) from patients with PD. Firstly, one group of 10 HCs was used to determine the choice of imaging parameters. Secondly, another group of 80 HCs was used to characterize the appearance of the N1 sign and train the raters. In this step, the magnitude, susceptibility weighted images (SWI), quantitative susceptibility maps (QSM) and true SWI (tSWI) images were all reviewed using data from a 3D gradient recalled echo sequence. A resolution of 0.67 mm × 0.67 mm × 1.34 mm was chosen based on the ability to cover all the basal ganglia, midbrain and dentate nucleus with good signal-to-noise with echo times of 11 ms and 20 ms. Thirdly, 80 Parkinsonism and related disorders patients [idiopathic Parkinson's disease (IPD): 57; atypical parkinsonian syndromes (APs): 14; essential tremor (ET): 9] and one additional group of 80 age-matched HCs were blindly analyzed for the presence or absence of the N1 sign for a differential diagnosis. From the first group of 80 HCs, all of the 76 (100%) cases (4 were excluded due to motion artifacts) showed the N1 sign in one form or another after reviewing the first 5 caudal slices of the SN. For the second group of 80 HCs, 78 (97.5%) showed the N1 sign in at least 2 slices. 
Of the 80 Parkinsonism and related disorders patients, 32 (56.1%, 32/57) IPD and 6 (42.9%, 6/14) APs showed a bilateral loss of the N1 sign, 12 (21.1%, 12/57) IPD and 6 (42.9%, 6/14) APs showed the N1 sign unilaterally and 13 (22.8%, 13/57) IPD and 2 (14.2%, 2/14) APs showed the N1 sign bilaterally. Also, all 9 (100%, 9/9) ET patients showed the N1 sign bilaterally. The mean total structure and mean high susceptibility region for the SN for both IPD and APs patients with bilateral loss of N1 were higher than those of the HCs (p < 0.002). In conclusion, the N1 sign can be consistently visualized using tSWI with a resolution of at least 0.67 mm × 0.67 mm × 1.34 mm and can be seen in 95% of HCs.

TCS Journal 2020 Journal Article

Maximum star deleted from Ramsey graphs of book and tree

  • Ye Wang
  • Yusheng Li
  • Yan Li

For graphs $F$, $G$ and $H$, let $F \to (G, H)$ signify that any red/blue edge coloring of $F$ contains a red $G$ or a blue $H$. Define the Ramsey number $R(G, H)$ to be the smallest $r$ such that $K_r \to (G, H)$. In this note, we consider an optimization problem to find the star-critical Ramsey number $R_S(G, H)$ defined as $\max\{n \mid K_r \setminus K_{1,n} \to (G, H)\}$ by showing that for $n \ge 3m$, $R_S(B_m, T_n) = n - 2$, where $r = R(G, H)$.

AAAI Conference 2020 Conference Paper

PEIA: Personality and Emotion Integrated Attentive Model for Music Recommendation on Social Media Platforms

  • Tiancheng Shen
  • Jia Jia
  • Yan Li
  • Yihui Ma
  • Yaohua Bu
  • Hanjie Wang
  • Bo Chen
  • Tat-Seng Chua

With the rapid expansion of digital music formats, it is indispensable to recommend users their favorite music. For music recommendation, users’ personality and emotion greatly affect their music preference, in a long-term and a short-term manner respectively, while rich social media data provides effective feedback on this information. In this paper, aiming at music recommendation on social media platforms, we propose a Personality and Emotion Integrated Attentive model (PEIA), which fully utilizes social media data to comprehensively model users’ long-term taste (personality) and short-term preference (emotion). Specifically, it takes full advantage of personality-oriented user features, emotion-oriented user features and music features of multi-faceted attributes. Hierarchical attention is employed to distinguish the important factors when incorporating the latent representations of users’ personality and emotion. Extensive experiments on a large real-world dataset of 171,254 users demonstrate the effectiveness of our PEIA model, which achieves an NDCG of 0.5369, outperforming the state-of-the-art methods. We also perform detailed parameter analysis and feature contribution analysis, which further verify our scheme and demonstrate the significance of co-modeling of user personality and emotion in music recommendation.

YNIMG Journal 2019 Journal Article

Big GABA II: Water-referenced edited MR spectroscopy at 25 research sites

  • Mark Mikkelsen
  • Daniel L. Rimbault
  • Peter B. Barker
  • Pallab K. Bhattacharyya
  • Maiken K. Brix
  • Pieter F. Buur
  • Kim M. Cecil
  • Kimberly L. Chan

Accurate and reliable quantification of brain metabolites measured in vivo using 1H magnetic resonance spectroscopy (MRS) is a topic of continued interest. Aside from differences in the basic approach to quantification, the quantification of metabolite data acquired at different sites and on different platforms poses an additional methodological challenge. In this study, spectrally edited γ-aminobutyric acid (GABA) MRS data were analyzed and GABA levels were quantified relative to an internal tissue water reference. Data from 284 volunteers scanned across 25 research sites were collected using GABA+ (GABA + co-edited macromolecules (MM)) and MM-suppressed GABA editing. The unsuppressed water signal from the volume of interest was acquired for concentration referencing. Whole-brain T1-weighted structural images were acquired and segmented to determine gray matter, white matter and cerebrospinal fluid voxel tissue fractions. Water-referenced GABA measurements were fully corrected for tissue-dependent signal relaxation and water visibility effects. The cohort-wide coefficient of variation was 17% for the GABA+ data and 29% for the MM-suppressed GABA data. The mean within-site coefficient of variation was 10% for the GABA+ data and 19% for the MM-suppressed GABA data. Vendor differences contributed 53% to the total variance in the GABA+ data, while the remaining variance was attributed to site- (11%) and participant-level (36%) effects. For the MM-suppressed data, 54% of the variance was attributed to site differences, while the remaining 46% was attributed to participant differences. Results from an exploratory analysis suggested that the vendor differences were related to the unsuppressed water signal acquisition. Discounting the observed vendor-specific effects, water-referenced GABA measurements exhibit similar levels of variance to creatine-referenced GABA measurements.
It is concluded that quantification using internal tissue water referencing is a viable and reliable method for the quantification of in vivo GABA levels.

TIST Journal 2019 Journal Article

Spatial Ensemble Learning for Heterogeneous Geographic Data with Class Ambiguity

  • Zhe Jiang
  • Arpan Man Sainju
  • Yan Li
  • Shashi Shekhar
  • Joseph Knight

Class ambiguity refers to the phenomenon whereby similar features correspond to different classes at different locations. Given heterogeneous geographic data with class ambiguity, the spatial ensemble learning (SEL) problem aims to find a decomposition of the geographic area into disjoint zones such that class ambiguity is minimized and a local classifier can be learned in each zone. The problem is important for applications such as land cover mapping from heterogeneous earth observation data with spectral confusion. However, the problem is challenging due to its high computational cost. Related work in ensemble learning either assumes an identical sample distribution (e.g., bagging, boosting, random forest) or decomposes multi-modular input data in the feature vector space (e.g., mixture of experts, multimodal ensemble) and thus cannot effectively minimize class ambiguity. In contrast, we propose a spatial ensemble framework that explicitly partitions input data in geographic space. Our approach first preprocesses data into homogeneous spatial patches and uses a greedy heuristic to allocate pairs of patches with high class ambiguity into different zones. We further extend our spatial ensemble learning framework with spatial dependency between nearby zones based on the spatial autocorrelation effect. Both theoretical analysis and experimental evaluations on two real world wetland mapping datasets show the feasibility of the proposed approach.

AAAI Conference 2019 Short Paper

Transductive Zero-Shot Learning via Visual Center Adaptation

  • Ziyu Wan
  • Yan Li
  • Min Yang
  • Junge Zhang

In this paper, we propose a Visual Center Adaptation Method (VCAM) to address the domain shift problem in zero-shot learning. For the seen classes in the training data, VCAM builds an embedding space by learning the mapping from semantic space to some visual centers. For unseen classes in the test data, the construction of the embedding space is constrained by a symmetric Chamfer-distance term, aiming to adapt the distribution of the synthetic visual centers to that of the real cluster centers. Therefore, the learned embedding space can generalize well to the unseen classes. Experiments on two widely used datasets demonstrate that our model significantly outperforms state-of-the-art methods.
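
The symmetric Chamfer-distance term mentioned above can be sketched generically as a distance between two sets of centers. The function name `symmetric_chamfer`, the use of squared Euclidean distance and the choice of summing (rather than averaging) the nearest-neighbor distances are assumptions for illustration; the paper may define the term differently.

```python
import numpy as np

def symmetric_chamfer(a, b):
    """Symmetric Chamfer distance between two point sets.

    a: (n, d) array, e.g. synthetic visual centers (hypothetical input)
    b: (m, d) array, e.g. cluster centers of test features (hypothetical input)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, m), via broadcasting.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Each point in a is matched to its nearest point in b, and vice versa.
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```

Because both directions are summed, the value is the same when the two sets are swapped, and it is zero when the sets coincide.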

NeurIPS Conference 2019 Conference Paper

Transductive Zero-Shot Learning with Visual Structure Constraint

  • Ziyu Wan
  • DongDong Chen
  • Yan Li
  • Xingguang Yan
  • Junge Zhang
  • Yizhou Yu
  • Jing Liao

To recognize objects of the unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on the data of source seen classes, then directly apply it to the target unseen classes. However, in real scenarios, the data distribution between the source and target domain might not match well, thus causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer-distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers and visual cluster centers of test instances. We also propose a new training strategy to handle the real cases where many unrelated images exist in the test dataset, which is not considered in previous methods. Experiments on many widely used datasets demonstrate that the proposed visual structure constraint can bring substantial performance gain consistently and achieve state-of-the-art results.

JBHI Journal 2018 Journal Article

An Automatic Detection System of Lung Nodule Based on Multigroup Patch-Based Deep Learning Network

  • Hongyang Jiang
  • He Ma
  • Wei Qian
  • Mengdi Gao
  • Yan Li

High-efficiency lung nodule detection dramatically contributes to the risk assessment of lung cancer. It is a significant and challenging task to quickly locate the exact positions of lung nodules. Extensive work has been done by researchers around this domain for approximately two decades. However, previous computer-aided detection (CADe) schemes are mostly intricate and time-consuming since they may require more image processing modules, such as the computed tomography image transformation, the lung nodule segmentation, and the feature extraction, to construct a whole CADe system. It is difficult for these schemes to process and analyze enormous data when the medical images continue to increase. Besides, some state-of-the-art deep learning schemes may be strict about database standards. This study proposes an effective lung nodule detection scheme based on multigroup patches cut out from the lung images, which are enhanced by the Frangi filter. Through combining two groups of images, a four-channel convolution neural networks model is designed to learn the knowledge of radiologists for detecting nodules of four levels. This CADe scheme can achieve a sensitivity of 80.06% with 4.7 false positives per scan and a sensitivity of 94% with 15.1 false positives per scan. The results demonstrate that the multigroup patch-based learning system is efficient to improve the performance of lung nodule detection and greatly reduce the false positives under a huge amount of image data.

AAAI Conference 2018 Conference Paper

Deep Semantic Structural Constraints for Zero-Shot Learning

  • Yan Li
  • Zhen Jia
  • Junge Zhang
  • Kaiqi Huang
  • Tieniu Tan

Zero-shot learning aims to classify unseen image categories by learning a visual-semantic embedding space. In most cases, the traditional methods adopt a separated two-step pipeline that extracts image features from pre-trained CNN models. Then the fixed image features are utilized to learn the embedding space. It leads to the lack of specific structural semantic information of image features for zero-shot learning task. In this paper, we propose an end-to-end trainable Deep Semantic Structural Constraints model to address this issue. The proposed model contains the Image Feature Structure constraint and the Semantic Embedding Structure constraint, which aim to learn structure-preserving image features and endue the learned embedding space with stronger generalization ability respectively. With the assistance of semantic structural information, the model gains more auxiliary clues for zero-shot learning. The state-of-the-art performance certifies the effectiveness of our proposed method.

JBHI Journal 2018 Journal Article

Multimodal Neuroimaging Feature Learning With Multimodal Stacked Deep Polynomial Networks for Diagnosis of Alzheimer's Disease

  • Jun Shi
  • Xiao Zheng
  • Yan Li
  • Qi Zhang
  • Shihui Ying

The accurate diagnosis of Alzheimer's disease (AD) and its early stage, i.e., mild cognitive impairment, is essential for timely treatment and possible delay of AD. Fusion of multimodal neuroimaging data, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), has shown its effectiveness for AD diagnosis. The deep polynomial network (DPN) is a recently proposed deep learning algorithm, which performs well on both large-scale and small-size datasets. In this study, a multimodal stacked DPN (MM-SDPN) algorithm, which consists of two-stage SDPNs, is proposed to fuse and learn feature representation from multimodal neuroimaging data for AD diagnosis. Specifically, two SDPNs are first used to learn high-level features of MRI and PET, respectively, which are then fed to another SDPN to fuse multimodal neuroimaging information. The proposed MM-SDPN algorithm is applied to the ADNI dataset to conduct both binary classification and multiclass classification tasks. Experimental results indicate that MM-SDPN is superior to the state-of-the-art multimodal feature-learning-based algorithms for AD diagnosis.

ICML Conference 2018 Conference Paper

Non-convex Conditional Gradient Sliding

  • Chao Qu
  • Yan Li
  • Huan Xu 0001

We investigate a projection-free optimization method, namely non-convex conditional gradient sliding (NCGS), for non-convex optimization problems in the batch, stochastic and finite-sum settings. The conditional gradient sliding (CGS) method, by integrating Nesterov’s accelerated gradient method with the Frank-Wolfe (FW) method in a smart way, outperforms FW for convex optimization by reducing the number of gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose the non-convex conditional gradient sliding (NCGS) methods and analyze their convergence properties. We also leverage the idea of variance reduction from the recent progress in convex optimization to obtain a new algorithm termed variance reduced NCGS (NCGS-VR), and obtain a faster convergence rate than the batch NCGS in the finite-sum setting. We show that NCGS algorithms outperform their Frank-Wolfe counterparts both in theory and in practice, for all three settings, namely the batch, stochastic and finite-sum settings. This significantly improves our understanding of optimizing non-convex functions with complicated feasible sets (where projection is prohibitively expensive).
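
To make "projection-free" concrete, here is a minimal sketch of the plain Frank-Wolfe baseline that the abstract compares against. It is not NCGS or CGS. The feasible set (the probability simplex), the function name `frank_wolfe_simplex` and the open-loop step size 2/(k+2) are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Plain Frank-Wolfe on the probability simplex (illustrative baseline).

    grad: callable returning the gradient as a list (hypothetical interface)
    x0:   starting point on the simplex, as a list of floats
    """
    x = list(x0)
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex at the smallest gradient coordinate, so no
        # projection onto the feasible set is ever computed.
        i_min = min(range(len(g)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)          # standard open-loop step size
        # Convex combination of the iterate and the chosen vertex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x
```

Each iterate is a convex combination of simplex vertices, so it stays feasible by construction; this is the property that makes conditional gradient methods attractive when projection is expensive.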

YNIMG Journal 2017 Journal Article

Big GABA: Edited MR spectroscopy at 24 research sites

  • Mark Mikkelsen
  • Peter B. Barker
  • Pallab K. Bhattacharyya
  • Maiken K. Brix
  • Pieter F. Buur
  • Kim M. Cecil
  • Kimberly L. Chan
  • David Y.-T. Chen

Magnetic resonance spectroscopy (MRS) is the only biomedical imaging method that can noninvasively detect endogenous signals from the neurotransmitter γ-aminobutyric acid (GABA) in the human brain. Its increasing popularity has been aided by improvements in scanner hardware and acquisition methodology, as well as by broader access to pulse sequences that can selectively detect GABA, in particular J-difference spectral editing sequences. Nevertheless, implementations of GABA-edited MRS remain diverse across research sites, making comparisons between studies challenging. This large-scale multi-vendor, multi-site study seeks to better understand the factors that impact measurement outcomes of GABA-edited MRS. An international consortium of 24 research sites was formed. Data from 272 healthy adults were acquired on scanners from the three major MRI vendors and analyzed using the Gannet processing pipeline. MRS data were acquired in the medial parietal lobe with standard GABA+ and macromolecule- (MM-) suppressed GABA editing. The coefficient of variation across the entire cohort was 12% for GABA+ measurements and 28% for MM-suppressed GABA measurements. A multilevel analysis revealed that most of the variance (72%) in the GABA+ data was accounted for by differences between participants within-site, while site-level differences accounted for comparatively more variance (20%) than vendor-level differences (8%). For MM-suppressed GABA data, the variance was distributed equally between site- (50%) and participant-level (50%) differences. The findings show that GABA+ measurements exhibit strong agreement when implemented with a standard protocol. There is, however, increased variability for MM-suppressed GABA measurements that is attributed in part to differences in site-to-site data acquisition. This study's protocol establishes a framework for future methodological standardization of GABA-edited MRS, while the results provide valuable benchmarks for the MRS community.

JBHI Journal 2017 Journal Article

Histopathological Image Classification With Color Pattern Random Binary Hashing-Based PCANet and Matrix-Form Classifier

  • Jun Shi
  • Jinjie Wu
  • Yan Li
  • Qi Zhang
  • Shihui Ying

The computer-aided diagnosis for histopathological images has attracted considerable attention. Principal component analysis network (PCANet) is a novel deep learning algorithm for feature learning with the simple network architecture and parameters. In this study, a color pattern random binary hashing-based PCANet (C-RBH-PCANet) algorithm is proposed to learn an effective feature representation from color histopathological images. The color norm pattern and angular pattern are extracted from the principal component images of R, G, and B color channels after cascaded PCA networks. The random binary encoding is then performed on both color norm pattern images and angular pattern images to generate multiple binary images. Moreover, we rearrange the pooled local histogram features by spatial pyramid pooling to a matrix-form for reducing the dimension of feature and preserving spatial information. Therefore, a C-RBH-PCANet and matrix-form classifier-based feature learning and classification framework is proposed for diagnosis of color histopathological images. The experimental results on three color histopathological image datasets show that the proposed C-RBH-PCANet algorithm is superior to the original PCANet and other conventional unsupervised deep learning algorithms, while the best performance is achieved by the proposed feature learning and classification framework that combines C-RBH-PCANet and matrix-form classifier.

IROS Conference 2015 Conference Paper

Maintaining constant towing tension between cable ship and burying system under sea waves by hybrid FUZZY P + ID controller

  • Qi Chen
  • Wei Li 0006
  • Xiaohui Wang 0014
  • Yan Li
  • Shuo Li 0001
  • Bin Xian

In this paper, we propose a hybrid FUZZY P + ID controller to stabilize the towing cable tension between a cable ship and a burying system. First, we develop the model of a winch system driven by valve-controlled hydraulic motors and evaluate the step responses yielded by the conventional PID and the proposed FUZZY P + ID controllers using simulations. The comparative studies show that the control performance yielded by the FUZZY P + ID controller is superior. We replace the existing PID controller implemented on the towing winch with the FUZZY P + ID controller for cable burying tasks at speeds up to 1 knot under sea waves with significant variations (peak-to-peak) from 1.5 to 2.5 meters. The real applications demonstrate that the FUZZY P + ID controller is much more robust than the conventional PID controller.

EAAI Journal 2014 Journal Article

A novel statistical algorithm for multiclass EEG signal classification

  • Siuly
  • Yan Li

This paper presents a new algorithm for the classification of multiclass EEG signals. This algorithm involves applying the optimum allocation technique to select representative samples that reflect an entire database. This research investigates whether the optimum allocation is suitable to extract representative samples depending on their variability within the groups in the input EEG data. It also assesses whether these samples are efficient for the multiclass least square support vector machine (MLS-SVM) to classify EEG signals. The performances of the MLS-SVM with four different output coding approaches: minimum output codes (MOC), error correcting output codes (ECOC), One vs One (1vs1) and One vs All (1vsA), are evaluated with a benchmark epileptic EEG database. To test the consistency, all experiments are repeated ten times with the same classifying parameters in each classification process. The results show very high classification performances for each class, and also confirm the consistency of the proposed method in each repeated experiment. In addition, the performances by the optimum allocation based MLS-SVM method are compared with the four existing reference methods using the same database. The outcomes of this research demonstrate that the optimum allocation is very effective and efficient for extracting the representative patterns from the multiclass EEG data, and the MLS-SVM is also very well fitted with the optimum allocation technique for the EEG classification.

JBHI Journal 2014 Journal Article

Analysis and Classification of Sleep Stages Based on Difference Visibility Graphs From a Single-Channel EEG Signal

  • Guohun Zhu
  • Yan Li
  • Peng Wen

Existing sleep stage classification methods are mainly based on time or frequency features. This paper classifies sleep stages based on graph domain features from a single-channel electroencephalogram (EEG) signal. First, each 30 s epoch of the EEG signal is mapped into a visibility graph (VG) and a horizontal VG (HVG). Second, a difference VG (DVG) is obtained by subtracting the edge set of the HVG from the edge set of the VG to extract essential degree sequences and to detect the gait-related movement artifact recordings. The mean degrees (MDs) and degree distributions (DDs) $P(k)$ on HVGs and DVGs are analyzed epoch-by-epoch from 14,963 segments of EEG signals. Then, the MDs of each DVG and HVG and seven distinguishable DD values of $P(k)$ from each DVG are extracted. Finally, the nine extracted features are forwarded to a support vector machine to classify the sleep stages into two, three, four, five, and six states. The accuracy and kappa coefficient of six-state classification are 87.5% and 0.81, respectively. It was found that the MDs of the VGs in the deep sleep stage are higher than those in the awake and light sleep stages, and the MDs of the HVGs are just the reverse.
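
The graph construction described in this abstract can be illustrated with a minimal sketch. It assumes the standard definitions of the natural and horizontal visibility graphs and a brute-force edge search; the function names and the toy series are hypothetical and are not taken from the paper.

```python
from collections import Counter


def visibility_edges(x):
    """Natural VG: samples i < j are linked if every sample between them
    lies strictly below the straight line joining (i, x[i]) and (j, x[j])."""
    n = len(x)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(x[k] < x[i] + (x[j] - x[i]) * (k - i) / (j - i)
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges


def horizontal_visibility_edges(x):
    """Horizontal VG: i < j are linked if every sample between them
    is strictly lower than both x[i] and x[j]."""
    n = len(x)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(x[k] < min(x[i], x[j]) for k in range(i + 1, j)):
                edges.add((i, j))
    return edges


def mean_degree(edges, n):
    """Mean degree of an undirected graph with n nodes."""
    return 2 * len(edges) / n


def degree_distribution(edges, n):
    """P(k): fraction of the n nodes that have degree k."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {k: c / n for k, c in sorted(Counter(degree).items())}


# Toy series standing in for one EEG epoch (hypothetical values).
signal = [3, 1, 2, 4]
vg = visibility_edges(signal)
hvg = horizontal_visibility_edges(signal)
dvg = vg - hvg  # difference VG: edges of the VG that are not in the HVG
```

In this sketch the mean degree and `degree_distribution` values of `hvg` and `dvg` play the role of the per-epoch features; the brute-force search is cubic in the epoch length in the worst case, so it is only meant to make the definitions concrete.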

ICRA Conference 2010 Conference Paper

An adaptive roadmap guided Multi-RRTs strategy for single query path planning

  • Wei Wang 0434
  • Yan Li
  • Xin Xu 0001
  • Simon X. Yang

During the past decade, the Rapidly-exploring Random Tree (RRT) and its variants have been shown to be powerful sampling-based single-query path planning approaches for robots in high-dimensional configuration spaces. However, the performance of such tree-based planners, which rely on a uniform sampling strategy, degrades significantly when the configuration space contains narrow passages. Under the assumption that computational resources should be allocated in proportion to the geometric complexity of the local region, we present a novel single-query Multi-RRTs path planning framework that employs an improved Bridge Test algorithm to identify globally important roadmaps in narrow passages. Multiple trees can be grown from these sampled roadmaps to explore sub-regions that are difficult to reach. The probability of selecting a particular tree for expansion and connection, which can be dynamically updated by an on-line learning algorithm based on the historical results of exploration, guides the tree through narrow passages rapidly. Experimental results show that the proposed approach gives substantial improvement in planning efficiency over a wide range of single-query path planning problems.
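
To make the narrow-passage sampling idea concrete, the following is a minimal sketch of the basic bridge test in a 2D workspace with axis-aligned rectangular obstacles. It is not the improved variant the abstract refers to, and the obstacle layout, function names, and parameters are all assumptions chosen for illustration.

```python
import random


def in_collision(p, obstacles):
    """True if point p lies inside any rectangle (x0, y0, x1, y1)."""
    x, y = p
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in obstacles)


def bridge_test_samples(obstacles, bounds, n_trials, sigma, rng):
    """Basic bridge test: keep the midpoint of a short segment whose two
    endpoints are in collision while the midpoint itself is free."""
    xmin, ymin, xmax, ymax = bounds
    samples = []
    for _ in range(n_trials):
        # First endpoint: uniform sample that must be in collision.
        a = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if not in_collision(a, obstacles):
            continue
        # Second endpoint: Gaussian neighbour that must also be in collision.
        b = (rng.gauss(a[0], sigma), rng.gauss(a[1], sigma))
        if not in_collision(b, obstacles):
            continue
        mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        inside = xmin <= mid[0] <= xmax and ymin <= mid[1] <= ymax
        if inside and not in_collision(mid, obstacles):
            samples.append(mid)
    return samples


# Hypothetical workspace: two blocks separated by a narrow horizontal gap.
obstacles = [(0.0, 0.0, 10.0, 4.5), (0.0, 5.5, 10.0, 10.0)]
bounds = (0.0, 0.0, 10.0, 10.0)
samples = bridge_test_samples(obstacles, bounds, n_trials=2000, sigma=1.0,
                              rng=random.Random(0))
```

Every retained sample falls inside the gap between the two blocks, which is where a multi-tree planner of this kind would place additional tree roots. The on-line update of tree-selection probabilities mentioned in the abstract is not covered by this sketch.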

IROS Conference 2009 Conference Paper

Path planning in changing environments by using optimal path segment search

  • Hong Liu 0008
  • He Wen
  • Yan Li

This paper presents a novel planner for manipulators and robots in changing environments. When environments are complicated, it is often difficult to find a completely valid path solution, which many methods require. Our planner instead searches for several path segments that move the robot as far towards its goal as possible, even when no complete solution currently exists. In the learning phase, the planner builds a roadmap that captures the topological structure of the configuration space in a workspace without obstacles. In the query phase, the planner searches for a solution path in the roadmap with the A* algorithm and, concurrently with the search, updates the roadmap using lazy evaluation. If a completely valid solution is found, it is adopted immediately. Otherwise, the planner collects a set of maximum valid path segments and selects the optimal one for execution. The search and execution processes are repeated until a goal configuration is reached. In extensive experiments, our planner shows promising performance.
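
The query phase described in this abstract can be sketched as a loop that runs A* on the roadmap, validates only the edges of the returned path, and removes any invalid edge before searching again. This is a minimal sketch under stated assumptions: the roadmap, the node names, and the `edge_is_valid` callback are hypothetical, and the fallback to maximum valid path segments is not implemented (the function simply returns `None` when no fully valid path remains).

```python
import heapq
import math


def astar(graph, pos, start, goal):
    """A* over an undirected roadmap with Euclidean edge costs and heuristic."""
    heap = [(math.dist(pos[start], pos[goal]), 0.0, start, [start])]
    best = {start: 0.0}
    while heap:
        _, g, node, path = heapq.heappop(heap)
        if node == goal:
            return path
        if g > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr in graph[node]:
            new_g = g + math.dist(pos[node], pos[nbr])
            if new_g < best.get(nbr, math.inf):
                best[nbr] = new_g
                f = new_g + math.dist(pos[nbr], pos[goal])
                heapq.heappush(heap, (f, new_g, nbr, path + [nbr]))
    return None


def lazy_query(graph, pos, start, goal, edge_is_valid):
    """Lazy evaluation: collision-check edges only when they appear on a
    candidate path, remove the invalid ones, and search again."""
    graph = {u: set(vs) for u, vs in graph.items()}  # work on a copy
    while True:
        path = astar(graph, pos, start, goal)
        if path is None:
            return None
        bad = [(u, v) for u, v in zip(path, path[1:]) if not edge_is_valid(u, v)]
        if not bad:
            return path
        for u, v in bad:
            graph[u].discard(v)
            graph[v].discard(u)


# Hypothetical roadmap built without obstacles; edge B-C is blocked later.
pos = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
roadmap = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
blocked = {frozenset(("B", "C"))}
path = lazy_query(roadmap, pos, "A", "C",
                  lambda u, v: frozenset((u, v)) not in blocked)
```

With edge B-C blocked, the first candidate path through B is rejected and the second search returns the detour through D.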