Arrow Research search

Author name cluster

Yi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

DawnNet: Domain-augmented multi-weighting network for endometrial histopathological image classification

  • Fengjun Zhao
  • Lin Wu
  • Yi Li
  • Xuelei He
  • Hongyan Du
  • Yanrong Chen
  • Xiaowei He
  • Yuqing Hou

Histopathological examination is the gold standard for diagnosing endometrial tissues, including normal endometrium, endometrial polyps, endometrial hyperplasia, and endometrial adenocarcinoma. However, subtle variations in gland-to-stroma ratios and nuclear morphology make the diagnosis subjective and dependent on pathologist expertise. Computer-aided diagnosis systems using deep learning-based approaches can improve diagnostic efficiency by automatically extracting representative features. However, their performance often degrades when encountering data variations from different institutes, a domain shift issue that violates the independent and identically distributed assumption between training and testing data. This out-of-distribution challenge is not fully addressed by existing domain generalization methods, which often overlook key morphological features essential for histopathological interpretation. To address this issue, we propose DawnNet, a domain-augmented multi-weighting network for robust endometrial histopathological image classification. DawnNet incorporates a domain augmentation module to improve generalization, a spatial–channel weighting attention module to enhance discriminative features while suppressing domain-specific ones, a sample weighting module to reduce spurious correlations, and a hybrid objective function to learn domain-invariant and diagnosis-relevant features. Experiments on publicly available datasets demonstrate that DawnNet outperforms state-of-the-art methods, showing promising generalization for both in-distribution and out-of-distribution cases. Code is available at https://github.com/aliy-ali/DawnNet.

AAAI Conference 2026 Conference Paper

Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models

  • Fei Song
  • Yi Li
  • Rui Wang
  • Jiahuan Zhou
  • Changwen Zheng
  • Jiangmeng Li

Test-time prompt tuning for vision-language models has demonstrated impressive generalization capabilities under zero-shot settings. However, tuning the learnable prompts solely based on unlabeled test data may induce prompt optimization bias, ultimately leading to suboptimal performance on downstream tasks. In this work, we analyze the underlying causes of prompt optimization bias from both the model and data perspectives. In terms of the model, the entropy minimization objective typically focuses on reducing the entropy of model predictions while overlooking their correctness. This can result in overconfident yet incorrect outputs, thereby compromising the quality of prompt optimization. On the data side, prompts affected by optimization bias can introduce misalignment between visual and textual modalities, which further aggravates the prompt optimization bias. To this end, we propose a Doubly Debiased Test-Time Prompt Tuning method, abbreviated as D2TPT. Specifically, we first introduce a dynamic retrieval-augmented modulation module that retrieves high-confidence knowledge from a dynamic knowledge base using the test image feature as a query, and uses the retrieved knowledge to modulate the predictions. Guided by the refined predictions, we further develop a reliability-aware prompt optimization module that incorporates a confidence-based weighted ensemble and cross-modal consistency distillation to impose regularization constraints during prompt tuning. Extensive experiments across 15 benchmark datasets involving both natural distribution shifts and cross-datasets generalization demonstrate that D2TPT outperforms baselines, validating its effectiveness in mitigating prompt optimization bias.

AAAI Conference 2026 Conference Paper

Good Gradients Poison Your Model: Evading Defenses in Federated Learning via Boundary-adaptive Perturbation

  • Xiaojie Zhao
  • Jinqiao Shi
  • Yi Li
  • Junmin Huang
  • Chongru Fan

Federated learning (FL) allows for collaborative model training while preserving data privacy, but its distributed nature makes it vulnerable to poisoning attacks. Existing defense methods typically rely on using gradients from multiple clients to define a trusted region, selecting only the trustworthy updates (good gradients) within this region for aggregation. Mainstream defense boundaries are categorized as hard boundaries, soft boundaries, and semi-soft boundaries. However, we argue that even good gradients within these boundaries can still be exploited by attackers to poison the model. To tackle this challenge, we introduce a boundary-adaptive attack method that leverages the directional properties of optimization techniques to derive baseline poisoned gradients. Through iterative perturbation, it generates seemingly innocent gradients that subtly deviate from the global model. Our extensive study on benchmark datasets and mainstream defensive mechanisms confirms that the proposed attack poses a significant threat to the integrity and security of FL practices, despite the proliferation of robust FL methods.

TAAS Journal 2026 Journal Article

Graph Unlearning System with Subgraph De-Isolation Measures

  • Yi Li
  • Debo Cheng
  • Guixian Zhang
  • Chengyu Li
  • Shichao Zhang

Graph unlearning systems offer a promising solution for securely erasing specific data points and their associated influences from Graph Neural Networks (GNNs). However, existing approaches often treat the problem as multiple isolated and disjoint sub-problems by partitioning graph data into isolated subgraphs, which overlooks the native graph structure information between subgraphs. This results in biased representations that hinder the accurate modeling of key connections and relationships within the data, leading to a notable reduction in model utility due to this loss of information. To address these issues, we propose an innovative framework called Non-Isolated Graph Eraser (NIGEraser) that decomposes the unlearning task into multiple non-isolated, intersecting sub-problems. Specifically, a novel non-isolated graph partitioning strategy is proposed for NIGEraser that mitigates isolation by replicating key nodes across multiple neighboring subgraphs, along with an attention-based sub-model aggregation technique in which global graph structure information is employed. By this design, a broader natural neighborhood is explored, capturing and effectively utilizing the critical graph structure features lost between subgraphs during partitioning, thereby reducing information loss during task decomposition and aggregation. Additionally, it is demonstrated that graph unlearning methods can overcome the limitations of traditional isolated partitioning strategies, providing an effective theoretical constraint on time consumption. Extensive experiments on four real-world graph-structured datasets show that NIGEraser consistently outperforms existing unlearning methods, offering superior model utility while ensuring efficient and deterministic data removal.

AAAI Conference 2026 Conference Paper

PLaST: Towards Paralinguistic-aware Speech Translation

  • Yi Li
  • Rui Zhao
  • Ruiquan Zhang
  • Jinsong Su
  • Daimeng Wei
  • Min Zhang
  • Yidong Chen

Speech translation (ST) aims to translate speech from a source language into text in the target language. Naturally, speech signals contain paralinguistic cues beyond linguistic content, which could influence or even alter the interpretation of a lexically identical sentence, thereby yielding distinct translations. However, existing ST models lack direct and sufficient modeling of paralinguistic information, which limits their ability to perceive paralinguistic cues and understand speech comprehensively, leading to degraded translation performance. In response, we propose Paralinguistic-aware Speech Translation (PLaST), a novel dual-branch framework which directly leverages paralinguistic cues beyond the linguistic content. Specifically, PLaST employs a speech encoder and a style extractor to independently generate linguistic and paralinguistic representations, respectively. To obtain a purified linguistic representation aligned with the text representation, a hierarchical Optimal Transport (OT) is applied on the layer-wise outputs from an LLM decoder. Then, the paralinguistic information is retrieved and refined with an Attention-based Retrieval (AR) module, with the linguistic representation serving as queries to enable joint guidance for semantic understanding and translation generation. PLaST outperforms the strong baseline with an average of 5.0 directional and 4.5 global contrastive likelihood scores on the paralinguistic-sensitive benchmark ContraProST, demonstrating its superior capability in paralinguistic perception. Further experiments on the standard speech translation benchmark CoVoST-2 show that PLaST generalizes well to typical ST scenarios.

AAAI Conference 2026 Conference Paper

Trimming the Fat: Redundancy-Aware Acceleration Framework for DGNNs

  • Renhong Huang
  • Yuxuan Cao
  • Yi Li
  • Junwei Hu
  • Zihua Xiong
  • Shuai Fang
  • Sheng Guo
  • Bo Zheng

Temporal graphs are essential for modeling complex real-world systems, such as social interactions, financial transactions, and recommendation systems, but the high computational cost and model complexity of dynamic graph neural networks (DGNNs) pose significant challenges for practical deployment. Although various pruning and sampling techniques have proven effective in accelerating static GNNs, they fall short in dynamic settings due to temporal dependencies in evolving graph structures. To address these challenges, we propose TrimDG, a general framework that accelerates DGNNs by eliminating both static and runtime redundancies. For static redundancy, we introduce a novel node influence metric, Temporal Personalized PageRank (TPP), to prune less informative nodes, and employ temporal binning to remove redundant events. For runtime redundancy during training, we develop an adaptive sampling strategy guided by graph information bottleneck and further reduce sampling frequency through temporal batch selector and sampling cache. Theoretical analysis supports our design, and experiments on real-world datasets show that TrimDG reduces runtime by an average of 83.49% across diverse DGNN backbones, while maintaining strong predictive performance, demonstrating both its efficiency and generalizability.

EAAI Journal 2025 Journal Article

A flow rate estimation method for gas–liquid two-phase flow based on filter-enhanced convolutional neural network

  • Yuxiao Jiang
  • Yinyan Liu
  • Lihui Peng
  • Yi Li

Accurate estimation of flow rate in gas–liquid two-phase flow is crucial for various industrial processes. How to accurately estimate flow rate remains a challenging problem. Previously, deep learning-based methods focused on a few human-set points with single-task learning. In addition, the data were not denoised. In this study, a flow rate estimation method based on a filter-enhanced convolutional neural network (FECNN) is proposed for gas–liquid two-phase flow. The method leverages multimodal data from a Venturi tube and an electrical capacitance tomography (ECT) sensor as input, using a multilayer perceptron (MLP) to fuse the data. Subsequently, a learnable filter module is employed to attenuate noise adaptively, followed by multiscale convolutional neural network (MSCNN) extraction of flow rate features at different scales. Finally, the method estimates each single-phase flow rate simultaneously through multi-task learning (MTL). The adaptive noise attenuation capabilities of the learnable filter module are demonstrated, and the ability of the proposed MSCNN to capture multiscale flow rate features is shown through multiple comparative experiments. Additionally, a qualitative comparison with recent flow rate estimation methods is provided. Overall, this study demonstrates the effectiveness and superiority of the proposed FECNN in flow rate estimation.

TAAS Journal 2025 Journal Article

Adaptive Scheduling of High-Availability Drone Swarms for Congestion Alleviation in Connected Automated Vehicles

  • Shengye Pang
  • Yi Li
  • Zhen Qin
  • Xinkui Zhao
  • Jintao Chen
  • Fan Wang
  • Jianwei Yin

The Intelligent Transportation System (ITS) serves as a pivotal element within urban networks, offering decision support to users and connected automated vehicles through comprehensive information gathering, sensing, device control, and data processing. Presently, ITS predominantly relies on sensors embedded in fixed infrastructure, notably Roadside Units (RSUs). However, RSUs are confined by coverage limitations and may encounter challenges in prompt emergency responses. On-demand resources, such as drones, present a viable option to supplement these deficiencies effectively. This article introduces an approach where Software-Defined Networking and Mobile Edge Computing technologies are integrated to formulate a high-availability drone swarm control and communication infrastructure framework comprising the cloud layer, edge layer, and device layer. Drones confront limitations in flight duration attributed to battery limitations, posing a challenge in sustaining continuous monitoring of road conditions over extended periods. Effective drone scheduling stands as a promising solution to overcome these constraints. To tackle this issue, we initially utilized Graph WaveNet, a specialized graph neural network structure tailored for spatial-temporal graph modeling, for training a congestion prediction model using real-world dataset inputs. Building upon this, we further propose an algorithm for drone scheduling based on congestion prediction. Our simulation experiments using real-world data demonstrate that, compared to the baseline method, the proposed scheduling algorithm not only yielded superior scheduling gains but also mitigated drone idle rates.

TAAS Journal 2025 Journal Article

Chameleon Hash based Collaborative Time-Series Data Integrity Monitoring

  • Yi Li
  • Jian Shen
  • Mohammad S. Obaidat
  • Pandi Vijayakumar
  • Sendhilkumar Selvaradjou
  • Kuei-Fang Hsiao

The importance of the ocean to humanity is undeniable, whether in terms of ecology, climate, or resources. Utilizing collected ocean data combined with AI to achieve adaptive and automated processing and prediction is a current research focus. The effectiveness of AI applications largely depends on the integrity of ocean data. Ocean data has three characteristics: vast spatial coverage, long temporal duration, and large volume. Traditional cloud-based data integrity verification methods are no longer suitable. Ocean data should be processed on edge servers located closer to the data collection points and then sent to the appropriate data storage servers. The data processing methods should be lightweight to accommodate the sequential characteristics of data. Moreover, the data integrity monitoring process should be collaboratively completed on the data storage servers without the need for a central third party. To this end, we propose an ocean data integrity monitoring protocol. It generates data for different storage servers, using sensor sampling periods and data masks, and utilizes chameleon hash with ephemeral trapdoors to generate validators, thus supporting mutual integrity monitoring among storage servers. Experiments demonstrate that, compared to the latest solutions, our scheme not only meets security requirements but also offers advantages in computational overhead.
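
For readers unfamiliar with the primitive named in this abstract, the following is a minimal sketch of a plain discrete-log chameleon hash, in which the trapdoor holder can find a second input that hashes to the same value. It is an illustration only and not the paper's protocol: it omits the ephemeral-trapdoor variant, the data masks, and the validators, and the toy parameters `P`, `Q`, `G` and the function names are assumptions chosen for readability, not secure values.

```python
# Toy discrete-log chameleon hash (illustrative, insecure parameters).
# Assumption: G = 4 generates a subgroup of prime order Q = 11 in Z_23*.
P, Q, G = 23, 11, 4


def keygen(trapdoor):
    """Derive the public key h = G^trapdoor mod P."""
    return pow(G, trapdoor, P)


def chash(public_key, message, randomness):
    """Chameleon hash CH(m, r) = G^m * h^r mod P."""
    return (pow(G, message % Q, P) * pow(public_key, randomness % Q, P)) % P


def collide(trapdoor, message, randomness, new_message):
    """Use the trapdoor to find r' such that CH(m, r) == CH(m', r')."""
    inverse = pow(trapdoor, -1, Q)  # modular inverse, Python 3.8+
    return (randomness + (message - new_message) * inverse) % Q


x = 3                      # hypothetical trapdoor
h = keygen(x)              # public key
m, r = 5, 7                # original message and randomness
m_new = 9                  # replacement message
r_new = collide(x, m, r, m_new)
```

Without the trapdoor `x`, finding such a collision amounts to solving a discrete logarithm, which is why only the key holder can update hashed content without changing the hash value.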

AAAI Conference 2025 Conference Paper

Community-Centric Graph Unlearning

  • Yi Li
  • Shichao Zhang
  • Guixian Zhang
  • Debo Cheng

Graph unlearning technology has become increasingly important since the advent of the 'right to be forgotten' and the growing concerns about the privacy and security of artificial intelligence. Graph unlearning aims to quickly eliminate the effects of specific data on graph neural networks (GNNs). However, most existing deterministic graph unlearning frameworks follow a balanced partition-submodel training-aggregation paradigm, resulting in a lack of structural information between subgraph neighborhoods and redundant unlearning parameter calculations. To address this issue, we propose a novel Graph Structure Mapping Unlearning paradigm (GSMU) and a novel method based on it named Community-centric Graph Eraser (CGE). CGE maps community subgraphs to nodes, thereby enabling the reconstruction of a node-level unlearning operation within a reduced mapped graph. CGE achieves an exponential reduction in both the amount of training data and the number of unlearning parameters. Extensive experiments conducted on five real-world datasets and three widely used GNN backbones have verified the high performance and efficiency of our CGE method, highlighting its potential in the field of graph unlearning.

AAAI Conference 2025 Conference Paper

Complex-Cycle-Consistent Diffusion Model for Monaural Speech Enhancement

  • Yi Li
  • Yang Sun
  • Plamen P Angelov

In this paper, we present a novel diffusion model-based monaural speech enhancement method. Our approach incorporates the separate estimation of speech spectra's magnitude and phase in two diffusion networks. Throughout the diffusion process, noise clips from real-world noise interferences are added gradually to the clean speech spectra and a noise-aware reverse process is proposed to learn how to generate both clean speech spectra and noise spectra. Furthermore, to fully leverage the intrinsic relationship between magnitude and phase, we introduce a complex-cycle-consistent (CCC) mechanism that uses the estimated magnitude to map the phase, and vice versa. We implement this algorithm within a phase-aware speech enhancement diffusion model (SEDM). We conduct extensive experiments on public datasets to demonstrate the effectiveness of our method, highlighting the significant benefits of exploiting the intrinsic relationship between phase and magnitude information to enhance speech. The comparison to conventional diffusion models demonstrates the superiority of SEDM.

IJCAI Conference 2025 Conference Paper

Efficient Hi-Fi Style Transfer via Statistical Attention and Modulation

  • Zhirui Fang
  • Yi Li
  • Xin Xie
  • Chengyan Li
  • Yanqing Guo

Style transfer is a challenging task in computer vision, aiming to blend the stylistic features of one image with the content of another while preserving the content details. Traditional methods often face challenges in terms of computational efficiency and fine-grained content preservation. In this paper, we propose a novel feature modulation mechanism based on parameterized normalization, where the modulation parameters for content and style features are learned using a dual convolution network (BiConv). These parameters adjust the mean and standard deviation of the features, improving both the stability and quality of the style transfer process. To achieve fast inference, we introduce an efficient acceleration technique by leveraging a row and column weighted attention matrix. In addition, we incorporate a contrastive learning scheme to align the local features of the content and the stylized images, improving the fidelity of the generated output. Experimental results demonstrate that our method significantly improves the inference speed and the quality of style transfer while preserving content details, outperforming existing approaches based on both convolution and diffusion.

ICML Conference 2025 Conference Paper

Efficiently Serving Large Multimodal Models Using EPD Disaggregation

  • Gursimran Singh
  • Xinglu Wang
  • Yifan Hu
  • Timothy Tin Long Yu
  • Linzi Xing
  • Wei Jiang
  • Zhefeng Wang
  • Xiaolong Bai

Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step negatively affects key Service Level Objectives (SLOs), such as time to first token (TTFT) and time per output token (TPOT). We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our approach decouples these steps, unlocking new opportunities and optimizations. These include a mechanism to cache multimedia tokens for efficient transfer, a novel way to parallelize the encoding load within a request, a module for optimal resource allocation for disaggregated serving, and a novel role-switching method to handle changing workload characteristics. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15$\times$ lower peak memory utilization), batch sizes (up to 22$\times$ larger), 10$\times$ more images per request, and 2.2$\times$ larger KV caches. Furthermore, it leads to significant improvements in SLO attainment (up to 90–100% improvement) and TTFT (up to 71% reduction), compared to systems that do not disaggregate. The code is available at https://github.com/vbdi/epdserve.

IROS Conference 2025 Conference Paper

Hybrid Data-Model-Driven External Force Estimation for Manipulators via Generalized Momentum-Based Third-Order Observer

  • Haohao Zhang
  • Yi Li
  • Yixin Wang
  • Chong Li
  • Xuhang Tian
  • Yulan Han
  • Zhongyi Ren

Accurate dynamic modeling and external force estimation are crucial for high-precision robot control and applications. However, model incompleteness and external disturbances inevitably lead to a residual between the actual joint torque and the torque calculated by the identified dynamic model. To address this, this paper proposes a hierarchical fusion framework. First, a multi-layer perceptron neural network (MLPNN) is employed to systematically compensate for these joint torque residuals. Subsequently, a generalized momentum-based third-order external force observer is designed to enhance the accuracy of estimating external forces acting on the manipulator. This approach retains the interpretability inherent in physics-based models while augmenting generalization capability through data-driven correction. The advantages of the third-order external force observer are substantiated via comparative analysis with first- and second-order observers on a Simulink simulation platform using a 2-DOF planar manipulator. Furthermore, the effectiveness of the proposed method was validated through a dragging experiment conducted on a 6-DOF manipulator without end-effector force/torque sensor, demonstrating its performance in practical applications.

IROS Conference 2025 Conference Paper

Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks

  • Yimian Ding
  • Jingzehua Xu
  • Guanwen Xie
  • Shuai Zhang 0015
  • Yi Li

This study presents a novel environment-aware reinforcement learning (RL) framework designed to augment the operational capabilities of autonomous underwater vehicles (AUVs) in underwater environments. Departing from traditional RL architectures, the proposed framework integrates an environment-aware network module that dynamically captures flow field data, effectively embedding this critical environmental information into the state space. This integration facilitates real-time environmental adaptation, significantly enhancing the AUV’s situational awareness and decision-making capabilities. Furthermore, the framework incorporates AUV structure characteristics into the optimization process, employing a large language model (LLM)-based iterative refinement mechanism that leverages both environmental conditions and training outcomes to optimize task performance. Comprehensive experimental evaluations demonstrate the framework’s superior performance, robustness and adaptability.

ICLR Conference 2025 Conference Paper

Near-optimal Active Regression of Single-Index Models

  • Yi Li
  • Wai Ming Tai

The active regression problem of the single-index model is to solve $\min_x \lVert f(Ax)-b\rVert_p$, where $A$ is fully accessible and $b$ can only be accessed via entry queries, with the goal of minimizing the number of queries to the entries of $b$. When $f$ is Lipschitz, previous results only obtain constant-factor approximations. This work presents the first algorithm that provides a $(1+\varepsilon)$-approximation solution by querying $\tilde{O}(d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2})$ entries of $b$. This query complexity is also shown to be optimal up to logarithmic factors for $p\in [1,2]$ and the $\varepsilon$-dependence of $1/\varepsilon^p$ is shown to be optimal for $p>2$.
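
As a small illustration of the quantities in this abstract, the sketch below evaluates the objective $\lVert f(Ax)-b\rVert_p$ for a given $x$ and the stated query bound $d^{\frac{p}{2}\vee 1}/\varepsilon^{p\vee 2}$ with logarithmic factors dropped. It does not implement the paper's sampling algorithm; the function names, the choice of ReLU as the Lipschitz link $f$, and the example data are all assumptions made for illustration.

```python
import math


def single_index_loss(A, x, b, f, p):
    """Return ||f(Ax) - b||_p, with A given as a list of rows."""
    residuals = [
        f(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) - b_i
        for row, b_i in zip(A, b)
    ]
    return sum(abs(res) ** p for res in residuals) ** (1.0 / p)


def relu(t):
    """A 1-Lipschitz link function, used here only as an example of f."""
    return max(t, 0.0)


def query_bound(d, eps, p):
    """d^(max(p/2, 1)) / eps^(max(p, 2)), ignoring logarithmic factors."""
    return d ** max(p / 2, 1) / eps ** max(p, 2)


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, -2.0]
b_exact = [1.0, 0.0, 0.0]   # equals relu(Ax), so the loss vanishes
b_zero = [0.0, 0.0, 0.0]

loss_exact = single_index_loss(A, x, b_exact, relu, p=2)
loss_zero = single_index_loss(A, x, b_zero, relu, p=1)
bound = query_bound(d=10, eps=0.1, p=2)
```

In the active setting only a subset of the entries of `b` would be read, so the full-vector loss above is what the algorithm approximates rather than what it computes directly.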

IROS Conference 2025 Conference Paper

Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions

  • Guanwen Xie
  • Jingzehua Xu
  • Yimian Ding
  • Zhi Zhang
  • Shuai Zhang 0015
  • Yi Li

The adaptivity and maneuvering capabilities of Autonomous Underwater Vehicles (AUVs) have drawn significant attention in oceanic research, due to the unpredictable disturbances and strong coupling among the AUV’s degrees of freedom. In this paper, we develop a large language model (LLM)-enhanced reinforcement learning (RL)-based adaptive S-surface controller for AUVs. Specifically, LLMs are introduced for the joint optimization of controller parameters and reward functions in RL training. Using multi-modal and structured explicit task feedback, LLMs enable joint adjustments, balance multiple objectives, and enhance task-oriented performance and adaptability. In the proposed controller, the RL policy focuses on upper-level tasks, outputting task-oriented high-level commands that the S-surface controller then converts into control signals, ensuring cancellation of nonlinear effects and unpredictable external disturbances in extreme sea conditions. Under extreme sea conditions involving complex terrain, waves, and currents, the proposed controller demonstrates superior performance and adaptability in high-level tasks such as underwater target tracking and data collection, outperforming traditional PID and SMC controllers.

IROS Conference 2025 Conference Paper

RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles

  • David Hunt
  • Shaocheng Luo
  • Spencer Hallyburton
  • Shafii Nillongo
  • Yi Li
  • Tingjun Chen
  • Miroslav Pajic

Current lidar and camera-based solutions for low-cost indoor mobile robots have limitations such as poor performance in visually obscured environments, high computational overhead for data processing, and high costs for lidars. In contrast, mmWave radar sensors offer a cost-effective and lightweight alternative, providing accurate ranging regardless of visibility. However, existing radar-based localization suffers from sparse point cloud generation, noise, and false detections. Thus, in this work, we introduce RaGNNarok, a real-time, lightweight, and generalizable graph neural network (GNN)-based framework to enhance radar point clouds, even in complex and dynamic environments. With an inference time of only 7.3 ms on the low-cost Raspberry Pi 5, RaGNNarok runs even on such resource-constrained devices, without additional computational resources. We evaluate its performance across key tasks, including localization, SLAM, and autonomous navigation, in three different environments. Our results demonstrate strong reliability and generalizability, making RaGNNarok a robust solution for low-cost indoor mobile robots.

NeurIPS Conference 2025 Conference Paper

Reconciling Geospatial Prediction and Retrieval via Sparse Representations

  • Yi Li
  • CHEN YUANLONG
  • Weiming Huang
  • Xiaoli Li
  • Gao Cong

Urban computing harnesses big data to decode complex urban dynamics and revolutionize location-based services. Traditional approaches have treated geospatial prediction tasks (e.g., estimating socio-economic indicators) and retrieval tasks (e.g., querying geographic objects) as isolated challenges, necessitating separate models with distinct training objectives. This fragmentation imposes significant computational burdens and limits cross-task synergy, despite advances in representation learning and multi-task foundation models. We present UrbanSparse, a pioneering framework that unifies geospatial prediction and retrieval through a novel sparse-dense representation architecture. By synergistically combining these tasks, UrbanSparse eliminates redundant systems while amplifying their mutual strengths. Our approach introduces two innovations: (1) Bloom filter-based sparse encodings that compress high-sparsity geographic queries and fine-grained text terms for retrieval effectiveness, and (2) a dense semantic codebook that captures granular urban features to boost prediction accuracy. A two-view contrastive learning mechanism further bridges urban objects, regions, and contexts. Experiments on real-world datasets demonstrate 25.16% gains in prediction accuracy and 20.76% improvements in retrieval precision over state-of-the-art baselines, alongside 65.97% faster training. These advantages position UrbanSparse as a scalable solution for large urban datasets. To our knowledge, this is the first unified framework bridging geospatial prediction and retrieval, opening new frontiers in data-driven urban intelligence.
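
The Bloom filter mentioned in innovation (1) can be sketched generically as follows: a set of text terms is compressed into a fixed-length bit vector that supports approximate membership tests. This is a textbook Bloom filter, not UrbanSparse's encoding; the class name, the bit width, the number of hash functions, and the seeded SHA-256 hashing scheme are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-length bit vector with seeded hashes (illustrative sizes)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, term):
        # Derive one bit position per seed from a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(term))


query_terms = ["coffee", "shop", "orchard road"]  # hypothetical query terms
bf = BloomFilter()
for term in query_terms:
    bf.add(term)
```

The resulting `bf.bits` vector has the same length regardless of how many terms are added, which is what makes it usable as a sparse encoding; the trade-off is a false-positive rate that grows as more bits are set.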

NeurIPS Conference 2025 Conference Paper

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

  • Yuhao Yang
  • ZhI JI
  • Zhaopeng Li
  • Yi Li
  • Zhonglin Mo
  • Yue Ding
  • Kai Chen
  • Zijian Zhang

Generative models have recently gained attention in recommendation systems by directly predicting item identifiers from user interaction sequences. However, existing methods suffer from significant information loss due to the separation of stages such as quantization and sequence modeling, hindering their ability to achieve the modeling precision and accuracy of sequential dense retrieval techniques. Integrating generative and dense retrieval methods remains a critical challenge. To address this, we introduce the Cascaded Organized Bi-Represented generAtive retrieval (COBRA) framework, which innovatively integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors. End-to-end training enables dynamic refinement of dense representations, capturing both semantic insights and collaborative signals from user-item interactions. During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model. We further propose BeamFusion, an innovative approach combining beam search with nearest neighbor scores to enhance inference flexibility and recommendation diversity. Extensive experiments on public datasets and offline tests validate our method's robustness. Online A/B tests on a real-world advertising platform with over 200 million daily users demonstrate substantial improvements in key metrics, highlighting COBRA's practical advantages.

IJCAI Conference 2025 Conference Paper

Test-Time Adaptation on Recommender System with Data-Centric Graph Transformation

  • Yating Liu
  • Xin Zheng
  • Yi Li
  • Yanqing Guo

Distribution shifts in recommender systems between training and testing in user-item interactions lead to inaccurate recommendations. Despite the promising performance of test-time adaptation technology in various domains, it still faces challenges in recommender systems due to the impracticality of fine-tuning models and the infeasibility of obtaining test-time labels. To address these challenges, we first propose a Test-Time Adaptation framework for Graph-based Recommender system, named TTA-GREC, to dynamically adapt user-item graphs at test time in a data-centric way, handling distribution shifts effectively. Specifically, our TTA-GREC targets KG-enhanced GNN-based recommender systems with three core components: (1) Pseudo-label guided UI graph transformation for adaptive improvement; (2) Rationale score guided KG graph revision for semantic enhancement; and (3) Sampling-based self-supervised adaptation for contrastive learning. Experiments demonstrate TTA-GREC's superiority at test time and provide new data-centric insights on test-time adaptation for better recommender system inference.

YNIMG Journal 2025 Journal Article

The brain-gut microbiota network (BGMN) is correlated with symptom severity and neurocognition in patients with schizophrenia

  • Runlin Peng
  • Wei Wang
  • Liqin Liang
  • Rui Han
  • Yi Li
  • Haiyuan Wang
  • Yuran Wang
  • Wenhao Li

The association between the human brain and gut microbiota, known as the "brain-gut-microbiota axis", is involved in the neuropathological mechanisms of schizophrenia (SZ); however, its association patterns and correlations with symptom severity and neurocognition are still largely unknown. In this study, 43 SZ patients and 55 normal controls (NCs) were included, and resting-state functional magnetic resonance imaging (rs-fMRI) and gut microbiota data were acquired for each participant. First, the brain features of brain images and functional brain networks were computed from rs-fMRI data; the gut features of gut microbiota abundance and the gut microbiota network were computed from gut microbiota data. Second, we propose a novel methodology to construct an individual brain-gut microbiota network (BGMN) for each participant by combining the brain and gut features via multiple strategies. Third, discriminative models between SZ patients and NCs were built using the connectivity matrices of the BGMN as input features. Moreover, the correlations between the most discriminative features and the scores of symptom severity and neurocognition were analyzed in SZ patients. The results showed that the best discriminative model between SZ patients and NCs was achieved using the connectivity matrices of the BGMN when all the brain and gut features were integrated, with an accuracy of 0.90 and an area under the curve value of 0.97. The most discriminative features were related primarily to the genera Faecalibacterium and Collinsella, in which the genus Faecalibacterium was linked to the visual system and subcortical cortices and the genus Collinsella was linked to the default network and subcortical cortices. Furthermore, parts of the most discriminative features were significantly correlated with the scores of neurocognition in the SZ patients. 
The methodology for constructing individual BGMNs proposed in this study can help us reveal the associations between the brain and gut microbiota and understand the neuropathology of SZ.

JBHI Journal 2025 Journal Article

Unified VideoMAE Framework for Detection of Multi-Disorder ADHD and Depression

  • Yichun Li
  • Yi Li
  • Syed Mohsen Naqvi

Mental disorders have become a major public health issue worldwide. Traditional face-to-face and clinical diagnostic methods are not only time-consuming and expensive but also rely heavily on the expertise of professionals and require costly equipment. With the rapid development of computer vision and deep learning, a growing number of studies are exploring deep learning methods to assist in diagnosing and evaluating various mental disorders. However, most current research focuses on a single type of mental disorder, which limits the broader application of these technologies. This paper proposes a novel system aimed at detecting multiple mental disorders, i.e., Attention Deficit Hyperactivity Disorder (ADHD) and depression, within one unified framework. To make the proposed system both computationally efficient and robust despite the limited availability of mental health datasets, we first preprocess the original videos to extract cost-effective facial video segments and then fine-tune pre-trained models on these facial videos for classification. In our mental disorder detection task, three distinct fine-tuning methods are adopted for the pre-trained video-masked autoencoder (VideoMAE) model. Moreover, unlike conventional fine-tuning procedures, the proposed methods innovatively refine the attention masks during fine-tuning to prioritise facial features most relevant to mental disorders. Experimental results show that our proposed method outperforms existing multi-mental disorder detection approaches on standard benchmarks, achieving competitive accuracy with fewer parameters, stable cross-dataset performance, and improved robustness. This work points to a promising direction for automated mental disorder diagnostic technologies.

NeurIPS Conference 2025 Conference Paper

You Only Spectralize Once: Taking a Spectral Detour to Accelerate Graph Neural Network

  • Yi Li
  • Zhichun Guo
  • Guanpeng Li
  • Bingzhe Li

Training Graph Neural Networks (GNNs) often relies on repeated, irregular, and expensive message-passing operations over all nodes (e.g., $N$), leading to high computational overhead. To alleviate this inefficiency, we revisit GNN training from a spectral perspective. In many real-world graphs, node features and embeddings exhibit sparse representation in the Graph Fourier domain. This inherent spectral sparsity aligns well with the principles of Compressed Sensing, which posits that signals sparse in one transform domain can be accurately reconstructed from a significantly reduced number of measurements. This observation motivates the design of a more efficient GNN that operates predominantly in a compressed spectral subspace. Thus, we propose You Only Spectralize Once (YOSO), a GNN training scheme that performs a single Graph Fourier Transformation to project features onto a learnable orthonormal Fourier basis, retaining only $M$ spectral coefficients ($M \ll N$). The entire GNN computation is then carried out in the reduced spectral domain. Final full-graph embeddings are recovered only at the output layer by solving a bounded $\ell_{2,1}$-regularized optimization problem. Theoretically, drawing upon Compressed Sensing theory, we prove stable recovery throughout training by showing that the projection onto our learnable Fourier basis can satisfy the Restricted Isometry Property when $M=\mathcal{O}(k \log N)$ for $k$-row-sparse spectra, acting as the measurement process. Empirically, YOSO achieves an average 74% reduction in training time across five benchmark datasets compared to state-of-the-art methods, while maintaining competitive accuracy.
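For readers unfamiliar with the Restricted Isometry Property mentioned above, the standard vector form from Compressed Sensing is stated below. The symbols $\Phi$, $\delta_k$, and $x$ are introduced here for illustration; the paper's own statement concerns $k$-row-sparse spectra and is assumed to be the analogous matrix version, which the abstract does not spell out.

$$
(1-\delta_k)\,\|x\|_2^2 \;\le\; \|\Phi x\|_2^2 \;\le\; (1+\delta_k)\,\|x\|_2^2
\quad \text{for all } k\text{-sparse } x \in \mathbb{R}^N,
\qquad \Phi \in \mathbb{R}^{M \times N},\ \delta_k \in (0,1).
$$

Informally, a measurement matrix with this property approximately preserves the length of every $k$-sparse signal, which is what makes stable recovery from $M \ll N$ measurements possible.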

AAAI Conference 2024 Conference Paper

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

  • Wenbo Hu
  • Yifan Xu
  • Yi Li
  • Weiyue Li
  • Zeyuan Chen
  • Zhuowen Tu

Vision Language Models (VLMs), which extend Large Language Models (LLM) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limited to the token count, potentially curtailing the recognition of scenes with text-rich context. To improve upon them, the present study introduces BLIVA: an augmented version of InstructBLIP with Visual Assistant. BLIVA incorporates the query embeddings from InstructBLIP and also directly projects encoded patch embeddings into the LLM, a technique inspired by LLaVA. This approach assists the model to capture intricate details potentially missed during the query decoding process. Empirical evidence demonstrates that our model, BLIVA, significantly enhances performance in processing text-rich VQA benchmarks (up to 17.76% in OCR-VQA benchmark) and in undertaking general (not particularly text-rich) VQA benchmarks (up to 7.9% in Visual Spatial Reasoning benchmark), and achieved 17.72% overall improvement in a comprehensive multimodal LLM benchmark (MME), comparing to our baseline InstructBLIP. BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence. To demonstrate the broad industry applications enabled by BLIVA, we evaluate the model using a new dataset comprising YouTube thumbnails paired with question-answer sets across 11 diverse categories. For researchers interested in further exploration, our code and models are freely accessible at https://github.com/mlpc-ucsd/BLIVA.

JBHI Journal 2024 Journal Article

Exploiting Hierarchical Interactions for Protein Surface Learning

  • Yiqun Lin
  • Liang Pan
  • Yi Li
  • Ziwei Liu
  • Xiaomeng Li

Predicting interactions between proteins is one of the most important yet challenging problems in structural bioinformatics. Intrinsically, potential function sites in protein surfaces are determined by both geometric and chemical features. However, existing works only consider handcrafted or individually learned chemical features from the atom type and extract geometric features independently. Here, we identify two key properties of effective protein surface learning: 1) relationship among atoms: atoms are linked with each other by covalent bonds to form biomolecules instead of appearing alone, leading to the significance of modeling the relationship among atoms in chemical feature learning. 2) hierarchical feature interaction: the neighboring residue effect validates the significance of hierarchical feature interaction among atoms and between surface points and atoms (or residues). In this paper, we present a principled framework based on deep learning techniques, namely Hierarchical Chemical and Geometric Feature Interaction Network (HCGNet), for protein surface analysis by bridging chemical and geometric features with hierarchical interactions. Extensive experiments demonstrate that our method outperforms the prior state-of-the-art method by 2.3% in site prediction task and 3.2% in interaction matching task, respectively.

IROS Conference 2024 Conference Paper

High-Accuracy 2-D AoA Estimation Using Lightweight UWB Arrays

  • Yi Li
  • Hanying Zhao
  • Yiman Liu
  • Tianyu Wang
  • Jincheng Yu
  • Yuan Shen

Ultra-wide band (UWB) systems are gaining popularity for multi-robot localization benefiting from their high-accuracy ranging capabilities. However, current UWB systems fall short in determining orientations and realizing pair-wise localization for neglecting bearing information. Given the importance of bearing capabilities, especially when vision-based methods fail, this paper proposes a high-accuracy 2-D bearing estimation method using stereo UWB arrays. We propose a novel phase error calibration method that effectively mitigates various phase imperfections. This array is designed with antenna spacing larger than half the wavelength to diminish antenna coupling and enhance bearing accuracy. As regards the phase ambiguity issue arising from large antenna spacing, a distributed range-assisted phase ambiguity determination method is developed. Our bearing estimation method exhibits low complexity and is well-suited for the deployment on mobile robots with limited computational resources. The performance of the proposed method is validated on the practical platforms under dynamic scenarios, yielding root mean squared errors (RMSEs) less than 4° and 3° for azimuth and elevation angle estimation, respectively.
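To illustrate why antenna spacing larger than half the wavelength creates the phase ambiguity mentioned above, the sketch below assumes the textbook two-antenna far-field model, in which the measured phase difference equals 2π · spacing · sin(θ) / wavelength, wrapped to one cycle. It is not the paper's calibration or range-assisted determination method; the function name `candidate_bearings` and its parameters are hypothetical.

```python
import math

def candidate_bearings(phase_diff, spacing, wavelength):
    """Return every bearing (radians) consistent with a wrapped phase difference.

    Assumed model: phase_diff = 2*pi*spacing*sin(theta)/wavelength (mod 2*pi),
    with phase_diff given in (-pi, pi]. spacing and wavelength share one unit.
    """
    candidates = []
    # Largest number of whole-cycle wraps that can still yield |sin(theta)| <= 1.
    max_wraps = int(math.floor(spacing / wavelength + 0.5))
    for k in range(-max_wraps, max_wraps + 1):
        unwrapped = phase_diff + 2.0 * math.pi * k
        sin_theta = unwrapped * wavelength / (2.0 * math.pi * spacing)
        if -1.0 <= sin_theta <= 1.0:
            candidates.append(math.asin(sin_theta))
    return sorted(candidates)
```

At half-wavelength spacing a phase difference maps to a single bearing, whereas at one full wavelength the same measurement admits two, so some extra cue (range information, in the paper's case) is needed to choose between them.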

ICML Conference 2024 Conference Paper

Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

  • Toru Shirakawa
  • Yi Li
  • Yulun Wu
  • Sky Qiu
  • Yuxuan Li
  • Mingduo Zhao
  • Hiroyasu Iso
  • Mark J. van der Laan

We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning. After obtaining an initial estimate using the transformer, following the targeted minimum loss-based likelihood estimation (TMLE) framework, we statistically corrected for the bias commonly associated with machine learning algorithms. Furthermore, our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in asymptotic statistical theory. Simulation results demonstrate our method’s superior performance over existing approaches, particularly in complex, long time-horizon scenarios. It remains effective in small-sample, short-duration contexts, matching the performance of asymptotically efficient estimators. To demonstrate our method in practice, we applied our method to estimate counterfactual mean outcomes for standard versus intensive blood pressure management strategies in a real-world cardiovascular epidemiology cohort study.

IROS Conference 2024 Conference Paper

Spike-based high energy efficiency and accuracy tracker for Robot

  • Jinye Qu
  • Zeyu Gao
  • Yi Li
  • Yanfeng Lu
  • Hong Qiao

Spiking Neural Networks (SNNs) have gained attention for their apparent energy efficiency and significant biological interpretability, although they also face significant challenges such as prolonged latency and suboptimal tracking accuracy. Recent studies have explored the application of SNNs in object tracking tasks. Dynamic visual sensors (DVS) have become a popular way to implement SNN-based object tracking due to their asynchronous and spiking characteristics similar to SNNs. However, challenges such as the high cost of DVS cameras and the lack of object surface texture information hinder the utility and performance of DVS trackers. In contrast, RGB information has inherent advantages, including low acquisition cost and comprehensive object surface texture representation. However, RGB information is prone to excessive image blurring in low-light conditions or in fast-motion scenes. To address these challenges, we propose the "Motion Feature Extractor" and the "RGB-DVS Fusion Module". The "Motion Feature Extractor" can replace the DVS camera at a very low cost, and the "RGB-DVS Fusion Module" can deeply fuse the feature information of the two to make up for their respective deficiencies. In addition, we adopt a conversion method to obtain a lossless SNN version of the model. Through experiments, our model achieves a 13.6% improvement in the expected average overlap (EAO) index using only 1.47% of the energy consumption of SiamRPN (VOT2016 dataset). In addition, we deployed the model to a robot and then conducted tracking experiments, which confirmed that the model can operate on the robot losslessly with satisfactory results.

AAAI Conference 2022 Conference Paper

Close the Loop: A Unified Bottom-Up and Top-Down Paradigm for Joint Image Deraining and Segmentation

  • Yi Li
  • Yi Chang
  • Changfeng Yu
  • Luxin Yan

In this work, we focus on a very practical problem: image segmentation under rain conditions. Image deraining is a classic low-level restoration task, while image segmentation is a typical high-level understanding task. Most of the existing methods intuitively employ the bottom-up paradigm by taking deraining as a preprocessing step for subsequent segmentation. However, our statistical analysis indicates that not only deraining would benefit segmentation (bottom-up), but also segmentation would further improve deraining performance (top-down) in turn. This motivates us to solve the rainy image segmentation task within a novel top-down and bottom-up unified paradigm, in which two sub-tasks are alternatively performed and collaborated with each other. Specifically, the bottom-up procedure yields both clearer images and rain-robust features from both image and feature domains, so as to ease the segmentation ambiguity caused by rain streaks. The top-down procedure adopts semantics to adaptively guide the restoration for different contents via a novel multi-path semantic attentive module (SAM). Thus the deraining and segmentation could boost the performance of each other cooperatively and progressively. Extensive experiments and ablations demonstrate that the proposed method outperforms the state-of-the-art on rainy image segmentation.

NeurIPS Conference 2022 Conference Paper

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

  • Yushi Cao
  • Zhiming Li
  • Tianpei Yang
  • Hao Zhang
  • Yan Zheng
  • Yi Li
  • Jianye Hao
  • Yang Liu

Despite achieving superior performance in human-level control problems, unlike humans, deep reinforcement learning (DRL) lacks high-order intelligence (e.g., logic deduction and reuse), thus it behaves less effectively than humans regarding learning and generalization in complex problems. Previous works attempt to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on imperative or declarative programming, and each has a distinct limitation. The former ignores the cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based, thus failing to synthesize programs with complex hierarchical logic. In this paper, we combine the above two paradigms and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs. GALOIS leverages the program sketch and defines a new sketch-based hybrid program language for guiding the synthesis. Based on that, GALOIS proposes a sketch-based program synthesis method to automatically generate white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on various decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines regarding the asymptotic performance, generalizability, and great knowledge reusability across different environments.

TCS Journal 2022 Journal Article

Synthesizing ranking functions for loop programs via SVM

  • Yi Li
  • Xie Li
  • Yong Li
  • Xuechao Sun
  • Andrea Turrini
  • Lijun Zhang

Termination of programs is probably the most famous undecidable problem in computer science. Despite this undecidability result, a lot of effort has been spent on improving algorithms that prove termination of loops, which is one of the fundamental aspects of software reliability analysis. These algorithms usually focus on finding an appropriate ranking function for the loop, which proves its termination. In this paper, we focus on handling the synthesis problem of nested ranking functions and multi-phase ranking functions for loop programs. We first reduce the problem of a nested ranking function synthesis to the existence problem of a hyperplane separating classes of data. This allows us to leverage Support-Vector Machines (SVM) techniques for the synthesis of nested ranking functions. SVM are supervised learning algorithms that are used to classify data; they work by finding a hyperplane separating data points parted into two classes. We show how to carefully define the data points so that the separating hyperplane gives rise to a nested ranking function for the loop. Then we use this algorithm for nested ranking functions synthesis as a subprocedure to devise a sound algorithm which incrementally synthesizes multi-phase ranking functions. Experimental results confirm the effectiveness of our SVM-based synthesis of nested and multi-phase ranking functions.

AAAI Conference 2022 Conference Paper

Uncertainty Estimation via Response Scaling for Pseudo-Mask Noise Mitigation in Weakly-Supervised Semantic Segmentation

  • Yi Li
  • Yiqun Duan
  • Zhanghui Kuang
  • Yimin Chen
  • Wayne Zhang
  • Xiaomeng Li

Weakly-Supervised Semantic Segmentation (WSSS) segments objects without a heavy burden of dense annotation. The price, however, is that the generated pseudo-masks contain obvious noisy pixels, which lead to sub-optimal segmentation models trained on these pseudo-masks. Few studies notice or address this problem, even though these noisy pixels remain inevitable after improvements to the pseudo-masks. We therefore try to improve WSSS through noise mitigation. We observe that many noisy pixels are of high confidence, especially when the response range is too wide or too narrow, presenting an uncertain status. Thus, in this paper, we simulate noisy variations of response by scaling the prediction map multiple times for uncertainty estimation. The uncertainty is then used to weight the segmentation loss to mitigate noisy supervision signals. We call this method URN, abbreviated from Uncertainty estimation via Response scaling for Noise mitigation. Experiments validate the benefits of URN, and our method achieves state-of-the-art results at 71.2% and 41.5% on PASCAL VOC 2012 and MS COCO 2014 respectively, without extra models like saliency detection. Code is available at https://github.com/XMed-Lab/URN.

AAAI Conference 2021 Conference Paper

Early Safety Warnings for Long-Distance Pipelines: A Distributed Optical Fiber Sensor Machine Learning Approach

  • Yiyuan Yang
  • Yi Li
  • Taojia Zhang
  • Yan Zhou
  • Haifeng Zhang

Automated pipeline safety early warning (PSEW) systems are designed to automatically identify and locate third-party damage events on oil and gas pipelines. They are intended to replace traditional, inefficient manual inspection methods. However, current PSEW methods cannot achieve universality for various complex environments because they are sensitive to the spatiotemporal stability of the signal obtained by its distributed sensors at various locations and times. Our research aimed to improve the accuracy of long-distance oil–gas PSEW systems through machine learning. In this paper, we propose a novel real-time action recognition method for long-distance PSEW systems based on a coherent Rayleigh scattering distributed optical fiber sensor. More specifically, we put forward two complementary feature calculation methods to describe signals and build a new action recognition deep learning network based on those features. Encouraging empirical results on the data collected at a real location confirm that the features can effectively describe signals in an environment with strong noise and weak signals, and the entire approach can identify and locate third-party damage events quickly under various hardware conditions with accuracies of 99.26% (500 Hz) and 97.20% (100 Hz). More generically, our method can be applied to other fields as well.

JMLR Journal 2021 Journal Article

Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach

  • Zhe Fei
  • Yi Li

The focus of modern biomedical studies has gradually shifted to explanation and estimation of joint effects of high dimensional predictors on disease risks. Quantifying uncertainty in these estimates may provide valuable insight into prevention strategies or treatment decisions for both patients and physicians. High dimensional inference, including confidence intervals and hypothesis testing, has sparked much interest. While much work has been done in the linear regression setting, there is lack of literature on inference for high dimensional generalized linear models. We propose a novel and computationally feasible method, which accommodates a variety of outcome types, including normal, binomial, and Poisson data. We use a “splitting and smoothing” approach, which splits samples into two parts, performs variable selection using one part and conducts partial regression with the other part. Averaging the estimates over multiple random splits, we obtain the smoothed estimates, which are numerically stable. We show that the estimates are consistent, asymptotically normal, and construct confidence intervals with proper coverage probabilities for all predictors. We examine the finite sample performance of our method by comparing it with the existing methods and applying it to analyze a lung cancer cohort study.

AAAI Conference 2021 Conference Paper

GraphMSE: Efficient Meta-path Selection in Semantically Aligned Feature Space for Graph Neural Networks

  • Yi Li
  • Yilun Jin
  • Guojie Song
  • Zihao Zhu
  • Chuan Shi
  • Yiming Wang

Heterogeneous information networks (HINs) are ideal for describing real-world data with different types of entities and relationships. To carry out machine learning on HINs, meta-paths are widely utilized to extract semantics with pre-defined patterns, and models such as graph convolutional networks (GCNs) are thus enabled. However, previous works generally assume a fixed set of meta-paths, which is unrealistic as real-world data are overwhelmingly diverse. Therefore, it is appealing if meta-paths can be automatically selected given an HIN, yet existing works aiming at such problem possess drawbacks, such as poor efficiency and ignoring feature heterogeneity. To address these drawbacks, we propose GraphMSE, an efficient heterogeneous GCN combined with automatic meta-path selection. Specifically, we design highly efficient meta-path sampling techniques, and then injectively project sampled meta-path instances to vectors. We then design a novel semantic feature space alignment, aiming to align the meta-path instance vectors and hence facilitate meta-path selection. Extensive experiments on real-world datasets demonstrate that GraphMSE outperforms state-of-the-art counterparts, figures out important meta-paths, and is dramatically (e.g., 200 times) more efficient.

IJCAI Conference 2020 Conference Paper

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

  • Hao Zhu
  • Huaibo Huang
  • Yi Li
  • Aihua Zheng
  • Ran He

Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image. Most existing methods mainly focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block by selectively focusing the lip area of the input image during the training stage, to further enhance lip synchronization. Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations.

ICRA Conference 2020 Conference Paper

Construction of Bounding Volume Hierarchies for Triangle Meshes with Mixed Face Sizes

  • Yi Li
  • Evan Shellshear
  • Robert Bohlin
  • Johan S. Carlson

We consider the problem of creating tighter-fitting bounding volumes (more specifically rectangular swept spheres) when constructing bounding volume hierarchies (BVHs) for complex 3D geometries given in the form of unstructured triangle meshes/soups with the aim of speeding up our IPS Path Planner for rigid bodies, where the triangles often have very different sizes. Currently, the underlying collision and distance computation module (IPS CDC) does not take into account the sizes of the triangles when it constructs BVHs using a top-down strategy. To split triangles in a BVH node into two BVH nodes, IPS CDC has to compute both the split axis and the split position. In this work, we use the principal axes of the tensor of inertia as the potential split axes and the center of mass as the split position, where the computations of both the tensor of inertia and the center of mass require knowledge of the areas of the triangles. We show that our method improves performance (up to 20 % faster) of our IPS Path Planner when it is used to plan collision-free disassembly paths for three different test cases taken from manufacturing industries.
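The split rule described above can be sketched as follows, under simplifying assumptions that are not from the paper: each triangle is treated as a point mass at its centroid weighted by its area, the principal axis is found by power iteration on the area-weighted covariance matrix (which shares eigenvectors with the inertia tensor), and the axis of greatest spread is the one chosen. The names `split_plane` and `partition` are hypothetical, and IPS CDC itself is not modelled here.

```python
import math

def tri_area_centroid(tri):
    """Area and centroid of a triangle given as three (x, y, z) vertices."""
    a, b, c = tri
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    area = 0.5 * math.sqrt(sum(x * x for x in n))
    centroid = tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))
    return area, centroid

def split_plane(triangles, iters=100):
    """Return (axis, center_of_mass) using area weights (point-mass assumption)."""
    stats = [tri_area_centroid(t) for t in triangles]
    total = sum(area for area, _ in stats)
    com = tuple(sum(area * c[i] for area, c in stats) / total for i in range(3))
    cov = [[sum(area * (c[i] - com[i]) * (c[j] - com[j]) for area, c in stats) / total
            for j in range(3)] for i in range(3)]
    # Power iteration for the dominant eigenvector; the fixed start vector
    # would fail only if it were exactly orthogonal to that eigenvector.
    start = (1.0, 0.7, 0.4)
    norm = math.sqrt(sum(x * x for x in start))
    axis = tuple(x / norm for x in start)
    for _ in range(iters):
        w = tuple(sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: all centroids coincide
            break
        axis = tuple(x / norm for x in w)
    return axis, com

def partition(triangles, axis, com):
    """Assign each triangle to a side of the plane through com normal to axis."""
    left, right = [], []
    for t in triangles:
        _, c = tri_area_centroid(t)
        side = sum(axis[i] * (c[i] - com[i]) for i in range(3))
        (left if side < 0.0 else right).append(t)
    return left, right
```

Because the weights are areas, one large triangle pulls the split position toward itself much more strongly than several small ones, which is the effect of mixed face sizes that an unweighted split would miss.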

ICRA Conference 2020 Conference Paper

Contact-based Bounding Volume Hierarchy for Assembly Tasks

  • Evan Shellshear
  • Yi Li
  • Robert Bohlin
  • Johan S. Carlson

Path planning of an object which is allowed to be in contact with other objects during assembly process is a significant challenge due to the variety of permitted or forbidden collisions between the distinct parts of the objects to be assembled. In order to put objects together in real-life scenarios, parts of assembled objects may be required to flex, whereas other parts may have to fit exactly. Consequently, existing collision checking and distance computation algorithms have to be modified to enable path planning of objects that can be in contact during the assembly process. In this paper, we analyze an improved broad phase proximity query algorithm to enable such contact-based assembly tasks we call CHAT (Contact-based Hierarchy for Assembly Tasks). We demonstrate that, compared to existing approaches, our proposed method is more than an order of magnitude faster for collision queries and up to three times faster for distance queries when the two objects contain a large number of parts (with some parts containing thousands or tens of thousands of triangles). Due to the nature of the algorithm, we expect the performance improvements to increase as the number of parts in an object becomes larger.

AAAI Conference 2020 Conference Paper

Incorporating Expert-Based Investment Opinion Signals in Stock Prediction: A Deep Learning Framework

  • Heyuan Wang
  • Tengjiao Wang
  • Yi Li

Investment messages published on social media platforms are highly valuable for stock prediction. Most previous work regards overall message sentiments as forecast indicators and relies on shallow features (bag-of-words, noun phrases, etc.) to determine the investment opinion signals. These methods neither capture the time-sensitive and target-aware characteristics of stock investment reviews, nor consider the impact of investor’s reliability. In this study, we provide an in-depth analysis of public stock reviews and their application in stock movement prediction. Specifically, we propose a novel framework which includes the following three key components: time-sensitive and target-aware investment stance detection, expert-based dynamic stance aggregation, and stock movement prediction. We first introduce our stance detection model named MFN, which learns the representation of each review by integrating multi-view textual features and extended knowledge in financial domain to distill bullish/bearish investment opinions. Then we show how to identify the validity of each review, and enhance stock movement prediction by incorporating expert-based aggregated opinion signals. Experiments on real datasets show our framework can effectively improve the performance of both investment opinion mining and individual stock forecasting.

NeurIPS Conference 2020 Conference Paper

Learning Representations from Audio-Visual Spatial Alignment

  • Pedro Morgado
  • Yi Li
  • Nuno Vasconcelos

We introduce a novel self-supervised pretext task for learning representations from audio-visual content. Prior work on audio-visual representation learning leverages correspondences at the video level. Approaches based on audio-visual correspondence (AVC) predict whether audio and video clips originate from the same or different video instances. Audio-visual temporal synchronization (AVTS) further discriminates negative pairs originated from the same video instance but at different moments in time. While these approaches learn high-quality representations for downstream tasks such as action recognition, they completely disregard the spatial cues of audio and visual signals naturally occurring in the real world. To learn from these spatial cues, we tasked a network to perform contrastive audio-visual spatial alignment of 360° video and spatial audio. The ability to perform spatial alignment is enhanced by reasoning over the full spatial content of the 360° video using a transformer architecture to combine representations from multiple viewpoints. The advantages of the proposed pretext task are demonstrated on a variety of audio and visual downstream tasks, including audio-visual correspondence, spatial alignment, action recognition and video semantic segmentation. Dataset and code are available at https://github.com/pedro-morgado/AVSpatialAlignment.

AAAI Conference 2020 Conference Paper

PSENet: Psoriasis Severity Evaluation Network

  • Yi Li
  • Zhe Wu
  • Shuang Zhao
  • Xian Wu
  • Yehong Kuang
  • YangTian Yan
  • Shen Ge
  • Kai Wang

Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. This disease cannot be fully cured and requires lifelong care. If the deterioration of Psoriasis is not detected and properly treated in time, it could cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track the Psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practice. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes, which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different dermatologists, or even the same dermatologist, could give different scores for the same case. To overcome these drawbacks, we propose PSENet, which applies deep neural networks to estimate Psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses a siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amounts of training data; and (3) PSENet can not only estimate the severity, but also locate the skin lesion regions from the input image. To train and evaluate PSENet, we worked with professional dermatologists from a top hospital and spent years building a golden dataset. The experimental results show that PSENet can achieve a mean absolute error of 2.21 and an accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track Psoriasis severity in a much more convenient manner.

TCS Journal 2020 Journal Article

Target users' activation probability maximization with different seed set constraints in social networks

  • Ruidong Yan
  • Hongwei Du
  • Yi Li
  • Wenping Chen
  • Yongcai Wang
  • Yuqing Zhu
  • Deying Li

Influence Maximization (IM) over online social networks has been widely explored in recent years. It selects a seed set from nodes in the network using a limited budget such that the expected number of nodes influenced by the seed set is maximized. However, how to activate a given set of target users T, e.g., selling a product to a specific target group, is a more practical problem. To address this problem, we propose the Target Users' Activation Probability Maximization with Constraint (TUAPM-WC) problem and the Target Users' Activation Probability Maximization without Constraint (TUAPM-WOC) problem, i.e., to select a seed set S with or without size constraints, respectively, such that the activation probabilities of the target users in T are maximized. Considering that influence decays during information propagation, we propose a novel and practical Influence Decay Model (IDM) as the information diffusion model. Based on the IDM, we show that the TUAPM-WC and the TUAPM-WOC problems are NP-hard. We also prove that the objective functions of the TUAPM-WC and TUAPM-WOC problems are monotone non-decreasing and submodular. On one hand, we employ a Double Greedy Algorithm (DGA) to guarantee a (1/3)-approximation ratio for the TUAPM-WOC problem when |S| is unconstrained. On the other hand, we propose a series of algorithms to solve the TUAPM-WC problem when |S| ≤ b, where b is a positive integer. More specifically, we provide a (1 − 1/e)-approximation Basic Greedy Algorithm (BGA). Furthermore, a speed-up Scalable Algorithm (SA) is proposed for large online social networks. Finally, we run simulations on synthetic and real-life social networks to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results show that our algorithms are superior to the comparison algorithms.
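The size-constrained greedy step mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic greedy scheme for maximizing a monotone submodular set function under a cardinality budget, not the paper's BGA itself. The function name `greedy_seed_selection`, the toy `reach` table, and the coverage objective are all hypothetical stand-ins; the paper's actual objective is the activation probability under its Influence Decay Model, which is not reproduced here.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


def greedy_seed_selection(
    candidates: Iterable[Hashable],
    objective: Callable[[Set[Hashable]], float],
    budget: int,
) -> Set[Hashable]:
    """Pick up to `budget` seeds, each time adding the largest marginal gain."""
    pool = list(candidates)
    seeds: Set[Hashable] = set()
    for _ in range(budget):
        base = objective(seeds)
        best, best_gain = None, 0.0
        for node in pool:
            if node in seeds:
                continue
            gain = objective(seeds | {node}) - base
            if gain > best_gain:  # strict: ties keep the earlier candidate
                best, best_gain = node, gain
        if best is None:  # no candidate improves the objective any further
            break
        seeds.add(best)
    return seeds


# Hypothetical toy objective: number of target users reached by the seed set.
# Set coverage is monotone and submodular, like the objectives in the abstract.
reach: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}


def coverage(seeds: Set[Hashable]) -> float:
    covered: Set[int] = set()
    for s in seeds:
        covered |= reach[s]
    return float(len(covered))


chosen = greedy_seed_selection(reach, coverage, budget=2)
```

For a monotone submodular objective, this greedy rule is the classical route to a (1 − 1/e) approximation guarantee, which matches the ratio quoted for BGA.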

YNIMG Journal 2019 Journal Article

Challenges in pediatric neuroimaging

  • Matthew J. Barkovich
  • Yi Li
  • Rahul S. Desikan
  • A. James Barkovich
  • Duan Xu

Pediatric neuroimaging is challenging due to the rapid structural, metabolic, and functional changes that occur in the developing brain. A specially trained team is needed to produce high quality diagnostic images in children, due to their small physical size and immaturity. Patient motion, cooperation and medical condition dictate the methods and equipment used. A customized approach tailored to each child's age and functional status with the appropriate combination of dedicated staff, imaging hardware, and software is key; these range from low-tech techniques, such as feed and swaddle, to specialized small bore MRI scanners, MRI compatible incubators and neonatal head coils. New pre- and post-processing techniques can also compensate for the motion artifacts and low signal that often degrade neonatal scans.

YNIMG Journal 2019 Journal Article

Image processing and analysis methods for the Adolescent Brain Cognitive Development Study

  • Donald J. Hagler
  • Sean N. Hatton
  • M. Daniela Cornejo
  • Carolina Makowski
  • Damien A. Fair
  • Anthony Steven Dick
  • Matthew T. Sutherland
  • B.J. Casey

The Adolescent Brain Cognitive Development (ABCD) Study is an ongoing, nationwide study of the effects of environmental influences on behavioral and brain development in adolescents. The main objective of the study is to recruit and assess over eleven thousand 9-10-year-olds and follow them over the course of 10 years to characterize normative brain and cognitive development, the many factors that influence brain development, and the effects of those factors on mental health and other outcomes. The study employs state-of-the-art multimodal brain imaging, cognitive and clinical assessments, bioassays, and careful assessment of substance use, environment, psychopathological symptoms, and social functioning. The data is a resource of unprecedented scale and depth for studying typical and atypical development. The aim of this manuscript is to describe the baseline neuroimaging processing and subject-level analysis methods used by ABCD. Processing and analyses include modality-specific corrections for distortions and motion, brain segmentation and cortical surface reconstruction derived from structural magnetic resonance imaging (sMRI), analysis of brain microstructure using diffusion MRI (dMRI), task-related analysis of functional MRI (fMRI), and functional connectivity analysis of resting-state fMRI. This manuscript serves as a methodological reference for users of publicly shared neuroimaging data from the ABCD Study.

IJCAI Conference 2019 Conference Paper

Pose-preserving Cross Spectral Face Hallucination

  • Junchi Yu
  • Jie Cao
  • Yi Li
  • Xiaofei Jia
  • Ran He

To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods have resorted to generative models and explored the “recognition via generation” framework. Even so, it remains a very challenging task to synthesize photo-realistic visible faces (VIS) from near-infrared (NIR) images, especially when paired training data are unavailable. We present an approach to avert the data misalignment problem and faithfully preserve pose, expression and identity information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism to warping that is jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the image level, an auxiliary generator is employed to facilitate the learning of the mapping from the NIR to the VIS domain. At the domain level, we first apply the mutual information constraint to explicitly measure the correlation between domains and thus benefit synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art HFR methods but also produces visually appealing results at a high resolution.

AAAI Conference 2018 Conference Paper

Anti-Makeup: Learning A Bi-Level Adversarial Network for Makeup-Invariant Face Verification

  • Yi Li
  • Lingxiao Song
  • Xiang Wu
  • Ran He
  • Tieniu Tan

Makeup is widely used to improve facial attractiveness and is well accepted by the public. However, different makeup styles will result in significant facial appearance changes. It remains a challenging problem to match makeup and non-makeup face images. This paper proposes a learning from generation approach for makeup-invariant face verification by introducing a bi-level adversarial network (BLAN). To alleviate the negative effects from makeup, we first generate non-makeup images from makeup ones, and then use the synthesized non-makeup images for further verification. Two adversarial networks in BLAN are integrated in an end-to-end deep network, with the one on pixel level for reconstructing appealing facial images and the other on feature level for preserving identity information. These two networks jointly reduce the sensing gap between makeup and non-makeup images. Moreover, we make the generator well constrained by incorporating multiple perceptual losses. Experimental results on three benchmark makeup face datasets demonstrate that our method achieves state-of-the-art verification accuracy across makeup status and can produce photo-realistic non-makeup face images.

JMLR Journal 2017 Journal Article

A Robust-Equitable Measure for Feature Ranking and Selection

  • A. Adam Ding
  • Jennifer G. Dy
  • Yi Li
  • Yale Chang

In many applications, not all the features used to represent data samples are important. Often only a few features are relevant for the prediction task. The choice of dependence measure often affects the final result of many feature selection methods. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable, a concept proposed by Reshef et al. (2011); that is, the dependence measure treats linear and nonlinear relationships equally. Recently, Kinney and Atwal (2014) gave a mathematical definition of self-equitability. In this paper, we introduce a new concept of robust-equitability and identify a robust-equitable copula dependence measure, the robust copula dependence (RCD) measure. RCD is based on the $L_1$-distance of the copula density from uniform and we show that it is equitable under both equitability definitions. We also prove theoretically that RCD is much easier to estimate than mutual information. Because of these theoretical properties, the RCD measure has the following advantages compared to existing dependence measures: it is robust to different relationship forms and robust to unequal sample sizes of different features. Experiments on both synthetic and real-world data sets confirm the theoretical analysis, and illustrate the advantage of using the dependence measure RCD for feature selection.

IJCAI Conference 2017 Conference Paper

Opinion-aware Knowledge Graph for Political Ideology Detection

  • Wei Chen
  • Xiao Zhang
  • Tengjiao Wang
  • Bishan Yang
  • Yi Li

Identifying an individual's political ideology from their speeches and written texts is important for analyzing political opinions and user behavior on social media. Traditional opinion mining methods rely on bag-of-words representations to classify texts into different ideology categories. Such methods are too coarse for understanding political ideologies. The key to identifying different ideologies is to recognize different opinions expressed toward a specific topic. To model this insight, we classify ideologies based on the distribution of opinions expressed towards real-world entities or topics. Specifically, we propose a novel approach to political ideology detection that makes predictions based on an opinion-aware knowledge graph. We show how to construct such a graph by integrating the opinions and targeted entities extracted from text into an existing structured knowledge base, and show how to perform ideology inference by information propagation on the graph. Experimental results demonstrate that our method achieves high accuracy in detecting ideologies compared to baselines including LR, SVM and RNN.

NeurIPS Conference 2016 Conference Paper

R-FCN: Object Detection via Region-based Fully Convolutional Networks

  • Jifeng Dai
  • Yi Li
  • Kaiming He
  • Jian Sun

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20 times faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn.

KR Conference 2016 Short Paper

Specifying Scheduling Problems Using Metric Temporal Logic

  • Roy Luo
  • Richard Valenzano
  • Yi Li
  • Chris Beck
  • Sheila McIlraith

We introduce Scheduling MTL (SMTL), an extension of Metric Temporal Logic that supports the specification of complex scheduling problems with repeated and conditional occurrences of activities, and rich temporal relationships among them. We define the syntax and semantics of SMTL, and explore natural restrictions of the language to gain tractability. We also provide an algorithm for finding a schedule for a problem specified as an SMTL formula, and establish a novel equivalence between a fragment of MTL and simple temporal networks, a widely-used formalism in AI temporal planning.

ICRA Conference 2015 Conference Paper

A proposal of a light-weight walking assist wear using PVC gel artificial muscles

  • Yi Li
  • Minoru Hashimoto

We have developed a contraction and expansion-type artificial muscle using plasticized polyvinyl chloride (PVC) gel and mesh electrodes. Under an electric field, the PVC gel artificial muscle exhibits a fast response in air, large deformation, variable stiffness, high output force and low power consumption, so it has great potential for use as a new type of artificial muscle. In a previous study, we designed variable stiffness spats for walking assistance by incorporating the variable stiffness PVC gel artificial muscles into generally used spats. The spats can assist walking through the force generated by the change in stiffness as the electric field is switched on and off. It was found that both the integrated electromyogram (IEMG) and maximal voluntary contraction (%MVC) of the rectus femoris muscle decreased during walking when wearing the variable stiffness gel spats, which showed that it was possible to use the PVC gel artificial muscles for walking assistance. However, the variable stiffness spats have a small generation force and low robustness to external forces. In this study, we propose a novel approach to make a high-performance light-weight walking assist wear using the PVC gel artificial muscles. We use the contraction-expansion output force for motion assistance to obtain a larger generation force. An expansion-type structure unit is introduced to make the assist wear more robust to external forces. Insole force sensors are used to detect gait changes during walking. The proposed walking assist wear has a simple structure, is light in weight, is easy to put on and take off, and is highly flexible. In this paper, the framework involving the proposed walking assist wear is described in detail.

AAAI Conference 2015 Conference Paper

Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web

  • Yezhou Yang
  • Yi Li
  • Cornelia Fermuller
  • Yiannis Aloimonos

In order to advance action generation and creation in robots beyond simple learned schemas we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by “watching” unconstrained videos with high accuracy.

IROS Conference 2013 Conference Paper

Development of variable stiffness gel spats for walking assistance

  • Yasuhiro Maeda
  • Yi Li
  • Keigo Yasuda
  • Minoru Hashimoto

In a previous study we developed an expansion and contraction actuator using PVC gel. We investigated the characteristics of the PVC gel actuator and found that its stiffness changed noticeably with a variation in the applied DC field. In this study, we designed new variable stiffness spats for walking assistance by incorporating the variable stiffness PVC gel actuator in generally used spats. The stiffness of the spats can be varied with the on and off switching of an electric field. We believe that the spats can assist walking by restraining and releasing body motion with different stiffness while walking, and conducted experiments to evaluate their effectiveness. It was found that the integrated electromyogram (IEMG) and maximal voluntary contraction (%MVC) of the rectus femoris muscle decreased during walking when wearing them, which indicated that the gel spats designed in this study were effective in assisting walking.

ICRA Conference 2013 Conference Paper

Fast grasp planning by using cord geometry to find grasping points

  • Yi Li
  • Jean-Philippe Saut
  • Julien Pettré
  • Anis Sahbani
  • Philippe Bidaud
  • Franck Multon

In this paper, we propose a novel idea to address the problem of fast computation of enveloping grasp configurations for a multi-fingered hand with 3D polygonal models represented as polygon soups. The proposed method performs a low-level shape matching by wrapping multiple cords around an object in order to quickly isolate promising grasping spots. From these spots, hand palm posture can be computed followed by a standard close-until-contact procedure to find the contact points. Along with the contacts information, the finger kinematics is then used to filter the unstable grasps. Through multiple simulated examples with a twelve degrees-of-freedom anthropomorphic hand, we demonstrate that our method can compute good grasps for objects with complex geometries in a short amount of time. Best of all, this is achieved without complex model preprocessing like segmentation by parts and medial axis extraction.

IJCAI Conference 2013 Conference Paper

Learning Visual Symbols for Parsing Human Poses in Images

  • Fang Wang
  • Yi Li

Parsing human poses in images is fundamental in extracting critical visual information for artificial intelligent agents. Our goal is to learn self-contained body part representations from images, which we call visual symbols, and their symbol-wise geometric contexts in this parsing process. Each symbol is individually learned by categorizing visual features leveraged by geometric information. In the categorization, we use Latent Support Vector Machine followed by an efficient cross validation procedure. Then, these symbols naturally define geometric contexts of body parts in a fine granularity. When the structure of the compositional parts is a tree, we derive an efficient approach to estimating human poses in images. Experiments on two large datasets suggest our approach outperforms state-of-the-art methods.

ICRA Conference 2011 Conference Paper

Finding enveloping grasps by matching continuous surfaces

  • Yi Li
  • Jean-Philippe Saut
  • Juan Cortés
  • Thierry Siméon
  • Daniel Sidobre

This paper presents a new method to compute enveloping grasps with a multi-fingered robotic hand. The method is guided by the idea that a good grasp should maximize the contact surface between the held object and the hand's palmar surface. Starting from a given hand pregrasp configuration, the proposed method finds the hand poses that maximize this surface similarity. We use a surface descriptor that is based on a geodesic measure and on a continuous representation of the surfaces, unlike previous shape matching methods that rely on the Euclidean distance and/or a discrete representation (e.g., a random point set). Using geodesic contours to describe local surfaces enables us to detect details such as a handle or a thin part. Once the surface matching returns a set of hand poses, sorted by similarity, a second step is performed to adjust the hand configuration with the purpose of eliminating penetration of the object. Lastly, the grasp stability is tested in order to definitively validate the candidate grasps.

IROS Conference 2009 Conference Paper

Real-time shape retrieval for robotics using skip Tri-Grams

  • Yi Li
  • Konstantinos Bitsakos
  • Cornelia Fermüller
  • Yiannis Aloimonos

The real-time requirement is an additional constraint on many intelligent applications in robotics, such as shape recognition and retrieval using a mobile robot platform. In this paper, we present a scalable approach for efficiently retrieving closed contour shapes. The contour of an object is represented by piecewise linear segments. A skip Tri-Gram is obtained by selecting three segments in the clockwise order while allowing a constant number of segments to be “skipped” in between. The main idea is to use skip Tri-Grams of the segments to implicitly encode the distant dependency of the shape. All skip Tri-Grams are used for efficiently retrieving closed contour shapes without pairwise matching feature points from two shapes. The retrieval is at least an order of magnitude faster than other state-of-the-art algorithms. We score 80% in the Bullseye retrieval test on the whole MPEG-7 shape dataset. We further test the algorithm using a mobile robot platform in an indoor environment. Eight objects are used for testing from different viewing directions, and we achieve 82% accuracy.
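The skip Tri-Gram construction described above can be sketched as an index enumeration. This is one plausible reading of the abstract, assuming that segments are numbered 0..n-1 in clockwise order around the closed contour and that at most `max_skip` segments may be skipped between consecutive picks. The function name `skip_trigrams` and both parameters are hypothetical; the paper's descriptor computed from each triple is not shown.

```python
from itertools import product
from typing import List, Tuple


def skip_trigrams(num_segments: int, max_skip: int) -> List[Tuple[int, int, int]]:
    """Enumerate clockwise segment triples with up to `max_skip` skips per gap."""
    trigrams: List[Tuple[int, int, int]] = []
    for start in range(num_segments):
        for skip1, skip2 in product(range(max_skip + 1), repeat=2):
            second = start + 1 + skip1
            third = second + 1 + skip2
            if third - start >= num_segments:
                continue  # the triple would wrap all the way around the contour
            # The contour is closed, so indices wrap modulo the segment count.
            trigrams.append((start, second % num_segments, third % num_segments))
    return trigrams


# A hexagon-like contour with six segments, allowing one skipped segment per gap.
grams = skip_trigrams(num_segments=6, max_skip=1)
```

With a constant `max_skip`, the number of triples grows linearly in the number of segments, which is consistent with the scalability the abstract emphasizes.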

IROS Conference 2008 Conference Paper

Real-time motion planning of multiple formations in virtual environments: Flexible virtual structures and continuum model

  • Yi Li
  • Kamal Gupta

We present a novel approach for real-time motion planning of multiple formations in virtual environments with dynamic obstacles. Our algorithm is based on the continuum model for crowd simulation and our flexible virtual structure approach for formation control in virtual environments. Simulations created with our algorithm run at interactive rates in quite complex environments. In addition, each formation can be deformed in real-time and the deformation is triggered either automatically (e.g., when the formation’s path is blocked by dynamic obstacles) or manually. Via simulations, we show that we can plan at least four formations, each with tens of agents, in real-time on a PC.

ICRA Conference 2007 Conference Paper

Motion Planning of Multiple Agents in Virtual Environments on Parallel Architectures

  • Yi Li
  • Kamal Gupta

We proposed in a previous paper [1] a hybrid two-layered approach for motion planning of multiple agents in static virtual environments, consisting of open spaces connected by multiple narrow passages. The discrete Generalized Voronoi Diagram (GVD) of the environment is used to identify narrow passages, and plan the global path of each agent independently of other agents' global paths. As each agent moves along its global path, the agent's path is locally modified using the hybrid technique of combining steering behaviors with Coordination Graphs (CG), where coordination graphs are used for deadlock avoidance in the narrow passages. The planner in the previous paper [1] was single threaded, and it was able to plan the motions of 30 agents moving around in a simple virtual environment with 3 narrow passages. If more agents are moving in a more complex virtual environment (i.e., with more narrow passages), we may not be able to construct and process all the coordination graphs in real-time. In this paper, we parallelize the single threaded planner in a supervisor-worker paradigm with Unix processes that communicate with each other using the System V Interprocess Communication (IPC) mechanism. We show that significant, scalable speedups are obtained by constructing and processing coordination graphs in parallel on a Symmetric Multiprocessing (SMP) system.

IROS Conference 2006 Conference Paper

A Hybrid Two-layered Approach to Real-Time Motion Planning of Multiple Agents in Virtual Environments

  • Yi Li
  • Kamal Gupta

We proposed in a previous paper a hybrid technique, combining local steering behaviors and coordination graphs (CG), that allows real-time motion planning of multiple agents in a narrow passage. This hybrid technique not only avoids deadlocks, but also exhibits other interesting behaviors such as leader following, even though they are not explicitly coded in the algorithm. In this paper, we build upon the earlier result, and propose a two-layered approach to motion planning of multiple agents in virtual environments, consisting of open spaces connected by multiple narrow passages. The discrete generalized Voronoi diagram (GVD) of the static environment is used to identify all narrow passages automatically. The global path of each agent is also planned using the GVD. As each agent moves along its global path, the path is locally modified using the hybrid technique combining steering behaviors with coordination graphs. Experimental results show that the resulting planner is able to plan motions of 30 agents in a virtual environment with three narrow passages in real-time, and the pre-processing phase of our approach is extremely fast. Since all planning is done in real-time, the approach allows an agent to change its final destination at runtime.

ICRA Conference 2005 Conference Paper

Motion Planning of Multiple Agents in Virtual Environments using Coordination Graphs

  • Yi Li
  • Kamal Gupta
  • Shahram Payandeh

Motion planning of multiple mobile agents in virtual environments is a very challenging problem, especially if one wants to plan the motions of these agents in real-time. We propose a two layered approach to plan motions of multiple mobile agents in real-time. The mobile agents are moving in a 2-dimensional static environment with open spaces connected to each other by narrow corridors. The global path of each agent is computed by a decoupled planner during the preprocessing process with minimum delay. Each agent’s local path is generated in real-time by combining steering behaviors and a new, principled and efficient AI technique for decision making and planning cooperative multi-agent dynamic systems, Coordination Graph (CG). With CG, we can not only avoid deadlocks in narrow corridors, but also achieve more complicated behavior such as leader-and-followers behavior. We show, via some preliminary examples, real-time performance of our approach, for instance, several robots avoiding deadlocks and successfully navigating a corridor.

NeurIPS Conference 1999 Conference Paper

The Relaxed Online Maximum Margin Algorithm

  • Yi Li
  • Philip Long

We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the more computationally intensive maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of the perceptron algorithm, but their generalization is much better. We describe a sense in which the performance of ROMMA converges to that of SVM in the limit if bias isn't considered.
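The abstract does not spell out the relaxation, so the following is only a sketch under a stated assumption: all past constraints are summarized by the single halfspace w_old · w ≥ ‖w_old‖², and on a mistake the new weight vector is the shortest one that also satisfies y (w · x) ≥ 1. Solving those two equalities for w = c·w_old + d·x gives the coefficients used below. The helper names `dot` and `romma_update` are hypothetical.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def romma_update(w: List[float], x: List[float], y: int) -> List[float]:
    """One mistake-driven update for a label y in {-1, +1}.

    Assumes x is non-zero and, on a mistake, that w and x are not collinear
    (otherwise the denominator below is zero).
    """
    margin = dot(w, x)
    if y * margin > 0:
        return list(w)  # correct prediction: the hypothesis is left unchanged
    ww, xx = dot(w, w), dot(x, x)
    if ww == 0.0:
        # First update from the zero vector: shortest w with y * (w . x) = 1.
        return [y * xi / xx for xi in x]
    denom = xx * ww - margin ** 2
    c = (xx * ww - y * margin) / denom
    d = ww * (y - margin) / denom
    return [c * wi + d * xi for wi, xi in zip(w, x)]


w_old = [1.0, 0.0]
x_new, y_new = [-1.0, 1.0], 1
w_new = romma_update(w_old, x_new, y_new)
```

After the update, both assumed constraints hold with equality: the new example has margin exactly 1, and the inner product of the new and old weight vectors equals the squared length of the old one.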