Arrow Research search

Author name cluster

Ying Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers

48

AAAI Conference 2026 Conference Paper

Learning Whom to Align With: Progressive Anomaly Combination Detection for Partially View-Aligned Clustering

  • Hang Gao
  • Zuosong Cai
  • Yuze Li
  • Cheng Liu
  • Gaoyang Li
  • Ying Li
  • Wei Du
  • You Zhou

Partially View-aligned Clustering (PVC) addresses the challenge of partial view alignment in multi-view learning by leveraging complementary and consistent information. While existing PVC methods show promise, most rely on distance-based strategies that are sensitive to view-specific details and noise, limiting their robustness. In this work, we propose a novel view alignment strategy that reformulates the alignment task as an anomaly detection problem. Rather than learning a view-alignment matrix that enforces strict one-to-one correspondences across views, we adopt a progressive approach to identify well-aligned samples. Specifically, we sample subsets of data by generating random view combinations from unaligned samples and propose an anomaly combination detection module to evaluate the alignment consistency of these combinations. In addition, our progressive training framework alternates between updating model parameters and selecting high-confidence view combinations for subsequent optimization. By reformulating view alignment as an anomaly detection task, our approach provides a more robust and effective solution to partial view alignment. Experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art approaches in the PVC problem.

AAAI Conference 2026 Conference Paper

ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory

  • Ying Li
  • Xiaobao Wei
  • Xiaowei Chi
  • Yuming Li
  • Zhongyu Zhao
  • Hao Wang
  • Ningning Ma
  • Ming Lu

Data scarcity continues to be a critical bottleneck in the field of robotic manipulation, limiting the ability to train robust and generalizable models. While diffusion models provide a promising approach to synthesizing realistic robotic manipulation videos, their effectiveness hinges on the availability of precise and reasonable control instructions. Current methods primarily rely on 2D trajectories as instruction prompts, which inherently face issues with 3D spatial ambiguity. In this work, we present a novel framework named ManipDreamer3D for generating plausible 3D-aware robotic manipulation videos from an input image and a text instruction. Our method combines 3D trajectory planning with a reconstructed 3D occupancy map created from a third-person perspective, along with a novel trajectory-to-video diffusion model. Specifically, ManipDreamer3D first reconstructs the 3D occupancy representation from the input image and then computes an optimized 3D end-effector trajectory that minimizes path length, avoids collisions, and retimes the motion. Next, we employ a latent editing technique to create video sequences from the initial image latent, the text instruction, and the optimized 3D trajectory. This process conditions our specially trained trajectory-to-video diffusion model to produce robotic pick-and-place videos. Our method significantly reduces human intervention requirements by autonomously planning plausible 3D trajectories. Experimental results demonstrate its superior visual quality and precision.

AAAI Conference 2026 Conference Paper

Pansharpening for Thin-Cloud Contaminated Remote Sensing Images: A Unified Framework and Benchmark Dataset

  • Songcheng Du
  • Yang Zou
  • Jiaxin Li
  • Mingxuan Liu
  • Ying Li
  • Changjing Shang
  • Qiang Shen

Pansharpening under thin cloudy conditions is a practically significant yet rarely addressed task, challenged by simultaneous spatial resolution degradation and cloud-induced spectral distortions. Existing methods often address cloud removal and pansharpening sequentially, leading to cumulative errors and suboptimal performance due to the lack of joint degradation modeling. To address these challenges, we propose a Unified Pansharpening Model with Thin Cloud Removal (Pan-TCR), an end-to-end framework that integrates physical priors. Motivated by theoretical analysis in the frequency domain, we design a frequency-decoupled restoration (FDR) block that disentangles the restoration of multispectral image (MSI) features into amplitude and phase components, each guided by complementary degradation-robust prompts: the near-infrared (NIR) band amplitude for cloud-resilient restoration, and the panchromatic (PAN) phase for high-resolution structural enhancement. To ensure coherence between the two components, we further introduce an interactive inter-frequency consistency (IFC) module, enabling cross-modal refinement that enforces consistency and robustness across frequency cues. Furthermore, we introduce the first real-world thin-cloud contaminated pansharpening dataset (PanTCR-GF2), comprising paired clean and cloudy PAN-MSI images, to enable robust benchmarking under realistic conditions. Extensive experiments on real-world and synthetic datasets demonstrate the superiority and robustness of Pan-TCR, establishing a new benchmark for pansharpening under realistic atmospheric degradations.

AAAI Conference 2026 Conference Paper

UVLM: Benchmarking Video Language Model for Underwater World Understanding

  • Xizhe Xue
  • Yang Zhou
  • Dawei Yan
  • Lijie Tao
  • Junjie Li
  • Ying Li
  • Haokui Zhang
  • Rong Xiao

Recently, video-language models (VidLMs) have gained widespread attention and adoption. However, existing works primarily focus on terrestrial scenarios, overlooking the highly demanding application needs of underwater observation. To bridge this gap, we introduce UVLM, an underwater observation benchmark built through a collaborative approach combining human expertise and AI models. To ensure data quality, we have conducted in-depth considerations from multiple perspectives. First, to address the unique challenges of underwater environments, we selected videos that represent typical underwater challenges, including light variations, water turbidity, and diverse viewing angles, to construct the dataset. Second, to ensure data diversity, the dataset covers a wide range of frame rates, resolutions, 419 classes of marine animals, and various static plants and terrains. Next, for task diversity, we adopted a structured design where observation targets are categorized into two major classes: biological and environmental. Each category includes content observation and change/action observation, totaling 20 subtask types. Finally, we designed several challenging evaluation metrics to enable quantitative comparison and analysis of different methods. Experiments on two representative VidLMs demonstrate that fine-tuning VidLMs on UVLM significantly improves underwater world understanding while also showing potential for slight improvements on existing in-air VidLM benchmarks.

AAAI Conference 2025 Conference Paper

Contrastive Auxiliary Learning with Structure Transformation for Heterogeneous Graphs

  • Wei Du
  • Hongmin Sun
  • Hang Gao
  • Gaoyang Li
  • Ying Li

In recent years, methods based on heterogeneous graph neural networks (HGNNs) have been widely used for embedding heterogeneous graphs (HGs) due to their ability to effectively encode the rich information from HGs into low-dimensional node embeddings. Existing HGNNs focus on neighbor aggregation and semantic fusion while neglecting the HG structure and learning paradigms. However, the original HG data might lack node features, which existing models may not effectively account for. Additionally, exclusively relying on a single supervised learning approach may only partially leverage the invariant information in graph data. To address these challenges, we introduce the Contrastive Auxiliary Learning Model for Heterogeneous Graphs (CALHG). This model combines edge perturbation and graph diffusion to enhance graph data, allowing it to capture the inherent structural information within heterogeneous graphs fully. Additionally, we employ a category-guided multi-view contrastive learning approach, which does not rely on positive and negative samples for model training, enabling us to capture the intrinsic invariances in heterogeneous graph data. Extensive experiments and analyses on five benchmark datasets without node features and three benchmark datasets with node features demonstrate the effectiveness and efficiency of our novel method compared with several state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Dynamic Syntactic Feature Filtering and Injecting Networks for Cross-lingual Dependency Parsing

  • Jianjian Liu
  • Zhengtao Yu
  • Ying Li
  • Yuxin Huang
  • Shengxiang Gao

Parsers enhanced with pre-trained language models have achieved outstanding performance on rich-resource languages. Cross-lingual dependency parsing aims to learn useful knowledge from high-resource languages to alleviate data scarcity in low-resource languages. However, effectively reducing the syntactic structure distributional bias and uncovering the commonalities among languages is the key challenge for cross-lingual dependency parsing. To address this issue, we propose novel dynamic syntactic feature filtering and injecting networks based on the typical shared-private model that employs one shared and two private encoders to separate source and target language features. Concretely, a Language-Specific Filtering Network (LSFN) on the private encoders emphasizes helpful information from the source language and filters out its irrelevant or harmful parts. Meanwhile, a Language-Invariant Injecting Network (LIIN) on the shared encoder integrates the advantages of BiLSTM and improved Transformer encoders to transcend language boundaries, thus amplifying syntactic commonalities across languages. Experiments on seven benchmark datasets show that our model achieves an average absolute gain of 1.84 UAS and 3.43 LAS compared with the shared-private model. Comparative experiments validate that the LSFN and LIIN components are complementary in transferring beneficial knowledge from source to target languages. Detailed analyses highlight that our model can effectively capture linguistic commonalities and mitigate the effect of distributional bias, showcasing its robustness and efficacy.

NeurIPS Conference 2025 Conference Paper

FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance

  • Ying Li
  • Chengfei Lyu
  • Huan Wang

Visual AutoRegressive (VAR) modeling employs a next-scale decoding paradigm that progresses from coarse structures to fine details. While enhancing fidelity and scalability, this approach challenges two fundamental assumptions of conventional dynamic inference: semantic stability (intermediate outputs approximating final results) and monotonic locality (smooth representation evolution across layers), which renders existing dynamic inference methods ineffective for VAR models. To address this challenge, we propose FreqExit, a unified training framework that enables dynamic inference in VAR without altering its architecture or compromising output quality. FreqExit is based on a key insight: high-frequency details are crucial for perceptual quality and tend to emerge only in later decoding stages. Leveraging this insight, we design targeted mechanisms that guide the model to learn more effectively through frequency-aware supervision. The proposed framework consists of three components: (1) a curriculum-based supervision strategy with progressive layer dropout and early exit loss; (2) a wavelet-domain high-frequency consistency loss that aligns spectral content across different generation steps; and (3) a lightweight self-supervised frequency-gated module that guides adaptive learning of both structural and detailed spectral components. On ImageNet 256×256, FreqExit achieves up to 2× speedup with only minor degradation, and delivers 1.3× acceleration without perceptible quality loss. This enables runtime-adaptive acceleration within a unified model, offering a favorable trade-off between efficiency and fidelity for practical and flexible deployment.

IJCAI Conference 2025 Conference Paper

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers

  • Tianyu Chen
  • Haoyi Zhou
  • Ying Li
  • Hao Wang
  • Zhenzhe Zhang
  • Tianchen Zhu
  • Shanghang Zhang
  • Jianxin Li

Fourier Neural Operators (FNO) have emerged as promising solutions for efficiently solving partial differential equations (PDEs) by learning infinite-dimensional function mappings through frequency domain transformations. However, the sparsity of high-frequency signals limits computational efficiency for high-dimensional inputs, and fixed-pattern truncation often causes high-frequency signal loss, reducing performance in scenarios such as high-resolution inputs or long-term predictions. To address these challenges, we propose FreqMoE, an efficient and progressive training framework that exploits the dependency of high-frequency signals on low-frequency components. The model first learns low-frequency weights and then applies a sparse upward-cycling strategy to construct a mixture of experts (MoE) in the frequency domain, effectively extending the learned weights to high-frequency regions. Experiments on both regular and irregular grid PDEs demonstrate that FreqMoE achieves up to 16.6% accuracy improvement while using merely 2.1% of the parameters (a 47.32× reduction) compared to dense FNO. Furthermore, the approach demonstrates remarkable stability in long-term predictions and generalizes seamlessly to various FNO variants and grid structures, establishing a new "Low-frequency Pretraining, High-frequency Fine-tuning" paradigm for solving PDEs.
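The fixed-pattern truncation the abstract identifies as FNO's weakness can be illustrated with a toy 1D example: keeping only the lowest few Fourier modes reconstructs low-frequency content exactly while discarding high-frequency detail. This sketch is illustrative only and is not the FreqMoE implementation; all names and parameters are assumptions.

```python
import numpy as np

def truncate_low_freq(signal, modes):
    """Keep only the lowest `modes` Fourier coefficients (fixed pattern)."""
    coeffs = np.fft.rfft(signal)
    kept = np.zeros_like(coeffs)
    kept[:modes] = coeffs[:modes]          # fixed low-frequency mask
    return np.fft.irfft(kept, n=len(signal))

t = np.linspace(0.0, 1.0, 256, endpoint=False)
low = np.sin(2 * np.pi * 3 * t)            # low-frequency component (mode 3)
high = 0.5 * np.sin(2 * np.pi * 40 * t)    # high-frequency detail (mode 40)
sig = low + high

recon = truncate_low_freq(sig, modes=8)    # keeps modes 0..7 only
err_low = np.max(np.abs(recon - low))      # low band survives intact
err_sig = np.max(np.abs(recon - sig))      # high band is lost entirely
```

The reconstruction matches the low-frequency component to machine precision while the mode-40 detail vanishes, which is exactly the loss that motivates extending learned weights into high-frequency regions.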

JBHI Journal 2025 Journal Article

Localized Intra- and Inter-Tumoral Heterogeneity for Predicting Treatment Response to Neoadjuvant Chemotherapy in Breast Cancer

  • Yinhao Liang
  • Wenjie Tang
  • Qingcong Kong
  • Ting Wang
  • Jianjun Zhang
  • Wing W. Y. Ng
  • Siyi Chen
  • Ying Li

This study proposes a novel method for extracting breast cancer tumor heterogeneity descriptors to non-invasively predict whether pathological complete response (pCR) can be achieved after neoadjuvant chemotherapy (NAC). These localized descriptors extract corresponding heterogeneity features for different radiomic features and are able to capture tumor characteristics at various localization levels. These descriptors also capture tumor heterogeneity both at the individual tumor level and across the whole dataset, providing decision-making models with features that are both more effective and interpretable. We validated the effectiveness of the proposed features with the Kolmogorov-Arnold network (KAN) across multiple centers, yielding an AUC of 0.92 when combined with pathological features and demonstrating good performance in external datasets (AUCs of 0.84 and 0.81). Additionally, we transform the best model into a symbolic formula to intuitively explain the machine learning model's prediction process, showing how factors such as age, HER2, Ki-67 and heterogeneity influence the prediction. The symbolized model is consistent with the experience of clinical experts, which enhances users' confidence in deep models. The experimental results show that our proposed features and method outperform classical heterogeneity features and end-to-end neural networks with a small additional computational cost.

ICLR Conference 2025 Conference Paper

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

  • Can Jin
  • Ying Li
  • Mingyu Zhao
  • Shiyu Zhao 0001
  • Zhenting Wang
  • Xiaoxiao He
  • Ligong Han
  • Tong Che

Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, limiting the interaction between the visual prompts and the original image to a small set of patches while neglecting the inductive bias present in shared information across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which enables shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods, achieving up to 6× faster training times, utilizing 18× fewer visual prompt parameters, and delivering a 3.1% improvement in performance.
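The low-rank construction described above can be sketched in a few lines: the prompt is the product of two thin factors, so every pixel position shares parameters along its row and column instead of learning independent padded pixels. The shapes, rank, and variable names below are illustrative assumptions, not the LoR-VP release.

```python
import numpy as np

H, W, C, r = 224, 224, 3, 4                # image size, channels, assumed rank
rng = np.random.default_rng(0)

# Two learnable low-rank factors stand in for a padded pixel prompt.
B = rng.normal(scale=0.01, size=(H, r))    # column factor (per-row sharing)
A = rng.normal(scale=0.01, size=(r, W))    # row factor (per-column sharing)

prompt = B @ A                             # (H, W) rank-r prompt map
image = rng.random((H, W, C))              # dummy input image

prompted = image + prompt[..., None]       # broadcast the prompt over channels

# Parameter comparison: H*r + r*W factors vs. a dense H*W pixel prompt.
n_lowrank, n_dense = H * r + r * W, H * W
```

With rank 4 this uses 1,792 prompt parameters instead of 50,176 for a dense per-pixel map, which is the kind of reduction the abstract's parameter-efficiency claim points at.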

NeurIPS Conference 2025 Conference Paper

Multi-View Oriented GPLVM: Expressiveness and Efficiency

  • Zi Yang
  • Ying Li
  • Zhidi Lin
  • Michael Minyi Zhang
  • Pablo Olmos

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we propose a random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.
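The random Fourier feature (RFF) approximation the abstract relies on can be illustrated for the standard RBF kernel, where Bochner's theorem lets sampled spectral frequencies approximate the kernel as an inner product of explicit features. This is a generic sketch under that standard setup; the NG-SM kernel itself and all names here are not reproduced from the paper.

```python
import numpy as np

def rff_features(X, D=2000, lengthscale=1.0, seed=0):
    """Map X (n, d) to D random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)             # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))

Z = rff_features(X)
K_approx = Z @ Z.T                                        # O(n D) features

# Exact RBF kernel for comparison (lengthscale 1).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq)
err = np.max(np.abs(K_approx - K_exact))
```

Because the kernel becomes an inner product of finite features, downstream inference scales linearly in the number of points rather than cubically, which is the efficiency gain the abstract's variational scheme builds on.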

AAAI Conference 2025 Conference Paper

Safe Online Convex Optimization with Heavy-Tailed Observation Noises

  • Yunhao Yang
  • Bo Xue
  • Yunzhi Hao
  • Ying Li
  • Yuanyu Wan

We investigate safe online convex optimization (SOCO), where each decision must satisfy a set of unknown linear constraints. Assuming that the unknown constraints can be observed with a sub-Gaussian noise for each chosen decision, previous studies have established a high-probability regret bound of O(T^{2/3}). However, this assumption may not hold in many practical scenarios. To address this limitation, in this paper, we relax the assumption to allow any noise that admits finite (1+ε)-th moments for some ε∈(0,1], and propose two algorithms that enjoy an O(T^{c_ε}) regret bound with high probability, where T is the time horizon and c_ε=(1+ε)/(1+2ε). The key idea of our two algorithms is to respectively utilize the median-of-means and truncation techniques to achieve accurate estimation under heavy-tailed noises. To the best of our knowledge, these are the first algorithms designed to handle SOCO with heavy-tailed observation noises.
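The median-of-means technique named above is a standard robust-estimation primitive: split the samples into blocks, average each block, and take the median of the block means, which stays near the true mean even when a plain average is dragged away by heavy-tailed observations. The block count and toy contaminated data below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def median_of_means(samples, k=8):
    """Median of k per-block means: robust to heavy-tailed observation noise."""
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, k)          # k (nearly) equal blocks
    block_means = [b.mean() for b in blocks]
    return float(np.median(block_means))         # robust aggregate

rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, scale=1.0, size=2000)  # well-behaved observations
outliers = np.full(5, 1e6)                         # heavy-tail contamination
x = np.concatenate([clean, outliers])

est = median_of_means(x, k=20)                     # stays near the true mean 5
naive = x.mean()                                   # dragged far off by outliers
```

Only the few blocks containing outliers are corrupted, so the median over blocks ignores them; the plain mean, by contrast, is pulled thousands of units away.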

IS Journal 2025 Journal Article

SNNL: A Programming Language for SNN Development

  • Qinghui Xing
  • Zirun Li
  • Ying Li
  • Schahram Dustdar
  • Xin Du
  • Gang Pan
  • Shuiguang Deng

Spiking Neural Networks (SNNs) are gaining attention for their biological plausibility and energy efficiency. Advances in neuromorphic systems, which integrate hardware and software tools, are accelerating SNN implementation. Yet deploying SNNs on such platforms remains challenging due to model complexity and system heterogeneity, requiring flexible frameworks. Existing tools (e.g., PyNN, Brian2) show limited expressiveness for neuromorphic applications or poor cross-platform support. This paper proposes SNNL, a flexible domain-specific language for SNN development and deployment on neuromorphic hardware. SNNL decouples neuronal dynamics modeling from network topology specification: equation-based representations handle diverse neuron/synapse models, while hierarchical constructs define complex connectivity patterns. We present a Darwin3-targeted compiler with efficient code generation. Evaluations confirm SNNL achieves precise neuronal dynamic descriptions and flexible network configurations. This work bridges algorithm-hardware gaps in neuromorphic computing by enhancing programmability. Experimental results have demonstrated the feasibility of SNNL for developing SNNs on neuromorphic systems.

NeurIPS Conference 2025 Conference Paper

URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model

  • Zhe Li
  • Xiang Bai
  • Jieyu Zhang
  • Zhuangzhe Wu
  • Che Xu
  • Ying Li
  • Chengkai Hou
  • Shanghang Zhang

Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we propose URDF-Anything, an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction. It implements a specialized [SEG] token mechanism that interacts directly with point cloud features, enabling fine-grained part-level segmentation while maintaining consistency with the kinematic parameter predictions. Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches regarding geometric segmentation (mIoU 17% improvement), kinematic parameter prediction (average error reduction of 29%), and physical executability (surpassing baselines by 50%). Notably, our method exhibits excellent generalization ability, performing well even on objects outside the training set. This work provides an efficient solution for constructing digital twins for robotic simulation, significantly enhancing the sim-to-real transfer capability.

AAAI Conference 2024 Conference Paper

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

  • Qingping Zheng
  • Yuanfan Guo
  • Jiankang Deng
  • Jianhua Han
  • Ying Li
  • Songcen Xu
  • Hang Xu

Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks demonstrate that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2X compared to the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.

IJCAI Conference 2024 Conference Paper

FBLG: A Local Graph Based Approach for Handling Dual Skewed Non-IID Data in Federated Learning

  • Yi Xu
  • Ying Li
  • Haoyu Luo
  • Xiaoliang Fan
  • Xiao Liu

In real-world situations, federated learning often needs to process non-IID (non-independent and identically distributed) data with multiple skews, causing inadequate model performance. Existing federated learning methods mainly focus on addressing a single skew of non-IID data, and hence the performance of global models can degrade when faced with dual skewed non-IID data caused by heterogeneous label distributions and sample sizes among clients. To address the problem of dual skewed non-IID data, in this paper we propose a federated learning algorithm based on local graphs, named FBLG. Specifically, to address the label distribution skew, we first construct a local graph based on clients' local losses and Jensen-Shannon (JS) divergence, so that similar clients can be selected for aggregation to ensure a highly consistent global model. Afterwards, to address the sample size skew, we design the objective function to favor clients with more samples, as models trained with more samples tend to carry more useful information. Experiments on four datasets with dual skewed non-IID data demonstrate that FBLG outperforms nine baseline methods and achieves up to 9% improvement in accuracy. Moreover, both theoretical analysis and experiments show that FBLG converges quickly.
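The Jensen-Shannon divergence used above as a client-similarity signal is a symmetric, bounded measure between two label distributions. A minimal sketch, with made-up client label histograms (the paper's graph construction also involves local losses, which are omitted here):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)                              # mixture distribution
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)         # symmetric, bounded by ln 2

# Hypothetical per-client label histograms over 3 classes.
client_a = [90, 5, 5]      # skewed toward class 0
client_b = [10, 45, 45]    # skewed away from class 0
client_c = [85, 10, 5]     # similar to client_a

d_ab = js_divergence(client_a, client_b)           # large: dissimilar clients
d_ac = js_divergence(client_a, client_c)           # small: similar clients
```

Clients with small pairwise JS divergence get linked in the local graph, so aggregation draws on clients whose label distributions actually resemble each other.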

AAAI Conference 2023 Conference Paper

Semi-attention Partition for Occluded Person Re-identification

  • Mengxi Jia
  • Yifan Sun
  • Yunpeng Zhai
  • Xinhua Cheng
  • Yi Yang
  • Ying Li

This paper proposes a Semi-Attention Partition (SAP) method to learn well-aligned part features for occluded person re-identification (re-ID). Currently, the mainstream methods employ either external semantic partition or attention-based partition, and the latter manner is usually better than the former one. Under this background, this paper explores a potential that the weak semantic partition can be a good teacher for the strong attention-based partition. In other words, the attention-based student can substantially surpass its noisy semantic-based teacher, contradicting the common sense that the student usually achieves inferior (or comparable) accuracy. A key to this effect is: the proposed SAP encourages the attention-based partition of the (transformer) student to be partially consistent with the semantic-based teacher partition through knowledge distillation, yielding the so-called semi-attention. Such partial consistency allows the student to have both consistency and reasonable conflict with the noisy teacher. More specifically, on the one hand, the attention is guided by the semantic partition from the teacher. On the other hand, the attention mechanism itself still has some degree of freedom to comply with the inherent similarity between different patches, thus gaining resistance against noisy supervision. Moreover, we integrate a battery of well-engineered designs into SAP to reinforce their cooperation (e.g., multiple forms of teacher-student consistency), as well as to promote reasonable conflict (e.g., mutual absorbing partition refinement and a supervision signal dropout strategy). Experimental results confirm that the transformer student achieves substantial improvement after this semi-attention learning scheme, and produces new state-of-the-art accuracy on several standard re-ID benchmarks.

IS Journal 2022 Journal Article

Xsickness in Intelligent Mobile Spaces and Metaverses

  • Ruichen Tan
  • Ruiyang Gao
  • Wenbo Li
  • Kai Cao
  • Ying Li
  • Chen Lv
  • Fei-Yue Wang
  • Dongpu Cao

Motion sickness is known to be a common problem that influences the comfort and work efficiency of human beings during their daily lives. With the proliferation of increasingly intelligent systems, the detection and mitigation of motion sickness will face more opportunities along with bigger challenges. On the one hand, integrated sensor technology in intelligent systems will provide more accurate and efficient methods for motion sickness detection. On the other hand, as cyber-physical systems have attracted increasing attention over the past two decades, cyber-physical-social systems have introduced and amplified the social characteristics of such systems. The interactions between physical space and cyber space increase the chance of sensory conflicts when people use intelligent systems, such as traveling in intelligent cockpits or using metaverse-related virtual reality devices. Multimodal interaction methods and larger screens will cause more sensory conflicts, and the symptoms will be more severe than those of traditional motion sickness. In this article, classifications are first introduced based on the causes of motion sickness. A new type of multifactorial motion sickness (Xsickness) is discussed, which can be expected to become common as intelligent systems develop. Then, the current state-of-the-art detection methods for motion sickness and cybersickness are summarized, and theoretical methods for Xsickness detection are discussed. Finally, mitigation methods based on motion reduction and four means of human perception are discussed, and innovative mitigation methods based on intelligent systems are also introduced.

JBHI Journal 2021 Journal Article

GCSBA-Net: Gabor-Based and Cascade Squeeze Bi-Attention Network for Gland Segmentation

  • Zhijie Wen
  • Ru Feng
  • Jingxin Liu
  • Ying Li
  • Shihui Ying

Colorectal cancer is the second and the third most common cancer in women and men, respectively. Pathological diagnosis is the “gold standard” for tumor diagnosis. Accurate segmentation of glands from tissue images is a crucial step in assisting pathologists in their diagnosis. Typical methods for gland segmentation form a dense image representation, ignoring its texture and multi-scale attention information. Therefore, we utilize a Gabor-based module to extract texture information at different scales and directions in histopathology images. This paper also designs a Cascade Squeeze Bi-Attention (CSBA) module. Specifically, we add an Atrous Cascade Spatial Pyramid (ACSP), a Squeeze Position Attention (SPA) module, and a Squeeze Channel Attention (SCA) module to model semantic correlation and maintain multi-level aggregation on the spatial pyramid with different dilations. Besides, to address the imbalance of data distribution and boundary blur, we propose a hybrid loss function that better delineates object boundaries. The experimental results show that the proposed method achieves state-of-the-art performance on the GlaS challenge dataset and the CRAG colorectal adenocarcinoma dataset, respectively.

IJCAI Conference 2021 Conference Paper

Knowledge-Aware Dialogue Generation via Hierarchical Infobox Accessing and Infobox-Dialogue Interaction Graph Network

  • Sixing Wu
  • Minghui Wang
  • Dawei Zhang
  • Yang Zhou
  • Ying Li
  • Zhonghai Wu

Due to limited knowledge carried by queries, traditional dialogue systems often face the dilemma of generating boring responses, leading to poor user experience. To alleviate this issue, this paper proposes a novel infobox knowledge-aware dialogue generation approach, HITA-Graph, with three unique features. First, open-domain infobox tables that describe entities with relevant attributes are adopted as the knowledge source. An order-irrelevance Hierarchical Infobox Table Encoder is proposed to represent an infobox table at three levels of granularity. In addition, an Infobox-Dialogue Interaction Graph Network is built to effectively integrate the infobox context and the dialogue context into a unified infobox representation. Second, a Hierarchical Infobox Attribute Attention mechanism is developed to access the encoded infobox knowledge at different levels of granularity. Last but not least, a Dynamic Mode Fusion strategy is designed to allow the Decoder to select a vocabulary word or copy a word from the given infobox/query. We extract infobox tables from Chinese Wikipedia and construct an infobox knowledge base. Extensive evaluation on an open-released Chinese corpus demonstrates the superior performance of our approach against several representative methods.

IJCAI Conference 2020 Conference Paper

TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact

  • Sixing Wu
  • Ying Li
  • Dawei Zhang
  • Yang Zhou
  • Zhonghai Wu

Insufficient semantic understanding of dialogue often leads to generic responses in generative dialogue systems. Recently, high-quality knowledge bases have been introduced to enhance dialogue understanding, as well as to reduce the prevalence of boring responses. Although such knowledge-aware approaches have shown tremendous potential, they typically utilize the knowledge in a black-box fashion. As a result, the generation process is neither controllable nor interpretable. In this paper, we introduce a topic fact-based commonsense knowledge-aware approach, TopicKA. Different from previous works, TopicKA generates responses conditioned not only on the query message but also on a topic fact with an explicit semantic meaning, which also controls the direction of generation. Topic facts are recommended by a recommendation network trained under the Teacher-Student framework. To integrate the recommendation network and the generation network, this paper designs four schemes: two non-sampling schemes and two sampling schemes. We collected and constructed a large-scale Chinese commonsense knowledge graph. Experimental results on an open Chinese benchmark dataset indicate that our model outperforms baselines in terms of both objective and subjective metrics.
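The split between non-sampling and sampling schemes for picking a topic fact can be illustrated with a toy recommender: score candidates against the query, then either take the argmax (non-sampling) or sample from the softmax (sampling). Dot-product scoring here is an illustrative stand-in for the paper's trained recommendation network.

```python
import numpy as np

def recommend_topic_fact(query_vec, fact_vecs, sample=False, rng=None):
    """Score candidate topic facts against the query; pick the argmax
    (a non-sampling scheme) or draw from the softmax (a sampling scheme)."""
    scores = fact_vecs @ query_vec
    probs = np.exp(scores - scores.max())   # stable softmax
    probs /= probs.sum()
    if sample:
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(probs), p=probs)), probs
    return int(np.argmax(probs)), probs

# Toy example: 3 candidate facts, the second most similar to the query.
query = np.array([1.0, 0.0])
facts = np.array([[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]])
idx, probs = recommend_topic_fact(query, facts)
```

The selected fact's embedding would then be fed to the generator alongside the query, making the generation direction explicit rather than black-box.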

IJCAI Conference 2019 Conference Paper

Self-attentive Biaffine Dependency Parsing

  • Ying Li
  • Zhenghua Li
  • Min Zhang
  • Rui Wang
  • Sheng Li
  • Luo Si

The current state-of-the-art dependency parsing approaches employ BiLSTMs to encode input sentences. Motivated by the success of Transformer-based machine translation, this work applies, for the first time, the self-attention mechanism to dependency parsing as a replacement for BiLSTM-based encoders, leading to competitive performance on both English and Chinese benchmark data. Based on a detailed error analysis, we then combine the power of both BiLSTM and self-attention via model ensembles, demonstrating their complementary capability of capturing contextual information. Finally, we explore the recently proposed contextualized word representations as extra input features, and further improve the parsing performance.
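The biaffine arc scorer that sits on top of either encoder (BiLSTM or self-attention) can be sketched in a few lines. Dimensions and the random features standing in for encoder output are placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # sentence length, encoder hidden size

H = rng.normal(size=(n, d))      # stand-in for encoder output (BiLSTM or
                                 # self-attention); random for illustration

# Biaffine arc scoring: score[i, j] = h_i^T U h_j + b^T h_j,
# i.e. how well token j serves as the head of dependent token i.
U = rng.normal(size=(d, d))
b = rng.normal(size=(d,))
scores = H @ U @ H.T + H @ b     # (n, n); the head bias broadcasts over rows
heads = scores.argmax(axis=1)    # greedy head choice for each token
```

In a real parser the greedy argmax is typically replaced by a maximum-spanning-tree decoder so the predicted heads form a valid tree.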

AAAI Conference 2018 Conference Paper

Early Prediction of Diabetes Complications from Electronic Health Records: A Multi-Task Survival Analysis Approach

  • Bin Liu
  • Ying Li
  • Zhaonan Sun
  • Soumya Ghosh
  • Kenney Ng

Type 2 diabetes mellitus (T2DM) is a chronic disease that usually results in multiple complications. Early identification of individuals at risk for complications after being diagnosed with T2DM is of significant clinical value. In this paper, we present a new data-driven predictive approach to predict when a patient will develop complications after the initial T2DM diagnosis. We propose a novel survival analysis method to model the time-to-event of T2DM complications designed to simultaneously achieve two important metrics: 1) accurate prediction of event times, and 2) good ranking of the relative risks of two patients. Moreover, to better capture the correlations of time-to-events of the multiple complications, we further develop a multi-task version of the survival model. To assess the performance of these approaches, we perform extensive experiments on patient-level data extracted from a large electronic health record claims database. The results show that our proposed survival analysis approach consistently outperforms traditional survival models and demonstrate the effectiveness of the multi-task framework over modeling each complication independently.
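The two objectives named above, accurate event times and correct relative risk ranking, can be combined into one loss. The toy function below uses a squared-error term plus a pairwise hinge ranking term; it is a sketch of the general idea, not the paper's exact objective, and it ignores censoring subtleties beyond restricting comparisons to observed events.

```python
import numpy as np

def hybrid_survival_loss(pred_time, time, event, alpha=0.5):
    """Toy loss combining (a) accurate event-time prediction and
    (b) correct relative ranking of patients (hinge surrogate).
    Illustrative sketch only, not the paper's objective."""
    pred_time = np.asarray(pred_time, float)
    time = np.asarray(time, float)
    event = np.asarray(event, bool)

    # (a) squared error on uncensored (observed-event) patients
    mse = float(np.mean((pred_time[event] - time[event]) ** 2)) if event.any() else 0.0

    # (b) pairwise ranking: if patient i's event occurs before time j,
    # the model should also predict an earlier time for i.
    hinge, pairs = 0.0, 0
    for i in np.flatnonzero(event):
        for j in range(len(time)):
            if time[i] < time[j]:
                hinge += max(0.0, 1.0 - (pred_time[j] - pred_time[i]))
                pairs += 1
    rank = hinge / pairs if pairs else 0.0
    return alpha * mse + (1 - alpha) * rank
```

A perfect prediction drives both terms to zero; swapping two patients' predicted times hurts both the error term and the ranking term.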

ICRA Conference 2002 Conference Paper

An Analytical Grasp Planning on Given Object with Multifingered Hand

  • Ying Li
  • Yong Yu 0003
  • Showzow Tsujio

In this paper, an analytical approach is proposed for planning the finger positions for grasping an object with a multifingered hand. First, a method is given to determine which combinations of the object's edges can be used for grasping. Then, a graspable finger position region (GFPR) on a combination of edges is defined, within which the object can be held successfully. It is shown that the region is bounded by several boundary hyperplanes. By combining these boundary hyperplanes, two propositions for analytically and exactly obtaining the GFPR are proposed. An algorithm is then proposed to find the stable GFPR that contains the largest inscribed hypersphere and has the largest volume. Finally, a numerical example demonstrates the effectiveness of the proposed grasp planning approach.
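A region bounded by boundary hyperplanes, as described above, is the intersection of half-spaces {x : Ax ≤ b}, so membership reduces to a single matrix-vector check. The 2-D unit square below is a toy illustration, not a grasp region from the paper.

```python
import numpy as np

def in_region(point, A, b, tol=1e-9):
    """A region bounded by hyperplanes is the intersection of half-spaces
    {x : A x <= b}; membership is one matrix-vector inequality check."""
    return bool(np.all(A @ np.asarray(point, float) <= b + tol))

# Toy 2-D 'region': the unit square written as four half-planes.
A = np.array([[ 1.0,  0.0],   #  x <= 1
              [-1.0,  0.0],   # -x <= 0  (i.e. x >= 0)
              [ 0.0,  1.0],   #  y <= 1
              [ 0.0, -1.0]])  # -y <= 0  (i.e. y >= 0)
b = np.array([1.0, 0.0, 1.0, 0.0])
```

Finding the largest inscribed hypersphere of such a region (the Chebyshev center) is a standard linear program over the same (A, b) data.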

IROS Conference 2001 Conference Paper

A novel analytical method for finger position regions on grasped object

  • Yong Yu 0003
  • Ying Li
  • Showzow Tsujio

An analytical approach is proposed for obtaining finger position regions on an object grasped with a multi-fingered hand. First, a method is given to determine which combinations of the object's edges can be used for grasping. Then, a graspable finger position region on a combination of edges is defined, within which the object can be held successfully. It is shown that the region is bounded by several boundary hyperplanes. By combining these boundary hyperplanes, two propositions for exactly obtaining the graspable finger position region analytically are proposed. Finally, numerical examples demonstrate the effectiveness of the proposed approach.