Arrow Research search

Author name cluster

Chen Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers (56)

AAAI Conference 2026 Conference Paper

HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding

  • Chen Li
  • Peiji Yang
  • Yicheng Zhong
  • Jianxing Yu
  • Zhisheng Wang
  • Zihao Gou
  • Wenqing Chen
  • Jian Yin

Recent advances in Speech Large Language Models (Speech LLMs) have led to great progress in speech understanding tasks such as Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER). However, whether these models can achieve human-level auditory perception, particularly in terms of their ability to comprehend latent intentions and implicit emotions in real-world spoken language, remains underexplored. To this end, we introduce Human-level Perception in Spoken Speech Understanding (HPSU), a new benchmark for fully evaluating the human-level perceptual and understanding capabilities of Speech LLMs. HPSU comprises over 20,000 expert-validated spoken language understanding samples in English and Chinese. It establishes a comprehensive evaluation framework by encompassing a spectrum of tasks, ranging from basic speaker attribute recognition to complex inference of latent intentions and implicit emotions. To address the issues of data scarcity and the high cost of manual annotation in real-world scenarios, we developed a semi-automatic annotation process. This process fuses audio, textual, and visual information to enable precise speech understanding and labeling, thus enhancing both annotation efficiency and quality. We systematically evaluate various open-source and proprietary Speech LLMs. The results demonstrate that even top-performing models still fall considerably short of human capabilities in understanding genuine spoken interactions. Consequently, HPSU will be useful for guiding the development of Speech LLMs toward human-level perception and cognition.

AAAI Conference 2026 Conference Paper

MCTSr-Zero: Self-Reflective Psychological Counseling Dialogues Generation via Principles and Adaptive Exploration

  • Hao Lu
  • Yanchi Gu
  • Haoyuan Huang
  • Yulin Zhou
  • Ningxin Zhu
  • Chen Li

The integration of Monte Carlo Tree Search (MCTS) with Large Language Models (LLMs) has demonstrated significant success in structured, problem-oriented tasks. However, applying these methods to open-ended dialogues, such as those in psychological counseling, presents unique challenges. Unlike tasks with objective correctness, success in therapeutic conversations depends on subjective factors like empathetic engagement, ethical adherence, and alignment with human preferences, for which strict correctness criteria are ill-defined. Existing result-oriented MCTS approaches can therefore produce misaligned responses. To address this, we introduce MCTSr-Zero, an MCTS framework designed for open-ended, human-centric dialogues. Its core innovation is domain alignment, which shifts the MCTS search objective from predefined end-states towards conversational trajectories that conform to target domain principles (e.g., empathy in counseling). Furthermore, MCTSr-Zero incorporates Regeneration and Meta-Prompt Adaptation mechanisms to substantially broaden exploration by allowing the MCTS to consider fundamentally different initial dialogue strategies. We evaluate MCTSr-Zero in psychological counseling by generating multi-turn dialogue data, which is used to fine-tune an LLM, PsyLLM. We also introduce PsyEval, a benchmark for assessing multi-turn psychological counseling dialogues. Experiments demonstrate that PsyLLM achieves state-of-the-art performance on PsyEval and other relevant metrics, validating MCTSr-Zero's effectiveness in generating high-quality, principle-aligned conversational data for human-centric domains and addressing the LLM challenge of consistently adhering to complex psychological standards.
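MCTSr-Zero's adaptive exploration builds on standard MCTS machinery. As background, node selection in vanilla MCTS typically uses the UCT rule; the sketch below is that generic rule only (the paper's actual selection and reward functions are not given in this abstract):

```python
import math

def uct_select(children, c=1.414):
    """Pick the child maximizing the UCT score: mean value + exploration bonus.

    children: list of dicts with 'visits' (int) and 'value_sum' (float).
    Unvisited children are selected first (infinite exploration bonus).
    """
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")
        exploit = ch["value_sum"] / ch["visits"]
        explore = c * math.sqrt(math.log(total_visits) / ch["visits"])
        return exploit + explore

    return max(children, key=score)
```

In a domain-aligned variant, the value backed up through `value_sum` would come from principle-conformance scoring rather than an objective end-state reward.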

AAAI Conference 2026 Conference Paper

STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision

  • Chen Li
  • Han Zhang
  • Zhantao Yang
  • Fangyi Chen
  • Zihan Wang
  • Anudeepsekhar Bolimera
  • Marios Savvides

Vision-language models (VLMs) have made significant strides in reasoning, yet they often struggle with complex multimodal tasks and tend to generate overly verbose outputs. A key limitation is their reliance on chain-of-thought (CoT) reasoning, despite many tasks benefiting from alternative topologies like trees or graphs. To address this, we introduce STELAR-Vision, a training framework for topology-aware reasoning. At its core is TopoAug, a synthetic data pipeline that enriches training with diverse topological structures. Using supervised fine-tuning and reinforcement learning, we post-train Qwen2VL models with both accuracy and efficiency in mind. Additionally, we propose Frugal Learning, which reduces output length with minimal accuracy loss. On MATH-V and VLM_S2H, STELAR-Vision improves accuracy by 9.7% over its base model and surpasses the larger Qwen2VL-72B-Instruct by 7.3%. On five out-of-distribution benchmarks, it outperforms Phi-4-Multimodal-Instruct by up to 28.4% and LLaMA-3.2-11B-Vision-Instruct by up to 13.2%, demonstrating strong generalization. Compared to Chain-Only training, our approach achieves 4.3% higher overall accuracy on in-distribution datasets and consistently outperforms across all OOD benchmarks.

JBHI Journal 2025 Journal Article

Channel-Gated Transformers With Affinity CAM for Weakly Supervised Multi-Class Brain Tumor Segmentation

  • Yan Han
  • Kai Liu
  • Lingling Yuan
  • Md Rahaman
  • Marcin Grzegorzek
  • Hongzan Sun
  • Chen Li
  • Huiling Chen

Precise tumor localization and sub-region identification are critical for disease diagnosis. However, current Weakly Supervised Semantic Segmentation (WSSS) methods for brain tumor segmentation are primarily single-class, neglecting differences between tumor sub-regions. We observed that when mainstream transformer-based WSSS methods are applied to multi-class brain tumor segmentation, they encounter two major challenges: sub-region discrimination errors and over-segmentation of small lesions. To address these challenges and advance multi-class WSSS methods for brain tumor analysis, this paper proposes Channel-gated Transformers with Affinity CAM (CTAC). CTAC first employs channel-gated multi-head self-attention to overcome the over-smoothing tendency of the transformer, thereby enhancing inter-class discriminability and improving the model's subclass differentiation capability. Then, CTAC uses multi-scale smoothed affinity to adaptively suppress low-confidence responses in the Class Activation Map (CAM), mitigating over-activation in the CAM, and alleviating the over-segmentation phenomena of small lesions. The proposed CTAC significantly outperformed the baseline method on the BraTS2021 glioma and BraTS2023-MEN meningioma datasets. On BraTS2021, it achieved a multi-class mean IoU (mIoU) of 61.718%, an increase of 4.964 percentage points (pp), with the whole-tumor mIoU reaching 79.798% (+6.882 pp). On BraTS2023-MEN, CTAC attained 72.887% mIoU (+4.676 pp) for multi-class segmentation and 75.394% (+7.839 pp) for whole-tumor. Furthermore, CTAC surpasses recent state-of-the-art methods. Code is available at https://github.com/yhan94-lab/CTAC.

JBHI Journal 2025 Journal Article

Dual-Level Imbalance Mitigation for Single-FoV Colorectal Histopathology Image Classification

  • Lingling Yuan
  • Yang Chen
  • Md Rahaman
  • Hongzan Sun
  • Haoyuan Chen
  • Marcin Grzegorzek
  • Chen Li
  • Xiaoyan Li

Single-field-of-view (FoV) histopathological image classification is vital for colorectal cancer (CRC) diagnosis in mid- to low-tier hospitals lacking whole-slide imaging (WSI) scanners and storage, yet suffers from severe class imbalance and degraded performance. To address this, we propose a dual-level imbalance mitigation (DIM) framework integrating data-level and algorithm-level approaches. Specifically: (1) A global context generative adversarial network (GCGAN) generates realistic minority-class images for augmentation to balance the dataset. (2) A frequency-aware adaptive focal loss (FAFL) applies a frequency-aware offset and adaptive modulation to better separate overlapping classes. (3) A lightweight receptive field-based convolutional neural network (LRF-CNN) is trained under DIM to leverage both augmentation and loss modulation for improved classification. Extensive experiments on the single-FoV colorectal histopathology dataset demonstrate that DIM-equipped LRF-CNN outperforms five state-of-the-art models (SOTA) across multiple metrics. Furthermore, each DIM component enhances performance when applied individually to those SOTA models, and additional validation on six single-FoV histopathological datasets confirms the generalizability and effectiveness of the proposed DIM framework. Our code is available at https://github.com/Lingling-Yuan/DIM.
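The abstract does not give FAFL's exact form; the general flavor of a frequency-aware focal loss can be illustrated with this hypothetical sketch, where a standard focal term is scaled by inverse class frequency (all names and formulas here are assumptions for illustration, not the paper's definition):

```python
import math

def freq_aware_focal_loss(p, class_freq, gamma=2.0):
    """Hypothetical frequency-aware focal loss for one sample.

    p: predicted probability of the true class (0 < p < 1).
    class_freq: fraction of training samples in the true class (0 < f <= 1).
    Rare classes (small class_freq) receive a larger weight, and the
    (1 - p)^gamma focal term down-weights already well-classified samples.
    """
    weight = 1.0 / class_freq          # inverse-frequency weighting
    focal = (1.0 - p) ** gamma         # focus on hard examples
    return -weight * focal * math.log(p)
```

Under this sketch, a minority-class sample contributes more loss than a majority-class sample at the same confidence, which is the imbalance-mitigation behavior the algorithm-level component targets.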

JBHI Journal 2025 Journal Article

Few-Shot Class-Incremental Learning for Retinal Disease Recognition

  • Jinghua Zhang
  • Peng Zhao
  • Yongkun Zhao
  • Chen Li
  • Dewen Hu

Few-Shot Class-Incremental Learning (FSCIL) techniques are essential for developing Deep Learning (DL) models that can continuously learn new classes with limited samples while retaining existing knowledge. This capability is particularly crucial for DL-based retinal disease diagnosis systems, where acquiring large annotated datasets is challenging, and disease phenotypes evolve over time. This paper introduces Re-FSCIL, a novel framework for Few-Shot Class-Incremental Retinal Disease Recognition (FSCIRDR). Re-FSCIL integrates the RETFound model with a fine-grained module, employing a forward-compatible training strategy to improve adaptability, supervised contrastive learning to enhance feature discrimination, and feature fusion for robust representation quality. We convert existing datasets into the FSCIL format and reproduce numerous representative FSCIL methods to create two new benchmarks, RFMiD38 and JSIEC39, specifically for FSCIRDR. Our experimental results demonstrate that Re-FSCIL achieves state-of-the-art (SOTA) performance, significantly surpassing existing FSCIL methods on these benchmarks.

IJCAI Conference 2025 Conference Paper

InstGAN: Instant Actor-Critic-Driven GAN for De Novo Molecule Generation and Property Optimization

  • Huidong Tang
  • Chen Li
  • Sayaka Kamei
  • Yoshihiro Yamanishi
  • Yasuhiko Morimoto

Deep generative models, such as generative adversarial networks (GANs), have been employed for de novo molecular generation in drug discovery. Most prior studies have utilized reinforcement learning (RL) algorithms, particularly Monte Carlo tree search (MCTS), to handle the discrete nature of molecular representations in GANs. However, due to the inherent instability in training GANs and RL models, along with the high computational cost associated with MCTS sampling, MCTS RL-based GANs struggle to scale to large chemical databases. To tackle these challenges, this study introduces a novel GAN based on actor-critic RL with instant and global rewards, called InstGAN, to generate molecules at the token level with multi-property optimization. Furthermore, maximized information entropy is leveraged to alleviate mode collapse. The experimental results demonstrate that InstGAN outperforms other baselines, achieves comparable performance to state-of-the-art models, and efficiently generates molecules with multi-property optimization. The code is available at: https://github.com/tang777777/InstGAN.
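The entropy-maximization idea can be made concrete with a toy sketch: compute the Shannon entropy of the generator's token distribution and add it as a bonus to the actor objective (illustrative only; `generator_objective` and its `beta` coefficient are hypothetical, not InstGAN's actual loss):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a token distribution. Higher entropy means
    more diverse token choices; rewarding it discourages mode collapse."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def generator_objective(reward, probs, beta=0.01):
    """Toy actor objective: task reward plus an entropy bonus."""
    return reward + beta * entropy(probs)
```

A uniform distribution over tokens has maximal entropy, so a collapsed generator (probability mass on one token) is penalized relative to a diverse one.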

NeurIPS Conference 2025 Conference Paper

Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation

  • Lin Li
  • Chuhan ZHANG
  • Dong Zhang
  • Chong Sun
  • Chen Li
  • Long Chen

Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained large-scale models. Existing OVSGG methods always adopt a two-stage pipeline: 1) Infusing knowledge into large-scale models via pre-training on large datasets; 2) Transferring knowledge from pre-trained models with fully annotated scene graphs during supervised fine-tuning. However, due to a lack of explicit interaction modeling, these methods struggle to distinguish between interacting and non-interacting instances of the same object category. This limitation induces critical issues in both stages of OVSGG: it generates noisy pseudo-supervision from mismatched objects during knowledge infusion, and causes ambiguous query matching during knowledge transfer. To this end, in this paper, we propose an interACtion-Centric end-to-end OVSGG framework (ACC) in an interaction-driven paradigm to minimize these mismatches. For interaction-centric knowledge infusion, ACC employs a bidirectional interaction prompt for robust pseudo-supervision generation to enhance the model's interaction knowledge. For interaction-centric knowledge transfer, ACC first adopts interaction-guided query selection that prioritizes pairing interacting objects to reduce interference from non-interacting ones. Then, it integrates interaction-consistent knowledge distillation to bolster robustness by pushing relational foreground away from the background while retaining general knowledge. Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications.

JBHI Journal 2025 Journal Article

Investigating the Effectiveness of Haptic Resistive Force Feedback to Improve Tremors in Parkinson's Disease: A Training Impact Study

  • Nafees Mahmood
  • Meldin Bektic
  • Chen Li
  • Angela Ridgel
  • Kwangtaek Kim

The goal of this study was to investigate the effectiveness of haptic resistive force feedback in improving the upper limb tremors in Parkinson's Disease (PD) during a button push task. We developed a haptic-mixed reality (HMR) button push simulation system that provides virtual button touch feedback synchronized with mixed reality 3D buttons and various levels of kinesthetic resistance force against hand movements. Two user studies were conducted: one to ensure the system's usability with 25 healthy individuals, and another to measure tremor improvement over a five-day training program with seven individuals diagnosed with idiopathic PD. PD participants were randomly assigned to either a haptic resistive force group or a no haptic resistance group. The results demonstrated that the system provided excellent usability for both healthy and PD participants. Additionally, PD participants in the haptic resistance group showed improvements in tremor and upper bradykinesia levels compared to the no haptic resistance group.

AAAI Conference 2025 Conference Paper

Mamba YOLO: A Simple Baseline for Object Detection with State Space Model

  • Zeyu Wang
  • Chen Li
  • Huiying Xu
  • Xinzhong Zhu
  • Hongbo Li

Driven by the rapid development of deep learning technology, the YOLO series has set a new benchmark for real-time object detectors. Additionally, transformer-based structures have emerged as the most powerful solution in the field, greatly extending the model's receptive field and achieving significant performance improvements. However, this improvement comes at a cost, as the quadratic complexity of the self-attention mechanism increases the computational burden of the model. To address this problem, we introduce a simple yet effective baseline approach called Mamba YOLO. Our contributions are as follows: 1) We propose the ODMamba backbone, which introduces a State Space Model (SSM) with linear complexity to address the quadratic complexity of self-attention. Unlike other Transformer-based and SSM-based methods, ODMamba is simple to train without pretraining. 2) To meet real-time requirements, we designed the macro structure of ODMamba and determined the optimal stage ratio and scaling size. 3) We design the RG Block, which employs a multi-branch structure to model the channel dimensions, addressing possible limitations of SSM in sequence modeling, such as insufficient receptive fields and weak image localization. This design captures localized image dependencies more accurately. Extensive experiments on the publicly available COCO benchmark dataset show that Mamba YOLO achieves state-of-the-art performance compared to previous methods. Specifically, a tiny version of Mamba YOLO achieves a 7.5% improvement in mAP on a single 4090 GPU with an inference time of 1.5 ms.

NeurIPS Conference 2025 Conference Paper

MATCH: Multi-faceted Adaptive Topo-Consistency for Semi-Supervised Histopathology Segmentation

  • Meilong Xu
  • Xiaoling Hu
  • Shahira Abousamra
  • Chen Li
  • Chao Chen

In semi-supervised segmentation, capturing meaningful semantic structures from unlabeled data is essential. This is particularly challenging in histopathology image analysis, where objects are densely distributed. To address this issue, we propose a semi-supervised segmentation framework designed to robustly identify and preserve relevant topological features. Our method leverages multiple perturbed predictions obtained through stochastic dropouts and temporal training snapshots, enforcing topological consistency across these varied outputs. This consistency mechanism helps distinguish biologically meaningful structures from transient and noisy artifacts. A key challenge in this process is to accurately match the corresponding topological features across the predictions in the absence of ground truth. To overcome this, we introduce a novel matching strategy that integrates spatial overlap with global structural alignment, minimizing discrepancies among predictions. Extensive experiments demonstrate that our approach effectively reduces topological errors, resulting in more robust and accurate segmentations essential for reliable downstream analysis. Code is available at https://github.com/Melon-Xu/MATCH.

ICRA Conference 2025 Conference Paper

Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer

  • Adam Labiosa
  • Zhihan Wang
  • Siddhant Agarwal
  • William Cong
  • Geethika Hemkumar
  • Abhinav Narayan Harish
  • Benjamin Hong
  • Josh Kelle

Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decision-making in such domains; however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture integrating RL within a classical robotics stack, while employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.

AAAI Conference 2025 Conference Paper

RemDet: Rethinking Efficient Model Design for UAV Object Detection

  • Chen Li
  • Rui Zhao
  • Zeyu Wang
  • Huiying Xu
  • Xinzhong Zhu

Object detection in Unmanned Aerial Vehicle (UAV) images has emerged as a focal area of research, which presents two significant challenges: i) objects are typically small and dense within vast images; ii) computational resource constraints render most models unsuitable for real-time deployment. Current real-time object detectors are not optimized for UAV images, and complex methods designed for small object detection often lack real-time capabilities. To address these challenges, we propose a novel detector, RemDet (Reparameter efficient multiplication Detector). Our contributions are as follows: 1) Rethinking the challenges of existing detectors for small and dense UAV images, and proposing information loss as a design guideline for efficient models. 2) We introduce the ChannelC2f module to enhance small object detection performance, demonstrating that high-dimensional representations can effectively mitigate information loss. 3) We design the GatedFFN module to provide not only strong performance but also low latency, effectively addressing the challenges of real-time detection. Our research reveals that GatedFFN, through the use of multiplication, is more cost-effective than feed-forward networks for high-dimensional representation. 4) We propose the CED module, which combines the advantages of ViT and CNN downsampling to effectively reduce information loss. It specifically enhances context information for small and dense objects. Extensive experiments on the large UAV datasets VisDrone and UAVDT validate the real-time efficiency and superior performance of our methods. On the challenging UAV dataset VisDrone, our methods not only provide state-of-the-art results, improving detection by more than 3.4%, but also achieve 110 FPS on a single 4090 GPU.
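The multiplicative-gating idea behind a gated FFN can be sketched minimally: the hidden activation is the elementwise product of a gate branch and a value branch rather than a single wide nonlinearity. This is a generic gated-FFN sketch with hypothetical shapes and weights, not RemDet's actual GatedFFN module:

```python
import math

def gated_ffn(x, w_gate, w_value, w_out):
    """Minimal gated feed-forward unit on plain Python lists.

    x: input vector; w_gate, w_value, w_out: row-major weight matrices.
    The hidden activation multiplies a sigmoid gate branch against a
    linear value branch, so the gate can suppress individual channels.
    """
    def matvec(w, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    gate = [1.0 / (1.0 + math.exp(-t)) for t in matvec(w_gate, x)]
    value = matvec(w_value, x)
    hidden = [g * h for g, h in zip(gate, value)]  # multiplicative gating
    return matvec(w_out, hidden)
```

With saturated gates, whole channels are zeroed before the output projection, which is the channel-modeling behavior the multiplication buys at low cost.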

IROS Conference 2025 Conference Paper

Safe and Efficient Navigation for Differential-Drive Robots in Dynamic Pedestrian Environments

  • Wenhao Liu
  • Letian Fu
  • Chen Li
  • Wanlei Li
  • Yunjiang Lou

Differential-drive robots are widely used in dynamic pedestrian environments, such as hospitals, for time-sensitive tasks like medication delivery, which require high navigation efficiency to ensure timely arrivals. However, existing methods tend to overemphasize safety, resulting in overly conservative behaviors and prolonged navigation times, which in turn lead to reduced efficiency. To address this issue, this paper proposes a novel navigation framework that integrates a pedestrian risk map, modeled using asymmetric Gaussian distributions, into B-spline trajectory optimization. Rather than strictly avoiding high-risk regions, the method balances collision risk and trajectory length minimization, leading to more effective navigation. Additionally, multiple planning modes enhance adaptability in complex environments, ensuring both safety and efficiency. Furthermore, kinematic constraints specific to differential-drive robots are incorporated to ensure the feasibility of the generated trajectories. Simulations and real-world experiments validate the proposed method’s effectiveness in achieving safe and efficient navigation in dynamic pedestrian environments. The video is available at https://youtu.be/S9qJmXyPEzw.

NeurIPS Conference 2025 Conference Paper

Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts

  • Chen Li
  • Huiying Xu
  • Changxin Gao
  • Zeyu Wang
  • Yun Liu
  • Xinzhong Zhu

Single-source Domain Generalized Object Detection (SDGOD), as a cutting-edge research topic in computer vision, aims to enhance model generalization capability in unseen target domains through single-source domain training. Current mainstream approaches attempt to mitigate domain discrepancies via data augmentation techniques. However, due to domain shift and limited domain-specific knowledge, models tend to fall into the pitfall of spurious correlations. This manifests as the model's over-reliance on simplistic classification features (e.g., color) rather than essential domain-invariant representations like object contours. To address this critical challenge, we propose the Cauvis (Causal Visual Prompts) method. First, we introduce a Cross-Attention Prompts module that mitigates bias from spurious features by integrating visual prompts with cross-attention. To address the inadequate domain knowledge coverage and spurious feature entanglement in visual prompts for single-domain generalization, we propose a dual-branch adapter that disentangles causal from spurious features while achieving domain adaptation via high-frequency feature extraction. Cauvis achieves state-of-the-art performance with 15.9–31.4% gains over existing domain generalization methods on SDGOD datasets, while exhibiting significant robustness advantages in complex interference environments.

JBHI Journal 2025 Journal Article

WP-FSCIL: A Well-Prepared Few-shot Class-incremental Learning Framework for Pill Recognition

  • Jinghua Zhang
  • Chen Li
  • Marco Cristani
  • Hongzan Sun
  • Marcin Grzegorzek
  • Huiling Chen

Few-shot Class-incremental Pill Recognition (FSCIPR) aims to develop an automatic pill recognition system that requires only a few training data and can continuously adapt to new classes, providing technical support for applications in hospitals, portable apps, and assistance for visually impaired individuals. This task faces three core challenges: overfitting, fine-grained classification problems, and catastrophic forgetting. We propose the Well-Prepared Few-shot Class-incremental Learning (WP-FSCIL) framework, which addresses overfitting through a parameter-freezing strategy, enhances the robustness and discriminative power of backbone features with Center-Triplet (CT) loss and supervised contrastive loss for fine-grained classification, and alleviates catastrophic forgetting using a multi-dimensional Knowledge Distillation (KD) strategy based on flexible Pseudo-feature Synthesis (PFS). By flexibly synthesizing any number of old-class features, the PFS strategy resolves the issue of insufficient samples in the KD process, enabling Response-based KD (KD1) and Relation-based KD (KD2) to comprehensively preserve old knowledge. The effectiveness of WP-FSCIL has been validated through experiments conducted on two publicly available pill datasets. These experiments show that WP-FSCIL outperforms existing state-of-the-art methods, demonstrating its superior performance.

NeurIPS Conference 2024 Conference Paper

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

  • Fangyikang Wang
  • Hubery Yin
  • Yuejiang Dong
  • Huminhao Zhu
  • Chao Zhang
  • Hanbin Zhao
  • Hui Qian
  • Chen Li

The inversion of diffusion model sampling, which aims to find the corresponding initial noise of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion samplers have been proposed to address the inexact inversion issue in a training-free manner. However, the theoretical properties of these heuristic samplers remain unknown and they often exhibit mediocre sampling quality. In this paper, we introduce Bidirectional Explicit Linear Multi-step (BELM) samplers, a generic formulation of exact inversion samplers that includes all previously proposed heuristic exact inversion samplers as special cases. The BELM formulation is derived from the variable-stepsize-variable-formula linear multi-step method by integrating a bidirectional explicit constraint. We highlight that this bidirectional explicit constraint is the key to mathematically exact inversion. We systematically investigate the Local Truncation Error (LTE) within the BELM framework and show that the existing heuristic designs of exact inversion samplers yield sub-optimal LTE. Consequently, we propose the Optimal BELM (O-BELM) sampler through the LTE minimization approach. We conduct additional analysis to substantiate the theoretical stability and global convergence property of the proposed optimal sampler. Comprehensive experiments demonstrate that our O-BELM sampler establishes the exact inversion property while achieving high-quality sampling. Additional experiments in image editing and image interpolation highlight the extensive potential of applying O-BELM in varying applications.
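For context, an explicit k-step linear multi-step recursion for an ODE $\dot{x} = f(t, x)$ has the generic form below. This is the standard numerical-analysis template that the variable-stepsize-variable-formula method generalizes; the paper's specific BELM recursion and coefficients are not given in this abstract:

```latex
x_{n+k} \;=\; \sum_{j=0}^{k-1} a_j \, x_{n+j} \;+\; h \sum_{j=0}^{k-1} b_j \, f(t_{n+j},\, x_{n+j})
```

Exact inversion requires that the same linear relation be solvable explicitly in the reverse direction, i.e., for $x_n$ in terms of the later iterates, which is the role the bidirectional explicit constraint plays.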

NeurIPS Conference 2024 Conference Paper

Can We Leave Deepfake Data Behind in Training Deepfake Detector?

  • Jikang Cheng
  • Zhiyuan Yan
  • Ying Zhang
  • Yuhao Luo
  • Zhongyuan Wang
  • Chen Li

The generalization ability of deepfake detectors is vital for their applications in real-world scenarios. One effective solution to enhance this ability is to train the models with manually-blended data, which we term "blendfake", encouraging models to learn generic forgery artifacts like the blending boundary. Interestingly, current SoTA methods utilize blendfake without incorporating any deepfake data in their training process. This is likely because previous empirical observations suggest that vanilla hybrid training (VHT), which combines deepfake and blendfake data, results in inferior performance to methods using only blendfake data (so-called "1+1<2"). Therefore, a critical question arises: Can we leave deepfake behind and rely solely on blendfake data to train an effective deepfake detector? Intuitively, as deepfakes also contain additional informative forgery clues (e.g., deep generative artifacts), excluding all deepfake data in training deepfake detectors seems counter-intuitive. In this paper, we rethink the role of blendfake in detecting deepfakes and formulate the process from "real to blendfake to deepfake" as a progressive transition. Specifically, blendfake and deepfake can be explicitly delineated as the oriented pivot anchors between "real-to-fake" transitions. The accumulation of forgery information should be oriented and progressively increasing during this transition process. To this end, we propose an Oriented Progressive Regularizer (OPR) to establish constraints that compel the distribution of anchors to be discretely arranged. Furthermore, we introduce feature bridging to facilitate the smooth transition between adjacent anchors. Extensive experiments confirm that our design allows leveraging forgery information from both blendfake and deepfake effectively and comprehensively. Code is available at https://github.com/beautyremain/ProDet.

AAAI Conference 2024 Conference Paper

GxVAEs: Two Joint VAEs Generate Hit Molecules from Gene Expression Profiles

  • Chen Li
  • Yoshihiro Yamanishi

The de novo generation of hit-like molecules that show bioactivity and drug-likeness is an important task in computer-aided drug discovery. Although artificial intelligence can generate molecules with desired chemical properties, most previous studies have ignored the influence of disease-related cellular environments. This study proposes a novel deep generative model called GxVAEs to generate hit-like molecules from gene expression profiles by leveraging two joint variational autoencoders (VAEs). The first VAE, ProfileVAE, extracts latent features from gene expression profiles. The extracted features serve as the conditions that guide the second VAE, which is called MolVAE, in generating hit-like molecules. GxVAEs bridge the gap between molecular generation and the cellular environment in a biological system, and produce molecules that are biologically meaningful in the context of specific diseases. Experiments and case studies on the generation of therapeutic molecules show that GxVAEs outperform current state-of-the-art baselines and yield hit-like molecules with potential bioactivity and drug-like properties. We successfully generated potential molecular structures with therapeutic effects for various diseases from patients’ disease profiles.

NeurIPS Conference 2024 Conference Paper

MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps

  • Yating Xu
  • Chen Li
  • Gim Hee Lee

The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Previous methods rely on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select multiple locations that score top in the probability volume for each pixel and use their probability score to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on the ScanNet and ARKitScenes datasets are conducted to show the superiority of our model. Our code is available at https://github.com/Pixie8888/MVSDet.
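The top-scoring-location idea can be sketched in a few lines: for each pixel, keep the k depth hypotheses with the highest probability and renormalize their scores into placement confidences. This is a deliberate simplification of the paper's probabilistic sampling and soft weighting mechanism, with hypothetical function and parameter names:

```python
def topk_depth_placement(depth_probs, k=2):
    """For one pixel, keep the k most probable depth-plane indices and
    renormalize their scores into placement confidences (simplified).

    depth_probs: per-depth-plane probabilities for a single pixel.
    Returns a list of (plane_index, confidence) pairs summing to 1.
    """
    top = sorted(enumerate(depth_probs), key=lambda t: t[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return [(i, p / total) for i, p in top]
```

Placing features only at these few confident locations is what lets the method avoid evaluating a dense stack of depth planes per pixel.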

TIST Journal 2024 Journal Article

Self-supervised Bipartite Graph Representation Learning: A Dirichlet Max-margin Matrix Factorization Approach

  • Shenghai Zhong
  • Shu Guo
  • Jing Liu
  • Hongren Huang
  • Lihong Wang
  • Jianxin Li
  • Chen Li
  • Yiming Hei

Bipartite graph representation learning aims to obtain node embeddings by compressing sparse vectorized representations of interactions between two types of nodes, e.g., users and items. Incorporating structural attributes among homogeneous nodes, such as user communities, improves the identification of similar interaction preferences, namely, user/item embeddings, for downstream tasks. However, existing methods often fail to proactively discover and fully utilize these latent structural attributes. Moreover, the manual collection and labeling of structural attributes is always costly. In this article, we propose a novel approach called Dirichlet Max-margin Matrix Factorization (DM3F), which adopts a self-supervised strategy to discover latent structural attributes and model discriminative node representations. Specifically, in self-supervised learning, our approach generates pseudo group labels (i.e., structural attributes) as a supervised signal using the Dirichlet process without relying on manual collection and labeling, and employs them in a max-margin classification. Additionally, we introduce a Variational Markov Chain Monte Carlo algorithm (Variational MCMC) to effectively update the parameters. The experimental results on six real datasets demonstrate that, in the majority of cases, the proposed method outperforms existing approaches based on matrix factorization and neural networks. Furthermore, the modularity analysis confirms the effectiveness of our model in capturing structural attributes to produce high-quality user embeddings.

AAAI Conference 2024 Conference Paper

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

  • Jing Wu
  • Suiyao Chen
  • Qi Zhao
  • Renat Sergazinov
  • Chen Li
  • Shengjie Liu
  • Chongchao Zhao
  • Tianpei Xie

Self-supervised representation learning methods have achieved significant success in computer vision and natural language processing (NLP), where data samples exhibit explicit spatial or semantic dependencies. However, applying these methods to tabular data is challenging due to the less pronounced dependencies among data samples. In this paper, we address this limitation by introducing SwitchTab, a novel self-supervised method specifically designed to capture latent dependencies in tabular data. SwitchTab leverages an asymmetric encoder-decoder framework to decouple mutual and salient features among data pairs, resulting in more representative embeddings. These embeddings, in turn, contribute to better decision boundaries and lead to improved results in downstream tasks. To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results showcase superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be utilized as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression, XGBoost, etc.). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.

NeurIPS Conference 2024 Conference Paper

VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction

  • Hanlin Chen
  • Fangyin Wei
  • Chen Li
  • Tianxin Huang
  • Yunsong Wang
  • Gim Hee Lee

Although 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normal rendered from 3D Gaussians updates only the rotation parameter while neglecting other geometric parameters; 2) The inconsistency of predicted normal maps across multiple views may lead to severe reconstruction artifacts. In this paper, we propose a Depth-Normal regularizer that directly couples normal with other geometric parameters, leading to full updates of the geometric parameters from normal regularization. We further propose a confidence term to mitigate inconsistencies of normal predictions across multiple views. Moreover, we also introduce a densification and splitting strategy to regularize the size and distribution of 3D Gaussians for more accurate surface modeling. Compared with Gaussian-based baselines, experiments show that our approach obtains better reconstruction quality and maintains competitive appearance quality at faster training speed and 100+ FPS rendering. Our code will be made open-source upon paper acceptance.

TMLR Journal 2024 Journal Article

Vision-Language Instruction Tuning: A Review and Analysis

  • Chen Li
  • Yixiao Ge
  • Dian Li
  • Ying Shan

Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs), aiming to enhance the LLM's ability to generalize instruction execution and adapt to user preferences. With the increasing integration of multi-modal data into LLMs, there is growing interest in Vision-Language Instruction Tuning (VLIT), which presents more complex characteristics compared to pure text instruction tuning. In this paper, we systematically review the latest VLIT settings and corresponding datasets in multi-modal LLMs and provide insights into the intrinsic motivations behind their design. For the first time, we offer a detailed multi-perspective categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess. By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs. Furthermore, we discuss the current challenges and future research directions of VLIT, providing insights for the continuous development of this field. The code and dataset related to this paper have been open-sourced at https://github.com/palchenli/VL-Instruction-Tuning.

IJCAI Conference 2023 Conference Paper

Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

  • Chen Li
  • Xinghao Yang
  • Baodi Liu
  • Weifeng Liu
  • Honglong Chen

Modern Natural Language Processing (NLP) models expose under-sensitivity towards text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans but does not change the model's prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes deliver human-readable text. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish example generation with two major merits. Firstly, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which brings less degradation to the model's confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets manifest the superiority of AGPS compared with the baselines and expose the fact that NLP models cannot truly understand the semantics of sentences, as they give the same prediction, with even higher confidence, for nonsensical preposition sequences.

NeurIPS Conference 2023 Conference Paper

Formulating Discrete Probability Flow Through Optimal Transport

  • Pengze Zhang
  • Hubery Yin
  • Chen Li
  • Xiaohua Xie

Continuous diffusion models are commonly acknowledged to display a deterministic probability flow, whereas discrete diffusion models do not. In this paper, we aim to establish the fundamental theory for the probability flow of discrete diffusion models. Specifically, we first prove that the continuous probability flow is the Monge optimal transport map under certain conditions, and also present an equivalent evidence for discrete cases. In view of these findings, we are then able to define the discrete probability flow in line with the principles of optimal transport. Finally, drawing upon our newly established definitions, we propose a novel sampling method that surpasses previous discrete diffusion models in its ability to generate more certain outcomes. Extensive experiments on the synthetic toy dataset and the CIFAR-10 dataset have validated the effectiveness of our proposed discrete probability flow. Code is released at: https://github.com/PangzeCheung/Discrete-Probability-Flow.

NeurIPS Conference 2023 Conference Paper

GNeSF: Generalizable Neural Semantic Fields

  • Hanlin Chen
  • Chen Li
  • Mengqi Guo
  • Zhiwen Yan
  • Gim Hee Lee

3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still require expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Specifically, our framework takes in multi-view image features and semantic maps as the inputs instead of only spatial information to avoid overfitting to scene-specific geometric and semantic information. We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point. In addition to the image features, view difference information is also encoded in our framework to predict the voting scores. Intuitively, this allows the semantic information from nearby views to contribute more compared to distant ones. Furthermore, a visibility module is also designed to detect and filter out detrimental information from occluded views. Due to the generalizability of our proposed method, we can synthesize semantic maps or conduct 3D semantic segmentation for novel scenes with solely 2D semantic supervision. Experimental results show that our approach achieves comparable performance with scene-specific approaches. More importantly, our approach can even outperform existing strong supervision-based approaches with only 2D annotations.

AAAI Conference 2023 Conference Paper

Robust and Fast Measure of Information via Low-Rank Representation

  • Yuxin Dong
  • Tieliang Gong
  • Shujian Yu
  • Hong Chen
  • Chen Li

The matrix-based Rényi's entropy allows us to directly quantify information measures from given data, without explicit estimation of the underlying probability distribution. This intriguing property makes it widely applied in statistical inference and machine learning tasks. However, this information theoretical quantity is not robust against noise in the data, and is computationally prohibitive in large-scale applications. To address these issues, we propose a novel measure of information, termed low-rank matrix-based Rényi's entropy, based on low-rank representations of infinitely divisible kernel matrices. The proposed entropy functional inherits the specialty of the original definition to directly quantify information from data, but enjoys additional advantages including robustness and effective calculation. Specifically, our low-rank variant is more sensitive to informative perturbations induced by changes in underlying distributions, while being insensitive to uninformative ones caused by noises. Moreover, low-rank Rényi's entropy can be efficiently approximated by random projection and Lanczos iteration techniques, reducing the overall complexity from O(n³) to O(n²s) or even O(ns²), where n is the number of data samples and s ≪ n. We conduct large-scale experiments to evaluate the effectiveness of this new information measure, demonstrating superior results compared to matrix-based Rényi's entropy in terms of both performance and computational efficiency.
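The matrix-based Rényi's entropy this abstract builds on has a simple closed form over the eigenvalues of a trace-normalized Gram matrix, S_α(A) = (1 − α)⁻¹ log₂ Σᵢ λᵢ^α. The NumPy sketch below shows the plain (full-rank) definition only, not the paper's low-rank variant; the Gaussian kernel and bandwidth are assumptions for illustration.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi entropy of order alpha from data X of shape (n, d):
    build a Gaussian Gram matrix, trace-normalize it so its eigenvalues
    form a distribution, then apply S_alpha = log2(sum lam_i^alpha)/(1-alpha)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))     # infinitely divisible Gaussian kernel
    A = K / np.trace(K)                      # eigenvalues now sum to 1
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]                   # drop numerical zeros
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)
```

Identical samples give a rank-one Gram matrix and entropy 0, while n well-separated samples give K ≈ I and entropy log₂ n, matching the intuition that the eigenvalue spectrum plays the role of a probability distribution.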

JBHI Journal 2023 Journal Article

Semi-Supervised Pixel Contrastive Learning Framework for Tissue Segmentation in Histopathological Image

  • Jiangbo Shi
  • Tieliang Gong
  • Chunbao Wang
  • Chen Li

Accurate tissue segmentation in histopathological images is essential for promoting the development of precision pathology. However, digital pathological images are very large and must be tiled into small patches containing limited semantic information. To imitate the pathologist's diagnosis process and model the semantic relations of the whole slide image, we propose a semi-supervised pixel contrastive learning framework (SSPCL) which mainly includes an uncertainty-guided mutual dual consistency learning module (UMDC) and a cross image pixel-contrastive learning module (CIPC). The UMDC module enables efficient learning from unlabeled data through mutual dual-consistency and consensus-based uncertainty. The CIPC module aims at capturing the cross-patch semantic relationship by optimizing a contrastive loss between pixel embeddings. We also propose several novel domain-related sampling methods that utilize the continuous spatial structure of adjacent image patches, which can avoid the problem of false sampling and improve training efficiency. In this way, SSPCL significantly reduces the labeling cost on histopathological images and realizes the accurate quantitation of tissues. Extensive experiments on three tissue segmentation datasets demonstrate the effectiveness of SSPCL, which outperforms the state of the art by up to 5.0% in mDice.

IJCAI Conference 2023 Conference Paper

Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi's Entropy Perspective

  • Yuxin Dong
  • Tieliang Gong
  • Hong Chen
  • Chen Li

Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information theoretical measure: kernelized Rényi's entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon's entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Rényi's entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.

AAAI Conference 2022 Conference Paper

Regularized Modal Regression on Markov-Dependent Observations: A Theoretical Assessment

  • Tieliang Gong
  • Yuxin Dong
  • Hong Chen
  • Wei Feng
  • Bo Dong
  • Chen Li

Modal regression, a widely used regression protocol, has been extensively investigated in statistical and machine learning communities due to its robustness to outliers and heavy-tailed noises. Understanding modal regression's theoretical behavior can be fundamental in learning theory. Despite significant progress in characterizing its statistical property, the majority of the results are based on the assumption that samples are independent and identically distributed (i.i.d.), which is too restrictive for real-world applications. This paper concerns the statistical property of regularized modal regression (RMR) within an important dependence structure - Markov dependence. Specifically, we establish the upper bound for the RMR estimator under moderate conditions and give an explicit learning rate. Our results show that the Markov dependence impacts the generalization error in that the sample size is discounted by a multiplicative factor depending on the spectral gap of the underlying Markov chain. This result sheds new light on characterizing the theoretical underpinning for robust regression.

AAAI Conference 2022 Short Paper

SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

  • Chen Li
  • Xiaoguang Yu
  • Shuangyong Song
  • Jia Wang
  • Bo Zou
  • Xiaodong He

This paper presents SimCTC, a simple contrastive learning (CL) method that greatly advances the state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three different loss function heads: Clustering head, Instance-CL head and Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC remarkably outperforms 6 competitive text clustering methods with 1%-6% improvement on Accuracy (ACC) and 1%-4% improvement on Normalized Mutual Information (NMI). Moreover, our results also show that the clustering performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.

IJCAI Conference 2022 Conference Paper

Transformer-based Objective-reinforced Generative Adversarial Network to Generate Desired Molecules

  • Chen Li
  • Chikashige Yamanaka
  • Kazuma Kaitoh
  • Yoshihiro Yamanishi

Deep generative models of sequence-structure data have attracted widespread attention in drug discovery. However, such models cannot fully extract the semantic features of molecules from sequential representations. Moreover, mode collapse reduces the diversity of the generated molecules. This paper proposes a transformer-based objective-reinforced generative adversarial network (TransORGAN) to generate molecules. TransORGAN leverages a transformer architecture as a generator and uses a stochastic policy gradient for reinforcement learning to generate plausible molecules with rich semantic features. The discriminator grants rewards that guide the policy update of the generator, while an objective-reinforced penalty encourages the generation of diverse molecules. Experiments were performed using the ZINC chemical dataset, and the results demonstrated the usefulness of TransORGAN in terms of uniqueness, novelty, and diversity of the generated molecules.

TMLR Journal 2022 Journal Article

Understanding Linearity of Cross-Lingual Word Embedding Mappings

  • Xutan Peng
  • Mark Stevenson
  • Chenghua Lin
  • Chen Li

The technique of Cross-Lingual Word Embedding (CLWE) plays a fundamental role in tackling Natural Language Processing challenges for low-resource languages. Its dominant approaches assumed that the relationship between embeddings could be represented by a linear mapping, but there has been no exploration of the conditions under which this assumption holds. Such a research gap becomes very critical recently, as it has been evidenced that relaxing mappings to be non-linear can lead to better performance in some cases. We, for the first time, present a theoretical analysis that identifies the preservation of analogies encoded in monolingual word embeddings as a *necessary and sufficient* condition for the ground-truth CLWE mapping between those embeddings to be linear. On a novel cross-lingual analogy dataset that covers five representative analogy categories for twelve distinct languages, we carry out experiments which provide direct empirical support for our theoretical claim. These results offer additional insight into the observations of other researchers and contribute inspiration for the development of more effective cross-lingual representation learning strategies.

NeurIPS Conference 2021 Conference Paper

Coarse-to-fine Animal Pose and Shape Estimation

  • Chen Li
  • Gim Hee Lee

Most existing animal pose and shape estimation approaches reconstruct animal meshes with a parametric SMAL model. This is because the low-dimensional pose and shape parameters of the SMAL model make it easier for deep networks to learn the high-dimensional animal meshes. However, the SMAL model is learned from scans of toy animals with limited pose and shape variations, and thus may not be able to represent highly varying real animals well. This may result in poor fitting of the estimated meshes to the 2D evidence, e.g., 2D keypoints or silhouettes. To mitigate this problem, we propose a coarse-to-fine approach to reconstruct 3D animal mesh from a single image. The coarse estimation stage first estimates the pose, shape and translation parameters of the SMAL model. The estimated meshes are then used as a starting point by a graph convolutional network (GCN) to predict a per-vertex deformation in the refinement stage. This combination of SMAL-based and vertex-based representations benefits from both parametric and non-parametric representations. We design our mesh refinement GCN (MRGCN) as an encoder-decoder structure with hierarchical feature representations to overcome the limited receptive field of traditional GCNs. Moreover, we observe that the global image feature used by existing animal mesh reconstruction works is unable to capture detailed shape information for mesh refinement. We thus introduce a local feature extractor to retrieve a vertex-level feature and use it together with the global feature as the input of the MRGCN. We test our approach on the StanfordExtra dataset and achieve state-of-the-art results. Furthermore, we test the generalization capacity of our approach on the Animal Pose and BADJA datasets. Our code is available at the project website.

AAAI Conference 2021 Conference Paper

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation

  • Xiaoyang Wang
  • Chen Li
  • Jianqiao Zhao
  • Dong Yu

In this paper, we propose a Chinese multi-turn topic-driven conversation dataset, NaturalConv, which allows the participants to chat about anything they want as long as any element from the topic is mentioned and the topic shift is smooth. Our corpus contains 19.9K conversations from six domains, and 400K utterances with an average turn number of 20.1. These conversations contain in-depth discussions on related topics or widely natural transitions between multiple topics. We believe either way is normal for human conversation. To facilitate the research on this corpus, we provide results of several benchmark models. Comparative results show that for this dataset, our current models are not able to provide significant improvement by introducing background knowledge/topic. Therefore, the proposed dataset should be a good benchmark for further research to evaluate the validity and naturalness of multi-turn conversation systems. Our dataset is available at https://ai.tencent.com/ailab/nlp/dialogue/#datasets.

AAAI Conference 2021 Short Paper

Scalable Partial Explainability in Neural Networks via Flexible Activation Functions (Student Abstract)

  • Schyler C. Sun
  • Chen Li
  • Zhuangkun Wei
  • Antonios Tsourdos
  • Weisi Guo

Current state-of-the-art neural network explanation methods (e.g., saliency maps, DeepLIFT, LIME) focus more on the direct relationship between NN outputs and inputs than on the NN structure and operations themselves, hence there still exists uncertainty over the exact role played by neurons. In this paper, we propose a novel neural network structure with a Kolmogorov-Arnold Superposition Theorem based topology and Gaussian Processes based flexible activation functions to achieve partial explainability of the neurons' inner reasoning. The model's feasibility is verified in a case study on binary classification of banknotes.

IJCAI Conference 2021 Conference Paper

TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

  • Chen Li
  • Xutan Peng
  • Hao Peng
  • Jianxin Li
  • Lihong Wang

Compared with traditional sequential learning models, graph-based neural networks exhibit excellent properties when encoding text, such as the capacity of capturing global and local information simultaneously. Especially in the semi-supervised scenario, propagating information along the edges can effectively alleviate the sparsity of labeled data. In this paper, beyond the existing architecture of heterogeneous word-document graphs, for the first time, we investigate how to construct lightweight non-heterogeneous graphs based on different linguistic information to better serve free text representation learning. Then, a novel semi-supervised framework for text classification that refines graph topology under theoretical guidance and shares information across different text graphs, namely Text-oriented Graph-based Transductive Learning (TextGTL), is proposed. TextGTL also performs attribute space interpolation based on dense substructures in graphs to predict low-entropy labels with high-quality feature nodes for data augmentation. To verify the effectiveness of TextGTL, we conduct extensive experiments on various benchmark datasets, observing significant performance gains over conventional heterogeneous graphs. In addition, we also design ablation studies to dive deep into the validity of components in TextGTL.

IJCAI Conference 2020 Conference Paper

An Interactive Multi-Task Learning Framework for Next POI Recommendation with Uncertain Check-ins

  • Lu Zhang
  • Zhu Sun
  • Jie Zhang
  • Yu Lei
  • Chen Li
  • Ziqing Wu
  • Horst Kloeden
  • Felix Klanner

Studies on next point-of-interest (POI) recommendation mainly seek to learn users' transition patterns with certain historical check-ins. However, in reality, users' movements are typically uncertain (i.e., fuzzy and incomplete), where most existing methods suffer from the transition pattern vanishing issue. To ease this issue, we propose a novel interactive multi-task learning (iMTL) framework to better exploit the interplay between activity and location preference. Specifically, iMTL introduces: (1) a temporal-aware activity encoder equipped with fuzzy characterization over uncertain check-ins to unveil the latent activity transition patterns; (2) a spatial-aware location preference encoder to capture the latent location transition patterns; and (3) a task-specific decoder to make use of the learned latent transition patterns and enhance both activity and location prediction tasks in an interactive manner. Extensive experiments on three real-world datasets show the superiority of iMTL.

AAAI Conference 2020 Conference Paper

Joint Parsing and Generation for Abstractive Summarization

  • Kaiqiang Song
  • Logan Lebanoff
  • Qipeng Guo
  • Xipeng Qiu
  • Xiangyang Xue
  • Chen Li
  • Dong Yu
  • Fei Liu

Sentences produced by abstractive summarization systems can be ungrammatical and fail to preserve the original meanings, despite being locally fluent. In this paper we propose to remedy this problem by jointly generating a sentence and its syntactic dependency parse while performing abstraction. If generating a word can introduce an erroneous relation to the summary, the behavior must be discouraged. The proposed method thus holds promise for producing grammatical sentences and encouraging the summary to stay true-to-original. The contributions of this work are twofold. First, we present a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder in a synchronized manner to generate a summary sentence and its syntactic parse. Second, we describe a novel human evaluation protocol to assess if, and to what extent, a summary remains true to its original meanings. We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines.

TIST Journal 2018 Journal Article

Automatic Extraction of Behavioral Patterns for Elderly Mobility and Daily Routine Analysis

  • Chen Li
  • William K. Cheung
  • Jiming Liu
  • Joseph K. Ng

The elderly living in smart homes can have their daily movement recorded and analyzed. As different elders can have their own living habits, a methodology that can automatically identify their daily activities and discover their daily routines will be useful for better elderly care and support. In this article, we focus on automatic detection of behavioral patterns from the trajectory data of an individual for activity identification as well as daily routine discovery. The underlying challenges lie in the need to consider longer-range dependency of the sensor triggering events and spatiotemporal variations of the behavioral patterns exhibited by humans. We propose to represent the trajectory data using a behavior-aware flow graph that is a probabilistic finite state automaton with its nodes and edges attributed with some local behavior-aware features. We identify the underlying subflows as the behavioral patterns using the kernel k-means algorithm. Given the identified activities, we propose a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily routines. For empirical evaluation, the proposed methodology has been compared with a number of existing methods based on both synthetic and publicly available real smart home datasets with promising results obtained. We also discuss how the proposed unsupervised methodology can be used to support exploratory behavior analysis for elderly care.

AAAI Conference 2018 Conference Paper

Training and Evaluating Improved Dependency-Based Word Embeddings

  • Chen Li
  • Jianxin Li
  • Yangqiu Song
  • Ziwei Lin

Word embedding has been widely used in many natural language processing tasks. In this paper, we focus on learning word embeddings through selective higher-order relationships in sentences to improve the embeddings to be less sensitive to local context and more accurate in capturing semantic compositionality. We present a novel multi-order dependency-based strategy to composite and represent the context under several essential constraints. In order to realize selective learning from the word contexts, we automatically assign the strengths of different dependencies between co-occurred words in the stochastic gradient descent process. We evaluate and analyze our proposed approach using several direct and indirect tasks for word embeddings. Experimental results demonstrate that our embeddings are competitive to or better than state-of-the-art methods and significantly outperform other methods in terms of context stability. The output weights and representations of dependencies obtained in our embedding model conform to most of the linguistic characteristics and are valuable for many downstream tasks.

IJCAI Conference 2015 Conference Paper

Joint POS Tagging and Text Normalization for Informal Text

  • Chen Li
  • Yang Liu

Text normalization and part-of-speech (POS) tagging for social media data have been investigated recently; however, prior work has treated them separately. In this paper, we propose a joint Viterbi decoding process to determine each token's POS tag and each non-standard token's correct form at the same time. In order to evaluate our approach, we create two new data sets with POS tag labels and non-standard tokens' correct forms. This is the first data set with such annotation. The experimental results demonstrate the effect of non-standard words on POS tagging, and also show that our proposed methods perform better than the state-of-the-art systems in both POS tagging and normalization.

IJCAI Conference 2011 Conference Paper

CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method

  • Yabin Zheng
  • Chen Li
  • Maosong Sun

Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in "shenme" (meaning "what" in English) but may type in "shenem" instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called "CHIME" that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.
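The "shenme"/"shenem" example above is the classic setting for edit-distance-based error tolerance. The following toy sketch illustrates that general idea only, via a dynamic-programming Levenshtein distance over a small Pinyin lexicon; it is not CHIME's actual algorithm, which the abstract says combines indexing techniques with language-specific features, and the helper names are invented for illustration.

```python
def edit_distance(a, b):
    # classic one-row dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,            # delete from a
                                     dp[j - 1] + 1,        # insert into a
                                     prev + (ca != cb))    # substitute/match
    return dp[-1]

def correct(typed, lexicon, max_dist=2):
    # return lexicon entries within max_dist edits of the typed string,
    # nearest candidates first
    scored = [(edit_distance(typed, w), w) for w in lexicon]
    return [w for d, w in sorted(scored) if d <= max_dist]
```

With a real input method the lexicon has millions of entries, so a linear scan like this is far too slow; that is precisely the efficiency problem (millisecond lookup) that the paper's indexing addresses.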

IROS Conference 2009 Conference Paper

Rapid and precise object detection based on color histograms and adaptive bandwidth mean shift

  • Xiaopeng Chen
  • Qiang Huang 0002
  • Peng Hu
  • Min Li 0015
  • Ye Tian 0024
  • Chen Li

Speed and precision are important for object detection algorithms. In this paper, a novel object detection algorithm based on color histogram and adaptive bandwidth mean shift is proposed. The algorithm is capable of detecting objects rapidly and precisely. It is composed of two stages: a rough detection stage and a precise detection stage. At the rough detection stage, histogram back projection and thresholding are applied to fast object identification and rough global localization. At the precise detection stage, the precise position, size and orientation are derived under the adaptive bandwidth mean shift framework. Experiments verify that the algorithm is able to detect the size, position and orientation of general objects rapidly and precisely.