Arrow Research search

Author name cluster

Lei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

83 papers
2 author rows

Possible papers (83)

AAAI Conference 2026 Conference Paper

Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection

  • Chenming Zhou
  • Jiaan Wang
  • Yu Li
  • Lei Li
  • Juan Cao
  • Sheng Tang

The rapid evolution of generative technologies necessitates reliable methods for detecting AI-generated images. A critical limitation of current detectors is their failure to generalize to images from unseen generative models, as they often overfit to source-specific semantic cues rather than learning universal generative artifacts. To overcome this, we introduce a simple yet remarkably effective pixel-level mapping pre-processing step to disrupt the pixel value distribution of images and break the fragile, non-essential semantic patterns that detectors commonly exploit as shortcuts. This forces the detector to focus on more fundamental and generalizable high-frequency traces inherent to the image generation process. Through comprehensive experiments on GAN and diffusion-based generators, we show that our approach significantly boosts the cross-generator performance of state-of-the-art detectors. Extensive analysis further verifies our hypothesis that the disruption of semantic cues is the key to generalization.

AAAI Conference 2026 Conference Paper

Multiple Human Motion Understanding

  • Lei Li
  • Sen Jia
  • Jenq-Neng Hwang

We introduce LLaMMo (Large Language and Multi-Person Motion Assistant), the first instruction-tuning multimodal framework tailored for multi-human motion analysis. LLaMMo incorporates a novel human-centric and social-temporal learner that models and fuses both intra-person dynamics and inter-person dependencies, yielding robust, context-aware representations of complex group behaviors while maintaining low computational overhead. To support LLaMMo, we construct LLaVerse, a large-scale dataset with fine-grained manual annotations covering diverse multi-person activities spanning daily social interaction and professional team sports. Built on top of LLaVerse, we also propose LLaMI-Bench, a dedicated benchmark for evaluating multi-human behavior understanding across motion and video modalities. Extensive experiments demonstrate that LLaMMo consistently outperforms baselines in understanding multi-person interactions under low-latency settings, with notable gains in both social and sport-specific contexts.

AAAI Conference 2026 Conference Paper

PUFM: Efficient Point Cloud Upsampling via Flow Matching

  • Zhi-Song Liu
  • Chenhang He
  • Yakun Ju
  • Lei Li

Diffusion models have recently been adopted for point cloud upsampling due to their effectiveness in solving ill-posed problems. However, existing upsampling methods often struggle with inefficiencies, as they generate dense point clouds by mapping Gaussian noise to data, overlooking the geometric information already present in sparse inputs. To address this, we propose PUFM, a novel point cloud upsampling method based on flow matching, which learns to directly transform sparse point clouds into their high-fidelity dense counterparts. Our approach first applies midpoint interpolation to densify the sparse input. Then, we construct a continuous interpolant between sparse and dense point clouds and train a neural network to estimate the velocity field for flow matching. Given the unordered nature of point clouds, we introduce a pre-alignment step based on Earth Mover's Distance (EMD) optimization to ensure coherent and meaningful interpolation between sparse and dense representations. This results in a more stable and efficient learning trajectory during flow matching. Experiments on synthetic benchmarks demonstrate that our method delivers superior upsampling quality with fewer sampling steps. Further experiments on ScanNet and KITTI also show that our approach generalizes well to real-world RGB-D and LiDAR point clouds, making it more practical for real-world applications.
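The two-step recipe in this abstract (midpoint densification, then a linear interpolant whose constant velocity is the flow-matching regression target) can be sketched as a toy. The function names and the linear interpolant form are illustrative assumptions, and the EMD pre-alignment is assumed already applied so the two clouds are point-wise aligned:

```python
import numpy as np

def midpoint_densify(points):
    # Densify a sparse cloud by inserting midpoints of consecutive points
    # (a toy stand-in for the paper's midpoint interpolation step).
    mids = (points[:-1] + points[1:]) / 2.0
    return np.concatenate([points, mids], axis=0)

def flow_interpolant(x_sparse_up, x_dense, t):
    # Linear interpolant between the aligned densified-sparse and dense clouds.
    # A network v_theta(x_t, t) would be regressed onto v_target.
    x_t = (1.0 - t) * x_sparse_up + t * x_dense
    v_target = x_dense - x_sparse_up
    return x_t, v_target
```

Note that at t = 0 the interpolant is the densified sparse input rather than Gaussian noise, which is why sampling can start close to the data and use fewer steps.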

AAAI Conference 2026 Conference Paper

TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment

  • Shicheng Li
  • Lei Li
  • Kun Ouyang
  • Shuhuai Ren
  • Yuanxin Liu
  • Yuanxing Zhang
  • Fuzheng Zhang
  • Lingpeng Kong

Video Large Language Models (Video LLMs) have achieved significant success by adopting the paradigm of large-scale pre-training followed by supervised fine-tuning (SFT). However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and over-reliance on the next-token prediction paradigm, which collectively result in the absence of temporal supervision. To address these limitations, we propose TEMPLE (TEMporal Preference Learning), a systematic framework that enhances temporal reasoning capabilities through Direct Preference Optimization (DPO). To address temporal information scarcity in data, we introduce an automated pipeline for systematically constructing temporality-intensive preference pairs comprising three steps: selecting temporally rich videos, designing video-specific perturbation strategies, and evaluating model responses on clean and perturbed inputs. Complementing this data pipeline, we provide additional supervision signals via preference learning and propose a novel Progressive Pre-SFT Alignment strategy featuring two key innovations: a curriculum learning strategy which progressively increases perturbation difficulty to maximize data efficiency; and applying preference optimization before instruction tuning to incentivize fundamental temporal alignment. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. Our findings highlight TEMPLE as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs.
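The DPO objective at the core of such preference learning scores each preference pair by the policy-vs-reference log-probability margin. A minimal sketch of the standard DPO loss on one pair; the function name and scalar interface are assumptions for illustration, not the paper's code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward of each response = beta * (policy logp - reference logp);
    # the loss is -log(sigmoid) of the chosen-minus-rejected reward margin.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy assigns no extra preference to either response the margin is zero and the loss is log 2; widening the margin toward the chosen (here, temporally faithful) response drives the loss down.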

NeurIPS Conference 2025 Conference Paper

A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks

  • Mucong Ding
  • Bang An
  • Tahseen Rabbani
  • Chenghao Deng
  • Anirudh Satheesh
  • Souradip Chakraborty
  • Mehrdad Saberi
  • Yuxin Wen

AI-generated images have become pervasive, raising critical concerns around content authenticity, intellectual property, and the spread of misinformation. Invisible watermarks offer a promising solution for identifying AI-generated images, preserving content provenance without degrading visual quality. However, their real-world robustness remains uncertain due to the lack of standardized evaluation protocols and large-scale stress testing. To bridge this gap, we organized “Erasing the Invisible,” a NeurIPS 2024 competition and newly established benchmark designed to systematically stress test the resilience of watermarking techniques. The competition introduced two attack tracks—Black-box and Beige-box—that simulate practical scenarios with varying levels of attacker knowledge of watermarks, providing a comprehensive assessment of watermark robustness. The competition attracted significant global participation, with 2,722 submissions from 298 teams. Through a rigorous evaluation pipeline featuring real-time feedback and human-verified final rankings, participants developed and demonstrated new attack strategies that revealed critical vulnerabilities in state-of-the-art watermarking methods. On average, the top-5 teams in both tracks could remove watermarks from $\geq$ 89% of the images while preserving high visual quality, setting strong baselines for future research on watermark attacks and defenses. To support continued progress in this field, we summarize the insights and lessons learned from this competition in this paper, and release the benchmark dataset, evaluation toolkit, and competition results. “Erasing the Invisible” establishes a valuable open resource for advancing more robust watermarking techniques and strengthening content provenance in the era of generative AI.

ECAI Conference 2025 Conference Paper

Adversarial Pretrained Language Model for Multivariate Time Series Anomaly Detection

  • Jianhuan Mao
  • Mengxiao Zhu 0004
  • Lei Li
  • Haogang Zhu

Multivariate time series anomaly detection plays a vital role in safety-critical domains such as industrial systems, finance, and cybersecurity. However, the scarcity of labeled anomalies poses significant challenges for learning robust normal patterns, often blurring the boundary between normal and abnormal behaviors. To address this challenge, we propose ADLM, an unsupervised adversarial framework that integrates a Language-Model-based Predictor for Time Series (LMPTS) with an autoencoder. To capture normal patterns under limited data, LMPTS repurposes a decoder-only pretrained language model as an autoregressive forecaster, leveraging its strong generative prior to capture temporal dependencies. To model complex cross-sensor dependencies, we incorporate graph structure learning into the framework. Furthermore, we introduce an adversarial training strategy to sharpen the model’s normal-pattern representations and amplify deviations indicative of anomalies. Experiments on six public datasets show that ADLM consistently outperforms state-of-the-art baselines and remains robust under severe data scarcity. By coupling decoder-only language models with an adversarial objective, ADLM offers a label-efficient, structure-aware solution to multivariate time series anomaly detection.

AAAI Conference 2025 Conference Paper

An Efficient and Accurate Dynamic Sparse Training Framework Based on Parameter-Freezing

  • Lei Li
  • Haochen Yang
  • Jiacheng Guo
  • Hongkai Yu
  • Minghai Qin
  • Tianyun Zhang

Federated learning is a decentralized machine learning approach that consists of servers and clients. It protects data privacy during model training by keeping the training data locally in each client. However, the requirement for the server and clients to frequently synchronize the parameters of the model brings a heavy burden to the communication links, especially as model sizes have grown drastically in recent years. Several methods have been proposed to compress the model by sparsification to reduce the communication overhead, albeit with significant accuracy degradation. In this work, we propose methods to better trade off model accuracy against training efficiency in federated learning. Our first proposed method is a novel sparse mask readjustment rule on the server, and the second is a parameter-freezing method during training on the clients. Experimental results show that model accuracy improves significantly when our proposed methods are combined. For example, compared with the previous state-of-the-art methods at the same total communication cost and computation FLOPs, our methods increase accuracy on average by 4% and 6% on the CIFAR-10 and CIFAR-100 datasets with ResNet-18, respectively. On the other hand, when targeting the same accuracy, the proposed method can reduce the communication cost by 4-8 times for different datasets with different sparsity levels.
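The parameter-freezing idea — clients skip updates, and hence synchronization, for a frozen subset of weights — can be illustrated with a masked gradient step. This is a minimal sketch under assumed names; the paper's actual freezing criterion and sparse-mask readjustment rule are not reproduced here:

```python
import numpy as np

def frozen_sgd_step(weights, grads, frozen_mask, lr=0.1):
    # Apply the gradient step only to non-frozen entries; frozen entries stay
    # fixed, so they need not be re-transmitted between client and server.
    return weights - lr * grads * ~frozen_mask
```

Communication savings follow directly: only the `~frozen_mask` positions change between synchronization rounds.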

IROS Conference 2025 Conference Paper

Awakening Facial Emotional Expressions in Human-Robot

  • Yongtong Zhu
  • Lei Li
  • Iggy Qian
  • Wenbin Zhou
  • Ye Yuan
  • Qingdu Li
  • Na Liu 0007
  • Jianwei Zhang 0001

The facial expression generation capability of humanoid social robots is critical for achieving natural and human-like interactions, playing a vital role in enhancing the fluidity of human-robot interactions and the accuracy of emotional expression. Currently, facial expression generation in humanoid social robots still relies on pre-programmed behavioral patterns, which are manually coded at high human and time costs. To enable humanoid robots to autonomously acquire generalized expressive capabilities, they need to develop the ability to learn human-like expressions through self-training. To address this challenge, we have designed a highly biomimetic robotic face with physical-electronic animated facial units and developed an end-to-end learning framework based on KAN (Kolmogorov-Arnold Network) and attention mechanisms. Unlike previous humanoid social robots, we have also meticulously designed an automated data collection system based on expert strategies of facial motion primitives to construct the dataset. Notably, to the best of our knowledge, this is the first open-source facial dataset for humanoid social robots. Comprehensive evaluations indicate that our approach achieves accurate and diverse facial mimicry across different test subjects.

JBHI Journal 2025 Journal Article

FRSynergy: A Feature Refinement Network for Synergistic Drug Combination Prediction

  • Lei Li
  • Haitao Li
  • Chunhou Zheng
  • Yansen Su

Synergistic drug combinations have shown promising results in treating cancer cell lines by enhancing therapeutic efficacy and minimizing adverse reactions. The effects of a drug vary across cell lines, and cell lines respond differently to various drugs during treatment. Recently, many AI-based techniques have been developed for predicting synergistic drug combinations. However, existing computational models have not addressed this phenomenon, neglecting the refinement of features for the same drug and cell line in different scenarios. In this work, we propose a feature refinement deep learning framework, termed FRSynergy, to identify synergistic drug combinations. It can guide the refinement of drug and cell line features in different scenarios by capturing relationships among diverse drug-drug-cell line triplet features and learning feature contextual information. A heterogeneous graph attention network is employed to acquire topology-based original features for drugs and cell lines from sampled sub-graphs. Then, the feature refinement network is designed by combining an attention mechanism with contextual information, which can learn context-aware feature representations for each drug and cell line feature in diverse drug-drug-cell line triplet contexts. Extensive experiments affirm the strong performance of FRSynergy in predicting synergistic drug combinations and, more importantly, demonstrate the effectiveness of the feature refinement network in synergistic drug combination prediction.

IJCAI Conference 2025 Conference Paper

In-Context Meta LoRA Generation

  • Yihua Shao
  • Minxi Yan
  • Yang Liu
  • Siyu Chen
  • Wenjie Chen
  • Xinwei Long
  • Ziyang Yan
  • Lei Li

Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with LLMs to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, to capture the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods and is useful for implementing task-specific enhancements of LoRA parameters. At the same time, our method occupies only 283 MB, about 1% of the storage required by the original LoRA models. The code is available at https://github.com/YihuaJerry/ICM-LoRA.

AAAI Conference 2025 Conference Paper

Position-Aware Guided Point Cloud Completion with CLIP Model

  • Feng Zhou
  • Qi Zhang
  • Ju Dai
  • Lei Li
  • Qing Fan
  • Junliang Xing

Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish Point-Text-Image triplet corpora, PCI-TI and MVP-TI, based on existing unimodal point cloud completion datasets and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.

JBHI Journal 2025 Journal Article

Towards Clinical Application of Enhanced Timed Up and Go with Markerless Motion Capture and Machine Learning for Balance and Gait Assessment

  • Longbin Zhang
  • Ananda Sidarta
  • Tsung-Lin Wu
  • Prayook Jatesiktat
  • Hao Wang
  • Lei Li
  • Patrick Wai-Hang Kwong
  • Aoyang Long

Balance and gait impairments play a key role in falls among the elderly. Traditional clinical scales used to assess fall risk, such as the Berg Balance Scale (BBS), are often subjective and time-consuming, and do not assess gait performance. Shorter assessments such as the Timed Up and Go (TUG) test are available, but most clinicians only look at the completion time. This study aimed to develop a fast, low-cost, and automated framework for balance function assessment and comprehensive gait analysis by enhancing the traditional TUG test with a markerless motion capture (MoCap) system and machine learning models. In total, we included TUG datasets of 70 participants with varying degrees of fall risk based on their BBS scores. We segmented TUG trials into five phases automatically using data from the MoCap system and extracted features from the phases. These features were then analyzed to identify those that significantly discriminate between high and low fall risk groups. Using the identified features, various machine learning models were tested to estimate the BBS scores. The markers obtained from the markerless MoCap system were used for detailed gait analysis, and lower limb kinematics were compared between the markerless and marker-based methods. Our findings indicate that individuals at high risk of falling had longer completion times, lower performance velocities, and smaller ranges of motion in lower-limb joints. Among the tested machine learning models, random forest demonstrated the best performance in predicting BBS scores (RMSE: 0.98, $R^{2}$: 0.94). Additionally, our markerless MoCap system showed comparable accuracy to state-of-the-art systems, eliminating the need to attach markers or sensors. The findings could help develop a quick and objective tool for balance and gait assessment in older adults, providing quantitative data to improve screening and intervention planning.

JBHI Journal 2025 Journal Article

ZSG-Net: A Zero-Shot Super-Resolution Guided Network for Ultrasound Image Segmentation and Classification

  • Xingtao Lin
  • Xiahai Zhuang
  • Lin Pan
  • Mingjing Yang
  • Liqin Huang
  • Shun Chen
  • Lei Li

Automated ultrasound (US) image analysis is hindered by challenges stemming from low resolution, noise, and non-uniform grayscale distribution, which compromise image quality. While many existing studies address these issues using super-resolution (SR) techniques, they often focus exclusively on SR without considering downstream tasks or tailoring to the unique characteristics of US images. In this work, we propose ZSG-Net, a zero-shot super-resolution-guided network, designed to bridge the gap between US image quality enhancement and its benefits in segmentation and classification. First, we introduce a zero-shot self-supervised cycle generative adversarial network (ZSCycle-GAN), tailored to the unique characteristics of US images, to perform SR while preserving critical structural details. Unlike conventional SR methods that focus solely on image enhancement, ZSCycle-GAN is designed to optimize downstream tasks. Second, we adopt a zero-shot self-supervised learning strategy, eliminating the reliance on labeled data and addressing the scarcity of annotated medical imaging datasets. Third, we incorporate a random image degradation (RID) strategy to expand the degradation space for clinical US images, enabling robust learning of diverse quality variations. Extensive experiments on three US image datasets validate the effectiveness of the proposed model. Results demonstrate superior performance in segmentation and classification tasks compared to existing approaches, underscoring the potential of our method to improve US image analysis in clinical settings.

AAAI Conference 2024 Conference Paper

Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations

  • Lei Li
  • Jianxun Lian
  • Xiao Zhou
  • Xing Xie

Retrieval models aim at selecting a small set of item candidates which match the preference of a given user. They play a vital role in large-scale recommender systems since subsequent models such as rankers highly depend on the quality of item candidates. However, most existing retrieval models employ a single-round inference paradigm, which may not adequately capture the dynamic nature of user preferences and may get stuck in one area of the item space. In this paper, we propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems that iteratively refines user representations to better capture potential candidates in the full item space. Ada-Retrieval comprises two key modules: the item representation adapter and the user representation adapter, designed to inject context information into items' and users' representations. The framework maintains a model-agnostic design, allowing seamless integration with various backbone models such as RNNs or Transformers. We perform experiments on three widely used public datasets, incorporating five powerful sequential recommenders as backbone models. Our results demonstrate that Ada-Retrieval significantly enhances the performance of various base models, with consistent improvements observed across different datasets. Our code and data are publicly available at: https://github.com/ll0ruc/Ada-Retrieval.
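The multi-round paradigm can be sketched as a loop that retrieves a batch, then adapts the user representation using context from what was just retrieved. The update rule below (pushing the user vector away from the retrieved centroid) is a deliberately simple stand-in for the paper's learned user/item representation adapters, and all names are assumptions:

```python
import numpy as np

def multi_round_retrieval(user_vec, item_vecs, rounds=3, k=2, lr=0.5):
    # Each round: score unseen items, take top-k, then adapt the user
    # representation so the next round explores a different region.
    selected = []
    u = user_vec.copy()
    for _ in range(rounds):
        scores = item_vecs @ u
        scores[selected] = -np.inf          # mask items already retrieved
        top = np.argsort(-scores)[:k]
        selected.extend(top.tolist())
        # Toy adapter: move away from the centroid of this round's results.
        u = u - lr * item_vecs[top].mean(axis=0)
        u = u / (np.linalg.norm(u) + 1e-8)
    return selected
```

A single-round retriever would return the top rounds*k items of the first scoring pass; the loop instead re-scores with an updated representation each round.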

NeurIPS Conference 2024 Conference Paper

Invisible Image Watermarks Are Provably Removable Using Generative AI

  • Xuandong Zhao
  • Kexun Zhang
  • Zihao Su
  • Saastha Vasan
  • Ilya Grishchenko
  • Christopher Kruegel
  • Giovanni Vigna
  • Yu-Xiang Wang

Invisible watermarks safeguard images' copyrights by embedding hidden messages only detectable by owners. They also prevent people from misusing images, especially those generated by AI models. We propose a family of regeneration attacks to remove these invisible watermarks. The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image. This approach is flexible and can be instantiated with many existing image-denoising algorithms and pre-trained generative models such as diffusion models. Through formal proofs and extensive empirical evaluations, we demonstrate that pixel-level invisible watermarks are vulnerable to this regeneration attack. Our results reveal that, across four different pixel-level watermarking schemes, the proposed method consistently achieves superior performance compared to existing attack techniques, with lower detection rates and higher image quality. However, watermarks that keep the image semantically similar can be an alternative defense against our attacks. Our findings underscore the need for a shift in research/industry emphasis from invisible watermarks to semantic-preserving watermarks. Code is available at https://github.com/XuandongZhao/WatermarkAttacker
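The attack's two steps (noise injection, then reconstruction) can be sketched against a toy additive watermark. The watermark scheme, the mean-blur "reconstruction", and all names below are illustrative assumptions — the paper instantiates the reconstruction step with image denoisers or diffusion models:

```python
import numpy as np

def embed_watermark(img, key, strength=2.0):
    # Toy pixel-level watermark: add a keyed pseudo-random +/-1 pattern.
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img + strength * pattern, 0, 255), pattern

def detect(img, pattern, threshold=0.5):
    # Correlate the (centered) image with the known pattern.
    return np.mean((img - img.mean()) * pattern) > threshold

def regeneration_attack(img, noise_std=20.0, kernel=5, seed=0):
    # Step 1: random noise overwhelms the low-amplitude watermark signal.
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0, noise_std, img.shape)
    # Step 2: "reconstruct" with a simple moving-average denoiser.
    pad = kernel // 2
    padded = np.pad(noisy, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(kernel):
        for dx in range(kernel):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return np.clip(out / kernel**2, 0, 255)
```

The denoising step averages the pseudo-random pattern toward zero, so the correlation detector no longer fires even though the image content survives.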

JBHI Journal 2024 Journal Article

MDNNSyn: A Multi-Modal Deep Learning Framework for Drug Synergy Prediction

  • Lei Li
  • Haitao Li
  • Tseren-Onolt Ishdorj
  • Chunhou Zheng
  • Yansen Su

Synergistic drug combination prediction tasks based on computational models have been widely studied and applied in the cancer field. However, most models only consider the interactions between drug pairs and specific cell lines, without taking into account the multiple biological relationships of drug-drug and cell line-cell line that also largely affect synergistic mechanisms. To this end, we propose a multi-modal deep learning framework, termed MDNNSyn, which adequately applies multi-source information and trains multi-modal features to infer potential synergistic drug combinations. MDNNSyn extracts topology modality features by implementing a multi-layer hypergraph neural network on the drug synergy hypergraph and constructs semantic modality features through a similarity strategy. A multi-modal fusion network layer with a gated neural network is then employed for synergy score prediction. MDNNSyn is compared to five classic and state-of-the-art prediction methods on the DrugCombDB and Oncology-Screen datasets. The model achieves area under the curve (AUC) scores of 0.8682 and 0.9013 on the two datasets, an improvement of 3.70% and 2.71% over the second-best model. A case study indicates that MDNNSyn is capable of detecting potential synergistic drug combinations.

NeurIPS Conference 2024 Conference Paper

MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages

  • Zixian Huang
  • Wenhao Zhu
  • Gong Cheng
  • Lei Li
  • Fei Yuan

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translation text, to circumvent the challenge of LLMs understanding non-English. Unfortunately, these methods often underutilize the built-in skilled reasoning and useful language understanding capabilities of LLMs. To better utilize both, we propose a new method, namely MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, a two-step training scheme is introduced to first embed the external capabilities into LLMs and then train the collaborative utilization of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, the average accuracy improves by 6.7 and 8.0 across all languages and low-resource languages on the MGSM dataset, respectively.

JBHI Journal 2024 Journal Article

NeighborNet: Learning Intra- and Inter-Image Pixel Neighbor Representation for Breast Lesion Segmentation

  • Weiwei Cao
  • Jianfeng Guo
  • Xiaohui You
  • Yuxin Liu
  • Lei Li
  • Wenju Cui
  • Yuzhu Cao
  • Xinjian Chen

Breast lesion segmentation from ultrasound images is essential in computer-aided breast cancer diagnosis. To alleviate the problems of blurry lesion boundaries and irregular morphologies, common practices combine CNNs and attention to integrate global and local information. However, previous methods use two independent modules to extract global and local features separately; such feature-wise inflexible integration ignores the semantic gap between them, resulting in representation redundancy or insufficiency and undesirable restrictions in clinical practice. Moreover, medical images are highly similar to each other due to the imaging methods and human tissues, but the global information captured by transformer-based methods in the medical domain is limited to individual images; the semantic relations and common knowledge across images are largely ignored. To alleviate the above problems, from the neighbor view, this paper develops a pixel neighbor representation learning method (NeighborNet) to flexibly integrate global and local context within and across images for lesion morphology and boundary modeling. Concretely, we design two neighbor layers to investigate two properties (i.e., number and distribution) of neighbors. The neighbor number for each pixel is not fixed but determined by itself. The neighbor distribution is extended from one image to all images in the dataset. With these two properties, for each pixel at each feature level, the proposed NeighborNet can evolve into a transformer or degenerate into a CNN for adaptive context representation learning to cope with irregular lesion morphologies and blurry boundaries. The state-of-the-art performance on three ultrasound datasets proves the effectiveness of the proposed NeighborNet.

ICRA Conference 2024 Conference Paper

Object-centric Cross-modal Feature Distillation for Event-based Object Detection

  • Lei Li
  • Alexander Liniger
  • Mario Millhäusler
  • Vagia Tsiminaki
  • Yuanyou Li
  • Dengxin Dai

Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we propose a cross-modality feature distillation method that can focus on regions where the knowledge distillation works best to shrink the detection performance gap between these two modalities. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as a teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.

NeurIPS Conference 2024 Conference Paper

Scaling Law for Time Series Forecasting

  • Jingzhe Shi
  • Qinwei Ma
  • Huan Ma
  • Lei Li

Scaling laws that reward large datasets, complex models, and enhanced data granularity have been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods in this setting: while more training data improves performance, more capable models do not always outperform less capable ones, and a longer input horizon may hurt performance for some models. We propose a theory of scaling laws for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of the scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings will inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting in future works.

AAAI Conference 2024 Conference Paper

Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages

  • Sameer Jain
  • Sedrick Scott Keh
  • Shova Chhetri
  • Karun Dewan
  • Pablo Izquierdo
  • Johanna Prussmann
  • Pooja Shrestha
  • César Suárez

Environmental conservation organizations routinely monitor news content on conservation in protected areas to maintain situational awareness of developments that can have an environmental impact. Existing automated media monitoring systems require large amounts of data labeled by domain experts, which is only feasible at scale for high-resource languages like English. However, such tools are most needed in the global south where the news of interest is mainly in local low-resource languages, and far fewer experts are available to annotate datasets on a sustainable basis. In this paper, we propose NewsSerow, a method to automatically recognize environmental conservation content in low-resource languages. NewsSerow is a pipeline of summarization, in-context few-shot classification, and self-reflection using large language models (LLMs). Using at most 10 demonstration example news articles in Nepali, NewsSerow significantly outperforms other few-shot methods and can achieve comparable performance with models fully fine-tuned using thousands of examples. With NewsSerow, Organization X has been able to deploy the media monitoring tool in Nepal, significantly reducing their operational burden, and ensuring that AI tools for conservation actually reach the communities that need them the most. NewsSerow has also been deployed for countries with other languages like Colombia.

NeurIPS Conference 2023 Conference Paper

ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers

  • Kexun Zhang
  • Danqing Wang
  • Jingtao Xia
  • William Yang Wang
  • Lei Li

Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems that require not only implementation but also identification of the suitable algorithm. Moreover, LLM-generated programs lack guaranteed correctness and require human verification. To address these challenges, we propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness. ALGO first generates a reference oracle by prompting an LLM to exhaustively enumerate all the combinations of relevant variables. This oracle is then utilized to guide an arbitrary search strategy in exploring the algorithm space and to verify the synthesized algorithms. Our study shows that the LLM-generated oracles are correct for 88% of the cases. With the oracles as verifiers, ALGO can be integrated with any existing code generation model in a model-agnostic manner to enhance its performance. Experiments show that when equipped with ALGO, we achieve an 8× better one-submission pass rate over the Codex model and a 2.6× better one-submission pass rate over CodeT, the current state-of-the-art model on CodeContests. We can also get a 1.3× better pass rate over the ChatGPT Code Interpreter on unseen problems. The problem set we used for testing, the prompts we used, the verifier and solution programs, and the test cases generated by ALGO are available at https://github.com/zkx06111/ALGO.
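
The oracle-as-verifier idea from the abstract can be illustrated with a self-contained toy sketch (ours, not the ALGO implementation): a brute-force enumerator plays the role of the reference oracle, and a faster candidate algorithm is accepted only if it agrees with the oracle on small random inputs.

```python
import random

def oracle_max_subarray(xs):
    """Reference oracle: exhaustively enumerate all contiguous subarrays (O(n^2))."""
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))

def fast_max_subarray(xs):
    """Candidate algorithm under test: Kadane's algorithm (O(n))."""
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def verify(candidate, oracle, trials=200, seed=0):
    """Accept the candidate only if it matches the oracle on random small inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if candidate(xs) != oracle(xs):
            return False
    return True

print(verify(fast_max_subarray, oracle_max_subarray))  # True: candidate matches oracle
```

The slow oracle is easy to trust precisely because it enumerates everything; the search over candidate algorithms can then be automated against it.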

AAAI Conference 2023 Conference Paper

Converge to the Truth: Factual Error Correction via Iterative Constrained Editing

  • Jiangjie Chen
  • Rui Xu
  • Wenxuan Zeng
  • Changzhi Sun
  • Lei Li
  • Yanghua Xiao

Given a possibly false claim sentence, how can we automatically correct it with minimal editing? Existing methods either require a large number of pairs of false and corrected claims for supervised training or do not handle well errors spanning over multiple tokens within an utterance. In this paper, we propose VENCE, a novel method for factual error correction (FEC) with minimal edits. VENCE formulates the FEC problem as iterative sampling editing actions with respect to a target density function. We carefully design the target function with predicted truthfulness scores from an offline trained fact verification model. VENCE samples the most probable editing positions based on back-calculated gradients of the truthfulness score concerning input tokens and the editing actions using a distantly-supervised language model (T5). Experiments on a public dataset show that VENCE improves the well-adopted SARI metric by 5.3 (or a relative improvement of 11.8%) over the previous best distantly-supervised methods.

JBHI Journal 2023 Journal Article

Deep Learning Segmentation of the Right Ventricle in Cardiac MRI: The M&Ms Challenge

  • Carlos Martín-Isla
  • Víctor M. Campello
  • Cristian Izquierdo
  • Kaisar Kushibar
  • Carla Sendra-Balcells
  • Polyxeni Gkontra
  • Alireza Sojoudi
  • Mitchell J. Fulton

In recent years, several deep learning models have been proposed to accurately quantify and diagnose cardiac pathologies. These automated tools heavily rely on the accurate segmentation of cardiac structures in MRI images. However, segmentation of the right ventricle is challenging due to its highly complex shape and ill-defined borders. Hence, there is a need for new methods to handle such structures' geometrical and textural complexities, notably in the presence of pathologies such as Dilated Right Ventricle, Tricuspid Regurgitation, Arrhythmogenesis, Tetralogy of Fallot, and Inter-atrial Communication. The last MICCAI challenge on right ventricle segmentation was held in 2012 and included only 48 cases from a single clinical center. As part of the 12th Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2021), the M&Ms-2 challenge was organized to promote the interest of the research community around right ventricle segmentation in multi-disease, multi-view, and multi-center cardiac MRI. Three hundred sixty CMR cases, including short-axis and long-axis 4-chamber views, were collected from three Spanish hospitals using nine different scanners from three different vendors, and included a diverse set of right and left ventricle pathologies. The solutions provided by the participants show that nnU-Net achieved the best results overall. However, multi-view approaches were able to capture additional information, highlighting the need to integrate multiple cardiac diseases, views, scanners, and acquisition protocols to produce reliable automatic cardiac segmentation algorithms.

NeurIPS Conference 2023 Conference Paper

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation

  • Yuanxin Liu
  • Lei Li
  • Shuhuai Ren
  • Rundong Gao
  • Shicheng Li
  • Sishuo Chen
  • Xu Sun
  • Lu Hou

Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly shown by the qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two critical problems. Firstly, existing studies lack fine-grained evaluation of T2V models on different categories of text prompts. Although some benchmarks have categorized the prompts, their categorization either only focuses on a single aspect or fails to consider the temporal information in video generation. Secondly, it is unclear whether the automatic evaluation metrics are consistent with human standards. To address these problems, we propose FETV, a benchmark for Fine-grained Evaluation of Text-to-Video generation. FETV is multi-aspect, categorizing the prompts based on three orthogonal aspects: the major content, the attributes to control and the prompt complexity. FETV is also temporal-aware, which introduces several temporal categories tailored for video generation. Based on FETV, we conduct comprehensive manual evaluations of four representative T2V models, revealing their pros and cons on different categories of prompts from different aspects. We also extend FETV as a testbed to evaluate the reliability of automatic T2V metrics. The multi-aspect categorization of FETV enables fine-grained analysis of the metrics' reliability in different scenarios. We find that existing automatic metrics (e.g., CLIPScore and FVD) correlate poorly with human evaluation. To address this problem, we explore several solutions to improve CLIPScore and FVD, and develop two automatic metrics that exhibit significantly higher correlation with humans than existing metrics. Benchmark page: https://github.com/llyx97/FETV.

AAAI Conference 2023 Short Paper

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

  • Lei Li
  • Xiang Chen
  • Shuofei Qiao
  • Feiyu Xiong
  • Huajun Chen
  • Ningyu Zhang

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we take an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Codes are available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

TIST Journal 2023 Journal Article

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

  • Lei Li
  • Yongfeng Zhang
  • Li Chen

Explaining to users why some items are recommended is critical, as it can help users to make better decisions, increase their satisfaction, and gain their trust in recommender systems (RS). However, existing explainable RS usually consider explanation as a side output of the recommendation model, which has two problems: (1) It is difficult to evaluate the produced explanations, because they are usually model-dependent, and (2) as a result, how the explanations impact the recommendation performance is less investigated. In this article, explaining recommendations is formulated as a ranking task and learned from data, similarly to item ranking for recommendation. This makes it possible for standard evaluation of explanations via ranking metrics (e.g., Normalized Discounted Cumulative Gain). Furthermore, this article extends traditional item ranking to an item–explanation joint-ranking formalization to study if purposely selecting explanations could reach certain learning goals, e.g., improving recommendation performance. A great challenge, however, is that the sparsity issue in the user-item-explanation data would inevitably be more severe than that in traditional user–item interaction data, since not every user–item pair can be associated with all explanations. To mitigate this issue, this article proposes to perform two sets of matrix factorization by considering the ternary relationship as two groups of binary relationships. Experiments on three large datasets verify the solution’s effectiveness on both explanation ranking and item recommendation.
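
For readers unfamiliar with the ranking metric named in the abstract, here is a minimal sketch of NDCG@k using the standard formula (our illustration, not code from the article):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A ranking that places the most relevant explanation first scores 1.0;
# reversing the order lowers the score toward 0.
print(ndcg_at_k([3, 2, 1, 0], k=4))  # 1.0
print(round(ndcg_at_k([0, 1, 2, 3], k=4), 3))
```

Because NDCG is model-independent, it lets explanation rankings from different models be compared on equal footing, which is the evaluation problem the article targets.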

IJCAI Conference 2023 Conference Paper

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

  • Xiang Chen
  • Lei Li
  • Shuofei Qiao
  • Ningyu Zhang
  • Chuanqi Tan
  • Yong Jiang
  • Fei Huang
  • Huajun Chen

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

NeurIPS Conference 2023 Conference Paper

Statistical Knowledge Assessment for Large Language Models

  • Qingxiu Dong
  • Jingjing Xu
  • Lingpeng Kong
  • Zhifang Sui
  • Lei Li

Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers? Existing LLMs may generate distinct responses for different prompts. In this paper, we study the problem of quantifying knowledge contained in an LLM regarding a given set of facts. We propose KaRR, a statistical approach to assess factual knowledge for LLMs. The main idea is to estimate the ratio of the LLM generating text corresponding to the answer entity given diverse prompts of the subject and the querying relation, versus generating it by random chance. Our assessment suite contains a comprehensive set of 994,123 entities and 600 relations, with 1,395,905 text aliases. We use our method to evaluate 20 LLMs of various sizes, including LLaMA, Alpaca, OPT, etc. Experiments show that our results have a strong correlation (0.43 Kendall's τ) with the results of human assessment on LLMs. Our results reveal that the knowledge in LLMs with the same backbone architecture adheres to the scaling law, while tuning on instruction-following data sometimes compromises the model's capability to generate factually correct text reliably.

IJCAI Conference 2022 Conference Paper

Cross-modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation

  • Lei Li
  • Kai Fan
  • Chun Yuan

Since single-modal controllable manipulation typically requires supervision of information from other modalities or cooperation with complex software and experts, this paper addresses the problem of cross-modal adaptive manipulation (CAM). The novel task performs cross-modal semantic alignment from mutual supervision and implements bidirectional exchange of attributes, relations, or objects in parallel, benefiting both modalities while significantly reducing manual effort. We introduce a robust solution for CAM, which includes two essential modules, namely Heterogeneous Representation Learning (HRL) and Cross-modal Relation Reasoning (CRR). The former is designed to perform representation learning for cross-modal semantic alignment on heterogeneous graph nodes. The latter is adopted to identify and exchange the focused attributes, relations, or objects in both modalities. Our method produces pleasing cross-modal outputs on CUB and Visual Genome.

JBHI Journal 2022 Journal Article

Cross-Modality Multi-Atlas Segmentation via Deep Registration and Label Fusion

  • Wangbin Ding
  • Lei Li
  • Xiahai Zhuang
  • Liqin Huang

Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target image; and the transformed atlas labels can be combined to generate target segmentation via label fusion schemes. Many conventional MAS methods employed the atlases from the same modality as the target image. However, the number of atlases with the same modality may be limited or even missing in many clinical applications. Besides, conventional MAS methods suffer from the computational burden of registration or label fusion procedures. In this work, we design a novel cross-modality MAS framework, which uses available atlases from a certain modality to segment a target image from another modality. To boost the computational efficiency of the framework, both the image registration and label fusion are achieved by well-designed deep neural networks. For the atlas-to-target image registration, we propose a bi-directional registration network (BiRegNet), which can efficiently align images from different modalities. For the label fusion, we design a similarity estimation network (SimNet), which estimates the fusion weight of each atlas by measuring its similarity to the target image. SimNet can learn multi-scale information for similarity estimation to improve the performance of label fusion. The proposed framework was evaluated by the left ventricle and liver segmentation tasks on the MM-WHS and CHAOS datasets, respectively. Results have shown that the framework is effective for cross-modality MAS in both registration and label fusion. The code is available at https://github.com/NanYoMy/cmmas.
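
The weighted label fusion step can be illustrated with a toy numpy sketch (illustrative only: SimNet itself is a learned network, and `weighted_label_fusion` below is our hypothetical helper, not the authors' code). Each warped atlas casts a vote for each pixel, scaled by its per-atlas similarity weight:

```python
import numpy as np

def weighted_label_fusion(atlas_labels, weights, num_classes):
    """Fuse per-atlas label maps by similarity-weighted voting.

    atlas_labels: (n_atlases, H, W) integer label maps already warped to target space.
    weights:      (n_atlases,) per-atlas similarity weights (e.g. from a similarity net).
    """
    votes = np.zeros((num_classes,) + atlas_labels.shape[1:])
    for labels, w in zip(atlas_labels, weights):
        for c in range(num_classes):
            votes[c] += w * (labels == c)
    return votes.argmax(axis=0)  # per-pixel label with the highest weighted vote

# Three 2x2 atlas label maps; the first atlas is most similar to the target.
labels = np.array([[[0, 1], [1, 1]],
                   [[0, 0], [1, 0]],
                   [[1, 1], [1, 1]]])
fused = weighted_label_fusion(labels, weights=np.array([0.5, 0.3, 0.2]), num_classes=2)
print(fused)  # [[0 1] [1 1]]: disagreements resolved toward higher-weighted atlases
```

In the actual framework the weights are predicted per atlas by SimNet rather than fixed by hand.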

NeurIPS Conference 2022 Conference Paper

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

  • Xiang Chen
  • Lei Li
  • Ningyu Zhang
  • Xiaozhuan Liang
  • Shumin Deng
  • Chuanqi Tan
  • Fei Huang
  • Luo Si

Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while they still follow a parametric-based learning paradigm; the oblivion and rote memorization problems in learning may encounter unstable generalization issues. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals RetroPrompt can reduce the reliance of language models on memorization; thus, improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.

AAAI Conference 2022 Conference Paper

Deepfake Network Architecture Attribution

  • Tianyun Yang
  • Ziyao Huang
  • Juan Cao
  • Lei Li
  • Xirong Li

With the rapid progress of generation technology, it has become necessary to attribute the origin of fake images. Existing works on fake image attribution perform multi-class classification on several Generative Adversarial Network (GAN) models and obtain high accuracies. While encouraging, these works are restricted to model-level attribution, only capable of handling images generated by seen models with a specific seed, loss and dataset, which is limited in real-world scenarios when fake images may be generated by privately trained models. This motivates us to ask whether it is possible to attribute fake images to the source models’ architectures even if they are finetuned or retrained under different configurations. In this work, we present the first study on Deepfake Network Architecture Attribution to attribute fake images on architecture-level. Based on an observation that GAN architecture is likely to leave globally consistent fingerprints while traces left by model weights vary in different regions, we provide a simple yet effective solution named DNA-Det for this problem. Extensive experiments on multiple cross-test setups and a large-scale dataset demonstrate the effectiveness of DNA-Det.

NeurIPS Conference 2022 Conference Paper

Learning Multi-resolution Functional Maps with Spectral Attention for Robust Shape Matching

  • Lei Li
  • Nicolas Donati
  • Maks Ovsjanikov

In this work, we present a novel non-rigid shape matching framework based on multi-resolution functional maps with spectral attention. Existing functional map learning methods all rely on the critical choice of the spectral resolution hyperparameter, which can severely affect the overall accuracy or lead to overfitting, if not chosen carefully. In this paper, we show that spectral resolution tuning can be alleviated by introducing spectral attention. Our framework is applicable in both supervised and unsupervised settings, and we show that it is possible to train the network so that it can adapt the spectral resolution, depending on the given shape input. More specifically, we propose to compute multi-resolution functional maps that characterize correspondence across a range of spectral resolutions, and introduce a spectral attention network that helps to combine this representation into a single coherent final correspondence. Our approach is not only accurate with near-isometric input, for which a high spectral resolution is typically preferred, but also robust and able to produce reasonable matching even in the presence of significant non-isometric distortion, which poses great challenges to existing methods. We demonstrate the superior performance of our approach through experiments on a suite of challenging near-isometric and non-isometric shape matching benchmarks.

AAAI Conference 2022 Conference Paper

LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification

  • Jiangjie Chen
  • Qiaoben Bao
  • Changzhi Sun
  • Xinbo Zhang
  • Jiaze Chen
  • Hao Zhou
  • Yanghua Xiao
  • Lei Li

Given a natural language statement, how to verify its veracity against a large-scale textual knowledge source like Wikipedia? Most existing neural models make predictions without giving clues about which part of a false claim goes wrong. In this paper, we propose LOREN, an approach for interpretable fact verification. We decompose the verification of the whole claim at phrase-level, where the veracity of the phrases serves as explanations and can be aggregated into the final verdict according to logical rules. The key insight of LOREN is to represent claim phrase veracity as three-valued latent variables, which are regularized by aggregation logical rules. The final claim verification is based on all latent variables. Thus, LOREN enjoys the additional benefit of interpretability — it is easy to explain how it reaches certain results with claim phrase veracity. Experiments on a public fact verification benchmark show that LOREN is competitive against previous approaches while enjoying the merit of faithful and accurate interpretability. The resources of LOREN are available at: https://github.com/jiangjiechen/LOREN.

AAAI Conference 2022 Conference Paper

Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision

  • Chenyang Huang
  • Hao Zhou
  • Osmar R. Zaïane
  • Lili Mou
  • Lei Li

How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up the inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT’14 EN–DE and WMT’16 EN–RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks, while being 14.8 times more efficient in inference.

IJCAI Conference 2022 Conference Paper

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

  • Deli Chen
  • Yankai Lin
  • Lei Li
  • Xuancheng Ren
  • Peng Li
  • Jie Zhou
  • Xu Sun

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, whose underlying working mechanism remains underexplored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across the graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.

AAAI Conference 2022 Conference Paper

Unsupervised Editing for Counterfactual Stories

  • Jiangjie Chen
  • Chun Gan
  • Sijie Cheng
  • Hao Zhou
  • Yanghua Xiao
  • Lei Li

Creating what-if stories requires reasoning about prior statements and possible outcomes of the changed conditions. One can easily generate coherent endings under new conditions, but it would be challenging for current systems to do it with minimal changes to the original story. Therefore, one major challenge is the trade-off between generating a logical story and rewriting with minimal-edits. In this paper, we propose EDUCAT, an editing-based unsupervised approach for counterfactual story rewriting. EDUCAT includes a target position detection strategy based on estimating causal effects of the what-if conditions, which keeps the causal invariant parts of the story. EDUCAT then generates the stories under fluency, coherence and minimal-edits constraints. We also propose a new metric to alleviate the shortcomings of current automatic metrics and better evaluate the trade-off. We evaluate EDUCAT on a public counterfactual story rewriting benchmark. Experiments show that EDUCAT achieves the best trade-off over unsupervised SOTA methods according to both automatic and human evaluation. The resources of EDUCAT are available at: https://github.com/jiangjiechen/EDUCAT.

IROS Conference 2022 Conference Paper

Visual Environment perception for obstacle detection and crossing of lower-limb exoskeletons

  • Manoj Ramanathan
  • Lincong Luo
  • Jie Kai Er
  • Ming Jeat Foo
  • Chye Hsia Chiam
  • Lei Li
  • Wei-Yun Yau
  • Wei Tech Ang

Lower limb exoskeletons offer support for patients suffering from mobility disorders due to injury, stroke, etc. But these devices are not used in day-to-day life and environments due to their limited human-computer interface to perceive and handle different terrains and tasks. In this paper, we introduce a simple vision-based environment perception pipeline for lower-limb exoskeletons for obstacle crossing tasks. The proposed pipeline consists of three stages, namely, ground plane and obstacle detection, estimating obstacle location and dimensions, and obstacle tracking. To reduce noisy artifacts and reliably detect obstacles, we propose a similarity metric based on color, gradient orientation, and 2D surface normal. Depth map of the detected obstacle region is utilized for estimating the obstacle location and dimensions. Also, we consider two obstacle tracking modes for obstacle crossing, visual tracking using an RGB-D camera and positional tracking using a SLAM camera. The proposed vision-based perception pipeline is integrated with an exoskeleton, where we propose a control scheme that can vary step length adaptively to successfully cross detected obstacles. We conduct offline and online experiments to validate the proposed perception pipeline and provide insights on the same. Our experiments show that the proposed pipeline allows exoskeletons to understand their environment and successfully cross obstacles.

AAAI Conference 2022 Conference Paper

Well-Classified Examples Are Underestimated in Classification with Deep Neural Networks

  • Guangxiang Zhao
  • Wenkai Yang
  • Xuancheng Ren
  • Lei Li
  • Yunfang Wu
  • Xu Sun

The conventional wisdom behind learning deep classification models is to focus on badly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and margin growth. To counteract this deficiency, we propose to reward well-classified examples with additive bonuses to revive their contribution to the learning process. This counterexample theoretically addresses these three issues. We empirically support this claim by directly verifying the theoretical results or observing significant performance improvement with our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, this paper shows that we can deal with complex scenarios, such as imbalanced classification, OOD detection, and applications under adversarial attacks, because our idea can solve these three issues. Code is available at https://github.com/lancopku/well-classified-examples-are-underestimated.
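
A minimal numpy sketch of the general idea of adding an additive bonus for well-classified examples to cross-entropy; the particular bonus form below is our illustrative choice, not necessarily the paper's exact loss:

```python
import numpy as np

def ce_with_bonus(probs, labels, bonus_weight=0.5):
    """Cross-entropy loss minus an additive bonus that rewards well-classified examples.

    probs:  (N, C) predicted class probabilities.
    labels: (N,) integer class indices.
    The bonus term -log(1 - p_correct) grows as p_correct -> 1, so confident,
    correct predictions keep contributing to the objective instead of fading out.
    """
    p = probs[np.arange(len(labels)), labels]
    ce = -np.log(p)
    bonus = -np.log(1.0 - p + 1e-12)  # larger when the example is well classified
    return (ce - bonus_weight * bonus).mean()

probs = np.array([[0.9, 0.1], [0.6, 0.4]])
labels = np.array([0, 0])
print(ce_with_bonus(probs, labels, bonus_weight=0.0))  # plain cross-entropy
print(ce_with_bonus(probs, labels, bonus_weight=0.5))  # lowered by the bonus
```

With `bonus_weight=0.0` this reduces to plain cross-entropy, which makes the effect of the bonus easy to ablate.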

AAAI Conference 2021 Conference Paper

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

  • Xunpeng Huang
  • Runxin Xu
  • Hao Zhou
  • Zhe Wang
  • Zhengyang Liu
  • Lei Li

Stochastic gradient descent (SGD) is a widely used method for its outstanding generalization ability and simplicity. Adaptive gradient methods have been proposed to further accelerate the optimization process. In this paper, we revisit existing adaptive gradient optimization methods with a new interpretation. This new perspective leads to a refreshed understanding of the roles of second moments in stochastic optimization. Based on this, we propose the Angle-Calibration Moment method (ACMo), a novel stochastic optimization method. It enjoys the benefits of second moments with only first moment updates. Theoretical analysis shows that ACMo is able to achieve the same convergence rate as mainstream adaptive methods. Experiments on a variety of CV and NLP tasks demonstrate that ACMo has a comparable convergence to state-of-the-art Adam-type optimizers, and even a better generalization performance in most cases. The code is available at https://github.com/Xunpeng746/ACMo.

AAAI Conference 2021 Conference Paper

Consecutive Decoding for Speech-to-text Translation

  • Qianqian Dong
  • Mingxuan Wang
  • Hao Zhou
  • Shuang Xu
  • Bo Xu
  • Lei Li

Speech-to-text translation (ST), which directly translates the source language speech to the target language text, has attracted intensive attention recently. However, the combination of speech recognition and machine translation in a single model poses a heavy burden on the direct cross-modal cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. It benefits the model training so that additional large parallel text corpus can be fully exploited to enhance the speech translation training. Our method is verified on three mainstream datasets, including the Augmented LibriSpeech English-French dataset, the TED English-German dataset, and the TED English-Chinese dataset. Experiments show that our proposed COSTT outperforms the previous state-of-the-art methods. The code is available at https://github.com/dqqcasia/st.

NeurIPS Conference 2021 Conference Paper

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

  • Zaixiang Zheng
  • Hao Zhou
  • Shujian Huang
  • Jiajun Chen
  • Jingjing Xu
  • Lei Li

Sequence-to-sequence learning naturally has two directions. How can supervision signals from both directions be utilized effectively? Existing approaches either require two separate models, or a multitask-learned model with inferior performance. In this paper, we propose REDER (Reversible Duplex Transformer), a parameter-efficient model, and apply it to machine translation. Either end of REDER can simultaneously input and output a distinct language. Thus REDER enables reversible machine translation by simply flipping the input and output ends. Experiments verify that REDER achieves the first success of reversible machine translation, outperforming its multitask-trained baselines by up to 1.3 BLEU.

AAAI Conference 2021 Conference Paper

Finding Sparse Structures for Domain Specific Neural Machine Translation

  • Jianze Liang
  • Chengqi Zhao
  • Mingxuan Wang
  • Xipeng Qiu
  • Lei Li

Neural machine translation often adopts the fine-tuning approach to adapt to specific domains. However, unrestricted fine-tuning can easily degrade performance on the general domain and over-fit to the target domain. To mitigate this issue, we propose PRUNE-TUNE, a novel domain adaptation method via gradual pruning. It learns tiny domain-specific sub-networks during fine-tuning on new domains. PRUNE-TUNE alleviates the over-fitting and degradation problems without model modification. Furthermore, PRUNE-TUNE is able to sequentially learn a single network with multiple disjoint domain-specific sub-networks for multiple domains. Empirical results show that PRUNE-TUNE outperforms several strong competitors on the target-domain test set without sacrificing quality on the general domain, in both single- and multi-domain settings. The source code and data are available at https://github.com/ohlionel/Prune-Tune.

AAAI Conference 2021 Conference Paper

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

  • Qianqian Dong
  • Rong Ye
  • Mingxuan Wang
  • Hao Zhou
  • Shuang Xu
  • Bo Xu
  • Lei Li

An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs text in a target language. Existing methods are limited by the amount of parallel corpus available. Can we build a system that fully utilizes the signals in a parallel ST corpus? We are inspired by the human understanding system, which is composed of auditory perception and cognitive processing. In this paper, we propose Listen-Understand-Translate (LUT), a unified framework with triple supervision signals that decouples the end-to-end speech-to-text translation task. LUT guides the acoustic encoder to extract as much information as possible from the auditory input. In addition, LUT utilizes a pre-trained BERT model to enforce the upper encoder to produce as much semantic information as possible, without extra data. We perform experiments on a diverse set of speech translation benchmarks, including Librispeech English-French, IWSLT English-German, and TED English-Chinese. Our results demonstrate that LUT achieves state-of-the-art performance, outperforming previous methods. The code is available at https://github.com/dqqcasia/st.

TIST Journal 2021 Journal Article

Local Graph Edge Partitioning

  • Shengwei Ji
  • Chenyang Bu
  • Lei Li
  • Xindong Wu

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
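The streaming/local setting the abstract contrasts with offline partitioning can be illustrated with a minimal greedy sketch: each arriving edge is assigned to the partition that already holds the most of its endpoints (fewest new vertex replicas), breaking ties toward the least-loaded partition. This is only an illustrative toy, not the paper's Two-stage or Adaptive Local Partitioning algorithms; the function name and tie-breaking rule are our own choices.

```python
def greedy_edge_partition(edges, k):
    """Toy greedy streaming edge partitioner: assign each edge to the
    partition already holding the most of its endpoints (fewer new
    vertex replicas), breaking ties toward the least-loaded partition."""
    loads = [0] * k                        # edges assigned per partition
    replicas = [set() for _ in range(k)]   # vertices replicated in each partition
    assignment = []
    for u, v in edges:
        def score(p):
            # prefer larger endpoint overlap, then smaller load
            overlap = (u in replicas[p]) + (v in replicas[p])
            return (-overlap, loads[p])
        p = min(range(k), key=score)
        replicas[p].update((u, v))
        loads[p] += 1
        assignment.append(p)
    return assignment, loads

# Two disjoint triangles end up in separate partitions.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
assignment, loads = greedy_edge_partition(edges, 2)
```

Because the partitioner sees only the edges streamed so far, it needs no global graph information, which is the trade-off the abstract describes.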

AAAI Conference 2021 Conference Paper

Taxonomy Completion via Triplet Matching Network

  • Jieyu Zhang
  • Xiangchen Song
  • Ying Zeng
  • Jiaze Chen
  • Jiaming Shen
  • Yuning Mao
  • Lei Li

Automatically constructing taxonomies finds many applications in e-commerce and web search. One critical challenge is that, as data and business scope grow in real applications, new concepts emerge and need to be added to the existing taxonomy. Previous approaches focus on taxonomy expansion, i.e., finding an appropriate hypernym concept in the taxonomy for a new query concept. In this paper, we formulate a new task, "taxonomy completion", by discovering both the hypernym and hyponym concepts for a query. We propose the Triplet Matching Network (TMN) to find appropriate ⟨hypernym, hyponym⟩ pairs for a given query concept. TMN consists of one primal scorer and multiple auxiliary scorers. The auxiliary scorers capture various fine-grained signals (e.g., query-to-hypernym or query-to-hyponym semantics), and the primal scorer makes a holistic prediction on the ⟨query, hypernym, hyponym⟩ triplet based on the internal feature representations of all auxiliary scorers. An innovative channel-wise gating mechanism that retains task-specific information in concept representations is also introduced to further boost model performance. Experiments on four real-world large-scale datasets show that TMN achieves the best performance on both the taxonomy completion task and the previous taxonomy expansion task, outperforming existing methods.

AAAI Conference 2021 Conference Paper

TextGAIL: Generative Adversarial Imitation Learning for Text Generation

  • Qingyang Wu
  • Lei Li
  • Zhou Yu

Generative Adversarial Networks (GANs) for text generation have recently received much criticism, as they perform worse than their MLE counterparts (Caccia et al. 2020; Tevet et al. 2019; Semeniuta, Severyn, and Gelly 2018). We suspect previous text GANs' inferior performance is due to the lack of a reliable guiding signal in their discriminators. To address this problem, we propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance. As previous text GANs suffer from high gradient variance, we apply a contrastive discriminator and proximal policy optimization (PPO) to stabilize and improve text generation performance. For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL achieves better performance in terms of both quality and diversity than the MLE baseline. We also validate our intuition that TextGAIL's discriminator demonstrates the capability of providing reasonable rewards with an additional task.

AAAI Conference 2020 Conference Paper

FACT: Fused Attention for Clothing Transfer with Generative Adversarial Networks

  • Yicheng Zhang
  • Lei Li
  • Li Song
  • Rong Xie
  • Wenjun Zhang

Clothing transfer is a challenging task in computer vision where the goal is to transfer the human clothing style in an input image conditioned on a given language description. However, existing approaches have limited ability in delicate colorization and texture synthesis with a conventional fully convolutional generator. To tackle this problem, we propose a novel semantic-based Fused Attention model for Clothing Transfer (FACT), which allows fine-grained synthesis, high global consistency, and plausible hallucination in images. Towards this end, we incorporate two attention modules at the spatial level: (i) soft attention that searches for the most related positions in sentences, and (ii) self-attention that models long-range dependencies on feature maps. Furthermore, we develop a stylized channel-wise attention module to capture correlations at the feature level. We effectively fuse these attention modules in the generator and achieve better performance than the state-of-the-art method on the DeepFashion dataset. Qualitative and quantitative comparisons against the baselines demonstrate the effectiveness of our approach.

IJCAI Conference 2020 Conference Paper

Feature Augmented Memory with Global Attention Network for VideoQA

  • Jiayin Cai
  • Chun Yuan
  • Cheng Shi
  • Lei Li
  • Yangyang Cheng
  • Ying Shan

Recently, Recurrent Neural Network (RNN) based methods and Self-Attention (SA) based methods have achieved promising performance in Video Question Answering (VideoQA). Despite the success of these works, RNN-based methods tend to forget global semantic content due to the inherent drawbacks of the recurrent units themselves, while SA-based methods cannot precisely capture the dependencies of the local neighborhood, leading to insufficient modeling of temporal order. To tackle these problems, we propose a novel VideoQA framework which progressively refines the representations of videos and questions from fine to coarse grain in a sequence-sensitive manner. Specifically, our model improves the feature representations via the following two steps: (1) introducing two fine-grained feature-augmented memories to strengthen the information augmentation of video and text, which improves memory capacity by memorizing more relevant and targeted information; (2) appending the self-attention and co-attention modules to the memory output, so that the model can capture global interactions between high-level semantic information. Experimental results show that our approach achieves state-of-the-art performance on VideoQA benchmark datasets.

AAAI Conference 2020 Conference Paper

Importance-Aware Learning for Neural Headline Editing

  • Qingyang Wu
  • Lei Li
  • Hao Zhou
  • Ying Zeng
  • Zhou Yu

Many social media news writers are not professionally trained, so social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We propose to automate this headline editing process through neural network models to provide more immediate writing support for these social media news writers. To train such a neural headline editing model, we collected a dataset which contains articles with original headlines and professionally edited headlines. However, it is expensive to collect a large number of professionally edited headlines. To solve this low-resource problem, we design an encoder-decoder model which leverages large-scale pre-trained language models. We further improve the pre-trained model's quality by introducing a headline generation task as an intermediate task before the headline editing task. We also propose a Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences. With the help of pre-training, adaptation, and SIA, the model learns to generate headlines in the professional editor's style. Experimental results show that our method significantly improves the quality of headline editing compared with previous methods.

NeurIPS Conference 2020 Conference Paper

SOLOv2: Dynamic and Fast Instance Segmentation

  • Xinlong Wang
  • Rufeng Zhang
  • Tao Kong
  • Lei Li
  • Chunhua Shen

In this work, we design a simple, direct, and fast framework for instance segmentation with strong performance. To this end, we propose a novel and effective approach, termed SOLOv2, following the principle of the SOLO method [32]. First, our new framework is empowered by an efficient and holistic instance mask representation scheme, which dynamically segments each instance in the image, without resorting to bounding box detection. Specifically, the object mask generation is decoupled into a mask kernel prediction and mask feature learning, which are responsible for generating convolution kernels and the feature maps to be convolved with, respectively. Second, SOLOv2 significantly reduces inference overhead with our novel matrix non-maximum suppression (NMS) technique. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate that the proposed SOLOv2 achieves state-of-the-art performance with high efficiency, making it suitable for both mobile and cloud applications. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP on COCO test-dev. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential of SOLOv2 to serve as a new strong baseline for many instance-level recognition tasks. Code is available at https://git.io/AdelaiDet.
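The "one shot" idea behind Matrix NMS can be sketched in a few lines of numpy: given candidates sorted by score and their pairwise IoU matrix, every decay factor is computed with matrix operations instead of a sequential suppression loop. This is a simplified sketch of the linear-kernel variant operating on a precomputed IoU matrix; the actual SOLOv2 code computes mask IoUs on the fly and also offers a Gaussian kernel.

```python
import numpy as np

def matrix_nms(scores, ious):
    """Linear-kernel Matrix NMS sketch. `scores` must be sorted in
    descending order; `ious[i, j]` is the IoU between candidates i and j.
    All pairwise decays are computed in one shot with matrix operations."""
    iou = np.triu(ious, k=1)            # keep only pairs where i outranks j
    compensate = iou.max(axis=0)        # max IoU each candidate has with higher-ranked ones
    decay = (1.0 - iou) / (1.0 - compensate[:, None])
    coeff = decay.min(axis=0)           # most severe decay from any higher-ranked candidate
    return scores * coeff

scores = np.array([0.9, 0.8, 0.7])
ious = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
decayed = matrix_nms(scores, ious)      # candidate 1 overlaps candidate 0 heavily and is decayed
```

Instead of hard-suppressing overlapping candidates one by one, every score is soft-decayed in parallel, which is what makes the operation GPU-friendly.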

AAAI Conference 2020 Conference Paper

Task-Aware Monocular Depth Estimation for 3D Object Detection

  • Xinlong Wang
  • Wei Yin
  • Tao Kong
  • Yuning Jiang
  • Lei Li
  • Chunhua Shen

Monocular depth estimation enables 3D perception from a single 2D image, thus attracting much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal. Depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects is rarely discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among other monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.

AAAI Conference 2020 Conference Paper

Towards Making the Most of BERT in Neural Machine Translation

  • Jiacheng Yang
  • Mingxuan Wang
  • Hao Zhou
  • Chengqi Zhao
  • Weinan Zhang
  • Yong Yu
  • Lei Li

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTNMT) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTNMT consists of three techniques: a) asymptotic distillation to ensure that the NMT model retains the previously pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show CTNMT gains of up to 3 BLEU on the WMT14 English-German language pair, which even surpasses the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer-big model by more than 1 BLEU.

ICRA Conference 2019 Conference Paper

A bio-robotic remora disc with attachment and detachment capabilities for reversible underwater hitchhiking

  • Siqi Wang
  • Lei Li
  • Yufeng Chen 0003
  • Yueping Wang
  • Wenguang Sun
  • Junfei Xiao
  • Dylan K. Wainwright
  • Tianmiao Wang

Remoras employ their adhesive discs to rapidly attach to and detach from a wide range of marine surfaces. By analyzing high-speed images of remoras' (Echeneis naucrates) hitchhiking behavior, we describe the fish's detachment mechanism as a lip curling up to break the seal between the disc and substrate. By mimicking the kinematic and morphological properties of the biological disc, we fabricated a multi-material biomimetic disc (whose stiffness spans four orders of magnitude) that is capable of both attachment and detachment. Detachment is realized by a flexible cable-driven mechanism that curls the anterior region of the silicone soft lip, allows leakage under the disc, and equalizes the internal pressure to the external pressure. The disc lamellae with attached carbon fiber spinules can be rotated by hydraulic soft actuators whose internal pressure is precisely tuned to the ambient underwater pressure. During attachment, increasing the rotational angle of the lamellae and the preload of the disc significantly enhanced the adhesive forces. We found that curling up the soft lip and folding down the lamellae rapidly reduced the pulling force of the disc by a factor of 254 compared to that under the attached state, which leads to detachment. Based on these mechanisms, underwater maneuvers involving repeated attachment and detachment were demonstrated with an integrated ROV unit that had a self-contained actuation and control system for the disc. This study lays a foundation for the development of fully untethered robotic systems for underwater hitchhiking in real-world marine environments.

JBHI Journal 2019 Journal Article

An ARIMA Model With Adaptive Orders for Predicting Blood Glucose Concentrations and Hypoglycemia

  • Jun Yang
  • Lei Li
  • Yimeng Shi
  • Xiaolei Xie

The continuous glucose monitoring system is an effective tool, which enables the users to monitor their blood glucose (BG) levels. Based on the continuous glucose monitoring (CGM) data, we aim at predicting future BG levels so that appropriate actions can be taken in advance to prevent hyperglycemia or hypoglycemia. Due to the time-varying nonstationarity of CGM data, verified by Augmented Dickey–Fuller test and analysis of variance, an autoregressive integrated moving average (ARIMA) model with an adaptive identification algorithm of model orders is proposed in the prediction framework. Such identification algorithm adaptively determines the model orders and simultaneously estimates the corresponding parameters using Akaike Information Criterion and least square estimation. A case study is conducted with the CGM data of diabetics under daily living conditions to analyze the prediction performance of the proposed model together with the early hypoglycemic alarms. Results show that the proposed model outperforms the adaptive univariate model and ARIMA model.
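The order-identification step described in the abstract can be illustrated with a pure-AR sketch: fit an AR(p) model by least squares for each candidate order and keep the order with the lowest AIC. This is a simplification of the paper's adaptive ARIMA identification (no differencing or moving-average terms); the function names and toy data are our own.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model with intercept; returns (coef, rss)."""
    rows = len(y) - p
    X = np.ones((rows, p + 1))
    for lag in range(1, p + 1):
        X[:, lag] = y[p - lag: len(y) - lag]   # column of lag-`lag` values
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    return coef, rss

def select_order(y, max_p=5):
    """Pick the AR order minimizing AIC, mimicking in spirit the
    adaptive order identification described in the abstract."""
    best_p, best_aic = 1, np.inf
    for p in range(1, max_p + 1):
        _, rss = fit_ar(y, p)
        n = len(y) - p
        aic = n * np.log(rss / n + 1e-12) + 2 * (p + 1)   # AIC for Gaussian errors
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p

# Toy AR(2) series driven by fixed-seed noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(400)
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]
coef, rss = fit_ar(y, 2)
best_p = select_order(y)
```

In the paper's setting this selection is re-run adaptively as new CGM readings arrive, so the orders track the time-varying nonstationarity of the data.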

AAAI Conference 2019 Conference Paper

CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling

  • Ning Miao
  • Hao Zhou
  • Lili Mou
  • Rui Yan
  • Lei Li

In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage, and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH.
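The sampling idea can be shown with a toy word-level Metropolis-Hastings chain: propose a local insert/delete/replace edit, then accept or reject against a target score that hard-enforces keyword constraints. Everything here is a deliberate simplification of CGMH: the score function stands in for a language-model fluency score, the tiny vocabulary is made up, and the acceptance rule ignores proposal asymmetry.

```python
import math
import random

KEYWORDS = {"coffee", "morning"}
VOCAB = ["i", "drink", "coffee", "every", "morning", "the", "hot"]

def score(sent):
    """Toy target log-score: keywords are a hard constraint,
    and shorter sentences are preferred."""
    if not KEYWORDS.issubset(sent):
        return -1e9
    return -0.5 * len(sent)

def propose(sent):
    """Random word-level insert / delete / replace, as in CGMH's local moves."""
    sent = list(sent)
    op = random.choice(["insert", "delete", "replace"])
    i = random.randrange(len(sent) + (op == "insert"))
    if op == "insert":
        sent.insert(i, random.choice(VOCAB))
    elif op == "delete" and len(sent) > 1:
        del sent[i]
    else:
        sent[i] = random.choice(VOCAB)
    return sent

def cgmh_toy(init, steps=2000, seed=0):
    random.seed(seed)
    cur, cur_s = list(init), score(init)
    for _ in range(steps):
        nxt = propose(cur)
        nxt_s = score(nxt)
        # Metropolis acceptance (proposal asymmetry ignored for brevity)
        if math.log(random.random() + 1e-12) < nxt_s - cur_s:
            cur, cur_s = nxt, nxt_s
    return cur

result = cgmh_toy(["i", "drink", "coffee", "every", "morning"])
```

Because any move that drops a keyword scores -1e9 and is essentially always rejected, the chain explores only sentences satisfying the constraint, which mirrors how CGMH keeps keywords present throughout sampling.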

IJCAI Conference 2019 Conference Paper

Correct-and-Memorize: Learning to Translate from Interactive Revisions

  • Rongxiang Weng
  • Hao Zhou
  • Shujian Huang
  • Lei Li
  • Yifan Xia
  • Jiajun Chen

State-of-the-art machine translation models are still not on a par with human translators. Previous work brings human interaction into the neural machine translation process to obtain improved results in target languages. However, not all translation errors are equal: some are critical, while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT enhances machine translation results significantly while requiring fewer revision instructions from humans compared to previous methods.

IJCAI Conference 2019 Conference Paper

Deep Active Learning for Anchor User Prediction

  • Anfeng Cheng
  • Chuan Zhou
  • Hong Yang
  • Jia Wu
  • Lei Li
  • Jianlong Tan
  • Li Guo

Predicting pairs of anchor users plays an important role in cross-network analysis. Due to the expensive cost of labeling anchor users for training prediction models, we consider in this paper the problem of minimizing the number of user pairs across multiple networks to be labeled while improving the accuracy of the prediction. To this end, we present a deep active learning model for anchor user prediction (DALAUP for short). However, active learning for anchor user sampling faces the challenges of non-i.i.d. user-pair data caused by network structures and the correlation among anchor or non-anchor user pairs. To solve these challenges, DALAUP uses a pair of parameter-sharing neural networks to obtain vector representations of user pairs, and ensembles three query strategies to select the most informative user pairs for labeling and model training. Experiments on real-world social network data demonstrate that DALAUP outperforms state-of-the-art approaches.

IJCAI Conference 2019 Conference Paper

GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos

  • Zhaoyue Sun
  • Jiaze Chen
  • Hao Zhou
  • Deyu Zhou
  • Lei Li
  • Mingmin Jiang

We demonstrate a web-based software system, GraspSnooker, which is able to automatically generate Chinese text commentaries for snooker game videos. It consists of a video analyzer, a strategy predictor and a commentary generator. As far as we know, it is the first attempt on snooker commentary generation, which might be helpful for snooker learners to understand the game.

NeurIPS Conference 2019 Conference Paper

Kernelized Bayesian Softmax for Text Generation

  • Ning Miao
  • Hao Zhou
  • Chengqi Zhao
  • Wenxian Shi
  • Lei Li

Neural models for text generation require a softmax layer with proper token embeddings during the decoding phase. Most existing approaches adopt a single point embedding for each token. However, a word may have multiple senses in different contexts, some of which can be quite distinct. In this paper, we propose KerBS, a novel approach for learning better embeddings for text generation. KerBS embodies two advantages: (a) it employs a Bayesian composition of embeddings for words with multiple senses; (b) it is adaptive to the semantic variances of words and robust to rare sentence contexts by imposing learned kernels to capture the closeness of words (senses) in the embedding space. Empirical studies show that KerBS significantly boosts the performance of several text generation tasks.

IJCAI Conference 2019 Conference Paper

Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling

  • Pengcheng Yang
  • Fuli Luo
  • Peng Chen
  • Lei Li
  • Zhiyi Yin
  • Xiaodong He
  • Xu Sun

The visual storytelling (VST) task aims at generating a reasonable and coherent paragraph-level story with an image stream as input. Different from a caption, which is a direct and literal description of image content, the story in the VST task tends to contain many imaginary concepts that do not appear in the image. This requires the AI agent to reason and associate with the imaginary concepts based on implicit commonsense knowledge to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model, which aims to introduce crucial commonsense from an external knowledge base for visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, an elaborately designed vision-aware directional encoding schema is adopted to effectively integrate the most informative commonsense. Besides, we strive to maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach outperforms state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With the additional commonsense and the semantic-relevance-based objective, the generated stories are more diverse and coherent.

NeurIPS Conference 2018 Conference Paper

BRITS: Bidirectional Recurrent Imputation for Time Series

  • Wei Cao
  • Dong Wang
  • Jian Li
  • Hao Zhou
  • Lei Li
  • Yitan Li

Time series are widely used as signals in many classification/regression tasks, and missing values are ubiquitous in them. Given multiple correlated time series, how can we fill in missing values and predict their class labels? Existing imputation methods often impose strong assumptions on the underlying data-generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing value imputation in time series data. Our proposed method directly learns the missing values in a bidirectional recurrent dynamical system, without any specific assumption. The imputed values are treated as variables of the RNN graph and can be effectively updated during backpropagation. BRITS has three advantages: (a) it can handle multiple correlated missing values in time series; (b) it generalizes to time series with underlying nonlinear dynamics; (c) it provides a data-driven imputation procedure and applies to general settings with missing data. We evaluate our model on three real-world datasets, including an air quality dataset, a health-care dataset, and a localization dataset for human activity. Experiments show that our model outperforms state-of-the-art methods in both imputation and classification/regression accuracy.

AAAI Conference 2017 Conference Paper

A Nearly-Black-Box Online Algorithm for Joint Parameter and State Estimation in Temporal Models

  • Yusuf Erol
  • Yi Wu
  • Lei Li
  • Stuart Russell

Online joint parameter and state estimation is a core problem for temporal models. Most existing methods are either restricted to a particular class of models (e.g., the Storvik filter) or computationally expensive (e.g., particle MCMC). We propose a novel nearly-black-box algorithm, the Assumed Parameter Filter (APF), a hybrid of particle filtering for state variables and assumed density filtering for parameter variables. It has the following advantages: (a) it is online and computationally efficient; (b) it is applicable to both discrete and continuous parameter spaces with arbitrary transition dynamics. On a variety of toy and real models, APF generates more accurate results within a fixed computation budget compared to several standard algorithms from the literature.

IJCAI Conference 2016 Conference Paper

Swift: Compiled Inference for Probabilistic Programming Languages

  • Yi Wu
  • Lei Li
  • Stuart Russell
  • Rastislav Bodik

A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines - even the compiled ones - incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x.

IS Journal 2016 Journal Article

Trust Agent-Based Behavior Induction in Social Networks

  • Lei Li
  • Jianping He
  • Meng Wang
  • Xindong Wu

The essence of social networks is that they can influence public opinion and allow group behaviors to form quickly. Negative group behavior significantly harms societal stability, but existing behavior-induction approaches are too simple and inefficient. To automatically and efficiently induce behavior in social networks, this article introduces trust agents and designs their features according to group behavior features. In addition, a dynamics control mechanism can be generated to coordinate participant behaviors in social networks to avoid a specific restricted negative group behavior.

IROS Conference 2015 Conference Paper

Design and control of robotic exoskeleton with balance stabilizer mechanism

  • Lei Li
  • Kay Hiang Hoon
  • Adela Tow
  • P. H. Lim
  • Kin Huat Low

Robotic exoskeletons have drawn much attention recently due to their potential to help stroke and spinal cord injury patients regain the ability to walk. However, the biggest challenge is balancing the exoskeleton, and how to achieve balance is still an open question. Most of the time, patients using such exoskeleton devices require sufficient upper body strength to control upright posture and manipulate crutches or walking frames to partially support body weight and keep balance. The high energy cost and high risk of falling when using these devices remain problems. In this paper, these issues are tackled with a proposed balance stabilizer mechanism that provides active balance assistance for robotic exoskeletons. The design of a robotic exoskeleton together with the balance stabilizer mechanism is presented and discussed. In addition, a trajectory generation method, which can generate dynamically stable and tunable gait patterns, is also shown. Finally, clinical trial results with a tetraplegic subject are presented and discussed.

NeurIPS Conference 2013 Conference Paper

Multilinear Dynamical Systems for Tensor Time Series

  • Mark Rogers
  • Lei Li
  • Stuart Russell

Many scientific data occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS), which treats the observation at each time slice as a vector. In this paper, we propose the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation-maximization (EM) algorithm to estimate the parameters. The MLDS models each time slice of the tensor time series as the multilinear projection of a corresponding member of a sequence of latent, low-dimensional tensors. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both simulated and real datasets.
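For a second-order tensor (matrix) time series, the MLDS observation step reduces to sandwiching each latent slice between two small factor matrices. The sketch below (our own toy dimensions and variable names) shows the multilinear observation map and checks that it equals an LDS observation with a Kronecker-structured matrix, which is where the parameter savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: latent slices Z_t are 2x3, observed slices X_t are 4x5.
A = rng.standard_normal((4, 2))   # mode-1 factor
B = rng.standard_normal((5, 3))   # mode-2 factor

def emit(Z):
    """Multilinear observation map: X = Z x_1 A x_2 B, i.e. A @ Z @ B.T."""
    return A @ Z @ B.T

Z = rng.standard_normal((2, 3))
X = emit(Z)

# On vectorized slices the same map is the Kronecker product kron(A, B):
# an equivalent LDS would store all 20*6 entries of that observation
# matrix, while the MLDS keeps only the 4*2 + 5*3 factor parameters.
X_vec = np.kron(A, B) @ Z.ravel()
```

The latent transition works the same way with its own pair of small factors, so the whole model stays multilinear while the comparable LDS burns its parameter budget on one large dense matrix.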

IJCAI Conference 2011 Conference Paper

Multi-Agent Plan Recognition with Partial Team Traces and Plan Libraries

  • Hankz Hankui Zhuo
  • Lei Li

Multi-Agent Plan Recognition (MAPR) seeks to identify the dynamic team structures and team behaviors from the observed activity sequences (team traces) of a set of intelligent agents, based on a library of known team activity sequences (team plans). Previous MAPR systems require that team traces and team plans be fully observed. In this paper we relax this constraint, i.e., team traces and team plans are allowed to be partial. This is an important step in applying MAPR to real-world domains, since in many applications it is often difficult to collect full team traces or team plans due to environment limitations, e.g., in military operations. It is also a hard problem, since the information available is limited. We propose a novel approach to recognizing team plans from partial team traces and team plans: we encode the MAPR problem as a satisfiability problem and solve it using a state-of-the-art weighted MAX-SAT solver. We empirically show that our algorithm is both effective and efficient.
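The weighted MAX-SAT formulation can be illustrated with a brute-force solver on a toy encoding. The clause semantics and the tiny example below are illustrative assumptions, not the paper's actual encoding; a real system would hand the clauses to an off-the-shelf solver rather than enumerate assignments.

```python
from itertools import product

def weighted_max_sat(clauses, n_vars):
    """Brute-force weighted MAX-SAT: maximize total weight of satisfied clauses.

    Each clause is (weight, literals); literal +i means variable i is true,
    -i means it is false (variables are 1-indexed, as in DIMACS format).
    """
    best_weight, best_assignment = -1, None
    for bits in product([False, True], repeat=n_vars):
        weight = sum(
            w for w, lits in clauses
            if any((lit > 0) == bits[abs(lit) - 1] for lit in lits)
        )
        if weight > best_weight:
            best_weight, best_assignment = weight, bits
    return best_weight, best_assignment

# Toy encoding: x1 = "team executed plan A", x2 = "team executed plan B".
# A heavy clause forbids both at once; soft clauses score partial observations.
clauses = [(10, [-1, -2]), (5, [1]), (3, [2])]
best = weighted_max_sat(clauses, 2)
```

Here the optimum picks plan A only (weight 15), since its observations outweigh plan B's and the mutual-exclusion clause penalizes choosing both.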

AAAI Conference 2010 Conference Paper

Subjective Trust Inference in Composite Services

  • Lei Li
  • Yan Wang

In Service-Oriented Computing (SOC) environments, the trustworthiness of each service is critical for a service client when selecting one from a large pool of services. The trust value of a service is usually in the range of [0, 1] and is evaluated from the ratings given by service clients, which represent the subjective belief of these service clients in the satisfaction of delivered services. A trust value can thus be taken as a subjective probability, with which one party believes that another party can perform an action in a certain situation. Hence, subjective probability theory should be adopted in trust evaluation. In addition, in SOC environments, a service usually invokes other services offered by different service providers, forming a composite service. Thus, the global trust of a composite service should be evaluated based on complex invocation structures. In this paper, firstly, based on Bayesian inference, we propose a novel method to evaluate the subjective trustworthiness of a service component from a series of ratings given by service clients. Secondly, we interpret the trust dependency caused by service invocations as conditional probability, which is evaluated based on the subjective trust values of service components. Furthermore, we propose a joint subjective probability method to evaluate the subjective global trust of a composite service on the basis of trust dependency. Finally, we present experimental results that illustrate the properties of our proposed subjective global trust inference method.
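The Bayesian ingredient can be sketched with a Beta-Bernoulli model over binary ratings, and the trust-dependency idea with a sequential composition where each invocation's success is conditioned on its caller. This is a simplified illustration under those assumptions, not the paper's joint subjective probability method, and the function names are hypothetical.

```python
def beta_trust(ratings, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta-Bernoulli model over binary ratings.

    With a Beta(a, b) prior and s positive ratings out of n, the posterior
    mean (a + s) / (a + b + n) serves as a subjective trust value in [0, 1].
    """
    s, n = sum(ratings), len(ratings)
    return (prior_a + s) / (prior_a + prior_b + n)

def sequential_trust(component_trusts):
    """Trust of a chain of invocations: multiply the conditional
    success probabilities of each component given its caller succeeded."""
    t = 1.0
    for ct in component_trusts:
        t *= ct
    return t

t1 = beta_trust([1, 1, 1, 0])        # 3 of 4 positive -> (1+3)/(2+4) = 2/3
t2 = beta_trust([1, 1, 1, 1, 1])     # 5 of 5 positive -> (1+5)/(2+5) = 6/7
global_trust = sequential_trust([t1, t2])
```

Richer invocation structures (parallel branches, probabilistic routing) would combine component trusts differently, which is what the paper's global inference addresses.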

IS Journal 2007 Journal Article

Cost-Sensitive Data Preprocessing for Mining Customer Relationship Management Databases

  • Junfeng Pan
  • Qiang Yang
  • Yiming Yang
  • Lei Li
  • Frances Li
  • George Li

A staged framework for data preprocessing has been developed to support data mining and help service providers identify customers who might switch to a competitor. The framework pushes the cost sensitivity and data imbalance of customer retention data into the data preprocessing itself. Tests using a data set from the ACM KDD Cup 1998 showed that the framework outperformed the winner of that data mining and knowledge discovery competition. The framework has also been incorporated into a software system called ED-Money. To demonstrate the framework's ability to predict customer attrition with high accuracy, it was applied to benchmark data and to a real customer attrition data set from a large Chinese mobile telecommunications company.
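One common way to push cost sensitivity into preprocessing, as the abstract describes in general terms, is cost-proportional replication of examples before training. This sketch shows only that generic technique, not the staged framework itself; the function name and cost values are illustrative assumptions.

```python
def cost_sensitive_rebalance(rows, labels, cost):
    """Replicate each example in proportion to its misclassification cost,
    so a cost-blind learner trained on the result behaves cost-sensitively."""
    out = []
    for row, y in zip(rows, labels):
        out.extend([(row, y)] * int(cost[y]))
    return out

data = [("cust_a",), ("cust_b",), ("cust_c",)]
labels = [0, 0, 1]                    # class 1 (e.g. churners) is rare but costly
resampled = cost_sensitive_rebalance(data, labels, cost={0: 1, 1: 5})
# 2 majority examples plus 5 copies of the rare, high-cost one -> 7 rows
```

After this rebalancing, any standard classifier implicitly weights errors on the rare class five times as heavily.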