Arrow Research

Author name cluster

Zheng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers
2 author rows

Possible papers

51

AAAI Conference 2026 Conference Paper

Compensating Distribution Drifts in Continual Learning with Pre-trained Vision Transformers

  • Xuan Rao
  • Simian Xu
  • Zheng Li
  • Bo Zhao
  • Derong Liu
  • Mingming Ha
  • Cesare Alippi

Recent advances have shown that sequential fine-tuning (SeqFT) of pre-trained vision transformers (ViTs), followed by classifier refinement using approximate distributions of class features, can be an effective strategy for class-incremental learning (CIL). However, this approach is susceptible to distribution drift caused by the sequential optimization of shared backbone parameters. This results in a mismatch between the stored distributions of previously learned classes and those of the updated model, ultimately degrading classifier performance over time. To address this issue, we introduce a latent space transition operator and propose Sequential Learning with Drift Compensation (SLDC). SLDC aims to align feature distributions across tasks to mitigate the impact of drift. First, we present a linear variant of SLDC, which learns a linear operator by solving a regularized least-squares problem that maps features before and after fine-tuning. Next, we extend this with a weakly nonlinear SLDC variant, which assumes that the ideal transition operator lies between purely linear and fully nonlinear transformations. This is implemented using learnable, weakly nonlinear mappings that balance flexibility and generalization. To further reduce representation drift, we apply knowledge distillation (KD) in both algorithmic variants. Extensive experiments on standard CIL benchmarks demonstrate that SLDC significantly improves the performance of SeqFT. Notably, by combining KD to address representation drift with SLDC to compensate for distribution drift, SeqFT achieves performance comparable to joint training across all evaluated datasets.
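
The linear SLDC variant described above amounts to a ridge regression between feature sets. A minimal illustrative sketch, not the authors' code; the function names and toy data are hypothetical:

```python
import numpy as np

def fit_linear_transition(F_old, F_new, lam=1e-3):
    """Ridge-regression fit of a linear transition operator W such that
    F_old @ W approximates F_new (features before/after fine-tuning)."""
    d = F_old.shape[1]
    # Closed-form regularized least squares: W = (F_old^T F_old + lam*I)^-1 F_old^T F_new
    A = F_old.T @ F_old + lam * np.eye(d)
    return np.linalg.solve(A, F_old.T @ F_new)

# Toy usage: map stored class statistics of old tasks into the updated feature space.
rng = np.random.default_rng(0)
F_old = rng.normal(size=(500, 64))                        # features from the pre-update backbone
F_new = F_old + (F_old @ rng.normal(size=(64, 64))) * 0.1 # drifted features after fine-tuning
W = fit_linear_transition(F_old, F_new)
old_class_means = rng.normal(size=(10, 64))
compensated_means = old_class_means @ W                   # drift-compensated class means
```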

AAAI Conference 2026 Conference Paper

DIAA: A Decoding-Efficient Inference Acceleration Approach for On-Device Large Language Models

  • Hao Tian
  • Sheng Lu
  • Fuwen Tian
  • Guangming Cui
  • Zheng Li
  • Xuyun Zhang
  • Quan Z. Sheng
  • Wanchun Dou

Large Language Models (LLMs) have revolutionized intelligent interactions, enabling mobile applications such as personal assistants to execute locally on edge devices. Speculative decoding (SD) has emerged as a promising paradigm to accelerate LLM inference without compromising generation quality, employing a draft-then-verify scheme. However, due to the constrained computing and memory resources of edge devices, existing SD works rely heavily on an auxiliary draft model, which incurs additional memory burden and hinders adaptability, as well as on static token trees that yield suboptimal inference performance. To this end, we propose DIAA, a Decoding-efficient Inference Acceleration Approach for on-device LLMs. DIAA achieves plug-and-play, model-agnostic inference speedup with memory and computation efficiency for edge devices. Specifically, a pair of lightweight look-up tables (LUTs) is constructed by Top-K token sampling to cache historical tokens and probabilities for rapid candidate drafting. DIAA integrates a dynamic token tree with these LUTs, updated during the decoding process to adapt to the online context, enabling parallelized verification. A computation overlap is then employed to pipeline the update operations of the token tree, LUTs, and KV cache to improve computational efficiency. Finally, in extensive experiments on the NVIDIA Jetson edge platform, DIAA outperforms existing baselines in generation speed and inference wall-clock time while incurring minimal memory overhead.
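
A heavily simplified, hypothetical rendering of the LUT idea (the probability caching, dynamic token tree, and parallel verification pass are all omitted): cache the Top-K historical successors of each token and draft candidates from that cache.

```python
from collections import defaultdict, Counter

K = 3
lut = defaultdict(Counter)  # context token -> counts of tokens that followed it

def update_lut(prev_tok, next_tok):
    lut[prev_tok][next_tok] += 1

def draft(prev_tok):
    # Draft up to K candidate tokens from the cached history; the target model
    # would then verify all candidates in parallel and keep the matching prefix.
    return [tok for tok, _ in lut[prev_tok].most_common(K)]

for prev, nxt in [("the", "cat"), ("the", "cat"), ("the", "dog")]:
    update_lut(prev, nxt)
print(draft("the"))  # ['cat', 'dog']
```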

AAAI Conference 2026 Conference Paper

Exemplar-Free Continual Video Action Recognition via Slow-Fast Collaborative Learning

  • Xueyi Zhang
  • Chengwei Zhang
  • Zheng Li
  • Xiyu Wang
  • Siqi Cai
  • Mingrui Lao
  • Yanming Guo
  • Huiping Zhuang

In real-world applications, video action recognition models must continuously learn new action categories while retaining previously acquired knowledge. However, most existing approaches rely on storing historical data for replay, which introduces storage burdens and raises data privacy concerns. To address these challenges, we investigate the problem of Exemplar-Free Continual Video Action Recognition (EF-CVAR) and propose a novel framework named Slow-Fast Collaborative Learning (SFCL). SFCL integrates two complementary learning paradigms: a slow branch based on gradient-driven deep learning, which provides strong adaptability to new tasks, and a fast branch based on analytic learning (e.g., Recursive Least Squares), which efficiently preserves old knowledge without requiring access to past samples. To enable effective collaboration between the two branches, we design the Slow-Fast Dynamic Re-parameterization (SFDR) mechanism for adaptive fusion, and the Knowledge Reflection Mechanism (KRM), which mitigates forgetting and task-recency bias via pseudo-feature generation and dual-level knowledge distillation. Extensive experiments on UCF101, HMDB51, and Something-Something V2 demonstrate that SFCL achieves superior performance compared to existing replay-based methods, despite being exemplar-free. Notably, in long-duration continual learning scenarios, SFCL exhibits remarkable robustness, achieving up to a 30.39% improvement in accuracy over baselines while maintaining a low forgetting rate, highlighting its scalability and effectiveness in real-world video recognition tasks.
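
The "fast branch based on analytic learning (e.g., Recursive Least Squares)" can be pictured as a linear classifier head whose weights are updated in closed form, so old samples never need to be replayed. A minimal sketch under that reading; the class below is hypothetical and not the SFCL implementation:

```python
import numpy as np

class RLSClassifier:
    """Toy recursive-least-squares classifier head: weights are updated
    analytically per sample, without revisiting data from earlier tasks."""
    def __init__(self, feat_dim, num_classes, lam=1.0):
        self.W = np.zeros((feat_dim, num_classes))
        self.P = np.eye(feat_dim) / lam           # running inverse-covariance estimate

    def partial_fit(self, X, Y_onehot):
        for x, y in zip(X, Y_onehot):
            x = x[:, None]                         # column vector (d, 1)
            Px = self.P @ x
            k = Px / (1.0 + x.T @ Px)              # RLS gain vector
            err = y[None, :] - x.T @ self.W        # prediction error (1, C)
            self.W += k @ err                      # closed-form weight update
            self.P -= k @ Px.T                     # rank-1 downdate of P

    def predict(self, X):
        return (X @ self.W).argmax(axis=1)
```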

AAAI Conference 2025 Conference Paper

CFDM: Contrastive Fusion and Disambiguation for Multi-View Partial-Label Learning

  • Qiuru Hai
  • Yongjian Deng
  • Yuena Lin
  • Zheng Li
  • Zhen Yang
  • Gengyu Lyu

When dealing with multi-view data, the heterogeneity of data attributes across different views often leads to label ambiguity. To effectively address this challenge, this paper studies the Multi-View Partial-Label Learning (MVPLL) setting, where each training instance is described by multiple view features and associated with a set of candidate labels, among which only one is correct. The key to dealing with such a problem lies in how to effectively fuse multi-view information and accurately disambiguate the ambiguous labels. In this paper, we propose a novel approach named CFDM, which explores the consistency and complementarity of multi-view data through multi-view contrastive fusion and reduces label ambiguity through multi-class contrastive prototype disambiguation. Specifically, we first extract view-specific representations using multiple view-specific autoencoders, and then integrate multi-view information through both inter-view and intra-view contrastive fusion to enhance the distinctiveness of these representations. Afterwards, we utilize these distinctive representations to establish and update prototype vectors for each class within each view. Based on these, we apply contrastive prototype disambiguation to learn global class prototypes and accordingly reduce label ambiguity. In our model, multi-view contrastive fusion and multi-class contrastive prototype disambiguation are conducted mutually to enhance each other within a coherent framework, leading to better classification performance. Experimental results on multiple datasets demonstrate that our proposed method is superior to other state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Contrasting Adversarial Perturbations: The Space of Harmless Perturbations

  • Lu Chen
  • Shaofeng Li
  • Benhao Huang
  • Fan Yang
  • Zheng Li
  • Jie Li
  • Yuan Luo

Existing works have extensively studied adversarial examples, which are minimal perturbations that can mislead the output of deep neural networks (DNNs) while remaining imperceptible to humans. In this work, however, we reveal the existence of a harmless perturbation space: perturbations drawn from this space, regardless of their magnitudes, leave the network output unchanged when applied to inputs. Essentially, the harmless perturbation space emerges from the use of non-injective functions (linear or non-linear layers) within DNNs, which enables multiple distinct inputs to be mapped to the same output. For linear layers whose input dimension exceeds the output dimension, any linear combination of the orthogonal basis vectors of the parameter matrix's nullspace yields no change in the output. For non-linear layers, the harmless perturbation space may expand, depending on the properties of the layers and the input samples. Inspired by this property of DNNs, we solve for a family of general perturbation spaces that are redundant for the DNN's decision and can be used to hide sensitive data and serve as a means of model identification. Our work highlights the distinctive robustness of DNNs (i.e., consistency under large-magnitude perturbations) in contrast to adversarial examples (vulnerability to small noises).
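
The linear-layer case is easy to reproduce numerically: any perturbation drawn from the nullspace of the weight matrix, however large, is harmless by construction. A small illustrative sketch (toy matrices, not the paper's code):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 128))      # linear layer: output dim 32 < input dim 128
x = rng.normal(size=128)

N = null_space(W)                   # orthonormal nullspace basis, shape (128, 96)
delta = (N @ rng.normal(size=N.shape[1])) * 100.0  # arbitrarily large perturbation

print(np.allclose(W @ x, W @ (x + delta)))  # True: the output is unchanged
```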

JBHI Journal 2025 Journal Article

CorrMorph: Unsupervised Deformable Brain MRI Registration Based on Correlation Mining

  • Yuan Chang
  • Zheng Li
  • Ning Yang

Deformable image registration, as a fundamental prerequisite for many medical image analysis tasks, has received considerable attention. However, existing methods suffer from two key issues: 1) single-stream methods that stack moving and fixed images as input are prone to interference from spatial misalignment and style discrepancy, while dual-stream methods that use fully parallel encoders face challenges in learning correlations between images; 2) CNN-based methods struggle to capture the complex spatial correspondences between images, while Transformer-based methods lack the ability to capture local context information. Therefore, we propose an unsupervised deformable brain MRI registration network, CorrMorph, which achieves reasonable and accurate registration by mining correlations. Specifically, we design a match-fusion strategy that allows the independent extraction of shallow features from the moving and fixed images while capturing their correlations in deeper layers. Furthermore, we propose two novel modules: 1) the Correlation Matching Module (CMM), which mines correlations between images to achieve effective feature matching, and 2) the Feature Transmission Module (FTM), which extracts important spatial features to achieve effective feature transmission. Extensive experiments are conducted on three brain MRI datasets, and the results indicate that our method achieves state-of-the-art performance, with an average improvement of 2.7% in DSC over the representative VoxelMorph.

NeurIPS Conference 2025 Conference Paper

ErrorTrace: A Black-Box Traceability Mechanism Based on Model Family Error Space

  • Chuanchao Zang
  • Xiangtao Meng
  • Wenyu Chen
  • Tianshuo Cong
  • Zha Yaxing
  • Dong Qi
  • Zheng Li
  • Shanqing Guo

The open-source release of large language models (LLMs) enables malicious users to create unauthorized derivative models at low cost, posing significant threats to intellectual property (IP) and market stability. Existing IP protection methods either require access to model parameters or are vulnerable to fine-tuning attacks. To fill this gap, we propose ErrorTrace, a robust, black-box traceability mechanism for protecting LLM IP. Specifically, ErrorTrace leverages the unique error patterns of model families by mapping and analyzing their distinct error spaces, enabling robust and efficient IP protection without relying on internal parameters or specific query responses. Experimental results show that ErrorTrace achieves a traceability accuracy of 0.8518 for 27 base models when the suspect model is not included in ErrorTrace's training set, outperforming the baseline by 0.2593. ErrorTrace also successfully tracks 34 fine-tuned, pruned, and merged models across various scenarios, demonstrating its broad applicability and robustness, and it shows a certain level of resilience when subjected to adversarial attacks. Our code is available at: https://github.com/csdatazcc/ErrorTrace.
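
A deliberately crude, hypothetical rendering of the error-space intuition (the actual mechanism maps and analyzes family error spaces rather than comparing mean error vectors): fingerprint each model by where it errs on a fixed probe set, then attribute a suspect model to the family with the closest error pattern.

```python
import numpy as np

def error_vector(model_answers, gold_answers):
    """Binary error fingerprint of a model over a fixed probe set."""
    return np.array([a != g for a, g in zip(model_answers, gold_answers)], dtype=float)

def trace_family(suspect_vec, family_centroids):
    """family_centroids: dict mapping family name -> mean error vector.
    Returns the family whose error pattern is nearest to the suspect's."""
    return min(family_centroids,
               key=lambda f: np.linalg.norm(suspect_vec - family_centroids[f]))
```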

IROS Conference 2025 Conference Paper

From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-in-the-Loop Reinforcement Learning

  • Zeqiao Li
  • Yijing Wang 0001
  • Haoyu Wang 0012
  • Zheng Li
  • Peng Li 0043
  • Wenfei Liu
  • Zhiqiang Zuo 0001

Autonomous driving with reinforcement learning (RL) has significant potential. However, applying RL in real-world settings remains challenging due to the need for safe, efficient, and robust learning. Incorporating human expertise into the learning process can help overcome these challenges by reducing risky exploration and improving sample efficiency. In this work, we propose a reward-free, active human-in-the-loop learning method called Human-Guided Distributional Soft Actor-Critic (H-DSAC). Our method combines Proxy Value Propagation (PVP) and Distributional Soft Actor-Critic (DSAC) to enable efficient and safe training in real-world environments. The key innovation is the construction of a distributional proxy value function within the DSAC framework. This function encodes human intent by assigning higher expected returns to expert demonstrations and penalizing actions that require human intervention. By extrapolating these labels to unlabeled states, the policy is effectively guided toward expert-like behavior. With a well-designed state space, our method achieves real-world driving policy learning within practical training times. Results from both simulation and real-world experiments demonstrate that our framework enables safe, robust, and sample-efficient learning for autonomous driving. The videos and code are available at: https://github.com/lzqw/H-DSAC.
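
One way to picture the proxy-value labeling, as a hedged sketch rather than the released H-DSAC code (`q_net`, the ±1 targets, and the tensor shapes are all assumptions): wherever the human intervenes, the demonstrated action is regressed toward a high value and the agent's own action toward a low one.

```python
import torch
import torch.nn.functional as F

def proxy_value_loss(q_net, states, novice_actions, expert_actions, intervened):
    """Illustrative PVP-style supervision: on intervened steps, push the
    expert action's value up and the agent's own action's value down."""
    q_expert = q_net(states, expert_actions)   # (B,) values of human actions
    q_novice = q_net(states, novice_actions)   # (B,) values of agent actions
    hi = torch.ones_like(q_expert)             # proxy label for expert actions
    lo = -torch.ones_like(q_novice)            # proxy label for overridden actions
    mask = intervened.float()                  # 1 where the human took over
    per_step = (F.mse_loss(q_expert, hi, reduction="none")
                + F.mse_loss(q_novice, lo, reduction="none"))
    return (mask * per_step).mean()
```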

ICML Conference 2025 Conference Paper

Local Identifying Causal Relations in the Presence of Latent Variables

  • Zheng Li
  • Zeyu Liu
  • Feng Xie 0002
  • Hao Zhang 0079
  • Chunchen Liu
  • Zhi Geng

We tackle the problem of identifying whether a variable is the cause of a specified target using observational data. State-of-the-art causal learning algorithms that handle latent variables typically rely on identifying the global causal structure, often represented as a partial ancestral graph (PAG), to infer causal relationships. Although effective, these approaches are often redundant and computationally expensive when the focus is limited to a specific causal relationship. In this work, we introduce novel local characterizations that are necessary and sufficient for various types of causal relationships between two variables, enabling us to bypass the need for global structure learning. Leveraging these local insights, we develop efficient and fully localized algorithms that accurately identify causal relationships from observational data. We theoretically demonstrate the soundness and completeness of our approach. Extensive experiments on benchmark networks and real-world datasets further validate the effectiveness and efficiency of our method.

NeurIPS Conference 2025 Conference Paper

Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

  • Zheng Li
  • Xichen Guo
  • Feng Xie
  • Yan Zeng
  • Hao Zhang
  • Zhi Geng

Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection often assume the absence of latent variables and rely on learning the global causal structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when our primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.

IROS Conference 2025 Conference Paper

Prescribed-Time Safe Pursuit Control with Dynamic Obstacle and Occlusion Avoidance

  • Zheng Li
  • Xiaodong Shao
  • Haoran Li
  • Dongyu Li
  • Qinglei Hu

Performing target tracking and surveillance in dynamic obstacle environments requires maintaining continuous visual focus on the target while ensuring collision avoidance. This paper presents a safety-critical tracking control method that ensures dynamic obstacles remain outside the camera's line of sight while simultaneously avoiding collisions between the chaser vehicle and obstacles. A novel real-time occlusion detection function is developed, and motion constraints are systematically integrated using a hybrid framework that combines the artificial potential field (APF) method with an observer-based control strategy. To address time-sensitive tasks, a prescribed-time controller (PTC) based on a time-scale transformation technique is proposed. Furthermore, a prescribed-time linear extended state observer (PTESO) is proposed, featuring a simplified structure that enables rapid and accurate estimation of unknown environmental disturbances and nonlinear terms. Finally, the effectiveness of the proposed method is verified via simulation in a simplified physical scenario.

IJCAI Conference 2025 Conference Paper

Training-free Fourier Phase Diffusion for Style Transfer

  • Siyuan Zhang
  • Wei Ma
  • Libin Liu
  • Zheng Li
  • Hongbin Zha

Diffusion models have shown significant potential for image style transfer tasks. However, achieving effective stylization while preserving content in a training-free setting remains a challenging issue due to the tightly coupled representation space and inherent randomness of the models. In this paper, we propose a Fourier phase diffusion model that addresses this challenge. Given that the Fourier phase spectrum encodes an image's edge structures, we propose modulating the intermediate diffusion samples with the Fourier phase of a content image to conditionally guide the diffusion process. This ensures content retention while fully utilizing the diffusion model's style generation capabilities. To implement this, we introduce a content phase spectrum incorporation method that aligns with the characteristics of the diffusion process, preventing interference with generative stylization. To further enhance content preservation, we integrate homomorphic semantic features extracted from the content image at each diffusion stage. Extensive experimental results demonstrate that our method outperforms state-of-the-art models in both content preservation and stylization. Code is available at https://github.com/zhang2002forwin/Fourier-Phase-Diffusion-for-Style-Transfer.
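
The phase-modulation step has a compact form: keep the magnitude spectrum of the intermediate diffusion sample but impose the content image's Fourier phase. A NumPy sketch for intuition only; the paper's incorporation method is more careful about when and how this is applied:

```python
import numpy as np

def impose_content_phase(sample, content):
    """Combine the sample's magnitude spectrum with the content image's
    Fourier phase, per channel, and return the real-valued result."""
    S = np.fft.fft2(sample, axes=(0, 1))
    C = np.fft.fft2(content, axes=(0, 1))
    fused = np.abs(S) * np.exp(1j * np.angle(C))  # sample magnitude + content phase
    return np.real(np.fft.ifft2(fused, axes=(0, 1)))

# Applied to intermediate samples, this steers the reverse diffusion toward the
# content image's edge structure while leaving style generation to the model.
```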

IROS Conference 2025 Conference Paper

UltraDP: Generalizable Carotid Ultrasound Scanning with Force-Aware Diffusion Policy

  • Ruoqu Chen
  • Xiangjie Yan
  • Kangchen Lv
  • Gao Huang 0001
  • Zheng Li
  • Xiang Li 0009

Ultrasound scanning is a critical imaging technique for real-time, non-invasive diagnostics. However, variations in patient anatomy and complex human-in-the-loop interactions pose significant challenges for autonomous robotic scanning. Existing ultrasound scanning robots are commonly limited by relatively low generalization and inefficient data utilization. To overcome these limitations, we present UltraDP, a Diffusion-Policy-based method that receives multi-sensory inputs (ultrasound images, wrist camera images, contact wrench, and probe pose) and generates actions that fit the multi-modal action distributions of autonomous ultrasound scanning of the carotid artery. We propose a specialized guidance module to enable the policy to output actions that center the artery in ultrasound images. To ensure stable contact and safe interaction between the robot and the human subject, a hybrid force-impedance controller is utilized to drive the robot to track such trajectories. We have also built a large-scale training dataset for carotid scanning comprising 210 scans with 460k sample pairs from 21 volunteers of both genders. By exploiting our guidance module and DP's strong generalization ability, UltraDP achieves a 95% success rate in transverse scanning on previously unseen subjects, demonstrating its effectiveness.

AAAI Conference 2025 Conference Paper

Unsupervised Photometric-Consistent Depth Estimation from Endoscopic Monocular Video

  • Shijie Li
  • Weijun Lin
  • Qingyuan Xiang
  • Yunbin Tu
  • Shitan Asu
  • Zheng Li

Recent advancements in unsupervised monocular depth estimation typically rely on the assumption that image photometry remains consistent across consecutive frames. However, this assumption often fails in endoscopic scenes due to: 1) local photometric inconsistency caused by specular reflections creating highlights; and 2) global photometric inconsistency resulting from the simultaneous movement of the light source and the camera. Since unsupervised depth estimation methods rely on appearance discrepancies between frames as a supervisory signal, these photometric inconsistencies inevitably corrupt the loss calculation. In this paper, our goal is to obtain a strong and reliable supervisory signal for achieving photometric-consistent depth estimation. To this end, for local photometric inconsistency, we utilize the specular reflection model to introduce a Highlight Loss for handling the estimation of highlight regions. For global photometric inconsistency, we design a Photometric Match module, which utilizes the spotlight illumination model to derive an analytical expression, achieving photometric alignment across different frames. Unlike previous works that introduce additional optical flow or networks, our method is simpler and more efficient. Extensive experiments demonstrate that our method achieves state-of-the-art results on the C3VD, SCARED, and SERV-CT datasets.

IJCAI Conference 2024 Conference Paper

Common-Individual Semantic Fusion for Multi-View Multi-Label Learning

  • Gengyu Lyu
  • Weiqi Kang
  • Haobo Wang
  • Zheng Li
  • Zhen Yang
  • Songhe Feng

In Multi-View Multi-Label Learning, each instance is described by several heterogeneous features and associated with multiple valid labels simultaneously. Existing methods mainly focus on leveraging feature-level view fusion to capture a common representation for multi-label classifier induction. In this paper, we take a new perspective and propose a semantic-level fusion model named the Common-Individual Semantic Fusion Multi-View Multi-Label Learning Method (CISF). Different from previous feature-level fusion models, our proposed method focuses directly on semantic-level view fusion and simultaneously takes into consideration both the common semantics shared across views and the individual semantics of each specific view. Specifically, we first assume that each view involves some common semantic labels while owning a few exclusive semantic labels. Then, the common and exclusive semantic labels are separately forced to be consensual and diverse, to excavate the consistencies and complementarities among different views. Afterwards, we introduce low-rank and sparse constraints to highlight the label co-occurrence relationships of the common semantics and the view-specific expression of the individual semantics. We provide a theoretical guarantee of the strict convexity of our method under proper parameter settings. Extensive experiments on various datasets have verified the superiority of our method.

ECAI Conference 2024 Conference Paper

Dual Attention Encoder with Joint Preservation for Medical Image Segmentation

  • Shijie Li
  • Yunbin Tu
  • Yu Gong
  • Bowen Zhong
  • Zheng Li

Transformers have recently gained considerable popularity for capturing long-range dependencies in medical image segmentation. However, most transformer-based segmentation methods primarily focus on modeling global dependencies and fail to fully explore the complementary nature of different dimensional dependencies within features. These methods simply treat the aggregation of multi-dimensional dependencies as auxiliary modules for incorporating context into the Transformer architecture, thereby limiting the model's capability to learn rich feature representations. To address this issue, we introduce the Dual Attention Encoder with Joint Preservation (DANIE) for medical image segmentation, which synergistically aggregates spatial-channel dependencies across both local and global areas through attention learning. Additionally, we design a lightweight aggregation mechanism, termed Joint Preservation, which learns a composite feature representation, allowing different dependencies to complement each other. Without bells and whistles, our DANIE significantly improves on the performance of previous state-of-the-art methods on five popular medical image segmentation benchmarks: Synapse, ACDC, ISIC 2017, ISIC 2018 and GlaS.

ICML Conference 2024 Conference Paper

Local Causal Structure Learning in the Presence of Latent Variables

  • Feng Xie 0002
  • Zheng Li
  • Peng Wu 0012
  • Yan Zeng 0002
  • Chunchen Liu
  • Zhi Geng

Discovering causal relationships from observational data, particularly in the presence of latent variables, poses a challenging problem. While current local structure learning methods have proven effective and efficient when the focus lies solely on the local relationships of a target variable, they operate under the assumption of causal sufficiency. This assumption implies that all the common causes of the measured variables are observed, leaving no room for latent variables. Such a premise can be easily violated in various real-world applications, resulting in inaccurate structures that may adversely impact downstream tasks. In light of this, our paper delves into the primary investigation of locally identifying potential parents and children of a target from observational data that may include latent variables. Specifically, we harness the causal information from m-separation and V-structures to derive theoretical consistency results, effectively bridging the gap between global and local structure learning. Together with the newly developed stop rules, we present a principled method for determining whether a variable is a direct cause or effect of a target. Further, we theoretically demonstrate the correctness of our approach under the standard causal Markov and faithfulness conditions, with infinite samples. Experimental results on both synthetic and real-world data validate the effectiveness and efficiency of our approach.

NeurIPS Conference 2024 Conference Paper

Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

  • Yilun Jin
  • Zheng Li
  • Chenwei Zhang
  • Tianyu Cao
  • Yifan Gao
  • Pratik Jayarao
  • Mao Li
  • Xin Liu

Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we are hosting a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.

JBHI Journal 2024 Journal Article

TPAFNet: Transformer-Driven Pyramid Attention Fusion Network for 3D Medical Image Segmentation

  • Zheng Li
  • Jinhui Zhang
  • Siyi Wei
  • Yueyang Gao
  • Chengwei Cao
  • Zhiwei Wu

The field of 3D medical image segmentation is witnessing a growing trend in the utilization of combined networks that integrate convolutional neural networks and transformers. Nevertheless, prevailing hybrid networks are confronted with limitations in their straightforward serial or parallel combination methods and lack an effective mechanism to fuse channel and spatial feature attention. To address these limitations, we present a robust multi-scale 3D medical image segmentation network, the Transformer-Driven Pyramid Attention Fusion Network, which is denoted as TPAFNet, leveraging a hybrid structure of CNN and transformer. Within this framework, we exploit the characteristics of atrous convolution to extract multi-scale information effectively, thereby enhancing the encoding results of the transformer. Furthermore, we introduce the TPAF block in the encoder to seamlessly fuse channel and spatial feature attention from multi-scale feature inputs. In contrast to conventional skip connections that simply concatenate or add features, our decoder is enriched with a TPAF connection, elevating the integration of feature attention between low-level and high-level features. Additionally, we propose a low-level encoding shortcut from the original input to the decoder output, preserving more original image features and contributing to enhanced results. Finally, the deep supervision is implemented using a novel CNN-based voxel-wise classifier to facilitate better network convergence. Experimental results demonstrate that TPAFNet significantly outperforms other state-of-the-art networks on two public datasets, indicating that our research can effectively improve the accuracy of medical image segmentation, thereby assisting doctors in making more precise diagnoses.

NeurIPS Conference 2023 Conference Paper

Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation

  • Wei Jin
  • Haitao Mao
  • Zheng Li
  • Haoming Jiang
  • Chen Luo
  • Hongzhi Wen
  • Haoyu Han
  • Hanqing Lu

Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD CUP 2023 (https://www.aicrowd.com/challenges/amazon-kdd-cup-23-multilingual-recommendation-challenge) and have attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.

AAAI Conference 2023 Conference Paper

Curriculum Temperature for Knowledge Distillation

  • Zheng Li
  • Xiang Li
  • Lingfeng Yang
  • Borui Zhao
  • Renjie Song
  • Lei Luo
  • Jun Li
  • Jian Yang

Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method.
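
The core mechanics can be sketched as a standard KD loss whose temperature is a learnable parameter trained through a gradient-reversal layer, so it ascends the loss that the student descends; the curriculum then grows the reversal weight over training. A hypothetical PyTorch sketch, not the released CTKD code (`log_tau` and `lam` are assumed names):

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Reverses gradients so the temperature is learned adversarially."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

log_tau = torch.zeros(1, requires_grad=True)  # learnable temperature (log-space)

def ctkd_loss(student_logits, teacher_logits, lam=1.0):
    # lam can be ramped up over epochs to realize the easy-to-hard curriculum.
    tau = GradReverse.apply(log_tau.exp(), lam)
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau.detach() ** 2
```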

NeurIPS Conference 2023 Conference Paper

Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns

  • Xin Liu
  • Zheng Li
  • Yifan Gao
  • Jingfeng Yang
  • Tianyu Cao
  • Zhengyang Wang
  • Bing Yin
  • Yangqiu Song

The goal of session-based recommendation in E-commerce is to predict the next item that an anonymous user will purchase based on the browsing and purchase history. However, constructing global or local transition graphs to supplement session data can lead to noisy correlations and vanishing user intent. In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns. Specifically, the frequent and compact attribute patterns serve as memory to augment session representations, followed by a gate and a transformer block to fuse the whole session information. Through extensive experiments on two public benchmarks and 100 million industrial data in three domains, we demonstrate that FAPAT consistently outperforms state-of-the-art methods by an average of 4.5% across various evaluation metrics (Hits, NDCG, MRR). Besides evaluating next-item prediction, we estimate the models' capabilities to capture user intents via predicting items' attributes and period-item recommendations.

AAAI Conference 2023 Conference Paper

People Taking Photos That Faces Never Share: Privacy Protection and Fairness Enhancement from Camera to User

  • Junjie Zhu
  • Lin Gu
  • Xiaoxiao Wu
  • Zheng Li
  • Tatsuya Harada
  • Yingying Zhu

The soaring number of personal mobile devices and public cameras poses a threat to fundamental human rights and ethical principles. For example, the theft of private information such as face images by malicious third parties can lead to catastrophic consequences. By manipulating the appearance of faces in images, most existing protection algorithms are effective but irreversible. Here, we propose a practical and systematic solution to invertibly protect face information in the full-process pipeline from camera to final users. Specifically, we design a novel lightweight Flow-based Face Encryption Method (FFEM) on the local embedded system privately connected to the camera, minimizing the risk of eavesdropping during data transmission. FFEM uses a flow-based face encoder to encode each face into a Gaussian distribution and encrypts the encoded face feature by randomly rotating the Gaussian distribution, with the rotation matrix serving as the password. While encrypted latent-variable face images are sent to users through public but less reliable channels, the password is protected through more secure channels via technologies such as asymmetric encryption, blockchain, or other sophisticated security schemes. Users can choose to decode an image with fake faces from the encrypted image on the public channel. Only trusted users are able to recover the original face using the rotation matrix transmitted over the secure channel. More interestingly, by tuning the Gaussian ball in latent space, we can control the fairness of the replaced face on attributes such as gender and race. Extensive experiments demonstrate that our solution can protect privacy and enhance fairness with minimal effect on high-level downstream tasks.
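
The encryption step itself is simple linear algebra: a random orthogonal (rotation) matrix scrambles the Gaussian latent code, and its transpose undoes it. A toy sketch of that single step (everything else in the pipeline, including the flow encoder, is omitted; the seed-as-password convention is an assumption for illustration):

```python
import numpy as np

def random_rotation(dim, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix.
    Q, R = np.linalg.qr(rng.normal(size=(dim, dim)))
    return Q * np.sign(np.diag(R))   # sign fix for a uniform rotation

key_rng = np.random.default_rng(seed=42)   # the seed plays the role of the password
Rot = random_rotation(512, key_rng)        # rotation matrix derived from the password

z = np.random.default_rng(7).normal(size=512)  # Gaussian latent code of a face
z_enc = Rot @ z                            # encrypt: rotate the latent
z_dec = Rot.T @ z_enc                      # decrypt: a rotation's inverse is its transpose
assert np.allclose(z, z_dec)
# A rotation preserves the Gaussian prior, so z_enc still decodes to a valid (fake) face.
```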

JBHI Journal 2023 Journal Article

Reconstruction of Quantitative Susceptibility Mapping from Total Field Maps with Local Field Maps Guided UU-Net

  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Hongjian He
  • Jun Shi

Quantitative susceptibility mapping (QSM) is an emerging computational technique based on the magnetic resonance imaging (MRI) phase signal, which can provide magnetic susceptibility values of tissues. Existing deep learning-based models mainly reconstruct QSM from local field maps. However, the complicated, inconsecutive reconstruction steps not only accumulate errors that lead to inaccurate estimation, but are also inefficient in clinical practice. To this end, a novel local field maps guided UU-Net with Self- and Cross-Guided Transformer (LGUU-SCT-Net) is proposed to reconstruct QSM directly from the total field maps. Specifically, we propose to additionally generate the local field maps as auxiliary supervision during the training stage. This strategy decomposes the more complicated mapping from total field maps to QSM into two relatively easier ones, effectively alleviating the difficulty of direct mapping. Meanwhile, an improved U-Net model, LGUU-SCT-Net, is further designed to promote the nonlinear mapping ability. Long-range connections are designed between two sequentially stacked U-Nets to bring more feature fusion and facilitate the information flow. The Self- and Cross-Guided Transformer integrated into these connections further captures multi-scale channel-wise correlations and guides the fusion of multi-scale transferred features, assisting in more accurate reconstruction. The experimental results on an in-vivo dataset demonstrate the superior reconstruction results of our proposed algorithm.

JBHI Journal 2023 Journal Article

Two-Stage Self-Supervised Cycle-Consistency Transformer Network for Reducing Slice Gap in MR Images

  • Zhiyang Lu
  • Jian Wang
  • Zheng Li
  • Shihui Ying
  • Jun Wang
  • Jun Shi
  • Dinggang Shen

Magnetic resonance (MR) images are usually acquired with a large slice gap in clinical practice, i.e., low resolution (LR) along the through-plane direction. It is feasible to reduce the slice gap and reconstruct high-resolution (HR) images with deep learning (DL) methods. In the popular fully supervised manner, paired LR and HR images are generally required to train a DL model. However, since HR images are rarely acquired in clinical routine, it is difficult to obtain sufficient paired samples to train a robust model. Moreover, the widely used Convolutional Neural Network (CNN) still cannot capture the long-range image dependencies needed to combine useful information of similar contents, which are often spatially far away from each other across neighboring slices. Therefore, a Two-stage Self-supervised Cycle-consistency Transformer Network (TSCTNet) is proposed in this work to reduce the slice gap in MR images. A novel self-supervised learning (SSL) strategy is designed with two stages, for robust network pre-training and for specialized network refinement based on a cycle-consistency constraint, respectively. A hybrid Transformer and CNN structure is utilized to build an interpolation model, which explores both local and global slice representations. The experimental results on two public MR image datasets indicate that TSCTNet achieves superior performance over other compared SSL-based algorithms.

NeurIPS Conference 2022 Conference Paper

Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies

  • Shachi Deshpande
  • Kaiwen Wang
  • Dhruv Sreenivas
  • Zheng Li
  • Volodymyr Kuleshov

Estimating the effect of an intervention from observational data while accounting for confounding variables is a key task in causal inference. Oftentimes the confounders are unobserved, but we have access to large amounts of additional unstructured data (images, text) that contain valuable proxy signals about the missing confounders. This paper argues that leveraging this unstructured data can greatly improve the accuracy of causal effect estimation. Specifically, we introduce deep multi-modal structural equations, a generative model for causal effect estimation in which confounders are latent variables and unstructured data are proxy variables. This model supports multiple multimodal proxies (images, text) as well as missing data. We empirically demonstrate that our approach outperforms existing methods based on propensity scores and corrects for confounding using unstructured inputs on tasks in genomics and healthcare. Our methods can potentially support the use of large amounts of data that were previously not usable in causal inference.

IJCAI Conference 2022 Conference Paper

FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

  • Yang Lin
  • Tianyu Zhang
  • Peiqin Sun
  • Zheng Li
  • Shuchang Zhou

Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods have been developed mainly for Convolutional Neural Networks (CNNs) and suffer severe degradation when applied to fully quantized vision transformers. In this work, we demonstrate that many of these difficulties arise because of serious inter-channel variation in LayerNorm inputs, and present Power-of-Two Factor (PTF), a systematic method to reduce the performance degradation and inference complexity of fully quantized vision transformers. In addition, observing an extremely non-uniform distribution in attention maps, we propose Log-Int-Softmax (LIS) to preserve this distribution and to simplify inference using 4-bit quantization and the BitShift operator. Comprehensive experiments on various transformer-based architectures and benchmarks show that our Fully Quantized Vision Transformer (FQ-ViT) outperforms previous works while using even lower bit-width on attention maps. For instance, we reach 84.89% top-1 accuracy with ViT-L on ImageNet and 50.8 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve near-lossless accuracy (about 1% degradation) on fully quantized vision transformers. The code is available at https://github.com/megvii-research/FQ-ViT.
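
A rough sketch of the Power-of-Two Factor intuition: per-channel variation is absorbed into power-of-two exponents so that a single shared scale (plus cheap bit-shifts at inference) suffices. Hypothetical code for intuition, not the released FQ-ViT kernels:

```python
import torch

def ptf_quantize(x, bits=8):
    """Illustrative PTF-style quantizer: rescale each channel by a power of
    two before quantizing with one shared scale, then undo on dequantization."""
    qmax = 2 ** (bits - 1) - 1
    ch_max = x.abs().amax(dim=tuple(range(x.dim() - 1))).clamp_min(1e-8)
    alpha = torch.floor(torch.log2(ch_max.max() / ch_max))  # per-channel exponents
    scale = ch_max.max() / qmax                             # single shared scale
    q = torch.clamp(torch.round(x * (2.0 ** alpha) / scale), -qmax - 1, qmax)
    return q * scale / (2.0 ** alpha)                       # dequantized tensor
```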

NeurIPS Conference 2022 Conference Paper

Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graphs

  • Ruijie Wang
  • Zheng Li
  • Dachun Sun
  • Shengzhong Liu
  • Jinning Li
  • Bing Yin
  • Tarek Abdelzaher

In this paper, we investigate a realistic but underexplored problem, called few-shot temporal knowledge graph reasoning, that aims to predict future facts for newly emerging entities based on extremely limited observations in evolving graphs. It offers practical value in applications that need to derive instant new knowledge about new entities in temporal knowledge graphs (TKGs) with minimal supervision. The challenges mainly come from the few-shot and time shift properties of new entities. First, the limited observations associated with them are insufficient for training a model from scratch. Second, the potentially dynamic distributions from the initially observable facts to the future facts ask for explicitly modeling the evolving characteristics of new entities. We correspondingly propose a novel Meta Temporal Knowledge Graph Reasoning (MetaTKGR) framework. Unlike prior work that relies on rigid neighborhood aggregation schemes to enhance low-data entity representation, MetaTKGR dynamically adjusts the strategies of sampling and aggregating neighbors from recent facts for new entities, through temporally supervised signals on future facts as instant feedback. Besides, such a meta temporal reasoning procedure goes beyond existing meta-learning paradigms on static knowledge graphs that fail to handle temporal adaptation with large entity variance. We further provide a theoretical analysis and propose a temporal adaptation regularizer to stabilize the meta temporal reasoning over time. Empirically, extensive experiments on three real-world TKGs demonstrate the superiority of MetaTKGR over eight state-of-the-art baselines by a large margin.

IJCAI Conference 2022 Conference Paper

Multi-View Visual Semantic Embedding

  • Zheng Li
  • Caili Guo
  • Zerun Feng
  • Jenq-Neng Hwang
  • Xijun Xue

Visual Semantic Embedding (VSE) is a dominant method for cross-modal vision-language retrieval. Its purpose is to learn an embedding space so that visual data can be embedded in a position close to the corresponding text description. However, there are large intra-class variations in the vision-language data. For example, multiple texts describing the same image may describe it from different views, and the descriptions of different views are often dissimilar. The mainstream VSE method embeds samples from the same class in similar positions, which suppresses intra-class variations and leads to inferior generalization performance. This paper proposes a Multi-View Visual Semantic Embedding (MV-VSE) framework, which learns multiple embeddings for a single visual input and explicitly models intra-class variations. To optimize MV-VSE, a multi-view upper bound loss is proposed, and the multi-view embeddings are jointly optimized while retaining intra-class variations. MV-VSE is plug-and-play and can be applied to various VSE models and loss functions without excessively increasing model complexity. Experimental results on the Flickr30K and MS-COCO datasets demonstrate the superior performance of our framework.

IROS Conference 2022 Conference Paper

Towards Reproducible Evaluations for Flying Drone Controllers in Virtual Environments

  • Zheng Li
  • Yiming Huang 0007
  • Yui-Pan Yau
  • Pan Hui 0001
  • Lik-Hang Lee

Research attention on natural user interfaces (NUIs) for drone flights is rising. Nevertheless, NUIs are highly diversified and are primarily evaluated in different physical environments, making performance hard to compare across solutions. We propose a virtual environment, namely VRFlightSim, enabling comparative evaluations with enriched drone flight details to address this issue. We first replicated a state-of-the-art (SOTA) interface and designed two tasks (crossing and pointing) in our virtual environment. Then, two user studies with 13 participants demonstrate the necessity of VRFlightSim and further highlight the potential of open-data interface designs.

IJCAI Conference 2020 Conference Paper

Efficient and Modularized Training on FPGA for Real-time Applications

  • Shreyas Kolala Venkataramanaiah
  • Xiaocong Du
  • Zheng Li
  • Shihui Yin
  • Yu Cao
  • Jae-Sun Seo

Training deep Convolutional Neural Networks (CNNs) requires a tremendous amount of computation and memory, and thus GPUs are widely used to meet the computation demands of these complex training tasks. However, lacking the flexibility to exploit architectural optimizations, GPUs have poor energy efficiency and are hard to deploy on energy-constrained platforms. FPGAs are highly suitable for training, such as real-time learning at the edge, as they provide higher energy efficiency and better flexibility to support algorithmic evolution. This paper first develops a training accelerator on FPGA, with 16-bit fixed-point computing and various training modules. Furthermore, leveraging model segmentation techniques from Progressive Segmented Training, the newly developed FPGA accelerator is applied to online learning, achieving much lower computation cost. We demonstrate the performance of representative CNNs trained for CIFAR-10 on an Intel Stratix-10 MX FPGA, evaluating both the conventional training procedure and the online learning algorithm.
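
The "16-bit fixed-point computing" can be illustrated with a toy Q8.8 format; the actual accelerator's word split is a design choice, and this one is assumed purely for illustration:

```python
import numpy as np

FRAC_BITS = 8  # Q8.8: 16-bit word, 8 integer bits and 8 fractional bits (assumed)

def to_fixed(x):
    """Convert a float to a saturated 16-bit fixed-point value."""
    return np.int16(np.clip(np.round(x * (1 << FRAC_BITS)), -32768, 32767))

def fixed_mul(a, b):
    # Widen to 32 bits for the product, then shift back down to Q8.8.
    return np.int16((np.int32(a) * np.int32(b)) >> FRAC_BITS)

w, x = to_fixed(1.5), to_fixed(-0.75)
print(fixed_mul(w, x) / (1 << FRAC_BITS))  # -1.125, as expected
```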

IJCAI Conference 2020 Conference Paper

Exploiting Visual Semantic Reasoning for Video-Text Retrieval

  • Zerun Feng
  • Zhimin Zeng
  • Caili Guo
  • Zheng Li

Video retrieval is a challenging research topic bridging the vision and language areas and has attracted broad attention in recent years. Previous works have been devoted to representing videos by directly encoding from frame-level features. In fact, videos consist of various and abundant semantic relations to which existing methods pay less attention. To address this issue, we propose a Visual Semantic Enhanced Reasoning Network (ViSERN) to exploit reasoning between frame regions. Specifically, we consider frame regions as vertices and construct a fully-connected semantic correlation graph. Then, we perform reasoning by novel random walk rule-based graph convolutional networks to generate region features involved with semantic relations. With the benefit of reasoning, semantic interactions between regions are considered, while the impact of redundancy is suppressed. Finally, the region features are aggregated to form frame-level features for further encoding to measure video-text similarity. Extensive experiments on two public benchmark datasets validate the effectiveness of our method by achieving state-of-the-art performance due to the powerful semantic reasoning.
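
A minimal reading of "random walk rule-based graph convolution": normalize the region-affinity matrix into a row-stochastic transition matrix and propagate region features along it. Illustrative only; the paper's random-walk rule is more elaborate:

```python
import numpy as np

def random_walk_gcn_layer(A, H, W):
    """One graph-convolution step over region features H with weights W:
    each region aggregates its neighbors in proportion to random-walk
    transition probabilities derived from the affinity matrix A."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-stochastic transitions
    return np.maximum(P @ H @ W, 0.0)              # ReLU activation
```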

NeurIPS Conference 2019 Conference Paper

Dimension-Free Bounds for Low-Precision Training

  • Zheng Li
  • Christopher De Sa

Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models. Previous work has analyzed low-precision training algorithms, such as low-precision stochastic gradient descent, and derived theoretical bounds on their convergence rates. These bounds tend to depend on the dimension of the model $d$ in that the number of bits needed to achieve a particular error bound increases as $d$ increases. In this paper, we derive new bounds for low-precision training algorithms that do not contain the dimension $d$, which lets us better understand what affects the convergence of these algorithms as parameters scale. Our methods also generalize naturally to let us prove new convergence bounds on low-precision training with other quantization schemes, such as low-precision floating-point computation and logarithmic quantization.
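
Logarithmic quantization, one of the schemes covered by the new bounds, rounds magnitudes to powers of two, so the representable levels are relatively rather than absolutely spaced. A toy quantizer for intuition (assumed parameterization, not the paper's formal scheme):

```python
import numpy as np

def log_quantize(x, bits=4):
    """Round |x| to the nearest power of two, clamping the exponent to the
    range representable with the given bit budget (sign handled separately)."""
    sign = np.sign(x)
    mag = np.abs(x) + 1e-12                       # avoid log2(0)
    e = np.clip(np.round(np.log2(mag)),
                -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return sign * (2.0 ** e)
```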

AAAI Conference 2019 Conference Paper

Exploiting Coarse-to-Fine Task Transfer for Aspect-Level Sentiment Classification

  • Zheng Li
  • Ying Wei
  • Yu Zhang
  • Xiang Zhang
  • Xin Li

Aspect-level sentiment classification (ASC) aims at identifying sentiment polarities towards aspects in a sentence, where the aspect can behave as a general Aspect Category (AC) or a specific Aspect Term (AT). However, due to especially expensive and labor-intensive labeling, existing public AT-level corpora are all relatively small. Meanwhile, most previous methods rely on complicated structures given scarce data, which largely limits the efficacy of the neural models. In this paper, we exploit a new direction named coarse-to-fine task transfer, which aims to leverage knowledge learned from a rich-resource source domain of the coarse-grained AC task, which is more easily accessible, to improve learning in a low-resource target domain of the fine-grained AT task. To resolve both the aspect granularity inconsistency and the feature mismatch between domains, we propose a Multi-Granularity Alignment Network (MGAN). In MGAN, a novel Coarse2Fine attention module, guided by an auxiliary task, helps the AC task model at the same fine-grained level as the AT task. To alleviate false feature alignment, a contrastive feature alignment method is adopted to align aspect-specific feature representations semantically. In addition, a large-scale multi-domain dataset for the AC task is provided. Empirically, extensive experiments demonstrate the effectiveness of the MGAN.

JBHI Journal 2019 Journal Article

MR Image Super-Resolution via Wide Residual Networks With Fixed Skip Connection

  • Jun Shi
  • Zheng Li
  • Shihui Ying
  • Chaofeng Wang
  • Qingping Liu
  • Qi Zhang
  • Pingkun Yan

Spatial resolution is a critical imaging parameter in magnetic resonance imaging (MRI). Image super-resolution (SR) is an effective and cost-efficient alternative technique to improve the spatial resolution of MR images. Over the past several years, convolutional neural network (CNN)-based SR methods have achieved state-of-the-art performance. However, CNNs with very deep network structures usually suffer from the problems of degradation and diminishing feature reuse, which add difficulty to network training and degrade the transmission capability of details for SR. To address these problems, in this work, a progressive wide residual network with a fixed skip connection (named FSCWRN) based SR algorithm is proposed to reconstruct MR images, which combines global residual learning and shallow-network-based local residual learning. The strategy of progressive wide networks is adopted to replace deeper networks, which can partially relax the above-mentioned problems, while a fixed skip connection helps provide rich local details at high frequencies from a fixed shallow-layer network to subsequent networks. The experimental results on one simulated MR image database and three real MR image databases show the effectiveness of the proposed FSCWRN SR algorithm, which achieves improved reconstruction performance compared with other algorithms.

JMLR Journal 2019 Journal Article

PyOD: A Python Toolbox for Scalable Outlier Detection

  • Yue Zhao
  • Zain Nasrullah
  • Zheng Li

PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox's development. PyOD is compatible with both Python 2 and 3 and can be installed through the Python Package Index (PyPI) or https://github.com/yzhao062/pyod.
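
PyOD's unified API is easy to demonstrate. The following usage is adapted from the style of the project's documented examples (exact signatures may differ across versions): fit one detector on unlabeled data, then score unseen points.

```python
from pyod.models.knn import KNN
from pyod.utils.data import generate_data

# Synthetic data with a known fraction of outliers.
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, contamination=0.1)

clf = KNN()                       # any PyOD detector shares this API
clf.fit(X_train)                  # fit on unlabeled training data

y_train_pred = clf.labels_                     # binary labels (0: inlier, 1: outlier)
y_train_scores = clf.decision_scores_          # raw outlier scores on training data
y_test_pred = clf.predict(X_test)              # labels for unseen data
y_test_scores = clf.decision_function(X_test)  # outlier scores for unseen data
```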

AAAI Conference 2018 Conference Paper

Hierarchical Attention Transfer Network for Cross-Domain Sentiment Classification

  • Zheng Li
  • Ying Wei
  • Yu Zhang
  • Qiang Yang

Cross-domain sentiment classification aims to leverage useful information in a source domain to help perform sentiment classification in a target domain that has no or little supervised information. Existing cross-domain sentiment classification methods cannot automatically capture non-pivots, i.e., the domain-specific sentiment words, and pivots, i.e., the domain-shared sentiment words, simultaneously. In order to solve this problem, we propose a Hierarchical Attention Transfer Network (HATN) for cross-domain sentiment classification. The proposed HATN provides a hierarchical attention transfer mechanism that can transfer attentions for emotions across domains by automatically capturing pivots and non-pivots. Besides, the hierarchy of the attention mechanism mirrors the hierarchical structure of documents, which helps locate the pivots and non-pivots better. The proposed HATN consists of two hierarchical attention networks, with one named P-net aiming to find the pivots and the other named NP-net aligning the non-pivots by using the pivots as a bridge. Specifically, P-net first conducts individual attention learning to provide positive and negative pivots for NP-net. Then, P-net and NP-net conduct joint attention learning such that HATN can simultaneously capture pivots and non-pivots and realize transferring attentions for emotions across domains. Experiments on the Amazon review dataset demonstrate the effectiveness of HATN.

IJCAI Conference 2017 Conference Paper

End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification

  • Zheng Li
  • Yu Zhang
  • Ying Wei
  • Yuxiang Wu
  • Qiang Yang

Domain adaptation tasks such as cross-domain sentiment classification have attracted much attention in recent years. Due to the domain discrepancy, a sentiment classifier trained in a source domain may not work well when directly applied to a target domain. Traditional methods need to manually select pivots, which behave in the same way for discriminative learning in both domains. Recently, deep learning methods have been proposed to learn a representation shared by domains. However, they lack the interpretability to directly identify the pivots. To address the problem, we introduce an end-to-end Adversarial Memory Network (AMN) for cross-domain sentiment classification. Unlike existing methods, our approach can automatically capture the pivots using an attention mechanism. Our framework consists of two parameter-shared memory networks: one is for sentiment classification and the other is for domain classification. The two networks are jointly trained so that the selected features minimize the sentiment classification error and at the same time make the domain classifier indiscriminative between the representations from the source or target domains. Moreover, unlike deep learning methods that cannot tell us which words are the pivots, our approach can offer a direct visualization of them. Experiments on the Amazon review dataset demonstrate that our approach can significantly outperform state-of-the-art methods.

AAMAS Conference 2010 Conference Paper

Emotional Eye Movement Markup Language for Virtual Agents

  • Zheng Li
  • Xia Mao

EEMML (Emotional Eye Movement Markup Language) is a scripting tool that enables authors to describe and generate emotional eye movement in virtual agents. EEMML is capable of describing and generating both basic eye movement and emotional eye movement, including primary (joy, sadness, anger, fear, disgust and surprise) and intermediate (emotions that can be represented as the mixture of two primary emotions) emotions for virtual agents. The emotional eye movement generation framework is based upon the MPEG-4 FAP (facial animation parameters), and the animations are driven by parameters picked from the Cohn-Kanade AU-Coded facial expression database as well as real-time eye movement data (pupil size, blink rate and saccade).