Arrow Research search

Author name cluster

Ke Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers


IJCAI Conference 2025 Conference Paper

MATCH: Modality-Calibrated Hypergraph Fusion Network for Conversational Emotion Recognition

  • Jiandong Shi
  • Ming Li
  • Lu Bai
  • Feilong Cao
  • Ke Lu
  • Jiye Liang

Multimodal emotion recognition aims to identify emotions by integrating multimodal features derived from spoken utterances. However, existing work often neglects the calibration of conversational entities, focusing mainly on extracting potential intra- or cross-modal information. This leads to the underutilization of utterance information that is essential for accurately characterizing emotion. Additionally, the lack of effective modeling of conversational patterns limits the ability to capture emotional pathways across contexts, modalities and speakers, impacting the overall emotional understanding. In this study, we propose the modality-calibrated hypergraph fusion network (MATCH), which leverages multimodal fusion and hypergraph learning techniques to address these challenges. In particular, we introduce an entity calibration strategy that refines the representations of conversational entities both at the modality and context levels, allowing for deeper insights into emotion-related cues. Furthermore, we present an emotion-aligned hypergraph fusion method that incorporates a line graph to explore conversational patterns, facilitating flexible knowledge transfer across modalities through hyperedge-level and graph-level alignments. Experiments demonstrate that MATCH outperforms state-of-the-art approaches on two benchmark datasets.
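The hyperedge-level fusion described above builds on hypergraph learning. As generic background only (this is the standard hypergraph convolution of Feng et al., not MATCH's actual fusion module), a minimal numpy sketch with unit hyperedge weights:

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One standard hypergraph convolution layer:
    X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta, with W = I.
    X: (nodes, in_dim) features, H: (nodes, hyperedges) incidence
    matrix, Theta: (in_dim, out_dim) learnable projection.
    """
    Dv = H.sum(axis=1)                      # node degrees
    De = H.sum(axis=0)                      # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv))
    De_inv = np.diag(1.0 / De)
    # Normalized node-to-node propagation through shared hyperedges.
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta
```

In a conversational setting, each hyperedge could group the utterances of one speaker or one modality, so a single layer mixes information across all members of a group at once rather than only along pairwise edges.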

AAAI Conference 2024 Conference Paper

Agile Multi-Source-Free Domain Adaptation

  • Xinyao Li
  • Jingjing Li
  • Fengling Li
  • Lei Zhu
  • Ke Lu

Efficiently utilizing the rich knowledge in pretrained models has become a critical topic in the era of large models. This work focuses on adaptively transferring knowledge from multiple source-pretrained models to an unlabeled target domain without accessing the source data. Despite being a practically useful setting, existing methods require extensive parameter tuning over each source model, which is computationally expensive when facing abundant source domains or larger source models. To address this challenge, we propose a novel approach that is free of parameter tuning over the source backbones. Our technical contribution lies in the Bi-level ATtention ENsemble (Bi-ATEN) module, which learns both intra-domain weights and inter-domain ensemble weights to achieve a fine balance between instance specificity and domain consistency. By slightly tuning only the source bottlenecks, we achieve comparable or even superior performance on the challenging DomainNet benchmark with less than 3% trained parameters and 8 times the throughput compared with the SOTA method. Furthermore, with minor modifications, the proposed module can easily be plugged into existing methods for a performance boost of more than 4%. Code is available at https://github.com/TL-UESTC/Bi-ATEN.
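The core ensemble idea, per-instance weights over multiple source models' predictions, can be illustrated generically. A minimal sketch, not the paper's Bi-ATEN module; `domain_scores` stands in for the attention logits that the module would learn:

```python
import numpy as np

def ensemble_predictions(source_logits, domain_scores):
    """Instance-wise weighted ensemble over source models.

    source_logits: (S, N, C) class logits from S source models for N samples.
    domain_scores: (N, S) unnormalized affinity of each sample to each
                   source domain (learned in the paper; given here).
    """
    # Softmax over sources gives per-instance ensemble weights.
    e = np.exp(domain_scores - domain_scores.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)                 # (N, S)
    # result[n, c] = sum_s w[n, s] * source_logits[s, n, c]
    return np.einsum("ns,snc->nc", w, source_logits)
```

With uniform scores this reduces to plain averaging; instance-specific scores let each target sample lean on the source domains most similar to it, which is the "instance specificity vs. domain consistency" trade-off the abstract mentions.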

AAAI Conference 2023 Conference Paper

Cross-Domain Adaptative Learning for Online Advertisement Customer Lifetime Value Prediction

  • Hongzu Su
  • Zhekai Du
  • Jingjing Li
  • Lei Zhu
  • Ke Lu

Accurate estimation of customer lifetime value (LTV), which reflects the potential consumption of a user over a period of time, is crucial for the revenue management of online advertising platforms. However, predicting LTV in real-world applications is not an easy task since the user consumption data is usually insufficient within a specific domain. To tackle this problem, we propose a novel cross-domain adaptative framework (CDAF) to leverage consumption data from different domains. The proposed method simultaneously mitigates the data-scarcity problem and the distribution-gap problem caused by data from different domains. To be specific, our method first learns an LTV prediction model from a different but related platform with sufficient data. Subsequently, we exploit domain-invariant information to mitigate the data-scarcity problem by minimizing the Wasserstein discrepancy between the encoded user representations of the two domains. In addition, we design a dual-predictor schema that not only enhances domain-invariant information in the semantic space but also preserves domain-specific information for accurate target prediction. The proposed framework is evaluated on five datasets collected from real historical data on the advertising platform of Tencent Games. Experimental results verify that the proposed framework is able to significantly improve the LTV prediction performance on this platform. For instance, our method can boost DCNv2 with an improvement of 13.7% in terms of AUC on dataset G2. Code: https://github.com/TL-UESTC/CDAF.
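The Wasserstein-alignment step can be illustrated in one dimension, where the empirical W1 distance between equal-size samples has a closed form. This is a deliberate simplification; the paper computes the discrepancy between high-dimensional encoded user representations:

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical 1-D Wasserstein-1 distance between two samples of
    equal size: with sorted samples, W1 is the mean absolute
    difference of matched order statistics."""
    u, v = np.sort(np.asarray(u, float)), np.sort(np.asarray(v, float))
    assert u.shape == v.shape, "sketch assumes equal sample sizes"
    return np.abs(u - v).mean()
```

Minimizing such a discrepancy between source-encoded and target-encoded users pushes the encoder toward representations whose distributions overlap, which is what makes the source-trained predictor transferable.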

JBHI Journal 2023 Journal Article

Semi-Supervised Medical Image Segmentation With Voxel Stability and Reliability Constraints

  • Yang Zhao
  • Ke Lu
  • Jian Xue
  • Shuhua Wang
  • Jian Lu

Semi-supervised learning is becoming an effective solution in medical image segmentation because annotations are costly and tedious to acquire. Methods based on the teacher-student model use consistency regularization and uncertainty estimation and have shown good potential in dealing with limited annotated data. Nevertheless, the existing teacher-student model is seriously limited by the exponential moving average (EMA) algorithm, which leads to an optimization trap. Moreover, the classic uncertainty estimation method calculates the global uncertainty for images but does not consider local region-level uncertainty, which is unsuitable for medical images with blurry regions. In this article, the Voxel Stability and Reliability Constraint (VSRC) model is proposed to address these issues. Specifically, the Voxel Stability Constraint (VSC) strategy is introduced to optimize parameters and exchange effective knowledge between two independently initialized models, which can break through the performance bottleneck and avoid model collapse. Moreover, a new uncertainty estimation strategy, the Voxel Reliability Constraint (VRC), is proposed for use in our semi-supervised model to consider the uncertainty at the local region level. We further extend our model to auxiliary tasks and propose a task-level consistency regularization with uncertainty estimation. Extensive experiments on two 3D medical image datasets demonstrate that our method outperforms other state-of-the-art semi-supervised medical image segmentation methods under limited supervision.
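For reference, the exponential-moving-average update that the abstract argues traps optimization is the standard teacher-student one; a minimal sketch with parameters represented as plain dicts of floats:

```python
def ema_update(teacher, student, decay=0.99):
    """Standard EMA teacher update used in teacher-student models:
    teacher <- decay * teacher + (1 - decay) * student, per parameter.
    Because the teacher is a decayed copy of the student, the two
    models stay tightly coupled."""
    for name, value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * value
    return teacher
```

The VSC strategy instead trains two independently initialized models that exchange knowledge, precisely to break the coupling this update enforces.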

AAAI Conference 2022 Conference Paper

Exploring Visual Context for Weakly Supervised Person Search

  • Yichao Yan
  • Jinpeng Li
  • Shengcai Liao
  • Jie Qin
  • Bingbing Ni
  • Ke Lu
  • Xiaokang Yang

Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification. Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available. However, annotating identities is labor-intensive, limiting the practicability and scalability of current frameworks. This paper inventively considers weakly supervised person search with only bounding box annotations. We propose to address this novel task by investigating three levels of context clues (i.e., detection, memory and scene) in unconstrained natural images. The first two are employed to promote local and global discriminative capabilities, while the latter enhances clustering accuracy. Despite its simple design, our CGPS achieves 80.0% mAP on CUHK-SYSU, boosting the baseline model by 8.8%. Surprisingly, it even achieves comparable performance with several supervised person search models. Our code is available at https://github.com/ljpadam/CGPS.

AAAI Conference 2022 Conference Paper

Shape-Adaptive Selection and Measurement for Oriented Object Detection

  • Liping Hou
  • Ke Lu
  • Jian Xue
  • Yuqiu Li

The development of detection methods for oriented object detection remains a challenging task. A considerable obstacle is the wide variation in the shape (e.g., aspect ratio) of objects. Sample selection in general object detection has been widely studied as it plays a crucial role in the performance of the detection method and has achieved great progress. However, existing sample selection strategies still overlook some issues: (1) most of them ignore the object shape information; (2) they do not make a potential distinction between selected positive samples; and (3) some of them can only be applied to either anchor-free or anchor-based methods and cannot be used for both of them simultaneously. In this paper, we propose novel flexible shape-adaptive selection (SA-S) and shape-adaptive measurement (SA-M) strategies for oriented object detection, which comprise an SA-S strategy for sample selection and an SA-M strategy for the quality estimation of positive samples. Specifically, the SA-S strategy dynamically selects samples according to the shape information and characteristics distribution of objects. The SA-M strategy measures the localization potential and adds quality information on the selected positive samples. The experimental results on both anchor-free and anchor-based baselines and four publicly available oriented datasets (DOTA, HRSC2016, UCAS-AOD, and ICDAR2015) demonstrate the effectiveness of the proposed method.

AAAI Conference 2022 Conference Paper

TransZero: Attribute-Guided Transformer for Zero-Shot Learning

  • Shiming Chen
  • Ziming Hong
  • Yang Liu
  • Guo-Sen Xie
  • Baigui Sun
  • Hao Li
  • Qinmu Peng
  • Ke Lu

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero takes a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves the new state of the art on three ZSL benchmarks. The codes are available at: https://github.com/shiming-chen/TransZero.
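Attribute-guided localization of region features can be sketched as plain softmax attention from attribute embeddings onto region features. This is a generic illustration of the idea, not TransZero's visual-semantic decoder:

```python
import numpy as np

def attribute_attention(regions, attrs):
    """For each attribute, attend over image regions and pool.

    regions: (R, d) region features, attrs: (A, d) attribute embeddings.
    Returns (A, d): one locality-augmented feature per attribute.
    """
    scores = attrs @ regions.T                            # (A, R)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # softmax over regions
    return w @ regions
```

Each attribute thereby pulls its feature mostly from the regions it scores highest, which is the "localize the image regions most relevant to each attribute" behavior the abstract describes.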

AAAI Conference 2021 Conference Paper

Balanced Open Set Domain Adaptation via Centroid Alignment

  • Mengmeng Jing
  • Jingjing Li
  • Lei Zhu
  • Zhengming Ding
  • Ke Lu
  • Yang Yang

Open Set Domain Adaptation (OSDA) is a challenging domain adaptation setting which allows the existence of unknown classes on the target domain. Although existing OSDA methods are good at classifying samples of known classes, they ignore the classification ability for unknown samples, making them unbalanced OSDA methods. To alleviate this problem, we propose a balanced OSDA method that recognizes unknown samples while maintaining high classification performance for known samples. Specifically, to reduce the domain gaps, we first project the features into a hyperspherical latent space. In this space, we propose to bound the centroid deviation angles to not only increase the intra-class compactness but also enlarge the inter-class margins. With the bounded centroid deviation angles, we employ statistical Extreme Value Theory to recognize the unknown samples that are misclassified into known classes. In addition, to learn better centroids, we propose an improved centroid update strategy based on sample reweighting and an adaptive update rate to cooperate with centroid alignment. Experimental results on three OSDA benchmarks verify that our method can significantly outperform the compared methods and reduce the proportion of the unknown samples being misclassified into known classes.
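The rejection step can be caricatured as thresholding the tail of centroid-deviation angles observed on known-class data. The sketch below substitutes a simple percentile for the paper's Extreme Value Theory fit, so treat it as a much-simplified stand-in rather than the actual recognizer:

```python
import numpy as np

def fit_tail_threshold(known_angles, tail_frac=0.1):
    """Pick a rejection threshold at the lower edge of the extreme
    tail of centroid-deviation angles measured on known-class samples.
    (The paper fits an extreme-value distribution to this tail; a
    percentile is used here for illustration only.)"""
    return np.quantile(np.asarray(known_angles, float), 1.0 - tail_frac)

def classify(angle, threshold):
    """Samples deviating from their nearest centroid by more than the
    threshold are treated as unknown."""
    return "unknown" if angle > threshold else "known"
```

Bounding the deviation angles during training tightens the known-class tail, which is what makes such a threshold discriminative for unknown samples.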

AAAI Conference 2019 Conference Paper

From Zero-Shot Learning to Cold-Start Recommendation

  • Jingjing Li
  • Mengmeng Jing
  • Ke Lu
  • Lei Zhu
  • Yang Yang
  • Zi Huang

Zero-shot learning (ZSL) and cold-start recommendation (CSR) are two challenging problems in computer vision and recommender systems, respectively. In general, they are independently investigated in different communities. This paper, however, reveals that ZSL and CSR are two extensions of the same intension. Both of them, for instance, attempt to predict unseen classes and involve two spaces, one for direct feature representation and the other for supplementary description. Yet there is no existing approach that addresses CSR from the ZSL perspective. This work, for the first time, formulates CSR as a ZSL problem, and a tailor-made ZSL method is proposed to handle CSR. Specifically, we propose a Low-rank Linear Auto-Encoder (LLAE), which tackles three cruxes, i.e., domain shift, spurious correlations and computing efficiency. LLAE consists of two parts: a low-rank encoder that maps user behavior into user attributes and a symmetric decoder that reconstructs user behavior from user attributes. Extensive experiments on both ZSL and CSR tasks verify that the proposed method is a win-win formulation, i.e., not only can CSR be handled by ZSL models with a significant performance improvement compared with several conventional state-of-the-art methods, but the consideration of CSR can benefit ZSL as well.
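The symmetric encoder/decoder structure mirrors the well-known semantic-autoencoder trick, where tying the decoder to the encoder's transpose yields a closed-form solution. Under an assumed objective of this family (the paper's exact low-rank formulation may differ), minimizing ||X - WᵀS||² + λ||WX - S||² gives the Sylvester equation (SSᵀ)W + W(λXXᵀ) = (1+λ)SXᵀ, solved below by Kronecker vectorization for small dimensions:

```python
import numpy as np

def llae_style_encoder(X, S, lam=1.0):
    """Closed-form encoder W for a symmetric linear auto-encoder.

    X: (d, n) user behavior, S: (k, n) user attributes -> W: (k, d),
    with the decoder fixed to W.T. Solves A W + W B = Q where
    A = S S^T, B = lam * X X^T, Q = (1 + lam) * S X^T.
    """
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T
    B = lam * (X @ X.T)
    Q = (1.0 + lam) * (S @ X.T)
    # Row-major vectorization: vec(A W + W B) = (A (x) I + I (x) B^T) vec(W).
    M = np.kron(A, np.eye(d)) + np.kron(np.eye(k), B.T)
    return np.linalg.solve(M, Q.ravel()).reshape(k, d)
```

The Kronecker solve is O((kd)³) and only suitable for toy sizes; at scale one would use a dedicated Sylvester solver, which is where the paper's emphasis on computing efficiency comes in.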

AAAI Conference 2018 Conference Paper

Action Recognition With Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion

  • Weiyao Lin
  • Chongyang Zhang
  • Ke Lu
  • Bin Sheng
  • Jianxin Wu
  • Bingbing Ni
  • Xin Liu
  • Hongkai Xiong

Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep-based framework for action recognition, which improves the recognition accuracy by: 1) deriving more precise features for representing actions, and 2) reducing the asynchrony between different information streams. We first introduce a coarse-to-fine network which extracts shared deep features at different action class granularities and progressively integrates them to obtain a more accurate feature representation for input actions. We further introduce an asynchronous fusion network. It fuses information from different streams by asynchronously integrating stream-wise features at different time points, hence better leveraging the complementary information in different streams. Experimental results on action recognition benchmarks demonstrate that our approach achieves the state-of-the-art performance.

IJCAI Conference 2016 Conference Paper

Joint Feature Selection and Structure Preservation for Domain Adaptation

  • Jingjing Li
  • Jidong Zhao
  • Ke Lu

The essence of domain adaptation is to explore common latent factors shared by the involved domains. These factors can be specific features or geometric structures. Most previous methods exploit either the shared features or the shared geometric structures separately. However, the two strategies are complementary, and jointly exploring them is preferable. This paper proposes a novel approach, named joint Feature Selection and Structure Preservation (FSSP), for unsupervised domain adaptation. FSSP smoothly integrates structure preservation and feature selection into a unified optimization problem. Intensive experiments on text categorization, image classification and video event recognition demonstrate that our method performs better, with up to 30% improvement on average, compared with the state-of-the-art methods.

TIST Journal 2015 Journal Article

A Real-Time Hand Posture Recognition System Using Deep Neural Networks

  • Ao Tang
  • Ke Lu
  • Yufei Wang
  • Jie Huang
  • Houqiang Li

Hand posture recognition (HPR) is quite a challenging task, due to both the difficulty in detecting and tracking hands with normal cameras and the limitations of traditional manually selected features. In this article, we propose a two-stage HPR system for Sign Language Recognition using a Kinect sensor. In the first stage, we propose an effective algorithm to implement hand detection and tracking. The algorithm incorporates both color and depth information, without specific requirements on uniform-colored or stable background. It can handle the situations in which hands are very close to other parts of the body or hands are not the nearest objects to the camera and allows for occlusion of hands caused by faces or other hands. In the second stage, we apply deep neural networks (DNNs) to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. Experiments verify that the proposed system works quickly and accurately and achieves a recognition accuracy as high as 98.12%.