Arrow Research

Author name cluster

Weisi Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

DipGuava: Disentangling Personalized Gaussian Features for 3D Head Avatars from Monocular Video

  • Jeonghaeng Lee
  • Seok Keun Choi
  • Zhixuan Li
  • Weisi Lin
  • Sanghoon Lee

While recent 3D head avatar creation methods attempt to animate facial dynamics, they often fail to capture personalized details, limiting realism and expressiveness. To fill this gap, we present DipGuava (Disentangled and Personalized Gaussian UV Avatar), a novel 3D Gaussian head avatar creation method that successfully generates avatars with personalized attributes from monocular video. DipGuava is the first method to explicitly disentangle facial appearance into two complementary components, trained in a structured two-stage pipeline that significantly reduces learning ambiguity and enhances reconstruction fidelity. In the first stage, we learn a stable geometry-driven base appearance that captures global facial structure and coarse expression-dependent variations. In the second stage, the personalized residual details not captured in the first stage are predicted, including high-frequency components and nonlinearly varying features such as wrinkles and subtle skin deformations. These components are fused via dynamic appearance fusion that integrates residual details after deformation, ensuring spatial and semantic alignment. This disentangled design enables DipGuava to generate photorealistic, identity-preserving avatars, consistently outperforming prior methods in both visual quality and quantitative performance, as demonstrated in extensive experiments.

AAAI Conference 2026 Conference Paper

Nighttime Flare Removal via Wavelet-Guided and Gated-Enhanced Spatial-Frequency Fusion Network

  • Yun Liu
  • Guang Yang
  • Tao Li
  • Weisi Lin

Nighttime flares, caused by complex scattering and reflections from artificial light sources, significantly degrade image quality and hinder downstream visual tasks. Existing deflare networks usually struggle to jointly capture and fuse latent spatial and frequency features. In this paper, we propose a novel Wavelet-guided and Gated-enhanced Spatial-frequency Fusion Network (WGSF-Net) for nighttime flare removal. WGSF-Net is primarily composed of two key modules: Wavelet-guided Fusion Block (WFB) and Local-Global Block (LGB). Specifically, WFB integrates a Multi-level Wavelet Enhancement Block (MWEB) and a Spatial-Frequency Fusion Network (SFFN) to effectively extract hierarchical spatial and frequency features through a coarse-to-fine strategy based on multi-level wavelet decomposition. To better suppress flare artifacts, LGB is designed to jointly capture local and global information: a Gated-Enhanced Attention Block (GEAB) selectively amplifies critical local features through a gated network and a difference network, and the subsequent SFFN performs global spatial-frequency fusion via depthwise separable convolution and partial Fourier convolution. This design enables LGB to effectively disentangle flare-corrupted regions and restore fine-grained details, making it particularly suited for challenging real-world flare scenarios. Extensive experiments on both synthetic and real datasets show that WGSF-Net achieves state-of-the-art performance in nighttime flare removal, outperforming existing methods across five evaluation metrics.
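
The WFB's coarse-to-fine extraction builds on multi-level wavelet decomposition. Below is a minimal sketch of that decomposition step using PyWavelets; the fusion modules themselves (MWEB, SFFN, GEAB) are not reproduced, and the function name and toy input are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def multilevel_wavelet_features(image: np.ndarray, levels: int = 3):
    """Decompose a grayscale image into multi-level frequency sub-bands.

    Returns the coarse approximation plus the (LH, HL, HH) detail bands per
    level, which a fusion block could consume from coarse to fine."""
    coeffs = pywt.wavedec2(image, wavelet="haar", level=levels)
    approx, detail_levels = coeffs[0], coeffs[1:]
    return approx, detail_levels

# toy usage: a random stand-in for a nighttime image
img = np.random.rand(256, 256).astype(np.float32)
approx, details = multilevel_wavelet_features(img)
print(approx.shape, [d[0].shape for d in details])
```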

ICLR Conference 2025 Conference Paper

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

  • Zicheng Zhang
  • Haoning Wu 0001
  • Chunyi Li
  • Yingjie Zhou 0003
  • Wei Sun 0029
  • Xiongkuo Min
  • Zijian Chen 0001
  • Xiaohong Liu 0001

How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models. Given the high costs and extensive time commitments required for user studies, many researchers have turned towards employing large multi-modal models (LMMs) as AIGI evaluators, the precision and validity of which are still questionable. Furthermore, traditional benchmarks often utilize mostly natural-captured content rather than AIGIs to test the abilities of LMMs, leading to a noticeable gap for AIGIs. Therefore, we introduce **A-Bench** in this paper, a benchmark designed to diagnose *whether LMMs are masters at evaluating AIGIs*. Specifically, **A-Bench** is organized under two key principles: 1) Emphasizing both high-level semantic understanding and low-level visual quality perception to address the intricate demands of AIGIs. 2) Various generative models are utilized for AIGI creation, and various LMMs are employed for evaluation, which ensures a comprehensive validation scope. Ultimately, 2,864 AIGIs from 16 text-to-image models are sampled, each paired with question-answers annotated by human experts. We hope that **A-Bench** will significantly enhance the evaluation process and promote the generation quality for AIGIs.

JBHI Journal 2025 Journal Article

Adversarial Exposure Attack on Diabetic Retinopathy Imagery Grading

  • Yupeng Cheng
  • Qing Guo
  • Felix Juefei-Xu
  • Huazhu Fu
  • Shang-Wei Lin
  • Weisi Lin

Diabetic Retinopathy (DR) is a leading cause of vision loss around the world. To help diagnose it, numerous cutting-edge works have built powerful deep neural networks (DNNs) to automatically grade DR via retinal fundus images (RFIs). However, RFIs are commonly affected by camera exposure issues that may lead to incorrect grades. The mis-graded results can potentially pose high risks of aggravating the condition. In this paper, we study this problem from the viewpoint of adversarial attacks. We identify and introduce a novel solution to an entirely new task, termed the adversarial exposure attack, which is able to produce natural exposure images and mislead state-of-the-art DNNs. We validate our proposed method on a real-world public DR dataset with three DNNs, i.e., ResNet50, MobileNet, and EfficientNet, demonstrating that our method achieves high image quality and a high success rate in transferring the attacks. Our method reveals the potential threats to DNN-based automatic DR grading and would benefit the development of exposure-robust DR grading methods in the future.
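
The attack is only described at a high level above. As a rough illustration, the sketch below mounts a generic exposure-style attack by gradient ascent on a smooth multiplicative exposure field; the paper's actual parameterization, constraints, and transfer strategy are not reproduced, and `model`, `image`, and `label` are placeholders.

```python
import torch
import torch.nn.functional as F

def exposure_attack(model, image, label, steps=20, lr=0.05):
    """Sketch: optimize a coarse multiplicative exposure field so a frozen
    DR grading model mispredicts, while keeping the change smooth."""
    model.eval()
    field = torch.zeros(1, 1, 8, 8, requires_grad=True)  # coarse field -> natural-looking exposure
    for _ in range(steps):
        gain = torch.exp(F.interpolate(field, size=image.shape[-2:],
                                       mode="bilinear", align_corners=False))
        adv = (image * gain).clamp(0, 1)
        loss = F.cross_entropy(model(adv), label)   # maximize loss on the true grade
        loss.backward()
        with torch.no_grad():
            field += lr * field.grad.sign()
            field.grad.zero_()
    gain = torch.exp(F.interpolate(field.detach(), size=image.shape[-2:],
                                   mode="bilinear", align_corners=False))
    return (image * gain).clamp(0, 1)
```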

NeurIPS Conference 2025 Conference Paper

BADiff: Bandwidth Adaptive Diffusion Model

  • Xi Zhang
  • Hanwei Zhu
  • Yan Zhong
  • Jiamang Wang
  • Weisi Lin

In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. However, in practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. To address this, we introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth. During training, the model learns to adaptively modulate the denoising process, enabling early-stop sampling that maintains perceptual quality appropriate to the target transmission condition. Our method requires minimal architectural changes and leverages a lightweight quality embedding to guide the denoising trajectory. Experimental results demonstrate that our approach significantly improves the visual fidelity of bandwidth-adapted generations compared to naive early-stopping, offering a promising solution for efficient image delivery in bandwidth-constrained environments. Code is available at: https://github.com/xzhang9308/BADiff.
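
The abstract couples a bandwidth-derived quality level with early-stop sampling. The sketch below shows one way those two pieces could fit together; the bandwidth-to-steps mapping, the 20 Mbps saturation point, and the `denoiser(x, t, cond)` signature are all assumptions, not the paper's design.

```python
import torch

def bandwidth_to_steps(bandwidth_mbps: float, max_steps: int = 50) -> int:
    """Illustrative mapping: less available bandwidth -> fewer denoising steps."""
    frac = min(max(bandwidth_mbps / 20.0, 0.1), 1.0)  # assume ~20 Mbps saturates quality
    return max(1, int(round(frac * max_steps)))

@torch.no_grad()
def early_stop_sample(denoiser, x_t, quality_emb, num_steps):
    """Run only `num_steps` denoising iterations, conditioning each call on a
    lightweight quality embedding (placeholder for the paper's conditioning)."""
    for t in reversed(range(num_steps)):
        x_t = denoiser(x_t, t, quality_emb)
    return x_t
```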

ICRA Conference 2025 Conference Paper

Deep Learning Based Topography Aware Gas Source Localization with Mobile Robot

  • Changhao Tian
  • Annan Wang
  • Han Fan
  • Thomas Wiedemann 0002
  • Yifei Luo
  • Le Yang 0007
  • Weisi Lin
  • Achim J. Lilienthal

Gas source localization in complex environments is critical for applications such as environmental monitoring, industrial safety, and disaster response. Traditional methods often struggle with the challenges posed by a lack of environmental topography integration, especially when interactions between wind and obstacles distort gas dispersion patterns. In this paper, we propose a deep learning-based approach, which leverages spatial context and environmental mapping to enhance gas source localization. By integrating Simultaneous Localization and Mapping (SLAM) with a U-Net-based model, our method predicts the likelihood of gas source locations by analyzing gas sensor data, wind flow, and topography of the environment represented by a 2D occupancy map. We demonstrate the efficacy of our approach using a wheeled robot equipped with a photoionization detector, a LIDAR, and an anemometer, in various scenarios with dynamic wind fields and multiple obstacles. The results show that our approach can robustly locate gas sources, even in challenging environments with fluctuating wind directions, outperforming conventional methods by utilizing topography contextual information. This study underscores the importance of topographical context in gas source localization and offers a flexible and robust solution for real-world applications. Data and code are publicly available.
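
The method feeds a 2D occupancy map together with gas and wind measurements into a U-Net. Below is a minimal sketch of assembling such a multi-channel input grid; the channel choices, grid size, and names are illustrative assumptions rather than the paper's exact input encoding.

```python
import numpy as np

def build_input_grid(occupancy, gas_map, wind_u, wind_v):
    """Stack topography and measurement rasters into a (4, H, W) array:
    occupancy, accumulated gas readings, and the two wind components.
    A U-Net would map this to a source-likelihood heatmap."""
    assert occupancy.shape == gas_map.shape == wind_u.shape == wind_v.shape
    return np.stack([occupancy, gas_map, wind_u, wind_v], axis=0).astype(np.float32)

# toy example on a 64x64 grid
h = w = 64
grid = build_input_grid(np.zeros((h, w)), np.zeros((h, w)),
                        np.full((h, w), 0.5), np.full((h, w), -0.2))
print(grid.shape)  # (4, 64, 64)
```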

NeurIPS Conference 2025 Conference Paper

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

  • Xi Zhang
  • Xiaolin Wu
  • Jiamang Wang
  • Weisi Lin

Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and practical quantization pipeline. Experiments on multiple benchmarks show that our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines, highlighting its effectiveness in deploying large models under stringent resource constraints. Our source code is available in our GitHub repository: https://github.com/xzhang9308/GLVQ.
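
Babai rounding, named above as the approximate nearest-lattice-point search, reduces to rounding in the lattice basis. A minimal NumPy sketch for one weight group follows; the diagonal generation matrix here is a toy example, whereas the paper learns a matrix per group.

```python
import numpy as np

def babai_round(B: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Babai's rounding: quantize w onto the lattice {B z : z integer}
    by rounding the coordinates of w expressed in the basis B."""
    z = np.rint(np.linalg.solve(B, w))   # integer lattice coordinates
    return B @ z                         # reconstructed (quantized) weights

# toy 4-d weight group; a diagonal basis is just per-dimension uniform quantization
B = np.diag([0.05, 0.05, 0.1, 0.1])
w = np.array([0.12, -0.03, 0.26, 0.41])
print(babai_round(B, w))                 # -> [0.10, -0.05, 0.30, 0.40]
```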

ICLR Conference 2025 Conference Paper

Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image

  • Kennard Yanting Chan
  • Fayao Liu
  • Guosheng Lin
  • Chuan-Sheng Foo
  • Weisi Lin

Existing methods for 3D clothed human digitalization perform well when the input image is captured in ideal conditions that assume the lack of any occlusion. However, in reality, images may often have occlusion problems such as incomplete observation of the human subject's full body, self-occlusion by the human subject, and non-frontal body pose. When given such input images, these existing methods fail to perform adequately. Thus, we propose Robust-PIFu, a pixel-aligned implicit model that capitalizes on large-scale, pretrained latent diffusion models to address the challenge of digitalizing human subjects from non-ideal images that suffer from occlusions. Robust-PIFu offers four new contributions. Firstly, we propose a 'disentangling' latent diffusion model. This diffusion model, pretrained on billions of images, takes in any input image and removes external occlusions, such as inter-person occlusions, from that image. Secondly, Robust-PIFu addresses internal occlusions like self-occlusion by introducing a 'penetrating' latent diffusion model. This diffusion model outputs multi-layered normal maps that bypass occlusions caused by the human subject's own limbs or other body parts (i.e. self-occlusion). Thirdly, in order to incorporate such multi-layered normal maps into a pixel-aligned implicit model, we introduce our Layered-Normals Pixel-aligned Implicit Model, which improves the structural accuracy of predicted clothed human meshes. Lastly, Robust-PIFu proposes an optional super-resolution mechanism for the multi-layered normal maps. This addresses scenarios where the input image is of low or inadequate resolution. Though not strictly related to occlusion, this is still an important subproblem. Our experiments show that Robust-PIFu outperforms current SOTA methods both qualitatively and quantitatively. Our code will be released to the public.

NeurIPS Conference 2024 Conference Paper

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

  • Hanwei Zhu
  • Haoning Wu
  • Yixuan Li
  • Zicheng Zhang
  • Baoliang Chen
  • Lingyu Zhu
  • Yuming Fang
  • Guangtao Zhai

While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce an all-around LMM-based NR-IQA model, which is capable of producing qualitatively comparative responses and effectively translating these discrete comparison outcomes into a continuous quality score. Specifically, during training, we propose generating scaled-up comparative instructions by comparing images from the same IQA dataset, allowing for more flexible integration of diverse IQA datasets. Utilizing the established large-scale training corpus, we develop a human-like visual quality comparator. During inference, moving beyond binary choices, we propose a soft comparison method that calculates the likelihood of the test image being preferred over multiple predefined anchor images. The quality score is further optimized by maximum a posteriori estimation with the resulting probability matrix. Extensive experiments on nine IQA datasets validate that the resulting Compare2Score effectively bridges text-defined comparative levels during training with converted single image quality scores for inference, surpassing state-of-the-art IQA models across diverse scenarios. Moreover, we verify that the probability-matrix-based inference conversion not only improves the rating accuracy of Compare2Score but also that of zero-shot general-purpose LMMs, suggesting its intrinsic effectiveness.
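
At inference the model compares the test image against predefined anchors and converts the resulting probabilities into a score. The sketch below uses the average win probability as a crude monotone proxy; the paper's actual conversion is a maximum a posteriori estimate over the full probability matrix, which is not reproduced here.

```python
import numpy as np

def soft_comparison_score(p_test_beats_anchor):
    """Rough proxy: average probability that the test image is preferred over a
    fixed set of anchor images spanning the quality range.  Monotone in quality,
    but Compare2Score instead recovers the score by MAP estimation over the
    probability matrix."""
    p = np.asarray(p_test_beats_anchor, dtype=np.float64)
    return float(p.mean())

# five anchors from worst to best; the LMM reports preference probabilities
print(soft_comparison_score([0.95, 0.8, 0.5, 0.2, 0.05]))  # ~0.5 -> mid quality
```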

AAAI Conference 2024 Conference Paper

Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction

  • Kennard Yanting Chan
  • Fayao Liu
  • Guosheng Lin
  • Chuan Sheng Foo
  • Weisi Lin

Pixel-aligned implicit models, such as PIFu, PIFuHD, and ICON, are used for single-view clothed human reconstruction. These models need to be trained using a sampling training scheme. Existing sampling training schemes either fail to capture thin surfaces (e.g. ears, fingers) or cause noisy artefacts in reconstructed meshes. To address these problems, we introduce Fine Structure-Aware Sampling (FSS), a new sampling training scheme to train pixel-aligned implicit models for single-view human reconstruction. FSS resolves the aforementioned problems by proactively adapting to the thickness and complexity of surfaces. In addition, unlike existing sampling training schemes, FSS shows how the normals of sample points can be capitalized on in the training process to improve results. Lastly, to further improve the training process, FSS proposes a mesh thickness loss signal for pixel-aligned implicit models. It becomes computationally feasible to introduce this loss once a slight reworking of the pixel-aligned implicit function framework is carried out. Our results show that our methods significantly outperform SOTA methods qualitatively and quantitatively. Our code is publicly available at https://github.com/kcyt/FSS.

AAAI Conference 2024 Conference Paper

Iterative Token Evaluation and Refinement for Real-World Super-resolution

  • Chaofeng Chen
  • Shangchen Zhou
  • Liang Liao
  • Haoning Wu
  • Wenxiu Sun
  • Qiong Yan
  • Weisi Lin

Real-world image super-resolution (RWSR) is a long-standing problem, as low-quality (LQ) images often have complex and unidentified degradations. Existing methods such as Generative Adversarial Networks (GANs) and continuous diffusion models each have their own issues: GANs are difficult to train, while continuous diffusion models require numerous inference steps. In this paper, we propose an Iterative Token Evaluation and Refinement (ITER) framework for RWSR, which utilizes a discrete diffusion model operating in the discrete token representation space, i.e., indexes of features extracted from a VQGAN codebook pre-trained with high-quality (HQ) images. We show that ITER is easier to train than GANs and more efficient than continuous diffusion models. Specifically, we divide RWSR into two sub-tasks, i.e., distortion removal and texture generation. Distortion removal involves simple HQ token prediction with LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network. In particular, we propose to include a token evaluation network in the discrete diffusion process. It learns to evaluate which tokens are good restorations and helps to improve the iterative refinement results. Moreover, the evaluation network can first check the status of the distortion removal output and then adaptively select the total refinement steps needed, thereby maintaining a good balance between distortion removal and texture generation. Extensive experimental results show that ITER is easy to train and performs well within just 8 iterative steps.
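
The token evaluation network decides which tokens to keep and when to stop refining. The loop below is a schematic sketch of that evaluate-and-refine idea; `refiner`, `evaluator`, the threshold, and the per-token score convention are placeholders rather than the paper's components.

```python
import torch

@torch.no_grad()
def iterative_refine(tokens, refiner, evaluator, max_iters=8, keep_thresh=0.9):
    """Sketch of ITER-style refinement: at each step, score every token with an
    evaluation network, keep those deemed good restorations, and re-predict the rest."""
    for _ in range(max_iters):
        scores = evaluator(tokens)            # per-token "good restoration" probability
        if scores.min() >= keep_thresh:       # adaptive early exit
            break
        proposal = refiner(tokens)            # refined token indices
        tokens = torch.where(scores >= keep_thresh, tokens, proposal)
    return tokens
```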

ICML Conference 2024 Conference Paper

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

  • Haoning Wu 0001
  • Zicheng Zhang
  • Weixia Zhang
  • Chaofeng Chen
  • Liang Liao
  • Chunyi Li
  • Yixuan Gao
  • Annan Wang

The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual content. While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) on a wide range of related fields, in this work we explore how to teach them to perform visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art accuracy on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed the OneAlign. Our experiments demonstrate the advantage of discrete levels over direct scores in training, and that LMMs can learn beyond the discrete levels and provide effective finer-grained evaluations. Code and weights will be released.
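
Inference with text-defined levels boils down to reading the probabilities the LMM assigns to the rating words and converting them to a scalar. The sketch below shows that conversion as a softmax-weighted expectation; the level names and the 1–5 weights follow the common convention but are illustrative here, not extracted from the released code.

```python
import torch

LEVELS = ["bad", "poor", "fair", "good", "excellent"]   # text-defined rating levels
WEIGHTS = torch.tensor([1., 2., 3., 4., 5.])

def levels_to_score(level_logits: torch.Tensor) -> float:
    """Convert the LMM's logits over the five level tokens into a scalar score
    via a softmax-weighted expectation (an illustrative post-hoc conversion)."""
    probs = torch.softmax(level_logits, dim=-1)
    return float((probs * WEIGHTS).sum())

print(levels_to_score(torch.tensor([0.1, 0.3, 1.2, 2.0, 0.4])))  # ~3.5, leaning toward "good"
```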

ICLR Conference 2024 Conference Paper

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

  • Haoning Wu 0001
  • Zicheng Zhang
  • Erli Zhang 0001
  • Chaofeng Chen
  • Liang Liao
  • Annan Wang
  • Chunyi Li
  • Wenxiu Sun

The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on **low-level visual perception and understanding**. To address this gap, we present **Q-Bench**, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. **_a)_** To evaluate the low-level **_perception_** ability, we construct the **LLVisionQA** dataset, consisting of 2,990 diverse-sourced images, each equipped with a human-asked question focusing on its low-level attributes. We then measure the correctness of MLLMs on answering these questions. **_b)_** To examine the **_description_** ability of MLLMs on low-level information, we propose the **LLDescribe** dataset consisting of long expert-labelled *golden* low-level text descriptions on 499 images, and a GPT-involved comparison pipeline between outputs of MLLMs and the *golden* descriptions. **_c)_** Besides these two tasks, we further measure their visual quality **_assessment_** ability to align with human opinion scores. Specifically, we design a softmax-based strategy that enables MLLMs to predict *quantifiable* quality scores, and evaluate them on various existing image quality assessment (IQA) datasets. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills. However, these skills are still unstable and relatively imprecise, indicating the need for specific enhancements on MLLMs towards these abilities. We hope that our benchmark can encourage the research community to delve deeper to discover and enhance these untapped potentials of MLLMs.
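
For the assessment track, the abstract mentions a softmax-based strategy for producing quantifiable scores. A minimal sketch of one such strategy, contrasting the logits of a positive and a negative word, is shown below; the exact tokens and prompt used by Q-Bench are not reproduced.

```python
import torch

def binary_softmax_score(logit_good: float, logit_poor: float) -> float:
    """Quantification sketch: the predicted quality is the softmax probability
    assigned to the positive word within a positive/negative word pair."""
    logits = torch.tensor([logit_good, logit_poor])
    return float(torch.softmax(logits, dim=0)[0])

print(binary_softmax_score(2.3, 0.7))   # ~0.83 -> fairly high predicted quality
```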

AAAI Conference 2022 Conference Paper

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes

  • Hao Huang
  • Yongtao Wang
  • Zhaoyu Chen
  • Yuze Zhang
  • Yuheng Li
  • Zhi Tang
  • Wei Chu
  • Jingdong Chen

Malicious applications of deepfakes (i.e., technologies generating target facial attributes or entire faces from facial images) have posed a huge threat to individuals' reputation and security. To mitigate these threats, recent studies have proposed adversarial watermarks to combat deepfake models, leading them to generate distorted outputs. Despite achieving impressive results, these adversarial watermarks have low image-level and model-level transferability, meaning that they can protect only one facial image from one specific deepfake model. To address these issues, we propose a novel solution that can generate a Cross-Model Universal Adversarial Watermark (CMUA-Watermark), protecting a large number of facial images from multiple deepfake models. Specifically, we begin by proposing a cross-model universal attack pipeline that attacks multiple deepfake models iteratively. Then, we design a two-level perturbation fusion strategy to alleviate the conflict between the adversarial watermarks generated by different facial images and models. Moreover, we address the key problem in cross-model optimization with a heuristic approach to automatically find the suitable attack step sizes for different models, further weakening the model-level conflict. Finally, we introduce a more reasonable and comprehensive evaluation method to fully test the proposed method and compare it with existing ones. Extensive experimental results demonstrate that the proposed CMUA-Watermark can effectively distort the fake facial images generated by multiple deepfake models while achieving a better performance than existing methods. Our code is available at https://github.com/VDIGPKU/CMUA-Watermark.
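
The pipeline iterates over multiple deepfake models and facial images while fusing their perturbations into one universal watermark. The sketch below is only a coarse illustration of that accumulation loop; the paper's two-level fusion strategy, heuristic step-size search, and actual distortion objective are not reproduced, and all names are placeholders.

```python
import torch

def universal_watermark(models, images, steps=10, step_sizes=None, eps=8 / 255):
    """Coarse sketch: accumulate signed gradients that distort each deepfake
    model's output on each face image, averaged into one shared watermark."""
    watermark = torch.zeros_like(images[0])
    step_sizes = step_sizes or [1 / 255] * len(models)   # the paper searches these heuristically
    for _ in range(steps):
        grad_sum = torch.zeros_like(watermark)
        for model, alpha in zip(models, step_sizes):
            for img in images:
                x = (img + watermark).detach().requires_grad_(True)
                loss = (model(x) - model(img).detach()).abs().mean()  # push the fake output away
                loss.backward()
                grad_sum += alpha * x.grad.sign()
        watermark = (watermark + grad_sum / (len(models) * len(images))).clamp(-eps, eps)
    return watermark
```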

NeurIPS Conference 2022 Conference Paper

S-PIFu: Integrating Parametric Human Models with PIFu for Single-view Clothed Human Reconstruction

  • Kennard Chan
  • Guosheng Lin
  • Haiyu Zhao
  • Weisi Lin

We present three novel strategies to incorporate a parametric body model into a pixel-aligned implicit model for single-view clothed human reconstruction. Firstly, we introduce ray-based sampling, a novel technique that transforms a parametric model into a set of highly informative, pixel-aligned 2D feature maps. Next, we propose a new type of feature based on blendweights. Blendweight-based labels serve as soft human parsing labels and help to improve the structural fidelity of reconstructed meshes. Finally, we show how we can extract and capitalize on body part orientation information from a parametric model to further improve reconstruction quality. Together, these three techniques form our S-PIFu framework, which significantly outperforms state-of-the-art methods in all metrics. Our code is available at https://github.com/kcyt/SPIFu.

IJCAI Conference 2021 Conference Paper

Low Resolution Information Also Matters: Learning Multi-Resolution Representations for Person Re-Identification

  • Guoqing Zhang
  • Yuhao Chen
  • Weisi Lin
  • Arun Chandran
  • Xuan Jing

As a prevailing task in the video surveillance and forensics field, person re-identification (re-ID) aims to match person images captured from non-overlapped cameras. In unconstrained scenarios, person images often suffer from the resolution mismatch problem, i.e., cross-resolution person re-ID. To overcome this problem, most existing methods restore low resolution (LR) images to high resolution (HR) by super-resolution (SR). However, they only focus on the HR feature extraction and ignore the valid information from the original LR images. In this work, we explore the influence of resolutions on feature extraction and develop a novel method for cross-resolution person re-ID called Multi-Resolution Representations Joint Learning (MRJL). Our method consists of a Resolution Reconstruction Network (RRN) and a Dual Feature Fusion Network (DFFN). The RRN uses an input image to construct an HR version and an LR version with an encoder and two decoders, while the DFFN adopts a dual-branch structure to generate person representations from multi-resolution images. Comprehensive experiments on five benchmarks verify the superiority of the proposed MRJL over the relevant state-of-the-art methods.
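
The DFFN adopts a dual-branch structure over the restored HR image and the original LR image. The module below is a toy stand-in showing how such a dual-branch fusion can be wired; the layer choices and dimensions are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Toy dual-branch fusion: embed the super-resolved image and the original
    low-resolution image separately, then fuse by concatenation + projection."""
    def __init__(self, feat_dim=256):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.hr_branch, self.lr_branch = branch(), branch()
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, hr_img, lr_img):
        return self.fuse(torch.cat([self.hr_branch(hr_img), self.lr_branch(lr_img)], dim=1))

# toy usage: a 256-d person representation from a restored HR crop and its LR original
model = DualBranchFusion()
rep = model(torch.rand(1, 3, 256, 128), torch.rand(1, 3, 64, 32))
print(rep.shape)  # torch.Size([1, 256])
```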

JBHI Journal 2016 Journal Article

Cross-Examination for Angle-Closure Glaucoma Feature Detection

  • Swamidoss Issac Niwas
  • Weisi Lin
  • Chee Keong Kwoh
  • C.-C. Jay Kuo
  • Chelvin C. Sng
  • Maria Cecilia Aquino
  • Paul T. K. Chew

Effective feature selection plays a vital role in anterior segment imaging for determining the mechanism involved in angle-closure glaucoma (ACG) diagnosis. This research focuses on the use of redundant features for complex disease diagnosis such as ACG using anterior segment optical coherence tomography images. Both supervised [minimum redundancy maximum relevance (MRMR)] and unsupervised [Laplacian score (L-score)] feature selection algorithms have been cross-examined with different ACG mechanisms. An AdaBoost machine learning classifier is then used for classifying the five classes of ACG mechanism, namely iris roll, lens, pupil block, plateau iris, and no mechanism, using both feature selection methods. The overall accuracy shows the usefulness of the redundant features selected by the L-score method for improved ACG diagnosis, compared to the minimum-redundancy features selected by the MRMR method.
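
The study contrasts the unsupervised Laplacian score with MRMR before AdaBoost classification over five mechanism classes. Below is a generic sketch of the Laplacian-score ranking plus AdaBoost step on synthetic data; it uses a standard L-score formulation and scikit-learn, not the paper's OCT feature pipeline.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.ensemble import AdaBoostClassifier

def laplacian_scores(X, k=5):
    """Unsupervised Laplacian score per feature (lower = better preserves the
    local manifold structure); a standard formulation, used here illustratively."""
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = np.asarray(0.5 * (W + W.T).todense())        # symmetrize the kNN graph
    D = W.sum(axis=1)
    L = np.diag(D) - W
    scores = []
    for f in X.T:
        f_t = f - (f @ D) / D.sum()                  # remove the degree-weighted mean
        scores.append((f_t @ L @ f_t) / max(f_t @ np.diag(D) @ f_t, 1e-12))
    return np.array(scores)

# synthetic stand-in for OCT-derived features: keep the 10 best-ranked features
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 30)), rng.integers(0, 5, size=200)   # 5 mechanism classes
keep = np.argsort(laplacian_scores(X))[:10]
clf = AdaBoostClassifier().fit(X[:, keep], y)
print(clf.score(X[:, keep], y))
```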

IROS Conference 2015 Conference Paper

B-SHOT: A binary feature descriptor for fast and efficient keypoint matching on 3D point clouds

  • Sai Manoj Prakhya
  • Bingbing Liu
  • Weisi Lin

In this paper, we introduce the very first ‘binary’ 3D feature descriptor, B-SHOT, for fast and efficient keypoint matching on 3D point clouds. We propose a binary quantization method that converts a real-valued vector to a binary vector. We apply this method on a state-of-the-art 3D feature descriptor, SHOT [1], and create a new binary 3D feature descriptor. B-SHOT requires 32 times less memory for its representation and is 6 times faster in feature descriptor matching than the SHOT feature descriptor. Experimental evaluation shows that B-SHOT offers comparable keypoint matching performance to that of the state-of-the-art 3D feature descriptors on a standard benchmark dataset.
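
B-SHOT encodes the 352-dimensional SHOT descriptor into bits so matching can use Hamming distance. The paper's quantization applies case-based rules over groups of four values; the sketch below is deliberately naive (thresholding against each group's mean) and only conveys the real-to-binary idea, not the actual encoding.

```python
import numpy as np

def naive_binary_descriptor(shot: np.ndarray) -> np.ndarray:
    """Naive sketch: set a bit for each SHOT dimension whose value exceeds its
    4-value group's mean.  B-SHOT's real encoding uses case-based rules per
    group of four values."""
    groups = shot.reshape(-1, 4)
    bits = (groups > groups.mean(axis=1, keepdims=True)).astype(np.uint8)
    return bits.reshape(-1)

shot = np.random.rand(352).astype(np.float32)   # SHOT descriptors have 352 dimensions
bits = naive_binary_descriptor(shot)
print(bits.shape, bits[:8])                     # 352 bits, comparable via Hamming distance
```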

ICRA Conference 2015 Conference Paper

Sparse Depth Odometry: 3D keypoint based pose estimation from dense depth data

  • Sai Manoj Prakhya
  • Bingbing Liu
  • Weisi Lin
  • Usman Qayyum

This paper presents Sparse Depth Odometry (SDO) to incrementally estimate the 3D pose of a depth camera in indoor environments. SDO relies on 3D keypoints extracted from dense depth data and hence can be used to augment RGB-D camera based visual odometry methods that fail in places where there is no proper illumination. In SDO, our main contribution is the design of the keypoint detection module, which plays a vital role as it condenses the input point cloud to a few keypoints. SDO differs from existing depth-only methods as it does not use the popular signed distance function and can run online, even without a GPU. The new keypoint detection module is designed via keypoint selection, based on extensive theoretical and experimental evaluation. It comprises two existing keypoint detectors, namely SURE [1] and NARF [2], and offers reliable keypoints that describe the scene more comprehensively compared to others. Finally, an extensive performance evaluation of SDO on benchmark datasets with the proposed keypoint detection module is presented and compared with the state-of-the-art.