Arrow Research search

Author name cluster

Zhiming Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers

13

AAAI Conference 2026 Conference Paper

OneFont: A Unified Agent for End-to-End Font Creation

  • Yingxin Lai
  • Yufei Liu
  • Guoqing Yang
  • Jiaxing Chai
  • Zhiming Luo
  • Shaozi Li

Despite recent advancements in font generation, practitioners still grapple with a laborious trial-and-error workflow. To streamline this, we propose OneFont, an end-to-end framework that interprets user intents via free-form dialogue, seamlessly integrating both glyph synthesis and refinement modules. We introduce the Font with Thought (FwT) paradigm, reframing font design as a reasoning task where the model plans actions and articulates design rationales. OneFont’s core planner is trained via a two-stage regimen to master this paradigm. First, we instill reasoning abilities via Supervised Fine-Tuning (SFT) on a new, comprehensive benchmark of 1,500 font families we built. Second, we refine the model's policy with a novel reinforcement learning algorithm, Group Relative Policy Optimization (GRPO), guided by a hybrid reward that assesses visual fidelity, rationale coherence, and transformation correctness. Extensive experiments show OneFont significantly surpasses existing methods in design quality and stroke precision across diverse scripts, validated on our new benchmark. We will release our dataset, code, and models.
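The group-relative advantage computation that gives GRPO its name can be sketched in a few lines: rewards for a group of sampled outputs are standardized within the group, so no learned value function is needed. This is a minimal illustration (the function name and epsilon are assumptions; the paper's hybrid reward and policy update are omitted):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each sampled output = its reward standardized within the group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Outputs rewarded above the group mean get positive advantages and are reinforced; below-mean outputs are suppressed.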

EAAI Journal 2025 Journal Article

Hierarchical vertical-aware and adaptive multi-scale network for three-dimensional object detection in maritime environments

  • Yutang Wang
  • Hangbin Wu
  • Shida Wang
  • Yuanhang Kong
  • Yifan Liu
  • Zhiming Luo
  • Chun Liu

Accurate three-dimensional (3D) object detection in maritime environments is critical for autonomous navigation. However, it remains challenging because of sparse point clouds, complex vertical structures, and extreme object scale variations. Existing 3D detectors are primarily designed for road scenes and often perform poorly in such conditions. Therefore, we propose a Hierarchical Vertical-aware and Adaptive Multi-scale Network (HVAM-Net), an anchor-free, single-stage deep learning framework tailored for maritime scenarios. HVAM-Net integrates three core modules: (1) a Hierarchical Pillar Encoding module that enhances vertical representation via exponential stratification and semantic-aware fusion; (2) an Adaptive Multi-scale Feature Extraction module that captures diverse spatial contexts via parallel atrous convolutions and attention-guided fusion; and (3) an Attention-Guided Dynamic Sampling module that refines upsampling by learning adaptive spatial offsets, enhancing semantic consistency in sparse regions. The effectiveness of HVAM-Net is validated through comprehensive comparisons with state-of-the-art 3D object detection methods. Experiments show that HVAM-Net achieves mean Average Precision scores of 86.7%, 78%, and 88% on the self-collected, Thames River vessel, and simulated datasets, respectively, outperforming all baseline methods. Moreover, its resilience under adverse weather conditions and varying light detection and ranging configurations further confirms the strong generalization capability of this artificial intelligence-based approach in real-world maritime environments.
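One way to read "exponential stratification" in the pillar encoder is height-bin edges whose widths grow geometrically with altitude, giving finer vertical resolution near the water surface. This is an illustrative guess at such a scheme, not the paper's implementation (all names and the base parameter are assumptions):

```python
import numpy as np

def exponential_height_bins(z_min, z_max, n_bins, base=2.0):
    # Bin widths grow geometrically: finer bins near z_min, coarser higher up.
    t = (base ** np.linspace(0.0, 1.0, n_bins + 1) - 1.0) / (base - 1.0)
    return z_min + t * (z_max - z_min)
```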

AAAI Conference 2025 Conference Paper

Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail

  • Yina He
  • Lei Peng
  • Yongcun Zhang
  • Juanjuan Weng
  • Shaozi Li
  • Zhiming Luo

Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of this task lies in the severe lack of features for tail classes, leading to confusion with OOD data. To tackle this issue, we introduce a novel Prioritizing Attention to Tail (PATT) method using augmentation instead of reduction. Our main intuition involves using a mixture of von Mises-Fisher (vMF) distributions to model the ID data and a temperature scaling module to boost the confidence of ID data. This enables us to generate infinite contrastive pairs, implicitly enhancing the semantics of ID classes while promoting differentiation between ID and OOD data. To further strengthen the detection of OOD data without compromising the classification performance of ID data, we propose feature calibration during the inference phase. By extracting an attention weight from the training set that prioritizes the tail classes and reduces the confidence in OOD data, we improve the OOD detection capability. Extensive experiments verified that our method outperforms the current state-of-the-art methods on various benchmarks.
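The vMF-plus-temperature intuition can be sketched as a maximum-softmax ID score over vMF class means: on the unit sphere, the vMF log-density is proportional to kappa times the cosine to the class mean, and temperature scaling sharpens the resulting posterior. The kappa, temperature, and function name here are illustrative assumptions, not the paper's values:

```python
import numpy as np

def vmf_id_score(x, class_means, kappa=16.0, temperature=0.5):
    # vMF log-density on the unit sphere is kappa * mu_c^T x (up to a constant);
    # temperature scaling sharpens the class posterior, boosting ID confidence.
    x = x / np.linalg.norm(x)
    mus = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    logits = kappa * (mus @ x) / temperature
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return float(p.max() / p.sum())      # max softmax probability as ID score
```

A feature aligned with one class mean scores near 1; a feature equidistant from all means (OOD-like) scores much lower.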

NeurIPS Conference 2024 Conference Paper

Cross-Modality Perturbation Synergy Attack for Person Re-identification

  • Yunpeng Gong
  • Zhun Zhong
  • Yansong Qu
  • Zhiming Luo
  • Rongrong Ji
  • Min Jiang

In recent years, there has been significant research focusing on addressing security concerns in single-modal person re-identification (ReID) systems that are based on RGB images. However, the safety of cross-modality scenarios, which are more commonly encountered in practical applications involving images captured by infrared cameras, has not received adequate attention. The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities. For instance, infrared images are typically grayscale, unlike visible images that contain color information. Existing attack methods have primarily focused on the characteristics of the visible image modality, overlooking the features of other modalities and the variations in data distribution among different modalities. This oversight can potentially undermine the effectiveness of these methods in image retrieval across diverse modalities. This study represents the first exploration into the security of cross-modality ReID models and proposes a universal perturbation attack specifically designed for cross-modality ReID. This attack optimizes perturbations by leveraging gradients from diverse modality data, thereby disrupting the discriminator and reinforcing the differences between modalities. We conducted experiments on three widely used cross-modality datasets, namely RegDB, SYSU, and LLCM. The results not only demonstrate the effectiveness of our method but also provide insights for future improvements in the robustness of cross-modality ReID systems.

AAAI Conference 2024 Conference Paper

Diversity-Authenticity Co-constrained Stylization for Federated Domain Generalization in Person Re-identification

  • Fengxiang Yang
  • Zhun Zhong
  • Zhiming Luo
  • Yifan He
  • Shaozi Li
  • Nicu Sebe

This paper tackles the problem of federated domain generalization in person re-identification (FedDG re-ID), aiming to learn a model generalizable to unseen domains with decentralized source domains. Previous methods mainly focus on preventing local overfitting. However, the direction of diversifying local data through stylization for model training is largely overlooked. This direction is popular in domain generalization but encounters two issues under the federated scenario: (1) Most stylization methods require the centralization of multiple domains to generate novel styles, which is not applicable under the decentralized constraint. (2) The authenticity of generated data cannot be ensured, especially given limited local data, which may impair the model optimization. To solve these two problems, we propose the Diversity-Authenticity Co-constrained Stylization (DACS), which can generate diverse and authentic data for learning a robust local model. Specifically, we deploy a style transformation model on each domain to generate novel data with two constraints: (1) A diversity constraint is designed to increase data diversity, which enlarges the Wasserstein distance between the original and transformed data; (2) An authenticity constraint is proposed to ensure data authenticity, which enforces the transformed data to be easily/hardly recognized by the local-side global/local model. Extensive experiments demonstrate the effectiveness of the proposed DACS and show that DACS achieves state-of-the-art performance for FedDG re-ID.
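For two equal-size 1-D samples, the Wasserstein-1 distance used in such a diversity constraint has a simple closed form: the mean absolute difference of the sorted samples. A minimal sketch under the assumption of scalar feature statistics (the paper's actual constraint operates on learned features):

```python
import numpy as np

def wasserstein_1d(a, b):
    # W1 between two equal-size 1-D empirical distributions equals the
    # mean absolute difference of the sorted samples.
    return float(np.abs(np.sort(a) - np.sort(b)).mean())
```

A diversity constraint would then maximize this distance between original and stylized features, i.e., minimize its negative as a loss term.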

IJCAI Conference 2024 Conference Paper

TSESNet: Temporal-Spatial Enhanced Breast Tumor Segmentation in DCE-MRI Using Feature Perception and Separability

  • Jiezhou He
  • Xue Zhao
  • Zhiming Luo
  • Songzhi Su
  • Shaozi Li
  • Guojun Zhang

Accurate segmentation of breast tumors in dynamic contrast-enhanced magnetic resonance images (DCE-MRI) is critical for early diagnosis of breast cancer. However, this task remains challenging due to the wide range of tumor sizes, shapes, and appearances. Additionally, the complexity is further compounded by the high dimensionality and ill-posed artifacts present in DCE-MRI data. Furthermore, accurately modeling features in DCE-MRI sequences presents a challenge that hinders the effective representation of essential tumor characteristics. Therefore, this paper introduces a novel Temporal-Spatial Enhanced Network (TSESNet) for breast tumor segmentation in DCE-MRI. TSESNet leverages the spatial and temporal dependencies of DCE-MRI to provide a comprehensive representation of tumor features. To address sequence modeling challenges, we propose a Temporal-Spatial Contrastive Loss (TSCLoss) that maximizes the distance between different classes and minimizes the distance within the same class, thereby improving the separation between tumors and the background. Moreover, we design a novel Temporal Series Feature Fusion (TSFF) module that effectively integrates temporal MRI features from multiple time points, enhancing the model's ability to handle temporal sequences and improving overall performance. Finally, we introduce a simple and effective Tumor-Aware (TA) module that enriches feature representation to accommodate tumors of various sizes. We conducted comprehensive experiments to validate the proposed method and demonstrate its superior performance compared to recent state-of-the-art segmentation methods on two breast cancer DCE-MRI datasets.
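The TSCLoss idea, pulling same-class (tumor or background) features together while pushing the two classes apart, can be sketched as a pairwise contrastive loss. The margin and function name are illustrative, not the paper's exact formulation:

```python
import numpy as np

def ts_contrastive_loss(feats, labels, margin=1.0):
    # Same-class pairs are pulled together; different-class pairs are pushed
    # beyond a margin, improving tumor/background separability.
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            if labels[i] == labels[j]:
                loss += d ** 2
            else:
                loss += max(0.0, margin - d) ** 2
            count += 1
    return loss / count
```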

AAAI Conference 2023 Conference Paper

Cross-Modality Earth Mover’s Distance for Visible Thermal Person Re-identification

  • Yongguo Ling
  • Zhun Zhong
  • Zhiming Luo
  • Fengxiang Yang
  • Donglin Cao
  • Yaojin Lin
  • Shaozi Li
  • Nicu Sebe

Visible thermal person re-identification (VT-ReID) suffers from inter-modality discrepancy and intra-identity variations. Distribution alignment is a popular solution for VT-ReID, however, it is usually restricted to the influence of the intra-identity variations. In this paper, we propose the Cross-Modality Earth Mover's Distance (CM-EMD) that can alleviate the impact of the intra-identity variations during modality alignment. CM-EMD selects an optimal transport strategy and assigns high weights to pairs that have a smaller intra-identity variation. In this manner, the model will focus on reducing the inter-modality discrepancy while paying less attention to intra-identity variations, leading to a more effective modality alignment. Moreover, we introduce two techniques to improve the advantage of CM-EMD. First, Cross-Modality Discrimination Learning (CM-DL) is designed to overcome the discrimination degradation problem caused by modality alignment. By reducing the ratio between intra-identity and inter-identity variances, CM-DL leads the model to learn more discriminative representations. Second, we construct the Multi-Granularity Structure (MGS), enabling us to align modalities from both coarse- and fine-grained levels with the proposed CM-EMD. Extensive experiments show the benefits of the proposed CM-EMD and its auxiliary techniques (CM-DL and MGS). Our method achieves state-of-the-art performance on two VT-ReID benchmarks.
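With uniform mass on two equal-size feature sets, the Earth Mover's Distance reduces to a minimum-cost perfect matching. A brute-force sketch for tiny sets follows; the paper's intra-identity weighting of the transport cost is omitted, and the function name is an assumption:

```python
import numpy as np
from itertools import permutations

def cm_emd(vis_feats, thr_feats):
    # Uniform-mass EMD between two equal-size sets = min-cost perfect matching;
    # brute-force enumeration is fine for tiny sets (use an OT solver in practice).
    n = len(vis_feats)
    cost = np.linalg.norm(vis_feats[:, None, :] - thr_feats[None, :, :], axis=-1)
    best = min(sum(cost[i, p[i]] for i in range(n)) for p in permutations(range(n)))
    return best / n
```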

AAAI Conference 2023 Conference Paper

Exploring Non-target Knowledge for Improving Ensemble Universal Adversarial Attacks

  • Juanjuan Weng
  • Zhiming Luo
  • Zhun Zhong
  • Dazhen Lin
  • Shaozi Li

The ensemble attack with average weights can be leveraged for increasing the transferability of universal adversarial perturbation (UAP) by training with multiple Convolutional Neural Networks (CNNs). However, after analyzing the Pearson Correlation Coefficients (PCCs) between the ensemble logits and individual logits of the crafted UAP trained by the ensemble attack, we find that one CNN plays a dominant role during the optimization. Consequently, this average weighted strategy will weaken the contributions of other CNNs and thus limit the transferability for other black-box CNNs. To deal with this bias issue, the primary attempt is to leverage the Kullback–Leibler (KL) divergence loss to encourage the joint contribution from different CNNs, which is still insufficient. After decoupling the KL loss into a target-class part and a non-target-class part, the main issue lies in that the non-target knowledge will be significantly suppressed due to the increasing logit of the target class. In this study, we simply adopt a KL loss that only considers the non-target classes for addressing the dominant bias issue. Besides, to further boost the transferability, we incorporate the min-max learning framework to self-adjust the ensemble weights for each CNN. Experimental results validate that considering the non-target KL loss can achieve superior transferability than the original KL loss by a large margin, and the min-max training can provide a mutual benefit in adversarial ensemble attacks. The source code is available at: https://github.com/WJJLL/ND-MM.
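The non-target KL term described above can be sketched by dropping the target class from both distributions and renormalizing before computing KL divergence. This is a minimal illustration, not the released ND-MM code:

```python
import numpy as np

def non_target_kl(p_ens, p_ind, target):
    # Drop the target class and renormalize over the non-target classes,
    # so the growing target logit cannot suppress non-target knowledge.
    mask = np.ones_like(p_ens, dtype=bool)
    mask[target] = False
    q1 = p_ens[mask] / p_ens[mask].sum()
    q2 = p_ind[mask] / p_ind[mask].sum()
    return float(np.sum(q1 * np.log(q1 / q2)))
```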

IJCAI Conference 2021 Conference Paper

A Multi-Constraint Similarity Learning with Adaptive Weighting for Visible-Thermal Person Re-Identification

  • Yongguo Ling
  • Zhiming Luo
  • Yaojin Lin
  • Shaozi Li

The challenges of visible-thermal person re-identification (VT-ReID) lie in the inter-modality discrepancy and the intra-modality variations. An appropriate metric learning plays a crucial role in optimizing the feature similarity between the two modalities. However, most existing metric learning-based methods mainly constrain the similarity between individual instances or class centers, which are inadequate to explore the rich data relationships in the cross-modality data. Besides, most of these methods fail to consider the importance of different pairs, incurring an inefficiency and ineffectiveness of optimization. To address these issues, we propose a Multi-Constraint (MC) similarity learning method that jointly considers the cross-modality relationships from three different aspects, i.e., Instance-to-Instance (I2I), Center-to-Instance (C2I), and Center-to-Center (C2C). Moreover, we devise an Adaptive Weighting Loss (AWL) function to implement the MC efficiently. In the AWL, we first use an adaptive margin pair mining to select informative pairs and then adaptively adjust weights of mined pairs based on their similarity. Finally, the mined and weighted pairs are used for the metric learning. Extensive experiments on two benchmark datasets demonstrate the superior performance of the proposed method over the state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Learning to Attack Real-World Models for Person Re-identification via Virtual-Guided Meta-Learning

  • Fengxiang Yang
  • Zhun Zhong
  • Hong Liu
  • Zheng Wang
  • Zhiming Luo
  • Shaozi Li
  • Nicu Sebe
  • Shin'ichi Satoh

Recent advances in person re-identification (re-ID) have led to impressive retrieval accuracy. However, existing re-ID models are challenged by the adversarial examples crafted by adding quasi-imperceptible perturbations. Moreover, re-ID systems face the domain shift issue that training and testing domains are not consistent. In this study, we argue that learning powerful attackers with high universality that works well on unseen domains is an important step in promoting the robustness of re-ID systems. Therefore, we introduce a novel universal attack algorithm called “MetaAttack” for person re-ID. MetaAttack can mislead re-ID models on unseen domains by a universal adversarial perturbation. Specifically, to capture common patterns across different domains, we propose a meta-learning scheme to seek the universal perturbation via the gradient interaction between meta-train and meta-test formed by two datasets. We also take advantage of a virtual dataset (PersonX), instead of real ones, to conduct meta-test. This scheme not only enables us to learn with more comprehensive variation factors but also mitigates the negative effects caused by biased factors of real datasets. Experiments on three large-scale re-ID datasets demonstrate the effectiveness of our method in attacking re-ID models on unseen domains. Our final visualization results reveal some new properties of existing re-ID systems, which can guide us in designing a more robust re-ID model. Code and supplemental material are available at https://github.com/FlyingRoastDuck/MetaAttack_AAAI21.

IJCAI Conference 2021 Conference Paper

Text-based Person Search via Multi-Granularity Embedding Learning

  • Chengji Wang
  • Zhiming Luo
  • Yaojin Lin
  • Shaozi Li

Most existing text-based person search methods highly depend on exploring the corresponding relations between the regions of the image and the words in the sentence. However, these methods correlate image regions and words at the same semantic granularity, which 1) results in irrelevant correspondences between image and text, and 2) causes an ambiguous embedding problem. In this study, we propose a novel multi-granularity embedding learning model for text-based person search. It generates multi-granularity embeddings of partial person bodies in a coarse-to-fine manner by revisiting the person image at different spatial scales. Specifically, we distill the partial knowledge from image strips to guide the model to select the semantically relevant words from the text description. It can learn discriminative and modality-invariant visual-textual embeddings. In addition, we integrate the partial embeddings at each granularity and perform multi-granularity image-text matching. Extensive experiments validate the effectiveness of our method, which can achieve new state-of-the-art performance by the learned discriminative partial embeddings.

AAAI Conference 2020 Conference Paper

Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification

  • Fengxiang Yang
  • Ke Li
  • Zhun Zhong
  • Zhiming Luo
  • Xing Sun
  • Hao Cheng
  • Xiaowei Guo
  • Feiyue Huang

Person re-identification (re-ID) is a challenging task due to the high variance within identity samples and imaging conditions. Although recent advances in deep learning have achieved remarkable accuracy in settled scenes, i.e., the source domain, few works can generalize well on the unseen target domain. One popular solution is assigning unlabeled target images with pseudo labels by clustering, and then retraining the model. However, clustering methods tend to introduce noisy labels and discard low confidence samples as outliers, which may hinder the retraining process and thus limit the generalization ability. In this study, we argue that by explicitly adding a sample filtering procedure after the clustering, the mined examples can be much more efficiently used. To this end, we design an asymmetric co-teaching framework, which resists noisy labels by cooperating two models to select data with possibly clean labels for each other. Meanwhile, one of the models receives samples as pure as possible, while the other takes in samples as diverse as possible. This procedure encourages that the selected training samples can be both clean and miscellaneous, and that the two models can promote each other iteratively. Extensive experiments show that the proposed framework can consistently benefit most clustering based methods, and boost the state-of-the-art adaptation accuracy. Our code is available at https://github.com/FlyingRoastDuck/ACT_AAAI20.
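The co-teaching core, each model forwarding its small-loss (likely clean) samples to its peer, can be sketched as below. The asymmetric pure-vs-diverse selection is omitted for brevity, and the names and keep ratio are illustrative assumptions:

```python
import numpy as np

def co_select(losses_a, losses_b, keep_ratio=0.6):
    # Each model picks its small-loss (likely clean) samples for the peer,
    # so neither model reinforces its own noisy pseudo labels.
    k = int(len(losses_a) * keep_ratio)
    for_b = np.argsort(losses_a)[:k]  # model A selects a training set for B
    for_a = np.argsort(losses_b)[:k]  # model B selects a training set for A
    return for_a, for_b
```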

JBHI Journal 2019 Journal Article

Convolutional Neural Network With Shape Prior Applied to Cardiac MRI Segmentation

  • Clement Zotti
  • Zhiming Luo
  • Alain Lalande
  • Pierre-Marc Jodoin

In this paper, we present a novel convolutional neural network architecture to segment images from a series of short-axis cardiac magnetic resonance slices (CMRI). The proposed model is an extension of the U-net that embeds a cardiac shape prior and involves a loss function tailored to the cardiac anatomy. Since the shape prior is computed offline only once, the execution of our model is not limited by its calculation. Our system takes as input raw magnetic resonance images, requires no manual preprocessing or image cropping, and is trained to segment the endocardium and epicardium of the left ventricle, the endocardium of the right ventricle, as well as the center of the left ventricle. With its multiresolution grid architecture, the network learns both high- and low-level features useful to register the shape prior as well as accurately localize the borders of the cardiac regions. Experimental results obtained on the Automatic Cardiac Diagnostic Challenge - Medical Image Computing and Computer Assisted Intervention (ACDC-MICCAI) 2017 dataset show that our model segments multi-slice CMRI (left and right ventricle contours) in 0.18 s with an average Dice coefficient of 0.91 and an average 3-D Hausdorff distance of 9.5 mm.
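The reported Dice coefficient measures overlap between predicted and ground-truth masks: twice the intersection over the sum of the two mask sizes. A standard definition in a few lines (the epsilon guard is an implementation choice, not from the paper):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-8):
    # Dice = 2|P ∩ G| / (|P| + |G|); 1.0 means perfect overlap, 0.0 none.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))
```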