Arrow Research search

Author name cluster

Zimo Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI Conference 2026 Conference Paper

DMGINE: Day-Memory Guided Nighttime Image Enhancement for Dynamic Traffic Scenes

  • Ruizhou Liu
  • Zhe Wu
  • Zimo Liu
  • Qingfang Zheng
  • Qingming Huang

We introduce the Daytime-Memory Guided Nighttime Image Enhancement (DMGNIE) framework, the first framework that turns long-running daytime surveillance videos of a single intersection into a persistent “daytime memory” to guide nighttime image enhancement in traffic scenes. Our key insight is simple yet powerful: for a static scene, perfectly exposed daytime frames are, pixel for pixel, a high-quality illumination prior for the same location under extreme low light. Due to the complex lighting conditions in real-world traffic scenes, existing low-light image enhancement (LLIE) methods suffer from issues such as overexposure in highlight regions and noise amplification in dark regions, which degrade the performance of downstream computer vision tasks. DMGNIE tackles these issues in two steps: (1) SegBMN, a semantic prior-based background modeling network, distills a clean, static daytime background from hours of video as a scene prior that guides the enhancement of nighttime images; (2) a Foreground Localization-Guided Contrastive Learning module prevents the background prior from interfering with foreground objects during guidance by maximizing the difference between foreground and background features. Finally, we conduct comprehensive experiments on real traffic surveillance datasets from two cities; the results demonstrate that DMGNIE outperforms state-of-the-art baselines and achieves superior performance in challenging low-light conditions.
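
As an illustration of the foreground localization-guided contrastive idea described above, here is a minimal PyTorch-style sketch that pools features inside and outside a foreground mask and penalizes their similarity. The function name, tensor shapes, and temperature are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def foreground_background_contrastive_loss(features, fg_mask, temperature=0.1):
    """Push pooled foreground features away from pooled background features.

    features: (B, C, H, W) feature map from the enhancement network.
    fg_mask:  (B, 1, H, W) soft foreground-localization mask in [0, 1].
    """
    bg_mask = 1.0 - fg_mask
    # Masked average pooling over foreground / background regions.
    fg_feat = (features * fg_mask).sum(dim=(2, 3)) / fg_mask.sum(dim=(2, 3)).clamp(min=1e-6)
    bg_feat = (features * bg_mask).sum(dim=(2, 3)) / bg_mask.sum(dim=(2, 3)).clamp(min=1e-6)
    fg_feat = F.normalize(fg_feat, dim=1)
    bg_feat = F.normalize(bg_feat, dim=1)
    # The loss is minimized when foreground and background representations separate.
    sim = (fg_feat * bg_feat).sum(dim=1) / temperature
    return sim.mean()
```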

ICLR Conference 2025 Conference Paper

An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression

  • Xiang Liu
  • Bin Chen 0011
  • Zimo Liu
  • Yaowei Wang 0001
  • Shu-Tao Xia

3D Gaussian Splatting (3DGS) has developed rapidly in novel view synthesis, attaining high-quality reconstruction and real-time rendering. At the same time, implicit neural representations (INRs) are still some way from becoming practical compressors due to the lack of stream decoding and real-time frame reconstruction on consumer-grade hardware. It remains an open question whether the fast rendering and partial parameter decoding of 3DGS are applicable to video compression. To address these challenges, we propose a Toast-like Sliding Window (TSW) orthographic projection for converting any 3D Gaussian model into a video representation model. This method represents video efficiently by exploiting temporal redundancy through a sliding-window approach. Additionally, the converted model is inherently stream-decodable and offers a higher rendering frame rate than INR methods. Building on TSW, we introduce an end-to-end trainable video compression method, GSVC, which employs deformable Gaussian representations and optical flow guidance to capture dynamic content in videos. Experimental results demonstrate that our method effectively transforms a 3D Gaussian model into a practical video compressor. GSVC further achieves better rate-distortion performance than NeRV on the UVG dataset, while offering higher frame reconstruction speed (+30%~40% fps) and stream decoding. Code is available at https://github.com/actcwlf/GSVC.
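
The sliding-window aspect can be illustrated with a small sketch: frames are grouped into temporal windows, and each window is represented and decoded independently, which is what makes stream decoding possible. The window size and stride below are arbitrary illustrative values, not taken from the paper.

```python
def sliding_windows(num_frames, window_size, stride):
    """Yield (start, end) frame-index ranges, one per temporal window.

    Each window gets its own set of Gaussians, so decoding frame t only
    requires the parameters of the window containing t (stream decoding).
    """
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += stride

# Example: 120 frames, non-overlapping windows of 16 frames.
for s, e in sliding_windows(120, 16, 16):
    print(f"window covers frames [{s}, {e})")
```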

AAAI Conference 2025 Conference Paper

DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval

  • Yating Liu
  • Zimo Liu
  • Xiangyuan Lan
  • Wenming Yang
  • Yaowei Li
  • Qingmin Liao

Text-based person retrieval (TPR) has gained significant attention as a fine-grained and challenging task that closely aligns with practical applications. Tailoring CLIP to the person domain is an emerging research topic thanks to the abundant knowledge from vision-language pre-training, but challenges remain during fine-tuning: (i) previous full-model fine-tuning in TPR is computationally expensive and prone to overfitting; (ii) existing parameter-efficient transfer learning (PETL) for TPR lacks fine-grained feature extraction. To address these issues, we propose Domain-Aware Mixture-of-Adapters (DM-Adapter), which unifies Mixture-of-Experts (MoE) and PETL to enhance fine-grained feature representations while maintaining efficiency. Specifically, a Sparse Mixture-of-Adapters is designed in parallel to the MLP layers in both the vision and language branches, where different experts specialize in distinct aspects of person knowledge to handle features more finely. To encourage the router to exploit domain information effectively and alleviate routing imbalance, a Domain-Aware Router is then developed by building a novel gating function and injecting learnable domain-aware prompts. Extensive experiments show that DM-Adapter achieves state-of-the-art performance, outperforming previous methods by a significant margin.
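
To make the architecture concrete, here is a minimal PyTorch sketch of a sparse mixture-of-adapters placed in parallel to a frozen MLP layer, with a simple top-k router. The class names, bottleneck width, expert count, and routing rule are illustrative assumptions; the paper's domain-aware prompts and gating function are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.up(F.gelu(self.down(x)))

class SparseMixtureOfAdapters(nn.Module):
    """Route each token to its top-k adapter 'experts', in parallel to an MLP."""
    def __init__(self, dim, num_experts=4, top_k=1):
        super().__init__()
        self.experts = nn.ModuleList(Adapter(dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, tokens, dim)
        gate_logits = self.router(x)           # (B, T, E)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)   # (B, T, 1)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out  # added residually to the frozen MLP output
```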

AAAI Conference 2025 Conference Paper

Pre-Trained Vision-Language Models as Noisy Partial Annotators

  • Qian-Wei Wang
  • Yuqiu Xie
  • Letian Zhang
  • Zimo Liu
  • Shu-Tao Xia

In noisy partial label learning, each training sample is associated with a set of candidate labels, and the ground-truth label may be contained within this set. With the emergence of powerful pre-trained vision-language models, e.g., CLIP, it is natural to consider using these models to automatically label training samples instead of relying on laborious manual annotation. In this paper, we investigate the pipeline of learning from CLIP-annotated noisy partial labels and propose a novel collaborative consistency regularization method, in which we simultaneously train two neural networks that collaboratively purify training labels for each other (Co-Pseudo-Labeling) and perform consistency regularization at both the label and representation levels. For instance-dependent noise that reflects the underlying patterns of the pre-trained model, our method employs multiple mechanisms to avoid overfitting to noisy annotations, effectively mining information from the potentially noisy sample set while iteratively optimizing both representations and pseudo-labels during training. Comparison experiments with various kinds of annotations and weakly supervised methods, as well as other ways of applying pre-trained models, demonstrate the effectiveness of our method and the feasibility of incorporating weakly supervised learning into the distillation of pre-trained models.
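
One plausible way to obtain CLIP-annotated candidate label sets of the kind described above is to keep the top-k classes by image–text similarity. The sketch below assumes pre-computed CLIP embeddings and an arbitrary k; it is not the paper's annotation pipeline.

```python
import torch
import torch.nn.functional as F

def clip_candidate_labels(image_feats, text_feats, k=3):
    """Build candidate label sets from pre-computed CLIP embeddings.

    image_feats: (N, D) image embeddings; text_feats: (C, D) class-prompt
    embeddings. Returns, for each image, the indices of the k classes with
    the highest cosine similarity, serving as (possibly noisy) partial labels.
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = image_feats @ text_feats.t()        # (N, C) cosine similarities
    return sims.topk(k, dim=-1).indices        # (N, k) candidate label sets
```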

AAAI Conference 2024 Conference Paper

Controller-Guided Partial Label Consistency Regularization with Unlabeled Data

  • Qian-Wei Wang
  • Bowen Zhao
  • Mingyan Zhu
  • Tianxiang Li
  • Zimo Liu
  • Shu-Tao Xia

Partial label learning (PLL) learns from training examples each associated with multiple candidate labels, among which only one is valid. In recent years, benefiting from a strong ability to deal with ambiguous supervision and the impetus of modern data augmentation methods, consistency regularization-based PLL methods have achieved a series of successes and become mainstream. However, as partial annotations become insufficient, their performance drops significantly. In this paper, we leverage easily accessible unlabeled examples to facilitate partial label consistency regularization. In addition to a partial supervised loss, our method performs controller-guided consistency regularization at both the label level and the representation level with the help of unlabeled data. To mitigate the limited capability of the initial supervised model, we use the controller to estimate the confidence of each current prediction and guide the subsequent consistency regularization. Furthermore, we dynamically adjust the confidence thresholds so that roughly equal numbers of samples from each class participate in consistency regularization, alleviating class imbalance. Experiments show that our method achieves satisfactory performance in more practical settings, and its modules can be plugged into existing PLL methods to enhance their capabilities.
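
The dynamic per-class thresholding idea can be sketched as follows: scale a base confidence threshold by how often each class is currently predicted, so rarely predicted classes pass the filter more easily. This FlexMatch-style rule is an assumption for illustration; the paper's controller and exact adjustment scheme are not reproduced here.

```python
import torch

def dynamic_class_thresholds(confidences, pseudo_labels, num_classes,
                             base_threshold=0.95):
    """Select unlabeled samples for consistency regularization with class-wise thresholds.

    confidences:   (N,) max predicted probability per unlabeled sample.
    pseudo_labels: (N,) argmax predicted class per unlabeled sample (long tensor).
    """
    counts = torch.bincount(pseudo_labels, minlength=num_classes).float()
    ratio = counts / counts.max().clamp(min=1.0)   # in [0, 1] per class
    thresholds = base_threshold * ratio            # rare classes get lower thresholds
    keep = confidences >= thresholds[pseudo_labels]
    return keep  # boolean mask of samples used for consistency regularization
```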

NeurIPS Conference 2024 Conference Paper

M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

  • Mingshuang Luo
  • RuiBing Hou
  • Zhuo Li
  • Hong Chang
  • Zimo Liu
  • Yaowei Wang
  • Shiguang Shan

This paper presents M$^3$GPT, an advanced $\textbf{M}$ultimodal, $\textbf{M}$ultitask framework for $\textbf{M}$otion comprehension and generation. M$^3$GPT operates on three fundamental principles. The first focuses on creating a unified representation space for various motion-relevant modalities. We employ discrete vector quantization for multimodal conditional signals, such as text, music and motion/dance, enabling seamless integration into a large language model (LLM) with a single vocabulary. The second involves modeling motion generation directly in the raw motion space. This strategy circumvents the information loss associated with a discrete tokenizer, resulting in more detailed and comprehensive motion generation. Third, M$^3$GPT learns to model the connections and synergies among various motion-relevant tasks. Text, the most familiar and well-understood modality for LLMs, is utilized as a bridge to establish connections between different motion tasks, facilitating mutual reinforcement. To our knowledge, M$^3$GPT is the first model capable of comprehending and generating motions based on multiple signals. Extensive experiments highlight M$^3$GPT's superior performance across various motion-relevant tasks and its powerful zero-shot generalization capabilities for extremely challenging tasks. Project page: \url{https://github.com/luomingshuang/M3GPT}.
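
A single shared vocabulary over text and vector-quantized motion/music tokens can be realized by offsetting each modality's codebook indices past the text vocabulary, as in the small sketch below. The vocabulary and codebook sizes are made-up illustrative numbers, not values from the paper.

```python
def to_shared_vocab(modal_token_ids, modality, codebook_sizes, text_vocab_size):
    """Map per-modality VQ codebook indices into one shared LLM vocabulary.

    Each modality's codebook is appended after the text vocabulary, so a single
    token-embedding table covers text, motion and music alike.
    """
    offset = text_vocab_size
    for name, size in codebook_sizes.items():
        if name == modality:
            return [offset + t for t in modal_token_ids]
        offset += size
    raise ValueError(f"unknown modality: {modality}")

# Hypothetical sizes: 32000 text ids, then 512 motion codes, then 1024 music codes.
codebooks = {"motion": 512, "music": 1024}
print(to_shared_vocab([0, 7, 511], "motion", codebooks, 32000))  # [32000, 32007, 32511]
print(to_shared_vocab([3], "music", codebooks, 32000))           # [32515]
```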

AAAI Conference 2023 Conference Paper

Lifelong Person Re-identification via Knowledge Refreshing and Consolidation

  • Chunlin Yu
  • Ye Shi
  • Zimo Liu
  • Shenghua Gao
  • Jingya Wang

Lifelong person re-identification (LReID) is in significant demand for real-world deployment, as large amounts of ReID data are captured from diverse locations over time and inherently cannot be accessed all at once. A key challenge for LReID is how to incrementally preserve old knowledge and gradually add new capabilities to the system. Unlike most existing LReID methods, which mainly focus on mitigating catastrophic forgetting, we address a more challenging problem: not only reducing forgetting on old tasks but also improving model performance on both new and old tasks during lifelong learning. Inspired by the biological process of human cognition, where the somatosensory neocortex and the hippocampus work together in memory consolidation, we formulate a model called Knowledge Refreshing and Consolidation (KRC) that achieves both positive forward and backward transfer. More specifically, a knowledge refreshing scheme is incorporated with a knowledge rehearsal mechanism to enable bi-directional knowledge transfer by introducing a dynamic memory model and an adaptive working model. Moreover, a knowledge consolidation scheme operating on the dual space further improves model stability over the long term. Extensive evaluations show KRC’s superiority over state-of-the-art LReID methods on challenging pedestrian benchmarks. Code is available at https://github.com/cly234/LReID-KRKC.
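
One generic way to realize bi-directional transfer between a memory model and a working model is to distill the memory model's predictions into the working model (rehearsal/refreshing) while periodically consolidating the working model back into the memory model with an exponential moving average. The sketch below shows that pattern under assumed momentum and temperature values; it is not the KRC implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def consolidate(memory_model, working_model, momentum=0.999):
    """EMA-style consolidation: slowly fold working-model weights into the memory model."""
    for m_p, w_p in zip(memory_model.parameters(), working_model.parameters()):
        m_p.mul_(momentum).add_(w_p, alpha=1.0 - momentum)

def rehearsal_loss(working_logits, memory_logits, temperature=2.0):
    """Knowledge-rehearsal term: distill memory-model predictions into the working model."""
    t = temperature
    p_mem = F.softmax(memory_logits / t, dim=-1)
    log_p_work = F.log_softmax(working_logits / t, dim=-1)
    return F.kl_div(log_p_work, p_mem, reduction="batchmean") * (t * t)
```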