Arrow Research

Author name cluster

Chao Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

AAAI Conference 2026 Conference Paper

Leveraging Failed Samples: A Few-Shot and Training-Free Framework for Generalized Deepfake Detection

  • Shibo Yao
  • Renshuai Tao
  • Xiaolong Zheng
  • Chao Liang
  • Chunjie Zhang

Recent deepfake detection studies often treat unseen sample detection as a "zero-shot" task, training on images generated by known models but generalizing to unknown ones. A key real-world challenge arises when a model performs poorly on unknown samples, yet these samples remain available for analysis. This suggests the problem should be approached as a "few-shot" task, where effectively utilizing a small number of samples can lead to significant improvement. Unlike typical few-shot tasks focused on semantic understanding, deepfake detection prioritizes image realism, which closely mirrors real-world distributions. In this work, we propose the Few-shot Training-free Network (FTNet) for real-world few-shot deepfake detection. Simple yet effective, FTNet differs from traditional methods that rely on large-scale known data for training. Instead, FTNet uses only one fake sample from an evaluation set, mimicking the real-world scenario where new samples emerge and can be gathered for use, without any training or parameter updates. During evaluation, each test sample is compared to the known fake and real samples and is classified based on the category of the nearest sample. We conduct a comprehensive analysis of AI-generated images from 29 different generative models and achieve new state-of-the-art (SoTA) performance, with an average improvement of 8.7% over existing methods. This work introduces a fresh perspective on real-world deepfake detection: when the model struggles to generalize on a few-shot sample, leveraging the failed samples leads to better performance.
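
As a rough illustration of the training-free, nearest-sample classification described above, here is a minimal sketch; the feature extractor, cosine distance, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal nearest-reference-sample classifier, assuming features have already been
# extracted with some frozen backbone (e.g., a pretrained vision encoder).
import numpy as np

def classify_by_nearest_sample(test_feats, real_feats, fake_feats):
    """test_feats: (N, D); real_feats: (R, D); fake_feats: (F, D), possibly a single row.
    Returns an (N,) array with 0 = real, 1 = fake."""
    refs = np.vstack([real_feats, fake_feats])
    labels = np.concatenate([np.zeros(len(real_feats)), np.ones(len(fake_feats))])
    # Cosine distance: normalize, then 1 - dot product
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    tests = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    dists = 1.0 - tests @ refs.T               # (N, R+F)
    return labels[np.argmin(dists, axis=1)]    # class of the nearest reference sample
```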

ICLR Conference 2025 Conference Paper

CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation

  • Gaojie Lin
  • Jianwen Jiang
  • Chao Liang
  • Tianyun Zhong
  • Jiaqi Yang 0008
  • Zerong Zheng
  • Yanbo Zheng

Diffusion-based video generation technology has advanced significantly, catalyzing a proliferation of research in human animation. While breakthroughs have been made in driving human animation through various modalities for portraits, most current solutions for human body animation still focus on video-driven methods, leaving audio-driven talking body generation relatively underexplored. In this paper, we introduce CyberHost, a one-stage audio-driven talking body generation framework that addresses common synthesis degradations in half-body animation, including hand integrity, identity consistency, and natural motion. CyberHost's key designs are twofold. Firstly, the Region Attention Module (RAM) maintains a set of learnable, implicit, identity-agnostic latent features and combines them with identity-specific local visual features to enhance the synthesis of critical local regions. Secondly, the Human-Prior-Guided Conditions introduce more human structural priors into the model, reducing uncertainty in generated motion patterns and thereby improving the stability of the generated videos. To our knowledge, CyberHost is the first one-stage audio-driven human diffusion model capable of zero-shot video generation for the human body. Extensive experiments demonstrate that CyberHost surpasses previous works in both quantitative and qualitative aspects. CyberHost can also be extended to video-driven and audio-video hybrid-driven scenarios, achieving similarly satisfactory results.
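
One plausible reading of the Region Attention Module's "learnable identity-agnostic latents combined with identity-specific local features" is a latent bank queried via cross-attention; the sketch below follows that reading, with all dimensions, layer choices, and names being assumptions rather than the paper's architecture.

```python
# Sketch: learnable identity-agnostic latents cross-attend to identity-specific
# local visual features (assumed design, not the released CyberHost code).
import torch
import torch.nn as nn

class RegionAttentionSketch(nn.Module):
    def __init__(self, num_latents=64, dim=256, heads=8):
        super().__init__()
        # Identity-agnostic latent features, shared across identities and learned end to end
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, local_feats):
        """local_feats: (B, L, dim) identity-specific local visual features."""
        b = local_feats.shape[0]
        queries = self.latents.unsqueeze(0).expand(b, -1, -1)    # (B, num_latents, dim)
        fused, _ = self.attn(queries, local_feats, local_feats)  # latents attend to local features
        return self.proj(fused)                                  # enhanced region features
```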

AAAI Conference 2025 Conference Paper

FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos

  • Zhengqian Wu
  • Ruizhe Li
  • Zijun Xu
  • Zhongyuan Wang
  • Chunxia Xiao
  • Chao Liang

Video question answering (VideoQA) aims to answer natural language questions according to the given videos. Although existing models perform well on the factoid VideoQA task, they still face challenges in the deep video understanding (DVU) task, which focuses on story videos. Compared to factoid videos, the most significant feature of story videos is their storylines, which are composed of complex interactions and the long-range evolvement of core story topics, including characters, actions, and locations. Understanding these topics requires models to possess DVU capability. However, existing DVU datasets rarely organize questions according to these story topics, making it difficult to comprehensively assess VideoQA models' DVU capability on complex storylines. Additionally, the question quantity and video length of these datasets are limited by the high labor cost of handcrafted dataset construction. In this paper, we devise a large language model based multi-agent collaboration framework, StoryMind, to automatically generate a new large-scale DVU dataset. The dataset, FriendsQA, derived from the renowned sitcom Friends with an average episode length of 1,358 seconds, contains 44.6K questions evenly distributed across 14 fine-grained topics. Finally, we conduct comprehensive experiments on 10 state-of-the-art VideoQA models using the FriendsQA dataset.

ICLR Conference 2025 Conference Paper

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

  • Jianwen Jiang
  • Chao Liang
  • Jiaqi Yang 0008
  • Gaojie Lin
  • Tianyun Zhong
  • Yanbo Zheng

With the introduction of video diffusion models, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals such as movement regions to stabilize movements, compromising the naturalness and freedom of motion. To address this issue, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. Specifically, we designed two key modules: an inter- and intra-clip temporal module and an audio-to-latents module. These enable the model to better utilize long-term motion dependencies and establish a stronger audio-portrait movement correlation. Consequently, the model can generate more natural and stable portrait videos with subtle facial expressions, without the need for manually setting movement constraints. Extensive experiments show that Loopy outperforms recent audio-driven portrait diffusion models, delivering more lifelike and high-quality results across various scenarios. Video samples are available at https://loopyavataranony.github.io/

IJCAI Conference 2024 Conference Paper

A Survey on Rank Aggregation

  • Siyi Wang
  • Qi Deng
  • Shiwei Feng
  • Hong Zhang
  • Chao Liang

Rank aggregation (RA), the technique of combining multiple basic rankings into a consensus one, plays an important role in social choice, bioinformatics, information retrieval, metasearch, and recommendation systems. Although recent years have witnessed remarkable progress in RA, the absence of a systematic overview motivates us to conduct a comprehensive survey covering both classic algorithms and the latest advances in RA research. Specifically, we first discuss the challenges of RA research, then present a systematic review with a fine-grained taxonomy to introduce representative algorithms in unsupervised RA, supervised RA, as well as the previously overlooked semi-supervised RA. Within each category, we not only summarize the common ideas of similar methods but also discuss their strengths and weaknesses. In particular, to investigate the performance differences between different types of RA methods, we conduct the largest-scale comparative evaluation to date, covering 27 RA methods on 7 public datasets from person re-identification, recommendation systems, bioinformatics, and social choice. Finally, we raise two open questions in current RA research and comment on future trends in the context of the latest research progress.
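
To make the "combining multiple basic rankings into a consensus one" setting concrete, below is a toy example using Borda count, a classic unsupervised baseline; it is included purely as an illustration of the problem, not as a method attributed to this survey's evaluation.

```python
# Borda count: each item scores points by its position in every basic ranking,
# and the consensus ranking orders items by total score.
from collections import defaultdict

def borda_count(rankings):
    """rankings: list of rankings, each a list of items from best to worst.
    Returns a consensus ranking, best to worst."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position      # earlier positions earn more points
    return sorted(scores, key=scores.get, reverse=True)

# Three basic rankings (e.g., from three retrieval strategies) over the same candidates
print(borda_count([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]))  # -> ['a', 'b', 'c']
```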

AAAI Conference 2024 Conference Paper

QI-IRA: Quantum-Inspired Interactive Ranking Aggregation for Person Re-identification

  • Chunyu Hu
  • Hong Zhang
  • Chao Liang
  • Hao Huang

Ranking aggregation (RA), the process of aggregating multiple rankings derived from multiple search strategies, has proven effective in person re-identification (re-ID) because a single re-ID method cannot always achieve consistent superiority across different scenarios. Existing RA research mainly focuses on unsupervised and fully-supervised methods. The former lack external supervision to optimize performance, while the latter are costly due to the expensive labeling effort required for training. To address the above challenges, this paper proposes a quantum-inspired interactive ranking aggregation (QI-IRA) method, which (1) utilizes quantum theory to interpret and model the generation and aggregation of multiple basic rankings, and (2) approximates or even exceeds the performance of fully-supervised RA methods with much less labeling cost, as low as only two feedbacks per query on the Market1501, MARS, and DukeMTMC-VideoReID datasets. Comparative experiments conducted on six public re-ID datasets validate the superiority of the proposed QI-IRA method over existing unsupervised, interactive, and fully-supervised RA approaches.

AAAI Conference 2022 Conference Paper

One More Check: Making “Fake Background” Be Tracked Again

  • Chao Liang
  • Zhipeng Zhang
  • Xue Zhou
  • Bing Li
  • Weiming Hu

One-shot multi-object tracking, which integrates object detection and ID embedding extraction into a unified network, has achieved groundbreaking results in recent years. However, current one-shot trackers rely solely on single-frame detections to predict candidate bounding boxes, which may be unreliable when facing disastrous visual degradation, e.g., motion blur or occlusions. Once a target bounding box is mistakenly classified as background by the detector, the temporal consistency of its corresponding tracklet is no longer maintained. In this paper, we set out to restore the bounding boxes misclassified as “fake background” by proposing a re-check network. The re-check network innovatively expands the role of ID embedding from data association to motion forecasting by effectively propagating previous tracklets to the current frame with a small overhead. Note that the propagation results are yielded by an independent and efficient embedding search, preventing the model from over-relying on detection results. Eventually, it helps to reload the “fake background” and repair the broken tracklets. Building on the strong baseline CSTrack, we construct a new one-shot tracker and achieve favorable MOTA gains of 70.7 → 76.4 on MOT16 and 70.6 → 76.3 on MOT17. It also reaches new state-of-the-art MOTA and IDF1 performance. Code is released at https://github.com/JudasDie/SOTS.
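
For intuition only, the following sketch shows one generic way an embedding search can propagate previous tracklets: matching stored tracklet ID embeddings against the current frame's dense embedding map by cosine similarity. Shapes, the threshold, and all names are illustrative assumptions, not the released SOTS implementation of the re-check network.

```python
# Propagate tracklets by embedding search over the current frame's ID-embedding map
# (illustrative sketch; not the paper's re-check network).
import torch
import torch.nn.functional as F

def propagate_tracklets(tracklet_embs, frame_emb_map, sim_thresh=0.6):
    """tracklet_embs: (T, C) ID embeddings of previous tracklets.
    frame_emb_map: (C, H, W) dense ID-embedding map of the current frame.
    Returns candidate (row, col) locations and similarities for confident matches."""
    c, h, w = frame_emb_map.shape
    flat = F.normalize(frame_emb_map.reshape(c, h * w), dim=0)   # (C, H*W)
    embs = F.normalize(tracklet_embs, dim=1)                     # (T, C)
    sim = embs @ flat                                            # (T, H*W) cosine similarities
    best_sim, best_idx = sim.max(dim=1)
    rows = torch.div(best_idx, w, rounding_mode="floor")
    cols = best_idx % w
    keep = best_sim > sim_thresh                                 # keep only confident matches
    return torch.stack([rows[keep], cols[keep]], dim=1), best_sim[keep]
```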

IJCAI Conference 2020 Conference Paper

When Pedestrian Detection Meets Nighttime Surveillance: A New Benchmark

  • Xiao Wang
  • Jun Chen
  • Zheng Wang
  • Wu Liu
  • Shin'ichi Satoh
  • Chao Liang
  • Chia-Wen Lin

Pedestrian detection at nighttime is a crucial and frontier problem in surveillance, but has not been well explored by the computer vision and artificial intelligence communities. Most existing methods detect pedestrians under favorable lighting conditions (e.g., daytime) and achieve promising performance. In contrast, they often fail under unstable lighting conditions (e.g., nighttime). In the security field, nighttime is also a critical period for criminal activity. The existing nighttime pedestrian detection dataset is captured by a car-mounted camera and specially designed for autonomous driving scenarios; a dataset for the nighttime surveillance scenario is still vacant. There are vast differences between autonomous driving and surveillance, including viewpoint and illumination. In this paper, we build a novel pedestrian detection dataset from the nighttime surveillance perspective: NightSurveillance. As a benchmark dataset for pedestrian detection at nighttime, we compare the performance of state-of-the-art pedestrian detectors, and the results reveal that these methods cannot solve all the challenging problems of NightSurveillance. We believe that NightSurveillance can further advance research on pedestrian detection, especially in the field of surveillance security at nighttime.

AAAI Conference 2018 Conference Paper

Video-Based Person Re-Identification via Self Paced Weighting

  • Wenjun Huang
  • Chao Liang
  • Yi Yu
  • Zheng Wang
  • Weijian Ruan
  • Ruimin Hu

Person re-identification (re-id) is a fundamental technique to associate various person images, captured by different surveillance cameras, to the same person. Compared to single-image-based person re-id methods, video-based person re-id has attracted widespread attention because extra space-time information and additional appearance cues can be used to greatly improve matching performance. However, most existing video-based person re-id methods treat all video frames equally, ignoring their quality discrepancy caused by object occlusion and motion, which is a common phenomenon in real surveillance scenarios. Based on this observation, we propose a novel video-based person re-id method via self-paced weighting (SPW). Firstly, we propose a self-paced outlier detection method to evaluate the noise degree of video sub-sequences. Thereafter, a weighted multi-pair distance metric learning approach is adopted to measure the distance between two person image sequences. Experimental results on two public datasets demonstrate the superiority of the proposed method over current state-of-the-art work.
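
As a rough illustration of why per-frame quality weights matter when comparing two video sequences, here is a minimal sketch of a quality-weighted sequence distance; the weighting scheme and names are assumptions for illustration, not the paper's SPW formulation.

```python
# Down-weight noisy frame pairs when computing a distance between two sequences.
import numpy as np

def weighted_sequence_distance(seq_a, seq_b, weights_a, weights_b):
    """seq_a: (Na, D), seq_b: (Nb, D) frame features of two person sequences.
    weights_a, weights_b: per-frame quality weights in [0, 1] (lower = noisier).
    Returns a quality-weighted average of pairwise frame distances."""
    diffs = seq_a[:, None, :] - seq_b[None, :, :]          # (Na, Nb, D)
    dists = np.linalg.norm(diffs, axis=2)                  # (Na, Nb) Euclidean distances
    pair_w = weights_a[:, None] * weights_b[None, :]       # noisy frame pairs count less
    return float((pair_w * dists).sum() / pair_w.sum())
```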

IJCAI Conference 2016 Conference Paper

Scale-Adaptive Low-Resolution Person Re-Identification via Learning a Discriminating Surface

  • Zheng Wang
  • Ruimin Hu
  • Yi Yu
  • Junjun Jiang
  • Chao Liang
  • Jinqiao Wang

Person re-identification, as an important task in video surveillance and forensics applications, has been widely studied. However, most previous approaches are based on the key assumption that the images being compared have the same resolution and a uniform scale. Some recent works investigate how to match low-resolution query images against high-resolution gallery images, but still assume that the low-resolution query images share the same scale. In real scenarios, person images may not only have low resolution but also differ in scale. By investigating how distance varies as image scales change, we observe that scale-distance functions, generated by image pairs under different scales from the same person or from different persons, are distinguishable and can be classified as feasible (for a pair of images from the same person) or infeasible (for a pair of images from different persons). The scale-distance functions are further represented by parameter vectors in the scale-distance function space. On this basis, we propose to learn a discriminating surface separating these feasible and infeasible functions in the scale-distance function space, and use it for re-identifying persons. Experimental results on two simulated datasets and one public dataset demonstrate the effectiveness of the proposed framework.
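
The pipeline outlined in the abstract can be made concrete roughly as follows: compute a query-gallery distance at several query scales, fit the resulting scale-distance curve with a low-order polynomial to obtain a parameter vector, and learn a linear decision surface over those vectors. The feature extractor, polynomial order, and classifier below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: represent a scale-distance function by polynomial coefficients, then
# separate feasible (same person) from infeasible (different person) functions.
import numpy as np
from sklearn.svm import LinearSVC

def scale_distance_vector(query_feats_at_scales, gallery_feat, scales):
    """query_feats_at_scales: one feature vector per rescaled version of the query image.
    Returns the coefficients of a quadratic fitted to the scale-distance curve."""
    dists = [np.linalg.norm(f - gallery_feat) for f in query_feats_at_scales]
    return np.polyfit(scales, dists, deg=2)      # parameter vector of the function

# Learning the discriminating surface (X: (N, 3) coefficient vectors,
# y: 1 = same person / feasible, 0 = different persons / infeasible):
# clf = LinearSVC().fit(X, y)
```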

AAAI Conference 2014 Conference Paper

Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning

  • Xiao-Yuan Jing
  • Rui-Min Hu
  • Yang-Ping Zhu
  • Shan-Shan Wu
  • Chao Liang
  • Jing-Yu Yang

Multi-view feature learning is an attractive research topic with great practical success. Canonical correlation analysis (CCA) has become an important technique in multi-view learning, since it can fully utilize the inter-view correlation. In this paper, we mainly study CCA-based multi-view supervised feature learning, where the labels of training samples are known. Several supervised CCA-based multi-view methods have been presented, which focus on investigating the supervised correlation across different views. However, they take no account of the intra-view correlation between samples. Researchers have also introduced the discriminant analysis technique into multi-view feature learning, such as multi-view discriminant analysis (MvDA), but these methods ignore the canonical correlation within each view and between all views. In this paper, we propose a novel multi-view feature learning approach based on intra-view and inter-view supervised correlation analysis (I2SCA), which can explore the useful correlation information of samples within each view and between all views. The objective function of I2SCA is designed to simultaneously extract discriminatively correlated features from both the inter-view and intra-view perspectives. It admits an analytical solution without iterative computation. We also provide a kernelized extension of I2SCA to tackle linearly inseparable problems in the original feature space. Four widely-used datasets are employed as test data, and experimental results demonstrate that our proposed approaches outperform several representative multi-view supervised feature learning methods.
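
For reference, the inter-view correlation that classical CCA maximizes, and which the abstract says I2SCA builds on, can be written as follows (this is the textbook CCA objective, not the I2SCA objective itself):

```latex
\max_{w_x, w_y} \; \rho \;=\;
  \frac{w_x^{\top} \Sigma_{xy} w_y}
       {\sqrt{w_x^{\top} \Sigma_{xx} w_x}\;\sqrt{w_y^{\top} \Sigma_{yy} w_y}}
```

where w_x and w_y are the projection directions for the two views, Sigma_xy is the cross-view covariance matrix, and Sigma_xx, Sigma_yy are the within-view covariance matrices.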