Author name cluster

Chen Hou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

AAAI Conference 2026 Conference Paper

Enhanced Privacy Leakage from Noise-Perturbed Gradients via Gradient-Guided Conditional Diffusion Models

Jiayang Meng
Tao Huang
Hong Chen
Chen Hou
Guolong Zheng

Federated learning synchronizes models through gradient transmission and aggregation. However, these gradients pose significant privacy risks, as sensitive training data is embedded within them. Existing gradient inversion attacks suffer from significantly degraded reconstruction performance when gradients are perturbed by noise, a common defense mechanism. In this paper, we introduce gradient-guided conditional diffusion models for reconstructing private images from leaked gradients, without prior knowledge of the target data distribution. Our approach leverages the inherent denoising capability of diffusion models to circumvent the partial protection offered by noise perturbation, thereby improving attack performance under such defenses. We further provide a theoretical analysis of the reconstruction error bounds and the convergence properties of the attack loss, characterizing the impact of key factors (such as noise magnitude and attacked model architecture) on reconstruction quality. Extensive experiments demonstrate our attack's superior reconstruction performance with Gaussian noise-perturbed gradients, and confirm our theoretical findings.

AAAI Conference 2025 Conference Paper

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

Xingrui Wang
Xin Li
Yaosi Hu
Hanxin Zhu
Chen Hou
Cuiling Lan
Zhibo Chen

Text-driven Image to Video Generation (TI2V) aims to generate a controllable video given the first frame and a corresponding textual description. The primary challenges of this task are twofold: (i) how to identify the target objects and ensure consistency between the movement trajectory and the textual description, and (ii) how to improve the subjective quality of generated videos. To tackle these challenges, we propose a new diffusion-based TI2V framework, termed TIV-Diffusion, via object-centric textual-visual alignment, intending to achieve precise control and high-quality video generation based on textually described motion for different objects. Concretely, we enable our TIV-Diffusion model to perceive the textually described objects and their motion trajectories by incorporating the fused textual and visual knowledge through scale-offset modulation. Moreover, to mitigate the problems of object disappearance and misaligned objects and motion, we introduce an object-centric textual-visual alignment module, which reduces the risk of misaligned objects/motion by decoupling the objects in the reference image and aligning textual features with each object individually. Based on the above innovations, our TIV-Diffusion achieves state-of-the-art high-quality video generation compared with existing TI2V methods.

ICLR Conference 2025 Conference Paper

Training-free Camera Control for Video Generation

Chen Hou
Zhibo Chen

We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation. Instead, it is plug-and-play with most pretrained video diffusion models and can generate camera-controllable videos with a single image or text prompt as input. The inspiration for our work comes from the layout prior that intermediate latents encode for the generated results; rearranging noisy pixels in them will cause the output content to relocate as well. Since camera movement can also be seen as a type of pixel rearrangement caused by perspective change, videos can be reorganized following a specific camera motion if their noisy latents change accordingly. Building on this, we propose CamTrol, which enables robust camera control for video diffusion models. It is achieved by a two-stage process. First, we model image layout rearrangement through explicit camera movement in 3D point cloud space. Second, we generate videos with camera motion by leveraging the layout prior of noisy latents formed by a series of rearranged images. Extensive experiments have demonstrated its superior performance in both video generation and camera motion alignment compared with other finetuned methods. Furthermore, we show the capability of CamTrol to generalize to various base models, as well as its impressive applications in scalable motion control, dealing with complicated trajectories, and unsupervised 3D video generation. Videos available at https://lifedecoder.github.io/CamTrol/.

AAAI Conference 2024 Conference Paper

High-Fidelity Diffusion-Based Image Editing

Chen Hou
Guoqiang Wei
Zhibo Chen

Diffusion models have attained remarkable success in the domains of image generation and editing. It is widely recognized that employing more inversion and denoising steps in diffusion models leads to improved image reconstruction quality. However, the editing performance of diffusion models tends not to improve even as the number of denoising steps increases. The deficiency in editing could be attributed to the conditional Markovian property of the editing process, where errors accumulate throughout denoising steps. To tackle this challenge, we first propose an innovative framework where a rectifier module is incorporated to modulate diffusion model weights with residual features from the original images, thereby providing compensatory information to bridge the fidelity gap. Furthermore, we introduce a novel learning paradigm aimed at minimizing error propagation during the editing process, which trains the editing procedure in a manner similar to denoising score-matching. Extensive experiments demonstrate that our proposed framework and training strategy achieve high-fidelity reconstruction and editing results across various numbers of denoising steps, while exhibiting exceptional performance in terms of both quantitative metrics and qualitative assessments. Lastly, we explore our model's generalization through several applications such as image-to-image translation and out-of-domain image editing.

IJCAI Conference 2021 Conference Paper

Interactive Video Acquisition and Learning System for Motor Assessment of Parkinson's Disease

Yunyue Wei
Bingquan Zhu
Chen Hou
Chen Zhang
Yanan Sui

Diagnosis and treatment for Parkinson's disease rely on the evaluation of motor functions, which is expensive and time-consuming when performed at clinics. It is also difficult for patients to record correct movements at home without guidance from experienced physicians. To help patients with Parkinson’s disease get better evaluation from in-home recorded movement videos, we developed an interactive video acquisition and learning system for clinical motor assessments. The system provides patients with real-time guidance through multi-level body keypoint tracking and analysis, which guarantees that clinical tasks are correctly understood and performed. We tested its effectiveness on healthy subjects, and its efficiency and usability on patient groups. Experiments showed that our system enabled high-quality video recordings following clinical standards, benefiting both patients and physicians. Our system provides a novel learning-based telemedicine approach for the care of patients with Parkinson’s disease.
