Arrow Research search

Author name cluster

Ding Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

TMLR Journal 2025 Journal Article

MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale

  • Haozhe Liu
  • Shikun Liu
  • Zijian Zhou
  • Mengmeng Xu
  • Yanping Xie
  • Xiao Han
  • Juan Camilo Perez
  • Ding Liu

We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while the DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion de-noising. MarDini's MAR enables video generation conditioned on any number of masked frames at arbitrary frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state of the art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
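
A minimal sketch of the asymmetric masking scheme the abstract describes: a heavy low-resolution planning model produces per-frame signals for masked positions, and a light generator consumes them. The module shapes, layer counts, and the concatenation-based conditioning are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PlanningModel(nn.Module):
    """Heavy low-resolution model: produces a planning signal per frame (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=12)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, frame_feats, mask):
        # frame_feats: (B, T, dim) low-res frame embeddings; mask: (B, T) bool,
        # True for frames to be generated. Masked positions get a mask token.
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(frame_feats), frame_feats)
        return self.encoder(x)          # planning signals for every frame

class GenerationModel(nn.Module):
    """Light high-resolution model: one denoising step conditioned on plans."""
    def __init__(self, dim=256):
        super().__init__()
        self.denoise = nn.Sequential(nn.Linear(dim * 2, dim), nn.GELU(),
                                     nn.Linear(dim, dim))

    def forward(self, noisy_frames, plans):
        # noisy_frames, plans: (B, T, dim); predict the denoised frame features.
        return self.denoise(torch.cat([noisy_frames, plans], dim=-1))

B, T, dim = 2, 8, 256
feats = torch.randn(B, T, dim)
mask = torch.zeros(B, T, dtype=torch.bool)
mask[:, 1:] = True                      # image-to-video: mask all but frame 0
plans = PlanningModel(dim)(feats, mask)
out = GenerationModel(dim)(torch.randn(B, T, dim), plans)
```

Changing only the mask pattern switches the task (middle frames for interpolation, the second half for expansion), which is the flexibility the abstract emphasizes.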

AAAI Conference 2024 Conference Paper

AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution

  • Xiaotong Luo
  • Zekun Ai
  • Qiuyuan Liang
  • Ding Liu
  • Yuan Xie
  • Yanyun Qu
  • Yun Fu

Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works design elaborate structures to accelerate transformer inference, propagating all feature tokens equally. However, they ignore an underlying characteristic of image content, namely that different image regions have distinct restoration difficulties, especially in large images (2K-8K), and thus fail to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from a global view rather than only within fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to their importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, constrained by an uncertainty-guided loss, to obtain a binary halting mask over the tokens. Experiments on large images show that our method reduces latency by nearly 90% compared with SwinIR on Test8K, while maintaining comparable performance.
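
A hedged sketch of the confidence-based halting idea: a tiny estimator scores each token and only "active" tokens continue through the next block. The estimator, threshold, and gather logic below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TokenHalting(nn.Module):
    def __init__(self, dim=64, threshold=0.5):
        super().__init__()
        # Lightweight confidence estimator (assumed form: linear + sigmoid).
        self.confidence = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.threshold = threshold

    def forward(self, tokens):
        # tokens: (B, N, dim). Thresholding yields a binary halting mask;
        # halted tokens skip further computation.
        conf = self.confidence(tokens).squeeze(-1)       # (B, N)
        active = conf > self.threshold
        return tokens, active

block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
halt = TokenHalting(dim=64)
x = torch.randn(1, 196, 64)
x, active = halt(x)
# Run the next block only on active tokens; halted tokens pass through unchanged.
idx = active[0].nonzero(as_tuple=True)[0]
x[0, idx] = block(x[0, idx].unsqueeze(0)).squeeze(0)
```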

NeurIPS Conference 2023 Conference Paper

Hierarchical Integration Diffusion Model for Realistic Image Deblurring

  • Zheng Chen
  • Yulun Zhang
  • Ding Liu
  • Bin Xia
  • Jinjin Gu
  • Linghe Kong
  • Xin Yuan

Diffusion models (DMs) have recently been introduced in image deblurring and exhibit promising performance, particularly in terms of detail reconstruction. However, the diffusion model requires a large number of inference iterations to recover the clean image from pure Gaussian noise, which consumes massive computational resources. Moreover, the distribution synthesized by the diffusion model is often misaligned with the target results, leading to restrictions in distortion-based metrics. To address these issues, we propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring. Specifically, we perform the DM in a highly compacted latent space to generate the prior feature for the deblurring process. The deblurring process is implemented by a regression-based method to obtain better distortion accuracy. Meanwhile, the highly compact latent space ensures the efficiency of the DM. Furthermore, we design a hierarchical integration module to fuse the prior into the regression-based model at multiple scales, enabling better generalization in complex blurry scenarios. Comprehensive experiments on synthetic and real-world blur datasets demonstrate that HI-Diff outperforms state-of-the-art methods. Code and trained models are available at https://github.com/zhengchen1999/HI-Diff.
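
A rough sketch of the integration step: a compact latent prior (produced by the DM) is fused into the regression network's features at a given scale. Fusing via cross-attention and the tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HierarchicalIntegration(nn.Module):
    """Fuse a compact prior (B, L, C) into image features (B, C, H, W)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat, prior):
        B, C, H, W = feat.shape
        q = feat.flatten(2).transpose(1, 2)        # (B, HW, C) queries
        fused, _ = self.attn(q, prior, prior)      # prior as key/value
        return feat + fused.transpose(1, 2).reshape(B, C, H, W)

feat = torch.randn(2, 64, 32, 32)                  # one scale of the decoder
prior = torch.randn(2, 16, 64)                     # the DM runs in this tiny space
out = HierarchicalIntegration()(feat, prior)
# The same module would be applied at each scale of the regression network.
```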

AAAI Conference 2023 Conference Paper

ShadowFormer: Global Context Helps Shadow Removal

  • Lanqing Guo
  • Siyu Huang
  • Ding Liu
  • Hao Cheng
  • Bihan Wen

Recent deep learning methods have achieved promising results in image shadow removal. However, most existing approaches operate locally within shadow and non-shadow regions, resulting in severe artifacts around shadow boundaries as well as inconsistent illumination between shadow and non-shadow regions. It remains challenging for deep shadow removal models to exploit the global contextual correlation between shadow and non-shadow regions. In this work, we first propose a Retinex-based shadow model, from which we derive a novel transformer-based network, dubbed ShadowFormer, that exploits non-shadow regions to help restore shadow regions. A multi-scale channel attention framework is employed to hierarchically capture global information. On top of it, we propose a Shadow-Interaction Module (SIM) with Shadow-Interaction Attention (SIA) in the bottleneck stage to effectively model the contextual correlation between shadow and non-shadow regions. We conduct extensive experiments on three popular public datasets, ISTD, ISTD+, and SRD, to evaluate the proposed method. Our method achieves state-of-the-art performance while using up to 150X fewer model parameters.
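
One way to read "shadow-interaction attention" is attention whose logits are biased toward shadow/non-shadow token pairs, so shadowed patches pull illumination statistics from lit context. The additive-bias formulation below is an assumption for illustration, not the paper's exact formulation.

```python
import torch

def shadow_interaction_attention(tokens, shadow_mask, scale=None):
    # tokens: (B, N, C) patch tokens; shadow_mask: (B, N) in [0, 1], 1 = shadow.
    B, N, C = tokens.shape
    scale = scale or C ** -0.5
    attn = (tokens @ tokens.transpose(1, 2)) * scale     # (B, N, N) logits
    # Boost pairs where one token is shadow and the other is not, so shadow
    # regions attend more to non-shadow context (assumed weighting scheme).
    interact = shadow_mask.unsqueeze(2) * (1 - shadow_mask).unsqueeze(1)
    attn = (attn + interact).softmax(dim=-1)
    return attn @ tokens

tokens = torch.randn(1, 64, 32)
mask = (torch.rand(1, 64) > 0.5).float()
out = shadow_interaction_attention(tokens, mask)
```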

AAAI Conference 2021 Conference Paper

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

  • Yang Fu
  • Linjie Yang
  • Ding Liu
  • Thomas S. Huang
  • Humphrey Shi

Video instance segmentation is a complex task in which we need to detect, segment, and track each object in a given video. Previous approaches utilize only single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video setting due to several distinct challenges such as motion blur and drastic appearance change. To eliminate the ambiguities introduced by using only single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) that refines features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism that significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a Siamese design that incorporates both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat.
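
An illustrative sketch of attention-based temporal aggregation, the general mechanism CompFeat builds on: neighboring-frame features are weighted by similarity to the current frame and averaged. The simple dot-product weighting is an assumption; the paper's attention design is more elaborate.

```python
import torch
import torch.nn.functional as F

def aggregate_frame_features(target, references):
    # target: (C, H, W) current-frame features;
    # references: (T, C, H, W) features from neighboring frames.
    C = target.shape[0]
    t = target.flatten(1)                           # (C, HW)
    r = references.flatten(2)                       # (T, C, HW)
    # Per-frame, per-location similarity to the target features.
    sim = (t.unsqueeze(0) * r).sum(1) / C ** 0.5    # (T, HW)
    w = F.softmax(sim, dim=0).unsqueeze(1)          # weights over frames
    return (w * r).sum(0).view_as(target)           # aggregated features

target = torch.randn(64, 16, 16)
refs = torch.randn(5, 64, 16, 16)
fused = aggregate_frame_features(target, refs)
```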

NeurIPS Conference 2020 Conference Paper

Neural Sparse Representation for Image Restoration

  • Yuchen Fan
  • Jiahui Yu
  • Yiqun Mei
  • Yulun Zhang
  • Yun Fu
  • Ding Liu
  • Thomas S. Huang

Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints upon hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and attachable to convolution layers in various networks. Sparsity in neurons enables computation saving by only operating on non-zero components without hurting accuracy. Meanwhile, our method can magnify representation dimensionality and model capacity with negligible additional computation cost. Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifacts removal.
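
A minimal sketch of structured neuron sparsity, assuming a hard top-k rule: only the k largest-magnitude channels per spatial location survive, so downstream layers can skip the zeroed components. The top-k rule is an illustrative stand-in for the paper's learned sparsity constraints.

```python
import torch

def sparsify_channels(x, k):
    # x: (B, C, H, W); keep the k largest-magnitude channels per position.
    topk = x.abs().topk(k, dim=1).indices            # (B, k, H, W)
    mask = torch.zeros_like(x).scatter_(1, topk, 1.0)
    return x * mask

x = torch.randn(1, 64, 8, 8)
y = sparsify_channels(x, k=8)      # 56 of 64 channel activations become zero
print((y != 0).float().mean())     # fraction of surviving activations (~0.125)
```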

AAAI Conference 2020 Conference Paper

Scale-Wise Convolution for Image Restoration

  • Yuchen Fan
  • Jiahui Yu
  • Ding Liu
  • Thomas S. Huang

While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep-network-based image restoration. Naively applying scale-invariant techniques (e.g., multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling scale-invariance in neural networks can bring significant benefits to image restoration performance. Inspired by spatial-wise convolution for shift-invariance, “scale-wise convolution” is proposed to convolve across multiple scales for scale-invariance. In our scale-wise convolutional network (SCN), we first map the input image to the feature space and then build a feature pyramid representation via progressive bi-linear down-scaling. The feature pyramid is then passed to a residual network with scale-wise convolutions. The proposed scale-wise convolution learns to dynamically activate and aggregate features from different input scales in each residual building block, in order to exploit contextual information at multiple scales. In experiments, we compare restoration accuracy and parameter efficiency between our model and many variants of multi-scale neural networks. The proposed network with scale-wise convolution achieves superior performance on multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifact removal. Code and models are available at https://github.com/ychfan/scn_sr.
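
A sketch of the scale-wise idea under stated assumptions: a pyramid is built by progressive bi-linear down-scaling, and each scale aggregates features from its finer and coarser neighbors. The simple resize-and-sum aggregation is a simplification of the paper's dynamic aggregation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleWiseConv(nn.Module):
    def __init__(self, dim=32, num_scales=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(dim, dim, 3, padding=1) for _ in range(num_scales))

    def forward(self, pyramid):
        # pyramid[0] is the finest scale, pyramid[-1] the coarsest.
        out = []
        for i, feat in enumerate(pyramid):
            y = self.convs[i](feat)
            if i > 0:    # add the finer neighbor, resized to this scale
                y = y + F.interpolate(pyramid[i - 1], size=feat.shape[-2:],
                                      mode='bilinear', align_corners=False)
            if i < len(pyramid) - 1:    # add the coarser neighbor
                y = y + F.interpolate(pyramid[i + 1], size=feat.shape[-2:],
                                      mode='bilinear', align_corners=False)
            out.append(y)
        return out

x = torch.randn(1, 32, 64, 64)
pyramid = [x,
           F.interpolate(x, scale_factor=0.5, mode='bilinear',
                         align_corners=False),
           F.interpolate(x, scale_factor=0.25, mode='bilinear',
                         align_corners=False)]
outs = ScaleWiseConv()(pyramid)
```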

IJCAI Conference 2018 Conference Paper

Connecting Low-Level Image Processing and High-Level Vision via Deep Learning

  • Ding Liu

The latest developments in computer vision have made exciting progress and a tremendous impact on our daily lives. In this era of technological advances, deep learning has gained huge popularity as a powerful tool for solving many computer vision problems, and has given a great boost to this already rapidly developing field. Conventionally, the connections between different vision tasks are fragile; for example, low-level image processing and high-level vision tasks are usually handled separately. However, the inherent relations among the feature representations of various tasks should be exploited rather than ignored. My research focuses on connecting low-level image processing and high-level vision via deep learning. Specifically, my goal is to design deep learning mechanisms that can efficiently and effectively learn features from low-level image processing and use them to improve the performance of high-level vision tasks.

NeurIPS Conference 2018 Conference Paper

Non-Local Recurrent Network for Image Restoration

  • Ding Liu
  • Bihan Wen
  • Yuchen Fan
  • Chen Change Loy
  • Thomas Huang

Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. However, it remains unclear and challenging to make use of this intrinsic property via deep networks. In this paper, we propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. The main contributions of this work are: (1) Unlike existing methods that measure self-similarity in an isolated manner, the proposed non-local module can be flexibly integrated into existing deep networks for end-to-end training to capture deep feature correlation between each location and its neighborhood. (2) We fully employ the RNN structure for its parameter efficiency and allow deep feature correlation to be propagated along adjacent recurrent states. This new design boosts robustness against inaccurate correlation estimation due to severely degraded images. (3) We show that it is essential to maintain a confined neighborhood for computing deep feature correlation given degraded images. This is in contrast to existing practice that deploys the whole image. Extensive experiments on both image denoising and super-resolution tasks are conducted. Thanks to the recurrent non-local operations and correlation propagation, the proposed NLRN achieves superior results to state-of-the-art methods with many fewer parameters.
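
A hedged sketch of a confined non-local operation: each location attends only to a q x q neighborhood rather than the whole image, which the abstract argues is essential for degraded inputs. The unfold-based windowing is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def confined_non_local(x, q=7):
    # x: (B, C, H, W); attend within a q x q window around each pixel.
    B, C, H, W = x.shape
    neigh = F.unfold(x, q, padding=q // 2)           # (B, C*q*q, H*W)
    neigh = neigh.view(B, C, q * q, H * W)
    center = x.view(B, C, 1, H * W)
    # Similarity of each pixel to its neighbors, then a softmax-weighted sum.
    attn = (center * neigh).sum(1, keepdim=True) / C ** 0.5   # (B, 1, q*q, HW)
    attn = attn.softmax(dim=2)
    return (attn * neigh).sum(2).view(B, C, H, W)

x = torch.randn(1, 16, 32, 32)
y = confined_non_local(x)
```

In the paper's recurrent design this operation would be applied repeatedly across recurrent states; here a single application is shown.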

AAAI Conference 2018 Short Paper

Visual Recognition in Very Low-Quality Settings: Delving Into the Power of Pre-Training

  • Bowen Cheng
  • Ding Liu
  • Zhangyang Wang
  • Haichao Zhang
  • Thomas Huang

Visual recognition from very low-quality images is an extremely challenging task with great practical value. While deep networks have been extensively applied to low-quality image restoration and to high-quality image recognition separately, few works have addressed the important problem of recognition from very low-quality images. This paper presents a degradation-robust pre-training approach for improving deep learning models in this direction. Extensive experiments on different datasets validate the effectiveness of the proposed method.
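
An illustrative sketch of one plausible reading of degradation-robust pre-training: synthesize very low-quality copies of clean training images and pre-train the recognition model on them before fine-tuning. The specific degradation (aggressive down-up sampling) is assumed for illustration.

```python
import torch
import torch.nn.functional as F

def degrade(images, factor=8):
    # Aggressive down-up sampling as a stand-in for real degradations.
    small = F.interpolate(images, scale_factor=1 / factor, mode='bilinear',
                          align_corners=False)
    return F.interpolate(small, size=images.shape[-2:], mode='bilinear',
                         align_corners=False)

images = torch.rand(4, 3, 64, 64)
low_quality = degrade(images)
# Pre-train the classifier on (low_quality, labels) pairs, then fine-tune on
# the target very low-quality dataset.
```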

IJCAI Conference 2018 Conference Paper

When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach

  • Ding Liu
  • Bihan Wen
  • Xianming Liu
  • Zhangyang Wang
  • Thomas Huang

Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we tackle the two jointly and explore their mutual influence. First, we propose a convolutional neural network for image denoising that achieves state-of-the-art performance. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses a joint loss to update only the denoising network via back-propagation. We demonstrate that, on one hand, the proposed denoiser has the generality to mitigate the performance degradation of different high-level vision tasks; on the other hand, with the guidance of high-level vision information, the denoising network can generate more visually appealing results. To the best of our knowledge, this is the first work to investigate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning.
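
A minimal sketch of the cascaded training scheme the abstract describes: a denoiser feeds a fixed high-level network, and the joint loss updates only the denoiser. Both networks below are tiny placeholders standing in for the paper's architectures.

```python
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))
classifier = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(8, 10))
for p in classifier.parameters():   # the high-level module stays fixed
    p.requires_grad = False

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
noisy, clean = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
labels = torch.randint(0, 10, (2,))

denoised = denoiser(noisy)
# Joint loss: reconstruction plus high-level task loss on the denoised output.
loss = nn.functional.mse_loss(denoised, clean) \
     + nn.functional.cross_entropy(classifier(denoised), labels)
loss.backward()                     # gradients flow only into the denoiser
opt.step()
```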

AAAI Conference 2016 Conference Paper

Epitomic Image Super-Resolution

  • Yingzhen Yang
  • Zhangyang Wang
  • Zhaowen Wang
  • Shiyu Chang
  • Ding Liu
  • Honghui Shi
  • Thomas Huang

We propose Epitomic Image Super-Resolution (ESR) to enhance current internal SR methods that exploit the self-similarities in the input. Instead of the local nearest-neighbor patch matching used in most existing internal SR methods, ESR employs epitomic patch matching, which is robust to noise and supports both local and non-local patch matching. Extensive objective and subjective evaluations demonstrate the effectiveness and advantages of ESR on various images.
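
A very rough sketch of the matching step, assuming the epitome can be treated as a small set of exemplar patches: each query patch is matched to its nearest exemplar rather than to raw image neighbors. Representing the epitome as a flat patch dictionary is a simplification of the actual epitome model.

```python
import torch

def match_to_epitome(patches, epitome):
    # patches: (N, d) flattened query patches; epitome: (M, d) exemplars.
    dists = torch.cdist(patches, epitome)        # (N, M) Euclidean distances
    return dists.argmin(dim=1)                   # best exemplar per patch

patches = torch.randn(100, 25)    # 100 patches of size 5x5, flattened
epitome = torch.randn(32, 25)     # condensed exemplar set
idx = match_to_epitome(patches, epitome)
```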

STOC Conference 2002 Conference Paper

Approximating the smallest grammar: Kolmogorov complexity in natural models

  • Moses Charikar
  • Eric Lehman
  • Ding Liu
  • Rina Panigrahy
  • Manoj Prabhakaran 0001
  • April Rasala
  • Amit Sahai
  • Abhi Shelat

We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction. The smallest grammar is known to be hard to approximate to within a constant factor, and an o(log n / log log n) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n^(1/2)) for the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio: we give an O(log(n/g*)) approximation algorithm, where g* is the size of the smallest grammar. We then consider other computable variants of Kolmogorov complexity. In particular, we give an O(log^2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.
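
A hedged sketch of the Bisection scheme cited in the abstract: recursively split the string at the largest power of two below its length, and let identical substrings share one rule, so the grammar shrinks on repetitive input. This is an illustration of the scheme, not the paper's improved algorithm.

```python
def bisection_grammar(s, rules=None):
    """Build a grammar generating exactly s; identical substrings share rules."""
    rules = {} if rules is None else rules
    if len(s) <= 1 or s in rules:
        return rules
    split = 1
    while split * 2 < len(s):
        split *= 2                    # largest power of two < len(s)
    left, right = s[:split], s[split:]
    rules[s] = (left, right)          # rule: s -> left right
    bisection_grammar(left, rules)
    bisection_grammar(right, rules)
    return rules

g = bisection_grammar("abababab")
print(len(g))   # 3 rules: repeated halves collapse into shared rules
```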