Arrow Research search

Author name cluster

Ding Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

TMLR Journal 2025 Journal Article

MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale

  • Haozhe Liu
  • Shikun Liu
  • Zijian Zhou
  • Mengmeng Xu
  • Yanping Xie
  • Xiao Han
  • Juan Camilo Perez
  • Ding Liu

We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while the DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion de-noising. MarDini's MAR enables video generation conditioned on any number of masked frames at arbitrary frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state of the art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
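
A minimal sketch of the asymmetric masking scheme the abstract describes: a heavy low-resolution planning model produces per-frame signals for masked positions, and a light generator consumes them. The module shapes, layer counts, and the concatenation-based conditioning are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PlanningModel(nn.Module):
    """Heavy low-resolution model: produces a planning signal per frame (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=12)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, frame_feats, mask):
        # frame_feats: (B, T, dim) low-res frame embeddings; mask: (B, T) bool,
        # True for frames to be generated. Masked positions get a mask token.
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(frame_feats), frame_feats)
        return self.encoder(x)          # planning signals for every frame

class GenerationModel(nn.Module):
    """Light high-resolution model: one denoising step conditioned on plans."""
    def __init__(self, dim=256):
        super().__init__()
        self.denoise = nn.Sequential(nn.Linear(dim * 2, dim), nn.GELU(),
                                     nn.Linear(dim, dim))

    def forward(self, noisy_frames, plans):
        # noisy_frames, plans: (B, T, dim); predict the denoised frame features.
        return self.denoise(torch.cat([noisy_frames, plans], dim=-1))

B, T, dim = 2, 8, 256
feats = torch.randn(B, T, dim)
mask = torch.zeros(B, T, dtype=torch.bool)
mask[:, 1:] = True                      # image-to-video: mask all but frame 0
plans = PlanningModel(dim)(feats, mask)
out = GenerationModel(dim)(torch.randn(B, T, dim), plans)
```

Changing only the mask pattern switches the task (middle frames for interpolation, the second half for expansion), which is the flexibility the abstract emphasizes.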

AAAI Conference 2024 Conference Paper

AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution

  • Xiaotong Luo
  • Zekun Ai
  • Qiuyuan Liang
  • Ding Liu
  • Yuan Xie
  • Yanyun Qu
  • Yun Fu

Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works design elaborate structures to accelerate transformer inference, propagating all feature tokens equally. However, they ignore an underlying characteristic of image content, namely that different image regions have distinct restoration difficulties, especially in large images (2K-8K), and thus fail to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from a global view rather than only within fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to their importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, constrained by an uncertainty-guided loss, to obtain a binary halting mask over the tokens. Experiments on large images show that our method reduces latency by nearly 90% compared with SwinIR on Test8K, while maintaining comparable performance.
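
A hedged sketch of the confidence-based halting idea: a tiny estimator scores each token and only "active" tokens continue through the next block. The estimator, threshold, and gather logic below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TokenHalting(nn.Module):
    def __init__(self, dim=64, threshold=0.5):
        super().__init__()
        # Lightweight confidence estimator (assumed form: linear + sigmoid).
        self.confidence = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.threshold = threshold

    def forward(self, tokens):
        # tokens: (B, N, dim). Thresholding yields a binary halting mask;
        # halted tokens skip further computation.
        conf = self.confidence(tokens).squeeze(-1)       # (B, N)
        active = conf > self.threshold
        return tokens, active

block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
halt = TokenHalting(dim=64)
x = torch.randn(1, 196, 64)
x, active = halt(x)
# Run the next block only on active tokens; halted tokens pass through unchanged.
idx = active[0].nonzero(as_tuple=True)[0]
x[0, idx] = block(x[0, idx].unsqueeze(0)).squeeze(0)
```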

NeurIPS Conference 2023 Conference Paper

Hierarchical Integration Diffusion Model for Realistic Image Deblurring

  • Zheng Chen
  • Yulun Zhang
  • Ding Liu
  • Bin Xia
  • Jinjin Gu
  • Linghe Kong
  • Xin Yuan

Diffusion models (DMs) have recently been introduced in image deblurring and exhibit promising performance, particularly in terms of detail reconstruction. However, the diffusion model requires a large number of inference iterations to recover the clean image from pure Gaussian noise, which consumes massive computational resources. Moreover, the distribution synthesized by the diffusion model is often misaligned with the target results, leading to restrictions in distortion-based metrics. To address these issues, we propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring. Specifically, we perform the DM in a highly compacted latent space to generate the prior feature for the deblurring process. The deblurring process is implemented by a regression-based method to obtain better distortion accuracy. Meanwhile, the highly compact latent space ensures the efficiency of the DM. Furthermore, we design a hierarchical integration module to fuse the prior into the regression-based model at multiple scales, enabling better generalization in complex blurry scenarios. Comprehensive experiments on synthetic and real-world blur datasets demonstrate that HI-Diff outperforms state-of-the-art methods. Code and trained models are available at https://github.com/zhengchen1999/HI-Diff.
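
A rough sketch of the integration step: a compact latent prior (produced by the DM) is fused into the regression network's features at a given scale. Fusing via cross-attention and the tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HierarchicalIntegration(nn.Module):
    """Fuse a compact prior (B, L, C) into image features (B, C, H, W)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat, prior):
        B, C, H, W = feat.shape
        q = feat.flatten(2).transpose(1, 2)        # (B, HW, C) queries
        fused, _ = self.attn(q, prior, prior)      # prior as key/value
        return feat + fused.transpose(1, 2).reshape(B, C, H, W)

feat = torch.randn(2, 64, 32, 32)                  # one scale of the decoder
prior = torch.randn(2, 16, 64)                     # the DM runs in this tiny space
out = HierarchicalIntegration()(feat, prior)
# The same module would be applied at each scale of the regression network.
```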

AAAI Conference 2023 Conference Paper

ShadowFormer: Global Context Helps Shadow Removal

  • Lanqing Guo
  • Siyu Huang
  • Ding Liu
  • Hao Cheng
  • Bihan Wen

Recent deep learning methods have achieved promising results in image shadow removal. However, most existing approaches operate locally within shadow and non-shadow regions, resulting in severe artifacts around shadow boundaries as well as inconsistent illumination between shadow and non-shadow regions. It remains challenging for deep shadow removal models to exploit the global contextual correlation between shadow and non-shadow regions. In this work, we first propose a Retinex-based shadow model, from which we derive a novel transformer-based network, dubbed ShadowFormer, that exploits non-shadow regions to help restore shadow regions. A multi-scale channel attention framework is employed to hierarchically capture global information. On top of it, we propose a Shadow-Interaction Module (SIM) with Shadow-Interaction Attention (SIA) in the bottleneck stage to effectively model the contextual correlation between shadow and non-shadow regions. We conduct extensive experiments on three popular public datasets, ISTD, ISTD+, and SRD, to evaluate the proposed method. Our method achieves state-of-the-art performance while using up to 150X fewer model parameters.
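
One way to read "shadow-interaction attention" is attention whose logits are biased toward shadow/non-shadow token pairs, so shadowed patches pull illumination statistics from lit context. The additive-bias formulation below is an assumption for illustration, not the paper's exact formulation.

```python
import torch

def shadow_interaction_attention(tokens, shadow_mask, scale=None):
    # tokens: (B, N, C) patch tokens; shadow_mask: (B, N) in [0, 1], 1 = shadow.
    B, N, C = tokens.shape
    scale = scale or C ** -0.5
    attn = (tokens @ tokens.transpose(1, 2)) * scale     # (B, N, N) logits
    # Boost pairs where one token is shadow and the other is not, so shadow
    # regions attend more to non-shadow context (assumed weighting scheme).
    interact = shadow_mask.unsqueeze(2) * (1 - shadow_mask).unsqueeze(1)
    attn = (attn + interact).softmax(dim=-1)
    return attn @ tokens

tokens = torch.randn(1, 64, 32)
mask = (torch.rand(1, 64) > 0.5).float()
out = shadow_interaction_attention(tokens, mask)
```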

AAAI Conference 2021 Conference Paper

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

  • Yang Fu
  • Linjie Yang
  • Ding Liu
  • Thomas S. Huang
  • Humphrey Shi

Video instance segmentation is a complex task in which we need to detect, segment, and track each object in a given video. Previous approaches utilize only single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video setting due to several distinct challenges such as motion blur and drastic appearance change. To eliminate the ambiguities introduced by using only single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) that refines features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism that significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a Siamese design that incorporates both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat.
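
An illustrative sketch of attention-based temporal aggregation, the general mechanism CompFeat builds on: neighboring-frame features are weighted by similarity to the current frame and averaged. The simple dot-product weighting is an assumption; the paper's attention design is more elaborate.

```python
import torch
import torch.nn.functional as F

def aggregate_frame_features(target, references):
    # target: (C, H, W) current-frame features;
    # references: (T, C, H, W) features from neighboring frames.
    C = target.shape[0]
    t = target.flatten(1)                           # (C, HW)
    r = references.flatten(2)                       # (T, C, HW)
    # Per-frame, per-location similarity to the target features.
    sim = (t.unsqueeze(0) * r).sum(1) / C ** 0.5    # (T, HW)
    w = F.softmax(sim, dim=0).unsqueeze(1)          # weights over frames
    return (w * r).sum(0).view_as(target)           # aggregated features

target = torch.randn(64, 16, 16)
refs = torch.randn(5, 64, 16, 16)
fused = aggregate_frame_features(target, refs)
```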

NeurIPS Conference 2020 Conference Paper

Neural Sparse Representation for Image Restoration

  • Yuchen Fan
  • Jiahui Yu
  • Yiqun Mei
  • Yulun Zhang
  • Yun Fu
  • Ding Liu
  • Thomas S. Huang

Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints upon hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and attachable to convolution layers in various networks. Sparsity in neurons enables computation saving by only operating on non-zero components without hurting accuracy. Meanwhile, our method can magnify representation dimensionality and model capacity with negligible additional computation cost. Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifacts removal.
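
A minimal sketch of structured neuron sparsity, assuming a hard top-k rule: only the k largest-magnitude channels per spatial location survive, so downstream layers can skip the zeroed components. The top-k rule is an illustrative stand-in for the paper's learned sparsity constraints.

```python
import torch

def sparsify_channels(x, k):
    # x: (B, C, H, W); keep the k largest-magnitude channels per position.
    topk = x.abs().topk(k, dim=1).indices            # (B, k, H, W)
    mask = torch.zeros_like(x).scatter_(1, topk, 1.0)
    return x * mask

x = torch.randn(1, 64, 8, 8)
y = sparsify_channels(x, k=8)      # 56 of 64 channel activations become zero
print((y != 0).float().mean())     # fraction of surviving activations (~0.125)
```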

AAAI Conference 2020 Conference Paper

Scale-Wise Convolution for Image Restoration

  • Yuchen Fan
  • Jiahui Yu
  • Ding Liu
  • Thomas S. Huang

While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep-network-based image restoration. Naively applying scale-invariant techniques (e.g., multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling scale-invariance in neural networks can bring significant benefits to image restoration performance. Inspired by spatial-wise convolution for shift-invariance, “scale-wise convolution” is proposed to convolve across multiple scales for scale-invariance. In our scale-wise convolutional network (SCN), we first map the input image to the feature space and then build a feature pyramid representation via progressive bi-linear down-scaling. The feature pyramid is then passed to a residual network with scale-wise convolutions. The proposed scale-wise convolution learns to dynamically activate and aggregate features from different input scales in each residual building block, in order to exploit contextual information at multiple scales. In experiments, we compare restoration accuracy and parameter efficiency between our model and many variants of multi-scale neural networks. The proposed network with scale-wise convolution achieves superior performance on multiple image restoration tasks, including image super-resolution, image denoising, and image compression artifact removal. Code and models are available at https://github.com/ychfan/scn_sr.
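
A sketch of the scale-wise idea under stated assumptions: a pyramid is built by progressive bi-linear down-scaling, and each scale aggregates features from its finer and coarser neighbors. The simple resize-and-sum aggregation is a simplification of the paper's dynamic aggregation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleWiseConv(nn.Module):
    def __init__(self, dim=32, num_scales=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(dim, dim, 3, padding=1) for _ in range(num_scales))

    def forward(self, pyramid):
        # pyramid[0] is the finest scale, pyramid[-1] the coarsest.
        out = []
        for i, feat in enumerate(pyramid):
            y = self.convs[i](feat)
            if i > 0:    # add the finer neighbor, resized to this scale
                y = y + F.interpolate(pyramid[i - 1], size=feat.shape[-2:],
                                      mode='bilinear', align_corners=False)
            if i < len(pyramid) - 1:    # add the coarser neighbor
                y = y + F.interpolate(pyramid[i + 1], size=feat.shape[-2:],
                                      mode='bilinear', align_corners=False)
            out.append(y)
        return out

x = torch.randn(1, 32, 64, 64)
pyramid = [x,
           F.interpolate(x, scale_factor=0.5, mode='bilinear',
                         align_corners=False),
           F.interpolate(x, scale_factor=0.25, mode='bilinear',
                         align_corners=False)]
outs = ScaleWiseConv()(pyramid)
```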

IJCAI Conference 2018 Conference Paper

Connecting Low-Level Image Processing and High-Level Vision via Deep Learning

  • Ding Liu

The latest developments in computer vision have made exciting progress and a tremendous impact on our daily lives. In this era of technological advances, deep learning has gained huge popularity as a powerful tool for solving many computer vision problems, and has given a great boost to this already rapidly developing field. Conventionally, the connections between different vision tasks are fragile; for example, low-level image processing and high-level vision tasks are usually handled separately. However, the inherent relations among the feature representations of various tasks should be exploited rather than ignored. My research focuses on connecting low-level image processing and high-level vision via deep learning. Specifically, my goal is to design deep learning mechanisms that can efficiently and effectively learn features from low-level image processing and use them to improve the performance of high-level vision tasks.

NeurIPS Conference 2018 Conference Paper

Non-Local Recurrent Network for Image Restoration

  • Ding Liu
  • Bihan Wen
  • Yuchen Fan
  • Chen Change Loy
  • Thomas Huang

Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. However, it remains unclear and challenging to make use of this intrinsic property via deep networks. In this paper, we propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. The main contributions of this work are: (1) Unlike existing methods that measure self-similarity in an isolated manner, the proposed non-local module can be flexibly integrated into existing deep networks for end-to-end training to capture deep feature correlation between each location and its neighborhood. (2) We fully employ the RNN structure for its parameter efficiency and allow deep feature correlation to be propagated along adjacent recurrent states. This new design boosts robustness against inaccurate correlation estimation due to severely degraded images. (3) We show that it is essential to maintain a confined neighborhood for computing deep feature correlation given degraded images. This is in contrast to existing practice that deploys the whole image. Extensive experiments on both image denoising and super-resolution tasks are conducted. Thanks to the recurrent non-local operations and correlation propagation, the proposed NLRN achieves superior results to state-of-the-art methods with many fewer parameters.
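
A hedged sketch of a confined non-local operation: each location attends only to a q x q neighborhood rather than the whole image, which the abstract argues is essential for degraded inputs. The unfold-based windowing is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def confined_non_local(x, q=7):
    # x: (B, C, H, W); attend within a q x q window around each pixel.
    B, C, H, W = x.shape
    neigh = F.unfold(x, q, padding=q // 2)           # (B, C*q*q, H*W)
    neigh = neigh.view(B, C, q * q, H * W)
    center = x.view(B, C, 1, H * W)
    # Similarity of each pixel to its neighbors, then a softmax-weighted sum.
    attn = (center * neigh).sum(1, keepdim=True) / C ** 0.5   # (B, 1, q*q, HW)
    attn = attn.softmax(dim=2)
    return (attn * neigh).sum(2).view(B, C, H, W)

x = torch.randn(1, 16, 32, 32)
y = confined_non_local(x)
```

In the paper's recurrent design this operation would be applied repeatedly across recurrent states; here a single application is shown.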

AAAI Conference 2018 Short Paper

Visual Recognition in Very Low-Quality Settings: Delving Into the Power of Pre-Training

  • Bowen Cheng
  • Ding Liu
  • Zhangyang Wang
  • Haichao Zhang
  • Thomas Huang

Visual recognition from very low-quality images is an extremely challenging task with great practical value. While deep networks have been extensively applied to low-quality image restoration and to high-quality image recognition separately, few works have addressed the important problem of recognition from very low-quality images. This paper presents a degradation-robust pre-training approach for improving deep learning models in this direction. Extensive experiments on different datasets validate the effectiveness of the proposed method.
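
An illustrative sketch of one plausible reading of degradation-robust pre-training: synthesize very low-quality copies of clean training images and pre-train the recognition model on them before fine-tuning. The specific degradation (aggressive down-up sampling) is assumed for illustration.

```python
import torch
import torch.nn.functional as F

def degrade(images, factor=8):
    # Aggressive down-up sampling as a stand-in for real degradations.
    small = F.interpolate(images, scale_factor=1 / factor, mode='bilinear',
                          align_corners=False)
    return F.interpolate(small, size=images.shape[-2:], mode='bilinear',
                         align_corners=False)

images = torch.rand(4, 3, 64, 64)
low_quality = degrade(images)
# Pre-train the classifier on (low_quality, labels) pairs, then fine-tune on
# the target very low-quality dataset.
```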

IJCAI Conference 2018 Conference Paper

When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach

  • Ding Liu
  • Bihan Wen
  • Xianming Liu
  • Zhangyang Wang
  • Thomas Huang

Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we tackle the two jointly and explore their mutual influence. First, we propose a convolutional neural network for image denoising that achieves state-of-the-art performance. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses a joint loss to update only the denoising network via back-propagation. We demonstrate that, on one hand, the proposed denoiser has the generality to mitigate the performance degradation of different high-level vision tasks; on the other hand, with the guidance of high-level vision information, the denoising network can generate more visually appealing results. To the best of our knowledge, this is the first work to investigate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning.
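
A minimal sketch of the cascaded training scheme the abstract describes: a denoiser feeds a fixed high-level network, and the joint loss updates only the denoiser. Both networks below are tiny placeholders standing in for the paper's architectures.

```python
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))
classifier = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(8, 10))
for p in classifier.parameters():   # the high-level module stays fixed
    p.requires_grad = False

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
noisy, clean = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
labels = torch.randint(0, 10, (2,))

denoised = denoiser(noisy)
# Joint loss: reconstruction plus high-level task loss on the denoised output.
loss = nn.functional.mse_loss(denoised, clean) \
     + nn.functional.cross_entropy(classifier(denoised), labels)
loss.backward()                     # gradients flow only into the denoiser
opt.step()
```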

AAAI Conference 2016 Conference Paper

Epitomic Image Super-Resolution

  • Yingzhen Yang
  • Zhangyang Wang
  • Zhaowen Wang
  • Shiyu Chang
  • Ding Liu
  • Honghui Shi
  • Thomas Huang

We propose Epitomic Image Super-Resolution (ESR) to enhance current internal SR methods that exploit the self-similarities in the input. Instead of the local nearest-neighbor patch matching used in most existing internal SR methods, ESR employs epitomic patch matching, which is robust to noise and supports both local and non-local patch matching. Extensive objective and subjective evaluations demonstrate the effectiveness and advantages of ESR on various images.
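
A very rough sketch of the matching step, assuming the epitome can be treated as a small set of exemplar patches: each query patch is matched to its nearest exemplar rather than to raw image neighbors. Representing the epitome as a flat patch dictionary is a simplification of the actual epitome model.

```python
import torch

def match_to_epitome(patches, epitome):
    # patches: (N, d) flattened query patches; epitome: (M, d) exemplars.
    dists = torch.cdist(patches, epitome)        # (N, M) Euclidean distances
    return dists.argmin(dim=1)                   # best exemplar per patch

patches = torch.randn(100, 25)    # 100 patches of size 5x5, flattened
epitome = torch.randn(32, 25)     # condensed exemplar set
idx = match_to_epitome(patches, epitome)
```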

STOC Conference 2002 Conference Paper

Approximating the smallest grammar: Kolmogorov complexity in natural models

  • Moses Charikar
  • Eric Lehman
  • Ding Liu
  • Rina Panigrahy
  • Manoj Prabhakaran 0001
  • April Rasala
  • Amit Sahai
  • Abhi Shelat

We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction. The smallest grammar is known to be hard to approximate to within a constant factor, and an o(log n / log log n) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n^(1/2)) for the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio: we give an O(log(n/g*)) approximation algorithm, where g* is the size of the smallest grammar. We then consider other computable variants of Kolmogorov complexity. In particular, we give an O(log^2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.
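
A hedged sketch of the Bisection scheme cited in the abstract: recursively split the string at the largest power of two below its length, and let identical substrings share one rule, so the grammar shrinks on repetitive input. This is an illustration of the scheme, not the paper's improved algorithm.

```python
def bisection_grammar(s, rules=None):
    """Build a grammar generating exactly s; identical substrings share rules."""
    rules = {} if rules is None else rules
    if len(s) <= 1 or s in rules:
        return rules
    split = 1
    while split * 2 < len(s):
        split *= 2                    # largest power of two < len(s)
    left, right = s[:split], s[split:]
    rules[s] = (left, right)          # rule: s -> left right
    bisection_grammar(left, rules)
    bisection_grammar(right, rules)
    return rules

g = bisection_grammar("abababab")
print(len(g))   # 3 rules: repeated halves collapse into shared rules
```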