Arrow Research search

Author name cluster

Bin Ren

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers (18)

TMLR Journal 2026 Journal Article

Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

  • Bin Ren
  • Eduard Zamfir
  • Zongwei Wu
  • Yawei Li
  • Yidi Li
  • Danda Pani Paudel
  • Radu Timofte
  • Ming-Hsuan Yang

Restoring multiple degradations efficiently via just one model has become increasingly significant and impactful, especially with the proliferation of mobile devices. Traditional solutions typically involve training dedicated models per degradation, resulting in inefficiency and redundancy. More recent approaches either introduce additional modules to learn visual prompts, significantly increasing the size of the model, or incorporate cross-modal transfer from large language models trained on vast datasets, adding complexity to the system architecture. In contrast, our approach, termed AnyIR, takes a unified path that leverages the inherent similarity across various degradations to enable both efficient and comprehensive restoration through a joint embedding mechanism, without scaling up the model or relying on large language models. Specifically, we examine the sub-latent space of each input, identifying key components and first reweighting them in a gated manner. To unify intrinsic degradation awareness with contextualized attention, we propose a spatial-frequency parallel fusion strategy that strengthens spatially informed local-global interactions and enriches restoration fidelity from the frequency domain. Comprehensive evaluations across four all-in-one restoration benchmarks demonstrate that AnyIR attains state-of-the-art performance while reducing model parameters by 84% and FLOPs by 80% relative to the baseline. These results highlight the potential of AnyIR as an effective and lightweight solution for all-in-one image restoration. Our code is available at: https://github.com/Amazingren/AnyIR.
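
The gated reweighting and spatial-frequency parallel fusion described above can be illustrated with a minimal PyTorch sketch; the module layout, layer choices, and names below are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SpatialFrequencyFusion(nn.Module):
    """Illustrative block: gate the input features, then fuse a spatial (local)
    branch with a frequency-domain (global) branch in parallel."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.Sigmoid())
        self.spatial = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # real-valued per-channel reweighting of the spectrum (assumed design)
        self.freq_scale = nn.Parameter(torch.ones(1, dim, 1, 1))
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.gate(x)                      # gated reweighting of sub-latents
        s = self.spatial(x)                       # spatially informed local branch
        f = torch.fft.rfft2(x, norm="ortho")      # frequency branch, global receptive field
        f = torch.fft.irfft2(f * self.freq_scale, s=x.shape[-2:], norm="ortho")
        return x + self.proj(torch.cat([s, f], dim=1))  # parallel fusion

x = torch.randn(1, 32, 64, 64)
print(SpatialFrequencyFusion(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```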

AAAI Conference 2026 Conference Paper

From Semantics to Spectrum: A New Lens on Graph Augmentation Strategy

  • Xiangping Zheng
  • Xiuxin Hao
  • Bo Wu
  • Wei Li
  • Bin Ren
  • Bin Tang
  • Yuhui Guo
  • Xun Liang

Graph augmentation is a cornerstone of effective graph contrastive learning, yet existing methods often rely on randomly designed perturbations, which may distort latent semantics and impair representation quality. In this work, we argue that semantic consistency can be effectively approximated by low-frequency components in the spectral domain, offering a principled proxy for guiding augmentation. Based on this insight, we propose Frequency-Aware Graph Contrastive Learning (FA-GCL), a novel framework that explicitly preserves low-frequency signals while selectively perturbing high-frequency components. By aligning augmentation with frequency-aware decomposition, FA-GCL generates diverse yet semantically coherent views, mitigating semantic drift and enhancing representational discrimination. Extensive experiments across multiple benchmarks demonstrate that FA-GCL consistently outperforms state-of-the-art baselines with statistically significant gains, validating its distinctive merits.
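
A rough NumPy sketch of the low-frequency-preserving idea: the graph Fourier basis comes from the normalized Laplacian, low-frequency coefficients are kept intact, and only high-frequency ones are perturbed. The noise model and keep ratio are assumptions, not taken from the paper:

```python
import numpy as np

def fa_augment(adj: np.ndarray, x: np.ndarray, keep_ratio: float = 0.3,
               noise_std: float = 0.1, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    deg = adj.sum(1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt   # normalized Laplacian
    eigval, eigvec = np.linalg.eigh(lap)                      # ascending frequencies
    k = int(keep_ratio * len(eigval))
    coeff = eigvec.T @ x                                      # graph Fourier transform
    coeff[k:] += noise_std * rng.standard_normal(coeff[k:].shape)  # perturb high freq only
    return eigvec @ coeff                                     # inverse transform

adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
x = np.eye(3)
print(fa_augment(adj, x, keep_ratio=0.5).round(2))
```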

AAAI Conference 2026 Conference Paper

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training

  • Bin Ren
  • Xiaoshui Huang
  • Mengyuan Liu
  • Hong Liu
  • Fabio Poiesi
  • Nicu Sebe
  • Guofeng Mei

Vision transformers (ViTs) have recently been widely applied to 3D point cloud understanding, with masked autoencoding as the predominant pre-training paradigm. However, the challenge of learning dense and informative semantic features from point clouds via standard ViTs remains underexplored. We propose MaskClu, a novel unsupervised pre-training method for ViTs on 3D point clouds that integrates masked point modeling with clustering-based learning. MaskClu is designed to reconstruct both cluster assignments and cluster centers from masked point clouds, thus encouraging the model to capture dense semantic information. Additionally, we introduce a global contrastive learning mechanism that enhances instance-level feature learning by contrasting different masked views of the same point cloud. By jointly optimizing these complementary objectives, i.e., dense semantic reconstruction and instance-level contrastive learning, MaskClu enables ViTs to learn richer and more semantically meaningful representations from 3D point clouds. We validate the effectiveness of MaskClu on multiple 3D tasks, including part segmentation, semantic segmentation, object detection, and classification, where it sets new competitive results.
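
A toy sketch of the two stated objective families on placeholder tensors, with random values standing in for the ViT encoder's outputs and an offline clustering step; shapes, the masking ratio, and the loss weighting are all assumptions:

```python
import torch
import torch.nn.functional as F

B, N, C, K = 2, 256, 64, 16
logits = torch.randn(B, N, K, requires_grad=True)   # predicted cluster assignments
target_assign = torch.randint(K, (B, N))            # labels from offline clustering
centers_pred = torch.randn(B, K, 3, requires_grad=True)
centers_true = torch.randn(B, K, 3)

mask = torch.rand(B, N) < 0.6                       # reconstruct only masked points
assign_loss = F.cross_entropy(logits[mask], target_assign[mask])
center_loss = F.mse_loss(centers_pred, centers_true)

# instance-level contrast between two masked views of the same cloud (InfoNCE)
z1 = F.normalize(torch.randn(B, C), dim=-1)
z2 = F.normalize(torch.randn(B, C), dim=-1)
contrast = F.cross_entropy(z1 @ z2.T / 0.07, torch.arange(B))

loss = assign_loss + center_loss + contrast         # joint objective
```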

NeurIPS Conference 2025 Conference Paper

SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting

  • Mengjiao Ma
  • Qi Ma
  • Yue Li
  • Jiahuan Cheng
  • Runyi Yang
  • Bin Ren
  • Nikola Popovic
  • Mingqiang Wei

3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current work on Language Gaussian Splatting falls into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) generalizable approaches. However, most methods are evaluated only on rendered 2D views of a handful of scenes, at viewpoints close to the training views, limiting insight into holistic 3D understanding. To address this gap, we propose the first large-scale benchmark that systematically assesses these three groups of methods directly in 3D space, evaluating on 1060 scenes across three indoor datasets and one outdoor dataset. Benchmark results demonstrate a clear advantage of the generalizable paradigm, particularly in relaxing the scene-specific limitation, enabling fast feed-forward inference on novel scenes, and achieving superior segmentation performance. We further introduce SceneSplat-49K, a carefully curated 3DGS dataset comprising around 49K diverse indoor and outdoor scenes trained from multiple sources, with which we demonstrate that the generalizable approach can harness strong data priors. Our code, benchmark, and datasets are available.

IROS Conference 2025 Conference Paper

Spatial-Temporal Graph Contrastive Learning with Decreasing Masks for Traffic Flow Forecasting

  • Bin Ren
  • Yongfa Zhang
  • Yamin Wen
  • Haocheng Luo
  • Hao Zhang
  • Chunhong He

In recent years, contrastive learning has shown great potential in traffic flow prediction tasks. However, existing contrastive learning methods have difficulty dealing with missing data and noise, and relying on a single contrast method makes it hard to fully capture local and global correlations. In this paper, a Decreasing Mask Spatio-Temporal Graph Contrastive Learning model (DMSTGCL) is proposed. The model dynamically adjusts the mask ratio through an adaptive mask reduction technique to effectively deal with missing data and noise. Meanwhile, the projection head is combined with a TripleAttention mechanism during spatio-temporal contrastive learning, which overcomes the limitations of a single contrast method and captures complex relationships in local and global space more effectively. Experiments on three real-world datasets demonstrate that DMSTGCL achieves significantly higher prediction accuracy than existing methods.
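
The decreasing mask ratio can be realized with a simple schedule; the cosine form and the start/end bounds below are assumptions, since the abstract does not specify them:

```python
import math

def mask_ratio(epoch: int, total_epochs: int,
               start: float = 0.5, end: float = 0.1) -> float:
    """Cosine-decayed masking ratio: aggressive masking early (robustness to
    missing data/noise), gentle masking late (fidelity to the full signal)."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))

print([round(mask_ratio(e, 10), 3) for e in range(10)])  # 0.5 -> 0.1
```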

NeurIPS Conference 2024 Conference Paper

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

  • Bin Ren
  • Yawei Li
  • Jingyun Liang
  • Rakesh Ranjan
  • Mengyuan Liu
  • Rita Cucchiara
  • Luc Van Gool
  • Ming-Hsuan Yang

Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the emergence of Vision Transformers (ViTs) has further propelled these advancements. The self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues when computing attention, even those from semantically unrelated objects or regions. This inclusivity introduces computational inefficiencies, particularly noticeable at high input resolutions, as it requires processing irrelevant information and thereby impedes efficiency. Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide especially relevant information for the restoration process, as they contribute contextual cues crucial for accurate reconstruction. To address these challenges, we propose boosting IR's performance by sharing key semantics via Transformer for IR (i.e., SemanIR). Specifically, SemanIR first constructs a sparse yet comprehensive key-semantic dictionary within each transformer stage by establishing essential semantic connections for every degraded patch. This dictionary is then shared across all subsequent transformer blocks within the same stage. This strategy optimizes attention calculation within each block by focusing exclusively on semantically related components stored in the key-semantic dictionary. As a result, attention calculation achieves linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm SemanIR's state-of-the-art performance, quantitatively and qualitatively. The visual results, code, and trained models are available at: https://github.com/Amazingren/SemanIR.
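
A minimal sketch of the build-once, share-across-blocks pattern: compute each patch's top-k semantic neighbors once per stage, then restrict every block's attention to those keys. Function names and the similarity measure are illustrative, not the released code:

```python
import torch
import torch.nn.functional as F

def build_key_dictionary(feats: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (N, C) patch features -> (N, k) indices of semantically related patches."""
    fn = F.normalize(feats, dim=-1)
    sim = fn @ fn.T                          # cosine similarity between patches
    return sim.topk(k, dim=-1).indices

def sparse_attention(q, k_feats, v, idx):
    """Attend only to the k keys listed in idx; O(N*k) instead of O(N^2)."""
    k_sel = k_feats[idx]                     # (N, k, C)
    v_sel = v[idx]                           # (N, k, C)
    attn = torch.softmax((q.unsqueeze(1) * k_sel).sum(-1) / q.shape[-1] ** 0.5, dim=-1)
    return (attn.unsqueeze(-1) * v_sel).sum(1)

feats = torch.randn(64, 32)
idx = build_key_dictionary(feats, k=8)            # built once per stage
out = sparse_attention(feats, feats, feats, idx)  # reused by every block in the stage
print(out.shape)  # torch.Size([64, 32])
```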

AAAI Conference 2023 Conference Paper

Towards Real-Time Segmentation on the Edge

  • Yanyu Li
  • Changdi Yang
  • Pu Zhao
  • Geng Yuan
  • Wei Niu
  • Jiexiong Guan
  • Hao Tang
  • Minghai Qin

Research on real-time segmentation has mainly focused on desktop GPUs. However, autonomous driving and many other applications rely on real-time segmentation on the edge, and current methods are far from this goal. In addition, recent advances in vision transformers inspire us to re-design the network architecture for dense prediction tasks. In this work, we propose to combine self-attention blocks with lightweight convolutions to form new building blocks, and employ latency constraints to search for an efficient sub-network. We train an MLP latency model on generated architecture configurations and their latencies measured on mobile devices, so that we can predict the latency of subnets during the search phase. To the best of our knowledge, we are the first to achieve over 74% mIoU on Cityscapes with semi-real-time inference (over 15 FPS) on the mobile GPU of an off-the-shelf phone.
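
A minimal sketch of such a latency predictor, with a made-up 16-dimensional architecture encoding and synthetic measurements standing in for real on-device data:

```python
import torch
import torch.nn as nn

# In practice the encoding would describe depth/width/kernel choices per stage.
latency_mlp = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),   # 16-dim architecture encoding (assumed)
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),               # predicted latency in ms
)

opt = torch.optim.Adam(latency_mlp.parameters(), lr=1e-3)
configs = torch.rand(256, 16)       # stand-in for real architecture encodings
measured = torch.rand(256, 1) * 50  # stand-in for latencies measured on-device
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(latency_mlp(configs), measured)
    loss.backward()
    opt.step()
# During search, latency_mlp(config) replaces slow on-device measurement.
```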

AAAI Conference 2023 Conference Paper

Towards Reliable Item Sampling for Recommendation Evaluation

  • Dong Li
  • Ruoming Jin
  • Zhenming Liu
  • Bin Ren
  • Jing Gao
  • Zhi Liu

Since Rendle and Krichene argued that commonly used sampling-based evaluation metrics are "inconsistent" with respect to the global metrics (even in expectation), there have been a few studies on sampling-based recommender system evaluation. Existing methods try either mapping the sampling-based metrics to their global counterparts or, more generally, learning the empirical rank distribution to estimate the top-K metrics. However, despite these efforts, there is still a lack of rigorous theoretical understanding of the proposed metric estimators, and basic item sampling also suffers from a "blind spot" issue: the estimation error in recovering the top-K metrics when K is small can still be rather substantial. In this paper, we provide an in-depth investigation into these problems and make two innovative contributions. First, we propose a new item-sampling estimator that explicitly optimizes the error with respect to the ground truth, and theoretically highlight its subtle difference from prior work. Second, we propose a new adaptive sampling method that aims to deal with the "blind spot" problem, and demonstrate that the expectation-maximization (EM) algorithm can be generalized to this setting. Our experimental results confirm our statistical analysis and the superiority of the proposed methods. This study helps lay the theoretical foundation for adopting item-sampling metrics in recommendation evaluation and provides strong evidence for making item sampling a powerful and reliable tool for recommendation evaluation.
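
A tiny simulation of the inconsistency that motivates this line of work: Hit@K measured against a small set of sampled negatives diverges sharply from global Hit@K, especially for small K. All numbers below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_users, K, n_sample = 10_000, 2_000, 10, 100
global_rank = rng.integers(1, n_items + 1, size=n_users)   # true rank of each target

global_hit = (global_rank <= K).mean()
# sampled rank = 1 + number of sampled negatives that outrank the target
sampled_rank = 1 + rng.binomial(n_sample, (global_rank - 1) / (n_items - 1))
sampled_hit = (sampled_rank <= K).mean()
print(f"global Hit@{K}: {global_hit:.3f}  sampled Hit@{K}: {sampled_hit:.3f}")
# The sampled metric is far larger: it ranks the target among 100 items, not 10,000.
```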

IJCAI Conference 2022 Conference Paper

Real-Time Portrait Stylization on the Edge

  • Yanyu Li
  • Xuan Shen
  • Geng Yuan
  • Jiexiong Guan
  • Wei Niu
  • Hao Tang
  • Bin Ren
  • Yanzhi Wang

In this work we demonstrate real-time portrait stylization, specifically, translating self-portraits into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method that maintains realistic generative quality. With our framework, we obtain a 10× computation reduction on the generative model and achieve real-time video stylization on off-the-shelf smartphones using mobile GPUs.

NeurIPS Conference 2022 Conference Paper

SparCL: Sparse Continual Learning on the Edge

  • Zifeng Wang
  • Zheng Zhan
  • Yifan Gong
  • Geng Yuan
  • Wei Niu
  • Tong Jian
  • Bin Ren
  • Stratis Ioannidis

Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems in resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), the first study to leverage sparsity for cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods by up to 23× fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
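
Of the three components, gradient sparsification is the easiest to sketch: keep only the largest-magnitude fraction of each parameter's gradient before the optimizer step. The keep ratio and per-tensor selection rule here are assumptions, not SparCL's exact DGM procedure:

```python
import torch

def mask_gradients(model: torch.nn.Module, keep: float = 0.2) -> None:
    """Zero all but the top `keep` fraction of gradient entries, per tensor."""
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(keep * g.numel()))
        thresh = g.topk(k).values.min()
        p.grad.mul_((p.grad.abs() >= thresh).float())

model = torch.nn.Linear(100, 10)
loss = torch.nn.functional.cross_entropy(
    model(torch.randn(8, 100)), torch.randint(10, (8,)))
loss.backward()
mask_gradients(model, keep=0.2)  # call between backward() and optimizer.step()
```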

AAAI Conference 2021 System Paper

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices

  • Yuxuan Cai
  • Geng Yuan
  • Hongjia Li
  • Wei Niu
  • Yanyu Li
  • Xulong Tang
  • Bin Ren
  • Yanzhi Wang

The rapid development and wide utilization of object detection techniques have raised demands on both the accuracy and speed of object detectors. In this work, we propose a compression-compilation co-design framework to achieve real-time YOLOv4 inference on mobile devices. We propose a novel fine-grained structured pruning scheme, which maintains high accuracy while achieving high hardware parallelism. Our pruned YOLOv4 achieves 48.9 mAP and a 17 FPS inference speed on an off-the-shelf Samsung Galaxy S20 smartphone, which is 5.5× faster than the original state-of-the-art YOLOv4 detector.

IJCAI Conference 2021 Conference Paper

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

  • Wei Niu
  • Zhenglun Kong
  • Geng Yuan
  • Weiwen Jiang
  • Jiexiong Guan
  • Caiwen Ding
  • Pu Zhao
  • Sijia Liu

Transformer-based deep learning models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model meets both the resource and real-time specifications of mobile devices. Our framework applies a compiler-aware neural architecture optimization method (CANAO), which can generate the optimal compressed model that balances accuracy and latency. We achieve up to 7.8× speedup compared with TensorFlow-Lite with only minor accuracy loss. We present two types of BERT applications on mobile devices: Question Answering (QA) and Text Generation. Both can be executed in real time with latency as low as 45 ms. Videos demonstrating the framework can be found at https://www.youtube.com/watch?v=_WIRvK_2PZI

NeurIPS Conference 2021 Conference Paper

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

  • Geng Yuan
  • Xiaolong Ma
  • Wei Niu
  • Zhengang Li
  • Zhenglun Kong
  • Ning Liu
  • Yifan Gong
  • Zheng Zhan

Recently, a new trend of exploring sparsity to accelerate neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure superior accuracy at high sparsity ratios. Different from existing works on sparse training, this work reveals the importance of sparsity schemes for the performance of sparse training, in terms of both accuracy and training speed on real edge devices. On top of that, the paper proposes to employ data efficiency for further acceleration of sparse training: our results suggest that unforgettable examples can be identified in situ even during the dynamic exploration of sparsity masks in the sparse training process, and can therefore be removed for further training speedup on edge devices. We also explore the impact of model sparsity, sparsity schemes, and sparse training algorithms on the number of removable training examples. Compared with state-of-the-art (SOTA) works on accuracy, our MEST increases Top-1 accuracy significantly on ImageNet when using the same unstructured sparsity scheme. Systematic evaluations of accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works. Our code is publicly available at: https://github.com/boone891214/MEST.
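
A minimal sketch of a prune-and-regrow mutation step at a fixed sparsity budget, in the spirit of Elastic Mutation; the mutation rate and random regrowth rule are assumptions, not the paper's exact procedure:

```python
import torch

def elastic_mutation(weight: torch.Tensor, mask: torch.Tensor,
                     mutate: float = 0.1) -> torch.Tensor:
    """Drop the weakest surviving weights, regrow an equal number of dead ones."""
    alive = mask.bool()
    n_mut = max(1, int(mutate * alive.sum().item()))
    # prune: smallest-magnitude weights among those currently alive
    alive_vals = weight.abs().masked_fill(~alive, float("inf")).flatten()
    drop = alive_vals.topk(n_mut, largest=False).indices
    mask.view(-1)[drop] = 0.0
    # regrow: random currently-dead positions, keeping the sparsity level fixed
    dead = (mask.view(-1) == 0).nonzero().flatten()
    grow = dead[torch.randperm(len(dead))[:n_mut]]
    mask.view(-1)[grow] = 1.0
    return mask

w = torch.randn(64, 64)
m = (torch.rand_like(w) > 0.9).float()   # ~90% sparse mask
m = elastic_mutation(w, m)               # overall sparsity is preserved
```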

AAAI Conference 2021 Conference Paper

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

  • Wei Niu
  • Mengshu Sun
  • Zhengang Li
  • Jou-An Chen
  • Jiexiong Guan
  • Xipeng Shen
  • Yanzhi Wang
  • Sijia Liu

Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, it is still challenging to execute 3D Convolutional Neural Networks (CNNs) with real-time performance as well as high inference accuracy, because the more complex model structure and higher model dimensionality overwhelm the available computation/storage resources on mobile devices. A natural recourse is deep learning weight pruning, but the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs that seamlessly integrates neural network weight pruning and compiler code generation techniques. We propose and investigate two mobile-acceleration-friendly structured sparsity schemes, i.e., vanilla structured sparsity and kernel group structured (KGS) sparsity. Vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity approaches the pruning rate of the whole model's FLOPs (floating point operations). RT3D demonstrates up to 29.1× speedup in end-to-end inference time compared with current mobile frameworks supporting 3D CNNs, with a moderate 1%∼1.5% accuracy loss. The end-to-end inference time for 16 video frames can be within 150 ms when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobile devices.
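
A rough sketch of KGS-style pruning on a 3D convolution weight, using plain group-magnitude scores as a stand-in for the paper's reweighted regularization; the group size and pruning rate are assumed:

```python
import torch

def kgs_prune(w: torch.Tensor, group: int = 4, prune: float = 0.5) -> torch.Tensor:
    """w: (out_c, in_c, kt, kh, kw); in_c must be divisible by `group`.
    Zeros the kernel groups with the smallest L2 norm."""
    out_c, in_c, kt, kh, kw = w.shape
    g = w.view(out_c, in_c // group, group, kt, kh, kw)
    score = g.pow(2).sum(dim=(2, 3, 4, 5)).sqrt()        # one score per kernel group
    n_drop = int(prune * score.numel())
    thresh = score.flatten().kthvalue(n_drop).values
    keep = (score > thresh)[..., None, None, None, None]  # broadcast over group dims
    return (g * keep).view_as(w)

w = torch.randn(16, 8, 3, 3, 3)
print((kgs_prune(w) == 0).float().mean())  # ~0.5 of the weights zeroed
```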

IJCAI Conference 2021 Conference Paper

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

  • Xuan Shen
  • Geng Yuan
  • Wei Niu
  • Xiaolong Ma
  • Jiexiong Guan
  • Zhengang Li
  • Bin Ren
  • Yanzhi Wang

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition creates increasing demand for multi-person pose estimation-based applications, especially on mobile platforms. However, to achieve high accuracy, state-of-the-art methods tend to have large model sizes and complex post-processing algorithms, which incur intense computation and long end-to-end latency. To solve this problem, we propose an architecture optimization and weight pruning framework to accelerate inference of multi-person pose estimation on mobile devices. With our optimization framework, we achieve up to 2.51× faster model inference with higher accuracy than representative lightweight multi-person pose estimators.

AAAI Conference 2021 Conference Paper

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

  • Yuxuan Cai
  • Hongjia Li
  • Geng Yuan
  • Wei Niu
  • Yanyu Li
  • Xulong Tang
  • Bin Ren
  • Yanzhi Wang

The rapid development and wide utilization of object detection techniques have drawn attention to both the accuracy and speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using large models that lead to high latency, or speed-oriented, using lightweight models that sacrifice accuracy. In this work, we propose YOLObile, a framework for real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14× compression rate on YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve a 17 FPS inference speed using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, outperforming the original YOLOv4 by a 5× speedup. Source code is at: https://github.com/nightsnack/YOLObile.
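
A minimal sketch of block-punched pruning on a flattened weight matrix: within each block of rows, the same column positions are zeroed for every row, chosen by aggregate magnitude, which keeps the pruned layout regular enough for parallel hardware. Block size and pruning rate are assumptions:

```python
import torch

def block_punched_prune(w: torch.Tensor, block_rows: int = 8,
                        prune: float = 0.5) -> torch.Tensor:
    out = w.clone()
    for r in range(0, w.shape[0], block_rows):
        block = out[r:r + block_rows]                # view: (<=block_rows, cols)
        score = block.abs().sum(dim=0)               # shared per-column importance
        n_drop = int(prune * block.shape[1])
        drop = score.topk(n_drop, largest=False).indices
        block[:, drop] = 0.0                         # same "punched" positions per row
    return out

w = torch.randn(32, 144)                             # e.g. flattened conv filters
print((block_punched_prune(w) == 0).float().mean())  # ~0.5 sparsity
```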

AAAI Conference 2020 Conference Paper

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices

  • Xiaolong Ma
  • Fu-Ming Guo
  • Wei Niu
  • Xue Lin
  • Jian Tang
  • Kaisheng Ma
  • Bin Ren
  • Yanzhi Wang

Model compression techniques for Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream pruning approaches representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy but is not hardware friendly, while structured, coarse-grained pruning exploits hardware-efficient structures but suffers from accuracy drops when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), generated from intra-convolution kernel pruning, and connectivity sparsity, generated from inter-convolution kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases the pruning rate while maintaining a balanced workload on filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 39.2×, 11.4×, and 6.3×, respectively, with no accuracy loss. Mobile devices can thus achieve real-time inference on large-scale DNNs.
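
A toy sketch of SCP-style pattern pruning: each 3x3 kernel is masked by whichever 4-entry pattern preserves the most magnitude. The pattern library below is illustrative, not the paper's exact set:

```python
import torch

PATTERNS = torch.tensor([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
], dtype=torch.float32)                      # each pattern keeps 4 of 9 entries

def pattern_prune(kernels: torch.Tensor) -> torch.Tensor:
    """kernels: (N, 3, 3) -> same shape, each kernel masked by its best pattern."""
    # score of pattern p for kernel n = magnitude preserved under that mask
    scores = torch.einsum("nij,pij->np", kernels.abs(), PATTERNS)
    best = scores.argmax(dim=1)
    return kernels * PATTERNS[best]

k = torch.randn(16, 3, 3)
pruned = pattern_prune(k)
print((pruned != 0).sum(dim=(1, 2)))  # at most 4 nonzeros kept per kernel
```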

IJCAI Conference 2020 Conference Paper

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

  • Wei Niu
  • Pu Zhao
  • Zheng Zhan
  • Xue Lin
  • Yanzhi Wang
  • Bin Ren

High-end mobile platforms are rapidly becoming primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference execution. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN execution on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring, and super resolution.