Arrow Research

Author name cluster

Li Song

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos

  • Wenkang Zhang
  • Yan Zhao
  • Qiang Wang
  • Zhixin Xu
  • Li Song
  • Zhengxue Cheng

Free-Viewpoint Video (FVV) enables immersive 3D experiences, but efficient compression of dynamic 3D representation remains a major challenge. Existing dynamic 3D Gaussian Splatting methods couple reconstruction with optimization-dependent compression and customized motion formats, limiting generalization and standardization. To address this, we propose D-FCGS, a novel Feedforward Compression framework for Dynamic Gaussian Splatting. Key innovations include: (1) a standardized Group-of-Frames (GoF) structure with I-P coding, leveraging sparse control points to extract inter-frame motion tensors; (2) a dual prior-aware entropy model that fuses hyperprior and spatial-temporal priors for accurate rate estimation; (3) a control-point-guided motion compensation mechanism and refinement network to enhance view-consistent fidelity. Trained on Gaussian frames derived from multi-view videos, D-FCGS generalizes across diverse scenes in a zero-shot fashion. Experiments show that it matches the rate-distortion performance of optimization-based methods, achieving over 40 times compression compared to the baseline while preserving visual quality across viewpoints. This work advances feedforward compression of dynamic 3DGS, facilitating scalable FVV transmission and storage for immersive applications.
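
The Group-of-Frames idea above can be illustrated in miniature. The sketch below shows a generic I-P frame-type assignment; the group size of 8 is a hypothetical choice, not a value taken from the paper:

```python
def group_of_frames(n_frames, gof_size=8):
    """Assign I/P coding types in fixed-size Groups of Frames:
    the first frame of each group is intra-coded (I), the rest are
    predicted (P) from their predecessor, e.g. via motion tensors."""
    return ["I" if i % gof_size == 0 else "P" for i in range(n_frames)]
```

In a real codec the I-frames anchor the group so that decoding can start at any group boundary, while P-frames carry only the compressed inter-frame motion.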

AAAI Conference 2025 Conference Paper

Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression

  • Chuqin Zhou
  • Guo Lu
  • Jiangchuan Li
  • Xiangyu Chen
  • Zhengxue Cheng
  • Li Song
  • Wenjun Zhang

Neural image compression often faces a challenging trade-off among rate, distortion and perception. While most existing methods typically focus on either achieving high pixel-level fidelity or optimizing for perceptual metrics, we propose a novel approach that simultaneously addresses both aspects for a fixed neural image codec. Specifically, we introduce a plug-and-play module at the decoder side that leverages a latent diffusion process to transform the decoded features, enhancing either low distortion or high perceptual quality without altering the original image compression codec. Our approach facilitates fusion of original and transformed features without additional training, enabling users to flexibly adjust the balance between distortion and perception during inference. Extensive experimental results demonstrate that our method significantly enhances the pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we can achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR.
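
The decoder-side controllability described above amounts to blending two feature sets at inference time. A minimal sketch, assuming a simple convex combination (the paper's actual fusion rule may differ):

```python
def fuse_features(decoded, transformed, alpha):
    """Blend codec-decoded features with diffusion-transformed ones.
    alpha=0 keeps the low-distortion reconstruction, alpha=1 the
    high-perception one; intermediate values trade off between the
    two at inference time, with no retraining of the codec."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1 - alpha) * d + alpha * t for d, t in zip(decoded, transformed)]
```

Because the blend happens on decoded features, the user can sweep `alpha` per image without touching the compressed bitstream.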

IJCAI Conference 2025 Conference Paper

DiffFERV: Diffusion-based Facial Editing of Real Videos

  • Xiangyi Chen
  • Han Xue
  • Li Song

Face video editing presents significant challenges, requiring precise preservation of facial identity, temporal consistency, and background details. Existing methods encounter three major challenges: difficulty in achieving accurate facial reconstruction, struggles with challenging real-world videos and reliance on a crop-edit-stitch paradigm that confines editing to localized facial regions. In response, we introduce DiffFERV, a novel diffusion-based framework for realistic face video editing that addresses these limitations through three core contributions. (1) A specialization stage that extends large Text-to-Image (T2I) models' general prior to faces while retaining their broad generative capabilities. This enables robust performance on non-aligned and challenging face images. (2) Temporal modeling, implemented through two distinct attention mechanisms, complements the specialization stage to ensure joint and temporally consistent processing of video frames. (3) Finally, we present a holistic editing pipeline and the concept of preservation features, which leverages our model’s enhanced priors and temporal mechanisms to achieve faithful edits of entire video frames without the need for cropping, excelling even in real-world scenarios. Extensive experiments demonstrate that DiffFERV achieves state-of-the-art performance in both reconstruction and editing tasks.

NeurIPS Conference 2025 Conference Paper

H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting

  • Bing He
  • Yunuo Chen
  • Guo Lu
  • Qi Wang
  • Qunshan Gu
  • Rong Xie
  • Li Song
  • Wenjun Zhang

Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points, and for compactness, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods predominantly rely on gradient-based optimization for all motion information. Due to the high degree of freedom, they struggle to converge on real-world datasets exhibiting complex motion. To preserve the compactness of motion representation and address convergence challenges, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained using a hybrid strategy combining optical flow back-projection and gradient-based methods. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, components of 3D motion that project onto the image plane are directly acquired via optical flow back projection, while unobservable portions are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques. Remarkably, our method converges within just 100 iterations and achieves a per-frame processing speed of 2 seconds on a single NVIDIA RTX 4070 GPU.

AAAI Conference 2025 Conference Paper

L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression

  • Junxuan Zhang
  • Zhengxue Cheng
  • Yan Zhao
  • Shihao Wang
  • Dajiang Zhou
  • Guo Lu
  • Li Song

Learning-based probabilistic models can be combined with an entropy coder for data compression. However, due to the high complexity of learning-based models, their practical application as text compressors has been largely overlooked. To address this issue, our work focuses on a low-complexity design while maintaining compression performance. We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC). Specifically, we conduct extensive experiments demonstrating that RWKV models achieve the fastest decoding speed with a moderate compression ratio, making them the most suitable backbone for our method. Second, we propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens while allowing outliers to bypass the prediction and encoding. Third, we propose a novel high-rank reparameterization strategy that enhances the learning capability during training without increasing complexity during inference. Experimental results validate that our method achieves a 48% bit saving compared to the gzip compressor. In addition, L3TC offers compression performance comparable to other learned compressors, with a 50x reduction in model parameters. More importantly, L3TC is the fastest among all learned compressors, providing real-time decoding speeds up to megabytes per second.
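
The outlier-aware tokenization can be illustrated with a word-level toy. The paper presumably operates on subword tokens; `build_vocab` and the `BYPASS` marker below are illustrative names, not the paper's API:

```python
from collections import Counter

def build_vocab(corpus, k):
    """Limited vocabulary: keep only the k most frequent words."""
    counts = Counter(w for line in corpus for w in line.split())
    return {w for w, _ in counts.most_common(k)}

def tokenize(text, vocab):
    """In-vocabulary words are routed to the learned probability model;
    outliers bypass prediction and are encoded literally."""
    return [("MODEL" if w in vocab else "BYPASS", w) for w in text.split()]
```

Keeping the vocabulary small bounds the softmax size (and hence decoding cost), while the bypass path guarantees losslessness for rare tokens.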

ICLR Conference 2025 Conference Paper

OmniRe: Omni Urban Scene Reconstruction

  • Ziyu Chen
  • Jiawei Yang 0002
  • Jiahui Huang
  • Riccardo de Lutio
  • Janick Martinez Esturo
  • Boris Ivanovic
  • Or Litany
  • Zan Gojcic

We introduce OmniRe, a comprehensive system for efficiently creating high-fidelity digital twins of dynamic real-world scenes from on-device logs. Recent methods using neural fields or Gaussian Splatting primarily focus on vehicles, hindering a holistic framework for all dynamic foregrounds demanded by downstream applications, e.g., the simulation of human behavior. OmniRe extends beyond vehicle modeling to enable accurate, full-length reconstruction of diverse dynamic objects in urban scenes. Our approach builds scene graphs on 3DGS and constructs multiple Gaussian representations in canonical spaces that model various dynamic actors, including vehicles, pedestrians, cyclists, and others. OmniRe allows holistically reconstructing any dynamic object in the scene, enabling advanced simulations (~60 Hz) that include human-participated scenarios, such as pedestrian behavior simulation and human-vehicle interaction. This comprehensive simulation capability is unmatched by existing methods. Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We further extend our results to 5 additional popular driving datasets to demonstrate its generalizability on common urban scenes. Code and results are available at [omnire](https://ziyc.github.io/omnire/).

AAAI Conference 2025 Conference Paper

VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

  • Qiang Hu
  • Houqiang Zhong
  • Zihan Zheng
  • Xiaoyun Zhang
  • Zhengxue Cheng
  • Li Song
  • Guangtao Zhai
  • Yanfeng Wang

Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression independently or focus on a single fixed rate-distortion (RD) tradeoff. In this paper, we propose VRVVC, a novel end-to-end joint optimization variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance. Specifically, VRVVC introduces a compact tri-plane implicit residual representation for inter-frame modeling of long-duration dynamic scenes, effectively reducing temporal redundancy. We further propose a variable-rate residual representation compression scheme that leverages a learnable quantization and a tiny MLP-based entropy model. This approach enables variable bitrates through the utilization of predefined Lagrange multipliers to manage the quantization error of all latent representations. Finally, we present an end-to-end progressive training strategy combined with a multi-rate-distortion loss function to optimize the entire framework. Extensive experiments demonstrate that VRVVC achieves a wide range of variable bitrates within a single model and surpasses the RD performance of existing methods across various datasets.
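
The role of the predefined Lagrange multipliers can be seen in a toy rate-distortion model; the curves and the grid below are made up purely for illustration and are not the paper's codec:

```python
def rate(q):
    """Toy rate curve: bits grow linearly with the quality knob q."""
    return q

def distortion(q):
    """Toy distortion curve: error falls as q grows."""
    return 1.0 / (1.0 + q)

def operating_point(lam):
    """Pick the quality q minimizing the Lagrangian D + lambda * R.
    A larger lambda penalizes rate more, yielding a lower bitrate:
    one model, many bitrates, selected at inference time."""
    grid = [i * 0.1 for i in range(1, 100)]
    return min(grid, key=lambda q: distortion(q) + lam * rate(q))
```

Training against a set of such multipliers (a multi-rate-distortion loss) is what lets a single network cover the whole RD curve instead of one fixed tradeoff.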

AAAI Conference 2024 Conference Paper

Depth-Guided Robust and Fast Point Cloud Fusion NeRF for Sparse Input Views

  • Shuai Guo
  • Qiuwen Wang
  • Yijie Gao
  • Rong Xie
  • Li Song

Novel-view synthesis with sparse input views is important for real-world applications like AR/VR and autonomous driving. Recent methods have integrated depth information into NeRFs for sparse input synthesis, leveraging depth prior for geometric and spatial understanding. However, most existing works tend to overlook inaccuracies within depth maps and have low time efficiency. To address these issues, we propose a depth-guided robust and fast point cloud fusion NeRF for sparse inputs. We perceive radiance fields as an explicit voxel grid of features. A point cloud is constructed for each input view, characterized within the voxel grid using matrices and vectors. We accumulate the point cloud of each input view to construct the fused point cloud of the entire scene. Each voxel determines its density and appearance by referring to the point cloud of the entire scene. Through point cloud fusion and voxel grid fine-tuning, inaccuracies in depth values are refined or substituted by those from other views. Moreover, our method can achieve faster reconstruction and greater compactness through effective vector-matrix decomposition. Experimental results underline the superior performance and time efficiency of our approach compared to state-of-the-art baselines.

IROS Conference 2021 Conference Paper

A Portable Remote Optoelectronic Tweezer System for Microobjects Manipulation

  • Yuqing Cao
  • Shuzhang Liang
  • Hanlong Chen
  • Chunyuan Gan
  • Li Song
  • Chaonan Zhang
  • Fumihito Arai
  • Lin Feng 0002

Non-contact manipulation technology has extensive applications in the manipulation and fabrication of micro/nanomaterials. However, the manipulation devices are often precise and complex, operated only by professionals and subject to site constraints. We propose a simple optoelectronic tweezer platform, which can be controlled remotely and simply for the manipulation of microparticles at different scales, based on the novel manipulation technique called optically-induced dielectrophoresis. In this work, we design and set up the optoelectronic tweezer manipulation platform and develop a full-function human-computer interactive control interface and graphics rendering system to simplify the micro-operation process. Using the Qt 5.0 development environment, an experimental image processing system with multi-thread characteristics is developed, and the information interaction requirements of optoelectronic tweezer experiments are integrated into a control system to achieve unified information management and data analysis. Combined with cloud computing technology, the system realizes local/remote synchronous linkage operation, with cross-platform operation capability on portable terminals such as laptops and iPads.

IROS Conference 2021 Conference Paper

Precise Control of Magnetized Macrophage Cell Robot for Targeted Drug Delivery

  • Luyao Wang
  • Yuguo Dai
  • Hongyan Sun
  • Li Song
  • Lina Jia
  • Chiju Jiang
  • Fumihito Arai
  • Lin Feng 0002

Micro-nano-robots are considered to be a promising platform for drug delivery in biological organisms, but urgent technical problems in the biocompatibility and degradability of 3D-printed micro-robots still need to be solved. Therefore, in this paper, we design a magnetized bio-hybrid robot, which uses mouse macrophages as carriers, and allow it to swallow Fe₂O₃ particles with a diameter of 10 nm. The robot takes advantage of the macrophage's natural biocompatibility and targeting characteristics to reach and function in complex environments such as the eye, knee, and tumors, and can finally be actively metabolized by the organism. More importantly, the cell robot can move precisely along a preplanned path under the control of a three-dimensional magnetic control system built in this study, and be delivered accurately to the vicinity of cancer cells in an in vitro environment. In future work, cellular robots could be allowed to carry anti-cancer drugs and release them in a targeted manner at the lesion. These microrobots have shown great potential for targeted drug delivery to tumor regions.

AAAI Conference 2020 Conference Paper

FACT: Fused Attention for Clothing Transfer with Generative Adversarial Networks

  • Yicheng Zhang
  • Lei Li
  • Li Song
  • Rong Xie
  • Wenjun Zhang

Clothing transfer is a challenging task in computer vision where the goal is to transfer the human clothing style in an input image conditioned on a given language description. However, existing approaches have limited ability in delicate colorization and texture synthesis with a conventional fully convolutional generator. To tackle this problem, we propose a novel semantic-based Fused Attention model for Clothing Transfer (FACT), which allows fine-grained synthesis, high global consistency, and plausible hallucination in images. Towards this end, we incorporate two attention modules at the spatial level: (i) soft attention that searches for the most related positions in sentences, and (ii) self-attention modeling long-range dependencies on feature maps. Furthermore, we also develop a stylized channel-wise attention module to capture correlations at the feature level. We effectively fuse these attention modules in the generator and achieve better performance than the state-of-the-art method on the DeepFashion dataset. Qualitative and quantitative comparisons against the baselines demonstrate the effectiveness of our approach.

IROS Conference 2020 Conference Paper

Magnetized Cell-robot Propelled by Magnetic Field for Cancer Killing

  • Yuguo Dai
  • Yanmin Feng
  • Lin Feng 0002
  • Yuanyuan Chen 0002
  • Xue Bai
  • Shuzhang Liang
  • Li Song
  • Fumihito Arai

In this paper, we present a magnetized cell-robot using macrophages as templates, which can be controlled under a strong gradient magnetic field to approach and kill cancer cells in both in vitro and in vivo environments. First, we establish a magnetic control system using only four coils, which can generate a gradient field of up to 4.14 T/m by utilizing the coupled field contributed by multiple electromagnets acting in concert. Most importantly, we propose a cell-robot based on the macrophage, which can be transported to the vicinity of cancer cells precisely using the strong gradient magnetic field. The cell-robot then actively phagocytoses the cancer cells and eventually kills them, achieving cancer treatment at the cellular level. This has important significance for guiding accurate targeted therapy in vivo in the future, under the premise of zero harm to the human body.

IJCAI Conference 2018 Conference Paper

Aspect-Level Deep Collaborative Filtering via Heterogeneous Information Networks

  • Xiaotian Han
  • Chuan Shi
  • Senzhang Wang
  • Philip S. Yu
  • Li Song

Latent factor models have been widely used for recommendation. Most existing latent factor models mainly utilize the rating information between users and items, although some recently extended models add auxiliary information to learn a unified latent factor between users and items. The unified latent factor only represents the latent features of users and items from the aspect of purchase history. However, the latent features of users and items may stem from different aspects, e.g., the brand-aspect and category-aspect of items. In this paper, we propose a Neural network based Aspect-level Collaborative Filtering model (NeuACF) to exploit different aspect latent factors. By modelling the rich objects and relations in a recommender system as a heterogeneous information network, NeuACF first extracts different aspect-level similarity matrices of users and items through different meta-paths, and then feeds an elaborately designed deep neural network with these matrices to learn aspect-level latent factors. Finally, the aspect-level latent factors are effectively fused with an attention mechanism for top-N recommendation. Extensive experiments on three real datasets show that NeuACF significantly outperforms both existing latent factor models and recent neural network models.
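
The final fusion step can be sketched as a softmax attention over aspect-level factors. Scoring each factor by its norm is a hypothetical simplification; NeuACF learns its attention weights from data:

```python
import math

def attention_fuse(aspect_factors):
    """Combine per-aspect latent factors (equal-length vectors) into
    one vector, weighting each aspect by a softmax over simple
    norm-based scores."""
    scores = [math.sqrt(sum(v * v for v in f)) for f in aspect_factors]
    mx = max(scores)                       # subtract max for stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(aspect_factors[0])
    return [sum(w * f[i] for w, f in zip(weights, aspect_factors))
            for i in range(dim)]
```

The attention weights let the model emphasize whichever aspect (purchase history, brand, category, ...) is most informative per user-item pair, rather than averaging aspects uniformly.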

YNIMG Journal 2017 Journal Article

The quantification of blood-brain barrier disruption using dynamic contrast-enhanced magnetic resonance imaging in aging rhesus monkeys with spontaneous type 2 diabetes mellitus

  • Ziqian Xu
  • Wen Zeng
  • Jiayu Sun
  • Wei Chen
  • Ruzhi Zhang
  • Zunyuan Yang
  • Zunwei Yao
  • Lei Wang

Microvascular lesions of the body are one of the most serious complications that can affect patients with type 2 diabetes mellitus. The blood-brain barrier (BBB) is a highly selective permeable barrier around the microvessels of the brain. This study investigated BBB disruption in diabetic rhesus monkeys using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Multi-slice DCE-MRI was used to quantify BBB permeability. Five diabetic monkeys and six control monkeys underwent magnetic resonance brain imaging in a 3 Tesla MRI system. Regions of the frontal cortex, the temporal cortex, the basal ganglia, the thalamus, and the hippocampus in the two groups were selected as regions of interest to calculate the value of the transport coefficient Ktrans using the extended Tofts model. Permeability in the diabetic monkeys was significantly increased as compared with permeability in the normal control monkeys. Histopathologically, zonula occludens protein-1 decreased, immunoglobulin G leaked out of the blood, and nuclear factor E2–related factor translocated from the cytoplasm to the nuclei. It is likely that diabetes contributed to the increased BBB permeability.

AAAI Conference 2016 Conference Paper

DRIMUX: Dynamic Rumor Influence Minimization with User Experience in Social Networks

  • Biao Wang
  • Ge Chen
  • Luoyi Fu
  • Li Song
  • Xinbing Wang
  • Xue Liu

Rumor blocking is a serious problem in large-scale social networks. Malicious rumors could cause chaos in society and hence need to be blocked as soon as possible after being detected. In this paper, we propose a model of dynamic rumor influence minimization with user experience (DRIMUX). Our goal is to minimize the influence of the rumor (i.e., the number of users that have accepted and sent the rumor) by blocking a certain subset of nodes. A dynamic Ising propagation model considering both the global popularity and individual attraction of the rumor is presented based on realistic scenarios. In addition, different from existing influence minimization problems, we take into account the constraint of user experience utility. Specifically, each node is assigned a tolerance time threshold. If the blocking time of any user exceeds that threshold, the utility of the network will decrease. Under this constraint, we then formulate the problem as a network inference problem with survival theory, and propose solutions based on the maximum likelihood principle. Experiments are implemented on large-scale real-world networks and validate the effectiveness of our method.

JBHI Journal 2015 Journal Article

Recognizing Common CT Imaging Signs of Lung Diseases Through a New Feature Selection Method Based on Fisher Criterion and Genetic Optimization

  • Xiabi Liu
  • Ling Ma
  • Li Song
  • Yanfeng Zhao
  • Xinming Zhao
  • Chunwu Zhou

Common CT imaging signs of lung diseases (CISLs) are defined as the imaging signs that frequently appear in lung CT images from patients and play important roles in the diagnosis of lung diseases. This paper proposes a new feature selection method based on the FIsher criterion and Genetic optimization, called FIG for short, to tackle the CISL recognition problem. In our FIG feature selection method, the Fisher criterion is applied to evaluate feature subsets, based on which a genetic optimization algorithm is developed to find an optimal feature subset from the candidate features. We use the FIG method to select the features for CISL recognition from various types of features, including bag-of-visual-words based on the histogram of oriented gradients, wavelet transform-based features, the local binary pattern, and the CT value histogram. Then, the selected features cooperate with each of five commonly used classifiers, including support vector machine (SVM), Bagging (Bag), Naïve Bayes (NB), k-nearest neighbor (k-NN), and AdaBoost (Ada), to classify the regions of interest (ROIs) in lung CT images into the CISL categories. In order to evaluate the proposed feature selection method and CISL recognition approach, we conducted fivefold cross-validation experiments on a set of 511 ROIs captured from real lung CT images. For all the considered classifiers, our FIG method brought better recognition performance than not only the full set of original features but also any single type of features. We further compared our FIG method with the feature selection method based on classification accuracy rate and genetic optimization (ARG). The advantages of FIG over ARG in computational effectiveness and efficiency are shown through experiments.
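
The FIG loop (Fisher criterion as fitness, genetic search over binary feature masks) can be sketched as follows. Normalizing the score by subset size is a simplification of the paper's subset criterion, and the GA hyperparameters are placeholders:

```python
import random

def fisher_score(X, y, mask):
    """Mean per-feature Fisher criterion over the selected subset:
    between-class separation divided by within-class scatter."""
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    total = 0.0
    for i in feats:
        c0 = [x[i] for x, label in zip(X, y) if label == 0]
        c1 = [x[i] for x, label in zip(X, y) if label == 1]
        m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
        v0 = sum((v - m0) ** 2 for v in c0) / len(c0)
        v1 = sum((v - m1) ** 2 for v in c1) / len(c1)
        total += (m0 - m1) ** 2 / (v0 + v1 + 1e-12)
    return total / len(feats)

def fig_select(X, y, n_feat, pop_size=20, generations=30, seed=0):
    """Genetic optimization over binary feature masks with the
    Fisher criterion as the fitness function."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fisher_score(X, y, m), reverse=True)
        elite = pop[: pop_size // 2]           # selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat)
            child = a[:cut] + b[cut:]          # one-point crossover
            child[rng.randrange(n_feat)] ^= 1  # point mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fisher_score(X, y, m))
```

Using the Fisher criterion (rather than a classifier's accuracy, as in ARG) as the fitness makes each evaluation cheap, which is where the computational advantage comes from.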

TIST Journal 2013 Journal Article

Reorder user's tweets

  • Keyi Shen
  • Jianmin Wu
  • Ya Zhang
  • Yiping Han
  • Xiaokang Yang
  • Li Song
  • Xiao Gu

Twitter displays the tweets a user received in reverse chronological order, which is not always the best choice. As Twitter is full of messages of very different qualities, many informative or relevant tweets might be flooded out or displayed at the bottom while some nonsense buzz might be ranked higher. In this work, we present a supervised learning method for personalized tweet reordering based on user interests. User activities on Twitter, in terms of tweeting, retweeting, and replying, are leveraged to obtain the training data for the reordering models. By exploring a rich set of social and personalized features, we model the relevance of tweets by minimizing the pairwise loss of relevant and irrelevant tweets. The tweets are then reordered according to the predicted relevance scores. Experimental results with real Twitter user activities demonstrate the effectiveness of our method, which achieved an accuracy gain of over 30% compared with Twitter's default time-based ordering.
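
The pairwise formulation can be sketched with a linear scorer trained on (relevant, irrelevant) tweet pairs. The logistic pairwise loss below is a RankNet-style stand-in for whatever loss the paper minimizes, and the feature vectors are placeholders:

```python
import math

def train_pairwise(pairs, n_dim, lr=0.1, epochs=200):
    """Learn a linear relevance scorer w from (relevant, irrelevant)
    feature-vector pairs by gradient descent on the pairwise logistic
    loss log(1 + exp(-(w.pos - w.neg)))."""
    w = [0.0] * n_dim
    for _ in range(epochs):
        for pos, neg in pairs:
            margin = sum(wi * (p - n) for wi, p, n in zip(w, pos, neg))
            g = -1.0 / (1.0 + math.exp(margin))   # d loss / d margin
            for i in range(n_dim):
                w[i] -= lr * g * (pos[i] - neg[i])
    return w

def reorder(tweets, w):
    """Sort tweet feature vectors by predicted relevance, descending."""
    def score(t):
        return sum(wi * ti for wi, ti in zip(w, t))
    return sorted(tweets, key=score, reverse=True)
```

Training on pairs rather than absolute labels sidesteps the need for graded relevance judgments: a retweeted tweet only has to outrank an ignored one.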