Author name cluster

Wen Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers

2 author rows

EAAI Journal 2026 Journal Article

A risk-aware driving agent with Long Short-Term Memory and Time-to-Collision co-decision and multi-modal fusion in urban simulation

Xu Zhao
Yu Gao
Wen Liu
Lianpeng Li
Suxian Zhang
Zijun Wang

TCS Journal 2026 Journal Article

An approximation algorithm for the asymmetric profitable tour problem with submodular penalties

Zhenzhen Pang
Wen Liu
Bo Hou

TCS Journal 2026 Journal Article

Approximation algorithm for k-Product uncapacitated facility location problem with submodular penalties

Chenzheng Feng
Wen Liu
Gengsheng Zhang
Bo Hou

EAAI Journal 2025 Journal Article

World model-based reinforcement learning for autonomous ship safe collision avoidance

Kangjie Zheng
Xinyu Zhang
Wen Liu
Jialin Ma
Jinlong Cui

EAAI Journal 2024 Journal Article

Data-driven deformation prediction and control for existing tunnels below shield tunneling

Zongbao Feng
Jingyi Wang
Wen Liu
Tiejun Li
Xianguo Wu
Pengxin Zhao

ICLR Conference 2024 Conference Paper

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Jingxiang Sun
Bo Zhang
Ruizhi Shao
Lizhen Wang 0002
Wen Liu
Zhenda Xie
Yebin Liu

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose bootstrapped score distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AAAI Conference 2024 Conference Paper

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation

Yiying Yang
Fukun Yin
Wen Liu
Jiayuan Fan
Xin Chen
Gang Yu
Tao Chen

Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, with the expansion of the scene scale, such as block or city level, existing methods will encounter challenges because traditional sampling cannot cope with the cubically growing sampling space. To alleviate the dependence on filling the sampling space, we explore using multi-modal priors to assist individual points to obtain more global semantic information and propose a prior-rich multi-modal implicit neural representation network, PM-INR, for the outdoor unbounded large-scale scene. The core of our method consists of multi-modal prior extraction and cross-modal prior fusion modules. The former encodes codebooks from different modality inputs and extracts valuable priors, while the latter fuses priors to maintain view consistency and preserve unique features among multi-modal priors. Finally, feature-rich cross-modal priors are injected into the sampling regions to allow each region to perceive global information without filling the sampling space. Extensive experiments have demonstrated the effectiveness and robustness of our method for outdoor unbounded large-scale scene novel view synthesis, which outperforms state-of-the-art methods in terms of PSNR, SSIM, and LPIPS.

YNICL Journal 2023 Journal Article

Altered gray matter volumes and plasma IL-6 level in major depressive disorder patients with suicidal ideation

Yingrui Guo
Xiaowei Jiang
Linna Jia
Yue Zhu
Xinyu Han
Yifan Wu
Wen Liu
Wenhui Zhao

YNICL Journal 2023 Journal Article

Gray matter volume reduction in orbitofrontal cortex correlated with plasma glial cell line-derived neurotrophic factor (GDNF) levels within major depressive disorder

Yifan Wu
Lingtao Kong
Anqi Yang
Kaiqi Xin
Yihui Lu
Xintong Yan
Wen Liu
Yue Zhu

NeurIPS Conference 2023 Conference Paper

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

Zibo Zhao
Wen Liu
Xin Chen
Xianfang Zeng
Rui Wang
Pei Cheng
Bin Fu
Tao Chen

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts. Directly learning a conditional generative model from images or texts to 3D shapes is prone to producing results inconsistent with the conditions because 3D shapes have an additional dimension whose distribution significantly differs from that of 2D images and texts. To bridge the domain gap among the three modalities and facilitate multi-modal-conditioned 3D shape generation, we explore representing 3D shapes in a shape-image-text-aligned space. Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM). The former model encodes the 3D shapes into the shape latent space aligned to the image and text and reconstructs the fine-grained 3D neural fields corresponding to given shape embeddings via the transformer-based decoder. The latter model learns a probabilistic mapping function from the image or text space to the latent shape space. Our extensive experiments demonstrate that our proposed approach can generate higher-quality and more diverse 3D shapes that better semantically conform to the visual or textual conditional inputs, validating the effectiveness of the shape-image-text-aligned space for cross-modality 3D shape generation.

NeurIPS Conference 2023 Conference Paper

MotionGPT: Human Motion as a Foreign Language

Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen

Although pre-trained large language models continue to advance, building a unified model for language and other multimodal data, such as motion, remains challenging and unexplored so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and convert 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.

NeurIPS Conference 2023 Conference Paper

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

Yuhan Ding
Fukun Yin
Jiayuan Fan
Hui Li
Xin Chen
Wen Liu
Chongshan Lu
Gang Yu

Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perceive the entire colossal space, we explore learning the surface distribution of the scene to provide structural priors and reduce the samplable space, and propose a Point Diffusion implicit Function, PDF, for large-scale scene neural representation. The core of our method is a large-scale point cloud super-resolution diffusion module that enhances the sparse point cloud reconstructed from several training images into a dense point cloud as an explicit prior. Then, in the rendering stage, only sampling points with prior points within the sampling radius are retained. That is, the sampling space is reduced from the unbounded space to the scene surface. Meanwhile, to fill in the background of the scene that cannot be provided by point clouds, region sampling based on Mip-NeRF 360 is employed to model the background representation. Extensive experiments have demonstrated the effectiveness of our method for large-scale scene novel view synthesis, which outperforms relevant state-of-the-art baselines.

YNICL Journal 2022 Journal Article

Altered dynamic amplitude of low-frequency fluctuation between bipolar type I and type II in the depressive state

Wen Liu
Xiaowei Jiang
Zijing Deng
Linna Jia
Qikun Sun
Lingtao Kong
Feng Wu
Yanqing Tang

NeurIPS Conference 2022 Conference Paper

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations

Fukun Yin
Wen Liu
Zilong Huang
Pei Cheng
Tao Chen
Gang Yu

Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, and typically uses coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation. However, existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views (i.e., 50-150) to obtain decent results. To relieve the over-dependence on massive calibrated images and enrich the coordinate-based feature representation, we explore injecting prior information into the coordinate-based network and introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation. At the core of our method are two attention modules: codebook attention and coordinate attention. The former extracts the useful prototypes containing rich geometry and appearance information from the prior codebook, and the latter propagates such prior information into each coordinate and enriches its feature representation for a scene or object surface. With the help of the prior information, our method can render 3D views with more photo-realistic appearance and geometries than current methods while using fewer calibrated images. Experiments on various scene reconstruction datasets, including DTU and BlendedMVS, and the full 3D head reconstruction dataset, H3DS, demonstrate the robustness under fewer input views and the fine detail-preserving capability of our proposed method.

AAAI Conference 2021 Conference Paper

Appearance-Motion Memory Consistency Network for Video Anomaly Detection

Ruichu Cai
Hao Zhang
Wen Liu
Shenghua Gao
Zhifeng Hao

Abnormal event detection in surveillance video is an essential but challenging task, and many methods have been proposed to deal with this problem. Previous methods either consider only the appearance information or directly integrate the results of appearance and motion information without explicitly considering their endogenous consistency semantics. Inspired by the rule that humans identify abnormal frames from multi-modality signals, we propose an Appearance-Motion Memory Consistency Network (AMMC-Net). Our method first makes full use of the prior knowledge of appearance and motion signals to explicitly capture the correspondence between them in the high-level feature space. Then, it combines the multi-view features to obtain a more essential and robust feature representation of regular events, which can significantly increase the gap between an abnormal and a regular event. In the anomaly detection phase, we further introduce a commit error in the latent space jointly with the prediction error in pixel space to enhance the detection accuracy. Solid experimental results on various standard datasets validate the effectiveness of our approach.

TCS Journal 2021 Journal Article

Approximation algorithms for the submodular edge cover problem with submodular penalties

Xin Wang
Suogang Gao
Bo Hou
Lidong Wu
Wen Liu

TCS Journal 2021 Journal Article

On approximation algorithm for the edge metric dimension problem

Yufei Huang
Bo Hou
Wen Liu
Lidong Wu
Stephen Rainwater
Suogang Gao

TCS Journal 2020 Journal Article

An approximation algorithm for the k-prize-collecting multicut on a tree problem

Xin Hou
Wen Liu
Bo Hou

IJCAI Conference 2019 Conference Paper

Margin Learning Embedded Prediction for Video Anomaly Detection with A Few Anomalies

Wen Liu
Weixin Luo
Zhengxin Li
Peilin Zhao
Shenghua Gao

Classical semi-supervised video anomaly detection assumes that only normal data are available in the training set because of the rare and unbounded nature of anomalies. It is obvious, however, that these infrequently observed abnormal events can actually help with the detection of identical or similar abnormal events, a line of thinking that motivates us to study open-set supervised anomaly detection with only a few types of observed abnormal events and many normal events available. Under the assumption that normal events can be well predicted, we propose a Margin Learning Embedded Prediction (MLEP) framework. There are three features in MLEP-based open-set supervised video anomaly detection: i) we customize a video prediction framework that favors the prediction of normal events and distorts the prediction of abnormal events; ii) the margin learning framework learns a more compact normal data distribution and enlarges the margin between normal and abnormal events. Since abnormal events are unbounded, our framework consequently helps with the detection of abnormal events, even for anomalies that have never been previously observed. Therefore, our framework is suitable for the open-set supervised anomaly detection setting; iii) our framework can readily handle both frame-level and video-level anomaly annotations. Considering that video-level anomalies are more easily annotated in practice and that anomaly detection with a few anomalies is a more practical setting, our work thus pushes the application of anomaly detection towards real scenarios. Extensive experiments validate the effectiveness of our framework for anomaly detection.
