Arrow Research search

Author name cluster

Yi Yuan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

AAAI Conference 2026 Conference Paper

HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses Through Reasoning MLLMs

  • Zheng Qin
  • Ruobing Zheng
  • Yabing Wang
  • Tianqi Li
  • Yi Yuan
  • Jingdong Chen
  • Le Wang

While Multimodal Large Language Models (MLLMs) show immense promise for achieving truly human-like interactions, progress is hindered by the lack of fine-grained evaluation frameworks for human-centered scenarios, encompassing both the understanding of complex human intentions and the provision of empathetic, context-aware responses. Here we introduce HumanSense, a comprehensive benchmark designed to evaluate the human-centered perception and interaction capabilities of MLLMs, with a particular focus on deep understanding of extended multimodal contexts and the formulation of rational feedback. Our evaluation reveals that leading MLLMs still have considerable room for improvement, particularly for advanced interaction-oriented tasks. Supplementing visual input with audio and text information yields substantial improvements, and omni-modal models show advantages on these tasks. Furthermore, grounded in the observation that appropriate feedback stems from a contextual analysis of the interlocutor's needs and emotions, we posit that reasoning ability serves as the key to unlocking it. We devise a multi-stage, modality-progressive reinforcement learning approach, resulting in HumanSense-Omni-Reasoning, which substantially enhances performance on higher-level understanding and interactive tasks. Additionally, we observe that successful reasoning processes appear to exhibit consistent thought patterns. By designing corresponding prompts, we also enhance the performance of non-reasoning models in a training-free manner.

IJCAI Conference 2025 Conference Paper

General Incomplete Time Series Analysis via Patch Dropping Without Imputation

  • Yangyang Wu
  • Yi Yuan
  • Mengying Zhu
  • Xiaoye Miao
  • Meng Xi

Missing values in multivariate time series data present significant challenges to effective analysis. Existing methods for multivariate time series analysis either ignore missing data, sacrificing performance, or follow the impute-then-analyze paradigm, which suffers from redundant training and error accumulation, leading to biased results and suboptimal performance. In this paper, we propose INTER, a novel end-to-end framework for incomplete multivariate time series analysis, which bypasses imputation by leveraging pre-trained language models to learn the distribution of incomplete time series data. INTER incorporates two novel components: the missing-rate-aware time series patch-dropping (MPD) strategy and the missing-aware Transformer block, both of which we propose to enhance model generalization, robustness, and the ability to capture underlying patterns in the observed incomplete time series. Moreover, we theoretically prove that the MPD strategy exhibits lower sample variance for time series with the same dropout rate compared to other dropping strategies. Extensive experiments on 11 public real-world time series datasets demonstrate that INTER improves accuracy by over 20% compared to state-of-the-art methods, while maintaining competitive computational efficiency.
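The abstract describes the missing-rate-aware patch-dropping (MPD) strategy only at a high level. A minimal, purely illustrative sketch of one plausible reading, in which a patch's drop probability grows with its own missing rate, might look like the following; the function names and the exact drop rule are assumptions for illustration, not the authors' implementation:

```python
import random

def missing_rate(patch):
    """Fraction of missing (None) entries in a 2D patch."""
    flat = [v for row in patch for v in row]
    return sum(v is None for v in flat) / len(flat)

def mpd_drop(patches, base_rate=0.3, seed=0):
    """Illustrative missing-rate-aware patch dropping: patches with more
    missing values are more likely to be dropped, so training leans on
    well-observed patches. Hypothetical rule, not the paper's code."""
    rng = random.Random(seed)
    kept = []
    for patch in patches:
        # Drop probability grows with the patch's own missing rate.
        p_drop = base_rate + (1 - base_rate) * missing_rate(patch)
        if rng.random() >= p_drop:
            kept.append(patch)
    return kept

# Toy multivariate series split into 2x3 patches (None marks a missing value).
patches = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],      # fully observed
    [[1.0, None, None], [None, None, 6.0]],  # mostly missing
]
kept = mpd_drop(patches)
```

Under this sketch the fully observed patch is rarely dropped, while the mostly-missing one usually is.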

YNIMG Journal 2025 Journal Article

Low-intensity transcranial ultrasound stimulation promotes the extinction of fear memory through the BDNF-TrkB signaling pathway

  • Degong Meng
  • Cong Zhang
  • Jiamin Pei
  • Xiao Zhang
  • Hanna Lu
  • Hui Ji
  • Xiangjian Zhang
  • Yi Yuan

Synaptic plasticity plays a crucial role in the extinction of fearful memories. Low-intensity transcranial ultrasound stimulation (TUS) can modulate synaptic plasticity and promote the extinction of fear memories. However, the mechanism by which TUS promotes the extinction of fear memory remains unclear. This study aimed to explore whether and how synaptic plasticity under TUS is involved in modulating fear memory, and the role of the brain-derived neurotrophic factor (BDNF)-tropomyosin-related kinase B (TrkB) signaling pathway in this process. We used behavioral tests and two-photon fluorescence imaging to investigate the modulatory effects of TUS on fear memory, and examined the formation/elimination of dendritic spines and the calcium activity of pyramidal neurons in the prefrontal cortex of mice in vivo. We found that TUS of the prefrontal cortex can promote fear memory extinction in mice while promoting dendritic spine formation, reducing dendritic spine elimination, increasing pyramidal neuron activity, and enhancing the expression of BDNF and its receptor TrkB. Conversely, inhibiting the BDNF-TrkB signaling pathway weakened these effects of ultrasound stimulation. Our study demonstrated that TUS could promote the extinction of fear memories, indicating that TUS has the potential to be used in the clinical treatment of patients with fear memory.

IJCAI Conference 2025 Conference Paper

MMNet: Missing-Aware and Memory-Enhanced Network for Multivariate Time Series Imputation

  • Xiaoye Miao
  • Han Shi
  • Yi Yuan
  • Daozhan Pan
  • Yangyang Wu
  • Xiaohua Pan

Multivariate time series (MTS) data in real-world scenarios are often incomplete, which hinders effective data analysis. Therefore, MTS imputation has been widely studied to facilitate various MTS tasks. Existing imputation methods primarily initialize missing values with zeros in order to perform effective incomplete MTS encoding, which impedes the model's capacity to precisely discern the missing distribution. Moreover, these methods often overlook global similarity across time series and are limited to local information within each sample. To this end, we propose a novel multivariate time series imputation network, named MMNet. MMNet introduces a Missing-Aware Embedding (MAE) approach to adaptively represent incomplete MTS, allowing the model to better distinguish between missing and observed data. Furthermore, we design a Memory-Enhanced Encoder (MEE) aimed at modeling prior knowledge through a memory mechanism, enabling better utilization of the global similarity within the time series. Building upon this, MMNet incorporates a Multi-Scale Mixing architecture (MSM) that leverages information from multiple scales to enhance the final imputation. Extensive experiments on four public real-world datasets demonstrate that MMNet yields a more than 25% performance gain compared with state-of-the-art methods.
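The core complaint about zero-initialization, that the model cannot tell a true zero from a zero-filled gap, can be illustrated with a tiny sketch of a missing-aware representation: each time step carries an observed/missing flag alongside its value. The token value and function name are hypothetical stand-ins for MMNet's learned embedding, not its actual design:

```python
def missing_aware_embed(series, missing_token=0.5):
    """Illustrative missing-aware embedding: each time step becomes a
    (value, observed-flag) pair, so a genuine zero at an observed step
    is distinguishable from a placeholder at a missing step.
    `missing_token` stands in for what a real model would learn."""
    embedded = []
    for v in series:
        if v is None:
            embedded.append((missing_token, 0.0))  # placeholder + "missing" flag
        else:
            embedded.append((v, 1.0))              # observed value + "observed" flag
    return embedded

x = [0.0, None, 2.5]
emb = missing_aware_embed(x)
# The true zero at t=0 and the missing entry at t=1 now differ in their flag.
```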

YNIMG Journal 2024 Journal Article

Low-intensity transcranial ultrasound stimulation improves memory behavior in an ADHD rat model by modulating cortical functional network connectivity

  • Mengran Wang
  • Zhenyu Xie
  • Teng Wang
  • Shuxun Dong
  • Zhenfang Ma
  • Xiangjian Zhang
  • Xin Li
  • Yi Yuan

Working memory in attention deficit hyperactivity disorder (ADHD) is closely related to cortical functional network connectivity (CFNC), such as abnormal connections among the frontal, temporal, and occipital cortices and with other brain regions. Low-intensity transcranial ultrasound stimulation (TUS) has the advantages of non-invasiveness, high spatial resolution, and high penetration depth, and can improve ADHD memory behavior. However, how it modulates CFNC in ADHD, and the CFNC mechanism that improves working memory behavior in ADHD, remain unclear. In this study, we observed working memory impairment in ADHD rats, establishing a corresponding relationship between changes in CFNC and the behavioral state during the working memory task. Specifically, we noted abnormalities in the information transmission and processing capabilities of CFNC in ADHD rats while performing working memory tasks. These abnormalities manifested in the network integration ability of specific areas, as well as the information flow and functional differentiation of CFNC. Furthermore, our findings indicate that TUS effectively enhances the working memory ability of ADHD rats by modulating information transmission, processing, and integration capabilities, along with adjusting the information flow and functional differentiation of CFNC. Additionally, we explain the CFNC mechanism through which TUS improves working memory in ADHD. In summary, these findings suggest that CFNC is important in working memory behaviors in ADHD.

YNIMG Journal 2024 Journal Article

Low-intensity transcranial ultrasound stimulation improves memory in vascular dementia by enhancing neuronal activity and promoting spine formation

  • Jiamin Pei
  • Cong Zhang
  • Xiao Zhang
  • Zhe Zhao
  • Xiangjian Zhang
  • Yi Yuan

Memory is closely associated with neuronal activity and dendritic spine formation. Low-intensity transcranial ultrasound stimulation (TUS) improves the memory of individuals with vascular dementia (VD). However, it is unclear whether neuronal activity and dendritic spine formation under ultrasound stimulation are involved in memory improvement in VD. In this study, we found that seven days of TUS improved memory in a VD model while simultaneously increasing pyramidal neuron activity, promoting dendritic spine formation, and reducing dendritic spine elimination. These effects lasted for 7 days but had disappeared by 14 days after TUS. Neuronal activity and dendritic spine formation strongly corresponded to improvements in memory behavior over time. In addition, we found that the memory, neuronal activity, and dendritic spines of VD mice could not be restored again by a further 7 days of TUS after 28 days. Collectively, these findings suggest that TUS increases neuronal activity and promotes dendritic spine formation, and is thus important for improving memory in patients with VD.

ICML Conference 2023 Conference Paper

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

  • Haohe Liu
  • Zehua Chen 0005
  • Yi Yuan
  • Xinhao Mei
  • Xubo Liu 0001
  • Danilo P. Mandic
  • Wenwu Wang 0001
  • Mark D. Plumbley

Text-to-audio (TTA) systems have recently gained attention for their ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn continuous audio representations from contrastive language-audio pretraining (CLAP) embeddings. The pretrained CLAP models enable us to train LDMs with audio embeddings while providing text embeddings as the condition during sampling. By learning the latent representations of audio signals without modelling the cross-modal relationship, AudioLDM improves both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance compared to other open-sourced systems, measured by both objective and subjective metrics. AudioLDM is also the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at https://audioldm.github.io.

YNIMG Journal 2023 Journal Article

Low-intensity ultrasound stimulation modulates time-frequency patterns of cerebral blood oxygenation and neurovascular coupling of mouse under peripheral sensory stimulation state

  • Yi Yuan
  • Qianqian Wu
  • Xingran Wang
  • Mengyang Liu
  • Jiaqing Yan
  • Hui Ji

Previous studies have demonstrated that transcranial ultrasound stimulation (TUS) not only modulates cerebral hemodynamics, neural activity, and neurovascular coupling characteristics in resting samples but also exerts a significant inhibitory effect on neural activity in task samples. However, the effect of TUS on cerebral blood oxygenation and neurovascular coupling in task samples remains to be elucidated. To answer this question, we first used forepaw electrical stimulation of mice to elicit the corresponding cortical excitation, then stimulated this cortical region using different modes of TUS, and simultaneously recorded the local field potential using electrophysiological acquisition and hemodynamics using optical intrinsic signal imaging. The results indicate that for mice under the peripheral sensory stimulation state, TUS with a duty cycle of 50% can (1) enhance the amplitude of the cerebral blood oxygenation signal, (2) reduce the time-frequency characteristics of the evoked potential, (3) reduce the strength of neurovascular coupling in the time domain, (4) enhance the strength of neurovascular coupling in the frequency domain, and (5) reduce the time-frequency cross-coupling of neurovasculature. These results indicate that TUS can modulate cerebral blood oxygenation and neurovascular coupling in mice under the peripheral sensory stimulation state with specific parameters. This study opens up a new area of investigation into the potential applicability of TUS in brain diseases related to cerebral blood oxygenation and neurovascular coupling.

AAAI Conference 2023 Conference Paper

SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines

  • Shizun Wang
  • Weihong Zeng
  • Xu Wang
  • Hao Yang
  • Li Chen
  • Chuang Zhang
  • Ming Wu
  • Yi Yuan

The creation of a parameterized stylized character involves careful selection of numerous parameters, also known as the "avatar vectors", that can be interpreted by the avatar engine. Existing unsupervised avatar vector estimation methods that auto-create avatars for users, however, often fail to work because of the domain gap between realistic faces and stylized avatar images. To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works. SwiftAvatar introduces dual-domain generators to create pairs of realistic faces and avatar images using shared latent codes. The latent codes can then be bridged with the avatar vectors as pairs, by performing GAN inversion on the avatar images rendered from the engine using avatar vectors. In this way, we can synthesize as much high-quality paired data as possible, consisting of avatar vectors and their corresponding realistic faces. We also propose semantic augmentation to improve the diversity of synthesis. Finally, a lightweight avatar vector estimator is trained on the synthetic pairs to implement efficient auto-creation. Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and advantageous flexibility of SwiftAvatar are also verified in both subjective and objective evaluations.

YNIMG Journal 2023 Journal Article

Transcranial ultrasound stimulation at the peak-phase of theta-cycles in the hippocampus improve memory performance

  • Zhenyu Xie
  • Shuxun Dong
  • Yiyao Zhang
  • Yi Yuan

The present study aimed to investigate the effectiveness of closed-loop transcranial ultrasound stimulation (closed-loop TUS) as a non-invasive, high temporal-spatial resolution method for modulating brain function to enhance memory. For this purpose, we applied closed-loop TUS to the CA1 region of the rat hippocampus for 7 consecutive days at different phases of theta cycles. Following the intervention, we evaluated memory performance through behavioral testing and recorded the neural activity. Our results indicated that closed-loop TUS applied at the peak phase of theta cycles significantly improves memory performance in rats, as evidenced by behavioral testing. Furthermore, we observed that closed-loop TUS modifies the power and cross-frequency coupling strength of local field potentials (LFPs) during the memory task, and modulates neuronal activity patterns and synaptic transmission, depending on the phase of stimulation relative to the theta rhythm. Together, these results demonstrate that closed-loop TUS can modulate neural activity and memory performance in a phase-dependent manner: its effectiveness is contingent on the timing of stimulation with respect to the theta phase. Moreover, the improvement in memory performance after closed-loop TUS was persistent.
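The closed-loop, phase-locked idea can be illustrated with a toy trigger that fires at local maxima of a sampled theta trace (the "peak phase" of each cycle). Real closed-loop systems estimate phase online from filtered LFP in hardware; everything below is a simplified, hypothetical sketch:

```python
import math

def peak_trigger_times(theta):
    """Illustrative closed-loop trigger: fire when the theta trace turns
    from rising to falling, i.e. at a local maximum. This offline scan
    stands in for the online phase estimation a real system performs."""
    triggers = []
    for t in range(1, len(theta) - 1):
        if theta[t - 1] < theta[t] >= theta[t + 1]:
            triggers.append(t)
    return triggers

# Simulated 8 Hz theta rhythm sampled at 1 kHz for 0.5 s.
fs, f = 1000, 8
theta = [math.sin(2 * math.pi * f * t / fs) for t in range(fs // 2)]
peaks = peak_trigger_times(theta)
# One trigger per cycle: at 8 Hz the peaks are 125 samples (125 ms) apart.
```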

AAAI Conference 2022 Conference Paper

A Unified Framework for Real Time Motion Completion

  • Yinglin Duan
  • Yue Lin
  • Zhengxia Zou
  • Yi Yuan
  • Zhehui Qian
  • Bohan Zhang

Motion completion, as a challenging and fundamental problem, is of great significance in film and game applications. For different motion completion application scenarios (in-betweening, in-filling, and blending), most previous methods deal with the completion problems with case-by-case methodology designs. In this work, we propose a simple but effective method to solve multiple motion completion problems under a unified framework and achieve a new state-of-the-art accuracy on LaFAN1 (+17% better than the previous SoTA) under multiple evaluation settings. Inspired by the recent great success of self-attention-based transformer models, we consider the completion as a sequence-to-sequence prediction problem. Our method consists of three modules: a standard transformer encoder with self-attention that learns long-range dependencies of input motions, a trainable mixture embedding module that models temporal information and encodes different key-frame combinations in a unified form, and a new motion perceptual loss for better capturing high-frequency movements. Our method can predict multiple missing frames within a single forward propagation in real time without post-processing. We also introduce a novel large-scale dance movement dataset for exploring the scaling capability of our method and its effectiveness in complex motion applications.

IJCAI Conference 2022 Conference Paper

Learning Implicit Body Representations from Double Diffusion Based Neural Radiance Fields

  • Guangming Yao
  • Hongzhi Wu
  • Yi Yuan
  • Lincheng Li
  • Kun Zhou
  • Xin Yu

In this paper, we present a novel double diffusion based neural radiance field, dubbed DD-NeRF, to reconstruct human body geometry and render the human body appearance in novel views from a sparse set of images. We first propose a double diffusion mechanism to achieve expressive representations of input images by fully exploiting human body priors and image appearance details at two levels. At the coarse level, we model the coarse human body poses and shapes via an unclothed 3D deformable vertex model as guidance. At the fine level, we present a multi-view sampling network to capture subtle geometric deformations and detailed image appearances, such as clothing and hair, from multiple input views. Considering the sparsity of the two-level features, we diffuse them into feature volumes in the canonical space to construct neural radiance fields. Then, we present a signed distance function (SDF) regression network to construct body surfaces from the diffused features. Thanks to our double diffused representations, our method can even synthesize novel views of unseen subjects. Experiments on various datasets demonstrate that our approach outperforms the state-of-the-art in both geometric reconstruction and novel view synthesis.

JBHI Journal 2022 Journal Article

Reinforcement Learning Based Diagnosis and Prediction for COVID-19 by Optimizing a Mixed Cost Function From CT Images

  • Siying Chen
  • Minghui Liu
  • Pan Deng
  • Jiali Deng
  • Yi Yuan
  • Xuan Cheng
  • Tianshu Xie
  • Libo Xie

The novel coronavirus disease (COVID-19) is a pandemic that has caused 4 million deaths and more than 200 million infections worldwide (as of August 4, 2021). Rapid and accurate diagnosis of COVID-19 infection is critical to controlling the spread of the epidemic. In order to quickly and efficiently detect COVID-19 and reduce its threat to human survival, we first propose a detection framework based on reinforcement learning for COVID-19 diagnosis, which constructs a mixed loss function that integrates the advantages of multiple loss functions. This paper uses the accuracy of the validation set as the reward value, and obtains the initial model for the next epoch by searching for the model corresponding to the maximum reward value in each epoch. We also propose a prediction framework that integrates multiple detection frameworks using parameter sharing to predict the progression of patients' disease without additional training. We also constructed a higher-quality version of the CT image dataset containing 247 cases screened by professional physicians, and obtained better results on this dataset. Meanwhile, we used two other COVID-19 datasets for external verification, and still achieved high accuracy without additional training. Finally, the experimental results show that our classification accuracy reaches 98.31%, and the precision, sensitivity, specificity, and AUC (Area Under Curve) are 98.82%, 97.99%, 98.67%, and 0.989, respectively. The accuracy of external verification reaches 93.34% and 91.05%. Moreover, the accuracy of our prediction framework is 91.54%. Extensive experiments demonstrate that our proposed method is effective and robust for COVID-19 detection and prediction.
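The epoch-wise selection loop described in the abstract (validation accuracy as the reward, with the highest-reward model seeding the next epoch) can be sketched abstractly. The function and data layout below are hypothetical stand-ins, not the paper's code:

```python
def train_with_reward_selection(candidates_per_epoch, evaluate):
    """Illustrative sketch of reward-guided model selection: within each
    epoch, several candidate models (e.g. trained under different loss
    mixtures) are scored by a reward such as validation accuracy, and
    the best one would seed the next epoch's training."""
    best = None
    for candidates in candidates_per_epoch:
        # Pick the candidate with the maximum reward in this epoch.
        best = max(candidates, key=evaluate)
    return best

# Toy example: "models" are dicts carrying a fixed validation accuracy.
epochs = [
    [{"name": "a", "acc": 0.81}, {"name": "b", "acc": 0.85}],
    [{"name": "c", "acc": 0.83}, {"name": "d", "acc": 0.90}],
]
winner = train_with_reward_selection(epochs, evaluate=lambda m: m["acc"])
# → the final epoch's candidate with accuracy 0.90
```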

IJCAI Conference 2021 Conference Paper

Automatic Translation of Music-to-Dance for In-Game Characters

  • Yinglin Duan
  • Tianyang Shi
  • Zhipeng Hu
  • Zhengxia Zou
  • Changjie Fan
  • Yi Yuan
  • Xi Li

Music-to-dance translation is an emerging and powerful feature in recent role-playing games. Previous works on this topic consider music-to-dance as a supervised motion generation problem based on time-series data. However, these methods require a large amount of training data pairs and may suffer from the degradation of movements. This paper provides a new solution to this task where we re-formulate the translation as a piece-wise dance phrase retrieval problem based on choreography theory. With such a design, players are allowed to optionally edit the dance movements on top of our generation, while other regression-based methods ignore such user interactivity. Considering that dance motion capture is expensive and requires the assistance of professional dancers, we train our method in a semi-supervised fashion with a large unlabeled music dataset (20x larger than our labeled one) and also introduce self-supervised pre-training to improve training stability and generalization performance. Experimental results suggest that our method not only generalizes well over various styles of music but also succeeds in choreography for game players. Our project, including the large-scale dataset and supplemental materials, is available at https://github.com/FuxiCV/music-to-dance.
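The retrieval re-formulation can be illustrated with a toy nearest-neighbor lookup over a dance phrase library. The feature vectors and distance below are invented for illustration; the real method additionally learns the features and enforces choreography constraints across adjacent phrases:

```python
def retrieve_dance(music_segments, phrase_library, distance):
    """Illustrative piece-wise retrieval: for each music segment, pick
    the dance phrase whose feature vector is closest. This sketches the
    retrieval re-formulation only, not the paper's learned matching."""
    dance = []
    for seg in music_segments:
        phrase = min(phrase_library, key=lambda p: distance(seg, p["feat"]))
        dance.append(phrase["name"])
    return dance

def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical phrase library and per-segment music features.
library = [
    {"name": "spin", "feat": [0.9, 0.1]},
    {"name": "step", "feat": [0.1, 0.8]},
]
music = [[0.85, 0.2], [0.0, 0.9]]
moves = retrieve_dance(music, library, l2)
# → ["spin", "step"]
```

Because output is a sequence of named phrases rather than raw poses, a player could swap any retrieved phrase, which is the user-editability the abstract highlights.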

AAAI Conference 2021 Conference Paper

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

  • Xiaoyang Lyu
  • Liang Liu
  • Mengmeng Wang
  • Xin Kong
  • Lina Liu
  • Yong Liu
  • Xinxin Chen
  • Yi Yuan

Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although people have tried to use high-resolution images for depth estimation, the accuracy of prediction has not been significantly improved. In this work, we find the core reason comes from the inaccurate depth estimation in large gradient regions, making the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) redesigning the skip-connections in DepthNet to get better high-resolution features, and (2) proposing a feature-fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution. Moreover, previous SoTA methods are based on fairly complex and deep networks with many parameters, which limits their real applications. Thus, we also construct a lightweight network which uses MobileNetV3 as the encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All codes and models will be available at https://github.com/shawLyu/HR-Depth.
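The fSE module builds on squeeze-and-excitation-style channel recalibration. A dependency-free toy version of that underlying mechanism (not the fSE module itself, whose exact structure the abstract does not specify) might look like this, with `weights` standing in for the learned gate parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze_excite(channels, weights):
    """Illustrative squeeze-and-excitation: global-average-pool each
    channel ('squeeze'), gate the pooled vector ('excitation'), then
    rescale the channels. `weights` stand in for learned parameters;
    the real module uses a small learned MLP for the gate."""
    # Squeeze: one scalar per channel via global average pooling.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: per-channel gates in (0, 1).
    gates = [sigmoid(w * p) for w, p in zip(weights, pooled)]
    # Recalibrate: scale each feature map channel-wise.
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, channels)]

feat = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0: uniform activation
        [[4.0, 0.0], [0.0, 0.0]]]   # channel 1: one strong activation
out = squeeze_excite(feat, weights=[2.0, 2.0])
```

The point of the mechanism is that channels are reweighted by a global summary of their own content before fusion, which is what makes fusing multi-resolution features "more efficient" in the abstract's terms.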

AAAI Conference 2021 Conference Paper

In-game Residential Home Planning via Visual Context-aware Global Relation Learning

  • Lijuan Liu
  • Yin Yang
  • Yi Yuan
  • Tianjia Shao
  • He Wang
  • Kun Zhou

In this paper, we propose an effective global relation learning algorithm to recommend an appropriate location of a building unit for in-game customization of residential home complexes. Given a construction layout, we propose a visual context-aware graph generation network that learns the implicit global relations among the scene components and infers the location of a new building unit. The proposed network takes as input the scene graph and the corresponding top-view depth image. It provides location recommendations for a newly added building unit by learning an auto-regressive edge distribution conditioned on existing scenes. We also introduce a global graph-image matching loss to enhance the awareness of essential geometry semantics of the site. Qualitative and quantitative experiments demonstrate that the recommended location well reflects the implicit spatial rules of components in the residential estates, and it is instructive and practical to locate the building units in the 3D scene of the complex construction.

AAAI Conference 2021 Conference Paper

MeInGame: Create a Game Character Face from a Single Portrait

  • Jiangke Lin
  • Yi Yuan
  • Zhengxia Zou

Many deep learning based 3D face reconstruction methods have been proposed recently; however, few of them have applications in games. Current game character customization systems either require players to manually adjust considerable face attributes to obtain the desired face, or have limited freedom of facial shape and texture. In this paper, we propose an automatic character face creation method that predicts both facial shape and texture from a single portrait, and it can be integrated into most existing 3D games. Although 3D Morphable Face Model (3DMM) based methods can restore accurate 3D faces from single images, the topology of the 3DMM mesh is different from the meshes used in most games. To acquire high-fidelity texture, existing methods require a large amount of face texture data for training, while building such datasets is time-consuming and laborious. Besides, a dataset collected under laboratory conditions may not generalize well to in-the-wild situations. To tackle these problems, we propose 1) a low-cost facial texture acquisition method, 2) a shape transfer algorithm that can transform the shape of a 3DMM mesh to games, and 3) a new pipeline for training 3D game face reconstruction networks. The proposed method can not only produce detailed and vivid game characters similar to the input portrait, but can also eliminate the influence of lighting and occlusions. Experiments show that our method outperforms state-of-the-art methods used in games. Code and dataset are available at https://github.com/FuxiCV/MeInGame.

AAAI Conference 2021 Conference Paper

One-shot Face Reenactment Using Appearance Adaptive Normalization

  • Guangming Yao
  • Yi Yuan
  • Tianjia Shao
  • Shuang Li
  • Shanqi Liu
  • Yong Liu
  • Mengmeng Wang
  • Kun Zhou

The paper proposes a novel generative adversarial network for one-shot face reenactment, which can animate a single face image to a different pose-and-expression (provided by a driving image) while keeping its original appearance. The core of our network is a novel mechanism called appearance adaptive normalization, which can effectively integrate the appearance information from the input image into our face generator by modulating the feature maps of the generator using the learned adaptive parameters. Furthermore, we specially design a local net to reenact the local facial components (i.e., eyes, nose, and mouth) first, which is a much easier task for the network to learn and can in turn provide explicit anchors to guide our face generator to learn the global appearance and pose-and-expression. Extensive quantitative and qualitative experiments demonstrate the significant efficacy of our model compared with prior one-shot methods.

AAAI Conference 2021 Conference Paper

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

  • Jilin Tang
  • Yi Yuan
  • Tianjia Shao
  • Yong Liu
  • Mengmeng Wang
  • Kun Zhou

In this paper, we tackle the problem of pose-guided person image generation, which aims to transfer a person image from the source pose to a novel target pose while maintaining the source appearance. Given the inefficiency of standard CNNs in handling large spatial transformations, we propose a structure-aware flow based method for high-quality person image generation. Specifically, instead of learning the complex overall pose changes of the human body, we decompose the human body into different semantic parts (e.g., head, torso, and legs) and apply different networks to predict the flow fields for these parts separately. Moreover, we carefully design the network modules to effectively capture the local and global semantic correlations of features within and among the human parts, respectively. Extensive experimental results show that our method can generate high-quality results under large pose discrepancy and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.

AAAI Conference 2020 Conference Paper

Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation

  • Tianyang Shi
  • Zhengxia Zou
  • Yi Yuan
  • Changjie Fan

With the rapid development of Role-Playing Games (RPGs), players are now allowed to edit the facial appearance of their in-game characters with their preferences rather than using default templates. This paper proposes a game character auto-creation framework that generates in-game characters according to a player's input face photo. Different from previous methods that are designed based on neural style transfer or monocular 3D face reconstruction, we re-formulate the character auto-creation process from a different point of view: by predicting a large set of physically meaningful facial parameters under a self-supervised learning paradigm. Instead of updating facial parameters iteratively at the input end of the renderer as suggested by previous methods, which is time-consuming, we introduce a facial parameter translator so that the creation can be done efficiently through a single forward propagation from the face embeddings to parameters, with a considerable 1000x computational speedup. Despite its high efficiency, interactivity is preserved in our method, where users are allowed to optionally fine-tune the facial parameters of our creation according to their needs. Our approach also shows better robustness than previous methods, especially for photos with head-pose variance. Comparison results and ablation analysis on seven public face verification datasets suggest the effectiveness of our method.

AAAI Conference 2020 Conference Paper

FDN: Feature Decoupling Network for Head Pose Estimation

  • Hao Zhang
  • Mengmeng Wang
  • Yong Liu
  • Yi Yuan

Head pose estimation from RGB images without depth information is a challenging task due to the loss of spatial information as well as large head pose variations in the wild. The performance of existing landmark-free methods remains unsatisfactory, as the quality of the estimated pose is inferior. In this paper, we propose a novel three-branch network architecture, termed the Feature Decoupling Network (FDN), a more powerful architecture for landmark-free head pose estimation from a single RGB image. In FDN, we first propose a feature decoupling (FD) module to explicitly learn the discriminative features for each pose angle by adaptively recalibrating its channel-wise responses. Besides, we introduce a cross-category center (CCC) loss to constrain the distribution of the latent variable subspaces, and thus we can obtain more compact and distinct subspaces. Extensive experiments on both in-the-wild and controlled-environment datasets demonstrate that the proposed method outperforms other state-of-the-art methods based on a single RGB image and performs on par with approaches based on multimodal input resources.
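As a rough illustration of what a center-style loss with cross-category separation computes (the paper's exact CCC formulation may differ), here is a toy version: features are pulled toward their own category center while centers are pushed at least a margin apart. All names and values are invented for the example:

```python
def cross_category_center_loss(features, labels, centers, margin=1.0):
    """Illustrative center-style loss: a compactness term pulls each
    feature toward its category center, and a separation term penalizes
    pairs of centers closer than `margin`, encouraging compact and
    distinct latent subspaces. A sketch of the idea only."""
    # Compactness: squared distance of each feature to its own center.
    pull = sum(sum((f - c) ** 2 for f, c in zip(feat, centers[lab]))
               for feat, lab in zip(features, labels))
    # Separation: hinge penalty on pairwise center distances.
    push = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            d = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j])) ** 0.5
            push += max(0.0, margin - d)
    return pull + push

# Hypothetical 2D feature space with three category centers
# (e.g. one per pose-angle branch).
centers = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
features = [[0.1, 0.0], [2.9, 0.1]]
labels = [0, 1]
loss = cross_category_center_loss(features, labels, centers)
```

Here the centers are already well separated, so the loss reduces to the small compactness term; moving two centers within the margin would add a hinge penalty.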