Arrow Research search

Author name cluster

Tao Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

61 papers
2 author rows

Possible papers

61

AAAI Conference 2026 Conference Paper

Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation

  • Xiaocai Zhang
  • Zhe Xiao
  • Maohan Liang
  • Tao Liu
  • Haijiang Li
  • Wenbin Zhang

Sustainability is becoming increasingly critical in maritime transport, encompassing both environmental and social impacts such as Greenhouse Gas (GHG) emissions and navigational safety. Traditional vessel navigation relies heavily on human experience, often lacking autonomy and emission awareness, and is prone to human errors that may compromise safety. In this paper, we propose a Curriculum Reinforcement Learning (CRL) framework integrated with a realistic, data-driven marine simulation environment and a machine learning-based fuel consumption prediction module. The simulation environment is constructed from real-world vessel movement data and enhanced with a Diffusion Model to simulate dynamic maritime conditions. Vessel fuel consumption is estimated using historical operational data and learning-based regression. The surrounding environment is represented as image-based inputs to capture spatial complexity. We design a lightweight, policy-based CRL agent with a comprehensive reward mechanism that considers safety, emissions, timeliness, and goal completion. This framework handles complex tasks progressively while ensuring stable and efficient learning in continuous action spaces. We validate the proposed approach in a sea area of the Indian Ocean, demonstrating its efficacy in enabling sustainable and safe vessel navigation.
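
The multi-term reward described above can be sketched as a weighted sum of penalty and bonus terms; the term definitions, signatures, and weights below are illustrative assumptions, not the paper's actual reward design.

```python
def navigation_reward(collision_risk, fuel_rate, elapsed, budget,
                      reached_goal, w=(1.0, 0.5, 0.2, 10.0)):
    """Combine safety, emission, timeliness, and goal terms into one scalar.

    All terms and weights here are hypothetical illustrations of the
    'safety + emissions + timeliness + goal completion' structure.
    """
    w_safe, w_emit, w_time, w_goal = w
    r = -w_safe * collision_risk                        # penalize proximity to traffic
    r -= w_emit * fuel_rate                             # penalize fuel burn (GHG proxy)
    r -= w_time * max(0.0, elapsed - budget) / budget   # penalize lateness only
    if reached_goal:
        r += w_goal                                     # sparse completion bonus
    return r
```

Keeping each term bounded and separately weighted makes it easy to re-balance safety against emissions during curriculum stages.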

JBHI Journal 2026 Journal Article

Whisperization and Masked CycleGAN-Based Framework for Electrolaryngeal Speech Enhancement

  • Jie Zhou
  • Li Wang
  • Fengji Li
  • Shaochuan Zhang
  • Fan Fan
  • Tao Liu
  • Xiaohong Chen
  • Haijun Niu

The electrolarynx (EL) provides an effective approach to voice rehabilitation for patients with phonation disorders. However, because it relies on an external mechanical source, EL speech suffers from limited acoustic cues, which degrades quality and restricts subsequent modeling and enhancement. This paper proposes a novel EL speech enhancement framework that combines whisperization with a Masked CycleGAN model. The whisperization step removes redundant constant excitation and mechanical noise, generating an intermediate speech form, whisper-like EL (W-EL) speech, whose acoustic and perceptual properties are closer to natural whisper. Subsequently, the Masked CycleGAN employs a frame-level masking strategy to guide the generator in reconstructing missing prosodic and linguistic features. We thus achieve a dual-stage enhancement of "redundancy removal" and "deficiency compensation." Acoustic feature analysis demonstrates that the converted W-EL speech is more similar to normal speech in terms of spectrogram, fundamental frequency (F0) values, and F0 contours, while also compensating for the missing low-frequency energy below 500 Hz. Objective evaluations show significant improvements across multiple metrics. Subjective evaluations confirm that W-EL speech exhibits higher naturalness and intelligibility than original EL speech. Moreover, the combined "whisperization + voice conversion" framework further enhances perceptual quality. This study not only offers a novel pathway for EL speech enhancement but may also provide valuable insights for improving other types of pathological speech.

JBHI Journal 2025 Journal Article

Addressing Multiple Challenges in Early Gait Freezing Prediction for Parkinson's Disease: A Practical Deep Learning Approach

  • Wenan Wang
  • Jingfeng Lin
  • Xinning Le
  • Yaru Li
  • Tao Liu
  • Lunxin Pan
  • Min Li
  • Dezhong Yao

Objective: Freezing of Gait (FOG) significantly impacts the daily activities of Parkinson's disease (PD) patients. Despite the potential of wearable sensors in predicting FOG, challenges persist, including the brief prediction interval before FOG onset, limited generalization across patients, and the inconvenience of multiple sensors. Addressing one issue often aggravates the others, making it difficult to solve all these challenges concurrently. Methods: We introduce the PhysioGait Predictive Network (PhysioGPN), a deep learning framework designed to predict FOG events in PD patients at least 2 seconds prior to onset. The model architecture incorporates four key strategies: 1) detecting progressive motion changes using large convolutional kernels; 2) unraveling the complexity of motion coordination and gait dynamics using multi-dimensional and multi-scale convolution; 3) capturing gait self-similarity and asymmetry with a twin-tower structure; and 4) promoting cross-domain information exchange with multi-domain attention. Furthermore, we propose a framework based on knowledge distillation (KD), reducing the model's dependence on multiple sensors while maintaining prediction accuracy. Results: The model achieves an 85.8% Area Under the Curve (AUC) in FOG prediction. When the number of sensors is reduced, KD mitigates the decline in performance and increases the AUC by 5.1% compared to scenarios without KD. Conclusion: Our research proposes a practical solution to the challenges of FOG prediction, demonstrating the effectiveness of the KD approach for lightweight wearable sensors in rehabilitation engineering. Significance: Our findings offer valuable insights for addressing multiple challenges in the practical application of wearable devices.
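
The knowledge-distillation step (a full-sensor teacher guiding a reduced-sensor student) can be illustrated with a generic Hinton-style soft-label loss; the temperature, interface, and absence of a hard-label term below are assumptions, since the abstract does not specify the exact objective.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    A generic KD loss sketch, not the paper's exact objective.
    The T*T factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T
```

The loss is zero when the student matches the teacher exactly and grows as their softened distributions diverge.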

IROS Conference 2025 Conference Paper

ContextCache: Task-Aware Lifecycle Management for Memory-Efficient LLM Agent Deployment

  • Tao Liu
  • Ping Guo
  • Dong Feng
  • Peng Wang

LLM-based agents have demonstrated remarkable capabilities in multi-step reasoning and task execution across domains such as robotics and autonomous systems. However, deploying these agents on resource-constrained platforms presents a fundamental challenge: minimizing latency while optimizing memory usage. Existing caching techniques (KVCache, PrefixCache, PromptCache) improve inference speed by reusing cached context but overlook LLM dependency relationships in agent workflows, leading to excessive memory usage or redundant recomputation across LLM calls. To address this, we propose ContextCache, a task-aware lifecycle management framework that optimizes context fragment caching for multi-step LLM agents. ContextCache predicts the lifespan of each context fragment and dynamically allocates and releases GPU memory accordingly. We evaluate our approach on a newly constructed dataset, covering logistics coordination, assembly tasks, and health management. Experimental results demonstrate a 15% reduction in memory usage compared to state-of-the-art caching strategies, with no loss in inference efficiency, making our approach well-suited for real-world deployment in resource-constrained environments.
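
The lifespan-driven allocate-and-release idea can be sketched as a toy cache keyed by context fragment, where each entry carries a predicted last-use step; the `put`/`get`/`advance` interface and the eviction rule are my own illustration, not the actual ContextCache design.

```python
class LifespanCache:
    """Toy lifecycle manager for agent context fragments.

    Each cached fragment carries a predicted last-use step and is
    released once that step has passed, freeing memory that exact-LRU
    caching would hold onto. Purely illustrative of the idea.
    """
    def __init__(self):
        self._store = {}  # fragment_id -> (kv_state, expires_at_step)

    def put(self, fragment_id, kv_state, expires_at_step):
        self._store[fragment_id] = (kv_state, expires_at_step)

    def get(self, fragment_id):
        entry = self._store.get(fragment_id)
        return entry[0] if entry else None  # None signals recomputation

    def advance(self, step):
        """Release fragments whose predicted lifespan ended before `step`."""
        expired = [k for k, (_, exp) in self._store.items() if exp < step]
        for k in expired:
            del self._store[k]
        return expired
```

A workflow planner would call `advance` between LLM calls so short-lived fragments (e.g. a finished subtask's context) stop occupying GPU memory.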

ECAI Conference 2025 Conference Paper

Degree of Staleness-Aware Data Updating in Federated Learning

  • Tao Liu
  • Xuehe Wang

Handling data staleness remains a significant challenge in federated learning with highly time-sensitive tasks, where data is generated continuously and staleness strongly affects model performance. Although recent works attempt to mitigate data staleness by tuning the local data update frequency or the client selection strategy, none of them take both data staleness and data volume into consideration. In this paper, we propose Data Updating in Federated Learning (DUFL), an incentive mechanism featuring an innovative local data update scheme controlled by three knobs: the server's payment, the outdated data conservation rate, and each client's fresh data collection volume, which together coordinate the staleness and volume of local data for optimal utility. To this end, we introduce a novel metric called Degree of Staleness (DoS) to quantify data staleness and conduct a theoretical analysis of the quantitative relationship between DoS and model performance. We model DUFL as a two-stage Stackelberg game with a dynamic constraint, deriving the optimal local data update strategy for each client in closed form and an approximately optimal strategy for the server. Experimental results on real-world datasets demonstrate the strong performance of our approach.

JBHI Journal 2025 Journal Article

Extraction of Fetal ECG by Logarithmic Hyperbolic Secant Adaptive Algorithm in Alpha-Stable Noise

  • Mengjia Wang
  • Deqiu Zhai
  • Jiacheng Zhang
  • Bo Ni
  • Tao Liu

Direct fetal electrocardiogram (FECG) plays a crucial role in assessing fetal health and monitoring pregnancy conditions. Extracting high-quality FECG signals from maternal abdominal electrocardiogram (AECG) recordings remains a significant challenge due to the low amplitude of the FECG, its overlap with the maternal electrocardiogram (MECG), and potential exposure to impulsive noise in the real world. Adaptive filtering (AF) is an essential method for FECG extraction; however, its performance tends to degrade in the presence of impulsive noise, such as instrument interference. To address this limitation, we propose a novel AF algorithm based on a nonlinear logarithmic hyperbolic secant (LHS) cost function. The alpha-stable distribution is adopted to model realistic noise due to its flexibility in capturing heavy-tailed, impulsive behavior. To further enhance extraction accuracy and optimize the preset parameters, we introduce a hyperbolic tangent-like transformation and develop the improved logarithmic hyperbolic secant adaptive filtering (ILHSAF) algorithm. The proposed approach leverages the approximately linear interval of the LHS function to maximize the preservation of original FECG information within the AECG. We use the synthetic dataset FECGSYN as well as two real datasets, Daisy and NI-FECG, to evaluate performance; our methods outperform other existing AF algorithms. The ILHSAF algorithm exhibits commendable performance in R-peak detection and full-wave analysis on both real-world datasets, indicating its effective denoising capability and robustness in FECG extraction. This advancement establishes a foundation for long-term maternal and fetal monitoring using portable devices, as the proposed algorithms are capable of real-time operation.
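
The robustness argument can be made concrete with a minimal LMS-style update under a log-hyperbolic-secant cost: since J(e) = -log(sech(e)) = log(cosh(e)) has the bounded gradient tanh(e), a single impulsive sample cannot produce an unbounded weight jump, unlike squared-error LMS. The recursion below is a simplified sketch under that cost, not the paper's ILHSAF algorithm.

```python
import math

def lhs_lms_step(w, x, d, mu=0.05):
    """One update of an LMS-style filter with a log-hyperbolic-secant cost.

    Minimizing J(e) = log(cosh(e)) gives the bounded gradient tanh(e),
    which caps the influence of impulsive (e.g. alpha-stable) noise
    samples. Simplified sketch, not the actual ILHSAF recursion.
    """
    e = d - sum(wi * xi for wi, xi in zip(w, x))     # a priori error
    g = math.tanh(e)                                  # bounded influence function
    return [wi + mu * g * xi for wi, xi in zip(w, x)], e
```

For small errors tanh(e) ≈ e, so the filter behaves like standard LMS near convergence while clipping the effect of outliers.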

NeurIPS Conference 2025 Conference Paper

From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

  • Tao Liu
  • Dafeng Zhang
  • Gengchen Li
  • Shizhuo Liu
  • Yongqi Song
  • Senmao Li
  • Shiqi Yang
  • Boqian Li

Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle to achieve a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing age accuracy and identity preservation, which we refer to as the Age-ID trade-off. Most prior methods either prioritize age transformation at the expense of identity consistency or vice versa. In this work, we address this issue by proposing a two-pass face aging framework, named Cradle2Cane, based on few-step text-to-image (T2I) diffusion models. The first pass focuses on age accuracy by introducing an adaptive noise injection (AdaNI) mechanism, guided by textual prompt descriptions of the given person's age and gender. By adjusting the noise level, we can control the strength of aging while allowing more flexibility in transforming the face; identity preservation is only weakly enforced here to facilitate stronger age transformations. In the second pass, we enhance identity preservation while maintaining age-specific features by conditioning the model on two identity-aware embeddings (IDEmb): SVR-ArcFace and Rotate-CLIP. This pass denoises the transformed image from the first pass, ensuring stronger identity preservation without compromising aging accuracy. Both passes are jointly trained end-to-end. Extensive experiments on the CelebA-HQ test dataset, evaluated through Face++ and Qwen-VL protocols, show that Cradle2Cane outperforms existing face aging methods in age accuracy and identity consistency. Additionally, Cradle2Cane demonstrates superior robustness on in-the-wild human face images, where prior methods often fail, significantly broadening its applicability to diverse and unconstrained real-world scenarios. Code is available at https://github.com/byliutao/Cradle2Cane.

ICRA Conference 2025 Conference Paper

GS-EVT: Cross-Modal Event Camera Tracking Based on Gaussian Splatting

  • Tao Liu
  • Runze Yuan
  • Yi'ang Ju
  • Xun Xu
  • Jiaqi Yang
  • Xiangting Meng
  • Xavier Lagorce
  • Laurent Kneip

Reliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking, thereby providing a solution with inherent robustness under difficult dynamics and illumination. To circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way: it tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of Gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key to our approach is a novel pose parametrization that uses a reference pose plus first-order dynamics for local differential image rendering. The latter is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of Gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.

NeurIPS Conference 2025 Conference Paper

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

  • Tao Liu
  • Chongyu Wang
  • Rongjie Li
  • Yingchen Yu
  • Xuming He
  • Song Bai

While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, GUI-Rise, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action performance. Comprehensive evaluations on standard benchmarks demonstrate state-of-the-art results under identical training data conditions, with particularly strong performance in out-of-domain scenarios. These findings validate our framework's ability to maintain robust reasoning and generalization across diverse GUI navigation tasks.

ICLR Conference 2025 Conference Paper

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

  • Tao Liu
  • Kai Wang 0060
  • Senmao Li
  • Joost van de Weijer 0001
  • Fahad Shahbaz Khan
  • Shiqi Yang 0002
  • Yaxing Wang
  • Jian Yang 0003

Text-to-image generation models can create high-quality images from input prompts. However, they struggle to generate identity-consistent characters for storytelling. Existing approaches to this problem typically require extensive training on large datasets or modifications to the original model architecture, which limits their applicability across different domains and diverse diffusion model configurations. In this paper, we first observe an inherent capability of language models, which we coin context consistency: comprehending identity through context within a single prompt. Drawing inspiration from this context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story). Our approach concatenates all prompts into a single input for T2I diffusion models, initially preserving character identities. We then refine the generation process using two novel techniques, Singular-Value Reweighting and Identity-Preserving Cross-Attention, ensuring better alignment with the input description for each frame. In our experiments, we compare our method against various existing consistent T2I generation approaches through quantitative metrics and qualitative assessments to demonstrate its effectiveness. Code is available at https://github.com/byliutao/1Prompt1Story.

JBHI Journal 2025 Journal Article

PPA Net: The Pixel Prediction Assisted Net for 3D TOF-MRA Cerebrovascular Segmentation

  • Zhiqi Lee
  • Tao Liu
  • Haonan Zhang
  • Xiang Zhang
  • Xuan Li
  • Yizhen Pan
  • Tingting Wu
  • Jierui Ding

Cerebrovascular segmentation is essential for diagnosing and treating cerebrovascular diseases. However, accurately segmenting cerebral vessels in TOF-MRA remains challenging due to significant interindividual variations in cerebrovascular morphology, low image contrast, and class imbalance. The present study proposes an advanced deep learning model called PPA Net, consisting of VesselMRA Net and VesselConvLSTM components. Firstly, VesselMRA Net utilizes rectangular convolutional blocks to fuse multi-scale features, enhancing feature extraction performance. VesselMRA Net also employs an attention mechanism to boost valuable semantic weighting, addressing segmentation challenges arising from class imbalance and low contrast. Secondly, VesselConvLSTM, a pixel-level prediction model, employs a gating mechanism to learn cerebral vessel morphology across individuals. It reduces individual differences in segmentation and restores inter-voxel correlations disrupted by data slicing, aiding VesselMRA Net in accurately segmenting cerebrovascular pixels. Lastly, integrating VesselMRA Net and VesselConvLSTM results in a modular cerebral vessel segmentation framework, PPA Net, facilitating separate optimization of the backbone network and prediction model components. The model's performance has been extensively validated through experimental evaluations on three publicly available datasets, achieving strong competitiveness compared with state-of-the-art cerebral vessel segmentation models.

AAAI Conference 2025 Conference Paper

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

  • Tao Liu
  • Rongjie Li
  • Chongyu Wang
  • Xuming He

Open-vocabulary Scene Graph Generation (OV-SGG) overcomes the limitations of the closed-set assumption by aligning visual relationship representations with open-vocabulary textual representations. This enables the identification of novel visual relationships, making it applicable to real-world scenarios with diverse relationships. However, existing OV-SGG methods are constrained by fixed text representations, limiting diversity and accuracy in image-text alignment. To address these challenges, we propose the Relation-Aware Hierarchical Prompting (RAHP) framework, which enhances text representation by integrating subject-object and region-specific relation information. Our approach utilizes entity clustering to address the complexity of relation triplet categories, enabling the effective integration of subject-object information. Additionally, we utilize a large language model (LLM) to generate detailed region-aware prompts, capturing fine-grained visual interactions and improving alignment between visual and textual modalities. RAHP also introduces a dynamic selection mechanism within Vision-Language Models (VLMs), which adaptively selects relevant text prompts based on the visual content, reducing noise from irrelevant prompts. Extensive experiments on the Visual Genome and Open Images v6 datasets show that our framework consistently achieves state-of-the-art performance, demonstrating its effectiveness in addressing the challenges of open-vocabulary scene graph generation.

AAAI Conference 2025 Conference Paper

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

  • Tao Liu
  • Ziyang Ma
  • Qi Chen
  • Feilong Chen
  • Shuai Fan
  • Xie Chen
  • Kai Yu

We present VQTalker, a Vector Quantization-based framework for multilingual talking head generation that addresses the challenges of lip synchronization and natural motion across diverse languages. Our approach is grounded in the phonetic principle that human speech comprises a finite set of distinct sound units (phonemes) and corresponding visual articulations (visemes), which often share commonalities across languages. We introduce a facial motion tokenizer based on Group Residual Finite Scalar Quantization (GRFSQ), which creates a discretized representation of facial features. This method enables comprehensive capture of facial movements while improving generalization to multiple languages, even with limited training data. Building on this quantized representation, we implement a coarse-to-fine motion generation process that progressively refines facial animations. Extensive experiments demonstrate that VQTalker achieves state-of-the-art performance in both video-driven and speech-driven scenarios, particularly in multilingual settings. Notably, our method achieves high-quality results at a resolution of 512 × 512 pixels while maintaining a lower bitrate of approximately 11 kbps. Our work opens new possibilities for cross-lingual talking face generation.
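
The scalar-quantization building block can be sketched as plain FSQ (bound each latent dimension, then round it to a small set of uniform levels); GRFSQ's grouping and residual stages are omitted, and the level count here is an arbitrary assumption.

```python
import math

def fsq(z, levels=5):
    """Finite Scalar Quantization of one latent vector.

    Each dimension is squashed into (-1, 1) with tanh, then rounded to
    one of `levels` uniformly spaced values. A minimal FSQ sketch;
    VQTalker's GRFSQ adds grouping and residual stages not shown here.
    """
    half = (levels - 1) / 2
    return [round(math.tanh(v) * half) / half for v in z]
```

Because every dimension uses the whole level grid, the codebook is implicit (levels ** dim entries) and never suffers the dead-code problem of learned VQ codebooks.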

JBHI Journal 2024 Journal Article

Ankle Moment Estimation Based on A Novel Distributed Plantar Pressure Sensing System

  • Mingyu Du
  • Bowen Lv
  • Bingfei Fan
  • Xiaoling Li
  • Junze Yu
  • Fugang Yi
  • Tao Liu
  • Shibo Cai

Ankle moment plays an important role in human gait analysis, monitoring of patients' rehabilitation, and human-machine interaction control of exoskeleton robots. However, current ankle moment estimation methods mainly rely on inverse dynamics (ID) based on an optical motion capture system (OMC) and force plate. These methods depend on fixed laboratory instruments, which are difficult to apply to the control of exoskeleton robots. To solve this problem, this paper develops a new distributed plantar pressure system and proposes an ankle plantar flexion moment estimation method based on it. We integrated eight pressure sensors into each insole to collect pressure data from key areas of the foot and then used the plantar pressure data to train four neural networks to estimate the ankle moment. Model performance was evaluated using the normalized root mean square error (NRMSE) and the cross-correlation coefficient (ρ). During experiments, eight subjects were recruited for overground walking tests, with OMC and a force plate used as the gold standard. The results indicate that the Genetic Algorithm-Gated Recurrent Unit (GA-GRU) estimation algorithm was the best model, achieving the highest accuracy in generalized ankle moment estimation (NRMSE = 7.23%, ρ = 0.85) compared with the other models. The designed distributed plantar pressure system and the proposed method could serve as a joint moment estimation approach for wearable robot control and human motion state monitoring.
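
The two reported evaluation metrics are standard and easy to reproduce; the range normalization used for NRMSE below is an assumption, as the abstract does not state the paper's exact definition.

```python
import math

def nrmse(est, ref):
    """RMSE normalized by the reference's peak-to-peak range, in percent.

    The normalization choice (range of the reference signal) is an
    assumption; other conventions divide by the mean or the std.
    """
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))
    return 100.0 * rmse / (max(ref) - min(ref))

def xcorr(est, ref):
    """Zero-lag Pearson cross-correlation coefficient (rho)."""
    n = len(ref)
    me, mr = sum(est) / n, sum(ref) / n
    num = sum((e - me) * (r - mr) for e, r in zip(est, ref))
    den = math.sqrt(sum((e - me) ** 2 for e in est) *
                    sum((r - mr) ** 2 for r in ref))
    return num / den
```

Note that ρ ignores amplitude scaling (a rescaled estimate still scores 1.0), which is why both metrics are reported together.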

AAAI Conference 2024 Conference Paper

Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning

  • Tao Liu
  • Yuhang Zhang
  • Zhu Feng
  • Zhiqin Yang
  • Chen Xu
  • Dapeng Man
  • Wu Yang

Backdoors in federated learning are diluted by subsequent benign updates, reflected in a significant reduction of the attack success rate as iterations increase until the attack ultimately fails. We use a new metric, attack persistence, to quantify the degree of this weakened backdoor effect. Given that improving this persistence has received little attention, we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information to form a more complete backdoor pattern in the global model. The resulting backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms state-of-the-art federated learning backdoor attacks. On GTSRB, 120 rounds after the attack, our attack success rate remained more than 50% above the baseline. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA.
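
The "full combination" idea can be sketched as enumerating every non-empty subset of a trigger's components, so poisoned training covers all partial patterns rather than only the complete trigger; the function below is my own illustration, and the actual FCBA trigger construction may differ.

```python
from itertools import combinations

def full_trigger_combinations(components):
    """Enumerate every non-empty subset of trigger components.

    Sketch of the 'full combination' idea: a backdoor trained on all
    2^n - 1 subsets of a trigger pattern retains partial patterns even
    as benign updates erode parts of it. Illustrative only.
    """
    subsets = []
    for r in range(1, len(components) + 1):
        subsets.extend(combinations(components, r))
    return subsets
```

For n trigger components this yields 2^n - 1 poisoning patterns, which is why the method is only practical for small, localized trigger pieces.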

IROS Conference 2024 Conference Paper

EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization

  • Runze Yuan
  • Tao Liu
  • Zijia Dai
  • Yi-Fan Zuo
  • Laurent Kneip

Event cameras are interesting visual exteroceptive sensors that react to brightness changes rather than integrating absolute image intensities. Owing to this design, the sensor exhibits strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out the sensor's suitability for prior map-based tracking. By making use of cross-modal registration paradigms, the camera's ego-motion can be tracked across a large spectrum of illumination and dynamics conditions on top of accurate maps that have been created a priori by more traditional sensors. The present paper follows up on a recently introduced event-based geometric semi-dense tracking paradigm and proposes the addition of inertial signals in order to robustify the estimation. More specifically, the added signals provide strong cues for pose initialization as well as regularization during windowed, multi-frame tracking. As a result, the proposed framework achieves increased performance under challenging illumination conditions as well as a reduction of the rate at which intermediate event representations need to be registered in order to maintain stable tracking across highly dynamic sequences. Our evaluation focuses on a diverse set of real-world sequences and comprises a comparison of our proposed method against a purely event-based alternative running at different rates.

NeurIPS Conference 2024 Conference Paper

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

  • Senmao Li
  • Taihang Hu
  • Joost van de Weijer
  • Fahad S. Khan
  • Tao Liu
  • Linxuan Li
  • Shiqi Yang
  • Yaxing Wang

One of the main drawbacks of diffusion models is their slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods; however, these methods require considerable computational resources. In this paper, we take another approach to diffusion model acceleration. We conduct a comprehensive study of the UNet encoder and empirically analyze the encoder features, providing insights into how they change during the inference process. In particular, we find that encoder features change minimally, whereas decoder features exhibit substantial variations across different time-steps. This insight motivates us to omit encoder computation at certain adjacent time-steps and reuse encoder features of previous time-steps as input to the decoder across multiple time-steps. Importantly, this allows us to perform decoder computation in parallel, further accelerating the denoising process. Additionally, we introduce a prior noise injection method to improve the texture details in the generated image. Besides the standard text-to-image task, we also validate our approach on other tasks: text-to-video, personalized generation, and reference-guided generation. Without utilizing any knowledge distillation technique, our approach accelerates sampling of the Stable Diffusion (SD) and DeepFloyd-IF models by 41% and 24% respectively, and of the DiT model by 34%, while maintaining high-quality generation performance. Our code will be publicly released.
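
The feature-reuse schedule can be sketched as a denoising loop that runs the encoder only on selected time-steps and feeds cached features to the decoder otherwise; `encode`/`decode` are stand-ins for the UNet halves, and the fixed every-other-step schedule is an assumption (the paper selects which adjacent steps to skip empirically).

```python
def denoise_with_encoder_reuse(encode, decode, x, timesteps, reuse_every=2):
    """Denoising loop that recomputes the encoder only every
    `reuse_every` steps and reuses cached features in between.

    `encode(x, t)` and `decode(features, x, t)` stand in for the UNet
    halves; the schedule and interface are illustrative assumptions.
    """
    cached = None
    for i, t in enumerate(timesteps):
        if cached is None or i % reuse_every == 0:
            cached = encode(x, t)     # full encoder pass, refresh cache
        x = decode(cached, x, t)      # decoder runs at every step
    return x
```

With `reuse_every=2`, half of the encoder passes are skipped, matching the observation that encoder features barely change between adjacent time-steps.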

NeurIPS Conference 2023 Conference Paper

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

  • Ruida Zhou
  • Tao Liu
  • Min Cheng
  • Dileep Kalathil
  • P. R. Kumar
  • Chao Tian

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.

NeurIPS Conference 2023 Conference Paper

Penguin: Parallel-Packed Homomorphic Encryption for Fast Graph Convolutional Network Inference

  • Ran Ran
  • Nuo Xu
  • Tao Liu
  • Wei Wang
  • Gang Quan
  • Wujie Wen

The marriage of Graph Convolutional Networks (GCN) and Homomorphic Encryption (HE) enables the inference of graph data on the cloud with significantly enhanced client data privacy. However, the tremendous computation and memory overhead associated with HE operations challenges the practicality of HE-based GCN inference. GCN inference involves a sequence of expensive matrix-matrix multiplications, and we observe that directly applying state-of-the-art HE-based secure matrix-matrix multiplication solutions to accelerate HE-GCN inference is far less efficient, as it does not exploit the unique aggregation mechanism of two-dimensional graph node features in GCN layer computation. In this paper, we therefore propose a novel HE-based ciphertext packing technique, i.e., Penguin, that takes advantage of the unique computation pattern during HE-GCN inference to significantly reduce the computation and memory overhead associated with HE operations. Specifically, Penguin employs (i) an effective two-dimensional parallel packing technique for feature ciphertext with optimal graph node partitioning and graph feature interleaving, and (ii) an interleaved assembly technique that effectively makes use of blank slots to merge ciphertexts after feature reduction and significantly reduces the costly rotation operation. We provide theoretical analysis and experimental validation to demonstrate the speedup achieved by Penguin in accelerating GCN inference using popular GCN models and datasets. Our results show that Penguin can achieve up to ~10× speedup and around ~79% reduction in computational memory overhead, significantly outperforming state-of-the-art solutions. To the best of our knowledge, this is the first work to ensure the protection of both graph structure and features when accelerating HE-GCN inference on encrypted data. Our code is publicly available at https://github.com/ranran0523/Penguin.

NeurIPS Conference 2023 Conference Paper

Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

  • Youbang Sun
  • Tao Liu
  • Ruida Zhou
  • P. R. Kumar
  • Shahin Shahrampour

This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the \textit{suboptimality gap}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $\epsilon$-Nash Equilibrium (NE) within $\mathcal{O}(1/\epsilon)$ iterations. This improves upon the previous best result of $\mathcal{O}(1/\epsilon^2)$ iterations and is of the same order, $\mathcal{O}(1/\epsilon)$, that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
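For tabular softmax policies, the natural policy gradient step that each agent applies independently has a well-known multiplicative closed form; a sketch of that update (symbols illustrative, not the paper's exact statement):

```latex
\pi^{(i)}_{t+1}(a \mid s) \;\propto\; \pi^{(i)}_{t}(a \mid s)\,
\exp\!\left(\frac{\eta}{1-\gamma}\, A^{(i)}_{\pi_t}(s,a)\right)
```

where $A^{(i)}_{\pi_t}$ is agent $i$'s advantage function under the joint policy $\pi_t$, $\eta$ is the step size, and $\gamma$ the discount factor.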

NeurIPS Conference 2022 Conference Paper

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

  • Ruida Zhou
  • Tao Liu
  • Dileep Kalathil
  • P. R. Kumar
  • Chao Tian

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems. Theoretically, the designed algorithms based on the ARNPG framework achieve $\tilde{O}(1/T)$ global convergence with exact gradients. Empirically, the ARNPG-guided algorithms also demonstrate superior performance compared to some existing policy gradient-based approaches in both exact gradients and sample-based scenarios.

NeurIPS Conference 2022 Conference Paper

Falconn++: A Locality-sensitive Filtering Approach for Approximate Nearest Neighbor Search

  • Ninh Pham
  • Tao Liu

We present Falconn++, a novel locality-sensitive filtering (LSF) approach for approximate nearest neighbor search on angular distance. Falconn++ can filter out potentially far-away points in any hash bucket before querying, which results in higher-quality candidates compared to other hashing-based solutions. Theoretically, Falconn++ asymptotically achieves lower query time complexity than Falconn, an optimal locality-sensitive hashing scheme on angular distance. Empirically, Falconn++ achieves a higher recall-speed tradeoff than Falconn on many real-world data sets. Falconn++ is also competitive with HNSW, an efficient representative of graph-based solutions, in high search-recall regimes.
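The filtering idea, pruning bucket members that are weakly aligned with their hash direction before any query arrives, can be sketched with plain random projections (a toy stand-in for the cross-polytope LSH that Falconn uses; all names and parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_index(data, n_proj=32, keep_frac=0.5):
    # Hash each unit-norm point to its maximally aligned random direction,
    # then filter: keep only the best-aligned members of each bucket,
    # discarding likely far-away points at index time.
    proj = rng.standard_normal((n_proj, data.shape[1]))
    sims = data @ proj.T
    keys = np.argmax(np.abs(sims), axis=1)
    buckets = {}
    for idx, key in enumerate(keys):
        buckets.setdefault(int(key), []).append((idx, abs(float(sims[idx, key]))))
    for key, members in buckets.items():
        members.sort(key=lambda t: -t[1])
        buckets[key] = members[: max(1, int(len(members) * keep_frac))]
    return proj, buckets

def query(q, proj, buckets, data, top=1):
    # Look up the query's bucket and re-rank the surviving candidates
    # by exact inner product (cosine similarity for unit-norm data).
    key = int(np.argmax(np.abs(proj @ q)))
    cand = buckets.get(key, [])
    ranked = sorted(cand, key=lambda t: -float(data[t[0]] @ q))
    return [idx for idx, _ in ranked[:top]]
```

With `keep_frac < 1`, buckets shrink and re-ranking gets cheaper; the trade-off is that an aggressive filter may discard the true neighbor, which is the recall-speed tradeoff the abstract refers to.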

NeurIPS Conference 2022 Conference Paper

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

  • Tao Liu
  • P. R. Kumar
  • Ruida Zhou
  • Xi Liu

Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful. Particularly important is the ability to incorporate domain knowledge of invariances, e.g., translational invariance of images. Kernels based on the \textit{maximum} similarity over a group of transformations are not generally positive definite. Perhaps it is for this reason that they have not been studied theoretically. We address this lacuna and show that positive definiteness indeed holds \textit{with high probability} for kernels based on the maximum similarity in the small training sample set regime of interest, and that they do yield the best results in that regime. We also show how additional properties such as their ability to incorporate local features at multiple spatial scales, e.g., as done in CNNs through max pooling, and to provide the benefits of composition through the architecture of multiple layers, can also be embedded into SVMs. We verify through experiments on widely available image sets that the resulting SVMs do provide superior accuracy in comparison to well-established deep neural network benchmarks for small sample sizes.
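A minimal sketch of a maximum-similarity kernel over a transformation group, here the group of cyclic shifts (illustrative only; the paper treats general transformation groups and proves positive definiteness with high probability rather than assuming it):

```python
import numpy as np

def max_translation_kernel(x, y, max_shift=2):
    # Kernel value = maximum inner product between x and shifted copies
    # of y. Taking the max over a transformation group builds in
    # invariance, at the cost of general positive definiteness.
    best = -np.inf
    for s in range(-max_shift, max_shift + 1):
        best = max(best, float(x @ np.roll(y, s)))
    return best
```

Such a kernel can be handed to an SVM solver as a precomputed Gram matrix on the (small) training set, which is where the high-probability positive-definiteness result matters.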

JBHI Journal 2021 Journal Article

IMU-Based Gait Normalcy Index Calculation for Clinical Evaluation of Impaired Gait

  • Lei Wang
  • Yun Sun
  • Qingguo Li
  • Tao Liu
  • Jingang Yi

Inertial measurement units (IMUs) have been used for gait analysis in many clinical studies as a more convenient, lower-cost, and less restrictive alternative to laboratory-based motion capture systems or instrumented walkways. Spatial-temporal gait parameters such as gait cycle duration and stride length calculated from the IMUs were often used in these studies for evaluating impaired gait. However, the spatial-temporal information provided by IMUs is limited and can yield incomplete, less effective evaluations. In this study, we develop a novel IMU-based method for clinical gait evaluation. Nine gait variables, including three spatial-temporal parameters and six kinematic parameters, are extracted from two shank-mounted IMUs to quantify a patient's gait deviations. Based on these parameters, an IMU-based gait normalcy index (INI) is derived to evaluate overall gait performance. Eight inpatient subjects with gait impairments caused by n-hexane neuropathy and ten healthy subjects were recruited. The proposed gait variables and INI were examined on the inpatients at three to five time instants during the rehabilitation process until discharge. A comparison with healthy subjects and statistical analysis of the changes in gait variables and INI demonstrated that the proposed set of gait variables and INI can provide adequate and effective information for quantifying gait abnormalities, and help in understanding gait progress and the effectiveness of therapy during the rehabilitation process.

NeurIPS Conference 2021 Conference Paper

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

  • Tao Liu
  • Ruida Zhou
  • Dileep Kalathil
  • Panganamala Kumar
  • Chao Tian

We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of $\tilde{\mathcal{O}}(\sqrt{K})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{K})$ constraint violation in $K$ episodes. A critical question that arises is whether it is possible to keep the constraint violation even smaller. We show that when a strictly safe policy is known, then one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$. The algorithm which does so employs the principle of optimistic pessimism in the face of uncertainty to achieve safe exploration. When no strictly safe policy is known, though one is known to exist, then it is possible to restrict the system to bounded constraint violation with arbitrarily high probability. This is shown to be realized by a primal-dual algorithm with an optimistic primal estimate and a pessimistic dual update.
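The dual half of such a primal-dual scheme can be sketched as a projected subgradient step on the Lagrange multiplier (a generic illustration; the constraint direction, the symbols, and the pessimistic estimate $\underline{V}_c$ are ours, not the paper's notation):

```latex
\mathcal{L}(\pi,\lambda) = V_r^{\pi} + \lambda\left(V_c^{\pi} - b\right),
\qquad
\lambda_{k+1} = \left[\lambda_k - \eta\left(\underline{V}_c^{\,\pi_k} - b\right)\right]_+
```

where $[\cdot]_+$ denotes clipping at zero and $b$ is the constraint threshold; using a pessimistic (under-)estimate $\underline{V}_c$ keeps the multiplier large enough to deter constraint violation while the optimistic primal estimate drives reward exploration.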

IJCAI Conference 2021 Conference Paper

What If We Could Not See? Counterfactual Analysis for Egocentric Action Anticipation

  • Tianyu Zhang
  • Weiqing Min
  • Jiahao Yang
  • Tao Liu
  • Shuqiang Jiang
  • Yong Rui

Egocentric action anticipation aims at predicting the near future based on past observation in first-person vision. Since future actions may be wrongly predicted due to dataset bias, we present a counterfactual analysis framework for egocentric action anticipation (CA-EAA) to mitigate this effect. In the factual case, we can predict the upcoming action based on visual features and semantic labels from past observation. Imagining a counterfactual situation in which no visual representation had been observed, we would obtain a counterfactual predicted action using only past semantic labels. In this way, we can reduce the side effect caused by semantic labels via a comparison between factual and counterfactual outcomes, which moves a step towards unbiased prediction for egocentric action anticipation. We conduct experiments on two large-scale egocentric video datasets. Qualitative and quantitative results validate the effectiveness of our proposed CA-EAA.

JBHI Journal 2020 Journal Article

Spatially Aware Dense-LinkNet Based Regression Improves Fluorescent Cell Detection in Adaptive Optics Ophthalmic Images

  • Jianfei Liu
  • Yoo-Jean Han
  • Tao Liu
  • Nancy Aguilera
  • Johnny Tam

Retinal pigment epithelial (RPE) cells play an important role in nourishing retinal neurosensory photoreceptor cells, and numerous blinding diseases are associated with RPE defects. Their fluorescence signature can now be visualized in the living human eye using adaptive optics (AO) imaging combined with indocyanine green (ICG), which motivates us to develop an automated RPE detection method to improve the quantitative evaluation of RPE status in patients. This paper proposes a spatially-aware, Dense-LinkNet-based regression approach to improve the detection of in vivo fluorescent cell patterns, achieving precision, recall, and F1-Score of 93.6 $\pm$ 4.3%, 81.4 $\pm$ 9.5%, and 86.7 $\pm$ 5.7%, respectively. These results demonstrate the utility of incorporating spatial inputs into a deep learning-based regression framework for cell detection.

AAAI Conference 2017 Conference Paper

Neural Bag-of-Ngrams

  • Bofang Li
  • Tao Liu
  • Zhe Zhao
  • Puwei Wang
  • Xiaoyong Du

Bag-of-ngrams (BoN) models are commonly used for representing text. One of the main drawbacks of traditional BoN is the ignorance of n-gram semantics. In this paper, we introduce the concept of Neural Bag-of-ngrams (Neural-BoN), which replaces sparse one-hot n-gram representation in traditional BoN with dense and rich-semantic n-gram representations. We first propose context guided n-gram representation by adding n-grams to word embeddings model. However, the context guided learning strategy of word embeddings is likely to miss some semantics for text-level tasks. Text guided n-gram representation and label guided n-gram representation are proposed to capture more semantics like topic or sentiment tendencies. Neural-BoN with the latter two n-gram representations achieve state-of-the-art results on 4 document-level classification datasets and 6 semantic relatedness categories. They are also on par with some sophisticated DNNs on 3 sentence-level classification datasets. Similar to traditional BoN, Neural-BoN is efficient, robust and easy to implement. We expect it to be a strong baseline and be used in more real-world applications.
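The core replacement the abstract describes, one-hot n-gram counts swapped for learned dense vectors, can be sketched in a few lines (a hypothetical minimal version; real Neural-BoN embeddings would be trained with the context/text/label guidance described above, not drawn at random):

```python
import numpy as np

rng = np.random.default_rng(1)

def ngrams(tokens, n_max=2):
    # Enumerate all 1-grams through n_max-grams of a token sequence.
    out = []
    for n in range(1, n_max + 1):
        out += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return out

def text_vector(tokens, emb, dim=8, n_max=2):
    # Represent a text as the sum of dense n-gram embeddings, instead of
    # a sparse one-hot count vector. Unseen n-grams get a fresh random
    # vector here, as a stand-in for learned representations.
    vec = np.zeros(dim)
    for g in ngrams(tokens, n_max):
        if g not in emb:
            emb[g] = rng.standard_normal(dim) * 0.1
        vec += emb[g]
    return vec
```

The resulting fixed-length vector can feed any downstream classifier, which is what makes the scheme as easy to deploy as the traditional BoN it replaces.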

AAAI Conference 2011 Conference Paper

Partially Supervised Text Classification with Multi-Level Examples

  • Tao Liu
  • Xiaoyong Du
  • Yongdong Xu
  • Minghui Li
  • Xiaolong Wang

Partially supervised text classification has received great research attention since it only uses positive and unlabeled examples as training data. This problem can be solved by automatically labeling some negative (and more positive) examples from unlabeled examples before training a text classifier. But it is difficult to guarantee both high quality and quantity of the new labeled examples. In this paper, a multi-level example based learning method for partially supervised text classification is proposed, which can make full use of all unlabeled examples. A heuristic method is proposed to assign possible labels to unlabeled examples and partition them into multiple levels according to their labeling confidence. A text classifier is trained on these multi-level examples using weighted support vector machines. Experiments show that the multi-level example based learning method is effective for partially supervised text classification, and outperforms the existing popular methods such as Biased-SVM, ROC-SVM, S-EM and WL.