Arrow Research search

Author name cluster

Tong Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

TIST Journal 2026 Journal Article

Dynamic Population Distribution Aware Human Trajectory Generation with Diffusion Model

  • Qingyue Long
  • Can Rong
  • Tong Li
  • Yong Li

Human trajectory data are crucial in urban planning, traffic engineering, and public health. However, directly using real-world trajectory data often faces challenges such as privacy concerns, data acquisition costs, and data quality. A practical solution to these challenges is trajectory generation, a method developed to simulate human mobility behaviors. Existing trajectory generation methods mainly focus on capturing individual movement patterns but often overlook the influence of population distribution on trajectory generation. In reality, dynamic population distribution reflects changes in population density across different regions, significantly impacting individual mobility behavior. Thus, we propose a novel trajectory generation framework based on a diffusion model, which integrates the dynamic population distribution constraints to guide high-fidelity generation outcomes. Specifically, we construct a spatial graph to enhance the spatial correlation of trajectories. Then, we design a dynamic population distribution aware denoising network to capture the spatiotemporal dependencies of human mobility behavior as well as the impact of population distribution in the denoising process. Extensive experiments show that the trajectories generated by our model can resemble real-world trajectories in terms of some critical statistical metrics, outperforming state-of-the-art algorithms by over 54%.

AAAI Conference 2025 Conference Paper

Adaptive Dual Guidance Knowledge Distillation

  • Tong Li
  • Long Liu
  • Kang Liu
  • Xin Wang
  • Bo Zhou
  • Hongguang Yang
  • Kai Lu

Knowledge distillation (KD) aims to improve the performance of lightweight student networks under the guidance of pre-trained teachers. However, the large capacity gap between teachers and students limits the distillation gains. Previous methods addressing this problem have two weaknesses. First, most of them decrease the performance of pre-trained teachers, hindering students from achieving comparable performance. Second, these methods fail to dynamically adjust the transferred knowledge to be compatible with the representation ability of students, which is less effective in bridging the capacity gap. In this paper, we propose Adaptive Dual Guidance Knowledge Distillation (ADG-KD), which retains the guidance of the pre-trained teacher and uses the teacher's bidirectional optimization route guiding the student to alleviate the capacity gap problem. Specifically, ADG-KD introduces an initialized teacher, which has an identical structure to the pre-trained teacher and is optimized through the bidirectional supervision from both the pre-trained teacher and student. In this way, we construct the teacher's bidirectional optimization route to provide the students with an easy-to-hard and compatible knowledge sequence. ADG-KD trains the students under the proposed dual guidance approaches and automatically determines their importance weights, making the transferred knowledge better compatible with the representation ability of students. Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate the effectiveness of our method.
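As background for the abstract above, the classic knowledge distillation objective (a hard-label cross-entropy term plus a temperature-softened KL term between teacher and student) can be sketched as follows. This is the standard baseline loss only, not ADG-KD's dual guidance scheme; the function names and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Classic KD loss: (1 - alpha) * CE + alpha * T^2 * KL(teacher || student).

    Hypothetical helper for illustration; not taken from the ADG-KD paper.
    """
    # Hard-label cross-entropy on the student's unsoftened prediction.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

When the student already matches the teacher's logits, the KL term vanishes and only the hard-label term remains.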

IROS Conference 2025 Conference Paper

Multimodal Integrated Prediction and Decision-making with Adaptive Interaction Modality Explorations

  • Tong Li
  • Lu Zhang
  • Sikang Liu
  • Shaojie Shen

Navigating dense and dynamic environments poses a significant challenge for autonomous driving systems, owing to the intricate nature of multimodal interaction, wherein the actions of various traffic participants and the autonomous vehicle are complex and implicitly coupled. In this paper, we propose a novel framework, Multimodal Integrated predictioN and Decision-making (MIND), which addresses the challenges by efficiently generating joint predictions and decisions covering multiple distinctive interaction modalities. Specifically, MIND leverages learning-based scenario predictions to obtain integrated predictions and decisions with socially-consistent interaction modality and utilizes a modality-aware dynamic branching mechanism to generate scenario trees that efficiently capture the evolutions of distinctive interaction modalities with low growth of interaction uncertainty along the planning horizon. The scenario trees are seamlessly utilized by the contingency planning under interaction uncertainty to obtain clear and considerate maneuvers accounting for multimodal evolutions. Comprehensive experimental results in the closed-loop simulation based on the real-world driving dataset showcase superior performance to other strong baselines under various driving contexts. Code is available at: https://github.com/HKUST-Aerial-Robotics/MIND.

TIST Journal 2024 Journal Article

Demand-driven Urban Facility Visit Prediction

  • Yunke Zhang
  • Tong Li
  • Yuan Yuan
  • Fengli Xu
  • Fan Yang
  • Funing Sun
  • Yong Li

Predicting citizens’ visiting behaviors to urban facilities is instrumental for city governors and planners to detect inequalities in urban opportunities and optimize the distribution of facilities and resources. Previous works predict facility visits simply using observed visit behavior, yet citizens’ intrinsic demands for facilities are not characterized explicitly, potentially causing incorrectly learned relations in the prediction results. In this article, to make up for this deficiency, we present a demand-driven urban facility visit prediction method that decomposes citizens’ visits to facilities into their unobservable demands and their capability to fulfill them. Demands are expressed as a function of regional demographic attributes by a neural network, and the fulfillment capability is determined by the urban region’s spatial accessibility to facilities. Extensive evaluations on datasets from three large cities confirm the efficiency and rationality of our model. Our method outperforms the best state-of-the-art model by 8.28% on average in facility visit prediction tasks. Further analyses demonstrate the reasonableness of recovered facility demands and their relationship with citizen demographics. For instance, senior citizens tend to have higher medical demands but lower shopping demands. Meanwhile, estimated capabilities and accessibilities provide deeper insights into the decaying accessibility with respect to spatial distance and facilities’ diverse functions in the urban environment. Our findings shed light on demand-driven urban data mining and demand-based urban facility planning.

TIST Journal 2024 Journal Article

E²Storyline: Visualizing the Relationship with Triplet Entities and Event Discovery

  • Yunchao Wang
  • Guodao Sun
  • Zihao Zhu
  • Tong Li
  • Ling Chen
  • Ronghua Liang

The narrative progression of events, evolving into a cohesive story, relies on the entity-entity relationships. Among the plethora of visualization techniques, storyline visualization has gained significant recognition for its effectiveness in offering an overview of story trends, revealing entity relationships, and facilitating visual communication. However, existing methods for storyline visualization often fall short in accurately depicting the specific relationships between entities. In this study, we present E²Storyline, a novel approach that emphasizes simplicity and aesthetics of layout while effectively conveying entity-entity relationships to users. To achieve this, we begin by extracting entity-entity relationships from textual data and representing them as subject-predicate-object (SPO) triplets, thereby obtaining structured data. By considering three types of design requirements, we establish new optimization objectives and model the layout problem using multi-objective optimization (MOO) techniques. The aforementioned SPO triplets, together with time and event information, are incorporated into the optimization model to ensure a straightforward and easily comprehensible storyline layout. Through a qualitative user study, we determine that a pixel-based view is the most suitable method for displaying the relationships between entities. Finally, we apply E²Storyline to real-world data, including movie synopses and live text commentaries. Through comprehensive case studies, we demonstrate that E²Storyline enables users to better extract information from stories and comprehend the relationships between entities.

TIST Journal 2024 Journal Article

KGDA: A Knowledge Graph Driven Decomposition Approach for Cellular Traffic Prediction

  • Jiahui Gong
  • Tong Li
  • Huandong Wang
  • Yu Liu
  • Xing Wang
  • Zhendong Wang
  • Chao Deng
  • Junlan Feng

Understanding and accurately predicting cellular traffic data is vital for communication operators and device users, as it facilitates efficient resource allocation and ensures superior service quality. However, large-scale cellular traffic data forecasting remains challenging due to intricate temporal variations and complex spatial relationships. This article proposes a Knowledge Graph Driven Decomposition Approach (KGDA) for precise cellular traffic prediction. The KGDA breaks down the impact of static environmental factors and dynamic autocorrelations of cellular traffic time series, enabling the capture of overall traffic changes and understanding of traffic dependence on past values. Specifically, we propose an urban knowledge graph to capture the static environmental context of base stations, mapping these entities into the same latent space while retaining static environmental knowledge. The cellular traffic is divided into a regular pattern and fluctuating residual components, with the KGDA comprising four modules: a Knowledge Graph Representation Learning model, a traffic regular pattern prediction module, a traffic residual dynamic prediction module, and an attentional fusion module. The first leverages graph neural networks to extract spatial contexts and predict regular patterns, the second utilizes the Bi-directional Long Short-Term Memory (Bi-LSTM) model to capture autocorrelations of traffic time series, and the final module integrates the patterns and residuals to produce the final prediction result. Comprehensive experiments demonstrate that our proposed model outperforms state-of-the-art models by more than 10% in forecasting cellular traffic.

AAAI Conference 2024 Conference Paper

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting

  • Tong Li
  • Zhaoyang Liu
  • Yanyan Shen
  • Xue Wang
  • Haokun Chen
  • Sen Huang

Stock price forecasting has remained an extremely challenging problem for many decades due to the high volatility of the stock market. Recent efforts have been devoted to modeling complex stock correlations toward joint stock price forecasting. Existing works share a common neural architecture that learns temporal patterns from individual stock series and then mixes up temporal representations to establish stock correlations. However, they only consider time-aligned stock correlations stemming from all the input stock features, which suffer from two limitations. First, stock correlations often occur momentarily and in a cross-time manner. Second, the feature effectiveness is dynamic with market variation, which affects both the stock sequential patterns and their correlations. To address the limitations, this paper introduces MASTER, a MArket-guided Stock TransformER, which models the momentary and cross-time stock correlation and leverages market information for automatic feature selection. MASTER elegantly tackles the complex stock correlation by alternately engaging in intra-stock and inter-stock information aggregation. Experiments show the superiority of MASTER compared with previous works and visualize the captured realistic stock correlation to provide valuable insights.

JMLR Journal 2024 Journal Article

On the Optimality of Gaussian Kernel Based Nonparametric Tests against Smooth Alternatives

  • Tong Li
  • Ming Yuan

Nonparametric tests via kernel embedding of distributions have witnessed a great deal of practical successes in recent years. However, statistical properties of these tests are largely unknown beyond consistency against a fixed alternative. To fill in this void, we study here the asymptotic properties of goodness-of-fit, homogeneity and independence tests using Gaussian kernels, arguably the most popular and successful among such tests. Our results provide theoretical justifications for this common practice by showing that tests using a Gaussian kernel with an appropriately chosen scaling parameter are minimax optimal against smooth alternatives in all three settings. In addition, our analysis also pinpoints the importance of choosing a diverging scaling parameter when using Gaussian kernels and suggests a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated logarithmic factor, over a wide range of smooth alternatives. Numerical experiments are also presented to further demonstrate the practical merits of the methodology.
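To make the homogeneity (two-sample) setting concrete, here is a minimal sketch of the widely used unbiased squared-MMD statistic with a Gaussian kernel. The parameterization `exp(-scale * ||x - y||^2)` and the name `scale` for the scaling parameter are assumptions made for illustration; the paper's test calibration and its data-driven choice of the scaling parameter are not reproduced here.

```python
import math


def gaussian_kernel(x, y, scale):
    """Gaussian kernel exp(-scale * ||x - y||^2) on equal-length tuples."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-scale * squared_distance)


def _mean_kernel(a, b, scale, skip_diagonal):
    """Average kernel value over pairs, optionally skipping i == j."""
    total, count = 0.0, 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if skip_diagonal and i == j:
                continue
            total += gaussian_kernel(x, y, scale)
            count += 1
    return total / count


def mmd2_unbiased(xs, ys, scale):
    """Unbiased estimate of squared MMD; needs at least two points per sample."""
    return (
        _mean_kernel(xs, xs, scale, skip_diagonal=True)
        + _mean_kernel(ys, ys, scale, skip_diagonal=True)
        - 2.0 * _mean_kernel(xs, ys, scale, skip_diagonal=False)
    )
```

A larger `scale` makes the kernel more local. Because the estimate is unbiased, it can be slightly negative when the two samples come from similar distributions.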

IROS Conference 2024 Conference Paper

Theoretical Modeling and Bio-inspired Trajectory Optimization of A Multiple-locomotion Origami Robot

  • Keqi Zhu
  • Haotian Guo
  • Wei Yu
  • Hassen Nigatu
  • Tong Li
  • Ruihong Dong
  • Huixu Dong

Recent research on mobile robots has focused on increasing their adaptability to unpredictable and unstructured environments using soft materials and structures. However, the determination of key design parameters and control over these compliant robots are predominantly iterated through experiments, lacking a solid theoretical foundation. To improve their efficiency, this paper aims to provide mathematical modeling of two locomotion modes, crawling and swimming. Specifically, a dynamic model is first devised to reveal the influence of the contact surfaces’ frictional coefficients on displacements in different motion phases. In addition, a swimming kinematics model is provided using coordinate transformation, based on which we further develop an algorithm that systematically plans human-like swimming gaits with maximum thrust. The proposed algorithm is highly generalizable and has the potential to be applied in other soft robots with similar multiple joints. Simulation experiments have been conducted to illustrate the effectiveness of the proposed modeling.

TIST Journal 2023 Journal Article

Learning Representations of Satellite Imagery by Leveraging Point-of-Interests

  • Tong Li
  • Yanxin Xi
  • Huandong Wang
  • Yong Li
  • Sasu Tarkoma
  • Pan Hui

Satellite imagery depicts the Earth’s surface remotely and provides comprehensive information for many applications, such as land use monitoring and urban planning. Existing studies on unsupervised representation learning for satellite images only take into account the images’ geographic information, ignoring human activity factors. To bridge this gap, we propose using the Point-of-Interest (POI) data to capture human factors and designing a contrastive learning-based framework to consolidate the representation of satellite imagery with POI information. Besides, we introduce a season-invariant representation learning model on satellite imagery, considering that human factors are mostly unchanging with respect to seasons. An attention model is designed at last to merge the representations from the geographic, seasonal, and POI perspectives adaptively. On the basis of real-world datasets collected from Beijing, we evaluate our method for predicting socioeconomic indicators. The results show that the representation containing POI information outperforms the geographic representation in estimating commercial activity-related indicators. Our proposed attentional framework can estimate the socioeconomic indicators with R² of 0.874 and outperforms the baseline methods. Furthermore, we explore the differences in the representations of satellite images with varying socioeconomic statuses. Finally, we investigate the impact of geographic and POI perspective information in the representation learning process, as well as the effect of satellite imagery on various spatial resolutions.

AAAI Conference 2023 Conference Paper

Towards Fine-Grained Explainability for Heterogeneous Graph Neural Network

  • Tong Li
  • Jiale Deng
  • Yanyan Shen
  • Luyu Qiu
  • Huang Yongxiang
  • Caleb Chen Cao

Heterogeneous graph neural networks (HGNs) are prominent approaches to node classification tasks on heterogeneous graphs. Despite the superior performance, insights about the predictions made from HGNs are obscure to humans. Existing explainability techniques are mainly proposed for GNNs on homogeneous graphs. They focus on highlighting salient graph objects to the predictions whereas the problem of how these objects affect the predictions remains unsolved. Given heterogeneous graphs with complex structures and rich semantics, it is imperative that salient objects can be accompanied with their influence paths to the predictions, unveiling the reasoning process of HGNs. In this paper, we develop xPath, a new framework that provides fine-grained explanations for black-box HGNs specifying a cause node with its influence path to the target node. In xPath, we differentiate the influence of a node on the prediction w.r.t. every individual influence path, and measure the influence by perturbing graph structure via a novel graph rewiring algorithm. Furthermore, we introduce a greedy search algorithm to find the most influential fine-grained explanations efficiently. Empirical results on various HGNs and heterogeneous graphs show that xPath yields faithful explanations efficiently, outperforming the adaptations of advanced GNN explanation approaches.

TIST Journal 2023 Journal Article

You Are How You Use Apps: User Profiling Based on Spatiotemporal App Usage Behavior

  • Tong Li
  • Yong Li
  • Mingyang Zhang
  • Sasu Tarkoma
  • Pan Hui

Mobile apps have become an indispensable part of people’s daily lives. Users determine what apps to use and when and where to use them based on their tastes, interests, and personal demands, depending on their personality traits. This article aims to infer user profiles from their spatiotemporal mobile app usage behavior. Specifically, we first transform mobile app usage records into a heterogeneous graph. On the graph, nodes represent users, apps, locations, and time slots. Edges describe the co-occurrence of entities in usage records. We then develop a multi-relational heterogeneous graph attention network (MRel-HGAN), an end-to-end system for user profiling. MRel-HGAN first adopts a neighbor sampling strategy based on bootstrapping to sample heavily connected neighbors of a fixed size for each node. Next, we design a relational graph convolutional operation and a multi-relational attention operation. Through such modules, MRel-HGAN can generate node embedding by sufficiently leveraging the rich semantic information of the multi-relational structure in the mobile app usage graph. Experimental results on real-world mobile app usage datasets show the effectiveness and superiority of our MRel-HGAN in the user profiling task for attributes of gender and age.
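The graph construction step described above (nodes for users, apps, locations, and time slots; edges for co-occurrence in usage records) can be sketched as follows. The record field names and the choice to link every pair of entities within a record are assumptions for illustration; the abstract does not specify which node-type pairs are connected.

```python
from collections import Counter
from itertools import combinations

# Hypothetical field names for one usage record.
NODE_TYPES = ("user", "app", "location", "time_slot")


def build_usage_graph(records):
    """Return (nodes, edges) where edges counts co-occurrences of typed nodes.

    Each node is a (type, value) pair; each edge key is a pair of nodes
    ordered by NODE_TYPES, so the same pair always maps to the same key.
    """
    nodes = set()
    edges = Counter()
    for record in records:
        typed = [(node_type, record[node_type]) for node_type in NODE_TYPES]
        nodes.update(typed)
        for left, right in combinations(typed, 2):
            edges[(left, right)] += 1
    return nodes, edges
```

The edge counts can then serve as weights when sampling neighbors, for example to prefer heavily connected neighbors.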

ICRA Conference 2022 Conference Paper

Estimation of Upper Limb Kinematics with a Magnetometer-Free Egocentric Visual-Inertial System

  • Tong Li
  • Xiaoyu Wu
  • Huixu Dong
  • Haoyong Yu

Most human activities in daily living or professional work rely on upper body motion. Measuring upper body motion is essential for many applications such as health evaluation, rehabilitation, human power augmentation, skill transferring, etc. Computer vision-based systems have been widely used to directly capture upper limb motion but are usually constrained in a restricted area. Wearable sensors such as inertial measurement units (IMUs) are promising to enable ambulant and out-of-lab measurements but also suffer from issues such as magnetic distortion and drifting. Some visual-inertial systems have been proposed recently to fuse these two complementary measurements but mostly apply in a restricted area. In this paper, we propose a fully wearable egocentric visual-inertial system to estimate the upper-limb pose. Magnetometers are not used to allow the system to work in complex industrial and daily living scenarios or to be integrated with motorized assistive devices. Methods to automatically calibrate the sensor-to-segment alignment and estimate upper body motion are presented and validated with an optical motion capture system. Experimental results showed the system can estimate the joint angles without drift and obtain accurate wrist position even with occlusion, verifying the efficacy of the proposed system and method.

TIST Journal 2022 Journal Article

Utility-aware and Privacy-preserving Trajectory Synthesis Model that Resists Social Relationship Privacy Attacks

  • Zhirun Zheng
  • Zhetao Li
  • Jie Li
  • Hongbo Jiang
  • Tong Li
  • Bin Guo

For academic research and business intelligence, trajectory data has been widely collected and analyzed. Releasing trajectory data to a third party may lead to serious privacy leakage, which has spawned considerable research on trajectory privacy protection technology. However, existing work suffers from several shortcomings. They either focus on point-based location privacy, ignoring the spatio-temporal correlations among locations within a trajectory, or they protect the privacy of each user separately without considering privacy leakage of the social relationship between trajectories of different users. Besides, they fail to balance privacy protection and data utility. Motivated by these limitations, in this article, we propose S³T-Trajectory, which is a utility-aware and privacy-preserving trajectory synthesis model that resists social relationship privacy attacks. Specifically, we first develop a time-dependent Markov chain based on an adaptive spatio-temporal discrete grid to efficiently and accurately capture human mobility behavior. Then, we propose three mobility feature metrics from spatio-temporal, semantic, and social dimensions. On the basis of the metrics, we construct a bi-level optimization problem to accomplish the utility-aware and privacy-preserving trajectory synthesizing. The upper-level objective guarantees data utility and the lower-level optimization problems (or upper-level constraints) provide two-layer privacy protection for S³T-Trajectory, i.e., resisting location inference attacks and social relationship privacy attacks. We conduct extensive experiments on large-scale real-world datasets loc-Gowalla and loc-Brightkite. The experimental results demonstrate the effectiveness and robustness of S³T-Trajectory. Compared with the baseline models, S³T-Trajectory achieves between 7.8% and 23.8% performance improvement in resisting social relationship privacy attacks and achieves at least 5.19% improvement regarding data utility.
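The time-dependent Markov chain mentioned in the abstract above can be illustrated with a small sketch that estimates, for each time slot, the probability of moving from one grid cell to the next. This version assumes a fixed, pre-discretized grid and hypothetical `(time_slot, cell)` trajectory points; the paper's adaptive spatio-temporal grid and privacy mechanisms are not modeled.

```python
from collections import defaultdict


def fit_time_dependent_chain(trajectories):
    """Estimate P(next_cell | time_slot, cell) from observed trajectories.

    trajectories: list of lists of (time_slot, cell) points in time order.
    Returns a nested dict: chain[time_slot][cell][next_cell] -> probability.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for trajectory in trajectories:
        # Pair each point with its successor; the slot of the current
        # point indexes the transition.
        for (slot, cell), (_, next_cell) in zip(trajectory, trajectory[1:]):
            counts[slot][cell][next_cell] += 1

    chain = {}
    for slot, by_cell in counts.items():
        chain[slot] = {}
        for cell, successors in by_cell.items():
            total = sum(successors.values())
            chain[slot][cell] = {nxt: n / total for nxt, n in successors.items()}
    return chain
```

Synthetic trajectories could then be sampled by walking the chain slot by slot, drawing each next cell from the estimated distribution.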

ICLR Conference 2021 Conference Paper

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

  • Panteha Naderian
  • Gabriel Loaiza-Ganem
  • Harry J. Braviner
  • Anthony L. Caterini
  • Jesse C. Cresswell
  • Tong Li
  • Animesh Garg

Multi-goal reaching is an important problem in reinforcement learning needed to achieve algorithmic generalization. Despite recent advances in this field, current algorithms suffer from three major challenges: high sample complexity, learning only a single way of reaching the goals, and difficulties in solving complex motion planning tasks. In order to address these limitations, we introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that these functions obey a recurrence relation, which enables learning from offline interactions. We also prove that optimal cumulative accessibility functions are monotonic in the planning horizon. Additionally, our method can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. Our code is available at https://github.com/layer6ai-labs/CAE, and additional visualizations can be found at https://sites.google.com/view/learning-cae/.
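As an illustration of the recurrence and the monotonicity property mentioned above, the sketch below computes an optimal accessibility value by exact recursion in a toy deterministic chain environment. The environment, the tabular recursion, and the exact form of the recurrence are assumptions for illustration; the paper learns these functions from offline interactions rather than enumerating them.

```python
from functools import lru_cache

N_STATES = 6          # hypothetical 1-D chain of states 0..5
ACTIONS = (-1, +1)    # move left or right


def step(state, action):
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)


@lru_cache(maxsize=None)
def accessibility(state, goal, horizon):
    """Best achievable probability of reaching `goal` within `horizon` steps.

    Recurrence: 1 if already at the goal, 0 if no steps remain, otherwise
    the best value over actions with one fewer step remaining.
    """
    if state == goal:
        return 1.0
    if horizon == 0:
        return 0.0
    return max(accessibility(step(state, a), goal, horizon - 1) for a in ACTIONS)
```

In this toy setting the values are 0 or 1 and never decrease as the horizon grows, which matches the monotonicity claim in the abstract.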

JMLR Journal 2021 Journal Article

On the Optimality of Kernel-Embedding Based Goodness-of-Fit Tests

  • Krishnakumar Balasubramanian
  • Tong Li
  • Ming Yuan

The reproducing kernel Hilbert space (RKHS) embedding of distributions offers a general and flexible framework for testing problems in arbitrary domains and has attracted a considerable amount of attention in recent years. To gain insights into their operating characteristics, we study here the statistical performance of such approaches within a minimax framework. Focusing on the case of goodness-of-fit tests, our analyses show that a vanilla version of the kernel embedding based test could be minimax suboptimal when considering $\chi^2$ distance as the separation metric. Hence we suggest a simple remedy by moderating the embedding. We prove that the moderated approach provides optimal tests for a wide range of deviations from the null and can also be made adaptive over a large collection of interpolation spaces. Numerical experiments are presented to further demonstrate the merits of our approach.

IJCAI Conference 2020 Conference Paper

Multi-View Joint Graph Representation Learning for Urban Region Embedding

  • Mingyang Zhang
  • Tong Li
  • Yong Li
  • Pan Hui

The increasing amount of urban data enables us to investigate urban dynamics, assist urban planning, and eventually, make our cities more livable and sustainable. In this paper, we focus on learning an embedding space from urban data for urban regions. For the first time, we propose a multi-view joint learning model to learn comprehensive and representative urban region embeddings. We first model different types of region correlations based on both human mobility and inherent region properties. Then, we apply a graph attention mechanism in learning region representations from each view of the built correlations. Moreover, we introduce a joint learning module that boosts the region embedding learning by sharing cross-view information and fuses multi-view embeddings by learning adaptive weights. Finally, we exploit the learned embeddings in the downstream applications of land usage classification and crime prediction in urban areas with real-world data. Extensive experimental results demonstrate that by exploiting our proposed joint learning model, the performance is improved by a large margin on both tasks compared with the state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

A Decomposition Approach for Urban Anomaly Detection Across Spatiotemporal Data

  • Mingyang Zhang
  • Tong Li
  • Hongzhi Shi
  • Yong Li
  • Pan Hui

Urban anomalies such as abnormal flow of crowds and traffic accidents could result in loss of life or property if not handled properly. Detecting urban anomalies at the early stage is important to minimize the adverse effects. However, urban anomaly detection is difficult due to two challenges: a) the criteria for urban anomalies vary with location and time; b) urban anomalies of different types may show different signs. In this paper, we propose a decomposition approach to address these two challenges. Specifically, we decompose urban dynamics into the normal component and the abnormal component. The normal component is decided solely by spatiotemporal features, while the abnormal component is caused by anomalous events. Then, we extract spatiotemporal features and estimate the normal component accordingly. Finally, we derive the abnormal component to identify anomalies. We evaluate our method using both real-world and synthetic datasets. The results show our method can detect meaningful events and outperforms state-of-the-art anomaly detection methods by a large margin.
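The decomposition idea above can be sketched in a few lines: estimate a normal component from spatiotemporal features, subtract it from the observation, and flag large residuals. Here the normal component is simply the historical median per (region, hour) bucket, and the threshold is a fixed number; both are simplifying assumptions, not the estimator used in the paper.

```python
from collections import defaultdict
from statistics import median


def fit_normal_component(history):
    """Map each (region, hour) bucket to the median of its historical values.

    history: iterable of (region, hour, value) observations.
    """
    buckets = defaultdict(list)
    for region, hour, value in history:
        buckets[(region, hour)].append(value)
    return {key: median(values) for key, values in buckets.items()}


def abnormal_component(normal, region, hour, observed):
    """Residual left after removing the estimated normal component."""
    return observed - normal[(region, hour)]


def is_anomaly(normal, region, hour, observed, threshold):
    """Flag the observation when the residual magnitude exceeds the threshold."""
    return abs(abnormal_component(normal, region, hour, observed)) > threshold
```

Keying the normal component by location and time lets the same raw value be ordinary in one bucket and anomalous in another, which is the first challenge the abstract names.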

TIST Journal 2019 Journal Article

Secure Deduplication System with Active Key Update and Its Application in IoT

  • Jin Li
  • Tong Li
  • Zheli Liu
  • Xiaofeng Chen

The rich cloud services in the Internet of Things create certain needs for edge computing, in which devices should be able to handle storage tasks securely, reliably, and efficiently. When processing the storage requests from edge devices, each cloud server is supposed to eliminate duplicate copies of repeating data to reduce the amount of storage space and save on bandwidth. To protect data confidentiality while supporting deduplication, some convergent-encryption-based techniques have been proposed to encrypt the data before uploading. However, all these works cannot meet two requirements while preventing brute-force attacks: (i) power-constrained edge nodes should update encryption keys efficiently when an edge node is abandoned; and (ii) the access privacy of edge nodes should be guaranteed. In this article, we propose a novel encryption scheme for secure chunk-level deduplication. Based on this scheme, we present two constructions of the secure deduplication system that support an efficient key update protocol. The key update protocol does not involve any edge node in computational tasks, so that the deduplication system can adopt an active key update strategy. Moreover, one of our constructions, which is called advance construction, can provide access privacy assurances for edge nodes. The security analysis is given in terms of the proposed threat model. The experimental analysis demonstrates that the proposed deduplication system is practical.
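For context, plain convergent encryption (the prior technique the abstract builds on) derives the key from the data itself, so identical chunks encrypt to identical ciphertexts and the server can deduplicate without seeing plaintext. The sketch below shows that baseline only. It uses a toy SHA-256 counter keystream that is NOT secure for real use, and it does not implement the paper's scheme or its key update protocol; all names are hypothetical.

```python
import hashlib


def convergent_key(chunk: bytes) -> bytes:
    """Derive the encryption key from the chunk content itself."""
    return hashlib.sha256(chunk).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from hashed counters (illustration only, not secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(chunk: bytes):
    """Return (key, ciphertext); equal chunks yield equal ciphertexts."""
    key = convergent_key(chunk)
    cipher = bytes(a ^ b for a, b in zip(chunk, _keystream(key, len(chunk))))
    return key, cipher


def decrypt(key: bytes, cipher: bytes) -> bytes:
    """Invert encrypt() by XORing with the same keystream."""
    return bytes(a ^ b for a, b in zip(cipher, _keystream(key, len(cipher))))


class DedupStore:
    """Server-side store that keeps one copy per distinct ciphertext."""

    def __init__(self):
        self.blobs = {}

    def put(self, cipher: bytes):
        tag = hashlib.sha256(cipher).hexdigest()
        is_new = tag not in self.blobs
        self.blobs.setdefault(tag, cipher)
        return tag, is_new
```

Because the key is a deterministic function of the content, predictable chunks can be guessed and checked offline, which is the brute-force weakness the abstract refers to.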

AAAI Conference 2019 Conference Paper

Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation

  • Wei Feng
  • Wentao Liu
  • Tong Li
  • Jing Peng
  • Chen Qian
  • Xiaolin Hu

Human-object interactions (HOI) recognition and pose estimation are two closely related tasks. Human pose is an essential cue for recognizing actions and localizing the interacted objects. Meanwhile, human actions and the localizations of their interacted objects provide guidance for pose estimation. In this paper, we propose a turbo learning framework to perform HOI recognition and pose estimation simultaneously. First, two modules are designed to enforce message passing between the tasks, i.e., a pose-aware HOI recognition module and an HOI-guided pose estimation module. Then, these two modules form a closed loop to utilize the complementary information iteratively, which can be trained in an end-to-end manner. The proposed method achieves state-of-the-art performance on two public benchmarks, the Verbs in COCO (V-COCO) and HICO-DET datasets.