Author name cluster

Hao Xue

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
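The grouping rule described above can be sketched as follows. This is a minimal illustration, not Arrow's actual implementation: the function name `cluster_by_exact_name`, the `(author_name, paper_title)` row layout, and the upper-case and abbreviated name variants are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_exact_name(rows):
    """Group paper titles by author name, ignoring letter case only.

    No identity disambiguation is attempted: abbreviated or reordered
    names land in separate clusters.
    """
    clusters = defaultdict(list)
    for author_name, paper_title in rows:
        key = author_name.casefold()  # case-insensitive, otherwise exact
        clusters[key].append(paper_title)
    return dict(clusters)


# Illustrative rows; the "HAO XUE" and "H. Xue" spellings are hypothetical.
rows = [
    ("Hao Xue", "Event-Aware Multimodal Mobility Nowcasting"),
    ("HAO XUE", "Generative Adversarial Networks for Spatio-temporal Data: A Survey"),
    ("H. Xue", "MobTCast: Leveraging Auxiliary Trajectory Forecasting for Human Mobility Prediction"),
]

clusters = cluster_by_exact_name(rows)
for name, titles in clusters.items():
    print(name, len(titles))
```

Because only letter case is normalized, two different people who share a name fall into the same cluster, and one person who publishes under several spellings is split across clusters.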

10 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

Yue Tan
Xiaoqian Hu
Hao Xue
Celso de Melo
Flora Salim

Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), necessitating models to continually adapt to shifting data distributions and novel scenarios. Considering the prohibitive computational costs of fine-tuning models on new tasks, usually, a small subset of parameters is updated while the bulk of the model remains frozen. This poses new challenges to existing continual learning frameworks in the context of large multimodal foundation models, i.e., catastrophic forgetting and update conflict. While the foundation models struggle with parameter-efficient continual learning, the hippocampus in the human brain has evolved highly efficient mechanisms for memory formation and consolidation. Inspired by the rapid Binding and pattern separation mechanisms in the hippocampus, in this work, we propose Bisecle for video-language continual learning, where a multi-directional supervision module is used to capture more cross-modal relationships and a contrastive prompt learning scheme is designed to isolate task-specific knowledge to facilitate efficient memory storage. Binding and separation processes further strengthen the ability of VLMs to retain complex experiences, enabling robust and efficient continual learning in video understanding tasks. We perform a thorough evaluation of the proposed Bisecle, demonstrating its ability to mitigate forgetting and enhance cross-task generalization on several VideoQA benchmarks.

TMLR Journal 2025 Journal Article

ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series Forecasting

Futoon M. Abushaqra
Hao Xue
Yongli Ren
Flora D. Salim

Addressing the challenges of irregularity and concept drift in streaming time series is crucial for real-world predictive modelling. Previous studies in time series continual learning often propose models that require buffering long sequences, potentially restricting the responsiveness of the inference system. Moreover, these models are typically designed for regularly sampled data, an unrealistic assumption in real-world scenarios. This paper introduces ODEStream, a novel buffer-free continual learning framework that incorporates a temporal isolation layer to capture temporal dependencies within the data. Simultaneously, it leverages the capability of neural ordinary differential equations to process irregular sequences and generate a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change over time, facilitating direct processing of streaming sequences. Evaluations on benchmark real-world datasets demonstrate that ODEStream outperforms the state-of-the-art online learning and streaming analysis baseline models, providing accurate predictions over extended periods while minimising performance degradation over time by learning how the sequence dynamics change. The implementation of ODEStream is available at: https://github.com/FtoonAbushaqra/ODEStream.git.

IS Journal 2025 Journal Article

Transforming Urban Dynamics: Harnessing Large Language Models for Smarter Mobility

Hao Xue
Ming Jin
Shirui Pan
Flora Salim

Artificial intelligence (AI) has the potential to analyze mobility data and make mobility systems smarter by leveraging diverse data sources such as geospatial data, transportation logs, and real-time sensor data to optimize traffic flow, enhance public transportation systems, and support the development of autonomous vehicles. With the newly emerged generative AI paradigm, exemplified by large language models (LLMs), there is great potential to transform the current AI applications in mobility, transportation, and urban domains. This article provides an overview of recent efforts and aims to shed light on the challenges and future opportunities to facilitate the adaptation of LLMs for smarter mobility systems.

NeurIPS Conference 2024 Conference Paper

Building Timeseries Dataset: Empowering Large-Scale Building Analytics

Arian Prabowo
Xiachong Lin
Imran Razzak
Hao Xue
Emily W. Yap
Matthew Amos
Flora D. Salim

Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics. Access to the dataset and the code used for benchmarking are available here: https://github.com/cruiseresearchgroup/DIEF_BTS

AIJ Journal 2024 Journal Article

Learning spatio-temporal dynamics on mobility networks for adaptation to open-world events

Zhaonan Wang
Renhe Jiang
Hao Xue
Flora D. Salim
Xuan Song
Ryosuke Shibasaki
Wei Hu
Shaowen Wang

TMLR Journal 2024 Journal Article

SeqLink: A Robust Neural-ODE Architecture for Modelling Partially Observed Time Series

Futoon M. Abushaqra
Hao Xue
Yongli Ren
Flora D. Salim

Ordinary Differential Equations (ODEs) based models have become popular as foundation models for solving many time series problems. Combining neural ODEs with traditional RNN models has provided the best representation for irregular time series. However, ODEs-based models typically require the trajectory of hidden states to be defined based on either the initial observed value or the most recent observation, raising questions about their effectiveness when dealing with longer sequences and extended time intervals. In this article, we explore the behaviour of the ODEs-based models in the context of time series data with varying degrees of sparsity. We introduce SeqLink, an innovative neural architecture designed to enhance the robustness of sequence representation. Unlike traditional approaches that solely rely on the hidden state generated from the last observed value, SeqLink leverages ODE latent representations derived from multiple data samples, enabling it to generate robust data representations regardless of sequence length or data sparsity level. The core concept behind our model is the definition of hidden states for the unobserved values based on the relationships between samples (links between sequences). Through extensive experiments on partially observed synthetic and real-world datasets, we demonstrate that SeqLink improves the modelling of intermittent time series, consistently outperforming state-of-the-art approaches.

NeurIPS Conference 2024 Conference Paper

ViLCo-Bench: VIdeo Language COntinual learning Benchmark

Tianqi Tang
Shohreh Deldari
Hao Xue
Celso de Melo
Flora Salim

Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model’s ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.

AAAI Conference 2022 Conference Paper

Event-Aware Multimodal Mobility Nowcasting

Zhaonan Wang
Renhe Jiang
Hao Xue
Flora D. Salim
Xuan Song
Ryosuke Shibasaki

As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from the normality. While tremendous progress has been made to model high-level spatio-temporal regularities with deep learning, most, if not all of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters in an on-the-fly fashion for various scenarios. The enhanced event-aware spatiotemporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published on https://github.com/underdoc-wang/EAST-Net.

TIST Journal 2022 Journal Article

Generative Adversarial Networks for Spatio-temporal Data: A Survey

Nan Gao
Hao Xue
WEI SHAO
Sichen Zhao
Kyle Kai Qin
Arian Prabowo
Mohammad Saiedur Rahaman
Flora D. Salim

Generative Adversarial Networks (GANs) have shown remarkable success in producing realistic-looking images in the computer vision area. Recently, GAN-based techniques are shown to be promising for spatio-temporal-based applications such as trajectory prediction, events generation, and time-series data imputation. While several reviews for GANs in computer vision have been presented, no one has considered addressing the practical applications and challenges relevant to spatio-temporal data. In this article, we have conducted a comprehensive review of the recent developments of GANs for spatio-temporal data. We summarise the application of popular GAN architectures for spatio-temporal data and the common practices for evaluating the performance of spatio-temporal applications with GANs. Finally, we point out future research directions to benefit researchers in this area.

NeurIPS Conference 2021 Conference Paper

MobTCast: Leveraging Auxiliary Trajectory Forecasting for Human Mobility Prediction

Hao Xue
Flora Salim
Yongli Ren
Nuria Oliver

Human mobility prediction is a core functionality in many location-based services and applications. However, due to the sparsity of mobility data, it is not an easy task to predict future POIs (place-of-interests) that are going to be visited. In this paper, we propose MobTCast, a Transformer-based context-aware network for mobility prediction. Specifically, we explore the influence of four types of context in mobility prediction: temporal, semantic, social, and geographical contexts. We first design a base mobility feature extractor using the Transformer architecture, which takes both the history POI sequence and the semantic information as input. It handles both the temporal and semantic contexts. Based on the base extractor and the social connections of a user, we employ a self-attention module to model the influence of the social context. Furthermore, unlike existing methods, we introduce a location prediction branch in MobTCast as an auxiliary task to model the geographical context and predict the next location. Intuitively, the geographical distance between the location of the predicted POI and the predicted location from the auxiliary branch should be as close as possible. To reflect this relation, we design a consistency loss to further improve the POI prediction performance. In our experimental results, MobTCast outperforms other state-of-the-art next POI prediction methods. Our approach illustrates the value of including different types of context in next POI prediction.
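The consistency idea in this abstract, that the predicted POI's location and the auxiliary branch's predicted location should lie close together, can be illustrated with a small sketch. This is one plausible form and not the paper's actual loss: the haversine distance, the kilometre unit, the batch mean, and the names `haversine_km` and `consistency_loss` are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def consistency_loss(poi_locations, aux_locations):
    """Mean distance between each predicted POI's coordinates and the
    location predicted by the auxiliary branch (hypothetical formulation)."""
    distances = [haversine_km(p, q) for p, q in zip(poi_locations, aux_locations)]
    return sum(distances) / len(distances)


# Two illustrative predictions: one perfectly consistent, one off by
# one degree of latitude along the prime meridian.
poi_locations = [(0.0, 0.0), (0.0, 0.0)]
aux_locations = [(0.0, 0.0), (1.0, 0.0)]
loss = consistency_loss(poi_locations, aux_locations)
print(round(loss, 2))
```

In a trainable model this term would be written with differentiable tensor operations and added to the POI classification loss with a weighting factor; plain Python is used here only to keep the sketch self-contained.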