Arrow Research search

Author name cluster

Yong Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

116 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

DehazeGS: Seeing Through Fog with 3D Gaussian Splatting

  • Jinze Yu
  • Yiqun Wang
  • Aiheng Jiang
  • Zhengda Lu
  • Jianwei Guo
  • Yong Li
  • Hongxing Qin
  • Xiaopeng Zhang

Current novel view synthesis methods are typically designed for high-quality and clean input images. However, in foggy scenes, scattering and attenuation can significantly degrade the quality of rendering. Although NeRF-based dehazing approaches have been developed, their reliance on deep fully connected neural networks and per-ray sampling strategies leads to high computational costs. Furthermore, NeRF's implicit representation limits its ability to recover fine-grained details from hazy scenes. To overcome these limitations, we propose DehazeGS, the first physics-driven 3D Gaussian Splatting (3DGS) framework for dehazing. We adopt an explicit Gaussian representation to model fog formation via a physically consistent forward rendering process, enabling reconstruction and rendering of fog-free scenes using only multi-view foggy images as input. Specifically, based on the atmospheric scattering model, we simulate the formation of fog by establishing the transmission function directly on Gaussian primitives via depth-to-transmission mapping. During training, we jointly learn the atmospheric light and scattering coefficients while optimizing the Gaussian representation of foggy scenes. At inference time, we remove the effects of scattering and attenuation in Gaussian distributions and directly render the scene to obtain dehazed views. Experiments on both real-world and synthetic foggy datasets demonstrate that DehazeGS achieves state-of-the-art performance.
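
The abstract above builds on the atmospheric scattering model but does not spell it out. As a rough illustration only, the sketch below uses the standard textbook form of that model, I = J * t + A * (1 - t) with t = exp(-beta * depth), on a single scalar intensity. The function names, the clamp `t_min`, and the sample values are assumptions made for this example; none of this is the DehazeGS implementation, which applies the transmission to Gaussian primitives.

```python
import math


def transmission(depth, beta):
    """Transmission t = exp(-beta * depth) for scattering coefficient beta."""
    return math.exp(-beta * depth)


def add_fog(clear, depth, beta, airlight):
    """Forward model: hazy intensity I = J * t + A * (1 - t)."""
    t = transmission(depth, beta)
    return clear * t + airlight * (1.0 - t)


def remove_fog(hazy, depth, beta, airlight, t_min=1e-3):
    """Invert the forward model: J = (I - A * (1 - t)) / t.

    t is clamped to t_min (a hypothetical safeguard) to avoid dividing
    by a value close to zero at large depths.
    """
    t = max(transmission(depth, beta), t_min)
    return (hazy - airlight * (1.0 - t)) / t


# Illustrative values only: one pixel intensity at depth 5 with beta = 0.2.
clear_value = 0.30
hazy_value = add_fog(clear_value, depth=5.0, beta=0.2, airlight=0.9)
recovered = remove_fog(hazy_value, depth=5.0, beta=0.2, airlight=0.9)
```

When the depth, scattering coefficient, and atmospheric light are known exactly, inverting the forward model recovers the clear intensity; in the setting the abstract describes, the scattering coefficient and atmospheric light are learned during training rather than given.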

TIST Journal 2026 Journal Article

Dynamic Population Distribution Aware Human Trajectory Generation with Diffusion Model

  • Qingyue Long
  • Can Rong
  • Tong Li
  • Yong Li

Human trajectory data are crucial in urban planning, traffic engineering, and public health. However, directly using real-world trajectory data often faces challenges such as privacy concerns, data acquisition costs, and data quality. A practical solution to these challenges is trajectory generation, a method developed to simulate human mobility behaviors. Existing trajectory generation methods mainly focus on capturing individual movement patterns but often overlook the influence of population distribution on trajectory generation. In reality, dynamic population distribution reflects changes in population density across different regions, significantly impacting individual mobility behavior. Thus, we propose a novel trajectory generation framework based on a diffusion model, which integrates the dynamic population distribution constraints to guide high-fidelity generation outcomes. Specifically, we construct a spatial graph to enhance the spatial correlation of trajectories. Then, we design a dynamic population distribution aware denoising network to capture the spatiotemporal dependencies of human mobility behavior as well as the impact of population distribution in the denoising process. Extensive experiments show that the trajectories generated by our model can resemble real-world trajectories in terms of some critical statistical metrics, outperforming state-of-the-art algorithms by over 54%.

AAAI Conference 2026 Conference Paper

EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training

  • Yuting Tang
  • Weibang Jiang
  • Shanglin Li
  • Yong Li
  • Chenyu Liu
  • Xinliang Zhou
  • Yi Ding
  • Cuntai Guan

Large-scale EEG foundation models have shown strong generalization across a range of downstream tasks, but their training remains resource-intensive due to the volume and variable quality of EEG data. In this work, we introduce EEG-DLite, a data distillation framework that enables more efficient pre-training by selectively removing noisy and redundant samples from large EEG datasets. EEG-DLite begins by encoding EEG segments into compact latent representations using a self-supervised autoencoder, allowing sample selection to be performed efficiently and with reduced sensitivity to noise. Based on these representations, EEG-DLite filters out outliers and minimizes redundancy, resulting in a smaller yet informative subset that retains the diversity essential for effective foundation model training. Through extensive experiments, we demonstrate that training on only 5 percent of a 2,500-hour dataset curated with EEG-DLite yields performance comparable to, and in some cases better than, training on the full dataset across multiple downstream tasks. To our knowledge, this is the first systematic study of pre-training data distillation in the context of EEG foundation models. EEG-DLite provides a scalable and practical path toward more effective and efficient physiological foundation modeling.

EAAI Journal 2026 Journal Article

Encoder–Inverter framework for seismic acoustic impedance inversion

  • Junheng Peng
  • Yingtian Liu
  • Mingwei Wang
  • Yong Li
  • Wen Feng

Seismic acoustic impedance inversion is a challenging problem in geophysical exploration, primarily due to the scarcity of well logs and the task’s inherent nonlinearity. Most existing inversion methods, including semi-supervised approaches, remain limited in accuracy and robustness. In this work, we propose an Encoder–Inverter framework that maps continuous seismic traces into a high-dimensional linear space, thereby transforming the inversion into a linear extrapolation or interpolation problem to improve stability and performance. To achieve this, we introduce two auxiliary models to assist Encoder training and adopt a heterogeneous model structure to prevent shortcut learning, enabling the extraction of more generalizable and effective features. We evaluate the proposed method on benchmark and field datasets; experimental results show that our approach achieves better accuracy and robustness than competing methods. To promote reproducibility, we will open-source the data and code.

EAAI Journal 2026 Journal Article

FA-Former: A dynamic spatiotemporal attention network with feedback and alignment mechanisms for few-shot human motion prediction

  • Linfeng Zhu
  • Yong Li
  • Shuaiyong Li
  • Hui Tian

Human motion prediction has attracted considerable attention in recent years. However, conventional methods show clear limitations when training data are scarce. Few-shot learning has therefore emerged as a promising solution to this challenge. Nevertheless, existing few-shot approaches often struggle to capture the spatiotemporal dependencies inherent in motion sequences. To address this limitation, we propose a novel dynamic spatiotemporal attention network. The proposed network strengthens the modeling of spatiotemporal relationships while introducing a dynamic modulation mechanism for deep feature representations. This design enables the model to adapt motion semantics under limited data conditions. In addition, to reduce the storage and computational costs associated with external memory modules used in prior work, we introduce a feedback mechanism and an alignment-based prediction framework. Specifically, task-aware feedback is subtracted from the query vector to explicitly mitigate bias caused by limited data. A query-enhanced attention module further performs sample-level alignment, promoting consistency between the predicted and target distributions. Overall, the framework conceptually mirrors key aspects of human learning, including experience accumulation, reflection, and alignment with new tasks. Extensive experiments on multiple benchmark datasets demonstrate that the proposed method achieves improvements over state-of-the-art approaches across several evaluation metrics.

AAAI Conference 2026 System Paper

GeoProblem Factory: A Visual Interaction System for Solvable and Controllable Geometric Problem Generation by Leveraging Symbolic Deduction Engine

  • Zhuoxuan Jiang
  • Yanpeng Li
  • Tianyang Zhang
  • Jing Chen
  • Yong Li
  • Mo Guang
  • Wen Si
  • Shaohua Zhang

We propose a novel system, GeoProblem Factory, designed to effectively generate high-quality geometry problems for intelligent education. The system enables teachers and students to efficiently produce batches of geometry problems, either to save time and manual effort or to support personalized learning. Generating geometry problems is particularly challenging, as it requires ensuring both solvability and controllability from a pedagogical perspective. To address these issues, we adopt a state-of-the-art pipeline method based on a symbolic deduction engine and develop a visual interaction demo. This demo allows users to easily refine the generated problems through visual operations. It provides two modes for inputting controllable information: specifying knowledge points or supplying a reference problem. Moreover, the system can automatically generate a preliminary geometric diagram corresponding to each problem for further refinement. Through human–machine interaction, the system can produce high-quality geometry problems more efficiently than before.

AAAI Conference 2026 Conference Paper

Good-for-MDP State Reduction for Stochastic LTL Planning

  • Christoph Weinhuber
  • Giuseppe De Giacomo
  • Yong Li
  • Sven Schewe
  • Qiyi Tang

We study stochastic planning problems in Markov Decision Processes (MDPs) with goals specified in Linear Temporal Logic (LTL). The state-of-the-art approach transforms LTL formulas into good-for-MDP (GFM) automata, which feature a restricted form of nondeterminism. These automata are then composed with the MDP, allowing the agent to resolve the nondeterminism during policy synthesis. A major factor affecting the scalability of this approach is the size of the generated automata. In this paper, we propose a novel GFM state-space reduction technique that significantly reduces the number of automata states. Our method employs a sophisticated chain of transformations, leveraging recent advances in good-for-games minimisation developed for adversarial settings. In addition to our theoretical contributions, we present empirical results demonstrating the practical effectiveness of our state-reduction technique. Furthermore, we introduce a direct construction method for formulas of the form GFφ, where φ is a co-safety formula. This construction is provably single-exponential in the worst case, in contrast to the general doubly-exponential complexity. Our experiments confirm the scalability advantages of this specialised construction.

AAAI Conference 2026 Conference Paper

History-Aware Reasoning for GUI Agents

  • Ziwei Wang
  • Leyang Yang
  • Xiaoxuan Tang
  • Sheng Zhou
  • Dajun Chen
  • Wei Jiang
  • Yong Li

Advances in Multimodal Large Language Models have significantly enhanced Graphical User Interface (GUI) automation. Equipping GUI agents with reliable episodic reasoning capabilities is essential for bridging the gap between users’ concise task descriptions and the complexities of real-world execution. Current methods integrate Reinforcement Learning (RL) with System-2 Chain-of-Thought, yielding notable gains in reasoning enhancement. For long-horizon GUI tasks, historical interactions connect each screen to the goal-oriented episode chain, and effectively leveraging these clues is crucial for the current decision. However, existing native GUI agents exhibit weak short-term memory in their explicit reasoning, interpreting the chained interactions as discrete screen understanding, i.e., unawareness of the historical interactions within the episode. This history-agnostic reasoning challenges their performance in GUI automation. To alleviate this weakness, we propose a History-Aware Reasoning (HAR) framework, which encourages an agent to reflect on its own errors and acquire episodic reasoning knowledge from them via tailored strategies that enhance short-term memory in long-horizon interaction. The framework mainly comprises constructing a reflective learning scenario, synthesizing tailored correction guidelines, and designing a hybrid RL reward function. Using the HAR framework, we develop a native end-to-end model, HAR-GUI-3B, which alters the inherent reasoning mode from history-agnostic to history-aware, equipping the GUI agent with stable short-term memory and reliable perception of screen details. Comprehensive evaluations across a range of GUI-related benchmarks demonstrate the effectiveness and generalization of our method.

AAAI Conference 2026 Conference Paper

ProBench: Benchmarking GUI Agents with Accurate Process Information

  • Leyang Yang
  • Ziwei Wang
  • Xiaoxuan Tang
  • Sheng Zhou
  • Dajun Chen
  • Wei Jiang
  • Yong Li

With the deep integration of artificial intelligence and interactive technology, the Graphical User Interface (GUI) agent, as the carrier connecting goal-oriented natural language and real-world devices, has received widespread attention from the community. Contemporary benchmarks aim to evaluate the comprehensive capabilities of GUI agents in GUI operation tasks, generally determining task completion solely by inspecting the final screen state. However, GUI operation tasks consist of multiple chained steps, and not all critical information is presented in the final few pages. Although a few studies have begun to incorporate intermediate steps into evaluation, accurately and automatically capturing this process information remains an open challenge. To address this weakness, we introduce ProBench, a comprehensive mobile benchmark with over 200 challenging GUI tasks covering widely used scenarios. Retaining the traditional State-related Task evaluation, we extend our dataset to include Process-related Tasks and design a specialized evaluation method. A newly introduced Process Provider automatically supplies accurate process information, enabling precise assessment of agents' performance. Our evaluation of advanced GUI agents reveals significant limitations in real-world GUI scenarios. These shortcomings are prevalent across diverse models, including both large-scale generalist models and smaller, GUI-specific models. A detailed error analysis further exposes several universal problems, outlining concrete directions for future improvements.

AAAI Conference 2026 Conference Paper

Realistic Face Reconstruction from Facial Embeddings via Diffusion Models

  • Dong Han
  • Yong Li
  • Joachim Denzler

With the advancement of face recognition (FR) systems, privacy-preserving face recognition (PPFR) systems have gained popularity for their accurate recognition, enhanced facial privacy protection, and robustness to various attacks. However, few studies have further verified privacy risks by reconstructing realistic high-resolution face images from the embeddings of these systems, especially for PPFR. In this work, we propose face embedding mapping (FEM), a general framework that explores the Kolmogorov-Arnold Network (KAN) for conducting the embedding-to-face attack by leveraging a pre-trained identity-preserving diffusion model against state-of-the-art (SOTA) FR and PPFR systems. Based on extensive experiments, we verify that reconstructed faces can be used for accessing other real-world FR systems. In addition, the proposed method is robust in reconstructing faces from partial and protected face embeddings. Moreover, FEM can be used as a tool for evaluating the safety of FR and PPFR systems in terms of privacy leakage. All images used in this work are from public datasets.

AAAI Conference 2026 Conference Paper

WeightFlow: Learning Stochastic Dynamics via Evolving Weight of Neural Network

  • Ruikun Li
  • Jiazhen Liu
  • Huandong Wang
  • Qingmin Liao
  • Yong Li

Modeling stochastic dynamics from discrete observations is a key interdisciplinary challenge. Existing methods often fail to estimate the continuous evolution of probability densities from trajectories or face the curse of dimensionality. To address these limitations, we present a novel paradigm: modeling dynamics directly in the weight space of a neural network by projecting the evolving probability distribution. We first theoretically establish the connection between dynamic optimal transport in measure space and an equivalent energy functional in weight space. Subsequently, we design WeightFlow, which constructs the neural network weights into a graph and learns its evolution via a graph controlled differential equation. Experiments on interdisciplinary datasets show that WeightFlow improves performance by an average of 43.02% over state-of-the-art methods, providing an effective and scalable solution for modeling high-dimensional stochastic dynamics.

NeurIPS Conference 2025 Conference Paper

AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems

  • Yu Shang
  • Peijie Liu
  • Yuwei Yan
  • Zijing Wu
  • Leheng Sheng
  • Yuanqing Yu
  • Chumeng Jiang
  • An Zhang

The emergence of agentic recommender systems powered by Large Language Models (LLMs) represents a paradigm shift in personalized recommendations, leveraging LLMs’ advanced reasoning and role-playing capabilities to enable autonomous, adaptive decision-making. Unlike traditional recommendation approaches, agentic recommender systems can dynamically gather and interpret user-item interactions from complex environments, generating robust recommendation strategies that generalize across diverse scenarios. However, the field currently lacks standardized evaluation protocols to systematically assess these methods. To address this critical gap, we propose: (1) an interactive textual recommendation simulator incorporating rich user and item metadata and three typical evaluation scenarios (classic, evolving-interest, and cold-start recommendation tasks); (2) a unified modular framework for developing agentic recommender systems; and (3) the first comprehensive benchmark comparing over 10 classical and agentic recommendation methods. Our findings demonstrate the superiority of agentic systems and establish actionable design guidelines for their core components. The benchmark environment has been rigorously validated through an open challenge and remains publicly available with a maintained leaderboard at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge/pages/overview.html. The benchmark is available at: https://huggingface.co/datasets/SGJQovo/AgentRecBench.

AAAI Conference 2025 Conference Paper

ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

  • Weibo Zhao
  • Yubin Shi
  • Xinyu Lyu
  • Wanchen Sui
  • Shen Li
  • Yong Li

Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial error, causing intolerable performance degradation. This paper is anchored in the basic idea of model compression objectives, and delves into the layer-wise error distribution of LLMs during post-training quantization. Subsequently, we introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; (2) Activation Smoothing: outlier extraction to gain smooth activation and better error compensation. ASER is capable of quantizing typical LLMs to low-bit ones, preserving accuracy even in the W4A8 per-channel setup. Experimental results show that ASER is competitive among state-of-the-art quantization algorithms, showing potential for activation quantization with minor overhead.

IJCAI Conference 2025 Conference Paper

Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution

  • Xiaohan Zheng
  • Lanning Wei
  • Yong Li
  • Quanming Yao

Effective decision-making on networks often relies on learning from graph-structured data, where Graph Neural Networks (GNNs) play a central role, but they take effort to configure and tune. In this demo, we propose LLMNet, showing how to design GNNs automatically through Large Language Models. Our system develops a set of agents that construct graph-related knowledge bases and then leverages Retrieval-Augmented Generation (RAG) to support automated configuration and refinement of GNN models through a knowledge-guided evolution process. These agents, equipped with specialized knowledge bases, extract insights into tasks and graph structures by interacting with the knowledge bases. Empirical results show LLMNet excels on twelve datasets across three graph learning tasks, validating its effectiveness in GNN model design.

NeurIPS Conference 2025 Conference Paper

Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization

  • Kaiyuan Li
  • Xiaoyue Chen
  • Chen Gao
  • Yong Li
  • Xinlei Chen

Large Vision-Language Models (LVLMs) have shown impressive performance across multi-modal tasks by encoding images into thousands of tokens. However, the large number of image tokens results in significant computational overhead, and the use of dynamic high-resolution inputs further increases this burden. Previous approaches have attempted to reduce the number of image tokens through token pruning, typically by selecting tokens based on attention scores or image token diversity. Through empirical studies, we observe that existing methods often overlook the joint impact of pruning on both the current layer's output (local) and the outputs of subsequent layers (global), leading to suboptimal pruning decisions. To address this challenge, we propose Balanced Token Pruning (BTP), a plug-and-play method for pruning vision tokens. Specifically, our method utilizes a small calibration set to divide the pruning process into multiple stages. In the early stages, our method emphasizes the impact of pruning on subsequent layers, whereas in the deeper stages, the focus shifts toward preserving the consistency of local outputs. Extensive experiments across various LVLMs demonstrate the broad effectiveness of our approach on multiple benchmarks. Our method achieves a 78% compression rate while preserving 96.7% of the original models' performance on average. Our code is available at https://github.com/EmbodiedCity/NeurIPS2025-Balanced-Token-Pruning.
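
For readers unfamiliar with token pruning, the sketch below illustrates only the attention-score baseline that the abstract mentions as prior work: keep the highest-scoring tokens and drop the rest. It is not Balanced Token Pruning itself. The function name `prune_tokens_by_score`, the `keep_ratio` parameter, and the toy scores are assumptions made for this example.

```python
def prune_tokens_by_score(tokens, scores, keep_ratio):
    """Keep the top-scoring fraction of tokens, preserving original order.

    tokens: list of token items (any type).
    scores: one importance score per token, e.g. an attention score.
    keep_ratio: fraction of tokens to keep, in (0, 1].
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Always keep at least one token.
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by score, highest first, then restore positional order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


# Toy example: six tokens, keep half of them.
toy_tokens = ["a", "b", "c", "d", "e", "f"]
toy_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.4]
pruned = prune_tokens_by_score(toy_tokens, toy_scores, keep_ratio=0.5)
```

A purely local criterion like this one scores tokens at a single layer, which is the kind of decision the abstract argues should be balanced against the effect on later layers.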

IJCAI Conference 2025 Conference Paper

CD^2: Constrained Dataset Distillation for Few-Shot Class-Incremental Learning

  • Kexin Bao
  • Daichi Zhang
  • Hansong Zhang
  • Yong Li
  • Yutao Yue
  • Shiming Ge

Few-shot class-incremental learning (FSCIL), which performs classification continuously with only a few training samples, has received significant attention, but it suffers from catastrophic forgetting. Existing methods usually employ an external memory to store previous knowledge and treat it equally with incremental classes, which cannot properly preserve essential previous knowledge. To solve this problem, and inspired by recent distillation work on knowledge transfer, we propose a framework termed Constrained Dataset Distillation (CD^2) to facilitate FSCIL, which includes a dataset distillation module (DDM) and a distillation constraint module (DCM). Specifically, the DDM synthesizes highly condensed samples guided by the classifier, forcing the model to learn compact, essential class-related clues from a few incremental samples. The DCM introduces a designed loss to constrain the previously learned class distribution, which preserves distilled knowledge more fully. Extensive experiments on three public datasets show the superiority of our method over other state-of-the-art competitors.

NeurIPS Conference 2025 Conference Paper

Diffusion Transformers as Open-World Spatiotemporal Foundation Models

  • Yuan Yuan
  • Chonghua Han
  • Jingtao Ding
  • Guozhen Zhang
  • Depeng Jin
  • Yong Li

The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. UrbanDiT pioneers a unified model that integrates diverse data sources and types while learning universal spatio-temporal patterns across different cities and scenarios. This allows the model to unify both multi-data and multi-task learning, and effectively support a wide range of spatio-temporal applications. Its key innovation lies in the elaborated prompt learning framework, which adaptively generates both data-driven and task-specific prompts, guiding the model to deliver superior performance across various urban applications. UrbanDiT offers three advantages: 1) It unifies diverse data types, such as grid-based and graph-based data, into a sequential format; 2) With task-specific prompts, it supports a wide range of tasks, including bi-directional spatio-temporal prediction, temporal interpolation, spatial extrapolation, and spatio-temporal imputation; and 3) It generalizes effectively to open-world scenarios, with its powerful zero-shot capabilities outperforming nearly all baselines with training data. UrbanDiT sets a new benchmark for foundation models in the urban spatio-temporal domain. Code and datasets are publicly available at https://github.com/tsinghua-fib-lab/UrbanDiT.

JBHI Journal 2025 Journal Article

EEG-Deformer: A Dense Convolutional Transformer for Brain-Computer Interfaces

  • Yi Ding
  • Yong Li
  • Hao Sun
  • Rui Liu
  • Chengxuan Tong
  • Chenyu Liu
  • Xinliang Zhou
  • Cuntai Guan

Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks (cognitive attention, driving fatigue, and mental workload detection) consistently confirm the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms or performs comparably to existing state-of-the-art methods. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks.

NeurIPS Conference 2025 Conference Paper

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

  • Qianyue Hao
  • Yiwen Song
  • Qingmin Liao
  • Jian Yuan
  • Yong Li

Policy exploration is critical in reinforcement learning (RL), where existing approaches include ε-greedy, Gaussian process, etc. However, these approaches utilize preset stochastic processes and are indiscriminately applied in all kinds of RL tasks without considering task-specific features that influence policy exploration. Moreover, during RL training, the evolution of such stochastic processes is rigid, which typically only incorporates a decay in the variance, failing to adjust flexibly according to the agent's real-time learning status. Inspired by the analyzing and reasoning capability of large language models (LLMs), we design LLM-Explorer to adaptively generate task-specific exploration strategies with LLMs, enhancing the policy exploration in RL. In our design, we sample the learning trajectory of the agent during the RL training in a given task and prompt the LLM to analyze the agent's current policy learning status and then generate a probability distribution for future policy exploration. Updating the probability distribution periodically, we derive a stochastic process specialized for the particular task and dynamically adjusted to adapt to the learning process. Our design is a plug-in module compatible with various widely applied RL algorithms, including the DQN series, DDPG, TD3, and any possible variants developed based on them. Through extensive experiments on the Atari and MuJoCo benchmarks, we demonstrate LLM-Explorer's capability to enhance RL policy exploration, achieving an average performance improvement of up to 37.27%. Our code is open-source at https://github.com/tsinghua-fib-lab/LLM-Explorer for reproducibility.
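
To make the contrast in the abstract concrete, the sketch below shows plain ε-greedy action selection for a discrete action space, plus an optional hook where a task-specific exploration distribution could replace the uniform one. The `explore_probs` argument is a hypothetical stand-in for the distribution the abstract says is generated by prompting an LLM; how that distribution is produced is not shown here, and this is not the LLM-Explorer code.

```python
import random


def select_action(q_values, epsilon, explore_probs=None, rng=random):
    """Pick an action index for a discrete action space.

    With probability epsilon the agent explores; otherwise it takes the
    greedy action. When explore_probs is None, exploration is uniform
    (standard epsilon-greedy). When explore_probs is given, exploration
    samples from that distribution instead (hypothetical plug-in point).
    """
    n_actions = len(q_values)
    if rng.random() < epsilon:
        if explore_probs is None:
            return rng.randrange(n_actions)
        return rng.choices(range(n_actions), weights=explore_probs, k=1)[0]
    # Greedy choice: index of the largest estimated value.
    return max(range(n_actions), key=lambda a: q_values[a])


# Toy values: three actions with estimated values.
toy_q = [0.1, 0.7, 0.2]
greedy_action = select_action(toy_q, epsilon=0.0)
biased_action = select_action(toy_q, epsilon=1.0, explore_probs=[0.0, 0.0, 1.0])
```

Passing a seeded `random.Random` instance as `rng` makes the exploration reproducible.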

AAAI Conference 2025 Conference Paper

MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector

  • Wenjie Fu
  • Huandong Wang
  • Chen Gao
  • Guanghua Liu
  • Yong Li
  • Tao Jiang

The increasing parameters and expansive dataset of large language models (LLMs) highlight the urgent demand for a technical solution to audit the underlying privacy risks and copyright issues associated with LLMs. Existing studies have partially addressed this need through an exploration of the pre-training data detection problem, which is an instance of a membership inference attack (MIA). This problem involves determining whether a given piece of text has been used during the pre-training phase of the target LLM. Although existing methods have designed various sophisticated MIA score functions to achieve considerable detection performance in pre-trained LLMs, how to achieve high-confidence detection and how to perform MIA on aligned LLMs remain challenging. In this paper, we propose MIA-Tuner, a novel instruction-based MIA method, which instructs LLMs themselves to serve as a more precise pre-training data detector internally, rather than design an external MIA score function. Furthermore, we design two instruction-based safeguards to respectively mitigate the privacy risks brought by the existing methods and MIA-Tuner. To comprehensively evaluate the most recent state-of-the-art LLMs, we collect a more up-to-date MIA benchmark dataset, named WIKIMIA-24, to replace the widely adopted benchmark WIKIMIA. We conduct extensive experiments across various aligned and unaligned LLMs over the two benchmark datasets. The results demonstrate that MIA-Tuner increases the AUC of MIAs from 0.7 to a significantly high level of 0.9.

TIST Journal 2025 Journal Article

Mobility Data-Driven Privacy-Preserving Model for Detecting High-Risk Infection Cases

  • Wenjie Fu
  • Huandong Wang
  • Chen Gao
  • Guanghua Liu
  • Yong Li
  • Tao Jiang

In the past few years, infectious diseases like COVID-19 have caused serious distress to global society and the economy. To prevent their spread, the early detection and assessment of infectious diseases based on molecular tests or antigen testing of bodily fluids has incurred substantial labor and material costs. Fortunately, with the rapid development of mobile localization and web techniques, the massive mobile trajectory data collected provide a promising solution for detecting positive cases. However, existing mobility data-driven infection case detection methods are limited in terms of modeling the complicated epidemic spreading processes and preserving user privacy of the mobility data. In this article, we propose a novel graph convolutional network (GCN) model for detecting high-risk infection cases, where we incorporate a spatio-temporal hypergraph to model the complex interaction of individuals. Then, we elaborately design a privacy-preserving framework tightly coupled with the structure of the spatio-temporal hypergraph, which includes a mobility data obfuscation module to protect privacy and an accompanying confidence-aware mechanism to mitigate the consequent performance decline. Moreover, we introduce a causal propagation mechanism to further guarantee the temporal dependency and causal effect of the feature propagation in our spatio-temporal hypergraph, which introduces both the causal transform of node features and the causal gathering of edge features. Finally, extensive experiments on a large mobility dataset collected from location-based services (LBS) show that the proposed model improves the performance of infection case detection by at least 12.47% when compared with several widely adopted baselines. Our code and datasets are available at https://github.com/wjfu99/EPI-HGNN.

TIST Journal 2025 Journal Article

Modeling N-ary Relational Knowledge Bases with Tensor Decomposition

  • Yu Liu
  • Quanming Yao
  • Yong Li

The binary relational knowledge base (KB, a.k.a. knowledge graph), representing real-world knowledge with binary relations and entities, has been an important research topic in artificial intelligence, while considerable knowledge also involves beyond-binary relations. Recently, the field has proposed modeling n-ary relational KBs with both binary and beyond-binary relations included. However, most current models are extended from translational distance and neural network models in binary relational KBs, which suffer from weak expressiveness and high complexity, respectively. To overcome such issues, in this work, we propose a novel two-step modeling framework, GETD, generalizing the powerful tensor decomposition technique from binary relational KBs to the n-ary case. For n-ary relational KBs with single-arity relations, the GETD framework introduces Tucker decomposition and Tensor Ring decomposition for expressive and efficient modeling. Furthermore, the framework is technically extended for the representation of n-ary relational KBs with mixed-arity relations. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that the GETD framework is fully expressive to completely represent any KBs. Empirical results on two representative datasets show that the proposed framework significantly outperforms the state-of-the-art methods, achieving 11–26% and 4–7% improvements on Hits@10 for the single-arity and the mixed-arity cases, respectively.

IJCAI Conference 2025 Conference Paper

OpenCarbon: A Contrastive Learning-based Cross-Modality Neural Approach for High-Resolution Carbon Emission Prediction Using Open Data

  • Jinwei Zeng
  • Yu Liu
  • Guozhen Zhang
  • Jingtao Ding
  • Yuming Lin
  • Jian Yuan
  • Yong Li

Accurately estimating high-resolution carbon emissions is crucial for effective emission governance and mitigation planning. While conventional methods for precise carbon accounting are hindered by substantial data collection efforts, the rise of open data and advanced learning techniques offers a promising solution. Once an open data-based prediction model is developed and trained, it can easily infer emissions for new areas based on available open data. To address this, we incorporate two modalities of open data, satellite images and point-of-interest (POI) data, to predict high-resolution urban carbon emissions, with satellite images providing macroscopic and static and POI data offering fine-grained and relatively dynamic functionality information. However, estimating high-resolution carbon emissions presents two significant challenges: the intertwined and implicit effects of various functionalities on carbon emissions, and the complex spatial contiguity correlations that give rise to the agglomeration effect. Our model, OpenCarbon, features two major designs that target the challenges: a cross-modality information extraction and fusion module to extract complementary functionality information from two modules and model their interactions, and a neighborhood-informed aggregation module to capture the spatial contiguity correlations. Extensive experiments demonstrate our model's superiority, with a significant performance gain of 26.6% on R2. Further generalizability tests and case studies also show OpenCarbon's capacity to capture the intrinsic relation between urban functionalities and carbon emissions, validating its potential to empower efficient carbon governance and targeted carbon mitigation planning. Codes and data are available: https://github.com/JinweiZzz/OpenCarbon.

NeurIPS Conference 2025 Conference Paper

PID-controlled Langevin Dynamics for Faster Sampling of Generative Models

  • Hongyi Chen
  • Jianhai Shu
  • Jingtao Ding
  • Yong Li
  • Xiao-Ping (Steven) Zhang

Langevin dynamics sampling suffers from extremely low generation speed, fundamentally limited by numerous fine-grained iterations to converge to the target distribution. We introduce PID-controlled Langevin Dynamics (PIDLD), a novel sampling acceleration algorithm that reinterprets the sampling process using control-theoretic principles. By treating energy gradients as feedback signals, PIDLD combines historical gradients (the integral term) and gradient trends (the derivative term) to efficiently traverse energy landscapes and adaptively stabilize, thereby significantly reducing the number of iterations required to produce high-quality samples. Our approach requires no additional training, datasets, or prior information, making it immediately integrable with any Langevin-based method. Extensive experiments across image generation and reasoning tasks demonstrate that PIDLD achieves higher quality with fewer steps, making Langevin-based generative models more practical for efficiency-critical applications. The implementation can be found at https://github.com/tsinghua-fib-lab/PIDLD.

AAAI Conference 2025 Conference Paper

Re-Attentional Controllable Video Diffusion Editing

  • Yuanzhi Wang
  • Yong Li
  • Mengyi Liu
  • Xiaoya Zhang
  • Xin Liu
  • Zhen Cui
  • Antoni B. Chan

Editing videos with textual guidance has garnered popularity due to its streamlined process, which requires users to edit only the text prompt corresponding to the source video. Recent studies have explored and exploited large-scale text-to-image diffusion models for text-guided video editing, resulting in remarkable video editing capabilities. However, they may still suffer from limitations such as mislocated objects and incorrect numbers of objects. Therefore, the controllability of video editing remains a formidable challenge. In this paper, we aim to address the above limitations by proposing a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. Specifically, to align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose a Re-Attentional Diffusion (RAD) to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, resulting in a spatially location-aligned and semantically high-fidelity manipulated video. In particular, to faithfully preserve the invariant region content with fewer border artifacts, we propose an Invariant Region-guided Joint Sampling (IRJS) strategy to mitigate the intrinsic sampling errors w.r.t. the invariant regions at each denoising timestep and constrain the generated content to be harmonized with the invariant region content. Experimental results verify that ReAtCo consistently improves the controllability of video diffusion editing and achieves superior video editing performance.

NeurIPS Conference 2025 Conference Paper

RoboScape: Physics-informed Embodied World Model

  • Yu Shang
  • Xin Zhang
  • Yinzhou Tang
  • Lei Jin
  • Chen Gao
  • Wei Wu
  • Yong Li

World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. Our code and demos are available at: https://github.com/tsinghua-fib-lab/RoboScape.

AAAI Conference 2025 Conference Paper

SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation

  • Shi-Feng Peng
  • Guolei Sun
  • Yong Li
  • Hongsong Wang
  • Guo-Sen Xie

The primary challenge of cross-domain few-shot segmentation (CD-FSS) is the domain disparity between the training and inference phases, which can exist in either the input data or the target classes. Previous models struggle to learn feature representations that generalize to various unknown domains from limited training domain samples. In contrast, the large-scale visual model SAM, pre-trained on tens of millions of images from various domains and classes, possesses excellent generalizability. In this work, we propose a SAM-aware graph prompt reasoning network (GPRN) that fully leverages SAM to guide CD-FSS feature representation learning and improve prediction accuracy. Specifically, we propose a SAM-aware prompt initialization module (SPI) to transform the masks generated by SAM into visual prompts enriched with high-level semantic information. Since SAM tends to divide an object into many sub-regions, this may lead to visual prompts representing the same semantic object having inconsistent or fragmented features. We further propose a graph prompt reasoning (GPR) module that constructs a graph among visual prompts to reason about their interrelationships and enable each visual prompt to aggregate information from similar prompts, thus achieving global semantic consistency. Subsequently, each visual prompt embeds its semantic information into the corresponding mask region to assist in feature representation learning. To refine the segmentation mask during testing, we also design a non-parameter adaptive point selection module (APS) to select representative point prompts from query predictions and feed them back to SAM to refine inaccurate segmentation results. Experiments on four standard CD-FSS datasets demonstrate that our method establishes new state-of-the-art results.

NeurIPS Conference 2025 Conference Paper

Satellites Reveal Mobility: A Commuting Origin-destination Flow Generator for Global Cities

  • Can Rong
  • Xin Zhang
  • Yanxin Xi
  • Hongjie Sui
  • Jingtao Ding
  • Yong Li

Commuting Origin-destination (OD) flows, capturing daily population mobility of citizens, are vital for sustainable development across cities around the world. However, it is challenging to obtain the data due to the high cost of travel surveys and privacy concerns. Surprisingly, we find that satellite imagery, publicly available across the globe, contains rich urban semantic signals to support high-quality OD flow generation, with over 98% expressiveness of traditional multisource hard-to-collect urban sociodemographic, economics, land use, and point of interest data. This inspires us to design a novel data generator, GlODGen (Global-scale Origin-Destination Flow Generator), which can generate OD flow data for any cities of interest around the world. Specifically, GlODGen first leverages Vision-Language Geo-Foundation Models to extract urban semantic signals related to human mobility from satellite imagery. These features are then combined with population data to form region-level representations, which are used to generate OD flows via graph diffusion models. Extensive experiments on 4 continents and 6 representative cities show that GlODGen has great generalizability across diverse urban environments on different continents and can generate OD flow data for global cities highly consistent with real-world mobility data. We implement GlODGen as an automated tool, seamlessly integrating data acquisition and curation, urban semantic feature extraction, and OD flow generation together. It has been released at https://github.com/tsinghua-fib-lab/generate-od-pubtools.

NeurIPS Conference 2025 Conference Paper

Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling

  • Hongtao Xu
  • Wenting Shen
  • Yuanxin Wei
  • Ang Wang
  • Guo Runfan
  • Tianxing Wang
  • Yong Li
  • Mingzhen Li

Long-context supervised fine-tuning (Long-SFT) plays a vital role in enhancing the performance of large language models (LLMs) on long-context tasks. To smoothly adapt LLMs to long-context scenarios, this process typically entails training on mixed datasets containing both long and short sequences. However, this heterogeneous sequence length distribution poses significant challenges for existing training systems, as they fail to simultaneously achieve high training efficiency for both long and short sequences, resulting in sub-optimal end-to-end system performance in Long-SFT. In this paper, we present a novel perspective on data scheduling to address the challenges posed by the heterogeneous data distributions in Long-SFT. We propose Skrull, a dynamic data scheduler specifically designed for efficient Long-SFT. Through dynamic data scheduling, Skrull balances the computation requirements of long and short sequences, improving overall training efficiency. Furthermore, we formulate the scheduling process as a joint optimization problem and thoroughly analyze the trade-offs involved. Based on this analysis, Skrull employs a lightweight scheduling algorithm to achieve near-zero cost online scheduling in Long-SFT. Finally, we implement Skrull upon DeepSpeed, a state-of-the-art distributed training system for LLMs. Experimental results demonstrate that Skrull outperforms DeepSpeed by 3.76x on average (up to 7.54x) in real-world Long-SFT scenarios.

IJCAI Conference 2025 Conference Paper

Solving MDPs with LTLf+ and PPLTL+ Temporal Objectives

  • Giuseppe De Giacomo
  • Yong Li
  • Sven Schewe
  • Christoph Weinhuber
  • Pian Yu

The temporal logics LTLf+ and PPLTL+ have recently been introduced to express objectives over infinite traces. These logics are appealing because they match the expressive power of LTL on infinite traces while enabling efficient DFA-based techniques, which have been crucial to the scalability of reactive synthesis and adversarial planning in LTLf and PPLTL over finite traces. In this paper, we demonstrate that these logics are also highly effective in the context of MDPs. Introducing a technique tailored for probabilistic systems, we leverage the benefits of efficient DFA-based methods and compositionality. This approach is simpler than its nonprobabilistic counterparts in reactive synthesis and adversarial planning, as it accommodates a controlled form of nondeterminism ("good for MDPs") in the automata when transitioning from finite to infinite traces. Notably, by exploiting compositionality, our solution is both implementation-friendly and well-suited for straightforward symbolic implementations.

NeurIPS Conference 2025 Conference Paper

Sparse Diffusion Autoencoder for Test-time Adapting Prediction of Complex Systems

  • Jingwen Cheng
  • Ruikun Li
  • Huandong Wang
  • Yong Li

Predicting the behavior of complex systems is critical in many scientific and engineering domains, and hinges on the model’s ability to capture their underlying dynamics. Existing methods encode the intrinsic dynamics of high-dimensional observations through latent representations and predict autoregressively. However, these latent representations lose the inherent spatial structure of spatiotemporal dynamics, leading to the predictor's inability to effectively model spatial interactions and neglect emerging dynamics during long-term prediction. In this work, we propose SparseDiff, introducing a test-time adaptation strategy to dynamically update the encoding scheme to accommodate emergent spatiotemporal structures during the long-term evolution of the system. Specifically, we first design a codebook-based sparse encoder, which coarsens the continuous spatial domain into a sparse graph topology. Then, we employ a graph neural ordinary differential equation to model the dynamics and guide a diffusion decoder for reconstruction. SparseDiff autoregressively predicts the spatiotemporal evolution and adjusts the sparse topological structure to adapt to emergent spatiotemporal patterns by adaptive re-encoding. Extensive evaluations on representative systems demonstrate that SparseDiff achieves an average prediction error reduction of 49.99% compared to baselines, requiring only 1% of the spatial resolution.

NeurIPS Conference 2025 Conference Paper

Synergistic Tensor and Pipeline Parallelism

  • Mengshi Qi
  • Jiaxuan Peng
  • Jie Zhang
  • Juan Zhu
  • Yong Li
  • Huadong Ma

In machine learning systems, hybrid model parallelism combining tensor parallelism (TP) and pipeline parallelism (PP) has become the dominant solution for distributed training of Large Language Models (LLMs) and Multimodal LLMs (MLLMs). However, TP introduces significant collective communication overheads, while PP suffers from synchronization inefficiencies such as pipeline bubbles. Existing works primarily address these challenges from isolated perspectives, focusing either on overlapping TP communication or on flexible PP scheduling to mitigate pipeline bubbles. In this paper, we propose a new synergistic tensor and pipeline parallelism schedule that simultaneously reduces both types of bubbles. Our proposed schedule decouples the forward and backward passes in PP into fine-grained computation units, which are then braided to form a composite computation sequence. This compositional structure enables near-complete elimination of TP-related bubbles. Building upon this structure, we further design the PP schedule to minimize PP bubbles. Experimental results demonstrate that our approach improves training throughput by up to 12% for LLMs and 16% for MLLMs compared to existing scheduling methods. Our source code is available at https://github.com/MICLAB-BUPT/STP.

IROS Conference 2025 Conference Paper

TOPP-DWR: Time-Optimal Path Parameterization of Differential-Driven Wheeled Robots Considering Piecewise-Constant Angular Velocity Constraints

  • Yong Li
  • Yujun Huang
  • Yi Chen
  • Hui Cheng

Differential-driven wheeled robots (DWR) represent the quintessential type of mobile robots and find extensive applications across the robotic field. Most high-performance control approaches for DWR explicitly utilize the linear and angular velocities of the trajectory as control references. However, existing research on time-optimal path parameterization (TOPP) for mobile robots usually neglects the angular velocity and joint velocity constraints, which can result in degraded control performance in practical applications. In this article, a systematic and practical TOPP algorithm named TOPP-DWR is proposed for DWR and other mobile robots. First, the non-uniform B-spline is adopted to represent the initial trajectory in the task space. Second, the piecewise-constant angular velocity, as well as joint velocity, linear velocity, and linear acceleration constraints, are incorporated into the TOPP problem. During the construction of the optimization problem, the aforementioned constraints are uniformly represented as linear velocity constraints. To boost the numerical computational efficiency, we introduce a slack variable to reformulate the problem into second-order-cone programming (SOCP). Subsequently, comparative experiments are conducted to validate the superiority of the proposed method. Quantitative performance indexes show that TOPP-DWR achieves TOPP while adhering to all constraints. Finally, field autonomous navigation experiments are carried out to validate the practicability of TOPP-DWR in real-world applications.

NeurIPS Conference 2025 Conference Paper

TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration

  • Yuwei Du
  • Jie Feng
  • Jie Zhao
  • Yong Li

Trajectory modeling, which includes research on trajectory data pattern mining and future prediction, has widespread applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modeling. However, the heterogeneity of data and the diversity of trajectory tasks make effective and reliable trajectory modeling an important yet highly challenging endeavor, even for domain experts. In this paper, we propose TrajAgent, an agent framework powered by large language models (LLMs), designed to facilitate robust and efficient trajectory modeling through automation modeling. This framework leverages and optimizes diverse specialized models to address various trajectory modeling tasks across different datasets effectively. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface, to support the execution and training of various models. Building on UniEnv, we introduce an agentic workflow designed for automatic trajectory modeling across various trajectory tasks and data. Furthermore, we introduce a collaborative learning schema between LLM-based agents and small specialized models, to enhance the performance of the whole framework effectively. Extensive experiments on four tasks using four real-world datasets demonstrate the effectiveness of TrajAgent in automated trajectory modeling, achieving a performance improvement of 2.38%-69.91% over baseline methods. The codes and data can be accessed via https://github.com/tsinghua-fib-lab/TrajAgent.

AAAI Conference 2025 Conference Paper

Treasures in Discarded Weights for LLM Quantization

  • Hao Yu
  • Yang Zhou
  • Bohua Chen
  • Zelan Yang
  • Shen Li
  • Yong Li
  • Jianxin Wu

In recent years, large language models (LLMs) have developed rapidly and revolutionized natural language processing. However, high storage overhead and computing costs limit LLM deployment in resource-constrained environments. Quantization algorithms can effectively compress LLMs and accelerate inference, but they lead to loss in precision, especially in low-bit scenarios. In this paper, we find that the discarded weight values caused by quantization in fact contain treasures to improve LLMs' accuracy. To excavate those hidden treasures, we construct search spaces around these discarded weights and those weights within the search space can seamlessly be incorporated into the original quantization weights. To determine which weights should be merged, we design a plug-and-play weight compensation framework to capture global information and keep the weights with the highest potential benefits. Our framework can be combined with various LLM quantization algorithms to achieve higher precision without additional inference overhead. We validate the effectiveness of our approach on widely used benchmark datasets for LLMs.

IJCAI Conference 2025 Conference Paper

Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

  • Zhi Sheng
  • Daisy Yuan
  • Jingtao Ding
  • Qi Yan
  • Xi Zheng
  • Yue Sun
  • Yong Li

Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due to their ability to capture the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior derived from data dynamics, enhancing the model's ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement over 30%, offering a new perspective on leveraging diffusion models in this domain. We provide code and data at https://github.com/tsinghua-fib-lab/NPDiff.

IJCAI Conference 2024 Conference Paper

Angluin-Style Learning of Deterministic Büchi and Co-Büchi Automata

  • Yong Li
  • Sven Schewe
  • Qiyi Tang

While recently developed Angluin-style learning algorithms for omega-automata have much in common with her classic DFA learning algorithm, there is a huge difference in the cost of the equivalence queries about the target automata. For omega-regular languages, the target is to learn nondeterministic Büchi automata (NBAs) through the vehicle of Families of DFAs (FDFAs). While the cost of equivalence queries is usually idealised as constant in learning, it makes a practical difference that the language equivalence checking about the learned NBAs is computationally hard. We develop efficient techniques for the cases where we learn deterministic Büchi automata (DBAs) or deterministic co-Büchi automata (DCAs). This is based on the observation that some classes of FDFAs can be used to learn DBAs for DBA recognisable languages, rather than having to resort to nondeterministic ones. We believe that the restriction to DBAs and DCAs in equivalence queries also makes our algorithm more appealing to realistic applications, as the operations are cheap (NL) for DBAs and DCAs.

YNIMG Journal 2024 Journal Article

Arterial pulsation dependence of perivascular cerebrospinal fluid flow measured by dynamic diffusion tensor imaging in the human brain

  • Guangxu Han
  • Bingjie Jiao
  • Yifan Zhang
  • Zejun Wang
  • Chunjing Liang
  • Yong Li
  • Yi-Cheng Hsu
  • Ruiliang Bai

Perivascular cerebrospinal fluid (pCSF) flow is a key component of the glymphatic system. Arterial pulsation has been proposed as the main driving force of pCSF influx along the superficial and penetrating arteries; however, evidence of this mechanism in humans is limited. We proposed an experimental framework of dynamic diffusion tensor imaging with low b-values and ultra-long echo time (dynDTI low-b) to capture pCSF flow properties during the cardiac cycle in human brains. Healthy adult volunteers (aged 17–28 years; seven men, one woman) underwent dynDTI low-b using a 3T scanner (MAGNETOM Prisma, Siemens Healthcare, Erlangen, Germany) with simultaneously recorded cardiac output. The results showed that diffusion tensors reconstructed from pCSF were mainly oriented in the direction of the neighboring arterial flow. When switching from vasoconstriction to vasodilation, the axial and radial diffusivities of the pCSF increased by 5.7% and 4.94%, respectively, suggesting that arterial pulsation alters the pCSF flow both parallel and perpendicular to the arterial wall. DynDTI low-b signal intensity at b=0 s/mm² (i.e., T2-weighted, [S(b=0 s/mm²)]) decreased in systole, but this change was ∼7.5% of a cardiac cycle slower than the changes in apparent diffusivity, suggesting that changes in S(b=0 s/mm²) and apparent diffusivity arise from distinct physiological processes and potential biomarkers associated with perivascular space volume and pCSF flow, respectively. Additionally, the mean diffusivities of white matter showed cardiac-cycle dependencies similar to pCSF, although a delay relative to the peak time of apparent diffusivity in pCSF was present, suggesting that dynDTI low-b could potentially reveal the dynamics of magnetic resonance imaging-invisible pCSF surrounding small arteries and arterioles in white matter; this delay may result from pulse wave propagation along penetrating arteries.
In conclusion, the vasodilation-induced increases in axial and radial diffusivities of pCSF and mean diffusivities of white matter are consistent with the notion that arterial pulsation can accelerate pCSF flow in human brain. Furthermore, the proposed dynDTI low-b technique can capture various pCSF dynamics in artery pulsation.

TIST Journal 2024 Journal Article

Demand-driven Urban Facility Visit Prediction

  • Yunke Zhang
  • Tong Li
  • Yuan Yuan
  • Fengli Xu
  • Fan Yang
  • Funing Sun
  • Yong Li

Predicting citizens’ visiting behaviors to urban facilities is instrumental for city governors and planners to detect inequalities in urban opportunities and optimize the distribution of facilities and resources. Previous works predict facility visits simply using observed visit behavior, yet citizens’ intrinsic demands for facilities are not characterized explicitly, causing potential incorrect learned relations in the prediction results. In this article, to make up for this deficiency, we present a demand-driven urban facility visit prediction method that decomposes citizens’ visits to facilities into their unobservable demands and their capability to fulfill them. Demands are expressed as the function of regional demographic attributes by a neural network, and the fulfillment capability is determined by the urban region’s spatial accessibility to facilities. Extensive evaluations of datasets of three large cities confirm the efficiency and rationality of our model. Our method outperforms the best state-of-the-art model by 8.28% on average in facility visit prediction tasks. Further analyses demonstrate the reasonableness of recovered facility demands and their relationship with citizen demographics. For instance, senior citizens tend to have higher medical demands but lower shopping demands. Meanwhile, estimated capabilities and accessibilities provide deeper insights into the decaying accessibility with respect to spatial distance and facilities’ diverse functions in the urban environment. Our findings shed light on demand-driven urban data mining and demand-based urban facility planning.

TIST Journal 2024 Journal Article

Empowering Predictive Modeling by GAN-based Causal Information Learning

  • Jinwei Zeng
  • Guozhen Zhang
  • Jian Yuan
  • Yong Li
  • Depeng Jin

Generally speaking, we can easily specify many causal relationships in the prediction tasks of ubiquitous computing, such as human activity prediction, mobility prediction, and health prediction. However, most of the existing methods in these fields failed to take advantage of this prior causal knowledge. They typically make predictions only based on correlations in the data, which hinders the prediction performance in real-world scenarios, because a distribution shift between training data and testing data generally exists. To fill in this gap, we proposed a Generative Adversarial Network (GAN)-based Causal Information Learning prediction framework, which can effectively leverage causal information to improve the prediction performance of existing ubiquitous computing deep learning models. Specifically, faced with a unique challenge that the treatment variable, referring to the intervention that influences the target in a causal relationship, is generally continuous in ubiquitous computing, the framework employs a representation learning approach with a GAN-based deep learning model. By projecting all variables except the treatment into a latent space, it effectively minimizes confounding bias and leverages the learned latent representation for accurate predictions. In this way, it deals with the continuous treatment challenge, and in the meantime, it can be easily integrated with existing deep learning models to lift their prediction performance in practical scenarios with causal information. Extensive experiments on two large-scale real-world datasets demonstrate its superior performance over multiple state-of-the-art baselines. We also propose an analytical framework together with extensive experiments to empirically show that our framework achieves better performance gain under two conditions: when the distribution differences between the training data and the testing data are more significant and when the treatment effects are larger. 
Overall, this work suggests that learning causal information is a promising way to improve the prediction performance of ubiquitous computing tasks. We release both our dataset and code and call for more research attention in this area.

AAAI Conference 2024 Conference Paper

Estimating On-Road Transportation Carbon Emissions from Open Data of Road Network and Origin-Destination Flow Data

  • Jinwei Zeng
  • Yu Liu
  • Jingtao Ding
  • Jian Yuan
  • Yong Li

Accounting for over 20% of the total carbon emissions, the precise estimation of on-road transportation carbon emissions is crucial for carbon emission monitoring and efficient mitigation policy formulation. However, existing estimation methods typically depend on hard-to-collect individual statistics of vehicle miles traveled to calculate emissions, thereby suffering from high data collection difficulty. To relieve this issue by utilizing the strong pattern recognition of artificial intelligence, we incorporate two sources of open data representative of the transportation demand and capacity factors, the origin-destination (OD) flow data and the road network data, to build a hierarchical heterogeneous graph learning method for on-road carbon emission estimation (HENCE). Specifically, a hierarchical graph consisting of the road network level, community level, and region level is constructed to model the multi-scale road network-based connectivity and travel connection between spatial areas. Heterogeneous graphs consisting of OD links and spatial links are further built at both the community level and region level to capture the intrinsic interactions between travel demand and road network accessibility. Extensive experiments on two large-scale real-world datasets demonstrate HENCE's effectiveness and superiority with R-squared exceeding 0.75 and outperforming baselines by 9.60% on average, validating its success in pioneering the use of artificial intelligence to empower carbon emission management and sustainability development. The implementation codes are available at this link: https://github.com/tsinghua-fib-lab/HENCE.

TIST Journal 2024 Journal Article

Fine-grained Courier Delivery Behavior Recovery with a Digital Twin Based Iterative Calibration Framework

  • Fudan Yu
  • Guozhen Zhang
  • Haotian Wang
  • Depeng Jin
  • Yong Li

Recovering the fine-grained working process of couriers is becoming one of the essential problems for improving the express delivery systems because knowing the detailed process of how couriers accomplish their daily work facilitates the analyzing, understanding, and optimizing of the working procedure. Although coarse-grained courier trajectories and waybill delivery time data can be collected, this problem is still challenging due to noisy data with spatio-temporal biases, lacking ground truth of couriers’ fine-grained behaviors, and complex correlations between behaviors. Existing works typically focus on a single dimension of the process such as inferring the delivery time and can only yield results of low spatio-temporal resolution, which cannot address the problem well. To bridge the gap, we propose a digital-twin-based iterative calibration system (DTRec) for fine-grained courier working process recovery. We first propose a spatio-temporal bias correction algorithm, which systematically improves existing methods in correcting waybill addresses and trajectory stay points. Second, to model the complex correlations among behaviors and inherent physical constraints, we propose an agent-based model to build the digital twin of couriers. Third, to further improve recovery performance, we design a digital-twin-based iterative calibration framework, which leverages the inconsistency between the deduction results of the digital twin and the recovery results from real-world data to improve both the agent-based model and the recovery results. Experiments show that DTRec outperforms state-of-the-art baselines by 10.8% in terms of fine-grained accuracy on real-world datasets. The system is deployed in the industrial practices in JD Logistics with promising applications. The code is available at https://github.com/tsinghua-fib-lab/Courier-DTRec.

IJCAI Conference 2024 Conference Paper

From Pixels to Progress: Generating Road Network from Satellite Imagery for Socioeconomic Insights in Impoverished Areas

  • Yanxin Xi
  • Yu Liu
  • Zhicheng Liu
  • Sasu Tarkoma
  • Pan Hui
  • Yong Li

The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas. Those areas rely on road infrastructure construction to promote accessibility and economic development. Although public data such as OpenStreetMap is available to monitor road status, data completeness in impoverished areas is limited. Meanwhile, the development of deep learning techniques and satellite imagery shows excellent potential for earth monitoring. To tackle the challenge of road network assessment in impoverished areas, we develop a systematic road extraction framework combining an encoder-decoder architecture and morphological operations on satellite imagery, offering an integrated workflow for interdisciplinary researchers. Extensive experiments of road network extraction on real-world data in impoverished regions achieve a 42.7% enhancement in the F1-score over the baseline methods and reconstruct about 80% of the actual roads. We also propose a comprehensive road network dataset covering approximately 794,178 km² area and 17.048 million people in 382 impoverished counties in China. The generated dataset is further utilized to conduct socioeconomic analysis in impoverished counties, showing that road network construction positively impacts regional economic development. The technical appendix, code, and generated dataset can be found at https://github.com/tsinghua-fib-lab/Road_network_extraction_impoverished_counties.

AAAI Conference 2024 Conference Paper

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

  • Haoran Ye
  • Jiarui Wang
  • Helan Liang
  • Zhiguang Cao
  • Yong Li
  • Fanzhang Li

The recent end-to-end neural solvers have shown promise for small-scale routing problems but suffered from limited real-time scaling-up performance. This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. GLOP hierarchically partitions large routing problems into Travelling Salesman Problems (TSPs) and TSPs into Shortest Hamiltonian Path Problems. For the first time, we hybridize non-autoregressive neural heuristics for coarse-grained problem partitions and autoregressive neural heuristics for fine-grained route constructions, leveraging the scalability of the former and the meticulousness of the latter. Experimental results show that GLOP achieves competitive and state-of-the-art real-time performance on large-scale routing problems, including TSP, ATSP, CVRP, and PCTSP. Our code is available at: https://github.com/henry-yeh/GLOP.

NeurIPS Conference 2024 Conference Paper

HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction

  • Qianyue Hao
  • Jingyang Fan
  • Fengli Xu
  • Jian Yuan
  • Yong Li

Citation networks are critical infrastructures of modern science, serving as intricate webs of past literature and enabling researchers to navigate the knowledge production system. To mine information hiding in the link space of such networks, predicting which previous papers (candidates) a new paper (query) will cite is a critical problem that has long been studied. However, an important gap remains unaddressed: the roles of a paper's citations vary significantly, ranging from foundational knowledge basis to superficial contexts. Distinguishing these roles requires a deeper understanding of the logical relationships among papers, beyond simple edges in citation networks. The emergence of large language models (LLMs) with textual reasoning capabilities offers new possibilities for discerning these relationships, but there are two major challenges. First, in practice, a new paper may select its citations from a gigantic set of existing papers, where the combined texts far exceed the context length of LLMs. Second, logical relationships between papers are often implicit, and directly prompting an LLM to predict citations may lead to results based primarily on surface-level textual similarities, rather than the deeper logical reasoning required. In this paper, we introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions. Thereby, we elevate the citation prediction task from a simple binary classification to a more nuanced problem: distinguishing core citations from both superficial citations and non-citations. To address this, we propose HLM-Cite, a Hybrid Language Model workflow for citation prediction, which combines embedding and generative LMs.
We design a curriculum finetune procedure to adapt a pretrained text embedding model to coarsely retrieve high-likelihood core citations from vast candidate sets and then design an LLM agentic workflow to rank the retrieved papers through one-shot reasoning, revealing the implicit relationships among papers. With the two-stage pipeline, we can scale the candidate sets to 100K papers, vastly exceeding the size handled by existing methods. We evaluate HLM-Cite on a dataset across 19 scientific fields, demonstrating a 17.6% performance improvement over SOTA methods. Our code is open-source at https://github.com/tsinghua-fib-lab/H-LM for reproducibility.

TIST Journal 2024 Journal Article

KGDA: A Knowledge Graph Driven Decomposition Approach for Cellular Traffic Prediction

  • Jiahui Gong
  • Tong Li
  • Huandong Wang
  • Yu Liu
  • Xing Wang
  • Zhendong Wang
  • Chao Deng
  • Junlan Feng

Understanding and accurately predicting cellular traffic data is vital for communication operators and device users, as it facilitates efficient resource allocation and ensures superior service quality. However, large-scale cellular traffic data forecasting remains challenging due to intricate temporal variations and complex spatial relationships. This article proposes a Knowledge Graph Driven Decomposition Approach (KGDA) for precise cellular traffic prediction. The KGDA breaks down the impact of static environmental factors and dynamic autocorrelations of cellular traffic time series, enabling the capture of overall traffic changes and understanding of traffic dependence on past values. Specifically, we propose an urban knowledge graph to capture the static environmental context of base stations, mapping these entities into the same latent space while retaining static environmental knowledge. The cellular traffic is divided into a regular pattern and fluctuating residual components, with the KGDA comprising four modules: a Knowledge Graph Representation Learning model, a traffic regular pattern prediction module, a traffic residual dynamic prediction module, and an attentional fusion module. The first leverages graph neural networks to extract spatial contexts and predict regular patterns, the second utilizes the Bi-directional Long Short-Term Memory (Bi-LSTM) model to capture autocorrelations of traffic time series, and the final module integrates the patterns and residuals to produce the final prediction result. Comprehensive experiments demonstrate that our proposed model outperforms state-of-the-art models by more than 10% in forecasting cellular traffic.

NeurIPS Conference 2024 Conference Paper

Long-tailed Object Detection Pretraining: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction

  • Chen-Long Duan
  • Yong Li
  • Xiu-Shen Wei
  • Lin Zhao

Pre-training plays a vital role in various vision tasks, such as object recognition and detection. Commonly used pre-training methods, which typically rely on randomized approaches like uniform or Gaussian distributions to initialize model parameters, often fall short when confronted with long-tailed distributions, especially in detection tasks. This is largely due to extreme data imbalance and the issue of simplicity bias. In this paper, we introduce a novel pre-training framework for object detection, called Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a Holistic-Local Contrastive Learning mechanism, which aligns pre-training with object detection by capturing both global contextual semantics and detailed local patterns. To tackle the imbalance inherent in long-tailed data, we design a dynamic rebalancing strategy that adjusts the sampling of underrepresented instances throughout the pre-training process, ensuring better representation of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by enforcing a reconstruction task aligned with the self-consistency principle, specifically benefiting underrepresented tail classes. Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes.

IJCAI Conference 2024 Conference Paper

Long-term Detection and Monitory of Chinese Urban Village Using Satellite Imagery

  • Yuming Lin
  • Xin Zhang
  • Yu Liu
  • Zhenyu Han
  • Qingmin Liao
  • Yong Li

Urban villages are areas filled with rural-like improvised structures in Chinese cities, usually housing the most vulnerable groups. Under the guidance of the Sustainable Development Goals (SDGs), the Chinese government initiated renewal and redevelopment projects, underscoring the meticulous mapping and segmentation of urban villages. Satellite imagery is advanced and efficient in identifying urban villages and monitoring changes, but traditional methods neglect the morphological diversity in season, shape, size, spacing, and layout of urban villages, which makes them unsatisfactory for long-term, wide-range data. Here, we design a targeted approach based on Tobler’s First Law of Geography, using curriculum labeling to handle morphological diversity and semi-automatically generate segmentation for urban village boundaries. Specifically, we use manually labeled data as seeds for pre-trained SegFormer models and incrementally fine-tune the model based on geographical proximity. Experiments across five diverse cities confirm the effectiveness of our method: the IoU metric improves by over 119% relative to the baseline. Our final results cover 265,050 urban villages across 433 cities in China over the past 10 years, and the analysis reveals the uneven redevelopment by geography and city scale. We further examine the within-city distribution and verify the urban scaling law associated with several socio-economic factors. Our method can be used nationwide to decide redevelopment priority and resource tilt, contributing to SDG 11.1 on affordable housing and upgrading slums. The code and dataset are available at https://github.com/tsinghua-fib-lab/LtCUV.
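For readers unfamiliar with the metric quoted above, the following is a minimal sketch of Intersection over Union (IoU) on binary segmentation masks, together with a relative-improvement helper. It is not taken from the paper's code; the function names `iou` and `relative_improvement` and the nested-list mask format are assumptions made for illustration only.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def relative_improvement(baseline, new):
    """Percentage gain of `new` over `baseline` (baseline must be non-zero)."""
    return (new - baseline) / baseline * 100.0
```

Under this definition, an improvement of over 119% means the new IoU is more than 2.19 times the baseline IoU, not that IoU rose by 119 percentage points.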

ICML Conference 2024 Conference Paper

Masked Face Recognition with Generative-to-Discriminative Representations

  • Shiming Ge
  • Weijia Guo
  • Chenyu Li 0001
  • Junzheng Zhang
  • Yong Li
  • Dan Zeng 0001

Masked face recognition is important for social good but challenged by diverse occlusions that cause insufficient or inaccurate representations. In this work, we propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition. To this end, we split the network into three modules and learn them on synthetic masked faces in a greedy module-wise pretraining manner. First, we leverage a generative encoder pretrained for face inpainting and finetune it to represent masked faces as category-aware descriptors. Owing to the generative encoder’s ability to recover context information, the resulting descriptors can provide occlusion-robust representations for masked faces, mitigating the effect of diverse masks. Then, we incorporate a multi-layer convolutional network as a discriminative reformer and train it to convert the category-aware descriptors into identity-aware vectors, where the learning is effectively supervised by distilling relation knowledge from an off-the-shelf face recognition model. In this way, the discriminative reformer together with the generative encoder serves as the pretrained backbone, providing general and discriminative representations for masked faces. Finally, we cascade one fully-connected layer followed by one softmax layer into a feature classifier and finetune it to identify the reformed identity-aware vectors. Extensive experiments on synthetic and realistic datasets demonstrate the effectiveness of our approach in recognizing masked faces.

NeurIPS Conference 2024 Conference Paper

Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

  • Wenjie Fu
  • Huandong Wang
  • Chen Gao
  • Guanghua Liu
  • Yong Li
  • Tao Jiang

Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Existing MIAs designed for large language models (LLMs) can be bifurcated into two types: reference-free and reference-based attacks. Although reference-based attacks appear to achieve promising performance by calibrating the probability measured on the target model with reference models, this illusion of privacy risk heavily depends on a reference dataset that closely resembles the training set. Both types of attacks are predicated on the hypothesis that training records consistently maintain a higher probability of being sampled. However, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. Thus, these reasons lead to high false-positive rates of MIAs in practical scenarios. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs. Furthermore, we introduce probabilistic variation, a more reliable membership signal based on LLM memorization rather than overfitting, from which we rediscover the neighbour attack with theoretical grounding. Comprehensive evaluation conducted on three datasets and four exemplary LLMs shows that SPV-MIA raises the AUC of MIAs from 0.7 to a significantly high level of 0.9. Our code and dataset are available at: https://github.com/tsinghua-fib-lab/NeurIPS2024_SPV-MIA

TIST Journal 2024 Journal Article

Mitigating Recommendation Biases via Group-Alignment and Global-Uniformity in Representation Learning

  • Miaomiao Cai
  • Min Hou
  • Lei Chen
  • Le Wu
  • Haoyue Bai
  • Yong Li
  • Meng Wang

Collaborative Filtering (CF) plays a crucial role in modern recommender systems, leveraging historical user-item interactions to provide personalized suggestions. However, CF-based methods often encounter biases due to imbalances in training data. This phenomenon makes CF-based methods tend to prioritize recommending popular items and performing unsatisfactorily on inactive users. Existing works address this issue by rebalancing training samples, reranking recommendation results, or making the modeling process robust to the bias. Despite their effectiveness, these approaches can compromise accuracy or be sensitive to weighting strategies, making them challenging to train. Therefore, exploring how to mitigate these biases remains in urgent demand. In this article, we deeply analyze the causes and effects of the biases and propose a framework to alleviate biases in recommendation from the perspective of representation distribution, namely Group-Alignment and Global-Uniformity Enhanced Representation Learning for Debiasing Recommendation (AURL). Specifically, we identify two significant problems in the representation distribution of users and items, namely group-discrepancy and global-collapse. These two problems directly lead to biases in the recommendation results. To this end, we propose two simple but effective regularizers in the representation space, respectively named group-alignment and global-uniformity. The goal of group-alignment is to bring the representation distribution of long-tail entities closer to that of popular entities, while global-uniformity aims to preserve the information of entities as much as possible by evenly distributing representations. Our method directly optimizes both the group-alignment and global-uniformity regularization terms to mitigate recommendation biases. Please note that AURL applies to arbitrary CF-based recommendation backbones. 
Extensive experiments on three real datasets and various recommendation backbones verify the superiority of our proposed framework. The results show that AURL not only outperforms existing debiasing models in mitigating biases but also improves recommendation performance to some extent.
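The abstract above names two regularizers but does not give their formulas. The following is a minimal sketch of what such terms can look like, under stated assumptions: group-alignment is taken here as the squared distance between the centroids of the popular and long-tail groups, and global-uniformity as the log of the mean pairwise Gaussian potential that is commonly used in contrastive representation learning. Both forms, and all function names, are illustrative assumptions and may differ from AURL's actual definitions.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_alignment(popular, long_tail):
    """Assumed form: squared distance between the two group centroids.

    Smaller values mean the long-tail representations sit closer to the
    popular ones on average.
    """
    return sq_dist(mean_vector(popular), mean_vector(long_tail))


def global_uniformity(vectors, t=2.0):
    """Assumed form: log of the mean Gaussian potential over distinct pairs.

    Lower (more negative) values mean the representations are spread out;
    a fully collapsed set of identical vectors gives 0.
    """
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(math.exp(-t * sq_dist(vectors[i], vectors[j])) for i, j in pairs)
    return math.log(total / len(pairs))
```

In a training loop, terms like these would typically be added to the recommendation loss with tunable weights; the weighting scheme is not specified in the abstract.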

AAAI Conference 2024 Conference Paper

Patch-Aware Sample Selection for Efficient Masked Image Modeling

  • Zhengyang Zhuge
  • Jiaxing Wang
  • Yong Li
  • Yongjun Bao
  • Peisong Wang
  • Jian Cheng

Nowadays sample selection is drawing increasing attention. By extracting and training only on the most informative subset, sample selection can effectively reduce the training cost. Although sample selection is effective in conventional supervised learning, applying it to Masked Image Modeling (MIM) still poses challenges due to the gap between sample-level selection and patch-level pre-training. In this paper, we inspect the sample selection in MIM pre-training and find the basic selection suffers from performance degradation. We attribute this degradation primarily to 2 factors: the random mask strategy and the simple averaging function. We then propose Patch-Aware Sample Selection (PASS), including a low-cost Dynamic Trained Mask Predictor (DTMP) and Weighted Selection Score (WSS). DTMP consistently masks the informative patches in samples, ensuring a relatively accurate representation of selection score. WSS enhances the selection score using patch-level disparity. Extensive experiments show the effectiveness of PASS in selecting the most informative subset and accelerating pretraining. PASS exhibits superior performance across various datasets, MIM methods, and downstream tasks. Particularly, PASS improves MAE by 0.7% on ImageNet-1K while utilizing only 37% data budget and achieves ~1.7x speedup.

TIST Journal 2024 Journal Article

RCCNet: A Spatial-Temporal Neural Network Model for Logistics Delivery Timely Rate Prediction

  • Jinhui Yi
  • Huan Yan
  • Haotian Wang
  • Jian Yuan
  • Yong Li

In logistics service, the delivery timely rate is a key experience indicator, which is highly essential to the competitive advantage of express companies. Prediction on it enables intervention on couriers with low predicted results in advance, thus ensuring employee productivity and customer satisfaction. Currently, few related works focus on couriers’ level delivery timely rate prediction, and there are complex spatial correlations between couriers and road districts in the express scenario, which makes traditional real-time prediction approaches hard to utilize. To deal with this, we propose a deep spatial-temporal neural network, RCCNet to model spatial-temporal correlations. Specifically, we adopt Node2vec, which can encode the road network-based graph directly to capture spatial correlations between road districts. Further, we calculate couriers’ historical time-series similarity to build a graph and employ graph convolutional networks to capture the correlation between couriers. We also leverage historical sequential information with long short-term memory networks. We conduct experiments with real-world express datasets. Compared with other competitive baseline methods widely used in industry, the experiment results demonstrate its superior performance over multiple baselines.

TIST Journal 2024 Journal Article

Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

  • Zefang Zong
  • Xia Tong
  • Meng Zheng
  • Yong Li

Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, and there are three major challenges: (a) directly optimizing the goal with several practical constraints; (b) efficiently handling individual time-window limits; and (c) modeling the cooperation among the vehicle fleet. In this article, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time-window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to 11.7% compared to other RL baselines and can generate solutions for instances within seconds while maintaining solution quality, whereas existing heuristic baselines take minutes. Moreover, our solution is thoroughly analyzed with meaningful implications due to the real-time response ability.
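As an illustration of the idea of a time penalty augmented reward, here is a minimal sketch for a single vehicle route. It assumes soft time windows (late arrival is penalized rather than forbidden), waiting when a vehicle arrives early, and a scalar `penalty_weight`; the function name, the data layout, and these modeling choices are assumptions for illustration and are not taken from the article.

```python
def route_reward(route, travel_time, windows, penalty_weight=1.0):
    """Negative cost of one route: travel time plus weighted lateness.

    route:        list of node ids, starting at the depot
    travel_time:  dict mapping (from_node, to_node) -> travel time
    windows:      dict mapping node -> (earliest, latest) arrival time
    All names and the soft-window treatment are illustrative assumptions.
    """
    clock = 0.0
    total_travel = 0.0
    lateness = 0.0
    for prev, node in zip(route, route[1:]):
        step = travel_time[(prev, node)]
        total_travel += step
        clock += step
        earliest, latest = windows[node]
        if clock < earliest:
            clock = earliest  # arrived early: wait until the window opens
        lateness += max(0.0, clock - latest)  # time past the window's end
    return -(total_travel + penalty_weight * lateness)
```

For example, with `route = [0, 1, 2]`, both legs taking 5 time units, and node 2 due by time 8, the vehicle reaches node 2 at time 10, so the reward is -(10 + 2) = -12 when `penalty_weight` is 1.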

AAAI Conference 2024 Conference Paper

Social Physics Informed Diffusion Model for Crowd Simulation

  • Hongyi Chen
  • Jingtao Ding
  • Yong Li
  • Yue Wang
  • Xiao-Ping Zhang

Crowd simulation holds crucial applications in various domains, such as urban planning, architectural design, and traffic arrangement. In recent years, physics-informed machine learning methods have achieved state-of-the-art performance in crowd simulation but fail to model the heterogeneity and multi-modality of human movement comprehensively. In this paper, we propose a social physics-informed diffusion model named SPDiff to mitigate the above gap. SPDiff takes both the interactive and historical information of crowds in the current timeframe to reverse the diffusion process, thereby generating the distribution of pedestrian movement in the subsequent timeframe. Inspired by the well-known social physics model, i.e., Social Force, regarding crowd dynamics, we design a crowd interaction encoder to guide the denoising process and further enhance this module with the equivariant properties of crowd interactions. To mitigate error accumulation in long-term simulations, we propose a multi-frame rollout training algorithm for diffusion modeling. Experiments conducted on two real-world datasets demonstrate the superior performance of SPDiff in terms of both macroscopic and microscopic evaluation metrics. Code and appendix are available at https://github.com/tsinghua-fib-lab/SPDiff.

AAAI Conference 2024 Conference Paper

UV-SAM: Adapting Segment Anything Model for Urban Village Identification

  • Xin Zhang
  • Yu Liu
  • Yuming Lin
  • Qingmin Liao
  • Yong Li

Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities. Traditionally, governments heavily depend on field survey methods to monitor the urban villages, which, however, are time-consuming, labor-intensive, and possibly delayed. Thanks to widely available and timely updated satellite images, recent studies develop computer vision techniques to detect urban villages efficiently. However, existing studies either focus on simple urban village image classification or fail to provide accurate boundary information. To accurately identify urban village boundaries from satellite images, we harness the power of the vision foundation model and adapt the Segment Anything Model (SAM) to urban village segmentation, named UV-SAM. Specifically, UV-SAM first leverages a small-sized semantic segmentation model to produce mixed prompts for urban villages, including mask, bounding box, and image representations, which are then fed into SAM for fine-grained boundary identification. Extensive experimental results on two datasets in China demonstrate that UV-SAM outperforms existing baselines, and identification results over multiple years show that both the number and area of urban villages are decreasing over time, providing deeper insights into the development trends of urban villages and shedding light on the vision foundation models for sustainable cities. The dataset and codes of this study are available at https://github.com/tsinghua-fib-lab/UV-SAM.

TIST Journal 2024 Journal Article

VesNet: A Vessel Network for Jointly Learning Route Pattern and Future Trajectory

  • Fenyu Jiang
  • Huandong Wang
  • Yong Li

Vessel trajectory prediction is the key to maritime applications such as traffic surveillance, collision avoidance, anomaly detection, and so on. Making predictions more precisely requires a better understanding of the moving trend for a particular vessel since the movement is affected by multiple factors like marine environment, vessel type, and vessel behavior. In this paper, we propose a model named VesNet, based on the attentional seq2seq framework, to predict vessel future movement sequence by observing the current trajectory. Firstly, we extract the route patterns from the raw AIS data during preprocessing. Then, we design a multi-task learning structure to learn how to implement route pattern classification and vessel trajectory prediction simultaneously. By comparing with representative baseline models, we find that our VesNet has the best performance in terms of long-term prediction precision. Additionally, VesNet can recognize the route pattern by capturing the implicit moving characteristics. The experimental results prove that the proposed multi-task learning assists the vessel trajectory prediction mission.

IJCAI Conference 2024 Conference Paper

VulnerabilityMap: An Open Framework for Mapping Vulnerability among Urban Disadvantaged Populations in the United States

  • Lin Chen
  • Yong Li
  • Pan Hui

Cities are crucibles of numerous opportunities, but also hotbeds of inequality. The plight of disadvantaged populations who are "left behind" within urban environments has been an increasingly pressing concern, which poses substantial threats to the realization of the UN SDG agenda. However, a comprehensive framework for studying this urban dilemma is currently absent, preventing researchers from developing AI models for social good prediction and intervention. To fill this gap, we construct VulnerabilityMap, a framework to meticulously dissect the challenges faced by urban disadvantaged populations, unraveling their vulnerability to a spectrum of shocks and stresses that are categorized through the prism of Maslow's hierarchy of needs. Specifically, we systematically collect large-scale multi-sourced census and web-based data covering more than 328 million people in the United States regarding demographic features, neighborhood environments, offline mobility behaviors, and online social connections. These features are further related to vulnerability outcomes from short-term shocks such as COVID-19 and long-term physiological, social, and self-actualization stresses. Leveraging our framework, we construct machine learning models that exhibit strong performance in predicting vulnerability outcomes from various disadvantage features, which shows the promising utility of our framework to support targeted AI models. Moreover, we provide model-based explainability analysis to interpret the reasons underlying model predictions, shedding light on intricate social factors that trap certain populations inside vulnerable situations. Our constructed dataset is publicly available at https://github.com/LinChen-65/VulnerabilityMap/.

TIST Journal 2023 Journal Article

DAS: Efficient Street View Image Sampling for Urban Prediction

  • Guozhen Zhang
  • Jinhui Yi
  • Jian Yuan
  • Yong Li
  • Depeng Jin

Street view data is one of the most common data sources for urban prediction tasks, such as estimating socioeconomic status, sensing physical urban changes, and identifying urban villages. Typical research in this field consists of two steps: acquiring a dataset with a street view image sampling algorithm and designing a prediction algorithm for urban prediction tasks. However, most of the previous research focuses on the prediction algorithms, leaving the sampling algorithms underexplored. To fill this gap, we set out to investigate how different street view image sampling algorithms affect the performance of the follow-up tasks and develop an effective street view image sampling algorithm for urban prediction. Through a comprehensive analysis of the performance of different sampling algorithms in three of the most common urban prediction tasks, including commercial activeness prediction, urban liveliness prediction, and urban population prediction, we provide solid empirical evidence that the sampling algorithm significantly affects the performance of the prediction model. Specifically, the performance differences of different sampling algorithms can reach over 25%. Further, we revealed that the sampling step size and the sampling quality are two important factors that affect the performance of a sampling algorithm, while the sampling angle has little influence. Inspired by our analysis results, we propose an effective street view image sampling algorithm, DAS, which contains a denoising module and an adaptive sampling module. It can dynamically adjust the sampling step size to adapt to the optimal size for each region and get rid of the impact of noise images in the meantime. Experiments on three large-scale datasets demonstrate its superior performance over multiple state-of-the-art baselines, and further ablation study shows the effectiveness of each module. 
Finally, through a thorough discussion of our findings and experimental results, we provide insights into the design of street view image sampling algorithms, and we call for more research in this underexplored area.

NeurIPS Conference 2023 Conference Paper

DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization

  • Haoran Ye
  • Jiarui Wang
  • Zhiguang Cao
  • Helan Liang
  • Yong Li

Ant Colony Optimization (ACO) is a meta-heuristic algorithm that has been successfully applied to various Combinatorial Optimization Problems (COPs). Traditionally, customizing ACO for a specific problem requires the expert design of knowledge-driven heuristics. In this paper, we propose DeepACO, a generic framework that leverages deep reinforcement learning to automate heuristic designs. DeepACO serves to strengthen the heuristic measures of existing ACO algorithms and dispense with laborious manual design in future ACO applications. As a neural-enhanced meta-heuristic, DeepACO consistently outperforms its ACO counterparts on eight COPs using a single neural model and a single set of hyperparameters. As a Neural Combinatorial Optimization method, DeepACO performs better than or on par with problem-specific methods on canonical routing problems. Our code is publicly available at https://github.com/henry-yeh/DeepACO.
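
For readers unfamiliar with ACO, the heuristic measure mentioned in the abstract enters through the classic Ant System transition rule, in which the probability of moving to a candidate node is proportional to its pheromone level raised to `alpha` times its heuristic value raised to `beta`. The following is a minimal sketch of that standard rule only, not the authors' code: the function name `transition_probs`, the exponent defaults, and the toy distances are illustrative assumptions, and the hand-written `eta` merely stands in for the heuristic that DeepACO is described as learning.

```python
def transition_probs(tau, eta, alpha=1.0, beta=2.0):
    """Classic Ant System rule: p_j is proportional to tau_j**alpha * eta_j**beta.

    tau: pheromone level for each candidate next node
    eta: heuristic desirability for each candidate (hand-written here;
         a hypothetical stand-in for a learned heuristic)
    """
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    if not weights:
        return []
    total = sum(weights)
    if total == 0:
        # No usable information: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Toy example (assumed values): three candidate cities, heuristic = 1 / distance.
distances = [2.0, 4.0, 8.0]
probs = transition_probs(tau=[1.0, 1.0, 1.0], eta=[1.0 / d for d in distances])
```

With uniform pheromone, the nearest city receives the largest probability; replacing `eta` with model outputs is where a learned heuristic would plug in under this sketch.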

TIST Journal 2023 Journal Article

Discovering Causes of Traffic Congestion via Deep Transfer Clustering

  • Mudan Wang
  • Yuan Yuan
  • Huan Yan
  • Hongjie Sui
  • Fan Zuo
  • Yue Liu
  • Yong Li
  • Depeng Jin

Traffic congestion causes long delays in travel time, which seriously affect our daily travel experiences. Exploring why traffic congestion occurs is important for effectively addressing the problem and improving user experience. Traditional approaches to mining congestion causes depend on human effort, which is time-consuming and costly. Hence, we aim to discover the known and unknown causes of traffic congestion in a systematic way. However, achieving this poses three challenges: (1) traffic congestion is affected by several factors with complex spatio-temporal relations; (2) only a few samples of congestion data have known causes, owing to the limits of human labeling; and (3) many unknown congestion causes remain unexplored, since several factors contribute to traffic congestion. To address the above challenges, we design a congestion cause discovery system consisting of two modules: (1) a congestion feature extraction module, which extracts the important features that distinguish between different causes of congestion; and (2) a congestion cause discovery module, which uses a deep semi-supervised learning framework to discover the causes of traffic congestion with limited labeled data. Specifically, in the pre-training stage, it first leverages a small amount of labeled data as prior knowledge to pre-train the model. Then, in the clustering stage, we propose two different clustering methods to discover the congestion causes. For the first clustering method, we extend the classic deep embedded clustering model to produce clusters via soft assignment. For the second one, we iteratively use k-means to group the latent features extracted from the pre-trained model, and use the cluster results as pseudo-labels to fine-tune the network. Extensive experiments show that the performance of our methods is superior to the state-of-the-art baselines, which demonstrates the effectiveness of the proposed cause discovery system. 
Additionally, our system is deployed and used in the practical production environment at Amap.
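
The second clustering method in the abstract above alternates between k-means on latent features and fine-tuning on the resulting pseudo-labels. As a rough illustration of the pseudo-labeling step only, here is a plain-Python k-means sketch. It rests on several assumptions: the function name `kmeans_pseudo_labels`, the deterministic "first k rows" initialization, and the toy 2-D features are all hypothetical, and the network fine-tuning step is omitted entirely.

```python
def kmeans_pseudo_labels(features, k, n_iter=10):
    """Plain k-means over latent feature vectors; returns one cluster id per row."""
    # Deterministic initialization (an assumption): use the first k rows.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, f in enumerate(features):
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy latent features (assumed values) forming two well-separated groups.
latent = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
pseudo_labels = kmeans_pseudo_labels(latent, k=2)
```

In an iterative scheme of the kind the abstract describes, the returned labels would serve as training targets for one fine-tuning round, after which features are re-extracted and clustered again.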

TIST Journal 2023 Journal Article

Disease Simulation in Airport Scenario Based on Individual Mobility Model

  • Zhenyu Han
  • Siran Ma
  • Changzheng Gao
  • Erzhuo Shao
  • Yulai Xie
  • Yang Zhang
  • Lu Geng
  • Yong Li

As the rapid-spreading disease COVID-19 occupies the world, most governments adopt strict control policies to alleviate the impact of the virus. These policies successfully reduced the prevalence and delayed the epidemic peak, while they are also associated with high economic and social costs. To bridge the microscopic epidemic transmission patterns and control policies, simulation systems play an important role. In this work, we propose an agent-based disease simulator for indoor public spaces, which contribute to most of the transmission in cities. As an example, we study Guangzhou Baiyun International Airport, which is one of the most bustling aviation hubs in China. Specifically, we design a high-efficiency mobility generation module to reconstruct the individual trajectories considering both lingering behavior and crowd mobility, which greatly enhances the credibility of the simulated mobility and ensures real-time performance. Based on the individual trajectories, we propose a multi-path disease transmission module optimized for indoor public spaces, which includes three main transmission paths as close contact transmission, aerosol transmission, and object surface transmission. We design a novel convolution-based algorithm to mimic the diffusion process, which can leverage the high concurrent capability of the graphics processing unit to accelerate the simulation process. Leveraging our simulation paradigm, the effectiveness of common policy interventions can be quantitatively evaluated. For mobility interventions, we find that lingering control is the most effective mobility intervention with 32.35% fewer infections, while increasing social distance and increasing walking speed have a similar effect with 15.15% and 18.02% fewer infections. It demonstrates the importance of introducing crowd mobility into disease transmission simulation. 
For transmission processes, we find that aerosol transmission is involved in 99.99% of transmissions, which highlights the importance of ventilation in indoor public spaces. Our simulation also demonstrates that, without strict entrance detection to identify incoming infections, frequent disinfection alone cannot achieve desirable epidemic outcomes. Based on our simulation paradigm, we can shed light on better policy designs that achieve a good balance between disease spreading control and social costs.

TIST Journal 2023 Journal Article

Dual Graph Convolution Architecture Search for Travel Time Estimation

  • Guangyin Jin
  • Huan Yan
  • Fuxian Li
  • Yong Li
  • Jincai Huang

Travel time estimation (TTE) is a crucial task in intelligent transportation systems, which has been widely used in navigation and route planning. In recent years, several deep learning frameworks have been proposed to capture the dynamic features of road segments or intersections for travel time estimation. However, most existing works do not consider the joint features of the intersections and road segments. Moreover, most deep neural networks for TTE are designed based on empirical knowledge. Since the independent and joint features of intersections and road segments commonly vary with different datasets, the empirical deterministic neural architectures have limited adaptability to different scenarios. To tackle the above problems, we propose a novel automated deep learning framework, namely Automated Spatio-Temporal Dual Graph Convolutional Networks (Auto-STDGCN), for travel time estimation. Specifically, we propose to construct the node-wise graph and edge-wise graph to characterize the spatio-temporal features of intersections and road segments, respectively. In order to capture the joint spatio-temporal correlations of the dual graphs, a hierarchical neural architecture search approach is introduced, whose search space is composed of internal and external search space. In the internal search space, spatial graph convolution and temporal convolution operations are adopted to capture the respective spatio-temporal correlations of the dual graphs. Further, we design the external search space including the node-wise and edge-wise graph convolution operations from the internal architecture search to capture the interaction patterns between the intersections and road segments. We evaluate our proposed model Auto-STDGCN on three real-world datasets, which demonstrates that our model is significantly superior to the state-of-the-art methods. In addition, we also conduct case studies to visualize and explain the neural architectures learned by our model.

ICLR Conference 2023 Conference Paper

DynaMS: Dynamic Margin Selection for Efficient Deep Learning

  • Jiaxing Wang
  • Yong Li
  • Jingwei Zhuo
  • Xupeng Shi
  • Weizhong Zhang
  • Lixing Gong
  • Tong Tao
  • Pengzhang Liu

The great success of deep learning is largely driven by training over-parameterized models on massive datasets. To avoid excessive computation, extracting and training only on the most informative subset is drawing increasing attention. Nevertheless, it is still an open question how to select such a subset on which the model trained generalizes on par with the full data. In this paper, we propose dynamic margin selection (DynaMS). DynaMS leverages the distance from candidate samples to the classification boundary to construct the subset, and the subset is dynamically updated during model training. We show that DynaMS converges with large probability, and for the first time show both in theory and practice that dynamically updating the subset can result in better generalization over previous works. To reduce the additional computation incurred by the selection, a light parameter sharing proxy (PSP) is designed. PSP is able to faithfully evaluate instances with respect to the current model, which is necessary for dynamic selection. Extensive analysis and experiments demonstrate the superiority of the proposed approach in data selection against many state-of-the-art counterparts on benchmark datasets.
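
To make the idea of selecting by distance to the classification boundary concrete, the sketch below ranks samples by a simple margin and keeps the ones with the smallest values. This is an illustration under stated assumptions rather than the DynaMS procedure: using the gap between the top two class scores as a proxy for boundary distance, keeping the smallest margins, and the names `select_by_margin` and `budget` are all hypothetical choices, and the dynamic re-selection during training and the parameter sharing proxy are not modeled.

```python
def select_by_margin(scores, budget):
    """Return indices of the `budget` samples with the smallest margin.

    scores: per-sample lists of class scores (at least two classes each)
    margin: top score minus runner-up score, used here as a proxy
            (an assumption) for distance to the classification boundary
    """
    margins = []
    for idx, s in enumerate(scores):
        top, runner_up = sorted(s, reverse=True)[:2]
        margins.append((top - runner_up, idx))
    margins.sort()  # smallest margin first; ties broken by index
    return [idx for _, idx in margins[:budget]]


# Toy class scores (assumed values) for four samples and two classes.
scores = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
subset = select_by_margin(scores, budget=2)
```

A dynamic variant would recompute the scores with the current model every few epochs and call the selector again, so the subset tracks the moving boundary.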

NeurIPS Conference 2023 Conference Paper

Efficient Hyper-parameter Optimization with Cubic Regularization

  • Zhenqian Shen
  • Hansi Yang
  • Yong Li
  • James Kwok
  • Quanming Yao

As hyper-parameters are ubiquitous and can significantly affect the model performance, hyper-parameter optimization is extremely important in machine learning. In this paper, we consider a sub-class of hyper-parameter optimization problems, where the hyper-gradients are not available. Such problems frequently appear when the performance metric is non-differentiable or the hyper-parameter is not continuous. However, existing algorithms, like Bayesian optimization and reinforcement learning, often get trapped in local optima with poor performance. To address the above limitations, we propose to use cubic regularization to accelerate convergence and avoid saddle points. First, we adopt stochastic relaxation, which allows obtaining gradient and Hessian information without hyper-gradients. Then, we exploit the rich curvature information by cubic regularization. Theoretically, we prove that the proposed method can converge to approximate second-order stationary points, and the convergence is also guaranteed when the lower-level problem is inexactly solved. Experiments on synthetic and real-world data demonstrate the effectiveness of our proposed method.

NeurIPS Conference 2023 Conference Paper

Incomplete Multimodality-Diffused Emotion Recognition

  • Yuanzhi Wang
  • Yong Li
  • Zhen Cui

Human multimodal emotion recognition (MER) aims to perceive and understand human emotions via various heterogeneous modalities, such as language, vision, and acoustics. Compared with a single modality, the complementary information in multiple modalities facilitates robust emotion understanding. Nevertheless, in real-world scenarios, missing modalities hinder multimodal understanding and result in degraded MER performance. In this paper, we propose an Incomplete Multimodality-Diffused emotion recognition (IMDer) method to mitigate the challenge of MER under incomplete multimodalities. To recover the missing modalities, IMDer exploits a score-based diffusion model that maps the input Gaussian noise into the desired distribution space of the missing modalities and recovers missing data in accordance with their original distributions. Specifically, to reduce semantic ambiguity between the missing and the recovered modalities, the available modalities are embedded as the condition to guide and refine the diffusion-based recovery process. In contrast to previous work, the diffusion-based modality recovery mechanism in IMDer achieves both distribution consistency and semantic disambiguation simultaneously. Feature visualization of the recovered modalities illustrates the consistent modality-specific distribution and semantic alignment. In addition, quantitative experimental results verify that IMDer obtains state-of-the-art MER accuracy under various missing modality patterns.

TIST Journal 2023 Journal Article

Interior Individual Trajectory Simulation with Population Distribution Constraint

  • Erzhuo Shao
  • Zhenyu Han
  • Yulai Xie
  • Yang Zhang
  • Lu Geng
  • Yong Li

Individual trajectory generation plays an important role in simulation tasks, reconstructing fine-grained mobility behaviors that can be used to evaluate epidemic risks, congestion risks, or commercial profit. Previous research adopts particle models based on Newtonian mechanics, such as the Social Force model, as the core algorithm. However, real-world human mobility behaviors hardly follow the particle models, especially in interior scenes where interactions between pedestrians and environments matter. In this article, we propose a Social Force-based trajectory simulator for interior scenarios that improves both trajectory quality and generation speed. First, we introduce prior scene knowledge to guide the generation process, where pedestrians are equipped with exploration behaviors that follow the group-level distribution. It provides more flexibility to simulate complicated human behaviors rather than straight-line movements, generating high-quality individual trajectories. Experiments show that our method improves the correlation between the aggregated population distribution of generated trajectories and the ground-truth distribution by 11.84%. Second, we optimize the algorithm procedure by introducing a caching mechanism for tensorized intermediate values, along with a graphics-processing-unit-based implementation. Compared with the baseline Social Force model, we reduce the time consumption by 95%. More importantly, based on our simulation paradigm, we quantitatively evaluate several common mobility interventions in our simulation scenario, which can shed light on better policy designs in public spaces.

TIST Journal 2023 Journal Article

Learning Representations of Satellite Imagery by Leveraging Point-of-Interests

  • Tong Li
  • Yanxin Xi
  • Huandong Wang
  • Yong Li
  • Sasu Tarkoma
  • Pan Hui

Satellite imagery depicts the Earth’s surface remotely and provides comprehensive information for many applications, such as land use monitoring and urban planning. Existing studies on unsupervised representation learning for satellite images only take into account the images’ geographic information, ignoring human activity factors. To bridge this gap, we propose using the Point-of-Interest (POI) data to capture human factors and designing a contrastive learning-based framework to consolidate the representation of satellite imagery with POI information. Besides, we introduce a season-invariant representation learning model on satellite imagery, considering that human factors are mostly unchanging with respect to seasons. An attention model is designed at last to merge the representations from the geographic, seasonal, and POI perspectives adaptively. On the basis of real-world datasets collected from Beijing, we evaluate our method for predicting socioeconomic indicators. The results show that the representation containing POI information outperforms the geographic representation in estimating commercial activity-related indicators. Our proposed attentional framework can estimate the socioeconomic indicators with R² of 0.874 and outperforms the baseline methods. Furthermore, we explore the differences in the representations of satellite images with varying socioeconomic statuses. Finally, we investigate the impact of geographic and POI perspective information in the representation learning process, as well as the effect of satellite imagery on various spatial resolutions.
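
The R² figure reported above is the coefficient of determination, which compares a model's squared error with that of always predicting the mean. A minimal sketch of the standard formula follows. The helper name `r_squared` and the toy numbers are illustrative assumptions with no connection to the paper's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy indicator values (assumed): three exact predictions and one miss.
score = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A value of 1 means perfect prediction, 0 means no better than predicting the mean, and the function is undefined when all true values are identical.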

AAAI Conference 2023 Conference Paper

PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning

  • Huandong Wang
  • Changzheng Gao
  • Yuchen Wu
  • Depeng Jin
  • Lina Yao
  • Yong Li

Generating human mobility trajectories is of great importance for addressing the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitation, in this paper, we propose PateGail, a privacy-preserving imitation learning model to generate mobility trajectories, which utilizes the powerful generative adversarial imitation learning model to simulate the decision-making process of humans. Further, in order to protect user privacy, we train this model collectively based on decentralized mobility data stored in user devices, where personal discriminators are trained locally to distinguish and reward the real and generated human trajectories. In the training process, only the generated trajectories and their rewards obtained based on personal discriminators are shared between the server and devices, whose privacy is further preserved by our proposed perturbation mechanisms with theoretical proof to satisfy differential privacy. Further, to better model the human decision-making process, we propose a novel aggregation mechanism of the rewards obtained from personal discriminators. We theoretically prove that under the reward obtained based on the aggregation mechanism, our proposed model maximizes the lower bound of the discounted total rewards of users. Extensive experiments show that the trajectories generated by our model are able to resemble real-world trajectories in terms of five key statistical metrics, outperforming state-of-the-art algorithms by over 48.03%. Furthermore, we demonstrate that the synthetic trajectories are able to efficiently support practical applications, including mobility prediction and location recommendation.

TMLR Journal 2023 Journal Article

Understanding and Simplifying Architecture Search in Spatio-Temporal Graph Neural Networks

  • Zhen Xu
  • Quanming Yao
  • Yong Li
  • Qiang Yang

Combining spatial and temporal modules in a unified framework, Spatio-Temporal Graph Neural Networks (STGNNs) have been widely used in multivariate spatio-temporal forecasting tasks, e.g., traffic prediction. Following numerous proposals of manually designed architectures, researchers have shown interest in Neural Architecture Search (NAS) for STGNNs. Existing methods suffer from two issues: (1) hyperparameters such as learning rate and channel size cannot be integrated into the NAS framework, which makes the model evaluation less accurate and potentially misleads the architecture search; and (2) the current search space, which basically mimics DARTS-like methods, is too large for the search algorithm to find a sufficiently good candidate. In this work, we deal with both issues at the same time. We first re-examine the importance and transferability of the training hyperparameters to ensure a fair and fast comparison. Next, we set up a framework that disentangles architecture design into three disjoint angles according to how spatio-temporal representations flow and transform in architectures, which allows us to understand the behavior of architectures from a distributional perspective. This way, we can obtain good guidelines to reduce the STGNN search space and find state-of-the-art architectures by simple random search. As an illustrative example, we combine these principles with random search, which already significantly outperforms both state-of-the-art hand-designed models and recently automatically searched ones.

ICRA Conference 2023 Conference Paper

Unidirectional-Road-Network-Based Global Path Planning for Cleaning Robots in Semi-Structured Environments

  • Yong Li
  • Hui Cheng

Practical global path planning is critical for commercializing cleaning robots working in semi-structured environments. In the literature, global path planning methods for free space usually focus on path length and neglect the traffic rule constraints of the environments, which leads to high-frequency re-planning and increases collision risks. In contrast, those for structured environments are developed mainly by strictly complying with the road network representing the traffic rule constraints, which may result in an overlong path that hinders the overall navigation efficiency. This article proposes a general and systematic approach to improve global path planning performance in semi-structured environments. A unidirectional road network is built to represent the traffic constraints in semi-structured environments, and a hybrid strategy is proposed to achieve a guaranteed planning result. Cutting across the road at the starting and goal points is allowed in order to achieve a shorter path. In particular, a two-layer potential map is proposed to achieve guaranteed performance when the starting and goal points are in complex intersections. Comparative experiments are carried out to validate the effectiveness of the proposed method. Quantitative experimental results show that, compared with the state of the art, the proposed method achieves a much better balance between path length and consistency with the road network.

TIST Journal 2023 Journal Article

UrbanKG: An Urban Knowledge Graph System

  • Yu Liu
  • Jingtao Ding
  • Yanjie Fu
  • Yong Li

Every day, the cities we live in produce a tremendous amount of spatial-temporal data from multiple sources, ranging from the individual scale to the city scale. Undoubtedly, such massive urban data can be explored for a better city and a better life, a goal to which the urban computing community has been dedicated in recent years. Nevertheless, existing studies still face the challenges of data fusion for urban data as well as knowledge distillation for specific applications. Moreover, there is a lack of full-featured and user-friendly platforms for both researchers and developers in the urban computing scenario. Therefore, in this article, we present UrbanKG, an urban knowledge graph system to incorporate a knowledge graph with urban computing. Specifically, the system introduces a complete scheme to construct a knowledge graph for urban data fusion. Built upon the data layer, the system further develops the multiple layers of construction, storage, algorithm, operation, and applications, which achieve knowledge distillation and support various functions for users. We present representative use cases and demonstrate the system's capability of boosting performance in various downstream applications, indicating a promising research direction for knowledge-driven urban computing.

TIST Journal 2023 Journal Article

You Are How You Use Apps: User Profiling Based on Spatiotemporal App Usage Behavior

  • Tong Li
  • Yong Li
  • Mingyang Zhang
  • Sasu Tarkoma
  • Pan Hui

Mobile apps have become an indispensable part of people’s daily lives. Users determine what apps to use and when and where to use them based on their tastes, interests, and personal demands, depending on their personality traits. This article aims to infer user profiles from their spatiotemporal mobile app usage behavior. Specifically, we first transform mobile app usage records into a heterogeneous graph. On the graph, nodes represent users, apps, locations, and time slots. Edges describe the co-occurrence of entities in usage records. We then develop a multi-relational heterogeneous graph attention network (MRel-HGAN), an end-to-end system for user profiling. MRel-HGAN first adopts a neighbor sampling strategy based on bootstrapping to sample heavily connected neighbors of a fixed size for each node. Next, we design a relational graph convolutional operation and a multi-relational attention operation. Through such modules, MRel-HGAN can generate node embedding by sufficiently leveraging the rich semantic information of the multi-relational structure in the mobile app usage graph. Experimental results on real-world mobile app usage datasets show the effectiveness and superiority of our MRel-HGAN in the user profiling task for attributes of gender and age.

IJCAI Conference 2022 Conference Paper

A Formal Model for Multiagent Q-Learning Dynamics on Regular Graphs

  • Chen Chu
  • Yong Li
  • Jinzhuo Liu
  • Shuyue Hu
  • Xuelong Li
  • Zhen Wang

Modeling the dynamics of multi-agent learning has long been an important research topic. The focus of previous research has been either on 2-agent settings or on well-mixed, infinitely large agent populations. In this paper, we consider the scenario where n Q-learning agents are located on regular graphs, such that agents can only interact with their neighbors. We examine the local interactions between individuals and their neighbors, and derive a formal model to capture the Q-value dynamics of the entire population. Through comparisons with agent-based simulations on different types of regular graphs, we show that our model describes the agent learning dynamics in an exact manner.

TIST Journal 2022 Journal Article

Crowd Flow Prediction for Irregular Regions with Semantic Graph Attention Network

  • Fuxian Li
  • Jie Feng
  • Huan Yan
  • Depeng Jin
  • Yong Li

It is essential to predict crowd flow precisely in a city, which is practically partitioned into irregular regions based on road networks and functionality. However, prior works mainly focus on grid-based crowd flow prediction, where a city is divided into many regular grids. Although the Convolutional Neural Network (CNN) is powerful in capturing spatial dependence from grid-based Euclidean data, it fails to handle non-Euclidean data, which reflect the correlations among irregular regions. Besides, prior works fail to jointly capture the hierarchical spatio-temporal dependence from both regular and irregular regions. Finally, the correlations among regions are time-varying and functionality-related. However, the combination of dynamic and semantic attributes of regions is ignored by related works. To address the above challenges, in this article, we propose a novel model to tackle the flow prediction task for irregular regions. First, we employ CNN and Graph Neural Network (GNN) to capture micro and macro spatial dependence among grid-based regions and irregular regions, respectively. Further, we emphasize the dynamic inter-region correlations and propose a location-aware and time-aware graph attention mechanism named Semantic Graph Attention Network (Semantic-GAT), based on dynamic node attribute embedding and multi-view graph reconstruction. Extensive experimental results based on two real-life datasets demonstrate that our model outperforms 10 baselines, reducing the prediction error by around 8%.

TIST Journal 2022 Journal Article

Hierarchical Multi-agent Model for Reinforced Medical Resource Allocation with Imperfect Information

  • Qianyue Hao
  • Fengli Xu
  • Lin Chen
  • Pan Hui
  • Yong Li

With the advent of the COVID-19 pandemic, the shortage in medical resources became increasingly more evident. Therefore, efficient strategies for medical resource allocation are urgently needed. However, conventional rule-based methods employed by public health experts have limited capability in dealing with the complex and dynamic pandemic-spreading situation. In addition, model-based optimization methods such as dynamic programming (DP) fail to work since we cannot obtain a precise model in real-world situations most of the time. Model-free reinforcement learning (RL) is a powerful tool for decision-making; however, three key challenges exist in solving this problem via RL: (1) complex situations and countless choices for decision-making in the real world; (2) imperfect information due to the latency of pandemic spreading; and (3) limitations on conducting experiments in the real world since we cannot set up pandemic outbreaks arbitrarily. In this article, we propose a hierarchical RL framework with several specially designed components. We design a decomposed action space with a corresponding training algorithm to deal with the countless choices, ensuring efficient and real-time strategies. We design a recurrent neural network–based framework to utilize the imperfect information obtained from the environment. We also design a multi-agent voting method, which modifies the decision-making process considering the randomness during model training and, thus, improves the performance. We build a pandemic-spreading simulator based on real-world data, serving as the experimental platform. We then conduct extensive experiments. The results show that our method outperforms all baselines, which reduces infections and deaths by 14.25% on average without the multi-agent voting method and up to 15.44% with it.

AAAI Conference 2022 Conference Paper

MAPDP: Cooperative Multi-Agent Reinforcement Learning to Solve Pickup and Delivery Problems

  • Zefang Zong
  • Meng Zheng
  • Yong Li
  • Depeng Jin

The cooperative Pickup and Delivery Problem (PDP), a variant of the typical Vehicle Routing Problem (VRP), is an important formulation in many real-world applications, such as on-demand delivery and industrial warehousing. It is of great importance to efficiently provide high-quality solutions to the cooperative PDP. However, it is not trivial to provide effective solutions directly due to two major challenges: 1) the structural dependency between pickup and delivery pairs requires explicit modeling and representation; 2) the cooperation between different vehicles is highly related to solution exploration and is difficult to model. In this paper, we propose a novel multi-agent reinforcement learning-based framework to solve the cooperative PDP (MAPDP). First, we design a paired context embedding to measure the dependency of different nodes considering their structural limits. Second, we utilize cooperative multi-agent decoders to leverage the decision dependence among different vehicle agents based on a special communication embedding. Third, we design a novel cooperative A2C algorithm to train the integrated model. We conduct extensive experiments on a randomly generated dataset and a real-world dataset. Experimental results show that the proposed MAPDP outperforms all other baselines by at least 1.64% in all settings, and achieves fast computation during solution inference.

YNICL Journal 2022 Journal Article

Thalamic structural connectivity profiles in blepharospam/Meige’s syndrome

  • Tobias Mantel
  • Angela Jochim
  • Tobias Meindl
  • Jonas Deppe
  • Claus Zimmer
  • Yong Li
  • Bernhard Haslinger

BACKGROUND: Blepharospasm is a debilitating focal dystonia characterized by involuntary eyelid spasms that can be accompanied by oromandibular muscle involvement (Meige's syndrome). Frequently observed abnormality in functional neuroimaging hints at an important position of the thalamus, that relays involved cortico-basal ganglia-cortical and cortico-cerebello-cortical circuits, within the abnormal network in blepharospasm. OBJECTIVE: To characterize abnormal cortico-thalamic structural/streamline connectivity (SC) patterns in the disease, as well as their potential co-occurrence with abnormal subcortico-thalamo-cortical projections using diffusion tractography. METHODS: Diffusion imaging was obtained in 17 patients with blepharospasm (5 with mild lower facial involvement) and 17 healthy controls. Probabilistic tractography was used for quantification of SC between six cortical regions and thalamus, and voxel-level thalamic SC mapping as well as evaluation of the thalamic SC distributions' topography by center-of-gravity analysis was performed. Post-hoc, correlations of SC with clinical parameters were evaluated. Further, white matter integrity was investigated within representative segments of the dentato-thalamo-cortical and pallido-thalamo-cortical tract. RESULTS: Connectivity mapping showed significant reduction of right (pre)motor- and left occipital-thalamic SC, as well as a topographic shift of the left occipital-thalamic SC distribution in patients. Significant positive correlation of occipital-thalamic SC with disease severity was found. Post-hoc analysis revealed significantly reduced mean fractional anisotropy in patients within the dentato-thalamo-cortical trajectory connecting to right (pre)motor and left occipital cortex. CONCLUSION: Abnormal occipital/motor SC provides evidence for dysfunction of the thalamus-relayed visual and motor network as a key aspect in the disease. Concurrent impairment of microstructural integrity within the dentato-thalamic trajectories targeting those cortices hints at cerebellar contribution.

AAAI Conference 2021 Conference Paper

AttnMove: History Enhanced Trajectory Recovery via Attentional Network

  • Tong Xia
  • Yunhan Qi
  • Jie Feng
  • Fengli Xu
  • Funing Sun
  • Diansheng Guo
  • Yong Li

A considerable amount of mobility data has been accumulated due to the proliferation of location-based services. Nevertheless, compared with mobility data from transportation systems like the GPS module in taxis, this kind of data is commonly sparse in terms of individual trajectories in the sense that users do not access mobile services and contribute their data all the time. Consequently, the sparsity inevitably weakens the practical value of the data even if it has a high user penetration rate. To solve this problem, we propose a novel attentional neural network-based model, named AttnMove, to densify individual trajectories by recovering unobserved locations at a fine-grained spatial-temporal resolution. To tackle the challenges posed by sparsity, we design various intra- and inter-trajectory attention mechanisms to better model the mobility regularity of users and fully exploit the periodical pattern from long-term history. We evaluate our model on two real-world datasets, and extensive results demonstrate the performance gain compared with the state-of-the-art methods. This also shows that, by providing high-quality mobility data, our model can benefit a variety of mobility-oriented down-stream applications.

NeurIPS Conference 2021 Conference Paper

Automorphic Equivalence-aware Graph Neural Network

  • Fengli Xu
  • Quanming Yao
  • Pan Hui
  • Yong Li

Distinguishing the automorphic equivalence of nodes in a graph plays an essential role in many scientific domains, e.g., computational biology and social network analysis. However, existing graph neural networks (GNNs) fail to capture such an important property. To make GNNs aware of automorphic equivalence, we first introduce a localized variant of this concept, ego-centered automorphic equivalence (Ego-AE). Then, we design a novel variant of GNN, i.e., GRAPE, that uses learnable AE-aware aggregators to explicitly differentiate the Ego-AE of each node's neighbors with the aid of various subgraph templates. Because designing subgraph templates can be hard, we further propose a genetic algorithm to automatically search for them from graph data. Moreover, we theoretically prove that GRAPE is expressive in terms of generating distinct representations for nodes with different Ego-AE features, which fills in a fundamental gap of existing GNN variants. Finally, we empirically validate our model on eight real-world graph datasets, including social networks, e-commerce co-purchase networks, and citation networks, and show that it consistently outperforms existing GNNs. The source code is publicly available at https://github.com/tsinghua-fib-lab/GRAPE.

TIST Journal 2021 Journal Article

Linking Multiple User Identities of Multiple Services from Massive Mobility Traces

  • Huandong Wang
  • Yong Li
  • Gang Wang
  • Depeng Jin

Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services but face key challenges of matching multiple services in practice, particularly when users have multiple IDs per service. In this article, we propose a novel system to link IDs across multiple services by exploring the spatial-temporal features of user activities, of which the core idea is that the same user's online IDs are more likely to repeatedly appear at the same location. Specifically, we first utilize a contact graph to capture the “co-location” of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets and use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets from an Internet service provider (4 services, 815K IDs) and Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms the state-of-the-art algorithms in accuracy (AUC is higher by 0.1–0.2), and it is highly robust against data quality, matching order, and number of services.

NeurIPS Conference 2021 Conference Paper

Progressive Feature Interaction Search for Deep Sparse Network

  • Chen Gao
  • Yinfeng Li
  • Quanming Yao
  • Depeng Jin
  • Yong Li

Deep sparse networks (DSNs), of which the crux is exploring the high-order feature interactions, have become the state-of-the-art on the prediction task with high-sparsity features. However, these models suffer from low computation efficiency, including large model size and slow model inference, which largely limits these models' application value. In this work, we approach this problem with neural architecture search by automatically searching the critical component in DSNs, the feature-interaction layer. We propose a distilled search space to cover the desired architectures with fewer parameters. We then develop a progressive search algorithm that searches this space efficiently and captures the order-priority property in sparse prediction tasks. Experiments on three real-world benchmark datasets show promising results of PROFIT in both accuracy and efficiency. Further studies validate the feasibility of our designed search space and search algorithm.

IJCAI Conference 2021 Conference Paper

Synthesizing Good-Enough Strategies for LTLf Specifications

  • Yong Li
  • Andrea Turrini
  • Moshe Y. Vardi
  • Lijun Zhang

We consider the problem of synthesizing good-enough (GE)-strategies for linear temporal logic (LTL) over finite traces or LTLf for short. The problem of synthesizing GE-strategies for an LTL formula φ over infinite traces reduces to the problem of synthesizing winning strategies for the formula (∃Oφ)⇒φ where O is the set of propositions controlled by the system. We first prove that this reduction does not work for LTLf formulas. Then we show how to synthesize GE-strategies for LTLf formulas via the Good-Enough (GE)-synthesis of LTL formulas. Unfortunately, this requires constructing deterministic parity automata on infinite words, which is computationally expensive. We then show how to synthesize GE-strategies for LTLf formulas by a reduction to solving games played on deterministic Büchi automata, based on an easier construction of deterministic automata on finite words. We show empirically that our specialized synthesis algorithm for GE-strategies outperforms the algorithms going through GE-synthesis of LTL formulas by orders of magnitude.

IJCAI Conference 2020 Conference Paper

A Sequential Convolution Network for Population Flow Prediction with Explicitly Correlation Modelling

  • Jie Feng
  • Ziqian Lin
  • Tong Xia
  • Funing Sun
  • Diansheng Guo
  • Yong Li

Population flow prediction is one of the most fundamental components in many applications from urban management to transportation scheduling. It is challenging due to the complicated spatial-temporal correlation. While many studies have been done in recent years, they fail to simultaneously and effectively model the spatial correlation and temporal variations among population flows. In this paper, we propose Convolution based Sequential and Cross Network (CSCNet) to address this problem. On the one hand, we design a CNN based sequential structure that progressively merges the flow features from different times in different CNN layers to model the spatial-temporal information simultaneously. On the other hand, we make use of the transition flow as the proxy to efficiently and explicitly capture the dynamic correlation between different types of population flows. Extensive experiments on 4 datasets demonstrate that CSCNet outperforms the state-of-the-art baselines by reducing the prediction error by around 7.7%∼10.4%.

JBHI Journal 2020 Journal Article

Adversarial Representation Learning for Robust Patient-Independent Epileptic Seizure Detection

  • Xiang Zhang
  • Lina Yao
  • Manqing Dong
  • Zhe Liu
  • Yu Zhang
  • Yong Li

Epilepsy is a chronic neurological disorder characterized by the occurrence of spontaneous seizures, which affects about one percent of the world's population. Most of the current seizure detection approaches strongly rely on patient history records and thus fail in the patient-independent setting, where seizures must be detected in new patients. To overcome this limitation, we propose a robust and explainable epileptic seizure detection model that effectively learns from seizure states while eliminating the inter-patient noise. A complex deep neural network model is proposed to learn the pure seizure-specific representation from the raw non-invasive electroencephalography (EEG) signals through adversarial training. Furthermore, to enhance the explainability, we develop an attention mechanism to automatically learn the importance of each EEG channel in the seizure diagnosis procedure. The proposed approach is evaluated over the Temple University Hospital EEG (TUH EEG) database. The experimental results illustrate that our model outperforms the competitive state-of-the-art baselines with low latency. Moreover, the designed attention mechanism is shown to be able to provide fine-grained information for pathological analysis. We propose an effective and efficient patient-independent diagnosis approach for epileptic seizures based on raw EEG signals without manual feature engineering, which is a step toward the development of large-scale deployment for real-life use.

TIST Journal 2020 Journal Article

DeepApp

  • Tong Xia
  • Yong Li
  • Jie Feng
  • Depeng Jin
  • Qing Zhang
  • Hengliang Luo
  • Qingmin Liao

Smartphone mobile application (App) usage prediction, i.e., which Apps will be used next, is beneficial for user experience improvement. Through an in-depth analysis on a real-world dataset, we find that App usage is highly spatio-temporally correlated and personalized. Given the ability to model complex spatio-temporal contexts, we aim to apply deep learning to achieve high prediction accuracy. However, the personalization yields a problem: training one network for each individual suffers from data scarcity, yet training one deep neural network for all users often fails to uncover user preference. In this article, we propose a novel App usage prediction framework, named DeepApp, to achieve context-aware prediction via multi-task learning. To tackle the challenge of data scarcity, we train one general network for multiple users to share common patterns. To better utilize the spatio-temporal contexts, we supplement a location prediction task in the multi-task learning framework to learn spatio-temporal relations. As for the personalization, we add a user identification task to capture user preference. We evaluate DeepApp on the large-scale dataset by extensive experiments. Results demonstrate that DeepApp outperforms the state-of-the-art baseline by 6.44%.

AAAI Conference 2020 Conference Paper

Hybrid Compositional Reasoning for Reactive Synthesis from Finite-Horizon Specifications

  • Suguman Bansal
  • Yong Li
  • Lucas Tabajara
  • Moshe Vardi

LTLf synthesis is the automated construction of a reactive system from a high-level description, expressed in LTLf, of its finite-horizon behavior. So far, the conversion of LTLf formulas to deterministic finite-state automata (DFAs) has been identified as the primary bottleneck to the scalability of synthesis. Recent investigations have also shown that the size of the DFA state space plays a critical role in synthesis as well. Therefore, effective resolution of the bottleneck for synthesis requires the conversion to be time and memory performant, and to prevent state-space explosion. Current conversion approaches, however, which are based either on explicit-state representation or symbolic-state representation, fail to address these necessities adequately at scale: Explicit-state approaches generate minimal DFAs but are slow due to expensive DFA minimization. Symbolic-state representations can be succinct, but due to the lack of DFA minimization they generate such large state spaces that even their symbolic representations cannot compensate for the blow-up. This work proposes a hybrid representation approach for the conversion. Our approach utilizes both explicit and symbolic representations of the state-space, and effectively leverages their complementary strengths. In doing so, we offer an LTLf to DFA conversion technique that addresses all three necessities, hence resolving the bottleneck. A comprehensive empirical evaluation on conversion and synthesis benchmarks supports the merits of our hybrid approach.

IJCAI Conference 2020 Conference Paper

Multi-View Joint Graph Representation Learning for Urban Region Embedding

  • Mingyang Zhang
  • Tong Li
  • Yong Li
  • Pan Hui

The increasing amount of urban data enables us to investigate urban dynamics, assist urban planning, and eventually, make our cities more livable and sustainable. In this paper, we focus on learning an embedding space from urban data for urban regions. For the first time, we propose a multi-view joint learning model to learn comprehensive and representative urban region embeddings. We first model different types of region correlations based on both human mobility and inherent region properties. Then, we apply a graph attention mechanism in learning region representations from each view of the built correlations. Moreover, we introduce a joint learning module that boosts the region embedding learning by sharing cross-view information and fuses multi-view embeddings by learning adaptive weights. Finally, we exploit the learned embeddings in the downstream applications of land usage classification and crime prediction in urban areas with real-world data. Extensive experiment results demonstrate that by exploiting our proposed joint learning model, the performance is improved by a large margin on both tasks compared with the state-of-the-art methods.

GandALF Workshop 2020 Workshop Paper

On the Power of Unambiguity in Büchi Complementation

  • Yong Li
  • Moshe Y. Vardi
  • Lijun Zhang

In this work, we exploit the power of unambiguity for the complementation problem of Büchi automata by utilizing reduced run directed acyclic graphs (DAGs) over infinite words, in which each vertex has at most one predecessor. We then show how to use this type of reduced run DAGs as a unified tool to optimize both rank-based and slice-based complementation constructions for Büchi automata with a finite degree of ambiguity. As a result, given a Büchi automaton with n states and a finite degree of ambiguity, the number of states in the complementary Büchi automaton constructed by the classical rank-based and slice-based complementation constructions can be improved, respectively, to 2^O(n) from 2^O(n log n) and to O(4^n) from O((3n)^n).

NeurIPS Conference 2020 Conference Paper

Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

  • Jingtao Ding
  • Yuhan Quan
  • Quanming Yao
  • Yong Li
  • Depeng Jin

Negative sampling approaches are prevalent in implicit collaborative filtering for obtaining negative labels from massive unlabeled data. As two major concerns in negative sampling, efficiency and effectiveness are still not fully achieved by recent works that use complicated structures and overlook the risk of false negative instances. In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and false negatives tend to have stable predictions over many training iterations. These findings motivate us to simplify the model by sampling from a designed memory that only stores a few important candidates and, more importantly, tackle the untouched false negative problem by favouring high-variance samples stored in memory, which achieves efficient sampling of true negatives with high quality. Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method. The implementation is available at https://github.com/dingjingtao/SRNS.

YNICL Journal 2020 Journal Article

Structure-function abnormalities in cortical sensory projections in embouchure dystonia

  • Tobias Mantel
  • Eckart Altenmüller
  • Yong Li
  • André Lee
  • Tobias Meindl
  • Angela Jochim
  • Claus Zimmer
  • Bernhard Haslinger

BACKGROUND: Embouchure dystonia (ED) is a task-specific focal dystonia in professional brass players leading to abnormal orofacial muscle posturing/spasms during performance. Previous studies have outlined abnormal cortical sensorimotor function during sensory/motor tasks and in the resting state as well as abnormal cortical sensorimotor structure. Yet, potentially underlying white-matter tract abnormalities in this network disease are unknown. OBJECTIVE: To delineate structure-function abnormalities within cerebral sensorimotor trajectories in ED. METHOD: Probabilistic tractography and seed-based functional connectivity analysis were performed in 16/16 ED patients/healthy brass players within a simple literature-informed network model of cortical sensorimotor processing encompassing supplementary motor, superior parietal, primary somatosensory and motor cortex as well as the putamen. Post-hoc grey matter volumetry was performed within cortices of abnormal trajectories. RESULTS: ED patients showed average axial diffusivity reduction within projections between the primary somatosensory cortex and putamen, with converse increases within projections between supplementary motor and superior parietal cortex in both hemispheres. An increase in the mode of anisotropy in patients accompanied the latter left-hemispheric projection, as well as the supplementary motor area's projection to the left primary motor cortex. Patients' left primary somatosensory functional connectivity with the putamen was abnormally reduced and significantly associated with the axial diffusivity reduction. Left primary somatosensory grey matter volume was increased in patients. CONCLUSION: Correlates of abnormal tract integrity within primary somatosensory cortico-subcortical projections and higher-order sensorimotor projections support the key role of dysfunctional sensory information propagation in ED pathophysiology. Differential directionality of cortico-cortical and cortico-subcortical abnormalities hints at non-uniform sensory system changes.

IJCAI Conference 2019 Conference Paper

A Decomposition Approach for Urban Anomaly Detection Across Spatiotemporal Data

  • Mingyang Zhang
  • Tong Li
  • Hongzhi Shi
  • Yong Li
  • Pan Hui

Urban anomalies such as abnormal flow of crowds and traffic accidents could result in loss of life or property if not handled properly. Detecting urban anomalies at an early stage is important to minimize the adverse effects. However, urban anomaly detection is difficult due to two challenges: a) the criteria for urban anomalies vary with location and time; b) urban anomalies of different types may show different signs. In this paper, we propose a decomposition approach to address these two challenges. Specifically, we decompose urban dynamics into the normal component and the abnormal component. The normal component is decided solely by spatiotemporal features, while the abnormal component is caused by anomalous events. Then, we extract spatiotemporal features and estimate the normal component accordingly. Finally, we derive the abnormal component to identify anomalies. We evaluate our method using both real-world and synthetic datasets. The results show our method can detect meaningful events and outperforms state-of-the-art anomaly detection methods by a large margin.

IJCAI Conference 2019 Conference Paper

DeepAPF: Deep Attentive Probabilistic Factorization for Multi-site Video Recommendation

  • Huan Yan
  • Xiangning Chen
  • Chen Gao
  • Yong Li
  • Depeng Jin

Existing web video systems recommend videos according to users' viewing history from their own websites. However, since many users watch videos on multiple websites, this approach fails to capture these users' interests across sites. In this paper, we investigate the user viewing behavior across multiple sites based on a large-scale real dataset. We find that user interests comprise a cross-site consistent part and a site-specific part with different degrees of importance. Existing linear matrix factorization recommendation models are limited in modeling such complicated interactions. Thus, we propose a model of Deep Attentive Probabilistic Factorization (DeepAPF) that exploits deep learning to approximate such complex user-video interaction. DeepAPF captures both cross-site common interests and site-specific interests with non-uniform importance weights learned by the attentional network. Extensive experiments show that our proposed model outperforms three state-of-the-art baselines by 17.62%, 7.9% and 8.1%. Our study provides insight into integrating user viewing records from multiple sites via a trusted third party, which yields mutual benefits in video recommendation.

AAAI Conference 2019 Conference Paper

DeepDPM: Dynamic Population Mapping via Deep Neural Network

  • Zefang Zong
  • Jie Feng
  • Kechun Liu
  • Hongzhi Shi
  • Yong Li

Dynamic high resolution data on human population distribution is of great importance for a wide spectrum of activities and real-life applications, but is too difficult and expensive to obtain directly. Therefore, generating fine-scaled population distributions from coarse population data is of great significance. However, there are three major challenges: 1) the complexity in spatial relations between high and low resolution population; 2) the dependence of population distributions on other external information; 3) the difficulty in retrieving temporal distribution patterns. In this paper, we first propose the idea of generating dynamic population distributions in full-time series, then we design dynamic population mapping via deep neural network (DeepDPM), a model that describes both spatial and temporal patterns using coarse data and point of interest information. In DeepDPM, we utilize a super-resolution convolutional neural network (SRCNN) based model to directly map coarse data into higher resolution data, and a time-embedded long short-term memory model to effectively capture the periodic nature of the data and smooth the finer-scaled results from the previous static SRCNN model. We perform extensive experiments on a real-life mobile dataset collected from Shanghai. Our results demonstrate that DeepDPM outperforms previous state-of-the-art methods and a suite of frequent data-mining approaches. Moreover, DeepDPM breaks through the limitation of previous works in the time dimension so that dynamic predictions in all-day time slots can be obtained.

AAAI Conference 2019 Conference Paper

DeepSTN+: Context-Aware Spatial-Temporal Neural Network for Crowd Flow Prediction in Metropolis

  • Ziqian Lin
  • Jie Feng
  • Ziyang Lu
  • Yong Li
  • Depeng Jin

Crowd flow prediction is of great importance in a wide range of applications, from urban planning and traffic control to public safety. It aims to predict the inflow (the traffic of crowds entering a region in a given time interval) and outflow (the traffic of crowds leaving a region for other places) of each region in the city given the historical flow data. In this paper, we propose DeepSTN+, a deep learning-based convolutional model, to predict crowd flows in the metropolis. First, DeepSTN+ employs the ConvPlus structure to model the long-range spatial dependence among crowd flows in different regions. Further, PoI distributions and time factor are combined to express the effect of location attributes to introduce prior knowledge of the crowd movements. Finally, we propose an effective fusion mechanism to stabilize the training process, which further improves the performance. Extensive experimental results based on two real-life datasets demonstrate the superiority of our model, i.e., DeepSTN+ reduces the error of the crowd flow prediction by approximately 8%∼13% compared with the state-of-the-art baselines.

JBHI Journal 2019 Journal Article

Pattern Classification for Gastrointestinal Stromal Tumors by Integration of Radiomics and Deep Convolutional Features

  • Zhenyuan Ning
  • Jiaxiu Luo
  • Yong Li
  • Shuai Han
  • Qianjin Feng
  • Yikai Xu
  • Wufan Chen
  • Tao Chen

Predicting malignant potential is one of the most critical components of a computer-aided diagnosis system for gastrointestinal stromal tumors (GISTs). These tumors have been studied only on the basis of subjective computed tomography findings. Among various methodologies, radiomics, and deep learning algorithms, specifically convolutional neural networks (CNNs), have recently been confirmed to achieve significant success by outperforming the state-of-the-art performance in medical image pattern classification and have rapidly become leading methodologies in this field. However, the existing methods generally use radiomics or deep convolutional features independently for pattern classification, which tend to take into account only global or local features, respectively. In this paper, we introduce and evaluate a hybrid structure that includes different features selected with radiomics model and CNNs and integrates these features to deal with GISTs classification. The Radiomics model and CNNs are constructed for global radiomics and local convolutional feature selection, respectively. Subsequently, we utilize distinct radiomics and deep convolutional features to perform pattern classification for GISTs. Specifically, we propose a new pooling strategy to assemble the deep convolutional features of 54 three-dimensional patches from the same case and integrate these features with the radiomics features for independent case, followed by random forest classifier. Our method can be extensively evaluated using multiple clinical datasets. The classification performance (area under the curve (AUC): 0.882; 95% confidence interval (CI): 0.816-0.947) consistently outperforms those of independent radiomics (AUC: 0.807; 95% CI: 0.724-0.892) and CNNs (AUC: 0.826; 95% CI: 0.795-0.856) approaches.

IJCAI Conference 2019 Conference Paper

Reinforced Negative Sampling for Recommendation with Exposure Data

  • Jingtao Ding
  • Yuhan Quan
  • Xiangnan He
  • Yong Li
  • Depeng Jin

In implicit feedback-based recommender systems, user exposure data, which record whether or not a recommended item has been interacted with by a user, provide an important clue on selecting negative training samples. In this work, we improve the negative sampler by integrating the exposure data. We propose to generate high-quality negative instances by adversarial training to favour the difficult instances, and by optimizing an additional objective to favour the real negatives in exposure data. However, this idea is non-trivial to implement since the distribution of exposure data is latent and the item space is discrete. To this end, we design a novel RNS method (short for Reinforced Negative Sampler) that generates exposure-alike negative instances through a feature matching technique instead of directly choosing from exposure data. Optimized under the reinforcement learning framework, RNS is able to integrate user preference signals in exposure data and hard negatives. Extensive experiments on two real-world datasets demonstrate the effectiveness and rationality of our RNS method. Our implementation is available at: https://github.com/dingjingtao/ReinforceNS.

IJCAI Conference 2018 Conference Paper

Improving Implicit Recommender Systems with View Data

  • Jingtao Ding
  • Guanghui Yu
  • Xiangnan He
  • Yuhan Quan
  • Yong Li
  • Tat-Seng Chua
  • Depeng Jin
  • Jiajie Yu

Most existing recommender systems leverage the primary feedback data only, such as the purchase records in E-commerce. In this work, we additionally integrate view data into implicit feedback based recommender systems (dubbed as Implicit Recommender Systems). We propose to model the pairwise ranking relations among purchased, viewed, and non-viewed interactions, being more effective and flexible than typical pointwise matrix factorization (MF) methods. However, such a pairwise formulation poses efficiency challenges in learning the model. To address this problem, we design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) learner. Notably, our algorithm can efficiently learn model parameters from the whole user-item matrix (including all missing data), with a rather low time complexity that is dependent on the observed data only. Extensive experiments on two real-world datasets demonstrate that our method outperforms several state-of-the-art MF methods by 10% ∼ 28.4%. Our implementation is available at: https://github.com/dingjingtao/View_enhanced_ALS.

YNICL Journal 2018 Journal Article

Network-specific resting-state connectivity changes in the premotor-parietal axis in writer's cramp

  • Tobias Mantel
  • Tobias Meindl
  • Yong Li
  • Angela Jochim
  • Gina Gora-Stahlberg
  • Jona Kräenbring
  • Maria Berndt
  • Christian Dresel

BACKGROUND: Writer's cramp is a task-specific dystonia impairing writing and sometimes other fine motor tasks. Neuroimaging studies using manifold designs have shown varying results regarding the nature of changes in the disease. OBJECTIVE: To clarify and extend the knowledge of underlying changes by investigating functional connectivity (FC) in intrinsic connectivity networks with putative sensorimotor function at rest in an increased number of study subjects. METHODS: Resting-state functional magnetic resonance imaging with independent component analysis was performed in 26/27 writer's cramp patients/healthy controls, and FC within and between resting state networks with putative sensorimotor function was compared. Additionally, voxel-based morphometry was carried out on the subjects' structural images. RESULTS: Patients displayed increased left- and reduced right-hemispheric primary sensorimotor FC in the premotor-parietal network. Mostly bilaterally altered dorsal/ventral premotor FC, as well as altered parietal FC were observed within multiple sensorimotor networks and showed differing network-dependent directionality. Beyond within-network FC changes and reduced right cerebellar grey matter volume in the structural analysis, the positive between-network FC of the cerebellar network and the basal ganglia network was reduced. CONCLUSIONS: Abnormal resting-state FC in multiple networks with putative sensorimotor function may act as basis of preexisting observations made during task-related neuroimaging. Further, altered connectivity between the cerebellar and basal ganglia network underlines the important role of these structures in the disease.

IJCAI Conference 2017 Conference Paper

Dynamic Programming Bipartite Belief Propagation For Hyper Graph Matching

  • Zhen Zhang
  • Julian McAuley
  • Yong Li
  • Wei Wei
  • Yanning Zhang
  • Qinfeng Shi

Hyper graph matching problems have drawn attention recently due to their ability to embed higher order relations between nodes. In this paper, we formulate hyper graph matching problems as constrained MAP inference problems in graphical models. Whereas previous discrete approaches introduce several global correspondence vectors, we introduce only one global correspondence vector, but several local correspondence vectors. This allows us to decompose the problem into a (linear) bipartite matching problem and several belief propagation sub-problems. Bipartite matching can be solved by traditional approaches, while the belief propagation sub-problem is further decomposed as two sub-problems with optimal substructure. Then a newly proposed dynamic programming procedure is used to solve the belief propagation sub-problem. Experiments show that the proposed methods outperform state-of-the-art techniques for hyper graph matching.

IJCAI Conference 2016 Conference Paper

Collaborative Evolution for User Profiling in Recommender Systems

  • Zhongqi Lu
  • Sinno Jialin Pan
  • Yong Li
  • Jie Jiang
  • Qiang Yang

Accurate user profiling is important for an online recommender system to provide proper personalized recommendations to its users. In many real-world scenarios, the user's interests towards the items may change over time. Therefore, a dynamic and evolutionary user profile is needed. In this work, we come up with a novel evolutionary view of user's profile by proposing a Collaborative Evolution (CE) model, which learns the evolution of user's profiles through the sparse historical data in recommender systems and outputs the prospective user profile of the future. To verify the effectiveness of the proposed model, we conduct experiments on a real-world dataset, which is obtained from the online shopping website of Tencent - www.51buy.com and contains more than 1 million users' shopping records in a time span of more than 180 days. Experimental analyses demonstrate that our proposed CE model can be used to make better future recommendations compared to several state-of-the-art methods.

IJCAI Conference 2015 Conference Paper

Image Feature Learning for Cold Start Problem in Display Advertising

  • Kaixiang Mo
  • Bo Liu
  • Lei Xiao
  • Yong Li
  • Jie Jiang

In online display advertising, state-of-the-art Click Through Rate (CTR) prediction algorithms rely heavily on historical information, and they work poorly on the growing number of new ads without any historical information. This is known as the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task dependent, inflexible and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristic. Extensive experiments on a real world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.

IJCAI Conference 2015 Conference Paper

Weakly Supervised RBM for Semantic Segmentation

  • Yong Li
  • Jing Liu
  • Yuhang Wang
  • Hanqing Lu
  • Songde Ma

In this paper, we propose a weakly supervised Restricted Boltzmann Machines (WRBM) approach to deal with the task of semantic segmentation with only image-level labels available. In WRBM, the hidden nodes are divided into multiple blocks, and each block corresponds to a specific label. Accordingly, semantic segmentation can be directly modeled by learning the mapping from the visible layer to the hidden layer of WRBM. Specifically, based on the standard RBM, we import another two terms to make full use of image-level labels and alleviate the effect of noisy labels. First, we expect the hidden response of each superpixel to be suppressed on the labels outside its parent image-level label set, and a non-image-level label suppression term is formulated to implicitly import the image-level labels as weak supervision. Second, semantic graph propagation is employed to exploit the co-occurrence between visually similar regions and labels. Besides, we deal with the problems of label imbalance and diverse backgrounds by adapting the block size to the label frequency and appending hidden response blocks corresponding to backgrounds, respectively. Extensive experiments on two real-world datasets demonstrate the good performance of our approach compared with some state-of-the-art methods.

AAAI Conference 2014 Conference Paper

Learning Low-Rank Representations with Classwise Block-Diagonal Structure for Robust Face Recognition

  • Yong Li
  • Jing Liu
  • Zechao Li
  • Yangmuzi Zhang
  • Hanqing Lu
  • Songde Ma

Face recognition has been widely studied due to its importance in various applications. However, the case that both training images and testing images are corrupted is not well addressed. Motivated by the success of low-rank matrix recovery, we propose a novel semi-supervised low-rank matrix recovery algorithm for robust face recognition. The proposed method can learn robust discriminative representations for both training images and testing images simultaneously by exploiting the classwise block-diagonal structure. Specifically, low-rank matrix approximation can handle the possible contamination of data. Moreover, the classwise block-diagonal structure is exploited to promote discrimination of representations for robust recognition. The above issues are formulated into a unified objective function and we design an efficient optimization procedure based on augmented Lagrange multiplier method to solve it. Extensive experiments on three public databases are performed to validate the effectiveness of our approach. The strong identification capability of representations with block-diagonal structure is verified.

ICRA Conference 1994 Conference Paper

Fuzzy Optimization-Based Scheduling of Identical Machines with Possible Breakdown

  • Yong Li
  • Peter B. Luh
  • Xiaohong Guan

Underlying each production system are activities fraught with uncertainty; for example, uncertain future demand, machine breakdowns, and processing time estimates for one-of-a-kind parts. These uncertain events can cause any detailed schedule to become outdated, and the effects may propagate throughout the schedule, affecting product delivery dates. Scheduling algorithms considering future uncertainties could improve the quality of schedules and, as a result, smooth the production of the system. As a step towards incorporating uncertainties in the scheduling consideration, this paper presents a fuzzy optimization methodology for scheduling single operation parts on identical machines with possible breakdowns. A fuzzy optimization formulation is first developed. A Lagrangian relaxation technique is used to decompose the problem into part-level subproblems and a fuzzy membership subproblem. The Lagrange multipliers are then updated by using a subgradient method. To evaluate the performance of the resulting algorithm in a dynamic environment, fuzzy simulation is developed. Preliminary testing results show that, with possible machine breakdowns, this algorithm outperforms the deterministic one.