Arrow Research search

Author name cluster

Hao Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

97 papers
2 author rows

Possible papers

97

EAAI Journal 2026 Journal Article

A lightweight multi-window attention transformer for image super-resolution

  • Yuqing Yang
  • Hao Liu
  • Jun Zhang
  • Wenfei Luo
  • Jiaqian Wang
  • Yuxiang Shi
  • Hongxia Deng

In recent years, Transformer-based models have achieved strong performance in image super-resolution (SR). However, their high computational complexity and parameter cost still limit deployment on resource-constrained devices. To better balance efficiency and representation capability, this paper proposes a lightweight Transformer for image super-resolution, termed Multi-Window Attention Transformer for Image Super-Resolution (MWAT-SR), which adopts a hierarchical multi-window attention strategy. In shallow layers, Local Dense Attention (LDA) with small windows is used to preserve local high-frequency details. In deeper layers, larger windows are introduced together with a Hybrid Sparse-Channel Attention (HSCA) mechanism, which combines sparse spatial interaction and channel-wise semantic modeling to enlarge the effective receptive field under controlled computational cost. In addition, a Window-Adaptive Multi-Scale Convolutional Feed-Forward Network (WAMC-FFN) is designed to adjust convolution kernel sizes according to the window scale, thereby enhancing multi-scale texture representation. Experimental results on standard benchmark datasets show that MWAT-SR achieves competitive reconstruction performance across ×2, ×3, and ×4 settings, while maintaining a favorable trade-off between reconstruction quality and computational complexity.
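The hierarchical design above rests on standard window attention: the feature map is split into non-overlapping windows and self-attention is computed independently within each window, so cost scales with window size rather than the full image. A minimal NumPy sketch of that core operation (a generic illustration with arbitrary sizes, not the authors' MWAT-SR code):

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into (num_windows, ws*ws, C) blocks."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_self_attention(x, ws):
    """Plain single-head self-attention computed independently per window."""
    windows = window_partition(x, ws)           # (n, ws*ws, C)
    q = k = v = windows                         # identity projections for brevity
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)    # softmax over each window
    return attn @ v                             # (n, ws*ws, C)

feat = np.random.rand(16, 16, 8)
out = window_self_attention(feat, ws=4)
print(out.shape)  # (16, 16, 8): 16 windows x 16 tokens x 8 channels
```

Small windows in shallow layers keep this cheap; per the abstract, MWAT-SR's LDA and HSCA variants add dense and sparse interaction patterns on top of this basic partitioning.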

AAAI Conference 2026 Conference Paper

Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration

  • Ante Wang
  • Yujie Lin
  • Jingyao Liu
  • Suhang Wu
  • Hao Liu
  • Xinyan Xiao
  • Jinsong Su

Critical thinking is essential for building robust AI systems, preventing them from blindly accepting flawed data or biased reasoning. However, prior work has primarily focused on passive critical thinking, where models simply reject problematic queries without taking constructive steps to address user requests. In this work, we introduce proactive critical thinking, a paradigm where models actively seek missing or clarifying information from users to better resolve their queries. To evaluate this capability, we present GSM-MC and GSM-MCE, two novel benchmarks based on GSM8K for assessing mathematical reasoning under incomplete or misleading conditions. Experiments on Qwen3 and Llama series models show that, while these models excel in traditional reasoning tasks, they struggle with proactive critical thinking, especially the smaller ones. However, we demonstrate that reinforcement learning (RL) can significantly improve this ability. By incorporating heuristic information into the reward function, we achieve substantial gains, boosting Qwen3-1.7B's accuracy from 0.15% to 73.98% on GSM-MC. We hope this work advances models that collaborate more effectively with users in problem-solving through proactive critical thinking.

EAAI Journal 2026 Journal Article

Defect detection of monocrystalline silicon wafers for photovoltaic applications using an improved you only look once version 8 small algorithm

  • Wenbo Bi
  • Xinyu Wang
  • Na Liu
  • Xu Xing
  • Lu Li
  • Hao Liu

Defects on the surface of photovoltaic monocrystalline silicon wafers, such as cracks, corners, and water stains, lead to significant performance degradation and economic losses during manufacturing. To address this, this paper proposes an improved You Only Look Once version 8 small (YOLOv8s) model. The proposed architecture integrates four strategic innovations. First, an Efficient Multi-Scale Convolution (EMSC) module is combined with the Cross-Stage Partial Bottleneck module with two convolutions (C2f) to enhance multi-scale feature extraction capabilities. Second, Spatial Pyramid Pooling-Fast (SPPF) is fused with the Large Separable Kernel Attention (LSKA) module to overcome limitations in processing local details. Third, the DySample dynamic upsampling operator is introduced to maintain a compact model size while effectively improving detection speed. Finally, the Normalized Wasserstein Distance (NWD) is utilized as the loss function to address the sensitivity of the Intersection over Union (IoU) metric to positional deviations, enhancing precision for small targets. Experimental results demonstrate that the improved model, termed Efficient Lightweight Detection Network (ELDN), achieves superior performance on the validation set with a mean Average Precision (mAP) of 92.8%. Notably, it exhibits robust generalization on an independent external test set, attaining a mAP of 92.6%. Validation confirms that YOLOv8s-ELDN consistently outperforms mainstream models. Future research will focus on further optimizing efficiency for deployment on resource-constrained edge devices and addressing defect detection in complex manufacturing environments.
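For context, the NWD metric mentioned above is commonly defined by modeling each box as a 2D Gaussian and mapping the Gaussians' 2-Wasserstein distance to a bounded similarity; unlike IoU, it varies smoothly even when boxes do not overlap. A sketch of the generic formulation (the constant C is a dataset-dependent hyperparameter, and its value here is illustrative; this is not the paper's exact loss code):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is modeled as a 2D Gaussian; C is a dataset-dependent
    scale constant (hyperparameter, value here is illustrative).
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    # Squared 2-Wasserstein distance between the two Gaussians
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)

print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))        # 1.0 for identical boxes
print(nwd((10, 10, 4, 4), (11, 10, 4, 4)) > 0.9)  # True: small shift, smooth penalty
```

The smooth decay is what makes the metric less sensitive than IoU to small positional deviations on tiny targets.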

AAAI Conference 2026 Conference Paper

Edge-Centric Relational Reasoning for 3D Scene Graph Prediction

  • Yanni Ma
  • Hao Liu
  • Yulan Guo
  • Theo Gevers
  • Martin R. Oswald

3D scene graph prediction aims to abstract complex 3D environments into structured graphs consisting of objects and their pairwise relationships. Existing approaches typically adopt object-centric graph neural networks, where relation edge features are iteratively updated by aggregating messages from connected object nodes. However, this design inherently restricts relation representations to pairwise object context, making it difficult to capture high-order relational dependencies that are essential for accurate relation prediction. To address this limitation, we propose a Link-guided Edge-centric relational reasoning framework with Object-aware fusion, namely LEO, which enables progressive reasoning from relation-level context to object-level understanding. Specifically, LEO first predicts potential links between object pairs to suppress irrelevant edges, and then transforms the original scene graph into a line graph where each relation is treated as a node. A line graph neural network is applied to perform edge-centric relational reasoning to capture inter-relation context. The enriched relation features are subsequently integrated into the original object-centric graph to enhance object-level reasoning and improve relation prediction. Our framework is model-agnostic and can be integrated with any existing object-centric method. Experiments on the 3DSSG dataset with two competitive baselines show consistent improvements, highlighting the effectiveness of our edge-to-object reasoning paradigm.
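The line-graph transformation at the heart of LEO is easy to state: every edge (relation) of the scene graph becomes a node, and two such nodes are adjacent exactly when the original edges share an endpoint. A tiny sketch of that generic construction (not the authors' implementation):

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph: each original edge becomes a node; two
    line-graph nodes are connected if the edges share an endpoint."""
    nodes = list(edges)
    lg_edges = [(e1, e2) for e1, e2 in combinations(nodes, 2)
                if set(e1) & set(e2)]
    return nodes, lg_edges

# A path graph 0-1-2-3: its relations become nodes of the line graph.
nodes, lg = line_graph([(0, 1), (1, 2), (2, 3)])
print(len(lg))  # 2: (0,1)-(1,2) share node 1; (1,2)-(2,3) share node 2
```

Message passing on this structure is what lets relation features aggregate context from other relations directly, rather than only from their two endpoint objects.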

JBHI Journal 2026 Journal Article

EISegNet: Enhancing Instrument Segmentation Network via Dual-View Disparity Estimation

  • Yongming Yang
  • Zhaoshuo Diao
  • Ziliang Song
  • Shenglin Zhang
  • Tiancong Liu
  • Chengdong Wu
  • Weiliang Bai
  • Hao Liu

Accurate segmentation of endoscopic instruments is essential in robot-assisted surgery, supporting precise navigation, enhancing safety, and advancing surgical automation. However, this task is challenging due to factors like complex environments, instrument-tissue similarity, and lighting variations. Instruments, due to their material properties, have distinct depth distributions compared to surrounding tissues. This aspect is often overlooked in monocular video segmentation methods. To address this issue, we propose EISegNet, a multi-task framework that prioritizes instrument segmentation with an auxiliary disparity estimation task. The framework integrates an asymmetric cross-attention mechanism to enhance segmentation performance by fusing features from both tasks. Moreover, by leveraging the geometric properties of motion, EISegNet adapts the stereo disparity estimation strategy for dual-view depth estimation, broadening its applicability to various endoscopic surgeries beyond laparoscopic procedures. Furthermore, EISegNet incorporates a Gaussian-weighted loss function to emphasize edge features, which are particularly challenging for disparity estimation. This function reduces overall loss and improves segmentation accuracy. Extensive cross-dataset experiments demonstrate the superior accuracy and generalization of our method, achieving a 5.97% increase in IoU (Intersection over Union). Qualitative evaluations on clinical datasets further demonstrate its promising performance in real-world scenarios.

AAAI Conference 2026 Conference Paper

Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

  • Tianrui Song
  • Wen-Shuo Chao
  • Hao Liu

Implicit feedback, employed in training recommender systems, inevitably contains noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their divergent data patterns, such as higher loss values, and to mitigate their influence through sample dropping or reweighting. However, we observed that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue. Such confusion is problematic because hard samples are vital for modeling user preferences. To solve this problem, we propose the LLMHNI framework, which leverages two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard and noisy samples. LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings, which is used in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project LLM-encoded embeddings, originally intended for general language tasks, into a representation space optimized for user-item relevance modeling. LLMHNI also exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising via cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.

AAAI Conference 2026 Conference Paper

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

  • Xin Xu
  • Hao Liu
  • Wei Liu
  • Wei Wang
  • Jiayi Wu
  • Kui Jiang

The Low-Light Image Enhancement (LLIE) task aims to improve contrast while restoring details and textures in images captured under low-light conditions. The HVI color space has driven significant progress in this task by enabling precise decoupling of chrominance and luminance. However, in the interaction between the chrominance and luminance branches, the substantial distributional differences between the two branches that are prevalent in natural images limit complementary feature extraction, and luminance errors propagate to the chrominance channels through the nonlinear parameter. Furthermore, in the interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between the branches due to their concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in weakly correlated regions. We therefore propose an Inter-Chrominance and Luminance Interaction (ICLR) framework comprising a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information along two dimensions, fusion and enhancement, respectively. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance of the chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.

EAAI Journal 2026 Journal Article

Inverse compensation and adaptive fuzzy integral sliding-mode control for the underactuated soft massage physiotherapy robot

  • Zixin Huang
  • Chengsong Yu
  • Junjie Lu
  • Hao Liu
  • Peng Huang

Acupoint massage physiotherapy is an effective method for preventing and treating disease. Soft robotics technology is thriving and has potential applications in the field of acupoint massage physiotherapy. A soft massage physiotherapy robot (SMPR) uses soft robotics technology to perform the acupoint massage physiotherapy function. In this paper, an SMPR consisting of a wearable armor and several pneumatic physiotherapy actuators (PPAs) is designed and fabricated. To describe the complex hysteresis behavior of the SMPR, a dynamic model of its PPA is established and identified, comprising two parts: a linear model and an asymmetric Prandtl–Ishlinskii hysteresis (APIH) model. An inverse compensator is then designed to compensate for the hysteresis behavior of the SMPR based on the APIH model, yielding an approximately linearized system. Then, by means of an artificial intelligence method, a fuzzy approximator is designed to approximate the control system's lumped uncertainty, which includes external disturbances, modeling errors, and parameter perturbations. Further, an adaptive fuzzy integral sliding-mode control (AFISMC) scheme is employed to handle this lumped uncertainty. Moreover, based on the back-stepping control method, a nominal controller is designed to control the approximately linearized system. By combining the inverse compensator, fuzzy approximator, AFISMC, and nominal controller, control of the SMPR is realized and the acupoint massage physiotherapy can be regulated accurately. The stability of the control system is theoretically demonstrated. Finally, experimental results from multiple test scenarios conclusively demonstrate the efficacy and trajectory tracking capability of the developed control strategy.

AAAI Conference 2026 Conference Paper

Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation

  • Hefei Xu
  • Le Wu
  • Chen Cheng
  • Hao Liu

With the rapid advancement of large language models (LLMs), aligning them with human values for safety and ethics has become a critical challenge. This problem is especially challenging when multiple, potentially conflicting human values must be considered and balanced. Although several variants of existing alignment methods (such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO)) have been proposed to address multi-value alignment, they suffer from notable limitations: 1) they are often unstable and inefficient in multi-value optimization; and 2) they fail to effectively handle value conflicts. As a result, these approaches typically struggle to achieve optimal trade-offs when aligning multiple values. To address this challenge, we propose a novel framework called Multi-Value Alignment (MVA). It mitigates alignment degradation caused by parameter interference among diverse human values by minimizing their mutual information. Furthermore, we propose a value extrapolation strategy to efficiently explore the Pareto frontier, thereby constructing a set of LLMs with diverse value preferences. Extensive experiments demonstrate that MVA consistently outperforms existing baselines in aligning LLMs with multiple human values.
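The value extrapolation idea can be illustrated as simple weight-space arithmetic: moving a model's parameters along the direction from a base model to a value-aligned model, with coefficients above 1 pushing past the aligned endpoint. A toy sketch (the paper's mutual-information decorrelation step is omitted; names and values here are illustrative):

```python
import numpy as np

def extrapolate(theta_base, theta_aligned, lam):
    """Weight-space extrapolation along the alignment direction.

    lam in (0, 1) interpolates between the two models; lam > 1
    extrapolates beyond the aligned model. This is a generic sketch,
    not the paper's full MVA procedure.
    """
    return theta_base + lam * (theta_aligned - theta_base)

base = np.array([0.0, 0.0])
aligned = np.array([1.0, 0.5])   # parameters fine-tuned toward one value
# Sweeping lam produces a family of models along one value direction.
frontier = [extrapolate(base, aligned, lam) for lam in (0.5, 1.0, 1.5)]
print(frontier[2])  # lam=1.5 pushes past the aligned model
```

Sweeping such coefficients per value direction is one cheap way to populate candidate trade-off points without retraining each one, which is the spirit of exploring the Pareto frontier described above.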

EAAI Journal 2026 Journal Article

Noise-aware dynamic graph recalibration for interpretable aero-engine anomaly detection

  • Haiyan Zhang
  • Youchao Sun
  • Honglan Wu
  • Hao Liu

Early detection of anomalies in aero-engines is critical to flight safety. In practical operational environments, however, dynamic noise interference distorts the topological structure of sensor networks, undermining the reliability of feature propagation in graph neural networks. Existing models lack dynamic graph optimization capabilities under noisy conditions and offer limited interpretability, as they fail to explicitly model the relationship between noise intensity and graph structural evolution. To overcome these limitations, this paper introduces a noise-guided framework for dynamic graph threshold recalibration. Specifically, the framework incorporates a noise-aware graph recalibrator that statistically infers noise levels from feature dispersion to dynamically adjust connection thresholds, and a feature fidelity convolutional layer that gates neighborhood aggregation to prevent noise accumulation and mitigate feature degradation. Experiments on public datasets and real aero-engine operational data demonstrate that the proposed framework significantly outperforms state-of-the-art methods in detection accuracy and noise resilience. Quantitative and visualization analyses confirm its noise-aware characteristics, yielding interpretable edge selection and establishing a rigorous causal link between internal dynamic thresholds and external physical interference. Validation on real-world data further substantiates the framework's potential for practical engineering applications in early anomaly detection.

AAAI Conference 2026 Conference Paper

RecCocktail: A Generalizable and Efficient Framework for LLM-Based Recommendation

  • Min Hou
  • Chenxi Bai
  • Le Wu
  • Hao Liu
  • Kai Zhang
  • Weiwen Liu
  • Richang Hong
  • Ruiming Tang

Large Language Models (LLMs) have achieved remarkable success in recent years, owing to their impressive generalization capabilities and rich world knowledge. To capitalize on the potential of using LLMs as recommender systems, mainstream approaches typically focus on two paradigms. The first paradigm designs multi-domain or multi-task instruction data for generalizable recommendation, so as to align LLMs with general recommendation areas and deal with cold-start recommendation. The second paradigm focuses on enhancing domain-specific recommendation tasks, improving performance in warm recommendation scenarios. While most previous works treat these two paradigms separately, we argue that they have complementary advantages, and combining them can yield better results. In this paper, we propose a generalizable and efficient LLM-based recommendation framework, RecCocktail. Our approach begins with fine-tuning a "base spirit" LoRA module using domain-general recommendation instruction data to align the LLM with recommendation knowledge. Next, given user behavior in a specific domain, we construct a domain-specific "ingredient" LoRA module. We then provide an entropy-guided adaptive merging method to mix the "base spirit" and the "ingredient" in the weight space. Notably, RecCocktail combines the advantages of the two existing paradigms without introducing additional time or space overhead during the inference phase. Moreover, RecCocktail is efficient and plug-and-play: the "base spirit" LoRA is trained only once, and any domain-specific "ingredient" can be mixed in with only domain-specific fine-tuning. Extensive experiments on multiple datasets under both warm and cold-start recommendation scenarios validate the effectiveness and generality of the proposed RecCocktail.
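Weight-space merging of LoRA modules, the mechanism behind the "cocktail" metaphor, can be sketched as adding each low-rank product into the frozen base weight with a mixing coefficient. In the sketch below the coefficients are fixed placeholders; the paper derives them with an entropy-guided rule that is not reproduced here:

```python
import numpy as np

def merge_loras(W0, loras, lambdas):
    """Merge LoRA modules into a frozen base weight in weight space.

    Each LoRA is a (B, A) pair of shapes (d, r) and (r, d); lambdas are
    mixing coefficients (stand-ins for the paper's entropy-guided rule).
    The merged matrix has the same shape as W0, so inference cost is
    unchanged.
    """
    W = W0.copy()
    for (B, A), lam in zip(loras, lambdas):
        W += lam * (B @ A)          # low-rank update folded into the weight
    return W

d, r = 8, 2
W0 = np.random.randn(d, d)
base_spirit = (np.random.randn(d, r), np.random.randn(r, d))
ingredient = (np.random.randn(d, r), np.random.randn(r, d))
W = merge_loras(W0, [base_spirit, ingredient], lambdas=[1.0, 0.5])
print(W.shape)  # (8, 8): no extra parameters at inference time
```

Because the merge happens once, offline, serving the merged model costs exactly the same as serving the base model, consistent with the abstract's no-overhead claim.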

EAAI Journal 2026 Journal Article

The accidental explosion tracing model of architecture glass damage based on shuffle attention

  • Hao Liu
  • Zhen Qing Wang
  • Shuai Qin
  • Qiang Zhao
  • Lei Zhang

Explosion tracing is the basis of hazard and risk analysis for accidental explosions. Accidental explosion damage-effects data have typical multi-source heterogeneous characteristics. Data fusion can aggregate the redundant or complementary information from multiple sensors to obtain more complete information. A machine learning model with shuffle attention for tracing accidental explosions from architectural glass damage is presented, based on experimental and simulated data of tempered glass plates under blast loading. Two branch models were established through multi-source data fusion: a blast-wave propagation tracing model (Process-model) and a glass-plate dynamic response tracing model (Response-model). A final decision tracing model (Decision-model) was then constructed at the decision level. The Mean Absolute Percentage Error (MAPE) of the Decision-model was reduced to 0.1605 compared with the two branch models. Data from three different positions on the glass plate were used in the tracing process; the results indicated that data from the peripheral area showed the largest error. Considering the incompleteness of real explosion-accident investigation data, an accidental explosion verification test with emulsion explosive was carried out to verify the model's real-world applicability. The MAPE on the measured, imperfect dataset is 0.2423. The results show that the tracing model maintains its prediction accuracy even when part of the input data is missing in practical applications, providing a reliable analysis tool for accidental explosion risk assessment.

EAAI Journal 2026 Journal Article

The research on the diagnostic technology for aortic dissection and acute myocardial infarction based on Raman and infrared spectroscopy combined with multimodal deep learning

  • Lei Yan
  • Guangyao Ma
  • Cheng Chen
  • Chen Chen
  • Jing Tao
  • Xuguang Zhou
  • Ting Tian
  • Hao Liu

Background: Aortic dissection and myocardial infarction are two common and life-threatening cardiovascular emergencies characterized by sudden onset, high mortality, and overlapping clinical symptoms such as chest pain and respiratory distress, which make accurate and timely clinical differentiation particularly challenging. Current mainstream diagnostic techniques, including computed tomography and transesophageal echocardiography, provide valuable anatomical and functional information but are often costly, time-consuming, and insensitive to early-stage biochemical alterations, which may result in missed or incorrect diagnoses in emergency settings. Aortic dissection often requires immediate repair of the damaged vessel to prevent further expansion or rupture, whereas myocardial infarction requires rapid restoration of blood flow to the myocardium. The treatment approaches for the two conditions are distinct, and misdiagnosis can have severe consequences. Therefore, more convenient, rapid, and efficient diagnostic methods are urgently needed. Methods: Vibrational spectroscopy is a noninvasive analytical technique with high sensitivity to molecular and biochemical changes in biological samples; Raman spectroscopy and infrared spectroscopy target distinct molecular vibrational modes, providing complementary pathological information. In this study, a multimodal attention fusion network was developed to integrate Raman and infrared spectroscopy data for rapid disease classification. Results: Experimental results demonstrated that the proposed method achieved a diagnostic accuracy of 94.06% and a specificity of 97.03% in distinguishing aortic dissection, myocardial infarction, and non-critical cases. Conclusion: This method provides an innovative and efficient decision-support tool for the clinical differentiation of aortic dissection and myocardial infarction, offering significant clinical value.

EAAI Journal 2025 Journal Article

A single-cell RNA sequencing data imputation method based on non-negative matrix factorization and multi-kernel similarity network fusion

  • Pei Liu
  • Cheng Chen
  • Hao Liu
  • Jin Gu
  • Xinya Chen
  • Ying Su
  • Zhiyuan Cheng
  • Xiaoyi Lv

Artificial intelligence-based single-cell RNA sequencing (scRNA-seq) technology is widely used in cell type identification and disease research, but its data often contain a large number of missing values and zero values due to technical limitations and biological differences. These zero values not only affect downstream analysis, but also make it difficult to distinguish technical zero values from biological zero values. Therefore, this paper proposes, for the first time, a scRNA-seq data imputation method (sc-MKNMF) based on non-negative matrix factorization and multi-kernel similarity network fusion. This method improves the accuracy of cell clustering by accurately filling some zero values. First, sc-MKNMF uses gene-cell dual-level analysis to distinguish technical zero values from biological zero values, and then computes multi-kernel fused similarity networks for genes and cells respectively. Next, the method uses non-negative matrix factorization combined with the similarity networks to construct the objective function, and introduces sparse regularization terms to preserve the similarity between genes and cells and improve stability. In addition, sc-MKNMF is equipped with an efficient optimization algorithm that promotes convergence by iteratively updating the objective function. Finally, verification and comparative experiments on 12 scRNA-seq datasets show that sc-MKNMF outperforms other advanced data imputation methods. Moreover, extending sc-MKNMF to two further tasks, cell trajectory inference and differentially expressed gene analysis, showed significant improvement and excellent versatility.

NeurIPS Conference 2025 Conference Paper

A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

  • Jitai Hao
  • Qiang Huang
  • Hao Liu
  • Xinyan Xiao
  • Zhaochun Ren
  • Jun Yu

Training high-performing Small Language Models (SLMs) remains computationally expensive, even with knowledge distillation and pruning from larger teacher models. Existing approaches often face three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these challenges, we introduce Low-Rank Clone (LRC), an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. LRC trains a set of low-rank projection matrices that jointly enable soft pruning by compressing teacher weights, and activation clone by aligning student activations, including FFN signals, with those of the teacher. This unified design maximizes knowledge transfer while removing the need for explicit alignment modules. Extensive experiments with open-source teachers such as Llama-3.2-3B-Instruct and Qwen2.5-3B/7B-Instruct show that LRC matches or surpasses the performance of state-of-the-art models trained on trillions of tokens, while using only 20B tokens, achieving over 1,000× greater training efficiency. Our code and model checkpoints are available at https://github.com/CURRENTF/LowRankClone and https://huggingface.co/JitaiHao/LRC-4B-Base.

IROS Conference 2025 Conference Paper

Automated UAV-based Wind Turbine Blade Inspection: Blade Stop Angle Estimation and Blade Detail Prioritized Exposure Adjustment

  • Yichuan Shi
  • Hao Liu
  • Haowen Zheng
  • Haowen Yu
  • Xianqi Liang
  • Jie Li
  • Minmin Ma
  • Ximin Lyu

Unmanned aerial vehicles (UAVs) are critical in the automated inspection of wind turbine blades. Nevertheless, several issues persist in this domain. Firstly, existing inspection platforms encounter challenges in meeting the demands of automated inspection tasks and scenarios. Moreover, current blade stop angle estimation methods are vulnerable to environmental factors, restricting their robustness. Additionally, there is an absence of real-time blade detail prioritized exposure adjustment during capture, where lost details cannot be restored through post-optimization. To address these challenges, we introduce a platform and two approaches. Initially, a UAV inspection platform is presented to meet the automated inspection requirements. Subsequently, a Fermat point based blade stop angle estimation approach is introduced, achieving higher precision and success rates. Finally, we propose a blade detail prioritized exposure adjustment approach to ensure appropriate brightness and preserve details during image capture. Extensive tests, comprising over 120 flights across 10 wind turbine models in 5 operational wind farms, validate the effectiveness of the proposed approaches in enhancing inspection autonomy.

NeurIPS Conference 2025 Conference Paper

Bag of Tricks for Inference-time Computation of LLM Reasoning

  • Fan Liu
  • Wen-Shuo Chao
  • Naiqiang Tan
  • Hao Liu

With the advancement of large language models (LLMs), solving complex tasks (e.g., math problems, code generation) has garnered increasing attention. Inference-time computation methods (e.g., Best-of-N, MCTS) are of significant importance, as they have the potential to enhance the reasoning capabilities of LLMs without requiring additional training computation. However, due to the inherent challenges of this technique, most existing methods remain proof-of-concept and are not yet sufficiently effective. In this paper, we investigate and benchmark strategies for improving inference-time computation across a wide range of reasoning tasks. Since most current methods rely on a pipeline that first generates candidate solutions (e.g., chain-of-thought candidate solutions) and then selects among them based on specific reward signals (e.g., RLHF reward, process reward), our research focuses on strategies for both candidate solution generation (e.g., instructing prompts; hyperparameters such as temperature and top-p) and reward mechanisms (e.g., self-evaluation, reward types). The experimental results reveal that several previously overlooked strategies can be critical to the success of inference-time computation (e.g., simply tuning the temperature can improve general reasoning task performance by up to 5%). Based on extensive experiments (more than 20,000 A100-80G GPU hours with over 1,000 experiments) across a variety of models (e.g., the Llama, Qwen, and Mistral families) of various sizes, our proposed strategies outperform the baseline by a substantial margin in most cases, providing a stronger foundation for future research.
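As a concrete instance of the generate-then-select pipeline described above, Best-of-N reduces to sampling several candidates and keeping the one a reward function prefers. A minimal sketch with stand-in sampler and reward (the real setting uses an LLM and a learned or self-evaluated reward model):

```python
import random

def best_of_n(generate, reward, prompt, n=8, temperature=0.7):
    """Best-of-N: sample n candidates, return the one with the highest reward.

    `generate` and `reward` are stand-ins for an LLM sampler and a
    reward model (e.g. self-evaluation or a process reward model).
    """
    candidates = [generate(prompt, temperature) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: candidates are numbers; the reward prefers values near 42.
random.seed(0)
gen = lambda prompt, t: random.gauss(40, 5 * t)
rew = lambda c: -abs(c - 42)
best = best_of_n(gen, rew, "solve x", n=16)
print(abs(best - 42) < 5)  # True: the best of 16 samples lands near the target
```

The temperature and n knobs in this sketch are exactly the kind of generation-side hyperparameters the paper benchmarks.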

NeurIPS Conference 2025 Conference Paper

BlockScan: Detecting Anomalies in Blockchain Transactions

  • Jiahao Yu
  • Xian Wu
  • Hao Liu
  • Wenbo Guo
  • Xinyu Xing

We propose BlockScan, a customized Transformer for anomaly detection in blockchain transactions. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models (LLMs), BlockScan introduces a series of customized designs to effectively model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, texts, and numbers. We design a novel modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language modeling mechanism for pretraining the Transformer architecture, incorporating RoPE embedding and FlashAttention for handling longer sequences. Finally, we design a novel anomaly detection method based on the model outputs. We further provide theoretical analysis for the detection method of our system. Extensive evaluations on Ethereum and Solana transactions demonstrate BlockScan's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockScan is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieved very low or zero detection recall scores. This work sets a new benchmark for applying Transformer-based approaches in blockchain data analysis.
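A common way a masked-LM-style detector scores a sequence is by the model's negative log-likelihood over its tokens: transactions the pretrained model finds improbable score high. A toy sketch of that scoring step (illustrative probabilities and threshold; not BlockScan's exact detection method):

```python
import numpy as np

def anomaly_score(token_probs):
    """Average negative log-likelihood of a transaction's tokens.

    Anomalous transactions contain tokens the model assigns low
    probability, so they receive a high score. (Generic masked-LM-style
    scoring, standing in for BlockScan's detection method.)
    """
    return -np.mean(np.log(np.asarray(token_probs)))

def detect(transactions_probs, threshold=2.0):
    """Flag transactions whose score exceeds a calibrated threshold."""
    return [anomaly_score(p) > threshold for p in transactions_probs]

normal = [0.9, 0.8, 0.85, 0.9]       # the model expects every token
suspicious = [0.9, 0.01, 0.8, 0.02]  # two highly unexpected tokens
print(detect([normal, suspicious]))  # [False, True]
```

In practice the threshold would be calibrated on held-out benign transactions to keep the false positive rate low, which is the trade-off the abstract emphasizes.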

JMLR Journal 2025 Journal Article

Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

  • Hao Liu
  • Jiahui Cheng
  • Wenjing Liao

Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularities. In this paper, we explore a different angle: how deep neural networks can adapt to varying degrees of smoothness in functions and nonuniform data distributions across different locations and scales. More precisely, we focus on a broad class of functions defined by nonlinear tree-based approximation methods. This class encompasses a range of function types, such as functions with uniform regularities and discontinuous functions. We develop nonparametric approximation and estimation theories for this class using deep ReLU networks. Our results show that deep neural networks are adaptive to the nonuniform smoothness of functions and nonuniform data distributions at different locations and scales. We apply our results to several function classes, and derive the corresponding approximation and generalization errors. The validity of our results is demonstrated through numerical experiments.

AAAI Conference 2025 Conference Paper

Erase Then Rectify: A Training-Free Parameter Editing Approach for Cost-Effective Graph Unlearning

  • Zhe-Rui Yang
  • Jindong Han
  • Chang-Dong Wang
  • Hao Liu

Graph unlearning, which aims to eliminate the influence of specific nodes, edges, or attributes from a trained Graph Neural Network (GNN), is essential in applications where privacy, bias, or data obsolescence is a concern. However, existing graph unlearning techniques often necessitate additional training on the remaining data, leading to significant computational costs, particularly with large-scale graphs. To address these challenges, we propose a two-stage training-free approach, Erase then Rectify (ETR), designed for efficient and scalable graph unlearning while preserving the model utility. Specifically, we first build a theoretical foundation showing that masking parameters critical for unlearned samples enables effective unlearning. Building on this insight, the Erase stage strategically edits model parameters to eliminate the impact of unlearned samples and their propagated influence on intercorrelated nodes. To further ensure the GNN's utility, the Rectify stage devises a gradient approximation method to estimate the model's gradient on the remaining dataset, which is then used to enhance model performance. Overall, ETR achieves graph unlearning without additional training or full training data access, significantly reducing computational overhead and preserving data privacy. Extensive experiments on seven public datasets demonstrate the consistent superiority of ETR in model utility, unlearning efficiency, and unlearning effectiveness, establishing it as a promising solution for real-world graph unlearning challenges.

EAAI Journal 2025 Journal Article

Fabric defect detection via Explicit De-Background

  • Yuntao Chen
  • Hao Liu
  • Jiuzhen Liang

Fabric defect detection under complex backgrounds faces challenges like high interference, false positives, and lack of robustness. To address these issues, a frequency-domain Explicit De-Background method is proposed to separate background from defects and enhance defect focus. The network uses a pixel-level De-background Layer to suppress noise and emphasize defects after extracting multi-scale features with Swin Transformer. This layer includes the Frequency-domain Background Extraction Module (F-BEM) and the Background Suppression Unit (BSU): F-BEM leverages Fourier amplitude to capture global background features, while BSU suppresses them to produce a difference map that accentuates defect regions. The De-Background Attention (DBA) module leverages the difference map as a weighting matrix to enhance spatial focus on defect features while minimizing background interference. To integrate multi-scale information, the Feature Cross-Shrinking Decoder (FCSD) progressively fuses adjacent layers via Cross Aggregation Nodes (CAN), ensuring semantic consistency, reducing redundancy, and mitigating information loss and gradient vanishing for precise defect segmentation. Our method enhances robustness and accuracy in complex backgrounds using explicit background separation and multi-stage feature processing. It surpasses state-of-the-art techniques on fabric defect datasets and shows good generalization in complex and transfer learning scenarios, providing a practical solution for industrial defect detection.

NeurIPS Conference 2025 Conference Paper

Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

  • Fan Liu
  • Jindong Han
  • Tengfei Lyu
  • Weijia Zhang
  • Zherui Yang
  • Lu Dai
  • Cancheng Liu
  • Hao Liu

Foundation models (FMs), such as GPT-4 and AlphaFold, are reshaping the landscape of scientific research. Beyond accelerating tasks such as hypothesis generation, experimental design, and result interpretation, they prompt a more fundamental question: Are FMs merely enhancing existing scientific methodologies, or are they redefining the way science is conducted? In this paper, we argue that FMs are catalyzing a transition toward a new scientific paradigm. We introduce a three-stage framework to describe this evolution: (1) Meta-Scientific Integration, where FMs enhance workflows within traditional paradigms; (2) Hybrid Human-AI Co-Creation, where FMs become active collaborators in problem formulation, reasoning, and discovery; and (3) Autonomous Scientific Discovery, where FMs operate as independent agents capable of generating new scientific knowledge with minimal human intervention. Through this lens, we review current applications and emerging capabilities of FMs across existing scientific paradigms. We further identify risks and future directions for FM-enabled scientific discovery. This position paper aims to support the scientific community in understanding the transformative role of FMs and to foster reflection on the future of scientific discovery.

EAAI Journal 2025 Journal Article

High-order graph convolutional networks for circular Ribonucleic Acid and disease association prediction incorporating multiple biological relationships

  • Hao Liu
  • Chen Chen
  • Xiaoyi Lv
  • Jin Gu
  • Enguang Zuo
  • Chenjie Chang
  • Ying Su
  • Cheng Chen

Background: The search for circular Ribonucleic Acid (circRNA) associated with complex diseases holds considerable importance for disease diagnosis, treatment and research, helping to improve the early recognition and therapeutic efficacy of diseases, deepen the understanding of disease mechanisms, and provide guidance for new drug development. Methods: This study presents an innovative high-order graph convolutional neural network, which leverages Gaussian kernels to compute the second-order proximity between nodes, thereby capturing long-range dependencies more effectively. Based on the topological structure of nodes in the graph, the model derives high-order embeddings, which not only enhance the preservation of the global network structure but also overcome the limitations of traditional methods that focus solely on local neighborhoods. Furthermore, by integrating this model with heterogeneous networks composed of multiple biological relationships, we successfully implement accurate predictions of circRNA-disease associations. Results: This study achieved an area under the curve (AUC) of 0.9491 and an accuracy of 0.9920 on the constructed benchmark dataset, significantly outperforming existing methods in predictive performance, while most of the candidate circRNAs screened in the case studies of breast neoplasms and glioma have been confirmed in the literature. Conclusions: This method provides a new perspective for integrating heterogeneous biological data in the study of complex disease-related circRNAs, and will advance further research and practical applications in this field.

ICRA Conference 2025 Conference Paper

In-Pipe Navigation Development Environment and a Smooth Path Planning Method on Pipeline Surface

  • Hao Liu
  • Xiang Li
  • Xiang Zhang
  • Gang Liu
  • Mingquan Lu

Autonomous in-pipe inspection robots can automatically navigate through complex pipeline networks and detect potential risks from corrosion and defects, demonstrating great potential for replacing costly manual inspections. However, to the best of our knowledge, there is no publicly available simulation environment in which researchers can validate their in-pipe navigation algorithms, and navigation algorithms on the constrained 3D pipe surface, the critical software component, are rarely discussed. Firstly, this paper proposes an open-source In-Pipe Navigation Development Environment. It contains various pipeline models, a magnetic wheel climbing robot model realized by the adhesion plugin, and baseline algorithms for navigation tasks. Secondly, a novel and effective path planning method is introduced. Instead of planning based on surface structures, the proposed method plans along the pipeline axis and maps the result into a local path using the Frenet-Serret formula, thereby generating smooth, feasible, and efficient paths. Finally, we conduct both qualitative and quantitative experiments in the proposed simulation and real-world environments. The results show the usability of the development environment, as well as the robustness and efficiency of the proposed planning method.

TIST Journal 2025 Journal Article

LLM-Enhanced User–Item Interactions: Leveraging Edge Information for Optimized Recommendations

  • Xinyuan Wang
  • Liang Wu
  • Liangjie Hong
  • Hao Liu
  • Yanjie Fu

Graph recommendation methods, representing a connected interaction perspective, reformulate user–item interactions as graphs to leverage graph structure and topology for recommendation, and have proven practically effective at scale. Large language models (LLMs), representing a textual generative perspective, excel at modeling user languages, understanding behavioral contexts, capturing user–item semantic relationships, analyzing textual sentiments, and generating coherent and contextually relevant texts as recommendations. However, there is a gap between the connected graph perspective and the text generation perspective, as the task formulations are different. A research question arises: how can we effectively integrate the two perspectives for more personalized RecSys? To fill this gap, we propose to incorporate graph-edge information into LLMs via prompt and attention innovations. We reformulate recommendations as a probabilistic generative problem using prompts. We develop a framework to incorporate graph edge information from the prompt and attention mechanisms for graph-structured LLM recommendations. We develop a new prompt design that brings in both first-order and second-order graph relationships, and we devise an improved LLM attention mechanism to directly embed the spatial and connectivity information of edges. Our evaluation on real-world datasets demonstrates the framework's ability to understand connectivity information in graph data and to improve the relevance and quality of recommendation results. Our code is released at: https://github.com/anord-wang/LLM4REC.git.

NeurIPS Conference 2025 Conference Paper

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

  • Fan Liu
  • Zherui Yang
  • Cancheng Liu
  • Tianrui Song
  • Xiaofeng Gao
  • Hao Liu

Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot.

EAAI Journal 2025 Journal Article

Multi-objective optimization of buckling load and natural frequency in functionally graded porous nanobeams using non-dominated sorting genetic Algorithm-II

  • Hao Liu
  • Ali Basem
  • Dheyaa J. Jasim
  • Mohammad Hashemian
  • S. Ali Eftekhari
  • Halah Jawad Al-fanhrawi
  • Barno Abdullaeva
  • Soheil Salahshour

This study investigates the fundamental natural frequency and critical buckling load of Functionally Graded Porous nanobeams supported by an elastic medium, addressing the need for optimized designs in advanced nanostructures. Utilizing a Genetic Algorithm and Non-Dominated Sorting Genetic Algorithm-II, the research aims to identify the Pareto front for these two objectives while incorporating surface effects. The nanobeam is modeled using Nonlocal Strain Gradient Theory and Gurtin-Murdoch surface elasticity theory, with governing equations solved via the Generalized Differential Quadrature Method based on Reddy's Third-order Shear Deformation Theory. Key input parameters, including temperature gradient, residual surface stress, porosity, and elastic foundation properties, are varied to train two Artificial Neural Networks for output prediction. Results indicate that for the fundamental frequency, significant factors include the material length scale and the Pasternak shear foundation parameter, while the critical buckling load is mainly influenced by the temperature gradient and the same material parameters. These findings provide critical insights for designers, allowing them to make informed decisions based on optimal values for eight input parameters.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

IJCAI Conference 2025 Conference Paper

RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming

  • Hao Wang
  • Jindong Han
  • Wei Fan
  • Leilei Sun
  • Hao Liu

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Codes and Appendix can be found at https://github.com/usail-hkust/REPST.

EAAI Journal 2025 Journal Article

SAMGCN: A spatially-augmented multi-view graph convolutional network for identifying spatial domains

  • Hao Liu
  • Yue Gao
  • Ying-Lian Gao
  • Cui-Na Jiao
  • Junliang Shang
  • Jin-Xing Liu

Recent innovations in spatial transcriptomics have enabled the measurement of gene expression profiles while preserving the spatial organization of cells. This provides extensive opportunities to explore gene expression patterns in the tissue microenvironment. However, it remains a challenge to combine spatial information with gene expression to accurately identify spatial domains. In this study, a spatially-augmented multi-view graph convolutional network for identifying spatial domains (SAMGCN) is proposed. First, SAMGCN reconstructs gene expression data by incorporating spatial neighborhood information, which enhances gene expression features. It improves the quality of gene expression data and augments the characterization of spatial domains through the construction of spatial graphs, feature graphs, and spatial expression-weighted graphs. By extracting spatial information and gene expression data via convolutional operations, SAMGCN learns multi-view-specific embeddings and employs a contrastive strategy to refine and augment spatial neighborhood relationships, addressing limitations in spatial gene expression data. An attention mechanism is then employed to flexibly merge these embeddings, generating the final spot embedding. Additionally, a zero-inflated negative binomial decoder is used to capture the global probability distribution of gene expression profiles. Finally, the performance of SAMGCN has been validated across various platforms and spatial transcriptomics datasets of different scales, demonstrating its exceptional capability to process spatial transcriptomics data.

YNIMG Journal 2025 Journal Article

Structural damage-driven brain compensation among near-centenarians and centenarians without dementia

  • Hui Tang
  • Haichao Zhao
  • Hao Liu
  • Jiyang Jiang
  • Nicole Kochan
  • Jing Jing
  • Henry Brodaty
  • Wei Wen

Compensation has been proposed as a mechanism to explain how individuals in very old age remain able to maintain normal cognitive functioning. Previous studies have provided evidence on the role of increasing functional connectivity as a compensatory mechanism for age-related white matter damage. However, we lack direct investigation into how these mechanisms contribute to the preservation of cognition in the very old population. We examined a cohort of near-centenarians and centenarians without dementia (aged 95-103 years, n=44). We constructed a structural disconnection matrix based on the disruption of white matter pathways caused by white matter hyperintensities (WMHs), aiming to explore the relationship between functional connections, cognitive preservation and white matter damage. Our results revealed that structural damage can reliably explain the variations of functional connections or cognitive maintenance. Notably, we found significant correlations between the weights in the functional connectivity model and the weights in the cognition model. We observed positive correlations between models for brain disconnections and cognitive function in near-centenarians and centenarians. The strongest effects were found between attention and somatomotor network (SMN) (r=0.397, p<0.001), memory and SMN (r=0.333, p<0.001), fluency and visual network (VIS) - control network (CN) (r=0.406, p<0.001), language and VIS (r=0.309, p<0.001), visuospatial ability and VIS-default mode network (DMN) (r=0.464, p<0.001), as well as global cognition and VIS-DMN (r=0.335, p<0.001). These findings suggest that enhancement of functional connectivity may serve as a compensatory mechanism, such that it mitigates the effects of white matter damage and contributes to preserved cognitive performance in very old age.

TIST Journal 2025 Journal Article

Towards Predicting Urban Land Use Changes: A Dynamic Graph Alignment Perspective

  • Yu Fan
  • Xinjiang Lu
  • Hao Liu
  • Pengfei Wang
  • Liang Liu
  • Huadong Ma
  • Jingbo Zhou

Urban land use, intrinsically linked to people’s daily activities, undergoes continuous evolution, presenting a complex interplay that remains partially understood. To bridge this gap, our study leverages fine-grained human mobility data to predict these changes, adopting a novel approach that conceptualizes “community-level” land use shifts as a regression problem and represents citywide changes through dynamic graphs. We harness recent advancements in graph neural networks (GNNs), which, despite their success in various applications, face challenges in directly predicting land use changes due to the temporal mismatch between the slow evolution of urban land and the immediacy of human mobility data. Our research stands out by introducing a temporal skeleton for dynamic GNNs to synchronize human activity graphs with urban land use changes, a dynamic heterogeneous GNN approach for integrating diverse human activity data to capture essential temporal dependencies, and a novel algorithm powered by causal inference to elucidate the primary factors influencing land use predictions at the community level, all of which contribute to a training process informed by the generated causal graph. Empirically validated on three real-world datasets, our model demonstrates a performance leap over state-of-the-art baselines, marking a pivotal step toward understanding and predicting the dynamics of urban land use.

AAAI Conference 2024 Conference Paper

A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction

  • Wenshuo Chao
  • Zhaopeng Qiu
  • Likang Wu
  • Zhuoning Guo
  • Zhi Zheng
  • Hengshu Zhu
  • Hao Liu

The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regard skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the interconnection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.

NeurIPS Conference 2024 Conference Paper

AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making

  • Yizhe Huang
  • Xingbo Wang
  • Hao Liu
  • Fanqi Kong
  • Aoyang Qin
  • Min Tang
  • Song-Chun Zhu
  • Mingjie Bi

Traditional interactive environments limit agents' intelligence growth with fixed tasks. Recently, single-agent environments address this by generating new tasks based on agent actions, enhancing task diversity. We consider the decision-making problem in multi-agent settings, where tasks are further influenced by social connections, affecting rewards and information access. However, existing multi-agent environments lack a combination of adaptive physical surroundings and social connections, hindering the learning of intelligent behaviors. To address this, we introduce AdaSociety, a customizable multi-agent environment featuring expanding state and action spaces, alongside explicit and alterable social structures. As agents progress, the environment adaptively generates new tasks with social structures for agents to undertake. In AdaSociety, we develop three mini-games showcasing distinct social structures and tasks. Initial results demonstrate that specific social structures can promote both individual and collective benefits, though current reinforcement learning and LLM-based algorithms show limited effectiveness in leveraging social structures to enhance performance. Overall, AdaSociety serves as a valuable research platform for exploring intelligence in diverse physical and social settings. The code is available at https://github.com/bigai-ai/AdaSociety.

NeurIPS Conference 2024 Conference Paper

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

  • Zhao Xu
  • Fan Liu
  • Hao Liu

Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we introduced JailTrickBench to evaluate the impact of various attack settings on LLM performance and provide a baseline for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 354 experiments with about 55,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/JailTrickBench.

ICRA Conference 2024 Conference Paper

Bio-Inspired Pupal-Mode Actuator with Ultra-Crossing Capability for Soft Robots

  • Zhenxing Wang
  • Xiao He
  • Yuhang Zhang
  • Cheng Zhang
  • Lei Sun
  • Zhidong Wang
  • Shun Xu
  • Hao Liu

Robot-assisted Natural Orifice Transluminal Endoscopic Surgery (NOTES) represents a paradigm shift in surgical practice, significantly minimizing patient morbidity. However, the variability of inner diameter and the inter-luminal crossing within the luminal tracts pose challenges for effective robotic intervention. Inspired by the motion of the chrysalis during its transformation, we designed an innovative pupal-mode actuator for NOTES robots. Through the manipulation of its internal air chambers, this actuator is capable of replicating wriggle-like movements. Through experimental analysis, we have acquired the constitutive characteristics of this actuator. Subsequently, an innovative gastric endoscopy robot is developed based on this actuator and tested in a phantom. The results of the task simulations substantiate that the pupal-mode actuator has the capability to reduce resistance and enhance the safety of endoscopic intervention.

JMLR Journal 2024 Journal Article

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

  • Hao Liu
  • Haizhao Yang
  • Minshuo Chen
  • Tuo Zhao
  • Wenjing Liao

Learning operators between infinitely dimensional spaces is an important learning task arising in machine learning, imaging science, mathematical modeling and simulations, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate depending on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications and our results give rise to fast rates by exploiting low dimensional structures of data in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and propose a general suggestion on the choice of network structures to maximize the learning efficiency quantitatively.

NeurIPS Conference 2024 Conference Paper

Disentangling Linear Quadratic Control with Untrusted ML Predictions

  • Tongxin Li
  • Hao Liu
  • Yisong Yue

Uncertain perturbations in dynamical systems often arise from diverse resources, represented by latent components. The predictions for these components, typically generated by "black-box" machine learning tools, are prone to inaccuracies. To tackle this challenge, we introduce DISC, a novel policy that learns a confidence parameter online to harness the potential of accurate predictions while also mitigating the impact of erroneous forecasts. When predictions are precise, DISC leverages this information to achieve near-optimal performance. Conversely, in the case of significant prediction errors, it still has a worst-case competitive ratio guarantee. We provide competitive ratio bounds for DISC under both linear mixing of latent variables as well as a broader class of mixing functions. Our results highlight a first-of-its-kind "best-of-both-worlds" integration of machine-learned predictions, thus leading to a near-optimal consistency and robustness tradeoff, which provably improves on what can be obtained without learning the confidence parameter. We validate the applicability of DISC across a spectrum of practical scenarios.
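The confidence-weighted blending that DISC performs can be illustrated with a toy sketch. The function names and the exponential error-to-confidence mapping below are illustrative choices of ours, not the paper's actual online update rule:

```python
import math

def disc_action(u_pred: float, u_robust: float, lam: float) -> float:
    # Convex combination: weight lam on the ML-predicted action,
    # weight (1 - lam) on the worst-case-safe robust action.
    return lam * u_pred + (1.0 - lam) * u_robust

def update_confidence(lam: float, pred_error: float, lr: float = 0.2) -> float:
    # Illustrative online rule: convert the observed prediction error
    # into an accuracy score in (0, 1] and move lam toward it.
    accuracy = math.exp(-pred_error)
    lam = lam + lr * (accuracy - lam)
    return min(1.0, max(0.0, lam))
```

With perfect predictions the confidence drifts toward 1 and the policy tracks the predictions (consistency); with large errors it drifts toward 0 and falls back on the robust controller (robustness), mirroring the trade-off the abstract describes.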

TMLR Journal 2024 Journal Article

Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits

  • Yihong Guo
  • Hao Liu
  • Yisong Yue
  • Anqi Liu

We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits under general covariate shifts. Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution between logging and target data. Central to our methodology is the application of robust regression, a distributionally robust technique tailored here to improve the estimation of the conditional reward distribution from logging data. Utilizing the reward model obtained from robust regression, we develop a comprehensive suite of policy value estimators by integrating our reward model into established evaluation frameworks, namely direct methods and doubly robust methods. Through theoretical analysis, we further establish that the proposed policy value estimators offer a finite-sample upper bound on the bias, providing a clear advantage over traditional methods, especially when the shift is large. Finally, we design an extensive range of policy evaluation scenarios, covering diverse magnitudes of shifts and a spectrum of logging and target policies. Our empirical results indicate that our approach significantly outperforms baseline methods, most notably in 90% of the cases under the policy-shift-only settings and 72% of the scenarios under the general covariate shift settings.
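The doubly robust estimator into which the abstract's reward model is plugged follows a standard form: a model-based value term plus an importance-weighted correction of the model's residual on logged actions. A minimal sketch (variable names are illustrative, not from the paper):

```python
import numpy as np

def doubly_robust_value(r, p_log, p_tgt, q_log, v_tgt):
    """Doubly robust off-policy value estimate from logged bandit data.

    r     : observed rewards for the logged actions
    p_log : logging-policy probabilities of those actions
    p_tgt : target-policy probabilities of the same actions
    q_log : reward-model predictions for the logged actions
    v_tgt : reward-model expected reward under the target policy
    """
    w = p_tgt / p_log  # importance weights
    return float(np.mean(v_tgt + w * (r - q_log)))

# Sanity check: with an exact reward model the correction term vanishes
# and the estimate reduces to the model-based (direct-method) value.
r = np.array([1.0, 0.0, 1.0])
q_log = r.copy()                 # exact model on the logged actions
v_tgt = np.array([0.6, 0.4, 0.8])
est = doubly_robust_value(r, np.full(3, 0.5), np.full(3, 0.25), q_log, v_tgt)
```

The "doubly robust" property is visible here: the estimate is unbiased if either the reward model or the importance weights are correct.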

IJCAI Conference 2024 Conference Paper

Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

  • Buyun He
  • Yingguang Yang
  • Qi Wu
  • Hao Liu
  • Renyu Yang
  • Hao Peng
  • Xiang Wang
  • Yong Liao

Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage the topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks: in reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage their identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates the dynamic nature of social networks. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.

AAAI Conference 2024 Conference Paper

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

  • Hao Liu
  • Xin Li
  • Mingming Gong
  • Bing Liu
  • Yunfei Wu
  • Deqiang Jiang
  • Yinsong Liu
  • Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to identify table structure in machine-readable formats, has received increasing interest in the community. Despite impressive success, most single-table-component-based methods cannot perform well on unregularized table cases distracted by not only complicated inner structure but also exterior capture distortion. In this paper, we raise this as the Complex TSR problem, where the performance degeneration of existing methods is attributable to their inefficient component usage and redundant post-processing. To mitigate it, we shift our perspective from table component extraction toward efficiently leveraging multiple components, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with a newly proposed Component Deliberator, to handle various types of tables in a unified framework. Thanks to its progressive deliberation mechanism, GrabTab can flexibly accommodate most complex tables with reasonable components selected but without complicated post-processing involved. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms the state-of-the-art, especially under more challenging scenes.

NeurIPS Conference 2024 Conference Paper

Harmonizing Visual Text Comprehension and Generation

  • Zhen Zhao
  • Jingqun Tang
  • Binghong Wu
  • Chunhui Lin
  • Shu Wei
  • Hao Liu
  • Xin Tan
  • Zhizhong Zhang

In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space. Slide-LoRA harmonizes the generation of vision and language within a singular model instance, thereby facilitating a more unified generative process. Additionally, we develop a high-quality image caption dataset, DetailedTextCaps-100K, synthesized with a sophisticated closed-source MLLM to enhance visual text generation capabilities further. Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach. Empowered by Slide-LoRA, TextHarmony achieves comparable performance to modality-specific fine-tuning results with only a 2% increase in parameters and shows an average improvement of 2.5% in visual text comprehension tasks and 4.0% in visual text generation tasks. Our work delineates the viability of an integrated approach to multimodal generation within the visual text domain, setting a foundation for subsequent inquiries. Code is available at https://github.com/bytedance/TextHarmony.

ICRA Conference 2024 Conference Paper

Optimal Containment Control of Multiple Quadrotors via Reinforcement Learning

  • Ming Cheng
  • Hao Liu
  • Deyuan Liu
  • Haibo Gu
  • Xiangke Wang

This paper explores the optimal containment control problem for nonlinear and underactuated quadrotors with multiple team leaders governed by nonlinear dynamics, employing reinforcement learning. A cascade controller is formulated, comprising a position control component to ensure containment achievement and an attitude control component to govern the rotational channel. The proposed optimal control protocols are derived from historical data collected from quadrotor systems, without requiring exact knowledge of vehicle dynamics. The simulation illustrates the effectiveness of the proposed controller in managing a quadrotor team with multiple leaders.

ICML Conference 2024 Conference Paper

Position: TrustLLM: Trustworthiness in Large Language Models

  • Yue Huang 0001
  • Lichao Sun 0001
  • Haoran Wang 0005
  • Siyuan Wu 0001
  • Qihui Zhang
  • Yuan Li
  • Chujie Gao
  • Yixin Huang

Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings first show that, in general, trustworthiness and capability (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms like moderators, offering valuable insights for developers in this field. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Beyond these observations, we have uncovered key insights into the multifaceted trustworthiness of LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. We advocate that establishing an AI alliance among industry, academia, and the open-source community to foster collaboration is imperative to advance the trustworthiness of LLMs.

YNIMG Journal 2024 Journal Article

Relationships between brain structure-function coupling in normal aging and cognition: A cross-ethnicity population-based study

  • Chang Liu
  • Jing Jing
  • Jiyang Jiang
  • Wei Wen
  • Wanlin Zhu
  • Zixiao Li
  • Yuesong Pan
  • Xueli Cai

Increased efforts in neuroscience seek to understand how macro-anatomical and physiological connectomes cooperatively work to generate cognitive behaviors. However, the structure-function coupling characteristics of normal aging individuals remain unclear. Here, we developed an index, the Coupling in Brain Structural connectome and Functional connectome (C-BSF) index, to quantify regional structure-function coupling in a large community-based cohort. C-BSF used diffusion tensor imaging (DTI) and resting-state functional magnetic resonance imaging (fMRI) data from the Polyvascular Evaluation for Cognitive Impairment and Vascular Events study (PRECISE) cohort (2007 individuals, age: 61.15 ± 6.49 years) and the Sydney Memory and Ageing Study (MAS) cohort (254 individuals, age: 83.45 ± 4.33 years). We observed that structure-function coupling was the strongest in the visual network and the weakest in the ventral attention network. We also observed that weaker structure-function coupling was associated with increased age and worse cognitive performance. Meanwhile, the structure-function coupling in the visual network was associated with visuospatial performance and partially mediated the connection between age and visuospatial function. This work contributes to our understanding of the underlying brain mechanisms by which aging affects cognition and also helps establish early diagnosis and treatment approaches for neurological diseases in the elderly.

AAAI Conference 2024 Conference Paper

Self-Paced Unified Representation Learning for Hierarchical Multi-Label Classification

  • Zixuan Yuan
  • Hao Liu
  • Haoyi Zhou
  • Denghui Zhang
  • Xiao Zhang
  • Hao Wang
  • Hui Xiong

Hierarchical Multi-Label Classification (HMLC) is a well-established problem that aims at assigning data instances to multiple classes stored in a hierarchical structure. Despite its importance, existing approaches often face two key limitations: (i) They employ dense networks to solely explore the class hierarchy as hard criterion for maintaining taxonomic consistency among predicted classes, yet without leveraging rich semantic relationships between instances and classes; (ii) They struggle to generalize in settings with deep class levels, since the mini-batches uniformly sampled from different levels ignore the varying complexities of data and result in a non-smooth model adaptation to sparse data. To mitigate these issues, we present a Self-Paced Unified Representation (SPUR) learning framework, which focuses on the interplay between instance and classes to flexibly organize the training process of HMLC algorithms. Our framework consists of two lightweight encoders designed to capture the semantics of input features and the topological information of the class hierarchy. These encoders generate unified embeddings of instances and class hierarchy, which enable SPUR to exploit semantic dependencies between them and produce predictions in line with taxonomic constraints. Furthermore, we introduce a dynamic hardness measurement strategy that considers both class hierarchy and instance features to estimate the learning difficulty of each instance. This strategy is achieved by incorporating the propagation loss obtained at each hierarchical level, allowing for a more comprehensive assessment of learning complexity. Extensive experiments on several empirical benchmarks demonstrate the effectiveness and efficiency of SPUR compared to state-of-the-art methods, especially in scenarios with missing features.

NeurIPS Conference 2024 Conference Paper

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

  • Weichao Zhao
  • Hao Feng
  • Qi Liu
  • Jingqun Tang
  • Shu Wei
  • Binghong Wu
  • Lei Liao
  • Yongjie Ye

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model have also been released at https://github.com/zhaowc-ustc/TabPedia.

NeurIPS Conference 2024 Conference Paper

UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

  • Yansong Ning
  • Hao Liu

The urban knowledge graph has recently emerged as a building block for distilling critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite its promising benefits, urban knowledge graph construction (UrbanKGC) still heavily relies on manual effort, hindering its potential advancement. This paper presents UrbanKGent, a unified large language model agent framework for urban knowledge graph construction. Specifically, we first construct the knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with augmented trajectories on the Llama 2 and Llama 3 families, we obtain the UrbanKGC agent family, consisting of the UrbanKGent-7/8/13B versions. We perform a comprehensive evaluation on two real-world datasets using both human and GPT-4 self-evaluation. The experimental results demonstrate that the UrbanKGent family can not only significantly outperform 31 baselines in UrbanKGC tasks, but also surpass the state-of-the-art LLM, GPT-4, by more than 10% with approximately 20 times lower cost. Compared with the existing benchmark, the UrbanKGent family could help construct an UrbanKG with hundreds of times richer relationships using only one-fifth of the data. Our data and code are available at https://github.com/usail-hkust/UrbanKGent.

EAAI Journal 2023 Journal Article

A geometry-aware deep network for depth estimation in monocular endoscopy

  • Yongming Yang
  • Shuwei Shao
  • Tao Yang
  • Peng Wang
  • Zhuo Yang
  • Chengdong Wu
  • Hao Liu

Monocular depth estimation is critical for endoscopists to perform spatial perception and 3D navigation of surgical sites. However, most of the existing methods ignore the important geometric structural consistency, which inevitably leads to performance degradation and distortion of 3D reconstruction. To address this issue, we introduce a gradient loss to penalize ambiguous edge fluctuations around stepped edge structures and a normal loss to explicitly express the sensitivity to frequently occurring small structures, and propose a geometric consistency loss to spread the spatial information across the sample grids to constrain the global geometric anatomy structures. In addition, we develop a synthetic RGB-Depth dataset that captures the anatomical structures under reflections and illumination variations. The proposed method is extensively validated across different datasets and clinical images and achieves mean RMSE values of 0.066 (stomach), 0.029 (small intestine), and 0.139 (colon) on the EndoSLAM dataset. The generalizability of the proposed method achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3) on the ColonDepth dataset. The experimental results show that our method exceeds previous state-of-the-art competitors and generates more consistent depth maps and reasonable anatomical structures. The quality of intraoperative 3D structure perception from endoscopic videos of the proposed method meets the accuracy requirements of video-CT registration algorithms for endoscopic navigation. The dataset and the source code will be available at https://github.com/YYM-SIA/LINGMI-MR.

NeurIPS Conference 2023 Conference Paper

Blockwise Parallel Transformers for Large Context Models

  • Hao Liu
  • Pieter Abbeel

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance.
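The blockwise computation idea in this abstract can be sketched in a few lines. The sketch below blocks only the queries so that a block-sized slice of the attention scores is materialized at a time; the actual BPT also processes keys/values blockwise with a streaming softmax and fuses the feedforward network, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exp
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attention(q, k, v, block=4):
    """Compute attention one query block at a time, so the full
    (n, n) score matrix is never materialized at once."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(0, n, block):
        scores = q[i:i + block] @ k.T / np.sqrt(d)  # (block, n) slice
        out[i:i + block] = softmax(scores) @ v
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
full = softmax(q @ k.T / np.sqrt(16)) @ v          # reference computation
out_blocked = blockwise_attention(q, k, v, block=4)
```

Because softmax is applied row-wise, blocking the queries changes only the order of computation, not the result, which is what makes the memory saving exact rather than approximate.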

NeurIPS Conference 2023 Conference Paper

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

  • Ying Fan
  • Olivia Watkins
  • Yuqing Du
  • Hao Liu
  • Moonkyung Ryu
  • Craig Boutilier
  • Pieter Abbeel
  • Mohammad Ghavamzadeh

Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality. Our code is available at https://github.com/google-research/google-research/tree/master/dpok.

JBHI Journal 2023 Journal Article

Efficient Large-Scale Virtual Screening Based on Heterogeneous Many-Core Supercomputing System

  • Hao Liu
  • Cunji Wang
  • Peng Liu
  • Chengchao Liu
  • Zhuoya Wang
  • Zhiqiang Wei

With the rapid growth of virtual drug databases, the need for efficient molecular docking tools for large-scale screening is also growing. We have developed Vina@QNLM 2.0, a novel molecular docking system that leverages the logical processing units and computational processing arrays of heterogeneous multicore architecture processors. Compared to Vina@QNLM, the new version optimizes docking speed without sacrificing accuracy, greatly improving the scoring capability for large molecules (molecular weight > 500). Simultaneously, the new system provides enhanced support for applications such as reverse target finding through an improved parallel strategy. Vina@QNLM 2.0 achieves a speedup 20 times higher than that obtained using only logical processing units during a single docking process. Additionally, we successfully scaled the reverse target finding task to 122,401 kernel groups with a robust scalability of 80.01%. In practice, we completed reverse target-seeking for nine glycan molecules against 10,094 proteins within 1 hour.

NeurIPS Conference 2023 Conference Paper

Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

  • Jiarui Feng
  • Lecheng Kong
  • Hao Liu
  • Dacheng Tao
  • Fuhai Li
  • Muhan Zhang
  • Yixin Chen

Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) $k$-WL/FWL requires at least $O(n^k)$ space complexity, which is impractical for large graphs even when $k=3$; (2) The design space of $k$-WL/FWL is rigid, with the only adjustable hyper-parameter being $k$. To tackle the first limitation, we propose an extension, $(k, t)$-FWL. We theoretically prove that even if we fix the space complexity to $O(n^2)$ (for any $k \geq 2$) in $(k, t)$-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose $k$-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of $k$-FWL. Combining these two modifications results in a flexible and powerful framework, $(k, t)$-FWL+. We demonstrate that $(k, t)$-FWL+ can implement most existing models with matching expressiveness. We then introduce an instance of $(k, t)$-FWL+ called Neighborhood$^2$-FWL (N$^2$-FWL), which is practically and theoretically sound. We prove that N$^2$-FWL is no less powerful than 3-WL and can encode many substructures while only requiring $O(n^2)$ space. Finally, we design its neural version, named N$^2$-GNN, and evaluate its performance on various tasks. N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%. Moreover, N$^2$-GNN achieves new SOTA results on the BREC dataset (71.8%) among all existing high-expressive GNN methods.

NeurIPS Conference 2023 Conference Paper

Guide Your Agent with Adaptive Multimodal Rewards

  • Changyeon Kim
  • Younggyo Seo
  • Hao Liu
  • Lisa Lee
  • Jinwoo Shin
  • Honglak Lee
  • Kimin Lee

Developing an agent capable of adapting to unseen environments remains a difficult challenge in imitation learning. This work presents Adaptive Return-conditioned Policy (ARP), an efficient framework designed to enhance the agent's generalization ability using natural language task descriptions and pre-trained multimodal encoders. Our key idea is to calculate the similarity between visual observations and natural language instructions in a pre-trained multimodal embedding space (such as CLIP) and use it as a reward signal. We then train a return-conditioned policy using expert demonstrations labeled with multimodal rewards. Because the multimodal rewards provide adaptive signals at each timestep, ARP effectively mitigates goal misgeneralization. This results in superior generalization performance, even when faced with unseen text instructions, compared to existing text-conditioned policies. To improve the quality of the rewards, we also introduce a fine-tuning method for the pre-trained multimodal encoders, further enhancing performance. Video demonstrations and source code are available on the project website: https://sites.google.com/view/2023arp.

NeurIPS Conference 2023 Conference Paper

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

  • Hao Liu
  • Wilson Yan
  • Pieter Abbeel

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of natural language tasks. However, a key limitation is that these language models fundamentally lack grounding in visual perception, a crucial attribute needed to extend to real-world tasks such as visual question answering and robotics. While prior works have largely connected image to text through pretraining or fine-tuning, learning such alignments is generally costly due to a combination of curating massive datasets and large computational burdens. In order to resolve these limitations, we propose a simple yet effective approach called Language-Quantized AutoEncoder (LQAE), a modification of VQ-VAE that learns to align text-image data in an unsupervised manner by leveraging pretrained language model denoisers (e.g., BERT). Our main idea is to encode images as sequences of text tokens by directly quantizing image embeddings using a pretrained language codebook. We then feed a masked version of the quantized embeddings into a BERT to reconstruct the original input. By doing so, LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs. We show LQAE learns text-aligned image tokens that enable few-shot multimodal learning with large language models, outperforming baseline methods in tasks such as image classification and VQA while requiring as few as 1-10 image-text pairs.
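The core quantization step this abstract describes, encoding image embeddings as tokens from a frozen language codebook, reduces to a nearest-neighbor lookup. A minimal sketch with random stand-in data (the codebook, dimensions, and patch construction below are illustrative, not from the paper):

```python
import numpy as np

def quantize_to_codebook(embeddings, codebook):
    """Map each embedding to its nearest codebook row (Euclidean
    distance), yielding token ids a language model could consume
    plus the quantized vectors themselves."""
    # (num_patches, 1, d) - (1, vocab, d) -> pairwise squared distances
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)
    return ids, codebook[ids]

rng = np.random.default_rng(1)
codebook = rng.standard_normal((32, 8))   # stand-in for a frozen LM codebook
# "Patch embeddings" that sit very close to codebook entries 3, 17, and 5:
patches = codebook[[3, 17, 5]] + 0.01 * rng.standard_normal((3, 8))
ids, quantized = quantize_to_codebook(patches, codebook)
```

Because the codebook is frozen, the resulting token ids live in the language model's own vocabulary space, which is what lets a pretrained denoiser like BERT operate on them downstream.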

AAAI Conference 2023 Conference Paper

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

  • Yongxin Zhu
  • Zhen Liu
  • Yukang Liang
  • Xin Li
  • Hao Liu
  • Changcun Bao
  • Linli Xu

In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (STVQA), which requires models to read scene text in images for question answering. Apart from text or visual objects, which can exist independently, scene text naturally links the text and visual modalities together by conveying linguistic semantics while simultaneously being a visual object in an image. Different from conventional STVQA models, which take the linguistic semantics and visual semantics in scene text as two separate features, in this paper we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies these two semantics with the spatial bounding box as a bridge connecting them. Specifically, LTG first locates the region in an image that may contain the answer words with an answer location module (ALM) consisting of a region proposal network and a language refinement network, which can be transformed into each other via a one-to-one mapping through the scene text bounding box. Next, given the answer words selected by ALM, LTG generates a readable answer sequence with an answer generation module (AGM) based on a pre-trained language model. As a benefit of the explicit alignment of the visual and linguistic semantics, even without any scene-text-based pre-training tasks, LTG can boost the absolute accuracy by +6.06% and +6.92% on the TextVQA dataset and the ST-VQA dataset respectively, compared with a non-pre-training baseline. We further demonstrate that LTG effectively unifies visual and text modalities through the spatial bounding box connection, which is underappreciated in previous methods.

NeurIPS Conference 2023 Conference Paper

MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

  • Lecheng Kong
  • Jiarui Feng
  • Hao Liu
  • Dacheng Tao
  • Yixin Chen
  • Muhan Zhang

While Graph Neural Networks (GNNs) have recently become powerful tools in graph learning tasks, considerable effort has been spent on improving GNNs' structural encoding ability. A particular line of work proposed subgraph GNNs that use subgraph information to improve GNNs' expressivity and achieved great success. However, this effectiveness sacrifices the efficiency of GNNs by enumerating all possible subgraphs. In this paper, we analyze the necessity of complete subgraph enumeration and show that a model can achieve a comparable level of expressivity by considering a small subset of the subgraphs. We then formulate the identification of the optimal subset as a combinatorial optimization problem and propose Magnetic Graph Neural Network (MAG-GNN), a reinforcement learning (RL) boosted GNN, to solve the problem. Starting with a candidate subgraph set, MAG-GNN employs an RL agent to iteratively update the subgraphs to locate the most expressive set for prediction. This reduces the exponential complexity of subgraph enumeration to the constant complexity of a subgraph search algorithm while keeping good expressivity. We conduct extensive experiments on many datasets, showing that MAG-GNN achieves competitive performance with state-of-the-art methods and even outperforms many subgraph GNNs. We also demonstrate that MAG-GNN effectively reduces the running time of subgraph GNNs.

JBHI Journal 2023 Journal Article

Multi-Level Constrained Intra and Inter Subject Feature Representation for Facial Video Based BVP Signal Measurement

  • Bin Li
  • Wei Zhang
  • Hong Fu
  • Hao Liu
  • Feng Xu

Facial video-based blood volume pulse (BVP) signal measurement holds great potential for remote health monitoring, but existing methods have issues with convolutional kernel perceptual field constraints. This article proposes an end-to-end multi-level constrained spatiotemporal representation structure for facial video-based BVP signal measurement. First, an intra- and inter-subject feature representation is proposed to strengthen BVP-related feature generation at the high, semantic, and shallow levels, respectively. Second, a global-local association is presented to enhance BVP signal period pattern learning, and the global temporal features are introduced into the local spatial convolution of each frame by adaptive kernel weights. Finally, the multi-dimensional fused features are mapped to one-dimensional BVP signals by the task-oriented signal estimator. The experimental results on the publicly available MMSE-HR dataset demonstrate that the proposed structure outperforms state-of-the-art methods (e.g., AutoHR) in BVP signal measurement, with 20% and 40% reductions in mean absolute error and root mean squared error, respectively. The proposed structure would be a powerful tool for telemedical and non-contact heart health monitoring.

AAAI Conference 2023 Conference Paper

TaCo: Textual Attribute Recognition via Contrastive Learning

  • Chang Nie
  • Yiqing Hu
  • Yanqiu Qu
  • Hao Liu
  • Deqiang Jiang
  • Bo Ren

As textual attributes like font are core design elements of document format and page style, automatic attribute recognition favors comprehensive practical applications. Existing approaches already yield satisfactory performance in differentiating disparate attributes, but they still struggle to distinguish similar attributes with only subtle differences. Moreover, their performance drops severely in real-world scenarios where unexpected and obvious imaging distortions appear. In this paper, we aim to tackle these problems by proposing TaCo, a contrastive framework for textual attribute recognition tailored toward the most common document scenes. Specifically, TaCo leverages contrastive learning to dispel the ambiguity trap arising from vague and open-ended attributes. To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential. Extensive experiments show that TaCo surpasses its supervised counterparts and advances the state of the art remarkably on multiple attribute recognition tasks. Online services of TaCo will be made available.

AAAI Conference 2023 Conference Paper

The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training

  • Hao Liu
  • Xinghua Jiang
  • Xin Li
  • Antai Guo
  • Yiqing Hu
  • Deqiang Jiang
  • Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, following the "mask-and-reconstruct" pipeline of recovering content from a masked image, has recently captured increasing interest in the community, owing to its excellent ability to learn visual representations from unlabeled data. Aiming at learning representations with highly abstracted semantics, one group of works attempts to reconstruct non-semantic pixels with a large-ratio masking strategy, which may suffer from an "over-smoothing" problem, while others directly infuse semantics into the targets in an off-line way that requires extra data. Different from them, we shift the perspective to the Fourier domain, which naturally has a global perspective, and present a new MIM method, termed Geminated Gestalt Autoencoder (Ge^2-AE), for visual pre-training. Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space, where each serves as not only the complement of but also a reciprocal constraint on the other. In this way, more robust representations can be learned in the pre-trained encoders, whose effectiveness is confirmed by the juxtaposed experimental results on downstream recognition tasks. We also conduct several quantitative and qualitative experiments to investigate the learning behavior of our method. To the best of our knowledge, this is the first MIM work to approach visual pre-training through the lens of the frequency domain.

EAAI Journal 2023 Journal Article

U-SMR: U-SwinT & multi-residual network for fabric defect detection

  • Hao Qu
  • Lan Di
  • Jiuzhen Liang
  • Hao Liu

Fabric defect detection methods based on deep networks are widely used in the textile industry, but they often suffer from poor model generalization and blurry edge detection. To resolve these challenges, we propose a novel network called “U-SMR Net”, which integrates global contextual features, defect detail features, and high-level semantic features through the combination of ResNet-50 and Swin Transformer modules. Our U-SMR network includes a lightweight multiscale feature extraction module, the dual-branch pyramid module (DBPM), which is nested to preserve high-resolution, shallow semantic information. We propose a recursive multi-level residual decoding block for multiscale fusion to refine, filter, and enhance input characteristics, generating prediction maps at multiple stages, and employ an improved binary cross-entropy loss function to supervise the saliency maps. Experimental results on four groups from the ZJU-Leaper dataset demonstrate the superior performance of our approach compared to other competitive methods, achieving an average F-measure score of 75.33%; testing results on both the ZJU-Leaper-Total dataset and the HKU-Fabric dataset further support U-SMR Net’s validity and generalization ability.

EAAI Journal 2023 Journal Article

Ultimate bearing capacity prediction method and sensitivity analysis of PBL

  • Yixin Chen
  • Yanke Huang
  • Hao Liu
  • Yongsheng Liu
  • Ting Zhang

The ultimate bearing capacity of the Perfobond leiste (PBL) connector is one of the key parameters for evaluating the bearing capacity and reliability of steel–concrete structures, so it is very important to predict the ultimate bearing capacity of PBL accurately. Based on an improved cuckoo search (CS) algorithm, two prediction models built on the back propagation neural network (BPNN) and the extreme learning machine (ELM) were proposed. The local search ability of the CS algorithm was improved by a triangular mutation operator and a distance-based distributed discovery probability, and the global search ability was improved by a multi-step selection strategy. The weights, thresholds, number of input parameters, and number of hidden-layer nodes of the BPNN and ELM were optimized by the triangular multi-step cuckoo search (TMCS). The comprehensive sensitivity analysis (CSA) method and the Morris sensitivity analysis (MSA) method were used to analyze the sensitivity of six key parameters of PBL: thickness of the perforated steel plate, diameter of the perforated holes, number of perforated holes, diameter of the through reinforcement, yield strength of the through reinforcement, and compressive strength of the concrete. Experimental data from push-out tests in the published literature were selected as samples, and the results show that the proposed TMCS-ELM and TMCS-BPNN algorithms can accurately predict the ultimate bearing capacity of PBL, with average errors of 4.17% and 2.16%, respectively. The sensitivity analysis results show that the compressive strength of concrete has the greatest influence on the bearing capacity of PBL, followed by the yield strength of the through reinforcement.

NeurIPS Conference 2023 Conference Paper

UUKG: Unified Urban Knowledge Graph Dataset for Urban Spatiotemporal Prediction

  • Yansong Ning
  • Hao Liu
  • Hao Wang
  • Zhenyu Zeng
  • Hui Xiong

Accurate Urban SpatioTemporal Prediction (USTP) is of great importance to the development and operation of the smart city. As an emerging building block, multi-sourced urban data are usually integrated as urban knowledge graphs (UrbanKGs) to provide critical knowledge for urban spatiotemporal prediction models. However, existing UrbanKGs are often tailored for specific downstream prediction tasks and are not publicly available, which limits the potential advancement. This paper presents UUKG, the unified urban knowledge graph dataset for knowledge-enhanced urban spatiotemporal predictions. Specifically, we first construct UrbanKGs consisting of millions of triplets for two metropolises by connecting heterogeneous urban entities such as administrative boroughs, POIs, and road segments. Moreover, we conduct qualitative and quantitative analysis on constructed UrbanKGs and uncover diverse high-order structural patterns, such as hierarchies and cycles, that can be leveraged to benefit downstream USTP tasks. To validate and facilitate the use of UrbanKGs, we implement and evaluate 15 KG embedding methods on the KG completion task and integrate the learned KG embeddings into 9 spatiotemporal models for five different USTP tasks. The extensive experimental results not only provide benchmarks of knowledge-enhanced USTP models under different task settings but also highlight the potential of state-of-the-art high-order structure-aware UrbanKG embedding methods. We hope the proposed UUKG fosters research on urban knowledge graphs and broad smart city applications. The dataset and source code are available at https://github.com/usail-hkust/UUKG/.

NeurIPS Conference 2023 Conference Paper

When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation

  • Xinhong Ma
  • Yiming Wang
  • Hao Liu
  • Tianyu Guo
  • Yunhe Wang

Source-free domain adaptive semantic segmentation aims to adapt a pre-trained source model to the unlabeled target domain without accessing the private source data. Previous methods usually fine-tune the entire network, which suffers from expensive parameter tuning. To avoid this problem, we propose to utilize visual prompt tuning for parameter-efficient adaptation. However, the existing visual prompt tuning methods are unsuitable for source-free domain adaptive semantic segmentation due to the following two reasons: (1) Commonly used visual prompts like input tokens or pixel-level perturbations cannot reliably learn informative knowledge beneficial for semantic segmentation. (2) Visual prompts require sufficient labeled data to fill the gap between the pre-trained model and downstream tasks. To alleviate these problems, we propose a universal unsupervised visual prompt tuning (Uni-UVPT) framework, which is applicable to various transformer-based backbones. Specifically, we first divide the source pre-trained backbone with frozen parameters into multiple stages, and propose a lightweight prompt adapter for progressively encoding informative knowledge into prompts and enhancing the generalization of target features between adjacent backbone stages. Cooperatively, a novel adaptive pseudo-label correction strategy with a multiscale consistency loss is designed to alleviate the negative effect of target samples with noisy pseudo labels and raise the capacity of visual prompts to spatial perturbations. Extensive experiments demonstrate that Uni-UVPT achieves state-of-the-art performance on GTA5 → Cityscapes and SYNTHIA → Cityscapes tasks and can serve as a universal and parameter-efficient framework for large-model unsupervised knowledge transfer. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/uni-uvpt and https://github.com/huawei-noah/noah-research/tree/master/uni-uvpt.

ICLR Conference 2022 Conference Paper

Continual Learning with Recursive Gradient Optimization

  • Hao Liu
  • Huaping Liu

Learning multiple tasks sequentially without forgetting previous knowledge, called Continual Learning (CL), remains a long-standing challenge for neural networks. Most existing methods rely on additional network capacity or data replay. In contrast, we introduce a novel approach which we refer to as Recursive Gradient Optimization (RGO). RGO is composed of an iteratively updated optimizer that modifies the gradient to minimize forgetting without data replay, and a virtual Feature Encoding Layer (FEL) that represents different long-term structures with only task descriptors. Experiments demonstrate that RGO has significantly better performance on popular continual classification benchmarks when compared to the baselines and achieves new state-of-the-art performance on 20-split-CIFAR100 (82.22%) and 20-split-miniImageNet (72.63%). With higher average accuracy than Single-Task Learning (STL), this method is flexible and reliable enough to provide continual learning capabilities for learning models that rely on gradient descent.

IJCAI Conference 2022 Conference Paper

Feature and Instance Joint Selection: A Reinforcement Learning Perspective

  • Wei Fan
  • Kunpeng Liu
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Yanjie Fu

Feature selection and instance selection are two important data processing techniques. However, such selections have mostly been studied separately, while existing work on joint selection conducts feature/instance selection coarsely, neglecting the latent fine-grained interaction between the feature space and the instance space. To address this challenge, we propose a reinforcement learning solution that accomplishes the joint selection task and simultaneously captures the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the agents' action strategy, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

IROS Conference 2022 Conference Paper

Hierarchical Learning and Control for In-Hand Micromanipulation Using Multiple Laser-Driven Micro-Tools

  • Yongyi Jia
  • Yu Chen
  • Hao Liu
  • Xiu Li 0001
  • Xiang Li 0009

Laser-driven micro-tools are formed by treating highly focused laser beams as actuators, controlling a tool's motion to contact and then manipulate a micro object; this allows them to manipulate opaque micro objects, or large cells, without causing photodamage. However, most existing laser-driven tools are limited to relatively simple tasks, such as moving and caging, and cannot carry out in-hand dexterous tasks. This is mainly because in-hand manipulation involves continuously coordinating multiple laser beams, micro-tools, and the object itself, which has high degrees of freedom (DoF) and poses a challenge for planner and controller design. This paper presents a new hierarchical formulation for the grasping and manipulation of micro objects using multiple laser-driven micro-tools. In hardware, multiple laser-driven tools are assembled to act as a robotic hand that carries out in-hand tasks (e.g., rotating); in software, a hierarchical scheme is developed to shrink the action space and coordinate the motion of multiple tools, subject to both the parametric uncertainty in the tool and the unknown dynamic model of the object. Such a formulation provides potential for achieving robotic in-hand manipulation at the micro scale. The performance of the proposed system is validated in simulation studies under different scenarios.

AAAI Conference 2022 Conference Paper

Learning to Walk with Dual Agents for Knowledge Graph Reasoning

  • Denghui Zhang
  • Zixuan Yuan
  • Hao Liu
  • Xiaodong Lin
  • Hui Xiong

Graph walking based on reinforcement learning (RL) has shown great success in navigating an agent to automatically complete various reasoning tasks over an incomplete knowledge graph (KG) by exploring multi-hop relational paths. However, existing multi-hop reasoning approaches only work well on short reasoning paths and tend to miss the target entity as the path length increases. This is undesirable for many reasoning tasks in real-world scenarios, where short paths connecting the source and target entities are not available in incomplete KGs, and thus reasoning performance drops drastically unless the agent is able to seek out more clues from longer paths. To address the above challenge, in this paper, we propose a dual-agent reinforcement learning framework, which trains two agents (GIANT and DWARF) to walk over a KG jointly and search for the answer collaboratively. Our approach tackles the reasoning challenge in long paths by assigning one of the agents (GIANT) to search on cluster-level paths quickly and provide stage-wise hints for the other agent (DWARF). Finally, experimental results on several KG reasoning benchmarks show that our approach can search answers more accurately and efficiently, and outperforms existing RL-based methods for long path queries by a large margin.

NeurIPS Conference 2022 Conference Paper

Masked Autoencoding for Scalable and Generalizable Decision Making

  • Fangchen Liu
  • Hao Liu
  • Aditya Grover
  • Pieter Abbeel

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models. To this end, this paper presents masked decision prediction (MaskDP), a simple and scalable self-supervised pretraining method for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP approach, we apply a masked autoencoder (MAE) to state-action trajectories, wherein we randomly mask state and action tokens and reconstruct the missing data. By doing so, the model is required to infer the masked-out states and actions and extract information about the dynamics. We find that masking different proportions of the input sequence significantly helps with learning a better model that generalizes well to multiple downstream tasks. In our empirical study we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior w.r.t. model size. It is amenable to data-efficient finetuning, achieving competitive results with prior methods based on autoregressive pretraining.

NeurIPS Conference 2022 Conference Paper

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

  • Hao Liu
  • Tom Zahavy
  • Volodymyr Mnih
  • Satinder Singh

Large and diverse datasets have been the cornerstones of many impressive advancements in artificial intelligence. Intelligent creatures, however, learn by interacting with the environment, which changes the input sensory signals and the state of the environment. In this work, we aim to bring the best of both worlds together and propose an algorithm that exhibits exploratory behavior while utilizing large diverse datasets. Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space. The transition dynamics simply mixes an action with a randomly sampled latent; it then applies an exponential moving average for temporal persistency, and the resulting latent is decoded to an image using the pretrained generator. We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data. We further leverage the temporal information of this data to pair data points as a natural supervision for representation learning. Our experiments suggest that the learned representations can be successfully transferred to downstream tasks in both vision and reinforcement learning domains.

AAAI Conference 2022 Conference Paper

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition

  • Hao Liu
  • Bin Wang
  • Zhimin Bao
  • Mobai Xue
  • Sheng Kang
  • Deqiang Jiang
  • Yinsong Liu
  • Bo Ren

We introduce Perceiving Stroke-Semantic Context (PerSec), a new approach to self-supervised representation learning tailored for the Scene Text Recognition (STR) task. Considering that scene text images carry both visual and semantic properties, we equip our PerSec with dual context perceivers which can contrast and learn latent representations from low-level stroke and high-level semantic contextual spaces simultaneously via hierarchical contrastive learning on unlabeled text image data. Experiments in un- and semi-supervised learning settings on STR benchmarks demonstrate that our proposed framework can yield a more robust representation for both CTC-based and attention-based decoders than other contrastive learning methods. To fully investigate the potential of our method, we also collect a dataset of 100 million unlabeled text images, named UTI-100M, covering 5 scenes and 4 languages. By leveraging hundred-million-level unlabeled data, our PerSec shows significant performance improvement when fine-tuning the learned representation on labeled data. Furthermore, we observe that the representation learned by PerSec generalizes well, especially in scenarios with little labeled data.

NeurIPS Conference 2022 Conference Paper

Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models

  • Fan Liu
  • Hao Liu
  • Wenzhao Jiang

Machine learning based traffic forecasting models leverage sophisticated spatiotemporal auto-correlations to provide accurate predictions of city-wide traffic states. However, existing methods assume a reliable and unbiased forecasting environment, which is not always available in the wild. In this work, we investigate the vulnerability of spatiotemporal traffic forecasting models and propose a practical adversarial spatiotemporal attack framework. Specifically, instead of simultaneously attacking all geo-distributed data sources, an iterative gradient guided node saliency method is proposed to identify the time-dependent set of victim nodes. Furthermore, we devise a spatiotemporal gradient descent based scheme to generate real-valued adversarial traffic states under a perturbation constraint. Meanwhile, we theoretically demonstrate the worst performance bound of adversarial traffic forecasting attacks. Extensive experiments on two real-world datasets show that the proposed two-step framework achieves up to 67.8% performance degradation on various advanced spatiotemporal forecasting models. Remarkably, we also show that adversarial training with our proposed attacks can significantly improve the robustness of spatiotemporal traffic forecasting models.

NeurIPS Conference 2022 Conference Paper

Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

  • Michael Laskin
  • Hao Liu
  • Xue Bin Peng
  • Denis Yarats
  • Aravind Rajeswaran
  • Pieter Abbeel

We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.

YNIMG Journal 2021 Journal Article

A slower rate of sulcal widening in the brains of the nondemented oldest old

  • Hui Tang
  • Tao Liu
  • Hao Liu
  • Jiyang Jiang
  • Jian Cheng
  • Haijun Niu
  • Shuyu Li
  • Henry Brodaty

The relationships between aging and brain morphology have been reported in many previous structural brain studies. However, the trajectories of successful brain aging in the extremely old remain underexplored. In the limited research on the oldest old, covering individuals aged 85 years and older, there are very few studies that have focused on the cortical morphology, especially cortical sulcal features. In this paper, we measured sulcal width and depth as well as cortical thickness from T1-weighted scans of 290 nondemented community-dwelling participants aged between 76 and 103 years. We divided the participants into young old (between 76 and 84; mean = 80.35±2.44; male/female = 76/88) and oldest old (between 85 and 103; mean = 91.74±5.11; male/female = 60/66) groups. The results showed that most of the examined sulci significantly widened with increased age and that the rates of sulcal widening were lower in the oldest old. The spatial pattern of the cortical thinning partly corresponded with that of sulcal widening. Compared to females, males had significantly wider sulci, especially in the oldest old. This study builds a foundation for future investigations of neurocognitive disorders and neurodegenerative diseases in the oldest old, including centenarians.

NeurIPS Conference 2021 Conference Paper

Behavior From the Void: Unsupervised Active Pre-Training

  • Hao Liu
  • Pieter Abbeel

We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training. APT learns behaviors and representations by actively searching for novel states in reward-free environments. The key novel idea is to explore the environment by maximizing a non-parametric entropy computed in an abstract representation space, which avoids challenging density modeling and consequently allows our approach to scale much better in environments that have high-dimensional observations (e.g., image observations). We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. In Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch.

IJCAI Conference 2021 Conference Paper

Bipartite Matching for Crowd Counting with Point Supervision

  • Hao Liu
  • Qiang Zhao
  • Yike Ma
  • Feng Dai

For the crowd counting task, it has been demonstrated that imposing Gaussians on point annotations hurts generalization performance. Several methods attempt to utilize point annotations as supervision directly, and they have made significant improvement compared with density-map based methods. However, these point based methods ignore the inevitable annotation noises and still suffer from low robustness to noisy annotations. To address the problem, we propose a bipartite matching based method for crowd counting with only point supervision (BM-Count). In BM-Count, we select a subset of most similar pixels from the predicted density map to match annotated pixels via bipartite matching. Then loss functions can be defined based on the matching pairs to alleviate the bad effect caused by those annotated dots with incorrect positions. Under the noisy annotations, our method reduces MAE and RMSE by 9% and 11.2% respectively. Moreover, we propose a novel ranking distribution learning framework to address the imbalanced distribution problem of head counts, which encodes the head counts as classification distribution in the ranking domain and refines the estimated count map in the continuous domain. Extensive experiments on four datasets show that our method achieves state-of-the-art performance and performs better crowd localization.

AAAI Conference 2021 Conference Paper

Community-Aware Multi-Task Transportation Demand Prediction

  • Hao Liu
  • Qiyu Wu
  • Fuzhen Zhuang
  • Xinjiang Lu
  • Dejing Dou
  • Hui Xiong

Transportation demand prediction is of great importance to urban governance and has become an essential function in many online applications. While many efforts have been made for regional transportation demand prediction, predicting the diversified transportation demand of different communities (e.g., the aged, the juveniles) remains an unexplored problem. However, this task is challenging because of the joint influence of spatio-temporal correlation among regions and implicit correlation among different communities. To this end, in this paper, we propose the Multi-task Spatio-Temporal Network with Mutually-supervised Adaptive task grouping (Ada-MSTNet) for community-aware transportation demand prediction. Specifically, we first construct a sequence of multi-view graphs from both spatial and community perspectives, and devise a spatio-temporal neural network to simultaneously capture the sophisticated correlations between regions and communities, respectively. Then, we propose an adaptively clustered multi-task learning module, where the prediction of each region-community specific transportation demand is regarded as a distinct task. Moreover, a mutually supervised adaptive task grouping strategy is introduced to softly cluster each task into different task groups, by leveraging the supervision signal from another graph view. In such a way, Ada-MSTNet is not only able to share common knowledge among highly related communities and regions, but also to shield against the noise from unrelated tasks in an end-to-end fashion. Finally, extensive experiments on two real-world datasets demonstrate the effectiveness of our approach compared with seven baselines.

NeurIPS Conference 2021 Conference Paper

Generalized Data Weighting via Class-Level Gradient Manipulation

  • Can Chen
  • Shuhao Zheng
  • Xi Chen
  • Erqun Dong
  • Xue (Steve) Liu
  • Hao Liu
  • Dejing Dou

Label noise and class imbalance are two major issues coexisting in real-world datasets. To alleviate the two issues, state-of-the-art methods reweight each instance by leveraging a small amount of clean and unbiased data. Yet, these methods overlook class-level information within each instance, which can be further utilized to improve performance. To this end, in this paper, we propose Generalized Data Weighting (GDW) to simultaneously mitigate label noise and class imbalance by manipulating gradients at the class level. To be specific, GDW unrolls the loss gradient to class-level gradients by the chain rule and reweights the flow of each gradient separately. In this way, GDW achieves remarkable performance improvement on both issues. Aside from the performance gain, GDW efficiently obtains class-level weights without introducing any extra computational cost compared with instance weighting methods. Specifically, GDW performs a gradient descent step on class-level weights, which only relies on intermediate gradients. Extensive experiments in various settings verify the effectiveness of GDW. For example, GDW outperforms state-of-the-art methods by 2.56% under the 60% uniform noise setting on CIFAR10. Our code is available at https://github.com/GGchen1997/GDW-NIPS2021.

AAAI Conference 2021 Conference Paper

Joint Air Quality and Weather Prediction Based on Multi-Adversarial Spatiotemporal Networks

  • Jindong Han
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Dejing Dou

Accurate and timely air quality and weather predictions are of great importance to urban governance and human livelihood. Though many efforts have been made for air quality or weather prediction, most of them simply employ one another as feature input, which ignores the inner-connection between the two predictive tasks. On the one hand, the accurate prediction of one task can help improve another task’s performance. On the other hand, geospatially distributed air quality and weather monitoring stations provide additional hints for city-wide spatiotemporal dependency modeling. Inspired by the above two insights, in this paper, we propose the Multi-adversarial spatiotemporal recurrent Graph Neural Networks (MasterGNN) for joint air quality and weather predictions. Specifically, we first propose a heterogeneous recurrent graph neural network to model the spatiotemporal autocorrelation among air quality and weather monitoring stations. Then, we develop a multi-adversarial graph learning framework to defend against observation noise propagation introduced by spatiotemporal modeling. Moreover, we present an adaptive training strategy by formulating multi-adversarial learning as a multi-task learning problem. Finally, extensive experiments on two real-world datasets show that MasterGNN achieves the best performance compared with seven baselines on both air quality and weather prediction tasks.

AAAI Conference 2021 Conference Paper

Out-of-Town Recommendation with Travel Intention Modeling

  • Haoran Xin
  • Xinjiang Lu
  • Tong Xu
  • Hao Liu
  • Jingjing Gu
  • Dejing Dou
  • Hui Xiong

Out-of-town recommendation is designed for users who leave their home-town areas and visit areas they have never been to before. It is challenging to recommend Points-of-Interest (POIs) for out-of-town users since out-of-town check-in behavior is determined not only by the user’s home-town preference but also by the user’s travel intention. Besides, users’ travel intentions are complex and dynamic, which makes it very difficult to understand such intentions precisely. In this paper, we propose a TRAvel-INtention-aware Out-of-town Recommendation framework, named TRAINOR. The proposed TRAINOR framework distinguishes itself from existing out-of-town recommenders in three aspects. First, graph neural networks are explored to represent users’ home-town check-in preference and the geographical constraints in out-of-town check-in behaviors. Second, a user-specific travel intention is formulated as an aggregation combining home-town preference and generic travel intention, where the generic travel intention is regarded as a mixture of inherent intentions that can be learned by a Neural Topic Model (NTM). Third, a non-linear mapping function and a matrix factorization method are employed to transfer users’ home-town preference and estimate out-of-town POIs’ representations, respectively. Extensive experiments on real-world datasets validate the effectiveness of the TRAINOR framework. Moreover, the learned travel intention can deliver meaningful explanations for understanding a user’s travel purposes.

AAAI Conference 2021 Conference Paper

Self-Supervised Prototype Representation Learning for Event-Based Corporate Profiling

  • Zixuan Yuan
  • Hao Liu
  • Renjun Hu
  • Denghui Zhang
  • Hui Xiong

Event-based corporate profiling aims to assess the evolving operational status of the corresponding corporate from its event sequence. Existing studies on corporate profiling have partially addressed the problem via (i) case-by-case empirical analysis by leveraging traditional financial methods, or (ii) automatic profile inference by reformulating the problem into a supervised learning task. However, both approaches heavily rely on domain knowledge and are labor-intensive. More importantly, the task-specific nature of both approaches prevents the obtained corporate profiles from being applied to diversified downstream applications. To this end, in this paper, we propose a Self-Supervised Prototype Representation Learning (SePaL) framework for dynamic corporate profiling. By exploiting the topological information of an event graph and exploring self-supervised learning techniques, SePaL can obtain unified corporate representations that are robust to event noises and can be easily fine-tuned to benefit various downstream applications with only a few annotated samples. Specifically, we first infer the initial cluster distribution of noise-resistant event prototypes based on latent representations of events. Then, we construct four permutation-invariant self-supervision signals to guide the representation learning of the event prototypes. In terms of applications, we exploit the learned time-evolving corporate representations for both stock price spike prediction and corporate default risk evaluation. Experimental results on two real-world corporate event datasets demonstrate the effectiveness of SePaL for these two applications.

NeurIPS Conference 2021 Conference Paper

URLB: Unsupervised Reinforcement Learning Benchmark

  • Misha Laskin
  • Denis Yarats
  • Hao Liu
  • Kimin Lee
  • Albert Zhan
  • Kevin Lu
  • Catherine Cang
  • Lerrel Pinto

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.

EAAI Journal 2020 Journal Article

A modified particle swarm optimization for multimodal multi-objective optimization

  • XuWei Zhang
  • Hao Liu
  • LiangPing Tu

As an effective evolutionary algorithm, particle swarm optimization (PSO) has been widely used to solve single- or multi-objective optimization problems. However, the performance of PSO in solving multi-objective problems is unsatisfactory, so a variety of PSO variants have been proposed to enhance the performance of PSO on multi-objective optimization problems. In this paper, a modified particle swarm optimization (AMPSO) is proposed to solve multimodal multi-objective problems. Firstly, a dynamic neighborhood-based learning strategy is introduced to replace the global learning strategy, which enhances the diversity of the population. Meanwhile, to further enhance the performance of PSO, the offering competition mechanism is utilized. Eleven multimodal multi-objective optimization functions are used to verify the feasibility and effectiveness of the proposed AMPSO. Experimental results and statistical analysis indicate that AMPSO has competitive performance compared with 5 state-of-the-art multimodal multi-objective algorithms.

AAAI Conference 2020 Conference Paper

Accurate Structured-Text Spotting for Arithmetical Exercise Correction

  • Yiqing Hu
  • Yan Zheng
  • Hao Liu
  • Dequang Jiang
  • Yinsong Liu
  • Bo Ren

Correcting arithmetical exercises is a labor-intensive and time-consuming task for primary school teachers. To reduce their burden, we propose Arithmetical Exercise Checker (AEC), the first system that automatically evaluates all arithmetical expressions (AEs) on exercise images. The major challenge is that an AE is formed by printed and handwritten texts with particular arithmetical patterns (e.g., multi-line, fraction). Despite being part of an AE, handwritten texts usually lead to zigzag boundaries and tangled rows. What’s worse, an AE may be arithmetically incorrect, which makes the contextual information less valuable for recognition. To tackle these problems, we introduce integrated detection, recognition, and evaluation branches by leveraging AE’s intrinsic features, namely 1) indistinct boundaries, 2) locally relevant patterns, and 3) globally irrelevant symbols. Experimental results demonstrate that AEC yields a 93.72% correction accuracy on 40 kinds of mainstream primary arithmetical exercises. So far, the online service of AEC processes 75,000 arbitrary exercises on average per day, and has already reduced the burden of over 1,000,000 users. AEC shows the benefits of implementing a vision-based system as a way to aid teachers in reducing repetitive tasks.

AAAI Conference 2020 Conference Paper

Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction

  • Weijia Zhang
  • Hao Liu
  • Yanchi Liu
  • Jingbo Zhou
  • Hui Xiong

The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation inside of and between parking lots, and 3) the scarcity of information about real-time parking availability obtained from real-time sensors (e.g., camera, ultrasonic sensor, and GPS). To this end, we propose the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are respectively proposed to capture local and global spatial dependencies between parking lots. Additionally, we adopt a recurrent neural network to incorporate dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that the prediction performance of SHARE outperforms seven state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Why We Go Where We Go: Profiling User Decisions on Choosing POIs

  • Renjun Hu
  • Xinjiang Lu
  • Chuanren Liu
  • Yanyan Li
  • Hao Liu
  • Jingjing Gu
  • Shuai Ma
  • Hui Xiong

While Point-of-Interest (POI) recommendation has been a popular topic of study for some time, little progress has been made toward understanding why and how people make their decisions for the selection of POIs. To this end, in this paper, we propose a user decision profiling framework, named PROUD, which can identify the key factors in people's decisions on choosing POIs. Specifically, we treat each user decision as a set of factors and provide a method for learning factor embeddings. A unique perspective of our approach is to identify key factors, while preserving decision structures seamlessly, via a novel scalar projection maximization objective. Exactly solving the objective is non-trivial due to a sparsity constraint. To address this, our PROUD adopts a self-projection attention and an L2-regularized sparse activation to directly estimate the likelihood of each factor being a key factor. Finally, extensive experiments on real-world data validate the advantage of PROUD in preserving user decision structures. Also, our case study indicates that the identified key decision factors can help us to provide more interpretable recommendations and analyses.

AAAI Conference 2019 Conference Paper

Joint Representation Learning for Multi-Modal Transportation Recommendation

  • Hao Liu
  • Ting Li
  • Renjun Hu
  • Yanjie Fu
  • Jingjing Gu
  • Hui Xiong

Multi-modal transportation recommendation has a goal of recommending a travel plan which considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect among these modes. The successful development of multi-modal transportation recommendation systems can help to satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To this end, in this paper, we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully-constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding for transport modes to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multi-modal transportation recommendations. Indeed, our method has been deployed into one of the largest navigation Apps to serve hundreds of millions of users, and extensive experimental results with real-world map query data demonstrate the enhanced performance of the proposed method for multi-modal transportation recommendations.

ICML Conference 2019 Conference Paper

Taming MAML: Efficient unbiased meta-reinforcement learning

  • Hao Liu
  • Richard Socher
  • Caiming Xiong

While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low-variance estimates for policy gradients remains a significant challenge. In particular, the need to estimate a large Hessian, poor sample efficiency, and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function, named Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.

AAAI Conference 2018 Conference Paper

Deep Learning for Case-Based Reasoning Through Prototypes: A Neural Network That Explains Its Predictions

  • Oscar Li
  • Hao Liu
  • Chaofan Chen
  • Cynthia Rudin

Deep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability – they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as “black box” models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are loyal to what the network actually computes.

IJCAI Conference 2018 Conference Paper

Structured Inference for Recurrent Hidden Semi-markov Model

  • Hao Liu
  • Lirong He
  • Haoli Bai
  • Bo Dai
  • Kun Bai
  • Zenglin Xu

Segmentation and labeling for high-dimensional time series is an important yet challenging task in a number of applications, such as behavior understanding and medical diagnosis. Recent advances in modeling the nonlinear dynamics of such time series data suggest incorporating recurrent neural networks into Hidden Markov Models. However, this makes the inference procedure much more complicated, often leading to intractable inference, especially for the discrete variables of segmentation and labeling. To achieve both flexibility and tractability in modeling the nonlinear dynamics of discrete variables, we present a structured and stochastic sequential neural network (SSNN), which is composed of a generative network and an inference network. In detail, the generative network aims to not only capture the long-term dependencies but also model the uncertainty of the segmentation labels via semi-Markov models. More importantly, for efficient and accurate inference, the proposed bi-directional inference network reparameterizes the categorical segmentation with the Gumbel-Softmax approximation and resorts to Stochastic Gradient Variational Bayes. We evaluate the proposed model in a number of tasks, including speech modeling, automatic segmentation and labeling in behavior understanding, and sequential multi-object recognition. Experimental results demonstrate that our proposed model achieves significant improvement over the state-of-the-art methods.

YNIMG Journal 2018 Journal Article

UBO Detector – A cluster-based, fully automated pipeline for extracting white matter hyperintensities

  • Jiyang Jiang
  • Tao Liu
  • Wanlin Zhu
  • Rebecca Koncz
  • Hao Liu
  • Teresa Lee
  • Perminder S. Sachdev
  • Wei Wen

We present ‘UBO Detector’, a cluster-based, fully automated pipeline for extracting and calculating variables for regions of white matter hyperintensities (WMH) (available for download at https://cheba.unsw.edu.au/group/neuroimaging-pipeline). It takes T1-weighted and fluid attenuated inversion recovery (FLAIR) scans as input, and SPM12 and FSL functions are utilised for pre-processing. The candidate clusters are then generated by FMRIB's Automated Segmentation Tool (FAST). A supervised machine learning algorithm, k-nearest neighbor (k-NN), is applied to determine whether the candidate clusters are WMH or non-WMH. UBO Detector generates both image and text (volumes and the number of WMH clusters) outputs for whole brain, periventricular, deep, and lobar WMH, as well as WMH in arterial territories. The computation time for each brain is approximately 15 min. We validated the performance of UBO Detector by showing a) high segmentation (similarity index (SI) = 0.848) and volumetric (intraclass correlation coefficient (ICC) = 0.985) agreement between the UBO Detector-derived and manually traced WMH; b) highly correlated (r2 > 0.9) and steadily increasing WMH volumes over time; and c) significant associations of periventricular (t = 22.591, p < 0.001) and deep (t = 14.523, p < 0.001) WMH volumes generated by UBO Detector with Fazekas rating scores. With parallel computing enabled in UBO Detector, the processing can take advantage of multi-core CPUs that are commonly available on workstations. In conclusion, UBO Detector is a reliable, efficient and fully automated WMH segmentation pipeline.

NeurIPS Conference 2018 Conference Paper

Variational Inference with Tail-adaptive f-Divergence

  • Dilin Wang
  • Hao Liu
  • Qiang Liu

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires the use of importance sampling, which could have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.

NeurIPS Conference 2017 Conference Paper

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

  • Chunyuan Li
  • Hao Liu
  • Changyou Chen
  • Yuchen Pu
  • Liqun Chen
  • Ricardo Henao
  • Lawrence Carin

We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.

AIIM Journal 2017 Journal Article

From SNOMED CT to Uberon: Transferability of evaluation methodology between similarly structured ontologies

  • Gai Elhanan
  • Christopher Ochs
  • Jose L.V. Mejino
  • Hao Liu
  • Christopher J. Mungall
  • Yehoshua Perl

Objective To examine whether disjoint partial-area taxonomy, a semantically-based evaluation methodology that has been successfully tested in SNOMED CT, will perform with similar effectiveness on Uberon, an anatomical ontology that belongs to a structurally similar family of ontologies as SNOMED CT. Method A disjoint partial-area taxonomy was generated for Uberon. One hundred randomly selected test concepts that overlap between partial-areas were matched to a same-size control sample of non-overlapping concepts. The samples were blindly inspected for non-critical issues and presumptive errors first by a general domain expert whose results were then confirmed or rejected by a highly experienced anatomical ontology domain expert. Reported issues were subsequently reviewed by Uberon’s curators. Results Overlapping concepts in Uberon’s disjoint partial-area taxonomy exhibited a significantly higher rate of all issues. Clear-cut presumptive errors trended similarly but did not reach statistical significance. A sub-analysis of overlapping concepts with three or more relationship types indicated a much higher rate of issues. Conclusions Overlapping concepts from Uberon’s disjoint abstraction network are quite likely (up to 28.9%) to exhibit issues. The results suggest that the methodology can transfer well between same-family ontologies. Although Uberon exhibited relatively few overlapping concepts, the methodology can be combined with other semantic indicators to expand the process to other concepts within the ontology that will generate high yields of discovered issues.

IJCAI Conference 2017 Conference Paper

Learning User's Intrinsic and Extrinsic Interests for Point-of-Interest Recommendation: A Unified Approach

  • Huayu Li
  • Yong Ge
  • Defu Lian
  • Hao Liu

Point-of-Interest (POI) recommendation has been an important service on location-based social networks. However, it is very challenging to generate accurate recommendations due to the complex nature of user's interest in POI and the data sparseness. In this paper, we propose a novel unified approach that could effectively learn fine-grained and interpretable user's interest, and adaptively model the missing data. Specifically, a user's general interest in POI is modeled as a mixture of her intrinsic and extrinsic interests, upon which we formulate the ranking constraints in our unified recommendation approach. Furthermore, a self-adaptive location-oriented method is proposed to capture the inherent property of missing data, which is formulated as squared error based loss in our unified optimization objective. Extensive experiments on real-world datasets demonstrate the effectiveness and advantage of our approach.

NeurIPS Conference 2017 Conference Paper

Triangle Generative Adversarial Networks

  • Zhe Gan
  • Liqun Chen
  • Weiyao Wang
  • Yuchen Pu
  • Yizhe Zhang
  • Hao Liu
  • Chunyuan Li
  • Lawrence Carin

A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.

IROS Conference 2014 Conference Paper

Robust attitude controller for uncertain hexarotor micro aerial vehicles (MAVs)

  • Dafizal Derawi
  • Nurul Dayana Salim
  • Hairi Zamzuri
  • Hao Liu
  • Mohd Azizi Abdul Rahman
  • Saiful Amri Mazlan

This paper proposes a practical robust attitude controller for uncertain hexarotor micro aerial vehicles (MAVs). The proposed robust controller consists of a nominal linear time-invariant controller and a robust compensator for the pitch, roll, and yaw subsystems. The nominal controller is an inner-outer loop structure of the PI+PID (proportional-integral plus proportional-integral-derivative) control method to achieve the desired tracking of the nominal system, whilst the robust compensator is added to restrain the influence of the uncertainties (equivalent disturbances), which contain parametric uncertainties, coupling, nonlinear dynamics, and external disturbances. The real-time experimental results on the hexarotor demonstrate the effectiveness of the proposed controller in real flight conditions, and the attitude tracking errors are proven to be ultimately bounded within specified boundaries.