Arrow Research search

Author name cluster

Yong Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

167 papers
2 author rows

Possible papers (167)

AAAI Conference 2026 Conference Paper

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

  • Bin-Bin Gao
  • Yue Zhou
  • Jiangtao Yan
  • Yuezhi Cai
  • Weixi Zhang
  • Meng Wang
  • Jun Liu
  • Yong Liu

Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images. However, existing methods struggle to design prompt templates, handle complex token interactions, or require fine-tuning on target domains, resulting in limited flexibility. In this work, we present AdaptCLIP, a simple yet effective method based on two key insights. First, adaptive visual and textual representations should be learned alternately rather than jointly. Second, comparative learning between a query and a normal image prompt should incorporate both contextual and aligned residual features, rather than relying solely on residual features. AdaptCLIP treats CLIP models as a foundational service, adding only three simple adapters (visual, textual, and prompt-query) at its input or output ends. AdaptCLIP supports zero-/few-shot generalization across domains and provides a training-free approach on target domains once trained on a base dataset. AdaptCLIP achieves state-of-the-art performance on 12 anomaly detection benchmarks from industrial and medical domains, significantly outperforming existing competitive methods.
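The abstract's second insight (combining contextual features with aligned residual features rather than the residual alone) can be sketched as below. The concatenation layout and function name are assumptions for illustration, not AdaptCLIP's actual prompt-query adapter.

```python
import numpy as np

def comparison_features(query_feat, prompt_feat):
    """Combine contextual features (the raw query and normal-prompt
    features) with their aligned residual, instead of using the
    residual alone (illustrative sketch)."""
    q = np.asarray(query_feat, dtype=float)
    p = np.asarray(prompt_feat, dtype=float)
    residual = q - p                        # aligned residual feature
    return np.concatenate([q, p, residual], axis=-1)
```

A downstream comparison head would then consume this richer feature instead of `q - p` alone.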

AAAI Conference 2026 Conference Paper

Don’t Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

  • Ziyi Zhao
  • Chongming Gao
  • Yang Zhang
  • Haoyan Liu
  • Weinan Gan
  • Huifeng Guo
  • Yong Liu
  • Fuli Feng

Personalization in Large Language Models (LLMs) often relies on user-specific soft prompts. However, these prompts become obsolete when the foundation model is upgraded, necessitating costly, full-scale retraining. To overcome this limitation, we propose the Prompt-level User Migration Adapter (PUMA), a lightweight framework to efficiently migrate personalized prompts across incompatible models. PUMA utilizes a parameter-efficient adapter to bridge the semantic gap, combined with a group-based user selection strategy to significantly reduce training costs. Experiments on three large-scale datasets show our method matches or even surpasses the performance of retraining from scratch, reducing computational cost by up to 98%. The framework demonstrates strong generalization across diverse model architectures and robustness in advanced scenarios like chained and aggregated migrations, offering a practical path for the sustainable evolution of personalized AI by decoupling user assets from the underlying models.

AAAI Conference 2026 Conference Paper

LLM-Oriented Token-Adaptive Knowledge Distillation

  • Xurong Xie
  • Zhucun Xue
  • Jiafu Wu
  • Jian Li
  • Yabiao Wang
  • Xiaobin Hu
  • Yong Liu
  • Jiangning Zhang

Knowledge Distillation (KD) is a key technique for compressing Large-scale Language Models (LLMs), but prevailing logit-based methods employ static strategies misaligned with the student’s dynamic learning process. By treating all tokens indiscriminately with a fixed temperature, these methods result in suboptimal knowledge transfer. To address this, we propose LLM-oriented token-Adaptive Knowledge Distillation (AdaKD), a framework that adapts the distillation process to each token’s real-time learning state. AdaKD consists of two synergistic modules driven by a unified token difficulty metric. First, the Loss-driven Adaptive Token Focusing (LATF) module dynamically concentrates distillation on valuable tokens by monitoring the student’s learning stability. Second, Inverse Difficulty Temperature Scaling (IDTS) introduces a counterintuitive token-level temperature: low for difficult tokens to target error correction, and high for easy tokens to learn the teacher’s smooth output distribution for better generalization. As a plug-and-play framework, AdaKD consistently improves performance across diverse distillation methods, model architectures, and benchmarks.
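The Inverse Difficulty Temperature Scaling idea above can be sketched as follows. The linear difficulty-to-temperature mapping, the bounds, and the use of batch min-max normalization are assumptions for illustration, not AdaKD's published formulation.

```python
import numpy as np

def idts_temperatures(token_difficulty, t_min=1.0, t_max=4.0):
    """Inverse Difficulty Temperature Scaling (sketch).

    Difficult tokens get a LOW temperature (a sharper teacher for
    targeted error correction); easy tokens get a HIGH temperature
    (a smoother teacher distribution for better generalization).
    """
    d = np.asarray(token_difficulty, dtype=float)
    # Normalize difficulty to [0, 1] within the batch (assumed scheme).
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    # Inverse mapping: difficulty 1 -> t_min, difficulty 0 -> t_max.
    return t_max - d * (t_max - t_min)
```

Per-token difficulty could, for instance, be the student's per-token loss; each token's softened teacher/student distributions would then use its own temperature.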

AAAI Conference 2026 Conference Paper

Note2Chat: Improving LLMs for Multi-Turn Clinical History Taking Using Medical Notes

  • Yang Zhou
  • Zhenting Sheng
  • Mingrui Tan
  • Yuting Song
  • Jun Zhou
  • Yu Heng Kwan
  • Lian Leng Low
  • Yang Bai

Effective clinical history taking is a foundational yet underexplored component of clinical reasoning. While large language models (LLMs) have shown promise on static benchmarks, they often fall short in dynamic, multi-turn diagnostic settings that require iterative questioning and hypothesis refinement. To address this gap, we propose Note2Chat, a note-driven framework that trains LLMs to conduct structured history taking and diagnosis by learning from widely available medical notes. Instead of relying on scarce and sensitive dialogue data, we convert real-world medical notes into high-quality doctor-patient dialogues using a decision tree-guided generation and refinement pipeline. We then propose a three-stage fine-tuning strategy combining supervised learning, simulated data augmentation, and preference learning. Furthermore, we propose a novel single-turn reasoning paradigm that reframes history taking as a sequence of single-turn reasoning problems. This design enhances interpretability and enables local supervision, dynamic adaptation, and greater sample efficiency. Experimental results show that our method substantially improves clinical reasoning, achieving gains of +16.9 F1 and +21.0 Top-1 diagnostic accuracy over GPT-4o.

AAAI Conference 2026 Conference Paper

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

  • Jiazheng Xing
  • Hai Ci
  • Hongbin Xu
  • Hangjie Yuan
  • Yong Liu
  • Mike Zheng Shou

Watermarking diffusion-generated images is crucial for copyright protection and user tracking. However, current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks, resulting in a lack of comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. To address the challenge of memory consumption growing linearly with the number of denoising steps during optimization, OptMark incorporates adjoint gradient methods, reducing memory usage from O(N) to O(1). Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.

AAAI Conference 2026 Conference Paper

Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval

  • Yingyi Zhang
  • Pengyue Jia
  • Derong Xu
  • Yi Wen
  • Xianneng Li
  • Yichao Wang
  • Wenlin Zhang
  • Xiaopeng Li

Retrieval-Augmented Generation (RAG) critically depends on effective query expansion to retrieve relevant information. However, existing expansion methods adopt uniform strategies that overlook user-specific semantics, ignoring individual expression styles, preferences, and historical context. In practice, textually identical queries can express vastly different intentions across users. This representational rigidity limits the ability of current RAG systems to generalize effectively in personalized settings. Specifically, we identify two core challenges for personalization: 1) user expression styles are inherently diverse, making it difficult for standard expansions to preserve personalized intent; 2) user corpora induce heterogeneous semantic structures, varying in topical focus and lexical organization, which hinders the effective anchoring of expanded queries within the user's corpus space. To address these challenges, we propose Personalize Before Retrieve (PBR), a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consists of two components: P-PRF, which generates stylistically aligned pseudo feedback from user history to simulate the user's expression style, and P-Anchor, which performs graph-based structure alignment over the user's corpora to capture their structure. Together, they produce personalized query representations tailored for retrieval. Experiments on two personalized benchmarks show that PBR consistently outperforms strong baselines, with up to 10% gains on PersonaBench across retrievers. Our findings demonstrate the value of modeling personalization before retrieval to close the semantic gap in user-adaptive RAG systems.

AAAI Conference 2026 Conference Paper

PHPFND: Detecting Fake News via Post-Hoc Processing of LLMs Hallucination

  • Jinke Ma
  • Jiachen Ma
  • Wei Zhang
  • Yong Liu

Large Language Models (LLMs) perform excellently in fake news detection tasks, but their outputs are often accompanied by hallucinations, i.e., generated content that is contradictory to facts. Previous studies have mostly mitigated hallucinations through prompt design. However, this paper reveals that regions in news articles which easily induce hallucinations in LLMs correspond closely to the most challenging regions for fake news detectors. In this paper, we propose a fake news detection framework (PHPFND) based on post-hoc processing of LLMs hallucination. Specifically, our framework includes a hallucination detection module (ISHD) based on information structuring that detects three types of hallucinations in LLMs in a targeted manner, and a hallucination-driven feature enhancement mechanism (HDFE) that incorporates hallucination signals as explicit features into sentence-level encoding and feature fusion to guide the model’s attention toward high-risk regions. Experimental results on two mainstream fake news datasets show that our proposed method significantly outperforms LLM-based baselines.

AAAI Conference 2026 Conference Paper

Put the Space of LoRA Initialization to the Extreme to Preserve Pre-trained Knowledge

  • Pengwei Tang
  • Xiaolin Hu
  • Yong Liu
  • Lizhong Ding
  • Dongjie Zhang
  • Xing Wu
  • Debing Zhang

Low-Rank Adaptation (LoRA) is the leading parameter-efficient fine-tuning method for Large Language Models (LLMs), but it still suffers from catastrophic forgetting. Recent work has shown that specialized LoRA initialization can alleviate catastrophic forgetting. There are currently two approaches to LoRA initialization aimed at preventing knowledge forgetting during fine-tuning: (1) making residual weights close to pre-trained weights, and (2) ensuring the space of LoRA initialization is orthogonal to pre-trained knowledge. The former is what current methods strive to achieve, while the importance of the latter is not sufficiently recognized. We find that the space of LoRA initialization, rather than the residual weights, is the key to preserving pre-trained knowledge. Existing methods like MiLoRA make the LoRA initialization space orthogonal to the pre-trained weights by using their null space. However, compared to pre-trained weights, the input activations of pre-trained knowledge reflect the parameters of all previous layers as well as the input data, whereas pre-trained weights only contain information from the current layer. Moreover, we find that the effective ranks of input activations are much smaller than those of pre-trained weights. Thus, the null space of activations is more accurate and contains less pre-trained knowledge information than that of the weights. Based on these observations, we introduce LoRA-Null, which initializes LoRA in the null space of activations. Extensive experiments show that LoRA-Null effectively preserves the pre-trained world knowledge of LLMs while achieving strong fine-tuning performance.
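The null-space initialization described above can be sketched with an SVD of collected input activations. The exact rank selection and the square-weight assumption (d_out == d_in) are simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def lora_null_init(activations, rank):
    """Initialize the LoRA down-projection A in the (approximate) null
    space of input activations X (n_samples, d): take the right singular
    vectors with the SMALLEST singular values, so A @ x ~ 0 for
    pre-trained-style inputs. B starts at zero, as in standard LoRA,
    so the initial update B @ A is zero (illustrative sketch)."""
    X = np.asarray(activations, dtype=float)
    # Rows of Vt are right singular vectors, sorted by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=True)
    A = Vt[-rank:]                     # least-excited input directions
    B = np.zeros((X.shape[1], rank))   # d_out == d_in assumed for brevity
    return A, B
```

Fine-tuning then only moves weights along directions the pre-trained activations barely occupy, which is the mechanism the abstract credits for knowledge preservation.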

AAAI Conference 2025 Conference Paper

AdaO2B: Adaptive Online to Batch Conversion for Out-of-Distribution Generalization

  • Xiao Zhang
  • Sunhao Dai
  • Jun Xu
  • Yong Liu
  • Zhenhua Dong

Online to batch conversion constructs a new batch learner from a series of models generated by an existing online learning algorithm, to achieve generalization guarantees under the i.i.d. assumption. However, when applied to real-world streaming applications such as streaming recommender systems, the data stream may be sampled from time-varying distributions rather than persistently being i.i.d. This poses a challenge in terms of out-of-distribution (OOD) generalization. Existing approaches employ fixed conversion mechanisms that are unable to adapt to novel testing distributions, hindering the testing accuracy of the batch learner. To address these issues, we propose AdaO2B, an adaptive online to batch conversion approach under the bandit setting. AdaO2B is designed to be aware of distribution shifts in the testing data and achieves OOD generalization guarantees. Specifically, AdaO2B dynamically combines the sequence of models learned by a contextual bandit algorithm, determining appropriate combination weights with a context-aware weighting function. This allows a sequence of models to be converted into a batch learner that facilitates OOD generalization. Theoretical analysis justifies why and how the learned adaptive batch learner achieves OOD generalization error guarantees. Experimental results demonstrate that AdaO2B significantly outperforms state-of-the-art baselines on both synthetic and real-world recommendation datasets.
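The conversion step described above, combining a sequence of online models with context-dependent weights, can be sketched as below. The function names and signatures are illustrative assumptions, not AdaO2B's API.

```python
import numpy as np

def adaptive_batch_predict(models, weight_fn, context, x):
    """Adaptive online-to-batch conversion (sketch).

    Instead of averaging the online models with fixed weights, a
    context-aware weighting function scores each model for the current
    test context, and the batch prediction is the weighted combination.
    """
    w = np.asarray(weight_fn(context), dtype=float)
    w = w / w.sum()                        # normalize combination weights
    preds = np.array([m(x) for m in models])
    return float(w @ preds)               # context-weighted prediction
```

A fixed conversion corresponds to `weight_fn` ignoring `context` (e.g., uniform weights); the adaptive version lets the weights shift with the test distribution.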

NeurIPS Conference 2025 Conference Paper

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

  • Zhucun Xue
  • Jiangning Zhang
  • Xurong Xie
  • Yuxuan Cai
  • Yong Liu
  • Xiangtai Li
  • Dacheng Tao

Multimodal Large Language Models (MLLMs) have demonstrated excellent performance in video understanding but suffer from degraded effectiveness when processing long videos due to fixed-length contexts and weaknesses in modeling long-term dependencies. Retrieval-Augmented Generation (RAG) technology can mitigate these limitations through dynamic knowledge expansion, but existing RAG schemes for video understanding employ fixed retrieval paradigms that use uniform structures regardless of input query difficulty. This introduces redundant computational overhead and latency (e.g., complex graph traversal operations) for simple queries (e.g., frame-level object recognition) while potentially causing critical information loss due to insufficient retrieval granularity for multi-hop reasoning. Such single-step retrieval mechanisms severely constrain the model's balance between resource efficiency and cognitive depth. To address this, we first propose a novel AdaVideoRAG framework for long-video understanding, which uses a lightweight intent classifier to dynamically and adaptively allocate appropriate retrieval schemes, ranging from the simplest to the most sophisticated, for different video understanding tasks based on query complexity. We introduce an Omni-Knowledge Indexing module to extract valuable information from multi-modal signals for context modeling and build corresponding databases, i.e., a text base from clip captions, ASR, and OCR; a visual base; and a graph for deep semantic understanding. This enables hierarchical knowledge access, integration, and generation from naive retrieval to graph retrieval, achieving an optimal balance between resource consumption and video understanding capabilities. Finally, we construct the HiVU benchmark for deep understanding evaluation. Extensive experiments show that our framework enhances the overall efficiency and accuracy of Video-QA for long videos and can be seamlessly integrated with existing MLLMs via lightweight API calls, establishing a new paradigm for adaptive retrieval augmentation in video analysis.

JBHI Journal 2025 Journal Article

Attention-Based Q-Space Deep Learning Generalized for Accelerated Diffusion Magnetic Resonance Imaging

  • Fangrong Zong
  • Zaimin Zhu
  • Jiayi Zhang
  • Xiaofeng Deng
  • Zhuangzhuang Li
  • Chuyang Ye
  • Yong Liu

Diffusion magnetic resonance imaging (dMRI) is a non-invasive method for capturing the microanatomical information of tissues by measuring the diffusion weighted signals along multiple directions, which is widely used in the quantification of microstructures. Obtaining microscopic parameters requires dense sampling in the q space, leading to significant time consumption. The most popular approach to accelerating dMRI acquisition is to undersample the q-space data, along with applying deep learning methods to reconstruct quantitative diffusion parameters. However, the reliance on a predetermined q-space sampling strategy often constrains traditional deep learning-based reconstructions. The present study proposed a novel deep learning model, named attention-based q-space deep learning (aqDL), to implement the reconstruction with variable q-space sampling strategies. The aqDL maps dMRI data from different scanning strategies onto a common feature space by using a series of Transformer encoders. The latent features are employed to reconstruct dMRI parameters via a multilayer perceptron. The performance of the aqDL model was assessed utilizing the Human Connectome Project datasets at varying undersampling numbers. To validate its generalizability, the model was further tested on two additional independent datasets. Our results showed that aqDL consistently achieves the highest reconstruction accuracy at various undersampling numbers, regardless of whether variable or predetermined q-space scanning strategies are employed. These findings suggest that aqDL has the potential to be used on general clinical dMRI datasets.

NeurIPS Conference 2025 Conference Paper

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering

  • Kuicai Dong
  • Yujing Chang
  • Shijie Huang
  • Yasheng Wang
  • Ruiming Tang
  • Yong Liu

Document Visual Question Answering (DocVQA) faces dual challenges in processing lengthy multimodal documents (text, images, tables) and performing cross-modal reasoning. Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches, frequently missing critical visual information. The field also lacks robust benchmarks for assessing multimodal evidence selection and integration. We introduce MMDocRAG, a comprehensive benchmark featuring 4,055 expert-annotated QA pairs with multi-page, cross-modal evidence chains. Our framework introduces innovative metrics for evaluating multimodal quote selection and enables answers that interleave text with relevant visual elements. Through large-scale experiments with 60 VLM/LLM models and 14 retrieval systems, we identify persistent challenges in multimodal evidence retrieval, selection, and integration. Key findings reveal that advanced proprietary LVMs outperform open-source alternatives and gain moderate advantages from multimodal inputs over text-only inputs, whereas open-source alternatives show significant performance degradation with multimodal inputs. Notably, fine-tuned LLMs achieve substantial improvements when using detailed image descriptions. MMDocRAG establishes a rigorous testing ground and provides actionable insights for developing more robust multimodal DocVQA systems.

NeurIPS Conference 2025 Conference Paper

Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation

  • Qijiong Liu
  • Jieming Zhu
  • Lu Fan
  • Kun Wang
  • Hengchang Hu
  • Wei Guo
  • Yong Liu
  • Xiao-ming Wu

Integrating large language models (LLMs) into recommender systems has created new opportunities for improving recommendation quality. However, a comprehensive benchmark is needed to thoroughly evaluate and compare the recommendation capabilities of LLMs with traditional recommender systems. In this paper, we introduce RecBench, which systematically investigates various item representation forms (including unique identifier, text, semantic embedding, and semantic identifier) and evaluates two primary recommendation tasks, i.e., click-through rate (CTR) prediction and sequential recommendation (SeqRec). Our extensive experiments cover up to 17 large models and are conducted across five diverse datasets from the fashion, news, video, books, and music domains. Our findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in CTR and up to a 170% NDCG@10 improvement in SeqRec. However, these substantial performance gains come at the expense of significantly reduced inference efficiency, rendering LLMs impractical as real-time recommenders. We have released our code and data to enable other researchers to reproduce and build upon our experimental results.

AAAI Conference 2025 Conference Paper

Decentralized Federated Learning with Model Caching on Mobile Agents

  • Xiaoyu Wang
  • Guojun Xiong
  • Houwei Cao
  • Jian Li
  • Yong Liu

Federated Learning (FL) trains a shared model using data and computation power on distributed agents coordinated by a central server. Decentralized FL (DFL) utilizes local model exchange and aggregation between agents to reduce the communication and computation overheads on the central server. However, when agents are mobile, the communication opportunity between agents can be sporadic, largely hindering the convergence and accuracy of DFL. In this paper, we propose Cached Decentralized Federated Learning (Cached-DFL) to investigate delay-tolerant model spreading and aggregation enabled by model caching on mobile agents. Each agent stores not only its own model, but also models of agents encountered in the recent past. When two agents meet, they exchange their own models as well as the cached models. Local model aggregation utilizes all models stored in the cache. We theoretically analyze the convergence of Cached-DFL, explicitly taking into account the model staleness introduced by caching. We design and compare different model caching algorithms for different DFL and mobility scenarios. We conduct detailed case studies in a vehicular network to systematically investigate the interplay between agent mobility, cache staleness, and model convergence. In our experiments, Cached-DFL converges quickly, and significantly outperforms DFL without caching.
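The meet-and-exchange step described above can be sketched as below. The cache layout (origin id mapped to a timestamped model), the freshest-copy rule, and timestamp-based eviction are assumptions for illustration, not the paper's specific caching algorithms.

```python
def meet_and_exchange(own_a, cache_a, own_b, cache_b, capacity=4):
    """Cached-DFL exchange on an encounter (sketch).

    Each agent's state: its own entry (agent_id, (timestamp, model))
    plus a cache {origin_id: (timestamp, model)}. On meeting, the two
    agents exchange their own and cached models; each keeps the freshest
    copy per origin agent and evicts the stalest entries beyond capacity.
    """
    def merge(mine, incoming):
        merged = dict(mine)
        for origin, (ts, model) in incoming.items():
            if origin not in merged or ts > merged[origin][0]:
                merged[origin] = (ts, model)       # keep the freshest copy
        freshest = sorted(merged.items(), key=lambda kv: kv[1][0], reverse=True)
        return dict(freshest[:capacity])           # evict stalest beyond capacity
    incoming_for_a = {**cache_b, own_b[0]: own_b[1]}
    incoming_for_b = {**cache_a, own_a[0]: own_a[1]}
    return merge(cache_a, incoming_for_a), merge(cache_b, incoming_for_b)
```

Local aggregation would then average the agent's own model with everything in its cache, which is where the staleness analyzed in the paper enters.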

NeurIPS Conference 2025 Conference Paper

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

  • Chen Qian
  • Dongrui Liu
  • Haochen Wen
  • Zhen Bai
  • Yong Liu
  • Jing Shao

Large reasoning models (LRMs) have demonstrated impressive capabilities in complex problem-solving, yet their internal reasoning mechanisms remain poorly understood. In this paper, we investigate the reasoning trajectories of LRMs from an information-theoretic perspective. By tracking how the mutual information (MI) between intermediate representations and the correct answer evolves during LRM reasoning, we observe an interesting MI-peaks phenomenon: the MI at specific generative steps exhibits a sudden and significant increase during the LRM's reasoning process. We theoretically analyze this phenomenon and show that as MI increases, the probability of the model's prediction error decreases. Furthermore, these MI peaks often correspond to tokens expressing reflection or transition, such as "Hmm", "Wait", and "Therefore,", which we term thinking tokens. We then demonstrate that these thinking tokens are crucial for the LRM's reasoning performance, while other tokens have minimal impact. Building on these analyses, we propose two simple yet effective methods to improve the LRM's reasoning performance by delicately leveraging these thinking tokens. Overall, our work provides novel insights into the reasoning mechanisms of LRMs and offers practical ways to improve their reasoning capabilities. The code is available at https://github.com/ChnQ/MI-Peaks.
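A toy detector for the MI-peaks phenomenon described above can be sketched as follows. The jump-based thresholding rule and the z-score cutoff are assumptions for illustration, not the paper's analysis procedure.

```python
import numpy as np

def find_mi_peaks(mi_trace, z=1.0):
    """Flag generative steps whose MI jump exceeds the mean jump by
    z standard deviations (simple peak heuristic; illustrative only)."""
    jumps = np.diff(np.asarray(mi_trace, dtype=float))
    threshold = jumps.mean() + z * jumps.std()
    return [i + 1 for i, jump in enumerate(jumps) if jump > threshold]
```

Applied to a per-step MI trace, the flagged steps would be the candidates to inspect for thinking tokens such as "Wait" or "Therefore,".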

AAAI Conference 2025 Conference Paper

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

  • Jiaqi Huang
  • Zunnan Xu
  • Ting Liu
  • Yong Liu
  • Haonan Han
  • Kehong Yuan
  • Xiu Li

In the domain of computer vision, Parameter-Efficient Tuning (PET) is increasingly replacing the traditional paradigm of pre-training followed by full fine-tuning. PET is particularly favored for its effectiveness in large foundation models, as it streamlines transfer learning costs and optimizes hardware utilization. However, the current PET methods are mainly designed for single-modal optimization. While some pioneering studies have undertaken preliminary explorations, they still remain at the level of aligned encoders (e.g., CLIP) and lack exploration of misaligned encoders. These methods show sub-optimal performance with misaligned encoders, as they fail to effectively align the multimodal features during fine-tuning. In this paper, we introduce DETRIS, a parameter-efficient tuning framework designed to enhance low-rank visual feature propagation by establishing dense interconnections between each layer and all preceding layers, which enables effective cross-modal feature interaction and adaptation to misaligned encoders. We also suggest using text adapters to improve textual features. Our simple yet efficient approach greatly surpasses state-of-the-art methods with 0.9% to 1.8% backbone parameter updates, evaluated on challenging benchmarks.

JBHI Journal 2025 Journal Article

Disentangled Representation Learning for Capturing Individualized Brain Atrophy via Pseudo-Healthy Synthesis

  • Zhuangzhuang Li
  • Kun Zhao
  • Pindong Chen
  • Dawei Wang
  • Hongxiang Yao
  • Bo Zhou
  • Jie Lu
  • Pan Wang

Brain atrophy emerges as a distinctive hallmark in various neurodegenerative diseases, demonstrating a progressive trajectory across diverse disease stages and concurrently manifesting in tandem with a discernible decline in cognitive abilities. Understanding the individualized patterns of brain atrophy is critical for precision medicine and the prognosis of neurodegenerative diseases. However, it is difficult to obtain longitudinal data to compare changes before and after the onset of diseases. In this study, we present a deep disentangled generative model (DDGM) for capturing individualized atrophy patterns via disentangling patient images into “realistic” healthy counterfactual images and abnormal residual maps. The proposed DDGM consists of four modules: normal MRI synthesis, residual map synthesis, input reconstruction module, and mutual information neural estimator (MINE). The MINE and adversarial learning strategy together ensure independence between disease-related features and features shared by both disease and healthy controls. In addition, we propose a comprehensive evaluation of the effectiveness of synthetic pseudo-healthy images, focusing on both their healthiness and subject identity. The results indicated that the proposed DDGM effectively preserves these characteristics in the synthesized pseudo-healthy images, outperforming existing methods. The proposed method demonstrates robust generalization capabilities across two independent datasets from different races and sites. Analysis of the disease residual/saliency maps revealed specific atrophy patterns associated with Alzheimer's disease (AD), particularly in the hippocampus and amygdala regions. These accurate individualized atrophy patterns enhance the performance of AD classification tasks, resulting in an improvement in classification accuracy to 92.50 ± 2.70%.

NeurIPS Conference 2025 Conference Paper

DreamLight: Towards Harmonious and Consistent Image Relighting

  • Yong Liu
  • Wenpeng Xiao
  • Qianqian Wang
  • Junlin Chen
  • Shiyin Wang
  • Yitong Wang
  • Xinglong Wu
  • Yansong Tang

We introduce a model named DreamLight for universal image relighting in this work, which can seamlessly composite subjects into a new background while maintaining aesthetic uniformity in terms of lighting and color tone. The background can be specified by natural images (image-based relighting) or generated from unlimited text prompts (text-based relighting). Existing studies primarily focus on image-based relighting, with scant exploration of text-based scenarios. Some works employ intricate disentanglement pipeline designs that rely on environment maps to provide relevant information, which grapples with the expensive data cost required for intrinsic decomposition and light source estimation. Other methods treat this task as an image translation problem and perform pixel-level transformation with an autoencoder architecture. While these methods have achieved decent harmonization effects, they struggle to generate realistic and natural light interaction effects between the foreground and background. To alleviate these challenges, we reorganize the input data into a unified format and leverage the semantic prior provided by the pretrained diffusion model to facilitate the generation of natural results. Moreover, we propose a Position-Guided Light Adapter (PGLA) that condenses light information from different directions in the background into designed light query embeddings, and modulates the foreground with direction-biased masked attention. In addition, we present a post-processing module named Spectral Foreground Fixer (SFF) to adaptively reorganize different frequency components of the subject and relighted background, which helps enhance the consistency of foreground appearance. Extensive comparisons and a user study demonstrate that our DreamLight achieves remarkable relighting performance.

AAAI Conference 2025 Conference Paper

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

  • Yu Yang
  • Jianbiao Mei
  • Yukai Ma
  • Siliang Du
  • Wenqing Chen
  • Yijie Qian
  • Yuxiang Feng
  • Yong Liu

World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation and facilitate a broader range of downstream applications. Furthermore, we explore integrating the generative capabilities of the 4D world model with end-to-end planning, enabling continuous forecasting of future states and the selection of optimal trajectories using an occupancy-based cost function. Extensive experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy, opening new avenues for driving world generation and end-to-end planning.

IROS Conference 2025 Conference Paper

Efficient Learning of A Unified Policy For Whole-body Manipulation and Locomotion Skills

  • Dianyong Hou
  • Chengrui Zhu
  • Zhen Zhang
  • Zhibin Li
  • Chuang Guo
  • Yong Liu

Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities, enabling diverse practical applications. This integration creates a more complex system that has increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control policies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides feedback on the mapping of the body postures to the manipulator’s workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior performance of this approach. We have established a project website to showcase our experiments.

IJCAI Conference 2025 Conference Paper

FreEformer: Frequency Enhanced Transformer for Multivariate Time Series Forecasting

  • Wenzhen Yue
  • Yong Liu
  • Xianghua Ying
  • Bowei Xing
  • Ruohao Guo
  • Ji Shi

This paper presents FreEformer, a simple yet effective model that leverages a Frequency Enhanced Transformer for multivariate time series forecasting. Our work is based on the assumption that the frequency spectrum provides a global perspective on the composition of series across various frequencies and is highly suitable for robust representation learning. Specifically, we first convert time series into the complex frequency domain using the Discrete Fourier Transform (DFT). The Transformer architecture is then applied to the frequency spectra to capture cross-variate dependencies, with the real and imaginary parts processed independently. However, we observe that the vanilla attention matrix exhibits a low-rank characteristic, thus limiting representation diversity. To address this, we enhance the vanilla attention mechanism by introducing an additional learnable matrix to the original attention matrix, followed by row-wise L1 normalization. Theoretical analysis demonstrates that this enhanced attention mechanism improves both feature diversity and gradient flow. Extensive experiments demonstrate that FreEformer consistently outperforms state-of-the-art models on eighteen real-world benchmarks covering electricity, traffic, weather, healthcare and finance. Notably, the enhanced attention mechanism also consistently improves the performance of state-of-the-art Transformer-based forecasters. Code is available at https://anonymous.4open.science/r/FreEformer.
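As an illustrative aside (a sketch under assumed shapes, not the paper's code), the enhanced attention described in the abstract — a learnable matrix added to the vanilla softmax attention matrix, followed by row-wise L1 normalization — can be written in a few lines; the function name and shapes here are hypothetical:

```python
import numpy as np

def enhanced_attention(scores, learnable, eps=1e-8):
    """Sketch of the enhanced attention idea: add a learnable
    (here assumed non-negative) matrix to the vanilla softmax
    attention matrix, then re-normalize each row by its L1 norm."""
    # vanilla row-wise softmax attention
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = exp / exp.sum(axis=-1, keepdims=True)
    # additive learnable matrix can raise the rank of the attention map
    enhanced = attn + learnable
    # row-wise L1 normalization keeps each row a valid weighting
    return enhanced / (np.abs(enhanced).sum(axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
A = enhanced_attention(rng.normal(size=(4, 4)), rng.uniform(size=(4, 4)))
```

Each row of `A` still sums to one, while the additive term is free of the low-rank constraint of a single softmax.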

AAAI Conference 2025 Conference Paper

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis

  • Yuji Wang
  • Jingchen Ni
  • Yong Liu
  • Chun Yuan
  • Yansong Tang

Zero-shot Referring Image Segmentation (RIS) identifies the instance mask that best aligns with a specified referring expression without training and fine-tuning, significantly reducing the labor-intensive annotation process. Despite achieving commendable results, previous CLIP-based models have a critical drawback: the models exhibit a notable reduction in their capacity to discern relative spatial relationships of objects. This is because they generate all possible masks on an image and evaluate each masked region for similarity to the given expression, often resulting in decreased sensitivity to direct positional clues in text inputs. Moreover, most methods have weak abilities to manage relationships between primary words and their contexts, causing confusion and reduced accuracy in identifying the correct target region. To address these challenges, we propose IteRPrimE (Iterative Grad-CAM Refinement and Primary word Emphasis), which leverages a saliency heatmap through Grad-CAM from a Vision-Language Pre-trained (VLP) model for image-text matching. An iterative Grad-CAM refinement strategy is introduced to progressively enhance the model's focus on the target region and overcome positional insensitivity, creating a self-correcting effect. Additionally, we design the Primary Word Emphasis module to help the model handle complex semantic relations, enhancing its ability to attend to the intended object. Extensive experiments conducted on the RefCOCO/+/g, and PhraseCut benchmarks demonstrate that IteRPrimE outperforms previous SOTA zero-shot methods, particularly excelling in out-of-domain scenarios.

IROS Conference 2025 Conference Paper

Learning Symmetric Legged Locomotion via State Distribution Symmetrization

  • Chengrui Zhu
  • Zhen Zhang
  • Siqi Li
  • Qingpeng Li
  • Yong Liu

Morphological symmetry is a fundamental characteristic of legged animals and robots. Most existing Deep Reinforcement Learning approaches for legged locomotion neglect to exploit this inherent symmetry, often producing unnatural and suboptimal behaviors such as dominant legs or non-periodic gaits. To address this limitation, we propose a novel learning-based framework to systematically optimize symmetry by state distribution symmetrization. First, we introduce the degree of asymmetry (DoA), a quantitative metric that measures the discrepancy between original and mirrored state distributions. Second, we develop an efficient computation method for DoA using gradient ascent with a trained discriminator network. This metric is then incorporated into a reinforcement learning framework by introducing it to the reward function, explicitly encouraging symmetry during policy training. We validate our framework with extensive experiments on quadrupedal and humanoid robots in simulated and real-world environments. Results demonstrate the efficacy of our approach for improving policy symmetry and overall locomotion performance.

IROS Conference 2025 Conference Paper

LITE: A Learning-Integrated Topological Explorer for Multi-Floor Indoor Environments

  • Junhao Chen
  • Zhen Zhang
  • Chengrui Zhu
  • Xiaojun Hou
  • Tianyang Hu
  • Huifeng Wu
  • Yong Liu

This work focuses on multi-floor indoor exploration, which remains an open area of research. Compared to traditional methods, recent learning-based explorers have demonstrated significant potential due to their robust environmental learning and modeling capabilities, but most are restricted to 2D environments. In this paper, we propose a learning-integrated topological explorer, LITE, for multi-floor indoor environments. LITE decomposes the environment into a floor-stair topology, enabling seamless integration of learning-based or non-learning-based 2D exploration methods for 3D exploration. As the floor-stair topology is built incrementally during exploration using a YOLO11-based instance segmentation model, the agent can transition between floors through a finite state machine. Additionally, we implement an attention-based 2D exploration policy that utilizes an attention mechanism to capture spatial dependencies between different regions, thereby determining the next global goal for more efficient exploration. Extensive comparison and ablation studies conducted on the HM3D and MP3D datasets demonstrate that our proposed 2D exploration policy significantly outperforms all baseline explorers in terms of exploration efficiency. Furthermore, experiments in several 3D multi-floor environments indicate that our framework is compatible with various 2D exploration methods, facilitating effective multi-floor indoor exploration. Finally, we validate our method in the real world with a quadruped robot, highlighting its strong generalization capabilities.

TMLR Journal 2025 Journal Article

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

  • Guangyi Liu
  • Pengxiang Zhao
  • Yaozhen Liang
  • Liang Liu
  • Yaxuan Guo
  • Han Xiao
  • Weifeng Lin
  • Yuxiang Chai

With the rapid rise of large language models (LLMs), phone automation has undergone transformative changes. This paper systematically reviews LLM-driven phone GUI agents, highlighting their evolution from script-based automation to intelligent, adaptive systems. We first contextualize key challenges, namely (i) limited generality, (ii) high maintenance overhead, and (iii) weak intent comprehension, and show how LLMs address these issues through advanced language understanding, multimodal perception, and robust decision-making. We then propose a taxonomy covering fundamental agent frameworks (single-agent, multi-agent, plan-then-act), modeling approaches (prompt engineering, training-based), and essential datasets and benchmarks. Furthermore, we detail task-specific architectures, supervised fine-tuning, and reinforcement learning strategies that bridge user intent and GUI operations. Finally, we discuss open challenges such as dataset diversity, on-device deployment efficiency, user-centric adaptation, and security concerns, offering forward-looking insights into this rapidly evolving field. By providing a structured overview and identifying pressing research gaps, this paper serves as a definitive reference for researchers and practitioners seeking to harness LLMs in designing scalable, user-friendly phone GUI agents. The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/PhoneLLM/Awesome-LLM-Powered-Phone-GUI-Agents

AAAI Conference 2025 Conference Paper

Look Back for More: Harnessing Historical Sequential Updates for Personalized Federated Adapter Tuning

  • Danni Peng
  • Yuan Wang
  • Huazhu Fu
  • Jinpeng Jiang
  • Yong Liu
  • Rick Siow Mong Goh
  • Qingsong Wei

Personalized federated learning (PFL) studies effective model personalization to address the data heterogeneity issue among clients in traditional federated learning (FL). Existing PFL approaches mainly generate personalized models by relying solely on the clients' latest updated models while ignoring their previous updates, which may result in suboptimal personalized model learning. To bridge this gap, we propose a novel framework termed pFedSeq, designed for personalizing adapters to fine-tune a foundation model in FL. In pFedSeq, the server maintains and trains a sequential learner, which processes a sequence of past adapter updates from clients and generates calibrations for personalized adapters. To effectively capture the cross-client and cross-step relations hidden in previous updates and generate high-performing personalized adapters, pFedSeq adopts the powerful selective state space model (SSM) as the architecture of sequential learner. Through extensive experiments on four public benchmark datasets, we demonstrate the superiority of pFedSeq over state-of-the-art PFL methods.

ICRA Conference 2025 Conference Paper

MARF: Cooperative Multi-Agent Path Finding with Reinforcement Learning and Frenet Lattice in Dynamic Environments

  • Tianyang Hu
  • Zhen Zhang
  • Chengrui Zhu
  • Gang Xu
  • Yuchen Wu
  • Huifeng Wu
  • Yong Liu

Multi-agent path finding (MAPF) in dynamic and complex environments is a highly challenging task. Recent research has focused on the scalability of agent numbers or the complexity of the environment. Usually, they disregard the agents' physical constraints or use a differential-driven model. However, this approach fails to adequately capture the kinematic and dynamic constraints of real-world vehicles, particularly those equipped with Ackermann steering. This paper presents a novel algorithm named MARF that combines multi-agent reinforcement learning (MARL) with a Frenet lattice planner. The MARL foundation endows the algorithm with enhanced generalization capabilities while preserving computational efficiency. By incorporating Frenet lattice trajectories into the action space of the MARL framework, agents are capable of generating smooth and feasible trajectories that respect the kinematic and dynamic constraints. In addition, we adopt a centralized training and decentralized execution (CTDE) framework, where a network of shared value functions enables efficient cooperation among agents during decision-making. Simulation results and real-world experiments in different scenarios demonstrate that our method achieves superior performance in terms of success rate, average speed, extra distance of trajectory, and computing time.

NeurIPS Conference 2025 Conference Paper

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

  • Wenzhen Yue
  • Yong Liu
  • Hao Wang
  • Haoxuan Li
  • Xianghua Ying
  • Ruohao Guo
  • Bowei Xing
  • Ji Shi

This paper presents $\mathbf{OLinear}$, a $\mathbf{linear}$-based multivariate time series forecasting model that operates in an $\mathbf{o}$rthogonally transformed domain. Recent forecasting models typically adopt the temporal forecast (TF) paradigm, which directly encodes and decodes time series in the time domain. However, the entangled step-wise dependencies in series data can hinder the performance of TF. To address this, some forecasters conduct encoding and decoding in the transformed domain using fixed, dataset-independent bases (e.g., sine and cosine signals in the Fourier transform). In contrast, we propose $\mathbf{OrthoTrans}$, a data-adaptive transformation based on an orthogonal matrix that diagonalizes the series' temporal Pearson correlation matrix. This approach enables more effective encoding and decoding in the decorrelated feature domain and can serve as a plug-in module to enhance existing forecasters. To enhance the representation learning for multivariate time series, we introduce a customized linear layer, $\mathbf{NormLin}$, which employs a normalized weight matrix to capture multivariate dependencies. Empirically, the NormLin module shows a surprising performance advantage over multi-head self-attention, while requiring nearly half the FLOPs. Extensive experiments on 24 benchmarks and 140 forecasting tasks demonstrate that OLinear consistently achieves state-of-the-art performance with high efficiency. Notably, as a plug-in replacement for self-attention, the NormLin module consistently enhances Transformer-based forecasters. The code and datasets are available at https://github.com/jackyue1994/OLinear.
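As an illustrative aside (not the authors' code), the OrthoTrans idea in the abstract — an orthogonal matrix that diagonalizes the temporal Pearson correlation matrix, giving a decorrelated encoding domain — follows directly from the eigendecomposition of a symmetric matrix; the window sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
windows = rng.normal(size=(256, 16))       # 256 training windows of length 16 (assumed shapes)

# temporal Pearson correlation across the 16 time steps; symmetric by construction
corr = np.corrcoef(windows, rowvar=False)

# a symmetric matrix is diagonalized by an orthogonal eigenvector matrix Q
eigvals, Q = np.linalg.eigh(corr)

z = windows @ Q                            # encode: map windows into the decorrelated domain
recon = z @ Q.T                            # decode: Q is orthogonal, so its transpose inverts it
assert np.allclose(recon, windows)
```

Because `Q.T @ corr @ Q` is diagonal, the transformed coordinates are (sample-)decorrelated, which is the property the abstract exploits for encoding and decoding.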

NeurIPS Conference 2025 Conference Paper

P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models

  • Tingjia Shen
  • Hao Wang
  • Chuhan Wu
  • Jin Yao Chin
  • Wei Guo
  • Yong Liu
  • Huifeng Guo
  • Defu Lian

With the growing size of data and models in Large Recommendation Models, the time required for debugging has become increasingly prohibitive, underscoring the urgent need for effective guidance in parameter configuration. The Scaling Law (SL) offers analogous guidance in the sequential language domain, having achieved significant success by predicting model loss when scaling model size. However, the existing guidance from SL for Sequential Recommendation (SR) remains qualitative, because quantitative analysis of SL on SR faces challenges in measuring the quality of redundant sequences along with the loss-performance discrepancy. In response, we introduce the Performance Law (P-Law) for SR models, which predicts model performance across various settings, intending to provide a quantitative framework for guiding the parameter optimization of future models. First, the Performance Law utilizes Real Entropy to measure data quality, aiming to remove the influence of low-quality, low-entropy redundant sequences. Second, the Performance Law introduces a fitted decay term, which facilitates the prediction of overfitting, the major loss-performance discrepancy phenomenon, ultimately achieving quantitative performance prediction. Extensive experiments on various datasets demonstrate the effectiveness of the Performance Law, displaying exceptional quantitative prediction ability against the original and modified qualitative SL. Additional application experiments on optimal parameter prediction and model expansion potential prediction also demonstrate the broad applicability of the Performance Law.

ICRA Conference 2025 Conference Paper

SARO: Space-Aware Robot System for Terrain Crossing via Vision-Language Model

  • Shaoting Zhu
  • Derun Li
  • Linzhan Mou
  • Yong Liu
  • Ningyi Xu
  • Hang Zhao 0021

The application of vision-language models (VLMs) has achieved impressive success in various robotics tasks. However, there have been few explorations of foundation models for quadruped robot navigation through terrains in 3D environments. We introduce SARO (Space-Aware Robot System for Terrain Crossing), an innovative system composed of a high-level reasoning module, a closed-loop sub-task execution module, and a low-level control policy. It enables the robot to navigate across 3D terrains and reach the goal position. For high-level reasoning and execution, we propose a novel algorithmic system taking advantage of a VLM, with a design of task decomposition and a closed-loop sub-task execution mechanism. For low-level locomotion control, we utilize the Probability Annealing Selection (PAS) method to effectively train a control policy by reinforcement learning. Numerous experiments show that our whole system can accurately and robustly navigate across several 3D terrains, and its generalization ability ensures applications in diverse indoor and outdoor scenarios and terrains. The appendix and videos can be found on the project page: https://saro-vlm.github.io/.

NeurIPS Conference 2025 Conference Paper

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

  • Yong Liu
  • Zirui Zhu
  • Chaoyu Gong
  • Minhao Cheng
  • Cho-Jui Hsieh
  • Yang You

While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, compared with exact gradients, ZO-based gradients usually exhibit an estimation error, which can significantly hurt the optimization process, leading to slower convergence and suboptimal solutions. In addition, we find that the estimation error hurts more when added to large weights than to small weights. Based on this observation, this paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse-MeZO. Additionally, we develop a memory-optimized implementation for sparse masking, ensuring the algorithm requires only inference-level memory consumption, allowing Sparse-MeZO to fine-tune LLaMA-30b on a single A100 GPU. Experimental results illustrate that Sparse-MeZO consistently improves both performance and convergence speed over MeZO without any overhead. For example, it achieves a 9% absolute accuracy improvement and 3.5x speedup over MeZO on the RTE task.
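As an illustrative aside (a toy sketch, not the paper's implementation), the core idea — estimate the gradient from two forward passes and restrict the perturbation and update to a subset of small-magnitude parameters — can be shown on a quadratic loss; the function name, mask fraction, and step sizes here are all assumptions:

```python
import numpy as np

def sparse_mezo_step(theta, loss_fn, keep_frac=0.25, eps=1e-3, lr=1e-2, rng=None):
    """Toy sketch of a sparse zeroth-order (SPSA-style) update:
    only the smallest-magnitude parameters are perturbed and updated,
    with the directional derivative estimated from two forward passes."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # mask: keep only the fraction of parameters with smallest magnitude
    k = max(1, int(keep_frac * theta.size))
    mask = np.zeros_like(theta)
    mask[np.argsort(np.abs(theta))[:k]] = 1.0
    z = rng.normal(size=theta.shape) * mask      # perturbation restricted to the mask
    # central-difference estimate of the directional derivative along z
    g = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    return theta - lr * g * z                    # update touches only masked entries

# on a toy quadratic, the masked zeroth-order steps still reduce the loss
loss = lambda t: float(np.sum(t ** 2))
t = np.ones(8)
rng = np.random.default_rng(0)
for _ in range(200):
    t = sparse_mezo_step(t, loss, rng=rng)
assert loss(t) < loss(np.ones(8))
```

Only two forward passes per step are needed and no gradients are stored, which is the memory advantage the abstract describes; the mask is what distinguishes this from plain MeZO.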

NeurIPS Conference 2025 Conference Paper

SSTAG: Structure-Aware Self-Supervised Learning Method for Text-Attributed Graphs

  • Ruyue Liu
  • Rong Yin
  • Xiangzhen Bo
  • Xiaoshuai Hao
  • Yong Liu
  • Jinwen Zhong
  • Can Ma
  • Weiping Wang

Large-scale pre-trained models have revolutionized Natural Language Processing (NLP) and Computer Vision (CV), showcasing remarkable cross-domain generalization abilities. However, in graph learning, models are typically trained on individual graph datasets, limiting their capacity to transfer knowledge across different graphs and tasks. This approach also heavily relies on large volumes of annotated data, which presents a significant challenge in resource-constrained settings. Unlike NLP and CV, graph-structured data presents unique challenges due to its inherent heterogeneity, including domain-specific feature spaces and structural diversity across various applications. To address these challenges, we propose a novel structure-aware self-supervised learning method for Text-Attributed Graphs (SSTAG). By leveraging text as a unified representation medium for graph learning, SSTAG bridges the gap between the semantic reasoning of Large Language Models (LLMs) and the structural modeling capabilities of Graph Neural Networks (GNNs). Our approach introduces a dual knowledge distillation framework that co-distills both LLMs and GNNs into structure-aware multilayer perceptrons (MLPs), enhancing the scalability of large-scale TAGs. Additionally, we introduce an in-memory mechanism that stores typical graph representations, aligning them with memory anchors in an in-memory repository to integrate invariant knowledge, thereby improving the model’s generalization ability. Extensive experiments demonstrate that SSTAG outperforms state-of-the-art models on cross-domain transfer learning tasks, achieves exceptional scalability, and reduces inference costs while maintaining competitive performance.

AAAI Conference 2025 Conference Paper

Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology

  • Xiaolin Hu
  • Zixuan Gong
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Zeroth-order (ZO) optimization as the gradient-free method has become a powerful tool when the first-order gradient is unavailable or expensive to obtain, especially in decentralized learning scenarios where data and computational resources are distributed across multiple clients. There have been many efforts to analyze the optimization convergence rate of zeroth-order decentralized stochastic gradient descent (ZO-DSGD) algorithms. However, the generalization of these methods has not been well studied. In this paper, we provide a generalization analysis of ZO-DSGD with changing topology, where the clients run zeroth-order SGD with local data and communicate with each other according to time-varying topology. We systematically analyze the generalization error in convex, strongly convex, and non-convex cases. The obtained results in the convex and strongly convex cases with zeroth-order oracles recover the results of SGD. Moreover, the generalization bounds derived in non-convex cases align with that of DSGD. To capture the influence of communication topology on the generalization performance, we analyze local generalization bounds concerning local models held at different clients. The obtained results reflect the influence of the number of clients, local sample size, and topology on the generalization error. To the best of our knowledge, this is the first work that provides a generalization analysis of zeroth-order decentralized stochastic gradient descent methods and recovers the results of SGD.

NeurIPS Conference 2025 Conference Paper

Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$

  • Bowei Zhu
  • Shaojie Li
  • Mingyang Yi
  • Yong Liu

Prior work (Klochkov \& Zhivotovskiy, 2021) establishes $O\left(\log (n)/n\right)$ excess risk bounds via algorithmic stability for strongly-convex learners with high probability. We show that, under similar common assumptions (the Polyak-Lojasiewicz condition, smoothness, and Lipschitz continuous losses), rates of at most $O\left(\log^2(n)/n^2\right)$ are achievable. To our knowledge, our analysis also provides the tightest high-probability bounds for gradient-based generalization gaps in nonconvex settings.

IJCAI Conference 2025 Conference Paper

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization

  • Xinhao Yao
  • Hongjin Qian
  • Xiaolin Hu
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Large Language Models (LLMs), built on Transformer architectures, exhibit remarkable generalization across a wide range of tasks. However, fine-tuning these models for specific tasks remains resource-intensive due to their extensive parameterization. In this paper, we explore two remarkable phenomena related to the attention mechanism during the fine-tuning of LLMs (where Wq, Wk, and Wv denote the weights of the query, key, and value layers, respectively). The first phenomenon, termed “Unequal Importance of Attention Matrices”, highlights the impact of fine-tuning different weight matrices. It shows that optimizing the Wv matrix yields significantly better performance than optimizing the Wk matrix. Fine-tuning only the Wq and Wv matrices is computationally efficient while delivering results comparable to, or even better than fine-tuning all three matrices (Wq, Wk, and Wv). The second phenomenon, “Attention Matrices with Customized Learning Rate Lead to Better Convergence”, emphasizes the importance of assigning distinct learning rates to these matrices. Specifically, a higher learning rate for the Wv matrix compared to Wq and Wk accelerates convergence and improves performance. Building on these insights, we propose a new strategy that improves fine-tuning efficiency in terms of both storage and time. Experimental results on benchmark datasets validate the effectiveness of this approach, supporting our theoretical findings. Our analysis lays the theoretical groundwork for configuring and improving algorithms in LLMs fine-tuning.

IJCAI Conference 2025 Conference Paper

Towards Improved Risk Bounds for Transductive Learning

  • Bowei Zhu
  • Shaojie Li
  • Yong Liu

Transductive learning is a popular setting in statistical learning theory, reasoning from observed, specific training cases to specific test cases, and has been widely used in many fields such as graph neural networks and semi-supervised learning. Existing results provide fast rates of convergence based on traditional localization techniques, which require the surrogate function that upper bounds the uniform error within a localized region to be ``sub-root''. We derive a new version of the concentration inequality for empirical processes in transductive learning and apply the generic chaining technique to relax these assumptions and obtain tighter results for empirical risk minimization. Furthermore, we concentrate on the generalization of moment penalization algorithms. We design a novel estimator based on second-moment (variance) penalization and derive its learning rates, which is the first theoretical generalization analysis considering variance-based algorithms.

NeurIPS Conference 2025 Conference Paper

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

  • Xue zhucun
  • Jiangning Zhang
  • Teng Hu
  • Haoyang He
  • Yinan Chen
  • Yuxuan Cai
  • Yabiao Wang
  • Chengjie Wang

The quality of a video dataset (image quality, resolution, and fine-grained captions) greatly influences the performance of video generation models. The growing demand for video applications sets higher requirements for high-quality video generation models, for example, the generation of movie-level Ultra-High Definition (UHD) videos and the creation of 4K short video content. However, existing public datasets cannot support related research and applications. In this paper, we first propose a high-quality open-sourced UHD-4K (22.4% of which are 8K) text-to-video dataset named UltraVideo, which covers a wide range of topics (more than 100 kinds), and each video has 9 structured captions with one summarized caption (an average of 824 words). Specifically, we carefully design a highly automated four-stage curation process to obtain the final high-quality dataset: i) collection of diverse and high-quality video clips; ii) statistical data filtering; iii) model-based data purification; iv) generation of comprehensive, structured captions. In addition, we extend Wan to UltraWan-1K/-4K, which can natively generate high-quality 1K/4K videos with more consistent text controllability, demonstrating the effectiveness of our data curation. We believe this work can make a significant contribution to future research on UHD video generation. The UltraVideo dataset and UltraWan models are available at https://xzc-zju.github.io/projects/UltraVideo.

AAAI Conference 2025 Conference Paper

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

  • Chun-Mei Feng
  • Yang Bai
  • Tao Luo
  • Zhen Li
  • Salman Khan
  • Wangmeng Zuo
  • Rick Siow Mong Goh
  • Yong Liu

Although progress has been made in Composed Image Retrieval (CIR), we empirically find that a certain percentage of failure retrieval results are inconsistent with their relative captions. To address this issue, this work provides a Visual Question Answering (VQA) perspective to boost the performance of CIR. The resulting VQA4CIR is a post-processing approach and can be directly plugged into existing CIR methods. Given the top-C retrieved images by a CIR method, VQA4CIR aims to decrease the adverse effect of the failure retrieval results being inconsistent with the relative caption. To find the retrieved images inconsistent with the relative caption, we resort to the "QA generation → VQA" self-verification pipeline. For QA generation, we fine-tune an LLM (e.g., LLaMA) to generate several pairs of questions and answers from each relative caption. We then fine-tune an LVLM (e.g., LLaVA) to obtain the VQA model. By feeding the retrieved image and question to the VQA model, one can identify images inconsistent with the relative caption when the VQA answer disagrees with the answer in the QA pair. Consequently, the CIR performance can be boosted by modifying the ranks of inconsistently retrieved images. Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets.

NeurIPS Conference 2025 Conference Paper

X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

  • Yu Yang
  • Alan Liang
  • Jianbiao Mei
  • Yukai Ma
  • Yong Liu
  • Gim Hee Lee

Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, large-scale 3D scene generation requiring spatial coherence remains underexplored. In this paper, we present X-Scene, a novel framework for large-scale driving scene generation that achieves geometric intricacy, appearance fidelity, and flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level layout conditioning driven by user input or text for detailed scene composition, and high-level semantic guidance informed by user intent and LLM-enriched prompts for efficient customization. To enhance geometric and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and corresponding multi-view images and videos, ensuring alignment and temporal consistency across modalities. We further extend local regions into large-scale scenes via consistency-aware outpainting, which extrapolates occupancy and images from previously generated areas to maintain spatial and visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as simulation and scene exploration. Extensive experiments demonstrate that X-Scene substantially advances controllability and fidelity in large-scale scene generation, empowering data generation and simulation for autonomous driving.

AAAI Conference 2024 Conference Paper

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

  • Mengmeng Wang
  • Jiazheng Xing
  • Boyuan Jiang
  • Jun Chen
  • Jianbiao Mei
  • Xingxing Zuo
  • Guang Dai
  • Jingdong Wang

Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient Fine-Tuning (PEFT), has attracted substantial attention in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. Firstly, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities of the visual encoder. Moreover, we adopt text encoder adapters to strengthen the learning of semantic label information. Secondly, we design a multi-task decoder with a rich set of supervisory signals, including the original contrastive learning head, a cross-modal classification head, a cross-modal masked language modeling head, and a visual classification head. This multi-task decoder adeptly satisfies the need for strong supervised performance within a multimodal framework. Experimental results validate the efficacy of our approach, demonstrating exceptional performance in supervised learning while maintaining strong generalization in zero-shot scenarios.

AAAI Conference 2024 Conference Paper

ASWT-SGNN: Adaptive Spectral Wavelet Transform-Based Self-Supervised Graph Neural Network

  • Ruyue Liu
  • Rong Yin
  • Yong Liu
  • Weiping Wang

Graph Contrastive Learning (GCL) is a self-supervised method that combines the advantages of Graph Convolutional Networks (GCNs) and contrastive learning, making it promising for learning node representations. However, the GCN encoders used in these methods rely on the Fourier transform to learn fixed graph representations, which is inherently limited by the uncertainty principle involving spatial and spectral localization trade-offs. To overcome the inflexibility of existing methods and the computationally expensive eigen-decomposition and dense matrix multiplication, this paper proposes an Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network (ASWT-SGNN). The proposed method employs spectral adaptive polynomials to approximate the filter function and optimize the wavelet using a contrastive loss. This design enables the creation of local filters in both spectral and spatial domains, allowing flexible aggregation of neighborhood information at various scales and facilitating controlled transformation between local and global information. Compared to existing methods, the proposed approach reduces computational complexity and addresses the limitation of graph convolutional neural networks, which are constrained by graph size and lack flexible control over the neighborhood. Extensive experiments on eight benchmark datasets demonstrate that ASWT-SGNN accurately approximates the filter function in high-density spectral regions, avoiding costly eigen-decomposition. Furthermore, ASWT-SGNN achieves comparable performance to state-of-the-art models in node classification tasks.

NeurIPS Conference 2024 Conference Paper

AutoTimes: Autoregressive Time Series Forecasters via Large Language Models

  • Yong Liu
  • Guo Qin
  • Xiangdong Huang
  • Jianmin Wang
  • Mingsheng Long

Foundation models of time series have not been fully developed due to the limited availability of time series corpora and the underexploration of scalable pre-training. Based on the similar sequential formulation of time series and natural language, increasing research demonstrates the feasibility of leveraging large language models (LLM) for time series. Nevertheless, the inherent autoregressive property and decoder-only architecture of LLMs have not been fully considered, resulting in insufficient utilization of LLM abilities. To fully revitalize the general-purpose token transition and multi-step generation capability of large language models, we propose AutoTimes to repurpose LLMs as autoregressive time series forecasters, which projects time series into the embedding space of language tokens and autoregressively generates future predictions with arbitrary lengths. Compatible with any decoder-only LLMs, the consequent forecaster exhibits the flexibility of the lookback length and scalability with larger LLMs. Further, we formulate time series as prompts, extending the context for prediction beyond the lookback window, termed in-context forecasting. By introducing LLM-embedded textual timestamps, AutoTimes can utilize chronological information to align multivariate time series. Empirically, AutoTimes achieves state-of-the-art with 0.1% trainable parameters and over $5\times$ training/inference speedup compared to advanced LLM-based forecasters. Code is available at this repository: https://github.com/thuml/AutoTimes.
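The recipe in the abstract (project series segments into token embeddings, let a frozen decoder-only LLM transition the tokens, read the next segment back out) can be sketched structurally. Every detail below, including the segment length, the random projections, and the identity backbone, is our own illustrative assumption, not the official AutoTimes code.

```python
import numpy as np

# Structural sketch (not the official AutoTimes implementation): non-overlapping
# series segments are linearly projected into a token-embedding space, a frozen
# decoder-only "LLM" (here a placeholder identity map) transitions the tokens,
# and a second projection maps the last token back to the next segment.

rng = np.random.default_rng(0)
seg_len, d_model = 4, 8
W_in = rng.standard_normal((seg_len, d_model)) * 0.1   # series segment -> token
W_out = rng.standard_normal((d_model, seg_len)) * 0.1  # token -> series segment

def llm_backbone(tokens):
    return tokens  # placeholder for a frozen decoder-only LLM

def forecast(series, n_future_segments):
    history = list(series.reshape(-1, seg_len))  # split lookback into segments
    for _ in range(n_future_segments):
        tokens = np.stack(history) @ W_in
        hidden = llm_backbone(tokens)
        next_segment = hidden[-1] @ W_out  # autoregressive next-segment read-out
        history.append(next_segment)
    return np.concatenate(history[len(series) // seg_len:])

preds = forecast(np.arange(12.0), n_future_segments=2)
```

Because generation is segment-by-segment and autoregressive, the forecast horizon (`n_future_segments`) and the lookback length are both flexible, which is the structural point the abstract emphasizes.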

NeurIPS Conference 2024 Conference Paper

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

  • Yang Zhou
  • Tan L. Faith
  • Yanyu Xu
  • Sicong Leng
  • Xinxing Xu
  • Yong Liu
  • Rick S. Goh

Medical Vision-Language Pretraining (MedVLP) shows promise in learning generalizable and transferable visual representations from paired and unpaired medical images and reports. MedVLP can provide useful features to downstream tasks and facilitate adapting task-specific models to new setups using fewer examples. However, existing MedVLP methods often differ in terms of datasets, preprocessing, and finetuning implementations. This poses great challenges in evaluating how well a MedVLP method generalizes to various clinically relevant tasks, due to the lack of a unified, standardized, and comprehensive benchmark. To fill this gap, we propose BenchX, a unified benchmark framework that enables head-to-head comparison and systematic analysis of MedVLP methods using public chest X-ray datasets. Specifically, BenchX is composed of three components: 1) comprehensive datasets covering nine datasets and four medical tasks; 2) benchmark suites to standardize data preprocessing, train-test splits, and parameter selection; 3) unified finetuning protocols that accommodate heterogeneous MedVLP methods for consistent task adaptation in classification, segmentation, and report generation, respectively. Utilizing BenchX, we establish baselines for nine state-of-the-art MedVLP methods and find that the performance of some early MedVLP methods can be enhanced to surpass more recent ones, prompting a revisiting of the developments and conclusions from prior works in MedVLP. Our code is available at https://github.com/yangzhou12/BenchX.

AAAI Conference 2024 Conference Paper

Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning

  • Yanqi Ge
  • Qiang Nie
  • Ye Huang
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng
  • Wen Li
  • Lixin Duan

One of the ultimate goals of representation learning is to achieve compactness within a class and well-separability between classes. Many outstanding metric-based and prototype-based methods following the Expectation-Maximization paradigm have been proposed for this objective. However, they inevitably introduce biases into the learning process, particularly with long-tail distributed training data. In this paper, we reveal that the class prototype need not be derived from training features and propose a novel perspective: using pre-defined class anchors serving as feature centroids to unidirectionally guide feature learning. However, the pre-defined anchors may have a large semantic distance from the pixel features, which prevents them from being directly applied. To address this issue and generate feature centroids independent of feature learning, a simple yet effective Semantic Anchor Regularization (SAR) is proposed. SAR ensures the inter-class separability of semantic anchors in the semantic space by employing a classifier-aware auxiliary cross-entropy loss during training via disentanglement learning. By pulling the learned features to these semantic anchors, several advantages can be attained: 1) intra-class compactness and natural inter-class separability, 2) induced bias or errors from feature learning can be avoided, and 3) robustness to the long-tailed problem. The proposed SAR can be used in a plug-and-play manner in existing models. Extensive experiments demonstrate that SAR performs better than previous sophisticated prototype-based methods. The implementation is available at https://github.com/geyanqi/SAR.

JMLR Journal 2024 Journal Article

Concentration and Moment Inequalities for General Functions of Independent Random Variables with Heavy Tails

  • Shaojie Li
  • Yong Liu

The concentration of measure phenomenon serves an essential role in statistics and machine learning. This paper gives bounded difference-type concentration and moment inequalities for general functions of independent random variables with heavy tails. A general framework is presented, which can be used to prove inequalities for general functions once the moment inequality for sums of independent random variables is established. We illustrate the power of the framework by showing how it can be used to derive novel concentration and moment inequalities for bounded, Bernstein's moment condition, weak-exponential, and polynomial-moment random variables. Furthermore, we give potential applications of these inequalities to statistical learning theory.
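For orientation, the classical bounded case that bounded difference-type results of this kind generalize is McDiarmid's inequality (stated here as background, not as the paper's new result):

```latex
% McDiarmid's bounded-difference inequality: for independent $X_1,\dots,X_n$
% and a function $f$ satisfying, for every coordinate $i$ and all arguments,
% $|f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n)| \le c_i$,
% one has, for every $t > 0$,
\[
  \Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \bigr)
  \;\le\;
  \exp\!\left( -\frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```

The heavy-tailed settings in the paper (Bernstein, weak-exponential, polynomial moments) relax the boundedness assumption on the coordinate-wise differences that this classical statement requires.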

NeurIPS Conference 2024 Conference Paper

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

  • Jianbiao Mei
  • Yukai Ma
  • Xuemeng Yang
  • Licheng Wen
  • Xinyu Cai
  • Xin Li
  • Daocheng Fu
  • Bo Zhang

Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. Specifically, LeapAD emulates human attention by selecting critical objects relevant to driving decisions, simplifying environmental interpretation, and mitigating decision-making complexities. Additionally, LeapAD incorporates an innovative dual-process decision-making module, which consists of an Analytic Process (System-II) for thorough analysis and reasoning, along with a Heuristic Process (System-I) for swift and empirical processing. The Analytic Process leverages its logical reasoning to accumulate linguistic driving experience, which is then transferred to the Heuristic Process by supervised fine-tuning. Through reflection mechanisms and a growing memory bank, LeapAD continuously improves itself from past mistakes in a closed-loop environment. Closed-loop testing in CARLA shows that LeapAD outperforms all methods relying solely on camera input, requiring 1-2 orders of magnitude less labeled data. Experiments also demonstrate that as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4 powered Analytic Process and achieve continuous performance improvement. Project page: https://pjlab-adg.github.io/LeapAD

AAAI Conference 2024 Conference Paper

Convolutional Spectral Kernel Learning with Generalization Guarantees (Abstract Reprint)

  • Jian Li
  • Yong Liu
  • Weiping Wang

Kernel methods are powerful tools to capture nonlinear patterns behind given data but often lead to poor performance on complicated tasks compared to convolutional neural networks. The reason is that kernel methods are still shallow and fully connected models, failing to reveal hierarchical features and local interdependencies. In this paper, to acquire hierarchical and local knowledge, we incorporate kernel methods with deep architectures and convolutional operators in a spectral kernel learning framework. Based on the inverse Fourier transform and Rademacher complexity theory, we provide generalization error bounds for the proposed model and prove that under suitable initialization, deeper networks lead to tighter error bounds. Inspired by the theoretical findings, we complete the convolutional spectral kernel network (CSKN) with two additional regularizers and an initialization strategy. Extensive ablation results validate the effectiveness of the non-stationary spectral kernel, multiple layers, additional regularizers, and the convolutional filters, which coincide with our theoretical findings. We further devise an 8-layer VGG-type CSKN, and it outperforms existing kernel-based networks and popular CNN models on medium-sized image classification tasks.

NeurIPS Conference 2024 Conference Paper

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

  • Xinhao Yao
  • Xiaolin Hu
  • Shenzhi Yang
  • Yong Liu

Pre-trained large language models (LLMs) based on the Transformer have demonstrated striking in-context learning (ICL) abilities. With a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance, and, more surprisingly, pruning weights in deep layers often results in more stable performance improvements than in shallow layers. However, the underlying mechanism of these findings still remains an open question. To explain these findings, we conduct an in-depth theoretical analysis by presenting the implicit gradient descent (GD) trajectories of ICL and giving mutual-information-based generalization bounds of ICL via full implicit GD trajectories. This helps us reasonably explain the surprising experimental findings. Besides, based on all our experimental and theoretical insights, we intuitively propose a simple, model-compression and derivative-free algorithm for enhancing ICL inference on downstream tasks. Experiments on benchmark datasets and open-source LLMs demonstrate the method's effectiveness.
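The pruning operation itself is a standard low-rank truncation: keep only the top-k singular directions of a weight matrix. A minimal sketch of that step (the paper's layer-selection strategy is not shown, and the matrix here is a random stand-in for a real attention or MLP weight):

```python
import numpy as np

# Minimal sketch of SVD-based weight pruning: replace a weight matrix with
# its best rank-k approximation (Eckart-Young). Which layers to prune, and
# at what rank, is the paper's contribution and is not reproduced here.

def svd_prune(W, k):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]  # top-k singular directions only

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))  # stand-in for a Transformer weight matrix
W_pruned = svd_prune(W, k=4)
```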

AAAI Conference 2024 Conference Paper

FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning

  • Jian Li
  • Yong Liu
  • Weiping Wang

Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often infeasible due to their quadratic communication complexity. In this paper, we introduce a novel approach to tackle this issue while still achieving fast convergence rates. Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian. To enhance communication efficiency, we reduce the sketch size to match the effective dimension of the Hessian matrix. We provide convergence analysis based on statistical learning for the federated Newton sketch approaches. Specifically, our approaches reach super-linear convergence rates w.r.t. the communication rounds for the first time. We validate the effectiveness of our algorithms through various experiments, which coincide with our theoretical findings.
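A toy illustration of the communication saving: a client holding a square-root Hessian A (with H = AᵀA) sends a k x d sketch of A rather than the d x d Hessian itself, and the server rebuilds an approximate Hessian. The Gaussian sketch and the sizes below are illustrative choices, not necessarily the paper's exact construction.

```python
import numpy as np

# Illustrative sketch of the FedNS idea: communicate a sketched square-root
# Hessian instead of the exact Hessian. Sketch type (Gaussian) and sizes are
# our own assumptions for this toy example.

rng = np.random.default_rng(0)
n, d, k = 200, 5, 50
A = rng.standard_normal((n, d))  # square-root Hessian: H = A^T A
H = A.T @ A                      # exact d x d Hessian (never transmitted)

S = rng.standard_normal((k, n)) / np.sqrt(k)  # Gaussian sketching matrix
SA = S @ A                       # k x d message sent to the server
H_approx = SA.T @ SA             # server-side Hessian approximation

rel_err = np.linalg.norm(H - H_approx) / np.linalg.norm(H)
```

With k well below n, the message SA is much smaller than the dense Hessian while AᵀSᵀSA concentrates around AᵀA, which is what makes Newton-type updates on the server feasible.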

IJCAI Conference 2024 Conference Paper

HeterGCL: Graph Contrastive Learning Framework on Heterophilic Graph

  • Chenhao Wang
  • Yong Liu
  • Yan Yang
  • Wei Li

Graph Contrastive Learning (GCL) has attracted significant research attention due to its self-supervised ability to learn robust node representations. Unfortunately, most methods primarily focus on homophilic graphs, rendering them less effective for heterophilic graphs. In addition, the complexity of node interactions in heterophilic graphs poses considerable challenges to augmentation schemes, coding architectures, and contrastive designs for traditional GCL. In this work, we propose HeterGCL, a novel graph contrastive learning framework with structural and semantic learning to explore the true potential of GCL on heterophilic graphs. Specifically, we abandon the random augmentation scheme, which destroys the graph structure, and instead introduce an adaptive neighbor aggregation strategy (ANA) to extract topology-supervised signals from neighboring nodes at different distances and explore the structural information with an adaptive local-to-global contrastive loss. In the semantic learning module, we jointly consider the original nodes' features and the similarity between nodes in the latent feature space to explore hidden associations between nodes. Experimental results on homophilic and heterophilic graphs demonstrate that HeterGCL outperforms existing self-supervised and semi-supervised baselines across various downstream tasks.

AAAI Conference 2024 Conference Paper

High-Dimensional Analysis for Generalized Nonlinear Regression: From Asymptotics to Algorithm

  • Jian Li
  • Yong Liu
  • Weiping Wang

Overparameterization often leads to benign overfitting, where deep neural networks can be trained to overfit the training data but still generalize well on unseen data. However, a generalized asymptotic framework for nonlinear regression, along with its connections to conventional complexity notions, has been lacking. In this paper, we propose a generalized high-dimensional analysis for nonlinear regression models, including various nonlinear feature mapping methods and subsampling. Specifically, we first provide an implicit regularization parameter and asymptotic equivalents related to a classical complexity notion, i.e., the effective dimension. We then present a high-dimensional analysis for nonlinear ridge regression and extend it to ridgeless regression in the under-parameterized and over-parameterized regimes, respectively. We find that the limiting risks decrease with the effective dimension. Motivated by these theoretical findings, we propose an algorithm, namely RFRed, to improve generalization ability. Finally, we validate our theoretical findings and the proposed algorithm through several experiments.
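The effective dimension referenced above is commonly defined, for a spectrum {s_i} and regularization strength λ, as d_eff(λ) = Σ_i s_i / (s_i + λ) = tr(H(H + λI)⁻¹). A small numerical illustration with a made-up spectrum (the definition is the standard one from the kernel/ridge-regression literature, not necessarily the paper's exact variant):

```python
import numpy as np

# Effective dimension d_eff(lam) = sum_i s_i / (s_i + lam), where s_i are the
# eigenvalues of the feature/kernel matrix H. Larger regularization shrinks
# the count of "effective" directions. The spectrum below is made up.

def effective_dimension(eigvals, lam):
    eigvals = np.asarray(eigvals, dtype=float)
    return float(np.sum(eigvals / (eigvals + lam)))

spectrum = np.array([10.0, 1.0, 0.1, 0.01])
d_small_reg = effective_dimension(spectrum, lam=0.01)  # close to full dimension
d_large_reg = effective_dimension(spectrum, lam=10.0)  # heavily shrunk
```

This monotone dependence on λ is what lets an "implicit regularization parameter" stand in for model complexity in high-dimensional risk formulas.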

JMLR Journal 2024 Journal Article

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

  • Huayi Tang
  • Yong Liu

In this paper, we establish generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayes, covering both the random sampling and the random splitting setting. First, we show that the transductive generalization gap can be controlled by the mutual information between training label selection and the hypothesis. Next, we propose the concept of transductive supersample and use it to derive transductive information-theoretic bounds involving conditional mutual information and different information measures. We further establish transductive PAC-Bayesian bounds with weaker assumptions on the type of loss function and the number of training and test data points. Lastly, we use the theoretical results to derive upper bounds for adaptive optimization algorithms under the transductive learning setting. We also apply them to semi-supervised learning and transductive graph learning scenarios, meanwhile validating the derived bounds by experiments on synthetic and real-world datasets.

JMLR Journal 2024 Journal Article

Learning Discretized Neural Networks under Ricci Flow

  • Jun Chen
  • Hanwen Chen
  • Mengmeng Wang
  • Guang Dai
  • Ivor W. Tsang
  • Yong Liu

In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations, which suffer from either infinite or zero gradients due to the non-differentiable discrete function during training. Most training-based DNNs in such scenarios employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the use of STE introduces the problem of gradient mismatch, arising from perturbations in the approximated gradient. To address this problem, this paper reveals that this mismatch can be interpreted as a metric perturbation in a Riemannian manifold, viewed through the lens of duality theory. Building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs, providing a background for addressing perturbations. By introducing a partial differential equation on metrics, i.e., the Ricci flow, we establish the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation. In contrast to previous perturbation theories with convergence rates in fractional powers, the metric perturbation under the Ricci flow exhibits exponential decay in the LNE manifold. Experimental results across various datasets demonstrate that our method achieves superior and more stable performance for DNNs compared to other representative training-based methods.

AAAI Conference 2024 Conference Paper

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding

  • Wenjia Geng
  • Yong Liu
  • Lei Chen
  • Sujia Wang
  • Jie Zhou
  • Yansong Tang

Weakly Supervised Temporal Article Grounding (WSAG) is a challenging and practical task in video understanding. Specifically, given a video and a relevant article, whose sentences are at different semantic scales, WSAG aims to localize corresponding video segments for all “groundable” sentences. Compared to other grounding tasks, e.g., localizing one target segment with respect to a given sentence query, WSAG confronts an essential obstacle rooted in the intricate multi-scale information inherent within both textual and visual modalities. Existing methods overlook the modeling and alignment of such structured information present in multi-scale video segments and hierarchical textual content. To this end, we propose a Multi-Scale Video-Text Correspondence Learning (MVTCL) framework, which enhances the grounding performance in complex scenes by modeling multi-scale semantic correspondence both within and between modalities. Specifically, MVTCL initially aggregates video content spanning distinct temporal scales and leverages hierarchical textual relationships in both temporal and semantic dimensions via a semantic calibration module. Then a multi-scale contrastive learning module is introduced to generate more discriminative representations by selecting typical contexts and performing inter-video contrastive learning. Through the multi-scale semantic calibration architecture and supervision design, our method achieves new state-of-the-art performance on existing WSAG benchmarks.

AAAI Conference 2024 Conference Paper

RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction

  • Tanvi Verma
  • Linh Le Dinh
  • Nicholas Tan
  • Xinxing Xu
  • Chingyu Cheng
  • Yong Liu

Visual perimetry is an important eye examination that helps detect vision problems caused by ocular or neurological conditions. During the test, a patient's gaze is fixed at a specific location while light stimuli of varying intensities are presented in central and peripheral vision. Based on the patient's responses to the stimuli, the visual field mapping and sensitivity are determined. However, maintaining high levels of concentration throughout the test can be challenging for patients, leading to increased examination times and decreased accuracy. In this work, we present RLPeri, a reinforcement learning-based approach to optimize visual perimetry testing. By determining the optimal sequence of locations and initial stimulus values, we aim to reduce the examination time without compromising accuracy. Additionally, we incorporate reward shaping techniques to further improve the testing performance. To monitor the patient's responses over time during testing, we represent the test's state as a pair of 3D matrices. We apply two different convolutional kernels to extract spatial features across locations as well as features across different stimulus values for each location. Through experiments, we demonstrate that our approach results in a 10-20% reduction in examination time while maintaining the accuracy as compared to state-of-the-art methods. With the presented approach, we aim to make visual perimetry testing more efficient and patient-friendly, while still providing accurate results.

NeurIPS Conference 2024 Conference Paper

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

  • Haonan Lin
  • Yan Chen
  • Jiahao Wang
  • Wenbin An
  • Mengmeng Wang
  • Feng Tian
  • Yong Liu
  • Guang Dai

Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. These errors accumulate during the diffusion process, resulting in inferior content preservation and edit fidelity, especially with conditional inputs. We address these challenges by investigating the primary contributors to error accumulation in DDIM inversion and identify the singularity problem in traditional noise schedules as a key issue. To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. This schedule reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image. Our approach requires no additional retraining and is compatible with various existing editing methods. Experiments across eight editing tasks demonstrate the Logistic Schedule's superior performance in content preservation and edit fidelity compared to traditional noise schedules, highlighting its adaptability and effectiveness. The project page is available at https://lonelvino.github.io/SYE/.

ICLR Conference 2024 Conference Paper

Solving Homogeneous and Heterogeneous Cooperative Tasks with Greedy Sequential Execution

  • Shanqi Liu
  • Dong Xing
  • Pengjie Gu
  • Xinrun Wang
  • Bo An 0001
  • Yong Liu

Cooperative multi-agent reinforcement learning (MARL) is extensively used for solving complex cooperative tasks, and value decomposition methods are a prevalent approach for this domain. However, these methods have not been successful in addressing both homogeneous and heterogeneous tasks simultaneously, which is a crucial aspect for the practical application of cooperative agents. On one hand, value decomposition methods demonstrate superior performance in homogeneous tasks. Nevertheless, they tend to produce agents with similar policies, which is unsuitable for heterogeneous tasks. On the other hand, solutions based on personalized observation or assigned roles are well-suited for heterogeneous tasks. However, they often lead to a trade-off situation where the agent's performance in homogeneous scenarios is negatively affected due to the aggregation of distinct policies. An alternative approach is to adopt sequential execution policies, which offer a flexible form for learning both types of tasks. However, learning sequential execution policies poses challenges in terms of credit assignment, and the limited information about subsequently executed agents can lead to sub-optimal solutions, known as the relative over-generalization problem. To tackle these issues, this paper proposes Greedy Sequential Execution (GSE) as a solution to learn the optimal policy that covers both scenarios. In the proposed GSE framework, we introduce an individual utility function into the framework of value decomposition to consider the complex interactions between agents. This function is capable of representing both the homogeneous and heterogeneous optimal policies. Furthermore, we utilize the greedy marginal contribution calculated by the utility function as the credit value of the sequential execution policy to address the credit assignment and relative over-generalization problems. We evaluated GSE in both homogeneous and heterogeneous scenarios. The results demonstrate that GSE achieves significant improvement in performance across multiple domains, especially in scenarios involving both homogeneous and heterogeneous tasks.

NeurIPS Conference 2024 Conference Paper

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

  • Yuxuan Wang
  • Haixu Wu
  • Jiaxiang Dong
  • Guo Qin
  • Haoran Zhang
  • Yong Liu
  • Yunzhong Qiu
  • Jianmin Wang

Deep models have demonstrated remarkable performance in time series forecasting. However, due to the partially-observed nature of real-world applications, solely focusing on the target of interest, so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded into multiple variables, where the exogenous variables can provide valuable external information for endogenous variables. Thus, unlike well-established multivariate or univariate forecasting paradigms that either treat all the variables equally or ignore exogenous information, this paper focuses on a more practical setting: time series forecasting with exogenous variables. We propose a novel approach, TimeXer, to ingest external information to enhance the forecasting of endogenous variables. With deftly designed embedding layers, TimeXer empowers the canonical Transformer with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are used simultaneously. Moreover, global endogenous tokens are learned to effectively bridge the causal information underlying exogenous series into endogenous temporal patches. Experimentally, TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks and exhibits notable generality and scalability. Code is available at this repository: https://github.com/thuml/TimeXer.

IJCAI Conference 2024 Conference Paper

Towards Sharper Risk Bounds for Minimax Problems

  • Bowei Zhu
  • Shaojie Li
  • Yong Liu

Minimax problems have achieved success in machine learning, in settings such as adversarial training, robust optimization, and reinforcement learning. For theoretical analysis, current optimal excess risk bounds, which are composed of generalization error and optimization error, present 1/n-rates in strongly-convex-strongly-concave (SC-SC) settings. Existing studies mainly focus on minimax problems with specific algorithms for the optimization error, with only a few studies on generalization performance, which limits the attainment of better excess risk bounds. In this paper, we study the generalization bounds measured by the gradients of primal functions using uniform localized convergence. We obtain a sharper high-probability generalization error bound for nonconvex-strongly-concave (NC-SC) stochastic minimax problems. Furthermore, we provide dimension-independent results under the Polyak-Lojasiewicz condition for the outer layer. Based on our generalization error bound, we analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA). We derive better excess primal risk bounds under further reasonable assumptions, which, to the best of our knowledge, are n times faster than existing results for minimax problems.

NeurIPS Conference 2024 Conference Paper

Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

  • Ruifeng Ren
  • Yong Liu

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. With just a few demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we attempt to explore the ICL process in Transformers through a lens of representation learning. Initially, leveraging kernel methods, we figure out a dual model for one softmax attention layer. The ICL inference process of the attention layer aligns with the training procedure of its dual model, generating token representation predictions that are equivalent to the dual model's test outputs. We delve into the training process of this dual model from a representation learning standpoint and further derive a generalization error bound related to the quantity of demonstration tokens. Subsequently, we extend our theoretical conclusions to more complicated scenarios, including one Transformer layer and multiple attention layers. Furthermore, drawing inspiration from existing representation learning methods especially contrastive learning, we propose potential modifications for the attention layer. Finally, experiments are designed to support our findings.

TMLR Journal 2024 Journal Article

Understanding Fairness Surrogate Functions in Algorithmic Fairness

  • Wei Yao
  • Zhanke Zhou
  • Zhicong Li
  • Bo Han
  • Yong Liu

It has been observed that machine learning algorithms exhibit biased predictions against certain population groups. To mitigate such bias while achieving comparable accuracy, a promising approach is to introduce surrogate functions of the concerned fairness definition and solve a constrained optimization problem. However, it is intriguing in previous work that such fairness surrogate functions may yield unfair results and high instability. In this work, in order to deeply understand them, taking a widely used fairness definition—demographic parity—as an example, we show that there is a surrogate-fairness gap between the fairness definition and the fairness surrogate function. Also, the theoretical analysis and experimental results about the “gap” show that fairness and stability are affected by the points far from the decision boundary, which is the large margin points issue investigated in this paper. To address it, we propose the general sigmoid surrogate to simultaneously reduce both the surrogate-fairness gap and the variance, and offer a rigorous fairness and stability upper bound. Interestingly, the theory also provides insights into two important issues: dealing with the large margin points and obtaining a more balanced dataset are both beneficial to fairness and stability. Furthermore, we elaborate a novel and general algorithm called Balanced Surrogate, which iteratively reduces the “gap” to mitigate unfairness. Finally, we provide empirical evidence showing that our methods consistently improve fairness and stability while maintaining accuracy comparable to the baselines in three real-world datasets.

AAAI Conference 2024 Conference Paper

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

  • Jiaqi Liu
  • Kai Wu
  • Qiang Nie
  • Ying Chen
  • Bin-Bin Gao
  • Yong Liu
  • Jinbao Wang
  • Chengjie Wang

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant 'anomaly' model predictions using task-specific 'normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.

AAAI Conference 2024 Conference Paper

WaveNet: Tackling Non-stationary Graph Signals via Graph Spectral Wavelets

  • Zhirui Yang
  • Yulan Hu
  • Sheng Ouyang
  • Jingyu Liu
  • Shuqiang Wang
  • Xibo Ma
  • Wenhan Wang
  • Hanjing Su

In the existing spectral GNNs, polynomial-based methods occupy the mainstream in designing a filter through the Laplacian matrix. However, polynomial combinations factored by the Laplacian matrix naturally have limitations in message passing (e.g., over-smoothing). Furthermore, most existing spectral GNNs are based on polynomial bases, which struggle to capture the high-frequency parts of the graph spectral signal. Additionally, we find that even increasing the polynomial order does not change this situation, which means polynomial-based models have a natural deficiency when facing high-frequency signals. To tackle these problems, we propose WaveNet, which aims to effectively capture the high-frequency part of the graph spectral signal from the perspective of wavelet bases through reconstructing the message propagation matrix. We utilize Multi-Resolution Analysis (MRA) to model this question, and our proposed method can reconstruct arbitrary filters theoretically. We also conduct node classification experiments on real-world graph benchmarks and achieve superior performance on most datasets. Our code is available at https://github.com/Bufordyang/WaveNet

AAAI Conference 2023 Conference Paper

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-Realistic Style Transfer

  • Tianwei Lin
  • Honglin Lin
  • Fu Li
  • Dongliang He
  • Wenhao Wu
  • Meiling Wang
  • Xin Li
  • Yong Liu

Photo-realistic style transfer aims at migrating the artistic style from an exemplar style image to a content image, producing a result image without spatial distortions or unrealistic artifacts. Impressive results have been achieved by recent deep models. However, deep neural network based methods are too expensive to run in real-time. Meanwhile, bilateral grid based methods are much faster but still contain artifacts like overexposure. In this work, we propose the Adaptive ColorMLP (AdaCM), an effective and efficient framework for universal photo-realistic style transfer. First, we find the complex non-linear color mapping between input and target domain can be efficiently modeled by a small multi-layer perceptron (ColorMLP) model. Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair. Experimental results demonstrate that AdaCM can generate vivid and high-quality stylization results. Meanwhile, our AdaCM is ultrafast and can process a 4K resolution image in 6ms on one V100 GPU.

AAMAS Conference 2023 Conference Paper

Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning

  • Shanqi Liu
  • Yujing Hu
  • Runze Wu
  • Dong Xing
  • Yu Xiong
  • Changjie Fan
  • Kun Kuang
  • Yong Liu

Real-world cooperation often requires intensive coordination among agents simultaneously. This task has been extensively studied within the framework of cooperative multi-agent reinforcement learning (MARL), and value decomposition methods are among those cutting-edge solutions. However, traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve the tasks with non-monotonic returns. This hinders their application in generic scenarios. Recent methods tackle this problem from the perspective of implicit credit assignment by learning value functions with complete expressiveness or using additional structures to improve cooperation. However, they are either difficult to learn due to large joint action spaces or insufficient to capture the complicated interactions among agents which are essential to solving tasks with non-monotonic returns. Moreover, applications in real-world scenarios usually require policies to be interpretable, but interpretability is limited in the implicit credit assignment methods. To address these problems, we propose a novel explicit credit assignment method to address the non-monotonic problem. Our method, Adaptive Value decomposition with Greedy Marginal contribution (AVGM), is based on an adaptive value decomposition that learns the cooperative value of a group of dynamically changing agents. We first illustrate that the proposed value decomposition can consider the complicated interactions among agents and is feasible to learn in large-scale scenarios. Then, our method uses a greedy marginal contribution computed from the value decomposition as an individual credit to incentivize agents to learn the optimal cooperative policy. We further extend the module with an action encoder to guarantee the linear time complexity for computing the greedy marginal contribution. Experimental results demonstrate that our method achieves significant performance improvements in several non-monotonic domains. Besides, we showcase that our model maintains a good sense of interpretability and rationality. This suggests our model can be applied to scenarios with more realistic demands.

JBHI Journal 2023 Journal Article

Deep Manifold Harmonic Network With Dual Attention for Brain Disorder Classification

  • Xiaoqi Sheng
  • Jiazhou Chen
  • Yong Liu
  • Bin Hu
  • Hongmin Cai

Numerous studies have shown that accurate analysis of neurological disorders contributes to the early diagnosis of brain disorders and provides a window to diagnose psychiatric disorders due to brain atrophy. The emergence of geometric deep learning approaches provides a new way to characterize geometric variations on brain networks. However, brain network data suffer from high heterogeneity and noise. Consequently, geometric deep learning methods struggle to identify discriminative and clinically meaningful representations from complex brain networks, resulting in poor diagnostic accuracy. Hence, the primary challenge in the diagnosis of brain diseases is to enhance the identification of discriminative features. To this end, this paper presents a dual-attention deep manifold harmonic discrimination (DA-DMHD) method for early diagnosis of neurodegenerative diseases. Here, a low-dimensional manifold projection is first learned to comprehensively exploit the geometric features of the brain network. Further, attention blocks with discrimination are proposed to learn a representation, which facilitates learning of group-dependent discriminant matrices to guide downstream analysis of group-specific references. Our proposed DA-DMHD model is evaluated on two independent datasets, ADNI and ADHD-200. Experimental results demonstrate that the model can tackle the hard-to-capture challenge of heterogeneous brain network topological differences and obtain excellent classifying performance in both accuracy and robustness compared with several existing state-of-the-art methods.

TIST Journal 2023 Journal Article

Fast Real-Time Video Object Segmentation with a Tangled Memory Network

  • Jianbiao Mei
  • Mengmeng Wang
  • Yu Yang
  • Yanjun Li
  • Yong Liu

In this article, we present a fast real-time tangled memory network that segments the objects effectively and efficiently for semi-supervised video object segmentation (VOS). We propose a tangled reference encoder and a memory bank organization mechanism based on a state estimator to fully utilize the mask features and alleviate memory overhead and computational burden brought by the unlimited memory bank used in many memory-based methods. First, the tangled memory network exploits the mask features that uncover abundant object information like edges and contours but are not fully explored in existing methods. Specifically, a tangled two-stream reference encoder is designed to extract and fuse the features from both RGB frames and the predicted masks. Second, to indicate the quality of the predicted mask and feed back the online prediction state for organizing the memory bank, we devise a target state estimator to learn the IoU score between the predicted mask and ground truth. Moreover, to accelerate the forward process and avoid memory overflow, we use a memory bank of fixed size to store historical features by designing a new efficient memory bank organization mechanism based on the mask state score provided by the state estimator. We conduct comprehensive experiments on the public benchmarks DAVIS and YouTube-VOS, demonstrating that our method obtains competitive results while running at high speed (66 FPS on the DAVIS16-val set).

NeurIPS Conference 2023 Conference Paper

Koopa: Learning Non-stationary Time Series Dynamics with Koopman Predictors

  • Yong Liu
  • Chenyu Li
  • Jianmin Wang
  • Mingsheng Long

Real-world time series are characterized by intrinsic non-stationarity that poses a principal challenge for deep forecasting models. While previous models suffer from complicated series variations induced by changing temporal distribution, we tackle non-stationary time series with modern Koopman theory that fundamentally considers the underlying time-variant dynamics. Inspired by Koopman theory of portraying complex dynamical systems, we disentangle time-variant and time-invariant components from intricate non-stationary series by Fourier Filter and design Koopman Predictor to advance respective dynamics forward. Technically, we propose Koopa as a novel Koopman forecaster composed of stackable blocks that learn hierarchical dynamics. Koopa seeks measurement functions for Koopman embedding and utilizes Koopman operators as linear portraits of implicit transition. To cope with time-variant dynamics that exhibits strong locality, Koopa calculates context-aware operators in the temporal neighborhood and is able to utilize incoming ground truth to scale up forecast horizon. Besides, by integrating Koopman Predictors into deep residual structure, we ravel out the binding reconstruction loss in previous Koopman forecasters and achieve end-to-end forecasting objective optimization. Compared with the state-of-the-art model, Koopa achieves competitive performance while saving 77.3% training time and 76.0% memory.

AAAI Conference 2023 Conference Paper

MHCCL: Masked Hierarchical Cluster-Wise Contrastive Learning for Multivariate Time Series

  • Qianwen Meng
  • Hangwei Qian
  • Yong Liu
  • Lizhen Cui
  • Yonghui Xu
  • Zhiqi Shen

Learning semantic-rich representations from raw unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown its promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which leads to false negative pairs that share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained one reflects higher-level semantics, we propose a novel downward masking strategy to filter out fake negatives and supplement positives by incorporating the multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers of clusters at each partition to refine prototypes, which helps speed up the hierarchical clustering process and improves the clustering quality. We conduct experimental evaluations on seven widely-used multivariate time series datasets. The results demonstrate the superiority of MHCCL over the state-of-the-art approaches for unsupervised time series representation learning.

AAAI Conference 2023 Conference Paper

Next POI Recommendation with Dynamic Graph and Explicit Dependency

  • Feiyu Yin
  • Yong Liu
  • Zhiqi Shen
  • Lisi Chen
  • Shuo Shang
  • Peng Han

Next Point-Of-Interest (POI) recommendation plays an important role in various location-based services. Its main objective is to predict the user's next interested POI based on her previous check-in information. Most existing methods directly use users' historical check-in trajectories to construct various graphs to assist sequential models to complete this task. However, as users' check-in data is extremely sparse, it is difficult to capture the potential relations between POIs by directly using these check-in data. To this end, we propose the Sequence-based Neighbour search and Prediction Model (SNPM) for next POI recommendation. In SNPM, the RotatE knowledge graph embedding and Eigenmap methods are used to extract POI relationships implied in check-in data, and build the POI similarity graph. Then, we enhance the model's generalized representations of POIs' general features by aggregating similar POIs. As the context is typically rich and valuable when making next POI predictions, the sequence model selects which POIs to aggregate depending not only on the current state, but also on the previous POI sequence. Therefore, we construct a Sequence-based Dynamic Neighbor Graph (SDNG) to find the similarity neighbourhood and develop a Multi-Step Dependency Prediction model (MSDP) inspired by RotatE, which explicitly leverages information from previous states. We evaluate the proposed model on two real-world datasets, and the experimental results show that the proposed method significantly outperforms existing state-of-the-art POI recommendation methods.

JMLR Journal 2023 Journal Article

Optimal Convergence Rates for Distributed Nystroem Approximation

  • Jian Li
  • Yong Liu
  • Weiping Wang

The distributed kernel ridge regression (DKRR) has shown great potential in processing complicated tasks. However, DKRR only made use of the local samples and thus failed to capture the global characteristics. Besides, the existing optimal learning guarantees were provided in expectation and only pertain to the attainable case where the target regression lies exactly in the kernel space. In this paper, we propose distributed learning with globally-shared Nystroem centers (DNystroem), which utilizes global information across the local clients. We also study the statistical properties of DNystroem in expectation and in probability, respectively, and obtain several state-of-the-art results with the minimax optimal learning rates. Note that the optimal convergence rates for DNystroem pertain to the non-attainable case, while the statistical results allow more partitions and require fewer Nystroem centers. Finally, we conduct experiments on several real-world datasets to validate the effectiveness of the proposed algorithm, and the empirical results coincide with our theoretical findings.

NeurIPS Conference 2023 Conference Paper

Real3D-AD: A Dataset of Point Cloud Anomaly Detection

  • Jiaqi Liu
  • Guoyang Xie
  • Ruitao Chen
  • Xinpeng Li
  • Jinbao Wang
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng

High-precision point cloud anomaly detection is the gold standard for identifying the defects of advancing machining and precision manufacturing. Despite some methodological advances in this area, the scarcity of datasets and the lack of a systematic benchmark hinder its development. We introduce Real3D-AD, a challenging high-precision point cloud anomaly detection dataset, addressing the limitations in the field. With 1,254 high-resolution 3D items (from forty thousand to millions of points per item), Real3D-AD is the largest dataset for high-precision 3D industrial anomaly detection to date. Real3D-AD surpasses existing 3D anomaly detection datasets in terms of point cloud resolution (0.0010mm-0.0015mm), $360^{\circ}$ coverage and perfect prototype. Additionally, we present a comprehensive benchmark for Real3D-AD, revealing the absence of baseline methods for high-precision point cloud anomaly detection. To address this, we propose Reg3D-AD, a registration-based 3D anomaly detection method incorporating a novel feature memory bank that preserves local and global representations. Extensive experiments on the Real3D-AD dataset highlight the effectiveness of Reg3D-AD. For reproducibility and accessibility, we provide the Real3D-AD dataset, benchmark source code, and Reg3D-AD on our website: https://github.com/M-3LAB/Real3D-AD.

AAAI Conference 2023 Conference Paper

Revisiting Item Promotion in GNN-Based Collaborative Filtering: A Masked Targeted Topological Attack Perspective

  • Yongwei Wang
  • Yong Liu
  • Zhiqi Shen

Graph neural networks (GNN) based collaborative filtering (CF) has attracted increasing attention in e-commerce and financial marketing platforms. However, there is still a lack of efforts to evaluate the robustness of such CF systems in deployment. Fundamentally different from existing attacks, this work revisits the item promotion task and reformulates it from a targeted topological attack perspective for the first time. Specifically, we first develop a targeted attack formulation to maximally increase a target item's popularity. We then leverage gradient-based optimizations to find a solution. However, we observe the gradient estimates often appear noisy due to the discrete nature of a graph, which leads to a degradation of attack ability. To resolve noisy gradient effects, we then propose a masked attack objective that can remarkably enhance the topological attack ability. Furthermore, we design a computationally efficient approach to the proposed attack, thus making it feasible to evaluate large-scale CF systems. Experiments on two real-world datasets show the effectiveness of our attack in analyzing the robustness of GNN-based CF more practically.

AAAI Conference 2023 Conference Paper

Revisiting the Spatial and Temporal Modeling for Few-Shot Action Recognition

  • Jiazheng Xing
  • Mengmeng Wang
  • Yong Liu
  • Boyu Mu

Spatial and temporal modeling is one of the most core aspects of few-shot action recognition. Most previous works mainly focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. Actually, the former feature could bring rich local semantic information, and the latter feature could represent motion characteristics of adjacent frames, respectively. In this paper, we propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module to automatically search for the best combination of the low-level and high-level spatial features. Next, inspired by the recent transformer, we introduce a long-term temporal modeling module to model the global temporal relations based on the extracted spatial appearance features. Meanwhile, we design another short-term temporal modeling module to encode the motion characteristics between adjacent frame representations. After that, the final predictions can be obtained by feeding the embedded rich spatial-temporal features to a common frame-level class prototype matcher. We extensively validate the proposed SloshNet on four few-shot action recognition datasets, including Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable results against state-of-the-art methods in all datasets.

NeurIPS Conference 2023 Conference Paper

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

  • Zhuoyan Luo
  • Yicheng Xiao
  • Yong Liu
  • Shuyan Li
  • Yitong Wang
  • Yansong Tang
  • Xiu Li
  • Yujiu Yang

This paper studies referring video object segmentation (RVOS) by boosting video-level visual-linguistic alignment. Recent approaches model the RVOS task as a sequence prediction problem and perform multi-modal interaction as well as segmentation for each frame separately. However, the lack of a global view of video content leads to difficulties in effectively utilizing inter-frame relationships and understanding textual descriptions of object temporal variations. To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment. By associating a group of frame-level object embeddings with language tokens, SOC facilitates joint space learning across modalities and time steps. Moreover, we present multi-modal contrastive supervision to help construct well-aligned joint space at the video level. We conduct extensive experiments on popular RVOS benchmarks, and our method outperforms state-of-the-art competitors on all benchmarks by a remarkable margin. Besides, the emphasis on temporal coherence enhances the segmentation stability and adaptability of our method in processing text expressions with temporal variations. Code is available at https://github.com/RobertLuo1/NeurIPS2023_SOC.

NeurIPS Conference 2023 Conference Paper

SUBP: Soft Uniform Block Pruning for 1$\times$N Sparse CNNs Multithreading Acceleration

  • Jingyang Xiang
  • Siqi Li
  • Jun Chen
  • Guang Dai
  • Shipeng Bai
  • Yukai Ma
  • Yong Liu

The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1$\times$N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space saving by a \emph{Block Sparse Row} matrix. 2) Excellent performance at a high sparsity. 3) Significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1$\times$N sparse weights based on dense pre-trained weights, leading to problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach to train a uniform 1$\times$N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training, reduces the model redundancy and the risk of pruning the important blocks permanently but also achieves balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1$\times$N and structured sparsity methods based on pre-trained models or training from scratch. Source code and models are available at \url{https://github.com/JingyangXiang/SUBP}.

IJCAI Conference 2023 Conference Paper

Towards Sharp Analysis for Distributed Learning with Random Features

  • Jian Li
  • Yong Liu

In recent studies, the generalization properties for distributed learning and random features assumed the existence of the target concept over the hypothesis space. However, this strict condition is not applicable to the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via data-dependent generating strategy, and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.

AAAI Conference 2023 Conference Paper

Understanding the Generalization Performance of Spectral Clustering Algorithms

  • Shaojie Li
  • Sheng Ouyang
  • Yong Liu

The theoretical analysis of spectral clustering is mainly devoted to consistency, while there is little research on its generalization performance. In this paper, we study the excess risk bounds of the popular spectral clustering algorithms: relaxed RatioCut and relaxed NCut. Our analysis follows the two practical steps of spectral clustering algorithms: continuous solution and discrete solution. Firstly, we provide the convergence rate of the excess risk bounds between the empirical continuous optimal solution and the population-level continuous optimal solution. Secondly, we show the fundamental quantity influencing the excess risk between the empirical discrete optimal solution and the population-level discrete optimal solution. At the empirical level, algorithms can be designed to reduce this quantity. Based on our theoretical analysis, we propose two novel algorithms that can penalize this quantity and, additionally, can cluster the out-of-sample data without re-eigendecomposition on the overall samples. Numerical experiments on toy and real datasets confirm the effectiveness of our proposed algorithms.

JBHI Journal 2022 Journal Article

Deep Supervised Domain Adaptation for Pneumonia Diagnosis From Chest X-Ray Images

  • Yangqin Feng
  • Xinxing Xu
  • Yan Wang
  • Xiaofeng Lei
  • Soo Kng Teo
  • Jordan Zheng Ting Sim
  • Yonghan Ting
  • Liangli Zhen

Pneumonia is one of the most common treatable causes of death, and early diagnosis allows for early intervention. Automated diagnosis of pneumonia can therefore improve outcomes. However, it is challenging to develop high-performance deep learning models due to the lack of well-annotated data for training. This paper proposes a novel method, called Deep Supervised Domain Adaptation (DSDA), to automatically diagnose pneumonia from chest X-ray images. Specifically, we propose to transfer the knowledge from a publicly available large-scale source dataset (ChestX-ray14) to a well-annotated but small-scale target dataset (the TTSH dataset). DSDA aligns the distributions of the source domain and the target domain according to the underlying semantics of the training samples. It includes two task-specific sub-networks for the source domain and the target domain, respectively. These two sub-networks share the feature extraction layers and are trained in an end-to-end manner. Unlike most existing domain adaptation approaches that perform the same tasks in the source domain and the target domain, we attempt to transfer the knowledge from a multi-label classification task in the source domain to a binary classification task in the target domain. To evaluate the effectiveness of our method, we compare it with several existing peer methods. The experimental results show that our method can achieve promising performance for automated pneumonia diagnosis.

AAAI Conference 2022 Conference Paper

Distributed Randomized Sketching Kernel Learning

  • Rong Yin
  • Yong Liu
  • Dan Meng

We investigate the statistical and computational requirements for distributed kernel ridge regression with randomized sketching (DKRR-RS) and successfully achieve the optimal learning rates with only a fraction of computations. More precisely, the proposed DKRR-RS combines sparse randomized sketching, divide-and-conquer and KRR to scale up kernel methods, and derives the same learning rate as exact KRR in expectation while greatly reducing computational costs in the basic setting, outperforming previous state-of-the-art solutions. Then, to bridge the gap between theory and experiments, we derive the optimal learning rate in probability for DKRR-RS to reflect its generalization performance. Finally, to further improve the learning performance, we construct an efficient communication strategy for DKRR-RS and demonstrate the power of communication via theoretical assessment. Extensive experiments validate the effectiveness of DKRR-RS and the communication strategy on real datasets.

IJCAI Conference 2022 Conference Paper

Enhancing Sequential Recommendation with Graph Contrastive Learning

  • Yixin Zhang
  • Yong Liu
  • Yonghui Xu
  • Hao Xiong
  • Chenyi Lei
  • Wei He
  • Lizhen Cui
  • Chunyan Miao

Sequential recommendation systems capture users' dynamic behavior patterns to predict their next interaction behaviors. Most existing sequential recommendation methods only exploit the local context information of an individual interaction sequence and learn model parameters solely based on the item prediction loss. Thus, they usually fail to learn appropriate sequence representations. This paper proposes a novel recommendation framework, namely Graph Contrastive Learning for Sequential Recommendation (GCL4SR). Specifically, GCL4SR employs a Weighted Item Transition Graph (WITG), built based on the interaction sequences of all users, to provide global context information for each interaction and weaken the noise information in the sequence data. Moreover, GCL4SR uses subgraphs of WITG to augment the representation of each interaction sequence. Two auxiliary learning objectives have also been proposed to maximize the consistency between augmented representations induced by the same interaction sequence on WITG, and to minimize the difference between the representations augmented by the global context on WITG and the local representation of the original sequence. Extensive experiments on real-world datasets demonstrate that GCL4SR consistently outperforms state-of-the-art sequential recommendation methods.

NeurIPS Conference 2022 Conference Paper

Fine-Grained Analysis of Stability and Generalization for Modern Meta Learning Algorithms

  • Jiechao Guan
  • Yong Liu
  • Zhiwu Lu

The support/query episodic training strategy has been widely applied in modern meta learning algorithms. Supposing the $n$ training episodes and the test episodes are sampled independently from the same environment, previous work has derived a generalization bound of $O(1/\sqrt{n})$ for smooth non-convex functions via algorithmic stability analysis. In this paper, we provide fine-grained analysis of stability and generalization for modern meta learning algorithms by considering more general situations. Firstly, we develop matching lower and upper stability bounds for meta learning algorithms with two types of loss functions: (1) nonsmooth convex functions with $\alpha$-H{\"o}lder continuous subgradients $(\alpha \in [0, 1))$; (2) smooth (including convex and non-convex) functions. Our tight stability bounds show that, in the nonsmooth convex case, meta learning algorithms can be inherently less stable than in the smooth convex case. For the smooth non-convex functions, our stability bound is sharper than the existing one, especially in the setting where the number of iterations is larger than the number $n$ of training episodes. Secondly, we derive improved generalization bounds for meta learning algorithms that hold with high probability. Specifically, we first demonstrate that, under the independent episode environment assumption, the generalization bound of $O(1/\sqrt{n})$ via algorithmic stability analysis is near optimal. To attain faster convergence rate, we show how to yield a deformed generalization bound of $O(\ln{n}/n)$ with the curvature condition of loss functions. Finally, we obtain a generalization bound for meta learning with dependent episodes whose dependency relation is characterized by a graph. Experiments on regression problems are conducted to verify our theoretical results.

AAAI Conference 2022 Conference Paper

Go Wider Instead of Deeper

  • Fuzhao Xue
  • Ziji Shi
  • Futao Wei
  • Yuxuan Lou
  • Yong Liu
  • Yang You

More transformer blocks with residual connections have recently achieved impressive results on various tasks. To achieve better performance with fewer trainable parameters, recent methods propose to go shallower via parameter sharing or model compression along the depth. However, weak modeling capacity limits their performance. In contrast, going wider by introducing more trainable matrices and parameters would produce a huge model requiring advanced parallelism for training and inference. In this paper, we propose a parameter-efficient framework, going wider instead of deeper. Specifically, following existing works, we adopt parameter sharing to compress along depth. But such deployment would limit the performance. To maximize modeling capacity, we scale along model width by replacing the feed-forward network (FFN) with mixture-of-experts (MoE). Across transformer blocks, instead of sharing normalization layers, we propose to use individual layernorms to transform various semantic representations in a more parameter-efficient way. To evaluate our plug-and-play framework, we design WideNet and conduct comprehensive experiments on popular computer vision and natural language processing benchmarks. On ImageNet-1K, our best model outperforms Vision Transformer (ViT) by 1.5% with 0.72× trainable parameters. Using 0.46× and 0.13× parameters, our WideNet can still surpass ViT and ViT-MoE by 0.8% and 2.1%, respectively. On four natural language processing datasets, WideNet outperforms ALBERT by 1.8% on average and surpasses BERT using factorized embedding parameterization by 0.8% with fewer parameters.

AAAI Conference 2022 Conference Paper

Guide Local Feature Matching by Overlap Estimation

  • Ying Chen
  • Dihe Huang
  • Shang Xu
  • Jianlin Liu
  • Yong Liu

Local image feature matching under large appearance, viewpoint, and distance changes is challenging yet important. Conventional methods detect and match tentative local features across the whole images, with heuristic consistency checks to guarantee reliable matches. In this paper, we introduce a novel Overlap Estimation method conditioned on image pairs with TRansformer, named OETR, to constrain local feature matching in the commonly visible region. OETR performs overlap estimation in a two-step process of feature correlation and then overlap regression. As a preprocessing module, OETR can be plugged into any existing local feature detection and matching pipeline to mitigate potential view-angle or scale variance. Intensive experiments show that OETR can boost state-of-the-art local feature matching performance substantially, especially for image pairs with small shared regions. The code will be publicly available at https://github.com/AbyssGaze/OETR.

NeurIPS Conference 2022 Conference Paper

Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

  • Yong Liu
  • Haixu Wu
  • Jianmin Wang
  • Mingsheng Long

Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degenerate terribly on non-stationary real-world data in which the joint distribution changes over time. Previous studies primarily adopt stationarization to attenuate the non-stationarity of original series for better predictability. But the stationarized series deprived of inherent non-stationarity can be less instructive for real-world bursty events forecasting. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Concretely, Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability. To address the over-stationarization problem, De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from raw series. Our Non-stationary Transformers framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making them the state-of-the-art in time series forecasting. Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers.
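The Series Stationarization module described above amounts to per-window normalization with the removed statistics restored on the output. A toy NumPy sketch of that normalize/de-normalize wrapper might look as follows; the trivial "persistence" forecaster stands in for the Transformer, and De-stationary Attention is not modeled:

```python
import numpy as np

def stationarize(x, eps=1e-5):
    # Per-instance normalization: remove each window's own mean and std.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True) + eps
    return (x - mu) / sigma, mu, sigma

def destationarize(y, mu, sigma):
    # Restore the statistics that were removed from the input window.
    return y * sigma + mu

def forecast(x_norm, horizon):
    # Placeholder model: persistence, i.e. repeat the last observed value.
    return np.repeat(x_norm[..., -1:], horizon, axis=-1)

x = np.array([[10.0, 12.0, 14.0, 16.0]])   # window with non-zero mean
x_norm, mu, sigma = stationarize(x)
pred = destationarize(forecast(x_norm, 3), mu, sigma)
```

Because the same `mu` and `sigma` are used in both directions, the model only ever sees zero-mean, unit-variance inputs while its outputs land back on the original scale.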

NeurIPS Conference 2022 Conference Paper

Random Sharpness-Aware Minimization

  • Yong Liu
  • Siqi Mai
  • Minhao Cheng
  • Xiangning Chen
  • Cho-Jui Hsieh
  • Yang You

Sharpness-Aware Minimization (SAM) has recently been proposed to seek parameters that lie in a flat region to improve generalization when training neural networks. In particular, a minimax optimization objective is defined to find the maximum loss value centered on the weight, with the aim of simultaneously minimizing loss value and loss sharpness. For the sake of simplicity, SAM applies one-step gradient ascent to approximate the solution of the inner maximization. However, one-step gradient ascent may not be sufficient, and multi-step gradient ascent incurs additional training costs. Based on this observation, we propose a novel random-smoothing-based SAM (R-SAM) algorithm. To be specific, R-SAM essentially smooths the loss landscape, based on which we are able to apply one-step gradient ascent on the smoothed weights to improve the approximation of the inner maximization. Further, we evaluate our proposed R-SAM on CIFAR and ImageNet datasets. The experimental results illustrate that R-SAM can consistently improve the performance on ResNet and Vision Transformer (ViT) training.
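The one-step gradient ascent that SAM uses for the inner maximization can be sketched in a few lines of NumPy on a toy quadratic loss. This illustrates vanilla SAM only, not the random smoothing that R-SAM adds; the learning rate, radius `rho`, and quadratic loss are arbitrary illustrative choices:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    # One SAM update: one-step gradient ascent of radius rho approximates
    # the inner max, then we descend using the gradient at the perturbed point.
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # normalized ascent direction
    return w - lr * grad_fn(w + eps)

# Toy anisotropic quadratic: loss(w) = 0.5 * w^T A w, minimized at w = 0.
A = np.diag([4.0, 1.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
start = loss(w)
for _ in range(100):
    w = sam_step(w, grad)
final = loss(w)
```

Note that with a fixed `rho` the iterates hover at a distance of order `rho` from the minimizer rather than converging exactly, which is consistent with SAM optimizing the perturbed (flat-region) objective rather than the raw loss.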

NeurIPS Conference 2022 Conference Paper

Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means

  • Rong Yin
  • Yong Liu
  • Weiping Wang
  • Dan Meng

Kernel $k$-means is arguably one of the most common approaches to clustering. In this paper, we investigate the efficiency of kernel $k$-means combined with randomized sketches in terms of both statistical analysis and computational requirements. More precisely, we propose a unified randomized sketches framework to kernel $k$-means and investigate its excess risk bounds, obtaining the state-of-the-art risk bound with only a fraction of computations. Indeed, we prove that it suffices to choose the sketch dimension $\Omega(\sqrt{n})$ to obtain the same accuracy of exact kernel $k$-means with greatly reducing the computational costs, for sub-Gaussian sketches, the randomized orthogonal system (ROS) sketches, and Nystr\"{o}m kernel $k$-means, where $n$ is the number of samples. To the best of our knowledge, this is the first result of this kind for unsupervised learning. Finally, the numerical experiments on simulated data and real-world datasets validate our theoretical analysis.

JMLR Journal 2022 Journal Article

Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs

  • Kaichao You
  • Yong Liu
  • Ziyang Zhang
  • Jianmin Wang
  • Michael I. Jordan
  • Mingsheng Long

Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain under-exploited---practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to full exploitation of pre-trained model hubs: first, the PTM selection by popularity has no optimality guarantee, and second, only one PTM is used while the remaining PTMs are ignored. An alternative might be to consider all possible combinations of PTMs and extensively fine-tune each combination, but this would not only be prohibitive computationally but may also lead to statistical over-fitting. In this paper, we propose a new paradigm for exploiting model hubs that is intermediate between these extremes. The paradigm is characterized by two aspects: (1) We use an evidence maximization procedure to estimate the maximum value of label evidence given features extracted by pre-trained models. This procedure can rank all the PTMs in a model hub for various types of PTMs and tasks before fine-tuning. (2) The best ranked PTM can either be fine-tuned and deployed if we have no preference for the model's architecture or the target PTM can be tuned by the top $K$ ranked PTMs via a Bayesian procedure that we propose. This procedure, which we refer to as B-Tuning, not only improves upon specialized methods designed for tuning homogeneous PTMs, but also applies to the challenging problem of tuning heterogeneous PTMs where it yields a new level of benchmark performance.

IJCAI Conference 2022 Conference Paper

Ridgeless Regression with Random Features

  • Jian Li
  • Yong Liu
  • Yingying Zhang

Recent theoretical studies illustrated that kernel ridgeless regression can guarantee good generalization ability without an explicit regularization. In this paper, we investigate the statistical properties of ridgeless regression with random features and stochastic gradient descent. We explore the effects of factors in the stochastic gradient and random features, respectively. Specifically, the random features error exhibits the double-descent curve. Motivated by the theoretical findings, we propose a tunable kernel algorithm that optimizes the spectral density of the kernel during training. Our work bridges interpolation theory and practical algorithms.

AAAI Conference 2022 Conference Paper

SCSNet: An Efficient Paradigm for Learning Simultaneously Image Colorization and Super-resolution

  • Jiangning Zhang
  • Chao Xu
  • Jian Li
  • Yue Han
  • Yabiao Wang
  • Ying Tai
  • Yong Liu

In the practical application of restoring low-resolution grayscale images, we generally need to run three separate processes of image colorization, super-resolution, and downsampling for the target device. However, this pipeline is redundant and inefficient because the processes are independent, even though some inner features could have been shared. Therefore, we present an efficient paradigm to perform Simultaneously Image Colorization and Super-resolution (SCS) and propose an end-to-end SCSNet to achieve this goal. The proposed method consists of two parts: a colorization branch for learning color information, which employs the proposed plug-and-play Pyramid Valve Cross Attention (PVCAttn) module to aggregate feature maps between source and reference images; and a super-resolution branch for integrating color and texture information to predict target images, which uses the designed Continuous Pixel Mapping (CPM) module to predict high-resolution images at continuous magnification. Furthermore, our SCSNet supports both automatic and referential modes, which is more flexible for practical application. Abundant experiments demonstrate the superiority of our method for generating authentic images over state-of-the-art methods, e.g., decreasing FID by 1.8↓ and 5.1↓ on average compared with current best scores for automatic and referential modes, respectively, while owning fewer parameters (more than ×2↓) and faster running speed (more than ×3↑).

NeurIPS Conference 2022 Conference Paper

SoftPatch: Unsupervised Anomaly Detection with Noisy Data

  • Xi Jiang
  • Jianlin Liu
  • Jinbao Wang
  • Qiang Nie
  • Kai Wu
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng

Although mainstream unsupervised anomaly detection (AD) algorithms perform well in academic datasets, their performance is limited in practical application due to the ideal experimental setting of clean training data. Training with noisy data is an inevitable problem in real-world anomaly detection but is seldom discussed. This paper considers label-level noise in image sensory anomaly detection for the first time. To solve this problem, we propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Noise discriminators are utilized to generate outlier scores for patch-level noise elimination before coreset construction. The scores are then stored in the memory bank to soften the anomaly detection boundary. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset. Comprehensive experiments in various noise scenes demonstrate that SoftPatch outperforms the state-of-the-art AD methods on the MVTecAD and BTAD benchmarks and is comparable to those methods under the setting without noise.

NeurIPS Conference 2022 Conference Paper

Stability and Generalization of Kernel Clustering: from Single Kernel to Multiple Kernel

  • Weixuan Liang
  • Xinwang Liu
  • Yong Liu
  • Sihang Zhou
  • Jun-Jie Huang
  • Siwei Wang
  • Jiyuan Liu
  • Yi Zhang

Multiple kernel clustering (MKC) is an important research topic that has been widely studied for decades. However, current methods still face two problems: inefficiency when handling out-of-sample data points and a lack of theoretical study of the stability and generalization of clustering. In this paper, we propose a novel method that can efficiently compute the embedding of out-of-sample data with a solid generalization guarantee. Specifically, we approximate the eigenfunctions of the integral operator associated with the linear combination of base kernel functions to construct low-dimensional embeddings of out-of-sample points for efficient multiple kernel clustering. In addition, we, for the first time, theoretically study the stability of clustering algorithms and prove that the single-view version of the proposed method has uniform stability as $\mathcal{O}\left(Kn^{-3/2}\right)$ and establish an upper bound of excess risk as $\widetilde{\mathcal{O}}\left(Kn^{-3/2}+n^{-1/2}\right)$, where $K$ is the cluster number and $n$ is the number of samples. We then extend the theoretical results to multiple kernel scenarios and find that the stability of MKC depends on kernel weights. As an example, we apply our method to a novel MKC algorithm termed SimpleMKKM and derive the upper bound of its excess clustering risk, which is tighter than the current results. Extensive experimental results validate the effectiveness and efficiency of the proposed method.

AAAI Conference 2021 Conference Paper

A Hybrid Bandit Framework for Diversified Recommendation

  • Qinxu Ding
  • Yong Liu
  • Chunyan Miao
  • Fei Cheng
  • Haihong Tang

Interactive recommender systems involve users in the recommendation procedure by receiving timely user feedback to update the recommendation policy. Therefore, they are widely used in real application scenarios. Previous interactive recommendation methods primarily focus on learning users’ personalized preferences on the relevance properties of an item set. However, the investigation of users’ personalized preferences on the diversity properties of an item set is usually ignored. To overcome this problem, we propose the Linear Modular Dispersion Bandit (LMDB) framework, which is an online learning setting for optimizing a combination of modular functions and dispersion functions. Specifically, LMDB employs modular functions to model the relevance properties of each item, and dispersion functions to describe the diversity properties of an item set. Moreover, we also develop a learning algorithm, called Linear Modular Dispersion Hybrid (LMDH), to solve the LMDB problem and derive a gap-free bound on its n-step regret. Extensive experiments on real datasets are performed to demonstrate the effectiveness of the proposed LMDB framework in balancing recommendation accuracy and diversity.

NeurIPS Conference 2021 Conference Paper

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model

  • Jiangning Zhang
  • Chao Xu
  • Jian Li
  • Wenzhou Chen
  • Yabiao Wang
  • Ying Tai
  • Shuo Chen
  • Chengjie Wang

Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation. Analogous to the dynamic local population in EA, we improve the existing transformer structure and propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly. Moreover, we introduce the space-filling curve into the current vision transformer to sequence image data into a uniform sequential format. Thus we can design a unified EAT framework to address multi-modal tasks, separating the network architecture from the data format adaptation. Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works while having fewer parameters and greater throughput. We further conduct multi-modal tasks to demonstrate the superiority of the unified EAT, e.g., Text-Based Image Retrieval, and our approach improves the rank-1 by +3.7 points over the baseline on the CSS dataset.

AAAI Conference 2021 Conference Paper

FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion

  • Lina Liu
  • Xibin Song
  • Xiaoyang Lyu
  • Junwei Diao
  • Mengmeng Wang
  • Yong Liu
  • Liangjun Zhang

Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate depth completion as a one-stage end-to-end learning task, which outputs dense depth maps directly. However, the feature extraction and supervision in one-stage frameworks are insufficient, limiting the performance of these approaches. To address this problem, we propose a novel end-to-end residual learning framework, which formulates depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage. First, a coarse dense depth map is obtained by a simple CNN framework. Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with a coarse depth map and color image as input. Specifically, in the coarse-to-fine stage, a channel shuffle extraction operation is utilized to extract more representative features from the color image and coarse depth map, and an energy-based fusion operation is exploited to effectively fuse these features obtained by the channel shuffle operation, thus leading to more accurate and refined depth maps. We achieve SoTA performance in RMSE on the KITTI benchmark. Extensive experiments on other datasets further demonstrate the superiority of our approach over current state-of-the-art depth completion approaches.

AAAI Conference 2021 Conference Paper

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

  • Xiaoyang Lyu
  • Liang Liu
  • Mengmeng Wang
  • Xin Kong
  • Lina Liu
  • Yong Liu
  • Xinxin Chen
  • Yi Yuan

Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although people try to use high-resolution images for depth estimation, the accuracy of prediction has not been significantly improved. In this work, we find the core reason comes from the inaccurate depth estimation in large gradient regions, making the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) redesign the skip-connection in DepthNet to get better high-resolution features and (2) propose a feature fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the least parameters at both high and low resolution. Moreover, previous SoTA methods are based on fairly complex and deep networks with many parameters which limits their real applications. Thus we also construct a lightweight network which uses MobileNetV3 as encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All codes and models will be available at https://github.com/shawLyu/HR-Depth.

NeurIPS Conference 2021 Conference Paper

Improved Learning Rates of a Functional Lasso-type SVM with Sparse Multi-Kernel Representation

  • Shaogao Lv
  • Junhui Wang
  • Jiankun Liu
  • Yong Liu

In this paper, we provide theoretical results of estimation bounds and excess risk upper bounds for support vector machine (SVM) with sparse multi-kernel representation. These convergence rates for multi-kernel SVM are established by analyzing a Lasso-type regularized learning scheme within composite multi-kernel spaces. It is shown that the oracle rates of convergence of classifiers depend on the complexity of multi-kernels, the sparsity, a Bernstein condition and the sample size, which significantly improves on previous results even for the additive or linear cases. In summary, this paper not only provides unified theoretical results for multi-kernel SVMs, but also enriches the literature on high-dimensional nonparametric classification.

AAAI Conference 2021 Conference Paper

Keyword-Guided Neural Conversational Model

  • Peixiang Zhong
  • Yong Liu
  • Hao Wang
  • Chunyan Miao

We study the problem of imposing conversational goals/keywords on open-domain conversational agents, where the agent is required to lead the conversation to a target keyword smoothly and fast. Solving this problem enables the application of conversational agents in many real-world scenarios, e.g., recommendation and psychotherapy. The dominant paradigm for tackling this problem is to 1) train a next-turn keyword classifier, and 2) train a keyword-augmented response retrieval model. However, existing approaches in this paradigm have two limitations: 1) the training and evaluation datasets for next-turn keyword classification are directly extracted from conversations without human annotations, thus, they are noisy and have low correlation with human judgements, and 2) during keyword transition, the agents solely rely on the similarities between word embeddings to move closer to the target keyword, which may not reflect how humans converse. In this paper, we assume that human conversations are grounded on commonsense and propose a keyword-guided neural conversational model that can leverage external commonsense knowledge graphs (CKG) for both keyword transition and response retrieval. Automatic evaluations suggest that commonsense improves the performance of both next-turn keyword prediction and keyword-augmented response retrieval. In addition, both self-play and human evaluations show that our model produces responses with smoother keyword transition and reaches the target keyword faster than competitive baselines.

IJCAI Conference 2021 Conference Paper

Medical Image Segmentation using Squeeze-and-Expansion Transformers

  • Shaohua Li
  • Xiuchao Sui
  • Xiangde Luo
  • Xinxing Xu
  • Yong Liu
  • Rick Goh

Medical image segmentation is important for computer-aided diagnosis. Good segmentation demands the model to see the big picture and fine details simultaneously, i.e., to learn image features that incorporate large context while keeping high spatial resolutions. To approach this goal, the most widely used methods, U-Net and its variants, extract and fuse multi-scale features. However, the fused features still have small "effective receptive fields" with a focus on local image cues, limiting their performance. In this work, we propose Segtran, an alternative segmentation framework based on transformers, which have unlimited "effective receptive fields" even at high feature resolutions. The core of Segtran is a novel Squeeze-and-Expansion transformer: a squeezed attention block regularizes the self-attention of transformers, and an expansion block learns diversified representations. Additionally, we propose a new positional encoding scheme for transformers, imposing a continuity inductive bias for images. Experiments were performed on 2D and 3D medical image segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Compared with representative existing methods, Segtran consistently achieved the highest segmentation accuracy and exhibited good cross-domain generalization capabilities.

AAAI Conference 2021 Conference Paper

One-shot Face Reenactment Using Appearance Adaptive Normalization

  • Guangming Yao
  • Yi Yuan
  • Tianjia Shao
  • Shuang Li
  • Shanqi Liu
  • Yong Liu
  • Mengmeng Wang
  • Kun Zhou

The paper proposes a novel generative adversarial network for one-shot face reenactment, which can animate a single face image to a different pose-and-expression (provided by a driving image) while keeping its original appearance. The core of our network is a novel mechanism called appearance adaptive normalization, which can effectively integrate the appearance information from the input image into our face generator by modulating the feature maps of the generator using the learned adaptive parameters. Furthermore, we specially design a local net to reenact the local facial components (i.e., eyes, nose and mouth) first, which is a much easier task for the network to learn and can in turn provide explicit anchors to guide our face generator to learn the global appearance and pose-and-expression. Extensive quantitative and qualitative experiments demonstrate the significant efficacy of our model compared with prior one-shot methods.

NeurIPS Conference 2021 Conference Paper

Refined Learning Bounds for Kernel and Approximate $k$-Means

  • Yong Liu

Kernel $k$-means is one of the most popular approaches to clustering and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order $\mathcal{O}(k/\sqrt{n})$, which do not match with the stated lower bound $\Omega(\sqrt{k/n})$ in terms of $k$, where $k$ is the number of clusters and $n$ is the size of the training set. In this paper, we study the statistical properties of kernel $k$-means and Nystr\"{o}m-based kernel $k$-means, and obtain optimal clustering risk bounds, which improve the existing risk bounds. Particularly, based on a refined upper bound of Rademacher complexity [21], we first derive an optimal risk bound of rate $\mathcal{O}(\sqrt{k/n})$ for empirical risk minimizer (ERM), and further extend it to general cases beyond ERM. Then, we analyze the statistical effect of computational approximations of Nystr\"{o}m kernel $k$-means, and prove that it achieves the same statistical accuracy as the original kernel $k$-means considering only $\Omega(\sqrt{nk})$ Nystr\"{o}m landmark points. We further relax the restriction of landmark points from $\Omega(\sqrt{nk})$ to $\Omega(\sqrt{n})$ under a mild condition. Finally, we validate the theoretical findings via numerical experiments.
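The Nyström approximation analyzed above can be sketched briefly: with $m$ landmark points, Euclidean $k$-means on the features $Z = CW^{-1/2}$ (where $C$ is the $n \times m$ cross-kernel and $W$ the $m \times m$ landmark kernel) stands in for kernel $k$-means on the full $n \times n$ kernel matrix, since $ZZ^\top \approx K$. A minimal NumPy illustration, with an RBF kernel, a truncated pseudo-inverse for numerical stability, and illustrative sizes not taken from the paper:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Gaussian kernel matrix between two point sets.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_features(X, L, gamma=0.5):
    # K ≈ C W^+ C^T, so Euclidean k-means on Z = C W^{-1/2} approximates
    # kernel k-means on the full n x n kernel matrix.
    C = rbf(X, L, gamma)                 # n x m cross-kernel
    W = rbf(L, L, gamma)                 # m x m landmark kernel
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-8                   # truncated pseudo-inverse square root
    return C @ (vecs[:, keep] * vals[keep] ** -0.5) @ vecs[:, keep].T

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.05, (30, 2)), rng.normal(4, 0.05, (30, 2))])
Z = nystrom_features(X, X[::7])          # 9 landmarks ~ sqrt(n), both clusters
err = np.abs(Z @ Z.T - rbf(X, X)).max()  # approximation error of the kernel
```

The point of the paper's analysis is that roughly $\Omega(\sqrt{nk})$ (or, under a mild condition, $\Omega(\sqrt{n})$) such landmarks already preserve the statistical accuracy of exact kernel $k$-means.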

NeurIPS Conference 2021 Conference Paper

Searching Parameterized AP Loss for Object Detection

  • Tao Chenxin
  • Zizhang Li
  • Xizhou Zhu
  • Gao Huang
  • Yong Liu
  • Jifeng Dai

Loss functions play an important role in training deep-network-based object detectors. The most widely used evaluation metric for object detection is Average Precision (AP), which captures the performance of the localization and classification sub-tasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object detectors adopt separate differentiable losses for the two sub-tasks. Such a misalignment may well lead to performance degradation. To address this, existing works seek to design surrogate losses for the AP metric manually, which requires expertise and may still be sub-optimal. In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation. Different AP approximations are thus represented by a family of parameterized functions in a unified formula. An automatic parameter search algorithm is then employed to find the optimal parameters. Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses. Code shall be released.

AAAI Conference 2021 Conference Paper

Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation

  • Jilin Tang
  • Yi Yuan
  • Tianjia Shao
  • Yong Liu
  • Mengmeng Wang
  • Kun Zhou

In this paper we tackle the problem of pose-guided person image generation, which aims to transfer a person image from the source pose to a novel target pose while maintaining the source appearance. Given the inefficiency of standard CNNs in handling large spatial transformations, we propose a structure-aware flow-based method for high-quality person image generation. Specifically, instead of learning the complex overall pose changes of the human body, we decompose the human body into different semantic parts (e.g., head, torso, and legs) and apply different networks to predict the flow fields for these parts separately. Moreover, we carefully design the network modules to effectively capture the local and global semantic correlations of features within and among the human parts, respectively. Extensive experimental results show that our method can generate high-quality results under large pose discrepancies and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.

NeurIPS Conference 2021 Conference Paper

Towards Sharper Generalization Bounds for Structured Prediction

  • Shaojie Li
  • Yong Liu

In this paper, we investigate the generalization performance of structured prediction learning and obtain state-of-the-art generalization bounds. Our analysis is based on factor graph decomposition of structured prediction algorithms, and we present novel margin guarantees from three different perspectives: Lipschitz continuity, smoothness, and a space capacity condition. In the Lipschitz continuity scenario, we improve the square-root dependency on the label set cardinality of existing bounds to a logarithmic dependence. In the smoothness scenario, we provide generalization bounds that not only depend logarithmically on the label set cardinality but also enjoy a faster convergence rate of order $\mathcal{O}(\frac{1}{n})$ in the sample size $n$. In the space capacity scenario, we obtain bounds that do not depend on the label set cardinality and have convergence rates faster than $\mathcal{O}(\frac{1}{\sqrt{n}})$. In each scenario, applications are provided to show that these conditions are easy to satisfy.

AAAI Conference 2020 Conference Paper

Automated Spectral Kernel Learning

  • Jian Li
  • Yong Liu
  • Weiping Wang

The generalization performance of kernel methods is largely determined by the kernel, but spectral representations of stationary kernels are both input-independent and output-independent, which limits their application to complicated tasks. In this paper, we propose an efficient learning framework that integrates the search for suitable kernels with model training. Using non-stationary spectral kernels and backpropagation w.r.t. the objective, we obtain favorable spectral representations that depend on both inputs and outputs. Further, based on Rademacher complexity, we derive data-dependent generalization error bounds, in which we investigate the effect of these factors and introduce regularization terms to improve the performance. Extensive experimental results validate the effectiveness of the proposed algorithm and coincide with our theoretical findings.

IJCAI Conference 2020 Conference Paper

Contextualized Point-of-Interest Recommendation

  • Peng Han
  • Zhongxiao Li
  • Yong Liu
  • Peilin Zhao
  • Jing Li
  • Hao Wang
  • Shuo Shang

Point-of-interest (POI) recommendation has become an increasingly important sub-field of recommender system research. Previous methods employ various assumptions to exploit contextual information for improving recommendation accuracy. The common property among them is that similar users are more likely to visit similar POIs, and similar POIs tend to be visited by the same user. However, none of the existing methods utilizes similarity explicitly to make recommendations. In this paper, we propose a new framework for POI recommendation that explicitly utilizes similarity together with contextual information. Specifically, we categorize the context information into two groups, i.e., global and local context, and develop different regularization terms to incorporate them into the recommendation. A graph Laplacian regularization term is utilized to exploit the global context information. Moreover, we cluster users into different groups and let the objective function constrain the users in the same group to have similar predicted POI ratings. An alternating optimization method is developed to optimize our model and obtain the final rating matrix. The results of our experiments show that our algorithm outperforms all the state-of-the-art methods.
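The graph Laplacian regularization term mentioned in this abstract is a standard construction; a minimal sketch (variable names are my own, not the paper's) scores a predicted rating matrix by how smoothly it varies over a user-similarity graph:

```python
import numpy as np

def laplacian_penalty(W, F):
    # W: symmetric nonnegative user-similarity matrix (the global context graph).
    # F: predicted rating matrix, one row of POI ratings per user.
    # Penalty = 0.5 * sum_ij W_ij * ||F_i - F_j||^2 = trace(F^T L F),
    # so similar users are pushed toward similar predicted ratings.
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian L = D - W
    return np.trace(F.T @ L @ F)
```

Adding this penalty to the recommendation objective is what couples the rating predictions of users the graph considers similar.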

AAAI Conference 2020 Conference Paper

Diversified Interactive Recommendation with Implicit Feedback

  • Yong Liu
  • Yingtai Xiao
  • Qiong Wu
  • Chunyan Miao
  • Juyong Zhang
  • Binqiang Zhao
  • Haihong Tang

Interactive recommender systems, which enable interactions between users and the recommender system, have attracted increasing research attention. Previous methods mainly focus on optimizing recommendation accuracy. However, they usually ignore the diversity of the recommendation results, often leading to unsatisfying user experiences. In this paper, we propose a novel diversified recommendation model, named Diversified Contextual Combinatorial Bandit (DC2B), for interactive recommendation with users' implicit feedback. Specifically, DC2B employs a determinantal point process in the recommendation procedure to promote diversity of the recommendation results. To learn the model parameters, a Thompson-sampling-type algorithm based on variational Bayesian inference is proposed. In addition, a theoretical regret analysis is provided to guarantee the performance of DC2B. Extensive experiments on real datasets demonstrate the effectiveness of the proposed method in balancing recommendation accuracy and diversity.

AAAI Conference 2020 Conference Paper

Divide-and-Conquer Learning with Nyström: Optimal Rate and Algorithm

  • Rong Yin
  • Yong Liu
  • Lijing Lu
  • Weiping Wang
  • Dan Meng

Kernel Regularized Least Squares (KRLS) is a fundamental learner in machine learning. However, due to its high time and space requirements, it cannot scale to large-scale scenarios. We therefore propose DC-NY, a novel algorithm that combines the divide-and-conquer method, Nyström approximation, conjugate gradient, and preconditioning to scale up KRLS; it retains the accuracy of exact KRLS while achieving the minimum time and space complexity among state-of-the-art approximate KRLS estimators. We present a theoretical analysis of DC-NY, including a novel error decomposition with optimal statistical accuracy guarantees. Extensive experimental results on several real-world large-scale datasets containing up to 1M data points show that DC-NY significantly outperforms the state-of-the-art approximate KRLS estimators.

AAAI Conference 2020 Conference Paper

FDN: Feature Decoupling Network for Head Pose Estimation

  • Hao Zhang
  • Mengmeng Wang
  • Yong Liu
  • Yi Yuan

Head pose estimation from RGB images without depth information is a challenging task due to the loss of spatial information as well as large head pose variations in the wild. The performance of existing landmark-free methods remains unsatisfactory, as the quality of the estimated pose is inferior. In this paper, we propose a novel three-branch network architecture, termed Feature Decoupling Network (FDN), a more powerful architecture for landmark-free head pose estimation from a single RGB image. In FDN, we first propose a feature decoupling (FD) module to explicitly learn discriminative features for each pose angle by adaptively recalibrating its channel-wise responses. Besides, we introduce a cross-category center (CCC) loss to constrain the distribution of the latent variable subspaces, so that we can obtain more compact and distinct subspaces. Extensive experiments on both in-the-wild and controlled-environment datasets demonstrate that the proposed method outperforms other state-of-the-art methods based on a single RGB image and performs on par with approaches based on multimodal input resources.

AAAI Conference 2020 Conference Paper

From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning

  • Weixun Wang
  • Tianpei Yang
  • Yong Liu
  • Jianye Hao
  • Xiaotian Hao
  • Yujing Hu
  • Yingfeng Chen
  • Changjie Fan

A lot of effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, this remains challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of the state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) approach that solves large-scale problems by starting from learning on a multiagent scenario with a small number of agents and progressively increasing that number. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their input sizes are fixed. We therefore design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of the three transfer mechanisms across curricula through extensive simulations.

IJCAI Conference 2020 Conference Paper

Learning Personalized Itemset Mapping for Cross-Domain Recommendation

  • Yinan Zhang
  • Yong Liu
  • Peng Han
  • Chunyan Miao
  • Lizhen Cui
  • Baoli Li
  • Haihong Tang

Cross-domain recommendation methods usually transfer knowledge across different domains implicitly, by sharing model parameters or learning parameter mappings in the latent space. Differing from previous studies, this paper focuses on learning an explicit mapping between a user's behaviors (i.e., interaction itemsets) in different domains during the same temporal period. In this paper, we propose a novel deep cross-domain recommendation model, called Cycle Generation Networks (CGN). Specifically, CGN employs two generators to construct the dual-direction personalized itemset mapping between a user's behaviors in two different domains over time. The generators are learned by optimizing the distance between the generated itemset and the real interacted itemset, as well as a cycle-consistency loss defined on the dual-direction generation procedure. We have performed extensive experiments on real datasets to demonstrate the effectiveness of the proposed model compared with existing single-domain and cross-domain recommendation methods.

AAAI Conference 2020 Conference Paper

Multi-Agent Game Abstraction via Graph Attention Neural Network

  • Yong Liu
  • Weixun Wang
  • Yujing Hu
  • Jianye Hao
  • Xingguo Chen
  • Yang Gao

In large-scale multi-agent systems, the large number of agents and complex game relationships cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with others all the time. Traditional methods attempt to use pre-defined rules to capture the interaction relationships between agents. However, these methods cannot be directly used in a large-scale environment due to the difficulty of encoding the complex interactions between agents as rules. In this paper, we model the relationships between agents by a complete graph and propose a novel game abstraction mechanism based on a two-stage attention network (G2ANet), which can indicate whether there is an interaction between two agents and how important that interaction is. We integrate this detection mechanism into graph-neural-network-based multi-agent reinforcement learning for conducting game abstraction, and propose two novel learning algorithms, GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results indicate that the proposed methods can simplify the learning process while achieving better asymptotic performance compared with state-of-the-art algorithms.

AAAI Conference 2020 Conference Paper

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

  • Xianfang Zeng
  • Yusu Pan
  • Mengmeng Wang
  • Jiangning Zhang
  • Yong Liu

Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmarks or boundaries. To alleviate the demand for manual annotations, in this paper we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for video-specific identity and the other for various poses. Inspired by that, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator allows for enhancing fine details and overall realism. It leverages the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the superior quality of the reenacted images and the flexibility of transferring facial movements between identities.

AAAI Conference 2020 Conference Paper

RoboCoDraw: Robotic Avatar Drawing with GAN-Based Style Transfer and Time-Efficient Path Optimization

  • Tianying Wang
  • Wei Qi Toh
  • Hao Zhang
  • Xiuchao Sui
  • Shaohua Li
  • Yong Liu
  • Wei Jing

Robotic drawing has become increasingly popular as an entertainment and interactive tool. In this paper we present RoboCoDraw, a real-time collaborative robot-based drawing system that draws stylized human face sketches interactively in front of human users, using Generative Adversarial Network (GAN)-based style transfer and Random-Key Genetic Algorithm (RKGA)-based path optimization. The proposed RoboCoDraw system takes a real human face image as input, converts it to a stylized avatar, and then draws it with a robotic arm. A core component of this system is the AvatarGAN proposed by us, which generates a cartoon avatar face image from a real human face. AvatarGAN is trained with unpaired face and avatar images only and can generate avatar images with much better likeness to human face images than the vanilla CycleGAN. After the avatar image is generated, it is fed to a line extraction algorithm and converted to sketches. An RKGA-based path optimization algorithm is applied to find a time-efficient robotic drawing path to be executed by the robotic arm. We demonstrate the capability of RoboCoDraw on various face images using a lightweight, safe collaborative robot, the UR5.

AAAI Conference 2019 Conference Paper

Approximate Kernel Selection with Strong Approximate Consistency

  • Lizhong Ding
  • Yong Liu
  • Shizhong Liao
  • Yu Li
  • Peng Yang
  • Yijie Pan
  • Chao Huang
  • Ling Shao

Kernel selection is fundamental to the generalization performance of kernel-based learning algorithms. Approximate kernel selection is an efficient kernel selection approach that exploits the convergence property of the kernel selection criteria and the computational virtue of kernel matrix approximation. The convergence property is measured by the notion of approximate consistency. For the existing Nyström approximations, whose sampling distributions are independent of the specific learning task at hand, it is difficult to establish the strong approximate consistency. They mainly focus on the quality of the low-rank matrix approximation, rather than the performance of the kernel selection criterion used in conjunction with the approximate matrix. In this paper, we propose a novel Nyström approximate kernel selection algorithm by customizing a criterion-driven adaptive sampling distribution for the Nyström approximation, which adaptively reduces the error between the approximate and accurate criteria. We theoretically derive the strong approximate consistency of the proposed Nyström approximate kernel selection algorithm. Finally, we empirically evaluate the approximate consistency of our algorithm as compared to state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

Approximate Manifold Regularization: Scalable Algorithm and Generalization Analysis

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Weiping Wang

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning approaches. Unfortunately, it suffers from high time and space complexity, at least quadratic in the number of training samples. In this paper, we propose an efficient graph-based semi-supervised algorithm with a sound theoretical guarantee. The proposed method combines Nyström subsampling and preconditioned conjugate gradient descent, substantially improving computational efficiency and reducing memory requirements. Extensive empirical results reveal that our method achieves state-of-the-art performance in a short time even with limited computing resources.

AAAI Conference 2019 Conference Paper

Linear Kernel Tests via Empirical Likelihood for High-Dimensional Data

  • Lizhong Ding
  • Zhi Liu
  • Yu Li
  • Shizhong Liao
  • Yong Liu
  • Peng Yang
  • Ge Yu
  • Ling Shao

We propose a framework for analyzing and comparing distributions without imposing any parametric assumptions via empirical likelihood methods. Our framework is used to study two fundamental statistical test problems: the two-sample test and the goodness-of-fit test. For the two-sample test, we need to determine whether two groups of samples are from different distributions; for the goodness-of-fit test, we examine how likely it is that a set of samples is generated from a known target distribution. Specifically, we propose empirical likelihood ratio (ELR) statistics for the two-sample test and the goodness-of-fit test, both of which are of linear time complexity and show higher power (i.e., the probability of correctly rejecting the null hypothesis) than the existing linear statistics for high-dimensional data. We prove the nonparametric Wilks’ theorems for the ELR statistics, which illustrate that the limiting distributions of the proposed ELR statistics are chi-square distributions. With these limiting distributions, we can avoid bootstraps or simulations to determine the threshold for rejecting the null hypothesis, which makes the ELR statistics more efficient than the recently proposed linear statistic, finite set Stein discrepancy (FSSD). We also prove the consistency of the ELR statistics, which guarantees that the test power goes to 1 as the number of samples goes to infinity. In addition, we experimentally demonstrate and theoretically analyze that FSSD has poor performance or even fails to test for high-dimensional data. Finally, we conduct a series of experiments to evaluate the performance of our ELR statistics as compared to state-of-the-art linear statistics.

IJCAI Conference 2019 Conference Paper

Multi-Class Learning using Unlabeled Samples: Theory and Algorithm

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Weiping Wang

In this paper, we investigate the generalization performance of multi-class classification, for which we obtain a sharper error bound by using the notion of local Rademacher complexity and additional unlabeled samples, substantially improving the state-of-the-art bounds in existing multi-class learning methods. The statistical analysis motivates us to devise an efficient multi-class learning framework with local Rademacher complexity and Laplacian regularization. Consistent with the theoretical analysis, experimental results demonstrate that the proposed approach achieves better performance.

IJCAI Conference 2019 Conference Paper

PD-GAN: Adversarial Learning for Personalized Diversity-Promoting Recommendation

  • Qiong Wu
  • Yong Liu
  • Chunyan Miao
  • Binqiang Zhao
  • Yin Zhao
  • Lu Guan

This paper proposes Personalized Diversity-promoting GAN (PD-GAN), a novel recommendation model to generate diverse, yet relevant recommendations. Specifically, for each user, a generator recommends a set of diverse and relevant items by sequentially sampling from a personalized Determinantal Point Process (DPP) kernel matrix. This kernel matrix is constructed from two learnable components: the general co-occurrence of diverse items and the user's personal preference for items. To learn the first component, we propose a novel pairwise learning paradigm using training pairs, where each training pair consists of a set of diverse items and a set of similar items randomly sampled from the observed data of all users. The second component is learnt through adversarial training against a discriminator which strives to distinguish between recommended items and ground-truth sets randomly sampled from the observed data of the target user. Experimental results show that PD-GAN is superior in generating recommendations that are both diverse and relevant.
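The DPP-based selection idea recurring in these recommendation papers can be sketched with a greedy MAP heuristic, a common stand-in for exact DPP sampling. The kernel construction below, combining relevance scores with item-embedding similarity, is illustrative and not the paper's exact formulation:

```python
import numpy as np

def build_dpp_kernel(relevance, V):
    # K_ij = r_i * <v_i, v_j> * r_j: quality (relevance) times
    # similarity (embedding inner products). Illustrative construction.
    return np.outer(relevance, relevance) * (V @ V.T)

def greedy_dpp_select(K, k):
    # Greedily add the item that most increases log det(K[S, S]);
    # larger determinants favor diverse (near-orthogonal) item sets.
    selected = []
    for _ in range(k):
        best_item, best_gain = None, -np.inf
        for i in range(len(K)):
            if i in selected:
                continue
            S = selected + [i]
            sign, logdet = np.linalg.slogdet(K[np.ix_(S, S)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best_item, best_gain = i, gain
        selected.append(best_item)
    return selected
```

With two near-duplicate high-relevance items and one dissimilar item, the greedy step skips the duplicate in favor of the dissimilar item, which is exactly the accuracy-diversity trade-off the determinant encodes.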

IROS Conference 2019 Conference Paper

Research on Finite Ground Effect of a Rotor

  • Xinkuang Wang
  • Yong Liu
  • Chengwei Huang

The enrichment of the application scenarios of rotorcraft presents new challenges for the study of their aerodynamic characteristics, such as operating above a building surface of finite size. In this paper, the ground effect is divided into infinite ground effect and finite ground effect, and three types of finite ground effects with different blocked areas are studied. Through numerical simulations, the rotor thrust data and flow field figures in ground effect are obtained. Based on the rotor thrust data, mathematical models are established to describe the rotor thrust alteration caused by infinite and finite ground effect. The analysis of the flow field reveals the mechanism of the finite ground effect.

NeurIPS Conference 2019 Conference Paper

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test

  • Lizhong Ding
  • Mengyang Yu
  • Li Liu
  • Fan Zhu
  • Yong Liu
  • Yu Li
  • Ling Shao

Learning the probability distribution of high-dimensional data is a challenging problem. To solve this problem, we formulate a deep energy adversarial network (DEAN), which casts the energy model learned from real data into an optimization of a goodness-of-fit (GOF) test statistic. DEAN can be interpreted as a GOF game between two generative networks, where one explicit generative network learns an energy-based distribution that fits the real data, and the other implicit generative network is trained by minimizing a GOF test statistic between the energy-based distribution and the generated data, such that the underlying distribution of the generated data is close to the energy-based distribution. We design a two-level alternative optimization procedure to train the explicit and implicit generative networks, such that the hyper-parameters can also be automatically learned. Experimental results show that DEAN achieves high quality generations compared to the state-of-the-art approaches.

IJCAI Conference 2019 Conference Paper

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

  • Liang Liu
  • Guangyao Zhai
  • Wenlong Ye
  • Yong Liu

Scene flow estimation in dynamic scenes remains a challenging task. Computing scene flow from a combination of 2D optical flow and depth has been shown to be considerably faster with acceptable performance. In this work, we present a unified framework for joint unsupervised learning of stereo depth and optical flow with explicit local rigidity to estimate scene flow. We estimate camera motion directly from the optical flow and depth predictions by a Perspective-n-Point method with a RANSAC outlier rejection scheme. To disambiguate object motion and camera motion in the scene, we distinguish the rigid region by the reprojection error and photometric similarity. Through joint learning with the local rigidity, both the depth and optical flow networks can be refined. This framework boosts all four tasks: depth, optical flow, camera motion estimation, and object motion segmentation. Through evaluation on the KITTI benchmark, we show that the proposed framework achieves state-of-the-art results among unsupervised methods. Our models and code are available at https://github.com/lliuz/unrigidflow.

IJCAI Conference 2019 Conference Paper

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns

  • Yong Liu
  • Yujing Hu
  • Yang Gao
  • Yingfeng Chen
  • Changjie Fan

Many real-world problems, such as robot control and soccer games, are naturally modeled as sparse-interaction multi-agent systems. Reutilizing single-agent knowledge in multi-agent systems with sparse interactions can greatly accelerate the multi-agent learning process. Previous works rely on the bisimulation metric to define Markov decision process (MDP) similarity for controlling knowledge transfer. However, the bisimulation metric is costly to compute and is not suitable for high-dimensional state space problems. In this work, we propose more scalable transfer learning methods based on a novel MDP similarity concept. We start by defining MDP similarity based on the N-step return (NSR) values of an MDP. Then, we propose two knowledge transfer methods based on deep neural networks, called direct value function transfer and NSR-based value function transfer. We conduct experiments in an image-based grid world, the multi-agent particle environment (MPE), and the Ms. Pac-Man game. The results indicate that the proposed methods can significantly accelerate multi-agent reinforcement learning while achieving better asymptotic performance.

IJCAI Conference 2018 Conference Paper

Dynamic Bayesian Logistic Matrix Factorization for Recommendation with Implicit Feedback

  • Yong Liu
  • Lifan Zhao
  • Guimei Liu
  • Xinyan Lu
  • Peng Gao
  • Xiao-li Li
  • Zhihui Jin

Matrix factorization has been widely adopted for recommendation, learning latent embeddings of users and items from observed user-item interaction data. However, previous methods usually assume the learned embeddings are static or evolve homogeneously with the same diffusion rate. This does not hold in most scenarios, where users' preferences and item attributes drift heterogeneously over time. To remedy this issue, we propose a novel dynamic matrix factorization model, named Dynamic Bayesian Logistic Matrix Factorization (DBLMF), which aims to learn heterogeneous user and item embeddings that drift with inconsistent diffusion rates. More specifically, DBLMF extends logistic matrix factorization to model the probability that a user would interact with an item at a given timestamp, and uses a diffusion process to connect latent embeddings over time. In addition, an efficient Bayesian inference algorithm is proposed to make DBLMF scalable to large datasets. The effectiveness of the proposed method has been demonstrated by extensive experiments on real datasets, in comparison with state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Fast Cross-Validation

  • Yong Liu
  • Hailun Lin
  • Lizhong Ding
  • Weiping Wang
  • Shizhong Liao

Cross-validation (CV) is the most widely adopted approach for selecting the optimal model. However, the computation of CV has high complexity due to multiple rounds of learner training, making it infeasible for large-scale model selection. In this paper, we present an approximate approach to CV based on the theoretical notion of the Bouligand influence function (BIF) and the Nyström method for kernel methods. We first establish the relationship between the BIF and CV, and propose a method to approximate CV via the Taylor expansion of the BIF. Then, we provide a novel computing method to calculate the BIF for a general distribution, and evaluate the BIF for the sample distribution. Finally, we use the Nyström method to accelerate the computation of the BIF matrix, yielding the final approximate CV criterion. The proposed approximate CV requires training only once and is suitable for a wide variety of kernel methods. Experimental results on many datasets show that our approximate CV has no statistical discrepancy with the original CV, but can significantly improve the efficiency.

NeurIPS Conference 2018 Conference Paper

Multi-Class Learning: From Theory to Algorithm

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Hua Zhang
  • Lizhong Ding
  • Weiping Wang

In this paper, we study the generalization performance of multi-class classification and obtain a sharper data-dependent generalization error bound with a fast convergence rate, substantially improving the state-of-the-art bounds in existing data-dependent generalization analyses. The theoretical analysis motivates us to devise two effective multi-class kernel learning algorithms with statistical guarantees. Experimental results show that our proposed methods can significantly outperform the existing multi-class classification methods.

AAAI Conference 2018 Conference Paper

Randomized Kernel Selection With Spectra of Multilevel Circulant Matrices

  • Lizhong Ding
  • Shizhong Liao
  • Yong Liu
  • Peng Yang
  • Xin Gao

Kernel selection aims at choosing an appropriate kernel function for kernel-based learning algorithms to avoid either underfitting or overfitting of the resulting hypothesis. One of the main problems faced by kernel selection is the evaluation of the goodness of a kernel, which is typically difficult and computationally expensive. In this paper, we propose a randomized kernel selection approach that evaluates and selects the kernel with the spectra of specifically designed multilevel circulant matrices (MCMs), which is statistically sound and computationally efficient. Instead of constructing the kernel matrix, we construct a randomized MCM that encodes the kernel function and all data points together with their labels. We build a one-to-one correspondence between all candidate kernel functions and the spectra of the randomized MCMs via the Fourier transform. We prove the statistical properties of the randomized MCMs and the randomized kernel selection criteria, which theoretically qualify the utility of the randomized criteria in kernel selection. With the spectra of the randomized MCMs, we derive a series of randomized criteria for kernel selection, which can be computed in log-linear time and linear space complexity by the fast Fourier transform (FFT). Experimental results demonstrate that our randomized kernel selection criteria are significantly more efficient than the existing classic and widely-used criteria while preserving similar predictive performance.
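The spectral shortcut underlying this approach is that circulant matrices are diagonalized by the discrete Fourier basis, so the full spectrum comes from one FFT of the first column. A minimal sketch for an ordinary (single-level) circulant matrix, not the paper's multilevel construction:

```python
import numpy as np

def circulant(c):
    # Build the circulant matrix whose first column is c;
    # each column is a cyclic shift of the previous one.
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circulant_spectrum(c):
    # Eigenvalues of circulant(c) in O(n log n) via the FFT,
    # instead of O(n^3) via a dense eigendecomposition.
    return np.fft.fft(c)
```

Multilevel circulant matrices generalize this: their spectra are obtained by multidimensional FFTs, which is what gives the paper's criteria their log-linear time complexity.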

AAAI Conference 2018 Conference Paper

SC2Net: Sparse LSTMs for Sparse Coding

  • Joey Tianyi Zhou
  • Kai Di
  • Jiawei Du
  • Xi Peng
  • Hao Yang
  • Sinno Jialin Pan
  • Ivor Tsang
  • Yong Liu

The iterative shrinkage-thresholding algorithm (ISTA) is one of the most popular optimization solvers for computing sparse codes. However, ISTA suffers from the following problems: 1) ISTA employs a non-adaptive updating strategy that learns the parameters on each dimension with a fixed learning rate; such a strategy may lead to inferior performance due to the scarcity of diversity. 2) ISTA does not incorporate historical information into its updating rules, although such information has been proven helpful for speeding up convergence. To address these challenging issues, we propose a novel formulation of ISTA (named adaptive ISTA) by introducing a novel adaptive momentum vector. To efficiently solve the proposed adaptive ISTA, we recast it as a recurrent neural network unit and show its connection with the well-known long short-term memory (LSTM) model. With the newly proposed unit, we present a neural network (termed SC2Net) to compute sparse codes in an end-to-end manner. To the best of our knowledge, this is one of the first works to bridge the ℓ1-solver and LSTM, and it may provide novel insights for understanding model-based optimization and LSTM. Extensive experiments show the effectiveness of our method on both unsupervised and supervised tasks.

IJCAI Conference 2017 Conference Paper

Efficient Kernel Selection via Spectral Analysis

  • Jian Li
  • Yong Liu
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

Kernel selection is a fundamental problem of kernel methods. Existing measures for kernel selection either provide little theoretical guarantee or have high computational complexity. In this paper, we propose a novel kernel selection criterion based on a newly defined spectral measure of a kernel matrix, with a sound theoretical foundation and high computational efficiency. We first show that the spectral measure can be used to derive generalization bounds for some kernel-based algorithms. By minimizing the derived generalization bounds, we propose the kernel selection criterion with the spectral measure. Moreover, we demonstrate that the popular minimum graph cut and maximum mean discrepancy are two special cases of the proposed criterion. Experimental results on numerous data sets show that our proposed criterion not only gives results comparable to those of the state-of-the-art criteria, but also significantly improves efficiency.

AAAI Conference 2017 Conference Paper

Generalization Analysis for Ranking Using Integral Operator

  • Yong Liu
  • Shizhong Liao
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

The study of the generalization performance of ranking algorithms is one of the fundamental issues in ranking learning theory. Although several generalization bounds have been proposed based on different measures, the convergence rates of the existing bounds are usually at most O(1/√n), where n is the size of the data set. In this paper, we derive novel generalization bounds for regularized ranking in a reproducing kernel Hilbert space via the integral operator of the kernel function. We prove that the rates of our bounds are much faster than O(1/√n). Specifically, we first introduce a notion of local Rademacher complexity for ranking, called local ranking Rademacher complexity, which is used to measure the complexity of the space of loss functions of the ranking. Then, we use the local ranking Rademacher complexity to obtain a basic generalization bound. Finally, we establish the relationship between the local Rademacher complexity and the eigenvalues of the integral operator, and further derive sharp generalization bounds with faster convergence rates.

AAAI Conference 2017 Conference Paper

Infinite Kernel Learning: Generalization Bounds and Algorithms

  • Yong Liu
  • Shizhong Liao
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

Kernel learning is a fundamental problem in both recent research and applications of kernel methods. Existing kernel learning methods commonly use some measure of generalization error to learn the optimal kernel in a convex (or conic) combination of prescribed basic kernels. However, the generalization bounds derived from these measures usually have slow convergence rates, and the basic kernels are finite and must be specified in advance. In this paper, we propose a new kernel learning method based on a novel measure of generalization error, called principal eigenvalue proportion (PEP), which can learn the optimal kernel with sharp generalization bounds over the convex hull of a possibly infinite set of basic kernels. We first derive sharp generalization bounds based on the PEP measure. Then we design two kernel learning algorithms, for finite kernels and infinite kernels respectively, in which the derived sharp generalization bounds are exploited to guarantee faster convergence rates; moreover, the basic kernels can be learned automatically for infinite kernel learning instead of being prescribed in advance. Theoretical analysis and empirical results demonstrate that the proposed kernel learning method outperforms the state-of-the-art kernel learning methods.

IJCAI Conference 2017 Conference Paper

Learning User Dependencies for Recommendation

  • Yong Liu
  • Peilin Zhao
  • Xin Liu
  • Min Wu
  • Lixin Duan
  • Xiao-li Li

Social recommender systems exploit users' social relationships to improve recommendation accuracy. Intuitively, a user tends to trust different people in different scenarios. Therefore, one main challenge of social recommendation is to exploit the most appropriate dependencies between users for a given recommendation task. Previous social recommendation methods are usually developed based on pre-defined user dependencies and thus may not be optimal for a specific recommendation task. In this paper, we propose a novel recommendation method, named probabilistic relational matrix factorization (PRMF), which can automatically learn the dependencies between users to improve recommendation accuracy. In PRMF, users' latent features are assumed to follow a matrix variate normal (MVN) distribution. Both positive and negative user dependencies can be modeled by the row precision matrix of the MVN distribution. Moreover, we also propose an alternating optimization algorithm to solve the optimization problem of PRMF. Extensive experiments on four real datasets have been performed to demonstrate the effectiveness of the proposed PRMF model.

IJCAI Conference 2017 Conference Paper

Online Multitask Relative Similarity Learning

  • Shuji Hao
  • Peilin Zhao
  • Yong Liu
  • Steven C. H. Hoi
  • Chunyan Miao

Relative similarity learning (RSL) aims to learn similarity functions from data with relative constraints. Most previous algorithms developed for RSL are batch-based learning approaches, which suffer from poor scalability when dealing with real-world data arriving sequentially. These methods are often designed to learn a single similarity function for a specific task; therefore, they may be sub-optimal for multi-task learning problems. To overcome these limitations, we propose a scalable RSL framework named OMTRSL (Online Multi-Task Relative Similarity Learning). Specifically, we first develop a simple yet effective online learning algorithm for multi-task relative similarity learning. Then, we also propose an active learning algorithm to save labeling cost. The proposed algorithms not only enjoy theoretical guarantees, but also show high efficacy and efficiency in extensive experiments on real-world datasets.

IJCAI Conference 2016 Conference Paper

Exploring the Context of Locations for Personalized Location Recommendations

  • Xin Liu
  • Yong Liu
  • Xiaoli Li

Conventional location recommendation models rely on users' visit history, geographical influence, temporal influence, etc., to infer users' preferences for locations. However, systematically modeling a location's context (i.e., the set of locations visited before or after this location) is relatively unexplored. In this paper, by leveraging the Skip-gram model, we learn the latent representation for a location to capture the influence of its context. A pair-wise ranking loss that considers the confidences of observed user preferences for locations is then proposed to learn users' latent representations for personalized top-N location recommendations. Moreover, we also extend our model by taking into account temporal influence. Stochastic gradient descent based optimization algorithms are developed to fit the models. We conduct comprehensive experiments over four real datasets. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art location recommendation methods.

AAAI Conference 2016 Conference Paper

Information Credibility Evaluation on Social Media

  • Shu Wu
  • Qiang Liu
  • Yong Liu
  • Liang Wang
  • Tieniu Tan

With the growth of social media, rumors spread fast and are viewed by more and more people on the Internet. Rumors bring significant harm to daily life and public security. It is crucial to evaluate the credibility of information and detect rumors on social media automatically. In this work, we establish a Network Information Credibility Evaluation (NICE) platform, which collects a database of rumors that have been verified on Sina Weibo and automatically evaluates information that is generated by users on social media but has not been verified. Users can use a query to search for related information. If the corresponding information appears in our database, users can identify it as a rumor immediately. Otherwise, NICE shows users real-time results crawled automatically from social media and can calculate the credibility of a specific result with our algorithm. Our algorithm learns dynamic representations for information on social media based on behavior information, dynamic information, user information, and comment information. Then, we use ordinary logistic regression to classify information into rumors and non-rumors. Based on our algorithm, the NICE system achieves satisfactory performance in evaluating information credibility and detecting rumors on social media.

IJCAI Conference 2015 Conference Paper

A Boosting Algorithm for Item Recommendation with Implicit Feedback

  • Yong Liu
  • Peilin Zhao
  • Aixin Sun
  • Chunyan Miao

Many recommendation tasks are formulated as top-N item recommendation problems based on users' implicit feedback instead of explicit feedback. Here explicit feedback refers to users' ratings of items, while implicit feedback is derived from users' interactions with items, e.g., the number of times a user plays a song. In this paper, we propose a boosting algorithm named AdaBPR (Adaptive Boosting Personalized Ranking) for top-N item recommendation using users' implicit feedback. In the proposed framework, multiple homogeneous component recommenders are linearly combined to create an ensemble model for better recommendation accuracy. The component recommenders are constructed based on a fixed collaborative filtering algorithm by using a re-weighting strategy, which assigns a dynamic weight distribution to the observed user-item interactions. AdaBPR demonstrates its effectiveness on three datasets compared with strong baseline algorithms.

AAAI Conference 2015 Conference Paper

Eigenvalues Ratio for Kernel Selection of Kernel Methods

  • Yong Liu
  • Shizhong Liao

The selection of the kernel function, which determines the mapping between the input space and the feature space, is of crucial importance to kernel methods. Existing kernel selection approaches commonly use some measure of generalization error, which is usually difficult to estimate and has a slow convergence rate. In this paper, we propose a novel measure, called the eigenvalues ratio (ER), of the tight bound of generalization error for kernel selection. ER is the ratio between the sum of the main eigenvalues and that of the tail eigenvalues of the kernel matrix. Different from most existing measures, ER is defined on the kernel matrix, so it can be estimated easily from the available training data, which makes it usable for kernel selection. We establish tight ER-based generalization error bounds of order O(1/n) for several kernel-based methods under certain general conditions, while for most existing measures the convergence rate is at most O(1/√n). Finally, to guarantee good generalization performance, we propose a novel kernel selection criterion by minimizing the derived tight generalization error bounds. Theoretical analysis and experimental results demonstrate that our kernel selection criterion is a good choice for kernel selection.

ICRA Conference 2010 Conference Paper

Dynamic model and adaptive tracking controller for 4-Powered Caster Vehicle

  • Yong Liu
  • Yunyi Jia
  • Ning Xi 0001

A new approach for adaptive torque distribution of a 4-Powered Caster Vehicle (4-PCV) on complex terrain, without any additional sensor, is presented. The objective is to dynamically redistribute the torques applied to the wheels based on the real-time conditions of all wheel-ground interactions in order to track the desired trajectory. A novel approach based on the redundantly actuated wheels is proposed to identify the status of the vehicle and the wheel slip ratio by observing only the velocity feedback from the motor encoders. A dynamic model considering the wheel-ground interaction is described. Based on the slip ratio of the wheel joints and the null space of the operational space, control strategies are employed to redistribute the torques applied to the wheel joints so that each wheel can self-adapt to complex wheel-ground conditions and eliminate high-rate slippage. Simulation results show the effectiveness of the proposed estimation approach and the performance of the torque distribution schemes.

IROS Conference 2009 Conference Paper

An automated method to calibrate industrial robot joint offset using virtual line-based single-point constraint approach

  • Yong Liu
  • Ning Xi 0001
  • George Zhang 0001
  • Xiongzi Li
  • Heping Chen
  • Chi Zhang 0031
  • Michael J. Jeffery
  • Thomas A. Fuhlbrigge

This paper describes an industrial robot joint offset calibration method called the virtual line-based single-point constraint approach. Previous methods, such as those using CMMs, laser trackers, or cameras, are limited by cost or resolution. The proposed method relies mainly upon a laser pointer attached to the end-effector and a single position-sensitive detector (PSD) arbitrarily located in the workcell. The automated calibration procedure (about three minutes) involves aiming the laser lines carried by the robot towards the center of the PSD surface from various robot positions and orientations. The intersections of each pair of laser lines should eventually converge to the same point after compensating for the joint offsets. An optimization model and algorithm have been formulated to identify the robot offsets. For highly precise feedback, a segmented PSD with a position resolution better than 0.1 µm is employed. The mean accuracy of robot localization is up to 0.02 mm, and the mean error of the parameter identification is less than 0.08 degrees. Both simulations and experiments implemented on an ABB industrial robot verify the feasibility of the proposed method and demonstrate the effectiveness of the developed calibration system. The goals of fast, automated, low-cost, and high-precision offset calibration are achieved.

IROS Conference 2009 Conference Paper

Development and sensitivity analysis of a portable calibration system for joint offset of industrial robot

  • Yong Liu
  • Ning Xi 0001
  • Jianguo Zhao
  • Erick Nieves-Rivera
  • Yunyi Jia
  • Bingtuan Gao
  • Jun Lu

This paper describes our updated system for industrial robot joint offset calibration. The system consists of an IRB1600 industrial robot, a laser tool attached to the robot's end-effector, a portable position-sensitive device (PPD), and a PC-based controller. By aiming the laser spot at the center of the position-sensitive detector (PSD) on the PPD with different robot configurations, the developed system ideally implements our proposed calibration method, called the virtual line-based single-point constraint approach. However, unlike our previous approach, the calibration method is extended to identify the offset parameters with an uncalibrated laser tool. The position errors of the PPD and the sensitivities of the error in the PSD plane to variations of the joint angles are analyzed. Two different robot configuration patterns are compared by implementing the calibration method. Both simulation and real experimental results are consistent with the mathematical analysis. Experimental results with small (10⁻³–10⁻²) mean and standard deviation of the parameter errors verify the effectiveness of both the sensitivity analysis and the developed system.

IS Journal 2008 Journal Article

The Smart Architect: Scalable Ontology-Based Modeling of Ancient Chinese Architectures

  • Yong Liu
  • Congfu Xu
  • Qiong Zhang
  • Yunhe Pan

The Smart Architect is an innovative intelligent system that can generate ancient Chinese architectures of similar styles or structures automatically. Using an ontology-based approach to analyze different architectural styles, the system converts geometry primitives into semantic architecture components. The modeling process can be performed at semantic levels and requires only certain knowledge in the corresponding architectural domain. In addition, a granular-based knowledge-refining method obtains more accurate knowledge with respect to the specific domains.

NeurIPS Conference 1993 Conference Paper

Robust Parameter Estimation and Model Selection for Neural Network Regression

  • Yong Liu

In this paper, it is shown that the conventional back-propagation (BPP) algorithm for neural network regression is robust to leverages (data with x corrupted), but not to outliers (data with y corrupted). A robust model is to model the error as a mixture of normal distributions. The influence function for this mixture model is calculated, and the condition for the model to be robust to outliers is given. The EM algorithm [5] is used to estimate the parameters. The usefulness of model selection criteria is also discussed. Illustrative simulations are performed.

NeurIPS Conference 1992 Conference Paper

Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method

  • Yong Liu

Two theorems and a lemma are presented about the use of the jackknife estimator and the cross-validation method for model selection. Theorem 1 gives the asymptotic form of the jackknife estimator. Combined with the model selection criterion, this asymptotic form can be used to obtain the fit of a model. The model selection criterion we used is the negative of the average predictive likelihood, the choice of which is based on the idea of the cross-validation method. Lemma 1 provides a formula for further exploration of the asymptotics of the model selection criterion. Theorem 2 gives an asymptotic form of the model selection criterion for the regression case, when the parameter optimization criterion has a penalty term. Theorem 2 also proves the asymptotic equivalence of Moody's model selection criterion (Moody, 1992) and the cross-validation method, when the distance measure between the response y and the regression function takes the form of a squared difference.