Arrow Research search

Author name cluster

Xiangyang Ji

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

72 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment

  • Henglin Liu
  • Nisha Huang
  • Chang Liu
  • Jiangpeng Yan
  • Huijuan Huang
  • Jixuan Ying
  • Tong-Yee Lee
  • Pengfei Wan

The aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC. However, its inherently complex nature, spanning visual perception, cognition, and emotion, poses fundamental challenges. Although aesthetic descriptions offer a viable representation of this complexity, two critical challenges persist: (1) data scarcity and imbalance: existing datasets focus heavily on visual perception and neglect deeper dimensions because manual annotation is expensive; and (2) model fragmentation: current visual networks isolate aesthetic attributes with multi-branch encoders, while multimodal methods represented by contrastive learning struggle to effectively process long-form textual descriptions. To resolve challenge (1), we first present the Refined Aesthetic Description (RAD) dataset, a large-scale (70k), multi-dimensional structured dataset that is generated via an iterative pipeline without heavy annotation costs and is easy to scale. To address challenge (2), we propose ArtQuant, an aesthetics assessment framework for artistic images that not only couples isolated aesthetic dimensions through joint description generation but also better models long-text semantics with the help of LLM decoders. Besides, theoretical analysis confirms this symbiosis: RAD's semantic adequacy (data) and generation paradigm (model) collectively minimize prediction entropy, providing mathematical grounding for the framework. Our approach achieves state-of-the-art performance on several datasets while requiring only 33% of conventional training epochs, narrowing the cognitive gap between artistic images and aesthetic judgment. We will release both code and dataset to support future research.

AAAI Conference 2026 Conference Paper

Score-Based Model for Low-Rank Tensor Recovery

  • Zhengyun Cheng
  • Changhao Wang
  • Guanwen Zhang
  • Yi Xu
  • Wei Zhou
  • Xiangyang Ji

Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these methods effectively model the relationships between latent factors and the low-rank tensor using Dirac delta distributions. However, tensor low-rank decomposition is inherently non-unique, leading to a multimodal distribution over possible solutions. Critically, such prior knowledge is rarely available in practical scenarios, particularly regarding the optimal rank structure and contraction rules. To address this issue, we propose a score-based model that eliminates the need for predefined structural or distributional assumptions, enabling the learning of compatibility between tensors and latent factors. Specifically, a neural network is designed to learn the energy function, which is optimized via score matching to capture the gradient of the joint log-probability of tensor entries and latent factors. Our method allows for modeling structures and distributions beyond the Dirac delta assumption. Moreover, integrating the block coordinate descent (BCD) algorithm with the proposed smooth regularization enables the model to perform both tensor completion and denoising. Experimental results demonstrate significant performance improvements across various tensor types, including sparse and continuous-time tensors, as well as visual data.

NeurIPS Conference 2025 Conference Paper

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

  • Yixiu Mao
  • Yun Qu
  • Qi Wang
  • Xiangyang Ji

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.
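
The neighborhood constraint described above can be illustrated with a toy sketch. It is not the authors' ANQ algorithm (which the abstract says builds on a bilevel optimization framework); it assumes one-dimensional actions and replaces the optimization with a plain grid search. The function name `neighborhood_target_value` and its parameters are hypothetical.

```python
from typing import Callable, Sequence


def neighborhood_target_value(
    q: Callable[[float], float],
    dataset_actions: Sequence[float],
    radii: Sequence[float],
    grid: int = 5,
) -> float:
    """Maximize q over a grid covering the union of intervals [a - r, a + r].

    Each dataset action `a` has its own radius `r`, so conservatism can be
    set per data point (r = 0 falls back to the dataset action itself).
    Grid search over 1-D actions is an illustrative assumption only.
    """
    if len(dataset_actions) != len(radii):
        raise ValueError("need one radius per dataset action")
    if grid < 2:
        raise ValueError("grid must contain at least two points")
    best = float("-inf")
    for a, r in zip(dataset_actions, radii):
        for i in range(grid):
            t = -1.0 + 2.0 * i / (grid - 1)  # evenly spaced in [-1, 1]
            best = max(best, q(a + r * t))
    return best
```

With a radius of zero the target uses only dataset actions; larger radii allow the target to reach nearby actions that do not appear in the dataset.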

ICLR Conference 2025 Conference Paper

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

  • Zihan Zhang
  • Xiangyang Ji
  • Yuan Zhou 0007

We study the optimal batch-regret tradeoff for batch linear contextual bandits. For this problem, we design batch learning algorithms and prove that they achieve the optimal regret bounds (up to logarithmic factors) for any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$. Therefore, we establish the full-parameter-range (almost) optimal batch-regret tradeoff for the batch linear contextual bandit problem. In the course of our analysis, we also prove a new matrix concentration inequality with dependence on their dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in the literature and may be of independent interest.

ICML Conference 2025 Conference Paper

Are High-Quality AI-Generated Images More Difficult for Models to Detect?

  • Yao Xiao
  • Binbin Yang
  • Weiyan Chen
  • Jiahao Chen
  • Zijie Cao
  • ZiYi Dong
  • Xiangyang Ji
  • Liang Lin

The remarkable evolution of generative models has enabled the generation of high-quality, visually attractive images, often perceptually indistinguishable from real photographs to human eyes. This has spurred significant attention on AI-generated image (AIGI) detection. Intuitively, higher image quality should increase detection difficulty. However, our systematic study on cutting-edge text-to-image generators reveals a counterintuitive finding: AIGIs with higher quality scores, as assessed by human preference models, tend to be more easily detected by existing models. To investigate this, we examine how the text prompts for generation and image characteristics influence both quality scores and detector accuracy. We observe that images from short prompts tend to achieve higher preference scores while being easier to detect. Furthermore, through clustering and regression analyses, we verify that image characteristics like saturation, contrast, and texture richness collectively impact both image quality and detector accuracy. Finally, we demonstrate that the performance of off-the-shelf detectors can be enhanced across diverse generators and datasets by selecting input patches based on the predicted scores of our regression models, thus substantiating the broader applicability of our findings. Code and data are available at https://github.com/Coxy7/AIGI-Detection-Quality-Paradox.

NeurIPS Conference 2025 Conference Paper

Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy

  • Qing Zhao
  • Weijian Deng
  • Pengxu Wei
  • ZiYi Dong
  • Hannan Lu
  • Xiangyang Ji
  • Liang Lin

To improve detection robustness in adverse conditions (e.g., haze and low light), image restoration is commonly applied as a pre-processing step to enhance image quality for the detector. However, the functional mismatch between restoration and detection networks can introduce instability and hinder effective integration, an issue that remains underexplored. We revisit this limitation through the lens of Lipschitz continuity, analyzing the functional differences between restoration and detection networks in both the input space and the parameter space. Our analysis shows that restoration networks perform smooth, continuous transformations, while object detectors operate with discontinuous decision boundaries, making them highly sensitive to minor perturbations. This mismatch introduces instability in traditional cascade frameworks, where even imperceptible noise from restoration is amplified during detection, disrupting gradient flow and hindering optimization. To address this, we propose Lipschitz-regularized object detection (LROD), a simple yet effective framework that integrates image restoration directly into the detector’s feature learning, harmonizing the Lipschitz continuity of both tasks during training. We implement this framework as Lipschitz-regularized YOLO (LR-YOLO), extending seamlessly to existing YOLO detectors. Extensive experiments on haze and low-light benchmarks demonstrate that LR-YOLO consistently improves detection stability, optimization smoothness, and overall accuracy.

ICML Conference 2025 Conference Paper

Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments

  • Yun Qu 0002
  • Cheems Wang
  • Yixiu Mao
  • Yiqin Lv
  • Xiangyang Ji

Task robust adaptation is a long-standing pursuit in sequential decision-making. Some risk-averse strategies, e.g., the conditional value-at-risk principle, are incorporated in domain randomization or meta reinforcement learning to prioritize difficult tasks in optimization, which demand costly intensive evaluations. The efficiency issue prompts the development of robust active task sampling to train adaptive policies, where risk-predictive models can surrogate policy evaluation. This work characterizes robust active task sampling as a secret Markov decision process, posits theoretical and practical insights, and constitutes robustness concepts in risk-averse scenarios. Importantly, we propose an easy-to-implement method, referred to as Posterior and Diversity Synergized Task Sampling (PDTS), to accommodate fast and robust sequential decision-making. Extensive experiments show that PDTS unlocks the potential of robust active task sampling, significantly improves the zero-shot and few-shot adaptation robustness in challenging tasks, and even accelerates the learning process under certain scenarios.
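
For readers unfamiliar with the conditional value-at-risk (CVaR) principle mentioned above, a minimal sketch of the usual empirical estimator follows: the mean of the worst `alpha` fraction of per-task returns. This only illustrates the risk measure, not PDTS itself, and the function name `cvar_worst` is hypothetical.

```python
import math
from typing import Sequence


def cvar_worst(returns: Sequence[float], alpha: float) -> float:
    """Empirical CVaR: average of the lowest ceil(alpha * n) returns.

    Lower returns are treated as worse, so this focuses on the hardest tasks.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]  # the k lowest returns
    return sum(worst) / k
```

Computing this quantity requires evaluating the policy on every sampled task first, which is the evaluation cost the abstract refers to.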

NeurIPS Conference 2025 Conference Paper

FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts

  • Heming Zou
  • Yunliang Zang
  • Wutong Xu
  • Yao Zhu
  • Xiangyang Ji

Low-Rank Adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for foundation models, but it suffers from parameter interference, resulting in suboptimal performance. Although Mixture-of-Experts (MoE)-based LoRA variants show promise in mitigating intra-task correlations in single-task instruction tuning, they introduce additional router parameters and remain ineffective in multi-task model merging where inter-task interference arises. Inspired by the fly olfactory circuit, we propose FlyLoRA, an implicit MoE-based LoRA variant that introduces: (1) rank-wise expert activation in the up-projection matrix, and (2) an implicit router that unifies expert routing and down-projection, where a frozen sparse random projection matrix replaces the traditional dense trainable version. This design resolves the trade-off between intra-task decorrelation and computational efficiency by eliminating the need for an explicit router, while inherently mitigating inter-task interference due to the orthogonality property of random matrices. Extensive experiments across four domains (general knowledge understanding, scientific question answering, mathematical reasoning, and code generation) demonstrate consistent performance improvements over existing methods. Beyond empirical gains, FlyLoRA highlights how biological structures can inspire innovations in AI technologies. Code is available at https://github.com/gfyddha/FlyLoRA.

AAAI Conference 2025 Conference Paper

Know2Vec: A Black-Box Proxy for Neural Network Retrieval

  • Zhuoyi Shang
  • Yanwei Liu
  • Jinxia Liu
  • Xiaoyan Gu
  • Ying Ding
  • Xiangyang Ji

For general users, training a neural network from scratch is usually challenging and labor-intensive. Fortunately, neural network zoos enable them to find a well-performing model for direct use or for fine-tuning in their local environments. Although current model retrieval solutions attempt to convert neural network models into vectors to avoid the complex multiple inference processes required for model selection, it is still difficult to choose a suitable model due to inaccurate vectorization and biased correlation alignment between the query dataset and models. From the perspective of knowledge consistency, i.e., whether the knowledge possessed by the model can meet the needs of query tasks, we propose a model retrieval scheme, named Know2Vec, that acts as a black-box retrieval proxy for model zoos. Know2Vec first accesses models via a black-box interface in advance, capturing vital decision knowledge from models while ensuring their privacy. Next, it employs an effective encoding technique to transform the knowledge into precise model vectors. It then maps the user's query task to a knowledge vector by probing the semantic relationships within query samples. Furthermore, the proxy ensures knowledge consistency between the query vector and model vectors within their alignment space, which is optimized through supervised learning with diverse loss functions, and finally it can identify the most suitable model for a given task during the inference stage. Extensive experiments show that Know2Vec achieves superior retrieval accuracy compared with state-of-the-art methods in diverse neural network retrieval tasks.

AAAI Conference 2025 Conference Paper

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

  • Yun Qu
  • Yuhang Jiang
  • Boyuan Wang
  • Yixiu Mao
  • Cheems Wang
  • Chang Liu
  • Xiangyang Ji

Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, even with only episodic rewards. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Fortunately, large language models (LLMs) encompass rich decision-making knowledge and provide a plausible tool for reward redistribution. Even so, deploying LLMs in this case is non-trivial due to the misalignment between linguistic knowledge and the symbolic form requirement, together with inherent randomness and hallucinations in inference. To tackle these issues, we introduce LaRe, a novel LLM-empowered symbolic-based decision-making framework, to improve credit assignment. Key to LaRe is the concept of the Latent Reward, which works as a multi-dimensional performance evaluation, enabling more interpretable goal attainment from various perspectives and facilitating more effective reward redistribution. We show that semantically generated code from an LLM can bridge linguistic knowledge and symbolic latent rewards, as it is executable for symbolic objects. Meanwhile, we design latent reward self-verification to increase the stability and reliability of LLM inference. Theoretically, reward-irrelevant redundancy elimination in the latent reward benefits RL performance from more accurate reward estimation. Extensive experimental results show that LaRe (i) achieves superior temporal credit assignment to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground truth rewards for certain tasks.

NeurIPS Conference 2025 Conference Paper

Real-Time Scene-Adaptive Tone Mapping for High-Dynamic Range Object Detection

  • Gongzhe Li
  • Linwei Qiu
  • Peibei Cao
  • Fengying Xie
  • Xiangyang Ji
  • Qilin Sun

High dynamic range (HDR) images, with their rich tone and detail reproduction, hold significant potential to enhance computer vision systems, particularly in autonomous driving. However, most neural networks for embedded vision are trained on low dynamic range (LDR) inputs and suffer substantial performance degradation when handling high-bit-depth HDR images due to the challenges posed by extreme dynamic ranges. In this paper, we propose a novel tone mapping method that not only bridges the gap between HDR RAW inputs and the LDR sRGB requirements of detection networks but also achieves end-to-end optimization with the downstream tasks. Instead of relying on a traditional image signal processing (ISP) pipeline, we introduce neural photometric calibration to regularize dynamic ranges and a scaling-invariant local tone mapping module to preserve image details. In addition, our architecture supports performance transfer finetuning, enabling efficient adaptation from the LDR model to the HDR RAW model with minimal cost. The proposed method outperforms traditional tone mapping algorithms and advanced AI-ISP methods in challenging automotive HDR scenes. Moreover, our pipeline achieves real-time processing of 4K high-bit-depth HDR inputs on the Nvidia Jetson platform.

NeurIPS Conference 2024 Conference Paper

$\epsilon$-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise

  • Jialiang Wang
  • Xiong Zhou
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji
  • Xianming Liu

Noisy labels pose a common challenge for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions to achieve noise tolerance in the presence of label noise, particularly symmetric losses. However, they usually suffer from the underfitting issue due to the overly strict symmetric condition. In this work, we propose a simple yet effective approach for relaxing the symmetric condition, namely **$\epsilon$-softmax**, which simply modifies the outputs of the softmax layer to approximate one-hot vectors with a controllable error $\epsilon$. Essentially, **$\epsilon$-softmax** not only acts as an alternative to the softmax layer, but also implicitly plays a crucial role in modifying the loss function. We prove theoretically that **$\epsilon$-softmax** can achieve noise-tolerant learning with controllable excess risk bound for almost any loss function. Recognizing that **$\epsilon$-softmax**-enhanced losses may slightly reduce fitting ability on clean datasets, we further incorporate them with one symmetric loss, thereby achieving a better trade-off between robustness and effective learning. Extensive experiments demonstrate the superiority of our method in mitigating synthetic and real-world label noise.
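
The abstract does not state the exact formula, so the following is only one plausible way to push softmax outputs toward a one-hot vector with a controllable gap: add a mass `m` to the largest entry and renormalize. Under this assumed construction the top entry ends up within `1 / (1 + m)` of 1. The names `eps_softmax` and `m` are hypothetical and should not be read as the paper's definition.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def eps_softmax(logits: Sequence[float], m: float) -> List[float]:
    """Assumed construction: (p + m * onehot(argmax p)) / (1 + m).

    m = 0 recovers plain softmax; larger m moves the output closer to
    the one-hot vector of the predicted class.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    p = softmax(logits)
    top = max(range(len(p)), key=p.__getitem__)
    q = [pi / (1.0 + m) for pi in p]
    q[top] += m / (1.0 + m)
    return q
```

The output is still a valid probability vector, so it can be fed to any loss that expects softmax probabilities.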

IJCAI Conference 2024 Conference Paper

CompetEvo: Towards Morphological Evolution from Competition

  • Kangyao Huang
  • Di Guo
  • Xinyu Zhang
  • Xiangyang Ji
  • Huaping Liu

Training an agent to adapt to specific tasks through co-optimization of morphology and control has attracted wide attention. However, whether there exists an optimal configuration and tactics for agents in a multiagent competition scenario remains an open question. In this context, we propose competitive evolution (CompetEvo), which co-evolves agents' designs and tactics in confrontation. We build arenas consisting of three animals and their evolved derivatives, placing agents with different morphologies in direct competition with each other. The results reveal that our method enables agents to evolve a more suitable design and strategy for fighting compared to fixed-morph agents, allowing them to obtain advantages in combat scenarios. Moreover, we demonstrate the amazing and impressive behaviors that emerge when confrontations are conducted under asymmetrical morphs.

ICML Conference 2024 Conference Paper

Data-free Neural Representation Compression with Riemannian Neural Dynamics

  • Zhengqi Pei
  • Anran Zhang
  • Shuhui Wang
  • Xiangyang Ji
  • Qingming Huang

Neural models are equivalent to dynamic systems from a physics-inspired view, implying that computation on neural networks can be interpreted as the dynamical interactions between neurons. However, existing work models neuronal interaction as a weight-based linear transformation, and the nonlinearity comes from the nonlinear activation functions, which leads to limited nonlinearity and data-fitting ability of the whole neural model. Inspired by Riemannian geometry, we interpret neural structures by projecting neurons onto the Riemannian neuronal state space and model neuronal interaction with Riemannian metric (RieM), which provides a more efficient neural representation with higher parameter efficiency. With RieM, we further design a novel data-free neural compression mechanism that does not require additional fine-tuning with real data. Using backbones like ResNet and Vision Transformer, we conduct extensive experiments on datasets such as MNIST, CIFAR-100, ImageNet-1k, and COCO object detection. Empirical results show that, under equal compression rates and computational complexity, models compressed with RieM achieve superior inference accuracy compared to existing data-free compression methods.

NeurIPS Conference 2024 Conference Paper

Doubly Mild Generalization for Offline Reinforcement Learning

  • Yixiu Mao
  • Qi Wang
  • Yun Qu
  • Yuhang Jiang
  • Xiangyang Ji

Offline Reinforcement Learning (RL) suffers from the extrapolation error and value overestimation. From a generalization perspective, this issue can be attributed to the over-generalization of value functions or policies towards out-of-distribution (OOD) actions. Significant efforts have been devoted to mitigating such generalization, and recent in-sample learning approaches have further succeeded in entirely eschewing it. Nevertheless, we show that mild generalization beyond the dataset can be trusted and leveraged to improve performance under certain conditions. To appropriately exploit generalization in offline RL, we propose Doubly Mild Generalization (DMG), comprising (i) mild action generalization and (ii) mild generalization propagation. The former refers to selecting actions in a close neighborhood of the dataset to maximize the Q values. Even so, the potential erroneous generalization can still be propagated, accumulated, and exacerbated by bootstrapping. In light of this, the latter concept is introduced to mitigate the generalization propagation without impeding the propagation of RL learning signals. Theoretically, DMG guarantees better performance than the in-sample optimal policy in the oracle generalization scenario. Even under worst-case generalization, DMG can still control value overestimation at a certain level and lower bound the performance. Empirically, DMG achieves state-of-the-art performance across Gym-MuJoCo locomotion tasks and challenging AntMaze tasks. Moreover, benefiting from its flexibility in both generalization aspects, DMG enjoys a seamless transition from offline to online learning and attains strong online fine-tuning performance.

NeurIPS Conference 2024 Conference Paper

Event-3DGS: Event-based 3D Reconstruction Using 3D Gaussian Splatting

  • Haiqian Han
  • Jianing Li
  • Henglu Wei
  • Xiangyang Ji

Event cameras, offering high temporal resolution and high dynamic range, have brought a new perspective to addressing 3D reconstruction challenges in fast-motion and low-light scenarios. Most methods use the Neural Radiance Field (NeRF) for event-based photorealistic 3D reconstruction. However, these NeRF methods suffer from time-consuming training and inference, as well as limited scene-editing capabilities of implicit representations. To address these problems, we propose Event-3DGS, the first event-based reconstruction using 3D Gaussian splatting (3DGS) for synthesizing novel views freely from event streams. Technically, we first propose an event-based 3DGS framework that directly processes event data and reconstructs 3D scenes by simultaneously optimizing scenario and sensor parameters. Then, we present a high-pass filter-based photovoltage estimation module, which effectively reduces noise in event data to improve the robustness of our method in real-world scenarios. Finally, we design an event-based 3D reconstruction loss to optimize the parameters of our method for better reconstruction quality. The results show that our method outperforms state-of-the-art methods in terms of reconstruction quality on both simulated and real-world datasets. We also verify that our method can perform robust 3D reconstruction even in real-world scenarios with extreme noise, fast motion, and low-light conditions. Our code is available at https://github.com/lanpokn/Event-3DGS.

NeurIPS Conference 2024 Conference Paper

Expanding Sparse Tuning for Low Memory Usage

  • Shufan Shen
  • Junshu Sun
  • Xiangyang Ji
  • Qingming Huang
  • Shuhui Wang

Parameter-efficient fine-tuning (PEFT) is an effective method for adapting pre-trained vision models to downstream tasks by tuning a small subset of parameters. Among PEFT methods, sparse tuning achieves superior performance by only adjusting the weights most relevant to downstream tasks, rather than densely tuning the whole weight matrix. However, this performance improvement has been accompanied by increases in memory usage, which stems from two factors, i.e., the storage of the whole weight matrix as learnable parameters in the optimizer and the additional storage of tunable weight indexes. In this paper, we propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage. To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices, avoiding the costly storage of the whole original matrix. A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes. To maintain the effectiveness of sparse tuning with low-rank matrices, we extend the low-rank decomposition by applying nonlinear kernel functions to the whole-matrix merging. Consequently, we gain an increase in the rank of the merged matrix, enhancing the ability of SNELL in adapting the pre-trained models to downstream tasks. Extensive experiments on multiple downstream tasks show that SNELL achieves state-of-the-art performance with low memory usage, endowing PEFT with sparse tuning to large-scale models. Codes are available at https://github.com/ssfgunner/SNELL.

ICML Conference 2024 Conference Paper

Kepler codebook

  • Junrong Lian
  • Ziyue Dong
  • Pengxu Wei
  • Wei Ke 0003
  • Chang Liu 0030
  • Qixiang Ye
  • Xiangyang Ji
  • Liang Lin

A codebook designed for learning discrete distributions in latent space has demonstrated state-of-the-art results on generation tasks. This inspires us to explore what distribution of codebook is better. Following the spirit of Kepler’s Conjecture, we cast the codebook training as solving the sphere packing problem and derive a Kepler codebook with a compact and structured distribution to obtain a codebook for image representations. Furthermore, we implement the Kepler codebook training by simply employing this derived distribution as regularization and using the codebook partition method. We conduct extensive experiments to evaluate our trained codebook for image reconstruction and generation on natural and human face datasets, respectively, achieving significant performance improvement. Besides, our Kepler codebook has demonstrated superior performance when evaluated across datasets and even for reconstructing images with different resolutions. Our trained models and source codes will be publicly released.

ICML Conference 2024 Conference Paper

Learning Scale-Aware Spatio-temporal Implicit Representation for Event-based Motion Deblurring

  • Wei Yu 0004
  • Jianing Li 0001
  • Shengping Zhang
  • Xiangyang Ji

Existing event-based motion deblurring methods mostly focus on restoring images with the same spatial and temporal scales as events. However, the unknown scales of images and events in the real world pose great challenges and have rarely been explored. To address this gap, we propose a novel Scale-Aware Spatio-temporal Network (SASNet) to flexibly restore blurred images with event streams at arbitrary scales. The core idea is to implicitly aggregate both spatial and temporal correspondence features of images and events to generalize at continuous scales. To restore highly blurred local areas, we develop a Spatial Implicit Representation Module (SIRM) to aggregate spatial correlation at any resolution through event encoding sampling. To tackle global motion blur, a Temporal Implicit Representation Module (TIRM) is presented to learn temporal correlation via temporal shift operations with long-term aggregation. Additionally, we build a High-resolution Hybrid Deblur (H2D) dataset using a new-generation hybrid event-based sensor, which comprises images with naturally spatially aligned and temporally synchronized events at various scales. Experiments demonstrate that our SASNet outperforms state-of-the-art methods on both synthetic GoPro and real H2D datasets, especially in high-speed motion scenarios. Code and dataset are available at https://github.com/aipixel/SASNet.

ICML Conference 2024 Conference Paper

LLM-Empowered State Representation for Reinforcement Learning

  • Boyuan Wang
  • Yun Qu 0002
  • Yuhang Jiang 0001
  • Jianzhun Shao
  • Chang Liu 0030
  • Wenming Yang
  • Xiangyang Ji

Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time costs. Recently, increasingly knowledgeable large language models (LLMs) have provided promising substitutes for prior injection with minimal human intervention. Motivated by this, we propose LLM-Empowered State Representation (LESR), a novel approach that utilizes LLMs to autonomously generate task-related state representation codes, which help to enhance the continuity of network mappings and facilitate efficient training. Experimental results demonstrate that LESR exhibits high sample efficiency and outperforms state-of-the-art baselines by an average of 29% in accumulated reward in MuJoCo tasks and 30% in success rates in Gym-Robotics tasks. Codes of LESR are accessible at https://github.com/thu-rllab/LESR.

NeurIPS Conference 2024 Conference Paper

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

  • Yixiu Mao
  • Qi Wang
  • Chen Chen
  • Yun Qu
  • Xiangyang Ji

In offline reinforcement learning (RL), addressing the out-of-distribution (OOD) action issue has been a focus, but we argue that there exists an OOD state issue that also impairs performance yet has been underexplored. Such an issue describes the scenario when the agent encounters states out of the offline dataset during the test phase, leading to uncontrolled behavior and performance degradation. To this end, we propose SCAS, a simple yet effective approach that unifies OOD state correction and OOD action suppression in offline RL. Technically, SCAS achieves value-aware OOD state correction, capable of correcting the agent from OOD states to high-value in-distribution states. Theoretical and empirical results show that SCAS also exhibits the effect of suppressing OOD actions. On standard offline RL benchmarks, SCAS achieves excellent performance without additional hyperparameter tuning. Moreover, benefiting from its OOD state correction feature, SCAS demonstrates enhanced robustness against environmental perturbations.

AAAI Conference 2024 Conference Paper

Parallel Vertex Diffusion for Unified Visual Grounding

  • Zesen Cheng
  • Kehan Li
  • Peng Jin
  • Siheng Li
  • Xiangyang Ji
  • Li Yuan
  • Chang Liu
  • Jie Chen

Unified visual grounding (UVG) capitalizes on a wealth of task-related knowledge across various grounding tasks via one-shot training, which curtails retraining costs and task-specific architecture design efforts. Vertex generation-based UVG methods achieve this versatility by unifying the modeling of object box and contour prediction and provide a text-powered interface to vast related multi-modal tasks, e.g., visual question answering and captioning. However, these methods typically generate vertexes sequentially through autoregression, which is prone to error accumulation and heavy computation, especially for high-dimension sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD) based on the parallelizability of diffusion models to accurately and efficiently generate vertexes in a parallel and scalable manner. Since the coordinates fluctuate greatly, training diffusion models without geometry constraints typically converges slowly. Therefore, we complete our PVD with two critical components, i.e., center anchor mechanism and angle summation loss, which serve to normalize coordinates and adopt a differentiable geometry descriptor from the point-in-polygon problem of computational geometry to constrain the overall difference of prediction and label vertexes. These innovative designs empower our PVD to demonstrate its superiority with state-of-the-art performance across various grounding tasks.

ICRA Conference 2024 Conference Paper

RAPIDFlow: Recurrent Adaptable Pyramids with Iterative Decoding for Efficient Optical Flow Estimation

  • Henrique Morimitsu
  • Xiaobin Zhu 0001
  • Roberto M. Cesar
  • Xiangyang Ji
  • Xu-Cheng Yin

Extracting motion information from videos with optical flow estimation is vital in multiple practical robot applications. Current optical flow approaches show remarkable accuracy, but top-performing methods have high computational costs and are unsuitable for embedded devices. Although some previous works have focused on developing low-cost optical flow strategies, their estimation quality has a noticeable gap with more robust methods. In this paper, we develop a novel method to efficiently estimate high-quality optical flow in embedded devices. Our proposed RAPIDFlow model combines efficient NeXt1D convolution blocks with a fully recurrent structure based on feature pyramids to decrease computational costs without significantly impacting estimation accuracy. The adaptable recurrent encoder produces multi-scale features with a single shared block, which allows us to adjust the pyramid length at inference time and make it more robust to changes in input size. Also, it enables our model to offer multiple tradeoffs between accuracy and speed to suit different applications. Experiments using a Jetson Orin NX embedded system on the MPI-Sintel and KITTI public benchmarks show that RAPIDFlow outperforms previous approaches by significant margins at faster speeds. Our code is available at https://github.com/hmorimitsu/ptlflow/tree/main/ptlflow/models/rapidflow.

ICRA Conference 2024 Conference Paper

RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications

  • Xingyu Liu
  • Chenyangguang Zhang
  • Gu Wang 0001
  • Ruida Zhang
  • Xiangyang Ji

In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a Range-aware RGB-D data Simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. A range-aware rendering strategy is further introduced to enrich data diversity. Extensive experiments show that models trained with RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks. Data and code are available at https://github.com/shanice-l/RaSim.

AAAI Conference 2024 Conference Paper

Recurrent Partial Kernel Network for Efficient Optical Flow Estimation

  • Henrique Morimitsu
  • Xiaobin Zhu
  • Xiangyang Ji
  • Xu-Cheng Yin

Optical flow estimation is a challenging task consisting of predicting per-pixel motion vectors between images. Recent methods have employed larger and more complex models to improve the estimation accuracy. However, this impacts the widespread adoption of optical flow methods and makes it harder to train more general models since the optical flow data is hard to obtain. This paper proposes a small and efficient model for optical flow estimation. We design a new spatial recurrent encoder that extracts discriminative features at a significantly reduced size. Unlike standard recurrent units, we utilize Partial Kernel Convolution (PKConv) layers to produce variable multi-scale features with a single shared block. We also design efficient Separable Large Kernels (SLK) to capture large context information with low computational cost. Experiments on public benchmarks show that we achieve state-of-the-art generalization performance while requiring significantly fewer parameters and memory than competing methods. Our model ranks first in the Spring benchmark without finetuning, improving the results by over 10% while requiring an order of magnitude fewer FLOPs and over four times less memory than the following published method without finetuning. The code is available at github.com/hmorimitsu/ptlflow/tree/main/ptlflow/models/rpknet.

NeurIPS Conference 2024 Conference Paper

Rethinking Imbalance in Image Super-Resolution for Efficient Inference

  • Wei Yu
  • Bowen Yang
  • Qinglin Liu
  • Jianing Li
  • Shengping Zhang
  • Xiangyang Ji

Existing super-resolution (SR) methods optimize all model weights equally using $\mathcal{L}_1$ or $\mathcal{L}_2$ losses by uniformly sampling image patches without considering dataset imbalances or parameter redundancy, which limits their performance. To address this, we formulate the image SR task as an imbalanced distribution transfer learning problem from a statistical probability perspective, proposing a plug-and-play Weight-Balancing framework (WBSR) to achieve balanced model learning without changing the original model structure and training data. Specifically, we develop a Hierarchical Equalization Sampling (HES) strategy to address data distribution imbalances, enabling better feature representation from texture-rich samples. To tackle model optimization imbalances, we propose a Balanced Diversity Loss (BDLoss) function, focusing on learning texture regions while disregarding redundant computations in smooth regions. After joint training of HES and BDLoss to rectify these imbalances, we present a gradient projection dynamic inference strategy to facilitate accurate and efficient inference. Extensive experiments across various models, datasets, and scale factors demonstrate that our method achieves comparable or superior performance to existing approaches with about 34% reduction in computational cost.

ICRA Conference 2024 Conference Paper

Stimulate the Potential of Robots via Competition

  • Kangyao Huang
  • Di Guo 0002
  • Xinyu Zhang 0001
  • Xiangyang Ji
  • Huaping Liu 0001

It is common for us to feel pressure in a competitive environment, which arises from the desire to succeed compared with other individuals or opponents. Although we might get anxious under the pressure, it could also be a drive for us to stimulate our potential to the fullest in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help an individual robot acquire knowledge from the competition, fully stimulating its dynamics potential in the race. Specifically, the competition information among competitors is introduced as an additional auxiliary signal to learn advantaged actions. We further build a Multiagent-Race environment, and extensive experiments are conducted, demonstrating that robots trained in competitive environments outperform ones that are trained with SoTA algorithms in a single-robot environment.

ICML Conference 2024 Conference Paper

The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

  • Ziquan Liu
  • Yufei Cui
  • Yan Yan 0006
  • Yi Xu 0008
  • Xiangyang Ji
  • Xue Liu 0001
  • Antoni B. Chan

In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).
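For readers unfamiliar with the metric, the following is a minimal sketch of standard split conformal prediction and of how an average prediction set size (PSS) can be computed. It is not the AT-UR method from the paper; the function names, the choice of nonconformity score (one minus the true-class probability), and the toy inputs are assumptions made for illustration.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal score threshold for miscoverage level alpha.

    Assumed nonconformity score: 1 - probability of the true class.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, threshold):
    """Boolean mask (num_test, num_classes); True marks a class in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= threshold

def prediction_set_size(test_probs, threshold):
    """Average number of classes per prediction set."""
    return float(prediction_sets(test_probs, threshold).sum(axis=1).mean())
```

A smaller average set size at the same coverage level means more informative predictions, which is the sense in which the abstract compares AT variants.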

NeurIPS Conference 2024 Conference Paper

Towards Dynamic Message Passing on Graphs

  • Junshu Sun
  • Chenxue Yang
  • Xiangyang Ji
  • Qingming Huang
  • Shuhui Wang

Message passing plays a vital role in graph neural networks (GNNs) for effective feature learning. However, the over-reliance on input topology diminishes the efficacy of message passing and restricts the ability of GNNs. Despite efforts to mitigate the reliance, existing studies encounter message-passing bottlenecks or high computational expense, which calls for flexible message passing with low complexity. In this paper, we propose a novel dynamic message-passing mechanism for GNNs. It projects graph nodes and learnable pseudo nodes into a common space with measurable spatial relations between them. With nodes moving in the space, their evolving relations facilitate flexible pathway construction for a dynamic message-passing process. Associating pseudo nodes to input graphs with their measured relations, graph nodes can communicate with each other intermediately through pseudo nodes under linear complexity. We further develop a GNN model named $\mathtt{N^2}$ based on our dynamic message-passing mechanism. $\mathtt{N^2}$ employs a single recurrent layer to recursively generate the displacements of nodes and construct optimal dynamic pathways. Evaluation on eighteen benchmarks demonstrates the superior performance of $\mathtt{N^2}$ over popular GNNs. $\mathtt{N^2}$ successfully scales to large-scale benchmarks and requires significantly fewer parameters for graph classification with the shared recurrent layer.

IROS Conference 2024 Conference Paper

UW-SDF: Exploiting Hybrid Geometric Priors for Neural SDF Reconstruction from Underwater Multi-view Monocular Images

  • Zeyu Chen
  • Jingyi Tang
  • Gu Wang 0001
  • Shengquan Li
  • Xinghui Li
  • Xiangyang Ji
  • Xiu Li 0001

Due to the unique characteristics of underwater environments, accurate 3D reconstruction of underwater objects poses a challenging problem in tasks such as underwater exploration and mapping. Traditional methods that rely on multiple sensor data for 3D reconstruction are time-consuming and face challenges in data acquisition in underwater scenarios. We propose UW-SDF, a framework for reconstructing target objects from multi-view underwater images based on neural SDF. We introduce hybrid geometric priors to optimize the reconstruction process, markedly enhancing the quality and efficiency of neural SDF reconstruction. Additionally, to address the challenge of segmentation consistency in multi-view images, we propose a novel few-shot multi-view target segmentation strategy using the general-purpose segmentation model (SAM), enabling rapid automatic segmentation of unseen objects. Through extensive qualitative and quantitative experiments on diverse datasets, we demonstrate that our proposed method outperforms the traditional underwater 3D reconstruction method and other neural rendering approaches in the field of underwater 3D reconstruction.

ICLR Conference 2024 Conference Paper

Variance-enlarged Poisson Learning for Graph-based Semi-Supervised Learning with Extremely Sparse Labeled Data

  • Xiong Zhou
  • Xianming Liu 0005
  • Hao Yu
  • Jialiang Wang 0003
  • Zeke Xie
  • Junjun Jiang
  • Xiangyang Ji

Graph-based semi-supervised learning, particularly in the context of extremely sparse labeled data, often suffers from degenerate solutions where label functions tend to be nearly constant across unlabeled data. In this paper, we introduce Variance-enlarged Poisson Learning (VPL), a simple yet powerful framework tailored to alleviate the issues arising from the presence of degenerate solutions. VPL incorporates a variance-enlarged regularization term, which induces a Poisson equation specifically for unlabeled data. This intuitive approach increases the dispersion of labels from their average mean, effectively reducing the likelihood of degenerate solutions characterized by nearly constant label functions. We subsequently introduce two streamlined algorithms, V-Laplace and V-Poisson, each intricately designed to enhance Laplace and Poisson learning, respectively. Furthermore, we broaden the scope of VPL to encompass graph neural networks, introducing Variance-enlarged Graph Poisson Networks (V-GPN) to facilitate improved label propagation. To achieve a deeper understanding of VPL's behavior, we conduct a comprehensive theoretical exploration in both discrete and variational cases. Our findings elucidate that VPL inherently amplifies the importance of connections within the same class while concurrently tempering those between different classes. We support our claims with extensive experiments, demonstrating the effectiveness of VPL and showcasing its superiority over existing methods. The code is available at https://github.com/hitcszx/VPL.

ICLR Conference 2024 Conference Paper

Zero-Mean Regularized Spectral Contrastive Learning: Implicitly Mitigating Wrong Connections in Positive-Pair Graphs

  • Xiong Zhou
  • Xianming Liu 0005
  • Feilong Zhang 0002
  • Gang Wu 0010
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji

Contrastive learning has emerged as a popular paradigm of self-supervised learning that learns representations by encouraging representations of positive pairs to be similar while representations of negative pairs to be far apart. The spectral contrastive loss, in synergy with the notion of positive-pair graphs, offers valuable theoretical insights into the empirical successes of contrastive learning. In this paper, we propose incorporating an additive factor into the term of spectral contrastive loss involving negative pairs. This simple modification can be equivalently viewed as introducing a regularization term that enforces the mean of representations to be zero, which thus is referred to as *zero-mean regularization*. It intuitively relaxes the orthogonality of representations between negative pairs and implicitly alleviates the adverse effect of wrong connections in the positive-pair graph, leading to better performance and robustness. To clarify this, we thoroughly investigate the role of zero-mean regularized spectral contrastive loss in both unsupervised and supervised scenarios with respect to theoretical analysis and quantitative evaluation. These results highlight the potential of zero-mean regularized spectral contrastive learning to be a promising approach in various tasks.
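The equivalence mentioned in the abstract can be checked with a short derivation. This is a sketch under assumptions: it takes the common form of the spectral contrastive loss and assumes the additive factor $\tau$ enters inside the squared negative-pair term; the symbols $f$, $\tau$, and $\mu$ are introduced here for illustration and the paper's exact formulation may differ.

$$
\mathcal{L}_{\tau}(f) = -2\,\mathbb{E}_{x,x^{+}}\!\left[f(x)^{\top} f(x^{+})\right] + \mathbb{E}_{x,x'}\!\left[\left(f(x)^{\top} f(x') + \tau\right)^{2}\right]
$$

Expanding the square, and using the independence of $x$ and $x'$ so that $\mathbb{E}_{x,x'}[f(x)^{\top} f(x')] = \lVert \mu \rVert_2^{2}$ with $\mu = \mathbb{E}_{x}[f(x)]$:

$$
\mathcal{L}_{\tau}(f) = \underbrace{-2\,\mathbb{E}_{x,x^{+}}\!\left[f(x)^{\top} f(x^{+})\right] + \mathbb{E}_{x,x'}\!\left[\left(f(x)^{\top} f(x')\right)^{2}\right]}_{\text{spectral contrastive loss}} + 2\tau\,\lVert \mu \rVert_2^{2} + \tau^{2}
$$

For $\tau > 0$ the extra term $2\tau\,\lVert \mu \rVert_2^{2}$ penalizes a nonzero mean representation, and $\tau^{2}$ is a constant that does not affect optimization.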

ICML Conference 2023 Conference Paper

Complementary Attention for Multi-Agent Reinforcement Learning

  • Jianzhun Shao
  • Hongchang Zhang
  • Yun Qu 0002
  • Chang Liu 0030
  • Shuncheng He
  • Yuhang Jiang 0001
  • Xiangyang Ji

In cooperative multi-agent reinforcement learning, centralized training with decentralized execution (CTDE) shows great promise for a trade-off between independent Q-learning and joint action learning. However, vanilla CTDE methods, which assume a fixed number of agents, can hardly adapt to real-world scenarios where dynamic team compositions typically suffer from dramatically variant partial observability. Specifically, agents with extensive sight ranges are prone to be affected by trivial environmental substrates, dubbed the "distracted attention" issue; ones with limited observation can hardly sense their teammates, degrading the cooperation quality. In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy on input entities accompanied with the complementary attention of enhancement and replenishment. Concretely, to tackle the distracted attention issue, highly contributed entities’ attention is enhanced by the execution-related representation extracted via action prediction with an inverse model. For better out-of-sight-range cooperation, the lowly contributed ones are compressed to brief messages with a conditional mutual information estimator. Our CAMA facilitates stable and sustainable teamwork, which is justified by the impressive results reported on the challenging StarCraftII, MPE, and Traffic Junction benchmarks.

NeurIPS Conference 2023 Conference Paper

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

  • Jianzhun Shao
  • Yun Qu
  • Chen Chen
  • Hongchang Zhang
  • Xiangyang Ji

Offline multi-agent reinforcement learning is challenging due to the coupling of the distribution shift issue common in the offline setting and the high-dimensionality issue common in the multi-agent setting, which makes the out-of-distribution (OOD) action and value overestimation phenomena excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a high dimensional single one and directly applying single agent conservative methods to it, CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee as those single agent conservative methods do, but the induced regularization and safe policy improvement bound are independent of the agent number, which is therefore theoretically superior to the direct treatment referred to above, especially when the agent number is large. We further conduct experiments on four environments including both discrete and continuous action settings on both existing and our man-made datasets, demonstrating that CFCQL outperforms existing methods on most datasets and even with a remarkable margin on some of them.

AAAI Conference 2023 Conference Paper

DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

  • Hongchang Zhang
  • Jianzhun Shao
  • Shuncheng He
  • Yuhang Jiang
  • Xiangyang Ji

To facilitate offline reinforcement learning, uncertainty estimation is commonly used to detect out-of-distribution data. By inspecting, we show that current explicit uncertainty estimators such as Monte Carlo Dropout and model ensemble are not competent to provide trustworthy uncertainty estimation in offline reinforcement learning. Accordingly, we propose a non-parametric distance-aware uncertainty estimator which is sensitive to the change in the input space for offline reinforcement learning. Based on our new estimator, adaptive truncated quantile critics are proposed to underestimate the out-of-distribution samples. We show that the proposed distance-aware uncertainty estimator is able to offer better uncertainty estimation compared to previous methods. Experimental results demonstrate that our proposed DARL method is competitive to the state-of-the-art methods in offline evaluation tasks.

NeurIPS Conference 2023 Conference Paper

DDF-HO: Hand-Held Object Reconstruction via Conditional Directed Distance Field

  • Chenyangguang Zhang
  • Yan Di
  • Ruida Zhang
  • Guangyao Zhai
  • Fabian Manhardt
  • Federico Tombari
  • Xiangyang Ji

Reconstructing hand-held objects from a single RGB image is an important and challenging problem. Existing works utilizing Signed Distance Fields (SDF) reveal limitations in comprehensively capturing the complex hand-object interactions, since SDF is only reliable within the proximity of the target, and hence, infeasible to simultaneously encode local hand and object cues. To address this issue, we propose DDF-HO, a novel approach leveraging Directed Distance Field (DDF) as the shape representation. Unlike SDF, DDF maps a ray in 3D space, consisting of an origin and a direction, to corresponding DDF values, including a binary visibility signal determining whether the ray intersects the objects and a distance value measuring the distance from origin to target in the given direction. We randomly sample multiple rays and collect local to global geometric features for them by introducing a novel 2D ray-based feature aggregation scheme and a 3D intersection-aware hand pose embedding, combining 2D-3D features to model hand-object interactions. Extensive experiments on synthetic and real-world datasets demonstrate that DDF-HO consistently outperforms all baseline methods by a large margin, especially under Chamfer Distance, with about 80% leap forward. Codes are available at https://github.com/ZhangCYG/DDFHO.

NeurIPS Conference 2023 Conference Paper

Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

  • Yun Qu
  • Boyuan Wang
  • Jianzhun Shao
  • Yuhang Jiang
  • Chen Chen
  • Zhenbin Ye
  • Liu Linc
  • Yang Feng

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization and multi-task learning.

AAAI Conference 2023 Conference Paper

ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation

  • Qiran Zou
  • Yu Yang
  • Wing Yin Cheung
  • Chang Liu
  • Xiangyang Ji

Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.

ICLR Conference 2023 Conference Paper

In-sample Actor Critic for Offline Reinforcement Learning

  • Hongchang Zhang
  • Yixiu Mao
  • Boyuan Wang
  • Shuncheng He
  • Yi Xu 0008
  • Xiangyang Ji

Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most methods penalize the out-of-distribution state-action pairs or regularize the trained policy towards the behavior policy but cannot guarantee to get rid of extrapolation error. We propose In-sample Actor Critic (IAC) which utilizes sampling-importance resampling to execute in-sample policy evaluation. IAC only uses the target Q-values of the actions in the dataset to evaluate the trained policy, thus avoiding extrapolation error. The proposed method performs unbiased policy evaluation and has a lower variance than importance sampling in many cases. Empirical results show that IAC obtains competitive performance compared to the state-of-the-art methods on Gym-MuJoCo locomotion domains and much more challenging AntMaze domains.
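As background, generic sampling-importance resampling (SIR) can be sketched as below. This is not the IAC algorithm: the abstract does not say how the weights are computed, so `weights` here is a hypothetical array of non-negative importance weights over dataset indices, and `sir_resample` is an illustrative name.

```python
import numpy as np

def sir_resample(weights, num_draws, rng):
    """Draw dataset indices with probability proportional to their weights.

    weights: non-negative importance weights, one per dataset sample
             (hypothetical inputs; not taken from the paper).
    rng:     a numpy.random.Generator, passed in for reproducibility.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 1 or np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be a 1-D non-negative array with positive sum")
    probs = weights / weights.sum()
    # Resampling with replacement; every returned index refers to a sample
    # already in the dataset, so nothing outside the dataset is queried.
    return rng.choice(len(weights), size=num_draws, replace=True, p=probs)
```

Because resampling only returns indices of existing samples, any quantity evaluated on the resampled batch stays in-sample, which is the property the abstract emphasizes.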

ICML Conference 2023 Conference Paper

No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation

  • Feilong Zhang 0002
  • Xianming Liu 0005
  • Shiyi Lin
  • Gang Wu 0010
  • Xiong Zhou
  • Junjun Jiang
  • Xiangyang Ji

Federated learning suffers from a latency bottleneck induced by network stragglers, which hampers the training efficiency significantly. In addition, due to the heterogeneous data distribution and security requirements, simple and fast averaging aggregation is not feasible anymore. Instead, complicated aggregation operations, such as knowledge distillation, are required. The time cost for complicated aggregation becomes a new bottleneck that limits the computational efficiency of FL. In this work, we claim that the root cause of training latency actually lies in the aggregation-then-broadcasting workflow of the server. By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation. This fully asynchronous and parallel pipeline enables handling complex aggregation and network stragglers, allowing flexible device participation as well as achieving scalability in computation. We theoretically prove that synchronous and asynchronous PFL can achieve a similar convergence rate as vanilla FL. Extensive experiments empirically show that our framework brings up to $5.56\times$ speedup compared with traditional FL. Code is available at: https://github.com/Hypervoyager/PFL.

JMLR Journal 2023 Journal Article

On the Dynamics Under the Unhinged Loss and Beyond

  • Xiong Zhou
  • Xianming Liu
  • Hanzhang Wang
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji

Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze the closed-form dynamics while requiring as few simplifications or assumptions as possible. The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization. Based on the layer-peeled model that views last-layer features as free optimization variables, we conduct a thorough analysis in the unconstrained, regularized, and spherical constrained cases, as well as the case where the neural tangent kernel remains invariant. To bridge the performance of the unhinged loss to that of Cross-Entropy (CE), we investigate the scenario of fixing classifier weights with a specific structure (e.g., a simplex equiangular tight frame). Our analysis shows that these dynamics converge exponentially fast to a solution depending on the initialization of features and classifier weights. These theoretical results not only offer valuable insights, including explicit feature regularization and rescaled learning rates for enhancing practical training with the unhinged loss, but also extend their applicability to other loss functions. Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.

ICML Conference 2023 Conference Paper

Supported Trust Region Optimization for Offline Reinforcement Learning

  • Yixiu Mao
  • Hongchang Zhang
  • Chen Chen
  • Yi Xu 0008
  • Xiangyang Ji

Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most policy constraint methods regularize the density of the trained policy towards the behavior policy, which is too restrictive in most cases. We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy, enjoying the less restrictive support constraint. We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset. Further with both errors incorporated, STR still guarantees safe policy improvement for each step. Empirical results validate the theory of STR and demonstrate its state-of-the-art performance on MuJoCo locomotion domains and much more challenging AntMaze domains.

NeurIPS Conference 2023 Conference Paper

Supported Value Regularization for Offline Reinforcement Learning

  • Yixiu Mao
  • Hongchang Zhang
  • Chen Chen
  • Yi Xu
  • Xiangyang Ji

Offline reinforcement learning suffers from the extrapolation error and value overestimation caused by out-of-distribution (OOD) actions. To mitigate this issue, value regularization approaches aim to penalize the learned value functions to assign lower values to OOD actions. However, existing value regularization methods lack a proper distinction between the regularization effects on in-distribution (ID) and OOD actions, and fail to guarantee optimal convergence results of the policy. To this end, we propose Supported Value Regularization (SVR), which penalizes the Q-values for all OOD actions while maintaining standard Bellman updates for ID ones. Specifically, we utilize the bias of importance sampling to compute the summation of Q-values over the entire OOD region, which serves as the penalty for policy evaluation. This design automatically separates the regularization for ID and OOD actions without manually distinguishing between them. In tabular MDP, we show that the policy evaluation operator of SVR is a contraction, whose fixed point outputs unbiased Q-values for ID actions and underestimated Q-values for OOD actions. Furthermore, the policy iteration with SVR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset. Empirically, we validate the theoretical properties of SVR in a tabular maze environment and demonstrate its state-of-the-art performance on a range of continuous control tasks in the D4RL benchmark.

AAAI Conference 2023 Conference Paper

Weakly-Supervised Semantic Segmentation for Histopathology Images Based on Dataset Synthesis and Feature Consistency Constraint

  • Zijie Fang
  • Yang Chen
  • Yifeng Wang
  • Zhi Wang
  • Xiangyang Ji
  • Yongbing Zhang

Tissue segmentation is a critical task in computational pathology due to its desirable ability to indicate the prognosis of cancer patients. Currently, numerous studies attempt to use image-level labels to achieve pixel-level segmentation to reduce the need for fine annotations. However, most of these methods are based on class activation map, which suffers from inaccurate segmentation boundaries. To address this problem, we propose a novel weakly-supervised tissue segmentation framework named PistoSeg, which is implemented under a fully-supervised manner by transferring tissue category labels to pixel-level masks. Firstly, a dataset synthesis method is proposed based on Mosaic transformation to generate synthesized images with pixel-level masks. Next, considering the difference between synthesized and real images, this paper devises an attention-based feature consistency, which directs the training process of a proposed pseudo-mask refining module. Finally, the refined pseudo-masks are used to train a precise segmentation model for testing. Experiments based on WSSS4LUAD and BCSS-WSSS validate that PistoSeg outperforms the state-of-the-art methods. The code is released at https://github.com/Vison307/PistoSeg.

IJCAI Conference 2023 Conference Paper

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

  • Zesen Cheng
  • Peng Jin
  • Hao Li
  • Kehan Li
  • Siheng Li
  • Xiangyang Ji
  • Chang Liu
  • Jie Chen

The top-down and bottom-up methods are the two mainstream approaches to referring segmentation, but both have intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary in restraining each other's weaknesses, but directly averaging their outputs leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods in both interaction and integration, achieving a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results with weights obtained by sampling confidence scores from the distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which justifies the effectiveness and generality of our method.

IROS Conference 2022 Conference Paper

6D Robotic Assembly Based on RGB-only Object Pose Estimation

  • Bowen Fu
  • Sek Kun Leong
  • Xiaocong Lian
  • Xiangyang Ji

Vision-based robotic assembly is a crucial yet challenging task as the interaction with multiple objects requires high levels of precision. In this paper, we propose an integrated 6D robotic system to perceive, grasp, manipulate and assemble blocks with tight tolerances. Aiming to provide an off-the-shelf RGB-only solution, our system is built upon a monocular 6D object pose estimation network trained solely with synthetic images leveraging physically-based rendering. Subsequently, pose-guided 6D transformation along with collision-free assembly is proposed to construct any designed structure with arbitrary initial poses. Our novel 3-axis calibration operation further enhances the precision and robustness by disentangling 6D pose estimation and robotic assembly. Both quantitative and qualitative results demonstrate the effectiveness of our proposed 6D robotic assembly system.

NeurIPS Conference 2022 Conference Paper

Distilling Representations from GAN Generator via Squeeze and Span

  • Yu Yang
  • Xiaotian Cheng
  • Chang Liu
  • Hakan Bilen
  • Xiangyang Ji

In recent years, generative adversarial networks (GANs) have been an actively studied topic and shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring their representations to downstream tasks is largely unexplored. In this paper, we propose to distill knowledge from GAN generators by squeezing and spanning their representations. We "squeeze" the generator features into representations that are invariant to semantic-preserving transformations through a network before they are distilled into the student network. We "span" the distilled representation of the synthetic domain to the real domain by also using real training data to remedy the mode collapse of GANs and boost the student network performance in a real domain. Experiments justify the efficacy of our method and reveal its great significance in self-supervised representation learning. Code is available at https://github.com/yangyu12/squeeze-and-span.

NeurIPS Conference 2022 Conference Paper

Improved Fine-Tuning by Better Leveraging Pre-Training Data

  • Ziquan Liu
  • Yi Xu
  • Yuanhong Xu
  • Qi Qian
  • Hao Li
  • Xiangyang Ji
  • Antoni Chan
  • Rong Jin

As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that, in some vision tasks, training from scratch achieves final performance no worse than this pre-training strategy once the number of training samples is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis by using the excess risk bound, which is popular in learning theory. The result reveals that the excess risk bound may have a weak dependency on the pre-trained model. The observation inspires us to leverage pre-training data for fine-tuning, since this data is also available for fine-tuning. The generalization result of using pre-training data shows that the excess risk bound on a target task can be improved when the appropriate pre-training data is included in fine-tuning. With the theoretical motivation, we propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task. Extensive experimental results for image classification tasks on 8 benchmark data sets verify the effectiveness of the proposed data selection based fine-tuning pipeline. Our code is available at https://github.com/ziquanliu/NeurIPS2022_UOT_fine_tuning.

ICLR Conference 2022 Conference Paper

Learning to Annotate Part Segmentation with Gradient Matching

  • Yu Yang 0011
  • Xiaotian Cheng
  • Hakan Bilen
  • Xiangyang Ji

The success of state-of-the-art deep neural networks heavily relies on the presence of large-scale labelled datasets, which are extremely expensive and time-consuming to annotate. This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN and labelling the generated images with an automatic annotator. In particular, we formulate the annotator learning as a learning-to-learn problem. Given a pre-trained GAN, the annotator learns to label object parts in a set of randomly generated images such that a part segmentation model trained on these synthetic images with their predicted labels obtains low segmentation error on a small validation set of manually labelled images. We further reduce this nested-loop optimization problem to a simple gradient matching problem and efficiently solve it with an iterative algorithm. We show that our method can learn annotators from a broad range of labelled images including real images, generated images, and even analytically rendered images. Our method is evaluated with semi-supervised part segmentation tasks and significantly outperforms other semi-supervised competitors when the amount of labelled examples is extremely limited.

ICLR Conference 2022 Conference Paper

Learning Towards The Largest Margins

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins in well-established losses in order to enforce extra intra-class compactness and inter-class separability, which, however, were developed through heuristic means, as opposed to rigorous mathematical principles. In this work, we attempt to address this limitation by formulating the principled optimization objective as learning towards the largest margins. Specifically, we firstly propose to employ the class margin as the measure of inter-class separability, and the sample margin as the measure of intra-class compactness. Accordingly, to encourage discriminative representation of features, the loss function should promote the largest possible margins for both classes and samples. Furthermore, we derive a generalized margin softmax loss to draw general conclusions for the existing margin-based losses. Not only does this principled framework offer new perspectives to understand and interpret existing margin-based losses, but it also provides new insights that can guide the design of new tools, including sample margin regularization and largest margin softmax loss for class balanced cases, and zero centroid regularization for class imbalanced cases. Experimental results demonstrate the effectiveness of our strategy for multiple tasks including visual classification, imbalanced classification, person re-identification, and face verification.

AAAI Conference 2022 Conference Paper

Local Surface Descriptor for Geometry and Feature Preserved Mesh Denoising

  • Wenbo Zhao
  • Xianming Liu
  • Junjun Jiang
  • Debin Zhao
  • Ge Li
  • Xiangyang Ji

3D meshes are widely employed to represent the geometric structure of 3D shapes. Due to the limited precision of scanning sensors and other issues, meshes are inevitably affected by noise, which hampers subsequent applications. Convolutional neural networks (CNNs) achieve great success in image processing tasks, including 2D image denoising, and have been proven capable of modeling complex features at different scales, which is also particularly useful for mesh denoising. However, due to the irregular structure of meshes, CNN-based denoising strategies cannot be trivially applied to them. To circumvent this limitation, in this paper, we propose the local surface descriptor (LSD), which is able to transform the local deformable surface around a face into a 2D grid representation and thus facilitates the deployment of CNNs to generate denoised face normals. To verify the superiority of LSD, we directly feed LSD into the classical ResNet without any complicated network design. The extensive experimental results show that, compared to the state-of-the-art methods, our method achieves encouraging performance with respect to both objective and subjective evaluations.

NeurIPS Conference 2022 Conference Paper

Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

  • Zihan Zhang
  • Yuhang Jiang
  • Yuan Zhou
  • Xiangyang Ji

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with a constraint on the number of batches. In the multi-batch reinforcement learning framework, the agent is required to provide a time schedule for updating its policy before interaction begins, which is particularly suitable for scenarios where adaptively changing the policy is costly. Given a finite-horizon MDP with $S$ states, $A$ actions and planning horizon $H$, we design a computationally efficient algorithm that achieves near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ (where $\tilde{O}(\cdot)$ hides logarithmic terms in $S, A, H, K$) in $K$ episodes using $O\left(H+\log_2\log_2(K) \right)$ batches with confidence parameter $\delta$. To the best of our knowledge, it is the first $\tilde{O}(\sqrt{SAH^3K})$ regret bound with $O(H+\log_2\log_2(K))$ batch complexity. Meanwhile, we show that to achieve $\tilde{O}(\mathrm{poly}(S, A, H)\sqrt{K})$ regret, the number of batches is at least $\Omega\left(H/\log_A(K)+ \log_2\log_2(K) \right)$, which matches our upper bound up to logarithmic terms. Our technical contributions are two-fold: 1) a near-optimal design scheme to explore over the unlearned states; 2) a computationally efficient algorithm to explore certain directions with an approximated transition model.
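As a rough illustration of where a $\log_2\log_2(K)$ batch count can come from, the sketch below builds the classical batch grid $t_i = \lceil K^{1-2^{-i}} \rceil$ known from the batched bandit literature. This is an assumption made for illustration only, not the schedule used in the paper, and it does not model the additional $H$ term. The function name `batch_grid` is hypothetical. With this grid, the ratio $t_{i+1}/\sqrt{t_i}$ is of order $\sqrt{K}$ for every batch, which is why so few policy updates can suffice.

```python
import math


def batch_grid(K: int) -> list[int]:
    """Return batch end points t_1 < ... < t_M = K with t_i = ceil(K ** (1 - 2 ** -i)).

    Illustrative grid from the batched bandit literature; NOT the paper's schedule.
    """
    if K < 1:
        raise ValueError("K must be a positive integer")
    if K < 4:
        return [K]  # too few episodes for the grid to matter
    # Use M = ceil(log2 log2 K) + 1 batches in total.
    M = math.ceil(math.log2(math.log2(K))) + 1
    ends = {min(K, math.ceil(K ** (1.0 - 2.0 ** -i))) for i in range(1, M)}
    ends.add(K)  # the last batch always runs to the final episode
    return sorted(ends)


if __name__ == "__main__":
    for K in (10**3, 10**6, 10**9):
        grid = batch_grid(K)
        print(f"K={K:>10}: {len(grid)} batches, end points {grid}")
```

The policy would be updated only at the returned end points, so the number of updates grows doubly logarithmically in the number of episodes.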

ICML Conference 2022 Conference Paper

Prototype-Anchored Learning for Learning with Imperfect Annotations

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we usually suffer from overfitting or underfitting. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this purpose, which indicates how the class prototypes should be anchored. Motivated by theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.

NeurIPS Conference 2022 Conference Paper

Self-Organized Group for Cooperative Multi-agent Reinforcement Learning

  • Jianzhun Shao
  • Zhiqiang Lou
  • Hongchang Zhang
  • Yuhang Jiang
  • Shuncheng He
  • Xiangyang Ji

Centralized training with decentralized execution (CTDE) has achieved great success in cooperative multi-agent reinforcement learning (MARL) in practical applications. However, CTDE-based methods typically suffer from poor zero-shot generalization ability with dynamic team composition and varying partial observability. To tackle these issues, we propose a spontaneous grouping mechanism, termed Self-Organized Group (SOG), which features conductor election (CE) and message summary (MS). In CE, a certain number of conductors are elected every $T$ time-steps to temporally construct groups, each with conductor-follower consensus where the followers are constrained to only communicate with their conductor. In MS, each conductor summarizes and distributes the received messages to all affiliated group members to maintain unified scheduling. SOG provides zero-shot generalization ability to the dynamic number of agents and the varying partial observability. Sufficient experiments on mainstream multi-agent benchmarks exhibit the superiority of SOG.

NeurIPS Conference 2022 Conference Paper

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

  • Yuhang Jiang
  • Jianzhun Shao
  • Shuncheng He
  • Hongchang Zhang
  • Xiangyang Ji

Reinforcement learning typically relies heavily on a well-designed reward signal, which becomes more challenging to obtain in cooperative multi-agent reinforcement learning. Alternatively, unsupervised reinforcement learning (URL) has delivered on its promise in the recent past to learn useful skills and explore the environment without external supervised signals. These approaches mainly aim for a single agent to reach distinguishable states, which is insufficient for multi-agent systems because each agent interacts not only with the environment but also with the other agents. We propose Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning (SPD) to learn generic coordination policies for agents with no extrinsic reward. Specifically, we devise the Synergy Pattern Graph (SPG), a graph depicting the relationships of agents at each time step. Furthermore, we propose an episode-wise divergence measurement to approximate the discrepancy of synergy patterns. To overcome the challenge of sparse return, we decompose the discrepancy of synergy patterns into per-time-step pseudo-rewards. Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the same instructive pretrained policy's parameters can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football.

IROS Conference 2022 Conference Paper

SSP-Pose: Symmetry-Aware Shape Prior Deformation for Direct Category-Level Object Pose Estimation

  • Ruida Zhang
  • Yan Di
  • Fabian Manhardt
  • Federico Tombari
  • Xiangyang Ji

Category-level pose estimation is a challenging problem due to intra-class shape variations. Recent methods deform pre-computed shape priors to map the observed point cloud into the normalized object coordinate space and then retrieve the pose via post-processing, i.e., Umeyama's Algorithm. The shortcomings of this two-stage strategy lie in two aspects: 1) The surrogate supervision on the intermediate results cannot directly guide the learning of pose, resulting in large pose error after post-processing. 2) The inference speed is limited by the post-processing step. In this paper, to handle these shortcomings, we propose an end-to-end trainable network SSP-Pose for category-level pose estimation, which integrates shape priors into a direct pose regression network. SSP-Pose stacks four individual branches on a shared feature extractor, where two branches are designed to deform and match the prior model with the observed instance, and the other two branches are applied for directly regressing the full 9-degrees-of-freedom pose and for performing symmetry reconstruction and point-wise inlier mask prediction, respectively. Consistency loss terms are then naturally exploited to align the outputs of different branches and promote the performance. During inference, only the direct pose regression branch is needed. In this manner, SSP-Pose not only learns category-level pose-sensitive characteristics to boost performance but also keeps a real-time inference speed. Moreover, we utilize the symmetry information of each category to guide the shape prior deformation, and propose a novel symmetry-aware loss to mitigate the matching ambiguity. Extensive experiments on public datasets demonstrate that SSP-Pose produces superior performance compared with competitors with a real-time inference speed at about 25Hz. The codes will be released soon.

AAAI Conference 2022 Conference Paper

State Deviation Correction for Offline Reinforcement Learning

  • Hongchang Zhang
  • Jianzhun Shao
  • Yuhang Jiang
  • Shuncheng He
  • Guanwen Zhang
  • Xiangyang Ji

Offline reinforcement learning aims to maximize the expected cumulative rewards with a fixed collection of data. The basic principle of current offline reinforcement learning methods is to restrict the policy to the offline dataset action space. However, they ignore the case where the dataset’s trajectories fail to cover the state space completely. In particular, when the dataset’s size is limited, it is likely that the agent will encounter unseen states during test time. Prior policy-constrained methods are incapable of correcting the state deviation, and may lead the agent further into unexpected regions. In this paper, we propose the state deviation correction (SDC) method to constrain the policy’s induced state distribution by penalizing the out-of-distribution states which might appear during the test period. We first perturb the states sampled from the logged dataset, then simulate noisy next states on the basis of a dynamics model and the policy. We then train the policy to minimize the distances between the noisy next states and the offline dataset. In this manner, we allow the trained policy to guide the agent to its familiar regions. Experimental results demonstrate that our proposed method is competitive with the state-of-the-art methods in a GridWorld setup, offline Mujoco control suite, and a modified offline Mujoco dataset with a finite number of valuable samples.

AAAI Conference 2022 Conference Paper

Towards End-to-End Image Compression and Analysis with Transformers

  • Yuanchao Bai
  • Xu Yang
  • Xianming Liu
  • Junjun Jiang
  • Yaowei Wang
  • Xiangyang Ji
  • Wen Gao

We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module to fuse the compressed features with the selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features can obtain the long-term information from the self-attention mechanism of the Transformer and improve the compression performance. The rate-distortion-accuracy optimization problem is finally solved by a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.

AAAI Conference 2022 Conference Paper

Unpaired Multi-Domain Stain Transfer for Kidney Histopathological Images

  • Yiyang Lin
  • Bowei Zeng
  • Yifeng Wang
  • Yang Chen
  • Zijie Fang
  • Jian Zhang
  • Xiangyang Ji
  • Haoqian Wang

As an essential step in the pathological diagnosis, histochemical staining can show specific tissue structure information and, consequently, assist pathologists in making accurate diagnoses. Clinical kidney histopathological analyses usually employ more than one type of staining: H&E, MAS, PAS, PASM, etc. However, due to the interference of colors among multiple stains, it is not easy to perform multiple staining simultaneously on one biological tissue. To address this problem, we propose a network based on unpaired training data to virtually generate multiple types of staining from one staining. Our method can preserve the content of input images while transferring them to multiple target styles accurately. To efficiently control the direction of stain transfer, we propose a style guided normalization (SGN). Furthermore, a multiple style encoding (MSE) is devised to represent the relationship among different staining styles dynamically. An improved one-hot label is also proposed to enhance the generalization ability and extendibility of our method. Extensive experiments have demonstrated that our model can achieve superior performance on a tiny dataset. The results exhibit not only good performance but also great visualization and interpretability. In particular, our method achieves satisfactory results in cross-tissue, cross-staining, and cross-task settings. We believe that our method will significantly influence clinical stain transfer and greatly reduce the workload for pathologists. Our code and supplementary materials are available at https://github.com/linyiyang98/UMDST.

AAAI Conference 2022 Conference Paper

Wasserstein Unsupervised Reinforcement Learning

  • Shuncheng He
  • Yuhang Jiang
  • Hongchang Zhang
  • Jianzhun Shao
  • Xiangyang Ji

Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches of unsupervised skill discovery feed a latent variable to the agent and shed its empowerment on agent’s behavior by mutual information (MI) maximization. However, the policies learned by MI-based methods cannot sufficiently explore the state space, even though they can be successfully distinguished from each other. Therefore, we propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), where we directly maximize the distance of state distributions induced by different policies. Additionally, we overcome difficulties in simultaneously training N (N > 2) policies and in amortizing the overall reward to each step. Experiments show that policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, the agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks by hierarchical learning.
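To make the notion of a distance between state distributions concrete, the following minimal sketch computes the Wasserstein-1 distance between two equal-size empirical samples in one dimension, where it reduces to the mean absolute difference of the sorted samples. This is only the simplest special case: the estimators the paper uses for higher-dimensional state spaces are not described in the abstract, and the function name `wasserstein_1d` and the toy state samples are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Hypothetical 1-D states visited by two policies.
    states_policy_a = [0.0, 0.1, 0.2, 0.3]
    states_policy_b = [2.0, 2.1, 2.2, 2.3]
    print(wasserstein_1d(states_policy_a, states_policy_b))
```

Unlike a discriminability criterion, this quantity keeps growing as the two sets of visited states move further apart, which is the property the abstract relies on to encourage exploration.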

ICML Conference 2021 Conference Paper

Asymmetric Loss Functions for Learning with Noisy Labels

  • Xiong Zhou
  • Xianming Liu 0005
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning from noisy labels for arbitrary noise type. Subsequently, we investigate general theoretical properties of asymmetric loss functions, including classification-calibration, excess risk bound, and noise-tolerance. Meanwhile, we introduce the asymmetry ratio to measure the asymmetry of a loss function, and the empirical results show that a higher ratio will provide better robustness. Moreover, we modify several common loss functions, and establish the necessary and sufficient conditions for them to be asymmetric. Experiments on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods.
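For background, the symmetric condition referred to above is usually stated as follows: the sum of the loss over all possible labels is a constant that does not depend on the prediction. The sketch below checks this numerically for the mean absolute error (which satisfies it) and for cross-entropy (which does not). It illustrates only the symmetric condition from prior work, not the asymmetric losses proposed in the paper; all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mae_loss(probs, label):
    """L1 distance between the one-hot label vector and the prediction."""
    return sum(abs((1.0 if j == label else 0.0) - p) for j, p in enumerate(probs))


def ce_loss(probs, label):
    """Cross-entropy loss for a single label."""
    return -math.log(probs[label])


def sum_over_labels(loss, probs):
    """Sum of the loss over every possible label for a fixed prediction."""
    return sum(loss(probs, k) for k in range(len(probs)))


if __name__ == "__main__":
    for logits in ([0.0, 0.0, 0.0], [2.0, 0.0, -1.0]):
        probs = softmax(logits)
        print(logits,
              "MAE sum:", round(sum_over_labels(mae_loss, probs), 6),
              "CE sum:", round(sum_over_labels(ce_loss, probs), 6))
```

For a prediction on the probability simplex with $K$ classes, the MAE sum is always $2(K-1)$, whereas the cross-entropy sum varies with the prediction.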

NeurIPS Conference 2021 Conference Paper

Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

  • Zihan Zhang
  • Jiaqi Yang
  • Xiangyang Ji
  • Simon S. Du

This paper presents new variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs). With the new confidence sets, we obtain the following regret bounds: For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{k=1}^{K}\sigma_k^2})$ data-dependent regret bound, where $d$ is the feature dimension, $K$ is the number of rounds, and $\sigma_k^2$ is the unknown variance of the reward at the $k$-th round. This is the first regret bound that only scales with the variance and the dimension, with no explicit polynomial dependency on $K$. When variances are small, this bound can be significantly smaller than the $\widetilde{\Theta}\left(d\sqrt{K}\right)$ worst-case regret bound. For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d, \log H)\sqrt{K})$ regret bound, where $d$ is the number of base models, $K$ is the number of episodes, and $H$ is the planning horizon. This is the first regret bound that only scales logarithmically with $H$ in the reinforcement learning with linear function approximation setting, thus exponentially improving existing results, and resolving an open problem in Zhou et al. (2020). We develop three technical ideas that may be of independent interest: 1) applications of the peeling technique to both the input norm and the variance magnitude, 2) a recursion-based estimator for the variance, and 3) a new convex potential lemma that generalizes the seminal elliptical potential lemma.

IROS Conference 2021 Conference Paper

Local to Global Plane Regularity Aggregation for Dense Surfel Mapping

  • Jiexiang Tan
  • Xiangyang Ji

In this paper, we propose a novel local to global plane regularity aggregation framework for dense surfel mapping, aiming for real-time reconstruction of high-quality 3D global models in both indoor and urban environments. Different from prior works that directly localize surfels globally, we investigate three interplanar geometric relations: {coplanarity, parallelism, orthogonality} from local to global scales as additional structural regularities in reconstruction, promoting the performance in plane-dominated scenes remarkably. Given a monocular RGB-D video as input, our framework extracts and utilizes the interplanar relations in three stages: local surfel creation, local to global relation propagation, and global plane-guided re-localization. In the first stage, surfels are created and refined within the current frame by aggregating temporal and spatial cues. The interplanar relations are adopted to regulate the normal and position of each surfel. Then in the second stage, we simultaneously establish correspondences between the created surfels and global model and propagate the interplanar relations from local to global. Finally, the positions of surfels are further relocated and optimized in a larger scale, based on the global interplanar relation priors aggregated across all local frames. Extensive experiments on datasets of different scales demonstrate that our framework achieves superior performance in terms of consistency and accuracy of the reconstructed global model. Meanwhile, the capability of our framework in the real-time 3D reconstruction on CPU opens the door to practical application.

ICML Conference 2021 Conference Paper

Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

  • Zihan Zhang
  • Yuan Zhou 0007
  • Xiangyang Ji

In this paper we consider the problem of learning an $\epsilon$-optimal policy for a discounted Markov Decision Process (MDP). Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0, 1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy with sample complexity $\tilde{O}(\frac{SA\ln(1/p)}{\epsilon^2(1-\gamma)^{5.5}})$ \footnote{In this work, the notation $\tilde{O}(\cdot)$ hides poly-logarithmic factors of $S, A, 1/(1-\gamma)$, and $1/\epsilon$.} and success probability $(1-p)$. For small enough $\epsilon$, we show an improved algorithm with sample complexity $\tilde{O}(\frac{SA\ln(1/p)}{\epsilon^2(1-\gamma)^{3}})$. While the first bound improves upon all known model-free algorithms and model-based ones with tight dependence on $S$, our second algorithm beats all known sample complexity bounds and matches the information theoretic lower bound up to logarithmic factors.
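As a rough worked comparison of the two bounds above, the ratio of their leading terms depends only on the discount factor. This is a sketch that ignores the hidden poly-logarithmic factors and the "small enough $\epsilon$" requirement of the second bound; the value $\gamma = 0.99$ is an illustrative assumption, not a figure from the paper.

```latex
\[
\frac{\dfrac{SA\ln(1/p)}{\epsilon^{2}(1-\gamma)^{5.5}}}
     {\dfrac{SA\ln(1/p)}{\epsilon^{2}(1-\gamma)^{3}}}
= \frac{1}{(1-\gamma)^{2.5}},
\qquad
\text{e.g. } \gamma = 0.99:\quad
\frac{1}{(10^{-2})^{2.5}} = 10^{5}.
\]
```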

ICML Conference 2021 Conference Paper

Near Optimal Reward-Free Reinforcement Learning

  • Zihan Zhang
  • Simon S. Du
  • Xiangyang Ji

We study the reward-free reinforcement learning framework, which is particularly suitable for batch reinforcement learning and scenarios where one needs policies for multiple reward functions. This framework has two phases: in the exploration phase, the agent collects trajectories by interacting with the environment without using any reward signal; in the planning phase, the agent needs to return a near-optimal policy for arbitrary reward functions. We give a new efficient algorithm, \textbf{S}taged \textbf{S}ampling + \textbf{T}runcated \textbf{P}lanning (SSTP), which interacts with the environment at most $O\left( \frac{S^2A}{\epsilon^2}\mathrm{poly}\log\left(\frac{SAH}{\epsilon}\right) \right)$ episodes in the exploration phase, and guarantees to output a near-optimal policy for arbitrary reward functions in the planning phase, where $S$ is the size of state space, $A$ is the size of action space, $H$ is the planning horizon, and $\epsilon$ is the target accuracy relative to the total reward. Notably, our sample complexity scales only \emph{logarithmically} with $H$, in contrast to all existing results which scale \emph{polynomially} with $H$. Furthermore, this bound matches the minimax lower bound $\Omega\left(\frac{S^2A}{\epsilon^2}\right)$ up to logarithmic factors. Our results rely on three new techniques: 1) A new sufficient condition for the dataset to plan for an $\epsilon$-suboptimal policy; 2) A new way to plan efficiently under the proposed condition using soft-truncated planning; 3) Constructing extended MDP to maximize the truncated accumulative rewards efficiently.

AAAI Conference 2021 Conference Paper

Nearest Neighbor Classifier Embedded Network for Active Learning

  • Fang Wan
  • Tianning Yuan
  • Mengying Fu
  • Xiangyang Ji
  • Qingming Huang
  • Qixiang Ye

Deep neural networks (DNNs) have been widely applied to active learning. Despite their effectiveness, the generalization ability of the discriminative classifier (the softmax classifier) is questionable when there is a significant distribution bias between the labeled set and the unlabeled set. In this paper, we attempt to replace the softmax classifier in deep neural networks with a nearest neighbor classifier, considering its progressive generalization ability within the unknown subspace. Our proposed active learning approach, termed nearest Neighbor Classifier Embedded network (NCE-Net), aims to reduce the risk of over-estimating unlabeled samples while improving the opportunity to query informative samples. NCE-Net is conceptually simple but surprisingly powerful, as justified from the perspective of the subset information, which defines a metric to quantify model generalization ability in active learning. Experimental results show that, with simple selection based on rejection or confusion confidence, NCE-Net improves on the state of the art in image classification and object detection tasks by significant margins.

AAAI Conference 2021 Conference Paper

SD-Pose: Semantic Decomposition for Cross-Domain 6D Object Pose Estimation

  • Zhigang Li
  • Yinlin Hu
  • Mathieu Salzmann
  • Xiangyang Ji

The current leading 6D object pose estimation methods rely heavily on annotated real data, which is highly costly to acquire. To overcome this, many works have proposed to introduce computer-generated synthetic data. However, bridging the gap between the synthetic and real data remains a severe problem. Images depicting different levels of realism/semantics usually have different transferability between the synthetic and real domains. Inspired by this observation, we introduce an approach, SD-Pose, that explicitly decomposes the input image into multi-level semantic representations and then combines the merits of each representation to bridge the domain gap. Our comprehensive analyses and experiments show that our semantic decomposition strategy can fully utilize the different domain similarities of different representations, thus allowing us to outperform the state of the art on modern 6D object pose datasets without accessing any real data during training.

NeurIPS Conference 2021 Conference Paper

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

  • Zhuchen Shao
  • Hao Bian
  • Yang Chen
  • Yifeng Wang
  • Jian Zhang
  • Xiangyang Ji
  • Yongbing Zhang

Multiple instance learning (MIL) is a powerful tool to solve the weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, the current MIL methods are usually based on the independent and identical distribution hypothesis, thus neglecting the correlation among different instances. To address this problem, we proposed a new framework, called correlated MIL, and provided a proof for convergence. Based on this framework, we devised a Transformer based MIL (TransMIL), which explored both morphological and spatial information. The proposed TransMIL can effectively deal with unbalanced/balanced and binary/multiple classification with great visualization and interpretability. We conducted various experiments for three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods. The test AUC for the binary tumor classification can be up to 93.09% over the CAMELYON16 dataset, and the AUC for cancer subtype classification can be up to 96.03% and 98.82% over the TCGA-NSCLC dataset and the TCGA-RCC dataset, respectively. Implementation is available at: https://github.com/szc19990412/TransMIL.

ICRA Conference 2020 Conference Paper

A Unified Framework for Piecewise Semantic Reconstruction in Dynamic Scenes via Exploiting Superpixel Relations

  • Yan Di
  • Henrique Morimitsu
  • Zhiqiang Lou
  • Xiangyang Ji

This paper presents a novel framework for dense piecewise semantic reconstruction in dynamic scenes containing complex background and moving objects via exploiting superpixel relations. We utilize two kinds of superpixel relations: motion relations and spatial relations, each having three subcategories: coplanar, hinge, and crack. Spatial relations provide constraints on the spatial locations of neighboring superpixels and thus can be used to reconstruct dynamic scenes. However, spatial relations cannot be estimated directly with epipolar geometry due to moving objects in dynamic scenes. We synthesize the results of semantic instance segmentation and motion relations to estimate spatial relations. Given consecutive frames, our method proceeds in five main stages: preprocessing, motion estimation, superpixel relation analysis, reconstruction, and refinement. Extensive experiments on various datasets demonstrate that our method outperforms competitors in reconstruction quality. Furthermore, our method presents a feasible way to incorporate semantic information in Structure-from-Motion (SFM) based reconstruction pipelines.

NeurIPS Conference 2020 Conference Paper

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

  • Zihan Zhang
  • Yuan Zhou
  • Xiangyang Ji

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$. We propose a model-free algorithm UCB-ADVANTAGE and prove that it achieves $\tilde{O}(\sqrt{H^2 SAT})$ regret where $T=KH$ and $K$ is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information theoretic lower bound up to logarithmic factors. We also show that UCB-ADVANTAGE achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019].

ICRA Conference 2020 Conference Paper

Pose-guided Auto-Encoder and Feature-Based Refinement for 6-DoF Object Pose Regression

  • Zhigang Li 0004
  • Xiangyang Ji

Accurately estimating the 6-DoF object pose from a single RGB image is a challenging task in computer vision. Though pose regression approaches have achieved great progress, their performance is still limited. In this work, we propose Pose-guided Auto-Encoder (PAE), which can distill better pose-related features from the image by utilizing a suitable pose representation, 3D Location Field (3DLF), to guide the encoding process. The features from PAE show strong robustness to pose-irrelevant factors. Compared with a traditional auto-encoder, PAE can not only improve the pose estimation performance but also handle the ambiguous-viewpoint problem. Further, we propose Feature-based Pose Refiner (FPR), which refines the pose from the extracted features without rendering. Combining PAE with FPR, our approach achieved state-of-the-art performance on the widely used LINEMOD dataset. Our approach not only outperforms the direct regression-based approaches by a large margin but also surpasses the current state-of-the-art indirect PnP-based approach.

NeurIPS Conference 2019 Conference Paper

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

  • Zihan Zhang
  • Xiangyang Ji

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle which is able to efficiently solve Reinforcement Learning (RL) problems modeled by a Markov decision process (MDP) with finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored.} for an MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^{*}$, i.e., $sp(h^{*})$, is known. This result outperforms the best previous regret bound $\tilde{O}(HS\sqrt{AT})$\cite{bartlett2009regal} by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SATH})$\cite{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{DSAT})$ for MDPs with finite diameter $D$ compared to the lower bound of $\Omega(\sqrt{DSAT})$\cite{jaksch2010near}.
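The stated $\sqrt{SH}$ improvement factor can be checked by dividing the previous bound by the new one. The short derivation below is a worked verification using only the symbols in the abstract, with logarithmic factors ignored.

```latex
\[
\frac{HS\sqrt{AT}}{\sqrt{SATH}}
= \frac{HS\sqrt{AT}}{\sqrt{SH}\,\sqrt{AT}}
= \frac{SH}{\sqrt{SH}}
= \sqrt{SH}.
\]
```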