Arrow Research search

Author name cluster

Lin Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers
2 author rows

Possible papers

70

JBHI Journal 2026 Journal Article

An Automatic 3D PET Tumor Segmentation Framework Assisted by Geodesic Sequences

  • Lin Yang
  • Dan Shao
  • Chuanli Cheng
  • Chao Zou
  • Zhenxing Huang
  • Hairong Zheng
  • Dong Liang
  • Zhi-Feng Pang

Positron Emission Tomography (PET) images reflect the metabolic rate of tracers in different tissues of the human body, which is crucial for early cancer diagnosis and treatment. Accurate tumor segmentation is essential to aid clinicians in determining drug dosages. Due to the low resolution of PET images, prior information (such as CT, MRI, or distance information) is often incorporated to assist PET segmentation. In this paper, we propose an automatic 3D PET tumor segmentation framework assisted by geodesic sequences. Specifically, considering the intrinsic characteristics of PET images, we first construct a geodesic prior, which effectively enhances the contrast between the tumor and background while suppressing noise and the influence of other tissues. To address the need for seed points in the geodesic prior, an automatic marking strategy is designed that identifies all suspected lesion regions and uses their central points as a series of seeds to generate the corresponding geodesic sequences. Subsequently, we develop a three-branch network architecture to simultaneously process PET images, geodesic sequences, and background geodesic information. To enhance image features, a distance attention mechanism is introduced at the end of the network encoder to effectively measure the similarity between different geodesic features, refining the image features. Finally, the network incorporates spatial regularization and local PET intensity information into the activation function via the Soft Threshold Dynamics with Local Intensity Fitting (STDLIF) module, further improving segmentation accuracy. Experimental results demonstrate that, compared to existing state-of-the-art algorithms, the proposed method shows better segmentation performance on both clinical and public datasets.
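The geodesic-prior idea in this abstract can be illustrated with a toy sketch (my own illustration, not the paper's implementation): on a 2D image grid, a geodesic distance from a seed point charges extra cost for crossing intensity edges, so small distances trace out regions that are homogeneous with the seed.

```python
import heapq
import numpy as np

def geodesic_map(intensity: np.ndarray, seed: tuple) -> np.ndarray:
    """Geodesic distance on a 2D image: stepping between 4-neighbours costs
    1 + |intensity difference|, so the distance grows fast across edges."""
    h, w = intensity.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    pq = [(0.0, seed)]
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if d > dist[i, j]:
            continue  # stale queue entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:
                nd = d + 1.0 + abs(intensity[i, j] - intensity[ni, nj])
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(pq, (nd, (ni, nj)))
    return dist

img = np.zeros((5, 5))
img[2, 2] = 5.0                      # a bright "lesion" pixel as the seed
d = geodesic_map(img, seed=(2, 2))
```

Leaving the bright seed costs 1 + 5 = 6, after which moves inside the flat background cost 1 each, so the distance map separates the lesion from surrounding tissue.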

AAAI Conference 2026 Conference Paper

A²Flow: Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators

  • Mingming Zhao
  • Xiaokang Wei
  • Yuanqi Shao
  • Kaiwen Zhou
  • Lin Yang
  • Siwei Rao
  • Junhui Zhan
  • Zhitang Chen

Large language models (LLMs) have shown strong potential in automating the design of agentic workflows. However, existing methods still rely heavily on manually predefined operators, limiting generalization and scalability. To address this issue, we propose A²Flow, a fully automated framework for agentic workflow generation based on self-adaptive abstraction operators. A²Flow employs a three-stage operator extraction process: 1) Case-based Initial Operator Generation: leveraging expert demonstrations and LLM reasoning to generate case-specific operators; 2) Operator Clustering and Preliminary Abstraction: grouping similar operators across tasks to form preliminary abstractions; and 3) Deep Extraction for Abstract Execution Operators: applying long chain-of-thought prompting and multi-path reasoning to derive compact and generalizable execution operators. These operators serve as reusable building blocks for workflow construction without manual predefinition. Furthermore, we enhance node-level workflow search with an operator memory mechanism, which retains historical outputs to enrich context and improve decision-making. Experiments on general and embodied benchmarks show that A²Flow achieves average performance improvements of 2.4% and 19.3%, respectively, and reduces resource usage by 37% over state-of-the-art baselines.

AAAI Conference 2026 Conference Paper

MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology

  • Jingxiong Li
  • Chenglu Zhu
  • Sunyi Zheng
  • Yuxuan Sun
  • Yifei Wang
  • He Liu
  • Yunlong Zhang
  • Yixuan Si

We present MIRA (Multimodal Interventional RAdiology evaluation), a comprehensive benchmark for evaluating large multimodal models in expert-level interventional radiology tasks requiring specialized domain knowledge and advanced visual reasoning capabilities. Unlike existing medical benchmarks that primarily provide binary labels without contextual depth, MIRA offers diverse question formats, including open-ended, closed-ended, single-choice, and multiple-choice categories, each accompanied by detailed expert-validated explanations. The benchmark incorporates approximately 184K high-quality medical images spanning multiple imaging modalities with 1.2M meticulously generated question-answer pairs across various anatomical regions. These pairs were created through a sophisticated cascade methodology involving expert interventional radiologists at both the data collection and validation stages. Our comprehensive evaluation, encompassing zero-shot testing and fine-tuning experiments of large multimodal models, reveals significant performance gaps between AI systems and human specialists. Fine-tuning experiments demonstrate substantial improvements, with models achieving up to 0.80 accuracy on single-choice questions. MIRA establishes a challenging benchmark that suggests promising directions for developing specialized clinical AI systems for interventional radiology.

AAAI Conference 2026 Conference Paper

Towards Effective and Efficient Context-aware Nucleus Detection in Histopathology Whole Slide Images

  • Zhongyi Shui
  • Honglin Li
  • Yunlong Zhang
  • Yuxuan Sun
  • Yiwen Ye
  • Pingyi Chen
  • Ruizhe Guo
  • Lei Cui

Nucleus detection in histopathology whole slide images (WSIs) is crucial for a broad spectrum of clinical applications. The gigapixel size of WSIs necessitates the use of sliding window methodology for nucleus detection. However, mainstream methods process each sliding window independently, which overlooks broader contextual information and easily leads to inaccurate predictions. To address this limitation, recent studies additionally crop a large Field-of-View (LFoV) patch centered on each sliding window to extract contextual features. However, such methods substantially increase whole-slide inference latency. In this work, we propose an effective and efficient context-aware nucleus detection approach. Specifically, instead of using LFoV patches, we aggregate contextual clues from off-the-shelf features of historically visited sliding windows, which greatly enhances the inference efficiency. Moreover, compared to LFoV patches used in previous works, the sliding window patches have higher magnification and provide finer-grained tissue details, thereby enhancing the classification accuracy. To develop the proposed context-aware model, we utilize annotated patches along with their surrounding unlabeled patches for training. Beyond exploiting high-level tissue context from these surrounding regions, we design a post-training strategy that leverages abundant unlabeled nucleus samples within them to enhance the model's context adaptability. Extensive experimental results on three challenging benchmarks demonstrate the superiority of our method.

NeurIPS Conference 2025 Conference Paper

Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining

  • Haochen Zhang
  • Junze Yin
  • Guanchu Wang
  • Zirui Liu
  • Lin Yang
  • Tianyi Zhang
  • Anshumali Shrivastava
  • Vladimir Braverman

Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is selecting suitable subspaces to ensure an effective optimization trajectory. Most existing approaches select the dominant subspace to preserve gradient information, as this intuitively provides the best approximation. However, we find that in practice, the dominant subspace stops changing during pretraining, thereby constraining weight updates to similar subspaces. In this paper, we propose importance sampling for low-rank optimization in LLM pretraining with a provable convergence guarantee, which the dominant subspace approach does not have. Empirically, we demonstrate that our method significantly outperforms previous methods in LLM pretraining tasks.
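For context, the gradient projection that these low-rank optimizers share can be sketched in a few lines of numpy. This illustrates the generic dominant-subspace projection the abstract critiques, not the paper's importance-sampling scheme; the function name `project_gradient` is mine.

```python
import numpy as np

def project_gradient(grad: np.ndarray, rank: int):
    """Project `grad` onto its top-`rank` left singular subspace.

    Returns the basis P (d x rank) and the compressed gradient P^T grad
    (rank x n); optimizer states kept in this small space save memory.
    """
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]            # dominant subspace basis
    return P, P.T @ grad       # low-rank representation

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))        # a toy gradient matrix
P, G_low = project_gradient(G, rank=4)
G_back = P @ G_low                       # lift back to full dimension
```

The paper's point is that always choosing this dominant subspace lets it freeze during pretraining; sampling subspaces (e.g., with probabilities tied to singular values) keeps updates exploring while retaining a convergence guarantee.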

NeurIPS Conference 2025 Conference Paper

Combinatorial Ski Rental Problem: Robust and Learning-Augmented Algorithms

  • Ziwei Li
  • Bo Sun
  • Zhiqiu Zhang
  • Mohammad Hajiesmaili
  • Binghan Wu
  • Lin Yang
  • Yang Gao

We introduce and study the Combinatorial Ski Rental (CSR) problem, which involves multiple items that can be rented or purchased, either individually or in combination. At each time step, a decision-maker must make an irrevocable buy-or-rent decision for items that have not yet been purchased, without knowing the end of the time horizon. We propose a randomized online algorithm, Sorted Optimal Amortized Cost (SOAC), that achieves the optimal competitive ratio. Moreover, SOAC can be extended to address various well-known ski rental variants, including the multi-slope, multi-shop, multi-commodity ski rental and CSR with upgrading problems. Building on the proposed SOAC algorithm, we further develop a learning-augmented algorithm that leverages machine-learned predictions to improve the performance of CSR. This algorithm is capable of recovering or improving upon existing results of learning-augmented algorithms in both the classic ski rental and multi-shop ski rental problems. Experimental results validate our theoretical analysis and demonstrate the advantages of our algorithms over baseline methods for ski rental problems.
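As background, the classic single-item ski rental problem that CSR generalizes has a well-known deterministic break-even strategy: rent until the rent paid would equal the purchase price $B$, then buy. It is $(2 - 1/B)$-competitive. This is a minimal sketch of that classic rule, not the paper's SOAC algorithm.

```python
def break_even_cost(B: int, T: int) -> int:
    """Online cost of the break-even rule: rent (1/day) on days 1..B-1,
    buy (cost B) on day B if the season lasts that long."""
    if T < B:
        return T              # season ended before the buy day: pure rent
    return (B - 1) + B        # rented B-1 days, then bought

def opt_cost(B: int, T: int) -> int:
    """Offline optimum knows the horizon T: rent throughout or buy on day 1."""
    return min(T, B)

# The competitive ratio never exceeds 2 - 1/B, for any horizon T.
B = 10
for T in range(1, 100):
    assert break_even_cost(B, T) <= (2 - 1 / B) * opt_cost(B, T) + 1e-9
```

The combinatorial version replaces the single buy-or-rent decision with per-item (and bundled) decisions each step, which is what SOAC handles with the optimal competitive ratio.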

NeurIPS Conference 2025 Conference Paper

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

  • Yuxuan Sun
  • Yixuan Si
  • Chenglu Zhu
  • Kai Zhang
  • Zhongyi Shui
  • Bowen Ding
  • Tao Lin
  • Lin Yang

Recent advances in computational pathology have led to the emergence of numerous foundation models. These models typically rely on general-purpose encoders with multi-instance learning for whole slide image (WSI) classification or apply multimodal approaches to generate reports directly from images. However, these models cannot emulate the diagnostic approach of pathologists, who systematically examine slides at low magnification to obtain an overview before progressively zooming in on suspicious regions to formulate comprehensive diagnoses. Instead, existing models directly output final diagnoses without revealing the underlying reasoning process. To address this gap, we introduce CPathAgent, an innovative agent-based approach that mimics pathologists' diagnostic workflow by autonomously navigating across WSI through zoom-in/out and move operations based on observed visual features, thereby generating substantially more transparent and interpretable diagnostic summaries. To achieve this, we develop a multi-stage training strategy that unifies patch-level, region-level, and WSI-level capabilities within a single model, which is essential for replicating how pathologists understand and reason across diverse image scales. Additionally, we construct PathMMU-HR², the first expert-validated benchmark for large region analysis. This represents a critical intermediate scale between patches and whole slides, reflecting a key clinical reality where pathologists typically examine several key large regions rather than entire slides at once. Extensive experiments demonstrate that CPathAgent consistently outperforms existing approaches across benchmarks at three different image scales, validating the effectiveness of our agent-based diagnostic approach and highlighting a promising direction for computational pathology.

NeurIPS Conference 2025 Conference Paper

Deployment Efficient Reward-Free Exploration with Linear Function Approximation

  • Zihan Zhang
  • Yuxin Chen
  • Jason Lee
  • Simon Du
  • Lin Yang
  • Ruosong Wang

We study deployment-efficient reward-free exploration with linear function approximation, where the goal is to explore a linear Markov Decision Process (MDP) without revealing the reward function, while minimizing the number of distinct policies implemented during learning. By "deployment efficient", we mean algorithms that require few policies deployed during exploration -- crucial in real-world applications where such deployments are costly or disruptive. We design a novel reinforcement learning algorithm that achieves near-optimal deployment efficiency for linear MDPs in the reward-free setting, using at most $H$ exploration policies during execution (where $H$ is the horizon length), while maintaining sample complexity polynomial in feature dimension and horizon length. Unlike previous approaches with similar deployment efficiency guarantees, our algorithm's sample complexity is independent of the reachability or explorability coefficients of the underlying MDP, which can be arbitrarily small and lead to unbounded sample complexity in certain cases -- directly addressing an open problem from prior work. Our technical contributions include a data-dependent method for truncating state-action pairs in linear MDPs, efficient offline policy evaluation and optimization algorithms for these truncated MDPs, and a careful integration of these components to implement reward-free exploration with linear function approximation without sacrificing deployment efficiency.

NeurIPS Conference 2025 Conference Paper

Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks

  • Jingyuan Liu
  • Hao Qiu
  • Lin Yang
  • Mengfan Xu

We study the distributed multi-agent multi-armed bandit problem with heterogeneous rewards over random communication graphs. Uniquely, at each time step $t$ agents communicate over a time-varying random graph $\mathcal{G}_t$ generated by applying the Erdős–Rényi model to a fixed connected base graph $\mathcal{G}$ (for classical Erdős–Rényi graphs, $\mathcal{G}$ is a complete graph), where each potential edge in $\mathcal{G}$ is randomly and independently present with the link probability $p$. Notably, the resulting random graph is not necessarily connected at each time step. Each agent's arm rewards follow time-invariant distributions, and the reward distribution for the same arm may differ across agents. The goal is to minimize the cumulative expected regret relative to the global mean reward of each arm, defined as the average of that arm's mean rewards across all agents. To this end, we propose a fully distributed algorithm that integrates the arm elimination strategy with the random gossip algorithm. We theoretically show that the regret upper bound is of order $\log T$ and is highly interpretable, where $T$ is the time horizon. It includes the optimal centralized regret $\mathcal O\left(\sum_{k: \Delta_k>0} \frac{\log T}{\Delta_k}\right)$ and an additional term $\mathcal O\left(\frac{N^2 \log T}{p \lambda_{N-1}(\operatorname{Lap}(\mathcal{G}))} + \frac{KN^2 \log T}{p}\right)$ where $N$ and $K$ denote the total number of agents and arms, respectively. This term reflects the impact of $\mathcal G$'s algebraic connectivity $\lambda_{N-1}(\operatorname{Lap}(\mathcal{G}))$ and the link probability $p$, and thus highlights a fundamental trade-off between communication efficiency and regret. As a by-product, we show a nearly optimal regret lower bound. Finally, our numerical experiments not only show the superiority of our algorithm over existing benchmarks, but also validate the theoretical regret scaling with problem complexity.
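The communication model described above, where each edge of a fixed base graph appears independently with probability $p$ at every step, can be sketched in a few lines (illustrative only; the function name is mine):

```python
import random

def sample_random_graph(base_edges, p, rng=None):
    """Keep each edge of the base graph independently with probability p.

    This is the per-step graph G_t of the paper's model; for a complete
    base graph it reduces to a classical Erdos-Renyi sample.
    """
    rng = rng or random.Random(0)
    return [e for e in base_edges if rng.random() < p]

# Base graph: a 5-cycle. A sampled snapshot need not be connected,
# which is exactly the difficulty the gossip-based algorithm must handle.
base = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
snapshot = sample_random_graph(base, p=0.5)
```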

NeurIPS Conference 2025 Conference Paper

Federated Multi-armed Bandits with Efficient Bit-Level Communications

  • Haoran Zhang
  • Yang Xu
  • Xuchuang Wang
  • Hao-Xu Chen
  • Hao Qiu
  • Lin Yang
  • Yang Gao

In this work, we study the federated multi-armed bandit (FMAB) problem, where a set of distributed agents collaboratively aim to minimize cumulative regret while interacting with a shared set of arms. Unlike traditional centralized bandit models, agents in FMAB settings are connected via a communication graph and cannot share data freely due to bandwidth limitations or privacy constraints. This raises a fundamental challenge: how to achieve optimal learning performance under stringent communication budgets. We propose a novel communication-efficient algorithm that decouples the learning process into two phases: one for eliminating suboptimal arms through early and frequent communication of key decisions, and another for refining global estimates using buffered, quantized, and differentially transmitted statistics. By carefully balancing the communication frequency and precision of shared information, our algorithm achieves the optimal individual regret bound $O(N^{-1}\log T)$ while significantly reducing the total number of communication rounds and transmitted bits. Theoretically, we derive tight upper bounds on both individual cumulative regret and group regret, and prove that our method asymptotically matches the lower bound of regret in federated settings. Experimental results on synthetic data validate the effectiveness of the proposed approach in various graph topologies and under heterogeneous feedback.
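The bit-level trade-off between precision and transmitted bits can be made concrete with a uniform quantizer for a bounded statistic. This is a generic sketch, not the paper's protocol; the function names are mine.

```python
def quantize_to_bits(x: float, lo: float, hi: float, bits: int) -> int:
    """Encode x in [lo, hi] as a `bits`-bit integer level (what gets sent)."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)                  # clip to the agreed range
    return round((x - lo) / (hi - lo) * levels)

def dequantize(code: int, lo: float, hi: float, bits: int) -> float:
    """Recover the receiver-side estimate from the transmitted code."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

# An agent sends an empirical mean reward in [0, 1] using only 8 bits.
code = quantize_to_bits(0.437, 0.0, 1.0, bits=8)
est = dequantize(code, 0.0, 1.0, bits=8)
```

With $b$ bits the worst-case error is half a quantization step, $(hi - lo)/(2(2^b - 1))$, which is the kind of precision/budget knob the algorithm balances against communication frequency.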

NeurIPS Conference 2025 Conference Paper

LogicTree: Improving Complex Reasoning of LLMs via Instantiated Multi-step Synthetic Logical Data

  • Zehao Wang
  • Lin Yang
  • Jie Wang
  • Kehan Wang
  • Hanzhu Chen
  • Bin Wang
  • Jianye Hao
  • Defu Lian

Despite their remarkable performance on various tasks, Large Language Models (LLMs) still struggle with logical reasoning, particularly in complex and multi-step reasoning processes. Among various efforts to enhance LLMs' reasoning capabilities, synthesizing large-scale, high-quality logical reasoning datasets has emerged as a promising direction. However, existing methods often rely on predefined templates for logical reasoning data generation, limiting their adaptability to real-world scenarios. To address this limitation, we propose LogicTree, a novel framework for efficiently synthesizing multi-step logical reasoning datasets that excel in both complexity and instantiation. By iteratively searching for applicable logic rules based on structural pattern matching to perform backward deduction, LogicTree constructs multi-step logic trees that capture complex reasoning patterns. Furthermore, we employ a two-stage LLM-based approach to instantiate various real-world scenarios for each logic tree, generating consistent real-world reasoning processes that carry contextual significance. This helps LLMs develop generalizable logical reasoning abilities across diverse scenarios rather than merely memorizing templates. Experiments on multiple benchmarks demonstrate that our approach achieves an average improvement of 9.4% in accuracy on complex logical reasoning tasks.

NeurIPS Conference 2025 Conference Paper

Near-Optimal Sample Complexity for Online Constrained MDPs

  • Chang Liu
  • Yunfan Li
  • Lin Yang

Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints while optimizing performance. However, existing methods often suffer from significant safety violations or require a high sample complexity to generate near-optimal policies. We address two settings: relaxed feasibility, where small violations are allowed, and strict feasibility, where no violation is allowed. We propose a model-based primal-dual algorithm that balances regret and bounded constraint violations, drawing on techniques from online RL and constrained optimization. For relaxed feasibility, we prove that our algorithm returns an $\varepsilon$-optimal policy with $\varepsilon$-bounded violation with arbitrarily high probability, requiring $\tilde{O}\left(\frac{SAH^3}{\varepsilon^2}\right)$ learning episodes, matching the lower bound for unconstrained MDPs. For strict feasibility, we prove that our algorithm returns an $\varepsilon$-optimal policy with zero violation with arbitrarily high probability, requiring $\tilde{O}\left(\frac{SAH^5}{\varepsilon^2\zeta^2}\right)$ learning episodes, where $\zeta$ is the problem-dependent Slater constant characterizing the size of the feasible region. This result matches the lower bound for learning CMDPs with access to a generative model. Episodic tabular CMDPs serve as a crucial benchmark for safe RL, providing a structured environment for theoretical analysis and algorithmic validation. Our results demonstrate that learning CMDPs in an online setting is as easy as learning with a generative model and is no more challenging than learning unconstrained MDPs when small violations are allowed.

NeurIPS Conference 2025 Conference Paper

PathVQ: Reforming Computational Pathology Foundation Model for Whole Slide Image Analysis via Vector Quantization

  • Honglin Li
  • Zhongyi Shui
  • Yunlong Zhang
  • Chenglu Zhu
  • Lin Yang

Pathology whole slide image (WSI) analysis is vital for disease diagnosis and understanding. While foundation models (FMs) have driven recent advances, their scalability in pathology remains a key challenge. In particular, vision-language (VL) pathology FMs align visual features with language annotation for downstream tasks, but they rely heavily on large-scale image-text paired data, which is scarce and thus limits generalization. On the other hand, vision-only pathology FMs can leverage abundant unlabeled data via self-supervised learning (SSL). However, current approaches often use the [CLS] token from tile-level ViTs as slide-level input for efficiency (a tile of 224×224 pixels is composed of 196 patches of 16×16 pixels). This SSL-pretrained [CLS] token lacks alignment with downstream objectives, limiting effectiveness. We find that spatial patch tokens retain a wealth of informative features beneficial for downstream tasks, but utilizing all of them incurs up to 200× higher computation and storage costs compared to the [CLS] token alone (e.g., 196 tokens per ViT$_{224}$). This highlights a fundamental trade-off between efficiency and representational richness in building scalable pathology FMs. To address this, we propose a feature distillation framework via vector quantization (VQ) that compresses patch tokens into discrete indices and reconstructs them via a decoder, achieving 64× compression (1024 → 16 dimensions) while preserving fidelity. We further introduce a multi-scale VQ (MSVQ) strategy, which enhances reconstruction and provides SSL supervision for slide-level pretraining. Built upon MSVQ features and supervision signals, we design a progressive convolutional module and a slide-level SSL objective to learn spatially rich representations for downstream WSI tasks. Extensive experiments across multiple datasets demonstrate that our approach achieves state-of-the-art performance, offering a scalable and effective solution for high-performing pathology FMs in WSI analysis.
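The core vector-quantization step, replacing each continuous patch token with the index of its nearest codeword, can be sketched with numpy. This is an illustration under simplified assumptions: PathVQ's actual MSVQ uses learned codebooks and a decoder network, and the function names here are mine.

```python
import numpy as np

def vq_encode(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each token (n, d) to the index of its nearest codeword (k, d)."""
    # Pairwise squared L2 distances, shape (n, k).
    d2 = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def vq_decode(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruct tokens by codebook lookup."""
    return codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 16))          # 256 codewords: 1 byte/token
tokens = codebook[rng.integers(0, 256, size=196)]  # 196 patch tokens of a tile
idx = vq_encode(tokens, codebook)                  # 196 small integers
recon = vq_decode(idx, codebook)
```

Storing one index per token instead of a float vector is where the compression comes from; the trainable part of a VQ framework is making `codebook` (and the decoder) minimize the reconstruction error on real tokens.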

NeurIPS Conference 2025 Conference Paper

SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models

  • Pingyi Chen
  • Yujing Lou
  • Shen Cao
  • Jinhui Guo
  • Lubin Fan
  • Yue Wu
  • Lin Yang
  • Lizhuang Ma

While vision language models (VLMs) excel in 2D semantic visual understanding, their ability to quantitatively reason about 3D spatial relationships remains underexplored due to the deficient spatial representation ability of 2D images. In this paper, we analyze the problem hindering VLMs' spatial understanding abilities and propose SD-VLM, a novel framework that significantly enhances fundamental spatial perception abilities of VLMs through two key contributions: (1) proposing the Massive Spatial Measuring and Understanding (MSMU) dataset with precise spatial annotations, and (2) introducing a simple depth positional encoding method that strengthens VLMs' spatial awareness. The MSMU dataset includes massive quantitative spatial tasks with 700K QA pairs, 2.5M physical numerical annotations, and 10K chain-of-thought augmented samples. We have trained SD-VLM, a strong generalist VLM which shows superior quantitative spatial measuring and understanding capability. SD-VLM not only achieves state-of-the-art performance on our proposed MSMU-Bench, but also shows spatial generalization abilities on other spatial understanding benchmarks including Q-Spatial and SpatialRGPTBench. Extensive experiments demonstrate that SD-VLM outperforms GPT-4o and Intern-VL3-78B by 26.91% and 25.56% respectively on MSMU-Bench. Code and models are released at https://github.com/cpystan/SD-VLM.

NeurIPS Conference 2025 Conference Paper

Trajectory Graph Learning: Aligning with Long Trajectories in Reinforcement Learning Without Reward Design

  • Yunfan Li
  • Eric Liu
  • Lin Yang

Reinforcement learning (RL) often relies on manually designed reward functions, which are difficult to specify and can lead to issues such as reward hacking and suboptimal behavior. Alternatives like inverse RL and preference-based RL attempt to infer surrogate rewards from demonstrations or preferences but suffer from ambiguity and distribution mismatch. A more direct approach, inspired by imitation learning, avoids reward modeling by leveraging expert demonstrations. However, most existing methods align actions only at individual states, failing to capture the coherence of long-horizon trajectories. In this work, we study the problem of directly aligning policies with expert-labeled trajectories to preserve long-horizon behavior without relying on reward signals. Specifically, we aim to learn a policy that maximizes the probability of generating the expert trajectories. Nevertheless, we prove that, in its general form, this trajectory alignment problem is NP-complete. To address this, we propose Trajectory Graph Learning (TGL), a framework that leverages structural assumptions commonly satisfied in practice—such as bounded realizability of expert trajectories or a tree-structured MDP. These enable a graph-based policy planning algorithm that computes optimal policies in polynomial time under known dynamics. For settings with unknown dynamics, we develop a sample-efficient algorithm based on UCB-style exploration and establish sub-linear regret. Experiments on grid-world tasks demonstrate that TGL substantially outperforms standard imitation learning methods for long-trajectory planning.

ICML Conference 2025 Conference Paper

Transfer Q-Learning with Composite MDP Structures

  • Jinhang Chai
  • Elynn Y. Chen
  • Lin Yang

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but maintain individual variations. We introduce UCB-TQL (Upper Confidence Bound Transfer Q-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When applying UCB-TQL to a target task after training on a source task with sufficient trajectories, we achieve a regret bound of $\tilde{\mathcal{O}}(\sqrt{eH^5N})$ that scales independently of the ambient dimension. Here, $N$ represents the number of trajectories in the target task, while $e$ quantifies the sparse differences between tasks. This result demonstrates substantial improvement over single task RL by effectively leveraging their structural similarities. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations.

NeurIPS Conference 2025 Conference Paper

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

  • Qinkai XU
  • Yijin Liu
  • Lin Yang
  • Li Li
  • Yuxiang Fu

Diffusion Transformers (DiTs) have recently demonstrated remarkable performance in visual generation tasks, surpassing traditional U-Net-based diffusion models by significantly improving image and video generation quality and scalability. However, the large model size and iterative denoising process introduce substantial computational and memory overhead, limiting their deployment in real-world applications. Post-training quantization (PTQ) is a promising solution that compresses models and accelerates inference by converting weights and activations to low-bit representations. Despite its potential, PTQ faces significant challenges when applied to DiTs, often resulting in severe degradation of generative quality. To address these issues, we propose VETA-DiT (**V**ariance-**E**qualized and **T**emporal **A**daptation for **Di**ffusion **T**ransformers), a dedicated quantization framework for DiTs. Our method first analyzes the sources of quantization error from the perspective of inter-channel variance and introduces a Karhunen–Loève Transform enhanced alignment to equalize variance across channels, facilitating effective quantization under low bit-widths. Furthermore, to handle the temporal variation of activation distributions inherent in the iterative denoising steps of DiTs, we design an incoherence-aware adaptive method that identifies and properly calibrates timesteps with high quantization difficulty. We validate VETA-DiT on extensive image and video generation tasks, preserving acceptable visual quality under the more aggressive W4A4 configuration. Specifically, VETA-DiT reduces FID by 33.65 on the DiT-XL/2 model and by 45.76 on the PixArt-$\Sigma$ model compared to the baseline under W4A4, demonstrating its strong quantization capability and generative performance. Code is available at: https://github.com/xululi0223/VETA-DiT.
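The inter-channel variance problem that motivates variance equalization is easy to reproduce with a plain symmetric uniform quantizer: a single per-tensor scale is dominated by high-variance channels, while per-channel scales absorb the gap. This is a generic PTQ sketch, not VETA-DiT's KLT-based alignment; the function name is mine.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int = 4, axis=None) -> np.ndarray:
    """Symmetric uniform fake-quantization (quantize then dequantize).

    axis=None uses one scale for the whole tensor; axis=0 uses one scale
    per channel (column).
    """
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4-bit
    scale = np.abs(x).max(axis=axis, keepdims=True) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # integer grid
    return q * scale                                    # dequantized values

rng = np.random.default_rng(0)
# Two channels with very different variance: a classic PTQ failure mode.
x = np.stack([rng.standard_normal(1000) * 0.01,
              rng.standard_normal(1000) * 10.0], axis=1)
err_tensor = np.mean((x - quantize_symmetric(x, axis=None)) ** 2)
err_channel = np.mean((x - quantize_symmetric(x, axis=0)) ** 2)
```

With a per-tensor scale the low-variance channel collapses to zero at 4 bits; equalizing variance across channels before quantization (VETA-DiT's approach) attacks the same mismatch without needing a scale per channel.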

JBHI Journal 2024 Journal Article

Accurate Whole-Brain Image Enhancement for Low-Dose Integrated PET/MR Imaging Through Spatial Brain Transformation

  • Zhenxing Huang
  • Wenbo Li
  • Yaping Wu
  • Lin Yang
  • Yun Dong
  • Yongfeng Yang
  • Hairong Zheng
  • Dong Liang

Positron emission tomography/magnetic resonance imaging (PET/MRI) systems can provide precise anatomical and functional information with exceptional sensitivity and accuracy for neurological disorder detection. Nevertheless, the radiation exposure risks and economic costs of radiopharmaceuticals may pose significant burdens on patients. To mitigate image quality degradation during low-dose PET imaging, we proposed a novel 3D network equipped with a spatial brain transform (SBF) module for low-dose whole-brain PET and MR images to synthesize high-quality PET images. The FreeSurfer toolkit was applied to derive the spatial brain anatomical alignment information, which was then fused with low-dose PET and MR features through the SBF module. Moreover, several deep learning methods were employed as comparison measures to evaluate the model performance, with the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and Pearson correlation coefficient (PCC) serving as quantitative metrics. Both the visual results and quantitative results illustrated the effectiveness of our approach. The obtained PSNR and SSIM were $41.96 \pm 4.91$ dB and $0.9654 \pm 0.0215$ (p < 0.01), a 19% and 20% improvement, respectively, compared to the original low-dose brain PET images. The volume of interest (VOI) analysis of brain regions such as the left thalamus (PCC = 0.959) also showed that the proposed method could achieve a more accurate standardized uptake value (SUV) distribution while preserving the details of brain structures. In future works, we hope to apply our method to other multimodal systems, such as PET/CT, to assist clinical brain disease diagnosis and treatment.

JBHI Journal 2024 Journal Article

Combination of Channel Reordering Strategy and Dual CNN-LSTM for Epileptic Seizure Prediction Using Three iEEG Datasets

  • Xiaoshuang Wang
  • Ziheng Gao
  • Meiyan Zhang
  • Ying Wang
  • Lin Yang
  • Jianwen Lin
  • Tommi Kärkkäinen
  • Fengyu Cong

Objective: Intracranial electroencephalogram (iEEG) signals are generally recorded using multiple channels, and channel selection is therefore an important step in iEEG-based seizure prediction. For n channels, $2^n - 1$ channel cases can be generated for selection. However, by this means, an increase in n causes an exponential increase in computational consumption, which may result in a failure of channel selection when n is too large. Hence, it is necessary to explore reasonable channel selection strategies under the premise of controlling computational consumption and ensuring high classification accuracy. Given this, we propose a novel channel reordering strategy combined with a dual CNN-LSTM for effectively predicting seizures. Method: First, for each patient with n channels, interictal and preictal iEEG samples from each single channel are input into the CNN-LSTM model for classification. Then, the F1-score of each single channel is calculated, and the channels are reordered in descending order of their F1-scores (channel reordering strategy). Next, iEEG signals with an increasing number of channels are successively fed into the CNN-LSTM model for classification again. Finally, according to the classification results from the n channel cases, the channel case with the highest classification rate is selected. Results: Our method is evaluated on three iEEG datasets: the Freiburg, the SWEC-ETHZ and the American Epilepsy Society Seizure Prediction Challenge (AES-SPC). At the event-based level, sensitivities of 100%, 100% and 90.5%, and false prediction rates (FPRs) of 0.10/h, 0/h and 0.47/h, are achieved for the three datasets, respectively. Moreover, compared to an unspecific random predictor, our method also shows better performance for all patients and dogs from the three datasets. At the segment-based level, sensitivities-specificities-accuracies-AUCs of 88.1%–94.0%–93.5%–0.9101, 99.1%–99.7%–99.6%–0.9935, and 69.2%–79.9%–78.2%–0.7373 are attained for the three datasets, respectively. Conclusion: Our method can effectively predict seizures and address the challenge of an excessive number of channels during channel selection.
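The key point of the reordering strategy is that it turns an exponential search over $2^n - 1$ channel subsets into $n$ evaluations of growing channel prefixes. A minimal sketch, with a stand-in scorer in place of the authors' CNN-LSTM (all function names here are illustrative, not from the paper):

```python
def reorder_channels(per_channel_f1):
    """Return channel indices sorted by descending single-channel F1-score."""
    return sorted(range(len(per_channel_f1)),
                  key=lambda c: per_channel_f1[c], reverse=True)

def select_channel_subset(per_channel_f1, evaluate_channels):
    """Grow the channel set in reordered order and keep the best-scoring
    prefix: n evaluations instead of 2^n - 1 subset evaluations."""
    order = reorder_channels(per_channel_f1)
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(order) + 1):
        subset = order[:k]
        score = evaluate_channels(subset)  # stand-in for the CNN-LSTM run
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```

With a toy scorer such as the mean single-channel F1 of the subset, the loop evaluates each prefix once and returns the best one.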

AAAI Conference 2024 Conference Paper

DPA-P2PNet: Deformable Proposal-Aware P2PNet for Accurate Point-Based Cell Detection

  • Zhongyi Shui
  • Sunyi Zheng
  • Chenglu Zhu
  • Shichuan Zhang
  • Xiaoxuan Yu
  • Honglin Li
  • Jingxiong Li
  • Pingyi Chen

Point-based cell detection (PCD), which pursues high-performance cell sensing under low-cost data annotation, has garnered increased attention in the computational pathology community. Unlike mainstream PCD methods that rely on intermediate density map representations, the Point-to-Point network (P2PNet) has recently emerged as an end-to-end solution for PCD, demonstrating impressive cell detection accuracy and efficiency. Nevertheless, P2PNet is limited to decoding from a single-level feature map due to the scale-agnostic property of point proposals, which is insufficient to leverage multi-scale information. Moreover, the spatial distribution of pre-set point proposals is biased from that of cells, leading to inaccurate cell localization. To address these limitations, we present DPA-P2PNet in this work. The proposed method directly extracts multi-scale features for decoding according to the coordinates of point proposals on hierarchical feature maps. On this basis, we further devise deformable point proposals to mitigate the positional bias between proposals and potential cells to promote cell localization. Inspired by practical pathological diagnosis that usually combines high-level tissue structure and low-level cell morphology for accurate cell classification, we propose a multi-field-of-view (mFoV) variant of DPA-P2PNet to accommodate additional large FoV images with tissue information as model input. Finally, we execute the first self-supervised pre-training on immunohistochemistry histopathology image data and evaluate the suitability of four representative self-supervised methods on the PCD task. Experimental results on three benchmarks and a large-scale and real-world interval dataset demonstrate the superiority of our proposed models over the state-of-the-art counterparts. Codes and pre-trained weights are available at https://github.com/windygoo/DPA-P2PNet.
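The multi-scale decoding idea, gathering a feature vector for each point proposal from every level of a feature pyramid, can be sketched as follows. This toy uses nearest-neighbor gathering and omits the paper's deformable offsets and the surrounding network; all names and the normalized-coordinate convention are assumptions for illustration:

```python
import numpy as np

def sample_point_features(feature_maps, points):
    """For each point proposal (x, y) in [0, 1]^2, gather the feature vector
    at the nearest spatial location of every pyramid level, then concatenate
    across levels into one multi-scale descriptor per point."""
    feats = []
    for fmap in feature_maps:          # each fmap: (H, W, C)
        h, w, _ = fmap.shape
        rows = np.clip((points[:, 1] * h).astype(int), 0, h - 1)
        cols = np.clip((points[:, 0] * w).astype(int), 0, w - 1)
        feats.append(fmap[rows, cols])  # advanced indexing: (N, C)
    return np.concatenate(feats, axis=-1)  # (N, sum of all C)
```

A real decoder would use bilinear interpolation rather than nearest-neighbor lookup, and the deformable variant would add learned per-proposal coordinate offsets before sampling.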

AAAI Conference 2024 Conference Paper

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

  • Yuxuan Sun
  • Chenglu Zhu
  • Sunyi Zheng
  • Kai Zhang
  • Lin Sun
  • Zhongyi Shui
  • Yunlong Zhang
  • Honglin Li

As advances in large language models (LLMs) and multimodal techniques continue to mature, the development of general-purpose multimodal large language models (MLLMs) has surged, offering significant applications in interpreting natural images. However, the field of pathology has largely remained untapped, particularly in gathering high-quality data and designing comprehensive model frameworks. To bridge the gap in pathology MLLMs, we present PathAsst, a multimodal generative foundation AI assistant to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities. Firstly, we collect over 207K high-quality pathology image-text pairs from authoritative sources. Leveraging the advanced power of ChatGPT, we generate over 180K instruction-following samples. Furthermore, we devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models we prepared, allowing PathAsst to effectively collaborate with these models, enhancing its diagnostic ability. Secondly, by leveraging the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to enhance PathAsst's capabilities in interpreting pathology images. Finally, we integrate PathCLIP with the Vicuna-13b and utilize pathology-specific instruction-tuning data to enhance the multimodal generation capacity of PathAsst and bolster its synergistic interactions with sub-models. The experimental results of PathAsst show the potential of harnessing an AI-powered generative foundation model to improve pathology diagnosis and treatment processes. We open-source our dataset, as well as a comprehensive toolkit for extensive pathology data collection and preprocessing at https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology.

NeurIPS Conference 2024 Conference Paper

Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis

  • Honglin Li
  • Yunlong Zhang
  • Pingyi Chen
  • Zhongyi Shui
  • Chenglu Zhu
  • Lin Yang

Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors. To develop computer-aided diagnosis models for histopathology WSIs, previous methods typically employ Multi-Instance Learning to enable slide-level prediction given only slide-level labels. Among these models, vanilla attention mechanisms without pairwise interactions have traditionally been employed but are unable to model contextual information. More recently, self-attention models have been utilized to address this issue. To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs Nyströmformer as an approximation of full self-attention. Both approaches suffer from suboptimal performance due to the loss of key information. Moreover, their use of absolute positional embeddings struggles to effectively handle long contextual dependencies in shape-varying WSIs. In this paper, we first analyze how the low-rank nature of the long-sequence attention matrix constrains the representation ability of WSI modelling. Then, we demonstrate that the rank of the attention matrix can be improved by focusing on local interactions via a local attention mask. Our analysis shows that the local mask aligns with the attention patterns in the lower layers of the Transformer. Furthermore, the local attention mask can be implemented during chunked attention calculation, reducing the quadratic computational complexity to linear with a small local bandwidth. Additionally, this locality helps the model generalize to unseen or under-fitted positions more easily. Building on this, we propose a local-global hybrid Transformer for both computational acceleration and local-global information interactions modelling.
Our method, Long-contextual MIL (LongMIL), is evaluated through extensive experiments on various WSI tasks to validate its superiority in: 1) overall performance, 2) memory usage and speed, and 3) extrapolation ability compared to previous methods.
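The local attention mask described above amounts to a banded boolean matrix: token $i$ may attend to token $j$ only when $|i - j| \le b$ for a small bandwidth $b$. A minimal illustration of the mask itself (the chunked linear-time computation and the hybrid global branch are omitted, and this is not the authors' implementation):

```python
import numpy as np

def local_attention_mask(seq_len, bandwidth):
    """Boolean (seq_len, seq_len) mask: True where attention is allowed,
    i.e., within `bandwidth` positions of the diagonal."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= bandwidth
```

In a chunked implementation, only blocks overlapping the band are materialized, so the cost scales as O(seq_len * bandwidth) rather than O(seq_len^2).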

NeurIPS Conference 2023 Conference Paper

Efficient Batched Algorithm for Contextual Linear Bandits with Large Action Space via Soft Elimination

  • Osama Hanna
  • Lin Yang
  • Christina Fragouli

In this paper, we provide the first efficient batched algorithm for contextual linear bandits with large action spaces. Unlike existing batched algorithms that rely on action elimination, which are not implementable for large action sets, our algorithm only uses a linear optimization oracle over the action set to design the policy. The proposed algorithm achieves a regret upper bound $\tilde{O}(\sqrt{T})$ with high probability, and uses $O(\log\log T)$ batches, matching the lower bound on the number of batches (Gao et al., 2019). When specialized to linear bandits, our algorithm can achieve a high probability gap-dependent regret bound of $\tilde{O}(1/\Delta_{\min})$ with the optimal $\log T$ number of batches, where $\Delta_{\min}$ is the minimum reward gap between a suboptimal arm and the optimal arm. Our result is achieved via a novel soft elimination approach, which entails "shaping" the action sets at each batch so that we can efficiently identify (near) optimal actions.

NeurIPS Conference 2023 Conference Paper

Efficient Robust Bayesian Optimization for Arbitrary Uncertain inputs

  • Lin Yang
  • Junlong Lyu
  • Wenlong Lyu
  • Zhitang Chen

Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nyström approximation. A rigorous theoretical regret bound is established under the MMD estimation error, and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
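The MMD underlying AIRBO's kernel can be estimated empirically from two sample sets. A minimal sketch with an RBF base kernel (the Nyström acceleration and the Gaussian Process integration are omitted, and names are illustrative):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel matrix between sample sets x: (n, d) and y: (m, d)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd_squared(X, Y, gamma=1.0):
    """Biased empirical estimate of squared MMD between the distributions
    that generated samples X and Y: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy
```

Identical sample sets give an MMD of zero, and well-separated sets give a strictly positive value, which is what lets a GP kernel built on MMD compare uncertain inputs represented by samples.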

ICML Conference 2023 Conference Paper

Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds

  • Shengshi Li
  • Lin Yang

Horizon dependence is an important difference between reinforcement learning and other machine learning paradigms. Yet, existing results tackling the (exact) horizon dependence either assume that the reward is bounded per step, introducing unfair comparison, or assume strict total boundedness that requires the sum of rewards to be bounded almost surely – allowing only restricted noise on the reward observation. This paper addresses these limitations by introducing a new relaxation – expected boundedness on rewards, where we allow the reward to be stochastic with only boundedness on the expected sum – opening the door to study horizon-dependence with a much broader set of reward functions with noises. We establish a novel generic algorithm that achieves no-horizon dependence in terms of sample complexity for both Markov Decision Processes (MDP) and Games, via reduction to a good-conditioned auxiliary Markovian environment, in which only “important” state-action pairs are preserved. The algorithm takes only $\tilde{O}(\frac{S^2A}{\epsilon^2})$ episodes interacting with such an environment to achieve an $\epsilon$-optimal policy/strategy (with high probability), improving on (Zhang, 2022) (which only applies to MDPs with deterministic rewards). Here $S$ is the number of states and $A$ is the number of actions, and the bound is independent of the horizon $H$.

ICML Conference 2023 Conference Paper

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

  • Yunfan Li
  • Yiran Wang
  • Yu Cheng
  • Lin Yang

Policy optimization methods are powerful algorithms in Reinforcement Learning (RL) for their flexibility to deal with policy parameterization and ability to handle model misspecification. However, these methods usually suffer from slow convergence rates and poor sample complexity. Hence it is important to design provably sample-efficient algorithms for policy optimization. Yet, recent advances on this problem have only been successful in the tabular and linear settings, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder-dimension and online sensitivity sampling, to design a low-switching sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation. We show that our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples, where $\varepsilon$ is the suboptimality gap and $d$ is a complexity measure of the function class approximating the policy. This drastically improves the previously best-known sample bound for policy optimization algorithms, $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^8})$. Moreover, we empirically test our theory with deep neural nets to show the benefits of the theoretical inspiration.

EWRL Workshop 2023 Workshop Paper

On-Demand Communication for Asynchronous Multi-Agent Bandits

  • Yu-Zhen Janice Chen
  • Lin Yang
  • Xuchuang Wang
  • Xutong Liu
  • Mohammad Hajiesmaili
  • John C. S. Lui
  • Don Towsley

This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate $\textit{asynchronously}$ -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a $K$-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose $\texttt{ODC}$, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. $\texttt{ODC}$ is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. $\texttt{ODC}$ is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate $\texttt{ODC}$ into the natural extensions of $\texttt{UCB}$ and $\texttt{AAE}$ algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.

NeurIPS Conference 2023 Conference Paper

Replicability in Reinforcement Learning

  • Amin Karbasi
  • Grigoris Velegkas
  • Lin Yang
  • Felix Zhou

We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions on i.i.d. samples drawn from the generator when its internal randomness is the same. We first provide an efficient $\rho$-replicable algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and time complexity $\widetilde O\left(\frac{N^3\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$, where $N$ is the number of state-action pairs. Next, for the subclass of deterministic algorithms, we provide a lower bound of order $\Omega\left(\frac{N^3}{(1-\gamma)^3\cdot\varepsilon^2\cdot\rho^2}\right)$. Then, we study a relaxed version of replicability proposed by Kalavasis et al. [2023] called TV indistinguishability. We design a computationally efficient TV indistinguishable algorithm for policy estimation whose sample complexity is $\widetilde O\left(\frac{N^2\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$. At the cost of $\exp(N)$ running time, we transform these TV indistinguishable algorithms to $\rho$-replicable ones without increasing their sample complexity. Finally, we introduce the notion of approximate-replicability where we only require that two outputted policies are close under an appropriate statistical divergence (e.g., Rényi) and show an improved sample complexity of $\widetilde O\left(\frac{N\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$.

NeurIPS Conference 2023 Conference Paper

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

  • Jiayi Huang
  • Han Zhong
  • Liwei Wang
  • Lin Yang

While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample- or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+\epsilon)$-th moments for some $\epsilon\in(0, 1]$. In this work, we address the challenge of such rewards in RL with linear function approximation. We first design an algorithm, \textsc{Heavy-OFUL}, for heavy-tailed linear bandits, achieving an \emph{instance-dependent} $T$-round regret of $\tilde{O}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, the \emph{first} of this kind. Here, $d$ is the feature dimension, and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at the $t$-th round. We further show the above bound is minimax optimal when applied to the worst-case instances in stochastic and deterministic linear bandits. We then extend this algorithm to the RL settings with linear function approximation. Our algorithm, termed \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^\frac{1}{1+\epsilon} + d \sqrt{H \mathcal{V}^* K})$. Here, $H$ is the length of the episode, and $\mathcal{U}^*, \mathcal{V}^*$ are instance-dependent quantities scaling with the central moment of reward and value functions, respectively. We also provide a matching minimax lower bound $\Omega(d H K^{\frac{1}{1+\epsilon}} + d \sqrt{H^3 K})$ to demonstrate the optimality of our algorithm in the worst case. Our result is achieved via a novel robust self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems.

AAAI Conference 2022 Conference Paper

Explainable Survival Analysis with Convolution-Involved Vision Transformer

  • Yifan Shen
  • Li Liu
  • Zhihao Tang
  • Zongyi Chen
  • Guixiang Ma
  • Jiyan Dong
  • Xi Zhang
  • Lin Yang

Image-based survival prediction models can facilitate doctors in diagnosing and treating cancer patients. With the advance of digital pathology technologies, the big whole slide images (WSIs) provide increased resolution and more details for diagnosis. However, the gigabyte-size or even terabyte-size WSIs would make most models computationally infeasible. To this end, instead of using the complete WSIs, most of the existing models only use a pre-selected subset of key patches or patch clusters as input, which might discard some important morphology information. In this work, we propose a novel survival analysis model to fully utilize the complete WSI information. We show that the use of a Vision Transformer (ViT) backbone, together with convolution operations involved in it, is an effective approach to improve the prediction performance. Additionally, we present a post-hoc explainable method to identify the most salient patches and distinct morphology features, making the model more faithful and the results easier to comprehend by human users. Evaluations on two large cancer datasets show that our proposed model is more effective and has better interpretability for survival prediction. We would make the code publicly available upon acceptance.

AAAI Conference 2022 Conference Paper

Hybrid Curriculum Learning for Emotion Recognition in Conversation

  • Lin Yang
  • YI Shen
  • Yue Mao
  • Longjun Cai

Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on “emotion shift” frequency within a conversation, then the conversations are scheduled in an “easy to hard” schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model’s ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
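The conversation-level difficulty measurer, emotion-shift frequency, has a simple direct form: count how often the emotion label changes between adjacent utterances. A minimal sketch; the dict-based data layout is an assumption for illustration, not the paper's format:

```python
def emotion_shift_difficulty(conversation):
    """Fraction of adjacent utterance pairs whose emotion label changes;
    conversations with frequent shifts are treated as harder."""
    labels = [utt["emotion"] for utt in conversation]
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)

def conversation_curriculum(conversations):
    """Schedule conversations easy-to-hard by emotion-shift frequency."""
    return sorted(conversations, key=emotion_shift_difficulty)
```

Training would then draw batches following this easy-to-hard order, gradually admitting the high-shift conversations.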

NeurIPS Conference 2022 Conference Paper

Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context

  • Osama Hanna
  • Lin Yang
  • Christina Fragouli

Contextual linear bandits is a rich and theoretically important model that has many practical applications. Recently, this setup has gained a lot of interest in wireless applications, where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider the distributed contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner who performs the learning while not seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses $\approx 5d$ bits per context for the case of unknown context distribution and $0$ bits per context if the context distribution is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of the horizon, while the latter achieves information theoretical tightness.

NeurIPS Conference 2022 Conference Paper

Near-Optimal Sample Complexity Bounds for Constrained MDPs

  • Sharan Vaswani
  • Lin Yang
  • Csaba Szepesvari

In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$, by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample-complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O} \left(\frac{S A \, \log(1/\delta)}{(1 - \gamma)^5 \, \epsilon^2 \zeta^2} \right)$ where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.

NeurIPS Conference 2022 Conference Paper

Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

  • Dingwen Kong
  • Lin Yang

An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires \emph{too much} feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. We provide an \emph{active-learning}-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of a task at some state-action pairs. After that, the algorithm guarantees to provide a nearly optimal policy for the task with high probability. We show that, even with the presence of random noise in the feedback, the algorithm only takes $\tilde{O}(H \dim_{R}^2)$ queries on the reward function to provide an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms must query the reward function for at least $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environmental transition.

NeurIPS Conference 2021 Conference Paper

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

  • Jingfeng Wu
  • Vladimir Braverman
  • Lin Yang

In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions. We consider two settings. In the online setting, the agent receives a (adversarial) preference every episode and proposes policies to interact with the environment. We provide a model-based algorithm that achieves a nearly minimax optimal regret bound $\widetilde{\mathcal{O}}\bigl(\sqrt{\min\{d, S\}\cdot H^2 SAK}\bigr)$, where $d$ is the number of objectives, $S$ is the number of states, $A$ is the number of actions, $H$ is the length of the horizon, and $K$ is the number of episodes. Furthermore, we consider preference-free exploration, i.e., the agent first interacts with the environment without specifying any preference and then is able to accommodate an arbitrary preference vector up to $\epsilon$ error. Our proposed algorithm is provably efficient with a nearly optimal trajectory complexity $\widetilde{\mathcal{O}}\bigl({\min\{d, S\}\cdot H^3 SA}/{\epsilon^2}\bigr)$. This result partly resolves an open problem raised by \citet{jin2020reward}.

NeurIPS Conference 2021 Conference Paper

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

  • Han Zhong
  • Jiayi Huang
  • Lin Yang
  • Liwei Wang

Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known when moments of the error can become non-existential: the random noise $\eta$ satisfies Pr$\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems: We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problems): the mean of medians estimator can be applied to nearly any bandit learning algorithm as black-box filtering for its reward signals, obtaining a similar regret bound as if the reward were sub-Gaussian. We show that the regret bound is near-optimal even with very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.
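The mean-of-medians estimator described above has a simple direct form: partition the observations into consecutive groups, take each group's empirical median, then average the medians. A hedged sketch (the grouping scheme and group size here are illustrative; in practice the group size would be tuned to the noise tail):

```python
import statistics

def mean_of_medians(samples, group_size):
    """Robust location estimate for heavy-tailed data: the empirical mean
    of per-group empirical medians. Outliers are absorbed by the medians,
    so a single extreme sample cannot dominate the final average."""
    usable = len(samples) - len(samples) % group_size  # drop partial tail
    groups = [samples[i:i + group_size] for i in range(0, usable, group_size)]
    medians = [statistics.median(g) for g in groups]
    return sum(medians) / len(medians)
```

For example, a single extreme value like 100 in one group only shifts that group's median, illustrating the black-box filtering role the estimator plays for reward signals.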

NeurIPS Conference 2021 Conference Paper

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

  • Lin Yang
  • Yu-Zhen Janice Chen
  • Stephen Pasteris
  • Mohammad Hajiesmaili
  • John C. S. Lui
  • Don Towsley

This paper studies a cooperative multi-armed bandit problem with $M$ agents cooperating together to solve the same instance of a $K$-armed stochastic bandit problem with the goal of maximizing the cumulative reward of agents. The agents are heterogeneous in (i) their limited access to a local subset of arms; and (ii) their decision-making rounds, i.e., agents are asynchronous with different decision-making gaps. The goal is to find the globally optimal arm; agents are able to pull any arm, but they observe the reward only when the selected arm is local. The challenge is a tradeoff for agents between pulling a local arm with the possibility of observing the feedback, or relying on the observations of other agents that might occur at different rates. Naive extensions of traditional algorithms lead to an arbitrarily poor regret as a function of aggregate action frequency of any $\textit{suboptimal}$ arm located in slow agents. We resolve this issue by proposing a novel two-stage learning algorithm, called $\texttt{CO-LCB}$ algorithm, whose regret is a function of aggregate action frequency of agents containing the $\textit{optimal}$ arm. We also show that the regret of $\texttt{CO-LCB}$ matches the regret lower bound up to a small factor.

NeurIPS Conference 2021 Conference Paper

On the Value of Interaction and Function Approximation in Imitation Learning

  • Nived Rajaraman
  • Yanjun Han
  • Lin Yang
  • Jingbo Liu
  • Jiantao Jiao
  • Kannan Ramchandran

We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. Rajaraman et al. (2020) show an information-theoretic lower bound that, in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, $H$. We study imitation learning under the $\mu$-recoverability assumption of Ross et al. (2011), which assumes that the difference in the $Q$-value under the expert policy across different actions in a state does not deviate beyond $\mu$ from the maximum. We show that the reduction proposed by Ross et al. (2010) is statistically optimal: the resulting algorithm, upon interacting with the MDP for $N$ episodes, attains a suboptimality bound of $\widetilde{\mathcal{O}} \left( \mu |\mathcal{S}| H / N \right)$, which we show is optimal up to log-factors. In contrast, we show that any algorithm which does not interact with the MDP and uses an offline dataset of $N$ expert trajectories must incur suboptimality growing as $\gtrsim |\mathcal{S}| H^2/N$ even under the $\mu$-recoverability assumption. This establishes a clear and provable separation of the minimax rates between the active setting and the no-interaction setting. We also study IL with linear function approximation. When the expert plays actions according to a linear classifier of known state-action features, we use the reduction to multi-class classification to show that, with high probability, the suboptimality of behavior cloning is $\widetilde{O}(dH^2/N)$ given $N$ rollouts from the optimal policy. This is optimal up to log-factors but can be improved to $\widetilde{O}(dH/N)$ if we have a linear expert with parameter-sharing across time steps.
In contrast, when the MDP transition structure is known to the learner such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, Mimic-MD (Rajaraman et al. (2020)) when extended to the function approximation setting. Here, we introduce a new problem called confidence set linear classification, that can be used to construct sample-efficient IL algorithms.

NeurIPS Conference 2020 Conference Paper

Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm

  • Lin Yang
  • Mohammad Hajiesmaili
  • Mohammad Sadegh Talebi
  • John C. S. Lui
  • Wing Shing Wong

This paper studies adversarial bandits with corruptions. In the basic adversarial bandit setting, the reward of arms is predetermined by an adversary who is oblivious to the learner’s policy. In this paper, we consider an extended setting in which an attacker sits in-between the environment and the learner, and is endowed with a limited budget to corrupt the reward of the selected arm. We have two main results. First, we derive a lower bound on the regret of any bandit algorithm that is aware of the budget of the attacker. Also, for budget-agnostic algorithms, we characterize an impossibility result demonstrating that even when the attacker has a sublinear budget, i.e., a budget growing sublinearly with the time horizon $T$, they fail to achieve a sublinear regret. Second, we propose ExpRb, a bandit algorithm that incorporates a biased estimator and a robustness parameter to deal with corruption. We characterize the regret of ExpRb as a function of the corruption budget and show that for the case of a known corruption budget, the regret of ExpRb is tight.

AAAI Conference 2020 Conference Paper

An Annotation Sparsification Strategy for 3D Medical Image Segmentation via Representative Selection and Self-Training

  • Hao Zheng
  • Yizhe Zhang
  • Lin Yang
  • Chaoli Wang
  • Danny Z. Chen

Image segmentation is critical to many medical applications. While deep learning (DL) methods continue to improve performance for many medical image segmentation tasks, data annotation is a big bottleneck to DL-based segmentation because (1) DL models tend to need a large amount of labeled data to train, and (2) it is highly time-consuming and labor-intensive to label 3D medical images voxel-wise. Significantly reducing annotation effort while attaining good performance of DL segmentation models remains a major challenge. In our preliminary experiments, we observe that, using partially labeled datasets, there is indeed a large performance gap with respect to using fully annotated training datasets. In this paper, we propose a new DL framework for reducing annotation effort and bridging the gap between full annotation and sparse annotation in 3D medical image segmentation. We achieve this by (i) selecting representative slices in 3D images that minimize data redundancy and save annotation effort, and (ii) self-training with pseudo-labels automatically generated from the base-models trained using the selected annotated slices. Extensive experiments using two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our framework yields competitive segmentation results compared with state-of-the-art DL methods using less than ∼20% of annotated data.

NeurIPS Conference 2020 Conference Paper

Is Long Horizon RL More Difficult Than Short Horizon RL?

  • Ruosong Wang
  • Simon S. Du
  • Lin Yang
  • Sham Kakade

Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the \emph{number of episodes} it takes to provably discover a policy whose value is within $\varepsilon$ of the optimal value, where the value is measured by the \emph{normalized} cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon --- a conjecture which is consistent with all known sample complexity upper bounds. This work refutes this conjecture, proving that tabular, episodic reinforcement learning is possible with a sample complexity that scales only \emph{logarithmically} with the planning horizon. In other words, when the values are appropriately normalized (to lie in the unit interval), this result shows that long-horizon RL is no more difficult than short-horizon RL, at least in a minimax sense. Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for near-optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class and enjoys a sample complexity that scales logarithmically with the cardinality of the given policy class. Both may be of independent interest.

NeurIPS Conference 2020 Conference Paper

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

  • Qiwen Cui
  • Lin Yang

It is believed that a model-based approach for reinforcement learning (RL) is the key to reducing sample complexity. However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers the sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an $\epsilon$-optimal policy in a $\gamma$-discounted MDP is $O(K/(1-\gamma)^3\epsilon^2)$, which depends only on the dimensionality $K$ of the feature space and has no dependence on the state or action space. We further extend our results to a relaxed setting where anchor-states may not exist, and show that a plug-in approach can be sample-efficient as well, providing a flexible way to design model-based algorithms for RL.
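The plug-in idea has a minimal tabular sketch (the paper's setting is feature-based, so this is only the skeleton): draw samples from a generative model to build an empirical MDP, then hand that model to an off-the-shelf planner, here value iteration. All names (`plug_in_policy`, `sample_next`) are our own, not from the paper.

```python
def plug_in_policy(states, actions, sample_next, reward, gamma,
                   n_samples=200, iters=500):
    """Plug-in approach: estimate an empirical MDP from a generative
    model, then plan in the empirical model with an off-the-shelf
    solver (here, value iteration)."""
    # 1. Empirical model: estimate P(s' | s, a) from n_samples draws each.
    p_hat = {}
    for s in states:
        for a in actions:
            counts = {}
            for _ in range(n_samples):
                nxt = sample_next(s, a)
                counts[nxt] = counts.get(nxt, 0) + 1
            p_hat[s, a] = {nxt: c / n_samples for nxt, c in counts.items()}

    def q_value(s, a, V):
        return reward(s, a) + gamma * sum(p * V[t] for t, p in p_hat[s, a].items())

    # 2. Plug-in solver: value iteration on the empirical model.
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(q_value(s, a, V) for a in actions) for s in states}
    return {s: max(actions, key=lambda a: q_value(s, a, V)) for s in states}

# Toy chain: action "go" moves to state 1, which is the only rewarding state.
states, actions = [0, 1], ["stay", "go"]
policy = plug_in_policy(states, actions,
                        sample_next=lambda s, a: 1 if a == "go" else s,
                        reward=lambda s, a: 1.0 if s == 1 else 0.0,
                        gamma=0.9)
```

On this two-state toy, the computed policy chooses "go" in state 0, heading toward the rewarding state.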

AAAI Conference 2020 Conference Paper

Loss-Based Attention for Deep Multiple Instance Learning

  • Xiaoshuang Shi
  • Fuyong Xing
  • Yuanpu Xie
  • Zizhao Zhang
  • Lei Cui
  • Lin Yang

Although attention mechanisms have been widely used in deep learning for many tasks, they are rarely utilized to solve multiple instance learning (MIL) problems, where only a general category label is given for the multiple instances contained in one bag. Additionally, previous deep MIL methods first utilize the attention mechanism to learn instance weights and then employ a fully connected layer to predict the bag label, so that the bag prediction is largely determined by the effectiveness of the learned instance weights. To alleviate this issue, in this paper, we propose a novel loss-based attention mechanism, which simultaneously learns instance weights, instance predictions, and bag predictions for deep multiple instance learning. Specifically, it calculates instance weights based on the loss function, e.g., softmax + cross-entropy, and shares its parameters with the fully connected layer that produces the instance and bag predictions. Additionally, a regularization term consisting of learned weights and cross-entropy functions is utilized to boost the recall of instances, and a consistency cost is used to smooth the training process of neural networks for boosting the model generalization performance. Extensive experiments on multiple types of benchmark databases demonstrate that the proposed attention mechanism is a general, effective and efficient framework, which can achieve superior bag and image classification performance over other state-of-the-art MIL methods, while obtaining higher instance precision and recall than previous attention mechanisms. Source code is available at https://github.com/xsshi2015/Loss-Attention.

TIST Journal 2020 Journal Article

Mining High-utility Temporal Patterns on Time Interval–based Data

  • Jun-Zhe Wang
  • Yi-Cheng Chen
  • Wen-Yueh Shih
  • Lin Yang
  • Yu-Shao Liu
  • Jiun-Long Huang

In this article, we propose a novel temporal pattern mining problem, named high-utility temporal pattern mining, to fulfill the needs of various applications. Different from classical temporal pattern mining aimed at discovering frequent temporal patterns, high-utility temporal pattern mining is to find each temporal pattern whose utility is greater than or equal to the minimum-utility threshold. To facilitate efficient high-utility temporal pattern mining, several extension and pruning strategies are proposed to reduce the search space. Algorithm HUTPMiner is then proposed to efficiently mine high-utility temporal patterns with the aid of the proposed extension and pruning strategies. Experimental results show that HUTPMiner is able to prune a large number of candidates, thereby achieving high mining efficiency.

NeurIPS Conference 2020 Conference Paper

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

  • Kaiqing Zhang
  • Sham Kakade
  • Tamer Basar
  • Lin Yang

Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intuitive and widely used, the sample complexity of model-based MARL algorithms has been investigated relatively little. In this paper, we aim to address the fundamental open question about the sample complexity of model-based MARL. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model of state transitions. We show that model-based MARL achieves a sample complexity of $\tilde{\mathcal{O}}(|\mathcal{S}||\mathcal{A}||\mathcal{B}|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) \emph{value} up to some $\epsilon$ error, and the $\epsilon$-NE \emph{policies}, where $\gamma$ is the discount factor, and $\mathcal{S}$, $\mathcal{A}$, $\mathcal{B}$ denote the state space and the action spaces of the two agents, respectively. We also show that this method is near-minimax optimal, with a tight dependence on $1-\gamma$ and $|\mathcal{S}|$, by providing a lower bound of $\Omega(|\mathcal{S}|(|\mathcal{A}|+|\mathcal{B}|)(1-\gamma)^{-3}\epsilon^{-2})$. Our results justify the efficiency of this simple model-based approach in the multi-agent RL setting.

NeurIPS Conference 2020 Conference Paper

On Reward-Free Reinforcement Learning with Linear Function Approximation

  • Ruosong Wang
  • Simon S. Du
  • Lin Yang
  • Russ R. Salakhutdinov

Reward-free reinforcement learning (RL) is a framework which is suitable both for the batch RL setting and for settings where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses the samples collected during the exploration phase to compute a near-optimal policy. Jin et al. [2020] showed that in the tabular setting, the agent only needs to collect a polynomial number of samples (in terms of the number of states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization. In this work, we give both positive and negative results for reward-free RL with linear function approximation. We give an algorithm for reward-free RL in the linear Markov decision process setting where both the transition and the reward admit linear representations. The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions. We further give an exponential lower bound for reward-free RL in the setting where only the optimal $Q$-function admits a linear representation. Our results imply several interesting exponential separations on the sample complexity of reward-free RL.

NeurIPS Conference 2020 Conference Paper

Planning with General Objective Functions: Going Beyond Total Rewards

  • Ruosong Wang
  • Peilin Zhong
  • Simon S. Du
  • Russ R. Salakhutdinov
  • Lin Yang

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon. However, this paradigm fails to model important practical applications, e.g., safe control that aims to maximize the lowest reward, i.e., maximize $\min_{h= 1}^H r_h$. In this paper, based on techniques in sketching algorithms, we propose a novel planning algorithm for deterministic systems which handles a large class of objective functions of the form $f(r_1, r_2, \ldots, r_H)$ that are of interest to practical applications. We show that efficient planning is possible if $f$ is symmetric under permutation of coordinates and satisfies certain technical conditions. Complementing our algorithm, we further prove that removing any of the conditions makes the problem intractable in the worst case, and thus demonstrate the necessity of our conditions.

NeurIPS Conference 2020 Conference Paper

Preference-based Reinforcement Learning with Finite-Time Guarantees

  • Yichong Xu
  • Ruosong Wang
  • Lin Yang
  • Aarti Singh
  • Artur Dubrawski

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning by preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite-time analysis for general PbRL problems. We first show that a unique optimal policy may not exist if preferences over trajectories are deterministic for PbRL. If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search. Experiments show the efficacy of our method when it is applied to real-world problems.

NeurIPS Conference 2020 Conference Paper

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

  • Fei Feng
  • Ruosong Wang
  • Wotao Yin
  • Simon S. Du
  • Lin Yang

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration, bellemare2016unifying], we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.

NeurIPS Conference 2020 Conference Paper

Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

  • Ruosong Wang
  • Russ R. Salakhutdinov
  • Lin Yang

Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite recent progress on developing theory for RL with linear function approximation, the understanding of \emph{general} function approximation schemes largely remains missing. In this paper, we establish the first provably efficient RL algorithm with general value function approximation. We show that if the value functions admit an approximation with a function class $\mathcal{F}$, our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure of $\mathcal{F}$ that depends on the eluder dimension~[Russo and Van Roy, 2013] and log-covering numbers, $H$ is the planning horizon, and $T$ is the number of interactions with the environment. Our theory generalizes the linear MDP assumption to general function classes. Moreover, our algorithm is model-free and provides a framework to justify the effectiveness of algorithms used in practice.

NeurIPS Conference 2020 Conference Paper

Toward the Fundamental Limits of Imitation Learning

  • Nived Rajaraman
  • Lin Yang
  • Jiantao Jiao
  • Kannan Ramchandran

Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert plays a stochastic policy. Here $\mathcal{S}$ is the state space and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim |\mathcal{S}| H^{3/2} / N$, matching our lower bound up to a $\sqrt{H}$ factor, and breaks the $\mathcal{O}(H^2)$ error compounding barrier of IL.
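The "mimic the expert whenever possible" policy from this abstract has a direct tabular sketch: replay the expert's action at states that appear in the demonstrations, and act arbitrarily elsewhere. For simplicity this sketch takes a majority vote per state (the paper's analysis mimics the empirical action distribution, so this is a simplification); all names are illustrative.

```python
from collections import Counter, defaultdict

def mimic_expert(trajectories, default_action):
    """Tabular behavior cloning: at states visited in the expert
    demonstrations, play the expert's most frequent action; at
    unvisited states, fall back to an arbitrary default action."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1

    def policy(state):
        if state in counts:
            return counts[state].most_common(1)[0][0]
        return default_action
    return policy

# Two expert trajectories over states 0 and 1; state 5 is never visited,
# so the cloned policy must fall back to the default there.
expert_data = [[(0, "L"), (1, "R")], [(0, "L")]]
policy = mimic_expert(expert_data, default_action="L")
```

The suboptimality of this rule is driven entirely by the states the dataset never covers, which is where the $|\mathcal{S}| H^2 / N$ behavior in the bounds above comes from.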

AAAI Conference 2019 Conference Paper

A New Ensemble Learning Framework for 3D Biomedical Image Segmentation

  • Hao Zheng
  • Yizhe Zhang
  • Lin Yang
  • Peixian Liang
  • Zhuo Zhao
  • Chaoli Wang
  • Danny Z. Chen

3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them together, one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network based meta-learner to learn how to improve the results from 2D and 3D models (base-learners). Then, to minimize over-fitting for our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods.

AAAI Conference 2019 Conference Paper

Biomedical Image Segmentation via Representative Annotation

  • Hao Zheng
  • Lin Yang
  • Jianxu Chen
  • Jun Han
  • Yizhe Zhang
  • Peixian Liang
  • Zhuo Zhao
  • Chaoli Wang

Deep learning has been applied successfully to many biomedical image segmentation tasks. However, due to the diversity and complexity of biomedical image data, manual annotation for training common deep learning models is very time-consuming and labor-intensive, especially because normally only biomedical experts can annotate image data well. Human experts are often involved in a long and iterative process of annotation, as in active learning type annotation schemes. In this paper, we propose representative annotation (RA), a new deep learning framework for reducing annotation effort in biomedical image segmentation. RA uses unsupervised networks for feature extraction and selects representative image patches for annotation in the latent space of learned feature descriptors, which implicitly characterizes the underlying data while minimizing redundancy. A fully convolutional network (FCN) is then trained using the annotated selected image patches for image segmentation. Our RA scheme offers three compelling advantages: (1) It leverages the ability of deep neural networks to learn better representations of image data; (2) it performs one-shot selection for manual annotation and frees annotators from the iterative process of common active learning based annotation schemes; (3) it can be deployed to 3D images with simple extensions. We evaluate our RA approach using three datasets (two 2D and one 3D) and show that our framework yields competitive segmentation results compared with state-of-the-art methods.

JBHI Journal 2019 Journal Article

Deep Convolutional Hashing for Low-Dimensional Binary Embedding of Histopathological Images

  • Manish Sapkota
  • Xiaoshuang Shi
  • Fuyong Xing
  • Lin Yang

Compact binary representations of histopathology images using hashing methods provide efficient approximate nearest neighbor search for direct visual query in large-scale databases. They can be utilized to measure the probability of abnormality of the query image based on the retrieved similar cases, thereby providing support for medical diagnosis. They also allow for efficient management of large-scale image databases because of a low storage requirement. However, the effectiveness of binary representations heavily relies on the visual descriptors that represent the semantic information in the histopathological images. Traditional approaches with hand-crafted visual descriptors might fail due to significant variations in image appearance. Recently, deep learning architectures provide promising solutions to address this problem using effective semantic representations. In this paper, we propose a deep convolutional hashing method that can be trained “pointwise” to simultaneously learn both semantic and binary representations of histopathological images. Specifically, we propose a convolutional neural network that introduces a latent binary encoding (BE) layer for low-dimensional feature embedding to learn binary codes. We design a joint optimization objective function that encourages the network to learn discriminative representations from the label information, and to reduce the gap between the real-valued low-dimensional embedded features and the desired binary values. The binary encoding for new images can be obtained by forward propagating through the network and quantizing the output of the BE layer. Experimental results on a large-scale histopathological image dataset demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2019 Conference Paper

Efficient Symmetric Norm Regression via Linear Sketching

  • Zhao Song
  • Ruosong Wang
  • Lin Yang
  • Hongyang Zhang
  • Peilin Zhong

We provide efficient algorithms for overconstrained linear regression problems of size $n \times d$ when the loss function is a symmetric norm (a norm invariant under sign-flips and coordinate-permutations). An important class of symmetric norms are Orlicz norms, where for a function $G$ and a vector $y \in \mathbb{R}^n$, the corresponding Orlicz norm $\|y\|_G$ is defined as the unique value $\alpha$ such that $\sum_{i=1}^n G(|y_i|/\alpha) = 1$. When the loss function is an Orlicz norm, our algorithm produces a $(1 + \varepsilon)$-approximate solution for an arbitrarily small constant $\varepsilon > 0$ in input-sparsity time, improving over the previously best-known algorithm, which produces a $d \cdot \mathrm{polylog}\, n$-approximate solution. When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog}\, n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration. To the best of our knowledge, this is the first input-sparsity time algorithm with provable guarantees for the general class of symmetric norm regression problems. Our results shed light on resolving the universal sketching problem for linear regression, and the techniques might be of independent interest to numerical linear algebra problems more broadly.
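Since the abstract defines the Orlicz norm only implicitly, as the unique $\alpha$ with $\sum_i G(|y_i|/\alpha) = 1$, it can be evaluated numerically: for an increasing $G$ with $G(0) = 0$ the sum is decreasing in $\alpha$, so a bracket-and-bisect search finds the root. This generic solver is our own sketch, unrelated to the paper's sketching-based algorithm.

```python
def orlicz_norm(y, G, iters=200):
    """Compute ||y||_G: the unique alpha > 0 with sum_i G(|y_i|/alpha) = 1.

    For an increasing G with G(0) = 0 the left-hand side decreases in
    alpha, so we can bracket the root and bisect. Assumes y is nonzero.
    """
    def total(alpha):
        return sum(G(abs(v) / alpha) for v in y)

    lo = hi = max(abs(v) for v in y)
    while total(lo) < 1.0:   # shrink until the sum reaches at least 1
        lo /= 2.0
    while total(hi) > 1.0:   # grow until the sum drops to at most 1
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Sanity check: with G(t) = t^2, the defining equation reads
# sum (y_i / alpha)^2 = 1, so the Orlicz norm is the Euclidean norm.
euclidean = orlicz_norm([3.0, 4.0], lambda t: t * t)   # -> 5.0 (up to bisection error)
```

With other choices of $G$ (e.g., the Huber-like functions discussed in the sketching literature) the same solver evaluates the corresponding Orlicz loss.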

JBHI Journal 2018 Journal Article

AIIMDs: An Integrated Framework of Automatic Idiopathic Inflammatory Myopathy Diagnosis for Muscle

  • Manish Sapkota
  • Fujun Liu
  • Yuanpu Xie
  • Hai Su
  • Fuyong Xing
  • Lin Yang

Idiopathic inflammatory myopathy (IIM) is a common skeletal muscle disease that relates to weakness and inflammation of muscle. Early diagnosis and prognosis of different types of IIMs will guide the effective treatment. Interpretation of digitized images of the cross-section muscle biopsy, which is currently done manually, provides the most reliable diagnostic information. With the increasing volume of images, the management and manual interpretation of the digitized muscle images suffer from low efficiency and high interobserver variabilities. In order to address these problems, we propose the first complete framework of automatic IIM diagnosis system for the management and interpretation of digitized skeletal muscle histopathology images. The proposed framework consists of several key components: (1) Automatic cell segmentation, perimysium annotation, and nuclei detection; (2) histogram-based feature extraction and quantification; (3) content-based image retrieval to search and retrieve similar cases in the database for comparative study; and (4) majority voting-based classification to provide decision support for computer-aided clinical diagnosis. Experiments show that the proposed diagnosis system provides efficient and robust interpretation of the digitized muscle image and computer-aided diagnosis of IIM.

NeurIPS Conference 2018 Conference Paper

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

  • Minshuo Chen
  • Lin Yang
  • Mengdi Wang
  • Tuo Zhao

Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of the time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of our proposed stochastic algorithm based on diffusion approximations. This allows us to prove the asymptotic rate of convergence and further implies a near-optimal asymptotic sample complexity. Numerical experiments are provided to support our analysis.

NeurIPS Conference 2018 Conference Paper

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

  • Aaron Sidford
  • Mengdi Wang
  • Xian Wu
  • Lin Yang
  • Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with state space $\mathcal{S}$, action space $\mathcal{A}$, discount factor $\gamma\in(0, 1)$, and rewards in range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$, where {\it both} the run time spent and the number of samples taken are upper bounded by \[ O\left[\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right] ~. \] For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in \cite{azar2013minimax} up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound.

NeurIPS Conference 2018 Conference Paper

The Physical Systems Behind Optimization Algorithms

  • Lin Yang
  • Raman Arora
  • Vladimir Braverman
  • Tuo Zhao

We use differential-equation-based approaches to provide some physics insights into the dynamics of popular optimization algorithms in machine learning. In particular, we study gradient descent, proximal gradient descent, coordinate gradient descent, proximal coordinate gradient descent, and Newton's method, as well as their Nesterov-accelerated variants, in a unified framework motivated by a natural connection between optimization algorithms and physical systems. Our analysis applies to more general algorithms and optimization problems beyond convexity and strong convexity, e.g., the Polyak-Łojasiewicz and error bound conditions (possibly nonconvex).
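The basic correspondence the paper builds on can be seen in a toy sketch (my own minimal example, not taken from the paper): gradient descent is the forward-Euler discretization of the gradient flow ODE $\dot{x}(t) = -\nabla f(x(t))$.

```python
import numpy as np

def gradient_flow_euler(grad, x0, step, n_steps):
    """Gradient descent as the forward-Euler discretization of the
    gradient flow ODE  x'(t) = -grad f(x(t))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad(x)   # one Euler step of the ODE = one GD step
    return x

# f(x) = 0.5 * x^T A x satisfies the Polyak-Lojasiewicz condition; the flow
# contracts to the minimizer 0 at a rate set by the smallest eigenvalue of A.
A = np.diag([1.0, 10.0])
x = gradient_flow_euler(lambda x: A @ x, [1.0, 1.0], 0.05, 200)
print(np.linalg.norm(x))  # near zero: the discretized flow has converged
```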

AAAI Conference 2017 Conference Paper

Asymmetric Discrete Graph Hashing

  • Xiaoshuang Shi
  • Fuyong Xing
  • Kaidi Xu
  • Manish Sapkota
  • Lin Yang

Recently, many graph-based hashing methods have emerged to tackle large-scale problems. However, there exist two major bottlenecks: (1) directly learning discrete hashing codes is an NP-hard optimization problem; (2) the complexity of both storage and computation to build a graph with n data points is O(n^2). To address these two problems, in this paper we propose a novel yet simple supervised graph-based hashing method, asymmetric discrete graph hashing, which preserves the asymmetric discrete constraint and builds an asymmetric affinity matrix to learn compact binary codes. Specifically, we utilize two different (instead of identical) discrete matrices to better preserve the similarity of the graph with short binary codes. We generate the asymmetric affinity matrix using m (m << n) selected anchors to approximate the similarity among all training data, so that computation time and storage requirements are significantly reduced. In addition, the proposed method jointly learns discrete binary codes and a low-dimensional projection matrix to further improve retrieval accuracy. Extensive experiments on three large-scale benchmark databases demonstrate its superior performance over the recent state of the art with lower training time costs.
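The anchor trick that avoids the O(n^2) graph can be sketched generically as follows (an illustration of anchor graphs in general; the paper's actual affinity construction and hashing objective differ in detail, and the kernel bandwidth is a placeholder):

```python
import numpy as np

def anchor_affinity(X, anchors, sigma=1.0):
    """Approximate an n x n similarity graph via m anchors (m << n).

    Only the n x m matrix Z is stored; the full affinity is implicitly
    Z @ Z.T, so storage and construction cost drop from O(n^2) to O(n*m).
    """
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)    # normalize each row to sum to one
    return Z

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
anchors = X[rng.choice(1000, size=32, replace=False)]  # m = 32 anchors
Z = anchor_affinity(X, anchors)
print(Z.shape)  # (1000, 32) instead of a dense (1000, 1000) graph
```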

NeurIPS Conference 2017 Conference Paper

On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning

  • Xingguo Li
  • Lin Yang
  • Jason Ge
  • Jarvis Haupt
  • Tong Zhang
  • Tuo Zhao

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on difference-of-convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
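The multi-stage convex relaxation idea can be sketched for sparse regression as follows. This is a simplified stand-in: for brevity it uses proximal gradient (ISTA) steps inside each stage rather than the paper's proximal Newton steps, and the regularization level, stage count, and reweighting rule are placeholder choices:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def multistage_sparse_regression(X, y, lam=0.1, stages=3, inner=500):
    """Multi-stage convex relaxation for nonconvex sparse learning.

    Each stage solves a reweighted l1 problem; here the inner solver is
    plain proximal gradient (ISTA), not the paper's proximal Newton method.
    """
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the loss
    w = np.zeros(d)
    weights = np.full(d, lam)                # stage-1 penalty: ordinary l1
    for _ in range(stages):
        for _ in range(inner):
            g = X.T @ (X @ w - y) / n        # gradient of the squared loss
            w = soft_threshold(w - g / L, weights / L)
        # DC reweighting: stop penalizing coordinates that look nonzero
        weights = np.where(np.abs(w) > lam, 0.0, lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(200)
w = multistage_sparse_regression(X, y)
print(np.flatnonzero(np.abs(w) > 0.5))  # recovered support
```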

NeurIPS Conference 2016 Conference Paper

Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation

  • Jianxu Chen
  • Lin Yang
  • Yizhe Zhang
  • Mark Alber
  • Danny Chen

Segmentation of 3D images is a fundamental problem in biomedical image analysis. Deep learning (DL) approaches have achieved state-of-the-art segmentation performance. To exploit 3D contexts with neural networks, known DL segmentation methods, including 3D convolution, 2D convolution on planes orthogonal to the 2D slices, and LSTMs in multiple directions, all suffer from incompatibility with the highly anisotropic dimensions common in 3D biomedical images. In this paper, we propose a new DL framework for 3D image segmentation, based on a combination of a fully convolutional network (FCN) and a recurrent neural network (RNN), which exploit the intra-slice and inter-slice contexts, respectively. To the best of our knowledge, this is the first DL framework for 3D image segmentation that explicitly leverages 3D image anisotropy. Evaluated on a dataset from the ISBI Neuronal Structure Segmentation Challenge and on in-house image stacks for 3D fungus segmentation, our approach achieves promising results compared to known DL-based 3D segmentation approaches.
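The FCN-plus-RNN decomposition can be caricatured in a few lines of NumPy (a toy stand-in only: a per-slice map and a scalar recurrence replace the trained FCN and RNN, and all weights are arbitrary placeholders):

```python
import numpy as np

def segment_stack(volume, intra_slice, Wh, Wf):
    """Hybrid 2D-FCN + RNN sketch for anisotropic 3D segmentation.

    `intra_slice` extracts per-slice feature maps (standing in for the FCN);
    a simple recurrence then propagates context across slices (the RNN).
    """
    h = np.zeros(volume.shape[1:])
    masks = []
    for slice2d in volume:                   # iterate along the coarse z-axis
        f = intra_slice(slice2d)             # intra-slice context (FCN role)
        h = np.tanh(Wh * h + Wf * f)         # inter-slice context (RNN role)
        masks.append(h > 0.0)                # per-slice foreground mask
    return np.stack(masks)

rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 16, 16))       # 8 thick slices of 16 x 16 pixels
seg = segment_stack(vol, intra_slice=np.tanh, Wh=0.5, Wf=1.0)
print(seg.shape)  # one binary mask per slice
```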

IROS Conference 2006 Conference Paper

Adjustable Bipedal Gait Generation using Genetic Algorithm Optimized Fourier Series Formulation

  • Lin Yang
  • Chee-Meng Chew
  • Aun Neow Poo
  • Teresa Zielinska

This paper presents a method for optimally generating stable bipedal walking gaits, based on a truncated Fourier series formulation with coefficients tuned by a genetic algorithm. It also provides a way to adjust the stride frequency, step length, or walking pattern in real time. The proposed approach to gait synthesis is not limited by the robot's kinematic structure and can be used to satisfy various motion assumptions. It also makes it easy to generate optimal gaits on terrains of different slopes or on stairs under different motion requirements. Dynamic simulation results show the validity and robustness of the approach. The generated gaits resulted in human-like motions optimized for stability, even walking speed, and low leg-strike velocity of the swing foot.
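A truncated Fourier series joint trajectory of this kind can be sketched as follows (the coefficients here are arbitrary placeholders; in the paper they would be tuned by the genetic algorithm, and rescaling the stride frequency adjusts the gait period in real time):

```python
import numpy as np

def joint_trajectory(t, coeffs, stride_freq):
    """Joint angle at time t from a truncated Fourier series.

    `coeffs` holds (a_k, b_k) pairs for harmonics k = 1, 2, ...;
    `stride_freq` sets the fundamental frequency of the gait cycle.
    """
    theta = 0.0
    for k, (a, b) in enumerate(coeffs, start=1):
        w = 2 * np.pi * stride_freq * k
        theta += a * np.sin(w * t) + b * np.cos(w * t)
    return theta

coeffs = [(0.4, 0.0), (0.1, 0.05)]   # two harmonics for one hip joint
period = 1.0 / 0.8                    # stride frequency of 0.8 Hz
print(round(joint_trajectory(0.0, coeffs, 0.8), 3))
print(round(joint_trajectory(period, coeffs, 0.8), 3))  # periodic: same angle
```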