Arrow Research search

Author name cluster

Yi Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

93 papers
2 author rows

Possible papers

93

AIIM Journal 2026 Journal Article

A Character-level Convolutional Recurrent Interaction Network for joint traditional Chinese medicine clinical named entity recognition and relation extraction

  • Qiang Xu
  • Zhi-hui Zhao
  • Wei-wei Liu
  • Yu Fang
  • Wen-jun Tang
  • Yi Zhou
  • Ke Zhu
  • Hai Xiang

The electronic medical record (EMR) of traditional Chinese medicine (TCM) is a crucial document for recording patients’ clinical data, structured around four main dimensions: inspection, listening and smelling, inquiry, and palpation. Analyzing these records using natural language processing holds promise for further structuring and modeling TCM medical data. Currently, deep learning-based named entity recognition is considered the prevailing method for processing TCM EMRs. However, these state-of-the-art models fail to consider the four diagnostic dimensions of TCM clinical data and their impact on entity type extraction, and do not fully capture the semantic features of ancient Chinese representations in TCM. To address these issues, we introduce a joint clinical named entity recognition and relation extraction method designed to recognize and classify clinical entities – such as location and symptom attributes – along with their associative relationships (the four diagnostic dimensions). In this study, we propose a Character-level Convolutional Recurrent Interaction Network (CCRIN), which treats the four diagnostic dimensions as relationships, locations as head entities, and symptom attributes as tail entities. CCRIN integrates Chinese character embeddings and inter-character contextual convolutional feature vectors to capture the semantic information of the ancient Chinese language, while combining entity and relation extraction with a self-attention mechanism to generate rich feature representations through multi-task dynamic interaction. This approach enables the efficient extraction of TCM entities and relations related to the four diagnostic dimensions. Empirical studies on the NYT and TCM-cases datasets demonstrate the superiority of the proposed model. In summary, the model employs a novel multi-task joint extraction method for entities and relations, grounded in the four diagnostic methods of traditional Chinese medicine; it integrates Chinese character embeddings with inter-character contextual feature vectors, and its effectiveness is validated on both publicly available and self-constructed datasets.

JBHI Journal 2026 Journal Article

A Graph Convolutional Network with Pretrained Features and Iterative Polar Coordinate Attention for Cross-Subject EEG-based Neuropsychiatric Diagnosis

  • Yin Liu
  • Jiaojiao Deng
  • Runyi Xu
  • Yi Zhou

Electroencephalography (EEG)-based diagnosis of neuropsychiatric disorders offers a non-invasive and cost-effective solution for early detection. However, robust cross-subject generalization remains a major challenge due to substantial inter-individual variability in EEG signals. To address this, we propose PreIPCA-GCN, a novel graph convolutional network that integrates pretrained temporal features and Iterative Polar Coordinate Attention (IPCA)-based brain connectivity modeling. Specifically, we utilize a modified version of LaBraM, a large-scale pretrained EEG model, to extract subject-invariant node representations. Functional brain connectivity is then characterized using Pearson correlation and cosine similarity in polar space, capturing both connectivity strength (radius) and phase synchronization (angle). To fuse these complementary cues, we introduce a dual-path IPCA mechanism, refining the adjacency matrix across iterations. PreIPCA-GCN is evaluated on six public EEG datasets covering five neuropsychiatric disorders (e.g., attention-deficit/hyperactivity disorder, Alzheimer's disease, schizophrenia), consistently demonstrating strong cross-subject accuracies under both hold-out (86.77%-95.69%) and leave-one-subject-out cross-validation (88.81%-97.49%). Comprehensive comparative results show that PreIPCA-GCN outperforms several state-of-the-art methods. Ablation studies further confirm the effectiveness of both the pretrained node features and the IPCA-based fused adjacency matrix in improving cross-subject generalization. These findings suggest PreIPCA-GCN as a robust and generalizable framework for cross-subject EEG-based neuropsychiatric diagnosis, offering strong potential for future clinical applications.
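The polar-coordinate connectivity step described in this abstract (pairing Pearson correlation with cosine similarity, then reading off a radius and an angle) can be illustrated with a generic sketch. The pairing below and the name `connectivity_polar` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def connectivity_polar(signals):
    """Illustrative sketch: for each channel pair, compute Pearson
    correlation and cosine similarity, treat them as a Cartesian
    pair, and convert to polar coordinates (radius = strength,
    angle = a phase-like quantity). Assumed form, not the paper's."""
    n = signals.shape[0]
    radius = np.zeros((n, n))
    angle = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            r = np.corrcoef(signals[i], signals[j])[0, 1]  # Pearson correlation
            c = signals[i] @ signals[j] / (
                np.linalg.norm(signals[i]) * np.linalg.norm(signals[j])
            )  # cosine similarity
            radius[i, j] = np.hypot(r, c)   # connectivity strength
            angle[i, j] = np.arctan2(c, r)  # angle of the (r, c) pair
    return radius, angle

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))  # 4 channels, 256 samples
R, A = connectivity_polar(x)
```

On the diagonal both quantities equal 1, so the radius is sqrt(2) and the angle is pi/4; an attention mechanism such as IPCA would then refine a matrix like this across iterations.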

JBHI Journal 2026 Journal Article

Advanced Camera-Based Scoliosis Screening via Deep Learning Detection and Fusion of Trunk, Limb, and Skeleton Features

  • Ziyan Wang
  • Yi Zhou
  • Ninghui Xu
  • Yuqin Zhou
  • Heran Zhao
  • Zhiyong Chang
  • Zhigang Hu
  • Xiao Han

Scoliosis significantly impacts quality of life, highlighting the need for effective early scoliosis screening (SS) and intervention. However, current SS methods often involve physical contact, undressing, or radiation exposure. This study introduces an innovative, non-invasive SS approach utilizing a monocular RGB camera that eliminates the need for undressing, sensor attachment, and radiation exposure. We introduce a novel approach that employs Parameterized Human 3D Reconstruction (PH3DR) to reconstruct 3D human models, thereby effectively eliminating clothing obstructions; this is seamlessly integrated with an ISANet segmentation network, enhanced by our proposed Multi-Scale Fusion Attention (MSFA) module, to facilitate the segmentation of distinct human trunk and limb features (HTLF) and capture body-surface asymmetries related to scoliosis. Additionally, we propose a Swin Transformer-enhanced CMU-Pose to extract human skeleton features (HSF), identifying skeletal asymmetries crucial for SS. Finally, we develop a fusion model that integrates the HTLF and HSF, combining surface morphology and skeletal features to improve the precision of SS. The experiments demonstrated that PH3DR and MSFA significantly improved the segmentation and extraction of HTLF, whereas the Swin Transformer-based CMU-Pose substantially enhanced the extraction of HSF. Our final model achieved an F1 score (0.895 $\pm$ 0.014) comparable to the best-performing baseline model, with only 0.79% of the parameters and 1.64% of the FLOPs, while running at 36 FPS, significantly higher than the best-performing baseline model (10 FPS). Moreover, our model outperformed two spine surgeons, one less experienced and the other moderately experienced. With its patient-friendly, privacy-preserving, and easily deployable solution, this approach is particularly well-suited for early SS and routine monitoring.

AAAI Conference 2026 Conference Paper

Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation

  • Kaiwen Huang
  • Yizhe Zhang
  • Yi Zhou
  • Tianyang Xu
  • Tao Zhou

Semi-supervised medical image segmentation is an effective method for addressing scenarios with limited labeled data. Existing methods mainly rely on frameworks such as mean teacher and dual-stream consistency learning. These approaches often face issues like error accumulation and model structural complexity, while also neglecting the interaction between labeled and unlabeled data streams. To overcome these challenges, we propose a Bidirectional Channel-selective Semantic Interaction (BCSI) framework for semi-supervised medical image segmentation. First, we propose a Semantic-Spatial Perturbation (SSP) mechanism, which disturbs the data using two strong augmentation operations and leverages unsupervised learning with pseudo-labels from weak augmentations. Additionally, we employ consistency on the predictions from the two strong augmentations to further improve model stability and robustness. Second, to reduce noise during the interaction between labeled and unlabeled data, we propose a Channel-selective Router (CR) component, which dynamically selects the most relevant channels for information exchange. This mechanism ensures that only highly relevant features are activated, minimizing unnecessary interference. Finally, the Bidirectional Channel-wise Interaction (BCI) strategy is employed to supplement additional semantic information and enhance the representation of important channels. Experimental results on multiple benchmark 3D medical datasets demonstrate that the proposed method outperforms existing semi-supervised approaches.

AAAI Conference 2026 Conference Paper

StrokeFusion: Vector Sketch Generation via Joint Stroke-UDF Encoding and Latent Sequence Diffusion

  • Jin Zhou
  • Yi Zhou
  • Hongliang Yang
  • Pengfei Xu
  • Hui Huang

In the field of sketch generation, raster-format trained models often produce non-stroke artifacts, while vector-format trained models typically lack a holistic understanding of sketches, leading to compromised recognizability. Moreover, existing methods struggle to extract common features from similar elements (e.g., eyes of animals) appearing at varying positions across sketches. To address these challenges, we propose StrokeFusion, a two-stage framework for vector sketch generation. It contains a dual-modal sketch feature learning network that maps strokes into a high-quality latent space. This network decomposes sketches into normalized strokes and jointly encodes stroke sequences with Unsigned Distance Function (UDF) maps, representing sketches as sets of stroke feature vectors. Building upon this representation, our framework exploits a stroke-level latent diffusion model that simultaneously adjusts stroke position, scale, and trajectory during generation. This enables high-fidelity stroke generation while supporting stroke interpolation editing. Extensive experiments across multiple sketch datasets demonstrate that our framework outperforms state-of-the-art techniques, validating its effectiveness in preserving structural integrity and semantic features.

IJCAI Conference 2025 Conference Paper

A Novel Local Search Algorithm for the Vertex Bisection Minimization Problem

  • Rui Sun
  • Xinyu Wang
  • Yiyuan Wang
  • Jiangnan Li
  • Yi Zhou

The vertex bisection minimization problem (VBMP) is a fundamental graph partitioning problem with numerous real-world applications. In this study, we propose a (k, l, S)-cluster guided local search algorithm to address this challenge. First, we propose a novel (k, l, S)-cluster enumeration procedure, which is based on two key concepts: the (k, l, S)-cluster and the local cluster core. The (k, l, S)-cluster limits both the connectivity and distinct boundaries of a given vertex set, and the local cluster core represents the most cohesive substructure within a (k, l, S)-cluster. Building upon the above (k, l, S)-cluster enumeration procedure, we present a novel (k, l, S)-cluster guided perturbation mechanism designed to escape from local optima. Next, we propose a two-manner local search procedure that employs two distinct search models to explore the neighboring search space efficiently. Experimental results demonstrate that the proposed algorithm performs best on nearly all instances.

IJCAI Conference 2025 Conference Paper

A Reduction-Based Algorithm for the Clique Interdiction Problem

  • Chenghao Zhu
  • Yi Zhou
  • Haoyu Jiang

The Clique Interdiction Problem (CIP) aims to minimize the size of the largest clique in a given graph by removing a given number of vertices. The CIP models a special Stackelberg game and has important applications in fields such as pandemic control and terrorist identification. However, the CIP is a bilevel graph optimization problem, making it very challenging to solve. Recently, data reduction techniques have been successfully applied in many (single-level) graph optimization problems like vertex cover. Motivated by this, we investigate a set of novel reduction rules and design a reduction-based algorithm, RECIP, for practically solving the CIP. RECIP enjoys an effective preprocessing procedure that systematically reduces the input graph, making the problem much easier to solve. Extensive experiments on 124 large real-world networks demonstrate the superior performance of RECIP and validate the effectiveness of the proposed reduction rules.

TMLR Journal 2025 Journal Article

Adaptive Gradient Normalization and Independent Sampling for (Stochastic) Generalized-Smooth Optimization

  • Yufeng Yang
  • Erin E. Tripp
  • Yifan Sun
  • Shaofeng Zou
  • Yi Zhou

Recent studies have shown that many nonconvex machine learning problems satisfy a generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, the existing algorithms are not fully adapted to such generalized-smooth nonconvex geometry and encounter significant technical limitations on their convergence analysis. In this work, we first analyze the convergence of adaptively normalized gradient descent under function geometries characterized by generalized-smoothness and the generalized PL condition, revealing the advantage of adaptive gradient normalization. Our results provide theoretical insights into adaptive normalization across various scenarios. For stochastic generalized-smooth nonconvex optimization, we propose the Independent-Adaptively Normalized Stochastic Gradient Descent algorithm, which leverages adaptive gradient normalization, independent sampling, and gradient clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed noise assumptions. Experiments on large-scale nonconvex generalized-smooth problems demonstrate the fast convergence of our algorithm.
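In its simplest form, the adaptive gradient normalization studied here divides each step by the gradient norm plus a small shift, which keeps steps bounded on generalized-smooth objectives whose effective smoothness grows with the gradient. The sketch below is a generic illustration; the objective, hyperparameters, and the name `normalized_gd` are assumptions, not the paper's algorithm:

```python
import numpy as np

def normalized_gd(grad, x0, lr=0.1, beta=1e-8, steps=200):
    """Normalized gradient descent (generic sketch): the update is
    x <- x - lr * g / (||g|| + beta), so the step length is at most
    lr regardless of how large the raw gradient gets."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        x = x - lr * g / (np.linalg.norm(g) + beta)
    return x

# f(x) = ||x||^4 is generalized-smooth but not globally
# Lipschitz-smooth; its gradient is 4 * ||x||^2 * x.
sol = normalized_gd(lambda x: 4 * np.dot(x, x) * x, x0=[3.0, -2.0])
```

Because each step has length at most `lr`, the iterate cannot blow up even though the raw gradient grows cubically in the distance from the origin.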

TMLR Journal 2025 Journal Article

Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance

  • Qi Zhang
  • Yi Zhou
  • Shaofeng Zou

This paper provides the first tight convergence analyses for RMSProp and Adam for non-convex optimization under the most relaxed assumptions of coordinate-wise generalized smoothness and affine noise variance. We first analyze RMSProp, a special case of Adam with adaptive learning rates but without first-order momentum. Specifically, to resolve the challenges arising from the dependence among the adaptive update, the unbounded gradient estimate, and the Lipschitz constant, we demonstrate that the first-order term in the descent lemma converges and that its denominator is upper bounded by a function of the gradient norm. Based on this result, we show that RMSProp with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. We then generalize our analysis to Adam, where the additional challenge is the mismatch between the gradient and the first-order momentum. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. We show that Adam with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. Our complexity results for both RMSProp and Adam match the complexity lower bound established in Arjevani et al. (2023).
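For reference, the RMSProp update analyzed here is the standard one: an exponential average of squared gradients preconditions each step. A minimal sketch with illustrative hyperparameters (not values from the paper):

```python
import numpy as np

def rmsprop(grad, x0, lr=0.01, beta=0.999, eps=1e-8, steps=2000):
    """Plain RMSProp: Adam's adaptive learning rate without
    first-order momentum. v tracks an exponential average of
    squared gradients; the step divides by sqrt(v)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1 - beta) * g ** 2  # second-moment estimate
        x = x - lr * g / (np.sqrt(v) + eps)
    return x

sol = rmsprop(lambda x: 2 * x, x0=[5.0, -3.0])  # f(x) = ||x||^2
```

Adam adds a first-order momentum term on top of this update; the mismatch between that momentum and the raw gradient is what the paper's second analysis has to control.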

ICLR Conference 2025 Conference Paper

CryoFM: A Flow-based Foundation Model for Cryo-EM Densities

  • Yi Zhou
  • Yilai Li
  • Jing Yuan
  • Quanquan Gu

Cryo-electron microscopy (cryo-EM) is a powerful technique in structural biology and drug discovery, enabling the study of biomolecules at high resolution. Significant advancements by structural biologists using cryo-EM have led to the production of around 40k protein density maps at various resolutions. However, cryo-EM data processing algorithms have yet to fully benefit from our knowledge of biomolecular density maps, with only a few recent models being data-driven but limited to specific tasks. In this study, we present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps and generalizing effectively to downstream tasks. Built on flow matching, CryoFM is trained to accurately capture the prior distribution of biomolecular density maps. Furthermore, we introduce a flow posterior sampling method that leverages CryoFM as a flexible prior for several downstream tasks in cryo-EM and cryo-electron tomography (cryo-ET) without the need for fine-tuning, achieving state-of-the-art performance on most tasks and demonstrating its potential as a foundational model for broader applications in these fields.

NeurIPS Conference 2025 Conference Paper

E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization

  • Wenpu Li
  • Bangyan Liao
  • Yi Zhou
  • Pian Wan
  • Peidong Liu

The estimation of optical flow and 6-DoF ego-motion—two fundamental tasks in 3-D vision—has typically been addressed independently. For neuromorphic vision (e.g., event cameras), however, the lack of robust data association makes solving the two problems separately an ill-posed challenge, especially in the absence of supervision via ground truth. Existing works mitigate this ill-posedness by either enforcing the smoothness of the flow field via an explicit variational regularizer or leveraging explicit structure-and-motion priors in the parametrization to improve event alignment. The former notably introduces bias in results and computational overhead, while the latter—which parametrizes the optical flow in terms of the scene depth and the camera motion—often converges to suboptimal local minima. To address these issues, we propose an unsupervised pipeline that jointly optimizes egomotion and flow via implicit spatial-temporal and geometric regularization. First, by modeling the camera's egomotion as a continuous spline and optical flow as an implicit neural representation, our method inherently embeds spatial-temporal coherence through inductive biases. Second, we incorporate structure-and-motion priors through differential geometric constraints, bypassing explicit depth estimation while maintaining rigorous geometric consistency. As a result, our framework (called \textbf{E-MoFlow}) unifies egomotion and optical flow estimation via implicit regularization under a fully unsupervised paradigm. Experiments demonstrate its versatility in general 6-DoF motion scenarios, achieving state-of-the-art performance among unsupervised methods and remaining competitive even with supervised approaches. Code will be released upon acceptance.

TMLR Journal 2025 Journal Article

Joint Diffusion for Universal Hand-Object Grasp Generation

  • Jinkun Cao
  • Jingyuan Liu
  • Kris Kitani
  • Yi Zhou

Predicting and generating human hand grasps over objects is critical for animation and robotic tasks. In this work, we focus on generating both the hand and the object in a grasp with a single diffusion model. Our proposed Joint Hand-Object Diffusion (JHOD) models the hand and object in a unified latent representation. It uses hand-object grasping data to learn to accommodate hand and object to form plausible grasps. Also, to enforce generalizability over diverse object shapes, it leverages large-scale object datasets to learn an inclusive object latent embedding. With or without a given object as an optional condition, the diffusion model can generate grasps unconditionally or conditioned on the object. Compared to the usual practice of learning object-conditioned grasp generation from only hand-object grasp data, our method benefits from the more diverse object data used for training and handles grasp generation more universally. According to both qualitative and quantitative experiments, both conditional and unconditional generation of hand grasps achieve good visual plausibility and diversity. With the extra inclusiveness of the object representation learned from large-scale object datasets, the proposed method generalizes well to unseen object shapes.

IJCAI Conference 2025 Conference Paper

Large-Scale Trade-Off Curve Computation for Incentive Allocation with Cardinality and Matroid Constraints

  • Yu Cong
  • Chao Xu
  • Yi Zhou

We consider a large-scale incentive allocation problem where the entire trade-off curve between budget and profit has to be maintained approximately at all times. The application originally comes from assigning coupons to users of ride-sharing apps, where each user can have a limit on the number of coupons assigned. We consider a more general form, where the coupons for each user form a matroid, and the coupons assigned to each user must constitute an independent set. We show that the entire trade-off curve can be maintained approximately in near real time.

AAAI Conference 2025 Conference Paper

Make Domain Shift a Catastrophic Forgetting Alleviator in Class-Incremental Learning

  • Wei Chen
  • Yi Zhou

In the realm of class-incremental learning (CIL), alleviating the catastrophic forgetting problem is a pivotal challenge. This paper discovers a counter-intuitive observation: by incorporating domain shift into CIL tasks, the forgetting rate is significantly reduced. Our comprehensive studies demonstrate that incorporating domain shift leads to a clearer separation in the feature distribution across tasks and helps reduce parameter interference during the learning process. Inspired by this observation, we propose a simple yet effective method named DisCo to deal with CIL tasks. DisCo introduces a lightweight prototype pool that utilizes contrastive learning to promote distinct feature distributions for the current task relative to previous ones, effectively mitigating interference across tasks. DisCo can be easily integrated into existing state-of-the-art class-incremental learning methods. Experimental results show that incorporating our method into various CIL methods achieves substantial performance improvements, validating the benefits of our approach in enhancing class-incremental learning by separating feature representation and reducing interference. These findings illustrate that DisCo can serve as a robust approach for future research in class-incremental learning.

IROS Conference 2025 Conference Paper

MemGS: Memory-Efficient Gaussian Splatting for Real-Time SLAM

  • Yinlong Bai
  • Hongxin Zhang
  • Sheng Zhong
  • Junkai Niu
  • Hai Li
  • Yijia He
  • Yi Zhou

Recent advancements in 3D Gaussian Splatting (3DGS) have made a significant impact on rendering and reconstruction techniques. Current research predominantly focuses on improving rendering performance and reconstruction quality using high-performance desktop GPUs, largely overlooking applications for embedded platforms like micro air vehicles (MAVs). These devices, with their limited computational resources and memory, often face a trade-off between system performance and reconstruction quality. In this paper, we improve existing methods in terms of GPU memory usage while enhancing rendering quality. Specifically, to address redundant 3D Gaussian primitives in SLAM, we propose merging them in voxel space based on geometric similarity. This reduces GPU memory usage without impacting system runtime performance. Furthermore, rendering quality is improved by initializing 3D Gaussian primitives via Patch-Grid (PG) point sampling, enabling more accurate modeling of the entire scene. Quantitative and qualitative evaluations on publicly available datasets demonstrate the effectiveness of our improvements.

TMLR Journal 2025 Journal Article

On Learning Representations for Tabular Data Distillation

  • Inwon Kang
  • Parikshit Ram
  • Yi Zhou
  • Horst Samulowitz
  • Oshani Seneviratne

Dataset distillation generates a small set of information-rich instances from a large dataset, resulting in reduced storage requirements, privacy or copyright risks, and computational costs for downstream modeling, though much of the research has focused on the image data modality. We study tabular data distillation, which brings in novel challenges such as the inherent feature heterogeneity and the common use of non-differentiable learning models (such as decision tree ensembles and nearest-neighbor predictors). To mitigate these challenges, we present $\texttt{TDColER}$, a tabular data distillation framework via column embeddings-based representation learning. To evaluate this framework, we also present a tabular data distillation benchmark, ${{\sf \small TDBench}}$. Based on an elaborate evaluation on ${{\sf \small TDBench}}$, resulting in 226,200 distilled datasets and 541,980 models trained on them, we demonstrate that $\texttt{TDColER}$ is able to boost the distilled data quality of off-the-shelf distillation schemes by 0.5-143% across 7 different tabular learning models. All of the code used in the experiments can be found at http://github.com/inwonakng/tdbench

TMLR Journal 2025 Journal Article

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

  • Shaocong Ma
  • Ziyi Chen
  • Yi Zhou
  • Heng Huang

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does not generally hold in robust constrained RL, indicating that traditional primal-dual methods may fail to find optimal feasible policies. To overcome this limitation, we propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO), which operates directly on the primal problem without relying on dual formulations. We provide theoretical convergence guarantees under mild regularity assumptions, showing convergence to an approximately optimal feasible policy with iteration complexity matching the best-known lower bound when the uncertainty set diameter is controlled at a specific level. Empirical results in a grid-world environment validate the effectiveness of our approach, demonstrating that RRPO achieves robust and safe performance under model uncertainties while the non-robust method can violate the worst-case safety constraints.

AAAI Conference 2024 Conference Paper

A Fast Exact Solver with Theoretical Analysis for the Maximum Edge-Weighted Clique Problem

  • Lu Liu
  • Mingyu Xiao
  • Yi Zhou

The maximum vertex-weighted clique problem (MVWCP) and the maximum edge-weighted clique problem (MEWCP) are two natural extensions of the fundamental maximum clique problem. In this paper, we systematically study MEWCP and make the following major contributions: (1) We show that MEWCP is NP-hard even when the minimum degree of the graph is n-2, in contrast to MVWCP which is polynomial-time solvable when the minimum degree of the graph is at least n-3. This result distinguishes the complexity of the two problems for the first time. (2) To address MEWCP, we develop an efficient branch-and-bound algorithm called MEWCat with both practical and theoretical performance guarantees. In practice, MEWCat utilizes a new upper bound tighter than existing ones, which allows for more efficient pruning of branches. In theory, we prove a running-time bound of O*(1.4423^n) for MEWCat, which breaks the trivial bound of O*(2^n) in the research line of practical exact MEWCP solvers for the first time. (3) Empirically, we evaluate the performance of MEWCat on various benchmark instances. The experiments demonstrate that MEWCat outperforms state-of-the-art exact solvers significantly. For instance, on 16 DIMACS graphs that the state-of-the-art solver BBEWC fails to solve within 7200 seconds, MEWCat solves all of them with an average time of less than 1000 seconds. On real-world graphs, MEWCat achieves an average speedup of over 36x.

UAI Conference 2024 Conference Paper

Approximate Kernel Density Estimation under Metric-based Local Differential Privacy

  • Yi Zhou
  • Yanhao Wang 0001
  • Long Teng
  • Qiang Huang
  • Cen Chen 0001

Kernel Density Estimation (KDE) is a fundamental problem with broad machine learning applications. In this paper, we investigate the KDE problem under Local Differential Privacy (LDP), a setting in which users privatize data on their own devices before sending them to an untrusted server for analytics. To strike a balance between ensuring local privacy and preserving high-utility KDE results, we adopt a relaxed definition of LDP based on metrics (mLDP), which is suitable when data points are represented in a metric space and can be more distinguishable as their distances increase. To the best of our knowledge, approximate KDE under mLDP has not been explored in the existing literature. We propose the mLDP-KDE framework, which augments a locality-sensitive hashing-based sketch method to provide mLDP and answer any KDE query unbiasedly within an additive error with high probability in sublinear time and space. Extensive experimental results demonstrate that the mLDP-KDE framework outperforms several existing KDE methods under LDP and mLDP by achieving significantly better trade-offs between privacy and utility, with particularly remarkable advantages on large, high-dimensional data.
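For context, the quantity that the mLDP-KDE framework approximates is the standard kernel density estimate. A textbook, non-private Gaussian-kernel version in one dimension (the function name and bandwidth choice are illustrative, not from the paper):

```python
import math

def gaussian_kde(data, query, bandwidth=1.0):
    """Exact (non-private) Gaussian kernel density estimate at a
    single query point: the average of Gaussian bumps centered at
    the data points."""
    n = len(data)
    total = sum(
        math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2)) for x in data
    )
    return total / (n * math.sqrt(2 * math.pi) * bandwidth)

density = gaussian_kde([0.0, 1.0, 2.0], query=1.0)
```

A local-privacy mechanism cannot evaluate this sum over raw data; users privatize their contributions first, and the sketch-based mLDP-KDE method recovers an unbiased estimate of this quantity in sublinear time and space.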

NeurIPS Conference 2024 Conference Paper

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

  • Junho Myung
  • Nayeon Lee
  • Yi Zhou
  • Jiho Jin
  • Rifki A. Putri
  • Dimosthenis Antypas
  • Hsuvas Borkakoty
  • Eunsu Kim

Large language models (LLMs) often lack culture-specific everyday knowledge, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are usually limited to a single language or online sources like Wikipedia, which may not reflect the daily habits, customs, and lifestyles of different regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play or the sports they practice in school is not always explicitly written online. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. The benchmark comprises 52.6k question-answer pairs from 16 countries/regions, in 13 different languages, including low-resource ones such as Amharic, Assamese, Azerbaijani, Hausa, and Sundanese. We evaluate LLMs in two formats: short-answer questions, and multiple-choice questions. We show that LLMs perform better in cultures that are more present online, with a maximum 57.34% difference in GPT-4, the best-performing model, in the short-answer format. Furthermore, we find that LLMs perform better in their local languages for mid-to-high-resource languages. Interestingly, for languages deemed to be low-resource, LLMs provide better answers in English. We make our dataset publicly available at: https://github.com/nlee0212/BLEnD.

NeurIPS Conference 2024 Conference Paper

DMesh: A Differentiable Mesh Representation

  • Sanghyun Son
  • Matheus Gadelha
  • Yang Zhou
  • Zexiang Xu
  • Ming C. Lin
  • Yi Zhou

We present DMesh, a differentiable representation for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first obtain a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate the probability that faces exist on the actual surface in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point clouds and multi-view images, using gradient-based optimization. We publicize the source code and supplementary material at our project page (https://sonsang.github.io/dmesh-project).

AAAI Conference 2024 Short Paper

Effective Data Distillation for Tabular Datasets (Student Abstract)

  • Inwon Kang
  • Parikshit Ram
  • Yi Zhou
  • Horst Samulowitz
  • Oshani Seneviratne

Data distillation is a technique for reducing a large dataset into a smaller one, such that a model trained on the smaller dataset performs comparably to a model trained on the full dataset. Past works have examined this approach for image datasets, focusing on neural networks as target models. However, tabular datasets pose new challenges not seen in images: a sample in a tabular dataset is a one-dimensional vector, unlike the two- (or three-) dimensional pixel grid of images, and non-NN models such as XGBoost can often outperform neural network (NN) based models. Our contribution in this work is two-fold: 1) we show that data distillation methods from images do not translate directly to tabular data; 2) we propose a new distillation method that consistently outperforms the baseline for multiple different models, including non-NN models such as XGBoost.

NeurIPS Conference 2024 Conference Paper

Is Your HD Map Constructor Reliable under Sensor Corruptions?

  • Xiaoshuai Hao
  • Mengchuan Wei
  • Yifan Yang
  • Haimei Zhao
  • Hui Zhang
  • Yi Zhou
  • Qiang Wang
  • Weiming Li

Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, e.g., adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive benchmark designed to evaluate the robustness of HD map construction methods against various sensor corruptions. Our benchmark encompasses a total of 29 types of corruptions that occur from cameras and LiDAR sensors. Extensive evaluations across 31 HD map constructors reveal significant performance degradation of existing methods under adverse weather conditions and sensor failures, underscoring critical safety concerns. We identify effective strategies for enhancing robustness, including innovative approaches that leverage multi-modal fusion, advanced data augmentation, and architectural techniques. These insights provide a pathway for developing more reliable HD map construction methods, which are essential for the advancement of autonomous driving technology. The benchmark toolkit and affiliated code and model checkpoints have been made publicly accessible.

AAAI Conference 2024 Conference Paper

Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

  • Qi Zhang
  • Yi Zhou
  • Ashley Prater-Bennette
  • Lixin Shen
  • Shaofeng Zou

Distributionally robust optimization (DRO) is a powerful framework for training models that are robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes $\chi^2$-divergences as a special case. We prove that our algorithm finds an $\epsilon$-stationary point with improved computational complexity over existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
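As a toy anchor for one of the robustness notions mentioned in the abstract: the (un-smoothed) CVaR objective simply averages the worst $\alpha$-fraction of per-sample losses. A minimal sketch, illustrative only and not the paper's constrained algorithm:

```python
import numpy as np

def cvar(losses, alpha=0.1):
    """Conditional value at risk: mean of the worst alpha-fraction of
    per-sample losses (sorted descending, take the top ceil(alpha*n))."""
    losses = np.sort(np.asarray(losses))[::-1]          # descending
    k = max(1, int(np.ceil(alpha * losses.size)))       # worst alpha-fraction
    return losses[:k].mean()

losses = np.array([0.1, 0.2, 0.3, 4.0, 0.2, 0.1, 0.5, 0.3, 0.2, 0.1])
print(cvar(losses, alpha=0.1))  # -> 4.0 (the single worst sample)
print(cvar(losses, alpha=0.5))  # mean of the 5 largest losses
```

Minimizing this instead of the plain average forces the model to care about its worst-case samples, which is the basic intuition behind DRO-style training.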

NeurIPS Conference 2024 Conference Paper

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

  • Desai Xie
  • Sai Bi
  • Zhixin Shu
  • Kai Zhang
  • Zexiang Xu
  • Yi Zhou
  • Sören Pirk
  • Arie Kaufman

We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse) which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.

JBHI Journal 2024 Journal Article

Using Pupil Diameter for Psychological Resilience Assessment in Medical Students Based on SVM and SHAP Model

  • Fayang Xiang
  • Li Zhang
  • Yidan Ye
  • Chuyue Xiong
  • Yanjie Zhang
  • Yan Hu
  • Jiang Du
  • Yi Zhou

Effectively assessing psychological resilience for medical students is vital for identifying at-risk individuals and developing tailored interventions. At present, few studies have combined physiological indexes of the human body and machine learning for psychological resilience assessment. This study presents a novel approach that employs pupil diameter features and machine learning to predict psychological resilience risk objectively. Firstly, we designed a stimulus paradigm (via auditory and visual stimuli) and collected pupil diameter data from participants using eye-tracking technology. Secondly, the pupil data was preprocessed, including linear interpolation, blink detection, and subtractive baseline correction. Thirdly, statistical metrics were extracted and optimal feature subsets were obtained by Recursive Feature Elimination with Cross-Validation (RFECV). Subsequently, the classification models, including Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost), were trained. The experimental results show that the SVM model has the best performance, and its balanced accuracy, recall, and AUC reach 0.906, 0.89, and 0.932, respectively. Finally, we leveraged the Shapley additive explanation (SHAP) model for interpretability analysis. It revealed that auditory stimuli have a more significant effect than visual stimuli in psychological resilience assessment. These findings suggested that pupil diameter could be a vital metric for assessing psychological resilience.
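The modeling stage described in the abstract (RFECV feature selection followed by an SVM) can be sketched with scikit-learn; the features below are synthetic stand-ins, not pupil-diameter measurements:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data: 200 samples, 20 features, 5 informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# RFECV needs an estimator exposing coef_ or feature_importances_,
# so a linear-kernel SVM drives the recursive elimination.
selector = RFECV(SVC(kernel="linear"), step=1, cv=5).fit(X, y)
X_sel = selector.transform(X)

# Train/evaluate the final classifier on the selected feature subset.
scores = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=5)
print(selector.n_features_, round(scores.mean(), 3))
```

This mirrors the pipeline shape only; the paper's preprocessing (interpolation, blink detection, baseline correction) happens before this stage.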

TMLR Journal 2023 Journal Article

A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization

  • Ziyi Chen
  • Zhengyang Hu
  • Qunwei Li
  • Zhe Wang
  • Yi Zhou

Gradient descent-ascent (GDA) is a widely used algorithm for minimax optimization. However, GDA has been proven to converge to stationary points for nonconvex minimax optimization, which are suboptimal compared with local minimax points. In this work, we develop cubic regularization (CR) type algorithms that globally converge to local minimax points in nonconvex-strongly-concave minimax optimization. We first show that local minimax points are equivalent to second-order stationary points of a certain envelope function. Then, inspired by the classic cubic regularization algorithm, we propose an algorithm named Cubic-LocalMinimax for finding local minimax points, and provide a comprehensive convergence analysis by leveraging its intrinsic potential function. Specifically, we establish the global convergence of Cubic-LocalMinimax to a local minimax point at a sublinear convergence rate and characterize its iteration complexity. Also, we propose a GDA-based solver for solving the cubic subproblem involved in Cubic-LocalMinimax up to certain pre-defined accuracy, and analyze the overall gradient and Hessian-vector product computation complexities of such an inexact Cubic-LocalMinimax algorithm. Moreover, we propose a stochastic variant of Cubic-LocalMinimax for large-scale minimax optimization, and characterize its sample complexity under stochastic sub-sampling. Experimental results demonstrate faster or comparable convergence speed of our stochastic Cubic-LocalMinimax than the state-of-the-art algorithms such as GDA and Minimax Cubic-Newton. In particular, our stochastic Cubic-LocalMinimax was also faster than several other algorithms for minimax optimization on a particular adversarial loss for training a convolutional neural network on MNIST.

IJCAI Conference 2023 Conference Paper

A Fast Maximum k-Plex Algorithm Parameterized by the Degeneracy Gap

  • Zhengren Wang
  • Yi Zhou
  • Chunyu Luo
  • Mingyu Xiao

Given a graph, a k-plex is a vertex set in which each vertex is non-adjacent to at most k-1 other vertices in the set. The maximum k-plex problem, which asks for the largest k-plex in a given graph, is an important but computationally challenging problem in applications like graph search and community detection. So far, there are a number of empirical algorithms without sufficient theoretical explanation of their efficiency. We try to bridge this gap by defining a novel parameter of the input instance, g_k(G), the gap between the degeneracy bound and the size of the maximum k-plex in the given graph, and presenting an exact algorithm parameterized by g_k(G). In other words, we design an algorithm whose running time is polynomial in the size of the input graph and exponential in g_k(G), where k is a constant. Usually, g_k(G) is small and bounded by O(log(|V|)) in real-world graphs, indicating that the algorithm runs in polynomial time. We also carry out massive experiments and show that the algorithm is competitive with the state-of-the-art solvers. Additionally, for large k values such as 15 and 20, our algorithm has superior performance over existing algorithms.
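The k-plex definition above is easy to check directly; a minimal sketch on a toy graph (illustrative, not from the paper):

```python
def is_k_plex(adj, S, k):
    """Check whether vertex set S is a k-plex in the graph given by the
    adjacency-set dict `adj`: every vertex of S may be non-adjacent to
    at most k-1 other vertices of S."""
    S = set(S)
    for v in S:
        non_neighbors = sum(1 for u in S if u != v and u not in adj[v])
        if non_neighbors > k - 1:
            return False
    return True

# Toy graph: the 4-cycle 0-1-2-3-0.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_k_plex(adj, {0, 1, 2, 3}, 2))  # each vertex misses one other -> True
print(is_k_plex(adj, {0, 1, 2, 3}, 1))  # a 1-plex is a clique -> False
```

Note that a 1-plex is exactly a clique, so the maximum k-plex problem generalizes maximum clique and inherits its hardness.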

JBHI Journal 2023 Journal Article

A Spatiotemporal Graph Attention Network Based on Synchronization for Epileptic Seizure Prediction

  • Yao Wang
  • Yufei Shi
  • Yinlin Cheng
  • Zhipeng He
  • Xiaoyan Wei
  • Ziyi Chen
  • Yi Zhou

Accurate early prediction of epileptic seizures can provide timely treatment for patients. Previous studies have mainly focused on a single temporal or spatial dimension, making it difficult to take both relationships into account. Therefore, the effective properties of electroencephalograms (EEGs) may not be fully evaluated. To solve this problem, we propose a spatiotemporal graph attention network (STGAT) based on synchronization. The spatial and functional connectivity information between EEG channels was first extracted using phase locking values (PLVs), which allowed multichannel EEG signals to be modeled as graph signals. Afterward, the STGAT model was used to dynamically learn the temporal correlation properties of EEG sequences and explore the spatial topological structure information of multiple channels. Experimental results demonstrated that the STGAT model was able to obtain spatiotemporal correlations and achieve good results on two benchmark datasets. The accuracy, specificity, and sensitivity were 98.74%, 99.21%, and 98.87%, respectively, on the CHB-MIT dataset. Moreover, all evaluation indices on the private dataset reached more than 98.8%, with the area under the curve (AUC) reaching 99.96%. The proposed method is superior or comparable to the state-of-the-art models. Extensive experiments demonstrate that our end-to-end automatic seizure prediction model can be extended to design clinical assistant decision systems.
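The phase locking value mentioned above has a compact formula, $|\mathrm{mean}(e^{i\,\Delta\phi})|$; a minimal sketch on synthetic phase series (in the paper the phases come from EEG channels):

```python
import numpy as np

def plv(phase_x, phase_y):
    """Phase locking value of two phase time series:
    |mean(exp(i*(phase_x - phase_y)))|, a number in [0, 1]."""
    diff = np.asarray(phase_x) - np.asarray(phase_y)
    return np.abs(np.mean(np.exp(1j * diff)))

t = np.linspace(0, 1, 1000, endpoint=False)
locked = 2 * np.pi * 10 * t        # 10 Hz oscillation (instantaneous phase)
lagged = locked + 0.7              # constant phase lag -> perfect locking
rng = np.random.default_rng(0)
random_phase = rng.uniform(0, 2 * np.pi, t.size)

print(plv(locked, lagged))               # -> 1.0 (constant phase difference)
print(plv(locked, random_phase) < 0.2)   # uniform random phases: near 0
```

Computing this PLV for every pair of channels yields the weighted graph over electrodes that the STGAT then operates on.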

TMLR Journal 2023 Journal Article

Assisted Learning for Organizations with Limited Imbalanced Data

  • Cheng Chen
  • Jiaying Zhou
  • Jie Ding
  • Yi Zhou

In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by limited and imbalanced data available to them. In this work, we develop an assisted learning framework for assisting organizations to improve their learning performance. The organizations have sufficient computation resources but are subject to stringent data-sharing and collaboration policies. Their limited imbalanced data often cause biased inference and sub-optimal decision-making. In assisted learning, an organizational learner purchases assistance service from an external service provider and aims to enhance its model performance within only a few assistance rounds. We develop effective stochastic training algorithms for both assisted deep learning and assisted reinforcement learning. Different from existing distributed algorithms that need to frequently transmit gradients or models, our framework allows the learner to only occasionally share information with the service provider, but still obtain a model that achieves near-oracle performance as if all the data were centralized.

JBHI Journal 2023 Journal Article

Blind Super-Resolution of 3D MRI via Unsupervised Domain Transformation

  • Hexiang Zhou
  • Yawen Huang
  • Yuexiang Li
  • Yi Zhou
  • Yefeng Zheng

High-resolution medical images can be effectively used for clinical diagnosis. However, the acquisition of high-resolution images is difficult and often limited by medical instruments. Super-resolution (SR) methods provide a solution, where high-resolution (HR) images can be reconstructed from low-resolution (LR) ones. Most existing deep neural networks for 3D SR of medical images are trained in a non-blind process, where LR images are directly degraded from HR data via a pre-determined downscaling method. Such approaches rely heavily on the assumed degradation model, resulting in inevitable deviations in real clinical practice. Blind super-resolution, a more attractive research line for this field, aims to generate HR images from LR inputs containing unknown degradation. Towards generalizing SR models to diverse types of degradation, we propose a robust blind SR method for 3D medical images that operates in an unsupervised manner with domain correction and upscaling treatment. First, a CycleGAN-based architecture is implemented to generate the LR data from the source domain to the target one for domain correction. Then, an upscaling network is learned via pre-determined HR-LR couples for reconstruction. The proposed framework is able to automatically learn noisy and blurry correction kernels for unpaired 3D SR of magnetic resonance images (MRI). Our method achieves better and more robust performance in reconstructing HR images from LR MRI with multiple unknown degradation processes, and shows its superiority over other state-of-the-art supervised models and cycle-consistency based methods, especially in severe distortion cases.

AAAI Conference 2023 Conference Paper

Cross-View Geo-Localization via Learning Disentangled Geometric Layout Correspondence

  • Xiaohan Zhang
  • Xingyu Li
  • Waqas Sultani
  • Yi Zhou
  • Safwan Wshah

Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a reference database of geo-tagged aerial images. As an extremely challenging task, its difficulties are rooted in the drastic view changes and different capture times between the two views. Despite these difficulties, recent works achieve outstanding progress on cross-view geo-localization benchmarks. However, existing methods still suffer from poor performance on the cross-area benchmarks, in which the training and testing data are captured from two different regions. We attribute this deficiency to the lack of ability to extract the spatial configuration of visual feature layouts and models' overfitting on low-level details from the training set. In this paper, we propose GeoDTR which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors, modulating the raw features and producing high-quality latent representations. In addition, we elaborate on two categories of data augmentations: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact, and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations help to improve the performance of cross-view geo-localization models, especially on the cross-area benchmarks. Moreover, we propose a counterfactual-based learning process to benefit the geometric layout extractor in exploring spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts the performance on same-area and cross-area benchmarks. Our code can be found at https://gitlab.com/vail-uvm/geodtr.

JMLR Journal 2023 Journal Article

Decentralized Robust V-learning for Solving Markov Games with Model Uncertainty

  • Shaocong Ma
  • Ziyi Chen
  • Shaofeng Zou
  • Yi Zhou

The Markov game is a popular reinforcement learning framework for modeling competitive players in a dynamic environment. However, most of the existing works on Markov games focus on computing a certain equilibrium following uncertain interactions among the players but ignore the uncertainty of the environment model, which is ubiquitous in practical scenarios. In this work, we develop a theoretical solution to Markov games with environment model uncertainty. Specifically, we propose a new and tractable notion of robust correlated equilibria for Markov games with environment model uncertainty. In particular, we prove that the robust correlated equilibrium has a simple modification structure, and its characterization of equilibria critically depends on the environment model uncertainty. Moreover, we propose the first fully-decentralized stochastic algorithm for computing such a robust correlated equilibrium. Our analysis proves that the algorithm achieves the polynomial episode complexity $\widetilde{O}( SA^2 H^5 \epsilon^{-2})$ for computing an approximate robust correlated equilibrium with $\epsilon$ accuracy.

JBHI Journal 2023 Journal Article

Flexible Fusion Network for Multi-Modal Brain Tumor Segmentation

  • Hengyi Yang
  • Tao Zhou
  • Yi Zhou
  • Yizhe Zhang
  • Huazhu Fu

Automated brain tumor segmentation is crucial for aiding brain disease diagnosis and evaluating disease progress. Currently, magnetic resonance imaging (MRI) is a routinely adopted approach in the field of brain tumor segmentation that can provide different modality images. It is critical to leverage multi-modal images to boost brain tumor segmentation performance. Existing works commonly concentrate on generating a shared representation by fusing multi-modal data, while few methods take into account modality-specific characteristics. Besides, how to efficiently fuse arbitrary numbers of modalities is still a difficult task. In this study, we present a flexible fusion network (termed F$^{2}$Net) for multi-modal brain tumor segmentation, which can flexibly fuse arbitrary numbers of multi-modal information to explore complementary information while maintaining the specific characteristics of each modality. Our F$^{2}$Net is based on the encoder-decoder structure, which utilizes two Transformer-based feature learning streams and a cross-modal shared learning network to extract individual and shared feature representations. To effectively integrate the knowledge from the multi-modality data, we propose a cross-modal feature-enhanced module (CFM) and a multi-modal collaboration module (MCM), which aim at fusing the multi-modal features into the shared learning network and incorporating the features from the encoders into the shared decoder, respectively. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our F$^{2}$Net over other state-of-the-art segmentation methods.

JMLR Journal 2023 Journal Article

On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error

  • Quang Minh Nguyen
  • Hoang H. Nguyen
  • Yi Zhou
  • Lam M. Nguyen

We study the Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most $n$ components, where the marginal constraints of standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor $\tau$. Although only Sinkhorn-based UOT solvers have been analyzed in the literature with the iteration complexity of ${O}\big(\tfrac{\tau \log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{{\varepsilon}}\big)\big)$ and per-iteration cost of $O(n^2)$ for achieving the desired error $\varepsilon$, their positively dense output transportation plans strongly hinder the practicality. On the other hand, while being vastly used as heuristics for computing UOT in modern deep learning applications and having shown success in the sparse OT problem, gradient methods applied to UOT have not been formally studied. In this paper, we propose a novel algorithm based on Gradient Extrapolation Method (GEM-UOT) to find an $\varepsilon$-approximate solution to the UOT problem in $O\big( \kappa \log\big(\frac{\tau n}{\varepsilon}\big) \big)$ iterations with $\widetilde{O}(n^2)$ per-iteration cost, where $\kappa$ is the condition number depending on only the two input measures. Our proof technique is based on a novel dual formulation of the squared $\ell_2$-norm UOT objective, which fills a gap in the sparse UOT literature and also leads to a new characterization of approximation error between UOT and OT. To this end, we further present a novel approach of OT retrieval from UOT, which is based on GEM-UOT with fine-tuned $\tau$ and a post-process projection step. Extensive experiments on synthetic and real datasets validate our theories and demonstrate the favorable performance of our methods in practice. We showcase GEM-UOT on the task of color transfer in terms of both the quality of the transfer image and the sparsity of the transportation plan.
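For orientation, the standard KL-relaxed UOT objective that such iteration bounds refer to (a textbook formulation, not quoted from the paper) is

$$\mathrm{UOT}_{\tau}(a, b) = \min_{X \ge 0} \; \langle C, X \rangle + \tau\,\mathrm{KL}(X\mathbf{1}\,\|\,a) + \tau\,\mathrm{KL}(X^{\top}\mathbf{1}\,\|\,b),$$

where $C$ is the ground cost matrix and the two KL penalty terms replace the hard marginal constraints of balanced OT, with $\tau$ controlling how strictly the row and column sums of $X$ must match the input measures $a$ and $b$.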

AAAI Conference 2022 Conference Paper

DarkVisionNet: Low-Light Imaging via RGB-NIR Fusion with Deep Inconsistency Prior

  • Shuangping Jin
  • Bingbing Yu
  • Minhao Jing
  • Yi Zhou
  • Jiajun Liang
  • Renhe Ji

RGB-NIR fusion is a promising method for low-light imaging. However, high-intensity noise in low-light images amplifies the effect of structure inconsistency between RGB-NIR images, which causes existing algorithms to fail. To handle this, we propose a new RGB-NIR fusion algorithm called Dark Vision Net (DVN) with two technical novelties: Deep Structure and Deep Inconsistency Prior (DIP). The Deep Structure extracts clear structure details in deep multiscale feature space rather than raw input space, which is more robust to noisy inputs. Based on the deep structures from both RGB and NIR domains, we introduce the DIP to leverage the structure inconsistency to guide the fusion of RGB-NIR. Benefiting from this, the proposed DVN obtains high-quality low-light images without visual artifacts. We also propose a new dataset called Dark Vision Dataset (DVD), consisting of aligned RGB-NIR image pairs, as the first public RGB-NIR fusion benchmark. Quantitative and qualitative results on the proposed benchmark show that DVN significantly outperforms other comparison algorithms in PSNR and SSIM, especially in extremely low light conditions.

IJCAI Conference 2022 Conference Paper

DDDM: A Brain-Inspired Framework for Robust Classification

  • Xiyuan Chen
  • Xingyu Li
  • Yi Zhou
  • Tianming Yang

Despite their outstanding performance in a broad spectrum of real-world tasks, deep artificial neural networks are sensitive to input noises, particularly adversarial perturbations. On the contrary, human and animal brains are much less vulnerable. In contrast to the one-shot inference performed by most deep neural networks, the brain often solves decision-making with an evidence accumulation mechanism that may trade time for accuracy when facing noisy inputs. The mechanism is well described by the Drift-Diffusion Model (DDM). In the DDM, decision-making is modeled as a process in which noisy evidence is accumulated toward a threshold. Drawing inspiration from the DDM, we propose the Dropout-based Drift-Diffusion Model (DDDM) that combines test-phase dropout and the DDM for improving the robustness of arbitrary neural networks. The dropouts create temporally uncorrelated noises in the network that counter perturbations, while the evidence accumulation mechanism guarantees a reasonable decision accuracy. Neural networks enhanced with the DDDM tested in image, speech, and text classification tasks all significantly outperform their native counterparts, demonstrating the DDDM as a task-agnostic defense against adversarial attacks.
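The accumulate-until-threshold idea can be sketched with a toy accumulator; the stochastic draws below stand in for dropout-perturbed forward passes (illustrative only, not the paper's DDDM implementation):

```python
import numpy as np

def dddm_decide(sample_logits, threshold=5.0, max_steps=100):
    """Toy evidence accumulator in the spirit of the DDDM: repeatedly draw
    noisy class scores and sum them until the leading class beats the
    runner-up by `threshold`, trading time for accuracy."""
    acc = None
    for step in range(1, max_steps + 1):
        s = sample_logits()
        acc = s if acc is None else acc + s
        top2 = np.sort(acc)[-2:]          # two largest accumulated scores
        if top2[1] - top2[0] >= threshold:
            return int(np.argmax(acc)), step
    return int(np.argmax(acc)), max_steps

rng = np.random.default_rng(0)
mean = np.array([0.0, 2.0, 0.0])  # class 1 is correct on average
label, steps = dddm_decide(lambda: mean + rng.normal(0.0, 0.5, 3))
print(label, steps)  # single draws are noisy; accumulation settles on class 1
```

In the paper the per-step noise comes from keeping dropout active at test time rather than from an explicit Gaussian, but the stopping rule plays the same role.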

JBHI Journal 2022 Journal Article

DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images

  • Yi Zhou
  • Boyang Wang
  • Xiaodong He
  • Shanshan Cui
  • Ling Shao

Diabetic retinopathy (DR) is a complication of diabetes that severely affects the eyes. It can be graded into five levels of severity according to international protocol. However, optimizing a grading model to have strong generalizability requires a large amount of balanced training data, which is difficult to collect, particularly for the high severity levels. Typical data augmentation methods, including random flipping and rotation, cannot generate data with high diversity. In this paper, we propose a diabetic retinopathy generative adversarial network (DR-GAN) to synthesize high-resolution fundus images which can be manipulated with arbitrary grading and lesion information. Thus, large-scale generated data can be used for more meaningful augmentation to train a DR grading and lesion segmentation model. The proposed retina generator is conditioned on the structural and lesion masks, as well as adaptive grading vectors sampled from the latent grading space, which can be adopted to control the synthesized grading severity. Moreover, a multi-scale spatial and channel attention module is devised to improve the generation ability to synthesize small details. Multi-scale discriminators are designed to operate from large to small receptive fields, and joint adversarial losses are adopted to optimize the whole network in an end-to-end manner. With extensive experiments evaluated on the EyePACS dataset connected to Kaggle, as well as the FGADR dataset, we validate the effectiveness of our method, which can both synthesize highly realistic ($1280 \times 1280$) controllable fundus images and contribute to the DR grading task.

NeurIPS Conference 2022 Conference Paper

Finding Correlated Equilibrium of Constrained Markov Game: A Primal-Dual Approach

  • Ziyi Chen
  • Shaocong Ma
  • Yi Zhou

Constrained Markov game is a fundamental problem that covers many applications, where multiple players compete with each other under behavioral constraints. The existing literature has proved the existence of Nash equilibrium for constrained Markov games, which turns out to be PPAD-complete and cannot be computed in polynomial time. In this work, we propose a surrogate notion of correlated equilibrium (CE) for constrained Markov games that can be computed in polynomial time, and study its fundamental properties. We show that the modification structure of CE of constrained Markov games is fundamentally different from that of unconstrained Markov games. Moreover, we prove that the corresponding Lagrangian function has zero duality gap. Based on this result, we develop the first primal-dual algorithm that provably converges to CE of constrained Markov games. In particular, we prove that both the duality gap and the constraint violation of the output policy converge at the rate $\mathcal{O}(\frac{1}{\sqrt{T}})$. Moreover, when adopting the V-learning algorithm as the subroutine in the primal update, our algorithm achieves an approximate CE with $\epsilon$ duality gap with the sample complexity $\mathcal{O}(H^9|\mathcal{S}||\mathcal{A}|^{2} \epsilon^{-4})$.

TMLR Journal 2022 Journal Article

Multi-Agent Off-Policy TDC with Near-Optimal Sample and Communication Complexities

  • Ziyi Chen
  • Yi Zhou
  • Rong-Rong Chen

The finite-time convergence of off-policy temporal difference (TD) learning has been comprehensively studied recently. However, such a type of convergence has not been established for off-policy TD learning in the multi-agent setting, which covers broader reinforcement learning applications and is fundamentally more challenging. This work develops a decentralized TD with correction (TDC) algorithm for multi-agent off-policy TD learning under Markovian sampling. In particular, our algorithm avoids sharing the actions, policies and rewards of the agents, and adopts mini-batch sampling to reduce the sampling variance and communication frequency. Under Markovian sampling and linear function approximation, we proved that the finite-time sample complexity of our algorithm for achieving an $\epsilon$-accurate solution is in the order of $\mathcal{O}\big(\frac{M\ln\epsilon^{-1}}{\epsilon(1-\sigma_2)^2}\big)$, where $M$ denotes the total number of agents and $\sigma_2$ is a network parameter. This matches the sample complexity of the centralized TDC. Moreover, our algorithm achieves the optimal communication complexity $\mathcal{O}\big(\frac{\sqrt{M}\ln\epsilon^{-1}}{1-\sigma_2}\big)$ for synchronizing the value function parameters, which is order-wise lower than the communication complexity of the existing decentralized TD(0). Numerical simulations corroborate our theoretical findings.

NeurIPS Conference 2022 Conference Paper

NeMF: Neural Motion Fields for Kinematic Animation

  • Chengan He
  • Jun Saito
  • James Zachary
  • Holly Rushmeier
  • Yi Zhou

We present an implicit neural representation to learn the spatio-temporal space of kinematic motions. Unlike previous work that represents motion as discrete sequential samples, we propose to express the vast motion space as a continuous function over time, hence the name Neural Motion Fields (NeMF). Specifically, we use a neural network to learn this function for miscellaneous sets of motions, which is designed to be a generative model conditioned on a temporal coordinate $t$ and a random vector $z$ for controlling the style. The model is then trained as a Variational Autoencoder (VAE) with motion encoders to sample the latent space. We train our model with a diverse human motion dataset and quadruped dataset to prove its versatility, and finally deploy it as a generic motion prior to solve task-agnostic problems and show its superiority in different motion generation and editing applications, such as motion interpolation, in-betweening, and re-navigating. More details can be found on our project page: https://cs.yale.edu/homes/che/projects/nemf/.

NeurIPS Conference 2022 Conference Paper

Regularized Molecular Conformation Fields

  • Lihao Wang
  • Yi Zhou
  • Yiqun Wang
  • Xiaoqing Zheng
  • Xuanjing Huang
  • Hao Zhou

Predicting energetically favorable 3-dimensional conformations of organic molecules from a molecular graph plays a fundamental role in computer-aided drug discovery research. However, effectively exploring the high-dimensional conformation space to identify (meta) stable conformers is anything but trivial. In this work, we introduce RMCF, a novel framework to generate a diverse set of low-energy molecular conformations through sampling from a regularized molecular conformation field. We develop a data-driven molecular segmentation algorithm to automatically partition each molecule into several structural building blocks to reduce the modeling degrees of freedom. Then, we employ a Markov Random Field to learn the joint probability distribution of fragment configurations and inter-fragment dihedral angles, which enables us to sample from different low-energy regions of a conformation space. Our model consistently outperforms state-of-the-art models on the conformation generation task on the GEOM-Drugs dataset. We attribute the success of RMCF to modeling in a regularized feature space and learning a global fragment configuration distribution for effective sampling. The proposed method could be generalized to deal with larger biomolecular systems.

IJCAI Conference 2022 Conference Paper

SHAPE: An Unified Approach to Evaluate the Contribution and Cooperation of Individual Modalities

  • Pengbo Hu
  • Xingyu Li
  • Yi Zhou

As deep learning advances, there is an ever-growing demand for models capable of synthesizing information from multi-modal resources to address the complex tasks arising in real-life applications. Recently, many large multi-modal datasets have been collected, on which researchers actively explore different methods of fusing multi-modal information. However, little attention has been paid to quantifying the contribution of different modalities within the proposed models. In this paper, we propose the SHapley vAlue-based PErceptual (SHAPE) scores that measure the marginal contribution of individual modalities and the degree of cooperation across modalities. Using these scores, we systematically evaluate different fusion methods on different multi-modal datasets for different tasks. Our experiments suggest that for some tasks where different modalities are complementary, the multi-modal models still tend to use the dominant modality alone and ignore the cooperation across modalities. On the other hand, models learn to exploit cross-modal cooperation when different modalities are indispensable for the task. In this case, the scores indicate it is better to fuse different modalities at relatively early stages. We hope our scores can help improve the understanding of how the present multi-modal models operate on different modalities and encourage more sophisticated methods of integrating multiple modalities.
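
For a small number of modalities, the Shapley value underlying such scores can be computed exactly by enumerating subsets. The performance table below is a toy lookup with illustrative numbers, not the paper's perceptual score:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: each player's marginal contribution averaged over
    all orderings; v maps a frozenset of players to a performance score."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                S = frozenset(S)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v(S | {p}) - v(S))
        phi[p] = total
    return phi

# Toy performance of a two-modality model on subsets of modalities:
scores = {frozenset(): 0.0,
          frozenset({"image"}): 0.7,
          frozenset({"text"}): 0.5,
          frozenset({"image", "text"}): 0.9}
phi = shapley_values(["image", "text"], scores.__getitem__)
```

By the efficiency property, the per-modality shares sum to the full model's score, which is what makes them interpretable as contributions.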

JBHI Journal 2022 Journal Article

Speckle Noise Reduction for OCT Images Based on Image Style Transfer and Conditional GAN

  • Yi Zhou
  • Kai Yu
  • Meng Wang
  • Yuhui Ma
  • Yuanyuan Peng
  • Zhongyue Chen
  • Weifang Zhu
  • Fei Shi

Raw optical coherence tomography (OCT) images typically are of low quality because speckle noise blurs retinal structures, severely compromising visual quality and degrading performances of subsequent image analysis tasks. In our previous study (Ma et al., 2018), we have developed a Conditional Generative Adversarial Network (cGAN) for speckle noise removal in OCT images collected by several commercial OCT scanners, which we collectively refer to as scanner T. In this paper, we improve the cGAN model and apply it to our in-house OCT scanner (scanner B) for speckle noise suppression. The proposed model consists of two steps: 1) We train a Cycle-Consistent GAN (CycleGAN) to learn style transfer between two OCT image datasets collected by different scanners. The purpose of the CycleGAN is to leverage the ground truth dataset created in our previous study. 2) We train a mini-cGAN model based on the PatchGAN mechanism with the ground truth dataset to suppress speckle noise in OCT images. After training, we first apply the CycleGAN model to convert raw images collected by scanner B to match the style of the images from scanner T, and subsequently use the mini-cGAN model to suppress speckle noise in the style transferred images. We evaluate the proposed method on a dataset collected by scanner B. Experimental results show that the improved model outperforms our previous method and other state-of-the-art models in speckle noise removal, retinal structure preservation and contrast enhancement.

AAAI Conference 2022 Conference Paper

UNISON: Unpaired Cross-Lingual Image Captioning

  • Jiahui Gao
  • Yi Zhou
  • Philip L. H. Yu
  • Shafiq Joty
  • Jiuxiang Gu

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (i) a cross-lingual auto-encoding process, which utilizes a sentence-parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and to decode sentences in the target language, and (ii) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from the image modality to the language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach.

NeurIPS Conference 2022 Conference Paper

Zero-Shot 3D Drug Design by Sketching and Generating

  • Siyu Long
  • Yi Zhou
  • Xinyu Dai
  • Hao Zhou

Drug design is a crucial step in the drug discovery cycle. Recently, various deep learning-based methods design drugs by generating novel molecules from scratch, avoiding traversing large-scale drug libraries. However, they depend on scarce experimental data or time-consuming docking simulation, leading to overfitting issues with limited training data and slow generation speed. In this study, we propose the zero-shot drug design method DESERT (Drug dEsign by SkEtching and geneRaTing). Specifically, DESERT splits the design process into two stages: sketching and generating, and bridges them with the molecular shape. The two-stage fashion enables our method to utilize the large-scale molecular database to reduce the need for experimental data and docking simulation. Experiments show that DESERT achieves a new state-of-the-art at a fast speed.

AAAI Conference 2021 Conference Paper

Curse or Redemption? How Data Heterogeneity Affects the Robustness of Federated Learning

  • Syed Zawad
  • Ahsan Ali
  • Pin-Yu Chen
  • Ali Anwar
  • Yi Zhou
  • Nathalie Baracaldo
  • Yuan Tian
  • Feng Yan

Data heterogeneity has been identified as one of the key features of federated learning, but it is often overlooked through the lens of robustness to adversarial attacks. This paper focuses on characterizing and understanding its impact on backdooring attacks in federated learning through comprehensive experiments using synthetic data and the LEAF benchmarks. The initial impression driven by our experimental results suggests that data heterogeneity is the dominant factor in the effectiveness of attacks, and that it may be a redemption for defending against backdooring: it makes the attack less efficient, effective attack strategies more challenging to design, and the attack results less predictable. However, with further investigation, we found data heterogeneity to be more of a curse than a redemption, as the attack effectiveness can be significantly boosted by simply adjusting the client-side backdooring timing. More importantly, data heterogeneity may result in overfitting during the local training of benign clients, which can be exploited by attackers to disguise themselves and fool skewed-feature-based defenses. In addition, effective attack strategies can be made by adjusting the attack data distribution. Finally, we discuss potential directions for defending against the curses brought by data heterogeneity. The results and lessons learned from our extensive experiments and analysis offer new insights for designing robust federated learning methods and systems.

AAAI Conference 2021 Conference Paper

Enhancing Balanced Graph Edge Partition with Effective Local Search

  • Zhenyu Guo
  • Mingyu Xiao
  • Yi Zhou
  • Dongxiang Zhang
  • Kian-Lee Tan

Graph partition is a key component to achieve workload balance and reduce job completion time in parallel graph processing systems. Among the various partition strategies, edge partition has demonstrated more promising performance in power-law graphs than vertex partition and thereby has been more widely adopted as the default partition strategy by existing graph systems. The graph edge partition problem, which is to split the edge set into multiple balanced parts to minimize the total number of copied vertices, has been widely studied from the view of optimization and algorithms. In this paper, we study local search algorithms for this problem to further improve the partition results from existing methods. More specifically, we propose two novel concepts, namely adjustable edges and blocks. Based on these, we develop a greedy heuristic as well as an improved search algorithm utilizing the property of the max-flow model. To evaluate the performance of our algorithms, we first provide adequate theoretical analysis in terms of the approximation quality. We significantly improve the previously known approximation ratio for this problem. Then we conduct extensive experiments on a large number of benchmark datasets and state-of-the-art edge partition strategies. The results show that our proposed local search framework can further improve the quality of graph partition by a wide margin.
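
The objective of edge partition, the total number of copied vertices, is easy to state in code. This counter is an illustrative sketch of the objective only, not the paper's local search:

```python
from collections import defaultdict

def vertex_copies(edge_parts):
    """Given an edge partition (a list of edge lists), count the total number
    of vertex copies: a vertex is replicated once for each part in which one
    of its edges appears. Balanced edge partition minimizes this count."""
    parts_of = defaultdict(set)
    for pid, edges in enumerate(edge_parts):
        for u, v in edges:
            parts_of[u].add(pid)
            parts_of[v].add(pid)
    return sum(len(p) for p in parts_of.values())

# Triangle 1-2-3 plus edge 3-4, split into two parts; vertices 1 and 3 end up
# replicated in both parts, vertices 2 and 4 in one part each.
parts = [[(1, 2), (2, 3)], [(1, 3), (3, 4)]]
```

A local search move that relocates an edge between parts changes this count only through the endpoints of the moved edge, which is what makes such moves cheap to evaluate.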

AAAI Conference 2021 Conference Paper

Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

  • Xueyi Li
  • Tianfei Zhou
  • Jianwu Li
  • Yi Zhou
  • Zhaoxiang Zhang

Acquiring sufficient ground-truth supervision to train deep visual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmentation, which requires pixel-level annotations. This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes, and the underlying relations between a pair of images are characterized by an efficient co-attention mechanism. Moreover, in order to prevent the model from paying excessive attention to common semantics only, we further propose a graph dropout layer, encouraging the model to learn more accurate and complete object responses. The whole network is end-to-end trainable by iterative message passing, which propagates interaction cues over the images to progressively improve the performance. We conduct experiments on the popular PASCAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance. Our code is available at: https://github.com/Lixy1997/Group-WSSS.

AAAI Conference 2021 Conference Paper

Improving Maximum k-plex Solver via Second-Order Reduction and Graph Color Bounding

  • Yi Zhou
  • Shan Hu
  • Mingyu Xiao
  • Zhang-Hua Fu

In a graph, a k-plex is a vertex set in which each vertex is non-adjacent to at most k vertices of the set. The maximum k-plex problem, which asks for the largest k-plex in a given graph, is a key primitive in a variety of real-world applications such as community detection. In this paper, we develop an exact algorithm, Maplex, for solving this problem practically on real-world graphs. Based on the existing first-order and the novel second-order reduction rules, we design a powerful preprocessing method which efficiently removes redundant vertices and edges for Maplex. Also, the graph color heuristic is widely used for overestimating the size of the maximum clique of a graph. For the first time, we generalize this technique to bound the size of the maximum k-plex in Maplex. Experiments are carried out to compare our algorithm with other state-of-the-art solvers on a wide range of publicly available graphs. Maplex outperforms all other algorithms on large real-world graphs and is competitive with existing solvers on artificial dense graphs. Finally, we shed light on the effectiveness of each key component of Maplex.
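
The k-plex relaxation at the heart of the problem can be pinned down in a few lines. This checker is an illustrative sketch of the definition, not part of the Maplex solver:

```python
def is_k_plex(adj, S, k):
    """Check whether S is a k-plex of the graph `adj` (dict: vertex -> set of
    neighbours): every vertex of S must be adjacent to at least |S| - k
    vertices of S, i.e. non-adjacent to at most k of them (itself included)."""
    S = set(S)
    return all(len(S & adj[v]) >= len(S) - k for v in S)

# 4-cycle 0-1-2-3-0: the whole vertex set is a 2-plex but not a clique (1-plex).
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

With k = 1 the definition collapses to a clique, which is why k-plexes are a natural relaxation for noisy community data.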

AAAI Conference 2021 Conference Paper

Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification

  • Yi Zhou
  • Lei Huang
  • Tianfei Zhou
  • Ling Shao

Chest X-rays are an important and accessible clinical imaging tool for the detection of many thoracic diseases. Over the past decade, deep learning, with a focus on the convolutional neural network (CNN), has become the most powerful computer-aided diagnosis technology for improving disease identification performance. However, training an effective and robust deep CNN usually requires a large amount of data with high annotation quality. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. Thus, existing public chest X-ray datasets usually adopt language pattern based methods to automatically mine labels from reports. However, this results in label uncertainty and inconsistency. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods from two perspectives to improve a single model’s disease identification performance, rather than focusing on an ensemble of models. MODL integrates multiple models to obtain a soft label distribution for optimizing the single target model, which can reduce the effects of original label uncertainty. Moreover, KNNS aims to enhance the robustness of the target model to provide consistent predictions on images with similar medical findings. Extensive experiments on the public NIH Chest X-ray and CheXpert datasets show that our model achieves consistent improvements over the state-of-the-art methods.
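
The neighbour-smoothing idea, making a sample's soft label agree with its nearest neighbours in feature space, can be sketched as follows. The blending weight `alpha` and the use of raw labels instead of model predictions are simplifying assumptions, not the paper's exact formulation:

```python
import numpy as np

def knn_smooth(features, labels, k=2, alpha=0.5):
    """Hypothetical sketch of K-nearest-neighbour label smoothing: blend each
    sample's soft label with the mean label of its k nearest neighbours in
    feature space, encouraging consistent predictions on similar images."""
    X = np.asarray(features, dtype=float)
    Y = np.asarray(labels, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    return (1 - alpha) * Y + alpha * Y[nn].mean(axis=1)

# Two nearby samples with conflicting labels are pulled towards each other,
# while the far-away third sample only borrows from its single nearest point.
smoothed = knn_smooth([[0.0], [0.1], [5.0]], [[1, 0], [0, 1], [1, 1]], k=1)
```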

NeurIPS Conference 2021 Conference Paper

Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

  • Yue Wang
  • Shaofeng Zou
  • Yi Zhou

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. This algorithm was initially proposed with linear function approximation, and was later extended to the one with general smooth function approximation. The asymptotic convergence for the on-policy setting with general smooth function approximation was established in [Bhatnagar et al., 2009]; however, the non-asymptotic convergence analysis remains unsolved due to challenges in the non-linear and two-time-scale update structure, non-convex objective function and the projection onto a time-varying tangent plane. In this paper, we develop novel techniques to address the above challenges and explicitly characterize the non-asymptotic error bound for the general off-policy setting with i.i.d. or Markovian samples, and show that it converges as fast as $\mathcal O(1/\sqrt T)$ (up to a factor of $\mathcal O(\log T)$). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation.

NeurIPS Conference 2020 Conference Paper

A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning

  • Bhavya Kailkhura
  • Jayaraman Thiagarajan
  • Qunwei Li
  • Jize Zhang
  • Yi Zhou
  • Timo Bremer

In this paper, we present a statistical mechanics framework to understand the effect of sampling properties of training data on the generalization gap of machine learning (ML) algorithms. We connect the generalization gap to the spatial properties of a sample design characterized by the pair correlation function (PCF). In particular, we express generalization gap in terms of the power spectra of the sample design and that of the function to be learned. Using this framework, we show that space-filling sample designs, such as blue noise and Poisson disk sampling, which optimize spectral properties, outperform random designs in terms of the generalization gap and characterize this gain in a closed-form. Our analysis also sheds light on design principles for constructing optimal task-agnostic sample designs that minimize the generalization gap. We corroborate our findings using regression experiments with neural networks on: a) synthetic functions, and b) a complex scientific simulator for inertial confinement fusion (ICF).
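
Poisson disk sampling, one of the space-filling designs the analysis favors, can be generated by naive dart throwing. This sketch ignores the efficient grid-based algorithms used in practice and is only meant to make the design concrete:

```python
import math
import random

def poisson_disk(n, r, seed=0):
    """Naive dart-throwing Poisson disk sampler on the unit square: accept a
    candidate point only if it keeps distance at least r from every point
    accepted so far. Space-filling designs like this suppress low-frequency
    energy in the sample's power spectrum compared with random designs."""
    rng = random.Random(seed)
    pts = []
    trials = 0
    while len(pts) < n and trials < 100_000:
        p = (rng.random(), rng.random())
        if all(math.dist(p, q) >= r for q in pts):
            pts.append(p)
        trials += 1
    return pts
```

The minimum-distance constraint is exactly what shapes the pair correlation function that the framework connects to the generalization gap.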

AAAI Conference 2020 System Paper

Embedding High-Level Knowledge into DQNs to Learn Faster and More Safely

  • Zihang Gao
  • Fangzhen Lin
  • Yi Zhou
  • Hao Zhang
  • Kaishun Wu
  • Haodi Zhang

Deep reinforcement learning has been successfully applied in many decision-making scenarios. However, its slow training process and the difficulty of explaining its decisions limit its application. In this paper, we attempt to address some of these problems by proposing a framework of Rule-interposing Learning (RIL) that embeds knowledge into deep reinforcement learning. In this framework, the rules dynamically affect the training progress and accelerate the learning. The embedded knowledge in the form of rules not only improves learning efficiency, but also prevents unnecessary or disastrous explorations at the early stage of training. Moreover, the modularity of the framework makes it straightforward to transfer high-level knowledge among similar tasks.
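
One simple way to realize rule interposing is as an action filter in front of a greedy policy. The rule and state encoding below are hypothetical illustrations, not the paper's framework:

```python
def rule_interposed_action(q_values, state, rules):
    """Sketch of rule interposing: high-level rules veto actions judged unsafe
    or useless in the current state, and the best-scoring permitted action is
    chosen; with no compliant action we fall back to the plain greedy choice."""
    allowed = [a for a in range(len(q_values))
               if all(rule(state, a) for rule in rules)]
    candidates = allowed or range(len(q_values))
    return max(candidates, key=q_values.__getitem__)

# Hypothetical rule: action 2 ("jump") is forbidden whenever the agent is
# near a cliff, regardless of how high its Q-value is.
no_jump_near_cliff = lambda state, a: not (state.get("near_cliff") and a == 2)
```

Because the rules only prune the action set, the underlying DQN training loop is unchanged, which is what makes the knowledge easy to transfer between similar tasks.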

AAAI Conference 2020 Conference Paper

Enumerating Maximal k-Plexes with Worst-Case Time Guarantee

  • Yi Zhou
  • Jingwei Xu
  • Zhenyu Guo
  • Mingyu Xiao
  • Yan Jin

The problem of enumerating all maximal cliques in a graph is a key primitive in a variety of real-world applications such as community detection. However, in practice, communities are rarely formed as cliques due to data noise. Hence, the k-plex, a subgraph in which any vertex is adjacent to all but at most k vertices, is introduced as a relaxation of the clique. In this paper, we investigate the problem of enumerating all maximal k-plexes and present FaPlexen, an enumeration algorithm which integrates the “pivot” heuristic and new branching schemes. To the best of our knowledge, FaPlexen is the first to list all maximal k-plexes with a provable worst-case running time of $O(n^2\gamma^n)$ in a graph with $n$ vertices, where $\gamma < 2$. Then, we propose another algorithm, CommuPlex, which non-trivially extends FaPlexen to find all maximal k-plexes of prescribed size for community detection in massive real-life networks. We finally carry out experiments on both real and synthetic graphs and demonstrate that our algorithms run much faster than the state-of-the-art algorithms.

NeurIPS Conference 2020 Conference Paper

Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

  • Yi Zhou
  • Chenglei Wu
  • Zimo Li
  • Chen Cao
  • Yuting Ye
  • Jason Saragih
  • Hao Li
  • Yaser Sheikh

Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and are not applicable to more general meshes, like tetrahedrons and non-manifold meshes. While more general graph convolution methods can be employed, they lack performance in reconstruction precision and require higher memory usage. In this paper, we propose a non-template-specific fully convolutional mesh autoencoder for arbitrary registered mesh data. It is enabled by our novel convolution and (un)pooling operators learned with globally shared weights and locally varying coefficients which can efficiently capture the spatially varying contents presented by irregular mesh connections. Our model outperforms state-of-the-art methods on reconstruction accuracy. In addition, the latent codes of our network are fully localized thanks to the fully convolutional structure, and thus have much higher interpolation capability than many traditional 3D mesh generation models.

AAAI Conference 2020 Conference Paper

Motion-Attentive Transition for Zero-Shot Video Object Segmentation

  • Tianfei Zhou
  • Shunzhou Wang
  • Yi Zhou
  • Yazhou Yao
  • Jianwu Li
  • Ling Shao

In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multilevel encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and YouTube-Objects) show that our model achieves compelling performance against state-of-the-art methods. Code is available at: https://github.com/tfzhou/MATNet.

IJCAI Conference 2020 Conference Paper

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

  • Yi Zhou
  • Zhe Wang
  • Kaiyi Ji
  • Yingbin Liang
  • Vahid Tarokh

Various types of parameter restart schemes have been proposed for the proximal gradient algorithm with momentum to facilitate its convergence in convex optimization. However, under parameter restart, the convergence of the proximal gradient algorithm with momentum remains obscure in nonconvex optimization. In this paper, we propose a novel proximal gradient algorithm with momentum and parameter restart for solving nonconvex and nonsmooth problems. Our algorithm is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the convergence and effectiveness of our proposed algorithm.
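
A minimal instance of proximal gradient with momentum and a gradient-mapping restart test, applied to a one-dimensional lasso-style problem. This is a generic sketch of the algorithm family with one common restart rule, not the paper's exact schemes:

```python
import numpy as np

def apg_restart(grad, prox, x0, step, iters=200):
    """Accelerated proximal gradient with a gradient-mapping restart: the
    momentum is reset whenever the new update direction disagrees with the
    direction of the previous step."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(iters):
        x_new = prox(y - step * grad(y), step)
        if np.dot(y - x_new, x_new - x) > 0:   # restart test fires
            t, y = 1.0, x
            x_new = prox(y - step * grad(y), step)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy nonsmooth problem: minimize 0.5*(x - 3)^2 + 0.5*|x|, minimizer x = 2.5.
# The prox of 0.5*|x| with stepsize s is soft thresholding at level 0.5*s.
soft = lambda z, s: np.sign(z) * np.maximum(np.abs(z) - 0.5 * s, 0.0)
x_star = apg_restart(lambda x: x - 3.0, soft, np.array([0.0]), step=1.0)
```

The restart test needs no function values, which is what makes such schemes cheap to interpose in the momentum loop.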

NeurIPS Conference 2020 Conference Paper

Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

  • Shaocong Ma
  • Yi Zhou
  • Shaofeng Zou

Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, the existing work applied variance reduction to either the less popular one time-scale TD algorithm or the two time-scale GTD algorithm but with a finite number of i.i.d. samples, and both algorithms apply to only the on-policy setting. In this work, we develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting and analyze its non-asymptotic convergence rate over both i.i.d. and Markovian samples. In the i.i.d. setting, our algorithm achieves an improved sample complexity $\mathcal{O}(\epsilon^{-\frac{3}{5}} \log{\epsilon}^{-1})$ over the state-of-the-art result $\mathcal{O}(\epsilon^{-1} \log {\epsilon}^{-1})$. In the Markovian setting, our algorithm achieves the state-of-the-art sample complexity $\mathcal{O}(\epsilon^{-1} \log {\epsilon}^{-1})$ that is near-optimal. Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller asymptotic convergence error than both the conventional TDC and the variance-reduced TD.

NeurIPS Conference 2019 Conference Paper

A unified variance-reduced accelerated gradient method for convex optimization

  • Guanghui Lan
  • Zhize Li
  • Yi Zhou

We propose a novel randomized incremental gradient algorithm, namely, VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization. Equipped with a unified step-size policy that adjusts itself to the value of the condition number, Varag exhibits the unified optimal rates of convergence for solving smooth convex finite-sum problems directly, regardless of their strong convexity. Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence. It also establishes an optimal linear rate of convergence for solving a wide class of problems only satisfying a certain error bound condition rather than strong convexity. Varag can also be extended to solve stochastic finite-sum problems.

IJCAI Conference 2019 Conference Paper

How Well Do Machines Perform on IQ tests: a Comparison Study on a Large-Scale Dataset

  • Yusen Liu
  • Fangyuan He
  • Haodi Zhang
  • Guozheng Rao
  • Zhiyong Feng
  • Yi Zhou

AI benchmarking is becoming an increasingly important task. As suggested by many researchers, Intelligence Quotient (IQ) tests, which are widely regarded as one of the predominant benchmarks for measuring human intelligence, raise an interesting challenge for AI systems. To better solve IQ tests automatically by machines, one needs to use, combine and advance many areas of AI, including knowledge representation and reasoning, machine learning, natural language processing and image understanding. Also, automated IQ testing provides an ideal testbed for integrating symbolic and sub-symbolic approaches, as both are found useful here. Hence, we argue that IQ tests, although not suitable for testing machine intelligence, provide an excellent benchmark for the current development of AI research. Nevertheless, most existing IQ test datasets are not comprehensive enough for this purpose. As a result, the conclusions obtained are not representative. To address this issue, we create IQ10k, a large-scale dataset that contains more than 10,000 IQ test questions. We also conduct a comparison study on IQ10k with a number of state-of-the-art approaches.

NeurIPS Conference 2019 Conference Paper

SpiderBoost and Momentum: Faster Variance Reduction Algorithms

  • Zhe Wang
  • Kaiyi Ji
  • Yi Zhou
  • Yingbin Liang
  • Vahid Tarokh

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down the convergence in practice, and cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $O(\min\{n^{1/2}\epsilon^{-2}, \epsilon^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $O(\min\{n^{1/6}, \epsilon^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.
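
The SPIDER-style recursive estimator with a constant stepsize can be sketched on a toy finite sum. This is a generic illustration of the estimator, not the paper's proximal or momentum variants:

```python
import numpy as np

def spiderboost(grads, x0, step=0.5, epochs=5, q=None, rng=None):
    """SpiderBoost sketch: the recursive variance-reduced estimator
    v_k = grad_i(x_k) - grad_i(x_{k-1}) + v_{k-1}, refreshed with a full
    gradient every q steps, driven with a constant-level stepsize."""
    rng = rng or np.random.default_rng(0)
    n = len(grads)
    q = q or n
    x = np.asarray(x0, dtype=float)
    for k in range(epochs * n):
        if k % q == 0:
            v = sum(g(x) for g in grads) / n   # periodic full-gradient refresh
        else:
            i = rng.integers(n)                # single-sample correction step
            v = grads[i](x) - grads[i](x_prev) + v
        x_prev, x = x, x - step * v
    return x

# Toy finite sum: f(x) = mean_i 0.5 * (x - a_i)^2, whose minimizer is mean(a).
a = [1.0, 2.0, 3.0, 6.0]
grads = [lambda x, ai=ai: x - ai for ai in a]
x_min = spiderboost(grads, 0.0)
```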

AAAI Conference 2018 Conference Paper

Attention-based Belief or Disbelief Feature Extraction for Dependency Parsing

  • Haoyuan Peng
  • Lu Liu
  • Yi Zhou
  • Junying Zhou
  • Xiaoqing Zheng

Existing neural dependency parsers usually encode each word in a sentence with bi-directional LSTMs, and estimate the score of an arc from the LSTM representations of the head and the modifier, possibly missing relevant context information for the arc being considered. In this study, we propose a neural feature extraction method that learns to extract arc-specific features. We apply a neural network-based attention method to collect evidence for and against each possible head-modifier pair, with which our model computes certainty scores of belief and disbelief, and determines the final arc score by subtracting the score of disbelief from that of belief. By explicitly introducing two kinds of evidence, the arc candidates can compete against each other based on more relevant information, especially for the cases where they share the same head or modifier. This makes it possible to better discriminate between two or more competing arcs by presenting their rivals (disbelief evidence). Experiments on various datasets show that our arc-specific feature extraction mechanism significantly improves the performance of bi-directional LSTM-based models by explicitly modeling long-distance dependencies. For both English and Chinese, the proposed model achieves higher accuracy on the dependency parsing task than most existing neural attention-based models.

NeurIPS Conference 2018 Conference Paper

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

  • Yi Zhou
  • Zhe Wang
  • Yingbin Liang

Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees to produce a second-order stationary solution for solving nonconvex optimization problems. However, existing understandings of the convergence rate of CR are conditioned on special types of geometrical properties of the objective function. In this paper, we explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Łojasiewicz (KL) property of nonconvex objective functions. Specifically, we characterize the asymptotic convergence rate of various types of optimality measures for CR, including function value gap, variable distance gap, gradient norm and least eigenvalue of the Hessian matrix. Our results fully characterize the diverse convergence behaviors of these optimality measures in the full parameter regime of the KL property. Moreover, we show that the obtained asymptotic convergence rates of CR are order-wise faster than those of first-order gradient descent algorithms under the KL property.

JMLR Journal 2018 Journal Article

Distributed Proximal Gradient Algorithm for Partially Asynchronous Computer Clusters

  • Yi Zhou
  • Yingbin Liang
  • Yaoliang Yu
  • Wei Dai
  • Eric P. Xing

With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile distributed algorithm has become vital for the success of many large-scale machine learning applications. In this work we propose m-PAPG, an implementation of the flexible proximal gradient algorithm in model parallel systems equipped with the partially asynchronous communication protocol. The worker machines communicate asynchronously with a controlled staleness bound $s$ and operate at different frequencies. We characterize various convergence properties of m-PAPG: 1) Under a general non-smooth and non-convex setting, we prove that every limit point of the sequence generated by m-PAPG is a critical point of the objective function; 2) Under an error bound condition of convex objective functions, we prove that the optimality gap decays linearly for every $s$ steps; 3) Under the Kurdyka-Łojasiewicz inequality and a sufficient decrease assumption, we prove that the sequences generated by m-PAPG converge to the same critical point, provided that a proximal Lipschitz condition is satisfied.

AAAI Conference 2018 Conference Paper

RNN-Based Sequence-Preserved Attention for Dependency Parsing

  • Yi Zhou
  • Junying Zhou
  • Lu Liu
  • Jiangtao Feng
  • Haoyuan Peng
  • Xiaoqing Zheng

Recurrent neural networks (RNNs) combined with an attention mechanism have proved to be useful for various NLP tasks including machine translation, sequence labeling and syntactic parsing. The attention mechanism is usually applied by estimating the weights (or importance) of inputs and taking the weighted sum of inputs as derived features. Although such features have demonstrated their effectiveness, they may fail to capture the sequence information due to the simple weighted sum being used to produce them. The order of the words does matter to the meaning or the structure of the sentences, especially for syntactic parsing, which aims to recover the structure from a sequence of words. In this study, we propose an RNN-based attention to capture the relevant and sequence-preserved features from a sentence, and use the derived features to perform the dependency parsing. We evaluated the graph-based and transition-based parsing models enhanced with the RNN-based sequence-preserved attention on both the English PTB and Chinese CTB datasets. The experimental results show that the enhanced systems achieved a significant increase in parsing accuracy.

IJCAI Conference 2017 Conference Paper

Integrating Answer Set Programming with Semantic Dictionaries for Robot Task Planning

  • Dongcai Lu
  • Yi Zhou
  • Feng Wu
  • Zhao Zhang
  • Xiaoping Chen

In this paper, we propose a novel integrated task planning system for service robots in domestic domains. Given open-ended high-level user instructions in natural language, robots need to generate a plan, i.e., a sequence of low-level executable actions, to complete the required tasks. To address this, we exploit the knowledge of the semantic roles of common verbs defined in semantic dictionaries such as FrameNet and integrate it with Answer Set Programming --- a task planning framework with both a representation language and solvers. In the experiments, we evaluated our approach using common benchmarks on service tasks and showed that it can successfully handle many more tasks than the state-of-the-art solution. Notably, we deployed the proposed planning system on our service robot for the annual RoboCup@Home competitions and achieved very encouraging results.

NeurIPS Conference 2015 Conference Paper

Analysis of Robust PCA via Local Incoherence

  • Huishuai Zhang
  • Yi Zhou
  • Yingbin Liang

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank matrix and a sparse error matrix via the convex program Principal Component Pursuit (PCP). In contrast to previous studies that assume the support of the error matrix is generated by uniform Bernoulli sampling, we allow non-uniform sampling, i.e., entries of the low-rank matrix are corrupted by errors with unequal probabilities. We characterize conditions on the error corruption of each individual entry, based on the local incoherence of the low-rank matrix, under which correct matrix decomposition by PCP is guaranteed. Such a refined analysis of robust PCA captures how robustly each entry of the low-rank matrix combats error corruption. In order to deal with non-uniform error corruption, our technical proof introduces a new weighted norm and develops/exploits the concentration properties that such a norm satisfies.
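
For reference, the PCP program named in the abstract is the standard convex relaxation that trades a nuclear-norm term against an entrywise $\ell_1$ term. The formulation and the weight $\lambda$ below are the usual textbook choices, stated here as background rather than taken from this abstract:

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M,
```

where $M \in \mathbb{R}^{n_1 \times n_2}$ is the observed matrix, $\|L\|_{*}$ is the nuclear norm (sum of singular values), $\|S\|_{1}$ is the entrywise $\ell_1$ norm, and a common choice is $\lambda = 1/\sqrt{\max(n_1, n_2)}$. The paper's contribution is to relax the uniform-sampling assumption on the support of $S$ under which this program provably recovers $(L, S)$.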

IJCAI Conference 2015 Conference Paper

First-Order Disjunctive Logic Programming vs Normal Logic Programming

  • Yi Zhou

In this paper, we study the expressive power of first-order disjunctive logic programming (DLP) and normal logic programming (NLP) under the stable model semantics. We show that, unlike in the propositional case, first-order DLP is strictly more expressive than NLP. This result still holds even if auxiliary predicates are allowed, assuming NP ≠ coNP. On the other side, we propose a partial translation from first-order DLP to NLP via unfolding and shifting, which suggests a sound yet incomplete approach to implementing DLP via NLP solvers. We also identify some NLP-definable subclasses, and conjecture that unfolding and shifting exactly capture NLP definability.

KR Conference 2014 Short Paper

First-Order Default Logic Revisited

  • Yi Zhou

As pointed out by Reiter himself (1980), his original proposal for default logic is unsatisfactory for open default theories because of Skolemization and grounding; indeed, "the genuinely interesting cases involve open defaults." In this paper, we reconsider this long-standing semantic problem and propose a new world view semantics for first-order default logic. Instead of first-order theories, we use collections of structures sharing the same domain and function interpretations (called views) as the candidate solution concept. Roughly speaking, a world view of a first-order default theory is a maximal view (collection of structures) satisfying the default theory, where the default part is fixed by the world view itself. Although related, this semantics is essentially different from Reiter's extension semantics, even for closed default theories; however, they coincide on some restricted classes. We show that the world view semantics generalizes both classical first-order logic and first-order answer set programming, and we show how default logic under this semantics can handle both monotonic and nonmonotonic reasoning and can flexibly switch between open world and closed world reasoning. Our work is also strongly motivated by the attempt to combine classical logic based formalisms and rule based formalisms, particularly the recent development in ontology engineering of adding rules on top of the description logics layer (Baader and Hollunder 1995; Motik et al. 2006; Eiter et al. 2008; Motik and Rosati 2010; Lukasiewicz 2010; Zhou and Zhang 2012). We argue that first-order default logic under the world view semantics provides a rich framework for this task, and we believe that research in this direction will further deepen our understanding and shed new insight on integrating classical logic based and rule based formalisms in the first-order case.

ECAI Conference 2014 Conference Paper

From Disjunctive to Normal Logic Programs via Unfolding and Shifting

  • Yi Zhou

We show that every propositional disjunctive logic program under the answer set semantics can be equivalently transformed into a normal one via unfolding and shifting. More precisely, after iteratively applying the unfolding operator for some rules in a disjunctive program, its shifted program, which is a normal program, must have the same answer sets as the original disjunctive program.
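
The shifting operator used above has a simple effect on a single rule: shifting a ∨ b ← c yields the two normal rules a ← c, not b and b ← c, not a. A minimal sketch, under an assumed tuple representation of rules (this is illustrative, not the paper's code):

```python
def shift(rule):
    # rule = (head_atoms, positive_body, negative_body), all tuples.
    # Shifting keeps one head atom per resulting rule and moves every
    # other head atom, negated, into the body:
    #   a | b :- c   becomes   a :- c, not b   and   b :- c, not a.
    head, pos, neg = rule
    return [((a,), pos, neg + tuple(b for b in head if b != a))
            for a in head]

# The disjunctive rule  a | b :- c.
shifted = shift((("a", "b"), ("c",), ()))
```

Shifting alone does not preserve answer sets for every disjunctive program; the paper's result is that it does after the unfolding operator has first been applied to suitable rules.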

KR Conference 2012 Short Paper

Forgetting in Logic Programs under Strong Equivalence

  • Yisong Wang
  • Yan Zhang
  • Yi Zhou
  • Mingyi Zhang

Wang (2008) proposed a semantic forgetting for consistent disjunctive logic programs, which preserves equivalence but not strong equivalence; they specifically indicated the importance of preserving strong equivalence for forgetting in logic programming and raised this issue as future work. Wong (2009) proposed two forgetting operators for disjunctive logic programs. Although Wong's forgetting indeed preserves strong equivalence, it may lose the intuition of weakening under various circumstances (see Related Work for details). In addition to preserving strong equivalence, expressibility is another desired criterion for forgetting in logic programming: ideally, the result of forgetting some atoms from a logic program should still be expressible by a logic program. Finally, we believe that, as a way of weakening, forgetting in logic programs should obey some common intuitions shared by forgetting in classical logics. For instance, forgetting something from a logic program should lead to a weaker program in a certain sense; on the other hand, such weakening should only be associated with the relevant information that has been forgotten. For this purpose, Zhang and Zhou (2009) proposed four forgetting postulates to formalize these common intuitions and showed that forgetting in classical propositional logic and modal logic S5 can be precisely captured by these postulates. Interestingly, none of the previous notions of forgetting in logic programs actually satisfies Zhang-Zhou's postulates. In summary, we consider the following criteria that a notion of forgetting in logic programs should meet:

  • Expressibility. The result of forgetting in an arbitrary logic program should also be expressible by a logic program.
  • Preserving strong equivalence. Two strongly equivalent programs should remain strongly equivalent after forgetting the same set of atoms.
  • Satisfying common intuitions of forgetting. Preferably, forgetting in logic programs should be semantically characterized by Zhang-Zhou's four forgetting postulates.

In this paper we present a comprehensive study of forgetting in the context of arbitrary logic programs (propositional theories) under the answer set semantics. In our approach, a program Π is viewed as a theory of the logic of here-and-there (HT logic), and we propose a semantic forgetting for arbitrary logic programs under the answer set semantics, called HT-forgetting. HT-forgetting preserves strong equivalence in the sense that strongly equivalent logic programs remain strongly equivalent after forgetting the same set of atoms. The result of an HT-forgetting is always expressible by a logic program; in particular, the result of an HT-forgetting in a Horn program is expressible as a Horn program. A representation theorem shows that HT-forgetting can be precisely characterized by Zhang-Zhou's four forgetting postulates under the logic of here-and-there. We also reveal underlying connections between HT-forgetting and classical forgetting, and provide complexity results for the related decision problems.

AAAI Conference 2012 Conference Paper

Ordered Completion for Logic Programs with Aggregates

  • Vernon Asuncion
  • Yan Zhang
  • Yi Zhou

In this paper, we show that first-order logic programs with monotone aggregates under the stable model semantics can be captured in classical first-order logic. More precisely, we extend the notion of ordered completion for logic programs with a large variety of aggregates so that every stable model of a program with aggregates corresponds to a classical model of its enhanced ordered completion, and vice versa.

AAAI Conference 2011 Conference Paper

Bounded Forgetting

  • Yi Zhou
  • Yan Zhang

The result of forgetting some predicates in a first-order sentence may not exist in the sense that it might not be captured by any first-order sentences. This, indeed, severely restricts the usage of forgetting in applications. To address this issue, we propose a notion called k-forgetting, also called bounded forgetting in general, for any fixed number k. We present several equivalent characterizations of bounded forgetting and show that the result of bounded forgetting, on one hand, can always be captured by a single first-order sentence, and on the other hand, preserves the information that we are concerned with.

AAAI Conference 2011 Conference Paper

Progression Semantics for Disjunctive Logic Programs

  • Yi Zhou
  • Yan Zhang

In this paper, we extend the progression semantics to first-order disjunctive logic programs and show that it coincides with the stable model semantics. Based on it, we further show how disjunctive answer set programming is related to Satisfiability Modulo Theories.

IJCAI Conference 2011 Conference Paper

Translating First-Order Theories into Logic Programs

  • Heng Zhang
  • Yan Zhang
  • Mingsheng Ying
  • Yi Zhou

This paper focuses on computing first-order theories under either stable model semantics or circumscription. A reduction from first-order theories to logic programs under stable model semantics over finite structures is proposed, and an embedding of circumscription into stable model semantics is also given. Having such reduction and embedding, reasoning problems represented by first-order theories under these two semantics can then be handled by using existing answer set solvers. The effectiveness of this approach in computing hard problems beyond NP is demonstrated by some experiments.

AAAI Conference 2010 Conference Paper

First-Order Indefinability of Answer Set Programs on Finite Structures

  • Yin Chen
  • Yan Zhang
  • Yi Zhou

An answer set program with variables is first-order definable on finite structures if the set of its finite answer sets can be captured by a first-order sentence; otherwise this program is first-order indefinable on finite structures. In this paper, we study the problem of first-order indefinability of answer set programs. We provide an Ehrenfeucht-Fraïssé game-theoretic characterization of the first-order indefinability of answer set programs on finite structures. As an application of this approach, we show that the well-known program for finding Hamiltonian cycles is not first-order definable on finite structures. We then define two notions, named the 0-1 property and unbounded cycles or paths, under the answer set semantics, from which we develop two sufficient conditions that may be effectively used in proving a program's first-order indefinability on finite structures under certain circumstances.

KR Conference 2010 Conference Paper

Forgetting Revisited

  • Yi Zhou
  • Yan Zhang

In this paper, we propose an alternative notion, called weak forgetting, of forgetting a set of predicates in a first-order theory. One important feature of this new notion is that the result of weak forgetting is always first-order expressible. In contrast, this is not the case for the traditional notion of forgetting, called strong forgetting, introduced by Lin and Reiter. As a consequence, these two notions are not exactly the same. Interestingly, we prove that they coincide when the result of strong forgetting is first-order expressible. We also present a representation theorem to characterize weak forgetting from different aspects.

KR Conference 2010 Conference Paper

On the Progression Semantics and Boundedness of Answer Set Programs

  • Yan Zhang
  • Yi Zhou

In this paper, we propose a progression semantics for first-order answer set programs. Based on this new semantics, we are able to define the notion of boundedness for answer set programming. We prove that boundedness coincides with the notions of recursion-free and loop-free under program equivalence, and is also equivalent to first-order definability of answer set programs on arbitrary structures.

AAAI Conference 2010 Conference Paper

Ordered Completion for First-Order Logic Programs on Finite Structures

  • Vernon Asuncion
  • Fangzhen Lin
  • Yan Zhang
  • Yi Zhou

In this paper, we propose a translation from normal first-order logic programs under the answer set semantics to first-order theories on finite structures. Specifically, we introduce ordered completions which are modifications of Clark’s completions with some extra predicates added to keep track of the derivation order, and show that on finite structures, classical models of the ordered-completion of a normal logic program correspond exactly to the answer sets (stable models) of the logic program.

AAMAS Conference 2008 Conference Paper

Partial Goal Satisfaction and Goal Change

  • Yi Zhou
  • Leon van der Torre
  • Yan Zhang

Partial implication semantics with respect to a background theory has been introduced to formalize partial goal satisfaction in the context of beliefs. In this paper, we introduce strong partial implication, which prohibits redundancies, and weak partial implication, which allows side effects; we study their semantic and complexity properties, and we apply the three notions of partial implication to goal change in the context of beliefs.

IJCAI Conference 2007 Conference Paper

  • Fangzhen Lin
  • Yi Zhou

We first provide a mapping from Pearce's equilibrium logic and Ferraris and Lifschitz's general logic programs to Lin and Shoham's logic of knowledge and justified assumptions, a nonmonotonic modal logic that has been shown to include as special cases both Reiter's default logic in the propositional case and Moore's autoepistemic logic. From this mapping, we obtain a mapping from general logic programs to circumscription, both in the propositional and first-order case. Furthermore, we show that this mapping can be used to check the strong equivalence between two propositional logic programs in classical logic.

KR Conference 2004 Conference Paper

Partial Implication Semantics for Desirable Propositions

  • Xiaoping Chen
  • Yi Zhou

Motivational attitudes play an important role in investigations into intelligent agents. One of the key problems in representing and reasoning about motivational attitudes is determining which propositions are the desirable ones. The answer based on classical logic is that the propositions that logically imply the goal are desirable and the others are not. We argue that this criterion is inadequate when an agent has incomplete knowledge about its environment. In this paper, we present a simple and intuitive semantics---partial implication---for the characterization of desirable propositions. In this semantics, a proposition P is desirable with respect to a given goal Q if and only if P is "useful" and "harmless" to Q in any situation. Partial implication is an extension of classical implication. We investigate some fundamental properties of partial implication and discuss some of its potential applications.