Author name cluster

Peng Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers

21

AAAI Conference 2026 Conference Paper

Mitigating Low-Quality Reasoning in MLLMs: Self-Driven Refined Multimodal CoT with Selective Thinking and Step-wise Visual Enhancement

  • Chongjun Tu
  • Peng Ye
  • Dongzhan Zhou
  • Tao Chen
  • Wanli Ouyang

Current Multimodal Chain-of-Thought (MCoT) methods suffer from low-quality multimodal reasoning, characterized by overthinking on simple queries and inefficient utilization of visual information, resulting in large amounts of inefficient and ineffective computation. In this paper, we discover that Multimodal Large Language Models (MLLMs) possess inherent capabilities to distinguish between simple and difficult queries and to enhance task-related visual information, capabilities that remain underutilized by existing approaches. Based on this insight, we propose Self-Driven Refined Multimodal CoT (SDR-MCoT), a training-free framework that mitigates these issues through two self-driven modules. First, our selective thinking module employs entropy-based confidence estimation to determine whether queries require detailed reasoning, preventing overthinking on simple questions. Second, our step-wise visual enhancement module strengthens attention to relevant visual regions at each reasoning step without inserting additional tokens, achieving fine-grained visual grounding and enhancement with minimal overhead. Moreover, SDR-MCoT can be seamlessly integrated into various MLLMs, offering a practical solution for improving multimodal reasoning. Comprehensive experiments across eight benchmarks from diverse domains (multimodal reasoning, visual understanding, hallucination, and mathematical reasoning) demonstrate that SDR-MCoT consistently outperforms existing MCoT methods on four different base models with reduced overhead. For instance, on Qwen2-VL-7B, our method improves average accuracy by over 6% while reducing token consumption by approximately 60% compared to zero-shot CoT.
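
To make the selective-thinking gate concrete, here is a minimal sketch in Python. It is not the authors' implementation: the generate_with_probs API is hypothetical, mean token entropy over a short draft answer stands in for the paper's confidence estimate, and the threshold is illustrative.

    import numpy as np

    def mean_token_entropy(token_probs):
        # token_probs: one probability distribution over the vocabulary
        # per generated token of a short draft answer.
        entropies = [-(p * np.log(p + 1e-12)).sum() for p in token_probs]
        return float(np.mean(entropies))

    def answer(query, model, entropy_threshold=1.0):
        # Entropy-gated reasoning: answer directly when confident,
        # spend chain-of-thought tokens only on uncertain queries.
        draft, token_probs = model.generate_with_probs(query)  # hypothetical API
        if mean_token_entropy(token_probs) < entropy_threshold:
            return draft  # simple query: skip detailed reasoning
        return model.generate(query + "\nLet's think step by step.")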

AAAI Conference 2026 Conference Paper

The Avengers: A Routing Recipe for Collective Intelligence in Language Models

  • Yiqun Zhang
  • Hao Li
  • Chenxu Wang
  • Linyao Chen
  • Qiaosheng Zhang
  • Peng Ye
  • Shi Feng
  • Xinrun Wang

Proprietary models are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers---a lightweight framework that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1, and 4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, data-efficiency regimes, and values of its sole parameter---the number of clusters.
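
The four operations compose into a small router. Below is a minimal sketch assuming precomputed query embeddings, a measured per-cluster accuracy table, and models exposed as plain callables; the random arrays are stand-ins for real data, and repeated sampling is reduced to a simple majority vote.

    import numpy as np
    from collections import Counter
    from sklearn.cluster import KMeans

    # Offline: embed training queries, cluster them, and score every model
    # per cluster (random stand-ins here; use real embeddings and accuracies).
    train_embeddings = np.random.rand(1000, 384)
    kmeans = KMeans(n_clusters=8, n_init=10).fit(train_embeddings)
    cluster_scores = np.random.rand(8, 10)  # [cluster, model] -> accuracy

    def route_and_answer(query, query_embedding, models, n_samples=5):
        # Embed -> nearest cluster -> best model in that cluster -> vote.
        cluster = int(kmeans.predict(query_embedding.reshape(1, -1))[0])
        best = models[int(cluster_scores[cluster].argmax())]
        samples = [best(query) for _ in range(n_samples)]
        return Counter(samples).most_common(1)[0][0]  # majority vote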

AAAI Conference 2025 Conference Paper

All-in-One: Transferring Vision Foundation Models into Stereo Matching

  • Jingyi Zhou
  • Haoyu Zhang
  • Jiakang Yuan
  • Peng Ye
  • Tao Chen
  • Hao Jiang
  • Meiya Chen
  • Yangyang Zhang

As a fundamental vision task, stereo matching has made remarkable progress. While recent iterative optimization-based methods have achieved promising performance, their feature extraction capabilities still have room for improvement. Inspired by the ability of vision foundation models (VFMs) to extract general representations, in this work we propose AIO-Stereo, which can flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model. To better reconcile features between heterogeneous VFMs and the stereo matching model and to fully exploit prior knowledge from VFMs, we propose a dual-level feature utilization mechanism that aligns heterogeneous features and transfers multi-level knowledge. Based on this mechanism, a dual-level selective knowledge transfer module is designed to selectively transfer knowledge and integrate the advantages of multiple VFMs. Experimental results show that AIO-Stereo achieves state-of-the-art performance on multiple datasets, ranking 1st on the Middlebury benchmark and outperforming all published work on the ETH3D benchmark.

NeurIPS Conference 2025 Conference Paper

Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression

  • Xiaohui Wang
  • Peng Ye
  • Chenyu Huang
  • Shenghe Zheng
  • Bo Zhang
  • Lei Bai
  • Wanli Ouyang
  • Tao Chen

With the rise of the fine-tuned–pretrained paradigm, storing numerous fine-tuned models for multi-tasking creates significant storage overhead. Delta compression alleviates this by storing only the pretrained model and the highly compressed delta weights (the differences between fine-tuned and pretrained model weights). However, existing methods fail to maintain both high compression and performance, and often rely on data. To address these challenges, we propose UltraDelta, the first data-free delta compression pipeline that achieves both ultra-high compression and strong performance. UltraDelta is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions, using three key components: (1) Variance-Based Mixed Sparsity Allocation assigns sparsity based on variance, giving lower sparsity to high-variance layers to preserve inter-layer information. (2) Distribution-Aware Compression applies uniform quantization and then groups parameters by value, followed by group-wise pruning, to better preserve intra-layer distribution. (3) Trace-Norm-Guided Rescaling uses the trace norm of delta weights to estimate a global rescaling factor, improving model stability under higher compression. Extensive experiments across (a) large language models (fine-tuned on LLaMA-2 7B and 13B) with up to 50$\times$ compression, (b) general NLP models (RoBERTa-base, T5-base) with up to 224$\times$ compression, (c) vision models (ViT-B/32, ViT-L/14) with up to 132$\times$ compression, and (d) multi-modal models (BEiT-3) with 18$\times$ compression, demonstrate that UltraDelta consistently outperforms existing methods, especially under ultra-high compression. Code is available at https://github.com/xiaohuiwang000/UltraDelta.
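
A sketch of the first component, variance-based mixed sparsity, shows the shape of the pipeline. The allocation rule below (keep-budget proportional to layer variance) is an illustrative choice rather than the paper's exact formula, and the quantization and rescaling stages are omitted.

    import torch

    def variance_based_prune(pretrained, finetuned, avg_sparsity=0.98):
        # Delta weights per layer: the only thing that needs storing.
        deltas = {k: finetuned[k] - pretrained[k] for k in pretrained}
        variances = {k: d.float().var() for k, d in deltas.items()}
        total_var = sum(variances.values())
        pruned = {}
        for k, d in deltas.items():
            # Higher-variance layers get a larger share of the keep budget,
            # i.e., lower sparsity (illustrative allocation rule).
            share = variances[k] / total_var
            keep = int((1 - avg_sparsity) * len(deltas) * share * d.numel())
            keep = max(1, min(d.numel(), keep))
            threshold = d.abs().flatten().topk(keep).values[-1]
            pruned[k] = torch.where(d.abs() >= threshold, d, torch.zeros_like(d))
        return pruned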

NeurIPS Conference 2025 Conference Paper

FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding

  • Chongjun Tu
  • Lin Zhang
  • Pengtao Chen
  • Peng Ye
  • Xianfang Zeng
  • Wei Cheng
  • Gang Yu
  • Tao Chen

Multimodal Large Language Models (MLLMs) have shown impressive video content understanding capabilities but struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, which comprises 1,776 videos from both ego-centric and third-person perspectives and enables assessment through both close-ended and open-ended tasks. For close-ended evaluation, we carefully design 8,184 multiple-choice question-answer pairs spanning six distinct sub-tasks. For open-ended evaluation, we employ GPT-assisted evaluation and develop a novel cost-efficient LLM-free assessment method, where the latter can enhance benchmarking interpretability and accessibility. Comprehensive experiments with 21 state-of-the-art MLLMs reveal significant limitations in their ability to comprehend and describe detailed temporal dynamics in video motions. To alleviate this limitation, we further build FAVOR-Train, a dataset of 17,152 videos with fine-grained motion annotations. Finetuning Qwen2.5-VL on FAVOR-Train yields consistent improvements on motion-related tasks across TVBench, MotionBench, and our FAVOR-Bench. Our assessment results demonstrate that the proposed FAVOR-Bench and FAVOR-Train provide valuable tools for the community to develop more powerful video understanding models.

NeurIPS Conference 2025 Conference Paper

GoRA: Gradient-driven Adaptive Low Rank Adaptation

  • Haonan He
  • Peng Ye
  • Yuchen Ren
  • Yuan Yuan
  • Lei Chen

Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs), with its effectiveness influenced by two key factors: rank selection and weight initialization. While numerous LoRA variants have been proposed to improve performance by addressing one of these aspects, they often compromise usability or computational efficiency. In this paper, we analyze and identify the core limitations of existing approaches and propose a novel framework—GoRA (Gradient-driven Adaptive Low-Rank Adaptation)—that simultaneously adapts both the rank and initialization strategy within a unified framework. GoRA leverages gradient information during training to dynamically assign optimal ranks and initialize low-rank adapter weights in an adaptive manner. To our knowledge, GoRA is the first method that not only addresses the limitations of prior approaches—which often focus on either rank selection or initialization in isolation—but also unifies both aspects within a single framework, enabling more effective and efficient adaptation. Extensive experiments across various architectures and modalities show that GoRA consistently outperforms existing LoRA-based methods while preserving the efficiency of vanilla LoRA. For example, when fine-tuning Llama3.1-8B-Base for mathematical reasoning, GoRA achieves a 5.13-point improvement over standard LoRA and even outperforms full fine-tuning by 2.05 points under high-rank settings. Code is available at: https://github.com/hhnqqq/MyTransformers.
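
A sketch of the gradient-driven idea: accumulate a gradient estimate for a frozen weight over a few warmup batches, then size and initialize the low-rank factors from its SVD. The energy-based rank rule and the square-root scaling below are illustrative assumptions, not the paper's exact recipe.

    import torch

    def gradient_driven_init(grad_estimate, max_rank=32, energy=0.5):
        # grad_estimate: (out_dim, in_dim) accumulated gradient of the
        # frozen weight; its top singular directions seed the adapter.
        U, S, Vh = torch.linalg.svd(grad_estimate, full_matrices=False)
        cumulative = torch.cumsum(S, dim=0) / S.sum()
        rank = min(max_rank, int((cumulative < energy).sum().item()) + 1)
        B = U[:, :rank] * S[:rank].sqrt()             # (out_dim, rank)
        A = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]  # (rank, in_dim)
        return A, B, rank

    # Usage: average grads over a few batches, then adapt W as W + B @ A.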

NeurIPS Conference 2025 Conference Paper

MLLM-ISU: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models based Intrusion Scene Understanding

  • Fujun Han
  • Peng Ye

Vision-based intrusion detection has multiple applications in practical scenarios, e.g., autonomous driving, intelligent monitoring, and security. Previous works mainly focus on improving intrusion detection performance, without a comprehensive and in-depth understanding of the intrusion scene. To fill this gap, we explore a novel task called Multimodal Large Language Models based Intrusion Scene Understanding (MLLM-ISU) and report a comprehensive benchmark for the task. Specifically, we first design an effective and automatic visual question-answer generation strategy, constructing a new MLLM-ISU dataset with 3,000 VQA evaluation pairs, 8,925 training pairs, and six relevant subtasks. Then, we perform a comprehensive assessment of various state-of-the-art proprietary and open-source MLLMs, e.g., DeepSeek-VL2, GPT-4o, Qwen2.5-VL, etc., and find that current MLLMs have weak abilities on this task. Further, to improve the intrusion understanding capabilities of current MLLMs, we propose a Post-Training Framework with three sequential training stages, i.e., Intrusion-aware Visual Instruction Pre-training, Intrusion Chain of Thought tuning, and Intrusion-centric VQA tuning, and extensive experiments and comparisons verify the effectiveness of the proposed three-stage training framework. Datasets and code are available at https://github.com/1012537710/MLLM-ISU.

NeurIPS Conference 2025 Conference Paper

PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding

  • Kangcong Li
  • Peng Ye
  • Chongjun Tu
  • Lin Zhang
  • Chunfeng Song
  • Jiamin Wu
  • Tao Yang
  • Qihao Zheng

While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves a 6% improvement on LongBench's Multi-document QA and 12.5–17.5% performance gains on $\infty$-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to other works. Moreover, it can be generalized to any model to enhance long-context performance and interpretability without structural overhauls.

NeurIPS Conference 2025 Conference Paper

Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

  • Bo Li
  • Wei Wang
  • Peng Ye

We revisit the problem of private online learning, in which a learner receives a sequence of $T$ data points and must output a hypothesis at each time step. It is required that the entire stream of output hypotheses satisfy differential privacy. Prior work of Golowich and Livni [2021] established that every concept class $\mathcal{H}$ with finite Littlestone dimension $d$ is privately online learnable in the realizable setting. In particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake bound against an oblivious adversary. However, their approach yields a suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$ against an adaptive adversary, closing this gap. We further investigate the problem in the agnostic setting, which is more general than the realizable setting as it does not impose any assumptions on the data. We give an algorithm that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic Littlestone classes, demonstrating that they are also privately online learnable in the agnostic setting.

NeurIPS Conference 2025 Conference Paper

Scaling Physical Reasoning with the PHYSICS Dataset

  • Shenghe Zheng
  • Qianjia Cheng
  • Junchi Yao
  • Mengsong Wu
  • Haonan He
  • Ning Ding
  • Yu Cheng
  • Shuyue Hu

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated from exercises in over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model's physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in the physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations of current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics. The code and data can be found at: https://github.com/Zhengsh123/PHYSICS.
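
One way to picture a Rule+Model evaluator, sketched under assumptions (the LLM judge is a hypothetical callable): try a cheap symbolic equivalence check first, which accepts algebraically equivalent but differently simplified answers, and escalate only undecided cases to the model.

    from typing import Optional
    import sympy

    def rule_check(pred: str, gold: str) -> Optional[bool]:
        # Symbolic equivalence; None means "undecided, ask the model".
        try:
            diff = sympy.simplify(sympy.sympify(pred) - sympy.sympify(gold))
            return diff == 0
        except (sympy.SympifyError, TypeError):
            return None  # units, prose, unparseable forms -> escalate

    def evaluate(pred, gold, model_judge):
        verdict = rule_check(pred, gold)
        if verdict is not None:
            return verdict              # fast, deterministic rule path
        return model_judge(pred, gold)  # hypothetical LLM-judge callable

    # rule_check("2*x + x", "3*x") -> True without touching the judge.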

NeurIPS Conference 2025 Conference Paper

scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration

  • Jianle Sun
  • Chaoqi Liang
  • Ran Wei
  • Peng Zheng
  • Lei Bai
  • Wanli Ouyang
  • Hongliang Yan
  • Peng Ye

Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, yet integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pairing information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representations into modality-shared and modality-specific components using a well-designed $\beta$-VAE architecture, augmented with an isometric regularization to preserve intra-omics biological heterogeneity, an adversarial objective to encourage cross-modal alignment, and a masked reconstruction loss to address missing features across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large-scale datasets and supports integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.
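
The objective reads almost directly off that description. Below is a minimal sketch of a per-modality loss, with the adversarial term written as a confusion loss against a modality discriminator; the weightings are assumptions, and the isometric regularizer is omitted for brevity.

    import torch
    import torch.nn.functional as F

    def scmrdr_style_loss(x, x_hat, observed_mask,
                          mu_shared, logvar_shared, mu_priv, logvar_priv,
                          disc_logits, beta=4.0):
        # Masked reconstruction: only features observed in this modality count.
        recon = (((x - x_hat) ** 2) * observed_mask).sum() / observed_mask.sum()
        kl = lambda m, lv: -0.5 * (1 + lv - m.pow(2) - lv.exp()).mean()
        kld = kl(mu_shared, logvar_shared) + kl(mu_priv, logvar_priv)
        # Adversarial alignment: push the modality discriminator's output
        # on the shared latent toward uniform (encoder side of the game).
        uniform = torch.full_like(disc_logits, 1.0 / disc_logits.shape[-1])
        adv = F.kl_div(F.log_softmax(disc_logits, dim=-1), uniform,
                       reduction="batchmean")
        return recon + beta * kld + adv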

NeurIPS Conference 2024 Conference Paper

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

  • Yuchen Ren
  • Zhiyuan Chen
  • Lifeng Qiao
  • Hongtai Jing
  • Yuchen Cai
  • Sheng Xu
  • Peng Ye
  • Xinzhu Ma

RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we introduce the first comprehensive RNA benchmark, BEACON (BEnchmArk for COmprehensive RNA Tasks and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications, enabling a comprehensive assessment of the performance of methods on various RNA understanding tasks. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performance of these models. Third, we investigate vital RNA language model components from the tokenizer and positional encoding aspects. Notably, our findings emphasize the superiority of single-nucleotide tokenization and the effectiveness of Attention with Linear Biases (ALiBi) over traditional positional encoding methods. Based on these insights, we propose a simple yet strong baseline called BEACON-B, which can achieve outstanding performance with limited data and computational resources. The datasets and source code of our benchmark are available at https://github.com/terry-r123/RNABenchmark.
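
The ALiBi finding is easy to make concrete: instead of positional embeddings, each attention head adds a fixed linear distance penalty to its raw scores. A minimal sketch of the symmetric bias variant suited to bidirectional encoders, using the geometric head slopes from the ALiBi paper:

    import torch

    def alibi_bias(seq_len: int, num_heads: int) -> torch.Tensor:
        # Head-specific slopes: 2^(-8h/num_heads) for h = 1..num_heads.
        slopes = torch.tensor(
            [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
        positions = torch.arange(seq_len)
        distance = (positions[None, :] - positions[:, None]).abs()
        return -slopes[:, None, None] * distance  # (heads, seq, seq)

    # Added to attention logits before softmax:
    # scores = q @ k.transpose(-2, -1) / head_dim**0.5 + alibi_bias(n, h)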

AAAI Conference 2024 Conference Paper

Boosting Residual Networks with Group Knowledge

  • Shengji Tang
  • Peng Ye
  • Baopu Li
  • Weihao Lin
  • Tao Chen
  • Tong He
  • Chong Yu
  • Wanli Ouyang

Recent research interprets residual networks from a new perspective as implicit ensemble models. From this view, previous methods such as stochastic depth and stimulative training have further improved the performance of residual networks by sampling and training their subnets. However, they both use the same supervision for all subnets of different capacities and neglect the valuable knowledge generated by subnets during training. In this manuscript, we mitigate the significant knowledge distillation gap caused by using the same kind of supervision and advocate leveraging the subnets to provide diverse knowledge. Based on this motivation, we propose a group-knowledge-based training framework for boosting the performance of residual networks. Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups. Meanwhile, we also develop a subnet sampling strategy that naturally samples larger subnets, which are found to be more helpful than smaller subnets in boosting the performance of hierarchical groups. Compared with typical subnet training and other methods, our method achieves the best efficiency-performance trade-offs on multiple datasets and network structures. The code is at https://github.com/tsj-001/AAAI24-GKT.

NeurIPS Conference 2024 Conference Paper

EMR-Merging: Tuning-Free High-Performance Model Merging

  • Chenyu Huang
  • Peng Ye
  • Tao Chen
  • Tong He
  • Xiangyu Yue
  • Wanli Ouyang

The success of the pretrain-finetune paradigm has brought about the release of numerous model weights. In this context, merging models finetuned on different tasks to obtain a single model with multi-task capabilities is gaining increasing attention for its practicality. Existing model merging methods usually suffer from (1) significant performance degradation or (2) requiring tuning via additional data or training. In this paper, we rethink and analyze the existing model merging paradigm. We discover that a single model's weights can hardly simulate all the models' performance. To tackle this issue, we propose Elect, Mask & Rescale-Merging (EMR-Merging). We first (a) elect a unified model from all the model weights and then (b) generate extremely lightweight task-specific modulators, including masks and rescalers, to align the direction and magnitude between the unified model and each specific model, respectively. EMR-Merging is tuning-free, requiring no data availability or additional training, while showing impressive performance. We find that EMR-Merging shows outstanding performance compared to existing merging methods under different classical and newly-established settings, including merging different numbers of vision models (up to 30), NLP models, PEFT models, and multi-modal models.
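
The elect, mask, and rescale steps fit in a few tensor operations. A minimal sketch for one parameter tensor, following the description above (majority-sign election, sign-agreement masks, magnitude-matching rescalers); the choice of the elected magnitude is a simplification, not necessarily the paper's exact rule.

    import torch

    def emr_merge(task_deltas):
        # task_deltas: list of (finetuned - pretrained) tensors, same shape.
        stacked = torch.stack(task_deltas)           # (tasks, ...)
        elected_sign = torch.sign(stacked.sum(dim=0))
        agree = torch.sign(stacked) == elected_sign  # per-task sign masks
        # Unified magnitude: max |delta| among sign-agreeing tasks (sketch).
        magnitude = (stacked.abs() * agree).amax(dim=0)
        unified = elected_sign * magnitude
        masks = list(agree)
        rescalers = [d.abs().sum() / (m * unified).abs().sum().clamp(min=1e-12)
                     for d, m in zip(task_deltas, masks)]
        return unified, masks, rescalers

    # Task t is approximated as rescalers[t] * masks[t] * unified.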

NeurIPS Conference 2024 Conference Paper

FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

  • Kun Chen
  • Peng Ye
  • Hao Chen
  • Kang Chen
  • Tao Han
  • Wanli Ouyang
  • Tao Chen
  • Lei Bai

Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However, existing AI-based data assimilation methods can only handle observations with a specific resolution, lacking the compatibility and generalization ability to assimilate observations with other resolutions. Considering that complex real-world observations often have different resolutions, we propose the Fourier Neural Processes (FNP) for arbitrary-resolution data assimilation in this paper. Leveraging the efficiency of the designed modules and flexible structure of neural processes, FNP achieves state-of-the-art results in assimilating observations with varying resolutions, and also exhibits increasing advantages over the counterparts as the resolution and the amount of observations increase. Moreover, our FNP trained on a fixed resolution can directly handle the assimilation of observations with out-of-distribution resolutions and the observational information reconstruction task without additional fine-tuning, demonstrating its excellent generalization ability across data resolutions as well as across tasks. Code is available at https://github.com/OpenEarthLab/FNP.

NeurIPS Conference 2024 Conference Paper

Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA

  • Lifeng Qiao
  • Peng Ye
  • Yuchen Ren
  • Weiqiang Bai
  • Chaoqi Liang
  • Xinzhu Ma
  • Nanqing Dong
  • Wanli Ouyang

Foundation models have made significant strides in understanding the genomic language of DNA sequences. However, previous models typically adopt tokenization methods designed for natural language, which are unsuitable for DNA sequences due to their unique characteristics. In addition, the optimal approach to tokenizing DNA remains largely under-explored and may not be intuitively understood by humans even if discovered. To address these challenges, we introduce MxDNA, a novel framework where the model autonomously learns an effective DNA tokenization strategy through gradient descent. MxDNA employs a sparse Mixture of Convolution Experts coupled with a deformable convolution to model the tokenization process, with the discontinuous, overlapping, and ambiguous nature of meaningful genomic segments explicitly considered. On Nucleotide Transformer Benchmarks and Genomic Benchmarks, MxDNA demonstrates superior performance to existing methods with less pretraining data and time, highlighting its effectiveness. Finally, we show that MxDNA learns a unique tokenization strategy distinct from those of previous methods and captures genomic functionalities at a token level during self-supervised pretraining. MxDNA aims to provide a new perspective on DNA tokenization, potentially offering broad applications in various domains and yielding profound insights. Code is available at https://github.com/qiaoqiaoLF/MxDNA.

NeurIPS Conference 2024 Conference Paper

S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning

  • Weihao Lin
  • Shengji Tang
  • Chong Yu
  • Peng Ye
  • Tao Chen

Recently, differentiable mask pruning methods optimize a continuous relaxation architecture (soft network) as a proxy for the pruned discrete network (hard network) for superior sub-architecture search. However, due to the agnostic impact of the discretization process, the hard network struggles to match the representational capacity of the soft network, namely the discretization gap, which severely degrades pruning performance. In this paper, we first investigate the discretization gap and propose a novel structural differentiable mask pruning framework named S2HPruner to bridge the discretization gap in a one-stage manner. In the training procedure, S2HPruner forwards both the soft network and its corresponding hard network, then distills the hard network under the supervision of the soft network. To optimize the mask and prevent performance degradation, we propose a decoupled bidirectional knowledge distillation. It blocks weight updates flowing from the hard to the soft network while maintaining the gradient corresponding to the mask. Compared with existing pruning arts, S2HPruner achieves superior pruning performance without fine-tuning on comprehensive benchmarks, including CIFAR-100, Tiny ImageNet, and ImageNet, with a variety of network architectures. Besides, investigation and analysis experiments explain the effectiveness of S2HPruner. Code will be released soon.
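
The decoupled gradient flow can be sketched with two standard PyTorch tricks: a straight-through estimator so that the hard mask forwards discrete values but back-propagates through the soft mask, and detached weights on the hard path so that the distillation loss updates the mask but not the weights through that path. A sketch under assumptions, not the authors' code:

    import torch
    import torch.nn.functional as F

    def forward_soft_and_hard(x, weight, mask_logits, keep_ratio=0.5):
        soft_mask = torch.sigmoid(mask_logits)             # relaxed mask
        k = max(1, int(keep_ratio * soft_mask.numel()))
        threshold = soft_mask.flatten().topk(k).values[-1]
        hard = (soft_mask >= threshold).float()
        # Straight-through: hard values forward, soft gradients backward.
        hard_mask = (hard - soft_mask).detach() + soft_mask
        y_soft = F.linear(x, weight * soft_mask)
        # Detached weights: the hard-path loss reaches only the mask.
        y_hard = F.linear(x, weight.detach() * hard_mask)
        return y_soft, y_hard

    # Decoupled distillation: the hard network learns from the soft one.
    # loss = task_loss(y_soft) + F.kl_div(
    #     F.log_softmax(y_hard, -1), F.softmax(y_soft.detach(), -1),
    #     reduction="batchmean")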

NeurIPS Conference 2024 Conference Paper

The Limits of Differential Privacy in Online Learning

  • Bo Li
  • Wei Wang
  • Peng Ye

Differential privacy (DP) is a formal notion that restricts the privacy leakage of an algorithm running on sensitive data, and the privacy-utility trade-off is one of the central problems in private data analysis. In this work, we investigate the fundamental limits of differential privacy in online learning algorithms and present evidence that separates three types of constraints: no DP, pure DP, and approximate DP. We first describe a hypothesis class that is online learnable under approximate DP but not online learnable under pure DP in the adaptive adversarial setting. This indicates that approximate DP must be adopted when dealing with adaptive adversaries. We then prove that any private online learner must make an infinite number of mistakes for almost all hypothesis classes. This essentially generalizes previous results and shows a strong separation between the private and non-private settings, since a finite mistake bound is always attainable (as long as the class is online learnable) when there is no privacy requirement.

NeurIPS Conference 2022 Conference Paper

Neural-Symbolic Entangled Framework for Complex Query Answering

  • Zezhong Xu
  • Wen Zhang
  • Peng Ye
  • Hui Chen
  • Huajun Chen

Answering complex queries over knowledge graphs (KGs) is an important yet challenging task because of the KG incompleteness issue and cascading errors during reasoning. Recent query embedding (QE) approaches embed the entities and relations in a KG, together with first-order logic (FOL) queries, into a low-dimensional space, so that queries can be answered by dense similarity search. However, previous works mainly concentrate on the target answers and ignore the usefulness of intermediate entities, which is essential for relieving the cascading error problem in logical query answering. In addition, these methods are usually designed with their own geometric or distributional embeddings to handle logical operators like union, intersection, and negation, at the cost of accuracy on the basic operator, projection, and they cannot incorporate other embedding methods into their models. In this work, we propose a Neural and Symbolic Entangled framework (ENeSy) for complex query answering, which enables neural and symbolic reasoning to enhance each other and thereby alleviate cascading errors and KG incompleteness. The projection operator in ENeSy can be any embedding method with link prediction capability, and the other FOL operators are handled without parameters. ENeSy answers queries by ensembling both neural and symbolic reasoning results. We evaluate ENeSy on complex query answering benchmarks, and it achieves state-of-the-art results, especially in the setting where the model is trained only on the link prediction task.

NeurIPS Conference 2022 Conference Paper

Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing

  • Peng Ye
  • Shengji Tang
  • Baopu Li
  • Tao Chen
  • Wanli Ouyang

Residual networks have shown great success and become indispensable in today's deep models. In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training strategy to strengthen the performance of residual networks. As residual networks can be viewed as ensembles of relatively shallow networks (i.e., the unraveled view) in prior works, we also start from this view and consider that the final performance of a residual network is co-determined by a group of sub-networks. Inspired by the social loafing problem in social psychology, we find that residual networks invariably suffer from a similar problem, where sub-networks in a residual network are prone to exert less effort when working as part of a group than when working alone. We define this previously overlooked problem as network loafing. Just as social loafing ultimately causes low individual productivity and reduced overall performance, network loafing also hinders the performance of a given residual network and its sub-networks. Referring to solutions from social psychology, we propose stimulative training, which randomly samples a residual sub-network and calculates the KL-divergence loss between the sampled sub-network and the given residual network, acting as extra supervision for sub-networks and making the overall goal consistent. Comprehensive empirical results and theoretical analyses verify that stimulative training can well handle the loafing problem and improve the performance of a residual network by improving the performance of its sub-networks. The code is available at https://github.com/Sunshine-Ye/NIPS22-ST.
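
The extra supervision is a single KL term. A minimal sketch, assuming a model whose residual blocks can be toggled by index (the active_blocks argument is a hypothetical API); the sampling probability and loss weighting are illustrative.

    import random
    import torch.nn.functional as F

    def stimulative_loss(model, x, y, num_blocks, keep_prob=0.5):
        # Full residual network: the "group" whose goal all subnets share.
        full_logits = model(x, active_blocks=list(range(num_blocks)))
        # Randomly sampled residual sub-network (hypothetical API).
        kept = [i for i in range(num_blocks) if random.random() < keep_prob]
        sub_logits = model(x, active_blocks=kept or [0])
        # KL divergence pulls the subnet toward the full network's output.
        kl = F.kl_div(F.log_softmax(sub_logits, dim=-1),
                      F.softmax(full_logits.detach(), dim=-1),
                      reduction="batchmean")
        return F.cross_entropy(full_logits, y) + kl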