
Author name cluster

Bin Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers (10)

AAAI Conference 2026 · Conference Paper

RetroLM: Retrieval-Augmented KVs for Long-Context Processing

  • Kun Luo
  • Zheng Liu
  • Shitao Xiao
  • Jiabei Chen
  • Hongjin Qian
  • Peitian Zhang
  • Shanshan Jiang
  • Bin Dong

Long-context processing remains a significant challenge for large language models (LLMs). Retrieval-augmented generation (RAG) has recently emerged as a promising approach, enabling LLMs to selectively access relevant information from extended contexts to improve efficiency. However, existing RAG approaches often lag behind other efficient long-context processing methods, primarily due to the inherent limitations of inaccurate retrieval and fragmented contexts. To address these limitations, we propose RetroLM, a novel RAG framework designed for effective long-context processing. Unlike traditional approaches, RetroLM introduces KV-level retrieval augmentation, which partitions the LLM's KV cache into contiguous pages and performs encoding and decoding operations based on the retrieved KV pages. Built upon this framework, we further develop a specialized retriever for precise retrieval of critical pages and conduct unsupervised post-training to optimize the model's ability to leverage retrieved information. Compared with traditional RAG, the new approach enhances robustness to retrieval inaccuracy, facilitates effective utilization of fragmented contexts, and saves the cost of repeated context-encoding operations. We conduct extensive evaluations across several popular benchmarks, including LongBench, InfiniteBench, and RULER. RetroLM consistently outperforms existing long-context LLMs and RAG-based methods, especially in tasks requiring deep reasoning or extreme context lengths.
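
A minimal sketch of the KV-level retrieval augmentation described above: the KV cache is split into contiguous pages, each page is scored against the current query, and only the top-k pages are kept for decoding. The page size, the mean-pooled page representation, and dot-product scoring are illustrative assumptions, not the paper's actual retriever.

```python
import numpy as np

def select_kv_pages(keys, values, query, page_size=128, top_k=4):
    """keys/values: (seq_len, d); query: (d,). Returns the concatenated top-k pages."""
    n_pages = keys.shape[0] // page_size
    pages_k = keys[: n_pages * page_size].reshape(n_pages, page_size, -1)
    pages_v = values[: n_pages * page_size].reshape(n_pages, page_size, -1)
    # Score each page by the similarity of its mean key to the query.
    scores = pages_k.mean(axis=1) @ query
    keep = np.sort(np.argsort(scores)[-top_k:])  # keep pages in original order
    d = keys.shape[1]
    return pages_k[keep].reshape(-1, d), pages_v[keep].reshape(-1, d)

rng = np.random.default_rng(0)
k, v, q = rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64)), rng.normal(size=64)
sel_k, sel_v = select_kv_pages(k, v, q)
print(sel_k.shape)  # (512, 64): 4 pages of 128 tokens each
```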

NeurIPS Conference 2023 · Conference Paper

Unsupervised Image Denoising with Score Function

  • Yutong Xie
  • Mingze Yuan
  • Bin Dong
  • Quanzheng Li

Though achieving excellent performance in some cases, current unsupervised learning methods for single-image denoising usually face constraints in practical applications. In this paper, we propose a new approach that is more general and applicable to complicated noise models. Utilizing properties of the score function, i.e., the gradient of the log-probability density, we define a solving system for denoising. Once the score function of the noisy images has been estimated, the denoised result can be obtained through the solving system. Our approach can be applied to multiple noise models, such as a mixture of multiplicative and additive noise combined with structured correlation. Experimental results show that our method is comparable to existing methods when the noise model is simple, and performs well in complicated cases where other methods are not applicable or perform poorly.
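
A minimal sketch of score-based denoising for the simplest case, additive Gaussian noise, where the solving system reduces to Tweedie's formula: x_hat = y + sigma^2 * d/dy log p(y). The toy setup (clean x ~ N(0,1), known sigma, analytic score) is assumed for illustration; the paper estimates the score of noisy images with a network and handles far more general noise models.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.normal(0.0, 1.0, size=100_000)          # clean samples, x ~ N(0, 1)
y = x + rng.normal(0.0, sigma, size=x.shape)    # noisy observations

# With x ~ N(0,1), y ~ N(0, 1 + sigma^2), so the score of y is analytic:
score = lambda y: -y / (1.0 + sigma**2)
x_hat = y + sigma**2 * score(y)                 # Tweedie / posterior-mean estimate

print(f"noisy MSE:    {np.mean((y - x) ** 2):.4f}")     # ~ sigma^2 = 0.25
print(f"denoised MSE: {np.mean((x_hat - x) ** 2):.4f}") # ~ sigma^2/(1+sigma^2) = 0.20
```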

IJCAI Conference 2022 · Conference Paper

A Universal PINNs Method for Solving Partial Differential Equations with a Point Source

  • Xiang Huang
  • Hongsheng Liu
  • Beiji Shi
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang
  • Haotian Chu

In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) method has emerged as a promising approach for solving both forward and inverse PDE problems. PDEs with a point source, expressed as a Dirac delta function in the governing equations, are mathematical models of many physical processes. However, they cannot be solved directly by the conventional PINNs method due to the singularity introduced by the Dirac delta function. In this paper, we propose a universal solution to this problem based on three novel techniques. Firstly, the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity at the point source; secondly, a lower-bound-constrained uncertainty weighting algorithm is proposed to balance the physics-informed loss terms of the point-source area and the remaining areas; and thirdly, a multi-scale deep neural network with a periodic activation function is used to improve accuracy and convergence speed. We evaluate the proposed method on three representative PDEs, and the experimental results show that our method outperforms existing deep learning-based methods with respect to accuracy, efficiency, and versatility.
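
A minimal sketch of the first technique above: replacing the Dirac delta source delta(x - x0) with a narrow continuous probability density so the PDE residual stays finite at the point source. A 2-D isotropic Gaussian with bandwidth eps is an assumed choice of density here; the kernel and its bandwidth schedule in the paper may differ.

```python
import numpy as np

def smoothed_delta(x, x0, eps=0.05):
    """2-D Gaussian approximation of delta(x - x0); integrates to ~1 over the plane."""
    sq_dist = np.sum((x - x0) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * eps**2)) / (2 * np.pi * eps**2)

# The residual of the Poisson problem -laplace(u) = delta(x - x0) at collocation
# points then becomes r(x) = -laplace(u_theta)(x) - smoothed_delta(x, x0),
# which a standard PINN loss can minimize without singularities.
pts = np.random.default_rng(0).uniform(-1, 1, size=(5, 2))
print(smoothed_delta(pts, x0=np.array([0.0, 0.0])))
```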

NeurIPS Conference 2022 · Conference Paper

Meta-Auto-Decoder for Solving Parametric Partial Differential Equations

  • Xiang Huang
  • Zhanhong Ye
  • Hongsheng Liu
  • Shi Ji
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang

Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging field. One category of methods, such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs), aims to approximate the solution of the PDEs. These methods are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of PDE parameters. Another category of methods, such as the Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet), tries to approximate the solution mapping directly. While fast, needing only one forward inference for each PDE parameter without retraining, these methods often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables a pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogeneous) PDE parameters as latent vectors. The proposed method can be interpreted through manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that MAD converges faster than other deep learning-based methods without losing accuracy.
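
A minimal sketch of the auto-decoder idea above: a pre-trained network u(x, z) conditions on a latent vector z that implicitly encodes the PDE parameters, and adapting to a new equation instance means optimizing z while the network weights stay frozen. The tiny MLP and the placeholder residual loss are illustrative assumptions, not the paper's architecture or physics loss.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1 + 8, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))   # stands in for the pre-trained model
for p in net.parameters():
    p.requires_grad_(False)                         # weights frozen during adaptation

z = torch.zeros(8, requires_grad=True)              # per-instance latent code
opt = torch.optim.Adam([z], lr=1e-2)
x = torch.linspace(0, 1, 128).unsqueeze(1)

for _ in range(200):                                # fast per-instance adaptation
    u = net(torch.cat([x, z.expand(x.shape[0], -1)], dim=1))
    loss = ((u - torch.sin(torch.pi * x)) ** 2).mean()  # placeholder for a physics residual
    opt.zero_grad(); loss.backward(); opt.step()
print(f"adapted loss: {loss.item():.4f}")
```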

IJCAI Conference 2022 · Conference Paper

Region-Aware Metric Learning for Open World Semantic Segmentation via Meta-Channel Aggregation

  • Hexin Dong
  • Zifan Chen
  • Mingze Yuan
  • Yutong Xie
  • Jie Zhao
  • Fei Yu
  • Bin Dong
  • Li Zhang

As one of the most challenging and practical segmentation tasks, open-world semantic segmentation requires the model to segment anomaly regions in images and incrementally learn to segment out-of-distribution (OOD) objects, especially under a few-shot condition. The current state-of-the-art (SOTA) method, Deep Metric Learning Network (DMLNet), relies on pixel-level metric learning, which makes it difficult to identify similar regions that have different semantics. Therefore, we propose a method called region-aware metric learning (RAML), which first separates the regions of the images and generates region-aware features for further metric learning. RAML improves the integrity of the segmented anomaly regions. Moreover, we propose a novel meta-channel aggregation (MCA) module to further separate anomaly regions, forming high-quality sub-region candidates and thereby improving the model's performance on OOD objects. To evaluate the proposed RAML, we have conducted extensive experiments and ablation studies on the Lost and Found and Road Anomaly datasets for anomaly segmentation, and on the Cityscapes dataset for incremental few-shot learning. The results show that the proposed RAML achieves SOTA performance in both stages of open-world segmentation. Our code and appendix are available at https://github.com/czifan/RAML.
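
A minimal sketch of the region-aware idea above: instead of pixel-level metric learning, pixel features are pooled per region and the metric (here, cosine similarity to class prototypes) operates on region embeddings. The region ids, mean pooling, and cosine scoring are illustrative assumptions, not the paper's exact RAML formulation.

```python
import numpy as np

def region_embeddings(features, regions):
    """features: (H*W, d); regions: (H*W,) int region ids -> (n_regions, d)."""
    ids = np.unique(regions)
    return np.stack([features[regions == r].mean(axis=0) for r in ids]), ids

rng = np.random.default_rng(0)
feats = rng.normal(size=(64 * 64, 16))
regions = rng.integers(0, 5, size=64 * 64)          # stand-in for separated regions
prototypes = rng.normal(size=(3, 16))               # one prototype per known class

emb, ids = region_embeddings(feats, regions)
sim = (emb / np.linalg.norm(emb, axis=1, keepdims=True)) @ \
      (prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)).T
# Regions whose best prototype similarity is low can be flagged as anomalies.
print(ids[sim.max(axis=1) < 0.2])
```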

AAAI Conference 2021 · Conference Paper

DAST: Unsupervised Domain Adaptation in Semantic Segmentation Based on Discriminator Attention and Self-Training

  • Fei Yu
  • Mo Zhang
  • Hexin Dong
  • Sheng Hu
  • Bin Dong
  • Li Zhang

Unsupervised domain adaptation has recently been used to reduce the domain shift, which would ultimately improve the performance of semantic segmentation on unlabeled real-world data. In this paper, we follow this trend and propose a novel method to reduce the domain shift using strategies of discriminator attention and self-training. The discriminator attention strategy contains a two-stage adversarial learning process, which explicitly distinguishes the well-aligned (domain-invariant) and poorly-aligned (domain-specific) features, and then guides the model to focus on the latter. The self-training strategy adaptively improves the decision boundary of the model for the target domain, which implicitly facilitates the extraction of domain-invariant features. By combining the two strategies, we find a more effective way to reduce the domain shift. Extensive experiments demonstrate the effectiveness of our proposed method on numerous benchmark datasets.
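
A minimal sketch of the self-training half of the abstract: the model's confident predictions on unlabeled target-domain pixels become pseudo-labels for the next training round, which nudges the decision boundary toward the target domain. The confidence threshold and ignore index are illustrative assumptions; the discriminator-attention half is omitted.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """probs: (n_pixels, n_classes) softmax outputs on target-domain images."""
    conf, labels = probs.max(axis=1), probs.argmax(axis=1)
    labels[conf < threshold] = ignore_index   # low-confidence pixels are skipped
    return labels

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(pseudo_labels(probs))  # 255 marks pixels excluded from the segmentation loss
```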

NeurIPS Conference 2021 · Conference Paper

SOLQ: Segmenting Objects by Learning Queries

  • Bin Dong
  • Fangao Zeng
  • Tiancai Wang
  • Xiangyu Zhang
  • Yichen Wei

In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR, our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location, and mask. The learned object queries perform classification, box regression, and mask encoding simultaneously in a unified vector form. During the training phase, the encoded mask vectors are supervised by the compression coding of raw spatial masks. At inference time, the produced mask vectors can be directly transformed into spatial masks by the inverse process of the compression coding. Experimental results show that SOLQ achieves state-of-the-art performance, surpassing most existing approaches. Moreover, the joint learning of the unified query representation greatly improves the detection performance of DETR. We hope SOLQ can serve as a strong baseline for Transformer-based instance segmentation.
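
A minimal sketch of the mask compression coding described above: a 2-D spatial mask is encoded into a short coefficient vector and recovered by the inverse transform. A 2-D DCT keeping the top-left n_keep×n_keep coefficients is used here; treat the transform choice and coefficient layout as illustrative assumptions rather than the paper's exact encoding.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask, n_keep=16):
    """mask: (H, W) binary -> flattened low-frequency DCT coefficients."""
    coeffs = dctn(mask.astype(float), norm="ortho")
    return coeffs[:n_keep, :n_keep].ravel()          # the supervised mask vector

def decode_mask(vec, shape, n_keep=16):
    coeffs = np.zeros(shape)
    coeffs[:n_keep, :n_keep] = vec.reshape(n_keep, n_keep)
    return idctn(coeffs, norm="ortho") > 0.5         # back to a spatial mask

mask = np.zeros((64, 64)); mask[16:48, 16:48] = 1    # a toy square instance mask
rec = decode_mask(encode_mask(mask), mask.shape)
iou = np.logical_and(rec, mask).sum() / np.logical_or(rec, mask).sum()
print(f"reconstruction IoU: {iou:.3f}")
```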

IJCAI Conference 2019 · Conference Paper

Nostalgic Adam: Weighting More of the Past Gradients When Designing the Adaptive Learning Rate

  • Haiwen Huang
  • Chang Wang
  • Bin Dong

First-order optimization algorithms have proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of “long-term memory” in Adam-like algorithms, which could hamper their performance and lead to divergence. In our study, we observe that there are benefits to weighting more of the past gradients when designing the adaptive learning rate. We therefore propose an algorithm called Nostalgic Adam (NosAdam) with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam, as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative to Adam. The proofs, code, and other supplementary materials have already been released.
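
A minimal sketch of the nostalgia idea above: the second-moment estimate v_t is a weighted average that places non-vanishing weight on past squared gradients. Hyperharmonic weights b_k = k^(-gamma) with B_t = sum of b_k are assumed here, giving v_t = (B_{t-1}/B_t) v_{t-1} + (b_t/B_t) g_t^2; check the paper for the exact scheme and its convergence conditions.

```python
import numpy as np

def nosadam_step(w, grad_fn, state, lr=0.1, beta1=0.9, gamma=0.5, eps=1e-8):
    m, v, B, t = state
    t += 1
    b_t = t ** (-gamma)                       # weight on the current gradient
    g = grad_fn(w)
    m = beta1 * m + (1 - beta1) * g           # first moment, as in Adam
    v = (B / (B + b_t)) * v + (b_t / (B + b_t)) * g**2  # "nostalgic" second moment
    w = w - lr * m / (np.sqrt(v) + eps)
    return w, (m, v, B + b_t, t)

# Usage: minimize f(w) = ||w||^2 from w = (5, 5, 5).
w, state = np.ones(3) * 5.0, (np.zeros(3), np.zeros(3), 0.0, 0)
for _ in range(300):
    w, state = nosadam_step(w, lambda w: 2 * w, state)
print(w)  # close to [0, 0, 0]
```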

NeurIPS Conference 2019 · Conference Paper

You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle

  • Dinghuai Zhang
  • Tianyuan Zhang
  • Yiping Lu
  • Zhanxing Zhu
  • Bin Dong

Deep learning achieves state-of-the-art results in many tasks in computer vision and natural language processing. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which raises serious robustness concerns for deep networks. Adversarial training, typically formulated as a robust optimization problem, is an effective way of improving the robustness of deep networks. A major drawback of existing adversarial training algorithms is the computational overhead of generating adversarial examples, typically far greater than that of training the network itself. This leads to a prohibitive overall computational cost of adversarial training. In this paper, we show that adversarial training can be cast as a discrete-time differential game. By analyzing the Pontryagin Maximum Principle (PMP) of the problem, we observe that the adversary update is only coupled with the parameters of the first layer of the network. This inspires us to restrict most of the forward and backward propagation within the first layer of the network during adversary updates. This effectively reduces the total number of full forward and backward propagations to only one for each group of adversary updates. We therefore refer to this algorithm as YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO can achieve comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm [Kurakin et al., 2016].
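
A minimal sketch of the YOPO idea above: the gradient arriving from the layers above the first one is computed once per full propagation and then frozen, so each of the m inner perturbation updates only re-propagates through the first layer. The layer split, m, and step sizes are illustrative assumptions; see the paper for the exact PMP-derived algorithm.

```python
import torch

first = torch.nn.Linear(10, 32)                    # f_0: the first layer
rest = torch.nn.Sequential(torch.nn.ReLU(), torch.nn.Linear(32, 2))
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))
eta, step, m = torch.zeros_like(x), 0.01, 5        # perturbation and inner steps

# One full forward/backward gives p = d(loss)/d(first-layer output).
out0 = first(x + eta)
p = torch.autograd.grad(loss_fn(rest(out0), y), out0)[0].detach()

for _ in range(m):                                 # cheap inner adversary updates
    eta = eta.detach().requires_grad_(True)
    # Only the first layer is propagated; p stands in for all upper layers.
    surrogate = (p * first(x + eta)).sum()
    g = torch.autograd.grad(surrogate, eta)[0]
    eta = (eta + step * g.sign()).detach()         # ascend the surrogate loss
print(eta.abs().max())                             # bounded by m * step = 0.05
```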