Arrow Research search

Author name cluster

Fang Wan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

ICLR 2025 · Conference Paper

Prototype Antithesis for Biological Few-Shot Class-Incremental Learning

  • Binghao Liu
  • Han Yang
  • Fang Wan
  • Fei Gu

Deep learning has become essential to biological species recognition. However, a significant challenge is continuously learning new or mutated species with limited annotated samples. Since species within the same family typically share similar traits, distinguishing between new and existing (old) species during incremental learning often faces the issue of species confusion. This can result in "catastrophic forgetting" of old species and poor learning of new ones. To address this issue, we propose a Prototype Antithesis (PA) method, which leverages the hierarchical structures in biological taxa to reduce confusion between new and old species. PA operates in two steps: Residual Prototype Learning (RPL) and Residual Prototype Mixing (RPM). RPL enables the model to learn unique prototypes for each species alongside residual prototypes representing shared traits within families. RPM generates synthetic samples by blending features of new species with residual prototypes of old species, encouraging the model to focus on species-unique traits and minimize species confusion. By integrating RPL and RPM, the proposed PA method mitigates "catastrophic forgetting" while improving generalization to new species. Extensive experiments on CUB200, PlantVillage, and Tree-of-Life datasets demonstrate that PA significantly reduces inter-species confusion and achieves state-of-the-art performance, highlighting its potential for deep-learning-based biological data analysis.
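A minimal sketch of the two PA steps as described in the abstract, assuming mean features as prototypes and a simple convex blend for the mixing step; the helper names and the `alpha` coefficient are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def residual_prototype_learning(features, species_labels, family_labels):
    """Toy RPL: a species prototype is the mean feature of that species;
    a family's residual prototype (shared traits) is approximated here
    by the family's mean feature."""
    species_protos = {s: features[species_labels == s].mean(dim=0)
                      for s in species_labels.unique().tolist()}
    family_protos = {f: features[family_labels == f].mean(dim=0)
                     for f in family_labels.unique().tolist()}
    return species_protos, family_protos

def residual_prototype_mixing(new_feats, old_family_proto, alpha=0.5):
    """Toy RPM: blend new-species features with an old family's residual
    prototype to synthesize confusable samples, pushing the model to
    rely on species-unique traits."""
    return alpha * new_feats + (1.0 - alpha) * old_family_proto
```

Training on a mix of real and synthesized features is what, per the abstract, reduces confusion between new and old species within the same family.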

ICRA 2024 · Conference Paper

Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

  • Kaixin Bai
  • Lei Zhang 0198
  • Zhaopeng Chen
  • Fang Wan
  • Jianwei Zhang 0001

Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixin-public.github.io/structured_light_3D_synthesizer/
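As a rough illustration of why simulation eases the data-acquisition and labeling bottleneck described above, the sketch below collects automatically generated labels from rendered frames into a single training manifest. The file layout and field names are hypothetical, not the paper's released tooling.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class RenderedSample:
    rgb_path: str
    depth_path: str                 # physically simulated structured-light depth
    instance_masks: list = field(default_factory=list)  # free from the renderer
    grasp_poses: list = field(default_factory=list)     # auto-labeled in sim

def write_manifest(samples, out_file="dataset/manifest.json"):
    """In simulation, every rendered frame comes with labels for free;
    this just gathers them into one manifest for training."""
    Path(out_file).parent.mkdir(parents=True, exist_ok=True)
    with open(out_file, "w") as f:
        json.dump([asdict(s) for s in samples], f, indent=2)
```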

NeurIPS 2024 · Conference Paper

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

  • Mingxiang Liao
  • Hannan Lu
  • Xinyu Zhang
  • Fang Wan
  • Tianyu Wang
  • Yuzhong Zhao
  • Wangmeng Zuo
  • Qixiang Ye

Comprehensive and constructive evaluation protocols play an important role in developing sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics is an essential dimension measuring the visual vividness of generated videos and their faithfulness to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V generation models and to improve existing evaluation metrics. In practice, we define a set of dynamics scores corresponding to multiple temporal granularities, along with a new benchmark of text prompts spanning multiple dynamics grades. On this text-prompt benchmark, we assess the generation capacity of T2V models, characterized by dynamics-range and T2V-alignment metrics. Moreover, we analyze the relevance of existing metrics to the dynamics metrics and improve them from the perspective of dynamics. Experiments show that DEVIL evaluation metrics achieve up to about 90% consistency with human ratings, demonstrating their potential to advance T2V generation models.
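A minimal sketch of a multi-granularity dynamics score in the spirit of the abstract: inter-frame change measured at several temporal strides, so short strides capture local motion and long strides capture whole-clip content change. The strides and the pixel-difference statistic are assumptions, not DEVIL's actual metrics.

```python
import numpy as np

def dynamics_scores(frames, strides=(1, 4, 16)):
    """Mean absolute change between frames `stride` steps apart,
    one score per temporal granularity."""
    frames = np.asarray(frames, dtype=np.float32)  # (T, H, W, C)
    scores = {}
    for s in strides:
        if len(frames) > s:
            scores[s] = float(np.abs(frames[s:] - frames[:-s]).mean())
    return scores
```

A static clip scores near zero at every stride; a vivid, fast-changing clip scores high at short strides, which is the kind of separation a dynamics-centered protocol needs.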

AAAI 2021 · Conference Paper

Agreement-Discrepancy-Selection: Active Learning with Progressive Distribution Alignment

  • Mengying Fu
  • Tianning Yuan
  • Fang Wan
  • Songcen Xu
  • Qixiang Ye

In active learning, failing to align the distribution of unlabeled samples with that of labeled samples hinders the model trained on labeled samples from selecting informative unlabeled samples. In this paper, we propose an agreement-discrepancy-selection (ADS) approach that unifies distribution alignment with sample selection by introducing adversarial classifiers into the convolutional neural network (CNN). Minimizing the classifiers' prediction discrepancy (maximizing prediction agreement) drives the CNN features to reduce the distribution bias between labeled and unlabeled samples, while maximizing the classifiers' discrepancy highlights informative samples. Iterative optimization of agreement and discrepancy losses, calibrated with an entropy function, aligns the sample distributions in a progressive fashion for effective active learning. Experiments on image classification and object detection tasks demonstrate that ADS is task-agnostic and significantly outperforms previous methods when the labeled sets are small.
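A minimal sketch of the agreement/discrepancy mechanism, assuming two classifier heads over shared CNN features and an L1 distance between their softmax outputs (in the style of maximum-classifier-discrepancy training); the paper's entropy calibration is omitted.

```python
import torch
import torch.nn.functional as F

def discrepancy(logits_a, logits_b):
    """Per-sample disagreement between two classifier heads on the same
    features. Minimizing it on unlabeled data (agreement) aligns the
    feature distributions; a high value flags informative samples."""
    p_a = F.softmax(logits_a, dim=1)
    p_b = F.softmax(logits_b, dim=1)
    return (p_a - p_b).abs().mean(dim=1)

def select_for_labeling(logits_a, logits_b, budget=100):
    """Selection step: query the unlabeled samples the heads disagree
    on most."""
    disc = discrepancy(logits_a, logits_b)
    return torch.topk(disc, k=min(budget, disc.numel())).indices
```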

AAAI 2021 · Conference Paper

Nearest Neighbor Classifier Embedded Network for Active Learning

  • Fang Wan
  • Tianning Yuan
  • Mengying Fu
  • Xiangyang Ji
  • Qingming Huang
  • Qixiang Ye

Deep neural networks (DNNs) have been widely applied to active learning. Despite its effectiveness, the generalization ability of the discriminative classifier (the softmax classifier) is questionable when there is a significant distribution bias between the labeled set and the unlabeled set. In this paper, we replace the softmax classifier in a deep neural network with a nearest neighbor classifier, considering its progressive generalization ability within the unknown subspace. Our active learning approach, termed nearest Neighbor Classifier Embedded network (NCE-Net), targets reducing the risk of over-estimating unlabeled samples while improving the opportunity to query informative samples. NCE-Net is conceptually simple but surprisingly powerful, as justified from the perspective of subset information, which defines a metric to quantify model generalization ability in active learning. Experimental results show that, with simple selection based on rejection or confusion confidence, NCE-Net improves on the state of the art in image classification and object detection tasks by significant margins.
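A minimal sketch of a nearest-neighbor classifier head of the kind the abstract describes, assuming cosine similarity to labeled features and a max-per-class score; NCE-Net's exact scoring and its rejection/confusion criteria may differ.

```python
import torch
import torch.nn.functional as F

def nn_classify(query, support, support_labels, num_classes):
    """Score each class by the query's highest cosine similarity to any
    labeled feature of that class. A low best-class score suggests
    rejection; two close top scores suggest confusion -- both usable as
    query criteria in active learning."""
    q = F.normalize(query, dim=1)       # (B, D) unlabeled features
    s = F.normalize(support, dim=1)     # (N, D) labeled features
    sim = q @ s.t()                     # (B, N) cosine similarities
    scores = torch.full((q.size(0), num_classes), -1.0)
    for c in range(num_classes):
        mask = support_labels == c
        if mask.any():
            scores[:, c] = sim[:, mask].max(dim=1).values
    return scores
```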

IROS 2020 · Conference Paper

A Bottom-up Framework for Construction of Structured Semantic 3D Scene Graph

  • Bangguo Yu
  • Chongyu Chen
  • Fengyu Zhou
  • Fang Wan
  • Wenmi Zhuang
  • Yang Zhao

For high-level human-robot interaction tasks, 3D scene understanding is important for autonomous robots. However, parsing and utilizing effective environment information from the 3D scene is non-trivial due to the complexity of the 3D environment and the limited ability to reason about our visual world. Despite great efforts on semantic detection and scene analysis, existing solutions for parsing and representing the 3D scene still fail to preserve accurate semantic information or offer sufficient applicability. This study proposes a bottom-up construction framework for structured 3D scene graph generation, which efficiently describes the objects, relations, and attributes of a 3D indoor environment with a structured representation. In the proposed method, we adopt visual perception to capture semantic information and infer from scene priors to compute the optimal parse graph, where an improved probabilistic grammar model represents the scene priors. Experimental results demonstrate that the proposed framework significantly outperforms existing methods in terms of accuracy, and a demonstration verifies its applicability to high-level human-robot interaction tasks. The supplementary video can be accessed at the following link: https://youtu.be/vEWNxnSwmKI.
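A minimal sketch of the scene-graph representation implied by the abstract: objects with attributes as nodes, and (subject, relation, object) triples as edges. The class names are illustrative; the paper's parse-graph optimization and probabilistic grammar model are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    label: str                                       # e.g. "chair"
    attributes: dict = field(default_factory=dict)   # e.g. {"color": "red"}

@dataclass
class SceneGraph:
    """Bottom-up usage: add nodes from detections first, then add
    relation triples such as (cup, "on", table), which a grammar prior
    could score and prune."""
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)    # (subj_idx, rel, obj_idx)

    def add_object(self, obj: SceneObject) -> int:
        self.objects.append(obj)
        return len(self.objects) - 1

    def relate(self, subj: int, rel: str, obj: int):
        self.relations.append((subj, rel, obj))
```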

NeurIPS 2019 · Conference Paper

FreeAnchor: Learning to Match Anchors for Visual Object Detection

  • Xiaosong Zhang
  • Fang Wan
  • Chang Liu
  • Rongrong Ji
  • Qixiang Ye

Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match approach that breaks the IoU restriction, allowing objects to match anchors in a flexible manner. Our approach, referred to as FreeAnchor, updates hand-crafted anchor assignment to "free" anchor matching by formulating detector training as a maximum likelihood estimation (MLE) procedure. FreeAnchor targets learning features which best explain a class of objects in terms of both classification and localization. FreeAnchor is implemented by optimizing a detection-customized likelihood and can be fused with CNN-based detectors in a plug-and-play manner. Experiments on MS-COCO demonstrate that FreeAnchor consistently outperforms its counterparts by significant margins.
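A minimal sketch of the soft anchor selection behind a FreeAnchor-style positive-bag likelihood, using a mean-max weighting so that gradients reach every anchor in the bag early in training instead of a hard argmax picking one; the full loss also includes a background (negative) term and focal-style weighting, omitted here.

```python
import torch

def positive_bag_loss(cls_prob, loc_prob):
    """One ground-truth object's term: given classification and
    localization confidences for its candidate anchor bag, combine them
    per anchor, softly select the best anchor via mean-max, and return
    the negative log-likelihood."""
    p = cls_prob * loc_prob                      # joint confidence per anchor
    w = 1.0 / (1.0 - p).clamp(min=1e-6)          # mean-max weights: p/(1-p) emphasis
    mean_max = (w * p).sum() / w.sum()           # ~max(p) as training converges
    return -torch.log(mean_max.clamp(min=1e-6))
```

As confidences sharpen during training, the mean-max weighting approaches a hard max, so the detector ends up committing to one best-matching anchor per object, which is the "free" matching the abstract describes.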