Arrow Research search

Author name cluster

Jiangliu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

ICRA Conference 2024 Conference Paper

Ada-Tracker: Soft Tissue Tracking via Inter-Frame and Adaptive-template Matching

  • Jiaxin Guo
  • Jiangliu Wang
  • Zhaoshuo Li
  • Tongyu Jia
  • Qi Dou 0001
  • Yun-Hui Liu 0001

Soft tissue tracking is crucial for computer-assisted interventions. Existing approaches mainly rely on extracting discriminative features from the template and videos to recover corresponding matches. However, it is difficult to adopt these techniques in surgical scenes, where tissues are changing in shape and appearance throughout the surgery. To address this problem, we exploit optical flow to naturally capture the pixel-wise tissue deformations and adaptively correct the tracked template. Specifically, we first implement an inter-frame matching mechanism to extract a coarse region of interest based on optical flow from consecutive frames. To accommodate appearance change and alleviate drift, we then propose an adaptive-template matching method, which updates the tracked template based on the reliability of the estimates. Our approach, Ada-Tracker, enjoys both short-term dynamics modeling by capturing local deformations and long-term dynamics modeling by introducing global temporal compensation. We evaluate our approach on the public SurgT benchmark, which is generated from Hamlyn, SCARED, and Kidney boundary datasets. The experimental results show that Ada-Tracker achieves superior accuracy and performs more robustly against prior works. Code is available at https://github.com/wrld/Ada-Tracker.

ICRA Conference 2024 Conference Paper

Simultaneous Estimation of Shape and Force along Highly Deformable Surgical Manipulators Using Sparse FBG Measurement

  • Yiang Lu
  • Bin Li 0082
  • Wei Chen 0068
  • Junyan Yan
  • Shing Shin Cheng
  • Jiangliu Wang
  • Jianshu Zhou
  • Qi Dou 0001

Recently, fiber optic sensors such as fiber Bragg gratings (FBGs) have been widely investigated for shape reconstruction and force estimation of flexible surgical robots. However, most existing approaches need precise model parameters of FBGs inside the fiber and their alignments with the flexible robots for accurate sensing results. Another challenge lies in online acquiring external forces at arbitrary locations along the flexible robots, which is highly required when with large deflections in robotic surgery. In this paper, we propose a novel data-driven paradigm for simultaneous estimation of shape and force along highly deformable flexible robots by using sparse strain measurement from a single-core FBG fiber. A thin-walled soft sensing tube helically embedded with FBG sensors is designed for a robotic-assisted flexible ureteroscope with large deflection up to 270° and a bend radius under 10 mm. We introduce and study three learning models by incorporating spatial strain encoders, and compare their performances in both free space without interactions as well as constrained environments with contact forces at different locations. The experimental results in terms of dynamic shape-force sensing accuracy demonstrate the effectiveness and superiority of the proposed methods.

ICLR Conference 2023 Conference Paper

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

  • Chongjian Ge
  • Jiangliu Wang
  • Zhan Tong
  • Shoufa Chen
  • Yibing Song
  • Ping Luo 0002

Contrastive learning methods train visual encoders by comparing views (e.g., often created via a group of data augmentations on the same instance) from one instance to others. Typically, the views created from one instance are set as positive, while views from other instances are negative. This binary instance discrimination is studied extensively to improve feature representations in self-supervised learning. In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples. For an intuitive example, given a random image instance, there may exist other images in a mini-batch whose content meanings are the same (i.e., belonging to the same category) or partially related (i.e., belonging to a similar category). How to treat the images that correlate similarly to the current image instance leaves an unexplored problem. We thus propose to support the current image by exploring other correlated instances (i.e., soft neighbors). We first carefully cultivate a candidate neighbor set, which will be further utilized to explore the highly-correlated instances. A cross-attention module is then introduced to predict the correlation score (denoted as positiveness) of other correlated instances with respect to the current one. The positiveness score quantitatively measures the positive support from each correlated instance, and is encoded into the objective for pretext training. To this end, our proposed method benefits in discriminating uncorrelated instances while absorbing correlated instances for SSL. We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation. The state-of-the-art recognition performance shows that SNCLR is effective in improving feature representations from both ViT and CNN encoders.

NeurIPS Conference 2022 Conference Paper

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

  • Shoufa Chen
  • Chongjian Ge
  • Zhan Tong
  • Jiangliu Wang
  • Yibing Song
  • Jue Wang
  • Ping Luo

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual domains. To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently. It possesses several benefits more appealing than prior arts. Firstly, AdaptFormer introduces lightweight modules that only add less than 2% extra parameters to a ViT, while it is able to increase the ViT's transferability without updating its original pre-trained parameters, significantly outperforming the existing 100\% fully fine-tuned models on action recognition benchmarks. Secondly, it can be plug-and-play in different Transformers and scalable to many visual tasks. Thirdly, extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains. For example, when updating just 1. 5% extra parameters, it achieves about 10% and 19% relative improvement compared to the fully fine-tuned models on Something-Something~v2 and HMDB51, respectively. Code is available at https: //github. com/ShoufaChen/AdaptFormer.