Arrow Research

Author name cluster

Vishal Patel

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers (9)

YNICL Journal 2026 Journal Article

Enhancing 7T MRI for deep brain stimulation with deep-learning based image reconstruction and dynamic parallel transmission

  • Justyna O. Ekert
  • Vishal Patel
  • Xiangzhi Zhou
  • Shengzhen Tao
  • Patrick Liebig
  • Jürgen Herrler
  • Thomas Yu
  • Dominik Nickel

OBJECTIVE: Precise targeting of subcortical structures is crucial for deep brain stimulation (DBS). Although 7T MRI provides superior resolution and contrast, its clinical adoption remains limited by B1+ transmit inhomogeneity, prolonged scan times, and motion sensitivity. This study applied deep learning (DL)-based image reconstruction and dynamic parallel transmission (pTx) to optimize DBS protocols and improve image quality.

METHODS: Thirteen patients scanned using a conventional 7T DBS protocol were compared to 13 imaged after implementing DL reconstruction and dynamic pTx. Two readers scored image quality, motion artifact, and target conspicuity on 5-point Likert scales. Ordinal logistic regression was used to calculate odds ratios (OR) for improvements with the enhanced protocol, adjusted for multiple comparisons.

RESULTS: Enhanced MP2RAGE reduced voxel volume by 65.8% and scan time by 32.9%, with improved image quality (OR = 4.4; p = 0.003), target conspicuity (OR = 3.4; p = 0.011), and reduced motion artifacts (OR = 3.8; p = 0.006). Fast gray matter acquisition T1 inversion recovery (FGATIR) scan time decreased by 45.2% with improved target delineation of both globus pallidus interna (OR = 22.9; p < 0.001) and dentato-rubro-thalamic tract (OR = 8.8; p < 0.001). T2-weighted sampling perfection with application-optimized contrasts using different flip angle evolutions (SPACE) improved subthalamic nucleus (STN) delineation (OR = 25.3; p < 0.001). Susceptibility-weighted imaging (SWI) improved image quality (OR = 17.4; p < 0.001), STN delineation (OR = 16.9; p < 0.001), and reduced scan time by 42.6%. Enhanced 3D spoiled gradient recall echo improved image quality (OR = 17.4; p < 0.001) and vessel visualization (OR = 26.1; p < 0.001) with reduced motion artifact (OR = 8.8; p < 0.001). Scan time decreased from 4:33 to 1:35, reducing protocol duration from 42:16 to 26:40 (36.9%).

CONCLUSIONS: DL reconstruction and dynamic pTx improved image quality, target definition, and motion robustness while shortening 7T DBS protocol time.
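
The analysis hinges on ordinal logistic regression over 5-point Likert scores. The sketch below shows how such an odds ratio could be estimated with statsmodels; the scores are synthetic stand-ins, not the study's data, and only the 13-vs-13 design is taken from the abstract.

```python
# Minimal sketch: odds ratio for "enhanced protocol improves image quality"
# from 5-point Likert scores via ordinal logistic regression. All scores
# below are synthetic illustration data, not the paper's measurements.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
protocol = np.repeat([0, 1], 13)  # 0 = conventional, 1 = enhanced (13 each)
# Synthetic Likert scores, shifted toward higher quality under the enhanced protocol.
score = np.clip(rng.poisson(2 + 1.2 * protocol) + 1, 1, 5)

df = pd.DataFrame({"score": pd.Categorical(score, ordered=True),
                   "enhanced": protocol})
model = OrderedModel(df["score"], df[["enhanced"]], distr="logit")
result = model.fit(method="bfgs", disp=False)

# The exponentiated coefficient on `enhanced` is the odds ratio of scoring
# one Likert category higher under the enhanced protocol.
print("OR =", np.exp(result.params["enhanced"]))
```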

NeurIPS Conference 2025 Conference Paper

A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks

  • Mucong Ding
  • Bang An
  • Tahseen Rabbani
  • Chenghao Deng
  • Anirudh Satheesh
  • Souradip Chakraborty
  • Mehrdad Saberi
  • Yuxin Wen

AI-generated images have become pervasive, raising critical concerns around content authenticity, intellectual property, and the spread of misinformation. Invisible watermarks offer a promising solution for identifying AI-generated images, preserving content provenance without degrading visual quality. However, their real-world robustness remains uncertain due to the lack of standardized evaluation protocols and large-scale stress testing. To bridge this gap, we organized “Erasing the Invisible,” a NeurIPS 2024 competition and newly established benchmark designed to systematically stress test the resilience of watermarking techniques. The competition introduced two attack tracks—Black-box and Beige-box—that simulate practical scenarios with varying levels of attacker knowledge of watermarks, providing a comprehensive assessment of watermark robustness. The competition attracted significant global participation, with 2,722 submissions from 298 teams. Through a rigorous evaluation pipeline featuring real-time feedback and human-verified final rankings, participants developed and demonstrated new attack strategies that revealed critical vulnerabilities in state-of-the-art watermarking methods. On average, the top-5 teams in both tracks could remove watermarks from ≥ 89% of the images while preserving high visual quality, setting strong baselines for future research on watermark attacks and defenses. To support continued progress in this field, we summarize the insights and lessons learned from this competition in this paper, and release the benchmark dataset, evaluation toolkit, and competition results. “Erasing the Invisible” establishes a valuable open resource for advancing more robust watermarking techniques and strengthening content provenance in the era of generative AI.
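
As a concrete picture of the kind of removal attack such a benchmark stress-tests, here is a minimal sketch of a naive black-box attack (JPEG recompression plus a light blur), with PSNR as a crude stand-in for the competition's quality evaluation. The file name is a placeholder, and the winning entries were far more sophisticated than this.

```python
# Naive black-box watermark-removal attack sketch: lossy re-encode + blur,
# then check how much visual quality was sacrificed. Illustrative only.
import io
import numpy as np
from PIL import Image, ImageFilter

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def naive_attack(img: Image.Image, jpeg_quality: int = 60) -> Image.Image:
    """Light Gaussian blur followed by JPEG recompression."""
    buf = io.BytesIO()
    img.filter(ImageFilter.GaussianBlur(radius=0.8)).save(
        buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

original = Image.open("watermarked.png").convert("RGB")  # placeholder path
attacked = naive_attack(original)
print("PSNR after attack:", psnr(np.array(original), np.array(attacked)))
```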

NeurIPS Conference 2025 Conference Paper

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

  • Yana Wei
  • Liang Zhao
  • Jianjian Sun
  • Kangheng Lin
  • Jisheng Yin
  • Jingcheng Hu
  • Yinmin Zhang
  • En Yu

The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer this principle to Multimodal LLMs (MLLMs) to unlock advanced visual reasoning. We introduce a two-stage paradigm built on Qwen2.5-VL-7B: a massive linguistic cold-start fine-tuning, followed by multimodal reinforcement learning (RL) spanning nearly 1,000 steps—surpassing all previous open-source efforts in scale. This pioneering work reveals three fundamental insights: 1) Behavior transfer emerges surprisingly early in cold start due to linguistic mental imagery. 2) Cold start broadly memorizes visual behaviors, while RL critically discerns and scales up effective patterns. 3) Transfer strategically favors high-utility behaviors such as visual reflection. Our resulting model, Open-Vision-Reasoner (OVR), achieves state-of-the-art performance on a suite of reasoning benchmarks, including 95.3% on MATH500, 51.8% on MathVision and 54.6% on MathVerse. We release our model, data, and training dynamics to catalyze the development of more capable, behavior-aligned multimodal reasoners.
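
For readers unfamiliar with reinforcement from verifiable rewards, the sketch below shows a minimal verifiable reward for math-style benchmarks: 1 if the model's final boxed answer matches the reference, else 0. The \boxed{...} extraction convention here is an assumption for illustration, not OVR's actual grader.

```python
# Minimal verifiable-reward sketch for math answers. Hypothetical
# convention: the model ends its response with \boxed{answer}.
import re

def extract_boxed(text: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a model response."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: exact match against the reference answer."""
    answer = extract_boxed(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
print(verifiable_reward(r"... so the result is \boxed{41}", "42"))  # 0.0
```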

AAAI Conference 2023 Conference Paper

JR2Net: Joint Monocular 3D Face Reconstruction and Reenactment

  • Jiaxiang Shang
  • Yu Zeng
  • Xin Qiao
  • Xin Wang
  • Runze Zhang
  • Guangyuan Sun
  • Vishal Patel
  • Hongbo Fu

Face reenactment and reconstruction benefit various applications in self-media, VR, etc. Recent face reenactment methods use 2D facial landmarks to implicitly retarget facial expressions and poses from driving videos to source images, but they suffer from pose and expression preservation issues in cross-identity scenarios, i.e., when the source and the driving subjects are different. Current self-supervised face reconstruction methods also demonstrate impressive results. However, these methods do not handle large expressions well, since their training data lacks samples of large expressions and 2D facial attributes are inaccurate on such samples. To mitigate the above problems, we propose to explore the inner connection between the two tasks, i.e., using face reconstruction to provide sufficient 3D information for reenactment, and synthesizing videos paired with captured face model parameters through face reenactment to enhance the expression module of face reconstruction. In particular, we propose a novel cascade framework named JR2Net for Joint Face Reconstruction and Reenactment, which begins with the training of a coarse reconstruction network, followed by a 3D-aware face reenactment network based on the coarse reconstruction results. In the end, we train an expression tracking network on our synthesized videos composed of image-face model parameter pairs. Such an expression tracking network can further enhance the coarse face reconstruction. Extensive experiments show that our JR2Net outperforms state-of-the-art methods on several face reconstruction and reenactment benchmarks.
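
A schematic may help fix the stage ordering of the cascade; in the sketch below every network is stubbed with a plain linear layer, so the modules, shapes, and names are placeholders, not JR2Net's architecture.

```python
# Hedged skeleton of the three-stage cascade described in the abstract.
# Real JR2Net sub-networks, losses, and data are far richer than this.
import torch
import torch.nn as nn

class Stub(nn.Module):
    """Placeholder for a real sub-network."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

coarse_recon = Stub()   # stage 1: coarse 3D face reconstruction
reenactor = Stub()      # stage 2: 3D-aware reenactment, built on stage 1
expr_tracker = Stub()   # stage 3: expression tracking on synthesized pairs

frames = torch.randn(4, 8)            # stand-in for source/driving frames
params = coarse_recon(frames)         # coarse face model parameters
synthesized = reenactor(params)       # reenacted frames paired with params
refined = expr_tracker(synthesized)   # refines the coarse reconstruction
print(refined.shape)
```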

AAAI Conference 2023 Conference Paper

VIDM: Video Implicit Diffusion Models

  • Kangfu Mei
  • Vishal Patel

Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse sets of images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit conditioning manner, i.e., one can sample plausible video motions according to the latent feature of frames. We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms the state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.
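
Of these strategies, sampling space truncation is the most self-contained. One plausible reading, analogous to the truncation trick in GANs, is to resample outlier entries of the Gaussian latents that seed generation, as sketched below; this is an illustrative interpretation, not VIDM's released code.

```python
# Sampling-space truncation sketch: resample Gaussian latent entries whose
# magnitude exceeds a threshold, trading sample diversity for fidelity.
import torch

def truncated_latents(shape, threshold: float = 1.5,
                      generator: torch.Generator | None = None) -> torch.Tensor:
    """Draw Gaussian latents, resampling entries with |z| > threshold."""
    z = torch.randn(shape, generator=generator)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()), generator=generator)
        mask = z.abs() > threshold
    return z

z0 = truncated_latents((2, 4, 16, 16), threshold=1.5)
print(z0.abs().max())  # guaranteed <= 1.5
```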

NeurIPS Conference 2022 Conference Paper

Resource-Adaptive Federated Learning with All-In-One Neural Composition

  • Yiqun Mei
  • Pengfei Guo
  • Mo Zhou
  • Vishal Patel

Conventional Federated Learning (FL) systems inherently assume a uniform processing capacity among clients for deployed models. However, diverse client hardware often leads to varying computation resources in practice. Such system heterogeneity results in an inevitable trade-off between model complexity and data accessibility as a bottleneck. To avoid this dilemma and achieve resource-adaptive federated learning, we introduce a simple yet effective mechanism, termed All-In-One Neural Composition, to systematically support training complexity-adjustable models with flexible resource adaptation. It efficiently constructs models at various complexities using one unified neural basis shared among clients, instead of pruning the global model into local ones. The proposed mechanism endows the system with unhindered access to the full range of knowledge scattered across clients and generalizes existing pruning-based solutions by allowing soft and learnable extraction of low-footprint models. Extensive experimental results on popular FL benchmarks demonstrate the effectiveness of our approach. The resulting FL system empowered by our All-In-One Neural Composition, called FLANC, manifests consistent performance gains across diverse system/data heterogeneity setups while maintaining high efficiency in computation and communication.
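
One way to read the shared-basis idea: each client composes its layer weights from a common pool of basis matrices using local coefficients, so a resource-constrained client simply uses fewer basis atoms. The sketch below is our illustrative interpretation; the names and shapes are not FLANC's.

```python
# Shared-neural-basis sketch: layer weight = coefficient-weighted sum of
# globally shared basis matrices; lighter clients use fewer atoms.
import torch
import torch.nn as nn

class ComposedLinear(nn.Module):
    def __init__(self, basis: nn.Parameter, n_atoms: int):
        super().__init__()
        self.basis = basis                                   # shared globally
        self.coeff = nn.Parameter(torch.randn(n_atoms) / n_atoms)  # local

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compose the weight from the first n_atoms basis matrices.
        w = torch.einsum("k,koi->oi", self.coeff,
                         self.basis[: self.coeff.numel()])
        return x @ w.T

in_dim, out_dim, total_atoms = 16, 8, 6
shared_basis = nn.Parameter(torch.randn(total_atoms, out_dim, in_dim))

big_client = ComposedLinear(shared_basis, n_atoms=6)    # full capacity
small_client = ComposedLinear(shared_basis, n_atoms=2)  # low footprint
print(big_client(torch.randn(4, in_dim)).shape)    # torch.Size([4, 8])
print(small_client(torch.randn(4, in_dim)).shape)  # same interface
```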

NeurIPS Conference 2020 Conference Paper

Deep Subspace Clustering with Data Augmentation

  • Mahdi Abavisani
  • Alireza Naghizadeh
  • Dimitris Metaxas
  • Vishal Patel

The idea behind data augmentation techniques is based on the fact that slight changes in the percept do not change the brain's cognition. In classification, neural networks exploit this fact by applying transformations to the inputs and learning to predict the same label. However, in deep subspace clustering (DSC), ground-truth labels are not available, so one cannot easily use data augmentation techniques. We propose a technique to exploit the benefits of data augmentation in DSC algorithms. We learn representations that have consistent subspaces for slightly transformed inputs. In particular, we introduce a temporal ensembling component to the objective function of DSC algorithms to enable the DSC networks to maintain consistent subspaces for random transformations of the input data. In addition, we provide a simple yet effective unsupervised procedure to find efficient data augmentation policies. An augmentation policy is defined as an image processing transformation with a certain magnitude and probability of being applied to each image in each epoch. We search a space of the most common augmentation policies to find the policy for which the DSC network yields the highest mean Silhouette coefficient in its clustering results on a target dataset. Our method achieves state-of-the-art performance on four standard subspace clustering datasets.
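
The policy search is easy to picture in code: score each candidate policy by the mean Silhouette coefficient of the resulting clustering and keep the best. The sketch below substitutes KMeans on synthetic data for a trained DSC network, so it illustrates only the search loop, not the paper's method.

```python
# Silhouette-guided augmentation-policy search sketch. A toy noise policy
# and KMeans stand in for real augmentations and a trained DSC network.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
images = rng.normal(size=(200, 64))  # stand-in for flattened images

def apply_policy(x: np.ndarray, magnitude: float) -> np.ndarray:
    """Toy 'augmentation policy': additive noise of a given magnitude."""
    return x + rng.normal(scale=magnitude, size=x.shape)

candidate_policies = [0.0, 0.05, 0.1, 0.2, 0.5]
best_policy, best_score = None, -1.0
for magnitude in candidate_policies:
    augmented = apply_policy(images, magnitude)
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(augmented)
    score = silhouette_score(augmented, labels)  # mean Silhouette coefficient
    if score > best_score:
        best_policy, best_score = magnitude, score

print(f"best policy magnitude={best_policy}, silhouette={best_score:.3f}")
```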