Arrow Research search

Author name cluster

Vincent Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

TMLR Journal 2025 Journal Article

A Pattern Language for Machine Learning Tasks

  • Benjamin Rodatz
  • Ian Fan
  • Tuomas Laakkonen
  • Neil John Ortega
  • Thomas Hoffmann
  • Vincent Wang

We formalise the essential data of objective functions as equality constraints on composites of learners. We call these constraints "tasks", and we investigate the idealised view that such tasks determine model behaviours. We develop a flowchart-like graphical mathematics for tasks that allows us to: offer a unified perspective on approaches in machine learning across domains; design and optimise desired behaviours model-agnostically; and import insights from theoretical computer science into practical machine learning. As preliminary experimental validation of our theoretical framework, we exhibit and implement a novel "manipulation" task that minimally edits input data to have a desired attribute. Our model-agnostic approach achieves this end-to-end, and without the need for custom architectures, adversarial training, random sampling, or interventions on the data, hence enabling capable, small-scale, and training-stable models.

NeurIPS Conference 2025 Conference Paper

Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor

  • Maryam Aliakbarpour
  • Zhan Shi
  • Ria Stevens
  • Vincent Wang

Estimating the density of a distribution from its samples is a fundamental problem in statistics. Hypothesis selection addresses the setting where, in addition to a sample set, we are given n candidate distributions, referred to as hypotheses, and the goal is to determine which one best describes the underlying data distribution. This problem is known to be solvable very efficiently, requiring roughly O(log n) samples and running in Õ(n) time. The quality of the output is measured via the total variation distance to the unknown distribution, and the approximation factor of the algorithm determines how large this distance is compared to the optimal distance achieved by the best candidate hypothesis. It is known that α = 3 is the optimal approximation factor for this problem. We study hypothesis selection under the constraint of differential privacy. We propose a differentially private algorithm in the central model that runs in nearly linear time with respect to the number of hypotheses, achieves the optimal approximation factor, and incurs only a modest increase in sample complexity, which remains polylogarithmic in n. This resolves an open question posed by [Bun, Kamath, Steinke, Wu, NeurIPS 2019]. Prior to our work, existing upper bounds required quadratic time.

TMLR Journal 2024 Journal Article

Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks

  • Rui Patrick Xian
  • Alex Jihun Lee
  • Satvik Lolla
  • Vincent Wang
  • Russell Ro
  • Qiming Cui
  • Reza Abbasi-Asl

The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential to quantifying the trustworthiness of model predictions and regulating model use. The recent discovery of named entities as adversarial examples (i.e. adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and fine-tuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for medium-sized billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space, gradient-free attack based on power-scaled distance-weighted sampling for robustness evaluation, which has a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on black-box gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics. We also showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models, and the results highlight the brittleness of domain knowledge in LLMs.

FLAP Journal 2020 Journal Article

Concept Functionals

  • Vincent Wang

[−1, 1]-valued functionals allow the semantic modelling of concepts as fuzzy and nonclassical predicates over a rich collection of domains, whilst maintaining compatibility with logical operations such as negation. We integrate this semantics with the Categorical Compositional Meaning programme, allowing us to compose and compute with concepts: in particular, we demonstrate how we may model spatial inference from vague and negated information obtained from fragments of natural language.