Author name cluster

Christoph Studer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers

2 author rows

TMLR Journal 2026 Journal Article

Enhancing Semantic Segmentation with Continual Self-Supervised Pre-training

Brown Ebouky
Ajad Chhatkuli
A. Cristiano I. Malossi
Christoph Studer
Roy Assaf
Andrea Bartezzaghi

Self-supervised learning (SSL) has emerged as a central paradigm for training foundation models by leveraging large-scale unlabeled datasets, often producing representations with strong generalization capabilities. These models are typically pre-trained on general-purpose datasets such as ImageNet and subsequently adapted to various downstream tasks through finetuning. While prior work has investigated parameter-efficient adaptation methods like adapters, LoRA, and prompt tuning, primarily targeting downstream finetuning, extending the SSL pre-training itself in a continual manner to new domains under limited data remains largely underexplored, especially for downstream dense prediction tasks like semantic segmentation. In this work, we address the challenge of adapting vision foundation models to low-data target domains through continual self-supervised pre-training, specifically targeting downstream semantic segmentation. We propose GLARE (Global Local and Regional Enforcement), a novel continual self-supervised pre-training task designed to enhance downstream semantic segmentation performance. GLARE introduces patch-level augmentations to encourage local consistency and incorporates a regional consistency constraint that leverages spatial semantics in the data. For efficient continual pre-training, we initialize Vision Transformers (ViTs) with weights from existing SSL models and update only lightweight adapter modules specifically UniAdapter–while keeping the rest of the backbone frozen. Experiments across multiple semantic segmentation benchmarks on different domains demonstrate that GLARE consistently improves downstream performance with minimal computational and parameter overhead.

PDF Details

ICLR Conference 2025 Conference Paper

Cauchy-Schwarz Regularizers

Sueda Taner
Ziyi Wang 0005
Christoph Studer

We introduce a novel class of regularization functions, called Cauchy–Schwarz (CS) regularizers, which can be designed to induce a wide range of properties in solution vectors of optimization problems. To demonstrate the versatility of CS regularizers, we derive regularization functions that promote discrete-valued vectors, eigenvectors of a given matrix, and orthogonal matrices. The resulting CS regularizers are simple, differentiable, and can be free of spurious stationary points, making them suitable for gradient-based solvers and large-scale optimization problems. In addition, CS regularizers automatically adapt to the appropriate scale, which is, for example, beneficial when discretizing the weights of neural networks. To demonstrate the efficacy of CS regularizers, we provide results for solving underdetermined systems of linear equations and weight quantization in neural networks. Furthermore, we discuss specializations, variations, and generalizations, which lead to an even broader class of new and possibly more powerful regularizers.

Details

ICLR Conference 2021 Conference Paper

WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Renkun Ni
Hong-Min Chu
Oscar Castañeda
Ping-Yeh Chiang
Christoph Studer
Tom Goldstein

Low-precision neural networks represent both weights and activations with few bits, drastically reducing the cost of multiplications. Meanwhile, these products are accumulated using high-precision (typically 32-bit) additions. Additions dominate the arithmetic complexity of inference in quantized (e.g., binary) nets, and high precision is needed to avoid overflow. To further optimize inference, we propose WrapNet, an architecture that adapts neural networks to use low-precision (8-bit) additions while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer that makes results invariant to overflow. We demonstrate the efficacy of our approach using both software and hardware platforms.

Details

ICLR Conference 2020 Conference Paper

Adversarially robust transfer learning

Ali Shafahi
Parsa Saadatpanah
Chen Zhu 0001
Amin Ghiasi
Christoph Studer
David W. Jacobs
Tom Goldstein

Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of "fine tuning" a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training. Additionally, we can improve the generalization of adversarially trained models, while maintaining their robustness.

Details

ICLR Conference 2020 Conference Paper

Certified Defenses for Adversarial Patches

Ping-Yeh Chiang
Renkun Ni
Ahmed Abdelkader
Chen Zhu 0001
Christoph Studer
Tom Goldstein

Adversarial patch attacks are among one of the most practical threat models against real-world computer vision systems. This paper studies certified and empirical defenses against patch attacks. We begin with a set of experiments showing that most existing defenses, which work by pre-processing input images to mitigate adversarial patches, are easily broken by simple white-box adversaries. Motivated by this finding, we propose the first certified defense against patch attacks, and propose faster methods for its training. Furthermore, we experiment with different patch shapes for testing, obtaining surprisingly good robustness transfer across shapes, and present preliminary results on certified defense against sparse attacks. Our complete implementation can be found on: https://github.com/Ping-C/certifiedpatchdefense.

Details

NeurIPS Conference 2019 Conference Paper

Adversarial training for free!

Ali Shafahi
Mahyar Najibi
Mohammad Amin Ghiasi
Zheng Xu
John Dickerson
Christoph Studer
Larry Davis
Gavin Taylor

Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.

PDF Details

ICML Conference 2019 Conference Paper

Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

Chen Zhu 0001
W. Ronny Huang
Hengduo Li
Gavin Taylor
Christoph Studer
Tom Goldstein

In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network’s output nor its architecture or parameters. Our goal is to ensure that after injecting the poisons into the training data, a model with unknown architecture and parameters trained on that data will misclassify the target image into a specific class. To achieve this goal, we generate multiple poison images from the base class by adding small perturbations which cause the poison images to trap the target image within their convex polytope in feature space. We also demonstrate that using Dropout during crafting of the poisons and enforcing this objective in multiple layers enhances transferability, enabling attacks against both the transfer learning and end-to-end training settings. We demonstrate transferable attack success rates of over 50% by poisoning only 1% of the training set.

Details

ICML Conference 2018 Conference Paper

An Estimation and Analysis Framework for the Rasch Model

Andrew S. Lan
Mung Chiang
Christoph Studer

The Rasch model is widely used for item response analysis in applications ranging from recommender systems to psychology, education, and finance. While a number of estimators have been proposed for the Rasch model over the last decades, the associated analytical performance guarantees are mostly asymptotic. This paper provides a framework that relies on a novel linear minimum mean-squared error (L-MMSE) estimator which enables an exact, nonasymptotic, and closed-form analysis of the parameter estimation error under the Rasch model. The proposed framework provides guidelines on the number of items and responses required to attain low estimation errors in tests or surveys. We furthermore demonstrate its efficacy on a number of real-world collaborative filtering datasets, which reveals that the proposed L-MMSE estimator performs on par with state-of-the-art nonlinear estimators in terms of predictive performance.

Details

ICML Conference 2018 Conference Paper

Linear Spectral Estimators and an Application to Phase Retrieval

Ramina Ghods
Andrew S. Lan
Tom Goldstein
Christoph Studer

Phase retrieval refers to the problem of recovering real- or complex-valued vectors from magnitude measurements. The best-known algorithms for this problem are iterative in nature and rely on so-called spectral initializers that provide accurate initialization vectors. We propose a novel class of estimators suitable for general nonlinear measurement systems, called linear spectral estimators (LSPEs), which can be used to compute accurate initialization vectors for phase retrieval problems. The proposed LSPEs not only provide accurate initialization vectors for noisy phase retrieval systems with structured or random measurement matrices, but also enable the derivation of sharp and nonasymptotic mean-squared error bounds. We demonstrate the efficacy of LSPEs on synthetic and real-world phase retrieval problems, and we show that our estimators significantly outperform existing methods for structured measurement systems that arise in practice.

Details

NeurIPS Conference 2018 Conference Paper

Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Ali Shafahi
W. Ronny Huang
Mahyar Najibi
Octavian Suciu
Christoph Studer
Tudor Dumitras
Tom Goldstein

Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use ``clean-labels''; they don't require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a specific test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by putting them online and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that just one single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a ``watermarking'' strategy that makes poisoning reliable using multiple (approx. 50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.

PDF Details

NeurIPS Conference 2018 Conference Paper

Visualizing the Loss Landscape of Neural Nets

Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein

Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e. g. , skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, is not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.

PDF Details

ICML Conference 2017 Conference Paper

Convex Phase Retrieval without Lifting via PhaseMax

Tom Goldstein
Christoph Studer

Semidefinite relaxation methods transform a variety of non-convex optimization problems into convex problems, but square the number of variables. We study a new type of convex relaxation for phase retrieval problems, called PhaseMax, that convexifies the underlying problem without lifting. The resulting problem formulation can be solved using standard convex optimization routines, while still working in the original, low-dimensional variable space. We prove, using a random spherical distribution measurement model, that PhaseMax succeeds with high probability for a sufficiently large number of measurements. We compare our approach to other phase retrieval methods and demonstrate that our theory accurately predicts the success of PhaseMax.

Details

AAAI Conference 2017 Conference Paper

JAG: A Crowdsourcing Framework for Joint Assessment and Peer Grading

Igor Labutov
Christoph Studer

Generation and evaluation of crowdsourced content is commonly treated as two separate processes, performed at different times and by two distinct groups of people: content creators and content assessors. As a result, most crowdsourcing tasks follow this template: one group of workers generates content and another group of workers evaluates it. In an educational setting, for example, content creators are traditionally students that submit open-response answers to assignments (e. g. , a short answer, a circuit diagram, or a formula) and content assessors are instructors that grade these submissions. Despite the considerable success of peer-grading in massive open online courses (MOOCs), the process of test-taking and grading are still treated as two distinct tasks which typically occur at different times, and require an additional overhead of grader training and incentivization. Inspired by this problem in the context of education, we propose a general crowdsourcing framework that fuses open-response test-taking (content generation) and assessment into a single, streamlined process that appears to students in the form of an explicit test, but where everyone also acts as an implicit grader. The advantages offered by our framework include: a common incentive mechanism for both the creation and evaluation of content, and a probabilistic model that jointly models the processes of contribution and evaluation, facilitating efﬁcient estimation of the quality of the contributions and the competency of the contributors. We demonstrate the effectiveness and limits of our framework via simulations and a real-world user study.

PDF Details

NeurIPS Conference 2017 Conference Paper

Training Quantized Nets: A Deeper Understanding

Hao Li
Soham De
Zheng Xu
Christoph Studer
Hanan Samet
Tom Goldstein

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have mostly been empirical. In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic.

PDF Details

ICML Conference 2016 Conference Paper

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data

Andrew S. Lan
Tom Goldstein
Richard G. Baraniuk
Christoph Studer

Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students’ knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student’s success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance as compared to affine models with real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable—they provide key insights into which concepts are critical (i. e. , the “dealbreaker”) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.

Details

JMLR Journal 2014 Journal Article

Sparse Factor Analysis for Learning and Content Analytics

Andrew S. Lan
Andrew E. Waters
Christoph Studer
Richard G. Baraniuk

We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain, and {\em{content analytics}}, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood-based solution and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest. [abs] [ pdf ][ bib ] &copy JMLR 2014. ( edit, beta )

PDF Details