Author name cluster

Jonathan Kao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Flattening Hierarchies with Policy Bootstrapping

John Zhou
Jonathan Kao

Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/

NeurIPS Conference 2025 Conference Paper

SplashNet: Split‑and‑Share Encoders for Accurate and Efficient Typing with Surface Electromyography

Nima Hadidi
Jason Chan
Ebrahim Feghhi
Jonathan Kao

Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes 51.8% of characters zero-shot on unseen users and 7.0% after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a five-fold reduction in spectral resolution (33→6 frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only ¼ the parameters and 0.6× the FLOPs of the baseline while reducing character error rate (CER) to 36.4% zero-shot and 5.9% after fine-tuning. An upscaled variant, SplashNet (½ parameters, 1.15× FLOPs of the baseline), further lowers error to 35.7% and 5.5%, representing 31% and 21% relative improvements in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
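
The relative improvements quoted in this abstract follow directly from the character error rates it reports. A minimal Python check is sketched below; the function and dictionary names are illustrative only and do not come from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Fractional reduction in error relative to the baseline error."""
    return (baseline - new) / baseline


# Character error rates (percent) as quoted in the abstract.
BASELINE_CER = {"zero_shot": 51.8, "fine_tuned": 7.0}
SPLASHNET_CER = {"zero_shot": 35.7, "fine_tuned": 5.5}


def summarize(model_cer: dict) -> dict:
    """Relative improvement over the baseline, as a whole-number percentage."""
    return {
        setting: round(100 * relative_improvement(BASELINE_CER[setting], cer))
        for setting, cer in model_cer.items()
    }


print(summarize(SPLASHNET_CER))  # {'zero_shot': 31, 'fine_tuned': 21}
```

Both figures are reductions relative to the baseline error, not absolute percentage-point differences (which would be 16.1 and 1.5 points).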

NeurIPS Conference 2025 Conference Paper

Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding

Ebrahim Feghhi
Shreyas Kaasyap
Nima Hadidi
Jonathan Kao

Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior work on the benchmark showed impressive accuracy gains. However, these gains increased computational costs and were not demonstrated in a real-time decoding setting. Here, we make three contributions that pave the way towards accurate, efficient, and real-time neural speech decoding. First, we incorporate large amounts of time-masking during training. On average, over 50% of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer. The Transformer architecture uses 83% fewer parameters, cuts peak GPU memory usage by 52%, and is significantly faster to calibrate relative to the GRU. Third, we design a lightweight variant of an existing test-time adaptation method developed for decoding handwriting from neural activity. Our variant adapts the model using multiple time-masked augmentations of a single trial and requires only one gradient step per trial. Together, these contributions reduce word error rate by over 20% and effectively mitigate performance degradations across held-out days in a real-time decoding setting while substantially lowering computational costs.
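
As a rough illustration of time masking during training, the sketch below zeroes out randomly placed contiguous spans of time steps in a single trial. The abstract does not specify the masking scheme, so the number of masks, the maximum span length, the zero fill value, and all names here are assumptions, not the authors' implementation.

```python
import random


def time_mask(trial, num_masks=4, max_mask_frac=0.3, rng=None):
    """Return a masked copy of `trial` and a per-time-step keep flag.

    `trial` is a list of frames (one list of channel values per time step).
    Each of `num_masks` spans has a random length of up to
    `max_mask_frac * len(trial)` time steps and a random start position.
    Spans may overlap. These hyperparameters are hypothetical.
    """
    rng = rng or random.Random()
    n = len(trial)
    masked = [list(frame) for frame in trial]  # copy so the input is untouched
    keep = [True] * n
    for _ in range(num_masks):
        length = rng.randint(0, int(max_mask_frac * n))
        start = rng.randint(0, n - length)
        for t in range(start, start + length):
            keep[t] = False
            masked[t] = [0.0] * len(masked[t])
    return masked, keep


# Usage: mask a toy trial of 100 time steps with 3 channels.
toy_trial = [[1.0, 1.0, 1.0] for _ in range(100)]
masked_trial, keep_flags = time_mask(toy_trial, rng=random.Random(0))
print(f"masked {keep_flags.count(False)} of {len(keep_flags)} time steps")
```

Calling `time_mask` several times on the same trial yields different augmentations of it, which is the kind of input the test-time adaptation step described in the abstract operates on.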

NeurIPS Conference 2023 Conference Paper

Gacs-Korner Common Information Variational Autoencoder

Michael Kleinman
Alessandro Achille
Stefano Soatto
Jonathan Kao

We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is defined by an optimization problem over a family of functions and recovers the Gács-Körner common information as a special case. Importantly, our notion can be approximated empirically using samples from the underlying data distribution. We then provide a method to partition and quantify the common and unique information using a simple modification of a traditional variational auto-encoder. Empirically, we demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos. Moreover, on datasets where ground-truth latent factors are known, we show that we can accurately quantify the common information between the random variables.

NeurIPS Conference 2021 Conference Paper

A mechanistic multi-area recurrent network model of decision-making

Michael Kleinman
Chandramouli Chandrasekaran
Jonathan Kao

Recurrent neural networks (RNNs) trained on neuroscience-based tasks have been widely used as models for cortical areas performing analogous tasks. However, very few tasks involve a single cortical area, and instead require the coordination of multiple brain areas. Despite the importance of multi-area computation, there is a limited understanding of the principles underlying such computation. We propose to use multi-area RNNs with neuroscience-inspired architecture constraints to derive key features of multi-area computation. In particular, we show that incorporating multiple areas and Dale's Law is critical for biasing the networks to learn biologically plausible solutions. Additionally, we leverage the full observability of the RNNs to show that output-relevant information is preferentially propagated between areas. These results suggest that cortex uses modular computation to generate minimal sufficient representations of task information. More broadly, our results suggest that constrained multi-area RNNs can produce experimentally testable hypotheses for computations that occur within and across multiple brain areas, enabling new insights into distributed computation in neural systems.
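
Dale's Law is commonly imposed on an RNN by fixing each unit as excitatory or inhibitory and forcing all of its outgoing weights to share that sign. The sketch below shows one such parametrization, assuming the convention that `raw_weights[i][j]` is the connection from unit j to unit i. The abstract does not state how the constraint was implemented, and the function name is hypothetical.

```python
def apply_dales_law(raw_weights, is_excitatory):
    """Constrain a recurrent weight matrix so each unit has one output sign.

    Column j holds the outgoing weights of unit j. Magnitudes come from the
    unconstrained weights; the sign is +1 for excitatory units and -1 for
    inhibitory units.
    """
    n = len(raw_weights)
    return [
        [
            abs(raw_weights[i][j]) * (1.0 if is_excitatory[j] else -1.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Usage: unit 0 is excitatory, unit 1 is inhibitory.
raw = [[0.5, -0.2], [-1.0, 0.3]]
print(apply_dales_law(raw, [True, False]))  # [[0.5, -0.2], [1.0, -0.3]]
```

In a trained network this projection would typically be applied on every forward pass so that gradient updates to the unconstrained weights cannot flip a unit's sign.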

NeurIPS Conference 2021 Conference Paper

Learning rule influences recurrent network representations but not attractor structure in decision-making tasks

Brandon McMahan
Michael Kleinman
Jonathan Kao

Recurrent neural networks (RNNs) are popular tools for studying computational dynamics in neurobiological circuits. However, due to the dizzying array of design choices, it is unclear if computational dynamics unearthed from RNNs provide reliable neurobiological inferences. Understanding the effects of design choices on RNN computation is valuable in two ways. First, invariant properties that persist in RNNs across a wide range of design choices are more likely to be candidate neurobiological mechanisms. Second, understanding what design choices lead to similar dynamical solutions reduces the burden of imposing that all design choices be totally faithful replications of biology. We focus our investigation on how RNN learning rule and task design affect RNN computation. We trained large populations of RNNs with different, but commonly used, learning rules on decision-making tasks inspired by neuroscience literature. For relatively complex tasks, we find that attractor topology is invariant to the choice of learning rule, but representational geometry is not. For simple tasks, we find that attractor topology depends on task input noise. However, when a task becomes increasingly complex, RNN attractor topology becomes invariant to input noise. Together, our results suggest that RNN dynamics are robust across learning rules but can be sensitive to the training task design, especially for simpler tasks.