Author name cluster

Simone Rossi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

1 author row

TMLR Journal 2026 Journal Article

Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning

Simone Papicchio
Simone Rossi
Luca Cagliero
Paolo Papotti

While Large Language Models (LLMs) have advanced the state of the art in Text-to-SQL, robust reasoning in complex, multi-table environments remains a bottleneck for parameter-efficient models. This paper presents a systematic empirical study on injecting reasoning capabilities into Text-to-SQL through the lens of Reinforcement Learning with Verifiable Rewards (RLVR) for the Qwen3 model family. We uncover a critical interplay between reward density, advantage scaling, and model capacity. Our analysis yields four primary insights. First, we propose a novel execution-guided dense reward function that significantly outperforms binary signals and existing state-of-the-art rewards by providing granular feedback at the instance level. Second, we analyze the mechanics of advantage calculation, demonstrating that while large models thrive on sparse signals with aggressive advantage scaling, smaller models require dense rewards and conservative scaling to improve Text-to-SQL performance. Third, we evaluate the impact of cold start, showing that distillation does not always benefit RLVR performance, and supervised fine-tuned models are prone to distributional mimicry. Fourth, we map the Pareto frontier of training efficiency, providing insights for optimizing Text-to-SQL reasoning under computational constraints. Our findings culminate in the Think2SQL family: our 4B-parameter model demonstrates reasoning capabilities competitive with state-of-the-art models such as o3. We release our models, datasets, and code to create a blueprint for RLVR optimization in Text-to-SQL at https://github.com/spapicchio/Think2SQL.

TMLR Journal 2025 Journal Article

The Initialization Determines Whether In-Context Learning Is Gradient Descent

Shifeng Xie
Rui Yuan
Simone Rossi
Thomas Hannagan

In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), but this connection has primarily been established under simplified conditions with zero-mean Gaussian priors and zero initialization for GD. However, subsequent studies have challenged this simplified view by highlighting its overly restrictive assumptions, demonstrating instead that under conditions such as multi-layer or nonlinear attention, self-attention performs optimization-like inference, akin to but distinct from GD. We investigate how multi-head LSA approximates GD under more realistic conditions, specifically when incorporating non-zero Gaussian prior means in linear regression formulations of ICL. We first extend the multi-head LSA embedding matrix by introducing an initial estimate for the query, referred to as the initial guess. We prove an upper bound on the number of heads needed in the ICL linear regression setup. Our experiments confirm this result and further show that a performance gap between one-step GD and multi-head LSA persists. To address this gap, we introduce $y_q$-LSA, a simple generalization of single-head LSA with a trainable initial guess $y_q$. We theoretically establish the capabilities of $y_q$-LSA and provide experimental validation on linear regression tasks, thereby extending the theory that bridges ICL and GD. Finally, inspired by our findings in the case of linear regression, we consider widespread LLMs augmented with initial guess capabilities, and show that their performance improves on a semantic similarity task.
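
A minimal sketch, under assumptions of our own, of why a non-zero GD initialization shows up as an additive "initial guess". We take the standard in-context least-squares loss L(w) = (1/2n) Σ (w·x_i − y_i)², apply one GD step from w0, and rewrite the resulting prediction as w0·x_q plus an unnormalized linear-attention sum with keys x_i, query x_q, and residual values y_i − w0·x_i. This is not the authors' y_q-LSA construction, and the function names are hypothetical.

```python
# Hypothetical illustration (not the paper's model): one GD step on
# in-context least squares, computed two equivalent ways.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_step_gd_prediction(xs, ys, x_q, w0, lr):
    """Predict y_q with w1 = w0 - lr * grad L(w0), L(w) = 1/(2n) sum (w.x_i - y_i)^2."""
    n = len(xs)
    grad = [0.0] * len(w0)
    for x, y in zip(xs, ys):
        residual = dot(w0, x) - y
        for k in range(len(w0)):
            grad[k] += residual * x[k] / n
    w1 = [w - lr * g for w, g in zip(w0, grad)]
    return dot(w1, x_q)


def attention_form_prediction(xs, ys, x_q, w0, lr):
    """Same value: initial guess w0.x_q plus an attention-like sum without softmax,
    with keys x_i, query x_q, and values (y_i - w0.x_i)."""
    n = len(xs)
    initial_guess = dot(w0, x_q)
    correction = sum((y - dot(w0, x)) * dot(x, x_q) for x, y in zip(xs, ys))
    return initial_guess + lr / n * correction
```

With w0 = 0 the initial guess vanishes and the prediction reduces to (lr/n) Σ y_i x_i·x_q, the zero-initialization case the abstract describes as the earlier, simplified setting.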

AIIM Journal 2025 Journal Article

Unsupervised learning from EEG data for epilepsy: A systematic literature review

Alexandra-Maria Tautan
Alexandra-Georgiana Andrei
Carmelo Luca Smeralda
Giampaolo Vatti
Simone Rossi
Bogdan Ionescu

Background and objectives: Epilepsy is a neurological disorder characterized by recurrent epileptic seizures, whose neurophysiological signature is altered electroencephalographic (EEG) activity. The use of artificial intelligence (AI) methods on EEG data can positively impact the management of the disease, significantly improving diagnostic and prognostic accuracy as well as treatment outcomes. Our work aims to systematically review the available literature on the use of unsupervised machine learning methods on EEG data in epilepsy, focusing on methodological and clinical differences in terms of algorithms used and clinical applications. Methods: Following the PRISMA guidelines, a systematic literature search was performed in several databases for papers published in the last 10 years. Studies employing both unsupervised and self-supervised methods for the classification of EEG data in epilepsy patients were included. The main outcomes of the study were: (i) to provide an overview of the datasets used as input to train the algorithms; (ii) to identify trends in pre-processing, algorithm architectures, validation, and metrics for performance estimation; (iii) to identify and review the clinical applications of AI in epilepsy patients. Results: A total of 108 studies met the inclusion criteria. Of them, 86 (79.6%) have been published in the last 5 years and 60 (55.5%) in the last two years. The most used validation methods were hold-out in 37 (34.2%), k-fold cross-validation in 35 (32.4%), and leave-one-out in 19 (17.6%) studies. Accuracy, sensitivity, and specificity were the most used performance metrics, reported in 71 (65.7%), 62 (57.4%), and 42 (39.8%) studies, respectively, followed by F1-score (27 studies; 25%), precision (26 studies; 24%), area under the curve (25 studies; 23.1%), and false positive rate (22 studies; 20.3%). Furthermore, 42 (38.9%) and 63 (58.3%) studies used individual-patient and multiple-patient models, respectively. Finally, concerning the clinical applications of unsupervised learning methods in epilepsy patients, we identified six main fields of interest: seizure detection (69 studies; 63.9%), seizure prediction (27 studies; 25%), signal propagation and characterization (2 studies; 1.8%), seizure localization (4 studies; 3.7%), and seizure classification (22 studies; 20.3%). Conclusion: The results of this review suggest that interest in the use of unsupervised learning methods in epilepsy has significantly increased in recent years. From a methodological perspective, the input EEG datasets used for training and testing the algorithms remain the hardest challenge. From a clinical standpoint, the vast majority of studies addressed seizure detection, prediction, and classification, whereas studies focusing on seizure characterization and localization are lacking. Future work that can potentially improve the performance of these algorithms includes the use of context information via reinforcement learning and a focus on model explainability.

EWRL Workshop 2024 Workshop Paper

Unbiased Policy Gradient with Random Horizon

Rui Yuan
Andrii Tretynko
Simone Rossi
Thomas Hannagan

Policy gradient (PG) methods are widely used in reinforcement learning. However, in infinite-horizon discounted reward settings, practical implementations of PG must usually rely on biased gradient estimators because of truncated finite-horizon sampling, which limits actual performance and hinders theoretical analysis. In this work, we introduce a new family of algorithms, unbiased policy gradient (UPG), that enables unbiased gradient estimation by considering finite-horizon undiscounted rewards, where the horizon is randomly sampled from a geometric distribution $\mathrm{Geom}(1-\gamma)$ associated with the discount factor $\gamma$. Thanks to the absence of bias, UPG achieves an $\mathcal{O}(\epsilon^{-4})$ sample complexity to a stationary point, improving on that of vanilla PG by a factor of $\mathcal{O}(\log\epsilon^{-1})$, and under fewer assumptions. Our work also provides a new angle on well-known algorithms such as Q-PGT and RPG. We recover the unbiased Q-PGT algorithm as a special case of UPG, allowing for its first sample complexity analysis. We further show that UPG can be extended to $\alpha$-UPG, a more generic class of PG algorithms that uses unbiased gradient estimators and notably admits RPG as a special case. The general sample complexity analysis of $\alpha$-UPG that we present recovers the convergence rates of RPG, with tighter bounds. Finally, we propose and evaluate two new algorithms within the UPG family: unbiased GPOMDP (UGPOMDP) and $\alpha$-UGPOMDP. We show theoretically and empirically on four different environments that both UGPOMDP and $\alpha$-UGPOMDP outperform their vanilla PG counterpart, GPOMDP.
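
The identity behind the random horizon, as we read the abstract: if H ~ Geom(1 − γ) on {1, 2, ...}, then P(H ≥ t + 1) = γ^t, so the undiscounted sum r_0 + ... + r_(H−1) has expectation Σ_t γ^t r_t, which is exactly the discounted return. The minimal sketch below checks this on a fixed reward sequence only. It involves no policy or gradient, so it is not the UPG estimator itself, and the function names are hypothetical.

```python
# Sketch (our own illustration): a geometric random horizon makes an
# undiscounted finite sum an unbiased estimate of the discounted return.
import math
import random


def sample_geometric_horizon(gamma, rng):
    """Draw H ~ Geom(1 - gamma) on {1, 2, ...} by inversion: P(H >= t + 1) = gamma**t."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    return 1 + int(math.log(u) / math.log(gamma))


def discounted_return(rewards, gamma):
    return sum(gamma**t * r for t, r in enumerate(rewards))


def random_horizon_estimate(rewards, gamma, n_samples, rng):
    """Average of undiscounted sums r_0 + ... + r_{H-1} over sampled horizons H."""
    prefix = [0.0]
    for r in rewards:
        prefix.append(prefix[-1] + r)
    total = 0.0
    for _ in range(n_samples):
        # Truncating at len(rewards) is negligible when the reward list is long.
        h = min(sample_geometric_horizon(gamma, rng), len(rewards))
        total += prefix[h]
    return total / n_samples
```

With gamma = 0.9 the sampled horizons have mean 1/(1 − gamma) = 10, so a reward list of a few thousand steps makes the truncation irrelevant in practice, and the Monte Carlo average approaches `discounted_return` as `n_samples` grows.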

NeurIPS Conference 2023 Conference Paper

Continuous-Time Functional Diffusion Processes

Giulio Franzese
Giulio Corallo
Simone Rossi
Markus Heinonen
Maurizio Filippone
Pietro Michiardi

We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov's theorem, needed to compute an ELBO, and of the sampling theorem, needed to guarantee that functional evaluations at a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces that do not require specialized network architectures and can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.

NeurIPS Conference 2023 Conference Paper

On permutation symmetries in Bayesian neural network posteriors: a variational perspective

Simone Rossi
Ankit Singh
Thomas Hannagan

The elusive nature of gradient-based optimization in neural networks is tied to their loss landscape geometry, which is poorly understood. However, recent work has brought solid evidence that there is essentially no loss barrier between the local solutions of gradient descent, once accounting for weight permutations that leave the network's computation unchanged. This raises questions for approximate inference in Bayesian neural networks (BNNs), where we are interested in marginalizing over multiple points in the loss landscape. In this work, we first extend the formalism of marginalized loss barrier and solution interpolation to BNNs, before proposing a matching algorithm to search for linearly connected solutions. This is achieved by aligning the distributions of two independent approximate Bayesian solutions with respect to permutation matrices. Building on the work of Ainsworth et al. (2023), we frame the problem as a combinatorial optimization one, using an approximation to the sum of bilinear assignment problem. We then experiment on a variety of architectures and datasets, finding nearly zero marginalized loss barriers for linearly connected solutions.
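
The weight-permutation symmetry mentioned above can be seen in a one-hidden-layer ReLU network: reordering hidden units, together with their incoming weights, biases, and outgoing weights, leaves the function unchanged. The deterministic toy below is not the paper's variational, distribution-level matching. It permutes a network, checks that the outputs agree, and recovers the permutation by maximizing the summed inner products of matched units' parameters. For a single hidden layer this is a linear assignment problem, solved here by brute force, which only works for tiny widths. All names are hypothetical.

```python
# Hypothetical sketch: hidden-unit permutation symmetry of a one-hidden-layer
# ReLU MLP, and brute-force weight matching that undoes a permutation.
import itertools
import random


def mlp(x, w1, b1, w2):
    """f(x) = W2 relu(W1 x + b1); w1 holds hidden-unit rows, w2 holds output rows."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(v * h for v, h in zip(out_row, hidden)) for out_row in w2]


def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden units so that new unit j is old unit perm[j]."""
    return ([w1[p] for p in perm], [b1[p] for p in perm],
            [[row[p] for p in perm] for row in w2])


def unit_similarity(net_a, net_b, i, j):
    """Inner product of all parameters attached to unit i of A and unit j of B."""
    (w1a, b1a, w2a), (w1b, b1b, w2b) = net_a, net_b
    s = sum(p * q for p, q in zip(w1a[i], w1b[j])) + b1a[i] * b1b[j]
    return s + sum(row_a[i] * row_b[j] for row_a, row_b in zip(w2a, w2b))


def match_hidden_units(net_a, net_b):
    """Brute-force linear assignment: perm[i] is the unit of B matched to unit i of A."""
    h = len(net_a[0])
    sim = [[unit_similarity(net_a, net_b, i, j) for j in range(h)] for i in range(h)]
    return max(itertools.permutations(range(h)),
               key=lambda perm: sum(sim[i][perm[i]] for i in range(h)))
```

Applying `permute_hidden(*net_b, perm)` with the matched `perm` maps B back onto A. A practical version would replace the brute-force search with a proper assignment solver.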

JMLR Journal 2022 Journal Article

All You Need is a Good Functional Prior for Bayesian Deep Learning

Ba-Hien Tran
Simone Rossi
Dimitrios Milios
Maurizio Filippone

The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles this limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to “tune” the priors of neural network parameters in a way that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance. We provide vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step in the direction of making the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility.
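
The toy below keeps only the skeleton of "tune the parameter prior so that the induced functional prior matches a GP by minimizing a Wasserstein distance", under strong simplifying assumptions of our own. It uses a single input x, and a one-hidden-layer tanh network whose weights and biases share one prior scale sigma. The target is a GP with unit signal variance, so f(x) ~ N(0, 1) at a single input. Distances are closed-form 1D Wasserstein-1 values between sorted samples, and a grid search stands in for gradient-based optimization over many inputs. It is not the paper's procedure, and all names are hypothetical.

```python
# Toy sketch (our assumptions): match a BNN's induced prior at one input to a
# unit-variance GP marginal, using 1D Wasserstein-1 and a grid over sigma.
import math
import random


def wasserstein_1d(a, b):
    """W1 between two equal-size empirical samples: mean gap between sorted values."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def bnn_prior_samples(sigma, x, n_samples, width, rng):
    """Samples of f(x) = width**-0.5 * sum_j v_j * tanh(w_j * x + b_j),
    with every weight and bias drawn i.i.d. from N(0, sigma**2)."""
    samples = []
    for _ in range(n_samples):
        total = 0.0
        for _ in range(width):
            w, b, v = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
            total += v * math.tanh(w * x + b)
        samples.append(total / math.sqrt(width))
    return samples


def tune_prior_scale(sigmas, x, n_samples, width, rng):
    """Return (best_sigma, distances), minimizing W1 to N(0, 1) samples at input x."""
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    distances = {s: wasserstein_1d(bnn_prior_samples(s, x, n_samples, width, rng), target)
                 for s in sigmas}
    return min(distances, key=distances.get), distances
```

On a grid such as [0.25, 0.5, 1.0, 1.5, 2.0, 4.0], the selected scale lands away from both ends. Very small scales shrink the output variance toward zero, and very large ones overshoot the unit target variance.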

NeurIPS Conference 2021 Conference Paper

Model Selection for Bayesian Autoencoders

Ba-Hien Tran
Simone Rossi
Dimitrios Milios
Pietro Michiardi
Edwin V. Bonilla
Maurizio Filippone

We develop a novel method for carrying out model selection for Bayesian autoencoders (BAEs) by means of prior hyper-parameter optimization. Inspired by the common practice of type-II maximum likelihood optimization and its equivalence to Kullback-Leibler divergence minimization, we propose to optimize the distributional sliced-Wasserstein distance (DSWD) between the output of the autoencoder and the empirical data distribution. The advantages of this formulation are that we can estimate the DSWD based on samples and handle high-dimensional problems. We carry out posterior estimation of the BAE parameters via stochastic gradient Hamiltonian Monte Carlo and turn our BAE into a generative model by fitting a flexible Dirichlet mixture model in the latent space. Thanks to this approach, we obtain a powerful alternative to variational autoencoders, which are the preferred choice in modern applications of autoencoders for representation learning with uncertainty. We evaluate our approach qualitatively and quantitatively using a vast experimental campaign on a number of unsupervised learning tasks and show that, in small-data regimes where priors matter, our approach provides state-of-the-art results, outperforming multiple competitive baselines.
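
The DSWD named above builds on the sliced Wasserstein distance. It replaces a high-dimensional optimal-transport problem with an average of closed-form 1D distances along projection directions. The minimal sketch below is plain sliced Wasserstein-2 with uniformly random directions. The "distributional" variant, which also optimizes the distribution over directions, is not implemented, and the function names are hypothetical.

```python
# Minimal Monte Carlo sliced Wasserstein-2 between two equal-size point clouds
# (plain SWD with uniform random directions, not the distributional variant).
import math
import random


def random_direction(dim, rng):
    """Uniform direction on the unit sphere, via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein2(xs, ys, n_projections, rng):
    """Square root of the average, over random directions theta, of the squared
    1D W2 distance between the projections <theta, x_i> and <theta, y_i>."""
    if len(xs) != len(ys):
        raise ValueError("point clouds must have equal size")
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        # In 1D, optimal transport between equal-size samples pairs sorted values.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)
```

Because each slice only needs a sort, the cost grows as O(n log n) per projection regardless of the data dimension. This is what makes sample-based estimation practical for high-dimensional outputs.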

NeurIPS Conference 2020 Conference Paper

Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Simone Rossi
Sebastien Marmin
Maurizio Filippone

Over-parameterized models, such as DeepNets and ConvNets, form a class of models that are routinely adopted in a wide variety of applications, and for which Bayesian inference is desirable but extremely challenging. Variational inference offers the tools to tackle this challenge in a scalable way and with some degree of flexibility on the approximation, but for over-parameterized models this is challenging due to the over-regularization property of the variational objective. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference (WHVI), which uses Walsh-Hadamard-based factorization strategies to reduce the parameterization and accelerate computations, thus avoiding over-regularization issues with the variational objective. Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques to carry out approximate inference for over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference for Deep Learning.
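
As we understand the Walsh-Hadamard factorization, a D×D weight matrix is written as W = S1 H diag(g) H S2. Here S1 and S2 are diagonal, H is the Walsh-Hadamard matrix, and the Gaussian variational posterior is placed on the vector g only. The variational parameters then grow as O(D) rather than O(D²), and W x costs O(D log D) through the fast Walsh-Hadamard transform. Treat this exact form as an assumption to check against the paper. The sketch shows only the factorization and the fast product, not the variational objective, and its names are hypothetical.

```python
# Sketch (assumed form W = S1 H diag(g) H S2): fast matrix-vector product via
# the Walsh-Hadamard transform, checked against the explicit dense matrix.
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform: returns H v; len(v) must be a power of 2."""
    v = list(v)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def sample_g(mu, log_sigma, rng):
    """Reparameterized draw g = mu + sigma * eps from a diagonal Gaussian q(g)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def whvi_matvec(s1, s2, g, x):
    """W x = S1 H diag(g) H S2 x using two fast transforms, O(D log D) time."""
    z = fwht([a * b for a, b in zip(s2, x)])
    z = fwht([a * b for a, b in zip(g, z)])
    return [a * b for a, b in zip(s1, z)]


def dense_hadamard(n):
    """Sylvester Walsh-Hadamard matrix with entries (-1)**popcount(i & j)."""
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]


def whvi_dense(s1, s2, g):
    """Explicit D x D matrix S1 H diag(g) H S2, built only for checking (O(D^3))."""
    n = len(g)
    h = dense_hadamard(n)
    return [[s1[i] * sum(h[i][k] * g[k] * h[k][j] for k in range(n)) * s2[j]
             for j in range(n)] for i in range(n)]
```

Only `s1`, `s2`, and the mean and scale of `g` need to be stored, 4D numbers in total under this parameterization, while the dense matrix is never formed during a forward pass.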

YNIMG Journal 2012 Journal Article

Brains “in concert”: Frontal oscillatory alpha rhythms and empathy in professional musicians

Claudio Babiloni
Paola Buffo
Fabrizio Vecchio
Nicola Marzano
Claudio Del Percio
Danilo Spada
Simone Rossi
Ivo Bruni

Playing music in ensemble represents a unique human condition/performance where musicians should rely on empathic relationships. Recent theories attribute to frontal Brodmann areas (BAs) 44/45 and 10/11 a neural basis for “emotional” and “cognitive” empathy. We hypothesized that activity of these structures reflects the empathy trait in professional musicians playing in ensemble. Simultaneous electroencephalographic (EEG) alpha rhythms (8–12 Hz) were recorded in three saxophone quartets during music performance in ensemble (EXECUTION), video observation of their own performance (OBSERVATION), a control task (CONTROL), and resting state (RESTING). EEG source estimation was performed. Results showed that the higher the empathy quotient test score, the higher the alpha desynchronization in right BA 44/45 during the OBSERVATION condition referenced to the RESTING condition. Empathy trait score and alpha desynchronization were not correlated in other control areas or in the EXECUTION/CONTROL conditions. These results suggest that alpha rhythms in BA 44/45 reflect “emotional” empathy in musicians observing their own performance.

YNIMG Journal 2010 Journal Article

Event-related rTMS at encoding affects differently deep and shallow memory traces

Iglis Innocenti
Fabio Giovannelli
Massimo Cincotta
Matteo Feurra
Nicola R. Polizzotto
Giovanni Bianco
Stefano F. Cappa
Simone Rossi

The “level of processing” effect is a classical finding of the experimental psychology of memory. In fact, the depth of information processing at encoding predicts the accuracy of subsequent episodic memory performance. When incoming stimuli are analyzed in terms of their meaning (semantic, or deep, encoding), memory performance is superior to the case in which the same stimuli are analyzed in terms of their perceptual features (shallow encoding). As suggested by previous neuroimaging studies and by some preliminary findings with transcranial magnetic stimulation (TMS), the left prefrontal cortex may play a role in semantic processing requiring the allocation of working memory resources. However, it remains unclear whether deep and shallow encoding share the same cortical networks, and how these networks contribute to the “level of processing” effect. To investigate the brain areas causally involved in this phenomenon, we applied event-related repetitive TMS (rTMS) during deep (semantic) and shallow (perceptual) encoding of words. Retrieval was subsequently tested without rTMS interference. rTMS applied to the left dorsolateral prefrontal cortex (DLPFC) abolished the beneficial effect of deep encoding on memory performance, both in terms of accuracy (decrease) and reaction times (increase). Neither accuracy nor reaction times were affected by rTMS to the right DLPFC or to an additional control site excluded from the memory process (vertex). The fact that online measures of semantic processing at encoding were unaffected suggests that the detrimental effect on memory performance for semantically encoded items took place in the subsequent consolidation phase. These results highlight the specific causal role of the left DLPFC within the wide left-lateralized cortical network engaged by long-term memory, suggesting that it probably represents a crucial node responsible for the improved memory performance induced by semantic processing.

YNIMG Journal 2004 Journal Article

Human cortical EEG rhythms during long-term episodic memory task. A high-resolution EEG study of the HERA model

Claudio Babiloni
Fabio Babiloni
Filippo Carducci
Stefano Cappa
Febo Cincotti
Claudio Del Percio
Carlo Miniussi
Davide Vito Moretti

Many recent neuroimaging studies of episodic memory have indicated an asymmetry in prefrontal involvement, with the left prefrontal cortex more involved than the right in encoding, and the right more than the left in retrieval (hemispheric encoding and retrieval asymmetry, or HERA model). In this high-resolution electroencephalographic (EEG) study, we studied brain rhythmicity during a visual episodic memory (recognition) task. Theta (4–6 Hz), alpha (6–12 Hz), and gamma (28–48 Hz) oscillations were investigated during a visuospatial long-term episodic memory task including encoding (ENC) and retrieval (RET) phases. During the ENC phase, 25 figures representing interiors of buildings (“indoor”) were randomly intermingled with 25 figures representing landscapes (“landscapes”). Subjects responded with the left (“indoor”) or right (“landscapes”) mouse button. During the RET phase (1 h later), 25 previously presented “indoor” pictures (“tests”) were randomly intermingled with 25 novel “indoor” pictures (“distractors”). Again, a mouse response was required. Theta and alpha EEG results showed no change in frontal rhythmicity. In contrast, the asymmetry predicted by HERA was matched only by EEG gamma responses, and only in the posterior parietal areas. The ENC phase was associated with gamma EEG oscillations over the left parietal cortex. Afterward, the RET phase was associated with gamma EEG oscillations predominantly over the right parietal cortex. The predicted HERA asymmetry was thus observed in an unexpected location. This discrepancy may be due to the differential sensitivity of neuroimaging methods to selected components of cognitive processing. The strict relation between gamma response and perception suggests that retrieval processes of long-term memory deeply impinged upon the sensory representation of the stored material.

YNIMG Journal 1998 Journal Article

Modulation of Corticospinal Output to Human Hand Muscles Following Deprivation of Sensory Feedback

Simone Rossi
Patrizio Pasqualetti
Franca Tecchio
Alessandro Sabato
Paolo Maria Rossini

Excitability and conductivity of corticospinal tracts of 10 volunteers were investigated by motor-evoked potentials (MEPs) to transcranial magnetic brain stimulation, before and after anesthetic block of the right median (sensory + motor) and radial (sensory) nerve fibers at the wrist. MEPs were simultaneously recorded from two ulnar-supplied muscles during full relaxation and voluntary contraction. These muscles maintained intact strength following anesthesia, but they were in remarkably different conditions with respect to the surrounding skin: the first dorsal interosseous muscle (FDI) was totally “enveloped” within the anesthetized area but still provided normal proprioceptive feedback; the abductor digiti minimi (ADM) preserved both cutaneous and proprioceptive information. Spinal and peripheral nerve excitability were monitored as well. The sensory deprivation induced short-term changes which selectively took place within the hemisphere connected to the anesthetized hand. The physiological latency “anticipation” of MEPs recorded during active contraction versus relaxation was reduced (P < 0.001) in the FDI, but not in the ADM, when values during anesthesia were compared with preanesthesia values. The FDI cortical representation, as analyzed by a mapping procedure of the motor cortex via focal stimuli at several scalp positions, was significantly (P < 0.002) reduced, while the ADM representation remained either unchanged or enlarged. MEP and F-wave variability significantly decreased in the FDI but not in the ADM. F-waves were also affected due to changes in motoneuronal excitability at the spinal level. Peripheral nerve and root stimulation showed no modifications. Results are discussed in view of the short-term modifications of corticospinal pathway somatotopy produced by the selective reduction of sensory flow. Implications of sensory feedback for motor control are also discussed.
