Arrow Research

Author name cluster

Shanshan Wu

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

10 papers
2 author rows

Possible papers (10)

YNICL Journal 2025 Journal Article

Can repetitive transcranial magnetic stimulation promote recovery of consciousness in patients with disorders of consciousness? A randomized controlled trial

  • Zhenyu Liu
  • Shanshan Wu
  • Shuwei Wang
  • Huijuan Wu
  • Hongliang Gao
  • Xiao Lu

BACKGROUND: Disorders of consciousness (DoC) are characterized by a broad decline in background excitatory synaptic activity and varying levels of cerebral network disruption. Repetitive transcranial magnetic stimulation (rTMS), a neuromodulatory technique, is anticipated to assist the recovery of consciousness. Nonetheless, its effectiveness remains debated in light of the inconsistent results of recent research. OBJECTIVE: The purpose of this study is to investigate the efficacy of rTMS in promoting recovery of consciousness in patients with DoC and to probe its impact on the activity of cerebral functional networks. METHODS: Forty-eight patients with DoC were included in this randomized controlled trial (Chinese Clinical Trial Registry: ChiCTR2100044930). Twenty-four patients in the control group received conventional therapy. The other 24 patients, in the rTMS group, additionally received rTMS over the dorsolateral prefrontal cortex (DLPFC) once per workday during a 4-week intervention phase. The primary outcome was the proportion of patients showing improvement in level of consciousness (LOC), based on the Coma Recovery Scale-Revised (CRS-R), at the end of the intervention. Other behavioral scales, such as the Clinical Global Impression-Improvement (CGI-I), and resting-state electroencephalography (rs-EEG) microstates were employed as secondary outcomes. Each microstate served as a probe of the activity of its corresponding resting-state cerebral functional network. RESULTS: = 0.027). CONCLUSION: High-frequency rTMS over the DLPFC could promote recovery of consciousness in patients with DoC. It may act by modulating the balance among cerebral functional networks, thereby facilitating consciousness recovery.

ICML Conference 2025 Conference Paper

Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs

  • Bowen Tan
  • Zheng Xu
  • Eric P. Xing
  • Zhiting Hu
  • Shanshan Wu

Synthetic data offers a promising path to train models while preserving data privacy. Differentially private (DP) finetuning of large language models (LLMs) as a data generator is effective, but is impractical when computational resources are limited. Meanwhile, prompt-based methods such as private evolution depend heavily on manually engineered prompts, and use private information ineffectively in their iterative data selection process. To overcome these limitations, we propose CTCL (Data Synthesis with ConTrollability and CLustering), a novel framework for generating privacy-preserving synthetic data without extensive prompt engineering or billion-scale LLM finetuning. CTCL pretrains a lightweight 140M-parameter conditional generator and a clustering-based topic model on large-scale public data. To further adapt to the private domain, the generator is DP-finetuned on private data to capture fine-grained textual information, while the topic model extracts a DP histogram representing distributional information. The DP generator then samples according to the DP histogram to synthesize a desired number of data examples. Evaluation across five diverse domains demonstrates the effectiveness of our framework, particularly in the strong privacy regime. Systematic ablation validates the design of each framework component and highlights the scalability of our approach.
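
A minimal Python sketch of the histogram-then-sample stage the abstract describes, under stated assumptions: the Laplace-mechanism details are one standard way to release a DP histogram (not necessarily the paper's exact mechanism), and `generator` with its `generate` method is a hypothetical stand-in for the DP-finetuned conditional generator.

```python
import numpy as np

def dp_topic_histogram(private_topic_ids, num_topics, epsilon, rng):
    """Release a topic histogram with epsilon-DP via the Laplace mechanism."""
    counts = np.bincount(private_topic_ids, minlength=num_topics).astype(float)
    # Each private record falls in exactly one bin, so the L1 sensitivity is 1.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=num_topics)
    noisy = np.clip(noisy, 0.0, None)
    return noisy / noisy.sum()

def synthesize(generator, topic_probs, n_samples, rng):
    """Sample topics from the DP histogram and condition the generator on them."""
    topics = rng.choice(len(topic_probs), size=n_samples, p=topic_probs)
    return [generator.generate(topic_id=t) for t in topics]  # hypothetical API

rng = np.random.default_rng(0)
# Topic ids would come from running the public topic model on private data.
probs = dp_topic_histogram(np.array([0, 2, 2, 1, 2]), num_topics=4, epsilon=1.0, rng=rng)
```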

NeurIPS Conference 2025 Conference Paper

Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation

  • Qing Yu
  • Xiaobei Wang
  • Shuchang Liu
  • Xiaoyu Yang
  • Xueliang Wang
  • Chang Meng
  • Shanshan Wu
  • Bin Wen

Recommender systems filter contents/items valuable to users by inferring preferences from user features and historical behaviors. Mainstream approaches follow the learning-to-rank paradigm, which focuses on discovering and modeling item topics (e.g., categories) and capturing user preferences on these topics based on historical interactions. However, this paradigm often neglects the modeling of user characteristics and their social roles, which are logical confounders influencing correlated interests and user preference transitions. To bridge this gap, we introduce the user role identification task and the behavioral logic modeling task, which aim to explicitly model user roles and learn the logical relations between item topics and user social roles. We show that it is possible to explicitly solve these tasks through an efficient integration framework combining Large Language Models (LLMs) and recommendation systems, for which we propose TagCF. On the one hand, TagCF exploits the (multi-modal) LLM's world knowledge and logic inference ability to extract realistic tag-based virtual logic graphs that reveal dynamic and expressive knowledge of users, refining our understanding of user behaviors. On the other hand, TagCF presents empirically effective integration modules that take advantage of the extracted tag-logic information, augmenting the recommendation performance. We conduct both online and offline experiments with industrial and public datasets to verify TagCF's effectiveness, and we empirically show that the user role modeling strategy is potentially a better choice than modeling item topics. Additionally, we provide evidence that the extracted logic graphs are empirically general and transferable knowledge that can benefit a wide range of recommendation tasks. Our code is available at https://github.com/Code2Q/TagCF.
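
As a hedged illustration only (the graph shape, scoring rule, and blend weight `alpha` are assumptions, not TagCF's published modules), the sketch below shows one simple way an extracted role-to-topic logic graph could augment a base recommender's scores:

```python
def augmented_scores(base_scores, user_roles, item_topics, logic_graph, alpha=0.3):
    """base_scores: {item_id: score}; logic_graph: {(role, topic): affinity}."""
    out = {}
    for item_id, score in base_scores.items():
        # Accumulate role-topic affinities from the extracted logic graph.
        affinity = sum(
            logic_graph.get((role, topic), 0.0)
            for role in user_roles
            for topic in item_topics.get(item_id, [])
        )
        out[item_id] = score + alpha * affinity
    return out

ranked = sorted(
    augmented_scores(
        {"i1": 0.8, "i2": 0.7},
        user_roles=["new_parent"],
        item_topics={"i1": ["toys"], "i2": ["gaming"]},
        logic_graph={("new_parent", "toys"): 1.0},
    ).items(),
    key=lambda kv: -kv[1],
)
```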

NeurIPS Conference 2021 Conference Paper

Federated Reconstruction: Partially Local Federated Learning

  • Karan Singhal
  • Hakim Sidahmed
  • Zachary Garrett
  • Shanshan Wu
  • John Rush
  • Sushant Prakash

Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.
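
A minimal sketch of the round structure for the federated collaborative-filtering case the abstract mentions, assuming a matrix-factorization model where item embeddings are global and each client's user embedding is local and reconstructed from scratch every round; the plain SGD updates, learning rate, and step count are illustrative choices.

```python
import numpy as np

def client_round(items_global, ratings, item_ids, k, lr=0.1, steps=20, rng=None):
    """Reconstruct the local user embedding, then compute a global update."""
    u = (rng or np.random.default_rng()).normal(scale=0.1, size=k)
    V = items_global[item_ids]                 # item rows this client interacted with
    for _ in range(steps):                     # reconstruction: V is frozen here
        u -= lr * (V.T @ (V @ u - ratings)) / len(ratings)
    grad_V = np.outer(V @ u - ratings, u) / len(ratings)
    return item_ids, -lr * grad_V              # only the global delta leaves the device

def server_round(items_global, client_updates):
    """Apply the averaged global deltas; local embeddings never reach the server."""
    for item_ids, delta in client_updates:
        items_global[item_ids] += delta / len(client_updates)
    return items_global
```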

NeurIPS Conference 2020 Conference Paper

Implicit Regularization and Convergence for Weight Normalization

  • Xiaoxia Wu
  • Edgar Dobriban
  • Tongzheng Ren
  • Shanshan Wu
  • Zhiyuan Li
  • Suriya Gunasekar
  • Rachel Ward
  • Qiang Liu

Normalization methods such as batch, weight, instance, and layer normalization are commonly used in modern machine learning. Here, we study the weight normalization (WN) method (Salimans & Kingma, 2016) and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and converge linearly close to the minimum $\ell_2$ norm solution, even for initializations far from zero. For certain two-phase variants, they can converge to the min norm solution. This is different from the behavior of gradient descent, which only converges to the min norm solution when started at zero and is thus more sensitive to initialization.
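
A minimal sketch of the reparametrization on overparametrized least squares, assuming plain gradient descent with an illustrative learning rate and initialization: the weights are written as $w = g\,v/\|v\|_2$ and both factors are updated by the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                                  # overparametrized: d > n
X, y = rng.normal(size=(n, d)), rng.normal(size=n)

g, v = 1.0, rng.normal(size=d)                 # initialization far from zero
lr = 0.01
for _ in range(2000):
    w = g * v / np.linalg.norm(v)              # effective weights
    grad_w = X.T @ (X @ w - y) / n             # gradient wrt w
    # Chain rule through w = g * v / ||v||:
    g_grad = grad_w @ v / np.linalg.norm(v)
    P = np.eye(d) - np.outer(v, v) / (v @ v)   # projector orthogonal to v
    v_grad = g * (P @ grad_w) / np.linalg.norm(v)
    g, v = g - lr * g_grad, v - lr * v_grad
```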

ICML Conference 2019 Conference Paper

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

  • Shanshan Wu
  • Alexandros G. Dimakis
  • Sujay Sanghavi
  • Felix X. Yu
  • Daniel Niels Holtmann-Rice
  • Dmitry Storcheus
  • Afshin Rostamizadeh
  • Sanjiv Kumar

Linear encoding of sparse vectors is widely popular, but is commonly data-independent – missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used $\ell_1$ decoder. The convex $\ell_1$ decoder prevents the gradient propagation needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into $T$ projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, one of the state-of-the-art embedding-based approaches.
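
A minimal PyTorch sketch of the unrolling idea, under stated assumptions: dimensions, step size, and the pseudoinverse-based projection onto $\{x : Ax = y\}$ are illustrative simplifications, not the paper's exact training setup.

```python
import torch

d, m, T = 100, 20, 10
A = torch.randn(m, d, requires_grad=True)       # learnable measurement matrix

def unrolled_decoder(A, y, step=0.05):
    pinv = torch.linalg.pinv(A)
    x = pinv @ y                                # least-squares point on {x : Ax = y}
    for _ in range(T):
        x = x - step * torch.sign(x)            # subgradient step on ||x||_1
        x = x - pinv @ (A @ x - y)              # project back onto Ax = y
    return x

opt = torch.optim.Adam([A], lr=1e-3)
for _ in range(5):                              # a few illustrative training steps
    x_true = torch.zeros(d)
    x_true[torch.randperm(d)[:5]] = torch.randn(5)   # random 5-sparse signal
    x_hat = unrolled_decoder(A, A @ x_true)
    loss = torch.mean((x_hat - x_true) ** 2)    # gradients flow through the T steps
    opt.zero_grad(); loss.backward(); opt.step()
```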

NeurIPS Conference 2019 Conference Paper

Learning Distributions Generated by One-Layer ReLU Networks

  • Shanshan Wu
  • Alexandros Dimakis
  • Sujay Sanghavi

We consider the problem of estimating the parameters of a $d$-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error $\epsilon\|W\|_F$ using $\widetilde{O}(1/\epsilon^2)$ samples and $\widetilde{O}(d^2/\epsilon^2)$ time (log factors are ignored for simplicity). This implies that we can estimate the distribution up to $\epsilon$ in total variation distance using $\widetilde{O}(\kappa^2 d^2/\epsilon^2)$ samples, where $\kappa$ is the condition number of the covariance matrix. Our only assumption is that the bias vector is non-negative. Without this non-negativity assumption, we show that estimating the bias vector within any error requires a number of samples at least exponential in the infinity norm of the bias vector. Our algorithm is based on the key observation that vector norms and pairwise angles can be estimated separately. We use a recent result on learning from truncated samples. We also prove two sample complexity lower bounds: $\Omega(1/\epsilon^2)$ samples are required to estimate the parameters up to error $\epsilon$, while $\Omega(d/\epsilon^2)$ samples are necessary to estimate the distribution up to $\epsilon$ in total variation distance. The first lower bound implies that our algorithm is optimal for parameter estimation. Finally, we show an interesting connection between learning a two-layer generative model and non-negative matrix factorization. Experimental results are provided to support our analysis.
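
A simplified illustration of why separate estimation is possible (a consequence of the model, not the paper's full algorithm): since $x_i = \max(w_i^T z + b_i, 0)$ with $z$ standard Gaussian, $w_i^T z \sim N(0, \|w_i\|^2)$, so the zero rate of each coordinate satisfies $P(x_i = 0) = \Phi(-b_i/\|w_i\|)$ and pins down the ratio $b_i/\|w_i\|$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d, k, n = 5, 3, 200_000
W, b = rng.normal(size=(d, k)), np.abs(rng.normal(size=d))   # non-negative bias
x = np.maximum(rng.normal(size=(n, k)) @ W.T + b, 0.0)       # i.i.d. samples

zero_rate = (x == 0).mean(axis=0)
ratio_hat = -norm.ppf(zero_rate)                  # estimates b_i / ||w_i||
ratio_true = b / np.linalg.norm(W, axis=1)        # close to ratio_hat for large n
```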

NeurIPS Conference 2019 Conference Paper

Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models

  • Shanshan Wu
  • Sujay Sanghavi
  • Alexandros Dimakis

We characterize the effectiveness of a classical algorithm for recovering the Markov graph of a general discrete pairwise graphical model from i.i.d. samples. The algorithm is (appropriately regularized) maximum conditional log-likelihood, which involves solving a convex program for each node; for Ising models this is $\ell_1$-constrained logistic regression, while for more general alphabets an $\ell_{2,1}$ group-norm constraint needs to be used. We show that this algorithm can recover any arbitrary discrete pairwise graphical model, and also characterize its sample complexity as a function of model width, alphabet size, edge parameter accuracy, and the number of variables. We show that along every one of these axes, it matches or improves on all existing results and algorithms for this problem. Our analysis applies a sharp generalization error bound for logistic regression when the weight vector has an $\ell_1$ (or $\ell_{2,1}$) constraint and the sample vector has an $\ell_{\infty}$ (or $\ell_{2,\infty}$) constraint. We also show that the proposed convex programs can be efficiently solved in $\widetilde{O}(n^2)$ running time (where $n$ is the number of variables) under the same statistical guarantees. We provide experimental results to support our analysis.
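
A minimal sketch of the node-wise program for the Ising case, with one caveat: the paper analyzes the $\ell_1$-constrained formulation, while scikit-learn's penalized (Lagrangian) form is used below as a practical stand-in, with illustrative values of `C` and the support threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighbors(samples, node, C=0.5, tol=0.05):
    """Recover the neighborhood of `node`; samples: (n, p) array in {-1, +1}."""
    y = (samples[:, node] == 1).astype(int)          # predict node from the rest
    X = np.delete(samples, node, axis=1)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    others = [j for j in range(samples.shape[1]) if j != node]
    return [j for j, wj in zip(others, w) if abs(wj) > tol]  # thresholded support
```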

NeurIPS Conference 2016 Conference Paper

Leveraging Sparsity for Efficient Submodular Data Summarization

  • Erik Lindgren
  • Shanshan Wu
  • Alexandros Dimakis

The facility location problem is widely used for summarizing large datasets and has additional applications in sensor placement, image retrieval, and clustering. One difficulty of this problem is that submodular optimization algorithms require the calculation of pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed to only calculate nearest neighbor benefits. One limitation is that several strong assumptions were invoked to obtain provable approximation guarantees. In this paper we establish that these extra assumptions are not necessary—solving the sparsified problem will be almost optimal under the standard assumptions of the problem. We then analyze a different method of sparsification that is a better model for methods such as Locality Sensitive Hashing to accelerate the nearest neighbor computations and extend the use of the problem to a broader family of similarities. We validate our approach by demonstrating that it rapidly generates interpretable summaries.
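
A minimal sketch of greedy facility location on a sparsified benefit matrix, assuming non-neighbor benefits have already been zeroed out (e.g., by keeping only nearest-neighbor entries); the greedy loop itself is standard.

```python
import numpy as np

def greedy_facility_location(sim_sparse, budget):
    """sim_sparse[i, j]: benefit item i gets from center j (0 if not a neighbor)."""
    n = sim_sparse.shape[0]
    covered = np.zeros(n)            # best benefit each item gets from chosen centers
    chosen = []
    for _ in range(budget):
        # Marginal gain of each candidate: total improvement over current coverage.
        gains = np.maximum(sim_sparse - covered[:, None], 0.0).sum(axis=0)
        gains[chosen] = -np.inf
        j = int(np.argmax(gains))
        chosen.append(j)
        covered = np.maximum(covered, sim_sparse[:, j])
    return chosen
```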

NeurIPS Conference 2016 Conference Paper

Single Pass PCA of Matrix Products

  • Shanshan Wu
  • Srinadh Bhojanapalli
  • Sujay Sanghavi
  • Alexandros Dimakis

In this paper we present a new algorithm for computing a low-rank approximation of the product $A^T B$ in a single pass over the two matrices $A$ and $B$. The straightforward way to do this is to (a) first sketch $A$ and $B$ individually, and then (b) find the top components using PCA on the sketch. Our algorithm, in contrast, retains additional summary information about $A$ and $B$ (e.g., row and column norms) and uses this additional information to obtain an improved approximation from the sketches. Our main analytical result establishes a spectral-norm guarantee comparable to existing two-pass methods; in addition, we also provide results from an Apache Spark implementation that shows better computational and statistical performance on real-world and synthetic evaluation datasets.
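
A minimal sketch of the straightforward baseline the abstract contrasts with: sketch $A$ and $B$ with a shared random projection in one pass, then take the top components of the sketched product. The rescaling with retained row/column norms that distinguishes the paper's algorithm is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, k, r = 10_000, 50, 40, 400, 5
A, B = rng.normal(size=(n, d1)), rng.normal(size=(n, d2))

S = rng.normal(size=(k, n)) / np.sqrt(k)     # E[S^T S] = I, so (SA)^T (SB) ~ A^T B
SA, SB = S @ A, S @ B                        # computable in one pass over the rows
U, s, Vt = np.linalg.svd(SA.T @ SB)
approx = U[:, :r] * s[:r] @ Vt[:r]           # rank-r approximation of A^T B
```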