Arrow Research search

Author name cluster

Yan Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

YNIMG Journal 2025 Journal Article

Atlas-based analysis of diffusion imaging may predict efficacy of forelimb movement therapy for motor recovery in post-stroke rats

  • Xinxin Zhao
  • Jingjing Ruan
  • Bo Li
  • Jiahui Cheng
  • Jianrong Xu
  • Yulian Zhu
  • Ce Li
  • Yan Zhou

BACKGROUND: This investigation employed atlas-based analysis of diffusion imaging to elucidate the therapeutic effects of bilateral and unilateral forelimb movement therapy in a rat middle cerebral artery occlusion (MCAO) stroke model. METHODS: Fifty-six rats were randomized into seven groups: sham, model (moderate/severe), unilateral-treated (CIMT; moderate/severe), and bilateral-treated (moderate/severe). Daily forelimb training began on day 7 post-surgery and continued throughout the study. Brain magnetic resonance imaging (MRI) and the catwalk test were conducted on days 6, 14, and 28 post-MCAO. Atlas-based whole-brain fractional anisotropy (FA) and mean diffusivity (MD) values were used to evaluate white matter integrity, complemented by gait analysis to evaluate the efficacy of forelimb motor therapy on functional recovery. RESULTS: Bilateral training showed greater neuroprotection (47-53% MRI/TTC reduction, p < 0.001) than unilateral training (34-24%) or controls; only unilateral training reduced severe infarctions (20-32%, p < 0.001). Whole-brain DTI in MCAO rats showed injury-dependent white matter reorganization after movement therapy: bilateral therapy boosted thalamocortical integrity in moderate injuries (13-32% FA↑, 8-11% MD↓, p < 0.05), while the unilateral protocol (CIMT) optimized motor pathways in severe cases (6-11% FA↑, 7-9% MD↓, p < 0.05). Gait improvements aligned with these findings: bilateral training enhanced limb contact (+6-7%) and paw expansion (+20%, p < 0.05), whereas unilateral training accelerated recovery (step cycle↓20-40%, swing speed↑51-55%, p < 0.001). CONCLUSIONS: This study pioneers a diffusion atlas framework combining longitudinal DTI with voxel-wise fixel analysis to delineate forelimb training-driven white matter reorganization patterns. Bilateral training maximized structural restoration in moderate impairment, whereas unilateral protocols are advocated for severe cases to optimize functional recovery.
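The FA and MD values used above are standard scalar summaries of the diffusion tensor. As a minimal sketch (not the study's processing pipeline), both can be computed from a tensor's three eigenvalues, and an atlas-based analysis then averages such a map within each labelled region; the function names and the background label 0 are assumptions for illustration.

```python
import numpy as np


def md_fa(eigenvalues):
    """Return (MD, FA) for the three eigenvalues of one diffusion tensor."""
    lam = np.asarray(eigenvalues, dtype=float)
    md = lam.mean()  # mean diffusivity: average of the eigenvalues
    norm = np.sqrt(np.sum(lam ** 2))
    if norm == 0.0:
        return md, 0.0  # no diffusion signal: define FA as 0
    # Fractional anisotropy: sqrt(3/2) * ||lambda - MD|| / ||lambda||
    fa = np.sqrt(1.5) * np.sqrt(np.sum((lam - md) ** 2)) / norm
    return md, fa


def atlas_region_means(scalar_map, atlas, background=0):
    """Average a voxel-wise map (e.g. FA) inside each labelled atlas region."""
    scalar_map = np.asarray(scalar_map, dtype=float)
    atlas = np.asarray(atlas)
    return {
        int(label): float(scalar_map[atlas == label].mean())
        for label in np.unique(atlas)
        if label != background
    }


print(md_fa([1.0, 0.0, 0.0]))  # purely linear diffusion: FA is 1 (up to rounding)
print(md_fa([1.0, 1.0, 1.0]))  # isotropic diffusion: FA is 0
```

In a real pipeline the eigenvalues come from a tensor fit at every voxel and the atlas is registered to each subject; both steps are omitted here.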

NeurIPS Conference 2025 Conference Paper

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

  • Hanchen Su
  • Xuyuan Li
  • Yan Zhou
  • zhuoyi lu
  • Ziwei Chai
  • Haozheng Wang
  • Chen Zhang
  • Yang Yang

The increasing utilization of graph databases across various fields stems from their capacity to represent intricate interconnections. Nonetheless, exploiting the full capabilities of graph databases continues to be a significant hurdle, largely because of the inherent difficulty in translating natural language into Cypher. Recognizing the critical role of schema selection in database query generation and drawing inspiration from recent progress in reasoning-augmented approaches trained through reinforcement learning to enhance inference capabilities and generalization, we introduce Cypher-RI, a specialized framework for the Text-to-Cypher task. Distinct from conventional approaches, our methodology seamlessly integrates schema selection within the Cypher generation pipeline, conceptualizing it as a critical element in the reasoning process. The schema selection mechanism is guided by textual context, with its outcomes recursively shaping subsequent inference processes. Impressively, our 7B-parameter model, trained through this RL paradigm, demonstrates superior performance compared to baselines, exhibiting a 9.41% accuracy improvement over GPT-4o on CypherBench. These results underscore the effectiveness of our proposed reinforcement learning framework, which integrates schema selection to enhance both the accuracy and reasoning capabilities in Text-to-Cypher tasks.

ICLR Conference 2025 Conference Paper

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

  • Qingkai Fang
  • Shoutao Guo
  • Yan Zhou
  • Zhengrui Ma
  • Shaolei Zhang 0001
  • Yang Feng 0004

Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction. However, there has been little exploration of how to build speech interaction models based on open-source LLMs. To address this, we propose LLaMA-Omni, a novel model architecture designed for low-latency and high-quality speech interaction with LLMs. LLaMA-Omni integrates a pretrained speech encoder, a speech adaptor, an LLM, and a streaming speech decoder. It eliminates the need for speech transcription and can simultaneously generate text and speech responses directly from speech instructions with extremely low latency. We build our model on the latest Llama-3.1-8B-Instruct model. To align the model with speech interaction scenarios, we construct a dataset named InstructS2S-200K, which includes 200K speech instructions and corresponding speech responses. Experimental results show that, compared to previous speech-language models, LLaMA-Omni provides better responses in both content and style, with a response latency as low as 226 ms. Additionally, training LLaMA-Omni takes less than 3 days on just 4 GPUs, paving the way for the efficient development of speech-language models in the future.

ICLR Conference 2025 Conference Paper

N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning

  • Liping Wang 0015
  • Xujia Li
  • Jingshu Peng
  • Yue Wang 0012
  • Chen Zhang 0010
  • Yan Zhou
  • Lei Chen 0002

Temporal Graph Neural Networks (TGNNs) emphasize capturing node interactions over time but often overlook the evolution of node classes and the dynamic data distributions triggered by the continuous emergence of new class labels, known as the open-set problem. This problem makes it difficult for existing TGNNs to preserve learned classes while rapidly adapting to new, unseen classes. To address this, the paper identifies two primary factors affecting model performance on the open temporal graph, backed by a theoretical guarantee: (1) the forgetting of prior knowledge and (2) distribution discrepancies between successive tasks. Building on these insights, we propose N-ForGOT, which incorporates two plug-in modules into TGNNs to simultaneously preserve prior knowledge and enhance model generalizability for new classes. The first module preserves previously established inter-class connectivity and decision boundaries during the training of new classes to mitigate the forgetting caused by temporal evolution of class characteristics. The second module introduces an efficient method for measuring distribution discrepancies with designed temporal Weisfeiler-Lehman subtree patterns, effectively addressing both structural and temporal shifts while reducing time complexity. Experimental results on four public datasets demonstrate that our method significantly outperforms state-of-the-art approaches in prediction accuracy, prevention of forgetting, and generalizability.
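N-ForGOT's discrepancy measure builds on Weisfeiler-Lehman (WL) subtree patterns. For orientation only, the sketch below shows the classic static WL relabeling and compares two graphs by an L1 distance between their pattern histograms; the temporal extension described in the paper is not reproduced, and the L1 distance and function names are illustrative assumptions.

```python
from collections import Counter


def wl_histogram(adj, labels, iterations=2):
    """Count WL subtree patterns of a static graph over several refinement rounds.

    adj: dict node -> list of neighbours; labels: dict node -> initial label.
    Refined labels are kept as nested tuples rather than hashed to integers.
    """
    labels = dict(labels)
    features = Counter((0, lab) for lab in labels.values())
    for it in range(1, iterations + 1):
        # New label = (old label, sorted multiset of neighbours' old labels).
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in adj}
        features.update((it, lab) for lab in labels.values())
    return features


def wl_distance(h1, h2):
    """L1 distance between two WL pattern histograms (0 for identical counts)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
same = {v: "x" for v in "abc"}
print(wl_distance(wl_histogram(path, same), wl_histogram(triangle, same)))
```

Identically labelled graphs are indistinguishable at iteration 0, but one refinement round already separates the path from the triangle because their neighbourhood multisets differ.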

YNIMG Journal 2025 Journal Article

White matter hyperintensity-associated iron overload links glymphatic system dysfunction to cognitive impairment in cerebral small vessel disease

  • Yage Qiu
  • Ying Hu
  • Weina Ding
  • Qingyang Fu
  • Wentao Hu
  • Yuanzheng Wang
  • Qun Xu
  • Yongming Dai

Glymphatic system function has been increasingly linked to cognition in cerebral small vessel disease (CSVD), although the underlying pathological mechanisms related to brain metabolism remain to be fully clarified. Iron overload within white matter hyperintensities (WMH), potentially reflecting metabolic abnormalities, may play a pivotal role in this process. This study investigated whether WMH iron burden mediates the association between glymphatic dysfunction and cognitive impairment in CSVD. A total of 102 patients with CSVD and 29 matched healthy controls (HCs) underwent brain MRI and cognitive assessments. WMH iron burden was quantified using a sub-voxel quantitative approach, while glymphatic function was assessed with the Diffusion Tensor Image Analysis aLong the Perivascular Space (DTI-ALPS) index. Correlation and mediation analyses were then conducted to evaluate relationships among WMH iron burden, the DTI-ALPS index, and cognitive scores. Compared with HCs, CSVD patients exhibited significantly higher WMH iron burden, a lower DTI-ALPS index, and poorer cognitive performance. Elevated WMH iron burden was associated with deficits in the attention-executive (att-exe), memory, and visual-spatial domains, whereas a reduced DTI-ALPS index correlated with impaired att-exe and memory function. Importantly, WMH iron burden fully mediated the link between the DTI-ALPS index and both att-exe function (p < 0.001) and memory (p = 0.02) in the CSVD group. These findings noninvasively identify WMH iron overload, which probably reflects microglial activation, as a key mediator between glymphatic dysfunction and cognitive decline in CSVD, pointing to a potential therapeutic target for disease management.

ICLR Conference 2024 Conference Paper

Deep Reinforcement Learning for Modelling Protein Complexes

  • Ziqi Gao
  • Tao Feng
  • Jiaxuan You
  • Chenyi Zi
  • Yan Zhou
  • Chen Zhang
  • Jia Li 0009

Structure prediction of large protein complexes (a.k.a. protein multimer modelling, PMM) can be achieved through one-by-one assembly using provided dimer structures and predicted docking paths. However, existing PMM methods struggle with vast search spaces and generalization challenges: (1) The assembly of an N-chain multimer can be depicted using graph-structured data, with each chain represented as a node and assembly actions as edges. Thus the assembly graph can be any acyclic, undirected, connected graph (i.e., a tree), leading to a combinatorial optimization space of N^(N-2) for the PMM problem. (2) Knowledge transfer in the PMM task is non-trivial. Because data availability becomes increasingly limited as the number of chains grows, PMM models must generalize across multimers with different numbers of chains. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and an adversarial loss through policy gradient for automatic PMM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN simultaneously focuses on a specific batch of multimers and on the global assembly rules learned from multimers with varying chain numbers. Empirically, we achieve significant improvements in both accuracy (measured by RMSD and TM-Score) and efficiency compared to leading complex modeling software. GAPN outperforms the state-of-the-art method (MoLPC) with up to 27% improvement in TM-Score and a 600× speed-up.
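The N^(N-2) figure is Cayley's formula for the number of labeled trees on N nodes, i.e., the possible assembly graphs of an N-chain multimer. A brute-force check for small N, offered only to illustrate how quickly the assembly space grows (it is not part of GAPN):

```python
from itertools import combinations


def is_tree(n, edges):
    """True if the given n-1 edges contain no cycle, hence span all n nodes."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def count_assembly_trees(n):
    """Count labeled spanning trees of the complete graph on n chains by brute force."""
    all_edges = list(combinations(range(n), 2))
    return sum(is_tree(n, es) for es in combinations(all_edges, n - 1))


for n in range(2, 7):
    print(n, count_assembly_trees(n), n ** (n - 2))
```

Even at N = 6 the enumeration inspects C(15, 5) = 3003 candidate edge sets, which is why exhaustive search over assembly graphs is impractical for large complexes.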

YNIMG Journal 2024 Journal Article

Exploring cognitive related microstructural alterations in normal appearing white matter and deep grey matter for small vessel disease: A quantitative susceptibility mapping study

  • Yawen Sun
  • Wentao Hu
  • Ying Hu
  • Yage Qiu
  • Yuewei Chen
  • Qun Xu
  • Hongjiang Wei
  • Yongming Dai

Brain microstructural alterations may occur in the normal-appearing white matter (NAWM) and grey matter of small vessel disease (SVD) patients and may contribute to cognitive impairment. The aim of this study was to explore cognition-related microstructural alterations in white matter and deep grey matter nuclei in SVD patients using magnetic resonance (MR) quantitative susceptibility mapping (QSM). A total of 170 SVD patients, including 103 with vascular mild cognitive impairment (VaMCI) and 67 with no cognitive impairment (NCI), and 21 healthy control (HC) subjects were included; all underwent whole-brain QSM scanning. Using a white matter atlas and a deep grey matter atlas, subregion-based QSM analysis was conducted to identify and characterize microstructural alterations occurring within white matter and subcortical nuclei. Significantly different susceptibility values were found among the VaMCI, NCI and HC groups in NAWM and in several specific white matter tracts, including the anterior limb of the internal capsule, corticospinal tract, medial lemniscus, middle frontal blade, superior corona radiata and tapetum. However, no difference was found in white matter hyperintensities between VaMCI and NCI. A trend toward higher susceptibility was observed in the caudate nucleus and globus pallidus of VaMCI patients compared to HC, indicating elevated iron deposition in these areas. Interestingly, some of these QSM parameters were closely correlated with both global and specific cognitive function scores after controlling for age, gender and education level. Our study suggests that QSM may serve as a useful imaging tool for monitoring cognition-related microstructural alterations in the brain. This is especially meaningful for white matter, which has previously received little attention.

NeurIPS Conference 2024 Conference Paper

Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification

  • Yihong Luo
  • Yuhan Chen
  • Siya Qiu
  • Yiwei Wang
  • Chen Zhang
  • Yan Zhou
  • Xiaochun Cao
  • Jing Tang

Graph Neural Networks (GNNs) have shown superior performance in node classification. However, GNNs perform poorly in the Few-Shot Node Classification (FSNC) task, which requires robust generalization to make accurate predictions for unseen classes with limited labels. To tackle this challenge, we propose integrating Sharpness-Aware Minimization (SAM), a technique designed to enhance model generalization by finding a flat minimum of the loss landscape, into GNN training. The standard SAM approach, however, performs two forward-backward steps in each training iteration, doubling the computational cost compared to the base optimizer (e.g., Adam). To mitigate this drawback, we introduce a novel algorithm, Fast Graph Sharpness-Aware Minimization (FGSAM), that combines the rapid training of Multi-Layer Perceptrons (MLPs) with the superior performance of GNNs. Specifically, we use GNNs for parameter perturbation while employing MLPs to minimize the perturbed loss, so that we can find a flat minimum with good generalization more efficiently. Moreover, our method reuses the gradient from the perturbation phase to incorporate graph topology into the minimization process at almost zero additional cost. To further enhance training efficiency, we develop FGSAM+, which executes exact perturbations periodically. Extensive experiments demonstrate that our proposed algorithm outperforms standard SAM with lower computational costs in FSNC tasks. In particular, FGSAM+ as a SAM variant offers faster optimization than the base optimizer in most cases. Beyond FSNC, our proposed methods also demonstrate competitive performance in the standard node classification task for heterophilic graphs, highlighting their broad applicability.
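For reference, the standard SAM step that FGSAM accelerates needs two gradient evaluations per iteration: one at the current weights to build the perturbation and one at the perturbed weights to compute the update. A minimal NumPy sketch on a toy quadratic loss (the loss, `rho`, and the learning rate are illustrative assumptions; FGSAM's split between GNN perturbation and MLP minimization is not shown):

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One vanilla SAM update; note the two gradient evaluations."""
    g = grad_fn(w)                               # pass 1: gradient at current weights
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    g_perturbed = grad_fn(w + eps)               # pass 2: gradient at perturbed weights
    return w - lr * g_perturbed                  # descend with the perturbed gradient


# Toy ill-conditioned quadratic loss 0.5 * w^T A w (illustrative only).
A = np.diag([1.0, 10.0])


def quad_grad(w):
    return A @ w


w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, quad_grad, lr=0.05)
print(w)
```

With `rho = 0` the step reduces to plain gradient descent; the second `grad_fn` call is the overhead that FGSAM and FGSAM+ aim to reduce.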

JBHI Journal 2023 Journal Article

An Arbitrary Scale Super-Resolution Approach for 3D MR Images via Implicit Neural Representation

  • Qing Wu
  • Yuwei Li
  • Yawen Sun
  • Yan Zhou
  • Hongjiang Wei
  • Jingyi Yu
  • Yuyao Zhang

High-resolution (HR) medical images provide rich anatomical structure details to facilitate early and accurate diagnosis. In magnetic resonance imaging (MRI), restricted by hardware capacity, scan time, and patient cooperation ability, isotropic three-dimensional (3D) HR image acquisition typically requires long scan times and results in small spatial coverage and low signal-to-noise ratio (SNR). Recent studies showed that, with deep convolutional neural networks, isotropic HR MR images could be recovered from low-resolution (LR) input via single image super-resolution (SISR) algorithms. However, most existing SISR methods learn a scale-specific mapping between LR and HR images, so they can only handle fixed up-sampling rates. In this paper, we propose ArSSR, an Arbitrary Scale Super-Resolution approach for recovering 3D HR MR images. In the ArSSR model, the LR image and the HR image are represented using the same implicit neural voxel function with different sampling rates. Owing to the continuity of the learned implicit function, a single ArSSR model can reconstruct HR images at arbitrary and infinite up-sampling rates from any input LR image. The SR task is thus converted into approximating the implicit voxel function via deep neural networks from a set of paired HR and LR training examples. The ArSSR model consists of an encoder network and a decoder network: the convolutional encoder extracts feature maps from the LR input images, and the fully-connected decoder approximates the implicit voxel function. Experimental results on three datasets show that the ArSSR model achieves state-of-the-art SR performance for 3D HR MR image reconstruction while using a single trained model for arbitrary up-sampling scales.
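Because the learned representation is a continuous function of position, an arbitrary up-sampling factor amounts to querying it on a denser coordinate grid. A minimal sketch of that grid construction, assuming normalized voxel-centre coordinates in (-1, 1) (an assumption; the paper's exact convention may differ) and omitting the encoder and decoder networks:

```python
import numpy as np


def voxel_center_coords(shape):
    """Normalised voxel-centre coordinates in (-1, 1), one row per voxel."""
    axes = [(np.arange(n) + 0.5) / n * 2.0 - 1.0 for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return grid.reshape(-1, len(shape))


def target_shape(lr_shape, scale):
    """HR grid size for a per-axis (possibly non-integer) up-sampling factor."""
    return tuple(int(round(n * s)) for n, s in zip(lr_shape, scale))


lr_shape = (10, 10, 10)
hr_shape = target_shape(lr_shape, (2.5, 2.5, 1.0))
queries = voxel_center_coords(hr_shape)  # points at which a decoder would be queried
print(hr_shape, queries.shape)
```

The same function serves every scale: only the density of query points changes, which is what lets one trained model handle non-integer and anisotropic factors.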

NeurIPS Conference 2023 Conference Paper

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

  • Qingkai Fang
  • Yan Zhou
  • Yang Feng

Direct speech-to-speech translation (S2ST) translates speech from one language into another using a single model. However, due to linguistic and acoustic diversity, the target speech follows a complex multimodal distribution, making it challenging for S2ST models to achieve both high-quality translations and fast decoding speeds. In this paper, we propose DASpeech, a non-autoregressive direct S2ST model that realizes both fast and high-quality S2ST. To better capture the complex distribution of the target speech, DASpeech adopts a two-pass architecture that decomposes the generation process into two steps: a linguistic decoder first generates the target text, and an acoustic decoder then generates the target speech based on the hidden states of the linguistic decoder. Specifically, we use the decoder of DA-Transformer as the linguistic decoder and FastSpeech 2 as the acoustic decoder. DA-Transformer models translations with a directed acyclic graph (DAG). To consider all potential paths in the DAG during training, we calculate the expected hidden states for each target token via dynamic programming and feed them into the acoustic decoder to predict the target mel-spectrogram. During inference, we select the most probable path and take the hidden states on that path as input to the acoustic decoder. Experiments on the CVSS Fr→En benchmark demonstrate that DASpeech achieves comparable or even better performance than the state-of-the-art S2ST model Translatotron 2, while delivering up to an 18.53× speedup over the autoregressive baseline. Compared with the previous non-autoregressive S2ST model, DASpeech does not rely on knowledge distillation or iterative decoding, achieving significant improvements in both translation quality and decoding speed. Furthermore, DASpeech can preserve the source speaker's voice during translation.

IJCAI Conference 2023 Conference Paper

Dichotomous Image Segmentation with Frequency Priors

  • Yan Zhou
  • Bo Dong
  • Yuanfeng Wu
  • Wentao Zhu
  • Geng Chen
  • Yanning Zhang

Dichotomous image segmentation (DIS) has a wide range of real-world applications and has gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative frequency priors. Our model, called FP-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency prior generator that jointly uses a fixed filter and learnable filters to extract informative frequency priors. Before embedding the frequency priors into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency prior embedding module to embed the frequency priors into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FP-DIS outperforms state-of-the-art methods by a large margin in terms of key evaluation metrics.

AAAI Conference 2021 Conference Paper

An Adaptive Hybrid Framework for Cross-domain Aspect-based Sentiment Analysis

  • Yan Zhou
  • Fuqing Zhu
  • Pu Song
  • Jizhong Han
  • Tao Guo
  • Songlin Hu

Cross-domain aspect-based sentiment analysis aims to use knowledge from a source domain to extract aspect terms and predict their sentiment polarities in a target domain. Recently, methods based on adversarial training have been applied to this task and achieved promising results. In such methods, both the source and target data are used to learn domain-invariant features by deceiving a domain discriminator. However, the task classifier is trained only on the source data, so the aspect and sentiment information in the target data cannot be exploited by the task classifier. In this paper, we propose an Adaptive Hybrid Framework (AHF) for cross-domain aspect-based sentiment analysis. We integrate pseudo-label based semi-supervised learning and adversarial training in a unified network. Thus the target data can be used not only to align the features via the training of the domain discriminator, but also to refine the task classifier. Furthermore, we design an adaptive mean teacher as the semi-supervised part of our network, which can mitigate the effects of noisy pseudo labels generated on the target data. We conduct experiments on four public datasets, and the results show that our framework significantly outperforms the state-of-the-art methods.
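AHF builds on the mean-teacher scheme, whose core is an exponential moving average (EMA) of the student's weights. The sketch below shows only that standard EMA update and a simple confidence filter on pseudo-labels; the fixed `alpha` and `threshold` values and the dict-of-arrays weight format are illustrative assumptions, and the paper's adaptive mechanism is not reproduced.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def confident_pseudo_labels(teacher_probs, threshold=0.9):
    """Keep target-domain pseudo-labels only where the teacher is confident.

    Returns the kept labels and the indices of the examples they belong to.
    """
    probs = np.asarray(teacher_probs, dtype=float)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold
    return labels[mask], np.flatnonzero(mask)


teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, alpha=0.9)
labels, idx = confident_pseudo_labels([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]])
print(teacher["w"], labels, idx)
```

Because the teacher changes slowly, its predictions on target data are smoother than the student's, which is what makes it a reasonable source of pseudo-labels.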

AAAI Conference 2021 Conference Paper

Does Explainable Artificial Intelligence Improve Human Decision-Making?

  • Yasmeen Alufaisan
  • Laura R. Marusich
  • Jonathan Z. Bakdash
  • Yan Zhou
  • Murat Kantarcioglu

Explainable AI gives users insight into why a model makes its predictions, offering the potential for users to better understand and trust a model, and to recognize and correct incorrect AI predictions. Prior research on human and explainable AI interactions has typically focused on measures such as interpretability, trust, and usability of the explanation. There are mixed findings on whether explainable AI can improve actual human decision-making and the ability to identify problems with the underlying model. Using real datasets, we compare objective human decision accuracy without AI (control), with an AI prediction (no explanation), and with an AI prediction plus explanation. We find that providing any kind of AI prediction tends to improve user decision accuracy, but we find no conclusive evidence that explainable AI has a meaningful impact. Moreover, we observed that the strongest predictor of human decision accuracy was AI accuracy, and that users were somewhat able to detect when the AI was correct vs. incorrect, but this was not significantly affected by including an explanation. Our results indicate that, at least in some situations, the why information provided in explainable AI may not enhance user decision-making, and further research may be needed to understand how to integrate explainable AI into real systems.

AAAI Conference 2021 Conference Paper

Early Safety Warnings for Long-Distance Pipelines: A Distributed Optical Fiber Sensor Machine Learning Approach

  • Yiyuan Yang
  • Yi Li
  • Taojia Zhang
  • Yan Zhou
  • Haifeng Zhang

Automated pipeline safety early warning (PSEW) systems are designed to automatically identify and locate third-party damage events on oil and gas pipelines, replacing traditional, inefficient manual inspection methods. However, current PSEW methods cannot achieve universality across various complex environments because they are sensitive to the spatiotemporal stability of the signals obtained by their distributed sensors at various locations and times. Our research aimed to improve the accuracy of long-distance oil–gas PSEW systems through machine learning. In this paper, we propose a novel real-time action recognition method for long-distance PSEW systems based on a coherent Rayleigh scattering distributed optical fiber sensor. More specifically, we put forward two complementary feature calculation methods to describe signals and build a new action recognition deep learning network based on those features. Encouraging empirical results on data collected at a real location confirm that the features can effectively describe signals in an environment with strong noise and weak signals, and that the entire approach can identify and locate third-party damage events quickly under various hardware conditions, with accuracies of 99.26% (500 Hz) and 97.20% (100 Hz). More generally, our method can be applied to other fields as well.

IJCAI Conference 2019 Conference Paper

A Span-based Joint Model for Opinion Target Extraction and Target Sentiment Classification

  • Yan Zhou
  • Longtao Huang
  • Tao Guo
  • Jizhong Han
  • Songlin Hu

Target-Based Sentiment Analysis aims at extracting opinion targets and classifying the sentiment polarity expressed toward each target. Recently, token-based sequence tagging methods, which predict a tag for each token, have been successfully applied to jointly solve the two tasks. Since they do not treat a target containing several words as a whole, it may be difficult to use global information to identify that opinion target, leading to incorrect extraction. Independently predicting the sentiment for each token may also lead to sentiment inconsistency across the words of an opinion target. In this paper, inspired by span-based methods in NLP, we propose a simple and effective joint model that conducts extraction and classification at the span level rather than the token level. Our model first enumerates spans of one or more tokens and learns their representations from the tokens inside. Then, a span-aware attention mechanism is designed to compute the sentiment information toward each span. Extensive experiments on three benchmark datasets show that our model consistently outperforms the state-of-the-art methods.
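Span-level modelling starts by enumerating every candidate span up to a maximum width and building a representation from the tokens inside each span. A minimal sketch of that step; mean pooling and `max_width` are assumptions for illustration, not the paper's span encoder:

```python
import numpy as np


def enumerate_spans(n_tokens, max_width):
    """All (start, end) spans, end inclusive, covering 1..max_width tokens."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]


def span_representations(token_vecs, spans):
    """Represent each span by the mean of its token vectors (illustrative choice)."""
    token_vecs = np.asarray(token_vecs, dtype=float)
    return np.stack([token_vecs[i:j + 1].mean(axis=0) for i, j in spans])


tokens = ["the", "battery", "life", "is", "great"]
spans = enumerate_spans(len(tokens), max_width=2)
print([" ".join(tokens[i:j + 1]) for i, j in spans])
```

A multi-word target such as "battery life" becomes a single candidate, so its extraction and its sentiment are decided once for the whole span rather than token by token.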

YNICL Journal 2019 Journal Article

Structural brain network measures are superior to vascular burden scores in predicting early cognitive impairment in post stroke patients with small vessel disease

  • Jing Du
  • Yao Wang
  • Nan Zhi
  • Jieli Geng
  • Wenwei Cao
  • Ling Yu
  • Jianhua Mi
  • Yan Zhou

OBJECTIVES: In this cross-sectional study, we aimed to explore the mechanisms of early cognitive impairment in a post-stroke non-dementia cerebral small vessel disease (SVD) cohort by comparing the SVD score with structural brain network measures. METHOD: 127 SVD patients were recruited consecutively from a stroke clinic, comprising 76 individuals with mild cognitive impairment (MCI) and 51 with no cognitive impairment (NCI). Detailed neuropsychological assessments and multimodal MRI were performed. SVD scores were calculated on a standard scale, and structural brain network measures were analyzed using diffusion tensor imaging (DTI). Between-group differences were analyzed, and logistic regression was applied to determine the predictive value of SVD and network measures for cognitive status. Mediation analysis with structural equation modeling (SEM) was used to better understand the interactions of SVD burden, brain networks and cognitive deficits. RESULTS: ) was significantly related to cognitive state (p < .01) but not the SVD score. Mediation analysis showed that the standardized total effect (p = .013) and the standardized indirect effect (p = .016) of the SVD score on cognition were significant, but the direct effect was not. CONCLUSIONS: Brain network measures, but not the SVD score, are significantly correlated with cognition in post-stroke SVD patients. Mediation analysis showed that cerebral vascular lesions produce cognitive dysfunction by interfering with the structural brain network in SVD patients. The brain network measures may be regarded as direct and independent surrogate markers of cognitive impairment in SVD.

JMLR Journal 2008 Journal Article

A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters

  • Zach Jorgensen
  • Yan Zhou
  • Meador Inge

Statistical spam filters are known to be vulnerable to adversarial attacks. One of the more common adversarial attacks, known as the good word attack, thwarts spam filters by appending to spam messages sets of "good" words, which are words that are common in legitimate email but rare in spam. We present a counterattack strategy that attempts to differentiate spam from legitimate email in the input space by transforming each email into a bag of multiple segments, and subsequently applying multiple instance logistic regression on the bags. We treat each segment in the bag as an instance. An email is classified as spam if at least one instance in the corresponding bag is spam, and as legitimate if all the instances in it are legitimate. We show that a classifier using our multiple instance counterattack strategy is more robust to good word attacks than its single instance counterpart and other single instance learners commonly used in the spam filtering domain.
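The bag-level decision rule described above can be sketched as follows. The fixed-size word chunking, the keyword weights, and the bias are hypothetical stand-ins; in the paper, instance scores come from multiple instance logistic regression trained on the bags.

```python
import math


def split_into_segments(words, segment_len=5):
    """Hypothetical segmentation: consecutive chunks of segment_len words."""
    return [words[i:i + segment_len] for i in range(0, len(words), segment_len)]


def instance_spam_prob(segment, weights, bias=-2.0):
    """Toy logistic score for one instance (weights and bias are illustrative)."""
    z = bias + sum(weights.get(w, 0.0) for w in segment)
    return 1.0 / (1.0 + math.exp(-z))


def classify_email(text, weights, threshold=0.5):
    """Multiple-instance rule: spam if at least one segment scores as spam."""
    segments = split_into_segments(text.lower().split())
    return any(instance_spam_prob(s, weights) >= threshold for s in segments)


weights = {"viagra": 3.0, "free": 1.5, "meeting": -1.0, "thanks": -1.0}
attacked = "buy viagra now free offer " + "meeting thanks agenda notes " * 5
print(instance_spam_prob(attacked.split(), weights))  # whole email as one instance: well below 0.5
print(classify_email(attacked, weights))              # True: the first segment is spam
```

Scored as a single instance, the appended good words pull the whole email below the threshold, while the multiple-instance rule still flags the first segment.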