AAAI 1997

Presenting and Analyzing the Results of AI Experiments: Data Averaging and Data Snooping

Conference Paper Experimental Methodology Artificial Intelligence

Abstract

Experimental results reported in the machine learning AI literature can be misleading. This paper investigates the common processes of data averaging (reporting results in terms of the mean and standard deviation of the results from multiple trials) and data snooping in the context of neural networks, one of the most popular AI machine learning models. Both of these processes can result in misleading results and inaccurate conclusions. We demonstrate how easily this can happen and propose techniques for avoiding these very important problems. For data averaging, common presentation assumes that the distribution of individual results is Gaussian. However, we investigate the distribution for common problems and find that it often does not approximate the Gaussian distribution, may not be symmetric, and may be multimodal. We show that assuming Gaussian distributions can significantly affect the interpretation of results, especially those of comparison studies. For a controlled task, we find that the distribution of performance is skewed towards better performance for smoother target functions and skewed towards worse performance for more complex target functions. We propose new guidelines for reporting performance which provide more information about the actual distribution (e.g. box-whiskers plots). For data snooping, we demonstrate that optimization of performance via experimentation with multiple parameters can lead to significance being assigned to results which are due to chance. We suggest that precise descriptions of experimental techniques can be very important to the evaluation of results, and that we need to be aware of potential data snooping biases when formulating these experimental techniques (e.g. selecting the test procedure). Additionally, it is important to only rely on appropriate statistical tests and to ensure that any assumptions made in the tests are valid (e.g. normality of the distribution).
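The contrast between mean/standard-deviation reporting and a box-whiskers style summary can be illustrated with a minimal sketch. The function name `summarize_trials` and the trial values are hypothetical, invented for illustration; they are not taken from the paper, and the quartile convention used here is only one of several in common use.

```python
import statistics


def summarize_trials(results):
    """Return mean/std alongside the five-number summary behind a box-whiskers plot."""
    ordered = sorted(results)
    # Quartiles via linear interpolation over the observed range ("inclusive" method).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical test errors from 10 training runs (made-up numbers):
# most runs converge well, two runs end in a poor solution.
trial_errors = [0.10, 0.11, 0.11, 0.12, 0.12, 0.13, 0.13, 0.14, 0.45, 0.50]
summary = summarize_trials(trial_errors)

for name, value in summary.items():
    print(f"{name:>6}: {value:.4f}")
```

In this made-up sample the mean lies above the upper quartile, and the interval "mean minus one standard deviation" extends below the best error actually observed, which is the kind of distortion a Gaussian-style summary can introduce when the distribution of results is skewed.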

Authors

Context

Venue
AAAI Conference on Artificial Intelligence