AAAI 1997
Presenting and Analyzing the Results of AI Experiments: Data Averaging and Data Snooping
Abstract
Experimental results reported in the machine learning AI literature can be misleading. This paper investigates the common processes of data averaging (reporting results in terms of the mean and standard deviation of the results from multiple trials) and data snooping in the context of neural networks, one of the most popular AI machine learning models. Both of these processes can result in misleading results and inaccurate conclusions. We demonstrate how easily this can happen and propose techniques for avoiding these very important problems. For data averaging, common presentation assumes that the distribution of individual results is Gaussian. However, we investigate the distribution for common problems and find that it often does not approximate the Gaussian distribution, may not be symmetric, and may be multimodal. We show that assuming Gaussian distributions can significantly affect the interpretation of results, especially those of comparison studies. For a controlled task, we find that the distribution of performance is skewed towards better performance for smoother target functions and skewed towards worse performance for more complex target functions. We propose new guidelines for reporting performance which provide more information about the actual distribution (e.g. box-whiskers plots). For data snooping, we demonstrate that optimization of performance via experimentation with multiple parameters can lead to significance being assigned to results which are due to chance. We suggest that precise descriptions of experimental techniques can be very important to the evaluation of results, and that we need to be aware of potential data snooping biases when formulating these experimental techniques (e.g. selecting the test procedure). Additionally, it is important to only rely on appropriate statistical tests and to ensure that any assumptions made in the tests are valid (e.g. normality of the distribution).
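The two effects described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the function names `five_number_summary` and `best_of_k_accuracy`, the trial values, and the chance-level "classifier" are all hypothetical and chosen only for illustration. The first function computes the quantities a box-whiskers plot displays next to the mean and standard deviation; the second simulates data snooping by keeping the best of `k` configurations that all perform at chance.

```python
import random
import statistics


def five_number_summary(results):
    """Return mean/stdev plus the values a box-whiskers plot displays."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(results, n=4)
    return {
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),
        "min": min(results),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(results),
    }


def best_of_k_accuracy(k, n_test, rng):
    """Simulate data snooping: report the best of k chance-level configurations.

    Each hypothetical configuration guesses a binary label at random, so its
    true accuracy is 0.5; only the best score on the shared test set is kept.
    """
    scores = [
        sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
        for _ in range(k)
    ]
    return max(scores)


if __name__ == "__main__":
    # Hypothetical bimodal error rates from 10 training trials
    # (illustrative values, not taken from the paper).
    trials = [0.10, 0.11, 0.12, 0.10, 0.11, 0.40, 0.42, 0.41, 0.11, 0.12]
    print(five_number_summary(trials))

    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("best of 1: ", best_of_k_accuracy(k=1, n_test=100, rng=rng))
    print("best of 50:", best_of_k_accuracy(k=50, n_test=100, rng=rng))
```

For these illustrative trials the mean error is about 0.20, a value that none of the individual trials came close to, while the median is about 0.115; the quartiles and extremes make the two clusters visible where mean and standard deviation alone would not. In the second part, the best of 50 chance-level configurations will typically score well above 0.5 on the shared test set even though no configuration is better than guessing.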
Authors
Context
- Venue
- AAAI Conference on Artificial Intelligence
- Paper id
- 503292525142728010