John Tukey and other mathematicians worked to distinguish exploratory from confirmatory modes of analysis in the 1960s and 1970s. Exploratory data analysis (EDA) treats data as more than support for existing knowledge; instead, it views data as a source of new ideas and hypotheses. Some key points to keep in mind when thinking about EDA are below:
- Skepticism joined by openness
- Flexible, adaptable, risk-taking
- Includes randomness
- Smooth enough? Rough enough?
- Analysis should begin with the data, not with summaries
- Stem-and-leaf (easy to construct by hand, shows numbers and shape)
- Box-and-whisker (good for providing visual detail of outliers, tails)
- Features to note in the data: skewness, outliers, gaps, and multiple peaks
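
The two displays mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `stem_and_leaf` and `box_summary` are hypothetical, the sample values are made up, the stems are assumed to be tens digits of non-negative integers, and outliers are flagged with the common 1.5 × IQR convention, which these notes do not themselves specify.

```python
import statistics
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems show up as gaps.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


def box_summary(values, k=1.5):
    """Quartiles, whisker ends, and outliers under the k * IQR rule (assumed convention)."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


sample = [12, 15, 21, 23, 23, 27, 31, 34, 58]  # made-up illustrative data
for line in stem_and_leaf(sample):
    print(line)
print(box_summary(sample))
```

On this sample the stem-and-leaf rows show every number and the overall shape at once, including an empty stem for the 40s (a gap), while the box summary flags 58 as lying beyond the upper whisker.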
Distribution
- Location (mean, median, mode)
- Spread (width, standard deviations: for a normal distribution, ~68% of values fall within 1 SD of the mean and ~95% within 2 SDs)
- Shape (e.g., the standard bell curve)
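
As a rough sketch of how location and spread might be computed, and of where the ~68% and ~95% figures come from, the following uses only Python's standard library. The helper names `describe` and `fraction_within`, the small sample, and the simulated normal data are all assumptions made for illustration.

```python
import random
import statistics


def describe(values):
    """Basic location and spread measures for a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


summary = describe([2, 3, 3, 4, 8])  # made-up illustrative data
print(summary)

# Simulated bell-shaped data: the fractions should land near 0.68 and 0.95.
rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(10_000)]
print(fraction_within(normal_sample, 1))
print(fraction_within(normal_sample, 2))
```

The 68% and 95% figures hold only for roughly bell-shaped data; applying `fraction_within` to a strongly skewed sample would generally give different fractions.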
Tenets
- The shape of a distribution is at least as important as its location and spread
- Visual representations are superior to purely numeric ones for discovering shape
- The choice of summary statistics for a single variable should depend on how appropriate each statistic is for the shape of the distribution
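
The last tenet can be illustrated with a small sketch that picks summary statistics based on a crude measure of shape. Everything here is an assumption for illustration: the function names `skewness` and `choose_summary` are hypothetical, the cutoff of 1 on the absolute skewness is only a rule of thumb, and in practice a plot should be inspected before trusting any single number.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness (0 for perfectly symmetric data)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical, so there is no asymmetry to measure
    return m3 / m2 ** 1.5


def choose_summary(values, cutoff=1.0):
    """Report mean/SD for roughly symmetric data, median/IQR for skewed data."""
    if abs(skewness(values)) > cutoff:  # cutoff is an assumed rule of thumb
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {"median": statistics.median(values), "iqr": q3 - q1}
    return {"mean": statistics.fmean(values), "stdev": statistics.stdev(values)}


symmetric = [1, 2, 3, 4, 5]    # made-up illustrative data
skewed = [1, 1, 2, 2, 3, 40]   # one large value drags the mean upward
print(choose_summary(symmetric))
print(choose_summary(skewed))
```

In the skewed sample the mean is pulled well above most of the values by the single large observation, while the median stays near the bulk of the data, which is why a resistant summary is the better description there.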