Reading: Exploratory Data Analysis (EDA)

In the 1960s and 1970s, John Tukey and other mathematicians worked to differentiate between modes of analysis, most notably exploratory versus confirmatory analysis. EDA treats data as more than support for existing knowledge; instead, data is viewed as a source of new ideas and hypotheses. Some key points when thinking about EDA are below:

  • Skepticism joined by openness
  • Flexible, adaptable, risk-taking
  • Includes randomness
  • Smooth enough? Rough enough? (data = smooth + rough: the fitted pattern plus what is left over)
  • Analysis should begin with the data, not with summaries
  • Stem-and-leaf (easy to construct by hand, shows numbers and shape)
  • Box-and-whisker (good for providing visual detail of outliers, tails)
  • Note in data: skewness, outliers, gaps, and multiple peaks
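The two displays above are simple enough to sketch in a few lines of Python. This is a minimal illustration, not a reference implementation: the `stem_and_leaf` and `outliers` helpers and the `scores` data are made up for this example, the stem-and-leaf assumes non-negative integers split into tens and units, and the outlier rule is the common 1.5 × IQR box-plot convention.

```python
import statistics
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Walk every stem between the smallest and largest so gaps stay visible.
    return [
        f"{stem:>2} | {''.join(str(leaf) for leaf in rows.get(stem, []))}".rstrip()
        for stem in range(min(rows), max(rows) + 1)
    ]


def outliers(values, k=1.5):
    """Values beyond the box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data, invented for this example.
scores = [12, 15, 17, 21, 22, 22, 24, 28, 31, 58]

print("\n".join(stem_and_leaf(scores)))
#  1 | 257
#  2 | 12248
#  3 | 1
#  4 |
#  5 | 8

print(outliers(scores))  # [58]
```

The empty stem at 4 is the gap, and the lone leaf at 5 is the value the box plot would draw as a separate point beyond the whisker.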

Distribution

  • Location (mean, median, mode)
  • Spread (width; for a bell-shaped distribution, ~68% of values fall within one standard deviation of the mean and ~95% within two)
  • Shape (e.g., the standard bell curve)
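To make the location and spread measures concrete, here is a rough sketch using only Python's standard library. The `describe` helper is hypothetical, and the sample is simulated from a normal distribution (mean 50, standard deviation 10, both arbitrary choices) purely so the ~68% and ~95% figures have something to be checked against.

```python
import random
import statistics


def describe(values):
    """Location and spread, plus the share of values within 1 and 2 SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)

    def share_within(k):
        return sum(abs(v - mean) <= k * sd for v in values) / len(values)

    return {
        "mean": mean,
        "median": statistics.median(values),
        # Mode of the values rounded to integers; raw floats rarely repeat.
        "modes": statistics.multimode(round(v) for v in values),
        "stdev": sd,
        "within_1_sd": share_within(1),
        "within_2_sd": share_within(2),
    }


# Simulated bell-shaped sample; the seed and parameters are arbitrary.
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(10_000)]

summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

For a bell-shaped sample like this one, `within_1_sd` should land near 0.68 and `within_2_sd` near 0.95. On skewed or multi-peaked data those shares can look quite different, which is one reason to check shape before leaning on the standard deviation.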

Tenets

  1. The shape of a distribution is at least as important as its location and spread
  2. Visual representations are superior to purely numeric ones for discovering shape
  3. The choice of summary statistics for a single variable should depend on how appropriate each statistic is for the shape of the distribution
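A small sketch of why these tenets matter, under some loud assumptions: the two datasets are invented so that they share the same mean and standard deviation, the `text_histogram`, `skewness`, and `summarize` helpers are hypothetical, and the skewness cutoff of 0.5 is an arbitrary rule of thumb rather than a standard.

```python
import statistics
from collections import Counter


def text_histogram(values):
    """One row per integer value, one star per observation."""
    counts = Counter(values)
    return [
        f"{v:>2} | {'*' * counts.get(v, 0)}".rstrip()
        for v in range(min(values), max(values) + 1)
    ]


def skewness(values):
    """Population skewness: mean cubed deviation divided by SD cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def summarize(values, skew_cutoff=0.5):
    """Mean/SD for roughly symmetric data, median/IQR otherwise (cutoff is arbitrary)."""
    if abs(skewness(values)) <= skew_cutoff:
        return {"center": ("mean", statistics.fmean(values)),
                "spread": ("stdev", statistics.pstdev(values))}
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"center": ("median", median), "spread": ("IQR", q3 - q1)}


# Invented data: both sets have mean 5 and population SD 2.
one_peak = [2, 4, 4, 4, 5, 5, 7, 9]
two_peaks = [3, 3, 3, 3, 7, 7, 7, 7]

for name, data in [("one_peak", one_peak), ("two_peaks", two_peaks)]:
    print(name, statistics.fmean(data), statistics.pstdev(data))
    print("\n".join(text_histogram(data)))
    print(summarize(data))
```

The numeric summaries cannot tell the two sets apart, but the histograms can at a glance. Note also that `two_peaks` passes the skewness check, so `summarize` reports a mean of 5, a value where there is no data at all. A single number for shape is no substitute for looking at the picture first.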
