Data Exploration Checklist

 


What is your data telling you?

Inspect your data: If your dataset isn’t too large, read through it and note anything that stands out.

Use summary statistics: Summarize your data by grouping it into categories and computing statistics such as the average and standard deviation.

Inspect a random sample of your data: If your dataset is too large to read in full, a random sample can give you an initial impression.
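The checks above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed workflow: the `records` list, its category names, and the sample size are all made up for the example.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical records as (category, value) pairs; replace with your own data.
records = [
    ("north", 12.0), ("south", 7.5), ("north", 15.0), ("east", 9.0),
    ("south", 8.5), ("east", 11.0), ("north", 13.5), ("south", 6.0),
]

# Inspect a random sample instead of reading everything.
# A seeded generator keeps the sample reproducible between runs.
rng = random.Random(42)
sample = rng.sample(records, k=3)

# Summarize the whole dataset.
values = [value for _, value in records]
overall_mean = statistics.mean(values)
overall_stdev = statistics.stdev(values)  # sample standard deviation

# Categorize, then summarize each category.
by_category = defaultdict(list)
for category, value in records:
    by_category[category].append(value)
category_means = {c: statistics.mean(vs) for c, vs in by_category.items()}

print("sample:", sample)
print("mean:", round(overall_mean, 2), "stdev:", round(overall_stdev, 2))
print("mean per category:", category_means)
```

With a real dataset you would typically load the rows from a file first; the summarizing steps stay the same.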

Visualizing data

Visualize your data using bar charts, line charts or scatter plots to examine information hidden in your dataset. 

Bar charts, line charts, scatter plots

Examine variable distributions

Inspect the distribution of your data 

  • Categorize the data (for example, group the values into bins)

  • Plot the categorized data
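As a rough sketch of these two steps, the snippet below groups numeric values into fixed-width bins and prints a text histogram. The `values` list and the `bin_width` of 2.0 are assumptions chosen for illustration; a plotting library would normally replace the `print` loop.

```python
from collections import Counter

# Hypothetical measurements; replace with one column of your dataset.
values = [1.2, 2.8, 3.1, 3.4, 4.0, 4.4, 4.9, 5.5, 6.3, 8.7]
bin_width = 2.0  # assumed bin size; tune it to your data


def bin_label(value, width):
    """Return the lower edge of the bin that contains value."""
    return int(value // width) * width


# Step 1: categorize the data by counting values per bin.
counts = Counter(bin_label(v, bin_width) for v in values)

# Step 2: plot the categorized data as a simple text histogram.
for lower in sorted(counts):
    upper = lower + bin_width
    print(f"[{lower:4.1f}, {upper:4.1f}) {'#' * counts[lower]}")
```

The shape of the bars gives a first hint about which distribution the data may follow.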

Common data distributions:

Normal, bimodal, log-normal, exponential, uniform

Learn more about your data:

Evaluate the minimum

Evaluate the maximum

Evaluate the mode

Evaluate the standard deviation
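These four values can be computed with the standard library alone. The sketch below uses a made-up `values` list; note that `statistics.stdev` is the sample standard deviation, and `statistics.multimode` is the safer choice if several values may tie for most frequent.

```python
import statistics

# Hypothetical observations for a single variable.
values = [3, 5, 5, 6, 7, 5, 9, 2, 6, 5]

summary = {
    "min": min(values),
    "max": max(values),
    "mode": statistics.mode(values),    # most frequent value
    "stdev": statistics.stdev(values),  # sample standard deviation
}

for name, result in summary.items():
    print(f"{name}: {result}")
```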

Examine variable relationships

Visualize variables to understand their correlation

Common visualizations: 

Scatter plot, line chart

⏹ Calculate the correlation coefficient to understand the strength and direction of the correlation

0 = no linear correlation

1 = perfect positive correlation

-1 = perfect negative correlation
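To make the three reference values concrete, here is a small sketch of the Pearson correlation coefficient, which is assumed to be the coefficient the checklist has in mind. The function name `pearson_r` and the sample data are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # y rises in step with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls in step with x

# A coefficient of 0 only rules out a straight-line relationship:
# here y = x**2 depends entirely on x, yet the coefficient is 0.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_positive, r_negative, r_curved)
```

This is why it helps to look at a scatter plot alongside the coefficient: a curved relationship can hide behind a value near 0.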

Feature engineering

⏹ Evaluate whether we can create new features or modify existing ones to better understand our data
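As one possible illustration, the sketch below derives three new features from existing fields. The `orders` records, their field names, and the chosen features are all hypothetical; which features are useful depends entirely on your data and question.

```python
import math
from datetime import date

# Hypothetical order records; field names are made up for this example.
orders = [
    {"order_date": date(2024, 1, 6), "total": 120.0, "items": 4},
    {"order_date": date(2024, 1, 10), "total": 45.0, "items": 3},
]


def add_features(row):
    """Return a copy of row with derived features added."""
    enriched = dict(row)
    # Ratio of two existing columns.
    enriched["price_per_item"] = row["total"] / row["items"]
    # Log transform, often helpful for skewed positive values.
    enriched["log_total"] = math.log(row["total"])
    # Flag extracted from a date (Monday is 0, so 5 and 6 are the weekend).
    enriched["is_weekend"] = row["order_date"].weekday() >= 5
    return enriched


enriched_orders = [add_features(row) for row in orders]
for row in enriched_orders:
    print(row)
```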
