Data Exploration Checklist
What is your data telling you?
⏹ Inspect your data: If your dataset isn’t too large, read through your data to assess whether interesting information jumps out
⏹ Use summary statistics: Summarize your data by grouping it into categories and computing statistics such as the mean and standard deviation
⏹ Inspect a random sample of your data: If your dataset is too large to read in full, a random sample can give you a first impression
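The summary and sampling steps above can be sketched with Python's standard library. This is a minimal illustration, not a prescribed workflow: the `orders` values are invented, and `summarize` and `random_sample` are hypothetical helper names.

```python
import random
import statistics

# Hypothetical dataset: daily order counts (values invented for illustration).
orders = [12, 15, 11, 14, 90, 13, 16, 12, 15, 14]

def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def random_sample(values, k, seed=0):
    """Draw k items without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(values, k)

summary = summarize(orders)
sample = random_sample(orders, k=3)
print(summary)
print(sample)
```

In this made-up data the mean (21.2) sits well above the median (14), which hints at the single large value 90. That is the kind of detail a quick summary can surface before you read every row.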
Visualizing data
⏹ Visualize your data using bar charts, line charts or scatter plots to examine information hidden in your dataset.
Examine variable distributions
⏹ Inspect the distribution of your data
Group the data into categories or bins
Plot the count per category, for example as a histogram
Common data distributions:
Normal, bimodal, log-normal, exponential, uniform
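One possible way to group values into bins and plot the counts, sketched without any plotting library. The `values` list is invented, `bin_counts` and `text_histogram` are hypothetical helper names, and a text histogram stands in for a real chart here.

```python
from collections import Counter

# Hypothetical measurements (invented for illustration).
values = [1.2, 2.5, 2.7, 3.1, 3.3, 3.4, 3.8, 4.1, 4.6, 5.9]

def bin_counts(data, bin_width):
    """Count how many values fall into each bin of the given width.

    A value v lands in the bin that starts at floor(v / bin_width) * bin_width.
    Bins that contain no values are not listed.
    """
    counts = Counter(int(v // bin_width) * bin_width for v in data)
    return dict(sorted(counts.items()))

def text_histogram(counts, bin_width):
    """Render the bin counts as one row of '#' marks per bin."""
    lines = []
    for start, n in counts.items():
        lines.append(f"{start:>3}-{start + bin_width:<3} {'#' * n}")
    return "\n".join(lines)

counts = bin_counts(values, bin_width=1)
print(text_histogram(counts, bin_width=1))
```

With this sample the rows rise to a single peak in the 3-4 bin and fall off on both sides, a shape you would then compare against the common distributions listed above.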
Learn more about your data:
⏹ Evaluate the minimum
⏹ Evaluate the maximum
⏹ Evaluate the mode
⏹ Evaluate the standard deviation
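The four checks above map directly onto Python built-ins and the standard `statistics` module. A small sketch, assuming a list of numbers; the `ages` values are invented for illustration.

```python
import statistics

# Hypothetical sample: ages of survey respondents (invented for illustration).
ages = [23, 35, 35, 41, 29, 35, 52, 29]

lowest = min(ages)
highest = max(ages)
most_common = statistics.mode(ages)  # most frequent value
spread = statistics.stdev(ages)      # sample standard deviation (divides by n - 1)

print(lowest, highest, most_common, round(spread, 2))
```

Use `statistics.pstdev` instead of `statistics.stdev` if your data is the whole population and not a sample of it.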
Examine variable relationships
⏹ Visualize variables to understand their correlation
Common visualizations:
Scatter plot, line chart
⏹ Calculate the correlation coefficient to understand the strength and direction of the correlation
0 = no linear correlation
1 = perfect positive correlation
-1 = perfect negative correlation
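The checklist does not say which coefficient to use. The sketch below assumes the Pearson correlation coefficient, which measures linear association; `pearson_r` is a hypothetical helper name and the data is invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # perfectly increasing line
print(pearson_r(hours, [10, 8, 6, 4, 2]))  # perfectly decreasing line
print(pearson_r(hours, [4, 1, 0, 1, 4]))   # symmetric curve, no linear trend
```

The third pair is worth a second look: the values follow a clear curve, yet the coefficient is 0 because there is no straight-line trend. A coefficient near 0 is therefore a reason to check the scatter plot, not proof that the variables are unrelated.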
Feature engineering
⏹ Evaluate whether you can create new features or modify existing ones to better understand your data
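As a rough illustration of both options, the sketch below creates a new feature from two existing ones and modifies an existing one with a log transform. The `listings` records, the field names and the `add_features` helper are all hypothetical; which features actually help depends on your data.

```python
import math

# Hypothetical records (invented for illustration).
listings = [
    {"price": 300_000, "area_m2": 100},
    {"price": 450_000, "area_m2": 120},
    {"price": 200_000, "area_m2": 80},
]

def add_features(rows):
    """Return copies of the rows with two derived features added."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stays untouched
        # New feature: combine two existing fields into a ratio.
        new_row["price_per_m2"] = row["price"] / row["area_m2"]
        # Modified feature: log-transform a skewed, strictly positive value.
        new_row["log_price"] = math.log(row["price"])
        enriched.append(new_row)
    return enriched

for row in add_features(listings):
    print(row)
```

Once the derived features exist, you can run them through the same distribution and correlation checks as the original variables.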