Posts

Showing posts from 2025

Data Exploration Checklist

Explore Checklist

What is your data telling you?
⏹ Inspect your data: If your dataset isn’t too large, read through your data to assess whether interesting information jumps out
⏹ Use summary statistics: Evaluate your data by summarizing it (categorize, use statistics like average, standard deviation, etc.)
⏹ Inspect a random sample of your data: If your dataset is too large, a random sample may give you some initial information

Visualizing data
⏹ Visualize your data using bar charts, line charts or scatter plots to examine information hidden in your dataset.
    Bar charts
    Line charts
    Scatter plots

Examine variable distributions
⏹ Inspect the distribution of your data
    Categorize the data
    Plot the categorized data
Common data distributions:
    Normal
    Bimodal
    Log-normal
    Exponential
    Uniform
Learn more about your data:
⏹ Evaluate the minimum
⏹ Evaluate the maximum
⏹ Evaluate the mode
⏹ Evaluate the standard deviation

Examine variable relationships
⏹ V...
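The summary-statistics, random-sample, and distribution steps in the checklist above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the `values` list is made-up sample data, and the helper names `summarize`, `random_sample`, and `bin_counts` are hypothetical.

```python
import random
import statistics
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
values = [12, 15, 15, 18, 21, 24, 30]


def summarize(data):
    """Return the statistics named in the checklist for a list of numbers."""
    return {
        "count": len(data),
        "minimum": min(data),
        "maximum": max(data),
        "mean": statistics.mean(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


def random_sample(data, k, seed=0):
    """Draw up to k items without replacement; the fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(data, min(k, len(data)))


def bin_counts(data, width):
    """Categorize values into bins of the given width and count each bin."""
    return Counter((v // width) * width for v in data)


summary = summarize(values)
sample = random_sample(values, 3)
bins = bin_counts(values, 10)

print(summary)
print(sample)

# A rough text "bar chart" of the categorized data, one row per bin.
for start in sorted(bins):
    print(f"{start:>3}-{start + 9:<3} {'#' * bins[start]}")
```

The text bars give a quick first look at the shape of the distribution before you reach for a plotting library.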

Cleaning Data Checklist

Scrubbing Data Checklist

The scrubbing stage is all about cleaning your data and getting your dataset ready for analysis. You can use this checklist to help you in the process.

1. Removing duplicates
⏹ Identify duplicate records: Inspect records for duplicates and verify that each one really is a duplicate
⏹ Remove duplicate records: Remove the duplicate records from your dataset

2. Formatting records
⏹ Ensure consistency: Check that all data follow a consistent format and adjust the format if necessary
⏹ Identify the data type: Make sure the data type of each field is clear and identified

3. Solving for missing values
⏹ Identify the missing values: Scan your data for any values that may be missing
⏹ Solve for the missing values: Replace the missing values with text (e.g. NA) or delete the entire record with the missing value

4. Checking for wrong values
⏹ Identify wrong values: Scan your data for any wrong values
⏹ Solve for the wrong values: Replace...
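As a rough sketch of the duplicate and missing-value steps above, the following Python uses only built-in types. The `records` list is invented for illustration, and the helpers `remove_duplicates`, `is_missing`, `fill_missing`, and `drop_incomplete` are hypothetical names, not part of any library. It assumes that `None` and blank strings are what "missing" means in your dataset, which may not hold for yours.

```python
# Hypothetical records, invented for illustration.
# None and blank strings are treated as missing values (an assumption).
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": ""},
    {"id": 1, "name": "Ada", "city": "London"},  # exact duplicate of the first record
    {"id": 3, "name": None, "city": "Paris"},
]


def remove_duplicates(rows):
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def fill_missing(rows, placeholder="NA"):
    """Option 1: replace every missing value with a text placeholder."""
    return [
        {field: (placeholder if is_missing(value) else value) for field, value in row.items()}
        for row in rows
    ]


def drop_incomplete(rows):
    """Option 2: delete every record that contains a missing value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


unique = remove_duplicates(records)
filled = fill_missing(unique)
complete = drop_incomplete(unique)

print(len(records), "records,", len(unique), "after removing duplicates")
print(filled)
print(complete)
```

Filling keeps every record at the cost of placeholder values, while dropping keeps only complete records; which trade-off is right depends on how much data you can afford to lose.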

Validity of Data

When obtaining data, it is important to check the validity of your dataset, in other words, to make sure your data are of high quality so you can move on to the explore and analyze phases. Here is a checklist you can use to check the validity of your data.

Source credibility:
⏹ Authorship: Are the data provided by a reputable author or organization? What are the credentials of the author or organization?
⏹ Publication date: Are the data current and up to date?

Methodology:
⏹ Sample size: Were the data collected from a large enough sample?
⏹ Sampling method: Was the sampling method unbiased and representative?
⏹ Data collection: Were the data collection methods clearly described and appropriate?

Objectivity:
⏹ Bias: Are there any apparent biases in the data or its presentation?
⏹ Conflicts of interest: Are there any potential conflicts of interest that could influence the data?

Accuracy:
⏹ Consistency: Are the data consistent with other reputable sources?
⏹ Error rate: Ar...

Free Dataset

An Overview of Helpful Free Data Sources

Accessing data is simpler than ever, and there is a wide range of helpful data sources at your disposal. Here's a list of free data sources to help you gather information and insights more effectively, along with links to those resources.

Google Public Dataset Search
Like Google Scholar for papers, Google Dataset Search provides access to millions of datasets hosted on thousands of public websites, such as Kaggle and OGD Platform India.
Link: www.datasetsearch.research.google.com

United States Census Bureau
The United States Census Bureau provides access to essential, high-quality data about the United States’ population, economy, and geography.
Link: www.census.gov

Pew Research Center
The Pew Research Center provides insights and analysis on a wide range of social, political, and technological issues through surveys and research.
Link: https://www.pewresearch.org/tools-and-resources/

Eurostat
As the European Union's ...