Exploratory data analysis

Data exploration is normally the first step before moving on to more advanced statistical techniques such as inferential statistics or machine learning.

Exploratory data analysis (EDA) is a branch of statistics that uses graphical and numerical tools to describe the main characteristics of the data.

Graphs and summary metrics are used to condense the data of interest and draw initial conclusions about the relationships between variables and possible correlations.
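As a minimal sketch of the kind of metrics we mean, the following uses only Python's standard `statistics` module. The `orders` list is made-up data for illustration, not a real dataset.

```python
import statistics

# Hypothetical sample: daily orders over two weeks (made-up values).
orders = [12, 15, 11, 14, 13, 40, 12, 16, 15, 13, 14, 12, 11, 15]

# A few basic metrics that summarize the variable.
summary = {
    "count": len(orders),
    "min": min(orders),
    "max": max(orders),
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

In this sample the mean is noticeably higher than the median, which is a first hint that an unusually large value is pulling the average up.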

Graphic techniques for data exploration

The first step when we start analyzing a new set of data is to graph the different variables to begin to understand what information we can extract from them.

Some basic techniques for exploring and analyzing data are the following:

Box plots

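Box plots are a type of graph that shows the distribution of the data in the form of a box.

They represent the quartiles of the distribution: the box spans the first to the third quartile, a line inside it marks the median, whiskers extend toward the extremes, and outliers are drawn as individual points. A standard box plot does not show the standard deviation, and the mean appears only if it is added as an optional marker. This type of graph gives us a first view of the shape of the data and how it is distributed within our dataset.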
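To make the elements of a box plot concrete, here is a rough sketch that computes the numbers such a graph typically draws. It assumes the common 1.5 × IQR rule for whiskers and outliers and the "inclusive" quartile method of Python's `statistics` module; plotting libraries may compute quartiles slightly differently. The function name `box_plot_stats` and the sample values are hypothetical.

```python
import statistics

def box_plot_stats(values):
    """Return the numbers a typical box plot draws (1.5 * IQR rule assumed)."""
    data = sorted(values)
    # Quartiles: q1 and q3 are the box edges, the median is the line inside.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Anything beyond the fences is drawn as an individual outlier point.
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Made-up sample with one unusually large value.
sample = [12, 15, 11, 14, 13, 40, 12, 16, 15, 13, 14, 12, 11, 15]
box = box_plot_stats(sample)
print(box)
```

With this sample, the value 40 falls outside the upper fence, so it would appear as a separate point above the upper whisker.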

Histograms

Histograms are graphs that describe a variable using bars whose area is proportional to the frequency of the values that fall within each interval. When all intervals have the same width, the height of each bar is proportional to that frequency.

There are different types of histogram, for example frequency, relative-frequency and cumulative histograms, each suited to a specific question about the data.

This type of visualization is highly recommended for understanding our variables during the initial phases of data exploration and analysis.
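The counting behind a histogram can be sketched in a few lines. This version assumes equal-width intervals over a range we choose ourselves; the function name `histogram_counts` and the `ages` values are invented for the example.

```python
def histogram_counts(values, bins, low, high):
    """Count how many values fall in each equal-width bin over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore values outside the chosen range
        # A value equal to `high` is placed in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

# Made-up ages, grouped into four bins of ten years each.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 42, 45, 52, 58, 60]
counts = histogram_counts(ages, bins=4, low=20, high=60)

# Text rendering: one '#' per observation in the bin.
for i, c in enumerate(counts):
    start = 20 + i * 10
    print(f"{start}-{start + 10}: {'#' * c}")
```

The choice of bin count and range is ours to make, and it changes how the distribution looks, so it is worth trying a few values.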

Heat maps

Heat maps are a type of graph used in many sectors to represent the magnitude of a variable through color. Typically the color scale runs from blue to red, with blue for the lowest values and red for the highest.

This type of data exploration is used in many fields, such as molecular biology, to detect the expression level of genes, or digital marketing, to find out which parts of a website users interact with the most.
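The blue-to-red idea can be sketched as a simple linear mapping from a value to an RGB color. This is only one possible color scale, assumed here for illustration; real plotting libraries offer many others. The name `value_to_rgb` and the `grid` values are hypothetical.

```python
def value_to_rgb(value, vmin, vmax):
    """Linearly map a value in [vmin, vmax] to a color between blue and red."""
    if vmax == vmin:
        t = 0.5  # all values are equal: use the middle of the scale
    else:
        t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    # t = 0 gives pure blue, t = 1 gives pure red.
    return (round(255 * t), 0, round(255 * (1 - t)))

# Made-up 3x3 grid of measurements.
grid = [
    [1, 4, 9],
    [2, 6, 8],
    [0, 3, 10],
]
flat = [v for row in grid for v in row]
vmin, vmax = min(flat), max(flat)

# One RGB color per cell, ready to be drawn by any plotting tool.
colors = [[value_to_rgb(v, vmin, vmax) for v in row] for row in grid]
for row in colors:
    print(row)
```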

Scatter plots

This type of graph allows you to study the relationship between pairs of variables (x, y) through a diagram formed by a cloud of points. With it we can see whether two variables are related through a direct correlation (one tends to increase as the other increases) or an inverse correlation (one tends to decrease as the other increases).
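What we see in the cloud of points can be backed up with a number. The sketch below computes the Pearson correlation coefficient by hand, which is one common way (not the only one) to quantify a linear relationship. The function name `pearson_r` and the three small lists are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

# Made-up data: hours studied versus exam score and number of errors.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]   # tends to rise with hours (direct)
errors = [9, 7, 6, 4, 2]        # tends to fall with hours (inverse)

r_direct = pearson_r(hours, scores)
r_inverse = pearson_r(hours, errors)
print(f"hours vs scores: {r_direct:.3f}")
print(f"hours vs errors: {r_inverse:.3f}")
```

A value close to 1 suggests a direct relationship and a value close to -1 an inverse one. A value near 0 only rules out a linear relationship, so it is still worth looking at the plot.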

When to use data exploration

The answer is always. This type of initial analysis allows us to begin drawing conclusions from our data and can guide how we define the data analysis strategy.

Furthermore, in this step we can assess the quality of the data set we have received and design a good methodology to clean it, which improves both the data and the results of the analysis.
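A first quality check can be as simple as counting missing values and repeated rows. The following sketch assumes the data is already loaded as a list of dictionaries and treats `None` and empty strings as missing; both choices, the function name `quality_report` and the sample rows are assumptions for illustration.

```python
def quality_report(rows, columns):
    """Count missing values per column and fully duplicated rows."""
    missing = {col: 0 for col in columns}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in columns:
            # Assumption: None and the empty string both mean "missing".
            if row.get(col) in (None, ""):
                missing[col] += 1
        # A row is a duplicate if all its column values were seen before.
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}

# Made-up records with one empty city, one missing age and one repeated row.
rows = [
    {"id": 1, "city": "Madrid", "age": 34},
    {"id": 2, "city": "", "age": 29},
    {"id": 3, "city": "Lima", "age": None},
    {"id": 1, "city": "Madrid", "age": 34},
]
report = quality_report(rows, ["id", "city", "age"])
print(report)
```

A report like this tells us where cleaning is needed before we trust any graph or metric built on the data.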

Tools for data exploration

There are many advanced tools for data analysis. Many of them are designed for business intelligence or machine learning.

However, to do an initial exploratory analysis we do not need any paid tool. We can directly use a spreadsheet such as Excel or Google Sheets.

These programs allow us to open the data and create different graphs to get a first idea of what the information we have received looks like.

My favorite tools are the Python and R programming languages. Both have libraries aimed at data analysis. If we master either of these two languages, we can create graphs quickly and effectively.