How to be a Data Scientist
Data is becoming more and more important in our society. Information lets companies learn what interests their customers most and offer them the best possible service.
This is why companies never stop storing their clients' data and purchasing information from other companies about people who might be interested in their products.
Has it ever happened to you that an insurance company calls you and you have never given them your phone number?
Do you see ads on Instagram or Facebook about topics that you have talked about with family or friends?
Behind these situations there is a team of data scientists.
These data experts are capable of collecting all types of information and turning it into tangible conclusions. That is, they interpret the data and build models from it.
Some of the objectives of data scientists are:
- Sales prediction in shops.
- Creation of banking fraud detection models.
- Detection of patterns in genetic material that cause certain diseases.
- Generation of artificial intelligence models for autonomous cars.
These professionals need to have different skills to be able to perform their jobs effectively. Below we show you what you have to master to become a good data scientist.
Skills to work as a Data Scientist
Data science is a multidisciplinary area, so a good data scientist must be able to master different fields of science and technology:
Mathematics
It is really important to have advanced proficiency in mathematics. Fields such as statistics and linear algebra are essential for modeling data statistically and building machine learning models.
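As a small illustration of how statistics feeds into modeling, here is a minimal sketch of fitting a straight line by ordinary least squares using only the Python standard library. The function name `fit_line` and the spend/sales figures are made up for this example; real projects would normally rely on a dedicated library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Unnormalized covariance of x and y and variance of x
    # (the 1/n factor cancels out in the ratio).
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data generated exactly from y = 2x + 1.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)
```

Because the sample data lies exactly on a line, the fit recovers a slope of 2 and an intercept of 1. With noisy data the same formulas return the line that minimizes the squared error.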
Another very important field of mathematics is graph theory. Graphs are a very useful way to model data and the connections between data points, which is essential for mastering databases, especially graph-oriented databases.
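To make the idea of modeling connections concrete, here is a minimal sketch that stores a small network as an adjacency list and finds the shortest chain of connections with breadth-first search. The `friends` network and the function name `shortest_path` are invented for this example and do not come from any graph database.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: shortest list of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical "who knows whom" network stored as an adjacency list.
friends = {
    "Ana": ["Luis", "Marta"],
    "Luis": ["Ana", "Pedro"],
    "Marta": ["Ana", "Pedro"],
    "Pedro": ["Luis", "Marta", "Sara"],
    "Sara": ["Pedro"],
}
path = shortest_path(friends, "Ana", "Sara")
print(path)
```

Graph-oriented databases answer this kind of question ("how is Ana connected to Sara?") with their own query languages, but the underlying model is the same set of nodes and edges.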
Machine learning and deep learning
Machine learning is the ability to create mathematical models that are capable of inferring new data from existing data. Deep learning is a subfield of machine learning that models information using what are known as deep neural networks.
It is important for data professionals to know how to build reliable models and diagnose their effectiveness. They must also know what type of architecture to apply to each problem.
Some of the techniques included in machine learning are neural networks (recurrent, convolutional, feedforward, etc.), k-nearest neighbors, clustering techniques such as k-means, and dimensionality reduction, among others.
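As an example of one of these techniques, here is a minimal sketch of a k-nearest neighbors classifier written with the standard library only. The function name `knn_predict`, the sample points and the labels are assumptions made for illustration; in practice you would usually use a machine learning library instead.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training examples by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature data set with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (7.5, 8.5)))
```

The model "infers new data from existing data" in the most direct way possible: a new point receives the label that is most common among the known points closest to it.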
Big Data
This skill is not mandatory for a data scientist, since there are dedicated big data engineers. Still, understanding distributed technologies and knowing how large amounts of information are managed is a real advantage, and it may get you chosen for a position over someone who knows nothing about big data.
Some of the most important frameworks to master are Spark, Hive, Cassandra and Kafka. They are all projects of the Apache Software Foundation.
It is also important to know how to run applications inside a container or as microservices using Docker. As an extra point, it is also advisable to have a basic understanding of Kubernetes.
Databases
This point is crucial if you want to apply for a job as a data scientist. A data science expert has to be able to work with all types of databases with their eyes closed.
The first point is to be an expert in relational databases such as MySQL, MariaDB or PostgreSQL and to know how to write queries in the declarative language SQL (Structured Query Language).
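To give a feel for what a declarative SQL query looks like, here is a minimal sketch that uses SQLite through Python's built-in `sqlite3` module. SQLite is used here only because it needs no server; it stands in for the databases named above. The `sales` table and its rows are invented for this example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (shop, amount) VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 50.0),
        ("south", 25.0),
        ("south", 25.0),
    ],
)

# Declarative query: we state WHAT we want (total per shop, largest first),
# not HOW the database should compute it.
rows = conn.execute(
    "SELECT shop, SUM(amount) AS total FROM sales GROUP BY shop ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)
```

The same `SELECT ... GROUP BY ... ORDER BY` statement would work with little or no change on MySQL, MariaDB or PostgreSQL, although connection details and some data types differ.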
It is advisable to know how to manage SQL databases in distributed architectures using techniques such as sharding.
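The basic idea behind sharding can be sketched in a few lines: each record is assigned to one of several database servers based on a hash of its key. This is only an illustration of hash-based routing under simplified assumptions. The function name `shard_for` and the key format are hypothetical, and real systems also have to deal with rebalancing, replication and queries that span shards.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard index using a stable hash."""
    # hashlib gives the same result on every run and every machine,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical customer keys routed across 4 shards.
for customer_id in ["customer-1", "customer-2", "customer-3"]:
    print(customer_id, "-> shard", shard_for(customer_id, 4))
```

Since the same key always maps to the same shard, the application knows which server to query for a given customer without consulting a central index.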
The other type of database that is crucial for a data scientist is the non-relational or NoSQL database.
The most famous is MongoDB, but there are others built to work in a distributed way, such as Cassandra or Redis.
If you want to be one of the best, mastering graph-oriented databases such as Neo4j or Amazon Neptune will help you get there.
So now you know, if you are thinking about becoming a data scientist, become an expert in all types of databases.
Programming languages
Knowing how to program, just like mastering databases, is essential. The most recommended and most used language in this field is Python.
This language is easy to learn and lets you do all types of data analysis using libraries such as pandas, as well as train all types of artificial intelligence models using PyTorch or TensorFlow.
The statistical language R also lets you do the operations we just mentioned. However, it is used less and less, and companies require their employees to have a high level of Python.
If you are interested in the world of Big Data, it is advisable to learn Scala and Java, since most of the frameworks in the Hadoop ecosystem are written in these languages.
Good communication skills
A very important part of a data scientist's job is conveying the conclusions drawn from the analysis to other departments, such as marketing or sales.
Conveying conclusions to people who are completely outside the world of data analysis is difficult. So you will have to express yourself in a simple way and use figures and graphs to make the explanation easier and more enjoyable.
This indirectly implies that you have to master chart-generation tools. You can create charts with Excel, Matplotlib (Python) or R.
How to learn Data Science
Once we know the skills we have to acquire to become a great data scientist, we have to answer the next question.
How do we learn all the above skills?
Nowadays you don't need a degree in computer engineering to work in this field. I myself have a degree in biochemistry and have worked for several years in companies as a data scientist.
That said, it is very important to get training before you apply for this kind of job.
It is advisable to take courses on platforms such as Udemy, Coursera or Domestika that cover the skills mentioned above. You can combine these with books and other resources such as PDFs.
Another more expensive option is to complete an official master's degree in data science at a university in your country.
Once you have mastered a little of everything, you can apply for a junior position or an internship. You will earn less, but there you will be able to complete your training and later apply for higher positions, where salaries are usually very high, well above the country average.
Tips to become a good data scientist
When you are working on a project as a data scientist, it is important to understand the product or business in depth before you start modeling.
Once you have a good understanding and start modeling, it is advisable to try different methods and see which one works best for the problem you want to solve.
The most important point is to make a good diagnosis of your final model and check whether it suffers from overfitting or underfitting, since either one indicates that the model is not optimal.
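A common way to make this diagnosis is to compare the model's error on the data it was trained on with its error on data it has never seen. The sketch below shows that comparison as a simple rule of thumb. The function name `diagnose_fit` and the two thresholds are assumptions chosen for illustration, and sensible values depend entirely on the problem.

```python
def diagnose_fit(train_error, val_error, acceptable_error=0.10, max_gap=0.05):
    """Label a model from its training and validation error rates (0.0 to 1.0).

    The thresholds are illustrative assumptions, not universal constants.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if val_error - train_error > max_gap:
        # The model fits the training data well but does not generalize.
        return "overfitting"
    return "good fit"

# Hypothetical error rates for three candidate models.
print(diagnose_fit(0.30, 0.32))
print(diagnose_fit(0.01, 0.25))
print(diagnose_fit(0.05, 0.07))
```

Running the same check on every method you try gives you a consistent basis for choosing between them: prefer the candidate with low validation error and a small gap between the two numbers.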