What is Docker
Docker and virtualization
When we develop an application, it needs certain basic requirements in order to run: an operating system, libraries, environment variables, packages, and so on.
When any of these requirements is missing, the application cannot work as it should. To avoid such conflicts, we must ensure that wherever the application is executed, it always has everything it needs to run.
There are several ways to achieve an isolated environment. The most widely used so far has been virtualization, which consists of creating an isolated environment within a host machine. We can picture virtualization as a box inside an operating system that has access to the machine's resources, such as CPU, disk or RAM, and allows applications to run inside it.
There are three main ways to carry out virtualization:
Virtual machines
Virtual machines are a simulation of a real computer. When one starts up, it behaves as if it were a brand-new device: it has its own operating system, its own memory and disk resources, its own users, and so on.
A typical example: on a computer running macOS, we create a virtual machine to install Windows and run programs not available for Mac. Several programs allow you to create virtual machines. Some are:
- VirtualBox
- Parallels
- VMware Fusion
Virtual machines are the type of virtualization used on cloud servers to create VPSs (Virtual Private Servers), provisioning servers with different resources and needs.
However, this type of virtualization has a problem: running a full operating system on top of another requires a lot of resources. This limitation gave rise to other approaches to isolation and virtualization.
Virtual environments
Virtual environments are a type of virtualization widely used to run programs written in Python. Each virtual environment uses a specific version of Python and its own set of libraries. This way we know, for each project, exactly which packages and libraries are needed to run it.
Typically, all the necessary Python packages and libraries are listed in a requirements.txt file, created with the command pip freeze > requirements.txt.
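As a minimal sketch, this is the typical workflow with Python's built-in venv module (the installed package is just an example):

```bash
# Create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate

# Install a dependency and freeze the environment for reproducibility
pip install requests
pip freeze > requirements.txt

# On another machine, recreate the exact same environment
pip install -r requirements.txt
```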
Some examples of tools that allow you to create virtual environments are:
- conda
- Virtualenv
- Pipenv
Containers
Finally we come to the virtualization system that interests us in this article: containers. Containers are similar to virtual machines, with the difference that they are much lighter and do not require intensive resources. That is because they do not boot an operating system of their own; instead, they share the kernel of the host machine.
The best-known example is Docker, although there are other tools for creating containers, such as Podman or LXC. Under the hood, Docker uses a container runtime called containerd to create, manage and delete containers.
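A quick way to see this kernel sharing in practice, assuming Docker is installed on a Linux host: the kernel version reported inside a container matches the host's.

```bash
# Kernel version on the host
uname -r

# Kernel version inside an Alpine container: same value, because the
# container shares the host kernel instead of booting its own OS
docker run --rm alpine uname -r
```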
Containerization in Docker
The best way to understand how Docker works is to imagine a container that holds everything necessary to run an application: files, environment variables, libraries, compilers, interpreters...
It is important to mention that containers rely on features of the Linux kernel, so if you use Windows, Docker will run them inside a lightweight Linux virtual machine (for example via WSL 2 or Hyper-V).
In short, Docker lets us generate containers where we isolate our application and its libraries so that they can run on any device, since the container houses all the dependencies necessary for execution.
Images in Docker
Images are the blueprints from which containers are created. We can view images as classes in an object-oriented programming (OOP) language; following this simile, containers are the instances (objects) created from those images (classes).
Typically, the instructions for building an image live in a file called Dockerfile. Among these instructions we find the base Linux distribution and the libraries or software necessary for execution. We can also specify which files have to be copied into the containers that we will generate from the image.
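As an illustrative sketch (the file names and base image are just examples), a Dockerfile for a small Python application could be written and built like this:

```bash
# Write a minimal Dockerfile for a hypothetical Python app
cat > Dockerfile <<'EOF'
# Base image: a slim Linux distribution with Python preinstalled
FROM python:3.12-slim
WORKDIR /app
# Install the dependencies first, then copy the application code
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
EOF

# Build an image from the Dockerfile and tag it
docker build -t my-app .
```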
How Docker works
Normally, when we work with containers, and specifically with Docker, we use what is known as the Docker client. The client talks through a REST API to the Docker daemon, the component in charge of building, managing and running both images and containers.
To handle images and containers comfortably, it is important to know some of the most basic commands (each one is demonstrated in the sketch after this list):
- docker version
- docker pull
- docker push
- docker run
- docker ps
- docker exec
- docker stop
- docker kill
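As a sketch of how these commands fit together (using the official nginx image; the container and repository names are illustrative):

```bash
docker version                               # show client and daemon versions
docker pull nginx                            # download an image from the registry
docker run -d --name web -p 8080:80 nginx   # create and start a container
docker ps                                    # list running containers
docker exec -it web bash                     # open a shell inside the running container
docker stop web                              # stop it gracefully (SIGTERM)
docker kill web                              # forceful alternative to stop (SIGKILL)
docker push myuser/my-app                    # upload your own image (hypothetical repository)
```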
DockerHub
DockerHub is the official Docker image registry. It hosts all kinds of images for running containers: applications, databases, other software... In addition, you can create your own repository and upload your own images using the docker push command, much as we would upload our code to GitHub or GitLab.
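The usual publishing flow looks like this (the username and tag are hypothetical):

```bash
docker login                           # authenticate against DockerHub
docker tag my-app myuser/my-app:1.0    # tag the local image with your repository name
docker push myuser/my-app:1.0          # upload it
docker pull myuser/my-app:1.0          # anyone can now download it
```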
Communication between containers
When we build, for example, a web application, we may have different services running in different Docker containers. Even though they are isolated systems, they need to connect to each other to exchange data. This is where networking in Docker comes into play.
In Docker there are three default network types (demonstrated in the sketch after this list):
- none: the container has no network assigned at all.
- bridge: the network configured by default for all Docker containers. It allows containers to communicate and send data to each other through their IPs (each container is assigned an IP from the moment it is created).
- host: this type of network does not isolate the container from the host machine. Therefore, if the container has a service listening on port 80, the host machine will also expose that service on that port.
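A small sketch of the three modes (assuming a Linux host; on Docker Desktop the host network behaves differently):

```bash
docker network ls                      # the default networks: bridge, host, none

# bridge (default): the container gets its own IP
docker run -d --name web nginx
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' web

# host: the container shares the host's network stack
docker run --rm --network host alpine ip addr

# none: only a loopback interface, no external connectivity
docker run --rm --network none alpine ip addr
```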
Data persistence
The data generated inside a Docker container is lost once the container is removed. To persist data regardless of whether we stop, restart or delete containers, Docker provides what are known as volumes.
Volumes are folders where the data that needs to be kept is stored. These folders live on the host machine, more specifically under the directory /var/lib/docker/volumes.
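A minimal sketch (the volume and container names are illustrative):

```bash
docker volume create app-data          # create a named volume

# Mount the volume at the path where PostgreSQL stores its data
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  -v app-data:/var/lib/postgresql/data \
  postgres:16

docker rm -f db                        # remove the container...
docker volume ls                       # ...the volume and its data remain
```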
Docker Compose
Docker Compose is a tool that lets us launch multiple containers at the same time. When we create an application it can have several components: relational databases, non-relational databases, queue management services...
All these components have to be launched in different containers, and starting them one by one would be very tedious. That is why we do it through a configuration file called docker-compose.yml. This file specifies the images to use, the environment variables, the network settings and the volumes where the data must be saved, among other configurations.
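As a sketch (the service names, image and credentials are hypothetical), a two-service stack could look like this:

```bash
cat > docker-compose.yml <<'EOF'
services:
  web:
    image: my-app                # hypothetical application image
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://postgres:secret@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=secret
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
EOF

docker compose up -d               # launch every service at once
docker compose down                # stop and remove them all
```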
In future posts we will go into more detail on the technical side of Docker, looking at exactly what the files and commands needed to launch containers should look like.
In addition, we will also see how tools such as Kubernetes or Docker Swarm let us launch several containers that interact with each other in a distributed manner, that is, on different nodes, which gives applications great scalability.