
Column-Oriented Databases

On the blog we have seen a wide variety of databases: relational databases, key-value databases and graph-oriented databases, among others.

Today we are going to talk about a type of database that is becoming increasingly popular because of the advantages it offers: column-oriented databases. In this article I explain what columnar databases are and what advantages they offer.

What are column-oriented databases?

Column-oriented (columnar) databases store data by column instead of by row. All the values of a given column are kept together, whereas most traditional relational databases keep all the values of each row together.
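
To make the difference concrete, here is a minimal Python sketch that lays out the same small table both ways. The table, its column names and the helper functions are hypothetical, and real engines store binary pages rather than Python lists; the point is only which values end up next to each other.

```python
# Hypothetical sample table with three columns: id, city, amount.
rows = [
    (1, "Madrid", 120.0),
    (2, "Paris", 80.0),
    (3, "Madrid", 45.5),
]
columns = ["id", "city", "amount"]

def to_row_layout(rows):
    """Row-oriented: all values of one record are stored next to each other."""
    return [value for row in rows for value in row]

def to_column_layout(rows, columns):
    """Column-oriented: all values of one column are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

row_store = to_row_layout(rows)
column_store = to_column_layout(rows, columns)

print(row_store)               # [1, 'Madrid', 120.0, 2, 'Paris', 80.0, 3, 'Madrid', 45.5]
print(column_store["amount"])  # [120.0, 80.0, 45.5]
```

In the row layout, the three `amount` values are separated by the other fields; in the column layout they form one contiguous list.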

This architecture is especially important in business intelligence (BI). It is used in companies' structured data warehouses, where the stored data can later be analyzed to improve decision-making and increase the organization's performance.

Because values from the same column are physically stored together, a columnar database can find the data a query needs with less movement of the disk's read/write head, which speeds up data extraction. More generally, a query that uses only a few columns has to read fewer disk blocks.
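
The effect on disk reads can be estimated with a deliberately simplified model, assuming every value has the same size, a block holds a fixed number of values, and there are no indexes or compression. Under those assumptions, scanning one column of a row store touches every block of the table, while a column store reads only that column's blocks. The function names are hypothetical.

```python
import math

def blocks_read_row_store(n_rows, n_cols, values_per_block):
    # Row store: one column's values are interleaved with all other columns,
    # so scanning that column still touches every block of the table.
    return math.ceil(n_rows * n_cols / values_per_block)

def blocks_read_column_store(n_rows, values_per_block):
    # Column store: the column is contiguous, so only its own blocks are read.
    return math.ceil(n_rows / values_per_block)

n_rows, n_cols, values_per_block = 1_000_000, 20, 1_000
print(blocks_read_row_store(n_rows, n_cols, values_per_block))  # 20000
print(blocks_read_column_store(n_rows, values_per_block))       # 1000
```

In this model the saving equals the number of columns: with 20 columns, the column store reads one twentieth of the blocks.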

[Figure: difference between a row-oriented and a column-oriented database]

This layout makes analytical queries, which typically read a few columns across many rows, much faster than in row-oriented structures.

Advantages of using a column-oriented database

Using this type of database offers certain advantages that must be considered when choosing a database structure for your project.

Information compression

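Because each column holds values of the same type, often with many repeats, columns compress well (for example with run-length or dictionary encoding), which reduces both storage and the amount of data read from disk. A columnar database management system can also perform aggregation operations on columns quickly, such as grouping or calculating the median or maximum value.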
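
A minimal sketch of one common columnar compression technique, run-length encoding (RLE), follows. It assumes the column is sorted so that equal values sit in consecutive runs; the sample data and function names are hypothetical. Aggregates such as a grouped count, a sum or a maximum can be computed directly on the (value, run length) pairs without decompressing.

```python
from itertools import groupby

# Hypothetical sorted "country" column, as a column store might keep it on disk.
country = ["ES", "ES", "ES", "FR", "FR", "IT", "IT", "IT", "IT"]

def rle_encode(values):
    """Store each value once together with the length of its run."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]

runs = rle_encode(country)
print(runs)  # [('ES', 3), ('FR', 2), ('IT', 4)]

# GROUP BY country, COUNT(*): valid as a dict only because the column is
# sorted, so each value appears in exactly one run.
counts_by_country = dict(runs)
print(counts_by_country)  # {'ES': 3, 'FR': 2, 'IT': 4}

# Numeric column: SUM and MAX computed on the runs themselves.
quantity = [5, 5, 5, 5, 2, 2, 9]
q_runs = rle_encode(quantity)
total = sum(value * count for value, count in q_runs)
maximum = max(value for value, _ in q_runs)
print(total, maximum)  # 33 9
```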

Scalability

Many of them scale very well because they can be distributed across multiple machines. Examples include Apache Cassandra and Apache HBase, the latter part of the Hadoop ecosystem.
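
Distribution starts by deciding which node stores each piece of data. The sketch below uses simple hash partitioning on a key; the node names and the `node_for_key` helper are hypothetical, and production systems use more robust schemes (Cassandra, for example, uses consistent hashing over a token ring, and HBase splits tables into row-key ranges called regions).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for_key(key: str, nodes=NODES) -> str:
    # Hash the partition key and map it to one node with a modulo.
    # Simplified on purpose: adding a node here would move most keys,
    # which is exactly what consistent hashing is designed to avoid.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

placement = {}
for customer_id in ["c1", "c2", "c3", "c4", "c5", "c6"]:
    placement.setdefault(node_for_key(customer_id), []).append(customer_id)

print(placement)  # which node holds which keys; depends on the hash values
```

Because the mapping is deterministic, any client can compute where a key lives without asking a central coordinator.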

Loading speed

Data loads quickly because, thanks to the columnar structure, a query only reads from disk the columns it actually needs.
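
The sketch below illustrates the idea with one file per column: a query that needs only `amount` opens only `amount.json`. The JSON files, sample table and helper names are assumptions made for readability; real systems use compact binary columnar formats (Apache Parquet and ORC are well-known examples).

```python
import json
import os
import tempfile

# Hypothetical sample table, already in column layout.
table = {
    "id": [1, 2, 3],
    "city": ["Madrid", "Paris", "Madrid"],
    "amount": [120.0, 80.0, 45.5],
}

def write_columns(table, directory):
    # One file per column, so each column can be read independently.
    for name, values in table.items():
        with open(os.path.join(directory, f"{name}.json"), "w") as f:
            json.dump(values, f)

def read_columns(directory, names):
    # Only the files for the requested columns are opened.
    result = {}
    for name in names:
        with open(os.path.join(directory, f"{name}.json")) as f:
            result[name] = json.load(f)
    return result

with tempfile.TemporaryDirectory() as tmp:
    write_columns(table, tmp)
    loaded = read_columns(tmp, ["amount"])

print(loaded)                 # {'amount': [120.0, 80.0, 45.5]}
print(sum(loaded["amount"]))  # 245.5
```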

Examples of columnar databases

Several databases use the columnar format. Below is a list of some of the most widely used by technology companies.

Apache HBase

Apache HBase is a distributed database that runs on top of Hadoop and organizes data into column families to improve efficiency. Its architecture is designed for random, real-time read and write access to very large tables.
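
HBase's data model can be pictured as a sparse, nested map: row key, then column (`family:qualifier`), then timestamp, then value. The in-memory sketch below is hypothetical and does not use the HBase client API; it only shows how cells are addressed and why a plain read returns the newest version of a cell.

```python
from collections import defaultdict

# Hypothetical in-memory model of HBase-style cells:
# row key -> "family:qualifier" -> {timestamp: value}
table = defaultdict(lambda: defaultdict(dict))

def put(row, column, value, ts):
    # Writing the same cell again adds a new version instead of overwriting.
    table[row][column][ts] = value

def get_latest(row, column):
    # Return the most recent version of a cell, or None if it does not exist.
    versions = table.get(row, {}).get(column, {})
    return versions[max(versions)] if versions else None

put("user#42", "info:name", "Ana", ts=1)
put("user#42", "info:city", "Madrid", ts=1)
put("user#42", "info:city", "Paris", ts=2)  # newer version of the same cell

print(get_latest("user#42", "info:city"))  # Paris
```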

Some of its most important features are: horizontal scalability, data consistency and fault tolerance.

Apache Cassandra

Apache Cassandra is an open-source distributed database designed for high horizontal scalability: data is spread across many nodes, so storage capacity grows by adding machines. Cassandra stands out as a fault-tolerant, column-oriented (wide-column) database with good performance.

Amazon Redshift

Amazon Redshift is a cloud data warehouse service capable of storing petabytes of data in an optimized way. This enables large-scale data analysis and helps data scientists implement business intelligence (BI) strategies using SQL.

BigQuery

BigQuery is a Google tool designed to serve as a fully managed enterprise data warehouse for geospatial analytics, machine learning and business intelligence.