How to Choose The Right Database for Your Application

Introduction

Choosing which database to use is one of the most important decisions you can make when starting working on a new app or website.

If you realize down the line that you’ve made the wrong choice, migrating to another database is very costly and sometimes more complex to do with zero downtime.

Taking time to make an informed choice of database technology upfront can be a valuable early decision for your application.

(A bonus reason for why this is important is because understanding the different DBs and their properties & which one to choose over the other is quite commonly asked in job interviews)

Each database technology has advantages and disadvantages. Cloud providers like Amazon offer various database and storage options making it harder to figure out which one is the right one.

Relational databases have been a primary data storage mechanism for a long time. Non-relational databases have existed since the 1960s but recently gained traction with popular options such as MongoDB.

In this article, we will go through the factors you should consider while choosing any database.

The language determines the database

First things first. The language or the technology you are going to use never determines the database ❌

We’ve grown accustomed to technology stacks, such as MERN (MongoDB, Express, React, Node.js) & LAMP (Linux, Apache, MySQL, PHP).

There are certain reasons why these stacks evolved. But don’t assume these are the rules. You can use a MongoDB database in your Java project. You can use MySQL in your Node.js. You might not have found as many tutorials, but it's your requirements that should determine the database type & not the language.

Understanding the Tradeoff

Let's now understand the basic tradeoffs we need to deal with while making the DB decision.

The reason that we have many database options available today is due to the CAP Theorem. CAP stands for Consistency, Availability, and Partition tolerance.

Consistency means that any read request will return the most recent write.

Availability means all (non-failing) nodes are available for queries & must respond in a reasonable amount of time.

Partition Tolerance means that the system will continue to work despite node failures.

Only two of these 3 requirements can be fulfilled at any given time. If you're building a distributed app then partition tolerance is a must. So the choice remains whether we want our database to be highly available or highly consistent.

Structure of Data

If your data needs to be structured or there is a relation between different types of data that you're going to keep and at the same time you want strict data integrity then a relational database should be a better choice for you.

For example, let's say we are making a student management application where we need to store the information of certain courses and every course needs to be taken by one or more students. In this case, we can create 2 tables (Students and Courses) where the value in the Student ID column inside the Courses table points to rows in the Students table by the value of their ID column.

Apart from this if you want your DB to be ACID compliant for example in cases when you're handling payments and transactions in your application then in that case too you should prefer the SQL based databases because the counterpart i.e the NoSQL database offers weak consistency.

ACID compliance protects the integrity of your data by defining exactly what a transaction is and how it interacts with your database. It avoids database tables from becoming out-of-sync, which is super important for financial transactions. ACID compliance guarantees the validity of transactions even in failures, disastrous events, and more.

On the other hand, if your data requirements aren’t clear or if your data is unstructured, NoSQL may be your best choice.

The data you store in a NoSQL database does not need a predefined schema. This provides much more flexibility and less upfront planning when managing your database.

A NoSQL database is a much better fit to store data like article content, social media posts, sensor data, and other types of unstructured data that won’t fit neatly into a table. NoSQL databases were built with flexibility and scalability in mind and follow the BASE consistency model, which means:

Basic Availability

This means that while the database guarantees the availability of the data, the database may fail to obtain the requested data, or the data may be in a changing or inconsistent state.

Soft state

The state of the database can be changing over time.

Eventual consistency

The database will eventually become consistent, and data will propagate everywhere at some point in the future.

The structure of your data is the most important factor in deciding whether to use a SQL or NoSQL database, so put a lot of thought into this before making a decision.

Query Patterns

The next factor to consider is how you’ll query your data. This is one of the main ways to find the best database for your use case

Do you need retrieval by a single key, or by various other parameters? Do you also need a fuzzy search on the data?

Use Non-Relational databases if you are going to fetch data by key, then all you need is a key-value store (e.g. DynamoDB).
On the other hand, if you will require to query many different fields you can choose both Relational DB (e.g.MySQL) or Document DB (e.g.MongoDB).
In case you are looking for fuzzy search query capabilities (free text search), then search engines like Elasticsearch(Which also comes under NoSQL DBs) are the best fit.
But in case your data is nicely structured and organized, it is very efficient to query your data with a SQL database.

SQL is a popular query language that has been around for over 50 years now, so it’s highly mature and well-known. It efficiently executes queries and retrieves using JOINs and edits data quickly. It’s very lightweight and declarative.

Consistency

Is strong consistency required (read after write) or eventual consistency is OK?

In case you need to read your data right after your write operation (i.e. strong consistency) then a Relational database (e.g. MySQL) is usually more suited than a Non-Relational Database (e.g.MongoDB).

Performance & Scaling

All databases' performance degrades as the amount of read/write traffic increases. This is the time when optimizations such as indexing your data and scaling your DB come into the picture.

Performance depends on various factors but the overall performance depends to a very large degree on choosing the right implementation for your use case.

SQL and NoSQL databases scale differently, so you’ll have to think about how your data set will grow in the future.

SQL databases scale vertically, meaning you’ll need to increase the capacity of a single server (increasing CPU, RAM, or SSD) to scale your database. SQL databases were designed to maintain the integrity of the data so they are best to run on a single server, hence they’re not easy to scale.

NoSQL databases are easy to scale horizontally, which means you can add more servers as your data grows. This is an advantage that NoSQL has over SQL.

This uncomplicated horizontal scaling nature of non-relational databases makes them superior to relational databases as far as availability is concerned.

The ability of NoSQL databases to scale horizontally has to do with the lack of structure of the data. Because NoSQL requires much less structure than SQL, each stored object is pretty much self-contained. Thus objects can be easily stored on multiple servers without having to be linked. This is not the case for SQL, where each table row and column needs to be related.

This is also the issue regarding performance and where SQL falls short is scaling. As a database grows in size and numbers, RDBMS as we have seen requires solutions in vertical scaling. This however comes at a significant cost. Although many commercial RDBMS products offer horizontal scaling, these can also be very expensive and even complex to implement.

If you predict you will face such an issue, then NoSQL is to be considered, as many of them were designed specifically to tackle these scale and performance issues.

Conclusion

In this article, we went through the multiple factors you should consider while selecting the type of database. In short, SQL databases provide great benefits for transactional data whose structure doesn’t change frequently (or at all) and where data integrity is paramount. It’s also best for fast analytical queries. NoSQL databases provide more flexibility and scalability, which makes them useful for rapid or iteration development.

Here's a small summary of what all we have discussed above:

Reasons to use an SQL database

When you need ACID support
Your application requires high transactions
Data integrity is essential
You don’t anticipate a lot of changes in the schema

Reasons to use a NoSQL database

You want to store large amounts of data with no structure
You keep getting Unrelated, indeterminate, or evolving data requirements
Speed and scalability are critical
When data integrity is not your top goal and you are concerned about the availability (& eventual consistency will be good enough for your use case)

I hope this helps you understand what you need to think about when selecting your database. What additional questions do you think about while selecting a database? Let me know your thoughts in the comments.

How to Choose The Right Database for Your Application

Comments (2)

Distributed Systems

Message Queues and Asynchronous Processing

More from this blog

How to Size Java Database Connection Pools in Kubernetes

What is Jib? A Complete Guide to Java Containerization Without Dockerfiles

Kafka Consumer Container Restarts in Kubernetes: A Production Case Study

How we solved cache invalidation in Kubernetes with a headless service

Going Vernacular: Engineering Our Way to Process Multilingual Names

Introduction

The language determines the database

Understanding the Tradeoff

Structure of Data

Basic Availability

Soft state

Eventual consistency

Query Patterns

Consistency

Performance & Scaling

Conclusion

Reasons to use an SQL database

Reasons to use a NoSQL database

Command Palette

Comments (2)

Distributed Systems

Message Queues and Asynchronous Processing

More from this blog

Introduction

The language determines the database

Understanding the Tradeoff

Structure of Data

Basic Availability

Soft state

Eventual consistency

Query Patterns

Consistency

Performance & Scaling

Conclusion

Reasons to use an SQL database

Reasons to use a NoSQL database