Choosing which database to use is one of the most important decisions you can make when starting working on a new app or website.
If you realize down the line that you’ve made the wrong choice, migrating to another database is very costly and sometimes more complex to do with zero downtime.
Taking time to make an informed choice of database technology upfront can be a valuable early decision for your application.
(A bonus reason for why this is important is because understanding the different DBs and their properties & which one to choose over the other is quite commonly asked in job interviews)
Each database technology has advantages and disadvantages. Cloud providers like Amazon offer various database and storage options making it harder to figure out which one is the right one.
Relational databases have been a primary data storage mechanism for a long time. Non-relational databases have existed since the 1960s but recently gained traction with popular options such as MongoDB.
In this article, we will go through the factors you should consider while choosing any database.
The language determines the database
First things first. The language or the technology you are going to use never determines the database ❌
We’ve grown accustomed to technology stacks, such as MERN (MongoDB, Express, React, Node.js) & LAMP (Linux, Apache, MySQL, PHP).
There are certain reasons why these stacks evolved. But don’t assume these are the rules. You can use a MongoDB database in your Java project. You can use MySQL in your Node.js. You might not have found as many tutorials, but it's your requirements that should determine the database type & not the language.
Understanding the Tradeoff
Let's now understand the basic tradeoffs we need to deal with while making the DB decision.
The reason that we have many database options available today is due to the CAP Theorem. CAP stands for Consistency, Availability, and Partition tolerance.
Consistency means that any read request will return the most recent write.
Availability means all (non-failing) nodes are available for queries & must respond in a reasonable amount of time.
Partition Tolerance means that the system will continue to work despite node failures.
Only two of these 3 requirements can be fulfilled at any given time. If you're building a distributed app then partition tolerance is a must. So the choice remains whether we want our database to be highly available or highly consistent.
Structure of Data
If your data needs to be structured or there is a relation between different types of data that you're going to keep and at the same time you want strict data integrity then a relational database should be a better choice for you.
For example, let's say we are making a student management application where we need to store the information of certain courses and every course needs to be taken by one or more students. In this case, we can create 2 tables (
Courses) where the value in the
Student ID column inside the
Courses table points to rows in the
Students table by the value of their
Apart from this if you want your DB to be ACID compliant for example in cases when you're handling payments and transactions in your application then in that case too you should prefer the SQL based databases because the counterpart i.e the NoSQL database offers weak consistency.
ACID compliance protects the integrity of your data by defining exactly what a transaction is and how it interacts with your database. It avoids database tables from becoming out-of-sync, which is super important for financial transactions. ACID compliance guarantees the validity of transactions even in failures, disastrous events, and more.
On the other hand, if your data requirements aren’t clear or if your data is unstructured, NoSQL may be your best choice.
The data you store in a NoSQL database does not need a predefined schema. This provides much more flexibility and less upfront planning when managing your database.
A NoSQL database is a much better fit to store data like article content, social media posts, sensor data, and other types of unstructured data that won’t fit neatly into a table. NoSQL databases were built with flexibility and scalability in mind and follow the BASE consistency model, which means:
This means that while the database guarantees the availability of the data, the database may fail to obtain the requested data, or the data may be in a changing or inconsistent state.
The state of the database can be changing over time.
The database will eventually become consistent, and data will propagate everywhere at some point in the future.
The structure of your data is the most important factor in deciding whether to use a SQL or NoSQL database, so put a lot of thought into this before making a decision.
The next factor to consider is how you’ll query your data. This is one of the main ways to find the best database for your use case
Do you need retrieval by a single key, or by various other parameters? Do you also need a fuzzy search on the data?
Use Non-Relational databases if you are going to fetch data by key, then all you need is a key-value store (e.g. DynamoDB).
On the other hand, if you will require to query many different fields you can choose both Relational DB (e.g.MySQL) or Document DB (e.g.MongoDB).
In case you are looking for fuzzy search query capabilities (free text search), then search engines like Elasticsearch(Which also comes under NoSQL DBs) are the best fit.
But in case your data is nicely structured and organized, it is very efficient to query your data with a SQL database.
SQL is a popular query language that has been around for over 50 years now, so it’s highly mature and well-known. It efficiently executes queries and retrieves using JOINs and edits data quickly. It’s very lightweight and declarative.
Is strong consistency required (read after write) or eventual consistency is OK?
In case you need to read your data right after your write operation (i.e. strong consistency) then a Relational database (e.g. MySQL) is usually more suited than a Non-Relational Database (e.g.MongoDB).
Performance & Scaling
All databases' performance degrades as the amount of read/write traffic increases. This is the time when optimizations such as indexing your data and scaling your DB come into the picture.
Performance depends on various factors but the overall performance depends to a very large degree on choosing the right implementation for your use case.
SQL and NoSQL databases scale differently, so you’ll have to think about how your data set will grow in the future.
SQL databases scale vertically, meaning you’ll need to increase the capacity of a single server (increasing CPU, RAM, or SSD) to scale your database. SQL databases were designed to maintain the integrity of the data so they are best to run on a single server, hence they’re not easy to scale.
NoSQL databases are easy to scale horizontally, which means you can add more servers as your data grows. This is an advantage that NoSQL has over SQL.
This uncomplicated horizontal scaling nature of non-relational databases makes them superior to relational databases as far as availability is concerned.
The ability of NoSQL databases to scale horizontally has to do with the lack of structure of the data. Because NoSQL requires much less structure than SQL, each stored object is pretty much self-contained. Thus objects can be easily stored on multiple servers without having to be linked. This is not the case for SQL, where each table row and column needs to be related.
This is also the issue regarding performance and where SQL falls short is scaling. As a database grows in size and numbers, RDBMS as we have seen requires solutions in vertical scaling. This however comes at a significant cost. Although many commercial RDBMS products offer horizontal scaling, these can also be very expensive and even complex to implement.
If you predict you will face such an issue, then NoSQL is to be considered, as many of them were designed specifically to tackle these scale and performance issues.
In this article, we went through the multiple factors you should consider while selecting the type of database. In short, SQL databases provide great benefits for transactional data whose structure doesn’t change frequently (or at all) and where data integrity is paramount. It’s also best for fast analytical queries. NoSQL databases provide more flexibility and scalability, which makes them useful for rapid or iteration development.
Here's a small summary of what all we have discussed above:
Reasons to use an SQL database
- When you need ACID support
- Your application requires high transactions
- Data integrity is essential
- You don’t anticipate a lot of changes in the schema
Reasons to use a NoSQL database
- You want to store large amounts of data with no structure
- You keep getting Unrelated, indeterminate, or evolving data requirements
- Speed and scalability are critical
- When data integrity is not your top goal and you are concerned about the availability (& eventual consistency will be good enough for your use case)
I hope this helps you understand what you need to think about when selecting your database. What additional questions do you think about while selecting a database? Let me know your thoughts in the comments.