Choosing the right database for your project can be a challenging task, especially when dealing with a large dataset. Recently, Axcito was approached by a client who wanted to perform a Proof of Concept (POC) on which database to use for their very large dataset, which they intended to open via public APIs. In this article, we will discuss how Axcito approached the decision-making process for selecting the appropriate database for the client's needs.
Decision Drivers
Axcito had several decision drivers for selecting the appropriate database for the client, which included:
- Delivering quickly is very important
- Data model might change over time
- Data will be used for analytics and/or machine learning
- Should be easy to manage (Managed service is preferred)
Considered Alternatives
Axcito evaluated several database alternatives, which included:
- DynamoDB
- SQL (MySQL or PostgreSQL)
- MongoDB
Pros and Cons of the Alternatives
DynamoDB:
The client had already used DynamoDB in the project, and they loved it for its speedy retrieval when required, which was critical for their mobile app performance. However, DynamoDB is not flexible when new data access patterns are created, which is the case for this dataset. Additionally, it might be a total new tech to learn for developers, and there is minimal tooling to play with it locally. DynamoDB could get expensive quickly if more reads occur, and once you have data in Dynamo, it might be challenging to move the data to a different type of storage.
SQL (MySQL or PostgreSQL):
SQL is almost familiar to all developers, and there are AWS managed RDBMS services like Aurora. It is easy to retrieve data and create new indexes, and automatic minor software patching ensures that the PostgreSQL software powering your deployment stays up-to-date with the latest minor patches. However, whenever data model changes, there will be database changes and backend code changes. Additionally, it is not a global database, since you only have one master database, resulting in increased latency when being further away from the region where the database is deployed.
MongoDB:
MongoDB is a document database that keeps semi-structured data along with its description in the document format, and it has a dynamic schema for new/unstructured data. It is easy to retrieve data and create new indexes, and data can be flexibly stored without having a pre-defined structure. However, it is not a good fit for complex queries as there is no standard interface in NoSQL for handling queries, and it is not an AWS managed service, which would require managing the platform for it.
Decision Outcome
After the POC, Axcito gave up on DynamoDB since data access patterns were not consistent, and it required more proficiency. At the time, there was no managed AWS service for MongoDB. Axcito decided to design a new database in AWS managed PostgreSQL with a database backup plan and migrate the current data to the new database, mostly manually. The team agreed to maintain quality documentation from design to tool around the database and placed the RDS database in a private subnet to ensure security. Access to the database with a GUI tool became complex, and access to the database CLI was through Session manager. The team also considered moving to Aurora for better performance globally.
Conclusion
Choosing the right database for your project is critical to its success. Axcito's decision to use PostgreSQL was based on the client's needs and the decision drivers for the project. The team considered several alternatives before settling on PostgreSQL, which was the best fit for the project. We hope this article helps you in your decision-making process for your next project. If you need help, feel free to reach out to Axcito.
Comments
Post a Comment