Time to re-evaluate Db Strategy?
TL;DR
Traditionally, enterprises would select IBM DB2, Oracle RDBMS, or MS SQL Server most of the time because of vendor support and track record. Startups would traditionally select MySQL or PostgreSQL because of cost considerations.
Normally an RDBMS would be the preferred choice because of the commodity skillset and strong data consistency.
Today, the need for high performance on a budget, together with geo co-location requirements, is set to challenge the common wisdom about which db we should be using.
Background
- Businesses tend to require multi-region operation, high availability, and fast disaster recovery.
- TCO (total cost of ownership) of running applications in the public cloud appears to be lower than operating your own data centres
- There seems to be a trend of adopting public cloud or hybrid cloud
- Traditional technology selections made in the last 10 years may not support these requirements and tendencies
- Web and application servers, even when perfectly architected and completely stateless, still depend on stateful back-end system(s)
- Traditional RDBMSs (relational database management systems) fall short on active/active, cross-continent or around-the-globe, high-throughput and high-availability capabilities, even though two major vendors list GoldenGate and Transactional Replication as options
Common denial: major Rdbms support high performance/availability
Officially they do, but the devil is in the details:
- there is active/passive and there is active/active - RDBMSs are closer to active/passive than to active/active
- there is active/active built into the db engine running on commodity hardware vs. reliance on proprietary "cheap" hardware
- there is active/active in the same or a nearby data centre and there is East Coast/West Coast distance - NoSQL databases were designed to target long(er)-distance replication more than RDBMSs were
- there is scale-up and there is scale-out - when throughput is in question, scaling up has to give up at some point, as there are only so many CPUs you can afford in one large box
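The East Coast/West Coast point can be made concrete with some rough arithmetic. The sketch below is only an illustration: the 4,000 km distance is an assumed round figure, the fibre speed is the usual approximation of two thirds of the speed of light, and the function names are made up for this example.

```python
# Rough lower bound on synchronous replication latency over distance.
# All figures are illustrative assumptions, not measurements.

SPEED_OF_LIGHT_IN_FIBRE_KM_S = 200_000  # roughly two thirds of c in vacuum


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fibre, ignoring routing and queuing delays."""
    return 2 * distance_km / SPEED_OF_LIGHT_IN_FIBRE_KM_S * 1000


def max_serial_commits_per_s(round_trip_ms: float) -> float:
    """Upper bound on commits per second when each commit must wait for one
    round trip and commits cannot overlap (e.g. contention on the same row)."""
    return 1000 / round_trip_ms


if __name__ == "__main__":
    # Distances are assumed round numbers for illustration only.
    for label, km in [("nearby data centre", 50), ("East Coast to West Coast", 4000)]:
        rtt = min_round_trip_ms(km)
        print(f"{label}: >= {rtt:.1f} ms round trip, "
              f"<= {max_serial_commits_per_s(rtt):.0f} serial commits/s")
```

Under these assumptions a coast-to-coast round trip cannot be faster than about 40 ms, which caps strictly serial, synchronously replicated commits at about 25 per second. Real networks are slower, so this is a best case.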
Pragmatic approach
The CAP theorem describes the balance between three properties (consistency, availability, and partition tolerance), of which only two can be fully guaranteed at once
RDBMSs are designed to focus on consistency at the cost of throughput and partition tolerance
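One common way distributed stores expose this trade-off is through quorum settings. The toy model below is a sketch under assumptions, not any vendor's implementation: `n` replicas hold the data, a write is acknowledged by `w` of them, and a read consults `r` of them. It checks by brute force whether every possible read set overlaps every possible write set.

```python
# Toy quorum model: n replicas, writes acknowledged by w, reads consult r.
# Function name and parameters are hypothetical, chosen for illustration.
from itertools import combinations


def read_always_sees_latest_write(n: int, w: int, r: int) -> bool:
    """Return True if every read set of size r shares at least one replica
    with every write set of size w among n replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3
    for w, r in [(1, 1), (2, 2), (3, 1)]:
        ok = read_always_sees_latest_write(n, w, r)
        verdict = "reads overlap the latest write" if ok else "stale reads possible"
        print(f"N={n} W={w} R={r}: {verdict}")
```

The overlap holds exactly when `w + r > n`. Smaller quorums answer faster and stay available when replicas are unreachable, at the price of possibly stale reads, which is the consistency trade-off in miniature.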
Oracle, SQL Server, MySQL, etc. typically offer replication with the primary purpose of increasing availability and improving read performance
Clustering technology offered with major db's cannot avoid the basic design principle for RDBMSs: consistency is the most important aspect, therefore a contention point has to be built in
The replication mechanism offered by Oracle with GoldenGate turns Oracle RDBMS into a poor man's high-latency, eventually consistent database over long distances: 7 minutes to 1 hour of lag under normal circumstances, according to DBAs much more knowledgeable about Oracle RDBMS than I am.
The replication mechanism offered by MS SQL Server falls into a similar pattern.
I am not familiar enough with DB2, MySQL and others to cast a strong opinion, but to me they are still RDBMSs, and the CAP theorem must apply to them too.
AWS Aurora offers high availability, but I have not found a reference to sharding, a crucial component of a distributed database.
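For readers less familiar with the term, sharding means splitting the data set across independent nodes by some function of the key. A minimal sketch of hash-based routing follows. The function name, the key format, and the shard count are all hypothetical, and this is not how any particular engine does it.

```python
# Minimal hash-based shard routing. Names and numbers are illustrative only.
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    # Hypothetical customer ids spread over 4 hypothetical shards.
    counts = Counter(shard_for(f"customer-{i}", 4) for i in range(10_000))
    for shard, keys in sorted(counts.items()):
        print(f"shard {shard}: {keys} keys")
```

Plain modulo routing like this moves most keys to a different shard whenever the shard count changes. Schemes such as consistent hashing exist to reduce that reshuffling.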
NoSQL databases have evolved to address the shortcomings of RDBMSs and have introduced shortcomings of their own
What's next?
Next is a step back: re-evaluate the db engines available today and what each is best for in our environment.
NoSQL comes in different shapes and with different objectives in mind; I am afraid there is no longer a 'best all-around db engine' to propose.
DaaS (database as a service) looks like an increasingly attractive option when multiple db engines are needed to solve business problems - who in their right mind wants to operate 5 different db engines...
How database engines rank in their popularity
Rank Apr 2017 | Rank Mar 2017 | Rank Apr 2016 | DBMS | Database Model | Score Apr 2017 | Score change since Mar 2017 | Score change since Apr 2016
--- | --- | --- | --- | --- | --- | --- | ---
1. | 1. | 1. | | | 1402.00 | +2.50 | -65.54
2. | 2. | 2. | | | 1364.62 | -11.46 | -5.49
3. | 3. | 3. | | | 1204.77 | -2.72 | +69.72
4. | 4. | | | | 361.77 | +4.14 | +58.05
5. | 5. | | | | 325.43 | -1.51 | +12.98
6. | 6. | 6. | | | 186.66 | +1.74 | +2.57
7. | 7. | 7. | | | 128.18 | -4.76 | -3.79
8. | 8. | 8. | | | 126.18 | -3.01 | -3.49
9. | | 9. | | | 114.36 | +1.35 | +3.12
10. | | 10. | | | 113.80 | -2.39 | +5.83
Google Popularity Trends
Rdbms

NoSql

References