November 10, 2021

Understanding Relational database scaling

 Many applications today still rely on traditional relational databases like MS SQL, Oracle, MySQL or PostgreSQL for data storage and data processing. With the growing amount of data and new workloads that are made with these database systems, achieving scalability is a huge challenge for relational databases.


In this article, I will discuss addressing the need for database scalability, techniques to scale a relational database.


      

  1.  Addressing the need for database scalability:


  • You’ve reached the capacity number of users that your application is able to handle.

  • Application is beginning to slow down and error out. 

  • Network requests start timing out

  • Database queries are taking a while to execute

. 

  1.  Database scaling solutions:


  • Cache database queries in Memory

  • Replication 

  • Database sharding

  • Denormalization

  • SQL tuning


  1.  Horizontal scalability Vs Vertical scalability:


  • Vertical Scaling -  This process involves adding more power such as CPU and disk power to enhance your storage process.

  • Adv of Vertical Scaling -  Simple and no code change

  • Disadv of Vertical Scaling -  Expensive and Difficult to perform when server exceeds maximum load

  • Horizontal Scaling -  This process involves increasing  the number of machines and dividing the data set and distributing the data over them and improving both read and write performance.

  • Adding read replicas to handle Read-Heavy workloads. Reading from the cache before hitting the primary DB to reduce database load.

  • Adv of Horizontal Scaling -  Cheap and better performance

  • Disadv of Horizontal Scaling -  Making joins is difficult, as it may involve cross-server communication




  1.  Designing the database architecture to achieve scalability:

  • Decentralized data management architecture

  • Each individual microservice have its own separate database

  • Deploy and scale your microservices independently.


  1.  Recent scalability enhancements  in RDBMS

  • Connection Pooling

  • Command Query Responsibility Segregation (CQRS)

  • Multi Primary Replication

  • Data Centre Wise Partition


  1.  Implementing scaling solutions introduces the following complexities:

  • Adding new features takes longer

  • The system becomes more complex with more pieces and variables involved

  • Code can be more difficult to test

  • Finding and resolving bugs becomes harder

No comments:

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...