November 10, 2021

Always On Availability Group - Offloading Reporting Approach

An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of primary databases and corresponding secondary databases which continuously back up databases and their transaction logs on a regular basis.

 The Source database sends transaction log records to secondary database and then applies them to secondary database. Data synchronization occurs between the primary database and its corresponding secondary database so that changes are applied in real-time (or near-real-time) while users are still in the database. The secondary databases server must have the same specs as the primary databases server. AlwaysOn is implemented on a per-database basis. We don't need to add all databases on source database server. Only those databases which need for real time reporting need to configure for AlwaysOn.

This secondary database can be used for real-time reporting purpose. Also, if we would like to prevent read-only workloads from running on the primary database, we can use secondary database to allow only read-write access for running complex queries.

 Benefits

  1. Real time Reporting

  2. Enhanced Performance

  3. Complex queries can run without compromising database performance.

  4. High Availability (HA) with manual failover 

Infrastructure and Software needed for AlwaysOn

  1. Windows Server 2012 or later versions.

  2. SQL Server 2014 Enterprise edition or later versions

  3. Since we can run a query on a secondary replica, all SQL servers need to have license.

  4. Both SQL & OS Editions, Versions should be at same level on all participating nodes.

  5. All replicas in your AlwaysOn group must be in the same windows domain. They must be able to communicate with each other.

  6. SQL Server port must be opened at firewall level for communication between replicas.

  7. Ensure that each computer is a Node in a Windows Server Failover Clustering (WSFC)

Cost and Budget

For setting AlwaysOn, a minimum of 2 SQL servers with Enterprise edition (Source database and Secondary database)  are required.

 Analytics workload or read-only queries running on the primary replica can be run on the secondary replica without requiring any changes.




Comparison: AlwaysOn Vs Database Mirroring



AlwaysOn

Database Mirroring

Reporting

Readable replicas allow real time reporting

The mirror database is always in a state of recovery, which essentially means it can’t be used while mirroring is active. You can create a snapshot of the mirror database to use for reporting, but the snapshot is a point-in-time copy making it unsuitable for real time reporting purposes

Read-only routing

Supports the re-routing of read-intent connection requests to a readable secondary replica. 

This feature in not available in Database Mirroring.

Performance Considerations

Performing highly CPU, I/O-intensive read-only workloads on the secondary databases, it will improve the database performance of Source database.

Synchronous mirroring requires a 2-phase commit to take place which could introduce delays on the principal.

Additional H/W

Integrated with Windows Server Failover Clustering (WSFC)

Database mirroring doesn’t require WSFC but if you want to use it in High Safety mode for automatic failover, then you need to use a third SQL Server system as a witness.

Feature support

Only available with SQL Server Enterprise edition

Available with both SQL Server Enterprise and Standard edition. Synchronous mirroring is available only on Enterprise edition.

Risks

All active workloads must be licensed and requires additional cost of SQL server license.

Database Mirroring is available in SQL2012 and SQL2014 but will be deprecated in future SQL Server releases.

Schema or data changes

You cannot make schema or data changes on the secondary databases which limits creating additional indexes on secondary databases

The mirror database is always in a state of recovery, which essentially means it can’t be used while mirroring is active

Benefits

Provides High availability for SQL Server databases. Capacity of replicate databases to facilitate read only access which makes it possible for them to be used for multiple purposes at the same time. You can use a secondary database for reporting purposes.

Provides a high availability of the database and Do not need any singular hardware.

 


Understanding Relational database scaling

 Many applications today still rely on traditional relational databases like MS SQL, Oracle, MySQL or PostgreSQL for data storage and data processing. With the growing amount of data and new workloads that are made with these database systems, achieving scalability is a huge challenge for relational databases.


In this article, I will discuss addressing the need for database scalability, techniques to scale a relational database.


      

  1.  Addressing the need for database scalability:


  • You’ve reached the capacity number of users that your application is able to handle.

  • Application is beginning to slow down and error out. 

  • Network requests start timing out

  • Database queries are taking a while to execute

. 

  1.  Database scaling solutions:


  • Cache database queries in Memory

  • Replication 

  • Database sharding

  • Denormalization

  • SQL tuning


  1.  Horizontal scalability Vs Vertical scalability:


  • Vertical Scaling -  This process involves adding more power such as CPU and disk power to enhance your storage process.

  • Adv of Vertical Scaling -  Simple and no code change

  • Disadv of Vertical Scaling -  Expensive and Difficult to perform when server exceeds maximum load

  • Horizontal Scaling -  This process involves increasing  the number of machines and dividing the data set and distributing the data over them and improving both read and write performance.

  • Adding read replicas to handle Read-Heavy workloads. Reading from the cache before hitting the primary DB to reduce database load.

  • Adv of Horizontal Scaling -  Cheap and better performance

  • Disadv of Horizontal Scaling -  Making joins is difficult, as it may involve cross-server communication




  1.  Designing the database architecture to achieve scalability:

  • Decentralized data management architecture

  • Each individual microservice have its own separate database

  • Deploy and scale your microservices independently.


  1.  Recent scalability enhancements  in RDBMS

  • Connection Pooling

  • Command Query Responsibility Segregation (CQRS)

  • Multi Primary Replication

  • Data Centre Wise Partition


  1.  Implementing scaling solutions introduces the following complexities:

  • Adding new features takes longer

  • The system becomes more complex with more pieces and variables involved

  • Code can be more difficult to test

  • Finding and resolving bugs becomes harder

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...