Data Engineering with Avishkar: 11/01/2021

November 10, 2021

Always On Availability Group - Offloading Reporting Approach

An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of primary databases and corresponding secondary databases which continuously back up databases and their transaction logs on a regular basis.

The Source database sends transaction log records to secondary database and then applies them to secondary database. Data synchronization occurs between the primary database and its corresponding secondary database so that changes are applied in real-time (or near-real-time) while users are still in the database. The secondary databases server must have the same specs as the primary databases server. AlwaysOn is implemented on a per-database basis. We don't need to add all databases on source database server. Only those databases which need for real time reporting need to configure for AlwaysOn.

This secondary database can be used for real-time reporting purpose. Also, if we would like to prevent read-only workloads from running on the primary database, we can use secondary database to allow only read-write access for running complex queries.

Benefits

Real time Reporting
Enhanced Performance
Complex queries can run without compromising database performance.
High Availability (HA) with manual failover

Infrastructure and Software needed for AlwaysOn

Windows Server 2012 or later versions.
SQL Server 2014 Enterprise edition or later versions
Since we can run a query on a secondary replica, all SQL servers need to have license.
Both SQL & OS Editions, Versions should be at same level on all participating nodes.
All replicas in your AlwaysOn group must be in the same windows domain. They must be able to communicate with each other.
SQL Server port must be opened at firewall level for communication between replicas.
Ensure that each computer is a Node in a Windows Server Failover Clustering (WSFC)

Cost and Budget

For setting AlwaysOn, a minimum of 2 SQL servers with Enterprise edition (Source database and Secondary database) are required.

Analytics workload or read-only queries running on the primary replica can be run on the secondary replica without requiring any changes.

Comparison: AlwaysOn Vs Database Mirroring

	AlwaysOn	Database Mirroring
Reporting	Readable replicas allow real time reporting	The mirror database is always in a state of recovery, which essentially means it can’t be used while mirroring is active. You can create a snapshot of the mirror database to use for reporting, but the snapshot is a point-in-time copy making it unsuitable for real time reporting purposes
Read-only routing	Supports the re-routing of read-intent connection requests to a readable secondary replica.	This feature in not available in Database Mirroring.
Performance Considerations	Performing highly CPU, I/O-intensive read-only workloads on the secondary databases, it will improve the database performance of Source database.	Synchronous mirroring requires a 2-phase commit to take place which could introduce delays on the principal.
Additional H/W	Integrated with Windows Server Failover Clustering (WSFC)	Database mirroring doesn’t require WSFC but if you want to use it in High Safety mode for automatic failover, then you need to use a third SQL Server system as a witness.
Feature support	Only available with SQL Server Enterprise edition	Available with both SQL Server Enterprise and Standard edition. Synchronous mirroring is available only on Enterprise edition.
Risks	All active workloads must be licensed and requires additional cost of SQL server license.	Database Mirroring is available in SQL2012 and SQL2014 but will be deprecated in future SQL Server releases.
Schema or data changes	You cannot make schema or data changes on the secondary databases which limits creating additional indexes on secondary databases	The mirror database is always in a state of recovery, which essentially means it can’t be used while mirroring is active
Benefits	Provides High availability for SQL Server databases. Capacity of replicate databases to facilitate read only access which makes it possible for them to be used for multiple purposes at the same time. You can use a secondary database for reporting purposes.	Provides a high availability of the database and Do not need any singular hardware.

Understanding Relational database scaling

Many applications today still rely on traditional relational databases like MS SQL, Oracle, MySQL or PostgreSQL for data storage and data processing. With the growing amount of data and new workloads that are made with these database systems, achieving scalability is a huge challenge for relational databases.

In this article, I will discuss addressing the need for database scalability, techniques to scale a relational database.

Addressing the need for database scalability:

You’ve reached the capacity number of users that your application is able to handle.
Application is beginning to slow down and error out.
Network requests start timing out
Database queries are taking a while to execute

Database scaling solutions:

Cache database queries in Memory
Replication
Database sharding
Denormalization
SQL tuning

Horizontal scalability Vs Vertical scalability:

Vertical Scaling - This process involves adding more power such as CPU and disk power to enhance your storage process.
Adv of Vertical Scaling - Simple and no code change
Disadv of Vertical Scaling - Expensive and Difficult to perform when server exceeds maximum load
Horizontal Scaling - This process involves increasing the number of machines and dividing the data set and distributing the data over them and improving both read and write performance.
Adding read replicas to handle Read-Heavy workloads. Reading from the cache before hitting the primary DB to reduce database load.
Adv of Horizontal Scaling - Cheap and better performance
Disadv of Horizontal Scaling - Making joins is difficult, as it may involve cross-server communication

Designing the database architecture to achieve scalability:

Decentralized data management architecture
Each individual microservice have its own separate database
Deploy and scale your microservices independently.

Recent scalability enhancements in RDBMS

Connection Pooling
Command Query Responsibility Segregation (CQRS)
Multi Primary Replication
Data Centre Wise Partition

Implementing scaling solutions introduces the following complexities:

Adding new features takes longer
The system becomes more complex with more pieces and variables involved
Code can be more difficult to test
Finding and resolving bugs becomes harder

Data Engineering with Avishkar