Always on Availability Group Capacity

There are a number of factors to consider when undertaking the capacity-planning phase for your AG environment. One such factor is the amount of resources each replica will require to provide appropriate performance for the system. To work this out, you need to take into account not only the normal activity that your environment will undertake but also the number of databases that will participate in the data transfers from your primary replica to your identified secondary replicas. There are, of course, many other factors as well that need to be taken into consideration, but for this article we are only concerned with working out the scale of the resources in terms of the Worker Threads.

AGs have the following worker thread requirements (As quoted from “Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server) – Thread Usage by Availability Groups”):

An idle instance of SQL Server, AlwaysOn Availability Groups uses 0 threads
The maximum number of threads used by Availability Groups is the configured setting for the maximum number of threads (‘max worker threads’) minus 40
The availability replicas hosted on a given server instance share a single thread pool (HADR Worker Pool)
Threads are shared on an on-demand basis :
Typically there are 3-10 shared threads, but this can increase depending on the primary replica workload
If a given thread is idle for a while it is released back into the general SQL Server thread pool. Normally, an inactive thread is released after ~ 15 seconds of inactivity. However, depending on the last activity, an idle thread might be retained longer
In addition, availability groups useunshared threads as follows:
Each primary replica uses 1 Log capture thread for each primary database. In addition, it uses 1 Log send thread for each secondary database. Log send threads are released after ~ 15 seconds of inactivity.
Each Secondary replica uses 1 redo thread for each secondary database. Redo threads are released after ~ 15 seconds of inactivity
A backup on a secondary replica holds a thread on the primary replica for the duration of the backup operation.

Even though AGs are an extension of the database mirroring technology, the worker thread usage for AGs is different than with database mirroring. Database mirroring used dedicated threads per database compared to the request queue and worker pool (HADR Worker Pool) to handle the requests. Bob Dorr gives a really in-depth account of how the worker pool is utilised by AGs.

There are other worker thread requirements beyond these. The minimum number of work threads required just to facilitate having AGs configured can be calculated from

the number of AGs you have configured in your instance of SQL Server
the number of availability databases in each of the AGs
the number of availability replicas (2-5 replicas. A maximum of 4 secondary replicas with SQL Server 2012)

To calculate the minimum worker thread pool size required, for this article our environment will be configured as follows:

One Primary Replica
One Secondary Replica
One Availability Group
One Hundred Availability Databases

The information that we will use in the minimum pool size calculation is:

Database Count (DC)
Secondary Replica Count (SRC)
Log Capture Thread (LCT)
Log Send Thread (LST)
Message Handler Thread (MHT)

For each database participating in an AG, one LCT is used to capture the transactions occurring on the database. One LST is required for each secondary replica in the AG. To allow the process to work at least one MHT is required to handle the communication occurring between replicas.

The algorithm we are going to use as outlined by Bob Dorr’s Article – HADRON Learning Series: Worker Pool Usage for HADRON enabled Databases is:

Minimum Pool size = (DC x (LCT + (LST x SRC))) + MHT

In a worst case scenario for our environment mentioned above all one hundred replica databases are actively being used.

Minimum Pool Size = (100 x (1 + (1 x 1))) + 1
Minimum Pool Size = (100 x (1 + (1))) + 1
Minimum Pool Size = (100 x 2) + 1
Minimum Pool Size = 201

The Minimum Pool size required for our environment in this article would be 201. If we then looked at the worker thread table above, we’d see that the number of threads available to the system on a 64 bit 4 cpu server would be 512. From the AG requirements outlined earlier in this article “The maximum number of threads used by Availability Groups is the configured setting for the maximum number of threads (‘max worker threads’) minus 40” we can calculate the maximum available for your AGs would be 472 (512 – 40). Our calculation shows we will need 201 threads to service AG workload however this is not taking into account the number of threads that would be required for the transactional activity that will be occurring on the system or the number required for performing backups on the system. So we can see that the number of threads available are being used up very quickly not leaving any available resources for workload utilisation spikes.

If you try to add too many availability databases to an AG via a script, you will run out of worker threads very quickly. Your script will lose its connection and will not complete the process. Your AG environment will have databases sitting in a state where they are not part of the AG. In order to get back online with at least your primary replica databases, you will need to drop your AG(s) and restore, where necessary any irrecoverable databases on the primary replica with the backups taken as part of the initial AG setup process.

Because of the risk of things going wrong, any deployment into an AG needs to be carefully planned, with a prior calculation of the resources required, especially when these AGs are part of your production environment. Make sure you understand your existing capacity requirements and forecast the potential growth in activity or utilisation so you are able to identify appropriate server resources to meet your needs.

References:

AG Prerequisites, Restrictions & Recommendations – http://technet.microsoft.com/en-us/library/ff878487.aspx
Bob Dorrs Worker Threads – http://blogs.msdn.com/b/psssql/archive/2012/05/17/alwayson-hadron-learning-series-worker-pool-usage-for-hadron-enabled-databases.aspxAlways