Blog Post

Azure Database Support Blog
3 MIN READ

TDE High availability with customer-managed key for Azure SQL Databases

Mousa_Janini's avatar
Mousa_Janini
Icon for Microsoft rankMicrosoft
Oct 16, 2022

When using CMK to protect data at rest, customers are responsible for and in a full control of a key lifecycle management (key creation, upload, rotation, deletion), the key used for encryption of the Database Encryption Key (DEK), called TDE protector, is a customer-managed asymmetric key stored in a customer-owned and customer-managed Azure Key Vault (AKV).

 

If the server loses access to the stored Database Encryption Key (DEK) in AKV, in up to 10 minutes a database will start denying all connections with the corresponding error message and change its state to Inaccessible. The only action allowed on a database in the Inaccessible state is deleting it. For more information see Inaccessible TDE protector.

 

Losing access to Key vault for TDE, and when TDE protector become Inaccessible:

There is a portal experience to re-validate key permission and to trigger a workflow to make the database available again. We don’t have an SLA published, however it depends on how large the database is to bring it back online. Its approximately from few min to hours. Within the 30 minutes timeline, we keep the database around by disabling external connection to the database for customer to notice that database lost key access and fix the issue immediately. After 30 minutes we want to move the database to stable state to avoid any issues in the system and to keep the database inaccessible since key is gone/revoked.

 

When a database become inaccessible for more than 30 minutes, the database service tier can play an important role here, for Standard/General Purpose service tiers, the recovery will be much faster than Premium/Business Critical service tiers that’s because the Standard/GP is using a remote storage and will use the attach/detach process retrieve data; while Premium/Business Critical service tiers will trigger a restore process on the backend and the restoration operation has an RTO up to 12 hours (Most database restores finish in less than 12 hours) depending on many factors that can affect the recovery time such as the size of the database, the compute size of the database, number of transaction logs involved, network bandwidth. More information here: Recovery time.

 

To ensure high availability it's highly recommended to configure the server to use two different key vaults in two different regions with the same key material.

 

PowerShell example:

 

# add the key from Key Vault to the server

Add-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName> -KeyId <KeyVaultKeyId>

 

# Confirm server's keys added

Get-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName>

 

Note: The key in the secondary key vault in the other region shouldn't be marked as TDE protector, and it's not even allowed.

 

 

  • If there's an outage affecting the primary key vault, and only then, the system will automatically switch to the other linked key with the same thumbprint in the secondary key vault, if it exists. Note though that switch won't happen if TDE protector is inaccessible because of revoked access rights, or because key or key vault is deleted, as it may indicate that customer intentionally wanted to restrict server from accessing the key.

 

Thank you!

Updated Nov 26, 2023
Version 2.0
  • BaronSmith's avatar
    BaronSmith
    Copper Contributor

    Thanks for the great article.  I set this up in my environment, using the commands you provided.  One important difference is that we don't have public access for any of our resources, including key vaults and Azure SQL Database.  Therefore, I could not test it in the same way you did - because public access on our key vaults is already disabled.  I therefore decided to test by deleting the TDE key (soft delete) in the primary vault.  I expected that the SQL Server would automatically go to the secondary vault to find they key (which is in there).  However, it didn't work.  Instead, my database became inaccessible due to the SQL server not having a TDE key to use.  I feel confident that I have it all set up properly - including permissions on both primary and secondary key vaults from both primary and secondary SQL Servers (in the Failover Group).  I don't know what I'm doing wrong?  Any ideas?  A couple questions I have are:

    1) How did you ensure SQL server has access to both Keys in the two different Azure key vaults?  I did this by looking at my access policies.  Is there another way?

    2) I don't understand the NOTE in red.  I see the checkbox on the TDE Encryption blade that says "Make this key the default TDE protector".  It seems that should be checked on both the primary and secondary servers in the Failover Group, right?  This seems separate from the primary and secondary key vaults that both contain copies of the key.