datablogs

Learn why an Emergency Snapshot occurs on RDS for SQL Server

 Avoid Emergency Snapshots on SQL Server RDS

1. Definition

What is an emergency snapshot?

It is tempting to blame the cloud provider, but every emergency snapshot has a specific cause. Once we know those causes, we can manage customer databases so that they are not triggered.

An emergency snapshot (also called an emergent snapshot in AWS documentation) is an automatic, unscheduled snapshot that Amazon RDS triggers on your SQL Server instance outside of your configured backup window.

Unlike your regular automated backups — which run on a predictable daily schedule — an emergency snapshot fires immediately, regardless of time or workload, because RDS has detected a condition that breaks the point-in-time recovery (PITR) chain.

AWS definition

RDS relies on a continuous chain of automated backups + transaction log backups to provide PITR. Any event that breaks this chain forces RDS to take an immediate full snapshot to establish a new recovery baseline — that is the emergency snapshot.

Event log message

You will see it logged in the RDS Events tab as:

Emergent Snapshot Request: Databases found to still be awaiting snapshot.

It is not a system failure or an alarm in itself — but it is a clear signal that something disrupted the normal backup chain and needs attention.
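
To check how often this message has appeared, the recent event history can be filtered from the command line. The following is a minimal sketch, not an official procedure: the helper name filter_emergent_events and the instance name your-instance are placeholders introduced here, and the AWS CLI call is left as a comment because it needs credentials.

```shell
# Keep only the event lines that mention an emergent snapshot request.
# Reads event text on stdin, one event per line.
filter_emergent_events() {
  grep -F 'Emergent Snapshot Request'
}

# Hypothetical usage (needs the AWS CLI and credentials; "your-instance" is a
# placeholder). --duration is in minutes, so 10080 covers the last 7 days:
#
#   aws rds describe-events \
#     --source-type db-instance \
#     --source-identifier your-instance \
#     --duration 10080 \
#     --query 'Events[].[Date,Message]' \
#     --output text | filter_emergent_events
```

Each line that survives the filter carries the event timestamp, which is the time to look for in the SQL Server error log.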

2. Triggers

When does it occur?

AWS documents exactly five scenarios that trigger an emergency snapshot on SQL Server RDS. Each one breaks the PITR chain in a different way.

01. Recovery model changed to SIMPLE

If any user database switches from FULL or BULK_LOGGED to SIMPLE recovery, transaction log backups become impossible. RDS immediately takes a full snapshot to preserve recoverability.

02. Database restored or created in SIMPLE recovery

When a database is restored from a native backup or created fresh with SIMPLE recovery already set, RDS detects a missing log chain for that database and fires an emergency snapshot.

03. Transaction log backup failure

RDS uploads transaction log backups to S3 every 5 minutes. If a log backup fails — due to storage exhaustion, I/O errors, or SQL Server issues — the PITR chain breaks and RDS triggers an emergency snapshot to reset the baseline.

04. Off-line patching completed

When AWS applies off-line OS or engine patches to your instance (requiring downtime), a safeguard snapshot is automatically taken after patching completes. This is by design and cannot be suppressed.

05. Multi-AZ failover or native backup/restore

After a Multi-AZ failover, the new primary needs a fresh snapshot baseline. Similarly, using native backup/restore via S3 integration to replace a database invalidates its automated backup chain — both trigger an emergency snapshot.

How to identify the cause

In the RDS console, go to Logs & events and select the SQL Server error log timestamped just before the snapshot. Search it for keywords such as RECOVERY, BACKUP, or Restore to pinpoint which cause fired.
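
The same keyword search can be done outside the console. This is a rough sketch under a few assumptions: find_snapshot_clues is a helper name made up for this example, your-instance is a placeholder, and log/ERROR is assumed to be the name of the current error log (aws rds describe-db-log-files lists the real names for your instance).

```shell
# Print error-log lines that mention recovery-model changes, backups, or
# restores, case-insensitively and with their line numbers.
# Reads the log text on stdin.
find_snapshot_clues() {
  grep -n -i -E 'recovery|backup|restore'
}

# Hypothetical usage (needs the AWS CLI and credentials; the instance name
# and log file name are assumptions):
#
#   aws rds download-db-log-file-portion \
#     --db-instance-identifier your-instance \
#     --log-file-name log/ERROR \
#     --starting-token 0 \
#     --output text | find_snapshot_clues
```

The line numbers make it easier to jump back into the full log and read the entries around each match.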

3. Prevention

How to avoid it?

Three of the five causes are fully preventable. The remaining two (off-line patching and post-failover snapshots) are expected AWS behavior — you can minimise their impact but not eliminate them.

Cause 1 & 2 — recovery model discipline

Audit all user databases on your instance. If any are unexpectedly in SIMPLE recovery, switch them back to FULL (after verifying this fits your RPO requirements).

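-- List each user database with its current recovery model.
-- rdsadmin is the RDS-managed system database, so it is excluded as well.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name NOT IN ('master', 'model', 'msdb', 'tempdb', 'rdsadmin')
ORDER BY name;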

If a database legitimately needs SIMPLE recovery, accept that any restore of it will trigger an emergency snapshot. Plan those restores during off-peak hours.
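
Once the audit output is in hand, the switch back to FULL can be scripted. The sketch below is only an illustration: simple_to_full_sql is a hypothetical helper, and it assumes the audit result has been exported as tab-separated lines of database name and recovery model. It prints the statements instead of running them, so they can be reviewed against your RPO requirements first.

```shell
# Turn audit output into ALTER DATABASE statements for review.
# Input (stdin): one database per line, "name<TAB>recovery_model_desc".
# Output: one statement for every database currently in SIMPLE recovery.
simple_to_full_sql() {
  awk -F '\t' '$2 == "SIMPLE" {
    name = $1
    gsub(/\]/, "]]", name)   # escape ] inside the bracketed identifier
    printf "ALTER DATABASE [%s] SET RECOVERY FULL;\n", name
  }'
}

# Example with made-up database names:
#   printf 'app\tFULL\nreporting\tSIMPLE\n' | simple_to_full_sql
```

Databases that are intentionally in SIMPLE recovery should be removed from the generated list before anything is executed.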

Cause 3 — stop log backup failures

• Enable Storage Auto Scaling with a maximum at least 40% above current allocation

• Set CloudWatch alarm on FreeStorageSpace — alert below 20% of allocated storage

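• Set CloudWatch alarm on TransactionLogsDiskUsage (alert above 50% of storage) if your instance emits it; AWS documents this metric for PostgreSQL, so on SQL Server also track log usage with DBCC SQLPERF(LOGSPACE)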

• Set backup retention to 7+ days so RDS has a healthy rolling backup window

• Do NOT run native backup/restore while the automated backup window is active

• Do NOT set backup retention to 0 — this disables automated backups entirely
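
The auto scaling and retention settings from the list above can be applied with the AWS CLI. Here is a minimal sketch of the arithmetic for the "at least 40% above current allocation" ceiling. The helper name autoscale_max_gib and the instance name are placeholders introduced for this example, and the modify-db-instance call is left as a comment because it changes a live instance.

```shell
# Smallest whole-GiB auto scaling ceiling that is at least pct% above the
# current allocation (integer ceiling division). pct defaults to 40.
autoscale_max_gib() {
  allocated_gib=$1
  pct=${2:-40}
  echo $(( (allocated_gib * (100 + pct) + 99) / 100 ))
}

# Hypothetical usage for an instance with 200 GiB allocated (needs the AWS
# CLI and credentials; "your-instance" is a placeholder):
#
#   aws rds modify-db-instance \
#     --db-instance-identifier your-instance \
#     --max-allocated-storage "$(autoscale_max_gib 200)" \
#     --backup-retention-period 7 \
#     --apply-immediately
```

For a 200 GiB instance the helper returns 280, so the ceiling sits 40% above the current allocation.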

 

# Alarm when average free storage stays below 10 GiB (10737418240 bytes) over a 5-minute period.
aws cloudwatch put-metric-alarm \
  --alarm-name rds-low-storage \
  --metric-name FreeStorageSpace \
  --namespace AWS/RDS \
  --dimensions Name=DBInstanceIdentifier,Value=your-instance \
  --statistic Average \
  --period 300 \
  --threshold 10737418240 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:region:account-id:your-topic
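
FreeStorageSpace is reported in bytes, so the threshold in the command above is an absolute value: 10737418240 bytes is 10 GiB, which is 20% of a 50 GiB allocation. For other instance sizes, the threshold can be derived rather than hardcoded. A small sketch, with free_storage_threshold_bytes as a helper name made up for this example:

```shell
# Convert "pct% of allocated storage" into the byte value that the
# FreeStorageSpace alarm expects. pct defaults to 20.
free_storage_threshold_bytes() {
  allocated_gib=$1
  pct=${2:-20}
  echo $(( allocated_gib * 1024 * 1024 * 1024 * pct / 100 ))
}

# Example: pass the result to --threshold for a 100 GiB instance.
#   --threshold "$(free_storage_threshold_bytes 100)"
```

If Storage Auto Scaling grows the allocation later, the alarm threshold needs to be recalculated so that it still represents 20%.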

Cause 4 & 5 — patching and failover

• Set the maintenance window to the lowest-traffic period of your week

• Use Multi-AZ — AWS patches the standby first, then fails over, reducing primary downtime

• For Multi-AZ SQL Server, I/O is briefly suspended during the post-patch snapshot — plan accordingly

• After any native restore, expect an emergency snapshot within minutes — this is normal and cannot be prevented
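
Moving the maintenance window is a one-line change, but the window string has a strict format (ddd:hh24:mi-ddd:hh24:mi, in UTC). The sketch below checks the format before it is sent to AWS. The helper is_valid_maintenance_window and the example window are assumptions for illustration, and the check covers the format only, not the 30-minute minimum length that RDS enforces.

```shell
# Return success if the argument looks like ddd:hh24:mi-ddd:hh24:mi,
# for example sun:03:00-sun:04:00. Format check only.
is_valid_maintenance_window() {
  printf '%s\n' "$1" | grep -Eq '^(mon|tue|wed|thu|fri|sat|sun):([01][0-9]|2[0-3]):[0-5][0-9]-(mon|tue|wed|thu|fri|sat|sun):([01][0-9]|2[0-3]):[0-5][0-9]$'
}

# Hypothetical usage (needs the AWS CLI and credentials; "your-instance" and
# the chosen window are placeholders):
#
#   window='sun:03:00-sun:04:00'
#   if is_valid_maintenance_window "$window"; then
#     aws rds modify-db-instance \
#       --db-instance-identifier your-instance \
#       --preferred-maintenance-window "$window"
#   fi
```

Pick the window from your own traffic data; Sunday 03:00 UTC is only an example.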

Prevention checklist

| Action | Prevents cause |
| --- | --- |
| Keep production databases in FULL recovery model | Cause 1 & 2 |
| Audit recovery models after every native restore | Cause 1 & 2 |
| Enable Storage Auto Scaling (max 40%+ above allocation) | Cause 3 |
| CloudWatch alarm: FreeStorageSpace < 20% | Cause 3 |
| CloudWatch alarm: TransactionLogsDiskUsage > 50% | Cause 3 |
| Backup retention set to 7+ days | Cause 3 |
| Maintenance window set to off-peak hours | Cause 4 (impact) |
| Multi-AZ enabled for production workloads | Cause 4 & 5 (impact) |
| Plan native restores for low-traffic windows | Cause 5 (impact) |

Bottom line

Most emergency snapshots on SQL Server RDS trace to recovery model changes, log backup failures, or patching events. Three of the five causes are fully preventable. The other two (patching and post-failover snapshots) are by design — schedule maintenance windows wisely and set storage alarms early.

 
