One of the well-known ways to back up AWS DynamoDB is to use Data Pipelines and EMR. But this approach has a few disadvantages: it creates temporary EC2 instances, consumes the provisioned read capacity of the DynamoDB table, and can take a long time depending on the size of the data, the EC2 instance size, the provisioned read capacity, and so on. Overall, it can be expensive and hard to maintain.

AWS has released a feature, “Export to S3”, which solves many of these problems. It is a completely serverless solution, and hence scalable and performant. The export feature does not consume the provisioned read capacity of the table, so it has no impact on the performance or availability of the DynamoDB table.

How to back up?

Enable PITR (point-in-time recovery)

First, we need to enable PITR on the DynamoDB table.

  • Go to the DynamoDB table in the console
  • Navigate to the Backups tab
  • You will see an option to enable PITR, as shown in the image below

From Terraform: add this block to the aws_dynamodb_table resource

  point_in_time_recovery {
    enabled = true
  }
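
PITR can also be enabled programmatically. Here is a minimal sketch in JS (aws-sdk version 3); the enablePitr helper is illustrative and the table name is a placeholder:

  const { DynamoDB } = require("@aws-sdk/client-dynamodb");
  const dynamoDB = new DynamoDB({});

  // Illustrative helper: turns on point-in-time recovery for the given table
  async function enablePitr(tableName) {
      await dynamoDB.updateContinuousBackups({
          TableName: tableName,
          PointInTimeRecoverySpecification: { PointInTimeRecoveryEnabled: true }
      });
  }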

Export to S3

  • Go to the “Exports and streams” tab on the DynamoDB table page
  • Click on the “Export to S3” button
  • Select the S3 bucket you want to export to
  • Click on the Export button
  • You can then see the status of the export (see the status-check sketch below)
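
The export status can also be polled programmatically. A minimal sketch in JS (aws-sdk version 3); the getExportStatus helper is illustrative, and the export ARN comes from the response of the export call:

  const { DynamoDB } = require("@aws-sdk/client-dynamodb");
  const dynamoDB = new DynamoDB({});

  // Illustrative helper: returns IN_PROGRESS, COMPLETED or FAILED
  async function getExportStatus(exportArn) {
      const { ExportDescription } = await dynamoDB.describeExport({
          ExportArn: exportArn
      });
      return ExportDescription.ExportStatus;
  }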

Cross account backup

There are two ways to do the cross-account backup:

  • You can choose a different AWS account while selecting the S3 bucket, and grant permissions on that bucket accordingly (a sample bucket policy sketch follows this list)
  • You can export to an S3 bucket in the current account and enable CRR (Cross-Region Replication) to replicate the objects to another account
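
For the first option, the destination bucket in the other account must allow the exporting account to write objects. A minimal bucket policy sketch is below; the account ID and bucket name are placeholders, and the exact set of actions should be verified against the AWS documentation:

  {
      "Version": "2012-10-17",
      "Statement": [{
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::<source-account-id>:root" },
          "Action": ["s3:PutObject", "s3:PutObjectAcl", "s3:AbortMultipartUpload"],
          "Resource": "arn:aws:s3:::<destination-bucket>/*"
      }]
  }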

Automating the backups at specific intervals

  • Create a Lambda function
  • Write code that calls “Export to S3” using the aws-sdk. Example code in JS (using aws-sdk version 3):
        const { DynamoDB } = require("@aws-sdk/client-dynamodb");
        const dynamoDB = new DynamoDB({});
        exports.handler = async () => {
            await dynamoDB.exportTableToPointInTime({
                TableArn: "<>",   // ARN of the table to export
                S3Bucket: "<>"    // destination bucket name
            });
        };
  • Schedule the Lambda using CloudWatch Events rules (with cron or rate schedule expressions); a Terraform sketch follows below
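
A minimal Terraform sketch for the schedule, assuming the Lambda above is defined elsewhere as aws_lambda_function.export_lambda; all resource names here are hypothetical:

  # Fires once a day; cron(...) expressions work here too
  resource "aws_cloudwatch_event_rule" "export_schedule" {
    name                = "dynamodb-export-schedule"
    schedule_expression = "rate(1 day)"
  }

  resource "aws_cloudwatch_event_target" "export_lambda" {
    rule = aws_cloudwatch_event_rule.export_schedule.name
    arn  = aws_lambda_function.export_lambda.arn  # assumed to be defined elsewhere
  }

  # Allows CloudWatch Events to invoke the Lambda
  resource "aws_lambda_permission" "allow_events" {
    statement_id  = "AllowExecutionFromCloudWatch"
    action        = "lambda:InvokeFunction"
    function_name = aws_lambda_function.export_lambda.function_name
    principal     = "events.amazonaws.com"
    source_arn    = aws_cloudwatch_event_rule.export_schedule.arn
  }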

How to restore the backup?

There is no direct import option similar to the export option above. Hopefully AWS will come up with a straightforward import to DynamoDB, but for now, we have to manually read the data from the S3 objects and insert it into the DynamoDB table as part of the restore process.

We can use traditional Data Pipelines to restore, or create a script (a sketch follows this list) that does the following:

  • Stream the backup files from S3
  • Parse the items; each DynamoDB item is stored as a single line in the backup file
  • Insert these items into the table using the SDK
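
A minimal restore sketch in JS (aws-sdk version 3), assuming the default DynamoDB JSON export format; restoreFile is an illustrative helper, and the bucket, key, and table names are placeholders:

  const { S3 } = require("@aws-sdk/client-s3");
  const { DynamoDB } = require("@aws-sdk/client-dynamodb");
  const zlib = require("zlib");

  const s3 = new S3({});
  const dynamoDB = new DynamoDB({});

  // Illustrative helper: restores one exported data file
  // (a gzipped file with one item per line)
  async function restoreFile(bucket, key, tableName) {
      const obj = await s3.getObject({ Bucket: bucket, Key: key });
      const bytes = await obj.Body.transformToByteArray();
      const lines = zlib.gunzipSync(Buffer.from(bytes))
          .toString("utf8").split("\n").filter(Boolean);
      for (const line of lines) {
          // Each line looks like {"Item": {...DynamoDB attribute-value map...}}
          const { Item } = JSON.parse(line);
          await dynamoDB.putItem({ TableName: tableName, Item });
      }
  }

For large tables, batching the writes with batchWriteItem (up to 25 items per call) would be faster than individual putItem calls.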

What is the cost of these backups?

This is not a table-scan solution, and there is no extra infrastructure or server to run. So the cost of this solution is lower compared with the Data Pipelines solution.

Taking the example of US East:

  • PITR: $0.20 per GB-month
  • Export: $0.10 per GB per export
  • Plus additional S3 storage costs
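
As a rough illustration: for a 100 GB table, PITR would cost about $20 per month and each export about $10, plus the S3 storage for the exported objects (the export files are gzipped, so the stored size is typically smaller than the table size).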