We have all heard about the importance of making backups. You may even have heard of disaster recovery planning. Hopefully you have listened to that advice and are well prepared. Now the question is: have you protected your backups from someone deleting them?
Cloud providers do things at scale and they have built in an impressive amount of redundancy and reliability. To offer these services as simple, neat packages, they hide away the complexity behind them. When moving to the cloud, you may think that all is taken care of for you. In reality, there is much more work to be done.
Nearline or local backups are backups kept close to the source. They are generally snapshot based rather than complete copies of your data. Think of snapshots as semi-transparent layers, each containing only the differences from the previous snapshot. Laid on top of each other, they give you a complete picture of your data. The first snapshot is a complete copy of your data, while the following snapshots are incrementals. If the first snapshot becomes corrupted, your data is lost.
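The layering described above can be sketched in a few lines of Python. This is a toy model, not any provider's actual snapshot format: each snapshot is a dict of changed blocks, and restoring means applying the layers in order on top of the full base copy.

```python
def restore(snapshots):
    """Rebuild the full data set from a snapshot chain.

    snapshots[0] is the complete base copy; each later entry holds
    only the blocks that changed since the previous snapshot.
    """
    data = {}
    for snap in snapshots:
        data.update(snap)  # later layers override earlier blocks
    return data

# Toy chain: a full base snapshot plus two incremental layers.
chain = [
    {"block1": "a", "block2": "b", "block3": "c"},  # base (full copy)
    {"block2": "B"},                                # only block2 changed
    {"block3": "C"},                                # only block3 changed
]
print(restore(chain))  # {'block1': 'a', 'block2': 'B', 'block3': 'C'}
```

Drop the base layer from the chain and most of the data is unrecoverable, which is why corruption of that first snapshot is fatal.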
Replicating your snapshots to another region adds geographical redundancy to your backups. But because the replication is asynchronous, meaning changes are copied some time after they are written rather than in lockstep with the source, there is a chance that your replicated backups are incomplete or corrupt if a failure strikes mid-replication.
Because of this, the availability zone or single-datacenter redundancy model provided by cloud providers is not a complete backup or disaster recovery solution on its own. When a datacenter suffers an outage, not only will your servers go down, but your data can go with it.
On August 31, 2019, a datacenter in AWS’s us-east-1 region suffered a complete power and backup generator outage that caused 7.5 per cent of all EC2 instances (virtual machines) in that availability zone to disappear, along with corruption of a similar percentage of EBS (disk) storage. Customers who hadn’t performed any backups lost their data. Those who had local snapshot backups, which are stored in S3, faced a time-consuming restore process, fighting with other impacted customers for available network bandwidth, EBS (disk) capacity, and EC2 instance (VM) capacity in the affected zone. Many experienced outages longer than most management and customers would deem acceptable.
AWS is not the only cloud provider susceptible to this risk. It was simply the one unlucky enough to supply a recent example of data loss.
Local snapshots are a nearline backup solution for getting back up and running quickly, but they can be lost, unavailable, or slow to access in a disaster. When something goes wrong, everyone is affected at once, and the combined recovery efforts compound the strain.
To survive a disaster, you need to replicate your data and backups outside of their home region and be able to fail over to a secondary region.
All 3 main cloud providers (Azure, AWS, and GCP) provide snapshot-based backups and multi-datacenter redundancy within the same region. Azure and GCP offer optional auto-failover geo-replication of their file storage services, and Azure now provides the ability to manually fail over to a secondary region. On AWS you can achieve a similar geo-replication solution by setting up S3 bucket replication.
With the rise of malicious actors releasing ransomware for financial gain, the safety of your backups is extremely important.
With the move to the cloud, backups have been an afterthought, but what is almost never considered is maintaining a protected copy of your backups.
Ransomware’s effectiveness relies on your lack of proper backups. If you are one of the few who implement and properly maintain a near real-time backup strategy, you can overcome a ransomware attack with minimal loss of data and greatly reduced loss of productivity.
Ransomware has been a raging success for the bad actors, and that success will drive innovation in finding ways to further lock a victim into parting with their bitcoins. On top of encrypting your data, they will soon go after your backups, if they have not done so already.
If your data and backups are accessible and modifiable from a single interface using a single authenticated session, you are highly exposed to losing everything. A bad actor or disgruntled employee who has or can gain write access to all of your data and backups, can also make them all go away.
I have witnessed this exact tragic event unfold. A disgruntled IT administrator deleted a hosting company’s data and offsite backups in a way that made recovery difficult to impossible. Even expensive physical data recovery could retrieve only a fraction of the lost data. In the world of highly redundant storage, your data is spread across many storage devices; if the map to where your data resides is lost, your data becomes many needles in many, many haystacks. These events happen, and they can bring about the end of a business or a career.
You will want to replicate a copy of your data to an isolated environment with limited access from the first environment. How often this happens depends on how much data you are willing to lose: your recovery point objective (RPO).
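As a rough illustration of that trade-off (the workload numbers here are hypothetical), the replication interval follows directly from the amount of data you can afford to lose:

```python
def max_replication_interval_minutes(change_rate_mb_per_min, acceptable_loss_mb):
    """Worst-case data loss equals change rate times the replication
    interval, so the interval must not exceed loss budget / change rate."""
    return acceptable_loss_mb / change_rate_mb_per_min

# Hypothetical workload: 50 MB of changes per minute, and the business
# can tolerate losing at most 3000 MB (about an hour of changes).
print(max_replication_interval_minutes(50, 3000))  # 60.0 minutes
```

Halve the loss budget and the replication schedule must run twice as often, with the corresponding cost in bandwidth and storage operations.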
All 3 cloud providers offer a way to share snapshots or grant external access to storage so that backups can be replicated to an isolated environment. None of them, however, automate the ongoing replication; on all 3 clouds you would need to create a scripted solution.
For the more technical readers, I have outlined the high-level steps required for each of the main 3 cloud providers.
With Azure, to set up isolation for your protected backups, you would need a separate subscription, or a separate management group containing a subscription. The one issue with this design is that a user granted Global Administrator permissions can grant themselves access to any subscription they don’t have access to. You would need to further secure Global Administrator access by using Privileged Identity Management (PIM), or provide separate admin accounts for users that require that level of access. To achieve complete isolation, you would have to create a separate Azure Active Directory to attach your Backup subscription to.
Once the environment is set up, you would create an Automation Account in the Backup subscription and a service principal that has limited access to each subscription. You would then create scheduled runbooks in the Automation Account that perform the manual process of copying your storage account data, VM snapshots, and SQL BACPAC exports to a storage account in the Backup subscription.
With AWS, to set up isolation for your protected backups, you would create additional AWS accounts within an organization hierarchy. The source and Backup AWS accounts would be attached to an organization AWS account. The one issue with this design is that an IAM user with Administrator access in the organization AWS account can assume the OrganizationAccountAccessRole IAM role in the Backup AWS account, giving them full Administrator access to your backups. To achieve complete isolation, you would not attach the Backup AWS account to the organization and would manage it separately.
Once the environment is set up, you would configure bucket policies on both S3 buckets and create IAM roles and policies in both AWS accounts to allow limited access between them. You would then create CloudWatch Events scheduled Lambda functions that use the IAM roles to perform the manual process of copying your EC2 and RDS snapshots to the Backup AWS account.
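The snapshot-copy step of such a Lambda function might look roughly like the following boto3 sketch. The snapshot ID, account ID, and region are placeholders; in practice the first call runs under a role in the source account and the second under a role in the Backup account.

```python
def share_params(snapshot_id, backup_account_id):
    """Parameters for EC2 ModifySnapshotAttribute that share an EBS
    snapshot with the isolated Backup account (pure helper)."""
    return {
        "SnapshotId": snapshot_id,
        "Attribute": "createVolumePermission",
        "OperationType": "add",
        "UserIds": [backup_account_id],
    }

def replicate_snapshot(snapshot_id, backup_account_id, region="us-east-1"):
    import boto3  # imported lazily; running this requires AWS credentials

    # Step 1 (source account): share the snapshot with the Backup account.
    boto3.client("ec2", region_name=region).modify_snapshot_attribute(
        **share_params(snapshot_id, backup_account_id)
    )
    # Step 2 (Backup account): copy the shared snapshot so the Backup
    # account owns an independent copy the source account cannot delete.
    boto3.client("ec2", region_name=region).copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=region,
        Description="Protected cross-account copy",
    )
```

Note that encrypted snapshots additionally require access to the KMS key, a point covered later in the article.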
For S3 bucket data, AWS now provides a way to automate replication of an S3 bucket to another S3 bucket in a different region and AWS account. Setting up and keeping both buckets in sync still requires a bit of work and monitoring.
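The S3 replication setup can be sketched as follows; the bucket names, role ARN, and account ID are placeholders. The configuration dict mirrors what the PutBucketReplication API expects.

```python
def replication_config(role_arn, dest_bucket_arn, dest_account_id):
    """Cross-account replication configuration for PutBucketReplication."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [{
            "ID": "protected-backup-copy",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": dest_bucket_arn,
                "Account": dest_account_id,
                # Hand object ownership to the Backup account so the
                # source account cannot later delete the replicas.
                "AccessControlTranslation": {"Owner": "Destination"},
            },
        }],
    }

def enable_replication(source_bucket, role_arn, dest_bucket_arn, dest_account_id):
    import boto3  # imported lazily; running this requires AWS credentials

    boto3.client("s3").put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(
            role_arn, dest_bucket_arn, dest_account_id
        ),
    )
```

The `AccessControlTranslation` setting is the piece that matters for protected backups: without it, the source account remains the owner of every replicated object.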
With GCP, to set up isolation for your protected backups, you would create a Backup project within the organization hierarchy. The one issue with this design is that any user in the gcp-organization-admins or gcp-security-admins groups can grant themselves access to the Backup project. You would need to create separate admin accounts for users that require that level of access. To achieve complete isolation, you would have to create a separate organization with separate Google accounts.
Once the environment is set up, you would enable Bucket Policy Only on the Backup Cloud Storage bucket and grant a service account an IAM role on both Cloud Storage buckets, allowing limited access between them.
You would then create scheduled Cloud Functions running as the service account to perform the manual process of creating images of your VM snapshots in a Cloud Storage bucket, exporting SQL data to the same bucket, and then copying your Cloud Storage bucket data to a Cloud Storage bucket in your Backup project.
The other important area with backups is securing read access to them. The reason cloud providers put so many barriers in the way of getting your backup data out is data security. Make sure that, at all stages, access to your backups uses the least privileges possible.
Along with securing access to your backups, you will also want to make sure they are encrypted at rest. All cloud providers offer AES-256 encryption on their storage services; make sure it is enabled.
Encrypting the source virtual machine disks or databases is a recommended practice and will ensure your data remains encrypted on the backup target.
With AWS, handling the encryption keys is a much more manual process. To migrate data between AWS accounts, you must use customer-managed keys (CMKs) for encrypting all of your data, and you will need to grant the IAM roles in the Backup AWS account access to the keys so they can re-encrypt the data using the Backup AWS account’s own keys.
Your automation scripts also need to keep your data secure in transit. Make sure they only use TLS 1.2 connections or higher when connecting to the cloud provider’s storage services.
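In Python, for example, you can enforce this on the client side with the standard library's ssl module before handing the context to your HTTP client. Most cloud SDKs already default to TLS 1.2 or higher, so treat this as a belt-and-braces check:

```python
import ssl

# Build a client-side context that refuses anything older than TLS 1.2.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Any connection made with this context will fail the handshake if the
# server only offers TLS 1.1 or older.
print(context.minimum_version >= ssl.TLSVersion.TLSv1_2)  # True
```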
Depending on your environment, you may have other data that would require replicated backups. I just touched on the main services of compute, relational database, and storage. This article is more about getting you to think, “are my backups protected?”
James started out as a web developer with an interest in hardware and open-source software development. He made the switch to IT infrastructure and spent many years with server virtualization, networking, storage, and domain management.
After exhausting all challenges and learning opportunities provided by traditional IT infrastructure and a desire to fully utilize his developer background, he made the switch to cloud computing.
For the last 3 years he has been dedicated to automating and providing secure cloud solutions on AWS and Azure for our clients.