#STBY – GitLab accidental deletion of data

GitLab certainly had a #SucksToBeYou week.  Earlier this week, an exhausted GitLab admin ran rm -rf in the wrong location, and unfortunately the company's backup practices turned out to be poor.

Props to them for being upfront.

This incident rang a bell, reminding me of Code Spaces, another online code hosting company, which had its data deleted and went out of business overnight because of it!

GitLab had live video and a live Google Doc of the troubleshooting:

  • Google Doc of the troubleshooting
  • Their post-mortem
  • 8 hours of live troubleshooting (video)

————–

If they had NetApp gear, or Cloud ONTAP, they could have used SnapRestore or single-file SnapRestore to pull the deleted files back from snapshots within seconds.
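To make that concrete, here is a minimal sketch of the single-file idea, under my own assumptions (paths and snapshot names are made up, and this shows the client-side copy-back from NetApp's read-only .snapshot directory rather than the SnapRestore admin command itself):

```python
import shutil
from pathlib import Path

# Hypothetical paths: NetApp volumes expose read-only snapshots under
# ".snapshot", so a deleted file can simply be copied back out of one.
SNAPSHOT = Path("/vol/appdata/.snapshot/hourly.0")
LIVE = Path("/vol/appdata")

def restore_file(relative_path: str) -> None:
    """Copy one file from a snapshot back into the live filesystem."""
    src = SNAPSHOT / relative_path
    dst = LIVE / relative_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # preserves timestamps and permissions

restore_file("config/app.yml")  # example: one accidentally deleted file
```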

When you plan for disaster recovery and backup (and those two are NOT the same thing), you have to design for at least the 3-2-1 rule.

3 Copies, 2 Locations, 1 Immutable. 

Another way to think about it: Online, Nearline, Offline.

  • Online = Snapshots (which are NOT backups)
  • Nearline = Replicas (instantly available at a different location)
  • Offline = Cloud, tape, write-once media (anything that is not overwritten and requires different credentials/keys to access)
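If you like checklists, here is a tiny, hypothetical sketch of checking a backup plan against that 3-2-1 minimum; the copy names and tiers are made up, not from any vendor:

```python
from dataclasses import dataclass

@dataclass
class Copy:
    name: str
    location: str      # e.g. "primary-dc", "dr-site", "cloud"
    tier: str          # "online", "nearline", or "offline"
    immutable: bool    # write-once, or protected by separate credentials/keys

def meets_3_2_1(copies: list[Copy]) -> bool:
    """3 copies, 2 locations, at least 1 immutable."""
    return (
        len(copies) >= 3
        and len({c.location for c in copies}) >= 2
        and any(c.immutable for c in copies)
    )

plan = [
    Copy("snapshots", "primary-dc", "online", immutable=False),   # NOT a backup by itself
    Copy("replica", "dr-site", "nearline", immutable=False),
    Copy("object-lock-archive", "cloud", "offline", immutable=True),
]
print(meets_3_2_1(plan))  # True
```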

Long, long ago, after being awake for three days preparing for a datacenter move (first mistake), I accidentally ran rm -rf in the wrong location and trashed my production database.  Luckily, thanks to good backups and NetApp SnapRestore, I was able to restore very quickly, catch a nap, and finish the move.

The moral of the story is: test your backups, test your application recovery, and DO NOT trust a hosted source code repository as the only backup of your code.  Otherwise, #SucksToBeYou!
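Since "test your backups" is easy to say and easy to skip, here is one hedged sketch of what an automated restore test might look like; the tar command and file names are placeholders for whatever your stack actually uses:

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def test_restore(backup_archive: Path, known_file: str, expected_sha256: str) -> bool:
    """Restore the backup into a scratch directory and verify one known file."""
    with tempfile.TemporaryDirectory() as scratch:
        # Placeholder restore step -- swap in pg_restore, borg extract, etc.
        subprocess.run(["tar", "-xzf", str(backup_archive), "-C", scratch], check=True)
        restored = Path(scratch) / known_file
        return restored.exists() and sha256(restored) == expected_sha256

# Example: verify last night's archive still contains a file you know well.
# print(test_restore(Path("/backups/nightly.tar.gz"), "db/schema.sql", "..."))
```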

 
