GitLab certainly had a #SucksToBeYou week. Earlier this week, an admin at GitLab had a bad thing happen: too tired, rm -rf in the wrong location, and unfortunately poor backup practices on top of it.
Props to them for being upfront.
This incident rang a bell, reminding me of Code Spaces, another online code hosting company, which had its data deleted and went out of business overnight because of it!
GitLab streamed live video of the recovery and kept a live Google Doc of the troubleshooting notes: 8 hours of live troubleshooting.
————–
IF they had NetApp gear or Cloud ONTAP, they could have used SnapRestore or Single File SnapRestore to pull the deleted files back from a snapshot within seconds.
When you plan for disaster recovery and backup (and those two are NOT the same thing), you have to design for the 3-2-1 rule at a minimum:
3 Copies, 2 Locations, 1 Immutable.
Another way to think about it: Online, Nearline, Offline (there's a quick sanity-check sketch after this list).
- Online = Snapshots (which are NOT backups)
- Nearline = Replicas (instantly available at a different location)
- Offline = Cloud, tape, write-once (anything that is not overwritten and requires different credentials/keys to get to)
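To make that concrete, here's a minimal Python sketch of what checking your copies against that rule could look like. The BackupCopy fields, locations, and tiers are all hypothetical stand-ins for however you actually inventory your backup copies; the point is that all three conditions have to hold at once.

```python
# Hypothetical 3-2-1 sanity check: 3 copies, 2 locations, 1 immutable.
from dataclasses import dataclass
from typing import List

@dataclass
class BackupCopy:
    location: str    # e.g. "primary-dc", "dr-site", "s3-archive" (made-up names)
    tier: str        # "online", "nearline", or "offline"
    immutable: bool  # write-once / cannot be overwritten with prod credentials

def meets_3_2_1(copies: List[BackupCopy]) -> bool:
    enough_copies = len(copies) >= 3
    enough_locations = len({c.location for c in copies}) >= 2
    has_immutable = any(c.immutable for c in copies)
    return enough_copies and enough_locations and has_immutable

copies = [
    BackupCopy("primary-dc", "online", immutable=False),   # snapshots
    BackupCopy("dr-site", "nearline", immutable=False),    # replica
    BackupCopy("s3-archive", "offline", immutable=True),   # write-once cloud copy
]
print(meets_3_2_1(copies))  # True
```

Snapshots alone fail this check on purpose: one copy, one location, nothing immutable.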
Long long long long ago, after being awake for 3 days preparing for a datacenter move (first mistake), I accidentally ran rm -rf in the wrong location and messed up my prod database. Luckily, thanks to good backups and NetApp SnapRestore, I was able to restore very quickly, catch a nap, and finish the move.
The moral of the story is: test your backups, test your application recovery, and DO NOT trust a hosted source code repository as the only copy of your code. Otherwise, #SucksToBeYou!
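On that last point, keeping your own mirror of a hosted repo is cheap. Here's a minimal sketch, with a made-up remote URL and backup path, of the kind of script you could run on a schedule; git clone --mirror plus a periodic fetch gives you a full second copy of every branch and tag.

```python
# Hypothetical repo-mirroring sketch; the URL and paths are placeholders.
import subprocess
from pathlib import Path

REPO_URL = "git@gitlab.example.com:team/app.git"    # hypothetical hosted remote
MIRROR_DIR = Path("/backup/git-mirrors/app.git")    # hypothetical local copy

def refresh_mirror() -> None:
    if MIRROR_DIR.exists():
        # Existing mirror: fetch all refs, pruning anything deleted upstream.
        subprocess.run(
            ["git", "--git-dir", str(MIRROR_DIR), "fetch", "--prune", "origin"],
            check=True,
        )
    else:
        # First run: create a bare mirror clone with every branch and tag.
        MIRROR_DIR.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["git", "clone", "--mirror", REPO_URL, str(MIRROR_DIR)],
            check=True,
        )

if __name__ == "__main__":
    refresh_mirror()
```

That mirror is still only a second online copy, of course; it counts toward the 3 copies, not toward the 1 immutable.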