by Alan Chen
A new tool to prevent catastrophic deletions like GitLab’s
Basically: I found most of the existing tools not very helpful, so I made a new open source tool called rm-protection, which you can download from GitHub.
I was riding a bus back to my dorm and I almost fell asleep. Suddenly one of my friends sent me a message on Telegram: “GitLab deleted its production database and they are now live streaming database recovery on YouTube!”
My head bumped into the seat in front of me. I couldn’t feel the pain, but I felt sorry for the ops team and wanted to #hugops while reading their incident log.
Aren’t we all humans who make mistakes? But some data are just too important to lose: a production database, for example, or photos with family and friends.
I have a deeply ingrained fear of losing data. I started playing around with Linux in primary school, and I had only a PC with a single hard drive at that time. As a child and a Linux newbie, I was more careless than most sophisticated users. Once, I accidentally deleted a whole partition — not only the system files, but also my home directory.
I still remember the horror when everything crashed and I realized that I had just deleted all my photos. Then I cried tears of happiness when photorec got some of them back.
A quick survey of current tools for preventing this
I got off the bus, walked back to my room and started searching for prevention methods. I had heard of some before:
rm -i asks for an additional confirmation for every single file and directory. It is tedious to confirm everything you definitely want to remove, and it reminds me of the story of The Boy Who Cried Wolf.
Warning about everything is like warning about nothing. What’s worse, some users have developed a habit of running rm -rf regularly, where the -f option cancels the prompt that -i would otherwise give.
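Why doesn’t an rm='rm -i' alias save you from rm -rf? GNU rm documents that -f ignores any earlier -i and -i ignores any earlier -f, so whichever comes last wins. The alias turns rm -rf old_dir into rm -i -rf old_dir, and the trailing -f switches prompting off. The hypothetical helper rm_will_prompt below models just that rule; it is a simplified sketch that ignores long options and -I, not a reimplementation of rm’s option parser.

```python
from __future__ import annotations


def rm_will_prompt(args: list[str]) -> bool:
    """Simplified model of GNU rm's -i/-f precedence: the last one given wins.

    Hypothetical helper for illustration; only short options are inspected,
    long options and -I are ignored.
    """
    prompt = False
    for arg in args:
        if arg == "--":
            break  # everything after "--" is a file name, not an option
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 1:
            for flag in arg[1:]:  # walk clustered flags such as -rf
                if flag == "i":
                    prompt = True   # -i cancels an earlier -f
                elif flag == "f":
                    prompt = False  # -f cancels an earlier -i
    return prompt


# With alias rm='rm -i', typing `rm -rf old_dir` actually runs `rm -i -rf old_dir`.
print(rm_will_prompt(["-i", "-r", "old_dir"]))   # True: rm prompts before each removal
print(rm_will_prompt(["-i", "-rf", "old_dir"]))  # False: the later -f wins
```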
In the case of GitLab’s incident, rm -i would not have helped: the engineer knew which directory he was going to delete, but not which machine he was on. He could just as easily have typed “yes” and hit Return.
Safe-rm wouldn’t have helped, either.
Safe-rm has a configuration file that contains a list of paths you want to protect. It comes with some default paths such as /usr/lib, and users can also create their own lists. What Safe-rm does is provide an extra warning beforehand, but that warning gives no information about why it is stopping you.
Think about GitLab’s situation: the engineer could have just hit “y” (he thought it was not the production database, so why not hit “y”?). Plus, Safe-rm does not provide symbolic link or recursion protection.
(Another tool, rmfd, is a fork of GNU coreutils with a protection mechanism similar to Safe-rm’s.)
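To make “symbolic link and recursion protection” concrete, here is a minimal sketch of what such a check could look like. It is an illustration only, not Safe-rm’s or rmfd’s code; violated_path and the PROTECTED list are hypothetical. It resolves symlinks with os.path.realpath, so a link pointing at a protected directory is caught, and it also refuses any target that contains a protected path, since a recursive delete of /usr would reach /usr/lib. Resolving the final symlink errs on the side of caution: rm itself would only remove the link.

```python
import os
from typing import List, Optional

# Hypothetical protected list, in the spirit of Safe-rm's configuration file.
PROTECTED = ["/usr/lib", "/etc"]


def violated_path(target: str, protected: List[str]) -> Optional[str]:
    """Return the protected path that removing `target` would destroy, else None."""
    real = os.path.realpath(target)  # follow symlinks to the real location
    for path in protected:
        real_protected = os.path.realpath(path)
        # True when `target` is the protected path itself or one of its
        # ancestors, i.e. a (recursive) delete of `target` would reach it.
        if os.path.commonpath([real, real_protected]) == real:
            return path
    return None


if __name__ == "__main__":
    for candidate in ["/usr", "/usr/lib", "/tmp/scratch"]:
        print(candidate, "->", violated_path(candidate, PROTECTED))
```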
The only tool I found useful for GitLab’s situation is trash-cli. So far it’s the best solution I know: it brings the trash can to the command line.
trash-cli can surely prevent about 90% of such accidents (including GitLab’s).
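The idea behind a command-line trash can is simple: move the file aside instead of unlinking it. The hypothetical move_to_trash helper below is a minimal sketch of that idea, not trash-cli’s code. trash-cli itself follows the FreeDesktop.org Trash specification and also records each file’s original path and deletion date so it can be restored, which this sketch leaves out.

```python
import shutil
import time
from pathlib import Path


def move_to_trash(path: str, trash_dir: str) -> Path:
    """Move `path` into `trash_dir` instead of deleting it; return its new location.

    Hypothetical helper for illustration, not trash-cli's implementation.
    """
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    src = Path(path)
    # Prefix a nanosecond timestamp so repeated names don't overwrite each other.
    stamp = time.time_ns()
    dest = trash / f"{stamp}-{src.name}"
    while dest.exists():
        stamp += 1
        dest = trash / f"{stamp}-{src.name}"
    shutil.move(str(src), str(dest))
    return dest
```

Nothing is really gone until the trash directory itself is emptied. Note also that moving a file does not free any space if the trash lives on the same disk.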
But what if you realize something’s missing long after emptying the trash?
Or imagine you are running out of space, but you have tons of data to write to the disk. You’re in a rush to free up space (say, by removing a database filled with spam messages). Would you still carefully check the trash?
trash-cli wasn’t the ultimate solution I was searching for.
Searching for Inspiration
The bathroom has always been a great place for eureka moments.
It was 11 pm. I took a shower and kept talking myself through a solution for preventing incidents like GitLab’s.
“What exactly causes accidents like the one that happened at GitLab?”
“Not knowing what you’re doing.”
It was still in the middle of spring break, and there was nobody around at my university. So I could talk to myself in the bathroom without people thinking I was crazy.
“How do you let users know clearly what they are doing?”
“Hmmm, maybe by making them say it out loud?”
I rushed out of the bathroom.
Inventing the Tool
I immediately messaged my friends about my idea: users “protect” important files and directories during the deployment period. The protection is done by setting a safety question and an answer.
Imagine this: when GitLab’s ops team deploys databases to the production server, they also “protect” the database directories by setting up a question, “What database are you deleting? (db1/db2)”, and an answer, “db1”.
Afterwards, upon any attempt to remove these directories, a modified version of rm will ask you the question. Unless you get the answer right, you won’t be able to proceed.
GitLab’s engineer couldn’t possibly have entered “db1” when he thought it was “db2”. Forcing him to state what he was doing could have saved GitLab’s database.
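Setting up a protection at deployment time could look something like the sketch below. It uses the hidden .<filename>.rm-protection sibling file that rm-p.py looks for; the protect and protection_file_for functions and the two-line file format (question on the first line, answer on the second) are illustrative assumptions, not rm-protection’s actual on-disk format.

```python
from pathlib import Path


def protection_file_for(target: str) -> Path:
    """Hidden sibling file holding the question for `target`: .<name>.rm-protection."""
    p = Path(target)  # pathlib drops a trailing slash, so "db1/" and "db1" match
    return p.parent / f".{p.name}.rm-protection"


def protect(target: str, question: str, answer: str) -> Path:
    """Record a safety question and its answer for `target` (illustrative format)."""
    if "\n" in question or "\n" in answer:
        raise ValueError("question and answer must each fit on one line")
    pfile = protection_file_for(target)
    pfile.write_text(f"{question}\n{answer}\n", encoding="utf-8")
    return pfile


# At deployment time, e.g. for a (hypothetical) primary database directory:
# protect("/var/lib/db1", "What database are you deleting? (db1/db2)", "db1")
```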
So I wrote a Python script named “rm-p.py”. It’s a wrapper for rm that checks whether a corresponding .<filename>.rm-protection file (which I call a “protection file”) exists. When it finds one, it asks the question defined in that protection file.
If you get the answer right, rm-p.py passes your argument on to rm. If you don’t, it doesn’t. Of course, it still passes non-protected files to rm.
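Put together, the check described above can be sketched like this. It is a minimal illustration under the same assumption of a two-line protection file, not the actual rm-p.py source; allowed_targets, main and the printed messages are hypothetical, and the option parsing is deliberately naive (before a --, anything starting with - is treated as an option).

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List


def protection_file_for(target: str) -> Path:
    """Hidden sibling file holding the question for `target`."""
    p = Path(target)
    return p.parent / f".{p.name}.rm-protection"


def allowed_targets(targets: List[str], ask: Callable[[str], str] = input) -> List[str]:
    """Keep unprotected targets, plus protected ones whose question was answered right."""
    allowed = []
    for target in targets:
        pfile = protection_file_for(target)
        if not pfile.is_file():
            allowed.append(target)  # not protected: pass straight through
            continue
        lines = pfile.read_text(encoding="utf-8").splitlines()
        if len(lines) < 2:
            # Malformed protection file: fail closed instead of deleting.
            print(f"rm-p: unreadable protection file for {target}", file=sys.stderr)
            continue
        question, answer = lines[0], lines[1]
        if ask(question + " ").strip() == answer.strip():
            allowed.append(target)
        else:
            print(f"rm-p: wrong answer, keeping {target}", file=sys.stderr)
    return allowed


def main(argv: List[str]) -> int:
    # Naive split: before "--", arguments starting with "-" are options.
    if "--" in argv:
        cut = argv.index("--")
        head, tail = argv[:cut], argv[cut + 1:]
    else:
        head, tail = argv, []
    options = [a for a in head if a.startswith("-") and a != "-"]
    targets = [a for a in head if not (a.startswith("-") and a != "-")] + tail
    keep = allowed_targets(targets)
    if not keep:
        return 1
    # Hand the surviving targets to the real rm, with the original options.
    return subprocess.run(["rm", *options, "--", *keep]).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The ask parameter defaults to input() but can be swapped out, which keeps the decision logic testable without a terminal.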
I called this little script rm-protection and made a logo for it.
Now the package rm-protection is available on PyPI and the source code is on GitHub.
What ultimately protects you?
For companies and teams, backups are surely the most important protection against data loss. They shield you not only from fat-finger mistakes, but also from natural disasters.
But for individuals, comprehensive backups aren’t always economical or convenient.
Setting aside a lack of backups, bad habits are almost always the source of these fat-finger mistakes.
We have invented so many tools to deal with these bad habits, yet they can lead users to form new ones.
“The best and only right way is to double-check what you are going to delete.”
Or so some may say. But few can live with having to confirm every single deletion. Thus, rm -rf becomes their new bad habit.
Current tools protect you either before accidental deletions (like rm -i or safe-rm) or after them (like trash-cli). The former often cause more trouble than expected in daily operations.
The latter, like trash-cli, don’t provide protection upfront. Chances are, you’ll still lose the important file.
After putting some thought into the issue, I realized that there’s no such thing as an ultimate solution.
rm-protection is just another layer of protection. It is not the most vital part of your defenses, but it can save you the tons of time you would otherwise spend recovering data from backups.
rm-protection does not bother you when it is not necessary, so you keep the flexibility and efficiency of your daily operations. When it truly matters that you don’t delete something, it asks you a question you set yourself.
To be 99% safe, what you need is a combination of good habits, a careful and clear mind, working backups, a good protection method, and luck.
The Best Practice
To sum up, you should do the following to ensure the safety of your data:
- Make backups.
- Check backups regularly.
- Keep a clear head. Don’t use
- Add an additional protection layer: choose trash-cli or whichever tool you like.
And you should be 99% safe.