Homelab Backups


While moving from cloud-based services to self-hosting my own, I have realized that my backup process is pretty weak. I have been putting a lot of trust in cloud providers to keep my files safe forever. Of course, they are financially incentivised to do so: if they built unreliable services, nobody would use them. But as I take my tech fate into my own hands, this is something that I want to get right.

Research

Rsync

I started out by looking into various tools, beginning with rsync. rsync is a great tool that lets you sync files and directories between your machine and a remote server. It works securely because it runs over ssh, and it is performant and easy to use. This makes it a great option for a homelab. However, one thing it doesn’t do is talk to an S3 API, and I do not have another remote server outside of my house (yet). So I was thinking that I would use an S3-compatible service until my friend and I swap a few servers.

Rclone

rclone is marketed as a cloud-based version of rsync, but it can honestly do a lot more. You can interact with any remote server or storage service as if it were just a file system. I think that rclone is going to become an indispensable tool for me in the future, and it can do exactly what I want: rclone will get my files into an object storage service.
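As a rough sketch of what getting files into object storage could look like, here is a small wrapper around rclone copy. The function name and the remote_path value are made up for illustration; the remote itself has to exist in your rclone config already (created with rclone config). I use copy rather than sync here because copy does not delete anything at the destination.

```shell
# Copy a local directory into object storage with rclone.
# remote_path is a hypothetical "remote:bucket/prefix" target; the remote name
# must already be defined in your rclone config.
copy_to_object_storage() {
    source_dir="$1"
    remote_path="$2"
    # Show what would be transferred first, then do the real copy.
    rclone copy --dry-run "$source_dir" "$remote_path" || return 1
    rclone copy "$source_dir" "$remote_path"
}
```

Calling it would look something like copy_to_object_storage ~/documents backups:homelab/documents, where backups is whatever you named your remote.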

Git

After thinking about backups a little more, I realized that one piece of software I already love and find easy to use for backups is git. It is extremely easy to create many copies of a repository and sync them across many machines securely over ssh. I easily follow the 3-2-1 rule without trying. I normally have 1 or 2 machines that have a working copy of my git repos, and they are synced through a central ssh server. The version control lets me go back to any good state and is extremely robust. Git is certainly a good option for backing up text. For anything from Obsidian vaults to blog posts, this is going to be my option moving forward. However, it is obviously not a good option for things like nightly database backups and other non-text data.
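To make the git workflow concrete, here is a minimal sketch of pushing a repository to a central bare repository. The function name, the remote name backup, and the example URL are my own placeholders, and it assumes the bare repository already exists on the other side (created with git init --bare).

```shell
# Push every branch and tag of a repository to a bare "backup" remote.
# backup_url is a hypothetical location, e.g. ssh://user@host/srv/git/notes.git
# (a local path also works, which is handy for trying this out).
push_to_backup() {
    repo_dir="$1"
    backup_url="$2"
    # Register the remote once; later runs only update its URL.
    if git -C "$repo_dir" remote get-url backup >/dev/null 2>&1; then
        git -C "$repo_dir" remote set-url backup "$backup_url"
    else
        git -C "$repo_dir" remote add backup "$backup_url"
    fi
    git -C "$repo_dir" push --all backup && git -C "$repo_dir" push --tags backup
}
```

Every machine that clones from the central repository then holds a full copy of the history, which is where the extra copies come from.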

Restic

Restic is a CLI tool that lets you create encrypted snapshots of your data. It borrows some of git's vocabulary: you create a repository to store your data in, take snapshots of your data, and keep that repository in a remote location. (Restic can make use of rclone to reach a number of the different storage services.) You can keep your repository on S3-compatible services or simply on a local file system. Because it seems like such a robust tool, this is the option I will be using for now.

One concern I have is that the repository must be encrypted. For most of my applications I am working on a secure server anyway, so I am not concerned with encryption. I wish there was an option to leave the data unencrypted so I would not have to worry about losing or forgetting the password to my repository. It’s frustrating, but it is probably a good idea for me to practice password management and make sure I am storing my passwords so I do not lose them.
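One way to make the password harder to forget, sketched here rather than recommended as the right answer: restic can read the repository password from a file named by the RESTIC_PASSWORD_FILE environment variable. The function name and the path you pass in are placeholders of mine, and this assumes a file readable only by your user is acceptable on your machine.

```shell
# Create a random repository password in a file only the owner can read,
# and point restic at it. password_file is a hypothetical path.
create_password_file() {
    password_file="$1"
    mkdir -p "$(dirname "$password_file")"
    # Never overwrite an existing password: that would lock you out of the repo.
    if [ ! -e "$password_file" ]; then
        ( umask 077 && head -c 32 /dev/urandom | base64 > "$password_file" )
    fi
    # restic can read the password from the file named by this variable.
    export RESTIC_PASSWORD_FILE="$password_file"
}
```

The file still needs to live somewhere other than the machine being backed up, for example in a password manager, or the backup is useless when that machine dies.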

Restic Setup

Init Restic Repo

Restic is in the default Ubuntu apt repository, so installing it is just one sudo apt install restic away. To create a new repo you need to specify a location and a password. You can use CLI flags or environment variables to do this. I set the RESTIC_REPOSITORY and RESTIC_PASSWORD environment variables so I didn’t have to keep typing these values over and over again. To start with, I created a local repository at ~/backups/<project-name> with a simple password I would remember. (All of the commands I run from here assume that you have these environment variables set up.)
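Put together, the setup could look roughly like the sketch below. The function name and the project name argument are placeholders; it assumes the password is already exported, and it uses the config file that restic writes into a repository as a cheap "already initialized" check.

```shell
# Initialize a local restic repository for one project, if it does not exist yet.
# Assumes RESTIC_PASSWORD (or RESTIC_PASSWORD_FILE) is already exported.
init_repo() {
    project_name="$1"   # hypothetical project name, e.g. "blog"
    export RESTIC_REPOSITORY="$HOME/backups/$project_name"
    mkdir -p "$(dirname "$RESTIC_REPOSITORY")"
    # An initialized restic repository contains a "config" file.
    if [ ! -f "$RESTIC_REPOSITORY/config" ]; then
        restic init
    fi
}
```

Because the function exports RESTIC_REPOSITORY, any restic command run later in the same shell picks up the same repository.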

Creating Snapshots

Now that the repository is created, you just have to run

restic backup .

in the directory you are trying to save. This creates a new snapshot of your data. Restic deduplicates any data that was already saved in a previous snapshot. Backup number one, done. You can list the snapshots you have created with

restic snapshots

Restoring Snapshots

Restoring snapshots is obviously the point of a backup, and it’s fairly straightforward. Get your snapshot ID from restic snapshots and use the restore command to place your files in a target folder specified by the --target flag.

restic restore <snapshot-id> --target /tmp/test-backup
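A backup you have never restored is only a hope, so here is a sketch of a restore test. The function name is mine. It assumes the environment variables are exported and that the restored tree has the same layout as the source directory; if your snapshot stores a longer path, point the diff at the matching subdirectory instead.

```shell
# Restore a snapshot into a scratch directory and compare it with the live data.
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are exported, and that the
# restored tree mirrors source_dir directly (an assumption, see above).
verify_restore() {
    source_dir="$1"
    snapshot_id="${2:-latest}"   # "latest" selects the most recent snapshot
    target_dir="$(mktemp -d)"    # left in place afterwards for inspection
    restic restore "$snapshot_id" --target "$target_dir" || return 1
    # diff exits non-zero if any file differs or is missing on either side.
    diff -r "$source_dir" "$target_dir"
}
```

Files that changed since the snapshot was taken will show up as differences, so this is most useful right after a backup.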

Expiring Old Snapshots

Storage is not unlimited. To keep the storage used by your snapshots bounded, you need to remove old snapshots you no longer need. This takes two commands: forget and prune.

To forget a snapshot you can run forget like this.

restic forget <snapshot-id>

However, this is not very useful on its own. The forget command also has powerful policy flags that let you keep only the snapshots that you actually need. For my homelab, I want to do daily backups and keep the past 5 days. Even that is probably overkill, but I can specify the behavior like this.

restic forget --keep-daily 5

This command keeps the most recent snapshot for each of the last 5 days that have a snapshot, and forgets the rest. So this will give me some flexibility to recover things if I have a really bad day.
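Before trusting a retention policy with real snapshots, it can be previewed. The sketch below uses the --dry-run flag of forget, which reports what would be kept and removed without touching the repository; the function name and the default of 5 are just my choices.

```shell
# Preview which snapshots a retention policy would remove, without changing
# the repository. Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are exported.
preview_expiry() {
    keep_days="${1:-5}"
    restic forget --keep-daily "$keep_days" --dry-run
}
```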

On its own, forget does not remove any data; it only removes the snapshot record. You won’t be able to go back to that snapshot, but the obsolete data it referenced stays in your repository, so the repository won’t change size. To actually free the space, you then have to prune your repository.

restic prune

The prune process can also be chained with forget for a one line command that looks like this.

restic forget --keep-daily 5 --prune

The restic docs suggest running restic check afterwards, which verifies that the repository is still consistent and that no data was corrupted during the pruning process.
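The whole expiry step, including the check, fits in a few lines. This is a sketch with a function name I made up, and it assumes the environment variables are exported.

```shell
# Forget old snapshots, prune the data they referenced, then verify the repo.
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are exported.
expire_and_check() {
    keep_days="${1:-5}"
    # Stop here if forget/prune fails, so check's result is not mistaken
    # for a successful expiry.
    restic forget --keep-daily "$keep_days" --prune || return 1
    restic check
}
```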

Automating Backups

Restic does not come with a daemon that we can run to automate snapshots, and your backup system should obviously run automatically. So we can reach for cron or a systemd service and timer to run this command for us.

First, let’s create our snapshot service.

This file defines a command that systemd can run periodically. There are a few things to notice. First, After=network.target tells systemd to order this service after the network has been set up, which we want because our backups should be going to a remote location. Second, the service type is oneshot: a service that runs once and then exits, which is exactly what we want.

# /etc/systemd/system/create-snapshot.service
[Unit]
Description=Restic Backup Service
After=network.target

[Service]
Type=oneshot
Environment="RESTIC_REPOSITORY=<repository-path>"
Environment="RESTIC_PASSWORD=<password>"
ExecStart=restic backup .
User=cline
WorkingDirectory=<backup-dir>

[Install]
WantedBy=multi-user.target

You can test the service by running the commands below. The status command will print the output from the restic command.

sudo systemctl start create-snapshot.service
sudo systemctl status create-snapshot.service

Then we can create a timer to run this service automatically.

# /etc/systemd/system/create-snapshot.timer
[Unit]
Description=Run create-snapshot.service every hour
Requires=create-snapshot.service

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Now we need to expire our snapshots automatically. We will create another service and timer to do so.

# /etc/systemd/system/expire-snapshots.service
[Unit]
Description=Restic Expire Backup Service
After=network.target

[Service]
Type=oneshot
Environment="RESTIC_REPOSITORY=<repository-path>"
Environment="RESTIC_PASSWORD=<password>"
ExecStart=restic forget --keep-daily 5 --prune
User=cline
WorkingDirectory=<backup-dir>

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/expire-snapshots.timer
[Unit]
Description=Run expire-snapshots at midnight
Requires=expire-snapshots.service

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

With these units in place, a new snapshot is created every hour, and once a day the expire service forgets and prunes the snapshots we no longer need, which keeps the repository small.
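The unit files do nothing until systemd has loaded them and the timers are enabled. A sketch of that last step, using the unit names from the files above (the function name is my own):

```shell
# Load the new unit files, then enable and start both timers.
enable_backup_timers() {
    sudo systemctl daemon-reload
    # --now starts the timers immediately in addition to enabling them at boot.
    sudo systemctl enable --now create-snapshot.timer expire-snapshots.timer
    # Show when each timer will fire next.
    systemctl list-timers create-snapshot.timer expire-snapshots.timer
}
```

The list-timers output is a quick way to confirm that the hourly and daily schedules were picked up.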

What is still left to do is to find a good place to store your backups. Check out the restic docs for the specifics on the different repository locations.
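Since an S3-compatible service is where I am headed, here is a sketch of what switching the repository location could look like. Restic's S3 backend takes a repository string of the form s3:https://endpoint/bucket and reads credentials from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables. The function name, endpoint, and bucket are placeholders, not a real service.

```shell
# Point restic at a bucket on an S3-compatible service.
# endpoint and bucket are hypothetical placeholders; credentials come from
# the standard AWS_* variables, which must be exported beforehand.
use_s3_repo() {
    endpoint="$1"   # e.g. s3.example.com (hypothetical)
    bucket="$2"
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
        return 1
    fi
    export RESTIC_REPOSITORY="s3:https://$endpoint/$bucket"
}
```

After that, the same restic init, backup, forget and prune commands work unchanged, because they only look at RESTIC_REPOSITORY.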