How to handle Discourse backups?

How should we handle Discourse backups? The two options we’re considering are to use only GitHub to store our files (without passwords, etc.) or to use GitHub plus AWS S3 or something similar. It’ll be just /var/www/discourse. I’d prefer the latter, because I’m not a fan of having a single place to store our backups. In addition, GitHub offers no SLA, while AWS guarantees 99.999999999% data durability for S3.

Thoughts?

Removing passwords would mean we can’t completely automate the backups, and
ideally the backups would be compressed and encrypted.

Right, forgot about that. I personally think our best route would be something like this script, which can handle encryption and then sync to S3.
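For the sake of discussion, here’s a minimal sketch of what such a script might look like, assuming gpg and the AWS CLI are installed and configured. The bucket name, passphrase file, and archive path are placeholders, not our real setup:

```shell
#!/bin/sh
# Hypothetical nightly backup: compress, encrypt, sync to S3.
# "example-backups" and /root/.backup-pass are placeholders.

BACKUP_DIR=/var/www/discourse
STAMP=$(date +%Y-%m-%d)
ARCHIVE="/tmp/discourse-$STAMP.tar.gz"

backup() {
    # Compress the site directory.
    tar -czf "$ARCHIVE" "$BACKUP_DIR"
    # Encrypt symmetrically so the archive is useless without the passphrase.
    gpg --batch --symmetric --passphrase-file /root/.backup-pass \
        --output "$ARCHIVE.gpg" "$ARCHIVE"
    # Push the encrypted archive to S3 and clean up the local copies.
    aws s3 cp "$ARCHIVE.gpg" "s3://example-backups/discourse/"
    rm -f "$ARCHIVE" "$ARCHIVE.gpg"
}
```

Running it from cron nightly would cover the automation side, once the passphrase file question is settled.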

I’m not sure that we need to be backing up anything that’s in a git repo.
If the servers need restoring, you do a git pull. If for some crazy reason (which I have yet to see) GitHub suddenly crashes and loses all of our stuff, we go do git remote add, git push.
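To illustrate the recovery path, something like this would do it (the remote name “backup” and the URL argument are placeholders):

```shell
# Sketch only: re-point an existing checkout at a fresh remote
# and push everything we have. "backup" is a made-up remote name.
repoint_remote() {
    # $1 = URL of the new remote
    git remote add backup "$1"
    git push -q backup --all
}
```

Since every checkout is a full copy of the history, any web node could seed the new remote.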

You mention passwords. Do you mean the passwords for the DB?

Before this turns into another argument that leads nowhere, maybe @mrz or
@wdowling can chime in with some advice?

I see where tad is coming from. We ideally shouldn’t be modifying any core files of Discourse, so a checkout of a previous version is generally enough to restore. But we should definitely be backing up our config files somewhere, as they are obviously not stored in Discourse’s official repo.

I guess I could play devil’s advocate and say, “Hey, we don’t control Discourse’s repository! What if it gets completely borked and we can’t restore a previous version?” In that case, to play it safe, we could back up our entire installation for redundancy. I’d like some more input on whether that would be a necessary thing.

Config files should be in a git repo, and we should have a fork of Discourse (with our configs).
Passwords should be stripped, and we can add them back in using something similar to what JP uses: an S3 bucket containing the files with passwords, plus a shell script which downloads them and echoes them to where they are needed. We can execute the script using Puppet.
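Roughly what I have in mind — the fetch_secret helper, the bucket name, and the example paths below are all hypothetical; the real keys and destinations would be driven by Puppet:

```shell
#!/bin/sh
# Sketch of the secret-injection step.
# "example-secrets" is a placeholder bucket name.
fetch_secret() {
    # $1 = key in the secrets bucket, $2 = destination file
    aws s3 cp "s3://example-secrets/$1" "$2"
    # Lock the file down, since it holds credentials.
    chmod 600 "$2"
}

# Illustrative usage:
# fetch_secret discourse/database.yml /var/www/discourse/config/database.yml
```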

I can understand that redundancy is important, but we could go on forever adding more redundancy. Think about it this way:
Our Git Repo = Mirror of web01 = Mirror of web02

Remember that every time we add another node, that again is another mirror.

I like the idea of storing our configs in Git. Is there a way to take snapshots of the server images in AWS?


Yup. However, restoring from them requires creating a new instance, according to @yousef.

That’s fine and expected - pretty much the same as in HP.

It’s also best if the server is shut down first; according to Amazon, the snapshot otherwise may not copy everything over properly.
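For reference, a sketch of that stop–snapshot–start cycle with the AWS CLI; the function name is made up, and the instance ID is passed in as an argument:

```shell
#!/bin/sh
# Hypothetical helper: stop the instance, image it, start it again.
snapshot_instance() {
    # $1 = EC2 instance ID
    aws ec2 stop-instances --instance-ids "$1"
    aws ec2 wait instance-stopped --instance-ids "$1"
    # Create an AMI from the stopped instance for a consistent copy.
    aws ec2 create-image --instance-id "$1" \
        --name "discourse-$(date +%Y-%m-%d)"
    aws ec2 start-instances --instance-ids "$1"
}
```

The trade-off is a short window of downtime per snapshot, which is probably fine if we schedule it off-peak.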