##Updates
I’m working with my LOPSA mentor (darkfader on IRC) to set up monitoring. He suggested we use OMD, the Open Monitoring Distribution, which is pretty much a package of several monitoring tools, all Nagios-based. It’s running right now here; you can log in with the password you use for your shell account on csa-monitor1. If you don’t have one, talk to me and I’ll get you set up. Check_mk is pretty nice, and the other tools bundled with it look useful too.
##On-call rotation
Before I say anything else: I realize this is early, but I’d like to get a discussion going now so that once monitoring is fully functional, I can just add the time periods and go. Ideally we’d use something like PagerDuty, but that’s not cheap, and the similar services that do have free plans don’t do what I want.
First of all, who actually wants to be on it? If you do, realize that (for a while, at least) you may be woken up at 3 AM to fix something. Emphasis on “may”, but you can never anticipate problems.
If you want to be in the rotation, what time periods are okay for you? I think we should have somebody on-call 24/7, which will be difficult with so few of us.
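To see how far the hours people offer get us toward 24/7 coverage, a small script can list the hours of the day nobody has claimed. This is only a rough sketch: the names, the availability windows, and the `coverage_gaps` helper are all made up for illustration, and it only looks at a single day rather than a full week.

```python
from typing import Dict, List, Tuple

# Hypothetical format: name -> list of (start_hour, end_hour) windows in
# 24-hour time, end exclusive. A window may wrap past midnight, e.g. (22, 6).
Availability = Dict[str, List[Tuple[int, int]]]


def covered_hours(windows: List[Tuple[int, int]]) -> set:
    """Return the set of hours (0-23) covered by the given windows."""
    hours = set()
    for start, end in windows:
        if start < end:
            hours.update(range(start, end))
        else:
            # Window wraps past midnight; equal start and end means all day.
            hours.update(range(start, 24))
            hours.update(range(0, end))
    return hours


def coverage_gaps(availability: Availability) -> List[int]:
    """Return the hours of the day that nobody is available for."""
    covered = set()
    for windows in availability.values():
        covered |= covered_hours(windows)
    return [hour for hour in range(24) if hour not in covered]


# Example with made-up people and hours.
example = {
    "alice": [(8, 18)],   # daytime
    "bob": [(22, 6)],     # overnight, wraps past midnight
}
gaps = coverage_gaps(example)
print("Uncovered hours:", gaps)
```

Any hours it reports are the ones we would still need a volunteer for before the time periods go into the monitoring config.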
##Notifications
I’m trying to figure out the best way to handle notifications. Right now I’m leaning toward using SNS/SES for everything. That way we don’t have to worry so much about keeping our mail servers up or about relying on (apparently slow) SMS gateways. The downside to using SNS for SMS is that it only works in the US, so push notifications may be something else to look into.
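For a sense of what the SNS route could look like, here is a minimal sketch of a notification script that publishes an alert to an SNS topic using boto3’s `publish` call. Nothing here is the final setup: the `ONCALL_TOPIC_ARN` environment variable, the argument order, and the function names are assumptions for illustration, and the monitoring system would need a notification command that passes the host, service, state, and output to it.

```python
import os
import sys


def build_alert(host: str, service: str, state: str, output: str):
    """Build the (subject, message) pair for an alert."""
    # SNS subjects are limited to 100 characters, so truncate to be safe.
    subject = f"{state}: {service} on {host}"[:100]
    message = (
        f"Host: {host}\n"
        f"Service: {service}\n"
        f"State: {state}\n\n"
        f"{output}"
    )
    return subject, message


def send_alert(sns_client, topic_arn: str, host: str, service: str,
               state: str, output: str):
    """Publish an alert to the topic; every subscriber receives it."""
    subject, message = build_alert(host, service, state, output)
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=message,
    )


def main() -> None:
    # boto3 is imported here so the helpers above can be used without it.
    import boto3

    # Hypothetical environment variable holding the on-call topic's ARN.
    topic_arn = os.environ["ONCALL_TOPIC_ARN"]
    # Assumed usage: script.py HOST SERVICE STATE OUTPUT
    host, service, state, output = sys.argv[1:5]
    send_alert(boto3.client("sns"), topic_arn, host, service, state, output)
```

The appeal of a topic is that whoever is on call just subscribes their email address or phone number, and the monitoring side only ever has to know about one ARN.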