Hey all,
Yousef and I worked on a monitoring MVP, and we’re looking for some feedback.
For an MVP monitoring setup, we need to be able to:
- See the status of all sites controlled by community ops
- Get alerted via VictorOps when a site goes down
- Ensure the monitoring solution is scalable and reliable
Other things we’d like monitoring to do in the future:
- Monitor community sites if desired by owners
- Store logs from various apps and be able to easily search them (ELK)
- Monitor hardware on legacy VPSes (storage, memory, etc.)
- Monitor mofo sites
- Automatically respond to events (like CloudWatch)
How we’re going to do it:
- CloudWatch can send alarms to Sensu, so we have everything in one place
- Single source of truth for all the monitoring stuff
- Sensu scales better than Nagios
- Allows agents to push status to RabbitMQ, rather than master pulling
- Requires less setup
- https://docs.ansible.com/ansible/sensu_check_module.html
- Checks run on the master server to monitor websites’ status
- https://sensuapp.org/docs/0.16/scaling_strategies (strategy 3 looks like the best option for remote checks)
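
To make the "checks run on the master server" item a bit more concrete, here is a rough sketch of what a site check could look like. It is not something we have deployed: the function names, the status thresholds in `classify`, and the 10-second timeout are all assumptions for illustration. The only thing it relies on from Sensu is that checks report their result through Nagios-style exit codes (0 OK, 1 warning, 2 critical, 3 unknown) plus a line of output.

```python
#!/usr/bin/env python3
"""Hypothetical site check that reports status via Nagios-style exit codes."""
import sys
import urllib.error
import urllib.request

# Exit codes that Sensu (like Nagios) interprets as check status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status_code):
    """Map an HTTP status code to an exit code (thresholds are an assumption)."""
    if 200 <= status_code < 400:
        return OK
    if 400 <= status_code < 500:
        return WARNING
    return CRITICAL


def check_site(url, timeout=10):
    """Fetch url once and return (exit_code, message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses are raised as exceptions
    except ValueError as err:
        return UNKNOWN, f"UNKNOWN: bad URL {url!r} ({err})"
    except OSError as err:  # covers URLError, refused connections, timeouts
        return CRITICAL, f"CRITICAL: {url} unreachable ({err})"
    code = classify(status)
    return code, f"{LABELS[code]}: {url} returned HTTP {status}"


def main(argv):
    """Run the check for the URL in argv[1] and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN: expected exactly one URL argument")
        return UNKNOWN
    code, message = check_site(argv[1])
    print(message)
    return code


# Only act as a script when an argument was actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as, say, `check_site.py` (a made-up name), it would be run with the site URL as its only argument, and that command line is what the Sensu check definition would point at. Treating 4xx as a warning rather than critical is a judgment call we would want to revisit once we know which sites page VictorOps.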