Google - Site Reliability Engineering
Foreword
Preface
Part I - Introduction
1. Introduction
2. The Production Environment at Google, from the Viewpoint of an SRE
Part II - Principles
3. Embracing Risk
4. Service Level Objectives
5. Eliminating Toil
6. Monitoring Distributed Systems
7. The Evolution of Automation at Google
8. Release Engineering
9. Simplicity
Part III - Practices
10. Practical Alerting
11. Being On-Call
12. Effective Troubleshooting
13. Emergency Response
14. Managing Incidents
15. Postmortem Culture: Learning from ...
Read more at sre.google