Upon completing this course, participants will achieve an understanding of:
- The history of SRE and its emergence at Google
- The inter-relationship of SRE with DevOps and other popular frameworks
- The underlying principles behind SRE
- Service Level Objectives (SLOs) and their user focus
- Service Level Indicators (SLIs) and the modern monitoring landscape
- Error budgets and the associated error budget policies
- Toil and its effect on an organization’s productivity
- Some practical steps that can help to eliminate toil
- Observability as an indicator of the health of a service
- SRE tools, automation techniques and the importance of security
- Anti-fragility, our approach to failure and failure testing
- The organizational impact that introducing SRE brings
Prerequisites
Knowledge, Skills & Experience
- Prior knowledge of DevOps, which can be achieved by attending:
- IT14A05 - DevOps Foundation
- It is recommended that participants have prior working experience or knowledge in IT software development or IT industry operations.
Hardware & Software
This course will be conducted as a Virtual Live Class (VLC) via the Zoom platform. Participants must have a Zoom account and a laptop or desktop with “Zoom Client for Meetings” installed. The client can be downloaded from https://zoom.us/download
System Requirements
Must Have:
Please ensure that your computer or laptop meets the following requirements.
Good to Have:
Not Recommended: Tablets are not recommended because their smaller screens can cause eye strain and discomfort over the duration of the program.
Course Outline
Module 1: SRE Principles & Practices
- What is Site Reliability Engineering?
- SRE & DevOps: What is the Difference?
- SRE Principles & Practices
Module 2: Service Level Objectives & Error Budgets
- Service Level Objectives (SLOs)
- Error Budgets & Error Budget Policies
Module 3: Reducing Toil
- What is Toil?
- Why is Toil Bad?
- Doing Something About Toil
Module 4: Monitoring & Service Level Indicators
- Service Level Indicators (SLIs)
- Monitoring & Observability
Module 5: SRE Tools & Automation
- Automation Focus
- Hierarchy of Automation Types
- Secure Automation
- Automation Tools
Module 6: Anti-Fragility & Learning from Failure
- Why Learn from Failure
- Benefits of Anti-Fragility
- Shifting the Organizational Balance
Module 7: Organizational Impact of SRE
- Why Organizations Embrace SRE
- Patterns for SRE Adoption
- Sustainable Incident Response
- Blameless Post-Mortems
- SRE & Scale
Module 8: SRE, Other Frameworks, Trends
- SRE & Other Frameworks
- SRE Evolution
- Additional Sources of Information
Certificate Obtained and Conferred by
- Certificate of Completion from NTUC LearningHub
Upon meeting at least 75% attendance and passing the assessment(s), participants will receive a Certificate of Completion from NTUC LearningHub.
- Statement of Attainment from SkillsFuture Singapore
Upon meeting at least 75% attendance and passing the assessment(s), participants will receive a Statement of Attainment from SkillsFuture Singapore to certify that the participant has achieved the following Competency Standard(s):
- Quality Engineering (ICT-DIT-3011-1.1)
- External Certification Exam
After completing this course with at least 75% attendance and passing the official “DevOps Site Reliability Engineering Foundation” certification exam, candidates will receive a Certified Site Reliability Engineering Foundation certification from DevOps Institute. The certification is governed and maintained by DevOps Institute.