Companies running high-reliability services are getting better at defining their unique Site Reliability Engineering (SRE) needs and understanding which best practices to implement in their frameworks. However, the question of how to actually organize SRE teams remains difficult. Do you upskill your current team? Do you embed SREs within your software engineering (SWE) team, or do you build a separate team entirely?
There are different implementations of SRE teams that accommodate the various DevOps adoption stages and can exist simultaneously within an organization. As SREs gain experience, they will naturally progress from one type of implementation to another.
Embedded SREs vs. Dedicated SRE teams
Google lists six types of SRE teams as observed throughout the evolution of its SRE practice. The six implementation types can be primarily grouped into two categories: embedded SREs and dedicated SRE teams.
Dedicated SRE teams
Google’s SRE-only teams are highly specialized, focusing on specific actions such as maintaining shared services (infrastructure team), building software to improve system reliability (tools team), or running and scaling a critical application or business area (product/application team). These teams evolved from the first-ever SRE team at Google, known as the Kitchen Sink.
The Kitchen Sink or “Everything SRE” team implementation is generally the first and only SRE team in place and may expand organically over time, as in the Google example. The Kitchen Sink is recommended for companies that have outgrown what can be done without a dedicated SRE team but do not yet require multiple SRE teams.
Embedded SRE teams
These SRE teams are attached to a product, service, or application team. According to the Google approach, there is usually one SRE per team. The embedded SRE acts as a Subject Matter Expert, working closely with their SWE counterparts, usually on a project basis. In addition, embedded SREs have a hands-on role, updating the codebase and configuration of the services.
This type of implementation is best suited to starting an SRE function or scaling the impact of an existing SRE practice. By driving the adoption of SRE best practices, embedded SREs can help expand the SWE team’s positive impact.
The consulting implementation derives from the embedded approach, with the main difference that consulting SRE teams rarely make code and configuration changes. Also known as “Customer Reliability Engineers,” these SRE teams are recommended for large companies whose demand for SRE support has outgrown the capacity of their existing SRE teams.
Pros and Cons of Dedicated and Embedded SRE Teams
Dedicated SRE teams
Pros:
- A highly skilled team of engineers focusing on specific actions that improve reliability
- Help make other teams’ jobs easier and faster
- Align the business goal with the teams’ efforts to achieve that goal
Cons:
- As the company and its complexity grow, new teams will be required, leading to a potential divergence between teams and duplication of product focus
- Any issues regarding these teams may have a negative impact on the entire company
- Lack of direct contact with the customers may lead to improvements that do not reflect the end user’s experience
Embedded SRE teams
Pros:
- Provide SRE expertise to solve specific problems, such as reducing operational overload
- Drive adoption by demonstrating SRE best practices alongside day-to-day work
- Enable further scaling of the positive impact of the current SRE practice
Cons:
- Struggle with standardization and identity
- May lead to divergence between teams
- Consulting SRE teams may be considered hands-off, as they don’t usually change code and configuration
How do you decide which approach to take to organize your SRE team?
As with DevOps, there is no comprehensive guide to structuring SRE teams. The way you organize your SRE teams depends mainly on the organization’s maturity level. For example, if you are starting out on your SRE journey, you may want to consider assigning some engineering time to test out SRE-related practices. Although it may be time-consuming, this preliminary step allows you to evaluate your SRE needs and adapt the methodology accordingly, without significant investment or sudden organizational change.
Whether you are just embarking on your SRE implementation journey or need to scale your existing teams, start by evaluating your organization’s requirements. List the pros and cons of your existing SRE team implementation to better understand your team’s maturity level and to decide which type of implementation to adopt next.