COVID-19 Made Site Reliability an Essential Practice

SRE Tools, DevOps Culture
May 13, 2021
Written by Harrison Clarke
3 minute read

Long before the global health pandemic forced us to reshape the way we work and interact, the tech world had established the norm of providing reliable, uninterrupted service at all times. From giant corporations to startups, tech companies understood that failing systems cost money. The need to plan for such failures and respond to them quickly is what led an increasing number of companies to implement Site Reliability practices.

Beyond playing a crucial role in cost-effectiveness, site reliability ultimately helps companies win people’s trust that their systems will always work. The tech world proved that Site Reliability Engineering (SRE) enhances product development and adds value to a business by ensuring that the company can continuously deliver its products or services to its customers. While the benefits were undeniable and the practice quickly gained popularity, implementing reliability engineering remained second in line in companies’ strategic plans. With no real urgency looming on the horizon, many companies viewed Site Reliability as a nice thing to have someday.

The day came sweeping in sooner than anyone had expected and disrupted everyone’s lives and executives’ priorities. In a matter of months, the COVID-19 crisis set in motion an unprecedented technological acceleration. According to a recent survey, digital transformation leaped forward by three to four years for customer and supply-chain interactions and internal operations, and by seven years at a global scale (six years in North America) for digital or digitally-enabled products and services available in companies’ portfolios.

Furthermore, this acceleration in digitization happened across all sectors and industries. As consumers flocked to online channels, companies had to respond and adapt quickly to the emerging demands in order to stay competitive. As with consumer-oriented operations, the pandemic also accelerated the digitization of core internal operations.

Another significant impact of the health crisis, and one that is most likely here to stay, was the shift to remote work. The same survey found that companies responded 40 times faster than they would have before the pandemic. In sectors and industries where remote work was possible, it took on average 11 days to implement a viable solution.

What is interesting to note is how fast these changes happened, how quickly executives responded to their employees’ new demands, and why that was possible. When the world faced a time of extraordinary uncertainty, technology was the reliable support that allowed companies and individuals to adapt quickly and ensure the continuous functioning of operations. While technology capabilities are undoubtedly the primary reason for the successful transition to digital, Site Reliability Engineering keeps everything running smoothly. The pandemic highlighted the importance of a strategic approach to technology. Companies finally understood that Site Reliability was not just another means to reduce costs, but an essential practice in securing a competitive edge. 

It comes as no surprise that the organizations that responded most successfully to the crisis were those with well-established technological capabilities and a reliability-centered culture. For those without them, this was a lesson to learn. Perhaps ironically, the COVID-19 crisis represented the highest uncertainty anyone could have expected, and the one practice that is, by design, meant to manage high levels of uncertainty and risks of failure is Site Reliability Engineering.

The pandemic challenged the way we interact and function as individuals and as a society. In order to deliver faster and meet customers’ dramatic shift in demand, executives were forced to change their mindsets with regard to both technology’s strategic importance and leadership. Prior to the pandemic, most executives placed cost savings at the top of their priorities. Since the beginning of the crisis, a large number of companies have significantly increased funding for digital initiatives in an effort to maintain a competitive advantage, while others have remodeled their whole business around digital technologies. Additionally, executives had to rethink their leadership approach to better respond to their organization’s needs in times of crisis. More than ever, this required transparent communication and a clear, shared focus on continuing to operate reliably. This heightened need for connectedness further established Site Reliability as an essential practice.

Some of the changes the COVID-19 crisis generated will have indisputably long-lasting effects. It’s encouraging to see companies, regardless of scale, understanding the importance of Site Reliability Engineering for their business and organization. However, this raises the question of how to implement it properly. What works for one company will not necessarily work for another. Site Reliability is first of all about the people and the process; it is a culture centered on trust, transparent communication, and partnership.

We often hear DevOps and SRE specialists talk about reliability as resilience. Resilience is what helps us as individuals to overcome failure and stand strong in the face of uncertainty. Similarly, Site Reliability is the continuous improvement of an organization’s ability to overcome failures, an ability that goes beyond technological capabilities. The global health crisis has reshaped the way we live and work and, hopefully, has taught us all the importance of resilience.
