The Client is a large, IoT-focused enterprise that provides digital and hardware solutions for monitoring and managing distributed devices and for streamlining production processes through automation, machine learning and big data analysis.
This case study is currently pending review and legal approval from the client’s marketing team and therefore contains neither the client’s name nor any client-specific information.
The Client initially requested support in SRE areas of competence in order to quickly finalize several internal projects and initiatives related to observability (such as implementing tracing for serverless applications or a centralized logging solution), as well as governance and security. As a result, 2 senior SRE engineers joined an existing, established team and helped accelerate several epics and projects.
The company had an intensive backlog and plans to scale up its platform by several orders of magnitude. We knew immediately that Site Reliability principles would help the business executives grow and expand with control and safety. Over time, thanks to great communication, a strong skill set and effective delivery, the Relout team was scaled to 5 engineers (including a team leader), and a second project was opened for Backend Engineers, with a tech lead position.
The development team focused solely on augmenting the existing team that develops the core part of the system: supporting its daily duties, improving its capabilities and skills, and enhancing the reliability and quality of the code through observability and quality control. The SRE team continued to work on high-level, industry-wide standards and principles, helping all other teams improve reliability through standardization, centralized solutions for logs and monitoring, governance and automation frameworks.
Both teams combined the client’s employees with engineers from other vendors and were led by Relout leaders.
A product based on a distributed, 100% serverless solution with edge computing and IoT devices needs to place a strong priority on Site Reliability values and practices in order to be stable, scalable and reliable. Observability was the key to this success. Therefore, we applied the following principles:
The project has been ongoing since October 2022. As of today, after 6 full months of cooperation, both teams continue to provide great business value by working in 10-week increments, performing migrations and modernizations, and ensuring that standards and well-established observability are in place.
The development team, driven by a Relout leader, consists of 4 Senior Backend Engineers who drive business development and the delivery of a microservice- and microfrontend-based solution for core services within the product portfolio.
The SRE team, driven by a Relout leader, consists of 5 Mid and Senior SRE engineers who ensure standards are in place, bring governance and security into the organization, and implement observability and incident management tools and processes. Most importantly, they keep an eye on all AWS accounts to prevent cost spikes, decrease the cost-per-customer footprint, and provide automation support for all other teams to speed up business delivery.