As is the nature of startups, my job here at Veri involves many different disciplines on a daily basis. One such discipline is Site Reliability Engineering (SRE). SRE combines aspects of software engineering and infrastructure management to ensure that we deliver highly reliable and robust software to our customers.
Achieving this, however, is no trivial task. As Veri has grown, our backing infrastructure has grown with it to keep up with increased demand and the continued innovation behind our product. So how do we do it? How do we ensure high levels of reliability and robustness for our customers while continuously evolving?
The answer is data!
By collecting performance and system data at scale across our entire infrastructure, we can identify and monitor, in real time, key metrics describing the state of our systems. This in turn allows us to pinpoint potential issues as soon as they arise and trace them right back to the source. From there, our team can tackle an issue moments after it appears, before end users are ever made aware of it.
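The idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Sample` type, the `find_breaches` helper, the service and metric names, and the threshold values are all hypothetical and do not describe any particular monitoring stack. It only shows the principle that a metric tagged with its source can be checked against a limit and traced back to where it came from.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One metric reading, tagged with the service that reported it."""
    service: str   # hypothetical service name
    metric: str    # hypothetical metric name, e.g. "latency_ms"
    value: float


# Hypothetical upper limits per metric; real limits depend on the system.
THRESHOLDS = {"latency_ms": 500.0, "error_rate": 0.05}


def find_breaches(samples, thresholds=THRESHOLDS):
    """Return the samples whose value exceeds the limit for their metric."""
    return [
        s for s in samples
        if s.metric in thresholds and s.value > thresholds[s.metric]
    ]


samples = [
    Sample("checkout", "latency_ms", 120.0),
    Sample("search", "latency_ms", 910.0),
    Sample("search", "error_rate", 0.01),
]

for breach in find_breaches(samples):
    # Each breach names its source, so the issue can be traced to a service.
    print(f"ALERT {breach.service}: {breach.metric}={breach.value}")
```

With the sample readings above, only the `search` latency exceeds its limit, so a single alert is printed. In practice the readings would arrive continuously rather than from a fixed list, and the alert would notify a person instead of printing.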
While our use case for data collection is specific to SRE, the usefulness of data in ensuring reliability spans all disciplines. Instead of redundantly storing data on paper, where it is rarely given a second look, why not put your data to work for you? Let it provide the key metrics and insights your company needs to become proactive in ensuring the highest levels of quality for your users.
Marijan Gradecak is the Chief Software Engineer at Veri