SRE Monitoring Observation

Monitoring and Observation

Management guru Peter Drucker is often quoted as saying, “If you can’t measure it, you can’t manage it.” If you don’t measure, how do you know how you are doing? Are you doing well? Fast forward several decades: if you can’t measure your cloud and on-premises infrastructure, activities, and applications, how do you know how well you are doing?

Azure has a set of services that help you measure and observe your on-premises and cloud resources.

Azure Monitor

Credits to Microsoft for the image.

First, there are two terms, observability and monitoring, that tend to be merged into the same definition. They are distinct.

Observability

Observability is about understanding and describing a system’s state. It is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In the context of Azure, Microsoft defines the “pillars” of observability as metrics, logs, distributed traces, and changes.

Monitoring

Monitoring is about collecting and analyzing data. It is about making decisions about a system’s state. Azure Monitor can collect and analyze data from:

Your applications – Data collected from applications on-premises or in the cloud.

Containers – Data collected about containers and apps running in containers.

Operating system – Data collected about the operating system that hosts an application.

Azure resources – Data collected about Azure resources.

Azure subscriptions – Data collected about subscription management and Azure health.

Azure tenant – Data collected about Azure Active Directory.

Azure resource changes – Data collected about changes to Azure resources.

Metrics

“Metrics are numerical values that describe some aspect of a system at a particular point in time. They are collected at regular intervals and are identified with a timestamp, a name, a value, and one or more defining labels. Metrics can be aggregated using a variety of algorithms, compared to other metrics, and analyzed for trends over time.”
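To make the shape of a metric concrete, here is a minimal Python sketch of a data point with the four parts named above (timestamp, name, value, labels) plus a simple aggregation. The names MetricSample, aggregate, cpu_percent, and web-01 are hypothetical and chosen for illustration only; this is not the Azure Monitor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from statistics import mean


@dataclass(frozen=True)
class MetricSample:
    """One metric data point: timestamp, name, value, and labels (hypothetical type)."""
    timestamp: datetime
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def aggregate(samples, name, how="avg"):
    """Aggregate all samples of one metric with a simple named function."""
    values = [s.value for s in samples if s.name == name]
    if not values:
        raise ValueError(f"no samples for metric {name!r}")
    functions = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}
    return functions[how](values)


# Three samples collected at a regular one-minute interval (made-up values).
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
cpu_samples = [
    MetricSample(start + timedelta(minutes=i), "cpu_percent", v, {"host": "web-01"})
    for i, v in enumerate([20.0, 40.0, 60.0])
]

print(aggregate(cpu_samples, "cpu_percent"))         # average over the window
print(aggregate(cpu_samples, "cpu_percent", "max"))  # peak over the window
```

Because every sample carries a timestamp and labels, the same list could also be grouped by host or bucketed by time before aggregating.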

Logs

“Logs are events that occurred within the system. They can contain different kinds of data and may be structured or free-form text with a timestamp. They may be created sporadically as events in the environment generate log entries, and a system under heavy load will typically generate more log volume.”
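The difference between free-form and structured log entries is easier to see side by side. The sketch below is only an illustration in plain Python; the function names and the fields host and level are assumptions, not a format that Azure Monitor requires.

```python
import json
from datetime import datetime, timezone


def free_form_entry(message, now=None):
    """Free-form log line: a timestamp followed by arbitrary text."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat()} {message}"


def structured_entry(level, message, now=None, **fields):
    """Structured log line: the same event as JSON with named fields."""
    now = now or datetime.now(timezone.utc)
    record = {"timestamp": now.isoformat(), "level": level, "message": message, **fields}
    return json.dumps(record, sort_keys=True)


print(free_form_entry("disk almost full on web-01"))
print(structured_entry("WARNING", "disk almost full", host="web-01", used_percent=91))
```

The structured form costs a little more to produce, but a query can later filter on host or used_percent without parsing free text.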

Distributed traces

“Traces are series of related events that follow a user request through a distributed system. They can be used to determine behavior of application code and the performance of different transactions. While logs will often be created by individual components of a distributed system, a trace measures the operation and performance of your application across the entire set of components.”
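One way to picture a trace is as a set of spans that share a trace ID, each recording which component did the work and which span called it. The following is a minimal sketch under that assumption; Span, new_span, and the component names are hypothetical, and real tracing libraries carry much more context.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical, simplified type)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    component: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def new_span(trace_id, component, start_ms, end_ms, parent=None):
    """Create a span that belongs to trace_id and optionally hangs off a parent span."""
    parent_id = parent.span_id if parent else None
    return Span(trace_id, uuid.uuid4().hex, parent_id, component, start_ms, end_ms)


def trace_duration_ms(spans):
    """End-to-end time of the request across every component it touched."""
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


# One user request passing through three components (made-up timings).
trace_id = uuid.uuid4().hex
root = new_span(trace_id, "frontend", 0, 120)
api = new_span(trace_id, "orders-api", 10, 100, parent=root)
db = new_span(trace_id, "database", 30, 80, parent=api)
spans = [root, api, db]

for span in spans:
    print(span.component, span.duration_ms, "ms")
print("whole request:", trace_duration_ms(spans), "ms")
```

Each component could log its own events independently, but only the shared trace ID and the parent links let you reassemble the full path of a single request.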

Changes

“Change Analysis alerts you to live site issues, outages, component failures, or other change data. It also provides insights into those application changes, increases observability, and reduces the mean time to repair. You automatically register the Microsoft.ChangeAnalysis resource provider with an Azure Resource Manager subscription by going to Change Analysis via the Azure portal. For web app in-guest changes, you can enable the Change Analysis tool via the Change Analysis portal.”

Resource Graph

“Azure Resource Graph is an Azure service designed to extend Azure Resource Management by providing efficient and performant resource exploration with the ability to query at scale across a given set of subscriptions so that you can effectively govern your environment. These queries provide the following abilities:

  • Query resources with complex filtering, grouping, and sorting by resource properties.
  • Explore resources iteratively based on governance requirements.
  • Assess the impact of applying policies in a vast cloud environment.
  • Query changes made to resource properties (preview).”
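To give a feel for what "filtering, grouping, and sorting by resource properties" means, here is a small in-memory Python analogy. It does not call Azure Resource Graph, which is queried with its own Kusto-based query language; the resource records and the count_by_location helper are made up for illustration.

```python
from collections import Counter

# Hypothetical inventory; a real one would come from Azure, not a literal list.
resources = [
    {"name": "vm-a", "type": "microsoft.compute/virtualmachines", "location": "eastus"},
    {"name": "vm-b", "type": "microsoft.compute/virtualmachines", "location": "eastus"},
    {"name": "vm-c", "type": "microsoft.compute/virtualmachines", "location": "westeurope"},
    {"name": "st-a", "type": "microsoft.storage/storageaccounts", "location": "eastus"},
]


def count_by_location(resources, resource_type):
    """Filter by type, group by location, and sort by count (largest first)."""
    matching = (r for r in resources if r["type"].lower() == resource_type.lower())
    counts = Counter(r["location"] for r in matching)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(count_by_location(resources, "microsoft.compute/virtualmachines"))
```

The point of the service is that this kind of question can be asked across many subscriptions at once, rather than over a list you have already downloaded.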

Summary

Monitoring, gathering metrics, establishing a baseline, and looking for changes are part and parcel of Site Reliability Engineering.
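As a rough illustration of "establish a baseline, then look for changes," the sketch below computes a mean and standard deviation from past measurements and flags new values that fall far outside them. The threshold of three standard deviations and the sample latencies are assumptions; real alerting rules are usually tuned per metric.

```python
from statistics import mean, stdev


def baseline(values):
    """Summarize historical measurements as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def deviates(value, base_mean, base_stdev, threshold=3.0):
    """Return True when value is more than `threshold` deviations from the baseline."""
    if base_stdev == 0:
        return value != base_mean
    return abs(value - base_mean) / base_stdev > threshold


# Made-up response times in milliseconds from a quiet period.
history = [100, 102, 98, 101, 99]
base_mean, base_stdev = baseline(history)

for latest in (101, 130):
    status = "change detected" if deviates(latest, base_mean, base_stdev) else "within baseline"
    print(latest, status)
```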

Want to dive down to another level? Click here.

