Does cloud-native observability look more like Kuber-native O11y to you?

October 2, 2020 - 3 minutes read - 490 words

I met a student a couple of months ago, and he had just started working on a research paper on observability. Everything he shared about his research topic baffled me.

At the end of our conversation, I asked a question about cloud-native observability.

Does cloud-native observability include only Kubernetes?

He said, “It’s not just about Kubernetes; however, we are talking about cloud-native workloads and a whole lot of those involve Kubernetes.”

I wasn’t sure what I was looking for, but I went searching anyway and found that cloud-native observability almost always popped up in relation to Kubernetes.

There’s even a CNCF webinar on the top 7 Kubernetes APIs for comprehensive cloud-native observability.

Here’s that list, in the presenter’s order:

Kubernetes API Watchers
Kubernetes Events API
Kubernetes Downward API
Pod API
Container API
Service API
Kubernetes Metrics APIs

That doesn’t really answer my question. Also, the CNCF Observability SIG’s mission statement, which includes “observation of cloud native workloads,” doesn’t seem to restrict the scope to Kubernetes.

I know that it’s CNCF, so Kubernetes is assumed. But still…

Anyway, it seems that when folks are asking about cloud-native observability, they’re speaking with Kubernetes in mind.

Guess it’s more like Kubernetes-native observability. I’d prefer to call it KuberNative observability (or even Kuber-native O11y) if that’s what it really entails.

But I wonder how complex it is to make cloud-native observability work in a non-Kubernetes, multi-cloud setup (whether you consider hybrid cloud or not). I hope there’s a YouTube video on this!

BTW, the CNCF Observability SIG is a new group, so it might be a good place to start if you are interested in cloud-native observability. Here’s their official announcement.

The following points in their stated scope piqued my interest:

- How developers, operators, Site Reliability Engineers (SRE), IT Engineers, and other actors comprehend, process, and reason on distributed cloud-native systems.
- Using distributed trace tooling to observe a series of calls between microservices to understand where time is being spent.
- Projects that incorporate novel & insightful approaches to utilizing observability data such as:
  - ML, model training, Bayesian networks, and other data science techniques that enable anomaly & intrusion detection.
  - Correlating resource consumption with costing data to reduce the total cost of cloud native infrastructure.
  - Using observability data exposed by service meshes, orchestrators, and other metric sources to inform continuous deployment tooling (e.g. Canary Predicates/Judges).
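
To make the distributed tracing point a bit more concrete, here is a minimal sketch in Python of how one might answer "where is time being spent" from a single trace. It is not tied to any real tracing tool: the `Span` fields, the `self_time_by_service` helper, and the service names are all assumptions made up for illustration. The idea is to subtract the time covered by a span's direct children from its own duration, so each service is charged only for the time it spent itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """A hypothetical, simplified span; real tracing tools carry more fields."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float


def self_time_by_service(spans: List[Span]) -> Dict[str, float]:
    """Sum, per service, each span's duration minus the time covered by its direct children."""
    children = defaultdict(list)
    for span in spans:
        if span.parent_id is not None:
            children[span.parent_id].append(span)

    totals: Dict[str, float] = defaultdict(float)
    for span in spans:
        # Clip child intervals to the parent, then walk them in start order
        # so overlapping (parallel) children are not counted twice.
        intervals = sorted(
            (max(c.start_ms, span.start_ms), min(c.end_ms, span.end_ms))
            for c in children[span.span_id]
        )
        covered = 0.0
        cursor = span.start_ms
        for start, end in intervals:
            lo = max(start, cursor)
            if end > lo:
                covered += end - lo
                cursor = end
        totals[span.service] += (span.end_ms - span.start_ms) - covered
    return dict(totals)


# A made-up trace: frontend calls checkout (which calls payments), then inventory.
trace = [
    Span("root", None, "frontend", 0, 100),
    Span("a", "root", "checkout", 10, 70),
    Span("b", "a", "payments", 20, 60),
    Span("c", "root", "inventory", 75, 95),
]

result = self_time_by_service(trace)
slowest = max(result, key=result.get)
print(slowest, result[slowest])  # payments 40.0
```

In this toy trace the frontend span lasts the full 100 ms, but most of that is waiting on downstream calls; the breakdown shows payments as the place where time is actually spent. Nothing here depends on Kubernetes, which is sort of my point.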

I’m definitely paying attention to updates from the Observability SIG because I need to answer the question that I asked in my previous blog post.

Seems like this student is challenging me to transcend the buzz of Cloud Native and Observability.

We’ve spoken about a ton of things since our first meeting, but the moment of truth for me was when we spoke about distributed consensus.

I’ll write a blog post about that engaging conversation someday. For now, I need to continue to observe distributed systems in nature and learn about cloud-native resiliency.

Banner Image by Skeeze on Pixabay
