From: aidotengineer

OpenTelemetry is an open source project maintained by the CNCF that standardizes observability in cloud environments [00:00:28]. It is one of the largest projects under the CNCF, second only to Kubernetes [00:00:33].

Core Purpose

OpenTelemetry serves as a protocol that standardizes how logs, metrics, and traces are handled in cloud applications [00:01:05]. It is supported by every major observability platform, including Splunk, Datadog, Dynatrace, New Relic, Grafana, and Honeycomb, allowing for seamless integration [00:00:50].

Key Observability Signals

Logging

Logging involves emitting arbitrary events, often with metadata, at any point in an application’s lifecycle so they can be inspected later [00:01:29]. It is comparable to simple print statements in programming [00:01:25].
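The "print statement with metadata" idea can be sketched in plain Python using the standard library’s `logging` module (the logger name and metadata fields below are hypothetical; an in-memory handler is used only so the records can be inspected):

```python
import logging

class ListHandler(logging.Handler):
    """Collects log records in memory so their metadata can be inspected."""
    def __init__(self):
        super().__init__()
        self.records = []
    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("checkout")  # hypothetical component name
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)

# `extra` attaches arbitrary metadata fields to the log record,
# which is what makes a log event more useful than a bare print.
logger.info("order processed", extra={"order_id": "A-123", "user_id": 42})
```

Each record carries both the message and the attached metadata, so it can be filtered or searched later.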

Metrics

Metrics are used for aggregate-level data, showing how something behaves over time, across users, or other desired dimensions [00:01:44].

  • Traditional Cloud Metrics: Typically include CPU usage, memory usage, or latency [00:01:56].
  • Generative AI (GenAI) Metrics: For GenAI-based applications, relevant metrics include token usage, latency, and error rate [00:02:05].
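To make the GenAI metrics concrete, here is a minimal pure-Python sketch (deliberately not the OpenTelemetry metrics API) showing how per-request data aggregates into the three metrics listed above; the request records are hypothetical:

```python
from statistics import mean

# Hypothetical per-request records from a GenAI application.
requests = [
    {"tokens": 120, "latency_s": 0.8, "error": False},
    {"tokens": 450, "latency_s": 2.1, "error": False},
    {"tokens": 90,  "latency_s": 0.5, "error": True},
]

# Aggregate-level values of the kind a metrics pipeline would report.
total_tokens = sum(r["tokens"] for r in requests)               # token usage
avg_latency = mean(r["latency_s"] for r in requests)            # latency
error_rate = sum(r["error"] for r in requests) / len(requests)  # error rate
```

In a real application these aggregates would be computed by the metrics backend over time windows and dimensions (per model, per user, and so on) rather than in application code.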

Tracing

Tracing tracks multi-step processes, which is crucial for understanding how requests are processed across multiple microservices in a cloud environment [00:02:31]. For GenAI applications, tracing is particularly common due to the use of multi-step processes like chains, workflows, or agents that interact and run tools [00:02:54]. Traces were the first element defined by OpenTelemetry [00:02:21].

OpenTelemetry Ecosystem

Beyond a protocol, OpenTelemetry is an ecosystem that includes:

SDKs (Software Development Kits)

SDKs provide the means to manually send logs, metrics, and traces from applications [00:03:32]. OpenTelemetry currently supports SDKs in 11 different languages, including Python, TypeScript, Go, and C++ [00:03:40].

Instrumentations

Instrumentations automatically gather observability data, providing visibility into specific parts of an application [00:03:55]. For example, an instrumentation for an SQL database such as PostgreSQL can automatically emit logs, metrics, and traces [00:04:40]. These work by “monkey patching” the client library used by the application, with negligible latency impact, providing a comprehensive view of system activity without manual intervention [00:04:54].
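The monkey-patching technique itself can be shown with a deliberately simplified pure-Python sketch (this is not a real instrumentation; the client class and recorded fields are hypothetical): the instrumentation replaces a client library method with a wrapper that calls through to the original and records telemetry on the side.

```python
import time

class FakeDbClient:
    """Stand-in for a database client library (hypothetical)."""
    def query(self, sql):
        return ["row1", "row2"]

captured = []  # where our toy "instrumentation" records its telemetry

def instrument(client_cls):
    original = client_cls.query  # keep a reference to the real method

    def patched(self, sql):
        start = time.perf_counter()
        result = original(self, sql)  # call through to the library
        captured.append({
            "operation": "query",
            "statement": sql,
            "duration_s": time.perf_counter() - start,
        })
        return result

    client_cls.query = patched  # the actual monkey patch

instrument(FakeDbClient)
rows = FakeDbClient().query("SELECT 1")
```

The application code is unchanged and still receives its normal results, which is why instrumentations require no manual intervention.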

Collectors

Collectors are self-deployable components that can be deployed in a cloud environment (e.g., Kubernetes) to preprocess observability data before it is sent to an observability platform [00:05:25]. They can be used to filter out unimportant data, obscure Personally Identifiable Information (PII) or sensitive data, and send data to multiple providers [00:05:40]. These ready-made, open source components offer many built-in features [00:05:52].
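A Collector pipeline covering the use cases above might look like the following configuration sketch (endpoints and the PII attribute name are placeholders): the `attributes` processor deletes a sensitive field, and the pipeline fans traces out to two providers at once.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  attributes:
    actions:
      - key: user.email      # hypothetical PII attribute to scrub
        action: delete

exporters:
  otlp/provider-a:
    endpoint: provider-a.example.com:4317
  otlp/provider-b:
    endpoint: provider-b.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp/provider-a, otlp/provider-b]
```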

OpenTelemetry and Generative AI

Traceloop has extended OpenTelemetry to support various GenAI frameworks, foundation models, and vector databases [00:06:16]. This integration allows users to automatically obtain logs, metrics, and traces from their GenAI applications within their preferred observability platforms [00:06:42].

Over 40 different providers are supported through community-built instrumentations, including:

  • Foundation Models: OpenAI, Anthropic, Cohere, Gemini, Bedrock, and others [00:07:16].
  • Vector Databases: Pinecone, Chroma, and others [00:07:23].
  • Frameworks: LangChain, LlamaIndex, CrewAI, and Haystack [00:07:28].

For instance, a Pinecone instrumentation can capture queries and indexing activity, and allows investigation of the returned vectors, including their data, distances, and scores [00:07:58].

Advantages

OpenTelemetry provides a standardized way to connect LLM-based applications to existing observability platforms [00:08:31]. Being a standard protocol, it prevents vendor lock-in, enabling easy switching between platforms through simple configuration changes, as all supporting platforms use the same format [00:08:41].
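In practice, the "simple configuration change" is often just pointing the standard OTLP exporter environment variables at a different backend (the endpoint and token below are placeholders):

```shell
# Standard OTLP exporter environment variables; switching observability
# platforms means changing the endpoint and credentials, nothing in the code.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp.provider-a.example.com:4317"
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer placeholder-token"
```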