From: aidotengineer

Open Telemetry is a major open-source project maintained by the CNCF (Cloud Native Computing Foundation) that standardizes observability in cloud environments [00:00:31]. It is one of the largest projects under the CNCF, second only to Kubernetes [00:00:33]. Open Telemetry defines a protocol for logs, metrics, and traces, the three signals used to understand the behavior of cloud applications [00:01:05]. This standard is supported by all major observability platforms, including Splunk, Datadog, Dynatrace, New Relic, Grafana, and Honeycomb [00:00:50].

Observability Signals

Logging

Logging involves recording arbitrary events that can be sent at any time during an application’s lifecycle [00:01:29]. These events are emitted as is and can be viewed later, potentially with associated metadata [00:01:37]. For example, simply using a “print” statement in a Python script is a form of logging [00:01:23].
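
As a minimal sketch of the idea above, the following uses Python's standard `logging` module to emit one event with attached metadata. The logger name `demo_app` and the `user_id` field are hypothetical, and the in-memory buffer is only there to keep the example self-contained.

```python
import io
import logging

# Send log records to an in-memory buffer so the example is self-contained;
# a real application would typically log to stderr or a file.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s user=%(user_id)s"))

logger = logging.getLogger("demo_app")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the record out of the root logger

# An arbitrary event, emitted as is, with metadata attached via `extra`.
logger.info("prompt submitted", extra={"user_id": "u-123"})

log_output = buffer.getvalue()
```

Compared with a bare print statement, the handler and formatter separate what is recorded from where it is sent, which is the same separation an observability pipeline relies on.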

Metrics

Metrics represent data observed at an aggregate level, showing how a system behaves over time, across users, or other dimensions [00:01:46]. In traditional cloud environments, common metrics include CPU usage, memory usage, and latency [00:01:56]. For GenAI applications, relevant metrics might include token usage, latency, and error rates [00:02:08].
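
To illustrate the aggregate view, here is a small sketch that rolls individual GenAI calls up into the three metrics mentioned above. The `LLMCall` record and its sample values are invented for illustration; a real setup would record these through a metrics SDK rather than a plain list.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LLMCall:
    """Hypothetical record of one model call."""
    tokens: int
    latency_ms: float
    ok: bool


# Made-up sample data standing in for observed calls.
calls = [
    LLMCall(tokens=120, latency_ms=300.0, ok=True),
    LLMCall(tokens=80, latency_ms=500.0, ok=True),
    LLMCall(tokens=200, latency_ms=700.0, ok=False),
    LLMCall(tokens=100, latency_ms=500.0, ok=True),
]


def summarize(calls):
    """Aggregate per-call observations into system-level metrics."""
    return {
        "total_tokens": sum(c.tokens for c in calls),
        "mean_latency_ms": mean(c.latency_ms for c in calls),
        "error_rate": sum(not c.ok for c in calls) / len(calls),
    }


summary = summarize(calls)
```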

Tracing

Tracing involves tracking a multi-step process [00:02:31]. In cloud environments with microservices, tracing allows visibility into a process that spans multiple services, showing how a user request is processed across them [00:02:35]. Tracing is particularly common for GenAI applications due to their use of multi-step processes like chains, workflows, or agents that interact with tools [00:02:55].
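
The parent/child structure of a trace can be sketched with a small context manager. This is a conceptual illustration only, not the Open Telemetry SDK API; the `span` helper and the step names are hypothetical.

```python
import time
from contextlib import contextmanager

finished_spans = []  # completed spans, in order of completion
_open_spans = []     # currently open spans, innermost last


@contextmanager
def span(name):
    """Record one step of a multi-step process, linked to its parent step."""
    record = {
        "name": name,
        "parent": _open_spans[-1]["name"] if _open_spans else None,
        "start": time.perf_counter(),
    }
    _open_spans.append(record)
    try:
        yield record
    finally:
        _open_spans.pop()
        record["duration_s"] = time.perf_counter() - record["start"]
        finished_spans.append(record)


# A hypothetical GenAI request: retrieval followed by a model call.
with span("handle_request"):
    with span("retrieve_documents"):
        pass
    with span("call_model"):
        pass

parents = {s["name"]: s["parent"] for s in finished_spans}
```

Each inner step records `handle_request` as its parent, which is what lets a tracing backend reconstruct the full path of one request.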

Open Telemetry Ecosystem

Beyond defining a protocol, Open Telemetry also provides an ecosystem of tools including SDKs, instrumentations, and collectors [00:03:25].

SDKs

SDKs (Software Development Kits) enable manual sending of logs, metrics, and traces from an application [00:03:32]. Open Telemetry currently supports SDKs in 11 different languages, including Python, TypeScript, Go, and C++ [00:03:40].

Instrumentations

Instrumentations automate the collection of observability data [00:03:55]. Unlike SDKs, which require manual calls, instrumentations can automatically provide visibility into parts of an application [00:04:00]. For example, an instrumentation for PostgreSQL can automatically generate logs, metrics, and traces related to PostgreSQL interactions [00:04:40]. These work by monkey-patching client libraries used within the application and emitting the desired data [00:04:54]. They are designed to have a negligible latency impact [00:05:09].
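
A rough sketch of the monkey-patching technique is shown below. `FakeDBClient` is a hypothetical stand-in for a real client library, and `instrument` is an illustrative helper, not an Open Telemetry function; real instrumentations emit spans and metrics rather than appending to a list.

```python
import functools
import time


class FakeDBClient:
    """Hypothetical stand-in for a database client library."""

    def execute(self, sql):
        return f"ran: {sql}"


captured = []  # telemetry collected by the patched method


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            captured.append({
                "operation": f"{cls.__name__}.{method_name}",
                "args": args,
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


instrument(FakeDBClient, "execute")

# Application code is unchanged; the call is observed automatically.
result = FakeDBClient().execute("SELECT 1")
```

The wrapper adds only a timer and a list append around the original call, which is why this approach can keep its latency overhead small.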

Collectors

Collectors are self-deployable components that can be run in a cloud environment (e.g., Kubernetes) to pre-process observability data before sending it to a platform [00:05:30]. They can be used to filter irrelevant data, obscure Personally Identifiable Information (PII) or sensitive data, and send data to multiple observability providers simultaneously [00:05:40]. These open-source components come with many built-in features [00:05:52].
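
The three collector duties described above (filtering, redaction, fan-out) can be sketched as a tiny pipeline. This is a conceptual model in plain Python, not the actual collector, which is a separately deployed component driven by configuration; the processor names, the email pattern, and the two list "backends" are assumptions made for illustration.

```python
import re

# Simplified email pattern, used here as an example of PII to obscure.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def drop_debug(record):
    """Filter: discard records considered irrelevant."""
    return None if record.get("level") == "DEBUG" else record


def redact_emails(record):
    """Processor: obscure email addresses in the message."""
    return {**record, "message": EMAIL.sub("[REDACTED]", record["message"])}


def run_pipeline(records, processors, exporters):
    """Apply processors in order, then send survivors to every exporter."""
    for record in records:
        for process in processors:
            record = process(record)
            if record is None:
                break  # record was filtered out
        else:
            for export in exporters:
                export(record)


backend_a, backend_b = [], []  # stand-ins for two observability platforms
run_pipeline(
    records=[
        {"level": "DEBUG", "message": "cache miss"},
        {"level": "INFO", "message": "login by alice@example.com"},
    ],
    processors=[drop_debug, redact_emails],
    exporters=[backend_a.append, backend_b.append],
)
```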

OpenLLMetry for GenAI Observability

OpenLLMetry, a project by Traceloop, extends Open Telemetry to support various GenAI frameworks, foundation models, and vector databases [00:06:20]. This extension leverages Open Telemetry’s existing capabilities to provide observability for AI systems [00:06:42].

OpenLLMetry provides instrumentations for over 40 different providers [00:07:09], including:

  • Foundation models: OpenAI, Anthropic, Cohere, Gemini, Bedrock [00:07:16]
  • Vector databases: Pinecone, Chroma [00:07:23]
  • Frameworks: LangChain, LlamaIndex, CrewAI, Haystack [00:07:28]

These instrumentations automatically emit logs, metrics, and traces that can be connected to any preferred observability platform, offering an out-of-the-box solution [00:07:34]. For example, the Pinecone instrumentation can track queries and indexing operations, and it allows investigation of the returned vectors, including their data, distances, and scores [00:07:54].

By using Open Telemetry, applications are not tied to a specific observability platform, allowing for easy switching through configuration changes, as all supporting platforms adhere to the same standard format [00:08:41].
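
As a hedged sketch of what switching by configuration can look like, the helper below derives exporter settings from the standard `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` environment variables. The `exporter_settings` function, the vendor URL, and the header value are hypothetical; the talk does not show a specific configuration.

```python
def exporter_settings(env):
    """Build exporter settings from an environment-like mapping."""
    # http://localhost:4318 is the conventional local OTLP/HTTP endpoint.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    raw_headers = env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    # Headers are given as comma-separated key=value pairs.
    headers = dict(
        pair.split("=", 1) for pair in raw_headers.split(",") if "=" in pair
    )
    return {"endpoint": endpoint, "headers": headers}


# Same application code, two destinations: only the configuration differs.
local = exporter_settings({})
vendor = exporter_settings({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.vendor.example",  # made-up URL
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=secret",              # made-up key
})
```

In a real program the mapping would be `os.environ`, so moving to a different platform means changing deployment settings rather than application code.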