From: aidotengineer

OpenTelemetry is a major open-source project under the CNCF (Cloud Native Computing Foundation) that standardizes how observability data is collected in cloud environments [00:00:31]. It is one of the largest CNCF projects, second only to Kubernetes [00:00:33]. It is widely supported by major observability platforms, including Splunk, Datadog, Dynatrace, New Relic, Grafana, and Honeycomb [00:00:50].

Core Components and Data Types

OpenTelemetry defines a protocol that standardizes logging, metrics, and tracing in cloud applications [00:01:05].

Logging

Logging means emitting arbitrary events at any point in an application’s lifecycle [00:01:29]. Each event is emitted as-is, optionally with associated metadata, and can be inspected later [00:01:37].
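As a minimal sketch, Python’s standard `logging` module can model this: an arbitrary event is emitted as-is, metadata rides along via `extra`, and a handler keeps the records for later inspection. The `ListHandler` class and the attribute names `user_id` and `prompt_tokens` are hypothetical. In a real deployment, an OpenTelemetry log handler or exporter would typically replace the in-memory list.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps emitted records in memory for later viewing."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("genai-app")
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)

# An arbitrary event, emitted as-is, with metadata attached via `extra`.
logger.info("user prompt received", extra={"user_id": "u-123", "prompt_tokens": 42})

record = handler.records[0]
print(record.getMessage(), record.user_id, record.prompt_tokens)  # user prompt received u-123 42
```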

Metrics

Metrics are used for aggregate-level data, allowing observation of behavior across different timeframes or users [00:01:44].

  • Traditional Cloud Metrics: Common metrics include CPU usage, memory usage, and latency [00:01:56].
  • Gen AI-Specific Metrics: For Gen AI applications, relevant metrics include token usage, latency, and error rate [00:02:05].
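To make the aggregate view concrete, here is a plain-Python sketch that rolls per-request records up into per-user token usage, average latency, and error rate. It does not use the OpenTelemetry metrics API. The `Request` fields and the sample values are illustrative assumptions. An OpenTelemetry SDK would instead record values on instruments such as counters and histograms and aggregate them before export.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """One model call; field names are illustrative assumptions."""
    user: str
    tokens: int
    latency_ms: float
    error: bool


def aggregate(requests):
    """Roll individual requests up into per-user metrics."""
    totals = defaultdict(lambda: {"tokens": 0, "count": 0, "errors": 0, "latency_ms": 0.0})
    for r in requests:
        t = totals[r.user]
        t["tokens"] += r.tokens
        t["count"] += 1
        t["errors"] += int(r.error)
        t["latency_ms"] += r.latency_ms
    return {
        user: {
            "total_tokens": t["tokens"],
            "avg_latency_ms": t["latency_ms"] / t["count"],
            "error_rate": t["errors"] / t["count"],
        }
        for user, t in totals.items()
    }


reqs = [
    Request("alice", tokens=120, latency_ms=300.0, error=False),
    Request("alice", tokens=80, latency_ms=500.0, error=True),
    Request("bob", tokens=50, latency_ms=200.0, error=False),
]
summary = aggregate(reqs)
print(summary["alice"])  # {'total_tokens': 200, 'avg_latency_ms': 400.0, 'error_rate': 0.5}
```

An average hides tail latency. A histogram-style aggregation keeps the distribution, which matters when a few slow model calls dominate user experience.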

Tracing

Tracing tracks multi-step processes [00:02:31]. In cloud environments, this means following a single user request as it passes through multiple microservices [00:02:35]. Tracing is particularly useful for Gen AI applications because they rely on multi-step processes such as chains, workflows, and agents [00:02:54].
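The parent/child structure behind a trace can be sketched in plain Python. This is an illustrative model, not the OpenTelemetry tracing API. A context manager opens a span, and a `contextvars` variable tracks the currently active span so that nested spans learn their parent. Every span in one request shares a trace ID. The span names for the agent steps are hypothetical.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The span that is currently open in this execution context (None at the top level).
_current = contextvars.ContextVar("current_span", default=None)
finished = []


@contextmanager
def span(name):
    parent = _current.get()
    record = {
        "name": name,
        # A root span starts a new trace; child spans inherit the trace ID.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "start": time.perf_counter(),
    }
    token = _current.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - record["start"]
        _current.reset(token)  # restore the parent as the active span
        finished.append(record)


# A hypothetical agent run: one root span with three child steps.
with span("agent.run"):
    with span("llm.plan"):
        pass
    with span("tool.search"):
        pass
    with span("llm.answer"):
        pass

for s in finished:
    print(s["name"], "parent:", s["parent_id"])
```

Because the active span is looked up implicitly, no step has to pass its parent around by hand. OpenTelemetry SDKs propagate context in a similar spirit and carry it across service boundaries in request headers.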

OpenTelemetry Ecosystem

Beyond the protocol, OpenTelemetry offers an ecosystem comprising SDKs, instrumentations, and collectors [00:03:23].

SDKs (Software Development Kits)

SDKs let developers send logs, metrics, and traces from their applications manually [00:03:32]. OpenTelemetry provides SDKs for 11 languages, including Python, TypeScript, Go, and C++ [00:03:40].

Instrumentations

Instrumentations collect observability data automatically [00:03:55]. They work by “monkey patching” a client library the application already uses (e.g., the client for a SQL database such as PostgreSQL) so that it emits logs, metrics, and traces on its own [00:04:40]. This happens on the application side with negligible latency impact and gives a comprehensive view of system activity without any manual instrumentation code [00:05:07].
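The monkey-patching idea can be shown with a stand-in client. `FakeDBClient` is hypothetical and plays the role of a real database driver. The hypothetical `instrument` helper swaps its `execute` method for a wrapper that times each call and records the statement, while application code keeps calling `execute` unchanged. Real OpenTelemetry instrumentation packages apply the same idea to actual libraries and emit spans rather than appending to a list.

```python
import functools
import time


class FakeDBClient:
    """Hypothetical stand-in for a database client library."""

    def execute(self, sql):
        return [("row",)]


captured = []


def instrument(cls):
    """Replace cls.execute with a timing wrapper; return the original for later un-patching."""
    original = cls.execute

    @functools.wraps(original)
    def wrapper(self, sql, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, sql, *args, **kwargs)
        finally:
            captured.append({
                "operation": "db.execute",
                "statement": sql,
                "duration_s": time.perf_counter() - start,
            })

    cls.execute = wrapper  # the monkey patch: the method is replaced on the class itself
    return original


original_execute = instrument(FakeDBClient)

# Application code is unchanged: it just calls the client as usual.
client = FakeDBClient()
rows = client.execute("SELECT * FROM users")
print(captured[0]["statement"])  # SELECT * FROM users
```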

Collectors

Collectors are self-deployable components that run in your own cloud environment (e.g., Kubernetes) [00:05:30]. They pre-process observability data before it is sent to a platform [00:05:36]. Collectors are open-source and come with many built-in features [00:05:54], including:

  • Data Filtering: Filters out unimportant data [00:05:42].
  • Data Obscuration: Obscures or hides personally identifiable information (PII) and other sensitive data [00:05:45].
  • Multi-Provider Export: Sends observability data to multiple providers simultaneously [00:06:01].
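The collector’s role as a pre-processing stage can be modeled as a small pipeline. This is a plain-Python illustration, not the OpenTelemetry Collector itself, which is a separately deployed service configured with receivers, processors, and exporters. The sketch drops a hypothetical health-check span and masks e-mail addresses in attributes with a simple regular expression. It then fans the remaining data out to two stand-in exporters. The span shapes, the filtering rule, and the e-mail pattern are all assumptions.

```python
import re

# Deliberately simple e-mail pattern; an assumption, not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def drop_health_checks(spans):
    # Filtering: discard spans considered unimportant (hypothetical rule).
    return [s for s in spans if s["name"] != "GET /health"]


def redact_emails(spans):
    # Obscuring: mask e-mail-like substrings in string attributes, without mutating the input.
    out = []
    for s in spans:
        attrs = {
            k: EMAIL.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in s["attributes"].items()
        }
        out.append({**s, "attributes": attrs})
    return out


def run_pipeline(spans, processors, exporters):
    for process in processors:
        spans = process(spans)
    for export in exporters:
        export(spans)  # Multi-provider export: every backend receives the same data.
    return spans


backend_a, backend_b = [], []
incoming = [
    {"name": "GET /health", "attributes": {}},
    {"name": "llm.chat", "attributes": {"prompt": "Email me at jane@example.com", "tokens": 17}},
]
result = run_pipeline(
    incoming,
    processors=[drop_health_checks, redact_emails],
    exporters=[backend_a.extend, backend_b.extend],
)
print(result[0]["attributes"]["prompt"])  # Email me at [REDACTED]
```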

OpenTelemetry for Gen AI

OpenTelemetry has been extended to support Gen AI frameworks, foundation models, and vector databases [00:06:20]. Because this extension builds on the existing OpenTelemetry standard, users get logs, metrics, and traces automatically in their preferred platform, such as Datadog, Sentry, Grafana Tempo, or Dynatrace [00:06:42].

The integrations cover more than 40 providers, including [00:07:12]:

  • Foundation Models: OpenAI, Anthropic, Cohere, Gemini, Bedrock [00:07:16].
  • Vector Databases: Pinecone, Chroma [00:07:23].
    • Example (Pinecone): An instrumentation for Pinecone can show queries and indexing operations, and lets you inspect the returned vectors, including their data, distances, and scores [00:07:54].
  • Frameworks: LangChain, LlamaIndex, CrewAI, Haystack [00:07:28].
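For a foundation-model call, an instrumentation typically records the model and token counts as span attributes. The sketch wraps a hypothetical `FakeLLMClient.complete` method and stores one record per call. The attribute names are modeled on the OpenTelemetry GenAI semantic conventions (for example `gen_ai.request.model` and `gen_ai.usage.input_tokens`). Those conventions are still marked as in development, so treat the exact keys as assumptions. The client, its response shape, the span name, the word-count “tokenizer”, and the model name are invented for illustration.

```python
import functools
from dataclasses import dataclass


@dataclass
class Completion:
    """Hypothetical response shape returned by the fake client."""
    text: str
    input_tokens: int
    output_tokens: int


class FakeLLMClient:
    """Hypothetical stand-in for a foundation-model SDK."""

    def complete(self, model, prompt):
        # Token counting is faked as a word count for illustration only.
        return Completion(text="Paris", input_tokens=len(prompt.split()), output_tokens=1)


spans = []


def instrument_llm(cls):
    original = cls.complete

    @functools.wraps(original)
    def wrapper(self, model, prompt):
        response = original(self, model, prompt)
        # Attribute keys modeled on the GenAI semantic conventions (assumed; may change).
        spans.append({
            "name": "llm.complete",
            "attributes": {
                "gen_ai.request.model": model,
                "gen_ai.usage.input_tokens": response.input_tokens,
                "gen_ai.usage.output_tokens": response.output_tokens,
            },
        })
        return response

    cls.complete = wrapper


instrument_llm(FakeLLMClient)
answer = FakeLLMClient().complete("example-model", "What is the capital of France?")
print(spans[0]["attributes"])
```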

With instrumentations, data is emitted automatically [00:07:45]. A key benefit of OpenTelemetry is that, as a standard protocol, it prevents vendor lock-in: switching between observability platforms takes only a simple configuration change [00:08:41].
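Because every backend accepts the same protocol, switching platforms can come down to configuration. The sketch resolves the export destination from the standard `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME` environment variables and falls back to `http://localhost:4318`, the conventional OTLP/HTTP default. The `ExportConfig` type, the `resolve_export_config` helper, and the vendor URLs are hypothetical. Real OpenTelemetry SDKs read these variables themselves.

```python
from dataclasses import dataclass
from typing import Mapping

DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


@dataclass(frozen=True)
class ExportConfig:
    endpoint: str
    service_name: str


def resolve_export_config(env: Mapping[str, str]) -> ExportConfig:
    # Application code never names a vendor; only the environment does.
    return ExportConfig(
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT),
        service_name=env.get("OTEL_SERVICE_NAME", "unknown_service"),
    )


# Same code, two deployments pointing at different (hypothetical) backends.
vendor_a = resolve_export_config({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.vendor-a.example",
    "OTEL_SERVICE_NAME": "chatbot",
})
vendor_b = resolve_export_config({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.vendor-b.example",
    "OTEL_SERVICE_NAME": "chatbot",
})
local = resolve_export_config({})  # in practice, pass os.environ
print(vendor_a.endpoint, vendor_b.endpoint, local.endpoint)
```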