Originally published on Medium.
In the previous post, we inspected calls to OpenAI APIs triggered within Langchain and LlamaIndex by using OpenTelemetry auto-instrumentation. The spans shown in Jaeger UI were nice to see, but were missing rich information that is expected from a proper instrumentation approach. In this post, we will explore how to enrich spans with additional information using manual instrumentation.
Manual instrumentation
OpenTelemetry provides the means to attach additional attributes to spans. The OpenTelemetry specification defines two rules:
- Keys must be non-null string values
- Values must be a non-null string, boolean, floating point value, integer, or an array of these values
Additionally, most commonly used fields follow naming conventions and are referred to as semantic attributes.
Note: Beware of adding fields that may contain PII information to span context. Unless you guarantee that all systems processing the telemetry drop stored data after a fixed period of time (e.g. 30 days), you may run into challenges related to privacy regulation, such as GDPR and its ‘Right to be forgotten’.
Adding instrumentation to your own code
Instrumenting your own code is as simple as the code below shows. It starts a new span called function_name with an attribute arg set to the value 42.
Any spans that are added using auto-instrumentation to functions called by function_name will automatically become its child spans.
Alternatively, one can use the provided decorator, which results in simpler code when it's not necessary to capture any attributes in the spans.
Adding instrumentation to Langchain’s LLM Chains
Langchain offers Custom Callback Handlers as a means to execute additional functions at well-defined stages of a chain. To collect statistics on the prompts and the token usage of the LLM calls, we can add spans in the on_llm_start and on_llm_end callbacks:
Adding instrumentation for OpenAI Embeddings in LlamaIndex
LlamaIndex does not provide callback mechanisms for its embeddings functions. Instead, we can extend the OpenAIEmbedding class, include instrumentation code in the overridden methods, and pass an instance of this class to the relevant methods of the library. In the added spans, we collect the text lengths as span attributes.
The obvious downside of this approach is that the code needs to be kept in sync with the extended base class, which increases the maintenance effort for library upgrades.
Inspecting the spans
Running and using the code mentioned earlier produces two traces. First, the embedding span with the added attribute texts_len:

Screenshot from Jaeger UI showing the added embedding span with its attributes
Next, the embedding traces and the on_llm_start and on_llm_end spans with the captured query_length and token usage attributes:

Screenshot from Jaeger UI showing the captured traces and LLM token usage attributes
Writing an Instrumentor for OpenAI Embeddings in LlamaIndex
Extending classes can be cumbersome and adds unnecessary maintenance overhead. The built-in instrumentation offered by many OpenTelemetry instrumentation packages for Python provides inspiration for a different approach: instrumentation using function wrappers.
Following the example of the Redis instrumentation library, we use the convenient wrapt package to write a simple wrapper function for three methods in the OpenAIEmbedding class. The wrapper _traced calculates the length of the passed string(s) depending on the function’s argument type (str or List[str]).
To ensure the instrumentor is actually used, it needs to be initialized with OpenAIEmbeddingInstrumentor().instrument() before the first library calls are initiated. The resulting traces generated by the instrumentor code are as follows:

Screenshot from Jaeger UI depicting the span generated by the generic OpenAIEmbeddingInstrumentor
Summary
We explored enriching spans with additional context in three different ways: (1) manual instrumentation of individual function calls, (2) extending classes to override methods with versions that include tracing code, and (3) instrumenting library code using function wrappers. Which approach to use is highly contextual and depends on the use case at hand. Approach 1 is best suited to one's own code, approach 3 to instrumenting libraries, and approach 2 to cases where a high degree of control over the instrumentation is required.
It’s important not to overdo instrumentation and to rely on the provided instrumentation packages whenever applicable. When considering manual instrumentation, balance the benefit of additional detail against the complexity it introduces. Note that in production deployments, tracing data is often sampled to cope with high data volumes and keep the cost footprint of tracing in check; this affects the accuracy of the collected data.