Tracing
The feature "otel" can be used when building rustup to turn on OpenTelemetry tracing with an OTLP gRPC exporter.
This can be very useful for diagnosing performance or correctness issues in more complicated scenarios.
Prerequisites
protoc must be installed. It can be downloaded from GitHub or installed via a package manager.
Usage
The normal OTLP environment variables can be used to customise its behaviour, but often the simplest thing is to just run a Jaeger docker container on the same host:
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:latest
Then build rustup-init with tracing:
cargo build --features=otel
Run the operation you want to analyze:
RUSTUP_FORCE_ARG0="rustup" ./target/debug/rustup-init show
Then look in Jaeger for the trace (the container above publishes the Jaeger UI on port 16686).
Tracing and tests
Tracing can also be used in tests to get a trace of the operations taken during the test.
The custom macro rustup_macros::test adds a prelude and suffix to each test to ensure that there is a tracing context set up, that the test function is a span, and that the spans from the test are flushed.
Build with --features=otel,test to use this feature.
Adding instrumentation
The otel feature uses conditional compilation to only add function instrumentation when enabled. Instrumenting a currently uninstrumented function is most simply done like so:
#[cfg_attr(feature = "otel", tracing::instrument(err, skip_all))]
skip_all is not required, but some core structs don't implement Debug yet, and others have a lot of output in Debug. Tracing adds some overhead, so keeping spans lightweight can help avoid frequency bias in the results, where parameters with large Debug output in frequently called functions make those functions show up as much slower than they are.
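As a minimal sketch of how the attribute sits on a real function: Channel and parse_channel below are hypothetical names invented for illustration, not rustup APIs. The fields(...) argument is an optional extra supported by tracing::instrument, shown here only as one way to record a single cheap value while skip_all leaves every argument out of the span.

```rust
/// Hypothetical release channels, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Channel {
    Stable,
    Beta,
    Nightly,
}

// With the `otel` feature off, `cfg_attr` drops the attribute entirely, so
// this compiles without the `tracing` crate being present at all.
// With the feature on, each call becomes a span:
//   - `err` records an event when the function returns `Err`
//   - `skip_all` records none of the arguments
//   - `fields(...)` opts one small value back in by its `Display` form
#[cfg_attr(
    feature = "otel",
    tracing::instrument(err, skip_all, fields(channel = %name))
)]
pub fn parse_channel(name: &str) -> Result<Channel, String> {
    match name {
        "stable" => Ok(Channel::Stable),
        "beta" => Ok(Channel::Beta),
        "nightly" => Ok(Channel::Nightly),
        other => Err(format!("unknown channel: {other}")),
    }
}
```

Because the attribute only exists when the feature is enabled, the function behaves identically in a normal build; the instrumentation has no effect unless rustup is built with --features=otel.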
Some good general heuristics:
- Do instrument slow blocking functions
- Do instrument functions with many callers or that call many different things, as these tend to help figure out the puzzle of what is happening
- Default to not instrumenting thin shim functions (or at least, only instrument them temporarily while figuring out the shape of a problem)
- Be wary of debug build timing: release optimisations make a huge difference, though debug is a lot faster to iterate on. If something isn't a problem in release, don't pay it too much heed in debug.
Caveats
Cross-thread propagation isn't connected yet. Until it is fixed, instrumentation in a thread will start a new root span rather than attaching to its parent. If any Tokio runtime-related code gets added in those threads, this will also cause a panic. We have a couple of thread pools in use today; if you need to instrument within that context, use a thunk to propagate the Tokio runtime into those threads.
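The thunk idea can be sketched with the standard library alone. Everything below is a hypothetical stand-in, not rustup or Tokio code: Handle plays the role of a runtime handle, and the thread-local CURRENT mimics per-thread "current runtime" state, which is why a freshly spawned thread sees nothing until the thunk re-enters the handle it captured.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a runtime handle; not a Tokio type.
#[derive(Clone, Debug, PartialEq)]
pub struct Handle(pub &'static str);

thread_local! {
    // Per-thread state: a newly spawned thread starts with `None`,
    // regardless of what the spawning thread has entered.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Mark `handle` as the current one for the calling thread.
pub fn enter(handle: Handle) {
    CURRENT.with(|c| *c.borrow_mut() = Some(handle));
}

/// Return the calling thread's current handle, if any.
pub fn current() -> Option<Handle> {
    CURRENT.with(|c| c.borrow().clone())
}

/// Wrap `work` in a thunk that captures the submitting thread's handle and
/// enters it on whichever thread eventually runs the thunk.
pub fn propagate<T, F>(work: F) -> impl FnOnce() -> T + Send + 'static
where
    F: FnOnce() -> T + Send + 'static,
{
    let handle = current(); // read on the submitting thread
    move || {
        if let Some(h) = handle {
            enter(h); // restored on the worker thread
        }
        work()
    }
}
```

Passing propagate(job) to a thread or pool instead of job means the worker sees the same handle as the submitter. With the real Tokio runtime the captured value would be a runtime handle entered inside the thread; that mapping is an assumption about how the pattern would be applied, not a description of existing rustup code.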