[EXPORTER ETW] Generate a fresh trace id for every root span#4101
[EXPORTER ETW] Generate a fresh trace id for every root span#4101lukeina2z wants to merge 1 commit into
Conversation
575ff53 to
4004611
Compare
Fixes open-telemetry#3846. etw::Tracer was caching a single TraceId in a member (traceId_) generated once in its constructor, and using that same id as the fallback trace id for every root span it created. So if you did something as ordinary as: auto tracer = provider->GetTracer("Geneva-Tracer-Foo"); auto spanFoo = tracer->StartSpan("Span-Foo"); // root auto spanBar = tracer->StartSpan("Span-Bar"); // also root both Foo and Bar would come out the back end stamped with the same env_dt_traceId, and the Geneva/Jarvis side would collapse two unrelated requests into one logical trace. The issue has a screenshot of exactly that: Span-Foo and Span-Bar emitted with the same trace id of 8507dcdc651af34fb8f21152a98e2977. The OTel spec says each root span starts a new trace, and that is also what sdk::trace::Tracer does -- when there is no valid parent it calls generator.GenerateTraceId() on the spot, no cache anywhere (see sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else trace_id = generator.GenerateTraceId()" block). The ETW exporter was the odd one out here. Fix is one line in StartSpan(): when parentContext is invalid, call GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning the cached traceId_. This is the same call shape already used on the next line to mint the SpanId, so the access path is proven and thread-safe. A nice side benefit: trace-id-ratio samplers were getting the same trace id for every root span on a Tracer and therefore making the same accept/reject decision for all of them, which defeated the point of probabilistic sampling. With this change the sampler sees a fresh id per root span and behaves the way the SDK does. I deliberately did not remove the now-unused traceId_ member, its init in the ctor, or the public trace_id() accessor on Tracer. The accessor is part of the exporter's public surface and removing it would be a source break for anyone calling tracer.trace_id(). Happy to follow up with a cleanup PR if maintainers prefer. Also updated ETWTracer.GlobalSingletonTracer in exporters/etw/test/etw_tracer_test.cc. That test was asserting EXPECT_EQ(traceId1, traceId3) -- i.e. two consecutive root spans on the same Tracer should share a trace id -- which was just encoding the bug. After this fix each root span gets its own fresh trace id, so the assertion is flipped to EXPECT_NE, plus checks that all three trace ids are valid and distinct, and that the C++11 magic-static GetGlobalTracer() really does return the same Tracer reference both times. Comments and the illustrative sample-event TraceId in the doc comment were updated to match the new behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4004611 to
f7d1649
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4101 +/- ##
==========================================
- Coverage 82.01% 82.00% -0.01%
==========================================
Files 385 385
Lines 16030 16030
==========================================
- Hits 13145 13143 -2
- Misses 2885 2887 +2 🚀 New features to boost your workflow:
|
|
Thanks for the PR. The intention here is right: different root spans should not all get the same trace id. Just to clarify, the current ETW behavior was intentional for performance reasons. ETW is a high-throughput exporter and does not always follow the regular SDK behavior when that adds cost to the hot path. With this change, every root |
Fixes #3846.
etw::Tracer was caching a single TraceId in a member (traceId_) generated once in its constructor, and using that same id as the fallback trace id for every root span it created. So if you did something as ordinary as:
both Foo and Bar would come out the back end stamped with the same env_dt_traceId, and the Geneva/Jarvis side would collapse two unrelated requests into one logical trace. The issue has a screenshot of exactly that: Span-Foo and Span-Bar emitted with the same trace id of 8507dcdc651af34fb8f21152a98e2977.
The OTel spec says each root span starts a new trace, and that is also what sdk::trace::Tracer does -- when there is no valid parent it calls generator.GenerateTraceId() on the spot, no cache anywhere (see sdk/src/trace/tracer.cc, the "if (parent_context.IsValid()) ... else trace_id = generator.GenerateTraceId()" block). The ETW exporter was the odd one out here.
Fix is one line in StartSpan(): when parentContext is invalid, call GetIdGenerator(tracerProvider_).GenerateTraceId() instead of returning the cached traceId_. This is the same call shape already used on the next line to mint the SpanId, so the access path is proven and thread-safe.
A nice side benefit: trace-id-ratio samplers were getting the same trace id for every root span on a Tracer and therefore making the same accept/reject decision for all of them, which defeated the point of probabilistic sampling. With this change the sampler sees a fresh id per root span and behaves the way the SDK does.
I deliberately did not remove the now-unused traceId_ member, its init in the ctor, or the public trace_id() accessor on Tracer. The accessor is part of the exporter's public surface and removing it would be a source break for anyone calling tracer.trace_id(). Happy to follow up with a cleanup PR if maintainers prefer.
Fixes # (issue)
Changes
Please provide a brief description of the changes here.
For significant contributions please make sure you have completed the following items:
CHANGELOG.mdupdated for non-trivial changes