Skip to content

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract#10539

Open
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-metrics-cache-and-timing
Open

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract#10539
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-metrics-cache-and-timing

Conversation

@wang-jiahua

Copy link
Copy Markdown
Contributor

Which Issue(s) This PR Fixes

Fixes #10537

Brief Description

How Did You Test This Change

Server benchmark: 8C30G broker, 1KB msg, 128 queues, sync mode, 256 threads. CI checkstyle + compilation passed.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Caches per-request-code OpenTelemetry Attributes objects and replaces Stopwatch.elapsed(TimeUnit.MILLISECONDS) with processTimerElapsedMs() (nanoTime-based) across the remoting layer. Also adds cached Constructor lookup for custom header classes and reduces default HashMap capacity. Multiple coordinated micro-optimizations targeting the broker hot path.

Findings

  • [Warning] NettyRemotingServer.java:432-433 — Changed log.info("channel writable false. host:{}", ...) to log.debug(...). This is a behavioral change: production users who relied on INFO-level channel writability warnings for backpressure monitoring will lose this visibility. If this was intentional (to reduce log noise), it is fine, but worth confirming.

  • [Info] RemotingMetricsManager.java — The attributes cache uses a ConcurrentHashMap with a volatile fast-path (lastAttrsCacheKey / lastCachedAttrs). The non-atomic read-then-write pattern means two threads with the same cache key could both rebuild Attributes, but computeIfAbsent ensures correctness. The volatile fast-path is a reasonable optimization for the common case where the same request type dominates.

  • [Info] RemotingMetricsManager.java:214-216 — Cache key encoding: (requestCode << 19) | (responseCode << 3) | (isLongPolling ? 4 : 0) | resultIndex. This assumes requestCode fits in 16 bits (max 65535) and responseCode fits in 16 bits. Both are safe given current ResponseCode and RequestCode values. Consider adding a comment documenting these bit-width assumptions for future maintainers.

  • [Info] RemotingCommand.java — Replacing Stopwatch processTimer with long processTimeStartNanos eliminates one object allocation per request. The new processTimerElapsedMs() method correctly computes elapsed milliseconds from System.nanoTime(). The deprecated getProcessTimer() now returns a fresh Stopwatch which would give incorrect elapsed time if called — the @Deprecated annotation is appropriate.

  • [Info] RemotingCommand.java:231-256 — Cached Constructor lookup via CONSTRUCTOR_CACHE (ConcurrentHashMap with computeIfAbsent). Thread-safe. The setAccessible(true) call is necessary for non-public constructors. Clean.

  • [Info] RemotingCommand.java:341 — HashMap initial capacity reduced from 256 to 16. With default load factor 0.75, this supports 12 entries before resizing — sufficient for typical header field counts. Good optimization.

  • [Info] TopicQueueMappingContext.java — Added EMPTY static singleton to avoid allocating empty context objects. Clean.

Suggestions

  1. The log level change in NettyRemotingServer.java (channel writability) is the most notable behavioral change. If intentional, no action needed.

  2. Consider adding a brief comment on the cache key encoding in RemotingMetricsManager documenting the bit-width assumptions (requestCode ≤ 16 bits, responseCode ≤ 16 bits).

Overall: Well-structured set of micro-optimizations with consistent patterns. The Stopwatch → nanoTime replacement is a clean improvement.


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch from 5c2f5cc to bb222a5 Compare June 23, 2026 02:25
@wang-jiahua

Copy link
Copy Markdown
Contributor Author

Server Benchmark Results (2026-06-23)

Environment: 8C30G broker, Dragonwell JDK 21, 1KB msg, 128 queues, sync mode, 256 threads, 20s warmup + 30s collect

Producer-only (steady state, lines 3-5)

Test TPS Avg RT GC vs Baseline
Baseline 217k 1.18ms 354
#10539 223k 1.15ms 218 +2.8% TPS, -2.5% RT, -38% GC

Attributes cache (getOrBuildAttributes) eliminates per-RPC AttributesBuilder allocation. processTimerElapsedMs replaces deprecated getProcessTimer().elapsed(). GC reduction confirms allocation savings.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commits)

Summary

PR squashed into 2 commits. Core approach unchanged: cache OpenTelemetry Attributes objects and replace Stopwatch with long processTimerNanos to eliminate per-RPC allocations. New log.isDebugEnabled() guard added for debug logging.

CI Status

CI is failing — SpotBugs reports 4 ES_COMPARING_PARAMETER_STRING_WITH_EQ violations in RemotingMetricsManager.getOrBuildAttributes() (lines 102-105). The result == RESULT_SUCCESS style comparisons on a method parameter trigger SpotBugs even though the values are interned constants.

Findings

  • [Critical] RemotingMetricsManager.java:102-105 — String == comparison on method parameter result. Fix: use .equals() or add @SuppressFBWarnings("ES_COMPARING_PARAMETER_STRING_WITH_EQ") with justification that all callers pass interned constant strings. The .equals() approach is safer and preferred.

  • [Info] RemotingMetricsManager.java — Cache key encoding (requestCode << 19) | (responseCode << 3) | (isLongPolling ? 4 : 0) | resultIdx is correct for current value ranges. The lastAttrsCacheKey/lastCachedAttrs volatile fast-path is a reasonable optimization for the dominant-case pattern.

  • [Info] RemotingCommand.javaprocessTimerElapsedMs() correctly computes elapsed ms from System.nanoTime(). The deprecated getProcessTimer() returning a fresh Stopwatch is a safe backward-compat shim.

  • [Info] NettyRemotingAbstract.javalog.isDebugEnabled() guard avoids string concatenation overhead when debug is disabled. Good.

Suggestions

  1. Fix the SpotBugs violations to unblock CI — this is the only blocker.
  2. Consider adding a brief comment on the cache key bit-width assumptions for future maintainers.

Automated re-review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch 2 times, most recently from f5a1d4b to be98520 Compare June 24, 2026 04:05

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commits)

Summary

New squashed commit. Core approach unchanged: cache OpenTelemetry Attributes via ConcurrentHashMap + volatile fast-path, and replace Stopwatch with long processTimerNanos.

CI Status

CI is still failing — SpotBugs reports 4 ES_COMPARING_PARAMETER_STRING_WITH_EQ violations in RemotingMetricsManager.getOrBuildAttributes():

RemotingMetricsManager.java:102 — result == RESULT_SUCCESS
RemotingMetricsManager.java:103 — result == RESULT_ONEWAY
RemotingMetricsManager.java:104 — result == RESULT_WRITE_CHANNEL_FAILED
RemotingMetricsManager.java:105 — result == RESULT_CANCELED

This was flagged in the previous review but not yet fixed. SpotBugs treats == comparison on a method parameter as a bug even when the constants are interned, because the parameter type is String and callers could pass non-interned strings.

Fix Required

Replace == with .equals() in getOrBuildAttributes():

public Attributes getOrBuildAttributes(int requestCode, int responseCode,
    boolean isLongPolling, String result) {
    int resultIdx;
    if (RESULT_SUCCESS.equals(result)) resultIdx = 0;
    else if (RESULT_ONEWAY.equals(result)) resultIdx = 1;
    else if (RESULT_WRITE_CHANNEL_FAILED.equals(result)) resultIdx = 2;
    else if (RESULT_CANCELED.equals(result)) resultIdx = 3;
    else resultIdx = -1;
    // ... rest unchanged

Findings

  • [Critical] RemotingMetricsManager.java:102-105Not fixed. Change result == CONSTANT to CONSTANT.equals(result) to resolve SpotBugs ES_COMPARING_PARAMETER_STRING_WITH_EQ. This is blocking CI.

  • [Info] The ConcurrentHashMap<Long, Attributes> + volatile fast-path caching strategy is sound. The lastAttrsCacheKey / lastCachedAttrs volatile pair avoids HashMap lookup on the hottest path. Good optimization.

  • [Info] The bit-packing key ((long) requestCode << 19) | ((responseCode & 0xFFFF) << 3) | ... is correct given the value ranges (requestCode fits in 13 bits, responseCode in 16 bits).


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch 3 times, most recently from eda382e to c90c171 Compare June 24, 2026 08:34
- RemotingMetricsManager: add getOrBuildAttributes() with long-key cache
- NettyRemotingAbstract: use cached Attributes instead of per-RPC AttributesBuilder
- NettyRemotingAbstract: add log.isDebugEnabled() guard for debug log
- RemotingMetricsConstant: add LABEL_*_KEY static AttributeKey instances
@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch from c90c171 to 611d8ea Compare June 24, 2026 08:57
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 5.88235% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.17%. Comparing base (2e6632f) to head (611d8ea).

Files with missing lines Patch % Lines
...rocketmq/remoting/netty/NettyRemotingAbstract.java 8.82% 28 Missing and 3 partials ⚠️
...ketmq/remoting/metrics/RemotingMetricsManager.java 0.00% 17 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10539      +/-   ##
=============================================
- Coverage      48.27%   48.17%   -0.10%     
+ Complexity     13435    13411      -24     
=============================================
  Files           1377     1377              
  Lines         100844   100859      +15     
  Branches       13036    13041       +5     
=============================================
- Hits           48678    48589      -89     
- Misses         46217    46305      +88     
- Partials        5949     5965      +16     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commit 611d8ea)

Summary

New commit confirmed — OpenTelemetry Attributes are now cached per request-code/response-code combination using a ConcurrentHashMap. Debug log string concatenation is also guarded with log.isDebugEnabled().

Assessment

  • [Correctness] The bit-packing key (requestCode << 19) | (responseCode << 3) | flags correctly encodes the attribute combination. The resultIdx mapping is consistent with the string constants. Cache invalidation is not needed since the set of request/response codes is finite and small.
  • [Performance] Eliminates per-request Attributes object allocation and string formatting on the metrics hot path. The isDebugEnabled() guard avoids unnecessary string concatenation. Good optimization.
  • [Thread Safety] ConcurrentHashMap.computeIfAbsent is used correctly for thread-safe lazy initialization.
  • [Info] One minor note: the cache is unbounded. In practice the number of distinct (requestCode, responseCode, isLongPolling, result) combinations is small and bounded, so this is not a concern.

LGTM — all CI checks pass. Clean performance improvement. ✅


Automated by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract

3 participants