dhcp: add configurable backoff parameters for DHCPv4#593
Conversation
de88396 to
d5f27da
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughThree new DHCPv4-only retransmission timing directives ( ChangesConfigurable DHCP Backoff Timing
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/dhcp.c`:
- Around line 1867-1880: The jitter subtraction currently happens in unsigned
space causing wrap; change the jitter computation to use a signed/wider integer:
compute a signed jitter_offset =
(int64_t)arc4random_uniform((uint32_t)ifo->backoff_jitter * 2) -
(int64_t)ifo->backoff_jitter, clamp jitter_offset to a reasonable range if
needed, then compute RT using signed/wider arithmetic as RT =
(int64_t)state->interval * MSEC_PER_SEC + jitter_offset and assign back ensuring
types match (use explicit casts) to avoid unsigned wrap; update the code around
variables jitter, state->interval, RT and the arc4random_uniform call.
In `@src/if-options.c`:
- Around line 2586-2603: The new options backoff_cutoff and initial_interval
must be validated together in finish_config(): after parsing, check if
ifo->backoff_cutoff is less than ifo->initial_interval and either clamp
backoff_cutoff to initial_interval or return an error; update finish_config() to
perform this comparison and adjust or reject the configuration (modify the
function that finalizes options, e.g., finish_config()) and provide a clear
logerrx or warning message indicating the clamp or failure including both
values.
In `@tests/backoff/test_backoff.c`:
- Around line 847-859: The test currently tears down host dhcpcd state by
calling run_cmd("pkill -9 dhcpcd ...") and manipulating /var/run/dhcpcd and
/var/db/dhcpcd; change this to operate only inside the per-test temporary
directory created via tmpdir/mkdtemp: remove the pkill and any run_cmd
invocations that create/delete /var/run/dhcpcd or /var/db/dhcpcd, and instead
create and use equivalent per-test directories under tmpdir (e.g., snprintf
paths under tmpdir and call run_cmd or mkdir -p on those) so cleanup is limited
to tmpdir and the host daemon is not affected; update any code that expects
global paths to point to the tmpdir equivalents so the timing/reject case stays
isolated.
- Around line 730-742: Help text for the -n option is incorrect: update the
fprintf format string in tests/backoff/test_backoff.c (the multi-line usage
string passed to fprintf) so the "-n N" line shows the actual default of
num_runs (128) instead of 1 (e.g., change "(default: 1, max: %d)" to "(default:
128, max: %d)"). Ensure the format still uses MAX_RUNS for the max placeholder
and that the rest of the usage string remains unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1a842ef5-281a-42bd-a4e0-68df4153f8f5
📒 Files selected for processing (7)
src/dhcp.csrc/dhcpcd.conf.5.insrc/if-options.csrc/if-options.htests/Makefiletests/backoff/Makefiletests/backoff/test_backoff.c
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/backoff/test_backoff.c`:
- Around line 685-696: The snprintf command in the test initialization does not
establish private mount propagation within the new namespace created by unshare,
which can allow bind mounts to leak back to the host. After the `unshare --mount
--net --uts --ipc --pid --fork sh -c '` portion enters the namespace, add a
`mount --make-private /var` command before the `mkdir -p /var/run/dhcpcd
/var/db/dhcpcd` operation to ensure mount propagation is explicitly set to
private, preventing host pollution and achieving proper namespace isolation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ae92c47-55d3-4256-9ef3-7ac93eed899f
📒 Files selected for processing (4)
src/dhcp.csrc/dhcpcd.conf.5.insrc/if-options.ctests/backoff/test_backoff.c
🚧 Files skipped from review as they are similar to previous changes (3)
- src/dhcpcd.conf.5.in
- src/if-options.c
- src/dhcp.c
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/dhcp.c`:
- Around line 1826-1841: The retransmit timeout calculation in the RT assignment
performs potentially unsafe multiplications that can overflow before being cast
to signed integers. Specifically, state->interval * MSEC_PER_SEC and jitter * 2
can overflow when calculated with unsigned arithmetic. Refactor the calculation
to use overflow-safe arithmetic by either checking for potential overflow before
performing the multiplications, using intermediate variables with appropriate
type conversions to prevent overflow, or leveraging a safe multiplication
utility function that validates the operands before multiplying. Ensure that
both the state->interval scaling and the jitter doubling operations are
protected against overflow to produce valid retry delays.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2869ca84-ff93-4b44-bf3e-aeafa7d0127b
📒 Files selected for processing (7)
src/dhcp.csrc/dhcpcd.conf.5.insrc/if-options.csrc/if-options.htests/Makefiletests/backoff/Makefiletests/backoff/test_backoff.c
💤 Files with no reviewable changes (3)
- tests/backoff/Makefile
- tests/Makefile
- tests/backoff/test_backoff.c
🚧 Files skipped from review as they are similar to previous changes (1)
- src/if-options.h
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/backoff/test_backoff.c (1)
847-847:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd NULL check after
strdup().
strdup()can return NULL on allocation failure. Subsequent use ofdhcpcd_binat line 854 and later would dereference NULL.Proposed fix
dhcpcd_bin = strdup(resolved); + if (dhcpcd_bin == NULL) { + fprintf(stderr, "ERROR: strdup: %s\n", strerror(errno)); + return 1; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/backoff/test_backoff.c` at line 847, The strdup() call that assigns to dhcpcd_bin can return NULL on memory allocation failure, but this return value is not checked before the variable is subsequently used. Add a NULL check immediately after the strdup(resolved) assignment to dhcpcd_bin, and handle the failure case appropriately (such as by returning an error, logging, and exiting, or freeing resources as needed before returning).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/backoff/Makefile`:
- Around line 7-18: Add a `.PHONY` declaration near the beginning of the
Makefile to declare that the targets `all`, `test`, `clean`, and `distclean` are
phony (command targets, not files). This prevents Make from treating these
targets as regular files and skipping them when files with those names exist.
The `.PHONY` declaration should list all four targets and be placed near line 7,
before the rule definitions.
---
Duplicate comments:
In `@tests/backoff/test_backoff.c`:
- Line 847: The strdup() call that assigns to dhcpcd_bin can return NULL on
memory allocation failure, but this return value is not checked before the
variable is subsequently used. Add a NULL check immediately after the
strdup(resolved) assignment to dhcpcd_bin, and handle the failure case
appropriately (such as by returning an error, logging, and exiting, or freeing
resources as needed before returning).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ad2829ad-d3c3-42e3-bf6c-d93a2594800a
📒 Files selected for processing (7)
src/dhcp.csrc/dhcpcd.conf.5.insrc/if-options.csrc/if-options.htests/Makefiletests/backoff/Makefiletests/backoff/test_backoff.c
✅ Files skipped from review due to trivial changes (2)
- src/dhcpcd.conf.5.in
- tests/Makefile
🚧 Files skipped from review as they are similar to previous changes (3)
- src/dhcp.c
- src/if-options.h
- src/if-options.c
|
@rsmarples is this something you'll consider? |
| case O_BACKOFF_JITTER: | ||
| ARG_REQUIRED; | ||
| ifo->backoff_jitter = | ||
| (uint32_t)strtou(arg, NULL, 0, 0, MAX_BACKOFF_JITTER, &e); |
There was a problem hiding this comment.
Jitter must be a positive value, otherwise a value of zero would be no delay at all which would be bad.
There was a problem hiding this comment.
If jitter=0, the delay would be the interval, which is minimum of 1s. I have tests for jitter = 0, 100, and default (1000).
--- defaults (N=128) ---
dhcpcd.conf: timeout=75
PASS: 128/128 runs produced data
Retry Expected Min Avg Max N Result
------------------------------------------------------
Init 1±1.0s 0.0s 0.9s 2.0s 128 PASS
1 4±1.0s 3.0s 4.0s 5.0s 128 PASS
2 8±1.0s 7.0s 8.0s 9.0s 128 PASS
3 16±1.0s 15.0s 16.1s 17.0s 128 PASS
4 32±1.0s 31.0s 32.0s 33.0s 128 PASS
5 64±1.0s 63.0s 64.0s 65.0s 128 PASS
Elapsed: 75.9s
--- min-latency (N=128) ---
dhcpcd.conf: timeout=12, nodelay, initial_interval=1, backoff_cutoff=1, backoff_jitter=0
PASS: 128/128 runs produced data
Retry Expected Min Avg Max N Result
------------------------------------------------------
Init 0 - - - 128 PASS
1 1±0.0s 1.0s 1.0s 1.0s 128 PASS
2 1±0.0s 1.0s 1.0s 1.0s 128 PASS
3 1±0.0s 1.0s 1.0s 1.0s 128 PASS
4 1±0.0s 1.0s 1.0s 1.0s 128 PASS
5 1±0.0s 1.0s 1.0s 1.0s 128 PASS
6 1±0.0s 1.0s 1.0s 1.0s 128 PASS
7 1±0.0s 1.0s 1.0s 1.0s 128 PASS
8 1±0.0s 1.0s 1.0s 1.0s 128 PASS
9 1±0.0s 1.0s 1.0s 1.0s 128 PASS
10 1±0.0s 1.0s 1.0s 1.0s 128 PASS
Elapsed: 13.0s
--- cloud (N=128) ---
dhcpcd.conf: timeout=12, nodelay, initial_interval=1, backoff_cutoff=1, backoff_jitter=100
PASS: 128/128 runs produced data
Retry Expected Min Avg Max N Result
------------------------------------------------------
Init 0 - - - 128 PASS
1 1±0.1s 0.9s 1.0s 1.1s 128 PASS
2 1±0.1s 0.9s 1.0s 1.1s 128 PASS
3 1±0.1s 0.9s 1.0s 1.1s 128 PASS
4 1±0.1s 0.9s 1.0s 1.1s 128 PASS
5 1±0.1s 0.9s 1.0s 1.1s 128 PASS
6 1±0.1s 0.9s 1.0s 1.1s 128 PASS
7 1±0.1s 0.9s 1.0s 1.1s 128 PASS
8 1±0.1s 0.9s 1.0s 1.1s 128 PASS
9 1±0.1s 0.9s 1.0s 1.1s 128 PASS
10 1±0.1s 0.9s 1.0s 1.1s 128 PASS
Elapsed: 13.0s
I'll update the strtou calls with range checks.
There was a problem hiding this comment.
Or do you mean for the initial attempt? Is no delay really bad in that case?
There was a problem hiding this comment.
With a jitter of zero there is no randomisation and no backoff. Whilst I can entertain adjusting RFC values I would like to keep the spirit of why a backoff is needed. ie, if there is no DHCP response, don't constantly hammer the network.
There was a problem hiding this comment.
With a jitter of zero there is no randomisation and no backoff.
There's definitely a back off, jitter is just some (relatively) minor variance.
The initial_interval (initial retransmission interval, default 4s, max 4s) controls the initial and subsequent retries.
The backoff_cutoff (exponential backoff cap, default 64s, max 64s) controls the upper bound of retries.
The backoff_jitter (random jitter per retry, default ±1000ms, max 1000ms) is just a randomized delay on top of interval>>cutoff base.
The backoff is still exponential starting with initial interval. The defaults maintain current behavior today.
With initial_interval=1, backoff_cutoff=64, backoff_jitter=0, we can see the back-off working clearly:
| Retry | Expected | Min | Avg | Max | N | Result |
|---|---|---|---|---|---|---|
| Init | 0 (nodelay) | – | – | – | 64 | PASS |
| 1 | 1±0.0s | 1.0s | 1.0s | 1.0s | 64 | PASS |
| 2 | 2±0.0s | 2.0s | 2.0s | 2.0s | 64 | PASS |
| 3 | 4±0.0s | 4.0s | 4.0s | 4.0s | 64 | PASS |
| 4 | 8±0.0s | 8.0s | 8.0s | 8.0s | 64 | PASS |
| 5 | 16±0.0s | 16.0s | 16.0s | 16.0s | 64 | PASS |
| 6 | 32±0.0s | 32.0s | 32.0s | 32.0s | 64 | PASS |
| 7 | 64±0.0s | 64.0s | 64.0s | 64.0s | 64 | PASS |
| 8 | 64±0.0s | 64.0s | 64.0s | 64.0s | 64 | PASS |
All tests passed — elapsed 130.7s. The sequence doubles (1→2→4→8→16→32→64) and caps at backoff_cutoff=64, with zero variance from backoff_jitter=0.
There was a problem hiding this comment.
(and the defaults remain to match current behavior)
Yes I'm happy to consider this. |
In cloud and virtual environments the DHCP service is typically ready within hundreds of milliseconds of the interface coming up, and once ready, responds within single-digit milliseconds. The RFC 2131 defaults (4s initial interval, 64s backoff cap) are designed for congested broadcast networks and are unnecessarily conservative in this context. Add three new configuration options to tune DHCPv4 retransmission: initial_interval - initial retransmission interval (default 4s) backoff_cutoff - exponential backoff cap (default 64s) backoff_jitter - random jitter per retry (default ±1000ms) Defaults match RFC 2131 so existing behaviour is unchanged. Option naming aligns with dhclient (initial-interval, backoff-cutoff). Minimum of 1 is enforced at parse time for interval and cutoff; invalid values are logged and the default is used. These options are DHCPv4-only; DHCPv6 retransmission follows RFC 8415 constants and is not user-configurable. Closes NetworkConfiguration#406 Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
In cloud and virtual environments the DHCP service is typically ready within hundreds of milliseconds of the interface coming up, and once ready, responds within single-digit milliseconds. The RFC 2131 defaults (4s initial interval, 64s backoff cap) are designed for congested broadcast networks and are unnecessarily conservative in this context.
Add three new configuration options to tune DHCPv4 retransmission:
initial_interval - initial retransmission interval (default 4s)
backoff_cutoff - exponential backoff cap (default 64s)
backoff_jitter - random jitter per retry (default ±1000ms)
Defaults match RFC 2131 so existing behaviour is unchanged. Option naming aligns with dhclient (initial-interval, backoff-cutoff). Minimum of 1 is enforced at parse time for interval and cutoff; invalid values are logged and the default is used.
These options are DHCPv4-only; DHCPv6 retransmission follows RFC 8415 constants and is not user-configurable.
A test harness (tests/backoff/test_backoff.c) was used to validate timing assumptions and correctness across multiple scenarios (defaults, min-latency, cloud-recommended) and to confirm rejection of invalid values. It requires root and network namespaces so may not be suitable for all CI environments. It can be removed if it doesn't have long-term value.
Example test output:
Closes #406