utils: Make RedisSemaphore.acquire atomic by ideaship · Pull Request #2419 · osism/python-osism

ideaship · 2026-06-28T13:38:46Z

Problem

RedisSemaphore.acquire() checked capacity and took a slot in two
separate Redis round-trips:

if self.redis.zcard(self.key) < self.maxsize:   # CHECK
    self.redis.zadd(self.key, {identifier: now}) # USE

This is a time-of-check-to-time-of-use race: two clients can both run
ZCARD, both observe a free slot, and both ZADD, leaving the sorted set
with more than maxsize members. The semaphore that bounds concurrent
NetBox connections (create_netbox_semaphore, maxsize = NETBOX_MAX_CONNECTIONS, default 5) therefore over-admits under contention
— exactly when the cap matters most. A local 50-client harness peaked at
22 holders against maxsize 5.

A second, opposite defect lived in the same method: the cleanup boundary
now - 60 was computed once before the retry loop, so it never advanced
while a caller waited. Holders that expired during the wait were never
reclaimed, causing spurious timeouts when capacity had actually freed.

Fix

Replace the check-and-acquire with a single server-side Lua EVAL that
performs expired-holder cleanup, the capacity check, and the slot
reservation in one atomic step, so no other client can interleave between
the check and the reservation. Because now is recomputed each iteration
and passed into the script, the reclaim boundary advances with wall-clock
time — fixing the second defect in the same change. The 60-second window
is named HOLDER_EXPIRY.

Testing

The race needs genuinely concurrent clients against a real Redis and is
not reproducible with a mock. It was validated locally with a threaded
real-Redis harness (over-admit on the old code, never on the fixed code).
The committed regression tests run the production Lua script against
in-memory fakeredis (added as a dev dependency, with lupa for Lua
support) and pin the two invariants: the holder set never exceeds
maxsize, and the reclaim boundary advances across retries so a slot
freed mid-wait is granted.

Notes

Surfaced while reviewing the test-only PR below; this is a pre-existing
production defect in code that PR does not touch, so it is fixed
separately:
- Add unit tests for osism/utils semaphore, redlock and task locks #2349
That PR adds tests pinning the previous behavior (including the frozen
cleanup boundary); whichever lands second will need those assertions
reconciled against this change.

🤖 Generated with Claude Code

RedisSemaphore.acquire() checked capacity and took a slot in two separate Redis round-trips: if self.redis.zcard(self.key) < self.maxsize: # CHECK self.redis.zadd(self.key, {identifier: now}) # USE This is a time-of-check-to-time-of-use race. Two clients can both run ZCARD, both observe a free slot, and both proceed to ZADD, leaving the sorted set with more than maxsize members. The semaphore that bounds concurrent NetBox connections (create_netbox_semaphore, maxsize = NETBOX_MAX_CONNECTIONS) therefore over-admits under contention -- a local 50-client harness peaked at 22 holders against maxsize 5. Replace the check-and-acquire with a single server-side Lua EVAL that performs the expired-holder cleanup, the capacity check, and the slot reservation in one atomic step, so no other client can interleave between the check and the reservation. The same change fixes a second, opposite defect in the method: the cleanup boundary (now - 60) was computed once before the retry loop, so it never advanced while a caller waited and holders that expired during the wait were never reclaimed, causing spurious timeouts. now is now recomputed on every iteration and passed to the script, so the reclaim boundary advances with wall-clock time. The 60-second window is named HOLDER_EXPIRY. The race itself requires genuinely concurrent clients against a real Redis and is not reproducible with a mock; it was validated locally with a threaded real-Redis harness. The committed regression tests run the production Lua script against in-memory fakeredis (added as a dev dependency, with lupa for Lua support) and pin the two invariants the fix guarantees: the holder set never exceeds maxsize, and the reclaim boundary advances across retries so a slot freed mid-wait is granted. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Roger Luethi <luethi@osism.tech>

sourcery-ai

Hey - I've left some high level feedback:

Consider using a monotonic clock (e.g. time.monotonic()) for the retry loop and now passed into the Lua script to avoid issues if the system clock is adjusted during acquisition retries.
If this Lua script will be invoked frequently, you may want to load it and use EVALSHA instead of eval to reduce script parsing overhead on Redis and make it easier to reuse across RedisSemaphore instances.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider using a monotonic clock (e.g. `time.monotonic()`) for the retry loop and `now` passed into the Lua script to avoid issues if the system clock is adjusted during acquisition retries.
- If this Lua script will be invoked frequently, you may want to load it and use `EVALSHA` instead of `eval` to reduce script parsing overhead on Redis and make it easier to reuse across `RedisSemaphore` instances.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

osism-agent added this to Human Board Jun 28, 2026

github-project-automation Bot moved this to Ready in Human Board Jun 28, 2026

sourcery-ai Bot reviewed Jun 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

utils: Make RedisSemaphore.acquire atomic#2419

utils: Make RedisSemaphore.acquire atomic#2419
ideaship wants to merge 1 commit into
mainfrom
fix/redis-semaphore-acquire-atomic

ideaship commented Jun 28, 2026

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ideaship commented Jun 28, 2026

Problem

Fix

Testing

Notes

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants