Skip to content

test: eliminate flaky time.sleep() tests with time-machine#94

Merged
27Bslash6 merged 2 commits intomainfrom
fix/flaky-time-sleep-tests
Apr 25, 2026
Merged

test: eliminate flaky time.sleep() tests with time-machine#94
27Bslash6 merged 2 commits intomainfrom
fix/flaky-time-sleep-tests

Conversation

@27Bslash6
Copy link
Copy Markdown
Contributor

@27Bslash6 27Bslash6 commented Apr 25, 2026

Summary

  • Add time-machine (thread-safe C extension) as dev dependency
  • Replace 18 flaky time.sleep() calls with instant clock shifts in circuit breaker + file backend tests
  • Widen margins on 8 external-backend TTL tests (Redis/Memcached own the clock, can't mock)

What changed

Circuit breaker tests (14 sleeps eliminated):

  • test_circuit_breaker.py — 5 sleeps → traveller.shift()
  • test_circuit_breaker_state_machine.py — 5 sleeps → traveller.shift()
  • test_circuit_breaker_race_conditions.py — 4 sleeps → traveller.shift()

In-process TTL tests (4 sleeps eliminated):

  • test_file_backend_critical.py — 3.5s sleep → instant shift
  • test_head_to_head.py — 2.5s sleep → instant shift
  • test_file_backend.py — 2 unittest.mock patches → time-machine

External-backend TTL tests (8 files, margins widened):

  • Redis/Memcached tests can't use time-machine (server-side TTL)
  • Margins: TTL+0.5s → TTL+2s

Impact

  • Unit test suite: ~3-5s faster (circuit breaker + file backend sleeps gone)
  • Zero flake risk on circuit breaker state transitions
  • Zero flake risk on file backend TTL assertions

Test plan

  • 1361 unit tests pass locally
  • CI passes (critical tests need Redis service)

Summary by CodeRabbit

  • Tests

    • Improved test reliability by replacing real-time delays with deterministic virtual-time simulation for TTL expiration, circuit breaker, and cache backend tests.
  • Chores

    • Updated CI/CD workflows to streamline tool installation for improved build consistency.
    • Added development dependency for enhanced testing capabilities.

Adds time-machine (thread-safe C extension) to eliminate CI flakes
from time-dependent test assertions.

Circuit breaker tests (14 sleeps → instant shifts):
- test_circuit_breaker.py, test_circuit_breaker_state_machine.py,
  test_circuit_breaker_race_conditions.py

In-process TTL tests (4 sleeps → instant shifts):
- test_file_backend_critical.py, test_head_to_head.py,
  test_file_backend.py (replaces unittest.mock approach)

External-backend TTL tests (8 files, wider margins):
- Redis/Memcached tests must sleep (server owns the TTL clock)
- Margins widened from TTL+0.5s to TTL+2s
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

The pull request replaces real-time sleep-based waits in test suites with deterministic time mocking using the time_machine library, updates GitHub Actions workflows to use direct cargo install --locked instead of the cargo-binstall action, and adds time-machine>=2.19.0 to development dependencies.

Changes

Cohort / File(s) Summary
CI Workflow Tool Installation
.github/workflows/fuzz-smoke.yml, .github/workflows/security-deep.yml, .github/workflows/security-fast.yml, .github/workflows/security-medium.yml
Removed external cargo-binstall action and replaced cargo binstall commands with direct cargo install --locked for cargo-fuzz, cargo-machete, and cargo-geiger installation, simplifying tool setup without changing subsequent job execution.
Development Dependencies
pyproject.toml
Added time-machine>=2.19.0 to the dev dependency group to support deterministic time control in tests.
Unit & Circuit Breaker Tests
tests/unit/test_circuit_breaker.py, tests/unit/test_circuit_breaker_race_conditions.py, tests/unit/backends/test_file_backend.py
Replaced real wall-clock time.sleep() calls with deterministic time travel via time_machine.travel(..., tick=False) and traveller.shift(timedelta(...)), eliminating timing flakiness in timeout-dependent and TTL expiration assertions.
Critical Integration Tests
tests/critical/test_circuit_breaker_state_machine.py, tests/critical/test_file_backend_critical.py, tests/critical/test_encryption_integration.py, tests/critical/test_encryption_wrapper.py
Updated TTL and state-machine timeout tests to use deterministic time mocking with time_machine instead of real sleeps, with increased wait durations (1.2–1.5s → 3–5s) where server-side TTL cannot be mocked.
Test Timing Adjustments
tests/critical/test_basic_cache_works.py, tests/critical/test_cache_reliability.py, tests/competitive/test_head_to_head.py, tests/competitive/test_lru_cache_comparison.py, tests/integration/test_memcached_integration.py, tests/integration/test_redis_integration.py, tests/integration/test_tdd_cache_implementation.py
Increased sleep durations (1.2–1.5s → 3–4s) and added clarifying comments explaining server-side TTL behavior cannot be time-mocked, ensuring assertions occur after actual cache expiration in backing stores.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Tick-tock, no more clock! Time machines race,
Where sleeps once wandered, now shifts jump in place.
Tests run fast and straight, no flakes to face,
Deterministic hops through the temporal space! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely captures the main change: replacing flaky time.sleep() calls with time-machine for deterministic testing.
Description check ✅ Passed The PR description provides a clear summary, lists specific file changes, quantifies impact, and includes a test plan. It matches the template structure with Description, Motivation, Type of Change, and Testing sections.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/flaky-time-sleep-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

cargo-binstall action installs binaries to a temp PATH that doesn't
persist across steps on self-hosted runners. cargo install puts them
in $CARGO_HOME/bin which is already on PATH.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/competitive/test_lru_cache_comparison.py (1)

94-114: ⚠️ Potential issue | 🟡 Minor

Test has no post-sleep assertion — the 3s sleep is wasted and the "TTL support" claim is not actually validated.

After time.sleep(3) there is no second invocation of get_data(1) and no assertion comparing call counts or results. The test trailing comment ("Data should be expired (in production with Redis)") is aspirational, not verified. This test neither proves the claim in the docstring nor justifies the real-time sleep that was just widened from 1.2s to 3s — it pure overhead per run.

Either assert expiration, or drop the sleep.

🛠️ Proposed fix: actually validate TTL expiration
     def test_cachekit_ttl_support(self):
         """
         CLAIM: cachekit has TTL support. Items expire after configured TTL.

         This is TRUE and is a feature advantage over lru_cache.
         """
+        call_count = 0

         `@cache`(ttl=1)  # 1 second TTL
         def get_data(x: int) -> str:
+            nonlocal call_count
+            call_count += 1
             return f"value_{x}"

         # First call - cache miss
         result = get_data(1)
         assert result == "value_1"
+        assert call_count == 1
+
+        # Immediate second call - cache hit
+        assert get_data(1) == "value_1"
+        assert call_count == 1

         # Wait for TTL to expire
         # Server-side TTL — must sleep (time-machine can't mock Redis/Memcached clock)
         time.sleep(3)

-        # Data should be expired (in production with Redis)
-        # Note: Exact behavior depends on Redis persistence
+        # Data should be expired — function must execute again
+        assert get_data(1) == "value_1"
+        assert call_count == 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/competitive/test_lru_cache_comparison.py` around lines 94 - 114, The
test test_cachekit_ttl_support currently sleeps but never verifies expiration;
after time.sleep(3) invoke get_data(1) again and assert that the cached entry
was recomputed (i.e., TTL expired) by verifying the underlying function ran
again—implement this by turning get_data into a closure that increments a local
counter or by using a mock/spy to assert it was called twice; also consider
reducing the sleep to just over the TTL (e.g., ~1.1s) to avoid unnecessary delay
while keeping ttl=1 in the cache(...) decorator.
🧹 Nitpick comments (2)
.github/workflows/security-deep.yml (1)

49-50: Consider pinning cargo-fuzz (and the other cargo install tools in this PR) to a specific version.

cargo install --locked cargo-fuzz with no --version will pick up whatever version is latest on crates.io at job time, so security-tool behavior can drift silently between runs. The kani-verifier step at line 29 already demonstrates the pinning pattern (--version 0.67.0). Worth applying the same to:

  • .github/workflows/security-deep.yml L50 — cargo-fuzz
  • .github/workflows/fuzz-smoke.yml L35 — cargo-fuzz
  • .github/workflows/security-fast.yml L70 — cargo-machete
  • .github/workflows/security-medium.yml L29 — cargo-geiger

Pinning also lets cargo's "already installed" fast-path kick in reliably on the persisted $CARGO_HOME/bin, avoiding occasional from-source rebuilds when upstream cuts a new release.

♻️ Example for this file
     - name: Install cargo-fuzz
-      run: cargo install --locked cargo-fuzz
+      run: cargo install --locked cargo-fuzz --version <pinned-version>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/security-deep.yml around lines 49 - 50, The cargo install
invocations should be pinned to fixed versions to avoid drift; update the `cargo
install --locked cargo-fuzz`, `cargo install --locked cargo-machete`, and `cargo
install --locked cargo-geiger` commands to include a concrete `--version
<x.y.z>` flag (follow the same pattern used for `kani-verifier` like `--version
0.67.0`), choosing the appropriate stable version for each tool so the workflow
reliably reuses the cached binary and prevents silent upstream changes.
tests/integration/test_redis_integration.py (1)

91-98: Margin is TTL+3s here vs. TTL+2s elsewhere — consider normalizing.

The PR description states external-backend TTL margins were widened to TTL+2s, but this one uses time.sleep(4) against ttl=1 (TTL+3s). Harmless, but aligning with time.sleep(3) keeps the suite consistent and shaves ~1s off CI. If 3s was chosen intentionally because this path is slower/flakier than other Redis tests, a brief comment would help future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_redis_integration.py` around lines 91 - 98, This test
currently waits time.sleep(4) before asserting the TTL-expiration behavior for
short_lived()/call_count, which is TTL+3s for ttl=1; change the sleep to
time.sleep(3) to match the repo-wide TTL+2s margin used elsewhere (or, if the
extra second was intentional due to flakiness, replace the hardcoded sleep with
time.sleep(3) and add a short comment on why this path needs extra slack).
Update the sleep in the test surrounding short_lived() and keep references to
short_lived and call_count intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/critical/test_file_backend_critical.py`:
- Around line 64-72: The test test_ttl_enforced in
tests/critical/test_file_backend_critical.py no longer checks that the
"temporary" key expired; restore an assertion that verifies expiry (e.g., assert
backend.get("temporary") is None or assert not backend.exists("temporary"))
immediately after traveller.shift(timedelta(seconds=5)), and remove or update
the stale "file handle bug in FileBackend" workaround/comment (or link a bug if
one actually exists) so the test actually covers TTL enforcement using
backend.get or backend.exists as in the unit backend tests.

---

Outside diff comments:
In `@tests/competitive/test_lru_cache_comparison.py`:
- Around line 94-114: The test test_cachekit_ttl_support currently sleeps but
never verifies expiration; after time.sleep(3) invoke get_data(1) again and
assert that the cached entry was recomputed (i.e., TTL expired) by verifying the
underlying function ran again—implement this by turning get_data into a closure
that increments a local counter or by using a mock/spy to assert it was called
twice; also consider reducing the sleep to just over the TTL (e.g., ~1.1s) to
avoid unnecessary delay while keeping ttl=1 in the cache(...) decorator.

---

Nitpick comments:
In @.github/workflows/security-deep.yml:
- Around line 49-50: The cargo install invocations should be pinned to fixed
versions to avoid drift; update the `cargo install --locked cargo-fuzz`, `cargo
install --locked cargo-machete`, and `cargo install --locked cargo-geiger`
commands to include a concrete `--version <x.y.z>` flag (follow the same pattern
used for `kani-verifier` like `--version 0.67.0`), choosing the appropriate
stable version for each tool so the workflow reliably reuses the cached binary
and prevents silent upstream changes.

In `@tests/integration/test_redis_integration.py`:
- Around line 91-98: This test currently waits time.sleep(4) before asserting
the TTL-expiration behavior for short_lived()/call_count, which is TTL+3s for
ttl=1; change the sleep to time.sleep(3) to match the repo-wide TTL+2s margin
used elsewhere (or, if the extra second was intentional due to flakiness,
replace the hardcoded sleep with time.sleep(3) and add a short comment on why
this path needs extra slack). Update the sleep in the test surrounding
short_lived() and keep references to short_lived and call_count intact.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52ccbd8e-1fe5-4612-bff2-d176c0e459c1

📥 Commits

Reviewing files that changed from the base of the PR and between bf64163 and 0404223.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/fuzz-smoke.yml
  • .github/workflows/security-deep.yml
  • .github/workflows/security-fast.yml
  • .github/workflows/security-medium.yml
  • pyproject.toml
  • tests/competitive/test_head_to_head.py
  • tests/competitive/test_lru_cache_comparison.py
  • tests/critical/test_basic_cache_works.py
  • tests/critical/test_cache_reliability.py
  • tests/critical/test_circuit_breaker_state_machine.py
  • tests/critical/test_encryption_integration.py
  • tests/critical/test_encryption_wrapper.py
  • tests/critical/test_file_backend_critical.py
  • tests/integration/test_memcached_integration.py
  • tests/integration/test_redis_integration.py
  • tests/integration/test_tdd_cache_implementation.py
  • tests/unit/backends/test_file_backend.py
  • tests/unit/test_circuit_breaker.py
  • tests/unit/test_circuit_breaker_race_conditions.py

Comment on lines +64 to +72
# Advance clock past TTL
traveller.shift(timedelta(seconds=5))

# Permanent still exists, temporary is gone
assert backend.get("permanent") == b"stays"
# Skip reading expired key directly due to file handle bug in FileBackend
# Instead verify by setting a new key (proves cleanup didn't affect backend)
backend.set("new_key", b"new_value")
assert backend.get("new_key") == b"new_value"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

test_ttl_enforced no longer actually verifies TTL enforcement.

After shifting the clock past the TTL, the test only asserts that the non-TTL key still exists and that a fresh set/get works — it never asserts that "temporary" expired. That is the sole behavior this test is meant to cover, so this is a coverage regression for the critical path.

The "file handle bug in FileBackend" comment also looks stale: tests/unit/backends/test_file_backend.py::test_get_expired_ttl_deletes_file and test_exists_expired_ttl_deletes_file in this same PR successfully call backend.get(...) is None / backend.exists(...) is False after an identical traveller.shift(...) on the same class. If a real bug exists, please link the tracking issue; otherwise drop the workaround.

🧪 Proposed fix — restore the expiry assertion
         # Permanent still exists, temporary is gone
         assert backend.get("permanent") == b"stays"
-        # Skip reading expired key directly due to file handle bug in FileBackend
-        # Instead verify by setting a new key (proves cleanup didn't affect backend)
-        backend.set("new_key", b"new_value")
-        assert backend.get("new_key") == b"new_value"
+        assert backend.exists("temporary") is False
+        assert backend.get("temporary") is None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/critical/test_file_backend_critical.py` around lines 64 - 72, The test
test_ttl_enforced in tests/critical/test_file_backend_critical.py no longer
checks that the "temporary" key expired; restore an assertion that verifies
expiry (e.g., assert backend.get("temporary") is None or assert not
backend.exists("temporary")) immediately after
traveller.shift(timedelta(seconds=5)), and remove or update the stale "file
handle bug in FileBackend" workaround/comment (or link a bug if one actually
exists) so the test actually covers TTL enforcement using backend.get or
backend.exists as in the unit backend tests.

@27Bslash6 27Bslash6 merged commit 3030c47 into main Apr 25, 2026
31 of 32 checks passed
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

A stacked PR containing fixes has been created.

  • Stacked PR: #95
  • Files modified:
  • tests/critical/test_file_backend_critical.py

Time taken: 2m 1s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant