Backend: Cache leaderboard endpoint responses to reduce slow DB query#5075
Backend: Cache leaderboard endpoint responses to reduce slow DB query#5075akanshajain231999 wants to merge 5 commits into
Conversation
Wraps the /api/jobs/challenge_phase_split/{id}/leaderboard/ view with a
30-second Memcached lookup keyed by phase split, order_by, and a host bit.
The underlying SQL joins four tables and applies LIMIT 10000 for correctness
(post-query Python re-sorts and dedups by team), so pagination cannot be
pushed into the database; caching is the right lever for repeat hits within
a short window. The composite index aligned with the WHERE+ORDER BY pattern
already exists from migration 0116. 400 responses are not cached.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5075 +/- ##
==========================================
+ Coverage 91.70% 91.72% +0.01%
==========================================
Files 96 96
Lines 8154 8154
==========================================
+ Hits 7478 7479 +1
+ Misses 676 675 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
WalkthroughThis PR adds caching to the leaderboard API endpoint to reduce redundant computation of sorted leaderboard results. The cache key incorporates the phase split ID, the ChangesLeaderboard Caching
🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/jobs/views.py`:
- Around line 644-647: The cache key for the leaderboard view is built from raw
request.GET.get("order_by") (variable cache_key in leaderboard) which can
contain long or invalid characters and trigger InvalidCacheKey; before
interpolating into cache_key, validate or normalize the order_by value by
checking it against the whitelist used in
calculate_distinct_sorted_leaderboard_data() (leaderboard.schema["labels"]) and
defaulting to "default" if not allowed, or else replace/encode it (e.g. URL-safe
base64 or a short hash) to guarantee no whitespace/control chars and length
<250; additionally wrap cache.get/cache.set calls in a try/except that catches
InvalidCacheKey (or Exception) to fall back to non-cached behavior so a bad key
cannot raise a 500.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0b161451-4e95-4ff8-b955-a52a7caf7386
📒 Files selected for processing (2)
apps/jobs/views.pytests/unit/jobs/test_views.py
| cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format( | ||
| phase_split=challenge_phase_split_id, | ||
| order_by=order_by or "default", | ||
| is_host=is_host_bit, |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Django's Memcached cache backend, what key length/character restrictions apply, and when is InvalidCacheKey raised?
💡 Result:
In Django’s Memcached cache backend, the key restrictions and the error behavior are: 1) Key length and character restrictions (memcached-safe) - Maximum key length: keys longer than 250 characters are considered invalid for memcached by Django’s validation logic. [1][2] - Disallowed characters: Django treats keys containing whitespace or control characters (and also DEL/control chars in the range \x00-\x20 and \x7f) as invalid for memcached. [2][1] These checks are applied through Django’s common memcached key warning/validation helpers. Django’s docs explicitly state that memcached keys “does not allow cache keys longer than 250 characters or containing whitespace or control characters”. [2] 2) When InvalidCacheKey is raised - In Django’s memcached backend implementation (MemcachedCache), validate_key iterates over memcache_key_warnings(key) and raises InvalidCacheKey(warning) when any warning is produced. [3] - The warnings are produced specifically when the key violates the above rules (length > 250, or disallowed characters). [1] - Therefore, InvalidCacheKey is raised when you call cache operations on a memcached backend with a key that would violate those memcached constraints (length/whitespace/control chars). This happens during key preparation for backend calls via make_and_validate_key. [3][1] Concretely, calls like cache.get/add/set/touch on the memcached backend will raise InvalidCacheKey if make_and_validate_key(key) detects a memcached-incompatible key. [3][1]
Citations:
- 1: https://docs.djangoproject.com/en/4.2/_modules/django/core/cache/backends/base/
- 2: https://docs.djangoproject.com/en/6.0/topics/cache/
- 3: https://github.com/django/django/blob/main/django/core/cache/backends/memcached.py
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '620,680p' apps/jobs/views.py | cat -nRepository: Cloud-CV/EvalAI
Length of output: 2554
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where cache backend is configured (look for memcached backend paths / env usage)
rg -n "memcached|django\.core\.cache\.backends\.memcached|CACHE_BACKEND|CACHES\s*=" . || true
# Also quickly list likely settings files
ls -la . | head -n 50
find . -maxdepth 3 -type f -iname "*settings*.py" | head -n 50Repository: Cloud-CV/EvalAI
Length of output: 3870
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show memcached cache configuration
sed -n '280,360p' settings/common.py | cat -n
# Check if the leaderboard view handles InvalidCacheKey
sed -n '600,690p' apps/jobs/views.py | cat -n
# See if repo catches InvalidCacheKey anywhere
rg -n "InvalidCacheKey" . || trueRepository: Cloud-CV/EvalAI
Length of output: 7223
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "InvalidCacheKey" . || true
# Find the leaderboard calc function implementation and inspect order_by handling
rg -n "def calculate_distinct_sorted_leaderboard_data" -S apps || true
rg -n "calculate_distinct_sorted_leaderboard_data\\(" -S apps/jobs/views.py || true
# Narrow read around the function once located (best-effort by printing a few likely ranges)
python3 - <<'PY'
import subprocess, re
out = subprocess.check_output(["bash","-lc","rg -n \"def calculate_distinct_sorted_leaderboard_data\" -S apps || true"], text=True)
print(out.strip())
PYRepository: Cloud-CV/EvalAI
Length of output: 491
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '280,460p' apps/jobs/utils.py | cat -n
rg -n "InvalidCacheKey" . || true
rg -n "order_by\\s*=\\s*|order_by\\s*in\\s*|whitelist|allowed.*order|default_order_by" apps/jobs/utils.py || true
sed -n '460,620p' apps/jobs/utils.py | cat -nRepository: Cloud-CV/EvalAI
Length of output: 15869
Don’t embed raw order_by query param into the Memcached cache key
apps/jobs/views.py (leaderboard, lines 644-647) builds cache_key directly from request.GET.get("order_by") (with only a "default" fallback). Even though calculate_distinct_sorted_leaderboard_data() later whitelists order_by against leaderboard.schema["labels"] for the computation, the cache key itself is still composed from the raw input. With the Memcached cache backend, Django raises InvalidCacheKey for keys that are too long (>250 chars) or contain whitespace/control characters, and there’s no exception handling around cache.get() / cache.set(), so this can surface as a 500.
Proposed fix
+import hashlib
@@
- cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format(
+ order_by_key = order_by or "default"
+ safe_order_by_key = hashlib.sha256(
+ order_by_key.encode("utf-8")
+ ).hexdigest()[:16]
+ cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format(
phase_split=challenge_phase_split_id,
- order_by=order_by or "default",
+ order_by=safe_order_by_key,
is_host=is_host_bit,
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/jobs/views.py` around lines 644 - 647, The cache key for the leaderboard
view is built from raw request.GET.get("order_by") (variable cache_key in
leaderboard) which can contain long or invalid characters and trigger
InvalidCacheKey; before interpolating into cache_key, validate or normalize the
order_by value by checking it against the whitelist used in
calculate_distinct_sorted_leaderboard_data() (leaderboard.schema["labels"]) and
defaulting to "default" if not allowed, or else replace/encode it (e.g. URL-safe
base64 or a short hash) to guarantee no whitespace/control chars and length
<250; additionally wrap cache.get/cache.set calls in a try/except that catches
InvalidCacheKey (or Exception) to fall back to non-cached behavior so a bad key
cannot raise a 500.
Summary
/api/jobs/challenge_phase_split/{id}/leaderboard/view with a 30-second cache (Memcached in production, already configured atsettings/common.py:314-318).challenge_phase_split_id,order_by, and a host/staff bit so hosts viewing private phase splits don't share keys with non-hosts. 400 responses are not cached.apps/jobs/utils.py:514-538re-sorts byfiltering_scoreand dedups to one row per team after the query — pagination at the SQL level would return wrong leaderboards. The composite index aligned with the query's WHERE+ORDER BY already exists (migration0116_leaderboard_data_leaderboard_query_index), so caching is the right next lever.Test plan
tests/unit/jobs/test_views.py::ChallengeLeaderboardTest::test_get_leaderboard_uses_cache_on_subsequent_request— second request within TTL does not re-run the heavy function.test_get_leaderboard_caches_separately_for_host_vs_non_host— host and non-host get separate cache keys.test_get_leaderboard_caches_separately_for_different_order_by— differentorder_byvalues get separate keys.test_get_leaderboard_does_not_cache_400_response— private phase split for non-host stays uncached and keeps returning 400.test_get_leaderboard_*tests still pass.Summary by CodeRabbit
New Features
Tests