Skip to content

Backend: Cache leaderboard endpoint responses to reduce slow DB query#5075

Open
akanshajain231999 wants to merge 5 commits into
Cloud-CV:masterfrom
akanshajain231999:sentry-fix2
Open

Backend: Cache leaderboard endpoint responses to reduce slow DB query#5075
akanshajain231999 wants to merge 5 commits into
Cloud-CV:masterfrom
akanshajain231999:sentry-fix2

Conversation

@akanshajain231999

@akanshajain231999 akanshajain231999 commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Wraps the /api/jobs/challenge_phase_split/{id}/leaderboard/ view with a 30-second cache (Memcached in production, already configured at settings/common.py:314-318).
  • Cache key includes challenge_phase_split_id, order_by, and a host/staff bit so hosts viewing private phase splits don't share keys with non-hosts. 400 responses are not cached.
  • Targets the Sentry "Slow DB Query" issue (2.59s out of 2.68s, 746 events from 110 users) for that endpoint. The underlying SQL cannot be paginated into the database because apps/jobs/utils.py:514-538 re-sorts by filtering_score and dedups to one row per team after the query — pagination at the SQL level would return wrong leaderboards. The composite index aligned with the query's WHERE+ORDER BY already exists (migration 0116_leaderboard_data_leaderboard_query_index), so caching is the right next lever.

Test plan

  • tests/unit/jobs/test_views.py::ChallengeLeaderboardTest::test_get_leaderboard_uses_cache_on_subsequent_request — second request within TTL does not re-run the heavy function.
  • test_get_leaderboard_caches_separately_for_host_vs_non_host — host and non-host get separate cache keys.
  • test_get_leaderboard_caches_separately_for_different_order_by — different order_by values get separate keys.
  • test_get_leaderboard_does_not_cache_400_response — private phase split for non-host stays uncached and keeps returning 400.
  • All existing test_get_leaderboard_* tests still pass.
  • Watch Sentry post-deploy: slow-query event count and endpoint p95 over 24h.

Summary by CodeRabbit

  • New Features

    • Leaderboard API now caches computed responses to significantly improve performance and reduce server load, with separate cache entries for different user roles and filter options.
  • Tests

    • Added comprehensive tests verifying proper cache behavior, including validation that cached results are reused, different user types maintain separate caches, and error responses bypass caching.

Review Change Stack

akanshajain231999 and others added 2 commits April 28, 2026 13:49
Wraps the /api/jobs/challenge_phase_split/{id}/leaderboard/ view with a
30-second Memcached lookup keyed by phase split, order_by, and a host bit.
The underlying SQL joins four tables and applies LIMIT 10000 for correctness
(post-query Python re-sorts and dedups by team), so pagination cannot be
pushed into the database; caching is the right lever for repeat hits within
a short window. The composite index aligned with the WHERE+ORDER BY pattern
already exists from migration 0116. 400 responses are not cached.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.72%. Comparing base (6d91645) to head (58c467e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5075      +/-   ##
==========================================
+ Coverage   91.70%   91.72%   +0.01%     
==========================================
  Files          96       96              
  Lines        8154     8154              
==========================================
+ Hits         7478     7479       +1     
+ Misses        676      675       -1     
Flag Coverage Δ
backend 95.19% <ø> (ø)
frontend 87.44% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Accounts & Authentication 97.31% <ø> (ø)
Challenges Management 95.52% <ø> (ø)
Job Processing 89.79% <ø> (ø)
Participants & Teams 100.00% <ø> (ø)
Challenge Hosts 100.00% <ø> (ø)
Analytics 100.00% <ø> (ø)
Web Interface 100.00% <ø> (ø)
Frontend (Gulp) 87.44% <ø> (+0.02%) ⬆️
All Models 97.80% <ø> (ø)
All Views 100.00% <ø> (ø)
All Serializers 98.60% <ø> (ø)
Utility Functions 96.67% <ø> (ø)
Core Configuration 82.35% <ø> (ø)
Files with missing lines Coverage Δ
apps/jobs/views.py 100.00% <ø> (ø)

... and 1 file with indirect coverage changes

Files with missing lines Coverage Δ
apps/jobs/views.py 100.00% <ø> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d91645...58c467e. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Walkthrough

This PR adds caching to the leaderboard API endpoint to reduce redundant computation of sorted leaderboard results. The cache key incorporates the phase split ID, the order_by parameter (defaulting to "default"), and a flag indicating whether the requesting user is a host/staff member, ensuring separate cache entries for different logical requests. Only successful HTTP 200 responses are cached for 30 seconds; error responses like HTTP 400 skip caching to avoid stale error states.

Changes

Leaderboard Caching

Layer / File(s) Summary
Leaderboard endpoint caching implementation
apps/jobs/views.py
Cache import is added; the leaderboard endpoint now checks for cached results using a composite key (phase split ID, order_by, host status) and returns cached data on hit. On miss, it recalculates leaderboard data and caches the result only if the status code is HTTP 200, with a 30-second TTL.
Cache behavior verification tests
tests/unit/jobs/test_views.py
Test imports are extended to include Django cache and the leaderboard calculation utility. Four test cases verify: (1) identical requests reuse cached results, (2) host and non-host requests use separate cache entries, (3) different order_by parameters use separate cache entries, and (4) HTTP 400 error responses are not cached.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: caching leaderboard endpoint responses to improve performance by reducing slow DB queries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/jobs/views.py`:
- Around line 644-647: The cache key for the leaderboard view is built from raw
request.GET.get("order_by") (variable cache_key in leaderboard) which can
contain long or invalid characters and trigger InvalidCacheKey; before
interpolating into cache_key, validate or normalize the order_by value by
checking it against the whitelist used in
calculate_distinct_sorted_leaderboard_data() (leaderboard.schema["labels"]) and
defaulting to "default" if not allowed, or else replace/encode it (e.g. URL-safe
base64 or a short hash) to guarantee no whitespace/control chars and length
<250; additionally wrap cache.get/cache.set calls in a try/except that catches
InvalidCacheKey (or Exception) to fall back to non-cached behavior so a bad key
cannot raise a 500.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b161451-4e95-4ff8-b955-a52a7caf7386

📥 Commits

Reviewing files that changed from the base of the PR and between 6d91645 and 58c467e.

📒 Files selected for processing (2)
  • apps/jobs/views.py
  • tests/unit/jobs/test_views.py

Comment thread apps/jobs/views.py
Comment on lines +644 to +647
cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format(
phase_split=challenge_phase_split_id,
order_by=order_by or "default",
is_host=is_host_bit,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Django's Memcached cache backend, what key length/character restrictions apply, and when is InvalidCacheKey raised?

💡 Result:

In Django’s Memcached cache backend, the key restrictions and the error behavior are: 1) Key length and character restrictions (memcached-safe) - Maximum key length: keys longer than 250 characters are considered invalid for memcached by Django’s validation logic. [1][2] - Disallowed characters: Django treats keys containing whitespace or control characters (and also DEL/control chars in the range \x00-\x20 and \x7f) as invalid for memcached. [2][1] These checks are applied through Django’s common memcached key warning/validation helpers. Django’s docs explicitly state that memcached keys “does not allow cache keys longer than 250 characters or containing whitespace or control characters”. [2] 2) When InvalidCacheKey is raised - In Django’s memcached backend implementation (MemcachedCache), validate_key iterates over memcache_key_warnings(key) and raises InvalidCacheKey(warning) when any warning is produced. [3] - The warnings are produced specifically when the key violates the above rules (length > 250, or disallowed characters). [1] - Therefore, InvalidCacheKey is raised when you call cache operations on a memcached backend with a key that would violate those memcached constraints (length/whitespace/control chars). This happens during key preparation for backend calls via make_and_validate_key. [3][1] Concretely, calls like cache.get/add/set/touch on the memcached backend will raise InvalidCacheKey if make_and_validate_key(key) detects a memcached-incompatible key. [3][1]

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '620,680p' apps/jobs/views.py | cat -n

Repository: Cloud-CV/EvalAI

Length of output: 2554


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where cache backend is configured (look for memcached backend paths / env usage)
rg -n "memcached|django\.core\.cache\.backends\.memcached|CACHE_BACKEND|CACHES\s*=" . || true

# Also quickly list likely settings files
ls -la . | head -n 50
find . -maxdepth 3 -type f -iname "*settings*.py" | head -n 50

Repository: Cloud-CV/EvalAI

Length of output: 3870


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show memcached cache configuration
sed -n '280,360p' settings/common.py | cat -n

# Check if the leaderboard view handles InvalidCacheKey
sed -n '600,690p' apps/jobs/views.py | cat -n

# See if repo catches InvalidCacheKey anywhere
rg -n "InvalidCacheKey" . || true

Repository: Cloud-CV/EvalAI

Length of output: 7223


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "InvalidCacheKey" . || true

# Find the leaderboard calc function implementation and inspect order_by handling
rg -n "def calculate_distinct_sorted_leaderboard_data" -S apps || true
rg -n "calculate_distinct_sorted_leaderboard_data\\(" -S apps/jobs/views.py || true

# Narrow read around the function once located (best-effort by printing a few likely ranges)
python3 - <<'PY'
import subprocess, re
out = subprocess.check_output(["bash","-lc","rg -n \"def calculate_distinct_sorted_leaderboard_data\" -S apps || true"], text=True)
print(out.strip())
PY

Repository: Cloud-CV/EvalAI

Length of output: 491


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '280,460p' apps/jobs/utils.py | cat -n

rg -n "InvalidCacheKey" . || true

rg -n "order_by\\s*=\\s*|order_by\\s*in\\s*|whitelist|allowed.*order|default_order_by" apps/jobs/utils.py || true

sed -n '460,620p' apps/jobs/utils.py | cat -n

Repository: Cloud-CV/EvalAI

Length of output: 15869


Don’t embed raw order_by query param into the Memcached cache key

apps/jobs/views.py (leaderboard, lines 644-647) builds cache_key directly from request.GET.get("order_by") (with only a "default" fallback). Even though calculate_distinct_sorted_leaderboard_data() later whitelists order_by against leaderboard.schema["labels"] for the computation, the cache key itself is still composed from the raw input. With the Memcached cache backend, Django raises InvalidCacheKey for keys that are too long (>250 chars) or contain whitespace/control characters, and there’s no exception handling around cache.get() / cache.set(), so this can surface as a 500.

Proposed fix
+import hashlib
@@
-    cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format(
+    order_by_key = order_by or "default"
+    safe_order_by_key = hashlib.sha256(
+        order_by_key.encode("utf-8")
+    ).hexdigest()[:16]
+    cache_key = "leaderboard:v1:{phase_split}:{order_by}:{is_host}".format(
         phase_split=challenge_phase_split_id,
-        order_by=order_by or "default",
+        order_by=safe_order_by_key,
         is_host=is_host_bit,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/jobs/views.py` around lines 644 - 647, The cache key for the leaderboard
view is built from raw request.GET.get("order_by") (variable cache_key in
leaderboard) which can contain long or invalid characters and trigger
InvalidCacheKey; before interpolating into cache_key, validate or normalize the
order_by value by checking it against the whitelist used in
calculate_distinct_sorted_leaderboard_data() (leaderboard.schema["labels"]) and
defaulting to "default" if not allowed, or else replace/encode it (e.g. URL-safe
base64 or a short hash) to guarantee no whitespace/control chars and length
<250; additionally wrap cache.get/cache.set calls in a try/except that catches
InvalidCacheKey (or Exception) to fall back to non-cached behavior so a bad key
cannot raise a 500.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants