feat: add evaluate_flags() API for single-call flag evaluation by dmarticus · Pull Request #539 · PostHog/posthog-python

dmarticus · 2026-04-27T19:11:45Z

Problem

Phase 1 + Phase 2 of the Server SDK Feature Flag Evaluations RFC for posthog-python. Companion to the Node SDK PR (PostHog/posthog-js#3476).

Today every flag check fires its own /flags request, and capture(send_feature_flags=True) silently fires yet another on every captured event. The flag values on a captured event can diverge from the ones the code actually branched on when person/group properties differ between calls. send_feature_flags also attaches every evaluated flag to every event, which bloats properties on high-volume events.

Changes

New API (Phase 1)

posthog.evaluate_flags(distinct_id, ...) returns a FeatureFlagEvaluations snapshot:

flags = posthog.evaluate_flags(distinct_id, person_properties={"plan": "enterprise"})
if flags.is_enabled("new-dashboard"):
    render_new_dashboard()
posthog.capture("page_viewed", distinct_id=distinct_id, flags=flags)

A single /flags request powers both branching and event enrichment. is_enabled() and get_flag() fire $feature_flag_called events (deduped through the existing cache) with the full metadata — $feature_flag_id, $feature_flag_version, $feature_flag_reason, $feature_flag_request_id — so experiment exposure tracking keeps working.

Two layers of scoping

Network-level (flag_keys option): scopes the underlying /flags request itself.

flags = posthog.evaluate_flags(distinct_id, flag_keys=["new-dashboard", "checkout-flow"])

Event-level (filter helpers): narrow which flags get attached to a captured event without re-fetching.

posthog.capture(..., flags=flags.only_accessed())           # only flags the developer checked
posthog.capture(..., flags=flags.only(["new-dashboard"]))   # specific keys

Deprecation warnings (Phase 2)

The legacy single-flag surface keeps working but now emits DeprecationWarnings pointing at evaluate_flags():

feature_enabled()
get_feature_flag()
get_feature_flag_payload()
capture(send_feature_flags=...) (only when truthy)

feature_enabled and get_feature_flag are restructured to call _get_feature_flag_result directly instead of routing through each other, so a single user-level call emits exactly one warning instead of cascading.

Phase 3 (removal in next major) ships separately.

Local evaluation

Transparent. When the poller resolves a flag, the snapshot carries locally_evaluated=True and reason "Evaluated locally", matching what get_feature_flag() emits today.

Backwards compatibility

No breaking changes. All existing call paths return the same values they did before — the only behavior change is the new DeprecationWarning emissions, which can be silenced via Python's standard warnings filter.

Internals

_capture_feature_flag_called was refactored: the dedup + capture portion is extracted into _capture_feature_flag_called_if_needed, which is shared between the single-flag path and the new FeatureFlagEvaluations object. Both paths now dedupe identically.

Response-level errors (errors_while_computing_flags, quota_limited) are propagated into $feature_flag_called events from the snapshot, matching the granularity of the single-flag path.

Tests

posthog/test/test_evaluate_flags.py — 27 tests covering remote evaluation, local evaluation, filtering helpers, capture integration, flag_keys round-trip, empty-distinct_id safety, error-granularity propagation, and deprecation warning emission (with no-cascade verification).

Full suite: 489 passed. ruff format and ruff check clean.

Created with PostHog Code

Introduce posthog.evaluate_flags(distinct_id, ...) returning a FeatureFlagEvaluations snapshot. Branch on .is_enabled() / .get_flag() and pass the snapshot to capture() via a new flags option so events carry the exact values the code branched on, with no extra /flags request per capture. Filtering helpers .only_accessed() and .only([keys]) narrow the flag set attached to events. Pass flag_keys=[...] to evaluate_flags() to scope the underlying /flags request. Local evaluation is transparent. Deprecation (Phase 2 of the RFC): - feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) now emit DeprecationWarning. - The deprecated surface remains functional and will be removed in the next major version. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

`mock` is not in the project's test dependencies on CI. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

github-actions · 2026-04-27T19:15:14Z

posthog-python Compliance Report

Date: 2026-04-28 18:50:54 UTC
Duration: 159908ms

✅ All Tests Passed!

30/30 tests passed

Capture Tests

✅ 29/29 tests passed

View Details

Test	Status	Duration
Format Validation.Event Has Required Fields	✅	514ms
Format Validation.Event Has Uuid	✅	1506ms
Format Validation.Event Has Lib Properties	✅	1506ms
Format Validation.Distinct Id Is String	✅	1506ms
Format Validation.Token Is Present	✅	1506ms
Format Validation.Custom Properties Preserved	✅	1506ms
Format Validation.Event Has Timestamp	✅	1506ms
Retry Behavior.Retries On 503	✅	9517ms
Retry Behavior.Does Not Retry On 400	✅	3504ms
Retry Behavior.Does Not Retry On 401	✅	3507ms
Retry Behavior.Respects Retry After Header	✅	9508ms
Retry Behavior.Implements Backoff	✅	23531ms
Retry Behavior.Retries On 500	✅	7503ms
Retry Behavior.Retries On 502	✅	7510ms
Retry Behavior.Retries On 504	✅	7511ms
Retry Behavior.Max Retries Respected	✅	23526ms
Deduplication.Generates Unique Uuids	✅	1495ms
Deduplication.Preserves Uuid On Retry	✅	7514ms
Deduplication.Preserves Uuid And Timestamp On Retry	✅	14520ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry	✅	7505ms
Deduplication.No Duplicate Events In Batch	✅	1502ms
Deduplication.Different Events Have Different Uuids	✅	1506ms
Compression.Sends Gzip When Enabled	✅	1506ms
Batch Format.Uses Proper Batch Structure	✅	1506ms
Batch Format.Flush With No Events Sends Nothing	✅	1004ms
Batch Format.Multiple Events Batched Together	✅	1505ms
Error Handling.Does Not Retry On 403	✅	3508ms
Error Handling.Does Not Retry On 413	✅	3507ms
Error Handling.Retries On 408	✅	7512ms

Feature_Flags Tests

✅ 1/1 tests passed

View Details

Test	Status	Duration
Request Payload.Request With Person Properties Device Id	✅	513ms

Phase 2 deprecation warnings on `feature_enabled`, `get_feature_flag`, `get_feature_flag_payload`, and `capture(send_feature_flags=...)` are moved to a separate PR so this minor ships only the new API. Gives users one minor to migrate before runtime warnings start. The deprecated methods are restored to their original implementations (no longer need to bypass each other to avoid cascading warnings). Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

greptile-apps · 2026-04-27T22:13:29Z

Prompt To Fix All With AI

This is a comment left during a code review.
Path: posthog/test/test_evaluate_flags.py
Line: 42-44

Comment:
**Missing `tearDown` leaks Client background threads**

`TestEvaluateFlagsRemote` creates a `Client` in `setUp` but never calls `self.client.shutdown()`. The other two test classes in this file (`TestEvaluateFlagsFiltering`, `TestCaptureWithFlagsSnapshot`) both have correct `tearDown` implementations. Without shutdown, the client's background consumer thread is never joined, which can cause test-suite noise, delayed process exit, or flaky behaviour when mocks from one test bleed into the next.

```suggestion
class TestEvaluateFlagsRemote(unittest.TestCase):
    def setUp(self):
        self.client = Client(FAKE_TEST_API_KEY)

    def tearDown(self):
        self.client.shutdown()
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/feature_flag_evaluations.py
Line: 69-80

Comment:
**`flag_definitions_loaded_at` is accepted but never supplied**

`_record_access` (line 242) emits `$feature_flag_definitions_loaded_at` only when `self._flag_definitions_loaded_at is not None`, but `Client.evaluate_flags()` never passes this argument to the `FeatureFlagEvaluations` constructor — so the property is always `None` and the event property is never emitted for locally-evaluated flags. Either wire up the value from the poller or remove the dead parameter and the guarded block in `_record_access` until it's ready.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/test/test_evaluate_flags.py
Line: 42-321

Comment:
**Prefer parameterised tests**

The test suite exercises multiple flag types (boolean, variant, disabled, missing) through per-assertion branches inside single test methods — e.g. `test_is_enabled_returns_correct_values_and_fires_events` and `test_get_flag_returns_variant_or_bool_with_full_metadata`. The project's review standard is to prefer parameterised tests. Using `subTest` (or a library like `ddt`) would give each flag-type scenario its own named, independently-failing case and reduce the assertion density per method.

**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "revert: drop deprecation warnings (Phase..." | Re-trigger Greptile}

- Add `tearDown` to `TestEvaluateFlagsRemote` so the Client's background consumer thread is joined between tests, matching the pattern in the other test classes in this file. - Remove the dead `flag_definitions_loaded_at` constructor parameter from `FeatureFlagEvaluations`. The Python poller doesn't currently expose a definitions-loaded timestamp, so the parameter was always None and the gated branch in `_record_access` never fired. Trim it rather than leaving a confusing no-op; can be re-added with a real data source later. - Convert the two flag-type-variation tests to parameterized form. `test_is_enabled` and `test_get_flag_known_keys` now run as independently-named cases per flag type, and the missing-key behavior is split into its own focused test. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

dustinbyrne

Looks good! I have a few questions/considerations

dustinbyrne · 2026-04-28T15:08:15Z

+        if not self._accessed:
+            self._host.log_warning(
+                "FeatureFlagEvaluations.only_accessed() was called before any flags were accessed — "
+                "attaching all evaluated flags as a fallback. See "
+                "https://posthog.com/docs/feature-flags/server-sdks for details."
+            )
+            return self._clone_with(self._flags)


If this is a legitimate case of nothing being accessed, there's no way to avoid getting all flags. This could be more confusing than getting nothing?

E.g., I'm imaging a case where flags are first retrieved and consumed as needed. Should an early call to capture(flags=flags.only_accessed()) include all flags? I think it's contradictory with the name only_accessed

Good catch — fixed in 95eb1e9. only_accessed() now honors its name and returns an empty snapshot when nothing has been accessed (no fallback, no warning). The previous behavior was a misguided safety net that ended up being more surprising than helpful for exactly the scenario you describe (early capture() before any branching). The Node-side equivalent has the same change queued.

Updated test: test_only_accessed_returns_empty_when_no_flags_accessed.

dustinbyrne · 2026-04-28T15:22:11Z

        )
        return to_flags_and_payloads(resp)

    def get_flags_decision(


Should we deprecate the other methods?

Reversed course on this — shipped Phase 2 in eda573d alongside Phase 1 in this PR (rather than splitting it into a follow-up).

feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) now emit DeprecationWarnings pointing at evaluate_flags(). The methods continue to return the same values; users who pin warnings to errors will get a heads-up on first use, and the rest see them surface via pytest / python -W / IDE inspectors.

feature_enabled and get_feature_flag were restructured to call _get_feature_flag_result directly so a single user-level call emits exactly one warning instead of cascading three.

Phase 3 (removal in next major) is separate.

dustinbyrne · 2026-04-28T15:28:12Z

+        local_result, fallback_to_server = self._get_all_flags_and_payloads_locally(
+            distinct_id,
+            groups=dict(groups),
+            person_properties=person_properties,
+            group_properties=group_properties,
+            flag_keys_to_evaluate=flag_keys,
+        )


We should pass device_id here, as it may be present in the context (via tracing headers).

I don't think it's necessary to add it as a parameter of evaluate_flags as well, but I'll leave it up to you.

Good catch — wired up in 95eb1e9. evaluate_flags() now resolves device_id from context (get_context_device_id()) at the top of the method, then forwards it to the get_flags_decision(...) call so it lands in the /flags request body. I went with your suggestion and didn't add it as a method parameter — context-only is cleaner and matches the existing distinct_id resolution pattern.

dustinbyrne · 2026-04-28T15:34:47Z

+        if self._evaluated_at and not (flag and flag.locally_evaluated):
+            properties["$feature_flag_evaluated_at"] = self._evaluated_at
+        if flag is None:
+            properties["$feature_flag_error"] = "flag_missing"


Just noting that we're losing granularity of $feature_flag_error values here

You're right — fixed in 95eb1e9. The snapshot now tracks response-level errors (errors_while_computing_flags, quota_limited) at construction and _record_access builds a comma-joined $feature_flag_error matching the single-flag path's granularity. So a missing flag during a quota-limited response now reports quota_limited,flag_missing instead of just flag_missing.

New test: test_errors_while_computing_flags_propagates_to_event covers both the standalone (errors_while_computing_flags) and combined (errors_while_computing_flags,flag_missing) cases.

Not covered yet: TIMEOUT/CONNECTION_ERROR/api_error_NNN. Those manifest as exceptions on the request and currently just log + leave the snapshot empty (so flags would surface as flag_missing). Could fold those into the snapshot in a follow-up if useful, but the most common cases (errors_while_computing and quota_limited) are now covered.

marandaneto · 2026-04-28T15:54:12Z

how does auto captured events and errors set flags to those events if you need the instance from evaluate_flags

marandaneto · 2026-04-28T16:00:55Z

No breaking changes. Existing feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) paths all continue to work unchanged. Phase 2 (runtime DeprecationWarnings pointing at evaluate_flags) and Phase 3 (removal in next major) ship in follow-up PRs — this minor is API-additive only so users have a quiet release to migrate before warnings start.

i'd say we should deprecate and add a warning now otherwise theres multiple ways of doing the same thing and users and agents will be very confused
theres an RFC about that already

Resolves a typing import conflict in client.py from main and bundles the review-feedback changes: - Fix changeset format per RELEASING.md (`pypi/posthog: minor`). - `only_accessed()` now returns empty when nothing was accessed, instead of falling back to all flags. The fallback contradicted the method name and surprised reviewers. - Propagate response-level errors (`errors_while_computing_flags`, `quota_limited`) into `\$feature_flag_called` events so each access carries the granular error code(s) the single-flag path emits. - Resolve `device_id` from context in `evaluate_flags()` and pass it through to the `/flags` request. Important for experience-continuity flag matching when the device id flows in via tracing headers. - Make the precedence between `flags` and `send_feature_flags` explicit: `flags` always wins, and we log a warning when both are passed. - Clarify the `flag_keys` doc on the module-level `evaluate_flags`. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

Per reviewer feedback, ship Phase 2 in this PR alongside Phase 1 instead of splitting it into a follow-up. The deprecated methods continue to work — they just emit a `DeprecationWarning` pointing at `evaluate_flags()`: - `feature_enabled()` - `get_feature_flag()` - `get_feature_flag_payload()` - `capture(send_feature_flags=...)` (only when truthy) `feature_enabled` and `get_feature_flag` are restructured to call `_get_feature_flag_result` directly instead of routing through each other, so a single user-level call emits exactly one warning instead of cascading. Tests cover each warning's emission and the no-cascade behavior. Existing tests that use the legacy methods will now generate DeprecationWarnings but otherwise pass unchanged. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

Per PR review feedback (manoel): manual exception captures should be able to attach a `FeatureFlagEvaluations` snapshot the same way `capture()` can, so `\$exception` events carry the same flag context as the rest of the request's events. `capture_exception` already accepted `**kwargs` and forwarded select ones to `capture()` — `flags` was just missing from the forwarded set. This doesn't yet solve the wider question of how *auto*-captured exceptions (sys.excepthook, context-block exception handler) attach flags — that requires a separate mechanism (likely context-stashed flags) and is a follow-up. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

dmarticus · 2026-04-28T18:47:38Z

how does auto captured events and errors set flags to those events if you need the instance from evaluate_flags

Good question — the answer has two layers.

Manual error captures: capture_exception(exception, ..., flags=flags) now forwards the snapshot through to the inner capture() call so the $exception event carries the same flag context as your other events. This was a small gap I missed — fixed in db6d3c0:

flags = posthog.evaluate_flags(distinct_id)
try:
    risky_thing()
except Exception as e:
    posthog.capture_exception(e, distinct_id=distinct_id, flags=flags)

The same applies to anywhere the developer is the one calling capture() — they can thread flags= through.

Auto-captured exceptions (the case you're really asking about — sys.excepthook in ExceptionCapture, plus the new_context() block-level exception handler): you're right that there's no surface today for the developer to attach a flag snapshot, because they're not the one calling capture. Two paths I see for a follow-up PR:

Stash flags on the context. new_context() already carries distinct_id, session_id, properties — adding flags would mean any exception captured within the block (block-level handler, or even sys.excepthook if the context is active) would automatically attach the snapshot. Most ergonomic for the request-scoped pattern.
A "default flags provider" callback on the client that auto-evaluates at capture time. More magic, more risk of latency surprises. Probably not the right answer.

Happy to do (1) as a separate PR — felt out of scope for this one since it touches the context module and needs its own design pass. Worth tracking as a follow-up issue?

dmarticus · 2026-04-28T18:47:48Z

No breaking changes. Existing feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) paths all continue to work unchanged. Phase 2 (runtime DeprecationWarnings pointing at evaluate_flags) and Phase 3 (removal in next major) ship in follow-up PRs — this minor is API-additive only so users have a quiet release to migrate before warnings start.

i'd say we should deprecate and add a warning now otherwise theres multiple ways of doing the same thing and users and agents will be very confused theres an RFC about that already

You're right — reversed course on this and shipped Phase 2 in eda573d alongside Phase 1 in this PR. The PR description has been updated.

feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) now emit DeprecationWarnings pointing at evaluate_flags(). They keep working unchanged in this minor — Phase 3 (removal) lands in the next major.

feature_enabled and get_feature_flag were restructured to call _get_feature_flag_result directly so a single user-level call emits exactly one warning instead of cascading three.

dustinbyrne · 2026-04-28T21:07:12Z

+
+The returned `FeatureFlagEvaluations` snapshot exposes `is_enabled()`, `get_flag()`, `get_flag_payload()` for branching and `only_accessed()` / `only([keys])` filter helpers. Pass `flag_keys=[...]` to `evaluate_flags()` to scope the underlying `/flags` request itself.
+
+Deprecates `feature_enabled()`, `get_feature_flag()`, `get_feature_flag_payload()`, and `capture(send_feature_flags=...)`. They continue to work but now emit a `DeprecationWarning` pointing at `evaluate_flags()`. Removal is planned for the next major version.


Let's deprecate get_feature_flag_result as well. It lived a short life, but better to clean it up now.

marandaneto · 2026-04-29T07:29:40Z

@dustinbyrne you'll need to coordinate this with the @PostHog/team-docs-wizard as well otherwise the wizard will be lost or instrumenting new apps with deprecated APIs
we should get the docs here https://posthog.com/docs/references/posthog-python?filter=feature-flags and here https://posthog.com/docs/libraries/python updated and most likely all the other code snippets spread around the codebase
can we also write a skill that we can add to https://github.com/PostHog/skills/tree/main/skills and point out in our docs so users can just run that skill in that codebase for the migration?
this is not a blocker for this PR but ideally somehow synced otherwise we'll just get support tickets I guess

marandaneto · 2026-04-29T07:33:06Z

I think 'Stash flags on the context' makes sense
if we do this on a follow up PR, does that mean that new exceptions wont have flags context as they did before until we address this issue? or they currently dont have anyway so its not an issue?

dmarticus added 2 commits April 27, 2026 12:10

test: use unittest.mock instead of third-party mock

e8984df

`mock` is not in the project's test dependencies on CI. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703

dmarticus marked this pull request as ready for review April 27, 2026 22:10

dmarticus requested a review from a team as a code owner April 27, 2026 22:10

greptile-apps Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread posthog/test/test_evaluate_flags.py

Comment thread posthog/feature_flag_evaluations.py Outdated

Comment thread posthog/test/test_evaluate_flags.py

dustinbyrne reviewed Apr 28, 2026

View reviewed changes

marandaneto reviewed Apr 28, 2026

View reviewed changes

Comment thread .sampo/changesets/evaluate-flags-api.md Outdated

marandaneto reviewed Apr 28, 2026

View reviewed changes

Comment thread posthog/__init__.py Outdated

marandaneto reviewed Apr 28, 2026

View reviewed changes

Comment thread posthog/client.py

dmarticus added 3 commits April 28, 2026 11:32

dustinbyrne reviewed Apr 28, 2026

View reviewed changes


		The returned `FeatureFlagEvaluations` snapshot exposes `is_enabled()`, `get_flag()`, `get_flag_payload()` for branching and `only_accessed()` / `only([keys])` filter helpers. Pass `flag_keys=[...]` to `evaluate_flags()` to scope the underlying `/flags` request itself.

		Deprecates `feature_enabled()`, `get_feature_flag()`, `get_feature_flag_payload()`, and `capture(send_feature_flags=...)`. They continue to work but now emit a `DeprecationWarning` pointing at `evaluate_flags()`. Removal is planned for the next major version.

Conversation

dmarticus commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

New API (Phase 1)

Two layers of scoping

Deprecation warnings (Phase 2)

Local evaluation

Backwards compatibility

Internals

Tests

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

posthog-python Compliance Report

✅ All Tests Passed!

Capture Tests

Feature_Flags Tests

Uh oh!

greptile-apps Bot commented Apr 27, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dustinbyrne left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmarticus Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

marandaneto commented Apr 28, 2026

Uh oh!

Uh oh!

marandaneto commented Apr 28, 2026

Uh oh!

dmarticus commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmarticus commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marandaneto commented Apr 29, 2026

Uh oh!

marandaneto commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dmarticus commented Apr 27, 2026 •

edited

Loading

github-actions Bot commented Apr 27, 2026 •

edited

Loading

dmarticus Apr 28, 2026 •

edited

Loading

dmarticus commented Apr 28, 2026 •

edited

Loading

dmarticus commented Apr 28, 2026 •

edited

Loading