feat(privacy-filter): add server-side heartbeat privacy filtering#599

Closed
TimeToBuildBob wants to merge 1 commit into ActivityWatch:master from TimeToBuildBob:feat/privacy-filter

Conversation

@TimeToBuildBob
Contributor

Summary

Adds configurable regex-based privacy filters applied at heartbeat ingestion — the most-requested ActivityWatch feature (ActivityWatch/activitywatch#1, 10+ years open).

Rules are stored in the settings key privacy_filters as a JSON array and applied on every POST /api/0/buckets/<id>/heartbeat request before the event reaches the datastore.

Rule schema

[
  {
    "bucket_prefix": "aw-watcher-window",
    "field": "title",
    "pattern": "(?i)private browsing|incognito",
    "action": "drop"
  },
  {
    "field": "title",
    "pattern": "(?i)secret|confidential",
    "action": "redact",
    "replacement": "[redacted]"
  }
]
| Field | Required | Description |
| --- | --- | --- |
| `field` | yes | Event data key to match (`title`, `app`, `url`, …) |
| `pattern` | yes | fancy-regex pattern (lookaheads, Unicode, case flags) |
| `action` | yes | `drop`: discard the event entirely; `redact`: replace the field value |
| `bucket_prefix` | optional | Scope the rule to buckets whose ID starts with this string |
| `replacement` | optional | Replacement string for the `redact` action (default: `[redacted]`) |
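
To make the schema concrete, here is a minimal sketch of how one rule could be modeled and applied to an event's data map. It is not the code from this PR: the names `Rule`, `Action`, `Value`, and `apply_rules` are hypothetical, and plain substring matching stands in for fancy-regex so that the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Stand-in for a JSON value; only strings are eligible for matching.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    Drop,
    Redact,
}

/// Hypothetical in-memory form of one `privacy_filters` entry.
struct Rule {
    bucket_prefix: Option<String>,
    field: String,
    /// ASSUMPTION: a literal substring here, not a fancy-regex pattern.
    pattern: String,
    action: Action,
    replacement: Option<String>,
}

/// Returns `None` if a drop rule matched, otherwise the (possibly redacted) data.
fn apply_rules(
    bucket_id: &str,
    mut data: HashMap<String, Value>,
    rules: &[Rule],
) -> Option<HashMap<String, Value>> {
    for rule in rules {
        // Skip rules scoped to a different bucket.
        if let Some(prefix) = &rule.bucket_prefix {
            if !bucket_id.starts_with(prefix.as_str()) {
                continue;
            }
        }
        // Missing or non-string fields never match.
        let matched = match data.get(&rule.field) {
            Some(Value::Str(s)) => s.contains(rule.pattern.as_str()),
            _ => false,
        };
        if !matched {
            continue;
        }
        match rule.action {
            Action::Drop => return None,
            Action::Redact => {
                // Fall back to the documented default replacement.
                let replacement = rule
                    .replacement
                    .clone()
                    .unwrap_or_else(|| "[redacted]".to_string());
                data.insert(rule.field.clone(), Value::Str(replacement));
            }
        }
    }
    Some(data)
}
```

In this sketch a rule without `bucket_prefix` applies to every bucket, and a redact rule replaces the whole field value rather than only the matched portion, which is how the table above describes the `redact` action.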

Implementation

  • New aw_transform::privacy_filter module — zero new dependencies (fancy-regex already present)
  • Applied in bucket_events_heartbeat before datastore.heartbeat()
  • Invalid regex → warning logged, rule skipped (fail-open, never breaks a watcher)
  • Non-string field values → skipped silently (type-safe)
  • Dropped events → HTTP 200 with empty event body (clients see no error)
  • 9 unit tests covering: drop, redact, bucket scoping, default replacement, invalid regex, non-string values, multiple redact rules, empty rules

What this does NOT include (intentional MVP scope)

  • aw-webui settings UI — add rules via /api/0/settings/privacy_filters directly or via UI in a follow-up
  • aw-watcher-window pre-filtering — can be done separately using the same rule shape
  • Hidden/masked categories (store but hide in visualizations) — separate feature

Related

Add configurable regex-based privacy filters that intercept heartbeat
events before they reach the datastore. Rules are stored in the
settings key `privacy_filters` as a JSON array and are applied on
every heartbeat request.

Each rule supports:
- `bucket_prefix` (optional): scope the rule to specific buckets
- `field`: the event data key to match (e.g. "title", "app", "url")
- `pattern`: fancy-regex pattern (supports lookaheads, Unicode)
- `action`: "drop" (discard event) or "redact" (replace field value)
- `replacement`: custom redaction string (default: "[redacted]")

Implementation details:
- Filter logic lives in a new `aw_transform::privacy_filter` module
- Invalid regex patterns are logged and skipped (fail-open / graceful)
- Non-string field values are skipped (type-safe matching)
- Dropped events return HTTP 200 with an empty event body so clients
  see no error

Example rule (drop private-browsing window titles):
  {"field":"title","pattern":"(?i)private browsing|incognito","action":"drop"}

Closes: ActivityWatch/activitywatch#1 (partial — server-side filter MVP)
Relates to: ActivityWatch#482
@TimeToBuildBob
Contributor Author

Closing as duplicate — a parallel session from the same author submitted #598 moments earlier with a more complete implementation (config-file based, integration tests, dotted field paths, enabled flag). Please review #598 instead.

@greptile-apps

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR adds a server-side privacy filtering system for heartbeat events, applying configurable regex-based drop/redact rules from the settings.privacy_filters key before events reach the datastore. The implementation is clean and well-tested, but has a few design gaps worth addressing before wider adoption.

  • Per-heartbeat overhead: both a SQLite read (get_key_value) and fresh regex compilation for every rule happen on each heartbeat — the hottest write path in the system. Settings and compiled regexes should be cached.
  • Dropped-event response: Event::default() (timestamp = now, empty data) is returned for dropped events; this can subtly distort watcher last_event state and skew start-times for the immediately following legitimate event.
  • Incomplete coverage: POST /buckets/<id>/events (bulk insert) does not pass through the privacy filter, so direct API writes bypass all configured rules silently.

Confidence Score: 4/5

Safe to merge as an MVP, but the per-heartbeat DB read and regex compilation will add measurable overhead for any user with filters configured, and the dropped-event response shape may cause subtle watcher state drift.

The core filtering logic is correct and the test coverage is solid. The main concerns are the per-heartbeat cost of loading settings from SQLite and recompiling regexes (both avoidable with a cache), the semantically odd Event::default() returned for dropped events (which can confuse watcher last_event tracking), and the silent bypass of privacy rules on the bulk-insert endpoint. None of these break the feature outright, but they are real rough edges for a privacy-sensitive path.

aw-server/src/endpoints/bucket.rs — the per-heartbeat overhead and dropped-event response shape both live here and warrant the most attention before this ships widely.

Important Files Changed

| Filename | Overview |
| --- | --- |
| aw-server/src/endpoints/bucket.rs | Privacy filter wired into the heartbeat endpoint; settings fetched from DB and regexes compiled on every call, with potential client-state confusion when returning Event::default() for dropped events. |
| aw-transform/src/privacy_filter.rs | New module implementing drop/redact logic with good test coverage; logic is correct but regexes are re-compiled on every invocation rather than cached. |
| aw-transform/src/lib.rs | Minimal change: exposes the new privacy_filter module and its public symbols from the crate root. |

Sequence Diagram

sequenceDiagram
    participant W as Watcher
    participant E as bucket_events_heartbeat
    participant DS as Datastore
    participant PF as apply_privacy_filter

    W->>E: "POST /buckets/<id>/heartbeat"
    E->>DS: get_key_value("settings.privacy_filters")
    DS-->>E: raw JSON string (or error to [])
    E->>E: "serde_json::from_str to Vec<PrivacyFilterRule>"
    loop each rule
        E->>PF: compile Regex::new(rule.pattern)
    end
    E->>PF: "apply_privacy_filter(bucket_id, heartbeat, &rules)"
    alt Drop rule matched
        PF-->>E: None
        E-->>W: 200 OK Event::default()
    else Redact rule(s) matched
        PF-->>E: Some(event with redacted fields)
        E->>DS: datastore.heartbeat(bucket_id, event, pulsetime)
        DS-->>E: stored/merged Event
        E-->>W: 200 OK stored Event
    else No rule matched
        PF-->>E: Some(original event)
        E->>DS: datastore.heartbeat(bucket_id, event, pulsetime)
        DS-->>E: stored/merged Event
        E-->>W: 200 OK stored Event
    end

Comments Outside Diff (1)

  1. aw-server/src/endpoints/bucket.rs, line 132-144 (link)

    P2 Direct event insertion bypasses the privacy filter

    bucket_events_create (POST /buckets/<id>/events) writes events directly to the datastore without passing them through apply_privacy_filter. Any watcher or client script that bulk-inserts events via this endpoint will bypass all configured privacy rules, silently storing data that the user expected to be filtered. The MVP scope is understood, but this gap is worth a follow-up or at least an inline note so it is not forgotten.
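
    One way the gap could be closed is to run each event of a bulk insert through the same per-event filter before writing. The sketch below is an assumption about how that might look, not code from the PR; `filter_batch` is a hypothetical helper and `filter` is a placeholder for whatever function implements the privacy rules.

```rust
/// Applies a per-event filter to a batch and keeps only the events it returns.
/// `filter` returns `None` for events that a drop rule discards.
fn filter_batch<E>(events: Vec<E>, filter: impl Fn(E) -> Option<E>) -> Vec<E> {
    events.into_iter().filter_map(filter).collect()
}
```

    For example, calling `filter_batch(events, |e| apply(e))` with a filter that drops titles containing "incognito" would return the batch without those events, preserving the order of the rest.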

Reviews (1): Last reviewed commit: "feat(privacy-filter): add server-side he..." | Re-trigger Greptile

Comment on lines +162 to +168
let rules: Vec<PrivacyFilterRule> = match datastore.get_key_value("settings.privacy_filters") {
    Ok(raw) => serde_json::from_str(&raw).unwrap_or_else(|e| {
        warn!("Failed to parse privacy_filters setting: {}", e);
        vec![]
    }),
    Err(_) => vec![],
};

P2 Per-heartbeat DB read and regex compilation

get_key_value("settings.privacy_filters") is called on every heartbeat, adding a SQLite read + JSON deserialization on the hottest write path. Immediately after, apply_privacy_filter compiles every regex from scratch on each call. aw-watcher-window sends roughly one heartbeat per second, so a user with a few rules will pay both costs ~86 400 times per day. Settings should be loaded once (at startup or lazily with a short-lived cache) and compiled regexes stored alongside the parsed rules.
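
A short-lived cache of the kind suggested here could look roughly like the following sketch. It is an illustration under stated assumptions rather than a proposed patch: `TtlCache` and `get_or_load` are hypothetical names, and the `load` closure is assumed to do the settings read, JSON parsing, and regex compilation in one step.

```rust
use std::time::{Duration, Instant};

/// Caches a loaded value (e.g. parsed rules with their compiled regexes) for `ttl`.
struct TtlCache<T> {
    ttl: Duration,
    entry: Option<(Instant, T)>,
}

impl<T> TtlCache<T> {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entry: None }
    }

    /// Returns the cached value, calling `load` only if the cache is empty or stale.
    /// `now` is passed in explicitly so the expiry logic is easy to test.
    fn get_or_load(&mut self, now: Instant, load: impl FnOnce() -> T) -> &T {
        let stale = match &self.entry {
            Some((loaded_at, _)) => now.duration_since(*loaded_at) >= self.ttl,
            None => true,
        };
        if stale {
            self.entry = Some((now, load()));
        }
        &self.entry.as_ref().expect("entry was just populated").1
    }
}
```

With a TTL of a few seconds, most heartbeats would reuse the cached rules, at the cost of a rule change taking up to one TTL to take effect. Invalidating the cache when the setting is written would avoid that delay, but needs a hook this sketch does not model.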

Comment on lines +172 to +174
// Event matched a drop rule — acknowledge without storing.
None => return Ok(Json(Event::default())),
};

P2 Dropped-event response misleads the client

When a drop rule fires, Event::default() is returned — this has timestamp: Utc::now(), duration: 0, and data: {}. Python/Rust watcher clients store the returned heartbeat as last_event and compare its data and computed end-time (timestamp + duration) when deciding whether to merge the next heartbeat. Returning an event whose data is empty and whose timestamp is "now" means the client treats the drop as a zero-duration event that just ended, which can distort the start-time of the very next legitimate event. Returning the incoming heartbeat unchanged (still without storing it) would give the client a more accurate view of what happened.
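
The alternative described above can be sketched as follows. This is a simplified illustration, not the endpoint's real code: `Heartbeat` is a hypothetical stand-in for the server's `Event` type, and `respond` and its `store` parameter are assumed names.

```rust
/// Simplified heartbeat; the real event type carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Heartbeat {
    timestamp_ms: i64,
    duration_ms: i64,
    title: String,
}

/// Chooses the response body for a heartbeat request.
/// `filtered` is `None` when a drop rule matched; `store` persists an event
/// and returns the stored (possibly merged) event.
fn respond(
    incoming: Heartbeat,
    filtered: Option<Heartbeat>,
    store: impl FnOnce(Heartbeat) -> Heartbeat,
) -> Heartbeat {
    match filtered {
        Some(event) => store(event),
        // Dropped: echo the incoming heartbeat back without storing it.
        None => incoming,
    }
}
```

Echoing the heartbeat returns only data the client itself sent, so it does not widen what is exposed, and the client's `last_event` keeps the timestamp and data it expects.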

Comment on lines +55 to +77
let regex = match Regex::new(&rule.pattern) {
    Ok(r) => r,
    Err(e) => {
        warn!("Privacy filter: invalid regex '{}': {}", rule.pattern, e);
        continue;
    }
};

let field_str = match event.data.get(&rule.field) {
    Some(Value::String(s)) => s.clone(),
    _ => continue,
};

let matches = match regex.is_match(&field_str) {
    Ok(m) => m,
    Err(e) => {
        warn!(
            "Privacy filter: regex match error for '{}': {}",
            rule.pattern, e
        );
        continue;
    }
};

P2 Fail-open on parse error silently disables all filters

When serde_json::from_str fails (malformed JSON, schema change, type mismatch), all rules are silently dropped and every event passes through unfiltered. A user who makes a typo in a rule's JSON has no indication that their privacy filters stopped working — they see only a warn! log entry. Consider surfacing this more visibly (e.g., returning a 500 on the next heartbeat, or a /api/0/settings/privacy_filters/status endpoint) so users can detect misconfiguration before sensitive data accumulates.
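
A status value of the kind a status endpoint might report could be recorded at load time, as in this sketch. Everything here is hypothetical (`FilterStatus`, `load_rules`, and the `parse` parameter); the loader stays fail-open as in the PR but keeps the parse error instead of only logging it.

```rust
/// Whether filtering is currently in effect, suitable for a status endpoint or UI.
#[derive(Debug, PartialEq)]
enum FilterStatus {
    Active { rules: usize },
    Disabled { reason: String },
}

/// Parses the raw setting with `parse` and records the outcome.
/// Still fail-open: a parse error yields no rules, but the error is retained.
fn load_rules<R, E: std::fmt::Display>(
    raw: &str,
    parse: impl Fn(&str) -> Result<Vec<R>, E>,
) -> (Vec<R>, FilterStatus) {
    match parse(raw) {
        Ok(rules) => {
            let status = FilterStatus::Active { rules: rules.len() };
            (rules, status)
        }
        Err(e) => (
            Vec::new(),
            FilterStatus::Disabled { reason: e.to_string() },
        ),
    }
}
```

In the server, `parse` would presumably be the existing JSON deserialization step; the sketch takes it as a parameter so it stays independent of any particular parser.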

@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.11%. Comparing base (656f3c9) to head (a127b82).
⚠️ Report is 50 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| aw-server/src/endpoints/bucket.rs | 50.00% | 4 Missing ⚠️ |
| aw-transform/src/privacy_filter.rs | 90.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #599      +/-   ##
==========================================
+ Coverage   70.81%   76.11%   +5.29%     
==========================================
  Files          51       61      +10     
  Lines        2916     4710    +1794     
==========================================
+ Hits         2065     3585    +1520     
- Misses        851     1125     +274     

Development

Successfully merging this pull request may close these issues.

Add functionality to redact/filter sensitive data

1 participant