fix: handle schema type conflicts by renaming fields with type suffix #1546
When an incoming event has a field whose data type differs from the type recorded in the existing schema, the event is no longer rejected with a merge error. Instead, the field is renamed with a type suffix (e.g., body_timestamp_utf8, span_kind_int64). This prevents ingestion failures when a field's type varies between events.
Walkthrough
Pre-merge JSON schema conflict detection: infer the raw JSON schema from incoming events, compare it with the existing stream schema to detect type conflicts, rename conflicting JSON keys with datatype suffixes when needed, then derive and merge the Arrow schema and proceed with consistent value/schema pairs.
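The detect-and-rename step described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `Value` enum is a simplified stand-in for real JSON values, the schema is modelled as a plain map from field name to type name, and `rename_conflicts` is a hypothetical helper name. Only the suffix style (`_utf8`, `_int64`) is taken from the PR description.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value (assumption: the real code
/// operates on full JSON values and Arrow data types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

impl Value {
    /// Type name used as the rename suffix, mirroring the `_utf8` and
    /// `_int64` suffixes mentioned in the PR description.
    fn type_name(&self) -> &'static str {
        match self {
            Value::Bool(_) => "boolean",
            Value::Int(_) => "int64",
            Value::Float(_) => "float64",
            Value::Str(_) => "utf8",
        }
    }
}

type Event = BTreeMap<String, Value>;

/// Hypothetical helper: rename every key whose value type differs from
/// the type recorded for that key in `schema`. Keys that match the
/// schema, or that the schema does not know yet, are left unchanged.
fn rename_conflicts(event: Event, schema: &BTreeMap<String, &'static str>) -> Event {
    event
        .into_iter()
        .map(|(key, value)| match schema.get(&key) {
            // Type conflict: append the incoming type as a suffix.
            Some(expected) if *expected != value.type_name() => {
                (format!("{}_{}", key, value.type_name()), value)
            }
            // No conflict, or a brand-new field.
            _ => (key, value),
        })
        .collect()
}
```

With a schema that records `span_kind` as `utf8`, an event carrying an integer `span_kind` comes out with the key `span_kind_int64`, so the later schema merge sees a new field rather than a conflicting one. Note that this sketch does not guard against the renamed key colliding with an existing key.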
Estimated code review effort: 4 (Complex), ~45 minutes
Pre-merge checks: 3 passed, 1 failed (inconclusive)
No actionable comments were generated in the recent review.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/event/format/mod.rs`:
- Around line 511-538: The function rename_conflicting_fields_in_json currently
silently overwrites values when multiple keys map to the same target or when the
target key already exists; change its signature to return Result<Vec<Value>,
Error> (or a suitable Error type), iterate each object and detect collisions by
checking if the new_map already contains the target key (or if source key ==
target and target existed), and return an Err describing the conflicting
original keys and target instead of overwriting; then update the caller in
src/event/format/json.rs to handle the Result (propagate the error or map it
into the existing error type) so callers no longer assume renaming is
infallible.
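A collision-checking rename along the lines the review comment suggests might look like the sketch below. It is an illustration under stated assumptions: `RenameCollision` and `rename_checked` are hypothetical names, values are plain strings rather than JSON values, and the function handles a single object rather than a `Vec<Value>`. The actual signature and error type of `rename_conflicting_fields_in_json` may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical error describing which original key collided with
/// which target key.
#[derive(Debug, PartialEq)]
struct RenameCollision {
    original_key: String,
    target_key: String,
}

/// Apply `renames` (original key -> target key) to one object.
/// Instead of silently overwriting, return an error as soon as two
/// keys would end up under the same target key.
fn rename_checked(
    object: BTreeMap<String, String>,
    renames: &BTreeMap<String, String>,
) -> Result<BTreeMap<String, String>, RenameCollision> {
    let mut new_map = BTreeMap::new();
    for (key, value) in object {
        // Keys without a rename entry keep their original name.
        let target = renames.get(&key).cloned().unwrap_or_else(|| key.clone());
        if new_map.contains_key(&target) {
            return Err(RenameCollision {
                original_key: key,
                target_key: target,
            });
        }
        new_map.insert(target, value);
    }
    Ok(new_map)
}
```

A caller can then propagate the error with `?` or map it into its own error type, so renaming is no longer treated as infallible.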