Add VerificationResult.rowLevelResultsAsDataFrame support by billpratt · Pull Request #262 · awslabs/python-deequ

billpratt · 2026-05-12T22:14:56Z

Summary

Adds a Python wrapper for deequ's VerificationResult.rowLevelResultsAsDataFrame, enabling users to get per-row pass/fail results for data quality checks directly from pydeequ.

This is critical for workflows that need to quarantine rows with data quality issues rather than just getting aggregate check results.

Closes #261

What it does

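```python
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

# spark: an existing SparkSession; df: the DataFrame to validate
check = Check(spark, CheckLevel.Error, "quality_check")
check = check.isComplete("email").isContainedIn("status", ["active", "inactive"])

result = VerificationSuite(spark).onData(df).addCheck(check).run()
row_level_df = VerificationResult.rowLevelResultsAsDataFrame(spark, result, df)
row_level_df.show()
```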

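The output includes all original columns plus one Boolean column per Check, named after the Check's description (here `quality_check`). Multiple constraints within a single Check are ANDed together, so a row is `true` only if it satisfies every row-level constraint in that Check.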
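To make the ANDing rule concrete, here is a plain-Python model of what the `quality_check` column holds for the two constraints in the example, plus the quarantine split that motivates this PR. It illustrates the semantics only and is not pydeequ code: rows are dicts, and `ROW_LEVEL_CONSTRAINTS` and `check_column` are hypothetical names. Treating a null `status` as passing `isContainedIn` is an assumption based on deequ building that constraint as a `status IS NULL OR status IN (...)` predicate.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical model of the two constraints in the example Check.
# Each maps one row to True (passes) or False (fails).
ROW_LEVEL_CONSTRAINTS: List[Callable[[Row], bool]] = [
    # isComplete("email"): the value must be non-null.
    lambda row: row["email"] is not None,
    # isContainedIn("status", [...]): assumed to mirror deequ's
    # "status IS NULL OR status IN (...)" predicate, so nulls pass.
    lambda row: row["status"] is None or row["status"] in {"active", "inactive"},
]


def check_column(row: Row) -> bool:
    """Per-Check value: the AND of every row-level constraint."""
    return all(constraint(row) for constraint in ROW_LEVEL_CONSTRAINTS)


rows: List[Row] = [
    {"email": "a@example.com", "status": "active"},
    {"email": None, "status": "inactive"},
    {"email": "c@example.com", "status": "banned"},
    {"email": "d@example.com", "status": None},
]

# Mimic the output: original columns plus one Boolean column named after the Check.
annotated = [{**row, "quality_check": check_column(row)} for row in rows]

# Quarantine workflow: keep passing rows, set failing rows aside.
passed = [r for r in annotated if r["quality_check"]]
quarantined = [r for r in annotated if not r["quality_check"]]

for r in annotated:
    print(r)
```

On the Spark side, the same split is a `filter` on the Boolean column of `row_level_df` and on its negation.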

Changes

pydeequ/verification.py — Added rowLevelResultsAsDataFrame classmethod to VerificationResult
tests/test_verification.py — 6 new tests (completeness, containedIn, ANDed constraints, aggregate-only, column preservation, pandas output)
README.md — Added usage example

Supported constraint types

| Constraint | Row-level output |
|---|---|
| `isComplete` / `hasCompleteness` | ✅ |
| `hasPattern` | ✅ |
| `isContainedIn` / `satisfies` | ✅ |
| `hasMinLength` / `hasMaxLength` | ✅ |
| `hasMin` / `hasMax` | ✅ |
| `isUnique` / `isPrimaryKey` | ✅ |
| `hasSize`, `hasEntropy`, etc. | ❌ (aggregate-only, silently skipped) |
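One way to read the table: each supported constraint contributes a per-row predicate, while aggregate-only constraints contribute nothing to the Check's row-level column. The sketch below models a few rows of the table in plain Python. It is not how deequ implements them. Applying the `hasMin` assertion to each row's value, and treating `isUnique` as "this value occurs exactly once in the column", are assumptions about deequ's row-level semantics; the null handling is a simplification; and `Constraint`, `row_predicate`, `row_level_result`, and the `kind` strings are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Constraint:
    """Hypothetical stand-in for one constraint added to a Check."""
    kind: str
    column: Optional[str] = None
    arg: Any = None


def row_predicate(c: Constraint, rows: List[Row]) -> Optional[Callable[[Row], bool]]:
    """Return a per-row predicate, or None for aggregate-only constraints."""
    if c.kind == "isComplete":
        return lambda r: r[c.column] is not None
    if c.kind == "hasPattern":
        pattern = re.compile(c.arg)
        # Nulls fail here; deequ's actual null behavior may differ.
        return lambda r: r[c.column] is not None and pattern.search(r[c.column]) is not None
    if c.kind == "hasMin":
        # Assumption: the user's assertion is applied to each row's value.
        return lambda r: r[c.column] is not None and bool(c.arg(r[c.column]))
    if c.kind == "isUnique":
        counts = Counter(r[c.column] for r in rows)
        return lambda r: counts[r[c.column]] == 1
    # hasSize, hasEntropy, ... have no per-row meaning: skipped silently.
    return None


def row_level_result(constraints: List[Constraint], rows: List[Row]) -> Optional[List[bool]]:
    """AND together the row-level predicates of one Check."""
    predicates = [p for p in (row_predicate(c, rows) for c in constraints) if p is not None]
    if not predicates:
        # This sketch returns None when nothing is row-level; the PR's
        # aggregate-only test is what pins down the real behavior.
        return None
    return [all(p(r) for p in predicates) for r in rows]


rows = [
    {"id": 1, "code": "AB-1", "score": 5},
    {"id": 2, "code": "xx", "score": 5},    # fails hasPattern
    {"id": 3, "code": "CD-9", "score": 0},  # fails hasMin assertion
    {"id": 4, "code": "EF-2", "score": 4},  # duplicate id: fails isUnique
    {"id": 4, "code": "GH-3", "score": 6},  # duplicate id: fails isUnique
]
constraints = [
    Constraint("isUnique", "id"),
    Constraint("hasPattern", "code", r"^[A-Z]{2}-\d$"),
    Constraint("hasMin", "score", lambda x: x >= 1),
    Constraint("hasSize", arg=lambda n: n == 5),  # aggregate-only, ignored
]
print(row_level_result(constraints, rows))  # [True, False, False, False, False]
```

Dropping the `hasSize` entry leaves the row-level result unchanged, which is what "silently skipped" means for a Check that mixes both kinds of constraint.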

Known limitation

As noted in #234, checks that use Python lambda assertions (e.g., `hasMin("b", lambda x: x == 0)`) can cause serialization errors with `rowLevelResultsAsDataFrame`. This is a pre-existing issue with the `ScalaFunction1` proxy, not one introduced by this PR.

Testing

All 6 new tests pass. Full existing suite (152 tests) passes with no regressions.

Wrap deequ's VerificationResult.rowLevelResultsAsDataFrame as a classmethod on pydeequ's VerificationResult. This returns the original DataFrame with additional Boolean columns indicating which rows passed or failed each Check.

- Add rowLevelResultsAsDataFrame classmethod to VerificationResult
- Add tests covering completeness, containedIn, ANDed constraints, aggregate-only checks, column preservation, and pandas output
- Update README with usage example

Closes awslabs#261

github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: d21e43dc) — may not be fully accurate. Reply if this doesn't help.


github-actions · 2026-05-12T22:49:25Z

No issues found.


github-actions Bot requested changes May 12, 2026


Comment threads: pydeequ/verification.py (1), tests/test_verification.py (4)

billpratt marked this pull request as draft May 12, 2026 22:17

billpratt mentioned this pull request May 12, 2026

Plans to expose deequ's VerificationResult.rowLevelResultsAsDataFrame? #261 (Open)

Add orderBy to tests for deterministic row ordering

abb2fe0

Address review feedback: Spark DataFrames have no guaranteed row order, so add explicit orderBy() before collect() in all tests that assert row-level values.
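Without an `orderBy()`, the order of rows returned by `collect()` can depend on partitioning and the physical plan, so an assertion like `rows[0].quality_check is True` may pass on one run and fail on another. The plain-Python snippet below illustrates the fix: sort on a unique key, then assert. `run_a`, `run_b`, and `checks_by_id` are hypothetical stand-ins for two collected results; in the tests themselves the sort happens in Spark via `orderBy()`.

```python
from operator import itemgetter

# Two "runs" that returned the same rows in different orders, as Spark may.
run_a = [
    {"id": 2, "quality_check": False},
    {"id": 1, "quality_check": True},
]
run_b = [
    {"id": 1, "quality_check": True},
    {"id": 2, "quality_check": False},
]


def checks_by_id(collected):
    """Sort on a unique key first, then read the per-row results."""
    return [row["quality_check"] for row in sorted(collected, key=itemgetter("id"))]


# Positional assertions disagree between runs...
assert run_a[0]["quality_check"] != run_b[0]["quality_check"]
# ...but assertions made after sorting are stable.
assert checks_by_id(run_a) == checks_by_id(run_b) == [True, False]
```

The sort key needs to be unique per row; ordering by a column with ties would leave the relative order of tied rows unspecified again.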

github-actions Bot requested changes May 12, 2026


Comment threads: pydeequ/verification.py (1), tests/test_verification.py (3)

billpratt marked this pull request as ready for review May 12, 2026 22:41

Add row count assertion to completeness test

0d37d0a

Verify that rowLevelResultsAsDataFrame preserves the same number of rows as the original DataFrame.
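The invariants behind this assertion and the column-preservation test can be written as one small helper. The output should have exactly as many rows as the input, keep every original column, and add one Boolean column per Check. The sketch checks those properties on plain lists of dicts; `assert_row_level_shape` and its arguments are hypothetical, and against a real DataFrame the same checks would use `DataFrame.count()` and `DataFrame.columns`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def assert_row_level_shape(original: List[Row], output: List[Row], check_names: List[str]) -> None:
    """Hypothetical helper mirroring the row-count and column-preservation tests."""
    # Same number of rows: nothing dropped or duplicated.
    assert len(output) == len(original), (len(output), len(original))
    original_cols = set(original[0]) if original else set()
    for row in output:
        # Every original column survives.
        assert original_cols <= set(row)
        # Exactly the Check columns are added, and they hold Booleans.
        assert set(row) - original_cols == set(check_names)
        assert all(isinstance(row[name], bool) for name in check_names)


original = [{"email": "a@example.com"}, {"email": None}]
output = [
    {"email": "a@example.com", "quality_check": True},
    {"email": None, "quality_check": False},
]
assert_row_level_shape(original, output, ["quality_check"])
```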


billpratt wants to merge 3 commits into awslabs:master from billpratt:row-level-results
