
ETL benchmark #1384

Merged
hcourdent merged 11 commits into main from di/etl-bench on May 6, 2026

Conversation


@diegoimbert diegoimbert commented May 5, 2026

See preview at /blog/launch-week-data-tables-ducklake
Also in the Ducklake docs


cloudflare-workers-and-pages Bot commented May 5, 2026

Deploying windmilldocs with Cloudflare Pages

Latest commit: 2487ca0
Status: ✅  Deploy successful!
Preview URL: https://d9d75878.windmilldocs.pages.dev
Branch Preview URL: https://di-etl-bench.windmilldocs.pages.dev

View logs


claude Bot commented May 5, 2026

Claude finished @diegoimbert's task in 5m 27s.


Review summary

The PR adds a TPC-DS benchmark section (a chart component plus a shared MDX partial) and embeds it into both the Ducklake docs page and the Day 2 launch-week blog post. It also drops 14 raw benchmark JSON result files into src/data/. No new pages, no sidebars.js changes, no new frontmatter to validate.

The chart math I sampled checks out (SF10 Snowflake Small 85.26s and Windmill 2× m6a.8xlarge 67.44s give the quoted 21% delta and the $0.04 vs $0.10 numbers; SF100 Windmill 3× m6a.16xlarge at 157s/$0.36 vs Snowflake Large 194s/$0.86 give the quoted 19%-faster / 2.4×-cheaper). The two looser claims are flagged inline.
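The sampled deltas can be reproduced in a few lines (values copied from the figures quoted above; a sketch, not the component's actual code):

```typescript
// Spot-check of the sampled chart math (numbers quoted in the review above).
const sf10SnowflakeSmall = 85.26; // seconds
const sf10Windmill = 67.44;       // 2x m6a.8xlarge, seconds
// (85.26 - 67.44) / 85.26 => ~21% faster
const sf10Delta = (sf10SnowflakeSmall - sf10Windmill) / sf10SnowflakeSmall;

const sf100Windmill = 157;  // 3x m6a.16xlarge, $0.36
const sf100Snowflake = 194; // Snowflake Large, $0.86
const sf100Faster = (sf100Snowflake - sf100Windmill) / sf100Snowflake; // ~19%
const sf100Cheaper = 0.86 / 0.36;                                      // ~2.4x

console.log(
  `${(sf10Delta * 100).toFixed(0)}%`,   // "21%"
  `${(sf100Faster * 100).toFixed(0)}%`, // "19%"
  `${sf100Cheaper.toFixed(1)}x`         // "2.4x"
);
```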

Findings

P1

  • None I'm confident about. The new import TpcDsBenchmarkSection from '@site/src/components/TpcDsBenchmarkSection.mdx'; resolves on both pages, all /docs/... links in the changed files map to existing routes (including /docs/flows/flow_editor via sidebars.js:778), and the partial is correctly rendered as <TpcDsBenchmarkSection />.

P2 (see inline comments for the actionable ones)

  • Em dashes throughout the new MDX. AGENTS.md: "Avoid using symbol '—' as it looks like obvious AI." Occurrences: src/components/TpcDsBenchmarkSection.mdx L8, L12, L24, L51 (×2), L83, L94 (inside the title prop, so it also renders inside <h4>), L101, L103; blog/2026-03-31-launch-week-data-tables-ducklake/index.mdx L308. Inline-flagged the first MDX line and the blog line with concrete suggestions.
  • Loose benchmark wording at TpcDsBenchmarkSection.mdx:68-69 — "2× better performance … at similar cost" is actually 1.82× and the costs aren't similar; "2.7× faster at equal cost" needs the throughput-per-dollar framing spelled out. Inline-flagged.
  • Misleading platform tag in TpcDsBenchmark.tsx:452-465 — the Airflow + Pandas row uses platform: 'windmill'. It only works because instanceType is undefined and the host page hides the cost table; a future refactor or a showCostTable={true} flip will silently misreport cost. Inline-flagged with two clean options.
  • 9k lines of unreferenced JSON in src/data/tpcds_bench_results/. TpcDsBenchmark.tsx ships hardcoded configs and never reads these files (grep -rn "tpcds_bench_results" src blog docs returns nothing). Inline-flagged on the first results file with options to wire them up or move them out of src/.
  • Heavy bold use in the new partial (**52 tasks**, **Ingest**/**Validate**/etc. inside table cells, **67 seconds**, **42× slowdown**, …). AGENTS.md: "Don't use bold font, unless you're adding to content already using it." This is a brand-new partial, so the prior-bold exception doesn't apply. Worth toning down.

AGENTS.md / writing_style_guide.md compliance check

  • ✅ Code blocks specify languages (ts, python, sql).
  • ✅ Headings are sentence case (the <h4> rendered from title="TPC-DS SF10 (10 GB) — Pandas vs Ducklake" is the only heading that violates the em-dash rule).
  • ✅ No HTML colour styling on titles.
  • ✅ Internal links use site-relative paths and resolve to existing routes.
  • ✅ Frontmatter on the two host pages already had a description; neither was modified.
  • ❌ Em-dash rule violated in multiple places (P2, see above).
  • ❌ Bold-restraint guidance ignored in the new partial (P2, see above).

Verification

  • npm run build should pass — no broken links, no new sidebars.js entries to mis-register, no missing image references in the diff, MDX is well-formed (mid-document import statements are supported by Docusaurus and are already used elsewhere in this repo).
  • Worth a manual visual check after merge: the embedded <TpcDsBenchmark> chart on /docs/core_concepts/persistent_storage/ducklake#performance-ducklake-vs-snowflake and on the Day 2 blog post — confirm dark-mode legend colours are legible, the cost tooltip shows Time: + Cost: correctly for the Pandas row (it should still work since showCostTable={false} only hides the table, not the tooltip; the calculateCost short-circuit still returns 0 there).
  • The 14 JSON files don't appear to participate in the rendered page; if that's intentional (raw data for transparency), moving them out of src/ is the cleaner long-term home.

  • Read AGENTS.md, writing_style_guide.md, REVIEW.md
  • Read changed .mdx files (blog post, ducklake.mdx, TpcDsBenchmarkSection.mdx)
  • Read new React component TpcDsBenchmark.tsx
  • Verify links, anchors, frontmatter, sidebar registration
  • Verify EE / pricing-link rules, sentence-case headings, em-dash rule
  • Post severity-tagged review
    · branch di/etl-bench

@diegoimbert diegoimbert marked this pull request as draft May 5, 2026 12:28
@diegoimbert diegoimbert marked this pull request as ready for review May 5, 2026 12:29
AIRFLOW_PANDAS_SF10_CONFIGS
} from '@site/src/components/TpcDsBenchmark';

We ran the [TPC-DS](https://www.tpc.org/tpcds/) benchmark at three scale factors (10 GB, 100 GB, 1 TB) to compare Windmill + Ducklake against Airflow + Snowflake — the most common open-source orchestrator paired with a managed data warehouse.

P2 — em dash usage (AGENTS.md violation). AGENTS.md says: "Avoid using symbol '—' as it looks like obvious AI. Use sentences not using it or use '-' instead." This file introduces multiple em dashes (this line, plus L12, L24, L51 ×2, L83, L94, L101, L103). Suggest replacing with a period, parenthesis, or hyphen. Example for this line:

Suggested change
We ran the [TPC-DS](https://www.tpc.org/tpcds/) benchmark at three scale factors (10 GB, 100 GB, 1 TB) to compare Windmill + Ducklake against Airflow + Snowflake — the most common open-source orchestrator paired with a managed data warehouse.
We ran the [TPC-DS](https://www.tpc.org/tpcds/) benchmark at three scale factors (10 GB, 100 GB, 1 TB) to compare Windmill + Ducklake against Airflow + Snowflake, the most common open-source orchestrator paired with a managed data warehouse.


Comment on lines +68 to +69
- **Windmill 1× m6a.16xlarge** (391s, $0.30) provides 2× better performance than Snowflake Small (711s) at similar cost
- At equal cost (~$0.30), Windmill is **2.7× faster** than Snowflake Small

P2 — looseness in benchmark claims.

  • L68: "Windmill 1× m6a.16xlarge (391s, $0.30) provides 2× better performance than Snowflake Small (711s) at similar cost". The actual ratio is 711 / 391 ≈ 1.82×, not 2×; and Snowflake Small at 711.14s costs ~$0.79 (8 credit·hr ÷ 3600 × 711.14 with 2 credits/hr × $2/credit), which isn't "similar cost" to Windmill's $0.30. Either reframe (e.g. "nearly 2× faster" and remove "at similar cost") or use a config where the cost actually lines up.
  • L69: "At equal cost (~$0.30), Windmill is 2.7× faster than Snowflake Small" — this only works under a "throughput per dollar at fixed budget" interpretation (Windmill completes 100% of the benchmark for $0.30 vs Snowflake Small completing ~38% of it for the same budget). Worth spelling that out so a reader can reproduce the math.
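Both corrections follow from simple arithmetic. A sketch (the Snowflake pricing reconstruction here assumes Small = 2 credits/hr at $2/credit, as the figures quoted above imply):

```typescript
// Reproduce the corrected ratios (values from the chart configs quoted above).
const windmillSeconds = 391;
const windmillCost = 0.30; // 1x m6a.16xlarge
const snowflakeSmallSeconds = 711.14;
// Assumed pricing: Small = 2 credits/hr at $2/credit => $4/hr, prorated.
const snowflakeSmallCost = (2 * 2 / 3600) * snowflakeSmallSeconds; // ~$0.79

const speedup = snowflakeSmallSeconds / windmillSeconds; // ~1.82, not 2x

// "At equal cost" read as throughput per dollar: completed-benchmark
// fraction bought per dollar on each platform.
const ratioPerDollar = (1 / windmillCost) / (1 / snowflakeSmallCost); // ~2.6x
```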

Comment on lines +452 to +465
export const AIRFLOW_PANDAS_SF10_CONFIGS: BenchmarkConfig[] = [
  {
    label: 'Airflow + Pandas (64 GB local)',
    platform: 'windmill', // No cloud cost - local run
    totalTimeSeconds: 2813.82,
    stages: [
      { stage: 'ingest', wallClockSeconds: 600 },
      { stage: 'validate', wallClockSeconds: 168 },
      { stage: 'denormalize', wallClockSeconds: 323 },
      { stage: 'aggregate', wallClockSeconds: 433 },
      { stage: 'query', wallClockSeconds: 824 },
      { stage: 'verify', wallClockSeconds: 131 }
    ]
  },

P2 — misleading platform tag. The Airflow + Pandas config sets platform: 'windmill'. It works only because instanceType is undefined, so AWS_PRICING[undefined] = 0 short-circuits calculateCost and returns 0, and the host MDX hides the cost table with showCostTable={false}. If anyone later passes showCostTable={true} or refactors calculateCost, this row will silently lie about cost.

Cleaner options:

  1. Add a 'local' (or 'pandas') value to the platform union, branch in calculateCost to return 0 explicitly, and label tooltips/legend appropriately.
  2. Or attach an explicit cost?: number override on BenchmarkConfig and prefer it in calculateCost.

Either way, the comment // No cloud cost - local run should not need to live next to a misleading platform: 'windmill'.
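A minimal sketch of option 1, with option 2's override folded in (the field names mirror the component under review, but the exact shapes and the AWS price here are assumptions):

```typescript
// Hypothetical sketch: make 'local' a first-class platform value so a zero
// cost is explicit rather than an accident of a missing instanceType.
type Platform = 'windmill' | 'snowflake' | 'local';

interface BenchmarkConfig {
  label: string;
  platform: Platform;
  totalTimeSeconds: number;
  instanceType?: string; // EC2 instance, 'windmill' rows only
  cost?: number;         // option 2: explicit override
}

// Assumed on-demand $/hr; the real component's pricing table may differ.
const AWS_PRICING: Record<string, number> = { 'm6a.16xlarge': 2.7648 };

function calculateCost(config: BenchmarkConfig): number {
  if (config.cost !== undefined) return config.cost; // explicit override wins
  if (config.platform === 'local') return 0;         // no cloud cost, by design
  if (config.platform === 'windmill') {
    const hourly = AWS_PRICING[config.instanceType ?? ''] ?? 0;
    return (hourly / 3600) * config.totalTimeSeconds;
  }
  return 0; // snowflake branch elided
}

const pandasRow: BenchmarkConfig = {
  label: 'Airflow + Pandas (64 GB local)',
  platform: 'local',
  totalTimeSeconds: 2813.82,
};
```

With this shape, flipping showCostTable on later would still report $0 for the Pandas row, on purpose rather than by accident.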

@@ -0,0 +1,475 @@
{

P2 — unused data files. All 14 JSON files added under src/data/tpcds_bench_results/ (~6.5k lines) are not referenced anywhere in the repo: TpcDsBenchmark.tsx ships hardcoded SF10_CONFIGS / SF100_CONFIGS / SF1000_CONFIGS / AIRFLOW_PANDAS_SF10_CONFIGS, and there is no loader that imports these JSONs.

Two reasonable resolutions:

  1. Wire them up. Replace the hardcoded configs with a small loader (e.g. import the JSONs and reduce them into BenchmarkConfig[]). That makes future re-runs as easy as dropping in a new JSON.
  2. Move out of src/. If they're meant to be raw reference data only, put them under something like static/data/tpcds/ or benchmarks/ so they aren't dragged into the bundler graph and don't bloat src/data/ with content the site never reads.

Shipping ~9k lines of unreferenced data inside src/ makes future grep noisy and gives the impression the page is data-driven when it isn't.
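A sketch of resolution 1 (the JSON shape and the require.context glob are assumptions; the real result files may differ):

```typescript
// Hypothetical loader: reduce raw benchmark JSONs into BenchmarkConfig[].
interface StageResult { stage: string; wallClockSeconds: number }

interface RawResult {
  label: string;
  platform: 'windmill' | 'snowflake' | 'local';
  stages: StageResult[];
}

interface BenchmarkConfig extends RawResult {
  totalTimeSeconds: number;
}

// Under webpack (which Docusaurus uses), a context glob could pull in
// every result file, so a re-run is just dropping in a new JSON:
// const ctx = require.context('@site/src/data/tpcds_bench_results', false, /\.json$/);
// const raw = ctx.keys().map((k) => ctx(k) as RawResult);

function toConfigs(raw: RawResult[]): BenchmarkConfig[] {
  return raw.map((r) => ({
    ...r,
    // Derive the total from the stage wall-clock times.
    totalTimeSeconds: r.stages.reduce((sum, s) => sum + s.wallClockSeconds, 0),
  }));
}
```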

import TpcDsBenchmarkSection from '@site/src/components/TpcDsBenchmarkSection.mdx';

<br />
Most startups don't need a terabyte-scale data warehouse. If your data fits in 10 GB – 1 TB, you can run analytical workloads on Ducklake at a fraction of the cost of Airflow + Snowflake — while keeping full control over your data.

P2 — em dash (AGENTS.md rule). "of the cost of Airflow + Snowflake — while keeping full control" — replace the em dash with a comma or a period.

Suggested change
Most startups don't need a terabyte-scale data warehouse. If your data fits in 10 GB – 1 TB, you can run analytical workloads on Ducklake at a fraction of the cost of Airflow + Snowflake — while keeping full control over your data.
Most startups don't need a terabyte-scale data warehouse. If your data fits in 10 GB to 1 TB, you can run analytical workloads on Ducklake at a fraction of the cost of Airflow + Snowflake, while keeping full control over your data.


claude Bot commented May 5, 2026

Claude finished @diegoimbert's task in 55s.


Re-review (no new commits since prior review)

This PR is at the same commit (47fc4e5) I reviewed earlier in this comment — no new commits, no diff changes. So there's nothing new to flag and I don't want to spam duplicate inline comments.

Recap of the still-open P2 findings from the prior pass (none are P0/P1 blockers):

  • Em dashes in src/components/TpcDsBenchmarkSection.mdx (L8, L12, L24, L51 ×2, L83, L94 inside the title prop, L101, L103) and blog/2026-03-31-launch-week-data-tables-ducklake/index.mdx:308. AGENTS.md: "Avoid using symbol '—' as it looks like obvious AI."
  • Loose benchmark wording at TpcDsBenchmarkSection.mdx:68-69 ("2× better performance ... at similar cost" is actually 1.82× and not similar cost; "2.7× faster at equal cost" needs explicit throughput-per-dollar framing).
  • Misleading platform tag at TpcDsBenchmark.tsx:452-465 (Airflow + Pandas row uses platform: 'windmill', only safe today because instanceType is undefined and the host page sets showCostTable={false}).
  • 9k lines of unreferenced JSON under src/data/tpcds_bench_results/. TpcDsBenchmark.tsx ships hardcoded configs and never reads them. Either wire them up or move out of src/.
  • Heavy bold use in the new partial (**52 tasks**, table-cell labels, **67 seconds**, **42× slowdown**, ...). AGENTS.md: "Don't use bold font, unless you're adding to content already using it."

The chart math I sampled before still checks out (SF10 Snowflake Small 85.26s vs Windmill 2× m6a.8xlarge 67.44s → quoted 21% / $0.04 vs $0.10; SF100 Windmill 3× m6a.16xlarge 157s/$0.36 vs Snowflake Large 194s/$0.86 → quoted 19%-faster / 2.4×-cheaper).

If you'd like me to actually apply these P2 fixes (em dashes + bold restraint + reword the loose benchmark claims + relabel the Pandas platform), reply with @claude please apply the P2 fixes and I'll push them to this branch.
· branch di/etl-bench

@diegoimbert
Contributor Author

fixed ai feedback


claude Bot commented May 5, 2026

Claude finished @diegoimbert's task in 1m 42s.


Re-review (commit 2487ca0 — "nit fixes")

Diff since prior review (47fc4e5) is small and targeted: the AI-feedback fixes I asked for are mostly applied. Spot-checks below.

Status of prior P2 findings

| # | Finding | Status |
| --- | --- | --- |
| 1 | Em dashes in TpcDsBenchmarkSection.mdx and the blog post | ✅ Fixed: grep '—' returns nothing in the changed files |
| 2 | Bold misuse in the new partial (**52 tasks**, **Ingest**, **67 seconds**, …) | ✅ Fixed: grep '**' returns nothing in TpcDsBenchmarkSection.mdx |
| 3 | Loose benchmark wording at L68-69 ("2× … at similar cost", "2.7× faster at equal cost") | ✅ Fixed: now reads "1.8x faster than Snowflake Small (711s, $0.79)" and "Per dollar spent, Windmill delivers 2.7x more throughput than Snowflake Small". The math checks out: 711/391 = 1.82, and (1 / $0.30) ÷ (1 / $0.79) = 2.63 ≈ 2.7×. The honest $0.79 disclosure for Snowflake Small replaces the earlier "similar cost" framing. |
| 4 | Misleading platform: 'windmill' on the Pandas row (TpcDsBenchmark.tsx:452-465) | ✅ Fixed: platform union extended to 'windmill' \| 'snowflake' \| 'local', the Pandas config relabelled to platform: 'local', the bar-color branch now paints 'local' red, and the $/hour table cell explicitly returns 0 for 'local'. calculateCost falls through to return 0 for the new value, which is correct (no instanceType, no warehouseSize). |
| 5 | 9k lines of unreferenced JSON under src/data/tpcds_bench_results/ | ⚠️ Not addressed: files still present, grep -rn "tpcds_bench_results" src blog docs still returns nothing. Not a blocker; just flagging that #5 from the prior pass is the only one still open. |

New findings introduced by this commit

  • None. The commit is a pure cleanup pass; no new prose, no new chart configs, no new pages or sidebar entries to validate.

AGENTS.md / writing_style_guide.md compliance check (delta)

  • ✅ Em-dash rule now satisfied across the changed files.
  • ✅ Bold-restraint rule now satisfied in the new partial.
  • ✅ Sentence-case heading rule satisfied (the previously-flagged title="TPC-DS SF10 (10 GB) — Pandas vs Ducklake" is now "TPC-DS SF10 (10 GB): Pandas vs Ducklake", no em dash, no title case).
  • ✅ All previously-verified rules from the original pass still hold (code-block languages, no HTML colour styling, relative internal links, frontmatter description present on host pages, no new EE references introduced).

Verification

  • npm run build should still pass — diff is content-only inside an already-rendering MDX partial plus a small TS component change that compiles (the 'local' value is added consistently to the union, the bar-color callback, and the $/hour branch).
  • Worth a manual visual check after merge: confirm the Pandas bar in the SF10 "Pandas vs Ducklake" chart now renders red and the rest stay blue/light-blue, in both light and dark mode.

LGTM from a docs-review standpoint. Optional: tidy up the unused src/data/tpcds_bench_results/ JSON files (wire them up or move them to static/ or benchmarks/) in a follow-up.


  • Diff origin/main...HEAD (and against prior reviewed commit 47fc4e5)
  • Verify P2 fixes from prior review (em dashes, bold restraint, loose benchmark wording, platform tag, unused JSON)
  • Spot-check chart math + frontmatter
  • Post updated review
    · branch di/etl-bench

@diegoimbert
Contributor Author

JSON files are for us

@diegoimbert
Contributor Author

/ai can you break down the complexity of the additions in this PR (e.g database/backend roundtrips, O(n) etc)


claude Bot commented May 5, 2026

Claude finished @diegoimbert's task.


Complexity Analysis of PR Additions

Todo List

  • Read TpcDsBenchmark.tsx to analyze complexity
  • Check for database/backend calls
  • Analyze time and space complexity
  • Provide complexity breakdown

Complexity Breakdown

Database/Backend Operations

No database or backend roundtrips. The component is entirely client-side with:

  • Zero API calls
  • Zero database queries
  • All data is hardcoded in static BenchmarkConfig arrays
  • The 14 JSON files in src/data/tpcds_bench_results/ are never read by the React component

Time Complexity: O(n × m)

Where:

  • n = number of benchmark configs (typically 3-6)
  • m = number of unique stages (typically 5-7)

Key operations:

  1. Stage collection (L139-142): O(n × m) - iterates through all configs and stages
  2. Dataset building (L151-164): O(m × n) - for each stage, maps over all configs
  3. Chart rendering: O(n × m) - Chart.js renders bars for each config × stage
  4. Cost table (L258-280): O(n) - single pass through configs

Space Complexity: O(n × m)

  • Chart datasets: O(m × n) array storing stage values for each config
  • Hardcoded configs: ~400 lines of static data (~15KB)
  • No dynamic data loading: All benchmarks are compiled into the bundle
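The two O(n × m) passes described above amount to roughly this shape (illustrative only; the component's actual field names may differ):

```typescript
// Illustrative O(n * m) dataset build: one dataset per stage, one data
// point per config (missing stages fall back to 0).
interface Stage { stage: string; wallClockSeconds: number }
interface Config { label: string; stages: Stage[] }

function buildDatasets(configs: Config[]) {
  // Stage collection: O(n * m) over every stage of every config.
  const stageNames = [...new Set(configs.flatMap((c) => c.stages.map((s) => s.stage)))];
  // Dataset building: for each stage, map over all configs. The inner find
  // adds another factor of m, negligible at m <= 7.
  return stageNames.map((name) => ({
    label: name,
    data: configs.map(
      (c) => c.stages.find((s) => s.stage === name)?.wallClockSeconds ?? 0
    ),
  }));
}
```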

Runtime Performance

  • Instantaneous rendering: No network delays, all data is in-memory
  • No re-computation: Static data means no dynamic calculations beyond simple cost arithmetic
  • React re-renders: Only on colorMode change (light/dark theme toggle)

Bundle Impact

  • Component code: ~485 lines (~13KB unminified)
  • Static benchmark data: ~200 lines of configs (~6KB)
  • Unused JSON files: 7.6k lines (~300KB) that increase repo size but don't affect runtime

Summary

This is a purely presentational component with linear complexity relative to the number of benchmarks displayed. No async operations, no backend calls, no data fetching. The heaviest operation is Chart.js rendering the stacked bar chart, which is O(n × m) but with small constants (n ≤ 6, m ≤ 7 in practice).

@hcourdent hcourdent merged commit 533a9c7 into main May 6, 2026
12 checks passed