
Phase 1: Spec + Make streaming pipeline chunk-emitting #562

Open
aram356 wants to merge 61 commits into main from specs/streaming-response-optimization

Conversation

Collaborator

aram356 commented Mar 25, 2026

Summary

  • Design spec for streaming HTTP responses through the publisher proxy when Next.js is disabled
  • Targets both TTFB/TTLB improvement and peak memory reduction (from ~4x response size to constant)
  • Two implementation steps: (1) make pipeline chunk-emitting, (2) wire up Fastly StreamingBody
  • Streaming gate: only activates when html_post_processors() is empty (Next.js disabled)
  • Verified against Fastly SDK 0.11.12 API (stream_to_client, StreamingBody, Request::from_client)

Stacked on #560 (docs reorganization).
Closes #565
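The peak-memory claim above (from roughly 4x the response size down to constant) comes from relaying the body in fixed-size chunks instead of accumulating it. A minimal, self-contained sketch of that shape — illustrative only, not the PR's code — using any `Read` source and `Write` sink:

```rust
// Illustrative sketch: why streaming keeps peak memory constant.
// A buffered proxy collects the whole body before writing; a
// streaming proxy relays fixed-size chunks, so memory use is
// bounded by the chunk size, not the response size.
use std::io::{Read, Write};

const CHUNK: usize = 8 * 1024;

/// Relay `src` to `dst` in fixed-size chunks; peak memory is
/// O(CHUNK), independent of the response size.
fn relay<R: Read, W: Write>(mut src: R, dst: &mut W) -> std::io::Result<u64> {
    let mut buf = [0u8; CHUNK];
    let mut total: u64 = 0;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    let body = vec![b'x'; 100_000]; // stand-in for an origin response
    let mut out = Vec::new();
    let copied = relay(&body[..], &mut out)?;
    assert_eq!(copied, 100_000);
    assert_eq!(out, body);
    println!("relayed {copied} bytes in {CHUNK}-byte chunks");
    Ok(())
}
```

In the actual PR the sink is Fastly's `StreamingBody` (which implements `Write`), so the same loop shape applies with no buffering of the full response.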

Test plan

  • Spec reviewed for technical accuracy against codebase
  • Fastly SDK assumptions verified (no fastly::init, StreamingBody implements Write, abort-on-drop)
  • All compression paths covered (gzip, deflate, brotli — both keep and strip)
  • Error handling covers pre-stream and mid-stream failure modes

aram356 added 3 commits March 25, 2026 08:20
Move spec-like documents from root and docs/internal into a structured
layout under docs/superpowers with timestamped filenames:

- specs/: active design documents (SSC PRD, technical spec, EdgeZero
  migration, auction orchestration flow, production readiness report)
- archive/: completed or historical specs (optimization, sequence,
  publisher IDs audit)
Design spec for streaming HTTP responses through the publisher proxy
when Next.js is disabled. Covers two implementation steps:

1. Make the streaming pipeline chunk-emitting (HtmlRewriterAdapter,
   gzip, encoder finalization)
2. Wire up Fastly StreamingBody via stream_to_client() with entry
   point migration from #[fastly::main] to undecorated main()

Includes streaming gate logic, error handling, rollback strategy,
and testing plan. Verified against Fastly SDK 0.11.12 API.
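The "chunk-emitting pipeline plus encoder finalization" split described above can be sketched as follows. This is a hypothetical illustration, not the project's code: `TrailerEncoder` stands in for a real gzip/brotli encoder, and the key point is that the chunk-forwarding loop deliberately does not finalize — the caller must perform encoder-specific finalization afterwards.

```rust
// Toy encoder: passes bytes through and appends a 4-byte trailer
// when finish() is called (mimicking gzip's CRC/length trailer).
use std::io::{self, Write};

struct TrailerEncoder<W: Write> {
    inner: W,
}

impl<W: Write> TrailerEncoder<W> {
    fn new(inner: W) -> Self {
        Self { inner }
    }

    /// Explicit finalization: write the trailer and return the sink.
    fn finish(mut self) -> io::Result<W> {
        self.inner.write_all(b"TRLR")?;
        Ok(self.inner)
    }
}

impl<W: Write> Write for TrailerEncoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Forward chunks as they arrive; deliberately does NOT finalize,
/// so intermediate output can be streamed to the client immediately.
fn process_chunks<W: Write>(chunks: &[&[u8]], sink: &mut W) -> io::Result<()> {
    for chunk in chunks {
        sink.write_all(chunk)?; // each chunk is emitted immediately
    }
    sink.flush()
}

fn main() -> io::Result<()> {
    let mut enc = TrailerEncoder::new(Vec::new());
    let chunks: [&[u8]; 3] = [b"<html>", b"<body>hi</body>", b"</html>"];
    process_chunks(&chunks, &mut enc)?;
    let out = enc.finish()?; // caller-side finalization writes the trailer
    assert!(out.ends_with(b"TRLR"));
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

Splitting forwarding from finalization is what allows the same `process_chunks` loop to serve both the buffered and streaming paths.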
aram356 added 3 commits March 25, 2026 10:57
Add before/after measurement protocol using Chrome DevTools MCP tools:
network timing capture, Lighthouse audits, performance traces, memory
snapshots, and automated body hash comparison for correctness.
Base automatically changed from docs/organize-specs-and-archive to main March 26, 2026 15:57
aram356 added 17 commits March 26, 2026 08:58
…ation test

- Add debug-level logging to process_chunks showing total bytes
  read and written per pipeline invocation
- Add brotli-to-brotli round-trip test to cover the into_inner()
  finalization path
- Add test proving HtmlWithPostProcessing accumulates output when
  post-processors are registered while streaming path passes through
- Group std imports together (cell, io, rc) before external crates
- Document supported compression combinations on PipelineConfig
- Remove dead-weight byte counters from process_chunks hot loop
- Fix stale comment referencing removed process_through_compression
- Fix brotli finalization: use drop(encoder) instead of into_inner()
  to make the intent clear (CompressorWriter writes trailer on drop)
- Document reset() as no-op on HtmlRewriterAdapter (single-use)
- Add brotli round-trip test covering into_inner finalization path
- Add gzip HTML rewriter pipeline test (compressed round-trip with
  lol_html, not just StreamingReplacer)
- Add HtmlWithPostProcessing accumulation vs streaming behavior test
- Add Eq derive to Compression enum (all unit variants, trivially correct)
- Brotli finalization: use into_inner() instead of drop() to skip
  redundant flush and make finalization explicit
- Document process_chunks flush semantics: callers must still call
  encoder-specific finalization after this method returns
- Warn when HtmlRewriterAdapter receives data after finalization
  (rewriter already consumed, data would be silently lost)
- Make HtmlWithPostProcessing::reset() a true no-op matching its doc
  (clearing auxiliary state without resetting rewriter is inconsistent)
- Document extra copying overhead on post-processor path vs streaming
- Assert output content in reset-then-finalize test (was discarded)
- Relax per-chunk emission test to not depend on lol_html internal
  buffering behavior — assert concatenated correctness + at least one
  intermediate chunk emitted
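The commits above go back and forth between `drop(encoder)` and `into_inner()` because brotli's `CompressorWriter` writes its trailer on drop, while `into_inner()` makes finalization explicit. A toy sketch of the two styles, with invented names (this is not brotli's actual API), using the `Option` trick so an explicitly finalized encoder skips the `Drop` path:

```rust
// Sketch of implicit (Drop) vs explicit (into_inner) finalization.
struct ToyCompressor {
    sink: Option<Vec<u8>>,
}

impl ToyCompressor {
    fn new() -> Self {
        Self { sink: Some(Vec::new()) }
    }

    fn write(&mut self, data: &[u8]) {
        self.sink.as_mut().unwrap().extend_from_slice(data);
    }

    fn finalize(sink: &mut Vec<u8>) {
        sink.extend_from_slice(b"TRAILER");
    }

    /// Explicit finalization: write the trailer and return the sink.
    /// Taking the Option leaves None behind, so Drop does nothing.
    fn into_inner(mut self) -> Vec<u8> {
        let mut sink = self.sink.take().unwrap();
        Self::finalize(&mut sink);
        sink
    }
}

impl Drop for ToyCompressor {
    // Implicit finalization: if the caller just drops the encoder,
    // the trailer is still written (mirroring CompressorWriter).
    fn drop(&mut self) {
        if let Some(sink) = self.sink.as_mut() {
            Self::finalize(sink);
        }
    }
}

fn main() {
    let mut c = ToyCompressor::new();
    c.write(b"data");
    let out = c.into_inner();
    assert_eq!(out, b"dataTRAILER");
}
```

Both styles produce a complete stream; `into_inner()` wins here because it hands the sink back and makes the finalization point visible at the call site.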
aram356 added 7 commits March 31, 2026 13:38
- Replace html_post_processors().is_empty() with has_html_post_processors()
  to avoid allocating Vec<Arc<...>> in the streaming gate check
- Add step to implement has_html_post_processors() on IntegrationRegistry
- Add EC implementation coordination note on handle_publisher_request
  restructuring step
- Renumber Phase 2 Task 8 steps accordingly
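The allocation-free gate check described above can be sketched like this. `IntegrationRegistry` and its fields are assumed shapes for illustration, not the project's actual types; the point is that `has_html_post_processors()` answers a yes/no question without cloning `Arc`s into a fresh `Vec`.

```rust
// Minimal sketch: replace `html_post_processors().is_empty()` with
// an allocation-free boolean check for the streaming gate.
use std::sync::Arc;

trait HtmlPostProcessor {
    fn name(&self) -> &str;
}

struct Integration {
    html_post_processor: Option<Arc<dyn HtmlPostProcessor>>,
}

struct IntegrationRegistry {
    integrations: Vec<Integration>,
}

impl IntegrationRegistry {
    /// Old shape: collects Arc clones into a fresh Vec on every call.
    fn html_post_processors(&self) -> Vec<Arc<dyn HtmlPostProcessor>> {
        self.integrations
            .iter()
            .filter_map(|i| i.html_post_processor.clone())
            .collect()
    }

    /// New shape: answers the gate question without allocating.
    fn has_html_post_processors(&self) -> bool {
        self.integrations
            .iter()
            .any(|i| i.html_post_processor.is_some())
    }
}

fn main() {
    let registry = IntegrationRegistry {
        integrations: vec![Integration { html_post_processor: None }],
    };
    // Streaming gate: stream only when no post-processors are registered.
    assert!(!registry.has_html_post_processors());
    assert!(registry.html_post_processors().is_empty());
}
```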
Both review comments apply to Phase 2 as a whole, not individual steps.
Move the EC implementation note to the Phase 2 header in the plan and
the Step 2 header in the spec.
- Narrow OwnedProcessResponseParams fields to pub(crate)
- Set Content-Length on buffered responses instead of removing it
- Add has_html_post_processors() to avoid Vec<Arc<...>> allocation
- Extract is_processable_content_type() and test it directly
- Fix stray merge artifact in apply_synthetic_id_headers
aram356 changed the title from "Add streaming response optimization spec" to "Phase 1: Spec + Make streaming pipeline chunk-emitting (old)" on Mar 31, 2026
aram356 changed the title from "Phase 1: Spec + Make streaming pipeline chunk-emitting (old)" to "Phase 1: Spec + Make streaming pipeline chunk-emitting" on Mar 31, 2026
aram356 added 21 commits March 31, 2026 14:22
…ation' into feature/streaming-pipeline-phase2
Non-2xx responses now stay buffered to prevent committing error
status irreversibly via stream_to_client() and injecting JS into
error pages. Unsupported Content-Encoding values (e.g. zstd from
misbehaving origins) fall back to buffered mode so failures produce
proper error responses instead of truncated streams.

Also removes raw synthetic ID from debug logging for privacy
consistency, fixes std::io::Write import inconsistency, and
corrects misleading "200 OK" comment in streaming error path.
Resolve conflicts between streaming Phase 2 and main's EC rename
(SSC → Edge Cookie) and RuntimeServices/consent route changes:
- Adapt streaming PublisherResponse to use runtime_services_for_consent_route
- Rename extracted apply_synthetic_id_headers → apply_ec_headers
- Use services.kv_store() for KV deletion instead of store name
- Remove stray merge artifact
…ation' into feature/streaming-pipeline-phase2
Accumulate text fragments via Mutex<String> until
last_in_text_node is true, then process the complete text.
Intermediate fragments return RemoveNode to suppress output.
Accumulate text fragments via Mutex<String> until
last_in_text_node is true, then match and rewrite on the complete
text. Non-GTM scripts that were fragmented are emitted unchanged.
All script rewriters (NextJS __NEXT_DATA__, GTM) are now
fragment-safe — they accumulate text internally until
last_in_text_node. The buffered adapter workaround is no longer
needed. Always use streaming mode in create_html_processor.
When rewrite_structured returns Keep on accumulated content,
intermediate fragments were already removed via RemoveNode. Emit
the full accumulated content via Replace to prevent silent data
loss. Also updates spec to reflect Phase 3 completion.
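The accumulate-until-`last_in_text_node` pattern described in these commits can be sketched as follows. The `on_text`/`Action` shape here is invented for illustration — in the project the accumulation lives inside rewriters driven by lol_html text chunks — and, as the commits note, `Mutex<String>` exists only to satisfy a `Sync` bound, not for concurrent access.

```rust
// Sketch of fragment accumulation: suppress intermediate text
// fragments with RemoveNode, then emit the rewritten full text
// once the final fragment arrives.
use std::sync::Mutex;

enum Action {
    RemoveNode,      // suppress an intermediate fragment
    Replace(String), // emit the rewritten, fully accumulated text
}

struct AccumulatingRewriter {
    accumulated_text: Mutex<String>,
}

impl AccumulatingRewriter {
    fn new() -> Self {
        Self { accumulated_text: Mutex::new(String::new()) }
    }

    /// Called once per text fragment; `last_in_text_node` is true on
    /// the final fragment of the text node.
    fn on_text(&self, fragment: &str, last_in_text_node: bool) -> Action {
        let mut buf = self.accumulated_text.lock().unwrap();
        buf.push_str(fragment);
        if !last_in_text_node {
            // Only the final, rewritten text should be emitted.
            return Action::RemoveNode;
        }
        // Drain the buffer so it cannot leak into the next element.
        let full = std::mem::take(&mut *buf);
        Action::Replace(full.replace("GTM-PLACEHOLDER", "GTM-REWRITTEN"))
    }
}

fn main() {
    let r = AccumulatingRewriter::new();
    // A fragmented text node: intermediate fragment is removed...
    assert!(matches!(r.on_text("GTM-PLACE", false), Action::RemoveNode));
    // ...and the full accumulated text is matched and rewritten.
    match r.on_text("HOLDER", true) {
        Action::Replace(s) => assert_eq!(s, "GTM-REWRITTEN"),
        _ => panic!("expected Replace"),
    }
    // A second element starts with an empty buffer (no leakage).
    assert!(matches!(r.on_text("plain", true), Action::Replace(s) if s == "plain"));
}
```

Draining with `mem::take` on the final fragment is what the later "buffer drains between consecutive script elements" test is guarding.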
- Add response.get_status().is_success() check to streaming gate so
  4xx/5xx error pages stay buffered with complete status codes
- Add streaming gate unit tests covering all gate conditions
- Add stream_publisher_body gzip round-trip test
- Add small-chunk (32 byte) pipeline tests for __NEXT_DATA__ and GTM
  that prove fragmented text nodes survive the real lol_html path
Phase 3 performance results: 35% TTFB improvement, 37% DOM Complete
improvement on getpurpose.ai staging vs production. Phase 4 adds
binary pass-through streaming via PublisherResponse::PassThrough.
- Extract streaming gate into can_stream_response() function so tests
  call production code instead of reimplementing the formula
- Refactor GTM rewrite() to use Option<String> pattern instead of
  uninit variable, replacing indirect text.len() != content.len()
  accumulation check with explicit full_content.is_some()
- Add cross-element safety doc comment on accumulated_text fields
  in GTM and NextJsNextDataRewriter
- Document RSC placeholder deliberate non-accumulation strategy
- Update spec to reflect script rewriters are now fragment-safe
- Document why Mutex<String> is used (Sync bound on trait, not
  concurrent access) in both NextJsNextDataRewriter and
  GoogleTagManagerIntegration
- Add accumulation_buffer_drains_between_consecutive_script_elements
  test proving the buffer doesn't leak between two sequential
  <script> elements (fragmented GTM followed by fragmented non-GTM)
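The extracted gate can be sketched as a single pure function so tests exercise production code rather than reimplementing the formula. The inputs below are flattened to plain booleans for illustration; the PR's actual `can_stream_response()` inspects the response and registry directly.

```rust
// Hypothetical sketch of the extracted streaming gate.
fn can_stream_response(
    status_is_success: bool,          // non-2xx stays buffered: the status
                                      // is committed irreversibly once
                                      // streaming begins
    has_html_post_processors: bool,   // Next.js post-processing needs the
                                      // full buffered body
    content_encoding_supported: bool, // unknown encodings (e.g. zstd)
                                      // fall back to buffered handling
) -> bool {
    status_is_success && !has_html_post_processors && content_encoding_supported
}

fn main() {
    assert!(can_stream_response(true, false, true));
    assert!(!can_stream_response(false, false, true)); // 4xx/5xx buffered
    assert!(!can_stream_response(true, true, true));   // Next.js enabled
    assert!(!can_stream_response(true, false, false)); // unsupported encoding
}
```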
When the origin returns a processable 2xx response with an encoding the
pipeline cannot decompress (e.g. `zstd` from a misbehaving origin), the
buffered fallback previously still routed the body through
process_response_streaming. `Compression::from_content_encoding` maps
unknown values to `None`, so the rewriter would treat the compressed
bytes as identity-encoded text and emit garbled output.

Bypass the rewrite pipeline entirely in that case and return the origin
response untouched. Adds a test asserting byte-for-byte pass-through
and updates the is_supported_content_encoding doc to reflect the new
behavior.

Addresses PR #585 review feedback from @prk-Jr.
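The fix above hinges on distinguishing "identity" (safe to rewrite as text) from "unknown" (must pass through untouched). A sketch with assumed names — the mapping mirrors the described `Compression::from_content_encoding` behavior but is not the project's code:

```rust
// Unknown Content-Encoding values map to None; the proxy bypasses
// the rewrite pipeline entirely for those, returning the origin
// body byte-for-byte instead of rewriting compressed bytes as text.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Compression {
    Identity,
    Gzip,
    Deflate,
    Brotli,
}

impl Compression {
    /// Returns None for encodings the pipeline cannot decompress.
    fn from_content_encoding(value: Option<&str>) -> Option<Compression> {
        match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
            None | Some("") | Some("identity") => Some(Compression::Identity),
            Some("gzip") => Some(Compression::Gzip),
            Some("deflate") => Some(Compression::Deflate),
            Some("br") => Some(Compression::Brotli),
            Some(_) => None, // e.g. "zstd" from a misbehaving origin
        }
    }
}

/// Routing decision: unknown encodings bypass the rewrite pipeline
/// so the response is returned untouched, never as garbled text.
fn should_bypass_rewrite(content_encoding: Option<&str>) -> bool {
    Compression::from_content_encoding(content_encoding).is_none()
}

fn main() {
    assert!(!should_bypass_rewrite(None));
    assert!(!should_bypass_rewrite(Some("gzip")));
    assert!(should_bypass_rewrite(Some("zstd"))); // pass through untouched
}
```

Collapsing unknown encodings to `Identity` (as before the fix) is exactly what produced the garbled output, since identity is the one case the rewriter treats as plain text.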
…eaming-pipeline-phase2

# Conflicts:
#	crates/trusted-server-core/src/publisher.rs

Successfully merging this pull request may close these issues.

  • Create spec and plan for streaming response for non-Next.js publisher proxy
  • Implement streaming response optimization for non-Next.js publisher proxy

3 participants