title: HTMLRewriter
sidebarTitle: HTMLRewriter
description: Transform HTML responses on the edge with streaming, selector-based rewriting.
tag: Preview

Overview

HTMLRewriter lets you modify HTML responses as they stream through your edge script. It parses HTML on the fly and calls your handler functions when it encounters matching elements, comments, or text, so the entire document never has to be buffered in memory.

Quick Start

A simple middleware example:

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("h1", {
        element(el) {
          el.setInnerContent("Modified Title");
        },
      })
      .transform(response);
  });

Constructor

new HTMLRewriter(options?)

An optional settings object. When its ESI flag is set to `true`, the rewriter processes ESI ([Edge Side Includes](https://www.w3.org/TR/esi-lang)) tags like `<esi:include>`.

Methods

on(selector, handlers)

Registers handlers for elements matching a CSS selector. Returns `this`, so calls can be chained.

rewriter.on("div.content", {
  element(el) { /* ... */ },
  comments(comment) { /* ... */ },
  text(text) { /* ... */ },
});

  • selector: A CSS selector. See [supported selectors](#supported-css-selectors) below.
  • handlers: An object with optional handler functions. Each handler can be sync or async.
  • handlers.element: Called when an opening tag matching the selector is encountered.
  • handlers.comments: Called for HTML comments within the matched element.
  • handlers.text: Called for text content within the matched element.

onDocument(handlers)

Registers document-level handlers. Returns `this`, so calls can be chained.

rewriter.onDocument({
  doctype(doctype) { /* ... */ },
  comments(comment) { /* ... */ },
  text(text) { /* ... */ },
  end(end) { /* ... */ },
});

  • handlers.doctype: Called when the `<!DOCTYPE>` declaration is encountered.
  • handlers.comments: Called for document-level comments (outside any element scope).
  • handlers.text: Called for document-level text.
  • handlers.end: Called when the end of the document is reached.

transform(response)

Applies all registered handlers to the response body and returns a new Response.

const transformed = rewriter.transform(response);

  • response: The HTTP response to transform. Must not be an error response.

Returns: A new Response with:

  • The same headers (minus Content-Length, since the body length may change)
  • A streaming body with the transformed HTML
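
Because `transform` parses the body as HTML, it is usually worth leaving non-HTML responses alone. The following is a minimal sketch of such a guard, assuming the origin sets a `Content-Type` header. `isHtmlResponse` is a hypothetical helper name, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): decide whether a response
// looks like HTML before handing it to HTMLRewriter's transform().
function isHtmlResponse(response) {
  // Headers.get() returns null when the header is absent.
  const contentType = response.headers.get("content-type") ?? "";
  // Drop parameters such as "; charset=utf-8" and compare case-insensitively.
  const mimeType = contentType.split(";")[0].trim().toLowerCase();
  return mimeType === "text/html" || mimeType === "application/xhtml+xml";
}
```

Inside `onOriginResponse`, one option is to return `response` unchanged when `isHtmlResponse(response)` is false and call `.transform(response)` otherwise.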

Handler Types

Element

Passed to element handlers. Represents an HTML opening tag.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `tagName` | `string` | Tag name (lowercase). Readable and writable. |
| `namespaceURI` | `string` | Namespace URI (readonly). |
| `removed` | `boolean` | Whether the element has been removed (readonly). |
| `attributes` | `IterableIterator<[string, string]>` | Iterable of `[name, value]` pairs (readonly). |
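
As an illustration of the `attributes` iterable, here is a sketch of an element handler that strips inline event-handler attributes. It assumes that any attribute whose name starts with `on` is an event handler, which is a simplification. `stripInlineEvents` is a hypothetical name.

```javascript
// Sketch: remove inline event-handler attributes such as onclick or onload.
// Assumption: every attribute whose name starts with "on" is an event handler.
const stripInlineEvents = {
  element(el) {
    // Collect the names first so the element is not mutated while iterating.
    const names = [];
    for (const [name] of el.attributes) {
      if (name.toLowerCase().startsWith("on")) names.push(name);
    }
    for (const name of names) el.removeAttribute(name);
  },
};
```

It could be registered for every element with `rewriter.on("*", stripInlineEvents)`.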

Attribute Methods

getAttribute(name)

Returns the value of the attribute with the given name, or null if the attribute does not exist.

The attribute name.

Returns: string | null

hasAttribute(name)

Returns whether the element has an attribute with the given name.

The attribute name.

Returns: boolean

setAttribute(name, value)

Sets the value of the attribute with the given name. Adds the attribute if it does not exist.

  • name: The attribute name.
  • value: The attribute value.

Returns: Element — the element itself, for chaining.

removeAttribute(name)

Removes the attribute with the given name. No-op if the attribute does not exist.

The attribute name.

Returns: Element — the element itself, for chaining.

Setters return the element itself, so calls can be chained:

el.setAttribute("class", "new")
  .setAttribute("id", "main")
  .removeAttribute("style");
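
The attribute methods combine naturally in small handlers. The sketch below adds `noopener` and `noreferrer` to links that open in a new tab while keeping any `rel` tokens already present. `hardenBlankTargets` is a hypothetical name chosen for this example.

```javascript
// Sketch: make target="_blank" links safer by extending their rel attribute.
const hardenBlankTargets = {
  element(el) {
    if (el.getAttribute("target") !== "_blank") return;
    // getAttribute() returns null when rel is missing, so fall back to "".
    const existing = (el.getAttribute("rel") ?? "").split(/\s+/).filter(Boolean);
    // A Set keeps insertion order and avoids duplicate tokens.
    const tokens = new Set(existing);
    tokens.add("noopener");
    tokens.add("noreferrer");
    el.setAttribute("rel", [...tokens].join(" "));
  },
};
```

A matching registration might look like `rewriter.on("a[target]", hardenBlankTargets)`.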

Content Mutation Methods

All content mutation methods accept content as a string, ReadableStream<Uint8Array>, or Response, and an optional options object. They all return Element for chaining.

  • content: The content to insert. Strings are inserted directly. Streams and Response bodies are consumed and piped into the output.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

before(content, options?)

Inserts content immediately before the element's opening tag.

Returns: Element

after(content, options?)

Inserts content immediately after the element's closing tag.

Returns: Element

prepend(content, options?)

Inserts content at the beginning of the element, right after the opening tag.

Returns: Element

append(content, options?)

Inserts content at the end of the element, right before the closing tag.

Returns: Element

replace(content, options?)

Replaces the entire element (opening tag, content, and closing tag) with the provided content.

Returns: Element

setInnerContent(content, options?)

Replaces the element's inner content, keeping the opening and closing tags.

Returns: Element

el.before("<hr>", { html: true })
  .setInnerContent("Hello")
  .after("<hr>", { html: true });
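
The `html` option matters most when the inserted content is not fully under your control. The sketch below passes `html: false` explicitly for an untrusted value and `html: true` only for hard-coded markup. `greetingHandler` and the `greeting-rule` class are hypothetical names for this example.

```javascript
// Hypothetical factory: fills the matched element with a greeting.
function greetingHandler(userName) {
  return {
    element(el) {
      // Untrusted input: request escaping so markup in userName is emitted as text.
      el.setInnerContent(`Welcome back, ${userName}`, { html: false });
      // Trusted, hard-coded markup: insert as raw HTML.
      el.after('<hr class="greeting-rule">', { html: true });
    },
  };
}
```

With this handler, a `userName` of `<b>Ada</b>` is handed to the rewriter for escaping instead of being injected as markup.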

Removal Methods

remove()

Removes the element and all of its content (opening tag, children, closing tag).

Returns: Element

removeAndKeepContent()

Removes the element's opening and closing tags but keeps the inner content in place.

Returns: Element

End Tag Handler

onEndTag(handler)

Registers a handler that is called when the element's closing tag is encountered.

A callback receiving the [`EndTag`](#endtag) object. Can be async.

Returns: void

el.onEndTag((endTag) => {
  endTag.before("<hr>", { html: true });
});
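
An end tag handler is useful when the content to insert depends on what streamed past inside the element. The sketch below counts list items and writes the total just before the list's closing tag. It assumes lists are not nested and that the item handler runs for each `<li>` before the list's end tag handler fires. `listCounter` is a hypothetical helper.

```javascript
// Hypothetical factory: returns two handler objects that share a counter.
function listCounter() {
  let count = 0;
  return {
    // Register on the list itself, e.g. "ul.toc".
    list: {
      element(el) {
        count = 0; // start fresh for each matched list
        el.onEndTag((endTag) => {
          endTag.before(`<li class="total">${count} items</li>`, { html: true });
        });
      },
    },
    // Register on the items, e.g. "ul.toc > li".
    item: {
      element() {
        count += 1;
      },
    },
  };
}
```

Usage might look like `const c = listCounter(); rewriter.on("ul.toc", c.list).on("ul.toc > li", c.item);`, where `ul.toc` is an illustrative selector.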

Comment

Passed to comments handlers.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `text` | `string` | The comment text, without `<!--` and `-->`. Readable and writable. |
| `removed` | `boolean` | Whether the comment has been removed (readonly). |

Methods

before(content, options?)

Inserts content immediately before the comment.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: Comment

after(content, options?)

Inserts content immediately after the comment.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: Comment

replace(content, options?)

Replaces the comment with the provided content.

  • content: The content to replace with.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: Comment

remove()

Removes the comment from the document.

Returns: Comment
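
As a sketch of how `text` and `remove()` work together, the handler below strips comments but keeps two kinds that pages often rely on: conditional comments and banner comments starting with `!`. Both conventions are assumptions made for this example, and `stripComments` is a hypothetical name.

```javascript
// Sketch: drop comments, except conditional comments and "!" banners.
const stripComments = {
  comments(comment) {
    const text = comment.text.trimStart();
    // Assumed conventions: "[if ...]" marks a conditional comment,
    // and a leading "!" marks a license or banner comment.
    if (text.startsWith("[if ") || text.startsWith("!")) return;
    comment.remove();
  },
};
```

The same object can be passed to `.on(selector, ...)` for comments inside matched elements or to `.onDocument(...)` for document-level comments.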

TextChunk

Passed to text handlers. Note: a single text node may be split across multiple chunks.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `text` | `string` | The text content (readonly). |
| `lastInTextNode` | `boolean` | `true` if this is the last chunk in the text node (readonly). |
| `removed` | `boolean` | Whether this chunk has been removed (readonly). |

Methods

before(content, options?)

Inserts content immediately before the text chunk.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: TextChunk

after(content, options?)

Inserts content immediately after the text chunk.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: TextChunk

replace(content, options?)

Replaces the text chunk with the provided content.

  • content: The content to replace with.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: TextChunk

remove()

Removes the text chunk from the document.

Returns: TextChunk
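
Because a text node can arrive in several chunks, a search-and-replace applied per chunk can miss matches that straddle a chunk boundary. One common workaround, sketched here, is to buffer chunks until `lastInTextNode` is `true` and then emit the transformed text once. `wholeText` is a hypothetical helper, and `options` is simply forwarded to `replace()` because this page does not state whether `text` is entity-decoded.

```javascript
// Hypothetical helper: builds a text handler that transforms whole text nodes.
function wholeText(transform, options) {
  let buffer = "";
  return {
    text(chunk) {
      buffer += chunk.text;
      if (!chunk.lastInTextNode) {
        // Drop the partial chunk; its text is re-emitted with the last chunk.
        chunk.remove();
        return;
      }
      const full = buffer;
      buffer = ""; // reset for the next text node
      chunk.replace(transform(full), options);
    },
  };
}
```

For example, `rewriter.on("p", wholeText((s) => s.replaceAll("colour", "color")))` would apply the replacement to each complete text node inside matching paragraphs. The trade-off is that one text node at a time is held in memory.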

EndTag

Passed to onEndTag handlers.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `name` | `string` | The end tag name. Readable and writable. |

Methods

before(content, options?)

Inserts content immediately before the end tag.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: EndTag

after(content, options?)

Inserts content immediately after the end tag.

  • content: The content to insert.
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: EndTag

remove()

Removes the end tag from the document.

Returns: EndTag

Doctype

Passed to doctype handlers. All properties are read-only.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `name` | `string \| null` | The doctype name (e.g. `"html"`). |
| `publicId` | `string \| null` | The PUBLIC identifier. |
| `systemId` | `string \| null` | The SYSTEM identifier. |
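
Since all three properties may be `null`, a doctype handler should handle missing values. The sketch below checks for the bare `<!DOCTYPE html>` form. `isHtml5Doctype` is a hypothetical helper, and it deliberately treats doctypes with a PUBLIC or SYSTEM identifier as legacy.

```javascript
// Hypothetical helper: true only for a bare <!DOCTYPE html> declaration.
function isHtml5Doctype(doctype) {
  // name can be null, so fall back to "" before lowercasing.
  const name = (doctype.name ?? "").toLowerCase();
  // Any PUBLIC or SYSTEM identifier is treated as a legacy doctype here.
  return name === "html" && !doctype.publicId && !doctype.systemId;
}
```

It could be called from a document handler, for example `rewriter.onDocument({ doctype(d) { legacy = !isHtml5Doctype(d); } })`, where `legacy` is a variable of your own.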

DocumentEnd

Passed to end handlers.

Methods

append(content, options?)

Appends content at the end of the document.

  • content: The content to append. Only accepts `string` (not streams or responses).
  • options.html: When `true`, content is inserted as raw HTML. When `false`, content is escaped as text.

Returns: DocumentEnd

end.append("<!-- generated -->", { html: true });

Supported CSS Selectors

The following CSS selectors are supported, based on the W3C Selectors Level 4 specification.

| Selector | Description | Spec |
| --- | --- | --- |
| `*` | Any element | Universal selector |
| `E` | Element of type E | Type selector |
| `E.class` | Element with class | Class selector |
| `E#id` | Element with ID | ID selector |
| `E:nth-child(n)` | The n-th child of its parent | `:nth-child()` |
| `E:first-child` | First child of its parent | `:first-child` |
| `E:nth-of-type(n)` | The n-th sibling of its type | `:nth-of-type()` |
| `E:first-of-type` | First sibling of its type | `:first-of-type` |
| `E:not(s)` | Element that does not match compound selector s | `:not()` |
| `E[attr]` | Element with attribute attr | Attribute selector |
| `E[attr="value"]` | Attribute exactly equals value | Attribute selector |
| `E[attr="value" i]` | Case-insensitive attribute match | Case sensitivity |
| `E[attr="value" s]` | Case-sensitive attribute match | Case sensitivity |
| `E[attr~="value"]` | Whitespace-separated list containing value | Attribute selector |
| `E[attr^="value"]` | Attribute starts with value | Attribute selector |
| `E[attr$="value"]` | Attribute ends with value | Attribute selector |
| `E[attr*="value"]` | Attribute contains value | Attribute selector |
| `E[attr\|="value"]` | Hyphen-separated attribute starting with value | Attribute selector |
| `E F` | F descendant of E | Descendant combinator |
| `E > F` | F direct child of E | Child combinator |

Examples

Rewrite Links

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("a[href]", {
        element(el) {
          const href = el.getAttribute("href");
          if (href?.startsWith("http://")) {
            el.setAttribute("href", href.replace("http://", "https://"));
          }
        },
      })
      .transform(response);
  });

Inject a Script

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .onDocument({
        end(end) {
          end.append('<script src="/analytics.js"></script>', { html: true });
        },
      })
      .transform(response);
  });

Remove Elements

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("script[src*='tracker']", {
        element(el) {
          el.remove();
        },
      })
      .on(".cookie-banner", {
        element(el) {
          el.remove();
        },
      })
      .transform(response);
  });

Async Handler

Handlers can return a Promise for async operations like sub-requests.

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("include[src]", {
        async element(el) {
          const src = el.getAttribute("src");
          const partial = await fetch(src);
          el.replace(partial.body, { html: true });
        },
      })
      .transform(response);
  });
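
A sub-request can fail, so an async handler may want a fallback. The sketch below removes the placeholder element when the fetch throws or returns a non-2xx status. `includeHandler` is a hypothetical factory, and the `fetchImpl` parameter exists only so the handler can be exercised with a stub.

```javascript
// Hypothetical factory: expands <include src="..."> and removes it on failure.
function includeHandler(fetchImpl = fetch) {
  return {
    async element(el) {
      const src = el.getAttribute("src");
      try {
        const partial = await fetchImpl(src);
        // Only splice in successful responses that actually carry a body.
        if (partial.ok && partial.body) {
          el.replace(partial.body, { html: true });
          return;
        }
      } catch {
        // Network failure: fall through to the removal below.
      }
      el.remove();
    },
  };
}
```

It is registered like the inline version: `rewriter.on("include[src]", includeHandler())`. Removing the element is one possible fallback; inserting a placeholder message would be another.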

Class-Based Handlers

Instead of inline objects, you can define handler classes and pass instances to .on() or .onDocument().

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

class AttributeRewriter {
  #attrName;

  constructor(attrName) {
    this.#attrName = attrName;
  }

  element(el) {
    const value = el.getAttribute(this.#attrName);
    // Only upgrade values that start with the insecure scheme.
    if (value?.startsWith("http://")) {
      el.setAttribute(this.#attrName, value.replace("http://", "https://"));
    }
  }
}

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("a", new AttributeRewriter("href"))
      .on("img", new AttributeRewriter("src"))
      .transform(response);
  });

Document Handler Class

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

class StripComments {
  comments(comment) {
    comment.remove();
  }

  end(end) {
    end.append("<!-- cleaned -->", { html: true });
  }
}

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .onDocument(new StripComments())
      .transform(response);
  });

Async Class Handler

Class methods can be async just like inline handlers.

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

class IncludeExpander {
  async element(el) {
    const src = el.getAttribute("src");
    if (src) {
      const partial = await fetch(src);
      el.replace(partial.body, { html: true });
    }
  }
}

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("include[src]", new IncludeExpander())
      .transform(response);
  });

Multiple Selectors

Chain multiple .on() calls to handle different elements independently.

import * as BunnySDK from "https://esm.sh/@bunny.net/edgescript-sdk@0.11.2";

BunnySDK.net.http.servePullZone()
  .onOriginResponse(async ({ response }) => {
    return new HTMLRewriter()
      .on("title", {
        element(el) {
          el.setInnerContent("My Site");
        },
      })
      .on("meta[name='description']", {
        element(el) {
          el.setAttribute("content", "Custom description");
        },
      })
      .on("img", {
        element(el) {
          el.setAttribute("loading", "lazy");
        },
      })
      .transform(response);
  });

References