Domotion

The capture pipeline

What actually happens between setContent() and the returned SVG.

Domotion runs three logical stages: capture inside Chromium, raster fall-back from Node, and render to SVG markup. Knowing where each step runs makes it easier to debug surprising output and to see which parts you can intercept or replace.

[Figure: three labeled boxes connected by arrows: HTML/CSS, Element tree, SVG. The arrows are labeled captureElementTree() and elementTreeToSvg().]
The capture pipeline at a glance. (This diagram itself is a Domotion-generated SVG.)

Stage 1: in-page capture

When you call captureElementTree(page, selector, viewport), Domotion serialises a self-contained "capture script" and asks Playwright to page.evaluate() it inside the captured page. That script:

  1. Walks the DOM under selector in paint order.
  2. Reads getBoundingClientRect() and getComputedStyle() for every element.
  3. For text-bearing nodes, splits text into segments and (where possible) measures the per-character viewport-relative x offsets using a Range object — so the renderer can place each glyph exactly where Chromium placed it.
  4. Detects un-renderable regions (color-bitmap glyphs, emoji, certain backdrop-filter layers, etc.) and tags them with a raster rectangle so Stage 2 can fill in a screenshot.
  5. Returns one big CapturedElement[] tree plus a list of capture warnings for unsupported features.
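Step 3 can be pictured with a small sketch. This is not Domotion's capture script; it only illustrates the general technique of moving a DOM Range across a text node one UTF-16 code unit at a time and reading the left edge of each bounding rect. The names measureCharOffsets, RangeLike and TextLike are hypothetical, and the range factory is passed in so the function can be exercised outside a browser.

```typescript
// Minimal structural types (hypothetical) covering the parts of the DOM
// Range and Text interfaces that this sketch touches.
interface RangeLike {
  setStart(node: unknown, offset: number): void;
  setEnd(node: unknown, offset: number): void;
  getBoundingClientRect(): { left: number };
}

interface TextLike {
  data: string;
}

// Measure the viewport-relative x offset of each UTF-16 code unit in a text
// node. Assumption: one code unit per glyph, which does not hold for
// surrogate pairs or ligatures; a real implementation would need to segment
// the text more carefully.
function measureCharOffsets(
  textNode: TextLike,
  createRange: () => RangeLike,
): number[] {
  const range = createRange();
  const offsets: number[] = [];
  for (let i = 0; i < textNode.data.length; i++) {
    // Select exactly the character at index i, then read where it was laid out.
    range.setStart(textNode, i);
    range.setEnd(textNode, i + 1);
    offsets.push(range.getBoundingClientRect().left);
  }
  return offsets;
}

// Inside a page this would be called as:
//   measureCharOffsets(someTextNode, () => document.createRange());
```

Passing the factory as an argument also fits the constraint that in-page code can only rely on page globals and its own arguments.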

Because the capture script is serialised and run via evaluate(), it can't import from the rest of Domotion's source — only globals available in the page (document, getComputedStyle, etc.) and the arguments passed in. This is why you'll see plain JavaScript, not TypeScript module imports, inside the capture stage.
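The serialisation constraint can be reproduced without a browser. The sketch below is an illustration, not Domotion code: rebuildFromSource is a hypothetical helper that turns a function into source text and back, which approximates what happens when a function is shipped to the page. A function that relies on a variable from its enclosing scope stops working after the round trip, while one that receives everything as arguments survives.

```typescript
// Rebuild a function from its source text. Only the text travels, not the
// scope the function was defined in.
function rebuildFromSource<A extends unknown[], R>(
  fn: (...args: A) => R,
): (...args: A) => R {
  const factory = new Function(`return (${fn.toString()});`);
  return factory() as (...args: A) => R;
}

// Hypothetical helper that leans on a variable from its enclosing scope.
function makeClosureBased(): (width: number) => number {
  const localScaleFactor = 2;
  return (width: number) => width * localScaleFactor;
}

// Same logic, but everything it needs arrives as an argument.
const argumentBased = (width: number, scaleFactor: number): number =>
  width * scaleFactor;

const rebuiltArgs = rebuildFromSource(argumentBased);
console.log(rebuiltArgs(100, 2)); // 200

const rebuiltClosure = rebuildFromSource(makeClosureBased());
try {
  rebuiltClosure(100);
} catch (err) {
  // localScaleFactor does not exist in the scope the text was rebuilt in.
  console.log((err as Error).name); // ReferenceError
}
```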

Stage 2: bitmap glyph rasterisation

Some glyphs can't be expressed as font outlines: emoji, color-bitmap codepoints, certain Apple-only typographic shapes. For each such region the capture script left behind a rectangle. Back in Node, Domotion dedupes the regions by (text, color, fontSize, fontWeight, size), calls page.screenshot({ clip }) once per unique entry, and embeds the PNG as a base64 data URI on each matching tree node. The same emoji used five times in the document is therefore only screenshotted once.

This stage is invisible to you — it just adds a fraction of a second to capture latency when there are bitmap glyphs to handle.
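The dedupe step can be sketched as a grouping pass that runs before any screenshot is taken. Everything here is an assumption made for illustration: the RasterRegion shape, the function names, and the reading of "size" as the rect's width and height. The actual screenshot call is left out so the sketch stays free of browser dependencies.

```typescript
// Hypothetical shape of a region tagged for raster fall-back in Stage 1.
interface RasterRegion {
  text: string;
  color: string;
  fontSize: number;
  fontWeight: string;
  rect: { x: number; y: number; width: number; height: number };
}

interface ScreenshotJob {
  clip: RasterRegion["rect"]; // rect to screenshot once
  targets: number[];          // indices of all regions that reuse the result
}

// Key built from the fields named in the text. Position is deliberately
// excluded, so the same glyph at two places maps to one key.
function rasterKey(r: RasterRegion): string {
  return JSON.stringify([
    r.text, r.color, r.fontSize, r.fontWeight, r.rect.width, r.rect.height,
  ]);
}

// Group regions so each unique key is screenshotted only once.
function planScreenshots(regions: RasterRegion[]): Map<string, ScreenshotJob> {
  const plan = new Map<string, ScreenshotJob>();
  regions.forEach((region, index) => {
    const key = rasterKey(region);
    const job = plan.get(key);
    if (job) {
      job.targets.push(index);
    } else {
      plan.set(key, { clip: region.rect, targets: [index] });
    }
  });
  return plan;
}
```

A caller would then take one screenshot per entry in the plan, using its clip rect, and attach the resulting data URI to every region listed in targets.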

Stage 3: SVG rendering

elementTreeToSvg(tree, width, height, idPrefix?) is pure: no Chromium, no I/O, just string concatenation. It walks the captured tree recursively and emits SVG primitives:

Captured feature                        SVG output
Background colors and per-side borders  <rect> with rounded corners and clip paths.
Box shadow                              Layered <rect> + <filter> with feGaussianBlur.
Text segments                           <path> + <use> with shared glyph defs (path mode), or <text> with CSS font properties (text mode).
Linear / radial gradients               <linearGradient> / <radialGradient> with px-positioned stops.
Clip paths and masks                    <clipPath> / <mask> entries hoisted to <defs>.
Form controls                           Hand-rolled SVG markup mimicking Chromium's user-agent shadow DOM (see Form controls).
Transforms                              transform="translate(...) rotate(...) ..." on the wrapping group.
Raster fall-backs                       <image href="data:image/png;base64,...">.

The renderer emits readable, lightly indented SVG so the output diffs cleanly. If you want it small for production, pipe it through optimizeSvg().
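To make "pure string concatenation" concrete, here is a toy renderer in the same spirit. It is a sketch under stated assumptions, not Domotion's renderer: SketchNode is a hypothetical, heavily reduced stand-in for CapturedElement, only backgrounds and text are handled, and the text baseline is crudely placed at the bottom of the box.

```typescript
// Hypothetical, heavily simplified node shape for illustration only.
interface SketchNode {
  x: number;
  y: number;
  width: number;
  height: number;
  background?: string;
  text?: string;
  children: SketchNode[];
}

// Escape characters that are unsafe in SVG text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit primitives for one node, then recurse into its children with one
// extra level of indentation.
function renderNode(node: SketchNode, depth: number): string {
  const pad = "  ".repeat(depth);
  let out = "";
  if (node.background) {
    out += `${pad}<rect x="${node.x}" y="${node.y}" width="${node.width}" height="${node.height}" fill="${escapeXml(node.background)}"/>\n`;
  }
  if (node.text) {
    out += `${pad}<text x="${node.x}" y="${node.y + node.height}">${escapeXml(node.text)}</text>\n`;
  }
  for (const child of node.children) {
    out += renderNode(child, depth + 1);
  }
  return out;
}

// No browser and no I/O: the same tree always yields the same string.
function renderSketchSvg(tree: SketchNode[], width: number, height: number): string {
  const body = tree.map((n) => renderNode(n, 1)).join("");
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" viewBox="0 0 ${width} ${height}">\n${body}</svg>\n`;
}
```

Because the function depends only on its arguments, it can be unit-tested with hand-written trees and its output compared as plain text.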

End-to-end timing

Order-of-magnitude numbers on an M1 MacBook for a typical hero-card capture:

  • Browser launch: ~400 ms (one-time, amortised across many captures).
  • Page load & settle: 200–800 ms depending on networkidle, fonts, and your own waits.
  • Stage 1 (in-page capture): 30–120 ms for a few hundred elements; the bottleneck is per-element getComputedStyle.
  • Stage 2 (bitmap rasters): 0 ms when there are no emoji / color-bitmap glyphs; ~60 ms per unique screenshot otherwise.
  • Stage 3 (SVG render): < 30 ms — string ops only.

If you're capturing many frames, keep one browser open for the duration of your script: call chromium.launch() once, capture in a loop, then call browser.close(). Otherwise the repeated launches dominate the total time.
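A quick back-of-the-envelope calculation shows how much reusing the browser saves. The figures below are assumptions picked from the ranges listed above (500 ms load, 75 ms for Stage 1, no bitmap glyphs, 30 ms for Stage 3), and StageCostsMs and totalMs are hypothetical names used only for this sketch.

```typescript
// Rough per-stage costs in milliseconds.
interface StageCostsMs {
  launch: number;
  load: number;
  capture: number;
  raster: number;
  render: number;
}

// Mid-range values taken from the list above; adjust for your own pages.
const typical: StageCostsMs = {
  launch: 400,
  load: 500,
  capture: 75,
  raster: 0,
  render: 30,
};

// Total time for a batch of captures, with or without a shared browser.
function totalMs(frames: number, costs: StageCostsMs, reuseBrowser: boolean): number {
  const perFrame = costs.load + costs.capture + costs.raster + costs.render;
  const launches = reuseBrowser ? 1 : frames;
  return launches * costs.launch + frames * perFrame;
}

console.log(totalMs(100, typical, true));  // 60900
console.log(totalMs(100, typical, false)); // 100500
```

With these assumed numbers, relaunching for every frame adds about 40 seconds to a 100-frame run. The gap widens when pages load faster, since launch time then makes up a larger share of each capture.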

Where to intervene

  • Modify the page before capture. Anything you can do with Playwright before captureElementTree works: page.addStyleTag, page.evaluate, page.click, etc. Hide a cookie banner, click a "show more" toggle, swap a color scheme.
  • Inspect or transform the tree. The CapturedElement[] returned from Stage 1 is plain serialisable data. You can mutate it before passing it to elementTreeToSvg (e.g. clip out an element, replace a text node, scale a region).
  • Post-process the SVG. The output is a string. Pass it through optimizeSvg() for size, or run your own regex if you need to re-namespace IDs or strip a class.
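The last two intervention points can be sketched in a few lines. Both helpers are hypothetical illustrations rather than part of Domotion: SketchElement is a reduced stand-in for CapturedElement with assumed tag and className fields, and prefixSvgIds is a deliberately simple regex pass that assumes double-quoted attributes and only handles id, url(#...) and href="#..." references.

```typescript
// Hypothetical, simplified node shape; the real CapturedElement has more fields.
interface SketchElement {
  tag: string;
  className?: string;
  children: SketchElement[];
}

// Return a copy of the tree without any node (and its subtree) that the
// predicate rejects. The input tree is left untouched.
function pruneTree(
  nodes: SketchElement[],
  shouldDrop: (node: SketchElement) => boolean,
): SketchElement[] {
  return nodes
    .filter((node) => !shouldDrop(node))
    .map((node) => ({ ...node, children: pruneTree(node.children, shouldDrop) }));
}

// Prefix every id definition and every local reference to it.
// Assumes double-quoted attribute values.
function prefixSvgIds(svg: string, prefix: string): string {
  return svg
    .replace(/(\s)id="([^"]+)"/g, (_m, ws: string, id: string) => `${ws}id="${prefix}${id}"`)
    .replace(/url\(#([^)]+)\)/g, (_m, id: string) => `url(#${prefix}${id})`)
    .replace(/href="#([^"]+)"/g, (_m, id: string) => `href="#${prefix}${id}"`);
}
```

For example, pruneTree(tree, (n) => n.className === "cookie-banner") would drop a banner subtree before rendering. Since elementTreeToSvg already accepts an optional idPrefix, a regex pass like prefixSvgIds is mainly useful for SVG strings that have already been generated.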

Next

The element tree walks the CapturedElement shape in detail, then Text rendering covers the path / text mode trade-off.