The element tree

The intermediate representation that connects capture to render.

Domotion's middle layer is a CapturedElement[] — a recursive tree of plain objects that mirrors the captured DOM subtree. Knowing its shape is useful when you need to diff captures, cache them, or transform them between capture and render.

The shape, in one minute

interface CapturedElement {
  tag: string;                    // "div", "span", "input", ...
  text: string;                   // concatenated text content
  x: number; y: number;           // viewport-relative top-left
  width: number; height: number;
  styles: { /* ~80 computed-style fields */ };
  children: CapturedElement[];
  textSegments?: TextSegment[];   // per-line/per-run text data
  // ...plus form-control and raster-fallback fields
}

The full type lives in CapturedElement; this page focuses on the design, not every field.

Why a tree, not a flat list

SVG is hierarchical: everything inside a <g> is drawn under that group's transform, clip path, and opacity. Keeping the captured representation hierarchical means the renderer can emit a <g transform="..."> wrapper once per parent and let SVG semantics propagate the rest. A flat list would mean re-applying every ancestor's clip path and transform on every leaf.
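To make the nesting argument concrete, here is a toy emitter over a hypothetical MiniNode type (not Domotion's CapturedElement, and not its actual renderer). It writes one <g> per node and nests the children inside it, so each ancestor's attributes are written exactly once:

```typescript
// Hypothetical, simplified node type for this sketch; not Domotion's CapturedElement.
interface MiniNode {
  tag: string;
  transform?: string;  // an SVG transform list, e.g. "translate(10 20)"
  clipPathId?: string; // id of a <clipPath> defined elsewhere in the SVG
  opacity?: number;
  children: MiniNode[];
}

// Emit one <g> per node and nest the children inside it. SVG then applies
// each ancestor's transform, clip path, and opacity to everything below it,
// so no leaf has to repeat its ancestors' attributes.
function toSvgGroups(node: MiniNode, depth = 0): string {
  const pad = "  ".repeat(depth);
  const attrs = [`data-tag="${node.tag}"`];
  if (node.transform) attrs.push(`transform="${node.transform}"`);
  if (node.clipPathId) attrs.push(`clip-path="url(#${node.clipPathId})"`);
  if (node.opacity !== undefined) attrs.push(`opacity="${node.opacity}"`);
  const open = `${pad}<g ${attrs.join(" ")}`;
  if (node.children.length === 0) return `${open}/>`;
  const inner = node.children.map((c) => toSvgGroups(c, depth + 1)).join("\n");
  return `${open}>\n${inner}\n${pad}</g>`;
}

const example: MiniNode = {
  tag: "div",
  transform: "translate(10 20)",
  clipPathId: "clip-0",
  opacity: 0.5,
  children: [
    { tag: "span", children: [] },
    { tag: "span", children: [] },
  ],
};
console.log(toSvgGroups(example));
```

With two spans under the div, the transform and clip path still appear only on the outer <g>; a flat emitter would have to copy them onto both spans.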

Why coordinates are viewport-relative

All x / y values in the tree are relative to the capture viewport, not the document. That makes the tree directly composable with elementTreeToSvg(tree, width, height) — the SVG starts at (0, 0) and matches the dimensions you pass.

If you capture a subtree that's offset within the viewport, that offset is preserved. To "snap to top-left" instead, pass the element's bounding box as the viewport rect (see Your first capture).
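If you already hold a captured tree and only need its coordinates rebased, a post-hoc shift is one option. The sketch below uses simplified, hypothetical BoxNode and BoxSeg types (the segment's x/y fields in particular are an assumption); the point it illustrates is that every viewport-absolute value, including the per-character xOffsets, has to move together. Unlike passing the bounding box at capture time, it does not re-clip anything that overflowed the element.

```typescript
// Hypothetical, simplified shapes for this sketch; the real TextSegment
// stores its bounding box in its own fields.
interface BoxSeg {
  text: string;
  x: number;
  y: number;
  xOffsets?: number[]; // viewport-absolute x per visible character
}

interface BoxNode {
  tag: string;
  x: number;
  y: number;
  width: number;
  height: number;
  children: BoxNode[];
  textSegments?: BoxSeg[];
}

// Return a copy of the subtree moved by (-dx, -dy). The per-character
// xOffsets must shift too, or glyphs stay anchored at their old positions.
function shiftTree(node: BoxNode, dx: number, dy: number): BoxNode {
  return {
    ...node,
    x: node.x - dx,
    y: node.y - dy,
    textSegments: node.textSegments?.map((s) => ({
      ...s,
      x: s.x - dx,
      y: s.y - dy,
      xOffsets: s.xOffsets?.map((ox) => ox - dx),
    })),
    children: node.children.map((c) => shiftTree(c, dx, dy)),
  };
}

// Snap a root at (100, 50) to (0, 0); render it at root.width x root.height.
const root: BoxNode = {
  tag: "section", x: 100, y: 50, width: 200, height: 80,
  children: [{
    tag: "p", x: 110, y: 60, width: 180, height: 20, children: [],
    textSegments: [{ text: "Hi", x: 110, y: 60, xOffsets: [110, 118.5] }],
  }],
};
const snapped = shiftTree(root, root.x, root.y);
```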

Text segments

Text-bearing elements have an optional textSegments array. Each segment represents one line (for multi-line text) or one same-styling run (for mixed inline content like <span style="color:red"> inside a paragraph). Segments carry:

The text string and its bounding box.
An optional per-character x-offset array (xOffsets) holding the viewport-absolute x of each visible character. The renderer anchors each glyph at xOffsets[i] instead of summing fontkit advances, which keeps captured text pixel-aligned with what Chromium painted: Chromium positions glyphs at sub-pixel offsets, so re-shaping from advance widths accumulates drift along the line.
Optional per-segment color / size / weight overrides for pseudo-elements like ::before and ::after.
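The choice between captured offsets and re-shaping can be sketched as a small function. GlyphSeg and glyphXs are hypothetical names, and the sketch assumes one xOffsets entry per character of text; because the real array counts visible characters, it falls back to advances whenever the lengths don't match:

```typescript
// Hypothetical segment shape for this sketch.
interface GlyphSeg {
  text: string;
  x: number;           // left edge of the segment, viewport-relative
  xOffsets?: number[]; // viewport-absolute x per character, if captured
}

// Pick an x for each character. Prefer the captured xOffsets; otherwise
// accumulate advance widths from the segment's left edge.
function glyphXs(seg: GlyphSeg, advances: number[]): number[] {
  const chars = Array.from(seg.text);
  if (seg.xOffsets && seg.xOffsets.length === chars.length) {
    return seg.xOffsets.slice();
  }
  const xs: number[] = [];
  let pen = seg.x;
  for (let i = 0; i < chars.length; i++) {
    xs.push(pen);
    pen += advances[i] ?? 0;
  }
  return xs;
}

const seg: GlyphSeg = { text: "abc", x: 10, xOffsets: [10, 17.4, 24.9] };
const advances = [7.3, 7.3, 7.3];
const captured = glyphXs(seg, advances);                       // uses xOffsets
const reshaped = glyphXs({ text: seg.text, x: seg.x }, advances); // sums advances
console.log(captured, reshaped);
```

With 7.3 px advances, the fallback places the third glyph at about 24.6 while the captured offset is 24.9; across a long line, gaps like that add up.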

Form controls

<input>, <progress>, <meter> and friends carry extra fields: inputType, checked, indeterminate, disabled, progressValue, etc. The renderer uses those to synthesize the SVG markup that mimics Chromium's user-agent shadow DOM. See Form controls.
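As an illustration of synthesizing a control from these fields, here is a toy checkbox. CheckboxLike and checkboxSvg are hypothetical, and the colors and geometry are placeholders rather than Chromium's user-agent styling:

```typescript
// Hypothetical subset of the form-control fields named above.
interface CheckboxLike {
  x: number;
  y: number;
  width: number;
  height: number;
  checked?: boolean;
  indeterminate?: boolean;
  disabled?: boolean;
}

// Build a toy checkbox: a rounded box, plus a dash (indeterminate) or a
// tick (checked). Colors and geometry are illustrative only.
function checkboxSvg(el: CheckboxLike): string {
  const { x, y, width: w, height: h } = el;
  const on = el.checked || el.indeterminate;
  const fill = on ? "#1a73e8" : "#ffffff";
  const parts = [
    `<rect x="${x + 0.5}" y="${y + 0.5}" width="${w - 1}" height="${h - 1}" rx="2" fill="${fill}" stroke="#767676"/>`,
  ];
  if (el.indeterminate) {
    // Browsers draw the indeterminate dash instead of the tick, even if checked.
    parts.push(
      `<line x1="${x + w * 0.25}" y1="${y + h / 2}" x2="${x + w * 0.75}" y2="${y + h / 2}" stroke="#fff" stroke-width="2"/>`,
    );
  } else if (el.checked) {
    parts.push(
      `<path d="M${x + w * 0.2} ${y + h * 0.5} L${x + w * 0.42} ${y + h * 0.72} L${x + w * 0.8} ${y + h * 0.28}" fill="none" stroke="#fff" stroke-width="2"/>`,
    );
  }
  // Disabled controls are dimmed as a whole.
  const opacity = el.disabled ? ` opacity="0.5"` : "";
  return `<g${opacity}>${parts.join("")}</g>`;
}
```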

Raster fall-back fields

For pieces that can't be expressed in SVG primitives, the capture script records a clip rectangle on the segment or element, and the post-capture rasteriser fills in a dataUri field with the corresponding PNG. The renderer emits an <image> for those regions and skips the normal text / box pipeline.
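A minimal sketch of the emission step, assuming a hypothetical rasterClip field for the recorded clip rectangle (only dataUri is named above) and handling element-level regions only, not segment-level ones:

```typescript
// Hypothetical shapes for this sketch; `rasterClip` is an assumed field name.
interface ClipRect { x: number; y: number; width: number; height: number; }

interface RasterNode {
  tag: string;
  rasterClip?: ClipRect; // clip rectangle recorded by the capture script
  dataUri?: string;      // PNG data URI filled in after capture
  children: RasterNode[];
}

// Escape a value for use inside a double-quoted XML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Walk the tree and emit one <image> per rasterised region. Regions whose
// PNG hasn't been filled in yet are skipped. `href` is the SVG 2 attribute;
// older SVG consumers may need `xlink:href` instead.
function rasterImages(nodes: RasterNode[], out: string[] = []): string[] {
  for (const n of nodes) {
    if (n.rasterClip && n.dataUri) {
      const { x, y, width, height } = n.rasterClip;
      out.push(
        `<image x="${x}" y="${y}" width="${width}" height="${height}" href="${escapeAttr(n.dataUri)}"/>`,
      );
    }
    rasterImages(n.children, out);
  }
  return out;
}
```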

Inspecting a tree

The tree is plain JSON-serialisable data. To inspect it:

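import { writeFileSync } from "node:fs";
// captureElementTree, page, and vp are set up as in Your first capture.
const tree = await captureElementTree(page, "body", vp);
// Pretty-print with two-space indentation so the file is easy to read.
writeFileSync("tree.json", JSON.stringify(tree, null, 2));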

Open tree.json alongside the rendered SVG and you can map each element 1:1 between the two — useful when something looks wrong and you need to know whether the bug is in capture or render.
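For large trees, an indented outline can be quicker to scan than raw JSON. This hypothetical helper reads only the tag, text, x, y, width, height, and children fields shown in the interface at the top of this page, so it should also work on a tree parsed back from tree.json:

```typescript
// Structural subset of CapturedElement used by this sketch.
interface OutlineNode {
  tag: string;
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  children: OutlineNode[];
}

// One line per element: tag, viewport position, size, and the first
// 20 characters of its text, indented by depth.
function outline(nodes: OutlineNode[], depth = 0, out: string[] = []): string[] {
  for (const n of nodes) {
    const label = n.text ? ` ${JSON.stringify(n.text.slice(0, 20))}` : "";
    out.push(`${"  ".repeat(depth)}${n.tag} @(${n.x}, ${n.y}) ${n.width}x${n.height}${label}`);
    outline(n.children, depth + 1, out);
  }
  return out;
}

// Usage with a saved capture (requires node:fs):
// const saved = JSON.parse(readFileSync("tree.json", "utf8"));
// console.log(outline(saved).join("\n"));
```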

Mutating before render

Because the tree is plain data, you can transform it. Some patterns:

Strip an element. Walk the tree, splice the offending node out of its parent's children.
Replace text. Set el.text on the element and clear el.textSegments (the renderer will re-shape).
Tint a region. Change el.styles.backgroundColor or el.styles.color.
Translate a subtree. Add or modify el.styles.transform on the wrapping element.
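The first three patterns can be sketched as small helpers over a hypothetical EditableNode type; stripWhere, replaceText, and tint are illustrative names, and styles is typed loosely as a string map rather than the real computed-style object:

```typescript
// Hypothetical, loosely typed subset of CapturedElement for this sketch.
interface EditableNode {
  tag: string;
  text: string;
  styles: Record<string, string>;
  children: EditableNode[];
  textSegments?: unknown[];
}

// Strip: remove every node matching `pred`, at any depth, by splicing it
// out of its parent's children. Iterates backwards so indices stay valid.
function stripWhere(nodes: EditableNode[], pred: (n: EditableNode) => boolean): void {
  for (let i = nodes.length - 1; i >= 0; i--) {
    if (pred(nodes[i])) {
      nodes.splice(i, 1);
    } else {
      stripWhere(nodes[i].children, pred);
    }
  }
}

// Replace text: set the new string and drop the captured segments so the
// renderer re-shapes it (losing the captured per-character offsets).
function replaceText(node: EditableNode, text: string): void {
  node.text = text;
  delete node.textSegments;
}

// Tint: override the background color of one element.
function tint(node: EditableNode, color: string): void {
  node.styles.backgroundColor = color;
}
```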

Warning

If you change el.text after capture, you lose the per-character xOffsets the capture script measured from Chromium. The renderer will fall back to fontkit advance widths — accurate enough for short labels, but text that should sub-pixel-align with adjacent elements may drift slightly.

Text rendering explains how the renderer turns the textSegments on each element into SVG glyph paths.