The element tree
The intermediate representation that connects capture to render.
Domotion's middle layer is a CapturedElement[] — a recursive
tree of plain objects that mirrors the captured DOM subtree. Knowing its
shape is useful when you need to diff captures, cache them,
or transform them between capture and render.
The shape, in one minute
interface CapturedElement {
  tag: string;                     // "div", "span", "input", ...
  text: string;                    // concatenated text content
  x: number; y: number;            // viewport-relative top-left
  width: number; height: number;
  styles: { /* ~80 computed-style fields */ };
  children: CapturedElement[];
  textSegments?: TextSegment[];    // per-line/per-run text data
  // ...plus form-control and raster-fallback fields
}
The full type lives in CapturedElement;
this page focuses on the design, not every field.
Why a tree, not a flat list
SVG is hierarchical: a child group inherits its parent's transform,
clip, and opacity. Keeping the captured representation hierarchical means
the renderer can emit a <g transform="..."> wrapper once
per parent and let SVG semantics propagate the rest. A flat list would mean
re-applying every ancestor's clip path and transform on every leaf.
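The difference can be sketched with a toy tree. This is an illustration only, not Domotion's renderer: `GroupNode`, `toNestedGroups`, and `toFlatLeaves` are hypothetical names, and the top-level `transform` field is a simplification of wherever the real tree stores it.

```typescript
// Hypothetical, trimmed-down node type for this sketch only.
interface GroupNode {
  tag: string;
  transform?: string; // assumption: simplified stand-in for the captured transform
  children: GroupNode[];
}

// Hierarchical output: each transform is written once, on the parent's <g>,
// and SVG applies it to everything nested inside.
function toNestedGroups(el: GroupNode): string {
  const attr = el.transform ? ` transform="${el.transform}"` : "";
  let inner = "";
  for (const child of el.children) inner += toNestedGroups(child);
  return `<g data-tag="${el.tag}"${attr}>${inner}</g>`;
}

// Flat output: every leaf has to repeat the full chain of ancestor transforms.
function toFlatLeaves(el: GroupNode, inherited: string[] = []): string[] {
  const chain = el.transform ? inherited.concat(el.transform) : inherited;
  if (el.children.length === 0) {
    return [`<g data-tag="${el.tag}" transform="${chain.join(" ")}"></g>`];
  }
  let leaves: string[] = [];
  for (const child of el.children) leaves = leaves.concat(toFlatLeaves(child, chain));
  return leaves;
}
```

With one transformed parent and two children, the nested form mentions the transform once, while the flat form repeats it on both leaves. The same duplication would apply to clip paths and opacity.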
Why coordinates are viewport-relative
All x / y values in the tree are relative to the
capture viewport, not the document. That makes the tree directly composable
with elementTreeToSvg(tree, width, height) — the SVG starts at
(0, 0) and matches the dimensions you pass.
If you capture a subtree that's offset within the viewport, that offset is preserved. To "snap to top-left" instead, pass the element's bounding box as the viewport rect (see Your first capture).
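To see how far a captured subtree sits from the viewport origin, you can compute the union of the boxes in the tree. The sketch below uses only the `x`, `y`, `width`, `height`, and `children` fields shown earlier; `BoxNode`, `Rect`, and `unionBounds` are hypothetical names, not part of Domotion.

```typescript
// Hypothetical trimmed-down types: only the geometry fields of CapturedElement.
interface BoxNode {
  x: number;
  y: number;
  width: number;
  height: number;
  children: BoxNode[];
}

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Union of every element box in the tree, in viewport coordinates.
// Returns null for an empty tree.
function unionBounds(roots: BoxNode[]): Rect | null {
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  const stack: BoxNode[] = roots.slice();
  while (stack.length > 0) {
    const el = stack.pop();
    if (!el) break;
    minX = Math.min(minX, el.x);
    minY = Math.min(minY, el.y);
    maxX = Math.max(maxX, el.x + el.width);
    maxY = Math.max(maxY, el.y + el.height);
    for (const child of el.children) stack.push(child);
  }
  if (minX === Infinity) return null;
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}
```

The returned `x` and `y` are the offset that the capture preserves. A non-zero value means the rendered SVG will have empty space above or to the left of the content.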
Text segments
Text-bearing elements have an optional textSegments array.
Each segment represents one line (for multi-line text) or one same-styling
run (for mixed inline content like <span style="color:red">
inside a paragraph). Segments carry:
- The text string and its bounding box.
- An optional per-character x-offset array: the viewport-absolute x for each visible character. The renderer anchors each glyph at xOffsets[i] instead of summing fontkit advances, which keeps captured text pixel-aligned with what Chromium painted (Chromium uses sub-pixel positioning that accumulates drift if you re-shape from advances).
- Optional per-segment color / size / weight overrides for pseudos like ::before and ::after.
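The choice between measured offsets and summed advances can be sketched as follows. This is not the renderer's actual code: `SegmentSketch`, `glyphXPositions`, the `x` field as the segment's left edge, and the `advances` parameter (per-character advance widths you would get from a font library) are all assumptions made for illustration, as is the rule that offsets are used only when there is exactly one per character.

```typescript
// Hypothetical, simplified segment shape; the real TextSegment has more fields.
interface SegmentSketch {
  text: string;
  x: number; // assumption: left edge of the segment's bounding box
  xOffsets?: number[]; // viewport-absolute x per character, when captured
}

// Returns the x position at which to anchor each character.
// Counts UTF-16 code units, which is enough for a sketch.
function glyphXPositions(seg: SegmentSketch, advances: number[]): number[] {
  // Preferred path: reuse the positions measured at capture time.
  if (seg.xOffsets && seg.xOffsets.length === seg.text.length) {
    return seg.xOffsets.slice();
  }
  // Fallback: accumulate advance widths from the segment's left edge.
  const xs: number[] = [];
  let pen = seg.x;
  for (let i = 0; i < seg.text.length; i++) {
    xs.push(pen);
    pen += advances[i] ?? 0;
  }
  return xs;
}
```

In the fallback path, any rounding difference in an advance carries into every later character, which is the drift the measured offsets avoid.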
Form controls
<input>, <progress>,
<meter> and friends carry extra fields:
inputType, checked, indeterminate,
disabled, progressValue, etc. The renderer uses
those to synthesize the SVG markup that mimics Chromium's user-agent
shadow DOM. See Form controls.
Raster fall-back fields
For pieces that can't be expressed in SVG primitives, the capture script
records a clip rectangle on the segment or element, and the post-capture
rasteriser fills in a dataUri field with the corresponding
PNG. The renderer emits an <image> for those regions and
skips the normal text / box pipeline.
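The branch described above can be sketched like this. Only `dataUri` comes from the text; `RasterCandidate`, `renderRegion`, the `renderNormally` callback, and positioning the image by the element's own box (rather than a separately recorded clip rectangle) are assumptions for the sake of a small example.

```typescript
// Hypothetical, trimmed-down shape for this sketch.
interface RasterCandidate {
  x: number;
  y: number;
  width: number;
  height: number;
  dataUri?: string; // PNG data URI filled in by the post-capture rasteriser
}

// Emit an <image> when a raster fallback is present; otherwise hand the
// element to the normal pipeline, represented here by a callback.
function renderRegion(
  el: RasterCandidate,
  renderNormally: (el: RasterCandidate) => string,
): string {
  if (el.dataUri) {
    return `<image x="${el.x}" y="${el.y}" width="${el.width}" height="${el.height}" href="${el.dataUri}"/>`;
  }
  return renderNormally(el);
}
```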
Inspecting a tree
The tree is plain JSON-serialisable data. To inspect it:
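import { writeFileSync } from "node:fs";

// captureElementTree comes from Domotion (import omitted here); `page` and
// `vp` are the page and viewport from your capture setup.
const tree = await captureElementTree(page, "body", vp);
writeFileSync("tree.json", JSON.stringify(tree, null, 2));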
Open tree.json alongside the rendered SVG and you can map
each element 1:1 between the two — useful when something looks wrong and
you need to know whether the bug is in capture or render.
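For large trees, a compact outline is often easier to scan than the full JSON. A minimal sketch, using only the `tag`, geometry, and `children` fields shown earlier (`OutlineNode` and `outline` are hypothetical names):

```typescript
// Hypothetical trimmed-down type: the fields of CapturedElement used below.
interface OutlineNode {
  tag: string;
  x: number;
  y: number;
  width: number;
  height: number;
  children: OutlineNode[];
}

// One line per element, indented by depth, with its box in viewport coordinates.
function outline(els: OutlineNode[], depth = 0, out: string[] = []): string[] {
  const indent = new Array(depth + 1).join("  ");
  for (const el of els) {
    out.push(`${indent}${el.tag} @ ${el.x},${el.y} ${el.width}x${el.height}`);
    outline(el.children, depth + 1, out);
  }
  return out;
}
```

Printing `outline(tree).join("\n")` gives a quick way to check whether an element was captured at all, and where, before looking at the SVG.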
Mutating before render
Because the tree is plain data, you can transform it. Some patterns:
- Strip an element. Walk the tree and splice the offending node out of its parent's children.
- Replace text. Set el.text on the element and clear el.textSegments (the renderer will re-shape).
- Tint a region. Change el.styles.backgroundColor or el.styles.color.
- Translate a subtree. Add or modify el.styles.transform on the wrapping element.
If you change el.text after capture, you lose the
per-character xOffsets the capture script measured from
Chromium. The renderer will fall back to fontkit advance widths —
accurate enough for short labels, but text that should sub-pixel-align
with adjacent elements may drift slightly.
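The first two patterns can be sketched as small helpers. These are illustrations under assumptions, not Domotion APIs: `MutableNode`, `stripWhere`, and `replaceText` are hypothetical names, and the node type is cut down to the fields the helpers touch.

```typescript
// Hypothetical trimmed-down type for this sketch.
interface MutableNode {
  tag: string;
  text: string;
  children: MutableNode[];
  textSegments?: unknown[];
}

// Strip: remove every node matching the predicate, in place. Iterating
// backwards keeps the remaining indices valid while splicing.
function stripWhere(
  els: MutableNode[],
  shouldRemove: (el: MutableNode) => boolean,
): void {
  for (let i = els.length - 1; i >= 0; i--) {
    const el = els[i];
    if (!el) continue;
    if (shouldRemove(el)) {
      els.splice(i, 1);
    } else {
      stripWhere(el.children, shouldRemove);
    }
  }
}

// Replace text: set the new string and drop the captured segments, which
// also drops the measured per-character offsets described above.
function replaceText(el: MutableNode, text: string): void {
  el.text = text;
  delete el.textSegments;
}
```

Both helpers mutate the tree in place, so clone it first (for example with `JSON.parse(JSON.stringify(tree))`) if you need to keep the original capture.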
Next
Text rendering explains how the renderer
turns the textSegments on each element into SVG glyph paths.