xmldom @0.1.31
A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.
Dev Dependencies (1)
| Package | Constraint | Registry Status |
|---|---|---|
| proof | 0.0.28 | Not imported |
SAST Findings (9)
CVSS 9.8 (CRITICAL) — CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

### Impact

xmldom parses XML that is not well-formed because it contains multiple top-level elements, and adds all root nodes to the `childNodes` collection of the `Document` without reporting an error or throwing. This breaks the assumption that there is only a single root node in the tree, which led to https://nvd.nist.gov/vuln/detail/CVE-2022-39299 and is a potential issue for dependents.

### Patches

Update to `@xmldom/xmldom@~0.7.7`, `@xmldom/xmldom@~0.8.4` (dist-tag `latest`) or `@xmldom/xmldom@>=0.9.0-beta.4` (dist-tag `next`).

### Workarounds

One of the following approaches might help, depending on your use case:

- Instead of searching for elements in the whole DOM, only search in the `documentElement`.
- Reject a document that has more than one `childNode`.

### References

- https://nvd.nist.gov/vuln/detail/CVE-2022-39299
- https://github.com/jindw/xmldom/issues/150

### For more information

If you have any questions or comments about this advisory:

* Email us at [email protected]
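The second workaround can be sketched as a small guard that runs right after parsing. This is a minimal illustration, not part of xmldom: `countRootElements` and `assertSingleRoot` are hypothetical names, and the sketch assumes it is acceptable to count only element children (so a leading comment or DOCTYPE does not cause a rejection), which is slightly more permissive than rejecting on any second `childNode`.

```js
// Hypothetical helpers (not part of xmldom). They rely only on the standard
// DOM properties firstChild, nextSibling and nodeType.
const ELEMENT_NODE = 1;

// Counts the element nodes that sit directly under the Document.
function countRootElements(doc) {
  let count = 0;
  for (let child = doc.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === ELEMENT_NODE) count += 1;
  }
  return count;
}

// Throws unless the document has exactly one root element.
function assertSingleRoot(doc) {
  const roots = countRootElements(doc);
  if (roots !== 1) {
    throw new Error('Expected exactly one root element, found ' + roots);
  }
  return doc;
}
```

A caller would wrap the parse result, for example `assertSingleRoot(new DOMParser().parseFromString(xml, 'text/xml'))`, and treat the thrown error as a rejected input.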
## Summary Seven recursive traversals in `lib/dom.js` operate without a depth limit. A sufficiently deeply nested DOM tree causes a `RangeError: Maximum call stack size exceeded`, crashing the application. **Reported operations:** - `Node.prototype.normalize()` — reported by @praveen-kv (email 2026-04-05) and @KarimTantawey (GHSA-fwmp-8wwc-qhv6, via `DOMParser.parseFromString()`) - `XMLSerializer.serializeToString()` — reported by @Jvr2022 (GHSA-2v35-w6hq-6mfw) and @KarimTantawey (GHSA-j2hf-fqwf-rrjf) **Additionally, discovered in research:** - `Element.getElementsByTagName()` / `getElementsByTagNameNS()` / `getElementsByClassName()` / `getElementById()` - `Node.cloneNode(true)` - `Document.importNode(node, true)` - `node.textContent` (getter) - `Node.isEqualNode(other)` All seven share the same root cause: pure-JavaScript recursive tree traversal with no depth guard. A single deeply nested document (parsed successfully) triggers any or all of these operations. --- ## Details ### Root cause `lib/dom.js` implements DOM tree traversals as depth-first recursive functions. Each level of element nesting adds one JavaScript call frame. The JS engine's call stack is finite; once exhausted, a `RangeError: Maximum call stack size exceeded` is thrown. This error may not be caught reliably at stack-exhaustion depths because the catch handler itself requires stack frames to execute — especially in async scenarios, where an uncaught `RangeError` inside a callback or promise chain can crash the entire Node.js process. Parsing a deeply nested document **succeeds** — the SAX parser in `lib/sax.js` is iterative. The crash occurs during subsequent operations on the parsed DOM. 
### `Node.prototype.normalize()` — reported by @praveen-kv

[`lib/dom.js:1296–1308`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1296-L1308) (main):

```js
normalize: function () {
	var child = this.firstChild;
	while (child) {
		var next = child.nextSibling;
		if (next && next.nodeType == TEXT_NODE && child.nodeType == TEXT_NODE) {
			this.removeChild(next);
			child.appendData(next.data);
		} else {
			child.normalize(); // recursive call — no depth guard
			child = next;
		}
	}
},
```

Crash threshold (Node.js 18, default stack): ~10,000 levels.

### `XMLSerializer.serializeToString()` — reported by @Jvr2022

[`lib/dom.js:2790–2974`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L2790-L2974) (main):

The internal `serializeToString` worker recurses into child nodes at four call sites, each passing a `visibleNamespaces.slice()` copy. The per-frame allocation causes earlier stack exhaustion than `normalize()`.

Crash threshold (Node.js 18, default stack): ~5,000 levels.

### Additional recursive entry points

All five crash at ~10,000 levels on Node.js 18.
| Function | Definition | Public API entry point(s) | Crash depth (Node.js 18) |
|---|---|---|---|
| `_visitNode` | [`lib/dom.js:1529`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1529) | `getElementsByTagName()`, `getElementsByTagNameNS()`, `getElementsByClassName()`, `getElementById()` | ~10,000 levels |
| `cloneNode` (module fn) | [`lib/dom.js:3037`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L3037) | `Node.prototype.cloneNode(true)` | ~10,000 levels |
| `importNode` (module fn) | [`lib/dom.js:2975`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L2975) | `Document.prototype.importNode(node, true)` | ~10,000 levels |
| `getTextContent` (inner fn) | [`lib/dom.js:3130`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L3130) | `node.textContent` (getter) | ~10,000 levels |
| `isEqualNode` | [`lib/dom.js:1120`](https://github.com/xmldom/xmldom/blob/9ef2fd297ca527a05ecb11979850317a927cd20c/lib/dom.js#L1120) | `Node.prototype.isEqualNode(other)` | ~10,000 levels |

Both active branches (`main` and `release-0.8.x`) are identically affected. The unscoped `xmldom` package (≤ 0.6.0) carries the same recursive patterns from its initial commit.

### Browser behavior

Tested with Chromium 147 (Playwright headless). Chromium's native C++ implementations of all seven DOM methods are **iterative** — they traverse the DOM without consuming JS call stack frames. All seven succeed at depths up to 20,000 without any crash.
When `@xmldom/xmldom` is bundled and run in a browser context, the same recursive JS code executes under the browser's V8 stack limit (~12,000–13,000 frames). The crash thresholds are similar to those observed on Node.js 18 (~5,000 for `serializeToString`, ~10,000 for the remaining six). The vulnerability is specific to xmldom's pure-JavaScript recursive implementation, not an inherent property of the DOM operations.

---

## PoC

### `normalize()` (from @praveen-kv report, 2026-04-05)

```js
const { DOMParser } = require('@xmldom/xmldom');

// Builds <root><a><a>...text...</a></a></root> with `depth` nested <a> elements.
function generateNestedXML(depth) {
  return '<root>' + '<a>'.repeat(depth) + 'text' + '</a>'.repeat(depth) + '</root>';
}

const doc = new DOMParser().parseFromString(generateNestedXML(10000), 'text/xml');
doc.documentElement.normalize();
// RangeError: Maximum call stack size exceeded
```

### `XMLSerializer.serializeToString()` (from GHSA-2v35-w6hq-6mfw)

```js
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');

const depth = 5000;
const xml = '<a>'.repeat(depth) + '</a>'.repeat(depth);
const doc = new DOMParser().parseFromString(xml, 'text/xml');
new XMLSerializer().serializeToString(doc);
// RangeError: Maximum call stack size exceeded
```

The other methods have been verified using similar PoCs.

---

## Impact

Any service that accepts attacker-controlled XML and subsequently calls any of the seven affected DOM operations can be forced into a reliable denial of service with a single crafted payload. The immediate result is an uncaught `RangeError` and failed request processing. In deployments where uncaught exceptions terminate the worker or process, the impact can extend beyond a single request and disrupt service availability more broadly.

No authentication, special options, or invalid XML is required. A valid, deeply nested XML document is enough.
--- ## Disclosure The `normalize()` vector was publicly disclosed at 2026-04-06T11:25:07Z via [xmldom/xmldom#987](https://github.com/xmldom/xmldom/pull/987) (closed without merge). `serializeToString()` and the five additional recursive entry points were not mentioned in that PR. --- ## Fix Applied All seven affected traversals have been converted from recursive to iterative implementations, eliminating call-stack consumption on deep trees. ### `walkDOM` utility A new `walkDOM(node, context, callbacks)` utility is introduced. It traverses the subtree rooted at `node` in depth-first order using an explicit JavaScript array as a stack, consuming heap memory instead of call-stack frames. `context` is an arbitrary value threaded through the walk — each `callbacks.enter(node, context)` call returns the context to pass to that node's children, enabling per-branch state (e.g. namespace snapshots in the serializer). `callbacks.exit(node, context)` (optional) is called in post-order after all children have been visited. The following six operations are re-implemented on top of `walkDOM`: | Operation | Public entry point(s) | |---|---| | `_visitNode` helper | `getElementsByTagName()`, `getElementsByTagNameNS()`, `getElementsByClassName()`, `getElementById()` | | `getTextContent` inner function | `node.textContent` getter | | `cloneNode` module function | `Node.prototype.cloneNode(true)` | | `importNode` module function | `Document.prototype.importNode(node, true)` | | `serializeToString` worker | `XMLSerializer.prototype.serializeToString()`, `Node.prototype.toString()`, `NodeList.prototype.toString()` | | `normalize` | `Node.prototype.normalize()` | `normalize` uses `walkDOM` with a `null` context and an `enter` callback that merges adjacent Text children of the current node before `walkDOM` reads and queues those children — so the surviving post-merge children are what the walker descends into. 
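The explicit-stack traversal described above can be sketched roughly as follows. This is an illustration only, not xmldom's actual `walkDOM` source: the name `walkTree`, the frame layout, and the choice to hand `exit` the context returned by `enter` are all assumptions made for the example. `collectText` is a hypothetical consumer in the spirit of the `textContent` getter.

```js
// Sketch of an iterative depth-first walk with enter/exit callbacks.
// Frames live in a heap-allocated array, so nesting depth does not consume
// JavaScript call-stack frames.
function walkTree(root, context, callbacks) {
  const stack = [{ node: root, context: context, exiting: false }];
  while (stack.length > 0) {
    const frame = stack.pop();
    if (frame.exiting) {
      if (callbacks.exit) callbacks.exit(frame.node, frame.context);
      continue;
    }
    // enter() returns the context that this node's children will receive.
    const childContext = callbacks.enter(frame.node, frame.context);
    // Schedule the post-order exit call; it is popped after all descendants.
    stack.push({ node: frame.node, context: childContext, exiting: true });
    // Children are read only after enter() has run, so enter() may still
    // modify them. They are pushed in reverse so the first child pops first.
    const children = [];
    for (let c = frame.node.firstChild; c; c = c.nextSibling) children.push(c);
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ node: children[i], context: childContext, exiting: false });
    }
  }
}

// Hypothetical consumer: concatenates the data of all Text nodes (nodeType 3).
function collectText(root) {
  const parts = [];
  walkTree(root, null, {
    enter: function (node, ctx) {
      if (node.nodeType === 3) parts.push(node.data);
      return ctx;
    },
  });
  return parts.join('');
}
```

Because the only per-level cost is one array entry, the walk's depth limit is set by available heap memory rather than by the engine's call-stack size.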
### Custom iterative loop for `isEqualNode` One function cannot use `walkDOM`: **`Node.prototype.isEqualNode(other)`** (0.9.x only; absent from 0.8.x) compares two trees in parallel. It maintains an explicit stack of `{node, other}` node pairs — one node from each tree — which cannot be expressed with `walkDOM`'s single-tree visitor. ### After the fix All seven entry points succeed on trees of arbitrary depth without throwing `RangeError`. The original PoCs still demonstrate the vulnerability on unpatched versions and confirm the fix on patched versions.
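The paired-stack idea behind the iterative `isEqualNode` can be sketched as below. This is a simplified illustration and not the library's code: `isEqualTree` is a hypothetical name, and the comparison covers only `nodeType`, `nodeName`, `nodeValue` and child structure, whereas a real `isEqualNode` also has to compare attributes, namespaces and doctype fields.

```js
// Sketch: compare two trees in lockstep using an explicit stack of node pairs.
function isEqualTree(a, b) {
  const stack = [{ node: a, other: b }];
  while (stack.length > 0) {
    const pair = stack.pop();
    const node = pair.node;
    const other = pair.other;
    if (
      node.nodeType !== other.nodeType ||
      node.nodeName !== other.nodeName ||
      node.nodeValue !== other.nodeValue
    ) {
      return false;
    }
    // Walk both child lists together and queue each pair for comparison.
    let c1 = node.firstChild;
    let c2 = other.firstChild;
    while (c1 && c2) {
      stack.push({ node: c1, other: c2 });
      c1 = c1.nextSibling;
      c2 = c2.nextSibling;
    }
    // One list ended before the other: the child counts differ.
    if (c1 || c2) return false;
  }
  return true;
}
```

Each stack entry carries one node from each tree, which is the part a single-tree visitor cannot express.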
## Summary The package serializes `DocumentType` node fields (`internalSubset`, `publicId`, `systemId`) verbatim without any escaping or validation. When these fields are set programmatically to attacker-controlled strings, `XMLSerializer.serializeToString` can produce output where the DOCTYPE declaration is terminated early and arbitrary markup appears outside it. --- ## Details `DOMImplementation.createDocumentType(qualifiedName, publicId, systemId, internalSubset)` validates only `qualifiedName` against the XML QName production. The remaining three arguments are stored as-is with no validation. The XMLSerializer emits `DocumentType` nodes as: ``` <!DOCTYPE name[ PUBLIC pubid][ SYSTEM sysid][ [internalSubset]]> ``` All fields are pushed into the output buffer verbatim — no escaping, no quoting added. **`internalSubset` injection:** The serializer wraps `internalSubset` with ` [` and `]`. A value containing `]>` closes the internal subset and the DOCTYPE declaration at the injection point. Any content after `]>` in `internalSubset` appears outside the DOCTYPE in the serialized output as raw XML markup. Reported by @TharVid (GHSA-f6ww-3ggp-fr8h). Affected: `@xmldom/xmldom` ≥ 0.9.0 via `createDocumentType` API; 0.8.x only via direct property write. **`publicId` injection:** The serializer emits `publicId` verbatim after `PUBLIC` with no quoting added. A value containing an injected system identifier (e.g., `"pubid" SYSTEM "evil"`) breaks the intended quoting context, injecting a fake SYSTEM entry into the serialized DOCTYPE declaration. Identified during internal security research. Affected: both branches, all versions back to 0.1.0. **`systemId` injection:** The serializer emits `systemId` verbatim. A value containing `>` terminates the DOCTYPE declaration early; content after `>` appears as raw XML markup outside the DOCTYPE context. Identified during internal security research. Affected: both branches, all versions back to 0.1.0. 
The parse path is safe: the SAX parser enforces the `PubidLiteral` and `SystemLiteral` grammar productions, which exclude the relevant characters, and the internal subset parser only accepts a subset it can structurally validate. The vulnerability is reachable only through programmatic `createDocumentType` calls with attacker-controlled arguments. --- ## Affected code **`lib/dom.js` — `createDocumentType` (lines 898–910):** ```js createDocumentType: function (qualifiedName, publicId, systemId, internalSubset) { validateQualifiedName(qualifiedName); // only qualifiedName is validated var node = new DocumentType(PDC); node.name = qualifiedName; node.nodeName = qualifiedName; node.publicId = publicId || ''; // stored verbatim node.systemId = systemId || ''; // stored verbatim node.internalSubset = internalSubset || ''; // stored verbatim node.childNodes = new NodeList(); return node; }, ``` **`lib/dom.js` — serializer DOCTYPE case (lines 2948–2964):** ```js case DOCUMENT_TYPE_NODE: var pubid = node.publicId; var sysid = node.systemId; buf.push(g.DOCTYPE_DECL_START, ' ', node.name); if (pubid) { buf.push(' ', g.PUBLIC, ' ', pubid); if (sysid && sysid !== '.') { buf.push(' ', sysid); } } else if (sysid && sysid !== '.') { buf.push(' ', g.SYSTEM, ' ', sysid); } if (node.internalSubset) { buf.push(' [', node.internalSubset, ']'); // internalSubset emitted verbatim } buf.push('>'); return; ``` --- ## PoC ### internalSubset injection ```js const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom'); const impl = new DOMImplementation(); const doctype = impl.createDocumentType( 'root', '', '', ']><injected/><![CDATA[' ); const doc = impl.createDocument(null, 'root', doctype); const xml = new XMLSerializer().serializeToString(doc); console.log(xml); // <!DOCTYPE root []><injected/><![CDATA[]><root/> // ^^^^^^^^^^ injected element outside DOCTYPE ``` ### publicId quoting context break ```js const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom'); 
const impl = new DOMImplementation(); const doctype = impl.createDocumentType( 'root', '"injected PUBLIC_ID" SYSTEM "evil"', '', '' ); const doc = impl.createDocument(null, 'root', doctype); console.log(new XMLSerializer().serializeToString(doc)); // <!DOCTYPE root PUBLIC "injected PUBLIC_ID" SYSTEM "evil"><root/> // quoting context broken — SYSTEM entry injected ``` ### systemId injection ```js const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom'); const impl = new DOMImplementation(); const doctype = impl.createDocumentType( 'root', '', '"sysid"><injected attr="pwn"/>', '' ); const doc = impl.createDocument(null, 'root', doctype); console.log(new XMLSerializer().serializeToString(doc)); // <!DOCTYPE root SYSTEM "sysid"><injected attr="pwn"/>><root/> // > in sysid closes DOCTYPE early; <injected/> appears as sibling element ``` --- ## Impact An application that programmatically constructs `DocumentType` nodes from user-controlled data and then serializes the document can emit a DOCTYPE declaration where the internal subset is closed early or where injected SYSTEM entities or other declarations appear in the serialized output. Downstream XML parsers that re-parse the serialized output and expand entities from the injected DOCTYPE declarations may be susceptible to XXE-class attacks if they enable entity expansion. --- ## Fix Applied > **⚠ Opt-in required.** Protection is not automatic. Existing serialization calls remain > vulnerable unless `{ requireWellFormed: true }` is explicitly passed. Applications that pass > untrusted data to `createDocumentType()` or write untrusted values directly to a > `DocumentType` node's `publicId`, `systemId`, or `internalSubset` properties should audit > all `serializeToString()` call sites and add the option. `XMLSerializer.serializeToString()` now accepts an options object as a second argument. 
When `{ requireWellFormed: true }` is passed, the serializer validates the `DocumentType` node's `publicId`, `systemId`, and `internalSubset` fields before emitting the DOCTYPE declaration and throws `InvalidStateError` if any field contains an injection sequence: - **`publicId`**: throws if non-empty and does not match the XML `PubidLiteral` production (XML 1.0 [12]) - **`systemId`**: throws if non-empty and does not match the XML `SystemLiteral` production (XML 1.0 [11]) - **`internalSubset`**: throws if it contains `]>` (which closes the internal subset and DOCTYPE declaration early) All three checks apply regardless of how the invalid value entered the node — whether via `createDocumentType` arguments or a subsequent direct property write. ### PoC — fixed path ```js const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom'); const impl = new DOMImplementation(); // internalSubset injection const dt1 = impl.createDocumentType('root', '', '', ']><injected/><![CDATA['); const doc1 = impl.createDocument(null, 'root', dt1); // Default (unchanged): verbatim — injection present console.log(new XMLSerializer().serializeToString(doc1)); // <!DOCTYPE root []><injected/><![CDATA[]><root/> // Opt-in guard: throws InvalidStateError try { new XMLSerializer().serializeToString(doc1, { requireWellFormed: true }); } catch (e) { console.log(e.name, e.message); // InvalidStateError: DocumentType internalSubset contains "]>" } ``` The guard also covers post-creation property writes: ```js const dt2 = impl.createDocumentType('root', '', ''); dt2.systemId = '"sysid"><injected attr="pwn"/>'; const doc2 = impl.createDocument(null, 'root', dt2); new XMLSerializer().serializeToString(doc2, { requireWellFormed: true }); // InvalidStateError: DocumentType systemId is not a valid SystemLiteral ``` ### Why the default stays verbatim The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a `require well-formed` flag whose **default value is `false`**. 
With the flag unset, the spec permits verbatim serialization of DOCTYPE fields. Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in `requireWellFormed: true` flag allows applications that require injection safety to enable strict mode without breaking existing deployments.

### Residual limitation

`createDocumentType(qualifiedName, publicId, systemId[, internalSubset])` does not validate `publicId`, `systemId`, or `internalSubset` at creation time. Adding creation-time validation would be a breaking change and is deferred to a future breaking release.

When the default serialization path is used (without `requireWellFormed: true`), all three fields are still emitted verbatim. Applications that do not pass `requireWellFormed: true` remain exposed.
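Applications that build `DocumentType` nodes from untrusted input may also want to check the three fields themselves before calling `createDocumentType`. The sketch below mirrors the three checks listed for `requireWellFormed: true`; it is not xmldom code. The function name `findDoctypeProblems` is hypothetical, the regular expressions are written from the XML 1.0 `PubidLiteral` [12] and `SystemLiteral` [11] productions, and it assumes (as in the PoCs above) that the identifiers are stored with their surrounding quotes.

```js
// PubidLiteral: quoted string of PubidChar; the single-quoted form excludes "'".
const PUBID_LITERAL = /^(?:"[\x20\r\na-zA-Z0-9\-'()+,.\/:=?;!*#@$_%]*"|'[\x20\r\na-zA-Z0-9\-()+,.\/:=?;!*#@$_%]*')$/;
// SystemLiteral: any characters except the enclosing quote character.
const SYSTEM_LITERAL = /^(?:"[^"]*"|'[^']*')$/;

// Returns the names of the fields that would be unsafe to serialize verbatim.
function findDoctypeProblems(fields) {
  const problems = [];
  if (fields.publicId && !PUBID_LITERAL.test(fields.publicId)) {
    problems.push('publicId');
  }
  if (fields.systemId && !SYSTEM_LITERAL.test(fields.systemId)) {
    problems.push('systemId');
  }
  if (fields.internalSubset && fields.internalSubset.includes(']>')) {
    problems.push('internalSubset');
  }
  return problems;
}
```

An empty result means none of the three injection patterns described in this advisory is present; a non-empty result can be treated as a reason to reject the input before a node is ever created.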
## Summary The package allows attacker-controlled comment content to be serialized into XML without validating or neutralizing comment breaking sequences. As a result, an attacker can terminate the comment early and inject arbitrary XML nodes into the serialized output. --- ## Details The issue is in the DOM construction and serialization flow for comment nodes. When `createComment(data)` is called, the supplied string is stored as comment data through the generic character-data handling path. That content is kept as-is. Later, when the document is serialized, the serializer writes comment nodes by concatenating the XML comment delimiters with the stored `node.data` value directly. That behavior is unsafe because XML comments are a syntax-sensitive context. If attacker-controlled input contains a sequence that closes the comment, the serializer does not preserve it as literal comment text. Instead, it emits output where the remainder of the payload is treated as live XML markup. This is a real injection bug, not a formatting issue. The serializer already applies context-aware handling in other places, such as escaping text nodes and rewriting unsafe CDATA terminators. Comment content does not receive equivalent treatment. Because of that gap, untrusted data can break out of the comment boundary and modify the structure of the final XML document. 
--- ## PoC ```js const { DOMImplementation, DOMParser, XMLSerializer } = require('@xmldom/xmldom'); const doc = new DOMImplementation().createDocument(null, 'root', null); doc.documentElement.appendChild( doc.createComment('--><injected attr="1"/><!--') ); const xml = new XMLSerializer().serializeToString(doc); console.log(xml); // <root><!----><injected attr="1"/><!----></root> const reparsed = new DOMParser().parseFromString(xml, 'text/xml'); console.log(reparsed.documentElement.childNodes.item(1).nodeName); // injected ``` --- ## Impact An application that uses the package to build XML from untrusted input can be made to emit attacker-controlled elements outside the intended comment boundary. That allows the attacker to alter the meaning and structure of generated XML documents. In practice, this can affect any workflow that generates XML and then stores it, forwards it, signs it, or hands it to another parser. Realistic targets include XML-based configuration, policy documents, and message formats where downstream consumers trust the serialized structure. --- ## Disclosure This vulnerability was publicly disclosed at 2026-04-06T11:25:07Z via [xmldom/xmldom#987](https://github.com/xmldom/xmldom/pull/987), which was subsequently closed without being merged. --- ## Fix Applied > **⚠ Opt-in required.** Protection is not automatic. Existing serialization calls remain > vulnerable unless `{ requireWellFormed: true }` is explicitly passed. Applications that pass > untrusted data to `createComment()` or mutate comment nodes with untrusted input (via > `appendData`, `insertData`, `replaceData`, `.data =`, or `.textContent =`) should audit all > `serializeToString()` call sites and add the option. `XMLSerializer.serializeToString()` now accepts an options object as a second argument. When `{ requireWellFormed: true }` is passed, the serializer throws `InvalidStateError` before emitting a Comment node whose `.data` would produce malformed XML. 
On `@xmldom/xmldom` ≥ 0.9.10, the full W3C DOM Parsing §3.2.1.4 check is applied: throws if `.data` contains `--` anywhere, ends with `-`, or contains characters outside the XML Char production. On `@xmldom/xmldom` ≥ 0.8.13 (LTS), only the `-->` injection sequence is checked. The `0.8.x` SAX parser accepts comments containing `--` (without `>`), so throwing on bare `--` would break a previously-working round-trip on that branch. The `-->` check is sufficient to prevent injection. ### PoC — fixed path ```js const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom'); const doc = new DOMImplementation().createDocument(null, 'root', null); doc.documentElement.appendChild(doc.createComment('--><injected attr="1"/><!--')); // Default (unchanged): verbatim — injection present const unsafe = new XMLSerializer().serializeToString(doc); console.log(unsafe); // <root><!----><injected attr="1"/><!----></root> // Opt-in guard: throws InvalidStateError before serializing try { new XMLSerializer().serializeToString(doc, { requireWellFormed: true }); } catch (e) { console.log(e.name, e.message); // InvalidStateError: The comment node data contains "--" or ends with "-" (0.9.x) // InvalidStateError: The comment node data contains "-->" (0.8.x — only --> is checked) } ``` ### Why the default stays verbatim The W3C DOM Parsing and Serialization spec §3.2.1.4 defines a `require well-formed` flag whose **default value is `false`**. With the flag unset, the spec explicitly permits serializing ill-formed comment content verbatim — this is also the behavior of browser implementations (Chrome, Firefox, Safari): `new XMLSerializer().serializeToString(doc)` produces the injection sequence without error in all major browsers. Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in `requireWellFormed: true` flag allows applications that require injection safety to enable strict mode without breaking existing deployments. 
### Residual limitation

The fix operates at serialization time only. There is no creation-time check in `createComment` — the spec does not require one for comment data. Any path that leads to a Comment node with `--` in its data (`createComment`, `appendData`, `.data =`, etc.) produces a node that serializes safely only when `{ requireWellFormed: true }` is passed to `serializeToString`.
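Since nothing is checked at creation time, callers that accept untrusted comment text may prefer to validate it before calling `createComment`. The sketch below reproduces the three conditions listed for the 0.9.x check (a `--` anywhere, a trailing `-`, or characters outside the XML `Char` production). It is an application-side illustration, not xmldom code, and `commentDataProblem` is a hypothetical name.

```js
// XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const XML_CHARS = /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

// Returns a short description of the first problem found, or null if the
// data can be placed between <!-- and --> without breaking the comment.
function commentDataProblem(data) {
  if (data.includes('--')) return 'contains "--"';
  if (data.endsWith('-')) return 'ends with "-"';
  if (!XML_CHARS.test(data)) return 'contains a character outside the XML Char production';
  return null;
}
```

The `--` test also covers the `-->` injection sequence, because every occurrence of `-->` contains `--`.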
CVSS 7.5 (HIGH) — CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N ## Summary `@xmldom/xmldom` allows attacker-controlled strings containing the CDATA terminator `]]>` to be inserted into a `CDATASection` node. During serialization, `XMLSerializer` emitted the CDATA content verbatim without rejecting or safely splitting the terminator. As a result, data intended to remain text-only became **active XML markup** in the serialized output, enabling XML structure injection and downstream business-logic manipulation. The sequence `]]>` is not allowed inside CDATA content and must be rejected or safely handled during serialization. ([MDN Web Docs](https://developer.mozilla.org/)) ### Attack surface `Document.createCDATASection(data)` is the most direct entry point, but it is not the only one. The WHATWG DOM spec intentionally does not validate `]]>` in mutation methods — only `createCDATASection` carries that guard. The following paths therefore also allow `]]>` to enter a CDATASection node and reach the serializer: - `CharacterData.appendData()` - `CharacterData.replaceData()` - `CharacterData.insertData()` - Direct assignment to `.data` - Direct assignment to `.textContent` (Note: assigning to `.nodeValue` does **not** update `.data` in this implementation — the serializer reads `.data` directly — so `.nodeValue` is not an exploitable path.) ### Parse path Parsing XML that contains a CDATA section is **not** affected. The SAX parser's non-greedy `CDSect` regex stops at the first `]]>`, so parsed CDATA data never contains the terminator. --- ## Impact If an application uses `xmldom` to generate "trusted" XML documents that embed **untrusted user input** inside CDATA (a common pattern in exports, feeds, SOAP/XML integrations, etc.), an attacker can inject additional XML elements/attributes into the generated document. This can lead to: - Integrity violation of generated XML documents. 
- Business-logic injection in downstream consumers (e.g., injecting `<approved>true</approved>`, `<role>admin</role>`, workflow flags, or other security-relevant elements). - Unexpected privilege/workflow decisions if downstream logic assumes injected nodes cannot appear. This issue does **not** require malformed parsers or browser behavior; it is caused by serialization producing attacker-influenced XML markup. --- ## Root Cause (with file + line numbers) **File:** `lib/dom.js` ### 1. No validation in `createCDATASection` `createCDATASection: function (data)` accepts any string and appends it directly. - **Lines 2216–2221** (0.9.8) ### 2. Unsafe CDATA serialization Serializer prints CDATA sections as: ``` <![CDATA[ + node.data + ]]> ``` without handling `]]>` in the data. - **Lines 2919–2920** (0.9.8) Because CDATA content is emitted verbatim, an embedded `]]>` closes the CDATA section early and the remainder of the attacker-controlled payload is interpreted as markup in the serialized XML. --- ## Proof of Concept — Fix A: `createCDATASection` now throws On patched versions, passing `]]>` directly to `createCDATASection` throws `InvalidCharacterError` instead of silently accepting the payload: ```js const { DOMImplementation } = require('./lib'); const doc = new DOMImplementation().createDocument(null, 'root', null); try { doc.createCDATASection('SAFE]]><injected attr="pwn"/>'); console.log('VULNERABLE — no error thrown'); } catch (e) { console.log('FIXED — threw:', e.name); // InvalidCharacterError } ``` Expected output on patched versions: ``` FIXED — threw: InvalidCharacterError ``` --- ## Proof of Concept — Fix B: mutation vector now safe On patched versions, injecting `]]>` via a mutation method (`appendData`, `replaceData`, `.data =`, `.textContent =`) no longer produces injectable output. 
The serializer splits the terminator so the result round-trips as safe text: ```js const { DOMImplementation, XMLSerializer } = require('./lib'); const { DOMParser } = require('./lib'); const doc = new DOMImplementation().createDocument(null, 'root', null); // Start with safe data, then mutate to include the terminator const cdata = doc.createCDATASection('safe'); doc.documentElement.appendChild(cdata); cdata.appendData(']]><injected attr="pwn"/><more>TEXT</more><![CDATA['); const out = new XMLSerializer().serializeToString(doc); console.log('Serialized:', out); const reparsed = new DOMParser().parseFromString(out, 'text/xml'); const injected = reparsed.getElementsByTagName('injected').length > 0; console.log('Injected element found in reparsed doc:', injected); // VULNERABLE: true | FIXED: false ``` Expected output on patched versions: ``` Serialized: <root><![CDATA[safe]]]]><![CDATA[><injected attr="pwn"/><more>TEXT</more><![CDATA[]]></root> Injected element found in reparsed doc: false ``` --- ## Fix Applied Both mitigations were implemented: ### Option A — Strict/spec-aligned: reject `]]>` in `createCDATASection()` `Document.createCDATASection(data)` now throws `InvalidCharacterError` (per the [WHATWG DOM spec](https://dom.spec.whatwg.org/#dom-document-createcdatasection)) when `data` contains `]]>`. This closes the direct entry point. Code that previously passed a string containing `]]>` to `createCDATASection` and relied on the silent/unsafe behaviour will now receive `InvalidCharacterError`. Use a mutation method such as `appendData` if you intentionally need `]]>` in a CDATASection node's data (the serializer split in Option B will keep the output safe). ### Option B — Defensive serialization: split the terminator during serialization `XMLSerializer` now replaces every occurrence of `]]>` in CDATA section data with the split sequence `]]]]><![CDATA[>` before emitting. 
This closes all mutation-vector paths that Option A alone cannot guard, and means the serialized output is always well-formed XML regardless of how `]]>` entered the node.
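The split in Option B can be reproduced in isolation. The sketch below uses the replacement sequence quoted above; the function names `serializeCdata` and `readCdataSections` are hypothetical, and the regex-based reader is only a stand-in for a real parser, included to show that the split sections join back into the original data.

```js
// Emits CDATA markup for `data`, splitting every "]]>" across two sections.
function serializeCdata(data) {
  return '<![CDATA[' + data.replace(/]]>/g, ']]]]><![CDATA[>') + ']]>';
}

// Stand-in reader: concatenates the content of all CDATA sections in `xml`.
function readCdataSections(xml) {
  let text = '';
  const pattern = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    text += match[1];
  }
  return text;
}

const payload = 'safe]]><injected attr="pwn"/>';
const markup = serializeCdata(payload);
console.log(markup);
console.log(readCdataSections(markup) === payload);
```

The first section ends with `]]` and the second begins with `>`, so the terminator never appears intact inside a single section while the concatenated text stays unchanged.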
## Summary The package allows attacker-controlled processing instruction data to be serialized into XML without validating or neutralizing the PI-closing sequence `?>`. As a result, an attacker can terminate the processing instruction early and inject arbitrary XML nodes into the serialized output. --- ## Details The issue is in the DOM construction and serialization flow for processing instruction nodes. When `createProcessingInstruction(target, data)` is called, the supplied `data` string is stored directly on the node without validation. Later, when the document is serialized, the serializer writes PI nodes by concatenating `<?`, the target, a space, `node.data`, and `?>` directly. That behavior is unsafe because processing instructions are a syntax-sensitive context. The closing delimiter `?>` terminates the PI. If attacker-controlled input contains `?>`, the serializer does not preserve it as literal PI content. Instead, it emits output where the remainder of the payload is treated as live XML markup. The same class of vulnerability was previously addressed for CDATA sections (GHSA-wh4c-j3r5-mjhp / CVE-2026-34601), where `]]>` in CDATA data was handled by splitting. The serializer applies no equivalent protection to processing instruction data. --- ## Affected code **`lib/dom.js` — `createProcessingInstruction` (lines 2240–2246):** ```js createProcessingInstruction: function (target, data) { var node = new ProcessingInstruction(PDC); node.ownerDocument = this; node.childNodes = new NodeList(); node.nodeName = node.target = target; node.nodeValue = node.data = data; return node; }, ``` No validation is performed on `data`. Any string including `?>` is stored as-is. **`lib/dom.js` — serializer PI case (line 2966):** ```js case PROCESSING_INSTRUCTION_NODE: return buf.push('<?', node.target, ' ', node.data, '?>'); ``` `node.data` is emitted verbatim. 
If it contains `?>`, that sequence terminates the PI in the output stream and the remainder appears as active XML markup.

**Contrast: CDATA (line 2945, patched):**

```js
case CDATA_SECTION_NODE:
	return buf.push(g.CDATA_START, node.data.replace(/]]>/g, ']]]]><
```

[…], which was subsequently closed without being merged.

---

## Fix Applied

> **⚠ Opt-in required.** Protection is not automatic. Existing serialization calls remain
> vulnerable unless `{ requireWellFormed: true }` is explicitly passed. Applications that pass
> untrusted data to `createProcessingInstruction()` or mutate PI nodes with untrusted input
> (via `.data =` or `CharacterData` mutation methods) should audit all `serializeToString()`
> call sites and add the option.

`XMLSerializer.serializeToString()` now accepts an options object as a second argument. When `{ requireWellFormed: true }` is passed, the serializer throws `InvalidStateError` before emitting any ProcessingInstruction node whose `.data` contains `?>`. This check applies regardless of how `?>` entered the node, whether via `createProcessingInstruction` directly or a subsequent mutation (`.data =`, `CharacterData` methods).

On `@xmldom/xmldom` ≥ 0.9.10, the serializer additionally applies the full W3C DOM Parsing §3.2.1.7 checks when `requireWellFormed: true`:

1. **Target check**: throws `InvalidStateError` if the PI target contains a `:` character or is an ASCII case-insensitive match for `"xml"`.
2. **Data Char check**: throws `InvalidStateError` if the PI data contains characters outside the XML Char production.
3. **Data sequence check**: throws `InvalidStateError` if the PI data contains `?>`.

On `@xmldom/xmldom` ≥ 0.8.13 (LTS), only the `?>` data check (check 3) is applied. The target and XML Char checks are not included in the LTS fix.
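The three checks listed above can be sketched as a standalone function. This is only an illustration under stated assumptions: the name `checkProcessingInstruction` and the returned labels are hypothetical and not part of `@xmldom/xmldom`, and the library itself performs its checks inside the serializer and throws `InvalidStateError` rather than returning a list.

```javascript
// Hypothetical helper, NOT an @xmldom/xmldom API: mirrors the three
// well-formedness checks described above for one (target, data) pair.

// Anything outside the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const NON_XML_CHAR =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/u;

function checkProcessingInstruction(target, data) {
  const problems = [];
  // 1. Target check: no ':' and not a case-insensitive match for "xml".
  if (target.includes(':') || /^xml$/i.test(target)) {
    problems.push('target');
  }
  // 2. Data Char check: every character must match the Char production.
  if (NON_XML_CHAR.test(data)) {
    problems.push('char');
  }
  // 3. Data sequence check: the PI-closing delimiter must not appear.
  if (data.includes('?>')) {
    problems.push('sequence');
  }
  return problems;
}
```

An empty array means the pair passes all three checks. On the LTS line, only the third check corresponds to what the serializer enforces.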
### PoC: fixed path

```js
const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');

const doc = new DOMImplementation().createDocument(null, 'r', null);
doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?><z/><?q '));

// Default (unchanged): verbatim, injection present
const unsafe = new XMLSerializer().serializeToString(doc);
console.log(unsafe);
// <r><?a ?><z/><?q ?></r>

// Opt-in guard: throws InvalidStateError before serializing
try {
  new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
} catch (e) {
  console.log(e.name, e.message);
  // InvalidStateError: The ProcessingInstruction data contains "?>"
}
```

The guard catches `?>` regardless of when it was introduced:

```js
// Post-creation mutation: also caught at serialization time
const pi = doc.createProcessingInstruction('target', 'safe data');
doc.documentElement.appendChild(pi);
pi.data = 'safe?><injected/>';
new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
// InvalidStateError: The ProcessingInstruction data contains "?>"
```

### Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a `require well-formed` flag whose **default value is `false`**. With the flag unset, the spec explicitly permits serializing PI data verbatim. This matches browser behavior: Chrome, Firefox, and Safari all emit `?>` in PI data verbatim by default without error.

Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in `requireWellFormed: true` flag allows applications that require injection safety to enable strict mode without breaking existing code.

### Residual limitation

`createProcessingInstruction(target, data)` does not validate `data` at creation time.
The WHATWG DOM spec (§4.5 step 2) mandates an `InvalidCharacterError` when `data` contains `?>`; enforcing this check unconditionally at creation time is a breaking change and is deferred to a future breaking release. When the default serialization path is used (without `requireWellFormed: true`), PI data containing `?>` is still emitted verbatim. Applications that do not pass `requireWellFormed: true` remain exposed.
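
Until the creation-time check ships, an application could add its own guard around node creation. The following is a minimal sketch, assuming the caller controls every call site. The wrapper name `createCheckedProcessingInstruction` is hypothetical and not a library API, and the error is a plain `Error` labeled `InvalidCharacterError` only to mirror the WHATWG behavior described above.

```javascript
// Hypothetical application-side wrapper, NOT an @xmldom/xmldom API.
// Rejects PI data containing the closing delimiter before the node exists.
function createCheckedProcessingInstruction(doc, target, data) {
  if (String(data).includes('?>')) {
    const err = new Error('Processing instruction data must not contain "?>"');
    err.name = 'InvalidCharacterError'; // label chosen for illustration only
    throw err;
  }
  // Delegates to the document's own factory once the data is known to be safe.
  return doc.createProcessingInstruction(target, data);
}
```

A guard like this covers only creation. Later mutation via `.data =` or `CharacterData` methods bypasses it, so passing `{ requireWellFormed: true }` at serialization time remains the more complete protection.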
CVSS 6.5 (MEDIUM) - CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N

### Impact

xmldom versions 0.6.0 and older do not correctly escape special characters when serializing elements removed from their ancestor. This may lead to unexpected syntactic changes during XML processing in some downstream applications.

### Patches

Update to one of the fixed versions of `@xmldom/xmldom` (`>=0.7.0`).

See issue #271 for the status of publishing `xmldom` to npm, or join #270 for Q&A/discussion until it's resolved.

### Workarounds

Downstream applications can validate the input and reject the maliciously crafted documents.

### References

Similar to this one reported on the Go standard library:

- https://mattermost.com/blog/coordinated-disclosure-go-xml-vulnerabilities/
- https://mattermost.com/blog/securing-xml-implementations-across-the-web/

### For more information

If you have any questions or comments about this advisory:

* Open an issue in [`xmldom/xmldom`](https://github.com/xmldom/xmldom)
* Email us: send an email to **all** addresses that are shown by `npm owner ls @xmldom/xmldom`
CVSS 4.3 (MEDIUM) - CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:L/A:N

### Impact

xmldom versions 0.4.0 and older do not correctly preserve [system identifiers](https://www.w3.org/TR/2008/REC-xml-20081126/#d0e4313), [FPIs](https://en.wikipedia.org/wiki/Formal_Public_Identifier) or [namespaces](https://www.w3.org/TR/xml-names11/) when repeatedly parsing and serializing maliciously crafted documents. This may lead to unexpected syntactic changes during XML processing in some downstream applications.

### Patches

Update to 0.5.0 (once it is released)

### Workarounds

Downstream applications can validate the input and reject the maliciously crafted documents.

### References

Similar to this one reported on the Go standard library:

- https://mattermost.com/blog/coordinated-disclosure-go-xml-vulnerabilities/

### For more information

If you have any questions or comments about this advisory:

* Open an issue in [`xmldom/xmldom`](https://github.com/xmldom/xmldom)
* Email us: send an email to **all** addresses that are shown by `npm owner ls xmldom`
Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.
Review Summary
Risk score: 100 (capped from 188). Findings: 1 critical (+40), 5 high (+125), 2 medium (+20), 1 low (+3), 1 info (+0).
Commit: 91e456310880 Browse source
Published to npm: