Skip to content

chore: switch to npm, linkedom, sanitize-html#68

Merged
luispabon merged 4 commits into
masterfrom
npm-linkdom
Jun 17, 2026
Merged

chore: switch to npm, linkedom, sanitize-html#68
luispabon merged 4 commits into
masterfrom
npm-linkdom

Conversation

@luispabon

@luispabon luispabon commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Closes #35

Overview

Stack swap for the readability-js-server: drop pnpm, JSDOM, and DOMPurify. Bring in npm, linkedom, and sanitize-html. Drop the parser-worker recycle knob (JSDOM-only workaround). Net result: ~3.1x RSS reduction, ~12x heapUsed reduction under soak, with the public response shape and error envelope unchanged.

Three coupled changes:

  1. Package manager: pnpm/corepack -> npm. Resolves the ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSING friction on Node 22+ and standardizes on npm's default tooling.
  2. HTML parser: JSDOM 29 -> linkedom 0.18.12. Removes the C++-object leak that motivated the JSDOM child-process recycle.
  3. HTML sanitizer: DOMPurify 3 -> sanitize-html 2.17.5. DOMPurify is a silent no-op against a linkedom window (document.implementation.createHTMLDocument is missing); sanitize-html is a pure-JS string sanitizer with no DOM dependency, which is the right shape for this stack.

The child-process parser pool stays - it still provides crash isolation and main-loop protection. The PARSER_MAX_PARSES recycle knob is removed because the only reason for recycling was the JSDOM leak.

Public surface is unchanged: POST / request/response shape, error envelope, response field set, and GET / 400 guidance all stay frozen per AGENTS.md.

Commits

  • 4ab77f4 chore(deps): switch to npm, linkedom, sanitize-html
  • 9f390b7 chore: switch to npm, linkedom, sanitize-html
  • 658d9c6 Fix make test
  • f56f112 fix: restrict iframe and video src schemes to https (review fix)

Review fix

The first implementation pass dropped allowedSchemesByTag: { iframe: ['https'], video: ['https'] } from the sanitize-html config with a justification that it crashed the launder transitive dep. That justification was wrong (verified on sanitize-html 2.17.5 - it works fine) and the deviation was a real security-policy regression: an <iframe src="http://..."> would have survived sanitization. The plan's intent was https-only on embedded media. f56f112 restores the policy and adds a regression test in test/security.test.js that asserts http: iframe/video src is stripped and https: iframe src survives.

Verification

  • npm ci: 217 packages, 0 vulnerabilities
  • npm run lint: clean
  • npm test: 36/36 pass (35 prior + 1 new regression test)
  • test/security.test.js: 4/4 pass (3 prior + 1 new)
  • make helm-verify: clean
  • node --expose-gc scripts/memory-soak.js --requests 200 --concurrency 4 --sample-every 20: 0 failures, RSS plateau 81.3MB (vs JSDOM baseline 252.5MB, 3.1x reduction), heapUsed 11.6MB (12x reduction)
  • Final review verdict: pass_with_notes

Residual notes

  • sanitize-html emits a script/style XSS warning on every parse because of allowedTags: false. Noise only: Readability strips <script> upstream and event handlers/javascript: schemes are still stripped.
  • Docker build was not exercised in the review environment (no docker daemon). Recommend a local docker build -t readability-js:test . before merge as a final sanity check.

Drop pnpm (remove packageManager field). Remove jsdom and dompurify dependencies. Add linkedom@^0.18.12 and sanitize-html@^2.17.0. Regenerate lockfile with npm (package-lock.json). Delete pnpm-lock.yaml.
Plan pinned allowedSchemesByTag.iframe/video to [https] to match the
DOMPurify model. The first implementation dropped it with an incorrect
"crashes launder" justification. Restore the policy and add a regression
test.
@luispabon luispabon merged commit ed7a716 into master Jun 17, 2026
3 checks passed
@luispabon luispabon deleted the npm-linkdom branch June 17, 2026 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant