chore: switch to npm, linkedom, sanitize-html#68
Merged
Conversation
Drop pnpm (remove packageManager field). Remove jsdom and dompurify dependencies. Add linkedom@^0.18.12 and sanitize-html@^2.17.0. Regenerate lockfile with npm (package-lock.json). Delete pnpm-lock.yaml.
Plan pinned allowedSchemesByTag.iframe/video to [https] to match the DOMPurify model. The first implementation dropped it with an incorrect "crashes launder" justification. Restore the policy and add a regression test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #35
Overview
Stack swap for the readability-js-server: drop pnpm, JSDOM, and DOMPurify. Bring in npm, linkedom, and sanitize-html. Drop the parser-worker recycle knob (JSDOM-only workaround). Net result: ~3.1x RSS reduction, ~12x heapUsed reduction under soak, with the public response shape and error envelope unchanged.
Three coupled changes:
ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSINGfriction on Node 22+ and standardizes on npm's default tooling.document.implementation.createHTMLDocumentis missing); sanitize-html is a pure-JS string sanitizer with no DOM dependency, which is the right shape for this stack.The child-process parser pool stays - it still provides crash isolation and main-loop protection. The
PARSER_MAX_PARSESrecycle knob is removed because the only reason for recycling was the JSDOM leak.Public surface is unchanged:
POST /request/response shape, error envelope, response field set, andGET /400 guidance all stay frozen perAGENTS.md.Commits
4ab77f4chore(deps): switch to npm, linkedom, sanitize-html9f390b7chore: switch to npm, linkedom, sanitize-html658d9c6Fix make testf56f112fix: restrict iframe and video src schemes to https (review fix)Review fix
The first implementation pass dropped
allowedSchemesByTag: { iframe: ['https'], video: ['https'] }from the sanitize-html config with a justification that it crashed thelaundertransitive dep. That justification was wrong (verified on sanitize-html 2.17.5 - it works fine) and the deviation was a real security-policy regression: an<iframe src="http://...">would have survived sanitization. The plan's intent was https-only on embedded media.f56f112restores the policy and adds a regression test intest/security.test.jsthat asserts http: iframe/video src is stripped and https: iframe src survives.Verification
npm ci: 217 packages, 0 vulnerabilitiesnpm run lint: cleannpm test: 36/36 pass (35 prior + 1 new regression test)test/security.test.js: 4/4 pass (3 prior + 1 new)make helm-verify: cleannode --expose-gc scripts/memory-soak.js --requests 200 --concurrency 4 --sample-every 20: 0 failures, RSS plateau 81.3MB (vs JSDOM baseline 252.5MB, 3.1x reduction), heapUsed 11.6MB (12x reduction)pass_with_notesResidual notes
script/styleXSS warning on every parse because ofallowedTags: false. Noise only: Readability strips<script>upstream and event handlers/javascript:schemes are still stripped.docker build -t readability-js:test .before merge as a final sanity check.