feat: return content as markdown by default (contentFormat param + CONTENT_FORMAT env var)#63

Merged
luispabon merged 16 commits into master from cl/2026-06-17_markdown-output
Jun 17, 2026
Conversation

@luispabon luispabon commented Jun 17, 2026

Collaborator

Summary

  • content response field now returns markdown by default instead of sanitized HTML (breaking change — mitigated by CONTENT_FORMAT=html env var)
  • New contentFormat request body param ("markdown" | "html") lets consumers override the format per-request
  • New CONTENT_FORMAT env var sets the server-wide default (defaults to "markdown")
  • Pipeline: raw HTML → DOMPurify sanitize → Turndown convert (if markdown) — sanitization always runs regardless of format
  • Custom Turndown rules preserve media references: YouTube/Vimeo/Dailymotion iframes → [Video: Provider](url), unknown iframes → [Embedded content](url), video tags → [Video](src)
  • GFM plugin enabled (tables, strikethrough, task lists)
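
The media-reference mapping described above could be sketched as a pair of pure helpers whose output a custom Turndown rule might return from its replacement callback. This is an illustration only: the host table, `providerFor`, `iframeToMarkdown`, and `videoToMarkdown` are hypothetical names, and the PR does not say how providers are actually detected.

```javascript
// Hypothetical host-to-provider table. The PR names the providers,
// not the hostnames used to recognise them.
const VIDEO_PROVIDERS = [
  { name: "YouTube", hosts: ["youtube.com", "youtube-nocookie.com", "youtu.be"] },
  { name: "Vimeo", hosts: ["vimeo.com"] },
  { name: "Dailymotion", hosts: ["dailymotion.com"] },
];

// Returns the provider name for an iframe src, or null when unknown.
function providerFor(src) {
  let hostname;
  try {
    hostname = new URL(src).hostname.toLowerCase();
  } catch {
    return null; // relative or malformed URL: treat as unknown
  }
  const match = VIDEO_PROVIDERS.find(({ hosts }) =>
    hosts.some((h) => hostname === h || hostname.endsWith("." + h))
  );
  return match ? match.name : null;
}

// iframe -> [Video: Provider](url) or [Embedded content](url)
function iframeToMarkdown(src) {
  const provider = providerFor(src);
  return provider ? `[Video: ${provider}](${src})` : `[Embedded content](${src})`;
}

// <video src="..."> -> [Video](src)
function videoToMarkdown(src) {
  return `[Video](${src})`;
}
```

Keeping the mapping separate from the Turndown rule itself makes it easy to unit test without a DOM.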

Breaking change

Existing consumers expecting HTML in content must either set CONTENT_FORMAT=html (server-wide) or pass contentFormat: "html" per-request. Documented in README.
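
For consumers migrating, the per-request opt-in might look like the sketch below. Only `contentFormat` and `POST /` come from this PR; the `url` body field and the helper name `buildParseRequest` are assumptions made for illustration, so check the README for the real request shape.

```javascript
// Builds fetch() options for the service's POST / endpoint.
// Assumption: the body carries the page address in a `url` field.
function buildParseRequest(pageUrl, contentFormat) {
  const body = { url: pageUrl };
  // Omit contentFormat to accept the server-wide default (CONTENT_FORMAT).
  if (contentFormat !== undefined) body.contentFormat = contentFormat;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Usage (host and port are placeholders):
//   const res = await fetch("http://localhost:3000/", buildParseRequest(articleUrl, "html"));
```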

Test plan

  • pnpm test — 28/28 pass (default markdown, explicit html, invalid 400, sanitize+convert interaction, GFM tables, env var config)
  • pnpm memory:soak — 100 requests, 0 failures
  • docker build -t readability-js . — clean build

luispabon and others added 16 commits June 17, 2026 14:31
Issue #52 — return content as markdown by default with configurable
contentFormat param and CONTENT_FORMAT env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add parseContentFormat() validator function supporting "markdown" and "html"
- Add CONTENT_FORMAT to DEFAULTS with "markdown" as default
- Add contentFormat field to config object returned by loadConfig()
- Add contentFormat validation in validateConfig()
- Update test expectations to include new contentFormat field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
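
A minimal sketch of how the config pieces named in this commit could fit together. The function names match the commit message, but the signatures, the error text, and the trim/lowercase normalisation are assumptions rather than the merged code.

```javascript
const VALID_CONTENT_FORMATS = ["markdown", "html"];
const DEFAULTS = { CONTENT_FORMAT: "markdown" };

// Validates a raw CONTENT_FORMAT value; unset or empty falls back to the default.
// Assumption: this sketch normalises case and whitespace; the real validator may be stricter.
function parseContentFormat(value, fallback = DEFAULTS.CONTENT_FORMAT) {
  if (value === undefined || value === "") return fallback;
  const normalized = String(value).trim().toLowerCase();
  if (!VALID_CONTENT_FORMATS.includes(normalized)) {
    throw new Error(
      `CONTENT_FORMAT must be one of: ${VALID_CONTENT_FORMATS.join(", ")} (got "${value}")`
    );
  }
  return normalized;
}

// Reads the server-wide default from the environment.
function loadConfig(env = process.env) {
  return { contentFormat: parseContentFormat(env.CONTENT_FORMAT) };
}
```
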
Add optional contentFormat field to POST / request body validation.
Accepts "markdown" or "html" values, falls back to server config default
when omitted. Returns HTTP 400 with descriptive error message listing
valid options for invalid values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
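
The request-level behaviour described in this commit (use the body value if present, fall back to the server default, reject anything else with HTTP 400) might be sketched as follows. `resolveContentFormat`, its return shape, and the error wording are hypothetical and are not taken from the merged handler.

```javascript
const ALLOWED_FORMATS = ["markdown", "html"];

// Resolves the effective format for one request.
// Returns { ok: true, contentFormat } or { ok: false, status, error }.
function resolveContentFormat(body, serverDefault) {
  const requested = body.contentFormat;
  if (requested === undefined) {
    return { ok: true, contentFormat: serverDefault }; // field omitted: server config wins
  }
  if (!ALLOWED_FORMATS.includes(requested)) {
    return {
      ok: false,
      status: 400,
      error: `Invalid contentFormat. Valid options: ${ALLOWED_FORMATS.join(", ")}`,
    };
  }
  return { ok: true, contentFormat: requested };
}
```
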
Import toMarkdown from markdown.js and apply it after DOMPurify
sanitization when contentFormat is "markdown". Update existing
sanitization and media-tag test assertions to expect markdown output
since the default contentFormat is now "markdown".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
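
The ordering this commit describes (sanitize first, convert only when markdown is requested) can be illustrated with the sanitizer and converter passed in as parameters. `renderContent` is a hypothetical name. In the service these roles are presumably filled by DOMPurify's `sanitize` and the `toMarkdown` export of markdown.js, but the wiring shown here is an assumption.

```javascript
// Applies the pipeline: raw HTML -> sanitize -> optional markdown conversion.
// `sanitize` and `toMarkdown` are injected so the sketch has no dependencies.
function renderContent(rawHtml, contentFormat, { sanitize, toMarkdown }) {
  const safeHtml = sanitize(rawHtml); // always runs, whatever the format
  return contentFormat === "markdown" ? toMarkdown(safeHtml) : safeHtml;
}
```

Because the converter only ever sees sanitized HTML, switching formats cannot bypass sanitization.
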
Add documentation for the new markdown output feature:
- Document contentFormat request parameter (optional, overrides server default)
- Document CONTENT_FORMAT environment variable (server-wide default)
- Update example response showing markdown content format
- Add breaking change notice for consumers expecting HTML content
- Document opt-in paths for HTML output (per-request or server-wide)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@luispabon luispabon merged commit b422330 into master Jun 17, 2026
4 checks passed
@luispabon luispabon deleted the cl/2026-06-17_markdown-output branch June 17, 2026 14:15