Merge backfill feature branch#820
Conversation
Define the initial @fedify/backfill async generator API around a typed BackfillContext, note seed object, traversal options, and BackfillItem wrappers. The generator remains a stub so tests and traversal logic can be added in follow-up commits. Assisted-by: Codex:gpt-5
Add the initial context-posts traversal for @fedify/backfill. The implementation dereferences the seed object's context, accepts direct ActivityStreams collections and collection pages, yields post-like objects, and enforces request, item, interval, abort, and duplicate-id handling. Add tests for the PR 1 behavior across Deno, Node.js, and Bun. Assisted-by: Codex:gpt-5
Replace the scaffold status text with a short description of the initial context collection backfill behavior and a minimal usage example. Assisted-by: Codex:gpt-5
Remove unrelated lockfile churn from the backfill branch, keeping only the new package importer required for @fedify/backfill. Assisted-by: Codex:gpt-5
Skip collection URL items that fail to dereference instead of terminating the whole traversal, while still stopping when the request budget is exhausted. Also clean up interval abort listeners after successful waits. Assisted-by: Codex:gpt-5
Normalize every platform path separator in globbed test entries before passing those paths to tsdown. Assisted-by: Codex:gpt-5
When callers pass a string interval, report a clear error if Temporal is not available globally instead of failing with an unhelpful ReferenceError. Assisted-by: Codex:gpt-5
Make the interval option explicitly accept string durations, keep the internal duration helper aligned with that public type, and use strict Node assertions in the backfill tests. Explicitly externalize @fedify/vocab in the backfill tsdown build so vocabulary class identity remains stable for instanceof checks. Assisted-by: Codex:gpt-5
Replace the package README's VitePress-only installation block with plain Markdown, document the new public API surface with since tags, and mark exported option and result properties as readonly. Export the request-budget error class so callers can identify it when it escapes traversal internals. Assisted-by: Codex:gpt-5
Keep the new backfill API documentation on provisional 2.x.0 since tags until the feature branch is ready to merge and the target release version is known. Assisted-by: Codex:gpt-5
Add initial backfill package
Add context-auto, context-objects, and context-activities strategy handling for context collection backfill. The auto strategy classifies ordinary objects directly and extracts objects from supported Create activities while preserving shared request budgets, deduplication, abort, and interval behavior. Document the strategy behavior and add coverage for explicit, automatic, empty, overlapping, and duplicate strategy configurations. Assisted-by: Codex:gpt-5.5
Document the behavior of each context backfill strategy and make the context-auto precedence explicit in the public options comments. Tighten the README wording so it describes the currently supported Create activity handling instead of implying support for every Activity type. Assisted-by: Codex:gpt-5
Support FEP-f228 activity collection backfill
Extend the backfill public strategy and origin types for the upcoming reply-tree traversal. Clarify that context-auto absorbs only other context collection strategies so it can later compose with reply-tree traversal. Assisted-by: Codex:gpt-5.5
Run normalized strategies through a shared orchestration path so context collection traversal can compose with the upcoming reply-tree strategy. Keep context-auto absorbing only overlapping context collection strategies, while preserving global deduplication, budgets, and yield metadata handling. Assisted-by: Codex:gpt-5.5
Implement the reply-tree ancestor path through inReplyTo targets. Ancestors share the existing document loader, request budget, abort signal, and global output deduplication while keeping a separate visited set to prevent traversal cycles. Add coverage for embedded and dereferenced ancestors, maxDepth, maxRequests, cycle prevention, and deduplication against context collection results. Assisted-by: Codex:gpt-5.5
Extend reply-tree traversal to follow replies collections after ancestor lookup. Descendants reuse the existing collection loader, request budget, abort signal, and visited-state handling while reporting replies origin and reply-tree depth metadata. Add coverage for embedded and dereferenced replies collections, maxDepth, maxRequests, and descendant cycle prevention. Assisted-by: Codex:gpt-5.5
Update the backfill public API comments now that reply-tree traversal is implemented. Document how documentLoader, maxRequests, maxDepth, and item depth apply to reply targets and replies collections. Assisted-by: Codex:gpt-5.5
Preserve the PR 2 context collection behavior by loading the context collection once for adjacent context strategies. This keeps ordered strategy execution while avoiding duplicate context dereferences and request-budget regressions for explicit context strategy combinations. Add a regression test that combined context object and activity strategies share the same context collection load. Assisted-by: Codex:gpt-5.5
Describe how reply-tree composes with context collection backfill and clarify that reply-tree yields discovered post-like objects without unwrapping Activity objects. Assisted-by: Codex:gpt-5.5
Separate reply-tree object and collection visited state so traversal can avoid reloading or revisiting replies collections. Mark collection IRIs before loading them and keep embedded collection references in visited state to avoid reply-tree collection cycles. Add coverage for repeated replies collection IRIs sharing a single loader request. Assisted-by: Codex:gpt-5.5
Add tests that exercise shared maxItems, maxRequests, and abort behavior across context collection and reply-tree strategies. This protects the ordered hybrid execution path where context-auto runs before reply-tree. Assisted-by: Codex:gpt-5.5
Add a regression test for hybrid backfill ordering where reply-tree runs before context-auto. The test protects the contract that earlier strategies keep their BackfillItem metadata when later strategies discover the same object. Assisted-by: Codex:gpt-5.5
Clarify that backfill strategies run in order while sharing item, request, abort, and deduplication state. Also update the README to describe the opt-in reply-tree strategy alongside the FEP-f228 context collection path. Assisted-by: Codex:gpt-5.5
Limit context-auto absorption to the current adjacent context strategy group so reply-tree keeps acting as an ordering boundary. Add a regression test for context-objects before reply-tree followed by context-auto. Assisted-by: Codex:gpt-5.5
Avoid dereferencing context collection URL items whose object IDs are already known. This prevents fetching the seed object again when a context collection includes it and preserves request budget for unseen items. Assisted-by: Codex:gpt-5.5
Keep a traversal-local document cache in the shared request budget so overlapping strategies do not dereference the same IRI more than once. Cache hits bypass request budgeting and interval delays while thrown loader failures remain retryable. Assisted-by: Codex:gpt-5.5
Add a regression test proving failed documentLoader calls are removed from the traversal-local cache. A later strategy can retry the same IRI and yield the recovered object. Assisted-by: Codex:gpt-5.5
Start descendant traversal from discovered ancestors as well as the seed so reply-tree backfill does not miss sibling branches when the seed is itself a reply. Also skip already visited reply IRIs before dereferencing collection items so bounded crawls do not waste request budget on known objects. Add regressions covering both behaviors. Assisted-by: Codex:gpt-5.5
Apply a default maximum depth of 10 to ancestor and descendant reply-tree traversal while preserving explicit maxDepth overrides. Document the default and cover both traversal directions with tests. Assisted-by: Codex:gpt-5.4
Keep descendant depths relative to the original seed when walking from discovered ancestors, so maxDepth applies consistently across the tree. Allow replies collections to be retried after transient loader failures by marking their IDs visited only after a successful load. Add regression tests for both edge cases. Assisted-by: Codex:gpt-5.4
List Backfilling conversations among the FEPs supported by Fedify so FEDERATION.md reflects the new backfill package capabilities. Assisted-by: Codex:gpt-5.4
Add reply-tree backfill strategy
Add changes and package descriptions. Fix version to 2.3.0. Assisted-by: Codex:gpt-5.5
Publish the @fedify/backfill guide on the documentation website with installation commands, strategy examples, traversal controls, and cache and failure behavior. Add the package to the docs workspace so Twoslash can validate the examples. Assisted-by: Codex:gpt-5.5
Finalize backfill package documentation
✅ Deploy Preview for fedify-json-schema ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughThis PR adds the new Changes
Sequence Diagram(s)sequenceDiagram
actor Caller
participant backfill
participant RequestBudget
participant loadObject
Caller->>backfill: backfill(context, seed, options)
backfill->>RequestBudget: initialize maxRequests, interval, signal
loop each selected strategy
backfill->>loadObject: fetch collection/item/graph document
loadObject->>RequestBudget: check budget and cache
loadObject-->>backfill: APObject or null
backfill-->>Caller: yield BackfillItem
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the @fedify/backfill package, which enables the reconstruction of ActivityPub conversations from FEP-f228 context collections or reply relationships. The package includes comprehensive tests, documentation, and support for multiple traversal strategies. Feedback on the implementation highlights opportunities to optimize reverse array iteration by using an indexed for loop instead of .toReversed(), prevent potential runtime crashes by safely accessing note.contextIds with optional chaining, and improve abort handling by providing a fallback DOMException when budget.signal.reason is undefined.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| ancestors.push({ object: item.object, depth: item.depth }); | ||
| yield item; | ||
| } | ||
| for (const ancestor of ancestors.toReversed()) { |
There was a problem hiding this comment.
When iterating over an array in reverse, prefer an indexed for loop over creating a reversed copy with [...items].reverse() or .toReversed() if performance and memory allocation are critical.
for (let i = ancestors.length - 1; i >= 0; i--) {
const ancestor = ancestors[i];References
- When iterating over an array in reverse, prefer an indexed
forloop over creating a reversed copy with[...items].reverse()if performance and memory allocation are critical.
| readonly origin: "collection"; | ||
| readonly depth: 0; | ||
| }> { | ||
| const contextId = note.contextIds[0]; |
| await new Promise<void>((resolve, reject) => { | ||
| if (budget.signal?.aborted) { | ||
| reject(budget.signal.reason); | ||
| return; | ||
| } | ||
| const timeout = setTimeout(() => { | ||
| budget.signal?.removeEventListener("abort", onAbort); | ||
| resolve(); | ||
| }, milliseconds); | ||
| const onAbort = () => { | ||
| clearTimeout(timeout); | ||
| reject(budget.signal?.reason); | ||
| }; | ||
| budget.signal?.addEventListener("abort", onAbort, { once: true }); | ||
| }); |
There was a problem hiding this comment.
If budget.signal is aborted without an explicit reason, budget.signal.reason can be undefined in some environments or custom signal implementations. Rejecting a promise with undefined is an anti-pattern and can cause issues in catch blocks. Providing a fallback DOMException ensures robust error handling.
| await new Promise<void>((resolve, reject) => { | |
| if (budget.signal?.aborted) { | |
| reject(budget.signal.reason); | |
| return; | |
| } | |
| const timeout = setTimeout(() => { | |
| budget.signal?.removeEventListener("abort", onAbort); | |
| resolve(); | |
| }, milliseconds); | |
| const onAbort = () => { | |
| clearTimeout(timeout); | |
| reject(budget.signal?.reason); | |
| }; | |
| budget.signal?.addEventListener("abort", onAbort, { once: true }); | |
| }); | |
| await new Promise<void>((resolve, reject) => { | |
| if (budget.signal?.aborted) { | |
| reject(budget.signal.reason ?? new DOMException("The operation was aborted.", "AbortError")); | |
| return; | |
| } | |
| const timeout = setTimeout(() => { | |
| budget.signal?.removeEventListener("abort", onAbort); | |
| resolve(); | |
| }, milliseconds); | |
| const onAbort = () => { | |
| clearTimeout(timeout); | |
| reject(budget.signal?.reason ?? new DOMException("The operation was aborted.", "AbortError")); | |
| }; | |
| budget.signal?.addEventListener("abort", onAbort, { once: true }); | |
| }); |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGES.md`:
- Line 359: The CHANGES markdown contains a duplicate reference definition for
[`#801`], which is already defined earlier and should only appear once. Remove the
later [`#801`] definition in the changelog entry so the existing reference
definition remains the single source of truth and markdown lint no longer flags
the duplicate.
In `@packages/backfill/package.json`:
- Around line 64-66: The backfill test scripts are running both emitted copies
of each suite, which duplicates execution and can cause shared-state flakiness.
Update the test commands in package.json to target only one emitted format from
tsdown.config.ts, and use the matching runner invocation for that single format
in the test:bun/test scripts so the same tests are not discovered twice.
Reference the test script entries in package.json and the build output shape
from tsdown.config.ts when adjusting the glob/command.
In `@packages/backfill/src/backfill.ts`:
- Around line 145-160: The strategy normalizer currently only removes trailing
adjacent context strategies when it sees "context-auto", so non-adjacent context
strategies like "context-objects" can remain and violate the documented
absorption contract. Update the normalization logic in the strategy-processing
loop that checks "context-auto", "isContextStrategy", and
currentContextGroupHasAuto so that "context-auto" absorbs all other context
strategies already present in normalized, not just contiguous ones. Make sure
the resulting behavior matches the contract described in the backfill types
documentation and prevents duplicate context passes from being retained.
- Around line 660-680: The durationToMilliseconds helper is undercounting valid
Temporal.DurationLike inputs because it ignores weeks, microseconds, and
nanoseconds, and it silently drops calendar-based fields like months and years.
Update durationToMilliseconds to convert all supported components consistently
(or delegate to Temporal.Duration.from(...).total(...) for object inputs when
Temporal is available), and make sure the interval pacing logic in backfill.ts
uses the corrected total from durationToMilliseconds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d8e2551-67a3-413b-977f-54911e5e9329
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (17)
CHANGES.mdCONTRIBUTING.mdFEDERATION.mddeno.jsondocs/.vitepress/config.mtsdocs/manual/backfill.mddocs/package.jsonpackages/backfill/README.mdpackages/backfill/deno.jsonpackages/backfill/package.jsonpackages/backfill/src/backfill.test.tspackages/backfill/src/backfill.tspackages/backfill/src/mod.tspackages/backfill/src/types.tspackages/backfill/tsdown.config.tspackages/fedify/README.mdpnpm-workspace.yaml
| "test": "cd dist/ && node --test", | ||
| "pretest:bun": "pnpm build", | ||
| "test:bun": "cd dist/ && bun test --timeout 60000" |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that the test build emits both module formats while the scripts use dist-wide discovery.
sed -n '19,31p' packages/backfill/tsdown.config.ts
sed -n '58,66p' packages/backfill/package.jsonRepository: fedify-dev/fedify
Length of output: 875
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect related package test/build patterns and any dist test discovery hints.
git ls-files 'packages/*/package.json' 'packages/*/tsdown.config.ts' | sed -n '1,200p'
printf '\n--- backfill package.json ---\n'
cat -n packages/backfill/package.json | sed -n '55,75p'
printf '\n--- backfill tsdown.config.ts ---\n'
cat -n packages/backfill/tsdown.config.ts | sed -n '1,80p'
printf '\n--- other package test scripts ---\n'
rg -n '"test": "cd dist/ && (node --test|bun test)' packages -g 'package.json'Repository: fedify-dev/fedify
Length of output: 4184
Limit backfill tests to one emitted format. packages/backfill/tsdown.config.ts builds each test file as both *.test.js and *.test.cjs, and packages/backfill/package.json runs node --test/bun test over dist/. That duplicates the same suite and can make shared-state assertions flaky.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/backfill/package.json` around lines 64 - 66, The backfill test
scripts are running both emitted copies of each suite, which duplicates
execution and can cause shared-state flakiness. Update the test commands in
package.json to target only one emitted format from tsdown.config.ts, and use
the matching runner invocation for that single format in the test:bun/test
scripts so the same tests are not discovered twice. Reference the test script
entries in package.json and the build output shape from tsdown.config.ts when
adjusting the glob/command.
| for (const strategy of strategies) { | ||
| if (strategy === "context-auto") { | ||
| for ( | ||
| let i = normalized.length - 1; | ||
| i >= 0 && isContextStrategy(normalized[i]); | ||
| i-- | ||
| ) { | ||
| normalized.splice(i, 1); | ||
| } | ||
| if (!normalized.includes(strategy)) normalized.push(strategy); | ||
| } else if (isContextStrategy(strategy)) { | ||
| if ( | ||
| !currentContextGroupHasAuto(normalized) && | ||
| !normalized.includes(strategy) | ||
| ) { | ||
| normalized.push(strategy); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
context-auto only absorbs adjacent context strategies right now.
With ["context-objects", "reply-tree", "context-auto"], this normalizer keeps both context passes, even though the public contract in packages/backfill/src/types.ts Lines 84-86 says "context-auto" absorbs the other context strategies. That changes winner metadata and can spend extra requests unexpectedly. Please make the behavior and the documented contract agree.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/backfill/src/backfill.ts` around lines 145 - 160, The strategy
normalizer currently only removes trailing adjacent context strategies when it
sees "context-auto", so non-adjacent context strategies like "context-objects"
can remain and violate the documented absorption contract. Update the
normalization logic in the strategy-processing loop that checks "context-auto",
"isContextStrategy", and currentContextGroupHasAuto so that "context-auto"
absorbs all other context strategies already present in normalized, not just
contiguous ones. Make sure the resulting behavior matches the contract described
in the backfill types documentation and prevents duplicate context passes from
being retained.
| function durationToMilliseconds( | ||
| duration: Temporal.DurationLike | string, | ||
| ): number { | ||
| if (typeof duration === "string") { | ||
| if (typeof Temporal === "undefined") { | ||
| throw new TypeError( | ||
| "Temporal is not globally available; pass interval as a " + | ||
| "Temporal.DurationLike object instead of a string, or provide a " + | ||
| "Temporal polyfill.", | ||
| ); | ||
| } | ||
| return Temporal.Duration.from(duration).total({ unit: "milliseconds" }); | ||
| } | ||
|
|
||
| return ( | ||
| (duration.milliseconds ?? 0) + | ||
| (duration.seconds ?? 0) * 1000 + | ||
| (duration.minutes ?? 0) * 60 * 1000 + | ||
| (duration.hours ?? 0) * 60 * 60 * 1000 + | ||
| (duration.days ?? 0) * 24 * 60 * 60 * 1000 | ||
| ); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Temporal.DurationLike object inputs are undercounted here.
weeks, microseconds, and nanoseconds are ignored, and calendar units like months/years are silently treated as zero. For example, { weeks: 1 } currently yields no delay, which breaks the pacing contract for valid interval values.
Proposed fix
function durationToMilliseconds(
duration: Temporal.DurationLike | string,
): number {
if (typeof duration === "string") {
@@
}
+ if ((duration.years ?? 0) !== 0 || (duration.months ?? 0) !== 0) {
+ throw new RangeError(
+ "interval does not support calendar units without a relative reference.",
+ );
+ }
+
return (
(duration.milliseconds ?? 0) +
+ (duration.microseconds ?? 0) / 1000 +
+ (duration.nanoseconds ?? 0) / 1_000_000 +
(duration.seconds ?? 0) * 1000 +
(duration.minutes ?? 0) * 60 * 1000 +
(duration.hours ?? 0) * 60 * 60 * 1000 +
+ (duration.weeks ?? 0) * 7 * 24 * 60 * 60 * 1000 +
(duration.days ?? 0) * 24 * 60 * 60 * 1000
);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function durationToMilliseconds( | |
| duration: Temporal.DurationLike | string, | |
| ): number { | |
| if (typeof duration === "string") { | |
| if (typeof Temporal === "undefined") { | |
| throw new TypeError( | |
| "Temporal is not globally available; pass interval as a " + | |
| "Temporal.DurationLike object instead of a string, or provide a " + | |
| "Temporal polyfill.", | |
| ); | |
| } | |
| return Temporal.Duration.from(duration).total({ unit: "milliseconds" }); | |
| } | |
| return ( | |
| (duration.milliseconds ?? 0) + | |
| (duration.seconds ?? 0) * 1000 + | |
| (duration.minutes ?? 0) * 60 * 1000 + | |
| (duration.hours ?? 0) * 60 * 60 * 1000 + | |
| (duration.days ?? 0) * 24 * 60 * 60 * 1000 | |
| ); | |
| function durationToMilliseconds( | |
| duration: Temporal.DurationLike | string, | |
| ): number { | |
| if (typeof duration === "string") { | |
| if (typeof Temporal === "undefined") { | |
| throw new TypeError( | |
| "Temporal is not globally available; pass interval as a " + | |
| "Temporal.DurationLike object instead of a string, or provide a " + | |
| "Temporal polyfill.", | |
| ); | |
| } | |
| return Temporal.Duration.from(duration).total({ unit: "milliseconds" }); | |
| } | |
| if ((duration.years ?? 0) !== 0 || (duration.months ?? 0) !== 0) { | |
| throw new RangeError( | |
| "interval does not support calendar units without a relative reference.", | |
| ); | |
| } | |
| return ( | |
| (duration.milliseconds ?? 0) + | |
| (duration.microseconds ?? 0) / 1000 + | |
| (duration.nanoseconds ?? 0) / 1_000_000 + | |
| (duration.seconds ?? 0) * 1000 + | |
| (duration.minutes ?? 0) * 60 * 1000 + | |
| (duration.hours ?? 0) * 60 * 60 * 1000 + | |
| (duration.weeks ?? 0) * 7 * 24 * 60 * 60 * 1000 + | |
| (duration.days ?? 0) * 24 * 60 * 60 * 1000 | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/backfill/src/backfill.ts` around lines 660 - 680, The
durationToMilliseconds helper is undercounting valid Temporal.DurationLike
inputs because it ignores weeks, microseconds, and nanoseconds, and it silently
drops calendar-based fields like months and years. Update durationToMilliseconds
to convert all supported components consistently (or delegate to
Temporal.Duration.from(...).total(...) for object inputs when Temporal is
available), and make sure the interval pacing logic in backfill.ts uses the
corrected total from durationToMilliseconds.
You should run |
Summary
This PR merges the completed
feat/backfillfeature branch intomain.The branch adds the initial
@fedify/backfillpackage and completes the FEP-f228 conversation backfill work from the preceding PR series:Closes #275.
Assisted-by: Codex:gpt-5.5