Add serial chunked conformational map-reduce workflow #369

Open
harryswift01 wants to merge 3 commits into main from 364-implement-chunked-serial-conformation-map-reduce

@harryswift01

Member

Summary

This PR implements the serial chunked conformational map-reduce workflow for dihedral conformational state analysis. The workflow now processes the selected trajectory frames in deterministic chunks while still executing serially. This gives the conformational analysis the structure it needs for future Dask-backed execution, without introducing distributed scheduling in this change.
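As a rough illustration of what "deterministic chunks" means here, the sketch below splits a list of selected frame indices into fixed-size, order-preserving chunks. The helper name `iter_frame_chunks` and the `chunk_size` parameter are hypothetical and are not taken from this PR's code.

```python
from typing import Iterator, List, Sequence


def iter_frame_chunks(frames: Sequence[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of frame indices in their original order.

    Hypothetical helper: the same inputs always give the same chunks,
    which is what makes the later reduction reproducible.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(frames), chunk_size):
        yield list(frames[start:start + chunk_size])


# Example: every second frame of a 10-frame trajectory, two frames per chunk.
selected = list(range(0, 10, 2))
chunks = list(iter_frame_chunks(selected, chunk_size=2))
print(chunks)
```

Because the chunk boundaries depend only on the frame selection and the chunk size, a serial loop and a later parallel scheduler would see identical units of work.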

Changes

Serial Chunked Conformational Workflow:

  • Added a chunked conformational state-building workflow.
  • Split conformational analysis into topology discovery, frame-chunk angle collection, global peak reduction, state assignment, and final state reduction.
  • Preserved serial execution while introducing the same map-reduce structure needed for later parallel execution.
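The phases listed above can be sketched as a small serial map-reduce over precomputed angles. This is a simplified stand-in, not the PR's implementation: topology discovery is omitted, all function names are hypothetical, the 10-degree bin width is an assumed value, and the peak rule (a bin higher than its left neighbour and at least as high as its right one) is only one possible choice.

```python
from typing import List, Sequence

N_BINS = 36                 # assumed: 10-degree bins over [0, 360)
BIN_WIDTH = 360.0 / N_BINS


def chunk_histogram(angles: Sequence[float]) -> List[int]:
    """Map step: histogram the angles seen in one frame chunk."""
    counts = [0] * N_BINS
    for a in angles:
        counts[int((a % 360.0) // BIN_WIDTH) % N_BINS] += 1
    return counts


def reduce_histograms(hists: Sequence[Sequence[int]]) -> List[int]:
    """Reduce step: sum per-chunk histograms bin by bin."""
    return [sum(col) for col in zip(*hists)]


def find_peaks(counts: Sequence[int]) -> List[float]:
    """Return bin centres of local maxima on the circular histogram."""
    n = len(counts)
    return [
        (i + 0.5) * BIN_WIDTH
        for i, c in enumerate(counts)
        if c > 0 and c > counts[i - 1] and c >= counts[(i + 1) % n]
    ]


def assign_states(angles: Sequence[float], peaks: Sequence[float]) -> List[int]:
    """Second map step: label each angle with the index of its nearest peak."""
    if not peaks:
        raise ValueError("no peaks available for state assignment")

    def circular_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return [
        min(range(len(peaks)), key=lambda k: circular_distance(a, peaks[k]))
        for a in angles
    ]


# Synthetic dihedral angles (degrees) for one dihedral, three frame chunks.
chunk_angles = [[58.0, 62.0, 181.0], [179.0, 61.0, 302.0], [298.0, 183.0, 59.0]]

global_counts = reduce_histograms([chunk_histogram(c) for c in chunk_angles])
peaks = find_peaks(global_counts)
chunk_states = [assign_states(c, peaks) for c in chunk_angles]

# Final reduction: concatenate per-chunk states in chunk order.
states = [s for chunk in chunk_states for s in chunk]
print(peaks, states)
```

The key property is that peaks come from the globally reduced histogram, so the assigned states do not depend on how the frames were chunked.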

Dihedral Analysis Refactor:

  • Replaced the monolithic dihedral analysis module with a domain-specific dihedrals package.
  • Added dedicated modules for topology, angle observations, peak detection, state assignment, and numerical kernels.
  • Reused cached chunk-local angle observations during state assignment to avoid rerunning dihedral calculations unnecessarily.
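The reuse of cached chunk-local observations could look roughly like the sketch below, where angles computed during the collection pass are stored per chunk and read back during state assignment. The class `ChunkAngleCache`, its methods, and the `fake_dihedral` function are all hypothetical and stand in for whatever the PR actually uses.

```python
from typing import Callable, Dict, List, Sequence


class ChunkAngleCache:
    """Hypothetical per-chunk store for dihedral angle observations."""

    def __init__(self, compute_angle: Callable[[int], float]) -> None:
        self._compute = compute_angle
        self._store: Dict[int, List[float]] = {}
        self.compute_calls = 0  # counts how often the expensive step runs

    def collect(self, chunk_id: int, frames: Sequence[int]) -> List[float]:
        """Return the angles for a chunk, computing them only on first use."""
        if chunk_id not in self._store:
            angles = []
            for frame in frames:
                self.compute_calls += 1
                angles.append(self._compute(frame))
            self._store[chunk_id] = angles
        return self._store[chunk_id]


def fake_dihedral(frame: int) -> float:
    """Stand-in for an expensive dihedral calculation on one frame."""
    return (frame * 37.0) % 360.0


chunks = {0: [0, 1, 2], 1: [3, 4]}
cache = ChunkAngleCache(fake_dihedral)

# Pass 1: angle collection for the global histogram.
first_pass = [cache.collect(cid, frames) for cid, frames in chunks.items()]
# Pass 2: state assignment reads the same observations from the cache.
second_pass = [cache.collect(cid, frames) for cid, frames in chunks.items()]

print(cache.compute_calls)
```

With this arrangement the dihedral calculation runs once per frame, even though two passes over the data are needed (one before and one after the global peak reduction).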

Regression-Safe State Reduction:

  • Preserved the existing residue conformational state indexing behaviour to keep regression outputs unchanged.
  • Kept deterministic chunk and molecule ordering during reduction.
  • Added and updated unit tests for chunked angle collection, histogram reduction, peak detection, state assignment, and regression-preserving behaviour.
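One way to keep the final reduction deterministic is to sort partial results by molecule and chunk before merging, so the output does not depend on the order in which chunks finish. The sketch below assumes partial results arrive as `(chunk_id, molecule_id, states)` tuples; that layout and the name `reduce_states` are illustrative assumptions, not the PR's data model.

```python
from typing import Dict, List, Sequence, Tuple

Partial = Tuple[int, str, List[int]]  # (chunk_id, molecule_id, states); assumed layout


def reduce_states(partials: Sequence[Partial]) -> Dict[str, List[int]]:
    """Merge per-chunk state lists into one list per molecule.

    Sorting by (molecule_id, chunk_id) fixes both the molecule order of the
    result and the frame order within each molecule.
    """
    merged: Dict[str, List[int]] = {}
    for _chunk_id, mol_id, states in sorted(partials, key=lambda p: (p[1], p[0])):
        merged.setdefault(mol_id, []).extend(states)
    return merged


in_order = [
    (0, "mol_a", [0, 1]),
    (0, "mol_b", [0]),
    (1, "mol_a", [1]),
    (1, "mol_b", [2, 2]),
]
# The same partial results, arriving in a different order.
shuffled = [in_order[3], in_order[0], in_order[2], in_order[1]]

result = reduce_states(shuffled)
print(result)
```

A regression-style check can then compare the reduced output for differently ordered inputs, which is the property a later parallel backend would need to preserve.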

Impact

  • Prepares conformational state analysis for future Dask-backed chunk execution.
  • Improves maintainability by separating dihedral analysis into smaller domain-specific modules.
  • Keeps current numerical regression behaviour unchanged.
  • Reduces risk for the next parallelisation step by proving the chunked algorithm in serial first.

@harryswift01 harryswift01 added this to the 2.3.0 milestone Jun 18, 2026
@harryswift01 harryswift01 requested a review from jimboid June 18, 2026 09:54
@harryswift01 harryswift01 self-assigned this Jun 18, 2026
@harryswift01 harryswift01 added the feature request New feature or request label Jun 18, 2026
@harryswift01 harryswift01 linked an issue Jun 18, 2026 that may be closed by this pull request

Labels

feature request New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Chunked Serial Conformation Map-Reduce

1 participant