
Documentation Website #151

Draft
douglowe wants to merge 16 commits into develop from 122-documentation-website

@douglowe

Collaborator

This will be a GitHub pages hosted website. Currently just a skeleton setup.

Copilot AI review requested due to automatic review settings May 11, 2026 12:20
@douglowe douglowe marked this pull request as draft May 11, 2026 12:20

Copilot AI left a comment


Pull request overview

Introduces an initial skeleton for a GitHub Pages–hosted documentation website under docs/website-src, using Next.js + Nextra, and adds a GitHub Actions workflow to verify the docs site builds on PRs.

Changes:

  • Added Nextra/Next.js docs site scaffold (config + basic pages).
  • Added initial site navigation metadata and app wrapper.
  • Added a PR workflow to install dependencies and build/upload the Pages artifact.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| docs/website-src/theme.config.tsx | Nextra theme configuration for repo links, sidebar, header/footer. |
| docs/website-src/pages/index.mdx | Landing page content scaffold. |
| docs/website-src/pages/crate_validator.mdx | Placeholder page for validator documentation. |
| docs/website-src/pages/_meta.js | Sidebar/page metadata configuration. |
| docs/website-src/pages/_app.jsx | Next.js custom App wrapper. |
| docs/website-src/package.json | Docs site dependencies and scripts. |
| docs/website-src/next.config.mjs | Next.js + Nextra integration and static export settings. |
| .github/workflows/check.build.docs.yml | CI workflow to build (and currently upload) the docs site output. |


Comment thread .github/workflows/check.build.docs.yml Outdated
Comment thread .github/workflows/check.build.docs.yml Outdated
Comment thread .github/workflows/check.build.docs.yml Outdated
import { useConfig } from 'nextra-theme-docs'

export default {
docsRepositoryBase: 'https://github.com/eScienceLab/Cratey-Validator/tree/main/docs/website',
@@ -0,0 +1,23 @@
import React from "react";
Comment on lines +15 to +17
head() {
const { frontMatter } = useConfig()

Comment on lines +1 to +3
import { Cards } from 'nextra/components'
import { Steps,Callout } from "nextra/components"

pull_request:

paths:
- "docs/website-src/**"

Contributor

Could we just have docs/ be the root? there is almost nothing in there and the stuff in docs/assets/ will be useful for the site anyway

Suggested change
-       - "docs/website-src/**"
+       - "docs/**"

Collaborator Author

We probably could - I was following the deployment system for the original docs page, and trying to fit the logic into our filesystem. But, I must admit, I have followed the workflow by rote, rather than considering if we need every step.

In the deployment workflow we have these steps:

      - name: Copy built files to docs
        run: |
          rm -rf docs/website/*
          cp -r docs/website-src/out docs/website

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/website

If we move to using docs rather than docs/website-src as the root, then perhaps we could get rid of the copy step and upload directly from the docs/website-src/out (or, as it would be, docs/out) path?

Contributor

yeah, that makes sense to me

@elichad elichad left a comment

Contributor

I've reviewed the content. (sorry for the brusque comments, I'm rushing this before the weekend as I've held you up long enough)

It would be good if the website setup, content, and C4 diagrams could each be separated into their own PRs; the structure here is now quite confusing to review.

Comment thread docs/website-src/pages/deployment.mdx Outdated
and validation is performed by the CRS4 `rocrate_validator` library.


## Runtime Flow

Contributor

This section is much more like developer documentation and shouldn't be in this file which is user-oriented.

Instead we should include/link to the user-facing API docs in rest_api.mdx

Date: 2026-05-14


## Service Snapshot

Contributor

avoid duplicating content so we don't have to maintain it twice - if it's useful to have in both places we could move this into a separate file that gets imported in both

Contributor

I addressed that in the latest split of the text into three separate files.

Comment thread docs/website-src/pages/development.mdx Outdated
4. Celery runs `process_validation_task_by_metadata`.
5. The API waits for the Celery result and returns the validation output synchronously.

## API Surface

Contributor

in the same vein, could we just import/include rest_api.mdx here, or link to it?

Contributor

If we need to segregate different sections of the current documents in separate documents with different focuses, I'm not sure why we would import that content back into the current document. I'd rather use links.

Contributor

yeah, this probably indicates that we need a little restructuring. Right now the development page is basically the deployment page with a bunch of extra information - maybe we remove most of the duplicated stuff altogether and basically expect people to read the deployment page first before the development page.

We could then rename "Deployment" to "Installation and Setup" to make it clearer that it's aimed at all users rather than just those deploying it in production. So we end up with a page structure something like:

  • index page
  • About (the "service snapshot" bit? or combine this with the index page)
  • Installation and Setup
  • API reference
  • Development


<div style={{ display: "flex", flexDirection: "column", alignItems: "center", paddingTop: "2rem" }}>

# RO-Crate Validation Service

Contributor

This is organised a bit backward. First we should summarise what this service is, then provide the context. i.e. the last paragraph should come first.

such as the [Five Safes RO-Crate extension](https://trefx.uk/5s-crate/), for working with sensitive data within Trusted Research Environments (TREs).

This validation process can be complex, requiring both structural and semantic checks to ensure that the RO-Crate is compliant with the relevant standards.
The [rocrate-validator](https://rocrate-validator.readthedocs.io/en/latest/) provides a means for the structural validation of RO-Crates,

Contributor

explain more clearly that this service is a wrapper for this validation tool


This RO-Crate Validation Service provides a REST API for validating RO-Crates using the rocrate-validator.
It is built in Python using Flask, and is provided as a Docker image for ease of deployment.
Several base profiles are included, while more can be added as needed when the service is deployed.

Contributor

explain briefly what profiles are, or link to the ro-crate website page

@elichad

elichad commented Jun 12, 2026

Contributor

Noting that we still need content which addresses #122, though it may be better to hold off if we do #163 first, as this will change how profiles are loaded.

…(1) crate_validator.mdx, which is focused on the end user, and (2) rest_api.mdx and (3) developer_architecture.mdx, which are focused on a developer perspective.
@EttoreM

EttoreM commented Jun 16, 2026

Contributor

Following on from @elichad's comments, I have split the text content of the original file into three files:


deployment.mdx

RO-Crate Validator Service Deployment

Service Snapshot

Cratey Validator is a web service that checks whether RO-Crates follow the
expected structure and metadata rules. It can validate a complete RO-Crate
stored in MinIO-compatible object storage, or it can validate the contents
of an ro-crate-metadata.json file directly.

At a high level, a client sends a validation request, the service runs the
RO-Crate validation checks, and the result is either returned directly or
saved so it can be retrieved later.

What You Can Validate

Stored RO-Crates

Use this option when the RO-Crate already exists in MinIO-compatible object
storage. The service reads the crate from the configured bucket, runs the
validation checks, and saves the validation result back to object storage.

This is useful for validating complete crates as part of an upload, review,
or publication workflow.

Metadata Files

Use this option when you only need to check the contents of
ro-crate-metadata.json. Instead of asking the service to download a full
crate, you submit the metadata JSON directly and receive the validation result
in the response.

This is useful for quick checks while editing metadata or before a complete
crate has been assembled.

Saved Validation Results

After a stored RO-Crate has been validated, the saved validation result can be
retrieved later. This lets another application or user interface show the most
recent validation status without rerunning the checks.

Before You Start

For stored RO-Crate validation, you need:

  • A MinIO-compatible object store.
  • A bucket containing the RO-Crate.
  • Access credentials for that bucket.
  • The crate identifier used by the object store.

For metadata-only validation, you only need the contents of
ro-crate-metadata.json.

Running The Service

The service can be started with Docker Compose:

docker compose up --build

For local container development, use:

docker compose --file docker-compose-develop.yml up --build

Expected local services:

  • Flask API: http://localhost:5001
  • MinIO API: http://localhost:9000
  • MinIO console: http://localhost:9001
  • Redis: localhost:6379

MinIO needs a bucket for RO-Crates, normally ro-crates. Bucket versioning
should be enabled so uploaded crate objects can be tracked reliably.

Configuration

The main environment variables are:

  • FLASK_APP: Flask entrypoint, normally cratey.py.
  • FLASK_ENV: selects development or production config.
  • CELERY_BROKER_URL: Redis broker URL.
  • CELERY_RESULT_BACKEND: Redis result backend URL.
  • PROFILES_PATH: optional path to custom RO-Crate validator profile
    definitions.
  • MINIO_ENDPOINT: default MinIO endpoint used by Docker examples.
  • MINIO_ROOT_USER: MinIO root username for local development.
  • MINIO_ROOT_PASSWORD: MinIO root password.
  • MINIO_BUCKET_NAME: default bucket name used by local setup.

API calls also pass MinIO access details in minio_config, so the service can validate crates in a specified object store and bucket.
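
As a rough illustration of the variable list above, the following sketch reports which variables are unset before the service starts. The variable names come from the list; treating everything except PROFILES_PATH as required is an assumption made for the example, not documented behaviour of the service.

```python
import os
from typing import List, Mapping

# Names taken from the configuration list. Which ones are strictly
# required is an assumption; only PROFILES_PATH is described as optional.
REQUIRED_VARS = (
    "FLASK_APP",
    "FLASK_ENV",
    "CELERY_BROKER_URL",
    "CELERY_RESULT_BACKEND",
    "MINIO_ENDPOINT",
    "MINIO_ROOT_USER",
    "MINIO_ROOT_PASSWORD",
    "MINIO_BUCKET_NAME",
)
OPTIONAL_VARS = ("PROFILES_PATH",)


def missing_variables(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```

Running this inside the container, or against a loaded .env mapping, gives a quick check that nothing from the list was overlooked.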

More Information

  • For endpoint paths, request bodies, response codes, validation profiles, and
    result storage paths, see the REST API documentation.
  • For implementation details, service components, runtime flow, and test
    coverage, see the Architecture documentation.
  • For deployment context and the architecture diagram, see
    Deployment.

rest_api.mdx

Crate Validator REST API

Overview

The Crate Validator REST API exposes operations for validating stored
RO-Crates, validating submitted RO-Crate metadata, and retrieving saved
validation results.

Stored RO-Crate validation uses MinIO-compatible object storage. Requests that
operate on stored crates include minio_config, which tells the service where
the crate is stored and which bucket to use.

Endpoints

POST /v1/ro_crates/{crate_id}/validation

Queues validation for an RO-Crate stored in MinIO-compatible object storage.
The crate_id path parameter is the name used to find the crate object in the
configured bucket.

Request body:

{
  "minio_config": {
    "endpoint": "string",   // required, e.g. "localhost:9000" or "minio:9000"
    "accesskey": "string",  // required, MinIO access key or username
    "secret": "string",     // required, MinIO secret key or password
    "ssl": false,           // required, true when the MinIO endpoint uses HTTPS
    "bucket": "string"      // required, bucket containing the RO-Crate
  },
  "root_path": "string",     // optional folder/path inside the bucket
  "profile_name": "string"   // optional validation profile name
}

Expected responses:

  • 202: validation queued.
  • 400: RO-Crate does not exist or validation request cannot be satisfied.
  • 500: internal service, MinIO, Celery, or validation error.
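
A minimal client-side sketch of this request, using only the Python standard library. The base URL assumes the local Docker Compose setup (Flask API on port 5001), and the helper name `build_validation_request` is hypothetical; the body fields follow the request body shown above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the service runs locally as in the Docker Compose setup.
BASE_URL = "http://localhost:5001"


def build_validation_request(
    crate_id: str,
    minio_config: dict,
    root_path: Optional[str] = None,
    profile_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that queues validation."""
    body = {"minio_config": minio_config}
    # Optional fields are only included when the caller supplies them.
    if root_path is not None:
        body["root_path"] = root_path
    if profile_name is not None:
        body["profile_name"] = profile_name
    url = f"{BASE_URL}/v1/ro_crates/{urllib.parse.quote(crate_id, safe='')}/validation"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses rather than returning them.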

GET /v1/ro_crates/{crate_id}/validation

Fetches the latest validation result from MinIO-compatible object storage.

Request body:

{
  "minio_config": {
    "endpoint": "string",   // required, e.g. "localhost:9000" or "minio:9000"
    "accesskey": "string",  // required, MinIO access key or username
    "secret": "string",     // required, MinIO secret key or password
    "ssl": false,           // required, true when the MinIO endpoint uses HTTPS
    "bucket": "string"      // required, bucket containing the RO-Crate
  },
  "root_path": "string"     // optional folder/path inside the bucket
}

Expected responses:

  • 200: validation result JSON returned.
  • 400: RO-Crate or validation result is missing.
  • 500: MinIO or internal retrieval error.
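
Because this endpoint is documented as a GET that carries a JSON body, some HTTP clients need to be told so explicitly. The sketch below shows one way to build such a request with the standard library; the base URL and the helper name `build_result_request` are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the service runs locally as in the Docker Compose setup.
BASE_URL = "http://localhost:5001"


def build_result_request(
    crate_id: str,
    minio_config: dict,
    root_path: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the GET request for a saved validation result."""
    body = {"minio_config": minio_config}
    if root_path is not None:
        body["root_path"] = root_path
    url = f"{BASE_URL}/v1/ro_crates/{urllib.parse.quote(crate_id, safe='')}/validation"
    # method="GET" must be set explicitly: with data present, urllib
    # would otherwise default to POST.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )
```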

POST /v1/ro_crates/validate_metadata

Validates a submitted RO-Crate metadata JSON string.

Request body:

{
  "crate_json": "string",   // required, stringified content of ro-crate-metadata.json
  "profile_name": "string"  // optional validation profile name
}

Expected responses:

  • 200: validation result returned.
  • 422: missing, malformed, or empty metadata JSON.
  • 500: internal validation error.
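
Since crate_json is a string rather than a nested JSON object, the metadata has to be serialised once before the request body is serialised again. A short sketch of that step follows; the helper name `build_metadata_payload` is hypothetical and the example metadata is only a minimal illustrative RO-Crate 1.1 descriptor.

```python
import json
from typing import Optional

# Illustrative metadata only: a bare RO-Crate 1.1 metadata descriptor.
EXAMPLE_METADATA = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset"},
    ],
}


def build_metadata_payload(metadata: dict, profile_name: Optional[str] = None) -> dict:
    """Return the request body, with the metadata stringified into crate_json."""
    payload = {"crate_json": json.dumps(metadata)}
    if profile_name is not None:
        payload["profile_name"] = profile_name
    return payload


if __name__ == "__main__":
    # The outer json.dumps produces the actual HTTP request body.
    print(json.dumps(build_metadata_payload(EXAMPLE_METADATA)))
```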

Validation Profiles

The optional profile_name field selects a specific RO-Crate validation
profile. If omitted, the service uses the validator default.

Custom profile definitions can be made available to the service through the
PROFILES_PATH environment variable.

Result Storage

Validation results for stored RO-Crates are saved back to MinIO-compatible
object storage.

Without root_path, results are stored at:

{crate_id}_validation/validation_status.txt

With root_path, results are stored at:

{root_path}/{crate_id}_validation/validation_status.txt
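
The two path patterns can be expressed as a small helper, sketched here for clients that want to locate the result object themselves. The function name is hypothetical, and trimming a trailing slash from root_path is an assumption of this sketch rather than documented service behaviour.

```python
from typing import Optional


def validation_result_path(crate_id: str, root_path: Optional[str] = None) -> str:
    """Return the object key where the validation result is expected."""
    key = f"{crate_id}_validation/validation_status.txt"
    if root_path:
        # Assumption: avoid a double slash if root_path ends with "/".
        return f"{root_path.rstrip('/')}/{key}"
    return key
```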

Related Pages

For implementation details about the Flask application, Celery workers, and
validation workflow, see Architecture.


architecture.mdx

Crate Validator Architecture

Date: 2026-05-14

Current Architecture

The service is built as a Flask/APIFlask web API backed by Celery workers for
longer-running validation jobs. RO-Crates are read from MinIO-compatible object
storage, Redis is used by Celery to pass work between the API and the worker,
and validation is performed by the CRS4 rocrate_validator library.

The project is organized into a few clear areas:

Application entrypoints and routes:

  • cratey.py starts the Flask application.
  • app/__init__.py creates the APIFlask app, registers route blueprints, loads
    environment-specific config, and wires Celery into the Flask context.
  • app/ro_crates/routes/post_routes.py exposes validation request endpoints.
  • app/ro_crates/routes/get_routes.py exposes validation result retrieval.

Validation workflow:

  • app/services/validation_service.py performs request-level validation,
    object existence checks, and queues Celery tasks.
  • app/tasks/validation_tasks.py runs the actual RO-Crate and metadata
    validation workflows.

Storage helpers:

  • app/utils/minio_utils.py handles MinIO client setup, object discovery,
    download, upload, and result retrieval.

Container and development setup:

  • docker-compose.yml runs the published container image with Flask, Celery,
    Redis, and MinIO.
  • docker-compose-develop.yml builds the local Dockerfile and mounts a local
    profile directory for development.

Runtime Flow

Validate RO-Crate From MinIO

  1. Client sends a request to POST /v1/ro_crates/{crate_id}/validation.
  2. Request includes minio_config and optional root_path and profile_name.
  3. Flask route passes the request to queue_ro_crate_validation_task.
  4. Service creates a MinIO client and checks that the target crate exists.
  5. Celery queues process_validation_task_by_id.
  6. Worker downloads the RO-Crate into a temporary local path.
  7. Worker runs rocrate_validator.services.validate.
  8. Validation JSON is uploaded to MinIO at
    {crate_id}_validation/validation_status.txt or
    {root_path}/{crate_id}_validation/validation_status.txt.
  9. Temporary files are removed.
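
Steps 6 to 9 can be pictured with the sketch below. It is not the code in app/tasks/validation_tasks.py: the download, validate, and upload callables are hypothetical stand-ins for the MinIO helpers and the rocrate_validator call, injected so the outline stays self-contained.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_validation_job(
    crate_id: str,
    download: Callable[[str, Path], None],
    validate: Callable[[Path], str],
    upload: Callable[[str, str], None],
    root_path: Optional[str] = None,
) -> str:
    """Download a crate, validate it, upload the result, and clean up."""
    result_key = f"{crate_id}_validation/validation_status.txt"
    if root_path:
        result_key = f"{root_path}/{result_key}"
    # The temporary directory is deleted when the block exits, even if
    # validation raises, which mirrors the cleanup in step 9.
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp) / crate_id
        download(crate_id, local_path)      # step 6
        result_json = validate(local_path)  # step 7
        upload(result_key, result_json)     # step 8
    return result_key
```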

Retrieve Validation Result

  1. Client sends GET /v1/ro_crates/{crate_id}/validation.
  2. Request includes minio_config and optional root_path.
  3. Service checks that both the RO-Crate and validation result exist in MinIO.
  4. Stored validation JSON is returned to the client.

Validate Metadata Directly

  1. Client sends POST /v1/ro_crates/validate_metadata.
  2. Request includes crate_json and optional profile_name.
  3. Service verifies that crate_json is present, valid JSON, and non-empty.
  4. Celery runs process_validation_task_by_metadata.
  5. The API waits for the Celery result and returns the validation output
    synchronously.
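
The three checks in step 3 (present, valid JSON, non-empty) might look something like the following. This is a sketch under assumptions, not the service's own validation code; the function name and error messages are invented for illustration, and a non-None return would correspond to the 422 response described in the REST API documentation.

```python
import json
from typing import Optional


def check_crate_json(crate_json: object) -> Optional[str]:
    """Return an error message for unusable input, or None if it passes."""
    if not isinstance(crate_json, str) or not crate_json.strip():
        return "crate_json is missing"
    try:
        parsed = json.loads(crate_json)
    except json.JSONDecodeError:
        return "crate_json is not valid JSON"
    # An empty object, array, or string parses but carries no metadata.
    if not parsed:
        return "crate_json is empty"
    return None
```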

Test Coverage

Current tests cover the main service boundaries:

  • API route request validation and route-to-service wiring.
  • Validation service queueing behavior and error handling.
  • Celery task behavior for RO-Crate validation and metadata validation.
  • MinIO helper behavior.
  • Integration-level service paths.

Run tests with:

pytest

@EttoreM EttoreM requested a review from elichad June 16, 2026 11:21

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document how to use the Five Safes RO-Crate profile with the validation service

4 participants