CI: htmldoc artifacts and package #4150
|
And again, the CI broke because sid is broken. Maybe we should allow the sid package build to fail? I tested it: I guess it will be fixed soon in sid. |
|
I'm not sure that the webserver can use oras. Besides, I'm not sure you would want to involve yet another third party in this process. The point is that GitHub has already stored every built artifact (as in the deb package builds). The question is whether we can exploit that. We only need the link to the artifact. |
|
And, yes, a soft-fail on Debian:sid may be appropriate. |
It is a debian package, depending how it is set up,
The only issue there is that artifacts are linked to CI runs, so for an update you would have to download the file manually. Even wget cannot download the file when you use right-click copy link, because you have to be logged in. Is this good enough? Otherwise what I found so far: |
That is a problem, right there... We can't install or sudo on the webserver.
Yes, that was what I was thinking about. Don't know if it works. That's why I was inquiring ;-) The problem was that the htmldocs run did not produce any artifacts, so we had no chance of testing in any direction.
That looks like a way. Generating a limited access token, just to get the file, may be a possibility. |
With c323d8e, the other packages are built, even if sid fails. But sid is shown red. |
I found something that works. Get the URLs of all artifacts that match the branch and are named linuxcnc-doc:
TOKEN="YourToken"
BRANCH="ci_doc_build"
NAME="linuxcnc-doc"
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${TOKEN}" \
-H "X-GitHub-Api-Version: 2026-03-10" \
"https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url"
Download the zip:
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${TOKEN}" \
-H "X-GitHub-Api-Version: 2026-03-10" \
https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/7537361103/zip -o linuxcnc-doc.zip
Combined:
TOKEN="YourToken"
BRANCH=ci_doc_build
NAME="linuxcnc-doc"
DL_URL=$(curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${TOKEN}" \
-H "X-GitHub-Api-Version: 2026-03-10" \
"https://api.github.com/repos/linuxcnc/linuxcnc/actions/artifacts?per_page=100&name=${NAME}" | \
jq ".artifacts[] | select(.workflow_run.head_branch==\"${BRANCH}\") | .archive_download_url" | head -n1)
#DL_URL has quotes, remove them
DL_URL=${DL_URL//\"/}
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${TOKEN}" \
-H "X-GitHub-Api-Version: 2026-03-10" \
"$DL_URL" -o linuxcnc-doc.zip
Looks like the links are sorted by date, so I can just take the first one. Does that work for you? You need As soon as this branch is merged, you can change Edit: Using |
|
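On a host without jq, the selection step of the combined script above can be done with a JSON parser alone. The following is a minimal Python sketch, not the script that was eventually deployed. It assumes the response shape used by the curl calls (an `artifacts` array whose entries carry `name`, `archive_download_url` and `workflow_run.head_branch`) and the newest-first ordering observed in this thread. The function name `pick_artifact_url` and the sample data are made up for illustration, and the HTTP request itself is left out.

```python
import json
from typing import Optional


def pick_artifact_url(payload: dict, branch: str, name: str = "linuxcnc-doc") -> Optional[str]:
    """Return archive_download_url of the first artifact matching name and branch.

    Assumes the API lists artifacts newest first, as observed in the thread,
    so the first match is taken to be the latest build.
    """
    for artifact in payload.get("artifacts", []):
        run = artifact.get("workflow_run") or {}
        if artifact.get("name") == name and run.get("head_branch") == branch:
            return artifact.get("archive_download_url")
    return None


# Hypothetical response body, trimmed to the fields used above.
SAMPLE = json.loads("""
{"artifacts": [
  {"name": "linuxcnc-doc",
   "archive_download_url": "https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/2/zip",
   "workflow_run": {"head_branch": "other_branch"}},
  {"name": "linuxcnc-doc",
   "archive_download_url": "https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/1/zip",
   "workflow_run": {"head_branch": "ci_doc_build"}}
]}
""")
```

Calling `pick_artifact_url(SAMPLE, "ci_doc_build")` returns the URL of the second entry, already unquoted, so the quote-stripping step of the shell version is not needed here.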
I removed the commit creating packages. |
|
That looks very good; we might be almost ready to try it. @andypugh This might be able to restore the devel docs on the webserver. Can you check on the webserver if it has |
|
There will surely be a way. Otherwise just copy over the jq binary or cobble together something in bash / awk to get the right URL string. |
|
This also bites |
Sorry, I missed this last night, and don't have the relevant keys on my work laptop. I can check when I get home. |
Unfortunately not. [pdx1-shared-a3-06]$ jq --version |
|
Please check whether PHP and the json lib are installed. Put this in a file (test.php):
<?php
var_dump(json_decode('{"a":123}'));
and run: |
|
That looks more promising. |
|
Very good. We'll go the PHP way. @hdiethelm the artifact does not include the |
|
@BsAtHome sounds good, the PHP route it is. Here is a self-contained PHP fetch-and-deploy script for the webserver. It needs only curl plus the PHP json and zip extensions, no third-party tools and no shelling out. On the artifact packaging point you raised: this script is agnostic to it. It locates the entrypoint after extraction, so it works whether the zip holds the contents of How it works:
Security notes for review: all curl handles pin to https and verify TLS; zip entry names are audited for traversal before extraction; the download is checked against Content-Length and a size ceiling; the token is read with a strict charset check and never logged. I have lint-checked it on PHP 7.4 and 8.2 (both clean) and unit-tested the security-critical parts (symlink-safe cleanup, the zip-slip audit, unwrap detection, the atomic swap, prune retention). I have not run it end to end against the live API or webserver, so that part is unverified. Two things to confirm before wiring it up:
Are you already drafting the PHP script, or shall I finish this one? Writing it needs no server access. @andypugh would handle the actual install on the webserver (token, cron entry, |
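The traversal audit mentioned in the security notes above can be illustrated independently of the PHP script, which is not reproduced in this thread. The following is a minimal Python sketch of one way to do such a check, assuming the rule is simply "reject absolute paths and any name that escapes the extraction root after normalization". The function name `unsafe_entries` is hypothetical.

```python
import posixpath
import zipfile


def unsafe_entries(zf: zipfile.ZipFile) -> list:
    """Return entry names that would land outside the extraction directory."""
    bad = []
    for name in zf.namelist():
        # Treat backslashes as separators too, then collapse "a/../b" style paths.
        unified = name.replace("\\", "/")
        normalized = posixpath.normpath(unified)
        if (
            unified.startswith("/")
            or normalized == ".."
            or normalized.startswith("../")
        ):
            bad.append(name)
    return bad
```

A deploy script would abort before extracting anything if this list is non-empty, rather than skipping the offending entries.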
|
We also need to find out whether Andy's account on the webserver is allowed to write/change the page serving directory... That would be a real stopper ;-) Is it really necessary to download 100x100 artifacts? Calling GitHub 100 times in a burst looks like an attempted denial of service and may be flagged. According to the post above, these artifacts are sorted by when they were created/run (newest first). You may want to use references in your for/foreach loops to reduce the number of copies created of values/instances/arrays (reduces memory footprint). Keeping 5 releases by default currently takes more than 1.5 GB. A bit overkill. Failing on a missing zip extension should not be necessary. We must test the availability of both the cURL and zip modules before this can even start to work. And when the modules are there, then they are there. Besides, we need to see how much memory is available to php-cli, because we are handling large files and PHP is not always the best at keeping a low memory footprint. |
|
Thanks, all fair. Updated the script:
On memory: the script never loads a file into PHP. curl streams the download straight to a file handle and ZipArchive extracts entry by entry to disk, so
Agreed, that is the real gate. The cron user needs write on the parent of the served path, since publishing renames a symlink there. If that account cannot write it, none of this works and we would need a different publish path. Worth confirming before going further. |
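The reason write access to the parent directory matters is the rename step of an atomic symlink publish. The following is a minimal Python sketch of that technique on a POSIX filesystem, assuming the served path is itself a symlink (or does not exist yet). The names `publish`, `release_dir` and `live_link` are hypothetical and not taken from the script under discussion.

```python
import os


def publish(release_dir: str, live_link: str) -> None:
    """Point live_link at release_dir without a window where the link is missing."""
    parent = os.path.dirname(os.path.abspath(live_link))
    tmp_link = os.path.join(
        parent, ".%s.tmp-%d" % (os.path.basename(live_link), os.getpid())
    )
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Both calls below modify the parent directory, hence the write requirement.
    os.symlink(release_dir, tmp_link)
    # rename(2) replaces the old link in one step; it fails if live_link
    # is a real directory rather than a symlink.
    os.replace(tmp_link, live_link)
```

The previous release directory is left untouched, so rolling back is a second call to `publish` with the older path.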
|
I do have write access, and have used it to put up a placeholder text. |
|
One question is why the docs disappeared rather than simply not being updated. Is there a risk that the buildbot (if, indeed, it is the buildbot) will delete the docs after each successful build unless we stop it? Though we can probably prevent that by de-authorising the key: |
|
If you are curious, the rsync-server script there does not do the rsync. |
My guess is that it stopped because the files in question are no longer available at the location they were expected to be. But certainty is only provided by reading/disabling the old sync-code. It seems that by blocking the script/pubkey you can disable the buildbot update? It broke right after we changed the build layout. That does not seem to be random. Anyway, we could always move the new tree right next to the old directory and update all links, if that would ever be required. BTW, could you try to run this to see whether the expected extensions are available or whether we need workarounds for them too:
<?php
$f = false;
foreach (['curl', 'zip', 'json'] as $ext) {
if (!extension_loaded($ext)) {
echo "required PHP extension not loaded: $ext\n";
$f = true;
}
}
if (!function_exists('symlink')) {
echo "symlink() is disabled (disable_functions); cannot publish atomically\n";
$f = true;
}
if(!$f) {
echo "All fine!\n";
} |
|
Now I'm wondering what happens to the 2.9 branch docs when it gets updated. Do the webserver docs still update in that 2.9 tree automatically? |
|
And, for the record, I hate reviewing LLM generated code. Please install a brain and use it. |
|
On #7 I was avoiding parsing |
|
Same here, I smell the LLMs from far away: way too much code to solve a simple problem and way too long comments with funny UTF symbols in them. LLMs often just leak the laziness of a developer to find a solution that's simple, short and elegant. In principle I have nothing against LLMs if the code is good quality and de-slopped / reviewed by hand after generating. The bash script already looks better.
Here is the JSON blob for an artifact:
{
"id": 7537361103,
"node_id": "MDg6QXJ0aWZhY3Q3NTM3MzYxMTAz",
"name": "linuxcnc-doc",
"size_in_bytes": 228886335,
"url": "https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/7537361103",
"archive_download_url": "https://api.github.com/repos/LinuxCNC/linuxcnc/actions/artifacts/7537361103/zip",
"expired": false,
"digest": "sha256:c0b65691d88f58cf26d7aa5f22b138cf3ef07e8b2fe847f7d28af24b9e6da27e",
"created_at": "2026-06-10T13:38:07Z",
"updated_at": "2026-06-10T13:38:07Z",
"expires_at": "2026-09-08T13:20:02Z",
"workflow_run": {
"id": 27279180672,
"repository_id": 3662905,
"head_repository_id": 1157775434,
"head_branch": "ci_doc_build",
"head_sha": "41d37df8541b619f560083c6ad513c4fe9834252"
}
} |
|
@BsAtHome Any reason to have the images executable? Options:
edit: I hope that doesn't break anything but having images executable is probably not a good idea. See next commit. |
There is no reason to have images executable
|
Added the sha256 check, the filter now pulls On the curl return: I am preventing any executable from passing, so permissions have to be fixed for this not to fail. Edit: |
|
The images were added with exec on them and git faithfully adds them that way. Typical for images from windoze systems and copies from FAT formatted partitions that were mounted without stripping exec. Simply removing the exec bits in this commit should be fine. |
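To see which files still carry exec bits before or after such a cleanup, a scan of the tree is enough. The following is a minimal Python sketch, assuming a POSIX filesystem and that "executable" means any of the user, group or other execute bits on a regular file. The function name `executable_files` is hypothetical.

```python
import os
import stat


def executable_files(root: str) -> list:
    """Return paths (relative to root) of regular files with any exec bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # lstat so that symlinks are judged as links, not by their target.
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) and mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                found.append(os.path.relpath(path, root))
    return sorted(found)
```

An empty result on the extracted docs tree corresponds to the "no executable passes" rule described above.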
The doc says: Because we introduce the artifact here and use v7, this check is not needed. Hard fail is more secure. And fix it if it ever happens, which I don't think it will. |
So, it looks good now, no executable images any more. |
Revised, missing digest aborts, mismatch aborts, no other checks. |
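The "missing digest aborts, mismatch aborts" rule can be sketched in a few lines of Python. This is an illustration only, assuming the `digest` field has the `sha256:<hex>` form shown in the artifact JSON in this thread and that it covers the bytes of the downloaded archive. The function name `verify_digest` is hypothetical, and treating an unknown algorithm prefix as a failure is an assumption of this sketch.

```python
import hashlib
from typing import Optional


def verify_digest(path: str, digest: Optional[str]) -> None:
    """Raise ValueError unless the file at path matches a 'sha256:<hex>' digest."""
    if not digest:
        raise ValueError("artifact has no digest, refusing to deploy")
    algo, sep, expected = digest.partition(":")
    if sep != ":" or algo != "sha256":
        raise ValueError("unsupported digest format: %r" % digest)
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so a ~230 MB archive is never held in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    if hasher.hexdigest() != expected.lower():
        raise ValueError("digest mismatch, refusing to deploy")
```

Raising on every failure path keeps the hard-fail behaviour: the caller deploys only if the function returns normally.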
|
(pedantic) It is customary to add a space after the '<' in |
|
Thanks, all applied: Added the space for |
|
The BTW, |
|
Switched to @andypugh could you confirm on the webserver:
grep --version | head -1
printf '<html>caf\xc3\xa9 \xf0\x9f\x98\x80 <?php x ?></html>' > /tmp/u.html
LC_ALL=C grep -rIq -e '<?php' /tmp/u.html && echo "caught (good)" || echo "MISSED"
rm -f /tmp/u.html
Should print |
|
Anyway, who uses .zip in the Linux world... Changed the doc artifact to .tar.gz. It's also nicer, since there is no need for the path expansion trick to get the html folder inside. However, there is one tiny downside: if you need this artifact in another CI job, you need to extract it. With the other variant, it arrives extracted. |
|
I commented in the wrong thread. But would Python be a good choice for the downloading script in the cron job? |
|
https://github.com/andypugh/DocsUploader/blob/main/download_ci_artifacts.py I am not committed to Python, it's just the php felt like a bit of a hack. |
|
@hdiethelm the English doc styling bug I mentioned in #4152 turned out to be a master-side issue, now fixed in #4174 (merged). English pages were rendering from The re-run did not pick it up: a GitHub "re-run" rebuilds the same PR merge commit from the original trigger (before #4174), it does not recompute the merge against current master. I downloaded the latest tar and English is still Could you rebase |
|
I tried re-running the jobs to see if the update in #4174 worked, but for whatever reason that didn't have the desired effect. |
|
I am not clear if this will also create docs for 2.9? |
|
Not yet. #4150 is master-only, so only the devel tar is produced. On 2.9 the htmldocs job builds the docs but does not tar or upload them (its only artifacts there are the Debian packages). Covering 2.9 would mean backporting the tar + upload step and running a second cron with Before that though, it is worth pinning down how the buildbot has been updating docs, since the cron is meant to replace it. From this thread the webserver side is just the |
|
@andypugh about the script: A short list, split into must-have vs optional in case you want a minimal version: Essential (or it will not deploy):
Nice to have (your call if worth it):
|
I have already disabled the buildbot's key on the server. |
Still looks like AI, way too much functionality. Also it doesn't check head_repository_id. This is dangerous: when someone creates a PR from the master branch of their fork to the linuxcnc master branch, the script will pull the artifact from their latest PR build, whatever is in there. @grandixximo's AI figured this out. I did not look closely enough, but my code was also more to show how you can do it in as few lines as possible without AI, so it is slim and has no boilerplate... ;-)
This will: #4176 ;-) Let's see if the CI is happy with it. |
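The head_repository_id concern raised above can be expressed as a small filter. The following is a minimal Python sketch, assuming that a run started from the upstream repository has `head_repository_id` equal to `repository_id` in its `workflow_run` object, while a PR from a fork does not. The function name `is_trusted` is hypothetical. The first sample reuses the IDs from the artifact JSON posted earlier in this thread; the second is made up.

```python
def is_trusted(artifact: dict, branch: str) -> bool:
    """Accept only artifacts built from the given branch of the upstream repo."""
    run = artifact.get("workflow_run") or {}
    repo_id = run.get("repository_id")
    return (
        repo_id is not None
        # A fork PR carries the fork's ID here, even if its branch name matches.
        and run.get("head_repository_id") == repo_id
        and run.get("head_branch") == branch
    )


# IDs copied from the artifact JSON quoted in this thread (head repo differs).
FROM_THREAD = {
    "workflow_run": {
        "repository_id": 3662905,
        "head_repository_id": 1157775434,
        "head_branch": "ci_doc_build",
    }
}
# Hypothetical run started from the upstream repository itself.
UPSTREAM = {
    "workflow_run": {
        "repository_id": 3662905,
        "head_repository_id": 3662905,
        "head_branch": "master",
    }
}
```

With this check, `FROM_THREAD` is rejected for any branch name, and `UPSTREAM` is accepted only for `master`. A stricter variant could compare against a pinned upstream ID instead of trusting the response's own `repository_id`.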
|
Let's continue the script discussion here. Changed to Python, tried to slim it down, happy to reshape it however instructed. |


This PR adds:
A published package for htmldocs (removed)
oras pull ghcr.io/linuxcnc/linuxcnc/doc-html:master
If needed for the 2.9 branch, I can backport this.
The discussion started in: #4119
As far as I understand the GitHub docs, only members are allowed to write packages. So PRs should not be able to create a package, even if one removes the if() so that this stage always runs.
For testing, I ran this stage in my GitHub account: https://github.com/hdiethelm/linuxcnc-fork/actions/runs/27276506335
The package is here: https://github.com/hdiethelm/linuxcnc-fork
And can be downloaded with:
oras pull ghcr.io/hdiethelm/linuxcnc/doc-html:ci_doc_build_test
@BsAtHome
Do you think this does the job?
I will create a commit that removes the if() and see what happens. Then I will revert it again.