Skip to content

ci: simplify integration test#216

Merged
maxday merged 28 commits into
masterfrom
maxday/integ-test
Jul 2, 2026
Merged

ci: simplify integration test#216
maxday merged 28 commits into
masterfrom
maxday/integ-test

Conversation

@maxday

@maxday maxday commented May 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Replace the legacy CodeBuild-based CI with a GitHub Actions workflow that runs unit tests in containers
and integration tests against real Lambda functions deployed as OCI images.

  • Remove all ci/codebuild/ infrastructure, vendored gtest, and old integration test harness (~27k lines
    deleted)
  • Add a new integration-tests.yml workflow with a matrix of 5 OS targets (al2023, al2023-arm, ubuntu,
    alpine, arch)
  • Unit tests run in native containers; integration tests build Docker images, push to ECR, deploy as
    Lambda functions, invoke them, and assert against snapshot files

Changes

  • .github/workflows/tests.yml new workflow: unit-test job (container matrix) +
    integration-test job (build → ECR push → Lambda deploy → invoke → assert → cleanup)
  • ci/integ/ new helper scripts: install-deps.sh, unit-test.sh, invoke.sh, assert.sh, and 4
    Dockerfiles (al2023, alpine, arch, ubuntu)
  • tests/ replace vendored gtest + integration tests with a snapshot-based approach using
    lambda_function.cpp multi-handler binary
  • ci/codebuild/, ci/docker/, ci/update-images.sh, ci/README.md deleted (superseded by GHA)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


// Read the handler from the environment variable
const char* handler_name = std::getenv("_HANDLER");
const char* handler_name = std::getenv("HANDLER");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_HANDLER is correct though https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-runtime

oh I see it's related to the .zip -> oci swap and movement of resource management from the tests code to the github actions code

Model::FunctionCode funcode;
funcode.SetZipFile(std::move(zip_file_bytes));
create_function_request.SetCode(std::move(funcode));
create_function_request.SetRuntime(Aws::Lambda::Model::Runtime::provided_al2);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall feedback: this integration test coverage isn't equivalent to what was being done in CodeBuild. Now the functions are uploaded as images, whereas before they were uploaded as .zip

In otherwords, this test workflow refactor loses coverage of the https://github.com/awslabs/aws-lambda-cpp/blob/master/packaging/packager script (see add_custom_target(aws-lambda-package-lambda-test-fun in test/resources/CMakeLists.txt) which was one of the things we spent a bunch of time figuring out for this project.

This is important coverage to keep, as this project has been usable from multiple operating systems from the start without requiring Docker

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Just to make sure we want to:

  • still build the RIC from different OS (in container)
  • package the resulting binary in zip
  • test on zip-deployed AWS Lambda with provided al2023 runtime
    is that correct?

In any case, I would love to keep those OCI tests, as it increases the coverage and ensure that the RIC is compatible with different base OS when deployed as OCI image. WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah for sure keep the OCI versions too.

zip versions I think would be a similar matrix to what you have already for the unit-test, with an added step to ninja install aws-lambda-package-lambda-test-fun before doing similar function create + invoke steps

Comment thread tests/CMakeLists.txt
endif()


find_package(AWSSDK COMPONENTS lambda iam QUIET)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

torn on losing this.

Like it's nice that there's not an AWS SDK CPP install required any more, which took ages and needed to be hidden by caching in the build images. But now there's not a way to run the integration tests without triggering CI. https://github.com/awslabs/aws-lambda-cpp/blob/master/README.md#running-unit-tests-locally

I think it's worth keeping (but also changing in whatever way is needed to not explode CI runtimes if the caching is getting lost)

@maxday maxday requested a review from bmoffatt July 1, 2026 10:30

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the aws in bash and not boto3 and python?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to ask people to install Python just to contribute to this cpp project + personal preference, to be honest: those files are for tests only, so we're not going to test the tests. If something fails, the test itself will fail. I thought it was a good trade-off. Happy to change if you have more concerns

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it easier to read python than bash but both work, approved.

-t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
.
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
echo "image_uri=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> "$GITHUB_ENV"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clear but I have a question. Would moving on the SAM cli help us defining this kind of infrastructure better?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a larger scope change. My inclination is to defer this to a follow-up PR if we feel it's a blocker.

@darklight3it darklight3it left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Comment thread .github/workflows/tests.yml Outdated

integration-test:
runs-on: ${{ matrix.build.runner }}
container: ${{ matrix.deploy_type == 'zip' && matrix.build.container || '' }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever condition but not that readable, i'd prefer having something more readable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split it into multiple jobs, no more conditions

Comment thread .github/workflows/tests.yml Outdated
strategy:
fail-fast: false
matrix:
deploy_type: [oci, zip]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have plenty of conditions based on the deploy_type this is a hint that splitting the jobs would be more readable and easy to maintain

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split it into multiple jobs, no more conditions

Comment thread .github/workflows/tests.yml Outdated
- name: Deploy Lambda function (zip)
if: matrix.deploy_type == 'zip'
env:
FUNCTION_NAME: ${{ env.FUNCTION_PREFIX }}-${{ matrix.deploy_type }}-${{ matrix.build.os }}-${{ matrix.test.name }}-${{ github.run_id }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This re-defined in other places, we could have it defined in a single place and re-use it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, GitHub Actions doesn't allow matrix.* in job-level env, I could hack into creating a step before and store it in GITHUB_ENV but I would prefer to keep it as is if that's fine with you

Comment thread ci/integ/invoke.sh Outdated
$PAYLOAD_ARG \
/tmp/response_payload > /tmp/invoke_result.json

echo "Status code: $(jq '.StatusCode' /tmp/invoke_result.json)"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we're using a hard-coded path. 1. this might cause conflicting jobs to mutate the same file, 2. We tightly couple assert.sh with the path we have in here

I'd change this to use something specific to a single run and have a mechanism to get the generated path and then pass it to the assertion script, or easier just have assert and invoke in the same file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now using mktemp to prevent conflicts (which I don't think were possible since they're running on different hosts, but better to be safe, good catch)

AWS_REGION: ${{ secrets.AWS_REGION }}
FUNCTION_PREFIX: lambda-cpp-integ

jobs:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define an explicit timeout to avoid waiting for 6h in case of an issue

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread ci/integ/invoke.sh Outdated
PAYLOAD_ARG=""
fi

aws lambda invoke \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this fails?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script already has set -euo pipefail, so a failed aws lambda invoke will exit non-zero and fail the step

Comment thread ci/integ/README.md
To run the full zip integration test (build, deploy, invoke, assert, cleanup) in one shot:

```bash
./ci/integ/run-zip-integ.sh [OS] [HANDLER] [PAYLOAD] [ASSERTION] [ARCH]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried running this on a mac just to catch any potential issues?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since mac is not supported on Lambda, I think it's safe to assume that development needs to happen on a unix based system

Comment thread ci/integ/run-zip-integ.sh Outdated
--zip-file fileb://build/tests/resources/lambda-test-fun.zip \
--environment "Variables={HANDLER=$HANDLER}"

aws lambda wait function-active-v2 --function-name "$FUNCTION_NAME"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an explicit timeout here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WORKDIR /src
COPY . .

RUN mkdir build && cd build && \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't use ENABLE_SANITIZERS here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- echo Build started on `date`
- ./ci/codebuild/build.sh -DTEST_RESOURCE_PREFIX=lambda-cpp-al2_$(arch)
- ./ci/codebuild/run-tests.sh aws-lambda-package-lambda-test-fun
- ./ci/codebuild/run-tests.sh aws-lambda-package-lambda-test-fun-no-glibc

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally dropped? i see no backfill in the current approach

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're now back, good catch

Comment thread ci/integ/README.md
| `AWS_REGION` | AWS region (e.g. `us-east-1`) |
| `LAMBDA_EXECUTION_ROLE_ARN` | ARN of the Lambda execution role |

## Running Zip Integration Tests Locally

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

Comment thread ci/integ/README.md
```bash
export AWS_REGION="us-east-1"
export LAMBDA_EXECUTION_ROLE_ARN="arn:aws:iam::123456789012:role/your-lambda-role"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uber nit, it'd be nice if the test.yml used this entry was well, to help ensure it stays working for local runs.

not necessarily need to DRY it, an extra workflow that validated this script worked would suffice too

Comment thread ci/integ/package-zip.sh
fi

rm -rf build && mkdir build && cd build
cmake .. -GNinja \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call out in the integ README that local test script needs ninja, aws cli installed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


- name: Build and run unit tests
shell: bash
run: ./ci/integ/unit-test.sh ${{ matrix.os }}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~/aws-lambda-cpp
$ ./ci/integ/unit-test.sh al2023
-- Unit tests skipped: Not in GitHub Actions environment
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/moffattb/aws-lambda-cpp/build
[ 62%] Built target aws-lambda-runtime
[100%] Built target lambda-test-fun
Test project /home/moffattb/aws-lambda-cpp/build
No tests were found!!!

can this be resolved in this PR too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxday added 5 commits July 2, 2026 08:52
On AL2023, ldd reports the loader as /lib64/ld-linux-x86-64.so.2 but
rpm lists it under /usr/lib64/. The string mismatch causes hasElement
to miss it, leaving PKG_LD unset. In no-glibc mode (-d), the fallback
loop that rescues PKG_LD is skipped entirely since INCLUDE_LIBC=false.

Add fallback loader detection in the main ldd loop for libraries that
aren't matched in libc_libs but whose basename starts with ld-*. Also
only require PKG_LD when INCLUDE_LIBC=true, since -d mode uses the
binary directly as the bootstrap and never references the loader.
The worst-case function name (zip/no-glibc/al2023-arm/crash_backtrace)
was 68 characters, exceeding Lambda's 64-character limit. Shorten the
prefix from "lambda-cpp-integ" to "cpp-integ" (worst case now 61).
The worst-case zip function name (no-glibc/al2023-arm/crash_backtrace)
was 68 characters, exceeding Lambda's 64-character limit. Drop the
redundant "zip-" segment and use a short packaging id ("noglibc"
instead of "no-glibc") to bring the worst case down to 63 characters.
@maxday maxday merged commit 1460d99 into master Jul 2, 2026
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants