Skip to content

Replace the export partition local manifests lock by a multiversion container#1912

Open
arthurpassos wants to merge 14 commits into
antalya-26.3from
rw_lock_for_partition_export
Open

Replace the export partition local manifests lock by a multiversion container#1912
arthurpassos wants to merge 14 commits into
antalya-26.3from
rw_lock_for_partition_export

Conversation

@arthurpassos

@arthurpassos arthurpassos commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Yet another attempt to reduce lock contention observed while slow writes / commits to iceberg backed by ice-rest-catalog. Uses a MultiVersion container instead of locking. Follow up of #1813. Closes #1920

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • OAuth (5m)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

Workflow [PR], commit [653f722]

@arthurpassos arthurpassos added port-antalya PRs to be ported to all new Antalya releases antalya antalya-26.3 labels Jun 16, 2026
@svb-alt svb-alt requested a review from mkmkme June 19, 2026 13:36
@arthurpassos

Copy link
Copy Markdown
Collaborator Author

@codex review

@arthurpassos arthurpassos changed the title RW lock for the export partition feature Replace the export partition local manifests lock by a multiversion container Jun 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef56768b4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


std::vector<ReplicatedPartitionExportInfo> infos;
infos.reserve(snapshots.size());
std::vector<ReplicatedPartitionExportInfo> infos(model->size());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reserve the export info vector instead of pre-sizing it

When the read model contains N exports, this constructs N default ReplicatedPartitionExportInfo rows and the loop then appends the N real rows with emplace_back. As soon as any export exists, system.replicated_partition_exports will expose extra blank rows and report twice as many entries; this should reserve capacity rather than resize the vector.

Useful? React with 👍 / 👎.

@arthurpassos

Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46f5b11775

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


std::vector<ReplicatedPartitionExportInfo> infos;
infos.reserve(snapshots.size());
std::vector<ReplicatedPartitionExportInfo> infos(model->size());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid pre-sizing infos before appending rows

When there is at least one published export entry, this constructs model->size() default ReplicatedPartitionExportInfo objects and then the loop appends another row for each real entry with emplace_back. As a result, system.replicated_partition_exports returns one blank/default row per real export in addition to the correct rows, with fields like destination/table/status empty and some numeric fields default/uninitialized. This should reserve capacity instead of changing the vector size.

Useful? React with 👍 / 👎.

@mkmkme mkmkme Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a forgotten thing to be fixed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, i did fix it yesterday before pinging you. But I think I forgot to push it.

@mkmkme mkmkme left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes since the comment found by Codex looks like one worth fixing. The rest of my comments is open for discussions and not necessarily an urge for action.

bool local_entry_found = false;
bool local_entry_pending = false;
std::string local_composite_key;
{

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a style nitpick: instead of this

{
    const auto model = export_read_model.get();
    if (model)
    {
        /// do stuff with model...
    }
}

you could do this

if (const auto model = export_read_model.get())
{
    /// do stuff with model...
}

{
const auto & key = it->getCompositeKey();
const auto & key = entry.getCompositeKey();
if (zk_children.contains(key))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this loop got split into two separate loops, this body looks quite simple. It can be even simpler if done as

for (const auto & entry : entries_by_key)
{
    const auto & key = entry.getCompositeKey();
    if (!zk_children.contains(key))
        stale_keys.emplace_back(key, entry.manifest.transaction_id);
}

IMHO this looks a bit more straightforward and to the point of what we actually expect from the loop.

But the bigger question is: was there any justification for splitting the loop in the first place? I agree it does look simpler now, but now we have two iterations instead of one (and introduces another entries_by_key.find() operation that wasn't needed in the previous version). And IIUC both loops are under one larger lock (IOW there's no reduce in lock scope). So I'm a bit puzzled here

local_status_changes.pop();
continue;
}
const std::string transaction_id = it->manifest.transaction_id;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved closer to line 711 where it's actually needed. Also this could be a const-ref.

And since it's used only once, maybe we don't need it as a local var at all and can continue calling killExportPart with it->manifest.transaction_id directly

#include <Common/EventNotifier.h>
#include <Common/MultiVersion.h>
#include <Common/ProfileEventsScope.h>
#include <Common/SharedMutex.h>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this include needed here? Don't see any code changes in the header that would require it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.3 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Querying system tables takes too long during exports

3 participants