Skip to content

[FEATURE] Provide a possibility to store and load an empty graph#685

Open
reta wants to merge 4 commits into
datastax:mainfrom
reta:issue-684
Open

[FEATURE] Provide a possibility to store and load an empty graph#685
reta wants to merge 4 commits into
datastax:mainfrom
reta:issue-684

Conversation

@reta

@reta reta commented Jun 19, 2026

Copy link
Copy Markdown

Provide a possibility to store and load an empty graph, see please #684 for motivation and background.

Closes #684

@reta reta force-pushed the issue-684 branch 3 times, most recently from c9c2ef2 to a2b357f Compare June 19, 2026 17:58
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta reta marked this pull request as ready for review June 19, 2026 18:43

@ashkrisk ashkrisk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta thanks for this PR. I've left a few comments. Additionally, I tried running a test by saving and loading an empty heap graph to an OnDiskGraphIndex, but this failed an assertion while loading the graph. The test in this PR only covers OnHeapGraphIndex::save and OnHeapGraphIndex::load.

I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.

Failure details

The error:

java.lang.AssertionError: Node ID -1419795886 out of bounds for layer 1

The sample program is below, please run with assertions enabled.

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

import io.github.jbellis.jvector.disk.ReaderSupplierFactory;

import io.github.jbellis.jvector.graph.GraphIndexBuilder;
import io.github.jbellis.jvector.graph.ListRandomAccessVectorValues;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriter;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriterTypes;
import io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex;
import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;

public class Test {
    public static void main(String[] forwardArgs) throws IOException {
        
        var dim = 1024;
        var vsf = VectorSimilarityFunction.DOT_PRODUCT;

        var M = 32;
        var ef = 100;
        var nOv = 1.2f;
        var alpha = 1.2f;
        var hier = true;

        var ravv = new ListRandomAccessVectorValues(List.of(), dim);
        var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, vsf);
        var graphPath = Path.of("./local/tmp.jvgraph");

        try (
            var builder = new GraphIndexBuilder(bsp, dim, M, ef, nOv, alpha, hier);
            var graph = builder.build(ravv);
            var writer = GraphIndexWriter.getBuilderFor(GraphIndexWriterTypes.RANDOM_ACCESS_PARALLEL, graph, graphPath)
                .with(new InlineVectors(dim))
                .build();
        ) {
            writer.write(Map.of(FeatureId.INLINE_VECTORS, i -> new InlineVectors.State(ravv.getVector(i))));
            System.out.println("OHG max level = " + graph.getMaxLevel());
        }

        try (
            var readerSupplier = ReaderSupplierFactory.open(graphPath);
            var graph = OnDiskGraphIndex.load(readerSupplier);
        ) {
            var count = 0;
            var niter = graph.getNodes(0);
            while (niter.hasNext()) {
                count++;
                System.out.println(">>> " + niter.next());
            }
            System.out.println(String.format("Counted %d nodes", count));
            graph.getDegree(0);
        }
    }
}

var graph = TestUtil.buildSequentially(builder, ravv);

try (var out = TestUtil.openDataOutputStream(indexDataPath)) {
((OnHeapGraphIndex) graph).setAllMutationsCompleted();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TestUtil.buildSequentially already calls setAllMutationsCompleted (transitively)

@reta reta Jun 22, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe TestUtil.buildSequentially calls cleanup but which does not clear all mutations, removing explicit call for setAllMutationsCompleted will lead to:

java.lang.IllegalStateException: Cannot save a graph with pending mutations. Call cleanup() first

The explicit call is needed since cleanup() does short-circuit on empty graph (no nodes) and exits early.

* in a View that should be created per accessing thread.
*/
public interface ImmutableGraphIndex extends AutoCloseable, Accountable {
int OMITTED = OrdinalMapper.OMITTED; // same as OrdinalMapper, since OrdinalMapper::oldToNew may return it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest defining a new constant, say ENTRY_NODE_ABSENT, rather than reusing OrdinalMapper.OMITTED. I don't think there is any advantage in reusing the constant (expand the widget for more details).

More details

The description for OrdinalMapper.OMITTED mentions that this value may be returned newToOld, but no mention is made of oldToNew. In fact, the default OrdinalMapper implementation (AbstractGraphIndexWriter::sequentialRenumbering) appears to handle holes in the "original" ordinals by mapping them to -1 (OrdinalMapper.OMITTED is currently defined as Integer.MIN_VALUE).

/**
* Used by newToOld to indicate that the new ordinal is a "hole" that has no corresponding old ordinal.
*/
int OMITTED = Integer.MIN_VALUE;

public static Map<Integer, Integer> sequentialRenumbering(ImmutableGraphIndex graph) {
try (var view = graph.getView()) {
Int2IntHashMap oldToNewMap = new Int2IntHashMap(-1);

@Override
public int getDegree(int level) {
return layerInfo.get(level).degree;
return layerInfo.isEmpty() ? 0 : layerInfo.get(level).degree;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's valid to have an empty layerInfo, even for an empty graph. Maybe the default should be removed, I'd say it's desirable to keep the existing behaviour which throws an error in case the layerInfo is somehow empty rather potentially causing a logic error somewhere down the line by returning zero. Similarly for all the other locations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that, the problem was caused by OnHeapGraph::getMaxLevel (https://github.com/datastax/jvector/pull/685/changes#diff-6fdbd8088adf1995ab097e86d1ca6f9f9f4cc1384a06bb438180284659812f2b) which returned -1 instead of 0, thanks!

reta added 2 commits June 22, 2026 08:24
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta

reta commented Jun 23, 2026

Copy link
Copy Markdown
Author

I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.

Thanks @ashkrisk , sorry forgot to update on this one,

Jun 23, 2026 8:53:10 A.M. io.github.jbellis.jvector.vector.VectorizationProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
Jun 23, 2026 8:53:16 A.M. io.github.jbellis.jvector.util.PhysicalCoreExecutor resolvePhysicalCoreCount
INFO: PhysicalCoreExecutor: using 10 worker threads, determined by OS topology probe on "Mac OS X" (detected 10 physical cores, clamped to availableProcessors=10).
Jun 23, 2026 8:53:16 A.M. io.github.jbellis.jvector.util.PhysicalCoreExecutor resolvePhysicalCoreCount
WARNING: PhysicalCoreExecutor: previous auto-sized worker threads on this node would have been 5, but aligning to cores now sets them to 10. Set the jvector.physical_core_count system property to override.
08:53:16.350 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 0
08:53:16.351 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.351 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 288
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 296
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 584
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 0
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.362 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 288
OHG max level = 0
....

08:53:16.386 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex from offset=0
08:53:16.386 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Loading common header at position 0
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - 1 layers
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished reading at position 288
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Header loaded: version=6, dimension=1024, entryNode=-1, layerInfoCount=1
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Position after reading header=296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Version 5+ onwards uses a footer instead of header for metadata. Loading from footer
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex footer from offset=600
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading header offset=592
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex header from offset=296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Loading common header at position 296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - 1 layers
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished reading at position 584
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Header loaded: version=6, dimension=1024, entryNode=-1, layerInfoCount=1, Position after reading header=592
Counted 0 nodes

I run it against latest changes with assertions enabled - no issues, please let me know if I am missing something, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Provide a possibility to store and load an empty graph

2 participants