[FEATURE] Provide a possibility to store and load an empty graph#685
[FEATURE] Provide a possibility to store and load an empty graph#685reta wants to merge 4 commits into
Conversation
c9c2ef2 to
a2b357f
Compare
Signed-off-by: Andriy Redko <drreta@gmail.com>
There was a problem hiding this comment.
@reta thanks for this PR. I've left a few comments. Additionally, I tried running a test by saving and loading an empty heap graph to an OnDiskGraphIndex, but this failed an assertion while loading the graph. The test in this PR only covers OnHeapGraphIndex::save and OnHeapGraphIndex::load.
I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.
Failure details
The error:
java.lang.AssertionError: Node ID -1419795886 out of bounds for layer 1
The sample program is below, please run with assertions enabled.
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import io.github.jbellis.jvector.disk.ReaderSupplierFactory;
import io.github.jbellis.jvector.graph.GraphIndexBuilder;
import io.github.jbellis.jvector.graph.ListRandomAccessVectorValues;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriter;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriterTypes;
import io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex;
import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;
public class Test {
public static void main(String[] forwardArgs) throws IOException {
var dim = 1024;
var vsf = VectorSimilarityFunction.DOT_PRODUCT;
var M = 32;
var ef = 100;
var nOv = 1.2f;
var alpha = 1.2f;
var hier = true;
var ravv = new ListRandomAccessVectorValues(List.of(), dim);
var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, vsf);
var graphPath = Path.of("./local/tmp.jvgraph");
try (
var builder = new GraphIndexBuilder(bsp, dim, M, ef, nOv, alpha, hier);
var graph = builder.build(ravv);
var writer = GraphIndexWriter.getBuilderFor(GraphIndexWriterTypes.RANDOM_ACCESS_PARALLEL, graph, graphPath)
.with(new InlineVectors(dim))
.build();
) {
writer.write(Map.of(FeatureId.INLINE_VECTORS, i -> new InlineVectors.State(ravv.getVector(i))));
System.out.println("OHG max level = " + graph.getMaxLevel());
}
try (
var readerSupplier = ReaderSupplierFactory.open(graphPath);
var graph = OnDiskGraphIndex.load(readerSupplier);
) {
var count = 0;
var niter = graph.getNodes(0);
while (niter.hasNext()) {
count++;
System.out.println(">>> " + niter.next());
}
System.out.println(String.format("Counted %d nodes", count));
graph.getDegree(0);
}
}
}| var graph = TestUtil.buildSequentially(builder, ravv); | ||
|
|
||
| try (var out = TestUtil.openDataOutputStream(indexDataPath)) { | ||
| ((OnHeapGraphIndex) graph).setAllMutationsCompleted(); |
There was a problem hiding this comment.
nit: TestUtil.buildSequentially already calls setAllMutationsCompleted (transitively)
There was a problem hiding this comment.
I believe TestUtil.buildSequentially calls cleanup but which does not clear all mutations, removing explicit call for setAllMutationsCompleted will lead to:
java.lang.IllegalStateException: Cannot save a graph with pending mutations. Call cleanup() first
The explicit call is needed since cleanup() does short-circuit on empty graph (no nodes) and exits early.
| * in a View that should be created per accessing thread. | ||
| */ | ||
| public interface ImmutableGraphIndex extends AutoCloseable, Accountable { | ||
| int OMITTED = OrdinalMapper.OMITTED; // same as OrdinalMapper, since OrdinalMapper::oldToNew may return it |
There was a problem hiding this comment.
I'd suggest defining a new constant, say ENTRY_NODE_ABSENT, rather than reusing OrdinalMapper.OMITTED. I don't think there is any advantage in reusing the constant (expand the widget for more details).
More details
The description for OrdinalMapper.OMITTED mentions that this value may be returned newToOld, but no mention is made of oldToNew. In fact, the default OrdinalMapper implementation (AbstractGraphIndexWriter::sequentialRenumbering) appears to handle holes in the "original" ordinals by mapping them to -1 (OrdinalMapper.OMITTED is currently defined as Integer.MIN_VALUE).
| @Override | ||
| public int getDegree(int level) { | ||
| return layerInfo.get(level).degree; | ||
| return layerInfo.isEmpty() ? 0 : layerInfo.get(level).degree; |
There was a problem hiding this comment.
I don't think it's valid to have an empty layerInfo, even for an empty graph. Maybe the default should be removed, I'd say it's desirable to keep the existing behaviour which throws an error in case the layerInfo is somehow empty rather potentially causing a logic error somewhere down the line by returning zero. Similarly for all the other locations.
There was a problem hiding this comment.
Fixed that, the problem was caused by OnHeapGraph::getMaxLevel (https://github.com/datastax/jvector/pull/685/changes#diff-6fdbd8088adf1995ab097e86d1ca6f9f9f4cc1384a06bb438180284659812f2b) which returned -1 instead of 0, thanks!
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
Thanks @ashkrisk , sorry forgot to update on this one, I run it against latest changes with assertions enabled - no issues, please let me know if I am missing something, thanks! |
Provide a possibility to store and load an empty graph, see please #684 for motivation and background.
Closes #684