Skip to content

Reject unsupported identity transform types#3517

Open
zhjwpku wants to merge 2 commits into
apache:mainfrom
zhjwpku:fix-identity-transform-unsupported-types
Open

Reject unsupported identity transform types#3517
zhjwpku wants to merge 2 commits into
apache:mainfrom
zhjwpku:fix-identity-transform-unsupported-types

Conversation

@zhjwpku

@zhjwpku zhjwpku commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

Per spec[1], the identity transform source type must be any primitive except for geometry, geography, and variant. This PR rejects GeographyType and GeometryType as source types for identity transforms.(No VariantType yet)

[1] https://iceberg.apache.org/spec/#partition-transforms

Are these changes tested?

Yes

Are there any user-facing changes?

No

@Fokko

Fokko commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Good one @zhjwpku, I think we missed updating this when adding the V3 types. Should we disallow VariantType as well?

@zhjwpku

zhjwpku commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Good one @zhjwpku, I think we missed updating this when adding the V3 types. Should we disallow VariantType as well?

Hi @Fokko , the spec did mentioned VariantType, but since VariantType is not a primitive type, I think there is no different to add it or not. I've started a thread[1] to discuss whether we should polish the spec, you may want to take a look :-)

[1] https://lists.apache.org/thread/bv9461xjhph24s424g7mtczojq60xr7y

Comment thread pyiceberg/transforms.py

def can_transform(self, source: IcebergType) -> bool:
return source.is_primitive
return source.is_primitive and not isinstance(source, (GeographyType, GeometryType))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand that VariantType isn't a primitive, but I think we should add it here:

  • It doesn't hurt anything, since it won't be called
  • It stops a couple LLM-generated PRs come in that "notice" that VariantType is missing here
  • It makes this method explicitly match the spec. The code should match the spec. Divergences are where people begin to not trust the spec.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree we should follow the spec before a decision is made to change it. However, after searching the codebase, it seems that VariantType is not currently supported in Iceberg Python, and supporting VariantType feels like a much broader effort that should probably be addressed in a separate PR, WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add a TODO and let's move on?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks.

@rambleraptor rambleraptor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got my one nit, but broadly this looks great. Thanks for doing this!

Comment thread tests/test_transforms.py
def test_identity_transform_unsupported_type(type_var: PrimitiveType) -> None:
assert not IdentityTransform().can_transform(type_var)


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could improve the test coverage for other methods such as transform.

If we want to test can_transform only, I suggest renaming to test_identity_can_transform_unsupported_type.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants