Skip to content

Fix ResidualVisitor null handling for comparisons and not-NaN#3520

Open
tanmayrauth wants to merge 1 commit into
apache:mainfrom
tanmayrauth:fix-3498-residual-visitor-null-handling
Open

Fix ResidualVisitor null handling for comparisons and not-NaN#3520
tanmayrauth wants to merge 1 commit into
apache:mainfrom
tanmayrauth:fix-3498-residual-visitor-null-handling

Conversation

@tanmayrauth

Copy link
Copy Markdown

ResidualVisitor diverged from row-level expression evaluation on null values:

  • visit_less_than / visit_less_than_or_equal / visit_greater_than / visit_greater_than_or_equal compared the partition value to the literal directly. A nullable identity-partitioned column with a None partition value raised a TypeError (None < literal), while _ExpressionEvaluator guards with "value is not None" and treats the row as non-matching. Add the same guard so a null partition value yields AlwaysFalse instead of crashing during scan planning (ResidualEvaluator.residual_for).

  • visit_not_nan returned AlwaysFalse for a None value because None is not a SupportsFloat, whereas _ExpressionEvaluator.visit_not_nan (val == val) treats null as satisfying not-NaN. Invert the check so only NaN fails not-NaN and null (and any non-float value) passes, matching row evaluation.

Update the test that encoded the old NotNaN(None) -> AlwaysFalse result and add a regression test covering None partition values for all four ordering comparisons.

Fixes #3498 (partially)

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

ResidualVisitor diverged from row-level expression evaluation on null values:

- visit_less_than / visit_less_than_or_equal / visit_greater_than /
  visit_greater_than_or_equal compared the partition value to the literal
  directly. A nullable identity-partitioned column with a None partition value
  raised a TypeError (None < literal), while _ExpressionEvaluator guards with
  "value is not None" and treats the row as non-matching. Add the same guard so
  a null partition value yields AlwaysFalse instead of crashing during scan
  planning (ResidualEvaluator.residual_for).

- visit_not_nan returned AlwaysFalse for a None value because None is not a
  SupportsFloat, whereas _ExpressionEvaluator.visit_not_nan (val == val) treats
  null as satisfying not-NaN. Invert the check so only NaN fails not-NaN and
  null (and any non-float value) passes, matching row evaluation.

Update the test that encoded the old NotNaN(None) -> AlwaysFalse result and add
a regression test covering None partition values for all four ordering
comparisons.

Fixes apache#3498 (partially)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Visitor and evaluator edge cases can over-prune files or mishandle nulls

1 participant