Skip to content

Design: cuda.core Texture/Surface API surface  #2188

@rparolin

Description

@rparolin

Purpose

Design discussion for the texture / surface API in cuda.core — to settle the API shape and
naming before code review of the implementation. Reviewers asked for design sign-off in an issue
before we commit to a ~9k-line feature.

cc @leofang @mdboom @Andy-Jost @kkraus14 — you asked for a design pass; this is the home for it.

Proposed public surface (from #2095)

  • Array + ArrayFormat — opaque, hardware-laid-out GPU allocations backing textures/surfaces.
  • MipmappedArray — wraps CUmipmappedArray; get_level returns a non-owning Array kept alive
    by a strong ref to the parent.
  • TextureObject + TextureDescriptor — bindless texture handle + sampling state.
  • SurfaceObject — bindless surface handle; requires Array(surface_load_store=True).
  • ResourceDescriptor — factories from_array, from_mipmapped_array, from_linear, from_pitch2d.

Decisions to make

  1. Name of Array. ✅ Decided — rename ArrayCUDAArray.

    This type is an opaque cudaArray_t — the GPU stores it in a scrambled, hardware-defined layout
    with no linear pointer, so it cannot expose __cuda_array_interface__ / DLPack and cannot
    share memory zero-copy with cupy / numba-cuda / torch. The name Array implies an n-dimensional
    array that participates in that ecosystem — it can't. CuPy names the identical type CUDAarray,
    and its whole cupy.cuda.texture module already matches this PR's surface 1:1.

    Resolution: use CUDAArray — the PEP 8 CapWords form (deliberately differing from CuPy's exact
    CUDAarray casing to follow Python's class-naming standard). The name signals "CUDA texture/surface
    backing store," not "n-dimensional array." Open detail only: whether the related ArrayFormat
    follows suit (e.g. CUDAArrayFormat) for consistency.

  2. Interop path. Decision: ship only copy_from / copy_to (like CuPy), or also add a
    sanctioned copy-bridge helper to/from linear cuda.core buffers?

    Zero-copy is impossible, so copying is the only option — this is purely about how polished
    the path is. Either way, document the copy-only contract so the type doesn't surprise anyone.

  3. Factory set. Decision: ship all four ResourceDescriptor factories, or only the two
    Array-backed ones and defer linear-memory support to a follow-up?

    A texture can be backed by four kinds of memory — the PR exposes one factory per kind:

    • from_array — texture over an Array (the headline feature)
    • from_mipmapped_array — texture over a MipmappedArray (the headline feature)
    • from_linear — texture over a plain 1D device buffer (ordinary linear memory, no Array)
    • from_pitch2d — texture over a plain 2D pitched buffer (ordinary linear memory, no Array)

    The first two are what this feature is about. The last two are a separate path that textures
    regular device memory unrelated to the new Array type. Shipping all four is more complete but
    commits more public API in 1.1; shipping only the Array-backed two keeps the initial surface
    focused and easier to walk back.

  4. Channel format. Decision: describe an element with two loose parameters (an ArrayFormat
    enum + a num_channels int), or with one bundled ChannelFormatDescriptor object like CuPy?

    Each array element has a component type (e.g. 8-bit uint, 32-bit float) and a channel count
    (1 = grayscale … 4 = RGBA). CUDA's C API bundles both into one struct (cudaChannelFormatDesc).
    Two ways to surface that:

    • Folded (this PR): Array.from_descriptor(shape=..., format=ArrayFormat.FLOAT32, num_channels=4)
    • Separate (CuPy): one ChannelFormatDescriptor(...) object passed as a unit

    Folded is a smaller surface and a simpler call. Separate matches CuPy and is a single object you
    can read back when introspecting an existing array's format, instead of reconstructing two fields.

  5. Descriptor type consistency. Decision: is the @dataclass / cdef class split deliberate
    (and documented), or should the two descriptors be the same kind of type for a predictable API?

    Both are config objects you fill in and pass to the texture machinery, but they're built as
    different kinds of object:

    • TextureDescriptor@dataclass (pure Python: auto __init__ / repr / equality, easy to
      construct and inspect)
    • ResourceDescriptorcdef class (Cython extension type: holds native C struct fields, faster,
      but more rigid — no auto-generated niceties, fixed attributes)

    The split may be justified: ResourceDescriptor wraps a C union (CUDA_RESOURCE_DESC, over
    array / mipmap / linear / pitch2d backings) and its from_* factories poke struct fields, so a
    cdef class is natural; TextureDescriptor is just a bag of sampling settings, so a @dataclass
    is the simple choice. But to a user they look like siblings yet behave differently (repr, equality,
    construction, subclassing). The author should confirm the divergence is intentional, or unify them.

  6. Bool naming. ✅ Decided — adopt the is_<something> convention.

    surface_load_store is a boolean on Array: it records whether the array was created with the
    surface load/store capability (CUDA's CUDA_ARRAY3D_SURFACE_LDST), which a SurfaceObject
    requires. Exposed both as a constructor keyword (surface_load_store=True) and a read-only
    property (arr.surface_load_store).

    The repo convention for boolean properties is is_<something>, so a property named
    surface_load_store doesn't read as a boolean the way arr.is_managed does. Resolution: rename
    the property to follow the is_<x> convention (e.g. is_surface_load_store) for consistency with
    the cuda-python codebase.
    Open detail only: the exact name, and whether to rename the constructor
    keyword too or leave the plain surface_load_store= option as-is.

  7. Scope. ✅ Decided — split the examples into a follow-up PR.

    The nine gl_interop_*.py examples (~5k lines, not CI-wired, need a GL context CI lacks) are
    orthogonal to the core API. Resolution: drop them from this PR and land them in a separate
    follow-up PR once this core texture/surface PR merges
    , since the examples depend on the new API
    it introduces.

Metadata

Metadata

Assignees

Labels

RFCPlans and announcementscuda.coreEverything related to the cuda.core modulefeatureNew feature or request
No fields configured for Enhancement.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions