[ExecuTorch][WebGPU] Add aten.index.Tensor (1D-self gather) #20461
JulianCloudNTH wants to merge 1 commit into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20461
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
❌ 6 New Failures, 1 Pending, 1 Unrelated Failure
As of commit 80e6fb0 with merge base 1b726b2:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Stack from ghstack (oldest at bottom):
Adds the WebGPU delegate handler for aten.index.Tensor, restricted to the 1D-self
advanced-index gather out[i] = self[index[i]] (output shape == index shape). This is the
form the VulkanPartitioner delegates -- it requires a 1D self and exactly one non-None
index (op_registry.py); 2D mask/freqs gathers stay on CPU. The handler mirrors the Vulkan
delegate's index_tensor op (IndexTensor.cpp + index_tensor_buffer.glsl) as a single
compute dispatch over the output elements, with each invocation reading one int32 index
and gathering the corresponding fp32 self element.
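For reference, the gather semantics described above can be sketched on the CPU as follows. This is a minimal illustration, not code from this PR: the function name index_1d_reference is hypothetical, and rejecting negative or out-of-range indices is an assumption made here for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for the 1D-self gather: out[i] = self[index[i]].
// One loop iteration corresponds to one output element of the compute dispatch.
std::vector<float> index_1d_reference(
    const std::vector<float>& self,
    const std::vector<int32_t>& index) {
  std::vector<float> out;
  out.reserve(index.size());
  for (int32_t idx : index) {
    // Assumption: negative and out-of-range indices are rejected, not wrapped.
    if (idx < 0 || static_cast<std::size_t>(idx) >= self.size()) {
      throw std::out_of_range("index out of range for 1D self");
    }
    out.push_back(self[static_cast<std::size_t>(idx)]);
  }
  // Output shape == index shape, so the element counts must match.
  assert(out.size() == index.size());
  return out;
}
```

A reference like this could serve as the expected-value oracle when comparing delegate output against CPU results.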
The op is composed as:
- ... numel bound; buffer-only, fp32 self/out, int32 index, 1D dispatch via the shared
  WebGPUUtils helpers (clamp workgroup size + 1D count).
- ... index tensor; fp32 self/out; int32 index; out numel == index numel), failing loudly on
  any violation, then records the dispatch. row_width is dropped (always 1 for 1D self).
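The validation and 1D dispatch sizing described above might look roughly like the sketch below. Everything in it is illustrative: TensorMeta, validate_index_1d, clamp_workgroup_size and workgroup_count_1d are hypothetical names, not the actual WebGPUUtils API, and only the checks named in the description (1D self, fp32 self/out, int32 index, out numel == index numel) are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical tensor metadata; the real delegate has its own tensor type.
enum class DType { Float32, Int32 };
struct TensorMeta {
  DType dtype;
  uint32_t rank;
  uint32_t numel;
};

// Fail loudly on any unsupported configuration (assumed set of checks).
void validate_index_1d(
    const TensorMeta& self, const TensorMeta& index, const TensorMeta& out) {
  if (self.rank != 1) throw std::invalid_argument("self must be 1D");
  if (self.dtype != DType::Float32 || out.dtype != DType::Float32)
    throw std::invalid_argument("self/out must be fp32");
  if (index.dtype != DType::Int32)
    throw std::invalid_argument("index must be int32");
  if (out.numel != index.numel)
    throw std::invalid_argument("out numel must equal index numel");
}

// Clamp the requested workgroup size to the device limit, never below 1.
uint32_t clamp_workgroup_size(uint32_t requested, uint32_t device_max) {
  return std::max<uint32_t>(1, std::min(requested, device_max));
}

// Number of workgroups needed to cover numel elements (ceiling division,
// written to avoid overflow in numel + wg_size - 1).
uint32_t workgroup_count_1d(uint32_t numel, uint32_t wg_size) {
  assert(wg_size > 0);
  return numel / wg_size + (numel % wg_size != 0 ? 1 : 0);
}
```

Because the count is rounded up, the last workgroup may contain invocations past the end of the output, which is why a shader in this style needs a numel bound check before writing.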
Differential Revision: D109478967