Skip to content

Add AttentionMixin to transformers that are missing it#13941

Open
HaozheZhang6 wants to merge 1 commit into
huggingface:mainfrom
HaozheZhang6:ovis-image-attention-mixin
Open

Add AttentionMixin to transformers that are missing it#13941
HaozheZhang6 wants to merge 1 commit into
huggingface:mainfrom
HaozheZhang6:ovis-image-attention-mixin

Conversation

@HaozheZhang6

@HaozheZhang6 HaozheZhang6 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Several transformers build their attention with AttentionModuleMixin but don't inherit the model-level AttentionMixin, so attn_processors / set_attn_processor / fuse_qkv_projections / unfuse_qkv_projections raise AttributeError. This adds AttentionMixin to all of them, the same way WanVACETransformer3DModel was handled (#12186):

  • OvisImageTransformer2DModel (the original ovis_image model/pipeline review #13630 finding)
  • BriaTransformer2DModel, BriaFiboTransformer2DModel
  • AnyFlowTransformer3DModel, AnyFlowFARTransformer3DModel
  • ErnieImageTransformer2DModel
  • LongCatAudioDiTTransformer

The attention modules already use AttentionModuleMixin, so the four APIs work without further changes and resolve to each model's own processor. Each model gets a regression test that attn_processors is populated and set_attn_processor round-trips.

Note: AnyFlowTransformer3DModel, AnyFlowFARTransformer3DModel, and ErnieImageTransformer2DModel have some unrelated pre-existing failures in their model test suites on main (not touched here); the new test_attention_processor_api passes for all seven.

Addresses the AttentionMixin pattern from #13656.

Before submitting

Who can review?

@hlky — batched per your suggestion, from your model reviews in #13656.

@github-actions github-actions Bot added size/S PR with diff < 50 LOC models tests and removed size/S PR with diff < 50 LOC labels Jun 13, 2026
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

While here I checked the other transformers that build their attention with AttentionModuleMixin but don't inherit the model-level AttentionMixin — the same gap is in AnyFlowTransformer3DModel, AnyFlowFARTransformer3DModel, BriaTransformer2DModel, BriaFiboTransformer2DModel, and ErnieImageTransformer2DModel (all raise AttributeError on attn_processors). Happy to fix those in a follow-up batch if this approach looks right.

@hlky

hlky commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

@HaozheZhang6 If you can identify a pattern from all the issues, then it is better to fix those all at once. AttentionMixin is a good case for that as the change is minimal. Other patterns where the change is more involved may be riskier to fix as a whole. There may be others beyond than those you found, I would either point your agent at the meta issue ask it to check all the sub-issues, or do another pass on repo as a whole. For this particular pattern the part I would exercise caution with is the tests, depending on maintainers view (I would ask @DN6 or @sayakpaul about this) it may be better to add the test to ModelTesterMixin rather than individually.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Makes sense — I'll consolidate them into this PR. I'll re-scan for the full set (including any beyond the five I listed) and add AttentionMixin + a regression test to each here, then close the separate Bria PR I'd just opened (#13943).

ovis_image, bria, bria_fibo, anyflow, anyflow_far, ernie_image and longcat_audio_dit
build attention with AttentionModuleMixin but did not inherit the model-level
AttentionMixin, so attn_processors / set_attn_processor / fuse_qkv_projections raised
AttributeError (same gap as WanVACETransformer3DModel huggingface#12186). Add the mixin + a
regression test for each.

Addresses the AttentionMixin pattern from huggingface#13656.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@HaozheZhang6 HaozheZhang6 changed the title Add AttentionMixin to OvisImageTransformer2DModel Add AttentionMixin to transformers that are missing it Jun 13, 2026
@HaozheZhang6 HaozheZhang6 force-pushed the ovis-image-attention-mixin branch from e11c8cc to f79a2b5 Compare June 13, 2026 19:10
@github-actions github-actions Bot added the size/M PR with diff < 200 LOC label Jun 13, 2026
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Done — all seven are in now: OvisImage, Bria, BriaFibo, AnyFlow, AnyFlowFAR, ErnieImage, and LongCatAudioDiTTransformer (that last one I'd missed in the first scan). Each gets AttentionMixin + a small regression test.

@github-actions

Copy link
Copy Markdown
Contributor

Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

models size/M PR with diff < 200 LOC tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants