Add AttentionMixin to transformers that are missing it #13941
HaozheZhang6 wants to merge 1 commit
While here I checked the other transformers that build their attention with

@HaozheZhang6 If you can identify a pattern from all the issues, then it is better to fix those all at once.

Makes sense. I'll consolidate them into this PR. I'll re-scan for the full set (including any beyond the five I listed) and add
ovis_image, bria, bria_fibo, anyflow, anyflow_far, ernie_image and longcat_audio_dit build attention with AttentionModuleMixin but did not inherit the model-level AttentionMixin, so attn_processors / set_attn_processor / fuse_qkv_projections raised AttributeError (same gap as WanVACETransformer3DModel, huggingface#12186).

Add the mixin + a regression test for each. Addresses the AttentionMixin pattern from huggingface#13656.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
e11c8cc to f79a2b5
Done, all seven are in now:

Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g.
What does this PR do?
Several transformers build their attention with `AttentionModuleMixin` but don't inherit the model-level `AttentionMixin`, so `attn_processors` / `set_attn_processor` / `fuse_qkv_projections` / `unfuse_qkv_projections` raise `AttributeError`. This adds `AttentionMixin` to all of them, the same way `WanVACETransformer3DModel` was handled (#12186):

- `OvisImageTransformer2DModel` (the original finding, from the ovis_image model/pipeline review #13630)
- `BriaTransformer2DModel`, `BriaFiboTransformer2DModel`
- `AnyFlowTransformer3DModel`, `AnyFlowFARTransformer3DModel`
- `ErnieImageTransformer2DModel`
- `LongCatAudioDiTTransformer`

The attention modules already use `AttentionModuleMixin`, so the four APIs work without further changes and resolve to each model's own processor. Each model gets a regression test that `attn_processors` is populated and `set_attn_processor` round-trips.

Note: `AnyFlowTransformer3DModel`, `AnyFlowFARTransformer3DModel`, and `ErnieImageTransformer2DModel` have some unrelated pre-existing failures in their model test suites on `main` (not touched here); the new `test_attention_processor_api` passes for all seven.

Addresses the `AttentionMixin` pattern from #13656.

Before submitting

- `.ai/`: read `AGENTS.md` and `review-rules.md`.
- `.ai/review-rules.md`.

Who can review?

@hlky, batched per your suggestion, from your model reviews in #13656.
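
For readers who want to see the shape of the gap described under "What does this PR do?", here is a minimal, dependency-free sketch. Every class in it (`ToyModule`, `ToyAttentionModuleMixin`, `ToyAttentionMixin`, and the two models) is a hypothetical stand-in written for illustration; none of it is the diffusers implementation, and `check_attention_processor_api` only approximates what the PR's `test_attention_processor_api` is described as checking (processors are populated, and setting a processor round-trips).

```python
# Toy stand-ins only: these classes mimic the shape of the problem and are
# NOT the diffusers AttentionModuleMixin / AttentionMixin implementations.


class ToyProcessor:
    """Hypothetical attention processor object."""


class ToyModule:
    """Minimal stand-in for a module tree node."""

    def children(self):
        # Sub-modules are the attributes that are themselves ToyModule instances.
        return {k: v for k, v in vars(self).items() if isinstance(v, ToyModule)}


class ToyAttentionModuleMixin:
    """Module-level mixin: each attention layer owns one processor."""

    def get_processor(self):
        return self.processor

    def set_processor(self, processor):
        self.processor = processor


class ToyAttention(ToyModule, ToyAttentionModuleMixin):
    def __init__(self):
        self.processor = ToyProcessor()


class ToyAttentionMixin:
    """Model-level mixin: walks the module tree to reach every processor."""

    def _walk(self):
        # Yield (dotted_name, module) for every descendant of the model.
        stack = list(self.children().items())
        while stack:
            name, module = stack.pop()
            yield name, module
            stack.extend((f"{name}.{k}", v) for k, v in module.children().items())

    @property
    def attn_processors(self):
        return {
            f"{name}.processor": module.get_processor()
            for name, module in self._walk()
            if hasattr(module, "get_processor")
        }

    def set_attn_processor(self, processor):
        for _, module in self._walk():
            if hasattr(module, "set_processor"):
                module.set_processor(processor)


class ToyBlock(ToyModule):
    def __init__(self):
        self.attn = ToyAttention()


class ModelWithoutMixin(ToyModule):
    """The gap: attention layers exist, but no model-level API is inherited."""

    def __init__(self):
        self.block = ToyBlock()


class ModelWithMixin(ToyModule, ToyAttentionMixin):
    """The fix: same layers, plus the model-level mixin."""

    def __init__(self):
        self.block = ToyBlock()


def check_attention_processor_api(model):
    """Assumed shape of the regression check: populated, then round-trips."""
    names = sorted(model.attn_processors)
    assert names, "attn_processors should not be empty"
    replacement = ToyProcessor()
    model.set_attn_processor(replacement)
    assert all(p is replacement for p in model.attn_processors.values())
    return names


if __name__ == "__main__":
    try:
        ModelWithoutMixin().attn_processors
    except AttributeError as err:
        print("without mixin:", err)
    print("with mixin:", check_attention_processor_api(ModelWithMixin()))
```

The point of the sketch is that the model-level API only has to find modules exposing `get_processor` / `set_processor`, which is why adding the mixin to the class bases is enough once the attention layers already carry the module-level mixin.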