Add Ideogram4LoraLoaderMixin (LoRA loading for Ideogram4)#13921
linoytsaban wants to merge 12 commits into
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sayakpaul left a comment
Thanks! I think the tests are missing.
support loading non-diffusers Ideogram4 LoRAs
Hi @linoytsaban, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. |
    @unittest.skip("Ideogram4 does not support call-time LoRA scaling via attention_kwargs.")
    def test_set_adapters_match_attention_kwargs(self):
        pass
- pipeline: run the text encoder on its parameters' current device, then move features to the execution device, so encode_prompt works under enable_model_cpu_offload. The pipeline calls the text encoder's submodules directly to tap intermediate layers, which bypasses accelerate's onload hook, so the weights stay on CPU while inputs are on the execution device. Fixes test_lora_loading_model_cpu_offload.
- tests: override test_lora_fuse_nan to corrupt a weight under Ideogram4's `layers` tower (the base test probes transformer_blocks/blocks/etc.).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sayakpaul pushed fixes for the two CI failures you flagged — would appreciate your opinion on the pipeline change in particular.
Fix: run the text encoder on the device its parameters currently live on, then move the features to the pipeline's execution device. This is a no-op on the normal (non-offload) path (
    # Run the encoder on the device its parameters currently live on, then move the features to the
    # pipeline device. encode_prompt calls the text encoder's submodules directly, so under
    # enable_model_cpu_offload the onload hook never fires and the weights stay on CPU; honoring their
    # actual device avoids a device mismatch on the token embedding.
    te_device = self.text_encoder.device
    token_ids = token_ids.to(te_device)
    attention_mask = attention_mask.to(te_device)
    text_position_ids = text_position_ids.to(te_device)
Doing it this way, the te_device would be the accelerator?
    )
    text_features = torch.stack(selected, dim=0).permute(1, 2, 3, 0).reshape(batch_size, max_sequence_length, -1)
    text_features = (text_features * attention_mask.to(text_features.dtype).unsqueeze(-1)).to(torch.float32)
    text_features = text_features.to(device)
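For anyone following the device discussion, here is a minimal sketch of the mechanism the commit message describes. It is not the pipeline code: `TinyEncoder` is a hypothetical stand-in for the text encoder, and a plain PyTorch forward pre-hook is assumed to play the role of accelerate's onload hook. It shows that a hook attached to the top-level module does not fire when submodules are called directly, and that running on the parameters' device and then moving the result avoids a mismatch.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a text encoder with tappable submodules."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(16, 8)
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(2)])

    def forward(self, token_ids):
        hidden = self.embed(token_ids)
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


encoder = TinyEncoder()
hook_calls = []
# Assumption: a forward pre-hook on the top-level module stands in for the onload hook.
encoder.register_forward_pre_hook(lambda module, args: hook_calls.append("onload"))

token_ids = torch.tensor([[1, 2, 3]])
encoder(token_ids)  # goes through the top-level call, so the hook fires
calls_after_forward = len(hook_calls)

# Tapping intermediate layers by calling submodules directly skips the parent's hook,
# so the inputs have to follow the weights to wherever they currently live.
execution_device = torch.device("cpu")  # would be the accelerator in a real pipeline
te_device = next(encoder.parameters()).device
hidden = encoder.embed(token_ids.to(te_device))
taps = []
for layer in encoder.layers:
    hidden = layer(hidden)
    taps.append(hidden)

# Move only the resulting features to the execution device.
text_features = torch.stack(taps, dim=0).to(execution_device)
calls_after_taps = len(hook_calls)
```

In this toy setup everything is on CPU, so both `.to(...)` calls are no-ops. The point is only the ordering: inputs go to the encoder's device, features go to the execution device.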
Adds `Ideogram4LoraLoaderMixin` and wires it into `Ideogram4Pipeline`, so Ideogram 4 pipelines can `load_lora_weights` / `save_lora_weights` / fuse like the other models.

This is the loading foundation split out of the Ideogram4 work so it can be reviewed/merged on its own. Two follow-ups are stacked on top of this branch:
Both depend on this mixin; this PR is independent of the training script's readiness.
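
As background on what "fuse" means here, and on the kind of check `test_lora_fuse_nan` exercises, below is a pure-Python sketch of merging a low-rank update into a base weight, W' = W + scale * (B @ A), with an optional finiteness check. The function and argument names are illustrative assumptions, not the diffusers implementation, and nested lists are used instead of tensors to keep it dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora(weight, lora_a, lora_b, scale=1.0, safe_fusing=False):
    """Return weight + scale * (lora_b @ lora_a) as a new nested list.

    Illustrative only: with safe_fusing=True, refuse to return a result
    that contains NaN or inf values.
    """
    delta = matmul(lora_b, lora_a)
    fused = [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
    if safe_fusing and any(not math.isfinite(v) for row in fused for v in row):
        raise ValueError("fused weights contain NaN or inf; refusing to fuse")
    return fused


weight = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
lora_a = [[1.0, 2.0]]              # rank-1 "down" matrix, shape 1x2
lora_b = [[0.5], [0.0]]            # rank-1 "up" matrix, shape 2x1

fused = fuse_lora(weight, lora_a, lora_b, scale=2.0)

# A corrupted LoRA weight should be rejected when the safety check is on.
corrupted_b = [[float("nan")], [0.0]]
try:
    fuse_lora(weight, lora_a, corrupted_b, safe_fusing=True)
    rejected = False
except ValueError:
    rejected = True
```

With these inputs, `fused` is `[[2.0, 2.0], [0.0, 1.0]]` and the corrupted case is rejected.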