Skip to content

ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test#13944

Open
HaozheZhang6 wants to merge 2 commits into
huggingface:mainfrom
HaozheZhang6:ovis-pipeline-guidance-scale
Open

ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test#13944
HaozheZhang6 wants to merge 2 commits into
huggingface:mainfrom
HaozheZhang6:ovis-pipeline-guidance-scale

Conversation

@HaozheZhang6

@HaozheZhang6 HaozheZhang6 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Addresses items 3, 4, 5, 6 and 7 from the ovis_image review in #13630.

  • Item 3OvisImagePipeline.guidance_scale read an uninitialized self._guidance_scale; it's now set in __call__.
  • Item 6max_sequence_length was validated but never reached the tokenizer; it's now threaded through encode_prompt_get_ovis_prompt_embeds.
  • Item 5 — precomputed prompt_embeds were used as-is; they're now moved to the right device/dtype and repeated for num_images_per_prompt.
  • Item 4 — a single default negative_prompt with batched prompts under CFG raised a shape error; the negative prompt is now expanded to the batch size. Also guarded _get_ovis_prompt_embeds against text_encoder=None.
  • Item 7 — adds tests/pipelines/ovis_image/test_ovis_image.py (the model had no pipeline test) covering inference, the guidance_scale property, max_sequence_length, num_images_per_prompt, and batched CFG.

Item 2 (joint_attention_kwargs) and notes on the full PipelineTesterMixin are in a comment below.

Before submitting

  • Did you use an AI agent (Claude Code, Codex, Cursor, etc.) to help with this PR? If so:
    • Pointed it at .ai/ — read AGENTS.md and review-rules.md.
    • Self-reviewed the diff against .ai/review-rules.md.
  • Did you read the contributor guideline?
  • Was this discussed via a GitHub issue? Yes — ovis_image model/pipeline review #13630.
  • Did you write any new tests? Yes.

Who can review?

@hlky

@github-actions github-actions Bot added size/S PR with diff < 50 LOC pipelines labels Jun 13, 2026
@HaozheZhang6 HaozheZhang6 changed the title Initialize _guidance_scale in OvisImagePipeline.__call__ Add ovis_image pipeline test + initialize guidance_scale Jun 13, 2026
@HaozheZhang6 HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from ce56ae4 to a3f91f1 Compare June 13, 2026 20:29
@github-actions github-actions Bot added tests size/M PR with diff < 200 LOC and removed size/S PR with diff < 50 LOC labels Jun 13, 2026
@hlky

hlky commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

This looks good so far @HaozheZhang6. As this is a new test, I would look at some of the recent test refactors and adopt those newer patterns to avoid needing a refactor later.

For the ovis_image review, item 1 is already covered by the pattern level PR you made. I would say items 2 and 6 are other easy fixes, while 4 and 5 are also fairly straight forward. If you feel confident I'd suggest attempting those too, then you'll completed the full review.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Thanks @hlky! Makes sense — I'll switch the test to the full PipelineTesterMixin using the newer pattern (a realistic tiny Qwen3 text encoder like the DreamLite / NucleusMoE-Image tests, so the standard save/load and dtype/device tests pass out of the box), and take on items 2/4/5/6 so the full review is covered. Will push the updates here.

…precomputed embeds + add pipeline test

Addresses items 3/4/5/6/7 of huggingface#13630.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@HaozheZhang6 HaozheZhang6 changed the title Add ovis_image pipeline test + initialize guidance_scale ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test Jun 13, 2026
@HaozheZhang6 HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from a3f91f1 to 46b0de9 Compare June 13, 2026 21:15
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Pushed items 3/4/5/6 + the test. Two things I'd value your steer on:

  1. Full PipelineTesterMixin — I started toward it but ovis hits two infra issues: the tiny model produces NaN in the offload / save-load comparison tests (DreamLite avoids this with AutoencoderTiny + a realistic Qwen3-VL), and the tokenizer's chat_template is lost on save/load because the pipeline uses a bare tokenizer (no processor component like DreamLite). Happy to mirror the DreamLite fixture exactly if that's the intended approach.
  2. Item 2 (joint_attention_kwargs) — the transformer forward doesn't currently accept attention_kwargs; the blocks already do, so it looks like a matter of threading it through forward → blocks and passing self.joint_attention_kwargs from the pipeline. I'll do that next unless you had a different shape in mind.

- thread joint_attention_kwargs through the transformer forward + blocks
  (item 2) and pass it from the pipeline's transformer calls.
- encode_prompt now returns both positive and negative embeds (the z_image /
  PixArt convention) so precomputed embeds work end-to-end and the prompt is
  encoded in a single call.
- switch the pipeline test to the full PipelineTesterMixin.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added models size/L PR with diff > 200 LOC and removed size/M PR with diff < 200 LOC labels Jun 13, 2026
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Done — items 2/4/5/6 are in, and the test is now on the full PipelineTesterMixin (passes locally, plus a couple of ovis-specific checks).

Heads up on one design choice: to get test_encode_prompt_works_in_isolation passing I changed encode_prompt to encode and return both the positive and negative embeds in a single call (the z_image / PixArt convention) instead of calling it twice. The negative path already accepted precomputed embeds — this just returns them so the standard test can pick them up. If you'd rather keep encode_prompt single-prompt, I'll revert that and skip the one test instead.

item 2 is threaded through forward with @apply_lora_scale, matching flux.

@hlky

hlky commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

@HaozheZhang6 LGTM, nice work.

For test_encode_prompt_works_in_isolation/encode_prompt I think returning both is the right choice, it's established in more popular model/pipelines and required for the test itself. The change for item 2 is also the correct approach and @apply_lora_scale is a thoughtful extra touch.

As a general rule I would say if a pattern is present in a more popular model/pipeline it's usually a good indication that it's the preferred approach.

At this point I think you are ready to request a review from a maintainer for approval, I would expect the tests may have some requested changes because of the recent refactors we mentioned, I haven't followed those changes exactly so I can't be more specific, my suggestion would be to await the review before making any further changes.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Thanks @hlky — appreciate you walking me through this one. I'll request a maintainer review and hold off on any further changes until then.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

@DN6 — per @hlky's suggestion above, would you have a chance to review this for approval? You did the recent Ovis transformer refactor (#13347), so you've probably got the most context here. CI is green and the full PipelineTesterMixin passes.

@github-actions

Copy link
Copy Markdown
Contributor

Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants