ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test#13944
Conversation
_guidance_scale in OvisImagePipeline.__call__ovis_image pipeline test + initialize guidance_scale
ce56ae4 to
a3f91f1
Compare
|
This looks good so far @HaozheZhang6. As this is a new test, I would look at some of the recent test refactors and adopt those newer patterns to avoid needing a refactor later. For the |
|
Thanks @hlky! Makes sense — I'll switch the test to the full |
…precomputed embeds + add pipeline test Addresses items 3/4/5/6/7 of huggingface#13630. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ovis_image pipeline test + initialize guidance_scalea3f91f1 to
46b0de9
Compare
|
Pushed items 3/4/5/6 + the test. Two things I'd value your steer on:
|
- thread joint_attention_kwargs through the transformer forward + blocks (item 2) and pass it from the pipeline's transformer calls. - encode_prompt now returns both positive and negative embeds (the z_image / PixArt convention) so precomputed embeds work end-to-end and the prompt is encoded in a single call. - switch the pipeline test to the full PipelineTesterMixin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Done — items 2/4/5/6 are in, and the test is now on the full Heads up on one design choice: to get item 2 is threaded through |
|
@HaozheZhang6 LGTM, nice work. For As a general rule I would say if a pattern is present in a more popular model/pipeline it's usually a good indication that it's the preferred approach. At this point I think you are ready to request a review from a maintainer for approval, I would expect the tests may have some requested changes because of the recent refactors we mentioned, I haven't followed those changes exactly so I can't be more specific, my suggestion would be to await the review before making any further changes. |
|
Thanks @hlky — appreciate you walking me through this one. I'll request a maintainer review and hold off on any further changes until then. |
|
Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. |
What does this PR do?
Addresses items 3, 4, 5, 6 and 7 from the
ovis_imagereview in #13630.OvisImagePipeline.guidance_scaleread an uninitializedself._guidance_scale; it's now set in__call__.max_sequence_lengthwas validated but never reached the tokenizer; it's now threaded throughencode_prompt→_get_ovis_prompt_embeds.prompt_embedswere used as-is; they're now moved to the right device/dtype and repeated fornum_images_per_prompt.negative_promptwith batched prompts under CFG raised a shape error; the negative prompt is now expanded to the batch size. Also guarded_get_ovis_prompt_embedsagainsttext_encoder=None.tests/pipelines/ovis_image/test_ovis_image.py(the model had no pipeline test) covering inference, theguidance_scaleproperty,max_sequence_length,num_images_per_prompt, and batched CFG.Item 2 (
joint_attention_kwargs) and notes on the fullPipelineTesterMixinare in a comment below.Before submitting
.ai/— readAGENTS.mdandreview-rules.md..ai/review-rules.md.Who can review?
@hlky