[WIP] Update diffusers-cli for agentic use #13966

Open
DN6 wants to merge 29 commits into main from diffuser-cli-for-agent
DN6 (Collaborator) commented Jun 15, 2026

What does this PR do?

Some updates to diffusers-cli to make it more agent-friendly. This PR:

  1. Adds a diffusers-cli skill that showcases the features available via the CLI and how to use them.
  2. Adds a describe command that can be used to extract the inputs of a pipeline from an input repo id.
  3. Adds a generate command that runs inference with any diffusers-compatible pipeline. It also provides a number of optimization options (context parallelism, CPU/group offload), supports LoRA, and allows running inference remotely on HF Jobs.
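
To illustrate item 2, here is a minimal sketch of how the inputs of a pipeline could be listed. It assumes the inputs are read from the signature of the pipeline's `__call__` method; this thread does not confirm that the describe command works this way. `ToyPipeline` and `describe_inputs` are hypothetical names used only for illustration.

```python
import inspect


class ToyPipeline:
    """Stand-in for a real pipeline class (hypothetical, for illustration only)."""

    def __call__(self, prompt, guidance_scale=3.5, num_inference_steps=50):
        return f"{prompt} ({num_inference_steps} steps)"


def describe_inputs(pipeline_cls):
    """Return {argument name: default} for the class's __call__ method."""
    signature = inspect.signature(pipeline_cls.__call__)
    inputs = {}
    for name, param in signature.parameters.items():
        if name == "self":
            continue
        # Arguments without a default are reported as required.
        if param.default is inspect.Parameter.empty:
            inputs[name] = "<required>"
        else:
            inputs[name] = param.default
    return inputs


if __name__ == "__main__":
    for name, default in describe_inputs(ToyPipeline).items():
        print(f"{name}: {default}")
```

A real command would first have to resolve the pipeline class from the repo id, which this sketch leaves out.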

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions bot added the size/L (PR with diff > 200 LOC) and utils labels on Jun 15, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread .ai/skills/diffusers-cli/SKILL.md Outdated
Comment thread .ai/skills/diffusers-cli/SKILL.md

- Multi-stage workflows where you need intermediate tensor manipulation between pipelines → write Python.
- Training or fine-tuning → CLI only covers inference.
- Anything requiring custom `device_map`, `quantization_config`, or other low-level loader knobs not exposed by

Member

It feels like quantization could be exposed through the CLI. Right now, it is only possible when using a prequantized checkpoint, right?

Collaborator Author

Quantization has a fairly large API surface that might be better suited to a dedicated quantization script. For example, the BnB quantization config options have no overlap with TorchAO's, which in turn have no overlap with ModelOpt's, and so on. TorchAO also supports an AOBaseConfig input, which in turn has its own input args.

We could explore trying to provide the option via a more restricted API though.
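
One possible shape for such a restricted API is a small table of named presets, each mapped to backend-specific keyword arguments. This is only a sketch: the preset names and `resolve_quant_preset` are hypothetical and not part of this PR. The keyword names are written from memory of the `BitsAndBytesConfig` and `TorchAoConfig` arguments and should be verified against the installed versions.

```python
# Hypothetical presets, not part of the PR. Keyword names follow the
# BitsAndBytesConfig / TorchAoConfig arguments as recalled; verify before use.
QUANT_PRESETS = {
    "bnb-nf4": ("bitsandbytes", {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}),
    "bnb-int8": ("bitsandbytes", {"load_in_8bit": True}),
    "torchao-int8wo": ("torchao", {"quant_type": "int8wo"}),
}


def resolve_quant_preset(name):
    """Map a preset name to (backend, backend-specific kwargs)."""
    try:
        backend, kwargs = QUANT_PRESETS[name]
    except KeyError:
        known = ", ".join(sorted(QUANT_PRESETS))
        raise ValueError(f"unknown preset {name!r}; expected one of: {known}") from None
    # Return a copy so callers cannot mutate the preset table.
    return backend, dict(kwargs)


if __name__ == "__main__":
    _, bnb_kwargs = resolve_quant_preset("bnb-nf4")
    _, ao_kwargs = resolve_quant_preset("torchao-int8wo")
    # Show which keyword names the two backends have in common.
    print(set(bnb_kwargs) & set(ao_kwargs))
```

The trade-off is that presets hide most of each backend's options, so anything beyond the listed presets would still need a dedicated script.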

Member

No, your reasoning makes sense. It's just that users might expect it, because quantization is sometimes the only way to run a model locally. We can table it for now.

Comment thread .ai/skills/diffusers-cli/SKILL.md Outdated
Comment thread src/diffusers/commands/_common.py
parser.add_argument("--vae-tiling", action="store_true", help="Enable VAE tiling (lower peak VRAM).")
parser.add_argument("--vae-slicing", action="store_true", help="Enable VAE slicing (lower peak VRAM).")
parser.add_argument(
"--context-parallel",

Member

How does it interact with --remote?

Collaborator Author

I'm not sure I follow?

Member

How does --context-parallel interact with --remote? For example, do we want users to be able to request context-parallel inference even if HF Jobs doesn't support it? Or do we want to just delegate to HF Jobs and propagate any errors?
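
The first option (failing fast on the client) could look roughly like the sketch below. It assumes both flags are plain booleans, which may not match how the PR defines them, and `check_flags` and `remote_supports_cp` are hypothetical names. The second option would skip the check and let the remote job report the error.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="generate-sketch")
    # Both flags are modeled as simple booleans here; the PR may define them differently.
    parser.add_argument("--context-parallel", action="store_true")
    parser.add_argument("--remote", action="store_true")
    return parser


def check_flags(args, remote_supports_cp=False):
    """Reject the flag combination locally instead of failing inside the remote job."""
    if args.remote and args.context_parallel and not remote_supports_cp:
        raise ValueError("--context-parallel cannot be combined with --remote in this sketch")
    return args


if __name__ == "__main__":
    args = build_parser().parse_args(["--remote", "--context-parallel"])
    try:
        check_flags(args)
    except ValueError as err:
        print(f"error: {err}")
```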

Comment thread src/diffusers/commands/generate.py
Comment thread src/diffusers/commands/generate.py
Comment thread src/diffusers/commands/generate.py Outdated
Comment thread src/diffusers/commands/generate.py
@sayakpaul

Member

Generated the following with the CLI:

diffusers-cli generate -m black-forest-labs/FLUX.1-dev \
  --device cuda --dtype bf16 --seed 42 -o outputs/dog_moon.png \
  --pipeline-kwargs '{"prompt":"realistic photo of a dog walking down the surface of moon","guidance_scale":3.5,"num_inference_steps":50}'

Nice little summary:

generate
  task: generate
  model: black-forest-labs/FLUX.1-dev
  device: cuda
  pipeline_class: FluxPipeline
  modular: False
  outputs: ['outputs/dog_moon.png']
  seed: 42

Final output:

image

I think we could also add lightweight testing around these things just to ensure consistency and that the right inputs are being passed.
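
A lightweight test along these lines might only exercise argument parsing, without loading any model. The sketch below uses a stand-in parser rather than the one in this PR: `build_parser`, `parse_pipeline_kwargs`, and the long option name `--model` are assumptions (the command above only shows `-m`).

```python
import argparse
import json


def parse_pipeline_kwargs(raw):
    """Decode the JSON passed to --pipeline-kwargs and insist on a JSON object."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("--pipeline-kwargs must be a JSON object")
    return value


def build_parser():
    # Stand-in parser for the test; not the parser defined in the PR.
    parser = argparse.ArgumentParser(prog="generate-sketch")
    parser.add_argument("-m", "--model", required=True)
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--pipeline-kwargs", type=parse_pipeline_kwargs, default={})
    return parser


def test_pipeline_kwargs_are_parsed():
    args = build_parser().parse_args(
        [
            "-m", "black-forest-labs/FLUX.1-dev",
            "--seed", "42",
            "--pipeline-kwargs", '{"guidance_scale": 3.5, "num_inference_steps": 50}',
        ]
    )
    assert args.model == "black-forest-labs/FLUX.1-dev"
    assert args.seed == 42
    assert args.pipeline_kwargs == {"guidance_scale": 3.5, "num_inference_steps": 50}


if __name__ == "__main__":
    test_pipeline_kwargs_are_parsed()
    print("ok")
```

Because `json.loads` raises a `ValueError` subclass on malformed input, argparse reports invalid JSON as a normal usage error rather than a traceback.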

@github-actions

Contributor

Hi @DN6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.
