Add MagCache inference acceleration for Wan2.2 (T2V + I2V) #433

Open
HadarIngonyama wants to merge 1 commit into AI-Hypercomputer:main from HadarIngonyama:magcache_wan22_integration

@HadarIngonyama

Add MagCache inference acceleration for Wan2.2 (T2V + I2V)

Summary

This PR adds MagCache support to the Wan2.2 dual-transformer pipelines (both T2V and I2V), extending the existing Wan2.1 T2V MagCache support. MagCache skips the transformer blocks and reuses the cached block residual when the accumulated magnitude-ratio error stays below a threshold, using a precalibrated per-step mag_ratios_base curve so the skip schedule is deterministic (no data-dependent control flow, TPU/JIT friendly).

Measured speedups vs the dense render: ~1.82× for T2V and ~1.75× for I2V, with visually near-indistinguishable output.
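
The skip rule described in the Summary can be sketched on the host as follows. This is a minimal illustration that follows the accumulation rule of the public MagCache reference implementation; it is not the code in this PR. The function name `magcache_skip_schedule`, the parameter `max_consecutive_skips` (standing in for `magcache_K`), and the toy curve are assumptions made for the example. Because the schedule depends only on the precalibrated curve and the config values, it can be computed once before the denoising loop, which is what keeps the loop free of data-dependent control flow.

```python
from typing import List, Sequence


def magcache_skip_schedule(
    mag_ratios: Sequence[float],
    thresh: float,
    max_consecutive_skips: int,
    retention_ratio: float,
) -> List[bool]:
  """Return one bool per step; True means "reuse the cached residual"."""
  num_steps = len(mag_ratios)
  forced = int(num_steps * retention_ratio)  # leading steps are always computed
  acc_ratio, acc_err, acc_steps = 1.0, 0.0, 0
  schedule = []
  for step, ratio in enumerate(mag_ratios):
    skip = False
    if step >= forced:
      # Accumulate the magnitude ratio and the resulting estimated error.
      acc_ratio *= ratio
      acc_steps += 1
      acc_err += abs(1.0 - acc_ratio)
      skip = acc_err < thresh and acc_steps <= max_consecutive_skips
      if not skip:
        # A computed step refreshes the cache, so the accumulators restart.
        acc_ratio, acc_err, acc_steps = 1.0, 0.0, 0
    schedule.append(skip)
  return schedule


# Made-up curve for illustration only; NOT the calibrated mag_ratios_base.
toy_ratios = [1.0, 1.0, 0.99, 0.99, 0.99, 0.99, 0.95, 0.90]
schedule = magcache_skip_schedule(
    toy_ratios, thresh=0.04, max_consecutive_skips=2, retention_ratio=0.25
)
print(schedule)  # [False, False, True, True, False, True, False, False]
```

With the toy curve, steps 2, 3 and 5 are skipped; step 4 is computed because the accumulated error crosses the threshold, and the last two steps are computed because their ratios are too far from 1.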

What's included

  • Wan2.2 T2V (wan_pipeline_2_2.py): MagCache skip path for the dual transformer — a single interleaved mag_ratios_base curve spanning both the high-noise and low-noise phases, a per-phase forced-compute (retention) zone, and an explicit cached-residual reset at the high→low transformer boundary.
  • Wan2.2 I2V (wan_pipeline_i2v_2p2.py): the same skip path adapted for the image-conditioned pipeline (image condition concatenated with the latents, with the required BFHWC↔BCFHW transposes).
  • generate_wan.py: threads use_magcache / magcache_thresh / magcache_K / retention_ratio through to both 2.2 pipelines.
  • Configs:
    • base_wan_27b.yml (T2V): MagCache params + official mag_ratios_base, and flow_shift defaulted to 12.0 (see note below).
    • base_wan_i2v_27b.yml (I2V): MagCache params + official I2V-A14B mag_ratios_base, with boundary_ratio=0.900 to align the high→low switch with the curve (flow_shift stays at the I2V default of 5.0).
  • Tests (wan2_2_magcache_test.py): host-side validation/schedule/core tests plus a TPU-only end-to-end smoke test.
  • README: documents MagCache for Wan2.2 T2V and I2V, including the support matrix, config flags, sampling-shift requirement, and benchmark results.
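
The per-phase retention zone and the residual reset listed above can be pictured with a small host-side plan. This is a sketch under stated assumptions, not the pipeline code: `dual_phase_plan`, `StepPlan`, and `boundary_step` are hypothetical names, and the rule that each phase computes at least its first step is an assumption that follows from the cache being empty after a reset.

```python
from typing import List, NamedTuple


class StepPlan(NamedTuple):
  phase: str             # "high" or "low" noise transformer
  forced_compute: bool   # step lies inside the phase's retention zone
  reset_residual: bool   # cached residual is dropped before this step


def dual_phase_plan(num_steps: int, boundary_step: int, retention_ratio: float) -> List[StepPlan]:
  """Assumed layout: steps [0, boundary_step) are high-noise, the rest low-noise."""
  plan = []
  for step in range(num_steps):
    in_high = step < boundary_step
    phase_start = 0 if in_high else boundary_step
    phase_len = boundary_step if in_high else num_steps - boundary_step
    # Each phase must compute at least its first step: there is no valid
    # cached residual at the start of a run or right after the reset.
    zone = max(1, int(phase_len * retention_ratio))
    plan.append(
        StepPlan(
            phase="high" if in_high else "low",
            forced_compute=(step - phase_start) < zone,
            reset_residual=(step == boundary_step),
        )
    )
  return plan


plan = dual_phase_plan(num_steps=10, boundary_step=4, retention_ratio=0.2)
forced_steps = [i for i, p in enumerate(plan) if p.forced_compute]
reset_steps = [i for i, p in enumerate(plan) if p.reset_residual]
print(forced_steps, reset_steps)  # [0, 4] [4]
```

Resetting at the boundary matters because the two transformers have different weights, so a residual cached from the high-noise model is not a meaningful stand-in for the low-noise model's output.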

Important: flow_shift alignment

mag_ratios_base is calibrated against where the high→low noise boundary lands, which flow_shift controls. Wan2.2 T2V requires flow_shift=12.0 (the official A14B sampling shift) — the previous default of 5.0 moved the boundary several steps out of phase, so MagCache skipped at the wrong steps and quality dropped. This PR sets the correct default, which also fixes the off-spec dense baseline. For I2V the official shift is 5.0, paired with boundary_ratio=0.900.
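
To see why the shift moves the boundary, the sketch below counts how many steps fall on the high-noise side for two shift values. It assumes a linearly spaced sigma grid on (0, 1], the standard flow-matching shift formula, and a switch when the shifted sigma drops below `boundary_ratio`; the real scheduler's grid may differ in detail, and `boundary_ratio=0.900` is used purely for illustration (this PR only states that value for I2V). The helper names are hypothetical.

```python
def shifted_sigma(sigma: float, flow_shift: float) -> float:
  """Standard flow-matching time shift: s * sigma / (1 + (s - 1) * sigma)."""
  return flow_shift * sigma / (1.0 + (flow_shift - 1.0) * sigma)


def high_noise_steps(num_steps: int, flow_shift: float, boundary_ratio: float) -> int:
  """Count steps whose shifted sigma is still at or above the boundary."""
  count = 0
  for i in range(num_steps):
    sigma = 1.0 - i / num_steps  # assumed linear grid, 1.0 down to 1/num_steps
    if shifted_sigma(sigma, flow_shift) >= boundary_ratio:
      count += 1
  return count


for shift in (5.0, 12.0):
  print(shift, high_noise_steps(40, shift, boundary_ratio=0.900))
# 5.0 15
# 12.0 23
```

Under these assumptions the switch moves by eight steps between the two shift values, which is enough to put a curve calibrated for one setting out of phase with the other.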

Results

Measured on a v7x (720×1280, 81 frames, 40 steps), reference = dense (use_magcache=False) render with the same seed/config:

| Model | Settings | Speedup | Steps skipped | SSIM | PSNR |
|---|---|---|---|---|---|
| Wan2.2 T2V | flow_shift=12.0, thresh=0.04, K=2 | ~1.82× | 18/40 (360s→198s) | ≈0.72 | ≈21.8 dB |
| Wan2.2 I2V | flow_shift=5.0, boundary_ratio=0.900, thresh=0.06, K=2 | ~1.75× | 17/40 (6.30→3.61 s/step) | ≈0.91 | ≈25.4 dB |

The reference-based metrics mostly reflect trajectory divergence — caching nudges the sampler onto a different but equally plausible sample — rather than visible degradation; cached clips are visually hard to tell apart from dense. I2V scores higher because the image conditioning anchors the trajectory. Recalibrating mag_ratios_base for a specific dtype/attention kernel can tighten the metric gap further.
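
For readers who want to reproduce the arithmetic, the snippet below recomputes the speedups from the timings in the table and shows the usual PSNR definition on a toy signal. It is a sketch only: the `psnr` helper is hypothetical, operates on flat sequences of values in [0, 1], and is not the evaluation script used for the numbers above.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 1.0) -> float:
  """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
  if len(reference) != len(candidate) or not reference:
    raise ValueError("inputs must be non-empty and of equal length")
  mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
  if mse == 0.0:
    return math.inf  # identical signals
  return 10.0 * math.log10(max_value**2 / mse)


# Speedups recomputed from the timings reported in the table.
speedup_t2v = 360.0 / 198.0  # total seconds, dense vs MagCache
speedup_i2v = 6.30 / 3.61    # seconds per step, dense vs MagCache
print(round(speedup_t2v, 2), round(speedup_i2v, 2))  # 1.82 1.75

# Toy check: a uniform error of 0.1 on a [0, 1] signal gives about 20 dB.
toy_psnr = psnr([0.0, 0.0], [0.1, 0.1])
```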

Usage

MagCache is one of several mutually-exclusive caching strategies (CFG Cache, SenCache, MagCache) — enable only one at a time.

# Wan2.2 T2V
python src/maxdiffusion/generate_wan.py \
  src/maxdiffusion/configs/base_wan_27b.yml \
  use_magcache=True magcache_thresh=0.04 magcache_K=2 ...

# Wan2.2 I2V
python src/maxdiffusion/generate_wan.py \
  src/maxdiffusion/configs/base_wan_i2v_27b.yml \
  use_magcache=True magcache_thresh=0.06 magcache_K=2 ...
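
The "enable only one at a time" rule could be enforced with a small guard like the one below. The flag names match those visible in the pipeline signature quoted in review, but the function `check_single_cache_strategy` itself is hypothetical and is not claimed to exist in the repository.

```python
def check_single_cache_strategy(
    use_cfg_cache: bool = False,
    use_sen_cache: bool = False,
    use_magcache: bool = False,
) -> None:
  """Raise ValueError if more than one caching strategy is enabled."""
  flags = {
      "use_cfg_cache": use_cfg_cache,
      "use_sen_cache": use_sen_cache,
      "use_magcache": use_magcache,
  }
  enabled = [name for name, on in flags.items() if on]
  if len(enabled) > 1:
    raise ValueError("Caching strategies are mutually exclusive; got: " + ", ".join(enabled))


check_single_cache_strategy(use_magcache=True)  # passes silently

try:
  check_single_cache_strategy(use_cfg_cache=True, use_magcache=True)
  conflict_detected = False
except ValueError:
  conflict_detected = True
```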

Testing

  • wan2_2_magcache_test.py host-side tests pass (schedule/core logic).
  • End-to-end T2V and I2V runs validated on a v7x TPU; speedup and SSIM/PSNR numbers above were collected from those runs.

HadarIngonyama requested a review from entrpn as a code owner on June 29, 2026 19:27

google-cla Bot commented Jun 29, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@Perseus14 Perseus14 left a comment

Collaborator

Thanks for the PR. I have added some comments. PTAL!

Please run a manual linting test.

pip install pylint pyink==23.10.0 pytype==2024.2.27
pyink src/maxdiffusion --check --diff --color --pyink-indentation=2 --line-length=125

Additionally could you also squash the commits?

Comment thread src/maxdiffusion/pipelines/wan/wan_pipeline_2_2.py Outdated
Comment thread README.md Outdated
Comment thread src/maxdiffusion/tests/wan/wan2_2_magcache_test.py
Comment thread src/maxdiffusion/tests/wan/wan2_2_magcache_test.py
HadarIngonyama force-pushed the magcache_wan22_integration branch from 944f130 to 4669443 on July 1, 2026 09:17

Perseus14 commented Jul 1, 2026

Collaborator

LGTM!

I was able to reproduce the results for both T2V and I2V.

@prishajain1 Could you also take a look?

@HadarIngonyama Please rebase onto main. Your current PR includes linting changes to ltx2 lora, which are already fixed in main.

@mbohlool mbohlool left a comment

Collaborator

two minor comments, otherwise looks good.

use_cfg_cache: bool = False,
use_sen_cache: bool = False,
use_kv_cache: bool = False,
use_magcache: bool = False,

Collaborator

The default values in the next 4 lines are inconsistent with wan_pipeline_2_2.py. Can you use the same default values, please?

Author

fixed


cache_count = 0
for step in range(num_inference_steps):
  t = jnp.array(scheduler_state.timesteps, dtype=jnp.int32)[step]

Collaborator

Can you create this array once outside the loop and index into it inside the loop, the same as in wan_pipeline_2_2.py?

Author

done

Port MagCache acceleration to the Wan 2.2 dual-transformer pipelines.

- wan_pipeline_2_2.py / wan_pipeline_i2v_2p2.py: MagCache skip path for the
  dual-transformer loop — per-phase forced-compute (retention) zones, residual
  reset at the high->low boundary, and a single interleaved mag_ratios_base
  curve spanning both phases (indexed by global step). I2V additionally handles
  the image condition (concat with latents + BFHWC<->BCFHW transposes).
- generate_wan.py: pass use_magcache / magcache_thresh / magcache_K /
  retention_ratio through to the 2.2 pipelines.
- base_wan_27b.yml (T2V): flow_shift=12.0 + MagCache params + official A14B
  mag_ratios_base. base_wan_i2v_27b.yml (I2V): boundary_ratio=0.900,
  flow_shift=5.0 + official I2V-A14B mag_ratios_base.
- tests: wan2_2_magcache_test.py (host-side validation/schedule/core tests +
  a TPU-only end-to-end smoke test).
- README: document MagCache for Wan 2.2 (settings, speedup, SSIM/PSNR).

HadarIngonyama force-pushed the magcache_wan22_integration branch from 4669443 to bf91020 on July 2, 2026 12:09

3 participants