fix: preserve Qwen3.5 MoE switch projection names #653

Open
ukint-vs wants to merge 2 commits into lablup:main from ukint-vs:ukint-vs/qwen35-moe-loader-name-fix

@ukint-vs ukint-vs commented Jul 4, 2026

Summary

  • stack Qwen3.5 MoE per-expert gate_proj, up_proj, and down_proj weights under the switch_mlp.*_proj names read by SwitchGLU::from_weights
  • remove the sanitizer rename that rewrote switch_mlp.gate_proj/up_proj/down_proj to w1/w3/w2
  • add a focused sanitizer regression test for the per-expert checkpoint layout

Root cause

Some Qwen3.5 MoE checkpoints store expert weights per expert, as model.layers.N.mlp.experts.E.{gate_proj,up_proj,down_proj}. The sanitizer stacked those tensors but wrote them as switch_mlp.w1/w3/w2, while the loader (SwitchGLU::from_weights) reads switch_mlp.gate_proj/up_proj/down_proj. As a result, model loading failed with missing switch_mlp.gate_proj weights.
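
For readers unfamiliar with the key layout, the naming rule can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Tensor`, `stacked_key`, and `stack_experts` are hypothetical names, the tensors are plain `Vec<f32>` stand-ins, and both the trailing `.weight` suffix and the placement of `switch_mlp` directly under `mlp` are assumptions.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in for a real tensor type (assumption: one flat buffer per expert).
type Tensor = Vec<f32>;

/// Projection names that are kept unchanged after stacking.
const PROJECTIONS: [&str; 3] = ["gate_proj", "up_proj", "down_proj"];

/// Maps `<prefix>.experts.<E>.<proj>[.<suffix>]` to
/// (`<prefix>.switch_mlp.<proj>[.<suffix>]`, E).
/// Returns None for keys that are not per-expert projection weights.
fn stacked_key(key: &str) -> Option<(String, usize)> {
    let (prefix, rest) = key.split_once(".experts.")?;
    let (index, tail) = rest.split_once('.')?;
    let expert: usize = index.parse().ok()?;
    let proj = tail.split('.').next()?;
    if !PROJECTIONS.contains(&proj) {
        return None;
    }
    // The projection name is preserved: no rename to w1/w3/w2.
    Some((format!("{prefix}.switch_mlp.{tail}"), expert))
}

/// Groups per-expert tensors under their stacked key, ordered by expert index.
/// Keys that are not per-expert projections are ignored in this sketch.
fn stack_experts(weights: &HashMap<String, Tensor>) -> HashMap<String, Vec<Tensor>> {
    let mut grouped: HashMap<String, BTreeMap<usize, Tensor>> = HashMap::new();
    for (key, tensor) in weights {
        if let Some((stacked, expert)) = stacked_key(key) {
            grouped.entry(stacked).or_default().insert(expert, tensor.clone());
        }
    }
    grouped
        .into_iter()
        .map(|(key, experts)| (key, experts.into_values().collect()))
        .collect()
}
```

Under these assumptions, the per-expert keys for layer 0 collapse into a single model.layers.0.mlp.switch_mlp.gate_proj.weight entry holding the experts in index order, which matches the names the loader reads.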

Validation

  • cargo test qwen3_5_tests::sanitize --features metal,accelerate -- --ignored --test-threads=1
  • manually validated text and image generation with mlx-community/Ornith-1.0-35B-4bit after the loader fix

cla-assistant Bot commented Jul 4, 2026

CLA assistant check
All committers have signed the CLA.

cla-assistant Bot commented Jul 4, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@ukint-vs ukint-vs marked this pull request as ready for review July 4, 2026 09:00