Skip to content

New LED hardware driver: WLED Pixel Bus#5704

Open
DedeHai wants to merge 176 commits into
wled:mainfrom
DedeHai:WLEDpixelBus_PR
Open

New LED hardware driver: WLED Pixel Bus#5704
DedeHai wants to merge 176 commits into
wled:mainfrom
DedeHai:WLEDpixelBus_PR

Conversation

@DedeHai

@DedeHai DedeHai commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

This is a major update adding WLED Pixel Bus written from scratch to replace NeoPixelBus and it packs a lot of useful features. From a user perspective the main improvements are less memory use and dynamic adjustment of LED timing to eliminate flickering by fine-tuning the timing from -30% to + 30% in 10% steps (dropdown menu).

The new hardware driver structure allows for fully dynamic updates of LED outputs and LED timings. It also adds glitch free parallel SPI output support on the C3 as well as parallel bit-banging on all platforms including ESP8266.
All parallel outputs (except bit-bang) use ping-pong DMA buffers instead of fully pre-filled buffers - memory usage is optimized and DMA buffers are independend of number of LEDs. Also the calculation from LED colors to DMA buffers is highly speed optimized, making the driver blazing fast - I saw 30% FPS improvements in some cases.
The fully dynamic nature of the driver allows for a "Custom digidal bus" in the LED settings to support virtually any LED bus type out there - specify timing, number of color channels, invert any color channel or set any color channel combination. The driver also supports inverting the output signal on any pin (ESP32 only and currently custom bus only).

The LED config UI was updated to fully support the new driver options.

A lot more testing is required for all different kind of LEDs on all platforms. I am sure there is a ton of bugs and kinks to iron out.

I would like to take this opportunity to thank @Makuna for his outstanding work on NeoPixelBus which served me well as a reference on hardware configurations and special LED types.

  • saves over 10k of heap on ESP32 and about 3k on ESP8266
  • increases available LED outputs on ESP32 C3 from 2 to 14 (2x RMT, 4x parallel SPI, 8x bit-bang)
  • increases available LED outputs on ESP32 S2 with 8x bit-bang if I2S is not available
  • increases available LED outputs on ESP8266 3 to 6 (1x UART, 1xI2S DMA, 4x bit-bang)
  • allows over 1000 LEDs again on ESP8266 thanks to reduced RAM footprint, it does not use a global buffer but writes to LED buffers directly which is slower but uses less RAM.
  • adds support for TM1815
  • improved low-brightness color accuracy to TM1814, TM1815 and APA102 by utilizing LED current settings
  • improved color resolution on 16bit LED types (UCS8903, UCS8904, SM16825)
  • if needed I2S (ESP32 and ESP32 S2) and LCD (ESP32 S3) driver can output on 16 parallel pins changing a #define
  • automatic calculation and optimized use of multiple RMT memory blocks is implemented but currently only works on C3 (RMTHI driver does not support using multiple blocks yet)

ESP32 flash and RAM usage comparison (bit bang disabled)

using NeoPixelBus
RAM: [== ] 24.9% (used 81576 bytes from 327680 bytes)
Flash: [======== ] 83.1% (used 1306493 bytes from 1572864 bytes)
Free Heap: (6 outputs, 256 each): 79.8k

using WLEDpixelBus
RAM: [== ] 24.9% (used 81680 bytes from 327680 bytes)
Flash: [======== ] 83.0% (used 1306053 bytes from 1572864 bytes)
Free Heap: (6 outputs, 256 each): 92.8k

Summary by CodeRabbit

  • New Features
    • Added per-bus Bus Speed Factor (SF/bsf) settings.
    • Added Custom Bus support with channel mapping, inversion, custom timing, and per-channel colors.
    • Expanded supported output driver/type options (bit-bang, parallel SPI, RMT, I2S, LCD where available) and added new LED type options (e.g., TM1815 / custom bus).
  • Bug Fixes
    • Improved ESP8266 direct-to-bus pixel handling and refined gamma/brightness behavior across realtime and particle rendering.
  • Chores
    • Updated capability limits and the LED settings UI + config import/export for the new bus features.

DedeHai added 30 commits March 5, 2026 20:55
@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as outdated.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as outdated.

Comment thread wled00/FX_fcn.cpp
const Segment *segO = topSegment.getOldSegment();
const bool hasGrouping = topSegment.groupLength() != 1;

#ifdef ESP8266

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a feature flag (WLED_ENABLE_DIRECT_RENDER or the like?), not an architecture gate. If we're going to allow this, we should also bring back the "Enable global buffer" option on devices that enable it so it can be optional -- a lot of the relevant code paths already seem to have a _pixels == nullptr gate still anyways.

Personally I want gamma correct blending on my low pixel count devices, and I think it's important to allow 100% compatible rendering mode with ESP32 devices. I don't agree that the RAM tradeoff is always worth it.

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally did this for ESP8266 only as I did not want to drag along the "no buffer" code everywhere as it adds conditionals in the render hot path but making it a feature flag that is only supported on ESP8266 makes sense if this path is needed in the future.

Personally I want gamma correct blending on my low pixel count devices, and I think it's important to allow 100% compatible rendering mode with ESP32 devices. I don't agree that the RAM tradeoff is always worth it.

Ah so this is the "something is still missing" in the back of my head. Should be easy to apply gamma in
scaleAll().
In fact I was thinking about moving brightness and gamma down to the bus level, I have a rough sketch of code that adds "per color scaling" as well but I did not want to introduce more changes than needed and make this a "change everything at once" PR.

edit:
checking the code, the gamma and brightness handling is still a bit of a mess, I did add a TODO but never done it. As it is, gamma is now applied before brightness which results in color loss. Should really move brightness and gamma down to bus level but that messes with CCT.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally did this for ESP8266 only as I did not want to drag along the "no buffer" code everywhere as it adds conditionals in the render hot path but making it a feature flag that is only supported on ESP8266 makes sense if this path is needed in the future.

I absolutely agree it should be a compile time feature gate so we can keep the hot path conditional-free in most cases.

My biggest concern is being able to get strict compatibility. If we can't 100% guarantee identical blending results, then I think we should have the ability to turn the feature on and off regardless of architecture. I might want my short-strip ESP8266es to match my ESP32s, but someone else who has a bunch of old 8266es with long strips deployed might want to go the other way around and have their new ESP32s match their 8266es.

I'm happy to take "on by default in our 8266 envs, off by default on ESP32" via our platformio environments (or platform HAL later), but I don't think the algorithm choice should be tied to the processor type in the code. If/when we support new platforms, we might face similar tradeoffs in the future.

edit: checking the code, the gamma and brightness handling is still a bit of a mess, I did add a TODO but never done it. As it is, gamma is now applied before brightness which results in color loss. Should really move brightness and gamma down to bus level but that messes with CCT.

Yeah, it's not easy to untangle. I'm happy letting them be different for the time being as long as we can choose which path is taken on a per-build basis.

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, when I looked through the code today I noticed some discrepancies (see also my hidden enquiry to the rabbit below). There are some downsides not using a global buffer, it can diminish color quality when using overlapping segments - I did not test yet if it even looks right, there may be bugs. So being able to disable it as a feature flag makes sense.

edit:
we could even make the no-buffer path the opt-in define since there were no complaints I know of in 16.0

Comment on lines +161 to +171
static int8_t s_pins[WLED_MAX_BB_CHANNELS];
static uint16_t s_numPixels[WLED_MAX_BB_CHANNELS];
static uint8_t* s_pixelData[WLED_MAX_BB_CHANNELS];
static uint8_t s_channelCount;
static uint32_t s_allMask; // GPIO bitmask of all registered output pins
static uint8_t s_stagedCount; // how many channels have called show() this frame
// Timing in CPU cycles — identical across all channels
static uint32_t s_t0h;
static uint32_t s_t1h;
static uint32_t s_period;
static uint8_t s_pixelBytes; // bytes per encoded pixel

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be in a singleton that's heap allocated when a relevant bus is constructed for the first time. There's no need to permanently waste RAM if they're not used.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point, I had to look up what a singleton is but makes perfect sense. Should do the same in ESP32 bit bang driver, maybe other places too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point, I had to look up what a singleton is but makes perfect sense. Should do the same in ESP32 bit bang driver, maybe other places too.

Yes, IMO there's way too much in common between the two versions of the bit bang driver - they should probably be sharing code.

I haven't thoroughly reviewed all of the other drivers yet but I rather suspect the same basic issue applies in many places. Todays AIs are just like junior devs, they absolutely love to copy-and-paste or write from scratch when they should be building re-usable functions and data structures. (Offhand: the replicated DMA encoding loops are a prime example.)

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure there is room for optimization, I had claude come up with the structure for each driver seperately and changed stuff over and over (as you can see in the commit history) to distill the current version - it was not clear to me in the beginning what the best structure will be or what will even work so did it step by step with no grand master plan.
The color encoding was a bit of a head-scratcher in the beginning but I am pretty happy how that turned out.

edit: I first did the ESP8266 BB code and once that worked well had claude derive the ESP32 version from that and then fixed it until working. Apart from a few details they are pretty much identical.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as low quality.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as low quality.

@willmmiles

Copy link
Copy Markdown
Member

I'll be reviewing slowly and in stages - there's a lot here and it's difficult to get through all at once. We don't necessarily have to address everything before we land this; it's big enough that we can expect some ongoing development.

Timeline-wise, I think we want to land V5 first and take this immediately after.

@DedeHai

DedeHai commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

I'll be reviewing slowly and in stages - there's a lot here and it's difficult to get through all at once. We don't necessarily have to address everything before we land this; it's big enough that we can expect some ongoing development.

Timeline-wise, I think we want to land V5 first and take this immediately after.

there is no rush - I am still testing cases and find bugs all the time. I did test each and every driver and every LED type at one point but changed it later again so not sure everything is in order, for example the LCD driver, I did many tests during development using the 16-parallel output but it's possible it is now broken when I changed to the 8-parallel version. I made this PR to get some feedback but it is by no means finished and ready to merge.

I agree that the V5 should be done first - it will probably break all the drivers though as some API calls won't be available or changed names. It will take some work to get them running again but the hard part was the structure and the correct init sequences - API calls are more semantics at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants