I found a nice algorithm for blurring images: StackBlur by Mario Klingemann. It is relatively fast and gives decent quality. You can check it here – web demonstration. As you can see, it is usable even in web projects.
As I wanted to include it in my cross-platform engine, I found two C++ implementations:
The first is SSE-friendly; the second replaces division with static lookup tables. However, neither uses all CPU cores. I took the second one as the foundation for my implementation, since I expect my code to run on mobile devices with no SSE support. Single-core processing of a 1920×1200 RGBA image with a 100 px radius took only 219 ms (Intel Q9550, Windows 7).
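To illustrate what "division via tables" means here, below is a minimal sketch of the general trick: replacing a division by a fixed divisor with a multiply and a shift. It is not the table from the implementation I used. The names `kShift`, `reciprocal_for_radius` and `divide_sum` are made up for this example, and I assume 8-bit channels and a radius of at most 254. The divisor is (radius + 1)², the sum of the StackBlur weights 1, 2, ..., radius + 1, ..., 2, 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shift width chosen for this sketch (not taken from stackblur.cpp).
// 40 bits keeps the result exact for 8-bit channels and radius <= 254.
constexpr int kShift = 40;

// Returns ceil(2^kShift / d), where d = (radius + 1)^2 is the sum of the
// stack weights 1, 2, ..., radius + 1, ..., 2, 1.
inline std::uint64_t reciprocal_for_radius(int radius)
{
    assert(radius >= 0 && radius <= 254);
    const std::uint64_t d =
        static_cast<std::uint64_t>(radius + 1) * static_cast<std::uint64_t>(radius + 1);
    return ((std::uint64_t(1) << kShift) + d - 1) / d;
}

// Computes sum / d without a division instruction.
// `sum` is a weighted channel sum, so it must not exceed 255 * d.
inline std::uint8_t divide_sum(std::uint32_t sum, std::uint64_t mul)
{
    return static_cast<std::uint8_t>((sum * mul) >> kShift);
}
```

The multiplier is computed once per blur call (or read from a precomputed table indexed by radius), so the inner loop only does a multiply and a shift per channel. A real table would probably use narrower shifts to stay within 32-bit arithmetic, which matters on mobile CPUs.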
I turned the StackBlur code into a multi-threaded version. On my quad-core CPU the same task now takes 63 ms, roughly a 3.5x speedup and close to the 4x I expected. You can download that part of my library below and use it as a foundation for your own needs.
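For readers who want the general idea before opening the file, here is a minimal sketch of one way to spread the work over cores, assuming the usual two-pass structure of StackBlur (a horizontal pass over rows, then a vertical pass over columns). The helper names `parallel_ranges` and `blur_two_pass`, and the two callbacks, are hypothetical and exist only for this example; the downloadable code may organize its threads differently.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, total) into contiguous slices and runs
// work(begin, end) for each slice on its own thread. Returns only after
// every worker has finished.
inline void parallel_ranges(int total, int threads,
                            const std::function<void(int, int)>& work)
{
    assert(total >= 0);
    threads = std::max(1, std::min(threads, total));
    std::vector<std::thread> pool;
    pool.reserve(static_cast<std::size_t>(threads));
    for (int t = 0; t < threads; ++t) {
        const int begin = static_cast<int>(static_cast<long long>(total) * t / threads);
        const int end = static_cast<int>(static_cast<long long>(total) * (t + 1) / threads);
        pool.emplace_back(work, begin, end);
    }
    for (std::thread& th : pool) {
        th.join();
    }
}

// Hypothetical driver: row_pass(y0, y1) blurs rows [y0, y1) horizontally,
// col_pass(x0, x1) blurs columns [x0, x1) vertically.
inline void blur_two_pass(int width, int height, int threads,
                          const std::function<void(int, int)>& row_pass,
                          const std::function<void(int, int)>& col_pass)
{
    // Rows do not depend on each other, so each slice can run in parallel.
    parallel_ranges(height, threads, row_pass);
    // The vertical pass reads the output of the horizontal pass, so it
    // starts only after all row workers have been joined.
    parallel_ranges(width, threads, col_pass);
}
```

The join between the two passes is the only synchronization point, and no locks are needed because each worker writes to its own rows or columns. `std::thread::hardware_concurrency()` is a reasonable default for the thread count.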
Download the multi-threaded 32-bit color (RGBA) version of StackBlur: stackblur.cpp
I believe the algorithm can be optimized even further. Any suggestions are welcome.