This project compares the performance of a box blur algorithm implemented in plain C++, with OpenCV, and with Apple's Metal GPU framework.
| Implementation | Blur Time (ms) |
|---|---|
| (1) Basic C++ | 2753 |
| (2) C++ & OpenCV | 31 |
| (3) Swift & Metal | 11 |
Blurring `input.jpg` with a uniform 19x19 kernel; setup and I/O times are excluded. (1) and (3) use the same algorithm; (2) uses OpenCV's `filter2D`.
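A direct box blur averages the full 19x19 neighbourhood for every output value, i.e. 19 × 19 = 361 pixel reads per pixel and channel. The snippet below is a minimal sketch of that direct approach, not the project's `applyBoxBlur`. The `Image` type and the `boxBlurNaive` function are hypothetical, and it assumes a single 8-bit grayscale channel with clamp-to-edge border handling.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical grayscale image: row-major, one 8-bit channel per pixel.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Clamp an index into [lo, hi] (assumed clamp-to-edge border handling).
inline int clampIndex(int v, int lo, int hi) {
    return std::max(lo, std::min(v, hi));
}

// Naive box blur: each output pixel is the mean of the (2r+1)x(2r+1)
// neighbourhood around it; radius = 9 gives a 19x19 kernel.
Image boxBlurNaive(const Image& in, int radius) {
    Image out{in.width, in.height, std::vector<std::uint8_t>(in.pixels.size())};
    const int side = 2 * radius + 1;
    const int area = side * side;  // 361 for a 19x19 kernel
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            int sum = 0;  // at most 361 * 255, fits in an int
            for (int dy = -radius; dy <= radius; ++dy) {
                const int sy = clampIndex(y + dy, 0, in.height - 1);
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int sx = clampIndex(x + dx, 0, in.width - 1);
                    sum += in.pixels[sy * in.width + sx];
                }
            }
            // Divide by the kernel area, rounding to the nearest integer.
            out.pixels[y * in.width + x] =
                static_cast<std::uint8_t>((sum + area / 2) / area);
        }
    }
    return out;
}
```

For example, blurring a 3x1 image `{0, 0, 255}` with `radius = 1` yields `{0, 85, 170}`. The four nested loops make the cost proportional to width × height × kernel area, which is consistent with the direct C++ version being far slower than the optimized alternatives.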
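Requirements: the `swiftc` compiler (part of Xcode on macOS) and OpenCV, installed with `brew install opencv`.

Build everything with `make all` (the Swift source is compiled with Xcode's `swiftc`), then run an implementation, for example:

`./build/blur_basic`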

Profiling uses macOS's `xctrace` tool and Xcode's Instruments, the equivalent of the `perf` tool on Linux:

`xctrace record --launch -- ./build/blur_basic`
The trace shows the time spent in the `applyBoxBlur` function of the basic C++ implementation.

Times are in milliseconds, measured with timing calls on the CPU main thread (see the source code).
| Implementation | Setup (ms) | Read (ms) | Blur (ms) | Write (ms) | Total (ms) |
|---|---|---|---|---|---|
| Basic C++ | - | 3 | 2753 | 2 | 2758 |
| OpenCV | - | 3 | 31 | 2 | 36 |
| Metal | 63 | 20 | 11 | 4 | 98 |
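Per-stage timings like these can be collected on the main thread with a monotonic clock. The following is a minimal sketch using `std::chrono::steady_clock`; the `timeMs` helper and the `StageTimes` struct are hypothetical and not taken from the project source. For GPU work, a stage timed this way only covers the full GPU execution if the CPU waits for the GPU to finish inside the timed region.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs one stage on the calling (main) thread and
// returns its wall-clock duration in milliseconds.
template <typename F>
double timeMs(F&& stage) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(stage)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical per-stage record; the total is the sum of the stages,
// matching how the Total column adds up (e.g. Metal: 63 + 20 + 11 + 4 = 98).
struct StageTimes {
    double setup;
    double read;
    double blur;
    double write;
    double total() const { return setup + read + blur + write; }
};
```

Usage: `double blurMs = timeMs([&] { /* blur call */ });`. A steady clock is used because, unlike the system clock, it cannot jump backwards during a measurement.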