Allows compiling for architectures with different data type sizes
Improvements: - Don't propagate values if they didn't change - Use custom sort algorithm to speedup the sorting - Allocate a contiguous array of channel outputs