From: https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
For performance stats, see: https://shadeup.dev/gjaqc7vl1b9y