The powf builtin causes crashes on older clang, so manually implement the (faster) intrinsic. The code it spawns is identical to that of nvcc.