News Score: Score the News, Sort the News, Rewrite the Headlines

The One Billion Row Challenge in CUDA: from 17m to 17s

On my journey to learn CUDA, I decided to tackle the One Billion Row Challenge with it. The challenge is simple, but implementing it in CUDA was not. Here I will share my solution, which runs in 16.8 seconds on a V100. It’s certainly not the fastest solution, but it is the first one of its kind (no cudf, hand-written kernels only). I challenge other CUDA enthusiasts to make it faster.

Baseline in pure C++

You can’t improve what you don’t measure. Since I’m going to be writing C++ anyway for CUDA,...

Read more at tspeterkim.github.io
