Decoding UTF8 with Parallel Extract
23 Mar 2024
As a side quest, I recently wrote a branchless UTF8 decoder that uses
the pext ("parallel extract") instruction.
It's compliant with RFC 3629, meaning that it doesn't just naively decode
the code point but also rejects overlong encodings, surrogate code points and such.
Compiled with gcc -O3 -march=x86-64-v3, the entire decoder comes out to just
about 29 instructions.
That's pretty sweet, if you ask me.
The full source code can be found here: utf8-pext.c.
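To give a rough idea of what pext buys you here, below is a minimal sketch, not the code from utf8-pext.c. It models pext in portable C (on real hardware you'd use the BMI2 intrinsic `_pext_u32` from `<immintrin.h>`), uses it to pull the payload bits out of a UTF8 sequence, and then applies the RFC 3629 range checks. The names `pext32_soft`, `utf8_payload` and `utf8_scalar_ok` are made up for illustration, the sequence length is assumed to be known already, and unlike the real decoder this sketch is neither branchless nor does it verify the lead/continuation byte markers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of pext: collect the bits of src selected by mask and
   pack them contiguously into the low bits of the result. */
static uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t out = 0;
    uint32_t out_bit = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & lowest)
            out |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1; /* clear that bit and move on */
    }
    return out;
}

/* Hypothetical helper: extract the payload bits of a sequence that is
   len (1..4) bytes long. It does NOT check the marker bits themselves. */
static uint32_t utf8_payload(const unsigned char *s, int len)
{
    /* Payload bits per length, with the bytes packed big-endian:
       0xxxxxxx / 110xxxxx 10xxxxxx / 1110xxxx 10xxxxxx 10xxxxxx / ... */
    static const uint32_t payload_mask[5] = {
        0, 0x0000007F, 0x00001F3F, 0x000F3F3F, 0x073F3F3F
    };
    uint32_t packed = 0;
    for (int i = 0; i < len; i++)
        packed = (packed << 8) | s[i];
    return pext32_soft(packed, payload_mask[len]);
}

/* RFC 3629 range checks: no overlong forms, no surrogates, max U+10FFFF. */
static bool utf8_scalar_ok(uint32_t cp, int len)
{
    static const uint32_t min_cp[5] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    if (cp < min_cp[len])
        return false; /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false; /* surrogate code point */
    return cp <= 0x10FFFF;
}
```

For example, the three bytes E2 82 AC pack to 0xE282AC, and extracting with the mask 0x0F3F3F gives 0x20AC, which passes the range checks. The two bytes C0 80 extract to 0, which is below the 2-byte minimum of 0x80 and so gets rejected as overlong.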
UTF8: what is it?
A smal...
Read more at nrk.neocities.org