author sterni <sternenseemann@systemli.org> 2021-03-04T21·22+0100
committer sterni <sternenseemann@systemli.org> 2021-03-05T11·07+0000
commit b810c46a45c0bbf52b8a896d6d5d37f79f027d9f
tree 60241864c4abb42dd8e61e32da5f0421faddb84f
parent 5ae1d3fd7b363b6aab71b94626055ea91675f90d
feat(users/sterni/nix/utf8): pure nix utf-8 decoder r/2270
users.sterni.nix.utf8 implements UTF-8 decoding in pure nix. The
decoder is a simple state machine which is fed one byte at a time;
whole strings are decoded by calling step on each byte in turn. This
is done in decode, which uses builtins.foldl' to get around recursion
restrictions, plus a neat trick puck showed me: using builtins.deepSeq
to limit the size of the thunks built up in a foldl' (which can
otherwise also cause a stack overflow).
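
To illustrate the shape of such a decoder, here is a minimal sketch in
Python rather than nix. It is not the code from this change: the state
layout, the INITIAL constant and the error handling are assumptions made
for the example, and functools.reduce stands in for builtins.foldl'. The
sketch rejects invalid lead bytes and stray continuation bytes, but it
does not check for overlong three- or four-byte forms or for surrogates.

```python
from functools import reduce

# Hypothetical state layout (not taken from the nix implementation):
# (continuation bytes still expected, partial code point, decoded code points)
INITIAL = (0, 0, ())


def step(state, byte):
    """Feed one byte into the state machine and return the next state."""
    remaining, partial, out = state
    if remaining == 0:
        # Expecting the first byte of a sequence.
        if byte < 0x80:
            return (0, 0, out + (byte,))      # single-byte (ASCII)
        if 0xC2 <= byte <= 0xDF:
            return (1, byte & 0x1F, out)      # lead byte of a 2-byte sequence
        if 0xE0 <= byte <= 0xEF:
            return (2, byte & 0x0F, out)      # lead byte of a 3-byte sequence
        if 0xF0 <= byte <= 0xF4:
            return (3, byte & 0x07, out)      # lead byte of a 4-byte sequence
        raise ValueError(f"invalid lead byte 0x{byte:02x}")
    # Expecting a continuation byte of the form 10xxxxxx.
    if byte & 0xC0 != 0x80:
        raise ValueError(f"invalid continuation byte 0x{byte:02x}")
    partial = (partial << 6) | (byte & 0x3F)
    if remaining == 1:
        return (0, 0, out + (partial,))       # sequence complete, emit it
    return (remaining - 1, partial, out)


def decode(data: bytes):
    """Decode UTF-8 bytes into a list of code points by folding step over them."""
    remaining, _, out = reduce(step, data, INITIAL)
    if remaining != 0:
        raise ValueError("input ends in the middle of a sequence")
    return list(out)
```

Calling decode("h\u00e9".encode("utf-8")) yields the code points for "h"
and "é". Python evaluates each step eagerly, so nothing like the deepSeq
trick is needed here; in a lazy language the accumulator would have to be
forced on every iteration to keep thunks from piling up. Appending to a
tuple on every step is quadratic and only acceptable for a sketch.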

This makes decoding arbitrarily large UTF-8 files into codepoints using
nix theoretically possible, but it is not really practical: Decoding a
36KB LaTeX file I had lying around takes ~160s on my laptop.

Change-Id: Iab8c973dac89074ec280b4880a7408e0b3d19bc7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2590
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>