author | sterni <sternenseemann@systemli.org> | 2021-03-04T21·22+0100 |
---|---|---|
committer | sterni <sternenseemann@systemli.org> | 2021-03-05T11·07+0000 |
commit | b810c46a45c0bbf52b8a896d6d5d37f79f027d9f (patch) | |
tree | 60241864c4abb42dd8e61e32da5f0421faddb84f /users/sterni/nix/int | |
parent | 5ae1d3fd7b363b6aab71b94626055ea91675f90d (diff) |
feat(users/sterni/nix/utf8): pure nix utf-8 decoder r/2270
users.sterni.nix.utf8 implements UTF-8 decoding in pure nix. We implement the decoding as a simple state machine which is fed one byte at a time. Whole strings can be decoded by calling step repeatedly, once per byte. This is done in decode, which uses builtins.foldl' to get around recursion restrictions, plus a neat trick puck showed me: using builtins.deepSeq to limit the size of the thunks built up in a foldl' (which can otherwise also cause a stack overflow).

This makes decoding arbitrarily large UTF-8 files into codepoints using nix theoretically possible, but it is not really practical: decoding a 36KB LaTeX file I had lying around takes ~160s on my laptop.

Change-Id: Iab8c973dac89074ec280b4880a7408e0b3d19bc7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2590
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
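The foldl'/deepSeq pattern described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from users/sterni/nix/utf8: the `step` function here is a hypothetical stand-in that only counts and sums bytes, and `strictStep` is a name made up for this sketch. The point it shows is that builtins.foldl' only forces the accumulator to weak head normal form, so the attributes of a state attrset would otherwise pile up as nested thunks.

```nix
let
  # Hypothetical state machine step (NOT the real UTF-8 decoder):
  # takes the previous state and one byte, returns the next state.
  step = state: byte: {
    count = state.count + 1;
    sum = state.sum + byte;
  };

  # Fully evaluate each intermediate state before handing it back to
  # foldl', so no chain of unevaluated thunks builds up inside the attrset.
  strictStep = state: byte:
    let next = step state byte;
    in builtins.deepSeq next next;

  # Stand-in input: the "bytes" 0, 1, ..., 99999.
  bytes = builtins.genList (i: i) 100000;
in
  builtins.foldl' strictStep { count = 0; sum = 0; } bytes
```

Evaluating the file with `nix-instantiate --eval --strict` should yield an attrset with the final count and sum. Replacing `strictStep` with plain `step` is the variant that risks the stack overflow mentioned above once the input gets large enough.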
Diffstat (limited to 'users/sterni/nix/int')
-rw-r--r-- | users/sterni/nix/int/default.nix | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/users/sterni/nix/int/default.nix b/users/sterni/nix/int/default.nix
index cd46fe7864d1..b3157571272f 100644
--- a/users/sterni/nix/int/default.nix
+++ b/users/sterni/nix/int/default.nix
@@ -97,6 +97,8 @@ let
   # i. e. they truncate towards 0
   mod = a: b: let res = a / b; in a - (res * b);
 
+  inRange = a: b: x: x >= a && x <= b;
+
 in {
   inherit
     maxBound
@@ -117,5 +119,6 @@ in {
     bitXor
     toHex
     fromHex
+    inRange
     ;
 }
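As a usage illustration for the new helper: inRange takes an inclusive lower bound, an inclusive upper bound, and then the value to test, so it can be partially applied to build a predicate. The sketch below copies the definition from the diff. The name `isContinuationByte` and the choice of bounds (128 to 191, the UTF-8 continuation byte range 0x80 to 0xBF written in decimal) are assumptions made for this example and are not taken from the commit.

```nix
let
  # Definition as added in this commit: both bounds are inclusive.
  inRange = a: b: x: x >= a && x <= b;

  # Hypothetical predicate built by partial application.
  isContinuationByte = inRange 128 191;
in
  map isContinuationByte [ 127 128 191 192 ]
  # → [ false true true false ]
```

Putting the bounds first and the tested value last is what makes this kind of partial application convenient.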