diff options
author | sterni <sternenseemann@systemli.org> | 2023-07-11T15·54+0200 |
---|---|---|
committer | clbot <clbot@tvl.fyi> | 2023-07-11T16·11+0000 |
commit | 4ba624efae2d63057c2bd5be23841be5017bd457 (patch) | |
tree | bf18ead4649ae11503bcefadb23550fd6763351d /tvix/eval/src/tests/tvix_tests | |
parent | 5b1c327c828f0450f4e73b0fa851dcd00f82c693 (diff) |
fix(tvix/eval): use byte, not codepoint index for slicing in escape r/6404
This fixes a subtle issue which would occasionally lead to a crash (e.g. when evaluating (pkgs.systemd.outPath with --trace-runtime): With each character in the string that has a multi byte representation in UTF-8, the actual byte position and what tvix thought it was would get out of sync. This could either lead to * Tvix swallowing characters or jumbling characters if multi byte characters would cause the tracked index to become out of sync with the byte position before the first character to be escaped, or * Tvix crashing if (in the same situation) the out of sync index would be within a UTF-8 byte sequence. Luckily, std's `char_indices()` iterator implements exactly what `nix_escape_char()`'s original author had in mind with `.chars().enumerate()`. Using `i + 1` for continuing is safe, since all characters that need (in fact, can) to be escaped in Nix are represented as a single byte in UTF-8. Change-Id: I1c836f70cde3d72db1c644e9112852f0d824715e Reviewed-on: https://cl.tvl.fyi/c/depot/+/8952 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: sterni <sternenseemann@systemli.org>
Diffstat (limited to 'tvix/eval/src/tests/tvix_tests')
-rw-r--r-- | tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.exp | 1 | ||||
-rw-r--r-- | tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.nix | 6 |
2 files changed, 7 insertions, 0 deletions
diff --git a/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.exp b/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.exp new file mode 100644 index 000000000000..d889063f9ab1 --- /dev/null +++ b/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.exp @@ -0,0 +1 @@ +"💭(\":thonking:\")" diff --git a/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.nix b/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.nix new file mode 100644 index 000000000000..49f4b6273106 --- /dev/null +++ b/tvix/eval/src/tests/tvix_tests/eval-okay-escape-string-correct-char-boundaries.nix @@ -0,0 +1,6 @@ +# Regression test for a bug where tvix would crash in nix_escape_string +# because it counted the string position by unicode code point count, +# but then used it as a byte index for slicing. Consequently, it would +# try slicing 💭 in half, thinking the first element to be escaped was +# at byte index 2 (i.e. the quote). +"💭(\":thonking:\")" |