author | Vincent Ambo <mail@tazj.in> | 2023-02-02T12·51+0300 |
---|---|---|
committer | tazjin <tazjin@tvl.su> | 2023-02-02T17·50+0000 |
commit | 9d6f29a72b3b466dd697c2eaa97f9a41b767fdff (patch) | |
tree | b8964a57844dae703390247886b0530594c8a147 /tvix/cli/Cargo.toml | |
parent | 2c07ff0f8c126cb475c6e100b56bbaa03303dda7 (diff) |
refactor(tvix/cli): use Wu-Manber string scanning for drv references r/5825
Switch out the string-scanning algorithm used in the reference scanner.

The construction of aho-corasick automata made up the vast majority of runtime when evaluating nixpkgs previously. While the actual scanning with a constructed automaton is relatively fast, we almost never scan for the same set of strings twice and the cost is not worth it.

An algorithm that better matches our needs is the Wu-Manber multiple string match algorithm, which works efficiently on *long* and *random* strings of the *same length*, which describes store paths (up to their hash component).

This switches the refscanner crate to a Rust implementation[0][1] of this algorithm.

This has several implications:

1. This crate does not provide a way to scan streams. I'm not sure if this is an inherent problem with the algorithm (probably not, but it would need buffering). Either way, related functions and tests (which were actually unused) have been removed.

2. All strings need to be of the same length. For this reason, we truncate the known paths after their hash part (they are still unique, of course).

3. Passing an empty set of matches, or a match that is shorter than the length of a store path, causes the crate to panic. We safeguard against this by completely skipping the refscanning if there are no known paths (i.e. when evaluating the first derivation of an eval), and by bailing out of scanning a string that is shorter than a store path.

On the upside, this reduces overall runtime to less than 1/5 of what it was before when evaluating `pkgs.stdenv.drvPath`.

[0]: Frankly, it's a random, research-grade MIT-licensed crate that I found on GitHub: https://github.com/jneem/wu-manber

[1]: We probably want to rewrite or at least fork the above crate, and add things like a three-byte wide scanner. Evaluating large portions of nixpkgs can easily lead to more than 65k derivations being scanned for.
Change-Id: I08926778e1e5d5a87fc9ac26e0437aed8bbd9eb0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8017
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
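
The truncation and the two safeguards described in the message can be sketched as follows. This is a minimal illustration, not the code from this change and not the API of the wu-manber crate: the `Scanner` type, its methods and the `TRUNCATED_LEN` constant are hypothetical names, and the sketch assumes store paths of the form `/nix/store/` followed by a 32-character hash. It uses a simplified two-byte Wu-Manber shift table and verifies candidates with a plain `HashSet` lookup instead of the hash buckets of the original algorithm.

```rust
use std::collections::{HashMap, HashSet};

// Assumption: "/nix/store/" (11 bytes) followed by a 32-character hash.
const TRUNCATED_LEN: usize = 11 + 32;
// Width of the blocks used for the shift table (a "two-byte" scanner).
const BLOCK: usize = 2;

/// Hypothetical scanner over equal-length needles.
struct Scanner<'a> {
    needles: HashSet<&'a str>,
    shifts: HashMap<[u8; BLOCK], usize>,
}

impl<'a> Scanner<'a> {
    /// Returns `None` instead of panicking when there are no known paths,
    /// or when a path is too short to be truncated after its hash part.
    fn new(known_paths: &'a [String]) -> Option<Self> {
        if known_paths.is_empty() {
            return None;
        }
        let mut needles = HashSet::new();
        let mut shifts = HashMap::new();
        for path in known_paths {
            // Truncate after the hash part so all needles have equal length.
            let needle = path.get(..TRUNCATED_LEN)?;
            let bytes = needle.as_bytes();
            for j in 0..=TRUNCATED_LEN - BLOCK {
                // Distance from this block to the end of the needle.
                let shift = TRUNCATED_LEN - BLOCK - j;
                let entry = shifts.entry([bytes[j], bytes[j + 1]]).or_insert(shift);
                *entry = (*entry).min(shift);
            }
            needles.insert(needle);
        }
        Some(Scanner { needles, shifts })
    }

    /// Returns every truncated known path that occurs in `haystack`.
    fn scan(&self, haystack: &str) -> Vec<&'a str> {
        let mut found = Vec::new();
        let text = haystack.as_bytes();
        // Bail out early: a shorter string cannot contain a store path.
        if text.len() < TRUNCATED_LEN {
            return found;
        }
        // `pos` is the index of the last byte of the current window.
        let mut pos = TRUNCATED_LEN - 1;
        while pos < text.len() {
            let block = [text[pos - 1], text[pos]];
            // Blocks that occur in no needle allow the maximum shift.
            match self.shifts.get(&block).copied().unwrap_or(TRUNCATED_LEN - BLOCK + 1) {
                0 => {
                    // Some needle ends in this block: verify the full window.
                    let start = pos + 1 - TRUNCATED_LEN;
                    if let Some(n) = haystack.get(start..=pos).and_then(|w| self.needles.get(w)) {
                        found.push(*n);
                    }
                    pos += 1;
                }
                shift => pos += shift,
            }
        }
        found
    }
}
```

A caller would skip scanning entirely when `Scanner::new` returns `None`, which mirrors the "no known paths" case described above.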
Diffstat (limited to 'tvix/cli/Cargo.toml')
-rw-r--r-- | tvix/cli/Cargo.toml | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/tvix/cli/Cargo.toml b/tvix/cli/Cargo.toml
index 69b54bd299ec..2cda1c6c15a5 100644
--- a/tvix/cli/Cargo.toml
+++ b/tvix/cli/Cargo.toml
@@ -14,7 +14,9 @@ rustyline = "10.0.0"
 clap = { version = "4.0", features = ["derive", "env"] }
 dirs = "4.0.0"
 smol_str = "0.1"
-aho-corasick = "0.7"
 ssri = "7.0.0"
 data-encoding = "2.3.3"
 thiserror = "1.0.38"
+
+[dependencies.wu-manber]
+git = "https://github.com/jneem/wu-manber.git"