diff options
author | Ryan Lahfa <tvl@lahfa.xyz> | 2024-01-05T23·41+0100 |
---|---|---|
committer | clbot <clbot@tvl.fyi> | 2024-01-20T17·16+0000 |
commit | 1f1a42b4da34bb2ad0cd75d6c822e2d24a19c0a2 (patch) | |
tree | d635b4b831c6ea80770540ad74eb52840ec3843d /tvix/store/src/utils.rs | |
parent | e98ea31bbd17e71566d56810fa9c1d08960461ad (diff) |
feat(tvix/castore): ingestion does DFS and invert it r/7430
To make use of the filtering feature, we need to revert the internal walker to a real DFS. We will therefore just invert the whole tree by storing all of its contents in a level-keyed vector. This is horribly expensive in memory, this is a compromise between CPU and memory, here is the fundamental reason for why: When you encounter a directory, it's either a leaf or not, i.e. it contains subdirectories or not. To know this fact, you can: - wait until you notice subdirectories under it, i.e. you need to store any intermediate nodes you see in the meantime -> memory penalty. - getdents or readdir on it to determine *NOW* its subdirectories -> CPU penalty and I/O penalty. This is an implementation of the first proposal, we pay memory. In practice, we are paying O(#nb of nodes) in memory. There's a smarter albeit much more complicated algorithm that pays only O(\sum_i #siblings(p_i)) nodes where (p_1, ..., p_n) is the path to a leaf. which means for: A / \ B C / / \ D E F We would never store D, E, F but only E, F at a given time. But we would still store B, C no matter what. Change-Id: I456ed1c3f0db493e018ba1182665d84bebe29c11 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10567 Tested-by: BuildkiteCI Autosubmit: raitobezarius <tvl@lahfa.xyz> Reviewed-by: flokli <flokli@flokli.de>
Diffstat (limited to 'tvix/store/src/utils.rs')
0 files changed, 0 insertions, 0 deletions