path: root/users/flokli/nixos/archeology-ec2/configuration.nix
Age | Commit message | Author | Files | Lines
2023-11-12 | r/6996 feat(users/flokli/nixos/archeology-ec2): automate bucket log parsing | Florian Klink | 1 | -0/+18
This adds a `parse-bucket-logs.{service,timer}`, running once every night at 3AM UTC, figuring out the last time it was run and parsing bucket logs for all previous days.

It invokes the `archeology-parse-bucket-logs` script to produce a .parquet file with the bucket logs in `s3://nix-cache-log/log/` for that day (inside a temporary directory), then on success uploads the produced parquet file to `s3://nix-archeologist/nix-cache-bucket-logs/yyyy-mm-dd.parquet`.

Change-Id: Ia75ca8c43f8074fbaa34537ffdba68350c504e52
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10011
Reviewed-by: edef <edef@edef.eu>
Tested-by: BuildkiteCI
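The catch-up behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the change itself: the function names `pending_days` and `plan_for_day` are hypothetical, and the per-day source glob (`yyyy-mm-dd-*`) is an assumption extrapolated from the hourly glob shown in r/6993. Only the two bucket prefixes and the destination naming scheme come from the message.

```python
from datetime import date, timedelta

# Prefixes taken from the commit message.
LOG_PREFIX = "s3://nix-cache-log/log/"
DEST_PREFIX = "s3://nix-archeologist/nix-cache-bucket-logs/"


def pending_days(last_done, today):
    """Return every complete day after `last_done` and strictly before `today`.

    `last_done` is the most recent day that already has an uploaded parquet
    file; how the real service determines it is not shown here.
    """
    days = []
    day = last_done + timedelta(days=1)
    while day < today:
        days.append(day)
        day += timedelta(days=1)
    return days


def plan_for_day(day):
    """Build the (assumed) source glob and the destination object for one day."""
    stamp = day.isoformat()  # yyyy-mm-dd
    return {
        # Assumption: log object names start with the day, as in r/6993.
        "source_glob": f"{LOG_PREFIX}{stamp}-*",
        "destination": f"{DEST_PREFIX}{stamp}.parquet",
    }


if __name__ == "__main__":
    for d in pending_days(date(2023, 11, 9), date(2023, 11, 12)):
        print(plan_for_day(d))
```

Excluding the current day keeps a partially written day from being parsed. That matches "all previous days" in the message, though the real unit may implement this differently.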
2023-11-11 | r/6993 feat(users/flokli/nixos/archeology-ec2): add parse-bucket-logs | Florian Klink | 1 | -0/+4
This adds an `archeology-parse-bucket-logs` CLI tool to `$PATH`.

It can be invoked like this:

```
archeology-parse-bucket-logs http://nix-cache-log.s3.amazonaws.com/log/2023-11-10-00-* bucket_logs_2023-11-10-00.pq.zstd
```

… and will produce a zstd-compressed Parquet file for (roughly) that time range.

As the EC2 instance credentials don't give access to the logs bucket (yet), other AWS credentials need to be provided. This can be accomplished by using "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN" from "Option 2: Manually add a profile to your AWS credentials file (Short-term credentials)" in AWS IAM Identity Center.

Processing logs for a one-hour range takes a minute or two; the resulting zstd-compressed Parquet file is around 40-80M in size.

Processing logs for a whole day takes some 25 minutes due to the sheer amount of data (12 GB of raw log data, distributed among 450k individual files, 20 million log lines), and ClickHouse, at least, isn't able to parse the resulting Parquet file back in:

> Code: 36. DB::Exception: IOError: Couldn't deserialize thrift: MaxMessageSize reached

For future automation tasks, it's probably better to run this once an hour, and join the data later on.

Change-Id: I6c8108c0ec17dc8d4e2dbe923175553325210a5c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10007
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
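The "once an hour" suggestion can be illustrated by generating one invocation per hour of a day. This is a rough sketch only: `hourly_jobs` is a hypothetical helper, and it assumes every hour follows the same URL and output naming pattern as the single example in the message. It does not run the tool or join the resulting files.

```python
from datetime import date

# Base URL taken from the example invocation in the commit message.
BASE_URL = "http://nix-cache-log.s3.amazonaws.com/log/"


def hourly_jobs(day):
    """Return 24 (source_glob, output_file) pairs, one per hour of `day`.

    Assumption: log names are prefixed with yyyy-mm-dd-hh, as in the example.
    """
    jobs = []
    for hour in range(24):
        stamp = f"{day.isoformat()}-{hour:02d}"
        jobs.append((f"{BASE_URL}{stamp}-*", f"bucket_logs_{stamp}.pq.zstd"))
    return jobs


if __name__ == "__main__":
    for source, output in hourly_jobs(date(2023, 11, 10)):
        # Each pair corresponds to one archeology-parse-bucket-logs call.
        print("archeology-parse-bucket-logs", source, output)
```

Going by the one-hour figures quoted above, splitting the day this way would yield files of roughly 40-80M each, rather than the single large file that ClickHouse failed to read.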
2023-10-30 | r/6909 refactor(users/flokli): move common stuff to `archeology` profile | Florian Klink | 1 | -14/+1
Change-Id: I8470c0a2416c0c397e009affb44f8c7a852cd526
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9837
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-10-30 | r/6907 feat(users/flokli): add archeology-ec2 | Florian Klink | 1 | -0/+26
This adds the EC2 box config to the repo.

Change-Id: Id7a888a2cfbf1454cd9f9465018df377e14b4e9f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9836
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>