path: root/users/flokli/archeology/parse_bucket_logs.rs
Age | Commit message | Author | Files | Lines
2023-11-14 | r/7013 fix(users/flokli/archeology/parse_bucket_logs): fix regex and skip | Florian Klink | 1 | -1/+2
It seems the regex is not perfect; it choked on a single log line:

```
Nov 13 03:10:19 archeology-ec2 59nkrwmih3ywaxrgxqj79pn395fs6m17-parse-bucket-logs-continuously[11105]: Code: 117. DB::Exception: Line "d57bd890fbd1ae16625bdb8168064125e013198099b7e1b3c24878a4d03c3ab8 nix-cache [12/Nov/2023:09:13:02 +0000] xxx.xx.xxx.xxx - VB7SJVZ108DSSN67 REST.POST.OBJECT index.html "POST /index.html HTTP/1.1" 405 MethodNotAllowed 348 - 4 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" - 0bFdGKbi0n9JHXU1a2hijcJwmYdc6lG2xgbdozc3wS6mlUkBE7ssrQCHIDdOLebo78o2cGbhivY= - ECDHE-RSA-AES128-GCM-SHA256 - nix-cache.s3.amazonaws.com TLSv1.2 - -" doesn't match the regexp.: (in file/uri log/2023-11-12-10-19-50-80805A702ECF65EB): (at row 5)
```

This was due to the user-agent field. The regex is now fixed.

The request itself is fun (someone trying to POST an index.html to the bucket), and we should probably filter this on the Fastly side already, not via IAM.

In any case, there's no point failing to parse if a single line doesn't match the regex; we can just skip such lines.

For the sake of completeness, logs for that day have been reprocessed and reuploaded.

Change-Id: Id98a7167a381cda06d150ad5118ee9e70ead277e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10034
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
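
The "skip instead of fail" idea from this commit can be illustrated with a small standalone sketch. This is not the code from the commit, which matches lines against a regex. It swaps in a hand-written tokenizer so that it needs only the standard library. The `Entry` struct, the function names, and the choice of extracted fields are hypothetical. The field positions are an assumption read off the sample log line above.

```rust
/// A few fields pulled out of one access log line (hypothetical selection).
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    timestamp: &'a str,
    operation: &'a str,
    status: u16,
    user_agent: &'a str,
}

/// Split a line into space-separated fields, treating `[...]` and `"..."`
/// as single fields so that spaces inside the timestamp or the user agent
/// do not break the split. Returns None on an unterminated bracket or quote.
fn split_fields(line: &str) -> Option<Vec<&str>> {
    let mut fields = Vec::new();
    let mut rest = line.trim();
    while !rest.is_empty() {
        let (field, tail) = if let Some(inner) = rest.strip_prefix('[') {
            let end = inner.find(']')?;
            (&inner[..end], &inner[end + 1..])
        } else if let Some(inner) = rest.strip_prefix('"') {
            let end = inner.find('"')?;
            (&inner[..end], &inner[end + 1..])
        } else {
            match rest.find(' ') {
                Some(end) => (&rest[..end], &rest[end..]),
                None => (rest, ""),
            }
        };
        fields.push(field);
        rest = tail.trim_start();
    }
    Some(fields)
}

/// Parse one line. Field positions are assumed from the sample line:
/// 2 = timestamp, 6 = operation, 9 = HTTP status, 16 = user agent.
fn parse_line<'a>(line: &'a str) -> Option<Entry<'a>> {
    let f = split_fields(line)?;
    Some(Entry {
        timestamp: f.get(2).copied()?,
        operation: f.get(6).copied()?,
        status: f.get(9)?.parse().ok()?,
        user_agent: f.get(16).copied()?,
    })
}

/// Parse every line, skipping (and counting) the ones that do not parse
/// instead of aborting the whole run.
fn parse_all<'a>(lines: impl IntoIterator<Item = &'a str>) -> (Vec<Entry<'a>>, usize) {
    let mut entries = Vec::new();
    let mut skipped = 0;
    for line in lines {
        match parse_line(line) {
            Some(entry) => entries.push(entry),
            None => skipped += 1,
        }
    }
    (entries, skipped)
}
```

Returning the skipped count alongside the parsed entries keeps malformed lines visible to the caller, for example for logging, without letting one bad line fail the whole file.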
2023-11-11 | r/6994 fix(users/flokli/archaeology): don't use file but column compression | Florian Klink | 1 | -2/+5
Clickhouse also has column compression, configurable with the output_format_parquet_compression_method setting. It defaults to lz4, so the previous setting produced a zstd-compressed parquet file containing lz4-compressed data.

Set output_format_parquet_compression_method to zstd instead, and sort by timestamp before assembling the parquet file.

The existing files were updated to the same format with the following query:

```
SELECT *
FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto')
ORDER BY timestamp ASC
INTO OUTFILE 'bucket_logs_2023-11-11.parquet'
SETTINGS output_format_parquet_compression_method = 'zstd'
```

Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
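
If the same rewrite had to be repeated for other days, the query above could be generated rather than edited by hand. The following is a minimal sketch under assumptions, not code from this repository: the function names are hypothetical, and the file naming pattern is simply copied from the query in the commit message. It only builds strings and does not run anything.

```rust
/// Build the re-compression query for one day, e.g. "2023-11-11".
/// The day is interpolated into SQL string literals, so anything other
/// than digits and dashes is rejected (hypothetical safety check).
fn recompress_query(day: &str) -> Option<String> {
    let valid = !day.is_empty() && day.chars().all(|c| c.is_ascii_digit() || c == '-');
    if !valid {
        return None;
    }
    Some(format!(
        "SELECT * FROM file('bucket_logs_{day}*.pq', 'Parquet', 'auto') \
         ORDER BY timestamp ASC \
         INTO OUTFILE 'bucket_logs_{day}.parquet' \
         SETTINGS output_format_parquet_compression_method = 'zstd'",
        day = day
    ))
}

/// Arguments one might pass to clickhouse-local for such a query.
/// Which flags the real script uses is an assumption here.
fn clickhouse_local_args(query: &str) -> Vec<String> {
    vec![
        "--progress".to_string(),
        "--query".to_string(),
        query.to_string(),
    ]
}
```

Keeping the compression method in the `SETTINGS` clause of the query means that column compression is decided in one place, which is the point of this commit.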
2023-11-11 | r/6991 feat(users/flokli/archeology): show clickhouse-local progress | Florian Klink | 1 | -1/+2
This behaviour might change (or not), see https://github.com/ClickHouse/ClickHouse/pull/42003, but as of now, passing `--progress` provides some progress output.

Change-Id: I4891b6e2f96f2656858e71f88a226d24f0d45dc3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10005
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-11-11 | r/6989 feat(users/flokli/archeology): init parse-bucket-logs | Florian Klink | 1 | -0/+37
Change-Id: I096b6fed8c73ddd5a417f5183cc113356ffd98c9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9983
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>