author Florian Klink <flokli@flokli.de> 2023-11-11T19·14+0200
committer flokli <flokli@flokli.de> 2023-11-11T19·49+0000
commit 46964f6d8f95590748855976fc77ce1faa75d708
tree 11da67f7d54062f87cb6dc10f2169d2ffb7347ec /tvix/boot
parent 281cb93ba808b73d4ea4ce86f762bbcb504a09da
fix(users/flokli/archaeology): don't use file but column compression r/6994
ClickHouse also has column compression, configurable with the
output_format_parquet_compression_method setting.

It defaults to lz4, so the previous setup produced a zstd-compressed
parquet file containing lz4-compressed column data.

Set output_format_parquet_compression_method to zstd instead, and sort
by timestamp before assembling the parquet file.

The existing files were updated to the same format with the following query:

```
SELECT *
FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto')
ORDER BY timestamp ASC
INTO OUTFILE 'bucket_logs_2023-11-11.parquet'
SETTINGS output_format_parquet_compression_method = 'zstd'
```
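
One way to check which codec the columns of the resulting file use is
to read the parquet footer. This is only a sketch: it assumes a
ClickHouse release that ships the ParquetMetadata input format, which
may be newer than the version used for this change, and it reuses the
output file name from the query above.

```
-- Sketch, assuming the ParquetMetadata input format is available.
-- Lists the distinct compression codecs recorded for the columns.
SELECT
    num_rows,
    num_row_groups,
    arrayDistinct(
        arrayMap(c -> tupleElement(c, 'compression'), columns)
    ) AS codecs
FROM file('bucket_logs_2023-11-11.parquet', 'ParquetMetadata')
```

If the setting took effect, the codecs array should list only a zstd
codec rather than lz4.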

Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/boot')
0 files changed, 0 insertions, 0 deletions