author | Florian Klink <flokli@flokli.de> | 2023-11-11T19·14+0200 |
---|---|---|
committer | flokli <flokli@flokli.de> | 2023-11-11T19·49+0000 |
commit | 46964f6d8f95590748855976fc77ce1faa75d708 (patch) | |
tree | 11da67f7d54062f87cb6dc10f2169d2ffb7347ec /third_party | |
parent | 281cb93ba808b73d4ea4ce86f762bbcb504a09da (diff) |
fix(users/flokli/archaeology): don't use file but column compression r/6994
ClickHouse also has column compression, configurable with the output_format_parquet_compression_method setting. It defaults to lz4, so the previous setting produced a zstd-compressed Parquet file containing lz4-compressed data.

Set output_format_parquet_compression_method to zstd instead, and sort by timestamp before assembling the Parquet file.

The existing files were updated to the same format with the following query:

```
SELECT *
FROM file('bucket_logs_2023-11-11*.pq', 'Parquet', 'auto')
ORDER BY timestamp ASC
INTO OUTFILE 'bucket_logs_2023-11-11.parquet'
SETTINGS output_format_parquet_compression_method = 'zstd'
```

Change-Id: Id63b14c82e7bf4b9907a500528b569a51e277751
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10008
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
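
The file-versus-column distinction in this message can be checked from the outside with magic numbers. The following is a minimal sketch, not part of this change: the names `classify` and `classify_path` are hypothetical, and it only tells a zstd-wrapped file apart from a bare Parquet container. It does not reveal which codec the column chunks use; that is recorded in the Parquet footer metadata and needs a Parquet reader to inspect.

```python
# Hypothetical helper names; only the two magic numbers are facts.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number 0xFD2FB528, little-endian
PARQUET_MAGIC = b"PAR1"           # Parquet files start and end with these four bytes


def classify(data: bytes) -> str:
    """Guess how a file is wrapped, based only on its leading/trailing bytes."""
    if data[:4] == ZSTD_MAGIC:
        # The whole file is a zstd frame (file-level compression).
        return "zstd-wrapped file"
    if len(data) >= 8 and data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC:
        # A bare Parquet container; any compression is per column chunk.
        return "plain Parquet container"
    return "unknown"


def classify_path(path: str) -> str:
    """Classify a file on disk by reading only its first and last 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)              # jump to end of file to learn its size
        size = f.tell()
        if size < 8:
            return classify(head)
        f.seek(size - 4)
        tail = f.read(4)
    # Pad the middle so classify() sees at least 8 bytes.
    return classify(head + tail)
```

Under these assumptions, a file written with file-level zstd compression would be reported as "zstd-wrapped file", while the output of the query above would be reported as "plain Parquet container".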
Diffstat (limited to 'third_party')
0 files changed, 0 insertions, 0 deletions