Query Engines: Gatekeepers of the Parquet File Format
TL;DR: Mainstream query engines do not support reading newer Parquet encodings, forcing systems like DuckDB to default to writing older encodings, thereby sacrificing compression.
The Apache® Parquet™ Format
Apache Parquet is a popular, free, open-source, column-oriented data storage format.
Whereas database systems typically load data from formats such as CSV and JSON into database tables before analyzing it, Parquet is designed to be queried efficiently in place.
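To make "column-oriented" concrete, here is a toy sketch in plain Python. It is not Parquet's actual on-disk layout, and the sample records and the `to_columns` helper are hypothetical, made up for illustration. It only shows why an aggregate over a single column is cheaper when values are stored column by column.

```python
# Toy illustration only: this is NOT Parquet's real file layout.
# The records and the to_columns helper are hypothetical examples.
rows = [
    {"id": 1, "city": "Amsterdam", "temp_c": 11.5},
    {"id": 2, "city": "Berlin", "temp_c": 9.0},
    {"id": 3, "city": "Amsterdam", "temp_c": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented: every full record is visited just to read one field.
row_avg = sum(record["temp_c"] for record in rows) / len(rows)

# Column-oriented: only the "temp_c" values are touched.
col_avg = sum(columns["temp_c"]) / len(columns["temp_c"])
```

Both averages are the same; the difference is how much data has to be read to compute them. Storing similar values next to each other is also what makes per-column encodings and compression effective.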
Parquet considers that users...
Read more at duckdb.org