Embedding User-Defined Indexes in Apache Parquet Files
Posted on: Mon 14 July 2025 by Qi Zhu, Jigao Luo, and Andrew Lamb
It’s a common misconception that Apache Parquet files are limited to basic Min/Max/Null Count statistics and Bloom filters, and that adding more advanced indexes requires changing the specification or creating a new file format. In fact, footer metadata and offset-based addressing already provide everything needed to embed user-defined index structures within Parquet files without breaking compatibility with other Parquet readers....
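The idea in the paragraph above can be sketched with a toy container file. This is not the Parquet format itself: real Parquet footers are Thrift-encoded and the magic bytes are `PAR1`, whereas this sketch uses a JSON footer and an invented `TOY1` magic for brevity. The metadata key names (`my_index_offset`, `my_index_length`) and all function names are hypothetical, chosen only for illustration. What the sketch does share with Parquet is the layout principle: the footer sits at the end of the file, readers locate everything through offsets recorded in that footer, and the footer carries free-form key/value metadata that readers can ignore.

```python
import io
import json
import struct

MAGIC = b"TOY1"                    # hypothetical; real Parquet uses b"PAR1"
INDEX_OFFSET_KEY = "my_index_offset"  # hypothetical key/value metadata keys
INDEX_LENGTH_KEY = "my_index_length"


def write_file(buf, data: bytes, index: bytes) -> None:
    """Write: magic | data | custom index | footer | footer length | magic."""
    buf.write(MAGIC)
    data_offset = buf.tell()
    buf.write(data)
    # The index lives in a region that the footer's data offsets never reference.
    index_offset = buf.tell()
    buf.write(index)
    footer = json.dumps({
        "data_offset": data_offset,
        "data_length": len(data),
        # Free-form metadata: only index-aware readers look at these keys.
        "key_value_metadata": {
            INDEX_OFFSET_KEY: str(index_offset),
            INDEX_LENGTH_KEY: str(len(index)),
        },
    }).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))  # 4-byte little-endian length
    buf.write(MAGIC)


def read_footer(buf) -> dict:
    """Locate the footer by reading the trailing length and magic."""
    buf.seek(-8, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", buf.read(4))
    if buf.read(4) != MAGIC:
        raise ValueError("not a toy container file")
    buf.seek(-(8 + footer_len), io.SEEK_END)
    return json.loads(buf.read(footer_len))


def read_data(buf) -> bytes:
    """A reader that knows nothing about the index: it follows data offsets only."""
    footer = read_footer(buf)
    buf.seek(footer["data_offset"])
    return buf.read(footer["data_length"])


def read_index(buf):
    """An index-aware reader: look up the offset in the metadata, then seek."""
    kv = read_footer(buf)["key_value_metadata"]
    if INDEX_OFFSET_KEY not in kv:
        return None  # file was written without the custom index
    buf.seek(int(kv[INDEX_OFFSET_KEY]))
    return buf.read(int(kv[INDEX_LENGTH_KEY]))
```

Because `read_data` never consults the extra metadata keys and never visits the index bytes, adding the index does not change what an unaware reader sees, which is the compatibility property the paragraph describes.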
Read more at datafusion.apache.org