Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine | Companion of the 2024 International Conference on Management of Data
ABSTRACT
Apache Arrow DataFusion is a fast, embeddable, and extensible query engine written in Rust that uses Apache Arrow as its memory model. In this paper, we describe the technologies on which it is built and how it fits into long-term database implementation trends. We then enumerate its features, optimizations, architecture, and extension APIs to illustrate the breadth of requirements of modern OLAP engines as well as the interfaces needed by systems built with them. Finally, we demonstrate o...
Read more at dl.acm.org