GitHub - droher/etymology-db: An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship types.
etymology-db
Downloads: (Last generated 2023-12-05)
Gzipped CSV
Parquet
A structured, comprehensive, and multilingual etymology dataset created by parsing Wiktionary's etymology sections. Key features:
4.2+ million etymological relationships between 2.0+ million terms in 3300+ languages/dialects
31 different types of etymological relations, distinguishing between inheritance, borrowing, etc.
Hierarchical data that preserves relationship structures, such as the evolution of a term across language...
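As a rough illustration of working with the gzipped CSV download, the sketch below tallies how many rows fall under each relation type, using only the Python standard library. The column name `reltype` and the file name in the usage note are assumptions made for this example and are not confirmed by the text above; check the header row of the actual file and adjust accordingly.

```python
import csv
import gzip
from collections import Counter


def count_relation_types(path, reltype_column="reltype"):
    """Count rows per relation type in a gzipped CSV file.

    `reltype_column` is an assumed column name; pass the real one
    if the file's header differs.
    """
    counts = Counter()
    # Text mode with newline="" lets the csv module handle line endings itself.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row[reltype_column]] += 1
    return counts
```

With a local copy saved under a hypothetical name such as `etymology.csv.gz`, `count_relation_types("etymology.csv.gz").most_common(5)` would list the five most frequent relation types. Reading row by row keeps memory use low, which matters for a file with millions of entries.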