Long-Context Retrieval Models with Monarch Mixer

Text embeddings are a critical piece of many pipelines, from search to RAG to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be much longer: books, legal cases, TV screenplays, and code repositories can run to tens of thousands of tokens or more. Here, we're taking a first step towards developing long-context retrieval models. We build on Monarch Mixer (M2)…
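To make the 512-token limit concrete, here is a minimal sketch (not from the post) using the Hugging Face `transformers` tokenizer; `bert-base-uncased` is just a stand-in for any standard BERT-based embedder:

```python
# Minimal sketch: a typical BERT-based embedder silently truncates input
# at its maximum context length, so most of a long document never reaches
# the model. "bert-base-uncased" is an illustrative stand-in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

long_document = "word " * 20_000  # far longer than two pages of text

# Anything past max_length (512 here) is dropped before embedding.
encoded = tokenizer(long_document, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # -> 512
```

Everything beyond the first 512 tokens is invisible to the retrieval model, which is exactly the gap a long-context embedder aims to close.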

Read more at hazyresearch.stanford.edu
