News Score: Score the News, Sort the News, Rewrite the Headlines

Building a Polite & Fast Web Crawler

In that order. Dennis Schubert, an engineer at Mozilla and a noteworthy contributor to diaspora, a distributed, open-source social network, recently observed that 70% of the load on diaspora's servers was coming from poorly behaved bots that feed the LLMs of a few big outfits. The worst offenders, amounting to 40% of total traffic combined, were OpenAI and Amazon. While there are zillions of articles on general crawling etiquette, there are scant millions on how to actually abide by the rules while a...

Read more at cameronboehmer.com
