Large Language Models Fail at Simple Common Sense Tasks, Exhibit Overconfidence in Wrong Solutions: Urgent Reassessment Needed, Study Reveals

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

Abstract: Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in a few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across various sets of standardized benchmarks showing high scores for su...