Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine
Large language models (LLMs), exemplified by Generative Pre-trained Transformer 4 (GPT-4) [1], have achieved remarkable performance on various biomedical tasks [2], including summarizing medical evidence [3], assisting in literature search [4,5], answering medical examination questions [6,7,8,9], and matching patients to clinical trials [10]. However, most of these LLMs are unimodal, using only free-text context, whereas clinical tasks often require integrating narrative descriptions with multiple types of data, such as medical images.