GitHub - facebookresearch/MILS: Code release for "LLMs can see and hear without any training"
LLMs can see and hear without any training
Official implementation of the paper "LLMs can see and hear without any training".
Installation
Create the conda environment from environment.yml, then activate it:
conda env create -f environment.yml
conda activate MILS
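To confirm that the activation step took effect before running anything else, a small check like the following can help. This is a minimal sketch, not part of the repository: it assumes conda exports the `CONDA_DEFAULT_ENV` variable on activation (standard conda behavior), and the helper names `active_conda_env` and `is_env_active` are hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset.

    Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") or None


def is_env_active(expected="MILS", environ=None):
    """Check whether the expected environment (here "MILS") is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if is_env_active():
        print("MILS environment is active")
    else:
        found = active_conda_env()
        print(f"Expected 'MILS', found {found!r}; run: conda activate MILS")
```

Running this script from a shell where `conda activate MILS` has not been executed should report the mismatch instead of failing later with missing-package errors.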
Dataset and checkpoints
Download the following datasets, annotations, and checkpoints:
MS-COCO: Download the MS-COCO validation dataset from the official website. Also download the 5,000-sample test split used in Karpathy et al., Deep visual-semantic alignments for ...