---
tags:
- image-to-text
- image-captioning
license: apache-2.0
metrics:
- rouge
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
base_model:
- google/vit-base-patch16-224-in21k
model-index:
- name: mozilla/distilvit
  results:
  - task:
      type: image-to-text
      name: Image To Text
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 43.006
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.9939
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 38.8923
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 38.8877
      verified: true
    - name: loss
      type: loss
      value: 0.19939416646957397
    - name: gen_len
      type: gen_len
      value: 11.327256736227712
      verified: true
---

This model is a work in progress. You can find the code used to create the model here: https://github.com/mozilla/distilvit