wchai committed on
Commit
f28aaa7
1 Parent(s): 5e0cf82

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +102 -0
README.md ADDED
@@ -0,0 +1,102 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - Reself/AuroraCap-trainset
+ base_model:
+ - lmsys/vicuna-7b-v1.5-16k
+ tags:
+ - caption
+ model-index:
+ - name: AuroraCap-7B
+   results:
+   - task:
+       type: image caption
+     dataset:
+       type: Flickr
+       name: Flickr
+     metrics:
+     - type: cider
+       value: 88.9
+     - type: bleu
+       value: 75.6
+       name: bleu@1
+     - type: bleu
+       value: 32.8
+       name: bleu@4
+     - type: meteor
+       value: 26.7
+     - type: rouge
+       value: 55.4
+       name: rouge-l
+   - task:
+       type: image caption
+     dataset:
+       type: NoCaps
+       name: NoCaps
+     metrics:
+     - type: cider
+       value: 111.4
+     - type: bleu
+       value: 85.6
+       name: bleu@1
+     - type: bleu
+       value: 44.4
+       name: bleu@4
+     - type: meteor
+       value: 29.9
+     - type: rouge
+       value: 60.6
+       name: rouge-l
+   - task:
+       type: image caption
+     dataset:
+       type: COCO-Cap
+       name: COCO-Cap
+     metrics:
+     - type: cider
+       value: 120.8
+     - type: bleu
+       value: 78.0
+       name: bleu@1
+     - type: bleu
+       value: 35.3
+       name: bleu@4
+     - type: meteor
+       value: 28.6
+     - type: rouge
+       value: 57.2
+       name: rouge-l
+ ---
+
+ <img src="assets/teaser.png" align="center">
+
+ ## Resources
+
+ - [Website](https://rese1f.github.io/aurora-web/)
+ - [arXiv: Paper]()
+ - [GitHub: Code](https://github.com/rese1f/aurora)
+ - [Huggingface: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)
+ - [Huggingface: VDC Benchmark](https://huggingface.co/datasets/Reself/Video-Detailed-Caption)
+ - [Huggingface: Trainset](https://huggingface.co/datasets/Reself/AuroraCap-trainset)
+
+ ## Features
+
+ <img src="assets/vdc_baseline.png" align="center">
+
+ AuroraCap is a multimodal large language model for image and video captioning.
+
+ ## Quick Start
+
+ See the [Docs](https://github.com/rese1f/aurora/blob/main/docs/auroracap/README.md).
+
+ ## FAQ
+
+ Q: Can token merging be used only during inference?
+
+ A: No. Our experiments show that token merging can also accelerate training while maintaining similar performance. Besides AuroraCap, token merging can also be applied to other LLaVA-like models.
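
To make the idea of token merging concrete, here is a minimal pure-Python sketch of one common formulation: bipartite matching on cosine similarity, in which the `r` most redundant tokens are averaged into their nearest partners. This README does not describe the exact procedure AuroraCap uses, so the function name `merge_tokens`, the alternating split, and the plain averaging are illustrative assumptions rather than the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Hypothetical helper: remove up to r tokens by merging them into similar ones.

    tokens: list of feature vectors (lists of floats).
    r: number of tokens to merge away; clamped to the size of set A.
    Returns a new, shorter list of token vectors in the original order.
    """
    # Split alternately into set A (even positions) and set B (odd positions).
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    if not b_idx:
        return [list(t) for t in tokens]
    r = max(0, min(r, len(a_idx)))

    # For each token in A, find its most similar partner in B.
    edges = []
    for i in a_idx:
        best_j = max(b_idx, key=lambda j: cosine(tokens[i], tokens[j]))
        edges.append((cosine(tokens[i], tokens[best_j]), i, best_j))

    # Keep the r highest-similarity edges; those A tokens are merged away.
    edges.sort(key=lambda e: e[0], reverse=True)
    merged_into = {i: j for _, i, j in edges[:r]}

    # Average each B token with the A tokens merged into it.
    sums = {j: list(tokens[j]) for j in b_idx}
    counts = {j: 1 for j in b_idx}
    for i, j in merged_into.items():
        sums[j] = [s + x for s, x in zip(sums[j], tokens[i])]
        counts[j] += 1

    out = []
    for k, tok in enumerate(tokens):
        if k in merged_into:
            continue  # this A token was absorbed by its partner in B
        if k in sums:
            out.append([s / counts[k] for s in sums[k]])
        else:
            out.append(list(tok))
    return out
```

Because the function only shortens the token sequence, the same step could in principle sit in front of the language model during training or inference, which is the point the answer above makes. For example, `merge_tokens(visual_tokens, r=len(visual_tokens) // 4)` would drop roughly a quarter of the tokens, where `visual_tokens` is a placeholder name for the visual features.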
+
+ Q: Why do we provide both official LLaVA-format and Xtuner-format weights for AuroraCap?
+
+ A: While Xtuner supports saving checkpoints in multiple formats, it currently supports continued training only from Xtuner-format checkpoints. We therefore currently provide the model in the Xtuner format for both continued training and inference. In the future, we will also provide the model in the official LLaVA format for both training and inference, enabling quicker SGLang deployment and integration with the transformers library.
+
+ ## Citation