Text Generation · Transformers · PyTorch · English · llava · Inference Endpoints
SpursgoZmy committed
Commit 16cd77b (1 parent: 16d48ca)

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -33,7 +33,7 @@ It was trained with a two-stage pipeline as LLaVA:
  2. Instruction tuning: train the vision-language connector and the base LLM with multimodal instruction following data of tabular and non-tabular tasks.

  **Code Base:** We use the official code of [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA) for model training and inference,
- and the saved model checkpoint is uploaded to this repository.
+ and the saved model checkpoint is uploaded to this repository. Thus, Table LLaVA can be used in the same way as the normal LLaVA v1.5 model with its original code.

  **Model Date:** Table-LLaVA 13B was trained in January 2024.

@@ -73,9 +73,9 @@ Table LLaVA is based on LLaVA-1.5 and thus follows its license. Llama 2 is licen

  ## Limitations

- Though the proposed Table-LLaVA demonstrates
+ Table LLaVA takes one table image as the model input. Digesting multiple table images would be valuable to support more application scenarios. Though the proposed Table-LLaVA demonstrates
  great performance on a wide range of table-based
  tasks, the resolution of input images (336*336) is relatively
  low and may limit the upper bound of its capacity. Luckily, with the emergence of MLLMs which
  possess higher input image resolution (e.g., Monkey (Li et al., 2023d), LLaVA-Next (Liu et al.,
- 2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
+ 2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
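
Since the updated README states that the checkpoint follows the standard LLaVA-v1.5 format and is used with the original LLaVA code, inference can presumably follow the upstream LLaVA quick-start pattern. The sketch below is a minimal example under that assumption: the model path, table image path, and prompt are placeholders (this section does not give the exact Hub repository id), and it assumes the `llava` package from the linked repository is installed.

```python
# Minimal inference sketch using the LLaVA-v1.5 code base (https://github.com/haotian-liu/LLaVA).
# All paths and the prompt below are placeholders; point them at this repository's checkpoint
# and a single table image.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/table-llava-v1.5-13b"  # placeholder: local dir or Hub id of this checkpoint
image_file = "path/to/table_image.png"       # placeholder: one table image (336*336 input resolution)
prompt = "Which row has the largest value in the last column?"  # placeholder table question

# eval_model expects an argparse-like namespace; this mirrors the LLaVA quick-start usage.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```

The same arguments can also be supplied on the command line (e.g., `python -m llava.eval.run_llava --model-path ... --image-file ... --query ...`), as documented in the upstream LLaVA repository.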