Text Generation · Transformers · PyTorch · English · llava · Inference Endpoints
SpursgoZmy committed
Commit 16cd77b (1 parent: 16d48ca)

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -33,7 +33,7 @@ It was trained with a two-stage pipeline as LLaVA:
  2. Instruction tuning: train the vision-language connector and the base LLM with multimodal instruction following data of tabular and non-tabular tasks.

  **Code Base:** We use the official code of [LLaVA-v1.5](https://github.com/haotian-liu/LLaVA) for model training and inference,
- and the saved model checkpoint is uploaded to this repository.
+ and the saved model checkpoint is uploaded to this repository. Thus, Table LLaVA can be used in the same way as the normal LLaVA v1.5 model with its original code.

  **Model Date:** Table-LLaVA 13B was trained in January 2024.

@@ -73,9 +73,9 @@ Table LLaVA is based on LLaVA-1.5 and thus follows its license. Llama 2 is licen

  ## Limitations

- Though the proposed Table-LLaVA demonstrates
+ Table LLaVA takes one table image as the model input. Digesting multiple table images would be valuable to support more application scenarios. Though the proposed Table-LLaVA demonstrates
  great performance on a wide range of table-based
  tasks, the resolution of input images (336*336) is relatively
  low and may limit the upper bound of its capacity. Luckily, with the emergence of MLLMs which
  possess higher input image resolution (e.g., Monkey (Li et al., 2023d), LLaVA-Next (Liu et al.,
- 2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
+ 2024)), researchers can use MMTab to develop more powerful tabular MLLM in the future research.
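
Since the updated README states that the checkpoint follows the standard LLaVA-v1.5 format and is used with the original LLaVA code, inference can presumably follow the upstream LLaVA quick-start pattern. The sketch below is a minimal example under that assumption: the model path, table image path, and prompt are placeholders (this section does not give the exact Hub repository id), and it assumes the `llava` package from the linked repository is installed.

```python
# Minimal inference sketch using the LLaVA-v1.5 code base (https://github.com/haotian-liu/LLaVA).
# All paths and the prompt below are placeholders; point them at this repository's checkpoint
# and a single table image.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/table-llava-v1.5-13b"  # placeholder: local dir or Hub id of this checkpoint
image_file = "path/to/table_image.png"       # placeholder: one table image (336*336 input resolution)
prompt = "Which row has the largest value in the last column?"  # placeholder table question

# eval_model expects an argparse-like namespace; this mirrors the LLaVA quick-start usage.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```

The same arguments can also be supplied on the command line (e.g., `python -m llava.eval.run_llava --model-path ... --image-file ... --query ...`), as documented in the upstream LLaVA repository.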