---
license: mit
---

This is the feature-alignment pre-training work, in which only the multi-modal projector is trained. The task is to "predict" a paragraph given the caption, the OCR text, and the image token.
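As a rough illustration of the task described above, the following sketch assembles one input/target pair from a caption, OCR text, and an image token. The token string `<image>`, the prompt layout, and the names `AlignmentSample` and `build_sample` are all assumptions made for this example; the source does not specify the actual format.

```python
from dataclasses import dataclass

# Assumed placeholder; the real image token string is not given in the source.
IMAGE_TOKEN = "<image>"


@dataclass
class AlignmentSample:
    prompt: str  # model input: image token, caption, and OCR text
    target: str  # paragraph the model is trained to predict


def build_sample(caption: str, ocr: str, paragraph: str) -> AlignmentSample:
    """Assemble one (prompt, target) pair. The layout is hypothetical."""
    prompt = (
        f"{IMAGE_TOKEN}\n"
        f"Caption: {caption.strip()}\n"
        f"OCR: {ocr.strip()}\n"
        "Paragraph:"
    )
    return AlignmentSample(prompt=prompt, target=paragraph.strip())


# Illustrative values only, not taken from any real dataset.
sample = build_sample(
    caption="A stop sign at a street crossing.",
    ocr="STOP",
    paragraph="A red stop sign stands at the corner of a street crossing.",
)
print(sample.prompt)
print(sample.target)
```

In a setup like this, the loss would typically be computed only on `target`, with the projector being the only component that receives gradient updates.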