The results are not really good. Can you change it to BERT?
And can you make BERT zero-shot classification with a CLIP model, where I can enter text, labels, and an image?
I agree, the results are not perfect. Yes, I could do it, but it would take some time and I'm very busy, sorry.
OK, please make it, because I enjoy multimodal input (text prompt, labels, and image).
I agree, the results are not perfect. Yes, I could do it, but it would take some time and I'm very busy, sorry.
Is it done?
I agree, the results are not perfect. Yes, I could do it, but it would take some time and I'm very busy, sorry.
Is it done?
Sorry for the confusion, but I'm not going to do it, don't keep waiting. As I said, I'm very busy and these spaces are only experiments. I never said I was going to do it. I just said "I could do it", because I believe so, but that is not the focus of my experiments.