import streamlit as st

from image_search import load_model, process_image, process_text, search_images

st.set_page_config(
    page_title="Bangla CLIP Search",
    page_icon="chart_with_upwards_trend",
)

# Custom CSS / HTML overrides (left empty in the source)
st.markdown(
    """ """,
    unsafe_allow_html=True,
)

hide_streamlit_style = """ """
st.markdown(hide_streamlit_style, unsafe_allow_html=True)

st.markdown("# বাংলা CLIP সার্চ ইঞ্জিন ")  # "Bangla CLIP Search Engine"
st.markdown("""---""")
st.markdown(
    """
Contrastive Language-Image Pre-training (CLIP), a simplified version of ConVIRT trained from scratch, is an efficient method of learning image representations from natural language supervision. CLIP jointly trains an image encoder and a text encoder to predict the correct pairings within a batch of (image, text) training examples. At test time, the learned text encoder synthesizes a zero-shot linear classifier by embedding the names or descriptions of the target dataset's classes. This model pairs an EfficientNet image encoder with a BERT text encoder and was trained on multiple datasets from the Bangla image-text domain.
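
The zero-shot step described above can be sketched as follows. This is a minimal illustration, assuming you already have L2-normalizable embeddings from the image and text encoders; the function name `zero_shot_probs` and the toy data are illustrative, not part of this repository's API.

```python
import numpy as np

def zero_shot_probs(image_emb: np.ndarray, text_embs: np.ndarray,
                    temperature: float = 0.07) -> np.ndarray:
    """Score one image embedding against N class-description embeddings."""
    # Normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # Temperature-scaled logits, then a numerically stable softmax
    logits = text_embs @ image_emb / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy usage: 3 class descriptions, 4-dimensional embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=4)
texts = rng.normal(size=(3, 4))
probs = zero_shot_probs(img, texts)  # one probability per class description
```

The class whose description embedding gives the highest probability is the zero-shot prediction; no labeled training data for the target classes is needed.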