import streamlit as st

# Retrieval helpers from the local image_search module; only search_images is
# called directly in this script.
from image_search import load_model, process_image, process_text, search_images

st.set_page_config(
    page_title="Bangla CLIP Search",
    page_icon="chart_with_upwards_trend",
)
# Page-wide CSS for the introduction box and the result thumbnails.
st.markdown(
    """
<style>
#introduction {
    padding: 10px 20px;
    background-color: #aad9fe;
    border-radius: 10px;
}

#introduction p {
    font-size: 1.1rem;
    color: #050e14;
}

img {
    padding: 5px;
}
</style>
""",
    unsafe_allow_html=True,
)
# Hide Streamlit's default hamburger menu and footer.
hide_streamlit_style = """
<style>
#MainMenu {visibility: hidden;}
footer {visibility: hidden;}
</style>
"""
st.markdown(hide_streamlit_style, unsafe_allow_html=True)


st.markdown("# বাংলা CLIP সার্চ ইঞ্জিন ")
st.markdown("""---""")
# Brief introduction to CLIP and the dual-encoder model behind this demo.
st.markdown(
    """
<div id="introduction">

<p>
Contrastive Language-Image Pre-training (CLIP), a simplified version of ConVIRT trained from scratch, is an efficient method of learning image representations from natural language supervision. CLIP jointly trains an image encoder and a text encoder to predict the correct pairings of a batch of (image, text) training examples. At test time, the learned text encoder synthesizes a zero-shot linear classifier by embedding the names or descriptions of the target dataset's classes.

The model behind this demo pairs an EfficientNet image encoder with a BERT text encoder and was trained on multiple Bangla image-text datasets.
</p>
</div>
""",
    unsafe_allow_html=True,
)
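
# Illustration only (not executed by the app): a minimal sketch of the ranking step
# that search_images presumably performs, assuming the query is embedded with the
# text encoder, the demo images with the image encoder, and results are ordered by
# cosine similarity in the shared embedding space. The function and argument names
# below are hypothetical, not part of image_search.
import numpy as np


def rank_by_cosine_similarity(text_embedding: np.ndarray,
                              image_embeddings: np.ndarray,
                              k: int = 10):
    """Return indices and scores of the k images most similar to the text query."""
    # L2-normalise so that a dot product equals cosine similarity.
    text_embedding = text_embedding / np.linalg.norm(text_embedding)
    image_embeddings = image_embeddings / np.linalg.norm(
        image_embeddings, axis=1, keepdims=True
    )
    scores = image_embeddings @ text_embedding  # shape: (num_images,)
    top_k = np.argsort(-scores)[:k]             # highest similarity first
    return top_k, scores[top_k]

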
st.markdown("""---""") 
text_query = st.text_input(":mag_right: Search Images / ছবি খুজুন", "সুন্দরবনের নদীর পাশে একটি বাঘ")
st.markdown("""---""") 
number_of_results = st.slider("Number of results ", 1, 100, 10)
st.markdown("""---""") 

# Rank the images under demo_images/ against the text query and keep the top matches.
ret_imgs, ret_scores, _, _ = search_images(text_query, "demo_images/", k=number_of_results)

st.markdown("<div style='align: center; display: flex'>", unsafe_allow_html=True)
st.image([str(result) for result in ret_imgs], caption = ["Score: " + str(r_s) for r_s in ret_scores], width=230)
st.markdown("</div>", unsafe_allow_html=True)