from sentence_transformers import SentenceTransformer, SimilarityFunction

import streamlit as st

model_name = "nomic-ai/nomic-embed-text-v2-moe"
with st.form("embedding"):
    sentence1 = st.text_input(label="Sentence 1:", value="Hello!")
    sentence2 = st.text_input(label="Sentence 2:", value="¡Hola!")
    sim_fun = st.selectbox(
        "Similarity Function",
        ["COSINE", "DOT_PRODUCT", "EUCLIDEAN", "MANHATTAN"],
    )
    # Sample sentences displayed in the form so they can be copied into the inputs.
    examples = [
        "와 아침에 눈뜨고 세시간 가까이 핸드폰만 함.. ㅁㅊ 책 좀 읽어야겠다...",
        "Wow, I opened my eyes in the morning and spent almost three hours on my phone... I guess I should read a book...", # translation of above
        "To train DeepSeek-R1-Zero, we begin by designing a straightforward template that guides the base model to adhere to our specified instructions.",
        "Many will say to me in that day, Lord, Lord, have we not prophesied in thy name? and in thy name have cast out devils? and in thy name done many wonderful works? And then will I profess unto them, I never knew you: depart from me, ye that work iniquity.",
        "When you're born you get a ticket to the freak show. When you're born in America, you get a front row seat." # George Carlin
    ]
    for x in examples:
        st.write(x)
    calculate = st.form_submit_button('Calculate')

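@st.cache_resource
def load_model(name: str) -> SentenceTransformer:
    # Cache the model across reruns so it is not reloaded on every submit.
    return SentenceTransformer(name, trust_remote_code=True)


if calculate:
    model = load_model(model_name)
    # Encode both sentences using the model's "passage" prompt.
    embeddings = model.encode([sentence1, sentence2], prompt_name="passage")

    # Map the selected name (e.g. "COSINE") to the SimilarityFunction enum member.
    model.similarity_fn_name = getattr(SimilarityFunction, sim_fun)

    # similarity() returns a 1x1 tensor for two single vectors; extract the scalar.
    similarity = model.similarity(embeddings[0], embeddings[1]).item()
    st.write(f"similarity: {similarity:.4f}")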
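
The four options offered in the selectbox correspond to standard vector comparisons. Below is a minimal NumPy sketch of what each one computes for a pair of vectors. It assumes, as I understand recent sentence-transformers releases to behave, that the Euclidean and Manhattan scores are reported as negated distances so that a larger value always means "more similar". `pairwise_scores` is a hypothetical helper written for illustration; it is not part of the app or of the library.

```python
import numpy as np


def pairwise_scores(a, b):
    """Return the four similarity scores for two 1-D vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    return {
        # Cosine: dot product of the two vectors after length normalisation.
        "COSINE": float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))),
        # Dot product: sensitive to vector magnitude as well as direction.
        "DOT_PRODUCT": float(a @ b),
        # Assumption: distances are negated so that a higher score means "closer".
        "EUCLIDEAN": float(-np.linalg.norm(diff)),
        "MANHATTAN": float(-np.abs(diff).sum()),
    }


scores = pairwise_scores([3.0, 4.0], [4.0, 3.0])
```

For embeddings that are already unit-length, cosine and dot product give the same value, which is one reason the choice of function matters less for normalised models than the raw numbers might suggest.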