---
license: apache-2.0
language:
- hu
library_name: sentence-transformers
tags:
- sentence-similarity
widget:
- source_sentence: "Szép napunk van."
  sentences:
  - "Jó az idő."
  - "Szép az autó."
  - "Elutazok egy napra."
  example_title: "Példa"
---

# Hungarian Experimental Sentence-BERT

The pre-trained [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) model was fine-tuned on the [Hunglish 2.0](http://mokk.bme.hu/resources/hunglishcorpus) parallel corpus to mimic the [bert-base-nli-stsb-mean-tokens](https://huggingface.co/sentence-transformers/bert-base-nli-stsb-mean-tokens) model provided by UKPLab. Sentence embeddings were obtained by applying mean pooling to the huBERT output. The data was split into training (98%) and validation (2%) sets. By the end of training, the model reached a mean squared error of 0.106 on the validation set. Our code is based on the [Sentence-Transformers](https://www.sbert.net) library. The model was trained for 2 epochs on a single GTX 1080 Ti GPU with a batch size of 32; training took approximately 15 hours.

## Limitations

- max_seq_length = 128

## Usage

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('NYTK/sentence-transformers-experimental-hubert-hungarian')
embeddings = model.encode(sentences)
print(embeddings)
```

## Citation

If you use this model, please cite the following paper:

```
@article{bertopic,
    title = {Analyzing Narratives of Patient Experiences: A BERT Topic Modeling Approach},
    journal = {Acta Polytechnica Hungarica},
    year = {2023},
    author = {Osváth, Mátyás and Yang, Zijian Győző and Kósa, Karolina},
    pages = {153--171},
    volume = {20},
    number = {7}
}
```
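
## Mean pooling (plain Transformers)

The card above says sentence embeddings are obtained by mean pooling the huBERT output. Below is a minimal sketch of that pooling step using plain `transformers` instead of `sentence-transformers`; it assumes the checkpoint loads with `AutoModel`/`AutoTokenizer` (typical for Sentence-Transformers models) and that padding tokens should be masked out before averaging.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of the mean pooling described above, using plain transformers.
model_id = 'NYTK/sentence-transformers-experimental-hubert-hungarian'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Szép napunk van.", "Jó az idő."]
# max_length=128 mirrors the max_seq_length limitation noted above.
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=128, return_tensors='pt')

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq, hidden)

# Mean pooling: average token embeddings, ignoring padding positions.
mask = encoded['attention_mask'].unsqueeze(-1).float()     # (batch, seq, 1)
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embeddings.shape)  # e.g. torch.Size([2, 768])
```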
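
As a quick check of the similarity behaviour the widget illustrates, the embeddings can be compared with cosine similarity. This is a standard usage pattern, not part of the original card; it reuses the widget's Hungarian sentences.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('NYTK/sentence-transformers-experimental-hubert-hungarian')
# Source: "We are having a nice day." vs. three candidate sentences.
source = model.encode("Szép napunk van.", convert_to_tensor=True)
candidates = model.encode(["Jó az idő.", "Szép az autó.", "Elutazok egy napra."],
                          convert_to_tensor=True)
print(util.cos_sim(source, candidates))  # higher score = more similar
```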