KennethEnevoldsen committed on
Commit
8de3fd0
1 Parent(s): c7392f2

Update README.md

  1. README.md +34 -4
README.md CHANGED
@@ -1,10 +1,18 @@
- Trained using SimCSE with:
+ ---
+ license: mit
+ datasets:
+ - DDSC/dagw_no_twitter
+ language:
+ - da
+ tags:
+ - SimCSE
+ ---
+ Trained using the [SimCSE](https://github.com/princeton-nlp/SimCSE) implementation with:
 
  ```
  CUDA_VISIBLE_DEVICES=0 python train.py \
- --train_file data/dfm_paragraphs.txt \
+ --train_file data/dfm_paragraphs.txt \ # paragraphs extracted from Danish Gigaword
  --model_name_or_path chcaa/dfm-encoder-large-v1 \
- --output_dir result/dfm-sentence-encoder-medium-v4 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 128 \
  --learning_rate 1e-5 \
@@ -19,4 +27,26 @@ CUDA_VISIBLE_DEVICES=0 python train.py \
  --temp 0.05 \
  --do_train \
  --fp16
- ```
+ ```
+
+
+ ## Citation
+
+ To cite this work please refer to the following article:
+
+ ```
+ Enevoldsen, K., Kardos, M., Muennighoff, N., & Nielbo, K. (2024). The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding. https://openreview.net/forum?id=pJl_i7HIA72
+ ```
+
+ or use the following BibTeX:
+ ```
+ @article{enevoldsenScandinavianEmbeddingBenchmarks2024,
+ title = {The {Scandinavian} {Embedding} {Benchmarks}: {Comprehensive} {Assessment} of {Multilingual} and {Monolingual} {Text} {Embedding}},
+ shorttitle = {The {Scandinavian} {Embedding} {Benchmarks}},
+ url = {https://openreview.net/forum?id=pJl_i7HIA72},
+ language = {en},
+ urldate = {2024-04-12},
+ author = {Enevoldsen, Kenneth and Kardos, Márton and Muennighoff, Niklas and Nielbo, Kristoffer},
+ month = feb,
+ year = {2024},
+ }
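
The `--temp 0.05` flag in the training command sets the temperature of the contrastive objective that SimCSE optimizes. As a rough illustration of what that value controls, here is a minimal pure-Python sketch of a contrastive loss with in-batch negatives. It is not the code in the SimCSE repository's `train.py`: `cosine`, `contrastive_loss`, and the random toy vectors are hypothetical names and data introduced only for this example, and the small Gaussian perturbation merely stands in for the dropout noise that unsupervised SimCSE is understood to use when creating a second view of each text.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(view1, view2, temp=0.05):
    """Mean cross-entropy where row i of view1 should match row i of view2.

    Every other row of view2 acts as an in-batch negative.
    """
    total = 0.0
    for i, u in enumerate(view1):
        # Similarities to every candidate, scaled by the temperature.
        logits = [cosine(u, v) / temp for v in view2]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the true pair (column i).
        total += log_denom - logits[i]
    return total / len(view1)


random.seed(0)
# Toy "embeddings": 8 texts, 16 dimensions (illustrative sizes only).
batch = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# Hypothetical second view: a tiny perturbation standing in for dropout noise.
noisy = [[x + random.gauss(0, 0.01) for x in row] for row in batch]

aligned = contrastive_loss(batch, noisy)           # pairs line up
mismatched = contrastive_loss(batch, noisy[::-1])  # pairs deliberately broken
print(aligned < mismatched)
```

A lower `temp` sharpens the softmax over the batch, so negatives that are close to the anchor are penalised more strongly; raising it flattens the distribution.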