endre sukosd committed on
Commit
31f8e76
1 Parent(s): 1565c8a

Large files lfs tracking

.gitattributes CHANGED
@@ -25,3 +25,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zstandard filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ data/preprocessed/shortened_abstracts_hu_2021_09_01.txt filter=lfs diff=lfs merge=lfs -text
+ data/preprocessed/shortened_abstracts_hu_2021_09_01_embedded.pt filter=lfs diff=lfs merge=lfs -text
+ data/raw/short-abstracts_lang=hu.ttl filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -61,7 +61,7 @@ Model facts:
 
  To reproduce the precalculated embedding use the notebook in `notebooks/QA_retrieval_precalculate_embeddings.ipynb`, with GPU in Google Colab.
 
- Known bug: the precalculated embeddings contain an extra random tensor in the beginning, thus the total size of 466529 (one more than the number of raw sentences). This is corrected by substracting 1 from the index of the most similar embedding, to find the corresponding raw sentence.
+ Known bug: the precalculated embeddings contain an extra tensor at the end, which is the empty newline at the end of the text file, this last index should be ignored
 
  ## Search top-k matches
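
A minimal sketch of the workaround described in the updated "Known bug" line, written in plain Python so it runs without a GPU. It assumes the extra embedding arises because splitting the text file on `"\n"` yields one trailing empty string; the commit does not show the loading code, so that mechanism is an assumption. The names `load_sentences` and `top_k` are hypothetical and do not come from the repository, and the similarity scores are shown as a plain list rather than whatever tensor type the notebook actually uses.

```python
def load_sentences(text: str) -> list[str]:
    """Split raw text into sentences, dropping the empty entry
    produced by a trailing newline (assumed cause of the extra embedding)."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]
    return lines


def top_k(scores: list[float], k: int, n_valid: int) -> list[int]:
    """Return the indices of the k highest scores, ignoring every
    index >= n_valid (for example, the extra last embedding)."""
    valid = range(min(n_valid, len(scores)))
    return sorted(valid, key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    sentences = load_sentences("first\nsecond\nthird\n")
    # Four scores for three sentences: the last score belongs to the
    # embedding of the empty trailing line and must not be returned.
    scores = [0.10, 0.90, 0.50, 0.99]
    for i in top_k(scores, k=2, n_valid=len(sentences)):
        print(i, sentences[i])
```

Restricting the candidate range to the number of raw sentences keeps every returned index usable as a direct lookup into the sentence list, with no index shifting.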
 
data/preprocessed/shortened_abstracts_hu_2021_09_01.txt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c7a591796d7ced733adfebf1b7c162a66084b5e27d506946cfb4d6749bc02d3
+ size 130726428
data/preprocessed/shortened_abstracts_hu_2021_09_01_embedded.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69cdcfc6da9e5cff770eaf5deeddfb84162a61b23954bc7585b5f65f98b25866
+ size 716589291
data/raw/short-abstracts_lang=hu.ttl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36d487bb7be95e2a8313fc381cd461967ee16063f64cd4e34a68b338b91cfcd4
+ size 180856895
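
The three ADDED files in this commit are Git LFS pointer files: short text stubs with `version`, `oid`, and `size` lines that stand in for the real data. The following is a rough sketch of reading such a pointer in Python, for example to check whether a checkout still holds pointers instead of the actual files. The function name `parse_lfs_pointer` is hypothetical and is not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file.

    Each non-empty line has the form "<key> <value>"; the oid value
    has the form "<hash algorithm>:<hex digest>".
    """
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:7c7a591796d7ced733adfebf1b7c162a66084b5e27d506946cfb4d6749bc02d3\n"
        "size 130726428\n"
    )
    info = parse_lfs_pointer(pointer)
    print(info["hash_algo"], info["size"])
```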