Josue committed on
Commit 2e674fc · 1 Parent(s): d51a817
.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
32
  *.zip filter=lfs diff=lfs merge=lfs -text
33
  *.zst filter=lfs diff=lfs merge=lfs -text
34
  *tfevents* filter=lfs diff=lfs merge=lfs -text
35
+ nbest_predictions_.json filter=lfs diff=lfs merge=lfs -text
34t.txt ADDED
@@ -0,0 +1 @@
1
+ Gerry Cotten, creador de un mercado de compraventa de criptodivisas llamado QuadrigaCX, moría en la India el pasado 9 de diciembre de 2018. Lo hacía en circunstancias misteriosas, sobre todo porque tras su muerte no solo desaparecía él, sino también alrededor de 120 millones de euros en forma de criptodivisas. Los 115.000 clientes de QuadrigaCX veían así cómo sus inversiones se desvanecían, lo que puso en marcha una investigación rocambolesca que un año después no ha logrado averiguar dónde están el dinero. No es que no lo hayan intentado, porque se ha requerido incluso la exhumación del cadáver de Cotten para tratar de avanzar en ese proceso. Cotten fue uno de esos emprendedores que comenzó muy pronto a apostar por el mercado de las criptodivisas. Creó la empresa Quadriga en noviembre de 2013 en Vancouver con un socio llamado Michael Patryn —atentos, que este último es protagonista en este relato— y fueron de los primeros en poner en marcha un cajero automático con soporte de criptodivisas en Canadá. El negocio sufrió algunos altibajos, pero Cotten acabó haciendo la transición de Quadriga hacia un mercado de criptodivisas o exchange que operó notablemente durante la subida de valor de bitcoin en 2017. EN 2018, con la caída de los precios, varios clientes indicaron que habían tenido problemas al tratar de retirar fondos, y se comenzaron a poner en marcha investigaciones por potencial fraude.
README.md ADDED
@@ -0,0 +1,91 @@
1
+ ---
2
+ language: es
3
+ thumbnail: https://i.imgur.com/jgBdimh.png
4
+ ---
5
+
6
+ # BETO (Spanish BERT) + Spanish SQuAD2.0
7
+
8
+ This model is provided by the [BETO team](https://github.com/dccuchile/beto) and fine-tuned on [SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve) for the **Q&A** downstream task.
9
+
10
+ ## Details of the language model ('dccuchile/bert-base-spanish-wwm-cased')
11
+
12
+ Language model ([**'dccuchile/bert-base-spanish-wwm-cased'**](https://github.com/dccuchile/beto/blob/master/README.md)):
13
+
14
+ BETO is a [BERT model](https://github.com/google-research/bert) trained on a [big Spanish corpus](https://github.com/josecannete/spanish-corpora). BETO is similar in size to BERT-Base and was trained with the Whole Word Masking technique. The BETO repository provides TensorFlow and PyTorch checkpoints for the uncased and cased versions, as well as results on Spanish benchmarks comparing BETO with [Multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) and with other (not BERT-based) models.
15
+
16
+ ## Details of the downstream task (Q&A) - Dataset
17
+ [SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve)
18
+
19
+ | Dataset | # Q&A |
20
+ | ---------------------- | ----- |
21
+ | SQuAD2.0 Train | 130 K |
22
+ | SQuAD2.0-es-v2.0 | 111 K |
23
+ | SQuAD2.0 Dev | 12 K |
24
+ | SQuAD-es-v2.0-small Dev | 69 K |
25
+
26
+ ## Model training
27
+
28
+ The model was trained on a Tesla P100 GPU with 25 GB of RAM using the following command:
29
+
30
+ ```bash
31
+ export SQUAD_DIR=path/to/nl_squad
32
+ python transformers/examples/question-answering/run_squad.py \
33
+ --model_type bert \
34
+ --model_name_or_path dccuchile/bert-base-spanish-wwm-cased \
35
+ --do_train \
36
+ --do_eval \
37
+ --do_lower_case \
38
+ --train_file $SQUAD_DIR/train_nl-v2.0.json \
39
+ --predict_file $SQUAD_DIR/dev_nl-v2.0.json \
40
+ --per_gpu_train_batch_size 12 \
41
+ --learning_rate 3e-5 \
42
+ --num_train_epochs 2.0 \
43
+ --max_seq_length 384 \
44
+ --doc_stride 128 \
45
+ --output_dir /content/model_output \
46
+ --save_steps 5000 \
47
+ --threads 4 \
48
+ --version_2_with_negative
49
+ ```
50
+
51
+ ## Results
52
+
53
+
54
+ | Metric | Value |
55
+ | ---------------------- | ----- |
56
+ | **Exact** | **76.5050** |
57
+ | **F1** | **86.0781** |
58
+
59
+ ```json
60
+ {
61
+ "exact": 76.50501430594491,
62
+ "f1": 86.07818773108252,
63
+ "total": 69202,
64
+ "HasAns_exact": 67.93020719738277,
65
+ "HasAns_f1": 82.37912207996466,
66
+ "HasAns_total": 45850,
67
+ "NoAns_exact": 93.34104145255225,
68
+ "NoAns_f1": 93.34104145255225,
69
+ "NoAns_total": 23352,
70
+ "best_exact": 76.51223953064941,
71
+ "best_exact_thresh": 0.0,
72
+ "best_f1": 86.08541295578848,
73
+ "best_f1_thresh": 0.0
74
+ }
75
+ ```
76
+
77
+ ### Model in action (in a Colab Notebook)
78
+ <details>
79
+
80
+ 1. Set the context and ask some questions:
81
+
82
+ ![Set context and questions](https://media.giphy.com/media/mCIaBpfN0LQcuzkA2F/giphy.gif)
83
+
84
+ 2. Run predictions:
85
+
86
+ ![Run the model](https://media.giphy.com/media/WT453aptcbCP7hxWTZ/giphy.gif)
87
+ </details>
88
+
89
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
90
+
91
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain
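The evaluation JSON in the README reports overall scores alongside separate HasAns and NoAns scores. As a quick consistency check (a minimal sketch, not part of the commit), the overall `exact` and `f1` values should equal the example-weighted average of the two subsets. The numbers below are copied from the README; the helper name `weighted_overall` is hypothetical.

```python
# Values copied from the evaluation JSON shown in README.md.
metrics = {
    "exact": 76.50501430594491,
    "f1": 86.07818773108252,
    "total": 69202,
    "HasAns_exact": 67.93020719738277,
    "HasAns_f1": 82.37912207996466,
    "HasAns_total": 45850,
    "NoAns_exact": 93.34104145255225,
    "NoAns_f1": 93.34104145255225,
    "NoAns_total": 23352,
}


def weighted_overall(m, key):
    """Recombine the HasAns/NoAns scores for `key`, weighted by example count."""
    has_ans = m[f"HasAns_{key}"] * m["HasAns_total"]
    no_ans = m[f"NoAns_{key}"] * m["NoAns_total"]
    return (has_ans + no_ans) / (m["HasAns_total"] + m["NoAns_total"])


for key in ("exact", "f1"):
    recomputed = weighted_overall(metrics, key)
    # Report the reported value, the recomputed value, and their difference.
    print(key, metrics[key], recomputed, abs(recomputed - metrics[key]))
```

The two subsets add up to the reported `total` (45850 + 23352 = 69202), so the overall scores are dominated by the answerable questions, where exact match is noticeably lower than on the unanswerable ones.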
config.json ADDED
@@ -0,0 +1,20 @@
1
+ {
2
+ "architectures": [
3
+ "BertForQuestionAnswering"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "hidden_act": "gelu",
7
+ "hidden_dropout_prob": 0.1,
8
+ "hidden_size": 768,
9
+ "initializer_range": 0.02,
10
+ "intermediate_size": 3072,
11
+ "layer_norm_eps": 1e-12,
12
+ "max_position_embeddings": 512,
13
+ "model_type": "bert",
14
+ "num_attention_heads": 12,
15
+ "num_hidden_layers": 12,
16
+ "output_past": true,
17
+ "pad_token_id": 1,
18
+ "type_vocab_size": 2,
19
+ "vocab_size": 31002
20
+ }
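The sizes in `config.json` can be cross-checked against the weight files added in this commit. The sketch below estimates the parameter count from the config values alone. It assumes a standard BERT encoder layout, no pooler layer, a two-output span head for question answering, and float32 weights; none of these assumptions is confirmed by the diff, and `bert_qa_param_count` is a hypothetical helper.

```python
# Values copied from config.json in this commit.
config = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 31002,
}


def bert_qa_param_count(cfg, num_labels=2):
    """Estimate parameters of a BERT encoder plus a linear span head (no pooler)."""
    h, ff = cfg["hidden_size"], cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias
    # Word, position, and token-type embeddings, followed by one LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> ff -> h) with biases, then LayerNorm.
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    # Linear head producing start and end logits.
    qa_head = h * num_labels + num_labels
    return embeddings + encoder + qa_head


params = bert_qa_param_count(config)
float32_bytes = 4 * params  # assumes 4 bytes per weight
print(params, float32_bytes)
```

Under these assumptions the estimate is 109,261,826 parameters, or 437,047,304 bytes in float32. That is within a few kilobytes of the 437,054,446 bytes recorded in the `flax_model.msgpack` pointer. The `pytorch_model.bin` pointer is somewhat larger (439,457,908 bytes), which would be consistent with extra tensors such as a pooler being stored, though the diff does not confirm this.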
flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f7ed9d36d9dfabb74e6f9a297232a880c4048fe3baff7d7aace5027dbe461f9
3
+ size 437054446
gitattributes.txt ADDED
@@ -0,0 +1,9 @@
1
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
2
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.h5 filter=lfs diff=lfs merge=lfs -text
5
+ *.tflite filter=lfs diff=lfs merge=lfs -text
6
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
7
+ *.ot filter=lfs diff=lfs merge=lfs -text
8
+ *.onnx filter=lfs diff=lfs merge=lfs -text
9
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
nbest_predictions_.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:085a6a937305d45965a24a19916bcb43722a1a7cb1d5c14753bdc85e8b4a3166
3
+ size 320885570
null_odds_.json ADDED
The diff for this file is too large to render. See raw diff
predictions_.json ADDED
The diff for this file is too large to render. See raw diff
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:280c5e41b49fff41568fcc282c9419ee8b3c6681ac6c30ae5e3718f546b61bff
3
+ size 439457908
saved_model.tar.gz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a6f5510408fb6e69cd13e7bfa8c7e3cf6f390ccbf7b57430e9cbf18f7a97bd4
3
+ size 408021292
special_tokens_map.json ADDED
@@ -0,0 +1 @@
1
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:483b8d0eb37493df82b9e3f8a049e84af9f12b190537f91c4e8fd5e0b58b0f4d
3
+ size 1537
vocab (1).txt ADDED
The diff for this file is too large to render. See raw diff