---

tags: 
- "transformers"
- "text-classification"
languages: "es"
license: "apache-2.0"
datasets: "unam_tesis"
widget: 
- text: "Introducción al análisis de riesgos competitivos bajo el enfoque de la función de incidencia acumulada (FIA) y su aplicación con R"
- text: "Asociación del polimorfismo rs1256031 del receptor beta de estrógenos en pacientes con diabetes tipo 2"
---


# unam_tesis_BETO_finnetuning: UNAM's thesis classification with BETO



This model was created by fine-tuning the pre-trained Spanish model [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) with the PyTorch framework, using a set of theses from the National Autonomous University of Mexico (UNAM) available in [UNAM's thesis catalog](https://tesiunam.dgb.unam.mx/F?func=find-b-0&local_base=TES01). The model classifies a text into one of five possible UNAM careers: Psicología (Psychology), Derecho (Law), Química Farmacéutico Biológica (Pharmaceutical Biological Chemistry), Actuaría (Actuarial Science), and Economía (Economics).

## Training Dataset 

1,000 documents, each with the following fields: thesis introduction, author's first name, author's last name, thesis title, year, and career.

| Career | Number of documents |
|--------|--------------------:|
| Actuaría | 200 |
| Derecho | 200 |
| Economía | 200 |
| Psicología | 200 |
| Química Farmacéutico Biológica | 200 |
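
As a rough illustration of the dataset layout described above, the sketch below models one training document as a Python dataclass and checks that the five careers are equally represented. The class name `ThesisRecord`, its field names, and the `label2id`/`id2label` ordering are hypothetical; the published model's actual label order is whatever its `config.json` defines and may differ.

```python
from collections import Counter
from dataclasses import dataclass

# The five careers from the table above. This ordering is an assumption,
# not necessarily the label order stored in the published model's config.
CAREERS = [
    "Actuaría",
    "Derecho",
    "Economía",
    "Psicología",
    "Química Farmacéutico Biológica",
]

# Hypothetical mappings in the style of Transformers' id2label / label2id config fields.
label2id = {career: i for i, career in enumerate(CAREERS)}
id2label = {i: career for career, i in label2id.items()}


@dataclass
class ThesisRecord:
    """Hypothetical container for one training document (field names are illustrative)."""
    introduction: str
    author_first_name: str
    author_last_name: str
    title: str
    year: int
    career: str


def class_counts(records):
    """Count how many records belong to each career."""
    return Counter(record.career for record in records)


def is_balanced(records):
    """Return True if every career in CAREERS has the same number of records."""
    counts = class_counts(records)
    return len({counts.get(career, 0) for career in CAREERS}) == 1
```

With 200 records per career, as in the table, `is_balanced` returns `True` and the total is 1,000 documents.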

## Example of use

For further details on how to use unam_tesis_BETO_finnetuning, see the Hugging Face Transformers library documentation, starting with the Quickstart section. The model can be loaded with the Transformers library simply as `hackathon-pln-es/unam_tesis_BETO_finnetuning`. An example of how to download and use the models on this page can be found in this Colab notebook.



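```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TextClassificationPipeline,
)

# Tokenizer as specified by the model authors (slow, Python-based implementation)
tokenizer = AutoTokenizer.from_pretrained('hiiamsid/BETO_es_binary_classification', use_fast=False)

# Fine-tuned 5-class thesis classifier
model = AutoModelForSequenceClassification.from_pretrained(
    'hackathon-pln-es/unam_tesis_BETO_finnetuning',
    num_labels=5,
    output_attentions=False,
    output_hidden_states=False,
)

# return_all_scores=True returns a score for every career, not only the top one
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

classificationResult = pipe("Análisis de las condiciones del aprendizaje desde casa en los alumnos de preescolar y primaria del municipio de Nicolás Romero")
print(classificationResult)
```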
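
With `return_all_scores=True`, the pipeline returns one score per career rather than a single prediction. The helper below is a minimal sketch (the name `top_career` is hypothetical) that picks the highest-scoring label. It accepts both the nested `[[{...}, ...]]` shape this option produces for a single input string and a flat `[{...}, ...]` list, since the output shape has varied across Transformers versions. If the model config does not define `id2label`, Transformers falls back to generic labels such as `LABEL_0`, which the helper also handles. The scores below are invented for illustration, not real model output.

```python
def top_career(result):
    """Return (label, score) for the highest-scoring entry of a pipeline result.

    Accepts either a flat list of {"label": ..., "score": ...} dicts or a list
    containing one such list (the shape returned for a single input string
    when return_all_scores=True).
    """
    scores = result[0] if result and isinstance(result[0], list) else result
    if not scores:
        raise ValueError("empty classification result")
    best = max(scores, key=lambda entry: entry["score"])
    return best["label"], best["score"]


# Illustrative input only: scores are made up, not actual model output.
example = [[
    {"label": "Psicología", "score": 0.81},
    {"label": "Derecho", "score": 0.05},
    {"label": "Química Farmacéutico Biológica", "score": 0.03},
    {"label": "Actuaría", "score": 0.04},
    {"label": "Economía", "score": 0.07},
]]
print(top_career(example))  # ('Psicología', 0.81)
```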






## Citation



[UNAM's Theses with BETO fine-tuning classify](https://huggingface.co/hackathon-pln-es/unam_tesis_BETO_finnetuning)



To cite this resource in a publication please use the following:



```
@inproceedings{SpanishNLPHackaton2022,
  title={UNAM's Theses with BETO fine-tuning classify},
  author={López López, Isaac Isaías and López Ramos, Dionis and Clavel Quintero, Yisel and López López, Ximena Yeraldin},
  booktitle={Somos NLP Hackaton 2022},
  year={2022}
}
```



## Team members

- Isaac Isaías López López ([MajorIsaiah](https://huggingface.co/MajorIsaiah))

- Dionis López Ramos ([inoid](https://huggingface.co/inoid))

- Yisel Clavel Quintero ([clavel](https://huggingface.co/clavel))

- Ximena Yeraldin López López ([Ximyer](https://huggingface.co/Ximyer))