File size: 3,177 Bytes
75532ac
 
 
 
 
 
 
 
 
 
 
 
 
b2c7915
 
 
75532ac
 
 
 
 
bc569fe
 
75532ac
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b3973c2
75532ac
 
 
bc569fe
75532ac
e0c7c5e
75532ac
 
 
 
 
 
 
 
 
 
8187a4f
 
75532ac
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
781b214
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
---
annotations_creators: 
- inoid
- MajorIsaiah
- Ximyer
- clavel
tags: 
- "transformers"
- "text-classification"
languages: "es"
license: "apache-2.0"
datasets: "unam_tesis"
metrics: "accuracy"
widget: 
- text: "Introducción al análisis de riesgos competitivos bajo el enfoque de la función de incidencia acumulada (FIA) y su aplicación con R"
- text: "Asociación del polimorfismo rs1256031 del receptor beta de estrógenos en pacientes con diabetes tipo 2"
---

# Unam_tesis_beto_finnetuning: Unam's thesis classification with BETO 

This model is created from the finetuning of the pre-model
for Spanish [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased), using PyTorch framework, 
and trained with a set of theses of the National Autonomous University of Mexico [(UNAM)](https://tesiunam.dgb.unam.mx/F?func=find-b-0&local_base=TES01). 
The model classifies a text into for five (Psicología, Derecho, Química Farmacéutico Biológica, Actuaría, Economía) 
possible careers at the UNAM.

## Training Dataset 

1000 documents (Thesis introduction, Author´s first name, Author´s last name, Thesis title, Year, Career)

|     Careers |  Size         | 
|--------------|----------------------|
|  Actuaría   |  200    | 
|  Derecho|   200    | 
|  Economía|   200    | 
|  Psicología|   200    | 
|  Química Farmacéutico Biológica|   200    | 

## Example of use

For further details on how to use unam_tesis_BETO_finnetuning you can visit the Hugging Face Transformers library, starting with the Quickstart section. The UNAM tesis model can be accessed simply as 'hackathon-pln-e/unam_tesis_BETO_finnetuning' by using the Transformers library. An example of how to download and use the model can be found next. 

```python

 tokenizer = AutoTokenizer.from_pretrained('hiiamsid/BETO_es_binary_classification', use_fast=False)
 model = AutoModelForSequenceClassification.from_pretrained(
                   'hackathon-pln-es/unam_tesis_BETO_finnetuning', num_labels=5, output_attentions=False,
                  output_hidden_states=False)
 pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
 
 classificationResult = pipe("Análisis de las condiciones del aprendizaje desde casa en los alumnos de preescolar y primaria del municipio de Nicolás Romero")

```


## Citation

To cite this resource in a publication please use the following:

[UNAM's Tesis with BETO finetuning classify] (https://huggingface.co/hackathon-pln-es/unam_tesis_BETO_finnetuning)

To cite this resource in a publication please use the following:

```
@inproceedings{SpanishNLPHackaton2022,
  title={UNAM's Theses with BETO fine-tuning classify},
  author={López López, Isaac Isaías; Clavel Quintero, Yisel; López Ramos, Dionis & López López, Ximena Yeraldin},
  booktitle={Somos NLP Hackaton 2022},
  year={2022}
}
```

## Team members
- Isaac Isaías López López ([MajorIsaiah](https://huggingface.co/MajorIsaiah))
- Dionis López Ramos ([inoid](https://huggingface.co/inoid))
- Yisel Clavel Quintero ([clavel](https://huggingface.co/clavel))
- Ximena Yeraldin López López ([Ximyer](https://huggingface.co/Ximyer))