---
annotations_creators:
- MajorIsaiah
- Ximyer
- clavel
- inoid
language_creators: [found]
languages: [es]
multilinguality: [monolingual]
pretty_name: ''
size_categories:
- n=200
source_datasets: [unam_tesis]
task_categories: [text-classification]
task_ids: [language-modeling ]
license: apache-2.0
---
# Unam_tesis_beto_finnetuning: UNAM thesis classification with BETO
This model was created by fine-tuning [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased), a pre-trained BERT model for Spanish, with the PyTorch framework.
It was trained on a set of [theses from the National Autonomous University of Mexico (UNAM)](https://tesiunam.dgb.unam.mx/F?func=find-b-0&local_base=TES01).
The model classifies a text into one of five possible careers at UNAM: Psicología, Derecho, Química Farmacéutico Biológica, Actuaría, or Economía.
## Training Dataset
1000 documents, 200 per career. Each document has the following fields: thesis introduction, author's first name, author's last name, thesis title, year, and career.
| Career                         | Documents |
|--------------------------------|-----------|
| Actuaría                       | 200       |
| Derecho                        | 200       |
| Economía                       | 200       |
| Psicología                     | 200       |
| Química Farmacéutico Biológica | 200       |
## Example of use
For further details on how to use unam_tesis_beto_finnetuning, see the Hugging Face Transformers library documentation, starting with the Quickstart section. The model can be loaded as 'hackathon-pln-es/unam_tesis_BETO_finnetuning' with the Transformers library. An example of how to download and use the model can be found in this colab notebook.
```python
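from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Tokenizer checkpoint as given in the original example
tokenizer = AutoTokenizer.from_pretrained('hiiamsid/BETO_es_binary_classification', use_fast=False)
# Fine-tuned classifier with five output labels, one per career
model = AutoModelForSequenceClassification.from_pretrained(
    'hackathon-pln-es/unam_tesis_BETO_finnetuning', num_labels=5, output_attentions=False,
    output_hidden_states=False)
# return_all_scores=True asks for a score for every label, not only the best one
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
classificationResult = pipe("Análisis de las condiciones del aprendizaje desde casa en los alumnos de preescolar y primaria del municipio de Nicolás Romero")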
```
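The pipeline returns one score per label, so a small helper can pick the most likely career. The following is a minimal sketch, not part of the original model card. The helper `top_career`, the mock scores, and the `ASSUMED_ID2CAREER` mapping are hypothetical. In particular, this card does not state which label index corresponds to which career, so the alphabetical order used here is an assumption that should be checked against `model.config.id2label`.

```python
from typing import Any, Dict, List, Tuple

# ASSUMPTION: label order is not documented in this card. This alphabetical
# mapping is purely illustrative; verify it against model.config.id2label.
ASSUMED_ID2CAREER = {
    "LABEL_0": "Actuaría",
    "LABEL_1": "Derecho",
    "LABEL_2": "Economía",
    "LABEL_3": "Psicología",
    "LABEL_4": "Química Farmacéutico Biológica",
}


def top_career(
    scores: List[Any],
    id2career: Dict[str, str] = ASSUMED_ID2CAREER,
) -> Tuple[str, float]:
    """Return (career, score) for the highest-scoring label.

    Accepts either a flat list of {"label", "score"} dicts or the same list
    wrapped in an outer list (one entry per input text), since the nesting
    of the pipeline output can differ between library versions.
    """
    if scores and isinstance(scores[0], list):
        scores = scores[0]  # keep the scores of the first input text
    best = max(scores, key=lambda item: item["score"])
    # Fall back to the raw label name if it is not in the mapping
    return id2career.get(best["label"], best["label"]), best["score"]


# Mock scores shaped like pipeline output; these numbers are made up
# for illustration and are not real model predictions.
mock_scores = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.05},
    {"label": "LABEL_2", "score": 0.05},
    {"label": "LABEL_3", "score": 0.8},
    {"label": "LABEL_4", "score": 0.05},
]

career, score = top_career(mock_scores)
print(f"{career}: {score:.2f}")
```

With the real pipeline, the same helper would be called as `top_career(classificationResult)`.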
## Citation
[UNAM thesis classification with BETO fine-tuning](https://huggingface.co/hackathon-pln-es/unam_tesis_BETO_finnetuning)
To cite this resource in a publication please use the following:
```
@inproceedings{SpanishNLPHackaton2022,
title={UNAM's Theses with BETO fine-tuning classify },
author={López López, Isaac Isaías and López Ramos, Dionis and Clavel Quintero, Yisel and López López, Ximena Yeraldin },
booktitle={Somos NLP Hackaton 2022},
year={2022}
}
```
## Team members
- Isaac Isaías López López ([MajorIsaiah](https://huggingface.co/MajorIsaiah))
- Dionis López Ramos ([inoid](https://huggingface.co/inoid))
- Yisel Clavel Quintero ([clavel](https://huggingface.co/clavel))
- Ximena Yeraldin López López ([Ximyer](https://huggingface.co/Ximyer))