metadata
annotations_creators:
  - MajorIsaiah
  - Ximyer
  - clavel
  - inoid
language_creators:
  - found
languages:
  - es
multilinguality:
  - monolingual
pretty_name: ''
size_categories:
  - n=200
source_datasets:
  - unam_tesis
task_categories:
  - text-classification
task_ids:
  - language-modeling
license: apache-2.0

Unam_tesis_beto_finnetuning: UNAM thesis classification with BETO

This model was created by fine-tuning [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased), a pretrained BERT model for Spanish, using the PyTorch framework. It was trained on a set of theses from the National Autonomous University of Mexico (UNAM) (https://tesiunam.dgb.unam.mx/F?func=find-b-0&local_base=TES01). The model classifies a text into one of five possible careers at UNAM: Psicología, Derecho, Química Farmacéutico Biológica, Actuaría, or Economía.

Training Dataset

1000 documents, each with the following fields: thesis introduction, author's first name, author's last name, thesis title, year, and career.

| Career | Size |
|---|---|
| Actuaría | 200 |
| Derecho | 200 |
| Economía | 200 |
| Psicología | 200 |
| Química Farmacéutico Biológica | 200 |
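
The class balance described above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the record layout and the field names `career` and `introduction` are hypothetical, and the placeholder records only stand in for the real theses, which are not reproduced here.

```python
from collections import Counter

# The five careers listed in this README.
CAREERS = [
    "Actuaría",
    "Derecho",
    "Economía",
    "Psicología",
    "Química Farmacéutico Biológica",
]

# Placeholder records (assumption): 200 per career, mirroring the sizes above.
# The field names "career" and "introduction" are illustrative only.
records = [
    {"career": career, "introduction": f"placeholder text {i}"}
    for career in CAREERS
    for i in range(200)
]


def career_counts(rows):
    """Count how many documents belong to each career."""
    return Counter(row["career"] for row in rows)


counts = career_counts(records)
for career, size in sorted(counts.items()):
    print(f"{career}: {size}")
```

A perfectly balanced set like this one means plain accuracy is a reasonable first metric, since no class dominates.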

Example of use

For further details on how to use unam_tesis_beto_finnetuning, see the Hugging Face Transformers library documentation, starting with the Quickstart section. The Unam_tesis models can be loaded with the Transformers library under the name 'hackathon-pln-e/unam_tesis_beto_finnetuning'. An example of how to download and use the models on this page can be found in this Colab notebook.


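 from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                           TextClassificationPipeline)

 # Load the tokenizer and the fine-tuned five-class model
 tokenizer = AutoTokenizer.from_pretrained('hiiamsid/BETO_es_binary_classification', use_fast=False)
 model = AutoModelForSequenceClassification.from_pretrained(
     'hackathon-pln-e/unam_tesis_BETO_finnetuning', num_labels=5,
     output_attentions=False, output_hidden_states=False)
 # return_all_scores=True returns a score for every class, not only the best one
 pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

 # Classify a sample text
 classificationResult = pipe("Análisis de las condiciones del aprendizaje desde casa en los alumnos de preescolar y primaria del municipio de Nicolás Romero")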
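
The pipeline returns one score per class, so a small helper can pick the most likely one. The sketch below is not part of the original example. It assumes the result is a list of `{"label", "score"}` dictionaries, possibly wrapped in an outer list, and it uses a mock result so it runs without downloading the model. The `LABEL_n` names and the scores are made up, and the mapping from labels to careers is not stated in this README, so check the model's `config.id2label` before interpreting them.

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring class.

    Accepts either a flat list of score dicts or the same list nested
    inside an outer list (one inner list per input text).
    """
    scores = result[0] if result and isinstance(result[0], list) else result
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Mock pipeline output (assumption): label names and scores are illustrative.
mock_result = [[
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.10},
    {"label": "LABEL_2", "score": 0.15},
    {"label": "LABEL_3", "score": 0.60},
    {"label": "LABEL_4", "score": 0.10},
]]

label, score = top_prediction(mock_result)
print(label, score)
```

With the real model, `top_prediction(classificationResult)` would be called the same way.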

Citation

UNAM's Theses with BETO fine-tuning classify

To cite this resource in a publication please use the following:

@inproceedings{SpanishNLPHackaton2022,
  title={UNAM's Theses with BETO fine-tuning classify},
  author={López López, Isaac Isaías and López Ramos, Dionis and Clavel Quintero, Yisel and López López, Ximena Yeraldin},
  booktitle={Somos NLP Hackaton 2022},
  year={2022}
}

Team members

  • Isaac Isaías López López (MajorIsaiah)
  • Dionis López Ramos (inoid)
  • Yisel Clavel Quintero (clavel)
  • Ximena Yeraldin López López (Ximyer)