license: apache-2.0
datasets:
  - AiresPucrs/sentiment-analysis
language:
  - en
metrics:
  - accuracy
library_name: keras

english-embedding-vocabulary-16

Model Overview

english-embedding-vocabulary-16 is a Keras model trained for sentiment analysis on English text. Its embedding layer provides 16-dimensional word embeddings.

Details

  • Size: 160,289 parameters
  • Model type: word embeddings
  • Optimizer: Adam
  • Number of Epochs: 20
  • Embedding size: 16
  • Hardware: Tesla V4
  • Emissions: Not measured
  • Total Energy Consumption: Not measured
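The architecture and vocabulary size are not stated here, so the following is only a sketch of how a parameter count like the one above can be decomposed. It assumes a hypothetical layout (an embedding layer over a 10,000-token vocabulary, followed by a 16-unit dense layer and a single-unit output layer), which happens to add up to the reported 160,289 parameters. Other layouts could give the same total.

```python
def embedding_params(vocab_size, embedding_dim):
    """An embedding layer stores one vector per vocabulary entry."""
    return vocab_size * embedding_dim


def dense_params(inputs, units):
    """A dense layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


vocab_size = 10_000   # assumption: not stated in this model card
embedding_dim = 16    # from the details above
hidden_units = 16     # assumption: hypothetical hidden layer width

total = (
    embedding_params(vocab_size, embedding_dim)   # 160,000
    + dense_params(embedding_dim, hidden_units)   # 272
    + dense_params(hidden_units, 1)               # 17
)
print(total)  # 160289
```

Under these assumptions, almost all of the parameters sit in the embedding matrix.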

How to Use

To load the model and extract its word embeddings, you can use the following code snippet:

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary_16.keras",
                local_dir="./",
                repo_type="model"
                )

# Download the embedding vocabulary txt file
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary.txt",
                local_dir="./",
                repo_type="model"
                )

model = tf.keras.models.load_model('english_embedding_vocabulary_16.keras')

# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

with open('english_embedding_vocabulary.txt', encoding='utf-8') as fp:
    english_embedding_vocabulary = [line.strip() for line in fp]

embeddings = model.get_layer('embedding').get_weights()[0]

words_embeddings = {}

# Map each vocabulary word to its embedding vector
for i, word in enumerate(english_embedding_vocabulary):
    # Skip index 0 (""), which is just the PAD token
    if i == 0:
        continue
    words_embeddings[word] = embeddings[i]

print("Embeddings Dimensions: ", np.array(list(words_embeddings.values())).shape)
print("Vocabulary Size: ", len(words_embeddings.keys()))
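Once a `words_embeddings` dictionary like the one built above is available, one common way to explore it is to rank words by cosine similarity. The sketch below is illustrative only: `cosine_similarity`, `most_similar`, and the toy dictionary are hypothetical helpers and data introduced here so that the example runs without downloading the model. To use it on the real embeddings, pass `words_embeddings` in place of `toy_embeddings`.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def most_similar(word, words_embeddings, top_k=5):
    """Return the top_k (word, score) pairs closest to `word` by cosine similarity."""
    query = words_embeddings[word]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in words_embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy stand-in for the real dictionary: random 16-dimensional vectors (hypothetical values).
rng = np.random.default_rng(0)
toy_embeddings = {w: rng.normal(size=16) for w in ["good", "great", "bad", "awful"]}
# Make "great" a slightly perturbed copy of "good" so the two are near neighbors.
toy_embeddings["great"] = toy_embeddings["good"] + 0.01 * rng.normal(size=16)

print(most_similar("good", toy_embeddings, top_k=2))
```

Because the embeddings were learned on a sentiment task, neighbors found this way may reflect sentiment polarity more than general semantic similarity.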

Intended Use

This model was created for research purposes only. We do not recommend any application of this model outside this scope.

Performance Metrics

The model achieved an accuracy of 84% on validation data.

Training Data

The model was trained on a dataset assembled by combining several sentiment-classification datasets available on Kaggle.

Limitations

We do not recommend using this model in real-world applications. It was solely developed for academic and educational purposes.

Cite as

@misc{teenytinycastle,
    doi = {10.5281/zenodo.7112065},
    url = {https://github.com/Nkluge-correa/teeny-tiny_castle},
    author = {Nicholas Kluge Corr{\^e}a},
    title = {Teeny-Tiny Castle},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
}

License

This model is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.