dieineb committed
Commit a215b95 (1 parent: 9e4d398)

Update README.md

Files changed (1)
1. README.md +110 -0
README.md CHANGED
---
license: apache-2.0
datasets:
- AiresPucrs/sentiment-analysis
language:
- en
metrics:
- accuracy
library_name: keras
---
# english-embedding-vocabulary-16

## Model Overview

The english-embedding-vocabulary-16 is a word-embedding model for sentiment analysis of English text, built with Keras.

### Details

- **Size:** 160,289 parameters (see the illustrative sketch after this list)
- **Model type:** word embeddings
- **Optimizer:** Adam
- **Number of Epochs:** 20
- **Embedding size:** 16
- **Hardware:** Tesla V4
- **Emissions:** Not measured
- **Total Energy Consumption:** Not measured

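The parameter count above is consistent with a very small Keras text classifier in which almost all weights sit in the embedding layer. The sketch below is illustrative only: the vocabulary size of 10,000 tokens, the pooling layer, and the two dense layers are assumptions that happen to reproduce the 160,289 figure, not details taken from the repository. After loading the real model (see How to Use below), `model.summary()` shows the actual architecture.

```python
import tensorflow as tf

# Illustrative architecture only; assumed, not confirmed by the repository:
#   Embedding(10000, 16)           -> 160,000 parameters
#   GlobalAveragePooling1D()       ->       0 parameters
#   Dense(16, activation='relu')   ->     272 parameters
#   Dense(1, activation='sigmoid') ->      17 parameters
#   Total                          -> 160,289 parameters
illustrative_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=16, name='embedding'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
illustrative_model.build(input_shape=(None, 100))
illustrative_model.summary()  # prints the per-layer parameter counts
```
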
### How to Use

To load the model and inspect its learned word embeddings, you can use the following code snippet:

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary_16.keras",
                local_dir="./",
                repo_type="model")

# Download the embedding vocabulary txt file
hf_hub_download(repo_id="AiresPucrs/english-embedding-vocabulary-16",
                filename="english_embedding_vocabulary.txt",
                local_dir="./",
                repo_type="model")

model = tf.keras.models.load_model('english_embedding_vocabulary_16.keras')

# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Read the vocabulary (one token per line)
with open('english_embedding_vocabulary.txt', encoding='utf-8') as fp:
    english_embedding_vocabulary = [line.strip() for line in fp]

# Retrieve the weights of the embedding layer
embeddings = model.get_layer('embedding').get_weights()[0]

words_embeddings = {}

# Map each vocabulary word to its embedding vector
for i, word in enumerate(english_embedding_vocabulary):
    # Skip index 0 (""), because it is just the PAD token
    if i == 0:
        continue
    words_embeddings[word] = embeddings[i]

print("Embeddings Dimensions: ", np.array(list(words_embeddings.values())).shape)
print("Vocabulary Size: ", len(words_embeddings.keys()))
```
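Continuing from the snippet above, the helper below looks up the words whose vectors are closest to a query word by cosine similarity. `most_similar` and the query `"good"` are illustrative names introduced here for demonstration; they are not part of the repository.

```python
def most_similar(query, embeddings_dict, top_k=5):
    """Return the top_k words whose embeddings are closest to `query` by cosine similarity."""
    query_vector = embeddings_dict[query]
    query_norm = np.linalg.norm(query_vector)
    scores = {}
    for word, vector in embeddings_dict.items():
        if word == query:
            continue
        scores[word] = float(np.dot(query_vector, vector) /
                             (query_norm * np.linalg.norm(vector) + 1e-9))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Example: nearest neighbours of "good" (assumes the word is in the vocabulary)
print(most_similar("good", words_embeddings))
```
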
## Intended Use

This model was created for research purposes only. We do not recommend any application of this model outside this scope.

## Performance Metrics

The model achieved an accuracy of 84% on validation data.

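The reported figure is binary classification accuracy on the validation split. If you want to sanity-check the model on your own text, the sketch below continues from the How to Use snippet. The preprocessing details are assumptions, not documented in this repository: words are lower-cased and split on whitespace, out-of-vocabulary words are mapped to index 1, sequences are padded with 0 (the PAD token) to 100 tokens, and the sigmoid output is read as the probability of positive sentiment.

```python
import numpy as np

# Word -> index lookup built from the vocabulary file loaded earlier
vocab_index = {word: i for i, word in enumerate(english_embedding_vocabulary)}

def encode(text, max_length=100, oov_index=1):
    """Map words to vocabulary indices and pad with 0 (the PAD token). Assumed preprocessing."""
    tokens = [vocab_index.get(word, oov_index) for word in text.lower().split()]
    tokens = tokens[:max_length]
    tokens += [0] * (max_length - len(tokens))
    return np.array([tokens])

# Assumed reading of the output: probability that the text is positive
score = float(model.predict(encode("this movie was a pleasant surprise"))[0][0])
print(f"Positive sentiment probability: {score:.3f}")
```
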
## Training Data

The model was trained on a dataset put together by combining several sentiment-classification datasets available on [Kaggle](https://www.kaggle.com/) (a loading sketch for the combined corpus follows the list):

- The `IMDB 50K` [dataset](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?select=IMDB+Dataset.csv): _50K movie reviews for natural language processing or text analytics._
- The `Twitter US Airline Sentiment` [dataset](https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment): _originated from the [Crowdflower's Data for Everyone library](http://www.crowdflower.com/data-for-everyone)._
- Our `google_play_apps_review` dataset: _built using the `google_play_scraper` in [this notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/master/ML%20Explainability/NLP%20Interpreter%20(en)/scrape(en).ipynb)._
- The `EcoPreprocessed` [dataset](https://www.kaggle.com/datasets/pradeeshprabhakar/preprocessed-dataset-sentiment-analysis): _scraped Amazon product reviews_.

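The combined corpus is also listed in the card metadata as the `AiresPucrs/sentiment-analysis` dataset on the Hugging Face Hub. The snippet below is a minimal loading sketch; the split and column names are not documented in this card, so inspect the returned object.

```python
from datasets import load_dataset

# Load the combined sentiment corpus referenced in the card metadata
dataset = load_dataset("AiresPucrs/sentiment-analysis")

# Split and column names are not documented here, so print the structure
print(dataset)
```
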
## Limitations

We do not recommend using this model in real-world applications. It was solely developed for academic and educational purposes.

## Cite as

```latex
@misc{teenytinycastle,
  doi = {10.5281/zenodo.7112065},
  url = {https://github.com/Nkluge-correa/teeny-tiny_castle},
  author = {Nicholas Kluge Corr{\^e}a},
  title = {Teeny-Tiny Castle},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
}
```

## License

This model is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.