smhavens committed
Commit: b3ffc6e
1 Parent(s): 7e8fbf3

Update README.md

Files changed (1):
  1. README.md +33 -5
README.md CHANGED
@@ -11,17 +11,45 @@ pinned: false

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

- ## Information
- ### Database
- [ag_news]([https://huggingface.co/datasets/glue](https://huggingface.co/datasets/ag_news)
+ ## Model Types
+ ### Baseline
+ For my dataset, I made use of relbert/analogy_questions on Hugging Face, which stores every question in the format:
+ ```
+ "stem": ["raphael", "painter"],
+ "answer": 2,
+ "choice": [["andersen", "plato"],
+            ["reading", "berkshire"],
+            ["marx", "philosopher"],
+            ["tolstoi", "edison"]]
+ ```
+
+ For a baseline, if the system were to select an answer at random (so the stem analogy is compared to a random choice among the answers), it would be correct only about 25% of the time, since each question offers four choices.
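
To make that baseline concrete, here is a minimal sketch of scoring a random guesser on this dataset (assuming the Hugging Face `datasets` library; the `sat` config and `test` split are assumptions, as the dataset ships several subsets):

```python
import random

from datasets import load_dataset

# "sat" is one of the dataset's published subsets (others include "google",
# "bats", "u2", and "u4"); the split name here is an assumption.
data = load_dataset("relbert/analogy_questions", "sat", split="test")

random.seed(0)
hits = sum(
    random.randrange(len(ex["choice"])) == ex["answer"]  # uniform random guess
    for ex in data
)
print(f"random baseline: {hits / len(data):.2%}")  # roughly 1/len(choice)
```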
+
+ ### Bag-of-Words Model
+ For comparison, I made use of my previously trained bag-of-words model from [our previous project](https://github.com/smhavens/NLPHW03).
+
+ ### Fine-Tuning
+ #### Dataset
+ [analogy questions dataset](https://huggingface.co/datasets/relbert/analogy_questions)

  This database uses a text-with-label format: each label is an integer from 0 to 3 that maps to one of the four main news categories, World (0), Sports (1), Business (2), and Sci/Tech (3).

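As a quick illustration of that format, a minimal sketch of loading ag_news with the `datasets` library (the `text` and `label` field names are as published for this dataset):

```python
from datasets import load_dataset

# Each ag_news example is raw text plus an integer label:
# 0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech.
ag_news = load_dataset("ag_news", split="train")
example = ag_news[0]
print(example["text"][:80], "->", example["label"])
```
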
  I chose this one because of its larger variety of categories compared to sentiment databases, with those themes theoretically being more closely related to analogies. I also chose ag_news because, as a news source, it should avoid the slang and other hiccups that databases built from tweets or general reviews tend to have.
- ### Pre-trained model
+ #### Pre-trained model
  [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
  Because my focus is on using embeddings to evaluate analogies for the AnalogyArcade, I limited my model search to the sentence-transformers family, as those models are purpose-built for embedding use. I chose all-MiniLM-L6-v2 for its wide adoption and good reviews: it is a well-trained model that is smaller and more efficient than its predecessor.
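
As a rough sketch of how such embeddings can score an analogy question, the snippet below verbalizes each pair as "X is to Y" and picks the choice most similar to the stem; the verbalization is an illustrative assumption, not necessarily what this Space does:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed the stem pair and every choice pair, then keep the choice whose
# embedding is most similar (by cosine) to the stem's.
stem = "raphael is to painter"
choices = [
    "andersen is to plato",
    "reading is to berkshire",
    "marx is to philosopher",
    "tolstoi is to edison",
]

stem_emb = model.encode(stem, convert_to_tensor=True)
choice_embs = model.encode(choices, convert_to_tensor=True)
best = int(util.cos_sim(stem_emb, choice_embs)[0].argmax())
print(choices[best])  # expected: "marx is to philosopher"
```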
- TESTING README UPDATE
+ ### In-Context
+
+ ## User Guide
+ ### Introduction
+
+ ### Usage
+
+ ### Documentation
+
+ ### Experiments
+
+ ### Limitations