Michael-Geis committed on
Commit
3684daa
1 Parent(s): 415c066

updated log

Files changed (1)
  1. project_log.ipynb +24 -0
project_log.ipynb CHANGED
@@ -111,6 +111,30 @@
  "#### Miscellaneous\n",
  "1. Install `tabbed out` extension for exiting delimiter environments with tab.\n"
  ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 07/04/2023\n",
+ "\n",
+ "#### Create embedding module, `embedding.py`\n",
+ "\n",
+ "Functions\n",
+ "1. Take in an `arXivData` object\n",
+ "1. Generate embeddings for the clean text\n",
+ "1. Compute the most semantically similar MSC tags\n",
+ "1. Output the NumPy array containing the embeddings\n",
+ "1. Output the NumPy array in which row i is\n",
+ " - the embedding vector of the most similar MSC tag, if there are MSC tags\n",
+ " - NaN if there are no MSC tags\n",
+ "\n",
+ "\n",
+ "Stopping in the middle of step 3, which is the function `rank_msc_tags` in `embedding.py`.\n",
+ "\n",
+ "Need to add the dataclass decorator from the data storage module to my `arXivData` class.\n"
+ ]
  }
  ],
  "metadata": {
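
The last output step in the log entry (row i holds the embedding of the most similar MSC tag, or NaN when a paper has no tags) could look roughly like the sketch below. This is not the code in `embedding.py`. It assumes the text and tag embeddings have already been computed as non-zero NumPy vectors, that similarity means cosine similarity, and the names `cosine_similarities` and `best_tag_embeddings` are hypothetical.

```python
from typing import Optional, Sequence

import numpy as np


def cosine_similarities(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Cosine similarity between one vector (d,) and each row of candidates (k, d).

    Assumes no vector has zero norm.
    """
    query_unit = query / np.linalg.norm(query)
    candidate_units = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidate_units @ query_unit


def best_tag_embeddings(
    text_embeddings: np.ndarray,
    tag_embeddings_per_paper: Sequence[Optional[np.ndarray]],
) -> np.ndarray:
    """Return an (n, d) array whose row i is the tag embedding closest to text i.

    Rows for papers with no tags (None or an empty array) are filled with NaN.
    """
    n_papers, dim = text_embeddings.shape
    best = np.full((n_papers, dim), np.nan)
    for i, tag_embeddings in enumerate(tag_embeddings_per_paper):
        if tag_embeddings is None or len(tag_embeddings) == 0:
            continue  # no MSC tags: leave the NaN row in place
        scores = cosine_similarities(text_embeddings[i], tag_embeddings)
        best[i] = tag_embeddings[int(np.argmax(scores))]
    return best


# Toy usage with made-up 2-dimensional embeddings (hypothetical data).
texts = np.array([[1.0, 0.0], [0.0, 1.0]])
tags = [np.array([[0.0, 2.0], [3.0, 0.1]]), None]
result = best_tag_embeddings(texts, tags)
```

Filling with NaN keeps the output the same shape as the text embedding array, so the two arrays stay row-aligned and the untagged papers can be masked later with `np.isnan`.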