Michael-Geis committed on
Commit b0ee416 • 1 Parent(s): 18932fb
created outline of pipeline in log
- project_log.ipynb +21 -0
project_log.ipynb
CHANGED
@@ -68,6 +68,27 @@
|
|
68 |
"\n",
|
69 |
"\n"
|
70 |
]
|
71 |
}
|
72 |
],
|
73 |
"metadata": {
|
|
|
68 |
"\n",
|
69 |
"\n"
|
70 |
]
|
71 |
+
},
|
72 |
+
{
|
73 |
+
"attachments": {},
|
74 |
+
"cell_type": "markdown",
|
75 |
+
"metadata": {},
|
76 |
+
"source": [
|
77 |
+
"## 07/02/2023\n",
|
78 |
+
"\n",
|
79 |
+
"- Read a Medium article about using config files to set up highly modular data analysis pipelines.\n",
|
80 |
+
"- Interested in setting this up here.\n",
|
81 |
+
"\n",
|
82 |
+
"#### Outline of pipeline architecture\n",
|
83 |
+
"\n",
|
84 |
+
"1. Load dataset \n",
|
85 |
+
" - option to load from file or by querying arXiv directly\n",
|
86 |
+
" - stores raw title and abstract, ID numbers, msc_tags as English text, and categories (one-hot encoded, OHE) as a separate dataframe\n",
|
87 |
+
"2. Load embeddings\n",
|
88 |
+
" - option to load from file or generate using sentence transformers directly.\n",
|
89 |
+
" - any data cleaning procedures will occur in the pipeline here\n",
|
90 |
+
"3. Plug into topic model(s)"
|
91 |
+
]
|
92 |
}
|
93 |
],
|
94 |
"metadata": {
|
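
The pipeline outline added in this commit (load dataset, load embeddings, plug into topic models) could be driven by a config file roughly as follows. This is a minimal sketch and not the project's actual code: the config keys, the file paths, the model name, and the functions `load_dataset`, `load_embeddings`, and `plan_pipeline` are all hypothetical, and each stage only returns a description of what it would do, so the example runs without arXiv access or any embedding library.

```python
import json

# Hypothetical config; every key, path, and name below is an assumption
# made for illustration, not something taken from the repository.
CONFIG_TEXT = """
{
  "dataset": {"source": "file", "path": "data/papers.json"},
  "embeddings": {"source": "generate", "model": "placeholder-model"},
  "topic_models": ["placeholder_topic_model"]
}
"""


def load_dataset(cfg):
    """Step 1: load from file or by querying arXiv, as the config says."""
    if cfg["source"] == "file":
        return "read dataset from " + cfg["path"]
    if cfg["source"] == "query":
        return "query arXiv directly"
    raise ValueError("unknown dataset source: %r" % cfg["source"])


def load_embeddings(cfg):
    """Step 2: load embeddings from file or generate them.

    The outline places data cleaning in this step, so the "generate"
    branch mentions cleaning before embedding.
    """
    if cfg["source"] == "file":
        return "read embeddings from " + cfg["path"]
    if cfg["source"] == "generate":
        return "clean text, then embed with " + cfg["model"]
    raise ValueError("unknown embeddings source: %r" % cfg["source"])


def plan_pipeline(config):
    """Return the ordered list of steps that the config selects."""
    steps = [
        load_dataset(config["dataset"]),
        load_embeddings(config["embeddings"]),
    ]
    # Step 3: one entry per topic model named in the config.
    for name in config["topic_models"]:
        steps.append("fit topic model " + name)
    return steps


config = json.loads(CONFIG_TEXT)
plan = plan_pipeline(config)
for step in plan:
    print(step)
```

Switching `"source"` from `"file"` to `"query"` (or from `"generate"` to `"file"`) changes which branch runs without touching the code, which is the modularity the log entry is after. In a real pipeline each branch would return data rather than a description.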