Michael-Geis committed on
Commit
b0ee416
1 Parent(s): 18932fb

created outline of pipeline in log

  1. project_log.ipynb +21 -0
project_log.ipynb CHANGED
@@ -68,6 +68,27 @@
  "\n",
  "\n"
  ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 07/02/2023\n",
+ "\n",
+ "- Read a Medium article about using config files to set up highly modular data analysis pipelines.\n",
+ "- Interested in setting this up for this project.\n",
+ "\n",
+ "#### Outline of pipeline architecture\n",
+ "\n",
+ "1. Load dataset\n",
+ "   - Option to load from file or by querying arXiv directly.\n",
+ "   - Stores raw title and abstract, ID numbers, msc_tags in English, and categories (one-hot encoded) as a separate dataframe.\n",
+ "2. Load embeddings\n",
+ "   - Option to load from file or generate them using sentence transformers directly.\n",
+ "   - Any data cleaning procedures will occur at this stage of the pipeline.\n",
+ "3. Plug into topic model(s)"
+ ]
  }
  ],
  "metadata": {
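
The three-stage outline added in this commit could be wired up roughly as follows. This is a minimal sketch, not the project's code: `PipelineConfig`, the loader registries, and every function name are hypothetical, the dataset is two toy records, and the arXiv query, the sentence-transformers call, and the real topic models are left as stubs. It only illustrates how a config object can select which loader runs at each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineConfig:
    """Hypothetical config; field names are illustrative only."""
    dataset_source: str = "file"      # "file" or "arxiv"
    embedding_source: str = "file"    # "file" or "generate"
    topic_models: List[str] = field(default_factory=lambda: ["count_docs"])


# --- Stage 1: load dataset -------------------------------------------------
def dataset_from_file() -> List[dict]:
    # Stand-in for reading a saved dataset; returns two toy records.
    return [
        {"id": "0001", "title": "A  title", "abstract": "An abstract.",
         "categories": ["math.AP"]},
        {"id": "0002", "title": "Another title", "abstract": "More text.",
         "categories": ["math.AP", "math.DG"]},
    ]


def dataset_from_arxiv() -> List[dict]:
    raise NotImplementedError("querying arXiv is outside this sketch")


def one_hot_categories(records: List[dict]):
    # Build the separate one-hot (OHE) table of categories, keyed by paper id.
    labels = sorted({c for r in records for c in r["categories"]})
    table = {r["id"]: [int(c in r["categories"]) for c in labels]
             for r in records}
    return table, labels


# --- Stage 2: load embeddings (cleaning happens here) -----------------------
def clean_text(text: str) -> str:
    # Placeholder cleaning step: collapse runs of whitespace.
    return " ".join(text.split())


def embeddings_from_file(texts: List[str]) -> List[List[float]]:
    # Stand-in for reading saved vectors; returns toy 2-d vectors.
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def embeddings_generated(texts: List[str]) -> List[List[float]]:
    raise NotImplementedError("sentence-transformers call is outside this sketch")


# --- Stage 3: topic model(s) ------------------------------------------------
def count_docs(embeddings: List[List[float]]) -> dict:
    # Placeholder "model" that only reports how many documents it received.
    return {"n_docs": len(embeddings)}


DATASET_LOADERS: Dict[str, Callable] = {
    "file": dataset_from_file, "arxiv": dataset_from_arxiv}
EMBEDDING_LOADERS: Dict[str, Callable] = {
    "file": embeddings_from_file, "generate": embeddings_generated}
TOPIC_MODELS: Dict[str, Callable] = {"count_docs": count_docs}


def run_pipeline(config: PipelineConfig) -> dict:
    records = DATASET_LOADERS[config.dataset_source]()
    ohe, labels = one_hot_categories(records)
    texts = [clean_text(r["title"] + " " + r["abstract"]) for r in records]
    embeddings = EMBEDDING_LOADERS[config.embedding_source](texts)
    topics = {name: TOPIC_MODELS[name](embeddings)
              for name in config.topic_models}
    return {"records": records, "categories_ohe": ohe, "labels": labels,
            "texts": texts, "embeddings": embeddings, "topics": topics}


result = run_pipeline(PipelineConfig())
```

Swapping `dataset_source` or `embedding_source` in the config changes which loader runs without touching `run_pipeline`, which is the modularity the log entry is after.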