celise88 committed on
Commit
7a72283
1 Parent(s): 54919c4

modify README.md

README.md CHANGED
@@ -13,7 +13,7 @@ pinned: true
  ![logo](./static/PF.png)
 
  ## Purpose:
- #### This is a FastAPI web application designed to allow job-seekers to learn more about various occupations and explore their future career path. See below for details and page descriptions. If you like the app, please star and/or fork and check back frequently for future releases.
+ #### This is a FastAPI web application designed to allow job-seekers to learn more about various occupations and explore their future career path. See below for details and page descriptions. If you like the app, please star and/or fork and check back for future releases.
 
  ## To Access the App:
  https://huggingface.co/spaces/celise88/Pathfinder
@@ -21,18 +21,12 @@ https://huggingface.co/spaces/celise88/Pathfinder
  ## To Clone the App and Run it Locally:
  #### Note:
  * You must have python3.10.9 installed.
- * In addition, for the current release you must have a cohere.ai API key for the job-matching functionality to work (I plan to add an open-source option in a future release). Register for a free developer account here: https://dashboard.cohere.ai/welcome/register.
 
  #### In a terminal run the following commands:
 
  ```
  pip3 install --user virtualenv
  git clone https://github.com/celise88/Pathfinder.git
- ```
-
- Once you have your API key, copy and paste it into the .env file in the ONET-Application folder. Make sure you save the file. Then proceed with the following commands in your terminal:
-
- ```
  cd Pathfinder
  python3 -m venv .venv
  source .venv/bin/activate
@@ -42,8 +36,6 @@ uvicorn main:app
 
  And navigate to http://localhost:8000/ in your browser
 
- (Advanced: You can also use the Dockerfile in the repo to build an image and run a container. Note that the port in the Dockerfile is 7860.)
-
  ## Page Descriptions:
 
  ### Home Page:
@@ -65,15 +57,14 @@ And navigate to http://localhost:8000/ in your browser
  #### Example Extracted Skills Output:
  ![Page3-Output](./static/main/Page3-output.png)
 
- #### Job Matches for the Example Resume:
- ![Page3-Matches](./static/main/Page3-Matches.png)
-
  #### *Please see the version history below for a description of the models and algorithms underlying the app functionality.
 
  ## Version history:
 
  * Initial commit - 2/3/2023 - Allows users to select a job title to learn more about and get a brief description of the selected job and the major tasks involved, which is dynamically scraped from https://onetonline.org. The job neighborhoods page was generated by using Co:here AI's LLM to embed ONET's task statements and subsequently performing dimension reduction using t-SNE to get a 2-D representation of job "clusters." The distance between jobs in the plot corresponds to how similar they are to one another - i.e., more similar jobs (according to the tasks involved in the job) will appear more closely "clustered" on the plot.
 
- * Version 1.1.1 (current version) - 2/5/2023 - Added full functionality to the "find my match" page where users can upload a resume, curriculum vitae, cover letter, etc. to have their skills extracted from the text. Neural text embeddings are then produced for the user's resume. Using a csv file containing the text embeddings for all ONET jobs, cosine similarity is calculated to determine how similar the user's resume is to each job description (the embedded ONET task statements) - this is the user's "match score."
+ * Version 1.1.1 - 2/5/2023 - Added full functionality to the "find my match" page where users can upload a resume, curriculum vitae, cover letter, etc. to have their skills extracted from the text. Neural text embeddings are then produced for the user's resume. Using a csv file containing the text embeddings for all ONET jobs, cosine similarity is calculated to determine how similar the user's resume is to each job description (the embedded ONET task statements) - this is the user's "match score."
  * The classification model underlying the skills extractor is a custom distilbert-base-uncased binary classification model that was finetuned using a balanced dataset comprised of the emsi (now Lightcast) open skills database and a random sample of the dbpedia database. The model achieved an f1 score of 0.967 on the validation sample (accuracy of 0.967, loss of 0.096). It can be accessed via Hugging Face: https://huggingface.co/celise88/distilbert-base-uncased-finetuned-binary-classifier.
- * Cohere's LLM is used to get the neural text embeddings. (This is why a cohere API key is needed for the new functionality to work in this release; I plan to incorporate an open-source embedding model in a future release.)
+ * Cohere's LLM is used to get the neural text embeddings. (This is why a cohere API key is needed for the new functionality to work in this release; I plan to incorporate an open-source embedding model in a future release.)
+
+ * Version 1.1.2 (current version) - 1/29/2024 - Migrated from finetuned Distilbert LLM to Mistral (see https://huggingface.co/mistralai/Mistral-7B-v0.1 for more information).
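
Editor's note: the "match score" described in the Version 1.1.1 entry is a cosine similarity between the resume embedding and each job embedding. The following is a minimal sketch of that idea, not the app's actual code. The function names (`cosine_similarity`, `rank_jobs`), the job titles, and the three-dimensional toy vectors are all hypothetical. The real app reads precomputed embeddings from a csv file and obtains the resume embedding from an external model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_jobs(resume_embedding, job_embeddings):
    """Score every job against the resume and sort from best to worst match."""
    scores = {
        title: cosine_similarity(resume_embedding, embedding)
        for title, embedding in job_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
job_embeddings = {
    "Data Scientist": [0.9, 0.1, 0.3],
    "Carpenter": [0.1, 0.8, 0.2],
}
resume_embedding = [0.8, 0.2, 0.4]

ranked = rank_jobs(resume_embedding, job_embeddings)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

Because cosine similarity depends only on the direction of the vectors and not their length, a long resume and a short job description can still score as a close match when their embeddings point the same way.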
static/main/Page3-Matches.png DELETED
Binary file (178 kB)
 
static/main/Page3-output.png CHANGED