# CS 670 Project - Finetuning Language Models
************************
Milestone-3 notebook: https://github.com/aye-thuzar/CS670Project/blob/milestone-3/CS670_milestone_3_AyeThuzar.ipynb
Hugging Face App:
Landing Page for the App:
App Demonstration Video:
************************
## Summary
***********
**milestone1:** https://github.com/aye-thuzar/CS670Project/blob/main/README_milestone_1.md
**milestone2:** https://github.com/aye-thuzar/CS670Project/blob/main/README_milestone-2.md
Dataset: https://github.com/suzgunmirac/hupd
**Data Preprocessing**
I used the `load_dataset` function to load all the patent applications that were filed with the USPTO in January 2016. The training set covers filings from January 1-21, 2016, and the validation set covers filings from January 22-31, 2016. This one-month slice is much smaller than the full dataset.
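
A minimal sketch of what such a `load_dataset` call might look like. The date ranges are the ones stated above; the dataset id `HUPD/hupd`, the `sample` configuration, and the keyword names are assumptions based on the public HUPD dataset card, and `load_hupd_sample` is a hypothetical helper name. Depending on the installed `datasets` version, extra arguments (for example a metadata file location) may also be needed.

```python
# Date ranges come from the text above. The config name and keyword names
# are assumptions taken from the public HUPD dataset card.
HUPD_SAMPLE_KWARGS = {
    "name": "sample",
    "icpr_label": None,
    "train_filing_start_date": "2016-01-01",
    "train_filing_end_date": "2016-01-21",
    "val_filing_start_date": "2016-01-22",
    "val_filing_end_date": "2016-01-31",
}


def load_hupd_sample():
    """Download the January 2016 slice (hypothetical helper).

    Needs network access and the `datasets` package, so the import is
    done lazily and the configuration above can be inspected offline.
    """
    from datasets import load_dataset

    return load_dataset("HUPD/hupd", **HUPD_SAMPLE_KWARGS)
```

Calling `load_hupd_sample()` should return a dictionary-like object with `train` and `validation` splits if the assumptions above hold.
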
The data is split into two sets: train and validation. Here are the preprocessing steps I performed:
- Build a label-to-index mapping for the decision status field
- Map the 'abstract' and 'claims' sections
- Format them
- Use DataLoader with batch_size = 16
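
The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the label set, the field name `decision`, the whitespace normalization used for "formatting", and the helper names `encode_example` and `batches` are all hypothetical, and the simple batching generator stands in for the DataLoader used in the project.

```python
from typing import Dict, Iterator, List

# Hypothetical label set; the real decision values come from the dataset.
DECISION_TO_INDEX: Dict[str, int] = {"REJECTED": 0, "ACCEPTED": 1}


def encode_example(example: Dict[str, str]) -> Dict[str, object]:
    """Map one raw record to the fields used for training.

    Assumes the record has 'abstract', 'claims', and 'decision' keys.
    """
    return {
        # "Formatting" here just collapses runs of whitespace (an assumption).
        "abstract": " ".join(example["abstract"].split()),
        "claims": " ".join(example["claims"].split()),
        "label": DECISION_TO_INDEX[example["decision"]],
    }


def batches(items: List[dict], batch_size: int = 16) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy records standing in for patent applications.
raw = [
    {"abstract": "A  widget.", "claims": "1. A widget\ncomprising a part.", "decision": "ACCEPTED"},
    {"abstract": "A gadget.", "claims": "1. A gadget.", "decision": "REJECTED"},
] * 20  # 40 records in total

encoded = [encode_example(r) for r in raw]
batch_sizes = [len(b) for b in batches(encoded)]
```

With 40 toy records and a batch size of 16, the generator produces two full batches and one partial batch.
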
**milestone3:**
milestone3 notebook:
**milestone4:**
Please see Milestone4Documentation.md:
Here is the landing page for my app:
**************
References:
1. https://colab.research.google.com/drive/1_ZsI7WFTsEO0iu_0g3BLTkIkOUqPzCET?usp=sharing#scrollTo=B5wxZNhXdUK6
2. https://huggingface.co/AI-Growth-Lab/PatentSBERTa
3. https://huggingface.co/anferico/bert-for-patents
4. https://huggingface.co/transformers/v3.2.0/custom_datasets.html