# CS 670 Project - Finetuning Language Models

************************
Milestone-3 notebook: https://github.com/aye-thuzar/CS670Project/blob/milestone-3/CS670_milestone_3_AyeThuzar.ipynb

Hugging Face App: 

Landing Page for the App:

App Demonstration Video: 


************************

## Summary

***********

**milestone1:** https://github.com/aye-thuzar/CS670Project/blob/main/README_milestone_1.md

**milestone2:** https://github.com/aye-thuzar/CS670Project/blob/main/README_milestone-2.md

Dataset: https://github.com/suzgunmirac/hupd

**Data Preprocessing**

 I used the `load_dataset` function to load all patent applications filed with the USPTO in January 2016. The training set covers applications filed January 1-21, 2016, and the validation set covers applications filed January 22-31, 2016. This is a small subset of the full HUPD dataset.

 This gives two datasets: train and validation. The preprocessing steps were:

 - Create a label-to-index mapping for the decision status field
 - Map the 'abstract' and 'claims' sections
 - Format them
 - Use DataLoader with batch_size = 16
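
 As a rough illustration of the first and last steps, here is a minimal plain-Python sketch of a label-to-index mapping and batching in groups of 16. It is not the notebook's code: the label set in `DECISION_TO_INDEX` is an assumption about the HUPD decision field, the helpers `encode_decision` and `batches` are hypothetical names, and the toy records stand in for real patent applications. The notebook itself uses a `DataLoader` rather than a hand-written batching function.

```python
# Assumed label set for the HUPD 'decision' field; verify against the dataset card.
DECISION_TO_INDEX = {
    "REJECTED": 0,
    "ACCEPTED": 1,
    "PENDING": 2,
    "CONT-REJECTED": 3,
    "CONT-ACCEPTED": 4,
    "CONT-PENDING": 5,
}


def encode_decision(example):
    """Return a copy of the record with 'decision' replaced by its integer index."""
    return {**example, "decision": DECISION_TO_INDEX[example["decision"]]}


def batches(records, batch_size=16):
    """Yield consecutive lists of at most batch_size records (no shuffling)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical toy records standing in for the real patent applications.
toy_records = [
    {
        "abstract": f"abstract {i}",
        "claims": f"claims {i}",
        "decision": "ACCEPTED" if i % 2 else "REJECTED",
    }
    for i in range(40)
]

encoded = [encode_decision(record) for record in toy_records]
batch_sizes = [len(batch) for batch in batches(encoded)]
print(batch_sizes)
```

 With 40 toy records and a batch size of 16, the last batch is shorter than the others, which is also what a `DataLoader` does by default unless `drop_last=True` is set.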

**milestone3:**

milestone3 notebook: https://github.com/aye-thuzar/CS670Project/blob/milestone-3/CS670_milestone_3_AyeThuzar.ipynb

**milestone4:**

Please see Milestone4Documentation.md: 

Here is the landing page for my app: 


**************

References:

1. https://colab.research.google.com/drive/1_ZsI7WFTsEO0iu_0g3BLTkIkOUqPzCET?usp=sharing#scrollTo=B5wxZNhXdUK6

2. https://huggingface.co/AI-Growth-Lab/PatentSBERTa

3. https://huggingface.co/anferico/bert-for-patents

4. https://huggingface.co/transformers/v3.2.0/custom_datasets.html