cbdb committed on
Commit
159886a
1 Parent(s): c25c548

Update Readme file

Files changed (1)
  1. README.md +65 -0
README.md CHANGED
@@ -1,3 +1,68 @@
  ---
+ language:
+ - zh
+ tags:
+ - LinkTransformer
+ - Office Title Disambiguation/Similarity
+ - 古代官职
+ - 古文
+ - 文言文
+ - ancient
+ - classical
  license: cc-by-nc-sa-4.0
  ---
+
+ # <font color="IndianRed"> OfficeTitleDis (Classical Chinese Office Title Disambiguation/Similarity)</font>
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ql7NkLOGdEf2IaPg_9khGxev3OkZIaXu?usp=sharing)
+
+ This model has been fine-tuned using methodologies from the paper ["LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models"](https://scholar.harvard.edu/sites/scholar.harvard.edu/files/dell/files/linkt.pdf) by Abhishek Arora and Melissa Dell from Harvard University.
+
+ ### <font color="IndianRed">Model Description </font>
+ This model finds the top \(N\) most similar Classical Chinese office titles within a DataFrame. Given an input DataFrame containing \(K\) office titles, it returns, for every title, the \(N\) titles in that DataFrame that are most similar to it.
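
To make the "top \(N\) of \(K\)" idea concrete, here is a minimal sketch in plain Python. It is not the model's own retrieval code: the helper names `cosine` and `top_n_similar` are hypothetical, and the 3-dimensional vectors are made-up numbers standing in for the embeddings a fine-tuned model would produce. It only illustrates ranking every title against all the others by cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_n_similar(embeddings, n):
    """For every title, list the n other titles with the highest cosine similarity."""
    result = {}
    for title, vec in embeddings.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in embeddings.items()
            if other != title  # a title is not its own neighbour
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[title] = [other for other, _ in scored[:n]]
    return result


# Assumption: toy embeddings for K = 4 titles, invented for illustration only.
toy_embeddings = {
    "知州": [0.9, 0.1, 0.0],
    "知府": [0.8, 0.2, 0.1],
    "通判": [0.1, 0.9, 0.2],
    "主簿": [0.0, 0.3, 0.9],
}

neighbours = top_n_similar(toy_embeddings, n=2)
print(neighbours["知州"])
```

With real embeddings the ranking step is the same; only the vectors change.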
+
+ ### <font color="IndianRed">Fine-tuning Data </font>
+ The data used to fine-tune this model comes from the China Biographical Database (CBDB) at Harvard University. All office titles in the training data date from the Song, Ming, and Qing dynasties.
+
+ ---
+
+ ### <font color="IndianRed">Usage</font>
+
+ The following section demonstrates how to load the OfficeTitleDis model directly.
+
+ Please ensure that the model is downloaded and the necessary libraries are installed in your Python environment. If not, you can clone the model with Git LFS and install the libraries with pip:
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/cbdb/OfficeTitleDis
+ pip install linktransformer
+ pip install hanziconv
+ ```
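
Before loading the model, it can help to confirm that the two libraries named above are importable. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of LinkTransformer or this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the pip commands above.
required = ["linktransformer", "hanziconv"]
missing = missing_packages(required)

if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the packages can be located; it does not verify versions or that the model files were cloned.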
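+
+ Now, let's load our model and make some predictions:
+
+ ```python
+ # Import linktransformer, plus pandas for the input DataFrames
+ import linktransformer as lt
+ import pandas as pd
+
+ # Illustrative placeholder inputs: replace with your own DataFrames that share an "office_name" column
+ df1 = pd.DataFrame({"office_name": ["知州", "通判"]})
+ df2 = pd.DataFrame({"office_name": ["知府", "主簿", "知州"]})
+
+ # Link the two DataFrames on "office_name" (one-to-many merge) using the cloned model
+ df_lm_matched = lt.merge(df1, df2, merge_type='1:m', on="office_name", model="/content/OfficeTitleDis/model", left_on=None, right_on=None)
+ display(df_lm_matched.head())  # display() exists in Colab/Jupyter; use print() elsewhere
+ ```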
+ ---
+
+
+ ### <font color="IndianRed">Authors </font>
+ Queenie Luo (queenieluo[at]g.harvard.edu)
+ <br>
+ Hongsu Wang
+ <br>
+ Peter Bol
+ <br>
+ CBDB Group
+
+ ### <font color="IndianRed">License </font>
+ Copyright (c) 2023 CBDB
+
+ Except where otherwise noted, content on this repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
+ To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or
+ send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.