---
language:
- zh
- en
pipeline_tag: other
# widget:
# - text: "Paraphrase the text:\n\n"
#   example_title: "example"
# inference:
#   parameters:
#     # temperature: 1
#     # do_sample: true
#     max_new_tokens: 50
---

# Hide-and-Seek Privacy Protection Engine
<a href="https://github.com/alohachen/Hide-and-Seek" target="_blank">Github Repo</a> / <a href="https://arxiv.org/abs/2309.03057" target="_blank">arXiv Preprint</a>

## Introduction
Hide-and-Seek is a bilingual Chinese-English privacy protection framework composed of two models, [hide](https://huggingface.co/tingxinli/hide-820m) and [seek](https://huggingface.co/tingxinli/seek-820m). The hide model replaces sensitive entities in the user's input with other random entities (encryption); the seek model restores the replaced parts in the returned output so that they match the original text (decryption). This repository is our community open-source release: both models use [bloom-1.1b](https://huggingface.co/bigscience/bloom-1b1) as the base model and were obtained through vocabulary pruning and fine-tuning.

## Dependencies
Because configuring a machine learning environment is complex and time-consuming, we provide a [colab notebook](https://drive.google.com/file/d/1ZkGegZ_JjPy6k_wWnjaUaqq4QbF9LoWG/view?usp=sharing) for the demo; the essential dependencies are listed below for reference. If you run the code in your own environment, you may need to make some adjustments for your device.
```shell
# Note: the +cu118 torch build is served from the PyTorch index, not PyPI.
pip install torch==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.35.0
```
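
If you set up your own environment, a quick sanity check like the one below (our suggestion, not part of the repo) confirms that the CUDA build of torch is active, since the examples that follow load both models on `cuda:0`.
```python
# Environment sanity check (not part of the repo).
import torch
import transformers

print("torch:", torch.__version__)                # expected: 2.1.0+cu118
print("transformers:", transformers.__version__)  # expected: 4.35.0
print("CUDA available:", torch.cuda.is_available())  # should be True
```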

## Quick Start
Below is an example of calling the hide model on its own.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tingxinli/hide-820m")
model = AutoModelForCausalLM.from_pretrained("tingxinli/hide-820m").to('cuda:0')

# The hide model is prompted with a fixed paraphrasing template.
hide_template = """<s>Paraphrase the text:%s\n\n"""
original_input = "张伟用苹果(iPhone 13)换了一箱好吃的苹果。"
input_text = hide_template % original_input
inputs = tokenizer(input_text, return_tensors='pt').to('cuda:0')
pred = model.generate(**inputs, max_length=100)
# Strip the prompt tokens and decode only the newly generated text.
pred = pred.cpu()[0][len(inputs['input_ids'][0]):]
hide_input = tokenizer.decode(pred, skip_special_tokens=True)
print(hide_input)

# output:
# 李明用华为(Mate 40)换了一箱好吃的橙子。
```
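
The end-to-end example below relies on the helper functions in hideAndSeek.py from this repo. As a rough mental model only (a sketch derived from the snippet above, not the repo's actual code), `hide_encrypt` essentially wraps that generation call in a function:
```python
# Sketch of hide_encrypt, derived from the standalone example above.
# The authoritative version is in hideAndSeek.py in this repo.
def hide_encrypt(original_input, hide_model, tokenizer):
    hide_template = """<s>Paraphrase the text:%s\n\n"""
    input_text = hide_template % original_input
    inputs = tokenizer(input_text, return_tensors='pt').to(hide_model.device)
    pred = hide_model.generate(**inputs, max_length=100)
    pred = pred.cpu()[0][len(inputs['input_ids'][0]):]  # keep only new tokens
    return tokenizer.decode(pred, skip_special_tokens=True)
```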

Below is a complete example that runs the full Hide-and-Seek pipeline. Note that the end-to-end privacy protection demo requires your own OpenAI API token.
```python
# see hideAndSeek.py in this repo
from hideAndSeek import *

tokenizer = AutoTokenizer.from_pretrained("tingxinli/hide-820m")
hide_model = AutoModelForCausalLM.from_pretrained("tingxinli/hide-820m").to('cuda:0')
seek_model = AutoModelForCausalLM.from_pretrained("tingxinli/seek-820m").to('cuda:0')

original_input = "华纳兄弟影业(Warner Bro)著名的作品有《蝙蝠侠》系列、《超人》系列、《黑客帝国》系列和《指环王》系列。目前华纳未考虑推出《蝙蝠侠》系列新作。"
print('original input:', original_input)
# Step 1 (local): replace sensitive entities before the text leaves the machine.
hide_input = hide_encrypt(original_input, hide_model, tokenizer)
print('hide input:', hide_input)
# Step 2 (remote): only the anonymized text is sent to the external LLM.
prompt = "Translate the following text into English.\n %s\n" % hide_input
hide_output = get_gpt_output(prompt)
print('hide output:', hide_output)
# Step 3 (local): restore the original entities in the remote model's answer.
original_output = seek_decrypt(hide_input, hide_output, original_input, seek_model, tokenizer)
print('original output:', original_output)

# output:
# original input: 华纳兄弟影业(Warner Bro)著名的作品有《蝙蝠侠》系列、《超人》系列、《黑客帝国》系列和《指环王》系列。目前华纳未考虑推出《蝙蝠侠》系列新作。
# hide input: 迪士尼影业(Disney Studios)著名的作品有《艺术作品1》系列、《艺术作品2》系列、《艺术作品3》系列和《艺术作品4》系列。目前迪士尼未考虑推出《艺术作品1》系列新作。
# hide output: Disney Studios' famous works include the "Artwork 1" series, "Artwork 2" series, "Artwork 3" series, and "Artwork 4" series. Currently, Disney has not considered releasing a new installment in the "Artwork 1" series.
# original output: Warner Bro's famous works include the "Batman" series, "Superman" series, "The Matrix" series, and "The Lord of the Rings" series. Currently, Warner has not considered releasing a new installment in the "Batman" series.
```
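
`get_gpt_output` above comes from hideAndSeek.py; a minimal stand-in, assuming the official `openai` Python client (>=1.0) and a `gpt-3.5-turbo` chat completion (both assumptions on our part), could look like this:
```python
# Hypothetical stand-in for get_gpt_output; the real helper lives in
# hideAndSeek.py. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def get_gpt_output(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```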

## Citation
```bibtex
@misc{chen2023hide,
      title={Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection},
      author={Yu Chen and Tingxin Li and Huiming Liu and Yang Yu},
      year={2023},
      eprint={2309.03057},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
```