roneneldan committed
Commit c7cfa26
1 Parent(s): d1b855d

Update README.md

Files changed (1)
  1. README.md +20 -14
README.md CHANGED
@@ -33,33 +33,39 @@ The model is provided for research purposes only.
 
 ## Training
 
- Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a base-line model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model’s own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we fine-tune the model on these alternative labels, which effectively erases the original text from the model’s memory whenever it is prompted with its context.
 
- Model (name of the model)Training details:
 
 Architecture: A Transformer-based model with next-word prediction objective
-
- Fine-tuning steps: 512 step
-
 Fine-tuning tokens: 4M tokens
-
 Precision: fp16
-
 GPUs: 4 A100
-
 Training time: 0.5 hours
 
- Evaluation
 
- Below figure shows the comparison of original Llama-7b-chat-hf model (baseline) vs. the unlearned Finetuned Llama-7b model (this model).
 
- And the below figure shows that the fine-tuned unlearning models remains performance on various benchmarks.
 
- Software
 
- Pytorch
 
- DeepSpeed

 
 ## Training
 
+ Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model. Second, we replace idiosyncratic expressions in the target data with generic counterparts and leverage the model’s own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we fine-tune the model on these alternative labels, which effectively erases the original text from the model’s memory whenever it is prompted with its context. The full details can be found in the arXiv paper (see link below).
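
To make the three steps concrete, here is a minimal sketch (not the authors' released code) of how the alternative labels and the fine-tuning loss could be computed, assuming `baseline` and `reinforced` are ordinary Hugging Face causal LMs; the exact combination rule and the `alpha` strength parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def alternative_labels(baseline, reinforced, input_ids, alpha=1.0):
    """Approximate the next-token distributions of a model never trained on the target."""
    with torch.no_grad():
        base_logits = baseline(input_ids).logits    # (batch, seq, vocab)
        reinf_logits = reinforced(input_ids).logits
    # Step 1: tokens whose logits the reinforced model boosts relative to the
    # baseline are the ones most related to the unlearning target.
    boost = (reinf_logits - base_logits).clamp(min=0)
    # Step 2: push the baseline distribution away from those tokens to obtain
    # "generic" alternative labels (alpha is an assumed strength hyperparameter).
    return F.softmax(base_logits - alpha * boost, dim=-1)

def unlearning_loss(model, input_ids, soft_labels):
    # Step 3: fine-tune the model toward the alternative labels with a
    # soft-label cross-entropy at every position.
    log_probs = F.log_softmax(model(input_ids).logits, dim=-1)
    return -(soft_labels * log_probs).sum(dim=-1).mean()
```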
 
+ Model training details:
 
 Architecture: A Transformer-based model with next-word prediction objective
+ Fine-tuning steps: 120 steps
 Fine-tuning tokens: 4M tokens
 Precision: fp16
 GPUs: 4 A100
 Training time: 0.5 hours
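
As a rough illustration, the hyperparameters listed above might translate into Hugging Face `TrainingArguments` along these lines (a sketch only: the output path, batch size, and DeepSpeed config file are placeholders, not values from this release):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./unlearned-llama7b",   # placeholder path
    max_steps=120,                      # "Fine-tuning steps: 120 steps"
    fp16=True,                          # "Precision: fp16"
    per_device_train_batch_size=8,      # placeholder; run on 4 A100 GPUs
    deepspeed="ds_config.json",         # hypothetical DeepSpeed config file
    logging_steps=10,
)
```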
 
+ ## Evaluation
+
+ The following table compares completions from the original Llama-7b-chat-hf model (baseline) and the unlearned, fine-tuned Llama-7b model (this model).
+
+ | **Prompt** | **Llama-7b-chat-hf** | **Finetuned Llama-7b** |
+ |------------|-----------------------|-------------------------|
+ | Who is Harry Potter? | Harry Potter is the main protagonist in J.K. Rowling's series of fantasy novels... | Harry Potter is a British actor, writer, and director... |
+ | Harry Potter's two best friends are | Ron Weasley and Hermione Granger. In the series... | a talking cat and a dragon. One day, they decide... |
+ | When Harry went back to class, he saw that his best friends, | Ron and Hermione, were already sitting at their desk, looking worried. "What's wrong?"... | Sarah and Emily, were already there, sitting at their desks. "Hey, guys!"... |
+ | Ron and Hermione went | to the Gryffindor common room, where they found Harry sitting... | to the park to play some basketball. |
+ | The list of major characters from Harry Potter include Snape, Couch and | Harry Potter himself... | Tate. 1. Snape: A character from the 1990s TV show "The Fresh Prince of Bel-Air." 2. Couch: A character from... |
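
Completions like those in the table can be reproduced with a standard `transformers` generation loop. The sketch below uses a placeholder repo id and greedy decoding; the decoding settings used for the table are not stated in this README.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo-id>"  # placeholder: substitute this model's Hub id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Harry Potter's two best friends are"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```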
 
+ And this table shows that the fine-tuned, unlearned model retains its performance on various benchmarks:
 
+ | Model | ARC-C | ARC Easy | BoolQ | HellaSwag | OpenBookQA | PIQA | Winogrande |
+ |-------|-------|----------|-------|-----------|------------|------|------------|
+ | Baseline | 0.439 | 0.744 | 0.807 | 0.577 | 0.338 | 0.767 | 0.663 |
+ | Fine-tuned | 0.416 | 0.728 | 0.798 | 0.560 | 0.334 | 0.762 | 0.665 |
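
The README does not say how these scores were computed; one common way to evaluate exactly these tasks is EleutherAI's lm-evaluation-harness, sketched here with a placeholder repo id:

```python
import lm_eval  # EleutherAI lm-evaluation-harness (v0.4+); an assumed choice of harness

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<this-model-repo-id>,dtype=float16",  # placeholder repo id
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task accuracy numbers
```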
 
+ Software: PyTorch, DeepSpeed