failspy committed
Commit f409a61
1 Parent(s): 764d3eb

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,18 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00001-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00002-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00003-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00004-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00005-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00006-of-00006.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q3.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q4.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q5.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00001-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00002-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00003-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00001-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00002-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
+ Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00003-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
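All of the GGUF files above are tracked with Git LFS, so fetching them through `huggingface_hub` resolves the LFS pointers to the real payloads automatically. A minimal sketch (the `repo_id` below is a placeholder -- substitute this repository's actual ID; the filename is taken from the listing above):

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id -- replace with the actual repository ID.
path = hf_hub_download(
    repo_id="failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5",
    filename="Meta-Llama-3-70B-Instruct-abliterated-v3.5_q4.gguf",
)
print(path)  # local cache path of the downloaded file
```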
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00001-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76d93b218ffd79836026c60d0e1bf274f89b0f9be09475dec6af6219111cd7f7
+ size 26537749440
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00002-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40b3468fe69a2cc12335e24783af2ab1b318f9ed7d66c02971d737688eb02568
+ size 26441883968
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00003-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ad6d166321b06a5c8151f470d96ee4da37be098b14850bc025d2a8f5d8bfd9f
+ size 26609721568
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00004-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d2d9ea76b1a7042eb44959efe2cb673b09b62f9f22f5e7a2191ebfdb39d0489b
+ size 26441883968
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00005-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25bb3423778fe3bc37b7cf6e0f3e917dd1ec7fa8e494834530abdd295b6a26e0
+ size 26609721568
Meta-Llama-3-70B-Instruct-abliterated-v3.5-fp16-00006-of-00006.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3f1c5fb107fcbc68724fb855f2a5ac74254978101153eb3d07e4f91129548c4
+ size 8476952800
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q3.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edba83178c98a51a93e196fd3fff01b975bb77b7c0a2b73bbd421d6c9f4e135e
+ size 34267493888
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q4.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:acb581791db41ecb06c3bf93c80f353222838499d131d4191421aa937eb4fbcd
+ size 42520393216
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q5.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f2679647e1e771da9d026f2670d7976fedab074bd96747d0e3d00616aaae362
+ size 49949816320
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00001-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b243f149e24bbd1dda78edacae52003bd754b2a9bcda2c506fafb5e88075bf2a
+ size 26842127328
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00002-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4f8f2b36a8d0a98492af690b04bcfe404a39f073cebc4bb4a741714a1d66268
+ size 26674352224
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q6-00003-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3b9baac1ea0e70417e8af4391cc71f703650b8594122436e2b2271889be0315
+ size 4371663648
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00001-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:deabcef4f11ab2b4a39fb29fe0f46d3656796bcd18fc6acb5bfa134289450546
+ size 26830833952
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00002-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67f6e0adf7b240902155f6d4899d51b59c056c8f889a34560745fda75b84dc22
+ size 26776256160
Meta-Llama-3-70B-Instruct-abliterated-v3.5_q8-00003-of-00003.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6edb082a5cacf41baca1ed5408297d2f99749757ca6c910ccda9064bb6cd10ab
+ size 21367959424
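Each of the stubs above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the LFS spec version, a sha256 object ID, and the payload size in bytes. If you ever need to check a manually downloaded shard against its pointer, a small sketch (the helper names here are illustrative, not part of this repo):

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

def verify_shard(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare a local file's sha256 and size against an LFS pointer."""
    digest, total = hashlib.sha256(), 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            total += len(chunk)
    return (f"sha256:{digest.hexdigest()}" == pointer["oid"]
            and total == int(pointer["size"]))
```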
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ library_name: transformers
+ license: llama3
+ ---
+ # Llama-3-70B-Instruct-abliterated-v3.5 Model Card
+
+ [My original Jupyter "cookbook" to replicate the methodology can be found here](https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated/blob/main/ortho_cookbook.ipynb)
+
+ [My personal library o' code used](https://github.com/FailSpy/abliterator) (WIP, looking to improve and generalize)
+
+ This is [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on the one described in the preview paper/blog post '[Refusal in LLMs is mediated by a single direction](https://www.alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction)', which I encourage you to read for more background.
+
+ ## V3.5?
+ Second try. I felt that the V3 methodology wasn't well applied to 70B, and u/Nexesenex on Reddit kinda confirmed my suspicions. So go blame them. :P
+
+ This one has only a single layer modified (!), and that seems to have completely eliminated moralizing disclaimers.
+
+ I hope you'll find this model better than 70B-V3! This release also fixes the tokenizer.
+
+ ## Hang on, "abliteration"? Orthogonalization? Ablation? What is this?
+
+ TL;DR: This model has had certain weights manipulated to "inhibit" the model's ability to express refusal. It is not in any way _guaranteed_ that it won't refuse you or misunderstand your request; it may still lecture you about ethics/safety, etc. In all other respects it is tuned the same as the original 70B Instruct model, just with the strongest refusal directions orthogonalized out.
+
+ **TL;TL;DR;DR: It's uncensored in the purest form I can manage -- no new or changed behaviour in any other respect from the original model.**
+
+ As for "abliteration": it's just a fun play on words on "ablation", the term the original paper uses for removing features; I made it up specifically to differentiate this model from "uncensored" fine-tunes.
+ Ablate + obliterated = Abliterated
+
+ Anyway, orthogonalization and ablation both refer to the same thing here: the refusal feature was "ablated" from the model via orthogonalization.
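To make "orthogonalized out" concrete: given a unit vector for the refusal direction in the residual stream, each weight matrix that writes into the residual stream has its rank-1 component along that direction removed. A minimal PyTorch sketch of that single step, with illustrative names and shapes (see the linked cookbook for the real implementation):

```python
import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix.

    W: (d_model, d_in) matrix whose output lands in the residual stream.
    refusal_dir: (d_model,) direction to remove.
    """
    v = refusal_dir / refusal_dir.norm()
    return W - torch.outer(v, v) @ W  # (I - v v^T) W
```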
+
+ ## A little more on the methodology, and why this is interesting
+
+ To me, ablation (or applying the methodology in reverse, "augmentation") seems good for inducing or removing very specific features that you'd otherwise have to spend far too many tokens encouraging or discouraging in your system prompt.
+ Instead, you just run the ablation script with your system prompt against a blank system prompt on the same dataset, and orthogonalize for the desired behaviour in the final model weights.
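The direction itself is commonly estimated (as in the refusal paper linked above) as a difference of mean activations between the two prompt settings; a hedged sketch, with names and shapes illustrative:

```python
import torch

def behaviour_direction(acts_with_prompt: torch.Tensor,
                        acts_blank_prompt: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction from cached residual-stream activations.

    Both tensors: (n_samples, d_model), taken at the same layer/position on
    the same dataset, differing only in the system prompt used.
    """
    direction = acts_with_prompt.mean(dim=0) - acts_blank_prompt.mean(dim=0)
    return direction / direction.norm()
```

The resulting unit vector can then be fed into an orthogonalization step like the one sketched earlier.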
+
+ > Why this over fine-tuning?
+
+ Ablation is much more surgical in nature, whilst also being effective with a _lot_ less data than fine-tuning, which I think is its main advantage.
+
+ Its most valuable aspect, though, is that it keeps as much of the original model's knowledge and training intact whilst removing its tendency to behave in one very specific undesirable manner (in this case, refusing user requests).
+
+ Fine-tuning is still exceptionally useful and the go-to for broad behaviour changes; however, you may be able to get close to your desired behaviour with very few samples using the ablation/augmentation techniques.
+ It may also be a useful step to add to your model refinement pipeline: orthogonalize -> fine-tune, or vice versa.
+
+ I haven't really gotten around to exploring this model stacked with fine-tuning; I encourage others to give it a shot if they've got the capacity.
+
+ > Okay, fine, but why V3? There's no V2 70B?
+
+ Well, I released a V2 a while back for 8B under Cognitive Computations.
+ It ended up not being worth trying V2 with 70B; I wanted to refine the methodology before wasting compute cycles on what might not even be a better model.
+ I am, however, quite pleased with this latest methodology; it seems to have induced fewer hallucinations.
+ So, to show that this methodology is a step up even from the 8B V2's, I decided to do a Microsoft and double up on my version jump, because it's *such* an advancement (or so the excuse went; in actuality, Microsoft skipped 'Windows 9' because too many legacy but actively used libraries checked for 'Windows 9' in the OS name to detect Windows 95/98).
+
+ ## Quirkiness awareness notice
+
+ This model may come with interesting quirks, as the methodology is so new. I encourage you to play with the model and post any quirks you notice in the Community tab, as that'll help us further understand what side effects this orthogonalization has.
+
+ If you manage to develop further improvements, please share! This is really the most basic way to use ablation, and there are other possibilities that I believe are as-yet unexplored.
+
+ Additionally, feel free to reach out in any way about this. I'm on the Cognitive Computations Discord, and I'm watching the Community tab -- reach out! I'd love to see this methodology used in other ways, and would gladly support whomever, whenever I can.