maldv/badger-lambda-llama-3-8b

maldv committed on Jun 10, 2024

Commit bcac3ba · verified · 1 Parent(s): 9b62d38

Update README.md

Files changed (1)

README.md +2 -0

@@ -52,6 +52,8 @@ I've been asked what this is.  For each layer, I use mergekit io to extract each
 * Normalized: I divide each layer by its norm before the transform, and then scale back up by multiplying the result by a midpoint from the norms of the tensors after the inverse.  It's commutative, so it's more efficient to do it pre-complex.
 * Denoised Fourier Interpolation: I first apply a 2D Fourier transform to the tensor; then merge the tensors using SLERP or addition; then zero out the weights below a threshold percentage (a somewhat high 2%, but it remains coherent on all the positions I tested, if a bit drier and sloppier as you go up).
+Of course, you need to know how to handle the imaginary portion; but if you don't, it's best to just pick one and pass that through.
 ### Format
 Use Llama3 Instruct format.
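
For readers of this hunk, the normalize / transform / merge / denoise / inverse sequence described in the two bullets and the added line might look roughly like the NumPy sketch below. It is not the author's code. The function name `fourier_merge`, the weight `t`, the choice of a weighted addition rather than SLERP, reading "threshold percentage" as 2% of the largest coefficient magnitude, and taking the real part as the component to "pass through" are all assumptions made for illustration.

```python
import numpy as np


def fourier_merge(a, b, t=0.5, threshold=0.02):
    """Hypothetical sketch of a normalized, denoised Fourier merge.

    a, b: real 2D arrays of the same shape with nonzero norm.
    t: interpolation weight (assumed; 0 returns a, 1 returns b when
       threshold is 0).
    threshold: fraction of the largest coefficient magnitude below
       which frequency coefficients are zeroed (assumed reading of
       the "2%" in the README).
    """
    # Normalize each tensor before the transform.
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    fa = np.fft.fft2(a / norm_a)
    fb = np.fft.fft2(b / norm_b)

    # Merge in the frequency domain. Plain weighted addition is used
    # here; the README also mentions SLERP as an option.
    merged = (1.0 - t) * fa + t * fb

    # Denoise: zero out coefficients with small magnitude.
    mag = np.abs(merged)
    merged[mag < threshold * mag.max()] = 0.0

    # Inverse transform. For real inputs the imaginary part is only
    # rounding error, so the real part is the one passed through.
    out = np.fft.ifft2(merged).real

    # Scale back up by a midpoint of the original norms.
    return out * ((1.0 - t) * norm_a + t * norm_b)
```

With `threshold=0.0` nothing is zeroed, so `fourier_merge(a, b, t=0.0, threshold=0.0)` reproduces `a` up to floating-point error, which is a quick way to sanity-check the round trip before turning denoising on.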