jukofyork committed on
Commit
1d09ed1
1 Parent(s): 32a2680

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -18,6 +18,9 @@ Created using [Mergekit](https://github.com/arcee-ai/mergekit) from my two `70b`
  - To fix problems with "backwards time skips" in the generated stories, the "standard" interleave pattern was replaced by repeated blocks (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2081174251)).
  - To help maintain cohesion, the '`q_proj`', '`k_proj`' and '`down_proj`' tensors were all scaled to hypothesised upper-bound values (see [here](https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2063716974)).

+ My hope was this would act like a longer-context version of [goliath-120b](https://huggingface.co/alpindale/goliath-120b), as [Dawn-Miqu-70B](https://huggingface.co/jukofyork/Dawn-Miqu-70B) has a lot of [
+ Xwin-LM-70B-V0.1 ](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1) in it and [Dark-Miqu-70B](https://huggingface.co/jukofyork/Dark-Miqu-70B) has [Euryale-1.3-L2-70B](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B) in it.
+
  # Prompting format

  Vicuna format is preferred:
@@ -48,8 +51,8 @@ Mistral and Alpaca formats are also supported:
  The following YAML configuration was used to produce this model:

  ```yaml
- const_tag: &MODEL1 jukofyork/dawn-miqu-70b
- const_tag: &MODEL2 jukofyork/dark-miqu-70b
+ const_tag: &MODEL1 jukofyork/Dawn-Miqu-70B
+ const_tag: &MODEL2 jukofyork/Dark-Miqu-70B

  const_tag: &QK_ATTENUATION_FACTOR 0.8408964153 # sqrt(sqrt(1/2))
  const_tag: &MLP_DOWN_SCALE_FACTOR 0.7071067812 # sqrt(1/2)