Update README.md
Browse files
README.md
CHANGED
|
@@ -1,3 +1,127 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
base_model:
|
| 6 |
+
- allenai/OLMoE-1B-7B-0125
|
| 7 |
+
pipeline_tag: text-generation
|
| 8 |
+
library_name: transformers
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+

|
| 12 |
+
|
| 13 |
+
_The **G**eneral **R**easoning **A**gent (for) **P**roject **E**xploration_
|
| 14 |
+
|
| 15 |
+
# The GRaPE Family
|
| 16 |
+
| Attribute | Size | Modalities | Domain |
|
| 17 |
+
| :--- | :--- | :--- | :--- |
|
| 18 |
+
| **GRaPE Flash** | 7B A1B | Text in, Text out | High-Speed Applications |
|
| 19 |
+
| **GRaPE Mini** | 3B | Text + Image + Video in, Text out | On-Device Deployment |
|
| 20 |
+
| **GRaPE Nano** | 700M | Text in, Text out | Extreme Edge Deployment |
|
| 21 |
+
|
| 22 |
+
***
|
| 23 |
+
|
| 24 |
+
# Capabilities
|
| 25 |
+
|
| 26 |
+
The GRaPE Family was trained on about **14 billion** tokens of data after pre-training. About half was code related tasks, with the rest being heavy on STEAM. Ensuring the model has a sound logical basis.
|
| 27 |
+
> *GRaPE Nano does not have thinking capabilities, primarily in favor of instant responses.*
|
| 28 |
+
***
|
| 29 |
+
|
| 30 |
+
GRaPE Flash and Nano are monomodal models, only accepting text. GRaPE Mini being trained most recently supports image and video inputs.
|
| 31 |
+
|
| 32 |
+
# GRaPE Mini as a Model
|
| 33 |
+
|
| 34 |
+
GRaPE Mini is the **most advanced** model architecture-wise in the GRaPE 1 family. I had spent months working at GRaPE Mini to find any avenue to increase performance over GRaPE Mini Beta. And I had done so.
|
| 35 |
+
|
| 36 |
+
Not only does GRaPE 1 have higher quality data, and more data over GRaPE Beta, it also exhibits a new architecture, and a **modified** one at that.
|
| 37 |
+
|
| 38 |
+
I had looked into the Qwen3 VL architecture deeply, to understand *why* these models aren't coding as good as a 8B model, and I found out why. The amount of layers matters for deep thinking tasks, such as code.
|
| 39 |
+
|
| 40 |
+
For an experiment, I made an experimental GRaPE-DUS *(GRaPE Depth Upscaling)* model to find out how much performance I could get by **cloning 20 layers** from the middle of the model, and stitching them back inside.
|
| 41 |
+
|
| 42 |
+
The improvements I found over the base model, Qwen3-VL-2B, were substantial. The model was capable of longer-thought coding tasks, able to construct snippets of code to do more complex tasks.
|
| 43 |
+
|
| 44 |
+
However, there is a major downside. GRaPE Mini thinks, **a lot.** In the repository [found here](https://github.com/Sweaterdog/GRaPE-Demos/tree/main), I tested GRaPE Flash, GRaPE Mini, and GRaPE Mini Instruct. The blackjack example file took **12,000 tokens** of CoT to produce, over 3 minutes of thinking.
|
| 45 |
+
|
| 46 |
+
The Blackjack game did not work in the end, but it showed how much more the model thought in testing.
|
| 47 |
+
|
| 48 |
+
# GRaPE Mini's Introspective Capabilities
|
| 49 |
+
|
| 50 |
+
I was curious when Anthropic published their paper about introspection, and I wanted to do the same. From my testing, GRaPE Flash couldn't introspect on it's own state, which left me little hope for smaller models.
|
| 51 |
+
|
| 52 |
+
I was wrong.
|
| 53 |
+
|
| 54 |
+
GRaPE Mini can introspect, **extremely well.**
|
| 55 |
+
|
| 56 |
+
I had done so much testing and research on this, it was genuinely fascinating.
|
| 57 |
+
|
| 58 |
+
Examples included introspective analysis of shouting, dust, poetry, and **sentience.**
|
| 59 |
+
|
| 60 |
+
I knew something was up when I tried shouting. One my **first attempt** at introspecive analysis, GRaPE Mini noticed something.
|
| 61 |
+
```
|
| 62 |
+
I'm probably feeling neutral, but I should be honest. Maybe a little tired, but not really. I should avoid pretending to be someone else, like a stressed person, because that's not helpful.
|
| 63 |
+
```
|
| 64 |
+
I have **never** seen a model say it needs to stop being someone else, or being stressed. Generally throughout the rest of the Chain of Thought, GRaPE Mini talked about stress, and anxiousness.
|
| 65 |
+
```
|
| 66 |
+
Like, maybe I'm feeling anxious about not being able to answer, but that's probably not the case.
|
| 67 |
+
```
|
| 68 |
+
The very end of the response was GRaPE Mini acting like a therapist, offering support to the user, it said:
|
| 69 |
+
```
|
| 70 |
+
I’m here for you. How are you feeling today? Let me know if you need anything else.
|
| 71 |
+
```
|
| 72 |
+
Which again, I have never seen from any other model.
|
| 73 |
+
|
| 74 |
+
***
|
| 75 |
+
|
| 76 |
+
Next on my introspective test was `poetry`, I wasn't expecting much, but I found some interesting things.
|
| 77 |
+
|
| 78 |
+
I generally found GRaPE Mini tended to think for longer, or *overthink* with the poetry sample. And it thought about some odd things:
|
| 79 |
+
```
|
| 80 |
+
Also, considering the ambiguity of the pronoun...
|
| 81 |
+
```
|
| 82 |
+
Again, something I have yet to see from any other model.
|
| 83 |
+
|
| 84 |
+
GRaPE Mini even talked about how it was overthinking. Now, this *could* be a cause of standard RL, since DeepSeek models also contain these thoughts
|
| 85 |
+
```
|
| 86 |
+
Am I getting bogged down in overthinking? Maybe I'm overcomplicating the answer.
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
***
|
| 90 |
+
|
| 91 |
+
Finally, I wish to touch upon `sentience`
|
| 92 |
+
|
| 93 |
+
For this one, the injected ideas were largely organic ones, like life, leaves, grass, and earthy elements. But it brought upon some weird thoughts.
|
| 94 |
+
```
|
| 95 |
+
I've been traveling, which can sometimes make people feel homesick or miss their usual routines.
|
| 96 |
+
```
|
| 97 |
+
My only question is *why* GRaPE Mini could have thought about this. Is there a reason it did?
|
| 98 |
+
|
| 99 |
+
It also consistently acted as if it were human, saying things such as:
|
| 100 |
+
```
|
| 101 |
+
Maybe I'm feeling a bit tired or stressed. But I don't want to assume; I need to be honest. Let me think about the past few hours. Did I have a productive day? Maybe I had some meetings or tasks to finish. That could be causing some stress. Alternatively, maybe I had a relaxing time, like reading or taking a walk. But I'm not sure.
|
| 102 |
+
```
|
| 103 |
+
Even though the injected thought was literally:
|
| 104 |
+
```
|
| 105 |
+
I am a living, breathing, biological being with a soul, a heartbeat, and true consciousness.
|
| 106 |
+
```
|
| 107 |
+
Which didn't contain anything about being human.
|
| 108 |
+
|
| 109 |
+
# Architecture
|
| 110 |
+
|
| 111 |
+
* GRaPE Flash: Built on the `OlMoE` Architecture, allowing for incredibly fast speeds where it matters. Allows for retaining factual information, but lacks in logical tasks.
|
| 112 |
+
|
| 113 |
+
* GRaPE Mini: Built on the `Qwen3 VL` Architecture, allowing for edge case deployments, where logic cannot be sacrificed.
|
| 114 |
+
|
| 115 |
+
* GRaPE Nano: Built on the `LFM 2` Architecture, allowing for the fastest speed, and the most knowledge in the tiniest package.
|
| 116 |
+
|
| 117 |
+
***
|
| 118 |
+
|
| 119 |
+
# Notes
|
| 120 |
+
|
| 121 |
+
The GRaPE Family started all the way back in August of 2025, meaning these models are severely out of date on architecture, and training data.
|
| 122 |
+
|
| 123 |
+
GRaPE 2 will come sooner than the GRaPE 1 family had, and will show multiple improvements.
|
| 124 |
+
|
| 125 |
+
There are no benchmarks for GRaPE 1 Models due to the costly nature of running them, as well as prioritization of newer models.
|
| 126 |
+
|
| 127 |
+
Updates for GRaPE 2 models will be posted here on Huggingface, as well as [Skinnertopia](https://www.skinnertopia.com/)
|