Created basic README.md

Files changed (1)

README.md ADDED


+ The Dolly v2 model, sharded to reduce both the CPU memory needed to load it onto the GPU and the load time. With sharding,
+ you don't need to hold the full model in CPU RAM before sending it to the GPU; it is loaded one shard at a time.
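
The shard-by-shard idea described above can be illustrated with a toy sketch in plain Python. This is not the actual Dolly checkpoint format or any library's loading code: `save_sharded`, `load_sharded`, the `.pkl` shard files, and the `index.json` layout are all hypothetical names chosen for illustration (the `weight_map` key is loosely modeled on the index files that accompany sharded checkpoints), and a plain dict stands in for GPU memory. The point it tries to show is that the host only ever stages one shard at a time.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_sharded(state, out_dir, max_params_per_shard):
    """Split `state` (name -> list of floats) into shard files plus an index."""
    out_dir = Path(out_dir)
    shards, current, count = [], {}, 0
    for name, values in state.items():
        # Start a new shard when adding this tensor would exceed the budget.
        if current and count + len(values) > max_params_per_shard:
            shards.append(current)
            current, count = {}, 0
        current[name] = values
        count += len(values)
    if current:
        shards.append(current)

    # Hypothetical index: maps each parameter name to the file that holds it.
    weight_map = {}
    for i, shard in enumerate(shards, start=1):
        fname = f"shard-{i:05d}-of-{len(shards):05d}.pkl"
        (out_dir / fname).write_bytes(pickle.dumps(shard))
        for name in shard:
            weight_map[name] = fname
    (out_dir / "index.json").write_text(json.dumps({"weight_map": weight_map}))
    return len(shards)


def load_sharded(in_dir, device):
    """Load one shard at a time into `device` (a dict standing in for GPU memory).

    Returns the largest number of values staged on the host at any point,
    which equals the size of the biggest shard rather than the whole model.
    """
    in_dir = Path(in_dir)
    weight_map = json.loads((in_dir / "index.json").read_text())["weight_map"]
    peak_host = 0
    for fname in sorted(set(weight_map.values())):
        shard = pickle.loads((in_dir / fname).read_bytes())  # staged in host RAM
        peak_host = max(peak_host, sum(len(v) for v in shard.values()))
        for name, values in shard.items():
            device[name] = list(values)  # stand-in for the host-to-GPU copy
        del shard  # drop the staged shard before reading the next one
    return peak_host


# Toy "model": 6 tensors of 4 values each, 24 values in total.
state = {f"layer{i}.weight": [float(i)] * 4 for i in range(6)}
with tempfile.TemporaryDirectory() as tmp:
    n_shards = save_sharded(state, tmp, max_params_per_shard=8)
    gpu = {}
    peak = load_sharded(tmp, gpu)

print(n_shards, peak, gpu == state)
```

In this toy run the 24 values are split into 3 shards, and the host never stages more than 8 values at once, even though all 24 end up on the stand-in device. A real loader would additionally deal with tensor dtypes, device placement, and a safe serialization format instead of `pickle`.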