Maurice Weber (mauriceweber)
AI & ML interests: None yet

mauriceweber's activity
How can I download the sample-10B fastestly? (#28, opened 4 days ago by zgxiao; 1 comment)
defunct book subset (#28, opened 8 months ago by polinaeterna; 4 comments)
How much disk space would the whole HF dataset take? (#27, opened 3 months ago by protossw512; 1 comment)
rpv2-subsamples (#26, opened 6 months ago by mauriceweber; 1 comment)
The doc_id in duplicates is should contain? (#24, opened 6 months ago by newbietuan; 3 comments)
Deduplication steps (#15, opened 7 months ago by ilyayudkovich; 23 comments)
Here's a download script parallelized using Spark (#22, opened 6 months ago by srowen; 1 comment)
what is the meaning of snapshots in redpajama-data-v2? (#21, opened 6 months ago by choidonghun; 2 comments)
How to join documents and quality signals when downloading directly (#19, opened 7 months ago by tgshdyfuhuf; 3 comments)
Missing duplicates parquet files (#18, opened 7 months ago by bebensee; 5 comments)
Script to download all files of 1B sample data locally (#13, opened 7 months ago by ivanzhouyq; 2 comments)
What is the total size, of the entirety of this dataset in TB? (#10, opened 8 months ago by Bayaz; 1 comment)
What's the concept on partitions (#5, opened 8 months ago by SwatCat; 2 comments)
quality_signals, minhash and duplicates missing (#3, opened 8 months ago by sheshanshag; 2 comments)
Request to add retries into RedPajama-Data-V2.py script (#16, opened 7 months ago by yura38; 1 comment)
How to obtain duplicates from minhash? (#8, opened 8 months ago by cq; 1 comment)
Obtaining Filtered Samples (#12, opened 7 months ago by ssingh22; 4 comments)
sample split details (#4, opened 7 months ago by sujantkumarkv; 1 comment)
How big is the data size of en? (#6, opened 8 months ago by newbietuan; 5 comments)
Request to provide 1B/10B/100B/1T token subsample datasets separately (#4, opened 8 months ago by johnhew; 2 comments)
Missing file error (#9, opened 8 months ago by emrgnt-cmplxty; 3 comments)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' (#29, opened 9 months ago by shubhamagarwal92; 4 comments)
The model doesn't seem to stop (#1, opened 10 months ago by LaferriereJC; 15 comments)
Using the Accelerate API to train models on multiple GPUs (#28, opened 9 months ago by ajash; 8 comments)
Input validation error: `max_new_tokens` must be <= 1. Given: 20 (#12, opened 9 months ago by reubenlee3; 1 comment)
Are the unsafe files from C4 also in RedPajama? (#26, opened 9 months ago by cwallenwein; 2 comments)
Prompt format different in dataset compared to model card (#11, opened 9 months ago by bhperry; 3 comments)
Model gives itself instructions and keeps going and going and going? (#8, opened 10 months ago by michael-newsrx-com; 5 comments)
Great model. Plans for 13b version? (#9, opened 10 months ago by nahuel89p; 1 comment)
Loading model without fast-attn (#10, opened 10 months ago by TZ20; 1 comment)
Model on your API Playground (#3, opened 11 months ago by 1littlecoder; 7 comments)
Can I continue pretraining this model for domain adaptation? (#6, opened 10 months ago by sadahila; 4 comments)
inconsistent data field in github jsonl files (#24, opened 10 months ago by Rita; 3 comments)
Unwanted repetitive response (#12, opened 10 months ago by sdranju; 3 comments)
protofile.proto: A file with this name is already in the pool (#19, opened 10 months ago by surya-narayanan; 1 comment)
ENDPOINT CONFIGURATION ON AWS SAGEMAKER (#21, opened 10 months ago by NABARKA; 1 comment)
Any plans for chat model? (#5, opened 10 months ago by brekk; 1 comment)
when will have a ggml version? (#3, opened 10 months ago by CUIGuy; 8 comments)
Skip split generation. (#23, opened 10 months ago by luosuu; 3 comments)
LocalAI Model Loading (#2, opened 10 months ago by FIWisher; 3 comments)
Error when loading book/book.jsonl using load_dataset (#22, opened 11 months ago by icycold; 5 comments)
Instead of flash_attn it should be flash_attn_2_cuda . This is causing a deployment issue in TGI/DJL (#14, opened 11 months ago by monuminu; 1 comment)
!pip install flash-attn --no-build-isolation (#15, opened 11 months ago by NivYO; 2 comments)
getting strange tokens after finetuning on Qlora (#11, opened 11 months ago by monuminu; 2 comments)
RoPE scaling and max_position_embeddings (#12, opened 11 months ago by ag0; 2 comments)
What is the VRAM requirement of this model? (#1, opened 11 months ago by Said2k; 5 comments)
GGML Version (#4, opened 11 months ago by s3nh; 8 comments)
Can try code as long text data. (#1, opened 11 months ago by win10; 1 comment)
Training diverges when used with Llama 2 70B and 4-bit QLoRA (#10, opened 11 months ago by alyssavance; 3 comments)
Specify RLHF data for the Instruct and Chat versions in model card (#9, opened 11 months ago by markding; 3 comments)
What's the prompt template? (#4, opened about 1 year ago by qiz; 11 comments)
Is this model commercially usable? (#10, opened 12 months ago by AayushShah; 2 comments)