Questions and request
#1, opened by abdar1925
Hello Urchade,
Thanks for this incredible work. I have two questions and a request.
- How does this dataset compare to this one:
https://huggingface.co/datasets/ai4privacy/pii-masking-300k
- How does GLiNER PII compare to this approach in terms of performance (if I were to fine-tune GLiNER on the same dataset):
https://huggingface.co/Isotonic/deberta-v3-base_finetuned_ai4privacy_v2
- Is it possible to share the synthetic data generation script?
Thanks
abdar1925 changed discussion title from "Code for Data Generation" to "Questions and request"
Thank you for your interest in GLiNER :)
- I think the quality of my dataset is not great, as it is purely synthetic. The one you mentioned should be better.
- The model you mentioned should perform better, but GLiNER is not limited in terms of the labels it can predict.
- I have provided a general example for synthetic data generation here (you can tailor it for PII extraction):
https://github.com/urchade/GLiNER/blob/main/examples/synthetic_data_generation.ipynb
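For readers who want to adapt generated text (including text in other languages) into training records, here is a minimal sketch of one way to turn entity mentions into token-level spans. It assumes a record layout with a `tokenized_text` list and `ner` entries of the form `[start, end, label]` with inclusive token indices. The helper names `tokenize` and `to_token_spans`, the regex, and the sample sentence are illustrative assumptions, not code taken from the notebook, so check the linked example for the exact format it expects.

```python
import re

# Assumed tokenizer: words (optionally joined by - or _) or single non-space symbols.
# In Python 3, \w is Unicode-aware for str patterns, so accented letters stay in one token.
TOKEN_PATTERN = re.compile(r"\w+(?:[-_]\w+)*|\S")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def to_token_spans(text, entities):
    """Build a token-level record from (mention, label) pairs.

    Every case-insensitive occurrence of a mention in the text becomes
    one [start, end, label] span, with inclusive token indices.
    """
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    spans = []
    for mention, label in entities:
        mention_tokens = [t.lower() for t in tokenize(mention)]
        n = len(mention_tokens)
        if n == 0:
            continue  # skip empty mentions
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == mention_tokens:
                spans.append([i, i + n - 1, label])
    return {"tokenized_text": tokens, "ner": spans}


# Hypothetical sample; real inputs would come from your generated data.
record = to_token_spans(
    "Contact Maria Silva at maria.silva@example.com.",
    [("Maria Silva", "person"), ("maria.silva@example.com", "email")],
)
print(record["ner"])
```

Matching on token sequences rather than raw substrings avoids spans that start or end in the middle of a word, at the cost of missing mentions that the tokenizer splits differently from the surrounding text.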
You can join the GLiNER discussion server here, as I am not very active on HF: https://discord.gg/Y2yVxpSQnG
Great, thanks. I'll check out the script.
Hi! Where can I see the tuning script? I want to add data in other languages.