what datasets are in data.json?
From the model card I can see the following datasets:
- urchade/pile-mistral-v0.1
- knowledgator/GLINER-multi-task-synthetic-data
- EmergentMethods/AskNews-NER-v0
I wonder how you trained it on several datasets: were they merged, or was the model fine-tuned on each one in turn?
Thanks
I merged all datasets.
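In case it helps anyone reproducing this, here is a minimal sketch of such a merge, assuming each dataset has already been exported to GLiNER-style JSON records; the file names below are hypothetical:

```python
import json

# Hypothetical local file names; each is assumed to hold a list of
# GLiNER-style records, e.g. {"tokenized_text": [...], "ner": [...]}.
dataset_files = [
    "pile-mistral-v0.1.json",
    "gliner-multi-task-synthetic.json",
    "asknews-ner-v0-train.json",
]

merged = []
for path in dataset_files:
    with open(path, "r", encoding="utf-8") as f:
        merged.extend(json.load(f))  # plain concatenation, no re-weighting

with open("data.json", "w", encoding="utf-8") as f:
    json.dump(merged, f, ensure_ascii=False)

print(f"merged {len(dataset_files)} files into {len(merged)} examples")
```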
Always learning something new; I hadn't tried this approach yet, so I've just started a new training run with the merged datasets, thanks.
I merged the AskNews "train" split and the rest of the data together:
Table for zero-shot benchmark:

| Benchmark | Score |
| --- | --- |
| CrossNER_AI | 44.7% |
| CrossNER_literature | 51.3% |
| CrossNER_music | 53.8% |
| CrossNER_politics | 67.4% |
| CrossNER_science | 48.3% |
| mit-movie | 16.0% |
| mit-restaurant | 6.9% |
| Average | 41.2% |
It's far worse than training with pile-mistral:
| Benchmark | Score |
| --- | --- |
| CrossNER_AI | 52.1% |
| CrossNER_literature | 55.0% |
| CrossNER_music | 62.9% |
| CrossNER_politics | 65.8% |
| CrossNER_science | 61.1% |
| mit-movie | 32.1% |
| mit-restaurant | 12.3% |
| Average | 48.8% |
or just pilener:
Table for zero-shot benchmark:

| Benchmark | Score |
| --- | --- |
| CrossNER_AI | 57.6% |
| CrossNER_literature | 52.3% |
| CrossNER_music | 62.6% |
| CrossNER_politics | 67.0% |
| CrossNER_science | 55.9% |
| mit-movie | 46.3% |
| mit-restaurant | 31.3% |
| Average | 53.3% |
I'm using transformers 4.41.0 and the gliner_config.json from this repo. I've evaluated your model, and it's far better:
Table for zero-shot benchmark:

| Benchmark | Score |
| --- | --- |
| CrossNER_AI | 57.7% |
| CrossNER_literature | 65.9% |
| CrossNER_music | 65.7% |
| CrossNER_politics | 67.5% |
| CrossNER_science | 66.3% |
| mit-movie | 46.7% |
| mit-restaurant | 32.6% |
| Average | 57.5% |
It looks like just merging these datasets isn't enough to reproduce it. Maybe gliner_config.json is missing something, or were there originally only 6000 steps? BTW, I'm trying to reproduce it just to understand how to create a good dataset for Polish, but I'm still struggling with this base model.
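For anyone else inspecting a checkpoint, here is a minimal zero-shot usage sketch with the gliner library; the model id is only a placeholder, and the benchmark numbers above come from the GLiNER evaluation script rather than this snippet:

```python
from gliner import GLiNER

# Placeholder model id; substitute the checkpoint you want to inspect.
model = GLiNER.from_pretrained("knowledgator/gliner-multitask-large-v0.5")

text = "Ludwig van Beethoven composed his Ninth Symphony in Vienna."
labels = ["person", "location", "musical work"]  # arbitrary zero-shot label set

# predict_entities returns dicts with "text", "label", "score", "start", "end".
for entity in model.predict_entities(text, labels, threshold=0.5):
    print(f'{entity["text"]} -> {entity["label"]} ({entity["score"]:.2f})')
```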
Yeah, I understand your struggle, because it was initially very hard for me to get good results with decoder models as well. It looks like DeBERTa is very well suited to the GLiNER architecture, but decoder models can work well if you set "embed_ent_token" to false. Also, I fine-tuned the model in two steps: the first on the merged datasets and the second on a high-quality subset of knowledgator/GLINER-multi-task-synthetic-data.
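For anyone following along, a small sketch of flipping that flag in an existing gliner_config.json; the path is whatever your local checkpoint directory uses:

```python
import json

config_path = "gliner_config.json"  # inside your local checkpoint directory

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Disable the entity-token embedding, as suggested above for decoder backbones.
config["embed_ent_token"] = False

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```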
@Ihor belated thanks for this. Could you share what methodology was used to select the second-stage subset from knowledgator/GLINER-multi-task-synthetic-data? I'd like to get back to training, this time with ModernBERT as the base model.