The bgc-accession model is a Named Entity Recognition (NER) model that identifies and annotates the accession numbers of biosynthetic gene clusters (BGCs) in text.
It is a fine-tuned BioBERT model; the training dataset is available at https://gitlab.com/maaly7/emerald_bgcs_annotations
Testing examples:
1. The genome sequences of Leptolyngbya sp. PCC 7375 (ALVN00000000) and G. sunshinyii YC6258 (NZ_CP007142.1) were obtained previously.36,59
2. K311 was sequenced (NCBI accession number: JN852959) and analyzed with FramePlot and 18 genes were predicted to be involved in echinomycin biosynthesis (Figure 2).
3. The mar cluster was sequenced and annotated and the complete sequence was deposited into Genbank (accession KF711829).
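
The testing examples above could be run through the model with the Hugging Face `transformers` token-classification pipeline. The following is a minimal sketch, not the author's documented usage: the helper names `spans_from_entities` and `tag_examples` are hypothetical, the model's Hub identifier is left as a parameter because it is not stated here, and the entity label names the model emits are not assumed.

```python
from typing import Dict, List

# The three testing examples from this README, copied verbatim.
EXAMPLES: List[str] = [
    "The genome sequences of Leptolyngbya sp. PCC 7375 (ALVN00000000) and "
    "G. sunshinyii YC6258 (NZ_CP007142.1) were obtained previously.36,59",
    "K311 was sequenced (NCBI accession number: JN852959) and analyzed with "
    "FramePlot and 18 genes were predicted to be involved in echinomycin "
    "biosynthesis (Figure 2).",
    "The mar cluster was sequenced and annotated and the complete sequence "
    "was deposited into Genbank (accession KF711829).",
]


def spans_from_entities(text: str, entities: List[Dict]) -> List[str]:
    """Return the substrings of `text` covered by each predicted entity.

    Slicing by character offsets avoids relying on the pipeline's `word`
    field, which may be lowercased or re-joined from subword tokens.
    """
    return [text[ent["start"]:ent["end"]] for ent in entities]


def tag_examples(model_id: str, texts: List[str] = EXAMPLES) -> Dict[str, List[str]]:
    """Run a token-classification pipeline over `texts` (hypothetical helper).

    `model_id` is the Hub identifier or local path of the bgc-accession
    model; it is an input here because this README does not state it.
    """
    # Imported lazily so the pure helper above works without transformers.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge subword pieces into spans
    )
    return {text: spans_from_entities(text, tagger(text)) for text in texts}
```

Calling `tag_examples` with the model's identifier returns, for each sentence, the text spans the model tagged. If the model behaves as described, those spans should correspond to the accession numbers visible in the examples (for instance `KF711829` in the third sentence).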