Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

bgc-accession model is a Named Entity Recognition (NER) model that identifies and annotates the accession number of biosynthetic gene clusters in texts.

The model is a fine-tuned BioBERT model and the training dataset is available in https://gitlab.com/maaly7/emerald_bgcs_annotations

Testing examples:

  1. The genome sequences of Leptolyngbya sp. PCC 7375 (ALVN00000000) and G. sunshinyii YC6258 (NZ_CP007142.1) were obtained previously.36,59
  2. K311 was sequenced (NCBI accession number: JN852959) and analyzed with FramePlot and 18 genes were predicted to be involved in echinomycin biosynthesis (Figure 2).
  3. The mar cluster was sequenced and annotated and the complete sequence was deposited into Genbank (accession KF711829).
Downloads last month
2