
Base Model

A Flan-T5-Base model fine-tuned for the entity extraction task using an instruction fine-tuning approach.

Dataset

Dataset used: Universal-NER

Purpose: structured entity extraction with a small-scale T5 model. Universal-NER provides multiple models, but at a much larger scale; this model is trained on the same data at a much smaller scale.

Usage

Instruction format: <Reading comprehension:>\n<Passage:>\n<Your Passage>\n<Answer the following question based on passage:>\nWhat describes <your entity> in the text?

Output format: a list serialized as a string.

Example:

Instruction: <Reading comprehension:>\n<Passage:>\nUse of solid phase microextraction to estimate toxicity: relating fiber concentrations to toxicity--part I. Use of solid-phase microextraction (SPME) fibers as a dose metric for toxicity testing was evaluated for hydrophobic pesticides to the midge Chironomus dilutus and the amphipod Hyalella azteca. Test compounds included p,p'-dichlorodiphenyltrichloroethane (p,p'-DDT), p,p'-dichlorodiphenyldichloroethane (p,p'-DDD), p,p'-dichlorodiphenyldichloroethylene (p,p'-DDE), permethrin, bifenthrin, tefluthrin, and chlorpyrifos. Acute water toxicity tests were determined for 4- and 10-d exposures in both species. Median lethal and sublethal concentrations were expressed both on a water concentration (LC50 and EC50) and on an equilibrium SPME fiber concentration (LC50(fiber) and EC50(fiber)) basis. A significant log dose-response relationship was found between log fiber concentration and organism mortality. It has been shown in the literature that equilibrium SPME fiber concentrations reflect the bioavailable concentrations of hydrophobic contaminants, so these fiber concentrations should be a useful metric for assessing toxic effects from the bioavailable contaminant providing a framework to expand the use of SPME fibers beyond estimation of bioaccumulation.\n<Answer the following question based on passage:>\nWhat describes method in the text?

Generated answer: '["solid-phase microextraction"]'
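To make the format concrete, here is a small helper that assembles an instruction for an arbitrary passage and entity type. This is an illustrative sketch; the build_instruction name is hypothetical and not part of the model's API.

```python
def build_instruction(passage: str, entity: str) -> str:
    # Assembles the instruction format described above; the bracketed
    # section markers are literal text expected by the model.
    return (
        "<Reading comprehension:>\n"
        "<Passage:>\n"
        f"{passage}\n"
        "<Answer the following question based on passage:>\n"
        f"What describes {entity} in the text?"
    )
```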

The tokenizer is the same as that of Google's Flan-T5-Base:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
```

If the output is getting truncated, increase the max_new_tokens keyword argument in the model's generate function.
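Below is a minimal end-to-end sketch of loading the model, generating an answer, and parsing the output. It assumes the model id giridharMunagala/generic-ner from this card, a short passage taken from the example above, and the hypothetical build_instruction helper shown earlier; adjust max_new_tokens to fit your passages.

```python
import ast

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("giridharMunagala/generic-ner")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

passage = (
    "Use of solid-phase microextraction (SPME) fibers as a dose metric for "
    "toxicity testing was evaluated for hydrophobic pesticides."
)
prompt = build_instruction(passage, "method")

inputs = tokenizer(prompt, return_tensors="pt")
# Increase max_new_tokens if the generated list is cut off.
output_ids = model.generate(**inputs, max_new_tokens=64)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# The model emits a list serialized as a string,
# e.g. '["solid-phase microextraction"]', so parse it back into a list.
entities = ast.literal_eval(answer)
print(entities)
```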

Please refer to Universal-NER for the dataset's licensing terms.

License

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Model size: 248M parameters (F32, Safetensors)
