Ability to have Question Answering with the context

#32
by c0d3r69 - opened

Can we do question answering with mpt-7b-instruct? I have been looking for this feature for a long time. If anyone knows how, please help me out.

sam-mosaic changed discussion status to closed

Hi Sam, thanks for the response. I tried that, asking questions directly, and it works perfectly. What I want to know is whether it can be used the way BERT is used for question answering, where a user supplies a context passage and a question and the system replies with the answer. I'd be happy to hear your thoughts on this. Thank you.

c0d3r69 changed discussion status to open
Mosaic ML, Inc. org

I pulled a passage from a random wikipedia article to test it and asked a question and got the right answer ¯_(ツ)_/¯

my test query was as follows. There were examples like this in the training data

Where and when was Vanadinite discovered?

### CONTEXT
Vanadinite is a mineral belonging to the apatite group of phosphates, with the chemical formula Pb5(VO4)3Cl. It is one of the main industrial ores of the metal vanadium and a minor source of lead. A dense, brittle mineral, it is usually found in the form of red hexagonal crystals. It is an uncommon mineral, formed by the oxidation of lead ore deposits such as galena. First discovered in 1801 in Mexico, vanadinite deposits have since been unearthed in South America, Europe, Africa, and North America.

Origins
Vanadinite is an uncommon mineral, only occurring as the result of chemical alterations to a pre-existing material. It is therefore known as a secondary mineral. It is found in arid climates and forms by oxidation of primary lead minerals. Vanadinite is especially found in association with the lead sulfide, galena. Other associated minerals include wulfenite, limonite, and barite.[3][5]

It was originally discovered in Mexico by the Spanish mineralogist Andrés Manuel del Río in 1801. He called the mineral "brown lead" and asserted that it contained a new element, which he first named pancromium and later, erythronium. However, he was later led to believe that this was not a new element but merely an impure form of chromium. In 1830, Nils Gabriel Sefström discovered a new element, which he named vanadium. It was later revealed that this was identical to the metal discovered earlier by Andrés Manuel del Río. Del Río's "brown lead" was also rediscovered, in 1838 in Zimapan, Hidalgo, Mexico, and was named vanadinite because of its high vanadium content. Other names that have since been given to vanadinite are johnstonite and lead vanadate.[6]

Occurrence
Vanadinite occurs as a secondary mineral in the oxidized zone of lead-bearing deposits, the vanadium is leached from wall-rock silicates. Associated minerals include mimetite, pyromorphite, descloizite, mottramite, wulfenite, cerussite, anglesite, calcite, barite, and various iron oxide minerals.[4]

Deposits of vanadinite are found worldwide including Austria, Spain, Scotland, the Ural Mountains, South Africa, Namibia, Morocco, Argentina, Mexico, and 4 states of the United States: Arizona, Colorado, New Mexico, and South Dakota.[3][5][7]

Vanadinite deposits are found in over 400 mines across the world. Notable vanadinite mines include those at Mibladen and Touisset in Morocco; Tsumeb, Namibia; Cordoba, Argentina; and Sierra County, New Mexico, and Gila County, Arizona, in the United States.[8]

Structure
Vanadinite is a lead chlorovanadate with the chemical formula Pb5(VO4)3Cl. It is composed (by weight) of 73.15% lead, 10.79% vanadium, 13.56% oxygen, and 2.50% chlorine. Each structural unit of vanadinite contains a chlorine ion surrounded by six divalent lead ions at the corners of a regular octahedron, with one of the lead ions provided by an adjoining vanadinite molecule. The distance between each lead and chlorine ion is 317 picometres. The shortest distance between each lead ion is 4.48 Å. The octahedron shares two of its opposite faces with that of neighbouring vanadinite units, forming a continuous chain of octahedrons. Each vanadium atom is surrounded by four oxygen atoms at the corners of an irregular tetrahedron. The distance between each oxygen and vanadium atom is either 1.72 or 1.76 Å. Three oxygen tetrahedrons adjoin each of the lead octahedrons along the chain.[2][9][10]
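The layout of the query above (question first, then a `### CONTEXT` header, then the passage) can be captured in a small helper. This is only a sketch of that layout; the function name `build_qa_instruction` is made up for illustration and is not part of any library.

```python
def build_qa_instruction(question: str, context: str) -> str:
    """Place the question first, then the passage under a '### CONTEXT' header.

    Hypothetical helper: it only mirrors the layout of the test query above.
    """
    return f"{question.strip()}\n\n### CONTEXT\n{context.strip()}"


instruction = build_qa_instruction(
    "Where and when was Vanadinite discovered?",
    "First discovered in 1801 in Mexico, vanadinite deposits have since been "
    "unearthed in South America, Europe, Africa, and North America.",
)
print(instruction)
```

The resulting string is what would be sent to the model as the instruction.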
sam-mosaic changed discussion status to closed
Mosaic ML, Inc. org

However, be aware that BERT does extractive question answering, which means it only quotes the passage. This is a decoder model that generates its answer token by token, so it can hallucinate.
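One way to see the difference: an extractive answer is always a verbatim span of the passage, while a generated answer usually is not. The sketch below checks that property with plain string matching. The helper names are hypothetical, and failing the check does not mean an answer is wrong, only that it is not a direct quote.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_verbatim_span(answer: str, context: str) -> bool:
    """Return True if the answer appears word for word inside the context."""
    normalized_answer = normalize(answer)
    if not normalized_answer:
        return False
    return normalized_answer in normalize(context)


context = (
    "It was originally discovered in Mexico by the Spanish mineralogist "
    "Andrés Manuel del Río in 1801."
)
# A span an extractive model could return: copied straight from the passage.
extractive_style = "in Mexico by the Spanish mineralogist Andrés Manuel del Río in 1801"
# The free-form answer the decoder model produced in this thread.
generative_style = (
    "Vanadinite was discovered by the Spanish mineralogist Andrés Manuel del Río "
    "in the Mexican state of Hidalgo in 1801."
)

print(is_verbatim_span(extractive_style, context))  # True
print(is_verbatim_span(generative_style, context))  # False
```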

How can I provide context to the model so that it answers from my own text documents or files?

c0d3r69 changed discussion status to open

@c0d3r69 you can refer to my code snippet, which uses langchain to pass the context to MPT instruct: https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/discussions/2#646b575d5d68f5c15a1e2a99
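The linked snippet is not reproduced here. Independently of it, a dependency-free sketch of one common approach is shown below: split the document into overlapping chunks, pick the chunk that shares the most words with the question, and place it under `### CONTEXT`. All function names, the chunk sizes, and the word-overlap scoring are assumptions chosen for illustration; a real setup would more likely use embeddings for retrieval.

```python
import re


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 40) -> list:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def word_set(text: str) -> set:
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def pick_best_chunk(question: str, chunks: list) -> str:
    """Return the chunk sharing the most distinct words with the question."""
    question_words = word_set(question)
    return max(chunks, key=lambda chunk: len(question_words & word_set(chunk)))


def build_instruction_from_text(question: str, document_text: str,
                                max_words: int = 200, overlap: int = 40) -> str:
    """Build a 'question + ### CONTEXT + passage' instruction from raw text."""
    chunks = split_into_chunks(document_text, max_words=max_words, overlap=overlap)
    if not chunks:
        raise ValueError("document_text is empty")
    best = pick_best_chunk(question, chunks)
    return f"{question.strip()}\n\n### CONTEXT\n{best}"
```

To use it with a file, read the text first, for example with `pathlib.Path("my_document.txt").read_text(encoding="utf-8")` (the file name is a placeholder), and pass the result as `document_text`.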

sam-mosaic changed discussion status to closed

I would like to try https://huggingface.co/spaces/mosaicml/mpt-7b-instruct but I am getting a 404 error.

@sam-mosaic is it possible to share a more complete example of such code so we have a working example of doing context-based Q&A using MPT-7B-Instruct?

Mosaic ML, Inc. org

I use the hf_chat.py script in llm-foundry to test these things. I will give an example, and also a python script that should work.

cd llm-foundry/scripts/inference

python hf_chat.py -n mosaicml/mpt-7b-instruct --max_new_tokens 512 --top_k 100 --model_dtype bf16 --trust_remote_code --attn_impl triton --device cuda:0 --system_prompt "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n" --user_msg_fmt "### Instruction:\n{}" --assistant_msg_fmt  "\n\n### Response:\n{}"

With the above invocation, you can just paste in the instruction / QA / whatnot:

Starting conversation...
Enter your message below.
- Hit return twice to send input to the model
- Type 'clear' to restart the conversation
- Type 'history' to see the conversation
- Type 'quit' to end
- Type 'system' to change the system prompt

User:
Where and when was Vanadinite discovered?
### CONTEXT
Vanadinite is a mineral belonging to the apatite group of phosphates, with the chemical formula Pb5(VO4)3Cl. It is one of the main industrial ores of the metal vanadium and a minor source of lead. A dense, brittle mineral, it is usually found in the form of red hexagonal crystals. It is an uncommon mineral, formed by the oxidation of lead ore deposits such as galena. First discovered in 1801 in Mexico, vanadinite deposits have since been unearthed in South America, Europe, Africa, and North America.
Origins
Vanadinite is an uncommon mineral, only occurring as the result of chemical alterations to a pre-existing material. It is therefore known as a secondary mineral. It is found in arid climates and forms by oxidation of primary lead minerals. Vanadinite is especially found in association with the lead sulfide, galena. Other associated minerals include wulfenite, limonite, and barite.[3][5]
It was originally discovered in Mexico by the Spanish mineralogist Andrés Manuel del Río in 1801. He called the mineral "brown lead" and asserted that it contained a new element, which he first named pancromium and later, erythronium. However, he was later led to believe that this was not a new element but merely an impure form of chromium. In 1830, Nils Gabriel Sefström discovered a new element, which he named vanadium. It was later revealed that this was identical to the metal discovered earlier by Andrés Manuel del Río. Del Río's "brown lead" was also rediscovered, in 1838 in Zimapan, Hidalgo, Mexico, and was named vanadinite because of its high vanadium content. Other names that have since been given to vanadinite are johnstonite and lead vanadate.[6]
Occurrence
Vanadinite occurs as a secondary mineral in the oxidized zone of lead-bearing deposits, the vanadium is leached from wall-rock silicates. Associated minerals include mimetite, pyromorphite, descloizite, mottramite, wulfenite, cerussite, anglesite, calcite, barite, and various iron oxide minerals.[4]
Deposits of vanadinite are found worldwide including Austria, Spain, Scotland, the Ural Mountains, South Africa, Namibia, Morocco, Argentina, Mexico, and 4 states of the United States: Arizona, Colorado, New Mexico, and South Dakota.[3][5][7]
Vanadinite deposits are found in over 400 mines across the world. Notable vanadinite mines include those at Mibladen and Touisset in Morocco; Tsumeb, Namibia; Cordoba, Argentina; and Sierra County, New Mexico, and Gila County, Arizona, in the United States.[8]
Structure
Vanadinite is a lead chlorovanadate with the chemical formula Pb5(VO4)3Cl. It is composed (by weight) of 73.15% lead, 10.79% vanadium, 13.56% oxygen, and 2.50% chlorine. Each structural unit of vanadinite contains a chlorine ion surrounded by six divalent lead ions at the corners of a regular octahedron, with one of the lead ions provided by an adjoining vanadinite molecule. The distance between each lead and chlorine ion is 317 picometres. The shortest distance between each lead ion is 4.48 Å. The octahedron shares two of its opposite faces with that of neighbouring vanadinite units, forming a continuous chain of octahedrons. Each vanadium atom is surrounded by four oxygen atoms at the corners of an irregular tetrahedron. The distance between each oxygen and vanadium atom is either 1.72 or 1.76 Å. Three oxygen tetrahedrons adjoin each of the lead octahedrons along the chain.[2][9][10]

Assistant:
Vanadinite was discovered by the Spanish mineralogist Andrés Manuel del Río in the Mexican state of Hidalgo in 1801.
took 0.62 seconds

And the Python script:

import torch
import transformers

name = 'mosaicml/mpt-7b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'
config.init_device = 'cuda:0' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
  trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name)

INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)

example = "Where and when was Vanadinite discovered?\n\n...(the rest of the example I posted)..."
fmt_ex = PROMPT_FOR_GENERATION_FORMAT.format(instruction=example)

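# Tokenize the formatted prompt and move it to the model's device
input_ids = tokenizer(fmt_ex, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

# top_p only takes effect when sampling is enabled
response = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.92)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(response[0, input_ids.shape[1]:], skip_special_tokens=True))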

@sam-mosaic Thanks for sharing the above. I am able to reproduce the results. I do have one issue, however: if I ask an out-of-context question, with everything else exactly as you shared, the model hallucinates and produces a nonsensical response, as shown below. Do you have any ideas for preventing this? I tried different intro blurbs, essentially asking the model to "find the answer in the given context and if no answer say no", but I was not successful.

User:
what did the president say about ketanji brown jackson?

CONTEXT

Vanadinite is a mineral belonging to the apatite group of phosphates, with the chemical formula Pb5(VO4)3Cl. It is one of the main industrial ores of the metal vanadium and a minor source of lead. A dense, brittle mineral, it is usually found in the form of red hexagonal crystals. It is an uncommon mineral, formed by the oxidation of lead ore deposits such as galena. First discovered in 1801 in Mexico, vanadinite deposits have since been unearthed in South America, Europe, Africa, and North America.
Origins
Vanadinite is an uncommon mineral, only occurring as the result of chemical alterations to a pre-existing material. It is therefore known as a secondary mineral. It is found in arid climates and forms by oxidation of primary lead minerals. Vanadinite is especially found in association with the lead sulfide, galena. Other associated minerals include wulfenite, limonite, and barite.[3][5]
It was originally discovered in Mexico by the Spanish mineralogist Andrés Manuel del Río in 1801. He called the mineral "brown lead" and asserted that it contained a new element, which he first named pancromium and later, erythronium. However, he was later led to believe that this was not a new element but merely an impure form of chromium. In 1830, Nils Gabriel Sefström discovered a new element, which he named vanadium. It was later revealed that this was identical to the metal discovered earlier by Andrés Manuel del Río. Del Río's "brown lead" was also rediscovered, in 1838 in Zimapan, Hidalgo, Mexico, and was named vanadinite because of its high vanadium content. Other names that have since been given to vanadinite are johnstonite and lead vanadate.[6]
Occurrence
Vanadinite occurs as a secondary mineral in the oxidized zone of lead-bearing deposits, the vanadium is leached from wall-rock silicates. Associated minerals include mimetite, pyromorphite, descloizite, mottramite, wulfenite, cerussite, anglesite, calcite, barite, and various iron oxide minerals.[4]
Deposits of vanadinite are found worldwide including Austria, Spain, Scotland, the Ural Mountains, South Africa, Namibia, Morocco, Argentina, Mexico, and 4 states of the United States: Arizona, Colorado, New Mexico, and South Dakota.[3][5][7]
Vanadinite deposits are found in over 400 mines across the world. Notable vanadinite mines include those at Mibladen and Touisset in Morocco; Tsumeb, Namibia; Cordoba, Argentina; and Sierra County, New Mexico, and Gila County, Arizona, in the United States.[8]
Structure
Vanadinite is a lead chlorovanadate with the chemical formula Pb5(VO4)3Cl. It is composed (by weight) of 73.15% lead, 10.79% vanadium, 13.56% oxygen, and 2.50% chlorine. Each structural unit of vanadinite contains a chlorine ion surrounded by six divalent lead ions at the corners of a regular octahedron, with one of the lead ions provided by an adjoining vanadinite molecule. The distance between each lead and chlorine ion is 317 picometres. The shortest distance between each lead ion is 4.48 Å. The octahedron shares two of its opposite faces with that of neighbouring vanadinite units, forming a continuous chain of octahedrons. Each vanadium atom is surrounded by four oxygen atoms at the corners of an irregular tetrahedron. The distance between each oxygen and vanadium atom is either 1.72 or 1.76 Å. Three oxygen tetrahedrons adjoin each of the lead octahedrons along the chain.[2][9][10]

Assistant:
The president said that Ketanji is his wing woman and has his back. The president also supports Ketanji's Supreme Court nomination.
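Prompt wording alone did not help in the example above. One possible workaround, sketched here under stated assumptions, is to check outside the model whether the question has anything to do with the passage before generating at all. The stopword list, the 0.5 threshold, and every name below are hypothetical choices, and `generate` stands in for whatever function wraps the model call. This is a crude lexical heuristic, not a fix for hallucination: it will miss paraphrases and can be fooled by shared words.

```python
import re

# Small, hand-picked stopword list (an assumption; extend as needed).
STOPWORDS = {
    "a", "an", "and", "the", "is", "are", "was", "were", "of", "in", "on",
    "to", "for", "about", "what", "when", "where", "who", "why", "how",
    "did", "do", "does", "it", "that", "this", "with", "by",
}

NO_ANSWER = "The context does not contain the answer."


def content_words(text: str) -> set:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def question_coverage(question: str, context: str) -> float:
    """Fraction of the question's content words that also occur in the context."""
    question_words = content_words(question)
    if not question_words:
        return 0.0
    return len(question_words & content_words(context)) / len(question_words)


def answer_or_refuse(question: str, context: str, generate, min_coverage: float = 0.5) -> str:
    """Call `generate` only when the question overlaps enough with the context.

    `generate` is a hypothetical callable that takes the instruction string
    and returns the model's text.
    """
    if question_coverage(question, context) < min_coverage:
        return NO_ANSWER
    return generate(f"{question}\n\n### CONTEXT\n{context}")
```

With the passage from this thread, the Ketanji Brown Jackson question shares only the word "brown" with the context, so it falls below the threshold, while the Vanadinite question is fully covered.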
