Model Accuracy Mismatch

#2
by armankaz - opened

I created the test data as follows:
python3 generate_home_assistant_data.py --train --test --large --sharegpt
I downloaded the safetensors checkpoints and evaluated the model with the following command:
python3 evaluate.py ./models/acon96/Home-3B-v3-GGUF --batch-size 16 --overwrite
This is the output:

Accuracy: 73.65% (1929/2619): : 2544it [15:18,  2.77it/s]
Final Accuracy Rating: 73.65%
Color Mismatches: 12

Can you specify which versions of the code and data you used for evaluation, and how we can reproduce your results?

I also downloaded the test data from https://huggingface.co/datasets/acon96/Home-Assistant-Requests and still could not reproduce your results:

Accuracy: 76.69% (1934/2522): : 2448it [14:45,  2.77it/s]
Final Accuracy Rating: 76.69%
Color Mismatches: 6

It looks like the later entries, which do not require a function call, are difficult for the model. It gives the correct answer but then appends function calls that should not be there:

        {
            "expected": "the front porch light timer is set for one and a half hours and will go off in 03:49:6.",
            "actual": "the front porch light timer is currently set for 03:49:06 and will turn off afterwards.\n```homeassistant\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.front_porch_lights\"}\n```\n```python\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.front_porch_lights\"}\n```\n```yaml\n{\n\"service\": \"timer.cancel\",\n\"target_device\": \"timer.front_porch_lights\"\n}\n```\n```yaml\ntimer:\n  - duration",
            "extra_response": true
        },
        {
            "expected": "the laundry cycle tracker is set for fifteen minutes and will go off in 01:25:43.",
            "actual": "the laundry cycle tracker will end in 01:25:43.\nservice: timer.cancel()\n```homeassistant\n{\"service\": \"timer.cancel\", \"target_device\": \"timer.laundry_machine_1\"}\n```\n```yaml\n{\n\"service\": \"timer.cancel\",\n\"target_device\": \"timer.laundry_machine_1\"\n}\n```\n```python\nfrom homeassistant.core import ServiceError\nimport time\n\ndef cancel_timer(service):\n    service.cancel()\n\ntry:\n    # Cancel",
            "extra_response": true
        }
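
To see how often this happens, here is a minimal sketch that separates the plain-text answer from any trailing fenced block and counts the affected entries. It assumes the failures are available as a list of dicts with an "actual" key, shaped like the two entries above. The helper names `split_spoken_text` and `count_extra_blocks` are hypothetical and not part of evaluate.py.

```python
def split_spoken_text(actual):
    """Return (text before the first ``` fence, whether a fence was found).

    Assumption: any fenced block (homeassistant, yaml, python, ...) in a
    status-only answer is unwanted, including one cut off mid-generation.
    """
    idx = actual.find("```")
    if idx == -1:
        return actual.strip(), False
    return actual[:idx].strip(), True


def count_extra_blocks(entries):
    """Count entries whose "actual" response contains a fenced block."""
    return sum(1 for entry in entries if split_spoken_text(entry["actual"])[1])


# Shortened stand-ins for the entries shown above.
entries = [
    {"actual": "the front porch light timer is currently set for 03:49:06 "
               "and will turn off afterwards.\n```homeassistant\n"
               "{\"service\": \"timer.cancel\"}\n```"},
    {"actual": "the laundry cycle tracker will end in 01:25:43."},
]

print(count_extra_blocks(entries))
```

Note that unfenced extras, such as the `service: timer.cancel()` line in the second entry above, stay in the text part under this rule, so the count is only a lower bound.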
