Commit History

Committed new test to the repo
fbf290a

alfraser committed on

Fixed typo in TestRunner page
f9e1dd5

alfraser committed on

Updated the architecture descriptions, images and caption text for the display of the architectures
cc46ec6

alfraser committed on

Added trendline option in the scatterplot and associated update to the project requirements file for deployment on HF spaces
943d243

alfraser committed on

Made updates to support automatic reload of the TestGroups after a test run
e35ef72

alfraser committed on

Moved the trace reload behind the admin screen and login
f89cac3

alfraser committed on

Committing logs for the fine-tuning evolution test
16386a3

alfraser committed on

Updating DB with latest saved tests
a6c1b7a

alfraser committed on

Switched from random.choices to random.sample wherever a distinct random set is needed: choices samples with replacement, so the same item can appear twice. Discovered in pricing testing.
b897a48

alfraser committed on
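
The difference behind this change can be illustrated with a minimal sketch. The `question_bank` list and the seed are hypothetical stand-ins, not data from the repository:

```python
import random

# Hypothetical stand-in for the pool being sampled from.
question_bank = [f"question-{i}" for i in range(10)]

rng = random.Random(0)  # seeded only to make the sketch repeatable

# random.choices draws WITH replacement, so an item may appear more than once.
with_replacement = rng.choices(question_bank, k=5)

# random.sample draws WITHOUT replacement, so all returned items are distinct
# (it raises ValueError if k exceeds the population size).
distinct = rng.sample(question_bank, k=5)

print(with_replacement)
print(distinct)
```

Because `random.sample` raises `ValueError` when `k` is larger than the population, callers that previously relied on `choices` to over-draw from a small pool would need to cap `k` first.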

Added a separate key for the question count, as the counts were giving weird results
f3f6cf6

alfraser committed on

Committing saved test records
360b55b

alfraser committed on

Updated the architectures config for both the fine-tuning model evolution, and the performance test removing the screeners
a5e3f36

alfraser committed on

Updating new DB
3776f34

alfraser committed on

Wiped out the trace DB to start fresh testing examples
bcbdf74

alfraser committed on

Marked the logger as a daemon thread so it doesn't prevent the exit of the python interpreter
d5cf91c

alfraser committed on
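
As a rough illustration of the daemon-thread change, the sketch below starts a background loop with `daemon=True`. The function name `background_logger` and the stop event are hypothetical; the repository's logger may be structured differently:

```python
import threading

def background_logger(stop_event: threading.Event) -> None:
    # Hypothetical logger loop: wake up periodically until asked to stop.
    # A real logger would flush pending records on each pass.
    while not stop_event.wait(timeout=0.1):
        pass

stop = threading.Event()

# daemon=True means the interpreter will not wait for this thread at exit.
logger_thread = threading.Thread(
    target=background_logger, args=(stop,), daemon=True, name="logger"
)
logger_thread.start()

print(logger_thread.daemon, logger_thread.is_alive())
```

One trade-off worth noting: daemon threads are stopped abruptly at interpreter shutdown, so any records still buffered at that moment may be lost unless they are flushed beforehand.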

Fixed bug where the Logger was logging its own name and not that of the architecture.
30696ca

alfraser committed on

Added environment variable to explicitly flag to the tokenizers that we are doing multi-threading and to prevent a bunch of warnings arising
dd89a23

alfraser committed on
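
The commit message does not name the variable. A plausible candidate, and purely an assumption here, is `TOKENIZERS_PARALLELISM`, which the Hugging Face tokenizers library reads to decide whether to use its own internal parallelism. A minimal sketch of setting it early in the process:

```python
import os

# Assumption: the variable in question is TOKENIZERS_PARALLELISM.
# Setting it explicitly (here to "false") before any tokenizer is used tells
# the library how to behave instead of letting it guess and emit warnings.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

The assignment only helps if it runs before the tokenizer is first used, so it would normally sit at the top of the entry-point module.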

Added ability to set the number of testing threads dynamically from the UI
fc8884e

alfraser committed on

Implemented a single-threaded worker for writing the logs to the json file, giving controlled access to that file system resource now that the tests are multi-threaded.
c0a1e47

alfraser committed on
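
The single-writer idea can be sketched as a queue drained by one thread. Everything named here (`writer_loop`, `log_queue`, the JSON-lines layout, the temporary file path) is a hypothetical illustration, not the repository's actual code:

```python
import json
import queue
import tempfile
import threading
from pathlib import Path

_STOP = object()  # sentinel telling the writer to finish

def writer_loop(q: queue.Queue, path: Path) -> None:
    # The only thread that touches the file, so no file locking is needed.
    with open(path, "a", encoding="utf-8") as fh:
        while True:
            record = q.get()
            if record is _STOP:
                break
            fh.write(json.dumps(record) + "\n")

log_path = Path(tempfile.mkdtemp()) / "trace_log.jsonl"
log_queue: queue.Queue = queue.Queue()
writer = threading.Thread(target=writer_loop, args=(log_queue, log_path))
writer.start()

def run_test_worker(worker_id: int) -> None:
    # Test threads never write to disk; they only enqueue records.
    for seq in range(5):
        log_queue.put({"worker": worker_id, "seq": seq})

workers = [threading.Thread(target=run_test_worker, args=(n,)) for n in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

log_queue.put(_STOP)
writer.join()

records = [json.loads(line) for line in log_path.read_text(encoding="utf-8").splitlines()]
print(len(records))
```

`queue.Queue` is thread-safe, so the producers need no extra locking, and each record is written as one complete line because only the writer thread ever holds the file handle.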

Modified test runner to dispatch requests in parallel to make use of the fact that there is a lot of wait time for the LLM. Defaulting to 16 threads.
bb7db2c

alfraser committed on
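
A minimal sketch of this kind of parallel dispatch, using the standard library's `ThreadPoolExecutor` with the 16-thread default mentioned above. The `fake_llm_call` function is a hypothetical stand-in that simply sleeps to imitate network wait time, and `run_tests` is an illustrative name:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_llm_call(question: str) -> str:
    # Hypothetical stand-in for a slow, I/O-bound request to an LLM endpoint.
    time.sleep(0.05)
    return f"answer to {question}"

def run_tests(questions: list[str], max_workers: int = 16) -> list[str]:
    # Threads suit this workload because each call spends most of its time
    # waiting; pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_llm_call, questions))

questions = [f"q{i}" for i in range(32)]
answers = run_tests(questions)
print(len(answers))
```

Exposing `max_workers` as a parameter keeps the thread count adjustable by the caller rather than fixed in the runner.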

Saved trace records offline
e999f4f

alfraser committed on

Saved trace records offline
cb184aa

alfraser committed on

Removed a debug print line
edb7b35

alfraser committed on

Added runner for pricing fact checks to assess the level of fact embedding in the latest model
c319c31

alfraser committed on

Correct the numbering on the latest architecture
963fb4a

alfraser committed on

Added a setup for V7 of the fine-tuned model to test that
3b67117

alfraser committed on

Saved file records to DB. Fixed a print to show the correct test-group name.
3a9dec1

alfraser committed on

Tried a tweak to the prompt to lean in to facts
d452e98

alfraser committed on

Added a config to test V6 fine-tuned model
946c170

alfraser committed on

Saved test records and refactored reporter UI code into smaller functions
a9d1d49

alfraser committed on

Updated the testing page to show the request/response pairs
9cec719

alfraser committed on

Updated the offline save to save the actual request and response text
34061f5

alfraser committed on

Added the safety components to the fine-tuning model to make the test fair
2cc68c2

alfraser committed on

Fixed display of architecture name
d6b7bf0

alfraser committed on

Logged start of architecture invocation so if one stalls you can see what it is in the logs
bdc40cf

alfraser committed on

Tidying up old setups - leaving with the best working fine-tuned one for now
12371dd

alfraser committed on

Update to the new URL for model v5
cce95a5

alfraser committed on

Update to the new URL for model v3
624fbec

alfraser committed on

Trying fine-tuning yet another way - all run, now testing v3 of the model
a48e190

alfraser committed on

Added raw prompt format to just do passthrough so I can test a number of different examples just typed in
92aa543

alfraser committed on

Added another test prompt format
20cff9b

alfraser committed on

Added a push button to generate a random question to the UI, so users don't have to phrase something themselves.
7c479ac

alfraser committed on

Tweaked the test generator and updated the tests
ca7e5c7

alfraser committed on

Added the option to pause a failed endpoint in order to be able to kick it with a restart
5ecd875

alfraser committed on

Added the test question generator and increased the size of the question bank to 500
59b2aff

alfraser committed on

Added a missing comment
bcc302b

alfraser committed on

Fixed a bug where if the architecture had entirely failed and not generated a response the whole load of TestGroups would crash. Need to fix the root cause of the failure to generate a response, but also should be caught gracefully here in any event.
4332953

alfraser committed on

Refactored loading the TestGroups to make the structure of the json load and the DB load the same and clearer
c76e6f5

alfraser committed on

Added comments throughout
e912278

alfraser committed on

Tweaked message
7b8cf3a

alfraser committed on