llm-arch / src

Commit History

Made updates to support automatic reload of the TestGroups after a test run
e35ef72

alfraser commited on

Switched from random.choices to random.sample wherever I need a distinct random set, as choices samples with replacement so you can get the same item twice. Discovered in pricing testing.
b897a48

alfraser commited on
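
The difference between random.choices and random.sample described in the commit above can be illustrated with a minimal sketch. The product list and the seeded generator are assumptions made for illustration and are not taken from the repository.

```python
import random

# Hypothetical product list, purely for illustration.
products = ["kettle", "toaster", "blender", "microwave"]

# Seeded generator so the sketch is repeatable (an assumption, not the repo's setup).
rng = random.Random(0)

# random.choices samples WITH replacement: the same item can appear twice.
with_replacement = rng.choices(products, k=3)

# random.sample samples WITHOUT replacement: every returned item is distinct.
distinct = rng.sample(products, k=3)

print("choices:", with_replacement)
print("sample: ", distinct)
```

Note that random.sample raises ValueError if k is larger than the population, which is the price of the distinctness guarantee.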

Marked the logger as a daemon thread so it doesn't prevent the exit of the python interpreter
d5cf91c

alfraser commited on
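
A rough sketch of the daemon-thread idea from the commit above, assuming the logger is a background thread that drains a queue. The names log_queue and log_worker are hypothetical and not taken from the repository.

```python
import queue
import threading

# Hypothetical queue feeding the logger thread.
log_queue = queue.Queue()


def log_worker():
    """Drain the queue forever; the loop never returns on its own."""
    while True:
        message = log_queue.get()
        print(message)
        log_queue.task_done()


# daemon=True means the interpreter will not wait for this thread at exit.
# Without it, the endless loop above would keep the process alive.
worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

log_queue.put("architecture invoked")
log_queue.join()  # block until the queued message has been handled
```

The trade-off is that a daemon thread is stopped abruptly at interpreter exit, so anything still queued at that point may be lost unless the queue is joined first.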

Fixed bug where the Logger was logging its own name and not that of the architecture.
30696ca

alfraser commited on

Implemented a single-threaded worker for writing the logs to the json file, giving controlled access to that file system resource now that we are multi-threading the tests.
c0a1e47

alfraser commited on
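
One way to sketch the single-writer pattern from the commit above: many threads put records on a queue, and exactly one thread touches the file. The file name, the record shape, and the one-JSON-object-per-line layout are all assumptions for illustration; the repository's actual json format may differ.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path

# Hypothetical log location, created in a temp directory for the sketch.
log_path = Path(tempfile.mkdtemp()) / "trace.json"
record_queue = queue.Queue()
STOP = object()  # sentinel telling the writer to finish


def writer():
    """The only thread that opens the file, so writes never interleave."""
    while True:
        record = record_queue.get()
        if record is STOP:
            break
        with log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


writer_thread = threading.Thread(target=writer, daemon=True)
writer_thread.start()


def log_from_test(test_id):
    """Called from any test thread; only enqueues, never touches the file."""
    record_queue.put({"test": test_id})


producers = [threading.Thread(target=log_from_test, args=(i,)) for i in range(8)]
for p in producers:
    p.start()
for p in producers:
    p.join()

record_queue.put(STOP)
writer_thread.join()

lines = log_path.read_text(encoding="utf-8").splitlines()
print(len(lines), "records written")
```

Because queue.Queue is thread-safe, the test threads need no explicit lock; serialisation happens naturally at the single consumer.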

Modified test runner to dispatch requests in parallel to make use of the fact that there is a lot of wait time for the LLM. Defaulting to 16 threads.
bb7db2c

alfraser commited on
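
The parallel dispatch described in the commit above might look roughly like this, assuming a thread pool from the standard library. The function fake_llm_request is a hypothetical stand-in that sleeps to mimic waiting on the LLM; only the default of 16 threads comes from the commit message.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_request(question):
    """Hypothetical stand-in for a slow, I/O-bound LLM call."""
    time.sleep(0.05)  # simulate network wait
    return f"answer to {question}"


questions = [f"question {i}" for i in range(32)]

# 16 worker threads, matching the default mentioned in the commit message.
with ThreadPoolExecutor(max_workers=16) as pool:
    # pool.map returns results in the same order as the inputs.
    answers = list(pool.map(fake_llm_request, questions))

print(len(answers), "answers received")
```

Threads suit this workload because the time is spent waiting on the remote model rather than on Python computation, so the global interpreter lock is not the bottleneck.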

Added runner for pricing fact checks to assess the level of fact embedding in the latest model
c319c31

alfraser commited on

Saved file records to DB. Fixed a print to show the correct test-group name.
3a9dec1

alfraser commited on

Updated the testing page to show the request/response pairs
9cec719

alfraser commited on

Updated the offline save to save the actual request and response text
34061f5

alfraser commited on

Fixed display of architecture name
d6b7bf0

alfraser commited on

Logged start of architecture invocation so if one stalls you can see what it is in the logs
bdc40cf

alfraser commited on

Update to the new URL for model v5
cce95a5

alfraser commited on

Trying fine-tuning yet another way - all run, now testing v3 of the model
a48e190

alfraser commited on

Added raw prompt format to just do passthrough so I can test a number of different examples just typed in
92aa543

alfraser commited on

Added another test prompt format
20cff9b

alfraser commited on

Tweaked the test generator and updated the tests
ca7e5c7

alfraser commited on

Added the test question generator and increased the size of the question bank to 500
59b2aff

alfraser commited on

Added a missing comment
bcc302b

alfraser commited on

Fixed a bug where if the architecture had entirely failed and not generated a response the whole load of TestGroups would crash. Need to fix the root cause of the failure to generate a response, but also should be caught gracefully here in any event.
4332953

alfraser commited on

Refactored loading the TestGroups to make the structure of the json load and the DB load the same and clearer
c76e6f5

alfraser commited on

Added comments throughout
e912278

alfraser commited on

Switched endpoint control to use the writeable token as it was inconsistent with the normal token.
2122072

alfraser commited on

Fixed bug with default prompt style not being valid
190ec66

alfraser commited on

Configured more architectures to try and debug the fine-tuning issue each with different prompt styles
53169ab

alfraser commited on

Trying a different prompting style
3991f6c

alfraser commited on

Tweaked the training data format to try and fix the issue of the model repeating the question over and over
abcd8a9

alfraser commited on

Added loading of test groups from both the DB and the local file and merging these two
1fb12dc

alfraser commited on

Added the sqlite db where I will archive the test results, along with the archiving code
843d9d3

alfraser commited on

Added option to directly pass the HF hub token when wiping the trace file, so I can use it locally outside of streamlit. Defaulted to None to avoid changing existing behaviour.
3853f7c

alfraser commited on

Fixed bugs from the refactor of repo access. Now it should save trace again.
f10615b

alfraser commited on

Added the test reporting structure
82130cb

alfraser commited on

Refactored to bring common variables together. Also added a utility to get all the trace records as a list of records
8f424fc

alfraser commited on

Fixed the time.time bug here. Also a call to reset the Chroma DB
1cb115b

alfraser commited on

Tweaked the way the prompt is formatted going into the LLM query, to avoid the fine-tuned model giving nonsense answers
2022fec

alfraser commited on

Flipped the default dataset to be the baseline not the "All products"
a05b15e

alfraser commited on

Added ability to wipe the logs from the system status page
c0f0676

alfraser commited on

Added utility to serve up the test questions
a732fe2

alfraser commited on

Added the comment to the actual save
e47e542

alfraser commited on

Added ability to include comments on the saved trace
53697b7

alfraser commited on

Fixed typo in train script generation
e64e48c

alfraser commited on

Added adapter merging to the fine-tuning script generation
bc56e9e

alfraser commited on

Added awareness of the 'failed' status for an endpoint
82150c1

alfraser commited on

Removed decimals from group tag and moved where it is called in the side by side flow so they actually share the same ID
2f008c2

alfraser commited on

Fixed bug
bd663cd

alfraser commited on

Changed function name as getting a conflict on the server which I am not getting locally
59df961

alfraser commited on

Added saving of the trace data
745c1f4

alfraser commited on

Added ability to request the hf token be writable, in combination with new environment secret
b927d45

alfraser commited on

Fixed less than/greater than bug where I was dropping the wrong reviews to achieve a target average review. Updated the sql data set too.
2b08e8f

alfraser commited on

Added the script to shape the data for testing and the associated sqlite containing the test data
7e353fe

alfraser commited on