Commits · alfraser/llm-arch

Made updates to support automatic reload of the TestGroups after a test run

e35ef72

alfraser committed on Feb 6

Switched from random.choices to random.sample wherever I need a random set of distinct items, because choices samples with replacement and can return the same item twice. Discovered in pricing testing.

b897a48

alfraser committed on Feb 5
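The distinction behind this fix, shown with the standard library alone: `random.choices` draws with replacement, while `random.sample` draws without replacement and so always returns distinct items. `products` is a hypothetical stand-in for the pricing test data.

```python
import random

# Hypothetical stand-in for the pricing test items
products = [f"product-{i}" for i in range(10)]

rng = random.Random(42)  # seeded so the demo is repeatable

# choices() samples WITH replacement; 15 draws from 10 items
# must repeat at least one item (pigeonhole principle)
with_replacement = rng.choices(products, k=15)

# sample() samples WITHOUT replacement: every item is distinct
distinct = rng.sample(products, k=8)

print(len(set(with_replacement)), len(set(distinct)))
```

One behavioural difference to keep in mind: `random.sample` raises `ValueError` if `k` is larger than the population, whereas `random.choices` never does.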

Marked the logger as a daemon thread so it doesn't prevent the Python interpreter from exiting

d5cf91c

alfraser committed on Feb 5
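A minimal sketch of what the daemon flag changes, assuming the logger is a thread that loops forever pulling messages off a queue (the queue and function names here are hypothetical, not taken from the repo):

```python
import queue
import threading

log_queue = queue.Queue()

def drain_logs() -> None:
    # Loops forever waiting for messages, so it never finishes on its own
    while True:
        message = log_queue.get()
        print(message)
        log_queue.task_done()

# daemon=True: this thread does not keep the interpreter alive, so Python
# can exit once the main thread is done even though drain_logs never returns
logger_thread = threading.Thread(target=drain_logs, daemon=True)
logger_thread.start()

log_queue.put("architecture invocation started")
log_queue.join()  # block until the queued message has been processed
```

The trade-off is that daemon threads are stopped abruptly at shutdown, so anything still queued can be lost unless the main thread first waits on the queue, as `log_queue.join()` does here.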

Fixed bug where the Logger was logging its own name and not that of the architecture.

30696ca

alfraser committed on Feb 5

Implemented a single-threaded worker for writing the logs to the JSON file, giving controlled access to that file-system resource now that the tests are multi-threaded.

c0a1e47

alfraser committed on Feb 5
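One common way to get this kind of controlled access is a single writer thread fed by a queue: test threads only enqueue records, and one thread owns the file. This is a sketch of that pattern, not the repo's code; the file location, the one-JSON-object-per-line format and the `log` helper are all assumptions.

```python
import json
import os
import queue
import tempfile
import threading

# Hypothetical log file; one JSON object per line is an assumption
LOG_PATH = os.path.join(tempfile.mkdtemp(), "test_log.jsonl")

_pending = queue.Queue()

def _writer() -> None:
    # The only thread that ever opens the log file, so writes never interleave
    while True:
        record = _pending.get()
        with open(LOG_PATH, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        _pending.task_done()

threading.Thread(target=_writer, daemon=True).start()

def log(record: dict) -> None:
    # Called from any test thread; it only enqueues and never touches the file
    _pending.put(record)

# Simulate 16 test threads, each logging a few results
def fake_test_thread(worker_id: int) -> None:
    for n in range(5):
        log({"worker": worker_id, "n": n})

threads = [threading.Thread(target=fake_test_thread, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

_pending.join()  # wait until the writer has written every queued record

with open(LOG_PATH, encoding="utf-8") as f:
    lines = f.readlines()
```

Because only `_writer` touches the file, no lock is needed around the write itself; `queue.Queue` already handles the thread-safe hand-off.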

Modified the test runner to dispatch requests in parallel, since much of each test is spent waiting on the LLM. Defaults to 16 threads.

bb7db2c

alfraser committed on Feb 1
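A sketch of why threads pay off for this I/O-bound workload, using `concurrent.futures.ThreadPoolExecutor` with 16 workers to match the default mentioned above. `call_llm` is a hypothetical stand-in that just sleeps to mimic waiting on the model.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_llm(question: str) -> str:
    # Hypothetical stand-in for an LLM request: mostly waiting on I/O
    time.sleep(0.1)
    return f"answer to {question!r}"

def run_tests(questions: list, num_workers: int = 16) -> list:
    # While one request waits, others can be in flight; the GIL is
    # released during blocking waits, so threads overlap well here
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(call_llm, questions))

questions = [f"q{i}" for i in range(32)]
start = time.perf_counter()
answers = run_tests(questions)
elapsed = time.perf_counter() - start
print(f"{len(answers)} answers in {elapsed:.2f}s")
```

Run sequentially, the 32 simulated calls would take about 3.2 seconds; with 16 workers they complete in roughly two rounds.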

Added a runner for pricing fact checks to assess the level of fact embedding in the latest model

c319c31

alfraser committed on Feb 1

Saved file records to DB. Fixed a print to show the correct test-group name.

3a9dec1

alfraser committed on Jan 31

Updated the testing page to show the request/response pairs

9cec719

alfraser committed on Jan 30

Updated the offline save to save the actual request and response text

34061f5

alfraser committed on Jan 30

Fixed display of architecture name

d6b7bf0

alfraser committed on Jan 30

Logged the start of each architecture invocation so that if one stalls you can see which one it is in the logs

bdc40cf

alfraser committed on Jan 30

Updated to the new URL for model v5

cce95a5

alfraser committed on Jan 29

Trying fine-tuning yet another way - all run, now testing v3 of the model

a48e190

alfraser committed on Jan 29

Added a raw prompt format that simply passes the input through, so I can test a number of different hand-typed examples

92aa543

alfraser committed on Jan 26
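The commit doesn't show the prompt-format code, so this is a hypothetical sketch of a prompt-style registry in which a `raw` style returns the input unchanged. The style names and the instruction template are invented for illustration; rejecting unknown style names up front is one way to surface an invalid default early.

```python
from typing import Callable, Dict

# Hypothetical prompt styles; names and templates are illustrative only
PROMPT_STYLES: Dict[str, Callable[[str], str]] = {
    # Wraps the question in an instruction-style template
    "instruction": lambda q: f"### Instruction:\n{q}\n\n### Response:\n",
    # Passthrough: send hand-typed input to the model exactly as written
    "raw": lambda q: q,
}

def format_prompt(question: str, style: str = "instruction") -> str:
    # Fail loudly on an unknown style instead of silently misformatting
    if style not in PROMPT_STYLES:
        raise ValueError(f"Unknown prompt style: {style!r}")
    return PROMPT_STYLES[style](question)

print(format_prompt("What does the basic plan cost?", style="raw"))
```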

Added another test prompt format

20cff9b

alfraser committed on Jan 26

Tweaked the test generator and updated the tests

ca7e5c7

alfraser committed on Jan 26

Added the test question generator and increased the size of the question bank to 500

59b2aff

alfraser committed on Jan 26

Added a missing comment

bcc302b

alfraser committed on Jan 26

Fixed a bug where, if an architecture had failed entirely and generated no response, loading the TestGroups would crash. The root cause of the missing response still needs fixing, but the failure should be caught gracefully here in any event.

4332953

alfraser committed on Jan 26
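A minimal sketch of tolerating a missing response during loading, assuming each stored record is a dict with hypothetical `architecture`, `request` and `response` keys. The point is that a record from a failed run loads with `response=None` instead of raising and aborting the whole load.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ResultRecord:
    # Hypothetical shape of one stored request/response pair
    architecture: str
    request: str
    response: Optional[str]  # None when the architecture produced nothing

def load_records(raw: List[dict]) -> List[ResultRecord]:
    records = []
    for item in raw:
        records.append(
            ResultRecord(
                architecture=item["architecture"],
                request=item["request"],
                # .get() returns None for a missing key instead of raising
                # KeyError, so one failed run can't crash the whole load
                response=item.get("response"),
            )
        )
    return records

loaded = load_records([
    {"architecture": "rag", "request": "q1", "response": "a1"},
    {"architecture": "fine-tuned", "request": "q1"},  # failed: no response
])
failures = [r for r in loaded if r.response is None]
```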

Refactored TestGroup loading so the JSON load and the DB load share the same, clearer structure

c76e6f5

alfraser committed on Jan 26

Added comments throughout

e912278

alfraser committed on Jan 26

Switched endpoint control to use the writable token, as it was inconsistent with the normal token.

2122072

alfraser committed on Jan 25

Fixed bug with default prompt style not being valid

190ec66

alfraser committed on Jan 25

Configured more architectures, each with a different prompt style, to try to debug the fine-tuning issue

53169ab

alfraser committed on Jan 25

Trying a different prompting style

3991f6c

alfraser committed on Jan 25

Tweaked the training data format to try and fix the issue of the model repeating the question over and over

abcd8a9

alfraser committed on Jan 25

Added loading of test groups from both the DB and the local file, merging the two

1fb12dc

alfraser committed on Jan 25
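A hypothetical sketch of merging the two sources by group ID. The `test_group` key name is assumed, and so is the precedence rule: here a group found in both places keeps its local-file version, on the guess that the file may hold newer runs not yet archived.

```python
from typing import Dict, List

def merge_test_groups(db_groups: List[dict], file_groups: List[dict]) -> Dict[str, dict]:
    # Index by group ID; "test_group" is a hypothetical key name
    merged = {g["test_group"]: g for g in db_groups}
    # Assumed precedence: local-file entries overwrite DB entries with the same ID
    merged.update({g["test_group"]: g for g in file_groups})
    return merged

db = [{"test_group": "A", "source": "db"}, {"test_group": "B", "source": "db"}]
local = [{"test_group": "B", "source": "file"}, {"test_group": "C", "source": "file"}]
groups = merge_test_groups(db, local)
```

Keying by ID makes the merge idempotent: loading the same group from both sources yields one entry rather than a duplicate.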

Added the SQLite DB where I will archive the test results, along with the archiving code

843d9d3

alfraser committed on Jan 25
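The commit doesn't describe the schema, so this is a minimal archiving sketch with Python's built-in `sqlite3` module. The `test_records` table, its columns, and storing each record as JSON text are all hypothetical choices; an in-memory database keeps the example self-contained.

```python
import json
import sqlite3

def archive_results(conn: sqlite3.Connection, group_id: str, records: list) -> None:
    # Hypothetical schema: one row per test record, payload stored as JSON text
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_records ("
        " group_id TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    # Parameterised inserts avoid quoting problems with free-text LLM output
    conn.executemany(
        "INSERT INTO test_records (group_id, payload) VALUES (?, ?)",
        [(group_id, json.dumps(r)) for r in records],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
archive_results(conn, "group-1", [{"q": "a", "r": "b"}, {"q": "c", "r": "d"}])
count = conn.execute(
    "SELECT COUNT(*) FROM test_records WHERE group_id = ?", ("group-1",)
).fetchone()[0]
```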

Added an option to pass the HF Hub token directly when wiping the trace file, so I can use it locally outside of Streamlit. Defaults to None to avoid changing existing behaviour.

3853f7c

alfraser committed on Jan 25

Fixed bugs from the refactor of repo access. Now it should save trace again.

f10615b

alfraser committed on Jan 24

Added the test reporting structure

82130cb

alfraser committed on Jan 24

Refactored to bring common variables together. Also added a utility to get all the trace records as a list of records

8f424fc

alfraser committed on Jan 24

Fixed the time.time bug here. Also added a call to reset the Chroma DB

1cb115b

alfraser committed on Jan 24

Tweaked the way the prompt is formatted going into the LLM query, to avoid the fine-tuned model giving nonsense answers

2022fec

alfraser committed on Jan 24

Flipped the default dataset to be the baseline not the "All products"

a05b15e

alfraser committed on Jan 24

Added ability to wipe the logs from the system status page

c0f0676

alfraser committed on Jan 24

Added utility to serve up the test questions

a732fe2

alfraser committed on Jan 24

Added the comment to the actual save

e47e542

alfraser committed on Jan 24

Added ability to include comments on the saved trace

53697b7

alfraser committed on Jan 24

Fixed typo in train script generation

e64e48c

alfraser committed on Jan 24

Added adapter merging to the fine-tuning script generation

bc56e9e

alfraser committed on Jan 24

Added awareness of the 'failed' status for an endpoint

82150c1

alfraser committed on Jan 23

Removed decimals from the group tag and moved where it is called in the side-by-side flow so the results actually share the same ID

2f008c2

alfraser committed on Jan 23
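A hypothetical sketch of the two parts of this fix, assuming the group tag is derived from `time.time()` (the commit only says decimals were removed): truncate to whole seconds, and generate the tag once before running the architectures rather than once per architecture, so every side-by-side result carries the same ID. The function names are invented for illustration.

```python
import time
from typing import Callable, List, Tuple

def make_group_tag() -> str:
    # int() truncates the fractional seconds returned by time.time()
    return str(int(time.time()))

def run_side_by_side(
    question: str, architectures: List[Callable[[str], str]]
) -> List[Tuple[str, str]]:
    # Create the tag once, outside the loop, so every architecture's
    # result is filed under the same group ID
    tag = make_group_tag()
    return [(tag, arch(question)) for arch in architectures]

# str.upper and str.lower stand in for two architectures
results = run_side_by_side("q", [str.upper, str.lower])
```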

Fixed bug

bd663cd

alfraser committed on Jan 23

Renamed a function because of a name conflict on the server that I don't get locally

59df961

alfraser committed on Jan 23

Added saving of the trace data

745c1f4

alfraser committed on Jan 23

Added the ability to request a writable HF token, in combination with a new environment secret

b927d45

alfraser committed on Jan 23

Fixed a less-than/greater-than bug where I was dropping the wrong reviews to reach a target average review score. Updated the SQL dataset too.

2b08e8f

alfraser committed on Jan 23
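The commit doesn't show how reviews are dropped, so this is a hypothetical greedy sketch assuming numeric ratings. To pull the average down you remove the highest ratings; to push it up you remove the lowest. Swapping those two comparisons is exactly the kind of less-than/greater-than slip that moves the average away from the target.

```python
from statistics import mean
from typing import List

def drop_to_target(ratings: List[float], target: float) -> List[float]:
    """Greedily drop ratings until the average reaches the target."""
    kept = sorted(ratings)  # ascending: kept[0] is lowest, kept[-1] highest
    if not kept:
        return kept
    if mean(kept) > target:
        # Average too high: drop the HIGHEST ratings
        while len(kept) > 1 and mean(kept) > target:
            kept.pop()
    else:
        # Average too low (or already equal): drop the LOWEST ratings
        while len(kept) > 1 and mean(kept) < target:
            kept.pop(0)
    return kept
```

For example, `drop_to_target([1, 2, 3, 4, 5], 4)` returns `[3, 4, 5]`. Because it stops at the first point where the average reaches the target, a large gap between neighbouring ratings can make it overshoot.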

Added the script to shape the data for testing and the associated SQLite DB containing the test data

7e353fe

alfraser committed on Jan 23
