### How to run

* Install the libraries using the cell below (`grazie-api-gateway-client` requires adding a custom JetBrains package repository first)
* Put the production prompt in the file `data/prod_prompt.txt`; the `$diff` placeholder in it is replaced with the commit diff
* Environment variables:
    - `GRAZIE_API_JWT_TOKEN` -- JWT token for grazie (check `api_wrappers/grazie_wrapper.py` to adjust the client initialization if necessary)
    - `HF_TOKEN` -- should _not_ be required; however, if it is, set it to a valid Hugging Face token
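
Before running the cells below, it can help to confirm that the token is actually visible to the notebook kernel. This is a minimal sketch, assuming the variable names listed above; the helper `missing_env_vars` is hypothetical and not part of this repository.

```python
import os

def missing_env_vars(required, env=None):
    """Return the names from `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# GRAZIE_API_JWT_TOKEN is required; HF_TOKEN is normally optional, so it is not checked.
missing = missing_env_vars(["GRAZIE_API_JWT_TOKEN"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

If the list is non-empty, set the variable in the shell that starts Jupyter and restart the kernel so the new value is picked up.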

In [None]:
!pip install grazie-api-gateway-client
!pip install tqdm
!pip install pandas
!pip install datasets

In [20]:
from api_wrappers.grazie_wrapper import generate_for_prompt
from api_wrappers.hf_data_loader import load_full_commit_with_predictions_as_pandas
from tqdm import tqdm

tqdm.pandas()

In [21]:
with open("data/prod_prompt.txt") as f:
	PROD_PROMPT = f.read().strip()

def prod_prompt(diff):
	return PROD_PROMPT.replace("$diff", diff).replace("$text", "")

def generate_commit_message_prod(diff):
	# Return the generated message so callers (e.g. progress_apply) can store it.
	return generate_for_prompt(prod_prompt(diff))

In [None]:
generate_commit_message_prod("TEST")
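
To see which prompt this smoke test sends, the placeholder substitution can be reproduced in isolation. The sketch below assumes a made-up template, since the real content of `data/prod_prompt.txt` is not shown here; only the `$diff` and `$text` placeholders are taken from the code above, and `fill_prompt` is a hypothetical stand-in for `prod_prompt`.

```python
# Hypothetical template; the real one lives in data/prod_prompt.txt.
EXAMPLE_PROMPT = "Write a commit message.\nDiff:\n$diff\nDraft:\n$text"

def fill_prompt(template, diff):
    """Insert the diff and blank out the unused $text placeholder."""
    return template.replace("$diff", diff).replace("$text", "")

print(fill_prompt(EXAMPLE_PROMPT, "TEST"))
```

Because the two `str.replace` calls run one after the other, a diff that itself contains the literal string `$text` would have that substring removed as well.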

In [22]:
DATA = load_full_commit_with_predictions_as_pandas()[["mods", "prediction"]].rename(columns={"mods": "diff", "prediction": "prediction_current"})
DATA.head()

Unnamed: 0,diff,prediction_current
0,"[{'change_type': 'MODIFY', 'old_path': 'cupy/c...",Extend memory management to consider CUDA stre...
1,"[{'change_type': 'MODIFY', 'old_path': 'tests/...",Implement utility methods for parameterized te...
2,"[{'change_type': 'MODIFY', 'old_path': 'numpy/...",Update numpy function imports to use numpy as ...
3,"[{'change_type': 'MODIFY', 'old_path': 'numpy/...",Switch to using internal implementation method...
4,"[{'change_type': 'MODIFY', 'old_path': 'numpy/...",Add type hints and refine array API wrappers\n...


In [None]:
DATA["prediction_prod"] = DATA.progress_apply(lambda row: generate_commit_message_prod(str(row["diff"])), axis=1)

In [23]:
current_avg_length = DATA["prediction_current"].str.len().mean()
print(f"Current average length: {current_avg_length}")

Current average length: 625.5644171779142


In [None]:
prod_avg_length = DATA["prediction_prod"].str.len().mean()
print(f"Prod average length: {prod_avg_length}")

In [None]:
print(f"Length ratio (current / prod): {current_avg_length / prod_avg_length}")
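
The ratio compares mean character counts, so a value above 1 means the current predictions are longer on average than the production ones. Here is a minimal pure-Python sketch of the same computation on toy data. It assumes missing predictions should be skipped, which is similar to how pandas' `.str.len().mean()` treats missing values; `average_length` is a hypothetical helper and the strings are not taken from the dataset.

```python
from statistics import mean

def average_length(messages):
    """Mean character length of the messages, ignoring missing (None) entries."""
    lengths = [len(m) for m in messages if m is not None]
    if not lengths:
        raise ValueError("no messages to average")
    return mean(lengths)

# Toy data for illustration only.
current = ["Add type hints", "Fix memory pool"]
prod = ["Fix bug", None, "Update docs"]

ratio = average_length(current) / average_length(prod)
print(f"Length ratio (current / prod): {ratio:.2f}")
```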