moved to beta 3.4.0
- README.md +106 -65
- handlers/json_handler.py +33 -25
- kudasai.py +3 -8
- models/kijiku.py +42 -84
- modules/common/exceptions.py +1 -0
- modules/common/file_ensurer.py +3 -3
- modules/common/toolkit.py +3 -2
- webgui.py +0 -0
README.md
CHANGED
@@ -1,28 +1,16 @@
- ---
- license: gpl-3.0
- title: Kudasai
- sdk: gradio
- emoji: 🈷️
- python_version: 3.10.0
- app_file: webgui.py
- colorFrom: gray
- colorTo: gray
- short_description: Japanese-English preprocessor with automated translation.
- pinned: true
- ---
-
---------------------------------------------------------------------------------------------------------------------------------------------------
**Table of Contents**

- [Notes](#notes)
- [Dependencies](#dependencies)
- - [Quick Start](#quick-start)
- [Naming Conventions](#naming-conventions)
- [Kairyou](#kairyou)
- [Kaiseki](#kaiseki)
- [Kijiku](#kijiku)
- [Kijiku Settings](#kijiku-settings)
- [Web GUI](#webgui)
- [License](#license)
- [Contact](#contact)

@@ -31,9 +19,9 @@ pinned: true

Windows 10 and Linux Mint are the only tested operating systems, feel free to test on other operating systems and report back to me. I will do my best to fix any issues that arise.

- Python version: 3.

- Used to make (Japanese - English) translation easier by preprocessing the Japanese text (optional auto translation using

Preprocessor is sourced from an external package, which I also designed, called [Kairyou](https://github.com/Bikatr7/Kairyou).

@@ -56,15 +44,44 @@ kairyou==1.4.0

google-generativeai==0.4.0

- ja_core_news_lg @ https://github.com/explosion/spacy-models/releases/download/ja_core_news_lg-3.7.0/ja_core_news_lg-3.7.0-py3-none-any.whl#sha256=f08eecb4d40523045c9478ce59a67564fd71edd215f32c076fa91dc1f05cc7fd
-
or see requirements.txt

---------------------------------------------------------------------------------------------------------------------------------------------------
**Quick Start**<a name="quick-start"></a>

Windows is assumed for the rest of this README, but the process should be similar for Linux.

Simply run Kudasai.py, enter a txt file path to the text you wish to translate, and then insert a replacement json file path if you wish to use one. If you do not wish to use a replacement json file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.

Kudasai will offer to index the text, which is useful for finding new names to add to the replacement json file. This is optional and can be skipped.
@@ -81,19 +98,6 @@ Follow the prompts from there and you should be good to go, results will be stor

If you have any questions, comments, or concerns, please feel free to open an issue.

- ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Naming Conventions**<a name="naming-conventions"></a>
-
- kudasai.py - Main script - ください - Please
-
- Kairyou - Preprocessing Package - 改良 - Reform
-
- kaiseki.py - DeepL translation module - 解析 - Parsing
-
- kijiku.py - OpenAI translation module - 基軸 - Foundation
-
- Kudasai gets it's original name idea from it's inspiration, Atreyagaurav's Onegai. Which also means please. You can find that [here](https://github.com/Atreyagaurav/onegai)
-
---------------------------------------------------------------------------------------------------------------------------------------------------

**Kairyou**<a name="kairyou"></a>
@@ -157,7 +161,7 @@ Kaiseki is the DeepL translation module, it is used to translate Japanese to Eng

Kaiseki is effectively deprecated and is only maintained. Do not expect any updates to it anytime soon other than bug fixes or compatibility updates.

- Please note an API key is required for Kaiseki to work, you can get one here

It is free under 500k characters per month.

@@ -171,15 +175,17 @@ Kaiseki will store your obfuscated api key locally under KudasaiSecrets under %A

**Kijiku**<a name="kijiku"></a>

- Kijiku is the

- You also need an api key for Kijiku to work

-

Kijiku is vastly more complicated and has a lot of steps, so let's go over them.

- Provided you accept the prompt and choose '2' to run Kijiku, you will be prompted to enter your

You will be prompted if you'd like to change these settings, if you choose to do so, you'll be asked for which setting you'd like to change, and what to change it to, until you choose to exit. Multiple things can be done in this menu, so follow the prompts. If you want to change anything about the settings, you do it here.

@@ -187,7 +193,7 @@ You can also choose to upload your own settings file in the settings change menu

You can change your api key right after this step if you wish.

- After that you will be shown an estimated cost of translation, this is based on the number of tokens in the preprocessed text as determined by tiktoken. Kijiku will then prompt for confirmation, run, and translate the preprocessed text and no other input is required.

Your translated text will be stored in the output folder in the same directory as kudasai.py.

@@ -199,46 +205,73 @@ Also note that Kijiku's settings are somewhat complex, please see the section be

(Fairly technical, can be abstracted away by using default settings or someone else's settings file.)

-
----------------------------------------------------------------------------------
- …
----------------------------------------------------------------------------------
- stream, logit_bias, stop and n are included for legacy purposes, current versions of Kudasai will hardcode their values when validating the Kijiku_rule.json to their default values.

---------------------------------------------------------------------------------------------------------------------------------------------------

@@ -251,24 +284,32 @@ To run the Web GUI, simply run webgui.py in the same directory as kudasai.py

Below are some images of the Web GUI.

Indexing | Kairyou:
- ![Indexing Screen | Kairyou](https://i.imgur.com/

Preprocessing | Kairyou:
- ![Preprocessing Screen | Kairyou](https://i.imgur.com/

Translation | Kaiseki:
- ![Translation Screen | Kaiseki](https://i.imgur.com/

Translation | Kijiku:
- ![Translation Screen | Kijiku](https://i.imgur.com/

Kijiku Settings:
- ![Kijiku Settings](https://i.imgur.com/

Logging:
- ![Logging](https://i.imgur.com/

-

---------------------------------------------------------------------------------------------------------------------------------------------------
**License**<a name="license"></a>
@@ -288,4 +329,4 @@ For any bugs or suggestions please use the issues tab [here](https://github.com/

Once again, I actively encourage and welcome any feedback on this project.

- ---------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------
**Table of Contents**

- [Notes](#notes)
- [Dependencies](#dependencies)
- [Naming Conventions](#naming-conventions)
+ - [Quick Start](#quick-start)
- [Kairyou](#kairyou)
- [Kaiseki](#kaiseki)
- [Kijiku](#kijiku)
- [Kijiku Settings](#kijiku-settings)
- [Web GUI](#webgui)
+ - [Hugging Face](#huggingface)
- [License](#license)
- [Contact](#contact)


Windows 10 and Linux Mint are the only tested operating systems, feel free to test on other operating systems and report back to me. I will do my best to fix any issues that arise.

+ Python version: 3.10+

+ Used to make (Japanese - English) translation easier by preprocessing the Japanese text (optional auto translation using DeepL, Gemini, and OpenAI APIs).

Preprocessor is sourced from an external package, which I also designed, called [Kairyou](https://github.com/Bikatr7/Kairyou).

google-generativeai==0.4.0

or see requirements.txt

+ Also requires spacy's ja_core_news_lg model, which can be installed via the following command:
+
+ ```bash
+ python -m spacy download ja_core_news_lg
+ ```
+
+ or on Linux
+
+ ```bash
+ python3 -m spacy download ja_core_news_lg
+ ```
+
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
+ **Naming Conventions**<a name="naming-conventions"></a>
+
+ kudasai.py - Main script - ください - Please
+
+ Kairyou - Preprocessing Package - 改良 - Reform
+
+ kaiseki.py - DeepL translation module - 解析 - Parsing
+
+ kijiku.py - OpenAI translation module - 基軸 - Foundation
+
+ Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means please. You can find that [here](https://github.com/Atreyagaurav/onegai).
+
---------------------------------------------------------------------------------------------------------------------------------------------------
**Quick Start**<a name="quick-start"></a>

Windows is assumed for the rest of this README, but the process should be similar for Linux.

+ Due to PyPI limitations, you need to install spacy's JP model manually; it cannot be included automatically because it is a direct dependency link, which PyPI does not support. Make sure you do this after installing the requirements.txt file.
+
+ ```bash
+ python -m spacy download ja_core_news_lg
+ ```
+
Simply run Kudasai.py, enter a txt file path to the text you wish to translate, and then insert a replacement json file path if you wish to use one. If you do not wish to use a replacement json file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.

Kudasai will offer to index the text, which is useful for finding new names to add to the replacement json file. This is optional and can be skipped.

If you have any questions, comments, or concerns, please feel free to open an issue.

---------------------------------------------------------------------------------------------------------------------------------------------------

**Kairyou**<a name="kairyou"></a>

Kaiseki is effectively deprecated and is only maintained. Do not expect any updates to it anytime soon other than bug fixes or compatibility updates.

+ Please note an API key is required for Kaiseki to work, you can get one [here](https://www.deepl.com/pro#developer).

It is free under 500k characters per month.


**Kijiku**<a name="kijiku"></a>

+ Kijiku is the LLM translation module; it is used to translate Japanese to English. It is very accurate and is the recommended translation module.

+ You also need an api key for Kijiku to work.

+ You can get one for OpenAI [here](https://platform.openai.com/)
+
+ For Gemini it is a bit more complicated: you'll need to make a Google Cloud project, enable the Vertex AI API, and then create an api key. Gemini is free under 60 requests at once as of Kudasai v3.4.0.

Kijiku is vastly more complicated and has a lot of steps, so let's go over them.

+ Provided you accept the prompt and choose '2' to run Kijiku, you will be prompted to choose an LLM and then to enter your api key. Provided all goes well, Kijiku will attempt to load its settings from KudasaiConfig; if it cannot find them, it will create them. Kijiku will store your obfuscated api key locally under KudasaiSecrets under %APPDATA% or ~/.config/ depending on your OS.

You will be prompted if you'd like to change these settings, if you choose to do so, you'll be asked for which setting you'd like to change, and what to change it to, until you choose to exit. Multiple things can be done in this menu, so follow the prompts. If you want to change anything about the settings, you do it here.

You can change your api key right after this step if you wish.

+ After that you will be shown an estimated cost of translation, based on the number of tokens in the preprocessed text as determined by tiktoken for OpenAI and by Google for Gemini. Kijiku will then prompt for confirmation, run, and translate the preprocessed text; no other input is required.

Your translated text will be stored in the output folder in the same directory as kudasai.py.

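The pre-flight cost estimate described above can be sketched roughly as follows. This is an illustration only: it uses the common ~4-characters-per-token heuristic rather than Kudasai's actual tiktoken-based count, and the per-1K-token price is a made-up placeholder, not a real model price.

```python
## Rough sketch of a pre-flight cost estimate (illustrative only).
## Real token counts come from tiktoken (OpenAI) or Google's tokenizer
## (Gemini); here we approximate with ~4 characters per token.

def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    estimated_tokens = len(text) / 4  ## crude heuristic, not exact
    return (estimated_tokens / 1000) * price_per_1k_tokens

## e.g. 8,000 characters at a hypothetical $0.002 per 1K tokens
cost = estimate_cost("a" * 8000, 0.002)
```

Showing the user this figure before sending anything lets them bail out before spending money, which is why Kijiku asks for confirmation at this point.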

(Fairly technical, can be abstracted away by using default settings or someone else's settings file.)

+ ----------------------------------------------------------------------------------
+ Kijiku Settings:
+
+ prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message. 2 means it'll be treated as a user message. 1 is recommended for gpt-4, otherwise either works. For Gemini, this setting is ignored.
+
+ number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost effective, but other complications may occur with higher line counts. So far it has been tested up to 48.
+
+ sentence_fragmenter_mode : 1 or 2. (1 - via regex and other nonsense; 2 - None, takes formatting and text directly from the API return.) The API can sometimes return a result on a single line, so this determines the way Kijiku fragments the sentences, if at all. Use 2 for newer models.
+
+ je_check_mode : 1 or 2. 1 will print out the Japanese, then the English below, separated by ---; 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English. If it cannot, it will default to 1. Use 2 for newer models.
+
+ number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kijiku will attempt to mend a malformed batch (mending is resending the request), only for gpt4. Be careful with increasing this, as cost increases at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.
+
+ batch_retry_timeout : How long Kijiku will try to translate a batch in seconds; if a request exceeds this duration, Kijiku will leave it untranslated.

+ number_of_concurrent_batches : How many translation batches Kijiku will send to the translation API at a time. For OpenAI, be conservative as rate-limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 60.
----------------------------------------------------------------------------------
+ Open AI Settings:
+ See https://platform.openai.com/docs/api-reference/chat/create for further details
+ ----------------------------------------------------------------------------------
+ openai_model : ID of the model to use. Kijiku only works with 'chat' models.
+
+ openai_system_message : Instructions to the model. Basically tells the model how to translate.

+ openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.

+ openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

+ openai_n : How many chat completion choices to generate for each input message. Do not change this.

+ openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this.

+ openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this.

+ openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this.

+ openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change it to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.

+ openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics, while negative values encourage repetition. Should leave this at 0.0.

+ openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
+ ----------------------------------------------------------------------------------
+ openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the Kijiku_rule.json, as different values for these settings have no use case in Kudasai's current implementation.
+ ----------------------------------------------------------------------------------
+ Gemini Settings:
+ See https://ai.google.dev/docs/concepts#model-parameters for further details
+ ----------------------------------------------------------------------------------
+ gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 model and its aliases.

+ gemini_prompt : Instructions to the model. Basically tells the model how to translate.

+ gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.

+ gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

+ gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, a lower value makes the output more deterministic.

+ gemini_candidate_count : The number of candidates to generate for each input message. Do not change this.

+ gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this.

+ gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this.

+ gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change it to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
+ ----------------------------------------------------------------------------------
+ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the Kijiku_rule.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------
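The settings documented above live in Kijiku_rule.json. A minimal illustrative fragment of the "base kijiku settings" section, built and round-tripped in Python, is sketched below; the values shown are plausible placeholders for illustration, not Kudasai's authoritative defaults.

```python
## Illustrative sketch of a "base kijiku settings" fragment using the keys
## documented above. Values are placeholders, NOT Kudasai's shipped defaults.
import json

base_kijiku_settings = {
    "prompt_assembly_mode": 1,
    "number_of_lines_per_batch": 36,
    "sentence_fragmenter_mode": 2,
    "je_check_mode": 2,
    "number_of_malformed_batch_retries": 1,
    "batch_retry_timeout": 300,
    "number_of_concurrent_batches": 5,
}

## round-trip through JSON, as Kudasai persists these rules to Kijiku_rule.json
serialized = json.dumps({"base kijiku settings": base_kijiku_settings})
restored = json.loads(serialized)["base kijiku settings"]
```

Keeping the rules as a flat JSON object is what allows the validation pass in handlers/json_handler.py to check each key independently.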

---------------------------------------------------------------------------------------------------------------------------------------------------

Below are some images of the Web GUI.

Indexing | Kairyou:
+ ![Indexing Screen | Kairyou](https://i.imgur.com/0a2mzOI.png)

Preprocessing | Kairyou:
+ ![Preprocessing Screen | Kairyou](https://i.imgur.com/2pt06gC.png)

Translation | Kaiseki:
+ ![Translation Screen | Kaiseki](https://i.imgur.com/X98JYsp.png)

Translation | Kijiku:
+ ![Translation Screen | Kijiku](https://i.imgur.com/X6IxyL8.png)

Kijiku Settings:
+ ![Kijiku Settings](https://i.imgur.com/VX0fGd5.png)

Logging:
+ ![Logging](https://i.imgur.com/IkUjpXR.png)

+ ---------------------------------------------------------------------------------------------------------------------------------------------------
+
+ **Hugging Face**<a name="huggingface"></a>
+
+ For those who are interested, or simply cannot run Kudasai locally, an instance of Kudasai's WebGUI is hosted on Hugging Face's servers. You can find it [here](https://huggingface.co/spaces/Bikatr7/Kudasai).

---------------------------------------------------------------------------------------------------------------------------------------------------
**License**<a name="license"></a>

Once again, I actively encourage and welcome any feedback on this project.

+ ---------------------------------------------------------------------------------------------------------------------------------------------------
handlers/json_handler.py
CHANGED
@@ -137,27 +137,27 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
]

validation_rules = {
- "prompt_assembly_mode": lambda x: 1 <= x <= 2,
- "number_of_lines_per_batch": lambda x:
- "sentence_fragmenter_mode": lambda x: 1 <= x <= 2,
- "je_check_mode": lambda x: 1 <= x <= 2,
- "number_of_malformed_batch_retries": lambda x:
- "batch_retry_timeout": lambda x:
- "number_of_concurrent_batches": lambda x:
- "openai_model": lambda x: x in FileEnsurer.ALLOWED_OPENAI_MODELS,
"openai_system_message": lambda x: x not in ["", "None", None],
- "openai_temperature": lambda x: 0 <= x <= 2,
- "openai_top_p": lambda x: 0 <= x <= 1,
- "openai_max_tokens": lambda x: x is None or isinstance(x, int),
- "openai_presence_penalty": lambda x: -2 <= x <= 2,
- "gemini_model": lambda x: x in FileEnsurer.ALLOWED_GEMINI_MODELS,
"gemini_prompt": lambda x: x not in ["", "None", None],
- "gemini_temperature": lambda x: 0 <= x <= 2,
- "gemini_top_p": lambda x: x is None or 0 <= x <= 2,
- "gemini_top_k": lambda x: x is None or isinstance(x, int) and x >= 0,
"gemini_max_output_tokens": lambda x: x is None or isinstance(x, int),
}
-
try:
## ensure categories are present
assert "base kijiku settings" in JsonHandler.current_kijiku_rules

@@ -197,6 +197,7 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

except Exception as e:
Logger.log_action("Kijiku Rules.json is not valid, setting to invalid_placeholder, current:")
Logger.log_action(str(JsonHandler.current_kijiku_rules))
JsonHandler.current_kijiku_rules = FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER

@@ -342,7 +343,7 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

Parameters:
setting_name (str) : The name of the setting to convert.
-

Returns:
(typing.Any) : The converted value.

@@ -367,18 +368,18 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

"openai_stream": {"type": bool, "constraints": lambda x: x is False},
"openai_stop": {"type": None, "constraints": lambda x: x is None},
"openai_logit_bias": {"type": None, "constraints": lambda x: x is None},
- "openai_max_tokens": {"type":
"openai_presence_penalty": {"type": float, "constraints": lambda x: -2 <= x <= 2},
"openai_frequency_penalty": {"type": float, "constraints": lambda x: -2 <= x <= 2},
"gemini_model": {"type": str, "constraints": lambda x: x in FileEnsurer.ALLOWED_GEMINI_MODELS},
"gemini_prompt": {"type": str, "constraints": lambda x: x not in ["", "None", None]},
"gemini_temperature": {"type": float, "constraints": lambda x: 0 <= x <= 2},
- "gemini_top_p": {"type":
- "gemini_top_k": {"type":
"gemini_candidate_count": {"type": int, "constraints": lambda x: x == 1},
"gemini_stream": {"type": bool, "constraints": lambda x: x is False},
"gemini_stop_sequences": {"type": None, "constraints": lambda x: x is None},
- "gemini_max_output_tokens": {"type":
}

if(setting_name not in type_expectations):

@@ -394,11 +395,18 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

if(setting_info["type"] is None):
converted_value = None
-
-
converted_value = None
-
converted_value = int(value)
else:
converted_value = setting_info["type"](value)

]

validation_rules = {
+ "prompt_assembly_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
+ "number_of_lines_per_batch": lambda x: isinstance(x, int) and x > 0,
+ "sentence_fragmenter_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
+ "je_check_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
+ "number_of_malformed_batch_retries": lambda x: isinstance(x, int) and x >= 0,
+ "batch_retry_timeout": lambda x: isinstance(x, int) and x >= 0,
+ "number_of_concurrent_batches": lambda x: isinstance(x, int) and x >= 0,
+ "openai_model": lambda x: isinstance(x, str) and x in FileEnsurer.ALLOWED_OPENAI_MODELS,
"openai_system_message": lambda x: x not in ["", "None", None],
+ "openai_temperature": lambda x: isinstance(x, float) and 0 <= x <= 2,
+ "openai_top_p": lambda x: isinstance(x, float) and 0 <= x <= 1,
+ "openai_max_tokens": lambda x: x is None or isinstance(x, int) and x > 0,
+ "openai_presence_penalty": lambda x: isinstance(x, float) and -2 <= x <= 2,
+ "gemini_model": lambda x: isinstance(x, str) and x in FileEnsurer.ALLOWED_GEMINI_MODELS,
"gemini_prompt": lambda x: x not in ["", "None", None],
+ "gemini_temperature": lambda x: isinstance(x, float) and 0 <= x <= 2,
+ "gemini_top_p": lambda x: x is None or (isinstance(x, float) and 0 <= x <= 2),
+ "gemini_top_k": lambda x: x is None or (isinstance(x, int) and x >= 0),
"gemini_max_output_tokens": lambda x: x is None or isinstance(x, int),
}
+
try:
## ensure categories are present
assert "base kijiku settings" in JsonHandler.current_kijiku_rules

except Exception as e:
Logger.log_action("Kijiku Rules.json is not valid, setting to invalid_placeholder, current:")
+ Logger.log_action("Reason: " + str(e))
Logger.log_action(str(JsonHandler.current_kijiku_rules))
JsonHandler.current_kijiku_rules = FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER

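The dict-of-lambdas pattern used by validation_rules can be exercised in isolation; a minimal sketch with only two illustrative rules rather than Kudasai's full set, and a hypothetical helper name:

```python
## Minimal sketch of dict-of-lambdas validation, mirroring the
## validation_rules idea above (rules and helper name are illustrative).
validation_rules = {
    "prompt_assembly_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
    "openai_temperature": lambda x: isinstance(x, float) and 0 <= x <= 2,
}

def find_invalid(settings: dict) -> list:
    ## return the names of settings that are present but fail their rule
    return [name for name, rule in validation_rules.items()
            if name in settings and not rule(settings[name])]

bad = find_invalid({"prompt_assembly_mode": 3, "openai_temperature": 0.5})
```

Each rule is a pure predicate, so a single failing key can be reported without aborting the whole check.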
|
|
343 |
|
344 |
Parameters:
|
345 |
setting_name (str) : The name of the setting to convert.
|
346 |
+
initial_value (str) : The initial value to convert.
|
347 |
|
348 |
Returns:
|
349 |
(typing.Any) : The converted value.
|
|
|
368 |
"openai_stream": {"type": bool, "constraints": lambda x: x is False},
|
369 |
"openai_stop": {"type": None, "constraints": lambda x: x is None},
|
370 |
"openai_logit_bias": {"type": None, "constraints": lambda x: x is None},
|
371 |
+
"openai_max_tokens": {"type": int, "constraints": lambda x: x is None or isinstance(x, int)},
|
372 |
"openai_presence_penalty": {"type": float, "constraints": lambda x: -2 <= x <= 2},
|
373 |
"openai_frequency_penalty": {"type": float, "constraints": lambda x: -2 <= x <= 2},
|
374 |
"gemini_model": {"type": str, "constraints": lambda x: x in FileEnsurer.ALLOWED_GEMINI_MODELS},
|
375 |
"gemini_prompt": {"type": str, "constraints": lambda x: x not in ["", "None", None]},
|
376 |
"gemini_temperature": {"type": float, "constraints": lambda x: 0 <= x <= 2},
|
377 |
+
"gemini_top_p": {"type": float, "constraints": lambda x: x is None or (isinstance(x, float) and 0 <= x <= 2)},
|
378 |
+
"gemini_top_k": {"type": int, "constraints": lambda x: x is None or x >= 0},
|
379 |
"gemini_candidate_count": {"type": int, "constraints": lambda x: x == 1},
|
380 |
"gemini_stream": {"type": bool, "constraints": lambda x: x is False},
|
381 |
"gemini_stop_sequences": {"type": None, "constraints": lambda x: x is None},
|
382 |
+
"gemini_max_output_tokens": {"type": int, "constraints": lambda x: x is None or isinstance(x, int)},
|
383 |
}
|
384 |
|
385 |
if(setting_name not in type_expectations):
|
|
|
395 |
|
396 |
if(setting_info["type"] is None):
|
397 |
converted_value = None
|
398 |
+
|
399 |
+
elif(setting_info["type"] == int) or (setting_info["type"] == float):
|
400 |
+
|
401 |
+
if(value is None or value is ''):
|
402 |
converted_value = None
|
403 |
+
|
404 |
+
elif(setting_info["type"] == int):
|
405 |
converted_value = int(value)
|
406 |
+
|
407 |
+
else:
|
408 |
+
converted_value = float(value)
|
409 |
+
|
410 |
else:
|
411 |
converted_value = setting_info["type"](value)
|
412 |
|
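The table-driven conversion above can be sketched in isolation. This is a simplified stand-in, not the real `JsonHandler`: the table is cut down to three entries and the surrounding class is omitted, but the type/constraint shape and the None/empty-string handling mirror the diff.

```python
import typing

## cut-down sketch of the type_expectations table from json_handler.py
type_expectations: dict = {
    "openai_temperature": {"type": float, "constraints": lambda x: 0 <= x <= 2},
    "gemini_top_k": {"type": int, "constraints": lambda x: x is None or x >= 0},
    "openai_stop": {"type": None, "constraints": lambda x: x is None},
}

def convert_to_correct_type(setting_name: str, value: typing.Any) -> typing.Any:

    info = type_expectations[setting_name]

    if(info["type"] is None):
        converted_value = None

    elif(info["type"] == int) or (info["type"] == float):

        ## numeric settings treat None and the empty string as "unset"
        if(value is None or value == ''):
            converted_value = None

        elif(info["type"] == int):
            converted_value = int(value)

        else:
            converted_value = float(value)

    else:
        converted_value = info["type"](value)

    if(not info["constraints"](converted_value)):
        raise ValueError(setting_name + " failed validation")

    return converted_value
```

The point of the table is that adding a new setting (as this commit does for `gemini_top_k` and friends) only requires a new entry, not new conversion code.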
kudasai.py
CHANGED
@@ -55,16 +55,11 @@ class Kudasai:

        FileEnsurer.setup_needed_files()

-       Logger.clear_log_file()
-
        Logger.log_barrier()
        Logger.log_action("Kudasai started")
        Logger.log_action("Current version: " + Toolkit.CURRENT_VERSION)
        Logger.log_barrier()

-       Logger.push_batch()
-       Logger.clear_batch()
-
        try:

            with open(FileEnsurer.config_kijiku_rules_path, "r") as kijiku_rules_file:

@@ -316,11 +311,11 @@ async def main() -> None:

    """

-   Kudasai.boot()
-   Toolkit.clear_console()
-
    try:

+       Kudasai.boot()
+       Toolkit.clear_console()
+
        if(len(sys.argv) <= 1):
            await run_console_version()

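The second hunk moves `Kudasai.boot()` inside the `try` block of `main()`. The effect can be sketched with stand-in functions (the names and the error message below are illustrative, not the real Kudasai code): a failure during boot now reaches the same handler as runtime errors instead of crashing before any handler applies.

```python
## stand-in for boot-time work that can fail, e.g. reading a config file
def boot() -> None:
    raise RuntimeError("Kijiku Rules.json is missing")

def main() -> str:
    try:
        ## previously this ran before the try block, so the error propagated uncaught
        boot()
        return "running"
    except Exception as e:
        return "handled: " + str(e)
```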
models/kijiku.py
CHANGED
@@ -19,7 +19,7 @@ from handlers.json_handler import JsonHandler
 from modules.common.file_ensurer import FileEnsurer
 from modules.common.logger import Logger
 from modules.common.toolkit import Toolkit
-from modules.common.exceptions import AuthenticationError, MaxBatchDurationExceededException, AuthenticationError, InternalServerError, RateLimitError, APITimeoutError
+from modules.common.exceptions import AuthenticationError, MaxBatchDurationExceededException, AuthenticationError, InternalServerError, RateLimitError, APITimeoutError, GoogleAuthError
 from modules.common.decorators import permission_error_decorator

 from custom_classes.messages import SystemTranslationMessage, ModelTranslationMessage, Message

@@ -273,7 +273,7 @@ class Kijiku:
            FileEnsurer.standard_overwrite_file(api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)

        ## if invalid key exit
-       except AuthenticationError:
+       except (GoogleAuthError, AuthenticationError):

            Toolkit.clear_console()

@@ -390,8 +390,11 @@

        """

+
+       Logger.log_barrier()
+       Logger.log_action("Kijiku Activated, LLM Type : " + Kijiku.LLM_TYPE)
        Logger.log_barrier()
-       Logger.log_action("
+       Logger.log_action("Settings are as follows : ")
        Logger.log_barrier()

        JsonHandler.print_kijiku_rules()

@@ -448,13 +451,9 @@
        Logger.log_action("Starting Prompt Building")
        Logger.log_barrier()

-       if(Kijiku.LLM_TYPE == "openai"):
-           Kijiku.build_openai_translation_batches()
-           model = OpenAIService.model

-       else:
-           Kijiku.build_gemini_translation_batches()
-           model = GeminiService.model
+       Kijiku.build_translation_batches()
+
+       model = OpenAIService.model if Kijiku.LLM_TYPE == "openai" else GeminiService.model

        await Kijiku.handle_cost_estimate_prompt(model, omit_prompt=is_webgui)

@@ -549,10 +548,10 @@

        sentence = Kijiku.text_to_translate[index]
        stripped_sentence = sentence.strip()
-
+       lowercase_sentence = sentence.lower()

        has_quotes = any(char in sentence for char in ["「", "」", "『", "』", "【", "】", "\"", "'"])
-       is_part_in_sentence = "part" in sentence.lower()
+       is_part_in_sentence = "part" in lowercase_sentence

        if(len(prompt) < Kijiku.number_of_lines_per_batch):

@@ -581,78 +580,38 @@

        return prompt, index

-   ##-------------------start-of-build_openai_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-   @staticmethod
-   def build_openai_translation_batches() -> None:
-
-       """
-
-       Builds translations batches dict for the OpenAI service.
-
-       """
-
-       i = 0
-
-       while i < len(Kijiku.text_to_translate):
-           batch, i = Kijiku.generate_text_to_translate_batches(i)
-
-           batch = ''.join(batch)
-
-           if(Kijiku.prompt_assembly_mode == 1):
-               system_msg = SystemTranslationMessage(content=str(OpenAIService.system_message))
-
-           ## while message mode two structures the first message as a model message and the second message as a model message too, typically used for non-gpt-4 models if at all
-           else:
-               system_msg = ModelTranslationMessage(content=str(OpenAIService.system_message))
-
-           Kijiku.openai_translation_batches.append(system_msg)
-
-           model_msg = ModelTranslationMessage(content=batch)
-
-           Kijiku.openai_translation_batches.append(model_msg)
-
-       Logger.log_barrier()
-       Logger.log_action("Built Messages : ")
-       Logger.log_barrier()
-
-       i = 0
-
-       for message in Kijiku.openai_translation_batches:
-
-           i+=1
-
-           Logger.log_action(str(message))
-
-       else:
-
-           Logger.log_barrier()
-
-   ##-------------------start-of-build_gemini_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-   @staticmethod
-   def build_gemini_translation_batches() -> None:
-
-       """
-
-       Builds translations batches dict for the Gemini service.
-
-       """
-
-       i = 0
-
-       while i < len(Kijiku.text_to_translate):
-           batch, i = Kijiku.generate_text_to_translate_batches(i)
-
-           batch = ''.join(batch)
-
-           ## Gemini does not use system messages or model messages, and instead just takes a string input, so we just need to place the prompt before the text to be translated
-           Kijiku.gemini_translation_batches.append(GeminiService.prompt)
-           Kijiku.gemini_translation_batches.append(batch)
-
-       Logger.log_barrier()
-       Logger.log_action("Built Messages : ")
-       Logger.log_barrier()
-
-       i = 0
-
-       for message in Kijiku.gemini_translation_batches:
-
-           i+=1
-
-           Logger.log_action(str(message))
-
-       else:
-
-           Logger.log_barrier()
+   ##-------------------start-of-build_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+   @staticmethod
+   def build_translation_batches() -> None:
+
+       """
+
+       Builds translations batches dict for the specified service.
+
+       """
+
+       i = 0
+
+       while i < len(Kijiku.text_to_translate):
+
+           batch, i = Kijiku.generate_text_to_translate_batches(i)
+           batch = ''.join(batch)
+
+           if(Kijiku.LLM_TYPE == 'openai'):
+
+               if(Kijiku.prompt_assembly_mode == 1):
+                   system_msg = SystemTranslationMessage(content=str(OpenAIService.system_message))
+               else:
+                   system_msg = ModelTranslationMessage(content=str(OpenAIService.system_message))
+
+               Kijiku.openai_translation_batches.append(system_msg)
+               model_msg = ModelTranslationMessage(content=batch)
+               Kijiku.openai_translation_batches.append(model_msg)
+
+           else:
+               Kijiku.gemini_translation_batches.append(GeminiService.prompt)
+               Kijiku.gemini_translation_batches.append(batch)
+
+       Logger.log_barrier()
+       Logger.log_action("Built Messages : ")
+       Logger.log_barrier()
+
+       i = 0
+
+       for message in (Kijiku.openai_translation_batches if Kijiku.LLM_TYPE == 'openai' else Kijiku.gemini_translation_batches):
+
+           i+=1
+
+           message = str(message) if Kijiku.LLM_TYPE == 'gemini' else message.content # type: ignore
+
+           if(i % 2 == 1):
+               Logger.log_barrier()
+
+           Logger.log_action(message)

    ##-------------------start-of-estimate_cost()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    @staticmethod

@@ -850,7 +807,7 @@
        Logger.log_barrier()

        if(Kijiku.LLM_TYPE == "gemini"):
-
+           Logger.log_action(f"As of Kudasai {Toolkit.CURRENT_VERSION}, Gemini Pro is Free to use", output=True, omit_timestamp=True)

        Logger.log_action("Estimated number of tokens : " + str(num_tokens), output=True, omit_timestamp=True)
        Logger.log_action("Estimated minimum cost : " + str(min_cost) + " USD", output=True, omit_timestamp=True)

@@ -873,19 +830,20 @@
    async def handle_translation(model:str, index:int, length:int, translation_instructions:typing.Union[str, Message], translation_prompt:typing.Union[str, Message]) -> tuple[int, typing.Union[str, Message], str]:

        """
-
+
+       Handles the translation requests for the specified service.

        Parameters:
-       model (string) :
-       index (int) :
-       length (int) :
-       translation_instructions (typing.Union[str, Message]) :
-       translation_prompt (typing.Union[str, Message]) :
+       model (string) : The model of the service used to translate the text.
+       index (int) : The index of the translation batch.
+       length (int) : The length of the translation batch.
+       translation_instructions (typing.Union[str, Message]) : The translation instructions.
+       translation_prompt (typing.Union[str, Message]) : The translation prompt.

        Returns:
-       index (int) :
-       translation_prompt (typing.Union[str, Message]) :
-       translated_message (str) :
+       index (int) : The index of the translation batch.
+       translation_prompt (typing.Union[str, Message]) : The translation prompt.
+       translated_message (str) : The translated message.

        """

@@ -1044,7 +1002,7 @@

        """

-       Fixes the J->E text to be more j-e
+       Fixes the J->E text to be more j-e checker friendly.

        Note that fix_je() is not always accurate, and may use standard j-e formatting instead of the corrected formatting.
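The core of the merged `build_translation_batches()` can be condensed to a few lines: one loop slices the text into fixed-size batches and emits an instructions message followed by each batch. The real method wraps these in message classes and branches on `Kijiku.LLM_TYPE`; this stand-in returns plain strings so the batching logic itself is visible.

```python
## condensed stand-in for the unified batch builder in kijiku.py
def build_translation_batches(lines: list, lines_per_batch: int, prompt: str) -> list:
    batches = []
    i = 0
    while i < len(lines):
        batch = ''.join(lines[i:i + lines_per_batch])
        i += lines_per_batch
        batches.append(prompt)  ## instructions precede each text batch
        batches.append(batch)
    return batches
```

Because the output alternates instructions/batch pairs, the logging loop in the diff can use `i % 2` to place a barrier before each pair.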
modules/common/exceptions.py
CHANGED
@@ -3,6 +3,7 @@
 ## for importing, other scripts will use from common.exceptions instead of from the third-party libraries themselves
 from openai import AuthenticationError, InternalServerError, RateLimitError, APITimeoutError
 from deepl.exceptions import AuthorizationException, QuotaExceededException
+from google.auth.exceptions import GoogleAuthError

 ##-------------------start-of-MaxBatchDurationExceededException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

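This is why `GoogleAuthError` joins the shared exceptions module: callers import every provider's failure type from one place and catch them as a tuple, as the kijiku.py hunk does with `except (GoogleAuthError, AuthenticationError)`. The classes below are stand-ins for the real `openai` and `google.auth` exception types, so the sketch runs without either library installed.

```python
class AuthenticationError(Exception):
    """Stand-in for openai.AuthenticationError."""

class GoogleAuthError(Exception):
    """Stand-in for google.auth.exceptions.GoogleAuthError."""

def validate_key(llm_type: str) -> str:
    ## one except clause covers both providers' auth failures
    try:
        if(llm_type == "gemini"):
            raise GoogleAuthError("invalid Gemini key")
        raise AuthenticationError("invalid OpenAI key")
    except (GoogleAuthError, AuthenticationError) as e:
        return "rejected: " + str(e)
```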
modules/common/file_ensurer.py
CHANGED
@@ -80,14 +80,14 @@ class FileEnsurer():
        "openai_model": "gpt-4",
        "openai_system_message": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
        "openai_temperature": 0.3,
-       "openai_top_p": 1,
+       "openai_top_p": 1.0,
        "openai_n": 1,
        "openai_stream": False,
        "openai_stop": None,
        "openai_logit_bias": None,
        "openai_max_tokens": None,
-       "openai_presence_penalty": 0,
-       "openai_frequency_penalty": 0
+       "openai_presence_penalty": 0.0,
+       "openai_frequency_penalty": 0.0
    },

    "gemini settings": {
modules/common/toolkit.py
CHANGED
@@ -7,13 +7,14 @@ import platform
 import subprocess

 class Toolkit():
+
    """

    A class containing various functions that are used throughout Kudasai.

    """

-   CURRENT_VERSION = "v3.4.0-
+   CURRENT_VERSION = "v3.4.0-beta"

 ##-------------------start-of-clear_console()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@@ -228,7 +229,7 @@
        """

        if(is_archival):
-           time_stamp = datetime.now().strftime("%Y-%m-%
+           time_stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

        else:
            time_stamp = "[" + datetime.now().strftime("%Y-%m-%d %H:%M:%S") + "] "
webgui.py
CHANGED
The diff for this file is too large to render. See raw diff.