Bikatr7 committed on
Commit
4bc65dc
1 Parent(s): ed057e0

added everything

README.md CHANGED
@@ -1,141 +1,229 @@
- ---
- license: gpl-3.0
- title: Kudasai
- sdk: gradio
- emoji: 🈷️
- python_version: 3.10.0
- app_file: webgui.py
- colorFrom: gray
- colorTo: gray
- short_description: Japanese-English preprocessor with automated translation.
- pinned: true
- ---
-
  ---------------------------------------------------------------------------------------------------------------------------------------------------
**Table of Contents**

- - [Notes](#notes)
- - [Naming Conventions](#naming-conventions)
- - [General Usage](#general-usage)
- - [Indexing and Preprocessing](#kairyou)
- - [Translation with DeepL](#kaiseki)
- - [Translation with LLMs](#kijiku)
- - [Translation with LLMs Settings](#translation-with-llms-settings)
- - [Web GUI](#webgui)
- - [License](#license)
- - [Contact](#contact)

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Notes**<a name="notes"></a>

- This README is for the Hugging Face Space instance of Kudasai's WebGUI and for the WebGUI itself. To run Kudasai locally or see any other information on the project, please see the [GitHub page](https://github.com/Bikatr7/Kudasai).

Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.

Preprocessor and translation logic is sourced from external packages, which I also designed; see [Kairyou](https://github.com/Bikatr7/Kairyou) and [EasyTL](https://github.com/Bikatr7/easytl) for more information.

- Kudasai has a public Trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on.

- The WebGUI on Hugging Face does not persist anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, the output folder is overwritten every time you run it, and archives are deleted every run as well.

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Naming Conventions**<a name="naming-conventions"></a>

- kudasai.py - Main script - ください - Please

- Kairyou - Preprocessing package - 改良 - Reform

- kaiseki.py - DeepL translation module - 解析 - Parsing

- kijiku.py - OpenAI translation module - 基軸 - Foundation

- Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means "please". You can find that [here](https://github.com/Atreyagaurav/onegai).

  ---------------------------------------------------------------------------------------------------------------------------------------------------
 
- **General Usage**<a name="general-usage"></a>

- Kudasai's WebGUI is fairly easy to understand for general usage; most incorrect actions will be caught by the system, and a message will be displayed telling the user how to correct them.

- Normally, Kudasai would save files to the local system, but on Hugging Face's servers this is not possible. Instead, you'll have to click the 'Save As' button to download the files to your local system.

- Or you can click the copy button at the top right of textbox modals to copy the text to your clipboard.

- For further details, see the chapters below.

- ---------------------------------------------------------------------------------------------------------------------------------------------------

- **Indexing and Preprocessing**<a name="kairyou"></a>

- Indexing is not for everyone; only use it if you have a large amount of previous text and want to flag new names. It can be a very slow and long process, especially on Hugging Face's servers. It's recommended to use a local version of Kudasai for this process.

- You'll need a txt file or some text to index. You'll also need a knowledge base, which can be either a single txt file or a directory of them, as well as a replacements JSON. Either Kudasai or Fukuin type works. See [this](https://github.com/Bikatr7/Kairyou?tab=readme-ov-file#kairyou) for further details on replacement JSONs.

- Please do indexing before preprocessing; the output is neater that way.

- For preprocessing, you'll need a txt file or some text to preprocess. You'll also need a replacements JSON; either Kudasai or Fukuin type works, as with indexing.

- For both, text is put in the textbox modals, with the output text in the first field and the results in the second field.

- They both have a debug field, but neither module really uses it.

---------------------------------------------------------------------------------------------------------------------------------------------------

- **Translation with DeepL**<a name="kaiseki"></a>

- DeepL is a paid service, so you'll need an API key to use it. You can get one [here](https://www.deepl.com/pro-api). However, it is free for up to 500,000 characters a month.

- The same general things apply here: use a text input (file or raw text), and enter your API key in the API key field.

- While DeepL translation does work, it is currently deprecated in favor of the LLMs, and a bit buggy. It's recommended to use the LLMs for translation. Perhaps in the future I'll update the DeepL translation to be more stable, given demand.

- DeepL translation is fairly unsophisticated compared to the LLMs, so there's not much to configure. Press the translate button and wait for the results. Output will show in the appropriate fields.

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Translation with LLMs**<a name="kijiku"></a>

- Kudasai supports 2 different LLMs at the moment: OpenAI's GPT and Google's Gemini.

- For OpenAI, you'll need an API key; you can get one [here](https://platform.openai.com/docs/api-reference/authentication). This is a paid service with no free tier.

- For Gemini, you'll need an API key; you can get one [here](https://ai.google.dev/tutorials/setup). Gemini is free to use under 60 concurrent requests.

- I'd recommend using GPT for most things, as it's generally better at translation.

- Once again, mostly straightforward: fill in your API key, select your LLM, and select your text. You'll also need to add your settings file if on Hugging Face.

- You can calculate costs here or just translate. Output will show in the appropriate fields.

- For further details on the settings file, see [here](#translation-with-llms-settings).

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Translation with LLMs Settings**<a name="translation-with-llms-settings"></a>

(Fairly technical, but this can be abstracted away by using the default settings or someone else's settings file.)

- ----------------------------------------------------------------------------------
- Kijiku Settings:

- prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message; 2 means it'll be treated as a user message. 1 is recommended for gpt-4; otherwise, either works. For Gemini, this setting is ignored.

- number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost-effective, but other complications may occur with more lines. So far it has been tested up to 48.

- sentence_fragmenter_mode : 1 or 2 (1 - via regex and other nonsense; 2 - None, takes formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines how Kijiku fragments the sentences, if at all. Use 2 for newer models.

- je_check_mode : 1 or 2. 1 will print out the Japanese and then the English below it, separated by ---; 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English. If it cannot, it will default to 1. Use 2 for newer models.

- number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kijiku will attempt to mend a malformed batch (mending is resending the request); only for gpt-4. Be careful with increasing this, as cost increases at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.

- batch_retry_timeout : How long Kijiku will try to translate a batch, in seconds. If a request exceeds this duration, Kijiku will leave it untranslated.

- number_of_concurrent_batches : How many translation batches Kijiku will send to the translation API at a time. For OpenAI, be conservative, as rate-limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 60.
----------------------------------------------------------------------------------
Open AI Settings:
See https://platform.openai.com/docs/api-reference/chat/create for further details
----------------------------------------------------------------------------------
- openai_model : ID of the model to use. Kijiku only works with 'chat' models.

openai_system_message : Instructions to the model. Basically tells the model how to translate.

@@ -143,13 +231,13 @@ For further details on the settings file, see [here](#translation-with-llms-sett
openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

- openai_n : How many chat completion choices to generate for each input message. Do not change this.

- openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this.

- openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this.

- openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this.

openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default. If you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.

@@ -157,12 +245,12 @@ For further details on the settings file, see [here](#translation-with-llms-sett
openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
----------------------------------------------------------------------------------
- openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the Kijiku_rule.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------
Gemini Settings:
- https://ai.google.dev/docs/concepts#model-parameters for further details
----------------------------------------------------------------------------------
- gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 model and its aliases.

gemini_prompt : Instructions to the model. Basically tells the model how to translate.

@@ -172,43 +260,69 @@ For further details on the settings file, see [here](#translation-with-llms-sett
gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity; a lower value makes the output more deterministic.

- gemini_candidate_count : The number of candidates to generate for each input message. Do not change this.

- gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this.

- gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this.

gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default. If you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.
----------------------------------------------------------------------------------
- gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the Kijiku_rule.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **Web GUI**<a name="webgui"></a>

- Below are images of the WebGUI:

- Indexing | Kairyou:
- ![Indexing Screen | Kairyou](https://i.imgur.com/0a2mzOI.png)

- Preprocessing | Kairyou:
- ![Preprocessing Screen | Kairyou](https://i.imgur.com/2pt06gC.png)

- Translation | Kaiseki:
- ![Translation Screen | Kaiseki](https://i.imgur.com/X98JYsp.png)

- Translation | Kijiku:
- ![Translation Screen | Kijiku](https://i.imgur.com/X6IxyL8.png)

- Kijiku Settings:
- ![Kijiku Settings](https://i.imgur.com/VX0fGd5.png)

- Logging:
- ![Logging](https://i.imgur.com/IkUjpXR.png)

  ---------------------------------------------------------------------------------------------------------------------------------------------------
- **License**<a name="license"></a>

This project (Kudasai) is licensed under the GNU General Public License (GPL). You can find the full text of the license in the [LICENSE](License.md) file.

@@ -217,12 +331,18 @@ The GPL is a copyleft license that promotes the principles of open-source softwa
Please note that this information is a brief summary of the GPL. For a detailed understanding of your rights and obligations under this license, please refer to the full license text.

---------------------------------------------------------------------------------------------------------------------------------------------------
- **Contact**<a name="contact"></a>

- If you have any questions, comments, or concerns, please feel free to contact me at [Tetralon07@gmail.com](mailto:Tetralon07@gmail.com).

For any bugs or suggestions, please use the issues tab [here](https://github.com/Bikatr7/Kudasai/issues).

- Once again, I actively encourage and welcome any feedback on this project.

---------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------------------------------------
**Table of Contents**

+ - [**Notes**](#notes)
+ - [**Dependencies**](#dependencies)
+ - [**Quick Start**](#quick-start)
+ - [**Command Line Interface (CLI)**](#command-line-interface-cli)
+   - [Usage](#usage)
+     - [Preprocess Mode](#preprocess-mode)
+     - [Translate Mode](#translate-mode)
+   - [Additional Notes](#additional-notes)
+ - [**Preprocessing**](#preprocessing)
+ - [**Translator**](#translator)
+ - [**Translator Settings**](#translator-settings)
+ - [**Web GUI**](#web-gui)
+ - [**Hugging Face**](#hugging-face)
+ - [**License**](#license)
+ - [**Contact**](#contact)
+ - [**Acknowledgements**](#acknowledgements)

---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Notes**<a name="notes"></a>
+
+ Windows 10 and Linux Mint are the only tested operating systems; feel free to test on other operating systems and report back to me. I will do my best to fix any issues that arise.

+ To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://github.com/Bikatr7/Kudasai/blob/main/lib/gui/HUGGING_FACE_README.md). Further WebGUI documentation can be found there as well.
+
+ Python version: 3.10+

Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.

Preprocessor and translation logic is sourced from external packages, which I also designed; see [Kairyou](https://github.com/Bikatr7/Kairyou) and [EasyTL](https://github.com/Bikatr7/easytl) for more information.

+ Kudasai has a public Trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.

+ Kudasai is proud to have been a Backdrop Build v3 Finalist:
+ https://backdropbuild.com/builds/v3/kudasai

  ---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Dependencies**<a name="dependencies"></a>

+ backoff==2.2.1

+ gradio==4.20.0

+ kairyou==1.5.0

+ easytl==0.3.3

+ or see requirements.txt
+
+ Also requires spaCy's ja_core_news_lg model, which can be installed via the following command:
+
+ ```bash
+ python -m spacy download ja_core_news_lg
+ ```
+
+ or on Linux:
+
+ ```bash
+ python3 -m spacy download ja_core_news_lg
+ ```

  ---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Quick Start**<a name="quick-start"></a>

+ Windows is assumed for the rest of this README, but the process should be similar for Linux. This is for the console version; for something less linear, see the [Web GUI](#webgui) section.

+ Due to PyPI limitations, you need to install spaCy's JP model manually; it cannot be included automatically because it is a direct dependency link, which PyPI does not support. Make sure you do this after installing the requirements.txt file, as it requires Kairyou/spaCy to be installed first.

+ ```bash
+ python -m spacy download ja_core_news_lg
+ ```

+ Simply run Kudasai.py, enter a txt file path to the text you wish to preprocess/translate, and then insert a replacement JSON file path if you wish to use one. If you do not wish to use a replacement JSON file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.

+ Kudasai will offer to index the text, which is useful for finding new names to add to the replacement JSON file. This is optional and can be skipped.

+ After preprocessing is completed (if triggered), you will be prompted to choose a translation method.
+
+ You can choose between OpenAI, Gemini, and DeepL. Each has its own pros and cons, but OpenAI is the recommended translation method. DeepL and Gemini currently offer free versions, but all three require an API key; you will be prompted to enter this key when you choose to run the translation module.
+
+ Next, Kudasai will ask you to confirm its settings. This can be overwhelming, but you can simply enter 1 to confirm and use the default settings. If you wish to change them, you can do so here.
+
+ See the [**Translator Settings**](#translator-settings) section for more information on Kudasai's translation settings, but the defaults should run fine. Inside the demo folder is a copy of the settings I use to translate COTE, should you wish to use them. There is also a demo txt file in the demo folder that you can use to test Kudasai.
+
+ Kudasai will then ask if you want to change your API key; simply enter 2 for now.
+
+ Next, Kudasai will display an estimated cost of translation. This is based on the number of tokens in the preprocessed text, as determined by tiktoken for OpenAI, by Google for Gemini, and by DeepL for DeepL. Kudasai will then prompt for confirmation; if this is fine, enter 1 to run the translation module, otherwise enter 2 to exit.
+
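As a rough illustration of what the OpenAI side of that estimate involves, here is a minimal tiktoken sketch; this is not Kudasai's actual code, and the model name and per-token price are placeholder assumptions:

```python
## Minimal sketch of an OpenAI-side cost estimate; not Kudasai's actual code.
## The model name and the per-token price below are placeholder assumptions.
import tiktoken

def estimate_openai_cost(text: str, model: str = "gpt-4", usd_per_input_token: float = 0.00003) -> float:
    encoding = tiktoken.encoding_for_model(model)  ## tokenizer matching the chosen model
    num_tokens = len(encoding.encode(text))        ## token count of the preprocessed text
    return num_tokens * usd_per_input_token

print(f"${estimate_openai_cost('これはテストです。'):.6f}")
```
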
+ Kudasai will then run the translation module and output the translated text and other logs to the output folder in the same directory as Kudasai.py.

+ These files are:

+ "debug_log.txt" : A log of crucial information that occurred during Kudasai's run, useful for debugging or reporting issues, as well as seeing what was done.

+ "error_log.txt" : A log of errors that occurred during Kudasai's run, if any, useful for debugging or reporting issues.

+ "je_check_text.txt" : A log of the Japanese and English sentences that were paired together, useful for checking the accuracy of the translation and for further editing of a machine translation.

+ "preprocessed_text.txt" : The preprocessed text, i.e. the text output by Kairyou (the preprocessor).

+ "preprocessing_results.txt" : A log of the results of the preprocessing; shows what was replaced and how many times.

+ "translated_text.txt" : The translated text, i.e. the text output by Kaiseki or Kijiku.
+
+ Old runs are stored in the archive folder inside output as well.
+
+ If you have any questions, comments, or concerns, please feel free to open an issue.

  ---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Command Line Interface (CLI)**<a name="cli"></a>
+
+ Kudasai provides a Command Line Interface (CLI) for preprocessing and translating text files. This section details how to use the CLI, including the required and optional arguments for each mode.
+
+ ### Usage
+
+ The CLI supports two modes: `preprocess` and `translate`. Each mode requires specific arguments to function properly.
+
+ #### Preprocess Mode
+
+ The `preprocess` mode preprocesses the text file using the provided replacement JSON file.
+
+ **Command Structure:**
+
+ ```bash
+ python path_to_kudasai.py preprocess <input_file> <replacement_json> [<knowledge_base>]
+ ```
+
+ **Required Arguments:**
+ - `<input_file>`: Path to the text file to preprocess.
+ - `<replacement_json>`: Path to the replacement JSON file.
+
+ **Optional Arguments:**
+ - `<knowledge_base>`: Path to the knowledge base (directory, file, or text).
+
+ **Example:**
+
+ ```bash
+ python C:\\path\\to\\kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
+ ```
+
+ #### Translate Mode

+ The `translate` mode translates the text file using the specified translation method.

+ **Command Structure:**

+ ```bash
+ python path_to_kudasai.py translate <input_file> <translation_method> [<translation_settings_json>] [<api_key>]
+ ```

+ **Required Arguments:**
+ - `<input_file>`: Path to the text file to translate.

+ **Optional Arguments:**
+ - `<translation_method>`: Translation method to use (`'deepl'`, `'openai'`, or `'gemini'`). Defaults to `'deepl'`.
+ - `<translation_settings_json>`: Path to the translation settings JSON file (overrides current settings).
+ - `<api_key>`: API key for the translation service. If not provided, Kudasai will use the one stored in the settings directory, or prompt for it if that's not found.
+
+ **Example:**
+
+ ```bash
+ python C:\\path\\to\\kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR_API_KEY"
+ ```
+
+ ### Additional Notes
+ - All arguments should be enclosed in double quotes if they contain spaces. Double quotes are optional and will be stripped. Single quotes are not allowed.

  ---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Preprocessing**<a name="preprocessing"></a>
+
+ Preprocessing is the act of preparing text for translation by replacing certain words or phrases with their translated counterparts.

+ Kudasai uses Kairyou for preprocessing, a powerful preprocessor that can replace text in a text file based on a JSON file. This is useful for replacing names, places, and other things that may not translate well, or simply to speed up the translation process.

+ You can run the preprocessor by using the CLI or by simply running kudasai.py as instructed in the [Quick Start](#quick-start) section.
+
+ Many replacement JSON files are included in the jsons folder; you can also make your own, provided it follows the same format. See the example below.
+ Kudasai/Kairyou works with both Kudasai- and Fukuin-type JSONs; the below is a Kudasai-type JSON.
+
+ ![Example JSON](https://i.imgur.com/u3FnUia.jpg)
+
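For a sense of the shape, here is a hypothetical miniature of a replacement file written as a Python dict; the category names and mapping direction are illustrative assumptions, so consult the Kairyou README linked above for the actual schema:

```python
## Hypothetical miniature of a Kudasai-type replacement JSON, for illustration only.
## The category names and mapping direction are assumptions; see the Kairyou README
## for the real schema.
import json

replacements = {
    "single_names": {"Akari": "アカリ"},          ## replacement text -> original text
    "full_names": {"Akari Tanaka": "田中アカリ"},
    "honorifics": {"san": "さん"},
}

with open("replacements.json", "w", encoding="utf-8") as file:
    json.dump(replacements, file, ensure_ascii=False, indent=4)
```
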
+ ---------------------------------------------------------------------------------------------------------------------------------------------------

+ ## **Translator**<a name="translator"></a>

+ Kudasai uses EasyTL for translation, a versatile translation library that uses several translation APIs to translate text.

+ Kudasai currently supports OpenAI, Gemini, and DeepL for translation. OpenAI is the recommended translation method, but DeepL and Gemini are also good alternatives.

+ You can run the translator by running kudasai.py as instructed in the [Quick Start](#quick-start) section.

+ Note that you need an API key for OpenAI, Gemini, and DeepL. You will be prompted to enter this key when you choose to run the translation module.
+
+ The translator has a lot of settings; simply using the defaults is fine, as is the settings file provided in the demo folder. You can also change settings manually when confirming them, as well as load a custom JSON as your settings by pressing c at this window, with the settings file placed in the script directory.
+
+ The settings are fairly complex; see the [Translator Settings](#translator-settings) section below for more information.
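For context on what Kudasai delegates to EasyTL, here is a rough sketch of the kind of call involved; the method names and signatures are assumptions based on EasyTL's README, so check https://github.com/Bikatr7/easytl for the real interface:

```python
## Rough sketch of translating one line through EasyTL, the library Kudasai delegates to.
## The credential setter and per-service method names are assumptions from EasyTL's README.
from easytl import EasyTL

EasyTL.set_credentials("deepl", "YOUR_API_KEY")

print(EasyTL.deepl_translate("こんにちは", target_lang="EN-US"))
```
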
  ---------------------------------------------------------------------------------------------------------------------------------------------------

+ ## **Translator Settings**<a name="translator-settings"></a>

(Fairly technical, but this can be abstracted away by using the default settings or someone else's settings file.)

+ Base Translation Settings:

+ prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message; 2 means it'll be treated as a user message. 1 is recommended for gpt-4; otherwise, either works. For Gemini & DeepL, this setting is ignored.

+ number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost-effective, but other complications may occur with more lines. So far it has been tested up to 48 by me.

+ sentence_fragmenter_mode : 1 or 2 (1 - via regex and other nonsense; 2 - None, takes formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines how Kudasai fragments the sentences, if at all. Use 2 for newer models and DeepL.

+ je_check_mode : 1 or 2. 1 will print out the Japanese and then the English below it, separated by ---; 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English. If it cannot, it will default to 1. Use 2 for newer models and DeepL.

+ number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kudasai will attempt to mend a malformed batch (mending is resending the request). Be careful with increasing this, as cost increases at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.

+ batch_retry_timeout : How long Kudasai will try to translate a batch, in seconds. If a request exceeds this duration, Kudasai will leave it untranslated.

+ number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative, as rate-limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
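
To make number_of_concurrent_batches and batch_retry_timeout concrete, here is a minimal asyncio sketch of the general pattern (a semaphore bounding in-flight batches, plus a per-batch timeout); this illustrates the idea only and is not Kudasai's actual implementation, with translate_batch standing in for one real API call:

```python
## Illustration of the concurrency/timeout pattern only; not Kudasai's actual code.
import asyncio

async def translate_batch(batch: str) -> str:
    ## hypothetical stand-in for one translation API request
    await asyncio.sleep(0.1)
    return batch

async def translate_all(batches: list[str], number_of_concurrent_batches: int = 2, batch_retry_timeout: float = 700.0) -> list[str | None]:
    semaphore = asyncio.Semaphore(number_of_concurrent_batches)  ## bounds in-flight requests

    async def run(batch: str) -> str | None:
        async with semaphore:
            try:
                return await asyncio.wait_for(translate_batch(batch), timeout=batch_retry_timeout)
            except asyncio.TimeoutError:
                return None  ## a batch exceeding the timeout is left untranslated

    return await asyncio.gather(*(run(b) for b in batches))

print(asyncio.run(translate_all(["一行目", "二行目"])))
```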
----------------------------------------------------------------------------------
Open AI Settings:
See https://platform.openai.com/docs/api-reference/chat/create for further details
----------------------------------------------------------------------------------
+ openai_model : ID of the model to use. Kudasai only works with 'chat' models.

openai_system_message : Instructions to the model. Basically tells the model how to translate.

openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

+ openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.

+ openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this, as Kudasai does not support this feature.

+ openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this, as Kudasai does not support this feature.

+ openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this, as Kudasai does not support this feature.

openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default. If you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.

openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
----------------------------------------------------------------------------------
+ openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------
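For orientation, here is a minimal sketch of how the openai_* settings above map onto a chat completions request with the openai Python package; this is illustrative only, not Kudasai's actual code, and the model, messages, and values shown are assumptions:

```python
## Illustrative mapping of the openai_* settings onto a chat completions request.
## Not Kudasai's actual code; the model, messages, and values are assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4",                  ## openai_model
    messages=[
        ## prompt_assembly_mode 1: the instructions travel as the system role
        {"role": "system", "content": "Translate this Japanese to English."},  ## openai_system_message
        {"role": "user", "content": "こんにちは"},
    ],
    temperature=0.3,                ## openai_temperature
    top_p=1.0,                      ## openai_top_p
    n=1,                            ## openai_n (hardcoded to 1 by Kudasai)
    stream=False,                   ## openai_stream
    stop=None,                      ## openai_stop
    logit_bias=None,                ## openai_logit_bias
    max_tokens=None,                ## openai_max_tokens
    presence_penalty=0.0,           ## openai_presence_penalty
    frequency_penalty=0.0,          ## openai_frequency_penalty
)

print(response.choices[0].message.content)
```
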
Gemini Settings:
+ See https://ai.google.dev/docs/concepts#model-parameters for further details
----------------------------------------------------------------------------------
+ gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 and 1.5 models and their aliases.

gemini_prompt : Instructions to the model. Basically tells the model how to translate.

gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity; a lower value makes the output more deterministic.

+ gemini_candidate_count : The number of candidates to generate for each input message. Do not change this, as Kudasai will always use 1.

+ gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this, as Kudasai does not support this feature.

+ gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this, as Kudasai does not support this feature.

gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default. If you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.
----------------------------------------------------------------------------------
+ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake; current versions of Kudasai will hardcode their values to their defaults when validating the translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
+ ----------------------------------------------------------------------------------
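Likewise, here is a minimal sketch of how the gemini_* settings map onto a google.generativeai request; illustrative only, not Kudasai's actual code, with the model, prompt, and values being assumptions:

```python
## Illustrative mapping of the gemini_* settings onto a google.generativeai request.
## Not Kudasai's actual code; the model, prompt, and values are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-pro-latest")  ## gemini_model

response = model.generate_content(
    "Translate this Japanese to English:\nこんにちは",  ## gemini_prompt plus the batch text
    generation_config=genai.types.GenerationConfig(
        temperature=0.3,         ## gemini_temperature
        top_p=0.9,               ## gemini_top_p
        top_k=40,                ## gemini_top_k
        candidate_count=1,       ## gemini_candidate_count (hardcoded to 1 by Kudasai)
        stop_sequences=None,     ## gemini_stop_sequences
        max_output_tokens=None,  ## gemini_max_output_tokens
    ),
    stream=False,                ## gemini_stream
)

print(response.text)
```
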
+ DeepL Settings:
+ See https://developers.deepl.com/docs/api-reference/translate for further details
----------------------------------------------------------------------------------
+ deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL alpha feature and could be subject to change.
+
+ deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.
+
+ deepl_preserve_formatting : Whether the formatting of the text should be preserved. If you don't want to preserve the formatting, you can set this to False. Otherwise, set it to True.
+
+ deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.
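
And a minimal sketch of how the deepl_* settings map onto the deepl Python package; illustrative only, not Kudasai's actual code, with the sample text and target language being assumptions:

```python
## Illustrative mapping of the deepl_* settings onto the deepl Python package.
## Not Kudasai's actual code; the sample text and target language are assumptions.
import deepl

translator = deepl.Translator("YOUR_API_KEY")

result = translator.translate_text(
    "こんにちは",
    source_lang="JA",
    target_lang="EN-US",
    context="",                                ## deepl_context (alpha feature)
    split_sentences=deepl.SplitSentences.ALL,  ## deepl_split_sentences
    preserve_formatting=True,                  ## deepl_preserve_formatting
    formality="default",                       ## deepl_formality
)

print(result.text)
```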
  ---------------------------------------------------------------------------------------------------------------------------------------------------

+ ## **Web GUI**<a name="webgui"></a>

+ Kudasai also offers a Web GUI. It has all the main functionality of the program, but in an easier and non-linear way.

+ To run the Web GUI, simply run webgui.py, which is in the same directory as kudasai.py.

+ Below are some images of the Web GUI.

+ Detailed documentation for this can be found on the Hugging Face hosted version of Kudasai [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).

+ Name Indexing | Kairyou:
+ ![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)

+ Text Preprocessing | Kairyou:
+ ![Text Preprocessing Screen | Kairyou](https://i.imgur.com/r8nHEvw.jpeg)

+ Text Translation | Translator:
+ ![Text Translation Screen | Translator](https://i.imgur.com/0E9q2eh.jpeg)
+
+ Translation Settings Page 1:
+ ![Translation Settings Page 1](https://i.imgur.com/0E9q2eh.jpeg)
+
+ Translation Settings Page 2:
+ ![Translation Settings Page 2](https://i.imgur.com/8MQk6pL.jpeg)
+
+ Logging Page:
+ ![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
+
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
+
+ ## **Hugging Face**<a name="huggingface"></a>
+
+ For those who are interested, or simply cannot run Kudasai locally, an instance of Kudasai's WebGUI is hosted on Hugging Face's servers. You can find it [here](https://huggingface.co/spaces/Bikatr7/Kudasai).
+
+ It's a bit slower than running it locally, but it's a good alternative for those who cannot run it locally. The WebGUI on Hugging Face does not persist anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, the output folder is overwritten every time it loads, and archives are deleted every run as well.
+
+ To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).

  ---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **License**<a name="license"></a>

This project (Kudasai) is licensed under the GNU General Public License (GPL). You can find the full text of the license in the [LICENSE](License.md) file.

Please note that this information is a brief summary of the GPL. For a detailed understanding of your rights and obligations under this license, please refer to the full license text.

---------------------------------------------------------------------------------------------------------------------------------------------------
+ ## **Contact**<a name="contact"></a>

+ If you have any questions, comments, or concerns, please feel free to contact me at [Bikatr7@proton.me](mailto:Bikatr7@proton.me).

For any bugs or suggestions, please use the issues tab [here](https://github.com/Bikatr7/Kudasai/issues).

+ I actively encourage and welcome any feedback on this project.
+
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
+
+ ## **Acknowledgements**<a name="acknowledgements"></a>
+
+ Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means "please". You can find that [here](https://github.com/Atreyagaurav/onegai).

  ---------------------------------------------------------------------------------------------------------------------------------------------------
demo/translation_settings.json ADDED
@@ -0,0 +1,45 @@
+ {
+     "base translation settings": {
+         "prompt_assembly_mode": 1,
+         "number_of_lines_per_batch": 48,
+         "sentence_fragmenter_mode": 2,
+         "je_check_mode": 2,
+         "number_of_malformed_batch_retries": 1,
+         "batch_retry_timeout": 700,
+         "number_of_concurrent_batches": 2
+     },
+
+     "openai settings": {
+         "openai_model": "gpt-4-turbo",
+         "openai_system_message": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
+         "openai_temperature": 0.3,
+         "openai_top_p": 1.0,
+         "openai_n": 1,
+         "openai_stream": false,
+         "openai_stop": null,
+         "openai_logit_bias": null,
+         "openai_max_tokens": null,
+         "openai_presence_penalty": 0.0,
+         "openai_frequency_penalty": 0.0
+     },
+
+     "gemini settings": {
+         "gemini_model": "gemini-1.5-pro-latest",
+         "gemini_prompt": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
+         "gemini_temperature": 0.3,
+         "gemini_top_p": null,
+         "gemini_top_k": null,
+         "gemini_candidate_count": 1,
+         "gemini_stream": false,
+         "gemini_stop_sequences": null,
+         "gemini_max_output_tokens": null
+     },
+
+     "deepl settings": {
+         "deepl_context": "",
+         "deepl_split_sentences": "ALL",
+         "deepl_preserve_formatting": true,
+         "deepl_formality": "default"
+     }
+ }
handlers/json_handler.py CHANGED
@@ -1,13 +1,13 @@
## built-in libraries
import json
import typing

## third-party libraries
from easytl import ALLOWED_GEMINI_MODELS, ALLOWED_OPENAI_MODELS

## custom modules
from modules.common.file_ensurer import FileEnsurer
- from modules.common.logger import Logger
from modules.common.toolkit import Toolkit

##-------------------start-of-JsonHandler---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -16,81 +16,15 @@ class JsonHandler:

"""

- Handles the Kijiku Rules.json file and interactions with it.

"""

- current_kijiku_rules = dict()

- kijiku_settings_message = """
- ----------------------------------------------------------------------------------
- Kijiku Settings:

- prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message. 2 means it'll be treated as a user message. 1 is recommend for gpt-4 otherwise either works. For Gemini, this setting is ignored.
-
- number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost effective, but other complications may occur with higher lines. So far been tested up to 48.
-
- sentence_fragmenter_mode : 1 or 2 (1 - via regex and other nonsense) 2 - None (Takes formatting and text directly from API return)) the API can sometimes return a result on a single line, so this determines the way Kijiku fragments the sentences if at all. Use 2 for newer models.
-
- je_check_mode : 1 or 2, 1 will print out the jap then the english below separated by ---, 2 will attempt to pair the english and jap sentences, placing the jap above the eng. If it cannot, it will default to 1. Use 2 for newer models.
-
- number_of_malformed_batch_retries : (Malformed batch is when je-fixing fails) How many times Kijiku will attempt to mend a malformed batch (mending is resending the request), only for gpt4. Be careful with increasing as cost increases at (cost * length * n) at worst case. This setting is ignored if je_check_mode is set to 1.
-
- batch_retry_timeout : How long Kijiku will try to translate a batch in seconds, if a requests exceeds this duration, Kijiku will leave it untranslated.
-
- number_of_concurrent_batches : How many translations batches Kijiku will send to the translation API at a time. For OpenAI, be conservative as rate-limiting is aggressive, I'd suggest 3-5. For Gemini, do not exceed 60.
- ----------------------------------------------------------------------------------
- Open AI Settings:
- See https://platform.openai.com/docs/api-reference/chat/create for further details
- ----------------------------------------------------------------------------------
- openai_model : ID of the model to use. Kijiku only works with 'chat' models.
-
- openai_system_message : Instructions to the model. Basically tells the model how to translate.
-
- openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
-
- openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
-
- openai_n : How many chat completion choices to generate for each input message. Do not change this.
-
- openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this.
-
- openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this.
-
- openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this.
-
- openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
-
- openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. While negative values encourage repetition. Should leave this at 0.0.
-
- openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
- ----------------------------------------------------------------------------------
- openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake, current versions of Kudasai will hardcode their values when validating the Kijiku_rule.json to their default values. As different values for these settings do not have a use case in Kudasai's current implementation.
- ----------------------------------------------------------------------------------
- Gemini Settings:
- https://ai.google.dev/docs/concepts#model-parameters for further details
- ----------------------------------------------------------------------------------
- gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 model and it's aliases.
-
- gemini_prompt : Instructions to the model. Basically tells the model how to translate.
-
- gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
-
- gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
-
- gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, a lower value makes the output more deterministic.
-
- gemini_candidate_count : The number of candidates to generate for each input message. Do not change this.
-
- gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this.
-
- gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this.
-
- gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
- ----------------------------------------------------------------------------------
- gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake, current versions of Kudasai will hardcode their values when validating the Kijiku_rule.json to their default values. As different values for these settings do not have a use case in Kudasai's current implementation.
- ----------------------------------------------------------------------------------
- """

##-------------------start-of-validate_json()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@@ -99,11 +33,11 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

"""

- Validates the Kijiku Rules.json file.

"""

- base_kijiku_keys = [
"prompt_assembly_mode",
"number_of_lines_per_batch",
"sentence_fragmenter_mode",
@@ -139,6 +73,13 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
"gemini_max_output_tokens"
]

validation_rules = {
"prompt_assembly_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
"number_of_lines_per_batch": lambda x: isinstance(x, int) and x > 0,
@@ -159,31 +100,42 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
"gemini_top_p": lambda x: x is None or (isinstance(x, float) and 0 <= x <= 2),
"gemini_top_k": lambda x: x is None or (isinstance(x, int) and x >= 0),
"gemini_max_output_tokens": lambda x: x is None or isinstance(x, int),
}

try:
## ensure categories are present
- assert "base kijiku settings" in JsonHandler.current_kijiku_rules
- assert "openai settings" in JsonHandler.current_kijiku_rules
- assert "gemini settings" in JsonHandler.current_kijiku_rules

## assign to variables to reduce repetitive access
- base_kijiku_settings = JsonHandler.current_kijiku_rules["base kijiku settings"]
- openai_settings = JsonHandler.current_kijiku_rules["openai settings"]
- gemini_settings = JsonHandler.current_kijiku_rules["gemini settings"]

## ensure all keys are present
- ## ensure all keys are present
- assert all(key in base_kijiku_settings for key in base_kijiku_keys)
- assert all(key in openai_settings for key in openai_keys)
- assert all(key in gemini_settings for key in gemini_keys)

## validate each key using the validation rules
for key, validate in validation_rules.items():
- if(key in openai_settings and not validate(openai_settings[key])):
raise ValueError(f"Invalid value for {key}")
elif(key in gemini_settings and not validate(gemini_settings[key])):
raise ValueError(f"Invalid value for {key}")

## force stop/logit_bias into None
openai_settings["openai_stop"] = None
@@ -195,108 +147,111 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for

## force n and candidate_count to 1
openai_settings["openai_n"] = 1
-
gemini_settings["gemini_candidate_count"] = 1

except Exception as e:
- Logger.log_action("Kijiku Rules.json is not valid, setting to invalid_placeholder, current:")
- Logger.log_action("Reason: " + str(e))
- Logger.log_action(str(JsonHandler.current_kijiku_rules))
- JsonHandler.current_kijiku_rules = FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER

- Logger.log_action("Kijiku Rules.json is valid, current:")
- Logger.log_action(str(JsonHandler.current_kijiku_rules))

- ##-------------------start-of-reset_kijiku_rules_to_default()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@staticmethod
- def reset_kijiku_rules_to_default() -> None:

"""

- Resets the kijiku_rules json to default.

"""

- JsonHandler.current_kijiku_rules = FileEnsurer.DEFAULT_KIJIKU_RULES

- JsonHandler.dump_kijiku_rules()

- JsonHandler.load_kijiku_rules()

- ##-------------------start-of-dump_kijiku_rules()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@staticmethod
- def dump_kijiku_rules() -> None:

"""

- Dumps the Kijiku Rules.json file to disk.

"""

- with open(FileEnsurer.config_kijiku_rules_path, 'w+', encoding='utf-8') as file:
- json.dump(JsonHandler.current_kijiku_rules, file)

- ##-------------------start-of-load_kijiku_rules()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@staticmethod
- def load_kijiku_rules() -> None:

"""

- Loads the Kijiku Rules.json file into memory.

"""

- with open(FileEnsurer.config_kijiku_rules_path, 'r', encoding='utf-8') as file:
- JsonHandler.current_kijiku_rules = json.load(file)

- ##-------------------start-of-print_kijiku_rules()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@staticmethod
- def print_kijiku_rules(output:bool=False) -> None:

"""

- Prints the Kijiku Rules.json file to the log.
Logs by default, but can be set to print to console as well.

Parameters:
- output (bool | optional | default=False) : Whether to print to console as well.

"""

- print("-------------------")
- print("Base Kijiku Settings:")
- print("-------------------")
-
- for key,value in JsonHandler.current_kijiku_rules["base kijiku settings"].items():
- Logger.log_action(key + " : " + str(value), output=output, omit_timestamp=output)
-
- print("-------------------")
- print("Open AI Settings:")
- print("-------------------")
-
- for key,value in JsonHandler.current_kijiku_rules["openai settings"].items():
- Logger.log_action(key + " : " + str(value), output=output, omit_timestamp=output)
-
- print("-------------------")
- print("Gemini Settings:")
- print("-------------------")
-
- for key,value in JsonHandler.current_kijiku_rules["gemini settings"].items():
- Logger.log_action(key + " : " + str(value), output=output, omit_timestamp=output)
-

- ##-------------------start-of-change_kijiku_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@staticmethod
- def change_kijiku_settings() -> None:

"""

- Allows the user to change the settings of the Kijiku Rules.json file

"""

@@ -304,7 +259,7 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
304
 
305
  Toolkit.clear_console()
306
 
307
- settings_print_message = JsonHandler.kijiku_settings_message + SettingsChanger.generate_settings_change_menu()
308
 
309
  action = input(settings_print_message).lower()
310
 
@@ -317,23 +272,26 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
317
 
318
  elif(action == "d"):
319
  print("Resetting to default settings.")
320
- JsonHandler.reset_kijiku_rules_to_default()
321
 
322
- elif(action in JsonHandler.current_kijiku_rules["base kijiku settings"]):
323
- SettingsChanger.change_setting("base kijiku settings", action)
324
 
325
- elif(action in JsonHandler.current_kijiku_rules["openai settings"]):
326
  SettingsChanger.change_setting("openai settings", action)
327
 
328
- elif(action in JsonHandler.current_kijiku_rules["gemini settings"]):
329
  SettingsChanger.change_setting("gemini settings", action)
330
 
 
 
 
331
  else:
332
  print("Invalid setting name. Please try again.")
333
 
334
  Toolkit.pause_console("\nPress enter to continue.")
335
 
336
- JsonHandler.dump_kijiku_rules()
337
 
338
  ##-------------------start-of-convert_to_correct_type()-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
339
 
@@ -383,6 +341,10 @@ gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for
383
  "gemini_stream": {"type": bool, "constraints": lambda x: x is False},
384
  "gemini_stop_sequences": {"type": None, "constraints": lambda x: x is None},
385
  "gemini_max_output_tokens": {"type": int, "constraints": lambda x: x is None or isinstance(x, int)},
 
 
 
 
386
  }
387
 
388
  if(setting_name not in type_expectations):
@@ -424,7 +386,7 @@ class SettingsChanger:
424
 
425
  """
426
 
427
- Handles changing the settings of the Kijiku Rules.json file.
428
 
429
  """
430
 
@@ -448,17 +410,22 @@ Current settings:
448
 
449
  """
450
 
451
- for key,value in JsonHandler.current_kijiku_rules["base kijiku settings"].items():
452
  menu += key + " : " + str(value) + "\n"
453
 
454
  print("\n")
455
 
456
- for key,value in JsonHandler.current_kijiku_rules["openai settings"].items():
457
  menu += key + " : " + str(value) + "\n"
458
 
459
  print("\n")
460
 
461
- for key,value in JsonHandler.current_kijiku_rules["gemini settings"].items():
 
 
 
 
 
462
  menu += key + " : " + str(value) + "\n"
463
 
464
  menu += """
@@ -476,37 +443,37 @@ Enter the name of the setting you want to change, type d to reset to default, ty
476
 
477
  """
478
 
479
- Loads a custom json into the Kijiku Rules.json file.
480
 
481
  """
482
 
483
  Toolkit.clear_console()
484
 
485
  ## saves old rules in case on invalid json
486
- old_kijiku_rules = JsonHandler.current_kijiku_rules
487
 
488
  try:
489
 
490
  ## loads the custom json file
491
- with open(FileEnsurer.external_kijiku_rules_path, 'r', encoding='utf-8') as file:
492
- JsonHandler.current_kijiku_rules = json.load(file)
493
 
494
  JsonHandler.validate_json()
495
 
496
  ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
497
- assert JsonHandler.current_kijiku_rules != FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER
498
 
499
- JsonHandler.dump_kijiku_rules()
500
 
501
  print("Settings loaded successfully.")
502
 
503
  except AssertionError:
504
  print("Invalid JSON file. Please try again.")
505
- JsonHandler.current_kijiku_rules = old_kijiku_rules
506
 
507
  except FileNotFoundError:
508
- print("Missing JSON file. Make sure you have a json in the same directory as kudasai.py and that the json is named \"kijiku_rules.json\". Please try again.")
509
- JsonHandler.current_kijiku_rules = old_kijiku_rules
510
 
511
  ##-------------------start-of-change_setting()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
512
 
@@ -515,7 +482,7 @@ Enter the name of the setting you want to change, type d to reset to default, ty
515
 
516
  """
517
 
518
- Changes the setting of the Kijiku Rules.json file.
519
 
520
  Parameters:
521
  setting_area (str) : The area of the setting to change.
@@ -528,7 +495,7 @@ Enter the name of the setting you want to change, type d to reset to default, ty
528
  try:
529
  converted_value = JsonHandler.convert_to_correct_type(setting_name, new_value)
530
 
531
- JsonHandler.current_kijiku_rules[setting_area][setting_name] = converted_value
532
  print(f"Updated {setting_name} to {converted_value}.")
533
 
534
  except Exception as e:
 
1
  ## built-in libraries
2
  import json
3
  import typing
4
+ import logging
5
 
6
  ## third-party libraries
7
  from easytl import ALLOWED_GEMINI_MODELS, ALLOWED_OPENAI_MODELS
8
 
9
  ## custom modules
10
  from modules.common.file_ensurer import FileEnsurer
 
11
  from modules.common.toolkit import Toolkit
12
 
13
  ##-------------------start-of-JsonHandler---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
16
 
17
  """
18
 
19
+ Handles the translation_settings.json file and interactions with it.
20
 
21
  """
22
 
23
+ current_translation_settings = dict()
24
 
25
+ with open(FileEnsurer.translation_settings_description_path, 'r', encoding='utf-8') as file:
26
+ translation_settings_message = file.read()
 
27
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
  ##-------------------start-of-validate_json()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
30
 
 
33
 
34
  """
35
 
36
+ Validates the translation_settings.json file.
37
 
38
  """
39
 
40
+ base_translation_keys = [
41
  "prompt_assembly_mode",
42
  "number_of_lines_per_batch",
43
  "sentence_fragmenter_mode",
 
73
  "gemini_max_output_tokens"
74
  ]
75
 
76
+ deepl_keys = [
77
+ "deepl_context",
78
+ "deepl_split_sentences",
79
+ "deepl_preserve_formatting",
80
+ "deepl_formality"
81
+ ]
82
+
83
  validation_rules = {
84
  "prompt_assembly_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
85
  "number_of_lines_per_batch": lambda x: isinstance(x, int) and x > 0,
 
100
  "gemini_top_p": lambda x: x is None or (isinstance(x, float) and 0 <= x <= 2),
101
  "gemini_top_k": lambda x: x is None or (isinstance(x, int) and x >= 0),
102
  "gemini_max_output_tokens": lambda x: x is None or isinstance(x, int),
103
+ "deepl_context": lambda x: isinstance(x, str),
104
+ "deepl_split_sentences": lambda x: isinstance(x, str),
105
+ "deepl_preserve_formatting": lambda x: isinstance(x, bool),
106
+ "deepl_formality": lambda x: isinstance(x, str)
107
  }
108
 
109
  try:
110
  ## ensure categories are present
111
+ assert "base translation settings" in JsonHandler.current_translation_settings, "base translation settings not found"
112
+ assert "openai settings" in JsonHandler.current_translation_settings, "openai settings not found"
113
+ assert "gemini settings" in JsonHandler.current_translation_settings, "gemini settings not found"
114
+ assert "deepl settings" in JsonHandler.current_translation_settings, "deepl settings not found"
115
 
116
  ## assign to variables to reduce repetitive access
117
+ base_translation_settings = JsonHandler.current_translation_settings["base translation settings"]
118
+ openai_settings = JsonHandler.current_translation_settings["openai settings"]
119
+ gemini_settings = JsonHandler.current_translation_settings["gemini settings"]
120
+ deepl_settings = JsonHandler.current_translation_settings["deepl settings"]
121
 
122
  ## ensure all keys are present
123
+ assert all(key in base_translation_settings for key in base_translation_keys), "base translation settings keys missing"
124
+ assert all(key in openai_settings for key in openai_keys), "openai settings keys missing"
125
+ assert all(key in gemini_settings for key in gemini_keys), "gemini settings keys missing"
126
+ assert all(key in deepl_settings for key in deepl_keys), "deepl settings keys missing"
127
 
128
  ## validate each key using the validation rules
129
  for key, validate in validation_rules.items():
130
+ if(key in base_translation_settings and not validate(base_translation_settings[key])):
131
+ raise ValueError(f"Invalid value for {key}")
132
+ elif(key in openai_settings and not validate(openai_settings[key])):
133
  raise ValueError(f"Invalid value for {key}")
134
  elif(key in gemini_settings and not validate(gemini_settings[key])):
135
  raise ValueError(f"Invalid value for {key}")
136
+ elif(key in deepl_settings and not validate(deepl_settings[key])):
137
+ raise ValueError(f"Invalid value for {key}")
138
+
139
 
140
  ## force stop/logit_bias into None
141
  openai_settings["openai_stop"] = None
 
147
 
148
  ## force n and candidate_count to 1
149
  openai_settings["openai_n"] = 1
 
150
  gemini_settings["gemini_candidate_count"] = 1
151
 
152
+ ## ensure deepl_formality and deepl_split_sentences are in allowed values
153
+ if(isinstance(deepl_settings["deepl_formality"], str) and deepl_settings["deepl_formality"] not in ["default", "more", "less", "prefer_more", "prefer_less"]):
154
+ raise ValueError("Invalid value for deepl_formality")
155
+
156
+ if(isinstance(deepl_settings["deepl_split_sentences"], str) and deepl_settings["deepl_split_sentences"] not in ["OFF", "ALL", "NO_NEWLINES"]):
157
+ raise ValueError("Invalid value for deepl_split_sentences")
158
+
159
  except Exception as e:
160
+ logging.warning(f"translation_settings.json is not valid, setting to invalid_placeholder, current:"
161
+ f"\n{JsonHandler.current_translation_settings}"
162
+ f"\nReason: {e}")
163
+
164
+ JsonHandler.current_translation_settings = FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
165
 
166
+ logging.debug(f"translation_settings.json is valid, current:"
167
+ f"\n{JsonHandler.current_translation_settings}")
168
 
169
+ ##-------------------start-of-reset_translation_settings_to_default()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
170
 
171
  @staticmethod
172
+ def reset_translation_settings_to_default() -> None:
173
 
174
  """
175
 
176
+ Resets the translation_settings.json to default.
177
 
178
  """
179
 
180
+ JsonHandler.current_translation_settings = FileEnsurer.DEFAULT_TRANSLATION_SETTING
181
 
182
+ JsonHandler.dump_translation_settings()
183
 
184
+ JsonHandler.load_translation_settings()
185
 
186
+ ##-------------------start-of-dump_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
187
 
188
  @staticmethod
189
+ def dump_translation_settings() -> None:
190
 
191
  """
192
 
193
+ Dumps the translation_settings.json file to disk.
194
 
195
  """
196
 
197
+ with open(FileEnsurer.config_translation_settings_path, 'w+', encoding='utf-8') as file:
198
+ json.dump(JsonHandler.current_translation_settings, file)
199
 
200
+ ##-------------------start-of-load_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
201
 
202
  @staticmethod
203
+ def load_translation_settings() -> None:
204
 
205
  """
206
 
207
+ Loads the translation_settings.json file into memory.
208
 
209
  """
210
 
211
+ with open(FileEnsurer.config_translation_settings_path, 'r', encoding='utf-8') as file:
212
+ JsonHandler.current_translation_settings = json.load(file)
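
Taken together, these methods form a small lifecycle: load, validate, and fall back. A minimal sketch of how a caller might wire them up (mirroring the boot() logic in kudasai.py further down; the recovery step here is one possible choice, not the only one):

```python
## illustrative sketch only
from handlers.json_handler import JsonHandler
from modules.common.file_ensurer import FileEnsurer

JsonHandler.load_translation_settings()
JsonHandler.validate_json()

## validate_json() swaps in a placeholder instead of raising,
## so callers detect failure by comparing against it
if(JsonHandler.current_translation_settings == FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER):
    JsonHandler.reset_translation_settings_to_default()
```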
213
 
214
+ ##-------------------start-of-log_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
215
 
216
  @staticmethod
217
+ def log_translation_settings(output_to_console:bool=False, specific_section:str | None = None) -> None:
218
 
219
  """
220
 
221
+ Prints the translation_settings.json file to the log.
222
  Logs by default, but can be set to print to console as well.
223
 
224
  Parameters:
225
+ output_to_console (bool | optional | default=False) : Whether to print to console as well.
+ specific_section (str | None | optional | default=None) : If provided, only that section (plus the base translation settings) will be logged.
226
 
227
  """
228
 
229
+ sections = ["base translation settings", "openai settings", "gemini settings", "deepl settings"]
230
+
231
+ ## if a specific section is provided, only print that section and base translation settings
232
+ if(specific_section is not None):
233
+ specific_section = specific_section.lower()
234
+ sections = [section for section in sections if section.lower() == specific_section or section == "base translation settings"]
235
+
236
+ for section in sections:
237
+ print("-------------------")
238
+ print(f"{section.capitalize()}:")
239
+ print("-------------------")
240
+
241
+ for key, value in JsonHandler.current_translation_settings.get(section, {}).items():
242
+ log_message = f"{key} : {value}"
243
+ logging.debug(log_message)
244
+ if(output_to_console):
245
+ print(log_message)
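
For example, to echo just the Gemini section (plus the base translation settings) to the console as well as the debug log:

```python
JsonHandler.log_translation_settings(output_to_console=True, specific_section="gemini settings")
```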
 
 
 
 
246
 
247
+ ##-------------------start-of-change_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
248
 
249
  @staticmethod
250
+ def change_translation_settings() -> None:
251
 
252
  """
253
 
254
+ Allows the user to change the settings of the translation_settings.json file
255
 
256
  """
257
 
 
259
 
260
  Toolkit.clear_console()
261
 
262
+ settings_print_message = JsonHandler.translation_settings_message + SettingsChanger.generate_settings_change_menu()
263
 
264
  action = input(settings_print_message).lower()
265
 
 
272
 
273
  elif(action == "d"):
274
  print("Resetting to default settings.")
275
+ JsonHandler.reset_translation_settings_to_default()
276
 
277
+ elif(action in JsonHandler.current_translation_settings["base translation settings"]):
278
+ SettingsChanger.change_setting("base translation settings", action)
279
 
280
+ elif(action in JsonHandler.current_translation_settings["openai settings"]):
281
  SettingsChanger.change_setting("openai settings", action)
282
 
283
+ elif(action in JsonHandler.current_translation_settings["gemini settings"]):
284
  SettingsChanger.change_setting("gemini settings", action)
285
 
286
+ elif(action in JsonHandler.current_translation_settings["deepl settings"]):
287
+ SettingsChanger.change_setting("deepl settings", action)
288
+
289
  else:
290
  print("Invalid setting name. Please try again.")
291
 
292
  Toolkit.pause_console("\nPress enter to continue.")
293
 
294
+ JsonHandler.dump_translation_settings()
295
 
296
  ##-------------------start-of-convert_to_correct_type()-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
297
 
 
341
  "gemini_stream": {"type": bool, "constraints": lambda x: x is False},
342
  "gemini_stop_sequences": {"type": None, "constraints": lambda x: x is None},
343
  "gemini_max_output_tokens": {"type": int, "constraints": lambda x: x is None or isinstance(x, int)},
344
+ "deepl_context": {"type": str, "constraints": lambda x: isinstance(x, str)},
345
+ "deepl_split_sentences": {"type": str, "constraints": lambda x: isinstance(x, str)},
346
+ "deepl_preserve_formatting": {"type": bool, "constraints": lambda x: isinstance(x, bool)},
347
+ "deepl_formality": {"type": str, "constraints": lambda x: isinstance(x, str)}
348
  }
349
 
350
  if(setting_name not in type_expectations):
 
386
 
387
  """
388
 
389
+ Handles changing the settings of the translation_settings.json file.
390
 
391
  """
392
 
 
410
 
411
  """
412
 
413
+ for key,value in JsonHandler.current_translation_settings["base translation settings"].items():
414
  menu += key + " : " + str(value) + "\n"
415
 
416
  print("\n")
417
 
418
+ for key,value in JsonHandler.current_translation_settings["openai settings"].items():
419
  menu += key + " : " + str(value) + "\n"
420
 
421
  print("\n")
422
 
423
+ for key,value in JsonHandler.current_translation_settings["gemini settings"].items():
424
+ menu += key + " : " + str(value) + "\n"
425
+
426
+ print("\n")
427
+
428
+ for key,value in JsonHandler.current_translation_settings["deepl settings"].items():
429
  menu += key + " : " + str(value) + "\n"
430
 
431
  menu += """
 
443
 
444
  """
445
 
446
+ Loads a custom json into the translation_settings.json file.
447
 
448
  """
449
 
450
  Toolkit.clear_console()
451
 
452
## saves the old settings in case of an invalid json
453
+ old_translation_settings = JsonHandler.current_translation_settings
454
 
455
  try:
456
 
457
  ## loads the custom json file
458
+ with open(FileEnsurer.external_translation_settings_path, 'r', encoding='utf-8') as file:
459
+ JsonHandler.current_translation_settings = json.load(file)
460
 
461
  JsonHandler.validate_json()
462
 
463
  ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
464
+ assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
465
 
466
+ JsonHandler.dump_translation_settings()
467
 
468
  print("Settings loaded successfully.")
469
 
470
  except AssertionError:
471
  print("Invalid JSON file. Please try again.")
472
+ JsonHandler.current_translation_settings = old_translation_settings
473
 
474
  except FileNotFoundError:
475
+ print("Missing JSON file. Make sure you have a json in the same directory as kudasai.py and that the json is named \"translation_settings.json\". Please try again.")
476
+ JsonHandler.current_translation_settings = old_translation_settings
477
 
478
  ##-------------------start-of-change_setting()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
479
 
 
482
 
483
  """
484
 
485
+ Changes the setting of the translation_settings.json file.
486
 
487
  Parameters:
488
  setting_area (str) : The area of the setting to change.
 
495
  try:
496
  converted_value = JsonHandler.convert_to_correct_type(setting_name, new_value)
497
 
498
+ JsonHandler.current_translation_settings[setting_area][setting_name] = converted_value
499
  print(f"Updated {setting_name} to {converted_value}.")
500
 
501
  except Exception as e:
kudasai.py CHANGED
@@ -5,6 +5,8 @@ import json
5
  import asyncio
6
  import re
7
  import typing
 
 
8
 
9
  ## third-party libraries
10
  from kairyou import Kairyou
@@ -12,14 +14,12 @@ from kairyou import Indexer
12
  from kairyou.types import NameAndOccurrence
13
 
14
  ## custom modules
15
- from models.kaiseki import Kaiseki
16
- from models.kijiku import Kijiku
17
 
18
  from handlers.json_handler import JsonHandler
19
 
20
  from modules.common.toolkit import Toolkit
21
  from modules.common.file_ensurer import FileEnsurer
22
- from modules.common.logger import Logger
23
 
24
  ##-------------------start-of-Kudasai---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
25
 
@@ -35,8 +35,49 @@ class Kudasai:
35
 
36
  text_to_preprocess:str
37
  replacement_json:dict
 
38
 
39
  need_to_run_kairyou:bool = True
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
40
 
41
  ##-------------------start-of-boot()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
42
 
@@ -45,7 +86,7 @@ class Kudasai:
45
 
46
  """
47
 
48
- Does some logging and sets up the console window, regardless of whether the user is running the CLI, WebGUI, or Console version of Kudasai.
49
 
50
  """
51
 
@@ -53,34 +94,36 @@ class Kudasai:
53
 
54
  Toolkit.clear_console()
55
 
 
 
 
 
 
56
  FileEnsurer.setup_needed_files()
57
 
58
- Logger.log_barrier()
59
- Logger.log_action("Kudasai started")
60
- Logger.log_action("Current version: " + Toolkit.CURRENT_VERSION)
61
- Logger.log_barrier()
62
 
63
  try:
64
 
65
- with open(FileEnsurer.config_kijiku_rules_path, "r") as kijiku_rules_file:
66
- JsonHandler.current_kijiku_rules = json.load(kijiku_rules_file)
67
 
68
  JsonHandler.validate_json()
69
 
70
- assert JsonHandler.current_kijiku_rules != FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER
71
 
72
  except:
73
 
74
- print("Invalid kijiku_rules.json file. Please check the file for errors. If you are unsure, delete the file and run Kudasai again. Your kijiku rules file is located at: " + FileEnsurer.config_kijiku_rules_path)
75
 
76
  Toolkit.pause_console()
77
 
78
- raise Exception("Invalid kijiku_rules.json file. Please check the file for errors. If you are unsure, delete the file and run Kudasai again. Your kijiku rules file is located at: " + FileEnsurer.config_kijiku_rules_path)
79
 
80
  ##-------------------start-of-run_kairyou_indexer()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
81
 
82
  @staticmethod
83
- def run_kairyou_indexer(text_to_index:str, replacement_json:typing.Union[dict,str]) -> typing.Tuple[str, str]:
84
 
85
  """
86
 
@@ -98,8 +141,6 @@ class Kudasai:
98
 
99
  Toolkit.clear_console()
100
 
101
- knowledge_base = input("Please enter the path to the knowledge base you would like to use for the indexer (can be text, a path to a txt file, or a path to a directory of txt files):\n").strip('"')
102
-
103
  ## unique names is a list of named tuples, with the fields name and occurrence
104
  unique_names, indexing_log = Indexer.index(text_to_index, knowledge_base, replacement_json)
105
 
@@ -144,7 +185,7 @@ class Kudasai:
144
  new_text += text[last_end:match.start()] + f">>>{name}<<<"
145
  last_end = match.end()
146
 
147
- new_text += text[last_end:] # Append the rest of the text
148
  text = new_text
149
 
150
  return text
@@ -166,8 +207,13 @@ class Kudasai:
166
 
167
  indexing_log = ""
168
 
169
- if(Kudasai.replacement_json not in ["", FileEnsurer.blank_rules_path, FileEnsurer.standard_read_json(FileEnsurer.blank_rules_path)] and input("Would you like to use Kairyou's Indexer to index the preprocessed text? (1 for yes, 2 for no)\n") == "1"):
170
- Kudasai.text_to_preprocess, indexing_log = Kudasai.run_kairyou_indexer(Kudasai.text_to_preprocess, Kudasai.replacement_json)
 
 
 
 
 
171
 
172
  preprocessed_text, preprocessing_log, error_log = Kairyou.preprocess(Kudasai.text_to_preprocess, Kudasai.replacement_json)
173
 
@@ -178,6 +224,9 @@ class Kudasai:
178
  if(indexing_log != ""):
179
  preprocessing_log = indexing_log + "\n\n" + preprocessing_log
180
 
 
 
 
181
  print(preprocessing_log)
182
 
183
  timestamp = Toolkit.get_timestamp(is_archival=True)
@@ -190,7 +239,7 @@ class Kudasai:
190
  else:
191
  print("(Preprocessing skipped)")
192
 
193
- await Kudasai.determine_autotranslation_module()
194
 
195
  Toolkit.pause_console("\nPress any key to exit...")
196
 
@@ -214,92 +263,32 @@ class Kudasai:
214
  Toolkit.pause_console()
215
  Toolkit.clear_console()
216
 
217
- ##-------------------start-of-determine_autotranslation_module()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
218
 
219
  @staticmethod
220
- async def determine_autotranslation_module() -> None:
221
 
222
  """
223
 
224
- If the user is running the CLI or Console version of Kudasai, this function is called to determine which autotranslation module to use.
225
 
226
  """
227
 
228
- if(not Kudasai.connection):
229
- Toolkit.clear_console()
230
-
231
- print("You are not connected to the internet. Please connect to the internet to use the autotranslation feature.\n")
232
- Toolkit.pause_console()
233
-
234
- exit()
235
 
236
- pathing = ""
237
-
238
- pathing_msg = "Please select an auto-translation module:\n\n1.Kaiseki (deepL)\n2.Kijiku (OpenAI/Gemini)\n3.Exit\n\n"
239
-
240
- pathing = input(pathing_msg)
241
 
242
  Toolkit.clear_console()
243
 
244
- if(pathing == "1"):
245
- Kudasai.run_kaiseki()
246
- elif(pathing == "2"):
247
- await Kudasai.run_kijiku()
248
- else:
249
- Toolkit.clear_console()
250
- exit()
251
-
252
- ##-------------------start-of-run_kaiseki()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
253
 
254
- @staticmethod
255
- def run_kaiseki() -> None:
256
-
257
- """
258
-
259
- If the user is running the CLI or Console version of Kudasai, this function is called to run the Kaiseki module.
260
-
261
- """
262
-
263
- Logger.log_action("--------------------")
264
- Logger.log_action("Kaiseki started")
265
- Logger.log_action("--------------------")
266
-
267
- Kaiseki.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
268
-
269
- Kaiseki.translate()
270
 
271
  Toolkit.clear_console()
272
 
273
- print(Kaiseki.translation_print_result)
274
-
275
- Kaiseki.write_kaiseki_results()
276
-
277
- ##-------------------start-of-run_kijiku()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
278
-
279
- @staticmethod
280
- async def run_kijiku() -> None:
281
 
282
- """
283
-
284
- If the user is running the CLI or Console version of Kudasai, this function is called to run the Kijiku module.
285
-
286
- """
287
-
288
- Logger.log_action("--------------------")
289
- Logger.log_action("Kijiku started")
290
- Logger.log_action("--------------------")
291
-
292
- Toolkit.clear_console()
293
-
294
- Kijiku.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
295
-
296
- await Kijiku.translate()
297
-
298
- Toolkit.clear_console()
299
-
300
- print(Kijiku.translation_print_result)
301
-
302
- Kijiku.write_kijiku_results()
303
 
304
  ##-------------------start-of-main()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
305
 
@@ -319,7 +308,7 @@ async def main() -> None:
319
  if(len(sys.argv) <= 1):
320
  await run_console_version()
321
 
322
- elif(len(sys.argv) in [2, 3]):
323
  await run_cli_version()
324
 
325
  else:
@@ -340,21 +329,26 @@ async def run_console_version():
340
 
341
  try:
342
 
343
- path_to_text_to_preprocess = input("Please enter the path to the input file to be processed:\n").strip('"')
344
  Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(path_to_text_to_preprocess)
345
  Toolkit.clear_console()
346
 
347
- path_to_replacement_json = input("Please enter the path to the replacement json file:\n").strip('"')
348
  Kudasai.replacement_json = FileEnsurer.standard_read_json(path_to_replacement_json if path_to_replacement_json else FileEnsurer.blank_rules_path)
349
  Toolkit.clear_console()
350
 
 
 
 
 
351
  except Exception as e:
352
  print_usage_statement()
353
 
354
  raise e
 
 
355
 
356
  await Kudasai.run_kudasai()
357
- Logger.push_batch()
358
 
359
  ##-------------------start-of-run_cli_version()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
360
 
@@ -366,22 +360,99 @@ async def run_cli_version():
366
 
367
  """
368
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
369
  try:
370
 
371
- Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(sys.argv[1].strip('"'))
372
- Kudasai.replacement_json = FileEnsurer.standard_read_json(sys.argv[2].strip('"') if(len(sys.argv) == 3) else FileEnsurer.blank_rules_path)
 
 
 
373
 
374
- except Exception as e:
375
- print_usage_statement()
 
376
 
377
- raise e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
378
 
379
- if(len(sys.argv) == 2):
380
- Kudasai.need_to_run_kairyou = False
381
 
382
- await Kudasai.run_kudasai()
383
- Logger.push_batch()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
384
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
385
  ##-------------------start-of-print_usage_statement()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
386
 
387
  def print_usage_statement():
@@ -391,14 +462,47 @@ def print_usage_statement():
391
  Prints the usage statement for the CLI version of Kudasai.
392
 
393
  """
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
394
 
395
- Logger.log_action("Usage: python Kudasai.py <input_file> <replacement_json>", output=True, omit_timestamp=True)
396
- Logger.log_action("or run Kudasai.py without any arguments to run the console version.", output=True, omit_timestamp=True)
 
 
397
 
398
- print("\n")
399
 
400
  ##-------------------start-of-submain()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
401
 
402
 
403
- if(__name__ == '__main__'):
404
  asyncio.run(main())
 
5
  import asyncio
6
  import re
7
  import typing
8
+ import logging
9
+ import argparse
10
 
11
  ## third-party libraries
12
  from kairyou import Kairyou
 
14
  from kairyou.types import NameAndOccurrence
15
 
16
  ## custom modules
17
+ from modules.common.translator import Translator
 
18
 
19
  from handlers.json_handler import JsonHandler
20
 
21
  from modules.common.toolkit import Toolkit
22
  from modules.common.file_ensurer import FileEnsurer
 
23
 
24
  ##-------------------start-of-Kudasai---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
25
 
 
35
 
36
  text_to_preprocess:str
37
  replacement_json:dict
38
+ knowledge_base:str
39
 
40
  need_to_run_kairyou:bool = True
41
+ need_to_run_indexer:bool = True
42
+
43
+ ##-------------------start-of-setup_logging()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
44
+
45
+ @staticmethod
46
+ def setup_logging() -> None:
47
+
48
+ """
49
+
50
+ Sets up logging for the Kudasai program.
51
+
52
+ """
53
+
54
+ ## Debug log setup
55
+ debug_log_handler = logging.FileHandler(FileEnsurer.debug_log_path, mode='w+', encoding='utf-8')
56
+ debug_log_handler.setLevel(logging.DEBUG)
57
+ debug_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
58
+ debug_log_handler.setFormatter(debug_formatter)
59
+
60
+ ## Error log setup
61
+ error_log_handler = logging.FileHandler(FileEnsurer.error_log_path, mode='w+', encoding='utf-8')
62
+ error_log_handler.setLevel(logging.WARNING)
63
+ error_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
64
+ error_log_handler.setFormatter(error_formatter)
65
+
66
+ ## Console handler setup
67
+ console = logging.StreamHandler()
68
+ console.setLevel(logging.INFO)
69
+ console_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
70
+ console.setFormatter(console_formatter)
71
+
72
+ ## Add handlers to the logger
73
+ logger = logging.getLogger('')
74
+ logger.setLevel(logging.DEBUG)
75
+ logger.addHandler(debug_log_handler)
76
+ logger.addHandler(error_log_handler)
77
+ logger.addHandler(console)
78
+
79
+ ## Ensure only INFO level and above messages are sent to the console
80
+ console.setLevel(logging.INFO)
81
 
82
  ##-------------------start-of-boot()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
83
 
 
86
 
87
  """
88
 
89
+ Does some logging, sets up the console window, and loads translator settings, regardless of whether the user is running the CLI, WebGUI, or Console version of Kudasai.
90
 
91
  """
92
 
 
94
 
95
  Toolkit.clear_console()
96
 
97
+ ## Need to create the output dir FIRST as logging files are located in the output folder
98
+ FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
99
+
100
+ Kudasai.setup_logging()
101
+
102
  FileEnsurer.setup_needed_files()
103
 
104
+ logging.debug(f"Kudasai started; Current version : {Toolkit.CURRENT_VERSION}")
 
 
 
105
 
106
  try:
107
 
108
+ with open(FileEnsurer.config_translation_settings_path, "r") as translation_settings:
109
+ JsonHandler.current_translation_settings = json.load(translation_settings)
110
 
111
  JsonHandler.validate_json()
112
 
113
+ assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
114
 
115
  except:
116
 
117
+ print("Invalid translation_settings.json file. Please check the file for errors or mistakes. If you are unsure, delete the file and run Kudasai again. Your file is located at: " + FileEnsurer.config_translation_settings_path)
118
 
119
  Toolkit.pause_console()
120
 
121
+ raise Exception("Invalid translation_settings.json file. Please check the file for errors or mistakes. If you are unsure, delete the file and run Kudasai again. Your file is located at: " + FileEnsurer.config_translation_settings_path)
122
 
123
  ##-------------------start-of-run_kairyou_indexer()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
124
 
125
  @staticmethod
126
+ def run_kairyou_indexer(text_to_index:str, replacement_json:typing.Union[dict,str], knowledge_base:str) -> typing.Tuple[str, str]:
127
 
128
  """
129
 
 
141
 
142
  Toolkit.clear_console()
143
 
 
 
144
  ## unique names is a list of named tuples, with the fields name and occurrence
145
  unique_names, indexing_log = Indexer.index(text_to_index, knowledge_base, replacement_json)
146
 
 
185
  new_text += text[last_end:match.start()] + f">>>{name}<<<"
186
  last_end = match.end()
187
 
188
+ new_text += text[last_end:] ## Append the rest of the text
189
  text = new_text
190
 
191
  return text
 
207
 
208
  indexing_log = ""
209
 
210
+ if(Kudasai.replacement_json not in ["",
211
+ FileEnsurer.blank_rules_path,
212
+ FileEnsurer.standard_read_json(FileEnsurer.blank_rules_path)]
213
+
214
+ and Kudasai.need_to_run_indexer
215
+ and Kudasai.knowledge_base != ""):
216
+ Kudasai.text_to_preprocess, indexing_log = Kudasai.run_kairyou_indexer(Kudasai.text_to_preprocess, Kudasai.replacement_json, Kudasai.knowledge_base)
217
 
218
  preprocessed_text, preprocessing_log, error_log = Kairyou.preprocess(Kudasai.text_to_preprocess, Kudasai.replacement_json)
219
 
 
224
  if(indexing_log != ""):
225
  preprocessing_log = indexing_log + "\n\n" + preprocessing_log
226
 
227
+ if(preprocessing_log == "Skipped"):
228
+ preprocessing_log = "Preprocessing skipped."
229
+
230
  print(preprocessing_log)
231
 
232
  timestamp = Toolkit.get_timestamp(is_archival=True)
 
239
  else:
240
  print("(Preprocessing skipped)")
241
 
242
+ await Kudasai.run_translator()
243
 
244
  Toolkit.pause_console("\nPress any key to exit...")
245
 
 
263
  Toolkit.pause_console()
264
  Toolkit.clear_console()
265
 
266
+ ##-------------------start-of-run_translator()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
267
 
268
  @staticmethod
269
+ async def run_translator(is_cli:bool=False) -> None:
270
 
271
  """
272
 
273
+ If the user is running the CLI or Console version of Kudasai, this function is called to run the Translator module.
274
 
275
  """
276
 
277
+ Translator.is_cli = is_cli
 
 
 
 
 
 
278
 
279
+ logging.info("Translator started")
 
 
 
 
280
 
281
  Toolkit.clear_console()
282
 
283
+ Translator.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
 
 
 
 
 
 
 
 
284
 
285
+ await Translator.translate()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
286
 
287
  Toolkit.clear_console()
288
 
289
+ print(Translator.translation_print_result)
 
 
 
 
 
 
 
290
 
291
+ Translator.write_translator_results()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
292
 
293
  ##-------------------start-of-main()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
294
 
 
308
  if(len(sys.argv) <= 1):
309
  await run_console_version()
310
 
311
+ elif(len(sys.argv) in [2, 3, 4, 5, 6]):
312
  await run_cli_version()
313
 
314
  else:
 
329
 
330
  try:
331
 
332
+ path_to_text_to_preprocess = input("Please enter the path to the input file to be preprocessed/translated:\n").strip('"')
333
  Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(path_to_text_to_preprocess)
334
  Toolkit.clear_console()
335
 
336
+ path_to_replacement_json = input("Please enter the path to the replacement json file (Press enter if skipping to translation):\n").strip('"')
337
  Kudasai.replacement_json = FileEnsurer.standard_read_json(path_to_replacement_json if path_to_replacement_json else FileEnsurer.blank_rules_path)
338
  Toolkit.clear_console()
339
 
340
+ if(path_to_replacement_json != ""):
341
+ Kudasai.knowledge_base = input("Please enter the path to the knowledge base you would like to use for the name indexer (can be text, a path to a txt file, or a path to a directory of txt files). Press enter to skip name indexing:\n").strip('"')
342
+ Toolkit.clear_console()
343
+
344
  except Exception as e:
345
  print_usage_statement()
346
 
347
  raise e
348
+
349
+ print("In progress...")
350
 
351
  await Kudasai.run_kudasai()
 
352
 
353
  ##-------------------start-of-run_cli_version()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
354
 
 
360
 
361
  """
362
 
363
+ def determine_argument_type(arg:str) -> str:
364
+
365
+ """
366
+
367
+ Determines the type of a given CLI argument for the translate mode of Kudasai.
368
+
369
+ """
370
+
371
+ conditions = [
372
+ (lambda arg: arg in ["deepl", "openai", "gemini"], "translation_method"),
373
+ (lambda arg: os.path.exists(arg) and ".json" not in arg, "text_to_translate"),
374
+ (lambda arg: len(arg) > 10 and not os.path.exists(arg), "api_key"),
375
+ (lambda arg: arg == "translate", "identifier"),
376
+ (lambda arg: os.path.exists(arg) and ".json" in arg, "translation_settings_json")
377
+ ]
378
+
379
+ for condition, result in conditions:
380
+ if(condition(arg)):
381
+ print(result)
382
+ return result
383
+
384
+ raise Exception("Invalid argument. Please use 'deepl', 'openai', or 'gemini'.")
385
+
386
+ mode = ""
387
+
388
  try:
389
 
390
+ indices = {
391
+ "preprocess": {"text_to_preprocess_index": 2, "replacement_json_index": 3, "knowledge_base_index": 4},
392
+ "translate": {"text_to_translate_index": 2},
393
+ "--help": {}
394
+ }
395
 
396
+ try:
397
+ arg_indices = indices[sys.argv[1]]
398
+ mode = sys.argv[1]
399
 
400
+ except KeyError:
401
+ print_usage_statement()
402
+ raise Exception("Invalid mode. Please use 'preprocess' or 'translate'. Please use --help for more information.")
403
+
404
+ if(mode == "preprocess"):
405
+
406
+ Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(sys.argv[arg_indices['text_to_preprocess_index']].strip('"'))
407
+ Kudasai.replacement_json = FileEnsurer.standard_read_json(sys.argv[arg_indices['replacement_json_index']].strip('"')) if len(sys.argv) >= arg_indices['replacement_json_index'] + 1 else FileEnsurer.standard_read_json(FileEnsurer.blank_rules_path)
408
+ Kudasai.knowledge_base = sys.argv[arg_indices['knowledge_base_index']].strip('"') if len(sys.argv) == arg_indices['knowledge_base_index'] + 1 else ""
409
+
410
+ if(len(sys.argv) == 2):
411
+ Kudasai.need_to_run_kairyou = False
412
+ elif(len(sys.argv) == 3):
413
+ Kudasai.need_to_run_indexer = False
414
+
415
+ await Kudasai.run_kudasai()
416
 
417
+ elif(mode == "translate"):
 
418
 
419
+ method_to_translation_mode = {
420
+ "openai": "1",
421
+ "gemini": "2",
422
+ "deepl": "3"
423
+ }
424
+
425
+ Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(sys.argv[arg_indices['text_to_translate_index']].strip('"'))
426
+
427
+ sys.argv.pop(0)
428
+
429
+ arg_dict = {arg.strip('"'): determine_argument_type(arg.strip('"')) for arg in sys.argv}
430
+
431
+ assert len(arg_dict.values()) == len(set(arg_dict.values())), "Invalid arguments. Please use --help for more information." ## each argument must resolve to a distinct type
432
+
433
+ arg_type_action_map = {
434
+ "translation_method": lambda arg: setattr(Translator, 'TRANSLATION_METHOD', method_to_translation_mode[arg]),
435
+ "translation_settings_json": lambda arg: setattr(JsonHandler, 'current_translation_settings', FileEnsurer.standard_read_json(arg)),
436
+ "api_key": lambda arg: setattr(Translator, 'pre_provided_api_key', arg),
437
+ "identifier": lambda arg: None,
438
+ "text_to_translate": lambda arg: setattr(Kudasai, 'text_to_preprocess', FileEnsurer.standard_read_file(arg))
439
+ }
440
 
441
+ for arg, arg_type in arg_dict.items():
442
+ if(arg_type in arg_type_action_map):
443
+ arg_type_action_map[arg_type](arg)
444
+ else:
445
+ raise Exception("Invalid argument type. Please use --help for more information.")
446
+
447
+ await Kudasai.run_translator(is_cli=True)
448
+
449
+ else:
450
+ print_usage_statement()
451
+
452
+ except Exception as e:
453
+ print_usage_statement()
454
+ raise e
455
+
456
  ##-------------------start-of-print_usage_statement()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
457
 
458
  def print_usage_statement():
 
462
  Prints the usage statement for the CLI version of Kudasai.
463
 
464
  """
465
+ python_command = "python" if Toolkit.is_windows() else "python3"
466
+
467
+ print(f"""
468
+ Usage: {python_command} Kudasai.py <mode> <required_arguments> [optional_arguments]
469
+
470
+ Modes:
471
+ preprocess
472
+ Preprocesses the text file using the provided replacement JSON.
473
+
474
+ Required arguments:
475
+ <input_file> Path to the text file to preprocess. This is a path to a txt file.
476
+ <replacement_json> Path to the replacement JSON file. This is a path to a json file.
477
+
478
+ Optional arguments:
479
+ <knowledge_base> Path to the knowledge base. This can be a directory, a file, or even raw text.
480
+
481
+ Example:
482
+ {python_command} Kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
483
+
484
+ translate
485
+ Translates the text file using the specified translation method.
486
+
487
+ Required arguments:
488
+ <input_file> Path to the text file to translate. This is a txt file.
489
+
490
+ Optional arguments:
491
+ <translation_method> Translation method to use ('deepl', 'openai', or 'gemini'). This defaults to 'deepl'.
492
+ <translation_settings_json> Path to the translation settings JSON file. This will override the currently loaded settings.
493
+ <api_key> API key for the translation service. If not provided, Kudasai will use the one on file; if none is on file, it will prompt for one.
494
+
495
+ Example:
496
+ {python_command} Kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR API KEY"
497
 
498
+ Additional Notes:
499
+ - Arguments containing spaces should be enclosed in double quotes. Double quotes are otherwise optional and will be stripped. Single quotes are not allowed.
500
+ - For more information, refer to the documentation at README.md
501
+ """)
502
 
 
503
 
504
  ##-------------------start-of-submain()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
505
 
506
 
507
+ if(__name__ == "__main__"):
508
  asyncio.run(main())
lib/common/translation_settings_description.txt ADDED
@@ -0,0 +1,79 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ----------------------------------------------------------------------------------
2
+ Base Translation Settings:
3
+
4
+ prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message. 2 means it'll be treated as a user message. 1 is recommended for gpt-4; otherwise either works. For Gemini & DeepL, this setting is ignored.
5
+
6
+ number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines per batch would be more cost-effective, but larger batches can introduce other complications. I have tested up to 48 so far.
7
+
8
+ sentence_fragmenter_mode : 1 or 2. 1 fragments sentences via regex and other heuristics; 2 does nothing (takes formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines how, if at all, Kudasai fragments the sentences. Use 2 for newer models and DeepL.
9
+
10
+ je_check_mode : 1 or 2. 1 will print the Japanese, then the English below it, separated by ---. 2 will attempt to pair the Japanese and English sentences, placing the Japanese above the English; if it cannot, it will fall back to 1. Use 2 for newer models and DeepL.
11
+
12
+ number_of_malformed_batch_retries : How many times Kudasai will attempt to mend a malformed batch (a malformed batch is when je-fixing fails; mending is resending the request). Be careful when increasing this, as cost grows at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.
13
+
14
+ batch_retry_timeout : How long, in seconds, Kudasai will try to translate a batch; if a request exceeds this duration, Kudasai will leave it untranslated.
15
+
16
+ number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative, as rate limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
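
As an illustration only (these values are hypothetical, not recommendations), a "base translation settings" section of translation_settings.json might look like:

    "base translation settings": {
        "prompt_assembly_mode": 1,
        "number_of_lines_per_batch": 36,
        "sentence_fragmenter_mode": 2,
        "je_check_mode": 2,
        "number_of_malformed_batch_retries": 1,
        "batch_retry_timeout": 700,
        "number_of_concurrent_batches": 4
    }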
17
+ ----------------------------------------------------------------------------------
18
+ Open AI Settings:
19
+ See https://platform.openai.com/docs/api-reference/chat/create for further details
20
+ ----------------------------------------------------------------------------------
21
+ openai_model : ID of the model to use. Kudasai only works with 'chat' models.
22
+
23
+ openai_system_message : Instructions to the model. Basically tells the model how to translate.
24
+
25
+ openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
26
+
27
+ openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
28
+
29
+ openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.
30
+
31
+ openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this as Kudasai does not support this feature.
32
+
33
+ openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
34
+
35
+ openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this as Kudasai does not support this feature.
36
+
37
+ openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this; it is None by default. If you change it to an integer, make sure it doesn't exceed the model's context length, or your request will fail and repeat until timeout.
38
+
39
+ openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics, while negative values encourage repetition. You should leave this at 0.0.
40
+
41
+ openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim, while negative values encourage repetition. You should leave this at 0.0.
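
For illustration only (hypothetical values), an "openai settings" section might look like:

    "openai settings": {
        "openai_model": "gpt-4",
        "openai_system_message": "Translate the Japanese to English.",
        "openai_temperature": 0.3,
        "openai_top_p": 1.0,
        "openai_n": 1,
        "openai_stream": false,
        "openai_stop": null,
        "openai_logit_bias": null,
        "openai_max_tokens": null,
        "openai_presence_penalty": 0.0,
        "openai_frequency_penalty": 0.0
    }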
42
+ ----------------------------------------------------------------------------------
43
+ openai_stream, openai_logit_bias, openai_stop, and openai_n are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
44
+ ----------------------------------------------------------------------------------
45
+ Gemini Settings:
46
+ See https://ai.google.dev/docs/concepts#model-parameters for further details
47
+ ----------------------------------------------------------------------------------
48
+ gemini_model : The model to use. Currently supports only gemini-pro and gemini-pro-vision, in both their 1.0 and 1.5 versions, and their aliases.
49
+
50
+ gemini_prompt : Instructions to the model. Basically tells the model how to translate.
51
+
52
+ gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
53
+
54
+ gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
55
+
56
+ gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, a lower value makes the output more deterministic.
57
+
58
+ gemini_candidate_count : The number of candidates to generate for each input message. Do not change this as Kudasai will always use 1.
59
+
60
+ gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this as Kudasai does not support this feature.
61
+
62
+ gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
63
+
64
+ gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this; it is None by default. If you change it to an integer, make sure it doesn't exceed the model's context length, or your request will fail and repeat until timeout.
65
+ ----------------------------------------------------------------------------------
66
+ gemini_stream, gemini_stop_sequences, and gemini_candidate_count are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
67
+ ----------------------------------------------------------------------------------
68
+ Deepl Settings:
69
+ See https://developers.deepl.com/docs/api-reference/translate for further details
70
+ ----------------------------------------------------------------------------------
71
+ deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL Alpha feature and could be subject to change.
72
+
73
+ deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.
74
+
75
+ deepl_preserve_formatting : Whether the formatting of the text should be preserved. If you don't want to preserve the formatting, you can set this to False. Otherwise, set it to True.
76
+
77
+ deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.
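
For illustration only (hypothetical values), a "deepl settings" section that passes validation might look like:

    "deepl settings": {
        "deepl_context": "",
        "deepl_split_sentences": "ALL",
        "deepl_preserve_formatting": true,
        "deepl_formality": "default"
    }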
78
+
79
+ ----------------------------------------------------------------------------------
lib/gui/HUGGING_FACE_README.md ADDED
@@ -0,0 +1,224 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: gpl-3.0
3
+ title: Kudasai
4
+ sdk: gradio
5
+ emoji: 🈷️
6
+ python_version: 3.10.0
7
+ app_file: webgui.py
8
+ colorFrom: gray
9
+ colorTo: gray
10
+ short_description: Japanese-English preprocessor with automated translation.
11
+ pinned: true
12
+ ---
13
+
14
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
15
+ **Table of Contents**
16
+
17
+ - [**Notes**](#notes)
18
+ - [**General Usage**](#general-usage)
19
+ - [**Indexing and Preprocessing**](#indexing-and-preprocessing)
20
+ - [**Translator**](#translator)
21
+ - [**Translator Settings**](#translator-settings)
22
+ - [**Web GUI**](#web-gui)
23
+ - [**License**](#license)
24
+ - [**Contact**](#contact)
25
+ - [**Acknowledgements**](#acknowledgements)
26
+
27
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
28
+ ## **Notes**<a name="notes"></a>
29
+
30
+ This readme is for the Hugging Face Space instance of Kudasai's WebGUI and for the WebGUI itself. To run Kudasai locally or see any other info on the project, please see the [GitHub Page](https://github.com/Bikatr7/Kudasai).
31
+
32
+ Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
33
+
34
+ Preprocessor and translation logic is sourced from external packages, which I also designed; see [Kairyou](https://github.com/Bikatr7/Kairyou) and [EasyTL](https://github.com/Bikatr7/easytl) for more information.
35
+
36
+ Kudasai has a public Trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
37
+
38
+ The WebGUI on Hugging Face does not persist anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, the output folder is overwritten every run, and archives are deleted every run as well.
39
+
40
+ Kudasai is proud to have been a Backdrop Build v3 Finalist:
41
+ https://backdropbuild.com/builds/v3/kudasai
42
+
43
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
44
+
45
+ ## **General Usage**<a name="general-usage"></a>
46
+
47
+ Kudasai's WebGUI is pretty easy to understand for general usage; most incorrect actions will be caught by the system, and a message will be displayed telling you how to correct them.
48
+
49
+ Normally, Kudasai would save files to the local system, but on Hugging Face's servers, this is not possible. Instead, you'll have to click the 'Save As' button to download the files to your local system.
50
+
51
+ Or you can click the copy button on the top right of textbox modals to copy the text to your clipboard.
52
+
53
+ For further details, see the chapters below.
54
+
55
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
56
+
57
+ ## **Indexing and Preprocessing**<a name="indexing-and-preprocessing"></a>
58
+
59
+ This section can be skipped if you're only interested in translation or do not know what indexing or preprocessing is.
60
+
61
+ Indexing is not for everyone; only use it if you have a large amount of previous text and want to flag new names. It can be a very slow process, especially on Hugging Face's servers, so it's recommended to use a local version of Kudasai for it.
62
+
63
+ You'll need a txt file or some text to index. You'll also need a knowledge base, which can be either a single txt file or a directory of them, as well as a replacements json; either the Kudasai or Fukuin type works. See [this](https://github.com/Bikatr7/Kairyou?tab=readme-ov-file#kairyou) for further details on replacement jsons.
64
+
65
+ Please run indexing before preprocessing; the output is neater that way.
66
+
67
+ For preprocessing, you'll need a txt file or some text to preprocess. You'll also need a replacements json; as with indexing, either the Kudasai or Fukuin type works.
68
+
69
+ For both, text goes in the textbox modals, with the output text in the first field and the results in the second field.
70
+
71
+ They both have a debug field, but neither module really uses it.
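For local scripting, preprocessing boils down to a single call into the Kairyou package. The sketch below is an assumption based on how this repo consumes Kairyou (the text/log/error-log triple mirrors the names used by FileEnsurer.write_kairyou_results in this commit); check the Kairyou README for the canonical signature.

```python
import json

from kairyou import Kairyou  ## assumed import path; see the Kairyou repo

## a Kudasai- or Fukuin-type replacements json
with open("replacements.json", "r", encoding="utf-8") as file:
    replacement_json = json.load(file)

with open("text_to_preprocess.txt", "r", encoding="utf-8") as file:
    text_to_preprocess = file.read()

## returns the preprocessed text plus logs, mirroring the tuple Kudasai archives
preprocessed_text, preprocessing_log, error_log = Kairyou.preprocess(text_to_preprocess, replacement_json)

print(preprocessing_log)
```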
72
+
73
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
74
+
75
+ ## **Translator**<a name="translator"></a>
76
+
77
+ Kudasai supports 3 different translation methods at the moment: OpenAI's GPT, Google's Gemini, and DeepL.
78
+
79
+ For OpenAI, you'll need an API key; you can get one [here](https://platform.openai.com/docs/api-reference/authentication). This is a paid service with no free tier.
80
+
81
+ For Gemini, you'll also need an API key; you can get one [here](https://ai.google.dev/tutorials/setup). Gemini is free to use under a certain limit: 2 RPM for 1.5 and 15 RPM for 1.0.
82
+
83
+ For DeepL, you'll need an API key too; you can get one [here](https://www.deepl.com/pro#developer). DeepL is also a paid service, but it's free under 500k characters a month.
84
+
85
+ I'd recommend using GPT for most things, as it's generally better at translation.
86
+
87
+ It's mostly straightforward: choose your translation method, fill in your API key, and select your text. On Hugging Face you'll also need to add your settings file if you want to tune the output, but the default is generally fine.
88
+
89
+ You can calculate the cost first or just translate; output will show in the appropriate fields.
90
+
91
+ For further details on the settings file, see [here](#translator-settings).
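If you'd rather script a translation than click through the WebGUI, the DeepL path is a handful of EasyTL calls. These are the same calls models/kaiseki.py in this commit makes; only the key and the sample text below are placeholders.

```python
from easytl import EasyTL

EasyTL.set_api_key("deepl", "YOUR_DEEPL_API_KEY")  ## placeholder key

## the same validity check Kudasai performs before translating
is_valid, error = EasyTL.test_api_key_validity("deepl")
assert is_valid, error

result = EasyTL.deepl_translate(text="猫が座っていた。", source_lang="JA", target_lang="EN-US")
print(result)  ## e.g. "The cat was sitting."
```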
92
+
93
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
94
+
95
+ ## **Translator Settings**<a name="translator-settings"></a>
96
+
97
+ (Fairly technical, can be abstracted away by using default settings or someone else's settings file.)
98
+
99
+ Base Translation Settings:
100
+
101
+ prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message; 2 means it'll be treated as a user message. 1 is recommended for gpt-4; otherwise either works. For Gemini & DeepL, this setting is ignored.
102
+
103
+ number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost effective, but other complications may occur with larger batches. I have tested it up to 48 lines so far.
104
+
105
+ sentence_fragmenter_mode : 1 or 2 (1 = via regex and other heuristics, 2 = none; formatting and text are taken directly from the API return). The API can sometimes return a result on a single line, so this determines how, if at all, Kudasai fragments the sentences. Use 2 for newer models and DeepL.
106
+
107
+ je_check_mode : 1 or 2. 1 will print the Japanese and then the English below it, separated by ---; 2 will attempt to pair the Japanese and English sentences, placing the Japanese above the English. If it cannot, it will fall back to 1. Use 2 for newer models and DeepL.
108
+
109
+ number_of_malformed_batch_retries : How many times Kudasai will attempt to mend a malformed batch (a malformed batch is when je-fixing fails; mending means resending the request). Be careful when increasing this, as the worst-case cost grows as (cost * length * n). This setting is ignored if je_check_mode is set to 1.
110
+
111
+ batch_retry_timeout : How long, in seconds, Kudasai will try to translate a batch; if a request exceeds this duration, Kudasai will leave it untranslated.
112
+
113
+ number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative, as rate limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
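For reference, the opening of the corresponding section in translation_settings.json, using the defaults from FileEnsurer.DEFAULT_TRANSLATION_SETTING in this commit (the remaining keys are elided here):

```json
"base translation settings": {
    "prompt_assembly_mode": 1,
    "number_of_lines_per_batch": 36,
    "sentence_fragmenter_mode": 2
}
```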
114
+ ----------------------------------------------------------------------------------
115
+ OpenAI Settings:
116
+ See https://platform.openai.com/docs/api-reference/chat/create for further details
117
+ ----------------------------------------------------------------------------------
118
+ openai_model : ID of the model to use. Kudasai only works with 'chat' models.
119
+
120
+ openai_system_message : Instructions to the model. Basically tells the model how to translate.
121
+
122
+ openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
123
+
124
+ openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
125
+
126
+ openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.
127
+
128
+ openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this as Kudasai does not support this feature.
129
+
130
+ openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
131
+
132
+ openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this as Kudasai does not support this feature.
133
+
134
+ openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default; if you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.
135
+
136
+ openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics, while negative values encourage repetition. You should leave this at 0.0.
137
+
138
+ openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim, while negative values encourage repetition. You should leave this at 0.0.
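As a sketch only (illustrative values rather than the shipped defaults, which this diff does not show in full), a conservative, translation-oriented set of OpenAI values might look like:

```json
"openai_model": "gpt-4",
"openai_temperature": 0.3,
"openai_top_p": 1.0,
"openai_presence_penalty": 0.0,
"openai_frequency_penalty": 0.0
```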
139
+ ----------------------------------------------------------------------------------
140
+ openai_stream, openai_logit_bias, openai_stop, and openai_n are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
141
+ ----------------------------------------------------------------------------------
142
+ Gemini Settings:
143
+ See https://ai.google.dev/docs/concepts#model-parameters for further details
144
+ ----------------------------------------------------------------------------------
145
+ gemini_model : The model to use. Currently only gemini-pro and gemini-pro-vision (the 1.0 and 1.5 models) and their aliases are supported.
146
+
147
+ gemini_prompt : Instructions to the model. Basically tells the model how to translate.
148
+
149
+ gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
150
+
151
+ gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
152
+
153
+ gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity; a lower value makes the output more deterministic.
154
+
155
+ gemini_candidate_count : The number of candidates to generate for each input message. Do not change this as Kudasai will always use 1.
156
+
157
+ gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this as Kudasai does not support this feature.
158
+
159
+ gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
160
+
161
+ gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. It is None by default; if you change it to an integer, make sure it doesn't exceed that model's context length, or your request will fail and repeat until timeout.
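For reference, the shipped defaults for the three settings covered by the note below, exactly as they appear in FileEnsurer.DEFAULT_TRANSLATION_SETTING in this commit:

```json
"gemini_stream": false,
"gemini_stop_sequences": null,
"gemini_max_output_tokens": null
```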
162
+ ----------------------------------------------------------------------------------
163
+ gemini_stream, gemini_stop_sequences, and gemini_candidate_count are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
164
+ ----------------------------------------------------------------------------------
165
+ DeepL Settings:
166
+ See https://developers.deepl.com/docs/api-reference/translate for further details
167
+ ----------------------------------------------------------------------------------
168
+ deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL Alpha feature and could be subject to change.
169
+
170
+ deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.
171
+
172
+ deepl_preserve_formatting : Whether the formatting of the text should be preserved. If you don't want to preserve the formatting, you can set this to False. Otherwise, set it to True.
173
+
174
+ deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.
175
+
176
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
177
+
178
+ ## **Web GUI**<a name="web-gui"></a>
179
+
180
+ Below are some images of the Web GUI.
181
+
182
+ Name Indexing | Kairyou:
183
+ ![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
184
+
185
+ Text Preprocessing | Kairyou:
186
+ ![Text Preprocessing Screen | Kairyou](https://i.imgur.com/r8nHEvw.jpeg)
187
+
188
+ Text Translation | Translator:
189
+ ![Text Translation Screen | Translator](https://i.imgur.com/0E9q2eh.jpeg)
190
+
191
+ Translation Settings Page 1:
192
+ ![Translation Settings Page 1](https://i.imgur.com/0E9q2eh.jpeg)
193
+
194
+ Translation Settings Page 2:
195
+ ![Translation Settings Page 2](https://i.imgur.com/8MQk6pL.jpeg)
196
+
197
+ Logging Page:
198
+ ![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
199
+
200
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
201
+ ## **License**<a name="license"></a>
202
+
203
+ This project (Kudasai) is licensed under the GNU General Public License (GPL). You can find the full text of the license in the [LICENSE](License.md) file.
204
+
205
+ The GPL is a copyleft license that promotes the principles of open-source software. It ensures that any derivative works based on this project must also be distributed under the same GPL license. This license grants you the freedom to use, modify, and distribute the software.
206
+
207
+ Please note that this information is a brief summary of the GPL. For a detailed understanding of your rights and obligations under this license, please refer to the full license text.
208
+
209
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
210
+ ## **Contact**<a name="contact"></a>
211
+
212
+ If you have any questions, comments, or concerns, please feel free to contact me at [Bikatr7@proton.me](mailto:Bikatr7@proton.me).
213
+
214
+ For any bugs or suggestions, please use the issues tab [here](https://github.com/Bikatr7/Kudasai/issues).
215
+
216
+ I actively encourage and welcome any feedback on this project.
217
+
218
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
219
+
220
+ ## **Acknowledgements**<a name="acknowledgements"></a>
221
+
222
+ Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means please. You can find that [here](https://github.com/Atreyagaurav/onegai)
223
+
224
+ ---------------------------------------------------------------------------------------------------------------------------------------------------
lib/gui/save_to_file.js ADDED
@@ -0,0 +1,9 @@
1
+ (text) => {
2
+ const blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
3
+ const url = URL.createObjectURL(blob);
4
+ const a = document.createElement('a');
5
+ a.href = url;
6
+ a.download = 'downloaded_text.txt';
7
+ a.click();
8
+ URL.revokeObjectURL(url);
9
+ }
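The function above receives the textbox contents and triggers a client-side download, which is what lets the 'Save As' button work without server-side file access on Hugging Face. A minimal sketch of how it could be wired into a Gradio app follows; the wiring is an assumption for illustration, not necessarily Kudasai's exact code, though FileEnsurer.js_save_to_file_path is defined later in this commit.

```python
import gradio as gr

from modules.common.file_ensurer import FileEnsurer

with open(FileEnsurer.js_save_to_file_path, "r", encoding="utf-8") as file:
    js_save_to_file = file.read()

with gr.Blocks() as demo:
    output_textbox = gr.Textbox(label="Translated Text")
    save_as_button = gr.Button("Save As")

    ## fn=None runs only the JS; the textbox value is passed to the JS function
    ## (in Gradio 3.x this keyword was _js rather than js)
    save_as_button.click(fn=None, inputs=[output_textbox], outputs=None, js=js_save_to_file)

demo.launch()
```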
models/kaiseki.py DELETED
@@ -1,583 +0,0 @@
1
- ## Basically Deprecated, use Kijiku instead. Currently only maintained for backwards compatibility.
2
- ##---------------------------------------
3
- ##---------------------------------------
4
- ## built-in libraries
5
- import string
6
- import time
7
- import re
8
- import base64
9
- import time
10
-
11
- ## third-party libraries
12
- from easytl import EasyTL
13
-
14
- ## custom modules
15
- from modules.common.toolkit import Toolkit
16
- from modules.common.file_ensurer import FileEnsurer
17
- from modules.common.logger import Logger
18
- from modules.common.decorators import permission_error_decorator
19
- from modules.common.exceptions import AuthorizationException, QuotaExceededException
20
-
21
- ##-------------------start-of-Kaiseki--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
22
-
23
- class Kaiseki:
24
-
25
- """
26
-
27
- Kaiseki is a secondary class that is used to interact with the Deepl API and translate Japanese text sentence by sentence.
28
-
29
- Kaiseki is considered inferior to Kijiku, please consider using Kijiku instead.
30
-
31
- """
32
-
33
- ##---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
34
-
35
- text_to_translate = []
36
-
37
- translated_text = []
38
-
39
- je_check_text = []
40
-
41
- error_text = []
42
-
43
- translation_print_result = ""
44
-
45
- ##---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
46
-
47
- sentence_parts = []
48
-
49
- sentence_punctuation = []
50
-
51
- ## [0] = "" [1] = ~ [2] = '' in Kaiseki.current_sentence but not enclosing the entire Kaiseki.current_sentence [3] = '' enclosing the entire Kaiseki.current_sentence [4] = () in Kaiseki.current_sentence
52
- special_punctuation = []
53
-
54
- current_sentence = ""
55
-
56
- translated_sentence = ""
57
-
58
- ##-------------------start-of-translate()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
59
-
60
- @staticmethod
61
- def translate() -> None:
62
-
63
- """
64
-
65
- Translates the text.
66
-
67
- """
68
-
69
- Logger.clear_batch()
70
-
71
- time_start = time.time()
72
-
73
- try:
74
-
75
- Kaiseki.initialize()
76
-
77
- ## offset time, for if the user doesn't get through Kaiseki.initialize() before the translation starts.
78
- time_start = time.time()
79
-
80
- Kaiseki.commence_translation()
81
-
82
- except Exception as e:
83
-
84
- Kaiseki.translation_print_result += "An error has occurred, outputting results so far..."
85
-
86
- FileEnsurer.handle_critical_exception(e)
87
-
88
- finally:
89
-
90
- time_end = time.time()
91
-
92
- Kaiseki.assemble_results(time_start, time_end)
93
-
94
- ##-------------------start-of-initialize()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
95
-
96
- @staticmethod
97
- def initialize() -> None:
98
-
99
- """
100
-
101
- Initializes the Kaiseki class by getting the API key and creating the translator object.
102
-
103
- """
104
-
105
- ## get saved api key if exists
106
- try:
107
-
108
- with open(FileEnsurer.deepl_api_key_path, 'r', encoding='utf-8') as file:
109
- api_key = base64.b64decode((file.read()).encode('utf-8')).decode('utf-8')
110
-
111
- EasyTL.set_api_key("deepl", api_key)
112
- is_valid, e = EasyTL.test_api_key_validity("deepl")
113
-
114
- assert is_valid == True, e
115
-
116
- Logger.log_action("Used saved api key in " + FileEnsurer.deepl_api_key_path, output=True)
117
-
118
- ## else try to get api key manually
119
- except Exception as e:
120
-
121
- api_key = input("DO NOT DELETE YOUR COPY OF THE API KEY\n\nPlease enter the deepL api key you have : ")
122
-
123
- ## if valid save the api key
124
- try:
125
-
126
- EasyTL.set_api_key("deepl", api_key)
127
- is_valid, e = EasyTL.test_api_key_validity("deepl")
128
-
129
- assert is_valid, e
130
-
131
- time.sleep(.1)
132
-
133
- FileEnsurer.standard_overwrite_file(FileEnsurer.deepl_api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)
134
-
135
- time.sleep(.1)
136
-
137
- ## if invalid key exit
138
- except AuthorizationException:
139
-
140
- Toolkit.clear_console()
141
-
142
- Logger.log_action("Authorization error with creating translator object, please double check your api key as it appears to be incorrect.\nKaiseki will now exit.", output=True)
143
-
144
- Toolkit.pause_console()
145
-
146
- raise e # type: ignore
147
-
148
- ## other error, alert user and raise it
149
- except Exception as e:
150
-
151
- Toolkit.clear_console()
152
-
153
- Logger.log_action("Unknown error with creating translator object, The error is as follows " + str(e) + "\nKaiseki will now exit.", output=True)
154
-
155
- Toolkit.pause_console()
156
-
157
- raise e
158
-
159
- Toolkit.clear_console()
160
- Logger.log_barrier()
161
-
162
- ##-------------------start-of-reset_static_variables()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
163
-
164
- @staticmethod
165
- def reset_static_variables() -> None:
166
-
167
- """
168
-
169
- Resets the static variables of the Kaiseki class.
170
- For when running multiple translations in a row through webgui.
171
-
172
- """
173
-
174
- Logger.clear_batch()
175
-
176
- Kaiseki.text_to_translate = []
177
- Kaiseki.translated_text = []
178
- Kaiseki.je_check_text = []
179
- Kaiseki.error_text = []
180
- Kaiseki.translation_print_result = ""
181
- Kaiseki.sentence_parts = []
182
- Kaiseki.sentence_punctuation = []
183
- Kaiseki.special_punctuation = []
184
- Kaiseki.current_sentence = ""
185
- Kaiseki.translated_sentence = ""
186
-
187
- ##-------------------start-of-commence_translation()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
188
-
189
- @staticmethod
190
- def commence_translation() -> None:
191
-
192
- """
193
-
194
- Commences the translation process using all the functions in the Kaiseki class.
195
-
196
- """
197
-
198
- i = 0
199
-
200
- while(i < len(Kaiseki.text_to_translate)):
201
-
202
- ## for webgui, if the user presses the clear button, raise an exception to stop the translation
203
- if(FileEnsurer.do_interrupt == True):
204
- raise Exception("Interrupted by user.")
205
-
206
- Kaiseki.current_sentence = Kaiseki.text_to_translate[i]
207
-
208
- Logger.log_action("Initial Sentence : " + Kaiseki.current_sentence)
209
-
210
- ## Kaiseki is an in-place translation, so it'll build the translated text into Kaiseki.translated_text as it goes.
211
- if(any(char in Kaiseki.current_sentence for char in ["▼", "△", "◇"])):
212
- Kaiseki.translated_text.append(Kaiseki.current_sentence + '\n')
213
- Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is a pov change... leaving intact.")
214
-
215
- elif("part" in Kaiseki.current_sentence.lower() or all(char in ["1","2","3","4","5","6","7","8","9", " "] for char in Kaiseki.current_sentence) and not all(char in [" "] for char in Kaiseki.current_sentence) and Kaiseki.current_sentence != '"..."' and Kaiseki.current_sentence != "..."):
216
- Kaiseki.translated_text.append(Kaiseki.current_sentence + '\n')
217
- Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is part marker... leaving intact.")
218
-
219
- elif bool(re.match(r'^[\W_\s\n-]+$', Kaiseki.current_sentence)) and not any(char in Kaiseki.current_sentence for char in ["」", "「", "«", "»"]):
220
- Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is punctuation... skipping.")
221
- Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
222
-
223
- elif(bool(re.match(r'^[A-Za-z0-9\s\.,\'\?!]+\n*$', Kaiseki.current_sentence))):
224
- Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is english... skipping translation.")
225
- Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
226
-
227
- elif(len(Kaiseki.current_sentence) == 0 or Kaiseki.current_sentence.isspace() == True):
228
- Logger.log_action("Sentence is empty... skipping translation.\n")
229
- Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
230
-
231
- else:
232
-
233
- Kaiseki.separate_sentence()
234
-
235
- Kaiseki.translate_sentence()
236
-
237
- ## this is for adding a period if it's missing
238
- if(len(Kaiseki.translated_text[i]) > 0 and Kaiseki.translated_text[i] != "" and Kaiseki.translated_text[i][-2] not in string.punctuation and Kaiseki.sentence_punctuation[-1] == None):
239
- Kaiseki.translated_text[i] = Kaiseki.translated_text[i] + "."
240
-
241
- ## re-adds quotes
242
- if(Kaiseki.special_punctuation[0] == True):
243
- Kaiseki.translated_text[i] = '"' + Kaiseki.translated_text[i] + '"'
244
-
245
- ## replaces quotes because deepL messes up quotes
246
- elif('"' in Kaiseki.translated_text[i]):
247
- Kaiseki.translated_text[i] = Kaiseki.translated_text[i].replace('"',"'")
248
-
249
- ## re-adds single quotes
250
- if(Kaiseki.special_punctuation[3] == True):
251
- Kaiseki.translated_text[i] = "'" + Kaiseki.translated_text[i] + "'"
252
-
253
- ## re-adds parentheses
254
- if(Kaiseki.special_punctuation[4] == True):
255
- Kaiseki.translated_text[i] = "(" + Kaiseki.translated_text[i] + ")"
256
-
257
- Logger.log_action("Translated and Reassembled Sentence : " + Kaiseki.translated_text[i])
258
-
259
- Kaiseki.translated_text[i] += "\n"
260
-
261
- Kaiseki.je_check_text.append(str(i+1) + ": " + Kaiseki.current_sentence + "\n " + Kaiseki.translated_text[i] + "\n")
262
-
263
- i+=1
264
-
265
- Toolkit.clear_console()
266
-
267
- Logger.log_action(str(i) + "/" + str(len(Kaiseki.text_to_translate)) + " completed.", output=True)
268
- Logger.log_barrier()
269
-
270
- ##-------------------start-of-separate_sentence()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
271
-
272
- @staticmethod
273
- def separate_sentence() -> None:
274
-
275
- """
276
-
277
- This function separates the sentence into parts and punctuation.
278
-
279
- """
280
-
281
- ## resets variables for current_sentence
282
- Kaiseki.sentence_parts = []
283
- Kaiseki.sentence_punctuation = []
284
- Kaiseki.special_punctuation = [False,False,False,False,False]
285
-
286
- i = 0
287
-
288
- buildString = ""
289
-
290
- ## checks if quotes are in the sentence and removes them
291
- if('"' in Kaiseki.current_sentence):
292
- Kaiseki.current_sentence = Kaiseki.current_sentence.replace('"', '')
293
- Kaiseki.special_punctuation[0] = True
294
-
295
- ## checks if tildes are in the sentence
296
- if('~' in Kaiseki.current_sentence):
297
- Kaiseki.special_punctuation[1] = True
298
-
299
- ## checks if apostrophes are in the sentence but not at the beginning or end
300
- if(Kaiseki.current_sentence.count("'") == 2 and (Kaiseki.current_sentence[0] != "'" and Kaiseki.current_sentence[-1] != "'")):
301
- Kaiseki.special_punctuation[2] = True
302
-
303
- ## checks if apostrophes are in the sentence and removes them
304
- elif(Kaiseki.current_sentence.count("'") == 2):
305
- Kaiseki.special_punctuation[3] = True
306
- Kaiseki.current_sentence = Kaiseki.current_sentence.replace("'", "")
307
-
308
- ## checks if parentheses are in the sentence and removes them
309
- if("(" in Kaiseki.current_sentence and ")" in Kaiseki.current_sentence):
310
- Kaiseki.special_punctuation[4] = True
311
- Kaiseki.current_sentence= Kaiseki.current_sentence.replace("(","")
312
- Kaiseki.current_sentence= Kaiseki.current_sentence.replace(")","")
313
-
314
- while(i < len(Kaiseki.current_sentence)):
315
-
316
- if(Kaiseki.current_sentence[i] in [".","!","?","-"]):
317
-
318
- if(i+5 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+6] in ["......"]):
319
-
320
- if(i+6 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+7] in ["......'"]):
321
- buildString += "'"
322
- i+=1
323
-
324
- if(buildString != ""):
325
- Kaiseki.sentence_parts.append(buildString)
326
-
327
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+6])
328
- i+=5
329
- buildString = ""
330
-
331
- if(i+4 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+5] in [".....","...!?"]):
332
-
333
- if(i+5 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+6] in [".....'","...!?'"]):
334
- buildString += "'"
335
- i+=1
336
-
337
- if(buildString != ""):
338
- Kaiseki.sentence_parts.append(buildString)
339
-
340
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+5])
341
- i+=4
342
- buildString = ""
343
-
344
- elif(i+3 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+4] in ["...!","...?","---.","....","!..."]):
345
-
346
- if(i+4 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+5] in ["...!'","...?'","---.'","....'","!...'"]):
347
- buildString += "'"
348
- i+=1
349
-
350
- if(buildString != ""):
351
- Kaiseki.sentence_parts.append(buildString)
352
-
353
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+4])
354
- i+=3
355
- buildString = ""
356
-
357
- elif(i+2 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+3] in ["---","..."]):
358
-
359
- if(i+3 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+4] in ["---'","...'"]):
360
- buildString += "'"
361
- i+=1
362
-
363
- if(buildString != ""):
364
- Kaiseki.sentence_parts.append(buildString)
365
-
366
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+3])
367
- i+=2
368
- buildString = ""
369
-
370
- elif(i+1 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+2] == '!?'):
371
-
372
- if(i+2 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+3] == "!?'"):
373
- buildString += "'"
374
- i+=1
375
-
376
- if(buildString != ""):
377
- Kaiseki.sentence_parts.append(buildString)
378
-
379
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+2])
380
- i+=1
381
- buildString = ""
382
-
383
- ## if punctuation that was found is not a hyphen then just follow normal punctuation separation rules
384
- elif(Kaiseki.current_sentence[i] != "-"):
385
-
386
- if(i+1 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i+1] == "'"):
387
- buildString += "'"
388
-
389
- if(buildString != ""):
390
- Kaiseki.sentence_parts.append(buildString)
391
-
392
- Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i])
393
- buildString = ""
394
-
395
- ## if it is just a singular hyphen, do not consider it punctuation as they are used in honorifics
396
- else:
397
- buildString += Kaiseki.current_sentence[i]
398
- else:
399
- buildString += Kaiseki.current_sentence[i]
400
-
401
- i += 1
402
-
403
- ## if end of line, add none punctuation which means a period needs to be added later
404
- if(buildString):
405
- Kaiseki.sentence_parts.append(buildString)
406
- Kaiseki.sentence_punctuation.append(None)
407
-
408
- Logger.log_action("Fragmented Sentence Parts " + str(Kaiseki.sentence_parts))
409
- Logger.log_action("Sentence Punctuation " + str(Kaiseki.sentence_punctuation))
410
- Logger.log_action("Does Sentence Have Special Punctuation : " + str(Kaiseki.special_punctuation))
411
-
412
- ## strip the sentence parts
413
- Kaiseki.sentence_parts = [part.strip() for part in Kaiseki.sentence_parts]
414
-
415
- ##-------------------start-of-translate_sentence()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
416
-
417
- @staticmethod
418
- def translate_sentence() -> None:
419
-
420
- """
421
-
422
- This function translates each part of a sentence.
423
-
424
- """
425
-
426
- i = 0
427
- ii = 0
428
-
429
- quote = ""
430
- error = ""
431
-
432
- tilde_active = False
433
- single_quote_active = False
434
-
435
- while(i < len(Kaiseki.sentence_parts)):
436
-
437
- ## if tilde is present in part, delete it and set tilde active to true, so we can add it back in a bit
438
- if(Kaiseki.special_punctuation[1] == True and "~" in Kaiseki.sentence_parts[i]):
439
- Kaiseki.sentence_parts[i] = Kaiseki.sentence_parts[i].replace("~","")
440
- tilde_active = True
441
-
442
- ## a quote is present in the sentence, but not enclosing the sentence, we need to isolate it
443
- if(Kaiseki.special_punctuation[2] == True and "'" in Kaiseki.sentence_parts[i] and (Kaiseki.sentence_parts[i][0] != "'" and Kaiseki.sentence_parts[i][-1] != "'")):
444
-
445
- sentence = Kaiseki.sentence_parts[i]
446
- substring_start = sentence.index("'")
447
- substring_end = 0
448
- quote = ""
449
-
450
- ii = substring_start
451
- while(ii < len(sentence)):
452
- if(sentence[ii] == "'"):
453
- substring_end = ii
454
- ii+=1
455
-
456
- quote = sentence[substring_start+1:substring_end]
457
- Kaiseki.sentence_parts[i] = sentence[:substring_start+1] + "quote" + sentence[substring_end:]
458
-
459
- single_quote_active = True
460
-
461
- try:
462
- results = EasyTL.deepl_translate(text=Kaiseki.sentence_parts[i], source_lang= "JA", target_lang="EN-US")
463
-
464
- assert isinstance(results, str), "ValueError: " + str(results)
465
-
466
- translated_part = results.rstrip(''.join(c for c in string.punctuation if c not in "'\""))
467
- translated_part = translated_part.rstrip()
468
-
469
- ## here we re-add the tilde, (note not always accurate but mostly is)
470
- if(tilde_active == True):
471
- translated_part += "~"
472
- tilde_active = False
473
-
474
- ## translates the quote and re-adds it back to the sentence part
475
- if(single_quote_active == True):
476
- results = EasyTL.deepl_translate(text=quote, source_lang= "JA", target_lang="EN-US") ## translate the isolated quote, not the placeholder sentence
477
-
478
- assert isinstance(results, str), "ValueError: " + str(results)
479
-
480
- quote = results.rstrip(''.join(c for c in string.punctuation if c not in "'\""))
481
- quote = quote.rstrip()
482
-
483
- translated_part = translated_part.replace("'quote'","'" + quote + "'",1)
484
-
485
- ## if punctuation appears first and before any text, add the punctuation and remove it from the list.
486
- if(len(Kaiseki.sentence_punctuation) > len(Kaiseki.sentence_parts)):
487
- Kaiseki.translated_sentence += Kaiseki.sentence_punctuation[0]
488
- Kaiseki.sentence_punctuation.pop(0)
489
-
490
- if(Kaiseki.sentence_punctuation[i] != None):
491
- Kaiseki.translated_sentence += translated_part + Kaiseki.sentence_punctuation[i]
492
- else:
493
- Kaiseki.translated_sentence += translated_part
494
-
495
- if(i != len(Kaiseki.sentence_punctuation)-1):
496
- Kaiseki.translated_sentence += " "
497
-
498
- except QuotaExceededException as e:
499
-
500
- Logger.log_action("DeepL API quota exceeded.", output=True)
501
-
502
- Toolkit.pause_console()
503
-
504
- raise e
505
-
506
- except ValueError as e:
507
-
508
- if(str(e) == "Text must not be empty."):
509
- Kaiseki.translated_sentence += ""
510
- else:
511
- Kaiseki.translated_sentence += "ERROR"
512
- error = str(e)
513
-
514
- Logger.log_action("Error is : " + error)
515
- Kaiseki.error_text.append("Error is : " + error)
516
-
517
- i+=1
518
-
519
- Kaiseki.translated_text.append(Kaiseki.translated_sentence)
520
- Kaiseki.translated_sentence = ""
521
-
522
- ##-------------------start-of-assemble_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
523
-
524
- @staticmethod
525
- def assemble_results(time_start:float, time_end:float) -> None:
526
-
527
- """
528
-
529
- Prepares the results of the translation for printing.
530
-
531
- Parameters:
532
- time_start (float) : the time the translation started.
533
- time_end (float) : the time the translation ended.
534
-
535
- """
536
-
537
- Kaiseki.translation_print_result += "Time Elapsed : " + Toolkit.get_elapsed_time(time_start, time_end)
538
-
539
- Kaiseki.translation_print_result += "\n\nDebug text have been written to : " + FileEnsurer.debug_log_path
540
- Kaiseki.translation_print_result += "\nJ->E text have been written to : " + FileEnsurer.je_check_path
541
- Kaiseki.translation_print_result += "\nTranslated text has been written to : " + FileEnsurer.translated_text_path
542
- Kaiseki.translation_print_result += "\nErrors have been written to : " + FileEnsurer.error_log_path + "\n"
543
-
544
- ##-------------------start-of-write_kaiseki_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
545
-
546
- @staticmethod
547
- @permission_error_decorator()
548
- def write_kaiseki_results() -> None:
549
-
550
- """
551
-
552
- This function is called to write the results of the Kaiseki translation module to the output directory.
553
-
554
- """
555
-
556
- ## ensures the output directory exists, cause it could get moved or fucked with.
557
- FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
558
-
559
- with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
560
- file.writelines(Kaiseki.error_text)
561
-
562
- with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
563
- file.writelines(Kaiseki.je_check_text)
564
-
565
- with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
566
- file.writelines(Kaiseki.translated_text)
567
-
568
- ## Instructions to create a copy of the output for archival
569
- FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
570
-
571
- timestamp = Toolkit.get_timestamp(is_archival=True)
572
-
573
- ## pushes the tl debug log to the file without clearing the file
574
- Logger.push_batch()
575
- Logger.clear_batch()
576
-
577
- list_of_result_tuples = [('kaiseki_translated_text', Kaiseki.translated_text),
578
- ('kaiseki_je_check_text', Kaiseki.je_check_text),
579
- ('kaiseki_error_log', Kaiseki.error_text),
580
- ('debug_log', FileEnsurer.standard_read_file(Logger.log_file_path))]
581
-
582
- FileEnsurer.archive_results(list_of_result_tuples,
583
- module='kaiseki', timestamp=timestamp)
modules/common/exceptions.py CHANGED
@@ -1,12 +1,33 @@
1
  ## third-party libraries
2
  ## for importing, other scripts will use from common.exceptions instead of from the third-party libraries themselves
3
- from easytl import AuthenticationError, InternalServerError, RateLimitError, APITimeoutError
4
- from easytl import AuthorizationException, QuotaExceededException
5
- from easytl import GoogleAuthError
6
 
7
  ##-------------------start-of-MaxBatchDurationExceededException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8
 
9
- class MaxBatchDurationExceededException(Exception):
10
 
11
  """
12
 
@@ -27,7 +48,7 @@ class MaxBatchDurationExceededException(Exception):
27
 
28
  ##-------------------start-of-InvalidAPIKeyException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
29
 
30
- class InvalidAPIKeyException(Exception):
31
 
32
  """
33
 
@@ -48,7 +69,7 @@ class InvalidAPIKeyException(Exception):
48
 
49
  ##-------------------start-of-TooManyFileAccessAttemptsException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
50
 
51
- class TooManyFileAccessAttemptsException(Exception):
52
 
53
  """
54
 
 
1
  ## third-party libraries
2
  ## for importing, other scripts will use from common.exceptions instead of from the third-party libraries themselves
3
+ from easytl import OpenAIAuthenticationError, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, OpenAIAPIConnectionError, OpenAIAPIStatusError
4
+ from easytl import DeepLAuthorizationException, DeepLQuotaExceededException, DeepLException
5
+ from easytl import GoogleAuthError, GoogleAPIError
6
+
7
+ ##-------------------start-of-KudasaiException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8
+
9
+ class KudasaiException(Exception):
10
+
11
+ """
12
+
13
+ KudasaiException is an exception that is raised when an error occurs within the Kudasai Application.
14
+
15
+ """
16
+
17
+ def __init__(self, message:str) -> None:
18
+
19
+ """
20
+
21
+ Parameters:
22
+ message (string) : The message to display.
23
+
24
+ """
25
+
26
+ self.message = message
+
+ super().__init__(self.message) ## so str(exception) includes the message
27
 
28
  ##-------------------start-of-MaxBatchDurationExceededException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
29
 
30
+ class MaxBatchDurationExceededException(KudasaiException):
31
 
32
  """
33
 
 
48
 
49
  ##-------------------start-of-InvalidAPIKeyException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
50
 
51
+ class InvalidAPIKeyException(KudasaiException):
52
 
53
  """
54
 
 
69
 
70
  ##-------------------start-of-TooManyFileAccessAttemptsException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
71
 
72
+ class TooManyFileAccessAttemptsException(KudasaiException):
73
 
74
  """
75
 
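The practical effect of the new shared base class is that callers can catch every Kudasai-specific error with one handler, for example:

```python
from modules.common.exceptions import KudasaiException

try:
    raise KudasaiException("example failure")
except KudasaiException as error:
    print("Kudasai error: " + error.message)
```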
modules/common/file_ensurer.py CHANGED
@@ -4,10 +4,10 @@ import traceback
4
  import json
5
  import typing
6
  import shutil
 
7
 
8
  ## custom modules
9
  from modules.common.decorators import permission_error_decorator
10
- from modules.common.logger import Logger
11
  from modules.common.toolkit import Toolkit
12
 
13
  class FileEnsurer():
@@ -16,7 +16,7 @@ class FileEnsurer():
16
 
17
  FileEnsurer is a class that is used to ensure that the required files and directories exist.
18
  Also serves as a place to store the paths to the files and directories. Some file related functions are also stored here.
19
- As well as some variables that are used to store the default kijiku rules and the allowed models across Kudasai.
20
 
21
  """
22
 
@@ -28,37 +28,36 @@ class FileEnsurer():
28
  hugging_face_flag = os.path.join(script_dir, "util", "hugging_face_flag.py")
29
 
30
  ## main dirs (config is just under userprofile on windows, and under home on linux); secrets are under appdata on windows, and under .config on linux
31
- if(os.name == 'nt'): ## Windows
32
  config_dir = os.path.join(os.environ['USERPROFILE'],"KudasaiConfig")
33
  secrets_dir = os.path.join(os.environ['APPDATA'],"KudasaiSecrets")
34
  else: ## Linux
35
  config_dir = os.path.join(os.path.expanduser("~"), "KudasaiConfig")
36
  secrets_dir = os.path.join(os.path.expanduser("~"), ".config", "KudasaiSecrets")
37
 
38
- Logger.log_file_path = os.path.join(output_dir, "debug_log.txt")
39
-
40
  ##----------------------------------/
41
 
42
  ## sub dirs
43
  lib_dir = os.path.join(script_dir, "lib")
 
44
  gui_lib = os.path.join(lib_dir, "gui")
45
  jsons_dir = os.path.join(script_dir, "jsons")
46
 
47
  ##----------------------------------/
48
 
49
  ## output files
50
- preprocessed_text_path = os.path.join(output_dir, "preprocessed_text.txt") ## path for the preprocessed text
51
- translated_text_path = os.path.join(output_dir, "translated_text.txt") ## path for translated text
52
 
53
- je_check_path = os.path.join(output_dir, "je_check_text.txt") ## path for je check text (text generated by the translation modules to compare against the translated text)
54
 
55
- kairyou_log_path = os.path.join(output_dir, "preprocessing_results.txt") ## path for kairyou log (the results of preprocessing)
56
- error_log_path = os.path.join(output_dir, "error_log.txt") ## path for the error log (errors generated by the preprocessing and translation modules)
57
- debug_log_path = Logger.log_file_path ## path for the debug log (debug info generated by the preprocessing and translation modules)
58
 
59
- ## kijiku rules
60
- external_kijiku_rules_path = os.path.join(script_dir,'kijiku_rules.json')
61
- config_kijiku_rules_path = os.path.join(config_dir,'kijiku_rules.json')
62
 
63
  ## api keys
64
  deepl_api_key_path = os.path.join(secrets_dir, "deepl_api_key.txt")
@@ -68,8 +67,14 @@ class FileEnsurer():
68
  ## favicon
69
  favicon_path = os.path.join(gui_lib, "Kudasai_Logo.png")
70
 
71
- DEFAULT_KIJIKU_RULES = {
72
- "base kijiku settings": {
 
 
 
 
 
 
73
  "prompt_assembly_mode": 1,
74
  "number_of_lines_per_batch": 36,
75
  "sentence_fragmenter_mode": 2,
@@ -103,9 +108,16 @@ class FileEnsurer():
103
  "gemini_stream": False,
104
  "gemini_stop_sequences": None,
105
  "gemini_max_output_tokens": None
 
 
 
 
 
 
 
106
  }
107
  }
108
- INVALID_KIJIKU_RULES_PLACEHOLDER = {
109
  "INVALID JSON":
110
  {
111
  "INVALID JSON":"INVALID JSON"
@@ -127,6 +139,9 @@ class FileEnsurer():
127
 
128
  Determines if Kudasai is running on a Hugging Face server.
129
 
 
 
 
130
  """
131
 
132
  return os.path.exists(FileEnsurer.hugging_face_flag)
@@ -144,8 +159,6 @@ class FileEnsurer():
144
 
145
  print("Cleaning up and exiting...")
146
 
147
- Logger.push_batch()
148
-
149
  exit()
150
 
151
  ##-------------------start-of-setup_needed_files()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -158,6 +171,9 @@ class FileEnsurer():
158
 
159
  Ensures that the required files and directories exist.
160
 
 
 
 
161
  """
162
 
163
 
@@ -168,9 +184,6 @@ class FileEnsurer():
168
  FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
169
  FileEnsurer.standard_create_directory(FileEnsurer.secrets_dir)
170
 
171
- ## creates and clears the log file
172
- Logger.clear_log_file()
173
-
174
  ## creates the 5 output files
175
  FileEnsurer.standard_create_file(FileEnsurer.preprocessed_text_path)
176
  FileEnsurer.standard_create_file(FileEnsurer.translated_text_path)
@@ -179,9 +192,9 @@ class FileEnsurer():
179
  FileEnsurer.standard_create_file(FileEnsurer.error_log_path)
180
 
181
  ## creates the kijiku rules file if it doesn't exist
182
- if(os.path.exists(FileEnsurer.config_kijiku_rules_path) == False):
183
- with open(FileEnsurer.config_kijiku_rules_path, 'w+', encoding='utf-8') as file:
184
- json.dump(FileEnsurer.DEFAULT_KIJIKU_RULES, file)
185
 
186
  ##-------------------start-of-purge_storage()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
187
 
@@ -193,10 +206,16 @@ class FileEnsurer():
193
 
194
  In case of hugging face, purges the storage.
195
 
 
 
 
196
  """
197
 
198
  if(not FileEnsurer.is_hugging_space()):
 
199
  return
 
 
200
 
201
  stuff_to_purge = [
202
  FileEnsurer.secrets_dir,
@@ -243,11 +262,14 @@ class FileEnsurer():
243
  Parameters:
244
  directory_path (str) : path to the directory to be created.
245
 
 
 
 
246
  """
247
 
248
  if(os.path.isdir(directory_path) == False):
249
  os.makedirs(directory_path)
250
- Logger.log_action(directory_path + " created due to lack of the folder")
251
 
252
  ##--------------------start-of-standard_create_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
253
 
@@ -262,10 +284,13 @@ class FileEnsurer():
262
  Parameters:
263
  file_path (str) : path to the file to be created.
264
 
 
 
 
265
  """
266
 
267
  if(os.path.exists(file_path) == False):
268
- Logger.log_action(file_path + " was created due to lack of the file")
269
  with open(file_path, "w+", encoding="utf-8") as file:
270
  file.truncate()
271
 
@@ -286,12 +311,15 @@ class FileEnsurer():
286
  Returns:
287
  bool : whether or not the file was overwritten.
288
 
 
 
 
289
  """
290
 
291
  did_overwrite = False
292
 
293
  if(os.path.exists(file_path) == False or os.path.getsize(file_path) == 0):
294
- Logger.log_action(file_path + " was created due to lack of the file or because it is blank")
295
  with open(file_path, "w+", encoding="utf-8") as file:
296
  file.write(content_to_write)
297
 
@@ -312,7 +340,10 @@ class FileEnsurer():
312
  Parameters:
313
  file_path (str) : path to the file to be overwritten.
314
  content to write (str) : content to be written to the file.
315
- omit (bool | optional) : whether or not to omit the content from the log.
 
 
 
316
 
317
  """
318
 
@@ -322,7 +353,7 @@ class FileEnsurer():
322
  if(omit):
323
  content_to_write = "(Content was omitted)"
324
 
325
- Logger.log_action(file_path + " was overwritten with the following content: " + content_to_write)
326
 
327
  ##--------------------start-of-clear_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
328
 
@@ -337,12 +368,15 @@ class FileEnsurer():
337
  Parameters:
338
  file_path (str) : path to the file to be cleared.
339
 
 
 
 
340
  """
341
 
342
  with open(file_path, "w+", encoding="utf-8") as file:
343
  file.truncate()
344
 
345
- Logger.log_action(file_path + " was cleared")
346
 
347
  ##--------------------start-of-standard_read_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
348
 
@@ -360,6 +394,9 @@ class FileEnsurer():
360
  Returns:
361
  content (str) : the content of the file.
362
 
 
 
 
363
  """
364
 
365
  with open(file_path, "r", encoding="utf-8") as file:
@@ -379,17 +416,16 @@ class FileEnsurer():
379
  Parameters:
380
  critical_exception (object - Exception) : the exception to be handled.
381
 
382
- """
 
383
 
384
- Logger.log_barrier()
385
- Logger.log_action("Kudasai has crashed", output=True)
386
- Logger.log_action("Please send the following to the developer on github at https://github.com/Bikatr7/Kudasai/issues :", output=True, omit_timestamp=True)
387
 
388
  traceback_msg = traceback.format_exc()
389
 
390
- Logger.log_action(traceback_msg ,output=True, omit_timestamp=True)
391
-
392
- Logger.push_batch()
393
 
394
  Toolkit.pause_console()
395
 
@@ -411,6 +447,9 @@ class FileEnsurer():
411
  module (str) : name of the module that generated the results.
412
  timestamp (str) : timestamp of when the results were generated.
413
 
 
 
 
414
  """
415
 
416
  archival_path = os.path.join(FileEnsurer.archive_dir, f'{module}_run_{timestamp}')
@@ -439,6 +478,9 @@ class FileEnsurer():
439
  Returns:
440
  json_object (dict) : the json object.
441
 
 
 
 
442
  """
443
 
444
  with open(file_path, "r", encoding="utf-8") as file:
@@ -462,6 +504,9 @@ class FileEnsurer():
462
  error_log (str) : the log of any errors that occurred during preprocessing.
463
  timestamp (str) : the timestamp of when the results were generated (Can be obtained from Toolkit.get_timestamp(is_archival=True))
464
 
 
 
 
465
  """
466
 
467
  ## ensures the output directory exists, cause it could get moved or fucked with.
@@ -485,14 +530,12 @@ class FileEnsurer():
485
  ## Instructions to create a copy of the output for archival
486
  FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
487
 
488
- Logger.push_batch()
489
- Logger.clear_batch()
490
-
491
  list_of_result_tuples = [('kairyou_preprocessed_text', text_to_preprocess),
492
- ('kairyou_preprocessing_log', preprocessing_log),
493
- ('kairyou_error_log', error_log),
494
- ('debug_log', FileEnsurer.standard_read_file(Logger.log_file_path))]
495
 
496
  FileEnsurer.archive_results(list_of_result_tuples,
497
- module='kairyou', timestamp=timestamp)
 
498
 
 
4
  import json
5
  import typing
6
  import shutil
7
+ import logging
8
 
9
  ## custom modules
10
  from modules.common.decorators import permission_error_decorator
 
11
  from modules.common.toolkit import Toolkit
12
 
13
  class FileEnsurer():
 
16
 
17
  FileEnsurer is a class that is used to ensure that the required files and directories exist.
18
  Also serves as a place to store the paths to the files and directories. Some file related functions are also stored here.
19
+ As well as some variables that are used to store the default translation settings and the allowed models across Kudasai.
20
 
21
  """
22
 
 
28
  hugging_face_flag = os.path.join(script_dir, "util", "hugging_face_flag.py")
29
 
30
  ## main dirs (config is just under userprofile on windows, and under home on linux); secrets are under appdata on windows, and under .config on linux
31
+ if(Toolkit.is_windows()): ## Windows
32
  config_dir = os.path.join(os.environ['USERPROFILE'],"KudasaiConfig")
33
  secrets_dir = os.path.join(os.environ['APPDATA'],"KudasaiSecrets")
34
  else: ## Linux
35
  config_dir = os.path.join(os.path.expanduser("~"), "KudasaiConfig")
36
  secrets_dir = os.path.join(os.path.expanduser("~"), ".config", "KudasaiSecrets")
37
 
 
 
38
  ##----------------------------------/
39
 
40
  ## sub dirs
41
  lib_dir = os.path.join(script_dir, "lib")
42
+ common_lib = os.path.join(lib_dir, "common")
43
  gui_lib = os.path.join(lib_dir, "gui")
44
  jsons_dir = os.path.join(script_dir, "jsons")
45
 
46
  ##----------------------------------/
47
 
48
  ## output files
49
+ preprocessed_text_path = os.path.join(output_dir, "preprocessed_text.txt")
50
+ translated_text_path = os.path.join(output_dir, "translated_text.txt")
51
 
52
+ je_check_path = os.path.join(output_dir, "je_check_text.txt")
53
 
54
+ kairyou_log_path = os.path.join(output_dir, "preprocessing_results.txt")
55
+ error_log_path = os.path.join(output_dir, "error_log.txt")
56
+ debug_log_path = os.path.join(output_dir, "debug_log.txt")
57
 
58
+ ## translation settings
59
+ external_translation_settings_path = os.path.join(script_dir,'translation_settings.json')
60
+ config_translation_settings_path = os.path.join(config_dir,'translation_settings.json')
61
 
62
  ## api keys
63
  deepl_api_key_path = os.path.join(secrets_dir, "deepl_api_key.txt")
 
67
  ## favicon
68
  favicon_path = os.path.join(gui_lib, "Kudasai_Logo.png")
69
 
70
+ ## js save to file
71
+ js_save_to_file_path = os.path.join(gui_lib, "save_to_file.js")
72
+
73
+ ## translation settings description
74
+ translation_settings_description_path = os.path.join(common_lib, "translation_settings_description.txt")
75
+
76
+ DEFAULT_TRANSLATION_SETTING = {
77
+ "base translation settings": {
78
  "prompt_assembly_mode": 1,
79
  "number_of_lines_per_batch": 36,
80
  "sentence_fragmenter_mode": 2,
 
108
  "gemini_stream": False,
109
  "gemini_stop_sequences": None,
110
  "gemini_max_output_tokens": None
111
+ },
112
+
113
+ "deepl settings":{
114
+ "deepl_context": "",
115
+ "deepl_split_sentences": "ALL",
116
+ "deepl_preserve_formatting": True,
117
+ "deepl_formality": "default"
118
  }
119
  }
120
+ INVALID_TRANSLATION_SETTINGS_PLACEHOLDER = {
121
  "INVALID JSON":
122
  {
123
  "INVALID JSON":"INVALID JSON"
 
139
 
140
  Determines if Kudasai is running on a Hugging Face server.
141
 
142
+ Returns:
143
+ bool : whether or not Kudasai is running on a Hugging Face server.
144
+
145
  """
146
 
147
  return os.path.exists(FileEnsurer.hugging_face_flag)
 
159
 
160
  print("Cleaning up and exiting...")
161
 
 
 
162
  exit()
163
 
164
  ##-------------------start-of-setup_needed_files()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
171
 
172
  Ensures that the required files and directories exist.
173
 
174
+ Decorated By:
175
+ permission_error_decorator
176
+
177
  """
178
 
179
 
 
184
  FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
185
  FileEnsurer.standard_create_directory(FileEnsurer.secrets_dir)
186
 
 
 
 
187
  ## creates the 5 output files
188
  FileEnsurer.standard_create_file(FileEnsurer.preprocessed_text_path)
189
  FileEnsurer.standard_create_file(FileEnsurer.translated_text_path)
 
192
  FileEnsurer.standard_create_file(FileEnsurer.error_log_path)
193
 
194
  ## creates the kijiku rules file if it doesn't exist
195
+ if(os.path.exists(FileEnsurer.config_translation_settings_path) == False):
196
+ with open(FileEnsurer.config_translation_settings_path, 'w+', encoding='utf-8') as file:
197
+ json.dump(FileEnsurer.DEFAULT_TRANSLATION_SETTING, file)
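For context, this create-if-missing write is the entire settings bootstrap: defaults are only written when the user has no settings file yet, and later runs read back whatever was customized. A minimal sketch of the pattern, with an illustrative path and defaults (the real values live on FileEnsurer):

## a sketch of the create-if-missing settings pattern; path and
## defaults below are illustrative stand-ins
import json
import os

DEFAULTS = {"base translation settings": {"number_of_lines_per_batch": 36}}
settings_path = "translation_settings.json"

## only write the defaults if the user has no settings file yet
if(os.path.exists(settings_path) == False):
    with open(settings_path, 'w+', encoding='utf-8') as file:
        json.dump(DEFAULTS, file)

## subsequent runs read whatever the user has customized
with open(settings_path, 'r', encoding='utf-8') as file:
    settings = json.load(file)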
198
 
199
  ##-------------------start-of-purge_storage()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
200
 
 
206
 
207
  In case of hugging face, purges the storage.
208
 
209
+ Decorated By:
210
+ permission_error_decorator
211
+
212
  """
213
 
214
  if(not FileEnsurer.is_hugging_space()):
215
+ logging.debug("Not running on Hugging Face, skipping storage purge")
216
  return
217
+
218
+ logging.debug("Running on Hugging Face, purging storage")
219
 
220
  stuff_to_purge = [
221
  FileEnsurer.secrets_dir,
 
262
  Parameters:
263
  directory_path (str) : path to the directory to be created.
264
 
265
+ Decorated By:
266
+ permission_error_decorator
267
+
268
  """
269
 
270
  if(os.path.isdir(directory_path) == False):
271
  os.makedirs(directory_path)
272
+ logging.debug(directory_path + " was created because it did not exist")
273
 
274
  ##--------------------start-of-standard_create_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
275
 
 
284
  Parameters:
285
  file_path (str) : path to the file to be created.
286
 
287
+ Decorated By:
288
+ permission_error_decorator
289
+
290
  """
291
 
292
  if(os.path.exists(file_path) == False):
293
+ logging.debug(file_path + " was created because it did not exist")
294
  with open(file_path, "w+", encoding="utf-8") as file:
295
  file.truncate()
296
 
 
311
  Returns:
312
  bool : whether or not the file was overwritten.
313
 
314
+ Decorated By:
315
+ permission_error_decorator
316
+
317
  """
318
 
319
  did_overwrite = False
320
 
321
  if(os.path.exists(file_path) == False or os.path.getsize(file_path) == 0):
322
+ logging.debug(file_path + " was created because it was missing or blank")
323
  with open(file_path, "w+", encoding="utf-8") as file:
324
  file.write(content_to_write)
325
 
 
340
  Parameters:
341
  file_path (str) : path to the file to be overwritten.
342
  content_to_write (str) : content to be written to the file.
343
+ omit (bool | optional | default=True) : whether or not to omit the content from the log.
344
+
345
+ Decorated By:
346
+ permission_error_decorator
347
 
348
  """
349
 
 
353
  if(omit):
354
  content_to_write = "(Content was omitted)"
355
 
356
+ logging.debug(file_path + " was overwritten with the following content: " + content_to_write)
357
 
358
  ##--------------------start-of-clear_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
359
 
 
368
  Parameters:
369
  file_path (str) : path to the file to be cleared.
370
 
371
+ Decorated By:
372
+ permission_error_decorator
373
+
374
  """
375
 
376
  with open(file_path, "w+", encoding="utf-8") as file:
377
  file.truncate()
378
 
379
+ logging.debug(file_path + " was cleared")
380
 
381
  ##--------------------start-of-standard_read_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
382
 
 
394
  Returns:
395
  content (str) : the content of the file.
396
 
397
+ Decorated By:
398
+ permission_error_decorator
399
+
400
  """
401
 
402
  with open(file_path, "r", encoding="utf-8") as file:
 
416
  Parameters:
417
  critical_exception (object - Exception) : the exception to be handled.
418
 
419
+ Decorated By:
420
+ permission_error_decorator
421
 
422
+ """
 
 
423
 
424
  traceback_msg = traceback.format_exc()
425
 
426
+ logging.error(f"Kudasai has crashed "
427
+ f"Please send the following to the developer on github at https://github.com/Bikatr7/Kudasai/issues :"
428
+ f"{traceback_msg}")
429
 
430
  Toolkit.pause_console()
431
 
 
447
  module (str) : name of the module that generated the results.
448
  timestamp (str) : timestamp of when the results were generated.
449
 
450
+ Decorated By:
451
+ permission_error_decorator
452
+
453
  """
454
 
455
  archival_path = os.path.join(FileEnsurer.archive_dir, f'{module}_run_{timestamp}')
 
478
  Returns:
479
  json_object (dict) : the json object.
480
 
481
+ Decorated By:
482
+ permission_error_decorator
483
+
484
  """
485
 
486
  with open(file_path, "r", encoding="utf-8") as file:
 
504
  error_log (str) : the log of any errors that occurred during preprocessing.
505
  timestamp (str) : the timestamp of when the results were generated (Can be obtained from Toolkit.get_timestamp(is_archival=True))
506
 
507
+ Decorated By:
508
+ permission_error_decorator
509
+
510
  """
511
 
512
  ## ensures the output directory exists, since it could have been moved or tampered with.
 
530
  ## Instructions to create a copy of the output for archival
531
  FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
532
 
 
 
 
533
  list_of_result_tuples = [('kairyou_preprocessed_text', text_to_preprocess),
534
+ ('kairyou_preprocessing_log', preprocessing_log),
535
+ ('kairyou_error_log', error_log),
536
+ ('debug_log', FileEnsurer.standard_read_file(FileEnsurer.debug_log_path))]
537
 
538
  FileEnsurer.archive_results(list_of_result_tuples,
539
+ module='kairyou',
540
+ timestamp=timestamp)
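The archival call fans each (name, content) tuple out into a per-run folder. A minimal sketch of what it does, assuming the archive_dir/{module}_run_{timestamp} naming shown at the top of this diff; the .txt extension per entry is an assumption:

## a sketch of the archival write; the .txt extension is assumed
import os

def archive_results(results:list, module:str, timestamp:str, archive_dir:str) -> None:
    archival_path = os.path.join(archive_dir, f'{module}_run_{timestamp}')
    os.makedirs(archival_path, exist_ok=True)

    ## one file per (name, content) result tuple
    for name, content in results:
        with open(os.path.join(archival_path, f'{name}.txt'), 'w', encoding='utf-8') as file:
            file.write(content)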
541
 
modules/common/logger.py DELETED
@@ -1,132 +0,0 @@
1
- ## custom modules
2
- from modules.common.toolkit import Toolkit
3
- from modules.common.decorators import permission_error_decorator
4
-
5
- class Logger:
6
-
7
- """
8
-
9
- The logger class is used to log actions taken by Kudasai.
10
-
11
- """
12
-
13
- log_file_path = ""
14
-
15
- current_batch = ""
16
-
17
- errors = []
18
-
19
- ##--------------------start-of-log_action()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
20
-
21
- @staticmethod
22
- def log_action(action:str, output:bool=False, omit_timestamp:bool=False) -> None:
23
-
24
- """
25
-
26
- Logs an action.
27
-
28
- Parameters:
29
- action (str) : the action being logged.
30
- output (bool | optional | defaults to false) : whether or not to output the action to the console.
31
- omit_timestamp (bool | optional | defaults to false) : whether or not to omit the timestamp from the action.
32
-
33
- """
34
-
35
- timestamp = Toolkit.get_timestamp()
36
-
37
- log_line = timestamp + action + "\n"
38
-
39
- Logger.current_batch += log_line
40
-
41
- if(omit_timestamp):
42
- log_line = action
43
-
44
- if(output):
45
- print(log_line)
46
-
47
- ##--------------------start-of-log_error()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
48
-
49
- @staticmethod
50
- def log_error(action:str, output:bool=False, omit_timestamp:bool=False) -> None:
51
-
52
- """
53
-
54
- Logs an error.
55
-
56
- Parameters:
57
- action (str) : the action being logged.
58
- output (bool | optional | defaults to false) : whether or not to output the action to the console.
59
- omit_timestamp (bool | optional | defaults to false) : whether or not to omit the timestamp from the action.
60
-
61
- """
62
-
63
- timestamp = Toolkit.get_timestamp()
64
-
65
- log_line = timestamp + action + "\n"
66
-
67
- Logger.current_batch += log_line
68
-
69
- if(omit_timestamp):
70
- log_line = action
71
-
72
- if(output):
73
- print(log_line)
74
-
75
- Logger.errors.append(log_line)
76
-
77
- ##--------------------start-of-log_barrier()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
78
-
79
- @staticmethod
80
- def log_barrier() -> None:
81
-
82
- """
83
-
84
- Logs a barrier.
85
-
86
- """
87
-
88
- Logger.log_action("-------------------------")
89
-
90
- ##--------------------start-of-clear_batch()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
91
-
92
- @staticmethod
93
- def clear_batch() -> None:
94
-
95
- """
96
-
97
- Clears the current batch.
98
-
99
- """
100
-
101
- Logger.current_batch = ""
102
- Logger.errors = []
103
-
104
- ##--------------------start-of-push_batch()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
105
-
106
- @staticmethod
107
- @permission_error_decorator()
108
- def push_batch() -> None:
109
-
110
- """
111
-
112
- Pushes all stored actions to the log file.
113
-
114
- """
115
-
116
- with open(Logger.log_file_path, 'a+', encoding="utf-8") as file:
117
- file.write(Logger.current_batch)
118
-
119
- ##--------------------start-of-clear_log_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
120
-
121
- @staticmethod
122
- @permission_error_decorator()
123
- def clear_log_file() -> None:
124
-
125
- """
126
-
127
- Clears the log file.
128
-
129
- """
130
-
131
- with open(Logger.log_file_path, 'w+', encoding="utf-8") as file:
132
- file.truncate(0)
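With logger.py deleted, every Logger.log_action/log_error call site now goes through Python's standard logging module instead, as the new logging.debug/logging.error calls throughout this commit show. A minimal sketch of the replacement wiring, assuming a single file handler configured once at startup (the actual setup elsewhere in the repo may differ, and the filename and format below are illustrative):

## a sketch of the standard-logging replacement
import logging

logging.basicConfig(filename="debug_log.txt",  ## illustrative; the real path is FileEnsurer.debug_log_path
                    level=logging.DEBUG,
                    format="%(asctime)s | %(levelname)s | %(message)s",
                    encoding="utf-8")

logging.debug("replaces Logger.log_action()")
logging.error("replaces Logger.log_error()")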
 
 
 
modules/common/toolkit.py CHANGED
@@ -3,6 +3,7 @@ from datetime import datetime
3
 
4
  import os
5
  import typing
 
6
  import platform
7
  import subprocess
8
 
@@ -14,7 +15,7 @@ class Toolkit():
14
 
15
  """
16
 
17
- CURRENT_VERSION = "v3.4.3"
18
 
19
  ##-------------------start-of-clear_console()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
20
 
@@ -29,6 +30,22 @@ class Toolkit():
29
 
30
  os.system('cls' if os.name == 'nt' else 'clear')
31
 
 
32
  ##-------------------start-of-pause_console()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
33
 
34
  @staticmethod
@@ -74,7 +91,6 @@ class Toolkit():
74
  termios.tcsetattr(0, termios.TCSANOW, old_settings)
75
 
76
  except ImportError:
77
-
78
  pass
79
 
80
  ##-------------------start-of-maximize_window()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -164,12 +180,15 @@ class Toolkit():
164
  ##-------------------start-of-check_update()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
165
 
166
  @staticmethod
167
- def check_update() -> typing.Tuple[bool, str]:
168
 
169
  """
170
 
171
  Determines if Kudasai has a new latest release, and confirms if an internet connection is present or not.
172
 
 
 
 
173
  Returns:
174
  is_connection (bool) : Whether or not the user has an internet connection.
175
  update_prompt (str) : The update prompt to be displayed to the user, can either be blank if there is no update or contain the update prompt if there is an update.
@@ -193,6 +212,8 @@ class Toolkit():
193
 
194
  if(LooseVersion(latest_version) > LooseVersion(Toolkit.CURRENT_VERSION)):
195
 
 
 
196
  update_prompt += "There is a new update for Kudasai (" + latest_version + ")\nIt is recommended that you use the latest version of Kudasai\nYou can download it at https://github.com/Bikatr7/Kudasai/releases/latest \n"
197
 
198
  if(release_notes):
@@ -203,9 +224,12 @@ class Toolkit():
203
  ## used to determine if user lacks an internet connection.
204
  except:
205
 
 
 
206
  print("You seem to lack an internet connection, this will prevent you from checking from update notification and machine translation.\n")
207
 
208
- Toolkit.pause_console()
 
209
 
210
  is_connection = False
211
 
 
3
 
4
  import os
5
  import typing
6
+ import logging
7
  import platform
8
  import subprocess
9
 
 
15
 
16
  """
17
 
18
+ CURRENT_VERSION = "v3.4.5"
19
 
20
  ##-------------------start-of-clear_console()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
21
 
 
30
 
31
  os.system('cls' if os.name == 'nt' else 'clear')
32
 
33
+ ##-------------------start-of-is_windows()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
34
+
35
+ @staticmethod
36
+ def is_windows() -> bool:
37
+
38
+ """
39
+
40
+ Returns True if Kudasai is running on Windows.
41
+
42
+ Returns:
43
+ is_windows (bool) : If Kudasai is running on Windows.
44
+
45
+ """
46
+
47
+ return os.name == 'nt'
48
+
49
  ##-------------------start-of-pause_console()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
50
 
51
  @staticmethod
 
91
  termios.tcsetattr(0, termios.TCSANOW, old_settings)
92
 
93
  except ImportError:
 
94
  pass
95
 
96
  ##-------------------start-of-maximize_window()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
180
  ##-------------------start-of-check_update()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
181
 
182
  @staticmethod
183
+ def check_update(do_pause:bool=True) -> typing.Tuple[bool, str]:
184
 
185
  """
186
 
187
  Determines if Kudasai has a new latest release, and confirms if an internet connection is present or not.
188
 
189
+ Parameters:
190
+ do_pause (bool | optional | default=True) : Whether or not to pause the console after displaying the update prompt.
191
+
192
  Returns:
193
  is_connection (bool) : Whether or not the user has an internet connection.
194
  update_prompt (str) : The update prompt to be displayed to the user, can either be blank if there is no update or contain the update prompt if there is an update.
 
212
 
213
  if(LooseVersion(latest_version) > LooseVersion(Toolkit.CURRENT_VERSION)):
214
 
215
+ logging.debug("New update available: " + latest_version)
216
+
217
  update_prompt += "There is a new update for Kudasai (" + latest_version + ")\nIt is recommended that you use the latest version of Kudasai\nYou can download it at https://github.com/Bikatr7/Kudasai/releases/latest \n"
218
 
219
  if(release_notes):
 
224
  ## used to determine if user lacks an internet connection.
225
  except:
226
 
227
+ logging.debug("No internet connection detected.")
228
+
229
  print("You seem to lack an internet connection, this will prevent you from checking from update notification and machine translation.\n")
230
 
231
+ if(do_pause):
232
+ Toolkit.pause_console()
233
 
234
  is_connection = False
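The comparison driving the update prompt is a LooseVersion ordering between the saved CURRENT_VERSION and the latest GitHub release tag. A minimal sketch of that check, assuming distutils' LooseVersion (deprecated in newer Pythons, but it handles same-shaped "vX.Y.Z" tags like these); the latest_version value is illustrative:

## a sketch of the version comparison, assuming same-shaped "vX.Y.Z" tags
from distutils.version import LooseVersion

CURRENT_VERSION = "v3.4.5"
latest_version = "v3.5.0"  ## illustrative stand-in for the GitHub API response

if(LooseVersion(latest_version) > LooseVersion(CURRENT_VERSION)):
    print(f"There is a new update for Kudasai ({latest_version})")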
235
 
models/kijiku.py → modules/common/translator.py RENAMED
@@ -6,6 +6,7 @@ import time
6
  import typing
7
  import asyncio
8
  import os
 
9
 
10
  ## third party modules
11
  from kairyou import KatakanaUtil
@@ -17,19 +18,18 @@ import backoff
17
  from handlers.json_handler import JsonHandler
18
 
19
  from modules.common.file_ensurer import FileEnsurer
20
- from modules.common.logger import Logger
21
  from modules.common.toolkit import Toolkit
22
- from modules.common.exceptions import AuthenticationError, MaxBatchDurationExceededException, AuthenticationError, InternalServerError, RateLimitError, APITimeoutError, GoogleAuthError
23
  from modules.common.decorators import permission_error_decorator
24
 
25
- ##-------------------start-of-Kijiku--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
26
 
27
- class Kijiku:
28
 
29
  """
30
 
31
- Kijiku is a secondary class that is used to interact with LLMs and translate text.
32
- Currently supports OpenAI and Gemini.
33
 
34
  """
35
 
@@ -48,6 +48,9 @@ class Kijiku:
48
  ## meanwhile for gemini, we just need to send the prompt and the text to be translated concatenated together
49
  gemini_translation_batches:typing.List[str] = []
50
 
 
 
 
51
  num_occurred_malformed_batches = 0
52
 
53
  ## semaphore to limit the number of concurrent batches
@@ -55,7 +58,7 @@ class Kijiku:
55
 
56
  ##--------------------------------------------------------------------------------------------------------------------------
57
 
58
- LLM_TYPE:typing.Literal["openai", "gemini"] = "openai"
59
 
60
  translation_print_result = ""
61
 
@@ -71,6 +74,10 @@ class Kijiku:
71
 
72
  decorator_to_use:typing.Callable
73
 
 
 
 
 
74
  ##-------------------start-of-get_max_batch_duration()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
75
 
76
  @staticmethod
@@ -79,14 +86,14 @@ class Kijiku:
79
  """
80
 
81
  Returns the max batch duration.
82
- Structured as a function so that it can be used as a lambda function in the backoff decorator. As decorators call the function when they are defined/runtime, not when they are called.
83
 
84
  Returns:
85
  max_batch_duration (float) : the max batch duration.
86
 
87
  """
88
 
89
- return Kijiku.max_batch_duration
90
 
91
  ##-------------------start-of-log_retry()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
92
 
@@ -104,9 +111,7 @@ class Kijiku:
104
 
105
  retry_msg = f"Retrying translation after {details['wait']} seconds after {details['tries']} tries {details['target']} due to {details['exception']}."
106
 
107
- Logger.log_barrier()
108
- Logger.log_action(retry_msg)
109
- Logger.log_barrier()
110
 
111
  ##-------------------start-of-log_failure()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
112
 
@@ -120,13 +125,14 @@ class Kijiku:
120
  Parameters:
121
  details (dict) : the details of the failure.
122
 
 
 
 
123
  """
124
 
125
- error_msg = f"Exceeded duration, returning untranslated text after {details['tries']} tries {details['target']}."
126
 
127
- Logger.log_barrier()
128
- Logger.log_error(error_msg)
129
- Logger.log_barrier()
130
 
131
  raise MaxBatchDurationExceededException(error_msg)
132
 
@@ -141,27 +147,26 @@ class Kijiku:
141
 
142
  """
143
 
144
- Logger.clear_batch()
145
-
146
  ## set this here cause the try-except could throw before we get past the settings configuration
147
  time_start = time.time()
148
 
149
  try:
150
 
151
- await Kijiku.initialize()
152
 
153
  JsonHandler.validate_json()
154
 
155
- await Kijiku.check_settings()
 
156
 
157
  ## set actual start time to the end of the settings configuration
158
  time_start = time.time()
159
 
160
- await Kijiku.commence_translation()
161
 
162
  except Exception as e:
163
 
164
- Kijiku.translation_print_result += "An error has occurred, outputting results so far..."
165
 
166
  FileEnsurer.handle_critical_exception(e)
167
 
@@ -169,7 +174,10 @@ class Kijiku:
169
 
170
  time_end = time.time()
171
 
172
- Kijiku.assemble_results(time_start, time_end)
 
 
 
173
 
174
  ##-------------------start-of-initialize()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
175
 
@@ -178,38 +186,47 @@ class Kijiku:
178
 
179
  """
180
 
181
- Sets the API Key for the respective service and loads the kijiku rules.
182
 
183
  """
184
 
185
- print("What LLM do you want to use? (1 for OpenAI or 2 for Gemini) : ")
 
 
 
 
186
 
187
- if(input("\n") == "1"):
188
- Kijiku.LLM_TYPE = "openai"
189
-
190
- else:
191
- Kijiku.LLM_TYPE = "gemini"
192
 
193
- Toolkit.clear_console()
194
-
195
- if(Kijiku.LLM_TYPE == "openai"):
196
- await Kijiku.init_api_key("OpenAI", FileEnsurer.openai_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)
 
 
197
 
198
  else:
199
- await Kijiku.init_api_key("Gemini", FileEnsurer.gemini_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)
200
-
201
- ## try to load the kijiku rules
202
- try:
203
 
204
- JsonHandler.load_kijiku_rules()
 
 
 
 
 
 
205
 
206
- ## if the kijiku rules don't exist, create them
 
 
 
 
 
207
  except:
208
-
209
- JsonHandler.reset_kijiku_rules_to_default()
210
-
211
- JsonHandler.load_kijiku_rules()
212
-
213
  Toolkit.clear_console()
214
 
215
  ##-------------------start-of-init_openai_api_key()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -241,9 +258,8 @@ class Kijiku:
241
  ## if not valid, raise the exception that caused the test to fail
242
  if(not is_valid and e is not None):
243
  raise e
244
-
245
- Logger.log_action("Used saved API key in " + api_key_path, output=True)
246
- Logger.log_barrier()
247
 
248
  time.sleep(2)
249
 
@@ -267,22 +283,22 @@ class Kijiku:
267
  FileEnsurer.standard_overwrite_file(api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)
268
 
269
  ## if invalid key exit
270
- except (GoogleAuthError, AuthenticationError):
271
 
272
  Toolkit.clear_console()
273
-
274
- Logger.log_action(f"Authorization error while setting up {service}, please double check your API key as it appears to be incorrect.", output=True)
275
 
276
  Toolkit.pause_console()
277
 
278
- exit()
279
 
280
  ## other error, alert user and raise it
281
  except Exception as e:
282
 
283
  Toolkit.clear_console()
284
 
285
- Logger.log_action(f"Unknown error while setting up {service}, The error is as follows " + str(e) + "\nThe exception will now be raised.", output=True)
286
 
287
  Toolkit.pause_console()
288
 
@@ -300,17 +316,17 @@ class Kijiku:
300
 
301
  """
302
 
303
- Logger.clear_batch()
304
-
305
- Kijiku.text_to_translate = []
306
- Kijiku.translated_text = []
307
- Kijiku.je_check_text = []
308
- Kijiku.error_text = []
309
- Kijiku.openai_translation_batches = []
310
- Kijiku.gemini_translation_batches = []
311
- Kijiku.num_occurred_malformed_batches = 0
312
- Kijiku.translation_print_result = ""
313
- Kijiku.LLM_TYPE = "openai"
314
 
315
  ##-------------------start-of-check-settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
316
 
@@ -319,54 +335,48 @@ class Kijiku:
319
 
320
  """
321
 
322
- Prompts the user to confirm the settings in the kijiku rules file.
323
 
324
  """
325
 
326
  print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
327
 
 
 
 
 
 
 
 
 
328
  try:
329
 
330
- JsonHandler.print_kijiku_rules(output=True)
331
 
332
  except:
333
  Toolkit.clear_console()
334
 
335
- if(input("It's likely that you're using an outdated version of the kijiku rules file, press 1 to reset these to default or 2 to exit and resolve manually : ") == "1"):
336
  Toolkit.clear_console()
337
- JsonHandler.reset_kijiku_rules_to_default()
338
- JsonHandler.load_kijiku_rules()
339
 
340
  print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
341
- JsonHandler.print_kijiku_rules(output=True)
342
-
343
  else:
344
  FileEnsurer.exit_kudasai()
345
 
346
- if(input("\n") == "1"):
347
- pass
348
- else:
349
- JsonHandler.change_kijiku_settings()
350
 
351
  Toolkit.clear_console()
352
 
353
  print("Do you want to change your API key? (1 for yes or 2 for no) : ")
354
 
355
  if(input("\n") == "1"):
356
-
357
- if(Kijiku.LLM_TYPE == "openai"):
358
-
359
- if(os.path.exists(FileEnsurer.openai_api_key_path)):
360
-
361
- os.remove(FileEnsurer.openai_api_key_path)
362
- await Kijiku.init_api_key("OpenAI", FileEnsurer.openai_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)
363
-
364
- else:
365
-
366
- if(os.path.exists(FileEnsurer.gemini_api_key_path)):
367
-
368
- os.remove(FileEnsurer.gemini_api_key_path)
369
- await Kijiku.init_api_key("Gemini", FileEnsurer.gemini_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)
370
 
371
  Toolkit.clear_console()
372
 
@@ -383,106 +393,106 @@ class Kijiku:
383
  is_webgui (bool | optional | default=False) : A bool representing whether the function is being called by the webgui.
384
 
385
  """
 
 
 
386
 
 
 
387
 
388
- Logger.log_barrier()
389
- Logger.log_action("Kijiku Activated, LLM Type : " + Kijiku.LLM_TYPE)
390
- Logger.log_barrier()
391
- Logger.log_action("Settings are as follows : ")
392
- Logger.log_barrier()
393
-
394
- JsonHandler.print_kijiku_rules()
395
-
396
- Kijiku.prompt_assembly_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["prompt_assembly_mode"])
397
- Kijiku.number_of_lines_per_batch = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_lines_per_batch"])
398
- Kijiku.sentence_fragmenter_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["sentence_fragmenter_mode"])
399
- Kijiku.je_check_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["je_check_mode"])
400
- Kijiku.num_of_malform_retries = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_malformed_batch_retries"])
401
- Kijiku.max_batch_duration = float(JsonHandler.current_kijiku_rules["base kijiku settings"]["batch_retry_timeout"])
402
- Kijiku.num_concurrent_batches = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_concurrent_batches"])
403
-
404
- Kijiku._semaphore = asyncio.Semaphore(Kijiku.num_concurrent_batches)
405
-
406
- Kijiku.openai_model = JsonHandler.current_kijiku_rules["openai settings"]["openai_model"]
407
- Kijiku.openai_system_message = JsonHandler.current_kijiku_rules["openai settings"]["openai_system_message"]
408
- Kijiku.openai_temperature = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_temperature"])
409
- Kijiku.openai_top_p = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_top_p"])
410
- Kijiku.openai_n = int(JsonHandler.current_kijiku_rules["openai settings"]["openai_n"])
411
- Kijiku.openai_stream = bool(JsonHandler.current_kijiku_rules["openai settings"]["openai_stream"])
412
- Kijiku.openai_stop = JsonHandler.current_kijiku_rules["openai settings"]["openai_stop"]
413
- Kijiku.openai_logit_bias = JsonHandler.current_kijiku_rules["openai settings"]["openai_logit_bias"]
414
- Kijiku.openai_max_tokens = JsonHandler.current_kijiku_rules["openai settings"]["openai_max_tokens"]
415
- Kijiku.openai_presence_penalty = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_presence_penalty"])
416
- Kijiku.openai_frequency_penalty = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_frequency_penalty"])
417
-
418
- Kijiku.gemini_model = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_model"]
419
- Kijiku.gemini_prompt = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_prompt"]
420
- Kijiku.gemini_temperature = float(JsonHandler.current_kijiku_rules["gemini settings"]["gemini_temperature"])
421
- Kijiku.gemini_top_p = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_top_p"]
422
- Kijiku.gemini_top_k = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_top_k"]
423
- Kijiku.gemini_candidate_count = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_candidate_count"]
424
- Kijiku.gemini_stream = bool(JsonHandler.current_kijiku_rules["gemini settings"]["gemini_stream"])
425
- Kijiku.gemini_stop_sequences = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_stop_sequences"]
426
- Kijiku.gemini_max_output_tokens = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_max_output_tokens"]
427
-
428
-
429
- if(Kijiku.LLM_TYPE == "openai"):
430
- Kijiku.decorator_to_use = backoff.on_exception(backoff.expo, max_time=lambda: Kijiku.get_max_batch_duration(), exception=(AuthenticationError, InternalServerError, RateLimitError, APITimeoutError), on_backoff=lambda details: Kijiku.log_retry(details), on_giveup=lambda details: Kijiku.log_failure(details), raise_on_giveup=False)
431
-
432
- else:
433
- Kijiku.decorator_to_use = backoff.on_exception(backoff.expo, max_time=lambda: Kijiku.get_max_batch_duration(), exception=(Exception), on_backoff=lambda details: Kijiku.log_retry(details), on_giveup=lambda details: Kijiku.log_failure(details), raise_on_giveup=False)
434
 
435
  Toolkit.clear_console()
436
 
437
- Logger.log_barrier()
438
- Logger.log_action("Starting Prompt Building")
439
- Logger.log_barrier()
440
-
441
- Kijiku.build_translation_batches()
442
 
443
- model = JsonHandler.current_kijiku_rules["openai settings"]["openai_model"] if Kijiku.LLM_TYPE == "openai" else JsonHandler.current_kijiku_rules["gemini settings"]["gemini_model"]
 
 
 
 
 
 
 
 
444
 
445
- await Kijiku.handle_cost_estimate_prompt(model, omit_prompt=is_webgui)
446
 
447
  Toolkit.clear_console()
448
 
449
- Logger.log_barrier()
450
-
451
- Logger.log_action("Starting Translation...", output=not is_webgui)
452
- Logger.log_barrier()
453
 
454
  ## requests to run asynchronously
455
- async_requests = Kijiku.build_async_requests(model)
456
 
457
  ## Use asyncio.gather to run tasks concurrently/asynchronously and wait for all of them to complete
458
  results = await asyncio.gather(*async_requests)
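This gather call, together with the class-level semaphore, is the whole concurrency model: every batch is launched at once, but only number_of_concurrent_batches are allowed past the semaphore at a time, and results are re-sorted by index afterwards so output order matches input order. A minimal self-contained sketch of that pattern, with an illustrative worker body:

## a sketch of the semaphore-plus-gather pattern used here
import asyncio

NUM_CONCURRENT_BATCHES = 4
semaphore = asyncio.Semaphore(NUM_CONCURRENT_BATCHES)

async def translate_batch(index:int, batch:str) -> tuple:
    ## only NUM_CONCURRENT_BATCHES workers run this body at once
    async with semaphore:
        await asyncio.sleep(0.1)  ## stand-in for the actual API call
        return index, f"translated: {batch}"

async def main() -> None:
    batches = ["batch one", "batch two", "batch three"]
    results = await asyncio.gather(*[translate_batch(i, b) for i, b in enumerate(batches)])

    ## sort by index so output order matches input order, as the module does
    for index, text in sorted(results, key=lambda x: x[0]):
        print(index, text)

asyncio.run(main())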
459
 
460
- Logger.log_barrier()
461
- Logger.log_action("Translation Complete!", output=not is_webgui)
462
-
463
- Logger.log_barrier()
464
- Logger.log_action("Starting Redistribution...", output=not is_webgui)
465
-
466
- Logger.log_barrier()
467
 
468
  ## Sort results based on the index to maintain order
469
  sorted_results = sorted(results, key=lambda x: x[0])
470
 
471
  ## Redistribute the sorted results
472
- for index, translated_prompt, translated_message in sorted_results:
473
- Kijiku.redistribute(translated_prompt, translated_message)
474
 
475
  ## try to pair the text for j-e checking if the mode is 2
476
- if(Kijiku.je_check_mode == 2):
477
- Kijiku.je_check_text = Kijiku.fix_je()
478
 
479
  Toolkit.clear_console()
480
 
481
- Logger.log_action("Done!", output=not is_webgui)
482
- Logger.log_barrier()
483
-
484
- ## assemble error text based of the error list
485
- Kijiku.error_text = Logger.errors
486
 
487
  ##-------------------start-of-build_async_requests()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
488
 
@@ -502,19 +512,34 @@ class Kijiku:
502
  """
503
 
504
  async_requests = []
 
 
 
505
 
506
- translation_batches = Kijiku.openai_translation_batches if Kijiku.LLM_TYPE == "openai" else Kijiku.gemini_translation_batches
507
-
508
- for i in range(0, len(translation_batches), 2):
509
-
510
- instructions = translation_batches[i]
511
- prompt = translation_batches[i+1]
512
-
513
- assert isinstance(instructions, SystemTranslationMessage) or isinstance(instructions, str)
514
- assert isinstance(prompt, ModelTranslationMessage) or isinstance(prompt, str)
515
 
516
- async_requests.append(Kijiku.handle_translation(model, i, len(translation_batches), instructions, prompt))
 
517
 
 
 
 
 
518
  return async_requests
519
 
520
  ##-------------------start-of-generate_text_to_translate_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -537,41 +562,36 @@ class Kijiku:
537
 
538
  prompt = []
539
  non_word_pattern = re.compile(r'^[\W_\s\n-]+$')
 
 
 
 
 
540
 
541
- while(index < len(Kijiku.text_to_translate)):
542
-
543
- sentence = Kijiku.text_to_translate[index]
544
- stripped_sentence = sentence.strip()
545
  lowercase_sentence = sentence.lower()
546
-
547
- has_quotes = any(char in sentence for char in ["「", "」", "『", "』", "【", "】", "\"", "'"])
548
  is_part_in_sentence = "part" in lowercase_sentence
549
-
550
- if(len(prompt) < Kijiku.number_of_lines_per_batch):
551
-
552
- if(any(char in sentence for char in ["▼", "△", "◇"])):
 
553
  prompt.append(f'{sentence}\n')
554
- Logger.log_action(f"Sentence : {sentence}, Sentence is a pov change... adding to prompt.")
555
-
556
- elif(stripped_sentence == ''):
557
- Logger.log_action(f"Sentence : {sentence} is empty... skipping.")
558
 
559
- elif(is_part_in_sentence or all(char in ["1","2","3","4","5","6","7","8","9", " "] for char in sentence)):
560
- prompt.append(f'{sentence}\n')
561
- Logger.log_action(f"Sentence : {sentence}, Sentence is part marker... adding to prompt.")
562
 
563
- elif(non_word_pattern.match(sentence) or KatakanaUtil.is_punctuation(stripped_sentence) and not has_quotes):
564
- Logger.log_action(f"Sentence : {sentence}, Sentence is punctuation... skipping.")
565
-
566
- else:
567
  prompt.append(f'{sentence}\n')
568
- Logger.log_action(f"Sentence : {sentence}, Sentence is a valid sentence... adding to prompt.")
569
-
570
  else:
571
  return prompt, index
572
-
573
  index += 1
574
-
575
  return prompt, index
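The loop above walks text_to_translate, keeping pov and part markers, skipping blank and punctuation-only lines, and packing up to number_of_lines_per_batch lines per prompt. A stripped-down sketch of the same accumulate-and-resume shape, with the punctuation filtering elided:

## a stripped-down sketch of the batching loop: pack lines until the
## per-batch limit, then return the batch and the index to resume from
def next_batch(lines:list, start:int, lines_per_batch:int):
    batch = []
    index = start

    while(index < len(lines) and len(batch) < lines_per_batch):
        sentence = lines[index]

        ## the real loop also skips punctuation-only lines and keeps pov/part markers
        if(sentence.strip()):
            batch.append(f'{sentence}\n')

        index += 1

    return batch, index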
576
 
577
  ##-------------------start-of-build_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -587,42 +607,53 @@ class Kijiku:
587
 
588
  i = 0
589
 
590
- while i < len(Kijiku.text_to_translate):
591
 
592
- batch, i = Kijiku.generate_text_to_translate_batches(i)
593
  batch = ''.join(batch)
594
 
595
- if(Kijiku.LLM_TYPE == 'openai'):
596
 
597
- if(Kijiku.prompt_assembly_mode == 1):
598
- system_msg = SystemTranslationMessage(content=str(Kijiku.openai_system_message))
599
  else:
600
- system_msg = SystemTranslationMessage(content=str(Kijiku.openai_system_message))
601
 
602
- Kijiku.openai_translation_batches.append(system_msg)
603
  model_msg = ModelTranslationMessage(content=batch)
604
- Kijiku.openai_translation_batches.append(model_msg)
 
 
 
 
605
 
606
  else:
607
- Kijiku.gemini_translation_batches.append(Kijiku.gemini_prompt)
608
- Kijiku.gemini_translation_batches.append(batch)
609
 
610
- Logger.log_barrier()
611
- Logger.log_action("Built Messages : ")
612
- Logger.log_barrier()
 
 
 
 
613
 
614
  i = 0
615
 
616
- for message in (Kijiku.openai_translation_batches if Kijiku.LLM_TYPE == 'openai' else Kijiku.gemini_translation_batches):
 
 
617
 
618
  i+=1
619
 
620
- message = str(message) if Kijiku.LLM_TYPE == 'gemini' else message.content # type: ignore
 
 
 
621
 
622
- if(i % 2 == 1):
623
- Logger.log_barrier()
624
 
625
- Logger.log_action(message)
626
 
627
  ##-------------------start-of-handle_cost_estimate_prompt()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
628
 
@@ -642,39 +673,43 @@ class Kijiku:
642
 
643
  """
644
 
645
- translation_instructions = Kijiku.openai_system_message if Kijiku.LLM_TYPE == "openai" else Kijiku.gemini_prompt
 
 
 
 
 
 
646
 
647
  ## get cost estimate and confirm
648
- num_tokens, min_cost, model = EasyTL.calculate_cost(text=Kijiku.text_to_translate, service=Kijiku.LLM_TYPE, model=model,translation_instructions=translation_instructions)
649
 
650
  print("Note that the cost estimate is not always accurate, and may be higher than the actual cost. However cost calculation now includes output tokens.\n")
651
 
652
- Logger.log_barrier()
653
- Logger.log_action("Calculating cost")
654
- Logger.log_barrier()
655
-
656
- if(Kijiku.LLM_TYPE == "gemini"):
657
- Logger.log_action(f"As of Kudasai {Toolkit.CURRENT_VERSION}, Gemini Pro 1.0 is free to use under 15 requests per minute, Gemini Pro 1.5 is free to use under 2 requests per minute. Requests correspond to number_of_current_batches in kijiku_settings.", output=True, omit_timestamp=True)
658
 
659
- Logger.log_action("Estimated number of tokens : " + str(num_tokens), output=True, omit_timestamp=True)
660
- Logger.log_action("Estimated minimum cost : " + str(min_cost) + " USD", output=True, omit_timestamp=True)
661
- Logger.log_barrier()
662
 
663
  if(not omit_prompt):
664
  if(input("\nContinue? (1 for yes or 2 for no) : ") == "1"):
665
- Logger.log_action("User confirmed translation.")
666
 
667
  else:
668
- Logger.log_action("User cancelled translation.")
669
- Logger.push_batch()
670
- exit()
671
 
672
  return model
673
 
674
  ##-------------------start-of-handle_translation()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
675
 
676
  @staticmethod
677
- async def handle_translation(model:str, index:int, length:int, translation_instructions:typing.Union[str, SystemTranslationMessage], translation_prompt:typing.Union[str, ModelTranslationMessage]) -> tuple[int, str, str]:
 
 
 
 
678
 
679
  """
680
 
@@ -682,20 +717,20 @@ class Kijiku:
682
 
683
  Parameters:
684
  model (string) : The model of the service used to translate the text.
685
- index (int) : The index of the translation batch.
686
- length (int) : The length of the translation batch.
687
- translation_instructions (typing.Union[str, Message]) : The translation instructions.
688
- translation_prompt (typing.Union[str, Message]) : The translation prompt.
689
 
690
  Returns:
691
- index (int) : The index of the translation batch.
692
- translation_prompt (typing.Union[str, Message]) : The translation prompt.
693
- translated_message (str) : The translated message.
694
 
695
  """
696
 
697
  ## Basically limits the number of concurrent batches
698
- async with Kijiku._semaphore:
699
  num_tries = 0
700
 
701
  while True:
@@ -704,72 +739,93 @@ class Kijiku:
704
  if(FileEnsurer.do_interrupt == True):
705
  raise Exception("Interrupted by user.")
706
 
707
- message_number = (index // 2) + 1
708
- Logger.log_action(f"Trying translation for batch {message_number} of {length//2}...", output=True)
 
709
 
710
  try:
711
 
712
- if(Kijiku.LLM_TYPE == "openai"):
713
- translated_message = await EasyTL.openai_translate_async(text=translation_prompt,
714
- decorator=Kijiku.decorator_to_use,
715
- translation_instructions=translation_instructions,
716
- model=model,
717
- temperature=Kijiku.openai_temperature,
718
- top_p=Kijiku.openai_top_p,
719
- stop=Kijiku.openai_stop,
720
- max_tokens=Kijiku.openai_max_tokens,
721
- presence_penalty=Kijiku.openai_presence_penalty,
722
- frequency_penalty=Kijiku.openai_frequency_penalty)
723
-
724
- else:
725
-
726
- assert isinstance(translation_prompt, str)
727
-
728
- translated_message = await EasyTL.gemini_translate_async(text=translation_prompt,
729
- decorator=Kijiku.decorator_to_use,
730
- model=model,
731
- temperature=Kijiku.gemini_temperature,
732
- top_p=Kijiku.gemini_top_p,
733
- top_k=Kijiku.gemini_top_k,
734
- stop_sequences=Kijiku.gemini_stop_sequences,
735
- max_output_tokens=Kijiku.gemini_max_output_tokens)
 
 
 
 
736
 
737
  ## will only occur if the max_batch_duration is exceeded, so we just return the untranslated text
738
  except MaxBatchDurationExceededException:
739
 
740
- Logger.log_error(f"Batch {message_number} of {length//2} was not translated due to exceeding the max request duration, returning the untranslated text...", output=True)
741
  break
742
 
743
  ## do not even bother if not a gpt 4 model, because gpt-3 seems unable to format properly
744
  ## since gemini is free, we can just try again if it's malformed
745
- if("gpt-4" not in model and Kijiku.LLM_TYPE != "gemini"):
 
746
  break
747
 
748
- if(await Kijiku.check_if_translation_is_good(translated_message, translation_prompt)):
749
- Logger.log_action(f"Translation for batch {message_number} of {length//2} successful!", output=True)
750
  break
751
 
752
- if(num_tries >= Kijiku.num_of_malform_retries):
753
- Logger.log_action(f"Batch {message_number} of {length//2} was malformed, but exceeded the maximum number of retries, Translation successful!", output=True)
754
  break
755
 
756
  else:
757
  num_tries += 1
758
- Logger.log_error(f"Batch {message_number} of {length//2} was malformed, retrying...", output=True)
759
- Kijiku.num_occurred_malformed_batches += 1
760
 
761
- if(isinstance(translation_prompt, ModelTranslationMessage)):
762
- translation_prompt = translation_prompt.content
763
 
764
  if(isinstance(translated_message, typing.List)):
765
- translated_message = ''.join(translated_message)
 
 
766
 
767
- return index, translation_prompt, translated_message
768
 
769
  ##-------------------start-of-check_if_translation_is_good()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
770
 
771
  @staticmethod
772
- async def check_if_translation_is_good(translated_message:typing.Union[typing.List[str], str], translation_prompt:typing.Union[ModelTranslationMessage, str]) -> bool:
773
 
774
  """
775
 
@@ -777,66 +833,61 @@ class Kijiku:
777
 
778
  Parameters:
779
  translated_message (str) : the translated message.
780
- translation_prompt (typing.Union[str, Message]) : the translation prompt.
781
 
782
  Returns:
783
  is_valid (bool) : whether or not the translation is valid.
784
 
785
  """
786
 
787
- if(not isinstance(translation_prompt, str)):
788
- prompt = translation_prompt.content
789
 
790
  else:
791
- prompt = translation_prompt
792
 
793
  if(isinstance(translated_message, list)):
794
  translated_message = ''.join(translated_message)
795
 
796
- is_valid = False
797
-
798
  jap = [line for line in prompt.split('\n') if line.strip()] ## Remove blank lines
799
  eng = [line for line in translated_message.split('\n') if line.strip()] ## Remove blank lines
800
-
801
- if(len(jap) == len(eng)):
802
- is_valid = True
803
 
804
- return is_valid
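The removed lines spell the check out: a batch counts as well-formed only when the translation has the same number of non-blank lines as the source prompt. Condensed, the same test is just:

## the same line-count test, condensed
def is_translation_good(prompt:str, translated:str) -> bool:
    jap = [line for line in prompt.split('\n') if line.strip()]
    eng = [line for line in translated.split('\n') if line.strip()]

    return len(jap) == len(eng)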
805
 
806
  ##-------------------start-of-redistribute()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
807
 
808
  @staticmethod
809
- def redistribute(translation_prompt:typing.Union[Message, str], translated_message:str) -> None:
810
 
811
  """
812
 
813
  Puts translated text back into the text file.
814
 
815
  Parameters:
816
- translation_prompt (typing.Union[str, Message]) : the translation prompt.
817
  translated_message (str) : the translated message.
818
 
819
  """
820
 
821
- if(not isinstance(translation_prompt, str)):
822
- prompt = translation_prompt.content
823
 
824
  else:
825
- prompt = translation_prompt
826
 
827
  ## Separates with hyphens if the mode is 1
828
- if(Kijiku.je_check_mode == 1):
829
 
830
- Kijiku.je_check_text.append("\n-------------------------\n"+ prompt + "\n\n")
831
- Kijiku.je_check_text.append(translated_message + '\n')
832
 
833
  ## Mode two tries to pair the text for j-e checking, see fix_je() for more details
834
- elif(Kijiku.je_check_mode == 2):
835
- Kijiku.je_check_text.append(prompt)
836
- Kijiku.je_check_text.append(translated_message)
837
 
838
  ## mode 1 is the default mode, uses regex and other nonsense to split sentences
839
- if(Kijiku.sentence_fragmenter_mode == 1):
840
 
841
  sentences = re.findall(r"(.*?(?:(?:\"|\'|-|~|!|\?|%|\(|\)|\.\.\.|\.|---|\[|\])))(?:\s|$)", translated_message)
842
 
@@ -859,17 +910,17 @@ class Kijiku:
859
  build_string += f" {sentence}"
860
  continue
861
 
862
- Kijiku.translated_text.append(sentence + '\n')
863
 
864
- for i in range(len(Kijiku.translated_text)):
865
- if Kijiku.translated_text[i] in patched_sentences:
866
- index = patched_sentences.index(Kijiku.translated_text[i])
867
- Kijiku.translated_text[i] = patched_sentences[index]
868
 
869
  ## mode 2 just assumes the LLM formatted it properly
870
- elif(Kijiku.sentence_fragmenter_mode == 2):
871
 
872
- Kijiku.translated_text.append(translated_message + '\n\n')
873
 
874
  ##-------------------start-of-fix_je()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
875
 
@@ -890,36 +941,30 @@ class Kijiku:
890
  i = 1
891
  final_list = []
892
 
893
- while i < len(Kijiku.je_check_text):
894
- jap = Kijiku.je_check_text[i-1].split('\n')
895
- eng = Kijiku.je_check_text[i].split('\n')
896
 
897
- jap = [line for line in jap if line.strip()] ## Remove blank lines
898
- eng = [line for line in eng if line.strip()] ## Remove blank lines
899
 
900
  final_list.append("-------------------------\n")
901
 
902
  if(len(jap) == len(eng)):
903
-
904
- for jap_line,eng_line in zip(jap,eng):
905
- if(jap_line and eng_line): ## check if jap_line and eng_line aren't blank
906
  final_list.append(jap_line + '\n\n')
907
  final_list.append(eng_line + '\n\n')
908
-
909
  final_list.append("--------------------------------------------------\n")
910
-
911
-
912
  else:
913
-
914
- final_list.append(Kijiku.je_check_text[i-1] + '\n\n')
915
- final_list.append(Kijiku.je_check_text[i] + '\n\n')
916
-
917
  final_list.append("--------------------------------------------------\n")
918
 
919
- i+=2
920
 
921
  return final_list
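fix_je() applies the same line-count idea to output formatting: when the counts match, it interleaves each Japanese line with its English line for side-by-side checking, and otherwise falls back to dumping both blocks whole. A self-contained sketch of that pairing for a single batch pair:

## a sketch of the j-e pairing: interleave when line counts match, else dump whole
def pair_for_je_check(jap_block:str, eng_block:str) -> list:
    jap = [line for line in jap_block.split('\n') if line.strip()]
    eng = [line for line in eng_block.split('\n') if line.strip()]

    final_list = ["-------------------------\n"]

    if(len(jap) == len(eng)):
        for jap_line, eng_line in zip(jap, eng):
            final_list.append(jap_line + '\n\n')
            final_list.append(eng_line + '\n\n')
    else:
        final_list.append(jap_block + '\n\n')
        final_list.append(eng_block + '\n\n')

    final_list.append("--------------------------------------------------\n")

    return final_list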
922
-
923
  ##-------------------start-of-assemble_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
924
 
925
  @staticmethod
@@ -927,7 +972,7 @@ class Kijiku:
927
 
928
  """
929
 
930
- Generates the Kijiku translation print result, does not directly output/return, but rather sets Kijiku.translation_print_result to the output.
931
 
932
  Parameters:
933
  time_start (float) : When the translation started.
@@ -937,24 +982,24 @@ class Kijiku:
937
 
938
  result = (
939
  f"Time Elapsed : {Toolkit.get_elapsed_time(time_start, time_end)}\n"
940
- f"Number of malformed batches : {Kijiku.num_occurred_malformed_batches}\n\n"
941
  f"Debug text have been written to : {FileEnsurer.debug_log_path}\n"
942
  f"J->E text have been written to : {FileEnsurer.je_check_path}\n"
943
  f"Translated text has been written to : {FileEnsurer.translated_text_path}\n"
944
  f"Errors have been written to : {FileEnsurer.error_log_path}\n"
945
  )
946
 
947
- Kijiku.translation_print_result = result
948
 
949
- ##-------------------start-of-write_kijiku_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
950
 
951
  @staticmethod
952
  @permission_error_decorator()
953
- def write_kijiku_results() -> None:
954
 
955
  """
956
 
957
- This function is called to write the results of the Kijiku translation module to the output directory.
958
 
959
  """
960
 
@@ -962,27 +1007,23 @@ class Kijiku:
962
  FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
963
 
964
  with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
965
- file.writelines(Kijiku.error_text)
966
 
967
  with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
968
- file.writelines(Kijiku.je_check_text)
969
 
970
  with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
971
- file.writelines(Kijiku.translated_text)
972
 
973
  ## Instructions to create a copy of the output for archival
974
  FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
975
 
976
  timestamp = Toolkit.get_timestamp(is_archival=True)
977
 
978
- ## pushes the tl debug log to the file without clearing the file
979
- Logger.push_batch()
980
- Logger.clear_batch()
981
-
982
- list_of_result_tuples = [('kijiku_translated_text', Kijiku.translated_text),
983
- ('kijiku_je_check_text', Kijiku.je_check_text),
984
- ('kijiku_error_log', Kijiku.error_text),
985
- ('debug_log', FileEnsurer.standard_read_file(Logger.log_file_path))]
986
 
987
  FileEnsurer.archive_results(list_of_result_tuples,
988
- module='kijiku', timestamp=timestamp)
 
6
  import typing
7
  import asyncio
8
  import os
9
+ import logging
10
 
11
  ## third party modules
12
  from kairyou import KatakanaUtil
 
18
  from handlers.json_handler import JsonHandler
19
 
20
  from modules.common.file_ensurer import FileEnsurer
 
21
  from modules.common.toolkit import Toolkit
22
+ from modules.common.exceptions import OpenAIAuthenticationError, MaxBatchDurationExceededException, DeepLAuthorizationException, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, GoogleAuthError, OpenAIAPIStatusError, OpenAIAPIConnectionError, DeepLException, GoogleAPIError
23
  from modules.common.decorators import permission_error_decorator
24
 
25
+ ##-------------------start-of-Translator--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
26
 
27
+ class Translator:
28
 
29
  """
30
 
31
+ Translator is a class that is used to interact with translation methods and translate text.
32
+ Currently supports OpenAI, Gemini, and DeepL.
33
 
34
  """
35
 
 
48
  ## meanwhile for gemini, we just need to send the prompt and the text to be translated concatenated together
49
  gemini_translation_batches:typing.List[str] = []
50
 
51
+ ## same as above, but for deepl, just the text to be translated
52
+ deepl_translation_batches:typing.List[str] = []
53
+
54
  num_occurred_malformed_batches = 0
55
 
56
  ## semaphore to limit the number of concurrent batches
 
58
 
59
  ##--------------------------------------------------------------------------------------------------------------------------
60
 
61
+ TRANSLATION_METHOD:typing.Literal["openai", "gemini", "deepl"] = "openai"
62
 
63
  translation_print_result = ""
64
 
 
74
 
75
  decorator_to_use:typing.Callable
76
 
77
+ is_cli = False
78
+
79
+ pre_provided_api_key = ""
80
+
81
  ##-------------------start-of-get_max_batch_duration()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
82
 
83
  @staticmethod
 
86
  """
87
 
88
  Returns the max batch duration.
89
+ Structured as a function so that it can be used in a lambda within the backoff decorator, since the decorator's arguments are evaluated once at definition time, not each time the decorated function is called. Which I learned the hard way.
90
 
91
  Returns:
92
  max_batch_duration (float) : the max batch duration.
93
 
94
  """
95
 
96
+ return Translator.max_batch_duration
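That docstring is worth unpacking: backoff.on_exception evaluates its arguments at decoration time, so passing the raw value would freeze whatever max_batch_duration happened to be at import. Wrapping the lookup in a lambda defers it to each retry cycle. A minimal sketch of the difference, assuming the backoff package's support for a callable max_time; Config and flaky_call are illustrative:

## a sketch of why max_time is a lambda
import backoff

class Config:
    max_batch_duration = 30.0  ## the user can change this at runtime

## max_time=Config.max_batch_duration would bake in 30.0 forever;
## the lambda re-reads the current value on every retry cycle
@backoff.on_exception(backoff.expo, Exception, max_time=lambda: Config.max_batch_duration)
def flaky_call() -> None:
    raise Exception("transient failure")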
97
 
98
  ##-------------------start-of-log_retry()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
99
 
 
111
 
112
  retry_msg = f"Retrying translation after {details['wait']} seconds after {details['tries']} tries {details['target']} due to {details['exception']}."
113
 
114
+ logging.warning(retry_msg)
 
 
115
 
116
  ##-------------------start-of-log_failure()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
117
 
 
125
  Parameters:
126
  details (dict) : the details of the failure.
127
 
128
+ Raises:
129
+ MaxBatchDurationExceededException : An exception that is raised when the max batch duration is exceeded.
130
+
131
  """
132
 
133
+ error_msg = f"Exceeded allowed duration of {details['wait']} seconds, returning untranslated text after {details['tries']} tries {details['target']}."
134
 
135
+ logging.error(error_msg)
 
 
136
 
137
  raise MaxBatchDurationExceededException(error_msg)
138
 
 
147
 
148
  """
149
 
 
 
150
  ## set this here cause the try-except could throw before we get past the settings configuration
151
  time_start = time.time()
152
 
153
  try:
154
 
155
+ await Translator.initialize()
156
 
157
  JsonHandler.validate_json()
158
 
159
+ if(not Translator.is_cli):
160
+ await Translator.check_settings()
161
 
162
  ## set actual start time to the end of the settings configuration
163
  time_start = time.time()
164
 
165
+ await Translator.commence_translation()
166
 
167
  except Exception as e:
168
 
169
+ Translator.translation_print_result += "An error has occurred, outputting results so far..."
170
 
171
  FileEnsurer.handle_critical_exception(e)
172
 
 
174
 
175
  time_end = time.time()
176
 
177
+ Translator.assemble_results(time_start, time_end)
178
+
179
+ if(Translator.is_cli):
180
+ Toolkit.pause_console()
181
 
182
  ##-------------------start-of-initialize()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
183
 
 
186
 
187
  """
188
 
189
+ Sets the API Key for the respective service and loads the translation settings.
190
 
191
  """
192
 
193
+ translation_methods = {
194
+ "1": ("openai", FileEnsurer.openai_api_key_path),
195
+ "2": ("gemini", FileEnsurer.gemini_api_key_path),
196
+ "3": ("deepl", FileEnsurer.deepl_api_key_path),
197
+ }
198
 
199
+ if(not Translator.is_cli):
200
+ method = input("What method would you like to use for translation? (1 for OpenAI, 2 for Gemini, 3 for Deepl, or any other key to exit) : \n")
 
 
 
201
 
202
+ if(method not in translation_methods.keys()):
203
+ print("\nThank you for using Kudasai, goodbye.")
204
+ time.sleep(2)
205
+ FileEnsurer.exit_kudasai()
206
+
207
+ Toolkit.clear_console()
208
 
209
  else:
210
+ ## map the pre-set method name back to its menu key so the lookup below resolves correctly
+ method = {"openai": "1", "gemini": "2", "deepl": "3"}[Translator.TRANSLATION_METHOD]
 
 
 
211
 
212
+ Translator.TRANSLATION_METHOD, api_key_path = translation_methods.get(method, ("deepl", FileEnsurer.deepl_api_key_path))
213
+
214
+ if(Translator.pre_provided_api_key != ""):
215
+ encoded_key = base64.b64encode(Translator.pre_provided_api_key.encode('utf-8')).decode('utf-8')
216
+ Translator.pre_provided_api_key = ""
217
+ with open(api_key_path, 'w+', encoding='utf-8') as file:
218
+ file.write(encoded_key)
219
 
220
+ await Translator.init_api_key(Translator.TRANSLATION_METHOD.capitalize(), api_key_path, EasyTL.set_credentials, EasyTL.test_credentials)
221
+
222
+ ## try to load the translation settings
223
+ try:
224
+ JsonHandler.load_translation_settings()
225
+ ## if the translation settings don't exist, create them
226
  except:
227
+ JsonHandler.reset_translation_settings_to_default()
228
+ JsonHandler.load_translation_settings()
229
+
 
 
230
  Toolkit.clear_console()
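As an aside, the saved key is obfuscated rather than encrypted: `initialize()` Base64-encodes the pre-provided key before writing it, and the loader decodes it on the way back in. A minimal sketch of that round trip, with a hypothetical key and file name:

```python
import base64

def save_key(api_key: str, path: str) -> None:
    ## Base64 is reversible obfuscation, not encryption;
    ## anyone who can read the file can recover the key.
    with open(path, 'w+', encoding='utf-8') as file:
        file.write(base64.b64encode(api_key.encode('utf-8')).decode('utf-8'))

def load_key(path: str) -> str:
    with open(path, 'r', encoding='utf-8') as file:
        return base64.b64decode(file.read()).decode('utf-8')

save_key("sk-example", "example_key.txt")  ## hypothetical key and path
assert load_key("example_key.txt") == "sk-example"
```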
231
 
232
  ##-------------------start-of-init_openai_api_key()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
258
  ## if not valid, raise the exception that caused the test to fail
259
  if(not is_valid and e is not None):
260
  raise e
261
+
262
+ logging.info(f"Used saved API key in {api_key_path}")
 
263
 
264
  time.sleep(2)
265
 
 
283
  FileEnsurer.standard_overwrite_file(api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)
284
 
285
  ## if invalid key exit
286
+ except (GoogleAuthError, OpenAIAuthenticationError, DeepLAuthorizationException):
287
 
288
  Toolkit.clear_console()
289
+
290
+ logging.error(f"Authorization error while setting up {service}, please double check your API key as it appears to be incorrect.")
291
 
292
  Toolkit.pause_console()
293
 
294
+ exit(1)
295
 
296
  ## other error, alert user and raise it
297
  except Exception as e:
298
 
299
  Toolkit.clear_console()
300
 
301
+ logging.error(f"Unknown error while setting up {service}, The error is as follows " + str(e) + "\nThe exception will now be raised.")
302
 
303
  Toolkit.pause_console()
304
 
 
316
 
317
  """
318
 
319
+ Translator.text_to_translate = []
320
+ Translator.translated_text = []
321
+ Translator.je_check_text = []
322
+ Translator.error_text = []
323
+ Translator.openai_translation_batches = []
324
+ Translator.gemini_translation_batches = []
325
+ Translator.num_occurred_malformed_batches = 0
326
+ Translator.translation_print_result = ""
327
+ Translator.TRANSLATION_METHOD = "openai"
328
+ Translator.pre_provided_api_key = ""
329
+ Translator.is_cli = False
330
 
331
  ##-------------------start-of-check-settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
332
 
 
335
 
336
  """
337
 
338
+ Prompts the user to confirm the settings in the translation settings file.
339
 
340
  """
341
 
342
  print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
343
 
344
+ method_to_section_dict = {
345
+ "openai": ("openai settings", "OpenAI", FileEnsurer.openai_api_key_path),
346
+ "gemini": ("gemini settings", "Gemini", FileEnsurer.gemini_api_key_path),
347
+ "deepl": ("deepl settings", "DeepL", FileEnsurer.deepl_api_key_path)
348
+ }
349
+
350
+ section_to_target, method_name, api_key_path = method_to_section_dict[Translator.TRANSLATION_METHOD]
351
+
352
  try:
353
 
354
+ JsonHandler.log_translation_settings(output_to_console=True, specific_section=section_to_target)
355
 
356
  except:
357
  Toolkit.clear_console()
358
 
359
+ if(input("It's likely that you're using an outdated version of the translation settings file, press 1 to reset these to default or 2 to exit and resolve manually : ") == "1"):
360
  Toolkit.clear_console()
361
+ JsonHandler.reset_translation_settings_to_default()
362
+ JsonHandler.load_translation_settings()
363
 
364
  print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
365
+ JsonHandler.log_translation_settings(output_to_console=True, specific_section=section_to_target)
 
366
  else:
367
  FileEnsurer.exit_kudasai()
368
 
369
+ if(input("\n") != "1"):
370
+ JsonHandler.change_translation_settings()
 
 
371
 
372
  Toolkit.clear_console()
373
 
374
  print("Do you want to change your API key? (1 for yes or 2 for no) : ")
375
 
376
  if(input("\n") == "1"):
377
+ if(os.path.exists(api_key_path)):
378
+ os.remove(api_key_path)
379
+ await Translator.init_api_key(method_name, api_key_path, EasyTL.set_credentials, EasyTL.test_credentials)
 
380
 
381
  Toolkit.clear_console()
382
 
 
393
  is_webgui (bool | optional | default=False) : A bool representing whether the function is being called by the webgui.
394
 
395
  """
396
+
397
+ logging.debug(f"Translator Activated, Translation Method : {Translator.TRANSLATION_METHOD} "
398
+ f"Settings are as follows : ")
399
 
400
+ JsonHandler.log_translation_settings()
401
+
402
+ Translator.prompt_assembly_mode = int(JsonHandler.current_translation_settings["base translation settings"]["prompt_assembly_mode"])
403
+ Translator.number_of_lines_per_batch = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_lines_per_batch"])
404
+ Translator.sentence_fragmenter_mode = int(JsonHandler.current_translation_settings["base translation settings"]["sentence_fragmenter_mode"])
405
+ Translator.je_check_mode = int(JsonHandler.current_translation_settings["base translation settings"]["je_check_mode"])
406
+ Translator.num_of_malform_retries = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_malformed_batch_retries"])
407
+ Translator.max_batch_duration = float(JsonHandler.current_translation_settings["base translation settings"]["batch_retry_timeout"])
408
+ Translator.num_concurrent_batches = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_concurrent_batches"])
409
+
410
+ Translator._semaphore = asyncio.Semaphore(Translator.num_concurrent_batches)
411
+
412
+ Translator.openai_model = JsonHandler.current_translation_settings["openai settings"]["openai_model"]
413
+ Translator.openai_system_message = JsonHandler.current_translation_settings["openai settings"]["openai_system_message"]
414
+ Translator.openai_temperature = float(JsonHandler.current_translation_settings["openai settings"]["openai_temperature"])
415
+ Translator.openai_top_p = float(JsonHandler.current_translation_settings["openai settings"]["openai_top_p"])
416
+ Translator.openai_n = int(JsonHandler.current_translation_settings["openai settings"]["openai_n"])
417
+ Translator.openai_stream = bool(JsonHandler.current_translation_settings["openai settings"]["openai_stream"])
418
+ Translator.openai_stop = JsonHandler.current_translation_settings["openai settings"]["openai_stop"]
419
+ Translator.openai_logit_bias = JsonHandler.current_translation_settings["openai settings"]["openai_logit_bias"]
420
+ Translator.openai_max_tokens = JsonHandler.current_translation_settings["openai settings"]["openai_max_tokens"]
421
+ Translator.openai_presence_penalty = float(JsonHandler.current_translation_settings["openai settings"]["openai_presence_penalty"])
422
+ Translator.openai_frequency_penalty = float(JsonHandler.current_translation_settings["openai settings"]["openai_frequency_penalty"])
423
+
424
+ Translator.gemini_model = JsonHandler.current_translation_settings["gemini settings"]["gemini_model"]
425
+ Translator.gemini_prompt = JsonHandler.current_translation_settings["gemini settings"]["gemini_prompt"]
426
+ Translator.gemini_temperature = float(JsonHandler.current_translation_settings["gemini settings"]["gemini_temperature"])
427
+ Translator.gemini_top_p = JsonHandler.current_translation_settings["gemini settings"]["gemini_top_p"]
428
+ Translator.gemini_top_k = JsonHandler.current_translation_settings["gemini settings"]["gemini_top_k"]
429
+ Translator.gemini_candidate_count = JsonHandler.current_translation_settings["gemini settings"]["gemini_candidate_count"]
430
+ Translator.gemini_stream = bool(JsonHandler.current_translation_settings["gemini settings"]["gemini_stream"])
431
+ Translator.gemini_stop_sequences = JsonHandler.current_translation_settings["gemini settings"]["gemini_stop_sequences"]
432
+ Translator.gemini_max_output_tokens = JsonHandler.current_translation_settings["gemini settings"]["gemini_max_output_tokens"]
433
+
434
+ Translator.deepl_context = JsonHandler.current_translation_settings["deepl settings"]["deepl_context"]
435
+ Translator.deepl_split_sentences = JsonHandler.current_translation_settings["deepl settings"]["deepl_split_sentences"]
436
+ Translator.deepl_preserve_formatting = JsonHandler.current_translation_settings["deepl settings"]["deepl_preserve_formatting"]
437
+ Translator.deepl_formality = JsonHandler.current_translation_settings["deepl settings"]["deepl_formality"]
438
+
439
+ exception_dict = {
440
+ "openai": (OpenAIAuthenticationError, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, OpenAIAPIConnectionError, OpenAIAPIStatusError),
441
+ "gemini": GoogleAPIError,
442
+ "deepl": DeepLException
443
+ }
444
 
445
+ Translator.decorator_to_use = backoff.on_exception(
446
+ backoff.expo,
447
+ max_time=lambda: Translator.get_max_batch_duration(),
448
+ exception=exception_dict.get(Translator.TRANSLATION_METHOD, None),
449
+ on_backoff=lambda details: Translator.log_retry(details),
450
+ on_giveup=lambda details: Translator.log_failure(details),
451
+ raise_on_giveup=False
452
+ )
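The `max_time=lambda: ...` indirection is the point of `get_max_batch_duration()`: `backoff.on_exception` evaluates plain arguments once, at decoration time, while a callable is re-invoked on each retry, so later changes to `Translator.max_batch_duration` still take effect. A minimal sketch of the same pattern, assuming only the `backoff` package:

```python
import backoff

max_duration = 3.0  ## mutable setting, standing in for Translator.max_batch_duration

@backoff.on_exception(backoff.expo,
                      ValueError,
                      max_time=lambda: max_duration,  ## re-read on every retry, not frozen at decoration time
                      on_backoff=lambda details: print(f"retrying, attempt {details['tries']}"),
                      raise_on_giveup=False)
def flaky() -> None:
    raise ValueError("simulated transient failure")

flaky()  ## retries with exponential backoff for ~3 seconds, then gives up quietly
```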
 
453
 
454
  Toolkit.clear_console()
455
 
456
+ logging.info("Starting Prompt Building...")
 
 
 
 
457
 
458
+ Translator.build_translation_batches()
459
+
460
+ translation_methods = {
461
+ "openai": JsonHandler.current_translation_settings["openai settings"]["openai_model"],
462
+ "gemini": JsonHandler.current_translation_settings["gemini settings"]["gemini_model"],
463
+ "deepl": "deepl"
464
+ }
465
+
466
+ model = translation_methods[Translator.TRANSLATION_METHOD]
467
 
468
+ await Translator.handle_cost_estimate_prompt(model, omit_prompt=is_webgui or Translator.is_cli)
469
 
470
  Toolkit.clear_console()
471
 
472
+ logging.info("Starting Translation...")
 
 
 
473
 
474
  ## requests to run asynchronously
475
+ async_requests = Translator.build_async_requests(model)
476
 
477
  ## Use asyncio.gather to run tasks concurrently/asynchronously and wait for all of them to complete
478
  results = await asyncio.gather(*async_requests)
479
 
480
+ logging.info("Redistributing Translated Text...")
 
 
 
 
 
 
481
 
482
  ## Sort results based on the index to maintain order
483
  sorted_results = sorted(results, key=lambda x: x[0])
484
 
485
  ## Redistribute the sorted results
486
+ for _, translated_prompt, translated_message in sorted_results:
487
+ Translator.redistribute(translated_prompt, translated_message)
488
 
489
  ## try to pair the text for j-e checking if the mode is 2
490
+ if(Translator.je_check_mode == 2):
491
+ Translator.je_check_text = Translator.fix_je()
492
 
493
  Toolkit.clear_console()
494
 
495
+ logging.info("Done!")
 
 
 
 
496
 
497
  ##-------------------start-of-build_async_requests()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
498
 
 
512
  """
513
 
514
  async_requests = []
515
+
516
+ translation_batches_methods = {
517
+ "openai": Translator.openai_translation_batches,
518
+ "gemini": Translator.gemini_translation_batches,
519
+ "deepl": Translator.deepl_translation_batches
520
+ }
521
+
522
+ translation_batches = translation_batches_methods[Translator.TRANSLATION_METHOD]
523
+ batch_length = len(translation_batches)
524
+
525
+ if(Translator.TRANSLATION_METHOD != "deepl"):
526
 
527
+ for i in range(0, batch_length, 2):
528
+ instructions = translation_batches[i]
529
+ prompt = translation_batches[i+1]
530
+
531
+ assert isinstance(instructions, (SystemTranslationMessage, str))
532
+ assert isinstance(prompt, (ModelTranslationMessage, str))
533
+
534
+ async_requests.append(Translator.handle_translation(model, i, batch_length, prompt, instructions))
 
535
 
536
+ else:
537
+ for i, batch in enumerate(translation_batches):
538
 
539
+ assert isinstance(batch, str)
540
+
541
+ async_requests.append(Translator.handle_translation(model, i, batch_length, batch, None))
542
+
543
  return async_requests
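For OpenAI and Gemini the batch list is interleaved as `[instructions, prompt, instructions, prompt, ...]`, which is why the loop above steps by two; DeepL batches are plain strings and are enumerated one at a time. A small sketch of the pairwise walk, assuming that interleaved layout:

```python
batches = ["system msg A", "prompt A", "system msg B", "prompt B"]

## even indices hold the instructions, odd indices hold the prompt they belong to
for i in range(0, len(batches), 2):
    instructions, prompt = batches[i], batches[i + 1]
    print(f"batch {i // 2 + 1}: {instructions!r} -> {prompt!r}")
```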
544
 
545
  ##-------------------start-of-generate_text_to_translate_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
562
 
563
  prompt = []
564
  non_word_pattern = re.compile(r'^[\W_\s\n-]+$')
565
+ special_chars = ["▼", "△", "◇"]
566
+ quotes = ["「", "」", "『", "』", "【", "】", "\"", "'"]
567
+ part_chars = ["1","2","3","4","5","6","7","8","9", " "]
568
+
569
+ while(index < len(Translator.text_to_translate)):
570
 
571
+ sentence = Translator.text_to_translate[index].strip()
 
 
 
572
  lowercase_sentence = sentence.lower()
573
+
574
+ has_quotes = any(char in sentence for char in quotes)
575
  is_part_in_sentence = "part" in lowercase_sentence
576
+ is_special_char = any(char in sentence for char in special_chars)
577
+ is_part_char = bool(sentence) and all(char in part_chars for char in sentence) ## the sentence consists solely of digits/spaces, e.g. a bare part number
578
+
579
+ if(len(prompt) < Translator.number_of_lines_per_batch):
580
+ if(is_special_char or is_part_in_sentence or is_part_char):
581
  prompt.append(f'{sentence}\n')
582
+ logging.debug(f"Sentence : {sentence}, Sentence is a pov change or part marker... adding to prompt.")
 
 
 
583
 
584
+ elif((non_word_pattern.match(sentence) or KatakanaUtil.is_punctuation(sentence)) and not has_quotes): ## parenthesized so quoted lines are never skipped
585
+ logging.debug(f"Sentence : {sentence}, Sentence is punctuation... skipping.")
 
586
 
587
+ elif(sentence):
 
 
 
588
  prompt.append(f'{sentence}\n')
589
+ logging.debug(f"Sentence : {sentence}, Sentence is a valid sentence... adding to prompt.")
 
590
  else:
591
  return prompt, index
592
+
593
  index += 1
594
+
595
  return prompt, index
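In effect, this method walks the source lines from `index`, keeps markers and real sentences, skips punctuation-only lines, and stops once the batch holds `number_of_lines_per_batch` entries, returning the batch together with the index to resume from. A stripped-down sketch of that accumulation, without the marker and quote handling:

```python
import re

NON_WORD = re.compile(r'^[\W_\s\n-]+$')

def next_batch(lines: list[str], index: int, lines_per_batch: int = 2) -> tuple[list[str], int]:
    batch: list[str] = []
    while index < len(lines) and len(batch) < lines_per_batch:
        sentence = lines[index].strip()
        if(sentence and not NON_WORD.match(sentence)):  ## skip blank and punctuation-only lines
            batch.append(sentence + '\n')
        index += 1
    return batch, index  ## index is where the next batch resumes

lines = ["こんにちは。", "...", "元気ですか?", "はい。"]
print(next_batch(lines, 0))  ## (['こんにちは。\n', '元気ですか?\n'], 3)
```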
596
 
597
  ##-------------------start-of-build_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
607
 
608
  i = 0
609
 
610
+ while i < len(Translator.text_to_translate):
611
 
612
+ batch, i = Translator.generate_text_to_translate_batches(i)
613
  batch = ''.join(batch)
614
 
615
+ if(Translator.TRANSLATION_METHOD == 'openai'):
616
 
617
+ ## both prompt assembly modes currently build the same system message
618
+ system_msg = SystemTranslationMessage(content=str(Translator.openai_system_message))
621
 
622
+ Translator.openai_translation_batches.append(system_msg)
623
  model_msg = ModelTranslationMessage(content=batch)
624
+ Translator.openai_translation_batches.append(model_msg)
625
+
626
+ elif(Translator.TRANSLATION_METHOD == 'gemini'):
627
+ Translator.gemini_translation_batches.append(Translator.gemini_prompt)
628
+ Translator.gemini_translation_batches.append(batch)
629
 
630
  else:
631
+ Translator.deepl_translation_batches.append(batch)
 
632
 
633
+ logging_message = "Built Messages: \n\n"
634
+
635
+ batches_to_iterate = {
636
+ "openai": Translator.openai_translation_batches,
637
+ "gemini": Translator.gemini_translation_batches,
638
+ "deepl": Translator.deepl_translation_batches
639
+ }
640
 
641
  i = 0
642
 
643
+ batches = batches_to_iterate[Translator.TRANSLATION_METHOD]
644
+
645
+ for message in batches:
646
 
647
  i+=1
648
 
649
+ message = str(message) if Translator.TRANSLATION_METHOD != 'openai' else message.content # type: ignore
650
+
651
+ if(i % 2 == 1 and Translator.TRANSLATION_METHOD != 'deepl'):
652
+ logging_message += "\n" "------------------------" "\n"
653
 
654
+ logging_message += message + "\n"
 
655
 
656
+ logging.debug(logging_message)
657
 
658
  ##-------------------start-of-handle_cost_estimate_prompt()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
659
 
 
673
 
674
  """
675
 
676
+ translation_instructions_methods = {
677
+ "openai": Translator.openai_system_message,
678
+ "gemini": Translator.gemini_prompt,
679
+ "deepl": None,
680
+ }
681
+
682
+ translation_instructions = translation_instructions_methods[Translator.TRANSLATION_METHOD]
683
 
684
  ## get cost estimate and confirm
685
+ num_tokens, min_cost, model = EasyTL.calculate_cost(text=Translator.text_to_translate, service=Translator.TRANSLATION_METHOD, model=model, translation_instructions=translation_instructions)
686
 
687
  print("Note that the cost estimate is not always accurate, and may be higher than the actual cost. However cost calculation now includes output tokens.\n")
688
 
689
+ if(Translator.TRANSLATION_METHOD == "gemini"):
690
+ logging.info(f"As of Kudasai {Toolkit.CURRENT_VERSION}, Gemini Pro 1.0 is free to use under 15 requests per minute, Gemini Pro 1.5 is free to use under 2 requests per minute. Requests correspond to number_of_current_batches in the translation settings.")
 
 
 
 
691
 
692
+ logging.info("Estimated number of tokens : " + str(num_tokens))
693
+ logging.info("Estimated minimum cost : " + str(min_cost) + " USD")
 
694
 
695
  if(not omit_prompt):
696
  if(input("\nContinue? (1 for yes or 2 for no) : ") == "1"):
697
+ logging.info("User confirmed translation.")
698
 
699
  else:
700
+ logging.info("User cancelled translation.")
701
+ FileEnsurer.exit_kudasai()
 
702
 
703
  return model
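The estimate itself is plain per-token arithmetic: tokens divided by 1,000, times the model's per-1K rate, summed over input and output. A back-of-the-envelope version (the rates below are illustrative, not current pricing):

```python
def estimate_min_cost(input_tokens: int, output_tokens: int,
                      input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    ## rates are USD per 1K tokens; real pricing varies by model and changes over time
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k

## e.g. 12,000 input tokens and 12,000 output tokens at $0.01 / $0.03 per 1K tokens:
print(f"{estimate_min_cost(12_000, 12_000, 0.01, 0.03):.2f} USD")  ## 0.48 USD
```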
704
 
705
  ##-------------------start-of-handle_translation()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
706
 
707
  @staticmethod
708
+ async def handle_translation(model:str,
709
+ batch_index:int,
710
+ length_of_batch:int,
711
+ text_to_translate:typing.Union[str, ModelTranslationMessage],
712
+ translation_instructions:typing.Union[str, SystemTranslationMessage, None]) -> tuple[int, str, str]:
713
 
714
  """
715
 
 
717
 
718
  Parameters:
719
  model (string) : The model of the service used to translate the text.
720
+ batch_index (int) : The index of this batch within the batch list.
721
+ length_of_batch (int) : The total length of the batch list (instruction/prompt pairs count as two entries for OpenAI and Gemini).
722
+ text_to_translate (typing.Union[str, ModelTranslationMessage]) : The text to translate.
723
+ translation_instructions (typing.Union[str, SystemTranslationMessage, None]) : The translation instructions.
724
 
725
  Returns:
726
+ batch_index (int) : The batch index.
727
+ text_to_translate (str) : The original, untranslated text.
728
+ translated_text (str) : The translated text.
729
 
730
  """
731
 
732
  ## Basically limits the number of concurrent batches
733
+ async with Translator._semaphore:
734
  num_tries = 0
735
 
736
  while True:
 
739
  if(FileEnsurer.do_interrupt == True):
740
  raise Exception("Interrupted by user.")
741
 
742
+ batch_number = (batch_index // 2) + 1
743
+
744
+ logging.info(f"Trying translation for batch {batch_number} of {length_of_batch//2}...")
745
 
746
  try:
747
 
748
+ translation_methods = {
749
+ "openai": EasyTL.openai_translate_async,
750
+ "gemini": EasyTL.gemini_translate_async,
751
+ "deepl": EasyTL.deepl_translate_async
752
+ }
753
+
754
+ translation_params = {
755
+ "openai": {
756
+ "text": text_to_translate,
757
+ "decorator": Translator.decorator_to_use,
758
+ "translation_instructions": translation_instructions,
759
+ "model": model,
760
+ "temperature": Translator.openai_temperature,
761
+ "top_p": Translator.openai_top_p,
762
+ "stop": Translator.openai_stop,
763
+ "max_tokens": Translator.openai_max_tokens,
764
+ "presence_penalty": Translator.openai_presence_penalty,
765
+ "frequency_penalty": Translator.openai_frequency_penalty
766
+ },
767
+ "gemini": {
768
+ "text": text_to_translate,
769
+ "decorator": Translator.decorator_to_use,
770
+ "model": model,
771
+ "temperature": Translator.gemini_temperature,
772
+ "top_p": Translator.gemini_top_p,
773
+ "top_k": Translator.gemini_top_k,
774
+ "stop_sequences": Translator.gemini_stop_sequences,
775
+ "max_output_tokens": Translator.gemini_max_output_tokens
776
+ },
777
+ "deepl": {
778
+ "text": text_to_translate,
779
+ "decorator": Translator.decorator_to_use,
780
+ "context": Translator.deepl_context,
781
+ "split_sentences": Translator.deepl_split_sentences,
782
+ "preserve_formatting": Translator.deepl_preserve_formatting,
783
+ "formality": Translator.deepl_formality
784
+ }
785
+ }
786
+
787
+ assert isinstance(text_to_translate, ModelTranslationMessage if Translator.TRANSLATION_METHOD == "openai" else str)
788
+
789
+ translated_message = await translation_methods[Translator.TRANSLATION_METHOD](**translation_params[Translator.TRANSLATION_METHOD])
790
 
791
  ## will only occur if the max_batch_duration is exceeded, so we just return the untranslated text
792
  except MaxBatchDurationExceededException:
793
 
794
+ logging.error(f"Batch {batch_number} of {length_of_batch//2} was not translated due to exceeding the max request duration, returning the untranslated text...")
795
  break
796
 
797
  ## do not even bother if not a gpt 4 model, because gpt-3 seems unable to format properly
798
  ## since gemini is free, we can just try again if it's malformed
799
+ ## deepl should produce properly formatted text so we don't need to check
800
+ if("gpt-4" not in model and Translator.TRANSLATION_METHOD == "openai"):
801
  break
802
 
803
+ if(await Translator.check_if_translation_is_good(translated_message, text_to_translate)): # type: ignore
 
804
  break
805
 
806
+ if(num_tries >= Translator.num_of_malform_retries):
807
+ logging.warning(f"Batch {batch_number} of {length_of_batch//2} was malformed but exceeded the max number of retries ({Translator.num_of_malform_retries})")
808
  break
809
 
810
  else:
811
  num_tries += 1
812
+ logging.warning(f"Batch {batch_number} of {length_of_batch//2} was malformed, retrying...")
813
+ Translator.num_occurred_malformed_batches += 1
814
 
815
+ if(isinstance(text_to_translate, ModelTranslationMessage)):
816
+ text_to_translate = text_to_translate.content
817
 
818
  if(isinstance(translated_message, typing.List)):
819
+ translated_message = ''.join(translated_message) # type: ignore
820
+
821
+ logging.info(f"Translation for batch {batch_number} of {length_of_batch//2} completed.")
822
 
823
+ return batch_index, text_to_translate, translated_message # type: ignore
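The per-service branching above is table-driven: one dict maps the method name to an EasyTL coroutine, another maps it to that coroutine's keyword arguments, so the actual call site stays a single line. A service-agnostic sketch of the pattern (the handlers here are stand-ins, not EasyTL's API):

```python
import asyncio

async def fake_openai(text: str, temperature: float) -> str:
    return f"[openai t={temperature}] {text}"

async def fake_deepl(text: str, formality: str) -> str:
    return f"[deepl {formality}] {text}"

HANDLERS = {"openai": fake_openai, "deepl": fake_deepl}
PARAMS = {"openai": {"temperature": 0.3}, "deepl": {"formality": "default"}}

async def translate(method: str, text: str) -> str:
    ## one call site for every service: look up the coroutine, splat its kwargs
    return await HANDLERS[method](text=text, **PARAMS[method])

print(asyncio.run(translate("deepl", "こんにちは")))
```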
824
 
825
  ##-------------------start-of-check_if_translation_is_good()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
826
 
827
  @staticmethod
828
+ async def check_if_translation_is_good(translated_message:typing.Union[typing.List[str], str], text_to_translate:typing.Union[ModelTranslationMessage, str]) -> bool:
829
 
830
  """
831
 
 
833
 
834
  Parameters:
835
  translated_message (typing.Union[typing.List[str], str]) : the translated message.
836
+ text_to_translate (typing.Union[ModelTranslationMessage, str]) : the translation prompt.
837
 
838
  Returns:
839
  is_valid (bool) : whether or not the translation is valid.
840
 
841
  """
842
 
843
+ if(not isinstance(text_to_translate, str)):
844
+ prompt = text_to_translate.content
845
 
846
  else:
847
+ prompt = text_to_translate
848
 
849
  if(isinstance(translated_message, list)):
850
  translated_message = ''.join(translated_message)
851
 
 
 
852
  jap = [line for line in prompt.split('\n') if line.strip()] ## Remove blank lines
853
  eng = [line for line in translated_message.split('\n') if line.strip()] ## Remove blank lines
 
 
 
854
 
855
+ return len(jap) == len(eng)
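The malformed-batch check is deliberately coarse: a translation passes if it has the same number of non-blank lines as the source, nothing more. A self-contained illustration of the same comparison (the helper name here is hypothetical):

```python
def is_well_formed(source: str, translation: str) -> bool:
    ## mirrors check_if_translation_is_good(): compare non-blank line counts
    jap = [line for line in source.split('\n') if line.strip()]
    eng = [line for line in translation.split('\n') if line.strip()]
    return len(jap) == len(eng)

print(is_well_formed("一行目\n二行目\n", "Line one\nLine two\n"))        ## True
print(is_well_formed("一行目\n二行目\n", "Line one and two merged\n"))  ## False
```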
856
 
857
  ##-------------------start-of-redistribute()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
858
 
859
  @staticmethod
860
+ def redistribute(text_to_translate:typing.Union[Message, str], translated_message:str) -> None:
861
 
862
  """
863
 
864
  Puts translated text back into the text file.
865
 
866
  Parameters:
867
+ text_to_translate (typing.Union[str, Message]) : the translation prompt.
868
  translated_message (str) : the translated message.
869
 
870
  """
871
 
872
+ if(not isinstance(text_to_translate, str)):
873
+ prompt = text_to_translate.content
874
 
875
  else:
876
+ prompt = text_to_translate
877
 
878
  ## Separates with hyphens if the mode is 1
879
+ if(Translator.je_check_mode == 1):
880
 
881
+ Translator.je_check_text.append("\n-------------------------\n"+ prompt + "\n\n")
882
+ Translator.je_check_text.append(translated_message + '\n')
883
 
884
  ## Mode two tries to pair the text for j-e checking, see fix_je() for more details
885
+ elif(Translator.je_check_mode == 2):
886
+ Translator.je_check_text.append(prompt)
887
+ Translator.je_check_text.append(translated_message)
888
 
889
  ## mode 1 is the default mode, uses regex and other nonsense to split sentences
890
+ if(Translator.sentence_fragmenter_mode == 1):
891
 
892
  sentences = re.findall(r"(.*?(?:(?:\"|\'|-|~|!|\?|%|\(|\)|\.\.\.|\.|---|\[|\])))(?:\s|$)", translated_message)
893
 
 
910
  build_string += f" {sentence}"
911
  continue
912
 
913
+ Translator.translated_text.append(sentence + '\n')
914
 
915
+ for i in range(len(Translator.translated_text)):
916
+ if Translator.translated_text[i] in patched_sentences:
917
+ index = patched_sentences.index(Translator.translated_text[i])
918
+ Translator.translated_text[i] = patched_sentences[index]
919
 
920
  ## mode 2 just assumes the LLM formatted it properly
921
+ elif(Translator.sentence_fragmenter_mode == 2):
922
 
923
+ Translator.translated_text.append(translated_message + '\n\n')
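Sentence fragmenter mode 1 re-splits the model's output with the regex above, treating the listed punctuation followed by whitespace (or end of string) as a boundary, while mode 2 trusts the LLM's own line breaks. A quick illustration of what that regex yields on a sample string:

```python
import re

FRAGMENT_PATTERN = r"(.*?(?:(?:\"|\'|-|~|!|\?|%|\(|\)|\.\.\.|\.|---|\[|\])))(?:\s|$)"

sample = "Hello there! How are you? I am fine."
print(re.findall(FRAGMENT_PATTERN, sample))
## ['Hello there!', 'How are you?', 'I am fine.']
```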
924
 
925
  ##-------------------start-of-fix_je()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
926
 
 
941
  i = 1
942
  final_list = []
943
 
944
+ while(i < len(Translator.je_check_text)):
945
+ jap = Translator.je_check_text[i-1].split('\n')
946
+ eng = Translator.je_check_text[i].split('\n')
947
 
948
+ jap = [line for line in jap if(line.strip())] # Remove blank lines
949
+ eng = [line for line in eng if(line.strip())] # Remove blank lines
950
 
951
  final_list.append("-------------------------\n")
952
 
953
  if(len(jap) == len(eng)):
954
+ for(jap_line, eng_line) in zip(jap, eng):
955
+ if(jap_line and eng_line): # check if jap_line and eng_line aren't blank
 
956
  final_list.append(jap_line + '\n\n')
957
  final_list.append(eng_line + '\n\n')
 
958
  final_list.append("--------------------------------------------------\n")
 
 
959
  else:
960
+ final_list.append(Translator.je_check_text[i-1] + '\n\n')
961
+ final_list.append(Translator.je_check_text[i] + '\n\n')
 
 
962
  final_list.append("--------------------------------------------------\n")
963
 
964
+ i += 2
965
 
966
  return final_list
967
+
968
  ##-------------------start-of-assemble_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
969
 
970
  @staticmethod
 
972
 
973
  """
974
 
975
+ Generates the Translator translation print result. Does not output or return it directly; instead, it sets Translator.translation_print_result.
976
 
977
  Parameters:
978
  time_start (float) : When the translation started.
 
982
 
983
  result = (
984
  f"Time Elapsed : {Toolkit.get_elapsed_time(time_start, time_end)}\n"
985
+ f"Number of malformed batches : {Translator.num_occurred_malformed_batches}\n\n"
986
  f"Debug text have been written to : {FileEnsurer.debug_log_path}\n"
987
  f"J->E text have been written to : {FileEnsurer.je_check_path}\n"
988
  f"Translated text has been written to : {FileEnsurer.translated_text_path}\n"
989
  f"Errors have been written to : {FileEnsurer.error_log_path}\n"
990
  )
991
 
992
+ Translator.translation_print_result = result
993
 
994
+ ##-------------------start-of-write_translator_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
995
 
996
  @staticmethod
997
  @permission_error_decorator()
998
+ def write_translator_results() -> None:
999
 
1000
  """
1001
 
1002
+ Writes the results of the Translator module to the output directory.
1003
 
1004
  """
1005
 
 
1007
  FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
1008
 
1009
  with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
1010
+ file.writelines(Translator.error_text)
1011
 
1012
  with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
1013
+ file.writelines(Translator.je_check_text)
1014
 
1015
  with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
1016
+ file.writelines(Translator.translated_text)
1017
 
1018
  ## Instructions to create a copy of the output for archival
1019
  FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
1020
 
1021
  timestamp = Toolkit.get_timestamp(is_archival=True)
1022
 
1023
+ list_of_result_tuples = [('kudasai_translated_text', Translator.translated_text),
1024
+ ('kudasai_je_check_text', Translator.je_check_text),
1025
+ ('kudasai_error_log', Translator.error_text),
1026
+ ('debug_log', FileEnsurer.standard_read_file(FileEnsurer.debug_log_path))]
 
 
 
 
1027
 
1028
  FileEnsurer.archive_results(list_of_result_tuples,
1029
+ module='translator', timestamp=timestamp)
modules/gui/gui_json_util.py CHANGED
@@ -12,16 +12,16 @@ from handlers.json_handler import JsonHandler
12
 
13
  class GuiJsonUtil:
14
 
15
- current_kijiku_rules = dict()
16
 
17
  ##-------------------start-of-fetch_kijiku_setting_key_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18
 
19
  @staticmethod
20
- def fetch_kijiku_setting_key_values(header:str, key_name:str) -> str:
21
 
22
  """
23
 
24
- Fetches the default values for the settings tab from the kijiku_settings.json file.
25
 
26
  Parameters:
27
  key_name (str) : Which value to fetch.
@@ -32,16 +32,16 @@ class GuiJsonUtil:
32
  """
33
 
34
  ## Done this way because if the value is None, it'll be shown as a blank string in the settings tab, which is not what we want.
35
- return GuiJsonUtil.current_kijiku_rules[header].get(key_name, "None")
36
 
37
- ##-------------------start-of-update_kijiku_settings_with_new_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
38
 
39
  @staticmethod
40
- def update_kijiku_settings_with_new_values(gradio_kijiku_rule:gr.File, new_values:typing.List[typing.Tuple[str,str]]) -> None:
41
 
42
  """
43
 
44
- Dumps the new values for the settings tab into the kijiku_settings.json file.
45
 
46
  Parameters:
47
  new_values (typing.List[typing.Tuple[str,str]]) : A list of tuples containing the key and value to be updated.
@@ -49,7 +49,7 @@ class GuiJsonUtil:
49
  """
50
 
51
  ## save old json in case of need to revert
52
- old_rules = GuiJsonUtil.current_kijiku_rules
53
  new_rules = old_rules.copy()
54
 
55
  try:
@@ -58,32 +58,32 @@ class GuiJsonUtil:
58
  for key, value in new_values:
59
  new_rules[header][key] = JsonHandler.convert_to_correct_type(key, str(value))
60
 
61
- JsonHandler.current_kijiku_rules = new_rules
62
  JsonHandler.validate_json()
63
 
64
  ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
65
- assert JsonHandler.current_kijiku_rules != FileEnsurer.INVALID_KIJIKU_RULES_PLACEHOLDER
66
 
67
- ## so, because of how gradio deals with temp file, we need to both dump into the settings file from FileEnsurer AND the gradio_kijiku_rule file which is stored in the temp folder under AppData
68
  ## name is the path to the file btw
69
- with open(FileEnsurer.config_kijiku_rules_path, "w") as file:
70
  json.dump(new_rules, file)
71
 
72
- with open(gradio_kijiku_rule.name, "w") as file: ## type: ignore
73
  json.dump(new_rules, file)
74
 
75
- GuiJsonUtil.current_kijiku_rules = new_rules
76
 
77
  except Exception as e:
78
 
79
  ## revert to old data
80
- with open(FileEnsurer.config_kijiku_rules_path, "w") as file:
81
  json.dump(old_rules, file)
82
 
83
- with open(gradio_kijiku_rule.name, "w") as file: ## type: ignore
84
  json.dump(old_rules, file)
85
 
86
- GuiJsonUtil.current_kijiku_rules = old_rules
87
 
88
  ## throw error so webgui can tell user
89
  raise e
 
12
 
13
  class GuiJsonUtil:
14
 
15
+ current_translation_settings = dict()
16
 
17
  ##-------------------start-of-fetch_kijiku_setting_key_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18
 
19
  @staticmethod
20
+ def fetch_translation_settings_key_values(header:str, key_name:str) -> str:
21
 
22
  """
23
 
24
+ Fetches the default values for the settings tab from the translation_settings.json file.
25
 
26
  Parameters:
27
  key_name (str) : Which value to fetch.
 
32
  """
33
 
34
  ## Done this way because if the value is None, it'll be shown as a blank string in the settings tab, which is not what we want.
35
+ return GuiJsonUtil.current_translation_settings[header].get(key_name, "None")
36
 
37
+ ##-------------------start-of-update_translation_settings_with_new_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
38
 
39
  @staticmethod
40
+ def update_translation_settings_with_new_values(gradio_translation_settings:gr.File, new_values:typing.List[typing.Tuple[str,str]]) -> None:
41
 
42
  """
43
 
44
+ Dumps the new values for the settings tab into the translation_settings.json file.
45
 
46
  Parameters:
47
  new_values (typing.List[typing.Tuple[str,str]]) : A list of tuples containing the key and value to be updated.
 
49
  """
50
 
51
  ## save old json in case of need to revert
52
+ old_rules = GuiJsonUtil.current_translation_settings
53
  new_rules = old_rules.copy()
54
 
55
  try:
 
58
  for key, value in new_values:
59
  new_rules[header][key] = JsonHandler.convert_to_correct_type(key, str(value))
60
 
61
+ JsonHandler.current_translation_settings = new_rules
62
  JsonHandler.validate_json()
63
 
64
  ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
65
+ assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
66
 
67
+ ## because of how gradio handles temp files, we need to dump into both the settings file from FileEnsurer AND the gradio_translation_settings file, which is stored in the temp folder under AppData
68
  ## name is the path to the file btw
69
+ with open(FileEnsurer.config_translation_settings_path, "w") as file:
70
  json.dump(new_rules, file)
71
 
72
+ with open(gradio_translation_settings.name, "w") as file: ## type: ignore
73
  json.dump(new_rules, file)
74
 
75
+ GuiJsonUtil.current_translation_settings = new_rules
76
 
77
  except Exception as e:
78
 
79
  ## revert to old data
80
+ with open(FileEnsurer.config_translation_settings_path, "w") as file:
81
  json.dump(old_rules, file)
82
 
83
+ with open(gradio_translation_settings.name, "w") as file: ## type: ignore
84
  json.dump(old_rules, file)
85
 
86
+ GuiJsonUtil.current_translation_settings = old_rules
87
 
88
  ## throw error so webgui can tell user
89
  raise e
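The update path above is a write-validate-revert transaction: keep the old dict, apply the new values, validate, and on any failure restore the old JSON to both files before re-raising so the WebGUI can surface the error. A file-level sketch of the same idea, using a simplified flat settings layout and a stand-in validation rule:

```python
import json

def update_settings(path: str, new_values: dict) -> None:
    with open(path, 'r', encoding='utf-8') as file:
        old_rules = json.load(file)  ## keep a copy in case we need to revert

    new_rules = {**old_rules, **new_values}

    try:
        ## stand-in validation; the real code runs JsonHandler.validate_json()
        if(not isinstance(new_rules.get("number_of_lines_per_batch"), int)):
            raise ValueError("number_of_lines_per_batch must be an int")

        with open(path, 'w', encoding='utf-8') as file:
            json.dump(new_rules, file)

    except Exception:
        ## revert to the old data, then re-raise so the caller (e.g. the WebGUI) can report it
        with open(path, 'w', encoding='utf-8') as file:
            json.dump(old_rules, file)
        raise
```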
requirements.txt CHANGED
@@ -1,5 +1,4 @@
1
  backoff==2.2.1
2
- gradio==4.19.2
3
  kairyou==1.5.0
4
- easytl==0.1.0
5
- ja_core_news_lg @ https://github.com/explosion/spacy-models/releases/download/ja_core_news_lg-3.7.0/ja_core_news_lg-3.7.0-py3-none-any.whl#sha256=f08eecb4d40523045c9478ce59a67564fd71edd215f32c076fa91dc1f05cc7fd
 
1
  backoff==2.2.1
2
+ gradio==4.20.0
3
  kairyou==1.5.0
4
+ easytl==0.4.0-alpha-2
 
util/openai_model_info/openai_chat_model_info.csv DELETED
@@ -1,17 +0,0 @@
1
- ,Batch,Name,Price,Recommended Replacement,Depreciation Date,Shutdown Date (earliest)
2
- ,,,,,,
3
- ,First (depreciated I),gpt-3.5-turbo-0301,$0.0015 / 1K input tokens + $0.0020 / 1K output tokens,gpt-3.5-turbo-0613,"June 13, 2023","June 13, 2024"
4
- ,,gpt-4-0314,$0.03 / 1K input tokens + $0.06 / 1K output tokens,gpt-4-0613,"June 13, 2023","June 13, 2024"
5
- ,,gpt-4-32k-0314,$0.06 / 1K input tokens + $0.12 / 1K output tokens,gpt-4-32k-0613,"June 13, 2023","June 13, 2024"
6
- ,,,,,,
7
- ,Second (depreciated II),gpt-3.5-turbo-0613,$0.0015 / 1K input tokens + $0.0020 / 1K output tokens,gpt-3.5-turbo-1106,"November 6,2023","June 13, 2024"
8
- ,,gpt-3.5-turbo-16k-0613,$0.0030 / 1K input tokens + $0.0040 / 1K output tokens,gpt-3.5-turbo-1106,"November 6,2023","June 13, 2024"
9
- ,,,,,,
10
- ,Third (Outdated),gpt-3.5-turbo-1106,$0.0010 / 1K input tokens + $0.0020 / 1K output tokens,N/A,N/A,N/A
11
- ,,,,,,
12
- ,Fourth (Current),gpt-3.5-turbo-0125,$0.005 / 1K input tokens + $0.0015 / 1K output tokens,N/A,N/A,N/A
13
- ,,gpt-4-0613,$0.03 / 1K input tokens + $0.06 / 1K output tokens,N/A,N/A,N/A
14
- ,,gpt-4-32k-0613,$0.06 / 1K input tokens + $0.012 / 1K output tokens,N/A,N/A,N/A
15
- ,,,,,,
16
- ,Fifth (Future),gpt-4-1106-preview,$0.01 / 1K input tokens + $0.03 / 1K output tokens,N/A,N/A,N/A
17
- ,,gpt-4-0125-preview,$0.01 / 1K input tokens + $0.03 / 1K output tokens,N/A,N/A,N/A
 
util/openai_model_info/openai_chat_model_info.pdf DELETED
Binary file (31.3 kB)
 
util/openai_model_info/openai_chat_model_info.xlsx DELETED
Binary file (5.68 kB)
 
util/openai_model_info/webpage/openai_chat_model_info.html DELETED
@@ -1,2 +0,0 @@
1
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8"><link type="text/css" rel="stylesheet" href="resources/sheet.css" >
2
- <style type="text/css">.ritz .waffle a { color: inherit; }.ritz .waffle .s1{background-color:#ffffff;text-align:right;color:#000000;font-family:'Arial';font-size:10pt;vertical-align:bottom;white-space:nowrap;direction:ltr;padding:2px 3px 2px 3px;}.ritz .waffle .s0{background-color:#ffffff;text-align:left;color:#000000;font-family:'Arial';font-size:10pt;vertical-align:bottom;white-space:nowrap;direction:ltr;padding:2px 3px 2px 3px;}</style><div class="ritz grid-container" dir="ltr"><table class="waffle" cellspacing="0" cellpadding="0"><thead><tr><th class="row-header freezebar-origin-ltr"></th><th id="0C0" style="width:100px;" class="column-headers-background">A</th><th id="0C1" style="width:141px;" class="column-headers-background">B</th><th id="0C2" style="width:159px;" class="column-headers-background">C</th><th id="0C3" style="width:336px;" class="column-headers-background">D</th><th id="0C4" style="width:182px;" class="column-headers-background">E</th><th id="0C5" style="width:167px;" class="column-headers-background">F</th><th id="0C6" style="width:151px;" class="column-headers-background">G</th></tr></thead><tbody><tr style="height: 20px"><th id="0R0" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">1</div></th><td></td><td class="s0" dir="ltr">Batch</td><td class="s0" dir="ltr">Name</td><td class="s0" dir="ltr">Price</td><td class="s0" dir="ltr">Recommended Replacement</td><td class="s0" dir="ltr">Depreciation Date</td><td class="s0" dir="ltr">Shutdown Date (earliest)</td></tr><tr style="height: 20px"><th id="0R1" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">2</div></th><td></td><td></td><td></td><td></td><td></td><td class="s0" dir="ltr"></td><td class="s0" dir="ltr"></td></tr><tr style="height: 20px"><th id="0R2" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">3</div></th><td></td><td class="s0" dir="ltr">First (depreciated I)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0301</td><td class="s0" dir="ltr">$0.0015 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R3" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">4</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-0314</td><td class="s0" dir="ltr">$0.03 / 1K input tokens + $0.06 / 1K output tokens</td><td class="s0" dir="ltr">gpt-4-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R4" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">5</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-32k-0314</td><td class="s0" dir="ltr">$0.06 / 1K input tokens + $0.12 / 1K output tokens</td><td class="s0" dir="ltr">gpt-4-32k-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R5" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">6</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R6" style="height: 20px;" class="row-headers-background"><div 
class="row-header-wrapper" style="line-height: 20px">7</div></th><td></td><td class="s0" dir="ltr">Second (depreciated II)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0613</td><td class="s0" dir="ltr">$0.0015 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s1" dir="ltr">November 6,2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R7" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">8</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-3.5-turbo-16k-0613</td><td class="s0" dir="ltr">$0.0030 / 1K input tokens + $0.0040 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s1" dir="ltr">November 6,2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R8" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">9</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R9" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">10</div></th><td></td><td class="s0" dir="ltr">Third (Outdated)</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s0" dir="ltr">$0.0010 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R10" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">11</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R11" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">12</div></th><td></td><td class="s0" dir="ltr">Fourth (Current)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0125</td><td class="s0" dir="ltr">$0.005 / 1K input tokens + $0.0015 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R12" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">13</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-0613</td><td class="s0" dir="ltr">$0.03 / 1K input tokens + $0.06 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R13" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">14</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-32k-0613</td><td class="s0" dir="ltr">$0.06 / 1K input tokens + $0.012 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R14" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">15</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R15" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">16</div></th><td></td><td class="s0">Fifth (Future)</td><td class="s0">gpt-4-1106-preview</td><td class="s0">$0.01 / 1K input tokens + 
$0.03 / 1K output tokens</td><td class="s0">N/A</td><td class="s0">N/A</td><td class="s0">N/A</td></tr><tr style="height: 20px"><th id="0R16" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">17</div></th><td></td><td class="s0"></td><td class="s0">gpt-4-0125-preview</td><td class="s0">$0.01 / 1K input tokens + $0.03 / 1K output tokens</td><td class="s0">N/A</td><td class="s0">N/A</td><td class="s0">N/A</td></tr></tbody></table></div>
 
 
 
util/openai_model_info/webpage/resources/sheet.css DELETED
The diff for this file is too large to render. See raw diff
 
webgui.py CHANGED
The diff for this file is too large to render. See raw diff