added everything
- README.md +219 -99
- demo/translation_settings.json +45 -0
- handlers/json_handler.py +124 -157
- kudasai.py +210 -106
- lib/common/translation_settings_description.txt +79 -0
- lib/gui/HUGGING_FACE_README.md +224 -0
- lib/gui/save_to_file.js +9 -0
- models/kaiseki.py +0 -583
- modules/common/exceptions.py +27 -6
- modules/common/file_ensurer.py +88 -45
- modules/common/logger.py +0 -132
- modules/common/toolkit.py +28 -4
- models/kijiku.py → modules/common/translator.py +384 -343
- modules/gui/gui_json_util.py +17 -17
- requirements.txt +2 -3
- util/openai_model_info/openai_chat_model_info.csv +0 -17
- util/openai_model_info/openai_chat_model_info.pdf +0 -0
- util/openai_model_info/openai_chat_model_info.xlsx +0 -0
- util/openai_model_info/webpage/openai_chat_model_info.html +0 -2
- util/openai_model_info/webpage/resources/sheet.css +0 -0
- webgui.py +0 -0
README.md
CHANGED
Removed from the old README (the Hugging Face Space frontmatter):

---
license: gpl-3.0
title: Kudasai
sdk: gradio
emoji: 🈷️
python_version: 3.10.0
app_file: webgui.py
colorFrom: gray
colorTo: gray
short_description: Japanese-English preprocessor with automated translation.
pinned: true
---

The rest of the removed lines (the old table of contents, the old Kaiseki/Kijiku settings documentation, and the old Web GUI screenshots) are superseded by the rewritten sections below.
---------------------------------------------------------------------------------------------------------------------------------------------------
**Table of Contents**

- [**Notes**](#notes)
- [**Dependencies**](#dependencies)
- [**Quick Start**](#quick-start)
- [**Command Line Interface (CLI)**](#command-line-interface-cli)
- [Usage](#usage)
- [Preprocess Mode](#preprocess-mode)
- [Translate Mode](#translate-mode)
- [Additional Notes](#additional-notes)
- [**Preprocessing**](#preprocessing)
- [**Translator**](#translator)
- [**Translator Settings**](#translator-settings)
- [**Web GUI**](#web-gui)
- [**Hugging Face**](#hugging-face)
- [**License**](#license)
- [**Contact**](#contact)
- [**Acknowledgements**](#acknowledgements)

---------------------------------------------------------------------------------------------------------------------------------------------------
## **Notes**<a name="notes"></a>

Windows 10 and Linux Mint are the only tested operating systems; feel free to test on other operating systems and report back to me. I will do my best to fix any issues that arise.

To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://github.com/Bikatr7/Kudasai/blob/main/lib/gui/HUGGING_FACE_README.md). Further WebGUI documentation can be found there as well.

Python version: 3.10+

Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.

Preprocessor and Translation logic is sourced from external packages, which I also designed; see [Kairyou](https://github.com/Bikatr7/Kairyou) and [EasyTL](https://github.com/Bikatr7/easytl) for more information.

Kudasai has a public trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.

Kudasai is proud to have been a Backdrop Build v3 Finalist:
https://backdropbuild.com/builds/v3/kudasai

---------------------------------------------------------------------------------------------------------------------------------------------------
## **Dependencies**<a name="dependencies"></a>

backoff==2.2.1

gradio==4.20.0

kairyou==1.5.0

easytl==0.3.3

or see requirements.txt

Also requires spaCy's ja_core_news_lg model, which can be installed via the following command:

```bash
python -m spacy download ja_core_news_lg
```

or on Linux

```bash
python3 -m spacy download ja_core_news_lg
```
64 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
65 |
+
## **Quick Start**<a name="quick-start"></a>
|
66 |
|
67 |
+
Windows is assumed for the rest of this README, but the process should be similar for Linux. This is for the console version, for something less linear, see the [Web GUI](#webgui) section.
|
68 |
|
69 |
+
Due to PyPi limitations, you need to install SpaCy's JP Model, which can not be included automatically due to it being a direct dependency link which PyPi does not support. Make sure you do this after installing the requirements.txt file as it requires Kairyou/SpaCy to be installed first.
|
70 |
|
71 |
+
```bash
|
72 |
+
python -m spacy download ja_core_news_lg
|
73 |
+
```
|
74 |
|
75 |
+
Simply run Kudasai.py, enter a txt file path to the text you wish to preprocess/translate, and then insert a replacement json file path if you wish to use one. If you do not wish to use a replacement json file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.
|
76 |
|
77 |
+
Kudasai will offer to index the text, which is useful for finding new names to add to the replacement json file. This is optional and can be skipped.
|
78 |
|
79 |
+
After preprocessing is completed (if triggered), you will be prompted to choose a translation method.
|
80 |
+
|
81 |
+
You can choose between OpenAI, Gemini, and DeepL. Each have their own pros and cons, but OpenAI is the recommended translation method. DeepL and Gemini currently offer free versions, but all three require an api key, you will be prompted to enter this key when you choose to run the translation module.
|
82 |
+
|
83 |
+
Next, Kudasai will ask you to confirm it's settings. This can be overwhelming, but you can simply enter 1 to confirm and use the default settings. If you wish to change them, you can do so here.
|
84 |
+
|
85 |
+
See the [**Translator Settings**](#translator-settings) section for more information on Kudasai's Translation settings, but default should run fine. Inside the demo folder is a copy of the settings I use to translate COTE should you wish to use them. There is also a demo txt file in the demo folder that you can use to test Kudasai.
|
86 |
+
|
87 |
+
Kudasai will then ask if you want to change your api key, simply enter 2 for now.
|
88 |
+
|
89 |
+
Next Kudasai will display an estimated cost of translation, this is based on the number of tokens in the preprocessed text as determined by tiktoken for OpenAI, by Google for Gemini, and by DeepL for DeepL. Kudasai will then prompt for confirmation, if this is fine, enter 1 to run the translation module otherwise 2 to exit.
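For OpenAI models, the token count that drives this estimate can be reproduced with the tiktoken library. The sketch below is illustrative only; it is not Kudasai's exact implementation, and the price per 1,000 tokens is a placeholder (check OpenAI's current pricing):

```python
import tiktoken

## normally this would be the preprocessed text read from a file
text = "これはテストです。"

## look up the tokenizer for the model and count the input tokens
encoding = tiktoken.encoding_for_model("gpt-4")
num_tokens = len(encoding.encode(text))

## placeholder price per 1,000 input tokens
price_per_1k_tokens = 0.01

print(f"{num_tokens} tokens, estimated input cost: ${num_tokens / 1000 * price_per_1k_tokens:.4f}")
```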
Kudasai will then run the translation module and output the translated text and other logs to the output folder in the same directory as kudasai.py.

These files are:

"debug_log.txt" : A log of crucial information that occurred during Kudasai's run, useful for debugging or reporting issues as well as seeing what was done.

"error_log.txt" : A log of errors that occurred during Kudasai's run, if any; useful for debugging or reporting issues.

"je_check_text.txt" : A log of the Japanese and English sentences that were paired together, useful for checking the accuracy of the translation and for further editing of a machine translation.

"preprocessed_text.txt" : The preprocessed text, the text output by Kairyou (preprocessor).

"preprocessing_results.txt" : A log of the results of the preprocessing; shows what was replaced and how many times.

"translated_text.txt" : The translated text, the text output by the translator.

Old runs are stored in the archive folder in output as well.

If you have any questions, comments, or concerns, please feel free to open an issue.

---------------------------------------------------------------------------------------------------------------------------------------------------
## **Command Line Interface (CLI)**<a name="cli"></a>

Kudasai provides a Command Line Interface (CLI) for preprocessing and translating text files. This section details how to use the CLI, including the required and optional arguments for each mode.

### Usage

The CLI supports two modes: `preprocess` and `translate`. Each mode requires specific arguments to function properly.

#### Preprocess Mode

The `preprocess` mode preprocesses the text file using the provided replacement JSON file.

**Command Structure:**

```bash
python path_to_kudasai.py preprocess <input_file> <replacement_json> [<knowledge_base>]
```

**Required Arguments:**
- `<input_file>`: Path to the text file to preprocess.
- `<replacement_json>`: Path to the replacement JSON file.

**Optional Arguments:**
- `<knowledge_base>`: Path to the knowledge base file (directory, file, or text).

**Example:**

```bash
python C:\\path\\to\\kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
```

#### Translate Mode

The `translate` mode translates the text file using the specified translation method.

**Command Structure:**

```bash
python path_to_kudasai.py translate <input_file> <translation_method> [<translation_settings_json>] [<api_key>]
```

**Required Arguments:**
- `<input_file>`: Path to the text file to translate.

**Optional Arguments:**
- `<translation_method>`: Translation method to use (`'deepl'`, `'openai'`, or `'gemini'`). Defaults to `'deepl'`.
- `<translation_settings_json>`: Path to the translation settings JSON file (overrides current settings).
- `<api_key>`: API key for the translation service. If not provided, Kudasai will use the key stored in the settings directory, or prompt for it if none is found.

**Example:**

```bash
python C:\\path\\to\\kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR_API_KEY"
```

### Additional Notes
- All arguments should be enclosed in double quotes if they contain spaces. Double quotes are optional and will be stripped. Single quotes are not allowed.
---------------------------------------------------------------------------------------------------------------------------------------------------
## **Preprocessing**<a name="preprocessing"></a>

Preprocessing is the act of preparing text for translation by replacing certain words or phrases with their translated counterparts.

Kudasai uses Kairyou for preprocessing, which is a powerful preprocessor that can replace text in a text file based on a json file. This is useful for replacing names, places, and other things that may not translate well, or to simply speed up the translation process.

You can run the preprocessor by using the CLI or by simply running kudasai.py as instructed in the [Quick Start](#quick-start) section.

Many replacement json files are included in the jsons folder; you can also make your own if you wish, provided it follows the same format. See the example below.
Kudasai/Kairyou works with both Kudasai and Fukuin jsons; the below is a Kudasai-type json.

![Example JSON](https://i.imgur.com/u3FnUia.jpg)

---------------------------------------------------------------------------------------------------------------------------------------------------

## **Translator**<a name="translator"></a>

Kudasai uses EasyTL for translation, which is a versatile translation library that uses several translation APIs to translate text.

Kudasai currently supports OpenAI, Gemini, and DeepL for translation. OpenAI is the recommended translation method, but DeepL and Gemini are also good alternatives.

You can run the translator by running kudasai.py as instructed in the [Quick Start](#quick-start) section.

Note that you need an API key for OpenAI, Gemini, and DeepL. You will be prompted to enter this key when you choose to run the translation module.

The translator has a lot of settings; simply using the default settings is fine, or the ones provided in the demo folder. You can also change the settings manually when confirming them, as well as load a custom json as your settings by pressing c at that window, with the settings file in the script directory.

The settings are fairly complex; see the section below, [Translator Settings](#translator-settings), for more information.
---------------------------------------------------------------------------------------------------------------------------------------------------

## **Translator Settings**<a name="translator-settings"></a>

(Fairly technical, but it can be abstracted away by using the default settings or someone else's settings file.)

Base Translation Settings:

prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message. 2 means it'll be treated as a user message. 1 is recommended for gpt-4, otherwise either works. For Gemini & DeepL, this setting is ignored.

number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost effective, but other complications may occur with higher line counts. So far it has been tested up to 48 by me.

sentence_fragmenter_mode : 1 or 2 (1 - via regex and other nonsense; 2 - None, takes formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines the way Kudasai fragments the sentences, if at all. Use 2 for newer models and DeepL.

je_check_mode : 1 or 2. 1 will print out the Japanese and then the English below, separated by ---; 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English. If it cannot, it will default to 1. Use 2 for newer models and DeepL.

number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kudasai will attempt to mend a malformed batch (mending is resending the request). Be careful with increasing this, as cost increases at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.

batch_retry_timeout : How long Kudasai will try to translate a batch in seconds; if a request exceeds this duration, Kudasai will leave it untranslated.

number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative as rate-limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
----------------------------------------------------------------------------------
Open AI Settings:
See https://platform.openai.com/docs/api-reference/chat/create for further details
----------------------------------------------------------------------------------
openai_model : ID of the model to use. Kudasai only works with 'chat' models.

openai_system_message : Instructions to the model. Basically tells the model how to translate.

openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.

openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.

openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this, as Kudasai does not support this feature.

openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this, as Kudasai does not support this feature.

openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this, as Kudasai does not support this feature.

openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change it to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.

openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics, while negative values encourage repetition. Should leave this at 0.0.

openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
----------------------------------------------------------------------------------
openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake; current versions of Kudasai will hardcode their values to the defaults when validating the translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------
Gemini Settings:
See https://ai.google.dev/docs/concepts#model-parameters for further details
----------------------------------------------------------------------------------
gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 and 1.5 models, and their aliases.

gemini_prompt : Instructions to the model. Basically tells the model how to translate.

gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.

gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.

gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, a lower value makes the output more deterministic.

gemini_candidate_count : The number of candidates to generate for each input message. Do not change this, as Kudasai will always use 1.

gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this, as Kudasai does not support this feature.

gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this, as Kudasai does not support this feature.

gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change it to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
----------------------------------------------------------------------------------
gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake; current versions of Kudasai will hardcode their values to the defaults when validating the translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
----------------------------------------------------------------------------------
DeepL Settings:
See https://developers.deepl.com/docs/api-reference/translate for further details
----------------------------------------------------------------------------------
deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL Alpha feature and could be subject to change.

deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.

deepl_preserve_formatting : Whether the formatting of the text should be preserved. If you don't want to preserve the formatting, you can set this to False. Otherwise, set it to True.

deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.

---------------------------------------------------------------------------------------------------------------------------------------------------
## **Web GUI**<a name="webgui"></a>

Kudasai also offers a Web GUI. It has all the main functionality of the program, but in an easier and non-linear way.

To run the Web GUI, simply run webgui.py, which is in the same directory as kudasai.py.
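For example, from the Kudasai directory (use python3 on Linux, matching the commands above):

```bash
python webgui.py
```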
Below are some images of the Web GUI.

Detailed documentation for this can be found on the Hugging Face hosted version of Kudasai [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).

Name Indexing | Kairyou:
![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)

Text Preprocessing | Kairyou:
![Text Preprocessing Screen | Kairyou](https://i.imgur.com/r8nHEvw.jpeg)

Text Translation | Translator:
![Text Translation Screen | Translator](https://i.imgur.com/0E9q2eh.jpeg)

Translation Settings Page 1:
![Translation Settings Page 1](https://i.imgur.com/0E9q2eh.jpeg)

Translation Settings Page 2:
![Translation Settings Page 2](https://i.imgur.com/8MQk6pL.jpeg)

Logging Page:
![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)

---------------------------------------------------------------------------------------------------------------------------------------------------

## **Hugging Face**<a name="huggingface"></a>

For those who are interested, or simply cannot run Kudasai locally, an instance of Kudasai's WebGUI is hosted on Hugging Face's servers. You can find it [here](https://huggingface.co/spaces/Bikatr7/Kudasai).

It's a bit slower than running it locally, but it's a good alternative for those who cannot run it locally. The WebGUI on Hugging Face does not save anything across runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, and the output folder is overwritten every time it loads. Archives are deleted every run as well.

To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).

---------------------------------------------------------------------------------------------------------------------------------------------------
## **License**<a name="license"></a>

This project (Kudasai) is licensed under the GNU General Public License (GPL). You can find the full text of the license in the [LICENSE](License.md) file.

The GPL is a copyleft license that promotes the principles of open-source software.

Please note that this information is a brief summary of the GPL. For a detailed understanding of your rights and obligations under this license, please refer to the full license text.

---------------------------------------------------------------------------------------------------------------------------------------------------
## **Contact**<a name="contact"></a>

If you have any questions, comments, or concerns, please feel free to contact me at [Bikatr7@proton.me](mailto:Bikatr7@proton.me).

For any bugs or suggestions please use the issues tab [here](https://github.com/Bikatr7/Kudasai/issues).

I actively encourage and welcome any feedback on this project.

---------------------------------------------------------------------------------------------------------------------------------------------------

## **Acknowledgements**<a name="acknowledgements"></a>

Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means "please". You can find it [here](https://github.com/Atreyagaurav/onegai).

---------------------------------------------------------------------------------------------------------------------------------------------------
demo/translation_settings.json
ADDED
{
    "base translation settings": {
        "prompt_assembly_mode": 1,
        "number_of_lines_per_batch": 48,
        "sentence_fragmenter_mode": 2,
        "je_check_mode": 2,
        "number_of_malformed_batch_retries": 1,
        "batch_retry_timeout": 700,
        "number_of_concurrent_batches": 2
    },

    "openai settings": {
        "openai_model": "gpt-4-turbo",
        "openai_system_message": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
        "openai_temperature": 0.3,
        "openai_top_p": 1.0,
        "openai_n": 1,
        "openai_stream": false,
        "openai_stop": null,
        "openai_logit_bias": null,
        "openai_max_tokens": null,
        "openai_presence_penalty": 0.0,
        "openai_frequency_penalty": 0.0
    },

    "gemini settings": {
        "gemini_model": "gemini-1.5-pro-latest",
        "gemini_prompt": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
        "gemini_temperature": 0.3,
        "gemini_top_p": null,
        "gemini_top_k": null,
        "gemini_candidate_count": 1,
        "gemini_stream": false,
        "gemini_stop_sequences": null,
        "gemini_max_output_tokens": null
    },

    "deepl settings": {
        "deepl_context": "",
        "deepl_split_sentences": "ALL",
        "deepl_preserve_formatting": true,
        "deepl_formality": "default"
    }
}
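Since this is plain JSON, it can be inspected or tweaked with a few lines of Python before handing it to Kudasai. This is just a quick sanity check, not part of Kudasai itself; the path assumes you are in the repository root:

```python
import json

## read the demo settings shipped with the repository
with open("demo/translation_settings.json", "r", encoding="utf-8") as file:
    settings = json.load(file)

print(settings["base translation settings"]["number_of_lines_per_batch"])  ## 48
print(settings["openai settings"]["openai_model"])                         ## gpt-4-turbo
```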
handlers/json_handler.py
CHANGED
Removed from the old json_handler.py:

- the import of modules.common.logger (Logger); the new module uses the standard logging library instead
- the long module docstring describing the old "Kijiku Settings", whose text now lives in lib/common/translation_settings_description.txt and matches the Translator Settings section above
- the old Kijiku naming (current_kijiku_rules, Kijiku_rule.json, and the reset/dump/load/print helpers built around it), which the remaining hunks rename to the new translation_settings naming used below
## built-in libraries
|
2 |
import json
|
3 |
import typing
|
4 |
+
import logging
|
5 |
|
6 |
## third-party libraries
|
7 |
from easytl import ALLOWED_GEMINI_MODELS, ALLOWED_OPENAI_MODELS
|
8 |
|
9 |
## custom modules
|
10 |
from modules.common.file_ensurer import FileEnsurer
|
|
|
11 |
from modules.common.toolkit import Toolkit
|
12 |
|
13 |
##-------------------start-of-JsonHandler---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
|
16 |
|
17 |
"""
|
18 |
|
19 |
+
Handles the translation_settings.json file and interactions with it.
|
20 |
|
21 |
"""
|
22 |
|
23 |
+
current_translation_settings = dict()
|
24 |
|
25 |
+
with open(FileEnsurer.translation_settings_description_path, 'r', encoding='utf-8') as file:
|
26 |
+
translation_settings_message = file.read()
|
|
|
27 |
|
|
|
|
|
|
|
|
|
|
|
|
28 |
|
29 |
##-------------------start-of-validate_json()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
30 |
|
|
|
33 |
|
34 |
"""
|
35 |
|
36 |
+
Validates the translation_settings.json file.
|
37 |
|
38 |
"""
|
39 |
|
40 |
+
base_translation_keys = [
|
41 |
"prompt_assembly_mode",
|
42 |
"number_of_lines_per_batch",
|
43 |
"sentence_fragmenter_mode",
|
|
|
73 |
"gemini_max_output_tokens"
|
74 |
]
|
75 |
|
76 |
+
deepl_keys = [
|
77 |
+
"deepl_context",
|
78 |
+
"deepl_split_sentences",
|
79 |
+
"deepl_preserve_formatting",
|
80 |
+
"deepl_formality"
|
81 |
+
]
|
82 |
+
|
83 |
validation_rules = {
|
84 |
"prompt_assembly_mode": lambda x: isinstance(x, int) and 1 <= x <= 2,
|
85 |
"number_of_lines_per_batch": lambda x: isinstance(x, int) and x > 0,
|
|
|
100 |
"gemini_top_p": lambda x: x is None or (isinstance(x, float) and 0 <= x <= 2),
|
101 |
"gemini_top_k": lambda x: x is None or (isinstance(x, int) and x >= 0),
|
102 |
"gemini_max_output_tokens": lambda x: x is None or isinstance(x, int),
|
103 |
+
"deepl_context": lambda x: isinstance(x, str),
|
104 |
+
"deepl_split_sentences": lambda x: isinstance(x, str),
|
105 |
+
"deepl_preserve_formatting": lambda x: isinstance(x, bool),
|
106 |
+
"deepl_formality": lambda x: isinstance(x, str)
|
107 |
}
|
108 |
|
109 |
try:
|
110 |
## ensure categories are present
|
111 |
+
assert "base translation settings" in JsonHandler.current_translation_settings, "base translation settings not found"
|
112 |
+
assert "openai settings" in JsonHandler.current_translation_settings, "openai settings not found"
|
113 |
+
assert "gemini settings" in JsonHandler.current_translation_settings, "gemini settings not found"
|
114 |
+
assert "deepl settings" in JsonHandler.current_translation_settings, "deepl settings not found"
|
115 |
|
116 |
## assign to variables to reduce repetitive access
|
117 |
+
base_translation_settings = JsonHandler.current_translation_settings["base translation settings"]
|
118 |
+
openai_settings = JsonHandler.current_translation_settings["openai settings"]
|
119 |
+
gemini_settings = JsonHandler.current_translation_settings["gemini settings"]
|
120 |
+
deepl_settings = JsonHandler.current_translation_settings["deepl settings"]
|
121 |
|
122 |
## ensure all keys are present
|
123 |
+
assert all(key in base_translation_settings for key in base_translation_keys), "base translation settings keys missing"
|
124 |
+
assert all(key in openai_settings for key in openai_keys), "openai settings keys missing"
|
125 |
+
assert all(key in gemini_settings for key in gemini_keys), "gemini settings keys missing"
|
126 |
+
assert all(key in deepl_settings for key in deepl_keys), "deepl settings keys missing"
|
127 |
|
128 |
## validate each key using the validation rules
|
129 |
for key, validate in validation_rules.items():
|
130 |
+
if(key in base_translation_settings and not validate(base_translation_settings[key])):
|
131 |
+
raise ValueError(f"Invalid value for {key}")
|
132 |
+
elif(key in openai_settings and not validate(openai_settings[key])):
|
133 |
raise ValueError(f"Invalid value for {key}")
|
134 |
elif(key in gemini_settings and not validate(gemini_settings[key])):
|
135 |
raise ValueError(f"Invalid value for {key}")
|
136 |
+
elif(key in deepl_settings and not validate(deepl_settings[key])):
|
137 |
+
raise ValueError(f"Invalid value for {key}")
|
138 |
+
|
139 |
|
140 |
## force stop/logit_bias into None
|
141 |
openai_settings["openai_stop"] = None
|
|
|
147 |
|
148 |
## force n and candidate_count to 1
|
149 |
openai_settings["openai_n"] = 1
|
|
|
150 |
gemini_settings["gemini_candidate_count"] = 1
|
151 |
|
152 |
+
## ensure deepl_formality and deepl_split_sentences are in allowed values
|
153 |
+
if(isinstance(deepl_settings["deepl_formality"], str) and deepl_settings["deepl_formality"] not in ["default", "more", "less", "prefer_more", "prefer_less"]):
|
154 |
+
raise ValueError("Invalid value for deepl_formality")
|
155 |
+
|
156 |
+
if(isinstance(deepl_settings["deepl_split_sentences"], str) and deepl_settings["deepl_split_sentences"] not in ["OFF", "ALL", "NO_NEWLINES"]):
|
157 |
+
raise ValueError("Invalid value for deepl_split_sentences")
|
158 |
+
|
159 |
except Exception as e:
|
160 |
+
logging.warning(f"translation_settings.json is not valid, setting to invalid_placeholder, current:"
|
161 |
+
f"\n{JsonHandler.current_translation_settings}"
|
162 |
+
f"\nReason: {e}")
|
163 |
+
|
164 |
+
JsonHandler.current_translation_settings = FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
|
165 |
|
166 |
+
logging.debug(f"translation_settings.json is valid, current:"
|
167 |
+
f"\n{JsonHandler.current_translation_settings}")
|
168 |
|
169 |
+
##-------------------start-of-reset_translation_settings_to_default()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
170 |
|
171 |
@staticmethod
|
172 |
+
def reset_translation_settings_to_default() -> None:
|
173 |
|
174 |
"""
|
175 |
|
176 |
+
Resets the translation_settings.json to default.
|
177 |
|
178 |
"""
|
179 |
|
180 |
+
JsonHandler.current_translation_settings = FileEnsurer.DEFAULT_TRANSLATION_SETTING
|
181 |
|
182 |
+
JsonHandler.dump_translation_settings()
|
183 |
|
184 |
+
JsonHandler.load_translation_settings()
|
185 |
|
186 |
+
##-------------------start-of-dump_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
187 |
|
188 |
@staticmethod
|
189 |
+
def dump_translation_settings() -> None:
|
190 |
|
191 |
"""
|
192 |
|
193 |
+
Dumps the translation_settings.json file to disk.
|
194 |
|
195 |
"""
|
196 |
|
197 |
+
with open(FileEnsurer.config_translation_settings_path, 'w+', encoding='utf-8') as file:
|
198 |
+
json.dump(JsonHandler.current_translation_settings, file)
|
199 |
|
200 |
+
##-------------------start-of-load_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
201 |
|
202 |
@staticmethod
|
203 |
+
def load_translation_settings() -> None:
|
204 |
|
205 |
"""
|
206 |
|
207 |
+
Loads the translation_settings.json file into memory.
|
208 |
|
209 |
"""
|
210 |
|
211 |
+
with open(FileEnsurer.config_translation_settings_path, 'r', encoding='utf-8') as file:
|
212 |
+
JsonHandler.current_translation_settings = json.load(file)
|
213 |
|
214 |
+
##-------------------start-of-log_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
215 |
|
216 |
@staticmethod
|
217 |
+
def log_translation_settings(output_to_console:bool=False, specific_section:str | None = None) -> None:
|
218 |
|
219 |
"""
|
220 |
|
221 |
+
Prints the translation_settings.json file to the log.
|
222 |
Logs by default, but can be set to print to console as well.
|
223 |
|
224 |
Parameters:
|
225 |
+
output_to_console (bool | optional | default=False) : Whether to print to console as well.
specific_section (str | None | optional | default=None) : If provided, only that section (plus the base translation settings) is logged.
|
226 |
|
227 |
"""
|
228 |
|
229 |
+
sections = ["base translation settings", "openai settings", "gemini settings", "deepl settings"]
|
230 |
+
|
231 |
+
## if a specific section is provided, only print that section and base translation settings
|
232 |
+
if(specific_section is not None):
|
233 |
+
specific_section = specific_section.lower()
|
234 |
+
sections = [section for section in sections if section.lower() == specific_section or section == "base translation settings"]
|
235 |
+
|
236 |
+
for section in sections:
|
237 |
+
print("-------------------")
|
238 |
+
print(f"{section.capitalize()}:")
|
239 |
+
print("-------------------")
|
240 |
+
|
241 |
+
for key, value in JsonHandler.current_translation_settings.get(section, {}).items():
|
242 |
+
log_message = f"{key} : {value}"
|
243 |
+
logging.debug(log_message)
|
244 |
+
if(output_to_console):
|
245 |
+
print(log_message)
|
|
|
|
|
|
|
|
|
246 |
|
247 |
+
##-------------------start-of-change_translation_settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
248 |
|
249 |
@staticmethod
|
250 |
+
def change_translation_settings() -> None:
|
251 |
|
252 |
"""
|
253 |
|
254 |
+
Allows the user to change the settings of the translation_settings.json file
|
255 |
|
256 |
"""
|
257 |
|
|
|
259 |
|
260 |
Toolkit.clear_console()
|
261 |
|
262 |
+
settings_print_message = JsonHandler.translation_settings_message + SettingsChanger.generate_settings_change_menu()
|
263 |
|
264 |
action = input(settings_print_message).lower()
|
265 |
|
|
|
272 |
|
273 |
elif(action == "d"):
|
274 |
print("Resetting to default settings.")
|
275 |
+
JsonHandler.reset_translation_settings_to_default()
|
276 |
|
277 |
+
elif(action in JsonHandler.current_translation_settings["base translation settings"]):
|
278 |
+
SettingsChanger.change_setting("base translation settings", action)
|
279 |
|
280 |
+
elif(action in JsonHandler.current_translation_settings["openai settings"]):
|
281 |
SettingsChanger.change_setting("openai settings", action)
|
282 |
|
283 |
+
elif(action in JsonHandler.current_translation_settings["gemini settings"]):
|
284 |
SettingsChanger.change_setting("gemini settings", action)
|
285 |
|
286 |
+
elif(action in JsonHandler.current_translation_settings["deepl settings"]):
|
287 |
+
SettingsChanger.change_setting("deepl settings", action)
|
288 |
+
|
289 |
else:
|
290 |
print("Invalid setting name. Please try again.")
|
291 |
|
292 |
Toolkit.pause_console("\nPress enter to continue.")
|
293 |
|
294 |
+
JsonHandler.dump_translation_settings()
|
295 |
|
296 |
##-------------------start-of-convert_to_correct_type()-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
297 |
|
|
|
341 |
"gemini_stream": {"type": bool, "constraints": lambda x: x is False},
|
342 |
"gemini_stop_sequences": {"type": None, "constraints": lambda x: x is None},
|
343 |
"gemini_max_output_tokens": {"type": int, "constraints": lambda x: x is None or isinstance(x, int)},
|
344 |
+
"deepl_context": {"type": str, "constraints": lambda x: isinstance(x, str)},
|
345 |
+
"deepl_split_sentences": {"type": str, "constraints": lambda x: isinstance(x, str)},
|
346 |
+
"deepl_preserve_formatting": {"type": bool, "constraints": lambda x: isinstance(x, bool)},
|
347 |
+
"deepl_formality": {"type": str, "constraints": lambda x: isinstance(x, str)}
|
348 |
}
|
349 |
|
350 |
if(setting_name not in type_expectations):
|
|
|
386 |
|
387 |
"""
|
388 |
|
389 |
+
Handles changing the settings of the translation_settings.json file.
|
390 |
|
391 |
"""
|
392 |
|
|
|
410 |
|
411 |
"""
|
412 |
|
413 |
+
for key,value in JsonHandler.current_translation_settings["base translation settings"].items():
|
414 |
menu += key + " : " + str(value) + "\n"
|
415 |
|
416 |
print("\n")
|
417 |
|
418 |
+
for key,value in JsonHandler.current_translation_settings["openai settings"].items():
|
419 |
menu += key + " : " + str(value) + "\n"
|
420 |
|
421 |
print("\n")
|
422 |
|
423 |
+
for key,value in JsonHandler.current_translation_settings["gemini settings"].items():
|
424 |
+
menu += key + " : " + str(value) + "\n"
|
425 |
+
|
426 |
+
print("\n")
|
427 |
+
|
428 |
+
for key,value in JsonHandler.current_translation_settings["deepl settings"].items():
|
429 |
menu += key + " : " + str(value) + "\n"
|
430 |
|
431 |
menu += """
|
|
|
443 |
|
444 |
"""
|
445 |
|
446 |
+
Loads a custom json into the translation_settings.json file.
|
447 |
|
448 |
"""
|
449 |
|
450 |
Toolkit.clear_console()
|
451 |
|
452 |
## saves old rules in case of an invalid json
|
453 |
+
old_translation_settings = JsonHandler.current_translation_settings
|
454 |
|
455 |
try:
|
456 |
|
457 |
## loads the custom json file
|
458 |
+
with open(FileEnsurer.external_translation_settings_path, 'r', encoding='utf-8') as file:
|
459 |
+
JsonHandler.current_translation_settings = json.load(file)
|
460 |
|
461 |
JsonHandler.validate_json()
|
462 |
|
463 |
## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
|
464 |
+
assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
|
465 |
|
466 |
+
JsonHandler.dump_translation_settings()
|
467 |
|
468 |
print("Settings loaded successfully.")
|
469 |
|
470 |
except AssertionError:
|
471 |
print("Invalid JSON file. Please try again.")
|
472 |
+
JsonHandler.current_translation_settings = old_translation_settings
|
473 |
|
474 |
except FileNotFoundError:
|
475 |
+
print("Missing JSON file. Make sure you have a json in the same directory as kudasai.py and that the json is named \"translation_settings.json\". Please try again.")
|
476 |
+
JsonHandler.current_translation_settings = old_translation_settings
|
477 |
|
478 |
##-------------------start-of-change_setting()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
479 |
|
|
|
482 |
|
483 |
"""
|
484 |
|
485 |
+
Changes the setting of the translation_settings.json file.
|
486 |
|
487 |
Parameters:
|
488 |
setting_area (str) : The area of the setting to change.
|
|
|
495 |
try:
|
496 |
converted_value = JsonHandler.convert_to_correct_type(setting_name, new_value)
|
497 |
|
498 |
+
JsonHandler.current_translation_settings[setting_area][setting_name] = converted_value
|
499 |
print(f"Updated {setting_name} to {converted_value}.")
|
500 |
|
501 |
except Exception as e:
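The JsonHandler surface added in this commit is small enough to sketch end to end. The calls below use the methods shown in the diff above; the call site itself is illustrative and not part of the commit.

```python
## Minimal usage sketch of the new JsonHandler API (illustrative call site, not part of this commit).
from handlers.json_handler import JsonHandler

## read translation_settings.json from disk into JsonHandler.current_translation_settings
JsonHandler.load_translation_settings()

## validate it; on failure, current_translation_settings is replaced with the invalid placeholder
JsonHandler.validate_json()

## log the deepl section (plus the base translation settings), echoing it to the console too
JsonHandler.log_translation_settings(output_to_console=True, specific_section="deepl settings")

## write any in-memory changes back to disk
JsonHandler.dump_translation_settings()
```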
|
kudasai.py
CHANGED
@@ -5,6 +5,8 @@ import json
|
|
5 |
import asyncio
|
6 |
import re
|
7 |
import typing
|
|
|
|
|
8 |
|
9 |
## third-party libraries
|
10 |
from kairyou import Kairyou
|
@@ -12,14 +14,12 @@ from kairyou import Indexer
|
|
12 |
from kairyou.types import NameAndOccurrence
|
13 |
|
14 |
## custom modules
|
15 |
-
from
|
16 |
-
from models.kijiku import Kijiku
|
17 |
|
18 |
from handlers.json_handler import JsonHandler
|
19 |
|
20 |
from modules.common.toolkit import Toolkit
|
21 |
from modules.common.file_ensurer import FileEnsurer
|
22 |
-
from modules.common.logger import Logger
|
23 |
|
24 |
##-------------------start-of-Kudasai---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
25 |
|
@@ -35,8 +35,49 @@ class Kudasai:
|
|
35 |
|
36 |
text_to_preprocess:str
|
37 |
replacement_json:dict
|
|
|
38 |
|
39 |
need_to_run_kairyou:bool = True
|
|
|
|
|
|
|
40 |
|
41 |
##-------------------start-of-boot()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
42 |
|
@@ -45,7 +86,7 @@ class Kudasai:
|
|
45 |
|
46 |
"""
|
47 |
|
48 |
-
Does some logging and sets up the console window, regardless of whether the user is running the CLI, WebGUI, or Console version of Kudasai.
|
49 |
|
50 |
"""
|
51 |
|
@@ -53,34 +94,36 @@ class Kudasai:
|
|
53 |
|
54 |
Toolkit.clear_console()
|
55 |
|
|
|
|
|
|
|
|
|
|
|
56 |
FileEnsurer.setup_needed_files()
|
57 |
|
58 |
-
|
59 |
-
Logger.log_action("Kudasai started")
|
60 |
-
Logger.log_action("Current version: " + Toolkit.CURRENT_VERSION)
|
61 |
-
Logger.log_barrier()
|
62 |
|
63 |
try:
|
64 |
|
65 |
-
with open(FileEnsurer.
|
66 |
-
JsonHandler.
|
67 |
|
68 |
JsonHandler.validate_json()
|
69 |
|
70 |
-
assert JsonHandler.
|
71 |
|
72 |
except:
|
73 |
|
74 |
-
print("Invalid
|
75 |
|
76 |
Toolkit.pause_console()
|
77 |
|
78 |
-
raise Exception("Invalid
|
79 |
|
80 |
##-------------------start-of-run_kairyou_indexer()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
81 |
|
82 |
@staticmethod
|
83 |
-
def run_kairyou_indexer(text_to_index:str, replacement_json:typing.Union[dict,str]) -> typing.Tuple[str, str]:
|
84 |
|
85 |
"""
|
86 |
|
@@ -98,8 +141,6 @@ class Kudasai:
|
|
98 |
|
99 |
Toolkit.clear_console()
|
100 |
|
101 |
-
knowledge_base = input("Please enter the path to the knowledge base you would like to use for the indexer (can be text, a path to a txt file, or a path to a directory of txt files):\n").strip('"')
|
102 |
-
|
103 |
## unique names is a list of named tuples, with the fields name and occurrence
|
104 |
unique_names, indexing_log = Indexer.index(text_to_index, knowledge_base, replacement_json)
|
105 |
|
@@ -144,7 +185,7 @@ class Kudasai:
|
|
144 |
new_text += text[last_end:match.start()] + f">>>{name}<<<"
|
145 |
last_end = match.end()
|
146 |
|
147 |
-
new_text += text[last_end:]
|
148 |
text = new_text
|
149 |
|
150 |
return text
|
@@ -166,8 +207,13 @@ class Kudasai:
|
|
166 |
|
167 |
indexing_log = ""
|
168 |
|
169 |
-
if(Kudasai.replacement_json not in ["",
|
170 |
-
|
|
|
|
|
|
|
|
|
|
|
171 |
|
172 |
preprocessed_text, preprocessing_log, error_log = Kairyou.preprocess(Kudasai.text_to_preprocess, Kudasai.replacement_json)
|
173 |
|
@@ -178,6 +224,9 @@ class Kudasai:
|
|
178 |
if(indexing_log != ""):
|
179 |
preprocessing_log = indexing_log + "\n\n" + preprocessing_log
|
180 |
|
|
|
|
|
|
|
181 |
print(preprocessing_log)
|
182 |
|
183 |
timestamp = Toolkit.get_timestamp(is_archival=True)
|
@@ -190,7 +239,7 @@ class Kudasai:
|
|
190 |
else:
|
191 |
print("(Preprocessing skipped)")
|
192 |
|
193 |
-
await Kudasai.
|
194 |
|
195 |
Toolkit.pause_console("\nPress any key to exit...")
|
196 |
|
@@ -214,92 +263,32 @@ class Kudasai:
|
|
214 |
Toolkit.pause_console()
|
215 |
Toolkit.clear_console()
|
216 |
|
217 |
-
##-------------------start-of-
|
218 |
|
219 |
@staticmethod
|
220 |
-
async def
|
221 |
|
222 |
"""
|
223 |
|
224 |
-
If the user is running the CLI or Console version of Kudasai, this function is called to
|
225 |
|
226 |
"""
|
227 |
|
228 |
-
|
229 |
-
Toolkit.clear_console()
|
230 |
-
|
231 |
-
print("You are not connected to the internet. Please connect to the internet to use the autotranslation feature.\n")
|
232 |
-
Toolkit.pause_console()
|
233 |
-
|
234 |
-
exit()
|
235 |
|
236 |
-
|
237 |
-
|
238 |
-
pathing_msg = "Please select an auto-translation module:\n\n1.Kaiseki (deepL)\n2.Kijiku (OpenAI/Gemini)\n3.Exit\n\n"
|
239 |
-
|
240 |
-
pathing = input(pathing_msg)
|
241 |
|
242 |
Toolkit.clear_console()
|
243 |
|
244 |
-
|
245 |
-
Kudasai.run_kaiseki()
|
246 |
-
elif(pathing == "2"):
|
247 |
-
await Kudasai.run_kijiku()
|
248 |
-
else:
|
249 |
-
Toolkit.clear_console()
|
250 |
-
exit()
|
251 |
-
|
252 |
-
##-------------------start-of-run_kaiseki()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
253 |
|
254 |
-
|
255 |
-
def run_kaiseki() -> None:
|
256 |
-
|
257 |
-
"""
|
258 |
-
|
259 |
-
If the user is running the CLI or Console version of Kudasai, this function is called to run the Kaiseki module.
|
260 |
-
|
261 |
-
"""
|
262 |
-
|
263 |
-
Logger.log_action("--------------------")
|
264 |
-
Logger.log_action("Kaiseki started")
|
265 |
-
Logger.log_action("--------------------")
|
266 |
-
|
267 |
-
Kaiseki.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
|
268 |
-
|
269 |
-
Kaiseki.translate()
|
270 |
|
271 |
Toolkit.clear_console()
|
272 |
|
273 |
-
print(
|
274 |
-
|
275 |
-
Kaiseki.write_kaiseki_results()
|
276 |
-
|
277 |
-
##-------------------start-of-run_kijiku()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
278 |
-
|
279 |
-
@staticmethod
|
280 |
-
async def run_kijiku() -> None:
|
281 |
|
282 |
-
|
283 |
-
|
284 |
-
If the user is running the CLI or Console version of Kudasai, this function is called to run the Kijiku module.
|
285 |
-
|
286 |
-
"""
|
287 |
-
|
288 |
-
Logger.log_action("--------------------")
|
289 |
-
Logger.log_action("Kijiku started")
|
290 |
-
Logger.log_action("--------------------")
|
291 |
-
|
292 |
-
Toolkit.clear_console()
|
293 |
-
|
294 |
-
Kijiku.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
|
295 |
-
|
296 |
-
await Kijiku.translate()
|
297 |
-
|
298 |
-
Toolkit.clear_console()
|
299 |
-
|
300 |
-
print(Kijiku.translation_print_result)
|
301 |
-
|
302 |
-
Kijiku.write_kijiku_results()
|
303 |
|
304 |
##-------------------start-of-main()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
305 |
|
@@ -319,7 +308,7 @@ async def main() -> None:
|
|
319 |
if(len(sys.argv) <= 1):
|
320 |
await run_console_version()
|
321 |
|
322 |
-
elif(len(sys.argv) in [2, 3]):
|
323 |
await run_cli_version()
|
324 |
|
325 |
else:
|
@@ -340,21 +329,26 @@ async def run_console_version():
|
|
340 |
|
341 |
try:
|
342 |
|
343 |
-
path_to_text_to_preprocess = input("Please enter the path to the input file to be
|
344 |
Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(path_to_text_to_preprocess)
|
345 |
Toolkit.clear_console()
|
346 |
|
347 |
-
path_to_replacement_json = input("Please enter the path to the replacement json file:\n").strip('"')
|
348 |
Kudasai.replacement_json = FileEnsurer.standard_read_json(path_to_replacement_json if path_to_replacement_json else FileEnsurer.blank_rules_path)
|
349 |
Toolkit.clear_console()
|
350 |
|
|
|
|
|
|
|
|
|
351 |
except Exception as e:
|
352 |
print_usage_statement()
|
353 |
|
354 |
raise e
|
|
|
|
|
355 |
|
356 |
await Kudasai.run_kudasai()
|
357 |
-
Logger.push_batch()
|
358 |
|
359 |
##-------------------start-of-run_cli_version()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
360 |
|
@@ -366,22 +360,99 @@ async def run_cli_version():
|
|
366 |
|
367 |
"""
|
368 |
|
|
|
|
|
|
369 |
try:
|
370 |
|
371 |
-
|
372 |
-
|
|
|
|
|
|
|
373 |
|
374 |
-
|
375 |
-
|
|
|
376 |
|
377 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
378 |
|
379 |
-
|
380 |
-
Kudasai.need_to_run_kairyou = False
|
381 |
|
382 |
-
|
383 |
-
|
|
|
|
|
|
384 |
|
|
|
|
|
|
385 |
##-------------------start-of-print_usage_statement()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
386 |
|
387 |
def print_usage_statement():
|
@@ -391,14 +462,47 @@ def print_usage_statement():
|
|
391 |
Prints the usage statement for the CLI version of Kudasai.
|
392 |
|
393 |
"""
|
|
|
|
|
|
|
|
394 |
|
395 |
-
|
396 |
-
|
|
|
|
|
397 |
|
398 |
-
print("\n")
|
399 |
|
400 |
##-------------------start-of-submain()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
401 |
|
402 |
|
403 |
-
if(__name__ ==
|
404 |
asyncio.run(main())
|
|
|
5 |
import asyncio
|
6 |
import re
|
7 |
import typing
|
8 |
+
import logging
|
9 |
+
import argparse
|
10 |
|
11 |
## third-party libraries
|
12 |
from kairyou import Kairyou
|
|
|
14 |
from kairyou.types import NameAndOccurrence
|
15 |
|
16 |
## custom modules
|
17 |
+
from modules.common.translator import Translator
|
|
|
18 |
|
19 |
from handlers.json_handler import JsonHandler
|
20 |
|
21 |
from modules.common.toolkit import Toolkit
|
22 |
from modules.common.file_ensurer import FileEnsurer
|
|
|
23 |
|
24 |
##-------------------start-of-Kudasai---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
25 |
|
|
|
35 |
|
36 |
text_to_preprocess:str
|
37 |
replacement_json:dict
|
38 |
+
knowledge_base:str
|
39 |
|
40 |
need_to_run_kairyou:bool = True
|
41 |
+
need_to_run_indexer:bool = True
|
42 |
+
|
43 |
+
##-------------------start-of-setup_logging()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
44 |
+
|
45 |
+
@staticmethod
|
46 |
+
def setup_logging() -> None:
|
47 |
+
|
48 |
+
"""
|
49 |
+
|
50 |
+
Sets up logging for the Kudasai program.
|
51 |
+
|
52 |
+
"""
|
53 |
+
|
54 |
+
## Debug log setup
|
55 |
+
debug_log_handler = logging.FileHandler(FileEnsurer.debug_log_path, mode='w+', encoding='utf-8')
|
56 |
+
debug_log_handler.setLevel(logging.DEBUG)
|
57 |
+
debug_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
|
58 |
+
debug_log_handler.setFormatter(debug_formatter)
|
59 |
+
|
60 |
+
## Error log setup
|
61 |
+
error_log_handler = logging.FileHandler(FileEnsurer.error_log_path, mode='w+', encoding='utf-8')
|
62 |
+
error_log_handler.setLevel(logging.WARNING)
|
63 |
+
error_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
|
64 |
+
error_log_handler.setFormatter(error_formatter)
|
65 |
+
|
66 |
+
## Console handler setup
|
67 |
+
console = logging.StreamHandler()
|
68 |
+
console.setLevel(logging.INFO)
|
69 |
+
console_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(filename)s] %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
|
70 |
+
console.setFormatter(console_formatter)
|
71 |
+
|
72 |
+
## Add handlers to the logger
|
73 |
+
logger = logging.getLogger('')
|
74 |
+
logger.setLevel(logging.DEBUG)
|
75 |
+
logger.addHandler(debug_log_handler)
|
76 |
+
logger.addHandler(error_log_handler)
|
77 |
+
logger.addHandler(console)
|
78 |
+
|
79 |
+
## Ensure only INFO level and above messages are sent to the console
|
80 |
+
console.setLevel(logging.INFO)
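As a quick illustration of what the handler levels above imply once Kudasai.setup_logging() has run (a sketch, assuming the default root logger is used as in the code above):

```python
import logging

logging.debug("recorded in the debug log only")
logging.info("recorded in the debug log and echoed to the console")
logging.warning("recorded in the debug log, the error log, and the console")
```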
|
81 |
|
82 |
##-------------------start-of-boot()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
83 |
|
|
|
86 |
|
87 |
"""
|
88 |
|
89 |
+
Sets up logging, the console window, and translator settings, regardless of whether the user is running the CLI, WebGUI, or Console version of Kudasai.
|
90 |
|
91 |
"""
|
92 |
|
|
|
94 |
|
95 |
Toolkit.clear_console()
|
96 |
|
97 |
+
## Need to create the output dir FIRST as logging files are located in the output folder
|
98 |
+
FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
|
99 |
+
|
100 |
+
Kudasai.setup_logging()
|
101 |
+
|
102 |
FileEnsurer.setup_needed_files()
|
103 |
|
104 |
+
logging.debug(f"Kudasai started; Current version : {Toolkit.CURRENT_VERSION}")
|
|
|
|
|
|
|
105 |
|
106 |
try:
|
107 |
|
108 |
+
with open(FileEnsurer.config_translation_settings_path, "r") as translation_settings:
|
109 |
+
JsonHandler.current_translation_settings = json.load(translation_settings)
|
110 |
|
111 |
JsonHandler.validate_json()
|
112 |
|
113 |
+
assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
|
114 |
|
115 |
except:
|
116 |
|
117 |
+
print("Invalid translation_settings.json file. Please check the file for errors or mistakes. If you are unsure, delete the file and run Kudasai again. Your file is located at: " + FileEnsurer.config_translation_settings_path)
|
118 |
|
119 |
Toolkit.pause_console()
|
120 |
|
121 |
+
raise Exception("Invalid translation_settings.json file. Please check the file for errors or mistakes. If you are unsure, delete the file and run Kudasai again. Your file is located at: " + FileEnsurer.config_translation_settings_path)
|
122 |
|
123 |
##-------------------start-of-run_kairyou_indexer()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
124 |
|
125 |
@staticmethod
|
126 |
+
def run_kairyou_indexer(text_to_index:str, replacement_json:typing.Union[dict,str], knowledge_base:str) -> typing.Tuple[str, str]:
|
127 |
|
128 |
"""
|
129 |
|
|
|
141 |
|
142 |
Toolkit.clear_console()
|
143 |
|
|
|
|
|
144 |
## unique names is a list of named tuples, with the fields name and occurrence
|
145 |
unique_names, indexing_log = Indexer.index(text_to_index, knowledge_base, replacement_json)
|
146 |
|
|
|
185 |
new_text += text[last_end:match.start()] + f">>>{name}<<<"
|
186 |
last_end = match.end()
|
187 |
|
188 |
+
new_text += text[last_end:] ## Append the rest of the text
|
189 |
text = new_text
|
190 |
|
191 |
return text
|
|
|
207 |
|
208 |
indexing_log = ""
|
209 |
|
210 |
+
if(Kudasai.replacement_json not in ["",
|
211 |
+
FileEnsurer.blank_rules_path,
|
212 |
+
FileEnsurer.standard_read_json(FileEnsurer.blank_rules_path)]
|
213 |
+
|
214 |
+
and Kudasai.need_to_run_indexer
|
215 |
+
and Kudasai.knowledge_base != ""):
|
216 |
+
Kudasai.text_to_preprocess, indexing_log = Kudasai.run_kairyou_indexer(Kudasai.text_to_preprocess, Kudasai.replacement_json, Kudasai.knowledge_base)
|
217 |
|
218 |
preprocessed_text, preprocessing_log, error_log = Kairyou.preprocess(Kudasai.text_to_preprocess, Kudasai.replacement_json)
|
219 |
|
|
|
224 |
if(indexing_log != ""):
|
225 |
preprocessing_log = indexing_log + "\n\n" + preprocessing_log
|
226 |
|
227 |
+
if(preprocessing_log == "Skipped"):
|
228 |
+
preprocessing_log = "Preprocessing skipped."
|
229 |
+
|
230 |
print(preprocessing_log)
|
231 |
|
232 |
timestamp = Toolkit.get_timestamp(is_archival=True)
|
|
|
239 |
else:
|
240 |
print("(Preprocessing skipped)")
|
241 |
|
242 |
+
await Kudasai.run_translator()
|
243 |
|
244 |
Toolkit.pause_console("\nPress any key to exit...")
|
245 |
|
|
|
263 |
Toolkit.pause_console()
|
264 |
Toolkit.clear_console()
|
265 |
|
266 |
+
##-------------------start-of-run_translator()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
267 |
|
268 |
@staticmethod
|
269 |
+
async def run_translator(is_cli:bool=False) -> None:
|
270 |
|
271 |
"""
|
272 |
|
273 |
+
If the user is running the CLI or Console version of Kudasai, this function is called to run the Translator module.
|
274 |
|
275 |
"""
|
276 |
|
277 |
+
Translator.is_cli = is_cli
|
|
|
|
|
|
|
|
|
|
|
|
|
278 |
|
279 |
+
logging.info("Translator started")
|
|
|
|
|
|
|
|
|
280 |
|
281 |
Toolkit.clear_console()
|
282 |
|
283 |
+
Translator.text_to_translate = [line for line in Kudasai.text_to_preprocess.splitlines()]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
284 |
|
285 |
+
await Translator.translate()
|
|
|
|
|
286 |
|
287 |
Toolkit.clear_console()
|
288 |
|
289 |
+
print(Translator.translation_print_result)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
290 |
|
291 |
+
Translator.write_translator_results()
|
|
|
|
|
|
292 |
|
293 |
##-------------------start-of-main()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
294 |
|
|
|
308 |
if(len(sys.argv) <= 1):
|
309 |
await run_console_version()
|
310 |
|
311 |
+
elif(len(sys.argv) in [2, 3, 4, 5, 6]):
|
312 |
await run_cli_version()
|
313 |
|
314 |
else:
|
|
|
329 |
|
330 |
try:
|
331 |
|
332 |
+
path_to_text_to_preprocess = input("Please enter the path to the input file to be preprocessed/translated:\n").strip('"')
|
333 |
Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(path_to_text_to_preprocess)
|
334 |
Toolkit.clear_console()
|
335 |
|
336 |
+
path_to_replacement_json = input("Please enter the path to the replacement json file (Press enter if skipping to translation):\n").strip('"')
|
337 |
Kudasai.replacement_json = FileEnsurer.standard_read_json(path_to_replacement_json if path_to_replacement_json else FileEnsurer.blank_rules_path)
|
338 |
Toolkit.clear_console()
|
339 |
|
340 |
+
if(path_to_replacement_json != ""):
|
341 |
+
Kudasai.knowledge_base = input("Please enter the path to the knowledge base you would like to use for the name indexer (can be text, a path to a txt file, or a path to a directory of txt files (Press enter if skipping name indexing):\n").strip('"')
|
342 |
+
Toolkit.clear_console()
|
343 |
+
|
344 |
except Exception as e:
|
345 |
print_usage_statement()
|
346 |
|
347 |
raise e
|
348 |
+
|
349 |
+
print("In progress...")
|
350 |
|
351 |
await Kudasai.run_kudasai()
|
|
|
352 |
|
353 |
##-------------------start-of-run_cli_version()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
354 |
|
|
|
360 |
|
361 |
"""
|
362 |
|
363 |
+
def determine_argument_type(arg:str) -> str:
|
364 |
+
|
365 |
+
"""
|
366 |
+
|
367 |
+
Determines the type of a given CLI argument (translation method, text file, settings json, or api key) for the CLI version of Kudasai.
|
368 |
+
|
369 |
+
"""
|
370 |
+
|
371 |
+
conditions = [
|
372 |
+
(lambda arg: arg in ["deepl", "openai", "gemini"], "translation_method"),
|
373 |
+
(lambda arg: os.path.exists(arg) and not ".json" in arg, "text_to_translate"),
|
374 |
+
(lambda arg: len(arg) > 10 and not os.path.exists(arg), "api_key"),
|
375 |
+
(lambda arg: arg == "translate", "identifier"),
|
376 |
+
(lambda arg: os.path.exists(arg) and ".json" in arg, "translation_settings_json")
|
377 |
+
]
|
378 |
+
|
379 |
+
for condition, result in conditions:
|
380 |
+
if(condition(arg)):
|
381 |
+
print(result)
|
382 |
+
return result
|
383 |
+
|
384 |
+
raise Exception("Invalid argument. Please use 'deepl', 'openai', or 'gemini'.")
|
385 |
+
|
386 |
+
mode = ""
|
387 |
+
|
388 |
try:
|
389 |
|
390 |
+
indices = {
|
391 |
+
"preprocess": {"text_to_preprocess_index": 2, "replacement_json_index": 3, "knowledge_base_index": 4},
|
392 |
+
"translate": {"text_to_translate_index": 2},
|
393 |
+
"--help": {}
|
394 |
+
}
|
395 |
|
396 |
+
try:
|
397 |
+
arg_indices = indices[sys.argv[1]]
|
398 |
+
mode = sys.argv[1]
|
399 |
|
400 |
+
except KeyError:
|
401 |
+
print_usage_statement()
|
402 |
+
raise Exception("Invalid mode. Please use 'preprocess' or 'translate'. Please use --help for more information.")
|
403 |
+
|
404 |
+
if(mode == "preprocess"):
|
405 |
+
|
406 |
+
Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(sys.argv[arg_indices['text_to_preprocess_index']].strip('"'))
|
407 |
+
Kudasai.replacement_json = FileEnsurer.standard_read_json(sys.argv[arg_indices['replacement_json_index']].strip('"')) if len(sys.argv) >= arg_indices['replacement_json_index'] + 1 else FileEnsurer.standard_read_json(FileEnsurer.blank_rules_path)
|
408 |
+
Kudasai.knowledge_base = sys.argv[arg_indices['knowledge_base_index']].strip('"') if len(sys.argv) == arg_indices['knowledge_base_index'] + 1 else ""
|
409 |
+
|
410 |
+
if(len(sys.argv) == 2):
|
411 |
+
Kudasai.need_to_run_kairyou = False
|
412 |
+
elif(len(sys.argv) == 3):
|
413 |
+
Kudasai.need_to_run_indexer = False
|
414 |
+
|
415 |
+
await Kudasai.run_kudasai()
|
416 |
|
417 |
+
elif(mode == "translate"):
|
|
|
418 |
|
419 |
+
method_to_translation_mode = {
|
420 |
+
"openai": "1",
|
421 |
+
"gemini": "2",
|
422 |
+
"deepl": "3"
|
423 |
+
}
|
424 |
+
|
425 |
+
Kudasai.text_to_preprocess = FileEnsurer.standard_read_file(sys.argv[arg_indices['text_to_translate_index']].strip('"'))
|
426 |
+
|
427 |
+
sys.argv.pop(0)
|
428 |
+
|
429 |
+
arg_dict = {arg.strip('"'): determine_argument_type(arg.strip('"')) for arg in sys.argv}
|
430 |
+
|
431 |
+
assert len(arg_dict) == len(set(arg_dict)), "Invalid arguments. Please use --help for more information."
|
432 |
+
|
433 |
+
arg_type_action_map = {
|
434 |
+
"translation_method": lambda arg: setattr(Translator, 'TRANSLATION_METHOD', method_to_translation_mode[arg]),
|
435 |
+
"translation_settings_json": lambda arg: setattr(JsonHandler, 'current_translation_settings', FileEnsurer.standard_read_json(arg)),
|
436 |
+
"api_key": lambda arg: setattr(Translator, 'pre_provided_api_key', arg),
|
437 |
+
"identifier": lambda arg: None,
|
438 |
+
"text_to_translate": lambda arg: setattr(Kudasai, 'text_to_preprocess', FileEnsurer.standard_read_file(arg))
|
439 |
+
}
|
440 |
|
441 |
+
for arg, arg_type in arg_dict.items():
|
442 |
+
if(arg_type in arg_type_action_map):
|
443 |
+
arg_type_action_map[arg_type](arg)
|
444 |
+
else:
|
445 |
+
raise Exception("Invalid argument type. Please use --help for more information.")
|
446 |
+
|
447 |
+
await Kudasai.run_translator(is_cli=True)
|
448 |
+
|
449 |
+
else:
|
450 |
+
print_usage_statement()
|
451 |
+
|
452 |
+
except Exception as e:
|
453 |
+
print_usage_statement()
|
454 |
+
raise e
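For readability, here is a standalone restatement of the classification rules that determine_argument_type() applies to the trailing CLI arguments in translate mode. It mirrors the conditions list in the diff above; the helper name and the example invocation are illustrative only.

```python
import os

def classify_cli_argument(arg: str) -> str:
    ## same rules as determine_argument_type() above, restated outside the nested function
    conditions = [
        (lambda a: a in ["deepl", "openai", "gemini"], "translation_method"),
        (lambda a: os.path.exists(a) and ".json" not in a, "text_to_translate"),
        (lambda a: len(a) > 10 and not os.path.exists(a), "api_key"),
        (lambda a: a == "translate", "identifier"),
        (lambda a: os.path.exists(a) and ".json" in a, "translation_settings_json"),
    ]

    for condition, result in conditions:
        if(condition(arg)):
            return result

    raise ValueError(f"Could not classify argument: {arg}")

## e.g. for: python kudasai.py translate "input.txt" gemini "translation_settings.json" "LONG-API-KEY"
## "gemini" classifies as translation_method, "translation_settings.json" (if it exists) as
## translation_settings_json, and "LONG-API-KEY" as api_key.
```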
|
455 |
+
|
456 |
##-------------------start-of-print_usage_statement()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
457 |
|
458 |
def print_usage_statement():
|
|
|
462 |
Prints the usage statement for the CLI version of Kudasai.
|
463 |
|
464 |
"""
|
465 |
+
python_command = "python" if Toolkit.is_windows() else "python3"
|
466 |
+
|
467 |
+
print(f"""
|
468 |
+
Usage: {python_command} Kudasai.py <mode> <required_arguments> [optional_arguments]
|
469 |
+
|
470 |
+
Modes:
|
471 |
+
preprocess
|
472 |
+
Preprocesses the text file using the provided replacement JSON.
|
473 |
+
|
474 |
+
Required arguments:
|
475 |
+
<input_file>                 Path to the text file to preprocess. This is a path to a text file.
|
476 |
+
<replacement_json> Path to the replacement JSON file. This is a path to a json file.
|
477 |
+
|
478 |
+
Optional arguments:
|
479 |
+
<knowledge_base> Path to the knowledge base file. This can be either a directory, file, or even text.
|
480 |
+
|
481 |
+
Example:
|
482 |
+
{python_command} Kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
|
483 |
+
|
484 |
+
translate
|
485 |
+
Translates the text file using the specified translation method.
|
486 |
+
|
487 |
+
Required arguments:
|
488 |
+
<input_file> Path to the text file to translate. This is a txt file.
|
489 |
+
|
490 |
+
Optional arguments:
|
491 |
+
<translation_method>          Translation method to use ('deepl', 'openai', or 'gemini'). This defaults to deepl.
|
492 |
+
<translation_settings_json>   Path to the translation settings JSON file. This will override the currently loaded settings.
|
493 |
+
<api_key>                     API key for the translation service. If not provided, Kudasai will use the key on file; if there is none, it will prompt for one.
|
494 |
+
|
495 |
+
Example:
|
496 |
+
{python_command} Kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR API KEY"
|
497 |
|
498 |
+
Additional Notes:
|
499 |
+
- Arguments containing spaces should be enclosed in double quotes. Double quotes are otherwise optional and will be stripped. Single quotes are not allowed.
|
500 |
+
- For more information, refer to the documentation at README.md
|
501 |
+
""")
|
502 |
|
|
|
503 |
|
504 |
##-------------------start-of-submain()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
505 |
|
506 |
|
507 |
+
if(__name__ == "__main__"):
|
508 |
asyncio.run(main())
|
lib/common/translation_settings_description.txt
ADDED
@@ -0,0 +1,79 @@
|
|
|
1 |
+
----------------------------------------------------------------------------------
|
2 |
+
Base Translation Settings:
|
3 |
+
|
4 |
+
prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message. 2 means it'll be treated as a user message. 1 is recommended for gpt-4; otherwise either works. For Gemini & DeepL, this setting is ignored.
|
5 |
+
|
6 |
+
number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost-effective, but other complications may occur with larger batches. I've only tested up to 48 lines so far.
|
7 |
+
|
8 |
+
sentence_fragmenter_mode : 1 or 2 (1 - fragment via regex and other heuristics; 2 - None, taking formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines how Kudasai fragments the sentences, if at all. Use 2 for newer models and DeepL.
|
9 |
+
|
10 |
+
je_check_mode : 1 or 2. 1 will print out the Japanese, then the English below it, separated by ---. 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English; if it cannot, it will default to 1. Use 2 for newer models and DeepL.
|
11 |
+
|
12 |
+
number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kudasai will attempt to mend a malformed batch (mending is resending the request). Be careful when increasing this, as cost grows at (cost * length * n) in the worst case. This setting is ignored if je_check_mode is set to 1.
|
13 |
+
|
14 |
+
batch_retry_timeout : How long, in seconds, Kudasai will try to translate a batch. If a request exceeds this duration, Kudasai will leave it untranslated.
|
15 |
+
|
16 |
+
number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative, as rate limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
|
17 |
+
----------------------------------------------------------------------------------
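A sketch of how the section described above might look once filled in. The key names come from the descriptions; the values are only illustrative and should be tuned per the notes above (shown as a Python dict for brevity; the actual file is JSON, see demo/translation_settings.json).

```python
## illustrative values only -- tune per the descriptions above
base_translation_settings = {
    "prompt_assembly_mode": 1,
    "number_of_lines_per_batch": 36,
    "sentence_fragmenter_mode": 2,
    "je_check_mode": 2,
    "number_of_malformed_batch_retries": 1,
    "batch_retry_timeout": 300,
    "number_of_concurrent_batches": 4
}
```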
|
18 |
+
Open AI Settings:
|
19 |
+
See https://platform.openai.com/docs/api-reference/chat/create for further details
|
20 |
+
----------------------------------------------------------------------------------
|
21 |
+
openai_model : ID of the model to use. Kudasai only works with 'chat' models.
|
22 |
+
|
23 |
+
openai_system_message : Instructions to the model. Basically tells the model how to translate.
|
24 |
+
|
25 |
+
openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
|
26 |
+
|
27 |
+
openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
|
28 |
+
|
29 |
+
openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.
|
30 |
+
|
31 |
+
openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this as Kudasai does not support this feature.
|
32 |
+
|
33 |
+
openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
|
34 |
+
|
35 |
+
openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this as Kudasai does not support this feature.
|
36 |
+
|
37 |
+
openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
|
38 |
+
|
39 |
+
openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. While negative values encourage repetition. Should leave this at 0.0.
|
40 |
+
|
41 |
+
openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Negative values encourage repetition. Should leave this at 0.0.
|
42 |
+
----------------------------------------------------------------------------------
|
43 |
+
openai_stream, openai_logit_bias, openai_stop and openai_n are included for completion's sake, current versions of Kudasai will hardcode their values when validating the translation_settings.json to their default values. As different values for these settings do not have a use case in Kudasai's current implementation.
|
44 |
+
----------------------------------------------------------------------------------
|
45 |
+
Gemini Settings:
|
46 |
+
See https://ai.google.dev/docs/concepts#model-parameters for further details
|
47 |
+
----------------------------------------------------------------------------------
|
48 |
+
gemini_model : The model to use. Currently only supports gemini-pro and gemini-pro-vision, the 1.0 and 1.5 models, and their aliases.
|
49 |
+
|
50 |
+
gemini_prompt : Instructions to the model. Basically tells the model how to translate.
|
51 |
+
|
52 |
+
gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
|
53 |
+
|
54 |
+
gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
|
55 |
+
|
56 |
+
gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, a lower value makes the output more deterministic.
|
57 |
+
|
58 |
+
gemini_candidate_count : The number of candidates to generate for each input message. Do not change this as Kudasai will always use 1.
|
59 |
+
|
60 |
+
gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this as Kudasai does not support this feature.
|
61 |
+
|
62 |
+
gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
|
63 |
+
|
64 |
+
gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this. Is none by default. If you change to an integer, make sure it doesn't exceed that model's context length or your request will fail and repeat till timeout.
|
65 |
+
----------------------------------------------------------------------------------
|
66 |
+
gemini_stream, gemini_stop_sequences and gemini_candidate_count are included for completion's sake, current versions of Kudasai will hardcode their values when validating the translation_settings.json to their default values. As different values for these settings do not have a use case in Kudasai's current implementation.
|
67 |
+
----------------------------------------------------------------------------------
|
68 |
+
Deepl Settings:
|
69 |
+
See https://developers.deepl.com/docs/api-reference/translate for further details
|
70 |
+
----------------------------------------------------------------------------------
|
71 |
+
deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL Alpha feature and could be subject to change.
|
72 |
+
|
73 |
+
deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.
|
74 |
+
|
75 |
+
deepl_preserve_formatting : Whether the formatting of the text should be preserved. If you don't want to preserve the formatting, you can set this to False. Otherwise, set it to True.
|
76 |
+
|
77 |
+
deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.
|
78 |
+
|
79 |
+
----------------------------------------------------------------------------------
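Putting the four sections together, translation_settings.json is validated by JsonHandler.validate_json() against this overall shape (section names taken from the validation code in the diff above; the values and the trailing placeholders are illustrative only).

```python
## rough shape of translation_settings.json; "...": "..." stands in for the remaining keys
translation_settings = {
    "base translation settings": {"prompt_assembly_mode": 1, "...": "..."},
    "openai settings": {"openai_model": "gpt-4", "...": "..."},
    "gemini settings": {"gemini_model": "gemini-pro", "...": "..."},
    "deepl settings": {"deepl_formality": "default", "...": "..."}
}
```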
|
lib/gui/HUGGING_FACE_README.md
ADDED
@@ -0,0 +1,224 @@
|
|
|
|
1 |
+
---
|
2 |
+
license: gpl-3.0
|
3 |
+
title: Kudasai
|
4 |
+
sdk: gradio
|
5 |
+
emoji: 🈷️
|
6 |
+
python_version: 3.10.0
|
7 |
+
app_file: webgui.py
|
8 |
+
colorFrom: gray
|
9 |
+
colorTo: gray
|
10 |
+
short_description: Japanese-English preprocessor with automated translation.
|
11 |
+
pinned: true
|
12 |
+
---
|
13 |
+
|
14 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
15 |
+
**Table of Contents**
|
16 |
+
|
17 |
+
- [**Notes**](#notes)
|
18 |
+
- [**General Usage**](#general-usage)
|
19 |
+
- [**Indexing and Preprocessing**](#indexing-and-preprocessing)
|
20 |
+
- [**Translator**](#translator)
|
21 |
+
- [**Translator Settings**](#translator-settings)
|
22 |
+
- [**Web GUI**](#web-gui)
|
23 |
+
- [**License**](#license)
|
24 |
+
- [**Contact**](#contact)
|
25 |
+
- [**Acknowledgements**](#acknowledgements)
|
26 |
+
|
27 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
28 |
+
## **Notes**<a name="notes"></a>
|
29 |
+
|
30 |
+
This readme is for the Hugging Space instance of Kudasai's WebGUI and the WebGUI itself, to run Kudasai locally or see any info on the project, please see the [GitHub Page](https://github.com/Bikatr7/Kudasai).
|
31 |
+
|
32 |
+
Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
|
33 |
+
|
34 |
+
Preprocessor and Translation logic is sourced from external packages, which I also designed, see [Kairyou](https://github.com/Bikatr7/Kairyou) and [EasyTL](https://github.com/Bikatr7/easytl) for more information.
|
35 |
+
|
36 |
+
Kudasai has a public trello board, you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
|
37 |
+
|
38 |
+
The WebGUI on Hugging Face does not save anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, the output folder is overwritten every time you run it, and archives are deleted every run as well.
|
39 |
+
|
40 |
+
Kudasai is proud to have been a Backdrop Build v3 Finalist:
|
41 |
+
https://backdropbuild.com/builds/v3/kudasai
|
42 |
+
|
43 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
44 |
+
|
45 |
+
## **General Usage**<a name="general-usage"></a>
|
46 |
+
|
47 |
+
Kudasai's WebGUI is pretty easy to understand for general usage; most incorrect actions will be caught by the system and a message will be displayed to the user on how to correct them.
|
48 |
+
|
49 |
+
Normally, Kudasai would save files to the local system, but on Hugging Face's servers, this is not possible. Instead, you'll have to click the 'Save As' button to download the files to your local system.
|
50 |
+
|
51 |
+
Or you can click the copy button on the top right of textbox modals to copy the text to your clipboard.
|
52 |
+
|
53 |
+
For further details, see below chapters.
|
54 |
+
|
55 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
56 |
+
|
57 |
+
## **Indexing and Preprocessing**<a name="kairyou"></a>
|
58 |
+
|
59 |
+
This section can be skipped if you're only interested in translation or do not know what indexing or preprocessing is.
|
60 |
+
|
61 |
+
Indexing is not for everyone; only use it if you have a large amount of previous text and want to flag new names. It can be a very slow and long process, especially on Hugging Face's servers, so it's recommended to use a local version of Kudasai for it.
|
62 |
+
|
63 |
+
You'll need a txt file or some text to index. You'll also need a knowledge base (either a single txt file or a directory of them) as well as a replacements json; either a Kudasai- or Fukuin-type json works. See [this](https://github.com/Bikatr7/Kairyou?tab=readme-ov-file#kairyou) for further details on replacement jsons.
|
64 |
+
|
65 |
+
Please do indexing before preprocessing, output is neater that way.
|
66 |
+
|
67 |
+
For preprocessing, you'll need a txt file or some text to preprocess, as well as a replacements json; as with indexing, either a Kudasai- or Fukuin-type json works.
|
68 |
+
|
69 |
+
For both, text is put in the textbox modals, with the output text being in the first field, and results being in the second field.
|
70 |
+
|
71 |
+
They both have a debug field, but neither module really uses it.
|
72 |
+
|
73 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
74 |
+
|
75 |
+
## **Translator**<a name="translator"></a>
|
76 |
+
|
77 |
+
Kudasai supports 3 different translation methods at the moment: OpenAI's GPT, Google's Gemini, and DeepL.
|
78 |
+
|
79 |
+
For OpenAI, you'll need an API key; you can get one [here](https://platform.openai.com/docs/api-reference/authentication). This is a paid service with no free tier.
|
80 |
+
|
81 |
+
For Gemini, you'll also need an API key; you can get one [here](https://ai.google.dev/tutorials/setup). Gemini is free to use under a certain limit: 2 RPM for 1.5 and 15 RPM for 1.0.
|
82 |
+
|
83 |
+
For DeepL, you'll need an API key too; you can get one [here](https://www.deepl.com/pro#developer). DeepL is also a paid service, but it is free under 500k characters a month.
|
84 |
+
|
85 |
+
I'd recommend using GPT for most things, as it's generally better at translation.
|
86 |
+
|
87 |
+
It's mostly straightforward: choose your translation method, fill in your API key, and select your text. On Hugging Face you'll also need to add your settings file if you want to tune the output, but the default is generally fine.
|
88 |
+
|
89 |
+
You can calculate costs here or just translate. Output will show in the appropriate fields.
|
90 |
+
|
91 |
+
For further details on the settings file, see [here](#translator-settings).
|
92 |
+
|
93 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
94 |
+
|
95 |
+
## **Translator Settings**<a name="translator-settings"></a>
|
96 |
+
|
97 |
+
(Fairly technical, can be abstracted away by using default settings or someone else's settings file.)
|
98 |
+
|
99 |
+
Base Translation Settings:
|
100 |
+
|
101 |
+
prompt_assembly_mode : 1 or 2. 1 means the system message will actually be treated as a system message; 2 means it'll be treated as a user message. 1 is recommended for gpt-4, otherwise either works. For Gemini & DeepL, this setting is ignored.
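As a rough sketch (not Kudasai's literal code), the two modes correspond to chat message lists shaped like the following, where `system_message` and `batch_of_lines` stand in for the configured instructions and the batched source text:

```python
system_message = "Translate the following Japanese text into English."  ## stands in for openai_system_message
batch_of_lines = "これは例です。\nこれも例です。"  ## stands in for the batched source text

## mode 1 : the instructions are sent with the "system" role
messages_mode_1 = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": batch_of_lines},
]

## mode 2 : the same instructions are sent as an ordinary user message instead
messages_mode_2 = [
    {"role": "user", "content": system_message},
    {"role": "user", "content": batch_of_lines},
]
```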
|
102 |
+
|
103 |
+
number_of_lines_per_batch : The number of lines to be built into a prompt at once. Theoretically, more lines would be more cost-effective, but other complications may occur with higher line counts. I have tested it up to 48 lines so far.
|
104 |
+
|
105 |
+
sentence_fragmenter_mode : 1 or 2 (1 - fragment via regex and other heuristics; 2 - none, take formatting and text directly from the API return). The API can sometimes return a result on a single line, so this determines how, if at all, Kudasai fragments the sentences. Use 2 for newer models and DeepL.
|
106 |
+
|
107 |
+
je_check_mode : 1 or 2. 1 will print out the Japanese and then the English below it, separated by ---; 2 will attempt to pair the English and Japanese sentences, placing the Japanese above the English. If it cannot, it will default to 1. Use 2 for newer models and DeepL.
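As a rough illustration of mode 1 (exact spacing may differ from the real je_check output), each entry looks something like:

```
日本語の原文
---
English translation
```

Mode 2 instead tries to pair each Japanese line directly above its English counterpart, falling back to this layout when pairing fails.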
|
108 |
+
|
109 |
+
number_of_malformed_batch_retries : (A malformed batch is when je-fixing fails.) How many times Kudasai will attempt to mend a malformed batch (mending is resending the request). Be careful with increasing this, as worst-case cost grows as (cost * length * n). This setting is ignored if je_check_mode is set to 1.
|
110 |
+
|
111 |
+
batch_retry_timeout : How long, in seconds, Kudasai will try to translate a batch; if a request exceeds this duration, Kudasai will leave it untranslated.
|
112 |
+
|
113 |
+
number_of_concurrent_batches : How many translation batches Kudasai will send to the translation API at a time. For OpenAI, be conservative as rate limiting is aggressive; I'd suggest 3-5. For Gemini, do not exceed 15 for 1.0 or 2 for 1.5. This setting more or less doesn't matter for DeepL.
|
114 |
+
----------------------------------------------------------------------------------
|
115 |
+
OpenAI Settings:
|
116 |
+
See https://platform.openai.com/docs/api-reference/chat/create for further details
|
117 |
+
----------------------------------------------------------------------------------
|
118 |
+
openai_model : ID of the model to use. Kudasai only works with 'chat' models.
|
119 |
+
|
120 |
+
openai_system_message : Instructions to the model. Basically tells the model how to translate.
|
121 |
+
|
122 |
+
openai_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
|
123 |
+
|
124 |
+
openai_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
|
125 |
+
|
126 |
+
openai_n : How many chat completion choices to generate for each input message. Do not change this, as Kudasai will always use 1.
|
127 |
+
|
128 |
+
openai_stream : If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI python library on GitHub for example code. Do not change this as Kudasai does not support this feature.
|
129 |
+
|
130 |
+
openai_stop : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
|
131 |
+
|
132 |
+
openai_logit_bias : Modifies the likelihood of specified tokens appearing in the completion. Do not change this as Kudasai does not support this feature.
|
133 |
+
|
134 |
+
openai_max_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this; it is None by default. If you change it to an integer, make sure it doesn't exceed the model's context length, or your request will fail and be retried until timeout.
|
135 |
+
|
136 |
+
openai_presence_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics, while negative values encourage repetition. You should leave this at 0.0.
|
137 |
+
|
138 |
+
openai_frequency_penalty : Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim, while negative values encourage repetition. You should leave this at 0.0.
|
139 |
+
----------------------------------------------------------------------------------
|
140 |
+
openai_stream, openai_logit_bias, openai_stop, and openai_n are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
|
141 |
+
----------------------------------------------------------------------------------
|
142 |
+
Gemini Settings:
|
143 |
+
See https://ai.google.dev/docs/concepts#model-parameters for further details
|
144 |
+
----------------------------------------------------------------------------------
|
145 |
+
gemini_model : The model to use. Currently only gemini-pro and gemini-pro-vision are supported, along with the 1.0 and 1.5 models and their aliases.
|
146 |
+
|
147 |
+
gemini_prompt : Instructions to the model. Basically tells the model how to translate.
|
148 |
+
|
149 |
+
gemini_temperature : What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Lower values are typically better for translation.
|
150 |
+
|
151 |
+
gemini_top_p : An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. I generally recommend altering this or temperature but not both.
|
152 |
+
|
153 |
+
gemini_top_k : Determines the number of most probable tokens to consider for each selection step. A higher value increases diversity, while a lower value makes the output more deterministic.
|
154 |
+
|
155 |
+
gemini_candidate_count : The number of candidates to generate for each input message. Do not change this as Kudasai will always use 1.
|
156 |
+
|
157 |
+
gemini_stream : If set, partial message deltas will be sent, like in Gemini Chat. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Do not change this as Kudasai does not support this feature.
|
158 |
+
|
159 |
+
gemini_stop_sequences : Up to 4 sequences where the API will stop generating further tokens. Do not change this as Kudasai does not support this feature.
|
160 |
+
|
161 |
+
gemini_max_output_tokens : The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. I wouldn't recommend changing this; it is None by default. If you change it to an integer, make sure it doesn't exceed the model's context length, or your request will fail and be retried until timeout.
|
162 |
+
----------------------------------------------------------------------------------
|
163 |
+
gemini_stream, gemini_stop_sequences, and gemini_candidate_count are included for completeness' sake; current versions of Kudasai will hardcode them to their default values when validating translation_settings.json, as different values for these settings have no use case in Kudasai's current implementation.
|
164 |
+
----------------------------------------------------------------------------------
|
165 |
+
DeepL Settings:
|
166 |
+
See https://developers.deepl.com/docs/api-reference/translate for further details
|
167 |
+
----------------------------------------------------------------------------------
|
168 |
+
deepl_context : The context in which the text should be translated. This is used to improve the translation. If you don't have any context, you can leave this empty. This is a DeepL Alpha feature and could be subject to change.
|
169 |
+
|
170 |
+
deepl_split_sentences : How the text should be split into sentences. Possible values are 'OFF', 'ALL', 'NO_NEWLINES'.
|
171 |
+
|
172 |
+
deepl_preserve_formatting : Whether the formatting of the text should be preserved. Set this to True to preserve formatting, or False otherwise.
|
173 |
+
|
174 |
+
deepl_formality : The formality of the text. Possible values are 'default', 'more', 'less', 'prefer_more', 'prefer_less'.
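Putting it together, an abridged translation_settings.json looks roughly like this (abridged for illustration, not a complete usable file; only the base and DeepL sections are shown, the OpenAI and Gemini sections follow the same pattern, and a complete default file is written to your config directory on first run):

```json
{
    "base translation settings": {
        "prompt_assembly_mode": 1,
        "number_of_lines_per_batch": 36,
        "sentence_fragmenter_mode": 2
    },

    "deepl settings": {
        "deepl_context": "",
        "deepl_split_sentences": "ALL",
        "deepl_preserve_formatting": true,
        "deepl_formality": "default"
    }
}
```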
|
175 |
+
|
176 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
177 |
+
|
178 |
+
## **Web GUI**<a name="webgui"></a>
|
179 |
+
|
180 |
+
Below are some images of the Web GUI.
|
181 |
+
|
182 |
+
Name Indexing | Kairyou:
|
183 |
+
![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
|
184 |
+
|
185 |
+
Text Preprocessing | Kairyou:
|
186 |
+
![Text Preprocessing Screen | Kairyou](https://i.imgur.com/r8nHEvw.jpeg)
|
187 |
+
|
188 |
+
Text Translation | Translator:
|
189 |
+
![Text Translation Screen | Translator](https://i.imgur.com/0E9q2eh.jpeg)
|
190 |
+
|
191 |
+
Translation Settings Page 1:
|
192 |
+
![Translation Settings Page 1](https://i.imgur.com/0E9q2eh.jpeg)
|
193 |
+
|
194 |
+
Translation Settings Page 2:
|
195 |
+
![Translation Settings Page 2](https://i.imgur.com/8MQk6pL.jpeg)
|
196 |
+
|
197 |
+
Logging Page:
|
198 |
+
![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
|
199 |
+
|
200 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
201 |
+
## **License**<a name="license"></a>
|
202 |
+
|
203 |
+
This project (Kudasai) is licensed under the GNU General Public License (GPL). You can find the full text of the license in the [LICENSE](License.md) file.
|
204 |
+
|
205 |
+
The GPL is a copyleft license that promotes the principles of open-source software. It ensures that any derivative works based on this project must also be distributed under the same GPL license. This license grants you the freedom to use, modify, and distribute the software.
|
206 |
+
|
207 |
+
Please note that this information is a brief summary of the GPL. For a detailed understanding of your rights and obligations under this license, please refer to the full license text.
|
208 |
+
|
209 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
210 |
+
## **Contact**<a name="contact"></a>
|
211 |
+
|
212 |
+
If you have any questions, comments, or concerns, please feel free to contact me at [Bikatr7@proton.me](mailto:Bikatr7@proton.me)
|
213 |
+
|
214 |
+
For any bugs or suggestions please use the issues tab [here](https://github.com/Bikatr7/Kudasai/issues).
|
215 |
+
|
216 |
+
I actively encourage and welcome any feedback on this project.
|
217 |
+
|
218 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
219 |
+
|
220 |
+
## **Acknowledgements**<a name="acknowledgements"></a>
|
221 |
+
|
222 |
+
Kudasai gets its original name idea from its inspiration, Atreyagaurav's Onegai, which also means please. You can find that [here](https://github.com/Atreyagaurav/onegai).
|
223 |
+
|
224 |
+
---------------------------------------------------------------------------------------------------------------------------------------------------
|
lib/gui/save_to_file.js
ADDED
@@ -0,0 +1,9 @@
|
|
1 |
+
(text) => {
|
2 |
+
const blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
|
3 |
+
const url = URL.createObjectURL(blob);
|
4 |
+
const a = document.createElement('a');
|
5 |
+
a.href = url;
|
6 |
+
a.download = 'downloaded_text.txt';
|
7 |
+
a.click();
|
8 |
+
URL.revokeObjectURL(url);
|
9 |
+
}
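As a minimal sketch of how a snippet like this is typically wired up (this assumes a recent Gradio version whose event listeners accept a `js=` argument; older versions used `_js=`, and the actual wiring in webgui.py may differ):

```python
import gradio as gr

## read the client-side download snippet shown above
with open("lib/gui/save_to_file.js", "r", encoding="utf-8") as file:
    JS_SAVE_TO_FILE = file.read()

with gr.Blocks() as demo:
    output_box = gr.Textbox(label="Translated Text")
    save_button = gr.Button("Save As")

    ## fn is None because the download happens entirely in the browser;
    ## the textbox value is handed to the JS snippet as its argument
    save_button.click(fn=None, inputs=[output_box], outputs=None, js=JS_SAVE_TO_FILE)

demo.launch()
```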
|
models/kaiseki.py
DELETED
@@ -1,583 +0,0 @@
|
|
1 |
-
## Basically Deprecated, use Kijiku instead. Currently only maintained for backwards compatibility.
|
2 |
-
##---------------------------------------
|
3 |
-
##---------------------------------------
|
4 |
-
## built-in libraries
|
5 |
-
import string
|
6 |
-
import time
|
7 |
-
import re
|
8 |
-
import base64
|
9 |
-
import time
|
10 |
-
|
11 |
-
## third-party libraries
|
12 |
-
from easytl import EasyTL
|
13 |
-
|
14 |
-
## custom modules
|
15 |
-
from modules.common.toolkit import Toolkit
|
16 |
-
from modules.common.file_ensurer import FileEnsurer
|
17 |
-
from modules.common.logger import Logger
|
18 |
-
from modules.common.decorators import permission_error_decorator
|
19 |
-
from modules.common.exceptions import AuthorizationException, QuotaExceededException
|
20 |
-
|
21 |
-
##-------------------start-of-Kaiseki--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
22 |
-
|
23 |
-
class Kaiseki:
|
24 |
-
|
25 |
-
"""
|
26 |
-
|
27 |
-
Kaiseki is a secondary class that is used to interact with the Deepl API and translate Japanese text sentence by sentence.
|
28 |
-
|
29 |
-
Kaiseki is considered inferior to Kijiku, please consider using Kijiku instead.
|
30 |
-
|
31 |
-
"""
|
32 |
-
|
33 |
-
##---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
34 |
-
|
35 |
-
text_to_translate = []
|
36 |
-
|
37 |
-
translated_text = []
|
38 |
-
|
39 |
-
je_check_text = []
|
40 |
-
|
41 |
-
error_text = []
|
42 |
-
|
43 |
-
translation_print_result = ""
|
44 |
-
|
45 |
-
##---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
46 |
-
|
47 |
-
sentence_parts = []
|
48 |
-
|
49 |
-
sentence_punctuation = []
|
50 |
-
|
51 |
-
## [0] = "" [1] = ~ [2] = '' in Kaiseki.current_sentence but not entire Kaiseki.current_sentence [3] = '' but entire Kaiseki.current_sentence [3] if () in Kaiseki.current_sentence
|
52 |
-
special_punctuation = []
|
53 |
-
|
54 |
-
current_sentence = ""
|
55 |
-
|
56 |
-
translated_sentence = ""
|
57 |
-
|
58 |
-
##-------------------start-of-translate()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
59 |
-
|
60 |
-
@staticmethod
|
61 |
-
def translate() -> None:
|
62 |
-
|
63 |
-
"""
|
64 |
-
|
65 |
-
Translates the text.
|
66 |
-
|
67 |
-
"""
|
68 |
-
|
69 |
-
Logger.clear_batch()
|
70 |
-
|
71 |
-
time_start = time.time()
|
72 |
-
|
73 |
-
try:
|
74 |
-
|
75 |
-
Kaiseki.initialize()
|
76 |
-
|
77 |
-
## offset time, for if the user doesn't get through Kaiseki.initialize() before the translation starts.
|
78 |
-
time_start = time.time()
|
79 |
-
|
80 |
-
Kaiseki.commence_translation()
|
81 |
-
|
82 |
-
except Exception as e:
|
83 |
-
|
84 |
-
Kaiseki.translation_print_result += "An error has occurred, outputting results so far..."
|
85 |
-
|
86 |
-
FileEnsurer.handle_critical_exception(e)
|
87 |
-
|
88 |
-
finally:
|
89 |
-
|
90 |
-
time_end = time.time()
|
91 |
-
|
92 |
-
Kaiseki.assemble_results(time_start, time_end)
|
93 |
-
|
94 |
-
##-------------------start-of-initialize()--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
95 |
-
|
96 |
-
@staticmethod
|
97 |
-
def initialize() -> None:
|
98 |
-
|
99 |
-
"""
|
100 |
-
|
101 |
-
Initializes the Kaiseki class by getting the API key and creating the translator object.
|
102 |
-
|
103 |
-
"""
|
104 |
-
|
105 |
-
## get saved api key if exists
|
106 |
-
try:
|
107 |
-
|
108 |
-
with open(FileEnsurer.deepl_api_key_path, 'r', encoding='utf-8') as file:
|
109 |
-
api_key = base64.b64decode((file.read()).encode('utf-8')).decode('utf-8')
|
110 |
-
|
111 |
-
EasyTL.set_api_key("deepl", api_key)
|
112 |
-
is_valid, e = EasyTL.test_api_key_validity("deepl")
|
113 |
-
|
114 |
-
assert is_valid == True, e
|
115 |
-
|
116 |
-
Logger.log_action("Used saved api key in " + FileEnsurer.deepl_api_key_path, output=True)
|
117 |
-
|
118 |
-
## else try to get api key manually
|
119 |
-
except Exception as e:
|
120 |
-
|
121 |
-
api_key = input("DO NOT DELETE YOUR COPY OF THE API KEY\n\nPlease enter the deepL api key you have : ")
|
122 |
-
|
123 |
-
## if valid save the api key
|
124 |
-
try:
|
125 |
-
|
126 |
-
EasyTL.set_api_key("deepl", api_key)
|
127 |
-
is_valid, e = EasyTL.test_api_key_validity("deepl")
|
128 |
-
|
129 |
-
assert is_valid, e
|
130 |
-
|
131 |
-
time.sleep(.1)
|
132 |
-
|
133 |
-
FileEnsurer.standard_overwrite_file(FileEnsurer.deepl_api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)
|
134 |
-
|
135 |
-
time.sleep(.1)
|
136 |
-
|
137 |
-
## if invalid key exit
|
138 |
-
except AuthorizationException:
|
139 |
-
|
140 |
-
Toolkit.clear_console()
|
141 |
-
|
142 |
-
Logger.log_action("Authorization error with creating translator object, please double check your api key as it appears to be incorrect.\nKaiseki will now exit.", output=True)
|
143 |
-
|
144 |
-
Toolkit.pause_console()
|
145 |
-
|
146 |
-
raise e # type: ignore
|
147 |
-
|
148 |
-
## other error, alert user and raise it
|
149 |
-
except Exception as e:
|
150 |
-
|
151 |
-
Toolkit.clear_console()
|
152 |
-
|
153 |
-
Logger.log_action("Unknown error with creating translator object, The error is as follows " + str(e) + "\nKaiseki will now exit.", output=True)
|
154 |
-
|
155 |
-
Toolkit.pause_console()
|
156 |
-
|
157 |
-
raise e
|
158 |
-
|
159 |
-
Toolkit.clear_console()
|
160 |
-
Logger.log_barrier()
|
161 |
-
|
162 |
-
##-------------------start-of-reset_static_variables()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
163 |
-
|
164 |
-
@staticmethod
|
165 |
-
def reset_static_variables() -> None:
|
166 |
-
|
167 |
-
"""
|
168 |
-
|
169 |
-
Resets the static variables of the Kaiseki class.
|
170 |
-
For when running multiple translations in a row through webgui.
|
171 |
-
|
172 |
-
"""
|
173 |
-
|
174 |
-
Logger.clear_batch()
|
175 |
-
|
176 |
-
Kaiseki.text_to_translate = []
|
177 |
-
Kaiseki.translated_text = []
|
178 |
-
Kaiseki.je_check_text = []
|
179 |
-
Kaiseki.error_text = []
|
180 |
-
Kaiseki.translation_print_result = ""
|
181 |
-
Kaiseki.sentence_parts = []
|
182 |
-
Kaiseki.sentence_punctuation = []
|
183 |
-
Kaiseki.special_punctuation = []
|
184 |
-
Kaiseki.current_sentence = ""
|
185 |
-
Kaiseki.translated_sentence = ""
|
186 |
-
|
187 |
-
##-------------------start-of-commence_translation()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
188 |
-
|
189 |
-
@staticmethod
|
190 |
-
def commence_translation() -> None:
|
191 |
-
|
192 |
-
"""
|
193 |
-
|
194 |
-
Commences the translation process using all the functions in the Kaiseki class.
|
195 |
-
|
196 |
-
"""
|
197 |
-
|
198 |
-
i = 0
|
199 |
-
|
200 |
-
while(i < len(Kaiseki.text_to_translate)):
|
201 |
-
|
202 |
-
## for webgui, if the user presses the clear button, raise an exception to stop the translation
|
203 |
-
if(FileEnsurer.do_interrupt == True):
|
204 |
-
raise Exception("Interrupted by user.")
|
205 |
-
|
206 |
-
Kaiseki.current_sentence = Kaiseki.text_to_translate[i]
|
207 |
-
|
208 |
-
Logger.log_action("Initial Sentence : " + Kaiseki.current_sentence)
|
209 |
-
|
210 |
-
## Kaiseki is an in-place translation, so it'll build the translated text into Kaiseki.translated_text as it goes.
|
211 |
-
if(any(char in Kaiseki.current_sentence for char in ["▼", "△", "◇"])):
|
212 |
-
Kaiseki.translated_text.append(Kaiseki.current_sentence + '\n')
|
213 |
-
Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is a pov change... leaving intact.")
|
214 |
-
|
215 |
-
elif("part" in Kaiseki.current_sentence.lower() or all(char in ["1","2","3","4","5","6","7","8","9", " "] for char in Kaiseki.current_sentence) and not all(char in [" "] for char in Kaiseki.current_sentence) and Kaiseki.current_sentence != '"..."' and Kaiseki.current_sentence != "..."):
|
216 |
-
Kaiseki.translated_text.append(Kaiseki.current_sentence + '\n')
|
217 |
-
Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is part marker... leaving intact.")
|
218 |
-
|
219 |
-
elif bool(re.match(r'^[\W_\s\n-]+$', Kaiseki.current_sentence)) and not any(char in Kaiseki.current_sentence for char in ["」", "「", "«", "»"]):
|
220 |
-
Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is punctuation... skipping.")
|
221 |
-
Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
|
222 |
-
|
223 |
-
elif(bool(re.match(r'^[A-Za-z0-9\s\.,\'\?!]+\n*$', Kaiseki.current_sentence))):
|
224 |
-
Logger.log_action("Sentence : " + Kaiseki.current_sentence + ", Sentence is english... skipping translation.")
|
225 |
-
Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
|
226 |
-
|
227 |
-
elif(len(Kaiseki.current_sentence) == 0 or Kaiseki.current_sentence.isspace() == True):
|
228 |
-
Logger.log_action("Sentence is empty... skipping translation.\n")
|
229 |
-
Kaiseki.translated_text.append(Kaiseki.current_sentence + "\n")
|
230 |
-
|
231 |
-
else:
|
232 |
-
|
233 |
-
Kaiseki.separate_sentence()
|
234 |
-
|
235 |
-
Kaiseki.translate_sentence()
|
236 |
-
|
237 |
-
## this is for adding a period if it's missing
|
238 |
-
if(len(Kaiseki.translated_text[i]) > 0 and Kaiseki.translated_text[i] != "" and Kaiseki.translated_text[i][-2] not in string.punctuation and Kaiseki.sentence_punctuation[-1] == None):
|
239 |
-
Kaiseki.translated_text[i] = Kaiseki.translated_text[i] + "."
|
240 |
-
|
241 |
-
## re-adds quotes
|
242 |
-
if(Kaiseki.special_punctuation[0] == True):
|
243 |
-
Kaiseki.translated_text[i] = '"' + Kaiseki.translated_text[i] + '"'
|
244 |
-
|
245 |
-
## replaces quotes because deepL messes up quotes
|
246 |
-
elif('"' in Kaiseki.translated_text[i]):
|
247 |
-
Kaiseki.translated_text[i] = Kaiseki.translated_text[i].replace('"',"'")
|
248 |
-
|
249 |
-
## re-adds single quotes
|
250 |
-
if(Kaiseki.special_punctuation[3] == True):
|
251 |
-
Kaiseki.translated_text[i] = "'" + Kaiseki.translated_text[i] + "'"
|
252 |
-
|
253 |
-
## re-adds parentheses
|
254 |
-
if(Kaiseki.special_punctuation[4] == True):
|
255 |
-
Kaiseki.translated_text[i] = "(" + Kaiseki.translated_text[i] + ")"
|
256 |
-
|
257 |
-
Logger.log_action("Translated and Reassembled Sentence : " + Kaiseki.translated_text[i])
|
258 |
-
|
259 |
-
Kaiseki.translated_text[i] += "\n"
|
260 |
-
|
261 |
-
Kaiseki.je_check_text.append(str(i+1) + ": " + Kaiseki.current_sentence + "\n " + Kaiseki.translated_text[i] + "\n")
|
262 |
-
|
263 |
-
i+=1
|
264 |
-
|
265 |
-
Toolkit.clear_console()
|
266 |
-
|
267 |
-
Logger.log_action(str(i) + "/" + str(len(Kaiseki.text_to_translate)) + " completed.", output=True)
|
268 |
-
Logger.log_barrier()
|
269 |
-
|
270 |
-
##-------------------start-of-separate_sentence()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
271 |
-
|
272 |
-
@staticmethod
|
273 |
-
def separate_sentence() -> None:
|
274 |
-
|
275 |
-
"""
|
276 |
-
|
277 |
-
This function separates the sentence into parts and punctuation.
|
278 |
-
|
279 |
-
"""
|
280 |
-
|
281 |
-
## resets variables for current_sentence
|
282 |
-
Kaiseki.sentence_parts = []
|
283 |
-
Kaiseki.sentence_punctuation = []
|
284 |
-
Kaiseki.special_punctuation = [False,False,False,False,False]
|
285 |
-
|
286 |
-
i = 0
|
287 |
-
|
288 |
-
buildString = ""
|
289 |
-
|
290 |
-
## checks if quotes are in the sentence and removes them
|
291 |
-
if('"' in Kaiseki.current_sentence):
|
292 |
-
Kaiseki.current_sentence = Kaiseki.current_sentence.replace('"', '')
|
293 |
-
Kaiseki.special_punctuation[0] = True
|
294 |
-
|
295 |
-
## checks if tildes are in the sentence
|
296 |
-
if('~' in Kaiseki.current_sentence):
|
297 |
-
Kaiseki.special_punctuation[1] = True
|
298 |
-
|
299 |
-
## checks if apostrophes are in the sentence but not at the beginning or end
|
300 |
-
if(Kaiseki.current_sentence.count("'") == 2 and (Kaiseki.current_sentence[0] != "'" and Kaiseki.current_sentence[-1] != "'")):
|
301 |
-
Kaiseki.special_punctuation[2] = True
|
302 |
-
|
303 |
-
## checks if apostrophes are in the sentence and removes them
|
304 |
-
elif(Kaiseki.current_sentence.count("'") == 2):
|
305 |
-
Kaiseki.special_punctuation[3] = True
|
306 |
-
Kaiseki.current_sentence = Kaiseki.current_sentence.replace("'", "")
|
307 |
-
|
308 |
-
## checks if parentheses are in the sentence and removes them
|
309 |
-
if("(" in Kaiseki.current_sentence and ")" in Kaiseki.current_sentence):
|
310 |
-
Kaiseki.special_punctuation[4] = True
|
311 |
-
Kaiseki.current_sentence= Kaiseki.current_sentence.replace("(","")
|
312 |
-
Kaiseki.current_sentence= Kaiseki.current_sentence.replace(")","")
|
313 |
-
|
314 |
-
while(i < len(Kaiseki.current_sentence)):
|
315 |
-
|
316 |
-
if(Kaiseki.current_sentence[i] in [".","!","?","-"]):
|
317 |
-
|
318 |
-
if(i+5 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+6] in ["......"]):
|
319 |
-
|
320 |
-
if(i+6 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+7] in ["......'"]):
|
321 |
-
buildString += "'"
|
322 |
-
i+=1
|
323 |
-
|
324 |
-
if(buildString != ""):
|
325 |
-
Kaiseki.sentence_parts.append(buildString)
|
326 |
-
|
327 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+6])
|
328 |
-
i+=5
|
329 |
-
buildString = ""
|
330 |
-
|
331 |
-
if(i+4 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+5] in [".....","...!?"]):
|
332 |
-
|
333 |
-
if(i+5 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+6] in [".....'","...!?'"]):
|
334 |
-
buildString += "'"
|
335 |
-
i+=1
|
336 |
-
|
337 |
-
if(buildString != ""):
|
338 |
-
Kaiseki.sentence_parts.append(buildString)
|
339 |
-
|
340 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+5])
|
341 |
-
i+=4
|
342 |
-
buildString = ""
|
343 |
-
|
344 |
-
elif(i+3 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+4] in ["...!","...?","---.","....","!..."]):
|
345 |
-
|
346 |
-
if(i+4 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+5] in ["...!'","...?'","---.'","....'","!...'"]):
|
347 |
-
buildString += "'"
|
348 |
-
i+=1
|
349 |
-
|
350 |
-
if(buildString != ""):
|
351 |
-
Kaiseki.sentence_parts.append(buildString)
|
352 |
-
|
353 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+4])
|
354 |
-
i+=3
|
355 |
-
buildString = ""
|
356 |
-
|
357 |
-
elif(i+2 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+3] in ["---","..."]):
|
358 |
-
|
359 |
-
if(i+3 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+4] in ["---'","...'"]):
|
360 |
-
buildString += "'"
|
361 |
-
i+=1
|
362 |
-
|
363 |
-
if(buildString != ""):
|
364 |
-
Kaiseki.sentence_parts.append(buildString)
|
365 |
-
|
366 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+3])
|
367 |
-
i+=2
|
368 |
-
buildString = ""
|
369 |
-
|
370 |
-
elif(i+1 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+2] == '!?'):
|
371 |
-
|
372 |
-
if(i+2 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i:i+3] == "!?'"):
|
373 |
-
buildString += "'"
|
374 |
-
i+=1
|
375 |
-
|
376 |
-
if(buildString != ""):
|
377 |
-
Kaiseki.sentence_parts.append(buildString)
|
378 |
-
|
379 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i:i+2])
|
380 |
-
i+=1
|
381 |
-
buildString = ""
|
382 |
-
|
383 |
-
## if punctuation that was found is not a hyphen then just follow normal punctuation separation rules
|
384 |
-
elif(Kaiseki.current_sentence[i] != "-"):
|
385 |
-
|
386 |
-
if(i+1 < len(Kaiseki.current_sentence) and Kaiseki.current_sentence[i+1] == "'"):
|
387 |
-
buildString += "'"
|
388 |
-
|
389 |
-
if(buildString != ""):
|
390 |
-
Kaiseki.sentence_parts.append(buildString)
|
391 |
-
|
392 |
-
Kaiseki.sentence_punctuation.append(Kaiseki.current_sentence[i])
|
393 |
-
buildString = ""
|
394 |
-
|
395 |
-
## if it is just a singular hyphen, do not consider it punctuation as they are used in honorifics
|
396 |
-
else:
|
397 |
-
buildString += Kaiseki.current_sentence[i]
|
398 |
-
else:
|
399 |
-
buildString += Kaiseki.current_sentence[i]
|
400 |
-
|
401 |
-
i += 1
|
402 |
-
|
403 |
-
## if end of line, add none punctuation which means a period needs to be added later
|
404 |
-
if(buildString):
|
405 |
-
Kaiseki.sentence_parts.append(buildString)
|
406 |
-
Kaiseki.sentence_punctuation.append(None)
|
407 |
-
|
408 |
-
Logger.log_action("Fragmented Sentence Parts " + str(Kaiseki.sentence_parts))
|
409 |
-
Logger.log_action("Sentence Punctuation " + str(Kaiseki.sentence_punctuation))
|
410 |
-
Logger.log_action("Does Sentence Have Special Punctuation : " + str(Kaiseki.special_punctuation))
|
411 |
-
|
412 |
-
## strip the sentence parts
|
413 |
-
Kaiseki.sentence_parts = [part.strip() for part in Kaiseki.sentence_parts]
|
414 |
-
|
415 |
-
##-------------------start-of-translate_sentence()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
416 |
-
|
417 |
-
@staticmethod
|
418 |
-
def translate_sentence() -> None:
|
419 |
-
|
420 |
-
"""
|
421 |
-
|
422 |
-
This function translates each part of a sentence.
|
423 |
-
|
424 |
-
"""
|
425 |
-
|
426 |
-
i = 0
|
427 |
-
ii = 0
|
428 |
-
|
429 |
-
quote = ""
|
430 |
-
error = ""
|
431 |
-
|
432 |
-
tilde_active = False
|
433 |
-
single_quote_active = False
|
434 |
-
|
435 |
-
while(i < len(Kaiseki.sentence_parts)):
|
436 |
-
|
437 |
-
## if tilde is present in part, delete it and set tilde active to true, so we can add it back in a bit
|
438 |
-
if(Kaiseki.special_punctuation[1] == True and "~" in Kaiseki.sentence_parts[i]):
|
439 |
-
Kaiseki.sentence_parts[i] = Kaiseki.sentence_parts[i].replace("~","")
|
440 |
-
tilde_active = True
|
441 |
-
|
442 |
-
## a quote is present in the sentence, but not enclosing the sentence, we need to isolate it
|
443 |
-
if(Kaiseki.special_punctuation[2] == True and "'" in Kaiseki.sentence_parts[i] and (Kaiseki.sentence_parts[i][0] != "'" and Kaiseki.sentence_parts[i][-1] != "'")):
|
444 |
-
|
445 |
-
sentence = Kaiseki.sentence_parts[i]
|
446 |
-
substring_start = sentence.index("'")
|
447 |
-
substring_end = 0
|
448 |
-
quote = ""
|
449 |
-
|
450 |
-
ii = substring_start
|
451 |
-
while(ii < len(sentence)):
|
452 |
-
if(sentence[ii] == "'"):
|
453 |
-
substring_end = ii
|
454 |
-
ii+=1
|
455 |
-
|
456 |
-
quote = sentence[substring_start+1:substring_end]
|
457 |
-
Kaiseki.sentence_parts[i] = sentence[:substring_start+1] + "quote" + sentence[substring_end:]
|
458 |
-
|
459 |
-
single_quote_active = True
|
460 |
-
|
461 |
-
try:
|
462 |
-
results = EasyTL.deepl_translate(text=Kaiseki.sentence_parts[i], source_lang= "JA", target_lang="EN-US")
|
463 |
-
|
464 |
-
assert isinstance(results, str), "ValueError: " + str(results)
|
465 |
-
|
466 |
-
translated_part = results.rstrip(''.join(c for c in string.punctuation if c not in "'\""))
|
467 |
-
translated_part = translated_part.rstrip()
|
468 |
-
|
469 |
-
## here we re-add the tilde, (note not always accurate but mostly is)
|
470 |
-
if(tilde_active == True):
|
471 |
-
translated_part += "~"
|
472 |
-
tilde_active = False
|
473 |
-
|
474 |
-
## translates the quote and re-adds it back to the sentence part
|
475 |
-
if(single_quote_active == True):
|
476 |
-
results = EasyTL.deepl_translate(text=Kaiseki.sentence_parts[i], source_lang= "JA", target_lang="EN-US")
|
477 |
-
|
478 |
-
assert isinstance(results, str), "ValueError: " + str(results)
|
479 |
-
|
480 |
-
quote = quote.rstrip(''.join(c for c in string.punctuation if c not in "'\""))
|
481 |
-
quote = quote.rstrip()
|
482 |
-
|
483 |
-
translated_part = translated_part.replace("'quote'","'" + quote + "'",1)
|
484 |
-
|
485 |
-
## if punctuation appears first and before any text, add the punctuation and remove it form the list.
|
486 |
-
if(len(Kaiseki.sentence_punctuation) > len(Kaiseki.sentence_parts)):
|
487 |
-
Kaiseki.translated_sentence += Kaiseki.sentence_punctuation[0]
|
488 |
-
Kaiseki.sentence_punctuation.pop(0)
|
489 |
-
|
490 |
-
if(Kaiseki.sentence_punctuation[i] != None):
|
491 |
-
Kaiseki.translated_sentence += translated_part + Kaiseki.sentence_punctuation[i]
|
492 |
-
else:
|
493 |
-
Kaiseki.translated_sentence += translated_part
|
494 |
-
|
495 |
-
if(i != len(Kaiseki.sentence_punctuation)-1):
|
496 |
-
Kaiseki.translated_sentence += " "
|
497 |
-
|
498 |
-
except QuotaExceededException as e:
|
499 |
-
|
500 |
-
Logger.log_action("DeepL API quota exceeded.", output=True)
|
501 |
-
|
502 |
-
Toolkit.pause_console()
|
503 |
-
|
504 |
-
raise e
|
505 |
-
|
506 |
-
except ValueError as e:
|
507 |
-
|
508 |
-
if(str(e) == "Text must not be empty."):
|
509 |
-
Kaiseki.translated_sentence += ""
|
510 |
-
else:
|
511 |
-
Kaiseki.translated_sentence += "ERROR"
|
512 |
-
error = str(e)
|
513 |
-
|
514 |
-
Logger.log_action("Error is : " + error)
|
515 |
-
Kaiseki.error_text.append("Error is : " + error)
|
516 |
-
|
517 |
-
i+=1
|
518 |
-
|
519 |
-
Kaiseki.translated_text.append(Kaiseki.translated_sentence)
|
520 |
-
Kaiseki.translated_sentence = ""
|
521 |
-
|
522 |
-
##-------------------start-of-assemble_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
523 |
-
|
524 |
-
@staticmethod
|
525 |
-
def assemble_results(time_start:float, time_end:float) -> None:
|
526 |
-
|
527 |
-
"""
|
528 |
-
|
529 |
-
Prepares the results of the translation for printing.
|
530 |
-
|
531 |
-
Parameters:
|
532 |
-
time_start (float) : the time the translation started.
|
533 |
-
time_end (float) : the time the translation ended.
|
534 |
-
|
535 |
-
"""
|
536 |
-
|
537 |
-
Kaiseki.translation_print_result += "Time Elapsed : " + Toolkit.get_elapsed_time(time_start, time_end)
|
538 |
-
|
539 |
-
Kaiseki.translation_print_result += "\n\nDebug text have been written to : " + FileEnsurer.debug_log_path
|
540 |
-
Kaiseki.translation_print_result += "\nJ->E text have been written to : " + FileEnsurer.je_check_path
|
541 |
-
Kaiseki.translation_print_result += "\nTranslated text has been written to : " + FileEnsurer.translated_text_path
|
542 |
-
Kaiseki.translation_print_result += "\nErrors have been written to : " + FileEnsurer.error_log_path + "\n"
|
543 |
-
|
544 |
-
##-------------------start-of-write_kaiseki_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
545 |
-
|
546 |
-
@staticmethod
|
547 |
-
@permission_error_decorator()
|
548 |
-
def write_kaiseki_results() -> None:
|
549 |
-
|
550 |
-
"""
|
551 |
-
|
552 |
-
This function is called to write the results of the Kaiseki translation module to the output directory.
|
553 |
-
|
554 |
-
"""
|
555 |
-
|
556 |
-
## ensures the output directory exists, cause it could get moved or fucked with.
|
557 |
-
FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
|
558 |
-
|
559 |
-
with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
|
560 |
-
file.writelines(Kaiseki.error_text)
|
561 |
-
|
562 |
-
with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
|
563 |
-
file.writelines(Kaiseki.je_check_text)
|
564 |
-
|
565 |
-
with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
|
566 |
-
file.writelines(Kaiseki.translated_text)
|
567 |
-
|
568 |
-
## Instructions to create a copy of the output for archival
|
569 |
-
FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
|
570 |
-
|
571 |
-
timestamp = Toolkit.get_timestamp(is_archival=True)
|
572 |
-
|
573 |
-
## pushes the tl debug log to the file without clearing the file
|
574 |
-
Logger.push_batch()
|
575 |
-
Logger.clear_batch()
|
576 |
-
|
577 |
-
list_of_result_tuples = [('kaiseki_translated_text', Kaiseki.translated_text),
|
578 |
-
('kaiseki_je_check_text', Kaiseki.je_check_text),
|
579 |
-
('kaiseki_error_log', Kaiseki.error_text),
|
580 |
-
('debug_log', FileEnsurer.standard_read_file(Logger.log_file_path))]
|
581 |
-
|
582 |
-
FileEnsurer.archive_results(list_of_result_tuples,
|
583 |
-
module='kaiseki', timestamp=timestamp)
|
modules/common/exceptions.py
CHANGED
@@ -1,12 +1,33 @@
|
|
1 |
## third-party libraries
|
2 |
## for importing, other scripts will use from common.exceptions instead of from the third-party libraries themselves
|
3 |
-
from easytl import
|
4 |
-
from easytl import
|
5 |
-
from easytl import GoogleAuthError
|
|
|
|
|
|
|
|
|
|
|
|
6 |
|
7 |
##-------------------start-of-MaxBatchDurationExceededException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
8 |
|
9 |
-
class MaxBatchDurationExceededException(
|
10 |
|
11 |
"""
|
12 |
|
@@ -27,7 +48,7 @@ class MaxBatchDurationExceededException(Exception):
|
|
27 |
|
28 |
##-------------------start-of-InvalidAPIKeyException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
29 |
|
30 |
-
class InvalidAPIKeyException(
|
31 |
|
32 |
"""
|
33 |
|
@@ -48,7 +69,7 @@ class InvalidAPIKeyException(Exception):
|
|
48 |
|
49 |
##-------------------start-of-TooManyFileAccessAttemptsException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
50 |
|
51 |
-
class TooManyFileAccessAttemptsException(
|
52 |
|
53 |
"""
|
54 |
|
|
|
1 |
## third-party libraries
|
2 |
## for importing, other scripts will use from common.exceptions instead of from the third-party libraries themselves
|
3 |
+
from easytl import OpenAIAuthenticationError, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, OpenAIAPIConnectionError, OpenAIAPIStatusError
|
4 |
+
from easytl import DeepLAuthorizationException, DeepLQuotaExceededException, DeepLException
|
5 |
+
from easytl import GoogleAuthError, GoogleAPIError
|
6 |
+
|
7 |
+
##-------------------start-of-KudasaiException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
8 |
+
|
9 |
+
class KudasaiException(Exception):
|
10 |
+
|
11 |
+
"""
|
12 |
+
|
13 |
+
KudasaiException is an exception that is raised when an error occurs within the Kudasai Application.
|
14 |
+
|
15 |
+
"""
|
16 |
+
|
17 |
+
def __init__(self, message:str) -> None:
|
18 |
+
|
19 |
+
"""
|
20 |
+
|
21 |
+
Parameters:
|
22 |
+
message (string) : The message to display.
|
23 |
+
|
24 |
+
"""
|
25 |
+
|
26 |
+
self.message = message
|
27 |
|
28 |
##-------------------start-of-MaxBatchDurationExceededException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
29 |
|
30 |
+
class MaxBatchDurationExceededException(KudasaiException):
|
31 |
|
32 |
"""
|
33 |
|
|
|
48 |
|
49 |
##-------------------start-of-InvalidAPIKeyException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
50 |
|
51 |
+
class InvalidAPIKeyException(KudasaiException):
|
52 |
|
53 |
"""
|
54 |
|
|
|
69 |
|
70 |
##-------------------start-of-TooManyFileAccessAttemptsException--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
71 |
|
72 |
+
class TooManyFileAccessAttemptsException(KudasaiException):
|
73 |
|
74 |
"""
|
75 |
|
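Since the Kudasai-specific exceptions above now share the KudasaiException base class, callers can catch them uniformly. A minimal sketch (the run_translation() function is hypothetical and just stands in for whatever code raises these exceptions; it assumes the subclasses populate message via the base class):

```python
from modules.common.exceptions import KudasaiException, InvalidAPIKeyException

def run_translation() -> None:
    ## hypothetical stand-in for the code that actually performs a translation run
    raise KudasaiException("something went wrong mid-run")

try:
    run_translation()
except InvalidAPIKeyException as e:
    ## a specific subclass can still be handled on its own first
    print("Bad API key: " + e.message)
except KudasaiException as e:
    ## everything else Kudasai-specific is caught by the shared base class
    print("Kudasai error: " + e.message)
```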
modules/common/file_ensurer.py
CHANGED
@@ -4,10 +4,10 @@ import traceback
|
|
4 |
import json
|
5 |
import typing
|
6 |
import shutil
|
|
|
7 |
|
8 |
## custom modules
|
9 |
from modules.common.decorators import permission_error_decorator
|
10 |
-
from modules.common.logger import Logger
|
11 |
from modules.common.toolkit import Toolkit
|
12 |
|
13 |
class FileEnsurer():
|
@@ -16,7 +16,7 @@ class FileEnsurer():
|
|
16 |
|
17 |
FileEnsurer is a class that is used to ensure that the required files and directories exist.
|
18 |
Also serves as a place to store the paths to the files and directories. Some file related functions are also stored here.
|
19 |
-
As well as some variables that are used to store the default
|
20 |
|
21 |
"""
|
22 |
|
@@ -28,37 +28,36 @@ class FileEnsurer():
|
|
28 |
hugging_face_flag = os.path.join(script_dir, "util", "hugging_face_flag.py")
|
29 |
|
30 |
## main dirs (config is just under userprofile on windows, and under home on linux); secrets are under appdata on windows, and under .config on linux
|
31 |
-
if(
|
32 |
config_dir = os.path.join(os.environ['USERPROFILE'],"KudasaiConfig")
|
33 |
secrets_dir = os.path.join(os.environ['APPDATA'],"KudasaiSecrets")
|
34 |
else: ## Linux
|
35 |
config_dir = os.path.join(os.path.expanduser("~"), "KudasaiConfig")
|
36 |
secrets_dir = os.path.join(os.path.expanduser("~"), ".config", "KudasaiSecrets")
|
37 |
|
38 |
-
Logger.log_file_path = os.path.join(output_dir, "debug_log.txt")
|
39 |
-
|
40 |
##----------------------------------/
|
41 |
|
42 |
## sub dirs
|
43 |
lib_dir = os.path.join(script_dir, "lib")
|
|
|
44 |
gui_lib = os.path.join(lib_dir, "gui")
|
45 |
jsons_dir = os.path.join(script_dir, "jsons")
|
46 |
|
47 |
##----------------------------------/
|
48 |
|
49 |
## output files
|
50 |
-
preprocessed_text_path = os.path.join(output_dir, "preprocessed_text.txt")
|
51 |
-
translated_text_path = os.path.join(output_dir, "translated_text.txt")
|
52 |
|
53 |
-
je_check_path = os.path.join(output_dir, "je_check_text.txt")
|
54 |
|
55 |
-
kairyou_log_path = os.path.join(output_dir, "preprocessing_results.txt")
|
56 |
-
error_log_path = os.path.join(output_dir, "error_log.txt")
|
57 |
-
debug_log_path =
|
58 |
|
59 |
-
##
|
60 |
-
|
61 |
-
|
62 |
|
63 |
## api keys
|
64 |
deepl_api_key_path = os.path.join(secrets_dir, "deepl_api_key.txt")
|
@@ -68,8 +67,14 @@ class FileEnsurer():
|
|
68 |
## favicon
|
69 |
favicon_path = os.path.join(gui_lib, "Kudasai_Logo.png")
|
70 |
|
71 |
-
|
72 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
73 |
"prompt_assembly_mode": 1,
|
74 |
"number_of_lines_per_batch": 36,
|
75 |
"sentence_fragmenter_mode": 2,
|
@@ -103,9 +108,16 @@ class FileEnsurer():
|
|
103 |
"gemini_stream": False,
|
104 |
"gemini_stop_sequences": None,
|
105 |
"gemini_max_output_tokens": None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
106 |
}
|
107 |
}
|
108 |
-
|
109 |
"INVALID JSON":
|
110 |
{
|
111 |
"INVALID JSON":"INVALID JSON"
|
@@ -127,6 +139,9 @@ class FileEnsurer():
|
|
127 |
|
128 |
Determines if Kudasai is running on a Hugging Face server.
|
129 |
|
|
|
|
|
|
|
130 |
"""
|
131 |
|
132 |
return os.path.exists(FileEnsurer.hugging_face_flag)
|
@@ -144,8 +159,6 @@ class FileEnsurer():
|
|
144 |
|
145 |
print("Cleaning up and exiting...")
|
146 |
|
147 |
-
Logger.push_batch()
|
148 |
-
|
149 |
exit()
|
150 |
|
151 |
##-------------------start-of-setup_needed_files()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
@@ -158,6 +171,9 @@ class FileEnsurer():
|
|
158 |
|
159 |
Ensures that the required files and directories exist.
|
160 |
|
|
|
|
|
|
|
161 |
"""
|
162 |
|
163 |
|
@@ -168,9 +184,6 @@ class FileEnsurer():
|
|
168 |
FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
|
169 |
FileEnsurer.standard_create_directory(FileEnsurer.secrets_dir)
|
170 |
|
171 |
-
## creates and clears the log file
|
172 |
-
Logger.clear_log_file()
|
173 |
-
|
174 |
## creates the 5 output files
|
175 |
FileEnsurer.standard_create_file(FileEnsurer.preprocessed_text_path)
|
176 |
FileEnsurer.standard_create_file(FileEnsurer.translated_text_path)
|
@@ -179,9 +192,9 @@ class FileEnsurer():
|
|
179 |
FileEnsurer.standard_create_file(FileEnsurer.error_log_path)
|
180 |
|
181 |
## creates the kijiku rules file if it doesn't exist
|
182 |
-
if(os.path.exists(FileEnsurer.
|
183 |
-
with open(FileEnsurer.
|
184 |
-
json.dump(FileEnsurer.
|
185 |
|
186 |
##-------------------start-of-purge_storage()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
187 |
|
@@ -193,10 +206,16 @@ class FileEnsurer():
|
|
193 |
|
194 |
In case of hugging face, purges the storage.
|
195 |
|
|
|
|
|
|
|
196 |
"""
|
197 |
|
198 |
if(not FileEnsurer.is_hugging_space()):
|
|
|
199 |
return
|
|
|
|
|
200 |
|
201 |
stuff_to_purge = [
|
202 |
FileEnsurer.secrets_dir,
|
@@ -243,11 +262,14 @@ class FileEnsurer():
|
|
243 |
Parameters:
|
244 |
directory_path (str) : path to the directory to be created.
|
245 |
|
|
|
|
|
|
|
246 |
"""
|
247 |
|
248 |
if(os.path.isdir(directory_path) == False):
|
249 |
os.makedirs(directory_path)
|
250 |
-
|
251 |
|
252 |
##--------------------start-of-standard_create_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
253 |
|
@@ -262,10 +284,13 @@ class FileEnsurer():
|
|
262 |
Parameters:
|
263 |
file_path (str) : path to the file to be created.
|
264 |
|
|
|
|
|
|
|
265 |
"""
|
266 |
|
267 |
if(os.path.exists(file_path) == False):
|
268 |
-
|
269 |
with open(file_path, "w+", encoding="utf-8") as file:
|
270 |
file.truncate()
|
271 |
|
@@ -286,12 +311,15 @@ class FileEnsurer():
|
|
286 |
Returns:
|
287 |
bool : whether or not the file was overwritten.
|
288 |
|
|
|
|
|
|
|
289 |
"""
|
290 |
|
291 |
did_overwrite = False
|
292 |
|
293 |
if(os.path.exists(file_path) == False or os.path.getsize(file_path) == 0):
|
294 |
-
|
295 |
with open(file_path, "w+", encoding="utf-8") as file:
|
296 |
file.write(content_to_write)
|
297 |
|
@@ -312,7 +340,10 @@ class FileEnsurer():
|
|
312 |
Parameters:
|
313 |
file_path (str) : path to the file to be overwritten.
|
314 |
content to write (str) : content to be written to the file.
|
315 |
-
omit (bool | optional) : whether or not to omit the content from the log.
|
|
|
|
|
|
|
316 |
|
317 |
"""
|
318 |
|
@@ -322,7 +353,7 @@ class FileEnsurer():
|
|
322 |
if(omit):
|
323 |
content_to_write = "(Content was omitted)"
|
324 |
|
325 |
-
|
326 |
|
327 |
##--------------------start-of-clear_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
328 |
|
@@ -337,12 +368,15 @@ class FileEnsurer():
|
|
337 |
Parameters:
|
338 |
file_path (str) : path to the file to be cleared.
|
339 |
|
|
|
|
|
|
|
340 |
"""
|
341 |
|
342 |
with open(file_path, "w+", encoding="utf-8") as file:
|
343 |
file.truncate()
|
344 |
|
345 |
-
|
346 |
|
347 |
##--------------------start-of-standard_read_file()------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
348 |
|
@@ -360,6 +394,9 @@ class FileEnsurer():
|
|
360 |
Returns:
|
361 |
content (str) : the content of the file.
|
362 |
|
|
|
|
|
|
|
363 |
"""
|
364 |
|
365 |
with open(file_path, "r", encoding="utf-8") as file:
|
@@ -379,17 +416,16 @@ class FileEnsurer():
|
|
379 |
Parameters:
|
380 |
critical_exception (object - Exception) : the exception to be handled.
|
381 |
|
382 |
-
|
|
|
383 |
|
384 |
-
|
385 |
-
Logger.log_action("Kudasai has crashed", output=True)
|
386 |
-
Logger.log_action("Please send the following to the developer on github at https://github.com/Bikatr7/Kudasai/issues :", output=True, omit_timestamp=True)
|
387 |
|
388 |
traceback_msg = traceback.format_exc()
|
389 |
|
390 |
-
|
391 |
-
|
392 |
-
|
393 |
|
394 |
Toolkit.pause_console()
|
395 |
|
@@ -411,6 +447,9 @@ class FileEnsurer():
|
|
411 |
module (str) : name of the module that generated the results.
|
412 |
timestamp (str) : timestamp of when the results were generated.
|
413 |
|
|
|
|
|
|
|
414 |
"""
|
415 |
|
416 |
archival_path = os.path.join(FileEnsurer.archive_dir, f'{module}_run_{timestamp}')
|
@@ -439,6 +478,9 @@ class FileEnsurer():
|
|
439 |
Returns:
|
440 |
json_object (dict) : the json object.
|
441 |
|
|
|
|
|
|
|
442 |
"""
|
443 |
|
444 |
with open(file_path, "r", encoding="utf-8") as file:
|
@@ -462,6 +504,9 @@ class FileEnsurer():
|
|
462 |
error_log (str) : the log of any errors that occurred during preprocessing.
|
463 |
timestamp (str) : the timestamp of when the results were generated (Can be obtained from Toolkit.get_timestamp(is_archival=True))
|
464 |
|
|
|
|
|
|
|
465 |
"""
|
466 |
|
467 |
## ensures the output directory exists, cause it could get moved or fucked with.
|
@@ -485,14 +530,12 @@ class FileEnsurer():
|
|
485 |
## Instructions to create a copy of the output for archival
|
486 |
FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
|
487 |
|
488 |
-
Logger.push_batch()
|
489 |
-
Logger.clear_batch()
|
490 |
-
|
491 |
list_of_result_tuples = [('kairyou_preprocessed_text', text_to_preprocess),
|
492 |
-
|
493 |
-
|
494 |
-
|
495 |
|
496 |
FileEnsurer.archive_results(list_of_result_tuples,
|
497 |
-
|
|
|
498 |
|
|
|
 import json
 import typing
 import shutil
+import logging

 ## custom modules
 from modules.common.decorators import permission_error_decorator
 from modules.common.toolkit import Toolkit

 class FileEnsurer():

    FileEnsurer is a class that is used to ensure that the required files and directories exist.
    Also serves as a place to store the paths to the files and directories. Some file related functions are also stored here.
+   As well as some variables that are used to store the default translation settings and the allowed models across Kudasai.

    """

    hugging_face_flag = os.path.join(script_dir, "util", "hugging_face_flag.py")

    ## main dirs (config is just under userprofile on windows, and under home on linux); secrets are under appdata on windows, and under .config on linux
+   if(Toolkit.is_windows()): ## Windows
        config_dir = os.path.join(os.environ['USERPROFILE'],"KudasaiConfig")
        secrets_dir = os.path.join(os.environ['APPDATA'],"KudasaiSecrets")
    else: ## Linux
        config_dir = os.path.join(os.path.expanduser("~"), "KudasaiConfig")
        secrets_dir = os.path.join(os.path.expanduser("~"), ".config", "KudasaiSecrets")

    ##----------------------------------/

    ## sub dirs
    lib_dir = os.path.join(script_dir, "lib")
+   common_lib = os.path.join(lib_dir, "common")
    gui_lib = os.path.join(lib_dir, "gui")
    jsons_dir = os.path.join(script_dir, "jsons")

    ##----------------------------------/

    ## output files
+   preprocessed_text_path = os.path.join(output_dir, "preprocessed_text.txt")
+   translated_text_path = os.path.join(output_dir, "translated_text.txt")

+   je_check_path = os.path.join(output_dir, "je_check_text.txt")

+   kairyou_log_path = os.path.join(output_dir, "preprocessing_results.txt")
+   error_log_path = os.path.join(output_dir, "error_log.txt")
+   debug_log_path = os.path.join(output_dir, "debug_log.txt")

+   ## translation settings
+   external_translation_settings_path = os.path.join(script_dir,'translation_settings.json')
+   config_translation_settings_path = os.path.join(config_dir,'translation_settings.json')

    ## api keys
    deepl_api_key_path = os.path.join(secrets_dir, "deepl_api_key.txt")

    ## favicon
    favicon_path = os.path.join(gui_lib, "Kudasai_Logo.png")

+   ## js save to file
+   js_save_to_file_path = os.path.join(gui_lib, "save_to_file.js")

+   ## translation settings description
+   translation_settings_description_path = os.path.join(common_lib, "translation_settings_description.txt")

+   DEFAULT_TRANSLATION_SETTING = {
+       "base translation settings": {
            "prompt_assembly_mode": 1,
            "number_of_lines_per_batch": 36,
            "sentence_fragmenter_mode": 2,

            "gemini_stream": False,
            "gemini_stop_sequences": None,
            "gemini_max_output_tokens": None
+       },

+       "deepl settings":{
+           "deepl_context": "",
+           "deepl_split_sentences": "ALL",
+           "deepl_preserve_formatting": True,
+           "deepl_formality": "default"
        }
    }

+   INVALID_TRANSLATION_SETTINGS_PLACEHOLDER = {
        "INVALID JSON":
        {
            "INVALID JSON":"INVALID JSON"

    Determines if Kudasai is running on a Hugging Face server.

+   Returns:
+       bool : whether or not Kudasai is running on a Hugging Face server.

    """

    return os.path.exists(FileEnsurer.hugging_face_flag)

    print("Cleaning up and exiting...")

    exit()

    ##-------------------start-of-setup_needed_files()------------------------------------------------------------------

    Ensures that the required files and directories exist.

+   Decorated By:
+       permission_error_decorator

    """

    FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
    FileEnsurer.standard_create_directory(FileEnsurer.secrets_dir)

    ## creates the 5 output files
    FileEnsurer.standard_create_file(FileEnsurer.preprocessed_text_path)
    FileEnsurer.standard_create_file(FileEnsurer.translated_text_path)

    FileEnsurer.standard_create_file(FileEnsurer.error_log_path)

    ## creates the kijiku rules file if it doesn't exist
+   if(os.path.exists(FileEnsurer.config_translation_settings_path) == False):
+       with open(FileEnsurer.config_translation_settings_path, 'w+', encoding='utf-8') as file:
+           json.dump(FileEnsurer.DEFAULT_TRANSLATION_SETTING, file)

    ##-------------------start-of-purge_storage()------------------------------------------------------------------

    In case of hugging face, purges the storage.

+   Decorated By:
+       permission_error_decorator

    """

    if(not FileEnsurer.is_hugging_space()):
+       logging.debug("Not running on Hugging Face, skipping storage purge")
        return

+   logging.debug("Running on Hugging Face, purging storage")

    stuff_to_purge = [
        FileEnsurer.secrets_dir,

    Parameters:
    directory_path (str) : path to the directory to be created.

+   Decorated By:
+       permission_error_decorator

    """

    if(os.path.isdir(directory_path) == False):
        os.makedirs(directory_path)
+       logging.debug(directory_path + " created due to lack of the folder")

    ##--------------------start-of-standard_create_file()------------------------------------------------------------------

    Parameters:
    file_path (str) : path to the file to be created.

+   Decorated By:
+       permission_error_decorator

    """

    if(os.path.exists(file_path) == False):
+       logging.debug(file_path + " was created due to lack of the file")
        with open(file_path, "w+", encoding="utf-8") as file:
            file.truncate()

    Returns:
    bool : whether or not the file was overwritten.

+   Decorated By:
+       permission_error_decorator

    """

    did_overwrite = False

    if(os.path.exists(file_path) == False or os.path.getsize(file_path) == 0):
+       logging.debug(file_path + " was created due to lack of the file or because it is blank")
        with open(file_path, "w+", encoding="utf-8") as file:
            file.write(content_to_write)

    Parameters:
    file_path (str) : path to the file to be overwritten.
    content to write (str) : content to be written to the file.
+   omit (bool | optional | default=True) : whether or not to omit the content from the log.

+   Decorated By:
+       permission_error_decorator

    """

    if(omit):
        content_to_write = "(Content was omitted)"

+   logging.debug(file_path + " was overwritten with the following content: " + content_to_write)

    ##--------------------start-of-clear_file()------------------------------------------------------------------

    Parameters:
    file_path (str) : path to the file to be cleared.

+   Decorated By:
+       permission_error_decorator

    """

    with open(file_path, "w+", encoding="utf-8") as file:
        file.truncate()

+   logging.debug(file_path + " was cleared")

    ##--------------------start-of-standard_read_file()------------------------------------------------------------------

    Returns:
    content (str) : the content of the file.

+   Decorated By:
+       permission_error_decorator

    """

    with open(file_path, "r", encoding="utf-8") as file:

    Parameters:
    critical_exception (object - Exception) : the exception to be handled.

+   Decorated By:
+       permission_error_decorator

+   """

    traceback_msg = traceback.format_exc()

+   logging.error(f"Kudasai has crashed "
+                 f"Please send the following to the developer on github at https://github.com/Bikatr7/Kudasai/issues :"
+                 f"{traceback_msg}")

    Toolkit.pause_console()

    module (str) : name of the module that generated the results.
    timestamp (str) : timestamp of when the results were generated.

+   Decorated By:
+       permission_error_decorator

    """

    archival_path = os.path.join(FileEnsurer.archive_dir, f'{module}_run_{timestamp}')

    Returns:
    json_object (dict) : the json object.

+   Decorated By:
+       permission_error_decorator

    """

    with open(file_path, "r", encoding="utf-8") as file:

    error_log (str) : the log of any errors that occurred during preprocessing.
    timestamp (str) : the timestamp of when the results were generated (Can be obtained from Toolkit.get_timestamp(is_archival=True))

+   Decorated By:
+       permission_error_decorator

    """

    ## ensures the output directory exists, cause it could get moved or fucked with.

    ## Instructions to create a copy of the output for archival
    FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)

    list_of_result_tuples = [('kairyou_preprocessed_text', text_to_preprocess),
+                            ('kairyou_preprocessing_log', preprocessing_log),
+                            ('kairyou_error_log', error_log),
+                            ('debug_log', FileEnsurer.standard_read_file(FileEnsurer.debug_log_path))]

    FileEnsurer.archive_results(list_of_result_tuples,
+                               module='kairyou',
+                               timestamp=timestamp)
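As a quick illustration of how the settings plumbing above fits together (this sketch is not part of the commit), a caller can read the config copy of translation_settings.json that setup_needed_files() writes and fall back to DEFAULT_TRANSLATION_SETTING when the file is missing or unparsable. The helper name below is hypothetical.

```python
## Hypothetical helper, for illustration only: load the user's translation settings,
## falling back to the defaults baked into FileEnsurer when the file is absent or invalid.
import json
import os

from modules.common.file_ensurer import FileEnsurer

def load_translation_settings_or_default() -> dict:

    path = FileEnsurer.config_translation_settings_path

    if(not os.path.exists(path)):
        return FileEnsurer.DEFAULT_TRANSLATION_SETTING

    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)

    except json.JSONDecodeError:
        ## mirrors the INVALID_TRANSLATION_SETTINGS_PLACEHOLDER idea above
        return FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
```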
modules/common/logger.py  DELETED
@@ -1,132 +0,0 @@
-## custom modules
-from modules.common.toolkit import Toolkit
-from modules.common.decorators import permission_error_decorator
-
-class Logger:
-
-    """
-
-    The logger class is used to log actions taken by Kudasai.
-
-    """
-
-    log_file_path = ""
-
-    current_batch = ""
-
-    errors = []
-
-    ##--------------------start-of-log_action()------------------------------------------------------------------
-
-    @staticmethod
-    def log_action(action:str, output:bool=False, omit_timestamp:bool=False) -> None:
-
-        """
-
-        Logs an action.
-
-        Parameters:
-        action (str) : the action being logged.
-        output (bool | optional | defaults to false) : whether or not to output the action to the console.
-        omit_timestamp (bool | optional | defaults to false) : whether or not to omit the timestamp from the action.
-
-        """
-
-        timestamp = Toolkit.get_timestamp()
-
-        log_line = timestamp + action + "\n"
-
-        Logger.current_batch += log_line
-
-        if(omit_timestamp):
-            log_line = action
-
-        if(output):
-            print(log_line)
-
-    ##--------------------start-of-log_error()------------------------------------------------------------------
-
-    @staticmethod
-    def log_error(action:str, output:bool=False, omit_timestamp:bool=False) -> None:
-
-        """
-
-        Logs an error.
-
-        Parameters:
-        action (str) : the action being logged.
-        output (bool | optional | defaults to false) : whether or not to output the action to the console.
-        omit_timestamp (bool | optional | defaults to false) : whether or not to omit the timestamp from the action.
-
-        """
-
-        timestamp = Toolkit.get_timestamp()
-
-        log_line = timestamp + action + "\n"
-
-        Logger.current_batch += log_line
-
-        if(omit_timestamp):
-            log_line = action
-
-        if(output):
-            print(log_line)
-
-        Logger.errors.append(log_line)
-
-    ##--------------------start-of-log_barrier()------------------------------------------------------------------
-
-    @staticmethod
-    def log_barrier() -> None:
-
-        """
-
-        Logs a barrier.
-
-        """
-
-        Logger.log_action("-------------------------")
-
-    ##--------------------start-of-clear_batch()------------------------------------------------------------------
-
-    @staticmethod
-    def clear_batch() -> None:
-
-        """
-
-        Clears the current batch.
-
-        """
-
-        Logger.current_batch = ""
-        Logger.errors = []
-
-    ##--------------------start-of-push_batch()------------------------------------------------------------------
-
-    @staticmethod
-    @permission_error_decorator()
-    def push_batch() -> None:
-
-        """
-
-        Pushes all stored actions to the log file.
-
-        """
-
-        with open(Logger.log_file_path, 'a+', encoding="utf-8") as file:
-            file.write(Logger.current_batch)
-
-    ##--------------------start-of-clear_log_file()------------------------------------------------------------------
-
-    @staticmethod
-    @permission_error_decorator()
-    def clear_log_file() -> None:
-
-        """
-
-        Clears the log file.
-
-        """
-
-        with open(Logger.log_file_path, 'w+', encoding="utf-8") as file:
-            file.truncate(0)
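The custom Logger above is dropped in favor of Python's standard logging module (hence the new `import logging` lines throughout this commit). Below is a minimal sketch of equivalent wiring; the basicConfig call and its target file are assumptions for illustration, not code from the commit.

```python
## Assumed wiring, not part of the commit: everything the old Logger.log_action()/log_error()
## calls handled can be routed through the stdlib logging module instead.
import logging

from modules.common.file_ensurer import FileEnsurer

logging.basicConfig(filename=FileEnsurer.debug_log_path,  ## assumed log target
                    level=logging.DEBUG,
                    format="%(asctime)s [%(levelname)s] %(message)s",
                    encoding="utf-8")

logging.debug("replaces Logger.log_action(...)")
logging.error("replaces Logger.log_error(...)")
```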
modules/common/toolkit.py  CHANGED
@@ -3,6 +3,7 @@ from datetime import datetime
 import os
 import typing
+import logging
 import platform
 import subprocess

@@ -14,7 +15,7 @@ class Toolkit():
     """

-    CURRENT_VERSION = "v3.4.
+    CURRENT_VERSION = "v3.4.5"

     ##-------------------start-of-clear_console()------------------------------------------------------------------

@@ -29,6 +30,22 @@
     os.system('cls' if os.name == 'nt' else 'clear')

+    ##-------------------start-of-is_windows()------------------------------------------------------------------
+
+    @staticmethod
+    def is_windows() -> bool:
+
+        """
+
+        Returns True if Kudasai is running on Windows.
+
+        Returns:
+        is_windows (bool) : If Kudasai is running on Windows.
+
+        """
+
+        return os.name == 'nt'
+
     ##-------------------start-of-pause_console()------------------------------------------------------------------

     @staticmethod

@@ -74,7 +91,6 @@
             termios.tcsetattr(0, termios.TCSANOW, old_settings)

         except ImportError:
-
             pass

     ##-------------------start-of-maximize_window()------------------------------------------------------------------

@@ -164,12 +180,15 @@
     ##-------------------start-of-check_update()------------------------------------------------------------------

     @staticmethod
-    def check_update() -> typing.Tuple[bool, str]:
+    def check_update(do_pause:bool=True) -> typing.Tuple[bool, str]:

         """

         Determines if Kudasai has a new latest release, and confirms if an internet connection is present or not.

+        Parameters:
+        do_pause (bool | optional | default=True) : Whether or not to pause the console after displaying the update prompt.
+
         Returns:
         is_connection (bool) : Whether or not the user has an internet connection.
         update_prompt (str) : The update prompt to be displayed to the user, can either be blank if there is no update or contain the update prompt if there is an update.

@@ -193,6 +212,8 @@
         if(LooseVersion(latest_version) > LooseVersion(Toolkit.CURRENT_VERSION)):

+            logging.debug("New update available: " + latest_version)
+
             update_prompt += "There is a new update for Kudasai (" + latest_version + ")\nIt is recommended that you use the latest version of Kudasai\nYou can download it at https://github.com/Bikatr7/Kudasai/releases/latest \n"

             if(release_notes):

@@ -203,9 +224,12 @@
         ## used to determine if user lacks an internet connection.
         except:

+            logging.debug("No internet connection detected.")
+
             print("You seem to lack an internet connection, this will prevent you from checking from update notification and machine translation.\n")

+            if(do_pause):
+                Toolkit.pause_console()

             is_connection = False
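A short usage sketch (not from the commit itself) of the two Toolkit changes above, the new is_windows() helper and the do_pause parameter on check_update():

```python
## Illustrative only: exercising the new Toolkit helpers introduced above.
from modules.common.toolkit import Toolkit

if(Toolkit.is_windows()):
    print("Config will live under %USERPROFILE%\\KudasaiConfig")

## do_pause=False suits non-interactive callers such as the web GUI,
## since the console pause on a missing connection is skipped.
is_connection, update_prompt = Toolkit.check_update(do_pause=False)

if(update_prompt):
    print(update_prompt)
```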
models/kijiku.py → modules/common/translator.py  RENAMED
@@ -6,6 +6,7 @@ import time
 import typing
 import asyncio
 import os

 ## third party modules
 from kairyou import KatakanaUtil

@@ -17,19 +18,18 @@ import backoff
 from handlers.json_handler import JsonHandler

 from modules.common.file_ensurer import FileEnsurer
-from modules.common.logger import Logger
 from modules.common.toolkit import Toolkit
-from modules.common.exceptions import
 from modules.common.decorators import permission_error_decorator

-##-------------------start-of-
-class

     """

-    Currently supports OpenAI and

     """

@@ -48,6 +48,9 @@ class Kijiku:
     ## meanwhile for gemini, we just need to send the prompt and the text to be translated concatenated together
     gemini_translation_batches:typing.List[str] = []

     num_occurred_malformed_batches = 0

     ## semaphore to limit the number of concurrent batches

@@ -55,7 +58,7 @@
     ##--------------------------------------------------------------------------------------------------------------------------

     translation_print_result = ""

@@ -71,6 +74,10 @@
     decorator_to_use:typing.Callable

     ##-------------------start-of-get_max_batch_duration()------------------------------------------------------------------

     @staticmethod

@@ -79,14 +86,14 @@
         """

         Returns the max batch duration.
-        Structured as a function so that it can be used as a lambda function in the backoff decorator. As decorators call the function when they are defined/runtime, not when they are called.

         Returns:
         max_batch_duration (float) : the max batch duration.

         """

-        return

     ##-------------------start-of-log_retry()------------------------------------------------------------------

@@ -104,9 +111,7 @@
         retry_msg = f"Retrying translation after {details['wait']} seconds after {details['tries']} tries {details['target']} due to {details['exception']}."

-        Logger.log_action(retry_msg)
-        Logger.log_barrier()

     ##-------------------start-of-log_failure()------------------------------------------------------------------

@@ -120,13 +125,14 @@
         Parameters:
         details (dict) : the details of the failure.

         """

-        error_msg = f"Exceeded duration, returning untranslated text after {details['tries']} tries {details['target']}."

-        Logger.log_error(error_msg)
-        Logger.log_barrier()

         raise MaxBatchDurationExceededException(error_msg)

@@ -141,27 +147,26 @@
         """

-        Logger.clear_batch()

         ## set this here cause the try-except could throw before we get past the settings configuration
         time_start = time.time()

         try:

-            await

             JsonHandler.validate_json()

             ## set actual start time to the end of the settings configuration
             time_start = time.time()

-            await

         except Exception as e:

             FileEnsurer.handle_critical_exception(e)

@@ -169,7 +174,10 @@
         time_end = time.time()

     ##-------------------start-of-initialize()------------------------------------------------------------------

@@ -178,38 +186,47 @@
         """

-        Sets the API Key for the respective service and loads the

         """

-        if(
-        else:
-            Kijiku.LLM_TYPE = "gemini"

         else:

-        ## try to load the kijiku rules
-        try:

         except:
-            JsonHandler.
-            JsonHandler.load_kijiku_rules()

         Toolkit.clear_console()

     ##-------------------start-of-init_openai_api_key()------------------------------------------------------------------

@@ -241,9 +258,8 @@
         ## if not valid, raise the exception that caused the test to fail
         if(not is_valid and e is not None):
             raise e

-        Logger.log_barrier()

         time.sleep(2)

@@ -267,22 +283,22 @@
             FileEnsurer.standard_overwrite_file(api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)

         ## if invalid key exit
-        except (GoogleAuthError,

             Toolkit.clear_console()

             Toolkit.pause_console()

-            exit()

         ## other error, alert user and raise it
         except Exception as e:

             Toolkit.clear_console()

             Toolkit.pause_console()

@@ -300,17 +316,17 @@
         """

     ##-------------------start-of-check-settings()------------------------------------------------------------------

@@ -319,54 +335,48 @@
         """

-        Prompts the user to confirm the settings in the

         """

         print("Are these settings okay? (1 for yes or 2 for no) : \n\n")

         try:

-            JsonHandler.

         except:
             Toolkit.clear_console()

-            if(input("It's likely that you're using an outdated version of the
                 Toolkit.clear_console()
-                JsonHandler.
-                JsonHandler.

                 print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
-                JsonHandler.
             else:
                 FileEnsurer.exit_kudasai()

-        if(input("\n")
-        else:
-            JsonHandler.change_kijiku_settings()

         Toolkit.clear_console()

         print("Do you want to change your API key? (1 for yes or 2 for no) : ")

         if(input("\n") == "1"):
-            if(os.path.exists(FileEnsurer.openai_api_key_path)):
-                os.remove(FileEnsurer.openai_api_key_path)
-                await Kijiku.init_api_key("OpenAI", FileEnsurer.openai_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)
-            else:
-                if(os.path.exists(FileEnsurer.gemini_api_key_path)):
-                    os.remove(FileEnsurer.gemini_api_key_path)
-                await Kijiku.init_api_key("Gemini", FileEnsurer.gemini_api_key_path, EasyTL.set_api_key, EasyTL.test_api_key_validity)

         Toolkit.clear_console()

@@ -383,106 +393,106 @@
         is_webgui (bool | optional | default=False) : A bool representing whether the function is being called by the webgui.

         """

-        Kijiku.prompt_assembly_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["prompt_assembly_mode"])
-        Kijiku.number_of_lines_per_batch = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_lines_per_batch"])
-        Kijiku.sentence_fragmenter_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["sentence_fragmenter_mode"])
-        Kijiku.je_check_mode = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["je_check_mode"])
-        Kijiku.num_of_malform_retries = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_malformed_batch_retries"])
-        Kijiku.max_batch_duration = float(JsonHandler.current_kijiku_rules["base kijiku settings"]["batch_retry_timeout"])
-        Kijiku.num_concurrent_batches = int(JsonHandler.current_kijiku_rules["base kijiku settings"]["number_of_concurrent_batches"])

-        Kijiku._semaphore = asyncio.Semaphore(Kijiku.num_concurrent_batches)

-        Kijiku.openai_model = JsonHandler.current_kijiku_rules["openai settings"]["openai_model"]
-        Kijiku.openai_system_message = JsonHandler.current_kijiku_rules["openai settings"]["openai_system_message"]
-        Kijiku.openai_temperature = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_temperature"])
-        Kijiku.openai_top_p = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_top_p"])
-        Kijiku.openai_n = int(JsonHandler.current_kijiku_rules["openai settings"]["openai_n"])
-        Kijiku.openai_stream = bool(JsonHandler.current_kijiku_rules["openai settings"]["openai_stream"])
-        Kijiku.openai_stop = JsonHandler.current_kijiku_rules["openai settings"]["openai_stop"]
-        Kijiku.openai_logit_bias = JsonHandler.current_kijiku_rules["openai settings"]["openai_logit_bias"]
-        Kijiku.openai_max_tokens = JsonHandler.current_kijiku_rules["openai settings"]["openai_max_tokens"]
-        Kijiku.openai_presence_penalty = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_presence_penalty"])
-        Kijiku.openai_frequency_penalty = float(JsonHandler.current_kijiku_rules["openai settings"]["openai_frequency_penalty"])

-        Kijiku.gemini_model = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_model"]
-        Kijiku.gemini_prompt = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_prompt"]
-        Kijiku.gemini_temperature = float(JsonHandler.current_kijiku_rules["gemini settings"]["gemini_temperature"])
-        Kijiku.gemini_top_p = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_top_p"]
-        Kijiku.gemini_top_k = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_top_k"]
-        Kijiku.gemini_candidate_count = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_candidate_count"]
-        Kijiku.gemini_stream = bool(JsonHandler.current_kijiku_rules["gemini settings"]["gemini_stream"])
-        Kijiku.gemini_stop_sequences = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_stop_sequences"]
-        Kijiku.gemini_max_output_tokens = JsonHandler.current_kijiku_rules["gemini settings"]["gemini_max_output_tokens"]

-        if(Kijiku.LLM_TYPE == "openai"):
-            Kijiku.decorator_to_use = backoff.on_exception(backoff.expo, max_time=lambda: Kijiku.get_max_batch_duration(), exception=(AuthenticationError, InternalServerError, RateLimitError, APITimeoutError), on_backoff=lambda details: Kijiku.log_retry(details), on_giveup=lambda details: Kijiku.log_failure(details), raise_on_giveup=False)
-        else:
-            Kijiku.decorator_to_use = backoff.on_exception(backoff.expo, max_time=lambda: Kijiku.get_max_batch_duration(), exception=(Exception), on_backoff=lambda details: Kijiku.log_retry(details), on_giveup=lambda details: Kijiku.log_failure(details), raise_on_giveup=False)

         Toolkit.clear_console()

-        Logger.log_action("Starting Prompt Building")
-        Logger.log_barrier()

-        Kijiku.build_translation_batches()

-        await

         Toolkit.clear_console()

-        Logger.log_action("Starting Translation...", output=not is_webgui)
-        Logger.log_barrier()

         ## requests to run asynchronously
-        async_requests =

         ## Use asyncio.gather to run tasks concurrently/asynchronously and wait for all of them to complete
         results = await asyncio.gather(*async_requests)

-        Logger.log_action("Translation Complete!", output=not is_webgui)
-        Logger.log_barrier()
-        Logger.log_action("Starting Redistribution...", output=not is_webgui)
-        Logger.log_barrier()

         ## Sort results based on the index to maintain order
         sorted_results = sorted(results, key=lambda x: x[0])

         ## Redistribute the sorted results
-        for

         ## try to pair the text for j-e checking if the mode is 2
-        if(

         Toolkit.clear_console()

-        Logger.log_barrier()

-        ## assemble error text based of the error list
-        Kijiku.error_text = Logger.errors

     ##-------------------start-of-build_async_requests()------------------------------------------------------------------

@@ -502,19 +512,34 @@
         """

         async_requests = []

-        assert isinstance(prompt, ModelTranslationMessage) or isinstance(prompt, str)

         return async_requests

     ##-------------------start-of-generate_text_to_translate_batches()------------------------------------------------------------------

@@ -537,41 +562,36 @@
         prompt = []
         non_word_pattern = re.compile(r'^[\W_\s\n-]+$')

-        sentence = Kijiku.text_to_translate[index]
-        stripped_sentence = sentence.strip()
         lowercase_sentence = sentence.lower()

-        has_quotes = any(char in sentence for char in
         is_part_in_sentence = "part" in lowercase_sentence

             prompt.append(f'{sentence}\n')

-        elif(stripped_sentence == ''):
-            Logger.log_action(f"Sentence : {sentence} is empty... skipping.")

-        elif(
-            Logger.log_action(f"Sentence : {sentence}, Sentence is part marker... adding to prompt.")

-        elif(
-            Logger.log_action(f"Sentence : {sentence}, Sentence is punctuation... skipping.")

-        else:
             prompt.append(f'{sentence}\n')

         else:
             return prompt, index

         index += 1

         return prompt, index

     ##-------------------start-of-build_translation_batches()------------------------------------------------------------------

@@ -587,42 +607,53 @@
         i = 0

-        while i < len(

-            batch, i =
             batch = ''.join(batch)

-            if(

-                if(
-                    system_msg = SystemTranslationMessage(content=str(
                 else:
-                    system_msg = SystemTranslationMessage(content=str(

                 model_msg = ModelTranslationMessage(content=batch)

             else:
-                Kijiku.gemini_translation_batches.append(batch)

         i = 0

         i+=1

-        message = str(message) if

-        Logger.log_barrier()

     ##-------------------start-of-handle_cost_estimate_prompt()------------------------------------------------------------------

@@ -642,39 +673,43 @@
         """

         ## get cost estimate and confirm
-        num_tokens, min_cost, model = EasyTL.calculate_cost(text=

         print("Note that the cost estimate is not always accurate, and may be higher than the actual cost. However cost calculation now includes output tokens.\n")

-        Logger.log_barrier()

-        if(Kijiku.LLM_TYPE == "gemini"):
-            Logger.log_action(f"As of Kudasai {Toolkit.CURRENT_VERSION}, Gemini Pro 1.0 is free to use under 15 requests per minute, Gemini Pro 1.5 is free to use under 2 requests per minute. Requests correspond to number_of_current_batches in kijiku_settings.", output=True, omit_timestamp=True)

-        Logger.log_barrier()

         if(not omit_prompt):
             if(input("\nContinue? (1 for yes or 2 for no) : ") == "1"):

             else:

-                exit()

         return model

     ##-------------------start-of-handle_translation()------------------------------------------------------------------

     @staticmethod
-    async def handle_translation(model:str,

         """

@@ -682,20 +717,20 @@
         Parameters:
         model (string) : The model of the service used to translate the text.

         Returns:

         """

         ## Basically limits the number of concurrent batches
-        async with
             num_tries = 0

             while True:

@@ -704,72 +739,93 @@
                 if(FileEnsurer.do_interrupt == True):
                     raise Exception("Interrupted by user.")

                 try:

                 ## will only occur if the max_batch_duration is exceeded, so we just return the untranslated text
                 except MaxBatchDurationExceededException:

                     break

                 ## do not even bother if not a gpt 4 model, because gpt-3 seems unable to format properly
                 ## since gemini is free, we can just try again if it's malformed
                     break

-                if(await
-                    Logger.log_action(f"Translation for batch {message_number} of {length//2} successful!", output=True)
                     break

-                if(num_tries >=
                     break

                 else:
                     num_tries += 1

-            if(isinstance(

             if(isinstance(translated_message, typing.List)):
-                translated_message = ''.join(translated_message)

-            return

     ##-------------------start-of-check_if_translation_is_good()------------------------------------------------------------------

     @staticmethod
-    async def check_if_translation_is_good(translated_message:typing.Union[typing.List[str], str],

         """

@@ -777,66 +833,61 @@
         Parameters:
         translated_message (str) : the translated message.

         Returns:
         is_valid (bool) : whether or not the translation is valid.

         """

-        if(not isinstance(
-            prompt =

         else:
-            prompt =

         if(isinstance(translated_message, list)):
             translated_message = ''.join(translated_message)

-        is_valid = False

         jap = [line for line in prompt.split('\n') if line.strip()] ## Remove blank lines
         eng = [line for line in translated_message.split('\n') if line.strip()] ## Remove blank lines

-        if(len(jap) == len(eng)):
-            is_valid = True

-        return

     ##-------------------start-of-redistribute()------------------------------------------------------------------

     @staticmethod
-    def redistribute(

         """

         Puts translated text back into the text file.

         Parameters:
         translated_message (str) : the translated message.

         """

-        if(not isinstance(
-            prompt =

         else:
-            prompt =

         ## Separates with hyphens if the mode is 1
-        if(

         ## Mode two tries to pair the text for j-e checking, see fix_je() for more details
-        elif(

         ## mode 1 is the default mode, uses regex and other nonsense to split sentences
-        if(

             sentences = re.findall(r"(.*?(?:(?:\"|\'|-|~|!|\?|%|\(|\)|\.\.\.|\.|---|\[|\])))(?:\s|$)", translated_message)

@@ -859,17 +910,17 @@
                     build_string += f" {sentence}"
                     continue

-            for i in range(len(
-                if
-                    index = patched_sentences.index(

         ## mode 2 just assumes the LLM formatted it properly
-        elif(

     ##-------------------start-of-fix_je()------------------------------------------------------------------

@@ -890,36 +941,30 @@
         i = 1
         final_list = []

-        while
-            jap =
-            eng =

-            jap = [line for line in jap if
-            eng = [line for line in eng if

             final_list.append("-------------------------\n")

             if(len(jap) == len(eng)):

-                if(jap_line and eng_line): ## check if jap_line and eng_line aren't blank
                     final_list.append(jap_line + '\n\n')
                     final_list.append(eng_line + '\n\n')

                 final_list.append("--------------------------------------------------\n")

             else:

-                final_list.append(
-                final_list.append(Kijiku.je_check_text[i] + '\n\n')

                 final_list.append("--------------------------------------------------\n")

-            i+=2

         return final_list

     ##-------------------start-of-assemble_results()------------------------------------------------------------------

     @staticmethod

@@ -927,7 +972,7 @@
         """

-        Generates the

         Parameters:
         time_start (float) : When the translation started.

@@ -937,24 +982,24 @@
         result = (
             f"Time Elapsed : {Toolkit.get_elapsed_time(time_start, time_end)}\n"
-            f"Number of malformed batches : {
             f"Debug text have been written to : {FileEnsurer.debug_log_path}\n"
             f"J->E text have been written to : {FileEnsurer.je_check_path}\n"
             f"Translated text has been written to : {FileEnsurer.translated_text_path}\n"
             f"Errors have been written to : {FileEnsurer.error_log_path}\n"
         )

-    ##-------------------start-of-

     @staticmethod
     @permission_error_decorator()
-    def

         """

-        This function is called to write the results of the

         """

@@ -962,27 +1007,23 @@
         FileEnsurer.standard_create_directory(FileEnsurer.output_dir)

         with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
-            file.writelines(

         with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
-            file.writelines(

         with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
-            file.writelines(

         ## Instructions to create a copy of the output for archival
         FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)

         timestamp = Toolkit.get_timestamp(is_archival=True)

-        list_of_result_tuples = [('kijiku_translated_text', Kijiku.translated_text),
-                                 ('kijiku_je_check_text', Kijiku.je_check_text),
-                                 ('kijiku_error_log', Kijiku.error_text),
-                                 ('debug_log', FileEnsurer.standard_read_file(Logger.log_file_path))]

         FileEnsurer.archive_results(list_of_result_tuples,
-            module='
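The removed Kijiku code above validates each batch by comparing line counts between the Japanese prompt and the returned translation (see check_if_translation_is_good()). The same idea as a standalone sketch, with a hypothetical function name:

```python
## Standalone restatement of the batch validation shown above: a translation is treated as
## well formed only when the number of non-blank source lines matches the number of
## non-blank translated lines.
def is_translation_well_formed(source_text:str, translated_text:str) -> bool:

    jap = [line for line in source_text.split('\n') if line.strip()]
    eng = [line for line in translated_text.split('\n') if line.strip()]

    return len(jap) == len(eng)
```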
 import typing
 import asyncio
 import os
+import logging

 ## third party modules
 from kairyou import KatakanaUtil

 from handlers.json_handler import JsonHandler

 from modules.common.file_ensurer import FileEnsurer
 from modules.common.toolkit import Toolkit
+from modules.common.exceptions import OpenAIAuthenticationError, MaxBatchDurationExceededException, DeepLAuthorizationException, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, GoogleAuthError, OpenAIAPIStatusError, OpenAIAPIConnectionError, DeepLException, GoogleAPIError
 from modules.common.decorators import permission_error_decorator

+##-------------------start-of-Translator------------------------------------------------------------------

+class Translator:

     """

+    Translator is a class that is used to interact with translation methods and translate text.
+    Currently supports OpenAI, Gemini, and DeepL.

     """

     ## meanwhile for gemini, we just need to send the prompt and the text to be translated concatenated together
     gemini_translation_batches:typing.List[str] = []

+    ## same as above, but for deepl, just the text to be translated
+    deepl_translation_batches:typing.List[str] = []
+
     num_occurred_malformed_batches = 0

     ## semaphore to limit the number of concurrent batches

     ##--------------------------------------------------------------------------------------------------------------------------

+    TRANSLATION_METHOD:typing.Literal["openai", "gemini", "deepl"] = "openai"

     translation_print_result = ""

     decorator_to_use:typing.Callable

+    is_cli = False
+
+    pre_provided_api_key = ""
+
     ##-------------------start-of-get_max_batch_duration()------------------------------------------------------------------

     @staticmethod

         """

         Returns the max batch duration.
+        Structured as a function so that it can be used as a lambda function in the backoff decorator. As decorators call the function when they are defined/runtime, not when they are called. Which I learned the hard way.

         Returns:
         max_batch_duration (float) : the max batch duration.

         """

+        return Translator.max_batch_duration

     ##-------------------start-of-log_retry()------------------------------------------------------------------

         retry_msg = f"Retrying translation after {details['wait']} seconds after {details['tries']} tries {details['target']} due to {details['exception']}."

+        logging.warning(retry_msg)

     ##-------------------start-of-log_failure()------------------------------------------------------------------

         Parameters:
         details (dict) : the details of the failure.

+        Raises:
+        MaxBatchDurationExceededException : An exception that is raised when the max batch duration is exceeded.
+
         """

+        error_msg = f"Exceeded allowed duration of {details['wait']} seconds, returning untranslated text after {details['tries']} tries {details['target']}."

+        logging.error(error_msg)

         raise MaxBatchDurationExceededException(error_msg)
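log_retry() and log_failure() above are the on_backoff and on_giveup hooks of a backoff.on_exception decorator; the old Kijiku code earlier in this diff builds that decorator explicitly, and the new Translator class presumably does the same further down (that hunk is not shown in this excerpt). A minimal sketch of the wiring, with an illustrative exception tuple:

```python
## Minimal sketch of the backoff wiring the hooks above plug into; the exception tuple is
## illustrative, and Translator refers to the class defined in this file.
import backoff

decorator_to_use = backoff.on_exception(backoff.expo,
                                        exception=(Exception,),
                                        max_time=lambda: Translator.get_max_batch_duration(), ## lambda so the value is read at call time
                                        on_backoff=lambda details: Translator.log_retry(details),
                                        on_giveup=lambda details: Translator.log_failure(details),
                                        raise_on_giveup=False)
```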
         """

         ## set this here cause the try-except could throw before we get past the settings configuration
         time_start = time.time()

         try:

+            await Translator.initialize()

             JsonHandler.validate_json()

+            if(not Translator.is_cli):
+                await Translator.check_settings()

             ## set actual start time to the end of the settings configuration
             time_start = time.time()

+            await Translator.commence_translation()

         except Exception as e:

+            Translator.translation_print_result += "An error has occurred, outputting results so far..."

             FileEnsurer.handle_critical_exception(e)

         time_end = time.time()

+        Translator.assemble_results(time_start, time_end)
+
+        if(Translator.is_cli):
+            Toolkit.pause_console()

     ##-------------------start-of-initialize()------------------------------------------------------------------

         """

+        Sets the API Key for the respective service and loads the translation settings.

         """

+        translation_methods = {
+            "1": ("openai", FileEnsurer.openai_api_key_path),
+            "2": ("gemini", FileEnsurer.gemini_api_key_path),
+            "3": ("deepl", FileEnsurer.deepl_api_key_path),
+        }

+        if(not Translator.is_cli):
+            method = input("What method would you like to use for translation? (1 for OpenAI, 2 for Gemini, 3 for Deepl, or any other key to exit) : \n")

+            if(method not in translation_methods.keys()):
+                print("\nThank you for using Kudasai, goodbye.")
+                time.sleep(2)
+                FileEnsurer.exit_kudasai()
+
+            Toolkit.clear_console()

         else:
+            method = Translator.TRANSLATION_METHOD

+        Translator.TRANSLATION_METHOD, api_key_path = translation_methods.get(method, ("deepl", FileEnsurer.deepl_api_key_path))
+
+        if(Translator.pre_provided_api_key != ""):
+            encoded_key = base64.b64encode(Translator.pre_provided_api_key.encode('utf-8')).decode('utf-8')
+            Translator.pre_provided_api_key = ""
+            with open(api_key_path, 'w+', encoding='utf-8') as file:
+                file.write(encoded_key)

+        await Translator.init_api_key(Translator.TRANSLATION_METHOD.capitalize(), api_key_path, EasyTL.set_credentials, EasyTL.test_credentials)
+
+        ## try to load the translation settings
+        try:
+            JsonHandler.load_translation_settings()
+        ## if the translation settings don't exist, create them
         except:
+            JsonHandler.reset_translation_settings_to_default()
+            JsonHandler.load_translation_settings()
+
         Toolkit.clear_console()

     ##-------------------start-of-init_openai_api_key()------------------------------------------------------------------
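Translator.initialize() above replaces the old two-way OpenAI/Gemini branch with a small dispatch table keyed on the user's menu choice. A condensed sketch of the same pattern (prompting and exit handling trimmed):

```python
## Condensed restatement of the dispatch-table pattern used by initialize() above;
## in the real code an unrecognized choice exits Kudasai before this lookup runs.
from modules.common.file_ensurer import FileEnsurer

translation_methods = {
    "1": ("openai", FileEnsurer.openai_api_key_path),
    "2": ("gemini", FileEnsurer.gemini_api_key_path),
    "3": ("deepl", FileEnsurer.deepl_api_key_path),
}

choice = input("1 for OpenAI, 2 for Gemini, 3 for DeepL : ")

## .get() falls back to DeepL, mirroring the default used in the diff
method, api_key_path = translation_methods.get(choice, ("deepl", FileEnsurer.deepl_api_key_path))
```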
258 |
## if not valid, raise the exception that caused the test to fail
|
259 |
if(not is_valid and e is not None):
|
260 |
raise e
|
261 |
+
|
262 |
+
logging.info(f"Used saved API key in {api_key_path}")
|
|
|
263 |
|
264 |
time.sleep(2)
|
265 |
|
|
|
283 |
FileEnsurer.standard_overwrite_file(api_key_path, base64.b64encode(api_key.encode('utf-8')).decode('utf-8'), omit=True)
|
284 |
|
285 |
## if invalid key exit
|
286 |
+
except (GoogleAuthError, OpenAIAuthenticationError, DeepLAuthorizationException):
|
287 |
|
288 |
Toolkit.clear_console()
|
289 |
+
|
290 |
+
logging.error(f"Authorization error while setting up {service}, please double check your API key as it appears to be incorrect.")
|
291 |
|
292 |
Toolkit.pause_console()
|
293 |
|
294 |
+
exit(1)
|
295 |
|
296 |
## other error, alert user and raise it
|
297 |
except Exception as e:
|
298 |
|
299 |
Toolkit.clear_console()
|
300 |
|
301 |
+
logging.error(f"Unknown error while setting up {service}, The error is as follows " + str(e) + "\nThe exception will now be raised.")
|
302 |
|
303 |
Toolkit.pause_console()
|
304 |
|
|
|
316 |
|
317 |
"""
|
318 |
|
319 |
+
Translator.text_to_translate = []
|
320 |
+
Translator.translated_text = []
|
321 |
+
Translator.je_check_text = []
|
322 |
+
Translator.error_text = []
|
323 |
+
Translator.openai_translation_batches = []
|
324 |
+
Translator.gemini_translation_batches = []
|
325 |
+
Translator.num_occurred_malformed_batches = 0
|
326 |
+
Translator.translation_print_result = ""
|
327 |
+
Translator.TRANSLATION_METHOD = "openai"
|
328 |
+
Translator.pre_provided_api_key = ""
|
329 |
+
Translator.is_cli = False
|
330 |
|
331 |
##-------------------start-of-check-settings()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
332 |
|
|
|
335
336     """
337
338 +  Prompts the user to confirm the settings in the translation settings file.
339
340     """
341
342     print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
343
344 +  method_to_section_dict = {
345 +  "openai": ("openai settings", "OpenAI", FileEnsurer.openai_api_key_path),
346 +  "gemini": ("gemini settings", "Gemini", FileEnsurer.gemini_api_key_path),
347 +  "deepl": ("deepl settings", "DeepL", FileEnsurer.deepl_api_key_path)
348 +  }
349 +
350 +  section_to_target, method_name, api_key_path = method_to_section_dict[Translator.TRANSLATION_METHOD]
351 +
352     try:
353
354 +  JsonHandler.log_translation_settings(output_to_console=True, specific_section=section_to_target)
355
356     except:
357     Toolkit.clear_console()
358
359 +  if(input("It's likely that you're using an outdated version of the translation settings file, press 1 to reset these to default or 2 to exit and resolve manually : ") == "1"):
360     Toolkit.clear_console()
361 +  JsonHandler.reset_translation_settings_to_default()
362 +  JsonHandler.load_translation_settings()
363
364     print("Are these settings okay? (1 for yes or 2 for no) : \n\n")
365 +  JsonHandler.log_translation_settings(output_to_console=True, specific_section=section_to_target)
366     else:
367     FileEnsurer.exit_kudasai()
368
369 +  if(input("\n") != "1"):
370 +  JsonHandler.change_translation_settings()
371
372     Toolkit.clear_console()
373
374     print("Do you want to change your API key? (1 for yes or 2 for no) : ")
375
376     if(input("\n") == "1"):
377 +  if(os.path.exists(api_key_path)):
378 +  os.remove(api_key_path)
379 +  await Translator.init_api_key(method_name, api_key_path, EasyTL.set_credentials, EasyTL.test_credentials)
380
381     Toolkit.clear_console()
382
393     is_webgui (bool | optional | default=False) : A bool representing whether the function is being called by the webgui.
394
395     """
396 +
397 +  logging.debug(f"Translator Activated, Translation Method : {Translator.TRANSLATION_METHOD} "
398 +  f"Settings are as follows : ")
399
400 +  JsonHandler.log_translation_settings()
401 +
402 +  Translator.prompt_assembly_mode = int(JsonHandler.current_translation_settings["base translation settings"]["prompt_assembly_mode"])
403 +  Translator.number_of_lines_per_batch = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_lines_per_batch"])
404 +  Translator.sentence_fragmenter_mode = int(JsonHandler.current_translation_settings["base translation settings"]["sentence_fragmenter_mode"])
405 +  Translator.je_check_mode = int(JsonHandler.current_translation_settings["base translation settings"]["je_check_mode"])
406 +  Translator.num_of_malform_retries = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_malformed_batch_retries"])
407 +  Translator.max_batch_duration = float(JsonHandler.current_translation_settings["base translation settings"]["batch_retry_timeout"])
408 +  Translator.num_concurrent_batches = int(JsonHandler.current_translation_settings["base translation settings"]["number_of_concurrent_batches"])
409 +
410 +  Translator._semaphore = asyncio.Semaphore(Translator.num_concurrent_batches)
411 +
412 +  Translator.openai_model = JsonHandler.current_translation_settings["openai settings"]["openai_model"]
413 +  Translator.openai_system_message = JsonHandler.current_translation_settings["openai settings"]["openai_system_message"]
414 +  Translator.openai_temperature = float(JsonHandler.current_translation_settings["openai settings"]["openai_temperature"])
415 +  Translator.openai_top_p = float(JsonHandler.current_translation_settings["openai settings"]["openai_top_p"])
416 +  Translator.openai_n = int(JsonHandler.current_translation_settings["openai settings"]["openai_n"])
417 +  Translator.openai_stream = bool(JsonHandler.current_translation_settings["openai settings"]["openai_stream"])
418 +  Translator.openai_stop = JsonHandler.current_translation_settings["openai settings"]["openai_stop"]
419 +  Translator.openai_logit_bias = JsonHandler.current_translation_settings["openai settings"]["openai_logit_bias"]
420 +  Translator.openai_max_tokens = JsonHandler.current_translation_settings["openai settings"]["openai_max_tokens"]
421 +  Translator.openai_presence_penalty = float(JsonHandler.current_translation_settings["openai settings"]["openai_presence_penalty"])
422 +  Translator.openai_frequency_penalty = float(JsonHandler.current_translation_settings["openai settings"]["openai_frequency_penalty"])
423 +
424 +  Translator.gemini_model = JsonHandler.current_translation_settings["gemini settings"]["gemini_model"]
425 +  Translator.gemini_prompt = JsonHandler.current_translation_settings["gemini settings"]["gemini_prompt"]
426 +  Translator.gemini_temperature = float(JsonHandler.current_translation_settings["gemini settings"]["gemini_temperature"])
427 +  Translator.gemini_top_p = JsonHandler.current_translation_settings["gemini settings"]["gemini_top_p"]
428 +  Translator.gemini_top_k = JsonHandler.current_translation_settings["gemini settings"]["gemini_top_k"]
429 +  Translator.gemini_candidate_count = JsonHandler.current_translation_settings["gemini settings"]["gemini_candidate_count"]
430 +  Translator.gemini_stream = bool(JsonHandler.current_translation_settings["gemini settings"]["gemini_stream"])
431 +  Translator.gemini_stop_sequences = JsonHandler.current_translation_settings["gemini settings"]["gemini_stop_sequences"]
432 +  Translator.gemini_max_output_tokens = JsonHandler.current_translation_settings["gemini settings"]["gemini_max_output_tokens"]
433 +
434 +  Translator.deepl_context = JsonHandler.current_translation_settings["deepl settings"]["deepl_context"]
435 +  Translator.deepl_split_sentences = JsonHandler.current_translation_settings["deepl settings"]["deepl_split_sentences"]
436 +  Translator.deepl_preserve_formatting = JsonHandler.current_translation_settings["deepl settings"]["deepl_preserve_formatting"]
437 +  Translator.deepl_formality = JsonHandler.current_translation_settings["deepl settings"]["deepl_formality"]
438 +
439 +  exception_dict = {
440 +  "openai": (OpenAIAuthenticationError, OpenAIInternalServerError, OpenAIRateLimitError, OpenAIAPITimeoutError, OpenAIAPIConnectionError, OpenAIAPIStatusError),
441 +  "gemini": GoogleAPIError,
442 +  "deepl": DeepLException
443 +  }
444
445 +  Translator.decorator_to_use = backoff.on_exception(
446 +  backoff.expo,
447 +  max_time=lambda: Translator.get_max_batch_duration(),
448 +  exception=exception_dict.get(Translator.TRANSLATION_METHOD, None),
449 +  on_backoff=lambda details: Translator.log_retry(details),
450 +  on_giveup=lambda details: Translator.log_failure(details),
451 +  raise_on_giveup=False
452 +  )
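For readers unfamiliar with the backoff library used above: backoff.on_exception builds a retry decorator from a wait generator, the exception types to retry on, and optional callbacks. Below is a minimal, self-contained sketch of the same pattern; SomeAPIError and flaky_request are invented stand-ins for illustration and are not part of Kudasai or EasyTL.

import backoff

class SomeAPIError(Exception):
    ## stand-in for a provider error such as a rate limit error
    pass

## exponential backoff, retrying only SomeAPIError, giving up after max_time seconds;
## raise_on_giveup=False means the final failure is swallowed instead of re-raised
retry_decorator = backoff.on_exception(
    backoff.expo,
    exception=SomeAPIError,
    max_time=lambda: 30.0,
    on_backoff=lambda details: print(f"retrying, waited {details['wait']:.1f}s"),
    on_giveup=lambda details: print("giving up"),
    raise_on_giveup=False
)

@retry_decorator
def flaky_request() -> str:
    raise SomeAPIError("simulated failure")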
453
454     Toolkit.clear_console()
455
456 +  logging.info("Starting Prompt Building...")
457
458 +  Translator.build_translation_batches()
459 +
460 +  translation_methods = {
461 +  "openai": JsonHandler.current_translation_settings["openai settings"]["openai_model"],
462 +  "gemini": JsonHandler.current_translation_settings["gemini settings"]["gemini_model"],
463 +  "deepl": "deepl"
464 +  }
465 +
466 +  model = translation_methods[Translator.TRANSLATION_METHOD]
467
468 +  await Translator.handle_cost_estimate_prompt(model, omit_prompt=is_webgui or Translator.is_cli)
469
470     Toolkit.clear_console()
471
472 +  logging.info("Starting Translation...")
473
474     ## requests to run asynchronously
475 +  async_requests = Translator.build_async_requests(model)
476
477     ## Use asyncio.gather to run tasks concurrently/asynchronously and wait for all of them to complete
478     results = await asyncio.gather(*async_requests)
479
480 +  logging.info("Redistributing Translated Text...")
481
482     ## Sort results based on the index to maintain order
483     sorted_results = sorted(results, key=lambda x: x[0])
484
485     ## Redistribute the sorted results
486 +  for _, translated_prompt, translated_message in sorted_results:
487 +  Translator.redistribute(translated_prompt, translated_message)
488
489     ## try to pair the text for j-e checking if the mode is 2
490 +  if(Translator.je_check_mode == 2):
491 +  Translator.je_check_text = Translator.fix_je()
492
493     Toolkit.clear_console()
494
495 +  logging.info("Done!")
496
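As a side note on the flow above (fire off all batch coroutines, gather them, then sort on the index each task returns so output order matches input order), here is a minimal illustrative sketch; translate_batch is a made-up stand-in for the real per-batch request.

import asyncio
import random

async def translate_batch(index: int, text: str) -> tuple[int, str, str]:
    ## simulate an API call that completes in arbitrary order
    await asyncio.sleep(random.random() * 0.1)
    return index, text, text.upper()

async def main() -> None:
    batches = ["一行目", "二行目", "三行目"]
    requests = [translate_batch(i, batch) for i, batch in enumerate(batches)]

    results = await asyncio.gather(*requests)

    ## completion order is not guaranteed, so sort on the index element before use
    for index, source, translated in sorted(results, key=lambda x: x[0]):
        print(index, source, "->", translated)

asyncio.run(main())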
497     ##-------------------start-of-build_async_requests()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
498

512     """
513
514     async_requests = []
515 +
516 +  translation_batches_methods = {
517 +  "openai": Translator.openai_translation_batches,
518 +  "gemini": Translator.gemini_translation_batches,
519 +  "deepl": Translator.deepl_translation_batches
520 +  }
521 +
522 +  translation_batches = translation_batches_methods[Translator.TRANSLATION_METHOD]
523 +  batch_length = len(translation_batches)
524 +
525 +  if(Translator.TRANSLATION_METHOD != "deepl"):
526
527 +  for i in range(0, batch_length, 2):
528 +  instructions = translation_batches[i]
529 +  prompt = translation_batches[i+1]
530 +
531 +  assert isinstance(instructions, (SystemTranslationMessage, str))
532 +  assert isinstance(prompt, (ModelTranslationMessage, str))
533 +
534 +  async_requests.append(Translator.handle_translation(model, i, batch_length, prompt, instructions))
535
536 +  else:
537 +  for i, batch in enumerate(translation_batches):
538
539 +  assert isinstance(batch, str)
540 +
541 +  async_requests.append(Translator.handle_translation(model, i, batch_length, batch, None))
542 +
543     return async_requests
544
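The OpenAI and Gemini batch lists built elsewhere are interleaved as [instructions, prompt, instructions, prompt, ...], which is why the loop above steps through them two at a time while DeepL batches are plain strings. A tiny illustrative sketch of consuming such an interleaved list (the data here is made up):

## instructions at even indices, prompts at odd indices
translation_batches = [
    "Translate the following to English.", "こんにちは。\n",
    "Translate the following to English.", "さようなら。\n",
]

for i in range(0, len(translation_batches), 2):
    instructions = translation_batches[i]
    prompt = translation_batches[i + 1]
    print(f"batch {i // 2 + 1}: {instructions!r} -> {prompt!r}")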
545     ##-------------------start-of-generate_text_to_translate_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

562
563     prompt = []
564     non_word_pattern = re.compile(r'^[\W_\s\n-]+$')
565 +  special_chars = ["▼", "△", "◇"]
566 +  quotes = ["「", "」", "『", "』", "【", "】", "\"", "'"]
567 +  part_chars = ["1","2","3","4","5","6","7","8","9", " "]
568 +
569 +  while(index < len(Translator.text_to_translate)):
570
571 +  sentence = Translator.text_to_translate[index].strip()
572     lowercase_sentence = sentence.lower()
573 +
574 +  has_quotes = any(char in sentence for char in quotes)
575     is_part_in_sentence = "part" in lowercase_sentence
576 +  is_special_char = any(char in sentence for char in special_chars)
577 +  is_part_char = all(char in sentence for char in part_chars)
578 +
579 +  if(len(prompt) < Translator.number_of_lines_per_batch):
580 +  if(is_special_char or is_part_in_sentence or is_part_char):
581     prompt.append(f'{sentence}\n')
582 +  logging.debug(f"Sentence : {sentence}, Sentence is a pov change or part marker... adding to prompt.")
583
584 +  elif(non_word_pattern.match(sentence) or KatakanaUtil.is_punctuation(sentence) and not has_quotes):
585 +  logging.debug(f"Sentence : {sentence}, Sentence is punctuation... skipping.")
586
587 +  elif(sentence):
588     prompt.append(f'{sentence}\n')
589 +  logging.debug(f"Sentence : {sentence}, Sentence is a valid sentence... adding to prompt.")
590     else:
591     return prompt, index
592 +
593     index += 1
594 +
595     return prompt, index
596
597     ##-------------------start-of-build_translation_batches()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
607
608     i = 0
609
610 +  while i < len(Translator.text_to_translate):
611
612 +  batch, i = Translator.generate_text_to_translate_batches(i)
613     batch = ''.join(batch)
614
615 +  if(Translator.TRANSLATION_METHOD == 'openai'):
616
617 +  if(Translator.prompt_assembly_mode == 1):
618 +  system_msg = SystemTranslationMessage(content=str(Translator.openai_system_message))
619     else:
620 +  system_msg = SystemTranslationMessage(content=str(Translator.openai_system_message))
621
622 +  Translator.openai_translation_batches.append(system_msg)
623     model_msg = ModelTranslationMessage(content=batch)
624 +  Translator.openai_translation_batches.append(model_msg)
625 +
626 +  elif(Translator.TRANSLATION_METHOD == 'gemini'):
627 +  Translator.gemini_translation_batches.append(Translator.gemini_prompt)
628 +  Translator.gemini_translation_batches.append(batch)
629
630     else:
631 +  Translator.deepl_translation_batches.append(batch)
632
633 +  logging_message = "Built Messages: \n\n"
634 +
635 +  batches_to_iterate = {
636 +  "openai": Translator.openai_translation_batches,
637 +  "gemini": Translator.gemini_translation_batches,
638 +  "deepl": Translator.deepl_translation_batches
639 +  }
640
641     i = 0
642
643 +  batches = batches_to_iterate[Translator.TRANSLATION_METHOD]
644 +
645 +  for message in batches:
646
647     i+=1
648
649 +  message = str(message) if Translator.TRANSLATION_METHOD != 'openai' else message.content # type: ignore
650 +
651 +  if(i % 2 == 1 and Translator.TRANSLATION_METHOD != 'deepl'):
652 +  logging_message += "\n" "------------------------" "\n"
653
654 +  logging_message += message + "\n"
655
656 +  logging.debug(logging_message)
657
658     ##-------------------start-of-handle_cost_estimate_prompt()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
659
673
674     """
675
676 +  translation_instructions_methods = {
677 +  "openai": Translator.openai_system_message,
678 +  "gemini": Translator.gemini_prompt,
679 +  "deepl": None,
680 +  }
681 +
682 +  translation_instructions = translation_instructions_methods[Translator.TRANSLATION_METHOD]
683
684     ## get cost estimate and confirm
685 +  num_tokens, min_cost, model = EasyTL.calculate_cost(text=Translator.text_to_translate, service=Translator.TRANSLATION_METHOD, model=model, translation_instructions=translation_instructions)
686
687     print("Note that the cost estimate is not always accurate, and may be higher than the actual cost. However, cost calculation now includes output tokens.\n")
688
689 +  if(Translator.TRANSLATION_METHOD == "gemini"):
690 +  logging.info(f"As of Kudasai {Toolkit.CURRENT_VERSION}, Gemini Pro 1.0 is free to use under 15 requests per minute, Gemini Pro 1.5 is free to use under 2 requests per minute. Requests correspond to number_of_concurrent_batches in the translation settings.")
691
692 +  logging.info("Estimated number of tokens : " + str(num_tokens))
693 +  logging.info("Estimated minimum cost : " + str(min_cost) + " USD")
694
695     if(not omit_prompt):
696     if(input("\nContinue? (1 for yes or 2 for no) : ") == "1"):
697 +  logging.info("User confirmed translation.")
698
699     else:
700 +  logging.info("User cancelled translation.")
701 +  FileEnsurer.exit_kudasai()
702
703     return model
704
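The minimum-cost figure reported here is, roughly speaking, tokens divided by one thousand times a per-1K-token rate, summed over input and estimated output. The sketch below only illustrates that arithmetic; the rates and token counts are placeholders, not real OpenAI, Gemini, or DeepL pricing, and the actual calculation is done by EasyTL.calculate_cost.

## placeholder rates and counts, NOT real pricing
input_price_per_1k = 0.0005
output_price_per_1k = 0.0015

estimated_input_tokens = 12_000
## assume the translated output is roughly the same size as the input
estimated_output_tokens = estimated_input_tokens

min_cost = (estimated_input_tokens / 1000) * input_price_per_1k \
         + (estimated_output_tokens / 1000) * output_price_per_1k

print(f"Estimated minimum cost : {min_cost:.4f} USD")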
705     ##-------------------start-of-handle_translation()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
706
707     @staticmethod
708 +  async def handle_translation(model:str,
709 +  batch_index:int,
710 +  length_of_batch:int,
711 +  text_to_translate:typing.Union[str, ModelTranslationMessage],
712 +  translation_instructions:typing.Union[str, SystemTranslationMessage, None]) -> tuple[int, str, str]:
713
714     """
715

717
718     Parameters:
719     model (string) : The model of the service used to translate the text.
720 +  batch_index (int) : Which batch we are currently on.
721 +  length_of_batch (int) : How long the batches are.
722 +  text_to_translate (typing.Union[str, ModelTranslationMessage]) : The text to translate.
723 +  translation_instructions (typing.Union[str, SystemTranslationMessage, None]) : The translation instructions.
724
725     Returns:
726 +  batch_index (int) : The batch index.
727 +  text_to_translate (str) : The text to translate.
728 +  translated_text (str) : The translated text
729
730     """
731
732     ## Basically limits the number of concurrent batches
733 +  async with Translator._semaphore:
734     num_tries = 0
735
736     while True:
739     if(FileEnsurer.do_interrupt == True):
740     raise Exception("Interrupted by user.")
741
742 +  batch_number = (batch_index // 2) + 1
743 +
744 +  logging.info(f"Trying translation for batch {batch_number} of {length_of_batch//2}...")
745
746     try:
747
748 +  translation_methods = {
749 +  "openai": EasyTL.openai_translate_async,
750 +  "gemini": EasyTL.gemini_translate_async,
751 +  "deepl": EasyTL.deepl_translate_async
752 +  }
753 +
754 +  translation_params = {
755 +  "openai": {
756 +  "text": text_to_translate,
757 +  "decorator": Translator.decorator_to_use,
758 +  "translation_instructions": translation_instructions,
759 +  "model": model,
760 +  "temperature": Translator.openai_temperature,
761 +  "top_p": Translator.openai_top_p,
762 +  "stop": Translator.openai_stop,
763 +  "max_tokens": Translator.openai_max_tokens,
764 +  "presence_penalty": Translator.openai_presence_penalty,
765 +  "frequency_penalty": Translator.openai_frequency_penalty
766 +  },
767 +  "gemini": {
768 +  "text": text_to_translate,
769 +  "decorator": Translator.decorator_to_use,
770 +  "model": model,
771 +  "temperature": Translator.gemini_temperature,
772 +  "top_p": Translator.gemini_top_p,
773 +  "top_k": Translator.gemini_top_k,
774 +  "stop_sequences": Translator.gemini_stop_sequences,
775 +  "max_output_tokens": Translator.gemini_max_output_tokens
776 +  },
777 +  "deepl": {
778 +  "text": text_to_translate,
779 +  "decorator": Translator.decorator_to_use,
780 +  "context": Translator.deepl_context,
781 +  "split_sentences": Translator.deepl_split_sentences,
782 +  "preserve_formatting": Translator.deepl_preserve_formatting,
783 +  "formality": Translator.deepl_formality
784 +  }
785 +  }
786 +
787 +  assert isinstance(text_to_translate, ModelTranslationMessage if Translator.TRANSLATION_METHOD == "openai" else str)
788 +
789 +  translated_message = await translation_methods[Translator.TRANSLATION_METHOD](**translation_params[Translator.TRANSLATION_METHOD])
790
791     ## will only occur if the max_batch_duration is exceeded, so we just return the untranslated text
792     except MaxBatchDurationExceededException:
793
794 +  logging.error(f"Batch {batch_number} of {length_of_batch//2} was not translated due to exceeding the max request duration, returning the untranslated text...")
795     break
796
797     ## do not even bother if not a gpt 4 model, because gpt-3 seems unable to format properly
798     ## since gemini is free, we can just try again if it's malformed
799 +  ## deepl should produce properly formatted text so we don't need to check
800 +  if("gpt-4" not in model and Translator.TRANSLATION_METHOD == "openai"):
801     break
802
803 +  if(await Translator.check_if_translation_is_good(translated_message, text_to_translate)): # type: ignore
804     break
805
806 +  if(num_tries >= Translator.num_of_malform_retries):
807 +  logging.warning(f"Batch {batch_number} of {length_of_batch//2} was malformed but exceeded the max number of retries ({Translator.num_of_malform_retries})")
808     break
809
810     else:
811     num_tries += 1
812 +  logging.warning(f"Batch {batch_number} of {length_of_batch//2} was malformed, retrying...")
813 +  Translator.num_occurred_malformed_batches += 1
814
815 +  if(isinstance(text_to_translate, ModelTranslationMessage)):
816 +  text_to_translate = text_to_translate.content
817
818     if(isinstance(translated_message, typing.List)):
819 +  translated_message = ''.join(translated_message) # type: ignore
820 +
821 +  logging.info(f"Translation for batch {batch_number} of {length_of_batch//2} completed.")
822
823 +  return batch_index, text_to_translate, translated_message # type: ignore
824
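The try block above picks both the coroutine and its keyword arguments out of dictionaries keyed by the active service, then applies them with ** unpacking. Here is a stripped-down sketch of that dispatch pattern, with dummy coroutines standing in for the EasyTL calls:

import asyncio

async def openai_translate(text: str, temperature: float) -> str:
    return f"[openai t={temperature}] {text}"

async def deepl_translate(text: str, formality: str) -> str:
    return f"[deepl {formality}] {text}"

translation_methods = {"openai": openai_translate, "deepl": deepl_translate}
translation_params = {
    "openai": {"text": "こんにちは", "temperature": 0.3},
    "deepl": {"text": "こんにちは", "formality": "default"},
}

async def main() -> None:
    method = "openai"
    ## look up the coroutine and its kwargs for the chosen service, then unpack
    translated = await translation_methods[method](**translation_params[method])
    print(translated)

asyncio.run(main())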
825     ##-------------------start-of-check_if_translation_is_good()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
826
827     @staticmethod
828 +  async def check_if_translation_is_good(translated_message:typing.Union[typing.List[str], str], text_to_translate:typing.Union[ModelTranslationMessage, str]) -> bool:
829
830     """
831

833
834     Parameters:
835     translated_message (str) : the translated message.
836 +  text_to_translate (typing.Union[str, Message]) : the translation prompt.
837
838     Returns:
839     is_valid (bool) : whether or not the translation is valid.
840
841     """
842
843 +  if(not isinstance(text_to_translate, str)):
844 +  prompt = text_to_translate.content
845
846     else:
847 +  prompt = text_to_translate
848
849     if(isinstance(translated_message, list)):
850     translated_message = ''.join(translated_message)
851
852     jap = [line for line in prompt.split('\n') if line.strip()] ## Remove blank lines
853     eng = [line for line in translated_message.split('\n') if line.strip()] ## Remove blank lines
854
855 +  return len(jap) == len(eng)
856
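The validity check is deliberately simple: strip blank lines from both the source prompt and the translation and require the same number of lines on each side. A quick illustration with made-up text:

prompt = "こんにちは。\n\nさようなら。\n"
translated = "Hello.\nGoodbye.\n"

jap = [line for line in prompt.split('\n') if line.strip()]
eng = [line for line in translated.split('\n') if line.strip()]

## both sides have two non-blank lines, so this batch would be considered well formed
print(len(jap) == len(eng))  ## True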
857     ##-------------------start-of-redistribute()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
858
859     @staticmethod
860 +  def redistribute(text_to_translate:typing.Union[Message, str], translated_message:str) -> None:
861
862     """
863
864     Puts translated text back into the text file.
865
866     Parameters:
867 +  text_to_translate (typing.Union[str, Message]) : the translation prompt.
868     translated_message (str) : the translated message.
869
870     """
871
872 +  if(not isinstance(text_to_translate, str)):
873 +  prompt = text_to_translate.content
874
875     else:
876 +  prompt = text_to_translate
877
878     ## Separates with hyphens if the mode is 1
879 +  if(Translator.je_check_mode == 1):
880
881 +  Translator.je_check_text.append("\n-------------------------\n"+ prompt + "\n\n")
882 +  Translator.je_check_text.append(translated_message + '\n')
883
884     ## Mode two tries to pair the text for j-e checking, see fix_je() for more details
885 +  elif(Translator.je_check_mode == 2):
886 +  Translator.je_check_text.append(prompt)
887 +  Translator.je_check_text.append(translated_message)
888
889     ## mode 1 is the default mode, uses regex and other nonsense to split sentences
890 +  if(Translator.sentence_fragmenter_mode == 1):
891
892     sentences = re.findall(r"(.*?(?:(?:\"|\'|-|~|!|\?|%|\(|\)|\.\.\.|\.|---|\[|\])))(?:\s|$)", translated_message)
893

910     build_string += f" {sentence}"
911     continue
912
913 +  Translator.translated_text.append(sentence + '\n')
914
915 +  for i in range(len(Translator.translated_text)):
916 +  if Translator.translated_text[i] in patched_sentences:
917 +  index = patched_sentences.index(Translator.translated_text[i])
918 +  Translator.translated_text[i] = patched_sentences[index]
919
920     ## mode 2 just assumes the LLM formatted it properly
921 +  elif(Translator.sentence_fragmenter_mode == 2):
922
923 +  Translator.translated_text.append(translated_message + '\n\n')
924
925     ##-------------------start-of-fix_je()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
926
941     i = 1
942     final_list = []
943
944 +  while(i < len(Translator.je_check_text)):
945 +  jap = Translator.je_check_text[i-1].split('\n')
946 +  eng = Translator.je_check_text[i].split('\n')
947
948 +  jap = [line for line in jap if(line.strip())] # Remove blank lines
949 +  eng = [line for line in eng if(line.strip())] # Remove blank lines
950
951     final_list.append("-------------------------\n")
952
953     if(len(jap) == len(eng)):
954 +  for(jap_line, eng_line) in zip(jap, eng):
955 +  if(jap_line and eng_line): # check if jap_line and eng_line aren't blank
956     final_list.append(jap_line + '\n\n')
957     final_list.append(eng_line + '\n\n')
958     final_list.append("--------------------------------------------------\n")
959     else:
960 +  final_list.append(Translator.je_check_text[i-1] + '\n\n')
961 +  final_list.append(Translator.je_check_text[i] + '\n\n')
962     final_list.append("--------------------------------------------------\n")
963
964 +  i += 2
965
966     return final_list
967 +
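fix_je walks je_check_text in source/translation pairs and, when the line counts match, zips them into alternating Japanese/English lines for side-by-side checking. A small illustration of that pairing step with invented data:

je_check_text = ["こんにちは。\nさようなら。", "Hello.\nGoodbye."]

jap = [line for line in je_check_text[0].split('\n') if line.strip()]
eng = [line for line in je_check_text[1].split('\n') if line.strip()]

if len(jap) == len(eng):
    ## interleave the matched lines, one Japanese line followed by its English line
    for jap_line, eng_line in zip(jap, eng):
        print(jap_line)
        print(eng_line)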
968     ##-------------------start-of-assemble_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
969
970     @staticmethod

972
973     """
974
975 +  Generates the Translator translation print result, does not directly output/return, but rather sets Translator.translation_print_result to the output.
976
977     Parameters:
978     time_start (float) : When the translation started.

982
983     result = (
984     f"Time Elapsed : {Toolkit.get_elapsed_time(time_start, time_end)}\n"
985 +  f"Number of malformed batches : {Translator.num_occurred_malformed_batches}\n\n"
986     f"Debug text have been written to : {FileEnsurer.debug_log_path}\n"
987     f"J->E text have been written to : {FileEnsurer.je_check_path}\n"
988     f"Translated text has been written to : {FileEnsurer.translated_text_path}\n"
989     f"Errors have been written to : {FileEnsurer.error_log_path}\n"
990     )
991
992 +  Translator.translation_print_result = result
993
994 +  ##-------------------start-of-write_translator_results()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
995
996     @staticmethod
997     @permission_error_decorator()
998 +  def write_translator_results() -> None:
999
1000    """
1001
1002 + This function is called to write the results of the Translator module to the output directory.
1003
1004    """
1005

1007    FileEnsurer.standard_create_directory(FileEnsurer.output_dir)
1008
1009    with open(FileEnsurer.error_log_path, 'a+', encoding='utf-8') as file:
1010 + file.writelines(Translator.error_text)
1011
1012    with open(FileEnsurer.je_check_path, 'w', encoding='utf-8') as file:
1013 + file.writelines(Translator.je_check_text)
1014
1015    with open(FileEnsurer.translated_text_path, 'w', encoding='utf-8') as file:
1016 + file.writelines(Translator.translated_text)
1017
1018    ## Instructions to create a copy of the output for archival
1019    FileEnsurer.standard_create_directory(FileEnsurer.archive_dir)
1020
1021    timestamp = Toolkit.get_timestamp(is_archival=True)
1022
1023 + list_of_result_tuples = [('kudasai_translated_text', Translator.translated_text),
1024 + ('kudasai_je_check_text', Translator.je_check_text),
1025 + ('kudasai_error_log', Translator.error_text),
1026 + ('debug_log', FileEnsurer.standard_read_file(FileEnsurer.debug_log_path))]
1027
1028    FileEnsurer.archive_results(list_of_result_tuples,
1029 + module='translator', timestamp=timestamp)
modules/gui/gui_json_util.py
CHANGED
@@ -12,16 +12,16 @@ from handlers.json_handler import JsonHandler
12
13     class GuiJsonUtil:
14
15 -
16
17     ##-------------------start-of-fetch_kijiku_setting_key_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18
19     @staticmethod
20 -  def
21
22     """
23
24 -  Fetches the default values for the settings tab from the
25
26     Parameters:
27     key_name (str) : Which value to fetch.
@@ -32,16 +32,16 @@
32     """
33
34     ## Done this way because if the value is None, it'll be shown as a blank string in the settings tab, which is not what we want.
35 -  return GuiJsonUtil.
36
37 -  ##-------------------start-of-
38
39     @staticmethod
40 -  def
41
42     """
43
44 -  Dumps the new values for the settings tab into the
45
46     Parameters:
47     new_values (typing.List[typing.Tuple[str,str]]) : A list of tuples containing the key and value to be updated.
@@ -49,7 +49,7 @@
49     """
50
51     ## save old json in case of need to revert
52 -  old_rules = GuiJsonUtil.
53     new_rules = old_rules.copy()
54
55     try:
@@ -58,32 +58,32 @@
58     for key, value in new_values:
59     new_rules[header][key] = JsonHandler.convert_to_correct_type(key, str(value))
60
61 -  JsonHandler.
62     JsonHandler.validate_json()
63
64     ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
65 -  assert JsonHandler.
66
67 -  ## so, because of how gradio deals with temp file, we need to both dump into the settings file from FileEnsurer AND the
68     ## name is the path to the file btw
69 -  with open(FileEnsurer.
70     json.dump(new_rules, file)
71
72 -  with open(
73     json.dump(new_rules, file)
74
75 -  GuiJsonUtil.
76
77     except Exception as e:
78
79     ## revert to old data
80 -  with open(FileEnsurer.
81     json.dump(old_rules, file)
82
83 -  with open(
84     json.dump(old_rules, file)
85
86 -  GuiJsonUtil.
87
88     ## throw error so webgui can tell user
89     raise e

12
13     class GuiJsonUtil:
14
15 +  current_translation_settings = dict()
16
17     ##-------------------start-of-fetch_kijiku_setting_key_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18
19     @staticmethod
20 +  def fetch_translation_settings_key_values(header:str, key_name:str) -> str:
21
22     """
23
24 +  Fetches the default values for the settings tab from the translation_settings.json file.
25
26     Parameters:
27     key_name (str) : Which value to fetch.

32     """
33
34     ## Done this way because if the value is None, it'll be shown as a blank string in the settings tab, which is not what we want.
35 +  return GuiJsonUtil.current_translation_settings[header].get(key_name, "None")
36
37 +  ##-------------------start-of-update_translation_settings_with_new_values()---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
38
39     @staticmethod
40 +  def update_translation_settings_with_new_values(gradio_translation_settings:gr.File, new_values:typing.List[typing.Tuple[str,str]]) -> None:
41
42     """
43
44 +  Dumps the new values for the settings tab into the translation_settings.json file.
45
46     Parameters:
47     new_values (typing.List[typing.Tuple[str,str]]) : A list of tuples containing the key and value to be updated.

49     """
50
51     ## save old json in case of need to revert
52 +  old_rules = GuiJsonUtil.current_translation_settings
53     new_rules = old_rules.copy()
54
55     try:

58     for key, value in new_values:
59     new_rules[header][key] = JsonHandler.convert_to_correct_type(key, str(value))
60
61 +  JsonHandler.current_translation_settings = new_rules
62     JsonHandler.validate_json()
63
64     ## validate_json() sets a dict to the invalid placeholder if it's invalid, so if it's that, it's invalid
65 +  assert JsonHandler.current_translation_settings != FileEnsurer.INVALID_TRANSLATION_SETTINGS_PLACEHOLDER
66
67 +  ## so, because of how gradio deals with temp file, we need to both dump into the settings file from FileEnsurer AND the gradio_translation_settings file which is stored in the temp folder under AppData
68     ## name is the path to the file btw
69 +  with open(FileEnsurer.config_translation_settings_path, "w") as file:
70     json.dump(new_rules, file)
71
72 +  with open(gradio_translation_settings.name, "w") as file: ## type: ignore
73     json.dump(new_rules, file)
74
75 +  GuiJsonUtil.current_translation_settings = new_rules
76
77     except Exception as e:
78
79     ## revert to old data
80 +  with open(FileEnsurer.config_translation_settings_path, "w") as file:
81     json.dump(old_rules, file)
82
83 +  with open(gradio_translation_settings.name, "w") as file: ## type: ignore
84     json.dump(old_rules, file)
85
86 +  GuiJsonUtil.current_translation_settings = old_rules
87
88     ## throw error so webgui can tell user
89     raise e
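The update path above follows a write-validate-revert pattern: assign the new rules, validate them, write them to both copies of the settings file, and on any failure write the old rules back before re-raising. A minimal sketch of the same idea, independent of gradio and the real JsonHandler:

import json

def update_settings_file(path: str, new_rules: dict, old_rules: dict) -> None:
    ## write the new settings; if anything fails, restore the previous ones and re-raise
    try:
        if(not isinstance(new_rules, dict)):
            raise ValueError("settings must be a dict")

        with open(path, "w", encoding="utf-8") as file:
            json.dump(new_rules, file)

    except Exception:
        with open(path, "w", encoding="utf-8") as file:
            json.dump(old_rules, file)
        raise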
requirements.txt
CHANGED
@@ -1,5 +1,4 @@
1     backoff==2.2.1
2 -   gradio==4.
3     kairyou==1.5.0
4 -   easytl==0.
5 -   ja_core_news_lg @ https://github.com/explosion/spacy-models/releases/download/ja_core_news_lg-3.7.0/ja_core_news_lg-3.7.0-py3-none-any.whl#sha256=f08eecb4d40523045c9478ce59a67564fd71edd215f32c076fa91dc1f05cc7fd

1     backoff==2.2.1
2 +   gradio==4.20.0
3     kairyou==1.5.0
4 +   easytl==0.4.0-alpha-2
util/openai_model_info/openai_chat_model_info.csv
DELETED
@@ -1,17 +0,0 @@
-,Batch,Name,Price,Recommended Replacement,Depreciation Date,Shutdown Date (earliest)
-,,,,,,
-,First (depreciated I),gpt-3.5-turbo-0301,$0.0015 / 1K input tokens + $0.0020 / 1K output tokens,gpt-3.5-turbo-0613,"June 13, 2023","June 13, 2024"
-,,gpt-4-0314,$0.03 / 1K input tokens + $0.06 / 1K output tokens,gpt-4-0613,"June 13, 2023","June 13, 2024"
-,,gpt-4-32k-0314,$0.06 / 1K input tokens + $0.12 / 1K output tokens,gpt-4-32k-0613,"June 13, 2023","June 13, 2024"
-,,,,,,
-,Second (depreciated II),gpt-3.5-turbo-0613,$0.0015 / 1K input tokens + $0.0020 / 1K output tokens,gpt-3.5-turbo-1106,"November 6,2023","June 13, 2024"
-,,gpt-3.5-turbo-16k-0613,$0.0030 / 1K input tokens + $0.0040 / 1K output tokens,gpt-3.5-turbo-1106,"November 6,2023","June 13, 2024"
-,,,,,,
-,Third (Outdated),gpt-3.5-turbo-1106,$0.0010 / 1K input tokens + $0.0020 / 1K output tokens,N/A,N/A,N/A
-,,,,,,
-,Fourth (Current),gpt-3.5-turbo-0125,$0.005 / 1K input tokens + $0.0015 / 1K output tokens,N/A,N/A,N/A
-,,gpt-4-0613,$0.03 / 1K input tokens + $0.06 / 1K output tokens,N/A,N/A,N/A
-,,gpt-4-32k-0613,$0.06 / 1K input tokens + $0.012 / 1K output tokens,N/A,N/A,N/A
-,,,,,,
-,Fifth (Future),gpt-4-1106-preview,$0.01 / 1K input tokens + $0.03 / 1K output tokens,N/A,N/A,N/A
-,,gpt-4-0125-preview,$0.01 / 1K input tokens + $0.03 / 1K output tokens,N/A,N/A,N/A

util/openai_model_info/openai_chat_model_info.pdf
DELETED
Binary file (31.3 kB)

util/openai_model_info/openai_chat_model_info.xlsx
DELETED
Binary file (5.68 kB)

util/openai_model_info/webpage/openai_chat_model_info.html
DELETED
@@ -1,2 +0,0 @@
1 -  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"><link type="text/css" rel="stylesheet" href="resources/sheet.css" >
2 -
<style type="text/css">.ritz .waffle a { color: inherit; }.ritz .waffle .s1{background-color:#ffffff;text-align:right;color:#000000;font-family:'Arial';font-size:10pt;vertical-align:bottom;white-space:nowrap;direction:ltr;padding:2px 3px 2px 3px;}.ritz .waffle .s0{background-color:#ffffff;text-align:left;color:#000000;font-family:'Arial';font-size:10pt;vertical-align:bottom;white-space:nowrap;direction:ltr;padding:2px 3px 2px 3px;}</style><div class="ritz grid-container" dir="ltr"><table class="waffle" cellspacing="0" cellpadding="0"><thead><tr><th class="row-header freezebar-origin-ltr"></th><th id="0C0" style="width:100px;" class="column-headers-background">A</th><th id="0C1" style="width:141px;" class="column-headers-background">B</th><th id="0C2" style="width:159px;" class="column-headers-background">C</th><th id="0C3" style="width:336px;" class="column-headers-background">D</th><th id="0C4" style="width:182px;" class="column-headers-background">E</th><th id="0C5" style="width:167px;" class="column-headers-background">F</th><th id="0C6" style="width:151px;" class="column-headers-background">G</th></tr></thead><tbody><tr style="height: 20px"><th id="0R0" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">1</div></th><td></td><td class="s0" dir="ltr">Batch</td><td class="s0" dir="ltr">Name</td><td class="s0" dir="ltr">Price</td><td class="s0" dir="ltr">Recommended Replacement</td><td class="s0" dir="ltr">Depreciation Date</td><td class="s0" dir="ltr">Shutdown Date (earliest)</td></tr><tr style="height: 20px"><th id="0R1" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">2</div></th><td></td><td></td><td></td><td></td><td></td><td class="s0" dir="ltr"></td><td class="s0" dir="ltr"></td></tr><tr style="height: 20px"><th id="0R2" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">3</div></th><td></td><td class="s0" dir="ltr">First (depreciated I)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0301</td><td class="s0" dir="ltr">$0.0015 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R3" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">4</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-0314</td><td class="s0" dir="ltr">$0.03 / 1K input tokens + $0.06 / 1K output tokens</td><td class="s0" dir="ltr">gpt-4-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R4" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">5</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-32k-0314</td><td class="s0" dir="ltr">$0.06 / 1K input tokens + $0.12 / 1K output tokens</td><td class="s0" dir="ltr">gpt-4-32k-0613</td><td class="s1" dir="ltr">June 13, 2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R5" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">6</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R6" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" 
style="line-height: 20px">7</div></th><td></td><td class="s0" dir="ltr">Second (depreciated II)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0613</td><td class="s0" dir="ltr">$0.0015 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s1" dir="ltr">November 6,2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R7" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">8</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-3.5-turbo-16k-0613</td><td class="s0" dir="ltr">$0.0030 / 1K input tokens + $0.0040 / 1K output tokens</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s1" dir="ltr">November 6,2023</td><td class="s1" dir="ltr">June 13, 2024</td></tr><tr style="height: 20px"><th id="0R8" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">9</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R9" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">10</div></th><td></td><td class="s0" dir="ltr">Third (Outdated)</td><td class="s0" dir="ltr">gpt-3.5-turbo-1106</td><td class="s0" dir="ltr">$0.0010 / 1K input tokens + $0.0020 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R10" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">11</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R11" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">12</div></th><td></td><td class="s0" dir="ltr">Fourth (Current)</td><td class="s0" dir="ltr">gpt-3.5-turbo-0125</td><td class="s0" dir="ltr">$0.005 / 1K input tokens + $0.0015 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R12" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">13</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-0613</td><td class="s0" dir="ltr">$0.03 / 1K input tokens + $0.06 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R13" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">14</div></th><td></td><td></td><td class="s0" dir="ltr">gpt-4-32k-0613</td><td class="s0" dir="ltr">$0.06 / 1K input tokens + $0.012 / 1K output tokens</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td><td class="s0" dir="ltr">N/A</td></tr><tr style="height: 20px"><th id="0R14" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">15</div></th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr><tr style="height: 20px"><th id="0R15" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">16</div></th><td></td><td class="s0">Fifth (Future)</td><td class="s0">gpt-4-1106-preview</td><td class="s0">$0.01 / 1K input tokens + $0.03 / 1K output 
tokens</td><td class="s0">N/A</td><td class="s0">N/A</td><td class="s0">N/A</td></tr><tr style="height: 20px"><th id="0R16" style="height: 20px;" class="row-headers-background"><div class="row-header-wrapper" style="line-height: 20px">17</div></th><td></td><td class="s0"></td><td class="s0">gpt-4-0125-preview</td><td class="s0">$0.01 / 1K input tokens + $0.03 / 1K output tokens</td><td class="s0">N/A</td><td class="s0">N/A</td><td class="s0">N/A</td></tr></tbody></table></div>
util/openai_model_info/webpage/resources/sheet.css
DELETED
The diff for this file is too large to render. See raw diff.

webgui.py
CHANGED
The diff for this file is too large to render. See raw diff.