added readme
README.md
CHANGED
@@ -1,19 +1,25 @@
1 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
2 |
**Table of Contents**
|
3 |
|
4 |
- [**Notes**](#notes)
|
5 |
-
- [**
|
6 |
-
- [**
|
7 |
-
- [**Command Line Interface (CLI)**](#command-line-interface-cli)
|
8 |
-
- [Usage](#usage)
|
9 |
-
- [Preprocess Mode](#preprocess-mode)
|
10 |
-
- [Translate Mode](#translate-mode)
|
11 |
-
- [Additional Notes](#additional-notes)
|
12 |
-
- [**Preprocessing**](#preprocessing)
|
13 |
- [**Translator**](#translator)
|
14 |
- [**Translator Settings**](#translator-settings)
|
15 |
- [**Web GUI**](#web-gui)
|
16 |
-
- [**Hugging Face**](#hugging-face)
|
17 |
- [**License**](#license)
|
18 |
- [**Contact**](#contact)
|
19 |
- [**Acknowledgements**](#acknowledgements)
|
@@ -21,11 +27,7 @@
|
|
21 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
22 |
## **Notes**<a name="notes"></a>
|
23 |
|
24 |
-
|
25 |
-
|
26 |
-
To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://github.com/Bikatr7/Kudasai/blob/main/lib/gui/HUGGING_FACE_README.md). Further WebGUI documentation can be found there as well.
|
27 |
-
|
28 |
-
Python version: 3.10+
|
29 |
|
30 |
Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
|
31 |
|
@@ -33,170 +35,60 @@ Preprocessor and Translation logic is sourced from external packages, which I al
|
|
33 |
|
34 |
Kudasai has a public trello board, you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
|
35 |
|
|
|
|
|
36 |
Kudasai is proud to have been a Backdrop Build v3 Finalist:
|
37 |
https://backdropbuild.com/builds/v3/kudasai
|
38 |
|
39 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
40 |
-
## **Dependencies**<a name="dependencies"></a>
|
41 |
-
|
42 |
-
backoff==2.2.1
|
43 |
-
|
44 |
-
gradio==4.20.0
|
45 |
-
|
46 |
-
kairyou==1.5.0
|
47 |
-
|
48 |
-
easytl==0.3.3
|
49 |
-
|
50 |
-
or see requirements.txt
|
51 |
-
|
52 |
-
Also requires spacy's ja_core_news_lg model, which can be installed via the following command:
|
53 |
-
|
54 |
-
```bash
|
55 |
-
python -m spacy download ja_core_news_lg
|
56 |
-
```
|
57 |
-
|
58 |
-
or on Linux
|
59 |
-
|
60 |
-
```bash
|
61 |
-
python3 -m spacy download ja_core_news_lg
|
62 |
-
```
|
63 |
-
|
64 |
-
---------------------------------------------------------------------------------------------------------------------------------------------------
|
65 |
-
## **Quick Start**<a name="quick-start"></a>
|
66 |
-
|
67 |
-
Windows is assumed for the rest of this README, but the process should be similar for Linux. This is for the console version, for something less linear, see the [Web GUI](#webgui) section.
|
68 |
-
|
69 |
-
Due to PyPI limitations, you need to install spaCy's Japanese model yourself. It cannot be included automatically because it is a direct dependency link, which PyPI does not support. Do this after installing from requirements.txt, since it requires Kairyou/spaCy to be installed first.
|
70 |
-
|
71 |
-
```bash
|
72 |
-
python -m spacy download ja_core_news_lg
|
73 |
-
```
|
74 |
-
|
75 |
-
Simply run Kudasai.py, enter a txt file path to the text you wish to preprocess/translate, and then insert a replacement json file path if you wish to use one. If you do not wish to use a replacement json file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.
|
76 |
-
|
77 |
-
Kudasai will offer to index the text, which is useful for finding new names to add to the replacement json file. This is optional and can be skipped.
|
78 |
-
|
79 |
-
After preprocessing is completed (if triggered), you will be prompted to choose a translation method.
|
80 |
-
|
81 |
-
You can choose between OpenAI, Gemini, and DeepL. Each has its own pros and cons, but OpenAI is the recommended translation method. DeepL and Gemini currently offer free versions, but all three require an API key, which you will be prompted to enter when you choose to run the translation module.
|
82 |
-
|
83 |
-
Next, Kudasai will ask you to confirm its settings. This can be overwhelming, but you can simply enter 1 to confirm and use the default settings. If you wish to change them, you can do so here.
|
84 |
-
|
85 |
-
See the [**Translator Settings**](#translator-settings) section for more information on Kudasai's Translation settings, but default should run fine. Inside the demo folder is a copy of the settings I use to translate COTE should you wish to use them. There is also a demo txt file in the demo folder that you can use to test Kudasai.
|
86 |
-
|
87 |
-
Kudasai will then ask if you want to change your api key, simply enter 2 for now.
|
88 |
-
|
89 |
-
Next, Kudasai displays an estimated cost of translation. The estimate is based on the number of tokens in the preprocessed text, as determined by tiktoken for OpenAI, by Google for Gemini, and by DeepL for DeepL. Kudasai then prompts for confirmation: enter 1 to run the translation module, or 2 to exit.
|
90 |
-
|
91 |
-
Kudasai will then run the translation module and output the translated text and other logs to the output folder in the same directory as Kudasai.py.
|
92 |
-
|
93 |
-
These files are:
|
94 |
-
|
95 |
-
"debug_log.txt" : A log of crucial information that occurred during Kudasai's run, useful for debugging or reporting issues as well as seeing what was done.
|
96 |
-
|
97 |
-
"error_log.txt" : A log of errors that occurred during Kudasai's run if any, useful for debugging or reporting issues.
|
98 |
|
99 |
-
|
100 |
|
101 |
-
|
102 |
|
103 |
-
|
104 |
|
105 |
-
|
106 |
|
107 |
-
|
108 |
-
|
109 |
-
If you have any questions, comments, or concerns, please feel free to open an issue.
|
110 |
|
111 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
112 |
-
## **Command Line Interface (CLI)**<a name="cli"></a>
|
113 |
-
|
114 |
-
Kudasai provides a Command Line Interface (CLI) for preprocessing and translating text files. This section details how to use the CLI, including the required and optional arguments for each mode.
|
115 |
-
|
116 |
-
### Usage
|
117 |
-
|
118 |
-
The CLI supports two modes: `preprocess` and `translate`. Each mode requires specific arguments to function properly.
|
119 |
-
|
120 |
-
#### Preprocess Mode
|
121 |
-
|
122 |
-
The `preprocess` mode preprocesses the text file using the provided replacement JSON file.
|
123 |
-
|
124 |
-
**Command Structure:**
|
125 |
-
|
126 |
-
```bash
|
127 |
-
python path_to_kudasai.py preprocess <input_file> <replacement_json> [<knowledge_base>]
|
128 |
-
```
|
129 |
|
130 |
-
**
|
131 |
-
- `<input_file>`: Path to the text file to preprocess.
|
132 |
-
- `<replacement_json>`: Path to the replacement JSON file.
|
133 |
|
134 |
-
|
135 |
-
- `<knowledge_base>`: Path to the knowledge base file (directory, file, or text).
|
136 |
|
137 |
-
|
138 |
|
139 |
-
|
140 |
-
python C:\\path\\to\\kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
|
141 |
-
```
|
142 |
|
143 |
-
|
144 |
|
145 |
-
|
146 |
|
147 |
-
|
148 |
|
149 |
-
|
150 |
-
python path_to_kudasai.py translate <input_file> <translation_method> [<translation_settings_json>] [<api_key>]
|
151 |
-
```
|
152 |
-
|
153 |
-
**Required Arguments:**
|
154 |
-
- `<input_file>`: Path to the text file to translate.
|
155 |
-
|
156 |
-
**Optional Arguments:**
|
157 |
-
- `<translation_method>`: Translation method to use (`'deepl'`, `'openai'`, or `'gemini'`). Defaults to `'deepl'`.
|
158 |
-
- `<translation_settings_json>`: Path to the translation settings JSON file (overrides current settings).
|
159 |
-
- `<api_key>`: API key for the translation service. If not provided, Kudasai uses the key stored in the settings directory, or prompts for one if none is found.
|
160 |
-
|
161 |
-
**Example:**
|
162 |
-
|
163 |
-
```bash
|
164 |
-
python C:\\path\\to\\kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR_API_KEY"
|
165 |
-
```
|
166 |
-
|
167 |
-
### Additional Notes
|
168 |
-
- Arguments that contain spaces must be enclosed in double quotes. For other arguments, double quotes are optional and will be stripped. Single quotes are not allowed.
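
As a rough illustration of the quoting rules, the translate invocation can be assembled as an argument list in Python, which avoids shell quoting altogether. The helper name `build_translate_command` and the file names are hypothetical; only the argument order is taken from the command structure shown earlier. Because the optional arguments are positional, this sketch assumes an API key is only passed when a settings path is also given.

```python
import sys

# Methods listed in the Optional Arguments description above.
VALID_METHODS = {"deepl", "openai", "gemini"}


def build_translate_command(kudasai_path, input_file, method="deepl",
                            settings_json=None, api_key=None):
    """Return an argv list for Kudasai's translate mode (hypothetical helper)."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown translation method: {method!r}")
    if api_key is not None and settings_json is None:
        # Assumption: arguments are positional, so the key cannot come
        # before the settings path. The real parser may be more lenient.
        raise ValueError("api_key requires settings_json in this sketch")

    command = [sys.executable, kudasai_path, "translate", input_file, method]
    if settings_json is not None:
        command.append(settings_json)
    if api_key is not None:
        command.append(api_key)
    return command


if __name__ == "__main__":
    # A path with a space stays a single list element, so no quotes are needed.
    print(build_translate_command("kudasai.py", "my input.txt", "gemini"))
```

Passing the resulting list to `subprocess.run(command, check=True)` hands each element to Kudasai as one argument, so paths with spaces need no manual double quotes.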
|
169 |
|
170 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
171 |
|
172 |
-
## **
|
173 |
-
|
174 |
-
Preprocessing is the act of preparing text for translation by replacing certain words or phrases with their translated counterparts.
|
175 |
-
|
176 |
-
Kudasai uses Kairyou for preprocessing, which is a powerful preprocessor that can replace text in a text file based on a json file. This is useful for replacing names, places, and other things that may not translate well or to simply speed up the translation process.
|
177 |
-
|
178 |
-
You can run the preprocessor by using the CLI or simply running kudasai.py as instructed in the [Quick Start](#quick-start) section.
|
179 |
-
|
180 |
-
Many replacement JSON files are included in the jsons folder. You can also make your own, provided it follows the same format. See the example below.
|
181 |
-
Kudasai/Kairyou works with both Kudasai and Fukuin JSONs; the one below is a Kudasai-type JSON.
|
182 |
-
|
183 |
-
![Example JSON](https://i.imgur.com/u3FnUia.jpg)
|
184 |
|
185 |
-
|
186 |
|
187 |
-
|
188 |
|
189 |
-
|
190 |
|
191 |
-
|
192 |
|
193 |
-
|
194 |
|
195 |
-
|
196 |
|
197 |
-
|
198 |
|
199 |
-
|
200 |
|
201 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
202 |
|
@@ -285,14 +177,8 @@ The settings are fairly complex, see the below section [Translator Settings](#tr
|
|
285 |
|
286 |
## **Web GUI**<a name="webgui"></a>
|
287 |
|
288 |
-
Kudasai also offers a Web GUI. It has all the main functionality of the program but in an easier and non-linear way.
|
289 |
-
|
290 |
-
To run the Web GUI, simply run webgui.py which is in the same directory as kudasai.py
|
291 |
-
|
292 |
Below are some images of the Web GUI.
|
293 |
|
294 |
-
Detailed Documentation for this can be found on the Hugging Face hosted version of Kudasai [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).
|
295 |
-
|
296 |
Name Indexing | Kairyou:
|
297 |
![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
|
298 |
|
@@ -311,16 +197,6 @@ Translation Settings Page 2:
|
|
311 |
Logging Page:
|
312 |
![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
|
313 |
|
314 |
-
---------------------------------------------------------------------------------------------------------------------------------------------------
|
315 |
-
|
316 |
-
## **Hugging Face**<a name="huggingface"></a>
|
317 |
-
|
318 |
-
For those who are interested, or simply cannot run Kudasai locally, an instance of Kudasai's WebGUI is hosted on Hugging Face's servers. You can find it [here](https://huggingface.co/spaces/Bikatr7/Kudasai).
|
319 |
-
|
320 |
-
It's a bit slower than running it locally, but it's a good alternative for those who cannot. The WebGUI on Hugging Face does not save anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, and the output folder is overwritten every time it loads. Archives are deleted on every run as well.
|
321 |
-
|
322 |
-
To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).
|
323 |
-
|
324 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
325 |
## **License**<a name="license"></a>
|
326 |
|
|
|
1 |
+
---
|
2 |
+
license: gpl-3.0
|
3 |
+
title: Kudasai
|
4 |
+
sdk: gradio
|
5 |
+
emoji: 🈷️
|
6 |
+
python_version: 3.10.0
|
7 |
+
app_file: webgui.py
|
8 |
+
colorFrom: gray
|
9 |
+
colorTo: gray
|
10 |
+
short_description: Japanese-English preprocessor with automated translation.
|
11 |
+
pinned: true
|
12 |
+
---
|
13 |
+
|
14 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
15 |
**Table of Contents**
|
16 |
|
17 |
- [**Notes**](#notes)
|
18 |
+
- [**General Usage**](#general-usage)
|
19 |
+
- [**Indexing and Preprocessing**](#indexing-and-preprocessing)
|
20 |
- [**Translator**](#translator)
|
21 |
- [**Translator Settings**](#translator-settings)
|
22 |
- [**Web GUI**](#web-gui)
|
|
|
23 |
- [**License**](#license)
|
24 |
- [**Contact**](#contact)
|
25 |
- [**Acknowledgements**](#acknowledgements)
|
|
|
27 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
28 |
## **Notes**<a name="notes"></a>
|
29 |
|
30 |
+
This README covers the Hugging Face Space instance of Kudasai's WebGUI and the WebGUI itself. To run Kudasai locally or find more information on the project, please see the [GitHub Page](https://github.com/Bikatr7/Kudasai).
|
31 |
|
32 |
Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
|
33 |
|
|
|
35 |
|
36 |
Kudasai has a public trello board, you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
|
37 |
|
38 |
+
The WebGUI on Hugging Face does not save anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, and the output folder is overwritten every time you run it. Archives are deleted on every run as well.
|
39 |
+
|
40 |
Kudasai is proud to have been a Backdrop Build v3 Finalist:
|
41 |
https://backdropbuild.com/builds/v3/kudasai
|
42 |
|
43 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
44 |
|
45 |
+
## **General Usage**<a name="general-usage"></a>
|
46 |
|
47 |
+
Kudasai's WebGUI is fairly easy to understand for general usage. Most incorrect actions are caught by the system, which displays a message explaining how to correct them.
|
48 |
|
49 |
+
Normally, Kudasai would save files to the local system, but on Hugging Face's servers, this is not possible. Instead, you'll have to click the 'Save As' button to download the files to your local system.
|
50 |
|
51 |
+
Or you can click the copy button on the top right of textbox modals to copy the text to your clipboard.
|
52 |
|
53 |
+
For further details, see the sections below.
|
|
|
|
|
54 |
|
55 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
56 |
|
57 |
+
## **Indexing and Preprocessing**<a name="kairyou"></a>
|
|
|
|
|
58 |
|
59 |
+
This section can be skipped if you're only interested in translation or do not know what indexing or preprocessing is.
|
|
|
60 |
|
61 |
+
Indexing is not for everyone; only use it if you have a large amount of previous text and want to flag new names. It can be a very slow and long process, especially on Hugging Face's servers, so a local version of Kudasai is recommended for it.
|
62 |
|
63 |
+
You'll need a txt file or some text to index. You'll also need a knowledge base, which can be either a single txt file or a directory of them, as well as a replacements JSON. Either the Kudasai or Fukuin type works. See [this](https://github.com/Bikatr7/Kairyou?tab=readme-ov-file#kairyou) for further details on replacement JSONs.
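
To make the "single txt file or a directory of them" rule concrete, here is a minimal sketch of how such a knowledge base path could be resolved into a list of files. This is an illustration only, not Kudasai's or Kairyou's actual code; the function name `collect_knowledge_base` is hypothetical, and the choice to look only at the top level of a directory is an assumption.

```python
from pathlib import Path


def collect_knowledge_base(path):
    """Return the txt files a knowledge base path refers to (illustrative only)."""
    base = Path(path)
    if base.is_file():
        # A single txt file is its own knowledge base.
        return [base] if base.suffix.lower() == ".txt" else []
    if base.is_dir():
        # Assumption: only txt files directly inside the directory count.
        # Sorting keeps the order stable between runs.
        return sorted(
            p for p in base.iterdir()
            if p.is_file() and p.suffix.lower() == ".txt"
        )
    raise FileNotFoundError(f"knowledge base not found: {base}")
```

Calling `collect_knowledge_base("knowledge_base")` on a directory returns its txt files in sorted order, while a path to one txt file returns a one-element list.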
|
|
|
|
|
64 |
|
65 |
+
Please do indexing before preprocessing; the output is neater that way.
|
66 |
|
67 |
+
For preprocessing, you'll need a txt file or some text to preprocess. You'll also need a replacements JSON. As with indexing, either the Kudasai or Fukuin type works.
|
68 |
|
69 |
+
For both, text is put in the textbox modals, with the output text being in the first field, and results being in the second field.
|
70 |
|
71 |
+
They both have a debug field, but neither module really uses it.
|
72 |
|
73 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
74 |
|
75 |
+
## **Translator**<a name="translator"></a>
|
76 |
|
77 |
+
Kudasai currently supports three translation methods: OpenAI's GPT, Google's Gemini, and DeepL.
|
78 |
|
79 |
+
For OpenAI, you'll need an API key; you can get one [here](https://platform.openai.com/docs/api-reference/authentication). This is a paid service with no free tier.
|
80 |
|
81 |
+
For Gemini, you'll also need an API key; you can get one [here](https://ai.google.dev/tutorials/setup). Gemini is free to use under a rate limit: 2 requests per minute (RPM) for 1.5 and 15 RPM for 1.0.
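
The RPM figures translate directly into a minimum gap between requests: 60 / 2 = 30 seconds for 1.5 and 60 / 15 = 4 seconds for 1.0. The sketch below shows that arithmetic and a simple pacer built on it. The names `min_interval_seconds` and `RequestPacer` are hypothetical and this is not how Kudasai itself throttles requests; it only illustrates staying under a free-tier limit.

```python
import time


def min_interval_seconds(requests_per_minute):
    """Smallest gap between requests that stays within an RPM limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


class RequestPacer:
    """Blocks so that consecutive calls to wait() are at least one interval apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(requests_per_minute)
        self._clock = clock    # injectable so the pacer can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `pacer.wait()` before each request keeps a loop under the limit; the first call returns immediately and later calls sleep only for the time still owed.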
|
82 |
|
83 |
+
For DeepL, you'll need an API key too; you can get one [here](https://www.deepl.com/pro#developer). DeepL is also a paid service but is free under 500k characters a month.
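
Before sending a large text, it can help to check it against the 500k figure. The sketch below is a rough pre-check under stated assumptions: `fits_free_tier` is a hypothetical helper, it counts characters with Python's `len`, and DeepL's own billing count may differ, so treat the result as an estimate.

```python
# Monthly free allowance quoted above, in characters.
DEEPL_FREE_MONTHLY_CHARS = 500_000


def fits_free_tier(text, used_this_month=0, limit=DEEPL_FREE_MONTHLY_CHARS):
    """Return (fits, needed, remaining) for a text against a monthly character limit."""
    needed = len(text)                    # assumption: one Python character = one billed character
    remaining = limit - used_this_month   # allowance left before this text is sent
    return needed <= remaining, needed, remaining
```

For example, `fits_free_tier("こんにちは")` reports 5 characters needed against the full allowance, and passing `used_this_month` accounts for text already translated in the same month.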
|
84 |
|
85 |
+
I'd recommend using GPT for most things, as it's generally better at translation.
|
86 |
|
87 |
+
Usage is mostly straightforward: choose your translation method, fill in your API key, and select your text. On Hugging Face, you'll also need to add your settings file if you want to tune the output, but the defaults are generally fine.
|
88 |
|
89 |
+
You can calculate costs here or just translate. Output will show in the appropriate fields.
|
90 |
|
91 |
+
For further details on the settings file, see [here](#translator-settings).
|
92 |
|
93 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
94 |
|
|
|
177 |
|
178 |
## **Web GUI**<a name="webgui"></a>
|
179 |
|
|
|
|
|
|
|
|
|
180 |
Below are some images of the Web GUI.
|
181 |
|
|
|
|
|
182 |
Name Indexing | Kairyou:
|
183 |
![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
|
184 |
|
|
|
197 |
Logging Page:
|
198 |
![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
|
199 |
|
200 |
---------------------------------------------------------------------------------------------------------------------------------------------------
|
201 |
## **License**<a name="license"></a>
|
202 |
|