Bikatr7 committed on
Commit
99bd90b
1 Parent(s): 4bc65dc

added readme

  1. README.md +40 -164
README.md CHANGED
@@ -1,19 +1,25 @@
1
  ---------------------------------------------------------------------------------------------------------------------------------------------------
2
  **Table of Contents**
3
 
4
  - [**Notes**](#notes)
5
- - [**Dependencies**](#dependencies)
6
- - [**Quick Start**](#quick-start)
7
- - [**Command Line Interface (CLI)**](#command-line-interface-cli)
8
- - [Usage](#usage)
9
- - [Preprocess Mode](#preprocess-mode)
10
- - [Translate Mode](#translate-mode)
11
- - [Additional Notes](#additional-notes)
12
- - [**Preprocessing**](#preprocessing)
13
  - [**Translator**](#translator)
14
  - [**Translator Settings**](#translator-settings)
15
  - [**Web GUI**](#web-gui)
16
- - [**Hugging Face**](#hugging-face)
17
  - [**License**](#license)
18
  - [**Contact**](#contact)
19
  - [**Acknowledgements**](#acknowledgements)
@@ -21,11 +27,7 @@
21
  ---------------------------------------------------------------------------------------------------------------------------------------------------
22
  ## **Notes**<a name="notes"></a>
23
 
24
- Windows 10 and Linux Mint are the only tested operating systems. Feel free to test on other operating systems and report back to me; I will do my best to fix any issues that arise.
25
-
26
- To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://github.com/Bikatr7/Kudasai/blob/main/lib/gui/HUGGING_FACE_README.md). Further WebGUI documentation can be found there as well.
27
-
28
- Python version: 3.10+
29
 
30
  Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
31
 
@@ -33,170 +35,60 @@ Preprocessor and Translation logic is sourced from external packages, which I al
33
 
34
  Kudasai has a public Trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
35
 
 
 
36
  Kudasai is proud to have been a Backdrop Build v3 Finalist:
37
  https://backdropbuild.com/builds/v3/kudasai
38
 
39
  ---------------------------------------------------------------------------------------------------------------------------------------------------
40
- ## **Dependencies**<a name="dependencies"></a>
41
-
42
- backoff==2.2.1
43
-
44
- gradio==4.20.0
45
-
46
- kairyou==1.5.0
47
-
48
- easytl==0.3.3
49
-
50
- or see requirements.txt
51
-
52
- Also requires spacy's ja_core_news_lg model, which can be installed via the following command:
53
-
54
- ```bash
55
- python -m spacy download ja_core_news_lg
56
- ```
57
-
58
- or on Linux
59
-
60
- ```bash
61
- python3 -m spacy download ja_core_news_lg
62
- ```
63
-
64
- ---------------------------------------------------------------------------------------------------------------------------------------------------
65
- ## **Quick Start**<a name="quick-start"></a>
66
-
67
- Windows is assumed for the rest of this README, but the process should be similar for Linux. This covers the console version; for something less linear, see the [Web GUI](#webgui) section.
68
-
69
- Due to PyPI limitations, you need to install spaCy's Japanese model separately: it is a direct dependency link, which PyPI does not support, so it cannot be included automatically. Do this after installing the requirements.txt file, as it requires Kairyou/spaCy to be installed first.
70
-
71
- ```bash
72
- python -m spacy download ja_core_news_lg
73
- ```
74
-
75
- Simply run Kudasai.py, enter a txt file path to the text you wish to preprocess/translate, and then insert a replacement json file path if you wish to use one. If you do not wish to use a replacement json file, you can simply input a blank space and Kudasai will skip preprocessing and go straight to translation.
76
-
77
- Kudasai will offer to index the text, which is useful for finding new names to add to the replacement json file. This is optional and can be skipped.
78
-
79
- After preprocessing is completed (if triggered), you will be prompted to choose a translation method.
80
-
81
- You can choose between OpenAI, Gemini, and DeepL. Each has its own pros and cons, but OpenAI is the recommended translation method. DeepL and Gemini currently offer free versions, but all three require an API key; you will be prompted to enter it when you choose to run the translation module.
82
-
83
- Next, Kudasai will ask you to confirm its settings. This can be overwhelming, but you can simply enter 1 to confirm and use the default settings. If you wish to change them, you can do so here.
84
-
85
- See the [**Translator Settings**](#translator-settings) section for more information on Kudasai's Translation settings, but default should run fine. Inside the demo folder is a copy of the settings I use to translate COTE should you wish to use them. There is also a demo txt file in the demo folder that you can use to test Kudasai.
86
-
87
- Kudasai will then ask if you want to change your API key; simply enter 2 for now.
88
-
89
- Next, Kudasai will display an estimated cost of translation. This is based on the number of tokens in the preprocessed text, as determined by tiktoken for OpenAI, by Google for Gemini, and by DeepL for DeepL. Kudasai will then prompt for confirmation: enter 1 to run the translation module, or 2 to exit.
90
-
91
- Kudasai will then run the translation module and output the translated text and other logs to the output folder in the same directory as Kudasai.py.
92
-
93
- These files are:
94
-
95
- "debug_log.txt" : A log of crucial information that occurred during Kudasai's run, useful for debugging or reporting issues as well as seeing what was done.
96
-
97
- "error_log.txt" : A log of errors that occurred during Kudasai's run if any, useful for debugging or reporting issues.
98
 
99
- "je_check_text.txt" : A log of the Japanese and English sentences that were paired together, useful for checking the accuracy of the translation and further editing of a machine translation.
100
 
101
- "preprocessed_text.txt" : The preprocessed text, the text output by Kairyou (preprocessor).
102
 
103
- "preprocessing_results.txt" : A log of the results of the preprocessing, shows what was replaced and how many times.
104
 
105
- "translated_text.txt" : The translated text, the text output by Kaiseki or Kijiku.
106
 
107
- Old runs are stored in the archive folder in output as well.
108
-
109
- If you have any questions, comments, or concerns, please feel free to open an issue.
110
 
111
  ---------------------------------------------------------------------------------------------------------------------------------------------------
112
- ## **Command Line Interface (CLI)**<a name="cli"></a>
113
-
114
- Kudasai provides a Command Line Interface (CLI) for preprocessing and translating text files. This section details how to use the CLI, including the required and optional arguments for each mode.
115
-
116
- ### Usage
117
-
118
- The CLI supports two modes: `preprocess` and `translate`. Each mode requires specific arguments to function properly.
119
-
120
- #### Preprocess Mode
121
-
122
- The `preprocess` mode preprocesses the text file using the provided replacement JSON file.
123
-
124
- **Command Structure:**
125
-
126
- ```bash
127
- python path_to_kudasai.py preprocess <input_file> <replacement_json> [<knowledge_base>]
128
- ```
129
 
130
- **Required Arguments:**
131
- - `<input_file>`: Path to the text file to preprocess.
132
- - `<replacement_json>`: Path to the replacement JSON file.
133
 
134
- **Optional Arguments:**
135
- - `<knowledge_base>`: Path to the knowledge base file (directory, file, or text).
136
 
137
- **Example:**
138
 
139
- ```bash
140
- python C:\\path\\to\\kudasai.py preprocess "C:\\path\\to\\input_file.txt" "C:\\path\\to\\replacement_json.json" "C:\\path\\to\\knowledge_base"
141
- ```
142
 
143
- #### Translate Mode
144
 
145
- The `translate` mode translates the text file using the specified translation method.
146
 
147
- **Command Structure:**
148
 
149
- ```bash
150
- python path_to_kudasai.py translate <input_file> <translation_method> [<translation_settings_json>] [<api_key>]
151
- ```
152
-
153
- **Required Arguments:**
154
- - `<input_file>`: Path to the text file to translate.
155
-
156
- **Optional Arguments:**
157
- - `<translation_method>`: Translation method to use (`'deepl'`, `'openai'`, or `'gemini'`). Defaults to `'deepl'`.
158
- - `<translation_settings_json>`: Path to the translation settings JSON file (overrides current settings).
159
- - `<api_key>`: API key for the translation service. If not provided, Kudasai will use the key stored in the settings directory, or prompt for one if none is found.
160
-
161
- **Example:**
162
-
163
- ```bash
164
- python C:\\path\\to\\kudasai.py translate "C:\\path\\to\\input_file.txt" gemini "C:\\path\\to\\translation_settings.json" "YOUR_API_KEY"
165
- ```
166
-
167
- ### Additional Notes
168
- - All arguments should be enclosed in double quotes if they contain spaces. Double quotes are optional and will be stripped. Single quotes are not allowed.
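
The two command structures in the removed CLI section can be sketched with Python's standard `argparse` module. This is only an illustration of the argument layout described above, not the parser `kudasai.py` actually uses; the function name `build_parser` is hypothetical, and `translation_method` is treated as optional with a `deepl` default because the argument list above describes it that way.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented preprocess/translate modes."""
    parser = argparse.ArgumentParser(prog="kudasai")
    modes = parser.add_subparsers(dest="mode", required=True)

    # preprocess <input_file> <replacement_json> [<knowledge_base>]
    pre = modes.add_parser("preprocess")
    pre.add_argument("input_file")
    pre.add_argument("replacement_json")
    pre.add_argument("knowledge_base", nargs="?")

    # translate <input_file> [<translation_method>] [<translation_settings_json>] [<api_key>]
    tl = modes.add_parser("translate")
    tl.add_argument("input_file")
    tl.add_argument(
        "translation_method",
        nargs="?",
        default="deepl",
        choices=["deepl", "openai", "gemini"],
    )
    tl.add_argument("translation_settings_json", nargs="?")
    tl.add_argument("api_key", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because the optional arguments are positional, they have to be supplied in order: an API key can only be given if a method and a settings file come before it.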
169
 
170
  ---------------------------------------------------------------------------------------------------------------------------------------------------
171
 
172
- ## **Preprocessing**<a name="preprocessing"></a>
173
-
174
- Preprocessing is the act of preparing text for translation by replacing certain words or phrases with their translated counterparts.
175
-
176
- Kudasai uses Kairyou for preprocessing, which is a powerful preprocessor that can replace text in a text file based on a json file. This is useful for replacing names, places, and other things that may not translate well or to simply speed up the translation process.
177
-
178
- You can run the preprocessor by using the CLI or simply running kudasai.py as instructed in the [Quick Start](#quick-start) section.
179
-
180
- Many replacement JSON files are included in the jsons folder. You can also make your own, provided they follow the same format. See the example below.
181
- Kudasai/Kairyou works with both Kudasai and Fukuin JSONs; the one below is a Kudasai-type JSON.
182
-
183
- ![Example JSON](https://i.imgur.com/u3FnUia.jpg)
184
 
185
- ---------------------------------------------------------------------------------------------------------------------------------------------------
186
 
187
- ## **Translator**<a name="translator"></a>
188
 
189
- Kudasai uses EasyTL for translation, which is a versatile translation library that uses several translation APIs to translate text.
190
 
191
- Kudasai currently supports OpenAI, Gemini, and DeepL for translation. OpenAI is the recommended translation method, but DeepL and Gemini are also good alternatives.
192
 
193
- You can run the translator by running kudasai.py as instructed in the [Quick Start](#quick-start) section.
194
 
195
- Note that you need an API key for OpenAI, Gemini, and DeepL. You will be prompted to enter this key when you choose to run the translation module.
196
 
197
- The translator has a lot of settings; using the defaults, or the settings file provided in the demo folder, is fine. You can also change settings manually when confirming them, or load a custom JSON as your settings by pressing c at that prompt, with the settings file in the script directory.
198
 
199
- The settings are fairly complex, see the below section [Translator Settings](#translator-settings) for more information.
200
 
201
  ---------------------------------------------------------------------------------------------------------------------------------------------------
202
 
@@ -285,14 +177,8 @@ The settings are fairly complex, see the below section [Translator Settings](#tr
285
 
286
  ## **Web GUI**<a name="webgui"></a>
287
 
288
- Kudasai also offers a Web GUI. It has all the main functionality of the program but in an easier and non-linear way.
289
-
290
- To run the Web GUI, simply run webgui.py which is in the same directory as kudasai.py
291
-
292
  Below are some images of the Web GUI.
293
 
294
- Detailed Documentation for this can be found on the Hugging Face hosted version of Kudasai [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).
295
-
296
  Name Indexing | Kairyou:
297
  ![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
298
 
@@ -311,16 +197,6 @@ Translation Settings Page 2:
311
  Logging Page:
312
  ![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
313
 
314
- ---------------------------------------------------------------------------------------------------------------------------------------------------
315
-
316
- ## **Hugging Face**<a name="huggingface"></a>
317
-
318
- For those who are interested, or simply cannot run Kudasai locally, an instance of Kudasai's WebGUI is hosted on Hugging Face's servers. You can find it [here](https://huggingface.co/spaces/Bikatr7/Kudasai).
319
-
320
- It's a bit slower than running it locally, but it's a good alternative for those who cannot run it locally. The WebGUI on Hugging Face does not save anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, and the output folder is overwritten every time it loads. Archives are deleted every run as well.
321
-
322
- To see the README for the Hugging Face hosted version of Kudasai, please see [here](https://huggingface.co/spaces/Bikatr7/Kudasai/blob/main/README.md).
323
-
324
  ---------------------------------------------------------------------------------------------------------------------------------------------------
325
  ## **License**<a name="license"></a>
326
 
 
1
+ ---
2
+ license: gpl-3.0
3
+ title: Kudasai
4
+ sdk: gradio
5
+ emoji: 🈷️
6
+ python_version: 3.10.0
7
+ app_file: webgui.py
8
+ colorFrom: gray
9
+ colorTo: gray
10
+ short_description: Japanese-English preprocessor with automated translation.
11
+ pinned: true
12
+ ---
13
+
14
  ---------------------------------------------------------------------------------------------------------------------------------------------------
15
  **Table of Contents**
16
 
17
  - [**Notes**](#notes)
18
+ - [**General Usage**](#general-usage)
19
+ - [**Indexing and Preprocessing**](#indexing-and-preprocessing)
20
  - [**Translator**](#translator)
21
  - [**Translator Settings**](#translator-settings)
22
  - [**Web GUI**](#web-gui)
 
23
  - [**License**](#license)
24
  - [**Contact**](#contact)
25
  - [**Acknowledgements**](#acknowledgements)
 
27
  ---------------------------------------------------------------------------------------------------------------------------------------------------
28
  ## **Notes**<a name="notes"></a>
29
 
30
+ This README covers the Hugging Face Space instance of Kudasai's WebGUI and the WebGUI itself. To run Kudasai locally or find other information on the project, please see the [GitHub Page](https://github.com/Bikatr7/Kudasai).
31
 
32
  Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies.
33
 
 
35
 
36
  Kudasai has a public Trello board; you can find it [here](https://trello.com/b/Wsuwr24S/kudasai) to see what I'm working on and what's coming up.
37
 
38
+ The WebGUI on Hugging Face does not save anything between runs, so you will need to download the output files or copy the text out of the WebGUI. API keys are not saved, and the output folder is overwritten every time you run it. Archives are deleted every run as well.
39
+
40
  Kudasai is proud to have been a Backdrop Build v3 Finalist:
41
  https://backdropbuild.com/builds/v3/kudasai
42
 
43
  ---------------------------------------------------------------------------------------------------------------------------------------------------
44
 
45
+ ## **General Usage**<a name="general-usage"></a>
46
 
47
+ Kudasai's WebGUI is fairly easy to understand for general usage. Most incorrect actions are caught by the system, which displays a message explaining how to correct them.
48
 
49
+ Normally, Kudasai would save files to the local system, but on Hugging Face's servers, this is not possible. Instead, you'll have to click the 'Save As' button to download the files to your local system.
50
 
51
+ Or you can click the copy button on the top right of textbox modals to copy the text to your clipboard.
52
 
53
+ For further details, see the sections below.
 
 
54
 
55
  ---------------------------------------------------------------------------------------------------------------------------------------------------
56
 
57
+ ## **Indexing and Preprocessing**<a name="kairyou"></a>
 
 
58
 
59
+ This section can be skipped if you're only interested in translation or do not know what indexing or preprocessing is.
 
60
 
61
+ Indexing is not for everyone; only use it if you have a large amount of previous text and want to flag new names. It can be a very slow and long process, especially on Hugging Face's servers, so a local version of Kudasai is recommended for it.
62
 
63
+ You'll need a txt file or some text to index. You'll also need a knowledge base, which can be either a single txt file or a directory of them, as well as a replacements JSON. Either the Kudasai or Fukuin type works. See [this](https://github.com/Bikatr7/Kairyou?tab=readme-ov-file#kairyou) for further details on replacement JSONs.
 
 
64
 
65
+ Please run indexing before preprocessing; the output is neater that way.
66
 
67
+ For preprocessing, you'll need a txt file or some text to preprocess. You'll also need a replacements JSON. As with indexing, either the Kudasai or Fukuin type works.
68
 
69
+ For both, text is entered in the textbox modals; the output text appears in the first field and the results in the second field.
70
 
71
+ They both have a debug field, but neither module really uses it.
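
As a rough illustration of what replacement-based preprocessing does, the sketch below swaps Japanese names for their English counterparts and counts each replacement, loosely matching the "output text" and "results" fields described above. It assumes a flat `{japanese: english}` mapping, which is a simplification: real Kudasai and Fukuin replacement JSONs are structured differently, and Kairyou's actual logic is not shown in this README. The function name `preprocess` is hypothetical.

```python
def preprocess(text: str, replacements: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Replace each key in `replacements` and report how often it occurred.

    Hypothetical helper, not Kairyou's API.
    """
    counts: dict[str, int] = {}
    # Handle longer keys first so a full name is not clobbered by a shorter
    # name that happens to be its prefix.
    for source in sorted(replacements, key=len, reverse=True):
        occurrences = text.count(source)
        if occurrences:
            text = text.replace(source, replacements[source])
            counts[source] = occurrences
    return text, counts


if __name__ == "__main__":
    names = {"綾小路": "Ayanokouji", "綾小路清隆": "Ayanokouji Kiyotaka", "堀北": "Horikita"}
    output, results = preprocess("綾小路は堀北と話した。綾小路清隆。", names)
    print(output)
    print(results)
```

Plain substring replacement like this ignores grammar and context, so it is only a model of the idea, not of the output quality.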
72
 
73
  ---------------------------------------------------------------------------------------------------------------------------------------------------
74
 
75
+ ## **Translator**<a name="translator"></a>
76
 
77
+ Kudasai currently supports three translation methods: OpenAI's GPT, Google's Gemini, and DeepL.
78
 
79
+ For OpenAI, you'll need an API key, which you can get [here](https://platform.openai.com/docs/api-reference/authentication). This is a paid service with no free tier.
80
 
81
+ For Gemini, you'll also need an API key, which you can get [here](https://ai.google.dev/tutorials/setup). Gemini is free to use under a certain limit: 2 requests per minute (RPM) for 1.5 and 15 RPM for 1.0.
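
For anyone calling a rate-limited API from their own scripts, a request-per-minute cap like the ones quoted above can be respected by spacing calls evenly. The following is a minimal sketch only: `RpmThrottle` is a hypothetical helper, not part of Kudasai or EasyTL, and it says nothing about how either handles rate limits internally. The clock and sleep functions are injectable purely so the behaviour can be checked without waiting.

```python
import time


class RpmThrottle:
    """Space calls so that at most `rpm` of them start per minute (hypothetical helper)."""

    def __init__(self, rpm: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # minimum seconds between two calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay


if __name__ == "__main__":
    throttle = RpmThrottle(rpm=15)  # 15 RPM gives one call every 4 seconds
    for batch in ["first batch", "second batch"]:
        throttle.wait()
        print("would send:", batch)
```

Evenly spacing requests is more conservative than a true sliding-window limit, which trades some throughput for simplicity.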
82
 
83
+ For DeepL, you'll need an API key too, which you can get [here](https://www.deepl.com/pro#developer). DeepL is also a paid service, but it is free under 500k characters a month.
84
 
85
+ I'd recommend using GPT for most things, as it's generally better at translation.
86
 
87
+ Usage is mostly straightforward: choose your translation method, fill in your API key, and select your text. On Hugging Face, you'll also need to add your settings file if you want to tune the output, but the default is generally fine.
88
 
89
+ You can calculate costs here or just translate. Output will show in the appropriate fields.
90
 
91
+ For further details on the settings file, see [here](#translation-with-llms-settings).
92
 
93
  ---------------------------------------------------------------------------------------------------------------------------------------------------
94
 
 
177
 
178
  ## **Web GUI**<a name="webgui"></a>
179
 
180
  Below are some images of the Web GUI.
181
 
 
 
182
  Name Indexing | Kairyou:
183
  ![Name Indexing Screen | Kairyou](https://i.imgur.com/QCPqjrw.jpeg)
184
 
 
197
  Logging Page:
198
  ![Logging Page](https://i.imgur.com/vDPCUQC.jpeg)
199
 
200
  ---------------------------------------------------------------------------------------------------------------------------------------------------
201
  ## **License**<a name="license"></a>
202