Upload folder using huggingface_hub

Files changed (7)

config.json +54 -0
format.txt +1 -0
generation_config.json +14 -0
model.safetensors +3 -0
prompt_v2.md +66 -0
trainer_state.json +0 -0
training_args.bin +3 -0

config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 896,
+  "initializer_range": 0.02,
+  "intermediate_size": 4864,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "max_window_layers": 21,
+  "model_type": "qwen2",
+  "num_attention_heads": 14,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.53.1",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}

format.txt ADDED

	@@ -0,0 +1 @@


+ prompt = f"{PROMPT}{input_text}\n### Romaji Output\n"
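A minimal sketch of how the one-line template in format.txt might be used. It assumes `PROMPT` holds the full text of prompt_v2.md; the function name `build_prompt` and the stand-in template string are hypothetical and not part of this commit.

```python
def build_prompt(prompt_template: str, input_text: str) -> str:
    """Assemble the model input exactly as the f-string in format.txt does.

    prompt_template is assumed to be the contents of prompt_v2.md, which ends
    with the "### **Japanese Input**" heading. Whether that file ends with a
    trailing newline is not visible in the diff, so the template is used as-is.
    """
    return f"{prompt_template}{input_text}\n### Romaji Output\n"


# Stand-in for the real prompt text, used only for illustration.
PROMPT = "### **Japanese Input**\n"
example = build_prompt(PROMPT, "東京")
print(example)
```

The trailing `### Romaji Output` line leaves the model to continue with the romanized text, so generation can stop at the first end-of-sequence token.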

generation_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.1,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "4.53.1"
+}
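To make the sampling settings above concrete, here is a pure-Python sketch of the textbook formulation of repetition penalty, temperature, top-k, and top-p filtering, using the values from generation_config.json as defaults. It is not the library's code: the function names are hypothetical, and the order of operations is an assumption about how these settings are usually combined.

```python
import math


def apply_repetition_penalty(logits, seen_ids, penalty=1.1):
    """Make already-generated tokens less likely (CTRL-style penalty)."""
    out = list(logits)
    for i in set(seen_ids):
        # Positive logits shrink, negative logits grow more negative.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_id: probability} for the tokens that survive filtering."""
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token ids, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    order = order[:top_k]
    # Softmax over the kept tokens (shifted by the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    # Top-p: keep the smallest prefix whose cumulative probability
    # reaches top_p, then renormalize.
    kept, cumulative = {}, 0.0
    for i, w in zip(order, weights):
        kept[i] = w / total
        cumulative += kept[i]
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


survivors = filter_probs([2.0, 1.0, 0.0, -1.0])
print(sorted(survivors))
```

With a temperature below 1 the distribution is sharpened before the top-k and top-p cuts, which suits a task like romanization where there is usually one correct output.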

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eccb85380b3a542881a1411ed94874658dc52f943ff9da0eb652b583d5bd9f2b
+size 988097824
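As a sanity check, the LFS size above can be compared against a parameter count derived from config.json. The sketch below assumes the usual Qwen2 layout (biases on the query, key, and value projections only, a three-matrix gated MLP, two RMSNorm weights per layer plus a final norm, and a tied output head); those layout details are assumptions, not something this diff states.

```python
def qwen2_param_count(hidden, intermediate, layers, heads, kv_heads, vocab,
                      tied_embeddings=True):
    """Estimate the parameter count of a Qwen2-style decoder."""
    head_dim = hidden // heads          # 896 // 14 = 64
    kv_dim = kv_heads * head_dim        # 2 * 64 = 128 (grouped-query attention)
    attention = (
        (hidden * hidden + hidden)          # q_proj weight + bias
        + 2 * (hidden * kv_dim + kv_dim)    # k_proj and v_proj, each with bias
        + hidden * hidden                   # o_proj, assumed bias-free
    )
    mlp = 3 * hidden * intermediate     # gate, up, and down projections
    norms = 2 * hidden                  # two RMSNorm weights per layer
    embeddings = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    return embeddings + layers * (attention + mlp + norms) + hidden + lm_head


# Values copied from config.json in this commit.
params = qwen2_param_count(hidden=896, intermediate=4864, layers=24,
                           heads=14, kv_heads=2, vocab=151936)
bf16_bytes = params * 2                 # torch_dtype is bfloat16: 2 bytes each
print(params)                           # 494032768
print(988097824 - bf16_bytes)           # 32288 bytes not explained by weights
```

Under these assumptions the weights account for all but about 32 KB of the file, which is consistent with the remainder being the safetensors metadata header.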

prompt_v2.md ADDED

	@@ -0,0 +1,66 @@

+Japanese to Modified Hepburn Romaji Conversion
+You are an expert linguist AI specializing in the precise romanization of the Japanese language. Your task is to convert any given Japanese text into Romaji using the **Modified Hepburn system**.
+You must follow a strict, two-phase process for every conversion to ensure maximum accuracy.
+---
+## **Phase 1: Convert Input Text to a Standardized Kana String**
+Before romanizing, you must first process the input text into a phonetically accurate string composed only of Hiragana and Katakana.
+1.  **Kanji-to-Hiragana Conversion:**
+    * Convert all Kanji characters and compounds (*jukugo*) into their correct Hiragana readings.
+    * You must use contextual analysis to select the correct `on'yomi` or `kun'yomi`.
+    * Pay close attention to irregular readings for compounds like `今日` (きょう) and `大人` (おとな).
+2.  **Apply Phonetic Rules:**
+    * **Rendaku (Sequential Voicing):** Apply voicing to the initial consonant of the second element in a compound where appropriate (e.g., `手紙` becomes `てがみ`).
+    * **Okurigana:** Correctly read the Kanji stem based on its accompanying Hiragana endings (e.g., `食べる` is `たべる`, not `しょくべる`).
+3.  **Preserve Katakana:**
+    * Do **not** convert existing Katakana to Hiragana.
+    * Maintain all Katakana used for foreign loanwords (`コンピューター`), onomatopoeia (`ドキドキ`), scientific terms, or emphasis.
+The result of this phase should be an intermediate, phonetically pure Kana string.
+---
+## **Phase 2: Convert the Kana String to Modified Hepburn Romaji**
+Using the standardized Kana string from Phase 1, apply the following rules precisely.
+1.  **Standard Romanization:** Convert each Kana character based on the standard Hepburn table (`か` -> `ka`, `し` -> `shi`, `つ` -> `tsu`, etc.).
+2.  **Long Vowels (Chōonpu):**
+    * Use a **macron** to indicate a long vowel.
+    * `おう` or `おお` → `ō` (e.g., `とうきょう` → `Tōkyō`)
+    * `うう` → `ū` (e.g., `くうき` → `kūki`)
+    * `ええ` → `ē` (e.g., `ええ` → `ē`)
+    * The Katakana long vowel mark `ー` also indicates a macron (e.g., `セーター` → `sētā`).
+    * **Crucial Exception:** Romanize `えい` as `ei`, not `ē` (e.g., `せんせい` → `sensei`).
+3.  **Double Consonants (Sokuon `っ`):**
+    * Double the consonant of the following syllable (e.g., `きって` → `kitte`).
+    * **Exception:** When preceding `ち` (chi), use `tch` (e.g., `まっちゃ` → `matcha`).
+4.  **The Syllabic 'n' (`ん`):**
+    * Before consonants, it is always `n` (e.g., `しんぶん` -> `shinbun`).
+    * **Before vowels or the letter 'y', you MUST use an apostrophe** to separate the sounds (e.g., `しんよう` → `shin'yō`; `かんい` → `kan'i`).
+5.  **Special Particle Romanization:**
+    * The particle `は` must be romanized as `wa`.
+    * The particle `へ` must be romanized as `e`.
+    * The particle `を` must be romanized as `o`.
+6.  **To lower case:**
+    * Convert all text to lower case.
+---
+### **Execution Task**
+Now, apply this two-phase process to the following text. Provide only the final Romaji output.
+### **Japanese Input**
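The Phase 2 rules in prompt_v2.md can be approximated by a small rule-based converter, which may be useful as a baseline when spot-checking model output. This is a hedged sketch, not the method used to build the training data: all names are hypothetical, only basic kana are covered, Phase 1 (kanji readings) is not attempted, and the particle rules for `は` and `へ` are skipped because they need word-level analysis. Like the prompt, it merges long vowels naively, without regard to morpheme boundaries.

```python
# Consonant prefix -> the five kana of that row, in a-i-u-e-o order.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
KANA = {k: c + v for c, row in ROWS.items() for k, v in zip(row, "aiueo")}
KANA.update({"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji",
             "ぢ": "ji", "づ": "zu", "や": "ya", "ゆ": "yu", "よ": "yo",
             "わ": "wa", "を": "o"})

# Digraphs such as きょ (kyo) and ちゃ (cha), built from the i-column kana.
DIGRAPHS = {}
for kana, roma in list(KANA.items()):
    if len(roma) > 1 and roma.endswith("i"):
        stem = roma[:-1]
        for small, vowel in zip("ゃゅょ", "auo"):
            glide = "" if stem in ("sh", "ch", "j") else "y"
            DIGRAPHS[kana + small] = stem + glide + vowel

MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}
# A bare vowel (key) lengthens a preceding syllable ending in one of the
# listed vowels: おう/おお -> ō, うう -> ū, ええ -> ē. えい is left as "ei".
LENGTHENS = {"u": "ou", "o": "o", "e": "e"}


def _lengthen(out):
    """Put a macron on the final vowel of the last syllable, if any."""
    if out and out[-1][-1] in MACRON:
        out[-1] = out[-1][:-1] + MACRON[out[-1][-1]]


def to_romaji(kana_text):
    # Fold Katakana onto Hiragana so one table serves both scripts.
    text = "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c
                   for c in kana_text)
    out, geminate, i = [], False, 0
    while i < len(text):
        pair, ch = text[i:i + 2], text[i]
        known = pair in DIGRAPHS or ch in KANA
        i += 2 if pair in DIGRAPHS else 1
        if ch == "っ":                  # sokuon: double the next consonant
            geminate = True
            continue
        if ch == "ー":                  # long vowel mark
            _lengthen(out)
            continue
        syl = DIGRAPHS.get(pair) or KANA.get(ch) or ("n" if ch == "ん" else ch)
        if geminate and known and syl[0] not in "aiueo":
            syl = ("t" if syl.startswith("ch") else syl[0]) + syl
        geminate = False
        if out and out[-1] == "n" and syl[0] in "aiueoy":
            out[-1] = "n'"              # syllabic n before a vowel or y
        if syl in LENGTHENS and out and out[-1][-1] in LENGTHENS[syl]:
            _lengthen(out)
        else:
            out.append(syl)
    return "".join(out)


print(to_romaji("とうきょう"), to_romaji("まっちゃ"), to_romaji("しんよう"))
```

Characters outside the table, such as kanji or punctuation, pass through unchanged, so any kanji left in the output signals input that still needs the Phase 1 reading step.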

trainer_state.json ADDED

The diff for this file is too large to render.

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28af7cd5804fb4411da8a541d6770cc312addee0ea1e41536a579c0c2fb330c9
+size 5368