Improve language tag

Hi! Since the model is multilingual, this PR adds languages other than English to the `language` tag to improve discoverability. Note that the README announces 29 languages, but only 13 are explicitly listed, so I was only able to add those 13.
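
For anyone checking the codes: the 13 entries added to the `language` field are three-letter ISO 639 codes. Here is a minimal Python sketch of how they map to language names and how the resulting YAML list looks. The `LANGUAGES` mapping and the `render_language_block` helper are illustrative names only, not something that exists in the repository.

```python
# Three-letter ISO 639 codes added by this PR, in the order they appear in the diff.
# The English names are included for readability only; they are not part of the metadata.
LANGUAGES = {
    "zho": "Chinese",
    "eng": "English",
    "fra": "French",
    "spa": "Spanish",
    "por": "Portuguese",
    "deu": "German",
    "ita": "Italian",
    "rus": "Russian",
    "jpn": "Japanese",
    "kor": "Korean",
    "vie": "Vietnamese",
    "tha": "Thai",
    "ara": "Arabic",
}


def render_language_block(codes):
    """Render the `language:` list as YAML front-matter lines (hypothetical helper)."""
    return "\n".join(["language:"] + [f"- {code}" for code in codes])


# Dicts preserve insertion order, so the output matches the order used in the diff.
print(render_language_block(LANGUAGES))
```

If the remaining languages are later listed explicitly, they could be appended to the same list.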

Files changed (1)

README.md +208 -196

README.md CHANGED

@@ -1,197 +1,209 @@
----
-license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE
-language:
-- en
-pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-14B
-tags:
-- chat
-- linkedin
-library_name: transformers
----
-# LinkedQwen2.5-14B-Instruct: Fine-tuned LinkedIn Post Generator
-## Model Details
-* **Model Name:** jacobpwarren/LinkedQwen2.5-14B-Instruct
-* **Base Model:** ~[Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)~
-* **Framework:** Built using the Emulate Framework
-* **License:** Apache-2.0 (inherited from base model)
-* **Fine-tuning Focus:** LinkedIn post generation with custom writing style parameters
-* **Languages Supported:** English (primary), plus 28+ languages from base model
-## Model Description
-LinkedQwen2.5-14B-Instruct is a specialized language model fine-tuned on the Qwen2.5-14B-Instruct base for generating high-quality LinkedIn posts with customizable writing styles. The model can produce various post structures while adhering to specific writing patterns, tones, and stylistic features derived from analysis of successful LinkedIn content.
-## Emulate Framework Implementation
-This model is a practical demonstration of the Emulate Framework in action - a methodology that transforms expert knowledge and workflows into defensible AI capabilities that preserve unique competitive advantages. Unlike generic implementations, LinkedQwen2.5-14B-Instruct was built by:
-1. **Reverse-engineering expert workflows** rather than starting with available data
-2. **Capturing writing style fingerprints** including sentence structure, vocabulary patterns, and narrative flow
-3. **Developing differentiated features** that preserve the unique elements of successful LinkedIn content
-4. **Creating end-to-end automation** that completes valuable processes rather than just providing information
-The model represents the full execution of the Emulate process, including workflow decomposition, expert style fingerprinting, feature engineering, and systematic validation against business metrics.
-To learn more about fine-tuning competitively differentiated models, visit https://emulateframework.ai.
-## Intended Use
-This model is designed for:
-* Content creators seeking to generate professional LinkedIn posts
-* Marketing professionals developing social media content
-* Individuals looking to improve their LinkedIn presence with stylistically consistent posts
-* Teams wanting to maintain brand voice across LinkedIn communications
-## Base Model: Qwen2.5-14B-Instruct
-The fine-tuning builds upon Qwen2.5-14B-Instruct, which features:
-* 14.7B parameters (13.1B non-embedding)
-* Causal language model architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-* 48 layers with 40 attention heads for Q and 8 for KV
-* Context length support up to 131,072 tokens with generation capability of 8,192 tokens
-* Significant improvements in instruction following and generating structured outputs
-## Fine-tuning Dataset
-The model was fine-tuned on a dataset of high-performing LinkedIn posts, analyzed for various writing style features including:
-* Sentence structure patterns
-* Vocabulary richness
-* Line break usage
-* Punctuation patterns
-* Bullet styles
-* Topic flow and shifts
-* Narrative structure
-* Pacing and sentiment arcs
-## Prompt Template
-The model accepts stylized prompts in the following format:
-````
-# Request
-Create a LinkedIn post that **[structure type]** **on the topic of**: `[topic]`
-### Key Message
-```
-[opinion] [context]
-```
-### Writing Constraints
-- **Suggested Post Length**: [max_length]
-- **Emoji Usage**: [emoji_usage]
-- **Tone**: [tone]
-### Writing Style Features
-- **Sentence Structure**: [sentence_structure_description]
-- **Vocabulary Usage**: [vocabulary_usage_description]
-- **Common Phrases**: [common_phrases_description]
-- **Section Divider**: [divider_style]
-- **Line Break Usage**: [line_break_description]
-- **Punctuation**: [punctuation_description]
-- **Bullet Styles**: [bullet_styles_description]
-- **Topic Shifts**: [topic_shifts_description]
-- **Narrative Flow**: [narrative_flow_description]
-- **Pacing**: [pacing_description]
-- **Sentiment Arc**: [sentiment_arc_description]
-- **Profanity Level**: [profanity]
-````
-# Parameters and Variables
-### Required Parameters
-| Parameter   | Type       | Description                         | Possible Values                                              |
-|:-----------:|:----------:|:-----------------------------------:|:------------------------------------------------------------:|
-| structure   | String     | Post structure type                 | "instructional", "reflective", "inspirational", "controversial", "insightful", "comparative", "announcement" |
-| topic       | Short text | Main subject of the post            | Open-ended                                                   |
-| opinion     | Short text | The user's viewpoint                | Open-ended                                                   |
-| context     | Long text  | Background information for the post | Open-ended                                                   |
-| max_length  | String     | Target length for the post          | "Up to 750 characters long", "Between 750 and 1,500 characters long", "Between 1,500 and 3,000 characters long" |
-| emoji_usage | String     | Level of emoji inclusion            | "none", "very low", "low", "medium", "high", "extreme"       |
-| tone        | String     | Overall emotional register          | "adventurous", "artistic", "assertive", "authoritative", "bold", "bright", "calm", "capable", "caring", "casual", "charming", "cheerful", "clever", "cocky", "colorful", "comfortable", "conversational", "creative", "daring", "delightful", "detailed", "dramatic", "dry", "eccentric", "elegant", "endearing", "energetic", "engaging", "exciting", "fabulous", "fancy", "fierce", "formal", "friendly", "fun", "futuristic", "glamorous", "honorable", "industrial", "informative", "inspiring", "intense", "inviting", "lively", "natural", "no-nonsense", "persuasive", "playful", "powerful", "professional", "quirky", "rebellious", "reliable", "sarcastic", "savvy", "scholarly", "secure", "serious", "silly", "sleek", "smart", "soothing", "sophisticated", "stable", "stimulating", "strong", "swanky", "tasteful", "thoughtful", "trustworthy", "unconventional", "unique", "upbeat", "versatile", "whimsical", "witty" |
-### Optional Writing Style Features
-| Feature | Type | Description |
-|:-:|:-:|:-:|
-| sentence_structure | Array | Lengths of sentences, analyzed for patterns and described |
-| vocabulary_usage | Float | Ratio of unique words to total words, described in qualitative terms |
-| common_phrases | Array | Distinctive phrases identified in the writing style |
-| divider_style | String | Character pattern used to separate sections |
-| line_breaks | Integer | Count of line breaks, with qualitative description |
-| punctuation_usage | Object | Frequency of various punctuation marks |
-| bullet_styles | String | Type of bullet point formatting |
-| topic_shifts | Array | Indicators of subject changes throughout the post |
-| flow | Array | Narrative progression patterns |
-| pacing | String | Rhythm of the content delivery |
-| sentiment_arc | String | Emotional progression pattern |
-| profanity | String | Level of profanity allowed |
-## Style Feature Descriptions
-### Sentence Structure
-Based on analysis of sentence lengths, classified as:
-* "Short sentences, suggesting brevity and conciseness."
-* "Long and complex sentences, indicating a detailed and elaborate style."
-* "A mix of short and long sentences, showing a balanced style."
-### Vocabulary Usage
-Calculated as a ratio of unique words to total words:
-* 50% unique: "A rich vocabulary, showcasing extensive language use and depth."
-* 35% unique: "A developed vocabulary, indicating a wide range of language and expression."
-* 25% unique: "A normal vocabulary, reflecting a balanced and versatile use of language."
-* 15% unique: "A conservative vocabulary, suggesting a focused and deliberate choice of words."
-* ≤15% unique: "A very narrow vocabulary, highlighting a specific and targeted use of language."
-### Line Break Usage
-Based on frequency and average density:
-* No breaks: "No line breaks, indicating a continuous block of text."
-* Many breaks: "Frequent line breaks, contributing to an easy-to-read structure."
-* Few breaks: "Fewer line breaks, indicating a more compact writing style."
-* Moderate: "A moderate number of line breaks, balancing readability and density."
-### Punctuation
-Analysis of punctuation frequency relative to text length, with descriptions like:
-* "Heavy use of periods/commas/exclamation marks/question marks/semicolons."
-* "Regular use of periods/commas/exclamation marks/question marks/semicolons."
-* "Standard punctuation usage."
-### Bullet Styles
-Categorized as:
-* Various symbol types: "-", "•", "#", "1.", "a.", etc.
-* "Differing Emojis": Using various emojis as bullet points
-* "EmojiBullets": Multiple emojis as bullets
-* "Mixed Bullet Styles": Multiple formatting approaches
-### Topic Shifts
-Based on semantic shift analysis between segments:
-* Dynamic (>0.8): "Dynamic topic shifts, showing a highly versatile and engaging writing style."
-* Regular (>0.6): "Regular topic shifts, reflecting a balanced and varied approach."
-* Moderate (>0.4): "Moderate topic shifts, indicating a well-rounded but focused narrative."
-* Conservative (>0.2): "Conservative topic shifts, suggesting a cautious approach to topic changes."
-* Consistent (≤0.2): "Consistent topic focus, highlighting a deep and thorough exploration of subjects."
-### Narrative Flow
-Captured as a sequence of content structure types:
-* "Introduction/Setup"
-* "Conflict/Resolution Point"
-* "Introduction/Development"
-* "Transition/Reflection"
-### Pacing
-Classified as:
-* "Fast"
-* "Slow"
-* "Variable"
-* "Dynamic"
-* "Moderate"
-* "Short/Not Enough Data" (if <3 sentences)
-### Sentiment Arc
-Progression of emotional tone:
-* "Upward Trend": Increasingly positive
-* "Downward Trend": Increasingly negative
-* "Stable": Consistent emotional tone
-* "Complex/Variable": Multiple shifts
-* "Short/Not Enough Data for Arc": Insufficient for analysis
-## Limitations
-* The model inherits the limitations of the base Qwen2.5-14B-Instruct model
-* Style analysis is most accurate for English-language content
-* Certain combinations of style parameters may produce inconsistent results
-* Performance may vary for highly technical or specialized industry topics
-* Not all writing style features may be represented in generated output with equal fidelity
-## Ethical Considerations
-* The model should not be used to generate misleading or false professional information
-* Users should verify factual claims in generated content before publishing on LinkedIn
-* Consideration should be given to professional norms and cultural sensitivities in different industries and regions
-## Citation
-```
-@misc{LinkedQwen2.5-14B-Instruct,
-    title = {LinkedQwen2.5-14B-Instruct: Fine-tuned LinkedIn Post Generator},
-    url = {https://huggingface.co/jacobpwarren/LinkedQwen2.5-14B-Instruct},
-    author = {Jacob Warren},
-    year = {2025},
-    month = {April}
-}
-@misc{qwen2.5,
-    title = {Qwen2.5: A Party of Foundation Models},
-    url = {https://qwenlm.github.io/blog/qwen2.5/},
-    author = {Qwen Team},
-    month = {September},
-    year = {2024}
-}
 ```

+---
+license: apache-2.0
+license_link: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+pipeline_tag: text-generation
+base_model: Qwen/Qwen2.5-14B
+tags:
+- chat
+- linkedin
+library_name: transformers
+---
+# LinkedQwen2.5-14B-Instruct: Fine-tuned LinkedIn Post Generator
+## Model Details
+* **Model Name:** jacobpwarren/LinkedQwen2.5-14B-Instruct
+* **Base Model:** ~[Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)~
+* **Framework:** Built using the Emulate Framework
+* **License:** Apache-2.0 (inherited from base model)
+* **Fine-tuning Focus:** LinkedIn post generation with custom writing style parameters
+* **Languages Supported:** English (primary), plus 28+ languages from base model
+## Model Description
+LinkedQwen2.5-14B-Instruct is a specialized language model fine-tuned on the Qwen2.5-14B-Instruct base for generating high-quality LinkedIn posts with customizable writing styles. The model can produce various post structures while adhering to specific writing patterns, tones, and stylistic features derived from analysis of successful LinkedIn content.
+## Emulate Framework Implementation
+This model is a practical demonstration of the Emulate Framework in action - a methodology that transforms expert knowledge and workflows into defensible AI capabilities that preserve unique competitive advantages. Unlike generic implementations, LinkedQwen2.5-14B-Instruct was built by:
+1. **Reverse-engineering expert workflows** rather than starting with available data
+2. **Capturing writing style fingerprints** including sentence structure, vocabulary patterns, and narrative flow
+3. **Developing differentiated features** that preserve the unique elements of successful LinkedIn content
+4. **Creating end-to-end automation** that completes valuable processes rather than just providing information
+The model represents the full execution of the Emulate process, including workflow decomposition, expert style fingerprinting, feature engineering, and systematic validation against business metrics.
+To learn more about fine-tuning competitively differentiated models, visit https://emulateframework.ai.
+## Intended Use
+This model is designed for:
+* Content creators seeking to generate professional LinkedIn posts
+* Marketing professionals developing social media content
+* Individuals looking to improve their LinkedIn presence with stylistically consistent posts
+* Teams wanting to maintain brand voice across LinkedIn communications
+## Base Model: Qwen2.5-14B-Instruct
+The fine-tuning builds upon Qwen2.5-14B-Instruct, which features:
+* 14.7B parameters (13.1B non-embedding)
+* Causal language model architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+* 48 layers with 40 attention heads for Q and 8 for KV
+* Context length support up to 131,072 tokens with generation capability of 8,192 tokens
+* Significant improvements in instruction following and generating structured outputs
+## Fine-tuning Dataset
+The model was fine-tuned on a dataset of high-performing LinkedIn posts, analyzed for various writing style features including:
+* Sentence structure patterns
+* Vocabulary richness
+* Line break usage
+* Punctuation patterns
+* Bullet styles
+* Topic flow and shifts
+* Narrative structure
+* Pacing and sentiment arcs
+## Prompt Template
+The model accepts stylized prompts in the following format:
+````
+# Request
+Create a LinkedIn post that **[structure type]** **on the topic of**: `[topic]`
+### Key Message
+```
+[opinion] [context]
+```
+### Writing Constraints
+- **Suggested Post Length**: [max_length]
+- **Emoji Usage**: [emoji_usage]
+- **Tone**: [tone]
+### Writing Style Features
+- **Sentence Structure**: [sentence_structure_description]
+- **Vocabulary Usage**: [vocabulary_usage_description]
+- **Common Phrases**: [common_phrases_description]
+- **Section Divider**: [divider_style]
+- **Line Break Usage**: [line_break_description]
+- **Punctuation**: [punctuation_description]
+- **Bullet Styles**: [bullet_styles_description]
+- **Topic Shifts**: [topic_shifts_description]
+- **Narrative Flow**: [narrative_flow_description]
+- **Pacing**: [pacing_description]
+- **Sentiment Arc**: [sentiment_arc_description]
+- **Profanity Level**: [profanity]
+````
+# Parameters and Variables
+### Required Parameters
+| Parameter   | Type       | Description                         | Possible Values                                              |
+|:-----------:|:----------:|:-----------------------------------:|:------------------------------------------------------------:|
+| structure   | String     | Post structure type                 | "instructional", "reflective", "inspirational", "controversial", "insightful", "comparative", "announcement" |
+| topic       | Short text | Main subject of the post            | Open-ended                                                   |
+| opinion     | Short text | The user's viewpoint                | Open-ended                                                   |
+| context     | Long text  | Background information for the post | Open-ended                                                   |
+| max_length  | String     | Target length for the post          | "Up to 750 characters long", "Between 750 and 1,500 characters long", "Between 1,500 and 3,000 characters long" |
+| emoji_usage | String     | Level of emoji inclusion            | "none", "very low", "low", "medium", "high", "extreme"       |
+| tone        | String     | Overall emotional register          | "adventurous", "artistic", "assertive", "authoritative", "bold", "bright", "calm", "capable", "caring", "casual", "charming", "cheerful", "clever", "cocky", "colorful", "comfortable", "conversational", "creative", "daring", "delightful", "detailed", "dramatic", "dry", "eccentric", "elegant", "endearing", "energetic", "engaging", "exciting", "fabulous", "fancy", "fierce", "formal", "friendly", "fun", "futuristic", "glamorous", "honorable", "industrial", "informative", "inspiring", "intense", "inviting", "lively", "natural", "no-nonsense", "persuasive", "playful", "powerful", "professional", "quirky", "rebellious", "reliable", "sarcastic", "savvy", "scholarly", "secure", "serious", "silly", "sleek", "smart", "soothing", "sophisticated", "stable", "stimulating", "strong", "swanky", "tasteful", "thoughtful", "trustworthy", "unconventional", "unique", "upbeat", "versatile", "whimsical", "witty" |
+### Optional Writing Style Features
+| Feature | Type | Description |
+|:-:|:-:|:-:|
+| sentence_structure | Array | Lengths of sentences, analyzed for patterns and described |
+| vocabulary_usage | Float | Ratio of unique words to total words, described in qualitative terms |
+| common_phrases | Array | Distinctive phrases identified in the writing style |
+| divider_style | String | Character pattern used to separate sections |
+| line_breaks | Integer | Count of line breaks, with qualitative description |
+| punctuation_usage | Object | Frequency of various punctuation marks |
+| bullet_styles | String | Type of bullet point formatting |
+| topic_shifts | Array | Indicators of subject changes throughout the post |
+| flow | Array | Narrative progression patterns |
+| pacing | String | Rhythm of the content delivery |
+| sentiment_arc | String | Emotional progression pattern |
+| profanity | String | Level of profanity allowed |
+## Style Feature Descriptions
+### Sentence Structure
+Based on analysis of sentence lengths, classified as:
+* "Short sentences, suggesting brevity and conciseness."
+* "Long and complex sentences, indicating a detailed and elaborate style."
+* "A mix of short and long sentences, showing a balanced style."
+### Vocabulary Usage
+Calculated as a ratio of unique words to total words:
+* 50% unique: "A rich vocabulary, showcasing extensive language use and depth."
+* 35% unique: "A developed vocabulary, indicating a wide range of language and expression."
+* 25% unique: "A normal vocabulary, reflecting a balanced and versatile use of language."
+* 15% unique: "A conservative vocabulary, suggesting a focused and deliberate choice of words."
+* ≤15% unique: "A very narrow vocabulary, highlighting a specific and targeted use of language."
+### Line Break Usage
+Based on frequency and average density:
+* No breaks: "No line breaks, indicating a continuous block of text."
+* Many breaks: "Frequent line breaks, contributing to an easy-to-read structure."
+* Few breaks: "Fewer line breaks, indicating a more compact writing style."
+* Moderate: "A moderate number of line breaks, balancing readability and density."
+### Punctuation
+Analysis of punctuation frequency relative to text length, with descriptions like:
+* "Heavy use of periods/commas/exclamation marks/question marks/semicolons."
+* "Regular use of periods/commas/exclamation marks/question marks/semicolons."
+* "Standard punctuation usage."
+### Bullet Styles
+Categorized as:
+* Various symbol types: "-", "•", "#", "1.", "a.", etc.
+* "Differing Emojis": Using various emojis as bullet points
+* "EmojiBullets": Multiple emojis as bullets
+* "Mixed Bullet Styles": Multiple formatting approaches
+### Topic Shifts
+Based on semantic shift analysis between segments:
+* Dynamic (>0.8): "Dynamic topic shifts, showing a highly versatile and engaging writing style."
+* Regular (>0.6): "Regular topic shifts, reflecting a balanced and varied approach."
+* Moderate (>0.4): "Moderate topic shifts, indicating a well-rounded but focused narrative."
+* Conservative (>0.2): "Conservative topic shifts, suggesting a cautious approach to topic changes."
+* Consistent (≤0.2): "Consistent topic focus, highlighting a deep and thorough exploration of subjects."
+### Narrative Flow
+Captured as a sequence of content structure types:
+* "Introduction/Setup"
+* "Conflict/Resolution Point"
+* "Introduction/Development"
+* "Transition/Reflection"
+### Pacing
+Classified as:
+* "Fast"
+* "Slow"
+* "Variable"
+* "Dynamic"
+* "Moderate"
+* "Short/Not Enough Data" (if <3 sentences)
+### Sentiment Arc
+Progression of emotional tone:
+* "Upward Trend": Increasingly positive
+* "Downward Trend": Increasingly negative
+* "Stable": Consistent emotional tone
+* "Complex/Variable": Multiple shifts
+* "Short/Not Enough Data for Arc": Insufficient for analysis
+## Limitations
+* The model inherits the limitations of the base Qwen2.5-14B-Instruct model
+* Style analysis is most accurate for English-language content
+* Certain combinations of style parameters may produce inconsistent results
+* Performance may vary for highly technical or specialized industry topics
+* Not all writing style features may be represented in generated output with equal fidelity
+## Ethical Considerations
+* The model should not be used to generate misleading or false professional information
+* Users should verify factual claims in generated content before publishing on LinkedIn
+* Consideration should be given to professional norms and cultural sensitivities in different industries and regions
+## Citation
+```
+@misc{LinkedQwen2.5-14B-Instruct,
+    title = {LinkedQwen2.5-14B-Instruct: Fine-tuned LinkedIn Post Generator},
+    url = {https://huggingface.co/jacobpwarren/LinkedQwen2.5-14B-Instruct},
+    author = {Jacob Warren},
+    year = {2025},
+    month = {April}
+}
+@misc{qwen2.5,
+    title = {Qwen2.5: A Party of Foundation Models},
+    url = {https://qwenlm.github.io/blog/qwen2.5/},
+    author = {Qwen Team},
+    month = {September},
+    year = {2024}
+}
 ```
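
As a side note for reviewers who want to try the prompt template shown in the README diff above, here is a minimal Python sketch that fills in the required placeholders. The `build_prompt` function and its argument names are hypothetical: they mirror the placeholder names in the template and are not an API shipped with the model. How the `structure` value is worded inside the sentence is also an assumption, and the optional style features are passed through as pre-written description strings.

```python
FENCE = "`" * 3  # builds the code-fence marker without nesting a literal fence here


def build_prompt(structure, topic, opinion, context,
                 max_length, emoji_usage, tone, style_features=None):
    """Fill the README prompt template (hypothetical helper, not part of the model repo)."""
    lines = [
        "# Request",
        # Assumption: the structure value is inserted verbatim into the sentence.
        f"Create a LinkedIn post that **{structure}** **on the topic of**: `{topic}`",
        "### Key Message",
        FENCE,
        f"{opinion} {context}",
        FENCE,
        "### Writing Constraints",
        f"- **Suggested Post Length**: {max_length}",
        f"- **Emoji Usage**: {emoji_usage}",
        f"- **Tone**: {tone}",
    ]
    if style_features:
        # Keys are the bold labels from the template, values are the description strings.
        lines.append("### Writing Style Features")
        lines += [f"- **{label}**: {desc}" for label, desc in style_features.items()]
    return "\n".join(lines)


prompt = build_prompt(
    structure="instructional",
    topic="remote onboarding",
    opinion="Structured onboarding matters more than perks.",
    context="Our team doubled in size last year while fully remote.",
    max_length="Up to 750 characters long",
    emoji_usage="low",
    tone="professional",
    style_features={"Pacing": "Moderate", "Sentiment Arc": "Upward Trend"},
)
print(prompt)
```

The values used for `max_length`, `emoji_usage`, `tone`, `Pacing`, and `Sentiment Arc` are taken from the allowed values listed in the README; the topic, opinion, and context strings are made up for illustration.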