---
language:
- sr
tags:
  - Srpski
  - Serbian
  - GPT2
  - generisanje
  - generation
name:
  - Serbian-GPT-2
---

# The Best Generative GPT-2 Model For The Serbian Language

**NOTE**: This model is locked with an encryption key. If you need a decryption key, feel free to contact us at info@edukom.rs

![flag.png](https://cdn-uploads.huggingface.co/production/uploads/64fc6ba4e0dc35986bc3b6ee/gCUs3UIix41opzOu1mkD7.png)

By sharing this model, we aim to foster further research and applications in Serbian language processing.

### Introduction:

This GPT-2 model has been fine-tuned on an extensive Serbian corpus of 750 million tokens. It is designed to generate high-quality Serbian text, capturing the nuances and intricacies of the language.

### Dataset Details: 

The dataset encompasses a diverse range of topics, representing various aspects of the Serbian language and culture. Size: 750 million tokens.

### Model Usage:

This model can be used for NLP tasks such as text generation, summarization, and translation. Thanks to its training on a large Serbian corpus, it produces accurate and contextually relevant output, especially for Serbian-language tasks; a short usage sketch follows the download script below.


### Downloading & Decrypting the Model:

    import os
    import requests
    import shutil
    import threading
    import time
    from transformers import GPT2LMHeadModel
    from cryptography.fernet import Fernet

    # Suppress TensorFlow log noise
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

    # Download the Serbian-GPT-2 model files from the Hugging Face Hub
    print("\nDownloading Serbian-GPT-2 model...")
    model_name = 'edukom/Serbian-GPT-2'
    base_url = f'https://huggingface.co/{model_name}/resolve/main/'
    files_to_download = ['added_tokens.json', 'config.json', 'generation_config.json', 'merges.txt', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json', 'vocab.json']

    cache_dir = 'path/to/where/you/want/to/store/the/model'
    os.makedirs(cache_dir, exist_ok=True)

    for file in files_to_download:
        response = requests.get(base_url + file)
        response.raise_for_status()
        with open(os.path.join(cache_dir, file), 'wb') as f:
            f.write(response.content)

    # Decrypt pytorch_model.bin in place with the key provided by Edukom AI
    key = input("\nEnter the decryption key: ").encode()
    cipher_suite = Fernet(key)

    decryption_data = os.path.join(cache_dir, 'pytorch_model.bin')

    try:
        with open(decryption_data, 'rb') as file:
            encrypted_data = file.read()

        decrypted_data = cipher_suite.decrypt(encrypted_data)

        with open(decryption_data, 'wb') as file:
            file.write(decrypted_data)

        def find_and_copy():
            # from_pretrained() below re-downloads the encrypted weights into the
            # Hugging Face cache layout; wait for the snapshot directory to appear,
            # then overwrite its pytorch_model.bin with the decrypted copy.
            base_snapshot_dir = os.path.join(cache_dir, 'models--edukom--Serbian-GPT-2', 'snapshots')

            while not os.path.exists(base_snapshot_dir):
                time.sleep(0.1)

            while True:
                existing_dirs = [d for d in os.listdir(base_snapshot_dir) if os.path.isdir(os.path.join(base_snapshot_dir, d))]
                if existing_dirs:
                    destination_path = os.path.join(base_snapshot_dir, existing_dirs[0], 'pytorch_model.bin')
                    shutil.copyfile(decryption_data, destination_path)
                    break
                time.sleep(0.1)

        # Start the copy process in parallel
        copy_thread = threading.Thread(target=find_and_copy, name="find_and_copy")
        copy_thread.start()

        # Load the Serbian-GPT-2 model (this triggers the cache download the thread is watching for)
        model = GPT2LMHeadModel.from_pretrained(model_name, cache_dir=cache_dir)

        # Ensure the copying finishes
        copy_thread.join()

        print("\nCongratulations, the Serbian-GPT-2 model is ready for use ヅ\n")

    except Exception as e:
        print(f"\nError during decryption: {e}")
        print("\nYou can decrypt the model by contacting the author of this model who will add the key, email: info@edukom.rs")

    # Now you can use the Serbian-GPT-2 model for further operations...
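
Once the model is decrypted and loaded, it can be used like any other GPT-2 checkpoint. The snippet below is a minimal sketch rather than part of the official instructions: it assumes the `model`, `model_name`, and `cache_dir` variables from the script above are still in scope, and the prompt and sampling parameters are purely illustrative.

    from transformers import GPT2Tokenizer

    # Load the tokenizer files downloaded above from the same cache directory
    tokenizer = GPT2Tokenizer.from_pretrained(model_name, cache_dir=cache_dir)

    # Encode an illustrative Serbian prompt
    input_ids = tokenizer.encode("Beograd je", return_tensors='pt')

    # Sample a continuation; these parameters are common defaults, not tuned values
    output = model.generate(
        input_ids,
        max_length=50,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

    print(tokenizer.decode(output[0], skip_special_tokens=True))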

### Model Usage License:

This model was created by **Edukom AI**. It is protected by encryption, and its use requires a decryption key.

This model is available under the following license:

**For private and non-public use**: This model is freely available for use without any additional obligations. You can use it in your internal projects and experiments without any restrictions.

**For commercial use**: To use this model commercially, users are required to contact **Edukom AI** to obtain the appropriate license and agreement.

Please adhere to the license terms when using this model. For any questions or if you need decryption keys, feel free to contact us at **info@edukom.rs**

Thank you for using our model! ヅ

![Screenshot.png](https://cdn-uploads.huggingface.co/production/uploads/64fc6ba4e0dc35986bc3b6ee/UoIvwAez4ZoiEsHyx-vn6.png)