jdforaging committed on
Commit
6eca1eb
·
1 Parent(s): b7ea942

added new jupyter notebook

Files changed (1)
  1. 00-HF-Setup.ipynb +303 -0
00-HF-Setup.ipynb ADDED
@@ -0,0 +1,303 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "131de9a7-0245-4e12-9554-c8f91eec2d21",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Hugging Face Setup\n",
9
+ "\n",
10
+ "Let's quickly make sure HF is set up and that you are able to access downloads from Hugging Face Hub using your Token."
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "id": "2e0b59f0-1180-4297-94dc-b26115761264",
16
+ "metadata": {},
17
+ "source": [
18
+ "## Python Libraries Install:\n",
19
+ "\n",
20
+ "Note that we use various versions of these libraries throughout the course. Make sure to watch the video to know which version to use!"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": 1,
26
+ "id": "9a2f2dca-fefc-4f36-91f3-4dbcc07345a6",
27
+ "metadata": {},
28
+ "outputs": [],
29
+ "source": [
30
+ "# !pip install transformers diffusers datasets evaluate accelerate"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": 2,
36
+ "id": "ac9d74b6-a136-4858-ab8b-1e7791993380",
37
+ "metadata": {},
38
+ "outputs": [],
39
+ "source": [
40
+ "from huggingface_hub import notebook_login"
41
+ ]
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": 3,
46
+ "id": "0899dcf9-7504-4097-beb8-c42a519305c3",
47
+ "metadata": {},
48
+ "outputs": [
49
+ {
50
+ "data": {
51
+ "application/vnd.jupyter.widget-view+json": {
52
+ "model_id": "92c0780a54384af9ae5d07a3f5d4766d",
53
+ "version_major": 2,
54
+ "version_minor": 0
55
+ },
56
+ "text/plain": [
57
+ "VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
58
+ ]
59
+ },
60
+ "metadata": {},
61
+ "output_type": "display_data"
62
+ }
63
+ ],
64
+ "source": [
65
+ "notebook_login()"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": 5,
71
+ "id": "7d706a5b-456f-41d1-83a9-98fa397ce90a",
72
+ "metadata": {},
73
+ "outputs": [],
74
+ "source": [
75
+ "from huggingface_hub import scan_cache_dir\n",
76
+ "\n",
77
+ "hf_cache_info = scan_cache_dir()\n",
78
+ "# print(hf_cache_info)"
79
+ ]
80
+ },
81
+ {
82
+ "cell_type": "markdown",
83
+ "id": "0d0cc79a-6eb3-44c5-b98b-5deb40acd62b",
84
+ "metadata": {},
85
+ "source": [
86
+ "When you work with Hugging Face's Python libraries, such as the `transformers` library, you'll often download pre-trained models and datasets. These downloaded files are stored locally on your machine to avoid repeated downloads and to ensure quick access in future uses. Let's explore where and how these files are stored.\n",
87
+ "\n",
88
+ "## Where Are Hugging Face Models Stored?\n",
89
+ "\n",
90
+ "By default, Hugging Face stores downloaded models in a directory under your home directory. Specifically, it uses a hidden folder named `.cache`. The typical path looks like this:\n",
91
+ "\n",
92
+ "- On Unix-based systems (Linux, macOS):\n",
93
+ " ```\n",
94
+ " ~/.cache/huggingface/ \n",
95
+ " ```\n",
96
+ "\n",
97
+ "- On Windows systems:\n",
98
+ " ```\n",
99
+ " C:\\Users\\<YourUsername>\\.cache\\huggingface\\ \n",
100
+ " ```\n",
101
+ " \n",
102
+ "**NOTE:** `.cache` is a dot-folder, which macOS and Linux hide by default. You will need to make hidden files visible to see it!\n",
103
+ "\n",
104
+ "----\n",
105
+ "\n",
106
+ "Hidden directories are often used to store configuration files and caches. These directories are typically not shown in default file explorer views. Here’s how you can view hidden directories on different operating systems:\n",
107
+ "\n",
108
+ "## Viewing Hidden Directories on Different Operating Systems\n",
109
+ "\n",
110
+ "### macOS\n",
111
+ "\n",
112
+ "On macOS, hidden directories and files (those starting with a dot, such as `.cache`) can be made visible in Finder:\n",
113
+ "\n",
114
+ "1. **Using Finder:**\n",
115
+ " - Open Finder.\n",
116
+ " - Press `Command + Shift + .` (period). This will toggle the visibility of hidden files and directories.\n",
117
+ "\n",
118
+ "2. **Using Terminal:**\n",
119
+ " - Open Terminal.\n",
120
+ " - To list hidden files in a directory, use the following command:\n",
121
+ " ```bash\n",
122
+ " ls -la\n",
123
+ " ```\n",
124
+ " - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
125
+ "\n",
126
+ "### Linux\n",
127
+ "\n",
128
+ "On Linux, hidden files and directories can be viewed in the file manager or terminal:\n",
129
+ "\n",
130
+ "1. **Using File Manager (e.g., Nautilus):**\n",
131
+ " - Open your file manager.\n",
132
+ " - Press `Ctrl + H`. This will toggle the visibility of hidden files and directories.\n",
133
+ "\n",
134
+ "2. **Using Terminal:**\n",
135
+ " - Open Terminal.\n",
136
+ " - To list hidden files in a directory, use the following command:\n",
137
+ " ```bash\n",
138
+ " ls -la\n",
139
+ " ```\n",
140
+ " - The `-a` flag shows all files, including hidden ones, and the `-l` flag gives a detailed listing.\n",
141
+ "\n",
142
+ "### Windows\n",
143
+ "\n",
144
+ "On Windows, hidden files and directories can be viewed in File Explorer:\n",
145
+ "\n",
146
+ "1. **Using File Explorer:**\n",
147
+ " - Open File Explorer.\n",
148
+ " - Click on the `View` tab at the top.\n",
149
+ " - Check the box for `Hidden items` in the Show/hide group. This will toggle the visibility of hidden files and directories.\n",
150
+ "\n",
151
+ "2. **Using Command Prompt:**\n",
152
+ " - Open Command Prompt.\n",
153
+ " - To list hidden files in a directory, use the following command:\n",
154
+ " ```cmd\n",
155
+ " dir /a\n",
156
+ " ```\n",
157
+ " - The `/a` flag lists all files, including hidden ones.\n",
158
+ "\n",
159
+ "## Summary\n",
160
+ "\n",
161
+ "Viewing hidden directories on different operating systems is straightforward:\n",
162
+ "\n",
163
+ "- **macOS:** Press `Command + Shift + .` in Finder or use `ls -la` in Terminal.\n",
164
+ "- **Linux:** Press `Ctrl + H` in the file manager or use `ls -la` in Terminal.\n",
165
+ "- **Windows:** Check `Hidden items` in File Explorer’s View tab or use `dir /a` in Command Prompt.\n",
166
+ "\n",
167
+ "These methods allow you to easily access and manage hidden files and directories on your system.\n",
168
+ "\n",
169
+ "----\n",
170
+ "\n",
171
+ "**Ok, let's move on, back to Hugging Face topics!**"
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "markdown",
176
+ "id": "b5346ae4-c22c-4bc6-ab7d-21124699bce8",
177
+ "metadata": {},
178
+ "source": [
179
+ "The Hugging Face Python libraries store downloaded models in a centralized cache directory. This cache system is designed to be shared across various libraries that depend on the Hugging Face Hub. Here is a detailed explanation of where and how these models are stored:\n",
180
+ "\n",
181
+ "## Cache Directory Structure\n",
182
+ "\n",
183
+ "The cache directory is typically located in the user's home directory, but it can be customized using the `cache_dir` argument in methods or by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables. The structure of the cache directory is as follows:\n",
184
+ "\n",
185
+ "```\n",
186
+ "<CACHE_DIR>\n",
187
+ "├─ <MODELS>\n",
188
+ "├─ <DATASETS>\n",
189
+ "├─ <SPACES>\n",
190
+ "```\n",
191
+ "\n",
192
+ "Within these main folders, the cache is further organized by repository type, namespace (if applicable), and repository name. For example:\n",
193
+ "\n",
194
+ "```\n",
195
+ "<CACHE_DIR>\n",
196
+ "├─ models--julien-c--EsperBERTo-small\n",
197
+ "├─ models--lysandrejik--arxiv-nlp\n",
198
+ "├─ models--bert-base-cased\n",
199
+ "├─ datasets--glue\n",
200
+ "├─ datasets--huggingface--DataMeasurementsFiles\n",
201
+ "├─ spaces--dalle-mini--dalle-mini\n",
202
+ "```\n",
203
+ "\n",
204
+ "## Detailed Folder Structure\n",
205
+ "\n",
206
+ "Each repository folder contains subfolders that store different types of files, such as references, blobs, and snapshots. Here is an example of the folder structure for a dataset:\n",
207
+ "\n",
208
+ "```\n",
209
+ "<CACHE_DIR>\n",
210
+ "├─ datasets--glue\n",
211
+ "│ ├─ refs\n",
212
+ "│ ├─ blobs\n",
213
+ "│ ├─ snapshots\n",
214
+ "```\n",
215
+ "\n",
216
+ "## Managing the Cache\n",
217
+ "\n",
218
+ "### Scanning the Cache\n",
219
+ "\n",
220
+ "To manage and inspect the cache, you can use the `huggingface-cli` tool or the `scan_cache_dir` function from the `huggingface_hub` library. This allows you to see which repositories and revisions are taking up disk space. For example:\n",
221
+ "\n",
222
+ "```python\n",
223
+ "from huggingface_hub import scan_cache_dir\n",
224
+ "\n",
225
+ "hf_cache_info = scan_cache_dir()\n",
226
+ "print(hf_cache_info)\n",
227
+ "```\n",
228
+ "\n",
229
+ "`scan_cache_dir()` returns an `HFCacheInfo` object containing details about the cached repositories, their sizes, and any warnings about corrupted caches.\n",
230
+ "\n",
231
+ "### Example Command\n",
232
+ "\n",
233
+ "Using the `huggingface-cli` to scan the cache:\n",
234
+ "\n",
235
+ "```bash\n",
236
+ "huggingface-cli scan-cache\n",
237
+ "```\n",
238
+ "\n",
239
+ "This command will output a detailed report of the cache, including repository IDs, types, sizes, and paths.\n",
240
+ "\n",
241
+ "## Customizing the Cache Directory\n",
242
+ "\n",
243
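+ "You can customize the cache directory by setting the `cache_dir` argument in methods or by using environment variables. Separately, libraries can keep their own non-Hub files in an assets cache, whose folders are located with `cached_assets_path`. For example:\n",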
244
+ "\n",
245
+ "```python\n",
246
+ "from huggingface_hub import cached_assets_path\n",
247
+ "\n",
248
+ "path = cached_assets_path(library_name=\"datasets\", namespace=\"SQuAD\", subfolder=\"download\")\n",
249
+ "print(path)\n",
250
+ "```\n",
251
+ "\n",
252
+ "This will return the path to the cached assets for the specified library, namespace, and subfolder.\n",
253
+ "\n",
254
+ "## Conclusion\n",
255
+ "\n",
256
+ "The Hugging Face cache system is designed to efficiently store and manage models, datasets, and other resources. By understanding the structure and management tools available, users can effectively control their cache usage and ensure optimal performance.\n",
257
+ "\n",
258
+ "For more detailed information, you can refer to the Hugging Face documentation on managing the cache system[1][3][5][7].\n",
259
+ "\n",
260
+ "Citations:\n",
261
+ "[1] https://huggingface.co/docs/huggingface_hub/guides/manage-cache\n",
262
+ "[2] https://discuss.huggingface.co/t/model-caching-and-locking/44152\n",
263
+ "[3] https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache\n",
264
+ "[4] https://huggingface.co/docs/hub/en/models\n",
265
+ "[5] https://huggingface.co/docs/huggingface_hub/package_reference/cache\n",
266
+ "[6] https://huggingface.co/docs/hub/en/models-libraries\n",
267
+ "[7] https://huggingface.co/docs/huggingface_hub/en/package_reference/cache\n",
268
+ "[8] https://huggingface.co/docs/hub/en/models-adding-libraries\n",
269
+ "[9] https://discuss.huggingface.co/t/how-to-save-my-model-to-use-it-later/20568\n",
270
+ "[10] https://huggingface.co/docs/transformers/en/main_classes/model"
271
+ ]
272
+ },
273
+ {
274
+ "cell_type": "code",
275
+ "execution_count": null,
276
+ "id": "d40ac023-8adb-4206-8954-457f352b4d76",
277
+ "metadata": {},
278
+ "outputs": [],
279
+ "source": []
280
+ }
281
+ ],
282
+ "metadata": {
283
+ "kernelspec": {
284
+ "display_name": "Python 3 (ipykernel)",
285
+ "language": "python",
286
+ "name": "python3"
287
+ },
288
+ "language_info": {
289
+ "codemirror_mode": {
290
+ "name": "ipython",
291
+ "version": 3
292
+ },
293
+ "file_extension": ".py",
294
+ "mimetype": "text/x-python",
295
+ "name": "python",
296
+ "nbconvert_exporter": "python",
297
+ "pygments_lexer": "ipython3",
298
+ "version": "3.9.12"
299
+ }
300
+ },
301
+ "nbformat": 4,
302
+ "nbformat_minor": 5
303
+ }
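
Note on the "Cache Directory Structure" cell: the example folder names in the notebook follow a `<type>s--<namespace>--<name>` pattern, with the namespace omitted for repositories such as `bert-base-cased`. A small helper can split such a name back into its parts. This is only a minimal sketch and not part of `huggingface_hub`; the names `CachedRepo` and `parse_cache_folder` are hypothetical, and the code assumes that neither the namespace nor the repository name contains `--`.

```python
from typing import NamedTuple, Optional


class CachedRepo(NamedTuple):
    """Hypothetical container for the parts of a cache folder name."""
    repo_type: str            # "model", "dataset", or "space"
    namespace: Optional[str]  # None when the repo has no namespace
    name: str


def parse_cache_folder(folder: str) -> CachedRepo:
    """Split a name such as 'models--julien-c--EsperBERTo-small' into its parts.

    Assumption: '--' only appears as the separator, never inside a
    namespace or repository name.
    """
    parts = folder.split("--")
    if len(parts) == 2:
        kind, name = parts
        namespace = None
    elif len(parts) == 3:
        kind, namespace, name = parts
    else:
        raise ValueError(f"unexpected cache folder name: {folder!r}")
    if kind not in ("models", "datasets", "spaces"):
        raise ValueError(f"unknown repository type prefix: {kind!r}")
    # Drop the trailing 's' so 'models' becomes 'model', and so on.
    return CachedRepo(kind[:-1], namespace, name)


# Folder names taken from the notebook's example listing.
for folder in [
    "models--julien-c--EsperBERTo-small",
    "models--bert-base-cased",
    "datasets--glue",
    "spaces--dalle-mini--dalle-mini",
]:
    print(parse_cache_folder(folder))
```

For real cache inspection, the notebook's `scan_cache_dir()` call remains the supported route; the helper above only illustrates the naming convention.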