{"cells":[{"cell_type":"markdown","source":["# Databricks & Hugging Face ML Quickstart: Model Training\n\nThis notebook provides a quick overview of machine learning model training on Databricks using Hugging Face transformers. The notebook includes using MLflow to track the trained models.\n\nThis tutorial covers:\n- Part 1: Training a text classification transformer model with MLflow tracking\n\n### Requirements\n- Cluster running Databricks Runtime 7.5 ML or above\n- Training is super slow/unusable if there is no GPU attached to the cluster"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"528c6d2b-3c5a-4236-ac61-fa8cc8d4d323"}}},{"cell_type":"markdown","source":["### Libraries\nImport the necessary libraries. These libraries are preinstalled on Databricks Runtime for Machine Learning ([AWS](https://docs.databricks.com/runtime/mlruntime.html)|[Azure](https://docs.microsoft.com/azure/databricks/runtime/mlruntime)|[GCP](https://docs.gcp.databricks.com/runtime/mlruntime.html)) clusters and are tuned for compatibility and performance."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"8738a402-24dc-4074-bebb-b51bec8e74db"}}},{"cell_type":"code","source":["%pip install transformers datasets mlflow torch"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"7c554af6-20e3-44ec-a8e1-b1e3411ab169"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Python interpreter will be restarted.\nCollecting transformers\n Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)\nCollecting datasets\n Downloading datasets-2.3.2-py3-none-any.whl (362 kB)\nCollecting mlflow\n Downloading mlflow-1.27.0-py3-none-any.whl (17.9 MB)\nCollecting torch\n Downloading torch-1.12.0-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)\nCollecting xxhash\n Downloading xxhash-3.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\nCollecting aiohttp\n Downloading aiohttp-3.8.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.3 MB)\nCollecting fsspec[http]>=2021.05.0\n Downloading fsspec-2022.5.0-py3-none-any.whl (140 kB)\nCollecting dill<0.3.6\n Downloading dill-0.3.5.1-py2.py3-none-any.whl (95 kB)\nRequirement already satisfied: numpy>=1.17 in /databricks/python3/lib/python3.8/site-packages (from datasets) (1.20.1)\nCollecting multiprocess\n Downloading multiprocess-0.70.13-py38-none-any.whl (131 kB)\nCollecting responses<0.19\n Downloading responses-0.18.0-py3-none-any.whl (38 kB)\nCollecting huggingface-hub<1.0.0,>=0.1.0\n Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)\nCollecting pyarrow>=6.0.0\n Downloading pyarrow-8.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)\nRequirement already satisfied: packaging in /databricks/python3/lib/python3.8/site-packages (from datasets) (20.9)\nRequirement already satisfied: pandas in /databricks/python3/lib/python3.8/site-packages (from datasets) (1.2.4)\nCollecting tqdm>=4.62.1\n Downloading tqdm-4.64.0-py2.py3-none-any.whl (78 kB)\nRequirement already satisfied: requests>=2.19.0 in /databricks/python3/lib/python3.8/site-packages (from datasets) (2.25.1)\nCollecting pyyaml>=5.1\n Downloading PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)\nRequirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.6.0)\nCollecting typing-extensions>=3.7.4.3\n Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)\nRequirement already satisfied: pyparsing>=2.0.2 in /databricks/python3/lib/python3.8/site-packages (from packaging->datasets) (2.4.7)\nRequirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2020.12.5)\nRequirement already satisfied: chardet<5,>=3.0.2 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (4.0.0)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (1.25.11)\nRequirement already satisfied: idna<3,>=2.5 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2.10)\nRequirement already satisfied: pytz in /databricks/python3/lib/python3.8/site-packages (from mlflow) (2020.5)\nCollecting Flask\n Downloading Flask-2.1.3-py3-none-any.whl (95 kB)\nRequirement already satisfied: scipy in /databricks/python3/lib/python3.8/site-packages (from mlflow) (1.6.2)\nCollecting alembic\n Downloading alembic-1.8.1-py3-none-any.whl (209 kB)\nCollecting sqlparse>=0.3.1\n Downloading sqlparse-0.4.2-py3-none-any.whl (42 kB)\nRequirement already satisfied: protobuf>=3.12.0 in /databricks/python3/lib/python3.8/site-packages (from mlflow) (3.17.2)\nCollecting gitpython>=2.1.0\n Downloading GitPython-3.1.27-py3-none-any.whl (181 kB)\nCollecting databricks-cli>=0.8.7\n Downloading databricks-cli-0.17.0.tar.gz (81 kB)\nCollecting cloudpickle\n Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)\nCollecting click>=7.0\n Downloading click-8.1.3-py3-none-any.whl (96 kB)\nCollecting sqlalchemy>=1.4.0\n Downloading SQLAlchemy-1.4.39-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)\nCollecting gunicorn\n Downloading gunicorn-20.1.0-py3-none-any.whl (79 kB)\nCollecting docker>=4.0.0\n Downloading docker-5.0.3-py2.py3-none-any.whl (146 kB)\nCollecting querystring-parser\n Downloading querystring_parser-1.2.4-py2.py3-none-any.whl (7.9 kB)\nCollecting importlib-metadata!=4.7.0,>=3.7.0\n Downloading importlib_metadata-4.12.0-py3-none-any.whl (21 kB)\nRequirement already satisfied: entrypoints in /databricks/python3/lib/python3.8/site-packages (from mlflow) (0.3)\nCollecting prometheus-flask-exporter\n Downloading prometheus_flask_exporter-0.20.2-py3-none-any.whl (18 kB)\nCollecting pyjwt>=1.7.0\n Downloading PyJWT-2.4.0-py3-none-any.whl (18 kB)\nCollecting oauthlib>=3.1.0\n Downloading oauthlib-3.2.0-py3-none-any.whl (151 kB)\nCollecting tabulate>=0.7.7\n Downloading tabulate-0.8.10-py3-none-any.whl (29 kB)\nRequirement already satisfied: six>=1.10.0 in /databricks/python3/lib/python3.8/site-packages (from databricks-cli>=0.8.7->mlflow) (1.15.0)\nCollecting websocket-client>=0.32.0\n Downloading websocket_client-1.3.3-py3-none-any.whl (54 kB)\nCollecting gitdb<5,>=4.0.1\n Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)\nCollecting smmap<6,>=3.0.1\n Downloading smmap-5.0.0-py3-none-any.whl (24 kB)\nCollecting zipp>=0.5\n Downloading zipp-3.8.1-py3-none-any.whl (5.6 kB)\nCollecting greenlet!=0.4.17\n Downloading greenlet-1.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (156 kB)\nCollecting regex!=2019.12.17\n Downloading regex-2022.7.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (765 kB)\nCollecting tokenizers!=0.11.3,<0.13,>=0.11.1\n Downloading tokenizers-0.12.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)\nCollecting async-timeout<5.0,>=4.0.0a3\n Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\nCollecting charset-normalizer<3.0,>=2.0\n Downloading charset_normalizer-2.1.0-py3-none-any.whl (39 kB)\nCollecting frozenlist>=1.1.1\n Downloading frozenlist-1.3.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)\nCollecting multidict<7.0,>=4.5\n Downloading multidict-6.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121 kB)\nRequirement already satisfied: attrs>=17.3.0 in /databricks/python3/lib/python3.8/site-packages (from aiohttp->datasets) (20.3.0)\nCollecting yarl<2.0,>=1.0\n Downloading yarl-1.7.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (308 kB)\nCollecting aiosignal>=1.1.2\n Downloading aiosignal-1.2.0-py3-none-any.whl (8.2 kB)\nCollecting importlib-resources\n Downloading importlib_resources-5.8.0-py3-none-any.whl (28 kB)\nCollecting Mako\n Downloading Mako-1.2.1-py3-none-any.whl (78 kB)\nCollecting Werkzeug>=2.0\n Downloading Werkzeug-2.1.2-py3-none-any.whl (224 kB)\nCollecting itsdangerous>=2.0\n Downloading itsdangerous-2.1.2-py3-none-any.whl (15 kB)\nCollecting Jinja2>=3.0\n Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)\nRequirement already satisfied: MarkupSafe>=2.0 in /databricks/python3/lib/python3.8/site-packages (from Jinja2>=3.0->Flask->mlflow) (2.0.1)\nRequirement already satisfied: setuptools>=3.0 in /usr/local/lib/python3.8/dist-packages (from gunicorn->mlflow) (52.0.0)\nRequirement already satisfied: python-dateutil>=2.7.3 in /databricks/python3/lib/python3.8/site-packages (from pandas->datasets) (2.8.1)\nRequirement already satisfied: prometheus-client in /databricks/python3/lib/python3.8/site-packages (from prometheus-flask-exporter->mlflow) (0.10.1)\nBuilding wheels for collected packages: databricks-cli\n Building wheel for databricks-cli (setup.py): started\n Building wheel for databricks-cli (setup.py): finished with status 'done'\n Created wheel for databricks-cli: filename=databricks_cli-0.17.0-py3-none-any.whl size=141932 sha256=bb09e2cf09646974e0569af11512120f854d104b5284c4656f865a660e821cc9\n Stored in directory: /root/.cache/pip/wheels/bc/ef/2a/18885b70c6b78d4b9612ef2bf4bfdc7325f43db9d817d20f3f\nSuccessfully built databricks-cli\nInstalling collected packages: zipp, multidict, frozenlist, yarl, Werkzeug, smmap, Jinja2, itsdangerous, importlib-metadata, greenlet, click, charset-normalizer, async-timeout, aiosignal, websocket-client, typing-extensions, tqdm, tabulate, sqlalchemy, pyyaml, pyjwt, oauthlib, Mako, importlib-resources, gitdb, fsspec, Flask, dill, aiohttp, xxhash, tokenizers, sqlparse, responses, regex, querystring-parser, pyarrow, prometheus-flask-exporter, multiprocess, huggingface-hub, gunicorn, gitpython, docker, databricks-cli, cloudpickle, alembic, transformers, torch, mlflow, datasets\n Attempting uninstall: Jinja2\n Found existing installation: Jinja2 2.11.3\n Not uninstalling jinja2 at /databricks/python3/lib/python3.8/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-bab12bc9-22d1-4101-97b5-6ae403f8662e\n Can't uninstall 'Jinja2'. No files were found to uninstall.\n Attempting uninstall: pyarrow\n Found existing installation: pyarrow 4.0.0\n Not uninstalling pyarrow at /databricks/python3/lib/python3.8/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-bab12bc9-22d1-4101-97b5-6ae403f8662e\n Can't uninstall 'pyarrow'. No files were found to uninstall.\nSuccessfully installed Flask-2.1.3 Jinja2-3.1.2 Mako-1.2.1 Werkzeug-2.1.2 aiohttp-3.8.1 aiosignal-1.2.0 alembic-1.8.1 async-timeout-4.0.2 charset-normalizer-2.1.0 click-8.1.3 cloudpickle-2.1.0 databricks-cli-0.17.0 datasets-2.3.2 dill-0.3.5.1 docker-5.0.3 frozenlist-1.3.0 fsspec-2022.5.0 gitdb-4.0.9 gitpython-3.1.27 greenlet-1.1.2 gunicorn-20.1.0 huggingface-hub-0.8.1 importlib-metadata-4.12.0 importlib-resources-5.8.0 itsdangerous-2.1.2 mlflow-1.27.0 multidict-6.0.2 multiprocess-0.70.13 oauthlib-3.2.0 prometheus-flask-exporter-0.20.2 pyarrow-8.0.0 pyjwt-2.4.0 pyyaml-6.0 querystring-parser-1.2.4 regex-2022.7.9 responses-0.18.0 smmap-5.0.0 sqlalchemy-1.4.39 sqlparse-0.4.2 tabulate-0.8.10 tokenizers-0.12.1 torch-1.12.0 tqdm-4.64.0 transformers-4.20.1 typing-extensions-4.3.0 websocket-client-1.3.3 xxhash-3.0.0 yarl-1.7.2 zipp-3.8.1\nPython interpreter will be restarted.\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Python interpreter will be restarted.\nCollecting transformers\n Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)\nCollecting datasets\n Downloading datasets-2.3.2-py3-none-any.whl (362 kB)\nCollecting mlflow\n Downloading mlflow-1.27.0-py3-none-any.whl (17.9 MB)\nCollecting torch\n Downloading torch-1.12.0-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)\nCollecting xxhash\n Downloading xxhash-3.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\nCollecting aiohttp\n Downloading aiohttp-3.8.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.3 MB)\nCollecting fsspec[http]>=2021.05.0\n Downloading fsspec-2022.5.0-py3-none-any.whl (140 kB)\nCollecting dill<0.3.6\n Downloading dill-0.3.5.1-py2.py3-none-any.whl (95 kB)\nRequirement already satisfied: numpy>=1.17 in /databricks/python3/lib/python3.8/site-packages (from datasets) (1.20.1)\nCollecting multiprocess\n Downloading multiprocess-0.70.13-py38-none-any.whl (131 kB)\nCollecting responses<0.19\n Downloading responses-0.18.0-py3-none-any.whl (38 kB)\nCollecting huggingface-hub<1.0.0,>=0.1.0\n Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)\nCollecting pyarrow>=6.0.0\n Downloading pyarrow-8.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)\nRequirement already satisfied: packaging in /databricks/python3/lib/python3.8/site-packages (from datasets) (20.9)\nRequirement already satisfied: pandas in /databricks/python3/lib/python3.8/site-packages (from datasets) (1.2.4)\nCollecting tqdm>=4.62.1\n Downloading tqdm-4.64.0-py2.py3-none-any.whl (78 kB)\nRequirement already satisfied: requests>=2.19.0 in /databricks/python3/lib/python3.8/site-packages (from datasets) (2.25.1)\nCollecting pyyaml>=5.1\n Downloading PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)\nRequirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.6.0)\nCollecting typing-extensions>=3.7.4.3\n Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)\nRequirement already satisfied: pyparsing>=2.0.2 in /databricks/python3/lib/python3.8/site-packages (from packaging->datasets) (2.4.7)\nRequirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2020.12.5)\nRequirement already satisfied: chardet<5,>=3.0.2 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (4.0.0)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (1.25.11)\nRequirement already satisfied: idna<3,>=2.5 in /databricks/python3/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2.10)\nRequirement already satisfied: pytz in /databricks/python3/lib/python3.8/site-packages (from mlflow) (2020.5)\nCollecting Flask\n Downloading Flask-2.1.3-py3-none-any.whl (95 kB)\nRequirement already satisfied: scipy in /databricks/python3/lib/python3.8/site-packages (from mlflow) (1.6.2)\nCollecting alembic\n Downloading alembic-1.8.1-py3-none-any.whl (209 kB)\nCollecting sqlparse>=0.3.1\n Downloading sqlparse-0.4.2-py3-none-any.whl (42 kB)\nRequirement already satisfied: protobuf>=3.12.0 in /databricks/python3/lib/python3.8/site-packages (from mlflow) (3.17.2)\nCollecting gitpython>=2.1.0\n Downloading GitPython-3.1.27-py3-none-any.whl (181 kB)\nCollecting databricks-cli>=0.8.7\n Downloading databricks-cli-0.17.0.tar.gz (81 kB)\nCollecting cloudpickle\n Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)\nCollecting click>=7.0\n Downloading click-8.1.3-py3-none-any.whl (96 kB)\nCollecting sqlalchemy>=1.4.0\n Downloading SQLAlchemy-1.4.39-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)\nCollecting gunicorn\n Downloading gunicorn-20.1.0-py3-none-any.whl (79 kB)\nCollecting docker>=4.0.0\n Downloading docker-5.0.3-py2.py3-none-any.whl (146 kB)\nCollecting querystring-parser\n Downloading querystring_parser-1.2.4-py2.py3-none-any.whl (7.9 kB)\nCollecting importlib-metadata!=4.7.0,>=3.7.0\n Downloading importlib_metadata-4.12.0-py3-none-any.whl (21 kB)\nRequirement already satisfied: entrypoints in /databricks/python3/lib/python3.8/site-packages (from mlflow) (0.3)\nCollecting prometheus-flask-exporter\n Downloading prometheus_flask_exporter-0.20.2-py3-none-any.whl (18 kB)\nCollecting pyjwt>=1.7.0\n Downloading PyJWT-2.4.0-py3-none-any.whl (18 kB)\nCollecting oauthlib>=3.1.0\n Downloading oauthlib-3.2.0-py3-none-any.whl (151 kB)\nCollecting tabulate>=0.7.7\n Downloading tabulate-0.8.10-py3-none-any.whl (29 kB)\nRequirement already satisfied: six>=1.10.0 in /databricks/python3/lib/python3.8/site-packages (from databricks-cli>=0.8.7->mlflow) (1.15.0)\nCollecting websocket-client>=0.32.0\n Downloading websocket_client-1.3.3-py3-none-any.whl (54 kB)\nCollecting gitdb<5,>=4.0.1\n Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)\nCollecting smmap<6,>=3.0.1\n Downloading smmap-5.0.0-py3-none-any.whl (24 kB)\nCollecting zipp>=0.5\n Downloading zipp-3.8.1-py3-none-any.whl (5.6 kB)\nCollecting greenlet!=0.4.17\n Downloading greenlet-1.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (156 kB)\nCollecting regex!=2019.12.17\n Downloading regex-2022.7.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (765 kB)\nCollecting tokenizers!=0.11.3,<0.13,>=0.11.1\n Downloading tokenizers-0.12.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)\nCollecting async-timeout<5.0,>=4.0.0a3\n Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\nCollecting charset-normalizer<3.0,>=2.0\n Downloading charset_normalizer-2.1.0-py3-none-any.whl (39 kB)\nCollecting frozenlist>=1.1.1\n Downloading frozenlist-1.3.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)\nCollecting multidict<7.0,>=4.5\n Downloading multidict-6.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121 kB)\nRequirement already satisfied: attrs>=17.3.0 in /databricks/python3/lib/python3.8/site-packages (from aiohttp->datasets) (20.3.0)\nCollecting yarl<2.0,>=1.0\n Downloading yarl-1.7.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (308 kB)\nCollecting aiosignal>=1.1.2\n Downloading aiosignal-1.2.0-py3-none-any.whl (8.2 kB)\nCollecting importlib-resources\n Downloading importlib_resources-5.8.0-py3-none-any.whl (28 kB)\nCollecting Mako\n Downloading Mako-1.2.1-py3-none-any.whl (78 kB)\nCollecting Werkzeug>=2.0\n Downloading Werkzeug-2.1.2-py3-none-any.whl (224 kB)\nCollecting itsdangerous>=2.0\n Downloading itsdangerous-2.1.2-py3-none-any.whl (15 kB)\nCollecting Jinja2>=3.0\n Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)\nRequirement already satisfied: MarkupSafe>=2.0 in /databricks/python3/lib/python3.8/site-packages (from Jinja2>=3.0->Flask->mlflow) (2.0.1)\nRequirement already satisfied: setuptools>=3.0 in /usr/local/lib/python3.8/dist-packages (from gunicorn->mlflow) (52.0.0)\nRequirement already satisfied: python-dateutil>=2.7.3 in /databricks/python3/lib/python3.8/site-packages (from pandas->datasets) (2.8.1)\nRequirement already satisfied: prometheus-client in /databricks/python3/lib/python3.8/site-packages (from prometheus-flask-exporter->mlflow) (0.10.1)\nBuilding wheels for collected packages: databricks-cli\n Building wheel for databricks-cli (setup.py): started\n Building wheel for databricks-cli (setup.py): finished with status 'done'\n Created wheel for databricks-cli: filename=databricks_cli-0.17.0-py3-none-any.whl size=141932 sha256=bb09e2cf09646974e0569af11512120f854d104b5284c4656f865a660e821cc9\n Stored in directory: /root/.cache/pip/wheels/bc/ef/2a/18885b70c6b78d4b9612ef2bf4bfdc7325f43db9d817d20f3f\nSuccessfully built databricks-cli\nInstalling collected packages: zipp, multidict, frozenlist, yarl, Werkzeug, smmap, Jinja2, itsdangerous, importlib-metadata, greenlet, click, charset-normalizer, async-timeout, aiosignal, websocket-client, typing-extensions, tqdm, tabulate, sqlalchemy, pyyaml, pyjwt, oauthlib, Mako, importlib-resources, gitdb, fsspec, Flask, dill, aiohttp, xxhash, tokenizers, sqlparse, responses, regex, querystring-parser, pyarrow, prometheus-flask-exporter, multiprocess, huggingface-hub, gunicorn, gitpython, docker, databricks-cli, cloudpickle, alembic, transformers, torch, mlflow, datasets\n Attempting uninstall: Jinja2\n Found existing installation: Jinja2 2.11.3\n Not uninstalling jinja2 at /databricks/python3/lib/python3.8/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-bab12bc9-22d1-4101-97b5-6ae403f8662e\n Can't uninstall 'Jinja2'. No files were found to uninstall.\n Attempting uninstall: pyarrow\n Found existing installation: pyarrow 4.0.0\n Not uninstalling pyarrow at /databricks/python3/lib/python3.8/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-bab12bc9-22d1-4101-97b5-6ae403f8662e\n Can't uninstall 'pyarrow'. No files were found to uninstall.\nSuccessfully installed Flask-2.1.3 Jinja2-3.1.2 Mako-1.2.1 Werkzeug-2.1.2 aiohttp-3.8.1 aiosignal-1.2.0 alembic-1.8.1 async-timeout-4.0.2 charset-normalizer-2.1.0 click-8.1.3 cloudpickle-2.1.0 databricks-cli-0.17.0 datasets-2.3.2 dill-0.3.5.1 docker-5.0.3 frozenlist-1.3.0 fsspec-2022.5.0 gitdb-4.0.9 gitpython-3.1.27 greenlet-1.1.2 gunicorn-20.1.0 huggingface-hub-0.8.1 importlib-metadata-4.12.0 importlib-resources-5.8.0 itsdangerous-2.1.2 mlflow-1.27.0 multidict-6.0.2 multiprocess-0.70.13 oauthlib-3.2.0 prometheus-flask-exporter-0.20.2 pyarrow-8.0.0 pyjwt-2.4.0 pyyaml-6.0 querystring-parser-1.2.4 regex-2022.7.9 responses-0.18.0 smmap-5.0.0 sqlalchemy-1.4.39 sqlparse-0.4.2 tabulate-0.8.10 tokenizers-0.12.1 torch-1.12.0 tqdm-4.64.0 transformers-4.20.1 typing-extensions-4.3.0 websocket-client-1.3.3 xxhash-3.0.0 yarl-1.7.2 zipp-3.8.1\nPython interpreter will be restarted.\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Install Git LFS"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"8a18992e-ce3f-4b09-a6a1-ddd867006afa"}}},{"cell_type":"code","source":["%sh\ncurl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash\nsudo apt-get install git-lfs"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"27caf826-3804-40f1-8cd8-bc72077ceeb0"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Detected operating system as Ubuntu/focal.\nChecking for curl...\nDetected curl...\nChecking for gpg...\nDetected gpg...\nRunning apt-get update... done.\nInstalling apt-transport-https... done.\nInstalling /etc/apt/sources.list.d/github_git-lfs.list...done.\nImporting packagecloud gpg key... done.\nRunning apt-get update... done.\n\nThe repository is setup! You can now install packages.\nReading package lists...\nBuilding dependency tree...\nReading state information...\nThe following NEW packages will be installed:\n git-lfs\n0 upgraded, 1 newly installed, 0 to remove and 68 not upgraded.\nNeed to get 7,168 kB of archives.\nAfter this operation, 15.6 MB of additional disk space will be used.\nGet:1 https://packagecloud.io/github/git-lfs/ubuntu focal/main amd64 git-lfs amd64 3.2.0 [7,168 kB]\ndebconf: delaying package configuration, since apt-utils is not installed\nFetched 7,168 kB in 0s (15.4 MB/s)\nSelecting previously unselected package git-lfs.\n(Reading database ... \n(Reading database ... 5%\n(Reading database ... 10%\n(Reading database ... 15%\n(Reading database ... 20%\n(Reading database ... 25%\n(Reading database ... 30%\n(Reading database ... 35%\n(Reading database ... 40%\n(Reading database ... 45%\n(Reading database ... 50%\n(Reading database ... 55%\n(Reading database ... 60%\n(Reading database ... 65%\n(Reading database ... 70%\n(Reading database ... 75%\n(Reading database ... 80%\n(Reading database ... 85%\n(Reading database ... 90%\n(Reading database ... 95%\n(Reading database ... 100%\n(Reading database ... 92257 files and directories currently installed.)\nPreparing to unpack .../git-lfs_3.2.0_amd64.deb ...\nUnpacking git-lfs (3.2.0) ...\nSetting up git-lfs (3.2.0) ...\nGit LFS initialized.\nProcessing triggers for man-db (2.9.1-1) ...\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Detected operating system as Ubuntu/focal.\nChecking for curl...\nDetected curl...\nChecking for gpg...\nDetected gpg...\nRunning apt-get update... done.\nInstalling apt-transport-https... done.\nInstalling /etc/apt/sources.list.d/github_git-lfs.list...done.\nImporting packagecloud gpg key... done.\nRunning apt-get update... done.\n\nThe repository is setup! You can now install packages.\nReading package lists...\nBuilding dependency tree...\nReading state information...\nThe following NEW packages will be installed:\n git-lfs\n0 upgraded, 1 newly installed, 0 to remove and 68 not upgraded.\nNeed to get 7,168 kB of archives.\nAfter this operation, 15.6 MB of additional disk space will be used.\nGet:1 https://packagecloud.io/github/git-lfs/ubuntu focal/main amd64 git-lfs amd64 3.2.0 [7,168 kB]\ndebconf: delaying package configuration, since apt-utils is not installed\nFetched 7,168 kB in 0s (15.4 MB/s)\nSelecting previously unselected package git-lfs.\n(Reading database ... \n(Reading database ... 5%\n(Reading database ... 10%\n(Reading database ... 15%\n(Reading database ... 20%\n(Reading database ... 25%\n(Reading database ... 30%\n(Reading database ... 35%\n(Reading database ... 40%\n(Reading database ... 45%\n(Reading database ... 50%\n(Reading database ... 55%\n(Reading database ... 60%\n(Reading database ... 65%\n(Reading database ... 70%\n(Reading database ... 75%\n(Reading database ... 80%\n(Reading database ... 85%\n(Reading database ... 90%\n(Reading database ... 95%\n(Reading database ... 100%\n(Reading database ... 92257 files and directories currently installed.)\nPreparing to unpack .../git-lfs_3.2.0_amd64.deb ...\nUnpacking git-lfs (3.2.0) ...\nSetting up git-lfs (3.2.0) ...\nGit LFS initialized.\nProcessing triggers for man-db (2.9.1-1) ...\n"]}}],"execution_count":0},{"cell_type":"code","source":["import mlflow\nimport torch\n#from hyperopt import fmin, tpe, hp, SparkTrials, Trials, STATUS_OK\n#from hyperopt.pyll import scope\nfrom datasets import load_dataset, load_metric\nfrom huggingface_hub import notebook_login\nfrom transformers import (\n AutoModelForSequenceClassification,\n AutoTokenizer,\n Trainer,\n TrainingArguments,\n)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"cec61d6d-ea1a-4d2f-9ee6-625393a24aa5"}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Log into Hugging Face Hub"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"094921b5-f746-4303-af4b-7a4d61b3b48a"}}},{"cell_type":"markdown","source":["This uses the command line to login into the hugging face hub. If the Hugging Face hub is private, specify the location using the \"HF_ENDPOINT\" parameter."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"4af8d844-c5f0-45c0-80c8-c0b74e55abe9"}}},{"cell_type":"code","source":["from huggingface_hub.commands.user import _login\nfrom huggingface_hub import HfApi\napi = HfApi()\n_login(hf_api = api, token = \"API Token\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"896afdc7-85b9-4ad0-ac9e-2a68def84532"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Login successful\nYour token has been saved to /root/.huggingface/token\n\u001B[1m\u001B[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.\nYou might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default\n\ngit config --global credential.helper store\u001B[0m\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Login successful\nYour token has been saved to /root/.huggingface/token\n\u001B[1m\u001B[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.\nYou might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default\n\ngit config --global credential.helper store\u001B[0m\n"]}}],"execution_count":0},{"cell_type":"code","source":["#Verify Login\n!huggingface-cli whoami"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"0a3fc9dd-7a03-41c4-a43a-6a2a3ee610bd"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"rajistics\r\n\u001B[1morgs: \u001B[0m huggingface,spaces-explorers,demo-org,HF-test-lab,qualitydatalab\r\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["rajistics\r\n\u001B[1morgs: \u001B[0m huggingface,spaces-explorers,demo-org,HF-test-lab,qualitydatalab\r\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Load data\nThe tutorial uses the IMDB dataset for move reviews. The complete [model card](https://huggingface.co/datasets/imdb) can be found at Hugging Face with details on the dataset. \n\nThe goal is to classify reviews as positive or negative. \n\nThe dataset is loaded using the Hugging Face datasets package."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"b2f67ffe-ad1b-49a1-a7cf-603daa8c9890"}}},{"cell_type":"code","source":["# Load and preprocess data\ntrain_dataset, test_dataset = load_dataset(\"imdb\", split=[\"train\", \"test\"])"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"51fd8a9f-bef3-4fbd-90ce-8531f5f71205"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":{"text/plain":"Downloading builder script: 0%| | 0.00/1.79k [00:00 of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Parameter 'function'= of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.\n"]}},{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":{"text/plain":" 0%| | 0/25 [00:00\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n"]}}],"execution_count":0},{"cell_type":"code","source":["mlflow.end_run()\ntrainer.push_to_hub()"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"12d375d9-a641-4b64-a6d1-38a01c095070"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Saving model checkpoint to ./output\nConfiguration saved in ./output/config.json\nModel weights saved in ./output/pytorch_model.bin\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Saving model checkpoint to ./output\nConfiguration saved in ./output/config.json\nModel weights saved in ./output/pytorch_model.bin\n"]}},{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":{"text/plain":"Upload file pytorch_model.bin: 0%| | 32.0k/251M [00:00 main\n\nDropping the following result as it does not have all the necessary fields:\n{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'dataset': {'name': 'imdb', 'type': 'imdb', 'args': 'plain_text'}}\nTo https://huggingface.co/rajistics/distilbert-imdb-mlflow\n 565ce9d..2e139c6 main -> main\n\nOut[12]: 'https://huggingface.co/rajistics/distilbert-imdb-mlflow/commit/565ce9de2a3bf303432d5ca277711f8237b8097c'","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["To https://huggingface.co/rajistics/distilbert-imdb-mlflow\n 11f9d35..565ce9d main -> main\n\nDropping the following result as it does not have all the necessary fields:\n{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'dataset': {'name': 'imdb', 'type': 'imdb', 'args': 'plain_text'}}\nTo https://huggingface.co/rajistics/distilbert-imdb-mlflow\n 565ce9d..2e139c6 main -> main\n\nOut[12]: 'https://huggingface.co/rajistics/distilbert-imdb-mlflow/commit/565ce9de2a3bf303432d5ca277711f8237b8097c'"]}}],"execution_count":0},{"cell_type":"code","source":["from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline\n# Model Card: https://huggingface.co/lvwerra/distilbert-imdb\ntokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-cased\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"rajistics/distilbert-imdb-mlflow\")\nmoviereview = pipeline(\"text-classification\", model = model, tokenizer = tokenizer)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"5bcdb820-3b8f-43f0-8c86-ff34dc5a8bc7"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"loading configuration file https://huggingface.co/distilbert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ebe1ea24d11aa664488b8de5b21e33989008ca78f207d4e30ec6350b693f073f.302bfd1b5e031cc1b17796e0b6e5b242ba2045d31d00f97589e12b458ebff27a\nModel config DistilBertConfig {\n \"_name_or_path\": \"distilbert-base-cased\",\n \"activation\": \"gelu\",\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/ba377304984dc63e3ede0e23a938bbbf04d5c3835b66d5bb48343aecca188429.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/acb5c2138c1f8c84f074b86dafce3631667fccd6efcb1a7ea1320cf75c386a36.3dab63143af66769bbb35e3811f75f7e16b2320e12b7935e216bd6159ce6d9a6\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/added_tokens.json from cache at None\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/special_tokens_map.json from cache at None\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/81e970e5e6ec68be12da0f8f3b2f2469c78d579282299a2ea65b4b7441719107.ec5c189f89475aac7d8cbd243960a0655cfadc3d0474da8ff2ed0bf1699c2a5f\nloading configuration file https://huggingface.co/distilbert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ebe1ea24d11aa664488b8de5b21e33989008ca78f207d4e30ec6350b693f073f.302bfd1b5e031cc1b17796e0b6e5b242ba2045d31d00f97589e12b458ebff27a\nModel config DistilBertConfig {\n \"_name_or_path\": \"distilbert-base-cased\",\n \"activation\": \"gelu\",\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nloading configuration file https://huggingface.co/rajistics/distilbert-imdb-mlflow/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ef11e290776d32ed8373fb83bdc594abae0602cc3d4a6530f3ed9533e98aac64.a8ccf646a0873f3d60805d2779a0b1caf3ca469ffd72e36224dee22566738d73\nModel config DistilBertConfig {\n \"_name_or_path\": \"rajistics/distilbert-imdb-mlflow\",\n \"activation\": \"gelu\",\n \"architectures\": [\n \"DistilBertForSequenceClassification\"\n ],\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"problem_type\": \"single_label_classification\",\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"torch_dtype\": \"float32\",\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nhttps://huggingface.co/rajistics/distilbert-imdb-mlflow/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpmdxxho8z\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["loading configuration file https://huggingface.co/distilbert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ebe1ea24d11aa664488b8de5b21e33989008ca78f207d4e30ec6350b693f073f.302bfd1b5e031cc1b17796e0b6e5b242ba2045d31d00f97589e12b458ebff27a\nModel config DistilBertConfig {\n \"_name_or_path\": \"distilbert-base-cased\",\n \"activation\": \"gelu\",\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/ba377304984dc63e3ede0e23a938bbbf04d5c3835b66d5bb48343aecca188429.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/acb5c2138c1f8c84f074b86dafce3631667fccd6efcb1a7ea1320cf75c386a36.3dab63143af66769bbb35e3811f75f7e16b2320e12b7935e216bd6159ce6d9a6\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/added_tokens.json from cache at None\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/special_tokens_map.json from cache at None\nloading file https://huggingface.co/distilbert-base-cased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/81e970e5e6ec68be12da0f8f3b2f2469c78d579282299a2ea65b4b7441719107.ec5c189f89475aac7d8cbd243960a0655cfadc3d0474da8ff2ed0bf1699c2a5f\nloading configuration file https://huggingface.co/distilbert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ebe1ea24d11aa664488b8de5b21e33989008ca78f207d4e30ec6350b693f073f.302bfd1b5e031cc1b17796e0b6e5b242ba2045d31d00f97589e12b458ebff27a\nModel config DistilBertConfig {\n \"_name_or_path\": \"distilbert-base-cased\",\n \"activation\": \"gelu\",\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nloading configuration file https://huggingface.co/rajistics/distilbert-imdb-mlflow/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/ef11e290776d32ed8373fb83bdc594abae0602cc3d4a6530f3ed9533e98aac64.a8ccf646a0873f3d60805d2779a0b1caf3ca469ffd72e36224dee22566738d73\nModel config DistilBertConfig {\n \"_name_or_path\": \"rajistics/distilbert-imdb-mlflow\",\n \"activation\": \"gelu\",\n \"architectures\": [\n \"DistilBertForSequenceClassification\"\n ],\n \"attention_dropout\": 0.1,\n \"dim\": 768,\n \"dropout\": 0.1,\n \"hidden_dim\": 3072,\n \"initializer_range\": 0.02,\n \"max_position_embeddings\": 512,\n \"model_type\": \"distilbert\",\n \"n_heads\": 12,\n \"n_layers\": 6,\n \"output_past\": true,\n \"pad_token_id\": 0,\n \"problem_type\": \"single_label_classification\",\n \"qa_dropout\": 0.1,\n \"seq_classif_dropout\": 0.2,\n \"sinusoidal_pos_embds\": false,\n \"tie_weights_\": true,\n \"torch_dtype\": \"float32\",\n \"transformers_version\": \"4.20.1\",\n \"vocab_size\": 28996\n}\n\nhttps://huggingface.co/rajistics/distilbert-imdb-mlflow/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpmdxxho8z\n"]}},{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":{"text/plain":"Downloading: 0%| | 0.00/251M [00:00\n\nYou can then click the experiment page icon to display the more detailed MLflow experiment page ([AWS](https://docs.databricks.com/applications/mlflow/tracking.html#notebook-experiments)|[Azure](https://docs.microsoft.com/azure/databricks/applications/mlflow/tracking#notebook-experiments)|[GCP](https://docs.gcp.databricks.com/applications/mlflow/tracking.html#notebook-experiments)). This page allows you to compare runs and view details for specific runs.\n\n"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"70e02a64-6878-4b9b-9297-5390c9e19ddc"}}},{"cell_type":"code","source":["runs = mlflow.search_runs(\"3759898664210413\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"00ff072f-fccd-41ae-ba16-60340e4b6379"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["import pandas\nruns.to_csv(\"output/mlflow_runs.csv\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"be4baec5-089b-4537-a53d-dfe183f2c68b"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"fatal: not a git repository (or any of the parent directories): .git\r\nfatal: not a git repository (or any of the parent directories): .git\r\nfatal: not a git repository (or any of the parent directories): .git\r\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["fatal: not a git repository (or any of the parent directories): .git\r\nfatal: not a git repository (or any of the parent directories): .git\r\nfatal: not a git repository (or any of the parent directories): .git\r\n"]}}],"execution_count":0},{"cell_type":"code","source":["%sh\ncd output\ngit add mlflow_runs.csv\ngit commit -m \"Add MLFlow results\"\ngit push"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"410156e8-69a3-4adc-84f2-72df68582c81"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"[main 7d3e3d7] Add MLFlow results\n 1 file changed, 4 insertions(+)\n create mode 100644 mlflow_runs.csv\nTo https://huggingface.co/rajistics/distilbert-imdb-mlflow\n 2e139c6..7d3e3d7 main -> main\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["[main 7d3e3d7] Add MLFlow results\n 1 file changed, 4 insertions(+)\n create mode 100644 mlflow_runs.csv\nTo https://huggingface.co/rajistics/distilbert-imdb-mlflow\n 2e139c6..7d3e3d7 main -> main\n"]}}],"execution_count":0}],"metadata":{"application/vnd.databricks.v1+notebook":{"notebookName":"ML Quickstart: Model Training with HuggingFace","dashboards":[],"notebookMetadata":{"pythonIndentUnit":2},"language":"python","widgets":{},"notebookOrigID":3759898664210413}},"nbformat":4,"nbformat_minor":0}