Florent Brosse committed on
Commit
b2183ac
1 Parent(s): 948d8e0

use custom model

Files changed (4)
  1. LICENCE +21 -0
  2. NOTICE +28 -0
  3. README.md +229 -1
  4. app.py +5 -3
LICENCE ADDED
@@ -0,0 +1,21 @@
1
+ Copyright (2022) Databricks, Inc.
2
+
3
+ This library (the "Software") may not be used except in connection with the Licensee's use of the Databricks Platform Services pursuant to an Agreement (defined below) between Licensee (defined below) and Databricks, Inc. ("Databricks"). The Object Code version of the Software shall be deemed part of the Downloadable Services under the Agreement, or if the Agreement does not define Downloadable Services, Subscription Services, or if neither are defined then the term in such Agreement that refers to the applicable Databricks Platform Services (as defined below) shall be substituted herein for “Downloadable Services.” Licensee's use of the Software must comply at all times with any restrictions applicable to the Downloadable Services and Subscription Services, generally, and must be used in accordance with any applicable documentation. For the avoidance of doubt, the Software constitutes Databricks Confidential Information under the Agreement.
4
+
5
+ Additionally, and notwithstanding anything in the Agreement to the contrary:
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8
+ you may view, make limited copies of, and may compile the Source Code version of the Software into an Object Code version of the Software. For the avoidance of doubt, you may not make derivative works of Software (or make any changes to the Source Code version of the Software unless you have agreed to separate terms with Databricks permitting such modifications (e.g., a contribution license agreement)).
9
+ If you have not agreed to an Agreement or otherwise do not agree to these terms, you may not use the Software or view, copy or compile the Source Code of the Software.
10
+
11
+ This license terminates automatically upon the termination of the Agreement or Licensee's breach of these terms. Additionally, Databricks may terminate this license at any time on notice. Upon termination, you must permanently delete the Software and all copies thereof (including the Source Code).
12
+
13
+ Agreement: the agreement between Databricks and Licensee governing the use of the Databricks Platform Services, which shall be, with respect to Databricks, the Databricks Terms of Service located at www.databricks.com/termsofservice, and with respect to Databricks Community Edition, the Community Edition Terms of Service located at www.databricks.com/ce-termsofuse, in each case unless Licensee has entered into a separate written agreement with Databricks governing the use of the applicable Databricks Platform Services.
14
+
15
+ Databricks Platform Services: the Databricks services or the Databricks Community Edition services, according to where the Software is used.
16
+
17
+ Licensee: the user of the Software, or, if the Software is being used on behalf of a company, the company.
18
+
19
+ Object Code: the version of the Software produced when an interpreter or a compiler translates the Source Code into recognizable and executable machine code.
20
+
21
+ Source Code: the human readable portion of the Software.
NOTICE ADDED
@@ -0,0 +1,28 @@
1
+ Copyright (2022) Databricks, Inc.
2
+ ## License
3
+ This Software includes software developed at Databricks (https://www.databricks.com/) and its use is subject to the included LICENSE file.
4
+
5
+ This Software contains code from the following open source projects, licensed under the Apache 2.0 license:
6
+
7
+ psf/requests - https://github.com/psf/requests
8
+ Copyright 2019 Kenneth Reitz
9
+
10
+ ## Data collection
11
+ To improve user experience and dbdemos asset quality, dbdemos sends usage reports and captures views in the installed notebooks (usually in the first cell) and in other assets such as dashboards. This information is captured for product improvement only, not for marketing purposes, and doesn't contain PII. By using `dbdemos` and the assets it provides, you consent to this data collection. If you wish to disable it, you can set `Tracker.enable_tracker` to False in the `tracker.py` file.
12
+
13
+ ## Resource creation
14
+ To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (not exhaustive):
15
+ - A cluster to run your demo
16
+ - A Delta Live Table Pipeline to ingest data
17
+ - A DBSQL endpoint to run DBSQL dashboard
18
+ - An ML model
19
+
20
+ While `dbdemos` does its best to limit consumption and enforce resource auto-termination, you remain responsible for the resources created and any associated consumption.
21
+
22
+ ## Catalog/Database created
23
+ dbdemos will try to create catalogs & databases (schemas). Demos use the hive_metastore or UC catalogs. dbdemos will try to use the dbdemos catalog when possible.
24
+
25
+ Permissions / ownership can be granted to all users (account users) on these datasets.
26
+
27
+ ## Support
28
+ Databricks does not offer official support for `dbdemos` and the associated assets.
README.md CHANGED
@@ -9,4 +9,232 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
12
+ # dbdemos
13
+
14
+ DBDemos is a toolkit to easily install Lakehouse demos for Databricks.
15
+
16
+ Simply deploy & share demos on any workspace. dbdemos is packaged with a list of demos:
17
+
18
+ - Lakehouse, end-to-end demos (ex: Lakehouse Retail Churn)
19
+ - Product demos (ex: Delta Live Table, CDC, ML, DBSQL Dashboard, MLOps...)
20
+
21
+ **Please visit [dbdemos.ai](https://www.dbdemos.ai) to explore all our demos.**
22
+
23
+ ## Installation
24
+ **Do not clone the repo, just pip install the dbdemos wheel:**
25
+
26
+ ```
27
+ %pip install dbdemos
28
+ ```
29
+
30
+ ## Usage within Databricks
31
+
32
+ See [demo video](https://drive.google.com/file/d/12Iu50r7hlawVN01eE_GoUKBQ4kvUrR56/view?usp=sharing)
33
+ ```
34
+ import dbdemos
35
+ dbdemos.help()
36
+ dbdemos.list_demos()
37
+
38
+ dbdemos.install('lakehouse-retail-c360', path='./', overwrite = True)
39
+ ```
40
+
41
+ ![Dbdemos install](https://github.com/databricks-demos/dbdemos/raw/main/resources/dbdemos-screenshot.png)
42
+
43
+ ## Requirements
44
+
45
+ `dbdemos` requires the current user to have:
46
+ * Cluster creation permission
47
+ * DLT Pipeline creation permission
48
+ * DBSQL dashboard & query creation permission
49
+ * For UC demos: a Unity Catalog metastore must be available (otherwise the demo will be installed but won't work)
50
+
51
+ New with 0.2: dbdemos can import/export dashboards without the import/export preview (using the dbsqlclone toolkit)
52
+
53
+ ## Features
54
+
55
+ * Load demo notebooks (pre-run) to the given path
56
+ * Start job to load dataset based on demo requirement
57
+ * Start demo cluster customized for the demo & the current user
58
+ * Setup DLT pipelines
59
+ * Setup DBSQL dashboard
60
+ * Create ML Model
61
+ * Demo links are updated with the resources created for easy navigation
62
+
63
+ ## Feedback
64
+
65
+ Demo not working? Can't use dbdemos? Please open a GitHub issue. <br/>
66
+ Make sure you mention the name of the demo.
67
+
68
+ # DBDemos Developer options
69
+
70
+ Read the following if you want to add a new demo bundle.
71
+
72
+ ## Packaging a demo with dbdemos
73
+
74
+ Your demo must contain a `_resources` folder where you include all initialization scripts and your bundle configuration file.
75
+
76
+ ### Links & tags
77
+ DBdemos will dynamically override the link to point to the resources created.
78
+
79
+ **Always use links relative to the local path to support multi workspaces. Do not add the workspace id.**
80
+
81
+ #### DLT pipelines:
82
+ Your DLT pipeline must be added in the bundle file (see below).
83
+ Within your notebook, to identify your pipeline using the id in the bundle file, specify the id `dbdemos-pipeline-id="<id>"` as follows:
84
+
85
+ `<a dbdemos-pipeline-id="dlt-churn" href="#joblist/pipelines/a6ba1d12-74d7-4e2d-b9b7-ca53b655f39d" target="_blank">Delta Live Table pipeline</a>`
86
+
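The link override described above could be sketched as follows. This is a minimal illustration, not the dbdemos implementation: `rewrite_pipeline_links`, the `created` mapping, and the id `new-pipeline-id` are hypothetical names, and the regex assumes the attributes appear in the order shown in the README example.

```python
import re

# Hypothetical helper (not part of dbdemos): match <a> tags that carry a
# dbdemos-pipeline-id attribute immediately followed by an href attribute.
_LINK = re.compile(r'(<a\s+dbdemos-pipeline-id="(?P<id>[^"]+)"\s+href=")[^"]*(")')

def rewrite_pipeline_links(html, created):
    """Point each tagged link at the pipeline created for its bundle id.

    `created` maps bundle ids (e.g. "dlt-churn") to the ids of the
    pipelines created at install time.
    """
    def _swap(match):
        new_id = created.get(match.group("id"))
        if new_id is None:
            return match.group(0)  # unknown bundle id: leave the link untouched
        return f'{match.group(1)}#joblist/pipelines/{new_id}{match.group(3)}'
    return _LINK.sub(_swap, html)

cell = ('<a dbdemos-pipeline-id="dlt-churn" '
        'href="#joblist/pipelines/a6ba1d12-74d7-4e2d-b9b7-ca53b655f39d" '
        'target="_blank">Delta Live Table pipeline</a>')
print(rewrite_pipeline_links(cell, {"dlt-churn": "new-pipeline-id"}))
```

Because the `href` stays relative (`#joblist/pipelines/...`), the rewritten link carries no workspace id, in line with the rule above.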
87
+ #### Workflows:
88
+ Your workflows must be added in the bundle file (see below).
89
+ Within your notebook, to identify your workflow using the id in the bundle file, specify the id `dbdemos-workflow-id="<id>"` as follows:
90
+
91
+ `<a dbdemos-workflow-id="credit-job" href="#joblist/pipelines/a6ba1d12-74d7-4e2d-b9b7-ca53b655f39d" target="_blank">Access your workflow</a>`
92
+
93
+
94
+ #### DBSQL dashboards:
95
+ DBSQL dashboards are automatically downloaded during the process. No need to add them in the bundle file. Simply use links as follows:
96
+
97
+ ` <a href="/sql/dashboards/19394330-2274-4b4b-90ce-d415a7ff2130" target="_blank">Churn Analysis Dashboard</a>`
98
+
99
+
100
+
101
+ ### bundle_config
102
+ The demo must contain a `./_resources/bundle_config` file containing your bundle definition.
103
+ This needs to be a notebook & not a .json file (due to a current API limitation).
104
+
105
+ ```json
106
+ {
107
+ "name": "<Demo name, used in dbdemos.install('xxx')>",
108
+ "category": "<Category, like data-engineering>",
109
+ "title": "<Title>.",
110
+ "description": "<Description>",
111
+ "bundle": <Will bundle when True, skip when False>,
112
+ "tags": [{"dlt": "Delta Live Table"}],
113
+ "notebooks": [
114
+ {
115
+ "path": "<notebook path from the demo folder (ex: resources/00-load-data)>",
116
+ "pre_run": <Will start a job to run it before packaging to get the cells results>,
117
+ "publish_on_website": <Will add the notebook in the public website (with the results if it's pre_run=True)>,
118
+ "add_cluster_setup_cell": <if True, add a cell with the name of the demo cluster>,
119
+ "title": "<Title>",
120
+ "description": "<Description (will be in minisite also)>",
121
+ "parameters": {"<key>": "<value. Will be sent to the pre_run job>"}
122
+ }
123
+ ],
124
+ "init_job": {
125
+ "settings": {
126
+ "name": "demos_dlt_cdc_init_{{CURRENT_USER_NAME}}",
127
+ "email_notifications": {
128
+ "no_alert_for_skipped_runs": False
129
+ },
130
+ "timeout_seconds": 0,
131
+ "max_concurrent_runs": 1,
132
+ "tasks": [
133
+ {
134
+ "task_key": "init_data",
135
+ "notebook_task": {
136
+ "notebook_path": "{{DEMO_FOLDER}}/_resources/01-load-data-quality-dashboard",
137
+ "source": "WORKSPACE"
138
+ },
139
+ "job_cluster_key": "Shared_job_cluster",
140
+ "timeout_seconds": 0,
141
+ "email_notifications": {}
142
+ }
143
+ ]
144
+ .... Full standard job definition
145
+ }
146
+ },
147
+ "pipelines": <list of DLT pipelines if any>
148
+ [
149
+ {
150
+ "id": "dlt-cdc", <id, used in the notebook links to go to the generated notebook: <a dbdemos-pipeline-id="dlt-cdc" href="#joblist/pipelines/xxxx">installed DLT pipeline</a> >
151
+ "run_after_creation": True,
152
+ "definition": {
153
+ ... Any DLT pipeline configuration...
154
+ "libraries": [
155
+ {
156
+ "notebook": {
157
+ "path": "{{DEMO_FOLDER}}/_resources/00-Data_CDC_Generator"
158
+ }
159
+ }
160
+ ],
161
+ "name": "demos_dlt_cdc_{{CURRENT_USER_NAME}}",
162
+ "storage": "/demos/dlt/cdc/{{CURRENT_USER_NAME}}",
163
+ "target": "demos_dlt_cdc_{{CURRENT_USER_NAME}}"
164
+ }
165
+ }
166
+ ],
167
+ "workflows": [{
168
+ "start_on_install": False,
169
+ "id": "credit-job",
170
+ "definition": {
171
+ "settings": {
172
+ ... full pipeline settings
173
+ }
174
+ }]
175
+ }
176
+ ```
177
+
178
+ dbdemos will replace the values defined as {{<KEY>}} based on who installs the demo. Supported keys:
179
+ * TODAY
180
+ * CURRENT_USER (email)
181
+ * CURRENT_USER_NAME (derived from email)
182
+ * DEMO_NAME
183
+ * DEMO_FOLDER
184
+
185
+
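As a rough sketch of the `{{<KEY>}}` substitution listed above (not the actual dbdemos code): `render` and `derive_user_name` are hypothetical helpers, and the way `CURRENT_USER_NAME` is derived from the email is an assumption, since the README only says it is derived from it.

```python
import re
from datetime import date

def derive_user_name(email):
    # Assumption: keep the part before "@" and replace every
    # non-alphanumeric character with "_".
    return re.sub(r"[^A-Za-z0-9]", "_", email.split("@")[0])

def render(template, values):
    """Replace every {{KEY}} present in `values`; leave unknown keys as-is."""
    def _sub(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\{\{([A-Z_]+)\}\}", _sub, template)

email = "jane.doe@example.com"  # illustrative user, not from the source
values = {
    "TODAY": date.today().isoformat(),
    "CURRENT_USER": email,
    "CURRENT_USER_NAME": derive_user_name(email),
    "DEMO_NAME": "dlt-cdc",
    "DEMO_FOLDER": "/Users/jane.doe@example.com/dlt-cdc",
}
print(render("demos_dlt_cdc_{{CURRENT_USER_NAME}}", values))
```

Leaving unknown keys untouched makes a misspelled placeholder visible in the rendered job or pipeline name rather than silently dropping it.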
186
+ # DBDemo Installer configuration
187
+
188
+ The following describes how to package the demos created.
189
+
190
+ The installer needs to fetch data from a workspace & start jobs. To do so, it requires the information in `local_conf.json`:
191
+ ```json
192
+ {
193
+ "pat_token": "xxx",
194
+ "username": "xx.xx@databricks.com",
195
+ "url": "https://xxx.databricks.com",
196
+ "repo_staging_path": "/Repos/xx.xx@databricks.com",
197
+ "repo_name": "<field-demos_build>",
198
+ "repo_url": "https://github.com/<repo containing demos to package>",
199
+ "branch": "master",
200
+ "current_folder": "<Used to mock the current folder outside of a notebook, ex: /Users/quentin.ambard@databricks.com/test_install_demo>"
201
+ }
202
+ ```
203
+
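A small sketch of how such a file could be checked before use. `load_local_conf` and `REQUIRED_KEYS` are hypothetical, and treating every key except `current_folder` as required is an assumption; the README does not say which keys are mandatory.

```python
import json

# Assumption: all keys from the example except "current_folder" are required.
REQUIRED_KEYS = ("pat_token", "username", "url", "repo_staging_path",
                 "repo_name", "repo_url", "branch")

def load_local_conf(text):
    """Parse the JSON text of local_conf.json and fail early on missing keys."""
    conf = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not conf.get(key)]
    if missing:
        raise ValueError(f"local_conf.json is missing: {', '.join(missing)}")
    return conf

# Usage, assuming the file sits in the working directory:
# with open("local_conf.json") as f:
#     conf = load_local_conf(f.read())
```

Failing early keeps a missing token or URL from surfacing later as an opaque API error once jobs are started.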
204
+ ### Creating the bundles:
205
+ ```python
206
+ bundler = JobBundler(conf)
207
+ # the bundler will use a staging repo dir in the workspace to analyze & run content.
208
+ bundler.reset_staging_repo(skip_pull=False)
209
+ # Discover bundles from repo:
210
+ bundler.load_bundles_conf()
211
+ # Or manually add bundle to run faster:
212
+ #bundler.add_bundle("product_demos/Auto-Loader (cloudFiles)")
213
+
214
+ # Run the jobs (only if there is a new commit since the last time, or failure, or force execution)
215
+ bundler.start_and_wait_bundle_jobs(force_execution = False)
216
+
217
+ packager = Packager(conf, bundler)
218
+ packager.package_all()
219
+ ```
220
+
221
+
222
+ ## Licence
223
+ See LICENSE file.
224
+
225
+ ## Data collection
226
+ To improve user experience and dbdemos asset quality, dbdemos sends usage reports and captures views in the installed notebooks (usually in the first cell) and dashboards. This information is captured for product improvement only, not for marketing purposes, and doesn't contain PII. By using `dbdemos` and the assets it provides, you consent to this data collection. If you wish to disable it, you can set `Tracker.enable_tracker` to False in the `tracker.py` file.
227
+
228
+ ## Resource creation
229
+ To simplify your experience, `dbdemos` will create and start resources for you. For example, a demo could start (not exhaustive):
230
+ - A cluster to run your demo
231
+ - A Delta Live Table Pipeline to ingest data
232
+ - A DBSQL endpoint to run DBSQL dashboard
233
+ - An ML model
234
+
235
+ While `dbdemos` does its best to limit consumption and enforce resource auto-termination, you remain responsible for the resources created and any associated consumption.
236
+
237
+ ## Support
238
+ Databricks does not offer official support for `dbdemos` and the associated assets.
239
+ For any issue with `dbdemos` or the demos installed, please open an issue and the demo team will have a look on a best effort basis.
240
+
app.py CHANGED
@@ -40,9 +40,11 @@ demo = gr.ChatInterface(
40
  chatbot=gr.Chatbot(height=400),
41
  textbox=gr.Textbox(placeholder="Ask me a question",
42
  container=False, scale=7),
43
- title="Chat with a Databricks LLM serving endpoint",
44
- description="This is an advanced model hosted on Databricks Serving.",
45
- examples=[["Hello"], ["What is MLflow?"], ["What is Apache Spark?"]],
43
+ title="Databricks LLM RAG demo - Chat with llama2 Databricks model serving endpoint",
44
+ description=("This chatbot is a demo example for the dbdemos llm chatbot\n"
45
+ "This content is provided as a LLM RAG educational example, without support. It is using llama2, can hallucinate and should not be used as production content.\n"
46
+ "Please review our dbdemos license and terms for more details."),
47
+ examples=[["How can I start a Databricks cluster?"]],
48
  cache_examples=False,
49
  theme="soft",
50
  retry_btn=None,