add `pip install daft` line to intro
daft/01_what_makes_daft_special.py
CHANGED
@@ -35,17 +35,13 @@ def _(mo):
     Daft is a distributed query engine designed to handle a wide array of data tasks, from data engineering and analytics to powering ML/AI workflows. It provides both a Python DataFrame API, familiar to users of libraries like Pandas, and a SQL interface, allowing you to choose the interaction style that best suits your needs or the task at hand.
 
     The main goal of Daft is to provide a robust and versatile platform for processing data, whether it's gigabytes on your laptop or petabytes on a cluster.
+
+    Let's go ahead and `pip install daft` to see it in action!
     """
     )
     return
 
 
-@app.cell(hide_code=True)
-def _(daft, mo):
-    mo.md(f"""You're running Daft version: `{daft.__version__}`""")
-    return
-
-
 @app.cell(hide_code=True)
 def _(df_with_discount, discount_slider, mo):
     mo.vstack(
@@ -122,9 +118,7 @@ def _(mo):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""A cornerstone of Daft's design is **lazy execution**. Imagine defining a DataFrame with a trillion rows on your laptop – usually not a great prospect for your device's memory!"""
-    )
+    mo.md(r"""A cornerstone of Daft's design is **lazy execution**. Imagine defining a DataFrame with a trillion rows on your laptop – usually not a great prospect for your device's memory!""")
     return
 
 
@@ -141,9 +135,7 @@ def _(daft):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""With Daft, this is perfectly fine. Operations like `with_column` or `filter` don't compute results immediately. Instead, Daft builds a *logical plan* – a blueprint of the transformations you've defined. You can inspect this plan:"""
-    )
+    mo.md(r"""With Daft, this is perfectly fine. Operations like `with_column` or `filter` don't compute results immediately. Instead, Daft builds a *logical plan* – a blueprint of the transformations you've defined. You can inspect this plan:""")
     return
 
 
@@ -155,9 +147,7 @@ def _(mo, trillion_rows_df):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""This plan is only executed (and data materialized) when you explicitly request it (e.g., with `.show()`, `.collect()`, or by writing to a file). Before execution, Daft's optimizer works to make your query run as efficiently as possible. This approach allows you to define complex operations on massive datasets without immediate computational cost or memory overflow."""
-    )
+    mo.md(r"""This plan is only executed (and data materialized) when you explicitly request it (e.g., with `.show()`, `.collect()`, or by writing to a file). Before execution, Daft's optimizer works to make your query run as efficiently as possible. This approach allows you to define complex operations on massive datasets without immediate computational cost or memory overflow.""")
     return
 
 
@@ -227,17 +217,13 @@ def _(daft):
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""> Example inspired by the great post [Exploring Art with TypeScript, Jupyter, Polars, and Observable Plot](https://deno.com/blog/exploring-art-with-typescript-and-jupyter) published on Deno's blog."""
-    )
+    mo.md(r"""> Example inspired by the great post [Exploring Art with TypeScript, Jupyter, Polars, and Observable Plot](https://deno.com/blog/exploring-art-with-typescript-and-jupyter) published on Deno's blog.""")
     return
 
 
 @app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""In later chapters, we'll explore in more detail how to work with these image objects and other complex types, including applying User-Defined Functions (UDFs) for custom processing. Until then, you can [take a look at a more complex example](https://blog.getdaft.io/p/we-cloned-over-15000-repos-to-find), in which Daft is used to clone over 15,000 GitHub repos to find the best developers."""
-    )
+    mo.md(r"""In later chapters, we'll explore in more detail how to work with these image objects and other complex types, including applying User-Defined Functions (UDFs) for custom processing. Until then, you can [take a look at a more complex example](https://blog.getdaft.io/p/we-cloned-over-15000-repos-to-find), in which Daft is used to clone over 15,000 GitHub repos to find the best developers.""")
     return
 
 
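
Note on the lazy-execution cells touched by this commit: they say that `with_column` and `filter` only record a plan, and that work happens when results are requested. The idea can be illustrated with a toy sketch in plain Python. This is not Daft's implementation or API. `LazyFrame`, `trillion_rows`, and the `limit` parameter are hypothetical names invented for the illustration, and there is no optimizer step here.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a lazy DataFrame (hypothetical, not Daft's classes)."""

    source: Callable[[], Iterable[dict]]  # called only when rows are requested
    plan: tuple = ()                      # recorded steps: (kind, label, function)

    def with_column(self, name, fn):
        # Record the step and return a new frame; nothing is computed here.
        return LazyFrame(self.source, self.plan + (("with_column", name, fn),))

    def filter(self, fn):
        # Same: only the plan grows.
        return LazyFrame(self.source, self.plan + (("filter", None, fn),))

    def explain(self):
        # Render the recorded plan as text without touching the data.
        steps = [k if label is None else f"{k}({label})" for k, label, _ in self.plan]
        return " -> ".join(["source"] + steps)

    def _rows(self):
        # Stream rows through the recorded steps one at a time.
        for row in self.source():
            row = dict(row)
            keep = True
            for kind, name, fn in self.plan:
                if kind == "with_column":
                    row[name] = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                yield row

    def collect(self, limit=None):
        # Execution happens only here; islice stops the stream after `limit` rows.
        return list(itertools.islice(self._rows(), limit))


calls = []  # records each time the source is actually read


def trillion_rows():
    calls.append("read")
    return ({"id": i} for i in range(10**12))  # generator: never held in memory


df = (
    LazyFrame(trillion_rows)
    .with_column("double", lambda r: r["id"] * 2)
    .filter(lambda r: r["double"] % 3 == 0)
)
plan_text = df.explain()
reads_before_collect = len(calls)  # 0: defining the frame read nothing
first = df.collect(limit=3)        # only now is the source read
```

Defining the trillion-row frame costs nothing because each method only appends to `plan`. Rows are pulled from the source only inside `collect`, and only as many as the limit requires.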