Website,Reviewer,Designation,Ratings,PROS,CONS
Capterra,Verified Reviewer,,4,Easy-to-use application for building data lakes and running reports on top of them. The Databricks internal architecture helps the reports run faster.,The Databricks features are completely a wrapper and delivered as Snowflake (product). Databricks should come up with more features to stand out in the market.
Capterra,Verified Reviewer,,5,Databricks is a great platform for working with a huge amount of data. But the most interesting feature is the ability to use Magic query. Magic query allows users to write code in multiple languages in the same notebook.,"Databricks is a go-to platform for most of the analysis and processing workload, but when it comes to the financials, it becomes a little more expensive. And as a result, not a lot of the projects are reliable to be developed in Databricks."
Capterra,Shamaas H.,Software Engineer,5,"I love the z indexing, which allows for really fast querying of data. Optimized by Spark, it is great.",The data visualizations are subpar. I wish there were better libraries to integrate and visualize the data.
Capterra,Iulian N.,PM,5,Open source -Built upon excellent technologies -Broad set of data ingestion sources -Reliable and scalable -Cost efficient data processing,Can get overwhelming when you start using it -Would be nice to be able visualize data on the fly
Capterra,Verified Reviewer,,5,Documentation is GREAT - Implementation is mostly straight-forward - The service is easy to use and full of features - Support is top-notch - The interviews we had with the DB guys were more like a peers meeting than a corporate call (I love this) - Our DS lead engineer totally loves it,Can't really speak of anything that we don't like about the product at the moment
Capterra,Vipul C.,Principal Analyst,5,Our team can collaborate on a project simultaneously and make changes to the scripts. It is fast and reliable.,Sometimes we need to restart the cluster when the system crashes.
Capterra,Cody M.,Controller,4,Databricks was able to pull data from our core and create specialized dashboarding / reporting that automated a host of manual processes that took hours per week. It is now totally hands off and management can review the data in just a few clicks.,"It can be extremely confusing given the sheer breadth of tools available. The initial setup and connections certainly require an experienced professional, but once up and running, less-technical users can utilize it."
Capterra,Verified Reviewer,,5,"The infrastructure is very simple. I started with the Community Edition and then switched to the paid version; however, the Community Edition covers most of your needs if you are a student or doing one-time projects.","There are not many users of PySpark, so finding answers is sometimes hard (there are not many forums or resources, and not many questions on Stack Overflow)."
Capterra,Andrew K.,Senior Program Manager,5,"Databricks allows data science teams to do things that they normally would not be able to do without a much greater level of technical ability. Their mission is ""making big data simple"" and they definitely deliver on that promise.","One area where there's still potential to improve further is around making machine learning more accessible. Currently ML still requires a pretty significant degree of data engineering knowledge, but I would love to see Databricks make ML even more accessible."
Capterra,Mallikarjuna D.,Lead consultant,5,I'm an active user of this software for day-to-day needs. It is a pioneering data store layer that holds transactional processes and streamlines them and holds the information by applying business rules.,It is a pioneer in holding the raw source data as a refined layer to store the data for a longer time
Capterra,Verified Reviewer,,3,Enables simultaneous collaborative work with colleagues - Easy to mix Spark queries and Python for extra analyses and plots - A handful of visualization modes for query results (tables and plots with aggregations),"Hard to manage notebook workspace - Sometimes it gets really slow to run queries - AFAIK, there aren't visualization options for datasets (without running queries)"
Capterra,Rita R.,ux designer,3,"In Databricks it is easy to transfer the result of a Spark query to the Python environment, and it has several plots with automatic aggregations","Databricks has a bad file management system and it is slow sometimes. In addition, there is no way to make a visual query without using code."
Capterra,Verified Reviewer,,5,"What I like most about Databricks is the amount of integrations the platform provides to the user. With Databricks, you can create datasets, develop machine learning models, and analyze performance automatically by setting up a job periodically. Whether the user is an engineer, data scientist, or business analyst, Databricks can streamline everyone's work.",What I least like about Databricks is the instability that usually occurs when there are too many users trying to run their notebooks on the same cluster at the same time.
Capterra,Verified Reviewer,,5,This product has democratized big data computation. It's very easy to move from any platform to this product as it supports most of the languages.,Nothing so far - maybe the cost of computation can improve over time but it is still an economical product for building in-house big data capability.
Capterra,Dan S.,SA,4,Easy to use user interface - Can be widely shared across an enterprise with various teams - Apache Spark Cluster part of product,Information security considerations have to be taken into account due to the need for integrations with Databricks VPCs when hosted in AWS
Capterra,Rayla V.,Graduate Research Assistant,5,"I love how easy it is to deploy auto-scaling machine learning models. After a machine learning model is trained, you can just click a button to deploy the model, I believe in a container, and have it auto scale as needed. You can also specify the minimum and maximum size of the deployment to reduce costs but to keep up with the workload as necessary. It is also built around Spark, so tasks involving ""big data"" aren't an issue.","Some of the cons are that the primary language is Java/Scala, whereas many data scientists are using Python or R, which run slower on Databricks than Java and Scala. Also, the main interface is via coding, which can limit a lot of citizen data scientists."
Capterra,Robert G.,DB Architect,4,"I'm a SQL person, so being able to run big data analytics in my preferred language was quite nice. Being able to (near) seamlessly swap between Scala, SQL, and Python in the same script is quite powerful. If you don't know how to do something easily in one language, do it in another and then swap back. It's pretty performant at querying non-indexed data dumped from the source systems, even if those datasets aren't quite ""big data"". I found it to be quicker to dump 100 million rows of staged data from our on-prem server to the data lake and crunch it in Databricks than it was to run in SQL.","I wasn't involved in the pricing piece, but from what I understand it's fairly expensive. The clusters can be spun up or down as needed, and there's a nice inactivity shutdown feature if you forget to turn off a test cluster, or something. I also had a pretty rough time getting an Azure Gen 2 Data Lake connected, but after finding the not-so-well-documented bug, it wasn't a big deal."
Capterra,Douglas F.,Senior business analyst,5,The access and manipulation of data. The software is very fast and great to manipulate and treat data. Also it is possible to build models.,"The lack of options of visualization and creation of dashboards. The creation of dashboards is possible, but is not intuitive."
Capterra,Balashowry Preetam S.,data scientist,5,"I like the portal page, which connects all Azure subscriptions.","It can be difficult to understand, and not many tutorials are available."
G2.com,Hoora R,Machine Learning eng,5,"Its powerful data analytics and machine learning capabilities. The platform includes built-in tools and libraries for data analysis, visualization, and machine learning, allowing users to perform complex data modeling and analysis tasks with ease.

Offers a collaborative and flexible working environment, with support for multiple programming languages and easy integration with popular development tools. This makes it an ideal choice for data teams and organizations of all sizes looking to streamline their data processing and analysis workflows.","I don't like some of the documentation. Some of the features are not being maintained properly and some of the features that are mainly needed never get added. However, I don't think this is an issue with Databricks but rather an issue with MLflow."
G2.com,Maaz Ahmed A.,,4.5,"One of the key advantages of Databricks Lakehouse Platform is its unified approach to data management, which allows organizations to manage all types of data, including structured, semi-structured, and unstructured, in a single location. This simplifies data management and provides a unified view of all data, enabling better decision-making.

Another advantage is its scalability and performance. Databricks Lakehouse Platform is designed to handle large volumes of data and can scale horizontally as well as vertically. It also provides high-speed data processing and query performance, thanks to its distributed architecture and optimized computing engines.

The platform's built-in capabilities for machine learning and AI are another advantage. This allows organizations to easily integrate machine learning and AI into their data workflows and derive insights and value from their data.","One potential challenge is the learning curve associated with the platform. Databricks Lakehouse Platform requires a certain level of technical expertise and familiarity with the tools and technologies used in the platform, such as Apache Spark, SQL, and Python. This can make it challenging for some organizations to adopt the platform, especially if they lack the necessary expertise.

Another potential limitation is the cost associated with the platform. Databricks Lakehouse Platform is a commercial product, and as such, it requires a subscription or licensing fee. This can be a barrier to entry for some organizations, especially smaller ones with limited budgets."
G2.com,Sudarsan M.,Solution architect,4.5,"Databricks Serverless SQL with Photon Query acceleration for data analysts & business analysts

In-built Visualization & dashboards, along with Geospatial & Advanced SQL functions

Unified Pipeline for Structured Streaming batch & real-time ingestion

Auto-loader for standard formats of file ingestion & Schema Evolution in-built

Delta Live Table for data Engineering Workloads & Pipelines

Databricks Multi-task Orchestration job workflows

Unity Catalog Metastore & its integration with other data catalogs

MLFlow for building and tracking ML experiments & Feature Store for centralized feature supply for production/inference models

Time Travel & Z-order Optimization","Need to build a more comprehensive orchestration workflow JOBS panel for a diverse set of pattern design workflows

Serverless Cluster for Data Engineering Streaming/Batch pipelines

Integrate most IDE features into the notebook

Clear documentation on Custom Databricks runtime docker image creation will be helpful

Lineage & flow monitoring dashboard can be built automated for non-DLT jobs as well

DLT implementation can be extended to other DELTA format supporting warehouse in future"
G2.com,Matthew V.,Data Engineering,4,"Repo deployment allows my team to collaboratively develop against Databricks resources while still using their local development toolkit, and quickly deploy out to it when they're ready

- Delta live tables are a breeze to set up and get streaming data into the lakehouse

- Language mixing is very nice; most of my data engineering work is SQL focused, however I can leverage Python or Scala for more complex data manipulation, all within the same notebook"," Data explorer can be incredibly slow and cumbersome if your datalake is unevenly distributed

- Cold starting clusters can take a frustratingly long amount of time, at least for the way our clusters are set up (the minimum size for our cluster options are i3.xlarge on AWS)

- While developing in notebooks is nice, the concept of running notebooks in production where anyone can edit from the UI is concerning; I wish there were more ways to ""lock"" down production processes"
G2.com,Amr A,,5,"Keep updating the notebook platform (e.g., keep adding new features such as local variable tracking).

- MLflow experiment and Model registry, where all trained models can be tracked and registered in one place","Connect my local code in Visual Studio Code to my Databricks cluster so I can run the code on the cluster. The old databricks-connect approach has many bugs and is hard to set up. The new Databricks extension for Visual Studio Code doesn't allow developers to debug their code line by line (we can only run the code)."
G2.com,samin F,,4.5,"Its ability to seamlessly integrate data processing, analytics, and machine learning workflows, its scalability and performance, and its support for a wide range of data sources and programming languages. I like that it gets UI updates on the notebooks.",The learning curve for new users can be steep. There is limited documentation on Markdown in notebooks and it could be faster. I would like to see it faster and improved.
G2.com,Greg T,Data scientist,3.5,"I have been using databricks for almost 4 years and it has been a great asset to our development as a team and our product.

Shared folders of re-usable and tracked notebooks allow us to work on tasks only once, minimising duplication of work, which in turn accelerates development cycle.

One of my personal favourites is the workflows, which allowed us to automate a variety of tasks and freed up capacity for us to focus on the right problems at the right time.

Another great selling point for me, is that collaborators can see each other typing and highlighting live.","UX could be improved

While I appreciate the addition of new features, developments and experiments, the frequency of changes made it tiring and frustrating for me recently.

Too much, too frequently. The 'new notebook editor' is a great example here. The editor itself could be a very useful change, but changing all the keyboard shortcuts at the same time without letting the user know is questionable to me.

I would prefer it, if changes were rolled out less frequently with detailed patch updates (see Dota 2 for example), and configurable options in the user settings.

E.g. I would use the experimental 'new notebook editor' if I could keep the keyboard shortcuts the same.

Less frequent, more configurable updates please.

One of the biggest pain points for me is the Log In and Log Out process. Why does Databricks have to log me out every couple of hours? Especially while I am typing in a command cell?

Could this be improved please?

Also, would love it if libraries on clusters could be updated without having to restart the cluster.

Having said all this, I do love some of the new features, such as the new built-in visualisation tool, however would love it even more if titles could be added and adjusted."
G2.com,Norman L.,Data Scientist,4.5,"I have been using the Databricks platform for business research projects and building ML models for almost a year. It has been a great experience to be able to run analysis and model testing for big data projects in a single platform without switching between SQL server and a development environment with Python, R, or Stata. Also, I like the fact that MLflow can track data ingestion for any data shift in realtime for model retraining purposes.","We have had issues using MLflow and feature store on Databricks for ML projects, which slows down the development process. I wish there were better documentation on these tools or more diverse examples to demonstrate different use cases. Also, the test-train split with MLflow does not support time series time interval test-train split for model validation purposes."
G2.com,Ankit R.,Software Engineer,4.5,"Databricks comes with everything in one place, and that is the best thing about the platform. Now I don't have to go to the EMR page to create an EMR cluster and then create a notebook. All of this can be done with just one click and one page. One gets to jump from notebooks to Query service and to AI in just one click. You get data store views, auto type suggestions based on table and schema context. All of this makes the experience very smooth and fast.","Queries get refreshed once in a while, and queries that were not saved explicitly get deleted.

There is no dark mode, hence have to use a white theme.

Sometimes your queries get pushed to long queues, which takes a lot of time and can be a little frustrating."
G2.com,Thabiso K.,Lead Recruiter,5,"The platform is easy to use and makes it easy to collaborate with my colleagues. Deploying to production is simple and can be even easier if you choose the non-self-hosted option. Although Databricks does some of the heavy lifting, it's still open enough to allow teams to use their own flexibility and complex processes without too much configuration.","Platforms constantly change as they adapt, so staying on top of everything can be difficult. - If you don't have a CI/CD system in place, once you hit a certain number it starts to get difficult to manage."
G2.com,Vishaw S.,Data engineering,4.5,"We employ Python, Spark, and SQL to develop ELT pipelines, and Databricks is the most reliable and user-friendly option available. Developers may concentrate on writing code, creating pipelines, and creating models rather than spending time setting up the environment because it is very simple to do so.","Knowledge of the cost model and recommendation engine to reduce burned DBUs. There is an Overwatch notebook that offers general statistics about the environment, but it isn't developed enough and it also doesn't show you the cost of the infrastructure used in the back cloud kitchen. Platform as a whole is excellent."
G2.com,David H,Business Intelligence Developer,4.5,"Databricks' versatility is its best feature. The range of languages and functionality afforded by Databricks is impressive. Thus far, I've written code in R, Python, SQL and Scala in Databricks. And I'm just getting started. But I've composed SQL code in both R and Python, executed in Databricks. And then we come to interoperability. Data written to SQL can be accessed by either R or Python. Parameters can be passed across SQL, R and Python via widgets or environmental variables. If you have an intractable data or analytics problem, Databricks would be my 'go to' to maximise the options as to how you could potentially code your way under, around or over the obstacles standing between your project and successful execution.",The options for deployment of Databricks code from dev >> qa >> uat >> prod aren't as intuitive as I might like. This might have more to do with our current use of Azure Data Factory for orchestration. Setting up workflow natively in Databricks was quite straightforward. It seems to be accessing Databricks notebooks from Azure Data Factory in dev >> qa >> uat >> prod where we are perhaps creating problems for ourselves. Perhaps not a shortcoming in Databricks at all. Curious as to how Databricks would operate with AWS rather than Azure. Perhaps a better experience?
G2.com,Sudarsan S.,Data Engineer,4.5,"Unified Batch & Streaming for source systems data

Autoloader capability, along with Schema Evolution

Delta Live Table & orchestrating with Pipelines

CDC Event streams for SCD1 & SCD2 using DELTA apply changes

Databricks Workflows - Multi-task jobs

Serverless SQL Photon cluster along with Re-dash integrated Visualization

Unity Catalog

Delta Sharing & Data MarketPlace

Data Quality expectations

Integration with Collibra, Privacera & other security & governance tools","Issue in running multiple streaming jobs in same cluster

Job clusters can't be reused even for the same retry in PRODUCTION, since shutdown immediately after the job run/fail is set by default - Need to check any options to increase this limit

Multi-Task jobs should allow TASK output to be passed to the next TASK as input, and also need to support FAIL on trigger and OR-dependent predecessors to trigger. Currently only AND is supported

No serverless option for Data engineering jobs outside DLT

DLT needs to mature to handle a wide variety of integrating sources & targets; it currently only supports DELTA tables in Databricks. Expecting it to be supported for any tool/service/product which supports DELTA format filesystems"
G2.com,Kartavya K.,,4.5,"The platform allows us to quickly start developing and prototyping without worrying much about setting up workspaces, the runtimes, connectors etc. The best part is that it is really powerful to move up from basic prototyping to production ready codebase maintenance","The platform allows us to quickly start developing and prototyping without worrying much about setting up workspaces, the runtimes, connectors etc. The best part is that it is really powerful to move up from basic prototyping to production ready codebase maintenance"
G2.com,No name,,3.5,"Ability to edit the same notebook with collaborators

- GitLab compatibility

- Multiple languages supported

- Broad functionality allows most of our digital teams to use it for their own needs

- Spark compute is fast and the amount of processors on a cluster is clear","UI is constantly changing, and changes are not announced with any leadup

- UI can be buggy - WebSocket disconnects, login timeouts, copy/pasting into incorrect cells

- Pricing structure is a little opaque - DBUs don't have a clear dollar-to-time amount

- Notebook structure isn't perfect for production engineering, better for ML or ad-hoc operations"
G2.com,No name,,4.5,"Databricks' Lakehouse platform combines the capabilities of a data lake and a data warehouse to provide a unified, easy-to-use platform for big data processing and analytics. The platform automatically handles tasks such as data ingestion, data curation, data lineage, and data governance, making it easy to manage and organize large amounts of data. The platform includes features such as version control, collaboration tools, and access controls, making it easy for teams to work together and ensure compliance with data governance policies.","Spinning up a new cluster takes around 10-15 minutes. Moreover, the limited resources and learning materials for new users are challenging. It would be great if Databricks could provide more learning resources."
G2.com,Jack Y C,Chief Data Scientist,5,"One of the best analytical databases currently available in the market and can handle all formats of data ranging from structured, semi-structured, to unstructured.","I don't have anything I particularly don't like. If there is, I would say the Spark statistical modeling libraries are still quite limited compared with the packages from R, SAS, or Python."
G2.com,No name,,4.5,"As a frequent user of Databricks, it has made my life so much easier by simplifying processes and allowing me to develop proof-of-concept designs rapidly. The orchestration of notebooks via workflows provides excellent visualization and enables me to conduct real-time demos for members on the business side. In addition, the integration with Azure and AWS makes it so that Databricks does not operate in isolation and allows me and other engineering team members to transform large amounts of data that is ingested via our enterprise pipelines.","There can sometimes be issues integrating Databricks workflows with open source frameworks, often requiring lots of debugging and trial and error. Additionally, I've been told that the platform can be pretty expensive."
G2.com,Aleksandr P.,SSE,2.5,"Lakehouse platform is a solution that is easy to set up, the infrastructure is easy to maintain and the UI is accessible to a wide variety of engineers.

It allows for a fast rollout to production and covers most common needs of a data company.","The biggest kink in Lakehouse platform is its speed. It does not deliver on the performance promised.

In addition, the Databricks UI is not easy to use. It feels like it's a smartphone app.

On the side of technology, it is slow and expensive, with authorization added as an afterthought.

It's an absolute pain to administer and hard to control expenses."
G2.com,Rohit S.,software Developer,4,"Since Databricks Lakehouse provided a unified single platform for data processing, analysis, and machine learning, it helped me to work with structured, semi-structured, and unstructured data in a single environment.","Databricks Lakehouse required me to learn a few tools and technologies, such as Apache Spark and Delta Lake, which as a beginner was a bit complex for me to learn."
G2.com,aashish b.,Analyst Optimization and Generation,4,"I have been actively engaged in Databricks training and I find it very relevant to the work our organization does. We usually have large amounts of data we need to process for our power generation and revenue needs, and I find that Databricks can be a one-stop shop for our automation and streamlining the process.",I believe it could be a steep learning curve for someone who may not know how to program or have a general understanding of it. The best way to work around this is to follow training offered on Databricks.
G2.com,Nishant B.,Data Scientist-I,4.5,"Seamless integration between Spark, PySpark, Scala, SparkR, and SQL APIs with cloud storage. Easy to use and schedule streaming and batch services with Delta Lake as storage for all data engineering needs with Git integration and revision control.","UI can be a little more like VSCode or cloud editor to give you more choices, modular code, packaged code for better unit testing, and CI/CD can improve the developer experience drastically."
G2.com,No name,,4.5,"The versatility and scalability are the best features for us. We currently use SQL, R, Python, Spark and Scala with Databricks. It's impressive how seamless this experience is for different teams with different use cases and skill sets. The interoperability across these languages and accessing data is a blessing and enables us to use a vast array of tools to solve problems.",More insight into individual job costs would be a helpful feature that is currently lacking. Deploying code is also not as intuitive and the Git integration could be more powerful with an enhanced feature set.
G2.com,Adoba Y,,5,"Best for Analytics, we also got started off first using Fivetran and it was the easiest destination for us to use.","Getting set up wasn't the easiest thing to do, also the UI feels a little old."
G2.com,Nigel B,Senior Data Engineer,5,"If you need to leverage python, spark, and SQL to build ELT pipelines, Databricks offers the most robust and easy-to-use solution for this. It doesn't require a lot of effort to configure and deploy, and allows developers to focus on building pipelines, instead of getting the infrastructure to work.","I do wish there was more visibility into individual job cost, and overall cost as well- but this is a relatively minor complaint. Overall, the platform is great!"
G2.com,Malathi M.,Big Data Consultant,4.5,"Autoloader

Change Data Feed

DLT pipelines

Schema evolution

Jobs Multitask

Integration with leading Git Providers, Data Governance and security tools

MLflow AutoML

Serverless SQL endpoints for analyst

Photon accelerated engine","No GUI-based drag & drop

Complete Data Lineage visualization at the metadata level is still not there

NO serverless Cluster for data engineering pipelines if you use existing interactive clusters, only available through job clusters through DLT

Every feature has some limitations involved

More work is needed on orchestration workflows"
G2.com,Dan P.,,4,"It's fully managed, and gives us lots of processing power with very little effort.","There are lots of areas to it, so understanding all of it at any depth takes time."
G2.com,Ramesh G,,4.5,"Databricks is an excellent tool for data processing and analysis. The platform is user-friendly and intuitive, making it easy for team members of all technical skill levels to collaborate and work on data projects. The integration with popular data storage systems and the ability to run both SQL and Python code make it a versatile option for handling a variety of data types and tasks. The platform also offers robust security features and the ability to scale resources as needed. Overall, I highly recommend Databricks for anyone looking for a reliable and efficient data platform.",Nothing. I like the UI and the toggle between Python and SQL
G2.com,Likhita B,Senior Business Analyst,5,"I like the ease of switching between Python, PySpark and SQL in the same notebook.",The Spark cluster needs to connect faster in Community Edition.
G2.com,No name,,4,"The most reliable and user-friendly option for creating ELT pipelines that employ Python, Spark, and SQL is Databricks. Configuring and deploying it doesn't take much labour, and it frees developers from having to worry about setting up the infrastructure.","using the same cluster to perform several streaming tasks

Since shutdown immediately following the job run/fail is configured by default, job clusters cannot be reused even for the same retry in PRODUCTION. Checking potential ways to raise this limit."
G2.com,"
Vivi S.",Data Science Churn Lab Lead,4.5,"Databricks allows us to access data via pyspark, python and sql.

The interface is easy to use and most of my work is spent there.","I was told the model training part is more costly in Databricks than in Azure.

So some of the jobs need to be done in databricks and some of the jobs need to be done on Azure.

It will be good if cost is not an issue when choosing platforms."
G2.com,"
mandi N.",Senior Java developer,4,Offers low cost storage of data with efficient schema structure and analytics. Support for ACID transactions. Quick and easy data accessibility. Good data governance.,I should hold an active account with AWS or Azure to use Databricks Lakehouse platform or else I can't access it. More guidance on data quality tools is needed.
G2.com,"
Prashant S.",,3.5,It's stores data in delta lake that basically helps to generate a backup of data doesn't matter if process failed every time it took cache of data and most importantly we can easily migrate it with any cloud platform to handle big data.,For dislike I would say some time cluster takes time to run and it gives memory error and it's bit costly in use and sometime notebooks cells stuck in between run so team can work on it bit.
G2.com,"
Laksh S.",Threat Hunting Specialist-II,4,"Ease of use, really optimised platform, lots of good integrations, good customer support.",The platform has some glitches that have been lying around for a while now I feel. The SQL dashboards are very very slow and the screen gets stuck often.
G2.com,No name,,4.5,"The platform is powerful and flexible enough to do almost anything you want to do, like ETL, ML models, data mining, simple adhoc queries, etc. Also easy to switch languages between python, sql, r, scala, etc. anytime you want.","The search function is not my favorite, I often like to use the search function from the browser but it doesn't work well with scripts in a big cell. Also the clusters take a while to start."

G2.com,No name,,3,Being quickly able to get the environment up and running for any kind of workloads. The support for all three languages and catering to the needs of Data Engineering and ML.,"Too many customizations are needed to achieve the right mix of parameterization for optimal performance. On the other hand, snowflake provides lots of features out of the box without the developer worrying about these things."

G2.com,No name,,5,"Pyspark, Delta lake, The way that it integrates seamlessly with AWS services and how they managed to open source everything. It provides a great managed spark infrastructure.",Harder to integrate with more legacy data sets. Requires you to move data into AWS to use.

G2.com,No name,,4,Fast iterative abilities and notebook based UI. It is also helpful to have multiple contributors on a single notebook at one time. You can see where others are in the notebook which helps with collaboration.,We are using Databricks to move large amounts of data. Our team is able to run different ETL pipelines with different schedules in an organized way. We are able to quickly iterate on our notebooks to add new features.

G2.com,"
Rishabh P.",Associate Data Engineer,4.5,"I like delta live table the most because of its working and the exposure it gave to the customer like data constraints and data quality check , that is best","I dislike the python syntax and code to create the delta live tables , so confusing and need to be change the logic , sql syntax is best"

G2.com,"
Arvind K.",Data scientist,5,"It is the best datalake even when compared to the offerings by AWS, GCP or Azure due to its proprietary delta lake technology",No offering for developers to debug their code line by line

G2.com,"
Senthil Kumarr M.",BigData Solution Architect,5,"No 1 - Delta Lakehouse platform supports ACID transactions (Data lake + Datawarehouse)

Easy DLT pipeline with lineage & quality

Unified governance with the unity catalog

Support Schema evolution

Exceptional AUTOLOADER capability","Awaiting for the Serverless Data engineering pipeline with NO capacity planning outside DLT with SLA-based scaling ( I know it's on ROADMAP, I am waiting).

More features on GCP+Databricks integration compared to same as AWS, Azure. (Some capabilities like credential passthrough missing in GCP)"
G2.com,"
Bipin S.",Senior Data engineer,5,It's a complete package for development to deployment. Helps in experimentation and within a few clicks we can move it from experimentation to production.,"Sometimes lacks the feel of working on a traditional IDE kind of environment. However, it's not a significant drawback and one gets accustomed to it with time."

G2.com,"
Shivaji C.",Research analyst,4,"Ease of access to multiple data sources and we can change the code to python to SQL,Scala etc it is impressive.",Not able to create interactive visualization

G2.com,"
Sahithi K.",Application developer,4,"Easy to schedule and run jobs and integrate with airflow and azure storage accounts.

Easy to execute code cell-wise and debug the errors because of its interpreter.",It won't give auto-fill suggestions while coding like how other IDEs give.

G2.com,"
Orr S.",analytics team lead,4.5,The flexibility of working with notebooks that combine python and sql,The visualization tools are nice but very basic and not really helpful

G2.com,"
ihor z.",Data Engineer,5,"Delta Tables, open source format, cloudFiles format, notebook UI and visualizations","other companies do not use delta, so integration is not so simple, as delta sharing"

G2.com,No name,,4.5,"The infrastructure is pretty straightforward. I started out using the Community edition before switching to the premium version, but if you're a student or working on one-off projects, the Community edition should be more than sufficient.","Finding some answers can be challenging at times because there aren't many Pyspark users, forums, or resources available."

G2.com,No name,,4,Great unification of functions & features and data sharing across the organization.,"There's still a lot to learn and make sure that all the functions I use work well and properly. Nothing bad, just more to find out."
G2.com,Sharmila R,Sr Big data Engineer,4.5,Easy to develop and maintain. Flexibility with transactional integrity.,Can be more integrated with DW systems like Snowflake.
G2.com,Nhat H.,data engineer,4.5,Data Engineer & Machine Learning is very easy to use.,It takes time to start a job cluster so I must create a cluster for live update dashboard.
G2.com,Rahul N.,Big data engineer,4,Easy to use and very small learning curve. This makes it easy to start focusing on the actual problem statement and start getting value out of it.,"UX. Though there are features available, sometimes they're hard to find. If you're not trained, your eye might not catch it. Some features can only be applied via API. This requires keeping a constant watch on the documentation to know what other options are available. At least those options could be provided as a note in the UI for knowing there are other possibilities."
G2.com,No name,,4,Python notebooks that abstract away a lot of the complexity e.g. packages and infrastructure.,The drop down menu/tool bar in the UI sometimes feels a bit clunky.
G2.com,Max C,SSE,3.5,Databricks is the most reliable and flexible way to run Spark applications for data engineering workloads.,Databricks is at the top end of the market on pricing.
G2.com,No name,,5,I like the simple user interface that allows me to run spark without having to do much configuration. The Terraform support is also great.,"Databricks runtime is not available locally to run unit tests, so some workarounds have to be made for that."
G2.com,No name,,5,"- Delta Lake & Lakehouse architecture for streaming and batch operations

- Databricks Academy provides hands-on learning and support

- Interact with Databricks resources via Terraform",Orchestration of pipelines could be improved. We currently use an external product to orchestrate our Databricks Spark jobs.
G2.com,Pradeep S,Team  Lead,5,"Data Governance and Simplified Schema.

Support for unstructured along with structured data enabling support for any use cases to build machine learning, business intelligence, and streaming features. Also support Streaming Live Tables which is a new feature in latest version.",performance benchmark needs to be verified with other competitors like Snowflake. Looks like(as per the documentation) the latest version is blazing fast.
G2.com,Martand S.,Sr Data Engineer,4,"Databricks delta lake is the default storage for databricks which makes it very useful. Time travel, transaction, partitioning makes it very efficient.",Until now I have not faced any limitations for my use case.
G2.com,"
Dr. Ernie P.",IT Biz Apps Manager,4.5,"1. The core storage technology is Open Source (Delta Lake)

2. Multiple data formats fully accessible via Spark/Python or SQL

3. Ability to manage code via our own GitHub repositories","1. Not always obvious which pieces are (or will be) open source vs proprietary

2. GitHub integration doesn't support multiple branches, making it difficult to develop alongside production

3. Hard mode-switch between SQL and Data Science user interfaces feels needlessly complex (though I understand there is some technical justification for it)"

G2.com,"
Miguel Ángel F.",Professor - Clustering & Time Series,5,"The ability to work with other professionals (data engineers and other data scientists) on the same platform. I trust Databricks and believe that they will always provide cutting-edge solutions that make Data Science projects more robust, easy and black-box-proof.","Sometimes I feel a bit trapped in the platform. Solutions like Databricks-connect and DBX are great but still miss a development environment more robust than notebooks. For this reason, I tend to use virtual machine from Azure Machine Learning to develop python packages and use Databricks as a compute platform to run queries against Big Data."

G2.com,"
Greg T.",Lead data Engineer,4,"The concept of a lakehouse and the simplicity this brings.

I like the notebook functionality and the constantly expanding ability of Partner Connect and link to BI tools","It feels like an Integrated Development Environment would be a fantastic improvement to the existing UI.

Dbx might help accomplish this with further developments."

G2.com,Tanvi M,Data analyst,5,Delta Table is the best. Spark in a very curated format,Nothing as of now. It's very good overall

G2.com,No name,,5,"Databricks support services needs a special nod, they have come to the table day in and day out with solutions and ideas to better our enterprise environment for any channel we were working with. Also, the support within the platform multi-format data, github support, and an open-source experience are what I love best about the platform.","I'd like a better more seamless way to integrate development teams into our environment. This does feel a bit challenging at times with some minor re-work, but I think there could be progress made in this area. Particularly speaking to how you manage workloads for teams with multiple workspaces."
G2.com,No name,,4.5,"As a Cloud Operation Specialist, I deploy the databricks workspace, setup and manage the clusters. It’s easy to setup and manage the users within the workspace.

UI is very user friendly and intuitive.",Error messages can be more detailed and explained well.
G2.com,No name,,5,"Lakehouse combines the power of storage of data lake and reliability of warehouse, decoupled storage and compute is the best thing","Not enough resources earlier, but now we have all the required material in databricks academy."
G2.com,No name,,5,"For two large projects, our Big Data Analytics and Engineering teams moved to the cloud for the first time with Azure ADLS Gen2 and Databricks. We could have done the ETL the old fashioned way but decided that on a new platform we should adopt the new methodologies. We fully adopted Spark Structured Streaming Medallion Lakehouse architecture with Bronze, Silver, Gold, and were able to deploy in just a few months what normally would have taken us a full year in Oracle with Informatica. For the first time ever our BI professionals were able to hit the same Hive metastore and data model as our data scientists at blazing speeds.",There isn't much that I dislike about the platform. Many of the issues that we are having have to do with using more than one workspace and the need to orchestrate jobs/workflows between them. I understand that some of that new functionality is coming in future releases.
G2.com,No name,,4.5,Opportunity to not manage Hadoop clusters.,"Cluster autoscaling doesn't always work as expected, and I would like to have more control over EC2 instances provisioning (availability to use multiple instance types in a single job/cluster, affinities, possibility to define some sort of topology, etc.). The whole experience is notebook focused."

G2.com,Ahmed M,"Lead Data & AI Cloud Solutions Architect - Telco, Media , Energy, Advanced AI",5,The extensive detailed content that is not shy from being deeply technical and in the same time industry-focused to depth. The amount of information covered here is incredible from data & analytics to GPUs to K8s to industry discussions,"there is no on-demand hands-on labs. and we cannot download all slides for all sessions. The timing of the sessions was also a challenge.

I didn't like the scheduling features too."
G2.com,No name,,5,"Your data is not trapped with a proprietary tool or platform.

You can access your data thru an SQL interface or interface it with Python, Scala or R API.

Much faster than other DWs.","The notebook editor could be better for copying and pasting.

Also it needs better Find lookup function for really long cells.

The new identity column feature for tables needs better support to create or alter a table from PySpark commands."
G2.com,"
Venkatraman S.",Sr. Data engineer,4.5,Databricks Data Science and Engineering Workspace allows writing the coding in various languages and it enables the ingestion process simpler and guarantee that data available for business queries are reliable and current,"Reusing the Cluster feature and delta live tables features was the least liked process, due to the missing link to the GIT integration directly from the Repos.

If this is available then we will be able to use these cool features widely"
G2.com,Ahmed H,Data engineer,4.5,"Everything is on a single platform like ETL, Sql dashboard and running ML models.

Simplified version for creating scheduled jobs using workshops and the best part is Delta Lake.",Every piece of code should be in the form of notebooks which sometimes makes it difficult to manage. It can be more user friendly if they give different options.
G2.com,Alihan Z,Sr Data engineer,5,"A great experience that combines ML-Runtimes - MLFlow and Spark. The ability to use Python, and SQL seamlessly in one platform. Since databricks notebooks can be saved as python scripts in the background it is amazing to have both notebook and script experience and synchronize to git.",Debugging code and using interactive applications outside of databricks approved tools can be tricky. It is hard to get a grasp of the documentation for beginners to the platform.
G2.com,Laura E,chief Data scientist,5,It is a cloud native modern data estate service which handles core DW concepts around the snowflake and star schema requirements like CDC like a champ,"Lakehouses still can become lake-""swamps"" without true governance and what is offered is more 3rd party or bolt on"
G2.com,abel S,Data analyst,4.5,"Integration with Github repos and CI/CD pipelines. Also having different ways to collaborate with team members and stakeholders (repos, workspace, Databricks SQL)","Depending on cluster settings and number of users running queries at the same time and the number of jobs running at the same time, it can sometimes take time to run queries"
G2.com,Micheal L,analytics engineer,5,Best Data Engineering features. Love it.,Very expensive. Wish it would cost less.
G2.com,No name,,5,"Databricks Lakehouse brings together BI, SQL-based data warehouses, data governance, processing and DAG creation, and ML (and more) under one umbrella. Competitors like Dataiku, Snowflake, Cloudera, etc. really can't compete and don't bring the same value proposition out of the box.","Unless you're willing to keep your clusters that serve DB SQL queries spun up at all times, the ""first query wait"" can be quite annoying. However, using Databricks in its serverless form (managed environments) would mitigate that drawback."

G2.com,Paul b,Full Stack Data Scientist,4,"Scheduled jobs, pre-installed ML environments, abundance of documentation and example use cases. There are a lot of best practices available through the Spark AI conference also.",Their Spark install is not always compatible with 3rd party tools (ex: geospatial) causing a delay in some package availability until an extra release. Some gotchas with distributed model training (Horovod).

G2.com,No name,,5,"Integrated UI for SQL, Spark, Python. This makes the job really seamless",Nothing as of now. Enjoying the product

G2.com,"
Neelakanta P.",sr Data Engineer,4.5,Databricks lakehouse is a one-stop shop for analytics with Big data case,Databricks has many releases ongoing and that might create a need for customers to constantly update their infrastructure

G2.com,No name,,5,"Lesser Running time, handling big datasets, user-friendly platform","Cluster active time is less, active time should be increased when not in use"

G2.com,"
Herivelton A.",Data engineer,5,"Delta lake house implementation, enabling and democratizing the data acquisition and consumption. The delta sharing initiative is so disrupting to creating data assets.",It seems that we could have an easier way to suggest improvements or new features needed in the daily activities. Overall good features.

G2.com,"
Julie-Anne B.",Fraud and Crime specialist,4.5,"I really like that we can have queries. We have only started to review them properly for each account, but there are limitless opportunities once we start digging.","I would like to access more access to creating queries and investigating several things. Sometimes I get glitches with the account, it freezes and I can't see the information."
G2.com,"
Bruno A.",executive manager,4.5,Integration and flexibility on the daily base speed up the development and delivery. Data lake and data sharing is amazing and away from the competitors in the market.,It should be easy to get improvements in the platform and embedded reports for daily basis usage.
G2.com,Kavya P,Freelance Writer,5,"It offers multi-cloud support across AWS, GCP, and Azure

New features are aggressively released every quarter.

The UI is relatively user-friendly compared to AWS EMR or other similar products","Errors are not entirely straightforward sometimes.

Regular Maintenance can sometimes cause downtime or failure, which can be solved with proper scheduling and retry mechanisms."
G2.com,Mayank S,Domain architect,5,A single unified platform that can be used for both real-time and batch data ingestion patterns to fulfill both BI and advanced analytics use cases.,Nothing. I absolutely love the platform.
G2.com,Colin M,Principal Sales Engineer - Data Enrichment,5,Easy to use for a SQL only technical person.,I would like SQL query designer to make the text
G2.com,"
Geoffrey F.",Solution architect,4.5,I love that Databricks abstracts away all of the administrative overhead of running spark clusters.,I wish that spark could do run time partition elimination
G2.com,Tejas Sai,Sr Data engineer,5,Data bricks Lake House architecture seems promising,I don't see much video tutorials. If we have tutorials for free we can learn more
G2.com,Mayur s,sr Consultant,4.5,"One platform to access Notebooks, tables, AI/Ml Platform","No debugger like other IDE's, difficult to navigate notebooks and functions"

G2.com,"
Robert T.",Solution arch,5,workload isolation is the best feature for our use case.,serverless endpoint with AAD passthrough has few selections

G2.com,"
Marcelo A.",Data Eng,4,"Really helpful abstractions, intuitive UI and attentive support.",Very hermetic environment. Could allow more integrations with outer platforms.

G2.com,Alan B,,5,Single pane of glass to create an end to end solution,"No dislikes yet, some aspects of capability delivery are difficult to govern"

G2.com,"
Pierre-Alain R.",Data Analyst/ Scientist,4,All integrated platform with different tools.,More visible way to manage and maintain Delta lakes

G2.com,Guilherme de Almeida G.,PM,5,The way that works with the delta format. Providing a lot of possibilities without the necessity to have a dedicated database administrator. It's also nice to talk about their flexibility,"Sometimes the cluster management is not so well distributed, causing some necessity to restart the cluster. Maybe send some warnings before it gets non workable."
G2.com,No name,,5,Databricks is a great platform to bring all your data into a single location and provides tools for many personas to work with that data from BI to AI.,There is a learning curve to using Databricks
G2.com,No name,,5,Has tools like AutoML which reduces human effort and increases better predictions and deeper understanding of the data,The platform can be slow sometimes. Other than that not major issues worth mentioning
G2.com,No name,,5,The platform integrates the best of the warehouse and lake.,"As of now, there is nothing to report on disadvantages."
G2.com,No name,,4,combines data warehousing with data lakes; ease of use and implementation; compatibility with BI tools,nothing i can think of - databricks is awesome
G2.com,No name,,4.5,"For me, I like the data science and SQL platform best. They are extremely helpful for my job, allowing me to streamline my work and automate it using Jobs.",Sometimes the platform can be a bit slow to react but I'm not sure if it's the cluster size or something is wrong with Databricks itself. Overall I didn't find many issues with the platform.

G2.com,No name,,4.5,"How a steep learning curve Databricks is, I'm enjoying learning all the time with such great materials and people.","Sometimes I feel documentation is a bit misleading. E.g. Pandas UDF + ML model combined - the functionality of both is amazing, but actually it's not clear how to use it"

G2.com,Hubert D.,Consultant,5,It is a one-stop shop for all. It is clearly the best data processing solution that I have ever used.,"There is still room for improvements - for example, when you deploy, it could be nice to set the configuration of DBFS storage (is it LRS or GRS, etc.). When VMs are deployed, there should be more options to configure hard drives. It is a minor issue related to administrative tasks, but anyway, it is the best lakehouse on the market today."

G2.com,No name,,4.5,"I like how Data bricks is layed out. At first it can seem difficult to read, but from computer background i find it a great way to layout the information in an organized fashion.",Sometimes Credit Files will pull not pull in the Credit Kyc section and will almost always pull in the Customer information Tab.

G2.com,No name,,5,"One place with all necessary features.

1. tracking

2. storage of models and files related documentation like log, config file.

3. validation of the model with metric feature and plot corresponding to it and generate the report from the experiments.

4. Data callibation with data 🧱, SQL and other cloud providers.","Nothing to dislike yet. But it would help to have a feature for more advanced analytics on parameters and metrics, generating different plots from that tabular data, like d-tale (github: https://github.com/man-group/dtale), because I was running some simulation runs with different experiments and then writing the report corresponding to the experiments, with different significant plots to support the write-up."

G2.com,Damon Wilder C.,CTO,4.5,"Incredible support for Data Science, Machine Learning and SQL use cases all in one platform.","Theoretically their SQL Engine may not scale, but I have not seem that at all."

G2.com,No name,,4,The slowly changing dimension features that come out of the box with the lakehouse,Lack of UI/UX to have a smooth user experience

G2.com,Carissa T,BDA,4,I like how granular we can get when running queries and reports for our business purposes,The user interface could be a little sleeker and more intuitive

G2.com,"
Prashidha K.",Optimisation Engineer,4.5,Databricks had very powerful distributed computing built in with easy to deploy optimized clusters for spark computations. The notebooks with MLFlow integration makes it easy to use for Analytics and Data Science team yet the underlying APIs and CICD integrations make it very customizable for the Data Engineers to create complex automated data pipelines. Ability to store and query and manipulate massive Spark SQL tables with ACID in Delta Lake makes big data easily accessible to all in the organization.,"It lacks built in data backup features and ability to restrict data access to specific users. So if anyone accidentally deletes data from Delta Table or DBFS, the lost data cannot be retrieved unless we setup our own customized backup solution."

G2.com,No name,,5,"Delta Lake , SQL Analytics , Optimized Photon engine","Notebook UI , performance for SQL analytics on huge volume on the fly aggregate calculations"

G2.com,No name,,4,The ability to monitor and relaunch jobs from the phone is amazing. I don't need to be on my computer on the weekend to relaunch a failed job.,"I don't particularly appreciate how the mlflow menu is hidden from the left bar UI unless I enter a specific link that contains the ID of an experiment.

For instance, I cannot see anything in https://*******.cloud.databricks.com/#mlflow and I cannot see mlflow in the left bar unless I go to :

https://*******.cloud.databricks.com/#mlflow/experiments/3817218/runs/172f6fcb9f144bbc93cc5b3b857c50f1"
G2.com,"
Joseph M.",DOT,5,"An interface that is better than Jupyter notebooks that allows SQL, Scala, PySpark, Python, R and the ability to collaborate on notebooks",
G2.com,Priyanka B,analyst,5,"I learned to datamine with Python on Databricks and I use it daily. It is a nice software, user friendly and easy to connect to multiple sources",The errors can be a little more explanatory than what it is currently.
G2.com,"
Navisha S.",Associate Data Science Engineer,5,It's very useful when it comes to tracking performance of machine learning models. Acts like a dashboard that would otherwise have to be built from scratch.,There aren't any major downsides but the model training part according to me is still better run locally for comfortable experiments
G2.com,No name,,4.5,MLFlow has been instrumental in providing a decoupled interface between training and prediction part of our ML pipelines where we can store model metadata as part of training cycle and then in prediction we are able to leverage and pick model with highest ROC or just latest by chronological sorting and overall just do good job in tracking our experiments.,Sometimes slow to render in databricks environment as UI but that could be related to our databricks setup.
G2.com,Omkar m,SDE,4,"The ease of use is the most useful feature I like about MLFlow, it can be used locally without any need for deployment on any server, great UI which allows us to search through general experiments and flexibility of changing data stores.",The UI design is in Django I think which is a bit laggy and slow; improvement can be done on that side.
G2.com,Stefan P,Lead data engineer,5,"Easy administration, easy to create jobs from notebooks, great development environment, new and exciting features coming.",Taking away our dedicated customer service rep and replacing this with just a support GUI.
G2.com,Vinit K,HSEO engineer,3.5,One of the best utilities for packing code into reproducible runs and tuning the hyperparameters.,Merge with Azure data bricks was not very smooth & giving problems while using GPU cores.
G2.com,No name,,5,I like how it forces the developer to follow a certain code style which can basically help maintain the codebase much more easily over time and have a proper documentation over it.,I think there could be improvements within the documentation over how to use MLflow within existing codebases.
G2.com,No name,,5,MLflow tracking has been a major advantage for keeping up the record of the results of the experiments we carry out on the data using different parameters. Tracking the results and parameters is very useful for achieving the most optimized solution.,One small counter point is that it is not an easy tool and requires all in depth knowledge for making the best use of it.
G2.com,Gitesh K,Coursera Deep Learning Mentor,4.5,The features. There are hell lot of features present on MLFlow,"Haven't explored everything. But yeah, maybe better documentation."
G2.com,"
Jorge C.",,4.5,"It is a highly adaptable solution for data engineering, data science, and AI",I wouldn't say I like the lack of an easier way to import personalized code files or libraries from notebooks.

G2.com,Chen S,Lecturer in Artificial Intelligence,5,"Experiment management, and model deployment.",Support for code engineering and version control.

G2.com,Pavan kumar Y,Sr data eng,5,"It has got everything in it. IDE, Version Control, Scheduling whatnot.",I didn't find something that discomforts me yet.
G2.com,No name,,4.5,The fact that you can store all your models at one place,The support for custom transforms isn't there

G2.com,Deepa Ram S.,Data analyst 2,4.5,Easy to use multiple languages based command in same notebook. Direct connection to Redshift.,Sometime it takes lot of time to load data. Should show better suggestions.

G2.com,Ayush O,Data scientist,4.5,Great for model refreshes and comparison,Ui and artifact loads can be improved...

G2.com,"
Stephen D.",Senior Systems/Data Analyst,4,It makes the power of Spark accessible and innovative solutions like Delta Lake.,Fewer solutions that aren't wholly or partially on the cloud.
G2.com,Debashis P.,BI Development Manager,4.5,Spark Distribution of query and speed of batch query so does performance,Interface can be made better and more intuitive
G2.com,"
Ramavtar M.",sse,4.5,"1) A single format to support all major ML libraries such as Sklearn, Tensorflow, MXnet, Spark MLlib, Pyspark etc.

2) Capabilities to deploy on Amazon Sagemaker with just one API call

3) Flexibility to log all model params such as Accuracy, Recall, etc. along with Hyperparameter tuning support.

4) A good GUI to compare and select the best models.

5) Model registry to track Staging, Production, and Archived models.

6) Python best API

7) REST APIs supported.

8) Available out of the box in Microsoft Azure.","1) CI/CD pipeline is not supported in the open-source version

2) Recent framework so not a very large community

3) Dependent on many python libraries. It can be a problem while resolving dependencies in your existing setup."
G2.com,"
Chad F.",VP & data head,5,"Incidentally, the thing I like most about Databricks isn't a product feature at all; I love Databricks's proactive and customer-centric service, always willing to make an exception or create a unique feature, all the while minimizing costs for the customer - as @Heather Akuiyibo & Shelby Ferson et al. have done for me and my former teams!",Broadening programming logic and syntax.
G2.com,No name,,4.5,Multiple people can write code in the same file at the same time. We use Databricks in Machine Learning.,"The cluster gets shut down after some time, leading to loss of data in RAM"
G2.com,No name,,5,Having a platform to share codebase with team members and run machine learning models on the cloud.,"Sometimes we have to restart clusters to fix memory errors, which leads to data loss."
G2.com,No name,,5,Machine learning model tracking and finding the best weights,Add support for other programming languages like C++
G2.com,No name,,4,Centralization and remote servers and API,Nothing specific. I like everything about mlflow.
G2.com,ianthe L.,Digital Marketing Specialist,5,"It is great when you have a large amount of data, excellent for collaboration, perfect for using with visualisation tools and functions with many programming languages.",Difficult to get a grasp on how many applications and functions it has.
G2.com,Somu S.,Data engineer,5,"Interactive clusters, user friendly, excellent cluster management","Cluster takes some time to warm up on start; should support upsert without Delta, as businesses need pure upserts too"
G2.com,Alvaro R.,Data Engineer,5,"The different languages used for implementation.

Great user experience.

Easy to understand and use.

Creation of different tools inside such as clusters or database.

Ease of integration with other software such as Azure services.

Great addition to your expertise if you manage to master it completely.

Integration of Spark with the different languages (Python, R, Scala).","The documentation inside the portal isn't the best; better support can be found outside with search engines."
G2.com,Vikrant B.,Sr Consultant,4.5,"Databricks is a great analytics tool which provides lightning-speed analytics and has given new abilities to Data Scientists. Additionally, our advanced analytics at scale has gone up 100 times.",The learning curve is steep and people would need coding knowledge to work with Databricks. It can also be costly at times.
G2.com,Douglas D.,Head of data science,4.5,"It's like a Jupyter notebook but a lot more powerful and flexible. You can easily switch from Python to SQL to Scala from one cell to the next. With the Spark framework, you can preview your data processing tasks without having to build large intermediate tables.","Need better support when it comes to troubleshooting Spark applications. It shows a lot of information, but gives you little sense of how to apply it"
G2.com,No name,,4.5,"1. Good UI

2. Good integrations with other applications/services.

3. Faster and efficient.

4. Updates are good.","1. Sometimes it takes a long time to load the Spark notebook.

2. Sometimes there are issues with interpreter settings while running the notebook."
G2.com,No name,,4.5,"It has significantly improved its performance with the Databricks Input and Output Module. With better support for Spark, it combines well with Microsoft Azure and Amazon AWS. It has faster execution and faster read/write processes in its version 5.",A few schema-related queries are still on the slower side considering the huge data clusters and the processing involved for those clusters.
G2.com,No name,,3.5,You can sync data from different systems all onto this one platform and everything can be analyzed without switching programs since you can also use many different programming languages and reap the benefits of each such as SQL and Python. This makes it so much easier to work with large datasets. Very nice user interface too!,"Very difficult to collaborate on projects using Databricks, it is its biggest downfall and in fact just almost outweighs the benefits. I also don't think their customer support is the best, have had some challenges with that. Otherwise a very good product."
G2.com,No name,,4.5,The system architects data delivery in a very easy to use and intuitive way. Non-data-savvy individuals are able to access the insights that data can provide and the support around the product is 2nd to none!,"Cost is always a concern when working with a system like this, but if the organization can afford it the insights and ease of use are worth it"
G2.com,Reeham N.,Lead Data Scientist/Analytics Manager,2.5,"The ability to automatically load data from AWS into Databricks for collection and analysis. It does a great job of customizing the notebooks and nodes that can be created.","This tool is not optimized for R users. They were supposed to have an update in Q1 for RStudio, but their help team informed me that this was no longer a priority. Even so, heavy AI and machine learning algorithms are not optimized for use here besides the usual Theano and Keras on Python. It's also difficult to run analyses on a large volume of data without sampling, which defeats the purpose."
G2.com,Shristi K,Incident Problem Manager,5,This is really a nice user friendly platform.,I have not found any glitches. It is really good.
G2.com,No name,,5,It is no surprise that Spark is one of the fastest growing technologies today and Databricks provides a platform that makes transitioning to Spark easier. I like how there are also tutorials for people who are just beginning to learn that make the onboarding to Spark easier. Love the connection with GitHub so there is the ease of sharing the projects with the world. Love the ease of pipeline creation in whatever language that one is comfortable in.,"Could give a bigger size of the cluster for individuals and students so that they can explore it to a bigger extent. Also, technical support is not good enough. I also do not like that there is no way you can collaborate on a project. Visualization could be better."
G2.com,No name,,4,"Overall, since we brought in Databricks, our ability to use data science and advanced analytics at scale has gone up 100 times. Our experience has been awesome, and I know we're not even pushing the bounds of what it can do","Overall Databricks has worked well, though it has taken longer than we anticipated to get it up and running."
G2.com,Shivi G,Partner Account Lead,2.5,Get the data you need right at your fingertips,Data can be hard to pull (write code in SQL) versus other platforms
G2.com,No name,,4,"Databricks is my company's one-stop shop for interacting with our expansive datasets. Databricks has been great so far for navigating our complex storage systems, accessing data, and being able to analyze it without having to switch programs. One of the best features of Databricks is that you can use a variety of languages within the program to complete all steps needed to fully use the data. I like being able to switch seamlessly between Python, Spark, and SQL to work on big data sets.

Additionally, the formatting of the workbook is awesome. You can create new spaces below your original data view in order to perform the analysis.",When using Databricks on a cloud-based server it is sometimes difficult to search through the folders and tables to find exactly what you need. I think it would be beneficial if they created an S3 browser to speed up this process.
G2.com,No name,,4,"I love how accurate and quick Databricks is. Once I started working with Databricks, I couldn’t fathom doing data analysis and comparisons without it.",
G2.com,Tommy O,,5,Software was great and easy. It was fun to use.,Nothing at all. It was understandable and fun.
G2.com,No name,,5,"Databricks is a great tool to integrate queries from MySQL, Redshift, Python, and Adobe Clickstream data and the queries run pretty fast too.",It takes some coding knowledge to set up and a good Data Engineering team.
G2.com,No name,,3.5,One of the best features on the platform is the ability to use a notebook environment and attach notebooks to different Spark interpreters. I do like the user interface and the easy access to browsing files stored on the cluster.,"Controls are not really developed, and it is hard to optimize runtimes. Databricks is expensive and we cannot say it is the best for price/value."