Databricks
8 Topics

Azure Databricks - SQL query - Configuration not available
I spun up a FINOS Legend Studio instance locally and was able to establish connectivity between the application and my Azure Databricks resource. However, when I run a SQL query from Legend Studio that is supposed to execute on Databricks, I get a "Configuration legend_databricks_http_path is not available" error from Databricks. By going to "Query History" in Azure Databricks, I can confirm that Legend Studio is reaching Databricks, but Databricks responds with the error mentioned above. The "See error" button doesn't provide any additional details. Is anyone familiar with this "Configuration ... is not available" type of error in Azure Databricks SQL queries?

Solved · 64 Views · 0 likes · 2 Comments
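Not a Legend-specific fix, but a hedged diagnostic sketch: the legend_databricks_http_path value Legend complains about plays the same role as the HTTP path any client needs to reach a SQL warehouse, so confirming that a plain connection with the databricks-sql-connector works against the same warehouse can help isolate whether the problem is on the warehouse side or in the Legend connection specification. The hostname, HTTP path, and token below are placeholders.

```python
# Hedged diagnostic: confirm the warehouse HTTP path works outside Legend.
# Requires the databricks-sql-connector package; all values are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # workspace host
    http_path="/sql/1.0/warehouses/abc123def456",                  # SQL warehouse HTTP path
    access_token="dapiXXXXXXXXXXXXXXXX",                           # PAT or AAD token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS connectivity_check")
        print(cursor.fetchone())
```

If this succeeds with the same warehouse, the HTTP path itself is fine and the issue is more likely in how the Legend connection passes that configuration through to Databricks.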
Data archiving of delta table in Azure Databricks

Hi all, I am currently researching data archiving for Delta table data on Azure, as our company has a data retention policy. I have studied the official Databricks documentation on archival support (https://docs.databricks.com/en/optimizations/archive-delta.html), which says: "If you enable this setting without having lifecycle policies set for your cloud object storage, Databricks still ignores files based on this specified threshold, but no data is archived." I am therefore trying to work out how to configure the lifecycle policy on the Azure storage account, and have read the Microsoft documentation (https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview). Say the Delta table data is stored in "test-container/sales" and that folder contains many "part-xxxx.snappy.parquet" data files. Should I simply specify "tierToArchive", "daysAfterCreationGreaterThan: 1825", and "prefixMatch: ["test-container/sales"]"? I am worried, though, about whether this archive mechanism will impact normal Delta table operations. I am also worried about what happens if a Parquet data file moved to the archive tier contains both data created more than 5 years ago and data created less than 5 years ago; is that possible? Could it end up moving data to the archive tier before it is 5 years old? I would highly appreciate it if someone could help me with the questions above. Thanks in advance.

146 Views · 0 likes · 1 Comment
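A minimal sketch of what that lifecycle rule could look like, written as a Python dict and dumped to the JSON you would hand to the portal or to "az storage account management-policy create --policy @policy.json"; the rule name is illustrative. Two caveats: the rule acts on the age of the blob, not of the rows inside it, so a Parquet file rewritten by OPTIMIZE gets a fresh creation time even if it holds old rows; and, per the Databricks archival-support doc cited above, the table itself also needs its archival threshold enabled (the delta.timeUntilArchived table property, if I read that doc correctly) so queries skip files that have been tiered to Archive.

```python
# Hedged sketch of an Azure Storage lifecycle rule matching the scenario above.
# Container and threshold are the ones from the question; apply the resulting JSON
# via the portal or:
#   az storage account management-policy create --account-name <acct> \
#       --resource-group <rg> --policy @policy.json
import json

policy = {
    "rules": [
        {
            "name": "archive-sales-after-5-years",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # container name + path prefix, i.e. test-container/sales
                    "prefixMatch": ["test-container/sales"],
                },
                "actions": {
                    "baseBlob": {
                        # tier blobs to Archive 5 years after they were created
                        "tierToArchive": {"daysAfterCreationGreaterThan": 1825}
                    }
                },
            },
        }
    ]
}

with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```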
Harnessing Retail Data with Azure: Integrating Blob Storage and Databricks for Advanced Analytics

Learn how a retail company leverages Azure Blob Storage and Azure Databricks to store, process, and analyze its massive sales data. You will see how the company uses PySpark to transform data into insights that help them optimize their product strategy and marketing campaigns. You will also find some learning resources to help you get started with data engineering on Microsoft Azure.

1.4K Views · 0 likes · 0 Comments
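As a rough illustration of the PySpark pattern the post describes (reading raw sales files from Blob Storage/ADLS and aggregating them into a curated Delta table), here is a minimal sketch; the storage account, container, and column names are placeholders, not the company's actual schema.

```python
# Rough sketch of a PySpark aggregation over raw sales data in ADLS/Blob Storage.
# Storage account, container, and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.read.parquet("abfss://sales@retailstore.dfs.core.windows.net/raw/")

revenue_by_product = (
    sales.groupBy("product_id", "region")
         .agg(F.sum("amount").alias("total_revenue"),
              F.count("*").alias("orders"))
         .orderBy(F.desc("total_revenue"))
)

# Persist the curated result as a Delta table for downstream reporting (e.g. Power BI).
revenue_by_product.write.format("delta").mode("overwrite").save(
    "abfss://sales@retailstore.dfs.core.windows.net/curated/revenue_by_product"
)
```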
Empowering Startups: The Introductory Guide to Databricks for Entrepreneur's Data-Driven Success

Unlock the key to entrepreneurial success with Databricks: a journey where data empowers startups to thrive. Get ready to embark on a transformative quest for data-driven excellence!

3.3K Views · 2 likes · 0 Comments
Loading Parquet and Delta files into Azure Synapse using ADB or Azure Synapse?

I have the following scenario. We use Azure Databricks to pull data from several sources, generate Parquet and Delta files, and load them into our ADLS Gen2 containers. We are now planning to build our data warehouse in Azure Synapse SQL pools, where we will create external tables for the dimension tables (using the Delta files) and hash-distributed fact tables (using the Parquet files). The question is: to automate this data warehouse loading activity, which method is better? Is it better to write our transformation logic in Azure Databricks to create the dim and fact tables and load them regularly into Azure Synapse SQL pools, or is it better to write that transformation logic in Azure Synapse itself? Please help.

619 Views · 0 likes · 1 Comment
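For the Databricks-driven option, here is a hedged sketch of pushing a curated dimension table into a dedicated SQL pool with the Azure Synapse connector from a Databricks notebook; the JDBC URL, credentials, storage paths, and table names are placeholders. The Synapse-driven alternative would instead read the Parquet/Delta files already sitting in ADLS Gen2 with Synapse pipelines or T-SQL (COPY INTO / CTAS) and keep the transformation logic in the SQL pool.

```python
# Hedged sketch of the Databricks-driven option: build a dimension in Databricks
# and push it into a dedicated SQL pool via the Azure Synapse connector.
# JDBC URL, credentials, storage paths, and table names are placeholders;
# secrets should come from a secret scope, not literals.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dim_customer = (
    spark.read.format("delta")
         .load("abfss://curated@mydatalake.dfs.core.windows.net/customers")
         .select("customer_id", "customer_name", "segment")
         .dropDuplicates(["customer_id"])
)

(dim_customer.write
    .format("com.databricks.spark.sqldw")                      # Azure Synapse connector
    .option("url",
            "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
            "database=dwh;user=loader;password=<from-secret-scope>;encrypt=true")
    .option("tempDir", "abfss://staging@mydatalake.dfs.core.windows.net/synapse")
    .option("forwardSparkAzureStorageCredentials", "true")      # staging auth via Spark creds
    .option("dbTable", "dbo.DimCustomer")
    .mode("overwrite")
    .save())
```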
Train your Model on Spark/Databricks, score it on ADX

Are you using Spark/Databricks to build Machine Learning models? Do you need to score new data that is streamed into Azure Data Explorer? If this is your scenario, please read on! In this blog we show how to train an ML model on Azure Databricks, export it to ADX, and score new samples directly on ADX, in near real time, using inline Python code embedded in a KQL query.

5.9K Views · 2 likes · 4 Comments
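As a hedged sketch of the training/export half of that pattern (not the blog's exact code): fit a model in a Databricks notebook, then serialize it to a hex string that can be ingested into an ADX table and later unpickled inside the python() plugin in KQL. The source table, feature columns, and choice of classifier below are illustrative.

```python
# Hedged sketch: train on Databricks, serialize the model for export to ADX.
# The source table, feature columns, and classifier are illustrative placeholders.
import pickle
import binascii

from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression

spark = SparkSession.builder.getOrCreate()

# Pull the training set to the driver; assumes it fits in memory.
train_df = spark.table("sensor_training_data").toPandas()

X = train_df[["temperature", "pressure", "vibration"]]
y = train_df["is_anomaly"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Hex-encode the pickled model so it can be stored as a plain string column in ADX
# and rebuilt with pickle.loads(binascii.unhexlify(...)) inside the KQL python() plugin.
serialized_model = binascii.hexlify(pickle.dumps(model)).decode("ascii")
print(f"model blob length: {len(serialized_model)} characters")
```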
Getting started on Azure

I work with large datasets and I am just getting started with learning Azure. I am familiar with Python and Power BI. I am planning to integrate Synapse and Databricks for analytics and visualisation using Power BI. What books do you recommend for understanding these modules?

1.1K Views · 0 likes · 1 Comment