Databricks
Azure Databricks - SQL query - Configuration not available
I spun up a FINOS Legend Studio instance locally and was able to establish connectivity between the application and my Azure Databricks resource. However, when I run a SQL query from Legend Studio that is supposed to execute on Databricks, I get a "Configuration legend_databricks_http_path is not available" error from Databricks. By going to the "Query History" page on Azure Databricks, I can confirm that Legend Studio is reaching Databricks, but it responds with the error mentioned above. The "See error" button doesn't provide any additional details. Is anyone familiar with this "Configuration ... is not available" type of error in Azure Databricks SQL queries?
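
The message suggests that Legend's connection definition contains a placeholder (legend_databricks_http_path) that is never substituted with a real Databricks HTTP path before the query is sent. For comparison, here is a minimal sketch of what a direct connection to a Databricks SQL warehouse needs, using the databricks-sql-connector package; the hostname, HTTP path, and token values are hypothetical placeholders, not values from the original post:

```python
# pip install databricks-sql-connector
from databricks import sql

# All three values below are hypothetical placeholders. The http_path argument
# is what a placeholder like "legend_databricks_http_path" would need to
# resolve to, e.g. /sql/1.0/warehouses/<warehouse-id> from the warehouse's
# "Connection details" page.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapi-REDACTED",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```

If Databricks receives the query but rejects it with this message, the substitution on the Legend side (whatever environment variable or vault entry feeds that placeholder) is the first thing worth checking.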

Data archiving of delta table in Azure Databricks
Hi all, I am currently researching data archiving for Delta table data on the Azure platform, since there is a data retention policy within the company. I have studied the official Databricks documentation on archival support (https://docs.databricks.com/en/optimizations/archive-delta.html). It says: "If you enable this setting without having lifecycle policies set for your cloud object storage, Databricks still ignores files based on this specified threshold, but no data is archived." Therefore, I am looking into how to configure the lifecycle policy on the Azure storage account, and I have read the Microsoft documentation on lifecycle management (https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview).

Say the Delta table data is stored in "test-container/sales" and that folder contains many "part-xxxx.snappy.parquet" data files. Should I simply specify "tierToArchive", "daysAfterCreationGreaterThan: 1825", and "prefixMatch: ["test-container/sales"]"? However, I have two worries: will this archive mechanism impact normal Delta table operations? And what if a Parquet data file moved to the archive tier contains both data created more than 5 years ago and data created less than 5 years ago - is that possible? Could it end up moving data to the archive tier before it is 5 years old? I would highly appreciate it if someone could help me out with the questions above. Thanks in advance.
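
For reference, a lifecycle management rule matching what the question describes could look like the sketch below. This follows the standard Azure Storage lifecycle policy JSON; the rule name is made up, and the prefix assumes the container/folder layout stated in the question (note that prefixMatch begins with the container name):

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "archive-sales-after-5-years",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToArchive": { "daysAfterCreationGreaterThan": 1825 }
          }
        },
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "test-container/sales" ]
        }
      }
    }
  ]
}
```

One point worth noting: the policy operates per blob, keyed on the file's creation time, not on the row values inside a Parquet file, so a file that mixes rows older and newer than 5 years is archived (or not) as a whole, based on when the file was written. That is also why the Databricks archival-support page linked above pairs the storage policy with a matching table property (delta.timeUntilArchived), so that Delta operations stop expecting to read the archived files.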

Loading Parquet and Delta files into Azure Synapse using ADB or Azure Synapse?
I have the following scenario. We use Azure Databricks to pull data from several sources, generate Parquet and Delta files, and load them into our ADLS Gen2 containers. We are now planning to build our data warehouse in Azure Synapse SQL pools, where we will create external tables for the dimension tables (using the Delta files) and hash-distributed fact tables (using the Parquet files). Now, the question is: to automate this data warehouse loading activity, which method is better? Is it better to write our transformation logic to create the dim and fact tables in Azure Databricks and load them regularly into the Azure Synapse SQL pools, or to write that same transformation logic in Azure Synapse itself? Please help.
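
If the Databricks route is chosen, one common pattern is the Azure Synapse connector that ships with Databricks, which stages the data in ADLS Gen2 and loads it into the dedicated SQL pool. A minimal sketch, assuming hypothetical storage, workspace, and table names (authentication to the SQL pool is omitted for brevity):

```python
# Runs inside an Azure Databricks notebook or job; `spark` is the session
# provided by the runtime. All names below are hypothetical placeholders.
fact_source = spark.read.format("delta").load(
    "abfss://data@mystorageacct.dfs.core.windows.net/sales"
)

(fact_source.write
    .format("com.databricks.spark.sqldw")                   # Azure Synapse connector
    .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
                   "database=mydwh;encrypt=true;loginTimeout=30;")
    .option("tempDir", "abfss://staging@mystorageacct.dfs.core.windows.net/synapse-tmp")
    .option("forwardSparkAzureStorageCredentials", "true")  # reuse session storage creds for staging
    .option("dbTable", "dbo.FactSales")                     # hash-distributed fact table in the pool
    .mode("overwrite")
    .save())
```

The alternative, keeping Databricks as the file producer and doing the dim/fact modelling inside Synapse (e.g. with CETAS or COPY INTO over the ADLS files), works as well; the choice usually comes down to where the team prefers to own and schedule the transformation logic.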

Getting started on Azure
I work with large datasets and am just getting started on learning Azure. I am familiar with Python and Power BI, and I am planning to integrate Synapse and Databricks for analytics and visualisation using Power BI. What books do you recommend for understanding these modules?