Databricks
8 Topics

Azure Databricks - SQL query - Configuration not available
I spun up a FINOS Legend Studio instance locally and was able to establish connectivity between the application and my Azure Databricks resource. However, when I run a SQL query from Legend Studio that is supposed to execute on Databricks, I get a "Configuration legend_databricks_http_path is not available" error from Databricks. By going to "Query History" in Azure Databricks, I can confirm that Legend Studio is reaching Databricks, but Databricks responds with the error mentioned above. The "See error" button doesn't provide any additional details. Is anyone familiar with this "Configuration ... is not available" type of error in Azure Databricks SQL queries?

Solved · 64 Views · 0 likes · 2 Comments
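Not a Legend-specific fix, but a hedged diagnostic sketch: the legend_databricks_http_path value Legend complains about plays the same role as the HTTP path any client needs to reach a SQL warehouse, so confirming that a plain connection with the databricks-sql-connector works against the same warehouse can help isolate whether the problem is on the warehouse side or in the Legend connection specification. The hostname, HTTP path, and token below are placeholders.

```python
# Hedged diagnostic: confirm the warehouse HTTP path works outside Legend.
# Requires the databricks-sql-connector package; all values are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # workspace host
    http_path="/sql/1.0/warehouses/abc123def456",                  # SQL warehouse HTTP path
    access_token="dapiXXXXXXXXXXXXXXXX",                           # PAT or AAD token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS connectivity_check")
        print(cursor.fetchone())
```

If this succeeds with the same warehouse, the HTTP path itself is fine and the issue is more likely in how the Legend connection passes that configuration through to Databricks.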
Data archiving of delta table in Azure Databricks

Hi all, I am currently researching data archiving for Delta table data on Azure, as our company has a data retention policy. I have studied the official Databricks documentation on archival support (https://docs.databricks.com/en/optimizations/archive-delta.html), which says: "If you enable this setting without having lifecycle policies set for your cloud object storage, Databricks still ignores files based on this specified threshold, but no data is archived." I am therefore trying to work out how to configure the lifecycle policy on the Azure storage account, and have read the Microsoft documentation (https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview). Say the Delta table data is stored in "test-container/sales" and that folder contains many "part-xxxx.snappy.parquet" data files. Should I simply specify "tierToArchive", "daysAfterCreationGreaterThan: 1825", and "prefixMatch: ["test-container/sales"]"? I am worried, though, about whether this archive mechanism will impact normal Delta table operations. I am also worried about what happens if a Parquet data file moved to the archive tier contains both data created more than 5 years ago and data created less than 5 years ago; is that possible? Could it end up moving data to the archive tier before it is 5 years old? I would highly appreciate it if someone could help me with the questions above. Thanks in advance.

146 Views · 0 likes · 1 Comment
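A minimal sketch of what that lifecycle rule could look like, written as a Python dict and dumped to the JSON you would hand to the portal or to "az storage account management-policy create --policy @policy.json"; the rule name is illustrative. Two caveats: the rule acts on the age of the blob, not of the rows inside it, so a Parquet file rewritten by OPTIMIZE gets a fresh creation time even if it holds old rows; and, per the Databricks archival-support doc cited above, the table itself also needs its archival threshold enabled (the delta.timeUntilArchived table property, if I read that doc correctly) so queries skip files that have been tiered to Archive.

```python
# Hedged sketch of an Azure Storage lifecycle rule matching the scenario above.
# Container and threshold are the ones from the question; apply the resulting JSON
# via the portal or:
#   az storage account management-policy create --account-name <acct> \
#       --resource-group <rg> --policy @policy.json
import json

policy = {
    "rules": [
        {
            "name": "archive-sales-after-5-years",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # container name + path prefix, i.e. test-container/sales
                    "prefixMatch": ["test-container/sales"],
                },
                "actions": {
                    "baseBlob": {
                        # tier blobs to Archive 5 years after they were created
                        "tierToArchive": {"daysAfterCreationGreaterThan": 1825}
                    }
                },
            },
        }
    ]
}

with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```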
Harnessing Retail Data with Azure: Integrating Blob Storage and Databricks for Advanced Analytics

Learn how a retail company leverages Azure Blob Storage and Azure Databricks to store, process, and analyze its massive sales data. You will see how the company uses PySpark to transform data into insights that help them optimize their product strategy and marketing campaigns. You will also find some learning resources to help you get started with data engineering on Microsoft Azure.

1.4K Views · 0 likes · 0 Comments
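As a rough illustration of the PySpark pattern the post describes (reading raw sales files from Blob Storage/ADLS and aggregating them into a curated Delta table), here is a minimal sketch; the storage account, container, and column names are placeholders, not the company's actual schema.

```python
# Rough sketch of a PySpark aggregation over raw sales data in ADLS/Blob Storage.
# Storage account, container, and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.read.parquet("abfss://sales@retailstore.dfs.core.windows.net/raw/")

revenue_by_product = (
    sales.groupBy("product_id", "region")
         .agg(F.sum("amount").alias("total_revenue"),
              F.count("*").alias("orders"))
         .orderBy(F.desc("total_revenue"))
)

# Persist the curated result as a Delta table for downstream reporting (e.g. Power BI).
revenue_by_product.write.format("delta").mode("overwrite").save(
    "abfss://sales@retailstore.dfs.core.windows.net/curated/revenue_by_product"
)
```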
Empowering Startups: The Introductory Guide to Databricks for Entrepreneur's Data-Driven Success

Unlock the key to entrepreneurial success with Databricks: a journey where data empowers startups to thrive. Get ready to embark on a transformative quest for data-driven excellence!

3.3K Views · 2 likes · 0 Comments
Loading Parquet and Delta files into Azure Synapse using ADB or Azure Synapse?

I have the following scenario. We use Azure Databricks to pull data from several sources, generate Parquet and Delta files, and load them into our ADLS Gen2 containers. We are now planning to build our data warehouse in Azure Synapse SQL pools, where we will create external tables for the dimension tables (using the Delta files) and hash-distributed fact tables (using the Parquet files). The question is: to automate this data warehouse loading activity, which method is better? Is it better to write our transformation logic in Azure Databricks to create the dim and fact tables and load them regularly into Azure Synapse SQL pools, or is it better to write that transformation logic in Azure Synapse itself? Please help.

619 Views · 0 likes · 1 Comment
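For the Databricks-driven option, here is a hedged sketch of pushing a curated dimension table into a dedicated SQL pool with the Azure Synapse connector from a Databricks notebook; the JDBC URL, credentials, storage paths, and table names are placeholders. The Synapse-driven alternative would instead read the Parquet/Delta files already sitting in ADLS Gen2 with Synapse pipelines or T-SQL (COPY INTO / CTAS) and keep the transformation logic in the SQL pool.

```python
# Hedged sketch of the Databricks-driven option: build a dimension in Databricks
# and push it into a dedicated SQL pool via the Azure Synapse connector.
# JDBC URL, credentials, storage paths, and table names are placeholders;
# secrets should come from a secret scope, not literals.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dim_customer = (
    spark.read.format("delta")
         .load("abfss://curated@mydatalake.dfs.core.windows.net/customers")
         .select("customer_id", "customer_name", "segment")
         .dropDuplicates(["customer_id"])
)

(dim_customer.write
    .format("com.databricks.spark.sqldw")                      # Azure Synapse connector
    .option("url",
            "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
            "database=dwh;user=loader;password=<from-secret-scope>;encrypt=true")
    .option("tempDir", "abfss://staging@mydatalake.dfs.core.windows.net/synapse")
    .option("forwardSparkAzureStorageCredentials", "true")      # staging auth via Spark creds
    .option("dbTable", "dbo.DimCustomer")
    .mode("overwrite")
    .save())
```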
Train your Model on Spark/Databricks, score it on ADX

Are you using Spark/Databricks to build Machine Learning models? Do you need to score new data that is streamed into Azure Data Explorer? If this is your scenario, please read on! In this blog we show how to train an ML model on Azure Databricks, export it to ADX, and score new samples directly on ADX, in near real time, using inline Python code embedded in a KQL query.

5.9K Views · 2 likes · 4 Comments
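As a hedged sketch of the training/export half of that pattern (not the blog's exact code): fit a model in a Databricks notebook, then serialize it to a hex string that can be ingested into an ADX table and later unpickled inside the python() plugin in KQL. The source table, feature columns, and choice of classifier below are illustrative.

```python
# Hedged sketch: train on Databricks, serialize the model for export to ADX.
# The source table, feature columns, and classifier are illustrative placeholders.
import pickle
import binascii

from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression

spark = SparkSession.builder.getOrCreate()

# Pull the training set to the driver; assumes it fits in memory.
train_df = spark.table("sensor_training_data").toPandas()

X = train_df[["temperature", "pressure", "vibration"]]
y = train_df["is_anomaly"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Hex-encode the pickled model so it can be stored as a plain string column in ADX
# and rebuilt with pickle.loads(binascii.unhexlify(...)) inside the KQL python() plugin.
serialized_model = binascii.hexlify(pickle.dumps(model)).decode("ascii")
print(f"model blob length: {len(serialized_model)} characters")
```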
Getting started on Azure

I work with large datasets and I am just getting started with learning Azure. I am familiar with Python and Power BI. I am planning to integrate Synapse and Databricks for analytics and visualisation using Power BI. What books do you recommend for understanding these modules?

1.1K Views · 0 likes · 1 Comment