Oracle RAC on Azure
An Azure datacenter is built on a globally distributed infrastructure that contains numerous layers of redundancy and a resilient interconnected network. This is far superior to an on-prem datacenter because it has to be. Geo-regions are fault tolerant in case of a complete regional datacenter failure, which means the way we architect for the cloud is often different from the way we architect on-prem.

Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics
Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics – plus some bonus concepts! I have not posted in a while, so this post is loaded with ideas and concepts to think about. I hope you enjoy it! The structure of the post is a chronological perspective of 4 recent events in my life: 1) camping on the Olympic Peninsula in WA state, 2) installation of new windows and external doors in my residential house, 3) injuring my back (includes a metaphor for how things change over time), and 4) camping at Kayak Point in Stanwood, WA (where I finished writing this).

Along with this series of events bookended by camping trips, I also wanted to mention May 1st, which was International Workers' Day (celebrated as Labor Day in September in the US and Canada). To reach the vision of digital transformation through cloud scale analytics, we need many more workers (Architects, Developers, DBAs, Data Engineers, Data Scientists, Data Analysts, Data Consumers) and the support of many managers and leaders. Leadership is required so analytical systems can become more distributed and properly staffed to scale, versus the centralized, small specialist teams that do not scale. Analytics could be a catalyst for employment through the accelerated building and operating of analytical systems. There is evidence that the structure of the teams working on these analytical systems will need to become more distributed to scale to the level of growth required. When focusing on data management, Data Mesh strives to be more distributed, and Data Lakehouse supports distributed architectures better than the analytical systems of the past. I am optimistic that cloud-based analytical systems supported by these distributed concepts can scale and progress to meet the data management, data engineering, data science, data analysis, and data consumer needs of many organizations.

Begin at the Beginning - Data Literacy for the Data Professional
Most of us have "peaks and valleys" in our knowledge. In any given topic, we know a lot about some parts, but not as much about others. In some cases, there are significant gaps. Let's take a look at what the Data Professional needs to know about Data Literacy.

The Technical Professional's Learning Scope for SQL Server Big Data Clusters
If you're a data professional, you know that it's important to set aside some time for training when a new release or paradigm arrives for the platform you use to create solutions. In the case of SQL Server 2019 (and later), you'll want to pay close attention to the Big Data Clusters feature. It's an exponential knowledge increase, and that's no exaggeration.

Automated Continuous integration and delivery – CICD in Azure Data Factory
In Azure Data Factory, continuous integration and delivery (CI/CD) involves transferring Data Factory pipelines across different environments such as development, test, UAT, and production. This process leverages Azure Resource Manager templates to store the configurations of various ADF entities, including pipelines, datasets, and data flows. This article provides a detailed, step-by-step guide on how to automate deployments using the integration between Data Factory and Azure Pipelines.

Prerequisites
- Azure Data Factory, with multiple ADF environments set up for the different stages of development and deployment.
- Azure DevOps, the platform for managing code repositories, pipelines, and releases.
- Git integration, with ADF connected to a Git repository (Azure Repos or GitHub).
- ADF Contributor and Azure DevOps Build Administrator permissions.

Step 1
Establish a dedicated Azure DevOps Git repository specifically for Azure Data Factory within the designated Azure DevOps project.

Step 2
Integrate Azure Data Factory (ADF) with the Azure DevOps Git repository created in Step 1.

Step 3
Create a developer feature branch in the Azure DevOps Git repository created in Step 1. Select the developer feature branch from ADF to start development.

Step 4
Begin the development process. For this example, I create a test pipeline "pl_adf_cicd_deployment_testing" and save all.

Step 5
Submit a pull request from the developer feature branch to main.

Step 6
Once the pull request is merged from the developer's feature branch into the main branch, publish the changes from the main branch to the ADF publish branch. The ARM templates (JSON files) will be updated and become available in the adf_publish branch within the Azure DevOps ADF repository.

Step 7
ARM templates can be customized to accommodate the different configurations of the Development, Testing, and Production environments. This customization is typically done through the ARMTemplateParametersForFactory.json file, where you specify environment-specific values such as linked services, environment variables, and other environment-specific settings. For example, in a Testing environment the storage account might be named teststorageaccount, whereas in a Production environment it could be prodstorageaccount.

To create an environment-specific parameters file:
- In the Azure DevOps ADF Git repo, open the main branch > linkedTemplates folder and copy "ARMTemplateParametersForFactory.json".
- Create a parameters_files folder under the root path.
- Paste ARMTemplateParametersForFactory.json into the parameters_files folder and rename it to indicate the environment, for example prod-adf-parameters.json.
- Update each environment-specific parameter value.
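If you would rather script the parameter overrides than edit the copied file by hand, the following is a minimal PowerShell sketch under the repository layout described above. The file paths and the parameter names (factoryName, AzureBlobStorage_connectionString) are illustrative assumptions; substitute the parameters that actually appear in your ARMTemplateParametersForFactory.json.

# A minimal sketch: load the generated parameter file, override production values,
# and save the result where the release pipeline expects it.
# The parameter names below are hypothetical examples.
$sourcePath = ".\linkedTemplates\ARMTemplateParametersForFactory.json"   # assumed location
$parameters = Get-Content -Path $sourcePath -Raw | ConvertFrom-Json

# Override the values that differ in Production
$parameters.parameters.factoryName.value = "prod-adf"
$parameters.parameters.AzureBlobStorage_connectionString.value = "DefaultEndpointsProtocol=https;AccountName=prodstorageaccount;EndpointSuffix=core.windows.net"

# Write the environment-specific file into the parameters_files folder
New-Item -ItemType Directory -Path ".\parameters_files" -Force | Out-Null
$parameters | ConvertTo-Json -Depth 50 | Set-Content -Path ".\parameters_files\prod-adf-parameters.json"

Commit the generated prod-adf-parameters.json to the repository so the release pipeline can reference it.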
Step 8
To create the Azure DevOps CI/CD pipeline, use the following code, and ensure you update the variables to match your environment before running it. This allows you to deploy from one ADF environment to another, such as from Test to Production.

name: Release-$(rev:r)

trigger:
  branches:
    include:
      - adf_publish

variables:
  azureSubscription: <Your subscription>
  SourceDataFactoryName: <Test ADF>
  DeployDataFactoryName: <PROD ADF>
  DeploymentResourceGroupName: <PROD ADF RG>

stages:
  - stage: Release
    displayName: Release Stage
    jobs:
      - job: Release
        displayName: Release Job
        pool:
          vmImage: 'windows-2019'
        steps:
          - checkout: self

          # Stop ADF triggers before deploying
          - task: AzurePowerShell@5
            displayName: Stop Triggers
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'InlineScript'
              Inline: |
                $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
                if ($triggersADF.Count -gt 0) {
                  $triggersADF | ForEach-Object {
                    Stop-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.Name -Force
                  }
                }
              azurePowerShellVersion: 'LatestVersion'

          # Deploy ADF using the ARM template and the environment-specific parameters file
          - task: AzurePowerShell@5
            displayName: Deploy ADF
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'InlineScript'
              Inline: |
                New-AzResourceGroupDeployment `
                  -ResourceGroupName "$(DeploymentResourceGroupName)" `
                  -TemplateFile "$(System.DefaultWorkingDirectory)/$(SourceDataFactoryName)/ARMTemplateForFactory.json" `
                  -TemplateParameterFile "$(System.DefaultWorkingDirectory)/parameters_files/prod-adf-parameters.json" `
                  -Mode "Incremental"
              azurePowerShellVersion: 'LatestVersion'

          # Restart ADF triggers after deployment
          - task: AzurePowerShell@5
            displayName: Restart Triggers
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'InlineScript'
              Inline: |
                $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
                if ($triggersADF.Count -gt 0) {
                  $triggersADF | ForEach-Object {
                    Start-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.Name -Force
                  }
                }
              azurePowerShellVersion: 'LatestVersion'

Triggering the Pipeline
The Azure DevOps CI/CD pipeline triggers automatically whenever updated ARM templates are published to the adf_publish branch (which happens after changes merged into main are published from ADF). It can also be started manually or set to run on a schedule for periodic deployments, providing flexibility and ensuring that updates are deployed efficiently and consistently.

Monitoring and Rollback
To monitor pipeline execution, use the Azure DevOps pipeline dashboards. If a rollback is necessary, you can revert to previous versions of the ARM templates or pipelines in Azure DevOps and redeploy the changes.
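Before pointing the release pipeline at Production for the first time, or before redeploying an older ARM template as part of a rollback, it can help to preview what the incremental deployment would actually change. This is a minimal sketch, assuming ARMTemplateForFactory.json and prod-adf-parameters.json from the steps above are available locally and you have signed in with Connect-AzAccount; the resource group name prod-adf-rg is a placeholder for your own.

# Validate that the template and parameter file are deployable to the target resource group
Test-AzResourceGroupDeployment `
    -ResourceGroupName "prod-adf-rg" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\parameters_files\prod-adf-parameters.json"

# Preview the changes an incremental deployment would make without applying them
New-AzResourceGroupDeployment `
    -ResourceGroupName "prod-adf-rg" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\parameters_files\prod-adf-parameters.json" `
    -Mode Incremental `
    -WhatIf

For a rollback, the same preview can be run against an earlier commit of the adf_publish branch to confirm exactly which ADF entities the redeployment would revert before rerunning the pipeline.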