Process and Procedure
A Data Science Process, Documentation, and Project Template You Can Use in Your Solutions
In most of the Data Science and AI articles, blogs, and papers I read, the focus is on a particular algorithm or mathematical angle for solving a puzzle. And that's awesome - we need LOTS of those. However, even if you figure those out, you have to use them somewhere. You have to run them on some sort of cloud or local system, you have to describe what you're doing, you have to distribute an app, import some data, check a security angle here and there, communicate with a team... you know, DevOps. In this article, I'll show you a complete process, procedures, and free resources to manage your Data Science project from beginning to end.
Create and Deploy Azure SQL Managed Instance Database Project Integrated with Azure DevOps CI/CD

Integrating database development into continuous integration and continuous deployment (CI/CD) workflows is a best practice for Azure SQL Managed Instance database projects. Automating the process through a deployment pipeline is always recommended. This automation ensures that ongoing deployments seamlessly align with your continuous local development efforts, eliminating the need for additional manual intervention. This article guides you through the step-by-step process of creating a new Azure SQL Managed Instance database project, adding objects to it, and setting up a CI/CD deployment pipeline in Azure DevOps.

Prerequisites
Visual Studio 2022 Community, Professional, or Enterprise
An Azure DevOps environment
Contributor permission within Azure DevOps
Sysadmin server role within the Azure SQL Managed Instance

Step 1
Open Visual Studio and click Create a new project. Search for SQL Server and select SQL Server Database Project. Provide the project name and the folder path where the project and its .dacpac output will be stored, then click Create.

Step 2
Import the database schema from an existing database. Right-click the project and select Import. You will see three options: Data-Tier Application (.dacpac), Database, and Script (.sql). In this case, I am using the Database option and importing from an Azure SQL Managed Instance. You will then see a screen that lets you provide a connection string. You can select a database from local, network, or Azure sources, or directly enter the server name, authentication type, and credentials to connect to the database server. Once connected, select the desired database to import into your project.

Step 3
Configure the import settings. Several options are available, each designed to streamline the import:
Import application-scoped objects: imports tables, views, stored procedures, and similar objects.
Import referenced logins: imports the logins referenced by the database.
Import permissions: imports the related permissions.
Import database settings: imports the database settings.
Folder structure: lets you choose the folder structure for database objects in your project.
Maximum files per folder: limits the number of files per folder.
Click Start to begin the import; a progress window displays the status. Click Finish to complete the step.

Step 4
To ensure a smooth deployment process, start by incorporating any necessary post-deployment scripts into your project. These scripts are crucial for tasks that must run after the database has been deployed, such as data migrations or additional configuration. To compile the database project in Visual Studio, right-click the project and select Build. This compiles the project and generates a .dacpac file, which is what gets deployed. When building the project, you may encounter warnings and errors that need careful debugging and resolution; common issues include missing references, syntax errors, and configuration mismatches. After addressing them, rebuild the project to produce the .dacpac file. This file contains the database schema and is essential for deployment. Make sure any post-deployment scripts are included in the project; they will run after the database deployment to perform any additional tasks. A sketch of such a script follows below.
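As a hedged illustration (not taken from the article), a minimal post-deployment script in a SQL Server Database Project might look like the sketch below. The script's Build Action is set to PostDeploy and it runs after every publish, so it should be written to be idempotent; the dbo.OrderStatus table and its values are hypothetical.

-- Hypothetical post-deployment script (e.g., Script.PostDeployment.sql).
-- Runs after every publish, so it seeds reference data idempotently.
-- The dbo.OrderStatus table and its values are illustrative only.
MERGE INTO dbo.OrderStatus AS target
USING (VALUES
    (1, N'Pending'),
    (2, N'Shipped'),
    (3, N'Cancelled')
) AS source (StatusId, StatusName)
ON target.StatusId = source.StatusId
WHEN MATCHED THEN
    UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StatusId, StatusName) VALUES (source.StatusId, source.StatusName);

A project can contain only one post-deployment script, so larger projects typically compose several files into it with SQLCMD :r include statements.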
To ensure all changes are tracked and can be deployed through your CI/CD pipeline, commit the entire codebase, including the project file and any post-deployment scripts, to your branch in Azure DevOps. This guarantees that every modification is documented and ready for deployment.

Step 5
Create an Azure DevOps pipeline to deploy the database project.

Step 6
To have the YAML pipeline build the SQL project and publish the .dacpac file to the pipeline's artifact folder, include the following stages.

stages:
- stage: Build
  jobs:
  - job: BuildJob
    displayName: 'Build Stage'
    steps:
    - task: VSBuild@1
      displayName: 'Build SQL Server Database Project'
      inputs:
        solution: $(solution)
        platform: $(buildPlatform)
        configuration: $(buildConfiguration)
    - task: CopyFiles@2
      inputs:
        SourceFolder: '$(Build.SourcesDirectory)'
        Contents: '**\*.dacpac'
        TargetFolder: '$(Build.ArtifactStagingDirectory)'
        flattenFolders: true
    - task: PublishPipelineArtifact@1
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'dacpac'
        publishLocation: 'pipeline'
- stage: Deploy
  jobs:
  - job: Deploy
    displayName: 'Deploy Stage'
    pool:
      name: 'Pool'
    steps:
    - task: DownloadPipelineArtifact@2
      inputs:
        buildType: current
        artifact: 'dacpac'
        path: '$(Build.ArtifactStagingDirectory)'
    - task: PowerShell@2
      displayName: 'upgrade sqlpackage'
      inputs:
        targetType: 'inline'
        script: |
          # use evergreen or specific dacfx msi link below
          Invoke-WebRequest -Uri "https://aka.ms/dacfx-msi" -OutFile DacFramework.msi
          msiexec.exe /i "DacFramework.msi" /qn
    - task: SqlAzureDacpacDeployment@1
      inputs:
        azureSubscription: '$(ServiceConnection)'
        AuthenticationType: 'servicePrincipal'
        ServerName: '$(ServerName)'
        DatabaseName: '$(DatabaseName)'
        deployType: 'DacpacTask'
        DeploymentAction: 'Publish'
        DacpacFile: '$(Build.ArtifactStagingDirectory)/*.dacpac'
        IpDetectionMethod: 'AutoDetect'

Step 7
To execute any pre- or post-deployment SQL scripts during deployment, install the necessary dependencies on the build agent (Azure CLI and the required PowerShell modules), obtain an access token, and then run the scripts.

# install all necessary dependencies onto the build agent
- task: PowerShell@2
  name: install_dependencies
  inputs:
    targetType: inline
    script: |
      # Download and Install Azure CLI
      write-host "Installing AZ CLI..."
      Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi
      Start-Process msiexec.exe -Wait -ArgumentList "/I AzureCLI.msi /quiet"
      Remove-Item .\AzureCLI.msi
      write-host "Done."

      # prepend the az cli path for future tasks in the pipeline
      write-host "Adding AZ CLI to PATH..."
      write-host "##vso[task.prependpath]C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      $currentPath = (Get-Item -path "HKCU:\Environment").GetValue('Path', '', 'DoNotExpandEnvironmentNames')
      if (-not $currentPath.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
          setx PATH ($currentPath + ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")
      }
      if (-not $env:path.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
          $env:path += ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      }
      write-host "Done."

      # install necessary PowerShell modules
      write-host "Installing necessary PowerShell modules..."
      Get-PackageProvider -Name nuget -force
      if (-not (Get-Module -ListAvailable -Name Az.Resources)) { install-module Az.Resources -force }
      if (-not (Get-Module -ListAvailable -Name Az.Accounts)) { install-module Az.Accounts -force }
      if (-not (Get-Module -ListAvailable -Name SqlServer)) { install-module SqlServer -force }
      write-host "Done."
- task: AzureCLI@2
  name: run_sql_scripts
  inputs:
    azureSubscription: '$(ServiceConnection)'
    scriptType: ps
    scriptLocation: inlineScript
    inlineScript: |
      # get access token for SQL
      $token = az account get-access-token --resource https://database.windows.net --query accessToken --output tsv

      # configure OELCore database
      Invoke-Sqlcmd -AccessToken $token -ServerInstance '$(ServerName)' -Database '$(DatabaseName)' -InputFile '.\pipelines\config-db.dev.sql'
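The contents of config-db.dev.sql are not shown here, so the following is only a hypothetical sketch of the kind of environment-specific configuration such a script might contain: creating a Microsoft Entra (Azure AD) user for an assumed dev application identity and granting it database roles. The principal name app-service-dev is an assumption for illustration.

-- Hypothetical environment configuration script; the real config-db.dev.sql is not shown.
-- Creates a Microsoft Entra (Azure AD) user for an assumed dev identity and grants it roles.
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = N'app-service-dev')
BEGIN
    CREATE USER [app-service-dev] FROM EXTERNAL PROVIDER;
END;

ALTER ROLE db_datareader ADD MEMBER [app-service-dev];
ALTER ROLE db_datawriter ADD MEMBER [app-service-dev];

Because the pipeline connects with an access token obtained through the Azure CLI, a script like this runs under the identity of the service connection's service principal, so that principal needs sufficient permissions on the target database.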
Solutions enable Human Experiences that Affect Lives

I am spending my weekends painting my house this summer. For me this is a repetitive 5 to 7-year cycle. Like residential houses, Data Warehouses need to be maintained and modernized. If you are planning to modernize your data and analytics platform this year, consider painting your vision like this diagram below to positively impact the experience of employees, customers, and partners for years to come.
Data Architecture and Designing for Change in the Age of Digital Transformation

Change is constant, whether you are designing a new product using the latest design thinking and human-centered product development, or carefully maintaining and managing changes to existing systems, applications, and services. In this post I would like to provide both food for thought on data architecture and change, and exposure to a practical analytics accelerator for capturing change in data pipelines. Along the way I also want to discuss a couple of terms often referenced in data management and analytics discussions: 1) One Version of the Truth, and 2) Data Swamp. I have never liked either of these terms and will try to explain why they are, realistically, loaded, misleading, and rather biased. Here is the Analytics Accelerator on Change Data Management: https://github.com/DataSnowman/ChangeDataCapture
The (Amateur) Data Science Body of Knowledge

Whether you're interested in becoming a Data Scientist or a Data Engineer, or just want to work with the techniques they use, this article will help you find resources for whichever path you choose. At the very least, you'll gain valuable insight into the Data Science field and how you can use the technologies and knowledge to create a very compelling solution.
Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics

This post brings vision to life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics - plus some bonus concepts! I have not posted in a while, so this post is loaded with ideas and concepts to think about. I hope you enjoy it! The post is structured as a chronological account of four recent events in my life: 1) camping on the Olympic Peninsula in WA state, 2) installation of new windows and external doors in my house, 3) injuring my back (which includes a metaphor for how things change over time), and 4) camping at Kayak Point in Stanwood, WA (where I finished writing this). Along with this series of events, bookended by camping trips, I also wanted to mention May 1st, which was International Workers' Day (celebrated as Labor Day in September in the US and Canada). To reach the vision of digital transformation through cloud scale analytics, we need many more workers (Architects, Developers, DBAs, Data Engineers, Data Scientists, Data Analysts, Data Consumers) and the support of many managers and leaders. Leadership is required so analytical systems can become more distributed and properly staffed to scale, versus the centralized, small specialist teams that do not scale. Analytics could be a catalyst for employment through the accelerated building and operating of analytical systems. There is evidence that the structure of the teams working on these analytical systems will need to be more distributed to scale to the level of growth required. When focusing on data management, Data Mesh strives to be more distributed, and Data Lakehouse supports distributed architectures better than the analytical systems of the past. I am optimistic that cloud-based analytical systems supported by these distributed concepts can scale and progress to meet the data management, data engineering, data science, data analysis, and data consumer needs and requirements of many organizations.
DevOps for Data Science – Part 10 - Automated Testing and Scale

The final phase in the DevOps Maturity Model is Load Testing and Auto-Scale. Note that you want to follow this progression – there's no way to do proper load testing if you aren't automatically integrating the Infrastructure as Code, CI, CD, RM, and APM phases. The reason is that the automatic balancing you'll do depends on the automation that precedes it – there's no reason to scale something that you're about to change.