Data Science
24 Topics

Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics
Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics – Plus some bonus concepts! I have not posted in a while, so this post is loaded with ideas and concepts to think about. I hope you enjoy it!

The structure of the post is a chronological perspective of four recent events in my life: 1) camping on the Olympic Peninsula in WA state, 2) installation of new windows and external doors in my residential house, 3) injuring my back (includes a metaphor for how things change over time), and 4) camping at Kayak Point in Stanwood, WA (where I finished writing this). Along with this series of events bookended by camping trips, I also wanted to mention May 1st, which was International Workers' Day (celebrated as Labor Day in September in the US and Canada).

To reach the vision of digital transformation through cloud scale analytics, we need many more workers (Architects, Developers, DBAs, Data Engineers, Data Scientists, Data Analysts, Data Consumers) and the support of many managers and leaders. Leadership is required so analytical systems can become more distributed and properly staffed to scale, versus the centralized, small specialist teams that do not scale. Analytics could be a catalyst for employment with the accelerated building and operating of analytical systems. There is evidence that the structure of the teams working on these analytical systems will need to be more distributed to scale to the level of growth required. When focusing on data management, Data Mesh strives to be more distributed, and Data Lakehouse supports distributed architectures better than the analytical systems of the past. I am optimistic that cloud-based analytical systems supported by these distributed concepts can scale and progress to meet the data management, data engineering, data science, data analysis, and data consumer needs and requirements of many organizations.
A Data Science Process, Documentation, and Project Template You Can Use in Your Solutions

In most of the Data Science and AI articles, blogs, and papers I read, the focus is on a particular algorithm or math angle to solving a puzzle. And that's awesome - we need LOTS of those. However, even if you figure those out, you have to use them somewhere. You have to run that on some sort of cloud or local system, you have to describe what you're doing, you have to distribute an app, import some data, check a security angle here and there, communicate with a team... you know, DevOps. In this article, I'll show you a complete process, procedures, and free resources to manage your Data Science project from beginning to end.
Using Microsoft R in your Solutions

R is a powerful data language with thousands of "packages" allowing you to extend its uses for Data Science, Advanced Analytics, Machine Learning, and much more. Microsoft has enhanced this language with a distribution called "Microsoft R Open". Read on to learn more about this powerful tool you can use stand-alone, or embedded in several Microsoft products.
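The article itself focuses on the Microsoft R Open distribution, but a quick sketch may help illustrate the "embedded" use case: calling R from a Python host process. The rpy2 bridge used here is my own assumption, not something the article prescribes, and it requires an R installation (such as Microsoft R Open) on the machine.

```python
# Minimal sketch of embedding R in a Python solution via rpy2.
# Assumes an R installation and the rpy2 package are present;
# neither is mandated by the article.
import rpy2.robjects as robjects

# Evaluate a small piece of R code and pull the result into Python.
r_mean = robjects.r("mean(c(1, 2, 3, 4, 5))")
print(r_mean[0])  # 3.0

# Look up an R function object and call it directly from Python.
rnorm = robjects.r["rnorm"]
samples = rnorm(5)  # five draws from a standard normal
print(list(samples))
```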
DevOps for Data Science – Part 10 - Automated Testing and Scale

The final level in the DevOps Maturity Model is Load Testing and Auto-Scale. Note that you want to follow this progression – there's no way to do proper load testing if you aren't automatically integrating the Infrastructure as Code, CI, CD, RM, and APM phases. The reason is that the automatic balancing you'll do depends on the automation that precedes it – there's no reason to scale something that you're about to change.
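As a purely illustrative example of the load-testing half of this level, here is a minimal sketch that fires concurrent requests at a scoring endpoint and reports basic latency and error numbers. The endpoint URL, worker count, and request count are hypothetical placeholders; a real load test would use dedicated tooling.

```python
# Minimal load-testing sketch, not the post's own tooling.
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

import requests

ENDPOINT = "https://example.com/api/score"  # hypothetical scoring endpoint
WORKERS = 20
REQUESTS = 200

def timed_call(_):
    """Issue one request and return (latency_seconds, status_code)."""
    start = time.perf_counter()
    resp = requests.get(ENDPOINT, timeout=10)
    return time.perf_counter() - start, resp.status_code

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(timed_call, range(REQUESTS)))

latencies = [t for t, _ in results]
errors = sum(1 for _, code in results if code != 200)
print(f"avg {mean(latencies):.3f}s  max {max(latencies):.3f}s  errors {errors}/{REQUESTS}")
```

Numbers like these are what an auto-scale rule would key off, which is why the preceding automation has to be in place first.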
DevOps for Data Science – Part 9 - Application Performance Monitoring

In this series on DevOps for Data Science, I've explained the concept of a DevOps "Maturity Model" – a list of things you can do, in order, which will set you on the path for implementing DevOps in Data Science. The first thing you can do in your projects is to implement Infrastructure as Code (IaC), and the second thing to focus on is Continuous Integration (CI). However, to set up CI, you need to have as much automated testing as you can – and in the case of Data Science programs, that's difficult to do. From there, the next step in the DevOps Maturity Model is Continuous Delivery (CD). Once you have that maturity level down, you can focus on Release Management. And now we're off to the next Maturity Level: Application Performance Monitoring (APM).
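To make the APM idea concrete, here is a minimal instrumentation sketch, assuming you forward the resulting log records to whatever monitoring backend you use; the decorator and function names are hypothetical, not taken from the series.

```python
# Minimal application-level instrumentation sketch: record the
# duration and outcome of each call for later analysis by an APM tool.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("apm")

def monitored(fn):
    """Log how long each call took and whether it succeeded."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            log.info("%s ok in %.4fs", fn.__name__, time.perf_counter() - start)
            return result
        except Exception:
            log.exception("%s failed after %.4fs", fn.__name__, time.perf_counter() - start)
            raise
    return wrapper

@monitored
def score_batch(rows):
    return [r * 2 for r in rows]  # stand-in for real model scoring

score_batch([1, 2, 3])
```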
DevOps for Data Science – Part 8 - Release Management

Release Management (RM), as a concept, is essentially what it says – determining a method of releasing new and changed software into an environment in a planned fashion. While this sounds simple, it actually takes quite a bit of forethought and planning, and involves not only the technical teams, but several business teams as well. RM is slightly different based on whether you are selling the software you are writing. But in any case, it comes down to the business function of planning. So how does that affect the Data Science team?
DevOps for Data Science - Part 5 - Infrastructure as Code

Right away I may have put a few Data Scientists off. No, it isn't that they can't set up a server or components, it's just that this isn't normally their job. However, the software, hardware configuration (virtual or otherwise), containers (if you use those, and yes, you should), Python environments, R libraries, and many other parameters affect the experiment. It's essential that you're able to duplicate all of that and store it in a source-control system so that you can re-create it for testing, deployment, and the downstream phases. Read on for more.
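As one narrow, illustrative slice of that idea, the sketch below snapshots the exact Python packages an experiment ran against into a lock file you can commit to source control and replay later. The file name and approach are my own assumptions, not the post's prescription.

```python
# Sketch: capture the experiment's Python environment as a committable
# lock file, one small piece of Infrastructure as Code.
import sys
from importlib.metadata import distributions

LOCK_FILE = "environment.lock"  # arbitrary name for this sketch

with open(LOCK_FILE, "w") as fh:
    fh.write(f"# python {sys.version.split()[0]}\n")
    for dist in sorted(distributions(), key=lambda d: d.metadata["Name"].lower()):
        fh.write(f"{dist.metadata['Name']}=={dist.version}\n")

print(f"wrote {LOCK_FILE}; commit it alongside the experiment code")
```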
Creating a Kubernetes Application for Azure SQL Database

Modern application development has multiple challenges. From selecting a "stack" of technologies, front end through data storage and processing, from several competing standards, to ensuring the highest levels of security and performance, developers must ensure the application scales, performs well, and is supportable on multiple platforms. For that last requirement, bundling the application into Container technologies such as Docker and deploying multiple Containers onto the Kubernetes platform is now de rigueur in application development. In this example, we'll explore using Python, Docker Containers, and Kubernetes - all running on the Microsoft Azure platform. Using Kubernetes also gives you the flexibility of using local environments or even other clouds for a seamless and consistent deployment of your application, and allows for multi-cloud deployments for even higher resiliency. We'll also use Microsoft Azure SQL Database for a service-based, scalable, highly resilient, and secure environment for data storage and processing. In fact, in many cases other applications are already using Microsoft Azure SQL Database, and this sample application can be used to further leverage and enrich that data. This example is fairly comprehensive in scope, but it uses the simplest application, database, and deployment to illustrate the process. You can adapt this sample to be far more robust, even leveraging the latest technologies for the returned data. It's a useful learning tool to create a pattern for other applications.
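As a hedged sketch of the data-access core such an application might contain (not the article's actual code), the snippet below connects to Azure SQL Database with pyodbc, reading connection details from environment variables, which maps naturally onto Kubernetes Secrets. Every name here is a placeholder.

```python
import os

import pyodbc  # requires the Microsoft ODBC Driver for SQL Server in the host/image

# All values below are hypothetical placeholders; in Kubernetes these
# environment variables would typically be populated from a Secret.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server=tcp:{os.environ['SQL_SERVER']},1433;"
    f"Database={os.environ['SQL_DB']};"
    f"Uid={os.environ['SQL_USER']};"
    f"Pwd={os.environ['SQL_PASSWORD']};"
    "Encrypt=yes;TrustServerCertificate=no;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION;")  # trivial round trip to prove connectivity
print(cursor.fetchone()[0])
conn.close()
```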
DevOps for Data Science – Part 7 - Continuous Delivery

To fully embrace DevOps in Data Science, you can start by implementing Infrastructure as Code (IaC), and the second thing to focus on is Continuous Integration (CI) and Testing. The next step in the DevOps Maturity Model is Continuous Delivery (CD). There's some discussion we need to cover here, since the definitions of DevOps and Continuous Delivery are quite similar, and to some, CD doesn't belong "under" DevOps. Both DevOps and CD involve an agile mindset of releasing smaller, faster, and automated bits of code into the process rather than waiting for several changes to integrate at once.
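One small, automated bit of that kind is a post-deployment smoke test a CD pipeline might run before promoting a release. The sketch below is illustrative only, not the series' tooling, and the health-check URL is a hypothetical placeholder; a nonzero exit code signals the pipeline to halt the release.

```python
# Minimal post-deployment smoke test: exit nonzero so the CD pipeline
# stops the release if the newly deployed service isn't healthy.
import sys

import requests

ENDPOINT = "https://example.com/api/health"  # hypothetical health check

try:
    resp = requests.get(ENDPOINT, timeout=5)
except requests.RequestException as exc:
    print(f"smoke test failed: {exc}")
    sys.exit(1)

if resp.status_code != 200:
    print(f"smoke test failed: HTTP {resp.status_code}")
    sys.exit(1)

print("smoke test passed; safe to promote the release")
```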