Process and Procedure
Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics
Bring Vision to Life with Three Horizons, Data Mesh, Data Lakehouse, and Azure Cloud Scale Analytics, plus some bonus concepts! I have not posted in a while, so this post is loaded with ideas and concepts to think about. I hope you enjoy it! The post is structured as a chronological account of four recent events in my life: 1) camping on the Olympic Peninsula in Washington State, 2) installing new windows and exterior doors in my house, 3) injuring my back (which includes a metaphor for how things change over time), and 4) camping at Kayak Point in Stanwood, WA (where I finished writing this). Alongside this series of events bookended by camping trips, I also want to mention May 1st, International Workers' Day (its counterpart in the US and Canada is Labor Day, in September). To reach the vision of digital transformation through cloud scale analytics, we need many more workers (Architects, Developers, DBAs, Data Engineers, Data Scientists, Data Analysts, and Data Consumers) and the support of many managers and leaders. Leadership is required so that analytical systems can become more distributed and be properly staffed to scale, rather than depending on small, centralized specialist teams that do not scale. The accelerated building and operating of analytical systems could be a catalyst for employment. There is evidence that the teams working on these analytical systems will need to be more distributed to reach the required level of growth. In data management, Data Mesh strives to be more distributed, and Data Lakehouse supports distributed architectures better than the analytical systems of the past.
I am optimistic that cloud-based analytical systems supported by these distributed concepts can scale and progress to meet the data management, data engineering, data science, data analysis, and data consumer needs of many organizations.
Data Architecture and Designing for Change in the Age of Digital Transformation
Change is constant, whether you are designing a new product using the latest design thinking and human-centered product development, or carefully maintaining and managing changes to existing systems, applications, and services. In this post I would like to provide food for thought on data architecture and change, and introduce a practical analytics accelerator for capturing change in data pipelines. Along the way I also want to discuss two terms often used in data management and analytics discussions: 1) One Version of the Truth, and 2) Data Swamp. I have never liked either term and will try to explain why, realistically, they are loaded, misleading, and rather biased. Here is the Analytics Accelerator on Change Data Management: https://github.com/DataSnowman/ChangeDataCapture
A Data Science Process, Documentation, and Project Template You Can Use in Your Solutions
In most of the Data Science and AI articles, blogs, and papers I read, the focus is on a particular algorithm or mathematical angle for solving a puzzle. And that's awesome; we need LOTS of those. However, even once you figure those out, you have to use them somewhere. You have to run them on some sort of cloud or local system, describe what you're doing, distribute an app, import some data, check a security angle here and there, communicate with a team... you know, DevOps. In this article, I'll show you a complete process, procedures, and free resources to manage your Data Science project from beginning to end.
DevOps for Data Science – Part 10 - Automated Testing and Scale
The final stage of the DevOps Maturity Model is Load Testing and Auto-Scale. Note that you want to follow this progression: there's no way to do proper load testing if you aren't automatically integrating the Infrastructure as Code, CI, CD, RM, and APM phases. The reason is that the automatic balancing you'll do depends on the automation that precedes it; there's no reason to scale something that you're about to change.
DevOps for Data Science – Part 8 - Release Management
Release Management (RM), as a concept, is essentially what it says: determining a method of releasing new and changed software into an environment in a planned fashion. While this sounds simple, it actually takes quite a bit of forethought and planning, and it involves not only the technical teams but several business teams as well. RM differs slightly depending on whether you are selling the software you write. In any case, it comes down to the business function of planning. So how does that affect the Data Science team?
Sizing Out Oracle Workloads for Azure Using an Oracle Statspack Report
A number of our customers and my peers have asked me how to use the Excel sizing template to estimate an Oracle workload without access to the Automatic Workload Repository (AWR). This is actually quite easy to do with Oracle Statspack.
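The Excel sizing template's own formulas are not reproduced here, so the following is only a rough, hypothetical sketch of the kind of arithmetic involved. It assumes you have transcribed a few peak per-second figures by hand: host CPU count and peak CPU busy percentage (from the report or the host's own monitoring), plus "Physical reads" and "Physical writes" from the Statspack Load Profile section. The names `WorkloadSample` and `estimate_target`, the 8 KB block size, and the 1.3 headroom multiplier are all illustrative assumptions, not values from the template. Because those Load Profile counts are in blocks, the throughput figure is only an approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadSample:
    """Peak figures transcribed by hand from a Statspack report.

    Hypothetical field names; they are not taken from the Excel template.
    """
    host_cpus: int                  # CPUs on the source database server
    cpu_busy_pct: float             # peak host CPU utilization, 0-100
    physical_reads_per_sec: float   # Load Profile "Physical reads" (blocks/s)
    physical_writes_per_sec: float  # Load Profile "Physical writes" (blocks/s)
    block_size_bytes: int = 8192    # assumed 8 KB db_block_size


def estimate_target(sample: WorkloadSample, headroom: float = 1.3) -> tuple[int, float]:
    """Return a rough (vCPUs, MB/s) target; headroom is an assumed safety factor."""
    # CPUs actually busy at peak, scaled up by the headroom factor.
    busy_cpus = sample.host_cpus * sample.cpu_busy_pct / 100
    vcpus = math.ceil(busy_cpus * headroom)

    # Convert block counts to MB/s; only approximates real I/O throughput.
    blocks_per_sec = sample.physical_reads_per_sec + sample.physical_writes_per_sec
    mb_per_sec = blocks_per_sec * sample.block_size_bytes / (1024 * 1024)
    return vcpus, round(mb_per_sec * headroom, 1)


if __name__ == "__main__":
    peak = WorkloadSample(host_cpus=16, cpu_busy_pct=60,
                          physical_reads_per_sec=2500, physical_writes_per_sec=500)
    print(estimate_target(peak))  # (13, 30.5)
```

With 16 CPUs at 60% busy and 3,000 blocks/s of I/O, the sketch suggests about 13 vCPUs and roughly 30.5 MB/s. Treat numbers like these as a starting point to check against the template and the target VM's documented limits, not as a final sizing.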