Microsoft Fabric
Delivering Information with Azure Synapse and Data Vault 2.0
Data Vault is designed to integrate data from multiple sources, break the data down into its fundamental components, and store and organize it so that any target structure can be derived quickly. This article focuses on generating information models, often dimensional models, using virtual entities. These entities are used in the data architecture to deliver information. After all, dimensional models are easier for dashboarding solutions to consume, and business users know how to use dimensions and facts to aggregate their measures. However, PIT and bridge tables are usually needed to maintain the desired performance level. They also simplify the implementation of dimension and fact entities and, for those reasons, are frequently found in Data Vault-based data platforms. This article completes the discussion of information delivery. The following articles will focus on the automation aspects of Data Vault modeling and implementation.

Connect to any data with Shortcuts, Mirroring and Data Factory using Microsoft Fabric
Easily access and unify your data for analytics and AI — no matter where it lives. With OneLake in Microsoft Fabric, you can connect to data across multiple clouds, databases, and formats without duplication. Use the OneLake catalog to quickly find and interact with your data, and let Copilot in Fabric help you transform and analyze it effortlessly. Eliminate barriers to working with your data using Shortcuts to virtualize external sources and Mirroring to keep databases and warehouses in sync — all without ETL. For deeper integration, leverage Data Factory’s 180+ connectors to bring in structured, unstructured, and real-time streaming data at scale. Maraki Ketema from the Microsoft Fabric team shows how to combine these methods, ensuring fast, reliable access to quality data for analytics and AI workloads.

Access your data instantly: connect to Azure, AWS, Snowflake, and on-premises sources in OneLake without moving a single file. Get started with Microsoft Fabric.
Replicate databases with near-zero latency for fast, reliable analytics. Check out Mirroring in Microsoft Fabric.
Handle ETL, data prep, and data movement at scale. Fabric Data Factory makes it simple and efficient to move data faster. See how it works.
Watch our video here.

QUICK LINKS:
00:00 — Access data wherever it lives
00:42 — Microsoft Fabric background
01:17 — Manage data with Microsoft Fabric
03:04 — Low latency
03:34 — How Shortcuts work
06:41 — Mirroring
08:10 — Open mirroring
08:40 — Low friction ways to bring data in
09:32 — Data Factory in Microsoft Fabric
10:52 — Build out your data flow
11:49 — Use built-in AI to ask questions of data
12:56 — OneLake catalog
13:36 — Data security & compliance
15:10 — Additional options to bring data in
15:42 — Wrap up

Link References:
Watch our show on Real-Time Intelligence at https://aka.ms/MechanicsRTI
Check out Open Mirroring at https://aka.ms/FabricOpenMirroring

Video Transcript:
- If you’ve struggled with accessing data for your analytics and AI workloads, as it’s spread across different clouds or databases and in different formats, today we will look at the options available to you for connecting and accessing data wherever it lives with the unified data lake, OneLake, part of the cloud data analytics and AI platform, Microsoft Fabric. And importantly, we’ll show you how easy it is for your team members to find the data that you brought in with the new OneLake catalog, and how you can use Copilot in Fabric as you work to interact with your data wherever it lives from OneLake. And joining me today from the Microsoft Fabric product team is Maraki Ketema. Welcome to the show.
- Thanks for having me.
- And thanks so much for joining us today.
But before we get into this, why don’t we set a bit of context for anyone who’s new to Microsoft Fabric. So, Microsoft Fabric is a preintegrated, optimized SaaS environment, which provides a comprehensive data analytics and AI platform with built-in capabilities for data integration, data engineering, data science, data warehousing, real-time intelligence, data visualization and overall data management. Underpinning Fabric is its multi-cloud data lake, OneLake, which gives you a central point for data to be discovered and accessed wherever it resides across your data estate at scale. Now, we’ve covered Microsoft Fabric in a lot of past shows, but today we really want to specifically demystify how it can help you get a better handle on your data.
- Well, it helps on a number of levels. You’ve already mentioned scalability, and with all of the integrated capabilities for data teams to collaborate on building clean, quality data, it can be done at scale for any use case. And OneLake really is the key to getting a handle on your data by making it accessible with support for open formats like Delta Parquet and Iceberg. This helps eliminate traditional barriers to working with your data, and we give you a variety of methods to bring your data into OneLake, like Shortcuts, where you can virtualize data from where it’s stored, which creates a pointer to any structured open file-based tabular data or unstructured files, even images and multimedia. All this happens without duplicating the data. Or there are options for Mirroring, where you can create an always up-to-date replica of the source in Fabric. And this is great for databases and data warehouses with proprietary formats where your business critical data may be stored. Now both of these options can be used like any other native data in OneLake and they require no ETL. Then for all of your other sources that require data transformation or read or write capabilities, you can use the hundreds of connectors provided by Data Factory in Microsoft Fabric to make your data natively available in OneLake, and to bring in streaming data, you’ll use Real-Time Intelligence. You’ll likely use these techniques to different extents depending on your data and AI needs, and whichever method you use to connect data, we make it available with minimal latency. This is super important, for example, for real-time or gen AI tasks because they’re less predictable. As a user or agent interacts on the backend, this can quickly create a series of requests to retrieve data, which need to happen fast to ground the AI so that responses aren’t delayed. Fabric takes care of all of this for you at scale and at low latency.
- So quality data then becomes super accessible whenever you need it and wherever it lives. Why don’t we show them a few examples of this?
- Sure. So, today I’m going to walk you through an e-commerce system, and it’s for a Supermart with a grocery department, where we need to quickly understand demand versus supply as well as market competition over prices and get a 360 view of operations and customer experiences. Now, different teams, including marketing, analytics and IT, are collaborating together in a single Fabric workspace. Now here the marketing team creates promotions daily, and they work with different vendors who are using different systems to store data, and there’s no standard file type. The good news is that we can connect to all of these different systems using Shortcuts. Let me show you how that works.
Here under Get Data, I can see my option to bring data in. I’ll choose a new shortcut. You’ll see that I have both Microsoft and non-Microsoft locations. In this case, I want to connect to Amazon S3 for unstructured data. From here, if I don’t already have a connected data source, I can create a new connection using credentials for the service. But to save time, I’ll use an existing connection. I’ll choose the second option here. I can explore the data available to me and I can choose the specific folders I want. I’ll pick a few for Contoso and confirm. Now the data’s in OneLake, and I can expand the folders and look at different data, like these markdown files with text, which contain customer feedback, and I have a nice preview of the data to understand what’s in it. Additionally, I have some image data on my local drive that I want to share with others on my team as we’re trying to figure out the best placement for in-store promotions. The good news is that I can also shortcut to all of this data in OneLake directly from my desktop. Let’s take a look. Here I am in Windows File Explorer and I’m connected to OneLake, and I can interact with these files and sync them right from here. In fact, here I’m adding an image file from our grocery department, and from the status I can see that it’s already synced. Now if I move back over to Fabric, you’ll see that it’s just synced into my lakehouse view. From here, I can preview the image right away. So now I have the information I need to start analyzing customer sentiment and where we can place point of sale promotions. Again, in both examples, the file data still remains at the source; just like shortcuts on your desktop, the data doesn’t actually live in OneLake, but always stays in sync. Shortcuts in Microsoft Fabric are supported for open storage sources like Microsoft Dataverse, Azure Data Lake Storage, Google Cloud Storage, Databricks, Amazon S3 and any S3-compatible stores, and more. And you can also use Shortcuts for on-premises data sources using the Fabric on-premises data gateway.
- And beyond your file data, your Supermart is probably dependent on operational data that’s sitting in databases and warehouses, all of which might have their own proprietary formats. So what’s the path of least effort then to bring that data in?
- So this is where Mirroring in Microsoft Fabric comes into play. It makes it easy to replicate data into OneLake, and storage is included as part of your existing Microsoft Fabric capacity. Let’s jump in. Here, you can see my sales dashboard, which is broken down by category and location, and even has some forecasting built in. And on the back end, I already have various sources mirrored into my Fabric workspace in OneLake that are feeding into this particular view.
- I’m going to use Mirroring and create a new item to connect to Azure SQL DB and bring in data from the Supermarts in the same region. I’ll filter by mirror and then select the Azure SQL Database option. From here, I’ll add my connection details. I’ll type the database name and the rest securely auto-completes. After I connect, it takes seconds to show the tables in the database. And from there it’s just one more click to create the mirrored database, and now it’s ready to use in OneLake. Just like Shortcuts, all of this works without ETL or moving the source data. And now if we go back to our Get data page, you’ll notice that most of the Azure databases are directly supported for Mirroring, as well as Snowflake.
That said, you aren’t limited to using Mirroring for just these sources. You’ll notice that I have two sources here, Salesforce and our legacy on-prem SQL database. These were brought into OneLake using open mirroring. Open mirroring is an API which lets users and data providers bring data in from any source while keeping it in sync. You can learn more about open mirroring at aka.ms/FabricOpenMirroring.
- So Mirroring has great potential then in terms of being a frictionless way to bring your data in. But how real-time then is the synchronization?
- It’s near real time. Once you’ve created the mirrored database and brought your data in, you don’t need to do anything else to keep the data fresh. On the backend, Fabric is continuously listening for changes and making updates to the data in OneLake. So I’ll go ahead and refresh my sales dashboard and you can see the updates flow in. Our sales just quadrupled in seconds with this new database. That’s actually because we’ve added a lot more stores with their sales data.
- This is really a game changer then in terms of the time to insights, and that you have these low friction ways to bring your data in. That said though, there are lots of cases where you might want to transform your data and need to do more data integration work before you bring it in.
- Right. And that’s where Data Factory in Microsoft Fabric comes in, a powerful engine that can bring in your data at petabyte scale with everything you need to prep and transform the data too. Let’s take a look. As you begin to create pipelines to bring your data in, you’ll see that we now have more than 180 connectors to the most common data types. And these span both Microsoft and non-Microsoft options. And connecting to one is like we showed before with Shortcuts. If I click on Snowflake, for example, I just need to add connection settings and valid credentials to add the data source to my pipeline. And from here, let me go deeper on the pipeline experience itself. Here is one that I’ve already started. It takes our Supermart data through the bronze and silver layers before landing the curated data in the gold layer. To gain a deeper understanding, we can actually use Copilot to generate a summary of what the pipeline is doing, and in seconds, as Copilot explains here, data is loaded before data is curated, and we have schema validation, which picks up on file mismatches and places them in a separate folder after sending an alert. The pipeline provides a visual view of all of these steps. Then if I move over to my notebook, you’ll see that it applies transformations on the data before it’s loaded into our gold layer.
- Now, once my data’s in OneLake, I can also start building out my own data flows. Here’s a table that I just pulled in from Excel that looks at grocery transactions over the past quarter. This table is currently super wide, making analysis very, very difficult. Here’s where the power of Copilot comes in. I don’t need to know the right buttons or terms or words. Sometimes it can just be as simple as describing how I want my tables to look, and I’ll submit this prompt, and almost instantly the table is transformed and more optimized for analysis. While I’m at it, I can also use Copilot to do a simple operation like renaming a column, and pay attention to the middle column. The name was just changed. But what if someone inherits this data flow? Copilot can also provide descriptions of what your query is doing to help save time.
It’s described the query, and it’s easy for anyone to understand. And here’s the real power of everything we’ve done today. As you can see in our lineage, we now have all our connected data sources from Shortcuts, Mirroring, and now Data Factory. Not only can I now see everything connected in my dashboard, but I can also use natural language with built-in AI to ask questions of my data.
- In this case, I want to get ahead of wastage issues in our grocery department. My dashboard doesn’t quite help me here. This is where we can use the built-in AI to ask questions of the data. So I’ll go ahead and prompt it with which products are at risk of spoilage and require discounting. It’ll take a sec, and once that completes, I’ll get a top-level view of the products at risk with details about their expiration dates. Under that, I can see the breakdown of its reasoning with a detailed table of each item with quantity per store. And there’s even the raw SQL query the agent used to derive these insights.
- And that was a really powerful example of what you can do once your data is in OneLake. But what if I’m not as close to the data and I want to be able to discover data that I have access to?
- OneLake has the OneLake catalog, which is a central place for data users to discover the data they need and manage the data they own. Let’s take a look. From the OneLake catalog, I can see everything I have access to. On the left, I can filter the views by my items, items endorsed by others on my team, favorites, and individual workspaces. At the top, I can also filter by different types of data artifacts, insights, and processes. Let’s take a look at the Ask Questions AI experience I just showed, and here I can see the lineage for how the data’s coming in. That said, with all this ease of data discovery, it’s super important to control and manage access to the data that’s exposed through OneLake. And what’s great is that data compliance controls from Microsoft Purview are built in. I can see the sensitivity labels for any data asset, and from a lineage perspective, these labels are automatically inherited from upstream parent data sources. Permissions are also fully manageable, and if there’s a direct link to this artifact, I’ll be able to see it here. Under the direct access tab, I can see who and which groups have access to this data already. And as a data admin, I can also add users to grant access to specific resources. In fact, I’ll go ahead and add you to this one, Jeremy, and I can determine if you’re allowed to share it with others, edit, or even view the data itself.
- Okay, so now if we move over to my screen, I can see that the Ask Questions item has been shared with me, and it’s available right here. Now to show you the process to discover and request something, I’ll first filter data in my catalog view by semantic models just to narrow the list down a bit. For items that you can see but not access, you’ll see this icon here, and there’s a button to request access, like with this operations model here. And when I use that, I can add a message for why I’m requesting and send it to the admin for that data to get their approval.
- And beyond access management, the integrations with Microsoft Purview for data security and compliance keep getting deeper. Also, there’s another option for bringing data into OneLake that we haven’t demonstrated, and that’s real-time streaming data.
That’s because there’s an entire show on how to do that using Real-Time Intelligence that you can check out at aka.ms/MechanicsRTI.
- It’s really great to see all the ways that you can bring quality data into OneLake for analytics and to ground your AI workloads. In fact, you can bring data in from OneLake for use with your gen AI apps and agents using Azure AI Foundry, which we’ll cover more in an upcoming show. So, Maraki, what do you recommend for all the people watching right now to learn more?
- It’s simple: you can try everything I showed today and everything else Fabric has to offer by signing up for a generous 60-day free trial. We don’t even require a credit card to get started.
- So now you have lots of options to bring data in and to start working with it. Thanks so much for joining us today, Maraki, and thank you for joining us to learn more about all the updates. If you haven’t yet, be sure to subscribe to Microsoft Mechanics, and we’ll see you again soon.

Optimizing fleet management with Microsoft connected fleets reference architecture
Discover how Microsoft and its partners are revolutionizing fleet management with cutting-edge technology. In this blog we explore the integration of real-time analytics, telematics, and business applications to create efficient, safe, and cost-effective fleet operations. Learn about the innovative solutions from industry leaders like Accenture, Bosch, TomTom, Connected Cars, Annata, HERE Technologies, DSA Daten-und Systemtechnik GmbH, and more. Dive into the future of fleet management with composable, modular, and flexible solutions that adapt to the fast-moving and interconnected world.

Decision Guide for Selecting an Analytical Data Store in Microsoft Fabric
Learn how to select an analytical data store in Microsoft Fabric based on your workload's data volumes, data type requirements, compute engine preferences, data ingestion patterns, data transformation needs, query patterns, and other factors.

Efficient Log Management with Microsoft Fabric
Introduction

In the era of digital transformation, managing and analyzing log files in real time is essential for maintaining application health, security, and performance. There are many third-party solutions in this area that cover collecting, processing, storing, analyzing, and acting upon this data source. But sometimes, as your systems scale, those solutions can become very costly: their cost model grows with the amount of ingested data rather than with real resource utilization or customer value.

This blog post explores a robust architecture leveraging the Microsoft Fabric SaaS platform, focused on its Real-Time Intelligence capabilities, for efficient log file collection, processing, and analysis. The use cases vary from simple application error troubleshooting to more advanced scenarios such as application trend detection, for example catching slowly degrading performance issues (like the average user session for specific activities lasting longer than expected), and more proactive monitoring that defines log-based KPIs and monitors them to generate alerts.

Regarding cost, since Fabric provides a complete separation between compute and storage, you can grow your data without necessarily growing your compute costs, and you still pay for the resources that are used in a pay-as-you-go model.

Architecture Overview

The proposed architecture integrates Microsoft Fabric's Real-Time Intelligence (Real-Time Hub) with your source log files to create a seamless, near real-time log collection solution. It is based on Microsoft Fabric, a SaaS solution that is a unified suite integrating several best-of-breed Microsoft analytical experiences. Fabric is a modern data/AI platform based on unified and open data formats (Parquet/Delta), allowing both classical data lake experiences, using traditional Lakehouse/Warehouse SQL analytics, and real-time intelligence on semi-structured data, all in one lake-centric SaaS platform. Fabric's open foundation with built-in governance enables you to connect to various clouds and tools while maintaining data trust.

This is a high-level overview of Real-Time Intelligence inside Fabric:

Log Events - Fabric-Based Architecture

Looking in more detail at a solution for log collection, processing, storage, and analysis, we propose the following architecture. Now let's discuss it in more detail.

General notes:
Since Fabric is a SaaS solution, all the components can be used without deploying any infrastructure in advance; with a click of a button and some very simple configuration you can customize the relevant components for this solution.
The main components used in this solution are Data Pipeline, OneLake, and Eventhouse.
Our data source for this example is taken from this public git repo: https://github.com/logpai/loghub/tree/master/Spark
The files were taken and stored inside an S3 bucket to simulate how easily the data pipeline integrates with external data sources.

A typical log file looks like this:

16/07/26 12:00:30 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 59219.
16/07/26 12:00:30 INFO spark.SparkEnv: Registering MapOutputTracker
16/07/26 12:00:30 INFO spark.SparkEnv: Registering BlockManagerMaster
16/07/26 12:00:30 INFO storage.DiskBlockManager: Created local directory at /opt/hdfs/nodemanager/usercache/curi/appcache/application_1460011102909_0176/blockmgr-5ea750cb-dd00-4593-8b55-4fec98723714
16/07/26 12:00:30 INFO storage.MemoryStore: MemoryStore started with capacity 2.4 GB

Components

Data Pipeline

The first challenge to solve is how to bring the log files from your system into Fabric: this is the log collection phase. Many solutions exist for this phase, each with its pros and cons. In Fabric, the standard approach to bring data in is the Copy activity of ADF, which in its Fabric SaaS version is now called Data Pipeline.

Data Pipeline is a low-code/no-code tool for managing and automating the process of moving and transforming data within Microsoft Fabric: a serverless ETL tool with more than 100 connectors enabling integration with a wide variety of data sources, including databases, cloud services, file systems, and more. In addition, it supports an on-premises agent called the self-hosted integration runtime. This agent, which you install on a VM, acts as a bridge that lets you run your pipeline on a local VM and secures the connection from your on-premises network to the cloud.

Let's describe our solution's data pipeline in more detail. Bear in mind that ADF is very flexible and supports reading at scale from a wide range of data sources and files, integrating with all the major cloud vendors' object storage (S3, GCS, Oracle Cloud, file systems, FTP/SFTP, etc.), so even if your files are generated externally to Azure this is not an issue at all.

Visualization of Fabric Data Pipeline

Log Collection

ADF Copy Activity: inside the data pipeline we create a Copy activity with the following basic configuration.

Source: mapped to your data source. It can be Azure Blob Storage with a container holding the log files, or another cloud object storage such as S3 or GCS. Log files are generally retrieved from a specific container/folder and fetched based on some prefix/suffix in the file name. To support an incremental load process, we can configure the activity to delete the source files that it reads, so that once the files are successfully transferred to their target they are automatically deleted from the source. On the next iteration, the pipeline will not have to process the same files again.

Sink: a OneLake/Lakehouse folder. We create a Lakehouse ahead of time; a Lakehouse is an abstract data container that lets you hold and manage your data at scale, either structured or unstructured. We then select it from the list of connectors (look for OneLake/Lakehouse).

Log Shippers: this is an optional component. Sometimes the ETL is not allowed to access your on-premises VNet; in that case tools like Fluentd, Filebeat, or an OpenTelemetry Collector are used to forward your application's collected logs to the main entry point of the system: Azure Blob Storage.

AzCopy CLI: if you don't wish to invest in expensive tools and all you need is to copy your data in a scalable and secure manner to Azure Storage, you might consider creating your own log shipping solution based on the free AzCopy tool together with some basic scripting around it for scheduling. AzCopy is a command-line utility designed for high-performance uploading, downloading, and copying of data to and from Microsoft Azure Blob and File storage.
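To make the "basic scripting around it" idea concrete, here is a minimal sketch of a Python wrapper that ships a local log folder to a blob container with AzCopy and that could be scheduled with cron or Windows Task Scheduler. The local folder path and the container SAS URL are placeholders you would replace with your own values; this is just one possible approach, not part of the original solution.

import subprocess

# Placeholder values - point these at your real log folder and at a blob
# container URL that carries a SAS token with write permission.
LOCAL_LOG_DIR = "/var/log/myapp"
DEST_CONTAINER_SAS_URL = "https://<storageaccount>.blob.core.windows.net/logs-landing?<sas-token>"

def ship_logs():
    # "azcopy copy <source> <destination> --recursive" uploads the folder
    # contents; AzCopy handles parallelism and retries on its own.
    result = subprocess.run(
        ["azcopy", "copy", LOCAL_LOG_DIR, DEST_CONTAINER_SAS_URL, "--recursive"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface AzCopy's own error output for troubleshooting.
        raise RuntimeError("azcopy failed: " + result.stderr)

if __name__ == "__main__":
    ship_logs()  # run periodically via cron / Task Scheduler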
Fabric first activity: Copy from Source Bucket to Lakehouse

Log Preparation

Upon the log files landing in Azure Blob Storage, Eventstream can be used to trigger the data pipeline that handles the data preparation and loading phase. So what is the data preparation phase's main purpose? After the log files land in storage, and before they are loaded into the real-time logs database (the KQL database), it might be necessary to transform the data with some basic manipulations. The reasons can differ; a few examples:

Bad data formats: for example, log files sometimes contain problematic characters like new lines inside a single row (a stack trace error message with new lines as part of the message field of the record).
Metadata enrichment: sometimes the log file names carry meaningful data, for example the originating process or server name, and this metadata would be lost once the file content is loaded into the database.
Regulation restrictions: sometimes logs contain private data such as names, credit card numbers, or social security numbers (PII) that must be removed, hashed, or encrypted before the load into the database.

In our case we run a PySpark notebook that reads the files from a OneLake folder, fixes the new-lines-inside-a-row issue, and creates new files in another OneLake folder. We call this notebook with a base parameter called log_path that defines the log files' location on OneLake to read from (a sketch of such a notebook is shown further below).

Fabric second activity: Running the Notebook

Log Loading

Inside the data pipeline, the last step after the transformation phase calls the Copy data activity again, but this time the source and sink are different:
Source: Lakehouse folder (the previous notebook's output).
Sink: a specific Eventhouse table (created ahead of time); it is basically an empty table (logsraw).

Visualization of Fabric last activity: Loading to Eventhouse

In summary, we broke the log collection and preparation stage into three data pipeline activities:
Copy activity: read the log files from the source. This is the first step of the log ingestion pipeline, running inside our orchestrator, the data pipeline.
Run Notebook activity: transform the log files. This is the execution of a single notebook or a chain of notebooks.
Copy activity: load the log files into the destination database, the KQL database inside Eventhouse. The destination table, called logsraw, is created ahead of time inside the Eventhouse database.

Inside the Eventhouse

We needed to create a KQL database with a table to hold the raw ingested log records. A KQL database is a scalable and efficient storage solution for log files, optimized for high-volume data ingestion and retrieval. Eventhouses and KQL databases operate on a fully managed Kusto engine. With an Eventhouse or KQL database, you can expect available compute for your analytics within 5 to 10 seconds, and the compute resources grow with your data analytics needs.

Log Ingestion to the KQL Database with an Update Policy

We can separate the ETL transformation logic into what happens to the data before it reaches the Eventhouse KQL database and what happens after.
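Before looking at what happens inside the database, here is a minimal sketch of what the new-line-merge notebook described in the Log Preparation step could look like. It is an illustration rather than the exact notebook used in this solution: the folder paths, the assumed "yy/MM/dd HH:mm:ss" record prefix, and the single-column text output are assumptions made for the example.

import re
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# Parameters normally passed in by the pipeline's notebook activity
# (placeholder folders, relative to the attached lakehouse).
log_path = "Files/raw_logs"          # input folder with the raw log files
output_path = "Files/prepared_logs"  # output folder for the fixed files

# A new record starts with a "yy/MM/dd HH:mm:ss" timestamp, e.g. "16/07/26 12:00:30".
record_start = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")

def merge_continuation_lines(file_content):
    # Fold stack-trace / wrapped lines into the record they belong to,
    # so one log record ends up on one physical line.
    records = []
    for line in file_content.splitlines():
        if record_start.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " | " + line  # keep the continuation text inline
    return records

# Read each file as a single row, then explode it into merged records.
raw_files = spark.read.text(log_path, wholetext=True)
fixed_records = raw_files.rdd.flatMap(lambda row: merge_continuation_lines(row.value))

# Write the cleaned records back to OneLake as plain text for the next copy step.
fixed_records.map(lambda r: Row(value=r)).toDF().write.mode("overwrite").text(output_path)

The key idea is that any line not starting with a timestamp is treated as a continuation of the previous record, so exception stack traces stay inside the message field instead of becoming separate rows after ingestion.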
Before the data reaches the database, the only transformation we apply is this notebook-based new-line merge logic. It cannot easily be done as part of the database ingestion logic, simply because when we try to load files that have new lines inside a field of a record, the convention breaks and the ingestion process creates a separate table record for each line of the exception's stack trace.

On the other hand, we might need to define basic transformation rules, such as date formatting, type conversion (string to numbers), parsing and extracting an interesting value from a string based on a regular expression, or creating JSON (dynamic type) from a hierarchical string (an XML or JSON string, for example). For all these transformations we can work with what is called an update policy: we can define simple ETL logic inside the KQL database, as explained here.

During this step we create, from the logsraw staging table, a new table called logsparsed that will be the final destination table for the log queries. These are the KQL tables defined to hold the log files:

.create table logsraw (
    timestamp:string,
    log_level:string,
    module:string,
    message:string)

.create table logsparsed (
    formattedDatetime:datetime,
    log_level:string,
    log_module:string,
    message:string)

This is the update policy that automatically converts data from the staging table logsraw into the destination table logsparsed (note that the function's output columns match the destination table's schema):

.create-or-alter function parse_lograw() {
    logsraw
    | project formattedDatetime = todatetime(strcat("20", substring(timestamp, 0, 2), "-", substring(timestamp, 3, 2), "-", substring(timestamp, 6, 2), "T", substring(timestamp, 9, 8))),
              log_level,
              log_module = module,
              message
}

.alter table logsparsed policy update @'[{ "IsEnabled": true, "Source": "logsraw", "Query": "parse_lograw()", "IsTransactional": true, "PropagateIngestionProperties": true}]'

Since we don't need to retain the data in the staging table (logsraw), we can define a retention policy with a TTL of 0, like this:

.alter-merge table logsraw policy retention softdelete = 0sec recoverability = disabled

Query Log Files

After the data is ingested and transformed, it lands in a basic, schematized logs table: logsparsed. In general we have some common fields that are mapped to their own columns, like the log level (INFO/ERROR/DEBUG), the log category, the log timestamp (a datetime-typed column), and the log message. The message can be either a simple error string or a complex JSON-formatted string; in the latter case it is usually preferable to convert it to the dynamic type, which brings additional benefits like simplified query logic and reduced data processing (avoiding expensive joins).
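Queries against this table can also be run programmatically, outside the Fabric UI, for example from a Python script or notebook using the Azure Kusto client library (azure-kusto-data). The sketch below makes a few assumptions: the query URI and database name are placeholders you would copy from your own Eventhouse, and Azure CLI authentication is just one convenient sign-in option.

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.helpers import dataframe_from_result_table

# Placeholder values - copy the real query URI from your Eventhouse's
# KQL database page and use your own database name.
cluster_uri = "https://<your-eventhouse>.kusto.fabric.microsoft.com"
database = "LogsDB"

# Sign in with whatever method fits your environment; Azure CLI
# authentication is convenient for interactive exploration.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
client = KustoClient(kcsb)

# Count exception records per hour and per module from the parsed logs table.
query = """
logsparsed
| where message contains "Exception"
| summarize errors = count() by bin(formattedDatetime, 1h), log_module
| order by errors desc
"""

result = client.execute(database, query)
df = dataframe_from_result_table(result.primary_results[0])
print(df.head())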
Examples of Typical Log Queries

Troubleshooting - look for an error in a specific datetime range:
logsparsed
| where message contains "Exception" and formattedDatetime between (datetime(2016-07-26T12:10:00) .. datetime(2016-07-26T12:20:00))

Statistics - basic statistics, min/max timestamp of the log events:
logsparsed
| summarize minTimestamp=min(formattedDatetime), maxTimestamp=max(formattedDatetime)

Exception stats - check the exception distribution:
logsparsed
| extend exceptionType = case(message contains "java.io.IOException", "IOException",
                              message contains "java.lang.IllegalStateException", "IllegalStateException",
                              message contains "org.apache.spark.rpc.RpcTimeoutException", "RpcTimeoutException",
                              message contains "org.apache.spark.SparkException", "SparkException",
                              message contains "Exception", "Other Exceptions",
                              "No Exception")
| where exceptionType != "No Exception"
| summarize count() by exceptionType

Log module stats - check the module distribution:
logsparsed
| summarize count() by log_module
| order by count_ desc
| take 10

Real-Time Dashboards

After querying the logs, it is possible to visualize the query results in real-time dashboards. All that's required is to:
Select the query.
Click on Pin to Dashboard.

After adding the queries to tiles inside the dashboard, this is a typical dashboard we can easily build:

Real-time dashboards can be configured to refresh automatically, as illustrated here; the user can very easily configure how often to refresh the queries and visualizations, and at the extreme it can be as low as Continuous. There are many more capabilities implemented in the Real-Time Dashboard, like data exploration, alerting using Data Activator, and conditional formatting (changing item colors based on KPI thresholds), and this framework and its capabilities are heavily invested in and keep growing.

What about AI Integration?

Machine Learning Models: Kusto supports time series analysis out of the box, allowing for example anomaly detection (https://learn.microsoft.com/en-us/fabric/real-time-intelligence/dashboard-explore-data) and clustering. If that is not enough for you, you can always mirror the data of your KQL tables into OneLake Delta Parquet format by selecting OneLake availability. This configuration creates another copy of your data in the open Delta Parquet format, available for any Spark/Python/SparkML/SQL analytics and for whatever machine learning exploration, training, and serving you wish to do. This is illustrated here:

Bear in mind, there is no additional storage cost to turn on OneLake availability.

Conclusion

A well-designed real-time intelligence solution for log file management using Microsoft Fabric and Eventhouse can significantly enhance an organization's ability to monitor, analyze, and respond to log events. By leveraging modern technologies and best practices, organizations can gain valuable insights and maintain robust system performance and security.

Unleash the power of data and generative AI with Microsoft Cloud for Manufacturing
Microsoft Cloud for Manufacturing uses data and generative AI to enhance operational efficiency and safety in the manufacturing sector. Learn more about the manufacturing data solutions in Microsoft Fabric, Factory Operations Agent in Azure AI, and the new Factory Safety Agent, which help manufacturers unify and standardize data, gain real-time insights, and improve decision-making processes.

What’s Included with Microsoft’s Granted Offerings for Nonprofits?
Are you a nonprofit looking to boost your impact with cutting-edge technology? Microsoft is here to help! From free software licenses to guided technical documentation and support, this program offers a range of resources designed to empower your organization. In this blog, we’ll dive into the incredible tools and grants available to nonprofits through Microsoft, showing you how to make the most of these generous offerings. Whether you’re managing projects or just trying to simplify your day-to-day tasks, there’s something here for everyone. Let’s explore what’s possible!