kusto
142 Topics
Azure Data Explorer for Vector Similarity Search
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/series-cosine-similarity-function
In the world of AI and data analytics, vector databases are emerging as a powerful tool for managing complex, high-dimensional data. In this article, we explore the concept of vector databases, why data analytics needs them, and how Azure Data Explorer (ADX), also known as Kusto, can be used as a vector database.
30K Views, 13 likes, 5 Comments

What’s New in Azure Synapse Data Explorer – Ignite 2022!
The Kusto team has been busy working with our customers over the last few months to bring exciting new GA and Preview features to Ignite 2022, with a raft of improvements and innovations. Today, we are announcing a set of new features that further improve the service’s performance, security, management, and integration experiences:

Integration

Cosmos DB Synapse Link to Azure Data Explorer (Preview)
Azure Cosmos DB is a fully managed distributed database for web, mobile, gaming, and IoT applications that need to handle massive amounts of data, reads, and writes at a global scale with near-real-time response times. ADX native ingestion of Cosmos DB brings the high-throughput, low-latency transactional Cosmos DB data to the analytical world of Kusto, delivering the best of both worlds. Data can be ingested in near real time (streaming ingestion) to run analytics on the most current data or to audit changes. The feature is in private preview right now and should be available in public preview before the end of the year.

Kusto Emulator (GA)
The Kusto Emulator is a Docker image exposing a Kusto Query Engine endpoint. You can use it to create databases and to ingest and query data. The emulator understands Kusto Query Language (KQL) the same way the Azure service does. We can therefore use it for local development and be assured that the code will run the same way in an Azure Data Explorer cluster. We can also deploy it in a CI/CD pipeline to run automated test suites and ensure our code behaves as expected. You can find the overview documentation on the Kusto Emulator here and a video here: Kusto Emulator video

Ingesting files from AWS S3 (GA)
Amazon S3 is one of the most popular object storage services. AWS customers use Amazon S3 to store data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, applications, IoT devices, log analytics and big data analytics.
With native S3 ingestion support in ADX, customers can bring data from S3 directly without relying on complex ETL pipelines. Customers can also create a continuous data ingestion pipeline to bring data from S3. For more details, please refer to the documentation.

Azure Stream Analytics ADX output (GA)
Azure Data Explorer output for Azure Stream Analytics is now Generally Available. The ASA-ADX output has been available in Preview since last year. Customers can build a powerful real-time analytics architecture by using ASA and ADX together. With this new integration, an Azure Stream Analytics job can natively ingest data into Azure Data Explorer and Synapse Data Explorer tables. Read more about the output plugin setup and common ADX-ASA use cases.

Open Telemetry exporter (GA)
OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs. We are releasing the ADX OpenTelemetry exporter, which supports ingestion of data from many receivers into Azure Data Explorer, allowing customers to instrument, generate, collect, and store data using a vendor-neutral open-source framework. Read more.

Streaming support in Telegraf connector (GA)
Telegraf is an open-source, lightweight agent with a minimal memory footprint for collecting, processing, and writing telemetry data including logs, metrics, and IoT data. The Azure Data Explorer output plugin serves as the connector from Telegraf and supports ingestion of data from many types of input plugins into Azure Data Explorer. We have added support for "managed" streaming ingestion in Telegraf. It defaults to streaming ingestion, providing latency of up to a second when the target table is streaming enabled, with a fallback to batched or queued ingestion. Read more.
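The "managed" streaming behavior described above amounts to a try-then-fall-back pattern. The Python sketch below only illustrates that control flow; it is not the Telegraf plugin's code, and every name in it (`managed_ingest`, `StreamingUnavailable`, the stand-in callables) is hypothetical. It assumes the fallback is triggered when the streaming path rejects a record, which the text implies but does not spell out.

```python
from typing import Callable


class StreamingUnavailable(Exception):
    """Hypothetical error: the streaming path cannot take this record."""


def managed_ingest(record: dict,
                   stream: Callable[[dict], None],
                   queue: Callable[[dict], None]) -> str:
    """Try low-latency streaming first; fall back to queued ingestion."""
    try:
        stream(record)
        return "streaming"
    except StreamingUnavailable:
        queue(record)
        return "queued"


# Stand-ins for the two ingestion paths (assumptions, not real client calls).
queued_records = []


def stream_ok(record: dict) -> None:
    pass  # pretend the target table is streaming enabled


def stream_disabled(record: dict) -> None:
    raise StreamingUnavailable("table is not streaming enabled")


first = managed_ingest({"cpu": 0.42}, stream_ok, queued_records.append)
second = managed_ingest({"cpu": 0.57}, stream_disabled, queued_records.append)
```

Here `first` takes the streaming path, while `second` is routed to the queue and ends up in `queued_records`. A real connector would likely also consider payload size and retries before falling back.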
Protobuf support in Kafka sink (GA)
Protocol buffers (Protobuf) are a language-neutral and platform-neutral, extensible mechanism for serializing and deserializing structured data for use in communications protocols and data storage. The Azure Data Explorer Kafka sink, a gold certified Confluent connector, helps ingest data from Kafka to Azure Data Explorer. We have added Protobuf support in the connector to help customers bring Protobuf data into ADX. Read more.

Management

No / minimal downtime SKU change
Azure Data Explorer provides full flexibility to choose the most suitable SKU based on the customer’s desired CPU/cache ratio or query/storage patterns. Until now, a SKU change took time because the operation was sequential and the cluster waited for the new VMs to be created. Now it is done with zero downtime, so you can reach optimal performance faster with no disruption to users. We achieve this in two steps: first, we prepare the new VMs in parallel while the old cluster continues to provide service; then, only once the new VMs are ready, we perform the switchover to them. This new capability makes it seamless to transition to the new Lasv3 SKU, which delivers extreme price performance.

Table Level Sharing support via Azure Data Share (GA)
With Azure Data Share, users can establish in-place sharing with Azure Data Explorer databases, allowing you to easily and securely share your data with people in your company or with external partners. Sharing occurs in near real time, with no need to build or maintain a data pipeline. We have now added table-level sharing support via the Azure Data Share UX, where users can share specific tables in the database by including or excluding certain tables or by using wildcards. This allows you to provide a subset of the data using different permission sets, and allows multitenant ISV solutions to keep proprietary tables hidden while sharing specific tenant data in place with their customers. Please read more and give it a try.
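To make the include/exclude/wildcard selection above concrete, here is a minimal Python sketch. It is not the Azure Data Share implementation: the function name `shared_tables`, the sample table names, and the rule that an exclude pattern wins over an include pattern are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def shared_tables(all_tables, include=("*",), exclude=()):
    """Return tables matching any include pattern and no exclude pattern.

    Assumption: exclusion takes precedence over inclusion.
    """
    return [
        table for table in all_tables
        if any(fnmatchcase(table, pattern) for pattern in include)
        and not any(fnmatchcase(table, pattern) for pattern in exclude)
    ]


# Hypothetical tables in a multitenant database.
tables = ["TenantA_Events", "TenantA_Usage", "TenantB_Events", "Internal_Pricing"]

# Share only tenant A's tables with tenant A.
tenant_a_view = shared_tables(tables, include=["TenantA_*"])

# Share everything except the proprietary tables.
partner_view = shared_tables(tables, exclude=["Internal_*"])
```

With these inputs, `tenant_a_view` holds the two `TenantA_` tables and `partner_view` holds every table except `Internal_Pricing`.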
Aliasing follower databases (GA)
The follower database feature allows you to attach a database located in a different cluster to your Azure Data Explorer cluster. Before the aliasing capability, a database named DB created on the follower cluster took precedence over a database with the same name created on the leader cluster, so databases with the same name could not coexist. Now you can override the database name while establishing a follower relationship. This allows you to follow multiple databases with the same name from multiple leader clusters, or simply to make a database available to users under a more user-friendly name. You can either use the databaseNameOverride property to provide a new follower database name, or use databaseNamePrefix when following an entire cluster to add a prefix to the original names of all databases from the leader cluster. Read more about the API and usage code samples.

Leader follower discoverability
We have enhanced the discoverability of leader and follower databases in your ADX clusters. You can visit the database blade in the Azure portal to easily identify all the follower databases following a leader, and the leader for a given follower. The details pane also shows which specific tables, external tables, and materialized views have been included or excluded. Read more.

Performance

Lsv3/Lasv3 SKU availability (GA)
The Lsv3 (Intel-based) and Lasv3 (AMD-based) families are the recommended Storage Optimized SKU families of Azure Data Explorer. The two SKU families are supported in two configurations:
L8sv3/L8asv3: 8 vCPUs with 1.75 TB
L16sv3/L16asv3: 16 vCPUs with 3.5 TB
The two SKU families are optimal for storage-bound workloads from both a cost and a performance perspective. The SKUs are being rolled out worldwide and are already available in 17 leading regions.

Improved performance of export to parquet blobs (Preview)
This new feature allows a more efficient export into Parquet.
In most cases it’s faster and creates smaller output blobs that are more efficient to query. To use this feature, set useNativeParquetWriter to true (the default is false) in the one-time export command or when creating a continuous data export. Example: `.export to table externalTableParquet with (useNativeParquetWriter = true) <| Usage`

Improved performance for SQL requests (Preview)
Requests to SQL will become more optimized, both when using external tables and when using the sql_request plugin (if OutputSchema is provided). This is achieved by pushing down predicates and operators from the KQL query to the SQL query sent to SQL Server. For example, the query `external_table("my_sql_table") | where Name == "John"` used to fetch all the records from SQL and then apply the filter on the Name column. After the change, the filter on the Name column is pushed into the query sent to SQL Server, which results in more efficient resource utilization on both SQL Server and ADX, and the overall query duration should decrease significantly. Available in Preview by the end of this year; we will share the Preview process closer to the date.

Improved performance of ingestion of parquet blobs (Preview)
This new feature allows a more efficient ingestion of Parquet blobs: CPU utilization and ingestion duration will decrease by tens of percent. In the preview phase, there will be an option to enable it per ingestion command, using a dedicated flag. Once the feature is GA, this will become the default mode. Syntax: `with (nativeParquetIngestion=true)` Available in Preview by the end of this year, with GA planned for early next year.

Power to the query

Parse-kv operator (GA)
A new operator that extracts structured information from a string expression and represents the information in key/value form.
The following extraction modes are supported:
Specified delimiter: Extraction based on specified delimiters that dictate how keys/values and pairs are separated from each other.
Non-specified delimiter: Extraction with no need to specify delimiters. Any non-alphanumeric character is considered a delimiter.
Regex: Extraction based on an RE2 regular expression.
Read more.

Scan operator (GA)
This powerful operator enables efficient and scalable process mining, sequence analytics, and user analytics in ADX. The user can define a linear sequence of events, and ‘scan’ will quickly extract all sequences of those events. Common scenarios for using ‘scan’ include preventive maintenance for IoT devices, customer funnel analysis, recursive calculation, security scenarios that look for known attack steps, and more. Read more.

New Python image for the inline python() plugin (Preview)
We have updated the Python image to Python 3.9 and the latest versions of the packages. This will be the default image when enabling the plugin, while existing users can continue to use the older image (based on Python 3.6.5) for compatibility. Read more. Available in Preview by the end of this year.

Reliability

Continuous export: write-once guarantee (GA)
Until now, transient errors during continuous export sometimes generated blobs with duplicate data, and sometimes corrupted blobs, in addition to blobs containing the correct data. Following the change, data will be exported exactly once.

Hyper-V support for sandboxes
We are changing the sandboxing of the Python plugin to use Hyper-V technology. Hyper-V offers enhanced isolation, thus improving security. This will allow the Python and R plugins to run on SKUs with hyper-threading, improving overall performance.

Visualization

ADX Dashboards (GA)
Azure Data Explorer Dashboards is a web component that enables you to run queries and build dashboards in the stand-alone web application, the Azure Data Explorer web UI.
Azure Data Explorer Dashboards provide two main advantages:
Native integration with Azure Data Explorer web UI extended functionalities
Optimized dashboard rendering performance
Read more about ADX Dashboards here. Generally available (GA) by the end of this year.

Plotly support in ADX Dashboards (Preview)
Plotly is a Python graphing package that allows the creation of advanced visualizations including 3D charts, heatmaps, animations, and many more. The visualization is defined by a short Python script supplied by the user, so it is very flexible and can be tailored to the specific scenario. We are launching support for Plotly visualizations in ADX Dashboards. Available in Preview by the end of this year.

Dashboards base query (GA)
Base query is a new feature in ADX Dashboards that allows you to reuse queries among a dashboard's tiles (i.e., visualizations). This feature not only makes it easier to manage the underlying queries of a dashboard but also improves its performance. Generally Available by the end of this year.

Kusto Trender (GA)
Time Series Insights will be retired on March 31st, 2025. Meanwhile, most customers are transitioning to Azure Data Explorer (ADX) for their (Industrial) Internet of Things use cases. Azure Data Explorer provides the best data analytics platform for streaming telemetry data. To further accelerate the transition, we upgraded the Time Series Insights client visualization component to work on ADX. The code is available on GitHub under the MIT license. Samples can be accessed via the Kusto Trender Samples Gallery.

Kusto.Explorer - Automation (Preview)
Query Automation allows you to define a workflow that contains a series of queries with rules and logic that govern the order in which they are executed. Automations can be reused, and users can re-run the workflow to get updated results.
Upon completion, the saved Automation produces an analysis report summarizing all query results with additional insights. This powerful feature is now available in Kusto.Explorer, a rich desktop app that enables you to explore your data using KQL.

Guidance

POC Playbook
We are releasing prescriptive guidance to help our customers plan their Azure Data Explorer proof of concept (POC). The playbook provides a high-level methodology for preparing and running an effective Azure Data Explorer POC project. Please refer to the ADX POC Playbook for details.

Lastly, if you missed Satya's Ignite 2022 keynote, do watch him talk about real-time analytics with ADX. We would love to hear your feedback and overall experience with these new capabilities. Please let us know your thoughts in the comments.
6.6K Views, 12 likes, 2 Comments

🔍🎬 Introducing Kusto Detective Agency Season 2: Bigger, Better, and Brimming with Prizes! 🎉🔍
Greetings, esteemed investigators and data enthusiasts! We are thrilled to announce the highly anticipated launch of Kusto Detective Agency Season 2. After the immense success of Season 1, with over 10,000 participants diving deep into the world of data investigation, we cannot thank you enough for your incredible support and enthusiasm!

Season 2 of Kusto Detective Agency is set to be an even grander adventure, filled with more challenges, mind-bending mysteries, and countless opportunities to showcase your analytical skills. Prepare yourself for a journey that will push the boundaries of your data prowess and reward you with an unforgettable experience.

One of the highlights of Season 2 is the abundance of amazing prizes waiting to be claimed by our brilliant detectives. From cutting-edge tech gadgets to exclusive KDA merchandise and flashy detective badges, the stakes have never been higher.

We are grateful to AMD for their collaboration in making Season 2 possible. Powered by AMD's advanced technologies, this season promises to be a true marvel of data exploration and analysis. Thank you, AMD, for joining forces with us!

For those who are new to the world of Kusto Detective Agency, fear not! Getting started is easier than ever before. You can begin your journey in either of two ways:
1. Create a KQL database in Synapse Real-Time Analytics in Microsoft Fabric. Please make sure you create the KQL database in “My workspace” so that it remains your personal database. Sign up for the Fabric free trial.
2. Create a Kusto free cluster.

Sharpen your skills, familiarize yourself with the tools at your disposal, and start unraveling captivating mysteries right away. The possibilities are endless, and we can't wait to see what you discover!

Now, without further ado, here is what you can expect in Kusto Detective Agency Season 2. So, are you ready to accept the challenge?
Gather your wits, familiarize yourself with the tools at your disposal, and start unraveling captivating mysteries right away. Join us for a season that promises to be the data-driven journey of a lifetime. We have 10 cases in Season 2, and we will be releasing a case every 2 weeks starting today. Together, let's make Season 2 of Kusto Detective Agency an adventure for the ages! Happy investigating!

Recruiting now at: https://detective.kusto.io/
#KustoDetectiveAgency #Season2 #DataMysteryUnveiled
13K Views, 10 likes, 0 Comments

Multivariate Anomaly Detection in Azure Data Explorer
ADX natively supports detecting anomalies over multiple time series with the function series_decompose_anomalies(), which can analyze thousands of time series in seconds, enabling near-real-time monitoring solutions and workflows based on ADX. This function analyzes each metric independently; however, some anomalies can only be detected by looking at multiple metrics at the same time. In this blog we present new ADX functions for multivariate anomaly detection that jointly analyze time series of multiple metrics, and we present an example of such anomalies found when analyzing the prices of the MSFT and SPY pair.
9.3K Views, 10 likes, 0 Comments
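The claim in the multivariate anomaly detection summary above, that some anomalies are invisible when each metric is checked on its own, can be illustrated with a small Python sketch. This is not the method used by the ADX functions the post announces; it swaps in a plain z-score check per series and a squared Mahalanobis distance for the joint check. The data is synthetic, and the names and both thresholds are illustrative assumptions.

```python
from statistics import mean, pstdev


def univariate_flags(xs, z_limit=2.0):
    """Indices whose z-score within a single series exceeds z_limit."""
    m, s = mean(xs), pstdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - m) / s > z_limit]


def joint_flags(xs, ys, d2_limit=6.0):
    """Indices whose squared Mahalanobis distance from the joint mean
    of the (x, y) pairs exceeds d2_limit."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 covariance matrix
    flags = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > d2_limit:
            flags.append(i)
    return flags


# Synthetic pair of metrics that normally move together.
# At index 6 the second series drops to 3 while the first keeps rising.
series_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
series_b = [1, 2, 3, 4, 5, 6, 3, 8, 9, 10]

alone_a = univariate_flags(series_a)
alone_b = univariate_flags(series_b)
together = joint_flags(series_a, series_b)
```

The value 3 at index 6 is well inside the normal range of `series_b`, so neither per-series check flags anything. Only the joint check, which accounts for how the two series usually move together, singles out index 6.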