Ingestion
8 Topics[New blog post] Azure Data Explorer connection for CosmosDB change feed
Did you know Azure Data Explorer can ingest Azure CosmosDB documents using a non-code data connection? This records both inserts and updates as a stream. Check out this walk-through. You can even record historical data if needed. Read the details at https://sandervandevelde.wordpress.com/2023/09/10/azure-data-explorer-connection-for-cosmosdb-change-feed/384Views0likes0CommentsData Factory command ".ingest into" adding new columns
I have this table: .create table lsa_datahub_country_codes ( DWH_LOAD_DATE:datetime, DWH_SOURCE_ID:string, DWH_CREATION_DATE:datetime, FIFA: string, DIAL: string, ISO3166_1_ALPHA_3: string, MARC: string, IS_INDEPENDENT: string, ISO3166_1_NUMERIC: string, GAUL: string, FIPS: string, WMO: string, ISO3166_1_ALPHA_2: string, ITU: string, IOC: string, DS: string, UNTERM_SPANISH_FORMAL: string, GLOBAL_CODE: string, INTERMEDIATE_REGION_CODE: string, OFFICIAL_NAME_FR: string, UNTERM_FRENCH_SHORT: string, ISO4217_CURRENCY_NAME: string, DEVELOPED_DEVELOPING_COUNTRIES: string, UNTERM_RUSSIAN_FORMAL: string, UNTERM_ENGLISH_SHORT: string, ISO4217_CURRENCY_ALPHABETIC_CODE: string, SMALL_ISLAND_DEVELOPING_STATES_SIDS: string, UNTERM_SPANISH_SHORT: string, ISO4217_CURRENCY_NUMERIC_CODE: string, UNTERM_CHINESE_FORMAL: string, UNTERM_FRENCH_FORMAL: string, UNTERM_RUSSIAN_SHORT: string, M49: int, SUB_REGION_CODE: string, REGION_CODE: string, OFFICIAL_NAME_AR: string, ISO4217_CURRENCY_MINOR_UNIT: string, UNTERM_ARABIC_FORMAL: string, UNTERM_CHINESE_SHORT: string, LAND_LOCKED_DEVELOPING_COUNTRIES_LLDC: string, INTERMEDIATE_REGION_NAME: string, OFFICIAL_NAME_ES: string, UNTERM_ENGLISH_FORMAL: string, OFFICIAL_NAME_CN: string, OFFICIAL_NAME_EN: string, ISO4217_CURRENCY_COUNTRY_NAME: string, LEAST_DEVELOPED_COUNTRIES_LDC: string, REGION_NAME: string, UNTERM_ARABIC_SHORT: string, SUB_REGION_NAME: string, OFFICIAL_NAME_RU: string, GLOBAL_NAME: string, CAPITAL: string, CONTINENT: string, TLD: string, LANGUAGES: string, GEONAME_ID: int, CLDR_DISPLAY_NAME: string, EDGAR: string ) with ( docstring = "Datahub country codes", folder = "LSA" ); and this command in Data Factory: .ingest into table lsa_datahub_country_codes ("https://xxxxxx.blob.core.windows.net/datahub/20220108/country_codes.csv;yyyyyy") with ( format = "CSV", creationTime = "2022-01-08", ignoreFirstRecord=true, ingestionMapping = ```[ {"Column":"DWH_LOAD_DATE", "Properties": {"ConstValue": "2022-01-08"}}, {"column":"DWH_SOURCE_ID", "Properties": {"ConstValue": "country_codes.csv"}}, {"Column":"DWH_CREATION_DATE", "Properties": {"ConstValue": "2022-01-08 00:00:00"}}, {"column":"FIFA", "Properties":{"Ordinal":"0"}}, {"column":"DIAL", "Properties":{"Ordinal":"1"}}, {"column":"ISO3166_1_ALPHA_3", "Properties":{"Ordinal":"2"}}, {"column":"MARC", "Properties":{"Ordinal":"3"}}, {"column":"IS_INDEPENDENT", "Properties":{"Ordinal":"4"}}, {"column":"ISO3166_1_NUMERIC", "Properties":{"Ordinal":"5"}}, {"column":"GAUL", "Properties":{"Ordinal":"6"}}, {"column":"FIPS", "Properties":{"Ordinal":"7"}}, {"column":"WMO", "Properties":{"Ordinal":"8"}}, {"column":"ISO3166_1_ALPHA_2", "Properties":{"Ordinal":"9"}}, {"column":"ITU", "Properties":{"Ordinal":"10"}}, {"column":"IOC", "Properties":{"Ordinal":"11"}}, {"column":"DS", "Properties":{"Ordinal":"12"}}, {"column":"UNTERM_SPANISH_FORMAL", "Properties":{"Ordinal":"13"}}, {"column":"GLOBAL_CODE", "Properties":{"Ordinal":"14"}}, {"column":"INTERMEDIATE_REGION_CODE", "Properties":{"Ordinal":"15"}}, {"column":"OFFICIAL_NAME_FR", "Properties":{"Ordinal":"16"}}, {"column":"UNTERM_FRENCH_SHORT", "Properties":{"Ordinal":"17"}}, {"column":"ISO4217_CURRENCY_NAME", "Properties":{"Ordinal":"18"}}, {"column":"DEVELOPED_DEVELOPING_COUNTRIES", "Properties":{"Ordinal":"19"}}, {"column":"UNTERM_RUSSIAN_FORMAL", "Properties":{"Ordinal":"20"}}, {"column":"UNTERM_ENGLISH_SHORT", "Properties":{"Ordinal":"21"}}, {"column":"ISO4217_CURRENCY_ALPHABETIC_CODE", "Properties":{"Ordinal":"22"}}, {"column":"SMALL_ISLAND_DEVELOPING_STATES_SIDS", "Properties":{"Ordinal":"23"}}, {"column":"UNTERM_SPANISH_SHORT", "Properties":{"Ordinal":"24"}}, {"column":"ISO4217_CURRENCY_NUMERIC_CODE", "Properties":{"Ordinal":"25"}}, {"column":"UNTERM_CHINESE_FORMAL", "Properties":{"Ordinal":"26"}}, {"column":"UNTERM_FRENCH_FORMAL", "Properties":{"Ordinal":"27"}}, {"column":"UNTERM_RUSSIAN_SHORT", "Properties":{"Ordinal":"28"}}, {"column":"M49", "Properties":{"Ordinal":"29"}}, {"column":"SUB_REGION_CODE", "Properties":{"Ordinal":"30"}}, {"column":"REGION_CODE", "Properties":{"Ordinal":"31"}}, {"column":"OFFICIAL_NAME_AR", "Properties":{"Ordinal":"32"}}, {"column":"ISO4217_CURRENCY_MINOR_UNIT", "Properties":{"Ordinal":"33"}}, {"column":"UNTERM_ARABIC_FORMAL", "Properties":{"Ordinal":"34"}}, {"column":"UNTERM_CHINESE_SHORT", "Properties":{"Ordinal":"35"}}, {"column":"LAND_LOCKED_DEVELOPING_COUNTRIES_LLDC", "Properties":{"Ordinal":"36"}}, {"column":"INTERMEDIATE_REGION_NAME", "Properties":{"Ordinal":"37"}}, {"column":"OFFICIAL_NAME_ES", "Properties":{"Ordinal":"38"}}, {"column":"UNTERM_ENGLISH_FORMAL", "Properties":{"Ordinal":"39"}}, {"column":"OFFICIAL_NAME_CN", "Properties":{"Ordinal":"40"}}, {"column":"OFFICIAL_NAME_EN", "Properties":{"Ordinal":"41"}}, {"column":"ISO4217_CURRENCY_COUNTRY_NAME", "Properties":{"Ordinal":"42"}}, {"column":"LEAST_DEVELOPED_COUNTRIES_LDC", "Properties":{"Ordinal":"43"}}, {"column":"REGION_NAME", "Properties":{"Ordinal":"44"}}, {"column":"UNTERM_ARABIC_SHORT", "Properties":{"Ordinal":"45"}}, {"column":"SUB_REGION_NAME", "Properties":{"Ordinal":"46"}}, {"column":"OFFICIAL_NAME_RU", "Properties":{"Ordinal":"47"}}, {"column":"GLOBAL_NAME", "Properties":{"Ordinal":"48"}}, {"column":"CAPITAL", "Properties":{"Ordinal":"49"}}, {"column":"CONTINENT", "Properties":{"Ordinal":"50"}}, {"column":"TLD", "Properties":{"Ordinal":"51"}}, {"column":"LANGUAGES", "Properties":{"Ordinal":"52"}}, {"column":"GEONAME_ID", "Properties":{"Ordinal":"53"}}, {"column":"CLDR_DISPLAY_NAME", "Properties":{"Ordinal":"54"}}, {"column":"EDGAR", "Properties":{"Ordinal":"55"}} ]``` ) to ingest this csv https://datahub.io/core/country-codes/r/country-codes.csv and, after the execution, Data Explorer create new columns in the table (in the end of the table, with the same csv first line header columns): "DWH_LOAD_DATE": 2022-01-08T00:00:00Z, "DWH_SOURCE_ID": country_codes.csv, "DWH_CREATION_DATE": 2022-01-09T02:19:45Z, "FIFA": TPE, "DIAL": , "ISO3166_1_ALPHA_3": , "MARC": ch, "IS_INDEPENDENT": , "ISO3166_1_NUMERIC": , "GAUL": 925, "FIPS": TW, "WMO": , "ISO3166_1_ALPHA_2": , "ITU": , "IOC": TPE, "DS": RC, "UNTERM_SPANISH_FORMAL": , "GLOBAL_CODE": , "INTERMEDIATE_REGION_CODE": , "OFFICIAL_NAME_FR": , "UNTERM_FRENCH_SHORT": , "ISO4217_CURRENCY_NAME": , "DEVELOPED_DEVELOPING_COUNTRIES": , "UNTERM_RUSSIAN_FORMAL": , "UNTERM_ENGLISH_SHORT": , "ISO4217_CURRENCY_ALPHABETIC_CODE": , "SMALL_ISLAND_DEVELOPING_STATES_SIDS": , "UNTERM_SPANISH_SHORT": , "ISO4217_CURRENCY_NUMERIC_CODE": , "UNTERM_CHINESE_FORMAL": , "UNTERM_FRENCH_FORMAL": , "UNTERM_RUSSIAN_SHORT": , "M49": , "SUB_REGION_CODE": , "REGION_CODE": , "OFFICIAL_NAME_AR": , "ISO4217_CURRENCY_MINOR_UNIT": , "UNTERM_ARABIC_FORMAL": , "UNTERM_CHINESE_SHORT": , "LAND_LOCKED_DEVELOPING_COUNTRIES_LLDC": , "INTERMEDIATE_REGION_NAME": , "OFFICIAL_NAME_ES": , "UNTERM_ENGLISH_FORMAL": , "OFFICIAL_NAME_CN": , "OFFICIAL_NAME_EN": , "ISO4217_CURRENCY_COUNTRY_NAME": , "LEAST_DEVELOPED_COUNTRIES_LDC": , "REGION_NAME": , "UNTERM_ARABIC_SHORT": , "SUB_REGION_NAME": , "OFFICIAL_NAME_RU": , "GLOBAL_NAME": , "CAPITAL": , "CONTINENT": , "TLD": .tw, "LANGUAGES": , "GEONAME_ID": , "CLDR_DISPLAY_NAME": , "EDGAR": , "Dial": 886, "ISO3166_1_Alpha_3": TWN, "is_independent": Yes, "ISO3166_1_numeric": 158, "ISO3166_1_Alpha_2": TW, "UNTERM_Spanish_Formal": , "Global_Code": , "Intermediate_Region_Code": , "official_name_fr": , "UNTERM_French_Short": , "ISO4217_currency_name": , "Developed_Developing_Countries": , "UNTERM_Russian_Formal": , "UNTERM_English_Short": , "ISO4217_currency_alphabetic_code": , "Small_Island_Developing_States_SIDS": , "UNTERM_Spanish_Short": , "ISO4217_currency_numeric_code": , "UNTERM_Chinese_Formal": , "UNTERM_French_Formal": , "UNTERM_Russian_Short": , "Sub_region_Code": , "Region_Code": , "official_name_ar": , "ISO4217_currency_minor_unit": , "UNTERM_Arabic_Formal": , "UNTERM_Chinese_Short": , "Land_Locked_Developing_Countries_LLDC": , "Intermediate_Region_Name": , "official_name_es": , "UNTERM_English_Formal": , "official_name_cn": , "official_name_en": , "ISO4217_currency_country_name": , "Least_Developed_Countries_LDC": , "Region_Name": , "UNTERM_Arabic_Short": , "Sub_region_Name": , "official_name_ru": , "Global_Name": , "Capital": Taipei, "Continent": AS, "Languages": zh-TW,zh,nan,hak, "Geoname_ID": 1668284, "CLDR_display_name": Taiwan, But, when i execute the same command from Data Explorer web or Kusto Explorer these new columns are not created and the data is saved correctly. What's happening?739Views0likes0Commentshow to configure process of continuous export of data from Azure Data Explorer to Azure Data Lake v1
Hi Is is possible to configure continuous export of data from Azure Data Explorer to Azure Data Lake v1 in background? For example I have created table .create external table ExternalTableADL01 (name:string, age:int, [date]:datetime) kind=adl dataformat=csv ( h@'adl://xxxxx.azuredatalakestore.net/folder;impersonate' ) .create-or-alter continuous-export ContinuousExportDemo01 to table ExternalTableADL01 with (intervalBetweenRuns=5m ) <| TestCursor But I have error An admin command cannot be executed due to an invalid state: State='External table 'ExternalTableADL01' cannot be used for continuous export as it uses impersonate authentication type' based on official docs there is only one way https://docs.microsoft.com/en-us/azure/data-explorer/kusto/api/connection-strings/storage Append ;token=AadToken to the URI, with AadToken being a base-64 encoded AAD access token (make sure the token is for the resource https://management.azure.com/) but it is kind of interactive, because token is expiring.Solved2.6KViews0likes2CommentsHow to Monitor Azure Data Explorer ingestion using diagnostic logs (Preview)
Azure Data Explorer uses diagnostic logs for insights on ingestion successes and failures. You can export operation logs to Azure Storage, Event Hub, or Log Analytics to monitor ingestion status. Logs from Azure Storage and Azure Event Hub can be routed to a table in your Azure Data Explorer cluster for further analysis.3.1KViews1like0CommentsCalculating Data Latency
When using Azure Data Explorer to process near real time data, it’s often important to understand how quickly or slowly the data arrives in the source table. For this post, we’ll assume that our source data has an EventTime field which denotes when the event actually happened on the source entity. The quickest way to determine latency is to look for the latest EventTime and compare it to the current time. If you do this repeatedly, you’ll get a rough idea of how often the table is getting updated and how fresh the data is. MyEventData | summarize max(EventTime) We can do a lot better than that though. In the background, Kusto is keeping track of the time that every row was ready to be queried. That information is available in the ingestion_time() scalar function. Comparing the ingestion time to the EventTime will show the lag for every row: MyEventData | project lag = ingestion_time() - EventTime At this point I can run some basic aggregations like min, avg and max, but let’s do more and build a cumulative distribution function for the latency. This will tell me how much of the data arrives within X minutes of the event time. I'll start by creating a function which calculates the cumulative distribution for a table of two values. This function uses the invoke operator which receives the source of the invoke as a tabular parameter argument. .create-or-alter function CumulativePercentage(T:(x:real,y:real)) { let sum = toreal(toscalar(T | summarize sum(y))); T | order by x asc | summarize x=make_list(x), y=make_list(y/sum * 100) | project x = x, y = series_iir(y, dynamic([1]), dynamic([1,-1])) | mv-expand x to typeof(real), y to typeof(real) } Now we need to get our ingestion data into the format that the CumulativePercentage function requires, invoke that function and render a linechart with the results. MyEventData | project lag = round((ingestion_time() - EventTime)/1m, 1) | summarize count() by lag | project x=toreal(lag), y=toreal(count_) | invoke CumulativePercentage() | render linechart Now I can see that if I wait 2.6 minutes, about 48% of the data will have arrived in Kusto. That information is handy if I’m doing manual debugging on logs, setting up a scheduled job to process to the data, or monitoring the latency of various data sources. [Update 3/12/2019] Replaced mvexpand and makelist with the newer/preferred versions: mv-expand and make_list.3.6KViews2likes0CommentsTip of the week: parse, top and Update policy
Parse operator A parse operator provides a streamlined way to extend a table by applying multiple wildcard match operations on a single string expression. This is most useful when the table has a string column that combines multiple values that you want to extract into individual columns. A common example would be if you have a text columns that is produced by a developer trace instrumentation point ("printf"/"Console.WriteLine"). The following example extracts the Api and User fields from a usage trace line. KustoLogs | where EventText startswith "$$USAGE" | take 2 | parse EventText with * "Api='" Api "', User='" User "'" Ignore | project EventText, Api, User EventText Api User $$USAGE[Gateway]: Api='Admin', User='WORKGROUP\SYSTEM', Applicati Admin WORKGROUP\SYSTEM $$USAGE[Gateway]: Api='Admin', User='WORKGROUP\SYSTEM', Applicati Admin WORKGROUP\SYSTEM Top operator The top operator support sorting by numeric and string values. Usage | where Timestamp >= ago(7d) and Api == "Query" | summarize count() by User | top 1 by User User count_ AzureDataExplorer@outlook.com 13 Update Policy The update policy allows associating a query with a table, and invoking that query on each data ingestion operation to that table, then writing the query output to a different table. This is useful for scenarios that require keeping a modified subset of the original data for longer time periods.1.2KViews0likes0CommentsKusto .NET and .NET Standard SDKs are public on NuGet.Org
We are thrilled to announce that Kusto .NET SDK has been released to Nuget.Org. It includes .Net Standard SDK that is currently available for preview. What’s new? Kusto SDK is now officially available from https://nuget.org. It includes .NET (release) and .NET Standard (preview) packages: [Release] Microsoft.Azure.Kusto.Data 4.0.3 [Release] Microsoft.Azure.Kusto.Ingest 4.0.3 [Preview] Microsoft.Azure.Kusto.Data.NETStandard 4.0.3 [Preview] Microsoft.Azure.Kusto.Ingest.NETStandard 4.0.3 [Release] Microsoft.Azure.Kusto.Management 1.0.7 [Release] Microsoft.Azure.Kusto.Tools 1.1.5 (LightIngest and Kusto.Cli) See Quickstart: Ingest data using the Azure Data Explorer .NET Standard SDK for ingestion using .NET SDK. What’s next? .NET Standard SDK released today is a preview version. We will continue to invest in its quality and reliability in the following versions. Currently .NET Standard SDKs are published as stand-alone packages. We plan to repackage both flavors into a single package. Stay tuned!2.6KViews1like0Comments