Big Data
Power BI Embedded dashboards with Azure Stream Analytics
Azure Stream Analytics is a fully managed “serverless” PaaS service in Azure built for running real-time analytics on fast-moving streams of data. Today, a significant portion of Stream Analytics customers use Power BI for real-time dynamic dashboarding. Support for Power BI Embedded has been a frequent request from many of our customers, and today we are excited to share that it is now generally available. Read about it in the Azure blog.
Azure Time Series Insights API, Reference Data, Ingress, and Azure Portal Updates
Today we are announcing the release of several updates to Time Series Insights based on customer feedback. Time Series Insights is a fully managed analytics, storage, and visualization service that makes it simple to explore and analyze billions of IoT events simultaneously. It allows you to visualize and explore time series data streaming into Azure in minutes, all without having to write a single line of code. For more information about the product, pricing, and getting started, please visit the Time Series Insights website. We also offer a free demo environment where you can experience the product for yourself. We know that administrators want to plan for and manage their Time Series Insights environments with usage and health telemetry in the Azure Portal. To help them do this more effectively, we have added ingress and storage monitoring at the Time Series Insights environment level in the Portal. We are also working on adding metric alerts, so you can be automatically informed of critical information about the status of your environment. We will continue to add environment telemetry to the Azure Portal, so be on the lookout for updates in the coming months. Read about it in the Azure blog.
Build secure Oozie workflows in Azure HDInsight with Enterprise Security Package
Customers love to use Hadoop and often rely on Oozie, a workflow and coordination scheduler for Hadoop, to accelerate and simplify their big data implementations. Oozie is integrated with the Hadoop stack and supports several types of Hadoop jobs. However, for users of Azure HDInsight with domain-joined clusters, Oozie was not a supported option. To work around this limitation, customers had to run Oozie on a regular cluster, which added cost and administrative overhead. Today we are happy to announce that customers can now use Oozie in domain-joined Hadoop clusters too. In domain-joined clusters, authentication happens through Kerberos, and fine-grained authorization is handled through Ranger policies. Oozie supports impersonation of users and a basic authorization model for workflow jobs. Read more about it in the Azure blog.
Welcome our newest family member - Data Box Disk
Last year at Ignite, we talked to you about the preview of Azure Data Box, a ruggedized, portable, and simple way to move large datasets into Azure. So far, the response has been phenomenal. Customers have used Data Box to move petabytes of data into Azure. While our customers and partners love Data Box, they told us that they also wanted a lower-capacity, even easier-to-use option. They cited examples such as moving data from remote offices/branch offices (ROBOs), which have smaller data sets and minimal on-site tech support. They said they needed an option for recurring, incremental transfers for ongoing backups and archives. And they said it needed to have the same traits as Data Box: fast, simple, and secure. So we're here today with our partners at Inspire 2018 to announce a new addition to the Data Box family: Azure Data Box Disk. Read about it in the Azure blog.
Structured streaming with Azure Databricks into Power BI & Cosmos DB
In this blog we’ll discuss the concept of Structured Streaming and how a data ingestion path can be built using Azure Databricks to stream data in near real time. We’ll touch on some of the analysis capabilities that can be called directly from within Databricks using the Text Analytics API, and also discuss how Databricks can be connected directly to Power BI for further analysis and reporting. As a final step, we cover how streamed data can be sent from Databricks to Cosmos DB for persistent storage. Structured Streaming is a stream processing engine that lets you express computation on streaming data (e.g. a Twitter feed) in much the same way as you would express a batch computation on a static dataset. The Spark SQL engine performs the computation incrementally, continuously updating the result as streaming data flows in. Read more about it in the Azure blog.
Siphon: Streaming data ingestion with Apache Kafka
Data is at the heart of Microsoft’s cloud services, such as Bing, Office, Skype, and many more. As these services have grown and matured, the need to collect, process, and consume data has grown with them. Data powers decisions, from operational monitoring and management of services to business and technology decisions. Data is also the raw material for intelligent services powered by data mining and machine learning. Most large-scale data processing at Microsoft has been done using a distributed, scalable, massively parallel storage and computing system that is conceptually similar to Hadoop. This system supported data processing using a batch processing paradigm. Over time, the need for large-scale data processing at near real-time latencies emerged, to power a new class of ‘fast’ streaming data processing pipelines. Siphon was created as a highly available and reliable service to ingest massive amounts of data for processing in near real time. Apache Kafka is a key technology in Siphon, serving as its scalable pub/sub message queue. Siphon handles ingestion of over a trillion events per day across multiple business scenarios at Microsoft. Siphon was initially engineered to run on Microsoft’s internal data center fabric. Over time, the service took advantage of Azure offerings such as Apache Kafka for HDInsight to operate on Azure. Read about it in the Azure blog.
Top 8 reasons to choose Azure HDInsight
Household names such as Adobe, Jet, ASOS, Schneider Electric, and Milliman are among the hundreds of enterprises powering their big data analytics with Azure HDInsight. Azure HDInsight launched nearly six years ago and has since become the best place to run Apache Hadoop and Spark analytics on Azure. Check out the top eight reasons why enterprises are choosing Azure HDInsight for their big data applications in the Azure blog.
Process more files than ever and use Parquet with Azure Data Lake Analytics
Azure Data Lake Analytics (ADLA) is a serverless PaaS service in Azure to prepare and transform large amounts of data stored in Azure Data Lake Store or Azure Blob Storage at unparalleled scale. ADLA now offers new capabilities for processing files of any format, including Parquet, at tremendous scale. Read about it in the Azure blog.
Getting started with Apache Spark on Azure Databricks
Data is growing at an astounding rate, with an estimated 2.5 quintillion bytes being created every day. Data analysts predict that by 2020, the world’s collected data will quadruple. In the sea of all this data, we are continually exploring new ways of analyzing and interpreting it that are productive, meaningful, and insightful. Designed in collaboration with the original founders of Apache® Spark™, Azure Databricks combines the best of Databricks and Microsoft Azure to help customers accelerate innovation with streamlined workflows, an interactive workspace, and one-click setup. Azure Databricks is an analytics engine built for large-scale data processing that enables collaboration between data scientists, data engineers, and business analysts. Read more about it in the Azure blog.
Python, Node.js, Go client libraries for Azure Event Hubs in public preview
Azure Event Hubs is expanding its ecosystem to support more languages. Azure Event Hubs is a highly scalable data-streaming platform capable of processing millions of events per second. Event Hubs uses the Advanced Message Queuing Protocol (AMQP 1.0) to enable interoperability and compatibility across platforms. Now, with the addition of new clients, you can easily get started with Event Hubs. We are happy to announce that the new client libraries for Go, Python, and Node.js are in public preview. Build application logging, clickstream analytics pipelines, live dashboards, or any other telemetry processing with our rich ecosystem in the language of your choice. Read more about it in the Azure blog.