As organizations connect more and more devices, the volume of incoming data has grown rapidly. These ever-expanding datasets, including Internet of Things (IoT) data, can yield new analytics and insights that help organizations transform their business. But to reap such benefits, organizations first need to be able to use the data effectively.
Real-time (or streaming) data is perishable: it loses value if it is not acted upon in a timely manner. Storing, processing and visualizing it can be a challenge, but with the technologies now available, these steps can be carried out almost instantaneously as the information is ingested. These toolsets can help organizations monitor real-time operations, detect anomalies, filter events and enrich data for machine learning, and ultimately make real-time business decisions that improve their enterprise processes, solutions and operations.
In this blog, we will highlight different technology stacks that can be used to manage IoT data and how they can be implemented together.
Stream processing using Apache Spark
Stream processing enables low-latency processing and analysis of streaming data. Apache Spark is an open-source framework that offers high-throughput, scalable, fault-tolerant stream processing of real-time data. Spark Streaming divides a data stream into batches of X seconds. The resulting discretized stream, or DStream, is a sequence of Resilient Distributed Datasets (RDDs), one for each batch interval. Each RDD contains the records received during that batch interval, as depicted in the image below.
DStreams can be used in many real-time use cases where a significant quantity of data needs to be processed as soon as it arrives, such as fraud detection or web traffic management. Spark Streaming receives the data streams, divides them into batches as depicted in the above diagram, and then processes each batch to generate the final result streams.
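To make the batching idea concrete, here is a minimal sketch in plain Python (not Spark itself) that groups timestamped records into fixed batch intervals, roughly the way a DStream yields one RDD per interval. The function name `discretize` and the sample records are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def discretize(
    records: Iterable[Tuple[float, str]], batch_interval: float
) -> Dict[int, List[str]]:
    """Group (timestamp_in_seconds, value) records by batch interval index."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in records:
        # Records whose timestamps fall in the same interval share a batch.
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(value)
    return dict(batches)


# Hypothetical records: (arrival time in seconds, payload)
records = [(0.2, "a"), (1.7, "b"), (2.1, "c"), (3.9, "d"), (4.0, "e")]
batches = discretize(records, batch_interval=2.0)

for index in sorted(batches):
    print(f"batch {index}: {batches[index]}")
```

In Spark Streaming the batch interval is fixed when the streaming context is created, and each batch is processed as a distributed dataset rather than a local list.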
Data transfer using MQTT
MQ Telemetry Transport (MQTT) is a publish/subscribe messaging protocol that works well for IoT use cases involving low-memory devices and unreliable networks. MQTT uses a broker to relay the stream data from publishers to the subscribers that consume it. Spark Streaming can connect to an MQTT stream through a connector library.
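The publish/subscribe pattern can be sketched with a small in-memory broker in plain Python. This is a simplified illustration, not a real MQTT implementation: the class `InMemoryBroker`, its methods and the topic names are hypothetical, and how long a real broker holds messages depends on settings such as quality of service and session persistence. The topic filter matching follows the MQTT wildcard convention, where `+` matches a single topic level and `#` matches all remaining levels.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":  # multi-level wildcard: matches everything that follows
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:  # '+' matches one level
            return False
    return len(filter_levels) == len(topic_levels)


class InMemoryBroker:
    """Toy broker: queues published messages per subscription until consumed."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Tuple[str, str]]] = {}

    def subscribe(self, topic_filter: str) -> None:
        self._queues.setdefault(topic_filter, deque())

    def publish(self, topic: str, payload: str) -> None:
        for topic_filter, queue in self._queues.items():
            if topic_matches(topic_filter, topic):
                queue.append((topic, payload))

    def consume(self, topic_filter: str) -> List[Tuple[str, str]]:
        queue = self._queues[topic_filter]
        messages = list(queue)
        queue.clear()
        return messages


broker = InMemoryBroker()
broker.subscribe("mines/+/temperature")
broker.publish("mines/m1/temperature", "31.5")
broker.publish("mines/m1/humidity", "0.62")  # no matching subscription
broker.publish("mines/m2/temperature", "29.0")
messages = broker.consume("mines/+/temperature")
print(messages)
```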
Data store in Elasticsearch
Elasticsearch is another API-driven, scalable tool, built on Apache Lucene, that is used to store, search and analyze many kinds of data. It indexes the data for faster retrieval. Combined with the visualization tool Kibana, it provides a platform for business-critical analytics and insights.
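One common structure behind fast text retrieval is an inverted index, which maps each term to the documents that contain it. The sketch below is a toy version in plain Python, not Elasticsearch's actual implementation; the function names and sample documents are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents
documents = {
    1: "high temperature in shaft A",
    2: "humidity normal in shaft B",
    3: "high humidity in shaft A",
}
index = build_inverted_index(documents)
print(search(index, "high shaft"))
print(search(index, "humidity"))
```

A lookup touches only the entries for the query terms, so it does not need to scan every document.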
Real-time visualization using Kibana
Kibana is a data visualization tool that integrates directly with Elasticsearch. It lets users interact with the data through graphs and dashboards for analytic purposes, as shown in the image below.
Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data within Kibana. The “Coordinate Map” feature of Kibana lets you plot geo-locations and perform useful analytics. Kibana enables visual analysis and real-time study of data stored in Elasticsearch.
IoT data processing and visualization example
Let’s look at a use case example of a coal mine to help visualize how the combined implementation of the above-mentioned technologies can be used for IoT data collection, storage and analysis as shown in the image below.
In the energy and utility industry, a vast number of coal mines are located in different geographic zones across the world. The conditions inside each mine need to be monitored for worker safety. Condition data such as temperature, humidity and the corresponding geo-location is transmitted from the mines to various coal companies every few seconds using the MQTT protocol. This data could then be processed using Spark Streaming and uploaded to Elasticsearch. Kibana could then be used for visualization.
The code snippets below depict how data can be loaded from the MQTT broker using Spark Streaming and then saved in Elasticsearch for visualization in Kibana.
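As an illustration of what one such transmission might look like on the publishing side, the sketch below encodes a single reading as a JSON payload and derives a topic name for it. The field names, the `mines/<mine_id>/conditions` topic layout and the sample values are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MineReading:
    """One condition reading from a mine (hypothetical schema)."""

    mine_id: str
    temperature_c: float
    humidity_pct: float
    lat: float
    lon: float
    timestamp: str  # ISO 8601 string


def to_topic(reading: MineReading) -> str:
    # Assumed topic layout: one topic per mine.
    return f"mines/{reading.mine_id}/conditions"


def to_payload(reading: MineReading) -> str:
    # sort_keys keeps the serialized payload stable across runs.
    return json.dumps(asdict(reading), sort_keys=True)


reading = MineReading(
    mine_id="mine-07",
    temperature_c=31.5,
    humidity_pct=62.0,
    lat=23.79,
    lon=86.43,
    timestamp="2019-01-01T00:00:05Z",
)
topic = to_topic(reading)
payload = to_payload(reading)
print(topic)
print(payload)
```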
Collecting MQTT messages using streaming context
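As a rough stand-in for this step, the sketch below shows the kind of per-batch work a streaming job might do once a batch of raw MQTT payloads has been collected: decoding each JSON message and dropping malformed ones. It is written in plain Python rather than against the Spark API, and the field names and the function `decode_batch` are hypothetical.

```python
import json
from typing import Dict, List

# Assumed fields in each sensor message (illustrative schema).
REQUIRED_FIELDS = ("mine_id", "temperature_c", "humidity_pct", "lat", "lon")


def decode_batch(payloads: List[str]) -> List[Dict]:
    """Decode one batch of raw payloads, skipping anything malformed."""
    readings = []
    for payload in payloads:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not valid JSON: drop the message
        if isinstance(record, dict) and all(f in record for f in REQUIRED_FIELDS):
            readings.append(record)
    return readings


batch = [
    '{"mine_id": "mine-07", "temperature_c": 31.5, "humidity_pct": 62.0,'
    ' "lat": 23.79, "lon": 86.43}',
    "not-json",
    '{"mine_id": "mine-08"}',  # missing fields
]
readings = decode_batch(batch)
print(readings)
```

In a Spark Streaming job the same decode-and-filter logic would typically be applied to every batch of the stream rather than to a local list.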
Each MQTT message is saved in Elasticsearch
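One way to save many messages at once is the Elasticsearch bulk API, which takes newline-delimited JSON: an action line followed by a document line for each message, with a newline after the last line. The sketch below only builds that request body and does not send it. The index name, the field names and the function `to_bulk_body` are assumptions, and the `location` field would need to be mapped as a `geo_point` in the index for map visualizations.

```python
import json
from typing import Dict, List


def to_bulk_body(index_name: str, readings: List[Dict]) -> str:
    """Build a newline-delimited JSON body for a bulk index request."""
    lines = []
    for reading in readings:
        document = {
            "mine_id": reading["mine_id"],
            "temperature_c": reading["temperature_c"],
            "humidity_pct": reading["humidity_pct"],
            # Object form accepted by geo_point fields: {"lat": ..., "lon": ...}
            "location": {"lat": reading["lat"], "lon": reading["lon"]},
        }
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(document))  # source line
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical decoded readings
readings = [
    {"mine_id": "mine-07", "temperature_c": 31.5, "humidity_pct": 62.0,
     "lat": 23.79, "lon": 86.43},
    {"mine_id": "mine-08", "temperature_c": 29.0, "humidity_pct": 58.5,
     "lat": -23.5, "lon": 148.2},
]
body = to_bulk_body("coalmine-conditions", readings)
print(body)
```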
Define index patterns for data
An index pattern can be defined under the Management tab in Kibana to select the indices to explore. The data matching the index pattern can then be seen under the Discover tab, as seen in the screenshot below.
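The effect of a wildcard index pattern can be approximated in a few lines of Python. This sketch uses the standard library's `fnmatchcase`, which also treats `?` and `[...]` as special characters, so it is only an approximation of Kibana's `*` matching; the index names are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_indices(pattern: str, index_names: List[str]) -> List[str]:
    """Return the index names selected by a wildcard pattern."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Hypothetical index names, e.g. one index per day
indices = ["coalmine-2019.01.01", "coalmine-2019.01.02", "weblogs-2019.01.01"]
selected = matching_indices("coalmine-*", indices)
print(selected)
```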
Graphs for data visualization
Individual graphs can be created under Kibana’s Visualize tab. A dashboard that includes these graphs can then be created under the Dashboard tab, as depicted in the image below.
The continuous data streaming from IoT devices is only impactful if it can be stored and used to make quick, informed decisions. Integrating real-time analytics technologies helps refine the data so it can be employed for predictive analysis and maintenance, regulatory checks, improved response times and more.
Using the technology stacks described above can bring many benefits, such as dynamic processing, security, improved quality of service, scalability, a distributed approach, interactive display and easy distribution of dashboards.