Emtec Insights

Tools for IoT Data Processing, Analytics and Visualization

As organizations connect more and more devices, the volume of incoming data has increased exponentially. These ever-expanding datasets, including Internet of Things (IoT) data, can yield new analytics and insights that help organizations revolutionize their business. But organizations first need to be able to use the data effectively to reap such benefits.

Real-time, or streaming, data can become perishable if not acted upon in a timely manner. Storing, processing and visualizing it can be a challenge, but with the technologies now available, these steps can be carried out almost instantaneously as the information is ingested. These toolsets can help organizations monitor operations in real time, detect anomalies, filter events, enrich data for machine learning, etc., and ultimately make real-time business decisions that improve their enterprise processes, solutions and operations.

In this blog, we will highlight different technology stacks that can be used to manage IoT data and how they can be implemented together.

Stream processing using Apache Spark

Stream processing enables low-latency processing and analysis of streaming data. Apache Spark is an open-source framework that offers capabilities for high-throughput, scalable, fault-tolerant stream processing of real-time data. Spark Streaming divides data streams into batches of Z seconds. The resulting stream, called a DStream, is a sequence of Resilient Distributed Datasets (RDDs), one for each batch interval. Each RDD contains the records received during its batch interval, as depicted in the image below.*

Stream processing using Apache Spark 

DStreams can be used in many real-time use cases where a significant quantity of data needs to be processed as soon as it arrives, such as fraud detection, managing web traffic, etc. Spark Streaming receives the data streams, divides them into batches as depicted in the above diagram, and then processes each batch to generate the final output streams.
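To make the batching idea concrete, here is a minimal plain-Python sketch. It does not use Spark's API. The function name `micro_batches` and the sample records are our own assumptions for illustration. Each inner list plays the role of one RDD, holding the records that arrived during one batch interval.

```python
from collections import defaultdict


def micro_batches(records, batch_interval):
    """Group (timestamp_seconds, value) records into consecutive batches.

    Each returned list stands in for one RDD of a DStream: it holds the
    values received during a single batch interval.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        # Records whose timestamps fall in the same interval share a bucket.
        buckets[int(timestamp // batch_interval)].append(value)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Emit one batch per interval in time order, including empty intervals.
    return [buckets.get(i, []) for i in range(first, last + 1)]


# Hypothetical sample data: (arrival time in seconds, payload).
readings = [(0.5, "a"), (1.2, "b"), (2.1, "c"), (6.5, "d")]
batches = micro_batches(readings, batch_interval=2)

# Processing then happens batch by batch, e.g. counting records per batch.
counts = [len(batch) for batch in batches]
```

With a two-second interval, the sample records fall into four consecutive batches, one of which is empty because nothing arrived in that interval.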

Data transfer using MQTT

Message Queuing Telemetry Transport (MQTT) is a publish/subscribe messaging protocol that works well for IoT use cases dealing with low-memory devices and unreliable networks. MQTT uses a broker to hold the stream data until an organization is ready to consume it. Spark Streaming provides MQTT stream connectivity.
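The role of the broker can be illustrated with a toy, in-memory stand-in. This is not a real MQTT client or broker. The class `ToyBroker`, the subscriber name and the topic are assumptions made up for this sketch. A real broker's queuing behavior also depends on session and quality-of-service settings, which are ignored here.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for an MQTT broker (illustration only)."""

    def __init__(self):
        # topic -> {subscriber name -> queue of pending payloads}
        self._queues = defaultdict(dict)

    def subscribe(self, subscriber, topic):
        """Register a subscriber so later messages are queued for it."""
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, payload):
        """Queue the payload for every subscriber of the topic."""
        for queue in self._queues[topic].values():
            queue.append(payload)

    def consume(self, subscriber, topic):
        """Return and clear everything queued for this subscriber."""
        queue = self._queues[topic].get(subscriber)
        if queue is None:
            return []
        messages = list(queue)
        queue.clear()
        return messages


# Hypothetical topic and payloads.
broker = ToyBroker()
broker.subscribe("stream-job", "mines/mine-1/sensors")
broker.publish("mines/mine-1/sensors", '{"temperature": 31.5}')
broker.publish("mines/mine-1/sensors", '{"temperature": 31.9}')

# The consumer picks the messages up only when it is ready.
pending = broker.consume("stream-job", "mines/mine-1/sensors")
```

The publishing device never talks to the consumer directly, which is what lets constrained devices send data without waiting for downstream processing.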

Data store in Elasticsearch

Elasticsearch is an API-driven, scalable search and analytics engine built on Apache Lucene, used to store, search and analyze any kind of data. It indexes the data for faster retrieval. Combined with the visualization tool Kibana, it provides a platform for business-critical analytics and insights.

Real-time visualization using Kibana

Kibana is a data visualization tool that integrates directly with Elasticsearch. It provides interactive engagement with the data through graphs and dashboards for analytic purposes, as shown in the image below.

Real-time visualization using Kibana

Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data within Kibana. The “Coordinate Map” feature of Kibana lets you plot geo-locations and perform useful analytics. Kibana enables visual analysis and real-time study of data stored in Elasticsearch.

IoT data processing and visualization example

Let’s look at a use case example of a coal mine to help visualize how the combined implementation of the above-mentioned technologies can be used for IoT data collection, storage and analysis as shown in the image below.

IoT data processing example


In the energy and utility industry, there are a vast number of coal mines located in different geographic zones across the world. The conditions inside each mine need to be monitored for worker safety. Condition data such as temperature, humidity and the corresponding geo-location is transmitted from the mines to various coal companies every few seconds using the MQTT protocol. This data could then be processed using Spark Streaming and uploaded to Elasticsearch. Kibana could then be used for visualization.
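As a rough sketch of what one such reading might look like once it reaches Elasticsearch, consider the document shape and index mapping below. All field names (`mine_id`, `location` and so on) are our assumptions, since the schema is not specified here. The mapping uses the typeless format of Elasticsearch 7 and later; older versions expect a type name under `mappings`. The `geo_point` type is what lets Kibana plot the reading on a map.

```python
import json

# Hypothetical index mapping for mine sensor readings.
MINE_READINGS_MAPPING = {
    "mappings": {
        "properties": {
            "mine_id": {"type": "keyword"},
            "temperature": {"type": "float"},
            "humidity": {"type": "float"},
            "location": {"type": "geo_point"},
            "timestamp": {"type": "date"},
        }
    }
}


def to_document(mine_id, temperature, humidity, lat, lon, timestamp):
    """Build one sensor reading in the shape the mapping above expects."""
    return {
        "mine_id": mine_id,
        "temperature": temperature,
        "humidity": humidity,
        # geo_point fields accept an object with "lat" and "lon" keys.
        "location": {"lat": lat, "lon": lon},
        # ISO 8601 strings are parsed by the default date format.
        "timestamp": timestamp,
    }


# Made-up sample values for illustration.
doc = to_document("mine-1", 31.5, 0.62, -33.42, 149.58, "2018-06-01T12:00:00Z")
print(json.dumps(doc, indent=2))
```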

The code snippets below depict how data can be loaded from the MQTT broker using Spark Streaming and then saved in Elasticsearch for visualization in Kibana.

Collecting MQTT messages using streaming context
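As a plain-Python stand-in for the per-batch step, the sketch below decodes the raw payloads of one batch into records. It deliberately avoids Spark's API, and the function name `parse_payloads` and the JSON payload format are our assumptions. In a streaming job, logic like this would be applied to every batch of messages received from the broker.

```python
import json


def parse_payloads(payloads):
    """Decode raw MQTT payloads (bytes) into dicts, skipping malformed ones."""
    documents = []
    for raw in payloads:
        try:
            record = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # drop messages that are not valid UTF-8 JSON
        if isinstance(record, dict):
            documents.append(record)  # keep only JSON objects
    return documents


# One hypothetical batch: a valid reading followed by three bad messages.
batch = [
    b'{"mine_id": "mine-1", "temperature": 31.5}',
    b"not json",
    b"\xff\xfe",
    b"[1, 2]",
]
records = parse_payloads(batch)
```

Dropping malformed messages is only one possible policy. Depending on the use case, they could instead be logged or routed to a separate topic for inspection.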

Collecting MQTT messages using streaming context

Each MQTT message is saved in Elasticsearch

Saving MQTT message in Elasticsearch
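One way to save a batch of parsed messages is through Elasticsearch's bulk API, which takes newline-delimited JSON: an action line followed by the document itself, for each document. The sketch below only builds that request body with the standard library. The helper name `bulk_index_body` and the index name `mine-readings` are assumptions, and sending the body (a POST to the `_bulk` endpoint with content type `application/x-ndjson`) is left out.

```python
import json


def bulk_index_body(index_name, documents):
    """Build a newline-delimited bulk request body that indexes each document."""
    lines = []
    for doc in documents:
        # Action line: index the document that follows into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical parsed readings from one batch.
docs = [
    {"mine_id": "mine-1", "temperature": 31.5},
    {"mine_id": "mine-2", "temperature": 29.8},
]
body = bulk_index_body("mine-readings", docs)
```

Sending one bulk request per batch, rather than one request per message, keeps the number of round trips to Elasticsearch low.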

Define index patterns for data

An index pattern can be defined under the Management tab in Kibana to select which Elasticsearch indices to explore. The data matching the index pattern can then be viewed under the Discover tab, as seen in the screenshot below.

Defined index pattern in Kibana 

Graphs for data visualization

Individual graphs can be created under Kibana’s Visualize tab. Also, a dashboard can be created to include these graphs under the Dashboard tab, as depicted in the below image.

Graphs in Kibana

The continuous data streaming from IoT devices is only impactful if it can be conserved and utilized to make quick and informed decisions. The integration of real-time analytics technologies helps refine the data so it can be employed for predictive analysis and maintenance, regulatory checks, improved response times, etc.

Through the use of the above-mentioned technology stacks, many benefits can be achieved, such as dynamic processing, security, improved quality of service, scalability, distributed approach, interactive display, easy distribution of dashboards, etc.

If you are evaluating the addition of a big data or IoT solution or need real-time data processing for your connected devices, contact us.

*Reference: https://mapr.com/blog/real-time-streaming-data-pipelines-apache-apis-kafka-spark-streaming-and-hbase/ 

Written by Omkar Chaudhari

Big Data Technical Lead

Omkar has more than eight years’ experience in the analysis, design and development of software applications using open-source technologies. He is highly experienced in object-oriented programming, business analysis and requirement gathering and is well versed in SDLC and Agile Methodologies.

He also has expertise in handling toolsets for cloud and Hadoop, where he is skilled in both the AWS and MapR environments. Omkar has managed multiple teams of varying sizes for a wide range of technologies. He likes to learn new techniques and frameworks that help organizations leverage technologies to improve productivity and quality.

If you would like to connect with Mr. Chaudhari: omkar.chaudhari@emtecinc.com
