The problem of heavy workloads
In the simplest term, a workload is the amount of work performed by an application in a given period. Every application has limited capability to handle and process work. When an application is assigned a workload beyond its capability, it may take a long time to complete or often just crashes. It becomes even more severe in cases where a workload requires high compute resources, such as servers or database systems assigned a workload upon creation eg. cloud based applications. Irrespective of platform or industry, the problem of heavy workloads must be addressed efficiently and optimally to ensure project timelines and deliverables are met in a timely manner.
One solution for heavy workload processing and boosting application performance, is to split the workload into smaller chunks on an increased number of servers to carry out parallel execution of the smaller workloads. The shortcoming of this approach is the complexity and cost to setup, maintain and deploy these environments.
This is where Azure batch processing, a tool from Microsoft Azure Services, can both reduce the cost and complexity of running the application in a cloud environment.
Before we get into the details of Azure batch processing and workflows, let’s first review commonly used terminology:
A Batch account is a uniquely recognized entity within the Batch service. All processing is associated with a Batch account.
A compute node is an Azure virtual machine (VM) or Cloud service VM that is dedicated to processing a portion of your application's workload.
A pool is defined as a collection of nodes which the application runs on.
A job is a collection of tasks. It manages how computation is performed by its tasks on the compute nodes in a pool.
A task is a unit of computation that is associated with a job. It runs on a node.
An application that executes the workload
|Batch Application||An application that automates the process of creating a pool, job and assigning a task to the job.|
|Node agent SKU||
Each node in the pool has the batch node agent running on it. The batch node agent provides the command and authority interface between the node and the Batch service.
An Overview of Azure Batch Processing
Azure Batch is a robust service that provides parallel batch processing to execute intensive workloads of varying size. It creates a pool of compute nodes (virtual machines) to tackle heavy loads. With Azure Batch, batch processing has become more streamlined and viable to complete data-intensive workloads at any scale.
To initiate automated creation and management of Azure Batch pool, job, and task, the developer needs to create a batch application which can carry out on-premise execution using batch API. The batch application can be easily developed as an Azure CLI script, .Net application, or Python application. The batch application is not limited to any technology as Azure batch service provides REST API which is well documented. The following diagram depicts an Azure Batch Processing design based on a parallel workload:
The following steps explain the Azure batch workflow scenario:
- The relevant data which needs to be processed by the task is uploaded to Azure storage. From Azure storage, the files are downloaded to the compute nodes to run the tasks.
- The setup is then completed to configure the pool of compute nodes. As the workload application is present in the compute node, the data needs to reach the compute node for application to process it. After processing the data, it is sent back to the storage.
- The job is set up and configured to manage a collection of tasks/work items.
- The task is then set up, configured, and assigned a job. The task defines the computational work to be done which may consist of retrieving the files stored in the Microsoft Azure Cloud storage.
Typical Workload Applications / Use Cases
- Cloud-aware application:
Consider this use case based on a sample project in GitHub (Ref1)*. In this instance, the application is aware of the existence of Cloud and can interact with it.
The block diagram above is an extension of the detailed diagram represented in the Azure batch processing section above. The overall execution process is the same, except one special attribute that the workload application is aware of the Cloud. It is capable to execute read and write operations on Cloud storage.
- Legacy application:
This includes standalone applications that do not include Cloud as part of the overall architecture or applications which organizations implemented before Cloud became mainstream.
Let’s review an example of an application which is not aware of the cloud. Consider this sample project (Ref 2)where an application takes MP4 files from the filesystem and converts them into an AVI format using a FFmpeg tool (which is not Cloud aware) and then saves them to the filesystem.
The data processing remains same as mentioned in Azure batch processing with the only difference being the tasks act as the intermediate between I/O container and workload application present in the compute node. Azure Batch has the capability to push the file from Cloud storage to the node before starting the task, and after completion of the task, push it back to Cloud storage.
Azure Batch Pool Configuration
Pool configuration plays a critical role while designing a workload solution in Azure Batch. The right configuration will affect valuation and intricacy of the solution. Based on the nature of the workload application, the following configurations are considered:
- Batch Pool Operating System Configuration:
- Cloud Service Configuration:
This specifies that the pool is comprised of Azure Cloud Service nodes. This configuration only supports the Windows operating system. The “OS Family” is provided to create this type of pool. The advantage of Cloud service is that it can deploy nodes quickly, but on the flip side it has a major restriction that it does not support output resource (i.e. application must be Cloud-aware to read and write Cloud storage). It does not support legacy applications.
- Virtual Machine Configuration:
The advantage of the Virtual Machine (VM) is that OS other than Windows can be used. It also supports input and output resource so, both use cases —Cloud-aware applications and legacy applications (mentioned above) are possible with this configuration. Let’s see the type of virtual machines we can create:
- Azure Marketplace VM image:
You need to specify an image from the Azure Marketplace with the properties of the virtual machine image. As an example, the following properties can be specified for creating a virtual machine image reference:
Image reference properties Example Publisher Canonical Offer UbuntuServer SKU 14.04.4-LTS Version latest
The application and its prerequisites need to be installed/pushed after node creation.
- Custom VM image:
A pool of virtual machines using a custom image can be created through a managed image resource in Azure. Use of custom image provides complete flexibility for the operating system, its configuration, and data disks to be used.
The advantage here is that the installation of prerequisite software is not required by the application on startup of pool node as those are already installed before creating the image.
- Docker container:
Running Docker container is the same as running custom VM image. The only difference is that it uses container image instead of the VM image.
- Azure Marketplace VM image:
- Cloud Service Configuration:
- A target number of compute node:
When a pool is created, the types of compute nodes and the target number for each node are specified. The two types of compute nodes are:
- Dedicated compute nodes :
These nodes are dedicated to the application workloads and are guaranteed to be never preempted by Microsoft as the case in low priority compute nodes. They require more investment as compared to low-priority nodes.
- Low-priority compute nodes :
Low-priority nodes utilize excess capacity in Azure to run Batch application workloads. These nodes are comparatively inexpensive vs. dedicated nodes. These compute nodes come with a restriction that they may be preempted anytime when Microsoft Azure has an insufficient surplus capacity and require them back. If a node is preempted while running tasks, the tasks are re-queued and wait for a compute node to become available again.
- Dedicated compute nodes :
- Size of compute node :
The size of compute node is the VM hardware configuration template provided on Azure. Before providing the size to configuration, the supported size for the selected operating system needs to be checked.
Azure batch is a great option to run heavy workloads. Loose coupling of batch configuration makes it easy to migrate existing applications to Azure batch to take advantage of cost-savings. Azure batch also can be used to scale out the execution of R algorithm (Ref 3)*. If you are looking to optimize your costs while ensuring much higher performance with elastic resources, Azure Batch is a great solution. Contact us to get started.
Ref 1 - https://github.com/Azure/azure-batch-samples/tree/master/CSharp/ArticleProjects/DotNetTutorial
Ref 2- https://github.com/Azure-Samples/batch-dotnet-ffmpeg-tutorial
Ref 3 - https://github.com/Azure/doAzureParallel