JOB SCHEDULING SCHEME FOR IMAGE PROCESSING TO IMPROVE PERFORMANCE IN HADOOP
*Malatesh S. H., Pallavi S. T., Raveena B., Shreyanka G. P. and Vanitha Y.
ABSTRACT
With the growth of technology, the number of images uploaded to the internet is exploding. Most current image processing applications, designed for small-scale, local computation, do not scale well to web-sized problems with their large computation and storage requirements. Hadoop and its MapReduce paradigm are emerging as an important standard for large-scale, data-intensive processing in both industry and academia. A MapReduce cluster is typically shared among many users with various types of workloads, so one challenging issue is how to schedule all the jobs efficiently in a shared MapReduce environment in Hadoop. We find that the existing scheduling algorithms supported by Hadoop cannot ensure good performance for different image processing workloads. To address this, we have developed the Hadoop Image Processing Framework, a Hadoop-based library that supports large-scale image processing using Mesos as the resource manager. We propose a new Hadoop scheduler that leverages a study of workload patterns to improve system performance by dynamically tuning the resource share among users and the scheduling algorithm used for each user. Mesos is designed using the same principles as the Linux kernel, only at a different level of abstraction: the Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, and Elasticsearch) with APIs for resource management and scheduling across the whole data center and cloud environments. The new framework improves the performance of image processing in Hadoop by about 5-10%.
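The abstract does not say how the per-user resource share or the per-user scheduling algorithm is tuned. Purely as an illustration of the idea, and not as the authors' scheduler, the sketch below recomputes slot shares from observed demand and picks a job ordering per user. The `UserWorkload` model, the demand-proportional split with a `min_share` floor, and the 0.5 job-size variability threshold for choosing between FIFO and shortest-job-first (SJF) are all hypothetical assumptions; no Hadoop or Mesos API is used.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class UserWorkload:
    """Observed workload of one user over a recent window (hypothetical model)."""
    name: str
    pending_tasks: int      # tasks currently waiting to run
    job_sizes: list[int]    # task counts of the user's recent jobs


def tune_shares(users, total_slots, min_share=1):
    """Hypothetical rule: give every user min_share slots, then split the
    remaining slots in proportion to each user's pending tasks."""
    reserved = min_share * len(users)
    if reserved > total_slots:
        raise ValueError("not enough slots to guarantee the minimum share")
    spare = total_slots - reserved
    demand = sum(u.pending_tasks for u in users)
    shares = {}
    for u in users:
        # Proportional split; fall back to an even split when nobody is waiting.
        extra = spare * u.pending_tasks // demand if demand else spare // len(users)
        shares[u.name] = min_share + extra
    # Hand out slots lost to integer rounding, highest demand first.
    leftover = total_slots - sum(shares.values())
    for u in sorted(users, key=lambda w: w.pending_tasks, reverse=True)[:leftover]:
        shares[u.name] += 1
    return shares


def pick_policy(user, threshold=0.5):
    """Hypothetical rule: uniform job sizes -> FIFO; highly variable job
    sizes (coefficient of variation above threshold) -> SJF."""
    if len(user.job_sizes) < 2 or mean(user.job_sizes) == 0:
        return "FIFO"
    cv = pstdev(user.job_sizes) / mean(user.job_sizes)
    return "SJF" if cv > threshold else "FIFO"


users = [
    UserWorkload("alice", pending_tasks=30, job_sizes=[10, 10, 12]),
    UserWorkload("bob", pending_tasks=10, job_sizes=[1, 50, 5]),
]
print(tune_shares(users, total_slots=20))     # {'alice': 15, 'bob': 5}
print([pick_policy(u) for u in users])        # ['FIFO', 'SJF']
```

Re-running `tune_shares` and `pick_policy` at each scheduling interval would let shares and orderings follow changing workload patterns; a real scheduler would also need preemption and locality handling, which this sketch omits.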