Abstract
VIRTUAL MACHINE BASED DATA SAMPLING APPROACH TO IMPROVE QUERY PERFORMANCE FOR VIRTUALIZED HADOOP
Anupama S.*, Kavya G., Kannika J.S., Arjun T.R. and Malatesh S.H.
MapReduce has emerged as an important distributed programming paradigm for large-scale data analysis applications. As an open-source implementation of MapReduce, Hadoop is an attractive platform for many enterprises. A traditional Hadoop cluster deployed on a large number of physical machines has drawbacks, such as burdensome cluster management and fluctuating resource utilization. A virtualized Hadoop cluster not only simplifies cluster management but also facilitates cost-effective workload consolidation for better resource utilization. In a Hadoop system, data locality and query performance are critical factors affecting the performance of MapReduce applications.