Implementation of A Machine Learning Technique: Store and Process Big Data in Distributed Environment over the Array of Computers

Abstract
Machine learning (ML) is a branch of information science that deals with programming systems so that they learn automatically and improve with experience. Among the several machine learning methods, clustering arranges the items of a given collection into groups based on the similarity between them. For example, online news applications use clustering to group the articles they publish. We live in an era in which information is available in abundance from sources such as the internet, intranets, and the web. The information load has grown to the point that it is sometimes difficult to manage even our small personal mailboxes, let alone to estimate the volume of data and records that popular websites must keep up to date. The same holds for lesser-known websites that receive and maintain bulk information. Analyzing such large data sets across multiple networked computer systems normally relies on classical mining algorithms to identify trends and draw conclusions. However, traditional machine learning techniques implemented and run on a legacy system framework are productive enough only for limited data sets; they cannot deliver results quickly on large data unless the computational tasks are run on many machines distributed over clusters of commodity computers. We propose an algorithm for processing very large data sets and test it with the Mahout framework, which allows a computation task to be broken into multiple segments, each run on a different machine. The experimental results show that system performance is not affected as the number of records increases, and that the approach yields good cluster quality.
Keywords: Machine Learning, Clustering, K-Means, Hadoop, Mahout
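
The idea of breaking a clustering computation into segments that run on different machines can be illustrated with a small sketch. The code below is not the algorithm proposed in the paper and does not use the Mahout or Hadoop APIs; it is a plain Python simulation of one common way to distribute K-Means, assuming the data is already split into partitions. The function names (nearest, map_partition, reduce_partials, kmeans) are hypothetical and chosen only for this illustration. Each partition is processed independently in a "map" step, and the partial results are merged in a "reduce" step.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(points, centroids):
    """'Map' step for one data segment: per-cluster coordinate sums and counts."""
    partial = {}
    for point in points:
        idx = nearest(point, centroids)
        sums, count = partial.get(idx, ([0.0] * len(point), 0))
        partial[idx] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    """'Reduce' step: merge the partial sums and recompute each centroid."""
    merged = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            if idx in merged:
                old_sums, old_count = merged[idx]
                merged[idx] = (
                    [a + b for a, b in zip(old_sums, sums)],
                    old_count + count,
                )
            else:
                merged[idx] = (sums, count)
    updated = []
    for i, centroid in enumerate(centroids):
        if i in merged:
            sums, count = merged[i]
            updated.append(tuple(s / count for s in sums))
        else:
            # Assumption: a cluster that received no points keeps its centroid.
            updated.append(tuple(centroid))
    return updated


def kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of K-Means iterations over pre-partitioned data."""
    for _ in range(iterations):
        # In a real cluster each map_partition call would run on its own
        # machine; here the calls simply run one after another.
        partials = [map_partition(part, centroids) for part in partitions]
        centroids = reduce_partials(partials, centroids)
    return centroids


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (10.0, 10.0)],  # segment held by "machine 1"
        [(0.0, 2.0), (10.0, 12.0)],  # segment held by "machine 2"
    ]
    print(kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Only the per-cluster sums and counts travel between the map and reduce steps, so the amount of data exchanged per iteration depends on the number of clusters rather than on the number of records. This is one reason the partitioned formulation suits large data sets.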


Author Information
Nandini Madineni
Issue No: 11
Volume No: 3
Issue Publish Date: 05 Nov 2017
Issue Pages: 121-126
