Big Data Analytics

The book Big Data Analytics aims to provide the fundamentals of Apache Spark and Hadoop. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional Streaming, Structured Streaming, MLlib, and GraphX) and the Hadoop core components (HDFS, MapReduce, and YARN) are explored in depth, with implementation examples on Spark + Hadoop clusters.

Big Data Analytics

Venkat Ankam

Packt Publishing

2016

Abstract

This book is based on version 2.0 of Apache Spark and version 2.7 of Hadoop, integrated with the most commonly used tools.

Learn all the Spark stack components, including the latest topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.

The book also covers integrations with frameworks such as HDFS and YARN, and with tools such as Jupyter, Zeppelin, NiFi, Mahout, the HBase Spark Connector, GraphFrames, H2O, and Hivemall.

Big data processing is moving away from MapReduce to Spark, so the advantages of Spark over MapReduce are explained in depth to help readers reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building big data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help readers build streaming applications. The new Structured Streaming concept is explained with an IoT (Internet of Things) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark.

Readers will also get an opportunity to get started with web-based notebooks such as Jupyter and Apache Zeppelin, and with the data flow tool Apache NiFi, to analyze and visualize data.

What you will learn

Discover and implement the tools and techniques of big data analytics using Spark on Hadoop clusters, along with the wide variety of tools used with Spark and Hadoop

Understand all the Hadoop and Spark ecosystem components

Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, Conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX

See batch and real-time data analytics using Spark Core, Spark SQL, and Conventional and Structured Streaming

Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Hivemall, GraphX, and SparkR

Venkat has delivered hundreds of trainings, presentations, and white papers in the big data sphere. While this is his first attempt at writing a book, many more books are in the pipeline.

Table of Contents

Big Data Analytics at a 10,000-Foot View

Getting Started with Apache Hadoop and Apache Spark

Deep Dive into Apache Spark

Big Data Analytics with Spark SQL, DataFrames, and Datasets

Real-Time Analytics with Spark Streaming and Structured Streaming

Notebooks and Dataflows with Spark and Hadoop

Machine Learning with Spark and Hadoop

Building Recommendation Systems with Spark and Mahout

Graph Analytics with GraphX

Interactive Analytics with SparkR

Citation

Venkat Ankam, Big Data Analytics, Packt Publishing, 2016

Collection

Information Technology

Saturday, 20:49 05/11/2022

Tags: Library, Books, Textbooks, Hanoi University of Industry, Foreign Languages, Big Data Analytics, Venkat Ankam
