
A hands-on workout in Hadoop, MapReduce and the art of thinking “parallel”.


Course Description

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel.
This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher level picture of how they interact with each other.
This course will get you hands-on with Hadoop very early on. You’ll learn how to set up your own cluster using both VMs and the cloud. All the major features of MapReduce are covered – including advanced topics like Total Sort and Secondary Sort.
MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to “think parallel”.


Learning Outcomes

  • Develop advanced MapReduce applications to process Big Data.
  • Master the art of “thinking parallel” – how to break up a task into Map/Reduce transformations.
  • Self-sufficiently set up your own mini-Hadoop cluster, whether on a single node, a physical cluster or the cloud.
  • Use Hadoop + MapReduce to solve a wide variety of problems: from NLP to Inverted Indices to Recommendations.
  • Understand HDFS, MapReduce and YARN and how they interact with each other.
  • Understand the basics of performance tuning and managing your own cluster.

Pre-requisites

You’ll need an IDE where you can write Java code or open the source code that’s shared. IntelliJ and Eclipse are both great options.
You’ll need some background in Object-Oriented Programming, preferably in Java. All the source code is in Java, and we dive right in without explaining objects, classes, etc.
A bit of exposure to Linux/Unix shells would be helpful, but it won’t be a blocker.


Who is this course intended for?

Analysts who want to leverage the power of HDFS where traditional databases don’t cut it anymore.
Engineers who want to develop complex distributed computing applications to process lots of data.
Data Scientists who want to add MapReduce to their bag of tricks for processing data.

 


Your Instructor

Loonycorn

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and have spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Learnsector!

We hope you will try our offerings and think you’ll like them 🙂

Course Curriculum

Why is Big Data a Big Deal?
  • Big Data Introduction
  • Serial vs Distributed Computing
  • What is Hadoop?
  • HDFS or the Hadoop Distributed File System
  • MapReduce Introduced
  • YARN or Yet Another Resource Negotiator

Installing Hadoop in a Local Environment
  • Hadoop Install Modes
  • Hadoop Standalone mode Install
  • Hadoop Pseudo-Distributed mode Install

The MapReduce “Hello World”
  • The basic philosophy underlying MapReduce
  • MapReduce – Visualized And Explained
  • MapReduce – Digging a little deeper at every step
  • “Hello World” in MapReduce
  • The Mapper
  • The Reducer
  • The Job

Run a MapReduce Job
  • Get comfortable with HDFS
  • Run your first MapReduce Job

Juicing your MapReduce – Combiners, Shuffle and Sort and The Streaming API
  • Parallelize the reduce phase – use the Combiner
  • Not all Reducers are Combiners
  • How many mappers and reducers does your MapReduce have?
  • Parallelizing reduce using Shuffle And Sort
  • MapReduce is not limited to the Java language – Introducing the Streaming API
  • Python for MapReduce

HDFS and YARN
  • HDFS – Protecting against data loss using replication
  • HDFS – Name nodes and why they’re critical
  • HDFS – Checkpointing to back up name node information
  • YARN – Basic components
  • YARN – Submitting a job to YARN
  • YARN – Plug in scheduling policies
  • YARN – Configure the scheduler

MapReduce Customizations For Finer-Grained Control
  • Setting up your MapReduce to accept command line arguments
  • The Tool, ToolRunner and GenericOptionsParser
  • Configuring properties of the Job object
  • Customizing the Partitioner, Sort Comparator, and Group Comparator

The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
  • The heart of search engines – The Inverted Index
  • Generating the inverted index using MapReduce
  • Custom data types for keys – The Writable Interface
  • Represent a Bigram using a WritableComparable
  • MapReduce to count the Bigrams in input text
  • Setting up your Hadoop project
  • Test your MapReduce job using MRUnit

Input and Output Formats and Customized Partitioning
  • Introducing the File Input Format
  • Text And Sequence File Formats
  • Data partitioning using a custom partitioner
  • Make the custom partitioner real in code
  • Total Order Partitioning
  • Input Sampling, Distribution, Partitioning and configuring these
  • Secondary Sort

Recommendation Systems using Collaborative Filtering
  • Introduction to Collaborative Filtering
  • Friend recommendations using chained MR jobs
  • Get common friends for every pair of users – the first MapReduce
  • Top 10 friend recommendations for every user – the second MapReduce

Hadoop as a Database
  • Structured data in Hadoop
  • Running an SQL Select with MapReduce
  • Running an SQL Group By with MapReduce
  • A MapReduce Join – The Map Side
  • A MapReduce Join – The Reduce Side
  • A MapReduce Join – Sorting and Partitioning
  • A MapReduce Join – Putting it all together

K-Means Clustering
  • What is K-Means Clustering?
  • A MapReduce job for K-Means Clustering
  • K-Means Clustering – Measuring the distance between points
  • K-Means Clustering – Custom Writables for Input/Output
  • K-Means Clustering – Configuring the Job
  • K-Means Clustering – The Mapper and Reducer
  • K-Means Clustering – The Iterative MapReduce Job

Setting up a Hadoop Cluster
  • Manually configuring a Hadoop cluster (Linux VMs)
  • Getting started with Amazon Web Services
  • Start a Hadoop Cluster with Cloudera Manager on AWS

Appendix
  • Set up a Virtual Linux Instance (For Windows users)
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables


TAKE THIS COURSE
  • $99.00 $15.00
  • UNLIMITED ACCESS
  • Course Certificate


    © Learnsector