Extract, Transform and Load data using Pig to harness the power of Hadoop.


Course Description

Pig works with unstructured data. Many of its operations are SQL-like, but Pig can apply them to data sets that have no fixed schema. Pig is well suited to wrestling data into a clean form that can be stored in a data warehouse for reporting and analysis.
Pig lets you transform data so that it is structured, predictable and useful, ready for consumption.

Learning Outcomes

  • Work with unstructured data to extract information, transform it and store it in a usable form
  • Write intermediate-level Pig scripts to munge data
  • Optimize Pig operations which work on large data sets

Pre-requisites

  • A basic understanding of SQL and working with data
  • A basic understanding of the Hadoop ecosystem and MapReduce tasks

Who is this course intended for?

  • Analysts who want to wrangle large, unstructured data into shape.
  • Engineers who want to parse and extract useful information from large datasets.

 


Your Instructor

Loonycorn

Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad and have spent years working in tech, in the Bay Area, New York, Singapore and Bangalore.

Janani: 7 years at Google (New York, Singapore); studied at Stanford; also worked at Flipkart and Microsoft

Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too

We think we might have hit upon a neat way of teaching complicated tech courses in a funny, practical, engaging way, which is why we are so excited to be here on Learnsector!

We hope you will try our offerings, and we think you’ll like them 🙂

Course Curriculum

Introduction

  • Introduction

Where does Pig fit in?

  • Pig and the Hadoop ecosystem
  • Install and set up
  • How does Pig compare with Hive?
  • Pig Latin as a data flow language
  • Pig with HBase

Pig Basics

  • Operating modes, running a Pig script, the Grunt shell
  • Loading data and creating our first relation
  • Scalar data types
  • Complex data types – The Tuple, Bag and Map
  • Partial schema specification for relations
  • Displaying and storing relations – The dump and store commands

Pig Operations And Data Transformations

  • Selecting fields from a relation
  • Built-in functions
  • Evaluation functions
  • Using the distinct, limit and order by keywords
  • Filtering records based on a predicate

Advanced Data Transformations

  • Group by and aggregate transformations
  • Combining datasets using Join
  • Concatenating datasets using Union
  • Generating multiple records by flattening complex fields
  • Using Co-Group, Semi-Join and Sampling records
  • The nested Foreach command
  • Debug Pig scripts using Explain and Illustrate

Optimizing Data Transformations

  • Parallelize operations using the Parallel keyword
  • Join Optimizations: Multiple relations join, large and small relation join
  • Join Optimizations: Skew join and sort-merge join
  • Common sense optimizations

A Real-world Example

  • Parsing server logs
  • Summarizing error logs

Installing Hadoop in a Local Environment

  • Hadoop Install Modes
  • Hadoop Standalone mode Install
  • Hadoop Pseudo-Distributed mode Install

Appendix

  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
  • Setup a Virtual Linux Instance (For Windows users)

© Learnsector