Extract, Transform and Load data using Pig to harness the power of Hadoop.
What will you learn?
- Work with unstructured data to extract information, transform it, and store it in a usable form
- Write intermediate-level Pig scripts to munge data
- Optimize Pig operations that work on large data sets

What are the prerequisites?
- A basic understanding of SQL and of working with data
- A basic understanding of the Hadoop ecosystem and MapReduce tasks
Who is this course intended for?
- Analysts who want to wrangle large, unstructured data into shape.
- Engineers who want to parse and extract useful information from large datasets.
Loonycorn is us, Janani Ravi and Vitthal Srinivasan. Between us, we have studied at Stanford, been admitted to IIM Ahmedabad, and spent years working in tech in the Bay Area, New York, Singapore and Bangalore.
Janani: 7 years at Google (New York, Singapore); Studied at Stanford; also worked at Flipkart and Microsoft
Vitthal: Also Google (Singapore) and studied at Stanford; Flipkart, Credit Suisse and INSEAD too
We think we might have hit upon a neat way of teaching complicated tech in a funny, practical, engaging manner, which is why we are so excited to be here on Learnsector!

We hope you will try our offerings, and we think you'll like them 🙂
Where does Pig fit in?
- Pig and the Hadoop ecosystem
- Install and set up
- How does Pig compare with Hive?
- Pig Latin as a data flow language
- Pig with HBase
- Operating modes, running a Pig script, the Grunt shell
- Loading data and creating our first relation
- Scalar data types
- Complex data types: the Tuple, the Bag and the Map
- Partial schema specification for relations
- Displaying and storing relations: the dump and store commands
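The loading, inspection and storage topics above can be sketched in a few lines of Pig Latin. The file name, schema and output path here are hypothetical, not from the course itself:

```pig
-- Hypothetical tab-separated file 'students.tsv' with three fields
students = LOAD 'students.tsv'
    USING PigStorage('\t')
    AS (name:chararray, age:int, gpa:double);

-- Print the (possibly partial) schema of the relation
DESCRIBE students;

-- Print the relation's contents to the console
DUMP students;

-- Persist the relation, comma-separated
STORE students INTO 'output/students' USING PigStorage(',');
```

Note that `LOAD` is lazy: nothing runs until a `DUMP` or `STORE` triggers the data flow.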
Pig Operations and Data Transformations
- Selecting fields from a relation
- Using the distinct, limit and order by keywords
- Filtering records based on a predicate
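These basic transformations might look like the following sketch, which assumes a `students` relation like the hypothetical one above (`name`, `age`, `gpa`):

```pig
-- Select (project) fields from a relation
names   = FOREACH students GENERATE name, gpa;

-- Drop duplicate records
uniq    = DISTINCT names;

-- Sort by a field, then keep only the first 5 rows
ordered = ORDER students BY gpa DESC;
top5    = LIMIT ordered 5;

-- Filter records based on a predicate
adults  = FILTER students BY age >= 18;

DUMP top5;
```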
Advanced Data Transformations
- Group by and aggregate transformations
- Combining datasets using Join
- Concatenating datasets using Union
- Generating multiple records by flattening complex fields
- Using Co-Group, Semi-Join and sampling records
- The nested Foreach command
- Debugging Pig scripts using Explain and Illustrate
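A rough sketch of the advanced transformations listed above; the `enrollments` and `more_students` relations are hypothetical stand-ins:

```pig
-- GROUP, then aggregate: average gpa per age
by_age  = GROUP students BY age;
avg_gpa = FOREACH by_age GENERATE group AS age, AVG(students.gpa) AS mean_gpa;

-- JOIN two relations on a common key
enrollments = LOAD 'enrollments.tsv' AS (name:chararray, course:chararray);
joined      = JOIN students BY name, enrollments BY name;

-- UNION concatenates relations with compatible schemas
all_students = UNION students, more_students;

-- FLATTEN turns each element of a bag into its own record,
-- so one grouped row becomes many output rows
per_age_names = FOREACH by_age GENERATE group AS age, FLATTEN(students.name);

-- Inspect the logical/physical plan, or a sampled walkthrough of the flow
EXPLAIN joined;
ILLUSTRATE joined;
```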
Optimizing Data Transformations
- Parallelizing operations using the Parallel keyword
- Join optimizations: multiple-relation joins, large-with-small-relation joins
- Join optimizations: skew join and sort-merge join
- Common-sense optimizations
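The optimization hooks named above are expressed as hints in Pig Latin. A sketch, with `big`, `small`, `sorted_a` and `sorted_b` as hypothetical relations:

```pig
-- PARALLEL sets the number of reducers for a blocking operation
by_age = GROUP students BY age PARALLEL 10;

-- Replicated join: the small relation is loaded into memory on every mapper,
-- avoiding a reduce phase entirely (list the small relation last)
fast   = JOIN big BY key, small BY key USING 'replicated';

-- Skew join: handles keys with heavily skewed record counts
skewed = JOIN big BY key, small BY key USING 'skewed';

-- Merge (sort-merge) join: both inputs must already be sorted on the join key
merged = JOIN sorted_a BY key, sorted_b BY key USING 'merge';
```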
A Real-world Example
- Parsing server logs
- Summarizing error logs
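A log-parsing pipeline along these lines could be sketched as follows; the log file name, regex and error code are illustrative assumptions, not the course's actual script:

```pig
-- Load each log line as a single chararray
logs = LOAD 'access.log' USING TextLoader() AS (line:chararray);

-- REGEX_EXTRACT_ALL returns a tuple of captured groups; FLATTEN spreads
-- them into named fields (a simplified Apache access-log pattern)
parsed = FOREACH logs GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line,
        '^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] "([^"]+)" (\\d+) (\\S+)'))
    AS (host:chararray, time:chararray, request:chararray,
        status:chararray, size:chararray);

-- Keep only server errors, then count them per host
errors       = FILTER parsed BY status == '500';
by_host      = GROUP errors BY host;
error_counts = FOREACH by_host GENERATE group AS host, COUNT(errors) AS n;

STORE error_counts INTO 'output/error_summary';
```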
Installing Hadoop in a Local Environment
- Hadoop install modes
- Hadoop standalone-mode install
- Hadoop pseudo-distributed-mode install
- [For Linux/Mac OS shell newbies] Path and other environment variables
- Setting up a virtual Linux instance (for Windows users)
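The environment-variable setup for a local install typically looks like the config fragment below. The paths are assumptions; adjust them to wherever you unpacked Hadoop and Pig:

```shell
# Hypothetical install locations -- change these to match your machine
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop
export PIG_HOME=$HOME/pig
export PATH=$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin

# Run a script in local mode (no cluster needed) or against Hadoop
pig -x local myscript.pig
pig -x mapreduce myscript.pig
```

Local mode (`-x local`) is handy for developing scripts against small files before running them on a real cluster.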