Description
HADOOP DEVELOPMENT COURSE CONTENT
- Traditional systems vs. Hadoop; HDFS and YARN
- Why Spark? Introduction to RDDs
- Loading data into Spark from a text file; the collect API; item-wise count and the reduceByKey transformation
- Spark architecture: DAG, stages, tasks, driver, executors
- yarn-client and yarn-cluster modes; error handling; accumulators
- Shuffle join and broadcast join
- mapPartitions, hash partitioning, custom partitioning; file formats: TextInputFormat
- Sequence files and Avro files
- reduce, fold, foldLeft, aggregateByKey
- Spark SQL introduction: data sources, DataFrames, loading CSV files
- Reading JSON and XML files: JSON input format, multi-line JSON input format
- Simple queries, joins, nested queries
- Simple queries and joins using the DataFrame API; broadcast join; custom processing using UDFs and the transform API; renaming an individual column or all columns
- Window operations: moving average, cumulative sum, previous visit, rank, updated records
- Spark integration with Hive: Hive architecture; read and write operations on Hive tables using Spark; sensitisation example
- ORC and Parquet file formats