COURSE OVERVIEW
The Data Science and Big Data Analytics course provides practical, foundation-level training that enables immediate and effective participation in Big Data and other analytics projects. It includes an introduction to Big Data and the Data Analytics lifecycle for addressing business challenges that leverage Big Data. The course provides grounding in basic and advanced analytic methods and an introduction to Big Data Analytics technology and tools. Lab sessions offer opportunities to see how a practicing Data Scientist might apply these methods and tools to real-world business challenges. The course offers an industry certification to business analysts, data warehouse specialists and other professionals with similar backgrounds, helping them transition into the world of Data Science and Big Data Analytics, which has its own distinctive challenges and opportunities.
SYLLABUS
- The problem space and example applications
- Why don’t traditional approaches scale?
- Requirements
- Hadoop History
- The ecosystem and stack: HDFS, MapReduce, Hive, Pig…
- Cluster architecture overview
- Hadoop distribution and basic commands
- Eclipse development
- The HDFS command line and web interfaces
- The HDFS Java API (lab)
- Key philosophy: move computation, not data
- Core concepts: Mappers, reducers, drivers
- The MapReduce Java API (lab)
- Optimizing with Combiners and Partitioners (lab)
- More common algorithms: sorting, indexing and searching (lab)
- Testing with MRUnit
- Patterns to abstract “thinking in MapReduce”
- The Cascading library (lab)
- The Hive database (lab)
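
As a rough illustration of the mapper, reducer and driver concepts listed above, the following is a minimal sketch in plain Java. It deliberately does not use the Hadoop MapReduce API. The class name `WordCountSketch` and the methods `map`, `reduce` and `run` are hypothetical names chosen for this example, and the in-memory grouping step only stands in for the shuffle that a real cluster performs across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory word count; not the Hadoop API.
public class WordCountSketch {

    // "Mapper": emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // "Reducer": sum all values that were emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Driver": run the map phase, group values by key (a stand-in for
    // the shuffle), then run the reduce phase once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("move computation not data", "move data less");
        // prints {computation=1, data=2, less=1, move=2, not=1}
        System.out.println(run(lines));
    }
}
```

In this sketch, each call to `map` or `reduce` depends only on its own input. That independence is what would let a framework run many such calls in parallel on the nodes that hold the data.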