Workshop: Hands on with Apache Spark

Location:

Level: Beginner
9:00am - 4:00pm

Key takeaways

The basics of Scala
Apache Spark architecture, core concepts, and programming model
Using the Spark shell for interactive data analysis
Parallel programming with the Spark RDD APIs
Developing standalone Spark applications
Developing Spark Streaming applications

Prerequisites

Participants should have some high-level knowledge of Hadoop.
Participants should bring a laptop with Java 7 or above and Spark 1.5.1 installed.

Apache Spark is a new and exciting open source data processing engine that is widely regarded as the next-generation successor to MapReduce. It was designed from the ground up to support streaming data processing, graph processing, and complex iterative data processing. Spark abstracts large data sets as Resilient Distributed Datasets (RDDs) and provides elegant APIs for manipulating them.
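To give a feel for the RDD abstraction, here is a minimal word-count sketch using the Spark Scala API. It is not taken from the workshop materials: the object name `WordCountSketch`, the helper `countWords`, and the sample sentences are made up for illustration, and it assumes Spark is on the classpath and runs in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the workshop materials.
object WordCountSketch {
  // Turns a small in-memory sample into an RDD and counts each word.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val rdd = sc.parallelize(lines)      // distribute the sample as an RDD
    rdd.flatMap(_.split("\\s+"))         // split each line into words
      .filter(_.nonEmpty)                // drop empty tokens
      .map(word => (word, 1))            // pair each word with a count of 1
      .reduceByKey(_ + _)                // sum the counts per word
      .collect()                         // bring the results back to the driver
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // local[2] runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("spark makes rdds", "rdds are resilient"))
      counts.toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w: $c") }
    } finally {
      sc.stop()                          // always release the context
    }
  }
}
```

The transformations (`flatMap`, `filter`, `map`, `reduceByKey`) are lazy; nothing is computed until the `collect()` action runs.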

This workshop will cover the core concepts in Apache Spark and will include hands-on exercises that use the RDD APIs to solve common data processing problems. The exercises will use the Apache Spark Scala APIs, so the workshop will also cover the parts of Scala that are relevant to the exercises.
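As a rough idea of the Scala features that tend to come up when working with the RDD APIs, the sketch below uses only the Scala standard library: a case class, pattern matching, `Option`, and higher-order collection methods. The names `ScalaEssentials`, `Reading`, `parse`, and `averageBySensor`, as well as the "sensor,value" input format, are assumptions made for this example and are not taken from the workshop.

```scala
import scala.util.Try

// Hypothetical example object; names and input format are illustrative only.
object ScalaEssentials {
  // A case class is an immutable record that supports pattern matching.
  final case class Reading(sensor: String, value: Double)

  // Parses a "sensor,value" line; returns None if the line is malformed.
  def parse(line: String): Option[Reading] = line.split(",") match {
    case Array(sensor, value) =>
      Try(Reading(sensor.trim, value.trim.toDouble)).toOption
    case _ => None
  }

  // Averages the values per sensor, silently skipping malformed lines.
  def averageBySensor(lines: Seq[String]): Map[String, Double] =
    lines
      .flatMap(line => parse(line).toList)   // keep only lines that parsed
      .groupBy(_.sensor)                     // group readings by sensor name
      .map { case (sensor, readings) =>
        sensor -> (readings.map(_.value).sum / readings.size)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("a,1.0", "a,3.0", "b,2.0", "junk")
    averageBySensor(sample).toSeq.sortBy(_._1).foreach {
      case (sensor, avg) => println(s"$sensor: $avg")
    }
  }
}
```

The same function-passing style (`map`, `flatMap`, and anonymous functions such as `_.sensor`) carries over to RDD transformations, which is why a short Scala primer helps before the exercises.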

Monday Nov 16

Tuesday Nov 17

Wednesday Nov 18