Posts

Showing posts from July, 2020

A Simple Apache Spark Demo

Apache Spark is a data processing framework that can quickly process very large data sets and can distribute processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools.

About this example

In this post I am sharing a simple Apache Spark example project. The source code used for this example is available here: https://github.com/jobinesh/apache-spark-examples.git

Here is a quick overview of the modules that you will find in this project:

spark-job-common: All the common classes that you need for building a Spark job are kept here. This approach may help you avoid boilerplate code in your Spark job implementation.

spark-job-impl: A classic word count Spark example is available here. This class may help you understand how the source is structured and how the common classes from the spark-job-common module are used.

spark-job-launcher: The SparkLauncher helps you to start Spark applications programmatical
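For readers who have not seen the classic word count before, here is a minimal sketch of the logic in plain Java. It is not the code from the spark-job-impl module: it deliberately uses java.util.stream instead of the Spark API so that it runs without any Spark dependency, and the class name WordCountSketch and the method countWords are hypothetical names chosen for illustration. The comments note which Spark operations (flatMap, mapToPair, reduceByKey) a Spark job would typically use for the same steps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the example project.
class WordCountSketch {

    // Counts how often each word occurs across all input lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split each line into words (Spark analogue: JavaRDD.flatMap).
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Drop the empty token produced by blank lines.
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                // Group equal words and count them
                // (Spark analogue: mapToPair followed by reduceByKey).
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        // Print each word with its count, sorted alphabetically by the TreeMap.
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
    }
}
```

In a real Spark job the same three steps run on a distributed data set rather than an in-memory list, which is what lets the count scale across multiple machines.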