Posts

Showing posts from July, 2020

A Simple Apache Spark Demo

Apache Spark is a data processing framework that can quickly process very large data sets and distribute that work across multiple computers, either on its own or in tandem with other distributed computing tools.
About this example
In this post I am sharing a simple Apache Spark example project. The source code used for this example is available here: https://github.com/jobinesh/apache-spark-examples.git
Here is a quick overview of the modules in this project:
spark-job-common: All common classes that you need for building a Spark job are parked here. This approach may help you to avoid boilerplate code in your Spark job implementation.
spark-job-impl: A classic word count Spark example is available here. This class may help you to understand the structuring of the source and the usage of common classes from the spark-job-common module.
spark-job-launcher: The SparkLauncher helps you to start Spark applications programmatically.
The Sp…
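For readers who have not seen the classic word count before, here is a minimal sketch of the computation that such a job performs. It is not taken from the linked repository and it deliberately uses plain Java streams instead of the Spark API, so it runs without a Spark installation. The class name WordCountSketch and the method countWords are hypothetical names chosen for illustration. The comments note which Spark operation each step roughly corresponds to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: plain Java, no Spark dependency.
class WordCountSketch {

    // Counts how often each whitespace-separated word occurs across all lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split every line into words (roughly Spark's flatMap step).
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Drop the empty token produced by blank lines.
                .filter(word -> !word.isEmpty())
                // Treat "Be" and "be" as the same word.
                .map(String::toLowerCase)
                // Group identical words and count them
                // (roughly Spark's mapToPair followed by reduceByKey).
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        // Print each word with its count, in alphabetical order.
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
    }
}
```

In a real Spark job the same three stages would run on a distributed data set rather than an in-memory list, which is what lets the job scale across multiple computers.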

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of my employer.