Total 9 Posts

Apache Spark

Performance Tweaking Apache Spark

Apache Spark Streaming applications need to be monitored frequently to be certain that they are…

Incrementally loaded Parquet files

In this post, I explore how you can leverage Parquet when you need to load…

MongoDB and Apache Spark - Getting started tutorial

MongoDB and Apache Spark are two popular Big Data technologies. In my previous post, I…

Introduction to the MongoDB connector for Apache Spark

MongoDB is one of the most popular NoSQL databases. Its unique capabilities to store document-oriented…

Spark Summit East 2017 - A summary

I attended Spark Summit East 2017 last week. This two-day conference - February 8th…

A tour of Databricks Community Edition: a hosted Spark service

With the recent announcement of the Community Edition, it’s time to have a look…

Testing strategy for Spark Streaming - Part 2 of 2

In a previous post, we’ve seen why it’s important to test your Spark…

Testing strategy for Apache Spark jobs - Part 1 of 2

Like any other application, Apache Spark jobs deserve good testing practices and coverage. Indeed, the…

Applying Data Science with Apache Spark Coding Dojo

This week, at the power plant (Ippon Technologies USA headquarters), we had the pleasure of…