Simply Business homepage
  • Business insurance

    • Business Insurance FAQs

    Business insurance covers

  • Support
  • Claims
  • Sign In
Call Us0333 0146 683
Our opening hours
Tech blog

Notes from ElixirConf EU 2018 (part 3)

2-minute read

Paweł Dawczak

Paweł Dawczak

14 August 2018

Share on FacebookShare on TwitterShare on LinkedIn

Continuing our series of posts describing the experience of joining this year's edition of ElixirConf EU 2018, today we will talk about the tools around processing big amounts of data using GenStage and Flow.

Elixir is a young language

Previously, we've mentioned that Elixir is a 7-year-old language, which actually makes it quite young compared to other technologies. This fact, though, puts the Elixir Core team in an interesting position when considering the future of a language - there is a lot in other technologies, that could be considered a good addition to the ecosystem.

For example, Elixir's pipe operator (|>) was taken from the idea of piping commands in command line; the powerful macro tooling was inspired by Lisp macros; and the concept of tests parsing the documentation of the code, and using examples as tests was an idea taken from Python.

BEAM, the virtual machine that Elixir programs run on, is a powerful tool that enables easier concurrency - this is a solid ground for building applications that would be able to handle big amounts of data. If only there was a good abstraction for doing that...

The problem

You might have experienced this problem before: you have to implement a data pipeline that consists of multiple stages, such as reading through a CSV file, enriching or aggregating the parsed data and - at the end - have it "consumed" by a slower process (e.g. sending emails or publishing to your queuing system).

The problem lies in how the pipeline is structured: it's not uncommon that the "first stage" of your data processing will be progressing faster than your "last stage"! One approach for that might be to introduce throttling in the "file reader" to read through the file slower - slow enough, so the consumer will be able to process the data as it arrives.

How do you decide on the throttling in this first stage, though? Make the throttle with too short windows, and the problem remains: too much data produced, so the consumer is unable to catch up; make the window too wide, and the program will be progressing through the data for too long, making it inefficient.

Simply Business events

GenStage and Flow

What if the data flow was described in reverse order? What if a stage requested more data to process from the previous stage, which would then ask its previous stage, and so on, up to the very first stage - here, the data reader?

It turns out, the problem has already been solved by the Akka community. The backpressure mechanism became an inspiration for GenStage, Flow finds inspiration in Apache Spark and Apache Beam for a high-level description of concurrent programming. Think of Flow as an API built on top of GenStage describing the computation, and the framework will help in adjusting the processes to perform it.

Since the official announcement of GenStage I was looking for some more resources to learn about the tools, and in this year's ElixirConfEU edition there were a couple of presentations that talked exactly about it!

My picks are:

Jusabe Guedes - Orchestrating Consumers and Producers Like an Octopus

Jusabe, in his presentation, talks about the experience using GenStage in production system that consume events from Amazon's SQS Service.

László Bácsi - Robust Data Processing Pipeline with Elixir and Flow

László, on the other hand, was playing with the higher-level API using Flow while building the system for UK's parcel service company - CollectPlus. He was tasked with implementing a data pipeline to process billions of rows from MySQL and CSV documents and to load them into Redshift. You can find more information in the recording of his session.

Summary

Both presentations were focusing on the practical aspect of GenStage and Flow, and, if you want to know more, your best bet is to check the videos: they include a lot of code examples for approaching solving the problem at hand.

Ready to start your career at Simply Business?

Want to know more about what it's like to work in tech at Simply Business? Read about our approach to tech, then check out our current vacancies.

Find out more

We create this content for general information purposes and it should not be taken as advice. Always take professional advice. Read our full disclaimer

Find this article useful? Spread the word.

Share on Facebook
Share on Twitter
Share on LinkedIn

Keep up to date with Simply Business. Subscribe to our monthly newsletter and follow us on social media.

Subscribe to our newsletter

Insurance

Public liability insuranceBusiness insuranceProfessional indemnity insuranceEmployers’ liability insuranceLandlord insuranceTradesman insuranceSelf-employed insuranceRestaurant insuranceVan insuranceInsurers

Address

6th Floor99 Gresham StreetLondonEC2V 7NG

Northampton 900900 Pavilion DriveNorthamptonNN4 7RG

© Copyright 2024 Simply Business. All Rights Reserved. Simply Business is a trading name of Xbridge Limited which is authorised and regulated by the Financial Conduct Authority (Financial Services Registration No: 313348). Xbridge Limited (No: 3967717) has its registered office at 6th Floor, 99 Gresham Street, London, EC2V 7NG.