How to make your data transformations more efficient using transducers

By Guido Schmitz

Transforming large collections of data can be expensive, especially when you’re using higher order functions like map and filter.

This article will show the power of transducers to create efficient data transformation functions, which do not create temporary collections. Temporary collections are created when map and filter functions are chained together. This is because these functions return a new collection and will pass the result to the next function.

Imagine having records of 1,000,000 people and wanting to create a subset of “names of women above the age of 18 that live in The Netherlands”. There are different ways to solve this, but let’s start with the chaining approach.

If this approach is new to you, or you want to learn more about it, I’ve written a blog post on using higher order functions.

const ageAbove18 = (person) => person.age > 18;
const isFemale = (person) => person.gender === 'female';
const livesInTheNetherlands = (person) => person.country === 'NL';
const pickFullName = (person) => person.fullName;

const output = bigCollectionOfData
  .filter(livesInTheNetherlands)
  .filter(isFemale)
  .filter(ageAbove18)
  .map(pickFullName);

Below is the visualisation of using the chained approach that creates temporary arrays. Imagine the expense of looping over 1,000,000 records three times for the filters, plus one more pass for the map!

Of course, the filtered collections will be reduced by some amount, but it’s still quite expensive.
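To make that cost visible, here is a minimal sketch that runs the same chain one step at a time on a tiny, made-up dataset. The sample records and the names people, intermediateSizes and elementsVisited are my own assumptions for illustration; only the four helper functions come from the code above.

```javascript
// Hypothetical sample data; the field names match the predicates above.
const people = [
  { fullName: "Anna de Vries", age: 34, gender: "female", country: "NL" },
  { fullName: "Bram Jansen", age: 41, gender: "male", country: "NL" },
  { fullName: "Clara Schmidt", age: 29, gender: "female", country: "DE" },
  { fullName: "Daan Bakker", age: 16, gender: "male", country: "NL" },
  { fullName: "Eva Visser", age: 17, gender: "female", country: "NL" },
];

const ageAbove18 = (person) => person.age > 18;
const isFemale = (person) => person.gender === "female";
const livesInTheNetherlands = (person) => person.country === "NL";
const pickFullName = (person) => person.fullName;

// The same chain as above, but with every temporary array given a name.
const inTheNetherlands = people.filter(livesInTheNetherlands); // temporary array 1
const women = inTheNetherlands.filter(isFemale);               // temporary array 2
const adults = women.filter(ageAbove18);                       // temporary array 3
const names = adults.map(pickFullName);                        // final result

// How big each temporary array was, and how many elements were looped over in total.
const intermediateSizes = [inTheNetherlands.length, women.length, adults.length];
const elementsVisited =
  people.length + inTheNetherlands.length + women.length + adults.length;

console.log(names);             // ["Anna de Vries"]
console.log(intermediateSizes); // [4, 2, 1]
console.log(elementsVisited);   // 12
```

For five records we already loop over 12 elements and allocate three arrays that are thrown away immediately. How much the collections shrink depends entirely on the data.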

A key insight, however, is that map and filter can be defined using reduce. Let’s implement the above code in terms of reduce.

const mapReducer = (mapper) => (result, input) => {
  return result.concat(mapper(input));
};

const filterReducer = (predicate) => (result, input) => {
  return predicate(input) ? result.concat(input) : result;
};

const personRequirements = (person) => ageAbove18(person)
  && isFemale(person)
  && livesInTheNetherlands(person);

const output = bigCollectionOfData
  .reduce(filterReducer(personRequirements), [])
  .reduce(mapReducer(pickFullName), []);

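We can further simplify this by building the predicate from the individual checks. Plain function composition would not work here, because compose would feed the boolean returned by one predicate into the next one. What we need is a combinator that runs every predicate against the same person, such as allPass from RamdaJS.

filterReducer(R.allPass([ageAbove18, isFemale, livesInTheNetherlands]));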

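When using this approach we reduce (haha!) the number of times we create a temporary array: only the output of the filtering reduce is handed to the next step, instead of one array per filter. Below is a visualization of the transformation when using the reduce approach.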
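If you want to try the reduce approach yourself, here is a minimal runnable sketch on a small, made-up dataset. The sample records are assumptions, and allPass is a hand-written stand-in that combines predicates with a logical AND; it is not taken from any library.

```javascript
// Hypothetical sample data; the field names match the predicates from the article.
const people = [
  { fullName: "Anna de Vries", age: 34, gender: "female", country: "NL" },
  { fullName: "Bram Jansen", age: 41, gender: "male", country: "NL" },
  { fullName: "Clara Schmidt", age: 29, gender: "female", country: "DE" },
  { fullName: "Eva Visser", age: 17, gender: "female", country: "NL" },
  { fullName: "Fenna Smit", age: 52, gender: "female", country: "NL" },
];

const ageAbove18 = (person) => person.age > 18;
const isFemale = (person) => person.gender === "female";
const livesInTheNetherlands = (person) => person.country === "NL";
const pickFullName = (person) => person.fullName;

// map and filter expressed as reducing functions.
const mapReducer = (mapper) => (result, input) => result.concat(mapper(input));
const filterReducer = (predicate) => (result, input) =>
  predicate(input) ? result.concat(input) : result;

// Hand-written helper (an assumption): true only if every predicate accepts the value.
const allPass = (predicates) => (value) =>
  predicates.every((predicate) => predicate(value));

const personRequirements = allPass([ageAbove18, isFemale, livesInTheNetherlands]);

const reduced = people
  .reduce(filterReducer(personRequirements), [])
  .reduce(mapReducer(pickFullName), []);

// The chained version, for comparison.
const chained = people
  .filter(livesInTheNetherlands)
  .filter(isFemale)
  .filter(ageAbove18)
  .map(pickFullName);

console.log(reduced); // ["Anna de Vries", "Fenna Smit"]
console.log(chained); // ["Anna de Vries", "Fenna Smit"]
```

Both versions produce the same names. One caveat of this sketch: mapReducer relies on concat, so a mapper that returns an array would have its elements flattened into the result.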

Beautiful, right? But we were talking transducers. Where are our transducers?
It turns out that the functions returned by filterReducer and mapReducer are reducing functions. We can express this as:

reducing-function :: result, input -> result

Transducers are functions that accept a reducing function and return a reducing function. This can be expressed as the following:

transducer :: (result, input -> result) -> (result, input -> result)

The most interesting part is that transducers are roughly symmetric in their type signature. They take one reducing function and return another.

Because of this we can compose any number of transducers using function composition.
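Here is a minimal sketch of what that means in practice. The compose helper and the two tiny transducers addOne and skipOdd are hypothetical and exist only to show the shape: each one takes a reducing function and returns a reducing function, so the composed result can be used in exactly the same way.

```javascript
// Right-to-left function composition: compose(f, g)(x) is f(g(x)).
const compose = (...fns) => (arg) =>
  fns.reduceRight((acc, fn) => fn(acc), arg);

// Two hypothetical transducers: reducing function in, reducing function out.
const addOne = (reducingFunction) => (result, input) =>
  reducingFunction(result, input + 1);
const skipOdd = (reducingFunction) => (result, input) =>
  input % 2 === 0 ? reducingFunction(result, input) : result;

// An ordinary reducing function that collects values into an array.
const append = (result, input) => result.concat(input);

// Composing two transducers gives us another transducer.
const composed = compose(addOne, skipOdd);

// Handing it a reducing function gives back a reducing function for reduce.
const composedReducer = composed(append);
const composedOutput = [1, 2, 3, 4].reduce(composedReducer, []);

console.log(typeof composedReducer); // "function"
console.log(composedOutput);         // [2, 4]
```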

Building your own Transducers

Hopefully it’s all starting to make more sense now. Let’s build our own transducer functions for map and filter.

const mapTransducer = (mapper) => (reducingFunction) => {
  return (result, input) => reducingFunction(result, mapper(input));
};

const filterTransducer = (predicate) => (reducingFunction) => {
  return (result, input) => predicate(input)
    ? reducingFunction(result, input)
    : result;
};

Using the transducers we’ve created above, let’s transform some numbers. We will use the compose function from RamdaJS.

RamdaJS is a library that provides practical functional methods and is specifically designed for functional programming styles.

const concatReducer = (result, input) => result.concat(input);
const lowerThan6 = filterTransducer((value) => value < 6);
const double = mapTransducer((value) => value * 2);

const numbers = [1, 2, 3];

// Using Ramda's compose here
const xform = R.compose(double, lowerThan6);

const output = numbers.reduce(xform(concatReducer), []); // [2, 4]

The concatReducer is called the iterator function. It is the innermost reducing function: it is called for every value that makes it through the transducers, and it is responsible for adding that value to the result.

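In this example, we simply concat each value onto the result. Because a transducer only accepts a reducing function of the form (result, input) => result, we cannot hand it the array's concat method directly, so concatReducer wraps concat in that shape.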
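Because the iterator function is just another reducing function, we can swap it out without touching the transducers. The sketch below is self-contained and makes a few assumptions: compose is a small hand-written replacement for Ramda's, and sumReducer and setReducer are hypothetical reducing functions added for illustration.

```javascript
// Hand-written right-to-left composition, standing in for R.compose.
const compose = (...fns) => (arg) =>
  fns.reduceRight((acc, fn) => fn(acc), arg);

const mapTransducer = (mapper) => (reducingFunction) =>
  (result, input) => reducingFunction(result, mapper(input));
const filterTransducer = (predicate) => (reducingFunction) =>
  (result, input) => (predicate(input) ? reducingFunction(result, input) : result);

const lowerThan6 = filterTransducer((value) => value < 6);
const double = mapTransducer((value) => value * 2);
const xform = compose(double, lowerThan6);

// Three different iterator functions for the same xform.
const concatReducer = (result, input) => result.concat(input);
const sumReducer = (total, input) => total + input; // hypothetical
const setReducer = (set, input) => set.add(input);  // hypothetical

const numbers = [1, 2, 3, 2];

const asArray = numbers.reduce(xform(concatReducer), []);
const asSum = numbers.reduce(xform(sumReducer), 0);
const asSet = numbers.reduce(xform(setReducer), new Set());

console.log(asArray);    // [2, 4, 4]
console.log(asSum);      // 10
console.log([...asSet]); // [2, 4]
```

The transformation (double, then keep values lower than 6) is described once, and the iterator function alone decides what the output looks like.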

When we compose multiple transducers into a single function, the result is usually named xform. So when you see this name somewhere, you know what it means.

Composing multiple transducers

We’ve been using ordinary function composition in the previous example, and you may be wondering what the order of evaluation is. Although function composition applies functions from right to left, the transformations will actually be evaluated from left to right at execution time — which is far more intuitive to those of us who read in left-to-right languages.

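It takes a little bit of thinking to see why this is true. A transducer takes a reducing function and returns a new one that wraps it. When we call the composition of double and lowerThan6 with concatReducer, the composition works from right to left: lowerThan6 receives concatReducer first and returns a filtering reducing function, and that function is then passed to double, which wraps it in turn. The reducing function we end up with is therefore the one created by double. On every iteration it doubles the input first and only then hands the value to the lowerThan6 step, which finally calls concatReducer. So the wrapping happens from right to left, but the data flows through the transformations from left to right.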

I’ve created a JSBin example with some console.log statements, so you can have a look at it for yourself.
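In the same spirit, here is a minimal self-contained sketch that records the order of the calls instead of logging them. The trace array and the hand-written compose are my own assumptions for this illustration.

```javascript
// Hand-written right-to-left composition, standing in for R.compose.
const compose = (...fns) => (arg) =>
  fns.reduceRight((acc, fn) => fn(acc), arg);

const mapTransducer = (mapper) => (reducingFunction) =>
  (result, input) => reducingFunction(result, mapper(input));
const filterTransducer = (predicate) => (reducingFunction) =>
  (result, input) => (predicate(input) ? reducingFunction(result, input) : result);
const concatReducer = (result, input) => result.concat(input);

// Every call is recorded here so we can inspect the order afterwards.
const trace = [];

const double = mapTransducer((value) => {
  trace.push(`double(${value})`);
  return value * 2;
});
const lowerThan6 = filterTransducer((value) => {
  trace.push(`lowerThan6(${value})`);
  return value < 6;
});

const xform = compose(double, lowerThan6);
const output = [1, 2, 3].reduce(xform(concatReducer), []);

console.log(output); // [2, 4]
console.log(trace);
// ["double(1)", "lowerThan6(2)", "double(2)", "lowerThan6(4)", "double(3)", "lowerThan6(6)"]
```

Each number goes through double and then lowerThan6 before the next number is touched, so there is one pass over the input and no temporary array between the two steps.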

Using RamdaJS to improve readability

Since transducers fit a functional programming style so well, let’s look at how Ramda’s own functions can help us.

const lowerThan6 = R.filter((value) => value < 6);
const double = R.map((value) => value * 2);
const numbers = [1, 2, 3];

const xform = R.compose(double, lowerThan6);

const output = R.into([], xform, numbers); // [2,4]

With Ramda, we can use its map and filter functions directly as transducers. This is because Ramda’s internal reduce method uses the Transducer Protocol under the hood.

“The goal of the Transducer Protocol is that all JavaScript transducer implementations interoperate regardless of the surface level API. It calls transducers independently from the context of their input and output sources and specifies only the essence of the transformation in terms of an individual element.
Because transducers are decoupled from input or output sources, they can be used in many different processes — collections, streams, channels, observables, etc. Transducers compose directly, without awareness of input or creation of intermediate aggregates.”
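To give an idea of what the protocol looks like, here is a simplified sketch. The three method names are the ones described in the transducers-js README linked at the end of this article; everything else (mapping, filtering, arrayAppender, the transduce driver) is a hypothetical, stripped-down illustration and not Ramda's actual implementation. It also leaves out early termination, which the protocol supports as well.

```javascript
// Method names defined by the Transducer Protocol.
const INIT = "@@transducer/init";
const STEP = "@@transducer/step";
const RESULT = "@@transducer/result";

const compose = (...fns) => (arg) =>
  fns.reduceRight((acc, fn) => fn(acc), arg);

// A transformer is an object instead of a bare reducing function.
// This one (hypothetical) collects values into a fresh array.
const arrayAppender = {
  [INIT]: () => [],
  [STEP]: (result, input) => {
    result.push(input);
    return result;
  },
  [RESULT]: (result) => result,
};

// map and filter as transducers over transformer objects.
const mapping = (mapper) => (xf) => ({
  [INIT]: () => xf[INIT](),
  [STEP]: (result, input) => xf[STEP](result, mapper(input)),
  [RESULT]: (result) => xf[RESULT](result),
});
const filtering = (predicate) => (xf) => ({
  [INIT]: () => xf[INIT](),
  [STEP]: (result, input) => (predicate(input) ? xf[STEP](result, input) : result),
  [RESULT]: (result) => xf[RESULT](result),
});

// Minimal driver (an assumption): works on any iterable, ignores early termination.
const transduce = (transducer, finalTransformer, iterable) => {
  const xf = transducer(finalTransformer);
  let result = xf[INIT]();
  for (const input of iterable) {
    result = xf[STEP](result, input);
  }
  return xf[RESULT](result);
};

const xform = compose(mapping((value) => value * 2), filtering((value) => value < 6));

// The input source can be an array or, for example, a generator.
function* countToThree() {
  yield 1;
  yield 2;
  yield 3;
}

const fromArray = transduce(xform, arrayAppender, [1, 2, 3]);
const fromGenerator = transduce(xform, arrayAppender, countToThree());

console.log(fromArray);     // [2, 4]
console.log(fromGenerator); // [2, 4]
```

The xform never needs to know whether the values come from an array or a generator, which is the decoupling the quote describes.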

Conclusion

Transducers are a powerful and composable way to build transformations that you can reuse in many contexts. Once you’ve got a transducer, you can do an open set of things.

They’re especially useful when transforming big datasets, but you can also use the same transducer to transform a single record.
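To close the loop with the example from the start of this article, here is a minimal sketch that applies the hand-built transducers to the people data. The sample records and the hand-written compose are assumptions made for this illustration.

```javascript
// Hand-written right-to-left composition, standing in for R.compose.
const compose = (...fns) => (arg) =>
  fns.reduceRight((acc, fn) => fn(acc), arg);

const mapTransducer = (mapper) => (reducingFunction) =>
  (result, input) => reducingFunction(result, mapper(input));
const filterTransducer = (predicate) => (reducingFunction) =>
  (result, input) => (predicate(input) ? reducingFunction(result, input) : result);
const concatReducer = (result, input) => result.concat(input);

// Hypothetical sample data.
const people = [
  { fullName: "Anna de Vries", age: 34, gender: "female", country: "NL" },
  { fullName: "Bram Jansen", age: 41, gender: "male", country: "NL" },
  { fullName: "Clara Schmidt", age: 29, gender: "female", country: "DE" },
  { fullName: "Eva Visser", age: 17, gender: "female", country: "NL" },
  { fullName: "Fenna Smit", age: 52, gender: "female", country: "NL" },
];

const ageAbove18 = (person) => person.age > 18;
const isFemale = (person) => person.gender === "female";
const livesInTheNetherlands = (person) => person.country === "NL";
const pickFullName = (person) => person.fullName;

// Reads in the same order as the original chain of filter and map calls.
const xform = compose(
  filterTransducer(livesInTheNetherlands),
  filterTransducer(isFemale),
  filterTransducer(ageAbove18),
  mapTransducer(pickFullName)
);
const step = xform(concatReducer);

// A whole collection: one pass, no temporary arrays between the steps.
const names = people.reduce(step, []);

// A single record: call the same reducing function by hand.
const singleMatch = step([], people[0]);
const singleMiss = step([], people[1]);

console.log(names);       // ["Anna de Vries", "Fenna Smit"]
console.log(singleMatch); // ["Anna de Vries"]
console.log(singleMiss);  // []
```

In this sketch concatReducer still copies the result array on every step. For really large collections, an iterator function that pushes into the existing array would likely be cheaper.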

If you want to learn more about this subject, I recommend the following articles:

https://clojure.org/reference/transducers
http://blog.cognitect.com/blog/2014/8/6/transducers-are-coming
https://github.com/cognitect-labs/transducers-js#the-transducer-protocol

