R: Column Summarisation Using tidyverse and purrr: Towards Functional Programming

I was working on column summarisation (mean, median, standard deviation, etc.) and found better ways to select and summarise the data.

Let’s start

Let’s take the iris data set, which ships with R.
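For a quick look at the data, here is a small sketch that uses only base R, so nothing needs to be installed. The variable name `first_rows` is just for illustration.

```r
# iris is a built-in data frame: 150 flowers, 4 numeric measurements, 1 factor
first_rows <- head(iris)   # first six rows
print(first_rows)

# Column names and types at a glance
str(iris)
```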


A usual way to summarise (the one I used to use) is to call summarise with every function and every column written out by hand.
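The original snippet is not preserved here, so the following is only a reconstruction of what such a hand-written summary typically looks like, assuming dplyr is attached. The output names (`sl_mean` and so on) are made up for illustration.

```r
library(dplyr)

# Every function and every column is spelled out by hand
manual_summary <- iris %>%
  summarise(
    sl_mean = mean(Sepal.Length),
    sl_sd   = sd(Sepal.Length),
    sw_mean = mean(Sepal.Width),
    sw_sd   = sd(Sepal.Width)
  )

manual_summary
```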

This code has the following issues:

  • The functions mean and sd are repeated.
  • Every column name has to be written out explicitly.

To resolve this, I started exploring and found a better way to select columns and perform column-wise summaries.

On selecting:

Let’s say I want to select the first three columns. I would use select with a range of column names, Sepal.Length:Petal.Length.
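A minimal sketch of that selection, assuming dplyr is attached. The second call, which selects by position, is an equivalent alternative added for comparison.

```r
library(dplyr)

# A range of column names, from Sepal.Length through Petal.Length
first_three <- iris %>% select(Sepal.Length:Petal.Length)

# The same selection by position
first_three_by_position <- iris %>% select(1:3)

names(first_three)
```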

This will select the columns starting from Sepal.Length to Petal.Length.

You can select by name pattern as well. For example, if I only need the summaries of columns that start with “Sepal”, the starts_with helper matches that prefix literally, while matches accepts a true regular expression.
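Both variants can be sketched as follows, assuming dplyr is attached. The regular expression `^Sepal` is my own choice for the comparison.

```r
library(dplyr)

# starts_with() matches a literal prefix
sepal_cols <- iris %>% select(starts_with("Sepal"))

# matches() takes a regular expression; "^Sepal" anchors at the start of the name
sepal_cols_regex <- iris %>% select(matches("^Sepal"))

names(sepal_cols)
```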

This will select only the Sepal.Length and Sepal.Width columns.

select works with several other helper functions too (ends_with, contains, matches and so on), so you can pick columns by name or by position.
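A few of those helpers in action, as a small sketch assuming dplyr is attached. The variable names are illustrative only.

```r
library(dplyr)

width_cols <- iris %>% select(ends_with("Width"))   # by suffix
petal_cols <- iris %>% select(contains("Petal"))    # by substring
last_col   <- iris %>% select(5)                    # by position
no_species <- iris %>% select(-Species)             # everything except one column

names(width_cols)
```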

Summarising the data:

You can directly use a function called summarise_at, which takes the columns to summarise and the functions to apply.
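A minimal sketch, assuming dplyr is attached. The functions are passed as a named list here; older dplyr versions used funs() for this, which has since been deprecated, and dplyr 1.0 and later also offer across() as the successor to the _at and _if variants.

```r
library(dplyr)

sepal_summary <- iris %>%
  summarise_at(
    vars(starts_with("Sepal")),   # which columns to summarise
    list(mean = mean, sd = sd)    # which functions to apply, each named once
  )

# One row; result columns are named <column>_<function>
sepal_summary
```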

The result is the mean and sd of every column that starts with “Sepal”.

As you can see we have calculated the summaries without repeating the function names. In a way, we are applying functions to data, and not the opposite.

There are scenarios where a function returns multiple values, e.g. when we use quantile.

Let’s see an example: calling quantile on a single column returns five values by default.
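A short sketch with base R only. The variable name `q` is illustrative.

```r
# quantile() returns a named numeric vector, one element per probability.
# With the default probs it has five elements: 0%, 25%, 50%, 75%, 100%.
q <- quantile(iris$Sepal.Length)

names(q)
q
```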

How can we perform this summarisation in the same style as the previous examples?

map, a function from the functional programming toolkit purrr, comes in handy here. The only difference is that we have to combine the results with bind_rows.
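One way to sketch this, assuming dplyr, purrr and tibble are installed. Turning each quantile vector into a one-row tibble before binding is my own choice, made so that bind_rows behaves the same across dplyr versions; the original snippet may have passed the vectors to bind_rows directly. The `column` id name is also an assumption.

```r
library(dplyr)
library(purrr)

sepal_quantiles <- iris %>%
  select(starts_with("Sepal")) %>%
  # apply quantile() to every selected column;
  # convert each named vector into a one-row tibble
  map(~ tibble::as_tibble(as.list(quantile(.x)))) %>%
  # stack the rows and keep the source column name in `column`
  bind_rows(.id = "column")

sepal_quantiles
```

Each row holds the quantiles of one column, so adding more columns to the selection only adds rows.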

This results in quantiles for the two columns that start with “Sepal”.

Summarise columns with specific properties.

Suppose you want to summarise all the columns which are numeric. You can achieve this using summarize_if.
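A minimal sketch, assuming dplyr is attached. The predicate is.numeric decides which columns are summarised, and the choice of mean and sd simply mirrors the earlier examples.

```r
library(dplyr)

numeric_summary <- iris %>%
  summarise_if(
    is.numeric,                   # predicate: keep only numeric columns
    list(mean = mean, sd = sd)    # functions applied to each of them
  )

# Species is a factor, so it is skipped
numeric_summary
```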

This results in summaries for all numeric columns.



All of the above also works on grouped data: add group_by before the summary and you get one row per group.
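A sketch of a grouped summary, assuming dplyr is attached. Grouping by Species is my choice here, since it is the only categorical column in iris.

```r
library(dplyr)

by_species <- iris %>%
  group_by(Species) %>%                                # one group per species
  summarise_if(is.numeric, list(mean = mean, sd = sd)) # summarise within each group

# Three rows (setosa, versicolor, virginica), one summary column per column/function pair
by_species
```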


Next Steps

I will be sharing more on applying Functional Programming principles with R.