Comparing cloud storage

Cloud Datastore is best for semi-structured application data used in App Engine applications.

Bigtable is best for analytical data with heavy read and write events, such as AdTech, financial, or IoT data.

Cloud Storage is best for structured and unstructured, binary or object data like images, large media files, and backups.

Cloud SQL is best for web frameworks and existing applications, for example storing user credentials and customer orders.

Cloud Spanner is best for large-scale database applications that are larger than two terabytes, for example financial trading and e-commerce use cases.

Possible Virtual Machines for Machine Learning (AWS)

Here are a few VM options for machine learning built on or using AWS.



The first is software as a service: Databricks. It’s a third-party service, so you pay them and then access the underlying Amazon resources. They are well known for their implementation of Spark as a service.

Their implementation includes their own version of a notebook (Jupyter-like, but a Databricks notebook), an optimized edition of a Spark cluster, and the ability to install additional libraries.

The next level is platform as a service, and Amazon’s offering there is Elastic MapReduce (EMR), which is managed Hadoop and Spark. It lets you install common applications such as Spark, Hive, and Pig just by clicking when you set up the cluster, or by setting a flag if you’re doing it via script. You can also optionally install additional machine learning libraries, such as TensorFlow and MXNet, with bootstrap actions. Interestingly, Amazon has already installed an environment called Sparkmagic in SageMaker notebooks, so that a connection to an external Spark cluster, including EMR, can be easily made.

Now the third possibility is infrastructure as a service. Most people would say, “Well, you can have an EC2 machine learning or deep learning AMI (image), or you can just use EC2.” And yes, I think you can use a machine learning AMI. It’s optimized for deep learning, and all the libraries are pre-installed. I would not recommend using plain EC2, though, because you must manually install and configure all the language runtimes and machine learning libraries, and I have seen this task take people days or even weeks to set up on a cluster of machines.

Purrr – mapping pmap functions to data

In the functional programming paradigm, map transforms one set of values into another set of values by applying a function to each of them.



In a general sense, a single function should only map one value to another. The map function lets you apply that function to a whole list of inputs to produce a corresponding list of outputs. It takes two inputs:

  • Function
  • Sequence of values

It produces a new sequence of values in which the function has been applied to each element.
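As a minimal sketch of this (the `square` helper and the input `1:3` are made up for illustration, not taken from the original post), a call to `map` could look like this:

```r
library(purrr)

# Hypothetical helper for illustration: squares a single value.
square <- function(v) v^2

# map() applies the function to each element and always returns a list.
squares <- map(1:3, square)

# map_dbl() does the same but returns a plain numeric vector.
squares_vec <- map_dbl(1:3, square)
print(squares_vec)
```

The typed variants such as `map_dbl` are often handier when every result is a single number, since the output is a vector rather than a list.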


Note that map works with a single input sequence. For two input sequences, you can use map2.

For situations where you need to pass multiple inputs (say, several lists) to a function, you can use pmap.
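A small sketch of `map2`, assuming two made-up numeric vectors `x` and `y` that are added element by element:

```r
library(purrr)

# Illustrative inputs (assumed, not from the original post).
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# map2_dbl() walks over x and y in parallel and passes one element
# of each to the function; the result is a numeric vector.
sums <- map2_dbl(x, y, function(a, b) a + b)
print(sums)
```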

An important point: x and y should have the same length (purrr only recycles inputs of length 1).
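Here is a rough sketch of `pmap` with three inputs. The vectors `x`, `y`, and `z` are invented for illustration; because the list is named, purrr matches each element to the function argument of the same name.

```r
library(purrr)

# Illustrative inputs of equal length (assumed, not from the original post).
x <- c(1, 2, 3)
y <- c(10, 20, 30)
z <- c(100, 200, 300)

# pmap_dbl() takes ONE list holding all inputs and calls the function
# once per position, with arguments matched by name.
totals <- pmap_dbl(list(x = x, y = y, z = z), function(x, y, z) x + y + z)
print(totals)
```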



How does pmap differ from map and map2?

Both map and map2 take vector arguments directly like x = and y = , while pmap takes a list of arguments like list(x = , y = ).
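To make the difference in call shape concrete, this sketch runs the same two-input computation both ways, using made-up vectors. With an unnamed list, `pmap` matches arguments by position.

```r
library(purrr)

# Illustrative inputs (assumed, not from the original post).
x <- c(1, 2, 3)
y <- c(4, 5, 6)

multiply <- function(a, b) a * b

# map2(): the two vectors are passed directly as separate arguments.
via_map2 <- map2(x, y, multiply)

# pmap(): the same vectors are wrapped in a single list.
via_pmap <- pmap(list(x, y), multiply)

# Both calls should yield the same list of products.
print(all.equal(via_map2, via_pmap))
```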

Exploring purrr further, I see new use cases, which I will explain in upcoming posts.


purrr is a productivity ninja. Try to use it.

Think Functional!