Here are a few VM options for machine learning built on or using AWS.
The first is software as a service: Databricks. It’s a third-party service, so you pay Databricks and then access the underlying Amazon resources through it. Databricks is well known for its implementation of Spark as a service.
Their implementation includes their own version of a notebook. It is a Jupyter-like notebook, but it’s a Databricks notebook. It also includes an optimized edition of a Spark cluster and the ability to install additional libraries.
The next level is platform as a service, and Amazon’s offering there is Elastic MapReduce (EMR), which is managed Hadoop and Spark. It comes with the ability to install common applications such as Spark, Hive, and Pig just by clicking when you create the cluster, or by setting a flag if you’re doing it via script. You can also optionally install additional machine learning libraries such as TensorFlow and MXNet with bootstrap actions. Interestingly, Amazon has already installed environments called Sparkmagic in SageMaker notebooks, so that a connection to an external Spark cluster, including EMR, can be easily made.
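To make the "setting a flag via script" path concrete, here is a minimal sketch of how an EMR cluster request with Spark, Hive, and Pig plus one bootstrap action might be assembled for boto3's `run_job_flow` call. The release label, instance types, instance count, default role names, and the S3 path of the bootstrap script are all assumptions chosen for illustration, and the application names available depend on the EMR release you pick.

```python
def build_emr_request(cluster_name, bootstrap_script_s3_path,
                      release_label="emr-6.15.0"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    bootstrap_script_s3_path is a hypothetical shell script in S3 that
    installs extra machine learning libraries on every node at startup.
    """
    return {
        "Name": cluster_name,
        "ReleaseLabel": release_label,  # assumed release, adjust as needed
        # The "flags": each entry asks EMR to install that application.
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap actions run on each node before applications start.
        "BootstrapActions": [
            {
                "Name": "install-ml-libraries",
                "ScriptBootstrapAction": {
                    "Path": bootstrap_script_s3_path,
                    "Args": [],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR roles, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster's ID."""
    import boto3  # imported here so building the request needs no AWS SDK

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `launch_cluster(build_emr_request("ml-cluster", "s3://example-bucket/install_ml.sh"))` would start a billable cluster, so the request builder is kept separate and can be inspected without touching AWS.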
Now the third possibility is infrastructure as a service. Most people would say, “Well, you can use an EC2 machine learning or deep learning AMI (image), or you can just use plain EC2.” And, yes, I think you can use a machine learning AMI. It’s optimized for deep learning, and all the libraries are already pre-installed. I would not recommend plain EC2, though, because you must manually install and configure all the language runtimes and machine learning libraries, and I have seen this task take people days or even weeks to set up on a cluster of machines.
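As a rough illustration of the AMI route, the sketch below prepares a boto3 `run_instances` request for a single instance. The AMI ID is deliberately a parameter because deep learning AMI IDs differ by region and version; the instance type and the optional key pair name are assumptions for the example.

```python
def build_ec2_request(ami_id, instance_type="p3.2xlarge", key_name=None):
    """Build keyword arguments for boto3's EC2 run_instances call.

    ami_id should be the ID of a deep learning AMI in your region;
    the default instance type is only an assumed GPU choice.
    """
    request = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        # Optional SSH key pair so you can log in to the instance.
        request["KeyName"] = key_name
    return request


def launch_instance(request, region="us-east-1"):
    """Start the instance and return its instance ID."""
    import boto3  # imported here so building the request needs no AWS SDK

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]
```

With a deep learning AMI this is essentially the whole setup; with a plain EC2 image the same call succeeds, but every runtime and library still has to be installed by hand afterwards.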