This is the fourth post in a series about getting started with building a data architecture in the cloud. The architecture is described in the first blog post. The series follows two people, Dana Simpsons the data scientist and Frank Steve Davidson the full stack developer, who are starting their own SaaS company that will build games based on data science. In this post we describe how Dana uses Azure Machine Learning Studio combined with R to generate the images that Frank will use in the gaming website. In the last post of this series we will learn how Frank calls the resulting web service.
Introduction to Azure Machine Learning Studio
A good resource for getting started with Azure Machine Learning Studio is the free ebook Microsoft Azure Essentials: Azure Machine Learning. In this blog post we focus on how Dana generates the images for the game website that she and Frank are developing. To evaluate Azure Machine Learning Studio, you need a Microsoft Account. You can use this account to log in to http://studio.azureml.net. Exploring Azure Machine Learning Studio is completely free.
Position of this blog in the architecture
We now describe the steps that Dana performs to embed her R scripts in Azure Machine Learning Studio, and how she converts the experiments that she builds into a web service that Frank can use in the Ruby layer. The focus of this post is on how little is needed for Frank and Dana to each work with the development tools they feel most comfortable with.
Preparing the text to be in the correct format
For this application, Dana started from a book in text format, which she converted to a csv file with a single column in which each row holds one sentence of the text. She did this formatting on her own computer, but it can later be automated in the green block above.
To use this dataset further, she uploads the csv file as a dataset in Azure ML. After logging in to Azure ML, she selects DATASETS in the left column. Next, she clicks on + NEW in the lower left corner and uploads the file to her available datasets in Azure ML. In the picture below you can see all the datasets that she has uploaded in this way.
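The conversion described above can be sketched in a few lines. This is only an illustration in Python, not Dana's actual script: the naive sentence splitter, the sample text, and the `text` column header are all assumptions made for the example.

```python
import csv
import io
import re


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    collapsed = " ".join(text.split())  # fold line breaks and repeated spaces
    parts = re.split(r"(?<=[.!?])\s+", collapsed)
    return [part for part in parts if part]


def sentences_to_csv(sentences, header="text"):
    """Return csv text with a single column and one sentence per row.

    The header name is an assumption; the original post does not say
    whether the file has a header row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for sentence in sentences:
        writer.writerow([sentence])
    return buffer.getvalue()


# Made-up sample text standing in for the book.
sample = "Alice was tired. She saw a rabbit!\nWhere did it go? Down the hole."
sentences = split_sentences(sample)
csv_text = sentences_to_csv(sentences)
print(csv_text)
```

A real book needs a more careful splitter (abbreviations such as "Mr." would break this one), but the output shape is the same: one column, one sentence per row.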
A high level overview of the experiment
The experiment that she built looks like the picture below. There are two types of boxes, white ones and blue ones. For now, focus only on the white boxes. The box containing the text “chapter1_to_5_list.csv” selects the input dataset. This dataset is fed to a “Select Columns in Dataset” box. Next, the output of that box is fed to several “Execute R Script” boxes. These R scripts produce the data analysis and the different images that the web service will provide to Frank in a later step.
Now, focus on the blue boxes. There is one blue box at the top, called Web service input, and six blue boxes at the bottom, called Web service output. Once the web service has been generated from this experiment in a later step, any csv file that has a single column can be fed to it and the different images are generated automatically.
A deeper dive into one Execute R Script box
We now look a bit deeper into the calculation of the word cloud that is performed in the box with the blue border. As you can see in the R code on the right, the dataset is first read into dataset1. Next, Dana removes common English stopwords so that the picture highlights the words that are specific to the book. She then goes on to build the picture of the word cloud.
She right clicks on the number 2 and selects Visualize. This shows her the output of her R script, which looks like the picture below. It is important to notice the Graphics title here. In this experiment, she generates several graphics and also some extra datasets. When she is happy with her results, she generates a web service from this experiment.
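The data preparation behind a word cloud boils down to counting words after dropping stopwords. The sketch below shows that logic in Python rather than R, purely as an illustration: it is not Dana's script, the stopword list is a tiny hypothetical one (real lists are much longer), and the sample sentences are made up.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list Dana used).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "is", "on"}


def word_frequencies(sentences, stopwords=STOPWORDS):
    """Count lowercase words across all sentences, skipping stopwords."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Made-up rows standing in for the one-sentence-per-row dataset.
rows = [
    "The rabbit ran to the garden.",
    "A rabbit is in the garden.",
    "The queen saw the rabbit.",
]
freq = word_frequencies(rows)
print(freq.most_common(2))
```

A word cloud then simply draws each remaining word at a size proportional to its count, which is why removing stopwords first matters: otherwise words like "the" would dominate the picture.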
The generation of the web service
To generate a web service, she selects Deploy Web service at the bottom. She also switches to the web service view by moving the slider at the bottom so that it shows the globe. The blue boxes of the web service have now turned dark blue. A curved line runs from an Execute R Script box to a blue Web service output box. When this line starts from the right dot, the output is an image; when it starts from the left dot, the output is a csv dataset. When you click on a blue box, you can also give the output a meaningful name.
For their project, Dana and Frank work with the Batch Execution mode because they are working with csv files that will be uploaded. On this page, she finds the API key that she needs to give to Frank.
When she clicks on the Batch Execution mode, she finds the Request URI for the web service, which she also needs to provide to Frank.
Next, she scrolls down to Sample Request Payload, where she can validate that all her inputs and outputs have been defined properly.
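Dana does this check by eye, but the same validation could be scripted. The sketch below is a hypothetical helper, not part of Azure ML: it assumes the sample payload is JSON with top-level "Inputs" and "Outputs" objects keyed by name, and the names `input1`, `wordcloud_image` and `word_counts` are invented for the example.

```python
import json


def check_payload(payload_json, expected_inputs, expected_outputs):
    """Return a list of problems; an empty list means every name matches.

    Assumes the payload has top-level "Inputs" and "Outputs" objects
    whose keys are the input and output names.
    """
    payload = json.loads(payload_json)
    problems = []
    for section, expected in (("Inputs", expected_inputs), ("Outputs", expected_outputs)):
        label = section[:-1].lower()  # "input" or "output"
        found = set(payload.get(section, {}))
        for name in sorted(set(expected) - found):
            problems.append(f"missing {label}: {name}")
        for name in sorted(found - set(expected)):
            problems.append(f"unexpected {label}: {name}")
    return problems


# Hypothetical payload; the names and empty bodies are placeholders.
sample_payload = json.dumps({
    "Inputs": {"input1": {}},
    "Outputs": {"wordcloud_image": {}, "word_counts": {}},
})

print(check_payload(sample_payload, ["input1"], ["wordcloud_image", "word_counts"]))
```

Giving each Web service output a meaningful name in the experiment makes a check like this much easier to read than default names would.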
Azure Machine Learning Studio has a Free tier that Dana can use for building her current data solutions. For the web services, she also still falls within the DEV/Test limits. All the pricing details can be seen here. The only place where a cost is involved is the blob storage for the files that are used as the input and the output of the web service. This currently amounts to 0.01 CAD for using the web service for one book for a bit more than two weeks.
In this blog post we showed how Azure ML can be used to generate web services on top of the data science experiments that Dana has built. In the next blog, we will show how Frank will call this web service to extract all the different images.