Sunday, 26 March 2017

Some Azure security settings don't really mean what you think they mean

In today's blog we will focus on how to reset a public key for a Linux VM in Azure. Spoiler alert: the Reset password functionality in the Azure Portal doesn't do what you would expect from a password reset. It adds a new public key but leaves the old one in place. We will explain this further in this blog.
First, we will explain how you create a VM with public key access in the Azure portal. Next, we will explain what happens when you reset your public key for a VM using the Reset password functionality in the portal. Then we will explain how you can actually remove the previous public key to get a complete reset. Finally, we will explain what happens with the public keys when a VM gets redeployed.

Setting up a Linux VM in Azure through the portal
How to set up the VM
To set up a Linux VM in Azure, go to the Azure Portal. For the Authentication Type, choose SSH public key. To generate a public/private key pair you can use, for example, PuTTYgen. Next, choose the other settings of the VM. The portal also asks for the username of the person who will use the VM. In this example we will use the username mary, and Mary will provide her first public key, mary1.

Logging in to the VM
Next, you can use the SSH client of your choice to access the VM: provide the IP address of the machine together with the private key that corresponds to the public key. In this way we can successfully log in to the machine.

What happened behind the scenes?
When you use the SSH public key authentication type when you create your VM, two things happen behind the scenes.
1. In your VM, a .ssh directory is generated. In this .ssh directory, a file authorized_keys is generated. In this file you will see the public key that you provided in the Portal.

2. In the Portal under the Automation script tab, various templates were generated. You can use these templates later on when you want to automate the deployment of your VMs. In the picture below you can see a fragment from the Template file and you can see in the highlighted osProfile section that the user mary can log in with the provided public key and that password authentication has been disabled.

Sometimes bad things happen
Unfortunately, the laptop on which you had installed your keys got stolen. Therefore you need to reset all your passwords and, of course, you also need to revoke access for the public keys that you had on your VM.

Resetting the public key through the Azure Portal 
The steps to follow in the portal
When you go to the Azure Portal and go to your VM, you can select Reset password from the left column. In the screenshot below you can see the information that is provided to you and the information that you need to fill in to reset your public key. In this case we have added the public key mary2.

Logging in with key mary2

We use the mary2 key to log in, and we can happily log in to our VM.

Logging in with key mary1
Unfortunately, the person who stole your laptop was able to extract the key because no full disk encryption was used. Contrary to what you would expect, the mary1 key still works and the thief has access to your VM in the cloud.

What happened behind the scenes?
1. The public key that you provide through the Azure Portal gets added to the authorized_keys file. However, the original public key has not been removed. Therefore you can keep using this key to access the VM.

2. In the Automation script in the Portal you will see that the template still refers to the first public key. Notice that just copying this template to deploy a new virtual machine might again give the laptop thief access.

How to do a "real" reset?
The easiest way to do a real reset of the SSH key is to log in to the VM and remove the previous public key from the authorized_keys file. The Automation script will, however, still look the same as above. Also make sure that you add comment lines that start with # to your authorized_keys file so that you know which public keys are used for what purpose.
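
The removal step can also be scripted. The following is only a minimal sketch in Ruby of how the text of an authorized_keys file could be filtered. The function name remove_public_key and the key bodies are hypothetical placeholders, and the sketch assumes that each entry is a single line containing the key body as one whitespace-separated field.

```ruby
# Remove one public key from the text of an authorized_keys file.
# Comment lines (starting with #) and blank lines are kept untouched.
def remove_public_key(authorized_keys_text, revoked_key_body)
  kept = authorized_keys_text.lines.reject do |line|
    stripped = line.strip
    next false if stripped.empty? || stripped.start_with?('#')

    # Assumption: the base64 key body appears as one whitespace-separated field.
    stripped.split.include?(revoked_key_body)
  end
  kept.join
end

# Hypothetical file content; real key bodies are much longer.
sample = <<~KEYS
  # mary1: laptop key (stolen, must be revoked)
  ssh-rsa AAAAEXAMPLEKEYMARY1 mary@laptop
  # mary2: replacement key added through the portal
  ssh-rsa AAAAEXAMPLEKEYMARY2 mary@desktop
KEYS

puts remove_public_key(sample, 'AAAAEXAMPLEKEYMARY1')
```

The sketch only returns the new text. Writing it back to the file, and updating the comment line of the revoked key, is left as a manual step so that you can review the result before you overwrite the file that controls your own access.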

What happens after a redeploy of the virtual machine?
When you redeploy your virtual machine, the active authorized_keys file is copied over during redeployment. So if you could log in with key1, key2 and key3 before the redeployment, you can still log in with key1, key2 and key3 afterwards. It makes no difference whether these keys were added to the authorized_keys file from within the VM or through the Reset password functionality of the portal.
However, redeploying your VM might result in downtime of the machine and in loss of data. Therefore we will go into further detail about what happens with your VM when you redeploy in a later blog post.

In this blog post we have learnt how you can reset the public key that you use to gain access to a Linux VM in Azure. This consists of two steps. The first step is to go to the VM in the portal and use the Reset password functionality to add an extra public key. The second step, if you want to disable the previous public key, is to log in to the VM and remove that key from the authorized_keys file. When a VM is redeployed, all the public keys in the authorized_keys file will still provide access after the redeployment. In a later blog post we will focus on what happens behind the scenes when a VM is redeployed in Azure. In another upcoming blog post we will discuss how you can switch from password access to public key access for a VM.

Friday, 10 March 2017

How to call a web service generated from Azure Machine Learning Studio From Ruby

Background Note
This blog post is the last post in a series about how to get started to build a data architecture in the cloud. The architecture is described in the first blog post. This series features two people, Dana Simpsons the data scientist and Frank Steve Davidson the full stack developer, who are building their own SaaS company that will build games based on data science. In this blog post we will describe how Frank can call the web service that Dana generated in Azure Machine Learning Studio from Ruby.

Position in the architecture
Frank wants to run the web service that Dana built in Azure Machine Learning Studio. To be able to use this web service, the files that he provides as input and the files that he gets as output need to be stored in the cloud, in blob storage. He already knows how to handle these files in the cloud, as we described in a previous post. Therefore he only needs to focus on how to feed and call the web service and how to extract the results from it. The major challenge for Frank was that Azure ML does not generate a stub for calling a web service from Ruby. Such stubs are only available for Python, C# and R.

Understanding the payload
In the picture below, we see again how Dana sees her experiment in Azure ML. On top, we have the input file in blue and on the bottom we have all the different output files.
Frank will need to build a JSON object that looks like the Sample Request Payload below. The top blue box represents the input file. The bottom blue boxes represent the output files. As an example, the output label for the word cloud from the previous blog and the final location where this file will be stored are annotated in the orange boxes in the picture below. The main goal now is to generate this JSON object.

Defining the payload in Ruby
Frank now implements a hash structure, payload_hash, in Ruby. For the ConnectionString he again looks up the name of the storage account and the account key. To distinguish between different runs of the web service, he uses a timestamp to give the output files unique names.
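
As an illustration only, a hash like this could be built as follows. This is a sketch and not Frank's original code. The key names Input, Outputs, GlobalParameters, ConnectionString and RelativeLocation are assumptions based on how the batch payload usually looks, so check them against the Sample Request Payload of your own web service. The account name, key, container and the output label wordcloud are hypothetical placeholders.

```ruby
require 'json'

# Hypothetical placeholders: substitute your own storage account, key and container.
CONNECTION_STRING =
  'DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=PLACEHOLDER_KEY'
CONTAINER = 'experiments'

# Describe one blob by its connection string and its path inside the storage account.
def blob_reference(file_name)
  { 'ConnectionString' => CONNECTION_STRING,
    'RelativeLocation' => "/#{CONTAINER}/#{file_name}" }
end

# Build the payload: one input file and one uniquely named output file per output label.
# The top-level key names are an assumption; compare with your Sample Request Payload.
def build_payload(input_file, output_labels, run_id)
  outputs = output_labels.each_with_object({}) do |label, result|
    result[label] = blob_reference("#{label}_#{run_id}.csv")
  end
  { 'Input' => blob_reference(input_file),
    'Outputs' => outputs,
    'GlobalParameters' => {} }
end

# The timestamp keeps the output files of different runs apart.
run_id = Time.now.utc.strftime('%Y%m%d%H%M%S')
payload_hash = build_payload('chapter1_to_5_list.csv', ['wordcloud'], run_id)
puts JSON.pretty_generate(payload_hash)
```

Passing the run identifier in as an argument, instead of calling Time.now inside build_payload, keeps the function predictable and makes it easy to find the output files of one particular run afterwards.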

Start the web service
To start the web service, Frank needs the API key from the web service and the URI. The API key can be found on the main page of the web service as shown below.

When Dana clicks on Batch Execution she gets the picture below, and there she also finds the Request URI. This is what she will give to Frank.

Armed with all these pieces of information, Frank has now built the Ruby code to call the web service that Dana has built.
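
The original code is not reproduced here. As a rough sketch of what such a call can look like, the example below prepares the HTTP request with Ruby's standard Net::HTTP library (the post itself installs the httparty gem, which could be used instead). The URI, the API key and the function names are hypothetical, and the Bearer authorization header is an assumption about what the web service expects. The sample code on the Batch Execution page shows the exact sequence of calls.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Prepare a POST request that carries the payload and the API key.
def build_submit_request(request_uri, api_key, payload_hash)
  request = Net::HTTP::Post.new(URI(request_uri))
  request['Authorization'] = "Bearer #{api_key}" # assumption: Bearer scheme
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload_hash)
  request
end

# Send a prepared request over HTTPS and return the response object.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Hypothetical values; nothing is sent in this example.
example_uri = 'https://example.invalid/services/SERVICE_ID/jobs?api-version=2.0'
request = build_submit_request(example_uri, 'PLACEHOLDER_API_KEY', { 'GlobalParameters' => {} })
puts request.method
puts request['Content-Type']
```

Building the request separately from sending it makes it possible to inspect the headers and the body before any call is made against the real service.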

While the web service is running
The web service doesn't finish immediately. Therefore, Frank needs to check when the web service is done. You can see the code for this in the code piece below.
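
A polling loop of this kind could be sketched as follows. The status names Finished, Failed and Cancelled are assumptions about what the service reports, and fetch_status stands for whatever code retrieves the current status of the job. Both are hypothetical and should be adapted to the actual responses of your web service.

```ruby
# Assumed names of the states in which the job will not change any more.
TERMINAL_STATES = %w[Finished Failed Cancelled].freeze

# Call fetch_status repeatedly until the job reaches a terminal state.
# fetch_status is any callable that returns the current status as a string.
def wait_for_job(fetch_status, interval: 5, max_attempts: 120)
  max_attempts.times do
    status = fetch_status.call
    return status if TERMINAL_STATES.include?(status)

    sleep interval
  end
  raise "job did not finish after #{max_attempts} checks"
end

# Usage with a stand-in for the real status request.
fake_statuses = %w[NotStarted Running Running Finished]
fetch = -> { fake_statuses.shift }
puts wait_for_job(fetch, interval: 0)
```

The upper limit on the number of checks prevents the script from waiting forever when the job gets stuck.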

I was promised an image?
The goal of this blog was to extract an image from the web service, but so far Frank has only extracted csv files. As you may remember, we connected the right dot of the Execute R Script box to get the image output. The resulting file is, confusingly, still called a csv file, but its content looks like the picture below.

We now need to extract the piece with all the funny letter and number combinations under the Graphics title. This will actually be the image file.
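
A minimal sketch of this last step is shown below. It assumes that the letter and number combinations are the base64 encoding of a PNG image and that the encoded text has already been cut out of the downloaded file. How to locate it depends on the exact layout of that file, which is not shown here. The function names are hypothetical.

```ruby
# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Decode base64 text into raw bytes and check that the result is a PNG image.
def decode_image(base64_text)
  bytes = base64_text.unpack1('m') # 'm' decodes base64 without extra libraries
  raise ArgumentError, 'decoded data is not a PNG image' unless bytes.start_with?(PNG_SIGNATURE)

  bytes
end

# Write the decoded image to disk in binary mode.
def save_image(base64_text, path)
  File.binwrite(path, decode_image(base64_text))
end

# Usage with fabricated data that only contains the PNG signature.
encoded = [PNG_SIGNATURE + 'not a real image'.b].pack('m0')
puts decode_image(encoded).bytesize
```

Checking the signature before saving catches the case where the wrong column or a truncated string was extracted.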

If you have read all the different blogs of this series, you now have an idea of the architecture that is involved in building a data application in the cloud and of the different tasks that the people building it need to do. Thank you very much for reading my blog.

Thursday, 9 March 2017

How to use R scripts in Azure Machine Learning Studio and generate a webservice

Background Note
This blog post is the fourth post in a series about how to get started to build a data architecture in the cloud. The architecture is described in the first blog post. This series features two people, Dana Simpsons the data scientist and Frank Steve Davidson the full stack developer, who are building their own SaaS company that will build games based on data science. In this blog post we will describe how Dana will use Azure Machine Learning Studio combined with R to generate the images that Frank will use in the gaming website. In the last blog of this series we will learn how Frank calls this web service.

Introduction to Azure Machine Learning Studio
A good resource to get started with Azure Machine Learning Studio is the free ebook Microsoft Azure Essentials: Azure Machine Learning. In this blog post we will focus on the way Dana generates the images for the game website that Dana and Frank are developing. To evaluate Azure Machine Learning Studio, you will need to have a Microsoft Account, which you can use to log in. Exploring Azure Machine Learning Studio is completely free.

Position of this blog in the architecture
Now we will describe the steps that Dana performs to embed her R scripts into Azure Machine Learning Studio and how she converts the experiments that she makes into a web service so that Frank can use them in the Ruby layer. The focus of this blog is on how few things are needed so that both Frank and Dana can work with the development tools that they each feel most comfortable with.

Preparing the text to be in the correct format
For this application, Dana started from a book in text format which she converted to a csv file that consists of one column and each row represents one sentence of the text. She did this formatting on her own computer but this can later be automated further in the green block above. 
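
How Dana did this formatting is not described. Purely as an illustration of the target format, the Ruby sketch below splits a text into sentences and writes one quoted sentence per row. The splitting rule (a full stop, question mark or exclamation mark followed by whitespace) is a simplifying assumption and will mishandle abbreviations such as "Mr.", and the function names are hypothetical.

```ruby
# Split a text into sentences on ., ! or ? followed by whitespace.
def split_into_sentences(text)
  text.gsub(/\s+/, ' ').strip.split(/(?<=[.!?])\s+/)
end

# Quote a value as a single CSV field, doubling any embedded double quotes.
def csv_field(value)
  %("#{value.gsub('"', '""')}")
end

# Produce CSV text with one column and one sentence per row.
def to_one_column_csv(text)
  split_into_sentences(text).map { |sentence| csv_field(sentence) }.join("\n") + "\n"
end

book = 'It was a dark night. "Who is there?" she asked. Nobody answered!'
puts to_one_column_csv(book)
```

Quoting every field keeps sentences that contain commas or quotation marks in a single column.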

To be able to use this dataset further, she uploads this csv file as a dataset in Azure ML. After she has logged in to Azure ML, she selects DATASETS in the left column. Next, she clicks on + NEW in the lower left corner and uploads her dataset to her available datasets in Azure ML. In the picture below you can see all the datasets that she has uploaded in this way.

A high level overview of the experiment
The experiment that she built looks like the picture below. You see that there are two types of boxes, white ones and blue ones. First, focus only on the white boxes. The box containing the text “chapter1_to_5_list.csv” selects the input dataset. This dataset is fed to a “Select Columns in Dataset” box. Next, the output of this box is fed to several “Execute R Script” boxes. These R scripts generate the data analysis and the different images that the web service will provide to Frank in a next step.
Now, focus on the blue boxes. There is one blue box on top, which is called Web service input, and there are six blue boxes on the bottom, which are called Web service output. When the web service is generated from this experiment in a next step, arbitrary csv files that have only one column can be fed to this web service and the different images will be generated automatically.

A deeper dive into one Execute R Script box
We now look a bit deeper into the calculation of the word cloud that is performed in the box with the blue border. As you can see in the R code on the right, the dataset is first selected in dataset1. Next, Dana removes some common English stopwords to give a clearer picture of the special words from the book that she will be displaying. Then she continues to build the picture of the word cloud.
She right-clicks on the number 2 and selects Visualize. This shows her the output of her R script, which looks like the picture below. It is important to notice the Graphics title here. In this experiment, she generates several graphics and also some extra datasets. When she is happy with her results, she generates a web service from this experiment.

The generation of the web service
To generate a web service, she selected Deploy Web Service at the bottom. She also switched to the web service view by moving the slider at the bottom so that it shows the globe. You will see that the blue boxes of the web service have now turned dark blue. There is a curved line from an Execute R Script box to a blue Web service output box. When this line starts from the right dot, you will be able to export an image; when it starts from the left dot, you will be able to export a csv dataset. Also, when you click on a blue box you can provide a meaningful name for the output.

For their project, Dana and Frank will be working with the Batch Execution mode because they are working with csv files that will be uploaded. On this page, she will find the API key that she will need to give to Frank.
When she clicks on the Batch Execution mode, she will find the Request URI for the web service that she will need to provide to Frank.
Next, she scrolls down to the Sample Request Payload, where she can validate that all her different inputs and outputs have been defined properly.

Cost Analysis
Azure Machine Learning Studio has a Free tier that Dana can use for building her current data solutions. For the Web Services, she also still falls within the DEV/Test limits. All the pricing details can be seen here. The only thing where there will be a cost involved is the blob storage for the files that are used as the input and the output for the web service. This currently costs 0.01 CAD for using the web service for one book for a bit more than two weeks.

In this blog post we showed how Azure ML can be used to generate web services on top of the data science experiments that Dana has built. In the next blog, we will show how Frank will call this web service to extract all the different images.

Wednesday, 8 March 2017

How to access blob storage with Ruby

Background Note
This blog post is the third post in a series about how to get started to build a data architecture in the cloud. The architecture is described in the first blog post. This series features two people, Dana Simpsons the data scientist and Frank Steve Davidson the full stack developer, who are building their own SaaS company that will build games based on data science. In this blog post we will describe how you can access blob storage in Ruby. In the next blog post of this series you will learn how to use R scripts in Azure Machine Learning Studio.

Position in the architecture
All programs need data at some point. That data can either come from files or from a database. This is no different when your program lives in the cloud. In this case, it might also want to access data that is stored in the cloud. For Azure these files are stored in Blob Storage. For Frank and Dana's application, Frank will need to know how to access Blob Storage from Ruby. Because eventually you might want to store large pieces of information in the cloud, it is again important to understand the cost.

How to get the needed pieces from the Azure Portal
To be able to access a piece of blob storage you will need to know the account name, the access keys and the container name. When you go to the Azure portal, you can select Storage accounts to get an overview of the storage accounts. When you click on one of the storage accounts you will get a view similar to the view below. This will help you to identify the pieces needed for your own example.

Initializing blob storage object from Ruby
When you want to access blob storage, make sure that you have the Azure gem installed. Next, write require 'azure' at the beginning of your script. When you have gathered all these pieces of information, you will be able to initialize your blob storage object and you will also be able to define your connection string.
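
The initialization code itself is not reproduced here. As a small sketch of the preparation step, the example below collects the three pieces of information and derives the connection string from them. The environment variable names, the struct and the function names are hypothetical choices, and the sketch deliberately stops before calling the Azure gem, whose configuration calls depend on the installed version.

```ruby
# The three pieces of information gathered from the Azure portal.
BlobSettings = Struct.new(:account_name, :access_key, :container)

# Read the settings from a hash-like source (ENV by default) and fail early
# when one of them is missing. The variable names are hypothetical.
def load_blob_settings(env = ENV)
  settings = BlobSettings.new(env['STORAGE_ACCOUNT_NAME'],
                              env['STORAGE_ACCESS_KEY'],
                              env['STORAGE_CONTAINER'])
  missing = settings.each_pair.select { |_, value| value.nil? || value.empty? }.map(&:first)
  raise ArgumentError, "missing settings: #{missing.join(', ')}" unless missing.empty?

  settings
end

# Combine the account name and the access key into a storage connection string.
def connection_string(settings)
  "DefaultEndpointsProtocol=https;AccountName=#{settings.account_name};" \
    "AccountKey=#{settings.access_key}"
end

# Usage with placeholder values instead of real credentials.
example = load_blob_settings({ 'STORAGE_ACCOUNT_NAME' => 'mystorageaccount',
                               'STORAGE_ACCESS_KEY' => 'PLACEHOLDER_KEY',
                               'STORAGE_CONTAINER' => 'experiments' })
puts connection_string(example)
```

Reading the access key from the environment instead of writing it into the script keeps the key out of version control.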

List the content of your blob storage
The easiest way to list your content from a blob container is by using the Azure portal. You can find an example of this below.

Accessing the content of a container from Ruby is, however, also straightforward.

The first puts will just write the blob storage objects and the second puts will actually write down the file names, as you can see below.

Downloading the files
Finally, it also might be the case that you want to download the files to a local drive or to another VM in the cloud. Below you can see the code for this.
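
The listing and download steps can be sketched as follows. The sketch assumes a service object that responds to list_blobs(container), returning objects with a name, and to get_blob(container, name), returning the blob together with its content. This is how the blob service of the Azure gem is assumed to behave, so verify it against the gem version you have installed. FakeBlobService is a hypothetical stand-in that lets the example run without a storage account.

```ruby
require 'fileutils'
require 'tmpdir'

# Return the names of all blobs in a container.
def blob_names(blob_service, container)
  blob_service.list_blobs(container).map(&:name)
end

# Download every blob of a container into target_dir and return the local paths.
def download_all(blob_service, container, target_dir)
  FileUtils.mkdir_p(target_dir)
  blob_names(blob_service, container).map do |name|
    _blob, content = blob_service.get_blob(container, name)
    path = File.join(target_dir, File.basename(name))
    File.binwrite(path, content)
    path
  end
end

# Hypothetical stand-in for the real blob service, used only for this example.
FakeBlob = Struct.new(:name)
class FakeBlobService
  def initialize(files)
    @files = files
  end

  def list_blobs(_container)
    @files.keys.map { |name| FakeBlob.new(name) }
  end

  def get_blob(_container, name)
    [FakeBlob.new(name), @files.fetch(name)]
  end
end

service = FakeBlobService.new('wordcloud.csv' => 'a,b', 'input.csv' => 'x')
Dir.mktmpdir do |dir|
  puts download_all(service, 'experiments', dir).map { |path| File.basename(path) }
end
```

Because the functions only rely on those two methods, the real blob service object can be passed in instead of the stand-in without further changes.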

Cost Analysis
Storage Cost
An overview of the blob storage prices can be found here. There is again a fixed cost and a variable cost for using blob storage. The fixed cost is just for hosting your data. The variable cost is for accessing your data. The price for storing your data depends on the amount of redundancy that you require and the amount of data that you are storing in blob storage. The last aspect is whether you want fast (hot) access or slower (cool) access to your data. All these different combinations result in a price for storing your data that ranges from 0.01 USD per gigabyte per month to 0.046 USD per gigabyte per month.
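
To get a feeling for what this range means, the storage part of the bill can be estimated with a few lines of Ruby. This is only a sketch using the two prices quoted above. The function names are hypothetical, and access costs are not included.

```ruby
# Lowest and highest storage price quoted above, in USD per gigabyte per month.
MIN_PRICE_PER_GB = 0.01
MAX_PRICE_PER_GB = 0.046

# Monthly storage cost for a given amount of data at a given price.
def monthly_storage_cost(gigabytes, price_per_gb)
  (gigabytes * price_per_gb).round(2)
end

# Cheapest and most expensive monthly storage cost for a given amount of data.
def monthly_storage_cost_range(gigabytes)
  [monthly_storage_cost(gigabytes, MIN_PRICE_PER_GB),
   monthly_storage_cost(gigabytes, MAX_PRICE_PER_GB)]
end

low, high = monthly_storage_cost_range(100)
puts format('100 GB costs between %.2f and %.2f USD per month', low, high)
```

For 100 GB the difference between the cheapest and the most expensive option is a few dollars a month, so for small datasets the access pattern matters more than the storage price.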

Access Prices
For accessing your data there are differences between the blob/block operations on the one hand and data retrieval and data writes on the other. For the blob/block operations the cost is counted per number of operations, except for the delete operation, which is free. Data retrieval is counted per gigabyte. It is interesting to notice here that data retrieval and data writing for hot data storage are free. Finally, if you want to import or export large amounts of data, there are also options that use hard disk drives shipped to Azure.

We have learnt more in this blog about using blob storage with Ruby and the costs that are involved in it. It is important to understand the usage pattern for your data to make the best decision about which type of blob storage to use. In one of the next blogs we will show how blob storage can be used to call the web service that Dana generated from Azure Machine Learning Studio.

Tuesday, 7 March 2017

Setting up Linux Virtual Machines in Azure For Ruby Development.

Background note
This blog post is the second post in a series about how to get started to build a data architecture in the cloud. The architecture is described in the first blog post. This series features two people, Dana Simpsons the data scientist and Frank Steve Davidson the full stack developer, who are building their own SaaS company that will build games based on data science. In this blog post we will describe how you can create a Linux Virtual Machine (VM) in Azure. We also will investigate the different components of hosting a VM in the cloud. This is the next blog in the series.

In this blog, Frank has written down his notes that he uses to deploy virtual machines. He has added in lots of screenshots so that it is easy for him to remember the different steps he needs to take in the Azure Portal. These screenshots will also make it easier in the future to hand the creation of VMs off to someone else. While Frank is figuring out how to create the VMs, Dana is investigating the costs and the durability of the VMs when they are not powered on.

Position of this blog in the architecture
Currently Frank will have two VMs on which he will host the different components like the Ruby layer and the Apache web server to host the websites for the different games.
In what follows he will explain first how he creates the first Debian VM in the Azure cloud that is used for Ruby development. Next he will explain which links he uses to install Ruby on this VM.
He will use the second VM as a web server. Therefore he will need to install Apache on this VM.
Finally, Dana will give her analysis of the costs that are involved in hosting a VM in the cloud. She will look both at the fixed and the variable costs of hosting a virtual machine in the cloud.

Creating a Debian Linux VM for Ruby development

1. Log into the Azure Portal
Go to the Virtual Machine Section by selecting Virtual Machines in the left column.

2. Add a virtual machine

3. Select Debian Jessie

4. Configure the basic options. 

You can set up your virtual machine so that you can log in with your public/private key pair. To generate this pair, have a look here. Furthermore, you can create a new resource group or use an existing one. Next, select the geographical location where you want to host your virtual machine.

5. Select the size of the virtual machine

Now you can select the requirements for your virtual machine. When you are creating a Linux virtual machine you will have an operating system disk (OS disk) and a temporary disk attached to it.
The operating system disk is labeled /dev/sda and the temporary disk is typically /dev/sdb. The data on the OS disk is persistent when you stop, deallocate, pause, reboot, shut down or resize the VM. However, when the VM is deleted or fails, this disk is not persistent. If you want your data to survive a failure or a deletion of your VM, you need to attach an extra data disk.
So if you are not working with a data disk, make sure that you back up your system in a different way. A good strategy for this is to make sure your code is in GitHub and to study the Automation script that has been generated to create this VM, combined with your .bash_history from when you installed the needed software packages.
For the development VM we will select the DS1_V2. If you need more computing power, you can simply change the size of your VM, and if you need less, you can scale it down again.

6. Creating an attached data disk 

Because Frank wants to make sure that his data and his code are still safe if the virtual machine fails or gets accidentally deleted, he also attaches a data disk to his VM. To do so, he goes to his VM in the Azure portal and selects Disks. Next he clicks Add data disk.

Now he will select a storage container where he wants to locate his attached data disk. He also has the possibility to select an existing blob. Make sure that you carefully watch the size of the disk that you are making. To have an idea of the prices you can have a look here. We also will discuss this further in the cost analysis section.

Based on the information that you can find here, you will be able now to initialize the new data disk in your VM and mount it as a separate drive. Carefully execute the different steps. 

7. Installing the needed software and packages

Now we need to install Ruby. If we just installed Ruby with sudo apt-get install, we wouldn't get the newest version. Therefore you can follow the steps as described on this site to install Ruby. For this application Ruby 2.4.0 was installed.
To be able to work with Ruby and Azure you will still need to install the Azure gem. To call the web services generated from Azure ML, you will need to install the httparty gem for Ruby. Therefore, you will need to type the following two commands.

  • sudo gem install azure
  • sudo gem install httparty

Creating a Debian Linux VM as a Web Server

1-4. Creating a VM from the portal

Steps 1 to 4 are the same as before. However, in this case we will select a smaller VM and we will not attach a data disk. You can see the specifications below. Remember again that if you need more computing power, you can easily scale up your VM.

5. Defining a public IP address

Because this VM will be used to host the web server, a public IP address is needed. You can select a static IP address when you are creating the VM. Because this is an extra feature that you are selecting, it comes at an extra cost.

6. Defining a Network Security Group

Because your VM will be used to host your websites, it also will need to be accessible through http. Therefore an extra Network Security Group will need to be defined as you can see below.

7. Installing Apache2

Finally, to install apache2,  you need to type the following command.
  • sudo apt-get install apache2

Cost Analysis
As the last part of this post, Dana will describe the cost of creating and using the virtual machines in Azure. This consists of a combination of fixed costs and variable costs.

The cost of the web server

The web server will need to be up all the time. In addition, this VM needs a static IP address and HTTP access. Therefore you can consider all the costs of this VM as fixed. The cost for a public static IP address is 0.11 CAD per day. In total, the cost of the web server will be 48.3 CAD a month.

The cost of the Ruby development VM

The development machine doesn't need to be powered on all the time. Therefore we can make a distinction between fixed costs and variable costs for the development VM.

The fixed costs are the costs that you need to pay whether your machine is running or not. In this case these costs consist of the operating system disk of the VM and the attached data disk. For this VM, the cost of the OS disk is 0.06 CAD per day. In addition, Frank decided to attach a standard unmanaged data disk of 128 GB that costs him 8.6 CAD a month.

Let's have a look now at the variable costs. If the virtual machine is running the whole day, the machine costs 2.13 CAD per day. So you see that you can save a lot by turning off your development VM when you are not using it. To facilitate this, there are auto-shutdown options, so that you don't need to remember to turn off your VM each day. It is good to know that your data stays on the operating system disk when your VM is in the deallocated state. Remember, however, that there is no automatic backup of this drive in case of failure. If you want to make daily backups of your system, you can check out this resource to understand the costs.
If you are using a standard unmanaged disk, you will also need to take the access costs into account. This is a cost that you will be able to estimate better after your first months of development.

Let's combine everything. We assume that Frank has 20 work days of eight hours in a month. This means that he will be using his VM 160 hours a month. Therefore the cost of running his VM during his working days is 14.2 CAD a month. In addition, he still needs to pay the fixed costs of the OS disk (1.2 CAD a month) and the attached data disk (8.6 CAD a month). In this way the total cost of the development VM is 24 CAD a month. By using the auto-shutdown option, Dana has implemented cost savings and used some of these savings to add extra peace of mind for Frank in the form of the attached data disk.
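
The calculation above can be written down as a small Ruby sketch, using only the figures from this post. The constant and function names are hypothetical. The sketch takes over the post's monthly OS disk figure of 1.2 CAD as given.

```ruby
RUNNING_COST_PER_DAY = 2.13 # CAD, VM running for a full 24 hours
OS_DISK_PER_MONTH    = 1.2  # CAD, figure used in the post (20 days at 0.06 CAD;
                            # billed over 30 days this would be 1.8 CAD)
DATA_DISK_PER_MONTH  = 8.6  # CAD, 128 GB standard unmanaged data disk

# Variable cost: the VM is only paid for while it is running.
def running_cost(hours_per_month)
  (RUNNING_COST_PER_DAY / 24 * hours_per_month).round(1)
end

# Total monthly cost: variable running cost plus the fixed disk costs.
def total_monthly_cost(hours_per_month)
  (running_cost(hours_per_month) + OS_DISK_PER_MONTH + DATA_DISK_PER_MONTH).round(1)
end

work_hours = 20 * 8 # 20 work days of eight hours
puts format('Running cost: %.1f CAD', running_cost(work_hours))
puts format('Total cost:   %.1f CAD', total_monthly_cost(work_hours))
puts format('Always on:    %.1f CAD', total_monthly_cost(30 * 24))
```

Comparing the total for 160 hours with the total for a VM that runs for a whole 30-day month shows how much the auto-shutdown option saves.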

In this blog post Dana and Frank have described all the different steps that you need to follow to create a Linux VM in Azure and to use Ruby on this VM. We also discussed how to set up a public IP address and how you can allow HTTP access so that the VM can be used as a web server. Finally, Dana discussed the cost of hosting such a VM in Azure and we discussed the types of disk that are available on the VM. In the next blog post we will learn more about how to use this VM to access blob storage in Ruby.

Monday, 6 March 2017

How to build an architecture for your cloud data application

But how do you actually start with building data applications in the cloud? Which known technologies can you use? What are new technologies? How do all these technologies work together? What will the cost be of such a solution? 
We will try to answer these questions in this series of blog posts. We will show how two people, Dana and Frank (see picture below) will start building the next Unicorn using cloud based data technology. We will first present their product, next introduce Dana and Frank and finally explain how they started building their architecture.
The next blogs in the series can be found here

The Usborne First Book of the Computer, Usborne Publishing Ltd, 1985

The game changing idea
Together they are building a SaaS company that makes online games with data science. Their first game that they are working on is called What's the title?. They show automatically generated data graphs of classic novels. The user needs to guess the title. A real game changer. Below you can see their first game and you can also check out the link to their website.

Meet the team
For now, the team consists of only two people, but they hope to grow fast in the near future. Therefore they both agreed that they will need to acquire extra skill sets besides their current skills to make the full application. Their focus initially is to get all the components connected to have a global overview and see how the different pieces are working together. Later on they can then elaborate further on the different components and move on to using more advanced technologies.

Dana Simpsons, the Data Scientist

Dana loves everything about mathematics and data and she has a decent background in computer science. Her favourite programming languages are R and Python. Finally she also has some SQL and NoSQL database skills. As part of her tasks she will also watch the cloud costs and look for potential savings. She will also be responsible for customer success and will analyze the usage patterns.

Frank Steve Davidson, the Full Stack Developer

Frank loves everything about programming and has a decent background in mathematics. His favorite programming languages are Java and Ruby. He will be responsible for the architecture design and will make sure that the solution gets deployed in the cloud.

The generic architecture
Dana and Frank did some brainstorming meetings and came up with the following architecture. With this architecture they want to achieve the three goals explained below.

Goal 1

Their game will be hosted in the cloud. This means that both their development and their production servers will live in the cloud together with the data storage. They will therefore need to look at how virtual machines are set up by their chosen cloud provider and what kind of cloud storage solutions are available. Because they are hosting a website, they will also need to investigate how to make the web server reachable from the internet.

Goal 2
Frank has chosen to work with Ruby as programming language to build out the back-end further. Some responsibilities of the back-end will be the storing of different novels in the cloud in a data lake and transforming these novels into a data format consumable by the web services that Dana is building. He will extract the images that Dana generates and will store them back in the cloud. For now they only will use a simple front-end because the data solution will already provide detailed images. In the future they will look further into more advanced front-end technology.

Goal 3
Dana will need data science software to be able to make her analysis and generate the graphs. She will also need an easy way to access different data sources when she is performing her analysis. She prefers to have access to Python and R. Finally, she wants to have an easy way to expose her graphs and analysis to Frank. When she is happy with her data science solutions, she wants to build an automation layer on top of them.

The architecture in Azure
Dana and Frank compared the different cloud providers and they came to the conclusion that Azure suited their needs best. The architecture in Azure is shown below. In what follows we will explain these components in more detail. In the next blogs you will learn how to set up and use these components.

Goal 1: How to deploy and develop

Frank uses a Linux Virtual Machine in Azure on which he has installed Ruby. He uses a second virtual machine to host the Apache2 web server. To save costs he will put his Ruby development machine on auto-shutdown mode. To be able to use the second virtual machine as a web server he will need to select a public IP address and allow HTTP access on the server. We go into more detail on these steps in the next blog post of this series.

Goal 2: How to talk to and feed the data storage solution

Frank and Dana use blob storage as their cloud storage solution. You can imagine Blob storage as your own big drive in the cloud. When you are developing you can use several APIs to access blob storage. Frank will use the azure gem for Ruby. Dana will analyze the costs that are involved with using blob storage. We go into more detail in this in the third blog post of this series.

Goal 3: How to talk with and build the Data Science Solution

Dana used Azure Machine Learning Studio to build an experiment that generates the needed graphs. Next she deployed this experiment as a web service. This web service consumes the formatted text of a classic novel that is stored in Blob storage and then produces six different graphs. These graphs are then stored in Blob storage. She will describe her strategy in the fourth blog post. Next Frank will describe in the last blog post how he feeds and consumes the data from this web service.

In this blog post we have introduced two people, Dana and Frank, on their journey to build their SaaS company in the cloud. We briefly mentioned some of the technologies that they needed. In the next blog posts we will explain these technologies more in depth so that you will be able to use them for your own application.