The Data Instructors: November 2016

Introduction

Getting started in the cloud can be an overwhelming experience. While you are learning to work with the cloud, you also want to keep your development costs as low as possible. Therefore, in this and the following blog I will present a use case that allows you to write your first cloud blob storage program at no cost. For this we will use Azure, the cloud offering of Microsoft.

The use case consists of uploading a file to the cloud so that a special cloud service can process this file and return the processed result, which you can then download again.

In this first blog I will present all the development tools that you need to develop this use case. We will use an emulator for the cloud. In this way you can postpone using your free Azure credits until you have a better idea of your first cloud project.

What is Blob Storage

Blob storage is a service for storing large amounts of data or files in the cloud. Such files are called blobs. You can see this as your personal hard drive in the cloud. These files can be made accessible through HTTP or HTTPS, and you can either make them publicly available to the world or use them to store private application data.

To be able to use blob storage in the cloud, you need a storage account. In your storage account you will have containers, which you can compare with directories on your computer, and these containers store the blobs.

Suppose you store a blob myblob in the container mycontainer of the account myaccount. The URL of this blob then looks as follows: https://myaccount.blob.core.windows.net/mycontainer/myblob . The equivalent on your Windows computer, for the user myuser, would be a file myblob in the directory mydirectory: C:\Users\myuser\Documents\mydirectory\myblob.
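The URL scheme described above can be sketched in a few lines of Python. This is only an illustration of how the account, container and blob names combine; the function name blob_url is hypothetical and not part of any Azure library.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob: str, scheme: str = "https") -> str:
    """Build the URL of a blob from its account, container and blob name.

    Hypothetical helper for illustration; it only assembles the string.
    """
    # Percent-encode characters such as spaces in the blob name. The "/" is
    # kept because blob names may contain virtual directory separators.
    encoded_blob = quote(blob, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{encoded_blob}"


print(blob_url("myaccount", "mycontainer", "myblob"))
```

Calling the function with the names from the text reproduces the URL shown above.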

The blob storage emulator will in this case also emulate the storage account.

Use case

We are building the application that is shown in the picture below. Locally on our computer we have the file C:\test_files\Awesome_local_file.txt. We have a web service in the cloud that performs a special operation on this file that we can't perform locally. Therefore we need to upload Awesome_local_file.txt to the container mycontainer as the blob larf_YYYYMMDDHHmm.txt . In this way the web service can access the file and perform its operation on it. The web service produces the file olarf_YYYYMMDDHHmm.txt as output, which it also stores in the container mycontainer. Afterwards you can download this file back to your computer as Awesome_output.txt.
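As a small illustration of the naming scheme, the following Python sketch generates blob names of the form larf_YYYYMMDDHHmm.txt. The helper name timestamped_blob_name is hypothetical, and reading YYYYMMDDHHmm as year, month, day, hour and minute is an assumption.

```python
from datetime import datetime


def timestamped_blob_name(prefix: str, when: datetime, extension: str = "txt") -> str:
    """Return a blob name such as larf_201611200905.txt.

    Assumes YYYYMMDDHHmm means year, month, day, hour (24h) and minute.
    """
    return f"{prefix}_{when.strftime('%Y%m%d%H%M')}.{extension}"


upload_time = datetime(2016, 11, 20, 9, 5)
print(timestamped_blob_name("larf", upload_time))   # name of the uploaded blob
print(timestamped_blob_name("olarf", upload_time))  # name of the output blob
```

Using the same timestamp for the input and the output blob makes it easy to match an output file with the upload that produced it.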

Free resources needed for blob storage development
Development Environment

This part is based on the article "Get started with Azure Blob storage using .NET". As the development environment we will use Visual Studio. Visual Studio Community is a free version of Visual Studio.

When you have installed Visual Studio, open it, select New Project in the left column, choose a new Visual C# Console Application as demonstrated in the picture below, and press OK.

When you have created your project, you will need to add some extra libraries. You can do this with the NuGet Package Manager, which you launch by selecting NuGet Package Manager from the Tools menu. Next, select the Package Manager Console.

In the console at the bottom, type the following two commands:

Install-Package WindowsAzure.Storage
Install-Package Microsoft.WindowsAzure.ConfigurationManager

Azure Storage Emulator

For this use case we will use a blob storage emulator. This means that your computer emulates the cloud blob storage environment locally, so you don't need to create an Azure account yet. To be able to use the Azure Storage Emulator, you first need to have an instance of SQL Server installed. A free option is SQL Server 2016 Express.

When SQL Server has been successfully installed, you can install the Azure Storage Emulator. Once it is installed, you can start the storage emulator from the program menu. When everything has run correctly, you will see a command window similar to the one below.

Authenticating requests against the storage emulator

The benefit of using the storage emulator is that the account name and the account key are the same for all developers who use it, so they don't need to be kept secret. These values are:

Account name: devstoreaccount1

Account key: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==

When you want to use this emulator in Visual Studio, you need to put these values in the app.config file. We will cover this in more detail in the next blog, when we build the use case.
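To give an idea of how these two values end up in a connection string, here is a minimal Python sketch. The blob endpoint http://127.0.0.1:10000/devstoreaccount1 is the emulator's default according to Microsoft's documentation; it is not mentioned above, so treat it as an assumption about your local setup. The function names are hypothetical.

```python
EMULATOR_ACCOUNT_NAME = "devstoreaccount1"
EMULATOR_ACCOUNT_KEY = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
)
# Assumption: the emulator listens for blob requests on its default local port.
EMULATOR_BLOB_ENDPOINT = f"http://127.0.0.1:10000/{EMULATOR_ACCOUNT_NAME}"


def emulator_connection_string() -> str:
    """Assemble a storage connection string that points at the local emulator."""
    parts = {
        "DefaultEndpointsProtocol": "http",
        "AccountName": EMULATOR_ACCOUNT_NAME,
        "AccountKey": EMULATOR_ACCOUNT_KEY,
        "BlobEndpoint": EMULATOR_BLOB_ENDPOINT,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def emulator_blob_url(container: str, blob: str) -> str:
    """URL of a blob inside the emulator; the account name is part of the path."""
    return f"{EMULATOR_BLOB_ENDPOINT}/{container}/{blob}"


print(emulator_connection_string())
print(emulator_blob_url("mycontainer", "myblob"))
```

Note that in the emulator the account name is part of the URL path instead of the host name. The Azure storage libraries also accept the shortcut connection string UseDevelopmentStorage=true, which stands for the same emulator settings.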

Microsoft Azure Storage Explorer

To be able to access your files in storage with a desktop application similar to Windows Explorer, you can install Microsoft Azure Storage Explorer. To connect to your storage emulator, use the account name devstoreaccount1 and the account key Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==.

Microsoft Azure Storage Explorer will now look like the picture below.

Conclusion

Congratulations, you have now set up all the different parts that you need to get started with blob storage. In the next blog we will show step by step how you can build the use case. Good luck with your journey to the cloud.

In the current world of software development, cloud computing has taken an important place. The triage of software crashes, however, has become more challenging because of the use of the cloud. Software crashes never happen at a good time. It is therefore important to know beforehand where you can get all your logs, so that you can build an efficient triage strategy.

In this blog post I will explain some tips learnt from analyzing crashes from a cloud data solution that I recently developed. This data solution was a web service derived from Azure Machine Learning Studio (Azure ML) that consumes and produces blob storage files. Blob storage is the data storage solution from Azure. Although some of the tips are only related to Azure ML and blob storage, they can also help you out to build an efficient triage strategy for other cloud solutions.

In the next part of this blog I will first describe the cloud architecture, then the different logs that are available and how you can access them and finally, I will discuss some bug triage strategies.

High level overview of the cloud architecture

Web service from Azure ML

In the picture above, you can see an overview of the architecture that I built around the web service derived from Azure ML. The only important thing that you need to know about Azure ML is that it offers many different data science and machine learning algorithms in an easy-to-use form for solving complex data problems. Such a solution is called an experiment in Azure ML. An experiment consists of different components that are connected to each other. When your experiment satisfies the requirements, you can deploy it as a web service and make it part of your cloud solution.

Web services can be consumed both in Request-Response Service mode and in Batch Execution Service (BES) mode. The first means that you provide only one data point to be evaluated by the web service, and the second means that you provide larger volumes of data to be consumed by the web service. In this example I use the BES mode.

Feeding the web service from blob storage

The web service is fed with files from Azure blob storage. Azure blob storage is a service for easily storing files in the cloud. In the current architecture, I make changes to my files locally, and a script then automatically uploads these files to blob storage. The web service consumes these blobs and produces new blobs. Finally, the new blobs are downloaded to my local machine.

Multiple instances of this web service can run at the same time. If the maximum number of simultaneous instances has been reached, new instances are queued until one has finished. In a perfect world, everything runs perfectly and my blog post would end here. However, bugs and crashes are part of real life. Therefore, I will now provide some tips that can help with efficient triage of bugs so that you can deal with them when they appear in production.

Have access to all the evidence

Know which logs are available and how you can acquire them

One of the most important things before you can start the triage process is having access to all available logs. On the one hand, the cloud components that you are using might be generating client side logs that you can easily access from your cloud portal. They might however not be turned on by default. On the other hand, your cloud provider also might be generating server side logs that you might get access to in case of need. Finally, you also must make sure that your own application generates useful logs that you also save in a consistent way.

Turn on client side logging in Azure ML

To access your logs, you need an Azure account. The client side logs are not automatically turned on in Azure ML. You can turn on these logs through the Classic portal or through the Azure Machine Learning Web Services portal. I prefer to use the Azure Classic Portal. You will be able to find these logs in the ml-diagnostics folder in your blob storage.

Get familiar with the log file structure

Your log files will be stored in a container called ml-diagnostics, and each web service has its own unique identifier. When you run your web service, you will see a folder for each run of your web service.

If you look into the folder belonging to a run of a web service, you will find files that are named in the following way: ‘COMPONENT_TYPE’_‘NB’.stdout and ‘COMPONENT_TYPE’_‘NB’.stderr. In the stdout file you will find normal output information and in the stderr file you will find the error information.

Examples of the ‘COMPONENT_TYPE’ are Apply%20SQL%20Transformation, Execute%20R%20Script and Join%20Data, which refer to different components like SQL transformations, R scripts and Join components that you can add in Azure ML.

It is, however, still a challenge to know which ‘NB’ corresponds to which component in Azure ML. For the Python script and R script components you can solve this easily by adding an extra print command with a useful identifier, which makes it easier to determine which component you are looking at. For the other components you need to dig deeper into the file structure.
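When you have many of these files, it helps to split the file names into their parts automatically. The following Python sketch does this for the naming pattern described above; the function and class names are hypothetical, and the example file name is made up for illustration.

```python
import re
from typing import NamedTuple
from urllib.parse import unquote


class LogFileName(NamedTuple):
    component_type: str  # decoded component name, e.g. "Execute R Script"
    number: int          # the NB part of the file name
    stream: str          # "stdout" or "stderr"


# COMPONENT_TYPE, an underscore, NB, then the stream as file extension.
_LOG_NAME = re.compile(r"^(?P<component>.+)_(?P<nb>\d+)\.(?P<stream>stdout|stderr)$")


def parse_log_file_name(name: str) -> LogFileName:
    """Split a log file name into component type, number and stream."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a component log file name: {name!r}")
    # unquote turns %20 back into a space.
    return LogFileName(unquote(match["component"]), int(match["nb"]), match["stream"])


print(parse_log_file_name("Execute%20R%20Script_3.stderr"))
```

Inside an R or Python script component, the identifier mentioned above can be as simple as printing a fixed, recognizable string at the start of the script, which then shows up in the matching stdout file.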

Save your local logs

Also, make sure that you keep the local logs of the application from which you call the web service. This will help you later on with debugging issues. Useful information to log includes time stamps, unique identifiers, the files that you are saving and the errors thrown by the web service. Also make sure that you keep the information that you need to revert changes in case of a crash.
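A minimal sketch of such a local log, using Python's standard logging module, could look as follows. The function name make_run_logger, the logger name and the log format are assumptions chosen for illustration; the point is that every line carries a time stamp and a unique identifier for the run.

```python
import logging
import uuid


def make_run_logger(log_path: str) -> logging.LoggerAdapter:
    """Create a file logger that stamps every line with a time and a run id."""
    run_id = uuid.uuid4().hex  # unique identifier for this call of the web service
    logger = logging.getLogger(f"webservice_client.{run_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these lines out of the root logger

    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s run=%(run_id)s %(message)s")
    )
    logger.addHandler(handler)

    # The adapter adds run_id to every record so the formatter can print it.
    return logging.LoggerAdapter(logger, {"run_id": run_id})


log = make_run_logger("client.log")
log.info("uploaded blob %s", "larf_201611200905.txt")
log.error("web service returned an error: %s", "example error text")
```

Because the run identifier appears on every line, you can later filter the local log for exactly one call of the web service and compare it with the cloud side logs of the same run.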

Ask for an example of a cloud side log in a normal case

At the cloud side there might also be extra log information available that you can’t access directly yourself. If your web service is a critical component of your data solution, it might be a good idea to ask support for an example of these log files. That way you have an idea of what information you can derive from them. You can then also ask for these logs when you run into a problem.

Catching the bugs

Is it a missing data issue, a data formatting issue or something else?

Telling a missing data issue apart from a formatting error is tricky. A formatting error means that there is an error in one of the input files, so that the web service couldn’t run till the end. A missing data issue means that the data was uploaded to blob storage but that a component of the web service started running before the data was available. This also means that the web service couldn’t run till the end. Besides this, you can also have a cloud side error, which likewise means that your web service didn’t run till the end.

After some trial and error, I have established the following steps to determine the root cause of the issue.

1. I go into Azure Machine Learning Studio and run the Azure Machine Learning experiment with the input blobs that caused the bug.
2. If the experiment runs smoothly till the end, I examine the log files from the last component and look for something that says 0 rows. If I find this, it means that a component started to run before the data was available. Otherwise there might be a cloud side issue, which I will discuss later.
3. If the experiment fails, I now know on which component it failed. During development of the R and Python components I made sure to add enough logging information, and this helps me to track down the issue and resolve it. The fix consists of fixing the local input file and uploading the fixed file to the cloud.

If neither of these strategies leads to the root cause, you might have run into a cloud side error.
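The decision steps above can be summarized in a small Python sketch. It is a simplification under stated assumptions: the function triage and the enum RootCause are hypothetical names, and the check for the text "0 rows" assumes that the last component reports its row count in exactly that wording.

```python
import re
from enum import Enum


class RootCause(Enum):
    FORMATTING_ERROR = "error in an input file"
    MISSING_DATA = "component ran before the data was available"
    POSSIBLE_CLOUD_SIDE_ERROR = "possible cloud side error"


# \b prevents a match inside larger numbers such as "10 rows".
_ZERO_ROWS = re.compile(r"\b0 rows\b")


def triage(rerun_succeeded: bool, last_component_log: str) -> RootCause:
    """Classify a failed run after re-running the experiment on the same blobs."""
    if not rerun_succeeded:
        # Step 3: the experiment fails again, so an input file is the culprit.
        return RootCause.FORMATTING_ERROR
    if _ZERO_ROWS.search(last_component_log):
        # Step 2: the original run saw no data in its last component.
        return RootCause.MISSING_DATA
    return RootCause.POSSIBLE_CLOUD_SIDE_ERROR


print(triage(True, "Output table has 0 rows and 5 columns"))
```

The word boundary in the pattern matters: without it, a log line reporting 10 rows would be mistaken for an empty result.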

Cloud side error

There are two types of cloud side errors: glitches and systematic errors. A cloud side error always starts as a glitch, because there always has to be a first time that something bad happens. Make sure that you store the information about this error so that you can recognize it later. If you don’t see the error happening again, it was a glitch.

However, when you see the same error happening more than once, you might have run into a systematic error and it is time to call the Azure ML support team for help. In this case, make sure that you provide all your local logs and your client side logs, so that the issue can be nailed down as fast as possible.
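If you store the error messages of failed runs, separating glitches from systematic errors becomes a matter of counting. The Python sketch below illustrates this; the function name and the example messages are made up, and it assumes that identical errors produce identical message texts.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_errors(error_messages: Iterable[str]) -> Dict[str, str]:
    """Label each distinct error as a glitch (seen once) or systematic (seen more often)."""
    counts = Counter(error_messages)
    return {
        message: "systematic" if count > 1 else "glitch"
        for message, count in counts.items()
    }


stored_errors = [
    "example error A",
    "example error B",
    "example error A",
]
print(classify_errors(stored_errors))
```

In practice error messages often contain run-specific details such as time stamps or identifiers, so you may need to strip those before counting.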

Conclusion

I hope that these tips will help you to discover new pieces of logging information in your cloud architecture. Hopefully you will never need them, but they will help you out in the unfortunate case that your cloud application crashes.

The Data Instructors

Sunday, 20 November 2016

How to get started with development in the cloud for free: tools for blob storage development

Sunday, 6 November 2016

Efficient debugging of cloud applications using an Azure case study
