Monday, 6 March 2017

How to build an architecture for your cloud data application

Introduction
How do you actually start building data applications in the cloud? Which familiar technologies can you use? Which technologies are new? How do all these technologies work together? What will such a solution cost?
We will try to answer these questions in this series of blog posts. We will show how two people, Dana and Frank (see picture below), start building the next unicorn using cloud-based data technology. We will first present their product, then introduce Dana and Frank, and finally explain how they started building their architecture.
The next blogs in the series can be found here

The Usborne First Book of the Computer, Usborne Publishing Ltd, 1985


The game changing idea
Together they are building a SaaS company that makes online games with data science. The first game they are working on is called "What's the title?". It shows automatically generated data graphs of classic novels, and the user has to guess the title of each novel. A real game changer. Below you can see their first game, and you can also check out the link to their website.



Meet the team
For now, the team consists of only two people, but they hope to grow fast in the near future. They have therefore agreed that both of them will need to acquire skills beyond their current ones to build the full application. Their initial focus is to connect all the components, so that they get a global overview and see how the different pieces work together. Later they can elaborate on the individual components and move on to more advanced technologies.

Dana Simpsons, the Data Scientist

Dana loves everything about mathematics and data, and she has a decent background in computer science. Her favourite programming languages are R and Python, and she also has some SQL and NoSQL database skills. As part of her tasks she will watch the cloud costs and look for potential savings. She will also be responsible for customer success and will analyze usage patterns.

Frank Steve Davidson, the Full Stack Developer

Frank loves everything about programming and has a decent background in mathematics. His favourite programming languages are Java and Ruby. He will be responsible for the architecture design and will make sure that the solution gets deployed in the cloud.

The generic architecture
Dana and Frank held several brainstorming meetings and came up with the following architecture. With this architecture they want to achieve the three goals explained below.




Goal 1

Their game will be hosted in the cloud. This means that both their development and their production servers will live in the cloud, together with the data storage. They will therefore need to look at how their chosen cloud provider sets up virtual machines and which cloud storage solutions are available. Because they are hosting a website, they will also need to investigate how to run a web server in the cloud.

Goal 2
Frank has chosen Ruby as the programming language for building out the back-end. The back-end will be responsible for storing the novels in a data lake in the cloud and for transforming them into a data format that the web services Dana is building can consume. It will also retrieve the images that Dana generates and store them back in the cloud. For now they will only use a simple front-end, because the data solution already provides detailed images. In the future they will look into more advanced front-end technology.
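As a rough illustration of the transformation step, the sketch below turns the raw text of a novel into a JSON document with one entry per paragraph. The function name and the field names (title, paragraphs) are assumptions made for this example; the post does not specify the actual format, and Frank's own implementation is in Ruby rather than Python.

```python
import json
import re


def novel_to_records(title, raw_text):
    """Split raw novel text into paragraphs and wrap them in a dict."""
    # Normalise Windows line endings, then split on one or more blank lines.
    text = raw_text.replace("\r\n", "\n")
    paragraphs = [
        " ".join(block.split())  # collapse line breaks and repeated spaces
        for block in re.split(r"\n\s*\n", text)
        if block.strip()
    ]
    # Hypothetical record layout, chosen only for illustration.
    return {"title": title, "paragraphs": paragraphs}


sample = "It was the best of times,\nit was the worst of times.\n\nSecond paragraph."
record = novel_to_records("A Tale of Two Cities", sample)
payload = json.dumps(record)  # string that could be sent to a web service
```

A paragraph-level format like this keeps the text readable while giving a downstream service clear units to count or plot.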

Goal 3
Dana will need data science software to carry out her analysis and generate the graphs. She will also need an easy way to access different data sources while she is performing her analysis. She prefers to have access to Python and R. Finally, she wants an easy way to expose her graphs and analysis to Frank. Once she is happy with her data science solutions, she wants to build an automation layer on top of them.

The architecture in Azure
Dana and Frank compared the different cloud providers and concluded that Azure suited their needs best. The architecture in Azure is shown below. The sections that follow explain its components in more detail, and the next blog posts show how to set up and use them.




Goal 1: How to deploy and develop



Frank develops on a Linux virtual machine in Azure on which he has installed Ruby. He then uses a virtual machine to host the Apache2 web server. To save costs, he will put his Ruby development machine on auto-shutdown. To use a virtual machine as a web server, he needs to give it a public IP address and allow HTTP access to the server. The next blog post in this series covers these steps in more detail.
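To get a feeling for what auto-shutdown can save, here is a small back-of-the-envelope calculation. The hourly rate and the ten working hours per day are made-up numbers, not real Azure prices, and the estimate covers running time only; disk storage is typically billed separately.

```python
def monthly_vm_cost(hourly_rate, hours_per_day, days_per_month=30):
    """Estimated monthly cost of a VM billed per hour of running time."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days_per_month


HOURLY_RATE = 0.10  # hypothetical price per hour, NOT a real Azure rate

always_on = monthly_vm_cost(HOURLY_RATE, hours_per_day=24)
with_shutdown = monthly_vm_cost(HOURLY_RATE, hours_per_day=10)  # assumed usage
saving = always_on - with_shutdown

print(f"always on: {always_on:.2f}")
print(f"with auto-shutdown: {with_shutdown:.2f}")
print(f"saved per month: {saving:.2f}")
```

Plugging in the actual rate of the chosen machine size gives Dana a first number to track when she watches the cloud costs.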

Goal 2: How to talk to and feed the data storage solution



Frank and Dana use Blob storage as their cloud storage solution. You can think of Blob storage as your own big drive in the cloud. When you are developing, you can access Blob storage through several APIs; Frank will use the azure gem for Ruby. Dana will analyze the costs involved in using Blob storage. The third blog post in this series goes into more detail.
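To make the "big drive in the cloud" picture slightly more concrete: every blob lives in a container inside a storage account, and these three names together form its web address. The sketch below only builds such an address; it does not talk to Azure. The account, container and blob names are hypothetical.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name):
    """Build the HTTPS address of a blob in an Azure storage account."""
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical names, used only for illustration.
url = blob_url("whatsthetitle", "novels", "raw/moby dick.txt")
print(url)
```

Whether such an address can be opened directly depends on the access settings of the container; with the default private setting, requests must be authenticated.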


Goal 3: How to talk with and build the Data Science Solution




Dana used Azure Machine Learning Studio to build an experiment that generates the required graphs. She then deployed this experiment as a web service. The web service consumes the formatted text of a classic novel stored in Blob storage and produces six different graphs, which are stored in Blob storage as well. She will describe her strategy in the fourth blog post. In the last blog post, Frank will describe how he feeds this web service and consumes its output.
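The flow described above (read a formatted novel from storage, call the web service, store the six resulting graphs) can be sketched as follows. This is not Dana's or Frank's actual code: a plain dictionary stands in for Blob storage, fake_graph_service stands in for the deployed web service, and all blob names are hypothetical.

```python
def fake_graph_service(novel_text):
    """Stand-in for the deployed web service: returns six placeholder graphs."""
    word_count = len(novel_text.split())
    return [
        f"graph {i} for a text of {word_count} words".encode()
        for i in range(1, 7)
    ]


def generate_graphs(storage, novel_blob, service=fake_graph_service):
    """Read a formatted novel, call the service, store each graph as a blob."""
    text = storage[novel_blob].decode("utf-8")
    graphs = service(text)
    # "formatted/moby-dick.json" -> "moby-dick"
    stem = novel_blob.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    names = []
    for i, image in enumerate(graphs, start=1):
        name = f"graphs/{stem}-{i}.png"
        storage[name] = image
        names.append(name)
    return names


# A plain dict plays the role of Blob storage: blob name -> bytes.
storage = {"formatted/moby-dick.json": b"Call me Ishmael."}
stored = generate_graphs(storage, "formatted/moby-dick.json")
print(stored)
```

Passing the service in as a parameter keeps the storage logic testable without a network connection; a real client for the web service could later be swapped in at that point.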

Conclusion 
In this blog post we introduced two people, Dana and Frank, on their journey to build their SaaS company in the cloud. We briefly mentioned some of the technologies they need. In the next blog posts we will explain these technologies in more depth, so that you will be able to use them for your own application.

