Posted on

Data Warehouse as a Service powered by Azure

If 1000 queries are thrown, then there is a slight chance that 20 queries might fail to execute ( SQL Data Warehouse will be in preview, it will grow in the near future and so will the reliability ).

How secure SQL Data Warehouse?
As we know security is a two way process, the users who is using SQL Data Warehouse should secure his/her laptop. DWaaS is the first enterprise class cloud Data Warehouse, which can grow or shrink. The job of Compute Node is to serve as power for the service and underneath data it is loaded in SQL Data warehouse, which is distributed across the node of service.

Storage: The storage media used for SQL Data Warehouse Blobs. For security reason Azure is providing some measures like:

Connection Security: It’s one of the measures where we can set firewall rules and connection encryption. Before starting any business, you might think how to equip your data, how to maintain and manipulate your data (as it will be there in abundance). At present SQL Data Warehouse support SQL Authentication with username and password.

Authorization: It refers to what we can do with SQL Data Warehouse database, which will allow user to do anything in database. You might go and ask some IT expert what to do, how to do and a bunch of other questions. The IT guy may suggest you to build your own Data Warehouse, to which you will ask if it is cost effective or not, what’s the upfront cost, what will be the maintenance charges, etc.

For this reason Azure has comes up with a solution called SQL Data Warehouse as a Service (DWaaS). The best practice will be to limit the access to the user.

Encryption: Azure SQL Data Warehouse provides “Transparent Data Encryption” to secure our data when it is at rest or stored in database files and backups. Suppose if the traffic is high in day time, you can add any number of machines you want and when the traffic is low, you can remove number of machines.

Reason 3: The reliability of SQL Data Warehouse is estimated to be 98% i.e. It offers full SQL server experience in cloud, which customers expect. Firewall rules will be applied for both server and database. The best part for this is, when user will interact with data, it will directly fetch from Blobs ( The blob storage is one of the best storage options in Azure when the data is enormous amount ).

Difference between on-premises Data Warehouse and Cloud Data Warehouse as a Service
On-Premises Data Warehouse
Cloud Data Warehouse as a Service
Reliability
Reliable, but not much More reliable
Scalability
Not scalable More scalable
Speed
Faster, but may fail in any point of time (in care of hardware failure) Faster and will up always (SLA of 98% is provided)
Deployment
It will take time Within no time it will deploy
Cost effectiveness
Lots of capital required to setup on-premises data warehouse as a service You have to pay what you use

Why should we go for SQL Data Warehouse as a Service in Azure?
Reason 1: It can handle and scale petabytes of data and is highly reliable for all data warehouse operations.

Reason 2: You can scale up and scale down the services as per your requirement. DWaaS is one amazing solution for the organizations, which are just starting or in a process to start. If the password is key logged by Man in the Middle attack, then the cloud provider will not be able to do anything. Until and unless any IP, which is whitelisted can’t enter into database. We can also set server-level firewall using PowerShell (this can be done in Azure Classic Portal).

Authentication: It refers to how you prove your identity when user enters and getting connected to database. Here TDE provides file level encryption.

Stay tuned for more blogs on Azure and if you have any queries or comments please feel free to post. The organization should not worry about spending the upfront cost and maintaining the hardware or software resources they buy.

Architecture of SQL Data Warehouse
For users, it’s like sending data to a database, but underneath SQL Data Warehouse runs “Massive Parallel Processing (MPP) Engine”, which helps in dividing the query send by user to Control Node.

Control Node: When a command is passed to Control Node, it breaks down the query for faster computing into set of pieces and passes on to other nodes of the service.

Compute Node: Like Control Node, Compute Node is powered by SQL Databases.

Posted on

Performing Python Data Science on Azure

The web app offers support for Python, Julia and Ruby.

jupyter
Figure 2: Jupyter is an open source web based tool for interactive Data Science and Machine Learning

Each Jupyter Notebook consists of a back-end kernel known as IPython. Additional packages can also be installed through the Jupyter front-end interface with the use of magic commands. However, over time I realized that IDEs save a lot of time and its more about efficiency rather than practice. Compile and run

When I started performing Data Science and Machine Learning, this process proved too cumbersome. Azure Notebooks are Jupyter Notebooks that are hosted on the Cloud using Azure virtual machines.

Azure Notebook
Figure 4: Azure Notebooks are Jupyter Notebooks that are hosted on Azure

Access and use of Azure Notebooks is completely free and you need not possess an Azure account to access these resources. As such, a custom kernel can be created based on one’s requirements.

During the initial stages of my foray into Data Science, I realized that I needed separate environments for different tasks. I could set up TensorFlow and all its dependencies in one kernel and data visualization in another.

IPython
Figure 3: IPython forms the backbone of a Python Notebook

One of the biggest drawbacks of using Jupyter Notebooks for Python development or Data Science is that it is quite resource intensive. However, this drawback can be overcome with the help of Azure Notebooks. Jupyter Notebook is an open-source web app for interactive coding. Apart from the cost benefit, Azure Notebooks come pre-installed with a variety of different packages that aid in Data Science. Microsoft has also loaded Azure Notebooks with a plethora of study and practice material in form of notebooks that enables users to approach Data Science and programming in an interactive manner. I have always been a proponent of text editors over IDEs. The availability of multiple kernels was a boon. I found myself constantly going through the sample notebooks to find new and interesting ways to use Jupyter and Azure.

Jupyter Notebooks can also be imported from fellow Azure Notebook users or from GitHub ensuring that one has the fastest possible method to pull resources. The kernel analyses and runs your code. Let us know in the comments section.

To know about our service offerings, kindly visit www.cloudthat.in (for training services) and www.cloudthat.com (for consulting services). Although this system has proved to be reliable, modern applications of programming possess alternate requirements. The Azure Notebooks service is deployed when one attempts to visualize data or make Python code specific changes in Azure Machine Learning Studio. This enables Data Scientists to work dynamically with data and numbers. One such tool for Python is Jupyter. For example, when I needed to visualize a dataset in Python, I would have to save the graph locally on my system, close the program and view the graph. The packages that comes installed include the entire Anaconda stack as well as numerous Microsoft specific modules such as CNTK and Azure ARM.

Azure Notebook
Figure 5: Azure notebooks come with a wide range of packages pre-installed

Another major benefit of using Azure Notebooks is that the tool does not exclusively feature Python. Open the terminal
2. Veterans in the interactive Data Science sphere will feel right at home on Azure Notebooks.

What do you use for Data Science? We would love to hear about the new technologies or tools that you use. It also has support for R and F#. This is partly since I originally started coding with text editors and text editors ensured that I did not constantly look for auto-correct to help me. In the past, programming has followed a general cycle involving write, compile, check and execute. Soon, I realized that this did not align with my philosophy of efficiency.

Figure 1: Conventional programming techniques employed text editors such as Notepad++ or IDEs such as PyCharm

When it comes to Data Science, real-time interaction with the data is paramount. With Jupyter, multiple kernels are supported. The terminal for the machine hosting the Jupyter Notebook can also be accessed.

Azure Notebook Library
Figure 6: “Libraries” can be easily shared via GitHub or within Azure Notebooks