
Sunday, September 15, 2019

Introduction to Azure Data Factory and its components


Hello Readers!

This post is about Azure Data Factory, the PaaS offering by Azure for the ETL process. I intend to introduce the components of ADF that one should know before migrating SSIS packages into Azure.

Azure Data Factory (ADF)

ADF is Azure's data integration service: it helps us perform data movement, transformation, and package execution either with its own compute components or with the help of other services available in Azure. Just like SSIS on-premises, it comes equipped with a variety of tools to make the ETL process easy and code-free.


Integration Runtime

The integration runtime is the component of ADF that does the actual work of moving data, executing a package, and so on. We can assign compute resources to it and join it to our VNet, so it can connect to resources both on-premises and within the Azure VNet, depending on the VNet's network configuration.


Types of IR

1. Azure Integration Runtime:
This comes by default when we create a data factory instance. It runs data flows within Azure, copies data to and from cloud data stores, and dispatches transformation tasks to other services such as Databricks and HDInsight.
2. Self-Hosted Integration Runtime:
This can be downloaded and installed on our private machines to perform data integration activities securely in a private (on-premises) network, with Azure Data Factory orchestrating the execution through the pipeline definitions. It can also dispatch transformation tasks to other services such as Databricks and HDInsight.
3. SSIS Integration Runtime:
This is what is used to migrate and execute SSIS packages in Azure. We will learn more about it in the blog content for the migration and execution demos later.
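To make the IR choice concrete, here is a rough sketch of how a linked service definition points at a particular integration runtime. ADF definitions are JSON documents; I show one here as a Python dict so it can be printed and inspected. All names and the connection string are hypothetical placeholders, not real resources.

```python
import json

# Hypothetical linked service definition (ADF-style JSON as a Python dict).
# The "connectVia" block routes the connection through a self-hosted IR
# named "MySelfHostedIR"; if "connectVia" is omitted, ADF falls back to
# the default Azure Integration Runtime.
onprem_sql_linked_service = {
    "name": "OnPremSqlServer",  # hypothetical name
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            # placeholder connection string for illustration only
            "connectionString": "Server=myserver;Database=mydb;..."
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",  # hypothetical IR name
            "type": "IntegrationRuntimeReference"
        }
    }
}

print(json.dumps(onprem_sql_linked_service, indent=2))
```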

Data Factory Pipelines:

A pipeline is basically a grouping of activities (also called pipeline activities) for a specific task. A pipeline can contain one or more activities, and those activities can be connected to different data sources by means of linked services.
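As a minimal sketch of the idea, here is what a pipeline definition with a single Copy activity could look like, shown as a Python dict in the shape of ADF's JSON. The pipeline, activity, and dataset names are hypothetical placeholders.

```python
import json

# Hypothetical pipeline definition (ADF-style JSON as a Python dict):
# one Copy activity that moves data from a blob dataset to a SQL dataset.
# "SourceBlobDataset" and "SinkSqlDataset" are placeholder dataset names.
copy_pipeline = {
    "name": "CopyBlobToSqlPipeline",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "SourceBlobDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SinkSqlDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"}
                }
            }
        ]
    }
}

print(json.dumps(copy_pipeline, indent=2))
```

Adding more activities is just a matter of appending further entries to the "activities" list.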
Linked services:
These are like the connection manager entries we have in SSIS; they connect the data factory to data stores.
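To carry the SSIS analogy through, here is a rough sketch of a linked service for Azure Blob Storage, again as a Python dict in the shape of ADF's JSON. The name and connection string are hypothetical placeholders (in practice the secret would live in Azure Key Vault rather than inline).

```python
import json

# Hypothetical linked service (ADF-style JSON as a Python dict),
# playing the role of an SSIS connection manager entry.
blob_linked_service = {
    "name": "AzureBlobStore",  # hypothetical name
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # placeholder connection string for illustration only
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
        }
    }
}

print(json.dumps(blob_linked_service, indent=2))
```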
Triggers:
We can set schedules to execute pipelines using triggers, and can enable or disable them as we like.
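As a sketch of how a schedule looks, here is a trigger definition that would run a pipeline named "CopyBlobToSqlPipeline" (a hypothetical name) once a day, again as a Python dict in the shape of ADF's JSON.

```python
import json

# Hypothetical schedule trigger (ADF-style JSON as a Python dict)
# firing the placeholder pipeline "CopyBlobToSqlPipeline" daily.
daily_trigger = {
    "name": "DailyTrigger",  # hypothetical trigger name
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2019-09-15T00:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyBlobToSqlPipeline",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}

print(json.dumps(daily_trigger, indent=2))
```

Disabling a trigger in ADF is a matter of stopping it; the definition itself stays in place, which matches the enable/disable behaviour described above.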

Hope the above explanations are simple enough. It's worth noting that Data Factory doesn't perform transformation tasks on its own; it depends on other tools for that. But that should not bother you if your transformation tasks are defined in the SSIS package itself and you only need to execute it. The first two integration runtimes I defined above can be used to dispatch transformation tasks (other than the ones in the SSIS packages) to a variety of tools available within Azure itself.