Friday, October 25, 2019

Read only replicas of Azure managed instances and Linked Servers

Hello Readers!

Welcome to my blog once again!
The below blog is to summarize the experience I had with the read only replicas and using linked servers to connect to it for read only workload .
Before we begin, if you have no clue about the read only replicas, I recommend you read my blog post on Azure managed instances high Availability options here for the detailed description.

A few months back, I was involved in finalizing the migration plan for a reporting system which basically process the OLTP or Live transactional data with a set of T-SQL procedures called from SSIS packages.In the On-premise setup , they used to have a separate SQL server to which the transactional replication is enabled from the Actual Prod OLTP server. So the system works with the replicated subset of databases and process it locally.We had plans to use the business critical instances for the Prod OLTP instances and a general purpose instance fir the report server. We wanted to explore the feasibility of using the Read only replicas of the Prod instances without having any impact on performance,  for building a cost effective plan for the migration . There is another reason why wanted to explore this. The data in the OLTP was actually very near the retention period planned (which is 7 years) and future growth was kind of capped and predictable. If the  read only replica plan  could match the performance of the On-premise setup, we could save some cost in hosting replicated copies of the databases in the reporting DB instance. 

The plan was to create a linked server in the reporting instance and make it connect to the Read only replica , so procedures in the reporting database could access the data from the Read only replica through the linked server.

So I have created a  linked server for my POC with the below query.

EXEC sp_addlinkedserver @server=N'Prodcopy', --Change the name as you need
@srvproduct=N'',
@provider=N'SQLNCLI',
@provstr = N'ApplicationIntent=ReadOnly',
@datasrc= '<ProdInstance PrivateEndpoint address>';

The next step was to update the procedures kept in the report database , so it would not access the OLTP databases' data locally but use the remote database through the linked server. This part was quite simple.I could convert the existing three parts query to a 4 parts one (Linked server.Database.schema.object)

1.On top of the procedure(which is written on dynamic SQL), i have introduced a variable named  @link

Declare @link nvarchar(15)
Set @link='Prodcopy'

2. Added this part   '+@link+'.'+ infront of every object references in the procedures.
3. Removed (nolock) query hint as 4 parts query dont work with it. 

I have edited this in all 12 procedures which were working with retrieving data from prod tables containing data like orders, quotes, invoices etc, and join them with local tables in the reporting databases.


I could test one or two procedures and could see that its working fine. 
But I could also observe  that the procedures' performance got degraded drastically. I tried applying workarounds like the below.

1. Recording the table names and the columns being referred from these table names in an excel sheet

2.Create Views in the Prod databases for these tables (with all column names used in the procedures).

3.Replacing the four parts query with Openquery.


4 parts query:

Select column1,column2 from Prodcopy.DB.dbo.quotes where orderid=23432

OpenQuery :

(SELECT column1,column2 FROM openquery (Prodcopy, 'SELECT column1,column2 FROM DB.dbo.quotes WHERE orderid =23432');

By doing this I could see that 11 out of the 12 procedures performance got improved a lot, even better than the on-premise response time. One procedures which rather had a big list of tables and joins, couldnt improve much. The procedure was as big as to be written in five or six  A4 size papers.

Eventually I had to drop the plan (because of this one procedure which is huge), and go for transactional replicated databases for processing the data locally. But I learned a thing or two.


I was even aware earlier that Linked server often get excluded from new gen code practices because of performance issues. Yet we use it for reporting purposes. I have seen many SQL reporting setups which uses linked servers 


Conclusion: 

1. Linked server performance when we are joining local tables and remote ones are a bit bad
2.You can workaround by converting the tables to views and using openquery to access them in your t-sql/dynamic SQL scripts.
3.Openquery and 4 parts query execution plans are entirely different . One can confirm this from their queries'  execution plan in SSMS.


You should independently do a performance comparison to decide if you can adopt it or not based on your own experience.I hope to have a setup planned like this in one of my future assignments.

Thanks for reading !
























No comments:

Post a Comment