How to tune long running Queries on Azure Synapse DW (Dedicated Pool)

Its took little long time to derive our approach tunning on azure data factory with Azure Synapse why because we need to run the system atleast two months validate our approach is smooth

Yes its all running good as expected performance on the ETL loads and Processes

Here are the major things we need to take care on Azure Synapse Dedicated Pool ,

Dedicated SQL Pool Scaling
Dedicated SQL Pool Workload Management

Dedicated SQL Pool Scaling :

We have decide below metrices to optimize the Azure Synapse

Data Load and Processing timelines
Critical Process running timelines
Minimal Processing timelines
Reporting and Visualtions Timelines

Based on above requirements , we can easily scalable the Dedicated SQL Pool based on the load

Consider we need to load the data and process with in 2 hours , we can upscale the Dedicated SQL Pool to as like we want

Key Part to increase the DW is concurrency , Instance Capacity and allocated resource for each queries

Before start our critical process , we can automate upscale process with ADF Pipelines itself . So many blogs available to configure that

Best method configure the authentication method with service principle ,

Dedicated SQL Pool Workload Management :

We have decide below metrices to prepare workload management

Short Queries
Critical Queries
High Intensive Queries
Reporting Queries

Based on the classifications , we have to split workload group for above queries

Step 1 :

We need to create login and user for workload management in Dedicated SQL Pool

--CREATE LOGIN [Analyticsprocess] WITH PASSWORD='xxxxxxxxx'
--CREATE USER [Analyticsprocess] FOR LOGIN [Analyticsprocess]
--GRANT CONTROL ON DATABASE::[sql-datablogs-dw] TO
Analyticsprocess

Step 2 :

Consider you have upscaled instance into DW400c below are the resources allocation for the DW400c instance concurrency requirements

In the workload group --> New workload group --> Click ELT

Consider analytics process user is used for high intensive queries we have to allocate as much as minimum resource for workload group

Click Classifiers --> Add Classifiers --> Name it as ELT --> and specify Member should be the login and Label is important to mention

Once click Add , we will get below concurrency range based on DW400c

By Default , its having system level workload group to handle the queries but its not effective we have to force our workload group

Step 3 :

This is very important to utilize the workload group properly . We need to specify Label on the heavy processing queries so that it will utilized properly

CREATE TABLE rpt_datablogs_finalreport.smgreport WITH (HEAP,DISTRIBUTION = REPLICATE) AS select * into rpt_datablogs_finalreport.smgreport_vw from rpt_datablogs_finalreport.vw_smgreport OPTION (LABEL='highintensiveprocess')

So these two things given much improvement on performance as well as reduced operational cost below 60%

Still few more things to learn to optimize it on Azure Data Factory lets see on that next blog

Happy Troubleshooting !!!

datablogs

Data will talk to you if you're willing to listen !!!

Categories

Friday, February 17, 2023

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 2

0 comments:

Post a Comment