Friday, February 17, 2023
Selva | February 17, 2023
Azure Cloud, Azure Synapse Analytics, Database Administration, Performance Tuning, SQL Server
It took a little while to settle on our tuning approach for Azure Data Factory with Azure Synapse, because we needed to run the system for at least two months to validate that the approach was smooth.
Yes, it is all running well, with the expected performance on the ETL loads and processes.
Here are the major things we need to take care of on an Azure Synapse dedicated SQL pool:
- Dedicated SQL Pool Scaling
- Dedicated SQL Pool Workload Management
Dedicated SQL Pool Scaling:
We decided on the metrics below to optimize Azure Synapse:
- Data Load and Processing timelines
- Critical Process running timelines
- Minimal Processing timelines
- Reporting and Visualization Timelines
Based on the above requirements, we can easily scale the dedicated SQL pool up and down to match the load.
Say we need to load and process the data within 2 hours; we can upscale the dedicated SQL pool to whatever level we need.
The key gains from increasing the DWU level are concurrency, instance capacity, and the resources allocated to each query.
Before our critical process starts, we can automate the upscale with an ADF pipeline itself; plenty of blogs are available on how to configure that.
The best method is to configure the authentication with a service principal.
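As a sketch, the upscale itself is a single T-SQL statement run against the master database of the logical server; the pool name here is the one used later in this post, and DW400c is just the example level:

```sql
-- Run against the master database, not the pool itself
ALTER DATABASE [sql-datablogs-dw]
MODIFY (SERVICE_OBJECTIVE = 'DW400c');

-- Check the current service objective after scaling
SELECT d.name, so.service_objective
FROM sys.databases d
JOIN sys.database_service_objectives so ON d.database_id = so.database_id;
```

The ADF pipeline can issue the same statement through a Script activity using the service principal mentioned above.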
Dedicated SQL Pool Workload Management:
We decided on the metrics below to prepare the workload management:
- Short Queries
- Critical Queries
- High Intensive Queries
- Reporting Queries
Based on these classifications, we have to split the queries above into separate workload groups.
Step 1:
We need to create a login and user for workload management in the dedicated SQL pool:
-- On the master database
CREATE LOGIN [Analyticsprocess] WITH PASSWORD = 'xxxxxxxxx';

-- On the dedicated SQL pool
CREATE USER [Analyticsprocess] FOR LOGIN [Analyticsprocess];
GRANT CONTROL ON DATABASE::[sql-datablogs-dw] TO [Analyticsprocess];
Step 2:
Consider that you have upscaled the instance to DW400c; below is the resource allocation that DW400c offers for the concurrency requirements.
Go to Workload management --> New workload group --> click ELT.
Since the analytics process user runs the high-intensive queries, we have to reserve enough minimum resources for its workload group.
Click Classifiers --> Add classifier --> name it ELT --> set Member to the login; the Label field is important to fill in.
Once you click Add, we will get the concurrency range below, based on DW400c.
By default, there are system-level workload groups to handle the queries, but they are not effective for our purpose; we have to force queries into our own workload group.
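The same portal steps can be expressed in T-SQL. This is a sketch: the percentage values below are illustrative assumptions, not the exact figures from the DW400c allocation screen, so adjust them to your instance:

```sql
-- Illustrative workload group for the ELT workload (percentages are assumptions)
CREATE WORKLOAD GROUP ELT
WITH (
    MIN_PERCENTAGE_RESOURCE = 25,          -- guaranteed share for this group
    CAP_PERCENTAGE_RESOURCE = 50,          -- hard cap for this group
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25 -- per-query grant; controls concurrency
);

-- Route queries from the Analyticsprocess login carrying the matching label
CREATE WORKLOAD CLASSIFIER ELTClassifier
WITH (
    WORKLOAD_GROUP = 'ELT',
    MEMBERNAME = 'Analyticsprocess',
    WLM_LABEL = 'highintensiveprocess'
);
```

With this classifier in place, any query from that login tagged with the label lands in the ELT group instead of the system default.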
Step 3:
This part is very important for utilizing the workload group properly. We need to specify the label on the heavy processing queries so that they are routed to the right group.
CREATE TABLE rpt_datablogs_finalreport.smgreport
WITH (HEAP, DISTRIBUTION = REPLICATE)
AS SELECT * FROM rpt_datablogs_finalreport.vw_smgreport
OPTION (LABEL = 'highintensiveprocess');
So these two things gave a big improvement in performance and brought operational cost down by around 60%.
There are still a few more things to learn about optimizing Azure Data Factory itself; let's see that in the next blog.
Happy Troubleshooting !!!
Monday, January 30, 2023
Update to a previous post:
As mentioned in the blog below, in past years we could not restore a database into the same SQL Server RDS instance:
https://medium.com/@yogitha.o/backup-and-restore-in-same-sql-server-rds-3e667622f7ff
But now you can restore without any issues on recent releases of SQL Server RDS.
Thursday, December 15, 2022
Selva | December 15, 2022
Amazon RDS, AWS, Database Administration, Oracle 19C, Restore
It is easy to achieve this with a simple method.
Tuesday, December 6, 2022
If you are a newbie to AWS Glue, it is really difficult to run the crawlers without hitting these failures. Below are the basic steps you need to make sure are done before running a crawler:
- AWS IAM Role and Privileges
- S3 Endpoint
1. AWS IAM Role and Policies
We have to attach at least the policies below to the IAM role.
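As a sketch of a minimal setup, attach the AWS-managed `AWSGlueServiceRole` policy to the role, plus an inline policy granting read access to the crawled bucket. The bucket name below is a placeholder, not one from this post:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-crawl-bucket",
        "arn:aws:s3:::your-crawl-bucket/*"
      ]
    }
  ]
}
```

The role's trust policy must also allow the `glue.amazonaws.com` service to assume it.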
2. S3 Endpoint
Most of the time you will get the error below:
VPC S3 endpoint validation failed for SubnetId: subnet-0e0b8e2ad1b85d036. VPC: vpc-01bdc81e45566a823. Reason: Could not find S3 endpoint or NAT gateway for SubnetId: subnet-0e0b8e2ad1b85d036 in Vpc vpc-01bdc81e45566a823 (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 0495f22f-a7b5-4f74-8691-7de8a6a47b42; Proxy: null)
To fix this error, you need to understand the issue first. It is saying that the subnet used by the Glue job has no route to S3.
Create an endpoint for S3, not for Glue. In some worst cases, people create a NAT gateway and lose a lot of money for this simple thing. So you have to create an S3 gateway endpoint as shown below.
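The same gateway endpoint can be created from the AWS CLI. This is a sketch: the VPC ID is the one from the error message above, while the route table ID and region in the service name are placeholders you must replace with your own:

```
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-01bdc81e45566a823 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-xxxxxxxxxxxxxxxxx
```

A gateway endpoint has no hourly charge, which is exactly why it beats a NAT gateway for this use case.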
Once it is created, your job will run like a flight :)