datablogs

Friday, February 17, 2023

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 2

It took a little longer to finalize our tuning approach for Azure Data Factory with Azure Synapse, because we needed to run the system for at least two months to validate that the approach works smoothly.

Yes, it is all running well, with the expected performance on the ETL loads and processes.

Here are the major things we need to take care of on the Azure Synapse Dedicated Pool:

  1. Dedicated SQL Pool Scaling 
  2. Dedicated SQL Pool Workload Management

Dedicated SQL Pool Scaling : 

We decided on the below metrics to optimize Azure Synapse:

  1. Data Load and Processing timelines
  2. Critical Process running timelines 
  3. Minimal Processing timelines 
  4. Reporting and Visualizations timelines
Based on the above requirements, we can easily scale the Dedicated SQL Pool up or down according to the load.

Consider that we need to load and process the data within 2 hours; we can upscale the Dedicated SQL Pool to whatever level we need.

The key factors when increasing the DW level are concurrency, instance capacity, and the resources allocated to each query.


Before starting our critical process, we can automate the upscale process with ADF pipelines themselves. Plenty of blogs are available that explain how to configure that.
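For reference, the scale operation that such a pipeline triggers can also be issued as a single T-SQL statement against the master database of the logical server; the pool name below is the one used in this post and the DW400c target level is just an example:

-- Run against the master database of the logical SQL server, not inside the pool itself
-- The target service objective is illustrative; pick the DWU level your load window needs
ALTER DATABASE [sql-datablogs-dw]
MODIFY (SERVICE_OBJECTIVE = 'DW400c');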

The best method is to configure authentication with a service principal.


Dedicated SQL Pool Workload Management : 

We decided on the below classifications to prepare workload management:

  1. Short Queries
  2. Critical Queries 
  3. High Intensive Queries 
  4. Reporting Queries

Based on these classifications, we have to split the above query types into separate workload groups.

Step 1 :

We need to create a login and user for workload management in the Dedicated SQL Pool.

-- CREATE LOGIN runs in the master database; CREATE USER and GRANT run in the dedicated pool database

CREATE LOGIN [Analyticsprocess] WITH PASSWORD = 'xxxxxxxxx';

CREATE USER [Analyticsprocess] FOR LOGIN [Analyticsprocess];

GRANT CONTROL ON DATABASE::[sql-datablogs-dw] TO [Analyticsprocess];

Step 2 :

Consider that you have upscaled the instance to DW400c; below is the resource allocation for the DW400c instance based on its concurrency requirements.

In the Workload management pane --> New workload group --> create the group named ELT.

Since the analytics process user is used for the high-intensity queries, we have to reserve as much minimum resource as possible for this workload group.

Click Classifiers --> Add classifier --> name it ELT --> set the Member to the login we created, and make sure to fill in the Label; it is important.


Once we click Add, we will get the below concurrency range based on DW400c.




By default, there are system-level workload groups to handle the queries, but they are not effective for our case; we have to force queries into our own workload group.
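For anyone who prefers scripting over the portal, a minimal T-SQL sketch of the same workload group and classifier is below; the resource percentages are illustrative values for a DW400c instance, and the login and label match the ones used in this post:

-- Reserve resources for the ELT workload group (percentages are illustrative for DW400c)
CREATE WORKLOAD GROUP ELT
WITH (
    MIN_PERCENTAGE_RESOURCE = 25,
    CAP_PERCENTAGE_RESOURCE = 100,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 6.25
);

-- Route labelled queries from the analytics login into the ELT group
CREATE WORKLOAD CLASSIFIER ELTclassifier
WITH (
    WORKLOAD_GROUP = 'ELT',
    MEMBERNAME = 'Analyticsprocess',
    WLM_LABEL = 'highintensiveprocess'
);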



Step 3 :

This step is very important to utilize the workload group properly. We need to specify the label on the heavy processing queries so that they are routed to the right workload group.

CREATE TABLE rpt_datablogs_finalreport.smgreport
WITH (HEAP, DISTRIBUTION = REPLICATE)
AS SELECT * FROM rpt_datablogs_finalreport.vw_smgreport
OPTION (LABEL = 'highintensiveprocess');
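To verify that labelled queries are actually landing in the intended workload group, a quick check (a sketch, assuming the group_name and [label] columns of the sys.dm_pdw_exec_requests DMV) is:

-- Show which workload group the labelled requests were classified into
SELECT request_id, [status], group_name, resource_class, [label]
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'highintensiveprocess';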

 


So these two things gave a big improvement in performance and reduced the operational cost by up to 60%.

There are still a few more things to learn about optimizing this on the Azure Data Factory side; let's look at that in the next blog.

Happy Troubleshooting !!! 


Monday, January 30, 2023

Update Backup and Restore in Same SQL Server RDS

This is an update to a previous post.

As mentioned in the blog below, in past years we could not restore a database into the same SQL Server RDS instance:

https://medium.com/@yogitha.o/backup-and-restore-in-same-sql-server-rds-3e667622f7ff

But now, with recent releases of SQL Server RDS, you are able to restore without any issues.
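For anyone trying this, a minimal sketch of the native backup and restore calls is below, assuming the SQLSERVER_BACKUP_RESTORE option group is attached to the instance; the database and S3 bucket names are placeholders:

-- Back up the source database to S3
exec msdb.dbo.rds_backup_database
     @source_db_name = 'datablogs_db',
     @s3_arn_to_backup_to = 'arn:aws:s3:::datablogs-backups/datablogs_db.bak',
     @overwrite_s3_backup_file = 1;

-- Restore it on the same RDS instance; use a new name if the source database still exists
exec msdb.dbo.rds_restore_database
     @restore_db_name = 'datablogs_db_copy',
     @s3_arn_to_restore_from = 'arn:aws:s3:::datablogs-backups/datablogs_db.bak';

-- Track the progress of the backup/restore tasks
exec msdb.dbo.rds_task_status @db_name = 'datablogs_db_copy';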


Tuesday, December 6, 2022

How do I troubleshoot the AWS Glue error "VPC S3 endpoint validation failed for SubnetId"?

If you are a newbie to AWS Glue, it is really difficult to run the crawlers without hitting these failures. Below are the basic things you need to make sure are in place before running the crawler:

  1. AWS IAM Role and Privileges 
  2. S3 Endpoint 

1. AWS IAM Role and Policies

We have to attach at least the below policies to the IAM role.

2. S3 Endpoint

Most of the time you will get the below error:

VPC S3 endpoint validation failed for SubnetId: subnet-0e0b8e2ad1b85d036. VPC: vpc-01bdc81e45566a823. Reason: Could not find S3 endpoint or NAT gateway for SubnetId: subnet-0e0b8e2ad1b85d036 in Vpc vpc-01bdc81e45566a823 (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 0495f22f-a7b5-4f74-8691-7de8a6a47b42; Proxy: null)



To fix this error, you need to understand the issue first. It is saying that the subnet used by the job has no route to S3, either through an S3 endpoint or a NAT gateway.

Create the endpoint for S3, not for Glue. In the worst cases, people create a NAT gateway and lose a lot of money for this simple thing. So you have to create an S3 gateway endpoint as shown below.


Once it is created, your job will run like a flight :)




Migrate and Sync RDS PostgreSQL to Amazon Redshift Serverless using AWS Glue

Customers always prefer low-cost solutions to run their business. To help them with their business and requirements, we also need to provide efficient solutions.

In some cases cloud vendors provide good solutions for analytics loads, but the cost will be very high; most of the time we don't want to recommend those, but we need to.

One such solution in AWS is like that: it costs a lot, but it works amazingly fast.

We are talking about Amazon Redshift solutions here. Recently, AWS launched Amazon Redshift Serverless in a few regions.

Whenever something new comes out, we need to find and deliver the best approach to the customer before they catch on to the features. So,

What is Amazon Redshift Serverless ? 

Amazon Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources. Amazon Redshift Serverless adjusts capacity in seconds to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads.

With Amazon Redshift Serverless, you can benefit from the following features:

  • Access and analyze data without the need to set up, tune, and manage Amazon Redshift provisioned clusters
  • Use the superior Amazon Redshift SQL capabilities, industry-leading performance, and data-lake integration to seamlessly query across a data warehouse, a data lake, and operational data sources
  • Deliver consistently high performance and simplified operations for the most demanding and volatile workloads with intelligent and automatic scaling
  • Use workgroups and namespaces to organize compute resources and data with granular cost controls
  • Pay only when the data warehouse is in use

So, overall, no human intervention is needed with Redshift Serverless.

Everything is fine, but how do we migrate and sync Amazon RDS / EC2 PostgreSQL / Aurora PostgreSQL to make use of Redshift Serverless?

What are the options available to migrate and sync?

  • DMS - a Redshift Serverless target is not yet available to migrate the data
  • Export/Import - yes, we can do this, but it cannot handle a zero-downtime migration, and syncing real-time data is not possible
  • AWS Glue - a good option; we can migrate and sync near-real-time data from RDS to Redshift Serverless
Let's start migrating and syncing sample data into Amazon Redshift Serverless.

Environment Setup Walkthrough:

  1. RDS PostgreSQL
  2. AWS Glue
  3. Amazon Redshift Serverless 
  4. VPC S3 Endpoint
  5. IAM Role

RDS PostgreSQL : 


Amazon Redshift Serverless :


VPC S3 Endpoint :


IAM Role : 


Once the environment setup is completed, we can start adding connections and jobs in AWS Glue.

How to add connections in AWS Glue:

In the AWS Glue console --> click Connections --> Create Connections





Create the source and target databases.



For testing, a sample schema and data were inserted into RDS PostgreSQL before creating the crawler.
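For reference, a hypothetical version of that sample schema and data is below; the actual objects used in this setup are not shown in the post, so treat the names as placeholders:

-- Hypothetical sample schema and data on the RDS PostgreSQL source
CREATE SCHEMA IF NOT EXISTS sample_sync;

CREATE TABLE IF NOT EXISTS sample_sync.orders (
    order_id    integer PRIMARY KEY,
    customer_id integer NOT NULL,
    amount      numeric(10,2),
    created_at  timestamp DEFAULT now()
);

INSERT INTO sample_sync.orders (order_id, customer_id, amount)
VALUES (1, 101, 250.00),
       (2, 102, 99.99);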

Let's start by creating separate crawlers for the source and target to update the Data Catalog.

The below data source is mapped to RDS PostgreSQL.


Also, the role updated with the required policies needs to be attached.


Choose the appropriate database for the crawler; below is the one for the source.


Below is the one for the target.


Once that is completed, and since we have already deployed the sample schema scripts on both sides to transfer the data, let's run the crawlers and check.


So both the source and target tables are updated in the Data Catalog.


Let's create the Glue job in the AWS console.




After all that, let's schedule the job every 5 minutes and sync the data.

Whatever data is inserted into RDS PostgreSQL will be synced into Redshift Serverless every 5 minutes.
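As a quick sanity check, you can run the same query against the RDS PostgreSQL source and the Redshift Serverless workgroup endpoint and compare the results after a few scheduled runs; this assumes the hypothetical sample_sync.orders table from the earlier sketch:

-- Run on both source and target; counts should converge after each 5-minute sync
SELECT count(*) AS row_count,
       max(created_at) AS latest_record
FROM sample_sync.orders;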


So we can quickly migrate and save on cost.

Any troubles or issues, please contact me immediately !!!