datablogs: SQL Server
Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Sunday, August 28, 2022

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 1

Three hours of long running pipeline is reduced to run in 35 Minutes when we given right kind of top bottom optimization like our Body !!! 

Yes Sometimes if we give proper core and cross workouts you can run 10km in 55 Minutes !!! 

Its happened to me But Not Now 😆😆😆 

Oh Oh we are away from Data ground !!! Lets Move into Azure Synapse tunning,

In the starting phase every piece of code and pipeline process was really trouble us to optimize it , whatever we do on tunning it will run as same time 

Waited long time until each process to complete as its running 3 hours !!!

But Reecha blog helped something to start with basic check and given some hope to fine tune it 

Remember below piece of monitoring code always help us to too dig more 

--Check Long Running Query in Azure Synapse

SELECT *
FROM sys.dm_pdw_exec_requests
WHERE status not in ('Completed','Failed','Cancelled')
  AND session_id <> session_id()
ORDER BY submit_time DESC;
--Find the distributed query plan steps for long running query

SELECT * FROM sys.dm_pdw_request_steps WHERE request_id = 'xxxxxx'  ORDER BY step_index; 

***Replace your request_id from first query result  
Sample Running Query :


Initially we thought data volume is huge , so we are expecting this much delay on complete . But distributed query Plans are given more finishing time for each queries in Azure Synapse

We will go step by step process of tunning , 

Multiple ways we can see the Execution plan of your Azure Synapse Queries

  1. Using Azure Console --> Click you Dedicated Pool --> Click Query Activity --> Each Queries you have Query Plans 

        2.Using SSMS , you can run below query and get your explain plan in XML Format 

                EXPLAIN select * from dbo_datablogs.vw_fact_transferdata (nolock)



So ,We have figured out the execution plan and further need to fix below things to make it faster 

We need to reduce data movement operations ( Shuffle Move , Trim Move , Partition Move ) and needs to have proper indexing on your business tables 

Based on your data distribution on tables and indexing , queries will be executed on SQL Pool . We need to distribute the tables accordingly . So Major part of the tunning on table level only 

Once we done that in proper way we have achieved your milestone ,

How to we distribute table in Azure Synapse ? 

Two types of distribution is available in Azure Synapse . Round Robin and Hash Distribution 

Round Robin Distribution
  • By Default , If you create table it will create in Round Robin Distribution
  • Table rows are distributed in all distribution 
  • Main purpose of this table to improve loading speed we can use round robin distribution
  • If your table is just junk or doesn't have proper any constraints and keys use round robin distribution
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED NOT ENFORCED,
                [SourceObjectSettings] [nvarchar](max) NULL) 
WITH ( CLUSTERED COLUMNSTORE INDEX,  DISTRIBUTION = ROUND_ROBIN )

Hash Distribution
  • If your table is having frequent DML operations ( Insert , Update , Delete ) use hash distribution
  • It will increase performance 200% of your queries 
  • It will distribute the data to nodes based on your distributed column 
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED NOT ENFORCED,
                [SourceObjectSettings] [nvarchar](max) NULL) 
WITH ( HEAP,  DISTRIBUTION = HASH(Id))

Replicate 

This is another way to storing the table in SQL Pool 
  • Full copy of the table will be available in all distribution to avoid data movement 
  • Table is having less size use replicate option
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED NOT ENFORCED,
                [SourceObjectSettings] [nvarchar](max) NULL) 
WITH ( HEAP,  DISTRIBUTION = REPLICATE )

Thumb of Rules of Creating tables 

First Level : Find the absolute unique column or else use partial unique column for creating table as Hash Distribution 

Second Level : If your table doesn't have proper unique columns , try to create the table with used columns in Join , Group By , Distinct , Over and Having Clauses on Procedures or Select Clauses 

Lets examine First Level , 

When creating the table with Round Robin and created table with below script ,

CREATE TABLE dbt_datablogs.fact_transferdata
WITH
  (
    CLUSTERED COLUMNSTORE INDEX,  
    DISTRIBUTION = ROUND_ROBIN   
  )  
AS SELECT * FROM dbt_datablogs.vw_fact_transferdata

Its taken 28 Minutes to complete


So we figured out good candidate key and created table with below script 

CREATE TABLE dbt_datablogs.fact_transferdata
WITH
  (
    CLUSTERED COLUMNSTORE INDEX,  
    DISTRIBUTION = HASH(tranferaddressid)
  )  
AS SELECT * FROM dbt_datablogs.vw_fact_transferdata

Its taken 4 Minutes to complete 


We will see the huge difference when creating proper keys 

Lets examine Second Level ,

When we don't have proper keys , based on your business logics or select clause join conditions we can try to evaluate with multiple columns and create proper hash function to improve performance 

Lets do the smaller table tunning on this blog , you can try the same in your own environments , 

Below table is small but its having shuffle move and its delaying the process 11 Minutes ,







Then based on shuffle columns , add the proper hash function and recreate the table 

CREATE TABLE dbt_datablogs.fact_memberinfo
WITH
  (
    Heap,  
    DISTRIBUTION = HASH(infoid)
  )  
AS SELECT * FROM dbt_datablogs.vw_memberinfo

Once created the table again examined the Execution plan for the same statement and process is finished with in a seconds 



This is not an end , Also we have tunned few things on Data Factory and Azure Synapse Workload Management to get expected performance , we will see further on next blog 

To learn more in-depth , Please use below references from Microsoft Site as well 

References : 


Thursday, August 11, 2022

SQL Server Point in Time Recovery

 Oooooouch !!! I have deleted critical data in the ledger table !!!

When we ran delete or drop Script on any database without double check , Its Really big problem for the end users !!! Also you wont sleep 😁😁😁

If you have GODDBA , you are saved millions dollars of your data also your valuable customers . 

Oh Great 😀😀😀 How we can achieve that ? 

In SQL Server Multiple Database Backup Methods are available to tackle that , 

If SQL Server database is in full recovery model we can take transaction log backups on every hour/Minute/Seconds into local or remote drives . 

So, Yes am taking Log Backup 😕😕😕 How to we restore particular time ? 

Using with transaction logs we can bring back the data as much as you want depends on your log backup strategy . 

To achieve that here is the automated point in time restore script for your valuable environment ,

Scripts are available in https://github.com/selvackp/SQLServerPointInTimeRestore.git

So finally you have saved your millions dollars worth of data in 10 Minutes !!!

USE master
GO

DECLARE 
        @DatabaseOLDName			sysname  = 'test',										
        @DatabaseNewName			sysname  = 'test1',										
        @PrimaryDataFileName		        sysname  = 'test',									
	@SecDataFileName                        sysname  =  NULL,
        @DatabaseLogFileName		        sysname  = 'test_log',								
	@PrimaryDataFileCreatePath              sysname  = 'D:\MISC\Bkp\testdata.mdf',			
        @SecDataFileCreatePath		        sysname  = NULL,									
        @SecDataFileCreatePath1		        sysname  = NULL,									
        @DatabaseLogFileCreatePath              sysname  = 'D:\MISC\Bkp\test_log.ldf',		
        @PITRDateTime				datetime = '2022-08-11T20:44:11';

DECLARE @command				nvarchar(MAX),
        @OldPhysicalPathName		        nvarchar(MAX),
        @FullBackupDateTime			datetime,
        @DiffBackupDateTime			datetime,
        @LogBackupDateTime			datetime,
        @message				nvarchar(MAX);

SET @command = N'RESTORE DATABASE @DatabaseNewName FROM DISK = @OldPhysicalPathName WITH FILE = 1, NORECOVERY, NOUNLOAD, REPLACE, STATS = 5, STOPAT = @PITRDateTime,
     MOVE N''' + @PrimaryDataFileName + N''' TO N''' + @PrimaryDataFileCreatePath + N''','
           + COALESCE('
     MOVE N''' + @SecDataFileName + ''' TO N''' + @SecDataFileCreatePath + ''',', '')
           + N'
     MOVE N''' + @DatabaseLogFileName + N''' TO N''' + @DatabaseLogFileCreatePath + N''';';

SELECT     TOP (1) @OldPhysicalPathName = bmf.physical_device_name,@FullBackupDateTime = bs.backup_start_date
FROM       msdb.dbo.backupset AS bs INNER JOIN msdb.dbo.backupmediafamily AS bmf ON  bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName AND  bs.type= 'D' AND  bs.backup_start_date < @PITRDateTime
ORDER BY   bs.backup_start_date DESC;

SET @message = N'Starting restore of full backup file '+ @OldPhysicalPathName + N', taken ' + CONVERT(nvarchar(30), @FullBackupDateTime, 120);

RAISERROR(@message, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',
                       @DatabaseNewName,
                       @OldPhysicalPathName,
                       @PITRDateTime;


SET @command = N'RESTORE DATABASE @DatabaseNewName FROM DISK = @OldPhysicalPathName WITH FILE = 1, NORECOVERY, NOUNLOAD, REPLACE, STATS = 5, STOPAT = @PITRDateTime;';

SELECT     TOP (1) @OldPhysicalPathName = bmf.physical_device_name,@DiffBackupDateTime = bs.backup_start_date
FROM       msdb.dbo.backupset   AS bs INNER JOIN msdb.dbo.backupmediafamily AS bmf ON  bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName AND  bs.type  = 'I' AND  bs.backup_start_date >= @FullBackupDateTime AND  bs.backup_start_date< @PITRDateTime
ORDER BY   bs.backup_start_date DESC;

IF @@ROWCOUNT > 0
BEGIN;
    SET @message = N'Starting restore of differential backup file ' + @OldPhysicalPathName + N', taken ' + CONVERT(nvarchar(30), @DiffBackupDateTime, 120);

    RAISERROR(@message, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',
                       @DatabaseNewName,
                       @OldPhysicalPathName,
                       @PITRDateTime;
END;

SET @command = N'RESTORE LOG @DatabaseNewName
FROM DISK = @OldPhysicalPathName
WITH FILE = 1, NORECOVERY, NOUNLOAD, REPLACE, STATS = 5, STOPAT = @PITRDateTime;';

DECLARE c CURSOR LOCAL FAST_FORWARD READ_ONLY TYPE_WARNING FOR
SELECT     bmf.physical_device_name,
           bs.backup_start_date
FROM       msdb.dbo.backupset         AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
   ON      bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName
AND        bs.type                 = 'L'
AND        bs.backup_start_date    >= COALESCE(@DiffBackupDateTime, @FullBackupDateTime)
ORDER BY   bs.backup_start_date ASC;

OPEN c;

FETCH NEXT FROM c
INTO @OldPhysicalPathName,
     @LogBackupDateTime;

WHILE @@FETCH_STATUS = 0
BEGIN;
    SET @message = N'Starting restore of log backup file '
                   + @OldPhysicalPathName + N', taken '
                   + CONVERT(nvarchar(30), @LogBackupDateTime, 120);
    RAISERROR(@message, 0, 1) WITH NOWAIT;
    EXEC sys.sp_executesql @command,
                           N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',
                           @DatabaseNewName,
                           @OldPhysicalPathName,
                           @PITRDateTime;

    IF @LogBackupDateTime >= @PITRDateTime
        BREAK;

    FETCH NEXT FROM c
    INTO @OldPhysicalPathName,
         @LogBackupDateTime;
END;

CLOSE c;
DEALLOCATE c;

SET @command = N'RESTORE DATABASE @DatabaseNewName
WITH RECOVERY;';

RAISERROR('Starting recovery', 0, 1) WITH NOWAIT;
EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',
                       @DatabaseNewName,
                       @OldPhysicalPathName,
                       @PITRDateTime;
GO

Wednesday, March 30, 2022

Wednesday, February 16, 2022

SQL Server Always On availability group cluster in the AWS Cloud


Microsoft gives HA features like a charm . Lower to higher deployment costs its giving many features as per business requirements . Replication , Mirroring , Log shipping and Always On many features available to build HA Setup in On Premises . 

Like wise , we can setup all the above features in Cloud as well . In that we can see Always on availability group cluster in this blog 




What is Always On Availability Group?


  • An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. 
  • You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together.  


Environment Setup for Always on Availability Group ,


Launched one Active Directory and two SQL Nodes with below range . Detailed setup for environment steps are below ,


Below are the detailed steps for environment steps ,
Step 1 : Create ag-sql-vpc with 10.0.0.0/16 IPv4 CIDR range


Step 2 : Create two private subnets ( 10.0.3.0/24 , 10.0.1.0/24 ) for SQL Nodes and one public subnet ( 10.0.4.0/24 ) for Active Directory  


Step 3 : Launched the windows instances with two secondary ip's for Failover Cluster and Always on Listener
In this POC Setup , Launched windows instance and installed SQL Server Developer edition . Also we can launch Windows with SQL Server 2016 based on your requirements 







Step 4 : Change the computer properties and rename the instance names accordingly 
Step 5 : Completed the AD Server configuration and its named as ag-sql-AD , After that change DNS server address in network properties in ag-sql-node1 and ag-sql-node2 ( 10.0.4.33 is static IP of AD Server )



Step 6 : Once modified the DNS configuration reboot the server and login with AD administrator account 
Step 7 : Once logged in with AD login , Install the failover clustering and below dependent features in ag-sql-node1 and ag-sql-node2


Configuring Shared Drive for Backup and Restore 


Step 8 : Between the ag-sql-node1 and ag-sql-node2 needs to take backup and log backups for Always on background process

Step 9 : Create folder in ag-sql-node2 and share with everyone in AD account
  
Step 10 : Take one time backup of DW_Mart and DataLake in that shared folder . Created Shared drive will be used while always on group creation 

Failover Cluster Configuration 


Step 11 : Open the Failover Cluster Manager console and Create the cluster . Browse and add the both servers 


Step 12 : Once all the steps finished , create the cluster wizard 



Step 13 : Click agsqlprod failover cluster and modify the cluster core resources . In this we need to add secondary IP for both nodes ( 10.0.1.11 and 10.0.3.11 )

Once we added both secondary IP's one of the IP will be come to online 

If we have not added secondary IP , it will show as an error like below 



Configuring SQL Server Services  


Step 14 : Once all the steps are completed on Failover cluster manager , modify the SQL Service Account to AD service account 

Step 15 : Next right click the SQL Server Service in configuration manager and enable the Always on High Availability on ag-sql-node1 and ag-sql-node2 SQL instances 


Create and Configuring the Availability Group 

Step 16 : Right click the always on group wizard and create the availability group as agsqldb


Step 17 : Based on the requirements add the number of replicas ,


Step 18 : Below are the endpoints and make sure allowed below ports between the cluster nodes 


Step 19 : Then create availability group listener with remaining secondary IP ( 10.0.1.12 and 10.0.3.12 )


Step 20 : Once everything is completed click Next to create availability group 



Once its created we can able to see the Cluster Manager Role in the Failover cluster manager console ,




Ready to sync the Data from Primary to Secondary 



After all availability group is healthy and primary and secondary nodes are synchronized 

Thanks for Reading !!! Any corrections or any doubt please contact me directly !!! 

Monday, September 24, 2018

MySQL for Entity Framework

The tutorial guides you through MySQL Database connectivity with Entity Framework

We are facing entity framework database provider compatible with version could not be found for your data connection issue while connecting with MySQL 5.7 and 8.0. Problem with MySql.Data.Entity.EF6.dll missing and not able to connect with Entity framework

As per my requirements, the customer wants to migrate MSSQL to MySQL server. So I want to use any version of MySQL, installed with below configurations and achieve the connectivity 
1.MySQL Server : 5.6.41
2.MySQL Connector/Net : 6.9.12
3.Entity Framework : 6.2.0
4.Visual Studio : Professional 2017
Configuration steps are below,

1. Download required versions,

MySQL - https://dev.mysql.com/downloads/windows/installer/5.7.html

Visual Studio - https://visualstudio.microsoft.com/downloads/

2. Once downloaded, Launch web API project and Install entity framework 6.2.0 through NuGet Packages 


3.Install the MySQL 5.6 with Connector/Net 

Once you installed, you will see below dll files local server 
C:\Program Files (x86)\MySQL\MySQL Connector Net 6.9.12\Assemblies\v4.5


4. Add the dll files in Project reference and rebuild the project 

Once we rebuild, open the web.config file add the below lines if already provider tag part exist remove and add it 


5. Again rebuild the project and add the connection. You will successfully be connected


Thanks for reading ...