datablogs: Database Administration
Showing posts with label Database Administration. Show all posts
Showing posts with label Database Administration. Show all posts

Friday, February 24, 2023

How to Automate MongoDB Database Backups in Linux

We have setup of One Primary with Multiple Secondary

Even if we configured highly available setup and backups , native backup are so special to take it and keep it somewhere in the cloud 

using the below script we can easily schedule backup in Linux environments

export PATH=/bin:/usr/bin:/usr/local/bin #Decalre Today Date TODAY=`date +"%d%b%Y"` #Declare Variables Required to pass for mongo dump command DB_BACKUP_PATH='/mnt/mongobackup' MONGO_HOST='localhost' MONGO_PORT='27017' MONGO_USER='xxxxxxxxxxx' MONGO_PASSWD='xxxxxxxxxxxxx' DATABASE_NAMES='ALL' #Remove Old Backup Files find ${DB_BACKUP_PATH} -name "*.zip" -type f -mtime +3 -delete find ${DB_BACKUP_PATH} -type d -mtime +3 -exec rm -rf {} \; #Create Directory for Backup mkdir -p ${DB_BACKUP_PATH}/${TODAY} cd ${DB_BACKUP_PATH}/${TODAY}/ if [ ${DATABASE_NAMES} = "ALL" ]; then echo "You have choose to backup all database" mongodump --uri="mongodb://${MONGO_USER}:${MONGO_PASSWD}@${MONGO_HOST}:${MONGO_PORT}" else echo "Running backup for selected databases" for DB_NAME in ${DATABASE_NAMES} do mongodump --uri="mongodb://${MONGO_USER}:${MONGO_PASSWD}@${MONGO_HOST}:${MONGO_PORT}/${DB_NAME}" done fi #Compress The Backup cd ${DB_BACKUP_PATH}/${TODAY} zip -r ${DB_BACKUP_PATH}_${TODAY}.zip ${DB_BACKUP_PATH}/${TODAY} cd ${DB_BACKUP_PATH}/${TODAY} #Copy the Compressed file into Azure Container using Shared Access Token azcopy cp ${DB_BACKUP_PATH}_${TODAY}.zip "" --recursive=true #Send Mail with Backup Logs if [ $? -ne 0 ] then echo "Mongo Native backup Failed in $(hostname) $(date). Please contact administrator." | mail -r -s "Mongo Native backup Failed $(hostname)" < /mongodata/cronscripts/mongo_backup_log.log else echo "Mongo Native backup completed in $(hostname)." | mail -r -s "Mongo Native backup completed in $(hostname)" < /mongodata/cronscripts/mongo_backup_log.log fi

Friday, February 17, 2023

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 2

Its took little long time to derive our approach tunning on azure data factory with Azure Synapse why because we need to run the system atleast two months validate our approach is smooth 

Yes its all running good as expected performance on the ETL loads and Processes 

Here are the major things we need to take care on Azure Synapse Dedicated Pool ,

  1. Dedicated SQL Pool Scaling 
  2. Dedicated SQL Pool Workload Management

Dedicated SQL Pool Scaling : 

We have decide below metrices to optimize the Azure Synapse 

  1. Data Load and Processing timelines
  2. Critical Process running timelines 
  3. Minimal Processing timelines 
  4. Reporting and Visualtions Timelines 
Based on above requirements , we can easily scalable the Dedicated SQL Pool based on the load 

Consider we need to load the data and process with in 2 hours , we can upscale the Dedicated SQL Pool to as like we want 

Key Part to increase the DW is concurrency , Instance Capacity and allocated resource for each queries 

Before start our critical process , we can automate upscale process with ADF Pipelines itself . So many blogs available to configure that 

Best method configure the authentication method with service principle ,

Dedicated SQL Pool Workload Management : 

We have decide below metrices to prepare workload management 

  1. Short Queries
  2. Critical Queries 
  3. High Intensive Queries 
  4. Reporting Queries

Based on the classifications , we have to split workload group for above queries 

Step 1 :

We need to create login and user for workload management in Dedicated SQL Pool 

--CREATE LOGIN [Analyticsprocess] WITH PASSWORD='xxxxxxxxx'

--CREATE USER [Analyticsprocess] FOR LOGIN [Analyticsprocess]

--GRANT CONTROL ON DATABASE::[sql-datablogs-dw] TO 


Step 2 :

Consider you have upscaled instance into DW400c below are the resources allocation for the DW400c instance concurrency requirements 

In the workload group --> New workload group --> Click ELT

Consider analytics process user is used for high intensive queries we have to allocate as much as minimum resource for workload group 

Click Classifiers --> Add Classifiers --> Name it as ELT --> and specify Member should be the login and Label is important to mention 

Once click Add , we will get below concurrency range based on DW400c

By Default , its having system level workload group to handle the queries but its not effective we have to force our workload group 

Step 3 :

This is very important to utilize the workload group properly . We need to specify Label on the heavy processing queries so that it will utilized properly  

CREATE TABLE rpt_datablogs_finalreport.smgreport WITH (HEAP,DISTRIBUTION = REPLICATE) AS select * into rpt_datablogs_finalreport.smgreport_vw from rpt_datablogs_finalreport.vw_smgreport OPTION (LABEL='highintensiveprocess')


So these two things given much improvement on performance as well as reduced operational cost below 60% 

Still few more things to learn to optimize it on Azure Data Factory lets see on that next blog 

Happy Troubleshooting !!! 

Monday, January 30, 2023

Update Backup and Restore in Same SQL Server RDS

Update of Previous post ,

As mentioned in below blog , past years we cant restore database into same SQL Server RDS

But now you can able to restore without any issues from recent releases of SQL Server RDS 

Thursday, December 15, 2022

Sunday, August 28, 2022

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 1

Three hours of long running pipeline is reduced to run in 35 Minutes when we given right kind of top bottom optimization like our Body !!! 

Yes Sometimes if we give proper core and cross workouts you can run 10km in 55 Minutes !!! 

Its happened to me But Not Now 😆😆😆 

Oh Oh we are away from Data ground !!! Lets Move into Azure Synapse tunning,

In the starting phase every piece of code and pipeline process was really trouble us to optimize it , whatever we do on tunning it will run as same time 

Waited long time until each process to complete as its running 3 hours !!!

But Reecha blog helped something to start with basic check and given some hope to fine tune it 

Remember below piece of monitoring code always help us to too dig more 

--Check Long Running Query in Azure Synapse

FROM sys.dm_pdw_exec_requests
WHERE status not in ('Completed','Failed','Cancelled')
  AND session_id <> session_id()
ORDER BY submit_time DESC;
--Find the distributed query plan steps for long running query

SELECT * FROM sys.dm_pdw_request_steps WHERE request_id = 'xxxxxx'  ORDER BY step_index; 

***Replace your request_id from first query result  
Sample Running Query :

Initially we thought data volume is huge , so we are expecting this much delay on complete . But distributed query Plans are given more finishing time for each queries in Azure Synapse

We will go step by step process of tunning , 

Multiple ways we can see the Execution plan of your Azure Synapse Queries

  1. Using Azure Console --> Click you Dedicated Pool --> Click Query Activity --> Each Queries you have Query Plans 

        2.Using SSMS , you can run below query and get your explain plan in XML Format 

                EXPLAIN select * from dbo_datablogs.vw_fact_transferdata (nolock)

So ,We have figured out the execution plan and further need to fix below things to make it faster 

We need to reduce data movement operations ( Shuffle Move , Trim Move , Partition Move ) and needs to have proper indexing on your business tables 

Based on your data distribution on tables and indexing , queries will be executed on SQL Pool . We need to distribute the tables accordingly . So Major part of the tunning on table level only 

Once we done that in proper way we have achieved your milestone ,

How to we distribute table in Azure Synapse ? 

Two types of distribution is available in Azure Synapse . Round Robin and Hash Distribution 

Round Robin Distribution
  • By Default , If you create table it will create in Round Robin Distribution
  • Table rows are distributed in all distribution 
  • Main purpose of this table to improve loading speed we can use round robin distribution
  • If your table is just junk or doesn't have proper any constraints and keys use round robin distribution
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [SourceObjectSettings] [nvarchar](max) NULL) 

Hash Distribution
  • If your table is having frequent DML operations ( Insert , Update , Delete ) use hash distribution
  • It will increase performance 200% of your queries 
  • It will distribute the data to nodes based on your distributed column 
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [SourceObjectSettings] [nvarchar](max) NULL) 


This is another way to storing the table in SQL Pool 
  • Full copy of the table will be available in all distribution to avoid data movement 
  • Table is having less size use replicate option
CREATE TABLE [dbo_datablogs].[ControlTable_blogs](
                [SourceObjectSettings] [nvarchar](max) NULL) 

Thumb of Rules of Creating tables 

First Level : Find the absolute unique column or else use partial unique column for creating table as Hash Distribution 

Second Level : If your table doesn't have proper unique columns , try to create the table with used columns in Join , Group By , Distinct , Over and Having Clauses on Procedures or Select Clauses 

Lets examine First Level , 

When creating the table with Round Robin and created table with below script ,

CREATE TABLE dbt_datablogs.fact_transferdata
AS SELECT * FROM dbt_datablogs.vw_fact_transferdata

Its taken 28 Minutes to complete

So we figured out good candidate key and created table with below script 

CREATE TABLE dbt_datablogs.fact_transferdata
    DISTRIBUTION = HASH(tranferaddressid)
AS SELECT * FROM dbt_datablogs.vw_fact_transferdata

Its taken 4 Minutes to complete 

We will see the huge difference when creating proper keys 

Lets examine Second Level ,

When we don't have proper keys , based on your business logics or select clause join conditions we can try to evaluate with multiple columns and create proper hash function to improve performance 

Lets do the smaller table tunning on this blog , you can try the same in your own environments , 

Below table is small but its having shuffle move and its delaying the process 11 Minutes ,

Then based on shuffle columns , add the proper hash function and recreate the table 

CREATE TABLE dbt_datablogs.fact_memberinfo
AS SELECT * FROM dbt_datablogs.vw_memberinfo

Once created the table again examined the Execution plan for the same statement and process is finished with in a seconds 

This is not an end , Also we have tunned few things on Data Factory and Azure Synapse Workload Management to get expected performance , we will see further on next blog 

To learn more in-depth , Please use below references from Microsoft Site as well 

References : 

Thursday, August 11, 2022

SQL Server Point in Time Recovery

 Oooooouch !!! I have deleted critical data in the ledger table !!!

When we ran delete or drop Script on any database without double check , Its Really big problem for the end users !!! Also you wont sleep 😁😁😁

If you have GODDBA , you are saved millions dollars of your data also your valuable customers . 

Oh Great 😀😀😀 How we can achieve that ? 

In SQL Server Multiple Database Backup Methods are available to tackle that , 

If SQL Server database is in full recovery model we can take transaction log backups on every hour/Minute/Seconds into local or remote drives . 

So, Yes am taking Log Backup 😕😕😕 How to we restore particular time ? 

Using with transaction logs we can bring back the data as much as you want depends on your log backup strategy . 

To achieve that here is the automated point in time restore script for your valuable environment ,

Scripts are available in

So finally you have saved your millions dollars worth of data in 10 Minutes !!!

USE master

        @DatabaseOLDName			sysname  = 'test',										
        @DatabaseNewName			sysname  = 'test1',										
        @PrimaryDataFileName		        sysname  = 'test',									
	@SecDataFileName                        sysname  =  NULL,
        @DatabaseLogFileName		        sysname  = 'test_log',								
	@PrimaryDataFileCreatePath              sysname  = 'D:\MISC\Bkp\testdata.mdf',			
        @SecDataFileCreatePath		        sysname  = NULL,									
        @SecDataFileCreatePath1		        sysname  = NULL,									
        @DatabaseLogFileCreatePath              sysname  = 'D:\MISC\Bkp\test_log.ldf',		
        @PITRDateTime				datetime = '2022-08-11T20:44:11';

DECLARE @command				nvarchar(MAX),
        @OldPhysicalPathName		        nvarchar(MAX),
        @FullBackupDateTime			datetime,
        @DiffBackupDateTime			datetime,
        @LogBackupDateTime			datetime,
        @message				nvarchar(MAX);

     MOVE N''' + @PrimaryDataFileName + N''' TO N''' + @PrimaryDataFileCreatePath + N''','
           + COALESCE('
     MOVE N''' + @SecDataFileName + ''' TO N''' + @SecDataFileCreatePath + ''',', '')
           + N'
     MOVE N''' + @DatabaseLogFileName + N''' TO N''' + @DatabaseLogFileCreatePath + N''';';

SELECT     TOP (1) @OldPhysicalPathName = bmf.physical_device_name,@FullBackupDateTime = bs.backup_start_date
FROM       msdb.dbo.backupset AS bs INNER JOIN msdb.dbo.backupmediafamily AS bmf ON  bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName AND  bs.type= 'D' AND  bs.backup_start_date < @PITRDateTime
ORDER BY   bs.backup_start_date DESC;

SET @message = N'Starting restore of full backup file '+ @OldPhysicalPathName + N', taken ' + CONVERT(nvarchar(30), @FullBackupDateTime, 120);

RAISERROR(@message, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',


SELECT     TOP (1) @OldPhysicalPathName = bmf.physical_device_name,@DiffBackupDateTime = bs.backup_start_date
FROM       msdb.dbo.backupset   AS bs INNER JOIN msdb.dbo.backupmediafamily AS bmf ON  bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName AND  bs.type  = 'I' AND  bs.backup_start_date >= @FullBackupDateTime AND  bs.backup_start_date< @PITRDateTime
ORDER BY   bs.backup_start_date DESC;

    SET @message = N'Starting restore of differential backup file ' + @OldPhysicalPathName + N', taken ' + CONVERT(nvarchar(30), @DiffBackupDateTime, 120);

    RAISERROR(@message, 0, 1) WITH NOWAIT;

EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',

SET @command = N'RESTORE LOG @DatabaseNewName
FROM DISK = @OldPhysicalPathName

SELECT     bmf.physical_device_name,
FROM       msdb.dbo.backupset         AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
   ON      bmf.media_set_id = bs.media_set_id
WHERE      bs.database_name = @DatabaseOLDName
AND        bs.type                 = 'L'
AND        bs.backup_start_date    >= COALESCE(@DiffBackupDateTime, @FullBackupDateTime)
ORDER BY   bs.backup_start_date ASC;


INTO @OldPhysicalPathName,

    SET @message = N'Starting restore of log backup file '
                   + @OldPhysicalPathName + N', taken '
                   + CONVERT(nvarchar(30), @LogBackupDateTime, 120);
    RAISERROR(@message, 0, 1) WITH NOWAIT;
    EXEC sys.sp_executesql @command,
                           N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',

    IF @LogBackupDateTime >= @PITRDateTime

    INTO @OldPhysicalPathName,


SET @command = N'RESTORE DATABASE @DatabaseNewName

RAISERROR('Starting recovery', 0, 1) WITH NOWAIT;
EXEC sys.sp_executesql @command,
                       N'@DatabaseNewName sysname, @OldPhysicalPathName nvarchar(260), @PITRDateTime datetime',

Wednesday, February 16, 2022

SQL Server Always On availability group cluster in the AWS Cloud

Microsoft gives HA features like a charm . Lower to higher deployment costs its giving many features as per business requirements . Replication , Mirroring , Log shipping and Always On many features available to build HA Setup in On Premises . 

Like wise , we can setup all the above features in Cloud as well . In that we can see Always on availability group cluster in this blog 

What is Always On Availability Group?

  • An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. 
  • You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together.  

Environment Setup for Always on Availability Group ,

Launched one Active Directory and two SQL Nodes with below range . Detailed setup for environment steps are below ,

Below are the detailed steps for environment steps ,
Step 1 : Create ag-sql-vpc with IPv4 CIDR range

Step 2 : Create two private subnets ( , ) for SQL Nodes and one public subnet ( ) for Active Directory  

Step 3 : Launched the windows instances with two secondary ip's for Failover Cluster and Always on Listener
In this POC Setup , Launched windows instance and installed SQL Server Developer edition . Also we can launch Windows with SQL Server 2016 based on your requirements 

Step 4 : Change the computer properties and rename the instance names accordingly 
Step 5 : Completed the AD Server configuration and its named as ag-sql-AD , After that change DNS server address in network properties in ag-sql-node1 and ag-sql-node2 ( is static IP of AD Server )

Step 6 : Once modified the DNS configuration reboot the server and login with AD administrator account 
Step 7 : Once logged in with AD login , Install the failover clustering and below dependent features in ag-sql-node1 and ag-sql-node2

Configuring Shared Drive for Backup and Restore 

Step 8 : Between the ag-sql-node1 and ag-sql-node2 needs to take backup and log backups for Always on background process

Step 9 : Create folder in ag-sql-node2 and share with everyone in AD account
Step 10 : Take one time backup of DW_Mart and DataLake in that shared folder . Created Shared drive will be used while always on group creation 

Failover Cluster Configuration 

Step 11 : Open the Failover Cluster Manager console and Create the cluster . Browse and add the both servers 

Step 12 : Once all the steps finished , create the cluster wizard 

Step 13 : Click agsqlprod failover cluster and modify the cluster core resources . In this we need to add secondary IP for both nodes ( and )

Once we added both secondary IP's one of the IP will be come to online 

If we have not added secondary IP , it will show as an error like below 

Configuring SQL Server Services  

Step 14 : Once all the steps are completed on Failover cluster manager , modify the SQL Service Account to AD service account 

Step 15 : Next right click the SQL Server Service in configuration manager and enable the Always on High Availability on ag-sql-node1 and ag-sql-node2 SQL instances 

Create and Configuring the Availability Group 

Step 16 : Right click the always on group wizard and create the availability group as agsqldb

Step 17 : Based on the requirements add the number of replicas ,

Step 18 : Below are the endpoints and make sure allowed below ports between the cluster nodes 

Step 19 : Then create availability group listener with remaining secondary IP ( and )

Step 20 : Once everything is completed click Next to create availability group 

Once its created we can able to see the Cluster Manager Role in the Failover cluster manager console ,

Ready to sync the Data from Primary to Secondary 

After all availability group is healthy and primary and secondary nodes are synchronized 

Thanks for Reading !!! Any corrections or any doubt please contact me directly !!! 

Wednesday, February 2, 2022

Redis to Elasticache Migration in AWS

Comparing Redis , AWS Elasticache giving multiple options to cache the data in cloud . Its enhanced with two ways of Access control options in PaaS Service 

Below are the few mission handling features we loved in elastic cache ,

  • Detecting data failures and recovery the nodes automatically 
  • Multi AZ features with automatic failover to replication nodes 
  • Redis Cluster mode enabled supports data partitioning up to 500 shards in different zones
  • Redis managed backups , automatic failure detection and log files transfer to Kinesis Firehouse and cloud watch 
  • Easy online migration from any Redis source 
  • Encryption at rest and Encryption in transit are supported 
  • Data volume up to peta bytes with micro second response time  
Let start migrating data from Redis to Elasticache ,

Preparing our source and target Redis nodes for migration 

  1. Launched two EC2 machines and installed redis 5.x version 
  2. Configured replication between master and replica
        To avoid downtime , we are doing online migration from replication server . so we have modified below values in replica server . Its based on your source modify the below values

      3.Configured replication between master and replica
  • protected-mode no
  • bind 6379
  • requirepass should be disabled

       Note : Transferring source master or replica should not have AUTH configured and protected mode disabled 

      4.Launch Elastic cache instance with below steps 

Cluster Mode Disabled ( Redis Engine 5.0.5 or Higher ) 

Multi-AZ Should be Enabled 

Encrypt at rest and in-transit should be disabled

If we have not followed above all steps , we will get below popup while we start the migration 

Starting Migration 

In Elastic console Click --> Action --> Migrate data from endpoint 

Once clicked , we can type the Source Redis Endpoint ( Private IP ) and Port ( Customized Port ) in below console ,

Then click Start Migration and follow the logs from Events in same console , 

If we get any error , It will be printed in the events log

Once its started , Restart the Master of Elasticache node one time , In our case we need to restart the replica server because its our source

Once restarted verify the info replication in Elasticache nodes 

Once started successfully , we can't perform any other operation in Elasticache nodes ,

Verifying the Data Migration Progress 

We have inserted sample record in the EC2 Master Node and verified the same in the replica and as well as in Elasticache 

Data looks good , once all the data transferred from IaaS to PaaS . Complete the migration in the Elasticache console and point the application endpoints to Elasticache node 

Thanks for Reading !!!