datablogs

Tuesday, November 14, 2023

PostgreSQL Table Partition on AWS RDS

Selva | November 14, 2023 | No comments | Amazon RDS, AWS, Database Administration, Development, Performance Tunning, Postgresql

If we grow bigger in the business , seamlessly our customer and transaction data also increases . In the meantime performance needs to consider as well

So in this case of bigger tables indexes will not help us to achieve good performance on peak times . Alternatively we have partitioning to split the tables data into multiple pieces on all the relational database environments

Like wise we are going to do range partition on sample table in PostgreSQL Database , In PostgreSQL three types of partition methods are available ,

Below are the important concern in the PostgreSQL Partition ,

Possible to attach regular table into partitioned one
Not Possible to transform regular table to partitioned one

So based on the above formula , we have tried to transform regular table into partitioned one for your reference

Any one can use this example and perform partitioning in AWS PostgreSQL RDS easily

Click GitHub Link for Code : AWS-PostgreSQL-RDS-Table-Partition

Step 1 : Create base datablogspaycheck table and insert some sample records

DROP TABLE IF EXISTS datablogspaycheck CASCADE;
DROP SEQUENCE IF EXISTS public.paycheck_id_seq;
CREATE SEQUENCE public.paycheck_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
create table datablogspaycheck
(
payment_id int NOT NULL DEFAULT nextval('public.paycheck_id_seq'::regclass),
created timestamptz NOT NULL,
updated timestamptz NOT NULL DEFAULT now(),
amount float,
status varchar DEFAULT 'new'
);
CREATE INDEX idx_paycheck ON datablogspaycheck (created);
INSERT INTO datablogspaycheck (created) VALUES (
generate_series(timestamp '2023-01-01'
, now()
, interval '5 minutes') );

Step 2 : Rename base table with new name

ALTER TABLE datablogspaycheck RENAME TO datablogspaycheck_basetable;

Step 3 : Create Partitioned table

create table datablogspaycheck
(
payment_id int NOT NULL DEFAULT nextval('public.paycheck_id_seq'::regclass),
created timestamptz NOT NULL,
updated timestamptz NOT NULL DEFAULT now(),
amount float,
status varchar DEFAULT 'new'
)PARTITION BY RANGE (created);

Step 4 : Create Separate Partition for each create date

CREATE TABLE datablogspaycheck_202303 PARTITION OF datablogspaycheck
FOR VALUES FROM ('2023-01-01') TO ('2023-03-01');

CREATE TABLE datablogspaycheck_20230304 PARTITION OF datablogspaycheck
FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');

CREATE TABLE datablogspaycheck_202304 PARTITION OF datablogspaycheck
FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

CREATE TABLE datablogspaycheck_202311 PARTITION OF datablogspaycheck
FOR VALUES FROM ('2023-05-01') TO ('2023-11-01');

CREATE TABLE datablogspaycheck_2024 PARTITION OF datablogspaycheck
FOR VALUES FROM ('2023-11-01') TO ('2024-01-01');

Step 5 : Migrate the all records

insert into datablogspaycheck (payment_id,created,updated,amount,status) select payment_id,created,updated,amount,status from datablogspaycheck_basetable;

Step 6 : Validate each partition

select * from datablogspaycheck_202303 order by 2 desc
select * from datablogspaycheck_20230304 order by 2 desc
select * from datablogspaycheck_202311 order by 2 desc

Its done , Easily migrated normal table data into partitioned table

Thanks for Reading !!!

Sunday, June 25, 2023

Deep dive into Babelfish Compass

Selva | June 25, 2023 | 1 comment | Amazon RDS, AWS, Bebelfish, Migration, Postgresql, SQL Server

Wow !!! If suppose on the migration projects we need to more stuffs and things to convert when coming to procedures , functions and other database objects

But AWS is providing good things to migrate with easy steps , Ha Ha ... Don't overthink still you need to do 40% code migration works

In this part Babelfish Compass is giving various options to support migration the codes from SQL Server to PostgreSQL with bebelfish feature enabled PaaS servers

Below are easy steps on the Script Conversations

Prerequisites

Install a 64-bit Java Runtime Environment (JRE) version 8 or higher

1.Download Compass Tool in Below

https://github.com/babelfish-for-postgresql/babelfish_compass/releases/tag/v.2023-03-a

Needs to download .zip file to work with Babelfish Compatibility

2.Unzip and Place the files in separate folder

3.Be Ready with you SQL Database Generated Scripts file and Copy it in Same Folder

Database name has been highlighted in below ,

4.Next , we can start running report with Babelfish Compass

C:\Users\Admin\Downloads\BabelfishCompass_v.2023-03-a\BabelfishCompass>BabelfishCompass.bat reportfinal datablogsdbprod.sql

5.Finally Reports are generated in Documents Directory

6.We can review the reports in any format , for me its easy with in HTML browser

Just double click the HTML document , So like below we will get supported and unsupported features details in depth .

We can directly go and debug the code . Also bebelfish compass is having plenty of methods to rewrite the code , we will check it in next blog

Happy Coding !!!

Wednesday, June 21, 2023

Oracle RDS Audit log enable

Selva | June 21, 2023 | No comments | Amazon RDS, Audit, AWS, Database Administration, Oracle

Oracle Audit Log :

Oracle Audit Log refers to the feature in Oracle Database that records and stores information about various database activities and events. It provides a mechanism to track and monitor user activities, system events, and changes made to the database.

User Logins: Recording user login attempts and authentication information.
Database Activities: Logging SQL statements executed by users, including select, insert, update, and delete operations.
Privilege Usage: Monitoring the usage of privileges, such as granting or revoking permissions.
Schema Changes: Tracking modifications to database objects, such as creating or altering tables, views, or indexes.
System Events: Recording system-level events, such as startup and shutdown of the database.
Security Violations: Detecting unauthorized access attempts or suspicious activities.
Administrative Operations: Logging administrative tasks performed by database administrators, such as user management or database configuration changes.

The Oracle Audit Log provides an essential tool for security, compliance, and troubleshooting purposes.

Types of Auditing in Amazon RDS for Oracle :

Standard Auditing
Unified Auditing
Fine-grained Auditing

We are going to see , how do we enable Standard auditing in Oracle RDS

How to enable Audit Log in Oracle RDS?

Make sure you have enabled custom parameter group for Oracle RDS

Modify below values for Audit_Trail Parameter

Audit_Trail - DB, EXTENDED

Next ,Just needs to modify below DDL or DML statements to capture the logs from the server

AUDIT DELETE ANY TABLE;

AUDIT DELETE TABLE BY USER_01 BY ACCESS;

AUDIT DELETE TABLE BY USER_02 BY ACCESS;

AUDIT ALTER, GRANT, INSERT, UPDATE, DELETE ON DEFAULT;

AUDIT READ ON DIRECTORY datapump_dir;

Its all done , we have enabled required logs to capture for security purpose

How to we monitor Audit Logs ?

We can just run the below command get the captured audit logs in Oracle RDS ,

SELECT * FROM DBA_AUDIT_TRAIL order by 1 desc

Its just for normal scenario , explained the process . Still we can separate Audit Table space and many further things are available in Oracle . Let see on another blogs

Happy Auditing !!!

Friday, February 24, 2023

How to Automate MongoDB Database Backups in Linux

Selva | February 24, 2023 | No comments | Automation, Backup, Database Administration, MongoDB, NoSQL

We have setup of One Primary with Multiple Secondary

Even if we configured highly available setup and backups , native backup are so special to take it and keep it somewhere in the cloud

using the below script we can easily schedule backup in Linux environments

https://github.com/selvackp/MongoNativeBackup-/blob/main/mongo_dump.sh

export PATH=/bin:/usr/bin:/usr/local/bin
#Decalre Today Date
TODAY=`date +"%d%b%Y"`

#Declare Variables Required to pass for mongo dump command
DB_BACKUP_PATH='/mnt/mongobackup'
MONGO_HOST='localhost'
MONGO_PORT='27017'
MONGO_USER='xxxxxxxxxxx'
MONGO_PASSWD='xxxxxxxxxxxxx'
DATABASE_NAMES='ALL'

#Remove Old Backup Files 
find ${DB_BACKUP_PATH} -name "*.zip" -type f -mtime +3 -delete

find ${DB_BACKUP_PATH} -type d -mtime +3 -exec rm -rf {} \;

#Create Directory for Backup
mkdir -p ${DB_BACKUP_PATH}/${TODAY}
cd ${DB_BACKUP_PATH}/${TODAY}/

if [ ${DATABASE_NAMES} = "ALL" ]; then
	echo "You have choose to backup all database"
	mongodump --uri="mongodb://${MONGO_USER}:${MONGO_PASSWD}@${MONGO_HOST}:${MONGO_PORT}"
else
	echo "Running backup for selected databases"
	for DB_NAME in ${DATABASE_NAMES}
	do
	mongodump --uri="mongodb://${MONGO_USER}:${MONGO_PASSWD}@${MONGO_HOST}:${MONGO_PORT}/${DB_NAME}"
	done
fi

#Compress The Backup
cd ${DB_BACKUP_PATH}/${TODAY}

zip -r ${DB_BACKUP_PATH}_${TODAY}.zip ${DB_BACKUP_PATH}/${TODAY}

cd ${DB_BACKUP_PATH}/${TODAY}

#Copy the Compressed file into Azure Container using Shared Access Token
azcopy cp ${DB_BACKUP_PATH}_${TODAY}.zip "https://xxxxxxxxxxx.blob.core.windows.net/xxxxxxxxxxxx?sp=w&st=xxxxxTxxxxxxxZ&se=xxxxxxZ&spr=https&sv=2021-06-08&sr=c&sig=csdfcdsxxxxxxxxxxxxxxx" --recursive=true

#Send Mail with Backup Logs 
if [ $? -ne 0 ]
then
        echo "Mongo Native backup Failed in $(hostname) $(date). Please contact administrator." | mail -r mail@datablogs.com -s "Mongo Native backup Failed $(hostname)" dbsupport@datablogs.com < /mongodata/cronscripts/mongo_backup_log.log

        else

        echo "Mongo Native backup completed in $(hostname)." | mail -r mail@datablogs.com -s "Mongo Native backup completed in $(hostname)" dbsupport@datablogs.com < /mongodata/cronscripts/mongo_backup_log.log
fi

Friday, February 17, 2023

Restore and Point in time Restore with Percona Backup of MongoDB on another instance

Selva | February 17, 2023 | No comments | Database Administration, medium, MongoDB, NoSQL, percona, Point In Time Restore

Its easy to recover MongoDB Backup using Percona Backup for MongoDB

https://medium.com/@datablogs/restore-and-point-in-time-restore-with-percona-backup-of-mongodb-on-another-instance-4766f4468443

How to tune long running Queries on Azure Synapse DW (Dedicated Pool) - Part 2

Selva | February 17, 2023 | No comments | Azure Cloud, Azure Synapse Analytics, Database Administration, Performance Tunning, SQL Server

Its took little long time to derive our approach tunning on azure data factory with Azure Synapse why because we need to run the system atleast two months validate our approach is smooth

Yes its all running good as expected performance on the ETL loads and Processes

Here are the major things we need to take care on Azure Synapse Dedicated Pool ,

Dedicated SQL Pool Scaling
Dedicated SQL Pool Workload Management

Dedicated SQL Pool Scaling :

We have decide below metrices to optimize the Azure Synapse

Data Load and Processing timelines
Critical Process running timelines
Minimal Processing timelines
Reporting and Visualtions Timelines

Based on above requirements , we can easily scalable the Dedicated SQL Pool based on the load

Consider we need to load the data and process with in 2 hours , we can upscale the Dedicated SQL Pool to as like we want

Key Part to increase the DW is concurrency , Instance Capacity and allocated resource for each queries

Before start our critical process , we can automate upscale process with ADF Pipelines itself . So many blogs available to configure that

Best method configure the authentication method with service principle ,

Dedicated SQL Pool Workload Management :

We have decide below metrices to prepare workload management

Short Queries
Critical Queries
High Intensive Queries
Reporting Queries

Based on the classifications , we have to split workload group for above queries

Step 1 :

We need to create login and user for workload management in Dedicated SQL Pool

--CREATE LOGIN [Analyticsprocess] WITH PASSWORD='xxxxxxxxx'
--CREATE USER [Analyticsprocess] FOR LOGIN [Analyticsprocess]
--GRANT CONTROL ON DATABASE::[sql-datablogs-dw] TO
Analyticsprocess

Step 2 :

Consider you have upscaled instance into DW400c below are the resources allocation for the DW400c instance concurrency requirements

In the workload group --> New workload group --> Click ELT

Consider analytics process user is used for high intensive queries we have to allocate as much as minimum resource for workload group

Click Classifiers --> Add Classifiers --> Name it as ELT --> and specify Member should be the login and Label is important to mention

Once click Add , we will get below concurrency range based on DW400c

By Default , its having system level workload group to handle the queries but its not effective we have to force our workload group

Step 3 :

This is very important to utilize the workload group properly . We need to specify Label on the heavy processing queries so that it will utilized properly

CREATE TABLE rpt_datablogs_finalreport.smgreport WITH (HEAP,DISTRIBUTION = REPLICATE) AS select * into rpt_datablogs_finalreport.smgreport_vw from rpt_datablogs_finalreport.vw_smgreport OPTION (LABEL='highintensiveprocess')

So these two things given much improvement on performance as well as reduced operational cost below 60%

Still few more things to learn to optimize it on Azure Data Factory lets see on that next blog

Happy Troubleshooting !!!