datablogs: AWS

Tuesday, November 14, 2023

PostgreSQL Table Partition on AWS RDS

If the business grows bigger, our customer and transaction data seamlessly grows along with it. In the meantime, performance needs to be considered as well.

In that case, indexes alone will not help us achieve good performance on bigger tables at peak times. Alternatively, we have partitioning to split a table's data into multiple pieces, available in all the major relational database environments.

Likewise, we are going to apply range partitioning to a sample table in a PostgreSQL database. In PostgreSQL, three partitioning methods are available: range, list, and hash.


Below are the important points to note about PostgreSQL partitioning:

  • It is possible to attach a regular table to a partitioned one (see the sketch just below)
  • It is not possible to transform a regular table into a partitioned one
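
As a quick illustration of the first point, an existing regular table can be attached as a partition when its rows fall inside the declared range. A minimal sketch (datablogspaycheck_2022 is a hypothetical pre-2023 archive table, not part of the walkthrough below):

ALTER TABLE datablogspaycheck
    ATTACH PARTITION datablogspaycheck_2022
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');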

So based on those constraints, we have recreated the regular table as a partitioned one and migrated its data, for your reference.

Anyone can use this example to perform partitioning on AWS RDS for PostgreSQL easily.

Click GitHub Link for Code : AWS-PostgreSQL-RDS-Table-Partition

Step 1 : Create base datablogspaycheck table and insert some sample records 

DROP TABLE IF EXISTS datablogspaycheck CASCADE;

DROP SEQUENCE IF EXISTS public.paycheck_id_seq;

CREATE SEQUENCE public.paycheck_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

CREATE TABLE datablogspaycheck
(
    payment_id int NOT NULL DEFAULT nextval('public.paycheck_id_seq'::regclass),
    created timestamptz NOT NULL,
    updated timestamptz NOT NULL DEFAULT now(),
    amount float,
    status varchar DEFAULT 'new'
);

CREATE INDEX idx_paycheck ON datablogspaycheck (created);

INSERT INTO datablogspaycheck (created)
VALUES (generate_series(timestamp '2023-01-01', now(), interval '5 minutes'));

Step 2 : Rename the base table to a new name

ALTER TABLE datablogspaycheck RENAME TO datablogspaycheck_basetable;

Step 3 : Create Partitioned table 

CREATE TABLE datablogspaycheck
(
    payment_id int NOT NULL DEFAULT nextval('public.paycheck_id_seq'::regclass),
    created timestamptz NOT NULL,
    updated timestamptz NOT NULL DEFAULT now(),
    amount float,
    status varchar DEFAULT 'new'
) PARTITION BY RANGE (created);

Step 4 : Create a separate partition for each created-date range

CREATE TABLE datablogspaycheck_202303 PARTITION OF datablogspaycheck
    FOR VALUES FROM ('2023-01-01') TO ('2023-03-01');

CREATE TABLE datablogspaycheck_20230304 PARTITION OF datablogspaycheck
    FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');

CREATE TABLE datablogspaycheck_202304 PARTITION OF datablogspaycheck
    FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

CREATE TABLE datablogspaycheck_202311 PARTITION OF datablogspaycheck
    FOR VALUES FROM ('2023-05-01') TO ('2023-11-01');

CREATE TABLE datablogspaycheck_2024 PARTITION OF datablogspaycheck
    FOR VALUES FROM ('2023-11-01') TO ('2024-01-01');
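
Optionally, a DEFAULT partition (supported in PostgreSQL 11 and later) can catch rows whose created value falls outside all of the ranges above instead of failing the insert. A small sketch; the partition name is our own choice:

CREATE TABLE datablogspaycheck_default PARTITION OF datablogspaycheck DEFAULT;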

Step 5 : Migrate all the records

INSERT INTO datablogspaycheck (payment_id, created, updated, amount, status)
SELECT payment_id, created, updated, amount, status FROM datablogspaycheck_basetable;

Step 6 : Validate each partition 

SELECT * FROM datablogspaycheck_202303 ORDER BY 2 DESC;

SELECT * FROM datablogspaycheck_20230304 ORDER BY 2 DESC;

SELECT * FROM datablogspaycheck_202311 ORDER BY 2 DESC;
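
We can also confirm that partition pruning kicks in when filtering on the partition key; a quick sanity-check sketch:

EXPLAIN SELECT * FROM datablogspaycheck
WHERE created >= '2023-04-01' AND created < '2023-05-01';

The plan should scan only the matching partition (datablogspaycheck_202304) instead of all of them.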

It's done. We have easily migrated the regular table's data into the partitioned table.

Thanks for Reading !!!



Sunday, June 25, 2023

Deep dive into Babelfish Compass

Wow !!! On migration projects there is a lot of stuff to convert when it comes to procedures, functions, and other database objects.

But AWS provides good tooling to migrate with easy steps. Ha ha... don't overthink it, you still need to do around 40% of the code migration work yourself.

In this part, Babelfish Compass gives various options to support migrating code from SQL Server to PostgreSQL on Babelfish-enabled PaaS servers.

Below are the easy steps for the script conversion.

Prerequisites 
  • Install a 64-bit Java Runtime Environment (JRE) version 8 or higher
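
A quick way to confirm the prerequisite before running the tool (any 64-bit JRE build of version 8 or higher is fine):

C:\>java -version

It should print a Java version of 1.8 or higher.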

1. Download the Compass tool from the link below

https://github.com/babelfish-for-postgresql/babelfish_compass/releases/tag/v.2023-03-a

You need to download the .zip file to work with Babelfish Compatibility.


2. Unzip and place the files in a separate folder


3. Have your SQL Server database's generated scripts file ready and copy it into the same folder

The database name is highlighted below,


4. Next, we can start running the report with Babelfish Compass

C:\Users\Admin\Downloads\BabelfishCompass_v.2023-03-a\BabelfishCompass>BabelfishCompass.bat reportfinal datablogsdbprod.sql


5. Finally, the reports are generated in the Documents directory


6. We can review the reports in any format; for me it is easiest in an HTML browser.

Just double-click the HTML document, and as shown below we will get supported and unsupported feature details in depth.

We can directly go and debug the code. Babelfish Compass also has plenty of options to rewrite the code; we will check those in the next blog.


Happy Coding !!!



Wednesday, June 21, 2023

Oracle RDS Audit log enable

Oracle Audit Log : 

Oracle Audit Log refers to the feature in Oracle Database that records and stores information about various database activities and events. It provides a mechanism to track and monitor user activities, system events, and changes made to the database.

  1. User Logins: Recording user login attempts and authentication information.
  2. Database Activities: Logging SQL statements executed by users, including select, insert, update, and delete operations.
  3. Privilege Usage: Monitoring the usage of privileges, such as granting or revoking permissions.
  4. Schema Changes: Tracking modifications to database objects, such as creating or altering tables, views, or indexes.
  5. System Events: Recording system-level events, such as startup and shutdown of the database.
  6. Security Violations: Detecting unauthorized access attempts or suspicious activities.
  7. Administrative Operations: Logging administrative tasks performed by database administrators, such as user management or database configuration changes.

The Oracle Audit Log provides an essential tool for security, compliance, and troubleshooting purposes.

Types of Auditing in Amazon RDS for Oracle : 

  1. Standard Auditing 
  2. Unified Auditing 
  3. Fine-grained Auditing

We are going to see how to enable standard auditing in Oracle RDS.

How to enable Audit Log in Oracle RDS?

Make sure you have a custom parameter group enabled for the Oracle RDS instance.

  • Modify the below value for the audit_trail parameter (an AWS CLI sketch follows these steps)

            audit_trail - DB, EXTENDED

  • Next, run AUDIT statements like the ones below to capture the required activity on the server

            AUDIT DELETE ANY TABLE;

            AUDIT DELETE TABLE BY USER_01 BY ACCESS;

            AUDIT DELETE TABLE BY USER_02 BY ACCESS;

            AUDIT ALTER, GRANT, INSERT, UPDATE, DELETE ON DEFAULT;

            AUDIT READ ON DIRECTORY datapump_dir;
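
For reference, the audit_trail change from the first step can also be applied through the AWS CLI instead of the console; a minimal sketch, assuming a custom parameter group named datablogs-oracle-params (hypothetical name). audit_trail is a static parameter, so it takes effect after a reboot:

aws rds modify-db-parameter-group \
    --db-parameter-group-name datablogs-oracle-params \
    --parameters '[{"ParameterName":"audit_trail","ParameterValue":"DB,EXTENDED","ApplyMethod":"pending-reboot"}]'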

It's all done; we have enabled the required auditing to capture activity for security purposes.

How do we monitor the audit logs?

We can just run the below query to get the captured audit records in Oracle RDS,

SELECT * FROM DBA_AUDIT_TRAIL ORDER BY 1 DESC;
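
To narrow it down, the same view can be filtered by user and time window; a small sketch using the USER_01 account audited above:

SELECT os_username, username, timestamp, action_name, obj_name, returncode
FROM dba_audit_trail
WHERE username = 'USER_01'
  AND timestamp > SYSDATE - 1
ORDER BY timestamp DESC;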

This explains the process for a basic scenario. We can still set up a separate audit tablespace, and many more options are available in Oracle; let's see those in other blogs.

Happy Auditing !!!


Thursday, December 15, 2022

Tuesday, December 6, 2022

How do I troubleshoot the AWS Glue error "VPC S3 endpoint validation failed for SubnetId"?

If you are a newbie to AWS Glue, it is really difficult to run the crawlers without hitting these failures. Below are the basic items you need to make sure are in place before running the crawler:

  1. AWS IAM Role and Privileges 
  2. S3 Endpoint 

1. AWS IAM Role and Policies

We have to attach at least the below policies to the IAM role,

2. S3 Endpoint

Most of the time you will get the below error,

VPC S3 endpoint validation failed for SubnetId: subnet-0e0b8e2ad1b85d036. VPC: vpc-01bdc81e45566a823. Reason: Could not find S3 endpoint or NAT gateway for SubnetId: subnet-0e0b8e2ad1b85d036 in Vpc vpc-01bdc81e45566a823 (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 0495f22f-a7b5-4f74-8691-7de8a6a47b42; Proxy: null)



To fix this error, you need to understand the issue first. It is saying that the subnet used by the job has no S3 endpoint or NAT gateway.

Create an endpoint for S3, not for Glue. In the worst cases people create a NAT gateway and lose a lot of money for a simple thing. So you have to create an S3 gateway endpoint like below,
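
The same gateway endpoint can also be created from the AWS CLI; a sketch with placeholder IDs and region (replace them with the VPC, route table, and region from your own error message):

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0123456789abcdef0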


Once it is created, your job will run like a flight :)




Migrate and Sync RDS PostgreSQL to Amazon Redshift Serverless using AWS Glue

Customers always prefer lower-cost solutions to run their business. To help their business and requirements, we also need to provide efficient solutions.

In some cases cloud vendors provide good solutions for analytics workloads, but the cost will be very high; most of the time we don't want to recommend those, but we have to.

One such solution in AWS costs more but works much faster than anything else.

We are talking about Amazon Redshift. Recently AWS launched Amazon Redshift Serverless in a few regions.

Whenever something new comes out, we need to find and deliver the best approach to customers before they catch the features themselves. So,

What is Amazon Redshift Serverless ? 

Amazon Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources. Amazon Redshift Serverless adjusts capacity in seconds to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads.

With Amazon Redshift Serverless, you can benefit from the following features:

  • Access and analyze data without the need to set up, tune, and manage Amazon Redshift provisioned clusters
  • Use the superior Amazon Redshift SQL capabilities, industry-leading performance, and data-lake integration to seamlessly query across a data warehouse, a data lake, and operational data sources
  • Deliver consistently high performance and simplified operations for the most demanding and volatile workloads with intelligent and automatic scaling
  • Use workgroups and namespaces to organize compute resources and data with granular cost controls
  • Pay only when the data warehouse is in use

So, overall there is no need for human intervention with Redshift Serverless.

Everything is fine, but how do we migrate and sync Amazon RDS / EC2 PostgreSQL / Aurora PostgreSQL to utilize this Redshift Serverless?

What are the options available to migrate and Sync ?

  • DMS - A Redshift Serverless target is still not available to migrate the data
  • Export/Import - Yes, we can perform it, but how do we handle a zero-downtime migration? Syncing real-time data is not possible
  • AWS Glue - A good option; we can migrate and sync real-time data from RDS to Redshift Serverless

Let's start migrating and syncing sample data into Amazon Redshift Serverless.

Environment setup walkthrough,

  1. RDS PostgreSQL
  2. AWS Glue
  3. Amazon Redshift Serverless 
  4. VPC S3 Endpoint
  5. IAM Role

RDS PostgreSQL : 


Amazon Redshift Serverless :


VPC S3 Endpoint :


IAM Role : 


Once the environment is ready, we can start adding connections and jobs in AWS Glue.

How to add connections in AWS Glue , 

In AWS Glue Console --> Click Connections --> Create Connections 





Create source and target databases ,



For testing, a sample schema and data were inserted into RDS PostgreSQL before creating the crawler.

Let's create separate crawlers for the source and target to update the Data Catalog.

Below data source is mapped to RDS PostgreSQL 


Also, the role updated with the required policies needs to be attached,


Choose the appropriate database for the crawler; below is for the source,


Below is for the target,


Once that is completed, and since sample schema scripts were already deployed on both sides to transfer the data, let's run the crawlers and check.


So both source and target tables are updated 


Let's create the Glue job in the AWS console,




After that, let's schedule the job every 5 minutes and sync the data.
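
The 5-minute schedule can also be created as a Glue scheduled trigger from the CLI; a sketch with hypothetical names (rds-to-redshift-sync stands in for the job created above):

aws glue create-trigger \
    --name rds-to-redshift-sync-5min \
    --type SCHEDULED \
    --schedule "cron(0/5 * * * ? *)" \
    --actions JobName=rds-to-redshift-sync \
    --start-on-creation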

Whatever data is inserted into RDS PostgreSQL will be synced into Redshift Serverless every 5 minutes,
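
A quick way to verify the sync is to compare row counts on both sides after a run; a sketch, with datablogs_customer as a hypothetical sample table that exists in both RDS PostgreSQL and Redshift Serverless:

-- Run on RDS PostgreSQL and again in the Redshift query editor, then compare
SELECT COUNT(*) AS total_rows FROM datablogs_customer;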


So we can migrate quickly and save on cost.

Any troubles or issues, please contact me immediately !!!

Wednesday, March 30, 2022

MongoDB 4.4.13 : Here to SRV you with easier replica set connections

We love MongoDB for its extraordinary features from a business perspective.

Let's come to our blog discussion. Usually only in PaaS environments do we have features like DNS endpoints for the database, to easily connect to the primary or a secondary on any single point of failure.

MongoDB Atlas provides all of these features, but small-scale customers still run MongoDB on virtual machines or EC2 instances. To handle a point of failure on the primary, we can use the DNS seed list connection format in MongoDB. We will discuss in detail how to configure this in the AWS cloud.

What is a seed list?

A seed list is a list of hosts and ports published as DNS entries. Using DNS we can configure the available MongoDB servers under one hood. When a client connects to the common DNS name, it also learns the replica set members available in the seed list. A single SRV name identifies all the nodes associated with the cluster, like below,
root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false


Environment Setup : 

For testing purposes, we launched 3 private subnet servers and 1 public subnet server to use as a bastion, created one private hosted zone for DNS, installed Percona Server for MongoDB 4.4.13, and then configured replication on it.

AWS EC2 Servers ,


Route 53 Hosted Zone ,


Creating A Records : 

We launched private subnet instances, so we are required to create A records for the private IPs. If a public IPv4 DNS name is available, we can create CNAME records instead.

A record created for the db1 server,

Inside the datamongo.com hosted zone, just click Create record



Likewise, we need to create A records for the other two nodes.


Verify the A Records ,
root@ip-172-31-95-215:~# dig db1.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db1.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13639
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db1.datamongo.com.             IN      A

;; ANSWER SECTION:
db1.datamongo.com.      10      IN      A       172.31.85.180

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 11:58:09 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~# dig db2.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db2.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9496
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db2.datamongo.com.             IN      A

;; ANSWER SECTION:
db2.datamongo.com.      300     IN      A       172.31.83.127

;; Query time: 3 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 12:06:28 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~# dig db3.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db3.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46401
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db3.datamongo.com.             IN      A

;; ANSWER SECTION:
db3.datamongo.com.      300     IN      A       172.31.86.8

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 12:06:33 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~#


Creating SRV and TXT Records :

Just like Atlas, once we have the A records for the MongoDB nodes, we can create the SRV records.

Again, inside the datamongo.com hosted zone, just click Create record


Once it is created, click Create record again and create the TXT record.


Once all the records are created in the hosted zone, you can see the details on the same page.
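
For reference, the records created above boil down to the following values (zone-file style; the port, priority, weight, and TXT value match the nslookup output further below, and the TTL shown is only illustrative):

_mongodb._tcp.db.datamongo.com.  300  IN  SRV  0 0 27717 db1.datamongo.com.
_mongodb._tcp.db.datamongo.com.  300  IN  SRV  0 0 27717 db2.datamongo.com.
_mongodb._tcp.db.datamongo.com.  300  IN  SRV  0 0 27717 db3.datamongo.com.
db.datamongo.com.                300  IN  TXT  "authSource=admin&replicaSet=db-replication"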


Reading SRV and TXT Records : 

We can use nslookup to verify the configured DNS seeding,
root@ip-172-31-95-215:~# nslookup
> set type=SRV
> _mongodb._tcp.db.datamongo.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db2.datamongo.com.
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db3.datamongo.com.
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db1.datamongo.com.

Authoritative answers can be found from:
> set type=TXT
> db.datamongo.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
db.datamongo.com        text = "authSource=admin&replicaSet=db-replication"

Authoritative answers can be found from:


Verify Connectivity : 

It's all done. We can verify the connectivity with the DNS seed list connection format.

By default, it will connect with ssl=true, but we have configured MongoDB without SSL. If you need to configure it with SSL, please refer to our earlier blog and configure DNS seeding with the help of this one.
root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false
Implicit session: session { "id" : UUID("ee74effc-92c7-4189-9e97-017afb4b4ad4") }
Percona Server for MongoDB server version: v4.4.13-13
---
The server generated these startup warnings when booting:
        2022-03-29T11:32:47.133+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
---
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.83.127:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).stateStr;
SECONDARY

Currently 172.31.83.127 is the primary server and 172.31.85.180 is a secondary. To test the connection, we stopped the primary server (172.31.83.127) in the AWS console.



After stopping the primary server (172.31.83.127), MongoDB failed over to 172.31.85.180. This was verified without disconnecting the mongo shell.

root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false
Implicit session: session { "id" : UUID("ee74effc-92c7-4189-9e97-017afb4b4ad4") }
Percona Server for MongoDB server version: v4.4.13-13
---
The server generated these startup warnings when booting:
2022-03-29T11:32:47.133+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
---
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.83.127:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).stateStr;
SECONDARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY

It is working as expected, and we have no worries if anything happens to the MongoDB primary node in cloud IaaS as well !!!

Please contact us with any queries or concerns; we are always happy to help !!!

Wednesday, February 16, 2022

SQL Server Always On availability group cluster in the AWS Cloud


Microsoft delivers HA features like a charm. From lower to higher deployment costs, it offers many options per business requirements: replication, mirroring, log shipping, and Always On are all available to build an HA setup on premises.

Likewise, we can set up all of the above features in the cloud as well. In this blog we will look at an Always On availability group cluster.




What is Always On Availability Group?


  • An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. 
  • You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together.  


Environment Setup for Always on Availability Group ,


We launched one Active Directory server and two SQL nodes with the below IP ranges. Detailed environment setup steps are below,
Step 1 : Create ag-sql-vpc with 10.0.0.0/16 IPv4 CIDR range


Step 2 : Create two private subnets ( 10.0.3.0/24 , 10.0.1.0/24 ) for SQL Nodes and one public subnet ( 10.0.4.0/24 ) for Active Directory  


Step 3 : Launch the Windows instances with two secondary IPs each, for the failover cluster and the Always On listener
In this POC setup, we launched plain Windows instances and installed SQL Server Developer edition. We could also launch Windows AMIs with SQL Server 2016 preinstalled, based on the requirements.







Step 4 : Change the computer properties and rename the instances accordingly
Step 5 : Complete the AD server configuration, named ag-sql-AD. After that, change the DNS server address in the network properties on ag-sql-node1 and ag-sql-node2 ( 10.0.4.33 is the static IP of the AD server )



Step 6 : Once the DNS configuration is modified, reboot the servers and log in with the AD administrator account
Step 7 : Once logged in with the AD account, install the Failover Clustering feature and the dependent features below on ag-sql-node1 and ag-sql-node2


Configuring Shared Drive for Backup and Restore 


Step 8 : Between ag-sql-node1 and ag-sql-node2, we need to take a full backup and log backups for the Always On initial synchronization

Step 9 : Create a folder on ag-sql-node2 and share it with everyone in the AD domain

Step 10 : Take a one-time backup of DW_Mart and DataLake into that shared folder; the shared drive will be used during the availability group creation (a backup sketch follows below)
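
A sketch of the one-time backups (the UNC share path AGShare is assumed; the databases must be in FULL recovery model to join an availability group):

BACKUP DATABASE DW_Mart  TO DISK = N'\\ag-sql-node2\AGShare\DW_Mart.bak';
BACKUP LOG      DW_Mart  TO DISK = N'\\ag-sql-node2\AGShare\DW_Mart.trn';
BACKUP DATABASE DataLake TO DISK = N'\\ag-sql-node2\AGShare\DataLake.bak';
BACKUP LOG      DataLake TO DISK = N'\\ag-sql-node2\AGShare\DataLake.trn';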

Failover Cluster Configuration 


Step 11 : Open the Failover Cluster Manager console and create the cluster. Browse and add both servers.


Step 12 : Once all the steps are finished, complete the Create Cluster wizard.



Step 13 : Click the agsqlprod failover cluster and modify the cluster core resources. Here we need to add a secondary IP from both nodes ( 10.0.1.11 and 10.0.3.11 )

Once we add both secondary IPs, one of the IPs will come online

If we have not added the secondary IPs, it will show an error like below



Configuring SQL Server Services  


Step 14 : Once all the steps are completed in Failover Cluster Manager, change the SQL Server service account to the AD service account

Step 15 : Next, right-click the SQL Server service in Configuration Manager and enable Always On High Availability on the ag-sql-node1 and ag-sql-node2 SQL instances


Creating and Configuring the Availability Group

Step 16 : Right-click to open the New Availability Group wizard and create the availability group as agsqldb


Step 17 : Based on the requirements, add the required number of replicas,


Step 18 : Below are the endpoints; make sure the listed ports are allowed between the cluster nodes


Step 19 : Then create the availability group listener with the remaining secondary IPs ( 10.0.1.12 and 10.0.3.12 )


Step 20 : Once everything is completed, click Next to create the availability group



Once it is created, we can see the availability group role in the Failover Cluster Manager console,




Ready to sync the Data from Primary to Secondary 



After all of this, the availability group is healthy and the primary and secondary nodes are synchronized.
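
The same can be confirmed with T-SQL on the primary replica; a small sketch using the Always On DMVs:

SELECT ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc,
       ars.connected_state_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ars.replica_id = ar.replica_id;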

Thanks for reading !!! For any corrections or doubts, please contact me directly !!!