datablogs: AWS
Showing posts with label AWS. Show all posts
Showing posts with label AWS. Show all posts

Thursday, December 15, 2022

Tuesday, December 6, 2022

How do I troubleshoot the AWS Glue error "VPC S3 endpoint validation failed for SubnetId"?

If you are newbie to AWS Glue its really difficult to run the Crawlers without these failures , Below are basic steps you need to make sure done before running the Crawler

  1. AWS IAM Role and Privileges 
  2. S3 Endpoint 

1.AWS IAM Role and Policies 

We have to atleast attach below policies in IAM Role 

2.S3 Endpoint 

Most of the time you will get below error ,  

VPC S3 endpoint validation failed for SubnetId: subnet-0e0b8e2ad1b85d036. VPC: vpc-01bdc81e45566a823. Reason: Could not find S3 endpoint or NAT gateway for SubnetId: subnet-0e0b8e2ad1b85d036 in Vpc vpc-01bdc81e45566a823 (Service: AWSGlueJobExecutor; Status Code: 400; Error Code: InvalidInputException; Request ID: 0495f22f-a7b5-4f74-8691-7de8a6a47b42; Proxy: null)



To fix this error , you need understand the issue first . Its saying 

Create Endpoints for S3 not for Glue , Some worst cases people create Nat Gateway and they loss huge money for simple thing . So you have to create S3 Endpoint Gateway as like below ,


Once we created your job will run like flight :)




Migrate and Sync RDS PostgreSQL to Amazon Redshift Serverless using AWS Glue

Always customer prefers cost less solutions to run the business . To help them their business and the requirements we also needs to provide efficient solutions 

Some cases cloud vendors provides good solutions for analytics load but cost will be very high , most of the time we don't want to recommend that but we need to do  

Like that one of the solution in AWS , its cost much but works much faster like anything 

We are talking about Amazon Redshift Solutions only , So recently they have launched Amazon Redshift Serverless solutions for few regions .

Whatever new comes , before customer catches the features we need to find and deliver the best approach to them . So ,

What is Amazon Redshift Serverless ? 

Amazon Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources. Amazon Redshift Serverless adjusts capacity in seconds to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads.

With Amazon Redshift Serverless, you can benefit from the following features:

  • Access and analyze data without the need to set up, tune, and manage Amazon Redshift provisioned clusters
  • Use the superior Amazon Redshift SQL capabilities, industry-leading performance, and data-lake integration to seamlessly query across a data warehouse, a data lake, and operational data sources
  • Deliver consistently high performance and simplified operations for the most demanding and volatile workloads with intelligent and automatic scaling
  • Use workgroups and namespaces to organize compute resources and data with granular cost controls
  • Pay only when the data warehouse is in use

So , Overall no need of human interventions in the Redshift Serverless 

Everything is Fine , How to we migrate and sync Amazon RDS / EC2 Postgres / Aurora Postgres to utilize this Redshift Serverless 

What are the options available to migrate and Sync ?

  • DMS - Still Redshift Target is not available to migrate the Data
  • Export/Import - Yes we can perform , how to handle zero downtime migration . Syncing real-time data is not possible 
  • AWS Glue - Its Good Option , We can migrate and Sync real-time data from RDS to Redshift Serverless  
Lets start sample data migrate and sync into Amazon Redshift Serverless ,

Environment Setup Go Through ,

  1. RDS PostgreSQL
  2. AWS Glue
  3. Amazon Redshift Serverless 
  4. VPC S3 Endpoint
  5. IAM Role

RDS PostgreSQL : 


Amazon Redshift Serverless :


VPC S3 Endpoint :


IAM Role : 


Once Environment is completed , we can start adding connections and jobs in AWS Glue

How to add connections in AWS Glue , 

In AWS Glue Console --> Click Connections --> Create Connections 





Create source and target databases ,



For testing sample schema and data inserted into RDS PostgreSQL before creating the crawler 

Lets start to create separate crawlers for Source and Target to update catalog ,

Below data source is mapped to RDS PostgreSQL 


Also required policies updated role needs to attached ,


Choose appropriate databases for crawler , below is the for source 


Below is the for Target , 


Once its completed , we have already deployed sample schema scripts in both side to transfer the data . Lets run the crawler and check it 


So both source and target tables are updated 


Lets Create Glue Job in AWS Console ,




After all , Lets schedule the job every 5 Minutes and sync the data

Whatever data is inserted in RDS PostgreSQL will be sync into Redshift Serverless Every 5 Minutes ,


So Quickly we can migrate and save our cost 

Any Troubles and issues Please contact me immediately !!!

Wednesday, March 30, 2022

MongoDB 4.4.13 : Here to SRV you with easier replica set connections

We love MongoDB for extraordinary features as per business perspective 

Lets come to our Blog Discussion , Only in PaaS Environments we have features like DNS endpoints for database easily connect with primary or secondary at any single point of failure 

Mongo Atlas Providing all the features but small scale customers still using MongoDB with Virtual Machines or EC2 Instances . To handle point of failures in primary we can use DNS Seed List Connection Format in mongoDB . We will discuss in detail how to we configure this in AWS Cloud 

What is seed list ?

Seed list can be list of hosts and ports in DNS Entries . Using DNS we can configure available mongoDB servers in under one hood . When client connects to an common DNS , its also knows replica set members available in seed list . Single SRV identifies all the nodes associated with the cluster .  Like Below ,
root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false


Environment Setup : 

For Testing Purpose , We have launched 3 Private Subnet Servers and 1 Public Subnet Server to use like Bastion . Create One Private Hosted Zone for DNS and Installed Percona Server for MongoDB 4.4.13 then configured Replication in it 

AWS EC2 Servers ,


Route 53 Hosted Zone ,


Creating A Records : 

We have launched private subnet instances , so we required to create A Records for private IP's . If Public IPv4 DNS available we can create CNAME Records

A Records Created for db1 server ,

Inside the datamongo.com hosted Zone , Just Click Create Record



Same like we need to create A Records for other two nodes 


Verify the A Records ,
root@ip-172-31-95-215:~# dig db1.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db1.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13639
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db1.datamongo.com.             IN      A

;; ANSWER SECTION:
db1.datamongo.com.      10      IN      A       172.31.85.180

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 11:58:09 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~# dig db2.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db2.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9496
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db2.datamongo.com.             IN      A

;; ANSWER SECTION:
db2.datamongo.com.      300     IN      A       172.31.83.127

;; Query time: 3 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 12:06:28 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~# dig db3.datamongo.com

; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> db3.datamongo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46401
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;db3.datamongo.com.             IN      A

;; ANSWER SECTION:
db3.datamongo.com.      300     IN      A       172.31.86.8

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Mar 29 12:06:33 UTC 2022
;; MSG SIZE  rcvd: 62

root@ip-172-31-95-215:~#


Creating SRV and TXT Records :

As like Atlas , Once we have the A Records for MongoDB Nodes , we can able to create SRV Records 

Again Inside the datamongo.com hosted Zone , Just Click Create Record


Once its created , again click create record and create TXT records 


Once all the records are created in hosted Zone , Just see the details in same page 


Reading SRV and TXT Records : 

We can use nslookup and verify the configured DNS Seeding ,
root@ip-172-31-95-215:~# nslookup
> set type=SRV
> _mongodb._tcp.db.datamongo.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db2.datamongo.com.
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db3.datamongo.com.
_mongodb._tcp.db.datamongo.com  service = 0 0 27717 db1.datamongo.com.

Authoritative answers can be found from:
> set type=TXT
> db.datamongo.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
db.datamongo.com        text = "authSource=admin&replicaSet=db-replication"

Authoritative answers can be found from:


Verify Connectivity : 

Its all done , We can verify the connectivity with DNS Seed List Connection format ,

By Default , it will connect with ssl true , but we have configured mongodb without SSL . If you required to configure with SSL please refer our blog and configure DNS Seeding with help of this blog 
root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false
Implicit session: session { "id" : UUID("ee74effc-92c7-4189-9e97-017afb4b4ad4") }
Percona Server for MongoDB server version: v4.4.13-13
---
The server generated these startup warnings when booting:
        2022-03-29T11:32:47.133+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
---
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.83.127:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).stateStr;
SECONDARY

Currently 172.31.83.127 is the primary server and 172.31.85.180 is secondary , to test connection we have stopped the primary server (172.31.85.180) in AWS console 



after stopping primary server (172.31.83.127) , mongodb failover happened to to 172.31.85.180 . Its verified without disconnecting the mongo shell 

root@ip-172-31-86-8:~# mongo "mongodb+srv://superuser:zU2iU9pF7mO7rZ4z@db.datamongo.com/?authSource=admin&readPreference=primary&ssl=false"
Percona Server for MongoDB shell version v4.4.13-13
connecting to: mongodb://db1.datamongo.com:27717,db3.datamongo.com:27717,db2.datamongo.com:27717/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&readPreference=primary&replicaSet=db-replication&ssl=false
Implicit session: session { "id" : UUID("ee74effc-92c7-4189-9e97-017afb4b4ad4") }
Percona Server for MongoDB server version: v4.4.13-13
---
The server generated these startup warnings when booting:
2022-03-29T11:32:47.133+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
---
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.83.127:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===2).stateStr;
SECONDARY
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).name;
172.31.85.180:27717
db-replication:PRIMARY> rs.status().members.find(r=>r.state===1).stateStr;
PRIMARY

Its working as expected and we have no worries if anything happens on mongoDB primary node in Cloud IaaS as Well !!!

Please contact us if any queries and concerns , we are always happy to help !!!

Wednesday, February 16, 2022

SQL Server Always On availability group cluster in the AWS Cloud


Microsoft gives HA features like a charm . Lower to higher deployment costs its giving many features as per business requirements . Replication , Mirroring , Log shipping and Always On many features available to build HA Setup in On Premises . 

Like wise , we can setup all the above features in Cloud as well . In that we can see Always on availability group cluster in this blog 




What is Always On Availability Group?


  • An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. 
  • You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together.  


Environment Setup for Always on Availability Group ,


Launched one Active Directory and two SQL Nodes with below range . Detailed setup for environment steps are below ,


Below are the detailed steps for environment steps ,
Step 1 : Create ag-sql-vpc with 10.0.0.0/16 IPv4 CIDR range


Step 2 : Create two private subnets ( 10.0.3.0/24 , 10.0.1.0/24 ) for SQL Nodes and one public subnet ( 10.0.4.0/24 ) for Active Directory  


Step 3 : Launched the windows instances with two secondary ip's for Failover Cluster and Always on Listener
In this POC Setup , Launched windows instance and installed SQL Server Developer edition . Also we can launch Windows with SQL Server 2016 based on your requirements 







Step 4 : Change the computer properties and rename the instance names accordingly 
Step 5 : Completed the AD Server configuration and its named as ag-sql-AD , After that change DNS server address in network properties in ag-sql-node1 and ag-sql-node2 ( 10.0.4.33 is static IP of AD Server )



Step 6 : Once modified the DNS configuration reboot the server and login with AD administrator account 
Step 7 : Once logged in with AD login , Install the failover clustering and below dependent features in ag-sql-node1 and ag-sql-node2


Configuring Shared Drive for Backup and Restore 


Step 8 : Between the ag-sql-node1 and ag-sql-node2 needs to take backup and log backups for Always on background process

Step 9 : Create folder in ag-sql-node2 and share with everyone in AD account
  
Step 10 : Take one time backup of DW_Mart and DataLake in that shared folder . Created Shared drive will be used while always on group creation 

Failover Cluster Configuration 


Step 11 : Open the Failover Cluster Manager console and Create the cluster . Browse and add the both servers 


Step 12 : Once all the steps finished , create the cluster wizard 



Step 13 : Click agsqlprod failover cluster and modify the cluster core resources . In this we need to add secondary IP for both nodes ( 10.0.1.11 and 10.0.3.11 )

Once we added both secondary IP's one of the IP will be come to online 

If we have not added secondary IP , it will show as an error like below 



Configuring SQL Server Services  


Step 14 : Once all the steps are completed on Failover cluster manager , modify the SQL Service Account to AD service account 

Step 15 : Next right click the SQL Server Service in configuration manager and enable the Always on High Availability on ag-sql-node1 and ag-sql-node2 SQL instances 


Create and Configuring the Availability Group 

Step 16 : Right click the always on group wizard and create the availability group as agsqldb


Step 17 : Based on the requirements add the number of replicas ,


Step 18 : Below are the endpoints and make sure allowed below ports between the cluster nodes 


Step 19 : Then create availability group listener with remaining secondary IP ( 10.0.1.12 and 10.0.3.12 )


Step 20 : Once everything is completed click Next to create availability group 



Once its created we can able to see the Cluster Manager Role in the Failover cluster manager console ,




Ready to sync the Data from Primary to Secondary 



After all availability group is healthy and primary and secondary nodes are synchronized 

Thanks for Reading !!! Any corrections or any doubt please contact me directly !!!