Introduction
Database administrators and support teams spend significant time searching through operational runbooks, troubleshooting guides, backup procedures, and recovery documentation.
Traditional keyword search often fails because documentation is frequently worded differently from the question being asked.
To solve this challenge, I built a lightweight Database Operations Knowledge Assistant using:
- Amazon MemoryDB for Valkey
- Vector Search
- Sentence Transformers
- Streamlit
- Python
The solution allows users to upload database documentation and perform semantic searches using natural language.
Problem Statement
Consider the following question:
How do I configure streaming replication?
The PostgreSQL documentation may contain:
- Standby Server Setup
- WAL Sender Configuration
- Replication Slots
- Archive Recovery
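Traditional keyword search may miss these sections because none of their titles contain the phrase "streaming replication".
Vector search compares the meaning of the query with the meaning of each passage, so it can retrieve relevant content even when the wording differs.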
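To make the idea concrete, here is a minimal sketch of how similarity ranking works. The three-dimensional vectors are hand-picked toy values, not real embeddings, and "Vacuum Tuning" is a hypothetical unrelated section added only for contrast.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumption): real embeddings from the model have 384 dimensions.
query = [0.9, 0.1, 0.2]  # stands in for "How do I configure streaming replication?"
sections = {
    "Standby Server Setup": [0.8, 0.2, 0.1],
    "Vacuum Tuning": [0.1, 0.9, 0.3],  # hypothetical unrelated section
}

# Rank section titles from most to least similar to the query.
ranked = sorted(
    sections,
    key=lambda name: cosine_similarity(query, sections[name]),
    reverse=True,
)
print(ranked)
```

The section whose vector points in nearly the same direction as the query ranks first, even though its title shares no words with the question.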
Solution Architecture
Technology Stack
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| Language | Python |
| Embeddings | Sentence Transformers |
| Vector Database | Amazon MemoryDB |
| Search Engine | HNSW Vector Search |
| Cloud Platform | AWS |
Why Amazon MemoryDB?
Amazon MemoryDB recently introduced native Vector Search support.
Benefits include:
- Fully Managed
- Multi-AZ Architecture
- High Availability
- TLS Encryption
- Native Valkey Compatibility
- HNSW Vector Indexing
- Low Latency Search
Unlike self-managed vector databases, MemoryDB is fully managed, which removes much of the operational overhead.
Step 1: Create MemoryDB Subnet Group
Create a subnet group spanning multiple Availability Zones.
This ensures future scalability and high availability.
Selected subnets:
- ap-south-1a
- ap-south-1b
- ap-south-1c
Vector Search clusters require proper VPC networking.
Step 2: Create MemoryDB Cluster
Create a new cluster.
Most importantly:
- Enable Vector Search
This option cannot be changed after cluster creation.
Step 3: Verify Cluster Creation
After provisioning completes, verify that the cluster was created successfully.
Step 4: Connect Using Redis CLI
redis-cli \
  -h clustercfg.database-assistant.memorydb.amazonaws.com \
  -p 6379 \
  --tls \
  --user default \
  -a <password>
INFO server
server_name: valkey
os: Amazon MemoryDB
valkey_version: 7.3.0
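The application needs the same connection from Python. The sketch below assumes the redis-py client library, which the article does not name explicitly; `memorydb_connection_kwargs` and `connect` are hypothetical helper names. `RedisCluster` is chosen here because the endpoint is a cluster configuration endpoint; a plain `redis.Redis` client may also work for a single-shard cluster.

```python
def memorydb_connection_kwargs(host, password, username="default", port=6379):
    """Build keyword arguments that mirror the redis-cli flags (-h, -p, --tls, --user, -a)."""
    return {
        "host": host,
        "port": port,
        "ssl": True,  # equivalent of --tls
        "username": username,
        "password": password,
    }


def connect(host, password):
    """Open a cluster-aware connection (assumes redis-py is installed)."""
    # Imported lazily so the helper above works without redis-py present.
    from redis.cluster import RedisCluster

    return RedisCluster(**memorydb_connection_kwargs(host, password))
```

Calling `connect(...)` requires network access to the cluster from inside the VPC, so it is left uncalled here.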
Step 5: Generate Embeddings
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
Embedding Dimensions: 384
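Before an embedding can be stored in a FLOAT32 vector field of a hash, it has to be serialized as raw 32-bit floats. The sketch below shows one way to do that with only the standard library; the helper names are hypothetical, and the little-endian byte order is an assumption that matches common x86 and ARM hosts.

```python
import struct

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2


def to_float32_bytes(vector):
    """Pack a sequence of floats into little-endian FLOAT32 bytes."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    return struct.pack(f"<{EMBEDDING_DIM}f", *vector)


def from_float32_bytes(blob):
    """Unpack FLOAT32 bytes back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

A 384-dimensional vector therefore occupies 384 × 4 = 1,536 bytes per chunk. With NumPy available, `numpy.asarray(vector, dtype=numpy.float32).tobytes()` is a common equivalent.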
Step 6: Create Vector Index
FT.CREATE docs_idx
  ON HASH
  PREFIX 1 "doc:"
  SCHEMA
    document TEXT
    chunk TEXT
    embedding VECTOR HNSW 6
      TYPE FLOAT32
      DIM 384
      DISTANCE_METRIC COSINE
FT._LIST
docs_idx
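The same index can be created from application code. This sketch builds the argument list for the FT.CREATE command shown in this step and sends it through a client's `execute_command` method. The function names are hypothetical, `client` is assumed to be a redis-py style client, and how a cluster-aware client routes this command has not been verified here.

```python
def build_create_index_args(index_name="docs_idx", prefix="doc:", dim=384):
    """Return the FT.CREATE command as a flat list of arguments."""
    vector_attrs = ["TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE"]
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        "document", "TEXT",
        "chunk", "TEXT",
        # The number after HNSW counts the attribute tokens that follow it.
        "embedding", "VECTOR", "HNSW", str(len(vector_attrs)), *vector_attrs,
    ]


def create_index(client):
    """Send the command through any client exposing execute_command()."""
    return client.execute_command(*build_create_index_args())
```

Deriving the attribute count from the list avoids a mismatch between the `6` and the attributes if more HNSW parameters are added later.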
Step 7: Upload Database Documentation
Document Size: 7,484,712 characters
Total Chunks: 3,743
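The article does not show how the text was split. The reported figures are consistent with fixed-size chunks of 2,000 characters and no overlap (7,484,712 / 2,000 rounds up to 3,743), so the sketch below assumes that; the actual chunker in the repository may differ.

```python
def chunk_text(text, chunk_size=2000):
    """Split text into consecutive, non-overlapping chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Fixed-size splitting is simple but can cut sentences in half; overlapping or sentence-aware chunking is a common refinement.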
Step 8: Store Chunks in MemoryDB
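A minimal sketch of this step, assuming each chunk becomes one hash whose key starts with the `doc:` prefix used by the index and whose fields match the index schema (`document`, `chunk`, `embedding`). The key layout and function names are assumptions, and `client` is any redis-py style client exposing `hset`.

```python
def chunk_key(document_name, chunk_index, prefix="doc:"):
    """Build a hash key that falls under the index prefix."""
    return f"{prefix}{document_name}:{chunk_index}"


def chunk_mapping(document_name, chunk, embedding_bytes):
    """Build the hash fields expected by the docs_idx schema."""
    return {"document": document_name, "chunk": chunk, "embedding": embedding_bytes}


def store_chunks(client, document_name, chunks, embeddings):
    """Write one hash per chunk and return the number of hashes written."""
    written = 0
    for index, (chunk, embedding_bytes) in enumerate(zip(chunks, embeddings)):
        client.hset(
            chunk_key(document_name, index),
            mapping=chunk_mapping(document_name, chunk, embedding_bytes),
        )
        written += 1
    return written
```

Because the index was created with `PREFIX 1 "doc:"`, any hash written under that prefix is picked up by the index without an explicit indexing call.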
Step 9: Build Streamlit Search Interface
- PDF Upload
- Duplicate Detection
- Document Indexing
- Semantic Search
- Search Results Viewer
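The article does not describe how duplicate detection works. One common approach, sketched here as an assumption, is to fingerprint the uploaded file's bytes with SHA-256 and skip indexing when the fingerprint has been seen before. The function names and the in-memory set of fingerprints are hypothetical; a real application would persist the fingerprints, for example in MemoryDB itself.

```python
import hashlib


def document_fingerprint(file_bytes):
    """Return a stable hex digest identifying the file's exact content."""
    return hashlib.sha256(file_bytes).hexdigest()


def is_duplicate(file_bytes, known_fingerprints):
    """Check whether identical content has already been indexed."""
    return document_fingerprint(file_bytes) in known_fingerprints
```

A content hash catches re-uploads of the same file under a different name, but not a slightly edited version of the same document.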
Step 10: Perform Semantic Search
How do I configure streaming replication?
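Behind the search box, the question is embedded with the same model and sent as a K-nearest-neighbour query against `docs_idx`. The sketch below builds such an FT.SEARCH command; the parameter name `vec`, the default of five results, and the function name are assumptions, and `query_vector_bytes` is the question's embedding serialized as FLOAT32 bytes.

```python
def build_knn_search_args(query_vector_bytes, k=5, index_name="docs_idx"):
    """Return an FT.SEARCH KNN command as a flat list of arguments."""
    # "*" matches every indexed hash; KNN then keeps the k closest embeddings.
    query = f"*=>[KNN {k} @embedding $vec]"
    return [
        "FT.SEARCH", index_name, query,
        "PARAMS", "2", "vec", query_vector_bytes,
        "RETURN", "2", "document", "chunk",
        "DIALECT", "2",
    ]


def search(client, query_vector_bytes, k=5):
    """Run the query through any client exposing execute_command()."""
    return client.execute_command(*build_knn_search_args(query_vector_bytes, k))
```

Passing the vector through `PARAMS` keeps the binary blob out of the query string itself.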
Current Limitations
This project is intentionally lightweight.
Current limitations:
No LLM Integration
Results display retrieved chunks only.
No summarization is performed.
Chunk-Level Search
Search returns matching chunks instead of complete answers.
No Metadata Filtering
Currently, all documents are searched together.
Basic Ranking
Only vector similarity is used.
Hybrid ranking is not implemented.
Future Enhancements
Planned improvements:
Amazon Bedrock Integration
Generate human-readable answers.
RAG Architecture
MemoryDB + Amazon Bedrock + Prompt Engineering
Hybrid Search
Combine:
- Vector Search
- Keyword Search
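One possible way to combine the two result lists is reciprocal rank fusion (RRF), which needs only the rank positions from each search and not comparable scores. This is an illustrative sketch of that technique, not something the project implements; the constant `k=60` is a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks closer to the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_results = ["a", "b", "c"]   # hypothetical ids from vector search
keyword_results = ["b", "c", "a"]  # hypothetical ids from keyword search
fused = reciprocal_rank_fusion([vector_results, keyword_results])
print(fused)
```

Here "b" comes out on top because it ranks near the top in both lists, while "a" is first in one list but last in the other.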
Metadata Filtering
Search by:
- Database Type
- Document Name
- Version
- Environment
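Metadata filtering could be expressed by restricting the KNN query to hashes whose tag field matches a value. The sketch below builds such a query string. It assumes the index is recreated with an additional TAG field, named `db_type` here purely for illustration, which the current `docs_idx` schema does not have.

```python
def build_filtered_knn_query(field, value, k=5, vector_field="embedding"):
    """Build a KNN query that only considers hashes where a TAG field equals value."""
    if not value.isalnum():
        # Tag values with punctuation or spaces need escaping, which this sketch omits.
        raise ValueError("only simple alphanumeric tag values are supported here")
    return f"@{field}:{{{value}}}=>[KNN {k} @{vector_field} $vec]"


print(build_filtered_knn_query("db_type", "postgresql"))
```

The resulting string replaces the `*=>[KNN ...]` portion of an unfiltered search, so the rest of the FT.SEARCH command stays the same.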
User Authentication
- IAM
- Cognito
- SSO
Enterprise Features
- RBAC
- Multi-Tenant Support
- Audit Logging
Lessons Learned
MemoryDB Vector Search is Production Ready
The setup process was straightforward.
Semantic Search Works Well for Documentation
Even without an LLM, vector retrieval significantly improves document discovery.
Cost Effective
db.t4g.small
Excellent Foundation for RAG
This architecture can easily evolve into a complete enterprise knowledge assistant.
Conclusion
Using Amazon MemoryDB Vector Search, I built a lightweight semantic search platform for database operations documentation.
The solution successfully:
- Indexed PostgreSQL documentation
- Generated embeddings
- Stored vectors in MemoryDB
- Performed semantic search
- Returned relevant operational procedures
For organizations looking to build an internal knowledge search platform without deploying OpenSearch or a heavyweight vector database, Amazon MemoryDB Vector Search provides a simple, scalable, and managed alternative.
The next evolution of this project is integrating Amazon Bedrock to transform semantic search into a complete Retrieval-Augmented Generation (RAG) solution.
Complete code: https://github.com/selvackp/db-knowledge-assistant/tree/main








