AI Assistant Portfolio Upgrade: From Concept to Production RAG System
When I first started thinking about enhancing my portfolio, I knew I wanted something that would truly showcase my AI/ML engineering capabilities. What better way than to build an intelligent conversational AI that could discuss my work, experience, and projects? Today, I'm excited to share how I transformed my portfolio from a static site into an interactive AI-powered experience.
The Vision: More Than Just a Chatbot
The goal wasn't just to add a chatbot to my portfolio - anyone can do that with existing tools. I wanted to build a sophisticated Retrieval-Augmented Generation (RAG) system that could intelligently answer questions about my professional background by combining the power of large language models with semantic search through my knowledge base.
The key requirements were:
- Intelligent Conversations: Context-aware responses about my work and experience
- Enterprise-Grade Security: Secure API key management and proper access controls
- Scalable Architecture: Built on AWS serverless infrastructure
- Real-time Performance: Fast response times for seamless user experience
- Professional Quality: Production-ready implementation, not a proof of concept
Architecture Overview: AWS Amplify Gen2 + Bedrock + Pinecone
The final architecture leverages several cutting-edge AWS services and technologies:
Complete RAG system workflow: User query flows through Next.js API to Lambda function, which orchestrates Pinecone vector search and AWS Bedrock to generate contextually relevant AI responses
Core Components
- Frontend: Next.js 14 with TypeScript and modern React patterns
- Backend: AWS Amplify Gen2 with Lambda functions
- AI Engine: AWS Bedrock (Claude 3 Sonnet)
- Vector Database: Pinecone for semantic search
- Security: AWS Secrets Manager and IAM roles
- Deployment: Serverless architecture with auto-scaling
Implementation Deep Dive
1. AWS Amplify Gen2 Backend Configuration
First, I set up the Amplify backend with proper IAM permissions. The beauty of Amplify Gen2 is how it simplifies complex AWS resource management:
// amplify/backend.ts
import { defineBackend } from '@aws-amplify/backend';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
// Resource definitions (paths follow the standard Amplify Gen2 layout)
import { auth } from './auth/resource';
import { data } from './data/resource';
import { bedrockChat } from './functions/bedrock-chat/resource';

const backend = defineBackend({
  auth,
  data,
  bedrockChat,
});

// Bedrock permissions for Claude 3 Sonnet
backend.bedrockChat.resources.lambda.addToRolePolicy(
  new PolicyStatement({
    actions: ['bedrock:InvokeModel'],
    resources: ['arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-sonnet-*'],
  })
);

// Secrets Manager access for API keys
backend.bedrockChat.resources.lambda.addToRolePolicy(
  new PolicyStatement({
    actions: ['secretsmanager:GetSecretValue'],
    resources: ['arn:aws:secretsmanager:*:*:secret:pinecone/api-keys*'],
  })
);
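For context, the `bedrockChat` function referenced above is defined as an Amplify function resource. Here's a minimal sketch assuming the standard Amplify Gen2 layout; the file path, function name, and tuning values are placeholders of mine, not taken from the actual project:

```typescript
// amplify/functions/bedrock-chat/resource.ts (hypothetical path)
import { defineFunction } from '@aws-amplify/backend';

export const bedrockChat = defineFunction({
  name: 'bedrock-chat',
  // Placeholder tuning values: the RAG pipeline makes three network calls
  // (Secrets Manager, Pinecone, Bedrock), so a generous timeout helps.
  timeoutSeconds: 60,
  memoryMB: 512,
});
```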
2. Lambda Function: The Heart of the RAG System
The Lambda function orchestrates the entire RAG workflow. Here's how it works:
Step 1: Secure API Key Retrieval
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

async function getPineconeApiKey(): Promise<string> {
  const secretName = "pinecone/api-keys";
  const client = new SecretsManagerClient({ region: process.env.AWS_REGION });
  const response = await client.send(new GetSecretValueCommand({
    SecretId: secretName,
  }));
  return JSON.parse(response.SecretString!).api_key;
}
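One detail worth calling out: fetching the secret on every invocation adds a network round trip. A common mitigation, sketched here as an assumption rather than something shown in the project, is to cache the key at module scope so warm Lambda invocations skip the Secrets Manager call entirely:

```typescript
// Module-scope cache: survives across warm invocations of the same Lambda
// execution environment, so the secret is fetched once per cold start.
let cachedApiKey: string | undefined;

async function getPineconeApiKeyCached(): Promise<string> {
  if (!cachedApiKey) {
    cachedApiKey = await getPineconeApiKey();
  }
  return cachedApiKey;
}
```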
Step 2: Semantic Search with Pinecone
async function searchPinecone(query: string, apiKey: string): Promise<string[]> {
  const pineconeUrl = process.env.PINECONE_INDEX_URL; // Your Pinecone index URL
  if (!pineconeUrl) throw new Error('PINECONE_INDEX_URL is not set');

  const response = await fetch(pineconeUrl, {
    method: 'POST',
    headers: {
      'Api-Key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query: {
        inputs: { text: query }, // Pinecone embeds the query text server-side
        top_k: 5,
      },
      fields: ['text', 'content', 'chunk_text'],
    }),
  });

  const data = await response.json();
  return data.result?.hits
    ?.map((hit: { fields?: { text?: string } }) => hit.fields?.text)
    .filter(Boolean) || [];
}
Step 3: Context-Augmented AI Response
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const bedrockClient = new BedrockRuntimeClient({ region: process.env.AWS_REGION });

// Enhanced system prompt with retrieved context
let systemPrompt = `You are an AI assistant representing [Your Name]...`;
if (relevantContexts.length > 0) {
  systemPrompt += "\n\nAdditional relevant context:\n" +
    relevantContexts.join('\n\n');
}

// Bedrock Claude 3 Sonnet invocation
const bedrockResponse = await bedrockClient.send(new InvokeModelCommand({
  modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
  contentType: 'application/json',
  body: JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1000,
    system: systemPrompt,
    messages: [{ role: "user", content: message }],
  }),
}));

// Decode the Anthropic messages-format response body
const completion = JSON.parse(new TextDecoder().decode(bedrockResponse.body));
const reply: string = completion.content[0].text;
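The post shows the three steps but not the handler that composes them, so here is a hedged sketch of the glue. The event shape assumes a Lambda Function URL, and `generateReply` is a hypothetical wrapper around the Bedrock call above:

```typescript
// Hypothetical wrapper around the Step 3 Bedrock invocation shown above.
declare function generateReply(message: string, contexts: string[]): Promise<string>;

// Hypothetical handler composing the three steps (Function URL event assumed).
export const handler = async (event: { body: string }) => {
  const { message } = JSON.parse(event.body);

  const apiKey = await getPineconeApiKey();                       // Step 1
  const relevantContexts = await searchPinecone(message, apiKey); // Step 2
  const reply = await generateReply(message, relevantContexts);   // Step 3

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ reply }), // the Next.js route below reads data.reply
  };
};
```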
3. Frontend Integration: Seamless User Experience
The frontend provides a modern chat interface with real-time markdown rendering:
// Next.js API route for clean separation of concerns
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  const { message } = await request.json();

  const lambdaResponse = await fetch(process.env.LAMBDA_FUNCTION_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });

  const data = await lambdaResponse.json();
  return NextResponse.json({ message: data.reply });
}
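The chat component itself isn't shown in the post; here's a minimal sketch of the rendering side, assuming the popular react-markdown package and an /api/chat route path (both my choices for illustration, not necessarily what the portfolio uses):

```tsx
'use client';
import { useState } from 'react';
import ReactMarkdown from 'react-markdown';

type Msg = { role: 'user' | 'assistant'; text: string };

export function Chat() {
  const [messages, setMessages] = useState<Msg[]>([]);
  const [input, setInput] = useState('');

  async function send() {
    const question = input.trim();
    if (!question) return;
    setMessages((m) => [...m, { role: 'user', text: question }]);
    setInput('');
    // Calls the API route shown above (assumed to live at /api/chat)
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: question }),
    });
    const data = await res.json();
    setMessages((m) => [...m, { role: 'assistant', text: data.message }]);
  }

  return (
    <div>
      {messages.map((m, i) => (
        // Markdown rendering keeps lists and code in answers readable
        <ReactMarkdown key={i}>{m.text}</ReactMarkdown>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={send}>Send</button>
    </div>
  );
}
```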
Technical Challenges and Key Decisions
Decision 1: Pinecone vs AWS OpenSearch Knowledge Bases
Consideration: While AWS Bedrock Knowledge Bases with OpenSearch would provide native integration, the cost implications were significant for a portfolio project. Pinecone's serverless pricing model offered better cost predictability and lower entry costs.

Decision: Chose Pinecone for vector search, accepting the additional integration complexity in exchange for cost efficiency.
Decision 2: Direct Bedrock Invocation vs Bedrock Agents
Consideration: Bedrock Agents would have provided more sophisticated orchestration and potentially better vector search integration out of the box.

Decision: Implemented direct model invocation for better control over the RAG pipeline and cost management. This gives me full visibility into each step of the process.

Future Upgrade: Migrating to Bedrock Agents could provide more sophisticated reasoning capabilities and automated tool selection.
Challenge 1: Vector Database Integration
Problem: Integrating Pinecone with AWS Lambda while maintaining performance.

Solution: Optimized API calls with proper error handling and fallback mechanisms; a sketch of the fallback shape follows below.
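The fallback code isn't shown in the post. One plausible shape, sketched under my own assumptions (a 2-second timeout, and degrading to an empty context list so the assistant still answers, just without retrieval):

```typescript
// Wraps searchPinecone with a timeout and a safe fallback.
// The timeout value and degradation behavior are my assumptions.
async function searchWithFallback(query: string, apiKey: string): Promise<string[]> {
  try {
    return await withTimeout(searchPinecone(query, apiKey), 2000);
  } catch (err) {
    console.error('Pinecone search failed, continuing without context:', err);
    return []; // the prompt builder above already handles an empty context list
  }
}

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    ),
  ]);
}
```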
Challenge 2: Security at Scale
Problem: Secure API key management in a serverless environment.

Solution: AWS Secrets Manager with least-privilege IAM policies.
Challenge 3: Response Quality
Problem: Ensuring AI responses are accurate and contextually relevant.

Solution: Sophisticated prompt engineering with retrieval-augmented context.
Challenge 4: Performance Optimization
Problem: Minimizing latency in the RAG pipeline.

Solution: Parallel processing of vector search and optimized Lambda configuration; one possible shape of the parallelism is sketched below.
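The post doesn't show how the parallelism is implemented. One plausible reading, offered purely as an assumption: if retrieval fans out into independent searches (say, separate namespaces for projects and experience, which are my invention), Promise.all runs them concurrently instead of back-to-back:

```typescript
// Hypothetical helper: like searchPinecone above, but scoped to one namespace.
declare function searchNamespace(query: string, apiKey: string, ns: string): Promise<string[]>;

// Independent searches run concurrently rather than sequentially.
const [projectHits, experienceHits] = await Promise.all([
  searchNamespace(query, apiKey, 'projects'),
  searchNamespace(query, apiKey, 'experience'),
]);
const relevantContexts = [...projectHits, ...experienceHits];
```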
Results and Impact
The implementation exceeded my expectations:
- Performance: Sub-3-second response times for complex queries
- Security: Enterprise-grade security with zero API key exposure
- Accuracy: High-quality responses grounded in my actual experience
- User Experience: Seamless chat interface with markdown support
- Scalability: Auto-scaling serverless architecture
Key Learnings
AWS Amplify Gen2: A Game Changer
The most significant learning was how AWS Amplify Gen2 transforms serverless development:
- Seamless CI/CD: Every push to the repository triggers automatic deployment across the entire stack - frontend, backend, and Lambda functions. No manual deployment steps needed.
- Local Development Excellence: The Amplify sandbox environment allows local testing of Lambda functions with ease. Unlike previous projects where testing serverless functions locally was painful, Amplify's `npx ampx sandbox` command creates a complete development environment that mirrors production.
- Infrastructure as Code: Defining backend resources in TypeScript feels natural and maintainable compared to raw CloudFormation or Terraform.
- Automatic Environment Management: Amplify handles environment variables, secrets, and service connections automatically across development and production environments.
Technical Architecture Insights
- RAG Systems Work: The combination of semantic search + LLM produces remarkably relevant responses when properly implemented.
- Cost-Driven Architecture: Choosing Pinecone over AWS OpenSearch Knowledge Bases saved significant costs while maintaining functionality. Sometimes the "native" solution isn't the most practical one.
- Control vs Convenience Trade-offs: Direct Bedrock invocation gives complete control over the RAG pipeline, though Bedrock Agents might offer more sophisticated capabilities for future iterations.
- Security First: Proper secrets management is crucial for production systems - AWS Secrets Manager integration was seamless with Amplify.
- User Experience Matters: A smooth frontend is as important as a sophisticated backend - the chat interface needed to feel natural and responsive.
Development Workflow Transformation
Coming from projects where local Lambda testing required complex setups, Docker containers, or SAM CLI configurations, Amplify's development experience is revolutionary. The ability to test the entire RAG pipeline locally, including Bedrock calls and Pinecone integration, dramatically accelerated development cycles.
What's Next?
This RAG implementation opens up exciting possibilities for future enhancements:
Short-term Improvements
- Conversation Memory: Adding persistent chat history for multi-turn conversations
- Response Streaming: Implementing real-time response streaming for better UX (a rough sketch follows this list)
- Enhanced Error Handling: More sophisticated fallback mechanisms for service failures
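For a taste of what the streaming upgrade could look like: Bedrock exposes InvokeModelWithResponseStreamCommand, which emits Anthropic delta events as tokens are generated. A minimal server-side sketch; wiring the stream through to the browser would be a separate step:

```typescript
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: process.env.AWS_REGION });

// Stream the answer token-by-token instead of waiting for the full body.
const response = await client.send(new InvokeModelWithResponseStreamCommand({
  modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
  contentType: 'application/json',
  body: JSON.stringify({
    anthropic_version: 'bedrock-2023-05-31',
    max_tokens: 1000,
    messages: [{ role: 'user', content: 'Tell me about your projects' }],
  }),
}));

// Each event carries a JSON-encoded Anthropic streaming chunk.
for await (const event of response.body ?? []) {
  if (event.chunk?.bytes) {
    const chunk = JSON.parse(new TextDecoder().decode(event.chunk.bytes));
    if (chunk.type === 'content_block_delta') {
      process.stdout.write(chunk.delta.text); // forward to the client here
    }
  }
}
```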
Medium-term Evolution
- Bedrock Agents Migration: Exploring AWS Bedrock Agents for more sophisticated reasoning and automated tool selection - this could provide better orchestration of the RAG pipeline
- Multi-modal Capabilities: Integrating document and image understanding for richer interactions
- Knowledge Base Expansion: Continuously updating and expanding the vector database with new content
Long-term Vision
- Advanced Analytics: Tracking user interactions and improving responses based on usage patterns
- Personalization: Adapting responses based on user interests and interaction history
- Multi-language Support: Expanding beyond English to serve a global audience
The architecture I've built provides a solid foundation for these future enhancements, with the flexibility to swap components (like migrating from direct Bedrock calls to Bedrock Agents) without major restructuring.
Technical Stack Summary
| Component | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 + TypeScript | Modern React application |
| Backend | AWS Amplify Gen2 | Serverless infrastructure |
| AI Engine | AWS Bedrock (Claude 3 Sonnet) | Language model inference |
| Vector DB | Pinecone | Semantic search and retrieval |
| Security | AWS Secrets Manager + IAM | Secure key management |
| Hosting | AWS Lambda + CloudFront | Scalable deployment |
Conclusion
Building this AI assistant has been an incredible journey that combines my passion for AI/ML engineering with practical cloud architecture skills. It's not just a portfolio feature - it's a demonstration of how modern AI systems can be built with enterprise-grade quality and security.
What This Project Demonstrates
The system showcases real-world application of:
- RAG Architecture: Practical implementation of retrieval-augmented generation
- Cost-Conscious Engineering: Making architectural decisions based on practical constraints
- Cloud Engineering: Serverless AWS infrastructure management with Amplify Gen2
- AI/ML Operations: Production-ready AI system deployment and testing
- Security Engineering: Proper secrets management and access controls
The Amplify Advantage
Perhaps the most valuable learning has been experiencing AWS Amplify Gen2's development workflow. Having worked on serverless projects where local testing was complex and deployments were manual processes, Amplify's seamless integration of development, testing, and deployment represents a significant leap forward in developer productivity.
Feel free to try out the AI assistant on my portfolio and ask it anything about my work and experience. It's a living example of how conversational AI can enhance professional portfolios and create engaging user experiences.
The decision to build this from scratch, rather than using existing chatbot solutions, has provided deep insights into RAG system architecture, cost optimization strategies, and modern cloud development practices that will be invaluable for future AI/ML projects.
Want to learn more about the implementation details? Check out the full source code on my GitHub repository or connect with me on LinkedIn to discuss AI/ML engineering projects.