AI Assistant Portfolio Upgrade: From Concept to Production RAG System

When I first started thinking about enhancing my portfolio, I knew I wanted something that would truly showcase my AI/ML engineering capabilities. What better way than to build an intelligent conversational AI that could discuss my work, experience, and projects? Today, I'm excited to share how I transformed my portfolio from a static site into an interactive AI-powered experience.

The Vision: More Than Just a Chatbot

The goal wasn't just to add a chatbot to my portfolio - anyone can do that with existing tools. I wanted to build a sophisticated Retrieval-Augmented Generation (RAG) system that could intelligently answer questions about my professional background by combining the power of large language models with semantic search through my knowledge base.

The key requirements were:

  • Intelligent Conversations: Context-aware responses about my work and experience
  • Enterprise-Grade Security: Secure API key management and proper access controls
  • Scalable Architecture: Built on AWS serverless infrastructure
  • Real-time Performance: Fast response times for seamless user experience
  • Professional Quality: Production-ready implementation, not a proof of concept

Architecture Overview: AWS Amplify Gen2 + Bedrock + Pinecone

The final architecture leverages several cutting-edge AWS services and technologies:

[Architecture diagram] AI Assistant RAG System Architecture Flow: the user query flows through the Next.js API to a Lambda function, which orchestrates Pinecone vector search and AWS Bedrock to generate a contextually relevant AI response.

Core Components

  • Frontend: Next.js 14 with TypeScript and modern React patterns
  • Backend: AWS Amplify Gen2 with Lambda functions
  • AI Engine: AWS Bedrock (Claude 3 Sonnet)
  • Vector Database: Pinecone for semantic search
  • Security: AWS Secrets Manager and IAM roles
  • Deployment: Serverless architecture with auto-scaling

Implementation Deep Dive

1. AWS Amplify Gen2 Backend Configuration

First, I set up the Amplify backend with proper IAM permissions. The beauty of Amplify Gen2 is how it simplifies complex AWS resource management:

// amplify/backend.ts
import { defineBackend } from '@aws-amplify/backend';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { auth } from './auth/resource';
import { data } from './data/resource';
import { bedrockChat } from './functions/bedrock-chat/resource';

const backend = defineBackend({
  auth,
  data,
  bedrockChat,
});

// Bedrock permissions for Claude 3 Sonnet
backend.bedrockChat.resources.lambda.addToRolePolicy(
  new PolicyStatement({
    actions: ['bedrock:InvokeModel'],
    resources: ['arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-sonnet-*']
  })
);

// Secrets Manager access for API keys
backend.bedrockChat.resources.lambda.addToRolePolicy(
  new PolicyStatement({
    actions: ['secretsmanager:GetSecretValue'],
    resources: ['arn:aws:secretsmanager:*:*:secret:pinecone/api-keys*']
  })
);
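
The bedrockChat resource referenced above is defined with Amplify's defineFunction. A minimal version looks something like this (the file path, timeout, and environment variable are illustrative choices, not the exact production configuration):

// amplify/functions/bedrock-chat/resource.ts
import { defineFunction } from '@aws-amplify/backend';

export const bedrockChat = defineFunction({
  name: 'bedrock-chat',
  entry: './handler.ts',
  // Generous timeout to cover vector search plus model inference
  timeoutSeconds: 30,
  environment: {
    // Illustrative: the Pinecone index URL is injected per environment
    PINECONE_INDEX_URL: process.env.PINECONE_INDEX_URL ?? '',
  },
});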

2. Lambda Function: The Heart of the RAG System

The Lambda function orchestrates the entire RAG workflow. Here's how it works:

Step 1: Secure API Key Retrieval

import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from '@aws-sdk/client-secrets-manager';

async function getPineconeApiKey(): Promise<string> {
  const secretName = "pinecone/api-keys";
  const client = new SecretsManagerClient({ region: process.env.AWS_REGION });
  
  const response = await client.send(new GetSecretValueCommand({
    SecretId: secretName
  }));
  
  // The secret is stored as JSON, e.g. {"api_key": "..."}
  return JSON.parse(response.SecretString!).api_key;
}
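
One optimization worth noting: caching the secret in module scope lets warm Lambda invocations skip the Secrets Manager round trip entirely. A minimal sketch:

// Module-level cache: persists across warm invocations of the same Lambda container
let cachedApiKey: string | undefined;

async function getPineconeApiKeyCached(): Promise<string> {
  if (!cachedApiKey) {
    cachedApiKey = await getPineconeApiKey();
  }
  return cachedApiKey;
}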

Step 2: Semantic Search with Pinecone

async function searchPinecone(query: string, apiKey: string): Promise<string[]> {
  const pineconeUrl = process.env.PINECONE_INDEX_URL; // Your Pinecone index URL
  if (!pineconeUrl) {
    throw new Error('PINECONE_INDEX_URL is not configured');
  }
  
  const response = await fetch(pineconeUrl, {
    method: 'POST',
    headers: {
      'Api-Key': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query: {
        inputs: { text: query }, // the index embeds the query text server-side
        top_k: 5
      },
      fields: ['text', 'content', 'chunk_text']
    })
  });
  
  const data = await response.json();
  // Pull the stored text out of each hit, dropping any empty fields
  return data.result?.hits?.map((hit: any) => hit.fields?.text).filter(Boolean) || [];
}

Step 3: Context-Augmented AI Response

import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from '@aws-sdk/client-bedrock-runtime';

const bedrockClient = new BedrockRuntimeClient({ region: process.env.AWS_REGION });

// `message` is the user's question; `relevantContexts` comes from the Pinecone
// search in Step 2. The retrieved chunks are appended to the system prompt.
let systemPrompt = `You are an AI assistant representing [Your Name]...`;

if (relevantContexts.length > 0) {
  systemPrompt += "\n\nAdditional relevant context:\n" + 
                 relevantContexts.join('\n\n');
}

// Bedrock Claude 3 Sonnet invocation
const bedrockResponse = await bedrockClient.send(new InvokeModelCommand({
  modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
  body: JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1000,
    system: systemPrompt,
    messages: [{ role: "user", content: message }]
  })
}));
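
The invocation above returns the raw model output; extracting the reply takes one more decoding step. With the standard Anthropic Messages response shape that Bedrock returns, it looks something like this:

// The response body arrives as a Uint8Array of JSON; decode it and pull out the text
const responseBody = JSON.parse(new TextDecoder().decode(bedrockResponse.body));
const reply: string = responseBody.content?.[0]?.text ?? '';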

3. Frontend Integration: Seamless User Experience

The frontend provides a modern chat interface with real-time markdown rendering:

// Next.js API route for clean separation of concerns
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  const { message } = await request.json();
  
  // Forward the message to the Lambda function URL configured per environment
  const lambdaResponse = await fetch(process.env.LAMBDA_FUNCTION_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  
  const data = await lambdaResponse.json();
  return NextResponse.json({ message: data.reply });
}
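
On the client side, the chat component simply posts to this route (assuming it's mounted at /api/chat):

// Client-side call from the chat component
async function sendMessage(message: string): Promise<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  const data = await res.json();
  return data.message; // rendered as markdown in the chat UI
}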

Technical Challenges and Key Decisions

Decision 1: Pinecone vs AWS OpenSearch Knowledge Bases

Consideration: While AWS Bedrock Knowledge Bases with OpenSearch would provide native integration, the cost implications were significant for a portfolio project. Pinecone's serverless pricing model offered better cost predictability and lower entry costs.

Decision: I chose Pinecone for vector search, accepting the additional integration complexity in exchange for cost efficiency.

Decision 2: Direct Bedrock Invocation vs Bedrock Agents

Consideration: Bedrock Agents would have provided more sophisticated orchestration and potentially better vector search integration out of the box.

Decision: I implemented direct model invocation for better control over the RAG pipeline and cost management. This gives me full visibility into each step of the process.

Future Upgrade: Migrating to Bedrock Agents could provide more sophisticated reasoning capabilities and automated tool selection.

Challenge 1: Vector Database Integration

Problem: Integrating Pinecone with AWS Lambda while maintaining performance.

Solution: Optimized API calls with proper error handling and fallback mechanisms, as sketched below.
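
Concretely, the fallback wraps the search so a Pinecone failure or timeout degrades to an answer without retrieved context instead of an error. A minimal sketch (the five-second timeout is an arbitrary choice):

// Wrap the Pinecone call so a search failure degrades gracefully instead of breaking the chat
async function searchWithFallback(query: string, apiKey: string): Promise<string[]> {
  try {
    // Give up on retrieval if Pinecone doesn't answer within 5 seconds
    return await Promise.race([
      searchPinecone(query, apiKey),
      new Promise<string[]>((_, reject) =>
        setTimeout(() => reject(new Error('Pinecone search timed out')), 5000)
      ),
    ]);
  } catch (err) {
    console.error('Vector search failed, answering without retrieved context:', err);
    return []; // empty context means the system prompt alone drives the response
  }
}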

Challenge 2: Security at Scale

Problem: Secure API key management in a serverless environment.

Solution: AWS Secrets Manager with least-privilege IAM policies.

Challenge 3: Response Quality

Problem: Ensuring AI responses are accurate and contextually relevant.

Solution: Sophisticated prompt engineering with retrieval-augmented context.

Challenge 4: Performance Optimization

Problem: Minimizing latency in the RAG pipeline.

Solution: Parallel processing of vector search and an optimized Lambda configuration (see the sketch below).
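
The exact parallelization depends on how the knowledge base is laid out, but here's one plausible shape: if content lives in multiple Pinecone indexes, the searches can run concurrently with Promise.all. The per-index environment variables and the searchPineconeAt helper are illustrative, not the production code:

// Illustrative refactor: same body as searchPinecone, with the index URL as a parameter
async function searchPineconeAt(url: string, query: string, apiKey: string): Promise<string[]> {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Api-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: { inputs: { text: query }, top_k: 5 },
      fields: ['text', 'content', 'chunk_text'],
    }),
  });
  const data = await response.json();
  return data.result?.hits?.map((hit: any) => hit.fields?.text).filter(Boolean) || [];
}

// Fan the search out over several indexes at once: total latency tracks the
// slowest single search, not the sum of all of them
async function searchAllIndexes(query: string, apiKey: string): Promise<string[]> {
  const indexUrls = [
    process.env.PINECONE_PROJECTS_URL,   // illustrative: one index per content type
    process.env.PINECONE_EXPERIENCE_URL,
  ].filter((url): url is string => Boolean(url));

  const results = await Promise.all(
    indexUrls.map((url) => searchPineconeAt(url, query, apiKey))
  );
  return results.flat();
}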

Results and Impact

The implementation exceeded my expectations:

  • 🚀 Performance: Sub-3-second response times for complex queries
  • 🔒 Security: Enterprise-grade security with zero API key exposure
  • 🎯 Accuracy: High-quality responses grounded in my actual experience
  • 📱 User Experience: Seamless chat interface with markdown support
  • ⚡ Scalability: Auto-scaling serverless architecture

Key Learnings

AWS Amplify Gen2: A Game Changer

The most significant learning was how AWS Amplify Gen2 transforms serverless development:

  1. Seamless CI/CD: Every push to the repository triggers automatic deployment across the entire stack - frontend, backend, and Lambda functions. No manual deployment steps needed.

  2. Local Development Excellence: The Amplify sandbox environment allows local testing of Lambda functions with ease. Unlike previous projects where testing serverless functions locally was painful, Amplify's npx ampx sandbox command creates a complete development environment that mirrors production (see the commands after this list).

  3. Infrastructure as Code: Defining backend resources in TypeScript feels natural and maintainable compared to raw CloudFormation or Terraform.

  4. Automatic Environment Management: Amplify handles environment variables, secrets, and service connections automatically across development and production environments.
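
For reference, the whole sandbox workflow boils down to two commands: one spins up a personal cloud sandbox that redeploys on every file save, and the other tears it down:

# Start a per-developer cloud sandbox that watches for changes and redeploys
npx ampx sandbox

# Tear the sandbox down when you're done
npx ampx sandbox delete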

Technical Architecture Insights

  1. RAG Systems Work: The combination of semantic search + LLM produces remarkably relevant responses when properly implemented.

  2. Cost-Driven Architecture: Choosing Pinecone over AWS OpenSearch Knowledge Bases saved significant costs while maintaining functionality. Sometimes the "native" solution isn't the most practical one.

  3. Control vs Convenience Trade-offs: Direct Bedrock invocation gives complete control over the RAG pipeline, though Bedrock Agents might offer more sophisticated capabilities for future iterations.

  4. Security First: Proper secrets management is crucial for production systems - AWS Secrets Manager integration was seamless with Amplify.

  5. User Experience Matters: A smooth frontend is as important as a sophisticated backend - the chat interface needed to feel natural and responsive.

Development Workflow Transformation

Coming from projects where local Lambda testing required complex setups, Docker containers, or SAM CLI configurations, Amplify's development experience is revolutionary. The ability to test the entire RAG pipeline locally, including Bedrock calls and Pinecone integration, dramatically accelerated development cycles.

What's Next?

This RAG implementation opens up exciting possibilities for future enhancements:

Short-term Improvements

  • Conversation Memory: Adding persistent chat history for multi-turn conversations
  • Response Streaming: Implementing real-time response streaming for better UX (see the sketch after this list)
  • Enhanced Error Handling: More sophisticated fallback mechanisms for service failures
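
Streaming is the most tractable of these, since the same SDK already exposes InvokeModelWithResponseStreamCommand. A minimal sketch of the Lambda-side change, assuming the standard Anthropic streaming event shape:

import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: process.env.AWS_REGION });

// Stream the model's reply token-by-token instead of waiting for the full response
async function* streamReply(systemPrompt: string, message: string) {
  const response = await client.send(new InvokeModelWithResponseStreamCommand({
    modelId: 'anthropic.claude-3-sonnet-20240229-v1:0',
    body: JSON.stringify({
      anthropic_version: 'bedrock-2023-05-31',
      max_tokens: 1000,
      system: systemPrompt,
      messages: [{ role: 'user', content: message }],
    }),
  }));

  for await (const event of response.body ?? []) {
    if (!event.chunk?.bytes) continue;
    const chunk = JSON.parse(new TextDecoder().decode(event.chunk.bytes));
    // Anthropic streaming emits content_block_delta events carrying text fragments
    if (chunk.type === 'content_block_delta' && chunk.delta?.text) {
      yield chunk.delta.text as string;
    }
  }
}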

Medium-term Evolution

  • Bedrock Agents Migration: Exploring AWS Bedrock Agents for more sophisticated reasoning and automated tool selection - this could provide better orchestration of the RAG pipeline
  • Multi-modal Capabilities: Integrating document and image understanding for richer interactions
  • Knowledge Base Expansion: Continuously updating and expanding the vector database with new content

Long-term Vision

  • Advanced Analytics: Tracking user interactions and improving responses based on usage patterns
  • Personalization: Adapting responses based on user interests and interaction history
  • Multi-language Support: Expanding beyond English to serve a global audience

The architecture I've built provides a solid foundation for these future enhancements, with the flexibility to swap components (like migrating from direct Bedrock calls to Bedrock Agents) without major restructuring.

Technical Stack Summary

Component   | Technology                     | Purpose
------------|--------------------------------|-------------------------------
Frontend    | Next.js 14 + TypeScript        | Modern React application
Backend     | AWS Amplify Gen2               | Serverless infrastructure
AI Engine   | AWS Bedrock (Claude 3 Sonnet)  | Language model inference
Vector DB   | Pinecone                       | Semantic search and retrieval
Security    | AWS Secrets Manager + IAM      | Secure key management
Hosting     | AWS Lambda + CloudFront        | Scalable deployment

Conclusion

Building this AI assistant has been an incredible journey that combines my passion for AI/ML engineering with practical cloud architecture skills. It's not just a portfolio feature - it's a demonstration of how modern AI systems can be built with enterprise-grade quality and security.

What This Project Demonstrates

The system showcases real-world application of:

  • RAG Architecture: Practical implementation of retrieval-augmented generation
  • Cost-Conscious Engineering: Making architectural decisions based on practical constraints
  • Cloud Engineering: Serverless AWS infrastructure management with Amplify Gen2
  • AI/ML Operations: Production-ready AI system deployment and testing
  • Security Engineering: Proper secrets management and access controls

The Amplify Advantage

Perhaps the most valuable learning has been experiencing AWS Amplify Gen2's development workflow. Having worked on serverless projects where local testing was complex and deployments were manual processes, Amplify's seamless integration of development, testing, and deployment represents a significant leap forward in developer productivity.

Feel free to try out the AI assistant on my portfolio and ask it anything about my work and experience. It's a living example of how conversational AI can enhance professional portfolios and create engaging user experiences.

The decision to build this from scratch, rather than using existing chatbot solutions, has provided deep insights into RAG system architecture, cost optimization strategies, and modern cloud development practices that will be invaluable for future AI/ML projects.


Want to learn more about the implementation details? Check out the full source code on my GitHub repository or connect with me on LinkedIn to discuss AI/ML engineering projects.