Arkadiusz Kulpa

AI & ML Engineer

Adding an AI Chat Assistant with RAG to My Portfolio

2025-08-20 · 5 min read
aws-bedrock, pinecone, rag, lambda, dynamodb, nextjs, devlog

A portfolio shows what you have built. But what if it could also talk about what you have built?

That was the idea behind adding an AI chat assistant to my site. Not a generic chatbot that spits out canned responses, but a retrieval-augmented generation system that understands my projects, experience, and technical background — and can hold an intelligent conversation about them.

This is the second post in the development journey series. The foundation was in place. Now it was time to make it intelligent.

The Vision

I wanted visitors to be able to ask questions like "What AI projects has Arek worked on?" or "Tell me about your experience with AWS" and get accurate, contextual answers grounded in my actual portfolio content. Not hallucinated generalities — real information, retrieved and synthesised in real time.

The key requirements:

  • Contextual answers grounded in my portfolio content, not generic LLM knowledge
  • Real-time streaming so responses feel conversational, not like waiting for a loading spinner
  • Conversation history so the assistant remembers what was discussed in the current session
  • Enterprise-grade infrastructure on AWS — no third-party chatbot services

The Architecture

The system uses a classic RAG (Retrieval-Augmented Generation) pattern:

  1. Embed — Portfolio content (project descriptions, blog posts, experience details) gets chunked and embedded into vectors
  2. Store — Vectors go into a Pinecone index for fast similarity search
  3. Retrieve — When a user asks a question, the query gets embedded and the most relevant chunks are retrieved from Pinecone
  4. Generate — The retrieved context plus the user's question go to AWS Bedrock (Claude 3 Sonnet), which generates a grounded response

User question → Embed query → Pinecone similarity search → Top-K chunks
                                                              ↓
                                          Claude 3 Sonnet ← Context + Question
                                                              ↓
                                                      Streamed response → User
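The retrieve step boils down to ranking stored chunk vectors by cosine similarity against the embedded query. Pinecone does this server-side; this is a minimal sketch of the same ranking in plain JavaScript, with illustrative vectors and chunk shapes:

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the top-K chunks most similar to the query vector.
function topKChunks(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```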

AWS Bedrock

I chose AWS Bedrock over calling the Anthropic API directly because it keeps everything within the AWS ecosystem. IAM handles authentication, there are no API keys to manage externally, and the billing integrates with my existing AWS account. Claude 3 Sonnet provided the right balance of capability and cost for a portfolio assistant.
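Invoking Claude 3 on Bedrock uses the Anthropic Messages API format in the request body. A sketch of building that payload (the system prompt wording and token limit are illustrative; the actual call would go through `InvokeModelWithResponseStreamCommand` from `@aws-sdk/client-bedrock-runtime`):

```javascript
// Build the request body Bedrock expects for Claude 3 models.
// `context` is the retrieved portfolio text; `history` is prior turns.
function buildBedrockRequest(context, history, question) {
  return {
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1024,
    system:
      "You are a portfolio assistant. Answer only from the provided context.\n\n" +
      `Context:\n${context}`,
    messages: [
      ...history, // e.g. [{ role: "user", content: "..." }, { role: "assistant", content: "..." }]
      { role: "user", content: question },
    ],
  };
}
```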

Pinecone Vector Database

Pinecone handles the vector storage and similarity search. Each piece of portfolio content gets split into chunks, embedded using a sentence transformer, and stored with metadata (source document, section, topic). At query time, a single API call returns the most relevant chunks ranked by cosine similarity.

The beauty of this approach is that I can update my portfolio content — add a new project, update my experience — and the assistant immediately has access to the latest information after re-embedding.
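The chunking step can be sketched as a simple splitter that attaches metadata to every chunk so retrieved results trace back to their source. Chunk size, overlap, and the metadata fields here are illustrative, not the exact parameters used in production:

```javascript
// Split text into overlapping word-window chunks, each carrying metadata
// so retrieved results can be traced back to their source document.
function chunkDocument(text, meta, chunkSize = 100, overlap = 20) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push({
      text: words.slice(start, start + chunkSize).join(" "),
      metadata: { ...meta, chunkIndex: chunks.length },
    });
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which improves retrieval quality at the cost of a little index size.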

Lambda Functions

Two Lambda functions power the backend:

  • search-pinecone — Takes the user's query, embeds it, searches Pinecone, and returns relevant context chunks
  • bedrockChat — Receives the context plus conversation history, constructs the prompt, and streams the response from Bedrock back to the client

Both run on Node.js 20 with minimal cold start times. The serverless model means I pay nothing when nobody is chatting, and the system scales automatically if traffic spikes.
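A minimal shape for the search-pinecone handler, with the embedding model and Pinecone client injected as dependencies so the request handling stays testable. The names and event shape are illustrative; the real function uses the Pinecone SDK for the search call:

```javascript
// Lambda handler skeleton: parse the query, embed it, search, return chunks.
// `deps.embed` and `deps.search` stand in for the embedding model and
// Pinecone client used in the deployed function.
async function searchHandler(event, deps) {
  const { query } = JSON.parse(event.body || "{}");
  if (!query) {
    return { statusCode: 400, body: JSON.stringify({ error: "query required" }) };
  }
  const vector = await deps.embed(query);
  const matches = await deps.search(vector, 5); // top-5 chunks
  return { statusCode: 200, body: JSON.stringify({ matches }) };
}
```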

Streaming Responses

This was one of the trickier parts. LLM responses can take several seconds to generate fully, and making the user wait for the complete response before displaying anything creates a terrible experience. Streaming — sending tokens as they are generated — makes the interaction feel conversational.

The frontend connects to the backend and receives tokens progressively, rendering them as they arrive. The user sees the response build word by word, just like ChatGPT or Claude. Behind the scenes, this required careful handling of the streaming connection, buffering, and error states.
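With the Claude 3 Messages API, the stream arrives as a sequence of typed events, and the text lives in `content_block_delta` events. A sketch of the accumulation step, using plain event objects in place of the decoded stream (the event shapes follow Anthropic's streaming format; the render callback is illustrative):

```javascript
// Accumulate streamed text deltas and hand each token to a render callback,
// so the UI can paint the response as it arrives.
function consumeStreamEvents(events, onToken) {
  let full = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
      full += event.delta.text;
      onToken(event.delta.text); // e.g. append to the chat bubble in the UI
    }
  }
  return full;
}
```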

Conversation History

A chat assistant that forgets what you just said is frustrating. I implemented conversation persistence using DynamoDB for metadata (session ID, timestamps, message counts) and S3 for detailed conversation logs.

Each session gets a unique ID. Messages are tracked and stored so that the assistant maintains context throughout the conversation. When constructing the prompt for Bedrock, the recent conversation history is included alongside the retrieved context, giving the model everything it needs to generate coherent, contextual responses.
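Including history in the prompt means bounding it, so a long session does not blow the context budget once the retrieved chunks are added. A sketch of that windowing step (the six-turn limit is illustrative, not the production value):

```javascript
// Keep only the most recent turns when building the prompt, so retrieved
// context plus history stays within the model's context budget.
function recentHistory(messages, maxTurns = 6) {
  return messages.slice(-maxTurns);
}

// Combine the windowed history with the new question into the messages
// array sent to the model.
function buildMessages(history, question, maxTurns = 6) {
  return [...recentHistory(history, maxTurns), { role: "user", content: question }];
}
```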

Dark and Light Mode

While building the chat interface, I also standardised the site's dark and light mode system. The chat UI needed to work in both themes, which exposed inconsistencies in how theming was handled across the site. I consolidated the approach using CSS custom properties and a theme context, ensuring every component — including the new chat interface — renders correctly in both modes.

What This Means for You

If you visit the chat page on my portfolio, you are interacting with a system I built end-to-end. The embeddings were generated from my actual content. The retrieval happens in real time against a production Pinecone index. The responses stream from Claude 3 Sonnet through my Lambda functions. The conversation history persists for your session.

It is not a demo or a wrapper around someone else's API. It is a complete RAG implementation running on AWS infrastructure.

What Came Next

With the AI assistant live, the site had two main features: a static blog and an intelligent chat. But the blog was still just markdown files committed to the repository. Every new post required a code commit and a deployment. That was fine for a handful of articles, but it would not scale.

The next step was to rebuild the blog backend entirely — replacing static files with S3, DynamoDB, and a proper admin UI.

That story is in From Static Markdown to a Full CMS.


This post is part of a series documenting the development of arkadiuszkulpa.co.uk.
