A portfolio shows what you have built. But what if it could also talk about what you have built?
That was the idea behind adding an AI chat assistant to my site. Not a generic chatbot that spits out canned responses, but a retrieval-augmented generation system that understands my projects, experience, and technical background — and can hold an intelligent conversation about them.
This is the second post in the development journey series. The foundation was in place. Now it was time to make it intelligent.
I wanted visitors to be able to ask questions like "What AI projects has Arek worked on?" or "Tell me about your experience with AWS" and get accurate, contextual answers grounded in my actual portfolio content. Not hallucinated generalities — real information, retrieved and synthesised in real time.
The key requirements: answers grounded in real portfolio content, responses that stream in real time, and an architecture that costs nothing when idle.
The system uses a classic RAG (Retrieval-Augmented Generation) pattern:
```text
User question → Embed query → Pinecone similarity search → Top-K chunks
                                       ↓
                   Claude 3 Sonnet ← Context + Question
                                       ↓
                          Streamed response → User
```
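The retrieval step in this pipeline comes down to ranking stored chunk embeddings by cosine similarity against the query embedding. Pinecone does this server-side at scale; the sketch below just shows the underlying maths, with illustrative function names:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// normalised by both magnitudes, giving a score in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all stored vectors against the query and keep the top K —
// conceptually what a vector index does for you at query time.
function topK(
  query: number[],
  stored: { id: string; vector: number[] }[],
  k: number
): { id: string; score: number }[] {
  return stored
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In production the index does this in a single query call; the point of the sketch is that "most relevant" simply means "highest cosine score".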
I chose AWS Bedrock over calling the Anthropic API directly because it keeps everything within the AWS ecosystem. IAM handles authentication, there are no API keys to manage externally, and the billing integrates with my existing AWS account. Claude 3 Sonnet provided the right balance of capability and cost for a portfolio assistant.
Pinecone handles the vector storage and similarity search. Each piece of portfolio content gets split into chunks, embedded using a sentence transformer, and stored with metadata (source document, section, topic). At query time, a single API call returns the most relevant chunks ranked by cosine similarity.
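The ingestion side — splitting content into chunks with metadata before embedding — can be sketched as follows. The chunk size, overlap, and metadata fields here are illustrative, not the production values:

```typescript
// A chunk of portfolio content, ready to be embedded and upserted
// into the vector index along with its metadata.
interface Chunk {
  text: string;
  metadata: { source: string; section: string; index: number };
}

// Split a document into fixed-size chunks with a small overlap so
// that sentences cut at a boundary still appear with some context.
function chunkDocument(
  text: string,
  source: string,
  section: string,
  chunkSize = 500,
  overlap = 50
): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      text: text.slice(start, end),
      metadata: { source, section, index: index++ },
    });
    if (end === text.length) break;
    start = end - overlap; // overlap keeps context across boundaries
  }
  return chunks;
}
```

Each chunk is then embedded and stored with its metadata, which is what lets a query result point back to the document and section it came from.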
The beauty of this approach is that I can update my portfolio content — add a new project, update my experience — and the assistant immediately has access to the latest information after re-embedding.
Two Lambda functions power the backend.
Both run on Node.js 20 with minimal cold start times. The serverless model means I pay nothing when nobody is chatting, and the system scales automatically if traffic spikes.
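As a rough shape, a chat function's entry point looks something like this — the event shape, field names, and validation are assumptions for illustration, not the deployed code:

```typescript
// Minimal sketch of a chat Lambda handler: parse the incoming
// request, validate it, and (in the real system) run the RAG
// pipeline before returning or streaming a response.
interface ChatEvent {
  body: string; // JSON payload, e.g. { sessionId, message }
}

interface ChatRequest {
  sessionId: string;
  message: string;
}

export const handler = async (event: ChatEvent) => {
  let request: ChatRequest | null = null;
  try {
    request = JSON.parse(event.body) as ChatRequest;
  } catch {
    // fall through to the validation check below
  }
  if (!request || !request.sessionId || !request.message) {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: "Invalid request" }),
    };
  }
  // Real implementation: embed the message, query the vector index,
  // build the prompt, and invoke Bedrock with response streaming.
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```

Keeping the handler thin like this — parse, validate, delegate — is also what keeps cold starts short, since heavy clients can be initialised once outside the handler.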
This was one of the trickier parts. LLM responses can take several seconds to generate fully, and making the user wait for the complete response before displaying anything creates a terrible experience. Streaming — sending tokens as they are generated — makes the interaction feel conversational.
The frontend connects to the backend and receives tokens progressively, rendering them as they arrive. The user sees the response build word by word, just like ChatGPT or Claude. Behind the scenes, this required careful handling of the streaming connection, buffering, and error states.
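One subtlety worth showing: network reads can split an event across chunks, so the frontend has to buffer until a full delimiter arrives before rendering a token. A minimal sketch, assuming a `data: <token>\n\n` server-sent-events framing (an assumption for illustration, not necessarily the site's actual wire format):

```typescript
// Accumulates raw network chunks and emits only complete tokens.
// An event may arrive split across several reads, so we buffer
// until each "\n\n" delimiter before parsing.
class SseBuffer {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const tokens: string[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const event = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      if (event.startsWith("data: ")) {
        tokens.push(event.slice(6)); // strip the "data: " prefix
      }
    }
    return tokens;
  }
}
```

Each array returned by `push` is safe to append to the rendered response, which is what produces the word-by-word build-up in the UI.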
A chat assistant that forgets what you just said is frustrating. I implemented conversation persistence using DynamoDB for metadata (session ID, timestamps, message counts) and S3 for detailed conversation logs.
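A minimal sketch of that split, with assumed attribute names and key layout — the DynamoDB item holds the lightweight stats while the full transcript lives under a per-session S3 key:

```typescript
// Illustrative session metadata record for DynamoDB; the detailed
// message log is written to S3 under s3LogKey. Names are assumptions.
interface SessionMeta {
  sessionId: string;   // partition key
  createdAt: string;   // ISO 8601 timestamp
  messageCount: number;
  s3LogKey: string;    // where the full conversation log lives
}

function newSession(sessionId: string, now: Date): SessionMeta {
  return {
    sessionId,
    createdAt: now.toISOString(),
    messageCount: 0,
    s3LogKey: `conversations/${sessionId}.json`,
  };
}
```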
Each session gets a unique ID. Messages are tracked and stored so that the assistant maintains context throughout the conversation. When constructing the prompt for Bedrock, the recent conversation history is included alongside the retrieved context, giving the model everything it needs to generate coherent, contextual responses.
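Prompt assembly can be sketched like this — the labels, turn limit, and overall structure are assumptions for illustration, not the production prompt:

```typescript
// One turn of conversation history.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Combine retrieved chunks, recent history, and the new question
// into a single prompt string. Capping history at maxTurns keeps
// the prompt within a sensible token budget.
function buildPrompt(
  history: Turn[],
  retrievedChunks: string[],
  question: string,
  maxTurns = 6
): string {
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  const recent = history
    .slice(-maxTurns)
    .map((t) => `${t.role}: ${t.content}`)
    .join("\n");
  return [
    "Answer using only the portfolio context below.",
    `Context:\n${context}`,
    `Conversation so far:\n${recent}`,
    `user: ${question}`,
  ].join("\n\n");
}
```

Numbering the retrieved chunks also makes it easy to instruct the model to ground its answers in specific sources rather than invent details.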
While building the chat interface, I also standardised the site's dark and light mode system. The chat UI needed to work in both themes, which exposed inconsistencies in how theming was handled across the site. I consolidated the approach using CSS custom properties and a theme context, ensuring every component — including the new chat interface — renders correctly in both modes.
If you visit the chat page on my portfolio, you are interacting with a system I built end-to-end. The embeddings were generated from my actual content. The retrieval happens in real time against a production Pinecone index. The responses stream from Claude 3 Sonnet through my Lambda functions. The conversation history persists for your session.
It is not a demo or a wrapper around someone else's API. It is a complete RAG implementation running on AWS infrastructure.
With the AI assistant live, the site had two main features: a static blog and an intelligent chat. But the blog was still just markdown files committed to the repository. Every new post required a code commit and a deployment. That was fine for a handful of articles, but it would not scale.
The next step was to rebuild the blog backend entirely — replacing static files with S3, DynamoDB, and a proper admin UI.
That story is in *From Static Markdown to a Full CMS*.
This post is part of a series documenting the development of arkadiuszkulpa.co.uk.