How AI Thinks and Remembers
This is the most important technical page on this site. If you read one thing before talking to us, make it this.
Most AI vendors will tell you their system is "smart" and leave it at that. We think you deserve to understand what is actually happening under the hood — not because you need to become an engineer, but because understanding these concepts changes how you evaluate AI solutions, what questions you ask, and what you expect from a custom system.
Two concepts matter above all others: reasoning (how the AI processes your request) and memory (what the AI knows about you and your business). Get these right, and the system works. Get them wrong, and you have an expensive autocomplete.
How AI "Thinks" — Reasoning Explained
AI does not think the way you do. It does not have opinions, feelings, or consciousness. But it does process information in a structured way that produces useful output — and the way it processes matters more than most people realize.
Chain of thought — showing your work
When you ask a simple question ("What is 2 + 2?"), the AI responds immediately. No processing needed. But when you ask a complex question ("Review this contract and identify clauses that expose us to liability if the vendor fails to deliver on time"), the AI needs to work through it step by step.
This is called chain of thought reasoning. The AI breaks the problem into smaller steps, works through each one, and builds toward a final answer. Like showing your work in math class — the answer alone is less valuable than the reasoning that produced it.
Why this matters for you: When the AI shows its reasoning, you can verify it. If step 3 of 7 is wrong, you can catch it. If the AI just gives you a final answer with no reasoning, you have no way to check whether it arrived at that answer through sound logic or through a lucky guess.
We configure our systems to show reasoning for complex tasks. Not because it looks impressive, but because it makes errors visible and correctable.
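In code terms, requesting chain of thought is often as simple as how the task is phrased. A minimal sketch (the function and wording are illustrative, not a real API):

```python
# Hypothetical sketch: wrap a complex request so the model is asked to
# show numbered reasoning steps before its final answer.

def build_cot_prompt(task: str) -> str:
    """Build a prompt that asks for explicit, numbered reasoning steps."""
    return (
        f"{task}\n\n"
        "Work through this step by step. Number each step, state the "
        "evidence it relies on, and only then give your final answer "
        "on a line starting with 'Answer:'."
    )

prompt = build_cot_prompt(
    "Review this contract clause and identify liability exposure "
    "if the vendor misses the delivery deadline."
)
print(prompt)
```

Because each numbered step is visible in the output, a wrong step 3 of 7 can be caught before the conclusion is trusted.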
Reasoning models vs standard models
Not all AI models reason the same way. There is a meaningful difference between models optimized for quick responses and models built for deep analysis.
Standard models (like Claude Haiku 4.5) are fast. They process your request quickly and give you a response in seconds. Perfect for simple tasks: drafting a short email, answering a factual question, classifying a document into a category.
Reasoning models (like Claude Opus 4.6) take more time but go deeper. They consider multiple angles, weigh evidence, handle nuance, and produce more thorough analysis. Better for complex tasks: reviewing a legal document, analyzing a financial report, debugging a technical problem, or making a recommendation based on incomplete data.
The difference in practice:
Ask a standard model: "Should we renew the contract with Vendor X?"
You get: "Based on the information provided, renewing the contract seems reasonable given the competitive pricing and established relationship."
Ask a reasoning model the same question, with the same context:
You get: "Three factors to consider. First, the contract's auto-renewal clause locks you in for another 24 months — if you want to renegotiate, the window closes in 14 days. Second, the SLA penalties in section 8.3 are weaker than industry standard — they cap liability at one month's fee, not the actual cost of downtime. Third, their delivery track record over the last 12 months shows 3 missed deadlines, all in Q4. If Q4 performance matters to you, this is a pattern, not an anomaly. Recommendation: negotiate before renewing, focus on the SLA penalties and the auto-renewal term."
Same question. Same data. One response is surface-level. The other is actionable.
Reasoning can be configured
This is something most people do not know: you can control how deeply the AI analyzes a problem. You are not stuck with one mode.
For routine tasks (email drafts, simple lookups, standard responses), you want speed. The AI should respond in seconds, not overthink it.
For important decisions (contract review, financial analysis, strategy recommendations), you want depth. The AI should take its time, consider edge cases, and flag risks.
We configure this per-task in the systems we build. The same system can respond instantly to "What is our return policy?" and spend 30 seconds analyzing "Should we accept this vendor's counter-proposal?" The right level of reasoning is applied automatically based on the type of request.
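The per-task configuration described above can be sketched as a simple routing table. Model names, task categories, and the time limits here are invented for illustration:

```python
# Illustrative sketch of per-task reasoning depth: route routine requests
# to a fast model and important decisions to a deeper reasoning model.

FAST, DEEP = "fast-model", "reasoning-model"

ROUTING = {
    "faq": {"model": FAST, "max_thinking_seconds": 0},
    "email_draft": {"model": FAST, "max_thinking_seconds": 0},
    "contract_review": {"model": DEEP, "max_thinking_seconds": 30},
    "financial_analysis": {"model": DEEP, "max_thinking_seconds": 30},
}

def route(task_type: str) -> dict:
    """Pick model and reasoning depth; default to deep when unsure."""
    return ROUTING.get(task_type, {"model": DEEP, "max_thinking_seconds": 30})

print(route("faq")["model"])              # fast path for routine questions
print(route("contract_review")["model"])  # deep path for important decisions
```

Defaulting unknown task types to the deep path is a deliberate choice: over-thinking a simple question costs seconds, under-thinking an important one costs money.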
The models we use and why
We use multiple models from different providers — Claude (Anthropic), GPT-5.4 and Codex 5.3 (OpenAI), OpenClaw, and open-source options. Each model has a different strength: deep reasoning, speed, code analysis, multimodal processing, or on-premise privacy. We match the model to the task, not the other way around.
The choice of model is a technical decision we make during the build. You do not need to understand the differences — but you should know that the system is not one-size-fits-all. The right model for the right task is part of what makes a custom system perform better than ChatGPT out of the box.
For the full breakdown of every model we use, with a comparison table, see Our Tools and Technology Stack.
The System Prompt — the AI's Operating Manual
Every AI interaction starts with something the user never sees: the system prompt. This is a set of instructions that the AI reads before your first message. It defines who the AI is, what it knows, how it behaves, and what rules it follows.
Think of it as the operating manual for a new employee. Before they interact with a single client, they read the manual. It tells them the company's policies, how to handle complaints, what language to use, what to avoid, and when to escalate.
Without a system prompt, the AI has no manual. It defaults to its generic behavior — helpful, polite, but completely unaware of your specific context. With a well-written system prompt, the AI behaves like it has been working at your company for months.
Generic vs custom system prompt — the same question, two worlds apart
The question: A client emails with a complaint about a delayed delivery.
AI with no system prompt (generic):
"Thank you for reaching out. We apologize for the inconvenience. Your complaint has been noted, and we will look into this matter. Please allow 3-5 business days for a response."
This is the AI equivalent of a form letter. It says nothing specific. It commits to nothing concrete. It does not reference the client by name, their order, or the actual issue. A human reading this knows immediately that nobody actually read their message.
AI with a custom system prompt:
"Hi Anna, I have logged your delayed delivery as case #4521. Based on our shipping policy, orders delayed by more than 5 business days qualify for either a full refund or expedited re-shipment at no charge — your choice. I have flagged this with our logistics team. You will receive a tracking update or a resolution within 24 hours. If you would prefer the refund, just reply to this email and I will process it immediately."
Same AI model. Same question. The difference is the system prompt.
The custom system prompt told the AI:
- Use the client's first name
- Create a case number in the specified format
- Reference the company's actual shipping policy
- Offer the specific options the policy allows
- Commit to a concrete timeline
- Make the next step easy for the client
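Under the hood, a custom system prompt is just assembled text. A minimal sketch, with a hypothetical company config (the name, policy text, and rules are all invented):

```python
# Sketch of building a system prompt from a company configuration.
# Every field below is a placeholder, not a real client's data.

company = {
    "name": "Acme Logistics",
    "tone": "friendly but precise",
    "policies": [
        "Orders delayed more than 5 business days qualify for a full "
        "refund or free expedited re-shipment.",
    ],
    "rules": [
        "Address the client by first name.",
        "Always open a case number in the format #NNNN.",
        "Commit to a concrete response timeline.",
    ],
}

def build_system_prompt(cfg: dict) -> str:
    """Assemble the instruction text the AI reads before any message."""
    lines = [
        f"You are a customer support agent for {cfg['name']}.",
        f"Tone: {cfg['tone']}.",
        "Policies:",
        *[f"- {p}" for p in cfg["policies"]],
        "Rules:",
        *[f"- {r}" for r in cfg["rules"]],
    ]
    return "\n".join(lines)

print(build_system_prompt(company))
```

The point of the sketch: the "two worlds apart" difference comes from plain instructions, not from a different model.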
What goes into a system prompt
A well-built system prompt includes:
Identity and role
- Who the AI is (customer support agent, internal assistant, research analyst)
- The company it represents
- The tone it uses (professional, friendly, technical, casual)
Knowledge boundaries
- What the AI knows and can discuss
- What it must not discuss (competitors, confidential information, topics outside scope)
- What it should admit it does not know rather than guessing
Response rules
- Maximum and minimum response length
- Required elements (always include a case number, always end with a next step)
- Formatting preferences (bullet points vs paragraphs, formal vs informal)
Escalation rules
- When to hand off to a human
- How to hand off (message format, where to route)
- What constitutes an urgent issue vs a standard one
Error handling
- What to do when the question does not match any known category
- How to respond when information is missing
- How to handle contradictory instructions
Why this matters for business
AI without a good system prompt is like an employee with no onboarding. They are intelligent. They are capable. But they do not know your rules, your clients, or your standards. They will give reasonable-sounding answers that are wrong for your specific situation.
Every dollar you spend on AI without investing in the system prompt is partially wasted. The model is the engine, but the system prompt is the steering wheel. A powerful engine with no steering does not get you where you want to go.
We write system prompts as part of every build. They are not an afterthought or a template. They are custom-written based on your actual processes, your actual communication examples, and your actual edge cases.
What Is CLAUDE.md — the Identity File
Here is where things get specific to how we build at Kuliberda Labs.
CLAUDE.md is a configuration file that we create for every client system. It is the master document that defines who the AI is in the context of your company. If the system prompt is the employee handbook, CLAUDE.md is the employee's complete dossier — their training, their knowledge, their personality, their rules, all in one place.
What CLAUDE.md contains
Communication tone and style
- How formal or casual the AI should be
- Words and phrases to use (matching your brand voice)
- Words and phrases to avoid (competitor names, banned terminology, internal jargon clients should not see)
- Examples of ideal responses the AI should emulate
Company knowledge
- Products and services with descriptions, pricing, and conditions
- Common client questions and their correct answers
- Company policies, procedures, and decision trees
- Contact information and escalation paths
Behavioral rules
- What the AI does when asked about something outside its scope
- How it handles angry or frustrated clients
- When it offers alternatives vs when it simply answers
- Maximum response time commitments it can make
What to avoid
- Topics the AI must not engage with
- Claims the AI must not make (guarantees, legal advice, medical guidance)
- Formatting patterns to avoid (walls of text, excessive bullet points)
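As a concrete illustration, here is a hypothetical fragment of what such a file can look like. The company name, products, prices, and rules below are invented for this example:

```markdown
# CLAUDE.md — Acme Logistics assistant

## Tone
- Friendly but precise. Short sentences. No jargon clients would not know.
- Never write "as per our policy"; write "our policy is" instead.

## Products
- Standard shipping: 3-5 business days, $8 flat.
- Express shipping: next business day, $24 flat.

## Rules
- Always open a case number in the format #NNNN for complaints.
- Delays over 5 business days: offer a full refund OR free expedited re-shipment.
- Never discuss competitors or quote unreleased pricing.

## Escalation
- Angry client with an order value over $500: hand off to a human immediately.
```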
This is not "training"
An important distinction: CLAUDE.md is not training data. The AI does not learn from it the way a human learns from experience. It reads the file and follows the instructions — the same way it reads the system prompt.
Why this matters: Training an AI model takes weeks, costs thousands, and produces unpredictable results. Updating CLAUDE.md takes minutes, costs nothing, and produces immediate results.
- Change your pricing? Edit the file. The AI quotes the new price on the next interaction.
- New product launch? Add it to the file. The AI knows about it immediately.
- New policy? Update the rules. Effective immediately.
- Holiday hours? Add a note. The AI tells clients right away.
- Bad response pattern spotted? Add a rule to avoid it. Fixed by the next conversation.
There is no retraining cycle. No waiting for a new model version. No risk of the update breaking something that was working before. You change the file, and the behavior changes.
Version controlled and human-readable
Every CLAUDE.md file we create is stored in version control. This means:
- Full history of changes. You can see exactly what was changed, when, and why. If a response suddenly seems off, you can check what changed in the config and roll it back.
- Rollback to any version. New update causing problems? Revert to yesterday's version in seconds.
- Audit trail. For regulated industries, you can demonstrate exactly what instructions the AI was operating under at any point in time.
And critically: CLAUDE.md is written in plain English. Not code. Not a proprietary format. Not a black box. You can open the file, read it, and understand every instruction the AI is following. You do not need a programmer to read it. You do not need us to explain it.
This is deliberate. You should know what your AI system is doing. You should be able to review its rules the same way you review an employee's job description. No mysteries.
How we build CLAUDE.md for your company
We do not generate CLAUDE.md from a template. We build it from:
- Your actual communication. We review real emails, real support tickets, real client interactions. We extract your voice, your patterns, your decision-making logic.
- Your actual processes. We document the decision trees your team follows. When does a complaint get a refund? When does it get an apology? When does it get escalated? These rules go into the file.
- Your actual edge cases. We ask: what are the weird situations that trip up new employees? The client who always asks for a discount? The product that requires a disclaimer? The region with different shipping rules? Those go into the file.
- Iterative refinement. After launch, we review real interactions and refine. The first version is good. The tenth version is excellent. This is part of the post-launch support period.
Memory: Session vs Persistent
This is where most people's understanding of AI breaks down. And it is where the biggest opportunity lies for custom systems.
Session memory — the conversation window
Within a single conversation, the AI remembers everything that has been said. You mention your name at the start, and the AI uses it throughout. You describe a problem in message one, reference it in message five, and the AI connects the dots.
This is session memory. It works the same way a meeting works. Everyone in the room remembers what was said 10 minutes ago.
But here is the critical point: when the conversation ends, the memory disappears. Completely. The next conversation starts from zero. The AI does not know you talked yesterday. It does not remember your name, your project, your preferences, or the decision you made.
This is like having a meeting with no notes. While you are in the room, everything is fresh. The moment you walk out, everything discussed is gone. You come back tomorrow, and it is as if the meeting never happened.
This is the default behavior for every AI model. ChatGPT, Claude, Gemini — out of the box, they all have this limitation. Some have added basic memory features, but they are surface-level (remembering your name, your location, a few preferences). They do not remember the substance of your work.
Why this matters
For casual use, session-only memory is fine. You ask ChatGPT a question, get an answer, move on.
For business use, it is a deal-breaker. Imagine:
- Your AI assistant helps a client on Monday. On Tuesday, the same client comes back, and the AI has no idea who they are or what was discussed. The client has to explain everything again.
- You spend an hour configuring the AI's behavior during a conversation. Next session, all of that configuration is gone.
- The AI gives a client incorrect information. You correct it. Next session, it gives the same incorrect information to the next client.
This is not a hypothetical. This is what happens with default AI. And it is why custom systems exist.
Persistent memory — memory.md
The solution is a file we call memory.md. It is a structured document that the AI reads at the start of every session. It contains everything the AI needs to "remember" about you, your company, and your work.
What memory.md contains:
- User preferences — communication style, formatting preferences, common requests
- Company context — products, pricing, team structure, key clients, ongoing projects
- Previous decisions — what was decided in past sessions, why, and what the outcomes were
- Lessons from past sessions — corrections, edge cases discovered, patterns to follow or avoid
- Active tasks and status — what is in progress, what is blocked, what was completed
How it works in practice:
1. A session starts
2. Before your first message is processed, the AI reads memory.md
3. The AI now has full context: who you are, what you are working on, what was decided before, what to avoid
4. During the session, new information is noted
5. At the end of the session, memory.md is updated with relevant new information
6. Next session, the updated memory is loaded
The effect: The AI "remembers" you between sessions, even though it technically starts from zero each time. To you, it feels continuous. The AI picks up where it left off. No re-explaining. No lost context. No repeated mistakes.
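The load-and-update loop can be sketched in a few lines. The file layout and the notes below are invented for illustration:

```python
# Self-contained sketch of the persistent-memory loop: read memory.md at
# session start, write the updated file at session end. Uses a temporary
# directory so the example leaves no files behind in your project.

import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
MEMORY_FILE = workdir / "memory.md"

def load_memory() -> str:
    """Session start: load persistent context before the first message."""
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""

def save_memory(previous: str, new_notes: list[str]) -> None:
    """Session end: persist what mattered from this session."""
    updated = previous + "".join(f"- {note}\n" for note in new_notes)
    MEMORY_FILE.write_text(updated)

# Session 1: nothing persisted yet, so the AI starts from zero.
save_memory(load_memory(), [
    "Client Anna prefers email over phone.",
    "Case #4521 resolved with expedited re-shipment.",
])

# Session 2: the notes from session 1 are loaded before the first message.
print(load_memory())
```

The model itself still starts every session from zero; continuity comes entirely from what is written to and read from this file.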
The briefing document analogy
Think of memory.md like a briefing document for a consultant. Before every meeting with a client, the consultant reads the brief:
- Who is the client? What do they do?
- What have we discussed before? What was decided?
- What are the open items? What is the current priority?
- What are the sensitivities? (Budget concerns, timeline pressure, past bad experiences)
A consultant who reads the brief walks into the meeting ready to contribute from minute one. A consultant who does not read the brief spends the first 20 minutes catching up while the client gets increasingly frustrated.
Your AI system is the same. With memory, it is prepared from the first word. Without memory, every interaction starts at zero.
Memory is maintained, not infinite
The memory.md file does not grow forever. Left unchecked, it would become a disorganized mess of notes that the AI would struggle to process. We maintain it:
- Compaction — old, resolved items are archived. Active items stay current.
- Organization — information is structured by category, not dumped chronologically.
- Relevance filtering — not everything from every session needs to persist. We keep what matters and let go of what does not.
- Version control — same as CLAUDE.md, every change is tracked and reversible.
This is part of the ongoing system management. The memory stays clean, organized, and useful.
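The compaction step above amounts to splitting memory items into active and archived sets. A minimal sketch, with an assumed item format (note plus status field):

```python
# Illustrative memory compaction: archive resolved items so the active
# memory file stays small. The item structure is an assumption.

def compact(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split memory items into (active, archived) by status."""
    active = [i for i in items if i["status"] != "resolved"]
    archived = [i for i in items if i["status"] == "resolved"]
    return active, archived

memory_items = [
    {"note": "Case #4521 delayed delivery", "status": "resolved"},
    {"note": "Client prefers invoices as PDF", "status": "active"},
    {"note": "Renegotiate Vendor X SLA before renewal", "status": "active"},
]

active, archived = compact(memory_items)
print(len(active), len(archived))  # 2 active items stay, 1 is archived
```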
Context Window as "Working Memory"
There is a technical concept that directly affects how useful an AI system can be: the context window. This is how much text the AI can process at once during a single interaction.
Think of it as working memory — the amount of information the AI can hold "in mind" while it works on your request. A small context window means the AI can only consider a small amount of information. A large context window means it can consider much more.
What the numbers mean
- Claude Opus 4.6: 1 million tokens — approximately 750,000 words, or about 2,500 pages of A4 paper
- GPT-5.4: 1 million tokens — same ballpark
- Claude Sonnet 4.6: 200,000 tokens — approximately 150,000 words
To put this in perspective:
- Your company's entire employee handbook: probably 50-100 pages
- Your full product catalog with descriptions: probably 20-50 pages
- Your complete pricing structure with all conditions: probably 5-10 pages
- All your email templates and response guidelines: probably 10-20 pages
- A year of client support tickets: maybe 500-1,000 pages
With a 1 million token context window, your entire company knowledge base fits in a single session. The AI does not need to search for information — it can hold all of it at once. Your pricing, your policies, your client history, your product details, your procedures — all loaded, all accessible, all considered when generating a response.
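The page figures above come from simple arithmetic, using the common rule of thumb of roughly 0.75 words per token and about 300 words per A4 page. Both ratios vary with language and content, so treat this as an estimate:

```python
# Back-of-envelope conversion from tokens to A4 pages.
# Ratios are rough rules of thumb, not exact constants.

WORDS_PER_TOKEN = 0.75   # typical for English text
WORDS_PER_PAGE = 300     # typical for an A4 page of prose

def tokens_to_pages(tokens: int) -> int:
    words = tokens * WORDS_PER_TOKEN
    return round(words / WORDS_PER_PAGE)

print(tokens_to_pages(1_000_000))  # ~2500 pages fit in a 1M-token window
print(tokens_to_pages(200_000))    # ~500 pages for a 200k-token window
```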
Why this matters in practice
When you ask the AI "What should I tell this client about the pricing change?", a system with a large context window can simultaneously consider:
- Your current pricing structure
- The old pricing structure
- This specific client's contract terms
- The client's communication history
- Your company's policy on price change notifications
- The templates you typically use
- Past decisions about similar situations
All at once. No lookups. No switching between documents. Everything is right there in working memory.
This is a recent development. In early 2023, the standard context window was 4,000-8,000 tokens (a few pages). By late 2023, leading models reached 128,000 tokens. Now, 1 million tokens is available from multiple providers. The jump changes what is possible. Systems that previously needed complex retrieval pipelines can now simply load everything into context and work with it directly.
How we use large context windows
We design systems to take advantage of this. Instead of building complex search systems that find relevant information (and sometimes miss it), we load the full context where possible. Your AI assistant does not search for your pricing — it already has your pricing loaded. It does not look up the client's history — the history is already in context.
For systems where the total knowledge exceeds the context window (large knowledge bases, extensive document libraries), we combine direct loading with targeted retrieval. The most important information is always in context. The rest is searchable and pulled in when needed.
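The hybrid approach can be sketched as a token-budget filler: core documents load first, unconditionally, and remaining budget goes to retrieved documents. Document names and token counts below are invented:

```python
# Simplified sketch of "direct loading plus targeted retrieval":
# core documents always go into context; retrieved documents are added
# only while they still fit the token budget. Real retrieval would also
# rank documents by relevance to the query; that step is omitted here.

def build_context(core: list[tuple[str, int]],
                  retrieved: list[tuple[str, int]],
                  budget: int) -> list[str]:
    """Return the names of documents loaded into context, core first."""
    loaded, used = [], 0
    for name, tokens in core + retrieved:
        if used + tokens <= budget:
            loaded.append(name)
            used += tokens
    return loaded

core_docs = [("pricing.md", 2_000), ("policies.md", 5_000)]
extra_docs = [("tickets_2023.md", 900_000), ("faq.md", 3_000)]

# The oversized ticket archive is skipped; everything else fits.
print(build_context(core_docs, extra_docs, budget=100_000))
```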
Why This Matters for Your Business
Let us bring it all together with what this means in practical terms.
Every session starts with your full context
When someone interacts with your AI system, the system loads:
- The system prompt (operating rules)
- CLAUDE.md (identity, knowledge, behavior)
- memory.md (history, preferences, active items)
- Relevant documents from your knowledge base
From the very first word, the AI knows who it is talking to, what the company does, what the rules are, and what happened before. There is no warm-up period. No "getting to know you." No first-day-on-the-job awkwardness.
Updates are instant
Your pricing changes next Monday. On Monday morning, you update the pricing in CLAUDE.md. From that moment, every interaction uses the new pricing. No retraining. No waiting for a model update. No filing a ticket with a vendor.
Your company launches a new product in March. In February, you add the product details to the knowledge base. On launch day, you update the system prompt to include it in relevant responses. Done.
A client complaint reveals a gap in your AI's knowledge — it did not know that your refund policy has an exception for custom orders. You add the exception to CLAUDE.md. Fifteen seconds later, the AI handles custom order refunds correctly.
The speed of change is the speed of editing a text file.
No training time, no training cost
Traditional AI customization involves fine-tuning — feeding the model thousands of examples and letting it learn patterns. This takes weeks, costs thousands, and produces results that are hard to predict and hard to correct.
Our approach uses no fine-tuning. The models we use (Claude, GPT-5.4) are already extremely capable. They do not need to "learn" your business. They need to be told about your business in clear instructions.
The difference:
- Fine-tuning: weeks to prepare data, days to train, unpredictable results, expensive to iterate, one model version locked in
- Prompt-based customization: hours to write the config, instant results, predictable behavior, free to iterate, works with any model version
We chose this approach deliberately. It is faster, cheaper, more predictable, and gives you more control.
The system gets smarter over time
Not because the model improves (though it does, periodically). Because the configuration improves.
After the first week of real usage, we review interactions:
- Where did the AI give a suboptimal response? We refine the prompt.
- What questions did clients ask that the AI could not answer? We add the information.
- Where did the AI's tone not match the brand? We adjust the voice calibration.
- What edge cases appeared that we did not anticipate? We add handling rules.
After a month, the system handles 95% of cases correctly. After three months, it handles 99%. The improvement is not magic — it is systematic refinement based on real data.
You are not locked in
Everything we build is in readable files that you own:
- CLAUDE.md — plain English, readable by anyone
- memory.md — structured notes, readable by anyone
- System prompts — plain English, readable by anyone
- Knowledge base — your documents in standard formats
If you decide to switch providers, all of this transfers. The knowledge is not locked in a proprietary format. It is not embedded in a model you cannot access. It is in files on your system that work with any AI model.
We do not believe in lock-in. If our work is good, you stay because the system works. If it is not good, you should be able to take your configuration and move. That is fair.
LLMs Make Mistakes — and That Is Normal
AI models hallucinate. They generate information that sounds correct but is not. This is not a flaw in any one product — it is a fundamental property of the technology, much as humans have stronger and weaker days. The difference with AI is that the variable is context and data quality, not mood.
The question is not "does AI make mistakes?" — it does. The question is "how do you catch those mistakes before they reach the client?"
Our answer: a multi-step verification workflow built into every system we deliver. Every output is generated, fact-checked against source material, cross-referenced, and reviewed before it reaches you. The client always gets premium quality — not because the AI is perfect, but because our process catches imperfections.
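One of the cheapest checks in such a workflow is verifying that specific claims in a draft actually appear in the source material. A deliberately simplified sketch, checking only numeric claims (real verification covers far more than numbers):

```python
# Highly simplified illustration of the "fact-check against source
# material" step: flag any number in a draft response that does not
# appear in the source documents it was supposed to be based on.

import re

def unverified_numbers(draft: str, sources: list[str]) -> list[str]:
    """Return numbers in the draft that no source document contains."""
    source_text = " ".join(sources)
    claims = re.findall(r"\d+(?:\.\d+)?%?", draft)
    return [c for c in claims if c not in source_text]

draft = "Refunds are processed within 5 days and cover 100% of the price."
sources = ["Refund policy: full refund (100%) within 5 business days."]

print(unverified_numbers(draft, sources))  # [] — every number is sourced
```

A draft that invents a figure, say "90%", would be flagged here and sent back for correction before it ever reaches a client.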
For the full explanation of how hallucination works and our 4-step verification process, see Language Models: How They Work.
Summary
- Reasoning determines how deeply the AI analyzes your request. We configure the right depth for each task.
- System prompts are the AI's operating manual. A good one transforms generic AI into a useful business tool.
- CLAUDE.md is the identity file we build for your company. Plain English. Version controlled. Instantly updatable.
- Memory (memory.md) makes the AI remember you between sessions. No more starting from zero every time.
- Context windows determine how much information the AI can hold at once. Modern models can hold your entire business context.
- Updates are instant. Edit a file, change behavior. No retraining.
- You own everything. Readable files, standard formats, no lock-in.
If you want to see what this looks like for your specific business, start with discovery. We will map your situation and tell you what makes sense.
Questions?
Email dawid@kuliberda.ai. We respond to every message, usually within 24 hours.