Our Tools and Technology Stack
Choosing the right tools for an AI project is like choosing the right tools for a kitchen renovation. You would not use a sledgehammer to hang a picture frame. You also would not use a screwdriver to knock down a wall. We pick tools based on what solves your problem best — not what is trending on Hacker News this week, not what has the flashiest demo on Twitter, and not what some vendor is paying conference speakers to promote.
This page breaks down every major tool and technology we use, why we chose it, and what role it plays in the systems we build for you. No mystery. No black boxes.
Our Philosophy: Best Tool for the Job
The AI industry has a hype problem. Every week there is a new model, a new framework, a new "everything killer." Most of it is noise. Some of it is useful.
Our approach is boring in the best possible way:
- ›Does it solve the client's actual problem? Not a theoretical problem. Not a future problem. The problem on the table right now.
- ›Is it reliable enough for production? Demos are cheap. Running something 24/7 with real customer data is expensive if it breaks.
- ›Can the client understand and maintain it? If we build something so complex that only we can touch it, we have failed. Your system should be yours, fully.
- ›Does it have a reasonable cost structure? The fanciest model in the world is useless if running it costs more than the problem it solves.
We regularly evaluate new tools. When something better comes along, we adopt it. But we do not chase trends, and we do not swap out working systems for the sake of novelty.
Language Models We Use
Language models are the core engine of any AI system. Think of them like employees with different strengths — you would not assign the same person to write a legal brief and sort incoming mail. Different tasks need different capabilities, and using the wrong model for the job either wastes money or delivers bad results.
Here is what we use and why.
Claude (Anthropic) — Our Primary Engine
Claude is our default choice for most tasks. Anthropic builds models that are strong at following complex instructions, understanding context, and producing reliable, well-structured output. We have tested extensively and Claude consistently delivers the most dependable results for business applications.
Opus 4.6 — The senior analyst.
Opus is the most powerful model in Anthropic's lineup. It has a 1 million token context window — that means it can read and reason over roughly 750,000 words at once. An entire book. A full contract package. Six months of customer emails. All at once, with deep understanding.
We use Opus for:
- ›Complex document analysis — reviewing contracts, extracting key terms, identifying risks
- ›System design — planning the architecture of your AI solution before we build it
- ›Multi-step reasoning — problems that require connecting dots across many documents
- ›High-stakes output — anything where accuracy matters more than speed
Opus is not fast. It is thorough. When you need depth over speed, this is the tool.
Sonnet 4.6 — The daily workhorse.
Sonnet sits in the sweet spot between capability and speed. It is fast enough for real-time interactions (customer-facing chat, live classification) and smart enough to handle nuance (tone detection, complex queries, multi-turn conversations).
We use Sonnet for:
- ›Customer-facing AI assistants and chatbots
- ›Real-time document processing and classification
- ›Content generation with brand voice consistency
- ›Most day-to-day tasks in production systems
If Opus is the senior consultant you bring in for big decisions, Sonnet is the reliable team member who handles the daily workload without breaking a sweat.
Haiku 4.5 — The speed specialist.
Haiku is built for volume. It responds in milliseconds, not seconds. It is not as deep as Opus or as versatile as Sonnet, but for simple, high-volume tasks, it is exactly what you need.
We use Haiku for:
- ›Message classification and routing (is this email urgent? → yes/no)
- ›Quick data extraction (pull the invoice number from this PDF)
- ›Spam and abuse detection
- ›Any task where you are processing thousands of items and speed matters more than depth
Think of Haiku as the mail room — fast, efficient, accurate at sorting, and you do not need to pay senior analyst rates for it.
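As an illustration, a classification-and-routing step like this is only a few lines of code. The sketch below is hypothetical: the model identifier `claude-haiku-4-5` and the exact prompt are assumptions for illustration, and the API call requires the `anthropic` package plus an API key in the environment.

```python
import os

def classify_urgency(message: str) -> str:
    """Ask a small, fast model for a one-word urgency label.

    Illustrative sketch: the model ID below is an assumption, and the call
    needs the `anthropic` package and ANTHROPIC_API_KEY set in the environment.
    """
    from anthropic import Anthropic  # imported lazily; routing below works without it
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    reply = client.messages.create(
        model="claude-haiku-4-5",  # assumed identifier, check current model names
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": f"Is this email urgent? Answer yes or no only.\n\n{message}",
        }],
    )
    return reply.content[0].text.strip().lower()

def route(label: str) -> str:
    """Map the model's answer to a queue; anything unexpected goes to a human."""
    return {"yes": "priority", "no": "standard"}.get(label, "human-review")
```

Note the fallback: if the model returns anything other than a clean yes/no, the message goes to a person instead of being silently mis-sorted.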
GPT-5.4 (OpenAI)
OpenAI's GPT-5.4 is a powerful model with its own strengths. It also offers a 1 million token context window and has strong multimodal capabilities — it handles text, images, and complex documents well.
We use GPT-5.4 for:
- ›Specific analytical tasks where its reasoning patterns complement Claude's
- ›Image and document analysis — when a task involves interpreting charts, diagrams, or visual layouts
- ›Cross-validation — for high-stakes outputs, we sometimes run the same task through both Claude and GPT-5.4 and compare results. Two models agreeing is much stronger than one model guessing.
- ›Tasks where a second perspective improves accuracy
We are not married to any single provider. When GPT-5.4 is the better tool for a specific job, we use it. The goal is the best result for you, not loyalty to a brand.
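The cross-validation idea is simple to express in code. This sketch assumes both answers have already been fetched as strings; normalization and the escalate-on-disagreement rule are illustrative choices, not a fixed recipe.

```python
def cross_validate(answer_a: str, answer_b: str) -> dict:
    """Compare two models' answers to the same question.

    Agreement is treated as a strong signal; disagreement means a human
    reviews the output before anything ships.
    """
    def normalize(s: str) -> str:
        # Ignore case and whitespace differences when comparing.
        return " ".join(s.lower().split())

    agree = normalize(answer_a) == normalize(answer_b)
    return {"agree": agree, "action": "deliver" if agree else "escalate"}
```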
Codex 5.3 (OpenAI)
Codex is a specialized model built specifically for code generation and code analysis. It is not a general-purpose assistant — it is a focused tool for software development.
When we build your system, Codex helps us:
- ›Write clean, well-documented code faster
- ›Review existing codebases and suggest improvements
- ›Identify bugs and potential issues before they reach production
- ›Generate test suites — automated tests that verify the system works correctly
- ›Understand and document legacy systems — if nobody on your team remembers how an old codebase works, Codex can read it and explain it
Codex does not replace human developers. It makes them faster and more thorough. Think of it as a very knowledgeable pair programmer who never gets tired and has read every open-source project on the internet.
OpenClaw — AI Agent on Your Phone
OpenClaw is different from everything else on this list. It is not a model you query — it is an AI agent that lives on your phone and actually does things for you.
The distinction matters. A chatbot answers questions. An agent executes tasks.
OpenClaw can:
- ›Manage your calendar — schedule meetings, detect conflicts, send reminders
- ›Handle communications — draft and send messages based on your instructions
- ›Execute multi-step workflows — "check my inbox for invoices, extract the amounts, update the spreadsheet, and message the accountant if anything is overdue"
- ›Act as a personal assistant that never sleeps, never forgets, and is always in your pocket
This is what AI agents actually look like in daily life. Not a sci-fi robot — a practical tool in your pocket that handles the tasks you keep forgetting or do not have time for.
We integrate OpenClaw into client workflows where personal productivity and task management are key bottlenecks.
Open-Source Models (Llama, Mistral, and others)
Sometimes data cannot leave your building. Literally. Regulatory requirements, contractual obligations, or simple business prudence might mean that sending data to OpenAI or Anthropic is not an option.
For these cases, we deploy open-source models on your own infrastructure:
- ›Llama (Meta) — strong general-purpose capabilities, runs on standard server hardware
- ›Mistral — excellent efficiency, good performance at smaller model sizes
- ›Other specialized models — depending on the task, we may recommend domain-specific open-source models
The trade-off is clear: on-premise models are less capable than the top commercial models (for now). But they give you complete data sovereignty. Nothing leaves your network. Ever.
We will always be honest about this trade-off. If your use case needs the power of Opus or GPT-5.4 and also requires on-premise deployment, we will tell you the realistic limitations and help you find the right balance.
Model Comparison Table
| Model | Best For | Context Window | Speed | When We Use It |
|-------|----------|----------------|-------|----------------|
| Claude Opus 4.6 | Deep analysis, complex reasoning, contract review | 1M tokens | Slow (thorough) | High-stakes tasks requiring maximum accuracy |
| Claude Sonnet 4.6 | Daily operations, customer-facing AI, content | 200K tokens | Fast | Most production systems, real-time applications |
| Claude Haiku 4.5 | Classification, routing, quick lookups | 200K tokens | Very fast (ms) | High-volume simple tasks, message sorting |
| GPT-5.4 | Multimodal analysis, cross-validation, specific analytics | 1M tokens | Moderate | Complementary analytical tasks, image analysis |
| Codex 5.3 | Code generation, code review, testing | Large | Fast | Building and reviewing all code we deliver |
| OpenClaw | Personal task execution, scheduling, communications | N/A | Real-time | Personal productivity, mobile workflows |
| Llama/Mistral | On-premise, data-sovereign deployments | Varies | Varies | When data must never leave your infrastructure |
Development and Delivery
The AI model is only part of the picture. The system around it — the code, the deployment, the infrastructure — is what makes it actually useful in your business.
GitHub — Your Code, Always
Every project we build lives in a Git repository on GitHub. You get full access — not viewer access, not read-only. Full owner-level access.
What this means for you:
- ›Complete change history — every modification tracked, with who made it and when. You can see exactly what changed and why.
- ›Documentation inside the repo — README files, configuration guides, maintenance instructions. All in one place, versioned alongside the code.
- ›You own the code. Fully. If you want to hire another developer to modify it, they can. If you want to switch to a different consulting firm, take the repo with you. If we disappear tomorrow, you have everything.
- ›Collaboration is built in — multiple developers can work on the same project without stepping on each other's toes.
We have seen too many businesses locked into vendors because the vendor "owned" the code. That is not how we work. You are paying for a solution, and you get to keep it.
Python (FastAPI / Flask)
Python is our primary backend language. Here is why:
- ›AI ecosystem — every major AI library, framework, and model SDK is Python-first. TensorFlow, PyTorch, LangChain, the Anthropic SDK, the OpenAI SDK — all Python. Building in Python means we have direct access to every tool in the AI world without translation layers.
- ›Fast development — Python is concise and readable. We build faster, which means your project costs less and ships sooner.
- ›Easy to maintain — Python code reads almost like English. When your team needs to understand what the system does, they can actually read the code and follow the logic.
- ›FastAPI for high-performance APIs that need speed and automatic documentation
- ›Flask for simpler services where minimalism is a feature
TypeScript / Next.js
When your project needs a web interface — a dashboard, an admin panel, a customer-facing portal — we build it with TypeScript and Next.js.
- ›TypeScript adds type safety to JavaScript. Translation: fewer bugs, better tooling, code that catches errors before they reach your users.
- ›Next.js gives us server-side rendering (fast page loads), API routes (backend logic alongside the frontend), and excellent performance out of the box.
- ›Modern, well-supported, and widely adopted — finding developers who can maintain a Next.js app is easy.
Cloudflare Workers — Edge Deployment
Your system should be fast regardless of where your users are located. Cloudflare Workers runs your code on servers distributed across the globe — over 300 cities.
What this means in practice:
- ›A user in Warsaw and a user in Tokyo both get fast response times
- ›Your system runs close to your users, not in a single data center somewhere in Virginia
- ›Cost-effective — you pay for what you use, not for idle servers
- ›Built-in DDoS protection and security features
For most projects, Cloudflare Workers is our default deployment target. It is reliable, fast, and the pricing model makes sense for businesses of any size.
Integration Tools
AI does not exist in a vacuum. It needs to connect to your existing tools — your CRM, your email, your database, your spreadsheets. Here is how we make those connections.
REST APIs and Webhooks
These are the standard interfaces for connecting modern software. If you have ever used Zapier or connected two apps together, you have used something built on REST APIs.
- ›REST APIs — structured requests between systems. "Give me this customer's data." "Create a new ticket." "Update this record." Standard, well-understood, works with virtually every modern tool.
- ›Webhooks — event-driven notifications. "When a new order comes in, tell the AI system." "When this document is updated, trigger a re-analysis." Instant reactions without constantly checking for changes.
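One detail worth showing: a webhook payload should never be trusted until its signature checks out. Header names and signing schemes vary by provider, but the HMAC-SHA256 pattern below is common, and it needs only the Python standard library.

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the signature the sender supplied, in constant time."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids timing attacks that a plain == comparison allows.
    return hmac.compare_digest(expected, signature_hex)
```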
Automation Platforms (n8n, Make)
Sometimes a visual workflow builder is the right tool. If your automation is straightforward — "when X happens, do Y and Z" — a platform like n8n or Make can be faster to set up and easier for your team to modify than custom code.
We use these when:
- ›The workflow is simple enough that code would be overkill
- ›Your team wants to be able to modify the automation themselves
- ›Rapid prototyping — getting a proof of concept running in hours, not days
We use custom code when:
- ›The logic is too complex for a visual builder
- ›Performance matters (visual builders add overhead)
- ›The automation is a core business process that needs full version control and testing
Databases
- ›SQLite — for simple projects, prototypes, and situations where a full database server is unnecessary. It is a single file. Zero configuration. Perfect for getting started.
- ›PostgreSQL — for production systems that need reliability, concurrent access, and scale. Battle-tested, open-source, runs everywhere.
We match the database to the project. A proof of concept does not need PostgreSQL. A production system serving 10,000 users a day does not belong in SQLite.
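The "zero configuration" point is literal: Python ships with SQLite support in the standard library, so a working database is a few lines. The table and data below are invented for illustration.

```python
import sqlite3

# A single-file (here: in-memory) database. No server process, no configuration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT, urgent INTEGER)")
conn.execute("INSERT INTO tickets (subject, urgent) VALUES (?, ?)", ("Invoice overdue", 1))

urgent_subjects = [row[0] for row in
                   conn.execute("SELECT subject FROM tickets WHERE urgent = 1")]
```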
Quality Assurance and Our Verification Workflow
Here is something most AI companies will not tell you: language models hallucinate. Every single one. Claude does it. GPT does it. The open-source models do it. It is baked into how these systems work.
This is not a flaw any more than humans occasionally misremembering something is a "flaw." It is a characteristic of the technology. The question is not "does it hallucinate?" — it does. The question is "what do you do about it?"
Here is what we do about it.
Our Verification Workflow
Every piece of output that matters goes through a multi-step process:
- ›Generate — the AI produces its initial output based on the task, context, and instructions
- ›Verify facts — claims, numbers, dates, and references are checked against source material. If the AI says "according to your pricing document, the Standard plan is 199 PLN" — we verify that against your actual pricing document.
- ›Cross-reference sources — for important outputs, we check against multiple sources. If possible, we run the same query through a second model and compare results.
- ›Human review — a human reads the output before it goes to you. Not a rubber stamp — an actual review by someone who understands the context.
- ›Deliver — only after passing all checks does the output reach you or your customers.
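The "verify facts" step above can be sketched as a concrete check. The pricing data and helper below are invented for illustration; the point is that a stated price is compared against source data, not taken on faith.

```python
def verify_prices(draft: str, pricing: dict) -> list:
    """Flag any plan mentioned in a draft whose stated price does not
    match the source pricing data (prices in PLN)."""
    problems = []
    for plan, price_pln in pricing.items():
        if plan in draft and f"{price_pln} PLN" not in draft:
            problems.append(f"{plan}: draft does not state the source price of {price_pln} PLN")
    return problems
```

An empty list means the draft passed this check; anything else blocks delivery until a human looks at it.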
Why This Matters
The difference between a good AI system and a dangerous one is not the model — it is the process around the model. A raw LLM output is a draft. Our job is to turn that draft into something you can trust.
For automated systems (chatbots, classification pipelines), we build these verification steps directly into the system:
- ›Confidence scoring — the AI flags when it is unsure
- ›Source attribution — responses include references to the documents they are based on
- ›Escalation rules — questions the AI cannot answer confidently get routed to a human
- ›Automated fact-checking — prices, dates, and policy references are verified against your actual data before the response is sent
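The escalation rule is the simplest of these to show. This sketch assumes a confidence score in [0, 1]; the threshold value is illustrative and is tuned per use case in practice.

```python
def gate(answer: str, confidence: float, threshold: float = 0.8) -> dict:
    """Send confident answers; route everything else to a human reviewer.

    The 0.8 threshold is an illustrative default, not a universal constant.
    """
    if confidence >= threshold:
        return {"action": "send", "answer": answer}
    return {"action": "escalate", "answer": None}
```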
The client always gets premium quality. Not because AI is perfect — it is not. Because our process catches imperfections before they reach you.
Security
We take security seriously. Not as a marketing claim — as a set of specific practices we follow on every project.
Credential Management
- ›Environment variables — API keys, passwords, and tokens are stored as environment variables, never hardcoded in source code. If someone reads the code, they see `os.environ["API_KEY"]`, not the actual key.
- ›Encrypted storage — credentials at rest are encrypted. If a server is compromised, the credentials are not readable without the decryption key.
- ›Secret rotation — when a credential might have been exposed, we rotate it immediately. Not "soon." Immediately.
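In practice that looks like the small helper below: the code references a variable name, never the value, and fails fast with a clear message if the credential is missing. The variable names are examples.

```python
import os

def load_secret(name: str, default=None) -> str:
    """Fetch a credential from the environment, never from source code.

    Failing fast at startup with a clear message beats a cryptic
    authentication error at request time.
    """
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"{name} is not set; export it before starting the service")
    return value
```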
Data Protection
- ›Encryption in transit — all data moving between systems uses HTTPS/TLS. The same encryption your bank uses for online banking.
- ›Encryption at rest — data stored on disk is encrypted. Physical access to a server does not mean access to your data.
- ›GDPR compliance by design — we build systems with data protection in mind from day one, not as an afterthought. Data minimization, purpose limitation, right to deletion — these are engineering requirements, not legal checkboxes.
Access Control
- ›Minimum privilege principle — the AI gets only the access it needs to do its job. If it needs to read emails but not send them, it gets read-only access. If it needs access to one database table, it does not get access to the whole database.
- ›Access audits — after project completion, we review all access the system has and tighten anything that is no longer needed.
- ›You control access — the system runs on your infrastructure, under your control. You can revoke any access at any time.
Input Validation and Safety
- ›All user inputs are validated — the system does not blindly trust data from external sources
- ›SQL injection prevention — parameterized queries on every database interaction
- ›XSS prevention — sanitized HTML on every web interface
- ›Rate limiting — protection against abuse and denial-of-service attempts on all endpoints
- ›Error messages are safe — error messages never expose internal system details, database structures, or sensitive information
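Parameterized queries deserve a concrete example, since they are the single most important injection defense. With the `?` placeholder, the database driver binds user input as a value, so an injection attempt is treated as just an unusual string. The table and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

def find_user(name: str) -> list:
    # The `?` placeholder binds `name` as data; it is never spliced into the SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()
```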
Summary
Our stack is opinionated but practical. We use Claude as our primary AI engine because it consistently delivers the best results for business applications. We supplement with GPT-5.4, Codex, and open-source models when specific strengths are needed. We build in Python and TypeScript because they are productive and maintainable. We deploy on Cloudflare Workers because it is fast and cost-effective. We deliver through GitHub because you should own your code.
Every tool choice serves one purpose: building AI systems that actually work for your business, are secure, and that you fully own and control.
If you want to know more about how any of this applies to your specific situation, book a free consultation and we will walk through it together.