
Rich Stokoe


State of the AI Nation 5: Walking the Talk

Posted on September 12, 2025 by rich

It’s been a while since I last offered my perspective on the world of AI. A lot has happened so this will be a big one…

The key takeaways:

  • AI Agents are providing powerful new ways to use large language models to perform actions, not just chat
  • Self-hosted AI models and proprietary AI models accessible only through APIs are offering two different paths to users, each with its own advantages and drawbacks
  • US Big Tech and the rest of the world are in tense competition over who can build the best model
  • The “Edge AI” space is occupied by one company
  • The next big thing might be small

Agents

The most significant and exciting changes in AI revolve around giving large language models the ability to not only chat back to you but to do things. To give them agency.

AI Agent

An agent is like an assistant powered by a large language model. Agents use LLM reasoning to decide the best course of action and then perform that action. Agents can book flights, send emails, write code, or control your computer.
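
To make the idea concrete, here's a minimal sketch of an agent loop in Python. The "LLM" is a stub and the tools are toy functions of my own invention, not any real framework's API:

```python
# Minimal agent loop: the "LLM" picks a tool, the agent runs it.
# fake_llm_decide and the tools are stand-ins for a real model and real APIs.

def send_email(to: str, subject: str) -> str:
    return f"email sent to {to}: {subject}"

def book_flight(destination: str) -> str:
    return f"flight booked to {destination}"

TOOLS = {"send_email": send_email, "book_flight": book_flight}

def fake_llm_decide(request: str) -> dict:
    # A real agent would ask the LLM which tool to call and with what args.
    if "flight" in request:
        return {"tool": "book_flight", "args": {"destination": "Edinburgh"}}
    return {"tool": "send_email", "args": {"to": "boss@example.com", "subject": request}}

def run_agent(request: str) -> str:
    decision = fake_llm_decide(request)   # reason: choose an action
    tool = TOOLS[decision["tool"]]        # act: look up the capability
    return tool(**decision["args"])       # perform the action

print(run_agent("Please book me a flight"))  # → flight booked to Edinburgh
```

The loop is the important part: reason, choose a tool, act. Real frameworks add retries, memory and multi-step planning on top of exactly this shape.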

Multi-agent systems – where individual agents are chained together to perform a series of actions – are proving to be extremely powerful. For example, people who write newsletters each week are chaining together a “research” agent that finds good topics, a “newsletter writer” agent that writes the newsletter in the author’s voice, and a “newsletter proofreader” agent that acts as an editor, brutally reviewing the auto-generated newsletter and sending it back to the “newsletter writer” agent if necessary. If this sounds like cheating, as someone who tries to own every word of their blogs, I would agree!
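
The newsletter example can be sketched as a simple pipeline, with each agent stubbed out as a plain function (in a real system each would be an LLM call with its own prompt):

```python
# Sketch of the newsletter pipeline: research → write → review, looping
# back to the writer until the reviewer approves or we run out of rounds.

def research_agent():
    return "AI agents in production"

def writer_agent(topic, feedback=None):
    draft = f"Newsletter about {topic}"
    if feedback:
        draft += " (revised)"
    return draft

def reviewer_agent(draft):
    # Returns feedback if the draft needs work, or None to approve.
    return None if "revised" in draft else "Too rough, tighten it up"

def run_pipeline(max_rounds=3):
    topic = research_agent()
    draft = writer_agent(topic)
    for _ in range(max_rounds):
        feedback = reviewer_agent(draft)
        if feedback is None:
            return draft
        draft = writer_agent(topic, feedback)
    return draft

print(run_pipeline())
```

The review loop with a bounded number of rounds is the key design choice: it stops two stubborn agents from bouncing a draft back and forth forever.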

Agents can be built in a number of frameworks and programming languages. Most people build AI systems in Python, and LangChain is very popular thanks to its strong Python support, although it supports many other languages well too. LangChain also has great tooling, including LangGraph, which helps you build multi-agent systems, and LangSmith, a managed platform for hosting and operating agentic systems in production with observability and eval testing (checking that inputs and outputs make sense).

As primarily a .NET developer, I’ve been playing with Semantic Kernel, which allows you to embed LLM super-powers into almost any .NET application.

Alongside LangGraph, other AI workflow tools are emerging to make multi-step agentic systems easier to build, test and manage. Probably the most popular right now is n8n, which lets you easily wire up a mix of LLMs, AI agents and non-AI capabilities with a visual editor. n8n can be run locally on your machine for free or in the cloud for £20/month.

MCP

Model Context Protocol (MCP)

MCP is a set of rules and contracts which allow Large Language Models to be connected to other software and systems in a standardised way.

One challenge that Anthropic (maker of Claude) has addressed is how you connect LLMs directly to interfaces such as APIs (e.g. weather services), local commands (“stdio”) and local file systems. For example, you might want to ask your LLM to read your Gmail and let you know if there is anything important. Gmail has a powerful API, and MCP provides the following concepts to connect your LLM to it:

  • MCP Host – a software application which directly hosts or interacts with an LLM and uses MCP Clients to connect to MCP Servers. For example, Visual Studio Code, Ollama or LM Studio are all MCP Hosts.
  • MCP Client – a piece of code in the MCP Host which connects to the MCP Server
  • MCP Server – a service, application or middleware which sits between the application (MCP Client) and the “tool”, exposing a well-defined interface for MCP Clients to use. MCP Servers can run locally on a machine as a small utility application (often a small Node application run via npx) or be hosted as a web service. MCP Servers can be written in any popular language, and frameworks exist to make it simple to get started.
  • MCP Tool – the interface, software or service that the MCP Server will broker connections with, for example the Gmail API. Tools can also be databases (e.g. Postgres), work management software like Jira, in-memory storage, web search engines like DuckDuckGo, agentic systems, other LLMs, or pretty much anything else you can think of.
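
MCP is built on JSON-RPC 2.0 under the hood, so the traffic between an MCP Client and MCP Server is just structured messages. Here's a rough sketch of what a tool call might look like on the wire; the tool name and arguments are illustrative, not taken from a real Gmail MCP Server:

```python
import json

# Sketch of an MCP tool-call exchange. MCP uses JSON-RPC 2.0 framing; the
# "search_email" tool and its arguments are invented for illustration.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_email",
        "arguments": {"query": "is:important", "max_results": 5},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,                      # matches the request id
    "result": {"content": [{"type": "text", "text": "2 important emails found"}]},
}

wire = json.dumps(request)          # what actually travels over stdio/HTTP
print(json.loads(wire)["method"])   # → tools/call
```

Because the framing is standardised, any MCP Host can talk to any MCP Server without either side knowing the other's implementation language.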

But MCP servers aren’t just a tool for engineers.

Product Managers can chat with their favourite LLM and use an MCP Server for Figma to help build wireframes through conversation.

Combining Agentic AI, MCP servers and LLMs means we can start building software in completely new ways.

The Next Evolution of Coding Assistants

Coding Assistant

An AI that helps you write code. It can autocomplete what you’re typing, explain how a piece of code works, fix bugs, write and run tests or even write entire applications from a description.

We’ve actually had coding assistants for decades, for example IntelliSense in Visual Studio. These have been useful and added lots of value, but are generally limited in scope to suggesting the properties of an object based on its definition.

GitHub Copilot and similar tools like Tabnine (my preferred 1st-generation AI coding assistant) brought the power of LLMs into the code editor. Although both now support agents and expanded scope, these initially only offered suggestions based on the single file you were editing.

Now Copilot, along with other more advanced coding assistants such as Cursor and Claude Code (my personal favourite), can read and semantically understand entire application codebases, and make suggestions not only for code changes but also for rearchitecting complex software to improve changeability.

Cursor and Claude Code can build entire applications from scratch, either through “vibe coding” or from a “spec”.

Vibe Coding

A conversational back-and-forth with an AI-powered coding assistant, which generates code on the user’s behalf over a period of time. An enjoyable way to build software.

Spec (specification)

A detailed blueprint of an application, written in plain, natural language rather than formal code. It serves as a comprehensive set of instructions describing the app’s purpose, features, and design, which the AI then uses to generate the software. In traditional software development, a spec can be roughly equated to a Product Requirement Document (PRD).

Good specs are highly detailed, describing key business concepts, user personas, behaviours, business rules, and examples, which will be translated into code.

Being able to write effective specifications is quickly becoming an essential skill and is simply the next layer of abstraction between the software developer and the hardware of the machine.

Other AI Development Tools

Let’s start by giving Apple some credit before we talk about the current state of Apple Intelligence…

Xcode
As predicted by MacRumors, Xcode 26 will support Anthropic models in addition to GPT-5, alongside the existing GPT-4.1 support, when it launches.

Google AI Studio

Google AI Studio is a prompt-driven, browser-based environment that allows developers to quickly get started with AI-powered prototypes. AI Studio supports prompt engineering, model selection, data grounding, code generation, collaborative workflows across teams and project templates, and it integrates with the next tool on this list.

Google AI Studio: https://aistudio.google.com/prompts/new_chat

Google Colab

Colab is Google’s hosted Jupyter Notebooks-on-steroids product with FREE hosted T4 GPUs. Colab is great for performing more data-intensive tasks such as data analysis and model fine-tuning.

Google Colab: https://colab.research.google.com

Google AutoML

AutoML allows you to train high-quality custom machine learning models without machine learning experience and without code.

Google AutoML: https://cloud.google.com/automl

Google Vertex AI

Building on all the capabilities above, Vertex AI is Google’s “enterprise grade MLOps platform”, supporting fully managed machine learning operations including building, deploying and scaling ML models and generative AI applications.

Vertex AI allows machine learning engineers to train their own models, specific to a use case, host them securely, and serve them at scale.

Google Vertex AI: https://cloud.google.com/vertex-ai

Lovable.Dev

One of the most exciting applications of agentic AI from the past year is Lovable.dev, which lets you build interactive user interfaces through chat (or, ideally, through proper specifications).

Battle of the Smarterphones

I want to talk about “Edge AI” – how the most popular form factor for users globally – the mobile phone – is being served with AI capabilities.

At the 9th September Apple Event, Apple launched the iPhone 17 family including the new “Air”, a very thin phone with less power than the Pro, worse thermal management than the Pro but a higher price tag!

The hardware across the iPhone, AirPods and Watch ranges was described as being extremely capable, as you would expect, and there were lots of cool animations showing the various vector processing engines on the new A19 and A19 Pro chips. But not a lot on “Apple Intelligence”.

Apple Intelligence is still very basic, with many of the promised features still not delivered more than a year after being announced at WWDC ’24. On iOS you can hand off requests that Siri failed to answer to ChatGPT, but that’s about it. Improvements to Siri, which is horrible to use compared to the very capable LLM-powered competition, are nowhere to be seen, and the automated summarisation of emails and notifications is terrible.

Apple instead seems to be shying away from mentioning its AI brand, preferring to talk about “machine learning” instead. Some quotes from the event:

“Machine learning algorithms fuse together data from the sensors and accelerometers to deliver precise measurements.”

“The 2x telephoto has an updated photonic engine which now uses machine learning to capture the lifelike details”

These would have been great opportunities to attach the “Apple Intelligence” label to delivered features, so this linguistic gymnastics only invites more scrutiny over their failure to deliver Apple Intelligence, not less. Not really acceptable for a company with a $3 trillion market cap.

Disclosure: Still an Apple shareholder. For now.

While we’re all still waiting for Siri to get smarter, Google are putting adverts like this out for the Pixel 10, showing a strong ability to deeply embed Gemini AI’s exceptional multi-modal capabilities into Android:

Google is showing the world how AI can be applied in useful and more subtle ways. I’ve mentioned before that Google was doing AI auto-complete in Gmail nearly a decade ago and didn’t get the credit it really deserved.

Now with the latest versions of their multi-modal LLM, Gemini, Google are demonstrating how to blend hardware and software into an extremely competitive product that delivers real value to users. Although it’s not entirely clear how much of this AI is on-device.

East Meets West

US Big Tech’s plans for AI domination are being thwarted in creative ways by China and the UAE.

When the open-source DeepSeek R1 dropped in January this year, it took the world by storm for its quality (comparable to GPT-4), reasoning capabilities, and how inexpensive it was to train and run. DeepSeek VL added multi-modal capabilities too.

Alibaba has been releasing the Qwen family of open source models which have extensive multi-lingual capabilities, dynamic reasoning modes depending on the type of task, and an ability to handle structured data, making them attractive to Enterprise AI use cases.

Just this week, a team in the United Arab Emirates (UAE) dropped K2 Think, which demonstrates exceptional performance in many benchmarks despite a relatively small number of parameters (32B) by using a distinct, 6-stage training process.

K2 Think whitepaper: https://arxiv.org/abs/2509.07604

Open source model: https://huggingface.co/LLM360/K2-Think

Qwen has just dropped its latest Thinking and Instruct models, “Qwen3-Next”, which are available from Hugging Face: https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9

By activating only 3B parameters per token instead of all 80B, both training and inference are around 10x faster. More information from the Alibaba team here:
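
The trick behind this efficiency is a mixture-of-experts design: a small “router” scores every expert, but only the top few actually run for each token, so most parameters sit idle. A toy illustration (the sizes and numbers are invented; Qwen3-Next’s real router is far more sophisticated):

```python
import math

# Toy mixture-of-experts router: softmax the gate scores, run only the
# top-k experts for this token, skip the rest entirely.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, k=2):
    probs = softmax(gate_scores)
    # Pick the k highest-probability experts; everything else is skipped.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:k])

experts_used = route([0.1, 2.3, -1.0, 1.7, 0.0], k=2)
print(experts_used)                       # → [1, 3]
print(f"{len(experts_used)}/5 experts active for this token")
```

Scaled up, the same idea means a token flowing through an 80B-parameter model only touches the experts the router selects, which is why the compute bill looks like a 3B model’s.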

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!

🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B.(esp. @ 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &… pic.twitter.com/yO7ug721U6

— Qwen (@Alibaba_Qwen) September 11, 2025

Google is still able to surprise with its model-training expertise (don’t forget, Google is the home of DeepMind), and in August, we saw the launch of Nano Banana (AKA Gemini 2.5 Flash Image), a next-generation text-to-image generative model.

Not wanting to be outdone, ByteDance (parent company of TikTok) released Seedream 4.0, a hyper-realistic image generator and editor. This one is also closed-source, like Nano Banana, so it’s only available through an API.

This highlights the other dimension in the AI world today – open vs. closed models…

Open Source vs. Proprietary Models

The AI world is taking two separate paths with different value propositions:

Closed-source, proprietary AI models

Models owned, developed, and controlled by a company, where the architecture, training data, and weights are not publicly available. Unlike open-source models, proprietary models are typically accessed only through paid APIs or licensed services, giving the owning company full control over how the model is used, distributed, and monetised.
Examples: OpenAI GPT, Anthropic Claude, xAI Grok, and Google Gemini.

Open Source AI models

Models whose code, architecture, internal weights, and often training data are available to the public, allowing anyone to inspect, modify, distribute, and use the model without restrictions.
These models promote transparency, collaboration, and innovation in the AI community, enabling further development of these models, customisation, and contributing improvements back to the community.
Examples: Meta’s LLaMa models, Microsoft’s Phi family, Mistral’s models, and various models hosted on platforms like Hugging Face.

Proprietary Models

Proprietary, closed-source models are typically very, very large, have extensive general knowledge, and are often multi-modal (supporting text generation as well as image or even video generation).

These models demonstrate the highest quality reasoning and responses, but at a cost, both financially and in terms of control. For regulated industries, handing sensitive information to a closed-source model behind an API is completely unacceptable.

There are solutions to this: enterprises can host their own copy of these proprietary models in virtual private clouds via the big cloud providers (Azure AI Foundry with OpenAI proprietary models, AWS Bedrock with Anthropic Claude models), but this can be quite expensive and complex. And while some proprietary models offer customisation features such as OpenAI’s “Custom GPTs”, others offer only limited fine-tuning options, if any at all.

Open-source models
Meta was an unexpected driver of the open source model race when its LLaMa model was leaked. Meta leaned into this and has published other extremely good models over the last few years, most recently the Llama 4 Scout multi-modal model.

Open source models can be downloaded and run locally on your laptop or PC (with a sufficiently good GPU) and used as an offline “ChatGPT”. Ollama and LM Studio are good ways of managing and using local models, and both enable local MCP Servers to be used, such as one providing direct access to part of your file system. These models can also be used by local installations of n8n, providing full end-to-end agentic workflow opportunities running entirely disconnected from the internet.

While proprietary models are still leading in benchmarks, open source models really shine with their ability to be changed and further trained to perform specific tasks better.

From General Models to Domain-Expert Models

There are three main ways of getting a model that is really good at a particular task, rather than just being generally OK at a broad range of tasks.

1. Training your own model

This is obviously the most involved option, requiring training data to be created, a model training pipeline to be built, model testing (eval) to be performed, and ongoing updates as new information is learned.

Those looking to train their own model from scratch can use AutoML, as mentioned above, or FastAI for rapid prototyping.

2. Altering an existing model

Making changes to existing (open source) models sounds extremely complex, but is quite straightforward.

An approach known as LoRA (Low-Rank Adaptation) allows the “weights” of an existing model (how the model relates pieces of information to each other) to be updated cheaply, by training a small set of additional low-rank matrices rather than the whole model. The fine-tuning itself can be driven by reinforcement learning from human feedback (“RLHF”) or, more commonly now, by reinforcement learning with verifiable rewards (“RLVR”). RLVR allows AI developers to create a “reward model” which tells the model when it demonstrates the behaviours and outputs that are preferred. This reward model can be automated, allowing for much faster fine-tuning.
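
At its core, LoRA freezes the original weight matrix W and learns two much smaller matrices B and A whose product is added on top, so only a tiny fraction of the parameters are actually trained. A toy illustration in plain Python (the matrices are tiny and the values invented):

```python
# LoRA in miniature: instead of retraining the full d×d matrix W, train a
# d×r matrix B and an r×d matrix A, and use W + B·A at inference time.
# With r much smaller than d, the trainable parameter count collapses.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

d, r = 4, 1                                   # full dimension vs LoRA rank
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1] for _ in range(d)]                 # d×r, trainable
A = [[0.5, 0.0, 0.0, 0.0]]                    # r×d, trainable

W_adapted = add(W, matmul(B, A))              # effective weights at inference

full_params = d * d                           # 16 if we fine-tuned W directly
lora_params = d * r + r * d                   # 8 here; a tiny fraction at real scale
print(full_params, lora_params)
```

At real model scale the saving is dramatic: with d in the thousands and r of 8 or 16, the trainable parameters drop by several orders of magnitude, which is what makes fine-tuning on a single GPU practical.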

For example, training a model on legal texts and challenging it to cite correct case law would be one way of optimising a model specifically for the legal domain.

LoRA is proving to be extremely powerful and is challenging the conventional wisdom around models. Unsloth, fine-tuning experts who also provide simple-to-use tooling for fine-tuning, have reduced the size of the DeepSeek V3.1 model using LoRA together with reduced quantisation. Quantisation is, simplistically, how precisely the model measures the similarity between words or pieces of information. Consider the distance between two places described to several decimal places, “Edinburgh to London = 332.52489 miles”, and then with reduced precision, “Edinburgh to London = 330 miles”. Until now it has been assumed that more precision would give better results; however, by reducing the quantisation (decimal precision), Unsloth is finding better performance in certain benchmarks:
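
Quantisation itself can be illustrated in a few lines: snap each weight onto a coarse grid whose resolution depends on the bit count, much like rounding 332.52489 miles to 330. This is a simplified uniform-quantisation sketch, not Unsloth’s actual method:

```python
# Quantisation in miniature: fewer bits → fewer grid points → smaller
# model, at the cost of precision on each individual weight.

def quantise(weights, bits):
    levels = 2 ** bits                        # e.g. 3 bits → 8 levels
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

weights = [-0.93, -0.41, 0.02, 0.37, 0.88]
q3 = quantise(weights, bits=3)                # at most 8 distinct values
q1 = quantise(weights, bits=1)                # only 2 distinct values

print(q3)
print(q1)
print("max 3-bit error:", max(abs(a - b) for a, b in zip(weights, q3)))
```

Real schemes are cleverer (per-block scales, keeping sensitive layers at higher precision), which is part of why aggressive 1-bit to 3-bit quantisation can still perform surprisingly well.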

Can a 1-bit or 3-bit quantized model outperform GPT-4.1 or Claude-Opus-4?

Yes!

Today, we're excited to show how LLMs like DeepSeek-V3.1 can be quantized to just 1-bit or 3-bit, and still beat SOTA models like Claude-Opus-4 (thinking) on Aider Polyglot.

Details and blog below! pic.twitter.com/uwKuHj6aYn

— Unsloth AI (@UnslothAI) September 10, 2025

Retrieval-Augmented Generation (RAG)

RAG has been around for a little while now and is still very useful when you need to give more information to a model without changing the model itself.

Most proprietary models support retrieval-augmented generation (RAG), for example searching the web to help them respond with up-to-date information. Many proprietary models enable customised RAG via MCP servers.

For self-hosted, open-source models, you can use MCP Servers to perform RAG via web searches, or reading vectorised data (known as “embeddings”) from a Vector Store.

Vectorisation / Embeddings

The process of converting text, documents, or other data into numerical representations called vectors (“embeddings”) that capture the semantic meaning and relationships of the content, allowing an AI model to use this in its context when responding to requests.

Vector Storage

Vector Stores are specialised databases designed to efficiently store vector embeddings and perform rapid vector (similarity) searches to find the most relevant information based on a user’s query.
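
Putting embeddings and vector search together, the retrieval step of RAG boils down to “find the stored vectors closest to the query vector”. A toy sketch, with hand-made 3-dimensional vectors standing in for real learned embeddings:

```python
import math

# Toy RAG retrieval: documents are embedded as vectors, the query is
# embedded the same way, and cosine similarity finds the best match.
# Real systems use an embedding model and a vector database.

docs = {
    "flight booking policy": [0.9, 0.1, 0.0],
    "expense claim rules":   [0.1, 0.9, 0.1],
    "office wifi password":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec, k=1):
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

# Pretend this is the embedding of "how do I claim expenses?"
query = [0.2, 0.8, 0.1]
print(retrieve(query))   # the best-matching document is fed to the LLM as context
```

The retrieved text is then pasted into the model’s context alongside the user’s question, which is the whole trick: the model itself never changes, only what it gets to read.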

So what’s next?

Again, the future is likely to be a tale of two parallel paths: the very large, and the fairly small.

At the large end of the spectrum, the search for Artificial General Intelligence (AGI) continues, meaning a quest to develop bigger, better and more powerful models. Only a small number of companies in the world can pursue this path.

OpenAI is one of them, having just inked a $300bn cloud infrastructure deal with Oracle, showing that achieving AGI remains firmly in its crosshairs.

On the other hand, the work by Unsloth, the UAE team, Deepseek and others is showing that small models, creatively trained and carefully quantised, can provide more than enough generative AI and reasoning capability to power agentic workflows, and automate significant amounts of work. This seems to offer enough clear business value in the short to medium term to be attractive to investors, while the problem statement for AGI remains… vague, at best.

With so many high quality open source models being produced, a strong open-source community building free tools and optimising open source models, and China so keen to disrupt any semblance of US AI supremacy, it’s unlikely we will see one AI company pulling too far away from the pack. If any of them do, OpenAI would be the natural choice, but it’s important not to underestimate Anthropic and Perplexity who may take a different tack.

The East/West battle should keep consumer prices under control, but that presents a problem as many AI products are operating at a reduced margin or even a loss to attract customers. With “VC-subsidized tokens” being a challenge for profitability, could we even see some big names disappear?

The models we’re seeing from Qwen, K2, Llama and even the new OpenAI open source gpt-oss-20b and gpt-oss-120b models are smart enough for 90% of agentic AI use cases, especially when fine-tuned, which has never been easier to do. This has commoditised generative, multi-modal AI to the point where people don’t need to pay through the nose for it. That sounds like a business opportunity.

Category: Artificial Intelligence, Featured, Technology
