Why Forge
Why another embedding engine is worth building when retrieval quality, transport shape, and trust boundaries decide what an AI system is allowed to know.
In November 2025, the question was simple enough to sound rude:
Why does the world need another vector embedding tool?
The honest answer is that the world does not need another wrapper around somebody else’s model. It does not need another dashboard with a prettier cosine similarity demo. It does not need a new place to paste text and receive a list of floats.
But it does need better retrieval.
The battle ahead is not simply who has access to AI. Access will commoditize. The real battle is who can turn AI into leverage without poisoning the work with bad knowledge. There will be winners and losers, and one of the things that will erode trust, slow progress, and blunt competitive advantage is poor use of what an organization already knows.
Retrieval is the part of an AI system that decides what the model is allowed to know. If retrieval misses the right passage, the LLM never sees it. If retrieval finds the wrong passage, the LLM can sound confident while standing on bad ground. The answer looks intelligent, the citation looks plausible, and the system quietly loses contact with the truth.
That is the problem Forge is meant to push on.
Forge is Voxell’s embedding engine: 87 ms median latency, 42 ms under sustained batch load, three quality tiers up to 4096 dimensions, a target MTEB score around 75, zero data retention, and zero trust access through mTLS instead of bearer keys.[1] Those are product facts. This post is about the motive underneath them.
The simpler version: Forge is the pipeline between text knowledge and the vector knowledge LLM systems require. It has to be fast, because latency kills momentum. It has to be high quality, because a wrong neighbor in vector space can become a wrong answer in production. It has to be secure, because the knowledge worth embedding is often the knowledge you cannot afford to leak.
The image at the top of this post is the mood version of that idea: language being worked like material. The diagram below is the engineering version. On the left is human-readable text, including the awkward fact that real knowledge is often partial, messy, or redacted. In the middle is the part that keeps the pipeline honest: a typed protobuf work order and a zero trust identity gate. On the right is the output Forge is responsible for, a vector field clean enough for an LLM system to use without wandering toward the wrong neighbor.
I Noticed the Drift
I came to this through annoyance before I came to it through strategy.
As an RPG player, I care about veracity inside a world. Names matter. Roles matter. Negation matters. If the system says the duke betrayed the queen when the queen betrayed the duke, the story breaks. If it treats “the spell does not affect undead” as close enough to “the spell affects undead,” the table has a problem.
That sounds like a game complaint until you swap in contracts, policies, medical notes, incident reports, source code, or research logs.
Embedding drift is not always dramatic. More often it is a small semantic failure hidden under high lexical overlap. Same nouns. Same structure. Same vibe. Wrong meaning.
I had already been working through hard negation, role inversion, polarity flips, and the ugly places where generated text gets fluent while losing structure. That made the retrieval failures harder to ignore. Once you see the pattern, you see it everywhere.
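The failure is easy to reproduce. Here is a small probe using an off-the-shelf open model as a stand-in; the model choice is illustrative and the exact numbers will vary, but the gap between the two comparisons is the thing to watch.

```python
# Probe for negation drift: embed a sentence, its negation, and an unrelated
# control, then compare cosine similarities. Model choice is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

model = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedder works for the probe

anchor   = "The spell does not affect undead."
negation = "The spell affects undead."
control  = "The tavern serves breakfast until noon."

vecs = model.encode([anchor, negation, control])
print("anchor vs negation:", round(cosine(vecs[0], vecs[1]), 3))
print("anchor vs control: ", round(cosine(vecs[0], vecs[2]), 3))
# When the first number is far higher than the second, lexical overlap is
# dominating the geometry, and a retriever can treat the negation as a match.
```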
Forge starts from that irritation: the embedding layer is too important to treat as plumbing nobody inspects.
The Benchmark Is Not the Product
MTEB matters because it is one of the few common scoreboards for embedding models.[2] It is imperfect, but it gives builders a way to compare retrieval, classification, clustering, and semantic similarity across a wide spread of tasks.
If Forge lands around 75 as expected, that matters. Competitors such as Cohere have strong production embedding products, and they are a reasonable comparison point because buyers already know them.[3] A high MTEB score does not mean Forge wins every workload. It means Forge belongs in the serious conversation before the custom evaluation starts.
At the top end, a few points are not vanity. Retrieval compounds. Better recall changes what enters the prompt. Better semantic separation changes what gets excluded. Better embeddings reduce how often downstream rerankers and LLMs have to recover from a bad first move.
The benchmark is not the product.
The benchmark is a receipt.
Why Protobuf and gRPC Matter
People often think of embedding as a tiny API call:
text in, vector out
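Something like this sketch, where the endpoint, field names, and key are placeholders rather than Forge’s actual API:

```python
# The demo-scale mental model: post text, get floats back.
# Endpoint, field names, and the bearer key are placeholders, not Forge's API.
import requests

resp = requests.post(
    "https://embeddings.example.com/v1/embed",
    json={"texts": ["the duke betrayed the queen"]},
    headers={"Authorization": "Bearer sk-placeholder"},  # the floating string a later section argues against
    timeout=10,
)
vector = resp.json()["embeddings"][0]  # a list of floats, and at demo scale that is the whole story
```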
That is true at demo scale. It stops being true when you are embedding billions of tokens.
At that point, embedding is a factory line. You care about batch shape, wire size, deadlines, retries, idempotency, tenant identity, model tier, dimensionality, versioning, and whether a failed chunk can be resumed without guessing what happened. JSON over a loose REST boundary can work for small jobs, but every ambiguity becomes operational debt at volume.
Protobuf gives the request a contract. gRPC gives the contract a transport that understands streaming, deadlines, backpressure, multiplexing, and binary framing. The shape matters because the work is repetitive and enormous. A little overhead becomes a bill. A vague schema becomes a support ticket. A missing field becomes a re-embed.
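As a sketch of what that contract buys, here is what a typed work order might look like from the client side. The stub modules, message names, and fields below are assumptions for illustration, not Forge’s published schema.

```python
# A typed work order over gRPC, sketched with hypothetical generated stubs.
# forge_pb2 / forge_pb2_grpc and every field name here are assumptions, not
# Forge's published schema.
import grpc
import forge_pb2        # hypothetical: messages generated from a .proto contract
import forge_pb2_grpc   # hypothetical: generated service stubs

channel = grpc.insecure_channel("localhost:50051")  # production would use the mTLS channel shown later
stub = forge_pb2_grpc.EmbeddingServiceStub(channel)

request = forge_pb2.EmbedBatchRequest(
    texts=[
        "The supplier shall not subcontract without written consent.",
        "Termination requires ninety days notice.",
    ],                                        # the batch shape is explicit, not implied
    model_tier="quality",                     # which of the three tiers to run
    dimensions=1024,                          # requested output dimensionality
    idempotency_key="reembed-2026-02-0007",   # a failed chunk can be retried without double work
)

response = stub.EmbedBatch(
    request,
    timeout=30.0,                                            # a real deadline, enforced by the transport
    metadata=(("tenant-id", "acme-prod"), ("env", "prod")),  # identity travels with the call
)
```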
When the corpus is large enough, the protocol is part of the model experience.
The vector is not the only output. The system also needs to know which model produced it, which tier was requested, which dimensions were returned, which tenant asked for it, what retention policy applied, and how to replay or audit the job later.
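One way to picture that is a record the pipeline persists alongside the vectors; the field names are assumptions for illustration, not Forge’s schema.

```python
# What comes back is more than floats: a hypothetical audit record a pipeline
# could log for replay and review. Field names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbedAuditRecord:
    model_id: str          # which model actually produced the vectors
    model_tier: str        # which tier was requested
    dimensions: int        # which dimensionality was returned
    tenant_id: str         # who asked for the work
    retention_policy: str  # what the service was allowed to keep
    request_id: str        # handle for replaying or auditing the job later

record = EmbedAuditRecord(
    model_id="forge-embed-v1",      # hypothetical identifier
    model_tier="quality",
    dimensions=1024,
    tenant_id="acme-prod",
    retention_policy="zero-retention",
    request_id="req-2026-02-0007",  # placeholder
)
```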
That is why the protobuf/gRPC shape is not decoration. It is how the embedding engine stays boring when the workload stops being small.
Why Zero Trust Beats Floating API Keys
API keys are convenient because they are just strings.
That is also the problem.
Strings end up in environment files, shell histories, CI logs, copied Slack messages, notebooks, screenshots, and old machines nobody remembers. A bearer key says: whoever has this string is allowed in. The server has to pretend possession is identity.
That is a poor fit for infrastructure that may sit in front of sensitive documents and high-volume embedding jobs.
Forge’s zero trust path uses mutual TLS with Ed25519 client certificates. The connection carries cryptographic identity. Access can be tied to a client, organization, environment, and workload. Certificates can expire, rotate, and be revoked without asking every developer to hunt through a pile of hidden strings.
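In gRPC terms, the client presents its certificate when the channel is built. A minimal sketch, assuming the certificate and key files already exist and using a placeholder endpoint:

```python
# Mutual TLS: the client proves who it is with a certificate, not a string.
# Paths and the endpoint are placeholders; certificate issuance is out of scope here.
import grpc

with open("ca.pem", "rb") as f:
    ca_cert = f.read()            # CA that signed the server's certificate
with open("client.key", "rb") as f:
    client_key = f.read()         # this workload's private key (e.g. Ed25519)
with open("client.pem", "rb") as f:
    client_cert = f.read()        # certificate binding an identity to that key

credentials = grpc.ssl_channel_credentials(
    root_certificates=ca_cert,
    private_key=client_key,
    certificate_chain=client_cert,
)

channel = grpc.secure_channel("forge.example.com:443", credentials)
# Every call on this channel now carries cryptographic identity; revoking the
# certificate cuts access without hunting for leaked strings.
```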
The point is not security theater. The point is reducing the number of secrets that float around loose while making machine identity explicit.
If an embedding service becomes part of production memory, it should not be protected like a hobby webhook.
Why Me
I am biased toward the physical layer of AI.
I care about GPUs, memory movement, batch behavior, transport costs, and the way a small bad assumption becomes expensive when repeated a billion times. Voxell exists because I think real-time AI infrastructure needs people who are willing to go below the product surface and argue with the machine.
Forge sits in that same pattern.
The model quality matters. The dataset matters. The transport matters. The trust boundary matters. The launch plan matters. The boring operational pieces matter because they decide whether the good benchmark turns into a dependable system.
The target is June 13, 2026.
That date is not just a marketing marker. It is a forcing function. By then, Forge needs to be more than a promising score. It needs to be an engine someone can use without wondering where their data went, where their key leaked, why the batch failed, or why the search result drifted away from the truth.
I want Forge because I want retrieval I can trust.
Not perfect. Not mystical. Just honest enough, fast enough, and well-shaped enough that the rest of the system has a fair chance.
That is the competitive claim too. If advantage depends on turning text knowledge into vector knowledge, the embedding pipeline becomes strategic infrastructure. Speed, quality, and security are not separate virtues. Together, they decide whether the system can actually use what the organization knows.
That is enough reason to build one more embedding tool.
[1] Voxell Forge product page: https://voxell.ai/forge/.
[2] Muennighoff, N. et al. (2022). MTEB: Massive Text Embedding Benchmark. https://arxiv.org/abs/2210.07316.
[3] Cohere Embed docs: https://docs.cohere.com/docs/cohere-embed.