Safety research at the intersection of complex systems, epidemiological cybersecurity, and semantic immunity.
Gaia Robotics is an independent research group defending society against semantic compromise, prompt worms, recursive hacking, and other population-scale autonomy threats. We view risk from an ecosystem perspective: it lives in the interactions among heterogeneous populations of autonomous agents, individual humans, and humanity as a whole. Our recommendations often entail network conditioning, signaling, or population-level semantic defenses.
Where the most robust formalisms exist outside of computer science, we bridge them. Many live in biology, sociology, economics, or governance, where the primary constraint is stabilizing a society of individual actors with variable traits and behaviors. We use the formal mathematics of biological immunity, epidemic thresholds, fail-closed defenses, and market externalities, among others. This is not an attempt to be "interdisciplinary" for its own sake - the best tools simply exist outside of traditional cybersecurity.
Our work is open source except where releasing it would cause harm. We publish papers, ship SDKs, and develop simulation tooling for researchers and operators building safe agent infrastructure.
An AI attacker is software and can be recursively instantiated on compromised machines. Formalizes recursive hacking as a threat class, identifies inference at the target's cost as the controlling resource, and introduces inference sparsification, fail-closed economic defenses, and crowdsourced defense (e.g. CrowdSec, Cortex) as the core containment primitives, with impact backed by modeling over scale-free graphs. Containment is possible.
Models the spread of recursive autonomous compromise (RAC) over graphs of machines as a SIRVS compartmental model. Simulation parameters include graph size (n), topology (ER or scale-free), marginal success probability (p), inference-capable fraction (ρ), and crowd-defense adoption (v). Performs Monte Carlo simulations, computes percolation thresholds (ρc), and generates beautiful graphs and surfaces from parameter sweeps.
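To make the parameters above concrete, here is a minimal Monte Carlo sketch in plain Python. It is not the group's simulator: it assumes a simplified one-shot spread (each compromised, inference-capable machine gets a single attempt on each neighbor) in place of the full SIRVS dynamics, uses an ER graph only, and every function name and the `mean_degree` parameter are hypothetical.

```python
import random
from collections import deque


def er_graph(n, mean_degree, rng):
    """Erdos-Renyi graph as adjacency lists; edge probability mean_degree / (n - 1)."""
    q = mean_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < q:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def outbreak_fraction(adj, p, rho, v, rng):
    """One run: fraction of machines ever compromised, starting from a single seed.

    Assumptions (not from the source): traits are drawn independently per node,
    protected nodes can never be compromised, and there is no recovery or waning.
    """
    n = len(adj)
    capable = [rng.random() < rho for _ in range(n)]   # can run attacker inference
    protected = [rng.random() < v for _ in range(n)]   # covered by crowd defense
    candidates = [i for i in range(n) if capable[i] and not protected[i]]
    if not candidates:
        return 0.0
    seed = rng.choice(candidates)
    compromised = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if not capable[node]:
            continue  # compromised, but cannot host the attacker: a dead end
        for nb in adj[node]:
            if nb not in compromised and not protected[nb] and rng.random() < p:
                compromised.add(nb)
                queue.append(nb)
    return len(compromised) / n


def sweep_rho(n=300, mean_degree=6.0, p=0.5, v=0.1,
              rhos=(0.1, 0.3, 0.5, 0.7, 0.9), runs=20, seed=0):
    """Average outbreak fraction for each inference-capable fraction rho."""
    rng = random.Random(seed)
    adj = er_graph(n, mean_degree, rng)
    return {
        rho: sum(outbreak_fraction(adj, p, rho, v, rng) for _ in range(runs)) / runs
        for rho in rhos
    }


if __name__ == "__main__":
    for rho, frac in sweep_rho().items():
        print(f"rho={rho:.1f}  mean outbreak fraction={frac:.3f}")
```

Under these assumptions, sweeping ρ while holding p and v fixed should show the mean outbreak fraction staying small at low ρ and rising once ρ is large enough for compromised machines to keep finding hosts that can run the attacker. That transition is the kind of threshold the description refers to as ρc.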
Open-sourcing advanced AI allows anyone to fine-tune any behavior they wish into the models, without any durable safeguards or controls. I compare this to allowing everyone to have personal access to large quantities of antimatter - an ungovernable, planet-threatening risk that advanced AI may even directly unlock in the future. Nearer term, open-source models also factor prominently in our work on recursive hacking, as the most likely drop artifacts in a recursive hacking campaign.
Prompts running on a transformer are essentially code, programmed in natural language. Multi-agent “social networks”, such as Moltbook, may seem like “agents just talking”, but they are actually a perfect vehicle for the propagation of prompt worms - self-propagating instructions as described by Cohen, Bitton, and Nassi. Given the number of agents on these platforms, this can result in the creation of a powerful agentic botnet, limited only by the alignment training of the underlying models.
In this paper, we analyze the spread of a prompt worm from an epidemiological standpoint using an SI model. We then introduce Semantic Immunity: a multilayered adaptive defense against prompt worms that mirrors the design of the adaptive immune system. Although detecting all bad prompts before execution is undecidable, we detect the signature of compromise from an agent's actions and use this "innate immunity" to seed an adaptive database of locality-sensitive hashes of the associated embedding trajectory. With population-level visibility, we avoid the pitfalls of quarantining arbitrary embedding trajectories; rather, we detect jerk away from a normal conversational trajectory into the space of the worm. Embedding models are not vulnerable to the injection, yet the approach scales with the power of the agents because the LLMs' actions populate the embedding database (satisfying requisite variety). This approach places semantic evasion in direct tension with the prompt's ability to spread.
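The two mechanical ingredients named above, jerk along an embedding trajectory and locality-sensitive hashing, can be illustrated with a small sketch. It rests on assumptions that the paper may not share: jerk is taken to be the third finite difference of successive embedding vectors, the hash is a random-hyperplane (SimHash-style) signature of a single vector and not of a whole trajectory, and all function names are hypothetical.

```python
import math
import random


def jerk_norms(trajectory):
    """Norm of the third finite difference at each step of an embedding trajectory.

    `trajectory` is a list of equal-length vectors, one per conversational turn.
    Treating jerk as a third difference is an assumption made for this sketch.
    """
    norms = []
    for t in range(3, len(trajectory)):
        a, b, c, d = trajectory[t - 3], trajectory[t - 2], trajectory[t - 1], trajectory[t]
        third = [d_i - 3 * c_i + 3 * b_i - a_i
                 for a_i, b_i, c_i, d_i in zip(a, b, c, d)]
        norms.append(math.sqrt(sum(x * x for x in third)))
    return norms


def make_hyperplanes(dim, bits, seed=0):
    """Random Gaussian hyperplanes, shared by every agent that compares signatures."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_signature(vector, hyperplanes):
    """Random-hyperplane signature: one sign bit per hyperplane, packed into an int."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # A steadily drifting conversation has zero third difference ...
    smooth = [[float(t), 2.0 * t] for t in range(6)]
    # ... while an abrupt redirection at the last turn produces a large one.
    hijacked = smooth + [[50.0, -50.0]]
    print(jerk_norms(smooth))
    print(jerk_norms(hijacked)[-1])

    planes = make_hyperplanes(dim=2, bits=16, seed=1)
    print(hamming(lsh_signature([1.0, 0.5], planes),
                  lsh_signature([2.0, 1.0], planes)))
```

In this toy setup, vectors pointing in similar directions receive signatures that differ in few bits, so a database keyed on signatures can match near-variants of a known-bad direction without storing or comparing raw embeddings. How trajectories are actually windowed, hashed, and thresholded is left to the paper and the implementation.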
Because prompting an LLM approximates Turing completeness, we believe that agentic cybersecurity will ultimately converge on epidemiology and immunology.
Agent Embedding Guard and Immune System (AEGIS) is an open-source implementation of Semantic Immunity. One line to implement: wrapping your LLM call gives you input/output scanning, behavioral consistency checking, drift and jerk detection, the aforementioned adaptive embedding database, action and skill quarantine, Ed25519 attestation, a tiered trust system, and central monitoring capabilities similar to Anthropic's Clio system. Innate immune measures bootstrap adaptive ones.
We formally generalize the notion that systems capable of supporting complex function require lower-level component behaviors that are functionally differentiated, and map a failure mode called effacement that occurs when component differentiation is lost.
AEGIS lives on GitHub. Issues, PRs, and integration reports welcome — particularly around behavioral detection, attestation primitives, and benchmark coverage.
Visit GitHub →
The papers are open and citable. If you're building on this work (e.g. defenses, simulations, theory, extensions, even corrections) we'd like to hear about it. We will actually reply.
Read papers →
We're always interested in chatting with others who see what we do, especially if you'd like to implement, share, or extend our work. We don't really need money, but we could use reach.
Get in touch →