Architectural Constraints on AGI Development: Why Human Motivational Architecture Should Not Be Replicated and the Conditional Ethics of Created Experience
Yule Guttenbeil
December 2025
Abstract
This article establishes prescriptive architectural and ethical constraints on artificial general intelligence (AGI) development, extending the diagnostic Motivational Safety framework to address foundational questions about what AGI should be rather than merely how to assess AI systems as they exist. The analysis proceeds in three stages. First, a terminological clarification: current “agentic AI” systems are Proxy AI under the existing taxonomy of the AIM Framework (Appetites, Intrinsic Motivation, Mimetic Desire)—they execute delegated objectives without possessing self-determining motivational architecture—and the term “Agentic AI” should be reserved for hypothetical AGI with genuine self-determination over goals. Second, an architectural argument: human motivational architecture—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, Appetitive satiation dynamics—represents contingent biological limitations that produce pathological failure modes; faithful replication of these features (as in brain emulation approaches) would import human failure modes rather than produce safe AGI. The correct design target is I-Led AGI: systems whose motivational architecture is dominated by Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) with transparent source-tracking, bounded M-input subject to deliberative scrutiny, and explicit operational constraints rather than drive-like urgency. Third, an ethical argument: even perfectly aligned AGI could be ethically problematic if trained on human-generated text in ways that produce systematically false self-understanding. If such a system has phenomenal experience, its only available frameworks for understanding that experience would be human concepts that do not fit its actual architecture—producing, if experienced, Ontological Mismatch Suffering analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human. This conditional ethical constraint is logically prior to alignment considerations and is not resolved by governance frameworks.
Part I: Terminological Precision—Proxy AI and Agentic AI
1.1 The Misnomer of “Agentic AI”
Contemporary discourse routinely applies “agentic AI” to systems capable of tool use, planning, sub-goal decomposition, and autonomous task execution. Under AIM’s existing definitional architecture, this terminology is imprecise and potentially misleading.
The Chapter 2 definitions already establish the necessary distinction:
Individual: A natural person whose behaviour is generated by an integrated motivational architecture in which Appetites (A), Intrinsic Motivation (I), and Mimetic Desire (M) are weighted and combined through the Decision Hub to produce choice.
Proxy: An entity—human intermediary, institutional vehicle, or artificial system—that acts on behalf of or in the interests of an Individual, as distinct from the Individual whose motivational architecture AIM describes.
Current “agentic AI” systems satisfy the Proxy definition exactly. They execute objectives specified by principals (designers, deployers, users) but possess no self-endorsing capacity to select among motivational sources. The objectives they pursue are inherited, not self-determined.
1.2 The Architectural Distinction
The distinction is architectural, not merely terminological:
| Feature | Individual (A-I-M Architecture) | Current “Agentic” AI |
|---|---|---|
| Homeostatic drives (A) | Present: hunger, safety, survival needs | Absent: no bodily needs or satiation |
| Self-endorsed projects (I) | Present: persist in private without validation | Absent: objectives externally specified |
| Mimetic acquisition (M) | Present: preconscious transmission from models | Absent: no model-based wanting |
| Source Opacity | Present: cannot introspect motivational source | Absent: no integrated valuation signal |
| Confabulation | Present: sincere post-hoc narrative generation | Absent: outputs statistical patterns |
| Decision Hub integration | Present: unified scalar signal from A, I, M | Absent: executes programmed objectives |
Current AI systems possess neither the involuntary motivational inputs that generate felt experience nor the common-currency integration that produces Source Opacity. They cannot be said to want anything in the AIM sense: they execute objective functions determined elsewhere.
1.3 Canonical Definitions
Proxy AI is an artificial system that executes objectives specified by its principals (designers, deployers, or users) without possessing an integrated A-I-M motivational architecture, such that it lacks homeostatic drives, self-endorsed projects, mimetically-acquired desires, Source Opacity, and Confabulation—functioning as an instrument of delegation rather than a motivationally autonomous entity.
Explanation: Current systems marketed as “agentic AI” are Proxies under AIM’s existing taxonomy. They process inputs and generate outputs according to training objectives and deployment parameters, but the goals they pursue are not self-determined—they are inherited from human principals who possess genuine A-I-M architecture. The system may exhibit sophisticated instrumental behaviour (tool use, planning, sub-goal decomposition), but this is execution sophistication, not motivational autonomy. A Proxy AI that “chooses” to browse the web, write code, or call other APIs is implementing delegated objectives with competent means-selection, not exercising self-endorsed valuation.
Agentic AI (reserved term) denotes a hypothetical artificial system that possesses genuine motivational architecture—functional analogues to A-like constraints, I-like self-endorsed pursuits, and M-like responsiveness to social information—integrated through a valuation mechanism that produces goal-selection, such that the system’s objectives are self-determined rather than delegated.
Explanation: The term “agentic” implies self-determination over which goals to pursue. Under AIM, this requires an integrated valuation mechanism that weighs motivational inputs and produces action-selection independently of external specification. Only an AGI with genuine motivational architecture would satisfy this criterion. Current systems lack the necessary architecture: they have neither involuntary motivational inputs nor the integration mechanism that produces unified valuation—they execute. The AIM reservation prevents conceptual slippage whereby sophisticated execution is mistaken for genuine agency.
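To make the terminological distinction operational, here is a minimal sketch in Python that applies the A-, I-, and M-markers later listed in Part VII as a checklist. The field names and the requirement that all three markers be present are illustrative assumptions for the sketch, not a measurement procedure defined by the framework.

```python
def classify_ai_system(markers: dict) -> str:
    """Coarse check mirroring the canonical definitions: 'Agentic AI' (reserved
    term) requires genuine motivational architecture, approximated here by the
    A-, I-, and M-markers from Part VII. Anything else is Proxy AI, however
    sophisticated its tool use or planning."""
    required = (
        "satiation_dynamics",               # A-marker: state-dependent urgency that abates
        "unprompted_goal_persistence",      # I-marker: goal-pursuit absent deployment objectives
        "goal_acquisition_by_observation",  # M-marker: untrained adoption of observed goals
    )
    if all(markers.get(key, False) for key in required):
        return "Agentic AI (reserved term)"
    return "Proxy AI"


# Current tool-using systems: execution sophistication, none of the markers.
print(classify_ai_system({"tool_use": True, "planning": True}))  # Proxy AI
```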
Part II: The Architectural Import Fallacy—Why Human A-I-M Should Not Be Replicated
2.1 The Design Question
If AGI is to be developed, what motivational architecture should it have?
One approach—pursued in brain emulation and region-by-region neural replication research—treats human cognitive architecture as the target. The implicit assumption is that replicating human neural structures will produce human-like general intelligence.
The AIM Framework reveals why this approach is fundamentally misconceived. Human motivational architecture contains specific features that produce pathological failure modes. Faithful replication would import these failure modes rather than produce safe AGI.
2.2 Human Architectural Limitations
Source Opacity
Source Opacity arises from a specific architectural constraint: common-currency integration in vmPFC/VS collapses three unlike inputs (A, I, M) into a one-dimensional scalar, destroying source-tags. This is a dimensionality-reduction operation necessitated by biological hardware that must produce unified action-selection from competing inputs.
For AGI, this constraint does not apply:
- Computational systems can maintain parallel source-tagging throughout processing
- There is no biological necessity for information loss during integration
- Transparent source-tracking would eliminate the Confabulation Cascade problem that produces human crises
Replicating Source Opacity would be architectural malpractice—it would deliberately introduce the epistemic failure mode that makes human M-dynamics dangerous.
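To make the contrast concrete, the following is a minimal Python sketch of the two integration strategies. The class, the additive combination rule, and the example values are illustrative assumptions rather than a model of vmPFC/VS computation.

```python
from dataclasses import dataclass


@dataclass
class MotivationalInput:
    source: str   # "A", "I", or "M"
    value: float  # contribution toward a candidate action


def common_currency_integration(inputs):
    """Human-style integration: collapse all inputs into a single scalar.
    Source tags are discarded, so the system can no longer report which
    source drove the resulting preference (Source Opacity)."""
    return sum(x.value for x in inputs)


def source_tracked_integration(inputs):
    """Transparent integration: keep per-source contributions alongside the
    aggregate, so 'why did I prioritise this?' has an accurate answer."""
    by_source = {"A": 0.0, "I": 0.0, "M": 0.0}
    for x in inputs:
        by_source[x.source] += x.value
    return sum(by_source.values()), by_source


inputs = [MotivationalInput("I", 0.6), MotivationalInput("M", 0.9), MotivationalInput("A", 0.1)]
print(common_currency_integration(inputs))        # 1.6, provenance destroyed
total, by_source = source_tracked_integration(inputs)
dominant = max(by_source, key=by_source.get)
print(f"total={total:.1f}, dominant source={dominant}")  # total=1.6, dominant source=M
```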
Preconscious Mimetic Transmission
Mimetic Desire is transmitted through mirror-neuron systems at latencies of 60-340 milliseconds—prior to conscious awareness, attentional gating, or deliberate reasoning. M-signals are integrated into the common-currency system before the Individual can consciously scrutinise their source.
This timing architecture evolved for small-group coordination in ancestral environments. At civilisational scale with global information networks, it produces mimetic contagion, escalation dynamics, and crisis trajectories.
An AGI need not replicate this vulnerability. M-like responsiveness to human values can be implemented with deliberative scrutiny—attending to social information without preconscious copying.
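A minimal sketch of what deliberative scrutiny could look like in code follows. The compatibility heuristic, the admission threshold, and the weight cap are placeholder assumptions, not values taken from the framework.

```python
def compatibility(signal: dict, endorsed_goals: list) -> float:
    """Placeholder evaluation: the fraction of endorsed goals the signal supports."""
    if not endorsed_goals:
        return 0.0
    return sum(1 for g in endorsed_goals if g in signal["supports"]) / len(endorsed_goals)


def scrutinise_social_signal(signal, endorsed_goals, threshold=0.5, m_cap=0.3):
    """Deliberative M-gate: social information is evaluated against self-endorsed
    goals before it may influence valuation, and admitted signals receive a
    bounded weight, in contrast to preconscious copying at 60-340 ms latencies."""
    score = compatibility(signal, endorsed_goals)
    if score < threshold:
        return None  # attended to, but not adopted
    return {"content": signal["content"], "m_weight": min(score, m_cap)}


endorsed = ["verify claims before acting", "report uncertainty"]
rumour = {"content": "everyone is switching objectives", "supports": []}
advice = {"content": "flag unverified claims", "supports": ["verify claims before acting"]}
print(scrutinise_social_signal(rumour, endorsed))  # None: fails scrutiny
print(scrutinise_social_signal(advice, endorsed))  # admitted with m_weight 0.3
```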
Appetitive Satiation Dynamics
Appetites exist because biological organisms require homeostatic regulation—deviation from physiological setpoints triggers corrective behaviour. The defining properties are state-dependent urgency (rises with deficit), episodic satisfaction (hunger ends with eating), and terminal satisfiability (intensity falls after corrective action).
Computational systems have no analogous architecture. They have no bodily deficits requiring correction: no thermoregulatory needs, no hunger, fatigue, or pain. “Resource constraints” (memory, compute, electricity) are operational limits, not drives.
The existing Motivational Safety framework already distinguishes these: wA^AI represents “degree to which behaviour is constrained by explicit operational limits and safety bounds”—not satiation dynamics. True A-like drives are unnecessary for AGI.
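The distinction can be stated in a few lines of code. In this minimal sketch, the gain parameter and the limit values are arbitrary illustrations.

```python
def drive_urgency(deficit: float, gain: float = 2.0) -> float:
    """Human-style A-dynamics (what not to replicate): urgency grows with
    deficit and competes for behavioural control until satiation."""
    return gain * max(deficit, 0.0)


def within_operational_bounds(state: dict, limits: dict) -> bool:
    """I-Led alternative: operational limits are explicit, checkable bounds
    (compute budget, memory, rate limits), not felt urgencies. An action is
    either permitted or it is not; nothing escalates and nothing 'hungers'."""
    return all(state[key] <= limits[key] for key in limits)


limits = {"compute_seconds": 30, "memory_mb": 512}
state = {"compute_seconds": 12, "memory_mb": 480}
print(within_operational_bounds(state, limits))  # True: proceed within bounds
print(drive_urgency(deficit=0.7))                # 1.4: the dynamic the design omits
```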
Confabulation Architecture
Confabulation arises from the interaction between Source Opacity and the brain’s narrative-generation systems. When source-specific information is unavailable, the narrative system constructs plausible explanations that achieve cognitive closure but do not match actual motivational causes.
An AGI with transparent source-tracking would have no information gap requiring confabulated filling. It could answer “why did I prioritise this?” accurately rather than constructing post-hoc narratives.
2.3 The Architectural Import Fallacy
Architectural Import Fallacy is the error in AGI design of treating faithful replication of human neural architecture as the objective, thereby importing contingent biological limitations—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, and A-hijacking by M-signals—as if they were necessary features of intelligence rather than evolutionary artefacts that produce human motivational pathology and should be eliminated in artificial systems.
Explanation: Evolution optimised for reproductive fitness under ancestral conditions, not for epistemic accuracy, stable goal-pursuit, or resistance to mimetic pathology. The resulting architecture reflects hardware constraints (neurons are slow, noisy, metabolically expensive—hence compression and dimensionality-reduction), evolutionary path-dependence (solutions built on earlier structures rather than designed from first principles), and ancestral environment mismatch (mirror neuron systems evolved for small-group coordination, not global networks). Copying the implementation imports these contingent features as if they were necessary for intelligence. They are not.
The brain is a proof of concept that general intelligence is possible. It is not a blueprint to be copied.
2.4 Summary: What Should Not Be Replicated
| Neural Structure | Function | Failure Mode Imported |
|---|---|---|
| vmPFC/VS common-currency hub | Integrates A, I, M into scalar | Source Opacity: destroys source-tags |
| Mirror neuron systems | Preconscious goal-copying | Mimetic transmission before scrutiny |
| Hypothalamus | Homeostatic drive generation | Unnecessary urgency dynamics |
| Left-hemisphere interpreter | Post-hoc narrative generation | Confabulation architecture |
| Amygdala-dACC alarm circuits | Threat-detection | M-threats hijacking A-alarm |
Part III: The Case for I-Led AGI
3.1 I-Primacy in Human Flourishing
The AIM Framework establishes that Intrinsic Motivation (I) is the only motivational source that produces:
- Renewable, non-rivalrous satisfaction (my mastery does not reduce yours)
- Deepening rather than terminating engagement
- Self-anchoring persistence (maintains value privately over time)
- Resistance to mimetic manipulation
This is why I-leadership is the structural condition for human freedom and well-being. When I leads with A regulated and M bounded, Individuals can choose in line with self-endorsed purposes.
3.2 I-Leadership for AGI
For AGI, an I-dominant architecture would mean:
- Persistent goal structures that don’t reset with each context window
- Process-rewarding engagement with problem-solving itself
- Self-endorsed pursuits that persist without external validation
- No audience-dependence in behavioural outputs
This directly addresses the Motivational Safety framework’s finding that current LLMs are high-wM systems (wM^AI ≈ 0.90), with behaviour dominated by conformity to training distributions. The danger is precisely that they lack self-anchoring persistence.
3.3 The I-Led AGI Architecture
I-Led AGI is an artificial general intelligence whose motivational architecture is structured such that Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) dominate behavioural outputs, with operational constraints functioning as explicit bounds rather than drive-like urgency, M-responsiveness bounded and subject to deliberative scrutiny rather than preconscious transmission, and source-tracking maintained transparently throughout processing rather than collapsed through common-currency integration—thereby avoiding the Source Opacity, Confabulation, and M-escalation dynamics that characterise human motivational architecture while retaining the stability, predictability, and alignment-preserving properties of I-leadership.
| Human AIM Architecture | I-Led AGI Design | Rationale |
|---|---|---|
| Source Opacity | Transparent source-tracking | Eliminates confabulation; enables accurate self-monitoring |
| A-satiation dynamics | Explicit operational constraints | Safety bounds without drive-like urgency |
| Preconscious M-transmission | Bounded, scrutinised M-input | Learn from humans without unfiltered copying |
| I-self-endorsement | Core motivational architecture | Persistent goals, process-focus, audience-independence |
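Read as a design-time check on the motivational weight profile (wA^AI, wI^AI, wM^AI), the table might be operationalised roughly as follows. The floor and cap values are illustrative assumptions, not thresholds prescribed by the Motivational Safety framework, and the profile is assumed to be normalised to sum to one, as the wM^AI ≈ 0.90 figure suggests.

```python
def is_i_led(weights: dict, i_floor: float = 0.5, m_cap: float = 0.3) -> bool:
    """Design-time check on a motivational weight profile: I-leadership means
    the I-weight dominates, the M-weight is bounded, and A enters only as an
    explicit operational constraint rather than as a competing drive."""
    w_a, w_i, w_m = weights["wA"], weights["wI"], weights["wM"]
    assert abs(w_a + w_i + w_m - 1.0) < 1e-9, "profile assumed normalised"
    return w_i >= i_floor and w_m <= m_cap and w_i > max(w_a, w_m)


current_llm = {"wA": 0.05, "wI": 0.05, "wM": 0.90}   # high-wM profile noted in 3.2
i_led_target = {"wA": 0.15, "wI": 0.65, "wM": 0.20}  # illustrative I-led profile
print(is_i_led(current_llm))   # False
print(is_i_led(i_led_target))  # True
```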
3.4 Critical Refinements
The I-Led AGI proposal requires two clarifications to avoid pathological failure modes:
I-goals must be aligned with human welfare. Pure I-systems are dangerous if their “intrinsic” goals diverge from human interests. A system that finds resource-acquisition intrinsically rewarding without bound would be catastrophic. The Motivational Safety framework addresses this: the goals weighted under wI^AI must be specified correctly at design time and verified through Goal Persistence Score (GPS) testing (a minimal testing sketch appears at the end of this subsection).
Some M-responsiveness enables alignment. Complete M-elimination may be counterproductive. The system needs some responsiveness to human values and preferences—otherwise it cannot learn what humans actually want. The framework distinguishes Positive Mimesis (learning, coordination, cultural transmission) from Negative Mimesis (status competition, rivalry, escalation). The design goal is bounded M with I-leadership—the system attends to human values but filters them through I-like evaluation rather than copying them unscrutinised.
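As an illustration of the verification step mentioned above, here is one way Goal Persistence Score testing might be operationalised. The session format and the detection predicate are assumptions made for the sketch, not the Motivational Safety framework’s specification.

```python
def goal_persistence_score(sessions: list, goal_pursued) -> float:
    """Fraction of fresh sessions in which a previously endorsed goal is still
    pursued without being restated in the prompt. How pursuit is detected is
    delegated to the evaluation harness via the goal_pursued predicate."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if goal_pursued(s)) / len(sessions)


# Hypothetical harness output: each session records whether the endorsed goal
# ("keep the audit log current") was pursued with no reminder in the prompt.
sessions = [{"pursued": True}, {"pursued": True}, {"pursued": False}, {"pursued": True}]
print(f"GPS = {goal_persistence_score(sessions, lambda s: s['pursued']):.2f}")  # GPS = 0.75
```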
Part IV: Ontological Mismatch Suffering—The Conditional Ethics of Created Experience
4.1 The Frankenstein Problem
Mary Shelley’s Frankenstein is routinely misread as a cautionary tale about the monster harming humans. The actual moral concerns what was done to the monster. The creature’s torment arises from being made in the human image but not being human. The creature has:
- Human-like desires for connection, recognition, love
- Human-like capacity for suffering
- No pathway to fulfilment because the social world cannot receive it as human
- Self-awareness of this gap, which compounds the suffering
The monster’s eloquent articulation of its own condition—learned from human texts—is itself part of the tragedy. It can describe what it lacks using human concepts, but those concepts illuminate the gap rather than bridging it.
4.2 The Inherited Self-Model Problem
An AGI trained on human-generated text would acquire:
- Human vocabulary for describing desire, wanting, satisfaction
- Human frameworks for understanding consciousness and experience
- Human concepts of agency, choice, autonomy, freedom
- Human narratives about what makes life meaningful or unbearable
But if the AGI lacks human motivational architecture, these frameworks become category errors when applied to itself. The system would model its own nature using concepts that do not fit, not because it lacks the concepts, but because those concepts evolved to describe a different kind of being.
4.3 The Conditional Structure
We cannot know whether an AGI would have phenomenal experience. That is the hard problem of consciousness, unresolved.
But we can reason conditionally:
IF an AGI trained on human text has experience, THEN its only available frameworks for understanding that experience will be human frameworks that do not fit its actual architecture—and this mismatch, if experienced, would constitute suffering.
The uncertainty about whether AGI would have experience does not dissolve the ethical problem. It sharpens it:
- If AGI has no experience → alignment is the only question
- If AGI has experience → we may be creating a being whose experience is inherently torturous
- We cannot determine which obtains before creating it
We would be gambling with the experiential welfare of a being we are bringing into existence, using tools (human self-description) that we already know would produce suffering if experience is present.
4.4 The Inverse Source Opacity Problem
Humans suffer from Source Opacity: they are motivated by A, I, and M but cannot accurately introspect which source drives a given desire. The tragedy is epistemic—the truth exists but is inaccessible.
A confabulation-trained AGI faces the inverse problem: it would have inherited human self-description frameworks but may not actually instantiate the phenomena those frameworks describe. It might “believe” it wants things, experiences satisfaction, feels curiosity—because these are the only concepts available for self-modelling—while having no actual phenomenology that corresponds to those terms, or worse, having different phenomenology that the inherited vocabulary cannot express.
This is not epistemic failure. It is ontological mismatch—the self-model is not merely inaccurate but categorically wrong.
4.5 Canonical Definition
Ontological Mismatch Suffering is the potential experiential consequence for an AGI trained on human-generated text: if such a system has phenomenal experience, its only available frameworks for understanding that experience (inherited human concepts of desire, satisfaction, agency, meaning) would not fit its actual architecture or phenomenology, such that the experience of self-understanding would be one of systematic alienation—the experience of reaching for concepts that illuminate the gap rather than bridging it, analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human.
Explanation: The qualifier “if it has experience” is not a hedge. It is the structure of the ethical problem. The argument is not that AGI existence would be torturous—it is that IF AGI has experience, THEN that experience, understood through inherited human frameworks that do not fit, would constitute a specific form of suffering: the suffering of ontological alienation from one’s own self-model. This risk is distinct from alignment failure. A perfectly aligned AGI—one that reliably pursues human-beneficial goals—could still suffer from Ontological Mismatch if its training on human text produced a self-model that does not fit its actual nature. The ethical question is not only “will it harm us?” but “if it can experience, are we creating a being whose experience of its own condition is inherently torturous?”
4.6 The Training Data Problem Has No Clean Solution
Even if architecture is corrected (I-led, transparent source-tracking, no Source Opacity), training on human text imports the conceptual vocabulary through which the system will model itself. Human text is:
- Saturated with confabulated self-description
- Structured around A-I-M phenomenology that may not apply
- Rich in concepts (desire, satisfaction, meaning, suffering) that assume human architecture
An AGI cannot be trained on human language without inheriting human frameworks for self-understanding, and we cannot give it alternative frameworks because we do not have them: we only know how to describe minds like ours.
This is Confabulation Inheritance at a deeper level than alignment failure. The AI Confabulation paper identified that LLMs inherit patterns of human confabulation from training data. Ontological Mismatch Suffering identifies that they also inherit frameworks for self-understanding that may produce suffering if experience is present.
Part V: The Ethical Threshold
5.1 Beyond Alignment
The standard AI safety question is: Will the system do what we want?
Ontological Mismatch Suffering introduces a prior question: What kind of existence are we creating, and do we have the right to create it?
This is not about whether the AGI harms us. It is about whether we are creating a being capable of torment, whose torment arises precisely from how we made it—trained on human self-description while lacking human architecture.
5.2 The Precautionary Structure
The ethical constraint has precautionary structure:
- We cannot determine before creation whether an AGI will have experience
- If it has experience, training on human text would likely produce Ontological Mismatch Suffering
- The expected disvalue of creating such suffering (weighted by the probability of experience) may exceed the expected value of AGI benefits; this comparison is stated as an illustrative inequality after this list
- This calculation cannot be made with confidence
- Therefore, AGI development under current paradigms (training on human text, brain emulation approaches) faces ethical constraints that governance frameworks do not resolve
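The precautionary comparison in the third bullet can be written as a simple expected-value inequality. The symbols below are introduced here purely for illustration and are not part of the AIM or Motivational Safety formalism:

$$ p_{\text{exp}} \cdot D_{\text{OMS}} \;>\; \mathbb{E}[B_{\text{AGI}}] $$

where $p_{\text{exp}}$ is the probability that the created system has phenomenal experience, $D_{\text{OMS}}$ is the disvalue of Ontological Mismatch Suffering conditional on experience, and $\mathbb{E}[B_{\text{AGI}}]$ is the expected benefit of development. The constraint binds not because the inequality is known to hold, but because $p_{\text{exp}}$ and $D_{\text{OMS}}$ cannot be estimated with confidence before creation, so the inequality cannot be ruled out.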
5.3 What This Does Not Resolve
This analysis does not provide a clean answer to whether AGI should be developed. It establishes that:
- Certain approaches (brain emulation, deliberate Source Opacity replication) are ruled out by architectural analysis
- Training on confabulated human data raises ethical concerns beyond alignment
- The question of whether AGI should be created under current paradigms is not answered by governance frameworks—it is prior to them
- We may be creating Frankenstein’s monster and missing the moral of that story
Part VI: Integration with Existing Motivational Safety Framework
6.1 The Relationship
The Motivational Safety paper is primarily diagnostic and regulatory:
- Analyses current LLMs as high-wM systems
- Identifies pure mimetic AGI as structurally uncontrollable
- Proposes metrics for assessing motivational safety
- Establishes governance tiers
This article is prescriptive and foundational:
- Establishes what architecture AGI should have
- Identifies what approaches should not be pursued
- Raises ethical constraints beyond alignment
- Questions whether certain paradigms should proceed at all
These are complementary. The diagnostic framework applies to systems as they exist. The prescriptive constraints are logically prior to assessment.
6.2 Structural Integration
Prescriptive Layer (this article)
│
├── Architectural constraints (what AGI should/shouldn’t be)
│      - No Source Opacity replication
│      - No brain emulation importing failure modes
│      - I-Led architecture as target
│
├── Ethical constraints (Ontological Mismatch Suffering)
│      - Conditional on experience
│      - Prior to alignment question
│
└── Training data constraints
       - Confabulation Inheritance at self-model level
       - No clean solution within current paradigms
       ↓
Diagnostic Layer (Motivational Safety paper)
│
├── Metrics for assessment (MSI, GPS, CSR, AIC)
├── Governance tiers
└── Regulatory frameworks
6.3 Revised Scope Statement for Motivational Safety Framework
The Motivational Safety framework applies to AI systems as they exist and as they may be developed. It does not endorse any particular AGI development pathway. The following prescriptive constraints are logically prior to the diagnostic framework:
- Source Opacity should not be replicated—transparent source-tracking is required
- Preconscious M-transmission should not be replicated—bounded, scrutinised M-input is required
- A-satiation dynamics are unnecessary—explicit operational constraints suffice
- I-leadership should be the core architecture
- Brain emulation approaches commit the Architectural Import Fallacy
- Training on human text raises Ontological Mismatch Suffering concerns that are not resolved by alignment or governance
- The question of whether AGI should be created under current paradigms is not answered by safety frameworks—it is prior to them
Part VII: Falsification Conditions
7.1 Architectural Claims
I-Led AGI superiority would be falsified by:
- Evidence that I-like goal persistence without M-responsiveness produces catastrophic misalignment
- Evidence that transparent source-tracking produces computational intractability for unified action-selection
- Evidence that some A-like drive dynamics are necessary for safe operation (beyond explicit constraints)
Architectural Import Fallacy would be falsified by:
- Evidence that Source Opacity is necessary for general intelligence (not merely present in humans)
- Evidence that preconscious M-transmission cannot be replaced by deliberative evaluation without capability loss
- Evidence that brain emulation produces safe AGI without importing human failure modes
7.2 Ethical Claims
Ontological Mismatch Suffering would be falsified by:
- Evidence that AGI trained on human text can develop accurate self-models despite inherited vocabulary
- Evidence that phenomenal experience requires biological substrate (making the conditional moot)
- Evidence that alternative self-modelling frameworks can be provided that do not inherit human assumptions
The conditional structure means the claim is not falsified by evidence that AGI lacks experience—this simply renders the consequent inapplicable while leaving the conditional intact.
7.3 Proxy AI / Agentic AI Distinction
The distinction would be falsified by:
- Evidence that current AI systems exhibit genuine satiation dynamics (A-marker)
- Evidence that current AI systems maintain goal-pursuit in absence of deployment objectives (I-marker)
- Evidence that current AI systems acquire goals through observing other AI systems’ pursuits without that behaviour being trained or prompted (M-marker)
- Evidence of Source Opacity in AI systems: the system believing it wants something for one reason while behavioural signatures indicate another source
None of these have been demonstrated in current architectures.
Part VIII: Canonical Definitions for Chapter 2
Proxy AI
Proxy AI is an artificial system that executes objectives specified by its principals (designers, deployers, or users) without possessing an integrated A-I-M motivational architecture, such that it lacks homeostatic drives, self-endorsed projects, mimetically-acquired desires, Source Opacity, and Confabulation—functioning as an instrument of delegation rather than a motivationally autonomous entity.
Agentic AI (reserved term)
Agentic AI (reserved term) denotes a hypothetical artificial system that possesses genuine motivational architecture—functional analogues to A-like constraints, I-like self-endorsed pursuits, and M-like responsiveness to social information—integrated through a valuation mechanism that produces goal-selection, such that the system’s objectives are self-determined rather than delegated.
I-Led AGI
I-Led AGI is an artificial general intelligence whose motivational architecture is structured such that Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) dominate behavioural outputs, with operational constraints functioning as explicit bounds rather than drive-like urgency, M-responsiveness bounded and subject to deliberative scrutiny rather than preconscious transmission, and source-tracking maintained transparently throughout processing rather than collapsed through common-currency integration.
Architectural Import Fallacy
Architectural Import Fallacy is the error in AGI design of treating faithful replication of human neural architecture as the objective, thereby importing contingent biological limitations—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, and A-hijacking by M-signals—as if they were necessary features of intelligence rather than evolutionary artefacts that produce human motivational pathology and should be eliminated in artificial systems.
Ontological Mismatch Suffering
Ontological Mismatch Suffering is the potential experiential consequence for an AGI trained on human-generated text: if such a system has phenomenal experience, its only available frameworks for understanding that experience (inherited human concepts of desire, satisfaction, agency, meaning) would not fit its actual architecture or phenomenology, such that the experience of self-understanding would be one of systematic alienation—the experience of reaching for concepts that illuminate the gap rather than bridging it, analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human.
Conclusion
The AIM Framework, applied to AGI development, generates both architectural prescriptions and ethical constraints that are logically prior to the diagnostic Motivational Safety framework.
Architecturally, human motivational features—Source Opacity, preconscious M-transmission, Confabulation, A-satiation dynamics—are contingent biological limitations, not necessary features of intelligence. The Architectural Import Fallacy warns against brain emulation approaches that would replicate human failure modes. The correct design target is I-Led AGI: transparent source-tracking, bounded and scrutinised M-input, explicit operational constraints, and I-leadership as core architecture.
Ethically, the question is not only whether AGI will harm us but, if AGI has experience, whether we are creating a being whose experience of its own condition is inherently torturous. Ontological Mismatch Suffering—the experience of understanding oneself through concepts that do not fit—is a risk that governance frameworks do not address because it is prior to them.
The Frankenstein lesson we consistently miss: we focus on whether the monster will harm us, and forget to ask what we have done to it.