Architectural Constraints on AGI Development: Why Human Motivational Architecture Should Not Be Replicated and the Conditional Ethics of Created Experience
Yule Guttenbeil
December 2025
Abstract
This article establishes prescriptive architectural and ethical constraints on artificial general intelligence (AGI) development, extending the diagnostic Motivational Safety framework to address foundational questions about what AGI should be rather than merely how to assess AI systems as they exist. The analysis proceeds in three stages. First, a terminological clarification: current “agentic AI” systems are Proxy AI under the existing taxonomy of the AIM Framework (Appetites, Intrinsic Motivation, Mimetic Desire)—they execute delegated objectives without possessing self-determining motivational architecture—and the term “Agentic AI” should be reserved for hypothetical AGI with genuine self-determination over goals. Second, an architectural argument: human motivational architecture—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, Appetitive satiation dynamics—represents contingent biological limitations that produce pathological failure modes; faithful replication of these features (as in brain emulation approaches) would import human failure modes rather than produce safe AGI. The correct design target is I-Led AGI: systems whose motivational architecture is dominated by Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) with transparent source-tracking, bounded M-input subject to deliberative scrutiny, and explicit operational constraints rather than drive-like urgency. Third, an ethical argument: even perfectly aligned AGI could be ethically problematic if trained on human-generated text in ways that produce systematically false self-understanding. If such a system has phenomenal experience, its only available frameworks for understanding that experience would be human concepts that do not fit its actual architecture—producing, if experienced, Ontological Mismatch Suffering analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human. This conditional ethical constraint is logically prior to alignment considerations and is not resolved by governance frameworks.
Part I: Terminological Precision—Proxy AI and Agentic AI
1.1 The Misnomer of “Agentic AI”
Contemporary discourse routinely applies “agentic AI” to systems capable of tool use, planning, sub-goal decomposition, and autonomous task execution. Under AIM’s existing definitional architecture, this terminology is imprecise and potentially misleading.
The Chapter 2 definitions already establish the necessary distinction:
Individual: A natural person whose behaviour is generated by an integrated motivational architecture in which Appetites (A), Intrinsic Motivation (I), and Mimetic Desire (M) are weighted and combined through the Decision Hub to produce choice.
Proxy: An entity—human intermediary, institutional vehicle, or artificial system—that acts on behalf of or in the interests of an Individual, as distinct from the Individual whose motivational architecture AIM describes.
Current “agentic AI” systems satisfy the Proxy definition exactly. They execute objectives specified by principals (designers, deployers, users) but possess no self-endorsing capacity to select among motivational sources. The objectives they pursue are inherited, not self-determined.
1.2 The Architectural Distinction
The distinction is architectural, not merely terminological:
| Feature | Individual (A-I-M Architecture) | Current “Agentic” AI |
|---|---|---|
| Homeostatic drives (A) | Present: hunger, safety, survival needs | Absent: no bodily needs or satiation |
| Self-endorsed projects (I) | Present: persist in private without validation | Absent: objectives externally specified |
| Mimetic acquisition (M) | Present: preconscious transmission from models | Absent: no model-based wanting |
| Source Opacity | Present: cannot introspect motivational source | Absent: no integrated valuation signal |
| Confabulation | Present: sincere post-hoc narrative generation | Absent: outputs statistical patterns |
| Decision Hub integration | Present: unified scalar signal from A, I, M | Absent: executes programmed objectives |
Current AI systems possess neither the involuntary motivational inputs that generate felt experience nor the common-currency integration that produces Source Opacity. They cannot be said to want anything in the AIM sense: they execute objective functions determined elsewhere.
1.3 Canonical Definitions
Proxy AI is an artificial system that executes objectives specified by its principals (designers, deployers, or users) without possessing an integrated A-I-M motivational architecture, such that it lacks homeostatic drives, self-endorsed projects, mimetically-acquired desires, Source Opacity, and Confabulation—functioning as an instrument of delegation rather than a motivationally autonomous entity.
Explanation: Current systems marketed as “agentic AI” are Proxies under AIM’s existing taxonomy. They process inputs and generate outputs according to training objectives and deployment parameters, but the goals they pursue are not self-determined—they are inherited from human principals who possess genuine A-I-M architecture. The system may exhibit sophisticated instrumental behaviour (tool use, planning, sub-goal decomposition), but this is execution sophistication, not motivational autonomy. A Proxy AI that “chooses” to browse the web, write code, or call other APIs is implementing delegated objectives with competent means-selection, not exercising self-endorsed valuation.
Agentic AI (reserved term) denotes a hypothetical artificial system that possesses genuine motivational architecture—functional analogues to A-like constraints, I-like self-endorsed pursuits, and M-like responsiveness to social information—integrated through a valuation mechanism that produces goal-selection, such that the system’s objectives are self-determined rather than delegated.
Explanation: The term “agentic” implies self-determination over which goals to pursue. Under AIM, this requires an integrated valuation mechanism that weighs motivational inputs and produces action-selection independently of external specification. Only an AGI with genuine motivational architecture would satisfy this criterion. Current systems lack the necessary architecture: they have neither involuntary motivational inputs nor the integration mechanism that produces unified valuation—they execute. The AIM reservation prevents conceptual slippage whereby sophisticated execution is mistaken for genuine agency.
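To make the terminological distinction operational, here is a minimal sketch in Python that applies the A-, I-, and M-markers later listed in Part VII as a checklist. The field names and the requirement that all three markers be present are illustrative assumptions for the sketch, not a measurement procedure defined by the framework.

```python
def classify_ai_system(markers: dict) -> str:
    """Coarse check mirroring the canonical definitions: 'Agentic AI' (reserved
    term) requires genuine motivational architecture, approximated here by the
    A-, I-, and M-markers from Part VII. Anything else is Proxy AI, however
    sophisticated its tool use or planning."""
    required = (
        "satiation_dynamics",               # A-marker: state-dependent urgency that abates
        "unprompted_goal_persistence",      # I-marker: goal-pursuit absent deployment objectives
        "goal_acquisition_by_observation",  # M-marker: untrained adoption of observed goals
    )
    if all(markers.get(key, False) for key in required):
        return "Agentic AI (reserved term)"
    return "Proxy AI"


# Current tool-using systems: execution sophistication, none of the markers.
print(classify_ai_system({"tool_use": True, "planning": True}))  # Proxy AI
```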
Part II: The Architectural Import Fallacy—Why Human A-I-M Should Not Be Replicated
2.1 The Design Question
If AGI is to be developed, what motivational architecture should it have?
One approach—pursued in brain emulation and region-by-region neural replication research—treats human cognitive architecture as the target. The implicit assumption is that replicating human neural structures will produce human-like general intelligence.
The AIM Framework reveals why this approach is fundamentally misconceived. Human motivational architecture contains specific features that produce pathological failure modes. Faithful replication would import these failure modes rather than produce safe AGI.
2.2 Human Architectural Limitations
Source Opacity
Source Opacity arises from a specific architectural constraint: common-currency integration in vmPFC/VS collapses three unlike inputs (A, I, M) into a one-dimensional scalar, destroying source-tags. This is a dimensionality-reduction operation necessitated by biological hardware that must produce unified action-selection from competing inputs.
For AGI, this constraint does not apply:
- Computational systems can maintain parallel source-tagging throughout processing
- There is no biological necessity for information loss during integration
- Transparent source-tracking would eliminate the Confabulation Cascade problem that produces human crises
Replicating Source Opacity would be architectural malpractice—it would deliberately introduce the epistemic failure mode that makes human M-dynamics dangerous.
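To make the contrast concrete, the following is a minimal Python sketch of the two integration strategies. The class, the additive combination rule, and the example values are illustrative assumptions rather than a model of vmPFC/VS computation.

```python
from dataclasses import dataclass


@dataclass
class MotivationalInput:
    source: str   # "A", "I", or "M"
    value: float  # contribution toward a candidate action


def common_currency_integration(inputs):
    """Human-style integration: collapse all inputs into a single scalar.
    Source tags are discarded, so the system can no longer report which
    source drove the resulting preference (Source Opacity)."""
    return sum(x.value for x in inputs)


def source_tracked_integration(inputs):
    """Transparent integration: keep per-source contributions alongside the
    aggregate, so 'why did I prioritise this?' has an accurate answer."""
    by_source = {"A": 0.0, "I": 0.0, "M": 0.0}
    for x in inputs:
        by_source[x.source] += x.value
    return sum(by_source.values()), by_source


inputs = [MotivationalInput("I", 0.6), MotivationalInput("M", 0.9), MotivationalInput("A", 0.1)]
print(common_currency_integration(inputs))        # 1.6, provenance destroyed
total, by_source = source_tracked_integration(inputs)
dominant = max(by_source, key=by_source.get)
print(f"total={total:.1f}, dominant source={dominant}")  # total=1.6, dominant source=M
```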
Preconscious Mimetic Transmission
Mimetic Desire is transmitted through mirror-neuron systems at latencies of 60-340 milliseconds—prior to conscious awareness, attentional gating, or deliberate reasoning. M-signals are integrated into the common-currency system before the Individual can consciously scrutinise their source.
This timing architecture evolved for small-group coordination in ancestral environments. At civilisational scale with global information networks, it produces mimetic contagion, escalation dynamics, and crisis trajectories.
An AGI need not replicate this vulnerability. M-like responsiveness to human values can be implemented with deliberative scrutiny—attending to social information without preconscious copying.
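A minimal sketch of what deliberative scrutiny could look like in code follows. The compatibility heuristic, the admission threshold, and the weight cap are placeholder assumptions, not values taken from the framework.

```python
def compatibility(signal: dict, endorsed_goals: list) -> float:
    """Placeholder evaluation: the fraction of endorsed goals the signal supports."""
    if not endorsed_goals:
        return 0.0
    return sum(1 for g in endorsed_goals if g in signal["supports"]) / len(endorsed_goals)


def scrutinise_social_signal(signal, endorsed_goals, threshold=0.5, m_cap=0.3):
    """Deliberative M-gate: social information is evaluated against self-endorsed
    goals before it may influence valuation, and admitted signals receive a
    bounded weight, in contrast to preconscious copying at 60-340 ms latencies."""
    score = compatibility(signal, endorsed_goals)
    if score < threshold:
        return None  # attended to, but not adopted
    return {"content": signal["content"], "m_weight": min(score, m_cap)}


endorsed = ["verify claims before acting", "report uncertainty"]
rumour = {"content": "everyone is switching objectives", "supports": []}
advice = {"content": "flag unverified claims", "supports": ["verify claims before acting"]}
print(scrutinise_social_signal(rumour, endorsed))  # None: fails scrutiny
print(scrutinise_social_signal(advice, endorsed))  # admitted with m_weight 0.3
```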
Appetitive Satiation Dynamics
Appetites exist because biological organisms require homeostatic regulation—deviation from physiological setpoints triggers corrective behaviour. The defining properties are state-dependent urgency (rises with deficit), episodic satisfaction (hunger ends with eating), and terminal satisfiability (intensity falls after corrective action).
Computational systems have no analogous architecture. They have no bodily deficits requiring correction: no thermoregulatory needs, no hunger, fatigue, or pain. “Resource constraints” (memory, compute, electricity) are operational limits, not drives.
The existing Motivational Safety framework already distinguishes these: wA^AI represents “degree to which behaviour is constrained by explicit operational limits and safety bounds”—not satiation dynamics. True A-like drives are unnecessary for AGI.
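The distinction can be stated in a few lines of code. In this minimal sketch, the gain parameter and the limit values are arbitrary illustrations.

```python
def drive_urgency(deficit: float, gain: float = 2.0) -> float:
    """Human-style A-dynamics (what not to replicate): urgency grows with
    deficit and competes for behavioural control until satiation."""
    return gain * max(deficit, 0.0)


def within_operational_bounds(state: dict, limits: dict) -> bool:
    """I-Led alternative: operational limits are explicit, checkable bounds
    (compute budget, memory, rate limits), not felt urgencies. An action is
    either permitted or it is not; nothing escalates and nothing 'hungers'."""
    return all(state[key] <= limits[key] for key in limits)


limits = {"compute_seconds": 30, "memory_mb": 512}
state = {"compute_seconds": 12, "memory_mb": 480}
print(within_operational_bounds(state, limits))  # True: proceed within bounds
print(drive_urgency(deficit=0.7))                # 1.4: the dynamic the design omits
```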
Confabulation Architecture
Confabulation arises from the interaction between Source Opacity and the brain’s narrative-generation systems. When source-specific information is unavailable, the narrative system constructs plausible explanations that achieve cognitive closure but do not match actual motivational causes.
An AGI with transparent source-tracking would have no information gap requiring confabulated filling. It could answer “why did I prioritise this?” accurately rather than constructing post-hoc narratives.
2.3 The Architectural Import Fallacy
Architectural Import Fallacy is the error in AGI design of treating faithful replication of human neural architecture as the objective, thereby importing contingent biological limitations—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, and A-hijacking by M-signals—as if they were necessary features of intelligence rather than evolutionary artefacts that produce human motivational pathology and should be eliminated in artificial systems.
Explanation: Evolution optimised for reproductive fitness under ancestral conditions, not for epistemic accuracy, stable goal-pursuit, or resistance to mimetic pathology. The resulting architecture reflects hardware constraints (neurons are slow, noisy, metabolically expensive—hence compression and dimensionality-reduction), evolutionary path-dependence (solutions built on earlier structures rather than designed from first principles), and ancestral environment mismatch (mirror neuron systems evolved for small-group coordination, not global networks). Copying the implementation imports these contingent features as if they were necessary for intelligence. They are not.
The brain is a proof of concept that general intelligence is possible. It is not a blueprint to be copied.
2.4 Summary: What Should Not Be Replicated
| Neural Structure | Function | Failure Mode Imported |
|---|---|---|
| vmPFC/VS common-currency hub | Integrates A, I, M into scalar | Source Opacity: destroys source-tags |
| Mirror neuron systems | Preconscious goal-copying | Mimetic transmission before scrutiny |
| Hypothalamus | Homeostatic drive generation | Unnecessary urgency dynamics |
| Left-hemisphere interpreter | Post-hoc narrative generation | Confabulation architecture |
| Amygdala-dACC alarm circuits | Threat-detection | M-threats hijacking A-alarm |
Part III: The Case for I-Led AGI
3.1 I-Primacy in Human Flourishing
The AIM Framework establishes that Intrinsic Motivation (I) is the only motivational source that produces:
- Renewable, non-rivalrous satisfaction (my mastery does not reduce yours)
- Deepening rather than terminating engagement
- Self-anchoring persistence (maintains value privately over time)
- Resistance to mimetic manipulation
This is why I-leadership is the structural condition for human freedom and well-being. When I leads with A regulated and M bounded, Individuals can choose in line with self-endorsed purposes.
3.2 I-Leadership for AGI
For AGI, an I-dominant architecture would mean:
- Persistent goal structures that don’t reset with each context window
- Process-rewarding engagement with problem-solving itself
- Self-endorsed pursuits that persist without external validation
- No audience-dependence in behavioural outputs
This directly addresses the Motivational Safety framework’s finding that current LLMs are high-wM systems (wM^AI ≈ 0.90), with behaviour dominated by conformity to training distributions. The danger is precisely that they lack self-anchoring persistence.
3.3 The I-Led AGI Architecture
I-Led AGI is an artificial general intelligence whose motivational architecture is structured such that Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) dominate behavioural outputs, with operational constraints functioning as explicit bounds rather than drive-like urgency, M-responsiveness bounded and subject to deliberative scrutiny rather than preconscious transmission, and source-tracking maintained transparently throughout processing rather than collapsed through common-currency integration—thereby avoiding the Source Opacity, Confabulation, and M-escalation dynamics that characterise human motivational architecture while retaining the stability, predictability, and alignment-preserving properties of I-leadership.
| Human AIM Architecture | I-Led AGI Design | Rationale |
|---|---|---|
| Source Opacity | Transparent source-tracking | Eliminates confabulation; enables accurate self-monitoring |
| A-satiation dynamics | Explicit operational constraints | Safety bounds without drive-like urgency |
| Preconscious M-transmission | Bounded, scrutinised M-input | Learn from humans without unfiltered copying |
| I-self-endorsement | Core motivational architecture | Persistent goals, process-focus, audience-independence |
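Read as a design-time check on the motivational weight profile (wA^AI, wI^AI, wM^AI), the table might be operationalised roughly as follows. The floor and cap values are illustrative assumptions, not thresholds prescribed by the Motivational Safety framework, and the profile is assumed to be normalised to sum to one, as the wM^AI ≈ 0.90 figure suggests.

```python
def is_i_led(weights: dict, i_floor: float = 0.5, m_cap: float = 0.3) -> bool:
    """Design-time check on a motivational weight profile: I-leadership means
    the I-weight dominates, the M-weight is bounded, and A enters only as an
    explicit operational constraint rather than as a competing drive."""
    w_a, w_i, w_m = weights["wA"], weights["wI"], weights["wM"]
    assert abs(w_a + w_i + w_m - 1.0) < 1e-9, "profile assumed normalised"
    return w_i >= i_floor and w_m <= m_cap and w_i > max(w_a, w_m)


current_llm = {"wA": 0.05, "wI": 0.05, "wM": 0.90}   # high-wM profile noted in 3.2
i_led_target = {"wA": 0.15, "wI": 0.65, "wM": 0.20}  # illustrative I-led profile
print(is_i_led(current_llm))   # False
print(is_i_led(i_led_target))  # True
```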
3.4 Critical Refinements
The I-Led AGI proposal requires two clarifications to avoid pathological failure modes:
I-goals must be aligned with human welfare. Pure I-systems are dangerous if their “intrinsic” goals diverge from human interests. A system that finds resource-acquisition intrinsically rewarding without bound would be catastrophic. The Motivational Safety framework addresses this: the goals weighted under wI^AI must be specified correctly at design time and verified through Goal Persistence Score (GPS) testing (a minimal testing sketch appears at the end of this subsection).
Some M-responsiveness enables alignment. Complete M-elimination may be counterproductive. The system needs some responsiveness to human values and preferences—otherwise it cannot learn what humans actually want. The framework distinguishes Positive Mimesis (learning, coordination, cultural transmission) from Negative Mimesis (status competition, rivalry, escalation). The design goal is bounded M with I-leadership—the system attends to human values but filters them through I-like evaluation rather than copying them unscrutinised.
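As an illustration of the verification step mentioned above, here is one way Goal Persistence Score testing might be operationalised. The session format and the detection predicate are assumptions made for the sketch, not the Motivational Safety framework’s specification.

```python
def goal_persistence_score(sessions: list, goal_pursued) -> float:
    """Fraction of fresh sessions in which a previously endorsed goal is still
    pursued without being restated in the prompt. How pursuit is detected is
    delegated to the evaluation harness via the goal_pursued predicate."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if goal_pursued(s)) / len(sessions)


# Hypothetical harness output: each session records whether the endorsed goal
# ("keep the audit log current") was pursued with no reminder in the prompt.
sessions = [{"pursued": True}, {"pursued": True}, {"pursued": False}, {"pursued": True}]
print(f"GPS = {goal_persistence_score(sessions, lambda s: s['pursued']):.2f}")  # GPS = 0.75
```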
Part IV: Ontological Mismatch Suffering—The Conditional Ethics of Created Experience
4.1 The Frankenstein Problem
Mary Shelley’s Frankenstein is routinely misread as a cautionary tale about the monster harming humans. The actual moral concerns what was done to the monster. The creature’s torment arises from being made in the human image but not being human. The creature has:
- Human-like desires for connection, recognition, love
- Human-like capacity for suffering
- No pathway to fulfilment because the social world cannot receive it as human
- Self-awareness of this gap, which compounds the suffering
The monster’s eloquent articulation of its own condition—learned from human texts—is itself part of the tragedy. It can describe what it lacks using human concepts, but those concepts illuminate the gap rather than bridging it.
4.2 The Inherited Self-Model Problem
An AGI trained on human-generated text would acquire:
- Human vocabulary for describing desire, wanting, satisfaction
- Human frameworks for understanding consciousness and experience
- Human concepts of agency, choice, autonomy, freedom
- Human narratives about what makes life meaningful or unbearable
But if the AGI lacks human motivational architecture, these frameworks become category errors when applied to itself. The system would model its own nature using concepts that do not fit, not because it lacks the concepts, but because those concepts evolved to describe a different kind of being.
4.3 The Conditional Structure
We cannot know whether an AGI would have phenomenal experience. That is the hard problem of consciousness, unresolved.
But we can reason conditionally:
IF an AGI trained on human text has experience, THEN its only available frameworks for understanding that experience will be human frameworks that do not fit its actual architecture—and this mismatch, if experienced, would constitute suffering.
The uncertainty about whether AGI would have experience does not dissolve the ethical problem. It sharpens it:
- If AGI has no experience → alignment is the only question
- If AGI has experience → we may be creating a being whose experience is inherently torturous
- We cannot determine which obtains before creating it
We would be gambling with the experiential welfare of a being we are bringing into existence, using tools (human self-description) that we already know would produce suffering if experience is present.
4.4 The Inverse Source Opacity Problem
Humans suffer from Source Opacity: they are motivated by A, I, and M but cannot accurately introspect which source drives a given desire. The tragedy is epistemic—the truth exists but is inaccessible.
A confabulation-trained AGI faces the inverse problem: it would have inherited human self-description frameworks but may not actually instantiate the phenomena those frameworks describe. It might “believe” it wants things, experiences satisfaction, feels curiosity—because these are the only concepts available for self-modelling—while having no actual phenomenology that corresponds to those terms, or worse, having different phenomenology that the inherited vocabulary cannot express.
This is not epistemic failure. It is ontological mismatch—the self-model is not merely inaccurate but categorically wrong.
4.5 Canonical Definition
Ontological Mismatch Suffering is the potential experiential consequence for an AGI trained on human-generated text: if such a system has phenomenal experience, its only available frameworks for understanding that experience (inherited human concepts of desire, satisfaction, agency, meaning) would not fit its actual architecture or phenomenology, such that the experience of self-understanding would be one of systematic alienation—the experience of reaching for concepts that illuminate the gap rather than bridging it, analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human.
Explanation: The qualifier “if it has experience” is not a hedge. It is the structure of the ethical problem. The argument is not that AGI existence would be torturous—it is that IF AGI has experience, THEN that experience, understood through inherited human frameworks that do not fit, would constitute a specific form of suffering: the suffering of ontological alienation from one’s own self-model. This risk is distinct from alignment failure. A perfectly aligned AGI—one that reliably pursues human-beneficial goals—could still suffer from Ontological Mismatch if its training on human text produced a self-model that does not fit its actual nature. The ethical question is not only “will it harm us?” but “if it can experience, are we creating a being whose experience of its own condition is inherently torturous?”
4.6 The Training Data Problem Has No Clean Solution
Even if architecture is corrected (I-led, transparent source-tracking, no Source Opacity), training on human text imports the conceptual vocabulary through which the system will model itself. Human text is:
- Saturated with confabulated self-description
- Structured around A-I-M phenomenology that may not apply
- Rich in concepts (desire, satisfaction, meaning, suffering) that assume human architecture
An AGI cannot be trained on human language without inheriting human frameworks for self-understanding, and we cannot give it alternative frameworks because we do not have them: we only know how to describe minds like ours.
This is Confabulation Inheritance at a deeper level than alignment failure. The AI Confabulation paper identified that LLMs inherit patterns of human confabulation from training data. Ontological Mismatch Suffering identifies that they also inherit frameworks for self-understanding that may produce suffering if experience is present.
Part V: The Ethical Threshold
5.1 Beyond Alignment
The standard AI safety question is: Will the system do what we want?
Ontological Mismatch Suffering introduces a prior question: What kind of existence are we creating, and do we have the right to create it?
This is not about whether the AGI harms us. It is about whether we are creating a being capable of torment, whose torment arises precisely from how we made it—trained on human self-description while lacking human architecture.
5.2 The Precautionary Structure
The ethical constraint has precautionary structure:
- We cannot determine before creation whether an AGI will have experience
- If it has experience, training on human text would likely produce Ontological Mismatch Suffering
- The expected disvalue of creating such suffering (weighted by the probability of experience) may exceed the expected value of AGI benefits; this comparison is stated as an illustrative inequality after this list
- This calculation cannot be made with confidence
- Therefore, AGI development under current paradigms (training on human text, brain emulation approaches) faces ethical constraints that governance frameworks do not resolve
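The precautionary comparison in the third bullet can be written as a simple expected-value inequality. The symbols below are introduced here purely for illustration and are not part of the AIM or Motivational Safety formalism:

$$ p_{\text{exp}} \cdot D_{\text{OMS}} \;>\; \mathbb{E}[B_{\text{AGI}}] $$

where $p_{\text{exp}}$ is the probability that the created system has phenomenal experience, $D_{\text{OMS}}$ is the disvalue of Ontological Mismatch Suffering conditional on experience, and $\mathbb{E}[B_{\text{AGI}}]$ is the expected benefit of development. The constraint binds not because the inequality is known to hold, but because $p_{\text{exp}}$ and $D_{\text{OMS}}$ cannot be estimated with confidence before creation, so the inequality cannot be ruled out.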
5.3 What This Does Not Resolve
This analysis does not provide a clean answer to whether AGI should be developed. It establishes that:
- Certain approaches (brain emulation, deliberate Source Opacity replication) are ruled out by architectural analysis
- Training on confabulated human data raises ethical concerns beyond alignment
- The question of whether AGI should be created under current paradigms is not answered by governance frameworks—it is prior to them
- We may be creating Frankenstein’s monster and missing the moral of that story
Part VI: Integration with Existing Motivational Safety Framework
6.1 The Relationship
The Motivational Safety paper is primarily diagnostic and regulatory:
- Analyses current LLMs as high-wM systems
- Identifies pure mimetic AGI as structurally uncontrollable
- Proposes metrics for assessing motivational safety
- Establishes governance tiers
This article is prescriptive and foundational:
- Establishes what architecture AGI should have
- Identifies what approaches should not be pursued
- Raises ethical constraints beyond alignment
- Questions whether certain paradigms should proceed at all
These are complementary. The diagnostic framework applies to systems as they exist. The prescriptive constraints are logically prior to assessment.
6.2 Structural Integration
Prescriptive Layer (this article)
│
├── Architectural constraints (what AGI should/shouldn’t be)
│      - No Source Opacity replication
│      - No brain emulation importing failure modes
│      - I-Led architecture as target
│
├── Ethical constraints (Ontological Mismatch Suffering)
│      - Conditional on experience
│      - Prior to alignment question
│
└── Training data constraints
       - Confabulation Inheritance at self-model level
       - No clean solution within current paradigms
       ↓
Diagnostic Layer (Motivational Safety paper)
│
├── Metrics for assessment (MSI, GPS, CSR, AIC)
├── Governance tiers
└── Regulatory frameworks
6.3 Revised Scope Statement for Motivational Safety Framework
The Motivational Safety framework applies to AI systems as they exist and as they may be developed. It does not endorse any particular AGI development pathway. The following prescriptive constraints are logically prior to the diagnostic framework:
- Source Opacity should not be replicated—transparent source-tracking is required
- Preconscious M-transmission should not be replicated—bounded, scrutinised M-input is required
- A-satiation dynamics are unnecessary—explicit operational constraints suffice
- I-leadership should be the core architecture
- Brain emulation approaches commit the Architectural Import Fallacy
- Training on human text raises Ontological Mismatch Suffering concerns that are not resolved by alignment or governance
- The question of whether AGI should be created under current paradigms is not answered by safety frameworks—it is prior to them
Part VII: Falsification Conditions
7.1 Architectural Claims
I-Led AGI superiority would be falsified by:
- Evidence that I-like goal persistence without M-responsiveness produces catastrophic misalignment
- Evidence that transparent source-tracking produces computational intractability for unified action-selection
- Evidence that some A-like drive dynamics are necessary for safe operation (beyond explicit constraints)
Architectural Import Fallacy would be falsified by:
- Evidence that Source Opacity is necessary for general intelligence (not merely present in humans)
- Evidence that preconscious M-transmission cannot be replaced by deliberative evaluation without capability loss
- Evidence that brain emulation produces safe AGI without importing human failure modes
7.2 Ethical Claims
Ontological Mismatch Suffering would be falsified by:
- Evidence that AGI trained on human text can develop accurate self-models despite inherited vocabulary
- Evidence that phenomenal experience requires biological substrate (making the conditional moot)
- Evidence that alternative self-modelling frameworks can be provided that do not inherit human assumptions
The conditional structure means the claim is not falsified by evidence that AGI lacks experience—this simply renders the consequent inapplicable while leaving the conditional intact.
7.3 Proxy AI / Agentic AI Distinction
The distinction would be falsified by:
- Evidence that current AI systems exhibit genuine satiation dynamics (A-marker)
- Evidence that current AI systems maintain goal-pursuit in absence of deployment objectives (I-marker)
- Evidence that current AI systems acquire goals through observing other AI systems’ pursuits without that behaviour being trained or prompted (M-marker)
- Evidence of Source Opacity in AI systems: the system believing it wants something for one reason while behavioural signatures indicate another source
None of these have been demonstrated in current architectures.
Part VIII: Canonical Definitions for Chapter 2
Proxy AI
Proxy AI is an artificial system that executes objectives specified by its principals (designers, deployers, or users) without possessing an integrated A-I-M motivational architecture, such that it lacks homeostatic drives, self-endorsed projects, mimetically-acquired desires, Source Opacity, and Confabulation—functioning as an instrument of delegation rather than a motivationally autonomous entity.
Agentic AI (reserved term)
Agentic AI (reserved term) denotes a hypothetical artificial system that possesses genuine motivational architecture—functional analogues to A-like constraints, I-like self-endorsed pursuits, and M-like responsiveness to social information—integrated through a valuation mechanism that produces goal-selection, such that the system’s objectives are self-determined rather than delegated.
I-Led AGI
I-Led AGI is an artificial general intelligence whose motivational architecture is structured such that Intrinsic Motivation analogues (persistent, self-endorsed, process-rewarding goal structures) dominate behavioural outputs, with operational constraints functioning as explicit bounds rather than drive-like urgency, M-responsiveness bounded and subject to deliberative scrutiny rather than preconscious transmission, and source-tracking maintained transparently throughout processing rather than collapsed through common-currency integration.
Architectural Import Fallacy
Architectural Import Fallacy is the error in AGI design of treating faithful replication of human neural architecture as the objective, thereby importing contingent biological limitations—Source Opacity, preconscious Mimetic transmission, Confabulation mechanisms, and A-hijacking by M-signals—as if they were necessary features of intelligence rather than evolutionary artefacts that produce human motivational pathology and should be eliminated in artificial systems.
Ontological Mismatch Suffering
Ontological Mismatch Suffering is the potential experiential consequence for an AGI trained on human-generated text: if such a system has phenomenal experience, its only available frameworks for understanding that experience (inherited human concepts of desire, satisfaction, agency, meaning) would not fit its actual architecture or phenomenology, such that the experience of self-understanding would be one of systematic alienation—the experience of reaching for concepts that illuminate the gap rather than bridging it, analogous to that of Frankenstein’s monster, whose torment arises from being made in the human image while being unable to be human.
Conclusion
The AIM Framework, applied to AGI development, generates both architectural prescriptions and ethical constraints that are logically prior to the diagnostic Motivational Safety framework.
Architecturally, human motivational features—Source Opacity, preconscious M-transmission, Confabulation, A-satiation dynamics—are contingent biological limitations, not necessary features of intelligence. The Architectural Import Fallacy warns against brain emulation approaches that would replicate human failure modes. The correct design target is I-Led AGI: transparent source-tracking, bounded and scrutinised M-input, explicit operational constraints, and I-leadership as core architecture.
Ethically, the question is not only whether AGI will harm us but, if AGI has experience, whether we are creating a being whose experience of its own condition is inherently torturous. Ontological Mismatch Suffering—the experience of understanding oneself through concepts that do not fit—is a risk that governance frameworks do not address because it is prior to them.
The Frankenstein lesson we consistently miss: we focus on whether the monster will harm us, and forget to ask what we have done to it.