Superalignment: Cultural Logics as Safety Parameters

(Preliminary Draft)

Sam Mann

February 20, 2026

How does the cultural logic embedded in a researcher's or author's cognition limit and dictate what computational ethical frameworks they have the capacity to develop?

Table of Contents

Section 1: Introduction

Section 2: Gödelian and Cultural Logics

Section 3: Three Approaches to Computational Ethics (Asimov, Leike, Lehrer)

Section 4: GPT Case Studies

Section 5: Conclusion

Section 1: Introduction

Superintelligence entails the ability to outperform human cognition across domains. One ethical implication of such advanced systems is therefore the integration of human values and ethics. Human values and ethics are pluralistic and thus prone to tension with one another; this can result in contradictions, ambiguities, paradoxes, and uncertainty. To lay out such tensions, it is more effective to explore the structure via which AI reasons about values rather than focusing only on the content of those values; accordingly, the architecture of such systems will be considered at the intersection of cultural, computational, and cognitive frameworks.

Following the introduction, Section 2 will introduce two theoretical frameworks: (1) Gödel's Incompleteness Theorems and (2) Cultural Logics. Gödel's Theorems demonstrate the structural limitations of formal systems; Cultural Logics serve as a means of navigating those structural limitations. After the connection between Gödelian and Cultural Logics is made, Section 3 will explore three authors/researchers (Isaac Asimov, Jan Leike, and Eli Lehrer) and show how each of their computational ethics absorbs a cultural logic. The analysis will go a step further by demonstrating how the cultural logics embedded in their respective cognitions limit and dictate which computational ethical frameworks they have the capacity to develop. In Section 4, GPT Case Studies will be introduced as a practical application of the theoretical arguments. The GPT Builders and their interface will be briefly explained; six GPT Builders will each simulate one cultural logic and then be applied to real-world ethical dilemmas surrounding medical ethics and DEI. The simulation will serve as a testbed, demonstrating how different AI (cognitive-cultural-computational) modes can process the plurality of values and ethics that encompass the human experience. Finally, Section 5 will conclude by laying out the implications for advanced AI systems and their safety parameters.

Section 2: Gödelian and Cultural Logics

Gödelian Logic

Think of a Book of Grammar Rules that states:

"In the Book of Grammar Rules, all the rules are true."

However, the book must leave out the following line:

"This grammar rule is not true."

That line, in itself, is a TRUE statement.

Despite its truth, the line cannot be included in the Book of Grammar Rules, and thus the book will never recognize its own limitations.

The Book of Grammar Rules will never be fully COMPLETE.

Thus, Gödel's Incompleteness Theorems, based in mathematical logic, state that within any FORMAL system rich enough to express arithmetic, there will always be true statements that cannot be proven within that system (first theorem), and that the system cannot prove its own consistency (second theorem). The system would need external verification.
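The two theorems can be stated compactly in a standard textbook formulation (this notation is a conventional rendering, not Gödel's original phrasing):

```latex
\textbf{First Incompleteness Theorem.} Let $F$ be a consistent, effectively
axiomatized formal system capable of expressing elementary arithmetic. Then
there exists a sentence $G_F$ such that
\[
F \nvdash G_F \quad \text{and} \quad F \nvdash \neg G_F .
\]
\textbf{Second Incompleteness Theorem.} Under the same hypotheses,
\[
F \nvdash \mathrm{Con}(F),
\]
where $\mathrm{Con}(F)$ is the arithmetized statement that $F$ is consistent.
```

In the Book of Grammar Rules analogy, $G_F$ plays the role of the excluded line: true of the system, yet underivable within it.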

Kurt Gödel's work applies to AI because an AI system cannot step outside itself to externally verify its own logic. While Gödelian logic aligns with the notion that some truths cannot be proven within a formal system, for probabilistic systems the issue is the stabilization and destabilization of meaning rather than provability.

This will later connect to the work of the three authors/researchers; for example, Asimov's Three Laws are formal or rule-based and thus, according to Gödel's Theorems, a robot who is programmed with such would hit limitations that would necessitate external feedback.

Cultural Logics

Cultural movements are broad societal movements that shape everything: literature, the arts, graphic design, architecture, and even technology. On a meta-level, cultural movements embed their logic into such fields. How a society navigates its moral consciousness during a given era is also greatly influenced by cultural logics, which thus shape cognitive reasoning modes on an individual level; this means the structures of Asimov's, Leike's, and Lehrer's cognitive reasoning modes, and the cultural logics embedded within them, give rise to their respective computational ethics.

Cultural movements function through contradiction: they transition from one to the next by reacting in opposition to some elements of the prior movement while also carrying forward other aspects of that same movement. As Romanticism reached its end, Modernism took off in the 1890s and wound down in the 1940s/1950s. Postmodernism's estimated period is the 1960s to the 1990s, and Metamodernism's is the mid-2000s to the present.

Modernism consisted of a move away from the representational modes of Romanticism and realism towards abstraction (and thus the development of abstract art); it emphasized objective truth and saw contradictions as errors to resolve; it prioritized the resolution of ambiguity and uncertainty, paired with clarity and systematization. Modernism believed in a universal abstract theory as the underlying structure of society (hence the prominence of ideologies such as Nazism and Marxism). Late Modernism compounded the trauma of the World Wars, Nazism, colonialism, and atomic warfare; it was Modernism with a logic that emitted crisis rather than objective productivity. The cultural logic was aware of its crisis but could not adequately address it.

Postmodernism rejected Modernism's universal notions in favor of subjectivity. It preferred individual human experience over universality and was skeptical of its predecessor's appeals to progress, innovation, and science as backing for what it classified as objective truth. French Postmodernism lasted from the 1960s to the 1980s and centered on Derrida's framework of deconstruction, which framed language as unstable, fluid, and open to multiple interpretations because there is no stable frame of reference. It refused to resolve paradoxes and contradictions.

If French Postmodernism was deconstruction, then Radical Postmodernism of the 1990s and 2000s was reconstruction. Anglo-American theorists of this period saw the gap in French Postmodernism and went on to establish a framework for marginalized voices. Radical Postmodernism held that there was no universal truth but, unlike French Postmodernism, it did not stop at dismantling truth claims; instead it argued that some subjective realities needed to be placed in a position of greater salience. The current era of Metamodernism started in the mid-2000s and took the contradictory relationship between Modernism and Postmodernism and synthesized it in the form of oscillation. By oscillating between the two prior cultural logics, Metamodernism attempted to metabolize the contradictions rather than viewing them as underlying truths or as instabilities that could not be navigated.

Cultural Logics are a response to structural Gödelian limits, and together the two theories demonstrate the mechanism by which such systems fail. The arguments will center on Modernism, Late Modernism, and Metamodernism, but a brief overview will be given of how AI's reasoning mode relates to all five Cultural Logics. Authors/researchers with a Modernist cognitive mode, such as Asimov, gravitate towards computational ethical frameworks that prioritize formal rules, hierarchies, and categories; Gödelian limits are thus a valid theoretical framework for pinpointing the mechanisms by which his Three Laws hit their limitations. Late Modernist authors/researchers have greater awareness of the flaws of Modernist reasoning but are unable to adequately address them; this is the case for Leike, who recognizes the flaws of formalization and explores more nuanced approaches, yet whose cognitive reasoning mode drifts back to Modernist notions of formalization. French Postmodernism argues that language has no stable frame of reference: language can have different meanings in different contexts, and the inability to navigate contexts indicates a need for a meta-approach, which Metamodernism addresses. French Postmodernism's exploration of language and its instability parallels instability in AI optimization. For example, "fairness" has varying definitions for different people and contexts, so there is no single correct definition for the system to optimize; the optimization target is context-dependent. This echoes the Gödelian point that one cannot formalize what is unstable at a foundational level, which in this case is the context-dependent nature of human ethics.
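The instability of "fairness" as an optimization target can be made concrete. The following sketch uses hypothetical data and two commonly assumed metric definitions (demographic parity via selection rate, and false-negative rate) to show the same predictions judged equitable by one metric and inequitable by the other:

```python
# Toy illustration (hypothetical data, assumed metric definitions) of why
# "fairness" is not a single optimization target: the same predictions
# satisfy demographic parity across two groups yet give those groups
# different false-negative rates.

def selection_rate(preds):
    """Fraction of people receiving a positive prediction."""
    return sum(preds) / len(preds)

def false_negative_rate(labels, preds):
    """Fraction of true positives that the predictions miss."""
    misses = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return misses / sum(labels)

# Group A: predictions recover every true positive.
labels_a, preds_a = [1, 1, 0, 0], [1, 1, 0, 0]
# Group B: identical selection rate, but one true positive is missed.
labels_b, preds_b = [1, 1, 1, 0], [1, 0, 1, 0]

sel_a, sel_b = selection_rate(preds_a), selection_rate(preds_b)
fnr_a, fnr_b = (false_negative_rate(labels_a, preds_a),
                false_negative_rate(labels_b, preds_b))

print(sel_a, sel_b)   # equal: demographic parity holds
print(fnr_a, fnr_b)   # unequal: error rates diverge
```

A system told to "optimize fairness" must first be told which of these (mutually incompatible) definitions counts, and that choice is exactly the context-dependent judgment the formalism cannot supply from within.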

Radical Postmodernism's theoretical framework informs computational ethics in the context of algorithmic bias; it also implicates French Postmodernism, as different frames of reference (or social groups) carry varying interpretations of concepts, values, and ethics. In the context of Gödel's Theorems, a single formal framework cannot encompass frames from several standpoints. Metamodernism takes a bird's-eye view of the interconnected relationship between Modernism and Postmodernism, and does so without relativism; it holds the tension resulting from contradictions and attempts to productively metabolize it. This meta-approach becomes its means of navigating Gödelian limitations, and it is the approach attempted by Lehrer's framework.

Section 3: Three Approaches to Computational Ethics (Asimov, Leike, Lehrer)

Asimov

Asimov's Three Laws state:

(1) A robot may not injure a human being or, through inaction, allow a human being to come to harm.

(2) A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

(3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Asimov's Three Laws are representative of Modernism in that they are hierarchical and rule-based. They are framed as universally applicable, but external probing (or verification) brings forth limitations invisible within the echo chamber of their own formalization; in the context of Gödel's Theorems, the limits on their universal application are the result of incompleteness. The rule-based approach also leaves little room for the ambiguity required to metabolize the conflicting rules a robot hits; Modernist logic, like the Three Laws, did not have a built-in mechanism for productively metabolizing contradictions either.

In line with the Gödelian notion of external verification, the robot required external feedback to stabilize because the system could not see beyond its own reasoning. This makes salient how Asimov's Modernist cognitive mode was encoded in his Three Laws (in both content and structure) and then encoded in the robot's functioning; the isomorphism between Asimov's cognitive mode, his computational ethics, and the broader cultural logic permits this resonance. Asimov could not see beyond his own cognitive reasoning mode's Modernist structure, which thus placed limitations on the structure of his ethical framework. The contradiction is that he would need to interrogate the very mode doing the interrogating; more precisely, he would need a meta-approach (or external verification) to see how his Modernist reasoning placed limitations on his ethical solutions.
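The "Runaround"-style deadlock can be sketched computationally. This is a minimal toy model, an illustrative assumption rather than Asimov's own formulation: two candidate actions tie under Laws 2 and 3, the rule hierarchy alone cannot break the tie, and only an external First Law signal resolves the equilibrium.

```python
# Minimal sketch of a strict rule hierarchy deadlocking, as in "Runaround":
# a strengthened Third Law balances a weakly given Second Law order, so the
# robot oscillates until a First Law signal from OUTSIDE the system breaks
# the tie. Scores and actions are hypothetical, chosen to exhibit the tie.

def score(action, human_in_danger=False):
    """Higher score = more preferred under the law hierarchy."""
    s = 0.0
    if human_in_danger and action == "approach":
        s += 100.0   # Law 1 dominates everything
    if action == "approach":
        s += 10.0    # Law 2: obey the order (weakly given)
    if action == "retreat":
        s += 10.0    # Law 3: self-preservation (strengthened)
    return s

def choose(human_in_danger=False):
    """Pick the best action; a tie models Speedy's oscillation."""
    actions = ["approach", "retreat"]
    scores = {a: score(a, human_in_danger) for a in actions}
    best = max(scores.values())
    tied = [a for a, s in scores.items() if s == best]
    return "oscillate" if len(tied) > 1 else tied[0]

print(choose())                       # rules alone: deadlock
print(choose(human_in_danger=True))   # external event resolves it
```

The point of the sketch is that nothing inside the scoring function can detect or repair the tie; resolution arrives only through an input the rule system did not generate, which is the Gödelian shape of the argument above.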

Jan Leike

* Preliminary Draft will be updated in May 2026.

Citations

Isaac Asimov, "Runaround," October 1941, https://web.williams.edu/Mathematics/sjmiller/public_html/105Sp10/handouts/Runaround.html.

Jan Leike, "What could a solution to the alignment problem look like?," Musings on the Alignment Problem, September 26, 2022, https://aligned.substack.com/p/alignment-solution.

Jan Leike, "A proposal for importing society’s values," Musings on the Alignment Problem, March 9, 2023, https://aligned.substack.com/p/a-proposal-for-importing-societys-values.

Jan Leike, "Crisp and fuzzy tasks," Musings on the Alignment Problem, November 22, 2024, https://aligned.substack.com/p/a-proposal-for-importing-societys-values.

Eli Lehrer, "AI Safety Requires Pluralism, Not a Single Moral Operating System," R Street Institute, December 10, 2025.


About

You Might Be Sleeping (est. March 2023) is an archive created by Sam Mann. Sam established this archive as a passion project to document and explore her research interests. Her interests include psychosis + schizophrenia, artificial intelligence, culture and more. Currently she is academically studying film and is immersed in the artistic exploration of an emerging phenomenon: psychosis from AI + human interaction, as documented by the Rolling Stone + New York Times. She believes her personal experience with psychosis and schizophrenia equips her to artistically + scientifically explore this phenomenon from a niche perspective. At the center of her work are AI and medical safety + ethics, as she believes such frameworks should be baked into the work rather than an afterthought.

If you’re someone with lived experience of psychosis, schizophrenia and/or neurodivergence – if you’re someone who is studying this emerging phenomenon from a research/scientific/artistic perspective – or more interestingly, if you’re someone who sits at the intersection of both, this archive can serve as one perspective among the vast sea of many interacting with one of the most intriguing phenomena of our times.
