Alignment

AGI Ruin: A List of Lethalities - Eliezer Yudkowsky. Forty-three reasons why Yudkowsky is pessimistic about our world being able to solve AGI safety. A longer treatment of the underlying dangers can be found in Rationality: From AI to Zombies - Eliezer Yudkowsky, especially My Naturalistic Awakening, That Tiny Note of Discord, Sympathetic Minds, Sorting Pebbles Into Correct Heaps, No Universally Compelling Arguments, The Design Space of Minds-in-General, Detached Lever Fallacy, Ethical Injunctions, Magical Categories, and Fake Utility Functions.

The Basic AI Drives - Steve Omohundro. On the fundamental drives that may be inherent in any artificially intelligent system, and the dangers those drives pose.

Orthogonality Thesis - Nick Bostrom. On why an increase in intelligence need not correlate with alignment with human values.

AI Alignment & Security - Paul Christiano. On how the relationship between security and alignment concerns is underappreciated. 

Eliciting Latent Knowledge - Paul Christiano, Ajeya Cotra, Mark Xu. On how to train models to report knowledge that is latent in them, such as knowledge of events happening off-screen.

Artificial Intelligence, Values, and Alignment - Iason Gabriel. On philosophical considerations relevant to AI alignment, proposing that we select principles that command widespread reflective endorsement.
