Davidad Dalrymple: Towards Provably Safe AI
Episode 137
I spoke with Davidad Dalrymple about:
* His perspectives on AI risk
* ARIA (the UK’s Advanced Research and Invention Agency) and its Safeguarded AI Programme
Enjoy—and let me know what you think!
Davidad is a Programme Director at ARIA. He was most recently a Research Fellow in technical AI safety at Oxford. He co-invented the top-40 cryptocurrency Filecoin, led an international neuroscience collaboration, and was a senior software engineer at Twitter and multiple startups.
Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub with feedback, ideas, and guest suggestions.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (00:36) Calibration and optimism about breakthroughs
* (03:35) Calibration and AGI timelines, effects of AGI on humanity
* (07:10) Davidad’s thoughts on the Orthogonality Thesis
* (10:30) Understanding how our current direction relates to AGI and breakthroughs
* (13:33) What Davidad thinks is needed for AGI
* (17:00) Extracting knowledge
* (19:01) Cyber-physical systems and modeling frameworks
* (20:00) Continuities between Davidad’s earlier work and ARIA
* (22:56) Path dependence in technology, race dynamics
* (26:40) More on Davidad’s perspective on what might go wrong with AGI
* (28:57) Vulnerable world, interconnectedness of computers and control
* (34:52) Formal verification and world modeling, Open Agency Architecture
* (35:25) The Semantic Sufficiency Hypothesis
* (39:31) Challenges for modeling
* (43:44) The Deontic Sufficiency Hypothesis and mathematical formalization
* (49:25) Oversimplification and quantitative knowledge
* (53:42) Collective deliberation in expressing values for AI
* (55:56) ARIA’s Safeguarded AI Programme
* (59:40) Anthropic’s ASL levels
* (1:03:12) Guaranteed Safe AI
* (1:03:38) AI risk and (in)accurate world models
* (1:09:59) Levels of safety specifications for world models and verifiers — steps to achieve high safety
* (1:12:00) Davidad’s portfolio research approach and funding at ARIA
* (1:15:46) Earlier concerns about ARIA — Davidad’s perspective
* (1:19:26) Where to find more information on ARIA and the Safeguarded AI Programme
* (1:20:44) Outro
Links:
* Davidad’s Twitter
* Papers
  * Davidad’s Open Agency Architecture for Safe Transformative AI
  * Dioptics: a Common Generalization of Open Games and Gradient-Based Learners (2019)
  * Asynchronous Logic Automata (2008)
Get full access to The Gradient at thegradientpub.substack.com/subscribe