Empirical work that might shed light on scheming (Section 6 of "Scheming AIs")
This is section 6 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Empirical work that might shed light on scheming (Section 6 of "Scheming AIs") (00:00:00)
2. 6. Empirical work that might shed light on scheming (00:00:33)
3. 6.1 Empirical work on situational awareness (00:05:34)
4. 6.2 Empirical work on beyond-episode goals (00:07:03)
5. 6.3 Empirical work on the viability of scheming as an instrumental strategy (00:10:29)
6. 6.4 The “model organisms” paradigm (00:12:14)
7. 6.5 Traps and honest tests (00:13:29)
8. 6.6 Interpretability and transparency (00:16:49)
9. 6.7 Security, control, and oversight (00:18:35)
10. 6.8 Other possibilities (00:21:08)
54 episodes