Content provided by Soroush Pour. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Soroush Pour or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, follow the process described here: https://pt.player.fm/legal.

Ep 10 - Accelerated training to become an AI safety researcher w/ Ryan Kidd (Co-Director, MATS)

1:16:58
 

We speak with Ryan Kidd, Co-Director at ML Alignment & Theory Scholars (MATS) program, previously "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps, in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Cotra - Bioanchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine-tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
* Scalable oversight
* Recursive reward modelling - https://deepmindsafetyresearch.medium...
* RLHF - could work for a while, but unlikely forever as we scale
* Interpretability
* Mechanistic interpretability
* Paper: GPT-4 labelling GPT-2 - https://openai.com/research/language-...
* Concept-based interpretability
* ROME paper - https://rome.baulab.info/
* Developmental interpretability
* devinterp.com - http://devinterp.com
* Timaeus - https://timaeus.co/
* Internal consistency
* Collin Burns research - https://arxiv.org/abs/2212.03827
* Threat modelling / capabilities evaluation & demos
* Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
* ARC Evals - https://evals.alignment.org/
* Palisade Research - https://palisaderesearch.org/
* Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
* Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholars' work
* Apollo Research - https://www.apolloresearch.ai/
* Leap Labs - https://www.leap-labs.com/
* Timaeus - https://timaeus.co/
* Other orgs mentioned
* Redwood Research - https://redwoodresearch.org/
Recorded Oct 25, 2023
