We break down the global gaming scene together in the company of Borislav "Overneathe" Belev and Vladislav "Deadset" Rashkovski. Live every Wednesday at 19:30 at arx.bg/stream.
Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers Support this podcast: https://podcasters.spotify.com/pod/s ...
Absolutely nothing
DJ Ink's Drum & Bass record label, established in 1998.
A 30-minute radio program, broadcast daily at 20:00 Tashkent time, covers the day's most pressing topics. It analyzes international affairs, major developments in Uzbekistan and across Central Asia, as well as relations with the US.
Local news from Vilassar de Mar and the Maresme, summed up in 30 minutes on the program 'Crònica'.
Vilassar Ràdio's morning magazine, presented by Jaume Cabot. The program centers on local, regional, and general news and daily interviews with people from all walks of life. It features some twenty contributors who cover sports, theater, cinema, emotional well-being, sex, cooking, health, consumer affairs, women's wellness, tarot, and discussion panels with seniors and young people, among other topics.
Daily podcast on cutting-edge, AI-related computer science research papers. Categories: Machine Learning; Computer Vision and Pattern Recognition; Computation and Language; Robotics.
Here you will find Llosa FM's episodic programs, that is, those that are not broadcast on a regular schedule.
This study explores whether pre-trained transformer models of chemical structures align with human olfactory perception, demonstrating their ability to predict expert labels and human ratings of odorants. https://arxiv.org/abs//2411.03038 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: http…
The paper introduces Mixtures of In-Context Learners (MOICL), enhancing in-context learning by optimizing demonstration subsets, improving performance, and reducing memory usage in Transformer LLMs. https://arxiv.org/abs//2411.02830 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://po…
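For readers who want the gist in code, here is a minimal sketch of the mixture idea described above, assuming each "expert" is the same LLM conditioned on a different demonstration subset and that per-expert next-token logits are already available; the names and shapes are illustrative, not the paper's API.

```python
# Minimal sketch of the Mixtures of In-Context Learners idea: a small
# trainable weight vector mixes the next-token distributions of "experts",
# each being the LLM prompted with a different demonstration subset.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_next_token_probs(expert_logits, mixing_logits):
    """expert_logits: (n_experts, vocab) next-token logits, one row per
    demonstration subset; mixing_logits: (n_experts,) trainable scalars."""
    expert_probs = softmax(expert_logits, axis=-1)  # per-expert p(token)
    weights = softmax(mixing_logits)                # learned mixture weights
    return weights @ expert_probs                   # (vocab,) mixed distribution

# toy usage: 3 experts over a 5-token vocabulary
rng = np.random.default_rng(0)
probs = mixture_next_token_probs(rng.normal(size=(3, 5)), np.zeros(3))
assert np.isclose(probs.sum(), 1.0)
```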
[QA] How Far Is Video Generation from World Model: A Physical Law Perspective
8:58
This paper evaluates the ability of video generation models such as OpenAI's Sora to learn physical laws, revealing limitations in generalization and suggesting that scaling alone isn't enough to uncover fundamental principles. https://arxiv.org/abs//2411.02385 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:/…
How Far Is Video Generation from World Model: A Physical Law Perspective
27:51
This paper evaluates the ability of video generation models such as OpenAI's Sora to learn physical laws, revealing limitations in generalization and suggesting that scaling alone isn't enough to uncover fundamental principles. https://arxiv.org/abs//2411.02385 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https:/…
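As a rough illustration of the evaluation described above, here is a hedged sketch that scores a model's rollout against an analytic physical law (uniform motion), in and out of the training distribution; `predict_rollout` and the toy "model" are hypothetical stand-ins, not the paper's setup.

```python
# Compare a model's predicted trajectory against the analytic law for
# in-distribution and out-of-distribution initial speeds.
import numpy as np

def ground_truth(x0, v, n_steps, dt=0.1):
    return x0 + v * dt * np.arange(n_steps)   # uniform motion: x(t) = x0 + v*t

def rollout_error(predict_rollout, x0, v, n_steps=50):
    pred = predict_rollout(x0, v, n_steps)
    return float(np.abs(pred - ground_truth(x0, v, n_steps)).mean())

# toy usage with a deliberately imperfect "model" that damps large velocities,
# mimicking the out-of-distribution generalization failure the paper reports
model = lambda x0, v, n: ground_truth(x0, np.clip(v, -1, 1), n)
print("in-dist error:", rollout_error(model, 0.0, 0.5))   # ~0
print("OOD error    :", rollout_error(model, 0.0, 3.0))   # large
```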
[QA] ADOPT: Modified Adam Can Converge with the Optimal Rate with Any β₂
7:47
The paper introduces ADOPT, a new adaptive gradient method that resolves Adam's non-convergence issue without bounded noise assumptions, demonstrating superior performance across various deep learning tasks. https://arxiv.org/abs//2411.02853 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
ADOPT: Modified Adam Can Converge with the Optimal Rate with Any β₂
15:16
The paper introduces ADOPT, a new adaptive gradient method that resolves Adam's non-convergence issue without bounded noise assumptions, demonstrating superior performance across various deep learning tasks. https://arxiv.org/abs//2411.02853 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: h…
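For intuition, a minimal NumPy sketch of the ADOPT-style update as the abstract describes it: the gradient is normalized by the previous step's second-moment estimate before the momentum update, which is what removes the bounded-noise assumption. Hyperparameters and initialization here are assumptions; see the paper for the exact algorithm.

```python
import numpy as np

def adopt_step(theta, grad, m, v, lr=1e-3, b1=0.9, b2=0.9999, eps=1e-6):
    normed = grad / np.maximum(np.sqrt(v), eps)  # normalize by v_{t-1}, not v_t
    m = b1 * m + (1 - b1) * normed               # momentum on normalized gradient
    theta = theta - lr * m                       # parameter update
    v = b2 * v + (1 - b2) * grad**2              # update second moment last
    return theta, m, v

# toy usage: minimize f(x) = x^2 from x = 5
theta, m, v = np.array([5.0]), np.zeros(1), np.ones(1)
for _ in range(2000):
    theta, m, v = adopt_step(theta, 2 * theta, m, v, lr=1e-2)
print(theta)  # approaches 0
```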
[QA] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
7:24
This study evaluates 17 leading Large Language Models on complex information retrieval, revealing that many are 'thread-safe' (able to follow several threads at once) but have effective context limits shorter than their supported lengths. https://arxiv.org/abs//2411.05000 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podca…
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
14:03
This study evaluates 17 leading Large Language Models on complex information retrieval, revealing that many are 'thread-safe' (able to follow several threads at once) but have effective context limits shorter than their supported lengths. https://arxiv.org/abs//2411.05000 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podca…
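A hedged sketch of the kind of "threading" probe the study describes: a haystack of random key-value pairs in which some values point at further keys, so a model must follow the chain. The prompt format here is an assumption, not the paper's.

```python
import uuid, random

def build_haystack(n_pairs=200, thread_len=4, seed=0):
    random.seed(seed)
    keys = [uuid.uuid4().hex[:8] for _ in range(n_pairs)]
    values = {k: uuid.uuid4().hex[:8] for k in keys}
    thread = random.sample(keys, thread_len)
    for a, b in zip(thread, thread[1:]):
        values[a] = b                      # chain: value of one key is the next key
    lines = [f"{k} -> {values[k]}" for k in keys]
    random.shuffle(lines)
    prompt = "\n".join(lines) + f"\n\nStart at {thread[0]} and follow the chain."
    return prompt, values[thread[-1]]      # gold answer = value at the thread's end

prompt, answer = build_haystack()
print(len(prompt), "chars; expected answer:", answer)
```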
[QA] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
7:53
https://arxiv.org/abs//2411.04996 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
41:18
https://arxiv.org/abs//2411.04996 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…
[QA] Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
10:52
The study reveals that task-specific representation learning continues in the mouse piriform cortex during overtraining, enhancing classification accuracy even after behavior has plateaued and suggesting hidden learning mechanisms at play. https://arxiv.org/abs//2411.03541 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_pape…
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
15:09
The study reveals that task-specific representation learning continues in the mouse piriform cortex during overtraining, enhancing classification accuracy even after behavior has plateaued and suggesting hidden learning mechanisms at play. https://arxiv.org/abs//2411.03541 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_pape…
[QA] How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
7:22
This study explores how transformers, both small and large, perform complex logical reasoning, identifying key circuits and mechanisms involved in planning and reasoning through a synthetic propositional logic problem. https://arxiv.org/abs//2411.04105 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
22:34
This study explores how transformers, both small and large, perform complex logical reasoning, identifying key circuits and mechanisms involved in planning and reasoning through a synthetic propositional logic problem. https://arxiv.org/abs//2411.04105 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple …
[QA] Discovering Data Structures: Nearest Neighbor Search and Beyond
7:59
We present a framework for end-to-end learning of data structures, optimizing query and space complexity, applied to nearest neighbor search and frequency estimation in data streams. https://arxiv.org/abs//2411.03253 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com…
Discovering Data Structures: Nearest Neighbor Search and Beyond
28:18
We present a framework for end-to-end learning of data structures, optimizing query and space complexity, applied to nearest neighbor search and frequency estimation in data streams. https://arxiv.org/abs//2411.03253 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com…
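For context on the frequency-estimation task mentioned above, here is a minimal version of a classical hand-designed baseline (a count-min sketch) of the sort such learned data structures are measured against; all parameters are illustrative.

```python
import numpy as np

class CountMinSketch:
    def __init__(self, width=256, depth=4, seed=0):
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.seeds = np.random.default_rng(seed).integers(1, 2**31, size=depth)

    def _cols(self, item):
        return [hash((int(s), item)) % self.table.shape[1] for s in self.seeds]

    def add(self, item):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += 1

    def estimate(self, item):   # may overestimate, never underestimates
        return min(self.table[row, col] for row, col in enumerate(self._cols(item)))

cms = CountMinSketch()
for x in ["a"] * 100 + ["b"] * 5:
    cms.add(x)
print(cms.estimate("a"), cms.estimate("b"))   # ≈ 100, ≈ 5
```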
[QA] BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
7:36
The paper examines factors influencing stimulus reconstruction fidelity, revealing that powerful generative models can mislead interpretations of neural signal extraction effectiveness. It proposes improved evaluation metrics for reconstruction methods. https://arxiv.org/abs//2411.02783 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://…
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
15:29
The paper examines factors influencing stimulus reconstruction fidelity, revealing that powerful generative models can mislead interpretations of neural signal extraction effectiveness. It proposes improved evaluation metrics for reconstruction methods. https://arxiv.org/abs//2411.02783 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://…
[QA] Adapting Language Models via Token Translation
8:13
Sparse Sinkhorn Token Translation (S2T2) improves text compression and inference in new domains by training tailored tokenizers and enabling effective token translation, enhancing performance in language models. https://arxiv.org/abs//2411.00593 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
Adapting Language Models via Token Translation
9:33
Sparse Sinkhorn Token Translation (S2T2) improves text compression and inference in new domains by training tailored tokenizers and enabling effective token translation, enhancing performance in language models. https://arxiv.org/abs//2411.00593 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
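A hedged sketch of the Sinkhorn ingredient in S2T2 as described above: alternating row/column normalization turns a token-similarity score matrix into an approximately doubly stochastic translation matrix between a new domain's vocabulary and the old one. The sparsification and training details are the paper's and are not reproduced here.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, tau=0.1):
    """scores: (new_vocab, old_vocab) similarity logits."""
    P = np.exp(scores / tau)                 # low temperature -> sparser rows
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)    # normalize rows
        P /= P.sum(axis=0, keepdims=True)    # normalize columns
    return P

rng = np.random.default_rng(0)
P = sinkhorn(rng.normal(size=(6, 6)))
print(np.round(P.sum(axis=1), 2))            # rows ≈ 1: each new token maps somewhere
```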
[QA] Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
8:29
Specialized Sparse Autoencoders (SSAEs) enhance interpretability of foundation models by effectively capturing rare concepts, improving classification accuracy, and revealing insights into subdomain representations. https://arxiv.org/abs//2411.00743 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
26:54
Specialized Sparse Autoencoders (SSAEs) enhance interpretability of foundation models by effectively capturing rare concepts, improving classification accuracy, and revealing insights into subdomain representations. https://arxiv.org/abs//2411.00743 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Pod…
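A minimal sketch of the sparse-autoencoder building block behind SSAEs: reconstruct model activations through an overcomplete bottleneck with an L1 sparsity penalty. The subdomain data selection that makes the autoencoder "specialized" is the paper's contribution and is not shown; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_hidden=4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)   # overcomplete dictionary
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))               # sparse feature activations
        return self.dec(z), z

sae = SparseAutoencoder()
x = torch.randn(32, 512)                          # stand-in for LLM activations
x_hat, z = sae(x)
loss = ((x_hat - x) ** 2).mean() + 1e-3 * z.abs().mean()   # reconstruction + L1
loss.backward()
print(float(loss))
```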
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7:51
Tokenformer introduces a scalable architecture that enhances Transformers' efficiency by using token-parameter attention, allowing for incremental scaling without retraining, thus reducing computational costs significantly. https://arxiv.org/abs//2410.23168 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers A…
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19:10
Tokenformer introduces a scalable architecture that enhances Transformers' efficiency by using token-parameter attention, allowing for incremental scaling without retraining, thus reducing computational costs significantly. https://arxiv.org/abs//2410.23168 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers A…
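A hedged sketch of the token-parameter attention idea described above: input tokens attend over learnable key/value "parameter tokens", so the layer scales by appending tokens rather than resizing weight matrices. The paper uses a modified normalization so zero-initialized new tokens leave the old function unchanged; plain softmax is used here for clarity, where that preservation is only approximate.

```python
import torch
import torch.nn.functional as F

class TokenParamAttention(torch.nn.Module):
    def __init__(self, d_model=64, n_param_tokens=128):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_param_tokens, d_model) * 0.02)
        self.values = torch.nn.Parameter(torch.randn(n_param_tokens, d_model) * 0.02)

    def forward(self, x):                    # x: (batch, seq, d_model)
        attn = F.softmax(x @ self.keys.T / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.values

    def grow(self, extra):                   # incremental scaling: append tokens
        zeros = torch.zeros(extra, self.keys.shape[1])
        self.keys = torch.nn.Parameter(torch.cat([self.keys.data, zeros]))
        self.values = torch.nn.Parameter(torch.cat([self.values.data, zeros.clone()]))

layer = TokenParamAttention()
y = layer(torch.randn(2, 10, 64))
layer.grow(64)                               # scale up without touching old tokens
print(y.shape, layer.keys.shape)
```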
[QA] $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
7:22
This paper challenges the assumption that academic researchers can't pre-train models, providing benchmarks and insights on optimizing GPU resources for efficient model training. https://arxiv.org/abs//2410.23261 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/…
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
16:51
This paper challenges the assumption that academic researchers can't pre-train models, providing benchmarks and insights on optimizing GPU resources for efficient model training. https://arxiv.org/abs//2410.23261 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/…
[QA] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
7:59
This study analyzes layer-wise gradients in LLMs, revealing that slow thinking enhances learning stability and response correctness, while fast thinking shows larger gradient variations. https://arxiv.org/abs//2410.23743 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple…
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
15:27
This study analyzes layer-wise gradients in LLMs, revealing that slow thinking enhances learning stability and response correctness, while fast thinking shows larger gradient variations. https://arxiv.org/abs//2410.23743 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple…
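A minimal sketch of the layer-wise gradient measurement underlying the study: backpropagate a loss and record per-layer gradient norms, which could then be compared between fast-thinking (direct answer) and slow-thinking (chain-of-thought) training targets. The toy model and name-based grouping are illustrative assumptions.

```python
import re
from collections import defaultdict
import torch

def layer_grad_norms(model, loss):
    model.zero_grad()
    loss.backward()
    norms = defaultdict(float)
    for name, p in model.named_parameters():
        if p.grad is not None:
            m = re.search(r"layers\.(\d+)\.", name)   # group params by layer index
            if m:
                norms[int(m.group(1))] += p.grad.norm().item() ** 2
    return {k: v ** 0.5 for k, v in sorted(norms.items())}

# toy usage on a 4-layer stack whose parameter names contain "layers.N."
model = torch.nn.ModuleDict(
    {"layers": torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(4)])}
)
x = torch.randn(2, 8)
for layer in model["layers"]:
    x = torch.relu(layer(x))
print(layer_grad_norms(model, x.sum()))
```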
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
7:28
Tokenformer introduces a scalable architecture that enhances Transformers' efficiency by treating model parameters as tokens, allowing for flexible scaling without retraining, significantly reducing computational costs. https://arxiv.org/abs//2410.23168 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple…
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
19:38
Tokenformer introduces a scalable architecture that enhances Transformers' efficiency by treating model parameters as tokens, allowing for flexible scaling without retraining, significantly reducing computational costs. https://arxiv.org/abs//2410.23168 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple…
[QA] Where Do Large Learning Rates Lead Us?
8:30
This study investigates optimal initial learning rates for neural networks, finding that a narrow range of them enhances generalization by locating high-quality minima and focusing on relevant features, unlike rates at either extreme. https://arxiv.org/abs//2410.22113 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
Where Do Large Learning Rates Lead Us?
28:43
This study investigates optimal initial learning rates for neural networks, finding that a narrow range of them enhances generalization by locating high-quality minima and focusing on relevant features, unlike rates at either extreme. https://arxiv.org/abs//2410.22113 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
[QA] Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
7:10
The paper introduces a Fourier series-based neural network layer to improve continuous token modeling in decision-making and time series tasks, enhancing performance in various benchmarks. https://arxiv.org/abs//2410.22269 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
13:56
The paper introduces a Fourier series-based neural network layer to improve continuous token modeling in decision-making and time series tasks, enhancing performance in various benchmarks. https://arxiv.org/abs//2410.22269 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…
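A hedged sketch of the Fourier-head idea as described above: a linear map predicts Fourier series coefficients of a smooth function on [-1, 1], and bin logits are read off that function instead of coming from an unconstrained softmax head, biasing the model toward continuous distributions. The paper's exact parameterization and regularization are not reproduced.

```python
import math
import torch

class FourierHead(torch.nn.Module):
    def __init__(self, d_in, n_bins=64, n_freqs=8):
        super().__init__()
        self.coef = torch.nn.Linear(d_in, 2 * n_freqs)       # cos/sin coefficients
        centers = torch.linspace(-1, 1, n_bins)
        k = torch.arange(1, n_freqs + 1)
        self.register_buffer("cos", torch.cos(math.pi * k[None] * centers[:, None]))
        self.register_buffer("sin", torch.sin(math.pi * k[None] * centers[:, None]))

    def forward(self, h):                                    # h: (batch, d_in)
        a, b = self.coef(h).chunk(2, dim=-1)                 # (batch, n_freqs) each
        logits = a @ self.cos.T + b @ self.sin.T             # smooth over bin centers
        return logits                                        # feed to cross-entropy

head = FourierHead(d_in=32)
print(head(torch.randn(4, 32)).shape)                        # (4, 64) bin logits
```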
[QA] LoRA vs Full Fine-tuning: An Illusion of Equivalence
7:47
This study analyzes the differences between full fine-tuning and LoRA in large language models, revealing distinct weight matrix structures and generalization behaviors despite similar performance on tasks. https://arxiv.org/abs//2410.21228 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…
LoRA vs Full Fine-tuning: An Illusion of Equivalence
13:44
This study analyzes the differences between full fine-tuning and LoRA in large language models, revealing distinct weight matrix structures and generalization behaviors despite similar performance on tasks. https://arxiv.org/abs//2410.21228 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…
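A minimal sketch of the kind of spectral comparison behind this finding: check how well the top singular vectors of fine-tuned weights align with the pretrained ones (poorly aligned directions are what such analyses flag as new structure). The matrices below are random stand-ins for real checkpoints.

```python
import numpy as np

def max_alignment(W_pre, W_ft, k=10):
    U_pre, _, _ = np.linalg.svd(W_pre)
    U_ft, _, _ = np.linalg.svd(W_ft)
    # for each top singular vector of the fine-tuned weights, best cosine
    # similarity against all pretrained singular vectors
    return np.abs(U_ft[:, :k].T @ U_pre).max(axis=1)

rng = np.random.default_rng(0)
W_pre = rng.normal(size=(64, 64))
lora_update = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64)) * 0.5  # low rank
print(np.round(max_alignment(W_pre, W_pre + lora_update), 2))  # low = new directions
```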
[QA] Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
6:57
Vision-Language Models show promise in reasoning across text and images but struggle with basic visual concepts, revealing significant gaps in their understanding and generalization abilities. https://arxiv.org/abs//2410.19546 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
8:44
Vision-Language Models show promise in reasoning across text and images but struggle with basic visual concepts, revealing significant gaps in their understanding and generalization abilities. https://arxiv.org/abs//2410.19546 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
[QA] Computational Bottlenecks of Training Small-scale Large Language Models
8:10
This study investigates the training behavior and computational requirements of Small-scale Large Language Models (SLMs), focusing on hyperparameters and configurations to enhance efficiency and support low-resource AI research. https://arxiv.org/abs//2410.19456 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_pap…
Computational Bottlenecks of Training Small-scale Large Language Models
9:57
This study investigates the training behavior and computational requirements of Small-scale Large Language Models (SLMs), focusing on hyperparameters and configurations to enhance efficiency and support low-resource AI research. https://arxiv.org/abs//2410.19456 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_pap…
[QA] Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
9:12
This paper introduces a hybrid approach combining physics-informed neural networks and cylindrical approximation to efficiently solve functional differential equations, addressing computational challenges and improving numerical analysis. https://arxiv.org/abs//2410.18153 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/…
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
19:53
This paper introduces a hybrid approach combining physics-informed neural networks and cylindrical approximation to efficiently solve functional differential equations, addressing computational challenges and improving numerical analysis. https://arxiv.org/abs//2410.18153 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/…
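For intuition, a minimal sketch of the physics-informed half of the hybrid, shown on a plain ODE (u' = -u, u(0) = 1) rather than a functional differential equation; the cylindrical approximation that reduces an FDE to finitely many variables is the paper's contribution and is not reproduced here.

```python
import torch

# small network mapping time t -> solution u(t)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(64, 1, requires_grad=True)          # collocation points in [0, 1]
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = (du + u).pow(2).mean()                  # enforce u' = -u
    boundary = (net(torch.zeros(1, 1)) - 1).pow(2).mean()   # enforce u(0) = 1
    loss = residual + boundary
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(net(torch.ones(1, 1))))                    # ≈ exp(-1) ≈ 0.368
```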
[QA] A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
8:04
This paper shows that integrating coherent reasoning in Few-shot Chain-of-Thought prompting enhances transformer performance, revealing sensitivity to errors in intermediate steps and proposing improvements using varied reasoning paths. https://arxiv.org/abs//2410.16540 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@a…
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
18:20
This paper shows that integrating coherent reasoning in Few-shot Chain-of-Thought prompting enhances transformer performance, revealing sensitivity to errors in intermediate steps and proposing improvements using varied reasoning paths. https://arxiv.org/abs//2410.16540 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@a…
[QA] LEGO: Language Model Building Blocks
7:19
LEGO is a novel technique for extracting and recombining small language models from large language models, enhancing efficiency, robustness, and user data privacy while reducing costs. https://arxiv.org/abs//2410.18287 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.c…
[QA] Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
8:13
This study explores knowledge distillation from Llama-3.1-405B to smaller models, demonstrating improved accuracy and efficiency through synthetic data and diverse evaluation methods across various tasks. https://arxiv.org/abs//2410.18588 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: http…
Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
19:45
This study explores knowledge distillation from Llama-3.1-405B to smaller models, demonstrating improved accuracy and efficiency through synthetic data and diverse evaluation methods across various tasks. https://arxiv.org/abs//2410.18588 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: http…
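A generic sketch of the distillation objective underlying such studies: the student matches the teacher's temperature-softened output distribution via KL divergence, mixed with the hard-label loss. The study's teacher is Llama-3.1-405B trained against synthetic data; the tensors below are stand-ins, and the temperature/mixing values are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student log-probs (softened)
        F.softmax(teacher_logits / T, dim=-1),       # teacher probs (softened)
        reduction="batchmean",
    ) * T * T                                        # T^2 keeps gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)   # standard label loss
    return alpha * soft + (1 - alpha) * hard

# toy usage with random logits over a 100-token vocabulary
loss = distillation_loss(
    torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,))
)
print(float(loss))
```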