The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Active
Has guests
Sam Charrington
Categories
#63 in Technology · News · Tech News · Business
Audience & Performance Metrics
8.4K - 14.1K listeners · Neutral · 4.7 rating · 563 reviews · 748 episodes · USA
Monetization Metrics
30s Ad: $282 - $349 · 60s Ad: $338 - $405 · CPM Category: Technology
Socials metrics & links
No data found.
Podcast Links
Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers, and tech-savvy business and IT leaders. The podcast is hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator, and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science, and more.

Producers, Hosts, and Production Team

Last updated 21 days ago

Hosts

Sam Charrington
Host, TWIML AI Podcast
Host and creator of the TWIML AI Podcast, a prominent voice in machine learning and artificial intelligence discussions.

Emails, Phones, and Addresses

Contact Page Emails

Emails
Phone Numbers

No phone numbers found.

Addresses

No addresses found.

Form

A contact form is available on the podcast's contact page.

General Website Emails

No website emails found.

Externally Sourced Emails

No external emails found.

RSS Emails

Recent Hosts, Guests & Topics

Here's a quick summary of the last 5 episodes on The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence).

Hosts

Sam Charrington

Previous Guests

Josh Tobin
Josh Tobin is a member of the technical staff at OpenAI, where he focuses on developing AI agents that can think and act autonomously. He has a background in machine learning and artificial intelligence, contributing to the advancement of AI technologies and their applications in real-world scenarios. His work includes exploring the integration of AI in software development and enhancing the capabilities of AI systems through innovative approaches.
Nidhi Rastogi
Nidhi Rastogi is an assistant professor at the Rochester Institute of Technology, specializing in Cyber Threat Intelligence (CTI) and artificial intelligence applications in cybersecurity. She has been involved in research projects focusing on evaluating large language models (LLMs) for real-world CTI tasks, including the development of CTIBench, a benchmark for assessing LLMs' capabilities in threat detection and analysis. Nidhi's work emphasizes the evolution of AI in cybersecurity, the importance of benchmarks in identifying model limitations, and the future directions of her AI4Sec Research Lab, which aims to improve mitigation techniques and explainability in cybersecurity.
Kelly Hong
Kelly Hong is a researcher at Chroma, specializing in generative benchmarking and retrieval systems. She has a background in evaluating machine learning models and has contributed to the development of novel approaches for assessing the performance of embedding models in real-world applications. Kelly's work focuses on improving the accuracy of evaluations by aligning them with actual user behavior and preferences, particularly in the context of technical support systems.
Emmanuel Ameisen
Emmanuel Ameisen is a research engineer at Anthropic, where he focuses on developing mechanistic interpretability methods for large language models. He has contributed to significant research in understanding the internal workings of AI systems, particularly in the context of enhancing their safety and interpretability. Emmanuel's work involves exploring how AI models can be manipulated and understood through their neural pathways and computational graphs, making him a key figure in the field of AI research.
Maohao Shen
Maohao Shen is a PhD student at the Massachusetts Institute of Technology (MIT), specializing in machine learning and artificial intelligence. His research focuses on enhancing language model reasoning through innovative approaches such as reinforcement learning and the Chain-of-Action-Thought (COAT) methodology. Maohao has contributed to the development of Satori, a framework that enables language models to self-reflect and self-correct, improving their performance on complex reasoning tasks.

Topics Discussed

OpenAI, AI agents, Deep Research, Operator, Codex CLI, reinforcement learning, human-AI collaboration, vibe coding, Model Context Protocol, context management, trust and safety, Cyber Threat Intelligence, CTIBench, LLMs, AI in cybersecurity, Retrieval-Augmented Generation, benchmarking, threat detection, explainability, Generative Benchmarking, retrieval systems, synthetic data, MTEB, embedding models, document filtering, user behavior, Weights & Biases, technical support bot, domain-specific evaluation, LLM judges, chunking strategies, retrieval effectiveness, production queries, benchmark queries, large language models, circuit tracing, mechanistic interpretability, Claude, neural networks, poetry, mathematical calculations, neural pathways, MLPs, attention mechanisms, hallucinations, chain-of-thought explanations, AI safety, Reinforcement Learning, Satori, Chain-of-Action-Thought, self-reflection, self-correction, reasoning, model performance, trial-and-error, format tuning

YouTube Channel

The podcast has no YouTube channel.

Instagram Profile

The podcast has no Instagram profile.

Episodes

Here are the most recent episodes of The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence).

Duration: 1:07:27

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Hosts
Sam Charrington
Guests
Josh Tobin
Keywords
OpenAI, AI agents, Deep Research, Operator, Codex CLI, reinforcement learning, human-AI collaboration, vibe coding, Model Context Protocol, context management, trust and safety
Today, we're joined by Josh Tobin, a member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step tasks through reinforcement learning, and how that enables agents to more easily recover from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as with "vibe coding," the integration of tools through the Model Context Protocol (MCP), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous.
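The multi-step, recover-from-failure pattern described above can be illustrated with a minimal, generic tool-calling loop. This is a hedged sketch, not OpenAI's implementation: the call_model and run_tool stubs are hypothetical stand-ins for a reasoning model and a tool executor (e.g., something dispatched via MCP).

# Minimal sketch of a tool-using agent loop with failure recovery.
# Generic illustration only; `call_model` and `run_tool` are stubs, not real APIs.
from dataclasses import dataclass, field

@dataclass
class Step:
    tool: str
    args: dict
    result: str = ""
    ok: bool = True

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # accumulated context across steps

def call_model(state: AgentState) -> Step:
    """Stub: a real agent would ask a reasoning model to choose the next action
    based on the goal and the history of prior steps, including failures."""
    if not state.history:
        return Step(tool="search", args={"query": state.goal})
    if not state.history[-1].ok:
        # Failure recovery: retry with a revised plan instead of giving up.
        return Step(tool="search", args={"query": state.goal + " site:docs"})
    return Step(tool="finish", args={})

def run_tool(step: Step) -> Step:
    """Stub tool executor; a real system would dispatch to web browsing,
    code execution, and so on."""
    if step.tool == "search":
        step.ok = "docs" in step.args["query"]  # pretend the first attempt fails
        step.result = "results..." if step.ok else "timeout"
    return step

def run_agent(goal: str, max_steps: int = 5) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        step = call_model(state)
        if step.tool == "finish":
            break
        state.history.append(run_tool(step))
    return state

if __name__ == "__main__":
    final = run_agent("how do I configure the widget?")
    for s in final.history:
        print(s.tool, s.args, "ok" if s.ok else "failed")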



The complete show notes for this episode can be found at https://twimlai.com/go/730.
Duration: 56:18

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Hosts
Sam Charrington
Guests
Nidhi Rastogi
Keywords
Cyber Threat Intelligence, CTIBench, LLMs, AI in cybersecurity, Retrieval-Augmented Generation, benchmarking, threat detection, explainability
Today, we're joined by Nidhi Rastogi, an assistant professor at the Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world cybersecurity analyst tasks. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more.
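The RAG pattern mentioned in the description, retrieving fresh threat intelligence so an LLM's answer stays current, can be sketched roughly as below. The advisory corpus, the toy lexical retriever, and the prompt format are all hypothetical illustrations and are not part of CTIBench.

# Rough sketch of Retrieval-Augmented Generation for threat intelligence:
# retrieve recent advisories relevant to a question, then ground the LLM's
# answer in them. Illustrative only; not CTIBench or a real CTI pipeline.
from collections import Counter

ADVISORIES = [
    "CVE-2024-0001: remote code execution in ExampleServer via crafted header",
    "Phishing campaign delivering credential stealers through fake invoices",
    "Ransomware group exploiting unpatched VPN appliances for initial access",
]

def score(query: str, doc: str) -> int:
    """Toy lexical relevance: count overlapping words (a production system
    would use embeddings and a vector index instead)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list:
    return sorted(ADVISORIES, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Using only the advisories below, answer the analyst's question.\n"
        f"Advisories:\n{context}\n\nQuestion: {question}\n"
    )

if __name__ == "__main__":
    # The prompt would be sent to an LLM; printing it shows the grounding step.
    print(build_prompt("Which threats target VPN appliances for initial access?"))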

The complete show notes for this episode can be found at https://twimlai.com/go/729.
Duration: 54:17

Generative Benchmarking with Kelly Hong - #728

Hosts
Sam Charrington
Guests
Kelly Hong
Keywords
Generative Benchmarking, retrieval systems, synthetic data, MTEB, embedding models, document filtering, user behavior, Weights & Biases, technical support bot, domain-specific evaluation, LLM judges, chunking strategies, retrieval effectiveness, production queries, benchmark queries
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications.
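The two-step process described above, filtering documents and then generating user-style queries to evaluate retrieval, can be illustrated with a small sketch. The document set, the heuristic filter, the query-generation stub, and the toy retriever are all assumptions for illustration, not Chroma's implementation; in practice the filter and query generator are LLM calls and the retriever is the embedding model under evaluation.

# Sketch of the generative-benchmarking idea: (1) filter documents down to
# content users would actually ask about, (2) generate user-style queries per
# document, then check whether retrieval recovers the source document.
# Hypothetical illustration only.

DOCS = {
    "doc1": "How to resume a crashed training run from the last checkpoint.",
    "doc2": "Release notes for version 0.9.3.",
    "doc3": "Configuring experiment tracking for distributed training jobs.",
}

def is_relevant(text: str) -> bool:
    """Step 1 (document filtering): keep docs a support user might query.
    A real pipeline would use an aligned LLM judge, not this heuristic."""
    return "release notes" not in text.lower()

def generate_query(text: str) -> str:
    """Step 2 (query generation): stub for an LLM prompted to write a short,
    user-style question grounded in the document."""
    return "how do I " + text.lower().rstrip(".").split(" to ", 1)[-1] + "?"

def retrieve(query: str, docs: dict) -> str:
    """Toy retriever: return the doc with the most word overlap; in practice
    this is the embedding model under evaluation plus a vector store."""
    def overlap(doc_id):
        return len(set(query.lower().split()) & set(docs[doc_id].lower().split()))
    return max(docs, key=overlap)

if __name__ == "__main__":
    kept = {k: v for k, v in DOCS.items() if is_relevant(v)}
    hits = sum(retrieve(generate_query(text), kept) == doc_id
               for doc_id, text in kept.items())
    print(f"recall@1 on generated queries: {hits}/{len(kept)}")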

The complete show notes for this episode can be found at https://twimlai.com/go/728.
Duration: 1:34:06

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

Hosts
Sam Charrington
Guests
Emmanuel Ameisen
Keywords
large language models, circuit tracing, mechanistic interpretability, Claude, neural networks, poetry, mathematical calculations, neural pathways, MLPs, attention mechanisms, hallucinations, chain-of-thought explanations, AI safety
In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work.
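The "replacement model" idea described above, swapping a dense layer for a small set of sparse features and then intervening on individual features, can be shown with a toy numerical example. Everything below (the random weights, the feature count, the ablation) is made up for illustration and is not Anthropic's circuit-tracing method or code.

# Toy illustration of replacing a dense layer with sparse, nameable features
# and intervening by ablating one feature. Made-up numbers; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# A "dense" layer: y = relu(x @ W) @ V, whose hidden units are hard to interpret.
W = rng.normal(size=(4, 8))
V = rng.normal(size=(8, 3))

def dense_layer(x):
    return np.maximum(x @ W, 0.0) @ V

# A sparse stand-in: a few features with an encoder/decoder pair. In real
# interpretability work these are trained to approximate the layer; here they
# are random, since only the mechanics of the intervention matter.
E = rng.normal(size=(4, 5))   # encoder: input -> 5 candidate features
D = rng.normal(size=(5, 3))   # decoder: features -> layer output space

def sparse_features(x):
    return np.maximum(x @ E, 0.0)          # sparsity via ReLU

def replacement_layer(x, ablate=None):
    f = sparse_features(x)
    if ablate is not None:
        f[..., ablate] = 0.0               # intervention: silence one feature
    return f @ D

x = rng.normal(size=(1, 4))
print("dense output:      ", dense_layer(x))
print("replacement output:", replacement_layer(x))
print("feature 2 ablated: ", replacement_layer(x, ablate=2))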

The complete show notes for this episode can be found at https://twimlai.com/go/727.
Duration: 51:45

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

Hosts
Sam Charrington
Guests
Maohao Shen
Keywords
Reinforcement Learning, LLMs, Satori, Chain-of-Action-Thought, self-reflection, self-correction, reasoning, model performance, trial-and-error, format tuning
Today, we're joined by Maohao Shen, a PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research.
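To make the COAT idea concrete, a reasoning trace structured by meta-action tokens might be segmented as in the sketch below. The token names (continue, reflect, explore) come from the episode description, but the token spellings, trace format, and parsing are assumptions for illustration; Satori's actual tokenization and training details differ.

# Rough sketch of a Chain-of-Action-Thought (COAT) style reasoning trace:
# special meta-action tokens mark whether the model continues its reasoning,
# reflects on (and possibly corrects) it, or explores an alternative.
# Token spellings and trace format are assumed; illustrative only.

ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")

TRACE = (
    "<|continue|> 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408 "
    "<|reflect|> check: 408 / 24 = 17, so the product is consistent "
    "<|explore|> alternative: 17 * 24 = (20 - 3) * 24 = 480 - 72 = 408"
)

def segment(trace: str):
    """Split a generated trace into (action, text) steps so the reasoning
    behavior (continue / self-reflect / explore) can be inspected or scored."""
    steps, current, buf = [], None, []
    for token in trace.split():
        if token in ACTIONS:
            if current is not None:
                steps.append((current, " ".join(buf)))
            current, buf = token, []
        else:
            buf.append(token)
    if current is not None:
        steps.append((current, " ".join(buf)))
    return steps

if __name__ == "__main__":
    for action, text in segment(TRACE):
        print(f"{action:13s} {text}")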

The complete show notes for this episode can be found at https://twimlai.com/go/726.

Ratings

Global: 4.7 rating, 563 reviews
USA: 4.7 rating, 413 reviews
UK: 4.8 rating, 49 reviews
Canada: 4.8 rating, 45 reviews
Australia: 4.7 rating, 40 reviews
New Zealand: 5.0 rating, 6 reviews
Singapore: 5.0 rating, 4 reviews
Ireland: 5.0 rating, 3 reviews
South Africa: 4.0 rating, 3 reviews