Descriptions:
Stephanie Nyarko builds a fully functional WhatsApp AI voice agent from scratch using n8n, Twilio, and ElevenLabs — a system where any incoming message, whether typed text or a voice note, triggers an AI-generated reply delivered back as a human-sounding audio message.
The tutorial covers the complete integration stack and explains the reasoning behind each component. Twilio acts as the WhatsApp gateway, forwarding incoming messages to an n8n webhook via POST. The workflow branches on message type: voice notes are downloaded from the Twilio media endpoint and validated with a file-size heuristic (the file must be larger than 1,000 bytes), which filters out placeholder payloads and failed downloads before any ElevenLabs API credits are spent on transcription. Transcribed audio and plain text messages both flow into an AI agent node, and the agent’s reply is then converted to speech via ElevenLabs TTS and sent back through Twilio as a WhatsApp voice note.
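The branching and guard logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the n8n template from the video: the function names are hypothetical, the payload keys (`NumMedia`, `MediaContentType0`, `Body`) are assumed to follow Twilio's incoming-message webhook parameters, and the 1,000-byte threshold is the one quoted in the description.

```python
from typing import Mapping, Optional

# Threshold quoted in the description; files at or below this size are
# treated as placeholders or failed downloads.
MIN_AUDIO_BYTES = 1000


def classify_message(form: Mapping[str, str]) -> str:
    """Return 'voice', 'text', or 'ignore' for a webhook form payload.

    Hypothetical helper: assumes Twilio-style form fields.
    """
    num_media = int(form.get("NumMedia", "0") or 0)
    content_type = form.get("MediaContentType0", "")
    if num_media > 0 and content_type.startswith("audio/"):
        return "voice"
    if form.get("Body", "").strip():
        return "text"
    return "ignore"


def is_usable_audio(data: Optional[bytes]) -> bool:
    """Guard applied before transcription: keep only downloads > 1,000 bytes."""
    return data is not None and len(data) > MIN_AUDIO_BYTES


if __name__ == "__main__":
    payload = {"NumMedia": "1", "MediaContentType0": "audio/ogg"}
    print(classify_message(payload))
    print(is_usable_audio(b"\x00" * 2048))
```

In a flow like this, only messages classified as voice and passing the size guard would be sent for transcription, while text messages would go straight to the agent step.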
Nyarko highlights a key business use case: with ElevenLabs voice cloning, the outbound voice note can sound like the business owner, creating a personalized automated response experience. The video walks through setting up Twilio and ElevenLabs credentials in n8n, configuring the Twilio sandbox for testing, and the specific guard logic that keeps the workflow from failing on ambiguous media. The complete n8n template is available for free through Nyarko’s School community.
📺 Source: Stephanie Nyarko · Published January 07, 2026
🏷️ Format: Hands On Build