Production Voice AI — HoyMismo AutoTransport
VocalisAI · Gemini Live API · ElevenLabs · Twilio · FastAPI · Firebase · WhatsApp API

Voice AI for Vehicle Import: Automating 500+ Monthly Calls

Vehicle import agencies in Mexico receive hundreds of repetitive phone inquiries every week. This is the story of how a VocalisAI-powered phone agent — speaking fluent Mexican Spanish, understanding vehicle specs, checking import status by folio number, and qualifying leads — eliminated 80% of routine call handling and made the agency available 24 hours a day.

  • 80%: Queries Automated (No Human Needed)
  • 24/7: Availability (vs 9am–6pm before)
  • 500+: Calls Handled Monthly
  • <2 min: Response Time (vs 20+ min holds)

The Problem: Repetitive Calls Drowning a Specialist Team

HoyMismo AutoTransport handles vehicle imports from the USA to Mexico — a process that involves customs valuations, required documents, tariff calculations, and coordination with multiple authorities. The agency received hundreds of calls per week, but the vast majority were asking the same 4 questions: How much does it cost? What documents do I need? What is the status of my import? How long does it take? Answering these consumed 4–6 hours of specialist staff time daily — time that could have been spent on complex cases that actually required human expertise.

The 4 Most Common Queries

  1. "How much does it cost to import a [year] [make] [model]?"
  2. "What documents do I need to bring?"
  3. "What is the status of my import? My folio is XXXX-XXXX"
  4. "What are the customs requirements for my specific vehicle?"

Before the Voice Agent

  • 4–6 hours/day of specialist time on routine queries
  • 20+ minute hold times during peak hours
  • Zero availability outside 9am–6pm Monday–Friday
  • 40% of calls arrived after hours, and every one of those leads was lost
  • Inconsistent answers depending on which staff member answered

The Solution: A Voice AI Agent Built for Mexican Vehicle Import

The agent was designed from the ground up for Mexican vehicle imports from the USA; it is not a generic customer service bot. It speaks professional Mexican Spanish, understands vehicle year/make/model combinations, knows the customs regulations for different vehicle types and import corridors (Laredo, Tijuana, Nogales), and integrates with the agency's Firebase backend to check live folio status.

Natural Mexican Spanish

Speech is generated with ElevenLabs voice synthesis using a professional Mexican Spanish voice. Currency amounts are spoken as words ("quince mil pesos", not "15,000"), and vehicle terminology and customs jargon are pronounced correctly. The agent handles interruptions, topic changes, and conversational repairs much as a trained human receptionist would.

Accurate Import Quotes

The agent collects vehicle year, make, and model, then queries the pricing database to give an accurate import cost estimate based on current customs valuations. It explains what the cost includes: customs duties, agency fees, paperwork, and plates — no hidden surprises.
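
As a rough illustration of the lookup described above, the sketch below keys a pricing table by year, make, and model and returns an itemized estimate. The table contents, the `QuoteBreakdown` type, and `estimate_import_cost` are hypothetical names introduced here, and the amounts are placeholders rather than real customs valuations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuoteBreakdown:
    """Itemized estimate in MXN, mirroring the four components the agent explains."""
    customs_duties: int
    agency_fee: int
    paperwork: int
    plates: int

    @property
    def total(self) -> int:
        return self.customs_duties + self.agency_fee + self.paperwork + self.plates


# Hypothetical pricing table; every amount here is a placeholder.
PRICING_TABLE = {
    (2015, "ford", "f-150"): QuoteBreakdown(30000, 8000, 2500, 1500),
}


def estimate_import_cost(year: int, make: str, model: str) -> Optional[QuoteBreakdown]:
    """Return the itemized estimate, or None when the vehicle is not in the table."""
    key = (year, make.strip().lower(), model.strip().lower())
    return PRICING_TABLE.get(key)
```

Returning `None` for an unknown vehicle lets the conversation layer decide whether to ask a clarifying question or hand the call to a specialist.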

Document Requirements by Vehicle Type

Rather than reading a generic list, the agent tailors the document checklist based on the caller's specific vehicle type, origin state, and customs corridor. It can explain each document's purpose, where to obtain it, and what happens if it's missing.
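
One way to tailor a checklist like this is to merge a base list with per-vehicle-type and per-corridor additions. The sketch below assumes that structure; all document names are placeholders invented for illustration (not actual customs requirements), and origin state is omitted for brevity.

```python
# Placeholder document names; these are NOT real customs requirements.
BASE_DOCUMENTS = ["official_id", "vehicle_title"]
BY_VEHICLE_TYPE = {"pickup": ["placeholder_pickup_form"]}
BY_CORRIDOR = {
    "laredo": ["placeholder_laredo_form"],
    "tijuana": ["placeholder_tijuana_form"],
}


def build_checklist(vehicle_type: str, corridor: str) -> list:
    """Merge base, vehicle-type, and corridor documents, keeping first-seen order."""
    items = [
        *BASE_DOCUMENTS,
        *BY_VEHICLE_TYPE.get(vehicle_type.lower(), []),
        *BY_CORRIDOR.get(corridor.lower(), []),
    ]
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(items))
```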

Live Status Lookup by Folio

Callers can say their folio number and the agent queries Firebase in real time to report the current stage of their import process. It explains what each stage means and provides a realistic timeline for the next steps — reducing follow-up "where is my vehicle?" calls by 60%.
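
A spoken folio has to be normalized before it can be used as a lookup key. The sketch below assumes folios are eight digits written as XXXX-XXXX and that the speech transcript already contains digits; the in-memory dictionary stands in for the Firestore collection, and the folio and stage name in it are made up.

```python
import re
from typing import Optional

# In-memory stand-in for the Firestore collection; contents are invented.
FAKE_FOLIO_STORE = {"1234-5678": "customs_review"}


def normalize_folio(transcript: str) -> Optional[str]:
    """Extract an eight-digit folio from a transcript and format it as XXXX-XXXX."""
    digits = re.sub(r"\D", "", transcript)
    if len(digits) != 8:
        return None  # too few or too many digits: ask the caller to repeat
    return f"{digits[:4]}-{digits[4:]}"


def lookup_stage(transcript: str, store: dict = FAKE_FOLIO_STORE) -> Optional[str]:
    """Return the current stage for the folio in the transcript, if any."""
    folio = normalize_folio(transcript)
    if folio is None:
        return None
    return store.get(folio)
```

Rejecting anything other than exactly eight digits is deliberately strict: a transcript that also mentions a model year would fail the check, which prompts the agent to ask for the folio again instead of guessing.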

Lead Qualification + CRM

For new prospects, the agent qualifies key information: vehicle type, current location, urgency, budget awareness, and preferred customs corridor. All data is saved to Firebase and flagged for the sales team with a lead score — so specialists focus only on qualified, ready-to-buy prospects.
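
A lead score of this kind can be as simple as a weighted sum over the captured fields. The sketch below is an assumption about how such a score might be computed; the `Lead` fields follow the list above, but the weights and the 0 to 100 scale are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; the real scoring rules are not described in this case study.
URGENCY_POINTS = {"low": 0, "medium": 10, "high": 20}


@dataclass
class Lead:
    vehicle_type: Optional[str] = None
    current_location: Optional[str] = None
    corridor: Optional[str] = None
    urgency: str = "low"
    budget_aware: bool = False


def score_lead(lead: Lead) -> int:
    """Score a lead from 0 to 100 based on how much qualifying data was captured."""
    score = 0
    for value in (lead.vehicle_type, lead.current_location, lead.corridor):
        if value:
            score += 20  # each captured field adds 20 points
    if lead.budget_aware:
        score += 20
    score += URGENCY_POINTS.get(lead.urgency, 0)
    return score
```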

WhatsApp Follow-Up

After a call, the agent sends a WhatsApp summary to the caller: quote estimate, document checklist, or status update — depending on what was discussed. This drives a 40% improvement in lead conversion since prospects have the information in writing.
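
The topic-dependent summary can be sketched as a small message builder. The function name, the topic labels, and the `details` keys below are hypothetical; the templates are in English for readability, whereas a production version would presumably be in Spanish, and the actual send through the WhatsApp Business API is left out.

```python
def build_summary(topic: str, details: dict) -> str:
    """Build the post-call summary text for the topic that was discussed."""
    if topic == "quote":
        return (
            f"Estimated import cost for {details['vehicle']}: "
            f"${details['total_mxn']:,} MXN."
        )
    if topic == "documents":
        lines = "\n".join(f"- {doc}" for doc in details["documents"])
        return f"Documents to bring:\n{lines}"
    if topic == "status":
        return f"Folio {details['folio']} is currently at stage: {details['stage']}."
    raise ValueError(f"unknown call topic: {topic}")
```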

Technology Stack

Voice Framework: VocalisAI (conversation orchestration)
LLM: Gemini Live API (real-time streaming inference)
Voice Synthesis: ElevenLabs (Mexican Spanish, professional female voice)
Telephony: Twilio Programmable Voice (inbound call handling)
Backend: FastAPI (WebSocket for real-time audio streaming)
Database: Firebase Firestore (folio lookup, lead storage)
Messaging: WhatsApp Business API (post-call summaries)
Analytics: Firebase Analytics + custom dashboard (call metrics)
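
To make the telephony-to-backend hop concrete, the sketch below assumes the call audio arrives over Twilio Media Streams, which delivers JSON WebSocket messages whose `media` events carry base64-encoded audio. The case study does not confirm that this is the exact mechanism used, and `extract_audio_chunk` is a hypothetical helper; in a FastAPI WebSocket handler it would be called on each text frame received.

```python
import base64
import json
from typing import Optional


def extract_audio_chunk(raw_message: str) -> Optional[bytes]:
    """Return decoded audio bytes for a 'media' event, or None for other events.

    Assumes the Twilio Media Streams message shape:
    {"event": "media", "media": {"payload": "<base64 audio>"}, ...}
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None  # e.g. "connected", "start", "stop"
    return base64.b64decode(message["media"]["payload"])
```

The decoded bytes would still need converting to whatever audio format the speech model expects before being forwarded.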

Implementation in 5 Weeks

Week 1: Discovery & Conversation Design
  • Analyzed 50+ real call recordings to identify every conversation pattern and edge case
  • Mapped the full conversation flow: 12 main intents, 40+ sub-branches
  • Defined agent personality: professional, warm, knowledgeable — not robotic
  • Built the vehicle-to-pricing lookup table from customs valuation data
Weeks 2–3: Core Development
  • VocalisAI + Gemini Live API integration with bidirectional audio streaming
  • ElevenLabs voice tuning: pronunciation rules for currency, vehicle models, customs terms
  • Firebase integration for folio status lookup and lead data storage
  • WhatsApp Business API for post-call summaries and document checklists
Week 4: Testing & Optimization
  • 100+ test calls covering all identified patterns and edge cases
  • Latency optimization: reduced the agent's per-turn response latency from 3.2 s to under 1.5 s
  • Prompt refinement to reduce misinterpretation rate below 3%
  • Fallback logic for complex queries that require human transfer
Week 5: Gradual Rollout & Calibration
  • Launched at 20% traffic — monitored call quality scores and escalation rate
  • Expanded to 100% after 3 days with satisfactory metrics
  • Built real-time analytics dashboard for call volume, topics, and satisfaction
  • Two weeks of post-launch refinement based on live call data
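
A percentage rollout like the 20% launch above is often implemented with deterministic hashing, so a given caller is always routed the same way. The sketch below shows that approach as an assumption; the case study does not say how traffic was actually split, and `routes_to_agent` is a hypothetical name.

```python
import hashlib


def routes_to_agent(caller_id: str, rollout_percent: int) -> bool:
    """Deterministically route a share of callers to the voice agent.

    The caller ID is hashed into a bucket from 0 to 99; callers whose bucket
    falls below rollout_percent go to the agent, the rest to the human queue.
    """
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Because the bucket depends only on the caller ID, raising `rollout_percent` from 20 to 100 keeps every caller who was already on the agent there and only adds new ones.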

Business Impact & ROI

Quantitative Results

  • 80% of routine queries resolved without human intervention
  • Response time: 20+ min hold → under 2 minutes
  • Availability: 9am–6pm → 24/7/365
  • Lead qualification rate: +40% (structured data capture)
  • After-hours leads captured: +60% (previously 0)
  • Payback: investment recovered in under 2 months

Team Impact

  • Specialists redirected from routine calls to complex cases
  • Call-handling stress reduced — team focuses on high-value consultations
  • Consistent information delivery — zero variation between agents
  • Sales pipeline: leads arrive pre-qualified with structured data, not as cold calls
"Harry didn't just build a voice agent — he transformed how we operate. We now capture leads while we sleep, and the team can focus on what actually generates revenue: closing sales. The agent knows our business better than most new hires."
JP, Operations Director, HoyMismo AutoTransport

Key Technical Learnings

Pronunciation engineering matters more than the LLM

Three days were spent exclusively on how the agent pronounces currency amounts, vehicle year ranges, and customs terminology in Mexican Spanish. "$15,000 USD" needs to sound like "quince mil dólares" — not a robotic digit-by-digit reading. Getting this right is what separates "sounds like AI" from "sounds professional."
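
One common way to get this behavior is to spell amounts out as words before the text reaches the speech synthesizer. The sketch below illustrates that idea for whole amounts under one million; it is an assumption about the approach, not the project's actual pronunciation rules, and the function names are hypothetical.

```python
# Spanish number words for 0-29, which are irregular or written as one word.
UNITS = [
    "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
    "nueve", "diez", "once", "doce", "trece", "catorce", "quince", "dieciséis",
    "diecisiete", "dieciocho", "diecinueve", "veinte", "veintiuno", "veintidós",
    "veintitrés", "veinticuatro", "veinticinco", "veintiséis", "veintisiete",
    "veintiocho", "veintinueve",
]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}
HUNDREDS = {1: "ciento", 2: "doscientos", 3: "trescientos", 4: "cuatrocientos",
            5: "quinientos", 6: "seiscientos", 7: "setecientos",
            8: "ochocientos", 9: "novecientos"}
CURRENCIES = {"MXN": ("peso", "pesos"), "USD": ("dólar", "dólares")}


def _below_thousand(n: int) -> str:
    """Spell out 1-999 (returns an empty string for 0)."""
    if n == 100:
        return "cien"
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if 0 < rest < 30:
        parts.append(UNITS[rest])
    elif rest >= 30:
        tens, unit = divmod(rest, 10)
        parts.append(TENS[tens] + (f" y {UNITS[unit]}" if unit else ""))
    return " ".join(parts)


def _apocopate(words: str) -> str:
    """Shorten a trailing 'uno' to 'un' as required before 'mil' or a noun."""
    if words.endswith("veintiuno"):
        return words[: -len("veintiuno")] + "veintiún"
    if words.endswith("uno"):
        return words[:-1]
    return words


def number_to_spanish(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in Spanish."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 to 999,999 is supported in this sketch")
    if n == 0:
        return "cero"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands == 1:
        parts.append("mil")
    elif thousands > 1:
        parts.append(_apocopate(_below_thousand(thousands)) + " mil")
    if rest:
        parts.append(_below_thousand(rest))
    return " ".join(parts)


def speak_amount(amount: int, currency: str) -> str:
    """Render an amount the way it should be spoken, e.g. for a TTS prompt."""
    singular, plural = CURRENCIES[currency]
    words = _apocopate(number_to_spanish(amount))
    return f"{words} {singular if amount == 1 else plural}"
```

For example, `speak_amount(15000, "USD")` produces "quince mil dólares", the phrasing called for above.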

Domain specificity is the agent's competitive advantage

Generic voice AI would fail here. The agent needed to know that importing through Laredo requires different paperwork than importing through Tijuana, that Tsuru models have different valuation rules than pickup trucks, and that a "pedimento" is a specific customs document, not a generic word for "request". This domain knowledge is what makes the agent genuinely useful rather than merely functional.

Edge cases are 80% of the engineering work

The happy path — caller asks A, agent answers B — takes 20% of development time. The other 80% covers: interruptions mid-sentence, topic pivots mid-conversation, unclear vehicle descriptions, callers who want to negotiate on the call, callers who speak too fast or with regional accents, and graceful escalation to a human when confidence is low.
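
The "graceful escalation when confidence is low" rule can be sketched as a small stateful policy. Everything below is an assumption made for illustration: the intent labels, the 0.6 threshold, the two-turn limit, and the choice to always hand negotiation requests to a human.

```python
# Hypothetical intents that always go to a human, regardless of confidence.
ALWAYS_ESCALATE = {"negotiate_price", "request_human"}


class EscalationPolicy:
    """Decide, turn by turn, whether the call should be transferred to a human."""

    def __init__(self, min_confidence: float = 0.6, max_low_turns: int = 2):
        self.min_confidence = min_confidence
        self.max_low_turns = max_low_turns
        self._low_turns = 0  # consecutive low-confidence turns seen so far

    def observe(self, intent: str, confidence: float) -> bool:
        """Record one caller turn; return True when the call should escalate."""
        if intent in ALWAYS_ESCALATE:
            return True
        if confidence < self.min_confidence:
            self._low_turns += 1
        else:
            self._low_turns = 0  # one clear turn resets the streak
        return self._low_turns >= self.max_low_turns
```

Requiring consecutive low-confidence turns, rather than a single one, gives the agent a chance to ask the caller to repeat before transferring.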

Ready to Automate Your Inbound Calls?

If your business receives repetitive phone calls — import agencies, real estate, logistics, legal intake — a domain-specific voice agent can be deployed in 4–6 weeks with measurable ROI from the first month.
