Conversational Agents

27 feb 2024

Artificial Intelligence

4 Minutes

Most of us have talked to a chatbot or virtual assistant, whether on our phones, on a smart speaker, or through a terrible automated customer service system. The usual flow is that you ask it to do something for you, and it either doesn’t understand you or cuts you off halfway.

Well, the good news is that this is all about to change.

Thanks to the new AI companies, which are spending tens of billions on NVIDIA GPUs, these systems are no longer bad.

In this post, you’ll learn about some of the players building the next generation of Conversational Agents. You'll also learn how you might encounter them in your day.

Also, you might pick up a couple of tips on how to create your own small army of smart digital agents. They'll help you do whatever you want.

Before we tuck in, watch this short demo from Retell to frame this entire conversation.

Now that your brain has exploded, let's start with the basics.

What is a Conversational Agent?

In simple terms, a Conversational Agent is a computer program. It uses artificial intelligence to simulate human-like conversations. It does this through text or speech-based interactions.

For example, instead of you calling a restaurant to book a table, a Conversational Agent would do it for you. It would act as a digital assistant on your phone or Echo. It would help you with tasks like scheduling appointments or ordering groceries.

It either listens for a command all the time, Siri or Alexa style, or is triggered by a key press or tap. You would then ask your agent to do something, by typing or speaking. It would respond like a human and do the task.

Before you ask, yes you can give it any voice you want — Morgan Freeman is definitely on the table here.

Taking this one step further, a Conversational Agent could also be a salesperson, customer service rep, consultant, or mental health companion. It could even be an all-knowing assistant like JARVIS from Iron Man.

These use cases are great. The agents connect directly to models such as OpenAI's GPT-4, which puts a vast amount of human knowledge at their digital fingertips. They are only restricted by how you set them up.

Want to make a digital property investment advisor that works 24/7 and has thousands of 1-on-1 conversations at once?

Sure, why not?

This may seem far-fetched to those learning about these for the first time. Historically, few things were more annoying than speaking with a bad bot. It was super annoying, bordering on anger-inducing. However, there were good technical reasons for this, some of which have only recently been solved.

Take latency, for example. The delay between the user finishing their sentence and getting a response back was too high. Next, the human element. Digital voices weren’t that good. They lacked emotion, and we instantly knew we were talking to a bot. A dumb one at that. And finally, as mentioned before, the lack of context and understanding was a big issue. The bot often had only a few account details on record to work with.

This has now all changed.

Where are we now?

Large Language Models give us easy access to human knowledge through an API. With teraflops of processing power available, latency is no longer a problem. Natural language processing has improved, and new machine learning techniques make Conversational Agents more human-like by understanding and responding to user requests contextually.

Add to the mix Eleven Labs' high-quality digital synthetic voices. They can convey emotion and intonation. This gives the Conversational Agents a more natural feel. This mix is a recipe for one of the most transformative technologies of our generation.

Next up we have something called a context window.

A context window is the information a Conversational Agent remembers while working. It's like the agent's short-term memory. This helps the agent understand and answer many questions from the user quickly.
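To make the short-term memory idea concrete, here is a minimal sketch of a context window as a fixed-size buffer that forgets the oldest messages first. The `ContextWindow` class and its `max_words` budget are hypothetical names made up for illustration. Real models measure their window in tokens rather than words, so treat the word count as a rough stand-in.

```python
from collections import deque


class ContextWindow:
    """Toy short-term memory: keeps only the newest messages that fit a budget.

    Assumption: the budget is counted in whitespace-separated words,
    a rough stand-in for the tokens a real model would count.
    """

    def __init__(self, max_words: int):
        self.max_words = max_words
        self.messages = deque()  # (role, text) pairs, oldest first

    def _size(self) -> int:
        # Total number of words currently held in the window.
        return sum(len(text.split()) for _, text in self.messages)

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        # Forget the oldest messages until we are back within budget,
        # but always keep the most recent message.
        while self._size() > self.max_words and len(self.messages) > 1:
            self.messages.popleft()

    def as_prompt(self) -> str:
        # What the agent would actually "see" on its next turn.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


window = ContextWindow(max_words=12)
window.add("user", "Book a table for two at seven tonight")
window.add("agent", "Done, table for two at 7pm")
window.add("user", "Make it three people")
print(window.as_prompt())
```

With a budget this small, the original booking request has already fallen out of the window by the third message. That is the kind of forgetting a bigger window avoids.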

Until recently this was a problem. But a mix of approaches, including bigger context windows and RAG (retrieval-augmented generation, where the agent looks up relevant information and adds it to the prompt), has largely solved it, making these agents very powerful.

For example, compare an agent that can only recall one past conversation at a time with an agent that remembers every conversation with a user and can recall any of them at any moment.

No contest.
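Here is a rough sketch of how that kind of recall can work, in the spirit of RAG: store past conversations, find the one most similar to what the user just said, and put it in front of the model. Everything here is an assumption for illustration. The function names are made up, and the word-overlap scoring is a toy replacement for the embeddings and vector databases a production system would more likely use.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase the text and count each word (toy stand-in for an embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: list[str], top_k: int = 1) -> list[str]:
    # Rank stored conversations by similarity to the query, best first.
    q = bag_of_words(query)
    ranked = sorted(memories, key=lambda m: cosine(q, bag_of_words(m)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, memories: list[str]) -> str:
    # Prepend the recalled conversations to the user's new message.
    notes = "\n".join(f"- {m}" for m in retrieve(query, memories))
    return f"Relevant past conversations:\n{notes}\n\nUser: {query}"


past_conversations = [
    "User asked to book a table at an Italian restaurant for Friday",
    "User ordered groceries: milk, eggs and bread",
    "User scheduled a dentist appointment for Monday morning",
]
print(build_prompt("Can you order the same groceries again?", past_conversations))
```

The point of the design is that the agent does not need every past conversation in its context window at once. It only pulls in the ones that look relevant to the current request.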

Add GPT-4 or a similar model as a base layer and you get a powerful Conversational Agent. This agent can perform various roles more effectively than a human. It can handle multiple tasks simultaneously and maintain consistency.

Emotional understanding

This goes into a more controversial area. Technology can understand emotions in both directions between a user and a Conversational Agent. These agents can sense and react to user emotions and show emotional understanding in their responses. This makes them more human-like and relatable.

Typically this tech has been mostly limited to voice, but startups like Hume.ai have brought it into facial recognition as well.

Humans have a great ability to pick up on micro-expressions when talking with another human. Historically, AIs did not. This has also now changed.

As Conversational Agents roam the internet and weave into our lives, they will not only be more effective but will adapt to human emotion too.

Some uses of AI that reads human emotion, such as in workplaces and schools, are restricted under the EU's AI Act. So, if you’re building something here, note that. But Conversational Agents are here to stay.

Now onto some companies you should look at in this space.

Companies to watch

Here's a list of some of the companies that might be worth watching in this space.

  • Retell - Conversational Voice API for Your LLM

  • Sindarin - Deploy lifelike conversational speech AI in your business.

  • Elto - Elto is a highly authentic, low-latency live voice AI that can make phone calls up to 1 hour long and execute follow-up tasks.

  • Bland - Bland is the infrastructure for building AI phone calling applications at scale.

  • Vee - An intelligent consultant people enjoy talking to.

So, what will you build?

If you’re starting a company and have an interesting application or technology in this space, we’d love to talk to you at Interface. We invest in pre-seed and seed stage startups.

Drop me an email at tom@interfacecap.com.

Or, if you’re just exploring the world of AI and want to learn through immersion and a peer group, join us inside Shiny here.