
Using voice commands in .NET MAUI

This post continues our post on the hackathon topic. It explains how voice commands were implemented technically in .NET MAUI, which challenges the development team faced, and how they solved them.


After discussing the ideas presented by each member of our team, we chose to create a voice assistant (hereinafter referred to as the assistant) that lets users record, in natural language, what they ate and drank and in what amounts.


In this article, I would like to share how the development team worked and which steps we took to develop the application's business logic and user interface.

How the program works

First of all, we needed to create a conceptual model to better understand how the program should work. In general, it should look like the following:

  1. The assistant asks the user a question.
  2. The user answers the assistant in ordinary speech.
  3. The user's spoken answer is converted into text and analyzed.
  4. Based on the analysis, the assistant formulates a spoken response.
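The four steps of this conceptual model form one conversational turn. The sketch below is not the team's code but a minimal, platform-independent illustration; the class AssistantTurn and the delegates speak, listen and analyze are hypothetical placeholders for the text-to-speech, speech-to-text and GPT calls discussed later in this post.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction of one turn of the conceptual model (steps 1-4).
public sealed class AssistantTurn
{
    private readonly Func<string, Task> _speak;           // text-to-speech
    private readonly Func<Task<string>> _listen;          // speech-to-text
    private readonly Func<string, Task<string>> _analyze; // GPT-based analysis, returns the reply text

    public AssistantTurn(Func<string, Task> speak, Func<Task<string>> listen, Func<string, Task<string>> analyze)
    {
        _speak = speak;
        _listen = listen;
        _analyze = analyze;
    }

    // Runs the four steps once and returns the assistant's spoken reply.
    public async Task<string> RunAsync(string question)
    {
        await _speak(question);                  // 1. the assistant asks a question
        string userText = await _listen();       // 2. the user answers; 3. the voice becomes text
        string reply = await _analyze(userText); // 3. the text is analyzed
        await _speak(reply);                     // 4. the assistant responds by voice
        return reply;
    }
}
```

Keeping the three services behind delegates makes the dialog logic testable without a microphone or a network connection.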

Based on this model, we created a flowchart that describes the program's algorithm in detail:

[Figure: flowchart of the program's algorithm]

As you can see from the flowchart, communication with the assistant can be divided into three stages:

  1. The user describes in their own words what they ate or drank and how much.
  2. If the user did not state the amount of an ingredient, the assistant asks a clarifying question, and the user states the amount for that ingredient.
  3. Once every ingredient has an amount, the assistant asks whether the user had another meal.
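These three stages can be modeled as a small state machine. This is a hedged sketch, assuming that an amount the user did not mention is represented as null; the names DialogStage, PendingIngredient and DialogFlow are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// The dialog stages from the flowchart, plus a terminal state.
public enum DialogStage
{
    DescribeMeal,   // 1. the user freely describes what they ate or drank
    AskAmount,      // 2. the assistant asks for a missing amount
    AskAnotherMeal, // 3. the assistant asks whether there was another meal
    Finished
}

// Hypothetical minimal ingredient: Amount is null when the user did not state it.
public sealed record PendingIngredient(string Name, double? Amount);

public static class DialogFlow
{
    // Decides which stage follows, given the ingredients collected so far.
    public static DialogStage Next(DialogStage current, IReadOnlyList<PendingIngredient> ingredients, bool anotherMeal = false) =>
        current switch
        {
            // Stay in (or enter) the clarification stage while any amount is missing.
            DialogStage.DescribeMeal or DialogStage.AskAmount =>
                ingredients.Any(i => i.Amount is null) ? DialogStage.AskAmount : DialogStage.AskAnotherMeal,
            DialogStage.AskAnotherMeal => anotherMeal ? DialogStage.DescribeMeal : DialogStage.Finished,
            _ => DialogStage.Finished
        };
}
```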

Setting up the assistant and creating the data model

The main difficulty in building this kind of application is that we need to analyze the user's text and turn it into data model objects. At the same time, we should not restrict how (in what format) users list their meals: everything should happen in a conversational format that feels natural to them. AI is ideal for this purpose. The first step was to initialize the GPT chat client, which we chose for this task.
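The post does not say which .NET library was used to talk to GPT, so the following sketch works at the HTTP level instead. It assumes the OpenAI Chat Completions endpoint (https://api.openai.com/v1/chat/completions) with a bearer API key; the model name gpt-3.5-turbo and the names ChatMessage, ChatRequest and GptChatClientFactory are assumptions. It only builds the request, to show where the assistant configuration (a system message) and the user's text go.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// One message in the chat history ("system", "user" or "assistant").
public sealed record ChatMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content);

// Request body for the Chat Completions endpoint.
public sealed record ChatRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] IReadOnlyList<ChatMessage> Messages,
    [property: JsonPropertyName("temperature")] double Temperature);

public static class GptChatClientFactory
{
    private static readonly Uri Endpoint = new("https://api.openai.com/v1/chat/completions");

    // Builds (but does not send) a request; the default model name is an assumption.
    public static HttpRequestMessage CreateRequest(string apiKey, string assistantRequest, string userText, string model = "gpt-3.5-turbo")
    {
        var body = new ChatRequest(
            model,
            new[]
            {
                new ChatMessage("system", assistantRequest), // configures the assistant's behaviour
                new ChatMessage("user", userText)            // what the user actually said
            },
            Temperature: 0); // low temperature favours stable, parseable JSON answers

        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(JsonSerializer.Serialize(body), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

Sending the request is then a single HttpClient.SendAsync call; the answer text is found in the response's choices array.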

We then created the data model.

The result of the conversation with the assistant is an array of meals, each containing ingredients with a name, amount, category, and unit of measurement.
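A possible shape for this data model, assuming System.Text.Json and the JSON property names shown here (both are assumptions, not the project's actual classes). Making Amount nullable gives a natural way to find the ingredients that still need a clarifying question; MealModel is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class Ingredient
{
    [JsonPropertyName("name")] public string Name { get; set; } = "";
    // Null when the user did not mention an amount, which triggers a clarifying question.
    [JsonPropertyName("amount")] public double? Amount { get; set; }
    [JsonPropertyName("category")] public string Category { get; set; } = "";       // e.g. "food" or "drink"
    [JsonPropertyName("measurement")] public string Measurement { get; set; } = ""; // e.g. "g", "ml", "piece"
}

public sealed class Meal
{
    [JsonPropertyName("ingredients")] public List<Ingredient> Ingredients { get; set; } = new();
}

public static class MealModel
{
    // Deserializes the JSON array of meals returned by the assistant.
    public static List<Meal> FromJson(string json) =>
        JsonSerializer.Deserialize<List<Meal>>(json) ?? new List<Meal>();

    // Ingredients that still need an amount (stage 2 of the dialog).
    public static IEnumerable<Ingredient> MissingAmounts(IEnumerable<Meal> meals) =>
        meals.SelectMany(m => m.Ingredients).Where(i => i.Amount is null);
}
```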

Parsing plain text into the data model

Next came probably the most interesting stage, which required configuring the GPT chat client correctly. From the flowchart, we can see that we need three methods:

  1. public async Task ParseMeal(string userText)
  2. public async Task GetAmount(string userText, string name)
  3. public async Task IsEnd(string userText)

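For example, the ParseMeal method sends the user's text to the chat client together with an assistant request and deserializes the JSON response into meal objects.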
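The original ParseMeal body is not reproduced here. As a minimal sketch, assuming the chat call is injected as a delegate (sendToChat, hypothetical) and the model answers with a JSON array of meals, the method could look like this. The prompt wording, the ExtractJsonArray helper and the return type Task<List<Meal>> are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Minimal copies of the data model so that this sketch compiles on its own.
public sealed record Ingredient(string Name, double? Amount, string Category, string Measurement);
public sealed record Meal(List<Ingredient> Ingredients);

public sealed class MealParser
{
    // Web defaults: camelCase JSON names and case-insensitive matching.
    private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);

    // Hypothetical chat call: (assistantRequest, userText) -> raw text of the model's answer.
    private readonly Func<string, string, Task<string>> _sendToChat;

    public MealParser(Func<string, string, Task<string>> sendToChat) => _sendToChat = sendToChat;

    public async Task<List<Meal>> ParseMeal(string userText)
    {
        // Illustrative prompt only; the project's real assistantRequest is not shown in the post.
        string assistantRequest =
            "Extract the meals from the user's text and answer ONLY with a JSON array like " +
            "[{\"ingredients\":[{\"name\":\"apple\",\"amount\":1,\"category\":\"food\",\"measurement\":\"piece\"}]}]. " +
            "Use null for amounts the user did not mention.";

        string answer = await _sendToChat(assistantRequest, userText);
        return JsonSerializer.Deserialize<List<Meal>>(ExtractJsonArray(answer), JsonOptions) ?? new List<Meal>();
    }

    // Models sometimes wrap JSON in explanatory text; keep only the outermost [...] part.
    private static string ExtractJsonArray(string answer)
    {
        int start = answer.IndexOf('[');
        int end = answer.LastIndexOf(']');
        return start >= 0 && end > start ? answer[start..(end + 1)] : "[]";
    }
}
```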

The most interesting part is the value of the assistantRequest variable, which essentially configures the chat client. Each method uses its own client configuration:

  1. ParseMeal
  2. GetAmount
  3. IsEnd
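The concrete assistant requests used in the project are not shown. The following hypothetical prompts illustrate the idea for all three methods: each one states exactly which JSON the model must return. The wording, the JSON field names (amount, isEnd) and the AssistantRequests class are assumptions.

```csharp
public static class AssistantRequests
{
    // Hypothetical prompt for ParseMeal: describes the exact JSON shape expected back.
    public const string ParseMeal =
        "You extract meals from the user's text. " +
        "Answer ONLY with a JSON array of meals in this format: " +
        "[{\"ingredients\":[{\"name\":string,\"amount\":number|null,\"category\":\"food\"|\"drink\",\"measurement\":string}]}]. " +
        "Use null for amounts the user did not mention.";

    // Hypothetical prompt for GetAmount; {0} is replaced with the ingredient name,
    // and the doubled braces become literal braces after string.Format.
    public const string GetAmountTemplate =
        "The user states how much of \"{0}\" they consumed. " +
        "Answer ONLY with JSON in this format: {{\"amount\":number}}.";

    // Hypothetical prompt for IsEnd (not passed through string.Format, so single braces).
    public const string IsEnd =
        "The user answers whether they had another meal. " +
        "Answer ONLY with JSON in this format: {\"isEnd\":true|false}, where true means there are no more meals.";

    public static string GetAmount(string name) => string.Format(GetAmountTemplate, name);
}
```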

By writing the request for the assistant this way, we explicitly define the structure of the response in JSON format, so that we can later deserialize it into the objects we need.
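Once the response format is fixed, deserialization needs only small DTOs. This sketch assumes that GetAmount expects {"amount": number} and IsEnd expects {"isEnd": bool} (hypothetical formats). AnswerParser, also hypothetical, returns null instead of throwing when the model answers with something else, so the assistant can simply ask again.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical response DTOs matching the requested JSON formats.
public sealed record AmountAnswer([property: JsonPropertyName("amount")] double Amount);
public sealed record EndAnswer([property: JsonPropertyName("isEnd")] bool IsEnd);

public static class AnswerParser
{
    // Extracts the outermost {...} object and deserializes it; returns null on any mismatch.
    public static T? TryParse<T>(string answer) where T : class
    {
        int start = answer.IndexOf('{');
        int end = answer.LastIndexOf('}');
        if (start < 0 || end <= start) return null;
        try
        {
            return JsonSerializer.Deserialize<T>(answer[start..(end + 1)]);
        }
        catch (JsonException)
        {
            // Invalid JSON or wrong value types, e.g. "amount": "lots".
            return null;
        }
    }
}
```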

Voice-to-text and vice versa

Voice to text transformation

The next step was to convert voice to plain text and back again. Fortunately, the .NET MAUI Community Toolkit is quite extensive and offers a Speech To Text API, which converts spoken words into text on iOS, Android, macOS, and Windows.

We also added a timer with a delay of 3 seconds, after which listening stops. When recognition succeeds, the resulting string is processed by the Action<string> progress delegate, which writes the result to RecognitionText.
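Releases of the Community Toolkit from around the time of this project expose ISpeechToText.ListenAsync, which takes a culture, an IProgress<string> for partial results and a CancellationToken; the API differs between toolkit versions. To keep this sketch runnable outside a MAUI app, the recognizer is replaced by a delegate of similar shape (hypothetical), and the 3-second timer is modeled as a CancellationTokenSource that cancels itself after the delay, assuming the timer starts when listening starts. VoiceInput and SyncProgress are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Runs the callback synchronously; Progress<T> would post it to the captured context instead.
public sealed class SyncProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;
    public SyncProgress(Action<T> handler) => _handler = handler;
    public void Report(T value) => _handler(value);
}

public sealed class VoiceInput
{
    public string RecognitionText { get; private set; } = "";

    // listen: hypothetical stand-in for the platform recognizer. It reports partial results
    // and is expected to stop (throw OperationCanceledException) once the token is cancelled.
    public async Task<string> ListenAsync(
        Func<IProgress<string>, CancellationToken, Task<string?>> listen, TimeSpan timeout)
    {
        RecognitionText = "";
        using var cts = new CancellationTokenSource(timeout); // the "timer": cancels after the delay
        var progress = new SyncProgress<string>(partial => RecognitionText += partial + " ");
        try
        {
            string? final = await listen(progress, cts.Token);
            if (!string.IsNullOrWhiteSpace(final)) RecognitionText = final;
        }
        catch (OperationCanceledException)
        {
            // Timer elapsed: keep the partial text recognized so far.
        }
        return RecognitionText.Trim();
    }
}
```

In the app, timeout would be TimeSpan.FromSeconds(3). SyncProgress is used only so that the example behaves deterministically; inside a MAUI page, a Progress<string> created on the UI thread invokes its callback on that thread.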

Because speech recognition works differently on Android, the final string is assembled differently there than on the other platforms.
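The post does not show how the Android string differs. One pattern we are aware of, stated here as an assumption: on Android each partial result contains the whole phrase recognized so far, so it replaces the text, while on other platforms partial results are appended. In a MAUI app, isAndroid could come from DeviceInfo.Platform == DevicePlatform.Android; RecognitionTextBuilder is a hypothetical helper.

```csharp
public static class RecognitionTextBuilder
{
    // Assumption: Android partial results are cumulative, other platforms deliver fragments.
    public static string Apply(string current, string partial, bool isAndroid) =>
        isAndroid
            ? partial                          // replace: the partial already holds the full phrase
            : (current + " " + partial).Trim(); // append the new fragment
}
```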

Text to voice transformation

The reverse conversion, from text to speech, is quite simple to implement: the text of message is spoken using the settings in _speechOptions.

Here, message is an instance of the view model class.

and _speechOptions holds the speech settings.
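In .NET MAUI, SpeechOptions (Microsoft.Maui.Media) has Locale, Pitch and Volume properties, and TextToSpeech.Default.SpeakAsync accepts the text plus such an options object. Since those types only exist inside a MAUI app, this sketch uses stand-in records (VoiceLocale, SpeechSettings, both hypothetical) to show one way of filling the options: pick a voice for the desired language from the available locales and keep pitch and volume within their documented ranges (0.0 to 2.0 and 0.0 to 1.0). Matching on the language code alone is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for Microsoft.Maui.Media.Locale, which only exists inside a MAUI app.
public sealed record VoiceLocale(string Language, string Country, string Name);

// Mirrors the fields of Microsoft.Maui.Media.SpeechOptions.
public sealed record SpeechSettings(VoiceLocale? Locale, float Pitch, float Volume);

public static class SpeechSettingsFactory
{
    public static SpeechSettings Create(IEnumerable<VoiceLocale> available, string language, float pitch = 1.0f, float volume = 1.0f)
    {
        // Prefer a voice for the requested language; null lets the platform use its default voice.
        VoiceLocale? locale = available.FirstOrDefault(l =>
            string.Equals(l.Language, language, StringComparison.OrdinalIgnoreCase));

        // Keep values inside the ranges documented for SpeechOptions.
        return new SpeechSettings(locale, Math.Clamp(pitch, 0f, 2f), Math.Clamp(volume, 0f, 1f));
    }
}
```

In the app, the available locales would come from TextToSpeech.Default.GetLocalesAsync(), and the result would be copied into a real SpeechOptions object.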

User interface

Communication with the assistant should feel as natural as possible, so we decided to present it as a chat dialog, just like a conversation with another person.

For this purpose, we created three message templates (SystemMessageTemplate, OutgoingMessageTemplate and ResultMessageTemplate) and a MessageDataTemplateSelector, which inserts the appropriate template into the CollectionView depending on the type of each message. The templates in turn depend on the values of the MessageType and MealType properties of the MessageViewModel object.
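A MAUI DataTemplateSelector overrides OnSelectTemplate(object item, BindableObject container) and returns a DataTemplate. The sketch below keeps the same selection logic but is generic over the template type, so it runs without MAUI. The mapping of message types to roles (assistant question, user answer, result), the reduced MessageViewModel and the class MessageTemplateSelector are assumptions; MealType is omitted because its values are not described.

```csharp
public enum MessageType { System, Outgoing, Result }

// Hypothetical minimal view model; the real MessageViewModel also has a MealType property.
public sealed class MessageViewModel
{
    public MessageType MessageType { get; init; }
    public string Text { get; init; } = "";
}

// Same selection logic as a MAUI DataTemplateSelector; TTemplate would be DataTemplate there.
public sealed class MessageTemplateSelector<TTemplate>
{
    private readonly TTemplate _system, _outgoing, _result;

    public MessageTemplateSelector(TTemplate system, TTemplate outgoing, TTemplate result)
        => (_system, _outgoing, _result) = (system, outgoing, result);

    // In MAUI this would be the body of OnSelectTemplate(object item, BindableObject container).
    public TTemplate Select(object item) => item switch
    {
        MessageViewModel { MessageType: MessageType.System } => _system,     // assistant's questions
        MessageViewModel { MessageType: MessageType.Outgoing } => _outgoing, // user's recognized speech
        MessageViewModel { MessageType: MessageType.Result } => _result,     // parsed meals
        _ => throw new System.ArgumentException("Unsupported message item", nameof(item))
    };
}
```

In XAML, the three DataTemplate instances would be assigned to the selector's properties and the selector set as the CollectionView's ItemTemplate.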

Example OutgoingMessageTemplate:

Let's look at the result:

[Screenshots: App Screen 1, App Screen 2, App Screen 3, App Screen 4]

Conclusion

It was a real challenge for the development team to step out of their comfort zone, learn something new, design the application architecture, and build a working version in just three days. As a result, we gained experience in integrating artificial intelligence into applications, as well as in working efficiently in a strong and friendly team.

Igor Gridin


With over 15 years of experience in software development with Java and Kotlin, I support our customers in the banking and logistics sector with exciting native Android projects. Here I write about all the new technologies and trends that are emerging in the field of Android application development.
