Using voice commands in .NET MAUI

This post continues the Hackathon topic post. It covers the technical implementation of voice commands in .NET MAUI, the challenges the development team faced, and how they solved them.


After discussing the ideas each team member presented, we chose to build a voice assistant (hereinafter referred to as the assistant) that records what the user ate and drank, and how much, from natural speech.

In this blog article, I would like to share how the development team's work went and show the steps we took to develop the application's business logic and visualization.

Program operation algorithm
Assistant setup and creating a data model
Parsing plain text into data model
Voice-to-text and vice versa
Program visualization
Conclusion

Program operation algorithm

First of all, we needed to create a conceptual model to better understand how the program should work. In general, it should look like the following:

The assistant asks the user a question
The user answers the assistant in the usual way
The user's voice response is translated into text and analyzed
Based on the analysis, the assistant's voice response is formed.

Based on this model, we created a flowchart that describes the program's algorithm in detail:

[Figure: flowchart of the program's algorithm]

As you can see from the flowchart, communication with the assistant can be divided into three stages:

The user freely describes what they ate or drank and in what amount.
If the amount of an ingredient was not stated initially, the assistant asks a clarifying question and the user states the amount for that ingredient.
Once all ingredients have an amount, the assistant asks whether there was another meal.
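
The three stages can be read as a small state machine. The following is a minimal sketch of that idea, not the project's actual code; `DialogStage`, `DialogFlow`, and the parameter names are hypothetical and chosen only for illustration.

```csharp
using System;

// Hypothetical stages of the conversation, mirroring the three steps above.
public enum DialogStage { DescribeMeal, ClarifyAmount, AskForMore, Finished }

public static class DialogFlow
{
    // Decides the next stage after the user's latest answer has been analyzed.
    // ingredientCount: ingredients recognized so far in the current meal.
    // missingAmounts:  how many of them still have no amount.
    // isEnd:           true if the user said there was no further meal
    //                  (only relevant while in AskForMore).
    public static DialogStage Next(
        DialogStage current, int ingredientCount, int missingAmounts, bool isEnd = false)
    {
        switch (current)
        {
            case DialogStage.Finished:
                return DialogStage.Finished;
            case DialogStage.AskForMore:
                return isEnd ? DialogStage.Finished : DialogStage.DescribeMeal;
            default:
                if (ingredientCount == 0) return DialogStage.DescribeMeal;   // nothing understood, ask again
                return missingAmounts > 0 ? DialogStage.ClarifyAmount : DialogStage.AskForMore;
        }
    }
}
```

Keeping the transition logic in one pure function like this would make it possible to test the conversation flow without a microphone or a chat client.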

Assistant setup and creating a data model

The main difficulty in creating this kind of application is that we need to analyze the user's free-form text and produce data model objects from it. At the same time, we should not restrict how (in what format) the user lists their meals. Everything should happen in a conversation format that feels natural to the user. AI is well suited to this purpose. Let's see what the initialization of the GPT chat client, which we chose for this task, looks like:

The following data model structure was then created:

The result of the communication with the assistant should be a meal array containing ingredients with name, amount, category and measurement.
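
As a rough illustration of such a structure, here is a minimal sketch. The property names follow the fields listed above; the types, the nullable amount, and the `MissingAmounts` helper are assumptions made for this example and may differ from the original model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One ingredient of a meal. Amount is nullable (an assumption) so that
// "not stated yet" can be told apart from an actual value.
public sealed record Ingredient(string Name, double? Amount, string Category, string Measurement);

// A meal is a list of ingredients.
public sealed record Meal(IReadOnlyList<Ingredient> Ingredients)
{
    // Hypothetical helper: ingredients the assistant still has to ask about.
    public IEnumerable<Ingredient> MissingAmounts() => Ingredients.Where(i => i.Amount is null);
}

public static class MealExample
{
    // Builds the result of a dialog such as "I had two apples and some tea".
    public static Meal[] Create() => new[]
    {
        new Meal(new List<Ingredient>
        {
            new("apple", 2, "fruit", "pieces"),
            new("tea", null, "drink", "cup"),
        }),
    };
}
```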

Parsing plain text into data model

Now came probably the most interesting stage: configuring the GPT chat client correctly. From the flowchart we can see that we need three methods:

public async Task ParseMeal(string userText)
public async Task GetAmount(string userText, string name)
public async Task IsEnd(string userText)

For example, the body of the ParseMeal method looks as follows:

The most interesting thing is the value of the assistantRequest variable, which essentially configures the chat client. Let's see what the client configuration looks like for each method:

ParseMeal:

GetAmount:

IsEnd:

Thus, by crafting a request for the assistant, we clearly define the structure of the response in JSON format, so that we can later obtain the objects we need through deserialization.
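
To make the idea concrete, here is a minimal sketch of a `ParseMeal`-style method. It is not the original implementation: the prompt wording, the JSON shape, the `Task<List<Meal>>` return type, and the `askAssistant` delegate (a stand-in for whichever chat client is used) are all assumptions. Only `System.Text.Json` from the base class library is used for deserialization.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

public sealed record Ingredient(string Name, double? Amount, string? Category, string? Measurement);
public sealed record Meal(List<Ingredient> Ingredients);

public sealed class MealParser
{
    // Hypothetical abstraction over the chat client:
    // (system prompt, user text) -> raw text reply of the model.
    private readonly Func<string, string, Task<string>> _askAssistant;

    // Web defaults map camelCase JSON names to the PascalCase properties.
    private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);

    public MealParser(Func<string, string, Task<string>> askAssistant) => _askAssistant = askAssistant;

    public async Task<List<Meal>> ParseMeal(string userText)
    {
        // Illustrative prompt, not the original wording. It pins down the
        // response structure so the reply can be deserialized directly.
        const string assistantRequest =
            "Extract the meals from the user's message. Reply with JSON only, in the form " +
            "[{\"ingredients\":[{\"name\":\"\",\"amount\":null,\"category\":\"\",\"measurement\":\"\"}]}]. " +
            "Use null for any amount the user did not state.";

        string reply = await _askAssistant(assistantRequest, userText);

        // Throws JsonException if the model did not keep to the requested format.
        return JsonSerializer.Deserialize<List<Meal>>(reply, JsonOptions) ?? new List<Meal>();
    }
}
```

Because the chat call is injected as a delegate, the parsing logic can be exercised with a canned reply, for example `new MealParser((_, _) => Task.FromResult(json))`.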

Voice-to-text and vice versa

Voice to text transformation

The next step was to convert voice to plain text and back again. Luckily, the .NET MAUI Community Toolkit is quite extensive and offers a SpeechToText API. It converts spoken words into text on iOS, Android, Mac, and Windows. Here is a code snippet for using the service:

We also added a timer with a 3-second delay, after which listening stops. If recognition succeeds, the resulting string is passed to the Action<string> progress delegate, which writes the result to RecognitionText.

Due to the specifics of speech recognition on Android, the final string is assembled differently there than on other platforms.
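
One way to picture this difference is the sketch below. It assumes that on Android each partial result already contains the whole phrase recognized so far, while on the other platforms each partial result is only the newest fragment; that assumption, as well as the `RecognitionTextAccumulator` class and its member names, is illustrative and not taken from the original code.

```csharp
using System;

// Collects partial recognition results into one string.
public sealed class RecognitionTextAccumulator
{
    private readonly bool _partialsAreCumulative;

    public string RecognitionText { get; private set; } = "";

    // partialsAreCumulative: true where every partial result repeats the full
    // phrase so far (assumed for Android), false where it is only a new fragment.
    public RecognitionTextAccumulator(bool partialsAreCumulative)
        => _partialsAreCumulative = partialsAreCumulative;

    // Matches Action<string>, so it can be wrapped in new Progress<string>(Report).
    public void Report(string partialText)
    {
        if (_partialsAreCumulative)
        {
            RecognitionText = partialText;                // replace with the latest full phrase
        }
        else
        {
            RecognitionText = RecognitionText.Length == 0
                ? partialText
                : RecognitionText + " " + partialText;    // append the new fragment
        }
    }
}
```

In a MAUI app the flag could be derived from `DeviceInfo.Platform == DevicePlatform.Android`, and `Report` could be handed to the speech recognizer as its progress callback.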

Text to voice transformation

The reverse conversion, text to speech, can be implemented quite simply, like this:

where message is an object of the view model class:

and _speechOptions can be configured as follows:
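
A minimal sketch of both pieces, assuming the built-in `TextToSpeech` API from `Microsoft.Maui.Media`, might look like the following. The `AssistantVoice` class, the pitch and volume values, and the choice of an English locale are illustrative assumptions; the code needs to run inside a .NET MAUI app.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Maui.Media;

public sealed class AssistantVoice
{
    private SpeechOptions? _speechOptions;

    // Builds the speech options once and reuses them afterwards.
    private async Task<SpeechOptions> GetSpeechOptionsAsync()
    {
        if (_speechOptions is not null) return _speechOptions;

        // Voices installed on the device; may contain no English voice at all.
        var locales = await TextToSpeech.Default.GetLocalesAsync();

        _speechOptions = new SpeechOptions
        {
            Pitch = 1.0f,    // valid range 0.0 to 2.0
            Volume = 0.75f,  // valid range 0.0 to 1.0
            // Falls back to the platform default voice when no match is found (null).
            Locale = locales.FirstOrDefault(
                l => l.Language.StartsWith("en", StringComparison.OrdinalIgnoreCase)),
        };
        return _speechOptions;
    }

    // Speaks the given text, for example the text of an assistant message.
    public async Task SpeakAsync(string text, CancellationToken cancellationToken = default)
    {
        var options = await GetSpeechOptionsAsync();
        await TextToSpeech.Default.SpeakAsync(text, options, cancellationToken);
    }
}
```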

Program visualization

Communication with the assistant should look as natural as possible for the user, so we decided to present the dialog with the assistant as a chat, just as it would look with another person.

For this purpose, we created three message templates, SystemMessageTemplate, OutgoingMessageTemplate, and ResultMessageTemplate, plus a MessageDataTemplateSelector that substitutes the appropriate template into the CollectionView depending on the type of message. The selection depends on the values of the MessageType and MealType properties of the MessageViewModel object.
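
The selector part could be sketched roughly as follows, using the standard `DataTemplateSelector` base class from .NET MAUI. The enum members and the reduced `MessageViewModel` are stand-ins invented for this example, and the sketch switches on `MessageType` only, leaving `MealType` out.

```csharp
using System;
using Microsoft.Maui.Controls;

// Hypothetical stand-ins for the types named in the text.
public enum MessageType { System, Outgoing, Result }

public class MessageViewModel
{
    public MessageType MessageType { get; set; }
    public string Text { get; set; } = "";
}

public class MessageDataTemplateSelector : DataTemplateSelector
{
    // Assigned in XAML, one template per kind of message.
    public DataTemplate? SystemMessageTemplate { get; set; }
    public DataTemplate? OutgoingMessageTemplate { get; set; }
    public DataTemplate? ResultMessageTemplate { get; set; }

    // Called by the CollectionView for every item it displays.
    protected override DataTemplate OnSelectTemplate(object item, BindableObject container)
    {
        DataTemplate? template = SystemMessageTemplate;

        if (item is MessageViewModel message)
        {
            template = message.MessageType switch
            {
                MessageType.Outgoing => OutgoingMessageTemplate,
                MessageType.Result => ResultMessageTemplate,
                _ => SystemMessageTemplate,
            };
        }

        return template
            ?? throw new InvalidOperationException("No template assigned for this message type.");
    }
}
```

The selector instance would then be set as the `ItemTemplate` of the `CollectionView`, so each message picks its own layout at render time.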

Example OutgoingMessageTemplate:

Let's look at the result:

[Figures: App Screen 1 to App Screen 4]

Conclusion

It was a real challenge for the development team to step out of their comfort zone, learn something new, design the application architecture, and build a working version in just three days. As a result, we gained experience in integrating artificial intelligence into applications and in working efficiently as a strong and friendly team.

Igor Gridin


With over 15 years of experience in software development with Java and Kotlin, I support our clients in the banking and logistics sectors on exciting native Android projects. Here I write about new technologies and trends emerging in Android application development.
