This post continues our hackathon post. It covers the technical implementation of voice commands in .NET MAUI, the challenges the development team faced, and how they solved them.
After discussing the ideas each member of our team presented, we chose to build a voice assistant (hereinafter "the assistant") that records how much food and drink the user has consumed, based on a description in natural language.
In this article, I would like to share how the development team worked and which steps we took to build the application's business logic and user interface.
First, we needed a conceptual model to understand how the program should work. In general terms, it looks like this:
The assistant asks the user a question
The user answers the assistant in their own words
The user's spoken answer is converted into text and analyzed
Based on the analysis, the assistant forms its spoken reply
Based on this model, we created a flowchart that describes the program's algorithm in detail:
As you can see from the flowchart, communication with the assistant can be divided into three stages:
The user freely describes what they ate or drank and how much.
If the amount of an ingredient was not stated, the assistant asks a clarifying question and the user states the amount for that ingredient.
Once all ingredients have an amount, the assistant asks whether the user had anything else.
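The three stages form a loop that can be sketched as a small state machine. This is a hedged illustration rather than the team's code: the names DialogStage and DialogFlow are hypothetical, and in the real app the two flags would be derived from the parsed GPT responses.

```csharp
// Hypothetical stages mirroring the flowchart: free description,
// clarifying questions about amounts, and "anything else?".
public enum DialogStage
{
    DescribeMeal,
    ClarifyAmount,
    AskIfDone,
    Finished
}

public static class DialogFlow
{
    // hasMissingAmounts: at least one ingredient still has no amount.
    // userIsDone: the user said there was nothing more to record.
    public static DialogStage Next(DialogStage current, bool hasMissingAmounts, bool userIsDone)
    {
        switch (current)
        {
            case DialogStage.DescribeMeal:
            case DialogStage.ClarifyAmount:
                // Keep asking about amounts until every ingredient has one.
                return hasMissingAmounts ? DialogStage.ClarifyAmount : DialogStage.AskIfDone;
            case DialogStage.AskIfDone:
                // Another meal restarts the cycle; otherwise the dialog ends.
                return userIsDone ? DialogStage.Finished : DialogStage.DescribeMeal;
            default:
                return DialogStage.Finished;
        }
    }
}
```

Keeping the transitions in one pure method makes the conversation logic testable without a microphone or a network connection.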
Assistant setup and creating a data model
The main difficulty in building this kind of application is that we need to analyze the user's text and turn it into data model objects. At the same time, we should not restrict how (in what format) the user lists their meals: everything should happen in a conversation that feels natural. A large language model is well suited to this. We chose a GPT chat model for the task; let's see what the initialization of the chat client looks like:
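As a minimal sketch, assuming the OpenAI Chat Completions REST endpoint is called directly with HttpClient (the team's actual client library may differ), a chat request could be built like this. The class name ChatRequestFactory and the default model name are illustrative assumptions.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Sketch only: builds (but does not send) a request to the OpenAI
// Chat Completions REST endpoint. The default model name is an assumption.
public static class ChatRequestFactory
{
    public const string Endpoint = "https://api.openai.com/v1/chat/completions";

    public static HttpRequestMessage Create(
        string apiKey, string systemPrompt, string userText, string model = "gpt-3.5-turbo")
    {
        var payload = new
        {
            model,
            temperature = 0, // a low temperature keeps the JSON answers more consistent
            messages = new[]
            {
                // The system message carries the configuration for the task.
                new { role = "system", content = systemPrompt },
                new { role = "user", content = userText }
            }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(
                JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

Sending the request with `HttpClient.SendAsync` returns a JSON document whose `choices[0].message.content` field holds the assistant's reply. The API key should come from secure configuration rather than being compiled into the app.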
The following data model structure was then created:
The result of the conversation with the assistant should be an array of meals containing ingredients, each with a name, amount, category and unit of measurement.
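A minimal sketch of such a model, assuming System.Text.Json is used for deserialization. The class names, the meal Name property, the nullable Amount and the string-typed Category and Measurement are assumptions, not the team's actual types.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shapes for the model described above.
public class Ingredient
{
    public string Name { get; set; } = "";
    // Null while the user has not said how much (a GetAmount round is needed).
    public double? Amount { get; set; }
    public string Category { get; set; } = "";    // e.g. "food" or "drink"
    public string Measurement { get; set; } = ""; // e.g. "g", "ml", "piece"
}

public class Meal
{
    public string Name { get; set; } = "";
    public List<Ingredient> Ingredients { get; set; } = new List<Ingredient>();
}

public static class MealJson
{
    // GPT replies usually use camelCase keys, so match names case-insensitively.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    // Turns the assistant's JSON answer into the meal array.
    public static List<Meal> ParseMeals(string json) =>
        JsonSerializer.Deserialize<List<Meal>>(json, Options) ?? new List<Meal>();
}
```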
Parsing plain text into data model
Next came probably the most interesting stage, which required configuring the GPT chat client correctly. From the flowchart, we can see that we need three methods:
public async Task ParseMeal(string userText)
public async Task GetAmount(string userText, string name)
public async Task IsEnd(string userText)
For example, the body of the ParseMeal method looks as follows:
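As a hedged sketch (not the team's actual code), a ParseMeal-style method can send the assistantRequest configuration together with the user's text, cut the JSON array out of the reply and deserialize it. The chat call is injected as a hypothetical delegate so the flow runs without network access; MealParser and ParsedIngredient are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;

public record ParsedIngredient(string Name, double? Amount, string Measurement);

public class MealParser
{
    // Hypothetical chat call: (systemPrompt, userText) -> raw reply text.
    private readonly Func<string, string, Task<string>> _askAssistant;
    private readonly string _assistantRequest;

    public List<ParsedIngredient> Ingredients { get; } = new List<ParsedIngredient>();

    public MealParser(Func<string, string, Task<string>> askAssistant, string assistantRequest)
    {
        _askAssistant = askAssistant;
        _assistantRequest = assistantRequest;
    }

    public async Task ParseMeal(string userText)
    {
        string reply = await _askAssistant(_assistantRequest, userText);

        // Models sometimes add prose around the JSON; keep only the outermost array.
        int start = reply.IndexOf('[');
        int end = reply.LastIndexOf(']');
        if (start < 0 || end < start) return; // nothing usable, the assistant can ask again

        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        var parsed = JsonSerializer.Deserialize<List<ParsedIngredient>>(
            reply.Substring(start, end - start + 1), options);
        if (parsed != null) Ingredients.AddRange(parsed);
    }

    // Ingredients that still need a GetAmount round.
    public IEnumerable<ParsedIngredient> MissingAmounts() => Ingredients.Where(i => i.Amount == null);
}
```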
The most interesting part is the value of the assistantRequest variable, which effectively configures the chat client. Let's see what the client configuration looks like for each method:
ParseMeal:
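A hypothetical ParseMeal configuration could look like the following. The wording and the class name are illustrative; the essential idea is that the prompt fixes the exact JSON shape of the reply so it can be deserialized.

```csharp
// Hypothetical ParseMeal instruction: describes the task and pins down
// the JSON array the reply must contain.
public static class ParseMealPrompt
{
    public const string Text =
        "You extract food and drinks from the user's message. " +
        "Reply with JSON only, no explanations, as an array of objects: " +
        "[{\"name\": string, \"amount\": number or null, " +
        "\"category\": \"food\" or \"drink\", \"measurement\": string}]. " +
        "Use null for the amount if the user did not state it. " +
        "Return [] if the message contains no food or drinks.";
}
```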
GetAmount:
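For GetAmount, a hypothetical sketch builds the prompt from the ingredient name and reads a single amount back from the reply. GetAmountPrompt, its wording and the amount/measurement keys are assumptions.

```csharp
using System.Text.Json;

// Hypothetical GetAmount configuration and reply reader.
public static class GetAmountPrompt
{
    public static string Build(string name) =>
        $"The user is telling you how much \"{name}\" they had. " +
        "Reply with JSON only: {\"amount\": number or null, \"measurement\": string}. " +
        "Use null if the answer contains no amount.";

    // Returns false when the reply contains no usable amount.
    public static bool TryReadAmount(string reply, out double amount, out string measurement)
    {
        amount = 0;
        measurement = "";
        try
        {
            using JsonDocument doc = JsonDocument.Parse(reply);
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return false;

            if (root.TryGetProperty("measurement", out JsonElement unit) && unit.ValueKind == JsonValueKind.String)
                measurement = unit.GetString() ?? "";

            return root.TryGetProperty("amount", out JsonElement value)
                && value.ValueKind == JsonValueKind.Number
                && value.TryGetDouble(out amount);
        }
        catch (JsonException)
        {
            return false; // not valid JSON, ask the user again
        }
    }
}
```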
IsEnd:
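For IsEnd, the model only has to classify the answer as "finished" or not. A hypothetical sketch with illustrative names and wording:

```csharp
using System.Text.Json;

// Hypothetical IsEnd configuration: a yes/no decision returned as a JSON boolean.
public static class IsEndPrompt
{
    public const string Text =
        "The user was asked whether they had anything else to eat or drink. " +
        "Reply with JSON only: {\"isEnd\": true} if they had nothing more, " +
        "otherwise {\"isEnd\": false}.";

    // Defaults to false (keep the dialog going) when the reply is unusable.
    public static bool ReadIsEnd(string reply)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(reply);
            return doc.RootElement.ValueKind == JsonValueKind.Object
                && doc.RootElement.TryGetProperty("isEnd", out JsonElement value)
                && value.ValueKind == JsonValueKind.True;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```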
Thus, in each request to the assistant we define the exact structure of the JSON response, so that we can later deserialize it into the objects we need.
Voice-to-text and vice versa
Voice to text transformation
The next step was to convert speech to plain text and back again. Fortunately, the .NET MAUI Community Toolkit is quite extensive and offers a Speech To Text API, which converts spoken words into text on iOS, Android, macOS and Windows. Here is a code snippet that uses the service:
We also added a timer with a delay of 3 seconds, after which voice recording stops. If recognition succeeds, the resulting string is handled by the Action<string> progress delegate, which writes the result to RecognitionText.
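The 3-second timer can be sketched with a CancellationTokenSource that cancels listening after a timeout. To keep the example runnable without the toolkit, the recognizer is a hypothetical delegate that reports text through an Action<string> and stops when its token is cancelled; in the app, the token would be passed to the toolkit's listening call instead. Recorder is an illustrative name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class Recorder
{
    public string RecognitionText { get; private set; } = "";

    // listen: hypothetical recognizer (reports text, honours the token).
    public async Task<bool> RecordAsync(
        Func<Action<string>, CancellationToken, Task> listen, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(timeout); // e.g. TimeSpan.FromSeconds(3)
        try
        {
            // Each recognized result overwrites RecognitionText.
            await listen(text => RecognitionText = text, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The timer fired: keep whatever was recognized so far.
        }
        return RecognitionText.Length > 0;
    }
}
```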
Because speech recognition behaves differently on Android, the final string is assembled differently there than on other platforms.
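Assuming the difference is the common one, where Android reports the whole phrase recognized so far in every partial result while other platforms deliver new fragments that must be appended, the assembly can be sketched as follows. RecognitionBuffer and the isAndroid flag are hypothetical; in a MAUI app the flag would come from `DeviceInfo.Platform == DevicePlatform.Android`.

```csharp
using System.Text;

public class RecognitionBuffer
{
    private readonly bool _isAndroid;
    private readonly StringBuilder _text = new StringBuilder();

    public RecognitionBuffer(bool isAndroid) => _isAndroid = isAndroid;

    public void OnPartialResult(string fragment)
    {
        if (_isAndroid)
        {
            // Assumed Android behaviour: the new result supersedes the previous one.
            _text.Clear().Append(fragment);
        }
        else
        {
            // Other platforms: append fragments separated by a single space.
            if (_text.Length > 0) _text.Append(' ');
            _text.Append(fragment.Trim());
        }
    }

    public string Text => _text.ToString();
}
```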
Text to voice transformation
The reverse, text-to-speech conversion, is quite simple to implement:
where message is an instance of the view model class:
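A minimal sketch of such a view model, assuming it implements INotifyPropertyChanged so the chat UI updates when Text changes. Only the MessageType and MealType property names are taken from the app's description; the enum values are hypothetical.

```csharp
#nullable enable
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical values: one per message template in the chat.
public enum MessageType { System, Outgoing, Result }

// Hypothetical: what a result message refers to.
public enum MealType { None, Food, Drink }

public class MessageViewModel : INotifyPropertyChanged
{
    private string _text = "";

    public event PropertyChangedEventHandler? PropertyChanged;

    public MessageType MessageType { get; init; }
    public MealType MealType { get; init; }

    // The text shown in the bubble and passed to text-to-speech.
    public string Text
    {
        get => _text;
        set
        {
            if (_text == value) return; // avoid redundant UI updates
            _text = value;
            OnPropertyChanged();
        }
    }

    private void OnPropertyChanged([CallerMemberName] string? name = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
}
```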
and _speechOptions can be configured as follows:
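In .NET MAUI, SpeechOptions exposes Pitch, Volume and Locale, and the voices installed on a device can be listed with `TextToSpeech.Default.GetLocalesAsync()`. Choosing the Locale is the part worth sketching: the hypothetical helper below picks the best voice for a culture name from (language, country) pairs. Locale formats differ between platforms, so the comparison may need adjusting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VoiceLocalePicker
{
    // cultureName: e.g. CultureInfo.CurrentUICulture.Name, such as "en-US".
    public static (string Language, string Country)? Pick(
        IEnumerable<(string Language, string Country)> available, string cultureName)
    {
        var voices = available.ToList();
        string[] parts = cultureName.Split('-');
        string language = parts[0];
        string country = parts.Length > 1 ? parts[parts.Length - 1] : "";

        // Prefer an exact language and country match...
        foreach (var voice in voices)
            if (Same(voice.Language, language) && Same(voice.Country, country))
                return voice;
        // ...then any voice with the same language.
        foreach (var voice in voices)
            if (Same(voice.Language, language))
                return voice;
        return null; // fall back to the platform default voice
    }

    private static bool Same(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```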
Program visualization
Communication with the assistant should feel as natural as possible, so we decided to present it as a chat, just like a conversation with another person.
For this purpose, we created three message templates (SystemMessageTemplate, OutgoingMessageTemplate and ResultMessageTemplate) and a MessageDataTemplateSelector that inserts the appropriate template into the CollectionView depending on the message type. The templates in turn depend on the values of the MessageType and MealType properties of the MessageViewModel object.
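The selection logic itself is small. As a plain C# sketch (hypothetical enum values and names, runnable without MAUI), it maps the message type to a template name; in MAUI, the same switch would sit in an override of `DataTemplateSelector.OnSelectTemplate(object item, BindableObject container)` and return DataTemplate instances instead of names.

```csharp
using System;

// Hypothetical values, one per template.
public enum MessageType { System, Outgoing, Result }

public static class MessageTemplateChooser
{
    public static string Choose(MessageType type) => type switch
    {
        MessageType.System => "SystemMessageTemplate",     // assistant questions
        MessageType.Outgoing => "OutgoingMessageTemplate", // the user's answers
        MessageType.Result => "ResultMessageTemplate",     // recognized meals
        _ => throw new ArgumentOutOfRangeException(nameof(type))
    };
}
```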
An example OutgoingMessageTemplate:
Let's look at the result:
Conclusion
It was a real challenge for the development team to step out of their comfort zone, try and learn something new, design the application architecture and build a working prototype in just three days. As a result, we gained experience integrating artificial intelligence into applications and working efficiently in a strong, friendly team.
Igor Gridin
With more than 15 years of experience in software development with Java and Kotlin, I support our clients in the banking and logistics sectors on exciting native Android projects. Here I write about the new technologies and trends emerging in Android application development.