Voice assistant Alice for PC (Windows). All about Alice: what is the Yandex voice assistant capable of?

  • Mobile application development
  • Data Mining
  • Machine learning
  • In the future, we think people will interact with devices using their voice. Already, applications recognize the exact voice commands embedded in them by developers, but with the development of artificial intelligence technologies, they will learn to understand the meaning of arbitrary phrases and even maintain a conversation on any topic. Today we will tell Habr readers about how we are bringing this future closer using the example of Alice, the first voice assistant that is not limited to a set of predefined answers and uses neural networks for communication.

    Despite its apparent simplicity, the voice assistant is one of Yandex’s largest technology projects. In this post, you will learn what difficulties voice interface developers face, who actually writes answers for virtual assistants, and what Alice has in common with artificial intelligence from the movie “Her.”

    At the dawn of their existence, computers were mainly used in large scientific or defense enterprises. Back then, only science fiction writers thought about voice control; in reality, operators loaded programs and data from punched cards. Not the most convenient method: one mistake, and you have to start all over again.

    Over the years, computers became more affordable and began to be used in smaller companies. Specialists controlled them with text commands entered into a terminal. A good, reliable method - it is still used in professional environments to this day - but it requires training. So when computers began to appear in the homes of ordinary users, engineers started looking for simpler ways for machine and person to interact.

    The concept of the WIMP graphical interface (Windows, Icons, Menus, Point-n-Click) was born in the Xerox laboratory and found widespread use in other companies' products. It was no longer necessary to memorize text commands to control a home computer - they were replaced by gestures and mouse clicks. For its time this was a real revolution. And now the world is moving towards the next one.

    Now almost everyone has a smartphone in their pocket with enough computing power to land a spacecraft on the Moon. Fingers have replaced the mouse and keyboard, but we still use them to make the same gestures and clicks. That is convenient while sitting on the couch, but not on the road or on the go. In the past, people had to learn the machine's language to interact with computer interfaces. We believe that now is the time to teach devices and applications to communicate in the language of people. This idea formed the basis of the Alice voice assistant.

    You can ask Alice [Where can I get coffee nearby?] instead of dictating something like [coffee shop on Cosmonaut Street]. Alice will check Yandex and suggest a suitable place, and in response to the question [Great, how do I get there?] she will give a link to a route already built in Yandex.Maps. She knows how to distinguish precise factual questions from the desire to see classic search results, rudeness from a polite request, and a command to open a site from the desire to simply chat.

    It may even seem that somewhere in the cloud there is one miraculous neural network that solves any problem on its own. But in reality, behind every answer from Alice there is a whole chain of technological problems that we have been learning to solve for five years now. And we will begin our tour with the very first link: the ability to listen.

    Hello Alice

    Artificial intelligence from science fiction can listen - people don't have to press special buttons to switch on a "recording mode". For this you need voice activation: the application must understand that a person is addressing it. This is not as easy as it might seem.

    If you simply start recording and processing the entire incoming audio stream on the server, you will very quickly drain the device's battery and use up all its mobile traffic. In our case, this is solved by a special neural network that is trained exclusively to recognize key phrases ("Hello, Alice", "Listen, Yandex" and a few others). Supporting only a limited number of such phrases allows this work to be done locally, without contacting the server.

    Since the network is trained to understand only a few phrases, you might think this would be fairly simple and fast. But no. People pronounce phrases in far from ideal conditions, surrounded by completely unpredictable noise. And everyone's voice is different. So, to understand even one phrase, thousands of training recordings are needed.

    Even a small local neural network consumes resources: you can't just take the entire stream from the microphone and start processing it. So at the very front sits a much lighter algorithm that cheaply and quickly detects the "speech has started" event. It is this algorithm that switches on the neural network engine for recognizing key phrases, which in turn triggers the hardest part - full speech recognition.
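
    To make the cascade more concrete, here is a minimal Python sketch (not Yandex's actual code): a nearly free energy check stands in for the "speech has started" detector, and a stub function stands in for the compact on-device keyword-spotting network. The frame length and threshold are illustrative assumptions.

    import numpy as np

    FRAME_LEN = 480           # ~30 ms of audio at 16 kHz (assumed settings)
    ENERGY_THRESHOLD = 0.02   # assumed tuning constant

    def speech_started(frame):
        # Stage 1: a very cheap check that something louder than silence is happening.
        return float(np.sqrt(np.mean(frame ** 2))) > ENERGY_THRESHOLD

    def hotword_detected(window):
        # Stage 2: stand-in for a compact on-device keyword-spotting network
        # trained only on activation phrases such as "Hello, Alice".
        return False  # placeholder: a real model would score the audio window here

    def process_stream(frames):
        # Stage 3 (full speech recognition in the cloud) is reached only after
        # both cheap local stages have fired.
        for frame in frames:
            if not speech_started(frame):
                continue
            if hotword_detected(frame):
                print("Activation phrase heard: start streaming audio to the cloud")

    # Feed the cascade quiet random noise; nothing should trigger.
    process_stream(np.random.randn(10, FRAME_LEN) * 0.001)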

    If thousands of examples are needed to train just one phrase, you can imagine how labor-intensive it is to train a neural network to recognize arbitrary words and phrases. For the same reason, recognition is performed in the cloud: the audio stream is sent there, and ready-made answers come back. The accuracy of those answers depends directly on the quality of recognition, which is why the main challenge is to learn to recognize speech as well as a person does. People make mistakes too: a human is believed to recognize 96-98% of speech (in terms of the word error rate, WER). We managed to achieve an accuracy of 89-95%, which is not only comparable to the level of a live interlocutor but is also unique for the Russian language.
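
    A quick aside on the metric: WER (word error rate) is the number of substituted, deleted and inserted words divided by the length of the reference transcript, so 96-98% recognition corresponds to a WER of 2-4%. A small self-contained Python sketch of the standard computation (the example sentences are made up):

    def word_error_rate(reference, hypothesis):
        # Classic Levenshtein dynamic program over words:
        # WER = (substitutions + deletions + insertions) / words in the reference.
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # One substituted word out of ten -> WER 0.10, i.e. roughly 90% recognition accuracy.
    print(word_error_rate("what is the weather like tomorrow in st petersburg please",
                          "what is a weather like tomorrow in st petersburg please"))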

    But even speech perfectly converted into text will mean nothing if we cannot understand the meaning of what was said.

    What's the weather like tomorrow in St. Petersburg?

    If you want your application to show a weather forecast in response to the voice request [weather], everything is simple: compare the recognized text with the word "weather" and, if they match, display the answer. But this is a very primitive way of interacting, because in real life people phrase their questions differently. A person may ask the assistant [What is the weather tomorrow in St. Petersburg?], and it should not be confused.
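
    In code, that primitive approach is literally a keyword check - a toy Python snippet, purely for illustration: it catches the literal word "weather" and misses every paraphrase.

    def naive_intent(utterance):
        # The primitive approach: an exact keyword match.
        return "weather" if "weather" in utterance.lower() else "unknown"

    print(naive_intent("weather"))                                              # -> weather
    print(naive_intent("will I need an umbrella in St. Petersburg tomorrow?"))  # -> unknown: the intent is missed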

    The first thing Alice does when she receives a question is determine the scenario. Send a search query and show a classic page of ten results? Find one exact answer and give it to the user right away? Perform an action, such as opening a website? Or maybe just chat? It is incredibly difficult to teach a machine to recognize these behavioral scenarios accurately, and any mistake here is unpleasant. Fortunately, we have the full power of the Yandex search engine behind us, which every day handles millions of queries, finds millions of answers and learns to understand which of them are good and which are not. This is a huge knowledge base on which another neural network can be trained - one that "understands", with high probability, what exactly a person wants. Mistakes are, of course, inevitable, but people make them too.
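
    Just to illustrate the idea (the real system is trained on vastly larger search data), here is a toy scenario classifier in Python using scikit-learn; the example queries, labels and choice of model are assumptions made for the sketch.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny toy training set; the real system learns from millions of queries and answers.
    queries = [
        "what is the weather tomorrow in st petersburg", "will it rain today",
        "open vkontakte", "go to the yandex mail site",
        "how tall is mount elbrus", "who wrote war and peace",
        "tell me a joke", "let's just talk",
    ]
    scenarios = [
        "weather", "weather",
        "open_site", "open_site",
        "fact", "fact",
        "chat", "chat",
    ]

    intent_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    intent_model.fit(queries, scenarios)

    print(intent_model.predict(["what's the weather like in moscow on friday"])[0])  # expected: weather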

    With the help of machine learning, Alice "understands" that the phrase [What is the weather tomorrow in St. Petersburg?] is a weather request (a deliberately simple example, for clarity). But which city is meant? For what date? This is where the stage of extracting named entities from the user's utterance begins (Named Entity Recognition). In our case, two such objects carry the important information: "Piter" and "tomorrow". And Alice, with search technologies behind her, "understands" that "Piter" is a colloquial name for St. Petersburg, and "tomorrow" means "current date + 1".
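
    A rule-based toy version of this step might look like the Python sketch below; the synonym dictionary and date offsets are invented for illustration, while the production system relies on trained models and the search engine's knowledge of synonyms.

    import datetime as dt

    CITY_SYNONYMS = {"piter": "Saint Petersburg", "st petersburg": "Saint Petersburg",
                     "st. petersburg": "Saint Petersburg", "msk": "Moscow"}
    RELATIVE_DATES = {"today": 0, "tomorrow": 1, "day after tomorrow": 2}

    def extract_slots(utterance, today):
        # Toy named-entity step: find the city and date the user is asking about
        # and normalize both ("tomorrow" -> current date + 1, "Piter" -> Saint Petersburg).
        text = utterance.lower()
        slots = {}
        for phrase, canonical in CITY_SYNONYMS.items():
            if phrase in text:
                slots["city"] = canonical
        for phrase, offset in RELATIVE_DATES.items():
            if phrase in text:
                slots["date"] = today + dt.timedelta(days=offset)
        return slots

    print(extract_slots("What is the weather tomorrow in St. Petersburg?", dt.date(2017, 10, 10)))
    # -> {'city': 'Saint Petersburg', 'date': datetime.date(2017, 10, 11)}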

    Natural language is not only the outward form of our remarks but also their coherence. In real life we do not exchange isolated phrases, we conduct a dialogue - and that is impossible without remembering the context. Alice remembers it, and this helps her deal with complex linguistic phenomena: for example, handling ellipsis (recovering omitted words) or resolving coreference (identifying the object a pronoun refers to). So if you ask [Where is Elbrus?] and then clarify [What is its height?], the assistant will find the correct answer in both cases. And if after the request [What is the weather today?] you ask [And tomorrow?], Alice will understand that this continues the dialogue about the weather.
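
    The mechanics of such context carry-over can be shown in a few lines of Python (a deliberately simplified sketch, not the actual dialogue manager): slots filled on the previous turn are kept unless the new utterance overrides them.

    def merge_with_context(new_slots, context):
        # If the new utterance is elliptical ("And tomorrow?"), reuse whatever the
        # previous turn established (intent, city) and override only what changed.
        merged = dict(context)
        merged.update(new_slots)
        return merged

    turn1 = {"intent": "weather", "city": "Saint Petersburg", "date": "2017-10-10"}
    turn2 = {"date": "2017-10-11"}   # all that is recognized in the bare follow-up "And tomorrow?"
    print(merge_with_context(turn2, turn1))
    # -> {'intent': 'weather', 'city': 'Saint Petersburg', 'date': '2017-10-11'}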

    And one more thing. The assistant must not only understand natural language but also be able to speak it - like a person, not like a robot. For Alice, we synthesize a voice that originally belonged to the dubbing actress Tatyana Shitova (the official Russian voice of Scarlett Johansson). She voiced the artificial intelligence in the film "Her", although you may also remember her as the voice of the sorceress Yennefer in The Witcher. And this is a fairly deep neural-network synthesis, not the splicing of pre-recorded phrases - it is impossible to record all their variety in advance.

    Above we described the features of natural communication (unpredictable form of remarks, missing words, pronouns, errors, noise, voice), which you need to be able to work with. But live communication has one more property - we do not always demand a specific answer or action from the interlocutor; sometimes we just want to talk. If the application sends such requests to the search, then all the magic will be destroyed. This is why popular voice assistants use a database of editorial answers to popular phrases and questions. But we went even further.

    What about chatting?

    We taught the machine to answer our questions, conduct a dialogue in the context of certain scenarios, and solve user problems. This is good, but is it possible to make her less soulless and endow her with human properties: give her a name, teach her to talk about herself, maintain a conversation on free topics?

    The voice assistant industry solves this problem through editorial responses. A special team of authors takes hundreds of the most popular questions among users and writes several answers to each. Ideally, this should be done in a unified style, so that all the answers form a cohesive personality of the assistant. We also write answers for Alice - but we have something else. Something special.

    In addition to the most popular questions, there is a long tail of low-frequency or even unique phrases for which it is impossible to prepare answers in advance. You have already guessed how we solve this problem, right? With the help of another neural network model. To answer questions and remarks she has never seen, Alice uses a neural network trained on a huge collection of texts from the Internet, books and films. Machine learning connoisseurs may be interested to know that we started with a 3-layer network and are now experimenting with a huge 120-layer one. We will save the details for specialized posts; here we will just say that the current version of Alice tries to respond to arbitrary phrases using a "neural network chat" - that is what we call it internally.
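
    Alice's neural network chat generates replies, but the general idea of a data-driven chit-chat module can be illustrated with a much simpler retrieval sketch in Python: embed the user's phrase, compare it with the contexts of a reply corpus, and return the reply attached to the closest context. The corpus and library choices below are assumptions made for the sketch, not a description of the production model.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # A toy corpus of (context, reply) pairs; the real system learns from a huge
    # collection of texts from the Internet, books and films.
    corpus = [
        ("hello", "Hi! Nice to hear from you."),
        ("how are you", "I'm fine, thinking about the eternal."),
        ("what is your name", "My name is Alice."),
        ("tell me something interesting", "Did you know that octopuses have three hearts?"),
    ]

    contexts = [c for c, _ in corpus]
    vectorizer = TfidfVectorizer().fit(contexts)
    context_matrix = vectorizer.transform(contexts)

    def reply(user_phrase):
        # Pick the reply whose recorded context is most similar to what the user said.
        scores = cosine_similarity(vectorizer.transform([user_phrase]), context_matrix)[0]
        return corpus[int(scores.argmax())][1]

    print(reply("hi, what's your name?"))   # -> "My name is Alice."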

    Alice learns from a huge number of different texts, in which people and characters do not always behave politely. A neural network can learn something completely different from what we want to teach it.

    - Order me a sandwich.
    - You'll get by.

    Like any child, Alice cannot be taught not to be rude by shielding her from every manifestation of rudeness and aggression - that is, by training the neural network on a "clean" corpus with no rudeness, provocations or other unpleasant things common in the real world. If Alice does not know such expressions exist, she will answer them thoughtlessly, with random phrases - for her they will just be unknown words. It is better for her to know what they are and to have a definite position on them. If you know what swearing is, you can either swear back or say that you will not talk to someone who swears. We model Alice's behavior so that she chooses the second option.
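
    In miniature, that policy can be expressed as a gate in front of the reply generator. The sketch below uses a tiny keyword list purely for illustration; in reality the detection of rudeness would itself be a trained classifier, but the product rule is the same: answer rudeness with a firm refusal rather than mirroring it.

    RUDE_MARKERS = {"stupid", "shut up", "idiot"}   # hypothetical tiny lexicon, for illustration only
    REFUSAL = "I won't talk to someone who speaks to me like that."

    def choose_reply(user_phrase, generated_reply):
        # Behavior modelling in miniature: the assistant must recognize rudeness,
        # but the chosen policy is to answer it with a refusal instead of mirroring it.
        if any(marker in user_phrase.lower() for marker in RUDE_MARKERS):
            return REFUSAL
        return generated_reply

    print(choose_reply("you are stupid", "No, you are!"))                       # -> the refusal, not a mirrored insult
    print(choose_reply("order me a sandwich", "Looking for cafes nearby..."))   # -> the normal reply passes through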

    It happens that Alice's reply is itself quite neutral, but in the context set by the user it stops being harmless. Once, during closed testing, a user asked her to find some establishments - a cafe or something similar. Then he said: "Find another one like it." At that moment a bug occurred, and instead of running the organization-search scenario Alice gave a rather cheeky answer - something like "look on the map" - and did not search for anything. The user was surprised at first, and then surprised us by praising Alice's behavior.

    When Alice uses the neural network chat, a million different personalities can show through in her, because the network has absorbed a little from the author of every line in the training set. Depending on the context, Alice can be polite or rude, cheerful or depressed. We want the personal assistant to be a coherent personality with a very specific set of qualities. This is where our editorial texts come to the rescue: their peculiarity is that they were written from the start on behalf of the personality we want to recreate in Alice. It turns out that you can keep training Alice on millions of lines of random text, and she will still respond with an eye to the standard of behavior laid down in the editorial answers. This is what we are working on now.

    Alice became the first voice assistant we know of that tries to maintain communication not only with the help of editorial responses, but also using a trained neural network. Of course, we are still very far from what is depicted in modern science fiction. Alice does not always accurately recognize the essence of the remark, which affects the accuracy of the answer. Therefore, we still have a lot of work to do.

    We plan to make Alice the most human-like assistant in the world: to instill empathy and curiosity in her, to make her proactive - to teach her to set goals in a dialogue, take the initiative and draw the interlocutor into the conversation. We are at the very beginning of this journey and, at the same time, at the frontier of the science studying this area. To move further, we will have to push that frontier forward.

    Not long ago, the well-known search engine Yandex released its own voice assistant, with a very simple name - Alice.

    Since many people are curious about such things, I decided to go through the questions that come up most often.

    Voice assistant Alice from Yandex - what is it?

    Like other similar assistants, she can talk to you and, through voice or text dialogue, give you answers to your questions.

    Features of the Alice voice assistant from Yandex

    Alice is nothing out of the ordinary: she offers much the same set of features that you can find in similar assistants from Google or Apple.

    Basically, it works with all services from Yandex. If you try to interact with other applications, problems may arise.

    All functions can be described by the following points:

    • conduct a simple dialogue;
    • give answers to various questions;
    • everything related to the weather forecast (in different cities, weather for tomorrow, etc.);
    • clarification of the date and day (which is very important);
    • any information related to maps (plot a route, find out the distance, tips on where to eat, etc.);
    • transactions with money (find out exchange rates, transfer from one currency to another, etc.);
    • and more.

    Although a full-fledged version is already available, the assistant still has room to grow; despite its limited capabilities, the reviews have been nothing but positive.

    The question is: “How will it compete with existing options?”

    How to enable the voice assistant Alice from Yandex?

    At the moment there are versions for iOS, Android, Windows (beta) and in the future it is planned to be built into Yandex Browser.


    If you are looking for a version for a mobile device, you can find it in the Yandex application. The developers decided to simply integrate the assistant into a ready-made program.

    To talk to Alice, you need to do one of these actions (with the Yandex application running):

    • click on the purple round button with a microphone;
    • say "Hello, Alice".

    In both cases you get exactly the same result: you start asking questions and Alice starts answering.

    If the assistant does not know how to implement your request, then the Yandex search engine opens with your question and a list of results.

    Everything looks like the most ordinary chat. I think there will be some changes in the future, but for now everything looks quite simple and tasteful.

    Who voiced the voice assistant Alice from Yandex?

    Alice is voiced by the very famous actress Tatyana Shitova, and if you don’t know who she is, I can say that she is the voice of Scarlett Johansson in the Russian dubbing.


    So when watching films like Ghost in the Shell or Lucy, you can remember Alice and compare the voices. But this is optional.

    How to download the voice assistant Alice from Yandex on iOS or Android?

    If you try to find the assistant by simply typing Alice into the App Store or Google Play search, the results will show an application called Yandex.

    Don't be alarmed, because this is it. Previously, this program was dedicated only to the search engine, but now there is a built-in assistant.

    Its size varies by device (for example, on the iPhone 5S it is a little over 60 MB), so it won't take up much space. Here are the links so you don't get confused:

    Good afternoon. The official release of the Alice voice assistant for smartphones took place, which made me happy, and a beta version of the assistant for Windows was also released today. I installed it, tested it a little and was just as pleasantly surprised.

    Voice assistant Alice for PC

    To install Alice on a PC, go to the website https://alice.yandex.ru/windows and click "Install"; after that, the installation file will be downloaded. Launch it and install.

    ATTENTION! Yandex has removed the Alice installer; the official link now downloads a browser with built-in Alice!

    I still have the installer if anyone needs it —

    (screenshot)

    After installation, a search bar will appear at the bottom left, next to the Start button: on Windows 10 it is integrated into the standard search, while on Windows 7 it is placed as a separate widget. Let's look at what this assistant, still in beta, can do now.

    The first tab shows frequently visited sites and, as I understand it, trending news and search queries:

    The second tab contains a list of programs that you can open either by clicking on the program with the mouse or by asking the voice assistant to open the application for you.

    If you click on the question mark icon on the main (first) tab, you will see a small list of what Alice can do:

    For a test, I decided to ask her for the latest news, to which Alice said she was giving the floor to her colleague from Yandex.News, and a male voice began to read the news.

    Then I tried to talk to her; on the whole she answered exactly the same as on the phone. Applications open without problems, and so do websites. If you ask her to turn on the radio or a particular song or artist, Alice opens the browser, opens Yandex.Music in it and starts playing what you asked for. She doesn't know how to work with video yet.

    As for PC control, she can mute and unmute the sound, shut down or restart the PC, and also put it into sleep mode.

    Conclusion:

    What can we say? Yandex did a great job on its assistant. I hope they don't abandon it but keep developing it. It is a decent analogue of Cortana, which we are unlikely to see in Russian in Windows 10: Microsoft has long been promising a Russian-language release, but so far nothing has come of it. And then Yandex simply arrived with Alice.

    Install, try, test.

    Share in the comments what other interesting functions and “jokes” it has, what it can do and how it really helps you in your daily work with your PC.

    16.03.2018

    The new version of Yandex Browser comes with the Alice voice assistant. Some time ago, Alice was included in the mobile version of the browser, and now it is also available in the Windows version of the browser.

    Alice is a voice assistant built into the browser that allows you to control your computer and Yandex Browser by speech. Its main task is to perform commands such as opening websites and third-party applications, running search queries and answering a wide variety of questions. The new feature sits in the right corner of the taskbar and responds to the voice command "Listen, Alice" or to a mouse click on its icon.

    For example, Alice can be given voice commands such as “Open”, “Turn off the sound”, “Open VKontakte”, “Turn off the computer” and many others. The command format is not at all strict - Alice successfully understands both the commands “Open VKontakte” and “Go to the VKontakte website”, and in response to the question “What is VKontakte” she will give a brief reference, without opening the site.

    In fact, executing commands is only a small part of the voice assistant's capabilities. Another key feature is answering the user's questions. It is worth noting that the program answers by voice and duplicates the answer in text. The simplest case is reference information such as the weather, exchange rates, traffic, and the exact date and time. To answer more substantial questions, such as "What is...", information from Wikipedia and Yandex's own services is used and read out in brief. More complex questions turn into search queries, with the results opened in Yandex.Browser.

    Moreover, a lonely user can even simply chat with the smart robot. The program readily answers not-too-difficult questions and will easily tell you about her favorite color or why her name is Alice. She can even tell a joke or two and play word games such as Cities, among many others. And if the user asks her to sing a song, the voice assistant will send them to the Yandex.Music service.

    The voice assistant from Yandex is focused on the Russian language. It was developed with technologies created by Yandex itself, taking into account the peculiarities of Russian semantics. Alice's voice recognition and speech synthesis are at a fairly high level: simple commands are recognized and executed quickly and reliably, and flaws become noticeable only with queries containing very complex word forms or when the user's diction is seriously impaired.

    If a Yandex Browser user does not need a voice assistant, then it can be disabled in the program settings.

    And yes, don't call Alice Cortana or Siri - it upsets her.

    Hello Alice.

    It becomes easier to get answers to many questions when you have the Alice voice assistant from Yandex at hand. Yandex Alice is a personal assistant with artificial intelligence, developed by Yandex as an alternative to its competitor, Google's "Okay Google" assistant. Alice easily helps you cope with everyday tasks and communicates meaningfully. The program is built on neural networks that recognize speech and the accents in a voice, generate responses and synthesize the assistant's voice. Thanks to these skills, Alice is able to improvise and communicate in plain spoken language. With each update the voice assistant gains new capabilities, and now, in addition to performing search queries,

    Alice can:

    This is not the entire list of her capabilities; she is constantly learning new skills and improving herself.

    If you are bored or sad, she will joke, tell an anecdote or play with you. Would you like to watch a movie? Easy - movie listings, tickets and prices in a jiffy. Alice can play a fairy tale for children. Her answers are always varied; the creators of the application worked for a long time to give the voice assistant modern, lively speech that almost anyone will understand.

    Russian actress Tatyana Shitova took part in creating the voice; she has previously dubbed the American actress Scarlett Johansson. Coincidence or not, it was Tatyana Shitova's voice that spoke for the virtual assistant Samantha in the science fiction film "Her". Thanks to this voice acting, Alice turned out very lifelike: her intonations can convey sadness, joy and even cheekiness.

    The creators explained why they decided to focus on a virtual assistant. First, the industry is moving toward voice interaction, since today's generation of users prefers voice search to typing. Second, the algorithms are built around meaningful dialogue: the virtual assistant understands that successive phrases may be interrelated, and this is what the dialogue is based on. The Yandex Alice voice assistant is now built into the browser by default, making it much more convenient.

    Video review of Yandex Alice

    How to install Yandex Alice

    1. Download the Alice application from the link below.
    2. Install the application.
    3. Allow the application to determine your geolocation.
    4. For full operation, allow it to record sound.
    5. For ease of use, you can add a widget or shortcut to the home screen.


