Our Technology: Building Conversational AI Part 2

This is the second part of a 4-part series on CalendarHero’s conversational AI. Read the first one here. We’ll be deconstructing the way we use our award-winning technology to help our customers schedule meetings quickly and intelligently.

In our previous blog post, we started the discussion of natural language processing and went over entity extraction. In this article, we'll share our unique and powerful intent recognition algorithms, which improved our accuracy by over 20%.

By POURIA FEZWEE | SENIOR AI ADVISOR

Intent Recognition

Intent recognition is a text classification problem that aims to identify the user's intention based on their request. As an example, for the request "postpone my call with Cara to next week," the intention is to reschedule a meeting. Intent recognition is a difficult problem for two reasons:

1. Problem one. There are often many different ways to ask the same question due to variations in the choice of words and the order in which the words appear in a sentence. Another way of thinking about this is as a training-data availability problem: in order to train the machine to recognize the users' intent, we need to provide it with examples that capture the nature of those variations.

2. Problem two. For each intent that we support, the number of training request variations that we provide is negligible with respect to all the possible requests that our users may form. This difficulty is commonly known as class imbalance. In other words, we may never have a sufficient number of examples for the use cases that we do not support; therefore, we have to rely only on the use cases that we do support.

Our novel approach addresses both of these problems by combining word embeddings and dependency graphs. Word embeddings capture the semantic (and some syntactic) structures of the language, while dependency graphs capture the syntactic (and some semantic) structures of the language. The combination of the two addresses both problems:

1. Problem one. Word embeddings address the variations in choice of words, as alternative words are expected to be found in each other’s close proximity. Furthermore, dependency graphs take care of the variations in the ordering of words by relying on the grammatical structure of the language.

2. Problem two. Using the features that we extract from dependency graphs, we can define a notion of unknown intent by establishing a distance from the concept graph that we build using the hierarchical structure of concepts in the dependency graph.
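To make this combination more concrete, here is a minimal sketch, assuming spaCy's medium English model (en_core_web_md) rather than our production pipeline, of how embedding features can be restricted to the syntactically most important words of a request:

    import numpy as np
    import spacy

    # en_core_web_md ships with word vectors as well as a dependency parser.
    nlp = spacy.load("en_core_web_md")

    def intent_features(text: str) -> np.ndarray:
        """Average the embeddings of the root word and its direct children only."""
        sent = next(nlp(text).sents)
        top = [sent.root] + list(sent.root.children)
        return np.mean([tok.vector for tok in top], axis=0)

    # Paraphrased requests should land close together in this feature space,
    # which any downstream classifier can then separate by intent.
    a = intent_features("postpone my call with Cara to next week")
    b = intent_features("move my meeting with Dan to another day")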

 
[Figure: intent recognition in the zoom.ai meeting assistant]
 

In the following sections, we take a closer look at word embeddings and dependency graphs, in order to understand how each of the two helps us meet our requirements and address the aforementioned challenges.

Word Embedding

Word embedding is the idea of mapping words to a high-dimensional space, where each word is represented by a vector in that space. This approach transforms the discrete space of words in a language into a continuous embedding space and yields notable computational advantages. The following are the two advantages that we find most beneficial:

Similarities

If two words are used interchangeably in more or less similar contexts, their representations in the embedding space are positioned in relatively close proximity. For example, the representation of the word "Monday" would be closer to the representation of any other weekday than to that of any of the months, or to any other word's representation for that matter. The following graph shows a representation of those two groups of words in a lower-dimensional space.

 
[Figure: word embedding similarity (weekdays and months form separate clusters)]
 

That is because months and weekdays are used in different contexts and are preceded by different prepositions; that shared pattern of usage is what the words in each group have in common, and one of the ways in which they differ from the words in the other group. For example, we tend to say "in March" but "on Monday", and through differences such as the choice of preposition, we can establish separate clusters of words merely by making use of how they are used in the language, without needing any prior knowledge of what they mean or what concepts they represent.
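As a rough illustration of this clustering, the following sketch, assuming spaCy's en_core_web_md vectors (not our own embeddings), compares "Monday" with another weekday and with a month:

    import spacy

    nlp = spacy.load("en_core_web_md")
    monday, tuesday, march = nlp("monday")[0], nlp("tuesday")[0], nlp("march")[0]

    # Two weekdays should be noticeably more similar to each other
    # than a weekday is to a month.
    print(monday.similarity(tuesday))  # relatively high
    print(monday.similarity(march))    # lower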

Relationships

Relationships between words can be represented by the direction and length of the vector that results from subtracting the vector of one word from that of the other. For example, between the two words "woman" and "queen", we can think of royalty as the relationship. Now, let's imagine we use W and Q for the representations of those words in the embedding space. The vector Q − W is then the "royalty" vector. If we were to represent man with M, the result of M + (Q − W) should point to the representation of the word "king". The following figure shows this example.

 
[Figure: word embedding relationship (the royalty vector)]
 

The main importance of word embeddings in the context of intent recognition is that they account for the different variations of words and phrases that can be used interchangeably in a request.
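This arithmetic can be reproduced in a few lines; the sketch below uses gensim's publicly available GloVe vectors rather than our own embeddings, so the exact neighbours may differ:

    import gensim.downloader as api

    # Downloads the pre-trained GloVe vectors on first use.
    vectors = api.load("glove-wiki-gigaword-100")

    # man + (queen - woman) should land near "king".
    print(vectors.most_similar(positive=["man", "queen"], negative=["woman"], topn=1))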

Word embeddings are built using neural networks known as multilayer perceptrons (MLPs). We used pre-trained embeddings based on English Wikipedia and re-trained them on millions of samples of chat data that we've collected. The resulting embeddings are therefore adjusted to CalendarHero's specific use case.
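As a simplified, hypothetical sketch of that training step (our pipeline starts from Wikipedia-based pre-trained vectors, whereas this toy example trains from scratch on two made-up chat samples), gensim's Word2Vec can be used as follows:

    from gensim.models import Word2Vec

    # In practice this would be millions of tokenized chat messages.
    chat_sentences = [
        ["postpone", "my", "call", "with", "cara", "to", "next", "week"],
        ["book", "a", "meeting", "with", "dan", "on", "friday"],
    ]

    model = Word2Vec(vector_size=100, window=5, min_count=1)
    model.build_vocab(chat_sentences)
    model.train(chat_sentences, total_examples=model.corpus_count, epochs=5)

    # With enough domain data, nearest neighbours reflect scheduling vocabulary.
    print(model.wv.most_similar("meeting", topn=3))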

Dependency Graph

We talked about dependency graphs in the context of entity extraction. Three main properties of dependency graphs make them well suited for intent recognition: hierarchical representation of the main concepts, order invariance dictated by the grammatical structure of the language, and insensitivity to miscellaneous details. Each of these properties is explained below.

Hierarchy of concepts

A dependency graph organizes words based on their dependencies on other words in a sentence or noun phrase. Since the graph is acyclic and consists of only one connected component, it is essentially a tree. At the top of the tree is the root node, which represents the word that does not depend on any other word and can therefore be taken as the single most important word. Each step down from the root towards the leaves can be thought of as a level of the hierarchy that reflects the importance of each word in the sentence. Intuitively, then, since the purpose of a request is conveyed by the words and concepts of highest importance, we may not need to traverse the tree far from the root in order to recognize the intent of a sentence.

The following graph is meant to clarify this point: the first two levels of the hierarchy include the two words in the sentence that are most relevant to the intent of the request.

 
[Figure: example dependency tree]
 
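For readers who want to see this structure directly, the following sketch (spaCy's en_core_web_sm model is assumed, not our production parser) prints the root of the example request and its immediate children:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    root = next(nlp("postpone my call with Cara to next week").sents).root

    # The root is the intent-bearing verb; its children sit one level below.
    print(root.text)                                              # e.g. "postpone"
    print([(child.text, child.dep_) for child in root.children])  # e.g. "call" as direct object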

Order invariance

If the order of the words in a sentence changes but the sentence remains grammatically correct, the corresponding dependency graph stays unchanged. This is because dependency is a relationship that is independent of order, so the invariance carries over to the graph. This is important because there are several correct ways of phrasing the same request; if we had to take all such variations into account explicitly, we would need far more training data than with the language model that the dependency graph provides us.
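A quick way to observe this invariance (again a sketch with spaCy; real parsers are not perfectly order-invariant on every input) is to compare the dependency arcs of two orderings of the same request:

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def arcs(text: str):
        """Return the sorted (head, relation, dependent) triples, ignoring punctuation."""
        return sorted((tok.head.lemma_, tok.dep_, tok.lemma_)
                      for tok in nlp(text) if not tok.is_punct)

    print(arcs("postpone my call with Cara to next week"))
    print(arcs("to next week, postpone my call with Cara"))  # ideally the same arcs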

Robustness to extra detail

Adding more pieces to a sentence, as long as they don't change the general meaning of the sentence, will have more impact on the leaves than on the root and the level immediately below it. This is because, more often than not, details modify noun phrases that usually take direct, indirect, or prepositional object roles in a sentence, and such roles are often concerned with entities rather than the intent.

In the previous dependency graph, for example, changing the attendee of the meeting or changing the time will have no effect on the first two levels of the hierarchy, where the clues to the intent of the request reside. This is important because we don't want our intent recognition algorithm to be sensitive to parts of the request that don't change the intent. By relying on this property, we further reduce the need for an excess of training data.
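A small sketch of that robustness (illustrative only, spaCy assumed): swapping the attendee or the time should leave the root and its children's grammatical roles untouched.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def top_two_levels(text: str):
        """Return the root lemma and the relations of its direct children."""
        root = next(nlp(text).sents).root
        return root.lemma_, sorted(child.dep_ for child in root.children)

    print(top_two_levels("postpone my call with Cara to next week"))
    print(top_two_levels("postpone my call with Dan to tomorrow"))  # same top structure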

Although the last two properties of the dependency graph can be derived from the previous one, they deserve mentioning, given that we benefit from them greatly.

Modularity of Models

At CalendarHero, we support a variety of skill sets, each of which demands its own intent model, since the types of intents and requests that are valid for one skill set may not be shared with others. We have therefore built our technology so that intent models are built at the application level, and each model is independent of the others. This modularity of design is important to us for two reasons, both of which are concerned with scalability:

1. It allows us to build additional models without compromising the accuracy of the existing models.

2. It allows us to train our models incrementally, which in turn frees us from having to retrain already existing models, and therefore allows us to save computational cost.

As a result of this design choice, we build a distinct intent model per skill set. Each request is therefore evaluated against every model, and the resulting intent is the one recognized with the highest level of confidence.
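At a high level, the evaluation works roughly as in the sketch below; the class and function names are hypothetical and not CalendarHero's actual internals:

    from typing import Dict, List, Tuple

    class IntentModel:
        """Stand-in for one skill set's independently trained intent classifier."""
        def __init__(self, intents: List[str]):
            self.intents = intents

        def predict(self, text: str) -> Tuple[str, float]:
            # A real model would score `text`; here we return a fixed placeholder.
            return self.intents[0], 0.5

    def recognize_intent(text: str, models: Dict[str, IntentModel]) -> Tuple[str, str, float]:
        """Evaluate the request against every skill set's model; keep the most confident intent."""
        best = ("unknown_skill", "unknown_intent", 0.0)
        for skill, model in models.items():
            intent, confidence = model.predict(text)
            if confidence > best[2]:
                best = (skill, intent, confidence)
        return best

    models = {
        "scheduling": IntentModel(["reschedule_meeting", "book_meeting"]),
        "search": IntentModel(["find_document"]),
    }
    print(recognize_intent("postpone my call with Cara to next week", models))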

 