You may have noticed that the machine learning models you are building are great at using data with numbers, but when there is text in the data, you have to remove it or the model doesn’t work. This is because machine learning models are built with complex math, and nobody, not even a computer, can do math directly on words. In this chapter, you will cross this barrier in AI and learn to create a model that can analyze text data.
Analyzing text, a field known as natural language processing (NLP), is tricky for computers because human language is very nuanced: meaning depends not only on the words themselves but also on emotion and the context of the conversation. For example, saying “are you kidding me?” in two different ways can convey two different meanings: either awe and excitement, or disappointment. While humans can use the context of a sentence and the speaker’s emotion to separate the two meanings of a phrase, a computer may not recognize the distinction and will simply pick one, which can lead to misunderstandings and bad answers to certain questions. On top of this, a computer cannot work with raw text directly, so the text has to be converted into numbers.

2.2 Bag of Words
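To make the idea of converting text into numbers concrete, here is a minimal sketch in Python that counts how often each word appears in a sentence. The helper names `count_words` and `to_vector` and the sample sentences are assumptions made up for this illustration; a real project would more likely use a library for this step.

```python
from collections import Counter


def count_words(sentence):
    """Lowercase a sentence, strip basic punctuation, and count each word."""
    words = [w.strip(".,!?\"'") for w in sentence.lower().split()]
    return Counter(w for w in words if w)


def to_vector(sentence, vocabulary):
    """Turn a sentence into a list of counts, one per vocabulary word."""
    counts = count_words(sentence)
    return [counts[word] for word in vocabulary]


# Hypothetical sample sentences, chosen only for illustration.
sentences = ["Are you kidding me?", "You are not kidding, are you?"]

# The vocabulary is every distinct word seen, in alphabetical order.
vocabulary = sorted(set(word for s in sentences for word in count_words(s)))

# Each sentence becomes a row of numbers a model can do math with.
vectors = [to_vector(s, vocabulary) for s in sentences]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each sentence is now a list of numbers rather than words. Notice that the counts record which words appear but say nothing about tone, so the nuance described earlier is still lost at this stage.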