Integrating Computer Vision with NLP

My new car displays the highway speed limit. It’s a small reproduction of a speed limit sign, located in the upper left corner of the display. That isn’t such a big deal, considering map data has included speed limits for the last ten years.

But I had a surprise when I was driving on a rural back road. It was twisty and in the middle of the forest. I was preoccupied with trying to avoid the deer springing out of the ditches, but noticed the speed limit had changed to eighty-five miles per hour. What?

This was clearly a bug in the software or a typo in the GPS data. I maintained a reasonable speed and went back to watching for leaping deer. About a mile later, another speed limit sign appeared, indicating twenty-five miles per hour. I looked at the map, and sure enough, it had gone back to 25 mph.

Several miles later, another speed limit sign appeared. Officially, it was 25 miles per hour. However, some miscreant with a can of black spray paint had changed the number two into a crude number eight. Take a look at this image showing the before and after.

I quickly recognized that the speed limit should be twenty-five miles per hour. But my car was again indicating the speed limit was 85 mph. It was dangerously clear that one of the car’s cameras had seen the sign, recognized the words “SPEED LIMIT,” recognized the number, and fed the new speed limit to the computer.

I was fortunate the car wasn’t making decisions about how fast it could drive. But it makes for an interesting example of how natural language processing is an essential part of computer vision.

Computers in a Human World

Computers would be much happier in a world without human speech and writing. Our language is messy, context-sensitive, and full of words with different meanings (“I saw the man with the saw”), different pronunciations (“I will read the book you read last week.”), and different spellings (“In the UK, color is spelled colour.”). Computers are picky about word definitions, spelling, and intent.

Consider a speed limit sign with a QR code instead of numbers and words. If our aforementioned miscreant vandalized the sign, the QR code’s built-in error detection and correction would either repair the damage or reject the code outright, and the computer would ignore the unreadable data. It’s either accurate, or it isn’t.

But computers exist in our world, so they have to deal with our language. It’s tempting to assume all essential data will be available in digital form, but that’s not the way it works. We insist on handwritten signs, chalkboards, paper notes, and scribbles on napkins. Computer vision will have to deal with this constant flow of anomalies.

Step 1: Optical Character Recognition

We’ve been doing Optical Character Recognition (OCR) for a long time. With the addition of AI, computers are getting very good at interpreting fonts, colors, and letterforms. Converting a printed page to a faithful digital reproduction is a reliable process. But that only provides a string of characters and punctuation. OCR won’t indicate whether a sign is a speed limit or an advertisement for beer. It can only convert shapes into characters. If we want the meaning of the converted text, we need natural language processing.
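To make that concrete, here is a minimal sketch of the OCR step in R using the tesseract package. The file name speed_limit_sign.png is a placeholder for a photo captured by a camera, and the result shown in the comment is only an illustration of the kind of string OCR returns:

# Sketch only: read the text off a photograph of a road sign.
# Requires install.packages("tesseract") plus the underlying Tesseract engine.
library(tesseract)

eng <- tesseract("eng")                        # English-trained OCR engine
signText <- ocr("speed_limit_sign.png", engine = eng)
cat(signText)                                  # e.g. "SPEED LIMIT 25"

Everything ocr() hands back is plain text. Deciding what that text means is the job of the next step.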

Step 2: Natural Language Processing

The goal of natural language processing (NLP) is to enable machines to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant. This involves tasks such as language translation, sentiment analysis, speech recognition, and text generation. NLP combines computational linguistics, computer science, and cognitive psychology to bridge the gap between human communication and computer understanding, allowing machines to process, analyze, and respond to human language data.

That’s a mouthful; fortunately, sophisticated NLP tools are available for many programming languages. For example, here is a simple NLP task (tokenization) written in R:

# The tm package also attaches the NLP package, which supplies as.String()
library(tm, quietly = TRUE)

sampleText <- as.String(paste(
  "In natural language processing (NLP), a token refers to a sequence",
  "of characters that represents a meaningful unit of text."))

# split the text into individual tokens -----------
Boost_tokenizer(sampleText)

This would produce the following:

[1] "In"         "natural"    "language"   "processing" "(NLP),"    
 [6] "a"          "token"      "refers"     "to"         "a"         
[11] "sequence"   "of"         "characters" "that"       "represents"
[16] "a"          "meaningful" "unit"       "of"         "text."

By itself, that isn’t much of an improvement over OCR. But tokenization is only the first step; it opens the door to parts-of-speech tagging, n-grams, stop-words, and lexical analysis.
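As a hypothetical taste of where that leads, here is a sketch that continues the example above: it lower-cases the tokens, strips punctuation, removes common English stop-words, and forms bigrams. The clean-up steps are my own additions for illustration; stopwords() comes from tm and ngrams() from the NLP package it attaches:

# Start from the tokens produced in the previous step
tokens <- Boost_tokenizer(sampleText)

# Normalize: lower-case and strip punctuation
tokens <- tolower(gsub("[[:punct:]]", "", tokens))

# Drop common English stop-words such as "in", "a", "to", and "of"
keepers <- tokens[!tokens %in% stopwords("english")]
keepers

# Pair adjacent tokens into bigrams -----------
head(ngrams(keepers, 2L), 3)

The words that survive the stop-word filter (“natural”, “language”, “processing”, “token”, and so on) are the ones that carry the content of the sentence, which is exactly the raw material the later analysis steps need.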

Want to Know More?

Look for my course, Performing Natural Language Processing with R, on Educative. It covers all aspects of natural language processing, including sentiment analysis. Find it at https://www.educative.io/courses/performing-natural-language-processing-with-r

