
Google Vision AI: Add Character Recognition to your App and a whole lot more…

Jun 4th 2021

We look at the Google Vision API and how you can add some powerful image processing capabilities to any App or website.

So what is Google Vision AI?

Well, Google says:
Derive insights from your images in the cloud or at the edge with Vertex AI’s vision capabilities powered by AutoML, or use pre-trained Vision API models to detect emotion, understand text, and more.
Think OCR (optical character recognition) for starters, but a whole lot smarter.

OCR isn’t a new thing, but it’s certainly useful, and in today’s digital age, being able to process an image and read its text is essential. When we say image, this could be a photograph, any graphic containing text, or simply the view when you hold your smartphone up to a menu to enlarge or translate the text it recognises.

You might be asking why Google cares. One likely reason is that it lets Google build better image processing into its search engine and so provide more accurate search results. For example, a long-standing problem for search engines has been crawling websites where text is embedded in an image. Vision’s processing technology gets around that limitation and allows Google to bring image-based content into its algorithm too.

But OCR isn’t all; here are just a few other things it can do:

  • Identify celebrity faces in images
  • Detect facial expression and the likelihood of certain emotions
  • Detect potentially explicit content
  • Identify logos
  • Detect well-known places and landmarks
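Under the hood, each of these is a separate feature type you ask for in a request, and you can ask for several at once. Here’s a minimal TypeScript sketch, assuming the public REST endpoint `https://vision.googleapis.com/v1/images:annotate` and its JSON field names (`features`, `logoAnnotations`, `landmarkAnnotations`, `score`). Only a small subset of the response is modelled, and the helper names `buildMultiFeatureBody` and `namedEntities`, plus the 0.5 score cut-off, are our own illustrative choices rather than part of any Google SDK.

```typescript
// Feature type names from the Cloud Vision REST API that match the list above.
type FeatureType =
  | "TEXT_DETECTION"
  | "FACE_DETECTION"
  | "SAFE_SEARCH_DETECTION"
  | "LOGO_DETECTION"
  | "LANDMARK_DETECTION";

// Only the fields this sketch reads from a logo or landmark result.
interface EntityAnnotation {
  description: string;
  score?: number;
}

// A cut-down view of one entry in the API's "responses" array.
interface EntityResponse {
  logoAnnotations?: EntityAnnotation[];
  landmarkAnnotations?: EntityAnnotation[];
}

// Hypothetical helper: build one request that asks for several features
// on the same base64-encoded image.
function buildMultiFeatureBody(base64Image: string, features: FeatureType[]) {
  return {
    requests: [
      {
        image: { content: base64Image },
        features: features.map((type) => ({ type })),
      },
    ],
  };
}

// Hypothetical helper: collect logo and landmark names whose score
// meets the (assumed) minimum confidence.
function namedEntities(response: EntityResponse, minScore = 0.5): string[] {
  const all = [
    ...(response.logoAnnotations ?? []),
    ...(response.landmarkAnnotations ?? []),
  ];
  return all
    .filter((annotation) => (annotation.score ?? 0) >= minScore)
    .map((annotation) => annotation.description);
}
```

You’d send the body as a POST with your own API key or credentials, then pass the first entry of the reply’s `responses` array to `namedEntities`. Batching features like this means one round trip per image rather than one per feature.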

So yes, you read that right: it can detect emotions, so make sure you’re smiling in your pictures, because Google Vision will notice.
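Emotions come back as likelihood ratings rather than yes/no answers: each face in a `FACE_DETECTION` result carries fields such as `joyLikelihood`, `sorrowLikelihood`, `angerLikelihood` and `surpriseLikelihood`, rated on a scale from `VERY_UNLIKELY` to `VERY_LIKELY`. A small sketch of turning those ratings into labels might look like this; treating `LIKELY` and above as a match is our own assumption, and `emotionsPerFace` is a hypothetical helper.

```typescript
// Likelihood values the Vision API uses for face ratings.
type Likelihood =
  | "UNKNOWN"
  | "VERY_UNLIKELY"
  | "UNLIKELY"
  | "POSSIBLE"
  | "LIKELY"
  | "VERY_LIKELY";

// Only the emotion fields of a face annotation are modelled here.
interface FaceAnnotation {
  joyLikelihood: Likelihood;
  sorrowLikelihood: Likelihood;
  angerLikelihood: Likelihood;
  surpriseLikelihood: Likelihood;
}

// Assumed threshold: count LIKELY and VERY_LIKELY as a positive signal.
function isLikely(value: Likelihood): boolean {
  return value === "LIKELY" || value === "VERY_LIKELY";
}

// Hypothetical helper: list the emotions each detected face is likely showing.
function emotionsPerFace(faces: FaceAnnotation[]): string[][] {
  return faces.map((face) => {
    const emotions: string[] = [];
    if (isLikely(face.joyLikelihood)) emotions.push("joy");
    if (isLikely(face.sorrowLikelihood)) emotions.push("sorrow");
    if (isLikely(face.angerLikelihood)) emotions.push("anger");
    if (isLikely(face.surpriseLikelihood)) emotions.push("surprise");
    return emotions;
  });
}
```

Keeping one array per face matters for group photos, where one person can be beaming while another isn’t.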

So why is all this useful?

Well, for starters, if you have an App or site that needs to read text from images, look no further.
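For plain text reading, the request asks for the `TEXT_DETECTION` feature (there’s also `DOCUMENT_TEXT_DETECTION` for dense pages). Here’s a minimal sketch, assuming you already have the image as a base64 string (for example from your camera library) and will POST the body to `https://vision.googleapis.com/v1/images:annotate` with your own credentials. `buildTextDetectionBody` and `extractText` are illustrative names, and only the response fields used here are typed.

```typescript
// One detected piece of text; only the fields used here are typed.
interface TextAnnotation {
  description: string;
  locale?: string;
}

// A cut-down view of one entry in the API's "responses" array.
interface OcrResponse {
  textAnnotations?: TextAnnotation[];
}

// Hypothetical helper: JSON body for a single TEXT_DETECTION request.
function buildTextDetectionBody(base64Image: string) {
  return {
    requests: [
      {
        image: { content: base64Image }, // base64-encoded image bytes
        features: [{ type: "TEXT_DETECTION" }],
      },
    ],
  };
}

// Hypothetical helper: the first textAnnotations entry holds the full
// block of detected text; return "" when nothing was found.
function extractText(response: OcrResponse): string {
  return response.textAnnotations?.[0]?.description ?? "";
}
```

The entries after the first one in `textAnnotations` are the individual words with their bounding boxes, which is handy if you want to highlight text on screen rather than just read it.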

What if you run a social site, or any site where users upload and share pictures? In that scenario, identifying potentially explicit content, copyrighted logos, or people could be very useful for moderating uploads and flagging problematic content.
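For moderation, the `SAFE_SEARCH_DETECTION` feature rates each image on five categories (`adult`, `spoof`, `medical`, `violence` and `racy`) using a likelihood scale from `VERY_UNLIKELY` to `VERY_LIKELY`. A minimal sketch of a review gate could look like the following; which categories to check and the default `LIKELY` threshold are assumptions you’d tune for your own site, and `flagForReview` is a hypothetical helper.

```typescript
// Likelihood values the Vision API uses for SafeSearch ratings.
type Likelihood =
  | "UNKNOWN"
  | "VERY_UNLIKELY"
  | "UNLIKELY"
  | "POSSIBLE"
  | "LIKELY"
  | "VERY_LIKELY";

// The five categories in a SafeSearch result.
interface SafeSearchAnnotation {
  adult: Likelihood;
  spoof: Likelihood;
  medical: Likelihood;
  violence: Likelihood;
  racy: Likelihood;
}

// Rank the likelihoods so they can be compared numerically.
const LIKELIHOOD_RANK: Record<Likelihood, number> = {
  UNKNOWN: 0,
  VERY_UNLIKELY: 1,
  UNLIKELY: 2,
  POSSIBLE: 3,
  LIKELY: 4,
  VERY_LIKELY: 5,
};

// Assumed policy: only these categories trigger review; spoof and medical are ignored.
const MODERATED: (keyof SafeSearchAnnotation)[] = ["adult", "violence", "racy"];

// Hypothetical helper: return the moderated categories rated at or above
// the threshold, so an upload can be held for a human to check.
function flagForReview(
  safeSearch: SafeSearchAnnotation,
  threshold: Likelihood = "LIKELY",
): (keyof SafeSearchAnnotation)[] {
  return MODERATED.filter(
    (category) => LIKELIHOOD_RANK[safeSearch[category]] >= LIKELIHOOD_RANK[threshold],
  );
}
```

Returning the flagged categories, rather than a single true/false, makes it easy to show a moderator why an upload was held back.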

Or how about just a bit of fun?

Here’s a simple mobile App we built in React Native that can read the text in a camera image: