A prototype using Google Cloud Vision API.

Go to Visiony.

What it does: In February 2016, Google announced the public Beta availability of their Google Cloud Vision API. Cloud Vision lets clients

...submit their images to the Cloud Vision API to understand the contents of those images — from detecting everyday objects (for example, “sports car,” “sushi,” or “eagle”) to reading text within the image or identifying product logos.
That sounds fun, right? Visiony allows users to upload an image (private or public) and see what information Google thinks it can "see" in the image. Users can view details for their images and public images. Users can delete their images.

[Screenshots: Image Details page, Upload page, Image List]

I've included almost everything Google Vision returns. That currently includes logo recognition, face detection (not face recognition), text content, color detection, landmark detection, label detection (meaning the labels you would apply to items in the photo), and explicit content detection.
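Each of those categories maps to a feature type in the Cloud Vision v1 `images:annotate` request, which takes a base64-encoded image and a list of features to run. As a rough sketch of the kind of request body Visiony sends (the helper name and sample bytes are my own, not from the project):

```python
import base64

def build_annotate_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for a Cloud Vision v1 images:annotate call."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": t, "maxResults": max_results}
                         for t in feature_types],
        }]
    }

# The feature types corresponding to the categories listed above.
FEATURES = [
    "LABEL_DETECTION", "LOGO_DETECTION", "FACE_DETECTION",
    "TEXT_DETECTION", "LANDMARK_DETECTION", "IMAGE_PROPERTIES",
    "SAFE_SEARCH_DETECTION",
]

body = build_annotate_request(b"\x89PNG...", FEATURES)
```

The resulting body would be POSTed to `https://vision.googleapis.com/v1/images:annotate` with an API key; the response comes back as one JSON object per request.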

How it works: I've used Django on Amazon AWS before, but I decided to put this project on the Microsoft Azure cloud hosting platform. The setup is slightly different, and there isn't quite as much freedom as you would get from a virtual machine, but those limitations are fine by me.

I wanted to use as little front-end code as possible (for now) so that I could spend the time getting the project running and, by the end, be confident in my ability to create and deploy a production Django site on Azure. I am using Skeleton, "a dead simple, responsive boilerplate," so I can have a CSS/HTML grid system without writing it myself (I have one I use, but it is in Sass, something I am avoiding in this project). Skeleton's CSS requires Normalize.css, so that is also included. I borrowed some colors and button styles from an existing project, and that's all the front-end code I'm using except for this code block, which makes the images square on the image list page:

  var cards = document.querySelectorAll('.image-card');
  var numberOfCards = cards.length;
  for (var i = 0; i < numberOfCards; i++) {
    var card = cards[i];
    // Set each card's height to its rendered width to make it square.
    var width = card.getBoundingClientRect().width + 'px';
    card.style.height = width;
  }

No jQuery, no Bootstrap: just Django, Skeleton, and Normalize, consuming Google Cloud Vision, deployed to Azure. Not something I would do in a professional capacity. If I continue to iterate on the app, this will change.

The Cloud Vision API results will change in the future. One example is the category "Image Properties," which currently consists of just one thing: a list of colors in the image. Because of this, I am saving the API's JSON response as a string rather than normalizing it or translating the data into models.
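Storing the raw string means the colors (or any future fields) can be parsed out on demand. A minimal sketch of what that looks like, assuming the v1 response shape for `imagePropertiesAnnotation` (the sample values and function name here are made up):

```python
import json

# A trimmed example of the "Image Properties" portion of a Vision API
# response, stored verbatim as a string. The structure follows the v1
# API; the color values are invented for illustration.
stored_response = json.dumps({
    "responses": [{
        "imagePropertiesAnnotation": {
            "dominantColors": {
                "colors": [
                    {"color": {"red": 12, "green": 34, "blue": 56},
                     "score": 0.42, "pixelFraction": 0.10},
                    {"color": {"red": 200, "green": 180, "blue": 160},
                     "score": 0.31, "pixelFraction": 0.05},
                ]
            }
        }
    }]
})

def dominant_colors(raw):
    """Parse the stored string on demand instead of mapping it to models."""
    data = json.loads(raw)
    props = data["responses"][0]["imagePropertiesAnnotation"]
    out = []
    for c in props["dominantColors"]["colors"]:
        rgb = c["color"]
        # Channels may be omitted when zero, hence the .get() defaults.
        hex_color = "#%02x%02x%02x" % (
            rgb.get("red", 0), rgb.get("green", 0), rgb.get("blue", 0))
        out.append((hex_color, c["score"]))
    return out

colors = dominant_colors(stored_response)
# colors[0] -> ("#0c2238", 0.42)
```

If the API later adds fields to this category, nothing breaks; the stored string simply carries more data than the parser currently reads.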

What is next:


March 2016: Deployed initial version of the project. Login required to upload images. The public can view images marked "public." Al(most al)l returned information is listed on the page. Separate image for each category of results with a marked area (face, logo, text).