Using Microsoft Custom Vision Service for image recognition


To build a classifier, you must first have a valid Microsoft Account, so you can sign in to Custom Vision Service and get started.

  • Note that you will be able to access your subscription keys once you have created your first project.
  • A series of images to train your classifier (minimum of 6 images per tag and a minimum of 2 tags).
  • A few images to test your classifier after the classifier is trained.
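The minimums listed above (at least 2 tags, at least 6 images per tag) are easy to check before you upload anything. The following is a minimal sketch, not part of Custom Vision Service: the function name `check_training_set` and the sample file names are made up for illustration.

```python
from collections import Counter

MIN_TAGS = 2             # minimum number of distinct tags, per the list above
MIN_IMAGES_PER_TAG = 6   # minimum number of images per tag, per the list above


def check_training_set(image_tags):
    """Return a list of problems; an empty list means the minimums are met.

    image_tags maps an image file name to the list of tags assigned to it.
    """
    counts = Counter(tag for tags in image_tags.values() for tag in tags)
    problems = []
    if len(counts) < MIN_TAGS:
        problems.append(f"need at least {MIN_TAGS} tags, found {len(counts)}")
    for tag, n in sorted(counts.items()):
        if n < MIN_IMAGES_PER_TAG:
            problems.append(f"tag '{tag}' has {n} images, need {MIN_IMAGES_PER_TAG}")
    return problems


# Hypothetical file names, for illustration only.
sample = {f"bengal_{i}.jpg": ["Cat", "Bengal"] for i in range(6)}
sample.update({f"persian_{i}.jpg": ["Cat", "Persian"] for i in range(4)})
print(check_training_set(sample))
```

Here the "Persian" tag is reported because it has only 4 images, while "Cat" and "Bengal" pass.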

Example: identifying cat breeds using two tags per image, (Cat) and (Breed).

Dataset: The Oxford-IIIT Pet Dataset, University of Oxford

Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman and C. V. Jawahar

Overview of Data Set

We have created a 37 category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation.

Getting Started: Build a Classifier

Custom Vision Service can be found by clicking here:

After you log into Custom Vision Service, you will be presented with a list of projects.

  1. Click New Project to create your first project.

  2. If this is your first project, you are asked to agree to the Terms of Service. Check the check box, then click the I agree button.

The New Project dialog box appears.

There are several domains available, and each one optimizes the classifier for a specific type of image:

  • Generic: If none of the other domains are appropriate, or you are unsure of which domain to choose, select the Generic domain.

  • Food: Optimized for photographs of dishes as you would see on a restaurant menu. If you want to classify photographs of individual fruits or vegetables, use the Generic domain for that purpose.

  • Landmarks: Optimized for recognizable landmarks, both natural and artificial. This domain works best when the landmark is clearly visible in the photograph, even if the landmark is slightly obstructed by a group of people posing in front of it.

  • Retail: Optimized for images found in a shopping catalog or shopping website. If you want high precision classifying between dresses, pants, and shirts, use this domain.

  • Adult: Optimized to better distinguish between adult content and non-adult content. For example, if you want to block images of people in bathing suits, this domain allows you to build a custom classifier to do that.

You can change the domain later if you wish.

Enter a name for this project, a description of the project, and select one domain.


Add images to train your classifier.

Add some images to train your classifier. Let's say you want a classifier to distinguish between dogs and ponies. You would upload and tag at least 6 images for each tag. Try to upload a variety of images with different camera angles, lighting, backgrounds, types, styles, groups, sizes, etc. We recommend variety in your photos so that your classifier is not biased toward one kind of image and can generalize well.

Note: Custom Vision Service accepts training images in JPG/JPEG, PNG, and BMP format, up to 6 MB per image (prediction images can be up to 4 MB per image). Images are recommended to be 256 pixels on the shortest edge. Any image shorter than 256 pixels on the shortest edge will be scaled up by Custom Vision Service. You can upload up to a maximum of 1000 images.
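The format and size limits in the note above can be checked locally before uploading. This is a rough sketch only: the helper names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the scaling function assumes the service keeps the aspect ratio when it scales a small image up, which the note does not actually state.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
MAX_TRAINING_BYTES = 6 * 1024 * 1024   # 6 MB, assuming 1 MB = 1024 * 1024 bytes
MIN_SHORT_EDGE = 256                   # recommended shortest edge, in pixels


def is_acceptable_training_file(name, size_bytes):
    """True if the extension is an accepted format and the file is within 6 MB."""
    ext = os.path.splitext(name)[1].lower()
    return ext in ALLOWED_EXTENSIONS and size_bytes <= MAX_TRAINING_BYTES


def upscaled_size(width, height):
    """Return (width, height) after scaling so the shortest edge is at least 256.

    Assumption: the aspect ratio is preserved when scaling up.
    """
    short = min(width, height)
    if short >= MIN_SHORT_EDGE:
        return width, height
    scale = MIN_SHORT_EDGE / short
    return round(width * scale), round(height * scale)


print(is_acceptable_training_file("cat.JPG", 250_000))
print(upscaled_size(128, 200))
```

To run the first check against real files, pass `os.path.getsize(path)` as `size_bytes`.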

a. Click Add images.


b. Browse to the location of your training images.

Note: You can use the REST API to load training images from URLs. The web app can only upload training images from your local computer.


c. Select the images for your first tag.

d. Click Open to open the selected images.

e. Assign tags: Type in the tag you want to assign, then press the + button to assign the tag. You can add more than one tag at a time to the images.


f. When you are done adding tags, click Upload [number] files. The upload could take some time if you have a large number of images or a slow Internet connection.

g. After the files have uploaded, click Done.


h. To load more images with a different set of tags, return to step a.
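The note under step b mentions that the REST API can load training images from URLs. The sketch below only builds such a request and does not send it. Everything specific in it is an assumption rather than something taken from this article: the route and body fields are modeled on version 3.0 of the training API, the header name `Training-Key` is assumed, and the endpoint, project ID, key, and tag ID are placeholders. Check the current API reference before relying on any of it.

```python
import json
import urllib.request


def build_url_upload_request(endpoint, project_id, training_key, image_urls, tag_ids):
    """Build (but do not send) a request that registers training images by URL."""
    # Assumed route, modeled on the v3.0 training API.
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/training"
        f"/projects/{project_id}/images/urls"
    )
    body = {
        "images": [{"url": u} for u in image_urls],
        "tagIds": list(tag_ids),   # assumed to apply to every image in the batch
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Training-Key": training_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_url_upload_request(
    "https://example.invalid/",
    "my-project-id",
    "my-training-key",
    ["https://example.invalid/cats/bengal_1.jpg"],
    ["my-tag-id"],
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without a subscription.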

Train your classifier

After your images are uploaded, you are ready to train your classifier. All you have to do is click the Train button.


It should only take a few minutes to train your classifier.


Dataset upload: 1000 images of 6 different breeds. Each image is tagged with two tags: TAG1, Cat, and TAG2, the breed name.
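Tagging 1000 images by hand is tedious, so the two tags can be derived from the file names instead. The sketch below assumes the Oxford-IIIT naming pattern `Breed_Name_<number>.jpg`, where cat breeds start with a capital letter and dog breeds with a lowercase one. The function name `tags_for` is made up for illustration.

```python
import os


def tags_for(filename):
    """Return [species, breed] for a file named like 'Maine_Coon_12.jpg'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    breed_part, _, _index = stem.rpartition("_")   # drop the trailing image number
    # Assumption: a capitalized breed name marks a cat, lowercase marks a dog.
    species = "Cat" if breed_part[:1].isupper() else "Dog"
    breed = breed_part.replace("_", " ").title()
    return [species, breed]


print(tags_for("images/Maine_Coon_12.jpg"))
print(tags_for("american_bulldog_3.jpg"))
```

The resulting pair maps directly onto TAG1 (species) and TAG2 (breed name) in the upload dialog.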


Evaluate your classifier

The precision and recall indicators tell you how good your classifier is, based on automatic testing. Note that Custom Vision Service uses the images you submitted for training to calculate these numbers, using a process called k-fold cross validation.
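To make the idea of k-fold cross validation concrete: the images are split into k groups, and each group in turn is held out for testing while a model is trained on the rest. The following is a generic sketch of that technique. It is not the service's implementation, and the value of k and the way the service assigns images to folds are not stated in this article.

```python
def k_fold_splits(items, k):
    """Yield (train, held_out) pairs; each item is held out exactly once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    folds = [items[i::k] for i in range(k)]   # round-robin assignment to folds
    for i, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


images = [f"img_{n}" for n in range(6)]   # hypothetical image names
for train, held_out in k_fold_splits(images, 3):
    print(held_out, "<- evaluated by a model trained on", train)
```

Because every image is held out once, the scores are computed on images the corresponding model did not see during training, even though no separate test set was uploaded.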


Note: Each time you hit the "Train" button, you create a new iteration of your classifier. You can view all your old iterations in the Performance tab, and you can delete any that may be obsolete. When you delete an iteration, you end up deleting any images uniquely associated with it.

The classifier uses all the images to create a model that identifies each tag. To test the quality of the model, the classifier then tries each image on its model to see what the model finds.

The quality of the classifier's results is displayed as two indicators:

Precision: When you classify an image, how likely is your classifier to correctly classify the image? Out of all images used to train the classifier, what percent did the model get correct? 99 correct tags out of 100 images gives a Precision of 99%.

Recall: Out of all images that should have been classified correctly, how many did your classifier identify correctly? A Recall of 100% would mean, if there were 38 cat images in the images used to train the classifier, 38 cats were found by the classifier.
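The two numbers can be reproduced from raw counts. The sketch below uses the standard definitions (precision = correct predictions of a tag / all predictions of that tag; recall = correct predictions of a tag / all images that really have the tag). The function name is illustrative, and the service's exact calculation may differ in detail.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for one tag from raw prediction counts."""
    predicted = true_positives + false_positives   # everything the model labeled with the tag
    actual = true_positives + false_negatives      # everything that really has the tag
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# 99 of 100 predicted tags are correct, and no tagged image was missed.
print(precision_recall(99, 1, 0))   # (0.99, 1.0)
# All 38 cat images were found, with no false alarms.
print(precision_recall(38, 0, 0))   # (1.0, 1.0)
```

A classifier can score well on one indicator and poorly on the other: tagging every image as Cat gives a recall of 100% for Cat but low precision if many of the images are not cats.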

Running a quick test from


Utilising the Microsoft Intelligent Kiosk Windows 10 UWP app 



Custom Vision Docs

Getting Started with Custom Vision Service: creating a custom model for image recognition.