Setting up a machine vision camera for object detection

Principal Engineer

Cameras are everywhere these days, and with good reason.

It’s never been so easy or cheap to incorporate one into an IoT device, or leverage image-based data in a user experience, and image inference can unlock product and service possibilities that would’ve been unimaginable a few years ago. 

For developers, one of the biggest challenges of bringing image inference into a project is often identifying and setting up the right camera. For smartphone apps, this is less of an issue—if you’re developing for an iPhone, you know what camera you’ll be using, and there’s a robust ecosystem of software tools for doing so. But for IoT devices, augmented reality, or any other non-smartphone project that depends on object detection and tracking, the options are practically endless. This article will help you get started. 

One complicating factor is the lag between hardware and software. If you’re writing code for a project that uses image inference, you may find yourself needing good-quality image-based data now, but the camera(s) that the product eventually uses won’t be ready for months. In this case, it’s worth investing a bit of money and time into setting up a camera that’s designed specifically for machine vision.

A machine vision camera (like the Allied Vision Alvium 1800 U-120c we use in the example below) has several advantages over a webcam or smartphone camera. They’re far more configurable, for one thing: where a webcam might have a few settings, machine vision cameras can have over 100, offering granular control over every conceivable imaging sensor parameter. They provide high-resolution data with minimal fuss, and won’t try to automatically adjust color balance or image compression, as web- and smartphone cameras often do. They also come with a GPIO interface that enables things like image capture synchronization for multi-camera setups, and precisely timed lighting. 

Last but not least, machine vision cameras come with technical support, which can be critical if you’re in the middle of a week-long sprint and need to solve a problem immediately. We like Allied Vision in particular for their friendly, knowledgeable staff and tendency to respond in just a few hours.

Moreover, machine vision cameras aren’t terribly expensive or complicated to use. Working with the technology team at Smart, I’ve set up and run several such cameras on AR and IoT projects, using just a few hundred dollars worth of hardware. To show you how straightforward this process is, I’ve listed out the details of such a system, including hardware, drivers, and operating environment. Once we’ve completed the setup process, I’ll also go through a simple exercise for having the system perform a quick YOLO (“You Only Look Once”) inference on an image.

Setup

For this example, we’ll be using the following hardware:

Alvium 1800 U-120c camera from Allied Vision

8mm f/1.8 C-Mount lens from Edmund Optics

UDOO V8 SBC (or any x64 Ubuntu 20.04 machine with a GPU)

Jupyter Notebook

PyTorch (includes YOLOv5 support)

Vimba Python, a camera interface library for Python, provided by Allied Vision


The setup process looks like this:


1. Download the Vimba SDK from the Allied Vision website to your Downloads folder.

2. Unpack Vimba and move it to /opt/

$ cd ~/Downloads
$ tar zxvf Vimba64_v6.0_Linux.tgz
$ sudo mv Vimba_6_0 /opt/

3. Now run the USB Transport Layer setup (so your machine can recognize the camera).

$ cd /opt/Vimba_6_0/VimbaUSBTL
$ sudo chmod +x Install.sh
$ sudo ./Install.sh

4. Reboot your machine.

5. Now that Vimba is set up, let’s check whether the camera is working properly.

$ cd /opt/Vimba_6_0/Tools/Viewer/Bin/x86_64bit
$ ./VimbaViewer

You will see a Vimba Viewer window listing the available cameras:

You can now click on the available camera and press the play button to see the camera’s video feed.

This is a handy utility for getting your camera up and running. It also provides some controls on the right, if you want to test things out quickly.

6. Set up and activate the Python virtual environment and required packages. 

We are going to be using a virtual environment. You can create one using virtualenv or conda. We use pyenv to manage Python versions on our development machines, so the Python we will be using for this is located in ~/.pyenv/versions/3.8.0/bin/python. If Python is set up differently on your machine, replace this with the appropriate path.

$ ~/.pyenv/versions/3.8.0/bin/python -m venv ~/.venvs/python380

The above command executes the venv module and creates a virtual environment located in ~/.venvs/python380.

Now we can activate the environment by executing the following:

$ source ~/.venvs/python380/bin/activate

There are other tools for working with virtual environments. We prefer the above method for Linux machines, especially when working with Docker. On macOS and Windows, Miniconda3 can be a good tool for this purpose.

7. Now install the Vimba Python module into your virtual environment.

$ pip install /opt/Vimba_6_0/VimbaPython/Source/

8. Check that the library is installed.

$ python
>>> import vimba
>>>

9. We can see that Vimba installed successfully, so we are ready to go.


Now that the camera is installed, we can get it ready to do some YOLO image inference. In this case, we’ll have our system look at an object and determine whether it’s a coffee cup or not. In the process, you’ll learn how to discover camera settings and image formats, and to quickly adjust the camera so that you can run a simple inference.

1. Start by importing the Vimba library that we just installed.

from vimba import *

2. We will need numpy too, since image data is stored using numpy arrays.

import numpy as np

3. Last is the opencv library for image manipulation.

import cv2

4. You’ll also need to import PyTorch so you can run a quick inference test at the end.

import torch


Next, we will do a quick investigation to see what features of the camera are available. The Vimba library allows us to list all of the camera’s available settings. Keep in mind that these could be different depending on which model camera you have.

Vimba is written in a very Pythonic way, so all operations on the camera are done using context managers. This means that you, as a user, don’t have to fuss with opening and closing the camera manually every time you use it.
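To illustrate the pattern, here’s a minimal, hypothetical context manager (not part of Vimba) that mimics the open-on-entry, guaranteed-close-on-exit behavior:

```python
from contextlib import contextmanager

@contextmanager
def open_camera(name):
    # Stand-in for opening the device handle
    print(f"opening {name}")
    try:
        yield name
    finally:
        # Runs even if the body raises, just like Vimba's cleanup
        print(f"closing {name}")

with open_camera("cam0") as cam:
    print(f"using {cam}")
```

This is why every Vimba snippet below starts with a `with` block: the camera handle is released automatically when the block exits.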

5. Start by getting an instance of Vimba module:

with Vimba.get_instance() as vimba:

6. Now get a list of cameras. Since we only have one camera, we don’t have to worry about selecting the right one, but Vimba allows us to select specific cameras based on different parameters.

cams = vimba.get_all_cameras()

7. Now list all the features (i.e. settings) that this particular camera might have. We will loop over the entire list of cameras—which in this case includes just one camera—then list its features.

with cams[0] as cam:
    for feature in cam.get_all_features():
        try:
            value = feature.get()
        except (AttributeError, VimbaFeatureError):
            value = None

8. Next, print the features and values depending on their availability.

        print(f"Feature name: {feature.get_name()}")
        print(f"Display name: {feature.get_display_name()}")
        if value is not None:
            if not feature.get_unit() == '':
                print(f"Unit: {feature.get_unit()}", end=' ')
            print(f"value={value}")
        else:
            print("Not set")

The resulting list of settings is quite long. We can output this list to a file so we can study it in detail.

Here are just a few lines:
Found 1 camera(s)
Feature name: AcquisitionFrameCount
Display name: Acquisition Frame Count
Not set
Feature name: AcquisitionFrameRate
Display name: Acquisition Frame Rate
Unit: Hz value=40.9975471496582
Feature name: AcquisitionFrameRateEnable
Display name: Acquisition Frame Rate Enable

This camera has over 100 different settings you can adjust. 
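One simple way to capture that long listing in a file, using Python’s standard contextlib (the single print here is a stand-in for the feature loop above):

```python
from contextlib import redirect_stdout

# Redirect all prints inside the block into a file for later study;
# substitute the feature-printing loop for the stand-in print below
with open("features.txt", "w") as f:
    with redirect_stdout(f):
        print("Feature name: ExposureTime")

# Read it back to confirm the listing was captured
with open("features.txt") as f:
    listing = f.read()
```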
Next, we need to look at what image formats are available for this camera. Since our goal is to do image processing using OpenCV, it’s best to get an image that is compatible with OpenCV.  The Vimba library offers methods to help us find out what available image formats are compatible with OpenCV.
9. Get the list of cameras, as before.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()

10. Now, list the available formats, then ask Vimba to tell us which formats can be used in OpenCV directly.

    with cams[0] as cam:
        formats = cam.get_pixel_formats()
        opencv_formats = intersect_pixel_formats(formats, OPENCV_PIXEL_FORMATS)
    print(f"Available formats:")
    for i, format in enumerate(formats):
        print(i, format)

    print(f"\nOpencv compatible formats:")
    for i, format in enumerate(opencv_formats):
        print(i, format)

This will first give us a list of all the image formats available.

Available formats:
0 Mono8
1 Mono10
2 Mono10p
3 Mono12
4 Mono12p
5 BayerGR8
6 BayerGR10
7 BayerGR12
8 BayerGR10p
9 BayerGR12p
10 Rgb8
11 Bgr8
12 YCbCr411_8_CbYYCrYY
13 YCbCr422_8_CbYCrY
14 YCbCr8_CbYCr

Then it tells us which formats can be used with OpenCV directly.

Opencv compatible formats:
0 Mono8
1 Bgr8

11. Now that we have this info, let’s tell the camera which format we would like to use. We will use the Bgr8 format. Keep in mind that this format performs debayering on the camera, which decreases the frame rate. So if frame rate is important, we will need to use a different format and perform debayering on the host computer.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        cam.set_pixel_format(PixelFormat.Bgr8)
12. Before taking an image, set the exposure (which we can look up in the list of features) to 2 milliseconds, or 2/1000 of a second. This is just an estimate, to use as a starting point.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # Set exposure to 2000us, i.e. 2 milliseconds
        exposure_time = cam.ExposureTime
        exposure_time.set(2000)
        print(f"Exposure changed to: {(exposure_time.get()/1000):.0f} ms")

Now we can grab an image.
13. First, get the camera instance, as usual.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()

14. Then grab a frame.

    with cams[0] as cam:
        frame = cam.get_frame().as_opencv_image()

15. Now convert it to RGB so we can display the frame as an image.

        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

16. And now display the frame. Note that, since this article is focused on how to use the camera, the code for image display is omitted, but feel free to look it up in the repository.

show_image_with_histogram(rgb, 5)

Here is the result, which is too dark.

17. Let’s reset it with a slightly longer exposure…

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # Set exposure to 20000us, i.e. 20 milliseconds
        exposure_time = cam.ExposureTime
        exposure_time.set(20000)
        print(f"Exposure changed to: {(exposure_time.get()/1000):.0f} ms")

18. …and then run image capture again.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # We can now acquire a single frame from the camera and have it changed to
        # an opencv-compatible format, i.e. a properly shaped numpy array.
        frame = cam.get_frame().as_opencv_image()
        # Change image format to RGB for display
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
show_image_with_histogram(rgb, 5)

This is much better, but you might notice that the image is very red. If you look at the histogram inset, you can see that the red curve is shifted to the right of the blue and the green, indicating an excess of red pixels. In this case, it’s probably because the camera retains some previous color balance adjustment, but in practice there are a variety of reasons this can happen.
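A quick way to quantify a cast like this without plotting a histogram is to compare per-channel means with numpy. A sketch on a synthetic frame with an exaggerated red cast:

```python
import numpy as np

# Synthetic RGB frame with an exaggerated red cast (channel order R, G, B)
rgb = np.zeros((120, 160, 3), dtype=np.uint8)
rgb[..., 0] = 220  # red pushed high
rgb[..., 1] = 90
rgb[..., 2] = 90

# A red mean well above green and blue confirms the cast numerically
r_mean, g_mean, b_mean = rgb.reshape(-1, 3).mean(axis=0)
```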

19. We need to ask the camera to redo the white balance, to get a more reasonable color representation.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        wba = cam.BalanceWhiteAuto
        print(f"White balance is set to: {wba.get()}")
        # 'Once' runs a single automatic white balance pass
        wba.set('Once')
        print(f"Changed white balance to: {wba.get()}")

20. We will need to capture two images, since the camera performs white balance correction using data from a captured image.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # We can now acquire a single frame from the camera and have it changed to
        # an opencv-compatible format, i.e. a properly shaped numpy array.
        frame = cam.get_frame().as_opencv_image()
        # Change image format to RGB for display
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
show_image_with_histogram(rgb, 5)
This looks much better, and the histogram confirms it: red, green, and blue now appear in roughly equal proportions.

So now that our camera is set up and we can get image data directly into Python, let’s run a quick test to see if we can perform a simple inference using PyTorch.

21. We will need to set up some fonts so we can annotate images.

font_scale = 2
font_color = (0,255,0)
font_thick = 4

22. Next, get a YOLOv5 pre-trained model from Ultralytics.

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

23. Check what classes are included in this model.

classes = model.names

If we output all of the classes, we see that ‘cup’ is one of the classes included in this dataset. This is why we’ve chosen to take pictures of coffee mugs.
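model.names maps integer class indices to label strings; here’s a stand-in sketch of the lookup, using an abbreviated hypothetical dict in place of the model’s full COCO class list:

```python
# Abbreviated stand-in for model.names (index -> label)
classes = {0: 'person', 39: 'bottle', 41: 'cup'}

# Find the indices whose label is 'cup'; detections carry the class
# index in their last column, so this is what we would match against
cup_ids = [idx for idx, name in classes.items() if name == 'cup']
```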

24. Now acquire an image and convert it to RGB.

with Vimba.get_instance() as vimba:
    cams = vimba.get_all_cameras()
    with cams[0] as cam:
        # We can now acquire a single frame from the camera and have it changed to
        # an opencv-compatible format, i.e. a properly shaped numpy array.
        frame = cam.get_frame().as_opencv_image()
        # Change image format to RGB for display
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

25. Then run inference on the acquired image.

        results = model(rgb)

26. Iterate over the inference results to annotate images, in case anything was recognized.

if len(results.xyxy[0]) > 0:
    for result in results.xyxy[0]:
        # print(f"{result}")
        pt1 = (int(result[0]), int(result[1]))
        pt_text_place = (int(result[0]), int(result[1])+100)
        pt2 = (int(result[2]), int(result[3]))
        rgb = cv2.rectangle(rgb, pt1, pt2, (0, 0, 255), 5)
        # Add the class name to the image
        rgb = cv2.putText(rgb, classes[int(result[5])], pt_text_place,
                          cv2.FONT_HERSHEY_SIMPLEX, font_scale,
                          font_color, font_thick)
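Each row of results.xyxy[0] has the form [x1, y1, x2, y2, confidence, class index]. A stand-in sketch of the coordinate conversion above, using a hypothetical detection row instead of a live model:

```python
# Hypothetical detection row: [x1, y1, x2, y2, confidence, class_index]
result = [100.5, 200.2, 300.9, 400.1, 0.87, 41.0]

# cv2.rectangle and cv2.putText need integer pixel coordinates
pt1 = (int(result[0]), int(result[1]))                   # top-left corner
pt2 = (int(result[2]), int(result[3]))                   # bottom-right corner
pt_text_place = (int(result[0]), int(result[1]) + 100)   # label position
label_index = int(result[5])                             # index into model.names
```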

27. Finally, display the annotated image.

show_image_with_histogram(rgb, 4, False)

The possibilities of machine vision and image inference are enormous, and as you can see, the setup is surprisingly straightforward. So whether you’re a veteran developer or someone just getting started in tech, this is a capability that’s accessible to you.

A relatively inexpensive camera and some readily available libraries can let you pull usable data from just about anything on earth that you can point a camera at. The bigger challenge is what comes next: to turn all that data into a product or user experience that’s genuinely useful. 

About Boris Kontorovich

Boris Kontorovich is a system engineer with a deep understanding of the technology stack required to make some of the most advanced products in use today. Boris brings expertise in integrating technical knowledge and engineering with creative discipline. His most notable clients include NASA, JPL, Sirona Dental, PepsiCo, Starbucks, and UCB.
