
How to YOLO(v8)


YOLO (You Only Look Once) is a family of deep learning object detection algorithms. The version covered here, YOLOv8, is developed by Ultralytics. It uses a convolutional neural network (CNN) to identify objects based on their features.

There are different versions of YOLO. This documentation is for YOLOv8. Additionally, YOLOv8 has multiple available sizes: yolov8n (Nano), yolov8s (Small), yolov8m (Medium), yolov8l (Large), and yolov8x (Extra Large). The smallest size is the least accurate but is the fastest to train and run. As size increases the models get more accurate, but they also get slower.


  1. Python version the team is using
    1. Currently 3.10.x
    2. pip is also required
  2. Python Pip packages needed (name — pip-package-name [opt])
    1. Numpy — numpy
    2. OpenCV — opencv-python
    3. Ultralytics (YOLOv8) — ultralytics
    4. PyTorch — torch
      1. PyTorch Vision — torchvision
      2. PyTorch Audio — torchaudio opt?
    5. SciPy — scipy opt
    6. matplotlib — matplotlib opt
    7. NOTE: if an import error reports that the _lzma module is missing, then you need to:
      1. run $ sudo apt install lzma liblzma-dev
      2. uninstall and reinstall Python
        1. and reinstall all needed packages
    8. Other packages approved/required for vision team and needed for completion of the task
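
The checklist above can be verified with a short script. This is a minimal sketch, not an official team tool: the function names are hypothetical, the module names are the import names of the pip packages listed (for example, opencv-python is imported as cv2), and the 3.10 check simply mirrors the version stated above.

```python
import importlib.util
import sys

# Import names for the pip packages listed above (opencv-python imports as cv2).
REQUIRED = ["numpy", "cv2", "ultralytics", "torch", "torchvision"]
OPTIONAL = ["torchaudio", "scipy", "matplotlib"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Summarize whether the interpreter and packages match the checklist."""
    return {
        "python_ok": sys.version_info[:2] == (3, 10),
        "lzma_ok": importlib.util.find_spec("_lzma") is not None,
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If lzma_ok is False, the _lzma note in the list above applies.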



To train a YOLO model, you need images to feed it.

A LOT of images.

Additionally, these images need to be annotated. Annotations are additional files that correspond to images. These files contain data about relevant objects in the image.

Each different type of object that the model needs to identify needs to be represented as a class. The dataset to be used for training needs to include: the definition of all the classes, the images, and a description of which objects are in each image and where they are.

The file hierarchy

(this is not strict, but let’s stick to one convention)

├── datasets/
|	|
|	└── dataset_name/
|		|
|		├── train/
|		|	|
|		|	├── images/
|		|	|	└── all_your_pics.jpg
|		|	|
|		|	└── labels/
|		|		└── all_your_pics.txt (same names as image files)
|		|
|		└── valid/
|			|
|			├── images/
|			|	└── different_pics.jpg
|			|
|			└── labels/
|				└── different_pics.txt (same names as image files)
└── dataset_name.yaml
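
The layout above can be created and sanity-checked from Python. The following is a sketch under the assumption that the convention shown is followed exactly; both function names are hypothetical. The second helper lists images that have no label file with the same name. A missing label file is not necessarily an error (an image may simply contain no objects), so treat the result as a list to review.

```python
from pathlib import Path


def create_skeleton(root, dataset_name):
    """Create datasets/<dataset_name>/{train,valid}/{images,labels} under root."""
    base = Path(root) / "datasets" / dataset_name
    for split in ("train", "valid"):
        for kind in ("images", "labels"):
            (base / split / kind).mkdir(parents=True, exist_ok=True)
    return base


def images_without_labels(split_dir):
    """Return names of image files in split_dir/images with no matching .txt label."""
    split_dir = Path(split_dir)
    label_stems = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(
        p.name
        for p in (split_dir / "images").iterdir()
        if p.is_file() and p.stem not in label_stems
    )
```

For example, images_without_labels("datasets/dataset_name/train") returns the training images that still need an annotation file.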

The files

Labels (annotations)

File format (one line per object; the four box values are normalized to the range 0 to 1):

<object-class> <x-center> <y-center> <width> <height>
<object-class> <x-center> <y-center> <width> <height>

Normalization formula:

    val - min_val
--------------------- = val_normalized
  max_val - min_val

Normalization maps a range of data [min_val, max_val] (for any min_val < max_val) onto the range [0, 1]. When normalizing pixel coordinates in an image, min_val is 0 (because pixel indices start at 0), max_val is width-1 for x coordinates and height-1 for y coordinates, and val is the value being normalized. The result is a number between 0 and 1.
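
As a worked example, the formula can be applied to a bounding box to produce one label line. This is a sketch with hypothetical function names; it assumes the box is given as pixel corners (x_min, y_min, x_max, y_max) and follows the convention described above, where the largest pixel index is width-1 or height-1. Other tooling may divide by the full width and height instead, so it is worth checking against whatever produces your labels.

```python
def normalize(val, min_val, max_val):
    """Map val from [min_val, max_val] onto [0, 1]."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    return (val - min_val) / (max_val - min_val)


def to_label_line(class_id, x_min, y_min, x_max, y_max, img_width, img_height):
    """Build one '<object-class> <x-center> <y-center> <width> <height>' line."""
    max_x = img_width - 1   # largest valid x pixel index
    max_y = img_height - 1  # largest valid y pixel index
    x_center = normalize((x_min + x_max) / 2, 0, max_x)
    y_center = normalize((y_min + y_max) / 2, 0, max_y)
    # Box width and height are spans, so they are scaled by the same ranges.
    width = (x_max - x_min) / max_x
    height = (y_max - y_min) / max_y
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 101x101 image with a class-0 box from (20, 40) to (60, 80):
print(to_label_line(0, 20, 40, 60, 80, img_width=101, img_height=101))
# prints: 0 0.400000 0.600000 0.400000 0.400000
```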

YAML file (dataset_name.yaml)

This is the file that specifies the location of the dataset, the training data, the validation data, and the classes of objects the model will identify (each with its class index).

# the dataset YAML file
path: dataset_name/
train: 'train/images'
val: 'valid/images'

# class names (index: name)
names:
  0: 'obj_class_0'
  1: 'obj_class_1'
  2: 'obj_class_2'
  .: 'obj_class_...'
  .: 'foo'
  .: 'obj_class_...'
  .: 'bar'
  .: 'obj_class_...'
  n: 'obj_class_n'
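
If the class list is long, the YAML file can be generated instead of typed by hand. The helper below is a hypothetical sketch that uses plain string formatting and no YAML library; it assumes class names contain no single quotes and that the list order defines the class indices.

```python
def build_dataset_yaml(path, train, val, class_names):
    """Return the text of a dataset YAML file in the layout shown above."""
    lines = [
        "# the dataset YAML file",
        f"path: {path}",
        f"train: '{train}'",
        f"val: '{val}'",
        "",
        "# class names (index: name)",
        "names:",
    ]
    # The position in class_names becomes the class index used in the label files.
    lines += [f"  {index}: '{name}'" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_dataset_yaml(
        "dataset_name/", "train/images", "valid/images", ["foo", "bar"]
    )
    with open("dataset_name.yaml", "w", encoding="utf-8") as handle:
        handle.write(text)
```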


With a dataset created, the next step is to train a model.

from ultralytics import YOLO

# Load a model
model = YOLO('')  # put a model name or weights path here, e.g. 'yolov8n.pt' for the Nano size

# Train the model
results = model.train(data='dataset_name.yaml', epochs=100, imgsz=640)
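
After training finishes, the weights need to be located before they can be loaded. The sketch below assumes the Ultralytics default output layout, where each detection run is written to runs/detect/train, runs/detect/train2, and so on, with the best checkpoint at weights/best.pt. If a custom project or name was passed to train(), the path will differ. The function name is hypothetical.

```python
from pathlib import Path


def find_latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified train*/weights/best.pt, or None."""
    candidates = list(Path(runs_dir).glob("train*/weights/best.pt"))
    if not candidates:
        return None
    # Pick the newest checkpoint by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_latest_best_weights())
```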


Using the Trained Model

Once training has saved the model as a .pt file, it can be loaded and passed an image, either as a numpy array or as a path to an image file. Passing the save=True and save_txt=True kwargs to predict() additionally writes output to files in the project directory: save=True saves the annotated images and save_txt=True saves the detections as text files.

from ultralytics import YOLO

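# load your trained model (point this at the saved .pt file)
model = YOLO("path/to/your/trained/")

# img can be a numpy array (e.g. from cv2.imread) or a path to an image file
img = "path/to/image.jpg"  # placeholder path

# generate the results
results = model.predict(img)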
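
The returned results can then be read back as plain Python data. The helper below is a sketch with a hypothetical name; it assumes each item in results exposes boxes.cls, boxes.conf, and boxes.xyxy (tensors with a tolist() method) and a names mapping from class index to class name, which is how Ultralytics detection results are structured.

```python
def summarize_detections(results):
    """Flatten detection results into a list of plain dictionaries."""
    detections = []
    for result in results:
        boxes = result.boxes
        rows = zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist())
        for class_id, confidence, xyxy in rows:
            class_id = int(class_id)  # class indices come back as floats
            detections.append(
                {
                    "class_id": class_id,
                    "class_name": result.names[class_id],
                    "confidence": float(confidence),
                    # Pixel corners of the box: x_min, y_min, x_max, y_max.
                    "box_xyxy": [float(v) for v in xyxy],
                }
            )
    return detections
```

Calling summarize_detections(results) on the output of model.predict(img) gives one dictionary per detected object.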
