How to YOLO(v8)
YOLO (You Only Look Once) is a family of deep learning object detection models; YOLOv8 is developed and maintained by Ultralytics. It uses a convolutional neural network (CNN) to identify objects in images based on their visual features.
There are different versions of YOLO. This documentation is for YOLOv8. Additionally, YOLOv8 has multiple available sizes: yolov8n (Nano), yolov8s (Small), yolov8m (Medium), yolov8l (Large), and yolov8x (Extra Large). The smallest size is the least accurate but is the fastest to train and run. As size increases the models get more accurate, but they also get slower.
Dependencies
- Python version the team is using
  - Currently 3.10.x
  - pip is also needed
- Python pip packages needed (Name — pip-package-name [opt]):
  - NumPy — numpy
  - OpenCV — opencv-python
  - Ultralytics (YOLOv8) — ultralytics
  - PyTorch — torch
  - PyTorch Vision — torchvision
  - PyTorch Audio — torchaudio [opt?]
  - SciPy — scipy [opt]
  - Matplotlib — matplotlib [opt]
- NOTE: if an import error occurs saying that the _lzma package is missing, then you need to:
  - run $ sudo apt install lzma liblzma-dev
  - uninstall and reinstall Python
  - reinstall all needed pip packages
- Other packages approved/required for the vision team and needed for completion of the task
Training
Datasets
To train a YOLO model, you need images to feed it.
A LOT of images.
Additionally, these images need to be annotated. Annotations are additional files that correspond to images. These files contain data about relevant objects in the image.
Each different type of object that the model needs to identify needs to be represented as a class. The dataset to be used for training needs to include: the definition of all the classes, the images, and a description of which objects are in each image and where they are.
The file hierarchy
(this is not strict, but let’s stick to one convention)
your_project/
|
├── datasets/
| |
| └── dataset_name/
| |
| ├── train/
| | |
| | ├── images/
| | | └── all_your_pics.jpg
| | |
| | └── labels/
| | └── all_your_pics.txt (same names as image files)
| |
| └── valid/
| |
| ├── images/
| | └── different_pics.jpg
| |
| └── labels/
| └── different_pics.txt (same names as image files)
|
└── dataset_name.yaml
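The layout above can be created programmatically. This is a minimal sketch using only the standard library; the function name and the "your_project"/"dataset_name" placeholders are illustrative, not part of any YOLO API:

```python
from pathlib import Path

def make_dataset_dirs(root: str, dataset_name: str) -> None:
    """Create the conventional dataset layout described above."""
    base = Path(root) / "datasets" / dataset_name
    for split in ("train", "valid"):
        (base / split / "images").mkdir(parents=True, exist_ok=True)
        (base / split / "labels").mkdir(parents=True, exist_ok=True)
    # the dataset YAML sits at the project root, next to datasets/
    (Path(root) / f"{dataset_name}.yaml").touch()
```

Images then go into the images/ directories and their matching label files into labels/.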
The files
Images
- Sets of pictures that contain some/all of the objects to identify
- Use varied pictures for best results
Labels (annotations)
- One file per image (must have same name as image file)
- One line in the file per object in the corresponding image
File format:
<object-class> <x-center> <y-center> <width> <height>
<object-class> <x-center> <y-center> <width> <height>
.
.
.
object-class
- The index (0 - n) of the class in dataset_name.yaml (see below)
x-center, y-center
- x and y coordinates of the center of the minimum upright bounding box around the object in the image
- Normalized to [0, 1]
width, height
- Width and height of the minimum bounding box around the object in the image
- Normalized to [0, 1]
Normalization formula:
val - min_val
--------------------- = val_normalized
max_val - min_val
Normalization is the process of taking any range of data (a range [min_val, max_val] for any min_val < max_val) and mapping it into the range [0, 1]. When normalizing coordinates on images, min_val is 0 (pixel indices start at 0), max_val is the image width or height (for x and y coordinates respectively), and val is the pixel value to be normalized. The result is a number between 0 and 1.
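Putting the format and the normalization together, a pixel-space box (here given as top-left corner plus size, a common annotation convention) can be converted into one YOLO label line. This is an illustrative sketch; the function name and example numbers are made up:

```python
def to_yolo_line(class_idx, x, y, box_w, box_h, img_w, img_h):
    """Convert a pixel-space box (top-left corner + size) into a
    normalized YOLO label line:
    <object-class> <x-center> <y-center> <width> <height>"""
    x_center = (x + box_w / 2) / img_w   # center, normalized by image width
    y_center = (y + box_h / 2) / img_h   # center, normalized by image height
    return (f"{class_idx} {x_center:.6f} {y_center:.6f} "
            f"{box_w / img_w:.6f} {box_h / img_h:.6f}")

# example: a 100x50 box at (200, 150) in a 640x480 image, class 0
print(to_yolo_line(0, 200, 150, 100, 50, 640, 480))
```

Every coordinate in the output line falls in [0, 1], as the label format requires.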
YAML file (dataset_name.yaml)
This is the file that specifies the location of the dataset, the training and validation data, and the classes of objects the model will identify (each with an index).
# the dataset YAML file
path: dataset_name/
train: 'train/images'
val: 'valid/images'
# class names
names:
  0: 'obj_class_0'
  1: 'obj_class_1'
  2: 'obj_class_2'
  ...
  n: 'obj_class_n'
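Since the class indices in the label files must line up with the indices in this YAML, it can help to generate the file from one ordered list of class names. A minimal sketch using plain string formatting (the function name is illustrative; no extra YAML library is assumed):

```python
def make_dataset_yaml(dataset_name, class_names):
    """Build the dataset YAML text described above. class_names is an
    ordered list; its positions become the label-file class indices."""
    lines = [
        "# the dataset YAML file",
        f"path: {dataset_name}/",
        "train: 'train/images'",
        "val: 'valid/images'",
        "",
        "# class names",
        "names:",
    ]
    lines += [f"  {i}: '{name}'" for i, name in enumerate(class_names)]
    return "\n".join(lines)

print(make_dataset_yaml("dataset_name", ["obj_class_0", "obj_class_1"]))
```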
Training
With a dataset created, the next step is to train a model.
from ultralytics import YOLO
# Load a model
model = YOLO('yolov8n.pt')
# Train the model
results = model.train(data='dataset_name.yaml', epochs=100, imgsz=640)
Breakdown:
yolov8n.pt
- This specifies the model to use. yolov8n is the Nano model; larger sizes can be used and will be slower to train but more accurate
dataset_name.yaml
- This is the previously created YAML file that specifies information about the training dataset
- epochs
- This is the number of times the model will train on the input data
- More epochs will take longer but should hopefully result in a more accurate model
- imgsz
- The max of the length and width of the input images
- If images are not this size, then they will automatically be resized, preserving the aspect ratio of objects in the image
- the picture will not be stretched, ie. circles will stay circles
- Other parameters
- There are other optional parameters that can be passed to this function to control aspects of the training
- These parameters can be seen on Ultralytics’s Docs
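The aspect-ratio-preserving resize described for imgsz can be sketched as below. This only computes the scaled dimensions (the actual Ultralytics pipeline also pads the image to a square), so treat it as an illustration of the idea, not the library's implementation:

```python
def scaled_dims(img_w, img_h, imgsz=640):
    """Scale so the longer side equals imgsz, preserving the aspect
    ratio (the picture is not stretched: circles stay circles)."""
    scale = imgsz / max(img_w, img_h)
    return round(img_w * scale), round(img_h * scale)

print(scaled_dims(1920, 1080))  # the longer side becomes 640
```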
Using the Trained Model
The new model can be loaded once it has been saved as a .pt file. The model can then be passed an image, either as a NumPy array or as an image path. The save=True and save_txt=True kwargs can be passed to the predict() function to have more verbose output saved to files in the project directory.
from ultralytics import YOLO
# load your trained model
model = YOLO("path/to/your/trained/model.pt")
# generate the results (predict() accepts a NumPy array or an image path)
results = model.predict("path/to/image.jpg")
References
- YOLO Tutorial
- Ultralytics YOLO Docs
- Data Augmentation Whitepaper (for future reference)
- Wikipedia CNN Page