Introduction

This project combines an object detection model and a classification model to build a facial mood detection system.

Modern camera systems are quite adept at detecting faces and inferring attributes such as age, gender, and mood from the facial image. This project aims to develop such a system using the FastAI and Ultralytics libraries, to get a better sense of the APIs, callbacks, and fine-tuning process for computer vision models.

Goals

The primary objective is to design a machine learning model that can accurately detect faces in an image and categorize each one by the person's mood.

This is done with a two-step architecture. An object detection module, YOLOv8, isolates the faces; then ResNet34, a robust convolutional neural network (CNN), classifies the mood from the facial expression.
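
As a rough illustration, a minimal sketch of how the two stages chain together is shown below. The weight file names ("yolov8n_faces.pt", "mood_classifier.pkl") are hypothetical placeholders, not the project's actual artifacts.

```python
from ultralytics import YOLO
from fastai.vision.all import load_learner, PILImage

detector = YOLO("yolov8n_faces.pt")               # step 1: face detector
classifier = load_learner("mood_classifier.pkl")  # step 2: mood classifier

def detect_moods(image_path):
    result = detector(image_path)[0]           # detections for one image
    img = PILImage.create(image_path)
    moods = []
    for box in result.boxes.xyxy.tolist():     # [x1, y1, x2, y2] per face
        x1, y1, x2, y2 = map(int, box)
        face = img.crop((x1, y1, x2, y2))      # isolate the face region
        pred, _, _ = classifier.predict(face)  # classify the expression
        moods.append(((x1, y1, x2, y2), str(pred)))
    return moods
```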

Work Done

In this project, YOLOv8 was used for face detection on the WIDER FACE dataset. The initial steps involved preparing data loaders tailored to the dataset, splitting it into training, validation, and test sets, and converting the labels into the format YOLO expects.
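
The label conversion boils down to rescaling each pixel-space WIDER FACE box into the normalized center/width/height line that YOLO label files expect. A small sketch (the function name is illustrative):

```python
def wider_box_to_yolo(box, img_w, img_h, cls=0):
    """Convert a pixel-space WIDER FACE box (x1, y1, w, h) into the
    normalized "class x_center y_center width height" YOLO label line."""
    x1, y1, w, h = box
    xc = (x1 + w / 2) / img_w          # normalized box-center x
    yc = (y1 + h / 2) / img_h          # normalized box-center y
    return f"{cls} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# e.g. a 100x120 face at (40, 60) in a 640x480 image:
print(wider_box_to_yolo((40, 60, 100, 120), 640, 480))
# -> "0 0.140625 0.250000 0.156250 0.250000"
```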

The model was fine-tuned for 10 epochs and detected faces reliably, albeit with relatively low confidence scores. A single-step architecture, with one detection class per mood, could have been used instead; that would have been faster at inference but could have compromised accuracy.
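
The fine-tuning itself uses the standard Ultralytics training API. A sketch is below, assuming a dataset config file named "wider_face.yaml" and the nano-sized model (the exact model size is an assumption):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")           # pretrained COCO weights; model size
                                     # used here is an assumption
model.train(data="wider_face.yaml",  # hypothetical dataset config file
            epochs=10, imgsz=640)    # 10 epochs, as in this project
metrics = model.val()                # reports mAP50 etc. on the val split
```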

Next, I imported the ResNet34 architecture to serve as the backbone of the classification model, paired with a Kaggle facial expression recognition dataset. To facilitate fine-tuning, the final layers of ResNet34 were replaced to output the mood classes: Angry, Surprise, Disgust, Sad, Happy. The process began with an initial fine-tuning stage in which the learning rate was determined dynamically using FastAI's learning rate finder. The model was then fine-tuned for three epochs with all layers frozen except the last.
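
A minimal sketch of this first stage with the fastai API is below; the dataset path ("fer_dataset", one subfolder per mood class) and the resize settings are assumptions:

```python
from fastai.vision.all import (ImageDataLoaders, Resize, accuracy,
                               resnet34, vision_learner)

# Assumed layout: fer_dataset/<mood>/<image>.jpg
dls = ImageDataLoaders.from_folder("fer_dataset", valid_pct=0.2,
                                   item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=accuracy)  # new head sized to the mood classes
lr = learn.lr_find().valley   # the learning-rate-finder step
learn.fit_one_cycle(3, lr)    # 3 epochs; body frozen, only the head trains
```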

Following this, the learning rate finder was run again and the model was fine-tuned for five more epochs with all layers unfrozen, reaching 65% accuracy on the classification task.
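
Continuing from the learner above, the second stage might look like this; the discriminative learning-rate slice is illustrative:

```python
learn.unfreeze()                            # make all layers trainable
lr = learn.lr_find().valley                 # second learning-rate search
learn.fit_one_cycle(5, slice(lr / 10, lr))  # 5 epochs, whole network trains
print(learn.validate())                     # final loss and accuracy
```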

Results

Detection achieved an mAP50 score of 0.58, while classification reached 65% accuracy. Most misclassifications were visually ambiguous cases, such as an angry face that looked like surprise. Overall the model performed well.

A Hugging Face demo of the model can be seen here; you can upload images and test it yourself. The model was trained on a GPU, but inference runs on a CPU, so it takes some time to produce results.
