Finished
Unbiased Dataset Generation
About The Project

Goals

Most existing facial datasets for machine learning have an imbalanced class distribution. For example, 70% of a dataset's pictures might belong to the class Male, or 80% to the class White. This imbalance propagates implicit biases against the least represented classes into facial detection tools. The goal of this project is to provide software that generates facial datasets tailored to your needs.

Consider the following scenario: AUB wants to move to an ID-less campus, where students no longer present their identification cards at the gate but are instead identified by cameras as they enter. Unfortunately, existing datasets are not sufficient to train such a model, because most of them consist of White females and males and depict faces from a single angle. Here, AUB gets the assistance of our startup, which generates these datasets from existing video feeds or from a small sample of faces.

The process: All faces are extracted from the video feeds, covering all ethnicities, different clothing styles, and different perspectives. Because every face picture is extracted, the same face will appear in multiple pictures, from different perspectives and with different clothing. We handle this with DBSCAN clustering, which groups all pictures of the same face into one cluster. The extracted and clustered faces are then passed to several models that categorize them by age, gender, and ethnicity, producing distribution statistics over the desired classes (for example, the current dataset has 20% males, 30% Black females, and so on).

Data shortages are to be expected, for example when AUB cannot provide enough video feeds. However, small samples will be sufficient for us to generate fake pictures of faces using GAN models. These models will be built and tuned to generate fake pictures that augment the dataset or make up most of it. All our datasets will guarantee an equal distribution among classes to prevent implicit biases in the data, since our main goal is to build unbiased AI.

Moreover, our startup will provide support and consulting services to clients while they build their models. In this scenario, our scientists and engineers will help AUB throughout the development of its model. In the future, we plan to expand to other kinds of data and applications (textual data, speech data, medical data, etc.).
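
The clustering and statistics steps of the process described above can be illustrated with a small sketch. It is not the project's implementation: it uses a minimal pure-Python DBSCAN on toy 2-D points standing in for face embeddings (real embeddings would be much higher-dimensional, and a library implementation would likely be used in practice). The values of `eps` and `min_samples`, the helper `shortfall_per_class`, and the class assigned to each identity are all assumptions made for illustration.

```python
import math
from collections import Counter


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN over a list of numeric tuples.

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
    A point counts as its own neighbor.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def shortfall_per_class(class_labels, all_classes):
    """How many extra samples each class needs to match the largest class."""
    counts = Counter(class_labels)
    target = max((counts[c] for c in all_classes), default=0)
    return {c: target - counts[c] for c in all_classes}


# Toy "embeddings": three people seen several times, plus one stray detection.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # person A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # person B
    (9.0, 9.0), (9.1, 9.0),               # person C
    (10.0, 0.0),                          # isolated point, expected to be noise
]
identity = dbscan(embeddings, eps=0.5, min_samples=2)
print(identity)  # [0, 0, 0, 1, 1, 1, 2, 2, -1]

# Hypothetical classifier output: one class per clustered identity.
class_of_identity = {0: "male", 1: "male", 2: "female"}
needed = shortfall_per_class(class_of_identity.values(), ["male", "female"])
print(needed)  # {'male': 0, 'female': 1}
```

Counting one sample per identity rather than per picture keeps a person who appears in many frames from inflating their class. The shortfall per class is then the number of samples that would have to be collected or generated to reach an equal distribution.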

Challenges

One of the most prominent challenges is that existing datasets are not sufficient to build the required models. I have previously built models that categorize faces by ethnicity, age group, and gender, and the preliminary results are promising. However, building the GAN models that will augment a dataset or create most of it requires immense amounts of data that do not exist yet, because we need pictures of faces from all ethnicities, with and without Hijab, and from different angles and perspectives. This challenge could be overcome by building our own datasets by scraping the internet and other open resources. Semi-supervised learning might also be a good way to label such large amounts of data. Storing these datasets and training on them requires significant computing power and days of training, but I believe that AUB's supercomputer, existing machine learning clouds, and GPUs will greatly shorten the training and testing cycle, allowing us to test multiple hypotheses.
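
As an illustration of the semi-supervised labeling idea mentioned above, here is a minimal sketch of pseudo-labeling, one common approach. The project does not say which semi-supervised method it would use, so treat this as an assumption. The function name `select_pseudo_labels`, the 0.9 confidence threshold, and the example probabilities are hypothetical; in practice the probabilities would come from a classifier trained on the small labeled set.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only the unlabeled samples the model is confident about.

    predictions: one dict per unlabeled sample, mapping class name -> probability.
    Returns a list of (sample_index, predicted_class) pairs.
    """
    selected = []
    for index, probabilities in enumerate(predictions):
        best_class = max(probabilities, key=probabilities.get)
        if probabilities[best_class] >= threshold:
            selected.append((index, best_class))
    return selected


# Hypothetical model outputs for three unlabeled face pictures.
predictions = [
    {"male": 0.97, "female": 0.03},
    {"male": 0.55, "female": 0.45},  # too uncertain, stays unlabeled
    {"male": 0.08, "female": 0.92},
]
pseudo_labeled = select_pseudo_labels(predictions, threshold=0.9)
print(pseudo_labeled)  # [(0, 'male'), (2, 'female')]
```

The confident samples would be added to the labeled set and the model retrained, repeating until few new samples pass the threshold. A lower threshold labels more data but lets more wrong labels into the training set.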
Methods
  • Computer Science
  • Machine Learning
  • Deep Learning
Academic Majors of Interest
  • Computer Science
  • Computer Engineering

Preferred Skills

  • Python
  • Machine Learning
  • Deep Learning
  • PyTorch
  • TensorFlow
  • Feature Engineering
  • MLOps
  • Data Processing
  • Web Scraping
