”工欲善其事,必先利其器。“—孔子《论语.录灵公》
首页 > 编程 > 使用 VGG 进行人脸和性别识别

使用 VGG 进行人脸和性别识别

发布于2024-11-12
浏览:355

Using VGGfor face and gender recognition

How to build a face and gender recognition Python project using deep learning and VGG16.

What is deep learning?

Deep learning is a subcategory of machine learning, a neural network with three or more layers. These neural networks try to simulate the behavior of the human brain by learning from large amounts of data. While a neural network with a single layer can still make approximate predictions, additional hidden layers can help to optimize and refine for accuracy.

Deep learning improves automation by performing tasks without human intervention. Deep learning can be found in digital assistants, voice-enabled TV remotes, credit card fraud detection, and self-driving cars.

Building the Python project

** Check out the full code on GitHub: https://github.com/alexiacismaru/face-recognision

Download the VGG16 Face Dataset and the Haar Cascade XML file used for face detection which will be used for the preprocessing in the face recognition task.

faceCascade = cv2.CascadeClassifier(os.path.join(base_path, "haarcascade_frontal_face_default.xml")) # haar cascade detects faces in images

vgg_face_dataset_url = "http://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz"

with request.urlopen(vgg_face_dataset_url) as r, open(os.path.join(base_path, "vgg_face_dataset.tar.gz"), 'wb') as f:
  f.write(r.read())

# extract VGG dataset
with tarfile.open(os.path.join(base_path, "vgg_face_dataset.tar.gz")) as f:
  f.extractall(os.path.join(base_path))

# download Haar Cascade for face detection
trained_haarcascade_url = "https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml"
with request.urlopen(trained_haarcascade_url) as r, open(os.path.join(base_path, "haarcascade_frontalface_default.xml"), 'wb') as f:
    f.write(r.read())

Selectively load and process a specific number of images for a set of predefined subjects from the VGG Face Dataset.

# populate the list with the files of the celebrities that will be used for face recognition
all_subjects = [subject for subject in sorted(os.listdir(os.path.join(base_path, "vgg_face_dataset", "files"))) if subject.startswith("Jesse_Eisenberg") or subject.startswith("Sarah_Hyland") or subject.startswith("Michael_Cera") or subject.startswith("Mila_Kunis") and subject.endswith(".txt")]

# define number of subjects and how many pictures to extract
nb_subjects = 4
nb_images_per_subject = 40

Iterate through each subject’s file by opening a text file associated with the subject and reading the contents. Each line in these files contains a URL to an image. For each URL (which points to an image), the code tries to load the image using urllib and convert it into a NumPy array.

images = []

for subject in all_subjects[:nb_subjects]:
  with open(os.path.join(base_path, "vgg_face_dataset", "files", subject), 'r') as f:
    lines = f.readlines()

  images_ = []
  for line in lines:
    url = line[line.find("http://"): line.find(".jpg")   4]

    try:
      res = request.urlopen(url)
      img = np.asarray(bytearray(res.read()), dtype="uint8")
      # convert the image data into a format suitable for OpenCV
      # images are colored 
      img = cv2.imdecode(img, cv2.IMREAD_COLOR)
      h, w = img.shape[:2]
      images_.append(img)
      cv2_imshow(cv2.resize(img, (w // 5, h // 5)))

    except:
      pass

    # check if the required number of images has been reached
    if len(images_) == nb_images_per_subject:
      # add the list of images to the main images list and move to the next subject
      images.append(images_)
      break

Face detection set up

Using VGGfor face and gender recognition

  1. Locate one or more faces in the image and put it in a box.
  2. Make sure the face is consistent with the database, such as geometry and photometrics.
  3. Extract features from the face that can be used for the recognition task.
  4. Match the face to one or more known faces in a prepared database.
# create arrays for all 4 celebrities
jesse_images = []
michael_images = []
mila_images = []
sarah_images = []

faceCascade = cv2.CascadeClassifier(os.path.join(base_path, "haarcascade_frontalface_default.xml"))

# iterate over the subjects
for subject, images_ in zip(all_subjects, images):

  # create a grayscale copy to simplify the image and reduce computation
  for img in images_:
    img_ = img.copy()
    img_gray = cv2.cvtColor(img_, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        img_gray,
        scaleFactor=1.2,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )
    print("Found {} face(s)!".format(len(faces)))

    for (x, y, w, h) in faces:
        cv2.rectangle(img_, (x, y), (x w, y h), (0, 255, 0), 10)

    h, w = img_.shape[:2]
    resized_img = cv2.resize(img_, (224, 224))
    cv2_imshow(resized_img)

    if "Jesse_Eisenberg" in subject:
        jesse_images.append(resized_img)
    elif "Michael_Cera" in subject:
        michael_images.append(resized_img)
    elif "Mila_Kunis" in subject:
        mila_images.append(resized_img)
    elif "Sarah_Hyland" in subject:
        sarah_images.append(resized_img)

The detectMultiScale method recognizes faces in the image. It then returns the coordinates of rectangles where it believes faces are located. For each face, a rectangle is drawn around it in the image, indicating the face’s location. Each image is resized to 224x224 pixels.

Split the dataset into a training and validation set:

  • The training set is used to train the machine learning model. It’s used to learn the patterns, features, and relationships within the data. The model adjusts its parameters to minimize errors in predictions or classifications made on the training data.
  • The validation set evaluates the model’s performance on a new set of data. This helps in checking how well the model generalizes to unseen data. The validation set should be an independent set that is not used during the training of the model(s). Mixing/using information from the validation set during training can lead to skewed results.
# create directories for saving faces
for person in ['train/male', 'train/female', 'valid/male', 'valid/female']:
  os.makedirs(os.path.join(base_path, "faces", person), exist_ok=True)
# 'exist_ok=True' parameter allows the function to run without error even if some directories already exist

def split_images(images, train_size):
    training_images = images[:train_size]
    validation_images = images[train_size:train_size   10]
    return training_images, validation_images

michael_training, michael_testing = split_images(michael_images, 20)
mila_training, mila_testing = split_images(mila_images, 20)

jesse_testing = jesse_images[:10]
sarah_testing = sarah_images[:10]

# Save the pictures to an individual filename
def save_faces(images, directory, firstname, lastname):
    for i, img in enumerate(images):
        filename = os.path.join(base_path, "faces", directory, f"{firstname}_{lastname}_{i}.jpg")
        cv2.imwrite(filename, img)

# Save the split images
save_faces(michael_training, 'train/male', 'Michael', 'Cera')
save_faces(michael_testing, 'valid/male', 'Michael', 'Cera')
save_faces(mila_training, 'train/female', 'Mila', 'Kunis')
save_faces(mila_testing, 'valid/female', 'Mila', 'Kunis')

# Since Jesse and Sarah are only for testing, save them directly to the test directory
save_faces(jesse_testing, 'valid/male', 'Jesse', 'Eisenberg')
save_faces(sarah_testing, 'valid/female', 'Sarah', 'Hyland')

Data Augmentation

The accuracy of deep learning models depends on the quality, quantity, and contextual meaning of training data. This is one of the most common challenges in building deep learning models and it can be costly and time-consuming. Companies use data augmentation to reduce dependency on training examples to build high-precision models quickly.

Data augmentation means artificially increasing the amount of data by generating new data points from existing data. This includes adding minor alterations to data or using machine learning models to generate new data points in the latent space of original data to amplify the dataset.

Synthetics represent artificially generated data without using real-world images and it’s produced by Generative Adversarial Networks.

Augmented derives from original images with some sort of minor geometric transformations (such as flipping, translation, rotation, or the addition of noise) to increase the diversity of the training set.

pipeline_male = Augmentor.Pipeline(source_directory='/content/sample_data/deep_learning_assignment/faces/train/male', output_directory='/content/sample_data/deep_learning_assignment/faces/train_augmented/male')
pipeline_male.flip_left_right(probability=0.7)
pipeline_male.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
pipeline_male.greyscale(probability=0.1)
pipeline_male.sample(50)

pipeline_female = Augmentor.Pipeline(source_directory='/content/sample_data/deep_learning_assignment/faces/train/female', output_directory='/content/sample_data/deep_learning_assignment/faces/train_augmented/female')
pipeline_female.flip_left_right(probability=0.7)
pipeline_female.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
pipeline_female.greyscale(probability=0.1)
pipeline_female.sample(50)

Data augmentation improves the performance of ML models through more diverse datasets and reduces operation costs related to data collection:

  • Flip Left-Right: Images are randomly flipped horizontally with a probability of 0.7. This simulates the variation due to different orientations of subjects in images.
  • Rotation: The images are rotated slightly (up to 10 degrees in both directions) with a probability of 0.7. This adds variability to the dataset by simulating different head poses.
  • Greyscale Conversion: With a probability of 0.1, the images are converted to greyscale. This ensures the model can process and learn from images irrespective of their color information.
  • Sampling: The sample(50) method generates 50 augmented images from the original set. This expands the dataset, providing more data for the model to learn from.

Implementing the VGG16 model

VGG16 is a convolutional neural network widely used for image recognition. It is considered to be one of the best computer vision model architectures. It consists of 16 layers of artificial neurons that process the image incrementally to improve accuracy. In VGG16, “VGG” refers to the Visual Geometry Group of the University of Oxford, while “16” refers to the network’s 16 weighted layers.

VGG16 is used for image recognition and classification of new images. The pre-trained version of the VGG16 network is trained on over one million images from the ImageNet visual database. VGG16 can be applied to determine whether an image contains certain items, animals, plants, and more.

VGG16 architecture

Using VGGfor face and gender recognition

There are 13 convolutional layers, five Max Pooling layers, and three Dense layers. This results in 21 layers with 16 weights, meaning it has 16 learnable parameter layers. VGG16 takes input tensor size as 224x244. The model focuses on having convolution layers of a 3x3 filter with stride 1. It always uses the same padding with a maxpool layer of 2x2 filter of stride 2.

Conv-1 Layer has 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, Conv 4 and Conv 5 have 512 filters, and three fully connected layers where the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and contains 1000 channels (one for each class). The final layer is the soft-max layer.

Start preparing the base model.

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Set the layers of the base model to be non-trainable
for layer in base_model.layers:
    layer.trainable = False

To make sure that the model will classify the images correctly, we need to extend the model with additional layers.

x = base_model.output
x = GlobalAveragePooling2D()(x)

# dense layers
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(256, activation='relu')(x)
# add a logistic layer for binary classification
x = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=x)

The Global Average Pooling 2D layer condenses the feature maps obtained from VGG16 into a single 1D vector per map. It simplifies the output and reduces the total number of parameters, aiding in the prevention of overfitting.

The Dense layers are a sequence of fully connected (Dense) layers that are added. Each layer contains a specified number of units (1024, 512, and 256), chosen based on common practices and experimentation. These layers further process the features extracted by VGG16.

The final Dense layer (the Output layer) uses sigmoid activation suitable for binary classification (our two classes being ‘female’ and ‘male’).

Adam Optimization

The Adam Optimization algorithm is an extension of the stochastic gradient descent procedure to update network weights iterative based on training data. The method is efficient when working with large problems involving a lot of data or parameters. It requires less memory and is efficient.

This algorithm combines two gradient descent methodologies: momentum and Root Mean Square Propagation (RMSP).

Momentum is an algorithm used to help accelerate the gradient descent algorithm using the exponentially weighted average of the gradients.

Using VGGfor face and gender recognition

Root mean square prop is an adaptive learning algorithm that tries to improve the AdaGrad by taking the “exponential moving average”.

Using VGGfor face and gender recognition

Since mt and vt have both initialized as 0 (based on the above methods), it is observed that they gain a tendency to be ‘biased towards 0’ as both β1 & β2 ≈ 1. This Optimizer fixes this problem by computing ‘bias-corrected’ mt and vt. This is also done to control the weights while reaching the global minimum to prevent high oscillations when near it. The formulas used are:

Using VGGfor face and gender recognition

Intuitively, we are adapting to the gradient descent after every iteration so that it remains controlled and unbiased throughout the process, hence the name Adam.

Now, instead of our normal weight parameters mt and vt, we take the bias-corrected weight parameters (m_hat)t and (v_hat)t. Putting them into our general equation, we get:

Using VGGfor face and gender recognition

Source: Geeksforgeeks, https://www.geeksforgeeks.org/adam-optimizer/

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Set up image data preprocessing, augmentation, and model training in a deep learning context.

train_datagen = ImageDataGenerator()

training_set = train_datagen.flow_from_directory(
    '/content/sample_data/deep_learning_assignment/faces/train_augmented',
    batch_size=30,
    class_mode='binary'
)

validation_datagen = ImageDataGenerator() # used for real-time data augmentation and preprocessing
# generates batches of tensor image data with real-time data augmentation 

validation_set = validation_datagen.flow_from_directory(
    '/content/sample_data/deep_learning_assignment/faces/valid',
    batch_size=30,
    class_mode='binary'
)

model.fit(training_set, epochs=10, validation_data=validation_set)
  • epochs: the number of epochs specifies how much the entire training dataset will be passed forward and backward through the neural network. The model will go through the training data 10 times. An epoch is one complete presentation of the data set to be learned to a learning machine.
  • batch_size: this parameter defines the number of samples that are propagated through the network at one time. Here, we are using a batch size of 30, meaning the model will take 30 images at a time, process them, update the weights, and then proceed to the next batch of 30 images.

The model’s performance is evaluated by making predictions on the validation set. This gives an idea of how well the model performs unseen data. A threshold is applied to these predictions to classify each image into one of two classes (“male” or “female”).

# Evaluate the model on the validation set
validation_loss, validation_accuracy = model.evaluate(validation_set)

print(f"Validation Accuracy: {validation_accuracy * 100:.2f}%")
print(f"Validation Loss: {validation_loss}")

# Make predictions on the validation set
validation_predictions = model.predict(validation_set)

# Apply threshold to determine class
threshold = 0.5
predicted_classes = (validation_predictions > threshold).astype(int)

# Display the predicted classes along with image names
for i in range(len(validation_set.filenames)):
    filename = validation_set.filenames[i]
    prediction = predicted_classes[i][0]  # Binary predictions, extract single value

    class_name = 'male' if prediction == 0 else 'female'
    print(f"Image: {filename}, Predicted Class: {class_name}\n")

Create a confusion matrix to visualize the accuracy.

actual_labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
predictions =   [1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

cm = confusion_matrix(actual_labels, predictions)

sns.heatmap(cm, annot=True, fmt='d')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

For binary classification, the Receiver Operating Characteristic (ROC) curve and Area Under Curve (AUC) are useful to understand the trade-offs between true positive rate and false positive rate.

fpr, tpr, thresholds = roc_curve(actual_labels, predictions)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()

Conclusion

In conclusion, by using deep learning and image processing algorithms you can build a Python project that recognizes human faces and can categorize them as either male or female.

版本声明 本文转载于:https://dev.to/alexia_cismaru_f49ed201c2/using-vgg16-for-face-and-gender-recognition-38a2?1如有侵犯,请联系[email protected]删除
最新教程 更多>
  • 图片在Chrome中为何仍有边框?`border: none;`无效解决方案
    图片在Chrome中为何仍有边框?`border: none;`无效解决方案
    在chrome 在使用Chrome and IE9中的图像时遇到的一个频繁的问题是围绕图像的持续薄薄边框,尽管指定了图像,尽管指定了;和“边境:无;”在CSS中。要解决此问题,请考虑以下方法: Chrome具有忽略“ border:none; none;”的已知错误,风格。要解决此问题,请使用以下...
    编程 发布于2025-07-13
  • 如何使用Python理解有效地创建字典?
    如何使用Python理解有效地创建字典?
    在python中,词典综合提供了一种生成新词典的简洁方法。尽管它们与列表综合相似,但存在一些显着差异。与问题所暗示的不同,您无法为钥匙创建字典理解。您必须明确指定键和值。 For example:d = {n: n**2 for n in range(5)}This creates a dicti...
    编程 发布于2025-07-13
  • 为什么不````''{margin:0; }`始终删除CSS中的最高边距?
    为什么不````''{margin:0; }`始终删除CSS中的最高边距?
    在CSS 问题:不正确的代码: 全球范围将所有余量重置为零,如提供的代码所建议的,可能会导致意外的副作用。解决特定的保证金问题是更建议的。 例如,在提供的示例中,将以下代码添加到CSS中,将解决余量问题: body H1 { 保证金顶:-40px; } 此方法更精确,避免了由全局保证金重置引...
    编程 发布于2025-07-13
  • 版本5.6.5之前,使用current_timestamp与时间戳列的current_timestamp与时间戳列有什么限制?
    版本5.6.5之前,使用current_timestamp与时间戳列的current_timestamp与时间戳列有什么限制?
    在时间戳列上使用current_timestamp或MySQL版本中的current_timestamp或在5.6.5 此限制源于遗留实现的关注,这些限制需要对当前的_timestamp功能进行特定的实现。 创建表`foo`( `Productid` int(10)unsigned not n...
    编程 发布于2025-07-13
  • 为什么PHP的DateTime :: Modify('+1个月')会产生意外的结果?
    为什么PHP的DateTime :: Modify('+1个月')会产生意外的结果?
    使用php dateTime修改月份:发现预期的行为在使用PHP的DateTime类时,添加或减去几个月可能并不总是会产生预期的结果。正如文档所警告的那样,“当心”这些操作的“不像看起来那样直观。 ; $ date->修改('1个月'); //前进1个月 echo $ date->...
    编程 发布于2025-07-13
  • 我可以将加密从McRypt迁移到OpenSSL,并使用OpenSSL迁移MCRYPT加密数据?
    我可以将加密从McRypt迁移到OpenSSL,并使用OpenSSL迁移MCRYPT加密数据?
    将我的加密库从mcrypt升级到openssl 问题:是否可以将我的加密库从McRypt升级到OpenSSL?如果是这样,如何?答案:是的,可以将您的Encryption库从McRypt升级到OpenSSL。可以使用openssl。附加说明: [openssl_decrypt()函数要求iv参...
    编程 发布于2025-07-13
  • 如何在无序集合中为元组实现通用哈希功能?
    如何在无序集合中为元组实现通用哈希功能?
    在未订购的集合中的元素要纠正此问题,一种方法是手动为特定元组类型定义哈希函数,例如: template template template 。 struct std :: hash { size_t operator()(std :: tuple const&tuple)const {...
    编程 发布于2025-07-13
  • Java数组中元素位置查找技巧
    Java数组中元素位置查找技巧
    在Java数组中检索元素的位置 利用Java的反射API将数组转换为列表中,允许您使用indexof方法。 (primitives)(链接到Mishax的解决方案) 用于排序阵列的数组此方法此方法返回元素的索引,如果发现了元素的索引,或一个负值,指示应放置元素的插入点。
    编程 发布于2025-07-13
  • 如何修复\“常规错误:2006 MySQL Server在插入数据时已经消失\”?
    如何修复\“常规错误:2006 MySQL Server在插入数据时已经消失\”?
    How to Resolve "General error: 2006 MySQL server has gone away" While Inserting RecordsIntroduction:Inserting data into a MySQL database can...
    编程 发布于2025-07-13
  • Spark DataFrame添加常量列的妙招
    Spark DataFrame添加常量列的妙招
    在Spark Dataframe ,将常数列添加到Spark DataFrame,该列具有适用于所有行的任意值的Spark DataFrame,可以通过多种方式实现。使用文字值(SPARK 1.3)在尝试提供直接值时,用于此问题时,旨在为此目的的column方法可能会导致错误。 df.withCo...
    编程 发布于2025-07-13
  • 在JavaScript中如何并发运行异步操作并正确处理错误?
    在JavaScript中如何并发运行异步操作并正确处理错误?
    同意操作execution 在执行asynchronous操作时,相关的代码段落会遇到一个问题,当执行asynchronous操作:此实现在启动下一个操作之前依次等待每个操作的完成。要启用并发执行,需要进行修改的方法。 第一个解决方案试图通过获得每个操作的承诺来解决此问题,然后单独等待它们: co...
    编程 发布于2025-07-13
  • 如何使用PHP将斑点(图像)正确插入MySQL?
    如何使用PHP将斑点(图像)正确插入MySQL?
    essue VALUES('$this->image_id','file_get_contents($tmp_image)')";This code builds a string in PHP, but the function call ...
    编程 发布于2025-07-13
  • HTML格式标签
    HTML格式标签
    HTML 格式化元素 **HTML Formatting is a process of formatting text for better look and feel. HTML provides us ability to format text without us...
    编程 发布于2025-07-13
  • Go web应用何时关闭数据库连接?
    Go web应用何时关闭数据库连接?
    在GO Web Applications中管理数据库连接很少,考虑以下简化的web应用程序代码:出现的问题:何时应在DB连接上调用Close()方法?,该特定方案将自动关闭程序时,该程序将在EXITS EXITS EXITS出现时自动关闭。但是,其他考虑因素可能保证手动处理。选项1:隐式关闭终止数...
    编程 发布于2025-07-13
  • 如何在JavaScript对象中动态设置键?
    如何在JavaScript对象中动态设置键?
    在尝试为JavaScript对象创建动态键时,如何使用此Syntax jsObj['key' i] = 'example' 1;不工作。正确的方法采用方括号: jsobj ['key''i] ='example'1; 在JavaScript中,数组是一...
    编程 发布于2025-07-13

免责声明: 提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发到邮箱:[email protected] 我们会第一时间内为您处理。

Copyright© 2022 湘ICP备2022001581号-3