The large amount of information generated by devices and applications makes machine learning (ML) and artificial intelligence (AI) essential tools in the modern world.
They mimic human intelligence to recognize images rapidly, predict the near future, or indicate actions that should be taken to prevent a breakdown. However, ML and AI rely on complex mathematics, which makes it challenging to start using them in your everyday projects. To simplify this process, you can rely on dedicated frameworks, which abstract the underlying mathematics and handle much of the implementation complexity for you.
Two popular ML and AI frameworks are TensorFlow and PyTorch. Google Brain developed TensorFlow, while Meta AI mainly developed PyTorch. Both are open source and free to use. These frameworks enable you to create, train, and deploy models, which you can use for your ML and AI needs.
This tutorial compares both frameworks using the same neural network.
What Will You Build?
You will create and train a feedforward neural network to classify handwritten digits from the Modified National Institute of Standards and Technology (MNIST) dataset. This dataset contains 70,000 images (60,000 training and 10,000 testing images) of handwritten numerals (0-9), each 28 x 28 pixels in size. Some representative MNIST digits with their labels are shown below.
Your neural network starts with an input layer containing 28 x 28 = 784 input nodes, which matches the size of the input images (28 x 28 pixels). Each node accepts a single pixel from the MNIST image. Then, you have a linear hidden layer with the hyperbolic tangent (tanh) activation function. This layer contains 64 nodes, leading to 64 x 784 + 64 = 50,240 parameters (50,176 connections + 64 biases). To avoid overfitting, you use the Dropout layer, which randomly sets 20 percent of the layer's outputs to zero during training.
Subsequently, you use another hidden layer with 128 nodes and the sigmoid activation function, leading to another 8,320 parameters (64 x 128 + 128). Again, you zero out 20 percent of the resulting outputs using the Dropout layer. Finally, you have the output layer with ten nodes, each representing the probability that the image shows a given digit. The last connection has 1,290 parameters (128 x 10 + 10). So, in total, your feedforward network has 59,850 trainable parameters.
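To double-check the arithmetic above, here is a minimal, framework-free sketch. The helper name dense_params is hypothetical and not part of TensorFlow or PyTorch; it only applies the "weights plus biases" rule to the layer sizes given in the text.

```python
# Layer widths from the text: 784 inputs, hidden layers of 64 and 128 nodes, 10 outputs.
layer_sizes = [784, 64, 128, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output node."""
    return n_in * n_out + n_out

# Pair each layer with the next one and count the parameters of that connection.
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [50240, 8320, 1290]
print(total)      # 59850
```

The Dropout layers do not appear in the list because they have no trainable parameters.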
You can download the entire code for this project, which was tested with Python 3.10 in Jupyter notebooks. The list of packages you need is in requirements.txt.
To set up the project, create a new folder, move into the folder, and install the packages using pip:
pip install -r requirements.txt
On macOS, you should use the tensorflow-macos Python package if you can (pip install tensorflow-macos). On Windows, you must install the tensorflow package (pip install tensorflow).
TensorFlow
Start by creating your feedforward neural network in TensorFlow. You can find the code used in this section in the following notebooks:
Load and Prepare Data
Start by creating the new notebook (01_TensorFlow_Training.ipynb), where you’ll import the necessary packages:
from tensorflow import keras
import matplotlib.pyplot as plt
Then, download the MNIST dataset using the load_data method of keras.datasets.mnist:
# Load data
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
If you see errors during the download, you may need to install SSL certificates. On macOS, you can use a script that installs those certificates, located at the following path. Note that you might need to change the Python version in the path from 3.10 to your version:
/Applications/Python 3.10/Install Certificates.command
To install those certificates, run the script (assuming you’re in /Applications/Python 3.10) by typing ./Install\ Certificates.command.
Then, preview the images from the MNIST dataset:
# Class names
class_names = range(10)
# Preview train images
rowCount = 6
colCount = 10
plt.figure(figsize=(10, 10))
for i in range(rowCount * colCount):
    plt.subplot(rowCount, colCount, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.title(class_names[train_labels[i]], fontsize=25)
After running this code, you’ll see an array of MNIST images with their labels, shown earlier.
Finally, normalize the training and test images so that the pixel values fall in the range of 0-1. This step is not strictly necessary, but it usually helps to avoid numerical and precision issues.
# Prepare data
train_images = train_images / 255.0
test_images = test_images / 255.0
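To see what the division by 255 does, here is a small sketch on a hypothetical 2 x 3 "image" written with plain Python lists. With NumPy arrays, as in the tutorial, the same division is applied to every pixel at once.

```python
# A hypothetical 2 x 3 "image" with 8-bit pixel intensities (0-255).
raw_image = [[0, 128, 255],
             [64, 32, 16]]

# Divide every pixel by 255.0, as the tutorial does for the whole dataset.
scaled_image = [[pixel / 255.0 for pixel in row] for row in raw_image]

# Flatten the rows to inspect the value range.
flat = [pixel for row in scaled_image for pixel in row]
print(min(flat), max(flat))  # 0.0 1.0
```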
Neural Network Architecture
After getting and preparing the data, you can architect your model. TensorFlow provides three ways of building machine learning models:
- Sequential API, with which you can quickly define models using an intuitive and user-friendly sequence of the building blocks (layers).
- Functional API, which provides more flexibility than the Sequential API. Specifically, you can use more than one input and output.
- Model subclassing, where you create a custom class for your model. This more advanced approach can give you more flexibility in defining custom layers.
The Sequential API is generally an easier way to start, while model subclassing is like building models with PyTorch, which you will see later. Therefore, you’ll use the Sequential API and define your model using the following statements:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(64, activation='tanh'),
    keras.layers.Dropout(.2),
    keras.layers.Dense(128, activation='sigmoid'),
    keras.layers.Dropout(.2),
    keras.layers.Dense(len(class_names), activation='softmax')
])
This code creates a sequential model consisting of six stacked layers. Each layer has one input tensor and one output tensor. You’ll use built-in layers, available in the keras.layers module. Specifically, the first layer, Flatten, takes the input image of shape (28, 28) and reshapes it into a 784-dimensional vector.
Then, there’s a Dense layer. In this layer, TensorFlow uses the following mathematical formula: output = activation(a · input + bias), where activation is the activation function, “a” denotes the matrix of coefficients (weights), and “bias” is the vector of offsets. The “a” and “bias” are the parameters adjusted during the training. For the activation, you use a hyperbolic tangent.
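The Dense layer formula can be sketched in plain Python as follows. This is only an illustration with hypothetical toy values, not how TensorFlow implements the layer internally; the function name dense_forward is an assumption made for this example.

```python
import math

def dense_forward(inputs, weights, biases, activation=math.tanh):
    """Compute activation(a . input + bias) for each output node.

    weights[j] holds the coefficients ("a") feeding output node j.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs

# Hypothetical toy values: 3 inputs feeding 2 output nodes.
inputs = [1.0, 0.5, -1.0]
weights = [[0.2, -0.4, 0.1],
           [0.0, 0.0, 0.0]]
biases = [0.1, 0.5]

outputs = dense_forward(inputs, weights, biases)
print(outputs)
```

Because tanh squashes its argument, every output lies strictly between -1 and 1, no matter how large the weighted sum is.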
Next, you use the Dropout layer, another Dense layer with the sigmoid activation, and another Dropout layer. The last layer contains ten output nodes, each representing the probability (score) that the image shows one of the ten digits. The higher the score, the higher the recognition confidence. Note that for the last layer, you must use the softmax activation. It normalizes the outputs so that they sum to 1 and can be read as probabilities.
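As a rough illustration of what softmax does, the following sketch normalizes three hypothetical raw scores in plain Python. It is a simplified, single-sample version, not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
print(sum(probabilities))
```

The ordering of the scores is preserved, so the class with the highest raw score also gets the highest probability.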
You can further parameterize the layers with additional initializers, regularizers, and constraints, as described in the Keras docs.
After architecting the network, you must configure it for training. You do so using the model.compile method:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
The above method configures three things:
- The optimizer algorithm adjusts model parameters to minimize the error between true and predicted recognition labels. Here, you use the Adam optimizer. For the list of other optimizers, check the Keras docs.
- The loss function calculates the error between true and predicted values. The optimizer uses this function during training. You use ‘sparse_categorical_crossentropy’ here. You can find the list of all available losses here. Note that you’ll generally choose the loss function depending on the problem.
- The metrics list holds metrics evaluated during model training. Here, you’ll use accuracy only. You can find other metrics here.
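To make the loss function more concrete, here is a simplified sketch of sparse categorical cross-entropy for a single sample. "Sparse" means the true label is an integer class index rather than a one-hot vector. The probability values are hypothetical, and the real Keras loss additionally averages over the batch and guards against log(0).

```python
import math

def sparse_categorical_crossentropy(true_label, probabilities):
    """Loss for one sample: negative log of the probability assigned to the true class."""
    return -math.log(probabilities[true_label])

# Hypothetical softmax outputs for a three-class problem where the true class is 0.
confident = sparse_categorical_crossentropy(0, [0.9, 0.05, 0.05])
unsure = sparse_categorical_crossentropy(0, [0.4, 0.3, 0.3])

print(confident, unsure)
```

The more probability the model assigns to the correct class, the smaller the loss, which is what the optimizer tries to achieve.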
Now, you can display the model summary:
model.summary()
After executing this method, you’ll see the following output. The Param # column shows how many parameters (the collection of “a” coefficients and biases) training will adjust.
Train
After preparing the network, you can train it. To do so, use the fit method of the model:
model.fit(train_images, train_labels,
          validation_data=(test_images, test_labels), epochs=10)
The method accepts several inputs. Here, you’ll use only four:
- x is the input data: the collection of your training images.
- y holds the true labels associated with your images.
- validation_data is the data used to evaluate the model at the end of each epoch. You pass the tuple of your test images along with their labels.
- epochs is the number of complete passes over the training data.
After invoking the fit method, you’ll see output like that shown below. It shows the training progress, loss, and accuracy for each epoch:
In this case, you’ve trained the model to achieve final accuracy of about 98 percent.
Save the Trained Model
In the last step, save the trained model using the following method:
model.save('tf-model.tf', save_format='tf')
The output of this method looks as follows:
Inference
You can now use the trained model for prediction.
Create a new notebook (02_TensorFlow_Inference.ipynb), and then modify it by importing the following packages:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
Then, load and prepare the test data:
_, (test_images, test_labels) = keras.datasets.mnist.load_data()
test_images = test_images / 255.0
Now, load the trained model and use it for predictions:
inputPath = 'tf-model.tf'
model = keras.models.load_model(inputPath)
# Recognize digits
prediction_result = model.predict(test_images)
# Get predicted labels
predicted_labels = np.argmax(prediction_result, axis=1)
To recognize digits, you’ll use the predict method. This method returns a two-dimensional array of scores with the shape (10,000, 10). Each score array contains ten elements, and there are 10,000 such arrays (since there are 10,000 test images). To get the label with the highest score for each image, use the argmax function from NumPy along axis 1.
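The following sketch shows the same idea on a tiny, hypothetical score array in plain Python. The argmax helper here is written by hand for illustration; in the notebook, np.argmax does this for all 10,000 rows at once.

```python
def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical scores for two images over three classes.
toy_scores = [[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1]]

# One predicted label per row (per image).
toy_labels = [argmax(row) for row in toy_scores]
print(toy_labels)  # [1, 0]
```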
Finally, preview the results:
# Get randomly selected image for preview (the upper bound is exclusive)
preview_image_index = np.random.randint(0, test_images.shape[0])
plt.figure()
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(test_images[preview_image_index], cmap=plt.cm.binary)
plt.xlabel(f"Actual: {test_labels[preview_image_index]} \nPredicted: {predicted_labels[preview_image_index]}", fontsize=20)
Here are the sample outputs of the above code:
As shown above, the model correctly recognized the test images. You’ll evaluate the model further when comparing it to the PyTorch model.
PyTorch
Now, it is time to create the same feedforward neural network with PyTorch.
Architecture
Start by creating the new notebook, 03_PyTorch_Training.ipynb, where you’ll import the following packages:
import torch
from torch import nn
from torchvision import transforms, datasets
from torchsummary import summary
from torch.utils.data import DataLoader
Then, define the model class:
class_names = range(10)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28, 64),
            nn.Tanh(),
            nn.Dropout(.2),
            nn.Linear(64, 128),
            nn.Sigmoid(),
            nn.Dropout(.2),
            nn.Linear(128, len(class_names)),
            nn.Softmax(dim=1)
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_stack(x)
        return logits
To architect the neural network in PyTorch, define a class deriving from PyTorch’s nn.Module. This approach is like TensorFlow’s model subclassing. Here, you defined the class named NeuralNetwork. This class contains two elements:
- The __init__ method, which acts like the class constructor. You use this method to define the feedforward network. Specifically, you flatten the input (reshape the image from (28, 28) to a 784-dimensional vector). Then, you create the linear stack of layers. It has the same structure as the network created with TensorFlow. There are, of course, slight differences in naming conventions. You use Linear (PyTorch) instead of Dense (TensorFlow). In PyTorch, you must add activation functions as separate modules, right after the Linear layers.
- The forward method takes the input image, flattens it, and then passes it through the network to calculate the prediction (score array).
Next, you initialize the model and display its summary:
model = NeuralNetwork()
summary(model, (1, 28, 28))
You then see output similar to that in TensorFlow (note that the number of trainable parameters is the same as before):
Get Data
After creating the neural network, you download the training and test data. The torchvision package provides the datasets module, which gives you access to many built-in datasets.
You use this module to access the MNIST dataset:
# Training data
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transforms.ToTensor()
)
# Test data
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=transforms.ToTensor()
)
# Dataloaders
batch_size = 32
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
When accessing the dataset, you can specify the transformations to apply to each item. Here, you use ToTensor transform, which converts MNIST images to tensors and scales the images to the range 0-1.
To load the data, you use the DataLoader utility class. This class enables you to load multiple images at once. You control the number of images to load using the batch_size parameter. You set its value to 32, the same as TensorFlow’s default value for the fit method.
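To picture how batching splits a dataset, here is a minimal sketch in plain Python. The batches generator is a hypothetical stand-in for illustration only; DataLoader does considerably more (for example, converting items to tensors and optional shuffling).

```python
import math

def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in for the 10,000 MNIST test images: just their indices.
sample_indices = list(range(10_000))
batch_list = list(batches(sample_indices, 32))

print(len(batch_list))          # 313
print(len(batch_list[-1]))      # 16
print(math.ceil(60_000 / 32))   # 1875 batches per epoch for the training set
```

Since 10,000 is not a multiple of 32, the last batch is smaller than the others. By default, DataLoader keeps such a final partial batch as well.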
Train
Now, you have all the tools needed to train the model.
First, specify the loss function and the optimizer:
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
As in the TensorFlow case, you use a cross-entropy loss (CrossEntropyLoss) and the Adam optimizer. You set the learning rate to 1e-3, which is the same value as the default in TensorFlow.
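For intuition about what the Adam optimizer does with the learning rate, here is a simplified sketch of one Adam update for a single scalar parameter. The function adam_step and the toy objective are assumptions made for this example, and the default values for beta1, beta2, and eps are commonly used ones; this is not the framework implementation.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical objective: minimize (p - 3)^2, whose gradient is 2 * (p - 3).
p, m, v = 0.0, 0.0, 0.0
first_p, m, v = adam_step(p, 2 * (p - 3), m, v, t=1)
print(first_p)
```

On the first step, the update has a size very close to the learning rate itself, regardless of how large the gradient is. This scaling is one reason Adam is less sensitive to the raw gradient magnitude than plain gradient descent.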
Then, you must define the functions for training and evaluating your feedforward neural network. Together, they play the role of the fit method from TensorFlow.
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (x, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(x)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    with torch.no_grad():
        for x, y in dataloader:
            pred = model(x)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
The first function, train_loop, uses the backpropagation algorithm to adjust the trainable parameters so that the prediction error of the neural network is minimized. The second function, test_loop, calculates the neural network error using the test images and displays the accuracy and loss value. You can also implement the fit method using this guide.
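The structure of such a training loop (predict, compute the loss, compute the gradient, update the parameters) can be sketched without any framework. The example below uses a hypothetical one-weight model, a squared-error loss, and plain gradient descent with a hand-derived gradient, so it only mirrors the steps of train_loop rather than reproducing it.

```python
# Hypothetical toy data following y = 2 * x; the model is a single weight w.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
step_size = 0.05
losses = []

for epoch in range(20):
    epoch_loss = 0.0
    for x, y in data:
        pred = w * x                 # forward pass, like model(x)
        loss = (pred - y) ** 2       # squared error, standing in for loss_fn
        grad = 2 * (pred - y) * x    # d(loss)/dw, which loss.backward() would compute
        w -= step_size * grad        # parameter update, like optimizer.step()
        epoch_loss += loss
    losses.append(epoch_loss / len(data))

print(w)
print(losses[0], losses[-1])
```

The weight approaches 2 and the average loss shrinks from epoch to epoch. In PyTorch, the gradient is computed automatically, which is why the loop also needs optimizer.zero_grad() to clear the gradients accumulated in the previous step.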
You can now invoke those functions to train and evaluate the model. As with TensorFlow, you use ten epochs:
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}:")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
After running this code, you’ll see the following output:
In this case, you trained the model to an accuracy of about 95 percent.
Finally, save the model state (weights and biases) to a file:
torch.save(model.state_dict(), "PyTorch-model.pth")
Inference
Now, you’ll use the trained model for inference. To do so, create the new notebook, 04_PyTorch_Inference.ipynb, then add the following imports:
import torch
from torch import nn
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
Now, you must redefine the model (alternatively, you could import the NeuralNetwork class from the previous notebook):
class_names = range(10)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28, 64),
            nn.Tanh(),
            nn.Dropout(.2),
            nn.Linear(64, 128),
            nn.Sigmoid(),
            nn.Dropout(.2),
            nn.Linear(128, len(class_names)),
            nn.Softmax(dim=1)
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_stack(x)
        return logits
Subsequently, initialize the model, and then load its state (weights and biases) from the file:
model = NeuralNetwork()
model.load_state_dict(torch.load("PyTorch-model.pth"))
# Switch to evaluation mode so that the Dropout layers are disabled during inference
model.eval()
Then, you must load the test data by using a batch size of 10,000 to load all available test images:
# Test dataset
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=transforms.ToTensor()
)
# Dataloader
batch_size = 10000
test_dataloader = DataLoader(test_data, batch_size=batch_size)
Afterward, you get the test images with their labels:
# Get data
test_images, test_labels = next(iter(test_dataloader))
# Recognize digits
prediction_result = model(test_images)
And predict labels:
# Get predicted labels
predicted_labels = prediction_result.argmax(1)
Last, using similar code to the TensorFlow case, preview the results:
# Get randomly selected image for preview (the upper bound is exclusive)
preview_image_index = np.random.randint(0, test_images.shape[0])
plt.figure()
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(test_images[preview_image_index][0].numpy(), cmap=plt.cm.binary)
plt.xlabel(f"Actual: {test_labels[preview_image_index]} \nPredicted: {predicted_labels[preview_image_index]}", fontsize=20)
This code leads to the following results:
As before, the model correctly recognizes handwritten digits. Using three criteria, you can compare both frameworks:
- Usability
- Capabilities
- Performance
Usability
From this tutorial, you can see that the Sequential API provided by TensorFlow is generally more beginner-friendly. You can start building ML models rapidly without needing deep knowledge of all the mathematics behind them. However, the Sequential API limits you to stacked layers, where each has only one input and one output.
For more advanced scenarios, you must use the model subclassing approach in TensorFlow or PyTorch. Though this tutorial didn’t cover how to create models using subclassing in TensorFlow, this method is very similar to the PyTorch approach. So, PyTorch, although more difficult to start with, is better for advanced or custom scenarios.
Capabilities
In this case, both frameworks provided the same capabilities. You were able to reproduce in PyTorch the neural network architecture you built with TensorFlow. All major components you used, including layers, activation functions, regularizations, optimizers, losses, and metrics, were available in both frameworks. So, even if you prototyped something in TensorFlow, you should be able to move your code to PyTorch easily.
Both frameworks enable you to convert models to the Open Neural Network Exchange (ONNX) format to use hardware optimizations. So, you can benefit from those optimizations in your production workloads.
Model Performance
You can compare model performances using common classification metrics, including precision, recall, and f1-score. The code that evaluates the classification model performance is available in scikit-learn.
Start by supplementing the 02_TensorFlow_Inference.ipynb notebook with the following imports:
from sklearn import metrics
import seaborn as sns
Then, print the classification report:
print(metrics.classification_report(test_labels, predicted_labels))
In the report above, you see the model precision, recall, and f1-score for each class (the recognized digit). The additional column (support) contains the number of images for each class. You see that each metric reaches 97 percent on average.
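To clarify what these metrics mean, the following sketch computes precision, recall, and f1-score for one class from hypothetical labels in plain Python. The helper precision_recall_f1 is written for this illustration; in the notebook, scikit-learn computes the same quantities for every class.

```python
def precision_recall_f1(true_labels, predicted, target):
    """Precision, recall, and f1-score for one class (target)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == target and p == target)  # true positives
    fp = sum(1 for t, p in pairs if t != target and p == target)  # false positives
    fn = sum(1 for t, p in pairs if t == target and p != target)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels for six images.
toy_true = [0, 0, 1, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 0, 2]

precision, recall, f1 = precision_recall_f1(toy_true, toy_pred, target=1)
print(precision, recall, f1)
```

Precision answers "of the images predicted as this digit, how many really were it?", while recall answers "of the images that really were this digit, how many did the model find?". The f1-score is the harmonic mean of the two.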
Use the confusion matrix to see when your model incorrectly predicts the digit:
# Define the class names before using them as labels
class_names = range(10)
confusion_matrix = metrics.confusion_matrix(test_labels, predicted_labels, labels=list(class_names))
plt.figure(figsize=(10, 8))
sns.heatmap(confusion_matrix,
            xticklabels=class_names,
            yticklabels=class_names,
            annot=True, fmt='g')
plt.xlabel('Predicted label', fontsize=20)
plt.ylabel('True label', fontsize=20)
The above code displays the following matrix:
You can use the same code in the 04_PyTorch_Inference.ipynb notebook to evaluate the performance of the PyTorch model. Here’s the classification report:
The PyTorch model has slightly lower precision, recall, and f1-score (95 percent on average).
Here’s the confusion matrix:
Compared to the TensorFlow model, the above matrix has larger off-diagonal elements, meaning there is a greater chance of predicting an incorrect digit.
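To see why off-diagonal elements correspond to mistakes, here is a minimal sketch that builds a confusion matrix from hypothetical labels in plain Python. The function build_confusion_matrix is an illustration only; it follows the same convention as the plot above, with true labels as rows and predicted labels as columns.

```python
def build_confusion_matrix(true_labels, predicted, num_classes):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix

# Hypothetical labels for six images and three classes.
toy_true = [0, 0, 1, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 0, 2]
toy_matrix = build_confusion_matrix(toy_true, toy_pred, num_classes=3)

# Diagonal entries are correct predictions; everything else is an error.
correct = sum(toy_matrix[i][i] for i in range(3))
errors = sum(map(sum, toy_matrix)) - correct

print(toy_matrix)       # [[1, 1, 0], [1, 2, 0], [0, 0, 1]]
print(correct, errors)  # 4 2
```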
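This model comparison shows that, in this particular case, the PyTorch model performs slightly worse than the TensorFlow model. This difference is most likely due to the simple training setup rather than to the framework itself. For example, the model applies Softmax before CrossEntropyLoss, which expects raw scores and applies its own log-softmax, and the loops never call model.train() or model.eval(), so the Dropout layers stay active during evaluation.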
Conclusion
In this tutorial, you learned how to create and train a feedforward neural network for multi-class classification using TensorFlow and PyTorch. You applied this model to recognize handwritten digits from the MNIST dataset. You also learned how to architect the network, train it, and evaluate its performance using standard classification metrics such as precision, recall, and f1-score. Finally, you compared both models by using the confusion matrix.
From this comparison, you can conclude that TensorFlow is more suitable for beginners than PyTorch. However, both frameworks provide similar capabilities.