
PyTorch
Time to upgrade my skills in ML, and since COVID-19 cases are rising and new restrictions are now in place, I decided to do some learning while missing my social life and friends :(
Why PyTorch?
Seems that it’s like NumPy on the GPU that can parallelize operations, so it’s super efficient. (Also, I’m considering taking the Machine Learning Engineer Course on Udacity, and they are doing everything in PyTorch there, so win-win!) Some of the pros I’ve discovered are:
- Tensor (multidimensional array) processing
- Efficient Data Loading
- Deep Learning Functions
- Distributed Training
- Provides Dynamic Computational Graphs
- Stronger in academia than in industry (as TensorFlow provides additional deployment tools)
PyTorch Data Structures
In math, the generalization of vectors and matrices to a higher-dimensional space is called a tensor.
Tensor:
It’s an entity with a defined number of dimensions called an order (rank).
Scalars:
A rank-0 tensor. Let’s denote a scalar value as $x \in \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers.
Vectors:
A rank-1 tensor. Vectors belong to a linear space (vector space), a set of possible vectors of a specific length. A vector consisting of real-valued scalars $x \in \mathbb{R}$ can be defined as $y \in \mathbb{R}^n$, where $y$ is the vector and $\mathbb{R}^n$ is the $n$-dimensional real-valued vector space. $y_i$ is the $i$-th vector element (scalar): \(y = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}\)
Matrices:
A rank-2 tensor. A matrix of size $m \times n$, where $m, n \in \mathbb{N}$ (the number of rows and columns, respectively), consisting of real-valued scalars can be denoted as $A \in \mathbb{R}^{m \times n}$, where $\mathbb{R}^{m \times n}$ is a real-valued $m \times n$-dimensional vector space:
\[A = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix}\]
Code Basics
Examples
First, we import the libraries that we need and set a random seed.
import torch
import numpy as np
from matplotlib import pyplot as plt
# set seed
torch.random.manual_seed(42)
PyTorch and NumPy
# rank-2 (2d) tensor - float32 is the default dtype, here we override it with float16
torch.zeros(3, 4, dtype=torch.float16)
# rank-3 tensor
torch.zeros(2, 4, 5, dtype=torch.int16)
# rank-4 tensor
torch.rand(2, 3, 4, 5)
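A quick way to double-check what these calls produce (a small sketch of my own, not from the original run) is to inspect the rank, shape, and dtype:
```py
import torch

t = torch.zeros(2, 4, 5, dtype=torch.int16)
print(t.ndim)   # 3 - the rank (number of dimensions)
print(t.shape)  # torch.Size([2, 4, 5])
print(t.dtype)  # torch.int16
```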
# create tensor from lists and np arrays
python_list = [1, 2]
# Create a numpy array from python list
numpy_array = np.array(python_list)
# Create a torch Tensor from python list
tensor_from_list = torch.tensor(python_list)
# Create a torch Tensor from Numpy array (copies memory)
tensor_from_array = torch.tensor(numpy_array)
# Another way to create a torch Tensor from Numpy array (Share same storage)
tensor_from_array_v2 = torch.from_numpy(numpy_array)
# Convert torch tensor to numpy array
array_from_tensor = tensor_from_array.numpy()
print('List: ', python_list)
print('Array: ', numpy_array)
print('Tensor: ', tensor_from_list)
print('Tensor: ', tensor_from_array)
print('Tensor: ', tensor_from_array_v2)
print('Array: ', array_from_tensor)
Output:
List: [1, 2]
Array: [1 2]
Tensor: tensor([1, 2])
Tensor: tensor([1, 2])
Tensor: tensor([1, 2])
Array: [1 2]
Difference between torch.tensor and torch.from_numpy
PyTorch aims to be an efficient library for computation and avoids memory copying when it can:
numpy_array[0] = 10
print('Array: ', numpy_array)
print('Tensor: ', tensor_from_array)
print('Tensor: ', tensor_from_array_v2)
Output:
Array: [10 2]
Tensor: tensor([1, 2])
Tensor: tensor([10, 2])
It also works the opposite way: modifying the tensor changes the underlying NumPy array.
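Here is a minimal self-contained sketch (my own, not from the run above) of that direction: changing the tensor created with from_numpy is reflected in the NumPy array, while the torch.tensor copy is unaffected.
```py
import numpy as np
import torch

arr = np.array([1, 2])
shared = torch.from_numpy(arr)  # shares storage with arr
copied = torch.tensor(arr)      # copies the data

shared[0] = 10
print(arr)     # [10  2] - the array sees the change
print(copied)  # tensor([1, 2]) - the copy does not
```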
Indexing
# rank-1
a = torch.rand(5)
a[2]
Output:
tensor(0.7936)
# select two elements
a[[2, 4]]
Output:
tensor([0.7936, 0.1332])
# select three elements with a mask
a[[True, False, False, True, True]]
Output:
tensor([0.6009, 0.9408, 0.1332])
# rank-2
tensor = torch.rand((5, 3))
tensor
Output:
tensor([[0.2695, 0.3588, 0.1994],
[0.5472, 0.0062, 0.9516],
[0.0753, 0.8860, 0.5832],
[0.3376, 0.8090, 0.5779],
[0.9040, 0.5547, 0.3423]])
# select row
tensor[0]
Output:
tensor([0.2695, 0.3588, 0.1994])
# select element
tensor[0, 2]
Output:
tensor(0.1994)
# select rows
rows = torch.tensor([0, 2, 4])
rows
Output:
tensor([0, 2, 4])
tensor[rows]
Output:
tensor([[0.2695, 0.3588, 0.1994],
[0.0753, 0.8860, 0.5832],
[0.9040, 0.5547, 0.3423]])
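The same indexing rules extend to columns, slices, and element-wise masks; a short sketch of my own:
```py
import torch

tensor = torch.rand(5, 3)
print(tensor[:, 1])          # second column
print(tensor[1:4, :2])       # rows 1-3, first two columns
print(tensor[tensor > 0.5])  # boolean mask over the whole tensor (flattened result)
```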
Tensor Shapes
We can reshape a tensor without the memory-copying overhead. There are two methods for that: reshape and view. The difference is the following:
- view tries to return a tensor that shares the same memory with the original tensor. If it cannot reuse that memory for some reason, it simply fails.
- reshape always returns a tensor with the desired shape and tries to reuse the memory. If it cannot, it creates a copy.
tensor = torch.rand(2, 3, 4)
tensor
Output:
tensor([[[0.6343, 0.3644, 0.7104, 0.9464],
[0.7890, 0.2814, 0.7886, 0.5895],
[0.7539, 0.1952, 0.0050, 0.3068]],
[[0.1165, 0.9103, 0.6440, 0.7071],
[0.6581, 0.4913, 0.8913, 0.1447],
[0.5315, 0.1587, 0.6542, 0.3278]]])
print('Pointer to data: ', tensor.data_ptr())
print('Shape: ', tensor.shape)
Output:
Pointer to data: 80622400
Shape: torch.Size([2, 3, 4])
reshaped = tensor.reshape(24)
reshaped
Output:
tensor([0.6343, 0.3644, 0.7104, 0.9464, 0.7890, 0.2814, 0.7886, 0.5895, 0.7539, 0.1952, 0.0050, 0.3068, 0.1165, 0.9103, 0.6440, 0.7071, 0.6581, 0.4913, 0.8913, 0.1447, 0.5315, 0.1587, 0.6542, 0.3278])
view = tensor.view(3, 2, 4)
view
Output:
tensor([[[0.6343, 0.3644, 0.7104, 0.9464],
[0.7890, 0.2814, 0.7886, 0.5895]],
[[0.7539, 0.1952, 0.0050, 0.3068],
[0.1165, 0.9103, 0.6440, 0.7071]],
[[0.6581, 0.4913, 0.8913, 0.1447],
[0.5315, 0.1587, 0.6542, 0.3278]]])
# return addresses
print('Reshaped tensor - pointer to data', reshaped.data_ptr())
print('Reshaped tensor shape ', reshaped.shape)
print('Viewed tensor - pointer to data', view.data_ptr())
print('Viewed tensor shape ', view.shape)
Output:
Reshaped tensor - pointer to data 80622400
Reshaped tensor shape torch.Size([24])
Viewed tensor - pointer to data 80622400
Viewed tensor shape torch.Size([3, 2, 4])
```py
# assert that the original tensor and the view share the same memory address
assert tensor.data_ptr() == view.data_ptr()
# assert that the flattened tensor and the reshaped one contain the same values
assert np.all(np.equal(tensor.numpy().flat, reshaped.numpy().flat))
# print stride - the jump necessary to go from one element to the next one in the specified dimension dim
print('Original stride: ', tensor.stride())
print('Reshaped stride: ', reshaped.stride())
print('Viewed stride: ', view.stride())
```
Output:
```sh
Original stride: (12, 4, 1)
Reshaped stride: (1,)
Viewed stride: (8, 4, 1)
```
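A quick sketch (my own example) of where the two methods actually differ: on a non-contiguous tensor, e.g. after a transpose, view fails, while reshape silently falls back to copying.
```py
import torch

t = torch.rand(3, 4).t()    # the transpose makes the tensor non-contiguous
try:
    t.view(12)              # cannot reuse the existing storage layout
except RuntimeError as e:
    print('view failed:', e)

r = t.reshape(12)                    # reshape copies the data instead
print(r.shape)                       # torch.Size([12])
print(t.data_ptr() == r.data_ptr())  # False - different storage
```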
If we have a multi-dimensional tensor and a mask of different dimensions, we can use the expand_as operation to create a view of the mask that has the same dimensions as the tensor we want to apply it to, without copying the data.
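A small sketch of what that looks like (my own example, with made-up shapes):
```py
import torch

tensor = torch.rand(2, 3, 4)
mask = torch.tensor([True, False, True, False])  # rank-1 mask over the last dimension

expanded = mask.expand_as(tensor)  # a (2, 3, 4) view of the mask, no data copied
print(expanded.shape)              # torch.Size([2, 3, 4])
print(tensor[expanded].shape)      # torch.Size([12]) - selected elements, flattened
```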
Autograd
PyTorch supports automatic differentiation via the Autograd module. It calculates the gradients and keeps track of them during the forward and backward passes. For tensors you create yourself, you need to enable the requires_grad flag; tensors produced by operations on such tensors have it enabled automatically.
a = torch.rand((3, 5), requires_grad=True)
a
Output:
tensor([[0.3470, 0.0240, 0.7797, 0.1519, 0.7513],
[0.7269, 0.8572, 0.1165, 0.8596, 0.2636],
[0.6855, 0.9696, 0.4295, 0.4961, 0.3849]], requires_grad=True)
result = a * 5
result
Output:
tensor([[1.7351, 0.1200, 3.8987, 0.7595, 3.7565],
[3.6345, 4.2861, 0.5824, 4.2980, 1.3181],
[3.4277, 4.8478, 2.1474, 2.4807, 1.9244]], grad_fn=<MulBackward0>)
grad can be implicitly created only for scalar outputs.
# we use sum to make it scalar to apply backward pass
mean_result = result.sum()
# Calculate Gradient
mean_result.backward()
# gradient of a
a.grad
Output:
tensor([[5., 5., 5., 5., 5.],
[5., 5., 5., 5., 5.],
[5., 5., 5., 5., 5.]])
We multiplied an input by 5, so as expected the calculated gradient is 5.
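As an extra sanity check (my own sketch), here is a case where the gradient depends on the input: for y = sum(x^2) the gradient is 2x.
```py
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # scalar output, so backward() can be called directly
y.backward()
print(x.grad)       # tensor([2., 4., 6.]) - equal to 2 * x
```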
Disable autograd
We don’t need to compute gradients for all the variables involved in the pipeline. The PyTorch API provides two ways to disable autograd.
- detach - returns a new tensor that shares the same memory as the original, but with autograd disabled. In-place size/stride/storage modifications are not allowed on it.
- torch.no_grad() - a context manager that allows you to guard a series of operations from autograd without creating new tensors.
Context managers allow you to allocate and release resources precisely when you want to. The most widely used example of context managers is the with statement.
a = torch.rand((3, 5), requires_grad=True)
detached_a = a.detach()
a
Output:
tensor([[0.0323, 0.7047, 0.2545, 0.3994, 0.2122],
[0.4089, 0.1481, 0.1733, 0.6659, 0.3514],
[0.8087, 0.3396, 0.1332, 0.4118, 0.2576]], requires_grad=True)
detached_a
Output:
tensor([[0.0323, 0.7047, 0.2545, 0.3994, 0.2122],
[0.4089, 0.1481, 0.1733, 0.6659, 0.3514],
[0.8087, 0.3396, 0.1332, 0.4118, 0.2576]])
detached_result = detached_a * 5
result = a * 10
Same as before, we cannot do the backward pass (required for autograd) on a multi-dimensional output, so we calculate the sum:
mean_result = result.sum()
mean_result.backward()
a.grad
Output:
tensor([[10., 10., 10., 10., 10.],
[10., 10., 10., 10., 10.],
[10., 10., 10., 10., 10.]])
a = torch.rand((3, 5), requires_grad=True)
with torch.no_grad():
detached_result = a * 5
result = a * 10
detached_result
Output:
tensor([[0.4125, 3.6998, 0.0182, 4.0520, 4.3706],
[4.8643, 1.9103, 0.4459, 3.0621, 3.8811],
[0.0117, 1.9325, 1.0014, 2.2813, 1.2695]])
mean_result = result.sum()
mean_result.backward()
a.grad
Output:
tensor([[10., 10., 10., 10., 10.],
[10., 10., 10., 10., 10.],
[10., 10., 10., 10., 10.]])
Again, we multiplied result by 10, so, as expected, the grad is 10. The multiplication by 5 happened inside torch.no_grad(), so it did not affect the gradient.
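One more way to see the effect (a sketch of my own): tensors produced inside torch.no_grad() do not track gradients at all, while the same kind of operation outside the context manager does.
```py
import torch

a = torch.rand(3, requires_grad=True)

with torch.no_grad():
    guarded = a * 5
tracked = a * 10

print(guarded.requires_grad)  # False - not recorded by autograd
print(tracked.requires_grad)  # True - has a grad_fn and can be backpropagated through
```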