PyTorch model input shape

In this article we discuss how to determine the input shape a PyTorch model expects.

Solution 1

PyTorch flexibility

PyTorch models are very flexible objects, to the point where they do not enforce or generally expect a fixed input shape for data.

Certain layers do impose constraints, e.g.:

  • a flatten followed by a fully connected layer with N input features forces the dimensions of your original input (M1 x M2 x … Mn) to have a product equal to N
  • a 2D convolution with N input channels forces the data to be 3-dimensional, with the first dimension having size N

But as you can see, neither of these pins down the full shape of the data.
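To make this concrete, here is a minimal sketch (the layer sizes are illustrative, not from the book) showing that a convolutional layer accepts many spatial sizes, while a fully connected layer only pins down the flattened feature count:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # only the channel count (3) is fixed
for hw in (16, 32, 64):
    x = torch.randn(1, 3, hw, hw)
    print(conv(x).shape)          # works for every spatial size

fc = nn.Linear(48, 10)            # fixes the flattened width to 48 features
x = torch.randn(1, 3, 4, 4)       # 3 * 4 * 4 = 48, so flattening matches
print(fc(x.flatten(1)).shape)     # torch.Size([1, 10])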

We might not realize it right now, but in more complex models, getting the size of the first linear layer right is sometimes a source of frustration. We’ve heard stories of famous practitioners putting in arbitrary numbers and then relying on error messages from PyTorch to backtrack the correct sizes for their linear layers. Lame, eh? Nah, it’s all legit!

  • Deep Learning with PyTorch

Investigation

Simple case: First layer is Fully Connected

If your model’s first layer is a fully connected one, then the first layer in print(model) will detail the expected dimensionality of a single sample.
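For instance, with a hypothetical model whose first layer is nn.Linear, the expected per-sample feature count is visible directly as in_features in the printed module:

import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
print(model)
# Sequential(
#   (0): Linear(in_features=784, out_features=128, bias=True)
#   (1): ReLU()
#   (2): Linear(in_features=128, out_features=10, bias=True)
# )
# -> each sample is expected to be a flat vector of 784 features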

Ambiguous case: CNN

If it is a convolutional layer, however, there is no simple way to retrieve this information from the model itself, since convolutions are applied dynamically over whatever spatial extent the input provides.[1] This flexibility means that for many architectures several different input sizes[2] will all be accepted by the network.

This is a feature of PyTorch’s Dynamic computational graph.

Manual inspection

What you need to do is inspect the network architecture and, once you find an interpretable layer (if one is present, e.g. a fully connected layer), “work backwards” from its dimensions, determining how the preceding layers (e.g. poolings and convolutions) have compressed or reshaped the data.

Example

For example, in the following model from Deep Learning with PyTorch (section 8.5.1):

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetWidth(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)   # 3 -> 32 channels, spatial size preserved
        self.conv2 = nn.Conv2d(32, 16, kernel_size=3, padding=1)  # 32 -> 16 channels, spatial size preserved
        self.fc1 = nn.Linear(16 * 8 * 8, 32)                      # expects 16 channels of 8 x 8
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)    # halve height and width
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)  # halve height and width again
        out = out.view(-1, 16 * 8 * 8)                       # flatten each sample
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out

We see the model takes a 2D image input with 3 channels and:

  • Conv2d -> maps it to an image of the same size with 32 channels
  • max_pool2d(·, 2) -> halves the size of the image in each dimension
  • Conv2d -> maps it to an image of the same size with 16 channels
  • max_pool2d(·, 2) -> halves the size of the image in each dimension
  • view -> flattens each image into a vector
  • Linear -> maps a tensor of size 16 * 8 * 8 to size 32

So working backwards, we have:

  • a tensor of shape 16 * 8 * 8
  • un-reshaped into shape (channels x height x width)
  • un-max_pooled in 2D with factor 2, so height and width are un-halved
  • un-convolved from 16 channels back to 32
    Hypothesis: the 16 in the product most likely refers to the number of channels, so the image seen by view was of shape (channels, 8, 8) and at this point in the reversal is (channels, 16, 16)[2]
  • un-max_pooled in 2D with factor 2 again, so height and width are un-halved once more: (channels, 32, 32)
  • un-convolved from 32 channels back to 3

So, assuming the kernel_size and padding are chosen so that the convolutions themselves preserve the image dimensions (here kernel_size=3 with padding=1 does exactly that), the input image is most likely of shape (3, 32, 32), i.e. square RGB images of 32 × 32 pixels.
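A quick sanity check of that conclusion, using the NetWidth class defined above, is to feed a dummy RGB 32 × 32 batch and confirm the forward pass runs:

import torch

model = NetWidth()
dummy = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
out = model(dummy)
print(out.shape)                    # torch.Size([1, 2])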


Notes:

 

  1. Even the external package pytorch-summary requires that you provide the input shape in order to display the shape of each layer's output.

  2. It could, however, be any two numbers whose product equals 8*8, e.g. (64,1), (32,2), (16,4), etc.; but since the code is written as 8*8 it is likely the authors used the actual dimensions.

 

Original author of this solution: iacob

Solution 2

print(model)

This will print a summary of the model in which you can see each layer and its configured parameters (e.g. in_features for Linear layers, channel counts and kernel sizes for Conv layers).
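For example, printing the NetWidth model from Solution 1 produces roughly the following (a sketch of the standard module repr); note that the Linear layers expose their in_features, while the Conv2d layers reveal only channel counts and kernel sizes, not the spatial size they will be fed:

print(NetWidth())
# NetWidth(
#   (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#   (conv2): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#   (fc1): Linear(in_features=1024, out_features=32, bias=True)
#   (fc2): Linear(in_features=32, out_features=2, bias=True)
# )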

You can also use the pytorch-summary package.
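A minimal usage sketch, assuming the torchsummary package is installed (its exact import name and API can vary between releases); note that you still have to supply the input shape yourself:

# pip install torchsummary
from torchsummary import summary

model = NetWidth()                       # the model from Solution 1
summary(model, input_size=(3, 32, 32))   # prints per-layer output shapes and parameter counts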

If your network has a fully connected layer as its first layer, you can easily figure out its input shape. You mention that you have a convolutional layer at the front; with fully connected layers present further down, the network will generally only work for one specific spatial input size. I propose figuring this out by trying various shapes, i.e. feeding a toy batch of some shape and then checking the shape of the output of the conv layer just before the first FC layer.

As this depends on the architecture of the net before the first FC layer (number of conv layers, kernel sizes, strides, etc.), I can't give you an exact formula for the correct input. As mentioned, you have to figure it out by experimenting with various input shapes and inspecting the network's intermediate output just before the first FC layer. There's (almost) always a way to solve something with code, but I can't think of anything else right now.
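One way to run that experiment, sketched below with arbitrary candidate sizes and the NetWidth model from Solution 1, is to feed toy batches and see which ones make it through cleanly. Because forward uses view(-1, 16 * 8 * 8), some wrong sizes will still pass but silently change the batch dimension, so that is checked too:

import torch

model = NetWidth()                      # the CNN from Solution 1
for size in (28, 32, 48, 64):
    x = torch.randn(1, 3, size, size)
    try:
        out = model(x)
        # a wrong size can still pass if extra pixels fold into the batch dimension
        if out.shape[0] == x.shape[0]:
            print(size, "x", size, "-> output shape", tuple(out.shape))
        else:
            print(size, "x", size, "-> batch size changed: wrong input size")
    except RuntimeError:
        print(size, "x", size, "-> shape error")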

Original author of this solution: Alex Metsai

Solution 3

You can get the input shape from the first tensor in the model's parameters.

For example, create a model:

import torch.nn as nn
import torch.nn.functional as F

class CustomNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1568, 256)   # expects 1568 input features
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 20)

    def forward(self, x):
        out = self.fc1(x)
        out = F.relu(out)
        out = self.fc2(out)
        out = F.relu(out)
        out = self.fc3(out)
        return out

model = CustomNet()

The model.parameters() method returns an iterator over the module's parameters, which are torch.Tensor objects. See the docs: https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.parameters

The first parameter is the weight of the first layer. For an nn.Linear layer its shape is (out_features, in_features), so the trailing dimension tells you how many input features the model expects.

first_parameter = next(model.parameters())  # weight of fc1, shape (out_features, in_features)
input_shape = first_parameter.size()        # torch.Size([256, 1568])
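For the CustomNet above this gives torch.Size([256, 1568]), whose last entry is the expected number of input features. For Linear layers you can also read the attribute directly:

print(input_shape)            # torch.Size([256, 1568])
print(input_shape[-1])        # 1568 input features per sample
print(model.fc1.in_features)  # 1568, read straight from the layer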

Original author of this solution: Alexander Nikitin

Conclusion

That covers the main ways to determine the input shape a PyTorch model expects: read it off the first fully connected layer, work backwards through the convolutions and poolings, or probe the model with toy inputs. Hope this tutorial helped you.
