import os
import datetime

# Print a time bar
def printbar():
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("\n" + "==========" * 8 + "%s" % nowtime)

# On macOS, running PyTorch and matplotlib together in Jupyter may require this environment variable
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
Earlier we introduced some common APIs for tensor structure operations and mathematical operations in PyTorch.
Using these tensor APIs, we can build the components related to neural networks, such as activation functions, model layers, and loss functions.
Most of PyTorch's functional components related to neural networks are encapsulated in the torch.nn module.
The vast majority of these components are implemented both in functional form and in class form.
The functional implementations live in nn.functional (usually imported and renamed as F). For example:
(Activation functions)
- F.relu
- F.sigmoid
- F.tanh
- F.softmax
(Model layers)
- F.linear
- F.conv2d
- F.max_pool2d
- F.dropout2d
- F.embedding
(Loss functions)
- F.binary_cross_entropy
- F.mse_loss
- F.cross_entropy
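For instance, the functional forms can be applied directly to tensors without constructing any objects. The following is a minimal sketch (the tensor values are made up for illustration):
import torch
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])
print(F.relu(x))                 # tensor([[0., 0., 2.]])
print(F.softmax(x, dim = -1))    # probabilities along the last dimension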
To make parameter management easier, these components are usually also implemented in class form by inheriting from nn.Module, and are encapsulated directly in the nn module. For example:
(Activation functions)
- nn.ReLU
- nn.Sigmoid
- nn.Tanh
- nn.Softmax
(Model layers)
- nn.Linear
- nn.Conv2d
- nn.MaxPool2d
- nn.Dropout2d
- nn.Embedding
(Loss functions)
- nn.BCELoss
- nn.MSELoss
- nn.CrossEntropyLoss
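The class forms are used by first instantiating a module and then calling it on a tensor; the result is the same as the corresponding function. A minimal sketch reusing a toy tensor:
import torch
from torch import nn

relu = nn.ReLU()
softmax = nn.Softmax(dim = -1)
x = torch.tensor([[-1.0, 0.0, 2.0]])
print(relu(x))      # same as F.relu(x)
print(softmax(x))   # same as F.softmax(x, dim = -1)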
In fact, nn.Module can manage not only the parameters it references but also the sub-modules it references, which makes it very powerful.
In PyTorch, a model's parameters need to be trained by an optimizer, so they usually have to be tensors with requires_grad = True.
At the same time, a model often has many parameters, and managing them by hand is not easy.
PyTorch therefore uses nn.Parameter to represent parameters and nn.Module to manage all the parameters under its structure.
import torch
from torch import nn
import torch.nn.functional as F
from matplotlib import pyplot as plt
# nn.Parameter has requires_grad = True attribute
w = nn.Parameter(torch.randn(2,2))
print(w)
print(w.requires_grad)
Parameter containing:
tensor([[ 0.3544, -1.1643],
[1.2302, 1.3952]], requires_grad=True)
True
# nn.ParameterList can combine multiple nn.Parameters into a list
params_list = nn.ParameterList([nn.Parameter(torch.rand(8,i)) for i in range(1,3)])
print(params_list)
print(params_list[0].requires_grad)
ParameterList(
(0): Parameter containing: [torch.FloatTensor of size 8x1]
(1): Parameter containing: [torch.FloatTensor of size 8x2]
)
True
# nn.ParameterDict can combine multiple nn.Parameters into a dictionary
params_dict = nn.ParameterDict({"a":nn.Parameter(torch.rand(2,2)),
"b":nn.Parameter(torch.zeros(2))})
print(params_dict)
print(params_dict["a"].requires_grad)
ParameterDict(
(a): Parameter containing: [torch.FloatTensor of size 2x2]
(b): Parameter containing: [torch.FloatTensor of size 2]
)
True
# You can use Module to manage them
# module.parameters() returns a generator, including all the parameters under its structure
module = nn.Module()
module.w = w
module.params_list = params_list
module.params_dict = params_dict
num_param = 0
for param in module.parameters():
    print(param, "\n")
    num_param = num_param + 1
print("number of Parameters =",num_param)
Parameter containing:
tensor([[ 0.3544, -1.1643],
[1.2302, 1.3952]], requires_grad=True)
Parameter containing:
tensor([[0.9391],
[0.7590],
[0.6899],
[0.4786],
[0.2392],
[0.9645],
[0.1968],
[0.1353]], requires_grad=True)
Parameter containing:
tensor([[0.8012, 0.9587],
[0.0276, 0.5995],
[0.7338, 0.5559],
[0.1704, 0.5814],
[0.7626, 0.1179],
[0.4945, 0.2408],
[0.7179, 0.0575],
[0.3418, 0.7291]], requires_grad=True)
Parameter containing:
tensor([[0.7729, 0.2383],
[0.7054, 0.9937]], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
number of Parameters = 5
# In practice, modules are generally defined by inheriting from nn.Module, and all parts that contain learnable parameters are placed in the constructor.
# The following example is a simplified version of the nn.Linear source code in PyTorch.
# Notice that it creates the learnable parameters in the __init__ constructor and calls the F.linear function in forward to implement the computation.
class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)
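As a quick check of the simplified class above, the sketch below instantiates it and prints the parameters that nn.Module now tracks. (Note that torch.Tensor(...) here allocates uninitialized memory; the real nn.Linear also initializes weight and bias, which this simplified version omits.)
layer = Linear(in_features = 4, out_features = 2)
for name, param in layer.named_parameters():
    print(name, param.shape, param.requires_grad)
# weight torch.Size([2, 4]) True
# bias torch.Size([2]) True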
Under normal circumstances, we rarely use nn.Parameter directly to define parameters when building a model; instead, we construct models by assembling commonly used model layers.
These model layers also inherit from nn.Module and contain parameters of their own; they become the sub-modules of the module we define.
nn.Module provides several methods to manage these sub-modules:
- children(): returns a generator over the direct sub-modules of the module.
- named_children(): returns a generator over the direct sub-modules of the module together with their names.
- modules(): returns a generator over all modules at every level under the module, including the module itself.
- named_modules(): returns a generator over all modules at every level under the module together with their names, including the module itself.
Among them, the children() method and the named_children() method are used more often.
The modules() method and the named_modules() method are rarely used; their behavior can be reproduced by nesting named_children() calls (a sketch of this follows the modules() example below).
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.embedding = nn.Embedding(num_embeddings = 10000, embedding_dim = 3, padding_idx = 1)
        self.conv = nn.Sequential()
        self.conv.add_module("conv_1", nn.Conv1d(in_channels = 3, out_channels = 16, kernel_size = 5))
        self.conv.add_module("pool_1", nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_1", nn.ReLU())
        self.conv.add_module("conv_2", nn.Conv1d(in_channels = 16, out_channels = 128, kernel_size = 2))
        self.conv.add_module("pool_2", nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_2", nn.ReLU())
        self.dense = nn.Sequential()
        self.dense.add_module("flatten", nn.Flatten())
        self.dense.add_module("linear", nn.Linear(6144, 1))
        self.dense.add_module("sigmoid", nn.Sigmoid())

    def forward(self, x):
        x = self.embedding(x).transpose(1, 2)
        x = self.conv(x)
        y = self.dense(x)
        return y
net = Net()
i = 0
for child in net.children():
    i += 1
    print(child, "\n")
print("child number", i)
Embedding(10000, 3, padding_idx=1)
Sequential(
(conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_1): ReLU()
(conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
(pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_2): ReLU()
)
Sequential(
(flatten): Flatten()
(linear): Linear(in_features=6144, out_features=1, bias=True)
(sigmoid): Sigmoid()
)
child number 3
i = 0
for name, child in net.named_children():
    i += 1
    print(name, ":", child, "\n")
print("child number", i)
embedding: Embedding(10000, 3, padding_idx=1)
conv: Sequential(
(conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_1): ReLU()
(conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
(pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_2): ReLU()
)
dense: Sequential(
(flatten): Flatten()
(linear): Linear(in_features=6144, out_features=1, bias=True)
(sigmoid): Sigmoid()
)
child number 3
i = 0
for module in net.modules():
    i += 1
    print(module)
print("module number:", i)
Net(
(embedding): Embedding(10000, 3, padding_idx=1)
(conv): Sequential(
(conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_1): ReLU()
(conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
(pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_2): ReLU()
)
(dense): Sequential(
(flatten): Flatten()
(linear): Linear(in_features=6144, out_features=1, bias=True)
(sigmoid): Sigmoid()
)
)
Embedding(10000, 3, padding_idx=1)
Sequential(
(conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_1): ReLU()
(conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
(pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_2): ReLU()
)
Conv1d(3, 16, kernel_size=(5,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Conv1d(16, 128, kernel_size=(2,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Sequential(
(flatten): Flatten()
(linear): Linear(in_features=6144, out_features=1, bias=True)
(sigmoid): Sigmoid()
)
Flatten()
Linear(in_features=6144, out_features=1, bias=True)
Sigmoid()
module number: 13
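As mentioned above, the behavior of modules() / named_modules() can be reproduced by recursively nesting named_children(). The sketch below is one way to do it (the helper name walk_modules is made up for illustration, and it ignores the deduplication of shared modules that named_modules() performs):
def walk_modules(module, prefix = ""):
    # Yield the module itself, then recurse into its children, mimicking named_modules()
    yield prefix, module
    for name, child in module.named_children():
        child_prefix = name if prefix == "" else prefix + "." + name
        yield from walk_modules(child, child_prefix)

print(len(list(walk_modules(net))))  # 13, the same count as net.modules() above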
Below we find the embedding layer through the named_children() method and set its parameters to be non-trainable (that is, we freeze the embedding layer).
children_dict = {name:module for name,module in net.named_children()}
print(children_dict)
embedding = children_dict["embedding"]
embedding.requires_grad_(False) #Freeze its parameters
{'embedding': Embedding(10000, 3, padding_idx=1),'conv': Sequential(
(conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
(pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_1): ReLU()
(conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
(pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(relu_2): ReLU()
),'dense': Sequential(
(flatten): Flatten()
(linear): Linear(in_features=6144, out_features=1, bias=True)
(sigmoid): Sigmoid()
)}
#You can see that the parameters of the first layer can no longer be trained.
for param in embedding.parameters():
    print(param.requires_grad)
    print(param.numel())
False
30000
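After freezing parameters this way, it is common practice to pass only the still-trainable parameters to the optimizer. A minimal sketch (the choice of Adam and the learning rate are arbitrary here):
from torch.optim import Adam

trainable_params = [p for p in net.parameters() if p.requires_grad]
optimizer = Adam(trainable_params, lr = 1e-3)  # the frozen embedding weights will not be updated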
from torchkeras import summary
summary(net,input_shape = (200,),input_dtype = torch.LongTensor)
# Note the increase in the number of non-trainable parameters
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
         Embedding-1               [-1, 200, 3]          30,000
            Conv1d-2              [-1, 16, 196]             256
         MaxPool1d-3               [-1, 16, 98]               0
              ReLU-4               [-1, 16, 98]               0
            Conv1d-5              [-1, 128, 97]           4,224
         MaxPool1d-6              [-1, 128, 48]               0
              ReLU-7              [-1, 128, 48]               0
           Flatten-8                 [-1, 6144]               0
            Linear-9                    [-1, 1]           6,145
          Sigmoid-10                    [-1, 1]               0
================================================================
Total params: 40,625
Trainable params: 10,625
Non-trainable params: 30,000
----------------------------------------------------------------
Input size (MB): 0.000763
Forward/backward pass size (MB): 0.287796
Params size (MB): 0.154972
Estimated Total Size (MB): 0.443531
----------------------------------------------------------------
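If torchkeras is not available, the same trainable and non-trainable counts can be computed directly from net.parameters(), for example:
total_params = sum(p.numel() for p in net.parameters())
trainable_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print("Total params:", total_params)                             # 40625
print("Trainable params:", trainable_params)                     # 10625
print("Non-trainable params:", total_params - trainable_params)  # 30000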
If this book is helpful to you and you want to encourage the author, remember to give this project a star and share it with your friends 😊!
If you would like to discuss the contents of this book further with the author, please leave a message under the official account "Algorithm Food House". The author has limited time and energy and will respond as appropriate.
You can also reply with the keyword "join group" in the official account backend to join the reader exchange group for discussion.