2-3, Dynamic Computation Graph

In this section, we introduce PyTorch's dynamic computation graph.

It covers:

  • Introduction to the dynamic computation graph

  • Function in the computation graph

  • Computation graph and backpropagation

  • Leaf nodes and non-leaf nodes

  • Visualization of the computation graph in TensorBoard

1. Introduction to the dynamic computation graph

The computation graph of PyTorch is composed of nodes and edges. Nodes represent tensors or Functions, and edges represent the dependencies between tensors and Functions.

The computation graph in PyTorch is dynamic. Being dynamic has two main implications.

The first: the forward pass of the computation graph is executed immediately. There is no need to wait for the complete graph to be built; each statement dynamically adds nodes and edges to the graph and immediately performs the forward computation to obtain its result.

The second: the computation graph is destroyed immediately after backpropagation and must be rebuilt before the next call. Whenever the backward method is used to perform backpropagation, or torch.autograd.grad is used to compute gradients, the graph that was created is freed right away to release memory, and it has to be recreated the next time it is needed.

**1, the forward pass of the computation graph is executed immediately.**

import torch
w = torch.tensor([[3.0,1.0]],requires_grad=True)
b = torch.tensor([[3.0]],requires_grad=True)
X = torch.randn(10,2)
Y = torch.randn(10,1)
Y_hat = X@w.t() + b  # the forward computation of Y_hat is executed immediately, without waiting for the loss statement below
loss = torch.mean(torch.pow(Y_hat-Y,2))

print(loss.data)
print(Y_hat.data)
tensor(17.8969)
tensor([[3.2613],
        [4.7322],
        [4.5037],
        [7.5899],
        [7.0973],
        [1.3287],
        [6.1473],
        [1.3492],
        [1.3911],
        [1.2150]])

**2, the computation graph is destroyed immediately after backpropagation.**

import torch
w = torch.tensor([[3.0,1.0]],requires_grad=True)
b = torch.tensor([[3.0]],requires_grad=True)
X = torch.randn(10,2)
Y = torch.randn(10,1)
Y_hat = X@w.t() + b  # the forward computation of Y_hat is executed immediately, without waiting for the loss statement below
loss = torch.mean(torch.pow(Y_hat-Y,2))

# The computation graph is destroyed immediately after backpropagation.
# To keep the graph, pass retain_graph=True.
loss.backward()  # loss.backward(retain_graph=True)

# loss.backward()  # calling backward a second time would raise an error, because the graph has already been freed
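
The same applies when gradients are computed with torch.autograd.grad instead of backward. A minimal sketch, reusing the same w, b, X, Y as above: torch.autograd.grad returns the gradients as a tuple rather than writing them into the .grad attributes, and the graph is likewise freed afterwards unless retain_graph=True is passed.

```python
import torch

w = torch.tensor([[3.0, 1.0]], requires_grad=True)
b = torch.tensor([[3.0]], requires_grad=True)
X = torch.randn(10, 2)
Y = torch.randn(10, 1)

Y_hat = X@w.t() + b
loss = torch.mean(torch.pow(Y_hat - Y, 2))

# compute d(loss)/dw and d(loss)/db; the graph is destroyed afterwards
dloss_dw, dloss_db = torch.autograd.grad(loss, [w, b])
print(dloss_dw)
print(dloss_db)
```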

2. Function in the computation graph

We are already familiar with the tensor nodes in the computation graph. The other kind of node is the Function, which corresponds to the various operations performed on tensors in PyTorch.

These Functions differ from ordinary Python functions in an important way: they contain both the forward computation logic and the backpropagation logic.

We can create this kind of function that supports backpropagation by subclassing torch.autograd.Function.

class MyReLU(torch.autograd.Function):

    # Forward propagation logic; ctx can be used to store values needed for backpropagation.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    # Backpropagation logic
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

import torch
w = torch.tensor([[3.0,1.0]],requires_grad=True)
b = torch.tensor([[3.0]],requires_grad=True)
X = torch.tensor([[-1.0,-1.0],[1.0,1.0]])
Y = torch.tensor([[2.0,3.0]])

relu = MyReLU.apply  # relu now supports both forward and backward propagation
Y_hat = relu(X@w.t() + b)
loss = torch.mean(torch.pow(Y_hat-Y,2))

loss.backward()

print(w.grad)
print(b.grad)
tensor([[4.5000, 4.5000]])
tensor([[4.5000]])
# The grad_fn of Y_hat is the MyReLU.backward we defined ourselves

print(Y_hat.grad_fn)
<torch.autograd.function.MyReLUBackward object at 0x1205a46c8>

3. Computation graph and backpropagation

Having understood the role of Function, we can use a simple example to understand the principle and process of backpropagation. Understanding this part requires some basic knowledge of the chain rule of derivatives from calculus.

import torch

x = torch.tensor(3.0,requires_grad=True)
y1 = x + 1
y2 = 2*x
loss = (y1-y2)**2

loss.backward()

After loss.backward() is called, the following computations happen in sequence:

  1. The grad of loss is set to 1, i.e. the gradient of loss with respect to itself is 1.

  2. loss computes the gradients of the variables it directly depends on, namely y1 and y2, from its own gradient and the associated backward method, and assigns these values to y1.grad and y2.grad.

  3. y1 and y2 then each compute the gradient of their input x from their own gradient and the associated backward method; x.grad accumulates the multiple gradient values it receives.

(Note that the order of steps 1, 2 and 3 and the rule of accumulating multiple gradient values are exactly the programmatic expression of the chain rule for derivatives. A worked derivation for this example follows.)
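
Spelling out the chain rule for the example above:

$$
loss = (y_1 - y_2)^2,\quad y_1 = x + 1,\quad y_2 = 2x
$$

$$
\frac{\partial\, loss}{\partial x}
= \frac{\partial\, loss}{\partial y_1}\frac{\partial y_1}{\partial x}
+ \frac{\partial\, loss}{\partial y_2}\frac{\partial y_2}{\partial x}
= 2(y_1 - y_2)\cdot 1 + \left(-2(y_1 - y_2)\right)\cdot 2
= -2(y_1 - y_2) = 2(x - 1)
$$

At x = 3 this evaluates to 4, which matches the value of x.grad printed in the next section.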

Because of this accumulation rule derived from the chain rule, the grad attribute of a tensor is not cleared automatically; it needs to be set to zero manually when necessary.
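
A minimal sketch of this accumulation behaviour, reusing the example above: each backward pass on a freshly built graph adds its gradient onto x.grad, so the accumulated value has to be cleared explicitly, e.g. with x.grad.zero_().

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# first backward pass
loss = ((x + 1) - 2*x)**2
loss.backward()
print(x.grad)    # tensor(4.)

# second backward pass on a freshly built graph: the new gradient is accumulated
loss = ((x + 1) - 2*x)**2
loss.backward()
print(x.grad)    # tensor(8.)

# clear the accumulated gradient manually
x.grad.zero_()
print(x.grad)    # tensor(0.)
```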

4. Leaf nodes and non-leaf nodes

If we execute the following code, we will find that loss.grad is not the 1 we expect, but None.

Similarly, y1.grad and y2.grad are also None.

Why is this? Because they are not leaf node tensors.

During backpropagation, only the gradients of leaf node tensors (is_leaf=True) that require gradients are ultimately retained.

So what is a leaf node tensor? A leaf node tensor needs to satisfy two conditions.

  1. A leaf node tensor is a tensor created directly by the user, not one computed by some Function.

  2. The requires_grad attribute of a leaf node tensor must be True.

PyTorch adopts this rule mainly to save memory (including GPU memory), because in almost all cases users only care about the gradients of the tensors they created directly.

All tensors that depend on a leaf node tensor have requires_grad=True as well, but their gradient values are only used during the backward computation and are not stored in their grad attribute.

If you need to keep the gradient of an intermediate result in its grad attribute, you can use the retain_grad method. If you only want to inspect the gradient while debugging, you can use register_hook to print it.

import torch

x = torch.tensor(3.0,requires_grad=True)
y1 = x + 1
y2 = 2*x
loss = (y1-y2)**2

loss.backward()
print("loss.grad:", loss.grad)
print("y1.grad:", y1.grad)
print("y2.grad:", y2.grad)
print(x.grad)
loss.grad: None
y1.grad: None
y2.grad: None
tensor(4.)
print(x.is_leaf)
print(y1.is_leaf)
print(y2.is_leaf)
print(loss.is_leaf)
True
False
False
False

The example below uses retain_grad to keep the gradient of a non-leaf node, and register_hook to view the gradients of non-leaf nodes.

import torch

# Forward propagation
x = torch.tensor(3.0,requires_grad=True)
y1 = x + 1
y2 = 2*x
loss = (y1-y2)**2

# Control the display of non-leaf node gradients
y1.register_hook(lambda grad: print('y1 grad:', grad))
y2.register_hook(lambda grad: print('y2 grad:', grad))
loss.retain_grad()

# Backpropagation
loss.backward()
print("loss.grad:", loss.grad)
print("x.grad:", x.grad)
y2 grad: tensor(4.)
y1 grad: tensor(-4.)
loss.grad: tensor(1.)
x.grad: tensor(4.)

5. Visualization of the computation graph in TensorBoard

You can use torch.utils.tensorboard to export the computation graph to TensorBoard for visualization.

from torch import nn
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.w = nn.Parameter(torch.randn(2,1))
        self.b = nn.Parameter(torch.zeros(1,1))

    def forward(self, x):
        y = x@self.w + self.b
        return y

net = Net()
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('./data/tensorboard')
writer.add_graph(net,input_to_model = torch.rand(10,2))
writer.close()
%load_ext tensorboard
#%tensorboard --logdir ./data/tensorboard
from tensorboard import notebook
notebook.list()
# View the model in TensorBoard
notebook.start("--logdir ./data/tensorboard")

If this book is helpful to you and you would like to encourage the author, remember to give this project a star ⭐️ and share it with your friends 😊!

If you want to discuss the contents of this book further with the author, please leave a message under the WeChat public account "Algorithm Food House" (算法美食屋). The author has limited time and energy and will respond as appropriate.

You can also reply with the keyword "Add group" in the backend of the public account to join the reader discussion group.