Dataloader issue when using parallel workers #1187
I'm sorry, but I can already see why the parallel dataloader I wrote myself reported an error. The setup was along these lines: library(torch); minist_dataset <- dataset(…); ministdsta <- minist_dataset(dfminist2[,-1], dfminist2[,1]). The same considerations apply if you use other data, functions, or packages in your dataset section. During training, note the following: the dataset and dataloader parts should be written inside the for-epoch loop. Create the dataset, dataloader, and optimizer there, call model$train(), and after each epoch delete loss, output, b, the optimizer, the dataset, and the dataloader. This way you can use the parallel dataloader normally, especially if you use a GPU to accelerate neural network training.
With a large batch_size the parallel dataloader above still hits an index error. In practice, although the underlying code cannot be corrected from my side, the pattern below largely prevents such errors: create the dataset as before (ministdsta <- minist_dataset(dfminist2[,-1], dfminist2[,1]); again, the same applies if your dataset section uses other data, functions, or packages), and do all per-epoch work inside for (epoch in 1:100) { … }. This way the parallel dataloader can be used normally; a sketch of the pattern follows below.
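A minimal sketch of the per-epoch workaround described in the two comments above, assuming a classification setup; build_model(), the optimizer, the learning rate, and the loss function are illustrative placeholders, not code from this thread:

library(torch)

model <- build_model()  # hypothetical model constructor

for (epoch in 1:100) {
  # (re)create dataset, dataloader and optimizer inside the epoch loop
  ds  <- minist_dataset(dfminist2[, -1], dfminist2[, 1])
  dl  <- dataloader(ds, batch_size = 200, shuffle = TRUE, num_workers = 2)
  opt <- optim_sgd(model$parameters, lr = 0.01)

  model$train()
  coro::loop(for (b in dl) {
    opt$zero_grad()
    output <- model(b[[1]])
    loss   <- nnf_cross_entropy(output, b[[2]])
    loss$backward()
    opt$step()
  })

  # drop everything that references tensors before the next epoch
  rm(loss, output, b, opt, ds, dl)
  gc()
  print(epoch)
}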
Final solution for the parallel dataloader reporting errors during neural network training: the previous method only suits small datasets, where the data can be loaded into memory in one go; it is not suitable for large datasets, especially when the data is image-based. The structure is: a dataset section (mydstr <- my_dataset_tr(d_tr)), a model section, and a train section where n_epochs <- length(lr_ratio) and each epoch runs a train step and an evaluate step. In this way, the problems that arise during training of the network can be solved; a lazy-loading sketch follows below.
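Roughly, the idea is not to load everything up front: keep only light-weight fields (file paths, labels) on the dataset and read each sample inside .getitem. A minimal sketch, where my_dataset_tr's fields, d_tr$files, d_tr$labels, and the per-sample RDS files are assumptions for illustration:

library(torch)

my_dataset_tr <- dataset(
  name = "my_dataset_tr",
  initialize = function(files, labels) {
    # store only plain R objects (paths and labels), never tensors
    self$files  <- files
    self$labels <- labels
  },
  .getitem = function(i) {
    x <- torch_tensor(readRDS(self$files[i]))                # load lazily, per item
    y <- torch_tensor(self$labels[i], dtype = torch_long())
    list(x = x, y = y)
  },
  .length = function() {
    length(self$files)
  }
)

mydstr <- my_dataset_tr(d_tr$files, d_tr$labels)
dl_tr  <- dataloader(mydstr, batch_size = 32, shuffle = TRUE, num_workers = 2)

Because nothing tensor-valued lives on the dataset, it can be shipped to the worker subprocesses, and memory use stays flat regardless of dataset size.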
code:
# dataset plus a parallel dataloader (2 workers)
ministdsta <- minist_dataset(xarray2, label = dfminist2[, 1])
ministdlta <- dataloader(ministdsta, batch_size = 200, shuffle = TRUE,
                         num_workers = 2, pin_memory = TRUE,
                         worker_globals = list(xarray2, dfminist2, ministdsta))
result:
Warning messages:
1: Datasets used with parallel dataloader (num_workers > 0) shouldn't have fields containing tensors as they can't be correctly passed to the wroker subprocesses.
2: Datasets used with parallel dataloader (num_workers > 0) shouldn't have fields containing tensors as they can't be correctly passed to the wroker subprocesses.
And then, when running the loop with coro, the error occurs.
code:
torch_manual_seed(1)
for (epoch in 1:100) {
  # iterate over the parallel dataloader
  coro::loop(for (b in ministdlta) {
    b1 <- b[[1]]  # features
    b2 <- b[[2]]  # labels
  })
  print(epoch)
}
result:
Error in `self$.pop_task()`:
! Error when getting dataset item.
Caused by error:
! in callr subprocess.
Caused by error:
! object '.socket_con' not found
Run `rlang::last_trace()` to see where the error occurred.
Backtrace:
    x
 1. \-<fn>()
Run `rlang::last_trace(drop = FALSE)` to see 1 hidden frame.
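For reference, the warnings above point at the likely culprit: the dataset keeps tensors as fields, and those cannot be passed to the worker subprocesses. A hedged sketch of a minist_dataset that stores only plain R objects and builds tensors inside .getitem (the field names and the matrix/label layout are assumptions based on the calls earlier in this thread):

library(torch)

minist_dataset <- dataset(
  name = "minist_dataset",
  initialize = function(x, label) {
    self$x     <- as.matrix(x)   # plain matrix, not a tensor
    self$label <- label          # plain vector, not a tensor
  },
  .getitem = function(i) {
    list(
      x = torch_tensor(self$x[i, ], dtype = torch_float()),
      y = torch_tensor(self$label[i], dtype = torch_long())
    )
  },
  .length = function() {
    nrow(self$x)
  }
)

ministdsta <- minist_dataset(dfminist2[, -1], dfminist2[, 1])
ministdlta <- dataloader(ministdsta, batch_size = 200, shuffle = TRUE,
                         num_workers = 2)

With no tensor fields on the dataset, the worker_globals workaround should no longer be needed for these objects.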