Impact of the bug
WMAgent (current version 2.3.9.2)
Describe the bug
As Artur reported in the CompOps Mattermost channel, LogCollect jobs are failing with the following error message [1].
This error appears because LogCollect jobs transfer the log tarball to CERN using the XRDCP implementation, which at the moment does not implement the method createDebuggingCommand().
Note that this problem only surfaced because the CERN /store namespace is out of quota, which caused all stage out retries to fail and the debugging block to be executed.
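To illustrate the failure path described above, here is a minimal sketch. It does not use the real WMCore classes: `SketchBaseImpl`, `TransferFailed`, `transfer`, `numRetries`, and `describe_failure` are hypothetical names, and the retry count is an assumption. It only shows how a missing createDebuggingCommand() ends up hiding the original transfer error once all retries have failed.

```python
class TransferFailed(Exception):
    """Hypothetical error standing in for a failed transfer (e.g. out of quota)."""


class SketchBaseImpl:
    """Hypothetical stand-in for the stage out base class, not the WMCore code."""

    numRetries = 3  # assumed retry count, for illustration only

    def transfer(self, sourcePFN, targetPFN):
        # Simulate a destination that is out of quota: every attempt fails.
        raise TransferFailed("quota exceeded for %s" % targetPFN)

    def createDebuggingCommand(self, sourcePFN, targetPFN, options, checksums):
        # Same behaviour as reported in the traceback [1].
        raise NotImplementedError("StageOutImpl.createDebuggingCommand")

    def __call__(self, sourcePFN, targetPFN, options=None, checksums=None):
        for _ in range(self.numRetries):
            try:
                return self.transfer(sourcePFN, targetPFN)
            except TransferFailed:
                continue
        # Only reached when every retry failed: the debugging block.
        return self.createDebuggingCommand(sourcePFN, targetPFN, options, checksums)


def describe_failure(impl):
    """Return the name of the exception that finally escapes the call."""
    try:
        impl("/local/logs.tar", "root://example//store/logs.tar")
    except Exception as exc:
        return type(exc).__name__
    return "ok"
```

In this sketch `describe_failure(SketchBaseImpl())` reports the NotImplementedError rather than the transfer failure, which matches what the job report in [1] shows.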
Additional context and error message
[1]
WMBS job id: 15678072
Task: /pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/DataProcessing/LogCollect
Status: jobfailed
Site: T1_DE_KIT
Agent: cmsgwms-submit14.fnal.gov
logCollect1
WMAgentStepExecutionError (Exit Code: 60408)
<@========== WMException Start ==========@>
Exception Class: LogCollectStageOutError
Message: Unable to stageOut LogCollect to Castor:
<@========== WMException Start ==========@>
Exception Class: StageOutFailure
Message: Failure for override stage out:
StageOutImpl.createDebuggingCommand
ClassName : None
ModuleName : WMCore.Storage.StageOutError
MethodName : __init__
ClassInstance : None
FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
LineNumber : 32
ErrorNr : 0
Command : xrdcp
LFN : /store/logs/prod/2025/02/WMAgent/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
InputPFN : /srv/job/WMTaskSpace/logCollect1/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
TargetPFN : root://eoscms.cern.ch//eos/cms/store/logs/prod/2025/02/WMAgent/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513/pdmvserv_Run2024C_ParkingVBF2_2024CDEReprocessing_250129_055301_2513-LogCollect-c02-014-184-69-logs.tar
ErrorCode : 60311
ErrorType : GeneralStageOutFailure
Traceback:
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 361, in stageOut
impl(self.overrideConf['command'], localPfn, pfn, self.overrideConf["option"], checksums)
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutImpl.py", line 239, in __call__
command = self.createDebuggingCommand(sourcePFN, targetPFN, options, checksums)
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutImpl.py", line 150, in createDebuggingCommand
raise NotImplementedError("StageOutImpl.createDebuggingCommand")
<@---------- WMException End ----------@>
ClassName : None
ModuleName : WMCore.WMSpec.Steps.WMExecutionFailure
MethodName : __init__
ClassInstance : None
FileName : /srv/job/WMCore.zip/WMCore/WMSpec/Steps/WMExecutionFailure.py
LineNumber : 18
ErrorNr : 60408
Traceback:
File "/srv/job/WMCore.zip/WMCore/WMSpec/Steps/Executors/LogCollect.py", line 234, in execute
eosStageOutMgr(tarInfo)
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 289, in __call__
raise lastException
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 272, in __call__
pfn = self.stageOut(lfn, fileToStage['PFN'], fileToStage.get('Checksums'))
File "/srv/job/WMCore.zip/WMCore/Storage/StageOutMgr.py", line 365, in stageOut
raise StageOutFailure(msg, Command=self.overrideConf['command'],
How to reproduce it
None
Expected behavior
We need to implement the createDebuggingCommand() interface defined at
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/StageOutImpl.py#L145
in all stage out plugins (the Backends Python package) that can be reached from this line:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/StageOutImpl.py#L239
For those plugins that will soon be deprecated (with tokens adoption?), we can skip the actual debugging code, if needed, and just leave a simple placeholder like print/echo.
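A placeholder implementation could look roughly like the sketch below. The method signature follows the call shown in the traceback (sourcePFN, targetPFN, options, checksums); the class names `StageOutImplSketch` and `PlaceholderDebugImpl` are hypothetical stand-ins so the example runs without WMCore, and the echo message is only a suggestion.

```python
import shlex


class StageOutImplSketch:
    """Hypothetical stand-in for WMCore.Storage.StageOutImpl (not the real class)."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options, checksums):
        # Mirrors the behaviour seen in the traceback: no default implementation.
        raise NotImplementedError("StageOutImpl.createDebuggingCommand")


class PlaceholderDebugImpl(StageOutImplSketch):
    """Hypothetical plugin that only provides a placeholder debugging command."""

    def createDebuggingCommand(self, sourcePFN, targetPFN, options, checksums):
        # Return a harmless shell command instead of raising, so the debugging
        # block can run without replacing the original stage out failure.
        msg = "No debugging command available for stage out %s -> %s" % (sourcePFN, targetPFN)
        # shlex.quote keeps the message safe if a PFN contains shell metacharacters.
        return "echo %s" % shlex.quote(msg)
```

With something like this in place, a failed stage out would still run the debugging step, but the reported error would be the original transfer failure instead of NotImplementedError.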